Statistics - Kolmogorov Smirnov Test
This test is used in situations where a comparison has to be made between an observed sample distribution and a theoretical distribution.
K-S One Sample Test
This test is used as a test of goodness of fit and is ideal when the size of the sample is small. It compares the cumulative distribution function of a variable with a specified theoretical distribution. The null hypothesis assumes no difference between the observed and theoretical distributions, and the value of the test statistic 'D' is calculated as:
Formula
${D = \max |F_o(X)-F_r(X)|}$
Where −
${F_o(X)}$ = Observed cumulative frequency distribution of a random sample of n observations.
and ${F_o(X) = \frac{k}{n}}$, where ${k}$ = number of observations ≤ X and ${n}$ = total number of observations.
${F_r(X)}$ = The theoretical frequency distribution.
The critical value of ${D}$ is found from the K-S table of critical values for the one-sample test.
Acceptance Criteria: If the calculated value is less than the critical value, accept the null hypothesis.
Rejection Criteria: If the calculated value is greater than the critical value, reject the null hypothesis.
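For readers who want to verify the formula numerically, the sketch below computes D for a simulated sample tested against a standard normal distribution. The sample data and the choice of a normal CDF are assumptions made purely for illustration (they do not come from the text), and SciPy's built-in kstest is shown only as a cross-check.

```python
# A minimal sketch of the one-sample K-S statistic, assuming an
# illustrative simulated sample tested against a standard normal CDF.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(size=50)          # hypothetical data for demonstration

x = np.sort(sample)
n = len(x)
F_o = np.arange(1, n + 1) / n         # observed cumulative frequencies k/n
F_r = stats.norm.cdf(x)               # theoretical CDF evaluated at each X

D = np.max(np.abs(F_o - F_r))         # D = max |F_o(X) - F_r(X)|
print("D =", D)

# SciPy's built-in test also checks the deviation just below each jump,
# so its reported statistic can be slightly larger than the simple k/n form.
print(stats.kstest(sample, "norm"))
```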
Example
Problem Statement:
In a study conducted across the various streams of a college, 60 students, with an equal number drawn from each stream, were interviewed and their intention to join the Drama Club of the college was noted.
| Stream | B.Sc. | B.A. | B.Com | M.A. | M.Com |
|---|---|---|---|---|---|
| No. in each class | 5 | 9 | 11 | 16 | 19 |
It was expected that 12 students from each stream would join the Drama Club. Use the K-S test to find whether there is any difference among the student streams with regard to their intention of joining the Drama Club.
Solution:
${H_o}$: There is no difference among students of different streams with respect to their intention of joining the drama club.
We develop the cumulative frequencies for observed and theoretical distributions.
| Streams | No. interested, Observed (O) | No. expected, Theoretical (T) | ${F_O(X)}$ | ${F_T(X)}$ | ${\lvert F_O(X)-F_T(X) \rvert}$ |
|---|---|---|---|---|---|
| B.Sc. | 5 | 12 | 5/60 | 12/60 | 7/60 |
| B.A. | 9 | 12 | 14/60 | 24/60 | 10/60 |
| B.Com. | 11 | 12 | 25/60 | 36/60 | 11/60 |
| M.A. | 16 | 12 | 41/60 | 48/60 | 7/60 |
| M.Com. | 19 | 12 | 60/60 | 60/60 | 0/60 |
| Total | n = 60 | 60 | | | |
The test statistic ${|D|}$ is the largest absolute difference in the last column:
${D = \max |F_O(X)-F_T(X)| = \frac{11}{60} = 0.183}$
The table value of D at the 5% significance level for n = 60 is
${D_{0.05} = \frac{1.36}{\sqrt{n}} = \frac{1.36}{\sqrt{60}} = 0.175}$
Since the calculated value (0.183) is greater than the critical value (0.175), we reject the null hypothesis and conclude that there is a difference among students of different streams in their intention of joining the Drama Club.
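The whole calculation above can be reproduced in a few lines. The sketch below uses the observed counts from the table and the common large-sample approximation ${D_{0.05} \approx 1.36/\sqrt{n}}$ for the 5% critical value, matching the table value used above.

```python
# Reproducing the worked example: cumulative observed and theoretical
# proportions for the five streams, the statistic D, and the approximate
# 5% critical value 1.36/sqrt(n).
import numpy as np

observed = np.array([5, 9, 11, 16, 19])   # students intending to join
expected = np.full(5, 12)                  # 12 students expected per stream
n = observed.sum()                         # 60 students in total

F_O = np.cumsum(observed) / n              # 5/60, 14/60, 25/60, 41/60, 60/60
F_T = np.cumsum(expected) / n              # 12/60, 24/60, 36/60, 48/60, 60/60

D = np.max(np.abs(F_O - F_T))              # 11/60 = 0.183
D_crit = 1.36 / np.sqrt(n)                 # 0.175 at the 5% level

print(f"D = {D:.3f}, critical value = {D_crit:.3f}")
print("Reject H0" if D > D_crit else "Fail to reject H0")
```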
K-S Two Sample Test
When there are two independent samples instead of one, the K-S two-sample test can be used to test the agreement between the two cumulative distributions. The null hypothesis states that there is no difference between the two distributions. The D statistic is calculated in the same manner as in the K-S one-sample test.
Formula
${D = \max |F_{n_1}(X)-F_{n_2}(X)|}$
Where −
${n_1}$ = Number of observations in the first sample, with ${F_{n_1}(X)}$ its cumulative distribution.
${n_2}$ = Number of observations in the second sample, with ${F_{n_2}(X)}$ its cumulative distribution.
A large maximum deviation ${|D|}$ between the two cumulative distributions indicates a difference between the two sample distributions.
For samples where ${n_1 = n_2}$ and both are ≤ 40, the critical value of D is found from the K-S table for the two-sample case. When ${n_1}$ and/or ${n_2}$ is greater than 40, the K-S table for large samples in the two-sample test should be used. The null hypothesis is accepted if the calculated value is less than the table value, and rejected otherwise.
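As a final illustration, a minimal two-sample sketch is given below. The two samples are simulated here purely for demonstration (they are not data from the text), and the manually computed D is compared against SciPy's ks_2samp.

```python
# A minimal sketch of the two-sample K-S statistic with simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample1 = rng.normal(loc=0.0, size=35)     # hypothetical first sample  (n1 = 35)
sample2 = rng.normal(loc=0.5, size=40)     # hypothetical second sample (n2 = 40)

# D = max |F_n1(X) - F_n2(X)| evaluated over the pooled observations
pooled = np.sort(np.concatenate([sample1, sample2]))
F1 = np.searchsorted(np.sort(sample1), pooled, side="right") / len(sample1)
F2 = np.searchsorted(np.sort(sample2), pooled, side="right") / len(sample2)
D = np.max(np.abs(F1 - F2))
print("D =", D)

# SciPy's two-sample test returns the same statistic together with a p-value.
print(stats.ks_2samp(sample1, sample2))
```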
Thus, the use of any of these nonparametric tests helps a researcher to test the significance of results when the characteristics of the target population are unknown or no assumptions have been made about them.