Polygon 2016




Editorial Note: Polygon is MDC Hialeah's academic journal. It is a multi-disciplinary online publication whose purpose is to display the academic work produced by faculty and staff. We, the editorial committee of Polygon, are pleased to publish the 2016 Spring Issue of Polygon, the tenth consecutive issue. It includes five regular research papers, and we are pleased to present work from a diverse array of fields written by faculty from across the college. The editorial committee of Polygon is thankful to the Miami Dade College President, Dr. Eduardo J. Padrón; the Miami Dade College District Board of Trustees; the Hialeah Campus Academic Dean, Professor Joaquin G. Martinez; the Chairperson of Hialeah Campus Liberal Arts and Sciences, Dr. Caridad Castro; the Chairperson of Hialeah Campus English, Communications and World Languages, Dr. Victor McGlone; the Director of Hialeah Campus Administrative Services, Ms. Andrea M. Forero; and all staff and faculty of Hialeah Campus and Miami Dade College in general for their continued support and cooperation in the publication of Polygon.

Sincerely,
The Editorial Committee of Polygon: Dr. M. Shakil (Mathematics), Dr. Jaime Bestard (Mathematics), and Professor Victor Calderin (English)

Patrons:
Professor Joaquin Martinez, Dean of Academic and Student Affairs
Dr. Caridad Castro, Chair of Liberal Arts and Sciences
Dr. Jon Mcglone, Chair of World Language

Miami Dade College District Board of Trustees:
Helen Aguirre Ferré, Chair
Armando J. Bucelo Jr.
Benjamin León III
Marili Cancio
Jose K. Fuentes
Armando J. Olivera
Bernie Navarro
Eduardo J. Padrón, College President

Mission of Miami Dade College
The mission of the College is to provide accessible, affordable, high-quality education that keeps the learner's needs at the center of the decision-making process.



CONTENTS

ARTICLES (with AUTHOR(S))

Dynamic Stability Analysis of Tumor-Host Interactions
    Dr. Keysner Boet

A comparison between TRON and Levenberg-Marquardt methods and their relationship to Tikhonov's Regularization Method in Nonlinear Parameter Estimation
    Dr. Justina L. Castellanos and Dr. Angel Pérez

Survey of Students' Familiarity with Developmental Mathematics – A Statistical Analysis
    Dr. M. Shakil

Item Analysis Statistics and Their Uses: An Overview
    Dr. M. Shakil

Testing the Goodness of Fit of Continuous Probability Distributions to Some Flood Data
    Dr. M. Shakil

Comments about Polygon: http://www.mdc.edu/hialeah/Polygon2013/docs2013b/Comments_About_Polygon.pdf



Previous Editions       

Polygon, 2008: http://issuu.com/polygon5/docs/polygon2008
Polygon, 2009: http://issuu.com/polygon5/docs/polygon2009
Polygon, 2010: http://issuu.com/polygon5/docs/polygon_2010
Polygon, 2011: http://issuu.com/polygon5/docs/polygon_2011
Polygon, 2012: http://issuu.com/polygon5/docs/polygon_2012
Polygon, 2013: http://issuu.com/polygon5/docs/polygon2013
Polygon, 2014: http://issuu.com/polygon5/docs/polygon_2014

Disclaimer: The views and perspectives presented in the articles published in Polygon do not represent those of Miami Dade College.



A comparison between TRON and Levenberg-Marquardt methods and their relationship to Tikhonov's Regularization Method in Nonlinear Parameter Estimation

Justina L. Castellanos∗    Angel Pérez†

Abstract. Parameter estimation problems are usually solved by minimizing a nonlinear or linear least squares function (NLS or LLS). For the nonlinear case, the Levenberg-Marquardt method (L-M) has long been the method of choice. The connection of this method to Tikhonov's regularization method is described. We also show that a Trust Region Newton's method (TRON), which handles bound constraints for NLS problems, performs like the L-M method.

Key words: Inverse Problem, Newton method, Trust Region strategy, Tikhonov method

1 Introduction

The parameter estimation problem for nonlinear models is of great interest not only to mathematicians but to many specialists in other applied areas such as engineering and biology. It is usually posed as the solution of the following Nonlinear Least Squares (NLS) problem:

$$\min_{x} \; F(x) = \frac{1}{2}\sum_{i=1}^{m}\left(\phi(x;t_i) - y_i^{obs}\right)^2 = \frac{1}{2}\|f(x)\|^2 \quad \text{s.t.} \quad l \le x \le u, \qquad (1)$$

with $l, x, u \in \mathbb{R}^n$, $f(x) = (f_1(x), \ldots, f_m(x))^t$, and $f_i(x) = \phi(x;t_i) - y_i^{obs}$, $i = 1, \ldots, m$,

where $\phi(x;t)$ represents the desired model function, with $t$ an independent variable at which the data $\{y_i^{obs}\}$ are measured, which may be subject to experimental error. The independent variables $\{x_j\}$, $j = 1, \ldots, n$, can be interpreted as parameters of the problem that are to be manipulated in order to adjust the model to the data.

∗ Miami Dade College, e-mail: jcastel2@mdc.edu
† Woolton Inc., e-mail: angel.perez@woolton.com



If the model is to have any validity, we can expect that $\|f(x^*)\|$ (with $x^*$ being the solution of (1)) will be "small" and that $m$, the number of data points, will be much greater than $n$. The vector function $f$ is called the residual vector, and the vectors $u$ and $l$ are the upper and lower bounds on the unknown vector of parameters $x$, respectively.

Although problem (1) can be minimized by any general nonlinear optimization method, in most circumstances the properties of the function $F$ make it worthwhile to use methods designed specifically for the least squares problem. In particular, the gradient and Hessian matrix of $F$ have a special structure. If $J(x)$ is the $m \times n$ Jacobian matrix of $f(x)$, whose $i$-th row is $\nabla f_i(x)^t$, then $g(x) = \nabla F(x) = J(x)^t f(x)$, and the $n \times n$ Hessian matrix is

$$\nabla^2 F(x) = J(x)^t J(x) + \sum_{i=1}^{m} f_i(x)\,\nabla^2 f_i(x).$$

If $\|f(x^*)\|$ is sufficiently small, then the Hessian matrix can be approximated by $J(x)^t J(x)$. One such method is the well-known Levenberg-Marquardt method (L-M), which, on the other hand, can be viewed as the iterative solution of the approximation of the nonlinear problem (1) by a linear least-squares problem, as we will see ahead. The Linear Least-Squares (LLS) problem is

$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2}\|Ax - b\|^2, \qquad A \text{ an } m \times n \text{ matrix}, \; b \text{ an } m\text{-vector of data}, \qquad (2)$$

where the difficulty in finding a reasonable approximate solution comes from the usual ill-posedness of the problem, which is reflected in the ill-conditioning of the matrix $A$. This is caused by the quasi-linear dependency of its columns and may produce solutions quite far from the correct one because of small errors in the data. Tikhonov's regularization method addresses this problem by penalizing the LLS function in order to keep the solution vector from becoming too large:

$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2}\|Ax - b\|^2 + \lambda\|x\|^2. \qquad (3)$$

The difficulty is to select the scalar $\lambda$, which is problem dependent (see [7]). In NLS problems the approximation $\nabla^2 F(x_k) \approx J(x_k)^t J(x_k)$ is used at each iteration, and a linear least-squares problem like (2) is solved, with the Jacobian matrix $J(x_k)$ and residual vector $f(x_k)$ at the current iteration playing the roles of $A$ and $b$, respectively. The minimization process seeks the descent direction $s$ that solves

$$\min_{s \in \mathbb{R}^n} \; \frac{1}{2}\|J(x_k)s + f(x_k)\|^2. \qquad (4)$$

The ill-conditioning of the Jacobian matrix can cause the iterations of the NLS method to generate solutions of the associated LLS subproblem that are quite far from the exact one. The resulting solution of the NLS problem might then also be far from the expected one. Thus a Tikhonov regularization is useful to avoid this situation; a small numerical illustration follows.
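As a concrete illustration of (3), the following minimal NumPy sketch solves the regularized problem via its normal equations; the matrix A, the vector b, and the values of λ below are illustrative placeholders, not data from this paper.

```python
import numpy as np

def tikhonov_solve(A, b, lam):
    """Minimize 0.5*||A x - b||^2 + lam*||x||^2, cf. problem (3).

    Setting the gradient to zero gives the regularized normal equations
    (A^T A + 2*lam*I) x = A^T b.
    """
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + 2.0 * lam * np.eye(n), A.T @ b)

# Ill-conditioned example: the columns of A have widely different scales,
# so the unregularized solution is large and unstable; lam > 0 tames it.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5)) @ np.diag([1.0, 1e-1, 1e-2, 1e-3, 1e-4])
b = rng.standard_normal(20)
for lam in (0.0, 1e-8, 1e-3):
    x = tikhonov_solve(A, b, lam)
    print(f"lambda = {lam:g}, ||x|| = {np.linalg.norm(x):.3e}")
```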


In this paper, the relationship between the iteration of the Levenberg-Marquardt method and Tikhonov's regularization method is presented to explain the good performance of the former. Afterwards, the use of a Trust Region Newton's method (TRON) [5] to solve the NLS problem, taking the approximation $J(x_k)^t J(x_k)$ as the Hessian matrix at each iteration, is shown. This method can deal with bound constraints, and its relationship to the Levenberg-Marquardt iteration guarantees its good behavior on NLS problems. Our main goal is to point out these relationships among the methods and, in a forthcoming paper, to use these ideas in practical applications.

The paper is organized as follows: the connection of the linear iteration of the Levenberg-Marquardt method to Tikhonov's regularization method is made in Section 2. Section 3 describes the TRON method, pointing out its similarity to the L-M method when applied to NLS using only first-order information in the approximation of the Hessian matrix, and thus the corresponding connection of the former method to Tikhonov's regularization method. Section 4 presents the way the Levenberg-Marquardt and TRON methods solve the linear subproblem (4) and compute the regularization parameter. Section 5 is devoted to the conclusions.

In what follows we use the notation $f_k$ for $f(x_k)$, $g_k$ for $g(x_k)$, and $J_k$ for $J(x_k)$. The norm used here is the Euclidean norm, so we omit the subscript.

2 Relationship between the Levenberg-Marquardt and Tikhonov's Regularization methods

The Levenberg-Marquardt method (L-M) [2] has been used to solve the NLS problem with great success, since it takes advantage of the specific form of the function to be minimized; that is to say, it exploits the particular form of the gradient and the possibility of approximating the true Hessian matrix at each iteration, under the assumption of small residuals, by $J_k^t J_k$, as was noticed in the introduction to this work. The search direction is defined as the solution of the equations

$$\left(J_k^t J_k + \lambda_k I\right) s_k = -J_k^t f_k,$$

where $\lambda_k$ is a non-negative scalar. A unit step is always taken along $s_k$, giving $x_{k+1} = x_k + s_k$. It can be shown that, for some scalar $\Delta_k$ related to $\lambda_k$, the vector $s_k$ is the solution of the constrained subproblem

$$\min \; \frac{1}{2}\|J_k s + f_k\|^2 \quad \text{subject to} \quad \|s\| \le \Delta_k, \qquad (5)$$

which is equivalent to the unconstrained minimization of the Lagrangian function for problem (5):

$$\min_{s \in \mathbb{R}^n} \; \frac{1}{2}\|J_k s + f_k\|^2 + \lambda_k\left(\|s\|^2 - \Delta_k^2\right).$$

This is also equivalent to Tikhonov’s Regularization (3) for LLS problems except that, in that case, the bound ∆k on the size of the descent direction is not known and therefore, λk must be fixed. In the L-M iteration, ∆k is given in some way by the major iteration, and then λk can be chosen as explained in section 4. This relationship between the L-M method and Tikhonov’s regularization is the reason for the good behavior of the L-M on noisy problems, since, in a certain way, it prevents the size of the iteration vector from growing too much.
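As a schematic illustration, the sketch below carries out the L-M iteration just described in NumPy. The simple halving/doubling update of $\lambda_k$ is one common heuristic, standing in for the more careful strategies of [1, 2], and the model and data are made up for the example.

```python
import numpy as np

def lm_step(J, f, lam):
    """Solve (J^T J + lam*I) s = -J^T f for the L-M search direction."""
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + lam * np.eye(n), -J.T @ f)

def levenberg_marquardt(residual, jacobian, x, lam=1e-2, iters=100, tol=1e-10):
    """Minimal L-M loop; lam plays the role of the regularization parameter."""
    for _ in range(iters):
        f, J = residual(x), jacobian(x)
        if np.linalg.norm(J.T @ f) < tol:       # gradient J^T f nearly zero
            break
        s = lm_step(J, f, lam)
        if np.linalg.norm(residual(x + s)) < np.linalg.norm(f):
            x, lam = x + s, lam * 0.5           # accept the step, relax damping
        else:
            lam *= 2.0                          # reject the step, damp harder
    return x

# Made-up model: fit y = exp(x1*t) + x2 to noiseless synthetic data.
t = np.linspace(0.0, 1.0, 30)
y = np.exp(0.7 * t) + 0.3
residual = lambda x: np.exp(x[0] * t) + x[1] - y
jacobian = lambda x: np.column_stack([t * np.exp(x[0] * t), np.ones_like(t)])
print(levenberg_marquardt(residual, jacobian, np.array([0.0, 0.0])))  # ~ [0.7, 0.3]
```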

3 TRON: Trust Region Newton's method

TRON [5] is a routine that uses a trust region version of Newton's method for the general nonlinear minimization of bound-constrained problems, i.e.,

$$\min \; F(x) \quad \text{s.t.} \quad l \le x \le u,$$

where $F$ is a smooth nonlinear scalar function on $\mathbb{R}^n$, and $l$ and $u$ are the lower and upper bounds, respectively. The basic iteration of the method is $x_{k+1} = x_k + s_k$ with

$$s_k = \arg\min_{s} \left\{ m_k(s) = \frac{1}{2}\, s^t \nabla^2 F_k\, s + s^t g_k \right\} \quad \text{s.t.} \quad \|s\| \le \Delta_k.$$

When applied to the parameter estimation problem (1) with the true Hessian matrix approximated using first-order information ($\nabla^2 F_k \approx J_k^t J_k$), this method becomes a Levenberg-Marquardt method, since at each iteration the nonlinear least squares problem is approximated by the solution of the associated linear least-squares problem. The basic iteration now becomes $x_{k+1} = x_k + s_k$ with

$$s_k = \arg\min_{s} \left\{ m_k(s) = \frac{1}{2}\|J_k s + f_k\|^2 \right\} \quad \text{s.t.} \quad \|s\| \le \Delta_k. \qquad (6)$$

To solve the constrained LLS subproblem (6), the vector $s_k$ is obtained using a Linear Preconditioned Conjugate Gradient method (LPCG). This is the essential part of the TRON method in which we are interested in this work. For more information about the treatment of the bound constraints see [5]. As the TRON method solves the same subproblem (6) as the L-M method when applied to a nonlinear least-squares problem using the approximate Hessian matrix $J^t J$, the equivalence to Tikhonov's regularization method is also valid for this method.


4 Relationship between the solution of the Linear Least Squares subproblem by the L-M and TRON methods

The L-M algorithm is of the trust region type, and a "good" value of $\lambda_k$ (or $\Delta_k$) must be chosen in order to ensure descent. If $\lambda_k$ is zero, $s_k$ is the Gauss-Newton direction; as $\lambda_k \to \infty$, $\Delta_k \to 0$, $\|s_k\| \to 0$, and $s_k$ becomes parallel to the steepest-descent direction. The difficulty in this approach is an appropriate strategy for choosing $\Delta_k$, which must rely on heuristic considerations. Most standard strategies (see Dennis and Schnabel [1], Moré [2]) have originally been developed to "globalize" the convergence of the Gauss-Newton iteration for well-posed minimization problems, so the parameter $\lambda_k$ is chosen once the corresponding $\Delta_k$ has been fixed by some criterion on the agreement between the nonlinear model and the linear one. If the Gauss-Newton direction $s_k = -(J_k^t J_k)^{-1} J_k^t f_k$ satisfies the constraint on the norm and the matrix $J_k^t J_k$ is non-singular, then $\lambda_k$ is set to zero; if one of these conditions fails, an algorithm is used to compute the scalar $\lambda_k$ that is the root of the equation $\|s_k(\lambda)\| - \Delta_k = 0$, where $s_k(\lambda) = -(J_k^t J_k + \lambda I)^{-1} J_k^t f_k$.

The TRON method solves (5) using an LPCG algorithm that begins at $s_k^0 = 0$, stopping the iteration whenever

$$\|s_k^i\| > \Delta_k, \qquad (7)$$

$$\|J_k^t J_k\, s_k^i + g_k\| \le rtol\,\|g_k\|, \quad \text{or} \qquad (8)$$

$$p_i^t\, J_k^t J_k\, p_i = 0, \qquad (9)$$

where $p_i$ is the conjugate gradient direction, the subscript $i$ is the counter of the LPCG iterations, and $rtol$ is a tolerance less than one. These criteria solve the problem of the size of the descent direction $s_k$ and the possible singularity of the approximate Hessian matrix. If the $n$ iterations of the LPCG are completed, or (8) is satisfied with a sufficiently low value of $rtol$, or $\|g_k\|$ is near zero, then an approximate solution to the Gauss-Newton equations is obtained. If $rtol$ is not sufficiently small or $\|g_k\|$ is large, then the iteration is stopped very early and a direction close to the steepest descent is accepted. If (7) or (9) is satisfied, then the LPCG finds a scalar $\tau > 0$ such that $\|s_k\| = \|s_k^i + \tau p_i\| = \Delta_k$. As the LPCG iteration starts at $s_k^0 = 0$, the iteration vector moves from the right-hand side of the equations ($-g_k$) to their complete solution (the Gauss-Newton direction); then the vector $s_k$ is the same as in the L-M iteration, and criterion (7) controls the size of the vector. The regularizing effect (see [4]) of the LPCG iterations assures that the approximate solution to the linear subproblem will be a regularized solution, where the regularization parameter $\lambda_k$ is implicitly given by the number of the iteration at which the LPCG stops. A schematic sketch of such a truncated CG follows.
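The following unpreconditioned sketch shows such a truncated CG for subproblem (5), stopping by analogues of (7)-(9); TRON's actual LPCG [5] adds preconditioning and the treatment of bounds, both of which are omitted here, and the test data are made up.

```python
import numpy as np

def _to_boundary(s, p, delta):
    """Return tau*p with tau > 0 such that ||s + tau*p|| = delta."""
    a, b, c = p @ p, 2.0 * (s @ p), s @ s - delta**2
    tau = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return tau * p

def truncated_cg(J, f, delta, rtol=0.1):
    """Truncated CG for min 0.5*||J s + f||^2 subject to ||s|| <= delta.

    Stops by analogues of (7)-(9): the iterate leaves the trust region, the
    residual of the Gauss-Newton equations drops below rtol*||g||, or a
    direction of (numerically) zero curvature appears.
    """
    n = J.shape[1]
    g = J.T @ f                    # gradient of 0.5*||f||^2 at s = 0
    if not np.any(g):
        return np.zeros(n)
    s, r = np.zeros(n), -g         # r is the residual of (J^T J) s = -g
    p = r.copy()
    for _ in range(n):
        Bp = J.T @ (J @ p)
        curv = p @ Bp
        if curv <= 1e-14 * (p @ p):                  # criterion (9)
            return s + _to_boundary(s, p, delta)
        alpha = (r @ r) / curv
        s_next = s + alpha * p
        if np.linalg.norm(s_next) > delta:           # criterion (7)
            return s + _to_boundary(s, p, delta)
        r_next = r - alpha * Bp
        if np.linalg.norm(r_next) <= rtol * np.linalg.norm(g):  # criterion (8)
            return s_next
        beta = (r_next @ r_next) / (r @ r)
        s, r, p = s_next, r_next, r_next + beta * p
    return s

# Made-up data; the returned step never exceeds the trust region radius.
rng = np.random.default_rng(1)
J, f = rng.standard_normal((10, 4)), rng.standard_normal(10)
print(np.linalg.norm(truncated_cg(J, f, delta=0.5)))
```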



Thus, the TRON and L-M methods solve the same subproblem (5) by applying a Tikhonov regularization to the linearized problem. The difference between them is in the method for computing the scalar $\lambda_k$, which in both cases depends on the trust region radius. In TRON, $\lambda_k$ is given implicitly by the iteration number of the LPCG method, while in the L-M method an algorithm explicitly computes the root of the univariate equation $\|s_k(\lambda)\| - \Delta_k = 0$.

5 Conclusions

The Levenberg-Marquardt method is a standard method that gives very good results when applied to nonlinear least-squares problems. Its good behavior relies on the fact that at each iteration a Tikhonov regularization is applied to the associated linear least-squares subproblem, so that the iterates do not grow too far outside a "permissible" region, where the regularization parameter is chosen by an optimization criterion on the objective function. The use of the TRON method with the approximate Hessian $J^t J$ is an interesting alternative, since it deals with bounds on the variables and the iterations have the nice properties of the L-M method, as was shown in this paper. Also, the use of an LPCG in the inner iteration to compute the descent direction guarantees the regularization of the associated LLS subproblem in the presence of errors, because of the regularizing effect of this latter method. The use of the TRON method in nonlinear parameter estimation problems will be presented in a forthcoming paper.

References

[1] Dennis, J.E., and Schnabel, R.B., Numerical Methods for Unconstrained Optimization and Nonlinear Equations, SIAM, Philadelphia, 1996.

[2] Moré, J.J., The Levenberg-Marquardt algorithm: implementation and theory, in Numerical Analysis (G.A. Watson, ed.), Lecture Notes in Mathematics 630, Springer-Verlag, pp. 105-116, 1977.

[3] Hanke, M., Regularizing properties of a truncated Newton-CG algorithm for nonlinear inverse problems, Numer. Funct. Anal. Optim., 18, pp. 971-993, 1997.

[4] Hansen, P.C., Rank-Deficient and Discrete Ill-Posed Problems, SIAM, Philadelphia, 1998.

[5] Moré, J.J., and Lin, Chih-Jen, Newton's method for large-scale bound constrained optimization problems, SIAM Journal on Optimization, Vol. 9, No. 4, pp. 1100-1127, 1999.

[6] Nocedal, J., and Wright, S.J., Numerical Optimization, Springer, 1999.

[7] Tikhonov, A.N., Solution of incorrectly formulated problems and the regularization method, Soviet Math. Dokl., 4, pp. 1035-1038, 1963; English translation of Dokl. Akad. Nauk SSSR, 151, pp. 501-504, 1963.



SURVEY OF STUDENTS' FAMILIARITY WITH DEVELOPMENTAL MATHEMATICS – A STATISTICAL ANALYSIS

M. Shakil, Ph.D.
Professor of Mathematics
Department of Liberal Arts and Sciences
Miami Dade College, Hialeah Campus
FL 33012, USA
E-mail: mshakil@mdc.edu

ABSTRACT

In recent years, there has been great interest in developmental mathematics and college students' familiarity with it at all levels. In this paper, the students' familiarity with developmental mathematics is studied from a statistical point of view. A survey on developmental mathematics was administered in some math classes, and the data were analyzed statistically, showing some interesting results. It is hoped that the findings of the paper will be useful for researchers in various disciplines.

KEYWORDS

ANOVA, Developmental Mathematics, Hypothesis Testing, Shannon's Diversity Index.

1. INTRODUCTION

The importance of college students' familiarity with developmental mathematics in the present-day instruction of mathematics at various levels cannot be overlooked. It appears from the literature that not much work has been done on the problem of students' familiarity with developmental mathematics. Motivated by its importance, in this paper the students' familiarity with developmental mathematics is statistically investigated and analyzed. Interested readers are referred to Shakil et al. (2010) and the references therein, where the authors conducted a similar study to analyze students' familiarity with the grammar and mechanics of the English language from an exploratory point of view; please also see Shannon (1951) and Siromoney (1964), among others.

The organization of this paper is as follows. Section 2 discusses the methodology. The results are given in Section 3. The discussion and conclusion are provided in Section 4.

2. METHODOLOGY

A survey consisting of 20 multiple-choice questions on developmental mathematics (see Appendix I) was constructed to test students' familiarity with developmental mathematics. It was administered in six different math course sections in the spring semester of 2016: MAC 1147, MAC 2233 (two sections), and STA 2023 (three sections), which will be referred to as MAC2233-A, MAC2233-B, STA2023-A, STA2023-B, and STA2023-C. The survey was administered online in Blackboard by the instructor in each of these courses. A total of 126 students (out of 151 enrolled students) participated in the survey, the details of which are provided in Table 1 below.

Table 1: Surveyed Courses

Discipline   Courses                              Respondents
STA          STA2023 (Three Sections)             67
MAC          MAC1147, MAC2233 (Two Sections)      59
Total        6 sections                           126


3. RESULTS

3.1 MASTERY REPORT

The total number of questions in the survey was 20. Each question was assigned 1 point, so the possible points in the survey were 20, with scores expressed as percentages. There was no passing or failing score in the survey. However, it was expected that students at the above level of courses would achieve 100 % (that is, 20 out of 20 points) or at least 75 % (that is, 15 out of 20 points) on the developmental mathematics survey. Thus, the minimum passing score on the developmental mathematics survey was assumed to be 75 % for a satisfactory knowledge of developmental mathematics. Two students in the STA2023 courses who scored 5 (25 %) and 8 (40 %) out of 20 points, respectively, were discarded from the analysis. The mastery report of the 124 survey participants (excluding the above two students) is provided in Table 2 and Figure 1 below.

Table 2: Mastery Report

Total Number of Students Surveyed Reported: 124
Total Number of Survey Questions: 20
Points Assigned Per Question: 1
Minimum % of Passing Score: 75 %

Proportion of Students
% of Students Scoring 20 Points (100 %):       40.30 %
% of Students Scoring 15-19 Points (75-95 %):  59.70 %

Figure 1: Mastery Report

3.2 PERFORMANCE ANALYSIS

For the performance analysis of students in the developmental mathematics survey, the participants were divided into the following categories:

Category (A):
i. MAC Group: MAC1147, MAC2233 (Two Sections);
ii. STA Group: STA2023 (Three Sections).


Category (B):
(i) MAC1147;
(ii) MAC2233 (Two Sections);
(iii) STA2023 (Three Sections).

Category (C):
(i) MAC1147;
(ii) MAC2233 (Section-1);
(iii) MAC2233 (Section-2);
(iv) STA2023 (Section-1);
(v) STA2023 (Section-2);
(vi) STA2023 (Section-3).

The descriptive statistics of the performance of Categories (A) and (B) in the survey are provided in Tables 3 and 4 below, respectively. For the descriptive statistics of the performance of Category (C), please see Table 7 in Sub-Section 3.3 below.

Table 3: Descriptive Statistics of Category (A)

Group      Respondents  Mean   Median  St. Dev.  Coeff. of Var.  Min. Score  Max. Score  Q1  Q2  Q3
MAC Group  59           18.73  19      1.26      6.71 %          15          20          18  19  20
STA Group  65           18.85  19      1.30      6.91 %          15          20          18  19  20

Table 4: Descriptive Statistics of Category (B)

Group                     Respondents  Mean   Median  St. Dev.  Coeff. of Var.  Min. Score  Max. Score  Q1  Q2  Q3
MAC1147                   23           18.70  19      1.30      6.92 %          16          20          18  19  20
MAC2233 (Two Sections)    36           18.75  19      1.25      6.67 %          15          20          18  19  20
STA2023 (Three Sections)  65           18.85  19      1.30      6.91 %          15          20          18  19  20

3.3 HYPOTHESIS TESTING: INFERENCES ABOUT MEAN SCORES

This section discusses the hypothesis testing and draws inferences about the mean scores of different independent samples. The results of these tests of hypotheses are provided below.

(I) INFERENCES ABOUT MEAN SCORES OF CATEGORY (A): MAC AND STA PARTICIPANTS

Here we discuss the hypothesis testing and draw the inferences about the mean scores of two independent samples, the MAC and STA groups, defined as follows:

Category (A):
i. MAC Group: MAC1147, MAC2233 (Two Sections);
ii. STA Group: STA2023 (Three Sections).


For the descriptive statistics of the MAC and STA Groups, please see Table 3 above. Following the procedure on pages 474-475 of Triola (2010) for two independent samples with unequal variances (no pooling), the hypothesis test was conducted for these two independent groups using the statistical software package STATDISK. The results of the hypothesis test about the mean scores of the MAC and STA Groups are provided in Table 5 and Figure 2 below; a small computational check follows.

Table 5: Hypothesis Testing about Mean Scores of MAC and STA Groups

Assumption: Not Equal Variances (No Pooling); Alpha = 0.05
Let µ1 = Mean Score of MAC Group and µ2 = Mean Score of STA Group.
Claim (Null Hypothesis): µ1 = µ2
Alternative Hypothesis: µ1 not equal to µ2
Test Statistic, t: -0.5217
Critical t: ±1.979685
P-Value: 0.6028
Degrees of Freedom: 121.4640
95 % Confidence Interval: -0.5753641 < µ1 - µ2 < 0.3353641
Decision: Fail to Reject the Null Hypothesis. There is not enough evidence to warrant the rejection of the claim that µ1 = µ2.

Figure 2: Hypothesis Testing about Mean Scores of MAC and STA Groups
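The t statistic and P-value in Table 5 can be checked directly from the summary statistics of Table 3 using SciPy; `ttest_ind_from_stats` with `equal_var=False` performs the same unequal-variances (no-pool) test.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics from Table 3: MAC (n=59) vs. STA (n=65) groups.
t_stat, p_value = ttest_ind_from_stats(mean1=18.73, std1=1.26, nobs1=59,
                                       mean2=18.85, std2=1.30, nobs2=65,
                                       equal_var=False)  # Welch: no pooling
print(t_stat, p_value)  # approximately t = -0.52, p = 0.60, matching Table 5
```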

(II) ANALYSIS OF VARIANCE (ANOVA): INFERENCES ABOUT MEAN SCORES OF CATEGORY (B): MAC1147, MAC2233 (TWO SECTIONS), AND STA2023 (THREE SECTIONS) PARTICIPANTS

For the descriptive statistics of Category (B): MAC1147, MAC2233 (Two Sections), and STA2023 (Three Sections), please see Table 4 above. Following the procedure on pages 628-631 of Triola (2010), we discuss here the ANOVA for testing the hypothesis of the equality of the mean scores of three independent groups based on the courses, that is, MAC1147, MAC2233 (Two Sections), and STA2023 (Three Sections). The results of the ANOVA are provided in Table 6 and Figure 3 below.


Table 6: ANOVA: Hypothesis Testing about Equality of Mean Scores of MAC1147, MAC2233 (Two Sections), and STA2023 (Three Sections)

Alpha = 0.05
Claim (Null Hypothesis): equality of the mean scores of the three independent groups based on the courses
Alternative Hypothesis: the mean scores are not all equal

Source     DF   SS          MS        Test Stat, F  Critical F  P-Value
Treatment  2    0.467283    0.233642  0.141296      3.071137    0.868375
Error      121  200.081104  1.653563
Total      123  200.548387

Decision: Fail to Reject the Null Hypothesis. There is not enough evidence to warrant the rejection of the claim of the equality of the mean scores of the three independent groups based on the courses, that is, MAC1147, MAC2233 (Two Sections), and STA2023 (Three Sections).

Figure 3: ANOVA: Hypothesis Testing about Equality of Mean Scores
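For readers who wish to reproduce such an ANOVA from raw data, SciPy's `f_oneway` returns the F statistic and P-value; since the individual scores are not reproduced in this paper, the three score lists below are hypothetical placeholders.

```python
from scipy.stats import f_oneway

# Hypothetical per-student scores for the three groups (placeholders only).
mac1147 = [20, 19, 18, 19, 20, 17, 19]
mac2233 = [19, 20, 18, 19, 17, 20, 19]
sta2023 = [18, 19, 20, 19, 18, 20, 19]

f_stat, p_value = f_oneway(mac1147, mac2233, sta2023)
print(f_stat, p_value)  # fail to reject equality of means when p > alpha
```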

(III) ANALYSIS OF VARIANCE (ANOVA): INFERENCES ABOUT MEAN SCORES OF CATEGORY (C): MAC1147, MAC2233 (SECTION-1), MAC2233 (SECTION-2), STA2023 (SECTION-1), STA2023 (SECTION-2), AND STA2023 (SECTION-3) PARTICIPANTS

For the descriptive statistics of Category (C): MAC1147, MAC2233 (Section-1), MAC2233 (Section-2), STA2023 (Section-1), STA2023 (Section-2), and STA2023 (Section-3), please see Table 7 below.


Table 7: Descriptive Statistics of Category (C)

Group                Respondents  Mean   Median  St. Dev.  Coeff. of Var.  Min. Score  Max. Score  Q1  Q2    Q3
MAC1147              23           18.70  19      1.30      6.92 %          16          20          18  19    20
MAC2233 (Section-1)  18           18.44  18.5    1.20      6.50 %          15          20          18  18.5  19
MAC2233 (Section-2)  18           19.06  20      1.26      6.61 %          17          20          18  20    20
STA2023 (Section-1)  23           18.83  20      1.61      8.57 %          15          20          17  20    20
STA2023 (Section-2)  21           18.76  19      1.14      6.05 %          17          20          18  19    20
STA2023 (Section-3)  21           18.95  19      1.12      5.89 %          17          20          18  19    20

Following the procedure on pages 628-631 of Triola (2010), we discuss here the ANOVA for testing the hypothesis of the equality of the mean scores of six independent groups based on the course sections, that is, Category (C): MAC1147, MAC2233 (Section-1), MAC2233 (Section-2), STA2023 (Section-1), STA2023 (Section-2), and STA2023 (Section-3). The results of the ANOVA are provided in Table 8 and Figure 4 below.

Table 8: ANOVA: Hypothesis Testing about Equality of Mean Scores of the Category (C) Groups

Alpha = 0.05
Claim (Null Hypothesis): equality of the mean scores of the Category (C) groups
Alternative Hypothesis: the mean scores are not all equal

Source     DF   SS          MS        Test Stat, F  Critical F  P-Value
Treatment  4    1.704643    0.426161  0.25042       2.461696    0.9088
Error      101  171.880262  1.701785
Total      105  173.584906

Decision: Fail to Reject the Null Hypothesis. There is not enough evidence to warrant the rejection of the claim of the equality of the mean scores of the Category (C) groups: MAC1147, MAC2233 (Section-1), MAC2233 (Section-2), STA2023 (Section-1), STA2023 (Section-2), and STA2023 (Section-3).



Figure 4: ANOVA: Hypothesis Testing About Equality of Mean Scores

3.4 DIVERSITY ANALYSIS

This sub-section discusses the diversity analysis for testing the hypothesis of evenness of the respondents (two independent samples: MAC and STA groups) based on gender. All these analyses were carried out using the statistical software packages STATDISK and EXCEL.

(I) Respondent Performance Based on Gender (Two Independent Samples: MAC and STA Groups Based on Gender): The performance of the respondents based on gender is provided in Table 9 and Figure 5 below.

Table 9: Respondent Performance Based on Gender

Group - Gender      % of Students (out of 124)   % of Students (out of 124)
                    Scoring 20 Points (100 %)    Scoring 15-19 Points (75-95 %)
MAC Group - Male    5.65 %                       16.13 %
STA Group - Male    7.26 %                       8.87 %
MAC Group - Female  12.10 %                      13.71 %
STA Group - Female  15.32 %                      20.97 %


Figure 5: Respondent Performance Based on Gender

(II) Diversity Analysis: This sub-section discusses the diversity analysis for testing the hypothesis of evenness of respondent performance based on gender in the two independent samples: the MAC and STA groups. For the diversity analysis, we first compute the proportion (p) of the male and female student population scoring 20 and 15-19 points out of 20 points, respectively, in the MAC and STA groups, making eight different categories, which are provided in Table 10 below.

Table 10: Diversity Analysis Based on Gender

Group      Gender  Proportion (p) Scoring 20 Out of 20  Proportion (p) Scoring 15-19 Out of 20
MAC Group  Male    0.0565                               0.1613
STA Group  Male    0.0726                               0.0887
MAC Group  Female  0.1210                               0.1371
STA Group  Female  0.1532                               0.2097

Hypothesis: Does the respondent performance (that is, the proportion (p) of the male and female student population scoring 20 and 15-19 points out of 20 points, respectively, in the eight different categories given in Table 10 above) suggest diversity in the groups' familiarity with developmental mathematics?

The above hypothesis can be analyzed by applying Shannon's measure of diversity index (or entropy) (Shannon, 1948), which is a measure of the diversity of a population, as given below.

Shannon's Diversity Index: For a discrete random variable associated with $n$ (countable) possible outcomes $E_i$, where $P(E_i) = p_i$ and $P = (p_1, p_2, \ldots, p_n)$, Shannon's diversity index (or entropy), $H_n(P)$, or simply $H$, is defined by the following formula:

$$H = -\sum_{i=1}^{n} p_i \ln(p_i). \qquad (1)$$

It can be easily verified that Shannon's diversity index (or entropy) $H$ satisfies the following conditions:

(i) $H$ is maximum when $p_1 = p_2 = \cdots = p_n = \frac{1}{n}$.

(ii) $H$ is minimum when $p_i = 1$ and $p_j = 0$ for all $j \ne i$; that is, $H$ is minimum when one of the probabilities is unity and all others are zero.

(iii) From (i) and (ii), it follows that, for the discrete case, $0 \le H \le \ln(n)$.

Further, the largest value of Shannon's diversity index, $H_{max}$, is given by the following formula:

$$H_{max} = \ln(S), \qquad (2)$$

where $S$ denotes the number of categories in the population.

Evenness Ratio: The evenness ratio, $E_H$, is given by the following formula:

$$E_H = \frac{H}{H_{max}}, \qquad (3)$$

where $0 \le E_H \le 1$. Note that if $E_H = 1$, there is complete evenness.
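A minimal Python check of Equations (1)-(3) on the Table 10 proportions (computed in the next paragraph) is sketched below.

```python
import math

# The eight proportions from Table 10.
p = [0.0565, 0.1613, 0.0726, 0.0887, 0.1210, 0.1371, 0.1532, 0.2097]

H = -sum(pi * math.log(pi) for pi in p)  # Shannon diversity index, Eq. (1)
H_max = math.log(len(p))                 # ln(S) with S = 8 categories, Eq. (2)
E_H = H / H_max                          # evenness ratio, Eq. (3)
print(H, H_max, E_H)  # approximately 2.004878, 2.079442, 0.964142
```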

Now, using the values of the proportion (p) from Table 10 in Equations (1), (2), and (3), the values of Shannon's diversity index $H$, the largest value of Shannon's diversity index $H_{max}$, and the evenness ratio $E_H$ are computed as follows: $H = 2.004878$, $H_{max} = 2.079442$, and $E_H = 0.96414237$. Since $E_H = 0.96414237 \approx 1$, there appears to be nearly complete evenness in the respondent performance (that is, in the proportion (p) of the male and female student population scoring 20 and 15-19 points out of 20 points, respectively, in the eight different categories given in Table 10 above).

4. CONCLUSIONS

This paper discussed the students' familiarity with developmental mathematics from a statistical point of view. A survey consisting of 20 multiple-choice questions on developmental mathematics was constructed to test students' familiarity with developmental mathematics in six different course sections, that is, MAC 1147, MAC 2233 (two sections), and STA 2023 (three sections), during the spring semester of 2016, and was administered online in Blackboard by the instructor. A total of 126 students (out of 151 enrolled students) participated in the survey. Two students in the STA2023 courses who scored 5 (25 %) and 8 (40 %) out of 20 points, respectively, were discarded from the analysis. The mastery report of the 124 survey participants (excluding the above two students) is provided in Table 2 and Figure 1 in Sub-Section 3.1. The minimum passing score was assumed to be 75 %.


Out of the 124 survey participants considered in this research project, 40.30 % of the students scored 20 out of 20 points, whereas 59.70 % scored 15-19 out of 20 points. Based on the hypothesis testing, the following inferences were drawn about the survey participants:

• There was not sufficient evidence to warrant rejection of the claim that the mean scores of the MAC and STA group participants were the same.

• There was not sufficient evidence to warrant rejection of the claim of the equality of the mean scores of the three independent groups based on the courses, that is, MAC1147, MAC2233 (Two Sections), and STA2023 (Three Sections).

• There was not sufficient evidence to warrant rejection of the claim of the equality of the mean scores of the Category (C) groups: MAC1147, MAC2233 (Section-1), MAC2233 (Section-2), STA2023 (Section-1), STA2023 (Section-2), and STA2023 (Section-3).

• There appeared to be nearly complete evenness in the respondent performance (that is, in the proportion (p) of the male and female student population scoring 20 and 15-19 points out of 20 points, respectively, in the eight different categories given in Table 10).

It is hoped that the findings of the paper will be quite useful for researchers in various disciplines.

ACKNOWLEDGMENT

The author would like to express his sincere gratitude to his students in the courses MAC 1147, MAC 2233 (two sections), and STA 2023 (three sections) in the spring semester of 2016 for their cooperation in participating in the survey. Further, the author would like to thank the Editorial Committee of Polygon for accepting this paper for publication in Polygon. The author would also like to acknowledge his sincere indebtedness to the works of the various authors and resources on the subject consulted during the preparation of this research project. The author is thankful to his wife for her patience and perseverance during the period in which this paper was prepared. The author would like to dedicate this paper to his late parents, brothers, and sisters. Last but not least, the author is thankful to Miami Dade College for the opportunity to serve this college, without which it would have been impossible to conduct this research.

REFERENCES

[1] Shakil, M., Calderin, V., and Pierre-Philippe, L. (2010). Survey of Students' Familiarity with Grammar and Mechanics of English Language – An Exploratory Analysis. Polygon, Vol. 4, pp. 43-55.

[2] Shannon, C.E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27, pp. 379-423; 623-656.

[3] Shannon, C.E. (1951). Prediction and Entropy of Printed English. Bell System Technical Journal, 30, pp. 50-64.

[4] Siromoney, G. (1964). An Information-theoretical Test for Familiarity with a Foreign Language. Journal of Psychological Researches, viii, pp. 267-272.

[5] Triola, M. F. (2010). Elementary Statistics. Addison-Wesley, N.Y.


APPENDIX I

Spring 16 "Survey of Student's Familiarity with Developmental Mathematics"

Name:
GENDER:
Current GPA:
Major:
Course Name/Reference:
Term/Year:

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Simplify.
1) Write "forty-one thousand five hundred forty-three" in standard form.
A) 410,543  B) 415,043  C) 41,543  D) 401,543

Solve. Write the answer in simplest form.
2) Mary is saving 2/19 of her monthly income of $5358 for retirement. How much money is she setting aside each month for retirement?
A) $50,901  B) $282  C) $141  D) $564

Solve.
3) Subtract 9 from 54.
A) 46  B) 55  C) 44  D) 45

Round the decimal to the indicated place value.
4) 10.849, nearest tenth
A) 10.9  B) 10.8  C) 10.85  D) 10.7

Perform the indicated operations. Round the result to the nearest thousandth, if necessary.
5) 85.42 + 79.65 + 15.475
A) 180.645  B) 181.545  C) 180.555  D) 180.545

Simplify the expression.
6) -12 - (-7)
A) -5  B) 19  C) 5  D) -19

7) 5(-11)
A) -55  B) -60  C) -550  D) -155

Perform the indicated operations. Round the result to the nearest thousandth, if necessary.
8) A country reports total exports of $4,771 million last year. Write this number using standard notation.
A) $4,771,000  B) $4,771,000,000  C) $4,771  D) $4,771,000,000,000

Write the decimal as a percent.
9) 0.41
A) 410%  B) 0.041%  C) 4.1%  D) 41%


Solve.
10) In a survey of 100 people, 47 preferred ketchup on their hot dogs. What percent preferred ketchup?
A) 47%  B) 0.47%  C) 47/100 %  D) 4.7%

The following plane figure is called a triangle. The sum of the three angles of a triangle is always 180°. Find the measure of the missing angle in the figure.
11) [Triangle figure with angles 48° and 66° shown]
A) 58°  B) 66°  C) 76°  D) 48°

Fill in the blank with one of the words or phrases listed below: equivalent, <, >, least common denominator, least common multiple, mixed number, like.
12) The symbol ___ means "is greater than."
A) equivalent  B) <  C) >  D) like

Find the GCF for the list.
13) 36, 15
A) 6  B) 1  C) 15  D) 3

Simplify the radical. Indicate if the radical is not a real number. Assume that x represents a positive real number.
14) √625
A) -25  B) 312  C) 25  D) Not a real number

15) In the following figure, the sum of the angles ∠x and 30° is 90°; that is, the angles ∠x and 30° are complementary to each other. Find the measure of ∠x. [Figure with angles ∠x and 30° shown]
A) 55°  B) 115°  C) 150°  D) 60°

Insert <, >, or = to make the statement true.
16) -6 ___ -3
A) =  B) <  C) >


Evaluate the expression for the given replacement values.
17) x² + y² for x = 5 and y = -2
A) 29  B) 100  C) 14  D) 20

Simplify the expression.
18) 7x + 2 - 3x + 1
A) 4x + 1  B) 7x  C) 4x + 3  D) 10x + 3

Solve the equation. Don't forget to first simplify each side of the equation, if possible.
19) 6x - 5x + 4 = 4
A) 4  B) -4  C) 0  D) 8

The bar graph shows the number of students who flunk Dr. Jones' class each year. [Bar graph not reproduced]
20) During which year(s) did Dr. Jones have more than 10 students flunk his class?
A) 1998, 1999  B) 2002  C) 1998, 1999, 2000  D) 1998, 2002



Item Analysis Statistics and Their Uses: An Overview

M. Shakil, Ph.D.
Professor of Mathematics
Department of Liberal Arts and Sciences
Miami Dade College, Hialeah Campus
FL 33012, USA
E-mail: mshakil@mdc.edu

Abstract

In this paper, we present an overview of some item analysis statistics that are available in the ParSCORETM analysis report. The uses of item analysis statistics for some multiple-choice math examinations are investigated. It is hoped that the present study will be useful in recognizing the most critical pieces of test item data and in evaluating whether or not a test item needs revision. The methods discussed in this project can be used to describe the relevance of test item analysis to classroom tests.

Keywords: Item Analysis Statistics, Multiple-Choice Examinations, ParSCORETM Analysis.

1. Introduction

An item analysis involves many statistics that can provide useful information for determining the validity and improving the quality and accuracy of multiple-choice or true/false items. These statistics are used to measure the ability levels of examinees from their responses to each item. The ParSCORETM item analysis report, produced when a multiple-choice exam is machine scored, consists of three types of reports: a summary of test statistics, a test frequency table, and item statistics. The test statistics summary and frequency table describe the distribution of test scores. The item analysis statistics evaluate class-wide performance on each test item. The ParSCORETM report on item analysis statistics gives an overall view of the test results and evaluates each test item, which is also useful in comparing the item analyses of different test forms.

The organization of this paper is as follows. In Section 2, descriptions of some useful, common item analysis statistics, that is, item difficulty, item discrimination, distractor analysis, and reliability, are presented. For the sake of completeness, definitions of some test statistics as reported in the ParSCORETM analysis report are also provided in Section 2. Section 3 contains the uses of item analysis statistics for some multiple-choice math examinations. The concluding remarks are presented in Section 4.

2. Item Analysis Statistics

In what follows, we shall present some commonly used item analysis statistics available on the ParSCORETM report when a multiple-choice exam is machine scored. For details on these, the interested readers are referred to Wood (1960), Lord & Novick (1968), Henrysson (1971), Nunally (1978), Thompson and Levitov (1985), Crocker and Algina (1986), Ebel and Frisbie (1986), Suen (1990), Thorndike et al. (1991), DeVellis (1991), Millman and Greene (1993), Haladyna (1999), Tanner (2001), Haladyna et al. (2002), and Mertler (2003), among others.

(I) Item Difficulty: Item difficulty is a measure of the difficulty of an item. For items (that is, multiple-choice questions) with one correct alternative worth a single point, the item difficulty (also known as the item difficulty index, the difficulty level index, the difficulty factor, the item facility index, the item easiness index, or the $p$-value) is defined as the proportion of respondents (examinees) answering the item correctly, and is given by

$$p = \frac{c}{n},$$

where $p$ = the difficulty factor, $c$ = the number of respondents selecting the correct answer to an item, and $n$ = the total number of respondents. Item difficulty is relevant for determining whether students have learned the concept being tested. It also plays an important role in the ability of an item to discriminate between students who know the tested material and those who do not. Note that:

(i) $0 \le p \le 1$.

(ii) A higher value of $p$ indicates a low difficulty level, that is, the item is easy; a lower value of $p$ indicates a high difficulty level, that is, the item is difficult. In general, an ideal test should have an overall item difficulty of around 0.5; however, it is acceptable for individual items to have higher or lower facility (ranging from 0.2 to 0.8). In a criterion-referenced test (CRT), with emphasis on mastery-testing of the topics covered, the optimal value of $p$ for many items is expected to be 0.90 or above. On the other hand, in a norm-referenced test (NRT), with emphasis on discriminating between different levels of achievement, it is given by $p = 0.50$. For details on these, see, for example, Chase (1999), among others.

(iii) To maximize item discrimination, the ideal (or moderate, or desirable) item difficulty level, denoted by $p_M$, is defined as the point midway between the probability of success, denoted by $p_S$, of answering the multiple-choice item correctly (that is, 1.00 divided by the number of choices) and a perfect score (that is, 1.00) for the item, and is given by

$$p_M = p_S + \frac{1 - p_S}{2}.$$

(iv) Thus, using the above formula in (iii), ideal (or moderate, or desirable) item difficulty levels for multiple-choice items can be easily calculated; they are provided in the following table (for details, see, for example, Lord, 1952, among others).

Number of Alternatives  Probability of Success (p_S)  Ideal Item Difficulty Level (p_M)
2                       0.50                          0.75
3                       0.33                          0.67
4                       0.25                          0.63
5                       0.20                          0.60
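The table values follow directly from the formula in note (iii); the short Python check below reproduces them (the table rounds to two decimals).

```python
# Ideal difficulty p_M = p_S + (1 - p_S)/2 with p_S = 1/(number of choices).
for k in (2, 3, 4, 5):
    p_s = 1.0 / k
    print(k, round(p_s, 2), p_s + (1.0 - p_s) / 2.0)
```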

(Ia) Mean Item Difficulty (or Mean Item Easiness): Mean item difficulty is the average difficulty (easiness) of all test items. It is an overall measure of the test difficulty and ideally ranges between 60 % and 80 % (that is, $0.60 \le p \le 0.80$) for classroom achievement tests. Lower numbers indicate a difficult test, while higher numbers indicate an easy test.

(II) Item Discrimination: The item discrimination (or the item discrimination index) is a basic measure of the validity of an item. It is defined as the discriminating power, or the degree of an item's ability to discriminate (or differentiate) between high achievers (that is, those who scored high on the total test) and low achievers (that is, those who scored low), both determined on the same criterion: (1) an internal criterion, for example, the test itself; or (2) an external criterion, for example, an intelligence test or other achievement test. Further, the computation of the item discrimination index assumes that the distribution of test scores is normal and that there is a normal distribution underlying the right-or-wrong dichotomy of a student's performance on an item. For details on the item discrimination index, see, for example, Kelly (1939), Wood (1960), Henrysson (1971), Nunally (1972), Ebel (1979), Popham (1981), Ebel & Frisbie (1986), Weirsma & Jurs (1990), Glass & Hopkins (1995), Brown (1996), Chase (1999), Haladyna (1999), Nitko (2001), Tanner (2001), Oosterhof (2001), Haladyna et al. (2002), and Mertler (2003), among others. There are several ways to compute the item discrimination, but, as shown on the ParSCORETM item analysis report and also as reported in the literature, the following formulas are the most commonly used indicators of an item's discrimination effectiveness.

(a) Item Discrimination Index (or Item Discriminating Power, or $D$-Statistic), $D$: Let the students' test scores be rank-ordered from lowest to highest, and let

$$p_U = \frac{\text{number of students in the upper } 25\%\text{-}30\% \text{ group answering the item correctly}}{\text{total number of students in the upper } 25\%\text{-}30\% \text{ group}}$$

and

$$p_L = \frac{\text{number of students in the lower } 25\%\text{-}30\% \text{ group answering the item correctly}}{\text{total number of students in the lower } 25\%\text{-}30\% \text{ group}}.$$

The ParSCORETM item analysis report considers the upper 27 % and the lower 27 % as the analysis groups. The item discrimination index, $D$, is given by

$$D = p_U - p_L.$$

Note that:

(i) $-1 \le D \le 1$.

(ii) Items with positive values of $D$ are known as positively discriminating items, and those with negative values of $D$ are known as negatively discriminating items.

(iii) If $D = 0$, that is, $p_U = p_L$, there is no discrimination between the upper and lower groups.

(iv) If $D = +1.00$, that is, $p_U = 1.00$ and $p_L = 0$, there is perfect discrimination between the two groups.

(v) If $D = -1.00$, that is, $p_U = 0$ and $p_L = 1.00$, all members of the lower group answered the item correctly and all members of the upper group answered the item incorrectly. This indicates the invalidity of the item, that is, the item has been miskeyed and needs to be rewritten or eliminated.

(vi) A guideline for the value of an item discrimination index is provided in the following table; see, for example, Chase (1999) and Mertler (2003), among others. (A computational sketch follows the table.)

Item Discrimination Index, D   Quality of an Item
D ≥ 0.50                       Very Good Item; Definitely Retain
0.40 ≤ D ≤ 0.49                Good Item; Very Usable
0.30 ≤ D ≤ 0.39                Fair Quality; Usable Item
0.20 ≤ D ≤ 0.29                Potentially Poor Item; Consider Revising
D < 0.20                       Potentially Very Poor; Possibly Revise Substantially, or Discard
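As a computational illustration of $p$ and $D$, the sketch below derives both from a 0/1 item-response vector and the total test scores; the data and the helper function name are made up for this example, and the 27 % grouping mirrors the ParSCORETM convention.

```python
import numpy as np

def difficulty_and_discrimination(totals, item, frac=0.27):
    """Item difficulty p = c/n and discrimination D = pU - pL.

    totals: total test scores; item: 0/1 correctness on one item;
    frac: fraction used for the upper/lower groups (27% as in ParSCORE).
    """
    totals, item = np.asarray(totals), np.asarray(item)
    p = item.mean()                       # proportion answering correctly
    k = max(1, int(round(frac * len(totals))))
    order = np.argsort(totals)            # rank-order students by total score
    p_lower = item[order[:k]].mean()      # lower group proportion correct
    p_upper = item[order[-k:]].mean()     # upper group proportion correct
    return p, p_upper - p_lower

# Made-up data: 10 examinees' total scores and one item's 0/1 results.
totals = [12, 15, 9, 18, 20, 11, 17, 14, 19, 10]
item1 = [0, 1, 0, 1, 1, 0, 1, 1, 1, 0]
print(difficulty_and_discrimination(totals, item1))  # (0.6, 1.0)
```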

(b) Mean Item Discrimination Index, $\bar{D}$: This is the average discrimination index for all test items combined. A large positive value (above 0.30) indicates good discrimination between the upper and lower scoring students. Tests that do not discriminate well are generally not very reliable and should be reviewed.

(c) Point-Biserial Correlation (or Item-Total Correlation, or Item Discrimination) Coefficient, $r_{pbis}$: The point-biserial correlation coefficient is another item discrimination index for assessing the usefulness (or validity) of an item as a measure of individual differences in knowledge, skill, ability, attitude, or personality characteristic. It is defined as the correlation between the student performance on an item (correct or incorrect) and the overall test score, and is given by either of the following two equations (which are mathematically equivalent).

(i) Suen (1990); DeVellis (1991); Haladyna (1999):

$$r_{pbis} = \left(\frac{\bar{X}_C - \bar{X}_T}{s}\right)\sqrt{\frac{p}{q}},$$

where $r_{pbis}$ = the point-biserial correlation coefficient; $\bar{X}_C$ = the mean total score for examinees who have answered the item correctly; $\bar{X}_T$ = the mean total score for all examinees; $p$ = the difficulty value of the item; $q = 1 - p$; and $s$ = the standard deviation of total exam scores.

(ii) Brown (1996):

$$r_{pbis} = \left(\frac{m_p - m_q}{s}\right)\sqrt{pq},$$

where $r_{pbis}$ = the point-biserial correlation coefficient; $m_p$ = the mean total score for examinees who have answered the item correctly; $m_q$ = the mean total score for examinees who have answered the item incorrectly; $p$ = the difficulty value of the item; $q = 1 - p$; and $s$ = the standard deviation of total exam scores.

Note that:

(i) The interpretation of the point-biserial correlation coefficient, $r_{pbis}$, is the same as that of the $D$-statistic.

(ii) It assumes that the distribution of test scores is normal and that there is a normal distribution underlying the right-or-wrong dichotomy of a student's performance on an item.

(iii) It is mathematically equivalent to the Pearson (product moment) correlation coefficient, which can be shown by assigning two distinct numerical values to the dichotomous variable (test item), that is, incorrect = 0 and correct = 1.

(iv) $-1 \le r_{pbis} \le 1$.

(v) $r_{pbis} \approx 0$ means little correlation between the score on the item and the score on the test.

(vi) A high positive value of $r_{pbis}$ indicates that the examinees who answered the item correctly also received higher scores on the test than those examinees who answered the item incorrectly.

(vii) A negative value indicates that the examinees who answered the item correctly received low scores on the test, and those examinees who answered the item incorrectly did better on the test. It is advisable that an item with $r_{pbis} = 0$ or with a large negative value of $r_{pbis}$ be eliminated or revised. Also, an item with a low positive value of $r_{pbis}$ should be revised for improvement.

(viii) Generally, the value of $r_{pbis}$ for an item may be put into two categories, as provided in the following table.

Point-Biserial Correlation Coefficient, r_pbis   Quality
r_pbis ≥ 0.30                                    Acceptable Range
r_pbis ≈ 1                                       Ideal Value

(ix) The statistical significance of the point-biserial correlation coefficient, $r_{pbis}$, may be determined by applying the Student's $t$ test; see, for example, Triola (2007), among others.
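A small NumPy sketch of the Brown (1996) form of $r_{pbis}$ follows; the scores are made up, and the final line records the equivalence to the Pearson correlation noted in (iii).

```python
import numpy as np

def point_biserial(item, totals):
    """r_pbis = ((m_p - m_q)/s) * sqrt(p*q), the Brown (1996) form."""
    item, totals = np.asarray(item), np.asarray(totals)
    p = item.mean()                    # item difficulty value
    q = 1.0 - p
    m_p = totals[item == 1].mean()     # mean total score, answered correctly
    m_q = totals[item == 0].mean()     # mean total score, answered incorrectly
    s = totals.std()                   # standard deviation of total scores
    return (m_p - m_q) / s * np.sqrt(p * q)

totals = np.array([12, 15, 9, 18, 20, 11, 17, 14, 19, 10])
item1 = np.array([0, 1, 0, 1, 1, 0, 1, 1, 1, 0])
print(point_biserial(item1, totals))      # about 0.878
print(np.corrcoef(item1, totals)[0, 1])   # identical: Pearson r, cf. note (iii)
```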

Remark: It should be noted that the use of the point-biserial correlation coefficient, $r_{pbis}$, is more advantageous than that of the item discrimination index statistic, $D$, because every student taking the test is taken into consideration in the computation of $r_{pbis}$, whereas only 54 % of the test-takers (that is, the upper 27 % + the lower 27 %) are used to compute $D$.

(d) Mean Item-Total Correlation Coefficient, $\bar{r}_{pbis}$: This is defined as the average correlation of all the test items with the total score. It is a measure of overall test discrimination. A large positive value indicates good discrimination between students.

(III) Internal Consistency Reliability Coefficient (Kuder-Richardson 20, $KR_{20}$, Reliability Estimate): The statistic that measures the test reliability of inter-item consistency, that is, how well the test items are correlated with one another, is called the internal consistency reliability coefficient of the test. For a test having multiple-choice items that are scored correct or incorrect, and that is administered only once, the Kuder-Richardson formula 20 (also known as KR-20) is used to measure the internal consistency reliability of the test scores; see, for example, Nunally (1972) and Haladyna (1999), among others. The KR-20 is also reported in the ParSCORETM item analysis. It is given by the following formula:

$$KR_{20} = \left(\frac{n}{n-1}\right)\frac{s^2 - \sum_{i=1}^{n} p_i q_i}{s^2},$$

where $KR_{20}$ = the reliability index for the total test; $n$ = the number of items in the test; $s^2$ = the variance of test scores; $p_i$ = the difficulty value of item $i$; and $q_i = 1 - p_i$. Note that:

(i) $0.0 \le KR_{20} \le 1.0$.

(ii) $KR_{20} \approx 0$ indicates a weaker relationship between test items, that is, the overall test score is less reliable. A large value of $KR_{20}$ indicates high reliability.

(iii) Generally, the value of $KR_{20}$ for a test may be put into the following categories, as provided in the table below.

KR_20                 Quality
KR_20 ≥ 0.60          Acceptable Range
KR_20 ≥ 0.75          Desirable
0.80 ≤ KR_20 ≤ 0.85   Better
KR_20 ≈ 1             Ideal Value

Remarks: The reliability of a test can be improved as follows:

a) By increasing the number of items in the test, for which the following Spearman-Brown prophecy formula is used (Mertler, 2003):

$$r_{est} = \frac{n\,r}{1 + (n-1)\,r},$$

where $r_{est}$ = the estimated new reliability coefficient; $r$ = the original $KR_{20}$ reliability coefficient; and $n$ = the number of times the test is lengthened.

b) Or, by using items that have high discrimination values in the test.

c) Or, by performing an item-total statistics analysis as described above.

(IV) Standard Error of Measurement ($SE_m$): This is another important component of test item analysis to measure the internal consistency reliability of a test; see, for example, Nunally (1972) and Mertler (2003), among others. It is given by the following formula:

$$SE_m = s\sqrt{1 - KR_{20}}, \qquad 0.0 \le KR_{20} \le 1.0,$$

where $SE_m$ = the standard error of measurement; $s$ = the standard deviation of test scores; and $KR_{20}$ = the reliability coefficient for the total test. Note that:

(i) $SE_m = 0$ when $KR_{20} = 1$.

(ii) $SE_m = s$ when $KR_{20} = 0$.

(iii) A small value of $SE_m$ (e.g., $\le 3$) indicates high reliability, whereas a large value of $SE_m$ indicates low reliability.

(iv) Remark: A higher reliability coefficient (i.e., $KR_{20} \approx 1$) and a smaller standard deviation for a test indicate a smaller standard error of measurement. This is considered the more desirable situation for classroom tests.
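The sketch below computes KR-20 and $SE_m$ from a small, made-up 0/1 response matrix using the two formulas above; note that the choice between the sample ($n-1$) and population variance for $s^2$ is a convention, and the sample variance is used here.

```python
import numpy as np

def kr20_and_sem(responses):
    """KR-20 reliability and standard error of measurement SE_m = s*sqrt(1-KR20).

    responses: examinees-by-items matrix of 0/1 scores.
    """
    X = np.asarray(responses, dtype=float)
    n_items = X.shape[1]
    totals = X.sum(axis=1)                 # each examinee's total score
    s2 = totals.var(ddof=1)                # variance of the total scores
    p = X.mean(axis=0)                     # per-item difficulty values p_i
    kr20 = (n_items / (n_items - 1)) * (s2 - np.sum(p * (1 - p))) / s2
    sem = np.sqrt(s2 * (1.0 - kr20))
    return kr20, sem

# Made-up 6-examinee, 5-item response matrix.
R = [[1, 1, 1, 0, 1],
     [1, 0, 1, 0, 0],
     [1, 1, 1, 1, 1],
     [0, 0, 1, 0, 0],
     [1, 1, 0, 1, 1],
     [0, 0, 0, 0, 0]]
print(kr20_and_sem(R))  # roughly (0.87, 0.70) for this toy data
```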

(V) Test Item Distractor Analysis: This is an important and useful component of test item analysis. A test item distractor is defined as an incorrect response option in a multiple-choice test item. According to the research, there is a relationship between the quality of the distractors in a test item and the student performance on the test item, which also affects the student's total test score. The performance of these incorrect item response options can be determined through the test item distractor analysis frequency table, which contains the frequency, or number of students, that selected each incorrect option. The test item distractor analysis is also provided in the ParSCORETM item analysis report. For details on test item distractor analysis, see, for example, Thompson & Levitov (1985), DeVellis (1991), Millman & Greene (1993), Haladyna (1999), and Mertler (2003), among others. A general guideline for the item distractor analysis is provided in the following table:

Item Response Options  Item Difficulty p          Item Discrimination Index D or r_pbis
Correct Response       0.35 ≤ p ≤ 0.85 (better)   D ≥ 0.30 or r_pbis ≥ 0.30 (better)
Distractors            p ≥ 0.02 (better)          D < 0 or r_pbis < 0 (better)

(VI) Mean: The mean is a measure of central tendency and gives the average test score of a sample of respondents (examinees). It is given by

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n},$$

where $x_i$ = individual test score and $n$ = number of respondents.

(VII) Median: If all scores are ranked from lowest to highest, the median is the middle score. Half of the scores will be lower than the median. The median is also known as the 50th percentile or the 2nd quartile.

(VIII) Range of Scores: The range is defined as the difference between the highest and lowest test scores. It is a basic measure of variability.

(IX) Standard Deviation: For a sample of $n$ examinees, the standard deviation $s$ of the test scores is given by

$$s = \sqrt{\frac{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}{n-1}},$$

where $x_i$ = individual test score and $\bar{x}$ = average test score. The standard deviation is a measure of variability, or the spread of the score distribution; it measures how far the scores deviate from the mean. If the scores are grouped closely together, the test will have a small standard deviation. A test with a large standard deviation is considered better at discriminating among student performance levels.

(X) Variance: For a sample of $n$ examinees, the variance $s^2$ of the test scores is defined as the square of the standard deviation:

$$s^2 = \frac{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}{n-1}.$$

(XI) Skewness: For a sample of $n$ examinees, the skewness $\alpha_3$ of the distribution of the test scores is given by

$$\alpha_3 = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^3,$$

where $x_i$ = individual test score, $\bar{x}$ = average test score, and $s$ = standard deviation of the test scores. It measures the lack of symmetry of the distribution. The skewness is 0 for a symmetric distribution, and is negative or positive depending on whether the distribution is negatively skewed (has a longer left tail) or positively skewed (has a longer right tail).

(XII) Kurtosis: For a sample of $n$ examinees, the kurtosis $\alpha_4$ of the distribution of the test scores is given by

$$\alpha_4 = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)},$$

where $x_i$ = individual test score, $\bar{x}$ = average test score, and $s$ = standard deviation of the test scores. It measures the tail-heaviness of the distribution (the amount of probability in the tails). As written, this is the adjusted (excess) kurtosis, which equals 0 for the normal distribution; equivalently, the unadjusted kurtosis of the normal distribution is 3. Thus a distribution is heavier-tailed or lighter-tailed than the normal distribution depending on whether its kurtosis lies above or below this reference value.
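The estimators in (VI)-(XII) can be checked directly. The sketch below is our own code (standard library only); it implements the sample mean, standard deviation, variance, and the adjusted skewness and excess-kurtosis formulas exactly as written above, which is also how Minitab computes them. The example reuses the 20071-Pre scores listed in Table 8 below, for which Table 9 reports mean 65.76, StDev 11.66, skewness -0.51, and kurtosis -0.80.

```python
import math

def describe(scores):
    n = len(scores)
    mean = sum(scores) / n
    var = sum((x - mean) ** 2 for x in scores) / (n - 1)   # sample variance
    s = math.sqrt(var)                                     # sample std. deviation
    z = [(x - mean) / s for x in scores]
    skew = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    kurt = (n * (n + 1)) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z) \
           - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return mean, s, var, skew, kurt

pre = [69.4, 63.2, 54.8, 78.0, 75.6, 66.8, 51.8,
       44.6, 72.6, 68.4, 67.2, 76.6, 82.6, 49.0]
print(describe(pre))  # compare with the 20071-Pre row of Table 9
```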


3. Use of Item Analysis Statistics

This section provides some uses of item analysis statistics for some multiple-choice math examinations (which we call MAT0000-Version A and MAT0000-Version B). It consists of three parts, which are described below.

3.1. Test Item Analysis of MAT0000-Version A and MAT0000-Version B Exams

An item analysis of the data obtained from the author's MAT0000-Version A and MAT0000-Version B exam items is presented here, based upon classical test theory (CTT). Various test item statistics and relevant statistical graphs (for both test forms, Versions A and B), using the ParSCORE™ item analysis report and the Minitab software, are computed and summarized in Tables 1-5 below. Each version consisted of 30 items, and there were two different groups of 7 students for each version. It appears from these statistical analyses that the large value of KR20 = 0.90 (close to 1) for Version B indicates its high reliability in comparison to Version A. This is also substantiated, for Version B, by the large positive values of Mean DI = 0.450 (above 0.3) and Mean Pt. Bisr. = 0.4223, the small value of the standard error of measurement (SEM = 1.82), and an ideal value of the mean (19.57, above the passing score of 18). These analyses are also evident from the bar charts and scatter plots drawn for the various test item statistics using Minitab, that is, item difficulty (p), item discrimination index (D), and point-biserial correlation coefficient (r_pbis), which are presented below in Figures 1 and 2.

Table 1: A Comparison of MAT0000-Version A and MAT0000-Version B Exam Test Items

Exam Version | Reliability (KR-20) | Mean  | SD   | SEM  | p < 0.3 | 0.3 ≤ p ≤ 0.7 | p > 0.7 | D ≥ 0.2
A            | 0.53                | 17.14 | 2.80 | 1.92 | 8       | 10            | 12      | 14
B            | 0.90                | 19.57 | 5.75 | 1.82 | 1       | 15            | 14      | 20

Exam Version | Mean DI | Mean Pt. Bisr.
A            | 0.233   | 0.2060
B            | 0.450   | 0.4223

(The entries under p < 0.3, 0.3 ≤ p ≤ 0.7, p > 0.7, and D ≥ 0.2 are numbers of items out of 30.)


Table 2 (MAT0000-Version A - Data Display)

Row | PU  | PL  | Disc. Ind. (D) | Difficulty (p) | Difficulty (p) % | Pt-Bis (r)
1   | 1.0 | 0.0 |  1.0 | 0.4286 |  42.86 |  0.78
2   | 1.0 | 1.0 |  0.0 | 0.8571 |  85.71 |  0.02
3   | 1.0 | 0.5 |  0.5 | 0.8571 |  85.71 |  0.46
4   | 1.0 | 0.0 |  1.0 | 0.5714 |  57.14 |  0.66
5   | 1.0 | 0.0 |  1.0 | 0.5714 |  57.14 |  0.77
6   | 1.0 | 0.0 |  1.0 | 0.7143 |  71.43 |  0.82
7   | 0.5 | 0.0 |  0.5 | 0.5714 |  57.14 |  0.56
8   | 1.0 | 1.0 |  0.0 | 1.0000 | 100.00 |  0.00
9   | 0.0 | 0.5 | -0.5 | 0.1429 |  14.29 | -0.46
10  | 0.5 | 0.5 |  0.0 | 0.4286 |  42.86 |  0.27
11  | 0.5 | 0.5 |  0.0 | 0.4286 |  42.86 | -0.15
12  | 1.0 | 1.0 |  0.0 | 1.0000 | 100.00 |  0.00
13  | 1.0 | 1.0 |  0.0 | 1.0000 | 100.00 |  0.00
14  | 0.0 | 0.0 |  0.0 | 0.0000 |   0.00 |  0.00
15  | 1.0 | 0.5 |  0.5 | 0.5714 |  57.14 |  0.25
16  | 1.0 | 0.5 |  0.5 | 0.7143 |  71.43 |  0.37
17  | 1.0 | 0.5 |  0.5 | 0.8571 |  85.71 |  0.60
18  | 1.0 | 1.0 |  0.0 | 1.0000 | 100.00 |  0.00
19  | 1.0 | 1.0 |  0.0 | 1.0000 | 100.00 |  0.00
20  | 1.0 | 0.5 |  0.5 | 0.8571 |  85.71 |  0.46
21  | 1.0 | 0.5 |  0.5 | 0.8571 |  85.71 |  0.46
22  | 0.5 | 0.5 |  0.0 | 0.5714 |  57.14 | -0.16
23  | 0.0 | 0.5 | -0.5 | 0.1429 |  14.29 | -0.46
24  | 0.5 | 1.0 | -0.5 | 0.5714 |  57.14 | -0.27
25  | 0.0 | 0.0 |  0.0 | 0.2857 |  28.57 |  0.08
26  | 0.0 | 0.0 |  0.0 | 0.1429 |  14.29 | -0.02
27  | 1.0 | 0.5 |  0.5 | 0.4286 |  42.86 |  0.37
28  | 0.5 | 0.0 |  0.5 | 0.1429 |  14.29 |  0.71
29  | 0.5 | 0.0 |  0.5 | 0.2857 |  28.57 |  0.53
30  | 0.0 | 0.5 | -0.5 | 0.1429 |  14.29 | -0.46

Table 3 Descriptive Statistics: MAT0000-Version A

Variable         | Mean   | SE Mean | StDev  | Variance | Minimum | Q1      | Median | Q3     | Maximum
Disc. Ind. (D)   | 0.2333 | 0.0821  | 0.4498 | 0.2023   | -0.5000 |  0.0000 | 0.0000 | 0.5000 |   1.0000
Difficulty (p)   | 0.5714 | 0.0573  | 0.3139 | 0.0985   |  0.0000 |  0.2857 | 0.5714 | 0.8571 |   1.0000
Difficulty (p) % | 57.14  | 5.73    | 31.39  | 985.11   |  0.00   | 28.57   | 57.14  | 85.71  | 100.00
Pt-Bis (r)       | 0.2063 | 0.0703  | 0.3850 | 0.1482   | -0.4600 | -0.0050 | 0.1650 | 0.5375 |   0.8200
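The per-item columns in Tables 2 and 4 can be computed from the raw 0/1 item scores and total scores. The sketch below is our own reconstruction, not the ParSCORE algorithm itself: with only 7 examinees per form, the upper and lower groups are taken here as the top and bottom two scorers (roughly the usual 27% rule), which reproduces PU and PL values in steps of 0.5 as seen in the tables.

```python
import math

def item_statistics(item, totals, group_size=2):
    # item: 0/1 scores on one item; totals: total test scores (same order).
    n = len(item)
    p = sum(item) / n                      # item difficulty (assumes 0 < p < 1)

    order = sorted(range(n), key=lambda i: totals[i])
    pl = sum(item[i] for i in order[:group_size]) / group_size    # lower group
    pu = sum(item[i] for i in order[-group_size:]) / group_size   # upper group
    d = pu - pl                            # discrimination index D = PU - PL

    # Point-biserial correlation of the item with the total score:
    # r_pbis = (M_1 - M_total) / s_total * sqrt(p / (1 - p)),
    # using the population (n-denominator) standard deviation of the totals.
    mean_t = sum(totals) / n
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in totals) / n)
    mean_1 = sum(t for c, t in zip(item, totals) if c == 1) / sum(item)
    r_pbis = (mean_1 - mean_t) / sd_t * math.sqrt(p / (1 - p))
    return p, pu, pl, d, r_pbis
```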


[Figure 1 here: Minitab bar charts of Difficulty (p) %, Disc. Ind. (D), and Pt-Bis (r), together with scatter plots and charts of Disc. Ind. (D) vs Difficulty (p) % and Pt-Bis (r) vs Difficulty (p) %, for Version A.]

Figure 1 (Bar Charts and Scatter Plots for p, D, and r_pbis, Version A)


Table 4 (MAT0000-Version B - Data Display)

Row | PU  | PL  | Disc. Ind. (D) | Difficulty (p) | Difficulty (p) % | Pt-Bis (r)
1   | 1.0 | 1.0 |  0.0 | 1.0000 | 100.00 |  0.00
2   | 1.0 | 1.0 |  0.0 | 0.7143 |  71.43 |  0.06
3   | 1.0 | 1.0 |  0.0 | 1.0000 | 100.00 |  0.00
4   | 1.0 | 1.0 |  0.0 | 0.8571 |  85.71 |  0.11
5   | 1.0 | 0.5 |  0.5 | 0.8571 |  85.71 |  0.54
6   | 1.0 | 0.5 |  0.5 | 0.7143 |  71.43 |  0.67
7   | 1.0 | 0.0 |  1.0 | 0.4286 |  42.86 |  0.92
8   | 1.0 | 0.5 |  0.5 | 0.4286 |  42.86 |  0.37
9   | 0.5 | 0.5 |  0.0 | 0.4286 |  42.86 |  0.42
10  | 1.0 | 0.0 |  1.0 | 0.4286 |  42.86 |  0.92
11  | 1.0 | 0.5 |  0.5 | 0.5714 |  57.14 |  0.69
12  | 1.0 | 1.0 |  0.0 | 1.0000 | 100.00 |  0.00
13  | 1.0 | 0.5 |  0.5 | 0.8571 |  85.71 |  0.32
14  | 0.5 | 0.0 |  0.5 | 0.4286 |  42.86 |  0.37
15  | 1.0 | 0.5 |  0.5 | 0.5714 |  57.14 |  0.54
16  | 0.5 | 0.0 |  0.5 | 0.5714 |  57.14 |  0.34
17  | 1.0 | 0.0 |  1.0 | 0.5714 |  57.14 |  0.69
18  | 1.0 | 1.0 |  0.0 | 1.0000 | 100.00 |  0.00
19  | 1.0 | 1.0 |  0.0 | 1.0000 | 100.00 |  0.00
20  | 1.0 | 0.5 |  0.5 | 0.8571 |  85.71 |  0.54
21  | 0.5 | 1.0 | -0.5 | 0.8571 |  85.71 | -0.39
22  | 1.0 | 0.5 |  0.5 | 0.7143 |  71.43 |  0.67
23  | 0.5 | 0.0 |  0.5 | 0.1429 |  14.29 |  0.67
24  | 1.0 | 0.0 |  1.0 | 0.4286 |  42.86 |  0.92
25  | 1.0 | 0.0 |  1.0 | 0.5714 |  57.14 |  0.44
26  | 1.0 | 0.0 |  1.0 | 0.4286 |  42.86 |  0.67
27  | 1.0 | 0.5 |  0.5 | 0.7143 |  71.43 |  0.06
28  | 0.5 | 0.0 |  0.5 | 0.1429 |  14.29 |  0.67
29  | 1.0 | 0.5 |  0.5 | 0.8571 |  85.71 |  0.54
30  | 1.0 | 0.0 |  1.0 | 0.4286 |  42.86 |  0.92

Table 5 Descriptive Statistics: MAT0000-Version B

Variable         | Mean   | SE Mean | StDev  | Variance | Minimum | Q1     | Median | Q3     | Maximum
Disc. Ind. (D)   | 0.4500 | 0.0733  | 0.4015 | 0.1612   | -0.5000 | 0.0000 | 0.5000 | 0.6250 |   1.0000
Difficulty (p)   | 0.6524 | 0.0458  | 0.2508 | 0.0629   |  0.1429 | 0.4286 | 0.6429 | 0.8571 |   1.0000
Difficulty (p) % | 65.24  | 4.58    | 25.08  | 628.81   | 14.29   | 42.86  | 64.29  | 85.71  | 100.00
Pt-Bis (r)       | 0.4223 | 0.0628  | 0.3440 | 0.1183   | -0.3900 | 0.0600 | 0.4900 | 0.6700 |   0.9200


[Figure 2 here: Minitab bar charts of Difficulty (p) %, Disc. Ind. (D), and Pt-Bis (r), together with scatter plots and charts of Disc. Ind. (D) vs Difficulty (p) % and Pt-Bis (r) vs Difficulty (p) %, for Version B.]

Figure 2 (Bar Charts and Scatter Plots for p, D, and r_pbis, Version B)

3.2. A Comparison of MAT0000-Version A and MAT0000-Version B Exams Performance

A Two-Sample T-Test: To identify whether there is a significant difference between the MAT0000-Version A and MAT0000-Version B exam performance of the students, a two-sample T-test was conducted using the Minitab and Statdisk software. First, the assumption of normality was checked for both groups using the Anderson-Darling test, and the normality assumption was met. The results are provided in Tables 6-7. Moreover, at the significance level of α = 0.05, the two-sample T-test fails to reject the claim that μ_A = μ_B; that is, the sample does not provide enough evidence to reject the claim.


Table 6 Descriptive Statistics: MAT0000-Version A and MAT0000-Version B Exams

Variable | Total Count | N | Mean  | SE Mean | StDev | Variance | Minimum | Q1    | Median | Q3    | Maximum | Skewness | Kurtosis
MAT0000A | 7           | 7 | 17.14 | 1.14    | 3.02  | 9.14     | 13.00   | 14.00 | 17.00  | 19.00 | 22.00   |  0.16    | -0.03
MAT0000B | 7           | 7 | 19.57 | 2.35    | 6.21  | 38.62    | 12.00   | 15.00 | 18.00  | 25.00 | 29.00   |  0.40    | -1.31

Table 7 Two-Sample T-Test and CI: MAT0000-Version A and MAT0000-Version B (Assume Unequal Variances)

Two-sample T for MAT0000-Version A vs MAT0000-Version B

          N  Mean   StDev  SE Mean
MAT0000A  7  17.14  3.02   1.1
MAT0000B  7  19.57  6.21   2.3

Difference = mu (MAT0000A) - mu (MAT0000B)
Estimate for difference: -2.42857
95% CI for difference: (-8.45211, 3.59497)
T-Test of difference = 0 (vs not =): T-Value = -0.93  P-Value = 0.380  DF = 8

Two-Sample T-Test and CI: MAT0000-Version A and MAT0000-Version B (Assume Equal Variances)

Two-sample T for MAT0000-Version A vs MAT0000-Version B

          N  Mean   StDev  SE Mean
MAT0000A  7  17.14  3.02   1.1
MAT0000B  7  19.57  6.21   2.3

Difference = mu (MAT0000A) - mu (MAT0000B)
Estimate for difference: -2.42857
95% CI for difference: (-8.11987, 3.26273)
T-Test of difference = 0 (vs not =): T-Value = -0.93  P-Value = 0.371  DF = 12
Both use Pooled StDev = 4.8868
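The Welch (unequal-variances) test in Table 7 can be reproduced from the summary statistics alone. The sketch below uses scipy, which is our own choice of tool (the paper used Minitab and Statdisk):

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics from Table 6 for the two exam versions:
res = ttest_ind_from_stats(mean1=17.14, std1=3.02, nobs1=7,
                           mean2=19.57, std2=6.21, nobs2=7,
                           equal_var=False)  # Welch's t-test
print(res)  # t ≈ -0.93, p ≈ 0.38; Minitab truncates DF to 8, so p differs slightly
```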


3.3. A Comparison of MAT0000 Classroom Test Average (Pre) vs Final Exam (Post) Performance

A Paired Samples T-Test: To identify whether there is a significant gain in the MAT0000 posttest (state exit exam) compared to the pretest (classroom test average) performance of the students, a paired samples T-test was conducted using the Minitab and Statdisk software. First, to check whether the normality assumption for a paired samples T-test is met, hypothesis tests for the gain scores were conducted using Minitab; these suggested that the normality assumption was met, the distribution being close to normal. The results are provided in Tables 8-10 and Figure 5 below. Moreover, at the significance level of α = 0.05, the paired samples T-test fails to reject the claim that the mean gain is zero (that is, μ_Post = μ_Pre); the sample does not provide enough evidence to reject the claim.

[Statdisk output here: MAT0000 Paired T-Test and CI for MAT0000-Post and MAT0000-Pre (Gain Score = Post - Pre).]

Figure 5 (Paired Samples T-Test: MAT0000 Pre vs Post Exams)

Table 8 (Minitab Output) Data Display: MAT0000-Pre, MAT0000-Post (Gain Score = Post - Pre)

Row | 20071-Pre | 20071-Post | Gain
1   | 69.4 | 56.7 | -12.7
2   | 63.2 | 50.0 | -13.2
3   | 54.8 | 60.0 |   5.2
4   | 78.0 | 83.3 |   5.3
5   | 75.6 | 76.7 |   1.1
6   | 66.8 | 63.3 |  -3.5
7   | 51.8 | 46.7 |  -5.1
8   | 44.6 | 40.0 |  -4.6
9   | 72.6 | 56.7 | -15.9
10  | 68.4 | 60.0 |  -8.4
11  | 67.2 | 50.0 | -17.2
12  | 76.6 | 96.7 |  20.1
13  | 82.6 | 73.3 |  -9.3
14  | 49.0 | 43.3 |  -5.7

Table 9 MAT0000 Descriptive Statistics: MAT0000-Post, MAT0000-Pre (Gain Score = Post - Pre)

Variable   | Total Count | N  | Mean  | SE Mean | StDev | Variance | Minimum | Q1     | Median | Q3    | Maximum | Range | IQR   | Skewness | Kurtosis
20071-Post | 14          | 14 | 61.19 | 4.33    | 16.21 | 262.62   |  40.00  |  49.18 |  58.35 | 74.15 | 96.70   | 56.70 | 24.98 |  0.84    |  0.22
20071-Pre  | 14          | 14 | 65.76 | 3.12    | 11.66 | 136.01   |  44.60  |  54.05 |  67.80 | 75.85 | 82.60   | 38.00 | 21.80 | -0.51    | -0.80
Gain       | 14          | 14 | -4.56 | 2.67    | 10.01 | 100.14   | -17.20  | -12.83 |  -5.40 |  2.13 | 20.10   | 37.30 | 14.95 |  1.10    |  1.56

Table 10 MAT0000 Paired T-Test and CI: MAT0000-Post, MAT0000-Pre (Gain Score = Post - Pre)

Paired T for 20071-Post - 20071-Pre

           N   Mean      StDev     SE Mean
20071-Post 14  61.1929   16.2056   4.3311
20071-Pre  14  65.7571   11.6622   3.1169
Difference 14  -4.56429  10.00704  2.67450

95% CI for mean difference: (-10.34218, 1.21361)
T-Test of mean difference = 0 (vs not = 0): T-Value = -1.71  P-Value = 0.112
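Because Table 8 lists the raw pre/post scores, the paired T-test in Table 10 can be reproduced directly. The sketch below uses scipy's ttest_rel, again our own choice of tool rather than the one used in the paper:

```python
from scipy.stats import ttest_rel

pre  = [69.4, 63.2, 54.8, 78.0, 75.6, 66.8, 51.8,
        44.6, 72.6, 68.4, 67.2, 76.6, 82.6, 49.0]
post = [56.7, 50.0, 60.0, 83.3, 76.7, 63.3, 46.7,
        40.0, 56.7, 60.0, 50.0, 96.7, 73.3, 43.3]

# Paired t-test on the gain scores (post - pre), df = 13
t, p = ttest_rel(post, pre)
print(round(t, 2), round(p, 3))  # -1.71 0.112, matching Table 10
```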


4. Concluding Remarks and Recommendation for Future Research

This paper discusses some item analysis statistics which are available in the ParSCORE™ item analysis report. The uses of item analysis statistics for some multiple-choice math examinations have been investigated. It is hoped that the present study will be helpful in recognizing the most critical pieces of state exit test item data, and in evaluating whether or not a test item needs revision. The methods discussed in this project can be used to describe the relevance of test item analysis to classroom tests. These procedures can also be used or modified to measure, describe, and improve tests or surveys such as college mathematics placement exams (that is, the CPT), mathematics study skills, attitude surveys, test anxiety, information literacy, and other general education learning outcomes. Further research may also investigate Bloom's cognitive taxonomy of test items, the applicability of Beta-Binomial models and Bayesian analysis of test items, and item response theory (IRT) using the 1-parameter logistic model (also known as the Rasch model), the 2- and 3-parameter logistic models, plots of the item characteristic curves (ICCs) of different test items, and other characteristics of IRT measurement instruments.
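As a pointer toward the IRT direction mentioned above, the item characteristic curve of the 1-parameter logistic (Rasch) model has a simple closed form. Below is a small illustrative sketch of our own; the difficulty value used in the example is hypothetical.

```python
import math

def rasch_icc(theta, b):
    # P(correct answer | ability theta) for an item of difficulty b
    # under the Rasch (1-PL) model.
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# An examinee whose ability equals the item difficulty (here, a
# hypothetical b = 0.5) answers correctly with probability 0.5:
print(rasch_icc(0.5, 0.5))  # 0.5
```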

Acknowledgments

The author would like to thank the Editorial Committee of Polygon for accepting this paper for publication in Polygon. He would also like to acknowledge his sincere indebtedness to the works of the various authors and resources on the subject which he consulted during the preparation of this research project. The author is thankful to his wife for her patience and perseverance during the period in which this paper was prepared. The author would like to dedicate this paper to his late parents, brothers, and sisters. Last but not least, the author is thankful to Miami Dade College for giving him the opportunity to serve the college, without which it would have been impossible to conduct this research.

References

Brown, J. D. (1996). Testing in language programs. Prentice Hall, Upper Saddle River, NJ.

Chase, C. I. (1999). Contemporary assessment for educators. Longman, New York.

Crocker, L. and Algina, J. (1986). Introduction to classical and modern test theory. Holt, Rinehart and Winston, New York.

DeVellis, R. F. (1991). Scale development: Theory and applications. Sage Publications, Newbury Park.

Ebel, R. L. (1979). Essentials of educational measurement (3rd ed.). Prentice Hall, Englewood Cliffs, NJ.

Ebel, R. L. and Frisbie, D. A. (1986). Essentials of educational measurement. Prentice-Hall, Inc., Englewood Cliffs, NJ.

Glass, G. V. and Hopkins, K. D. (1995). Statistical Methods in Education and Psychology (3rd ed.). Allyn & Bacon, Boston.

Haladyna, T. M. (1999). Developing and validating multiple-choice exam items (2nd ed.). Lawrence Erlbaum Associates, Mahwah, NJ.

Haladyna, T. M., Downing, S. M. and Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.

Henrysson, S. (1971). Gathering, analyzing, and using data on test items. In R. L. Thorndike (Ed.), Educational Measurement (p. 141). American Council on Education, Washington, DC.

Kelley, T. L. (1939). The selection of upper and lower groups for the validation of test items. Journal of Educational Psychology, 30, 17-24.

Lord, F. M. and Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Addison-Wesley, Reading, MA.

Mertler, C. A. (2003). Classroom Assessment: A Practical Guide for Educators. Pyrczak Publishing, Los Angeles, CA.

Millman, J. and Greene, J. (1993). The specification and development of tests of achievement and ability. In R. L. Linn (Ed.), Educational Measurement (pp. 335-366). Oryx Press, Phoenix, AZ.

Nitko, A. J. (2001). Educational assessment of students (3rd ed.). Prentice Hall, Upper Saddle River, NJ.

Nunnally, J. C. (1972). Educational measurement and evaluation (2nd ed.). McGraw-Hill, New York.

Nunnally, J. C. (1978). Psychometric Theory (2nd ed.). McGraw-Hill, New York.

Oosterhof, A. (2001). Classroom applications for educational measurement. Merrill Prentice Hall, Upper Saddle River, NJ.

Popham, W. J. (1981). Modern educational measurement. Prentice-Hall, Englewood Cliffs, NJ.

Suen, H. K. (1990). Principles of exam theories. Lawrence Erlbaum Associates, Hillsdale, NJ.

Tanner, D. E. (2001). Assessing academic achievement. Allyn & Bacon, Boston.

Thompson, B. and Levitov, J. E. (1985). Using microcomputers to score and evaluate test items. Collegiate Microcomputer, 3, 163-168.

Thorndike, R. M., Cunningham, G. K., Thorndike, R. L. and Hagen, E. P. (1991). Measurement and evaluation in psychology and education (5th ed.). MacMillan, New York.

Triola, M. F. (2006). Elementary Statistics. Pearson Addison-Wesley, New York.

Wiersma, W. and Jurs, S. G. (1990). Educational measurement and testing (2nd ed.). Allyn and Bacon, Boston, MA.

Wood, D. A. (1960). Test construction: Development and interpretation of achievement tests. Charles E. Merrill Books, Inc., Columbus, OH.



Testing the Goodness of Fit of Continuous Probability Distributions to Some Flood Data

M. Shakil, Ph.D.
Professor of Mathematics
Department of Liberal Arts and Sciences
Miami Dade College, Hialeah Campus
FL 33012, USA
E-mail: mshakil@mdc.edu

Abstract

In this paper, we have tested the goodness of fit of the Cauchy, generalized extreme value, Laplace, log-Pearson 3, logistic, and normal probability distributions to the ordered differences in flood heights for two stations on the Fox River in Wisconsin over 33 years, as reported in Best et al. (2008). It was found that the generalized extreme value distribution was the best fit amongst the six continuous probability distributions for these data based on both the Kolmogorov-Smirnov and Anderson-Darling goodness of fit tests. On the other hand, the log-Pearson 3 distribution fitted reasonably well based on the Chi-Squared goodness of fit test. Since fitting a probability distribution to flood data may be helpful in predicting the probability or forecasting the frequency of occurrence of floods during monsoons and hurricanes, and in planning beforehand, it is hoped that this study will be quite useful in many problems of business and economic planning, hydrological processes and designs, and other applied research.

2010 Mathematics Subject Classifications: 62C12, 62F03, 62N02, 62N03, 62-07.

Keywords: Flood data, Goodness of fit test, Hurricane, Monsoon, Probability distribution.

1. Introduction

According to Wikipedia, "a flood is an overflow of water that submerges land which is usually dry. The European Union (EU) Floods Directive defines a flood as a covering by water of land not normally covered by water. Flooding may occur as an overflow of water from water bodies, such as a river, lake, or ocean, in which the water overtops or breaks levees, resulting in some of that water escaping its usual boundaries, or it may occur due to an accumulation of rainwater on saturated ground in an areal flood. Floods can also occur in rivers when the flow rate exceeds the capacity of the river channel, particularly at bends or meanders in the waterway. Floods often cause damage to homes and businesses if they are in the natural flood plains of rivers. Some floods develop slowly, while others, such as flash floods, can develop in just a few minutes and without visible signs of rain" (https://en.wikipedia.org/wiki/Flood). The rainfall or other types of precipitation produced by hurricanes also cause widespread flooding in the affected areas, which brings extensive damage to and destruction of property, including loss of life, and results in serious socio-economic problems. The statistical analysis of flood data is therefore very crucial, and plays an important role in many studies of hydrological processes and designs. Many researchers have investigated the statistical analysis of flood data; see, for example, Pericchi and Rodríguez-Iturbe (1985), Opere et al. (2006), Yiou et al. (2006), Van Bladeren et al. (2007), Ghorbani et al. (2011), Win and Win (2014), and Ahn et al. (2014), and the references therein. Since fitting a probability distribution to flood data may be helpful in predicting the probability or forecasting the frequency of occurrence of floods during monsoons and hurricanes, and in planning beforehand, such a study can be useful in many problems of business and economic planning, hydrological processes and designs, and other applied research. Motivated by the importance of the study of flood data in many problems of hydrological processes and designs, we have investigated in this paper the goodness of fit of the Cauchy, generalized extreme value, Laplace, log-Pearson 3, logistic, and normal probability distributions to the ordered differences in flood heights for two stations on the Fox River in Wisconsin over 33 years, as reported in Best et al. (2008), to determine their applicability and best fit to these data based on the Kolmogorov-Smirnov, Anderson-Darling, and Chi-Squared goodness of fit (GOF) tests. Other researchers have also investigated these data; see, for example, Bain and Engelhardt (1973), Puig and Stephens (2000), Meintanis (2004), Krishnamoorthy (2006), and Gulati (2011). For applications of the log-Pearson type 3 distribution in hydrology, see, for example, Phien and Ajirajah (1984). Also, for a discussion of GOF tests, the interested reader is referred to Massey (1951), Stephens (1974), Conover (1999), Blischke and Murthy (2000), Hogg and Tanis (2006), and Ahsanullah et al. (2014), among others. The organization of this paper is as follows: Section 2 contains the methodology, along with a description of the ordered differences in flood heights data, and describes the continuous probability distributions considered in this paper. In Section 3, we present the results and discussion of our findings. Some concluding remarks are given in Section 4.

2. Methodology

In this section, we test the goodness of fit of the Cauchy, generalized extreme value, Laplace, log-Pearson 3, logistic, and normal probability distributions to the ordered differences in flood heights for two stations on the Fox River in Wisconsin over 33 years, as reported in Best et al. (2008), to determine their applicability and best fit to these data based on the Kolmogorov-Smirnov, Anderson-Darling, and Chi-Squared goodness of fit (GOF) tests. For the sake of completeness, the ordered differences in flood heights are provided in Table 1 below. In Table 2, we provide the probability density functions and parameters of the six continuous probability distributions considered in this paper.

Table 1 (Source: Best et al., 2008)
Ordered differences in flood heights for two stations on the Fox River in Wisconsin for 33 years:

1.96, 1.96, 3.60, 3.80, 4.79, 5.66, 5.76, 5.78, 6.27, 6.30, 6.76, 7.65, 7.84, 7.99, 8.51, 9.18, 10.13, 10.24, 10.25, 10.43, 11.45, 11.48, 11.75, 11.81, 12.34, 12.78, 13.06, 13.29, 13.98, 14.18, 14.40, 16.22, and 17.06.
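The descriptive statistics reported in Table 3 below can be verified directly from the Table 1 data. Here is a minimal sketch using numpy, which is our own choice of tool (the paper used EasyFit and Statdisk):

```python
import numpy as np

# Ordered differences in flood heights (Table 1)
flood = np.array([1.96, 1.96, 3.60, 3.80, 4.79, 5.66, 5.76, 5.78, 6.27, 6.30,
                  6.76, 7.65, 7.84, 7.99, 8.51, 9.18, 10.13, 10.24, 10.25,
                  10.43, 11.45, 11.48, 11.75, 11.81, 12.34, 12.78, 13.06,
                  13.29, 13.98, 14.18, 14.40, 16.22, 17.06])

print(flood.size)          # 33
print(flood.mean())        # ≈ 9.3533  (Table 3)
print(flood.std(ddof=1))   # ≈ 4.0211  (Table 3, sample standard deviation)
print(np.median(flood))    # 10.13     (Table 3)
```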

Table 2 (Continuous Probability Distributions Used in Flood Data Analysis)

1. Cauchy:
$$f(x) = \frac{1}{\pi\sigma}\left[1 + \left(\frac{x-\mu}{\sigma}\right)^2\right]^{-1},$$
where $\sigma > 0$ is a scale parameter, $\mu$ (real) is a location parameter, and $-\infty < x < \infty$.

2. Generalized Extreme Value:
$$f(x) = \begin{cases} \dfrac{1}{\sigma}\exp\!\left[-\left(1 + k\,\dfrac{x-\mu}{\sigma}\right)^{-1/k}\right]\left(1 + k\,\dfrac{x-\mu}{\sigma}\right)^{-1-1/k}, & k \ne 0, \\[2ex] \dfrac{1}{\sigma}\exp\!\left(-\dfrac{x-\mu}{\sigma}\right)\exp\!\left[-\exp\!\left(-\dfrac{x-\mu}{\sigma}\right)\right], & k = 0, \end{cases}$$
where $k$ is a shape parameter, $\sigma > 0$ is a scale parameter, and $\mu$ (real) is a location parameter; the support is $1 + k(x-\mu)/\sigma > 0$ for $k \ne 0$ and $-\infty < x < \infty$ for $k = 0$.

3. Laplace:
$$f(x) = \frac{\lambda}{2}\exp\!\left(-\lambda\,|x-\mu|\right),$$
where $\lambda > 0$ is an inverse scale parameter, $\mu$ (real) is a location parameter, and $-\infty < x < \infty$.

4. Log-Pearson III (LP3):
$$f(x) = \frac{1}{x\,|\beta|\,\Gamma(\alpha)}\left(\frac{\ln x - \gamma}{\beta}\right)^{\alpha-1}\exp\!\left(-\frac{\ln x - \gamma}{\beta}\right),$$
where $\alpha > 0$, $\beta \ne 0$, and $\gamma$ are the parameters; the support is $0 < x \le e^{\gamma}$ when $\beta < 0$ and $e^{\gamma} \le x < \infty$ when $\beta > 0$.

5. Logistic:
$$f(x) = \frac{\exp\!\left(-\dfrac{x-\mu}{\sigma}\right)}{\sigma\left[1 + \exp\!\left(-\dfrac{x-\mu}{\sigma}\right)\right]^2},$$
where $\sigma > 0$ is a scale parameter, $\mu$ (real) is a location parameter, and $-\infty < x < \infty$.

6. Normal:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right],$$
where $\sigma > 0$ is a scale parameter, $\mu$ (real) is a location parameter, and $-\infty < x < \infty$.

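For readers working in scipy rather than EasyFit, the densities in Table 2 map onto scipy.stats families with slightly different parameter conventions. The sketch below shows two of the trickier correspondences; this mapping is our own reading of the two conventions, not something stated in the paper.

```python
from scipy import stats

x = 10.0

# Generalized extreme value: scipy's shape parameter c equals -k in the
# convention of Table 2. Using the Table 5 estimates k, sigma, mu:
k, sigma, mu = -0.32444, 4.2124, 7.9779
print(stats.genextreme.pdf(x, -k, loc=mu, scale=sigma))

# Laplace: Table 2 uses an inverse scale lambda, so scipy's scale is 1/lambda.
lam, mu = 0.3517, 9.3533
print(stats.laplace.pdf(x, loc=mu, scale=1.0 / lam))
```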
Fitting of the above-said distributions to the flood data is carried out as follows. As a first step, using the EasyFit software, we computed the descriptive statistics of the flood data, as given in Table 3. Also, using the Statdisk software, we tested the normality of the flood data by the Ryan-Joiner test (similar to the Shapiro-Wilk test), along with drawing a histogram of the data; these are given in Figure 1 and Table 4. We then tested the fitting of the Cauchy, generalized extreme value, Laplace, log-Pearson 3, logistic, and normal probability distributions to the ordered differences in flood heights (Table 1). For this, we used the EasyFit software for estimating the parameters of these distributions and for the Kolmogorov-Smirnov, Anderson-Darling, and Chi-Squared goodness of fit (GOF) tests, which are provided in Tables 5 and 6 below. For the parameters estimated in Table 5, the Cauchy, generalized extreme value, Laplace, log-Pearson 3, logistic, and normal probability density functions, respectively, have been superimposed on the histogram of the ordered differences in flood heights, which is provided in Figure 2 below. For these distributions, we have also provided the cumulative distribution function, survival function, hazard function, cumulative hazard function, P-P plot, Q-Q plot, and probability difference plot in Figures 3-9, respectively, as given below.

Table 3 (Descriptive Statistics)

Statistic          | Value      Percentile   | Value
Sample Size        | 33         Min          | 1.96
Range              | 15.1       5%           | 1.96
Mean               | 9.3533     10%          | 3.68
Variance           | 16.169     25% (Q1)     | 6.025
Std. Deviation     | 4.0211     50% (Median) | 10.13
Coef. of Variation | 0.42991    75% (Q3)     | 12.56
Std. Error         | 0.69999    90%          | 14.312
Skewness           | -0.07331   95%          | 16.472
Excess Kurtosis    | -0.79828   Max          | 17.06


[Figure 1 here: histogram and normality assessment of the flood data.]

Figure 1: Normality Assessment of Flood Data

Table 4 (Ryan-Joiner Test of Normality Assessment)

Ryan-Joiner Test
Test statistic, Rp: 0.9925
Critical value for 0.05 significance level: 0.9666
Critical value for 0.01 significance level: 0.9528
Fail to reject normality with a 0.05 significance level.
Fail to reject normality with a 0.01 significance level.

Possible Outliers
Number of data values below Q1 by more than 1.5 IQR: 0
Number of data values above Q3 by more than 1.5 IQR: 0
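scipy does not implement the Ryan-Joiner statistic, but the closely related Shapiro-Wilk test (to which, as noted above, it is similar) can be run on the same data. A sketch, again under our assumption of Python tooling:

```python
import numpy as np
from scipy import stats

flood = np.array([1.96, 1.96, 3.60, 3.80, 4.79, 5.66, 5.76, 5.78, 6.27, 6.30,
                  6.76, 7.65, 7.84, 7.99, 8.51, 9.18, 10.13, 10.24, 10.25,
                  10.43, 11.45, 11.48, 11.75, 11.81, 12.34, 12.78, 13.06,
                  13.29, 13.98, 14.18, 14.40, 16.22, 17.06])

w, p = stats.shapiro(flood)
print(w, p)  # a large p-value would likewise fail to reject normality
```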

Table 5 (Fitting Results)

# | Distribution       | Parameters
1 | Cauchy             | σ = 2.8118, μ = 9.6936
2 | Gen. Extreme Value | k = -0.32444, σ = 4.2124, μ = 7.9779
3 | Laplace            | λ = 0.3517, μ = 9.3533
4 | Log-Pearson 3      | α = 2.9931, β = -0.31728, γ = 3.065
5 | Logistic           | σ = 2.217, μ = 9.3533
6 | Normal             | σ = 4.0211, μ = 9.3533


Table 6 (Goodness of Fit Summary)

# | Distribution       | Kolmogorov-Smirnov  | Anderson-Darling    | Chi-Squared
  |                    | Statistic | Rank    | Statistic | Rank    | Statistic | Rank
1 | Cauchy             | 0.11607   | 5       | 0.85112   | 5       | 1.3161    | 3
2 | Gen. Extreme Value | 0.07953   | 1       | 0.18391   | 1       | 1.3503    | 4
3 | Laplace            | 0.15476   | 6       | 1.0484    | 6       | 5.5995    | 6
4 | Log-Pearson 3      | 0.09304   | 3       | 0.22833   | 2       | 1.0865    | 1
5 | Logistic           | 0.1142    | 4       | 0.4503    | 4       | 2.0981    | 5
6 | Normal             | 0.0929    | 2       | 0.2467    | 3       | 1.1639    | 2
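The Kolmogorov-Smirnov entries in Table 6, and an Anderson-Darling statistic for the fitted GEV, can be approximated in scipy as follows. This is a sketch under our own assumptions: scipy fits by maximum likelihood, so its estimates (and hence the statistics) may differ slightly from EasyFit's, and the Anderson-Darling value is computed by hand from the fitted CDF because scipy's anderson() only supports a few fixed families.

```python
import numpy as np
from scipy import stats

flood = np.array([1.96, 1.96, 3.60, 3.80, 4.79, 5.66, 5.76, 5.78, 6.27, 6.30,
                  6.76, 7.65, 7.84, 7.99, 8.51, 9.18, 10.13, 10.24, 10.25,
                  10.43, 11.45, 11.48, 11.75, 11.81, 12.34, 12.78, 13.06,
                  13.29, 13.98, 14.18, 14.40, 16.22, 17.06])

# Maximum-likelihood GEV fit (scipy's shape c corresponds to -k of Table 2)
c, loc, scale = stats.genextreme.fit(flood)

# Kolmogorov-Smirnov statistic against the fitted GEV (cf. 0.07953 in Table 6)
print(stats.kstest(flood, "genextreme", args=(c, loc, scale)).statistic)

# Anderson-Darling statistic: A^2 = -n - (1/n) * sum_i of
# (2i - 1) * [ln u_(i) + ln(1 - u_(n+1-i))], where u are fitted CDF values
u = np.sort(stats.genextreme.cdf(flood, c, loc=loc, scale=scale))
n = len(u)
i = np.arange(1, n + 1)
a2 = -n - np.sum((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1]))) / n
print(a2)  # compare with 0.18391 in Table 6
```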

[Figure 2 here: probability density functions of the fitted Cauchy, Gen. Extreme Value, Laplace, Log-Pearson 3, Logistic, and Normal distributions superimposed on the histogram of the flood data.]

Figure 2: Fitting of Probability Density Functions to the Flood Data


7

Cumulative Distribution Function 1

0.9

0.8

0.7

F(x)

0.6

0.5

0.4

0.3

0.2

0.1

0 2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

x Sample

Cauchy

Laplace

Logistic

Normal

Gen. Extreme Value

Log-Pearson 3

Figure 3: Fitting of Cumulative Distribution Functions to the Flood Data Survival Function 1

0.9

0.8

0.7

S(x)

0.6

0.5

0.4

0.3

0.2

0.1

0 2

3

4

5

6

7

8

9

10

11

12

13

14

15

x Sample

Cauchy

Laplace

Logistic

Normal

Gen. Extreme Value

Log-Pearson 3

Figure 4: Survival Functions of Distributions for the Flood Data

16

17


8

Hazard Function 0.8

0.72

0.64

0.56

h(x)

0.48

0.4

0.32

0.24

0.16

0.08

0 2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

16

17

x Cauchy

Laplace

Logistic

Normal

Gen. Extreme Value

Log-Pearson 3

Figure 5: Hazard Functions of Distributions for the Flood Data

Cumulative Hazard Function

3.6

3.2

2.8

H(x)

2.4

2

1.6

1.2

0.8

0.4

0 2

3

4

5

6

7

8

9

10

11

12

13

x Cauchy

Laplace

Logistic

Normal

Gen. Extreme Value

Log-Pearson 3

14

15


9 Figure 6: Cumulative Hazard Functions of Distributions for the Flood Data

P-P Plot 1

0.9

0.8

0.7

P (Model)

0.6

0.5

0.4

0.3

0.2

0.1

0 0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

P (Empirical) Cauchy

Laplace

Logistic

Normal

Gen. Extreme Value

Log-Pearson 3

Figure 7: P-P Plot of Distributions for the Flood Data Q-Q Plot 17 16 15 14 13 12

Quantile (Model)

11 10 9 8 7 6 5 4 3 2 2

3

4

5

6

7

8

9

10

11

12

13

14

x Cauchy

Laplace

Logistic

Normal

Gen. Extreme Value

Log-Pearson 3

Figure 8: Q-Q Plot of Distributions for the Flood Data

15

16

17


10

Probability Difference 0.32

0.24

0.16

Probability Difference

0.08

0

-0.08

-0.16

-0.24

-0.32 2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

x Cauchy

Laplace

Logistic

Normal

Gen. Extreme Value

Log-Pearson 3

Figure 9: Probability Differences of Distributions for the Flood Data

3. Results and Discussions

The descriptive statistics of the ordered differences in flood heights for two stations on the Fox River in Wisconsin over 33 years, as reported in Best et al. (2008) (see Table 1), are provided in Table 3 above. Also, we tested the normality of the flood data by the Ryan-Joiner test (similar to the Shapiro-Wilk test), along with drawing a histogram of the data, which are given in Figure 1 and Table 4. The following are the observations based on the Ryan-Joiner normality assessment of the flood data, which is also confirmed by the skewness of the flood data as computed in Table 3: (a) fail to reject normality at the 0.05 significance level; (b) fail to reject normality at the 0.01 significance level. Further, we tested the fitting of the Cauchy, generalized extreme value, Laplace, log-Pearson 3, logistic, and normal probability distributions to the ordered differences in flood heights. The parameter estimates for these six distributions are given in Table 5. For the parameters estimated in Table 5, the probability density functions of the six distributions, respectively, have been superimposed on the histogram of the flood data, as provided in Figure 2. The goodness of fit (GOF) of these distributions to the flood data by the Kolmogorov-Smirnov, Anderson-Darling, and Chi-Squared GOF tests is summarized in Table 6 above. Further, for these distributions, we have provided the cumulative distribution function, survival function, hazard function, cumulative hazard function, P-P plot, Q-Q plot, and probability difference plot in Figures 3-9, respectively, as given above. From the Kolmogorov-Smirnov and Anderson-Darling GOF tests, as provided in Table 6 and Figure 2 above, we observe that the generalized extreme value distribution is the best fit amongst the six continuous probability distributions to the ordered differences in flood heights. On the other hand, the log-Pearson 3 distribution was found to be the best fit for these data by the Chi-Squared GOF test (Table 6). The graphs of the cumulative distribution function, survival function, hazard function, cumulative hazard function, P-P plot, Q-Q plot, and probability difference, as provided in Figures 3-9, respectively, also confirm these results.

4. Concluding Remarks

In many problems of hydrological processes and designs, fitting a probability distribution to flood data may be helpful in predicting the probability or forecasting the frequency of occurrence of floods, and in planning beforehand. Motivated by the importance of the study of flood data in such problems, in this paper we have tested the goodness of fit of the Cauchy, generalized extreme value, Laplace, log-Pearson 3, logistic, and normal probability distributions to the ordered differences in flood heights for two stations on the Fox River in Wisconsin over 33 years, as reported in Best et al. (2008). It was found that the generalized extreme value distribution was the best fit for these flood data by both the Kolmogorov-Smirnov and Anderson-Darling goodness of fit tests, whereas the log-Pearson 3 distribution was found to be the best fit by the Chi-Squared goodness of fit test. It is hoped that this study will be quite helpful in many problems of hydrological research.

Acknowledgment

The author would like to thank the Editorial Committee of Polygon for accepting this paper for publication in Polygon. The author would also like to thank Professor M. Ahsanullah, Rider University, New Jersey, USA, and Professor B. M. Golam Kibria, FIU, Miami, USA, for their valuable and helpful suggestions, which improved the quality and presentation of the paper. The author is also thankful to his wife for her patience and perseverance during the period in which this paper was prepared. The author would like to dedicate this paper to his late parents, brothers, and sisters. Last but not least, the author is thankful to Miami Dade College for giving him the opportunity to serve the college, without which it would have been impossible to conduct this research.


References

Ahn, J., Cho, W., Kim, T., Shin, H., and Heo, J. H. (2014). Flood frequency analysis for the annual peak flows simulated by an event-based rainfall-runoff model in an urban drainage basin. Water, 6(12), 3841-3863.

Ahsanullah, M., Kibria, B. M. G., and Shakil, M. (2014). Normal and Student's t Distributions and Their Applications. Atlantis Press, Paris, France.

Bain, L., and Engelhardt, M. (1973). Interval estimation for the two-parameter double exponential distribution. Technometrics, 15, 875-887.

Best, D., Rayner, J., and Thas, O. (2008). Comparison of some tests of fit for the Laplace distribution. Computational Statistics and Data Analysis, 52, 5338-5343.

Blischke, W. R., and Murthy, D. N. P. (2000). Reliability, Modeling, Prediction, and Optimization. John Wiley & Sons, New York.

Conover, W. J. (1999). Practical Nonparametric Statistics. John Wiley & Sons, New York.

Ghorbani, M. A., Ruskeepaa, H., Singh, V. P., and Sivakumar, B. (2011). Flood frequency analysis using Mathematica. Turkish Journal of Engineering and Environmental Sciences, 34(3), 171-188.

Gulati, S. (2011). Goodness of fit test for the Rayleigh and the Laplace distributions. International Journal of Applied Mathematics and Statistics, 24(SI-11A), 74-85.

Hogg, R. V., and Tanis, E. A. (2006). Probability and Statistical Inference. Pearson/Prentice Hall, NJ.

Krishnamoorthy, K. (2006). Handbook of Statistical Distributions with Applications. Chapman and Hall/CRC, Boca Raton.

Massey, F. J. (1951). The Kolmogorov-Smirnov test for goodness of fit. Journal of the American Statistical Association, 46, 68-78.

Meintanis, S. G. (2004). A class of omnibus tests for the Laplace distribution based on the empirical characteristic function. Communications in Statistics, Theory and Methods, 33(4), 925-948.

Opere, A. O., Mkhandi, S., and Willems, P. (2006). At site flood frequency analysis for the Nile Equatorial basins. Physics and Chemistry of the Earth, Parts A/B/C, 31(15), 919-927.

Pericchi, L. R., and Rodríguez-Iturbe, I. (1985). On the statistical analysis of floods. In A Celebration of Statistics (pp. 511-541). Springer, New York.

Phien, H. N., and Ajirajah, T. J. (1984). Applications of the log Pearson type-3 distribution in hydrology. Journal of Hydrology, 73(3), 359-372.

Puig, P., and Stephens, M. A. (2000). Tests of fit for the Laplace distribution, with applications. Technometrics, 42(4), 417-424.

Stephens, M. A. (1974). EDF statistics for goodness-of-fit, and some comparisons. Journal of the American Statistical Association, 69, 730-737.

Van Bladeren, D., Zawada, P. K., and Mahlangu, D. (2007). Statistical Based Regional Flood Frequency Estimation Study for South Africa Using Systematic, Historical and Palaeoflood Data: Pilot Study, Catchment Management Area 15. Water Research Commission.

Wikipedia. Flood. https://en.wikipedia.org/wiki/Flood

Win, N. L., and Win, K. M. (2014). Comparative study of flood frequency analysis on selected rivers in Myanmar. In InCIEC 2013 (pp. 287-299). Springer, Singapore.

Yiou, P., Ribereau, P., Naveau, P., Nogaj, M., and Brázdil, R. (2006). Statistical analysis of floods in Bohemia (Czech Republic) since 1825. Hydrological Sciences Journal, 51(5), 930-945.

