Missing data


One of Ben Wright's requirements for valid measurement, derived from the work of L.L. Thurstone, is that "Missing data must not matter." Of course, missing data always matter in the sense that they reduce the amount of statistical information available for the construction and quality control of measures. Further, if the missing data, intentionally or unintentionally, skew the measures (e.g., incorrect answers are coded as "missing responses"), then missing data definitely do matter. In general, however, data are missing essentially at random (by design or by accident) or in a way that has minimal impact on the estimated measures (e.g., in adaptive tests).

 

Winsteps does not require complete data in order to make estimates. One reason that Winsteps uses JMLE is that it is very flexible with respect to the data structures it can estimate. For each parameter (person, item or Rasch-Andrich threshold) the sufficient statistics are the marginal raw scores and the counts of the non-missing observations. During Winsteps estimation, the observed marginal counts and the observed and expected marginal scores are all computed from the same set of non-missing observations. Missing data are skipped in these summations. When required, Winsteps can compute an expected value for every observation (present or missing) for which the item and person estimates are known.
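To illustrate how missing data simply drop out of the sufficient statistics, here is a minimal Python sketch. It assumes a hypothetical data layout (rows are persons, columns are items, `None` marks a missing response) and 0/1 scoring; the function name is made up for the example and this is not Winsteps code.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout: rows are persons, columns are items, None = missing.
Matrix = Sequence[Sequence[Optional[int]]]


def marginal_statistics(data: Matrix) -> Tuple[List[int], List[int], List[int], List[int]]:
    """Return (person_scores, person_counts, item_scores, item_counts),
    summing over the non-missing observations only."""
    n_persons = len(data)
    n_items = len(data[0]) if n_persons else 0
    person_scores = [0] * n_persons
    person_counts = [0] * n_persons
    item_scores = [0] * n_items
    item_counts = [0] * n_items
    for n, row in enumerate(data):
        for i, x in enumerate(row):
            if x is None:  # missing: skipped in every summation
                continue
            person_scores[n] += x
            person_counts[n] += 1
            item_scores[i] += x
            item_counts[i] += 1
    return person_scores, person_counts, item_scores, item_counts


data = [
    [1, 0, None],
    [1, 1, 0],
    [None, 1, 1],
]
print(marginal_statistics(data))
# → ([1, 2, 2], [2, 3, 2], [2, 2, 1], [2, 3, 2])
```

Each person and item keeps its own count of non-missing observations, so a missing response lowers that count instead of being scored as 0.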

 

The basic estimation algorithm used by Winsteps is:

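Improved parameter estimate = current parameter estimate + (observed marginal score - expected marginal score) / (modeled variance of the expected marginal score)

For item difficulties the adjustment is subtracted rather than added, because a higher item score indicates an easier item.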

 

The observed and expected marginal scores are obtained by summing across the non-missing data only. The expected scores and their variances are computed from the Rasch model using the current set of parameter estimates; see RSA.
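The update rule can be sketched for the dichotomous Rasch model as follows. This is a simplified JMLE illustration under stated assumptions, not Winsteps' implementation: the function names, the one-logit limit on each step, the convergence tolerance and the centring of item difficulties at 0 are choices made for this example, and the data are assumed to contain no extreme (zero or perfect) scores. Item difficulties use the same update with the sign reversed, since a higher item score means an easier item.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[Optional[int]]]  # None = missing (hypothetical layout)


def p_success(ability: float, difficulty: float) -> float:
    """Dichotomous Rasch model: probability of a response of 1."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))


def jmle(data: Matrix, max_iter: int = 1000, tol: float = 1e-8) -> Tuple[List[float], List[float]]:
    n_persons, n_items = len(data), len(data[0])
    b = [0.0] * n_persons  # person abilities (logits)
    d = [0.0] * n_items    # item difficulties (logits)
    for _ in range(max_iter):
        biggest = 0.0
        # Person step: b <- b + (observed - expected) / variance, non-missing only
        for n in range(n_persons):
            obs = exp_score = var = 0.0
            for i in range(n_items):
                if data[n][i] is None:
                    continue  # missing data are skipped
                p = p_success(b[n], d[i])
                obs += data[n][i]
                exp_score += p
                var += p * (1.0 - p)
            step = max(-1.0, min(1.0, (obs - exp_score) / var))  # assumed 1-logit limit
            b[n] += step
            biggest = max(biggest, abs(step))
        # Item step: same form, sign reversed (higher score = easier item)
        for i in range(n_items):
            obs = exp_score = var = 0.0
            for n in range(n_persons):
                if data[n][i] is None:
                    continue
                p = p_success(b[n], d[i])
                obs += data[n][i]
                exp_score += p
                var += p * (1.0 - p)
            step = max(-1.0, min(1.0, (obs - exp_score) / var))
            d[i] -= step
            biggest = max(biggest, abs(step))
        # Fix the origin: centre items at 0; shifting b too keeps every b - d unchanged
        shift = sum(d) / n_items
        d = [x - shift for x in d]
        b = [x - shift for x in b]
        if biggest < tol:
            break
    return b, d


data = [
    [1, 0, 1, None],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [None, 1, 0, 1],
    [1, 0, None, 1],
]
abilities, difficulties = jmle(data)
print([round(x, 3) for x in abilities])
print([round(x, 3) for x in difficulties])
```

At convergence, each person's and each item's observed score equals its expected score, both summed over the same non-missing observations, which is why missing responses need no special treatment in the estimation.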

 

If data are missing, or observations are made, in such a way that measures cannot be constructed unambiguously in one frame of reference, then the message

WARNING: DATA MAY BE AMBIGUOUSLY CONNECTED INTO nnn SUBSETS

is displayed on the Iteration screen to warn of ambiguous connection.
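One simple way data can fail to connect is when the non-missing observations split persons and items into groups that share no observations, so the groups' measures cannot be placed in one frame of reference. The sketch below counts such groups with a union-find over the person-item graph. It is a hypothetical illustration of that one cause only, using `None` as an assumed missing-data marker; it is not Winsteps' subset-detection algorithm.

```python
from typing import List, Optional, Sequence

Matrix = Sequence[Sequence[Optional[int]]]  # None = missing (hypothetical layout)


def count_subsets(data: Matrix) -> int:
    """Count groups of persons and items linked by non-missing observations.
    A person or item with no observations at all forms a group by itself."""
    n_persons, n_items = len(data), len(data[0])
    # Nodes 0..P-1 are persons, P..P+I-1 are items.
    parent: List[int] = list(range(n_persons + n_items))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for n, row in enumerate(data):
        for i, x in enumerate(row):
            if x is not None:  # an observation links person n with item i
                ra, rb = find(n), find(n_persons + i)
                if ra != rb:
                    parent[ra] = rb
    return len({find(k) for k in range(n_persons + n_items)})


# Persons 0-1 answered only items 0-1; persons 2-3 answered only items 2-3.
data = [
    [1, 0, None, None],
    [0, 1, None, None],
    [None, None, 1, 0],
    [None, None, 0, 1],
]
print(count_subsets(data))  # → 2
```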

 


 

Missing data in Tables 23, 24: Principal Components Analysis.

 

For raw observations, missing data are treated as missing: pairwise deletion is used when computing the correlations.
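Pairwise deletion means that each correlation uses only the cases observed on both variables. A minimal sketch for the raw observations on two items, assuming `None` as the missing-data marker and a Pearson correlation; the function name is hypothetical:

```python
import math
from typing import Optional, Sequence


def pairwise_correlation(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> float:
    """Pearson correlation over the cases where both x and y are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")  # not enough shared cases
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # no variance among the shared cases
    return sxy / math.sqrt(sxx * syy)


item1 = [1, 2, None, 4, 5]
item2 = [2, 4, 6, None, 10]
r = pairwise_correlation(item1, item2)  # uses persons 1, 2 and 5 only
print(r)
```

Here only three persons answered both items, and on those three the relationship is exactly linear, so `r` is 1 up to floating-point rounding.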

For residuals, missing data are treated as 0, the expected value of a residual. This attenuates the contrasts but makes them estimable.
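The attenuation can be seen in a small hypothetical example. Standardized residuals are computed as (observed - expected) / sqrt(variance), with a missing observation set to 0; the expected values and variances below are made-up inputs, not Winsteps output, and a Pearson correlation is assumed.

```python
import math
from typing import List, Optional, Sequence


def standardized_residuals(obs: Sequence[Optional[float]], expected: Sequence[float],
                           variance: Sequence[float]) -> List[float]:
    """(observed - expected) / sqrt(variance); a missing observation gets 0,
    the expected value of a residual."""
    return [0.0 if x is None else (x - e) / math.sqrt(v)
            for x, e, v in zip(obs, expected, variance)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical dichotomous data: every expected value 0.5, every variance 0.25.
expected = [0.5] * 6
variance = [0.25] * 6
item_a = [1, 0, 1, 0, None, None]  # two persons missing
item_b = [1, 0, 1, 0, 1, 0]

za = standardized_residuals(item_a, expected, variance)
zb = standardized_residuals(item_b, expected, variance)
r_filled = pearson(za, zb)              # all six persons, missing set to 0
r_observed = pearson(za[:4], zb[:4])    # only the four persons with both responses
print(r_filled, r_observed)
```

The two items agree perfectly for the four persons who answered both (`r_observed` is 1), but the zero-filled residuals give a correlation of about 0.82, so the contrast is weakened while every correlation remains computable.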

You can try different methods for missing data by writing an IPMATRIX= of the raw data to a file, and then analyzing that file with your own statistical software.