Dimensionality: when is a test multidimensional?
For more discussion see dimensionality and contrasts.
Beware of:
1. Accidents in the data generating spurious dimensions.
2. Content strands within a bigger content area ("addition" and "subtraction" within "arithmetic") generating generally inconsequential dimensions.
3. Demographic groups within the person sample differentially influencing item difficulty, thus creating a reference-group item dimension and a focal-group item dimension (e.g., native and second-language speakers on a language test).
"Variance explained" depends on the spread of the item and person measures. Please see http://www.rasch.org/rmt/rmt221j.htm - For dimensionality analysis, we are concerned about the "Variance explained by the first contrast in the residuals". If this is big, then there is a second dimension at work. Infit and Outfit statistics are too local (one item or one person at a time) to detect multidimensionality productively. They are too much influenced by accidents in the data (e.g., guessing, response sets), and generally do not detect the more subtle, but pervasive, impact of a second dimension (unless it is huge).
Question: "I can not understand the residual contrast analysis you explained. For example, in Winsteps, it gave me the five contrasts' eigenvalues: 3.1, 2.4, 1.9, 1.6, 1.4. (I have 26 items in this data). The result is the same as when I put the residuals into SPSS."
Reply:
Unidimensionality is never perfect. It is always approximate. From the data, the Rasch model constructs parameter estimates along the unidimensional latent variable that best concur with those data. But, though the Rasch measures are always unidimensional and additive, their concurrence with the data is never perfect. Imperfection results from multidimensionality in the data and other causes of misfit.
Multidimensionality always exists, to a greater or lesser extent. The vital question is: "Is the multidimensionality in the data big enough to merit dividing the items into separate tests, or constructing new tests, one for each dimension?"
The unexplained variance in a data set is the variance of the residuals. Each item is modeled to contribute 1 unit of information (= 1 eigenvalue) to the principal components decomposition of residuals. So the eigenvalue of the total unexplained variance is the number of items (less any items with extreme scores). So when a component (contrast) in the decomposition is of size 3.1, it has the information (residual variance) of about 3 items.
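Here is a minimal numerical sketch of this accounting (illustrative Python, not Winsteps' own algorithm): random noise stands in for the residuals of 23 items, and a shared off-dimension factor is planted in the other 3 items, so the first contrast approaches an eigenvalue of 3 while the eigenvalues still sum to the number of items.

```python
# Illustrative sketch only: principal components decomposition of
# standardized residuals. All data here are simulated assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 500, 26
resid = rng.standard_normal((n_persons, n_items))   # noise-only residuals
second_dim = rng.standard_normal(n_persons)
resid[:, :3] += 3.0 * second_dim[:, None]           # plant an off-dimension in 3 items

corr = np.corrcoef(resid, rowvar=False)             # item-by-item residual correlations
eigenvalues = np.linalg.eigvalsh(corr)[::-1]        # largest first

# Each item contributes 1 eigenvalue unit, so the total equals the item count.
print(f"total unexplained variance: {eigenvalues.sum():.0f} units")
print(f"first contrast eigenvalue:  {eigenvalues[0]:.2f}")   # close to 3
```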
In your example, the first contrast has an eigenvalue of 3.1. This means that the contrast between the strongly positively loading items and the strongly negatively loading items on the first contrast in the residuals has the strength of about 3 items. Since positive and negative loadings are arbitrary, you must look at the items at the top and the bottom of the contrast plot. Are those items substantively different? Are they so different that they merit the construction of two separate tests?
It may be that two or three off-dimension items have been included in your 26-item instrument and should be dropped. But this is unusual for a carefully developed instrument. It is more likely that you have a "fuzzy" or "broad" dimension, like mathematics. Mathematics includes arithmetic, algebra, geometry and word problems. Sometimes we want a "geometry test". But, for most purposes, we want a "math test".
If in doubt, split your 26 items into two subtests, based on positive and negative loadings on the first residual contrast. Measure everyone on the two subtests. Cross-plot the measures. What is their correlation? Do you see two versions of the same story about the persons, or two different stories? Which people are off-diagonal? Is that important? If only a few people are noticeably off-diagonal, or off-diagonal deviance would not lead to any action, then you have a substantively unidimensional test.
A straightforward way to obtain the correlation is to write out a PFILE= output file for each subtest. Read the measures into Excel and have it produce their Pearson correlation. If R1 and R2 are the reliabilities of the two subtests, and C is the correlation of their ability estimates reported by Excel, then their latent (error-disattenuated) correlation approximates C / sqrt(R1*R2). If this approaches 1.0, then the two subtests are statistically telling the same story. But you may have a "Fahrenheit-Celsius" equating situation if the best-fit line on the plot departs from a unit slope.
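As a sketch of this arithmetic (the measures and reliabilities below are invented for illustration; in practice the measures come from the two PFILE= files and the reliabilities from Winsteps output):

```python
# Hedged sketch: error-disattenuated correlation of two subtest measures.
# All numbers below are invented placeholders, not real test data.
import numpy as np

m1 = np.array([-1.2, -0.5, 0.1, 0.4, 0.9, 1.6, 2.0, 2.3])  # subtest 1 measures
m2 = np.array([-1.0, -0.7, 0.3, 0.2, 1.1, 1.4, 2.2, 2.1])  # subtest 2 measures
r1, r2 = 0.85, 0.80                                        # subtest reliabilities

c = np.corrcoef(m1, m2)[0, 1]        # observed Pearson correlation, C
latent = c / np.sqrt(r1 * r2)        # disattenuated correlation, C / sqrt(R1*R2)

print(f"observed C = {c:.2f}, disattenuated = {latent:.2f}")
```

A disattenuated value near (or, because of estimation error, even slightly above) 1.0 suggests the two subtests are statistically telling the same story.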
You can do a similar investigation for the second contrast of size 2.4, and third of size 1.9, but each time the motivation for doing more than dropping an off-dimension item or two becomes weaker. Since random data can have eigenvalues of size 1.4, there is little motivation to look at your 5th contrast.
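To calibrate what "random data" eigenvalues look like for your own sample size, a quick simulation helps (again an illustrative assumption, not a Winsteps computation); the typical first-contrast size grows as the ratio of items to persons grows.

```python
# Illustrative baseline: first-contrast eigenvalues from purely random residuals.
import numpy as np

rng = np.random.default_rng(1)
n_persons, n_items, n_trials = 500, 26, 200
firsts = []
for _ in range(n_trials):
    resid = rng.standard_normal((n_persons, n_items))
    corr = np.corrcoef(resid, rowvar=False)
    firsts.append(np.linalg.eigvalsh(corr)[-1])   # largest eigenvalue

print(f"random-data first contrast: mean {np.mean(firsts):.2f}, max {np.max(firsts):.2f}")
```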
"Variance explained" is a newly developing area in Rasch methodology. We learn something new about it every month or so. Perhaps you will contribute to this. So there are no rules, only tentative guidelines based on the current state of theory and empirical experience.
1. Originally, Winsteps implemented 3 algorithms for computing variance explained. Most people used the default algorithm (based on standardized residuals). User experience indicated that one of the other two algorithms (based on raw residuals) was much more accurate in apportioning explained and unexplained variance, so in the current version of Winsteps it has become the algorithm for this part of the computation. All three algorithms (raw residuals, standardized residuals and logit residuals) are still implemented for decomposing the unexplained variance into contrasts, and for that part of the computation the default remains the standardized residuals.
2. www.rasch.org/rmt/rmt221j.htm shows the expected decomposition of raw variance into explained variance and unexplained variance under different conditions.
Since the rules are only guidelines, please always verify their applicability in your particular situation. A meaningful way of doing this is to compute the person measures for each of what may be the two biggest dimensions in the data, and then to cross-plot those measures. Are the differences between the measures big enough, and pervasive enough, to be classified as "two dimensions" (and perhaps reported separately), or are they merely a minor perturbation in the data? For instance, in arithmetic, word problems, abstract problems and concrete problems have different response profiles (and so noticeable contrasts), but they are rarely treated as different "dimensions".
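A sketch of that cross-plot check (simulated measures and an invented off-diagonal cut-off, purely for illustration):

```python
# Illustrative cross-plot of person measures from two candidate dimensions.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
dim1 = rng.normal(0, 1, 200)              # person measures on dimension 1
dim2 = dim1 + rng.normal(0, 0.3, 200)     # dimension 2: mostly the same story

off = np.abs(dim2 - dim1) > 0.6           # invented cut-off, not a Rasch rule
plt.scatter(dim1, dim2, c=np.where(off, "red", "gray"), s=12)
plt.plot([-3, 3], [-3, 3], "k--", lw=1)   # identity line for comparison
plt.xlabel("Measure on subtest 1 (logits)")
plt.ylabel("Measure on subtest 2 (logits)")
plt.title(f"{off.sum()} of 200 persons noticeably off-diagonal")
plt.show()
```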