Validity investigation


Question: Please help me understand, with a simple explanation, how in practice I can go about collecting validity evidence via Rasch analysis with Winsteps to support the use of, and inferences from, my test.

Answer: There are many types of validity described in the literature, Nz, but they reduce to two main topics:

1. Construct validity: does the test measure what it is intended to measure?

2. Predictive validity: does the test produce measures which correspond to what we know about the persons?

Investigation of these validities is performed directly, by inspection of the results of the analysis (Rasch or Classical or ...), or indirectly, through correlations of the Rasch measures (or raw scores, etc.) with other numbers that are thought to be good indicators of what we want to measure.
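For the indirect route, here is a minimal Python sketch of such a correlation, assuming the person measures have been saved from the Winsteps person output file (PFILE=) to CSV; "criterion.csv" and its RATING column are hypothetical:

```python
# Minimal sketch: correlate Rasch person measures with an external indicator.
# Assumes the Winsteps person file (PFILE=) was saved as CSV with ENTRY and
# MEASURE columns; "criterion.csv" and its RATING column are hypothetical.
import pandas as pd
from scipy.stats import pearsonr

persons = pd.read_csv("pfile.csv")
criterion = pd.read_csv("criterion.csv")   # e.g., teacher ratings of the same persons

merged = persons.merge(criterion, on="ENTRY")   # match persons by entry number
r, p = pearsonr(merged["MEASURE"], merged["RATING"])
print(f"Rasch measures vs. external indicator: r = {r:.2f} (p = {p:.3f})")
```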

Question: That is, exactly what type of validity questions should I ask, and how can I answer them using Rasch analysis?

Answer: 1. Construct validity: we need a "construct theory" (i.e., some idea about our latent variable) - we need to state explicitly, before we do our analysis, what will be a more-difficult item, and what will be a less-difficult item.

Certainly we can all do that with arithmetic items: 2+2=? is easy. 567856+97765=? is hard.

If the Table 1 item map agrees with your statement, then the test has "construct validity": it is measuring what you intended to measure.
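One way to make this check concrete is to state the predicted difficulty order before calibration and then correlate it with the estimated item measures. A minimal Python sketch, assuming the Winsteps item output file (IFILE=) has been saved to CSV with NAME and MEASURE columns; the file name and item names are hypothetical:

```python
# Minimal sketch: compare the a-priori difficulty order with the estimated
# item difficulties. Assumes the Winsteps item file (IFILE=) was saved as
# CSV with NAME and MEASURE columns; file and item names are hypothetical.
import pandas as pd
from scipy.stats import spearmanr

# Predicted order stated BEFORE the analysis (1 = easiest).
predicted_rank = {"2+2": 1, "23+45": 2, "567x8": 3, "567856+97765": 4}

items = pd.read_csv("ifile.csv")
items["predicted"] = items["NAME"].map(predicted_rank)

rho, p = spearmanr(items["predicted"], items["MEASURE"])
print(f"Predicted vs. estimated difficulty order: Spearman rho = {rho:.2f}")
# High positive rho: the item map agrees with the construct theory.
```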

2. Predictive validity: we need to state explicitly, before we do our analysis, what will be the characteristics of a person with a higher measure, and what will be the characteristics of a person with a lower measure. And preferably code these into the person labels in our Winsteps control file.

For arithmetic, we expect older children, children in higher grades, children with better nutrition, children with fewer developmental or discipline problems, etc. to have higher measures. And the reverse for lower measures.

If the Table 1 person map agrees with your statement, then the test has "predictive validity": it is "predicting" what we expected it to predict. (In statistics, "predict" doesn't mean "predict the future"; it means predict some numbers obtained by other means.)
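A simple numerical companion to the person map is to compare mean measures across the groups coded into the person labels. A minimal Python sketch, assuming grade was coded as the first character of the person label and the person file (PFILE=) was saved to CSV; file and column names are hypothetical:

```python
# Minimal sketch: do children in higher grades obtain higher measures?
# Assumes grade was coded as the first character of the person label and
# the person file (PFILE=) was saved as CSV; names are hypothetical.
import pandas as pd

persons = pd.read_csv("pfile.csv")
persons["GRADE"] = persons["NAME"].str[0]   # grade code placed in the person label

print(persons.groupby("GRADE")["MEASURE"].agg(["mean", "std", "count"]))
# Mean measures should rise with grade if the test "predicts" as expected.
```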

Question: More specifically, when using, for example, DIF analysis via Winsteps, what type of validity question am I trying to answer?

Answer: 1. Construct validity: DIF implies that the item difficulty is different for different groups. The meaning of the construct has changed! Perhaps the differences are too small to matter. Perhaps omitting the DIF item will solve the problem. Perhaps making the DIF item into two items will solve the problem.

For instance, questions about "snow" change their difficulty. In polar countries they are easy. In tropical countries they are difficult. When we discover this DIF, we would define this as two different items, and so maintain the integrity of the "weather-knowledge" construct.

2. Predictive validity: DIF implies that the predictions made for one group of persons, based on their measures, differ from the predictions made for another group. Do the differences matter? Do we need separate measurement systems? ...
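Winsteps reports DIF in Table 30, but the core arithmetic is easy to see: the "DIF contrast" is the difference between an item's difficulty in the two groups, tested against the joint standard error. A minimal sketch of that logic, assuming separate group calibrations have been saved to CSV (file names hypothetical); a common rule of thumb flags a contrast of at least 0.5 logits with |t| >= 2:

```python
# Minimal sketch of the DIF logic: an item's difficulty in one group minus
# its difficulty in the other, tested against the joint standard error.
# Assumes separate calibrations per group were saved as CSV with ENTRY,
# MEASURE, and ERROR columns; file names are hypothetical.
import pandas as pd

polar = pd.read_csv("ifile_polar.csv")
tropical = pd.read_csv("ifile_tropical.csv")

dif = polar.merge(tropical, on="ENTRY", suffixes=("_polar", "_tropical"))
dif["contrast"] = dif["MEASURE_tropical"] - dif["MEASURE_polar"]
dif["t"] = dif["contrast"] / (dif["ERROR_polar"]**2 + dif["ERROR_tropical"]**2)**0.5

# Flag items whose DIF is both large and unlikely to be chance.
print(dif[(dif["contrast"].abs() >= 0.5) & (dif["t"].abs() >= 2)])
```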

Question: Similarly, when using fit statistics, dimensionality analysis, and the order of item difficulty, what type of validity questions am I attempting to answer via Winsteps?

Answer: They are the same questions every time, Nz. Construct Validity and Predictive Validity. Is there a threat to validity? Is it big enough to matter in a practical way? What is the most effective way of lessening or eliminating the threat?
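For reference, the fit statistics follow directly from the Rasch model's residuals, so the validity question they answer can be seen in a few lines of code. A self-contained Python sketch of infit and outfit mean-squares for dichotomous items (the data here are purely illustrative):

```python
# Minimal sketch: infit/outfit mean-squares for a dichotomous Rasch model,
# the same quantities Winsteps reports in its fit tables. Data illustrative.
import numpy as np

theta = np.array([-1.0, 0.0, 1.0, 2.0])   # person measures (logits)
b = np.array([-0.5, 0.5, 1.5])            # item difficulties (logits)
X = np.array([[1, 0, 0],                  # observed responses (persons x items)
              [1, 1, 0],
              [1, 1, 0],
              [1, 1, 1]])

P = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))   # expected scores
W = P * (1 - P)                                        # model variances
Z2 = (X - P) ** 2 / W                                  # squared standardized residuals

outfit = Z2.mean(axis=0)                               # unweighted mean-square, per item
infit = ((X - P) ** 2).sum(axis=0) / W.sum(axis=0)     # information-weighted mean-square

print("outfit MNSQ:", np.round(outfit, 2))
print("infit  MNSQ:", np.round(infit, 2))
# Mean-squares near 1.0 indicate the responses fit the Rasch model;
# values well above 1.0 signal noise that threatens construct validity.
```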