Disordered rating categories

There is considerable debate in the Rasch community about the meaning of rating (or partial credit) scales and polytomies which exhibit "disorder". Look at Table 3.2, distractor/option analysis. Two types of disorder have been noticed:

(i) Disorder in the "average measures" of the categories can imply disorder in the category definitions.

In this example, from Linacre, J.M. (1999) Category Disordering vs. Step Disordering, Rasch Measurement Transactions 13:1 p. 675, "FIM™ Level" categories have been deliberately disordered in the data. This produces disordering of the "average measures" or "observed averages" (the average abilities of the people observed in each category) and also large mean-square fit statistics. The "scale structure measures", also called "step calibrations", "step difficulties", "step measures", "Rasch-Andrich thresholds", "deltas", "taus", etc., remain ordered.
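The check for disordered average measures can be sketched in a few lines. This is only an illustration, not how any particular Rasch program computes its output: the function names and the (category, person measure) pairs below are hypothetical, and the average is taken over person measures as described above.

```python
from collections import defaultdict

def average_measures(observations):
    """Mean person measure (logits) for each observed category.

    `observations` is an iterable of (category, person_measure) pairs.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, measure in observations:
        sums[category] += measure
        counts[category] += 1
    return {c: sums[c] / counts[c] for c in sorted(sums)}

def disordered_pairs(averages):
    """Adjacent category pairs whose average measure decreases."""
    cats = sorted(averages)
    return [(a, b) for a, b in zip(cats, cats[1:]) if averages[b] < averages[a]]

# Hypothetical data: categories 2 and 3 behave as if their definitions were swapped.
observations = [
    (1, -2.0), (1, -1.0),
    (2, 0.5), (2, 1.5),
    (3, -0.5), (3, 0.5),
    (4, 2.0), (4, 3.0),
]

averages = average_measures(observations)
print(averages)
print(disordered_pairs(averages))  # [(2, 3)]
```

A non-empty list of pairs flags categories whose average measures do not advance with the category number, which is the symptom described under (i).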

(ii) Disorder in the "step calibrations" or "disordered Rasch-Andrich thresholds" implies less frequently observed intermediate categories, i.e., that they correspond to narrow intervals on the latent variable.

In this example, the FIM categories are correctly ordered, but the frequency of level 2 has been reduced by removing some observations from the data. Average measures and fit statistics remain well behaved. The disordering in the "step calibrations" now reflects the relative infrequency of category 2. This infrequency is pictured in the plot of probability curves, which shows that category 2 is never a modal category in these data. The step calibration values do not indicate whether measurement would be improved by collapsing levels 1 and 2, or collapsing levels 2 and 3, relative to leaving the categories as they stand.
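The link between disordered thresholds and a category that is never modal can be illustrated with a minimal sketch of the Rasch-Andrich rating scale model. The threshold values, the function names, and the 0-based category numbering below are assumptions made for illustration; they are not the FIM estimates from the example, and the item difficulty is taken as 0 for simplicity.

```python
import math

def category_probabilities(theta, thresholds):
    """Category probabilities under the rating scale model.

    `theta` is the person measure relative to the item (logits) and
    `thresholds` are the Rasch-Andrich thresholds tau_1..tau_m.
    Category k has log-numerator sum_{j<=k} (theta - tau_j); category 0 has 0.
    """
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - tau))
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def modal_categories(thresholds, lo=-6.0, hi=6.0, steps=1201):
    """Categories that are the most probable somewhere on a grid of measures."""
    modal = set()
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        probs = category_probabilities(theta, thresholds)
        modal.add(probs.index(max(probs)))
    return sorted(modal)

ordered = [-1.5, 0.0, 1.5]      # hypothetical, thresholds advance
disordered = [0.5, -0.5, 1.5]   # hypothetical, tau_1 > tau_2

print(modal_categories(ordered))     # [0, 1, 2, 3]
print(modal_categories(disordered))  # [0, 2, 3]
```

With the disordered thresholds, category 1 (the category bounded by the two reversed thresholds) is never the most probable category at any measure, matching the probability-curve picture described above. The sketch says nothing about which adjacent categories, if any, should be collapsed.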