Continuous, decimal and percentage data


From a Rasch perspective, the relationship between a continuous variable (such as time to run 100 meters) and a Rasch latent variable (such as physical fitness) is always non-linear. Since we do not know the form of the non-linear transformation, we chunk the continuous variable into meaningful intervals, so that the difference between the means of the intervals is greater than the background noise. With percents, the intervals are rarely smaller than 10% wide, with special intervals for 0% and 100%. These chunked data can then be analyzed with a rating-scale or partial-credit model. We can then transform back to continuous-looking output using the item characteristic curve or the test characteristic curve.
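
For example, here is a minimal sketch in Python (not Winsteps syntax) of chunking 100-meter times into ordered categories. The times and cut-points are hypothetical; in practice the cut-points are chosen so that adjacent intervals differ by more than the background noise:

times = [11.8, 12.4, 13.1, 14.0, 15.6, 17.2]    # seconds to run 100 meters (illustrative values)
cuts = [12.5, 13.5, 15.0]                        # hypothetical interval boundaries
# a faster time means higher fitness, so faster runners get the higher category numbers
categories = [len(cuts) - sum(t >= c for c in cuts) for t in times]
print(categories)                                # [3, 3, 2, 1, 0, 0]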

 

Winsteps analyzes ordinal data expressed as integers (cardinal numbers) in the range 0-254, i.e., up to 255 ordered categories.

 

Example: The data are reported as 1.0, 1.25, 1.5, 1.75, 2.0, 2.5, ......

Winsteps only accepts integer data, so multiply all the ratings by 4.

If you want the score reports to look correct, then please use IWEIGHT=

IWEIGHT=*

1-100  0.25   ; 1-100 is the range of items you have (100 is the number of items); each is weighted 0.25

*
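
For instance, a minimal Python sketch (a hypothetical pre-processing step outside Winsteps) of converting the ratings to integers:

ratings = [1.0, 1.25, 1.5, 1.75, 2.0, 2.5]
codes = [int(round(r * 4)) for r in ratings]   # multiplying by 4 makes every rating an exact integer
print(codes)                                   # [4, 5, 6, 7, 8, 10]
# the IWEIGHT= of 0.25 above then makes the score reports look correct again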

 

Percentage and 0-100 observations:

Observations may be presented for Rasch analysis in the form of percentages in the range 0-100. These are straightforward computationally but are often awkward in other respects.

 

A typical specification is:

 

XWIDE = 3

CODES = "  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19+

        + 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39+

        + 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59+

        + 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79+

        + 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99100"

STKEEP = Yes ; to keep intermediate unobserved categories
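
Rather than typing the long CODES= string by hand, it can be generated, for example with this small Python sketch (an assumption about the workflow, not a Winsteps feature):

# build the XWIDE=3 CODES= string for the values 0-100, each right-justified in 3 columns
codes = "".join(f"{value:3d}" for value in range(101))
print('CODES = "' + codes + '"')   # split across lines with "+" continuations, as above, if preferred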

 

Since it is unlikely that all percentages will be observed, the rating (or partial credit) scale structure will be difficult to estimate. Since it is even more unlikely that there will be at least 10 observations of each percentage value, the structure will be unstable across similar datasets.

 

It is usually better from a measurement perspective (increased person "test" reliability, increased stability) to collapse percentages into shorter rating (or partial credit) scales, e.g., 0-10, using IREFER= and IVALUE= or NEWSCORE=.
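
A minimal sketch of such a collapsing in Python (the same recoding could instead be specified inside Winsteps with IREFER= and IVALUE= or NEWSCORE=):

def collapse_percent(p):
    """Collapse a 0-100 percentage into an 11-category rating scale, 0-10."""
    return p // 10    # 0-9 -> 0, 10-19 -> 1, ..., 90-99 -> 9, 100 -> 10

print([collapse_percent(p) for p in (0, 7, 45, 99, 100)])   # [0, 0, 4, 9, 10]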

 

Alternatively, model the 0-100 observations as 100 binomial trials. This imposes a structure on the rating scale so that unobserved categories are of no concern. This can be done by anchoring the Rasch-Andrich thresholds at the values: Fj = C * ln(j/(101-j)), or more generally, Fj = C * ln(j / (m-j+1)) where the range of observations is 0-m. Adjust the value of the constant C so that the average mean-square is 1.0.
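
As a worked sketch of the threshold formula in Python (C = 1.0 below is only a starting value; it is adjusted until the average mean-square is 1.0):

import math

m = 100    # observations range 0-m: here 100 binomial trials
C = 1.0    # starting value for the constant; adjust until the average mean-square is 1.0

# Rasch-Andrich thresholds Fj = C * ln(j / (m - j + 1)) for j = 1..m
F = {j: C * math.log(j / (m - j + 1)) for j in range(1, m + 1)}
print(round(F[1], 2), round(F[50], 2), round(F[100], 2))   # -4.61 -0.02 4.61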

 

Decimal observations:

When observations are reported in fractional or decimal form, e.g., 2.5 or 3.9, multiply them by suitable multipliers, e.g., 2 or 10, to bring them into exact integer form.
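
For example, a small Python sketch that finds the smallest whole-number multiplier and applies it (illustrative data; the Fraction/lcm approach is just one convenient way to do this):

from fractions import Fraction
from math import lcm                                              # Python 3.9+

observations = [2.5, 3.9, 4.0, 1.1]                               # illustrative decimal data
denominators = [Fraction(str(x)).denominator for x in observations]
multiplier = lcm(*denominators)                                   # smallest multiplier giving exact integers; here 10
integer_codes = [int(Fraction(str(x)) * multiplier) for x in observations]
print(multiplier, integer_codes)                                  # 10 [25, 39, 40, 11]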

 

Specify STKEEP=NO if the range of observed integer categories includes integers that cannot be observed.

 

Continuous and percentage observations:

These are of two forms:

(a) Very rarely, observations are already in the additive, continuous form of a Rasch variable. Since these are in the form of the measures produced by Winsteps, they can be compared and combined with Rasch measures using standard statistical techniques, in the same way that weight and height are analyzed.

 

(b) Observations are continuous or percentages, but they are not (or may not be) additive in the local Rasch context. Examples are "time to perform a task" and "weight lifted with the left hand". Though time and weight are reported in additive units, e.g., seconds and grams, their implications in the specific context are unlikely to be additive. "Continuous" data are an illusion. All data are discrete at some level. A major difficulty with continuous data is determining the precision of the data for this application. This indicates how big a change in the observed data constitutes a meaningful difference. For instance, time measured to .001 seconds is statistically meaningless in the Le Mans 24-hour car race - even though it may decide the winner!

 

To analyze these forms of data, segment them into ranges of observably different values. Identify each segment with a category number, and analyze these categories as rating scales. It is best to start with a few, very wide segments. If these produce good fit, then narrow the segments until no more statistical improvement is evident. The general principle is: if the data analysis is successful when the data are stratified into a few levels, then it may be successful if the data are stratified into more levels. If the analysis is not successful at a few levels, then more levels will merely be more chaotic. Signs of increasing chaos are increasing misfit, category "average measures" that no longer advance, and a reduction in the sample "test" reliability.
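
A minimal sketch of such stratification in Python, using equal-width segments as a hypothetical starting point (in practice the segment boundaries should reflect observably different values):

import numpy as np

def stratify(values, levels):
    """Recode continuous values into 'levels' equal-width ordered categories 0..levels-1."""
    edges = np.linspace(min(values), max(values), levels + 1)
    return np.digitize(values, edges[1:-1])    # interior edges give categories 0..levels-1

weights = [12.0, 18.5, 22.0, 27.5, 31.0, 40.0]   # e.g., weight lifted, in kilograms (illustrative)
print(stratify(weights, 2))   # start with a few, very wide segments
print(stratify(weights, 4))   # narrow only while fit and reliability remain good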

 

May I suggest that you start by stratifying your data into 2 levels? (You can use Excel to do this.) Then analyze the resulting 2-category data. Is a meaningful variable constructed? If the analysis is successful (e.g., average measures per category advance with reasonable fit and sample reliability), you could try stratifying into more levels.

 

Example: My dataset contains negative numbers such as "-1.60", as well as positive numbers such as "2.43". The range of potential responses is -100.00 to +100.00.

 

Winsteps expects integer data, where each advancing integer indicates one qualitatively higher level of performance (or whatever) on the latent variable. The maximum number of levels is 255 (integers 0-254). There are numerous ways in which data can be recoded. One is to use Excel. Read your data file into Excel. Its "Text to columns" feature in the "Data" menu may be useful. Then apply a transformation to the responses, for instance,

recoded response = integer ( (observed response - minimum response)*100 / (maximum response - minimum response) )

This yields integer data in the range 0-100, i.e., 101 levels. Set the Excel column width, and "Save As" the Excel file in ".prn" (formatted text) format. Or you can do the same thing in SAS or SPSS and then use the Winsteps SAS/SPSS menu.
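
The same transformation as a minimal Python sketch (as an alternative to Excel or SAS/SPSS; the minimum and maximum are the stated potential range, -100.00 to +100.00):

def recode(observed, minimum=-100.0, maximum=100.0):
    """Linearly rescale a response to an integer in the range 0-100, i.e., 101 levels."""
    return int((observed - minimum) * 100 / (maximum - minimum))

print(recode(-1.60), recode(2.43), recode(-100.00), recode(100.00))   # 49 51 0 100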

 

Example: Question: We want to use the index values of the indicators to construct a 'ruler'.

Answer: There are two approaches to this problem, depending on the meaning of the values:

 

1. If you consider that the values of the indicators are equivalent to "item difficulties", in the Rasch sense, then it is a matter of finding out their relationship to logits. For this, one needs some ordinal observational data for the indicators. Calibrate the observational data, then cross-plot the resulting indicator measures against their reference values. The best-fit line or simple curve gives the reference-value-to-logit conversion (a sketch of this cross-plot step follows approach 2 below).

 

or 2. If the values are the observations (like weights and heights), then it is a matter of transforming them into ordinal values, and then performing a Rasch analysis on them. The approach is to initially score the values dichotomously high-low (1-0) and see if the analysis makes sense. If so, stratify the values into 4: 3-2-1-0. If the results still make good sense, then into 6, then into 8, then into 10. At about this point, the random noise in the data will start to overwhelm the categorization, so that there will be empty categories and many "category average measures" out of sequence. So go back to the last good analysis. The model ICC will give the relationship between values and logits.
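
For approach 1, here is a sketch of the cross-plot step in Python (the numbers are placeholders; numpy's polyfit gives the best-fit straight line, and a simple curve could be fitted instead if the plot suggests one):

import numpy as np

reference_values = np.array([10.0, 25.0, 40.0, 55.0, 70.0])   # the indicators' published index values (placeholders)
rasch_measures   = np.array([-1.8, -0.6,  0.1,  0.9,  2.0])   # their logit calibrations from the ordinal data

slope, intercept = np.polyfit(reference_values, rasch_measures, 1)
print(f"logits = {slope:.3f} * reference value + {intercept:.3f}")   # the reference-value-to-logit conversion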