Subsets and connection ambiguities

WARNING: DATA ARE AMBIGUOUSLY CONNECTED INTO 5 SUBSETS. MEASURES MAY NOT BE COMPARABLE ACROSS SUBSETS

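Quick solution: add two dummy person records to the data file. Their responses alternate in opposite phase, so each dummy person succeeds wherever the other fails, which links every item to its neighbors.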

Dichotomous data:

Dummy person 1: responses: 010101010...

Dummy person 2: responses: 101010101...

Rating scale data, where "1" is the lowest category, and "9" is the highest category:

Dummy person 1: responses: 191919191...

Dummy person 2: responses: 919191919...
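The two dummy response strings can be generated for any test length. The following is a minimal sketch, assuming one-character response codes; the function name `dummy_records` is hypothetical, and the person-label columns required by the data file layout still have to be added by hand.

```python
def dummy_records(n_items, low="0", high="1"):
    """Return two alternating response strings in opposite phase.

    low / high are the lowest and highest category codes
    (assumed here to be single characters).
    """
    first = "".join(low if i % 2 == 0 else high for i in range(n_items))
    second = "".join(high if i % 2 == 0 else low for i in range(n_items))
    return first, second


# Dichotomous data, 9 items
print(dummy_records(9))
# Rating scale data with categories 1 to 9, 9 items
print(dummy_records(9, low="1", high="9"))
```

For 9 items this reproduces the strings shown above: 010101010 and 101010101 for dichotomous data, 191919191 and 919191919 for the rating scale.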

Explanation: Connectivity (or subsetting) is a concern in any data analysis involving missing data. In general,

nested data are not connected.

fully-crossed data (also called "complete data") are connected.

partially-crossed data may or may not be connected.

Winsteps examines the response strings for all the persons. It verifies that every non-extreme response string is linked into one network of success and failure on the items. Similarly, it verifies that the strings of responses to the items are linked into one network of success and failure by the persons.

If person response string A has a success on item 1 and a failure on item 2, and response string B has a failure on item 1 and a success on item 2, then A and B are connected. This examination is repeated for all pairs of response strings and all pairs of items. Gradually all the persons are connected with all the other persons, and all the items are connected with all the other items. But if some persons or some items cannot be connected in this way, then Winsteps reports a "connectivity" problem, and reports which subsets of items and persons are connected.
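The idea of linking through success and failure can be illustrated with a small check on dichotomous data. This is a simplified sketch, not Winsteps' own procedure: it assumes 1 = success, 0 = failure and None = not administered, draws a link from item i to item j whenever some person succeeded on i and failed on j, and groups items that can reach each other in both directions. The function name `item_subsets` is hypothetical.

```python
from itertools import permutations


def item_subsets(responses):
    """Group items that are linked in both directions by success/failure.

    responses: list of person response lists, one entry per item
               (1 = success, 0 = failure, None = not administered).
    """
    n_items = len(responses[0])
    # reach[i][j] is True when item i is linked to item j
    reach = [[i == j for j in range(n_items)] for i in range(n_items)]
    for person in responses:
        for i, j in permutations(range(n_items), 2):
            if person[i] == 1 and person[j] == 0:
                reach[i][j] = True
    # Transitive closure: follow chains of links (Warshall's algorithm)
    for k in range(n_items):
        for i in range(n_items):
            if reach[i][k]:
                for j in range(n_items):
                    if reach[k][j]:
                        reach[i][j] = True
    # Items belong to the same subset when each can reach the other
    subsets, seen = [], set()
    for i in range(n_items):
        if i in seen:
            continue
        group = [j for j in range(n_items) if reach[i][j] and reach[j][i]]
        seen.update(group)
        subsets.append(group)
    return subsets


# Persons A and B contradict each other on items 0 and 1, so those items link
print(item_subsets([[1, 0], [0, 1]]))
# Every person who differs succeeds on item 0 and fails on item 1: no link back
print(item_subsets([[1, 0], [1, 0]]))
```

In the second call nobody fails item 0 while succeeding on item 1, so the two items end up in separate groups.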

Example:

Dataset 1. The Russian students take the Russian items. This is connected. All the data are in one subset.

Dataset 2. The American students take the American items. This is connected. All the data are in one subset.

Dataset 3. Datasets 1 and 2 are put into one analysis. This is not connected. The data form two subsets: the Russian one and the American one. The raw scores or Rasch measures of the Russian students cannot be compared to those of the American students. For instance, if the Russian students score higher than the American students, are the Russian students more able or are the Russian items easier? The data cannot tell us which is true.
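A first screening for this kind of design problem is to check whether the persons and items share any links at all. The sketch below assumes a record of which items each person answered (the person and item names are invented for illustration) and uses a union-find structure to group persons and items that are tied together by at least one response. Sharing items is only a necessary condition: the linked data must also contain both successes and failures for the measures to be comparable.

```python
def design_blocks(administered):
    """Group persons and items linked by at least one response.

    administered: dict mapping each person to the set of items answered.
    Returns a list of sets of ("person", name) / ("item", name) nodes.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    for person, items in administered.items():
        for item in items:
            union(("person", person), ("item", item))

    blocks = {}
    for node in list(parent):
        blocks.setdefault(find(node), set()).add(node)
    return list(blocks.values())


# Hypothetical combined analysis: no student takes items from both tests
combined = {
    "Ivan": {"R1", "R2"}, "Olga": {"R1", "R2"},
    "Ann": {"A1", "A2"}, "Bob": {"A1", "A2"},
}
print(len(design_blocks(combined)))   # two separate blocks

# One bilingual student answering an item from each test joins the blocks
combined["Lena"] = {"R1", "A1"}
print(len(design_blocks(combined)))   # one block
```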

Winsteps attempts to estimate an individual measure for each person and item within one frame of reference. Usually this happens. But there are exceptions. The data may not be "well-conditioned" (Fischer G.H., Molenaar, I.W. (eds.) (1995) Rasch models: foundations, recent developments, and applications. New York: Springer-Verlag. p. 41-43).

See also: Fischer, G.H. (1981) On the existence and uniqueness of maximum-likelihood estimates in the Rasch model. Psychometrika, 46, 59-77.

Extreme scores (zero or minimum possible scores, and perfect or maximum possible scores) imply measures that are beyond the current frame of reference. Winsteps uses Bayesian logic to provide measures corresponding to those scores.

More awkward situations are shown in this dataset, Examsubs.txt:

Title = "Example of subset reporting"

Name1 = 1

Item1 = 10

NI = 10

CODES = 0123

GROUPS = 0

MUCON = 3 ; Subsetting causes very slow convergence

&End

Extreme

Subset 1

Subset 1

Subset 1

Subset 1

Subset 6

Subset 3

Subset 3

Subset 4

Subset 4

END LABELS

Extreme 100000

Subset 1 101001

Subset 1 110001

Subset 1 111011

Subset 1 111101

Subset 2 011

Subset 3 001

Subset 3 010

Subset 4 01

Subset 4 10

Subset 5 12322

Subset 5 13233

The Iteration Screen (Table 0) reports:

PROBING DATA CONNECTION: to skip out: Ctrl+F - to bypass: subset=no

>=====================================<

Consolidating 6 potential subsets ...

WARNING: DATA ARE AMBIGUOUSLY CONNECTED INTO 6 SUBSETS. MEASURES MAY NOT BE COMPARABLE ACROSS SUBSETS

SUBSET 1 OF 4 ITEM AND 4 PERSON

ITEM: 2-5

PERSON: 2-5

SUBSET 2 OF 1 PERSON

PERSON: 6

SUBSET 3 OF 2 ITEM AND 2 PERSON

ITEM: 7-8

PERSON: 7-8

SUBSET 4 OF 2 ITEM AND 2 PERSON

ITEM: 9-10

PERSON: 9-10

SUBSET 5 OF 2 PERSON

PERSON: 11-12

SUBSET 6 OF 1 ITEM

ITEM: 6

There are 10 items. The first item "Dropped" is answered in category 1 by all who responded to it. These are partial-credit items (Groups=0), so we don't know whether "1" is a high or low category. The item is dropped from the analysis. Then Person 1 becomes an extreme low person and is also excluded.
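The knock-on effect described here, where dropping an item turns a person into an extreme scorer, can be sketched as a repeated pruning pass. This is an illustrative simplification for dichotomous (0/1) data only, not a description of Winsteps internals; the function name `prune_extremes` and the small dataset are invented for the example.

```python
def prune_extremes(responses):
    """Repeatedly drop items and persons whose observed responses never vary.

    responses: dict person -> dict item -> observed response (0 or 1).
    Returns (dropped_items, dropped_persons) in the order they were dropped.
    """
    data = {person: dict(r) for person, r in responses.items()}
    dropped_items, dropped_persons = [], []
    changed = True
    while changed:
        changed = False
        items = {item for r in data.values() for item in r}
        for item in sorted(items):
            observed = {r[item] for r in data.values() if item in r}
            if len(observed) < 2:          # everyone gave the same response
                dropped_items.append(item)
                for r in data.values():
                    r.pop(item, None)
                changed = True
        for person in sorted(data):
            if len(set(data[person].values())) < 2:   # all 0s or all 1s
                dropped_persons.append(person)
                del data[person]
                changed = True
    return dropped_items, dropped_persons


# Item 1 is answered "1" by everyone who took it. Once it is dropped,
# person P1 is left with only failures and becomes extreme.
example = {
    "P1": {1: 1, 2: 0, 3: 0},
    "P2": {2: 1, 3: 0},
    "P3": {2: 0, 3: 1},
}
print(prune_extremes(example))
```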

After eliminating Item 1 and Person 1:

Subset 6: Item 6 "Guttman" has a Guttman pattern. It distinguishes those who succeeded on it from those who failed, with no contradiction to that distinction in the data. So there is an unknown logit distance between those who succeeded on Item 6 and those who failed on it. Consequently the difficulty of Item 6 is uncertain.

The remaining subsets have measures that can be estimated within the subset, but have unknown distance from the persons and items in the other subsets.

Under these circumstances, Winsteps reports one of an infinite number of possible solutions. Measure comparisons within subsets are correct, but across-subset measure comparisons are accidental. Fit statistics and standard errors are usually correct. Reliability coefficients are accidental, and so is Table 20, the score-to-measure table.

The subsets are shown in the Measure Tables:

TABLE 14.1 Example of subset reporting

ITEM STATISTICS: ENTRY ORDER

----------------------------------------------------------------------------------------------------------
|ENTRY   TOTAL  TOTAL           MODEL|   INFIT  |  OUTFIT  |PT-MEASURE |EXACT MATCH|        |            |
|NUMBER  SCORE  COUNT  MEASURE  S.E. |MNSQ  ZSTD|MNSQ  ZSTD|CORR.  EXP.| OBS%  EXP%|DISPLACE| ITEM     G |
|------------------------------------+----------+----------+-----------+-----------+--------+------------|
|     2      8      7     1.36    .76| .36  -1.3| .38  -1.1|  .94   .86| 83.3  65.6|     .06| Subset 1 0 | SUBSET 1
|     3      8      7     1.36    .76|1.01    .2| .92    .1|  .86   .86| 50.0  65.6|     .06| Subset 1 0 | SUBSET 1
|     4      6      7     2.06    .76| .37   -.9| .31   -.7|  .90   .84| 83.3  71.0|     .07| Subset 1 0 | SUBSET 1
|     5      6      7     2.06    .76| .37   -.9| .31   -.7|  .90   .84| 83.3  71.0|     .07| Subset 1 0 | SUBSET 1
|     6      4      8    -1.73    .90| .54  -1.2| .44  -1.0|  .84   .62|100.0  72.9|    -.20| Subset 6 0 | SUBSET 6
|     7      2      3    -3.25   1.24| .82   -.4| .74   -.3|  .50   .32| 66.7  61.5|    -.35| Subset 3 0 | SUBSET 3
|     8      2      3    -3.25   1.24| .82   -.4| .74   -.3|  .50   .32| 66.7  61.5|    -.35| Subset 3 0 | SUBSET 3
|     9      1      2      .69   1.42|1.00    .1|1.00    .1|  .00   .00| 50.0  52.1|    -.09| Subset 4 0 | SUBSET 4
|    10      1      2      .69   1.42|1.00    .1|1.00    .1|  .00   .00| 50.0  52.1|    -.09| Subset 4 0 | SUBSET 4
|------------------------------------+----------+----------+-----------+-----------+--------+------------|

A solution would be to anchor equivalent items (or equivalent persons) in the different subsets to the same values, or to adjust the anchor values so that the mean of each subset is the same (or meets whatever other criterion is appropriate). Alternatively, do separate analyses, or construct real or dummy data records which include 0 and 1 responses to all items.
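As an illustration of equalizing the subset means, the item measures reported in Table 14.1 can be shifted so that every subset has the same mean. This is a minimal sketch under the assumption that a common mean of 0 logits is the desired criterion; the function name `recentre_subsets` is hypothetical, and the resulting values would still have to be supplied to Winsteps as anchor values (for example through an item anchor file).

```python
from statistics import mean


def recentre_subsets(measures, subset_of, target=0.0):
    """Shift each subset's measures so that every subset mean equals target.

    measures:  dict entry number -> measure in logits
    subset_of: dict entry number -> subset number
    """
    by_subset = {}
    for entry, value in measures.items():
        by_subset.setdefault(subset_of[entry], []).append(value)
    shift = {s: target - mean(values) for s, values in by_subset.items()}
    return {e: round(m + shift[subset_of[e]], 2) for e, m in measures.items()}


# Item measures and subset numbers taken from Table 14.1 (Subsets 1, 3 and 4)
measures = {2: 1.36, 3: 1.36, 4: 2.06, 5: 2.06,
            7: -3.25, 8: -3.25, 9: 0.69, 10: 0.69}
subset_of = {2: 1, 3: 1, 4: 1, 5: 1, 7: 3, 8: 3, 9: 4, 10: 4}

for entry, anchor in recentre_subsets(measures, subset_of).items():
    print(entry, anchor)
```

Differences between items within a subset are unchanged by the shift; only the arbitrary offsets between subsets are replaced by an explicit, stated choice.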

Winsteps reports entry numbers for each person and each item in each subset, so that you can compare their response strings. To analyze only the items and persons in a particular subset, such as subset 4 above, specify the items and persons in the subset:

IDELETE= +9-10

PDELETE= +9-10