FORMAT= reformat data
FORMAT= enables you to process awkwardly formatted data, but MFORMS= is easier to use.
FORMAT= is rarely needed when there is one data line per person.
Place the data in a separate DATA= file; the Winsteps screen will then show the first record before and after FORMAT= is applied. The formatted data records can also be displayed from the Edit menu, Formatted Data=.
For example, these control instructions pick out every other character of 25 two-character responses, then a blank, and then the person label:
XWIDE=1
data=datafile.txt
format=(T2,25(1A,1X),T90,1A,Tl1,30A)
This displays on the Winsteps screen:
Opening: datafile.txt
Input Data Record before FORMAT=:
         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
----------------------------------------------------------------------
01xx 1x1 10002000102020202000201010202000201000200ROSSNER, MARC DANIEL
Input Data Record after FORMAT=:
1x11102012222021122021020 L
^I                      ^N^P
^I is the ITEM1= column
^N is the last item according to NI=
^P is the NAME1= column
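As a cross-check on the screen display above, the T2,25(1A,1X) part of this FORMAT= behaves like a stepped Python slice over the record. This sketch covers only the response string; it does not model the T90,1A,Tl1,30A part that builds the person label.

```python
record = "01xx 1x1 10002000102020202000201010202000201000200ROSSNER, MARC DANIEL"

# T2 moves to column 2 (Python index 1). Each 1A reads one column and each 1X
# skips the next, so 25(1A,1X) reads columns 2, 4, ..., 50: indices 1, 3, ..., 49.
responses = record[1:50:2]
print(responses)       # 1x11102012222021122021020
print(len(responses))  # 25
```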
FORMAT= enables you to reformat one or more data record lines into one new line in which all the component parts of the person information are in one person-id field, and all the responses are put together into one continuous item-response string. A FORMAT= statement is required if:
1) each person's responses take up several lines in your data file.
2) the length of a single line in your data file is more than 10000 characters.
3) the person-id field or the item responses are not in one continuous string of characters.
4) you want to rearrange the order of the items in your data record, to pick out sub-tests, or to move a set of connected forms into one complete matrix.
5) you want to analyze only the responses of every second, or nth, person.
FORMAT= contains up to 512 characters of reformatting instructions, enclosed in (..), which follow special rules. The instructions are:
nA      Read in n characters starting with the current column, and then advance to the next column after them. Processing starts from column 1 of the first line, so 5A reads in 5 characters and advances to the sixth column.
nX      Skip over n columns. E.g., 5X means bypass this column and the next 4 columns.
Tc      Go to column c. T20 means get the next character from column 20.
TLc     Go c columns to the left. TL20 means get the next character from the column 20 columns to the left of the current position.
TRc     Go c columns to the right. TR20 means get the next character from the column 20 columns to the right of the current position.
/       Go to column 1 of the next line in your data file.
n(..)   Repeat the string of instructions within the () exactly n times.
,       A comma separates the instructions.
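The instruction rules above can be sketched as a small interpreter. The following Python is a minimal, hypothetical illustration, not Winsteps code: `apply_format` and `split_top_level` are invented names, only the XWIDE=1 instructions (nA, nX, Tc, TLc, TRc, /, n(..)) are handled, reading past the end of a line is assumed to yield blanks, and the caller is assumed to pass exactly the lines that belong to one person.

```python
import re


def split_top_level(body: str) -> list[str]:
    """Split FORMAT= instructions on commas that are not nested inside (..)."""
    parts, depth, current = [], 0, []
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def apply_format(fmt: str, lines: list[str]) -> str:
    """Apply an XWIDE=1 FORMAT= string to the lines of one person (hypothetical helper)."""
    fmt = fmt.strip()
    if fmt.upper().startswith("FORMAT="):
        fmt = fmt[len("FORMAT="):].strip()
    if not (fmt.startswith("(") and fmt.endswith(")")):
        raise ValueError("FORMAT= instructions must be enclosed in (..)")
    out: list[str] = []
    pos = {"line": 0, "col": 0}  # current line index and 0-based column

    def char_at(line: int, col: int) -> str:
        # Assumption: reading beyond the end of a line (or the last line) gives a blank.
        if line < len(lines) and col < len(lines[line]):
            return lines[line][col]
        return " "

    def run(body: str) -> None:
        for token in split_top_level(body):
            group = re.fullmatch(r"(\d*)\((.*)\)", token, re.S)
            if group:  # n(..): repeat the enclosed instructions n times
                for _ in range(int(group.group(1) or 1)):
                    run(group.group(2))
                continue
            t = token.upper().replace(" ", "")
            if t == "/":  # go to column 1 of the next line
                pos["line"] += 1
                pos["col"] = 0
            elif m := re.fullmatch(r"(\d*)A", t):  # nA: read n characters
                for _ in range(int(m.group(1) or 1)):
                    out.append(char_at(pos["line"], pos["col"]))
                    pos["col"] += 1
            elif m := re.fullmatch(r"(\d*)X", t):  # nX: skip n columns
                pos["col"] += int(m.group(1) or 1)
            elif m := re.fullmatch(r"TL(\d+)", t):  # TLc: move c columns left
                pos["col"] = max(pos["col"] - int(m.group(1)), 0)
            elif m := re.fullmatch(r"TR(\d+)", t):  # TRc: move c columns right
                pos["col"] += int(m.group(1))
            elif m := re.fullmatch(r"T(\d+)", t):  # Tc: go to column c
                pos["col"] = int(m.group(1)) - 1
            else:
                raise ValueError(f"unsupported instruction: {token!r}")

    run(fmt[1:-1])
    return "".join(out)


if __name__ == "__main__":
    # Skip 4 columns, read 56 responses, then a 20-character name.
    record = ("xxxxDCBDABCADCDBACDADABDADCDADDCCDADDCAABCADCCBBDADCACDBBADC"
              "Zarathrustra-Xerxes")
    print(apply_format("FORMAT=(4X,56A,20A)", [record]))
```

Nested groups such as 4(T25,20A,/) are handled by recursion, so the same function covers single-line and multi-line layouts for one person at a time.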
Set XWIDE=2 and you can reformat your data from original 1- or 2-column entries. All your data will then be analyzed as XWIDE=2. Then:
nA2     Read in n pairs of characters, starting with the current column, into n 2-character fields of the formatted record. (For responses with a width of 2 columns.)
nA1     Read in n 1-character columns, starting with the current column, into n 2-character fields of the formatted record.
Always use nA1 for person-id information. Use nA1 for responses entered with a width of 1 character when there are also 2-character responses to be analyzed. When responses in 1-character format are converted into 2-character field format (compatible with XWIDE=2), the 1-character response is placed in the first (left) character position of the 2-character field, and the second (right) character position of the field is left blank. For example, the 1-character code "A" becomes the 2-character field "A ". Valid 1-character responses of "A", "B", "C", "D" must be indicated by CODES="A B C D " with a blank following each letter.
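The widening rule above, and the CODES= value that results from it, can be sketched in Python. The function names are hypothetical; this only illustrates how 1-character codes become 2-character fields while 2-character codes are kept as they are.

```python
def to_two_char_field(code: str) -> str:
    """Widen a response code to the 2-character field used with XWIDE=2."""
    if len(code) == 1:
        return code + " "  # left position holds the code, right position is blank
    if len(code) == 2:
        return code        # 2-character codes are used as they are
    raise ValueError(f"response codes must be 1 or 2 characters: {code!r}")


def codes_value(codes: list[str]) -> str:
    """Concatenate widened codes into an XWIDE=2 CODES= value."""
    return "".join(to_two_char_field(code) for code in codes)


print(repr(codes_value(["A", "B", "C", "D"])))                    # 'A B C D '
print(repr(codes_value(["A", "B", "C", "D", "01", "02", "99"])))  # 'A B C D 010299'
```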
ITEM1= must be the column number of the first item response in the formatted record created by the FORMAT= statement. NAME1= must be the column number of the first character of the person-id in the formatted record.
Example 1: Each person's data record is 80 characters long and takes up one line in your data file. The person-id is in columns 61-80. The 56 item responses are in columns 5-60. Codes are "A", "B", "C", "D". No FORMAT= is needed. Data look like:
xxxxDCBDABCADCDBACDADABDADCDADDCCDADDCAABCADCCBBDADCACDBBADCZarathrustra-Xerxes
Without FORMAT=
XWIDE=1      ; response width (the standard)
ITEM1=5      ; start of item responses
NI=56        ; number of items
NAME1=61     ; start of name
NAMLEN=20    ; length of name
CODES=ABCD   ; valid response codes
With FORMAT=
Reformatted record will look like:
DCBDABCADCDBACDADABDADCDADDCCDADDCAABCADCCBBDADCACDBBADCZarathrustra-Xerxes
XWIDE=1               ; response width (the standard)
FORMAT=(4X,56A,20A)   ; skip unused characters
ITEM1=1               ; start of item responses
NI=56                 ; number of items
NAME1=57              ; start of name
NAMLEN=20             ; length of name
CODES=ABCD            ; valid response codes
Example 2: Each data record is one line of 80 characters. The person-id is in columns 61-80. The 28 item responses are in columns 5-60, each 2 characters wide. Codes are " A", " B", " C", " D". No FORMAT= is necessary. Data look like:
xxxx C D B A C B C A A D D D D C D D C A C D C B A C C B A CZarathrustra-Xerxes
Without FORMAT=
XWIDE=2            ; response width
ITEM1=5            ; start of item responses
NI=28              ; number of items
NAME1=61           ; start of name
NAMLEN=20          ; length of name
CODES=" A B C D"   ; valid response codes
With FORMAT=
Columns of reformatted record:
1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-90123456789012345678
 C D B A C B C A A D D D D C D D C A C D C B A C C B A CZarathrustra-Xerxes
XWIDE=2                 ; response width
FORMAT=(4X,28A2,20A1)   ; skip unused characters
ITEM1=1                 ; start of item responses in formatted record
NI=28                   ; number of items
NAME1=29                ; start of name in "columns"
NAMLEN=20               ; length of name
CODES=" A B C D"        ; valid response codes
Example 3: Each person's data record is 80 characters long and takes one line in your data file. The person-id is in columns 61-80. The 30 1-character item responses, "A", "B", "C" or "D", are in columns 5-34, and the 13 2-character item responses, "01", "02" or "99", are in columns 35-60.
xxxxDCBDABCADCDBACDADABDADCDADDCCA01990201019902010199020201Zarathrustra-Xerxes.
becomes on reformatting:
Columns:
1234567890123456789012345678901-2-3-4-5-6-7-8-9-0-1-2-3-45678901234567890123
DCBDABCADCDBACDADABDADCDADDCCA01990201019902010199020201Zarathrustra-Xerxes
XWIDE=2                      ; analyzed response width
FORMAT=(4X,30A1,13A2,20A1)   ; skip unused
ITEM1=1                      ; start of item responses in formatted record
NI=43                        ; number of items
NAME1=44                     ; start of name
NAMLEN=20                    ; length of name
CODES="A B C D 010299"       ; valid responses: each 1-character code is followed by a blank
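Reading the ruler lines in Examples 2 and 3, each formatted field (an A2 pair, or an A1 character widened to a 2-character field) occupies one "column" when ITEM1= and NAME1= are counted under XWIDE=2. Under that reading, the starting columns can be computed by accumulating field counts. `field_starts` is a hypothetical helper, and the 4X skip is left out because it produces no fields.

```python
def field_starts(segments: list[tuple[str, int]]) -> dict[str, int]:
    """Return the 1-based starting "column" of each segment of a formatted record.

    Assumes each formatted field counts as one column under XWIDE=2, as the
    ruler lines in Examples 2 and 3 suggest.
    """
    starts: dict[str, int] = {}
    next_column = 1
    for label, n_fields in segments:
        starts[label] = next_column
        next_column += n_fields
    return starts


# Example 3: FORMAT=(4X,30A1,13A2,20A1); the 4X skip produces no fields.
layout = field_starts([("items", 30), ("two-char items", 13), ("name", 20)])
print(layout)  # {'items': 1, 'two-char items': 31, 'name': 44}
```

The result agrees with ITEM1=1, NI=30+13=43 and NAME1=44 above.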
Example 4: The person-id is 10 columns wide in columns 15-24, and the 50 1-column item responses, "A", "B", "C", "D", are in columns 4000-4019, then in 4021-4050. Data look like:
xxxxxxxxxxxxxxJohn-Smithxxxx....xxxDCBACDADABCADCBCDABDxBDCBDADCBDABDCDDADCDADBBDCDABB
becomes on reformatting:
John-SmithDCBACDADABCADCBCDABDBDCBDADCBDABDCDDADCDADBBDCDABB
FORMAT=(T15,10A,T4000,20A,1X,30A)
NAME1=1      ; start of person name in formatted record
NAMLEN=10    ; length of name (automatic)
ITEM1=11     ; start of items in formatted record
NI=50        ; 50 item responses
CODES=ABCD   ; valid response codes
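In Example 4, each T and A instruction corresponds to a slice of the input record. The sketch below is a hypothetical Python equivalent of this one FORMAT=, written to show the 1-based column arithmetic. The padding step stands in for the assumption that FORMAT= reads blanks past the end of a record, and the filler in the sample record is made up.

```python
def reformat_example4(record: str) -> str:
    """Python-slice equivalent of FORMAT=(T15,10A,T4000,20A,1X,30A)."""
    # Python slices stop at the end of a string, so pad to 4050 characters
    # to mimic blanks beyond the end of the record.
    record = record.ljust(4050)
    # Column c (1-based) is Python index c-1, so columns a..b are record[a-1:b].
    name = record[14:24]        # T15,10A   -> columns 15-24
    first = record[3999:4019]   # T4000,20A -> columns 4000-4019
    second = record[4020:4050]  # 1X skips column 4020; 30A -> columns 4021-4050
    return name + first + second


# A synthetic record laid out like Example 4.
record = ("x" * 14 + "John-Smith" + "x" * 3975
          + "DCBACDADABCADCBCDABD" + "x" + "BDCBDADCBDABDCDDADCDADBBDCDABB")
print(reformat_example4(record))
# John-SmithDCBACDADABCADCBCDABDBDCBDADCBDABDCDDADCDADBBDCDABB
```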
Example 5: There are five records, or lines, in your data file per person. There are 100 items. Items 1-20 are in columns 25-44 of the first record; items 21-40 are in columns 25-44 of the second record, etc. The 10-character person-id is in columns 51-60 of the last (fifth) record. Codes are "A", "B", "C", "D". Data look like:
xxxxxxxxxxxxxxxxxxxxxxxxACDBACDBACDCABACDACD
xxxxxxxxxxxxxxxxxxxxxxxxDABCDBACDBACDCABACDA
xxxxxxxxxxxxxxxxxxxxxxxxACDBACDBACDCABACDACD
xxxxxxxxxxxxxxxxxxxxxxxxDABCDBACDBACDCABACDA
xxxxxxxxxxxxxxxxxxxxxxxxABCDBACDBACDCABACDADxxxxxxMary-Jones
becomes:
ACDBACDBACDCABACDACDDABCDBACDBACDCABACDAACDBACDBACDCABACDACDDABCDBACDBACDCABACDAABCDBACDBACDCABACDADMary-Jones
FORMAT=(4(T25,20A,/),T25,20A,T51,10A)
ITEM1=1      ; start of item responses
NI=100       ; number of item responses
NAME1=101    ; start of person name in formatted record
NAMLEN=10    ; length of person name
CODES=ABCD   ; valid response codes
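The effect of Example 5's FORMAT= on a multi-line data file can be mirrored in Python by taking the lines five at a time. This is a hypothetical sketch that assumes every person has exactly five lines, in order, with nothing between persons; `reformat_example5` is an invented name.

```python
def reformat_example5(lines: list[str]) -> list[str]:
    """Combine each person's 5 lines as FORMAT=(4(T25,20A,/),T25,20A,T51,10A) does."""
    if len(lines) % 5:
        raise ValueError("expected exactly 5 lines per person")
    records = []
    for start in range(0, len(lines), 5):
        # Pad so that columns beyond the end of a line read as blanks.
        block = [line.ljust(60) for line in lines[start:start + 5]]
        responses = "".join(line[24:44] for line in block)  # columns 25-44 of every line
        name = block[4][50:60]                               # columns 51-60 of line 5
        records.append(responses + name)
    return records


data = [
    "x" * 24 + "ACDBACDBACDCABACDACD",
    "x" * 24 + "DABCDBACDBACDCABACDA",
    "x" * 24 + "ACDBACDBACDCABACDACD",
    "x" * 24 + "DABCDBACDBACDCABACDA",
    "x" * 24 + "ABCDBACDBACDCABACDAD" + "x" * 6 + "Mary-Jones",
]
print(reformat_example5(data))
```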
Example 6: There are three lines per person. In the first line, columns 31 to 50 hold 10 item responses, each 2 columns wide. The person-id is in the second line, in columns 5 to 17. The third line is to be skipped. Codes are "A ", "B ", "C ", "D ". Data look like:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx A C B D A D C B A Dxxxxxxxx
xxxxJoseph-Carlosxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
becomes:
Columns:
1-2-3-4-5-6-7-8-9-0-1234567890123
A C B D A D C B A DJoseph-Carlos
FORMAT=(T31,10A2,/,T5,13A1,/)
ITEM1=1            ; start of item responses
NI=10              ; number of items
XWIDE=2            ; 2 columns per response
NAME1=11           ; starting "A" of person name
NAMLEN=13          ; length of person name
CODES='A B C D '   ; valid response codes
If the third line is not skipped, read a redundant extra column from that last line. Replace the FORMAT= instruction above with:
FORMAT=(T31,10A2,/,T5,13A1,/,A1)   ; last A1 unused
Example 7: Pseudo-random data selection
To skip every other record, use (for most situations):
FORMAT=(500A, /) ; skips every second record of two
or
FORMAT=(/, 500A) ; skips every first record of two
You have a file with 1,000 person records. This time you want to analyze every 10th record, beginning with the 3rd person in the file, i.e., skip two records, analyze one record, skip seven records, and so on. The data records are 500 characters long.
XWIDE = 1
FORMAT = (/,/,500A,/,/,/,/,/,/,/)
or
XWIDE = 2
FORMAT = (/,/,100A2,300A1,/,/,/,/,/,/,/) ; 100 2-character responses, 300 other columns
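The slash patterns in Example 7 select records in a fixed cycle, which corresponds to a stepped slice over the list of records. A minimal Python sketch, assuming the records have already been read into a list; the record contents and the helper name are made up for illustration.

```python
def select_every_nth(records: list[str], n: int, first: int) -> list[str]:
    """Keep records first, first + n, first + 2n, ... (first is 1-based)."""
    return records[first - 1::n]


records = [f"person {i:04d}" for i in range(1, 1001)]  # stand-ins for 1,000 records

every_10th = select_every_nth(records, 10, 3)  # like FORMAT=(/,/,500A,/,/,/,/,/,/,/)
odd_ones = select_every_nth(records, 2, 1)     # like FORMAT=(500A, /)
even_ones = select_every_nth(records, 2, 2)    # like FORMAT=(/, 500A)
print(len(every_10th), every_10th[0], every_10th[-1])  # 100 person 0003 person 0993
```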
Example 8: Test A, in file EXAM10A.TXT, and Test B, in EXAM10B.TXT, are both 20-item tests. They have 5 items in common, but the distractors are not necessarily in the same order. The responses must be scored on an individual test basis. Also, the validity of each test is to be examined separately. Then one combined analysis is wanted to equate the tests and obtain bankable item difficulties. For each file of original test responses, the person information is in columns 1-25 and the item responses are in columns 41-60.
The combined data file specified in EXAM10C.TXT is to be in RFILE= format. It contains:
Person information   30 characters (always)
Item responses       columns 31-65
The identification of the common items is:
Test item number (= location in item string) for each bank item:
Bank: 1   2   3   4   5   6-20              21-35
A:    3   1   7   8   9   2, 4-6, 10-20
B:    4   5   6   2   11                    1, 3, 7-10, 12-20
I. From Test A, make a response (RFILE=) file rearranging the items with FORMAT=.
; This file is EXAM10A.TXT
&INST
TITLE="Analysis of Test A"
RFILE=EXAM10AR.TXT ; The constructed response file for Test A
NI=20
FORMAT=(25A,T43,A,T41,A,T47,3A,T42,A,T44,3A,T50,11A)
ITEM1=26 ; Items start in column 26 of reformatted record
CODES=ABCD# ; Beware of blanks meaning wrong!
; Use your editor to convert all "wrong" blanks into another code,
; e.g., #, so that they will be scored wrong and not ignored as missing.
KEYFRM=1 ; Key in data record format
&END
Key 1 Record CCBDACABDADCBDCABBCA
BANK 1 TEST A 3 ; first item name
.
BANK 20 TEST A 20
END NAMES
Person 01 A BDABCDBDDACDBCACBDBA
.
Person 12 A BADCACADCDABDDDCBACA
The RFILE= file, EXAM10AR.TXT, is:
Person 01 A 00001000010010001001
Person 02 A 00000100001110100111
.
Person 12 A 00100001100001001011
II. From Test B, make a response (RFILE=) file rearranging the items with FORMAT=. Responses unique to Test A are filled with 15 blank responses to dummy items.
; This file is EXAM10B.TXT
&INST
TITLE="Analysis of Test B"
RFILE=EXAM10BR.TXT ; The constructed response file for Test B
NI=35
FORMAT=(25A,T44,3A,T42,A,T51,A,T100,15A,T41,A,T43,A,T47,4A,T52,9A)
; Blanks are imported from an unused part of the data record to the right!
; T100 means "go beyond the end of the data record"
; 15A means "get 15 blank spaces"
ITEM1=26 ; Items start in column 26 of reformatted record
CODES=ABCD# ; Beware of blanks meaning wrong!
KEYFRM=1 ; Key in data record format
&END
Key 1 Record CDABCDBDABCADCBDBCAD
BANK 1 TEST B 4
.
BANK 5 TEST B 11
BANK 6 TEST A 2
.
BANK 20 TEST A 20
BANK 21 TEST B 1
.
BANK 35 TEST B 20
END NAMES
Person 01 B BDABDDCDBBCCCCDAACBC
.
Person 12 B BADABBADCBADBDBBBBBB
The RFILE= file, EXAM10BR.TXT, is:
Person 01 B 10111               010101001000100
Person 02 B 00000               010000000001000
.
Person 11 B 00010               001000000000100
Person 12 B 00000               000101000101000
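The FORMAT= statements in parts I and II follow mechanically from the table of common items: walk the bank order, merge runs of consecutive test items into one Tc,nA pair, and send dummy items to a column beyond the end of the record. `build_format` is a hypothetical helper sketching that procedure; it assumes 1-character responses, with test item k in column 40 + k, and that column 100 onward is blank in every record, as the T100 in part II relies on.

```python
def build_format(person_width: int, first_item_col: int,
                 bank_order: list, blank_col: int = 100) -> str:
    """Build a FORMAT= string that writes test items in bank order.

    bank_order gives the test item number for each bank position, or None for
    a dummy item that should be blank.
    """
    runs: list[list] = []  # each run: [first test item (or None), run length]
    for item in bank_order:
        if runs:
            start, length = runs[-1]
            same_blank_run = item is None and start is None
            continues_run = (item is not None and start is not None
                             and item == start + length)
            if same_blank_run or continues_run:
                runs[-1][1] += 1
                continue
        runs.append([item, 1])
    parts = [f"{person_width}A"]
    for start, length in runs:
        column = blank_col if start is None else first_item_col + start - 1
        count = "" if length == 1 else str(length)
        parts.append(f"T{column},{count}A")
    return "FORMAT=(" + ",".join(parts) + ")"


test_a = [3, 1, 7, 8, 9, 2, 4, 5, 6] + list(range(10, 21))
test_b = [4, 5, 6, 2, 11] + [None] * 15 + [1, 3, 7, 8, 9, 10] + list(range(12, 21))
print(build_format(25, 41, test_a))
# FORMAT=(25A,T43,A,T41,A,T47,3A,T42,A,T44,3A,T50,11A)
print(build_format(25, 41, test_b))
# FORMAT=(25A,T44,3A,T42,A,T51,A,T100,15A,T41,A,T43,A,T47,4A,T52,9A)
```

Generating the statement this way makes it easier to keep it consistent with the bank table when items are added or reordered.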
III. Analyze the RFILE= files of Test A and Test B together:
; This file is EXAM10C.TXT
&INST
TITLE="Analysis of Tests A & B (already scored)"
NI=35
ITEM1=31 ; Items start in column 31 of RFILE=
CODES=01 ; Blanks mean "not in this test"
DATA=EXAM10AR.TXT+EXAM10BR.TXT ; Combine data files
; or, first, at the DOS prompt,
; C:> COPY EXAM10AR.TXT+EXAM10BR.TXT EXAM10AB.TXT(Enter)
; then, in EXAM10C.TXT,
; DATA=EXAM10AB.TXT
PFILE=EXAM10CP.TXT ; Person measures for combined tests
IFILE=EXAM10CI.TXT ; Item calibrations for combined tests
tfile=* ; List of desired tables
3 ; Table 3.1 for summary statistics, 3.2, ...
10 ; Table 10 for item structure
*
PRCOMP=S ; Principal components/contrast analysis with standardized residuals
&END
BANK 1 TEST A 3 B 4
.
BANK 35 TEST B 20
END NAMES
Shortening FORMAT= statements
If the required FORMAT= statement exceeds 512 characters, consider using this technique:
Relocate an entire item response string, but use an IDFILE= to delete the duplicate items, i.e., replace them by blanks. E.g., for Test B, instead of
FORMAT=(25A, T44,3A,T42,A,T51,A, T100,15A, T41,A,T43,A,T47,4A,T52,9A)
NI=35
Put all of Test B as items 21-40 in columns 51 through 70:
FORMAT=(25A, T44,3A,T42,A,T51,A, T100,15A, T41,20A)
NI=40
Blank out (delete) the 5 duplicated items with an IDFILE= containing:
24-26
22
31
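The IDFILE= entries above can be derived from the same table of common items: once Test B occupies items 21-40, Test B item k becomes item 20 + k, and the Test B items already placed in bank positions 1-5 (items 4, 5, 6, 2 and 11) are the duplicates. A hypothetical Python sketch that lists them as single items or ranges:

```python
def idfile_entries(duplicate_items: list[int], offset: int) -> list[str]:
    """Return IDFILE= lines (single items or ranges) for relocated duplicate items."""
    positions = [offset + k for k in duplicate_items]
    entries: list[str] = []
    run_start = run_end = None
    for p in positions:
        if run_end is not None and p == run_end + 1:
            run_end = p  # extend the current consecutive run
            continue
        if run_start is not None:
            entries.append(f"{run_start}" if run_start == run_end else f"{run_start}-{run_end}")
        run_start = run_end = p
    if run_start is not None:
        entries.append(f"{run_start}" if run_start == run_end else f"{run_start}-{run_end}")
    return entries


print(idfile_entries([4, 5, 6, 2, 11], offset=20))  # ['24-26', '22', '31']
```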