FORMAT= reformat data

Enables you to process awkwardly formatted data! But MFORMS= is easier to use.

FORMAT= is rarely needed when there is one data line per person.

Place the data in a separate DATA= file. The Winsteps screen file will then show the first record before and after FORMAT= is applied. The formatted data records are also shown from the Edit menu, Formatted Data=.

Control instructions to pick out every other character for 25 two-character responses, then a blank, and then the person label:

XWIDE=1

data=datafile.txt

format=(T2,25(1A,1X),T90,1A,Tl1,30A)

This displays on the Winsteps screen:

Opening: datafile.txt

Input Data Record before FORMAT=:
         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
----------------------------------------------------------------------
01xx 1x1 10002000102020202000201010202000201000200ROSSNER, MARC DANIEL

Input Data Record after FORMAT=:
1x11102012222021122021020 L
^I                      ^N^P

^I is the ITEM1= column

^N is the last item according to NI=

^P is the NAME1= column

FORMAT= enables you to reformat one or more data record lines into one new line in which all the component parts of the person information are in one person-id field, and all the responses are put together into one continuous item-response string. A FORMAT= statement is required if:

1) each person's responses take up several lines in your data file.

2) the length of a single line in your data file is more than 10000 characters.

3) the person-id field or the item responses are not in one continuous string of characters.

4) you want to rearrange the order of your items in your data record, to pick out sub-tests, or to move a set of connected forms into one complete matrix.

5) you only want to analyze the responses of every second, or nth, person.

FORMAT= contains up to 512 characters of reformatting instructions, enclosed in parentheses (..), which follow special rules. The instructions are:

nA	read in n characters starting with the current column, and then advance to the next column after them. Processing starts from column 1 of the first line, so that 5A reads in 5 characters and advances to the sixth column.
nX	means skip over n columns. E.g. 5X means bypass this column and the next 4 columns.
Tc	go to column c. T20 means get the next character from column 20. T55 means "tab" to column 55, not "tab" past 55 columns (which is TR55).
TLc	go c columns to the left. TL20 means get the next character from the column which is 20 columns to the left of the current position.
TRc	go c columns to the right. TR20 means get the next character from the column which is 20 columns to the right of the current position.
/	means go to column 1 of the next line in your data file.
n(..)	repeat the string of instructions within the () exactly n times.
,	a comma is used to separate the instructions.
	Set XWIDE=2 and you can reformat your data from original 1 or 2 column entries. Your data will all be analyzed as XWIDE=2. Then:
nA2	read in n pairs of characters starting with the current column into n 2-character fields of the formatted record. (For responses with a width of 2 columns.)
nA1	read in n 1-character columns, starting with the current column, into n 2-character fields of the formatted record.
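The instructions above behave like a small interpreter that walks through the data with a column pointer. The following Python sketch illustrates that reading for the XWIDE=1 instructions only (nA, nX, Tc, TLc, TRc, / and n(..)); it is not Winsteps' own parser. The names parse_format and apply_format are hypothetical, and the sketch assumes that columns past the end of a line read as blanks and that a person uses every line reached by the instructions, so a trailing / still counts the line it moves to. The XWIDE=2 forms nA1 and nA2 are not handled.

```python
import re

# One FORMAT= instruction per match; TL and TR are tried before plain T.
_TOKEN = re.compile(
    r"TL(?P<tl>\d+)|TR(?P<tr>\d+)|T(?P<t>\d+)"
    r"|(?P<a>\d*)A|(?P<x>\d*)X|(?P<rep>\d*)\(|(?P<close>\))|(?P<slash>/)|,"
)


def _count(digits):
    """A missing count, as in 'A' or 'X', means 1."""
    return int(digits) if digits else 1


def parse_format(fmt):
    """Expand FORMAT= text into a flat list of (op, n) steps."""
    s = fmt.upper().replace(" ", "")
    if s.startswith("FORMAT="):
        s = s[len("FORMAT="):]
    current, stack, pos = [], [], 0
    while pos < len(s):
        m = _TOKEN.match(s, pos)
        if m is None:
            raise ValueError(f"unrecognized FORMAT= text at {s[pos:]!r}")
        pos = m.end()
        if m.group("rep") is not None:        # '(' or 'n(' opens a group
            stack.append((current, _count(m.group("rep"))))
            current = []
        elif m.group("close") is not None:    # ')' closes the innermost group
            if not stack:
                raise ValueError("unbalanced ')' in FORMAT=")
            parent, times = stack.pop()
            parent.extend(current * times)    # repeat the group's steps n times
            current = parent
        elif m.group("slash") is not None:
            current.append(("/", 0))
        elif m.group("a") is not None:
            current.append(("A", _count(m.group("a"))))
        elif m.group("x") is not None:
            current.append(("X", _count(m.group("x"))))
        else:                                 # T, TL or TR; a bare ',' only separates
            for op in ("TL", "TR", "T"):
                value = m.group(op.lower())
                if value is not None:
                    current.append((op, int(value)))
    if stack:
        raise ValueError("unbalanced '(' in FORMAT=")
    return current


def apply_format(steps, lines):
    """Build one formatted record from one person's lines.

    Returns (record, number of lines this person used)."""
    line, col, out = 0, 1, []                 # col is 1-based, as in the manual
    for op, n in steps:
        if op == "A":
            text = lines[line] if line < len(lines) else ""
            out.append(text[col - 1:col - 1 + n].ljust(n))   # blanks past the end
            col += n
        elif op in ("X", "TR"):
            col += n
        elif op == "T":
            col = n
        elif op == "TL":
            col = max(1, col - n)
        elif op == "/":
            line, col = line + 1, 1
    return "".join(out), line + 1
```

Applied to the screen example above, parse_format("format=(T2,25(1A,1X),T90,1A,Tl1,30A)") followed by apply_format yields a record whose first 25 characters are 1x11102012222021122021020, matching the displayed formatted record.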

Always use nA1 for person-id information. Use nA1 for responses entered with a width of 1-character when there are also 2-character responses to be analyzed. When responses in 1-character format are converted into 2-character field format (compatible with XWIDE=2), the 1-character response is placed in the first, left, character position of the 2-character field, and the second, right, character position of the field is left blank. For example, the 1-character code of "A" becomes the 2-character field "A ". Valid 1-character responses of "A", "B", "C", "D" must be indicated by CODES="A B C D " with a blank following each letter.
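The 1-character to 2-character rule can be expressed in a few lines of Python. In this sketch, widen, split_codes and fields are hypothetical helper names that only mimic the rule as described here; they are not part of Winsteps.

```python
def widen(responses):
    """nA1 with XWIDE=2: each 1-character response becomes the response plus a blank."""
    return "".join(ch + " " for ch in responses)


def split_codes(codes, xwide):
    """Split a CODES= value into codes that are xwide characters wide."""
    if len(codes) % xwide:
        raise ValueError("CODES= length is not a multiple of XWIDE=")
    return [codes[i:i + xwide] for i in range(0, len(codes), xwide)]


def fields(record, xwide):
    """Cut a formatted response string into xwide-character fields."""
    return [record[i:i + xwide] for i in range(0, len(record), xwide)]


widened = widen("DCBA")                    # "D C B A "
valid = split_codes("A B C D ", 2)         # ["A ", "B ", "C ", "D "]
all_valid = all(f in valid for f in fields(widened, 2))
```

This also shows why the trailing blanks in CODES= matter: with CODES="ABCD" and XWIDE=2, split_codes would return "AB" and "CD", which match none of the widened responses.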

ITEM1= must be the column number of the first item response in the formatted record created by the FORMAT= statement. NAME1= must be the column number of the first character of the person-id in the formatted record.

Example 1: Each person's data record is 80 characters long and takes up one line in your data file. The person-id is in columns 61-80. The 56 item responses are in columns 5-60. Codes are "A", "B", "C", "D". No FORMAT= is needed. Data look like:

xxxxDCBDABCADCDBACDADABDADCDADDCCDADDCAABCADCCBBDADCACDBBADCZarathrustra-Xerxes

Without FORMAT=

XWIDE=1 response width (the standard)

ITEM1=5 start of item responses

NI=56 number of items

NAME1=61 start of name

NAMLEN=20 length of name

CODES=ABCD valid response codes

With FORMAT=

Reformatted record will look like:

DCBDABCADCDBACDADABDADCDADDCCDADDCAABCADCCBBDADCACDBBADCZarathrustra-Xerxes

XWIDE=1 response width (the standard)

FORMAT=(4X,56A,20A) skip unused characters

ITEM1=1 start of item responses

NI=56 number of items

NAME1=57 start of name

NAMLEN=20 length of name

CODES=ABCD valid response codes
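For a one-line record, FORMAT=(4X,56A,20A) amounts to two string slices. This Python sketch rebuilds Example 1's formatted record; col is a hypothetical helper that converts the manual's 1-based, inclusive column ranges into Python's 0-based slices, and padding with blanks past the end of the line is an assumption.

```python
RESPONSES = "DCBDABCADCDBACDADABDADCDADDCCDADDCAABCADCCBBDADCACDBBADC"  # 56 items
record = "xxxx" + RESPONSES + "Zarathrustra-Xerxes"


def col(line, first, last):
    """1-based, inclusive columns first..last, blank-padded past the end of the line."""
    return line[first - 1:last].ljust(last - first + 1)


# FORMAT=(4X,56A,20A): skip columns 1-4, keep 5-60, then 61-80.
formatted = col(record, 5, 60) + col(record, 61, 80)

NI = 56
ITEM1 = 1                # responses now start in column 1
NAME1 = ITEM1 + NI       # the name follows the last response: 57
```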

Example 2: Each data record is one line of 80 characters. The person-id is in columns 61-80. The 28 item responses are in columns 5-60, each 2 characters wide. Codes are " A", " B", " C", " D". No FORMAT= is necessary. Data look like:

xxxx C D B A C B C A A D D D D C D D C A C D C B A C C B A CZarathrustra-Xerxes

Without FORMAT=

XWIDE=2 response width

ITEM1=5 start of item responses

NI=28 number of items

NAME1=61 start of name

NAMLEN=20 length of name

CODES=" A B C D" valid response codes

With FORMAT=

Columns of reformatted record:

1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-90123456789012345678

C D B A C B C A A D D D D C D D C A C D C B A C C B A CZarathrustra-Xerxes

XWIDE=2 response width

FORMAT=(4X,28A2,20A1) skip unused characters

ITEM1=1 start of item responses in formatted record

NI=28 number of items

NAME1=29 start of name in "columns"

NAMLEN=20 length of name

CODES=" A B C D" valid response codes
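The same check for Example 2, where each response is 2 characters wide: this Python sketch reads 28 two-character fields (28A2) and the 20-character name (20A1). The arithmetic at the end reflects one reading of the column ruler above, in which NAME1=29 counts each 2-character response field as one "column" while 56 characters of responses actually precede the name. Treat that interpretation as an assumption; the variable names are illustrative.

```python
LETTERS = "CDBACBCAADDDDCDDCACDCBACCBAC"        # the 28 responses shown above
record = "xxxx" + "".join(" " + c for c in LETTERS) + "Zarathrustra-Xerxes"

XWIDE, NI = 2, 28
CODES = " A B C D"
valid = [CODES[i:i + XWIDE] for i in range(0, len(CODES), XWIDE)]

line = record.ljust(80)                                          # blanks past the end
responses = [line[4 + XWIDE * i:4 + XWIDE * (i + 1)] for i in range(NI)]   # 28A2
name = line[60:80]                                               # 20A1
formatted = "".join(responses) + name

NAME1 = NI + 1               # 29: each response field counts as one "column"
name_offset = XWIDE * NI     # 56 characters of responses precede the name
```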

Example 3: Each person's data record is 80 characters long and takes one line in your data file. The person-id is in columns 61-80. There are 30 1-character item responses, "A", "B", "C" or "D", in columns 5-34, and 13 2-character item responses, "01", "02" or "99", in columns 35-60.

xxxxDCBDABCADCDBACDADABDADCDADDCCA01990201019902010199020201Zarathrustra-Xerxes.

becomes on reformatting:

Columns:

1234567890123456789012345678901-2-3-4-5-6-7-8-9-0-1-2-3-45678901234567890123

DCBDABCADCDBACDADABDADCDADDCCA01990201019902010199020201Zarathrustra-Xerxes

XWIDE=2 analyzed response width

FORMAT=(4X,30A1,13A2,20A1) skip unused

ITEM1=1 start of item responses in formatted record

NI=43 number of items

NAME1=44 start of name

NAMLEN=20 length of name

CODES="A B C D 010299" valid responses

(Each 1-character code in CODES= is followed by a blank.)
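For Example 3's mixed widths, this Python sketch widens the 30 1-character responses with a trailing blank, as described for nA1, keeps the 13 2-character responses as they are, and checks every field against CODES= split into 2-character codes. The record is built from the sample line above; the variable names are illustrative.

```python
record = ("xxxx" + "DCBDABCADCDBACDADABDADCDADDCCA"      # 30 1-character responses
          + "01990201019902010199020201"              # 13 2-character responses
          + "Zarathrustra-Xerxes")
XWIDE = 2
CODES = "A B C D 010299"
valid = {CODES[i:i + XWIDE] for i in range(0, len(CODES), XWIDE)}

line = record.ljust(80)                                       # blanks past the end
one_char = [c + " " for c in line[4:34]]                      # 30A1: "D" becomes "D "
two_char = [line[34 + 2 * i:36 + 2 * i] for i in range(13)]   # 13A2
items = one_char + two_char                                   # NI=43
name = line[60:80]                                            # 20A1
invalid = [(n, code) for n, code in enumerate(items, start=1) if code not in valid]
```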

Example 4: The person-id is 10 columns wide in columns 15-24, and the 50 1-column item responses, "A", "B", "C", "D", are in columns 4000-4019, then in 4021-4050. Data look like:

xxxxxxxxxxxxxxJohn-Smithxxxx....xxxDCBACDADABCADCBCDABDxBDCBDADCBDABDCDDADCDADBBDCDABB

becomes on reformatting:

John-SmithDCBACDADABCADCBCDABDBDCBDADCBDABDCDDADCDADBBDCDABB

FORMAT=(T15,10A,T4000,20A,1X,30A)

NAME1=1 start of person name in formatted record

NAMLEN=10 length of name (automatic)

ITEM1=11 start of items in formatted record

NI=50 50 item responses

CODES=ABCD valid response codes
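Because Example 4 reads far to the right, the sample line above is elided. This Python sketch builds a synthetic 4050-column record with "x" filler, an assumption standing in for the real data, and shows that FORMAT=(T15,10A,T4000,20A,1X,30A) is equivalent to three column ranges; col is a hypothetical helper for 1-based, inclusive columns.

```python
NAME = "John-Smith"
PART1 = "DCBACDADABCADCBCDABD"              # items 1-20, columns 4000-4019
PART2 = "BDCBDADCBDABDCDDADCDADBBDCDABB"    # items 21-50, columns 4021-4050

# Synthetic record: "x" filler stands in for the elided part of the real line.
record = ("x" * 14 + NAME).ljust(3999, "x") + PART1 + "x" + PART2


def col(line, first, last):
    """1-based, inclusive columns first..last, as FORMAT= counts them."""
    return line[first - 1:last]


# FORMAT=(T15,10A,T4000,20A,1X,30A): column 4020 is skipped by 1X.
formatted = col(record, 15, 24) + col(record, 4000, 4019) + col(record, 4021, 4050)
```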

Example 5: There are five records or lines in your data file per person. There are 100 items. Items 1-20 are in columns 25-44 of the first record; items 21-40 are in columns 25-44 of the second record, etc. The 10-character person-id is in columns 51-60 of the last (fifth) record. Codes are "A", "B", "C", "D". Data look like:

xxxxxxxxxxxxxxxxxxxxxxxxACDBACDBACDCABACDACD

xxxxxxxxxxxxxxxxxxxxxxxxDABCDBACDBACDCABACDA

xxxxxxxxxxxxxxxxxxxxxxxxACDBACDBACDCABACDACD

xxxxxxxxxxxxxxxxxxxxxxxxDABCDBACDBACDCABACDA

xxxxxxxxxxxxxxxxxxxxxxxxABCDBACDBACDCABACDADxxxxxxMary-Jones

becomes:

ACDBACDBACDCABACDACDDABCDBACDBACDCABACDAACDBACDBACDCABACDACDDABCDBACDBACDCABACDAABCDBACDBACDCABACDADMary-Jones

FORMAT=(4(T25,20A,/),T25,20A,T51,10A)

ITEM1=1 start of item responses

NI=100 number of item responses

NAME1=101 start of person name in formatted record

NAMLEN=10 length of person name

CODES=ABCD valid response codes
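With several lines per person, reformatting is a matter of grouping lines and then slicing. This Python sketch hard-codes the meaning of FORMAT=(4(T25,20A,/),T25,20A,T51,10A) for Example 5; format_example5 and persons are hypothetical names, and the sketch assumes that a trailing group of fewer than five lines is simply ignored.

```python
LINES_PER_PERSON = 5


def format_example5(group):
    """FORMAT=(4(T25,20A,/),T25,20A,T51,10A) for one person's five lines."""
    padded = [line.ljust(60) for line in group]         # blanks past the end
    items = "".join(line[24:44] for line in padded)    # columns 25-44 of each line
    name = padded[4][50:60]                             # columns 51-60 of line 5
    return items + name


def persons(all_lines):
    """Yield one formatted record per complete group of five lines."""
    for start in range(0, len(all_lines) - LINES_PER_PERSON + 1, LINES_PER_PERSON):
        yield format_example5(all_lines[start:start + LINES_PER_PERSON])


lines = [
    "x" * 24 + "ACDBACDBACDCABACDACD",
    "x" * 24 + "DABCDBACDBACDCABACDA",
    "x" * 24 + "ACDBACDBACDCABACDACD",
    "x" * 24 + "DABCDBACDBACDCABACDA",
    "x" * 24 + "ABCDBACDBACDCABACDAD" + "x" * 6 + "Mary-Jones",
]
```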

Example 6: There are three lines per person. In the first line, columns 31 to 50 contain 10 item responses, each 2 columns wide. The person-id is in the second line in columns 5 to 17. The third line is to be skipped. Codes are "A ", "B ", "C ", "D ". Data look like:

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx A C B D A D C B A Dxxxxxxxx

xxxxJoseph-Carlosxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

becomes:

Columns:

1-2-3-4-5-6-7-8-9-0-1234567890123

A C B D A D C B A DJoseph-Carlos

FORMAT=(T31,10A2,/,T5,13A1,/)

ITEM1=1 start of item responses

NI=10 number of items

XWIDE=2 2 columns per response

NAME1=11 starting "A" of person name

NAMLEN=13 length of person name

CODES='A B C D ' valid response codes

If the third line is not skipped, add a redundant read of one column from that last line by replacing the FORMAT= instruction above with:

FORMAT=(T31,10A2,/,T5,13A1,/,A1) last A1 unused

Example 7: Pseudo-random data selection

To skip every other record, use (for most situations):
FORMAT=(500A, /) ; skips every second record of two
or
FORMAT=(/, 500A) ; skips every first record of two

You have a file with 1,000 person records. This time you want to analyze every 10th record, beginning with the 3rd person in the file, i.e., skip two records, analyze one record, skip seven records, and so on. The data records are 500 characters long.

XWIDE = 1

FORMAT = (/,/,500A,/,/,/,/,/,/,/)

XWIDE = 2

FORMAT = (/,/,100A2,300A1,/,/,/,/,/,/,/) ; 100 2-character responses, 300 other columns
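The selection in Example 7 can be checked by counting slashes. This Python sketch assumes a format with no n(..) repeat groups, that a person's record spans one more line than the number of "/" instructions, and that the next person starts on the following line; under those assumptions the chosen records are a simple stride through the file. The helper names are hypothetical.

```python
def _instructions(fmt):
    """Drop any 'FORMAT =' prefix; keep only the parenthesized instructions."""
    return fmt[fmt.index("("):]


def lines_per_person(fmt):
    """One line, plus one more for every '/'."""
    return _instructions(fmt).count("/") + 1


def first_line_read(fmt):
    """Number of '/' before the first nA instruction (a 0-based line index)."""
    return _instructions(fmt).upper().split("A", 1)[0].count("/")


def selected(records, fmt):
    """Records read by a format such as (/,/,500A,/,/,/,/,/,/,/)."""
    return records[first_line_read(fmt)::lines_per_person(fmt)]


records = list(range(1, 1001))       # stand-ins for 1,000 person records
every_tenth = selected(records, "FORMAT = (/,/,500A,/,/,/,/,/,/,/)")   # 3, 13, 23, ...
```

The XWIDE=2 version, (/,/,100A2,300A1,/,/,/,/,/,/,/), has the same slashes in the same places, so it selects the same records.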

Example 8: Test A, in file EXAM10A.TXT, and Test B, in EXAM10B.TXT, are both 20-item tests. They have 5 items in common, but the distractors are not necessarily in the same order. The responses must be scored on an individual test basis. Also, the validity of each test is to be examined separately. Then one combined analysis is wanted to equate the tests and obtain bankable item difficulties. For each file of original test responses, the person information is in columns 1-25 and the item responses are in columns 41-60.

The combined data file specified in EXAM10C.TXT is to be in RFILE= format. It contains:

Person information: 30 characters (always)

Item responses: columns 31-65

The identification of the common items is:

Test item number (= location in the item string) for each bank item:

Bank item:  1   2   3   4   5    6-20             21-35
Test A:     3   1   7   8   9    2, 4-6, 10-20    (none)
Test B:     4   5   6   2   11   (none)           1, 3, 7-10, 12-20
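The FORMAT= statements in parts I and II below follow mechanically from this table: each run of consecutive test items becomes one Tc,nA pair, and each run of bank positions that a test lacks becomes T100,nA, reading blanks from beyond the end of the record. This Python sketch uses a hypothetical build_format function; it assumes item k of a test is in column 40 + k (columns 41-60) and that column 100 lies past the end of every record.

```python
def build_format(person_width, bank_order, first_item_col=41, blank_col=100):
    """Build FORMAT= text that places a test's items in bank order.

    bank_order[i] is the test item number for bank position i + 1,
    or None where the test has no item (a dummy blank response)."""
    parts = [f"{person_width}A"]
    i = 0
    while i < len(bank_order):
        j = i + 1
        if bank_order[i] is None:
            # A run of missing bank positions reads blanks past the record end.
            while j < len(bank_order) and bank_order[j] is None:
                j += 1
            start = blank_col
        else:
            # A run of consecutive test items is read with one Tc,nA pair.
            while (j < len(bank_order) and bank_order[j] is not None
                   and bank_order[j] == bank_order[j - 1] + 1):
                j += 1
            start = first_item_col + bank_order[i] - 1
        n = j - i
        parts.append(f"T{start}")
        parts.append("A" if n == 1 else f"{n}A")
        i = j
    return "FORMAT=(" + ",".join(parts) + ")"


TEST_A = [3, 1, 7, 8, 9, 2, 4, 5, 6] + list(range(10, 21))                        # bank 1-20
TEST_B = [4, 5, 6, 2, 11] + [None] * 15 + [1, 3, 7, 8, 9, 10] + list(range(12, 21))  # bank 1-35
```

For these two lists, build_format(25, TEST_A) and build_format(25, TEST_B) return exactly the FORMAT= statements used in parts I and II.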

I. From Test A, make a response (RFILE=) file rearranging the items with FORMAT=.

; This file is EXAM10A.TXT

&INST

TITLE="Analysis of Test A"

RFILE=EXAM10AR.TXT ; The constructed response file for Test A

NI=20

FORMAT=(25A,T43,A,T41,A,T47,3A,T42,A,T44,3A,T50,11A)

ITEM1=26 ; Items start in column 26 of reformatted record

CODES=ABCD# ; Beware of blanks meaning wrong!

; Use your editor to convert all "wrong" blanks into another code,

; e.g., #, so that they will be scored wrong and not ignored as missing.

KEYFRM=1 ; Key in data record format

&END

Key 1 Record CCBDACABDADCBDCABBCA

BANK 1 TEST A 3 ; first item name

BANK 20 TEST A 20

END NAMES

Person 01 A BDABCDBDDACDBCACBDBA

Person 12 A BADCACADCDABDDDCBACA

The RFILE= file, EXAM10AR.TXT, is:

Person 01 A 00001000010010001001

Person 02 A 00000100001110100111

Person 12 A 00100001100001001011
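These RFILE= scores can be reproduced by comparing each response with the key and then placing the results in bank order. In this Python sketch, score is a hypothetical function; the key and responses are copied from the control file above, the bank order comes from the common-item table, and it is assumed, as KEYFRM=1 implies, that the key record is rearranged by the same FORMAT= as the responses, so scoring before or after rearranging gives the same result.

```python
KEY_A = "CCBDACABDADCBDCABBCA"                                   # Test A item order
BANK_ORDER_A = [3, 1, 7, 8, 9, 2, 4, 5, 6] + list(range(10, 21))  # bank 1-20


def score(responses, key, bank_order):
    """'1' if the response matches the key, else '0', listed in bank order.

    This sketch scores every mismatch, blanks included, as '0'. Winsteps
    ignores blanks as missing, hence the # recoding advice above."""
    return "".join(
        "1" if responses[item - 1] == key[item - 1] else "0" for item in bank_order
    )


person_01 = score("BDABCDBDDACDBCACBDBA", KEY_A, BANK_ORDER_A)
person_12 = score("BADCACADCDABDDDCBACA", KEY_A, BANK_ORDER_A)
```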

II. From Test B, make a response (RFILE=) file rearranging the items with FORMAT=. Responses unique to Test A are filled with 15 blank responses to dummy items.

; This file is EXAM10B.TXT

&INST

TITLE="Analysis of Test B"

RFILE=EXAM10BR.TXT ; The constructed response file for Test B

NI=35

FORMAT=(25A,T44,3A,T42,A,T51,A,T100,15A,T41,A,T43,A,T47,4A,T52,9A)

; Blanks are imported from an unused part of the data record to the right!

; T100 means "go beyond the end of the data record"

; 15A means "get 15 blank spaces"

ITEM1=26 ; Items start in column 26 of reformatted record

CODES=ABCD# ; Beware of blanks meaning wrong!

KEYFRM=1 ; Key in data record format

&END

Key 1 Record CDABCDBDABCADCBDBCAD

BANK 1 TEST B 4

BANK 5 TEST B 11

BANK 6 TEST A 2

BANK 20 TEST A 20

BANK 21 TEST B 1

BANK 35 TEST B 20

END NAMES

Person 01 B BDABDDCDBBCCCCDAACBC

Person 12 B BADABBADCBADBDBBBBBB

The RFILE= file, EXAM10BR.TXT, is:

Person 01 B 10111 010101001000100

Person 02 B 00000 010000000001000

Person 11 B 00010 001000000000100

Person 12 B 00000 000101000101000
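For Test B, the 15 dummy positions must stay blank so that the combined analysis treats them as "not in this test". This Python sketch scores each response against the key, places the results in bank order, and leaves blank any bank position marked None; score_b and BANK_ORDER_B are hypothetical names, and the key and responses are copied from the control file above.

```python
KEY_B = "CDABCDBDABCADCBDBCAD"                                   # Test B item order
# Bank positions 1-35: common items, 15 dummy positions (None), then Test B's own items.
BANK_ORDER_B = [4, 5, 6, 2, 11] + [None] * 15 + [1, 3, 7, 8, 9, 10] + list(range(12, 21))


def score_b(responses, key=KEY_B, bank_order=BANK_ORDER_B):
    """'1'/'0' per bank position; dummy positions stay blank ("not in this test")."""
    return "".join(
        " " if item is None
        else ("1" if responses[item - 1] == key[item - 1] else "0")
        for item in bank_order
    )


person_01 = score_b("BDABDDCDBBCCCCDAACBC")
person_12 = score_b("BADABBADCBADBDBBBBBB")
```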

III. Analyze Test A's and Test B's RFILE= files together:

; This file is EXAM10C.TXT

&INST

TITLE="Analysis of Tests A & B (already scored)"

NI=35

ITEM1=31 ; Items start in column 31 of RFILE=

CODES=01 ; Blanks mean "not in this test"

DATA=EXAM10AR.TXT+EXAM10BR.TXT ; Combine data files

; or, first, at the DOS prompt,

; C:> COPY EXAM10AR.TXT+EXAM10BR.TXT EXAM10AB.TXT(Enter)

; then, in EXAM10C.TXT,

; DATA=EXAM10AB.TXT

PFILE=EXAM10CP.TXT ; Person measures for combined tests

IFILE=EXAM10CI.TXT ; Item calibrations for combined tests

tfile=* ; List of desired tables

3 ; Table 3.1 for summary statistics, 3.2, ...

10 ; Table 10 for item structure

* ; end of TFILE= list

PRCOMP=S ; Principal components/contrast analysis with standardized residuals

&END

BANK 1 TEST A 3 B 4

BANK 35 TEST B 20

END NAMES
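Where the DOS COPY step is not available, the same concatenation can be sketched in Python. This sketch writes two tiny stand-in files (hypothetical contents built from the RFILE= lines above, with a 30-character person field) to a temporary folder, combines them with a hypothetical combine function, and then reads the items from column 31 for NI=35. The Test A record ends after 20 items, so its items 21-35 read as blanks, which CODES=01 treats as "not in this test".

```python
import tempfile
from pathlib import Path


def combine(sources, target):
    """Append text files in order, like COPY A+B C at the DOS prompt."""
    with open(target, "w", encoding="utf-8") as out:
        for source in sources:
            text = Path(source).read_text(encoding="utf-8")
            out.write(text)
            if text and not text.endswith("\n"):
                out.write("\n")    # keep the next file's first record on its own line


def items(rfile_line, item1=31, ni=35):
    """Columns ITEM1= .. ITEM1+NI-1; columns past the end of a line read as blanks."""
    return rfile_line.ljust(item1 - 1 + ni)[item1 - 1:item1 - 1 + ni]


with tempfile.TemporaryDirectory() as folder:
    a = Path(folder, "EXAM10AR.TXT")
    b = Path(folder, "EXAM10BR.TXT")
    a.write_text("Person 01 A".ljust(30) + "00001000010010001001\n", encoding="utf-8")
    b.write_text("Person 01 B".ljust(30) + "10111" + " " * 15 + "010101001000100\n",
                 encoding="utf-8")
    combine([a, b], Path(folder, "EXAM10AB.TXT"))
    combined = Path(folder, "EXAM10AB.TXT").read_text(encoding="utf-8").splitlines()

matrix = [items(line) for line in combined]
```

Appending a newline when a file lacks one is a design choice of this sketch, not a description of what COPY does.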

Shortening FORMAT= statements

If the required FORMAT= statement exceeds 512 characters, consider using this technique:

Relocate an entire item response string, but use an IDFILE= to delete the duplicate items, i.e., replace them by blanks. E.g., for Test B, instead of

FORMAT=(25A, T44,3A,T42,A,T51,A, T100,15A, T41,A,T43,A,T47,4A,T52,9A)

NI=35

Put Test B as items 21-40 in columns 51 through 70:

FORMAT=(25A, T44,3A,T42,A,T51,A, T100,15A, T41,20A)

NI=40

Blank out (delete) the 5 duplicated items with an IDFILE= containing:

24-26