FORMAT= reformat data


Enables you to process awkwardly formatted data, but MFORMS= is easier.

 

FORMAT= is rarely needed when there is one data line per person.

 

Place the data in a separate DATA= file, then the Winsteps screen file will show the first record before and after FORMAT=. The formatted data records are also shown from the Edit menu, Formatted Data=.

 

Control instructions to pick out every other character for 25 two-character responses, then a blank, and then the person label:

 XWIDE=1

 data=datafile.txt

 format=(T2,25(1A,1X),T90,1A,Tl1,30A)

 

This displays on the Winsteps screen:

 

Opening: datafile.txt

Input Data Record before FORMAT=:

         1         2         3         4         5         6         7

1234567890123456789012345678901234567890123456789012345678901234567890

----------------------------------------------------------------------

01xx 1x1 10002000102020202000201010202000201000200ROSSNER, MARC DANIEL

Input Data Record after FORMAT=:

1x11102012222021122021020 L

^I                      ^N^P

 

^I is Item1= column

^N is the last item according to NI=

^P is Name1= column

 

FORMAT= enables you to reformat one or more data record lines into one new line in which all the component parts of the person information are in one person-id field, and all the responses are put together into one continuous item-response string. A FORMAT= statement is required if

1) each person's responses take up several lines in your data file.

2) the length of a single line in your data file is more than 10000 characters.

3) the person-id field or the item responses are not in one continuous string of characters.

4) you want to rearrange the order of your items in your data record, to pick out sub-tests, or to move a set of connected forms into one complete matrix.

5) you only want to analyze the responses of every second, or nth, person.

 

FORMAT= contains up to 512 characters of reformatting instructions, contained within (..), which follow special rules. Instructions are:

 

nA

read in n characters starting with the current column, and then advance to the next column after them. Processing starts from column 1 of the first line, so that 5A reads in 5 characters and advances to the sixth column.

nX

means skip over n columns. E.g. 5X means bypass this column and the next 4 columns.

Tc

go to column c. T20 means get the next character from column 20.
T55 means "tab" to column 55, not "tab" past 55 columns (which is TR55).

TLc

go c columns to the left. TL20 means get the next character from the column which is 20 columns to the left of the current position.

TRc

go c columns to the right. TR20 means get the next character from the column which is 20 columns to the right of the current position.

/

means go to column 1 of the next line in your data file.

n(..)

repeat the string of instructions within the () exactly n times.

,

a comma is used to separate the instructions.

 

Set XWIDE=2 and you can reformat your data from original 1 or 2 column entries. Your data will all be analyzed as XWIDE=2. Then:

nA2

read in n pairs of characters starting with the current column into n 2-character fields of the formatted record. (For responses with a width of 2 columns.)

nA1

read in n 1-character columns, starting with the current column, into n 2-character fields of the formatted record.

 

Always use nA1 for person-id information. Use nA1 for responses entered with a width of 1-character when there are also 2-character responses to be analyzed. When responses in 1-character format are converted into 2-character field format (compatible with XWIDE=2), the 1-character response is placed in the first, left, character position of the 2-character field, and the second, right, character position of the field is left blank. For example, the 1-character code of "A" becomes the 2-character field "A ". Valid 1-character responses of "A", "B", "C", "D" must be indicated by CODES="A B C D " with a blank following each letter.
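
As a rough illustration of the padding rule just described, the following Python sketch widens 1-character responses into 2-character fields and cuts a CODES= string into 2-character codes. The helper names widen_a1 and split_codes are hypothetical and are not part of Winsteps.

```python
def widen_a1(text):
    """Sketch of nA1 with XWIDE=2: each character becomes a 2-character field,
    with the original character on the left and a blank on the right."""
    return "".join(ch + " " for ch in text)

def split_codes(codes, xwide=2):
    """Cut a CODES= string into codes that are each xwide characters wide."""
    return [codes[i:i + xwide] for i in range(0, len(codes), xwide)]

print(repr(widen_a1("ABCD")))          # 'A B C D '
print(split_codes("A B C D 010299"))   # ['A ', 'B ', 'C ', 'D ', '01', '02', '99']
```

The widened responses match the codes only because each letter in CODES= is followed by a blank.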

 

ITEM1= must be the column number of the first item response in the formatted record created by the FORMAT= statement. NAME1= must be the column number of the first character of the person-id in the formatted record.

 

Example 1: Each person's data record file is 80 characters long and takes up one line in your data file. The person-id is in columns 61-80. The 56 item responses are in columns 5-60. Codes are "A", "B", "C", "D". No FORMAT= is needed. Data look like:

xxxxDCBDABCADCDBACDADABDADCDADDCCDADDCAABCADCCBBDADCACDBBADCZarathrustra-Xerxes

 

 Without FORMAT=

   XWIDE=1 response width (the standard)

   ITEM1=5 start of item responses

   NI=56  number of items

   NAME1=61 start of name

   NAMLEN=20 length of name

   CODES=ABCD valid response codes

 

 With FORMAT=

Reformatted record will look like:

DCBDABCADCDBACDADABDADCDADDCCDADDCAABCADCCBBDADCACDBBADCZarathrustra-Xerxes

   XWIDE=1 response width (the standard)

   FORMAT=(4X,56A,20A) skip unused characters

   ITEM1=1 start of item responses

   NI=56  number of items

   NAME1=57 start of name

   NAMLEN=20 length of name

   CODES=ABCD valid response codes
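
A short Python sketch may help to check that both sets of control variables in Example 1 pick out the same responses and name. The function split_record and the synthetic 56-item response string are illustrative assumptions, not Winsteps features or real data.

```python
def split_record(record, item1, ni, name1, namlen):
    """Pick the item responses and the person name out of a record,
    using 1-based column numbers as ITEM1= and NAME1= do (XWIDE=1 assumed)."""
    responses = record[item1 - 1:item1 - 1 + ni]
    name = record[name1 - 1:name1 - 1 + namlen]
    return responses, name

# Synthetic 80-character record: 4 unused columns, 56 responses, 20-column name.
raw = "xxxx" + "ABCD" * 14 + "Zarathrustra-Xerxes".ljust(20)

# Without FORMAT=: columns refer to the original record.
without = split_record(raw, item1=5, ni=56, name1=61, namlen=20)

# With FORMAT=(4X,56A,20A): columns refer to the reformatted record.
formatted = raw[4:60] + raw[60:80]
with_format = split_record(formatted, item1=1, ni=56, name1=57, namlen=20)

print(without == with_format)  # True
```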

 

Example 2: Each data record is one line of 80 characters. The person-id is in columns 61-80. The 28 item responses are in columns 5-60, each 2 characters wide. Codes are " A", " B", " C", " D". No FORMAT= is necessary. Data look like:

xxxx C D B A C B C A A D D D D C D D C A C D C B A C C B A CZarathrustra-Xerxes

 Without FORMAT=

   XWIDE=2 response width

   ITEM1=5 start of item responses

   NI=28  number of items

   NAME1=61 start of name

   NAMLEN=20 length of name

   CODES=" A B C D" valid response codes

 

 With FORMAT=

Columns of reformatted record:

1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-90123456789012345678

 C D B A C B C A A D D D D C D D C A C D C B A C C B A CZarathrustra-Xerxes

   XWIDE=2 response width

   FORMAT=(4X,28A2,20A1) skip unused characters

   ITEM1=1 start of item responses in formatted record

   NI=28  number of items

   NAME1=29 start of name in "columns"

   NAMLEN=20 length of name

   CODES=" A B C D" valid response codes

 

Example 3: Each person's data record is 80 characters long and takes one line in your data file. Person-id is in columns 61-80. 30 1-character item responses, "A", "B", "C" or "D", are in columns 5-34, 13 2-character item responses, "01", "02" or "99", are in 35-60.

xxxxDCBDABCADCDBACDADABDADCDADDCCA01990201019902010199020201Zarathrustra-Xerxes.

becomes on reformatting:

Columns:

1234567890123456789012345678901-2-3-4-5-6-7-8-9-0-1-2-3-45678901234567890123

DCBDABCADCDBACDADABDADCDADDCCA01990201019902010199020201Zarathrustra-Xerxes

 

   XWIDE=2 analyzed response width

   FORMAT=(4X,30A1,13A2,20A1) skip unused

   ITEM1=1 start of item responses in formatted record

   NI=43  number of items

   NAME1=44 start of name

   NAMLEN=20 length of name

   CODES="A B C D 010299" valid responses

     ^ 1-character code followed by blank

 

Example 4: The person-id is 10 columns wide in columns 15-24 and the 50 1-column item responses, "A", "B", "C", "D", are in columns 4000-4019, then in 4021-4050. Data look like:

xxxxxxxxxxxxxxJohn-Smithxxxx....xxxDCBACDADABCADCBCDABDxBDCBDADCBDABDCDDADCDADBBDCDABB

becomes on reformatting:

John-SmithDCBACDADABCADCBCDABDBDCBDADCBDABDCDDADCDADBBDCDABB

   FORMAT=(T15,10A,T4000,20A,1X,30A)

   NAME1=1 start of person name in formatted record

   NAMLEN=10 length of name (automatic)

   ITEM1=11 start of items in formatted record

   NI=50  50 item responses

   CODES=ABCD valid response codes

 

Example 5: There are five records or lines in your data file per person. There are 100 items. Items 1-20 are in columns 25-44 of first record; items 21-40 are in columns 25-44 of second record, etc. The 10 character person-id is in columns 51-60 of the last (fifth) record. Codes are "A", "B", "C", "D". Data look like:

xxxxxxxxxxxxxxxxxxxxxxxxACDBACDBACDCABACDACD

xxxxxxxxxxxxxxxxxxxxxxxxDABCDBACDBACDCABACDA

xxxxxxxxxxxxxxxxxxxxxxxxACDBACDBACDCABACDACD

xxxxxxxxxxxxxxxxxxxxxxxxDABCDBACDBACDCABACDA

xxxxxxxxxxxxxxxxxxxxxxxxABCDBACDBACDCABACDADxxxxxxMary-Jones

 

becomes:

ACDBACDBACDCABACDACDDABCDBACDBACDCABACDAACDBACDBACDCABACDACDDABCDBACDBACDCABACDAABCDBACDBACDCABACDADMary-Jones

 

   FORMAT=(4(T25,20A,/),T25,20A,T51,10A)

   ITEM1=1 start of item responses

   NI=100  number of item responses

   NAME1=101 start of person name in formatted record

   NAMLEN=10 length of person name

   CODES=ABCD valid response codes
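
The effect of the FORMAT= statement in Example 5 can be sketched in Python as follows. The functions reformat_person and reformat_file are hypothetical names, and the sketch assumes the file holds exactly five lines for every person.

```python
def reformat_person(lines):
    """Join columns 25-44 of each of the five lines, then add the
    person-id from columns 51-60 of the fifth line."""
    items = "".join(line[24:44] for line in lines)
    name = lines[4][50:60]
    return items + name

def reformat_file(all_lines, per_person=5):
    """Process the data file in blocks of per_person lines."""
    return [reformat_person(all_lines[i:i + per_person])
            for i in range(0, len(all_lines), per_person)]

# Synthetic data for one person (not taken from a real file).
lines = ["x" * 24 + letter * 20 for letter in "ABCD"]
lines.append("x" * 24 + "E" * 20 + "x" * 6 + "Mary-Jones")

record = reformat_file(lines)[0]
print(len(record))   # 110
print(record[100:])  # Mary-Jones
```

The name begins in column 101 of the reformatted record, which is why NAME1=101.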

 

Example 6: There are three lines per person. In the first line from columns 31 to 50 are 10 item responses, each 2 columns wide. Person-id is in the second line in columns 5 to 17. The third line is to be skipped. Codes are "A ", "B ", "C ", "D ". Data look like:

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx A C B D A D C B A Dxxxxxxxx

xxxxJoseph-Carlosxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

 

becomes:

Columns:

1-2-3-4-5-6-7-8-9-0-1234567890123

 A C B D A D C B A DJoseph-Carlos

 

   FORMAT=(T31,10A2,/,T5,13A1,/)

   ITEM1=1 start of item responses

   NI=10  number of items

   XWIDE=2 2 columns per response

   NAME1=11 starting "A" of person name

   NAMLEN=13 length of person name

   CODES='A B C D ' valid response codes

 

  If the third line is not skipped as intended, read a redundant extra column from that last line. Replace the FORMAT= statement above with:

   FORMAT=(T31,10A2,/,T5,13A1,/,A1) last A1 unused

 

Example 7: Pseudo-random data selection

To skip every other record, use (for most situations):
FORMAT=(500A, /) ; skips every second record of two
or
FORMAT=(/, 500A) ; skips every first record of two

 

You have a file with 1,000 person records. This time you want to analyze every 10th record, beginning with the 3rd person in the file, i.e., skip two records, analyze one record, skip seven records, and so on. The data records are 500 characters long.

   XWIDE = 1

   FORMAT = (/,/,500A,/,/,/,/,/,/,/)

 or  

   XWIDE = 2

   FORMAT = (/,/,100A2,300A1,/,/,/,/,/,/,/) ; 100 2-character responses, 300 other columns
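
Which records such a FORMAT= statement keeps can be checked with a small Python sketch. The function select_records and the record labels are hypothetical; the sketch only reproduces the skip-two, keep-one, skip-seven pattern described above.

```python
def select_records(records, start, step):
    """Keep every step-th record, beginning with record number start (1-based)."""
    return records[start - 1::step]

# 1,000 synthetic person records.
records = [f"person{n:04d}" for n in range(1, 1001)]

picked = select_records(records, start=3, step=10)
print(picked[:3])   # ['person0003', 'person0013', 'person0023']
print(len(picked))  # 100

# Every other record: (500A, /) keeps the first of each pair, (/, 500A) the second.
first_of_two = select_records(records, start=1, step=2)
second_of_two = select_records(records, start=2, step=2)
```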

 

Example 8: Test A, in file EXAM10A.TXT, and Test B, in EXAM10B.TXT, are both 20-item tests. They have 5 items in common, but the distractors are not necessarily in the same order. The responses must be scored on an individual test basis. Also the validity of each test is to be examined separately. Then one combined analysis is wanted to equate the tests and obtain bankable item difficulties. For each file of original test responses, the person information is in columns 1-25, the item responses in 41-60.

 

The combined data file, specified in EXAM10C.TXT, is to be in RFILE= format. It contains:

 

Person information       Columns 1-30 (always 30 characters)

Item responses           Columns 31-65

 

The identification of the common items is:

Test Item Number (=Location in item string)

Bank:   1    2    3    4    5    6-20             21-35
A:      3    1    7    8    9    2, 4-6, 10-20
B:      4    5    6    2    11                    1, 3, 7-10, 12-20

 

I. From Test A, make a response (RFILE=) file rearranging the items with FORMAT=.

 

; This file is EXAM10A.TXT

&INST

TITLE="Analysis of Test A"

RFILE=EXAM10AR.TXT ; The constructed response file for Test A

NI=20

FORMAT=(25A,T43,A,T41,A,T47,3A,T42,A,T44,3A,T50,11A)

ITEM1=26  ; Items start in column 26 of reformatted record

CODES=ABCD#  ; Beware of blanks meaning wrong!

; Use your editor to convert all "wrong" blanks into another code, 

; e.g., #, so that they will be scored wrong and not ignored as missing.

KEYFRM=1  ; Key in data record format

&END

Key 1 Record                            CCBDACABDADCBDCABBCA

BANK 1   TEST A 3 ; first item name

 .

BANK 20  TEST A 20

END NAMES

Person 01 A                             BDABCDBDDACDBCACBDBA

 .

Person 12 A                             BADCACADCDABDDDCBACA

 

The RFILE= file, EXAM10AR.TXT, is:

 

Person 01 A                   00001000010010001001

Person 02 A                   00000100001110100111

   .

Person 12 A                   00100001100001001011

 

II. From Test B, make a response (RFILE=) file rearranging the items with FORMAT=. Responses unique to Test A are filled with 15 blank responses to dummy items.

 

; This file is EXAM10B.TXT

&INST

TITLE="Analysis of Test B"

RFILE=EXAM10BR.TXT ; The constructed response file for Test B

NI=35

FORMAT=(25A,T44,3A,T42,A,T51,A,T100,15A,T41,A,T43,A,T47,4A,T52,9A)

   ; Blanks are imported from an unused part of the data record to the right!

   ; T100 means "go beyond the end of the data record"

   ; 15A means "get 15 blank spaces"

ITEM1=26  ; Items start in column 26 of reformatted record

CODES=ABCD#  ; Beware of blanks meaning wrong!

KEYFRM=1  ; Key in data record format

&END

Key 1 Record                            CDABCDBDABCADCBDBCAD

BANK 1   TEST B 4

  .

BANK 5   TEST B 11

BANK 6   TEST A 2

  .

BANK 20  TEST A 20

BANK 21  TEST B 1

  .

BANK 35  TEST B 20

END NAMES

Person 01 B                             BDABDDCDBBCCCCDAACBC

  .

Person 12 B                             BADABBADCBADBDBBBBBB

 

The RFILE= file, EXAM10BR.TXT, is:

 

Person 01 B                   10111               010101001000100

Person 02 B                   00000               010000000001000

  .

Person 11 B                   00010               001000000000100

Person 12 B                   00000               000101000101000
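
 

The rearrangement and scoring in steps I and II can be reproduced approximately with the Python sketch below. The names BANK_ORDER_A, BANK_ORDER_B and score_in_bank_order are hypothetical, and the sketch assumes every response is one of the valid codes, so nothing is treated as missing. The keys and the Person 01 responses are copied from the control files above.

```python
# Test item number placed at each bank position (None = dummy blank item).
BANK_ORDER_A = [3, 1, 7, 8, 9, 2, 4, 5, 6] + list(range(10, 21))
BANK_ORDER_B = ([4, 5, 6, 2, 11] + [None] * 15
                + [1, 3, 7, 8, 9, 10] + list(range(12, 21)))

def score_in_bank_order(responses, key, order):
    """Score each response against the key, then list the scores in bank order."""
    out = []
    for item in order:
        if item is None:
            out.append(" ")                      # item not in this test
        else:
            right = responses[item - 1] == key[item - 1]
            out.append("1" if right else "0")
    return "".join(out)

key_a = "CCBDACABDADCBDCABBCA"
person_01_a = "BDABCDBDDACDBCACBDBA"
print(score_in_bank_order(person_01_a, key_a, BANK_ORDER_A))
# 00001000010010001001

key_b = "CDABCDBDABCADCBDBCAD"
person_01_b = "BDABDDCDBBCCCCDAACBC"
print(repr(score_in_bank_order(person_01_b, key_b, BANK_ORDER_B)))
# '10111               010101001000100'
```

The two printed strings agree with the Person 01 lines of EXAM10AR.TXT and EXAM10BR.TXT shown above.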

 

III. Analyze Test A's and Test B's RFILE='s together:

 

; This file is EXAM10C.TXT

&INST

TITLE="Analysis of Tests A & B (already scored)"

NI=35

ITEM1=31  ; Items start in column 31 of RFILE=

CODES=01  ; Blanks mean "not in this test"

DATA=EXAM10AR.TXT+EXAM10BR.TXT ; Combine data files

 

; or, first, at the DOS prompt,

;  C:> COPY EXAM10AR.TXT+EXAM10BR.TXT EXAM10AB.TXT(Enter)

; then, in EXAM10C.TXT,

;  DATA=EXAM10AB.TXT

 

PFILE=EXAM10CP.TXT ; Person measures for combined tests

IFILE=EXAM10CI.TXT ; Item calibrations for combined tests

tfile=*  ; List of desired tables

3   ; Table 3.1 for summary statistics, 3.2, ...

10   ; Table 10 for item structure

*

PRCOMP=S  ; Principal components/contrast analysis with standardized residuals

&END

BANK 1   TEST A 3 B 4

  .

BANK 35  TEST B 20

END NAMES

 

Shortening FORMAT= statements

If the required FORMAT= statement exceeds 512 characters, consider using this technique:

 

Relocate an entire item response string, but use an IDFILE= to delete the duplicate items, i.e., replace them by blanks. E.g., for Test B, instead of

 FORMAT=(25A, T44,3A,T42,A,T51,A, T100,15A, T41,A,T43,A,T47,4A,T52,9A)

 NI=35

 

Put Test B as items 21-40 in columns 51 through 70:

 FORMAT=(25A, T44,3A,T42,A,T51,A, T100,15A, T41,20A)

 NI=40

 

Blank out (delete) the 5 duplicated items with an IDFILE= containing:

 24-26

 22

 31
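
 

The IDFILE= entries can be checked against the table of common items with a short Python sketch. The functions parse_idfile and blank_items are hypothetical, and the parser handles only the two entry forms shown above, a single item number or a range written as first-last.

```python
def parse_idfile(entries):
    """Turn entries such as '24-26' or '22' into a set of item numbers."""
    items = set()
    for entry in entries:
        entry = entry.strip()
        if "-" in entry:
            first, last = (int(part) for part in entry.split("-"))
            items.update(range(first, last + 1))
        else:
            items.add(int(entry))
    return items

def blank_items(responses, deleted):
    """Replace the responses to deleted items (1-based numbers) by blanks."""
    return "".join(" " if n in deleted else r
                   for n, r in enumerate(responses, start=1))

deleted = parse_idfile(["24-26", "22", "31"])
print(sorted(deleted))  # [22, 24, 25, 26, 31]

# Test B items 4, 5, 6, 2, 11 are the common items; as items 21-40 they
# are numbered 20 higher.
common_in_b = [4, 5, 6, 2, 11]
print(deleted == {20 + item for item in common_in_b})  # True
```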