Finding Cell Records in BIFF Files

Microsoft Excel uses the INDEX and DBCELL records to optimize the lookup of cell records (RK, FORMULA, and so on). You can use these records to optimize your code when reading a BIFF file, or you can just read the entire workbook stream to find the cell values you want. The unoptimized method may be slower, depending on the size, structure, and complexity of the file.

If your code writes a BIFF file, you must include the INDEX and DBCELL records with correct values in the record fields. If you do not do this, Microsoft Excel will not be able to optimize lookup, and the program's performance will suffer, especially when the user tries to copy data out of the file that your application has written.

Microsoft Excel stores cell records in blocks that have at most 32 rows. Each row that contains cell records has a corresponding ROW record in the block, and each block contains a DBCELL record at the end of the block.

The following illustration shows how to use the INDEX record to locate the DBCELL records at the end of the record blocks. Notice that the stream position at the start of the first BOF record in the workbook stream is 6F1h. To find the start of each DBCELL record, add this number to each member of the rgibRw array in the INDEX record.

After your code has computed the location of the DBCELL records, you can use the dbRtrw field to find the location of the start of the first ROW record for each block. This field is stored as a positive long integer, although the offset is really a "negative" offset to an earlier position in the file. See the following illustration for details.

Finally, your code can compute the start of each cell record in the block by using the members in the rgdb array in the DBCELL record. The offsets in this array use the start of the second ROW record in the block as the initial offset. This is because the code has to read the first ROW record to know what the row number is (and then to make a decision based on the row number), and the stream pointer is at the start of the second ROW record after this. See the following illustration for details.