Forty-Two Ways to Make DAO Faster—Programming to DAO Using Microsoft Access

Click to open or copy the presentation file.

Presented by: Michael Mee

Michael Mee is a program manager in the Microsoft Database Group and has designed key parts of Microsoft Access and FoxPro. He currently designs future enhancements to Data Access Objects (DAO).

Introduction

This document provides a cornucopia of tips and tricks for making your DAO code as fast as possible. While the code samples are written and tested in Microsoft® Access 2.0, almost all of them apply equally to Microsoft Visual Basic®.

In studying the samples, it is important to keep the following in mind:

The samples are written to be as brief and succinct as possible. As such, they sometimes sacrifice correctness for the sake of clarity. For example, although one of the very best performance improvements in this document is number 14, you won't find many examples actually using this advice because the extra lines of code required detracts from the point being made.
There is no such thing as a hard and fast rule. All of the suggestions contained herein have been tested to confirm that they do in fact provide performance improvements. However, sometimes those improvements are marginal, and, worse, sometimes those improvements would be reversed if certain fundamental parameters were changed (e.g. number of rows). The text accompanying each suggestion attempts to draw attention to the critical details.
The examples try to stick purely to DAO code changes that will affect performance. In particular, no attempt is made to try and point out how to tune a given query to execute faster, as those issues are germane to running queries either from code or directly within Microsoft Access. As a general rule, you will get more benefit from tweaking queries than from tweaking code. These tips assume that you've already tweaked the queries.

A common thread emerges from a close study of the examples given. Having a good understanding of the differences between tables, dynasets and snapshots is a great start to correctly predicting the performance characteristics of a given situation. This is particularly critical for working with ODBC data sources.

An equally important lesson to learn from the examples, is that when processing data, the time spent retrieving and updating data almost totally overwhelms other speed concerns. For example, it quickly became clear while preparing this presentation, that issues of, say, using a Variant data type instead of a String data type, become almost irrelevant if a field is being fetched in the same loop. This is not to say that data retrieval is particularly slow, but just that anything involving a disk drive (or even a disk cache) is at least an order of magnitude slower than doing an in-memory bit test to determine the sub-type of a variant. In short, it is far better to worry about tuning an OpenRecordset statement or pre-assigning Field objects before entering a loop, than deciding whether to Dim variables as String or Variant.

Finally, the single most important thing learnt from preparing this paper was that all the predictions and postulations about how fast a particular operation will be don't mean anything until you actual measure the speed! In their initial form, several of the tips below proved to be totally wrong. The lesson to take away is that, until you test your speed improvements in a real environment, you can't be sure of anything!

Overview

The two tables below list the hints which follow in a loose order of priority. The ratings given are somewhat arbitrary and are an attempt to answer the question "which tip should I look at applying first." Because of the differences between ODBC and .MDB access, sometimes a rating of "1" for MDB won't be as appropriate for ODBC. Reading the detailed descriptions in the next section should help.

The first table contains tips that apply to Microsoft Access data formats and which generally apply to other ISAM formats such as Paradox and Xbase. If you never use ODBC data formats, then this is for you.

Tips for Microsoft Access data formats

Rating	Tip#	Title
1	1	Replace Find with Seek
1	2	Use table-type Recordsets instead of dynaset or snapshot-types
1	40	Replace use of .Sort with specifying an index (or a query)
1	26	Replace use of .Filter with Seek (or a query)
1	20	Use the DB_APPENDONLY flag if you're adding records to dynasets, esp ODBC
1	12	Always wrap transactions around your DAO code
2	13	Return to your location using bookmarks instead of using Find
2	27	Re-query recordsets instead of reopening them
2	43	Avoid refreshing collections unless you absolutely have to
2	14	In inner loops, use Field objects instead of myRS!fieldname (Q112724)
2	23	For snapshots, select just the fields you need instead of all fields
2	7	Store queries instead of using dynamic SQL text - esp low memory machines
2	3	Open attached tables directly by opening the database directly
2	42	Add the DB_DENYWRITE flag if no one else will be writing
3	11	Replace DAO code loops with the equivalent SQL statements - but not always
3	17	Cache tabledef and field property collection references if used many times
3	8	Parameterize queries instead of using dynamic SQL text (especially for ODBC)
3	22	Replace short memo fields with long text fields
3	15	Store infrequently updated tables in the local MDB
3	32	'Posting' queries are faster than explicit code
3	25	Try InStr() instead of Like if you're not concerned with foreign accent chars
3	28	Dim objects and Set them rather than using lengthy references
3	41	Replace floating point numbers with integral numbers
3	36	Compile modules and queries before distributing your application
3	35	Clone dynaset recordsets instead of opening new ones
3	31	Replace old ListTables (etc.) code with collection based code
3	24	Split .mdb files that have lots of tables to minimize DAO collection initialization time
4	9	Open databases exclusive and read-only if for fastest single-user network performance
4	33	Refresh current field values with Move 0 instead of MoveNext/Previous
4	18	Speed finds by creating a temporary indexed table and seeking
4	19	Replace repeated execution of dynamic SQL with a temporary query
4	34	Abandon an Edit or AddNew with Move 0
5	21	Marginal: replace variants with specific data types
5	30	Take advantage of default collections and drop the extra name
5	29	Replace text collection lookup with ordinals (but field object is better)
5	39	Replace Variant string functions with '$' functions if using strings

The second table contains tips that apply to ODBC databases. Since many tips apply to both data formats, there is quite a bit of overlap between the tables. If you are trying to speed up access to ODBC data in particular, this table will probably be of more interest.

Tips for ODBC data formats

Rating	Tip#	Title
1	12	Always wrap transactions around your DAO code
1	40	Replace use of .Sort with specifying an index (or a query)
1	6	With ODBC dynasets, use CacheStart, CacheFill and CacheSize functionality
1	38	On ODBC data use find only on indexed fields, otherwise open a new cursor
1	26	Replace use of .Filter with Seek (or a query)
1	10	With ODBC SQL statements, use pass-through where possible
1	20	Use the DB_APPENDONLY flag if you're adding records to dynasets, esp ODBC
2	43	Avoid refreshing collections unless you absolutely have to
2	23	For snapshots, select just the fields you need instead of all fields
2	27	Re-query recordsets instead of reopening them
2	4	Use snapshots instead of dynasets for remote data (but beware of memos!)
2	13	Return to your location using bookmarks instead of using Find
2	7	Store queries instead of using dynamic SQL text - esp low memory machines
2	37	For ODBC data with OLE or memo fields use dynasets instead of snapshots
2	5	Supply the DB_FORWARDONLY option on ODBC snapshots
2	14	In inner loops, use Field objects instead of myRS!fieldname (Q112724)
3	11	Replace DAO code loops with the equivalent SQL statements - but not always
3	15	Store infrequently updated tables in the local MDB
3	17	Cache tabledef and field property collection references if used many times
3	8	Parameterize queries instead of using dynamic SQL text (especially for ODBC)
3	32	'Posting' queries are faster than explicit code
3	41	Replace floating point numbers with integral numbers
3	36	Compile modules and queries before distributing your application
3	28	Dim objects and Set them rather than using lengthy references
3	35	Clone dynaset recordsets instead of opening new ones
3	22	Replace short memo fields with long text fields
3	31	Replace old ListTables (etc.) code with collection based code
4	33	Refresh current field values with Move 0 instead of MoveNext/Previous
4	16	Reduce your ODBC time-outs to get faster return to DAO
4	18	Speed finds by creating a temporary indexed table and seeking
4	19	Replace repeated execution of dynamic SQL with a temporary query
4	34	Abandon an Edit or AddNew with Move 0
5	30	Take advantage of default collections and drop the extra name
5	29	Replace text collection lookup with ordinals (but Field object is better)
5	21	Marginal: replace variants with specific data types
5	39	Replace Variant string functions with '$' functions if using strings

Sample Code

This section discusses the code found in the database FAST42.MDB. Use this paper in conjunction with the form in that database that lets you test each function.

The sample code consists of functions that directly correspond to the tips given below. All the functions are structured similarly - the slow way comes first, followed by the faster method. The functions are designed to be run with the tables and queries in the "fast42.mdb" database. This database in turn, expects to link to nwind.mdb and a Microsoft SQL Server™ installation with the Pubs database installed to be able to run some of the functions.

Note that benchmarking is a fine art (or perhaps an evil science?), and the code samples do not claim to be exhaustive. They should be taken as indicative only. In particular, no accounting has been made for:

caching effects from running one technique directly before the other, or
initial startup or load time. This is particularly an issue for low memory machines. For example, some of the examples that compile a SQL string run much more slowly on a low memory machine, because the query compiler has to be paged into memory to compile the query. On such machines, pre-compiled queries run much faster than SQL strings.

As always, test any improvements you make on the exact machine and software configuration that they will ultimately be used in (including having other programs loaded and minimized if need be).

50 Ways to Speed Up DAO Code

Replace Find with Seek

If you select a table-type recordset with an index, you can use the Seek method to position the cursor against any criteria that are stored in that index. This will always be the fastest possible way to find a particular piece of data. Even though Find will use indexes where possible, because it is based on a dynaset or snapshot, the overhead of that mechanism will always be somewhat greater than the equivalent Seek.

Be careful about giving in to the temptation of indexing all the fields so that you can seek on any criteria. Unless the table data is purely read-only, the overhead of maintaining indexes will erase the gains of using seek on an obscure column once in a blue moon.

Multi-field indexes can be used with Seek even if you don't want to match all the fields. The one restriction is that you have to provide any field preceding the other fields in the index. For example, if you have an index on Part#, Cust# and Emp#, you can search for a Cust# as long as you have the Part#. Similarly to find an Emp# you must have a Part# and Cust#. Of course you can also use this index to search for a Part# alone.

One of the obstacles to using Seek is that you cannot open attached tables as table-type recordsets. Tip #3 shows how you can look inside the connection information for an attached table and use that information to open the database and hence the table directly.

This tip cannot be used against ODBC data sources because they cannot be opened as tables directly. This is an inherent architectural limitation of the way client-servers work.

Use table-type Recordsets instead of dynaset or snapshot-types

Using a table-type recordset will be faster than a dynaset or snapshot for all operations that they support. The two exceptions to this are the .Filter and .Sort properties (see #26 and #40). If you don't need to join data or do complex .Find requests, then this is a great way to speed up your data access. As a general "counter rule", however, as soon as you do need to do any joins, sorting, etc., it will be quicker to use the appropriate dynaset or snapshot (see also #4).

One of the obstacles to using table-type recordsets is that you cannot open attached tables as table-type recordsets. Tip #3 shows how you can look inside the connection information for an attached table and use that information to open the database and hence the table directly.

If the tables involved are small, then the benefits of using this technique will be small and you might choose to use dynasets or snapshots for convenience.

This tip cannot be used against ODBC data sources because they cannot be opened as tables directly. This is an inherent architectural limitation of the way client-servers work.

Open attached tables directly by opening the database directly

Note Do not open tables in ODBC directly. This will always be slower than using an attached table. This hint is only for ISAM type databases such as .MDB files, Paradox, or Xbase files.

A common model for distributing applications is to put local data, forms, reports, etc. in an .mdb file that resides on the user's machine and to attach those tables to an .mdb on a common network location (a setup sometimes erroneously referred to as "client-server"). One limitation of this approach is that your code can no longer open the tables as table-type recordsets because attached tables can only be opened as dynasets or snapshots.

The code sample shows how you can work around this restriction in a generic fashion and get the best of both worlds by reading the Connect property to find the name of the database file and then opening the database directly. Doing this allows you to take advantage of tips #1 and #2.

Of course if the tables involved are small, then the benefits of this will be rather small and may actually be counter productive due to the extra overhead of having another database open, etc.

Use snapshots instead of dynasets for remote data (but beware of memos!)

Microsoft Jet provides two kinds of recordsets: dynasets, which can be updated, and snapshots, which cannot. If you don't need to update data and the recordset contains fewer than 500 records, you can reduce the time it takes the recordset to open by requesting a snapshot using DB_OPEN_SNAPSHOT. This causes Jet to use a snapshot, which is generally faster than using a dynaset.

However, if the recordset you're using is large or contains Memo or OLE Object fields, using a dynaset is more efficient. If you use a dynaset, Jet retrieves only the primary key of each record; if you use a snapshot, it retrieves the entire record. A dynaset is also more efficient because Memo and OLE Object fields are retrieved only if they are referenced.

Rather than transferring the data in each column of the row during the fetch, Jet builds a dynaset by fetching the key values that uniquely identify each of the rows. The data from the bookmarked rows is fetched only when needed to populate the table or query datasheet view. Using key values speeds the chunking operation and minimizes network traffic, especially when browsing large tables. Data retrieval is also optimized in that Memo and OLE objects are not retrieved unless they need to be displayed.

A snapshot Recordset does not use bookmarks; instead, the data contained in each of the rows is fetched. Creating snapshot Recordset objects can be faster with small tables, but can be quite time-consuming when large tables with many columns are involved, and network traffic increases drastically. This is especially true if the table or query includes Memo fields and OLE objects. (Note, however, that .MDB snapshots containing Memo or OLE fields will not retrieve the Memo or OLE data into the snapshot until it is actually referenced.) You can further minimize the performance hit with snapshot Recordsets by ensuring you are returning only the columns you need.

Supply the DB_FORWARDONLY option on ODBC snapshots

Recordset objects of the Snapshot type provide bi-directional scrolling capability by default. If you only need to make a single pass through the records of a Snapshot, you can Or the DB_FORWARDONLY flag with any other existing flag in OpenRecordset()'s intOptions argument. This makes the Snapshot more efficient, because the result set is not copied to a scrollable buffer area.

With ODBC dynasets, use CacheStart, CacheFill, and CacheSize functionality

Microsoft Access 2.0's CacheSize and CacheStart properties let you store all or a part of the data contained in a dynaset-type Recordset in local memory. Local caching of rows dramatically speeds up operations when traversing dynaset rows bi-directionally and shows significant improvements even when moving in a forward only direction.

To use caching, you specify the number of rows to be stored by the value of CacheSize (Long) and the beginning row by the bookmark stored as the value of the CacheStart (String) property. Applying the FillCache method fills the cache with server data. Fetches within the cache boundary occur locally, speeding display of the cached records in a datasheet or in a continuous form. The allowable range of the value of CacheSize is between five and 1,200 records; if the size of the cache exceeds available memory, the excess records spill into a temporary disk file. Typically, you set the value of CacheSize to 100. To recover the cache memory, set CacheSize = 0.

Once you establish the cache, you need to keep moving its location to stay synchronized with the set of records you're working with. Using a cache can provide twice the performance of not specifying cache. If the application requires backward scrolling within the cached region the performance improvements will be even larger. Depending on your scenario, using a cache may be faster than using a read-only forward-only snapshot (especially if the snapshot contains memo or long binary fields which may only be referenced occasionally).

The size of the cache you use will be determined by the application needs. For example, if you are displaying these records to the user, then you might use a cache size determined by the number of rows they can have on the screen. If the code has no user interaction, then a tradeoff between local memory availability, network traffic, row size, and fetch delay can be made, usually by experimentation.

Store queries instead of using dynamic SQL text - especially low memory machines

DAO makes it easy to create SQL statements on the fly. As a general guideline, it is better to create stored queries and to open those directly rather than use SQL text in your code. The advantages include:

faster performance on low memory machines because the query compiler won't be needed and hence won't take away valuable memory from your application. Machines with plenty of memory will find that query compilation step is fairly fast, so don't be mislead by this if you intend the final application to be used on low memory machines (or high memory machines with lots of other applications running);
faster performance because the query compilation is avoided;
better maintainability because the query is not buried in amongst the rest of your source code;
better maintainability because you can modify the query to adapt to data layout changes without modifying the source code;
easier auditing because the queries are easier to find; and
better security capabilities because you can take advantage of "Run with Owner Access permissions" to grant users limited access to tables that they otherwise reach.

The example code allows you to time a stored versus a dynamic query. However it is not really designed to help you evaluate the hit on the first query compilation, for example. However, running it on a low memory machine (e.g. 6-8Mb) should give you good feel for the trade-offs involved.

Parameterize queries instead of using dynamic SQL text (especially for ODBC)

Note This tip is especially important for users with limited memory. Avoiding query compilation means that quite a large part of the Jet code will not be used, leaving the memory available for your application to use. See also #7.

A common requirement applications have is to build a SQL string based on values that the user provides. This often leads to code that looks like the following, which finds all authors whose name start with a given letter firstletter:

Dim d As Database, rs As recordset
Set d = OpenDatabase("biblio.mdb")
firstletter = "G"
sqlstr = "Select * from Authors where Author like """ & firstletter & "*"""
Set rs = d.OpenRecordset(sqlstr)
Debug.Print rs!Author

This code uses Basic to build a SQL string and then asks Jet to execute that SQL statement. This technique is often referred to as using dynamic SQL. Some of the reasons given for using this are:

It is easy and intuitive.
You don't clutter up the .MDB with queries.
There's no other way - not true!

Unfortunately the penalties with doing this can be quite high - especially when using ODBC data. The main steps that happen are:

1. Jet parses the SQL string.

2. Jet compiles the SQL string, determining which parts go to the ODBC server.

3. Jet builds a server specific query and sends it to the ODBC server.

4. The ODBC server parses and compiles the SQL string.

5. The ODBC server retrieves the results and returns them to Jet.

6. DAO looks at the results and builds the appropriate recordset and field objects.

Another alternative to creating SQL strings dynamically is to use parameterized queries that you create ahead of time. These are queries that have variables in place of actual values in the SQL statement. Your code gathers the values for these variables from the user and then tells the engine what they are. Once the parameterized query has been saved and compiled, Jet can do the following, omitting several steps above:

1. Jet sends a parameterized query to the ODBC server.

2. The ODBC server parses and compiles the SQL string.

3. Jet passes the values for the parameters.

4. The ODBC server retrieves the results and returns them to Jet.

Already this is less steps than above. However, if you wish to get more values from the user and run the same query with different values, you only need to repeat steps 3 and 4. This is a substantial saving, especially for client-server applications. (Note: if you're not interested in updating the results and the query uses only server data, pass-through queries might be even better.)

To do this in DAO, you use the Parameters collection on a Querydef. For example, the following code creates the query (which you would normally do once through the Microsoft Access query designer):

' This code only executed once - or done in query designer
Dim qd As querydef
Set qd = d.CreateQueryDef("AuthorLike", "PARAMETERS FirstLetter Text; SELECT DISTINCTROW Authors.Au_ID, Authors.Author FROM Authors WHERE ((Authors.Author Like [FirstLetter]))")
qd.Close

Once the query is created, you can set the parameters from code as follows:

firstletter = InputBox("Enter first letter of author's name")
Set qd = d.OpenQueryDef("AuthorLike")
qd!firstletter = "A*"
Set ds = qd.CreateDynaset()
...
qd!firstletter = "S*"
Set ds = qd.CreateDynaset()

As you can see, the code is shorter, easier to understand, easier to maintain - and actually runs faster as well.

Open databases exclusive and read-only if for fastest single-user network performance

If you open an .MDB file both exclusive and read-only, this will avoid the creation of the .LDB file and will also stop any locks from being placed on the database. Over a network, this will result in less network traffic and speed up access because no file locks are being placed. On a local hard disk, the differences are likely to be negligible. Similarly it may be hard to measure the differences on a lightly loaded network.

With ODBC SQL statements, use pass-through where possible

One of the major enhancements to Microsoft Jet 2.0 was the ability to create pass-through queries. With an ordinary query, Microsoft Jet compiles and runs the query, combining the power of the server with the capabilities of the Jet database engine. With a pass-through query, you enter an SQL statement that Microsoft Jet sends directly to the server without stopping to compile the query.

Pass-through queries offer the following advantages over other queries. Some of the more important ones are:

Pass-through queries can use server-specific functionality — such as stored procedures and server-based intrinsic functions — that have no Visual Basic equivalent.
Pass-through queries can log informational messages returned by the server, such as warnings and statistics.
Update, delete, and append pass-through action queries are faster than action queries based on attached remote tables, especially when many records are affected. However, a pass-through action query cannot partially succeed as an ordinary action query can. (In addition, you are limited to action queries that your server can run; if an action query uses multiple tables, you may have no choice but to use an ordinary query.)

Full details on pass-through queries can be found by searching Microsoft Access Help for SQL: pass-through queries. See also the Developing Client/Server Applications chapter of Building Application in the Access Developer's Toolkit.

Replace DAO code loops with the equivalent SQL statements - but not always

As a general rule, it is better to use a single stored query than to write the equivalent code in DAO. All the reasons given in tip #7 apply, except even more so when it comes to maintainability. However, as the sample code demonstrates, there are scenarios where the DAO code will be faster than the equivalent Jet 2.0 query. In particular, if you're doing a simple single table update it may be quicker to use DAO code rather than using a SQL query. This will only be true if:

You can open the table as a table-type recordset.
You are not dealing with ODBC data.
You must use transactions.
You are not updating a large number of fields.
It is a simple, single table update (i.e., not a joined update).

In the above case, you may wish to experiment with using code instead of a SQL statement - knowing that you give up the maintainability of a stored query in doing so.