Efficient Index Design
Efficient index design is based on an understanding of the SQL Server optimizer as well as the internal index storage structures and the queries that will access the data. Therefore, the following guidelines should be applied to the design of optimal indexes.
General Guidelines
Examine all queries in detail based on the query design guidelines. Base index design on the optimization characteristics of the queries. Important optimizer characteristics are as follows:
- An UPDATE will be direct, provided that a unique index exists from which the query optimizer can determine that no more than one row qualifies for the update and that the target table is the only table in the UPDATE statement. Moreover, the columns being updated must be fixed length rather than variable length. (NULLs are internally stored as variable-length data type columns.)
- A DELETE will be direct, provided that a unique index exists from which the query optimizer can determine that either zero or one row qualifies for the update and that the target table is the only table in the DELETE statement.
- An INSERT will be direct, provided that the table into which the rows are being inserted cannot be a table that is being read from in the same command. In addition, if rows are being inserted into the target table, then no columns from the target table can be used in the WHERE clause of the query.
- Indexes are more efficient when narrow. Narrow indexes result in more index rows per page and fewer index levels. Consequently, a greater number of index key values can be cached, which results in fewer I/Os.
- The SQL Server optimizer is capable of analyzing a great number of index and join possibilities. Thus, a greater number of narrow indexes provides the optimizer with more selections than only a few wide indexes. However, do not maintain unnecessary or under-used indexes because they add to storage and maintenance overhead.
- The SQL Server optimizer only maintains distribution statistics on the most significant columns of a compound, composite, or multicolumn index. Therefore, selectivity should be great for the first column of the index.
- A large number of indexes on a table will affect UPDATE, INSERT, and DELETE performance, since all indexes will be appropriately adjusted. In addition, all paging operations are logged, causing further I/O.
- Indexing a column that is frequently updated will severely degrade performance.
- Because of storage overhead and I/O considerations, a small contrived index will result in better performance than a larger index. However, there may be a significant tradeoff with respect to maintaining the contrived column.
- Attempt to analyze the frequency of use of each critical query because this will point to the indexes that are used the most. Accordingly, apply appropriate optimizations to these indexes first.
- Any column in the WHERE clause of a query is a potential index column since the optimizer is primarily focused on this clause.
- Indexing small tables (less than one extent (eight pages) in size) is not optimal because it is usually faster and cheaper to perform a table scan.
- Design indexes that will generate the fewest number of I/Os by using page reads. An effective approach is to calculate the number of index rows per page and analyze the worst case scenario with respect to the number of reads required to obtain the solution set.
Clustered Indexes
The following guidelines apply to the design of clustered indexes:
- Use the primary key column(s) when used for the WHERE clause of queries.
- Examine any columns used in joins.
- Queries that return many rows may benefit from the use of a clustered index.
- A column that contains a lot of redundant data values (moderate number of unique values) is a candidate for a clustered index.
- Columns accessed by range are excellent candidates for clustered indexes.
- Clustered indexes should not be built on columns that undergo frequent change because this will result in the entire row moving. You must be especially aware of this in high-volume transaction-processing systems where data tends to be volatile.
- Columns used in conjunction with ORDER BY or GROUP BY clauses are candidates for clustered indexes. If such an index exists on the column(s) specified in the ORDER BY, then a worktable will not be created, since the rows are already ordered. GROUP BY always results in the creation of a worktable.
- Columns that are accessed sequentially are candidates for clustered indexes.
- Do not use clustered indexes for "covered queries." The more columns there are in an index, the greater the chance for index column change, resulting in excessive page I/O.
Nonclustered Indexes
The following guidelines apply to the design of nonclustered indexes:
- Nonclustered indexes are exceptional for use by queries where you get an exact match.
- Decision support systems should take advantage of nonclustered indexes since there are few data update, insert, or delete operations. Therefore, a liberal number of nonclustered indexes can be built to aid in information retrieval. However, you may want to drop and rebuild these indexes during data refresh cycles to avoid index-manipulation overhead.
- Covered queries are queries in which all the fields identified in the SELECT statement are represented in the nonclustered index; therefore, SQL Server can satisfy the query without accessing the data page. Adding columns to these indexes to enable a covered query can improve performance, but this approach needs to be weighed against the increased update and storage costs of maintaining extra columns in the index.