Creating Useful Indexes

The usefulness of an index depends on the selectivity of the data for a query. Selectivity is an estimate of the percentage of the rows in a table that are returned for a query. The SQL Server optimizer uses the distribution page statistics to evaluate the available indexes, estimate the number of page I/Os each would require, and choose the method that results in the fewest logical page I/Os.

Examine the WHERE clause of your queries, since that is the primary focus of the optimizer. Each column included in the WHERE clause is a possible candidate for an index. For optimal performance, consider the following useful indexes for a given column1 in the WHERE clause:

A single-column index on column1
A multicolumn index, where column1 is the first column of the index

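Avoid defining a multicolumn index where column1 is the second or later column in the index. Because the index is ordered by its first column, the optimizer cannot search it directly on column1 alone, so it would not be a useful index for this query.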

For example, given the following query using the pubs database:

SELECT au_id, au_lname, au_fname FROM authors
WHERE au_lname = 'White'

an index on the following columns could be useful to the optimizer:

au_lname
au_lname, au_fname

but an index on these columns would not be useful to the optimizer:

address
au_fname, au_lname
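
As a rough sketch of the useful cases above, either of the following statements would create an index with au_lname as the leading column. The index names are hypothetical, and the pubs sample database may already ship with an index on these columns, so check the existing indexes first. Create one or the other, not both, since the two overlap.

```sql
-- Hypothetical index names; adjust to your own naming convention.

-- Option 1: single-column index on the column used in the WHERE clause.
CREATE NONCLUSTERED INDEX ix_authors_lname
    ON authors (au_lname)

-- Option 2: multicolumn index with au_lname as the FIRST column.
-- Reversing the order to (au_fname, au_lname) would not help a query
-- that filters only on au_lname.
-- CREATE NONCLUSTERED INDEX ix_authors_lname_fname
--     ON authors (au_lname, au_fname)
```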

Consider using narrow indexes with one or two columns. Narrow indexes are often more effective than wide, multicolumn (compound) indexes. Narrow indexes have more rows per page and fewer index levels, which boosts performance. The optimizer can rapidly and effectively analyze hundreds, or even thousands, of index and join possibilities. Having more narrow indexes gives the optimizer more possibilities to choose from, which usually helps performance. Having fewer wide, multicolumn indexes gives the optimizer fewer possibilities to choose from, which may hurt performance.

For multicolumn indexes, SQL Server maintains density statistics (used for joins) on all columns of the index and histogram statistics on the first column of the index. Note that if the first column of a compound index has poor selectivity, the optimizer may not use the index for some queries.
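
One way to look at the statistics kept for an index is DBCC SHOW_STATISTICS. The sketch below assumes a multicolumn index named ix_authors_lname_fname exists on the authors table; that name is hypothetical. The layout of the output differs between SQL Server versions, so treat it as a starting point for inspection rather than a fixed report.

```sql
-- Assumes a (hypothetical) index ix_authors_lname_fname on authors.
-- Displays the distribution statistics SQL Server keeps for that index.
DBCC SHOW_STATISTICS (authors, ix_authors_lname_fname)
```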

Useful indexes can improve the performance of SELECT statements, as well as INSERT, UPDATE, and DELETE statements. However, because changes to a table might affect the indexes on that table, each index can slow down INSERT, UPDATE, and DELETE performance to some degree. Experiment with a new index and examine its effect on performance. Avoid a large number of indexes on a single table, and avoid overlapping indexes that contain shared columns.
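
One way to run such an experiment is to compare the logical reads reported for a query before and after the index is created. This is a minimal sketch, assuming a client tool that treats GO as a batch separator and using the pubs query from earlier in this section as the test case.

```sql
-- Report page I/O counts for each statement that follows.
SET STATISTICS IO ON
GO

-- Run the query under test; note the logical reads reported for authors.
SELECT au_id, au_lname, au_fname FROM authors
WHERE au_lname = 'White'
GO

-- Turn reporting off again when finished.
SET STATISTICS IO OFF
GO
```

Run the batch once without the candidate index and once with it. On a table as small as authors the difference may be negligible, so a realistic test needs a table of representative size.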

Examine the number of unique data values in a column and compare it to the number of rows in the table. This is the selectivity of that column, which can help you decide if a column is a candidate for an index and, if so, what type of index.

You can use the following query to return the number of unique values for a column:

SELECT COUNT (DISTINCT column_name) FROM table_name

This table shows some example return values and the type of index you might consider putting on column_name for a 10,000-row table_name:



Unique values	Index to consider
5000	Nonclustered index
20	Clustered index
3	No index
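
To compare the number of unique values with the table size in one step, the count query can be extended as sketched below. The example uses au_lname in the pubs authors table purely as an illustration, and the column aliases are hypothetical. Note that COUNT(DISTINCT ...) ignores NULL values.

```sql
-- Unique values, total rows, and their ratio for one candidate column.
-- NULLIF guards against division by zero on an empty table.
-- Multiplying by 1.0 forces decimal rather than integer division.
SELECT COUNT(DISTINCT au_lname) AS unique_values,
       COUNT(*) AS total_rows,
       COUNT(DISTINCT au_lname) * 1.0 / NULLIF(COUNT(*), 0) AS distinct_ratio
FROM authors
```

For the 10,000-row example, 5000 unique values gives a ratio of 0.5, 20 gives 0.002, and 3 gives 0.0003.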