Unlike many nonrelational systems, relational indexes are not considered part of the logical database design. Indexes can be dropped, added, and changed without impacting the database schema or application design in any way other than performance. Efficient index design is paramount to achieving good SQL Server performance. For these reasons, you should not hesitate to experiment with different indexes.
The optimizer reliably chooses the most effective index in the majority of cases. The overall index design strategy should be to provide a good selection of indexes to the optimizer, and trust it to make the right decision. This reduces analysis time and gives good performance over a wide variety of situations.
The following are index design recommendations:
Each column listed in the WHERE clause or in the JOIN portions of the FROM clause is a possible candidate for an index. If you have too many queries to examine, pick a representative set or just the slow ones. If your development tool transparently generates SQL code, this is more difficult. Many of these tools allow the logging of the generated SQL syntax to a file or screen for debugging purposes. You may want to find out from the tool vendor if such a feature is available or use SQL Trace to capture the SQL syntax on the server side.
Narrow indexes are often more effective than multicolumn, compound indexes. Narrow indexes have more rows per page and fewer index levels, which improves performance. The SQL Server optimizer maintains distribution statistics on all prefix sets of a compound index. Therefore, if the first column or columns of a compound index have poor selectivity, the optimizer may not use the index.
The optimizer can rapidly and effectively analyze hundreds or even thousands of index and join possibilities. Having a greater number of narrow indexes provides the optimizer with more possibilities to choose from, which usually helps performance. Having a lesser number of wide, multicolumn indexes provides the optimizer with fewer possibilities to choose from, which may hurt performance.
It is often best not to adopt a strategy of emphasizing a fully covered query. It is true that if all columns in your SELECT statement are covered by a nonclustered index, the optimizer can recognize this and provide excellent performance. However, this often results in excessively wide indexes and stakes too much on the hope that the optimizer will use this strategy. Usually, you should use more numerous narrow indexes, which often provide better performance over a wider range of queries.
You should not have more indexes than are necessary to achieve adequate read performance because of the overhead involved in updating those indexes. However, even most update-oriented operations require far more reading than writing. Therefore, do not hesitate to try a new index if you think it will help; you can always drop it later.
Appropriate use of clustered indexes can tremendously increase performance. Even UPDATE and DELETE operations are often accelerated by clustered indexes, since these operations require much reading. You can only have a single clustered index per table, so use this index wisely. Queries that return numerous rows or queries involving a range of values are good candidates for acceleration by a clustered index.
Examples:
SELECT * FROM PHONEBOOK
WHERE LASTNAME='SMITH'
Or
SELECT * FROM MEMBERTABLE
WHERE MEMBER_NO > 5000
AND MEMBER_NO < 6000
By contrast, the LASTNAME or MEMBER_NO columns mentioned above are probably not good candidates for a nonclustered index if this type of query were common. Try to use nonclustered indexes on columns where few rows are returned.
This helps you decide what column is a candidate for a clustered index, nonclustered index, or no index.
The following example query allows you to examine column uniqueness:
SELECT COUNT (DISTINCT COLNAME), COUNT(*)
FROM TABLENAME
This returns the number of unique values in the column and the total number of rows. Compare these values. On a 10,000-row table, 5,000 unique values would make the column a good candidate for a nonclustered index. On the same table, 20 unique values would better suit a clustered index. Three unique values should not be indexed at all. These are only examples, not hard-and-fast rules. Remember to place the indexes on the individual columns listed in the WHERE and FROM clauses of the queries.
A long-running query often is caused by indexing a column with few unique values or performing a JOIN on such a column. This is a fundamental problem with the data and query itself and cannot usually be resolved without identifying this situation. For example, a physical telephone directory sorted alphabetically on last name will not expedite looking up a person if all people in the city are named just SMITH or JONES. In addition to the above query, which gives a single figure for column uniqueness, you can use a GROUP BY query to see the data distribution of the indexed key values. This provides a higher resolution picture of the data, and a better perspective for how the optimizer views the data.
The following example query allows you to examine data distribution of indexed key values, assuming a two-column key on COL1 and COL2.
SELECT COL1, COL2, COUNT(*)
FROM TABLENAME
GROUP BY COL1, COL2
This returns one row for each key value along with a count of the instances of each value. To reduce the number of rows returned, it may be helpful to exclude some with a HAVING clause. For example, to exclude all rows that have a unique key, use the following HAVING clause:
HAVING COUNT(*) > 1
The number of rows returned in a query is also an important factor in index selection. The optimizer considers a nonclustered index to cost at least one page I/O per returned row. At this rate, it quickly becomes more efficient to scan the entire table. This is another reason to restrict the size of the result set or to locate a large result set with a clustered index.
Do not always equate index usage with good performance, and vice versa. If using an index always produced the best performance, the optimizer's job would be simple�always use any available index. In reality, incorrect choice of indexed retrieval can result in poor performance. Therefore, the optimizer's task is to select indexed retrieval where it will help performance and avoid indexed retrieval where it will hurt performance.