Av rating:
Total votes: 130
Total comments: 24


Robert Sheldon
SQL Server Index Basics
25 November 2008

Given the fundamental importance of indexes in databases, it always comes as a surprise how often the proper design of indexes is neglected. It often turns out that the programmer understands detail, but not the broad picture of what indexes do. Bob Sheldon comes to the rescue with a simple guide that serves either to remind or educate us all!

One of the most important routes to high performance in a SQL Server database is the index. Indexes speed up the querying process by providing swift access to rows in the data tables, similarly to the way a book’s index helps you find information quickly within that book. In this article, I provide an overview of SQL Server indexes and explain how they’re defined within a database and how they can make the querying process faster. Most of this information applies to indexes in both SQL Server 2005 and 2008; the basic structure has changed little from one version to the next. In fact, much of the information also applies to SQL Server 2000. This does not mean there haven’t been changes. New functionality has been added with each successive version; however, the underlying structures have remained relatively the same. So for the sake of brevity, I stick with 2005 and 2008 and point out where there are differences in those two versions.

Index Structures

Indexes are created on columns in tables or views. The index provides a fast way to look up data based on the values within those columns. For example, if you create an index on the primary key and then search for a row of data based on one of the primary key values, SQL Server first finds that value in the index, and then uses the index to quickly locate the entire row of data. Without the index, a table scan would have to be performed in order to locate the row, which can have a significant effect on performance.

You can create indexes on most columns in a table or a view. The exceptions are primarily those columns configured with large object (LOB) data types, such as image, text, and varchar(max). You can also create indexes on XML columns, but those indexes are slightly different from the basic index and are beyond the scope of this article. Instead, I'll focus on those indexes that are implemented most commonly in a SQL Server database.

An index is made up of a set of pages (index nodes) that are organized in a B-tree structure. This structure is hierarchical in nature, with the root node at the top of the hierarchy and the leaf nodes at the bottom, as shown in Figure 1.

Figure 1: B-tree structure of a SQL Server index

When a query is issued against an indexed column, the query engine starts at the root node and navigates down through the intermediate nodes, with each layer of the intermediate level more granular than the one above. The query engine continues down through the index nodes until it reaches the leaf node. For example, if you’re searching for the value 123 in an indexed column, the query engine would first look in the root level to determine which page to reference in the top intermediate level. In this example, the first page points the values 1-100, and the second page, the values 101-200, so the query engine would go to the second page on that level. The query engine would then determine that it must go to the third page at the next intermediate level. From there, the query engine would navigate to the leaf node for value 123. The leaf node will contain either the entire row of data or a pointer to that row, depending on whether the index is clustered or nonclustered.

Clustered Indexes

A clustered index stores the actual data rows at the leaf level of the index. Returning to the example above, that would mean that the entire row of data associated with the primary key value of 123 would be stored in that leaf node. An important characteristic of the clustered index is that the indexed values are sorted in either ascending or descending order. As a result, there can be only one clustered index on a table or view. In addition, data in a table is sorted only if a clustered index has been defined on a table.

Note: A table that has a clustered index is referred to as a clustered table. A table that has no clustered index is referred to as a heap.

Nonclustered Indexes

Unlike a clustered indexed, the leaf nodes of a nonclustered index contain only the values from the indexed columns and row locators that point to the actual data rows, rather than contain the data rows themselves. This means that the query engine must take an additional step in order to locate the actual data.

A row locator’s structure depends on whether it points to a clustered table or to a heap. If referencing a clustered table, the row locator points to the clustered index, using the value from the clustered index to navigate to the correct data row. If referencing a heap, the row locator points to the actual data row.

Nonclustered indexes cannot be sorted like clustered indexes; however, you can create more than one nonclustered index per table or view. SQL Server 2005 supports up to 249 nonclustered indexes, and SQL Server 2008 support up to 999. This certainly doesn’t mean you should create that many indexes. Indexes can both help and hinder performance, as I explain later in the article.

In addition to being able to create multiple nonclustered indexes on a table or view, you can also add included columns to your index. This means that you can store at the leaf level not only the values from the indexed column, but also the values from non-indexed columns. This strategy allows you to get around some of the limitations on indexes. For example, you can include non-indexed columns in order to exceed the size limit of indexed columns (900 bytes in most cases).

Index Types

In addition to an index being clustered or nonclustered, it can be configured in other ways:

  • Composite index: An index that contains more than one column. In both SQL Server 2005 and 2008, you can include up to 16 columns in an index, as long as the index doesn’t exceed the 900-byte limit. Both clustered and nonclustered indexes can be composite indexes.
  • Unique Index: An index that ensures the uniqueness of each value in the indexed column. If the index is a composite, the uniqueness is enforced across the columns as a whole, not on the individual columns. For example, if you were to create an index on the FirstName and LastName columns in a table, the names together must be unique, but the individual names can be duplicated.

A unique index is automatically created when you define a primary key or unique constraint:

    • Primary key: When you define a primary key constraint on one or more columns, SQL Server automatically creates a unique, clustered index if a clustered index does not already exist on the table or view. However, you can override the default behavior and define a unique, nonclustered index on the primary key.
    • Unique: When you define a unique constraint, SQL Server automatically creates a unique, nonclustered index. You can specify that a unique clustered index be created if a clustered index does not already exist on the table.
  • Covering index: A type of index that includes all the columns that are needed to process a particular query. For example, your query might retrieve the FirstName and LastName columns from a table, based on a value in the ContactID column. You can create a covering index that includes all three columns.

Index Design

As beneficial as indexes can be, they must be designed carefully. Because they can take up significant disk space, you don’t want to implement more indexes than necessary. In addition, indexes are automatically updated when the data rows themselves are updated, which can lead to additional overhead and can affect performance. As a result, index design should take into account a number of considerations.

Database

As mentioned above, indexes can enhance performance because they can provide a quick way for the query engine to find data. However, you must also take into account whether and how much you’re going to be inserting, updating, and deleting data. When you modify data, the indexes must also be modified to reflect the changed data, which can significantly affect performance. You should consider the following guidelines when planning your indexing strategy:

  • For tables that are heavily updated, use as few columns as possible in the index, and don’t over-index the tables.
  • If a table contains a lot of data but data modifications are low, use as many indexes as necessary to improve query performance. However, use indexes judiciously on small tables because the query engine might take longer to navigate the index than to perform a table scan.
  • For clustered indexes, try to keep the length of the indexed columns as short as possible. Ideally, try to implement your clustered indexes on unique columns that do not permit null values. This is why the primary key is often used for the table’s clustered index, although query considerations should also be taken into account when determining which columns should participate in the clustered index.
  • The uniqueness of values in a column affects index performance. In general, the more duplicate values you have in a column, the more poorly the index performs. On the other hand, the more unique each value, the better the performance. When possible, implement unique indexes.
  • For composite indexes, take into consideration the order of the columns in the index definition. Columns that will be used in comparison expressions in the WHERE clause (such as WHERE FirstName = 'Charlie') should be listed first. Subsequent columns should be listed based on the uniqueness of their values, with the most unique listed first.
  • You can also index computed columns if they meet certain requirements. For example, the expression used to generate the values must be deterministic (which means it always returns the same result for a specified set of inputs). For more details about indexing computed columns, see the topic “Creating Indexes on Computed Columns” in SQL Server Books Online.

Queries

Another consideration when setting up indexes is how the database will be queried. As mentioned above, you must take into account the frequency of data modifications. In addition, you should consider the following guidelines:

  • Try to insert or modify as many rows as possible in a single statement, rather than using multiple queries.
  • Create nonclustered indexes on columns used frequently in your statement’s predicates and join conditions.
  • Consider indexing columns used in exact-match queries.

Index Basics

In this article, I’ve tried to give you a basic overview of indexing in SQL Server and provide some of the guidelines that should be considered when implementing indexes. This by no means is a complete picture of SQL Server indexing. The design and implementation of indexes are an important component of any SQL Server database design, not only in terms of what should be indexed, but where those indexes should be stored, how they should be partitioned, how data will be queried, and other important considerations. In addition, there are index types that I have not discussed, such as XML indexes as well as the filtered and spatial indexes supported in SQL Server 2008. This article, then, should be seen as a starting point, a way to familiarize yourself with the fundamental concepts of indexing. In the meantime, be sure to check out SQL Server Books Online for more information about the indexes described here as well as the other types of indexes.



This article has been viewed 14848 times.
Robert Sheldon

Author profile: Robert Sheldon

After being dropped 35 feet from a helicopter and spending the next year recovering, Robert Sheldon left the Colorado Rockies and emergency rescue work to pursue safer and less painful interests—thus his entry into the world of technology. He is now a technical consultant and the author of numerous books, articles, and training material related to Microsoft Windows, various relational database management systems, and business intelligence design and implementation. He has also written news stories, feature articles, restaurant reviews, legal summaries, and the novel 'Dancing the River Lightly'. You can find more information at http://www.rhsheldon.com.

Search for other articles by Robert Sheldon

Rate this article:   Avg rating: from a total of 130 votes.


Poor

OK

Good

Great

Must read
 
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: Multiple Indexes
Posted by: DaveB (not signed in)
Posted on: Wednesday, November 26, 2008 at 3:40 AM
Message: One question I have on indexes which I am struggling to get a definitive answer on is, is SQL Server clever enough to use more than one index on the same table when reading?

For example, imagine I have a sales order table. Some queries list all orders for a customer. Other queries list all orders for a date.
To make these queries faster I build two indexes, one on each column respectives.

Now if a query is run which asks for all orders for a specific customer and specific date, will SQL Server use both indexes or will the query optimizer choose the best single index to limit the data and then scan the pages returned for the records it needs, ignoring the second index?

Subject: Multiple Indexes
Posted by: GunterKDR (not signed in)
Posted on: Wednesday, November 26, 2008 at 7:01 AM
Message: SQL2005 will consider using more than one. Ms. Tripp has some very useful presentations and seminars covering this topic.



Subject: Multiple Indexes
Posted by: MilesM (view profile)
Posted on: Wednesday, November 26, 2008 at 11:48 AM
Message: The optimizer in SS2K5 will consider doing an index intersection or index union depending on the situation. If one of the indexes is a clustered index, it may just choose that over the index intersection.
A good reference book on indexes is "Inside Microsoft SQL Server 2005 Query Tuning and Optimization" by Kalen Delaney and others. It goes into great details on query execution plans etc.

Subject: composite index
Posted by: Anonymous (not signed in)
Posted on: Thursday, November 27, 2008 at 8:33 PM
Message: I always thought that the order of columns in composite index makes no difference. I read about it in the past more than once. It was true for SQL Server 6.5/7/2000.
Any comment on this?

Subject: Ms. Tripp
Posted by: jlangdon (not signed in)
Posted on: Thursday, November 27, 2008 at 9:32 PM
Message: I read Paul Randall and Kimberly Tripp all the time. Can you provide a link to some of Ms. Tripp's sessions on this subject? Thank you.

Subject: Re: composite index
Posted by: GilaMonster (view profile)
Posted on: Friday, November 28, 2008 at 3:52 PM
Message: "I always thought that the order of columns in composite index makes no difference."

It makes a large difference. Consider a table with 4 columns (A, B, C and D) and a nonclustered index on A, B. That index is seekable for queries of the following forms

WHERE A = @A and B = @B
WHERE A = @A and B > @B
WHERE A = @A
WHERE A > @A

It is not seekable for a query
WHERE B = @B

The latter would be like asking someone to use the phone book (which can be imagined as an index on surname, firstname, address) to find everyone named "Matthew" who lives in Chicago.
It's doable, but it requires reading the entire book (scanning).
Finding everyone named "Matthew Brown" is significantly easier.

Subject: RE: composite index
Posted by: Anonymous (not signed in)
Posted on: Monday, December 01, 2008 at 7:03 AM
Message: @GilaMonster

- is it better to have only one nonclustered index on A, B or is it equal to have a nonclustered index on A and a second nonclustered index on B when using those statments:

WHERE A = @A and B = @B
WHERE A = @A and B > @B
WHERE A = @A
WHERE A > @A

Subject: What constitutes a heavily updated table
Posted by: ChasF (not signed in)
Posted on: Monday, December 01, 2008 at 8:27 AM
Message: Great article and very useful for an absolute beginner like me.

One small question I have is to do with Robert's statement:

"For tables that are heavily updated, use as few columns as possible in the index, and don’t over-index the tables".

Where a table has new rows added regularly, but those rows do not subesequently get changed, does this constitute a heavily updated table? I'm hoping it doesn't.


Subject: Included Columns Ques.
Posted by: VicF (not signed in)
Posted on: Monday, December 01, 2008 at 9:07 AM
Message: Good Article. I am sending it to others. One question....

How do you include a non indexed column at the leaf level?

Subject: Included Columns Ques.
Posted by: VicF (not signed in)
Posted on: Monday, December 01, 2008 at 9:09 AM
Message: Good Article. I am sending it to others. One question....

How do you include a non indexed column at the leaf level?

Subject: Excellent Article and Even better questions
Posted by: Mross01 (not signed in)
Posted on: Monday, December 01, 2008 at 9:38 AM
Message: I have struggled explaining all the different possibilities to my SQL Developers on how to use idexes, this article makes it short and easy to understand. Now I have somewhere to point.

The questions also have been excellent. Glad to see a blog without all the rift raft that usually goes on.

I am especially interested in the answer to the question, "How do you include a non indexed column at the leaf level?".

Inserts do matter with indexes, any time up update or insert a row the indexes are also inserted or updated. So if a lot of inserts are done, indexes can hurt performance. There is no real world measurement (cut off points) of what constitutes when indexes are good or bad for heavily updated tables. Sometimes you just have to try and balance the performance on both sides, reading and updating. Sometimes its the user's perception or the window for inserts that matters. On bulk inserts, sometimes (sometimes), it's better to drop the indexes and then rebuild them after the load is done.

Subject: Re: Excellent Article and Even better questions
Posted by: MMCXII (not signed in)
Posted on: Monday, December 01, 2008 at 11:44 AM
Message: Good point made by Mross01.

I'd also add that if you have a table that's getting a lot of inserts and updates, but which you need to report against as well, it is sometimes a good idea to minimize or eliminate indexes on that table, but create a duplicate table just for reporting. You can index the heck out of the duplicate table, and just truncate and reload it periodically from the original.

This assumes you don't need up-to-the-second reporting on the table, but that is rarely a real business need. In most cases, you can get away with data that is slightly aged.

Subject: small tables
Posted by: Anonymous (not signed in)
Posted on: Monday, December 01, 2008 at 1:59 PM
Message: "...use indexes judiciously on small tables because the query engine might take longer to navigate the index than to perform a table scan..."

What rule of thumb can I use to tell me what a small table is? Are the number and size of the columns to be taken into consideration as well as the number of rows?

Regards

GPO

Subject: LARGE Tables
Posted by: Krishna (view profile)
Posted on: Monday, December 01, 2008 at 2:22 PM
Message: We have a Database(SQl 2000) which is updated every hour (150 Entries per Hour. Each entry has a 120 columsn, 5 DateTime and the rest are Numerical) using a ETL package. Now we have a database which dates back to 2 years. Over the years, the performanace has been slowing down. One reason, I can think of is due to the vast data. I used the SQL 2000 index tuning, but the results were not pleasing.

Are there any other Indexing Machisms or Other tuning procedures that can be done to improve the performance?

Any sort of advice would be apreciated.

Email: krishnat@gwsolutions.com

Subject: LARGE DATABASE
Posted by: krishnat (view profile)
Posted on: Monday, December 01, 2008 at 2:25 PM
Message: We have a Database(SQl 2000) which is updated every hour (150 Entries per Hour. Each entry has a 120 columsn, 5 DateTime and the rest are Numerical) using a ETL package. Now we have a database which dates back to 2 years. Over the years, the performanace has been slowing down. One reason, I can think of is due to the vast data. I used the SQL 2000 index tuning, but the results were not pleasing.

Are there any other Indexing Machisms or Other tuning procedures that can be done to improve the performance?

Any sort of advice would be apreciated.

Email: krishnat@gwsolutions.com

Subject: What's the difference between a Composite Index and a Covering Index?
Posted by: Anonymous (not signed in)
Posted on: Tuesday, December 02, 2008 at 7:19 AM
Message: What's the difference between a Composite Index and a Covering Index.
From the article makes them sound identical.

Subject: Re: What's the difference between a Composite Index and a Covering Index?
Posted by: Anonymous (not signed in)
Posted on: Tuesday, December 02, 2008 at 7:59 AM
Message: A "composite index" is simply an index on more than one column. As stated above, a "covering index," by definition, will contain all of the columns necessary for the optimizer to execute the query; all data will be returned from the index and the clustered index (or heap) will not be read.

Unless you are SELECTing only one column, a "covering index" will always be a "composite index," but a "composite index" is not always (and most likely will not be) a "covering index."

Hope this helps!

Subject: RE: composite index
Posted by: Anonymous (not signed in)
Posted on: Tuesday, December 02, 2008 at 8:17 AM
Message: The order of columns in a "composite index" will also dictate whether the optimizer does an index scan vs. an index seek. Most of the time, a seek is better than a scan. For example, you have a "composite index" on col1 and col2, in that order on table t1 and you SELECT col1, col2 FROM t1 WHERE col2 = x, the optimizer will perform an index scan because the first column in the index was not referenced in the WHERE clause; the optimizer can't traverse the b-tree structure because it's missing the data necessary to traverse the pages, top-down.

An index scan is just like a table scan except that it's the index's leaf level that's being scanned and not the table's leaf level (clustered index or heap).

If you executed SELECT col2 FROM t1 WHERE col1 = x, the the optimizer would use an index seek because the first column in the index is the one the WHERE clause referenced. The optimizer now has enough information to traverse the b-tree structure from top to bottom.

It is also my understanding that statistics are only computed on the first column of any index and that can affect whether or not the optimizer chooses to utilize an index. So, in closing, the order of columns in a "compostite index" definition is very important.

Subject: RE: composite index
Posted by: Anonymous (not signed in)
Posted on: Tuesday, December 02, 2008 at 8:31 AM
Message: The order of columns in a "composite index" will also dictate whether the optimizer does an index scan vs. an index seek. Most of the time, a seek is better than a scan. For example, you have a "composite index" on col1 and col2, in that order on table t1 and you SELECT col1, col2 FROM t1 WHERE col2 = x, the optimizer will perform an index scan because the first column in the index was not referenced in the WHERE clause; the optimizer can't traverse the b-tree structure because it's missing the data necessary to traverse the pages, top-down.

An index scan is just like a table scan except that it's the index's leaf level that's being scanned and not the table's leaf level (clustered index or heap).

If you executed SELECT col2 FROM t1 WHERE col1 = x, the the optimizer would use an index seek because the first column in the index is the one the WHERE clause referenced. The optimizer now has enough information to traverse the b-tree structure from top to bottom.

It is also my understanding that statistics are only computed on the first column of any index and that can affect whether or not the optimizer chooses to utilize an index. So, in closing, the order of columns in a "compostite index" definition is very important.

Subject: Included Columns and Covering Index
Posted by: Diana Dee (not signed in)
Posted on: Tuesday, December 02, 2008 at 12:18 PM
Message: I have only one criticism of the article. Sheldon says:

" ... you can also add included columns to your index. This means that you can store at the leaf level not only the values from the indexed column, but also the values from non-indexed columns. This strategy allows you to get around some of the limitations on indexes. For example, you can include non-indexed columns in order to exceed the size limit of indexed columns (900 bytes in most cases)."

Later on, he defines Covering Index.

Now, if you think about it, an included column occurs only in the leaf level, so it is not useful for looking up a value. So why would you include a column at the leaf level only? Surely the purpose cannot be to "get around" a limitation on index length. The purpose is to create a covering index without lengthening the index key.

):-D

Subject: Large Database
Posted by: Dan (view profile)
Posted on: Tuesday, December 02, 2008 at 1:13 PM
Message: Krishna,

Indexes must be maintained. With 2000 lookup DBCC SHOWCONTIG in bol with 2005 and 2008 lookup dm_db_index_physical_stats. Be careful some maintenance operations are offline and will lock tables.

Dan

Subject: clarification for Diana Dee
Posted by: Karl (not signed in)
Posted on: Wednesday, December 03, 2008 at 9:28 AM
Message: Included columns are still useful for creating covering indexes. I'm not positive how SQL Server creates the interior nodes of its indexes but I know that other systems only store the key and a pointer to a node further down the tree. The diagram in the article suggests that this is also true for SQL Server. Therefore, the index must be followed all the way to the leaf to find the information necessary to find the row in the clustered index (we should generally have a clustered index if we have non-clustered ones). Since included columns are in the leaf node, the query does not have to probe into the clustered index to find that information. You can see this in action by examining query plans. A query that can get all fields from an index will only process an index but a query that needs more fields will have a join to the clustered index. As mentioned in the article, the advantage of including the non-key fields only in the leaf nodes is that it is not taking up space in the intermediate nodes.

Subject: I think Index doesn't work
Posted by: Marjan (not signed in)
Posted on: Wednesday, December 10, 2008 at 12:12 PM
Message: I have a database with a table that have about 2500000 records and add records to it every minutes. I had have an index on two varchar columns. Now I have drop that index and create an index on two another columns. but it doesn't work. I redesign the database and import data, but again search speed doesn't changed.
What is the solution?

Subject: Anonymous commenting disabled
Posted by: Chris Massey (view profile)
Posted on: Wednesday, December 17, 2008 at 8:12 AM
Message: Anonymous commenting has had to be disabled due to spam. If you want to leave a comment you're either have to sign in or sign up. Sorry for any inconvenience.

 









Phil Factor
The Data Center that Exploded
 A while back, in a Simple-Talk editorial meeting, someone bet Phil that he couldn't come up with a Halloween story.... Read more...



 View the blog
SQL Server 2008: Performance Data Collector
 With Performance Data Collector in SQL Server 2008, you can now store performance data from a number of... Read more...

SQL Toolbelt 2008: Predominantly an Engineering Task
 The conversion of the Red-Gate tools to be compatible with SQL Server 2008 might not seem, on first... Read more...

Audit Crosschecks
 In this short article, the second of a 2-part series, William suggests a solution, using SQL Data... Read more...

SQL Response: The dim sum interview
 Richard Morris met David and Nigel of the SQL Response team, in a dim sum Restaurant in Cambridge. They... Read more...

XML Jumpstart Workbench
 In which Robyn and Phil decide that the best way of starting to learn XML is to jump in and take a ride... Read more...

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 Database design and implementation is the cornerstone of any data centric project (read 99.9% of... Read more...

SQL Server Full Text Search Language Features
 SQL Full-text Search (SQL FTS) is an optional component of SQL Server 7 and later, which allows fast... Read more...

Beginning SQL Server 2005 Reporting Services Part 2
 Continuing his in-depth tour of SQL Server 2005 Reporting Services, Steve Joubert demonstrates the most... Read more...

Executing SSIS Packages
 Nigel Rivett demonstrates how to execute all SSIS packages in a given folder using either an SSIS... Read more...

Over 150,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.

Join Simple Talk