Saturday, 21 December 2013

SQL Server indexing basics – a bit of revision

This post really just for me. You may get something from it but don’t expect very much! It’s been a while since I visited SQL Server indexing fundamentals so I thought I’d do a bit of revision and this post is really just my notes.

Pages

“The fundamental unit of data storage in SQL Server is the page. The disk space allocated to a data file (.mdf or .ndf) in a database is logically divided into pages numbered contiguously from 0 to n. Disk I/O operations are performed at the page level. That is, SQL Server reads or writes whole data pages.” [1]

  • The page size is 8 KB
  • This means databases have 128 pages per megabyte
  • Each page begins with a 96-byte header including:
    • Page number
    • Page type
    • Amount of free space on the page
    • Allocation unit ID of the object that owns the page

 

http://i.technet.microsoft.com/dynimg/IC147464.gif

 

Extents

“Extents are the basic unit in which space is managed. An extent is eight physically contiguous pages, or 64 KB. This means SQL Server databases have 16 extents per megabyte.” [1]

 

Heaps

“A heap is a table without a clustered index.” [2]

“If a table is a heap and does not have any nonclustered indexes, then the entire table must be examined (a table scan) to find any row.” [2]

  • Do not use a heap when
    • the data is frequently returned in a sorted order
    • the data is frequently grouped together
    • ranges of data are frequently queried from the table
    • there are no nonclustered indexes and the table is large

 

Indexes

“An index is an on-disk structure associated with a table or view that speeds retrieval of rows from the table or view. An index contains keys built from one or more columns in the table or view. These keys are stored in a structure (B-tree) that enables SQL Server to find the row or rows associated with the key values quickly and efficiently.” [3]

 

Clustered Indexes

  • Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order. [3]

  • The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap. [3]

 

  • Clustered indexes
    • have a row in sys.partitions (with index_id = 1)
    • determine the physical order of data
  • Consider using when
    • There are a large number of distinct values in a column
    • On columns that are
      • frequently accessed
      • frequently searched for a range of values
    • Queries return very large result sets

 

Nonclustered indexes

  • Nonclustered indexes
    • do not affect the physical order of the data rows
    • have a structure separate from the data rows
    • contain the nonclustered index key values and each key value entry has a pointer to the data row that contains the key value
  • The pointer from an index row in a nonclustered index to a data row is called a row locator

 

Primary keys

  • A primary key automatically creates a clustered index, except when
    • a nonclustered primary key has been explicitly specified
    • a clustered index already exists

“Indexes are automatically created when PRIMARY KEY and UNIQUE constraints are defined on table columns. For example, when you create a table and identify a particular column to be the primary key, the Database Engine automatically creates a PRIMARY KEY constraint and index on that column.” [3]

 

Unique indexes (unique constraints)

  • A unique index is created when you create PRIMARY KEY or UNIQUE constraint. [4]
  • You can create a unique index independent of a constraint or when you create an indexed view

 

Filtered indexes

“A filtered index is an optimized nonclustered index especially suited to cover queries that select from a well-defined subset of data. It uses a filter predicate to index a portion of rows in the table. A well-designed filtered index can improve query performance as well as reduce index maintenance and storage costs compared with full-table indexes.” [5]

  • Use a filtered index when a well defined subset of results are part of a select statement.

 

Index Maintenance

Reorganise

  • Physically reorganises the leaf nodes of the index only

Rebuild

  • Drops the existing index and recreates it

Tips

  • Avoid DBCC SHRINKDB because it increases index fragmentation
  • Put the clustered index on the column that is distinct and increasing in value to avoid fragmentation
  • Check index fragmentation and then
    • If fragmentation is >5% <= 30% then reorganise the index
    • If fragmentation >30% then rebuild the index [6]

 

Beware

  • Too many indexes on a table can reduce performance
    • Execution plan can be less efficient
    • Queries can be less efficient
    • Queries may use the wrong – less efficient – index
  • Duplicate indexes offer no advantages and can reduce performance during inserts, updates and deletions
  • Unused indexes may also reduce performance during inserts, updates and deletions

 

References

[1] Understanding Pages and Extents – SQL 2008 documentation

[2] Heaps (Tables without Clustered Indexes) – SQL 2012 documentation

[3] Clustered and Nonclustered Indexes Described – SQL Server 2012 documentation

[4] Create Unique Indexes - SQL Server 2012 documentation

[5] Create Filtered Indexes - SQL Server 2012 documentation

[6] Reorganize and Rebuild Indexes - SQL Server 2012 documentation