In the performance tuning posts (db2 performance issues and tuning part 1, part 2) , we have discussed how indexing can help tuning database performance to a great level but, today we are going to discuss about the proper design of indexes.
As indexes provide easy access to the data rows in a table just like a book’s index, it speeds up the query execution process and thus helps to tune the database performance.
But, what it does actually and how it does?
An example will help us to understand it in a better way.
Say we have a tblOrder table with 100 rows and three simple columns: OrderId, ItemName and ItemPrice.
We will try to find out the orders within a specific price range:
The query would look like:SELECT OrderId, ItemName, ItemPrice
WHERE ItemPrice>10 AND ItemPrice<20
Currently, we assume that this table does not have any index. So, to execute the above query, the processor starts searching each and every row one by one to get the required data.
The below diagram depicts the actual processing done to do so. It definitely becomes an overhead to search all the 100 rows only to get a few set of records.
Now, let us add an index on the ItemPrice data column of this table.
The query will look like:
CREATE INDEX IX_tblOrder_ItemPrice
ON tblOrder (ItemPrice ASC)
It acts just like the index in a book. Adding an index to this column means, it stores the copy of the column’s data and a reference to the row where the actual data is stored. These index entries are also sorted in ascending order by SQL.
The pictorial representation will be somewhat similar to below:
Now, when we execute the above query again with an index added to the table, it will first find the value in the index and then it will take the reference stored in that index to point the actual data row.
Thus index makes the query processing fast and wipes out the overhead of scanning each and every row in a table.
An index is a set of pages or index nodes which are organized in a structure called B-tree structure. It is a hierarchical structure with the root node placed at the top and the leaf nodes at the bottom of the hierarchy as demonstrated in the below figure.
When a query is fired against an indexed column, the query processor starts scanning from the top which is the root node. Slowly it navigates down the intermediate levels or the next levels. The more we go down through the intermediate level, the more granular division comes into picture. The processor continues scanning the index nodes until the leaf node is arrived.
Let us take an example of searching the value of 87 in an indexed column. As it is said earlier, the processor starts scanning from the root node and then gradually it goes down through the intermediate levels till the leaf nodes are reached. Here, we are trying to find the value of 87 which leads to the 1st page of the 1st intermediate level, i.e. the page with 1-100 values. Now it further goes down to get the exact value and it understands that the value can be there in the 2nd page of the 2nd intermediate level which is 51-100. As 51-100 is also not the leaf node, it searches down further and finally reaches to the leaf node and points to the page 76-100 which is the 4th page of the leaf node level.
The leaf node may contain the full data row or the reference to that row. It solely depends on the type of index that we declare such as, clustered index or non-clustered index.
We will discuss the types of indexes in our next post…Stay tuned…