In the performance tuning posts (db2 performance issues and tuning part 1, part 2) , we have discussed how
indexing can help tuning database performance to a great level but, today we
are going to discuss about the proper design of indexes.
As indexes provide easy access to the data rows in a
table just like a book’s index, it speeds up the query execution process and
thus helps to tune the database performance.
But, what it does actually and how it does?
An example
will help us to understand it in a better way.
Say we have a tblOrder table with 100 rows and three simple
columns: OrderId, ItemName and ItemPrice.
We will try to find out the orders within a specific price
range:
The query would look like:
SELECT
OrderId, ItemName, ItemPriceFROM tblOrder
WHERE ItemPrice>10 AND ItemPrice<20
Currently, we assume that this table does not have any
index. So, to execute the above query, the processor starts searching each and
every row one by one to get the required data.
The below diagram depicts the actual processing done to
do so. It definitely becomes an overhead to search all the 100 rows only to get
a few set of records.
Now, let us add an index on the ItemPrice data column of this table.
The query will look like:
CREATE
INDEX IX_tblOrder_ItemPrice
ON tblOrder (ItemPrice
ASC)
It acts just like the index in a book. Adding an index to
this column means, it stores the copy of the column’s data and a reference to
the row where the actual data is stored. These index entries are also sorted in
ascending order by SQL.
The pictorial representation will be somewhat similar to
below:
Now, when we execute the above query again with an index
added to the table, it will first find the value in the index and then it will take
the reference stored in that index to point the actual data row.
Thus index makes the query processing fast and wipes out
the overhead of scanning each and every row in a table.
Index Structure
An index is a set of pages or index nodes which are
organized in a structure called B-tree
structure. It is a hierarchical structure with the root node placed at the top and the leaf nodes at the bottom of the hierarchy as demonstrated in the
below figure.
When a query is fired against an indexed column, the query
processor starts scanning from the top which is the root node. Slowly it
navigates down the intermediate levels or the next levels. The more we go down
through the intermediate level, the more granular division comes into picture.
The processor continues scanning the index nodes until the leaf node is
arrived.
Let us take an example of searching the value of 87 in an
indexed column. As it is said earlier, the processor starts scanning from the
root node and then gradually it goes down through the intermediate levels till
the leaf nodes are reached. Here, we are trying to find the value of 87 which
leads to the 1st page of the 1st intermediate level, i.e.
the page with 1-100 values. Now it further goes down to get the exact value and
it understands that the value can be there in the 2nd page of the 2nd
intermediate level which is 51-100. As 51-100 is also not the leaf node, it
searches down further and finally reaches to the leaf node and points to the
page 76-100 which is the 4th page of the leaf node level.
The leaf node may contain the full data row or the reference
to that row. It solely depends on the type of index that we declare such as,
clustered index or non-clustered index.
We will discuss the types of indexes in our next post…Stay
tuned…