Difference between table scan, index scan, and index seek in Database

Good understanding of Indexes is very important while working in a database and SQL and that's why you will find many questions based upon indexes on Programming Job interviews. One of such frequently asked SQL question is the real difference between table scan, index scan, an index seek? which one is faster and why? How does database chooses which scan or seek to use? and How you can optimize the performance of your SQL SELECT queries by using this knowledge. In general, there are only two ways in which your query engine retrieves the data, using a table scan or by using an index. Which method is used for your particular query depends upon what indexes are available in that table, what columns you are requesting in your query, the kind of joins you are doing, and the size of your tables.

If you have a clear understanding of how the index works and how SQL engine retrieves data from the disk then you can quickly identify performance problems and solve them. That's where most of the SQL developers, especially the Java application developer who write queries and design database lack.

Btw, if you are not familiar with what is an index and how to create and drop an index, then I suggest you to first go through The Complete SQL Bootcamp Become an expert at SQL!&nbs p;course to learn those basics.

In this article, we'll go through each three i.e. table scan, index scan, and index seek and try to understand how database process a particular query, hence basic understanding of database, SQL and index are required.

Table Scan
A table scan is a pretty straightforward process. When your query engine performs a table scan it starts from the physical beginning of the table and goes through every row in the table. If a row matches the criterion then it includes that into the result set.

You might have heard nasty things about table scans but in truth, it's the fastest way to retrieve data especially if your table is quite small. It starts being bad when your table starts growing. You can imagine doing a full table scan in a table with 4 million rows and full table scan in a table with just 100 rows.

In a small table, query engine can load all data in just one shot but in a large table, it's not possible, which means more IO and more time to process those data.

Normally, a full table scan is used when your query doesn't have a WHERE clause i.e. you want more or less every record from a table e.g. following query will use a full table scan:

SELECT * from Employee;

Btw, If your query is taking too long in a large table then most likely it using either table scan or index scan. You can see that by enabling execution plan e.g. by doing Ctrl + A in Microsoft SQL Server Management Studio.

Index Scan
If your table has a clustered index and you are firing a query which needs all or most of the rows i.e. query without WHERE or HAVING clause, then it uses an index scan. It works similar to the table scan, during the query optimization process, the query optimizer takes a look at the available index and chooses the best one, based on information provided in your joins and where clause, along with the statistical information database keeps.

Once the right index is chosen, SQL Query processor or engine navigates the tree structure to the point of data that matches your criteria and again extract only the records it needs. See SQL Performance Explained by Markus Winand to learn more about how indexes work in different databases.

 table scan, index scan, and index seek in SQL server

The main difference between a full table scan and an index scan is that because data is sorted in index tree, the query engine knows when it has reached the end of the current it is looking for. It can then send the query, or move on to the next range of data as necessary.

For example, the following query, same as above will use Index scan if you have a clustered index in your table:

SELECT * From Employee;

This is slightly faster than the table scan but considerably slower than an index seek which we'll see in next section.

Difference between table scan, index scan, and index seek in Database

Index Seek
When your search criterion matches an index well enough that the index can navigate directly to a particular point in your data, that's called an index seek. It is the fastest way to retrieve data in a database. The index seeks are also a great sign that your indexes are being properly used.

This happens when you specify a condition in WHERE clause e.g. searching an employee by id or name if you have a respective index.

For example, following query will use an index seek, you can also confirm that by checking the execution plan of this query when you run this on SQL server:

SELECT * from Employee where EmployeeId=3;

In this case, the Query Optimizer can use an index to directly go to the third employee and retrieve the data. If you look at the execution plan shown below, you can see that it uses an index seek using the index created on EmployeeId.

If you want to learn more about how SQL engine process request I suggest you go through SQL Tuning course from Udemy to learn more about SQL performance tuning.

Difference between table scan, index scan, and index seek in SQL

Based upon our understanding of indexes, you can now deduce following points to summarize the difference between table scan, index scan, and index seek in a database:

1) Table scan and an Index scan is used when you need to retrieve all data e.g. 90% to 100% while index seek is used when you need to retrieve data based upon some conditions e.g. 10% of data.

2) If your query doesn't have WHERE clause and your table doesn't have a clustered index then a full table scan is used, if it does have a clustered index then index scan is used.

3) index scan is faster than a table scan because they look at sorted data and query optimizer know when to stop and look for another range.

4) index seek is the fastest way to retrieve data and it comes into picture when your search criterion is very specific. Normally, when you have WHERE clause in your query and you are using a column which also has an index, then index seek is used to retrieve data as shown in the following query:

select * from Employee where Id= 3;

You can confirm that by actually looking at the execution plan for your query.

In MSSQL management studio, you can see the execution plan by clicking Ctrl + A and then running your query.

That's all about the difference between table scan, index scan and index seek in a database. As I told, there are only two ways to retrieve data in a database either by using table scan or by using an index. The later is faster in case of large tables. The choice of index depends upon multiple things e.g. the WHERE clause and joins in your table, the columns you are requesting, the size of tables etc.

If you feel that your query is slow, you must check the execution plan to confirm whether its using index seeks or index scan or table scan. Then you can optimize your query by introducing right index or tunning your query.

Further Reading
Advanced SQL Server Performance Tuning
SQL Performance Explained by Markus Winand
SQL Server: Why Physical Database Design Matters
A Better way to write SQL queries
Things to remember while running SQL queries on production database
A Developer's list of must-read SQL books

Thanks for reading this article so far. If you like this article then please share with your friends and colleagues. If you have any question or feedback then please drop a comment.

No comments:

Post a Comment