10 tips about SQL Server that every developer should know
Working as DBA to a corporative environment for over 5 years, I have seen things that even the least experienced SQL Server developer would believe. For example, every developer who programs for SQL Server knows, or at least should know, that one of the main requirements to assure the good performance of his query is to analyze its execution plan and to assure that it is making adequate use of the table’s indexes.
However, what I have evidenced in my day-to-day is that still many are the developers that so much as worry about the table’s indexes. What one notices, is that at least at first, the developer is more worried about making his query work and in delivering the data to the user.
As result, in medium term what you have is high waste of the server’s resources and the so hated delay in the application. It is clear that there exist many other points that lower an application’s performance as, for example: outdated data access statistics, connections’ blocks (most of the times, due to the lack of indexes), excessive use of cursors etc.
But you can be certain: the bad use of index and even their absence is the greatest cause of SQL Server applications’ performance problems. In this article I will present 10 important tips that every developer must know when working with the SQL Server.
Tips on how to analyze the execution plan, methods for the substitution of cursors, use of sub-queries, use of indexed column in the where clause, at end, tips that certainly will help you to gain a greater benefit from the SQL Server.
Well, given this small introduction, let us move on to what matters.
1. I always analyze the queries execution plan.
As I have said previously, the execution plan analysis is one of the main requirements to assure the query’s good performance. The execution plan describes the path used by the SQL Server optimizer to arrive at the requested data and shows which operations had been executed during the query processing. In the execution plan, each operation is represented by an icon and a set of arrows that unite these icons. This makes it easy to understand a query‘s performance characteristics.
To see a query’s execution plan, enter the SQL Server 2000’s Query Analyzer or the SQL Server 2005’s Query Editor, write the query and type Ctrl+L or select the menu option Query> Display Estimated Execution Plan in the menus bar.
You will notice that the query will not be actually executed, that will only occur before the creation of the execution plan. When analyzing an execution plan, you must have in mind that it is generated based on the existing statistics for the table or indexes used by the query; therefore it is very important that when analyzing the execution plan the objects access statistics are updated.
If the statistics are not updated, the execution plan will be generated on top of inconsistent data that may not reflect reality. You must always update the statistics after performing operations that put into motion great bundles of data or the creation and alteration of indexes.
In Figure 1 we have the example of the execution plan of a query executed over the tables Publishers and Title of the SQL Server 2000’s Pubs database.
Figure 1. Example of an execution plan
Observe that each icon represents a specific operation and the arrows indicate the path to be followed. The red arrow indicates that the analysis must always be performed from right to left and from top to bottom.
2. When analyzing the execution plan, start by searching for operations with high consumption.
During the execution plan analysis, begin looking for operations that have a high percentage of consumption. Looking for operations with high consumption allows arranging in order of priority which problem should be “attacked” first. Amongst the operations that have greater consumption and that, therefore, must be prevented we have:
· Table and Index Scan operations;
· Operations that have arrows too “thick”;
· Bookmark Lookups operations;
· Sort operations.
Table and Index Scan operations
Table Scans and Index Scans operations are slow operations and that generate high server consumption. This because these are operations that navigate through all the lines of the table or the index, performing a sequential sweeping and returning the lines that satisfy the where clause (assuming that you use a where clause).
It is true that depending on the size of the table, the amount of lines being returned and the quality of the filter rule, Table Scan may not point a problem, but mainly, when we speak of great tables, the Table Scan is the worst of all the operations and indicates that the table does not have index or, if it has, it is not being used adequately by the query.
Whenever you find a Table Scan in your execution plan, do not restrain from investigating it. The Index Scan and Clustered Index Scan perform a sequential sweeping in a table’s index pages. Since they act on an index, they are better than the Table Scan, but they also deserve an investigation. This because in general if you have Index Scan, then a great amount of data is being returned and in the majority of the times you do not need all this data.
The scans, in their majority, are solved through the modification or creation of proper indexes. Some solutions also include modifying the queries in such a way as to be more selective, that is, to use the where clause to filter at the maximum possible the returned registers.
Figure 2 shows the graphic representation of the Table, Clustered Index Scan and Index Scan operators.
Figure 2. Graphic representation of the Table, Clustered Index Scan and Index Scan operators
Operations that have arrows too “thick”
The arrows are not operators: they are simply used to unite an operator to another. Through the arrows we have an estimate of the amount of lines affected by the operation since its thickness is directly related to the amount of lines returned by the operation.
The bigger the thickness of the arrow, the greater is the amount of lines involved in the operation or the amount of lines passed from an operator to another. In order for you to see the quantity and size estimate of the affected lines, it is enough to place the cursor over the arrow.
When analyzing the execution plan, always give a special attention to the thickest arrows, for a very thick arrow can indicate a high operation of I/O and, consequently, containment in the server’s subsystem disk. Another important point is that most of the times, the very thick arrows are associated to a Table Scan.
To solve this kind of problem we must once more make use of the where clause to filter the returned data and make the arrow as thin as possible. If the arrow is associated to a Table Scan, first analyze the latter, because probably when solving the Table Scan you will also solve the arrow’s thickness. The suggestion here is: avoid obtaining more lines than necessary.
In Figure 3 we have a query‘s execution plan (modified Employee table of the Pubs database) which returns almost 700 thousand records. Observe that the query does not use the where clause.
Figure 3. Execution plan for a query without where clause
Bookmark Lookups Operations
The Bookmark Lookup operator takes place when the index can be used to satisfy the search criterion, but it does not have all the data requested by the query. Normally it occurs together with an Index Scan when the query requires information on columns that are not part of the index key. In this scenario, look for Bookmark Lookup that has a high consumption percentage.
In Figure 4 we can observe that, to obtain the data requested by the query, the SQL Server executed a Bookmark Lookup operation which consumed 41% of the query’s total time of execution.
Figure 4. Execution plan with a Bookmark Lookup
This happened because the fname and lname columns are not part of the index key, which is composed only of the hire_date column, and with that the SQL Server needs to access the table’s data pages in order to obtain the data regarding fname and lname.
If the Bookmark Lookup operation cost is too high, check if a cluster index or a non-cluster index, composed by the researched columns, can be used. In Figure 5, the creation of a cluster index composed by the fname and lname columns solved the problem.
Figure 5. Execution plan after the creation of a cluster index
A word of advice: always when possible avoid the Bookmark Lookup operation in your query. This because, although the Bookmark Lookup which deals with small amounts of data is not a problem, this operation in large amounts of data increases the I/O rate and consequently damages the query performance.
The sort operation ajust all the lines in an ascending or descending order, depending on the Order By clause of its query. Sort operations, besides using the TEMPDB system database for a temporary storage area, also add a great I/O rate to the operations.
Therefore, if you are used to seeing the Sort operator frequently in its queries and this operator has a high consumption operation, consider removing the mentioned clause. On the other hand, if you know that will always organize your query by a specific column, consider indexing it.
In the Create Index command you can determine the ordering direction (Asc or Desc) of a particular index. Figure 6 presents a Sort operation consuming 23% of the query’s total execution time.
Figure 6. Sort operation with 23% consumption
3. Avoid the use of cursors and whenever possible substitute them for “while”.
The great problem in the use of cursors is that these, by nature are slow and consume a great deal of the server resource. This happens because the relational databases are optimized to work with sets of records.
Each set of records is known as Result Set and treated as a single unit. As example, the set of records returned by a Select statement consists of all the lines that satisfy the where clause condition.
The cursor goes in the opposite direction of this concept, since it was developed thinking about the work line by line. That is, you use it to navigate line by line inside a set of records or a Result Set returned by a Select statement.
As consequence of this use, we have a great volume of packages being sent through the network, high compilation time and parse of the Fetch statements, blocking of connections due to the locks in tables or records, at last, high consumption of the server’s resources and low performance of its application.
In the face of this, some methods emerged using Transact SQL which can be used to replace the use of the cursors. Following, I will be presenting two of these methods.
First, so that you can understand how the cursor is used, in Listing 1 we have a very simple example of cursor that navigates line by line in the Authors table of the Pubs database, displaying the information of the id, last name and first name fields.
Listing 1. Example of cursor utilization
1. DECLARE @au_id varchar(15)
2. DECLARE @au_fname varchar(15)
3. DECLARE @au_lname varchar(15)
4. DECLARE cur_authors CURSOR
5. FOR SELECT au_id, au_fname,au_lname FROM authors
6. OPEN cur_authors
7. FETCH NEXT FROM cur_authors INTO @au_id,@au_fname,
8. WHILE @@FETCH_STATUS=0
10. SELECT @au_id,@au_fname,@au_lname
11. FETCH NEXT FROM cur_authors INTO @au_id,
13. CLOSE cur_authors
14. DEALLOCATE cur_authors
In the example, from lines 1 to 3 we have the statement of the variables that will be used to store the data of the fields returned by Select. It is important to observe that these variables must have the same data type as the columns of the table
In lines 4 and 6, we have the declaration and opening of the cursor itself. Observe that the Result Set generated at the opening of the cursor will include all the records and only the columns au_id, au_fname and au_lname of the authors table. In line 7, Fetch Next takes charge of taking the next record and filling in the variables with the data obtained from the record.
From lines 8 through 12, we have the Loop which will keep on being executed for as long as there are still records (@@FETCH_STATUS=0). In practice, line 8 verifies if still it has records, if there are, in line 10 “prints” the content of the variables on the screen and in line 11, takes the next record again and fills in the variables with the data obtained in the record.
In line 13, we have the closing of the cursor and in line 14 the release of the memory used by the cursor takes place. Given the example, we will see two methods that can be used to accomplish the same task, however without the use of cursors.
The first method, which is presented in Listing 2, makes use of temporary table and the Top clause. With this method, you create a snapshot of the desired information throwing the result of Select in a temporary table.
Listing 2. Method to replace cursor using temporary table
1. DECLARE @au_id char(11)
2. SELECT au_id, au_fname,au_lname
INTO #tb_tmp_authors FROM authors
3. SELECT TOP 1 @au_id = au_id FROM #tb_tmp_authors
4. WHILE @@rowcount <> 0
6. SELECT au_id, au_fname,au_lname
8. FROM #tb_tmp_authors WHERE au_id = @au_id
9. DELETE #tb_tmp_authors WHERE au_id = @au_id
10. SELECT TOP 1 @au_id = au_id FROM #tb_tmp_authors
12. DROP TABLE #tb_tmp_authors
In the method of Listing 2, line 1 simply declares a variable to store the content of the au_id column. In line 2, we have the same Select that was used in the Listing’s 1 cursor declaration. The difference is here that instead of throwing the result of the Select into a cursor, this result is being thrown into a temporary table called #tb_tmp_authors.
After the temporary table load, we can then work with it to deal with the registers line by line. Observe that in line 3 the Top clause is used with value 1. The Top 1 clause assures that only the first record of the temporary table will be returned by Select and consequently we will have the value of the au_id column from the first record stored in the variable @au_id.
In line 4 the while command is used to make the Loop and to check the value of the @@rowcount global variable. This is a system variable which is automatically filled with the amount of records affected by the Select executed in line 3. Since we use the Top 1 in the Select, this will always affect a record at a time and the @@rowcount variable will always be 1 until the temporary table is empty.
From lines 6 to 10 are then performed all the desired processing, in this in case we only print the data in the screen, and observe that in line 9 the record of the temporary table whose au_id was obtained in the Select of line 3 is excluded.
This will make it so that the second record of the temporary table becomes the first. In line 10 the same Select of line 3 is again executed taking the value from the first record of the column au_id. And thus the process will continue until the temporary table is empty and with this @@rowcount be equal to 0. At the end of the processing, line 12 excludes the temporary table.
The second method, which is presented in Listing 3, does not use a temporary table, but uses the Min() function to take a record at a time from the Authors table.
Listing 3. Method to replace cursor using the Min() function
1. DECLARE @au_id char(11)
2. SELECT @au_id = min(au_id) from authors
3. WHILE @au_id is not null
5. SELECT au_id, au_fname,au_lname FROM authors
6. WHERE au_id = @au_id
7. SELECT @au_id = min( au_id ) FROM authors
WHERE au_id > @au_id
Since the method uses the Min() function it is necessary to guarantee that the verification is made over a column that is single and crescent. This will guarantee that new lines will always have a bigger identifier than the identifier of the line being processed.
In this example, in line 1 we have the variable declaration which will store the content of the au_id column. In line 2, the Select obtains from the Authors table the record that has the lesser value for the au_id column and stores this value in the @au_id variable.
In line 3 the while command is used to make the Loop and to verify if the @au_id variable is not null, for when it is null it means that no more records to be processed exist and with this we leave the Loop. Being the @au_id variable different than null, we then enter the Loop and from lines 5 to 7 we perform the desired processing.
In this case we only print the data in the screen and in line 7 we obtain a new record where the value of the au_id column is larger than the value already stored in the @au_id variable. And thus the processing continues until it reaches the last record in the table.
As we have seen, not always do we need to use cursor to process our data inside the SQL Server. If you execute the three examples mentioned here you will see that the result will be the same.
4. Substitute the UNION operator by UNION ALL whenever possible.
When you are using the Union operator to combine the result of two queries, keep in mind that this perform a Select Distinct in the final result to remove possible duplicate records, even if there are no duplicate records.
Before this, the advice is that not having the possibility of duplicate records or if there are no problems for the application that the final result presents duplications, use the Union All operator. Since this operator does not execute Select Distinct in the final result, it uses less SQL Server resources and therefore, improves the query performance.
In Figure 7 we have two queries that perform the same operation over the Orders table of the Northwind database, one using the Union operator and the other, the Union All.
Observe that the query with Union All will display all the records including the duplicates. But this consumes less of the server’s resource for not performing the Select Distinct in the final result.
Figure 7. Example of queries using the Union and Union All operators
5. Substitute Sub-queries by Joins.
Many Transact SQL instructions that make use of sub-queries may be rewritten using joins. It is true that many of the times you will not have performance benefits when using sub-queries or joins, but in some cases where, for example, the existence of a value needs to be verified, the use of joins will produce better results.
So being, whenever possible look to substitute your sub-queries for joins. In Listing 4 we have two Select instructions being one written with sub-query and the other with join.
Listing 4. Select instructions using sub-query and join
-- SELECT instruction using sub-query
SELECT ProductID,SupplierID, ProductName
FROM Products WHERE SupplierID IN
(SELECT SupplierID FROM Suppliers
WHERE (country = 'Brazil'))
-- SELECT instruction using join
SELECT prd.ProductID, prd.SupplierID, prd.ProductName
FROM Products prd INNER JOIN Suppliers sup
ON prd.SupplierID = sup.SupplierID
WHERE sup.country = 'Brazil'
Observe that when being executed both will produce the same result: a list with all the products of suppliers in Brazil.
6. In the WHERE clause do not use indexed column in functions.
The simplest way to make a column not able to be indexed, is to put this column in a function! In the SQL Server, the use of the Substring function in the where clause is every common, however, what very few know is that when you place an indexed column inside a function, the SQL Server ends up not using the index in a suitable fashion and many times does not even use it at all.
In these situations, the best thing to do is to move the function to the other side of the equation in the where clause or if possible not use it at all. In Figure 8 we have an example of how the use of function in indexed column in the where clause can prevent the SQL Server from using the index correctly.
Figure 8. Example of queries using the SUBSTRING function and the LIKE command
The two queries presented in Figure 8 have as objective to obtain all employees whose first name starts with the characters “Ma”. Observe that in the first query the Substring function is used to break the fname column taking only the first two characters and comparing them to the “Ma” string.
Since the SQL Server never knows which will be the characters to be researched, the process is performed for each of the table’s lines, it ends up performing an Index Scan, sweeping all the pages of the index in a sequential way.
In the second query the function was subtracted by the Like command. In this case, since the indexed column is not affected, the SQL Server manages to use the index in a suitable fashion performing an Index Seek in the index pages.
7. Whenever possible try to use operators that are capable of being indexed.
Similar to the problem of the use of indexed columns in functions, there is also a set of operators which when used, they can prevent the SQL Server from using the index in a suitable fashion. These are known as operators incapable of being indexed.
The positive operators are generally capable of being indexed: =, >, >=, <, <=, Between and Like when used in the: Like ‘word%’ manner. The negative operators are generally incapable of being indexed: <>, Not, Not Exists, Not In, Not Like and Like when used in the: Like ‘%word’ manner.
8. When in need of obtaining an amount of records form a table, avoid using the Select Count(*).
Whenever we need to obtain the amount of records from a table, the first T-SQL instruction that comes to the mind is: “Select Count (*) From table”.
The problem with this instruction is that most of the times it performs a Table or Index Scan to return the amount of records in the table. For large tables this is a synonymous of slow query and high consumption of server resources.
A simpler way to perform this same task without causing impact is using the systems table called sysindexes. This table has a column called rows which stores the total amount of records for each table of your database.
Being so, whenever possible use the T-SQL instruction that follows, to obtain the amount of records in the table.
SELECT rows FROM sysindexes
WHERE id = OBJECT_ID('table_name') AND indid<2
Since the sysindexes table does not have a column with the name of the tables, but only their ids, the instruction uses the Object_ID() function so that the SQL Server may, by the name of the table, identify its respective id inside the sysindexes table.
9. Always use the Set NoCount On inside your Stored Procedures.
This is a “best practice” which I rarely see developers use. Certainly when running T-SQL instructions such as Select, Insert, Update and Delete you must have already seen the “nn row(s) affected” message making part of the result of your query.
Maybe you do not know, but this apparently harmless message may generate a great impact in the performance of your Stored Procedures. This because when you execute a SP which has several T-SQL instructions, this message is sent to the client for every instruction inside the SP, which ends up generating an unnecessary network traffic.
The Set NoCount option disables the sending of these messages. With this, the SPs that possess several T-SQL instructions, may present significant performance improvement once the network traffic will be greatly reduced.
10. When creating composed index, order the columns in the index in a way as to satisfy the Where clause of most of your queries.
One thing that must always be kept in mind when working with composed (indexes with two or more of the table’s columns), is that the index will only be used by the query if the first column of the key of a composed index is specified in the where clause.
Therefore, when working with composed index, the order of the columns in the index is very important. In order for you to understand better, let us move to an example: Assume that your database has a table called tb_employee and that this table has an index composed by the columns last_name, first_name in that respective order. When analyzing the execution plan of a query that has the where clause as:
You will see that the index was adequately used by the SQL Server’s consultation optimizer, but when analyzing the plan from a consultation that has the where clause as:
You will see that the index was not used. Therefore, when using composed indexes, make sure that the where clause always has the first column of the index.
As we have seen, countless are the resources and techniques that we may use to extract a better profit of the SQL Server. In general, when we work with performance problems in query, one of the basic points is to assure that the tables have indexes and mainly assure that the queries are making proper use of these indexes.
The analysis of the execution plan may help in this. Another important point is to guarantee that the statistics are always kept updated. Remember that the SQL Server optimizer uses the statistics to make his decisions. Greetings and see you next time.