Application and Database Design
The biggest performance gains come from changes to the application and database design. You might change your configuration settings and add heftier hardware and be thrilled when performance doubles, but changes to the application can often result in even larger performance increases. There are as many approaches to software development as there are pages in this book. No single approach is the right approach—yours must be tailored to the size of the project, your team, and the skill level of the team members.
Take a look at the following list of suggestions for planning and implementing good performance in your system. We'll explain these items in detail in this chapter and in Chapter 15.
- Develop expertise on your development team.
- Understand that there is no substitute for solid application and database design.
- State performance requirements for peak, not average, use.
- Consider perceived response time for interactive systems.
- Prototype, benchmark, and test throughout the development cycle.
- Create useful indexes.
- Choose appropriate hardware.
- Use cursors judiciously.
- Use stored procedures almost always.
- Minimize network round-trips.
- Understand concurrency and consistency tradeoffs.
- Analyze and resolve locking (blocking) problems.
- Analyze and resolve deadlock problems.
- Monitor and tune queries using SQL Server Profiler.
- Monitor system using Performance Monitor.
- Review and adjust Windows NT settings.
- Review and adjust SQL Server configuration settings.
- Make only one change at a time, and measure its effect.
- Do periodic database maintenance.
Normalize Your Database
We'll assume that if you're reading this book, you understand the concept of normalization and terms such as third normal form. (If you don't, see Candace Fleming and Barbara von Halle's Handbook of Relational Database Design and Michael Hernandez's Database Design for Mere Mortals. A plethora of other books about database design and normalization are also available.)
A normalized database eliminates functional dependencies in the data so that updating the database is easy and efficient. But querying from that database might require a lot of joins between tables, so common sense comes into play. If the most important and time-critical function your system must perform is fast querying, it often makes sense to back off from a normalized design in favor of one that has some functional dependencies. (That is, the design is not in third normal form or higher.) Think of normalization as typically being good for updating but potentially bad for querying. Start with a normalized design and then look at all the demands that will be placed on the system.
Note: There really isn't a binary concept of being "normalized" or "not normalized." There are only degrees of normalization. It is common to refer to a database that is at least in third normal form as "normalized" and to refer to a database at a lower level of normalization as "unnormalized" or "denormalized." To keep the discussion simple, we'll use the terms in that way, as imprecise as that might be. (The terms "unnormalized" and "denormalized" have slightly different meanings. An unnormalized database is one that has never been normalized. A denormalized database is one that was normalized at some point, but for specific performance-related reasons the design was backed down from the normalized version. Our apologies to those who make a living doing entity-relationship diagrams and are horrified by the loose use of these terms.)
If you understand the data elements you need to record and you know how to do data modeling, the mechanics of producing a normalized database design are quite straightforward. What usually takes time is learning how the business actually operates, that is, understanding the underlying processes to be modeled.
Once you produce a normalized design, which you can also think of as the logical design, you must decide if you can implement the design nearly "as is" or if you need to modify it to fit your performance characteristics. A lot of people have trouble with this. Rather than try to articulate specific performance characteristics, they generalize and strive for "as fast as possible" or "as many users as we can handle." Although goals can be difficult to articulate precisely, you should at least set relative goals. You should understand the tradeoffs between update and query performance, for example. If a salesperson must call up all of a customer's records while the customer is waiting on the phone, that action should be completed within a few seconds. Or if you want to run a bunch of batch processes and reports for your manufacturing operation each night and you have a window of four hours in which to do it, you have a pretty clear objective that must be met.
Evaluate Your Critical Transactions
One thing that you should do immediately is look at your critical transactions—that is, transactions whose performance will make or break the system. (In this context, we use the term "transaction" loosely; it means any operation on the database.) Which tables and joins will be required for your critical transactions? Will data access be straightforward or complicated?
For example, if it is imperative that a given query have less than a 2-second response time but your normalized design would require a seven-way join, you should look at what denormalizing would cost. If tables are properly indexed, the query is well qualified, the search parameters are quite selective, and not a lot of data needs to be returned, the quick response might be possible. But you should note any seven-way joins and consider other alternatives. (You should probably look for alternatives any time you get beyond a four-way join.)
In our example, you might decide to carry a little redundant information in a couple of tables to make it just a three-way join. You'll incur some extra overhead to correctly update the redundant data in multiple places, but if update activity is infrequent or less important and the query performance is essential, altering your design is probably worth the cost. Or you might decide that rather than compute a customer's balance by retrieving a large amount of data, you can simply maintain summary values. You can use triggers to update the values incrementally when a customer's records change. (For example, you can take the old value and add to or average it but not compute the whole thing from scratch each time.) When you need the customer balance, it is available, already computed. You incur extra update overhead for the trigger to keep the value up-to-date, and you need a small amount of additional storage.
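Here is a minimal sketch of such a trigger, assuming hypothetical customer and orders tables in which customer.balance is the maintained summary column:

    -- Minimal sketch (hypothetical schema): keep customer.balance current
    -- by adding each new order's amount instead of recomputing the total.
    CREATE TRIGGER trg_orders_ins ON orders
    FOR INSERT
    AS
        UPDATE customer
        SET balance = customer.balance + i.order_total
        FROM customer
             JOIN (SELECT cust_id, SUM(amount) AS order_total
                   FROM inserted
                   GROUP BY cust_id) AS i
               ON customer.cust_id = i.cust_id

Corresponding UPDATE and DELETE triggers would adjust the balance by the difference or subtract the deleted amounts, so the summary is never recomputed from scratch.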
Proper indexes are extremely important for getting the query performance you need. But you must face query-vs.-update tradeoffs similar to those described earlier because indexes speed up retrieval but slow down updating. Chapter 8 explains the extra work required when your updates require index maintenance. (Because of the way that nonclustered indexes are stored and updated, the overhead of index maintenance is not nearly as severe in SQL Server 7 as in previous versions of the product.) You might want to lay out your critical transactions and look for the likely problems early on. If you can keep joins on critical transactions to four tables or fewer and make them simple equijoins on indexed columns, you'll be in good shape.
None of these considerations are new, nor are they specific to SQL Server. Back in the mainframe days, there was a technique known as "completing a CRUD chart." CRUD stands for Create-Retrieve-Update-Delete. In SQL, this would translate as ISUD—Insert-Select-Update-Delete. Conceptually, CRUD is pretty simple. You draw a matrix with critical transactions on the vertical axis and tables with their fields on the horizontal axis. The matrix gets very big very quickly, so creating it in Microsoft Excel or in your favorite spreadsheet program can be helpful. For each transaction, you note which fields must be accessed and how they will be accessed, and you note the access as any combination of I, S, U, or D, as appropriate. You make the granularity at the field level so you can gain insight into what information you want in each table. This is the information you need if you decide to carry some fields redundantly in other tables to reduce the number of joins required. Of course, some transactions require many tables to be accessed, so be sure to note whether the tables are accessed sequentially or via a join. You should also indicate the frequency and time of day that a transaction runs, its expected performance, and how critical it is that the transaction meet the performance metric.
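For example, a small fragment of such a chart (the transaction names, tables, columns, and figures here are purely hypothetical) might look like this:

    Transaction          customer.balance   customer.address   orders.amount   Frequency / criticality
    take_order           S, U               -                  I               300/hr peak; < 2 sec
    customer_inquiry     S                  S                  -               50/hr; < 2 sec
    nightly_statements   S                  S                  S               1/day off-hours; 4-hr window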
How far you carry this exercise is up to you. You should at least go far enough to see where the potential hot spots are for your critical transactions. Some people try to think of every transaction in the system, but that's nearly impossible. And what's more, it doesn't matter: only a few critical transactions need special care so they don't lead to problems. (You shouldn't worry much about such things as noncritical reports that run only during off-hours.) For example, if you have to do frequent select operations simultaneously on the tables that are being updated the most, you might be concerned about locking conflicts. You need to consider what transaction isolation level to use and whether your query can live with Read Uncommitted and not conflict with the update activity.
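For example, a sketch of how a status inquiry could read without requesting shared locks (the table and column names are hypothetical): the SET statement affects the whole connection, while the NOLOCK hint applies to a single table reference in a single query.

    -- Connection-wide: subsequent SELECTs read without requesting shared locks
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
    SELECT cust_id, status FROM orders WHERE cust_id = 123

    -- Per-query alternative: a locking hint on just this table reference
    SELECT cust_id, status FROM orders (NOLOCK) WHERE cust_id = 123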
If you are doing complex joins or expensive aggregate functions—SUM(), AVG(), and so on—for common or critical queries, you should explore techniques such as the following and you should understand the tradeoffs between query performance improvement and the cost to your update processes:
- Add logically redundant columns to reduce the number of tables to be joined. (A sketch of this technique appears after this list.)
- Use triggers to maintain aggregate summary data, such as customer balances, the highest value, and so forth. Such aggregates can usually be incrementally computed quickly. The update performance impact can be slight, but the query performance improvement can be dramatic.
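As a sketch of the first technique, assuming hypothetical customer and orders tables: the customer's name is carried redundantly in orders so that order lookups need no join, and a trigger keeps the copy consistent on the (infrequent) occasions when the name changes.

    -- Hypothetical: carry a redundant copy of customer.cust_name in orders
    ALTER TABLE orders ADD cust_name varchar(40) NULL

    -- Keep the redundant copy in sync when a customer's name changes
    CREATE TRIGGER trg_customer_upd ON customer
    FOR UPDATE
    AS
    IF UPDATE(cust_name)
        UPDATE orders
        SET cust_name = i.cust_name
        FROM orders
             JOIN inserted i ON orders.cust_id = i.cust_id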
Keep Table Row Lengths and Keys Compact
When you create tables, you must understand the tradeoffs of using variable-length columns. (See Chapter 6.) As a general rule, data with substantial variance in the actual storage length is appropriate for variable-length columns. Also remember that the more compact the row length, the more rows will fit on a given page. Hence, a single I/O operation with compact rows is more efficient than an I/O operation with longer row lengths—it returns more rows and the data cache allows more rows to fit into a given amount of memory.
As with tables, when you create keys you should try to make the primary key field compact because it frequently occurs as a foreign key in other tables. If no naturally compact primary key exists, you might consider using an identity or uniqueidentifier column as a surrogate. And recall that if the primary key is a composite of multiple columns, the columns are indexed in the order in which they are declared in the key. The order of the columns in the key can greatly affect how selective, and hence how useful, the index is.
Your clustered key should also be as compact as possible. If your clustered key is also your primary key (which is the default when you declare a PRIMARY KEY constraint), you might already have made it compact for the reason mentioned above. There are additional considerations for your clustered key because SQL Server automatically keeps the clustered key in all nonclustered indexes, along with the corresponding nonclustered key. For example, if your clustered index is on zipcode and you have a nonclustered index on employee_id, every row in the nonclustered index stores the corresponding zipcode value along with the employee_id value. We discussed the structure of indexes in Chapters 3 and 6, and we'll look at it again later in this chapter when we look at how to choose the best indexes.
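A sketch of these points, using hypothetical names and mirroring the zipcode example above:

    -- Compact 4-byte surrogate primary key, declared nonclustered here so that
    -- the clustered index can go on zipcode as in the example above
    CREATE TABLE employee
    (
        employee_id  int IDENTITY NOT NULL,
        zipcode      char(5)      NOT NULL,
        last_name    varchar(40)  NOT NULL,
        CONSTRAINT PK_employee PRIMARY KEY NONCLUSTERED (employee_id)
    )

    -- Because zipcode is the clustered key, every row of the nonclustered
    -- primary key index on employee_id also carries the zipcode value
    CREATE CLUSTERED INDEX idx_employee_zipcode ON employee (zipcode)

Keeping the clustered key narrow (here 5 bytes) keeps every nonclustered index narrow as well.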
Occasionally, a table will have some columns that are infrequently used or modified and some that are very hot. In such cases, it can make sense to break the single table into two tables; you can join them back together later. This is kind of the reverse of denormalization as you commonly think of it. In this case, you do not carry redundant information to reduce the number of tables; instead, you increase the number of tables to more than are logically called for to put the hot columns into a separate, narrower table. With the more compact row length, you get more rows per page and potentially a higher cache-hit ratio. As with any deviation from the normalized model, however, you should do this only if you have good reason. After you complete a CRUD chart analysis, you might see that while your customer table is frequently accessed, 99 percent of the time this access occurs just to find out a customer's credit balance. You might decide to maintain this balance via a trigger rather than by recomputing it each time the query occurs. Columns such as customer addresses, phone numbers, and e-mail addresses are large fields that give the table a wide row length. But not all that information is needed for critical transactions—only the customer balance is needed. In this case, splitting the table into two might mean fitting, say, 150 rows on a page instead of only 2 or 3 rows. A narrower table means a greater likelihood that the customer balance can be read from cache rather than by requiring physical I/O.
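Here is a minimal sketch of such a split, with hypothetical names; a view can reassemble the full row for the code paths that still need it:

    -- Hot, narrow table read by the critical transactions
    CREATE TABLE customer_balance
    (
        cust_id  int   NOT NULL PRIMARY KEY,
        balance  money NOT NULL
    )

    -- Cold, wide table holding the rarely needed contact information
    CREATE TABLE customer_detail
    (
        cust_id  int          NOT NULL PRIMARY KEY
                              REFERENCES customer_balance (cust_id),
        address  varchar(250) NULL,
        phone    varchar(20)  NULL,
        email    varchar(100) NULL
    )

    -- Logical view of the original wide customer table
    CREATE VIEW customer_all AS
        SELECT b.cust_id, b.balance, d.address, d.phone, d.email
        FROM customer_balance b
             JOIN customer_detail d ON b.cust_id = d.cust_id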
Planning for Peak Usage
People often ask questions like, "Can SQL Server handle our system? We do a million transactions a day." To answer this question, you have to know exactly what transactions they have in mind. You also need to know the system's peak usage. If a million transactions a day are nicely spread out over 24 hours, that's less than 12 transactions per second. In general, a 12-TPS (transactions-per-second) system would be adequate. But if 90 percent of the million transactions come between 2 p.m. and 3 p.m., it's a very different situation. You'll have rates of about 250 TPS during that hour and probably peaks of more than 350 TPS. You can cite benchmarks, such as Debit-Credit, in which SQL Server performs over 1500 TPS, but all transactions are different. It is meaningless to refer to transactions per second in SQL Server or any other system without also talking about the types of transactions. This is precisely the reason that standardized tests are available from the Transaction Processing Performance Council (TPC) for comparing database performance.
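The arithmetic behind those figures:

    1,000,000 transactions / 86,400 seconds in a day   = about 11.6 TPS on average
      900,000 transactions /  3,600 seconds in an hour = 250 TPS during the peak hour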
Regardless of the specific transactions, though, you must design and build your system to handle peak usage. In the case above, the important consideration is peak usage, which determines whether you should target your system to a volume of 350 TPS for unevenly distributed usage or to only 12 TPS with usage spread out evenly. (Most systems experience peaks, and daily usage is not so nicely spread out.)
Perceived Response Time for Interactive Systems
Systems are often built and measured without the appropriate performance goals in mind. When measuring query performance, for example, most designers tend to measure the time that it takes for the query to complete. By default, this is how SQL Server decides to cost query performance. But this might not be the way your users perceive system performance. To users, performance is often measured by the amount of time that passes between pressing the Enter key and getting some data. As a program designer, you can use this to your advantage. For example, you can make your application begin displaying results as soon as the first few rows are returned; if many rows will appear in the result set, you don't have to wait until they are all processed. You can use such approaches to dramatically improve the user's perception of the system's responsiveness. Even though the time required to get the last row might be about the same with both approaches, the time it takes to get the first row can be different—and the perceived difference can translate into the success or failure of the project.
By default, SQL Server optimizes a query based on the total estimated cost to process the query to completion. Recall from Chapter 3 that if a significant percentage of the rows in a table must be retrieved, it is better to scan the entire table than to use a nonclustered index to drive the retrieval. (A clustered index, of course, would be ideal because the data would be physically ordered already. The discussion here pertains only to the performance tradeoff of scan-and-sort vs. using a nonclustered index.)
Retrieving a page using the nonclustered index requires traversing the B-tree to get the address of a data page or the clustering key and then retrieving that page (by using the RID to directly access it or traversing the clustered index to find the data page) and then traversing the nonclustered B-tree again to get the location information for the next data page and retrieving it…and so on. Many data pages are read many times each, so the total number of page accesses can be more than the total number of pages in the table. If your data and the corresponding nonclustered index are not highly selective, SQL Server usually decides not to use that nonclustered index. That is, if the index is not expected to eliminate more than about 90 percent of the pages from consideration, it is typically more efficient to simply scan the table than to do all the extra I/O of reading B-trees for both the nonclustered and clustered indexes as well as the data pages. And by following the index, each data page must frequently be accessed multiple times (once for every row pointed to by the index). Subsequent reads are likely to be from cache, not from physical I/O, but this is still much more costly than simply reading the page once for all the rows it contains (as happens in a scan).
Scanning the table is the strategy SQL Server chooses in many cases, even if a nonclustered index is available that could be used to drive the query, and even if it would eliminate a sort to return the rows based on an ORDER BY clause. A scan strategy can be much less costly in terms of total I/O and time. However, the choice of a full scan is based on SQL Server's estimate of how long it would take the query to complete in its entirety, not how long it would take for the first row to be returned. If an index exists with a key that matches the ORDER BY clause of the query and the index is used to drive the query execution, there is no need to sort the data to match the ORDER BY clause (because it's already ordered that way). The first row is returned faster by SQL Server chasing the index even though the last row returned might take much longer than if the table were simply scanned and the chosen rows sorted.
In more concrete terms, let's say that a query that returns many rows takes 1 minute to complete using the scan-and-sort strategy and 2 minutes using a nonclustered index. With the scan-and-sort strategy, the user doesn't see the first row until all the processing is almost done—for this example, about 1 minute. But with the index strategy, the user sees the first row within a subsecond—the time it takes to do, say, five I/O operations (read two levels of the nonclustered index, two levels of the clustered index, and then the data page). Scan-and-sort is faster in total time, but the nonclustered index is faster in returning the first row.
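You can observe this tradeoff directly by comparing the I/O and timing of the two strategies; the table and index names below are hypothetical:

    -- Report logical/physical reads and elapsed time for each statement
    SET STATISTICS IO ON
    SET STATISTICS TIME ON

    -- Let the optimizer choose (often a scan plus sort for a low-selectivity predicate)
    SELECT * FROM orders WHERE order_date >= '19990101' ORDER BY order_date

    -- Force the nonclustered index on order_date to drive the query, for comparison
    SELECT * FROM orders (INDEX = idx_orders_date)
    WHERE order_date >= '19990101' ORDER BY order_date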
SQL Server provides a query hint called FAST that tells SQL Server that having the first n rows returned quickly is more important than the total time, which is how query plans are normally costed. Later in this chapter, we'll look at some techniques for speeding up slow queries and discuss when it is appropriate to use the query hint. For now, you should understand the issues of response time (the time needed to get the first row) vs. throughput (the time needed to get all rows) when you think about your performance goals. Typically, highly interactive systems should be designed for best response time, and batch-oriented systems should be designed for best throughput.
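A sketch of the hint's syntax, again with hypothetical table and column names:

    -- Favor returning the first 20 rows quickly, even if total query time is longer
    SELECT order_id, cust_id, amount
    FROM orders
    WHERE order_date >= '19990101'
    ORDER BY order_date
    OPTION (FAST 20)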
Prototyping, Benchmarking, and Testing
As you make changes to your application design or hardware configuration, you should measure the effects of these changes. A simple benchmark test to measure differences is a tremendous asset. The benchmark system should correlate well with the expected performance of the real system, but it should be relatively easy to run. If you have a development "acceptance test suite" that you run before checking in any significant changes, you should add the benchmark to that test suite.
Tip You should measure performance with at least a proxy test; otherwise, you're setting yourself up for failure. Optimism without data to back it up is usually misguided.
Your benchmark doesn't have to be sophisticated initially. You can first create your database and populate it with a nontrivial amount of data—thousands of rows at a minimum. The data can be randomly generated, although the more representative you can make the data, the better. Ideally, you should use data from the existing system, if there is one. For example, if a particular part represents 80 percent of your orders, you shouldn't make all your test data randomly dispersed. Any differences in the selectivity of indexes between your real data and the test data will probably cause significant differences in the execution plans that SQL Server chooses. You should also be sure that you have data in related tables if you use FOREIGN KEY constraints. As we explained earlier in this book, the enforcement of FOREIGN KEY constraints requires that those related tables (either referenced or referencing) be accessed if you are modifying data in a column that is participating in the constraint. So the execution plan is sometimes considerably more complicated due to the constraints than might be apparent, and a plethora of constraints can result in a system that has no simple operations.
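As a minimal sketch of generating skewed rather than uniformly random data (the table, columns, and the "hot" part number are hypothetical):

    -- Insert 10,000 orders; roughly 80 percent of them reference one hot part
    DECLARE @i int
    SET @i = 0
    WHILE @i < 10000
    BEGIN
        INSERT orders (cust_id, part_id, amount, order_date)
        VALUES (1 + CAST(RAND() * 500 AS int),                    -- 500 customers
                CASE WHEN RAND() < 0.8 THEN 1042                  -- the hot part
                     ELSE 1 + CAST(RAND() * 2000 AS int) END,     -- everything else
                RAND() * 100,
                DATEADD(day, -CAST(RAND() * 365 AS int), GETDATE()))
        SET @i = @i + 1
    END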
As a rule of thumb, you should start with at least enough data that selecting a single row via its primary key index is dramatically faster than selecting the same row via a table scan. (This assumes that the table in production will be large enough to reflect that difference.) Remember that the system will perform much differently depending on whether I/O operations are physical or from the cache. So don't base your conclusions on a system that is getting high cache-hit ratios unless you have enough data to be confident that this behavior will also hold for your production system. In addition, keep in mind that running the same test multiple times might yield increasingly short response times. If you are testing on a dedicated machine, no other processes will be using SQL Server's memory, and the data you read from disk the first time will already be in cache for subsequent tests.
Tip If you want to run your tests repeatedly under the same conditions, use the command DBCC DROPCLEANBUFFERS after each test run to remove all data from memory.
Early in the development process, you should identify areas of lock contention between transactions and any specific queries or transactions that take a long time to run. The SQL Server Profiler, discussed in Chapter 15, can be a wonderful tool for tracking down your long-running queries. And if table scans will be a drain on the production system, you should have enough data early on so that the drain is apparent when you scan. If you can run with several thousand rows of data without lock contention problems and with good response time on queries, you're in a good position to proceed with a successful development cycle. Of course, you must continue to monitor and make adjustments as you ramp up to the actual system and add more realistic amounts of data and simultaneous users. And, of course, your system test should take place on an ongoing basis before you deploy your application. It's not a one-time thing that you do the night before you go live with a new system.
Tip Before you roll out your production system, you should be able to conduct system tests with the same volumes of data and usage that the real system will have when it goes live. Crossing your fingers and hoping is not good enough. Two tools in the Microsoft BackOffice Resource Kit might prove useful for this purpose. The filltabl utility populates a specified table with any number of rows of random data. A load simulator (sqlls) lets you run one or more SQL scripts with up to 64 operating system threads executing each script. These and other tools from the BackOffice Resource Kit are written for the previous version of SQL Server, but they will work with SQL Server 7.
Obviously, if your smallish prototype is exhibiting lock contention problems or the queries do not perform well within your desired goals, it's unlikely that your real system will perform as desired. Run the stored procedures that constitute your critical transactions. You can use a simple tool like OSQL.EXE to dispatch them. First run each query or transaction alone—time it in isolation and check the execution plans (SET SHOWPLAN_TEXT ON). Then run multiple sessions to simulate multiple users, either by manually firing off multiple OSQL commands or by using the SQL Load Simulator mentioned above. (A bit later, we'll look at how to analyze and improve a slow-running query.)
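For example, you might put an isolated test in a script file and dispatch it with OSQL.EXE; the procedure, parameter, database, and file names here are hypothetical:

    -- critical_tran.sql: display the plan first, then actually run and time the call
    SET SHOWPLAN_TEXT ON
    GO
    EXEC proc_take_orders @cust_id = 123     -- plan only; nothing is executed yet
    GO
    SET SHOWPLAN_TEXT OFF
    GO
    SET STATISTICS TIME ON
    GO
    EXEC proc_take_orders @cust_id = 123     -- executed and timed
    GO

You could then run the script with something like osql -E -d ordersdb -i critical_tran.sql (a trusted connection against a hypothetical database) and launch several copies of that command at once to simulate concurrent users.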
Also, based on your CRUD chart analysis (or on any other analysis of critical transactions), you should identify tasks that will run at the same time as your critical transactions. Add these tasks to your testing routine to determine whether they will lead to contention when they run simultaneously with your critical transactions. For example, suppose proc_take_orders is your critical transaction. When it runs, some reports and customer status inquiries will also run. You should run some mixture of these types of processes when you analyze proc_take_orders. This will help you identify potential lock contention issues or other resource issues, such as high CPU usage or a low cache-hit ratio.
You might also want to use the SQL Server benchmark kit, which is available on the companion CD as well as at the Microsoft Web site.