June 18, 2015

Database Performance Optimization

Introduction

Imagine a customer using your application to find a product or service of interest. Behind the scenes, huge amounts of data must be processed and multiple rules applied to power helpful features like auto-suggest, comparisons, and reviews.

If, while designing such an application, you have faced (or are facing) any of the following pains, you are not alone:

  • Applications that previously ran smoothly slow down all of a sudden
  • Periodic data overload
  • Seemingly innocent queries taking strangely long to execute

In today’s extremely fast-paced world, if end users have to wait for what they want, they simply go away. And they go away permanently, never to come back.

Solution: Ensure a high level of responsiveness so that end users can actually interact with whatever features the application offers. And since the biggest crunch happens at the database, that is the first area to optimize and tune.

Database Performance Tuning/Optimization

A database is optimized in two stages:

  1. The design phase, where care is taken in setting up the structures and the ways they will be used
  2. The runtime phase, where bottlenecks are identified from actual usage and corrective actions such as query hints and indexes are applied

The optimization plan can be effectively covered by following these “Rules of the Hand,” which help ensure a happy relationship between the database and the application.

Rule of Palm – Start with Effective Design

A good start is half the job done. The database architect needs to study the application requirements before jumping into the design.

Normalization

Here the data structure is modularized into simpler tables based on application needs, so that duplication is avoided while relevant data still stays together.

A very high degree of normalization leads to too many tables, and hence too many joins. This makes queries difficult to understand and visualize, inviting bugs and misinterpretations. The database engine also has to do extra work hopping across all these structures.

However, denormalized data leads to bloated rows and huge data pages, which slows things down.

Care must be taken to normalize only to the level the application actually requires.
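As a minimal sketch of this trade-off (table and column names here are hypothetical), a denormalized orders table that repeats customer details on every row can be split so that customer data lives in one place:

  -- Denormalized: customer details are repeated on every order row
  CREATE TABLE dbo.OrdersDenormalized (
      OrderId      INT PRIMARY KEY,
      OrderDate    DATE,
      CustomerName VARCHAR(100),  -- duplicated per order
      CustomerCity VARCHAR(60)    -- duplicated per order
  );

  -- Normalized: customer data is stored once and referenced by key
  CREATE TABLE dbo.Customers (
      CustomerId   INT PRIMARY KEY,
      CustomerName VARCHAR(100),
      CustomerCity VARCHAR(60)
  );

  CREATE TABLE dbo.Orders (
      OrderId    INT PRIMARY KEY,
      OrderDate  DATE,
      CustomerId INT REFERENCES dbo.Customers (CustomerId)
  );

Splitting further (say, a separate table just for cities) would add joins without a real payoff for most applications.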

Data Types
The type of a column determines the operations possible on it and the storage it consumes. Too often, however, entity properties are set up with default options or the most permissive ones.

  • Size limits allow the engine to enforce constraints and allocate data pages efficiently
  • Do not use the Unicode “n” prefix (nchar, nvarchar) when working with single-language data, such as a name; this halves the storage size
  • For very long strings, never use “max” as the size even if unsure; use the actual limit required by the application. Using “max” makes the column eligible for large object (LOB) storage, which is allocated on separate pages
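A small sketch of these points (the table and sizes are illustrative assumptions, not prescriptions):

  CREATE TABLE dbo.Products (
      ProductId   INT           NOT NULL PRIMARY KEY,
      -- single-language text: VARCHAR stores it in half the bytes of NVARCHAR
      Name        VARCHAR(100)  NOT NULL,
      -- an explicit limit keeps values on in-row data pages;
      -- VARCHAR(MAX) would make them eligible for separate LOB pages
      Description VARCHAR(2000) NULL,
      -- choose the smallest type that fits the domain
      StockCount  SMALLINT      NOT NULL
  );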

Indexes
Usually designed based on expected search criteria, indexes should be revisited periodically to ensure they are actually serving their purpose. Based on performance-monitor logs and query execution plans, we need to create new indexes, update existing ones, and remove unutilized ones.

The clustered index should be created on the most used filter criteria, with additional ones as non-clustered. If the entire filter criteria set of a query is served by a single index (called a covering index), the increase in performance can be dramatic.
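For instance (a hypothetical Orders table, assumed to have no clustered primary key yet), the most used filter gets the clustered index, and a covering non-clustered index answers a frequent query entirely from the index:

  -- Clustered index on the most frequently filtered column
  CREATE CLUSTERED INDEX IX_Orders_OrderDate
      ON dbo.Orders (OrderDate);

  -- Covering non-clustered index: the query below is answered
  -- entirely from the index, without touching the base table
  CREATE NONCLUSTERED INDEX IX_Orders_Customer_Covering
      ON dbo.Orders (CustomerId)
      INCLUDE (OrderDate, TotalAmount);

  SELECT CustomerId, OrderDate, TotalAmount
  FROM   dbo.Orders
  WHERE  CustomerId = 42;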

Rule of Thumb – Get only the “required” data

Usually there is one procedure that fetches the biggest chunk of data required by most of the requirements. Many operations use this procedure, each taking the subset it needs and discarding the rest. The engine thus has to work out how to fetch data that is not required at all, actually retrieve it from the data structures, and then send it over the network.

Intermingling requirements of too many operations into a single implementation is a violation of the Single Responsibility Principle and causes maintenance nightmares.

Hence we should divide the data access implementation by need, and use views for modularity. This not only keeps the design clean and maintainable but also stops unwanted data from being fetched. In particular, avoid ‘SELECT * FROM table’; select only the columns the operation actually uses.
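A minimal sketch, assuming a hypothetical Orders table: each operation reads from a narrow view exposing only the columns it needs, rather than SELECT *:

  -- A view returning only the columns this operation needs
  CREATE VIEW dbo.vw_OrderSummary
  AS
  SELECT OrderId, OrderDate, TotalAmount
  FROM   dbo.Orders;
  GO

  -- Callers read from the narrow view...
  SELECT OrderId, TotalAmount
  FROM   dbo.vw_OrderSummary
  WHERE  OrderDate >= '2015-01-01';

  -- ...instead of dragging every column across the network:
  -- SELECT * FROM dbo.Orders;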

Rule of Fore-Finger – Choose and Use Temporary Storage Effectively

Records fetched for processing are usually held in transient storage, which can be temporary tables or table variables. Each is suited to its own scenarios and will not perform optimally in the other’s. Table variables are created in memory and hence are (in principle) the fastest. However, once the dataset being manipulated grows beyond a few thousand records, memory bottlenecks force a move to the more scalable temporary tables. These are stored in a dedicated system database, “tempdb,” which can be placed on separate physical storage.

If specific data from a huge table is used repeatedly across multiple operations, copy that data to a temp table. It can then be used for targeted searches or joined with other result sets for faster access.
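A sketch with hypothetical names: a small working set goes into a table variable, while a larger reusable set goes into a temp table that can be indexed:

  -- Table variable: fine for small sets (up to a few thousand rows)
  DECLARE @RecentOrders TABLE (OrderId INT PRIMARY KEY, TotalAmount MONEY);
  INSERT INTO @RecentOrders
  SELECT OrderId, TotalAmount
  FROM   dbo.Orders
  WHERE  OrderDate >= '2015-06-01';

  -- Temp table: scales better; lives in tempdb and can be indexed
  SELECT OrderId, CustomerId, TotalAmount
  INTO   #BigCustomerOrders
  FROM   dbo.Orders
  WHERE  TotalAmount > 1000;

  CREATE INDEX IX_Tmp_CustomerId ON #BigCustomerOrders (CustomerId);

  -- Reused by multiple subsequent operations
  SELECT c.CustomerName, SUM(o.TotalAmount) AS Total
  FROM   #BigCustomerOrders o
  JOIN   dbo.Customers c ON c.CustomerId = o.CustomerId
  GROUP  BY c.CustomerName;

  DROP TABLE #BigCustomerOrders;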

Rule of Middle Finger – Execution Plans

The database engine prepares a plan for executing any given query; in the case of stored procedures it caches this plan, so subsequent executions are faster. The plan comes in two forms: the estimated plan, based on the engine’s statistics, and the actual plan, based on real runtime conditions. Both can be viewed in SQL Server Management Studio. From these, a developer can identify which query is taking the most effort and focus attention there.

[Screenshots: the “Display Estimated Execution Plan” and “Include Actual Execution Plan” options in SQL Server Management Studio, and a sample query executed against the server]

The effort breakdown is shown relative to the whole batch, so focus should go to the query with the maximum cost (in the example, the first query). Hovering over an operation shows a popup with the underlying statistics.

Although this caching of execution plans is a very big benefit (and a major reason to use stored procedures), there are times when it becomes an overhead. If the size of the dataset and the processing depend on the business state, the optimal plan can change from one run to the next. In such cases, reusing the cached plan will not give optimum performance.

So use the SQL hint OPTION (RECOMPILE) to force the plan to be regenerated upon every execution.

Example query:

[Screenshots: the stored procedure before and after adding OPTION (RECOMPILE)]
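A sketch of what the before and after might look like, using a hypothetical procedure whose parameter is sometimes highly selective and sometimes matches most of the table:

  -- Before: the cached plan is reused regardless of how selective @Status is
  CREATE PROCEDURE dbo.GetOrdersByStatus @Status VARCHAR(20)
  AS
  SELECT OrderId, OrderDate, TotalAmount
  FROM   dbo.Orders
  WHERE  OrderStatus = @Status;
  GO

  -- After: the plan is regenerated on every run, using the actual parameter value
  ALTER PROCEDURE dbo.GetOrdersByStatus @Status VARCHAR(20)
  AS
  SELECT OrderId, OrderDate, TotalAmount
  FROM   dbo.Orders
  WHERE  OrderStatus = @Status
  OPTION (RECOMPILE);
  GO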

Rule of Ring Finger – Runtime State Checks

SQL Server offers various commands to check the state of databases, known as DBCC (Database Console Commands). These let the administrator identify runtime bottlenecks and take corrective action on a continual basis. Common ones include DBCC FREEPROCCACHE, DBCC SHRINKDATABASE, DBCC SHRINKFILE, DBCC DROPCLEANBUFFERS, and DBCC DBREINDEX.
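For instance (database and object names below are placeholders; run these with care, especially on production systems):

  -- Clear cached execution plans so they are regenerated
  DBCC FREEPROCCACHE;

  -- Drop clean pages from the buffer pool (useful for cold-cache testing)
  DBCC DROPCLEANBUFFERS;

  -- Reclaim unused space from a database or a specific file
  DBCC SHRINKDATABASE (MyDatabase);
  DBCC SHRINKFILE (MyDatabase_Log);

  -- Rebuild the indexes of a table to remove fragmentation
  DBCC DBREINDEX ('dbo.Orders');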

Rule of Small Finger – Effective Joins

Joins are a necessity when working with normalized tables. Using them effectively, however, requires following certain guidelines, which can mean the difference between a few seconds and many minutes.

Index on “joined” column

If our queries regularly join two or more tables, performance can be optimized by giving each joined column its own index: add an index to the join column in every table participating in the join.
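For example (hypothetical tables), the join column is indexed on both sides:

  -- Index the join column on each participating table
  CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
      ON dbo.Orders (CustomerId);
  -- dbo.Customers (CustomerId) is typically indexed already as the primary key

  SELECT c.CustomerName, o.OrderDate, o.TotalAmount
  FROM   dbo.Orders o
  JOIN   dbo.Customers c ON c.CustomerId = o.CustomerId;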

Pre-filter data using the “where” clause

If we know the specific data to be used, specify it as part of the join query itself rather than filtering later. The overall operation is faster because the extra rows never have to be produced and then removed by a subsequent query. If we are building a view or stored procedure for this task, the filter should be passed as a parameter (in SQL Server, an inline table-valued function, often called a parameterized view, instead of a normal view).

Example:

Assuming we need only data related to the ‘AMEX’ exchange in the operation, the filter goes into the join query itself, instead of being applied to the full result set afterwards.

[Screenshots: the filtered join query, and the unfiltered version it replaces]
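A sketch of the two variants, using hypothetical Trades and Exchanges tables, plus the parameterized form as an inline table-valued function:

  -- Preferred: filter as part of the join query itself
  SELECT t.TradeId, t.Symbol, t.Price
  FROM   dbo.Trades t
  JOIN   dbo.Exchanges e ON e.ExchangeId = t.ExchangeId
  WHERE  e.ExchangeName = 'AMEX';

  -- Instead of: producing everything first and filtering afterwards
  SELECT TradeId, Symbol, Price
  FROM (
      SELECT t.TradeId, t.Symbol, t.Price, e.ExchangeName
      FROM   dbo.Trades t
      JOIN   dbo.Exchanges e ON e.ExchangeId = t.ExchangeId
  ) AS AllTrades
  WHERE ExchangeName = 'AMEX';

  -- Parameterized form: an inline table-valued function
  CREATE FUNCTION dbo.fn_TradesByExchange (@Exchange VARCHAR(20))
  RETURNS TABLE
  AS RETURN
      SELECT t.TradeId, t.Symbol, t.Price
      FROM   dbo.Trades t
      JOIN   dbo.Exchanges e ON e.ExchangeId = t.ExchangeId
      WHERE  e.ExchangeName = @Exchange;
  GO

  SELECT TradeId, Symbol, Price FROM dbo.fn_TradesByExchange('AMEX');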

Conclusion

If you follow these Rules of the Hand, you’ll be well on your way to developing high-performing database-driven applications.