SQL Server Performance Rotating Header Image

Write Logs

Today’s blog is the beginning of your lessons on Write Logs. So what’s the definition of a write log. Well when a SQL Server session waits on the WRITELOG wait type, it’s waiting to write the contents of that log cache to disk where of course the transaction log is stored.

We will explain the process in a little more detail, so let’s assume a session starts a transaction that will perform several INSERT statements. While the data is being inserted, two actions will occur:

  1. The data page in the buffer cache is updated with the new data.
  2. The data is written to the log cache which is a segment of memory used to record data that will be used for rolling back the transaction or be written to the log file.

This process continues until the transaction has completed or committed at which time the data in the log cache is immediately written to the physical log file. When SQL Server is in the act of flushing the log cache to disk, the session will wait on the WRITELOG wait type. Now you have the basics of a write log!

How to Conduct a SQL Server Performance Audit

We have been doing DBA work for long, if you have been a DBA for a number of years you know that performance tuning SQL Server is not an exact science. To do a pretty thorough audit we have put together a quick checklist.

To make your SQL Server performance audit easier, we have divided it into sections.

* Use Performance Monitor to Identify SQL Server Hardware Bottlenecks
* Server Hardware Performance Checklist
* Operating System Performance Checklist
* SQL Server 2000 Configuration Performance Checklist
* Database Configuration Settings Performance Checklist
* Index Performance Checklist
* Application and Transact-SQL Performance Checklist
* Use Profiler to Identify Poor Performing Queries

The best way to conduct your SQL Server performance audit is to first review each of the above sections, and to print them out. Go through each section and write down the results as you collect them. You don’t have to use the above order, we only listed the above steps in this order because this is the way we generally approach a performance audit.

Performance Intelligence

Performance Intelligence is technology that uses historical trend data to resolve current database performance problems. We have been testing a free download of Ignite 8 from Confio Software. The marketing propaganda says that Ignite uses Performance Intelligence to analyze millions of data points, captured in the Data Warehouse along multiple dimensions, and identify which issues have most impact on the IT end user. They go on to say “Ignite presents the results in a way that leads the user directly to the most critical problems and the correlated evidence showing the root cause.” We are always skeptical of marketing and sales data sheets but so far our observations are positive.  We will be using Ignite 8 on some real live problems and get back with you on the results.

We try to stay away from recommending products unless we have done some extensive testing. Stay tuned for future feedback!

Troubleshooting Performance Problems in SQL Server 2008

Troubleshooting Performance Problems in SQL Server 2008 Whitepaper

This SQL Server white paper gives you step-by-step guidelines for diagnosing and troubleshooting common performance problems by using the following tools:

▪ SQL Server Profiler
▪ System Monitor (in the Windows Server 2003 operating system) or
▪ Performance Monitor (in Windows Vista operating system and Windows Server 2008), known as Perfmon
▪ Dynamic management views (referred to as DMVs)
▪ SQL Server Extended Events (Extended Events) and the data collector, which are new in SQL Server 2008.

Analyzing the SQL Diagram

Analyzing the SQL Diagram begins by looking for the smallest, underlined number. In our case it
is 0.002 next to the CLASS table. To limit the number of rows the query needs to process,
starting here will trim our result sets the soonest. In other words, if we can make SQL Server
start with this table the query can be executed optimally. I will leave it to Dan Tow to prove this
fact.

We know that we want SQL Server to hit the CLASS table first, but how do we do that? The
easiest way is to ensure an index exists on our criteria. In this case that means the NAME
column of the CLASS tables. In our case, an index does not currently exist, so let’s create it and
see what our new execution plan is:
create index cl_name on class(name)

The query executed in 20 seconds (which is a 90% improvement from 200 seconds) and the plan
has changed. The CL_NAME index is now used to find the two rows in the CLASS table with a
NAME=’SQL TUNING’, which is far better than performing an Index Scan. We achieved partial
results but an Index Scan is still done against the REGISTRATION table. Why would that be?
When we review the indexes that exist on that REGISTRATION we find one index named
REG_PK that contains the STUDENT_ID and CLASS_ID columns in that order. Since our query
is hitting the CLASS table first it cannot use this index that starts with STUDENT_ID so it must
perform a full Index Scan. This index could be modified to switch the columns around, but that
could affect many other queries and may require significant testing. In this case, and since I
know I need an index with a leading edge of CLASS_ID, I created an index on just that column:

Creating a SQL Diagram

We would like to introduce a technique that we use to tune SQL statements correctly without the trial and error that we were often faced with before. The concept of SQL Diagramming was introduced to us through a book named SQL Diagramming by Dan Tow. For a complete understanding of this topic we encourage you to buy the book and get more information from http://www.singingsql.com.
For our purposes, I will explain the basics of the following SQL diagram:

registration .04–>student

registration .04–>class .002

The diagram looks a lot like an ERD diagram. To build it:

· Start with any table in the FROM clause and put it on paper (the easiest way to draw
these diagrams is by hand). In our case we started with the first table in the FROM clause
named STUDENT.
· Take the next table, which is REGISTRATION and place it either above or
below the existing tables based on the relationship. Since the REGISTRATION relates to
the STUDENT table uniquely, i.e. one row in REGISTRATION points to one row in the
STUDENT table, we put it above and draw and arrow downwards.
· Take the next table, CLASS, and we know that one row in REGISTRATION points to one
row in CLASS so we put it below REGISTRATION with another downward pointing arrow.
The next step is to further understand the criteria used in the query to limit the number of rows
from each of the tables. The first criteria to explore is anything in the WHERE clause to limit
the rows from REGISTRATION. This criteria is:

r.signup_date BETWEEN
DATEADD(day, -1, current_timestamp) AND current_timestamp
AND r.cancelled = ‘N’

To understand the selectivity of the criteria, run a query against the table and use the same
criteria:

select count(*) from registration r
WHERE r.signup_date BETWEEN
DATEADD(day, -1, current_timestamp) AND current_timestamp
AND r.cancelled = ‘N’
Results – 3,562 / 80,000 (total rows in REGISTRATION) = 0.0445 = 4.4% selectivity

Use this number on the diagram next to the REGISTRATION table and underline it. This represents the selectivity of the criteria against that table. The next table to review in the same manner is the CLASS table. The query we would run becomes:

SELECT count(1) FROM class
WHERE name = ‘SQL TUNING’
Results – 2 / 1000 (total rows in CLASS) = 0.002 = 0.2% selectivity

Add this to the diagram next to the CLASS table. The next table to explore is the STUDENT
table, but there are no direct criteria against this table, so no underlined number next to that table.
The diagram has now been completed for our example, however, Dan Tow goes into more detail
than we will include for now.

Entity Relationship Diagram

Now to understand the relationships of the tables involved in the sql statement. The sample SQL statement we will tune is as follows and answers the question: “Who registered for the SQL Tuning class within the last day?”:

SELECT s.fname, s.lname, r.signup_date
FROM student s
INNER JOIN registration r ON s.student_id = r.student_id
INNER JOIN class c ON r.class_id = c.class_id
WHERE c.name = ‘SQL TUNING’
AND r.signup_date BETWEEN
DATEADD(day, -1, current_timestamp) AND current_timestamp
AND r.cancelled = ‘N’

The ERD for this query is very simple and restricts the universe to the objects the query is accessing. If the statement only accesses 3 tables, do not review a full ERD containing 100s or 1000s of tables. Only review the relationships of the three tables. You will be surprised by the number of times you can find mistakes in the SQL statement by reviewing the ERD.

CLASS
class_id
name
class_level

REGISTRATION
class_id
student_id
signup_date
cancelled

STUDENT
student_id
fname
lnam

Table and Index Statistics

The next step is to gather information about each table being accessed inefficiently. These tables come from a review of the execution plan and are a result of finding the highest execution steps. There is no use gathering data for objects that are already being accessed efficiently. Confio Software has a script on their support site (http://support.confio.com/kb/1534) that will help with this process and will gather information such as:

  1. Table sizes – a full table scan on a 20 row table is better than using an index even if it existed, so understand if you have small, medium, large or very large tables involved.
  2. Existing indexes – get a list of all indexes that already exist on these tables.
  3. Understand the selectivity or cardinality of columns contained in the WHERE clause. I will dicuss this in much more detail in the SQL Diagramming section below.

SQL Execution Plan

Let’s talk about your SQL execution plan, once you have broken the query down you need to understand how each component behaves. Yeah, sure you do! This is where execution plans help supply costing information, data access paths, join operations and many other things.

As you would have guessed, not all plans are the same. Do you know the “Estimated Plan” or “Actual Plan” from SQL Server Management Studio (SSMS) could be wrong and not match how SQL Server is really executing the statement?

You say, “how can the plan from SSMS be wrong”? There are many reasons including:

  1. The “estimated plan” can be wrong because SSMS is just providing an estimate of how SQL server would execute the SQL statement.
  2. The “actual plan” can also be wrong because it is typically executed in a completely different environment in SSMS and not from the application code and environment. The application may use specific settings that are not used in the SSMS environment and those settings can drastically affect SQL execution

There are many reasons the plan could be different so what should we do. There are ways to gather an execution plan that will be correct:

  • DM_EXEC_QUERY_PLAN – contains plans of executed SQL statements. It provides the exact plan SQL Server used so why not go straight to the source.
  • Tracing – gives all sorts of great information as well as executions plans for SQL statements.
  • Historical Data – if you can, collect and save execution plan information so you can go back to a week ago and understand why the SQL statement started performed poorly. Plan changes are very commonly associated with SQL statements that suddenly start performing poorly.

An example query to gather information from the DM_EXEC_QUERY_PLAN is shown below. This statement finds the top five SQL statement based on average execution time (aka average response time) and retrieves the execution plans for that statement. If run from SSMS, the Query_Plan column will contain a link to a graphical plan.

SELECT TOP 5 total_elapsed_time/execution_count AS [Avg Response Time],
SQL_handle, statement_start_offset, statement_end_offset,
Plan_handle, query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle)
ORDER BY total_elapsed_time/execution_count DESC;

Once you have an execution plan, determine how each of the SQL components is being executed. Based on wait time data, you should already have a feel whether table or index scans is your problem (waits on PAGEIOLATCH_*), locking (LCK_M_* waits) or something else. Reading execution plans are outside the scope of this document, but a few tips are:

  • Look for expensive steps
  • Look for large arrows that represent large intermediate result sets
  • Look for table scans or index scans which mean the entire object is being read

Gathering SQL Statement Metrics

The next phase in getting better performance out of your sql server application and to tune sql statements is to gather critical information and metrics about the SQL statement. These metrics should include the following:

  • How long does the statement take now?
  • What is acceptable to the end users? If they want the query to return in 10 seconds and the query is complex and reads a lot of data, you may have to stop the tuning project immediately, because you will not satisfy expectations.
  • Collect wait time information because all performance problems are not created equal.
  • If you find a query waiting on locking/blocking wait events, you know that tuning the end user wait time is not necessarily about tuning the SQL statement. It could include tuning other SQL statements in the transaction that cause resources to be locked longer than needed. You should become very good friends with the developers and the people that designed the application.
  • If you find a statement waiting on physical I/O, e.g. any wait type that starts with PAGEIOLATCH, this also helps you get to the next step. Significant waits on these wait types are many times related to table or index scans. Tuning in this case is often accomplished by adding or modifying indexes.
  • If you have latch contention, you have to determine the exact latch causing the bottleneck and understand why it is so popular. Typically, SQL tuning is the answer.
  • If you see network waits, it is rarely a network problem. Is the statement returning a lot of data, either in the form of a lot of rows, or large columns like LOBs? On rare occasions there could be real network latency between the database and the client, and in this case, make friends with your network administrator.
  • Quite likely there will be multiple problems. There could be a table scan on Table1 and an inefficient index being used to access Table2 which is also causing latching problems.

If a query executes for 3 minutes, understand what makes up that 3 minutes from a wait time view, i.e. it waits 2:30 on PAGEIOLATCH_SH, 15 seconds on LATCH_SH, and 15 seconds onCPU (service time).