Graphical Execution Plan

The execution plan is the method by which SQL Server accesses the data that resides in the underlying tables.  It examines your query then determines the most efficient way to retrieve the data.  Execution plans are compiled and stored with stored procedures, so SQL Server will know the best way to access data when a given stored procedure is called.  SQL Server provides a GUI that displays how the execution plan is used to retrieve the data.  When developing procedures, you should utilize the graphical execution plan to assist in writing the most efficient stored procedures.  Please keep the following tips in mind when examining the graphical execution plan:

  • If you would like to see the execution plan, but not execute the query, choose “Display Estimated Execution Plan”.  This will show you an estimate of the actual execution plan.  Note that this will not work if temp tables are used because the temp tables cannot be parsed until runtime.  If you use a table variable, however, the estimated execution plan will return successfully.
  • Graphical Execution plans are read from right to left, and the arrows on each step correspond to the number of rows returned for that step.  The fatter the arrow, the more rows returned.  The execution plan eventually merges into one last step on the left side of the plan.  Thick arrows should appear on the right of the execution plan and get smaller as the plan moves left.
  • Execution plans are broken down into smaller substeps.  Each substep is displayed as a percentage of runtime versus the query as a whole.  For a long running query, identify the substep with the highest percentage and begin tuning at that step.
  • Each action within the substep is displayed as a percentage of runtime versus the substep as a whole.  For long running queries, look to tune the long running actions.
  • If you move your cursor over any of the displayed actions, a pop up window appears with more detailed information on that particular step.
  • If you see the below in an execution plan, look to modify the query so that these actions are eliminated or reduced.  It’s not always possible to avoid them, but the more they can be prevented, the better performance will be:
    • Index or table scans: May indicate a need for better or additional indexes
    • Bookmark Lookups:  Consider changing the current clustered index, consider using a covering index, reduce the number of columns in the SELECT statement
    • Filter: Remove any functions in the WHERE clause , don’t include Views in your Transact-SQL code,  may need additional indexes
    • Sort: Does the data really need to be sorted?  Can sorting be done at the client more efficiently?
  •  While I/O and CPU cost don’t directly refer to distinct measurements (they are internal SQL counters), higher I/O and CPU cost values in the execution plan utilize more resources than lower values
  •  Another cause of Bookmark lookups are the use of “SELECT *”
  •  Sometimes SQL Server will need to create a temporary worktable in the tempdb database.  Anytime a worktable is used, performance is hurt because of the additional I/O for maintaining the worktable. 
  • The graphical execution plan will warn you that a worktable is being used when the following actions are displayed:
    • Index Spool
    • Row Count Spool
    • Table Spool

Naming Conventions

There are many naming conventions out there like camelCase, PascalCase, under_scores etc.  Each organization has their own preferences.  As long as you have one stick to it as it will ensure cleaner code and easier management.  Here is an example of one that I have used in the past:

Stored Procedures

Stored procedures should never be prefixed with “sp_”.  This prefix is reserved for system stored procedures.   Instead, a prefix of “spp_” should be used, followed by the namespace of the procedure, followed by it’s action, and potentially, it’s object.  Each component of this naming convention may not always apply, but every effort should be made to adhere to this naming standard.  The syntax should be “spp_<Namespace>_<Action>_<Object>”.  Examples are below:

  • spp_App_Insert_Employees
  • spp_ToolA_View_StaffNew
  • spp_Admin_ArchiveWS

User Defined Functions

User defined functions accept one or more parameters and can be used to return single values or table values.  User defined functions can provide programming flexibility by allowing in-line calls within a SELECT statement.  Naming for user defined functions should have a “fn_” prefix, followed by the namespace and it’s action.  The syntax should be “fn_<Namespace>_<Action>”.  Examples are below:

  • fn_App_ChainLayawayReturnsByDate
  • fn_App_ChainSalesByDate
  • fn_App_SalesByDate

 Views

               Views are virtual tables that are compiled at runtime.  The data associated with views are not stored in the view, but rather, are stored in the base tables.  In certain instances, it may be possible to modify data through a view. Naming for a view should have a “vw_” prefix, followed by the namespace and the namespace and it’s results.  The syntax should be “vw_<Namespace>_<Result>”.  Examples are below:

  • vw_App_SODetails
  • vw_App_UploadRMDetail
  • vw_App_DDSDetail

Triggers

               Triggers are pieces of code that are attached to tables and potentially get executed based on the manipulation of the table.  Naming for a trigger should have a “tr_” prefix, followed by the triggering action, and the table name.  The syntax should be “tr_<Ins,Upd, or Del>_<Table>”.  Examples are below:

  • tr_Ins_SODetailStore
  • tr_Upd_SODetailStore
  • tr_Del_SODetailStore

Indexes

               Indexes are database objects that are used to help the SQL Server query engine locate the data in the most efficient manner.  Indexes are attached to tables, and are defined by the columns within those tables.  Naming for an index should have an “IX_” prefix, followed by the table name, the label ‘Covering’ (if applicable), and columns.  If a covering index is used, the column should be replaced with the stored procedure name that required the stored procedure.  The syntax should be “IX_<TableName>_<Covering(Optional)>_<Column(s)>”.  Examples are below:

  • IX_SODetailStore_ChangeTypeID
  • IX_SODetailStore_Covering_CCAItemAvailability
  • IX_WSException_SODetailIDActualItemID

 Primary Keys

               Primary keys are unique indexes that are placed on a table that guarantees the uniqueness of a row as determined by the columns used to make up the key.  A table can have only one primary key.  A primary key can be created as a clustered or non-clustered key.  Naming for a primary key should have a “PK_” prefix, followed by the table name.  The syntax should be “PK_<TableName>”.  Examples are below:

  • PK_SOStoreDetail
  • PK_WSException
  • PK_Items

 Foreign Keys

               Foreign keys are constraints that are used to maintain referential integrity throughout the database.  Foreign keys should exist between any tables that may be joined in SELECT operations.  Naming for a foreign key should have an “FK_” prefix, followed by the target table name, followed by the source table name.  The syntax should be “FK_<TargetTable>_<SourceTable>”.  Examples are below:

  • FK_SODetailStore_EventHeader
  • FK_SODetailStore_EventMembers
  • FK_SODetailStore_StatusHierarchy

 Constraints

 There are two remaining constraints – Unique Constraints and Default Constraints.  Unique constraints enforce that each column contains a maximum of one distinct value.  Default constraints insert a value into a column when a value is not explicitly specified.  Naming for unique constraints should have a “UQ_” prefix, followed by the table name, followed by the column name.  Naming for default constraints should have a “DF_” prefix, followed by the table name, followed by the column name.  Note:  Default constraints should not be created using Enterprise Manager, as EM creates a cryptic name for default constraints that do not conform to our standards.  The syntax for a unique constraint should be “UQ_<TableName>_<ColumnName(s)>”.  The syntax for column defaults should be “DF_<TableName>_<ColumnName>”.  Examples of each are below:

  • UQ_ActivityLogHO_HOLogID        
  • UQ_EventOwners_LastNameFirstname
  • DF_Employees_IsActive
  • DF_Employees_CreateDate

Stored Procedure Considerations

All SQL should be written using ANSI-SQL standards.  ANSI standards call for the joins to be explicitly listed instead of being joined in the where clause.

 

ANSI standard:

Select d.DepartmentName, e.LastName, e.FirstName

From Department d

Join Employee e on

         d.DeptID = e.DeptID

 

Non-ANSI:

Select d.DepartmentName, e.LastName, e.FirstName

From Department d, Employee e

Where d.DeptID = e.DeptID

 

ANSI compliant code runs faster, is easier to read, and is certain to be supported by future versions of SQL.

Use ‘Set NoCount On’ at the beginning of the stored procedure.  If you do not need to know the number of rows returned, you will save network traffic because the result set is not returned to the client.

Transactions should be kept as small as possible.  If you open a transaction, don’t begin any actions that wait for user input as this may lock the table if the user is not able to respond to the application.  Keep in mind that if a transaction is rolled back due to a failure, all the previous changes prior to the failure will not roll back unless ‘SET XACT_ABORT ON’ is specified.

Do not use “SELECT * in your queries.  Always list the required columns.  This results in reduced I/O and better performance. 

Avoid the creation of temporary tables whenever possible.  Temporary tables require more I/O and may lock up more system resources.  Use table variables whenever you would consider the use of temp tables.  If temp tables are absolutely necessary, remember to add indexes to them.  Indexes should be added after data is inserted into the temp table.

When using derived tables, assign an alias name to it as below:

 


Select  Salary

From Employee

Where EmployeeID in

 (Select EmployeeID from Employee
               Where lastname = ‘Smith’) a

 

Try to avoid wildcard characters at the beginning of a word while searching using the LIKE keyword, as that results in an index scan.  The following statement results in an index scan, while the second statement results in an index seek:

  • Select LocationID from Locations Where Specialties Like ‘%pples’
  • Select LocationID from Locations Where Specialties Like ‘A%s’

Avoid searching using not equals operators (<> and NOT) as they result in table and index scans

Avoid dynamic SQL statements inside your stored procedures as much as possible, as they tend to be slower than static SQL because they are not pre-compiled.  If possible, use IF and CASE statements to avoid dynamic SQL.  Another drawback to dynamic SQL is that the users must have access to the underlying objects that are called within the dynamic SQL, as opposed to only needing rights to execute the procedure.  If you must use dynamic SQL, instead of executing it with the ‘EXECUTE’ statement, call the ‘SP_EXECUTESQL’ stored procedure.  This allows the execution plan of the sql to remain in cache.

Use caution when working with NULLs.  NULLs do not evaluate to anything when comparing them, so it is good practice to convert them to another value using ISNULL or COALESCE.

If possible, avoid using cursors.  They use a lot of SQL Server resources and reduce the performance and scalability of applications.  If row by row processing is needed, try to find another method to perform the task.  Some options may be to perform the task at the client, use table variables, derived tables, IF statements, or CASE statements.

Use caution when working with dates.  Remember that SQL Server stores dates as a date/time field.  Whenever a date is passed to SQL Server, if a time is not specified, SQL Server defaults to midnight (00:00:000).  If you must compare dates, add midnight to the date you are passing instead of converting date values from a table.  When performing this type of conversion against a table, a table scan is used.

Avoid using SELECT INTO as it locks system tables and can prevent other users from accessing the data they need while the data is being inserted.  Instead, create the target table first, then execute an INSERT command.

If there is a function that repeatedly gets called throughout your stored procedure, call that function in the beginning of your procedure and store the value inside of a variable.  You can refer to the variable during successive calls.

If there are large tables that you will join to several times within your procedure, try finding a subset of the table that will satisfy the requirements each time it is used in a join.  Take this subset and store it in a table variable, then instead of joining to the original table, join to the table variable.

When inserting values into a table, always list the columns of the table.  This ensures that your procedure will behave as expected and will not break if the columns of the table change.

Try not to use NOT IN because SQL has to use a table scan.  Instead, try to use:

  • EXISTS or NOT EXISTS
  • Use IN
  • Perform a LEFT OUTER JOIN and check for a NULL condition

Use BETWEEN if you have the option to use it in place of IN.  The second query below performs faster than the first:

  • Select * from Customer where customer_number in (1,2,3)
  • Select * from Customer where customer_number between 1 and 3

If your WHERE clause has multiple conditions joined by AND, place the one that is least likely to be false first.  If it is false, then SQL Server will stop evaluating the rest of the expression, saving processing time.

Do not use ORDER BY in your SELECT statements unless it is absolutely necessary because it adds extra overhead.  If possible, sort the data at the client.

Avoid the use of GOTO commands as they have the potential of adding disorder to program flow.

Do not use ‘Select Max(ID)’ when trying to determine the maximum value of an identity column.  Use Scope_Identity or Ident_Current to get the latest value of an identity column.  Scope_Identity maintains the current scope of the query, so that may be the safer option.

Make sure that the @@ERROR global variable is checked after every statement that modifies data.  Make sure that rollbacks are performed prior to inserting rows into an exception table.

Use the SET XACT_ABORT ON to rollback all of the statements within a transaction if there is a transaction with multiple updates and one of them fails.

Table Performance Considerations

Table Performance Considerations

The following performance tips should be followed whenever possible.  These tips are based on industry standard best practices guidelines and should result in increased application performance when followed.

  • Tables should always have a primary key assigned to them. 
  • The primary key does not have to be configured to be a clustered key.
  • Tables should be normalized to the Third Normal Form.  This means that values that are not a part of the primary key should not be included in the table.  Consider the below table:

 

EmployeeID LastName FirstName Department State

1

Smith

Joe

Accounting

PA

2

Williams

Janice

MIS

MI

3

Thompson

Michael

Sales

FL

 

This table is holds employee information.  The Department and State columns are columns that can be used by other tables in the database, and should have their own tables assigned, complete with primary keys:

 

DepartmentId DepartmentName

1

Accounting

2

Sales

3

MIS

 

StateID StateName LongStateName

1

PA

Pennsylvania

2

TX

Texas

3

MI

Michigan

4

FL

Florida

5

CA

California

 

The redesigned employee table will have references to the primary keys of the department and state tables:

 

EmployeeID LastName FirstName DepartmentID StateID

1

Smith

Joe

1

1

2

Williams

Janice

3

3

3

Thompson

Michael

2

4

  •  Tables that contain a reference to another table should have a foreign key constraint to enforce referential integrity. 
  • In the above example, there should be a foreign key constraint between the Employee table and the Department table.
  • It should be noted that data warehouse schemas typically do not follow third normal form, and a de-normalization can help to increase performance in a warehouse environment.
  • The use of triggers on tables should be utilized with caution.  Triggers cause additional processing upon the data manipulation of a table.  This may result in increased I/O, and lock contention, and may degrade the overall performance of the database.  If possible, the triggering action should be included as part of the procedure that produces the appropriate insert/update/delete action.
  • The number of indexes that are created on a given table should be limited.  Whenever data is modified on a table, the indexes associated with those tables need to be maintained.  This causes increased activity on the server and may result in performance issues.  For tables that are relatively static, this is not a major concern.  For tables that are volatile, you should generally not create more than three or four indexes.
  • Indexes should be placed on columns that are highly selective.  Highly selective columns are columns that have a high number of distinct values.  SQL Server would probably ignore a non-clustered index on a gender column because there are only 2 possible values (M/F).
  • If creating covering indexes (composite indexes), choose columns that support a large number of the queries you are trying to improve.
  • Consider using a clustered index when the following occur:
    • The physical ordering supports the range retrievals of important queries, or equality returns many duplicates.
    • The clustered index key is used in the ORDER BY  clause or  GROUP BY  clause of critical queries.
    • The clustered index key is used in important joins to relate the tables – that is, it supports the foreign key
    • The clustered index columns are not updated regularly
  • Consider using a non-clustered index when the following occur:
    • The query is highly selective
    • The column is involved in a foreign key relationship
    • A covering index is required
  • Use user defined datatypes if a particular column repeats in a lot of tables, so that the datatype of that column is consistent across all tables.
  • Always use varchar in place of char.  The use of char always takes up space in the database, even if no values are entered. 
  • Never use nvarchar or nchar as datatypes.  These denote the use of Unicode, and this should be employed only in systems where multiple language sets are expected to be used.
  • Avoid the use of TEXT datatypes as they are extremely inflexible to query against.
  • Try to reduce the number of columns in a table.  The fewer the number of columns in a table, the less space the table will use, since more rows will fit on a single data page, and less I/O overhead will be required to access the table’s data.

Trigger Hell

Hallelujah and Amen Brother!  That is all I can say after reading Thomas LaRocks new article titled “The Trouble with Triggers.”  Here’s a small exerpt from his article:

“Imagine you are developing an application. You spend a few hours putting together some pages into a Web application, and now you want to verify that the data is getting inserted and updated correctly. You enter some data, click the Submit button, and go to the database to confirm that the data is there. Only it isn’t there, at least not your data.

You then spend the next few hours debugging your code. You step through every line imaginable. You call home so your kids can hear the sound of your voice because you know they will be in bed if and when you should ever get home. And then, out of nowhere, you get an idea. Could a trigger be changing the data? Sure enough, you find that a trigger has indeed been at work all along, and your code is working as expected. You’re just not seeing the expected results.

…Why no love for these seemingly benign and inanimate pieces of a database design? Because they’re typically built in so that you don’t even know they’re there. And that results in a waste of time when trying to troubleshoot application code, for example, and I hate wasting time.”

I am currently working on a project where they used triggers for everything and I mean everything.  They used triggers for Audit, Security, referential integrity, business logic, DML, and DDL alike.  To top it all off they often added cursors and scalar functions so there were single row based processing going on everywhere too.  Let me tell you what a performance nightmare it was trying to debug and tune this goliath of a pig database in the beginning.  I have made some tremendous ground on this project and let me tell you what a difference it has made by eliminating the triggers almost entirely.  The database code has actually shrunk dramatically and the performance of the apps has skyrocketed.  And it is all due to use cutting down on all the triggers. 

While triggers still have their strengths to apply certain business rule logic, I think the majority of them in databases currently are  just old school technology that should be forgotten.  They are outdated and most of their use can be ported over into stored proceedures that actually perform the updates themselves so it makes support so much easier.  What are your thoughts about Triggers in your organization?  I would love to hear your comments.

Reduce your code by 75% and an alternative to cursors

Ok this is more just a little trick with Dynamic sql, but it saves you tons of code and you avoid having to use cursors or single row iterations.  You can use it anywhere you would a cursor.  More set based operations…Horray!!!!

For this example we will do backups 

declare @sql nvarchar(max) -- used for dynamic sql
declare @fileDate varchar(20) -- used for file name

select @fileDate = convert(varchar(20),getdate(),112) -- used for file name

set @sql = 'declare @fileName varchar(256)' + char(13)-- we need a declare statement for filename

select @sql = @sql + char(13)+
'set @fileName = ''c:/backup/'+ name + '_' + @fileDate + '.BAK''' + char(13)+
'backup database '+ [name] + ' to disk = @fileName '+ char(13)
from sys.sysdatabases
where name not in ('master','model','msdb','tempdb') 

exec sp_executesql @sql

Here is a snippit from http://www.mssqltips.com/tip.asp?tip=1070 that illustrates a simple cursor alternative

DECLARE @name VARCHAR(50) -- database name
DECLARE @path VARCHAR(256) -- path for backup files
DECLARE @fileName VARCHAR(256) -- filename for backup
DECLARE @fileDate VARCHAR(20) -- used for file name

SET @path = 'C:Backup' 

SELECT @fileDate = CONVERT(VARCHAR(20),GETDATE(),112)

DECLARE db_cursor CURSOR FOR
SELECT name
FROM master.dbo.sysdatabases
WHERE name NOT IN ('master','model','msdb','tempdb') 

OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @name  

WHILE @@FETCH_STATUS = 0
BEGIN
       SET @fileName = @path + @name + '_' + @fileDate + '.BAK'
       BACKUP DATABASE @name TO DISK = @fileName 

       FETCH NEXT FROM db_cursor INTO @name
END  

CLOSE db_cursor
DEALLOCATE db_cursor

When you compare the two and look at the profiler results it is an extreme performance increase as well

dbcc dropcleanbuffers
dbcc freeproccache
go

declare @sql nvarchar(max) -- used for dynamic sql
declare @fileDate varchar(20) -- used for file name

select @fileDate = convert(varchar(20),getdate(),112) -- used for file name

set @sql = 'declare @fileName varchar(256)' + char(13)-- we need a declare statement for filename

select @sql = @sql + char(13)+
'set @fileName = ''c:/backup/'+ name + '_' + @fileDate + '.BAK''' + char(13)+
'backup database '+ [name] + ' to disk = @fileName '+ char(13)
from sys.sysdatabases
where name not in ('master','model','msdb','tempdb') 

print( @sql )
go

dbcc dropcleanbuffers
dbcc freeproccache
go

DECLARE @name VARCHAR(50) -- database name
DECLARE @path VARCHAR(256) -- path for backup files
DECLARE @fileName VARCHAR(256) -- filename for backup
DECLARE @fileDate VARCHAR(20) -- used for file name

SET @path = 'C:Backup' 

SELECT @fileDate = CONVERT(VARCHAR(20),GETDATE(),112)

DECLARE db_cursor CURSOR FOR
SELECT name
FROM master.dbo.sysdatabases
WHERE name NOT IN ('master','model','msdb','tempdb') 

OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @name  

WHILE @@FETCH_STATUS = 0
BEGIN
       SET @fileName = @path + @name + '_' + @fileDate + '.BAK'
      Print ('BACKUP DATABASE @name TO DISK = ' + @fileName) 

       FETCH NEXT FROM db_cursor INTO @name
END  

CLOSE db_cursor
DEALLOCATE db_cursor

When you run this and profile the results you can see the duration, reads and cpu use are dramatically less.

profiler