What the heck is a BTree and why is it important to us?

I was recently discussing indexes with  some developers of mine and in our discussions the subject of a BTree came up.  You don’t really hear too much about these things every day, but they are very important to understand so I thought it would be a great blog post subject to write about today.  So, what the heck is a BTree and why is it so important to us?  Let’s first just focus on the word itself.  The meaning of the “B” in “BTree” has been debated throughout the community. Some have suggested “Balanced”, “Bayer”, “Boeing”, “broad”,  and even “bushy”.  Could it be “branch”?   Hmm.  It really doesn’t matter but the “tree” part is a bit more meaningful.   You see SQL Server stores its data like a tree.  There are different kinds of trees, B-Trees and B+Trees.   A B-Tree has the root node that contains the keys at the top and branches down to the leaf nodes to find all the data.  A B+tree is the opposite. B+Trees are therefore much easier and higher performing when performing full scans as you are already down to the leaf nodes. To do a full scan with a B-Tree you need to do a full tree traversal to find all the data.  B-Trees on the other hand can be faster when you do a seek (looking for a specific piece of data by key) especially when the tree resides in RAM or other non-block storage. Since you can elevate commonly used nodes in the tree there are less comparisons required to get to the data.  An easy way to remember is to think about it in terms of time.   Generally. tables with no indexes take longer to traverse as the data is stored as heaps in B+Trees.  Tables with indexes are typically faster as the data is stored in ordered hierarchies as a B-Tree.

heap takes more time B+   

Indexed is faster so B-

The following illustration I found on the internet somwhere but I do not recall where.  It does make it a lot easier to understand though. 

  btree

Obfuscation and how it can help you today

In order to dive deeper into the subject we first need to develop a firm understanding of the word itself.  Obfuscation literally means “to conceal meaning, or to make more difficult to interpret.”  So, if we are obfuscating data we are masking the true meaning of the data itself.  So how is this then useful and important to database administrators?

In today’s world we store tremendous amounts of sensitive information in our databases.  We store people’s names, addresses, email addresses, account numbers, social security numbers and even bank account numbers.  We have so much identifying information in our databases and without careful preparations to secure this information; we are heading for disaster.  Lucky for us there are many ways we can secure this data.  We can encrypt our databases entirely, delete the sensitive data, or we can buy software to hide the data for us. This, however, doesn’t always solve the problem entirely for us. 

Consider the following situation:  We have a production database environment.  Any modifications to the production database should first be implemented and tested on a development or test database.  In order to ensure the accuracy of this testing, the development database often needs to mimic production as much as possible, in terms of the data contained etc.  The problem is that the developers are now having elevated rights and can see production level data that they otherwise would not have had access to see. This is a severe and intolerable security risk to the organization and its customers. 

In order to avoid these risks we need to disguise the data.  There are ways to perform these tasks without needing any algorithm, encryption or data type change.  We can obfuscate the data just as good for most companies using the following methods.

  • Character Scrambling
  • Repeating Character Masking
  • Numeric Variance
  • Nulling
  • Artificial Data Generation
  • Truncating
  • Encoding
  • Aggregating.

John Magnabosco’s article “Obfuscating your SQL Server Data” describes these methods to obfuscate data in further detail and he also includes a lot of really great example scripts that you can use to help you get started on your own methods.

What to do when a Rollback doesn’t rollback

I recently came across an article by Becky Sweger   that reminded me of a situation I experienced myself a while ago where a transaction did not rollback as planned.  This caused quite a surprise to me when I experienced it myself, so I thought I would write a small blog about it to help others as well.  Say for instance, that you have code that is issuing two or more dml statements  and they are grouped inside of a single transaction.  I will use Becky’s example here for simplicity sake:

 

BEGIN TRANSACTION

INSERT INTO dbo.Table1 (some COLUMNS)
VALUES (some VALUES)

SELECT @error = @@error, @rowcount = @@rowcount

IF @error <> 0
BEGIN ROLLBACK TRANSACTION RAISERROR('error on the first insert!', 16, 1) RETURN END

[DO STUFF]

INSERT INTO dbo.Table2 (some COLUMNS)
VALUES (some VALUES)

SELECT @error = @@error, @rowcount = @@rowcount

IF @error <> 0
BEGIN ROLLBACK TRANSACTION RAISERROR('error on the second insert!', 16, 1) RETURN END

COMMIT TRANSACTION

One would believe the error handling would be sufficient enough and the transaction would rollback when an error occurs.  While most of the time this is true, there are some occasions where it will not.  For instance, if there is a run-time error on the second statement, the first statement statement would not rollback.  Runtime errors could be anything from a renamed object to a devide by zero error etc.  You would be surprised how often this can happen. 

To prevent this from happening to you try to always include a SET XACT_ABORT to ON statement with all your transactions.  This will specify that the current transaction will automatically be rolled back in the event of a run-time error.

Beware, though, that even when XACT_ABORT is ON, not all errors terminate the batch. Here are the exceptions I know of:

  • Errors you raise yourself with RAISERROR.
  • Compilation errors (which normally terminate the scope) do not terminate the batch.
  • Error 266, Transaction count after EXECUTE indicates that a COMMIT or ROLLBACK TRANSACTION statement is missing.

So at a minimum you still need to check @@error after the execution of a stored procedure or a block of dynamic SQL even if you use XACT_ABORT ON.

Upgrading your SQL Servers

 Do you want to go from 2000-2005 or 2000 -2008 or do you want to by-pass 2005 and just go from 2000-2008.  The decision is based mainly on your comfort level and your cost benefit analysis.  Regardless of your choice I am writing this article based on a 2000-2008 upgrade. 

Microsoft has two distinct ways to Update SQL Server : a side-by-side upgrade and an in-place upgrade. Each employs a unique methodology with regard to the upgrade process.  There are advantages and disadvantages associated with each. Some factors to include when determining which of the upgrade solutions to choose include scheduled downtime, cost, complexity, and other factors.  Below will be an overview of the steps involved with each of the methods, as well as the strengths and weaknesses associated with each method.

Side-By-Side Upgrade

The side-by-side upgrade process is a manual process that utilizes two distinct servers – one running SQL Server 2000 (SQL 2k) and the other running SQL Server 2008 (SQL 2k8).  Each of these servers is configured as distinct entities with distinct server names and properties. 

Upgrade Steps

Below are the complete steps associated with a Side-By-Side Upgrade.

  1. Install a separate instance of SQL Server 2008 on a separate server. The legacy instance continues to be available.
  2. Run the SQL Server 2008 Upgrade Advisor (will be explained in detail later in this document) against the legacy instance and remove any upgrade blocker issues.
  3. Stop all update activity to the legacy instance. This may involve disconnecting all users or forcing applications to read-only activity.
  4. Transfer data, packages, or other objects from the legacy instance to the SQL Server 2008 instance.   This is an entirely manual process which introduces the possibility of human error.
  5. Apply supporting objects such as SQL Server Agent jobs, security settings, configuration settings, and DTS packages to the new SQL Server 2008 instance.
  6. Verify that the new instance supports the required applications, using validation scripts and user-acceptance tests.
  7. If the new instance passes validations and acceptance tests, redirect applications and users to the new instance. At this point, the new instance is available and databases are online.
  8. If the new instance does not pass validation or acceptance tests, direct users and applications to the legacy database server.

Advantages of a Side-By-Side Update

Below are some advantages of proceeding with a Side-By-Side Update.

  1. You gain more granular control over what database objects you want to upgrade.  You have the capability of choosing individual databases from a server and upgrading them while leaving the rest intact.
  2. You can run the legacy database server alongside the new server, perform a test migration, and research compatibility issues without disturbing the production system.
  3. The legacy database server remains available during the migration, although it cannot be updated for at least the time it takes to transfer data.
  4. You can roll back to the legacy system if the migration does not pass validation and acceptance tests.

Disadvantages of a Side-By-Side Upgrade

Below are some disadvantages of proceeding with a Side-By-Side upgrade:

  1. A side-by-side upgrade will require new or additional hardware resources.  In production environments, this requirement may make this upgrade method cost prohibitive.
  2. You must redirect applications and users to a new instance and possibly a new server. This will require some recoding in the application and configuration settings.
  3. You must manually transfer data, security, configuration settings, and other supporting objects to the new instance.
  1. The downtime might be longer in some cases because large amounts of data must be transferred from one server to another.

In-Place Upgrade

            The in-place upgrade is an automated process that is handled through Microsoft’s Database Server Upgrade executable.  This executable is a wizard driven application that moves an entire instance of SQL 2k to SQL 2k8.  At the completion of the in-place upgrade, the legacy SQL 2k instance is no longer available, as it is replaced with the new SQL 2k8 instance.

Upgrade Steps

Below are the complete steps associated with an In-Place upgrade.

  1. The SQL Server 2008 Setup prerequisites are installed (Microsoft .NET Framework, SQL Server Native Client, and so on). The legacy instance databases continue to be available.
  2. Setup checks for upgrade blockers, a small set of issues that will completely block an upgrade. If any are found, Setup will warn and exit.  These blockers, if any, are usually identified during the testing phase with the SQL Server 2008 Upgrade Advisor.
  3. Setup installs the required SQL Server 2008 executables and support files
  4. Setup stops the legacy SQL Server service. At this point, the legacy instance is no longer available.
  5. SQL Server 2008 updates the selected component data and objects.
  6. Setup removes the legacy executables and support files as well as the legacy tools. The new SQL Server 2008 instance is now fully available.
  7. An in-place upgrade is easier and faster than a side-by-side upgrade.
  8. It is mostly an automated process.
  9. The instance will be offline for only the minimum amount of time.
  10. The resulting upgraded instance has the same server name and configuration settings as the original.
  11. Applications continue to connect to the same instance name without any modification to the code base.
  12. No additional hardware is required because only the one instance is involved.
  13. You must upgrade an entire instance or a major SQL Server component. 
  14. You cannot directly upgrade a single database.
  15. There will be some disk use growth during the metadata upgrade process.  This would require the disk to have room for growth on each database log file,  as well as the tempdb log and data files.
  16. You must inspect the entire instance for backward compatibility issues.  This is to ensure that little used functionality continues to behave as expected.
  17. You cannot run the newly upgraded version alongside the legacy version for comparison.
  18. Rollback of upgraded data and the upgraded instance is complex.  Because the legacy version of the system is unavailable, the only way to rollback from an in-place upgrade is to re-install the legacy SQL Server and then manually restore each of the objects.

Advantages of an In-Place Upgrade

  1. An in-place upgrade is easier and faster than a side-by-side upgrade.
  2. It is mostly an automated process.
  3. The instance will be offline for only the minimum amount of time.
  4. The resulting upgraded instance has the same server name and configuration settings as the original.
  5. Applications continue to connect to the same instance name without any modification to the code base.
  6. No additional hardware is required because only the one instance is involved.

Disadvantages of an In-Place Upgrade

  1. You must upgrade an entire instance or a major SQL Server component. 
  2. You cannot directly upgrade a single database.
  3. There will be some disk use growth during the metadata upgrade process.  This would require the disk to have room for growth on each database log file,  as well as the tempdb log and data files.
  4. You must inspect the entire instance for backward compatibility issues.  This is to ensure that little used functionality continues to behave as expected.
  5. You cannot run the newly upgraded version alongside the legacy version for comparison.
  6. Rollback of upgraded data and the upgraded instance is complex.  Because the legacy version of the system is unavailable, the only way to rollback from an in-place upgrade is to re-install the legacy SQL Server and then manually restore each of the objects.

Upgrade Tools

            There are three primary tools that are used to plan, test, and verify the upgrade.  The tools that will be mentioned and described in this section are Microsoft’s SQL Server 2008 Upgrade Advisor,  Scalability Expert’s SQL Server 2008 Application Upgrade Advisor and DTS xChange.  Below is a description of each of the tools to be used.  Later, it will be explained how each of these tools will be used for planning and verifying the upgrade.

SQL Server 2008 Upgrade Advisor

            SQL Server 2008 Upgrade Advisor is a tool developed by Microsoft to help identify any issues that may arise during the process of upgrading from SQL 2k to SQL 2k8.  The Upgrade Advisor will highlight issues that might prevent a successful upgrade, as well as issues that might introduce backwards compatibility issues.  Upgrade Advisor analyzes your legacy instances and produces reports detailing upgrade issues by SQL Server component. The resulting reports show detected issues and also guidance on how to fix the issues or work around them.  These reports are stored on disk, so they can be retrieved and reviewed multiple times.  Upgrade Advisor executes read only operations against the server and is CPU intensive, so it is best to run off of a remote machine.  When you perform the analysis with Upgrade Advisor, you have the option to analyze any or all of the SQL Server components: SQL Server Database Engine, Analysis Services, Reporting Services, Notification Services, and Data Transformation Services.  When analyzing the Database Engine, you also have the option to select which of the databases you wish to include in the analysis.

            Once the analysis is performed on each of these components, the Upgrade Advisor prepares a report of its findings.  The report is divided by component, and highlights issues that need to be addressed.  The report classifies these issues into issues that need to be addressed prior to the upgrade, issues that should be addressed after the upgrade, and issues that should be addressed at any time.  Clicking on any issues that are highlighted will bring up suggestions by Microsoft regarding what steps may be taken to correct the issues.  Because the analysis can be run multiple times, we would run the analysis until the report only shows issues to be addressed after the upgrade.  We will not upgrade until all of the issues identified as needing to be addressed prior to the upgrade, or addressed at anytime are completed.

SQL Server Application Upgrade Advisor

            The Application Upgrade Advisor is a third party application that was developed by Scalability Experts.  Like the Database Upgrade Advisor, the Application Upgrade Advisor performs an analysis and generates reports based on that analysis.  While the Database Upgrade Advisor performs a validation to see if the database would successfully update, the Application Upgrade Advisor is used to validate whether the applications would be able to successfully operate under SQL Server 2008.  To successfully utilize the Application Upgrade Advisor, you have to incorporate results generated from the SQL Server Database Upgrade Advisor.  Listed below are the steps to run the Application Upgrade Advisor.

Running the Application Upgrade Advisor

  1. Create a Test Environment – Ensure that the environment used for testing is configured as closely as possible to the specifications of the production environment.
  2. Capture a Playback – A playback consists of a backup of all system and user databases and a trace that can be replayed.  To capture the trace, the front end GUI that utilizes the database should be used.  If there is an automated script that is used for regression testing, this should be used to guarantee that a comprehensive test is performed.
  3. Setup Playback Baseline System – Restore the backups of all of the databases that were taken in Step 2.
  4. Run SQL Server Database Upgrade Advisor – Run the Upgrade Advisor as described above and fix any issues that are identified to be fixed.
  5. Replay Trace on SQL Server 2000 – After running the Database Upgrade Advisor, replay the trace created in step 2.  The replayed trace result becomes the baseline from SQL 2k.  This baseline will be compared with the same trace, replayed on SQL 2k8 to help us understand any changes to expect after upgrading the production servers.
  6. Setup Playback Test System – During this step, the backups from step 2 are restored.
  7. Upgrade to SQL Server 2k8 – With the databases restored, upgrade the test server to the latest version of SQL Server.
  8. Replace Trace on SQL Server 2k8 – To evaluate the success of the upgrade, replay the trace against the SQL Server 2k8 database.
  9. Compare Trace Files – At this stage, there will be results of the trace from SQL 2k and the results of the trace from SQL 2k8.  Application Upgrade Advisor will evaluate both trace file results and generate a report comparing the two.
  10. Analyze Results – Use the Application Upgrade viewer to evaluate the differences between the results of the two traces.

DTS xChange

Although SQL Server 2008 Integration Services may be considered the fourth iteration of an ETL tool in SQL Server, it is dramatically different from SQL Server 2000 Data Transformation Services (DTS). The architecture of SSIS has changed drastically to support an in-memory ETL method that can efficiently support a load of millions of rows. To take advantage of this new architecture, organizations must migrate their DTS packages into SSIS and apply new best practices to the migrated packages.

SQL Server 2008 comes with the DTS Package Migration Wizard, a built-in means of migrating simple DTS packages to SSIS. However, this method will not meet the needs of all organizations. For organizations in need of a more robust solution that can handle thousands of packages with little administrator intervention, DTS xChange is an enterprise solution, offered by a Microsoft partner Pragmatic Works, that migrates DTS packages to Integration Services while applying a series of best practices rules to the packages.

DTS xChange is broken into three components:

  1. Profile: DTS xChange Profiler helps organizations estimate their migration project in hours and dollar cost whether they choose to use an automation tool or not.
  2. Convert: DTS xChange migrates packages, applying rules to each DTS package as it migrates them to enforce best practices.
  3. Monitor: The SSIS Performance Warehouse is a software development kit (SDK) to help organizations get the most out of their new Integration Services environment. It contains a series of reports and a data warehouse to monitor administrators’ Integration Services package execution.

Testing Process

            We will validate that our applications in their current state will continue to operate under SQL Server 2k8.  Below will be a description of the requirements of the testing process, as well as a detailed description of each of the phases.

Testing Requirements

            During testing, we will validate our applications ability to run successfully against SQL Server 2008.  This will be done primarily using SQL Server Application Upgrade Advisor.

  1. Restore backup of legacy databases onto server to be upgraded.
  2. I will ensure that restore operations are successful and connectivity has been established.
  3. QA group verifies that restored databases behave as expected.
  4. Proceed with detailed steps that were highlighted under description of SQL Server Application Upgrade Advisor.  This description contains instructions for capturing trace files, running the Database Upgrade Advisor, upgrading the database, and re-running the trace files.
    1. During the running of the Database Upgrade Advisor, a list of issues that need to be addressed will be created.  It will be the responsibility of the DBA group and the Developer groups to address these issues.  Note that this may potentially be a lengthy process as there may be a number of server and/or database related issues.  The Database Upgrade Advisor is designed to be run multiple times, and each time should theoretically produce less issues.  These issues and their resolutions would be saved.  If possible, the solutions to the issues would be developed in a repeatable format so that the solutions may be reused when they are encountered in the production environment.
  5. Compare the results of the trace files that were run against the SQL 2k and SQL 2k8 versions of the database.
  6. DBA and QA groups document any known issues.
  7. DBA and Developer groups address any discrepancies between the results of the two trace files.  Once the discrepancies are corrected, the process is repeated until the results of the two trace files are identical.
  8. When the results of the trace files are identical, each group signs off that the application is verified for SQL Server 2008.

Searching for the “Qure” to your Performance Headaches?

Today I discovered a new product called “Qure“.  Qure is a SQL Server performance tuning tool and it requires only two inputs:  
  1. A trace of your production server activity
  2. A current backup of your production database  

I took a look at their tutorial video and it looks quite impressive.  Their documentation states that they focus on 4 areas for SQL Server Performance.  They are as follows:  

1) Smart Indexing 

Qure recommends creating, removing and modifying indexes for best overall performance.
Qure evaluates all possible indexes for a query and then uses heuristics to combine the indexes for all queries, guaranteeing that:

  • Redundant indexes are eliminated
  • Indexes have the highest performance impact for as many queries as possible
  • The creation and dropping of indexes produces a significant, database-wide performance improvement

My Take on it: The product page states that they use a patented algorithm that suggests code modifications and places where indexes should be changed. I would like to review and compare, how much better it is to the Index Tuning Wizard.

2) Query Syntax Optimization –

Qure evaluates numerous parameters for each query and generates pinpoint recommendations for query rewrites, including correctional scripts!

Query rewrite example 

The following query retrieves all orders made in 1997 and uses the YEAR function, which may prevent efficient index use:
This query can be rewritten without the YEAR function, potentially enabling efficient index seeking. In this case, Qure provides the following alternative syntax recommendation:
 
SELECT *
FROM Orders AS o
WHERE YEAR(OrderDate) = 1997
 
SELECT *
FROM Orders AS o
WHERE OrderDate >= (’1997-01-01T00:00:00′)
AND
OrderDate < (’1998-01-01T00:00:00′)
 

My Take on it: This is just cool, no?  It makes changes to the code and then gives you the actual performance gains and time diferences?  That’s just amazing.

3) Schema Optimization

Qure evaluates the scehma structure and can recommendt changing column data-types, adding or removing constraints and correcting potential design anomalies.
Qure performs a comprehensive statistical analysis of the data and its usage patterns and may suggest modifying the schema to enhance performance.

My Take on it: This feature is like combining data profiling capabilities of SSIS with Database Tuning Advisor. This really makes me curious of exploring, how effective this feature is. If it works to my expectation, it can revolutionize database modeling.

 
4) Additional Optimizations

Miscellaneous performance tuning recommendations, which may have a huge impact on performance
For example:

  • Coding tecnhiques and best practices violations
  • Maintenance optimization
  • Hint usage
  • Potentially erroneous queries
  • Database and server settings

My Take on it: I would like to compare performance of this feature with that of Best Practices Analyzer tool. 

Conclusion

Overall this seems to be quite a powerful tool.  The cost of this product seems a bit higher than other vendors products that I have seen in the past so I’m not sure if there is any justification. I will review in depth in the coming days and I will post a followup so I can give a more diffinitive asnwer.

Graphical Execution Plan

The execution plan is the method by which SQL Server accesses the data that resides in the underlying tables.  It examines your query then determines the most efficient way to retrieve the data.  Execution plans are compiled and stored with stored procedures, so SQL Server will know the best way to access data when a given stored procedure is called.  SQL Server provides a GUI that displays how the execution plan is used to retrieve the data.  When developing procedures, you should utilize the graphical execution plan to assist in writing the most efficient stored procedures.  Please keep the following tips in mind when examining the graphical execution plan:

  • If you would like to see the execution plan, but not execute the query, choose “Display Estimated Execution Plan”.  This will show you an estimate of the actual execution plan.  Note that this will not work if temp tables are used because the temp tables cannot be parsed until runtime.  If you use a table variable, however, the estimated execution plan will return successfully.
  • Graphical Execution plans are read from right to left, and the arrows on each step correspond to the number of rows returned for that step.  The fatter the arrow, the more rows returned.  The execution plan eventually merges into one last step on the left side of the plan.  Thick arrows should appear on the right of the execution plan and get smaller as the plan moves left.
  • Execution plans are broken down into smaller substeps.  Each substep is displayed as a percentage of runtime versus the query as a whole.  For a long running query, identify the substep with the highest percentage and begin tuning at that step.
  • Each action within the substep is displayed as a percentage of runtime versus the substep as a whole.  For long running queries, look to tune the long running actions.
  • If you move your cursor over any of the displayed actions, a pop up window appears with more detailed information on that particular step.
  • If you see the below in an execution plan, look to modify the query so that these actions are eliminated or reduced.  It’s not always possible to avoid them, but the more they can be prevented, the better performance will be:
    • Index or table scans: May indicate a need for better or additional indexes
    • Bookmark Lookups:  Consider changing the current clustered index, consider using a covering index, reduce the number of columns in the SELECT statement
    • Filter: Remove any functions in the WHERE clause , don’t include Views in your Transact-SQL code,  may need additional indexes
    • Sort: Does the data really need to be sorted?  Can sorting be done at the client more efficiently?
  •  While I/O and CPU cost don’t directly refer to distinct measurements (they are internal SQL counters), higher I/O and CPU cost values in the execution plan utilize more resources than lower values
  •  Another cause of Bookmark lookups are the use of “SELECT *”
  •  Sometimes SQL Server will need to create a temporary worktable in the tempdb database.  Anytime a worktable is used, performance is hurt because of the additional I/O for maintaining the worktable. 
  • The graphical execution plan will warn you that a worktable is being used when the following actions are displayed:
    • Index Spool
    • Row Count Spool
    • Table Spool

Naming Conventions

There are many naming conventions out there like camelCase, PascalCase, under_scores etc.  Each organization has their own preferences.  As long as you have one stick to it as it will ensure cleaner code and easier management.  Here is an example of one that I have used in the past:

Stored Procedures

Stored procedures should never be prefixed with “sp_”.  This prefix is reserved for system stored procedures.   Instead, a prefix of “spp_” should be used, followed by the namespace of the procedure, followed by it’s action, and potentially, it’s object.  Each component of this naming convention may not always apply, but every effort should be made to adhere to this naming standard.  The syntax should be “spp_<Namespace>_<Action>_<Object>”.  Examples are below:

  • spp_App_Insert_Employees
  • spp_ToolA_View_StaffNew
  • spp_Admin_ArchiveWS

User Defined Functions

User defined functions accept one or more parameters and can be used to return single values or table values.  User defined functions can provide programming flexibility by allowing in-line calls within a SELECT statement.  Naming for user defined functions should have a “fn_” prefix, followed by the namespace and it’s action.  The syntax should be “fn_<Namespace>_<Action>”.  Examples are below:

  • fn_App_ChainLayawayReturnsByDate
  • fn_App_ChainSalesByDate
  • fn_App_SalesByDate

 Views

               Views are virtual tables that are compiled at runtime.  The data associated with views are not stored in the view, but rather, are stored in the base tables.  In certain instances, it may be possible to modify data through a view. Naming for a view should have a “vw_” prefix, followed by the namespace and the namespace and it’s results.  The syntax should be “vw_<Namespace>_<Result>”.  Examples are below:

  • vw_App_SODetails
  • vw_App_UploadRMDetail
  • vw_App_DDSDetail

Triggers

               Triggers are pieces of code that are attached to tables and potentially get executed based on the manipulation of the table.  Naming for a trigger should have a “tr_” prefix, followed by the triggering action, and the table name.  The syntax should be “tr_<Ins,Upd, or Del>_<Table>”.  Examples are below:

  • tr_Ins_SODetailStore
  • tr_Upd_SODetailStore
  • tr_Del_SODetailStore

Indexes

               Indexes are database objects that are used to help the SQL Server query engine locate the data in the most efficient manner.  Indexes are attached to tables, and are defined by the columns within those tables.  Naming for an index should have an “IX_” prefix, followed by the table name, the label ‘Covering’ (if applicable), and columns.  If a covering index is used, the column should be replaced with the stored procedure name that required the stored procedure.  The syntax should be “IX_<TableName>_<Covering(Optional)>_<Column(s)>”.  Examples are below:

  • IX_SODetailStore_ChangeTypeID
  • IX_SODetailStore_Covering_CCAItemAvailability
  • IX_WSException_SODetailIDActualItemID

 Primary Keys

               Primary keys are unique indexes that are placed on a table that guarantees the uniqueness of a row as determined by the columns used to make up the key.  A table can have only one primary key.  A primary key can be created as a clustered or non-clustered key.  Naming for a primary key should have a “PK_” prefix, followed by the table name.  The syntax should be “PK_<TableName>”.  Examples are below:

  • PK_SOStoreDetail
  • PK_WSException
  • PK_Items

 Foreign Keys

               Foreign keys are constraints that are used to maintain referential integrity throughout the database.  Foreign keys should exist between any tables that may be joined in SELECT operations.  Naming for a foreign key should have an “FK_” prefix, followed by the target table name, followed by the source table name.  The syntax should be “FK_<TargetTable>_<SourceTable>”.  Examples are below:

  • FK_SODetailStore_EventHeader
  • FK_SODetailStore_EventMembers
  • FK_SODetailStore_StatusHierarchy

 Constraints

 There are two remaining constraints – Unique Constraints and Default Constraints.  Unique constraints enforce that each column contains a maximum of one distinct value.  Default constraints insert a value into a column when a value is not explicitly specified.  Naming for unique constraints should have a “UQ_” prefix, followed by the table name, followed by the column name.  Naming for default constraints should have a “DF_” prefix, followed by the table name, followed by the column name.  Note:  Default constraints should not be created using Enterprise Manager, as EM creates a cryptic name for default constraints that do not conform to our standards.  The syntax for a unique constraint should be “UQ_<TableName>_<ColumnName(s)>”.  The syntax for column defaults should be “DF_<TableName>_<ColumnName>”.  Examples of each are below:

  • UQ_ActivityLogHO_HOLogID        
  • UQ_EventOwners_LastNameFirstname
  • DF_Employees_IsActive
  • DF_Employees_CreateDate

Stored Procedure Considerations

All SQL should be written using ANSI-SQL standards.  ANSI standards call for the joins to be explicitly listed instead of being joined in the where clause.

 

ANSI standard:

Select d.DepartmentName, e.LastName, e.FirstName

From Department d

Join Employee e on

         d.DeptID = e.DeptID

 

Non-ANSI:

Select d.DepartmentName, e.LastName, e.FirstName

From Department d, Employee e

Where d.DeptID = e.DeptID

 

ANSI compliant code runs faster, is easier to read, and is certain to be supported by future versions of SQL.

Use ‘Set NoCount On’ at the beginning of the stored procedure.  If you do not need to know the number of rows returned, you will save network traffic because the result set is not returned to the client.

Transactions should be kept as small as possible.  If you open a transaction, don’t begin any actions that wait for user input as this may lock the table if the user is not able to respond to the application.  Keep in mind that if a transaction is rolled back due to a failure, all the previous changes prior to the failure will not roll back unless ‘SET XACT_ABORT ON’ is specified.

Do not use “SELECT * in your queries.  Always list the required columns.  This results in reduced I/O and better performance. 

Avoid the creation of temporary tables whenever possible.  Temporary tables require more I/O and may lock up more system resources.  Use table variables whenever you would consider the use of temp tables.  If temp tables are absolutely necessary, remember to add indexes to them.  Indexes should be added after data is inserted into the temp table.

When using derived tables, assign an alias name to it as below:

 


Select  Salary

From Employee

Where EmployeeID in

 (Select EmployeeID from Employee
               Where lastname = ‘Smith’) a

 

Try to avoid wildcard characters at the beginning of a word while searching using the LIKE keyword, as that results in an index scan.  The following statement results in an index scan, while the second statement results in an index seek:

  • Select LocationID from Locations Where Specialties Like ‘%pples’
  • Select LocationID from Locations Where Specialties Like ‘A%s’

Avoid searching using not equals operators (<> and NOT) as they result in table and index scans

Avoid dynamic SQL statements inside your stored procedures as much as possible, as they tend to be slower than static SQL because they are not pre-compiled.  If possible, use IF and CASE statements to avoid dynamic SQL.  Another drawback to dynamic SQL is that the users must have access to the underlying objects that are called within the dynamic SQL, as opposed to only needing rights to execute the procedure.  If you must use dynamic SQL, instead of executing it with the ‘EXECUTE’ statement, call the ‘SP_EXECUTESQL’ stored procedure.  This allows the execution plan of the sql to remain in cache.

Use caution when working with NULLs.  NULLs do not evaluate to anything when comparing them, so it is good practice to convert them to another value using ISNULL or COALESCE.

If possible, avoid using cursors.  They use a lot of SQL Server resources and reduce the performance and scalability of applications.  If row by row processing is needed, try to find another method to perform the task.  Some options may be to perform the task at the client, use table variables, derived tables, IF statements, or CASE statements.

Use caution when working with dates.  Remember that SQL Server stores dates as a date/time field.  Whenever a date is passed to SQL Server, if a time is not specified, SQL Server defaults to midnight (00:00:000).  If you must compare dates, add midnight to the date you are passing instead of converting date values from a table.  When performing this type of conversion against a table, a table scan is used.

Avoid using SELECT INTO as it locks system tables and can prevent other users from accessing the data they need while the data is being inserted.  Instead, create the target table first, then execute an INSERT command.

If there is a function that repeatedly gets called throughout your stored procedure, call that function in the beginning of your procedure and store the value inside of a variable.  You can refer to the variable during successive calls.

If there are large tables that you will join to several times within your procedure, try finding a subset of the table that will satisfy the requirements each time it is used in a join.  Take this subset and store it in a table variable, then instead of joining to the original table, join to the table variable.

When inserting values into a table, always list the columns of the table.  This ensures that your procedure will behave as expected and will not break if the columns of the table change.

Try not to use NOT IN because SQL has to use a table scan.  Instead, try to use:

  • EXISTS or NOT EXISTS
  • Use IN
  • Perform a LEFT OUTER JOIN and check for a NULL condition

Use BETWEEN if you have the option to use it in place of IN.  The second query below performs faster than the first:

  • Select * from Customer where customer_number in (1,2,3)
  • Select * from Customer where customer_number between 1 and 3

If your WHERE clause has multiple conditions joined by AND, place the one that is least likely to be false first.  If it is false, then SQL Server will stop evaluating the rest of the expression, saving processing time.

Do not use ORDER BY in your SELECT statements unless it is absolutely necessary because it adds extra overhead.  If possible, sort the data at the client.

Avoid the use of GOTO commands as they have the potential of adding disorder to program flow.

Do not use ‘Select Max(ID)’ when trying to determine the maximum value of an identity column.  Use Scope_Identity or Ident_Current to get the latest value of an identity column.  Scope_Identity maintains the current scope of the query, so that may be the safer option.

Make sure that the @@ERROR global variable is checked after every statement that modifies data.  Make sure that rollbacks are performed prior to inserting rows into an exception table.

Use the SET XACT_ABORT ON to rollback all of the statements within a transaction if there is a transaction with multiple updates and one of them fails.

Table Performance Considerations

Table Performance Considerations

The following performance tips should be followed whenever possible.  These tips are based on industry standard best practices guidelines and should result in increased application performance when followed.

  • Tables should always have a primary key assigned to them. 
  • The primary key does not have to be configured to be a clustered key.
  • Tables should be normalized to the Third Normal Form.  This means that values that are not a part of the primary key should not be included in the table.  Consider the below table:

 

EmployeeID LastName FirstName Department State

1

Smith

Joe

Accounting

PA

2

Williams

Janice

MIS

MI

3

Thompson

Michael

Sales

FL

 

This table is holds employee information.  The Department and State columns are columns that can be used by other tables in the database, and should have their own tables assigned, complete with primary keys:

 

DepartmentId DepartmentName

1

Accounting

2

Sales

3

MIS

 

StateID StateName LongStateName

1

PA

Pennsylvania

2

TX

Texas

3

MI

Michigan

4

FL

Florida

5

CA

California

 

The redesigned employee table will have references to the primary keys of the department and state tables:

 

EmployeeID LastName FirstName DepartmentID StateID

1

Smith

Joe

1

1

2

Williams

Janice

3

3

3

Thompson

Michael

2

4

  •  Tables that contain a reference to another table should have a foreign key constraint to enforce referential integrity. 
  • In the above example, there should be a foreign key constraint between the Employee table and the Department table.
  • It should be noted that data warehouse schemas typically do not follow third normal form, and a de-normalization can help to increase performance in a warehouse environment.
  • The use of triggers on tables should be utilized with caution.  Triggers cause additional processing upon the data manipulation of a table.  This may result in increased I/O, and lock contention, and may degrade the overall performance of the database.  If possible, the triggering action should be included as part of the procedure that produces the appropriate insert/update/delete action.
  • The number of indexes that are created on a given table should be limited.  Whenever data is modified on a table, the indexes associated with those tables need to be maintained.  This causes increased activity on the server and may result in performance issues.  For tables that are relatively static, this is not a major concern.  For tables that are volatile, you should generally not create more than three or four indexes.
  • Indexes should be placed on columns that are highly selective.  Highly selective columns are columns that have a high number of distinct values.  SQL Server would probably ignore a non-clustered index on a gender column because there are only 2 possible values (M/F).
  • If creating covering indexes (composite indexes), choose columns that support a large number of the queries you are trying to improve.
  • Consider using a clustered index when the following occur:
    • The physical ordering supports the range retrievals of important queries, or equality returns many duplicates.
    • The clustered index key is used in the ORDER BY  clause or  GROUP BY  clause of critical queries.
    • The clustered index key is used in important joins to relate the tables – that is, it supports the foreign key
    • The clustered index columns are not updated regularly
  • Consider using a non-clustered index when the following occur:
    • The query is highly selective
    • The column is involved in a foreign key relationship
    • A covering index is required
  • Use user defined datatypes if a particular column repeats in a lot of tables, so that the datatype of that column is consistent across all tables.
  • Always use varchar in place of char.  The use of char always takes up space in the database, even if no values are entered. 
  • Never use nvarchar or nchar as datatypes.  These denote the use of Unicode, and this should be employed only in systems where multiple language sets are expected to be used.
  • Avoid the use of TEXT datatypes as they are extremely inflexible to query against.
  • Try to reduce the number of columns in a table.  The fewer the number of columns in a table, the less space the table will use, since more rows will fit on a single data page, and less I/O overhead will be required to access the table’s data.

Auto-Magically visualize your Perfmon and Profiler results

Troubleshooting SQL Server issues usually involves setting up a perfmon and profiler trace and then trying to identify the issues based on particular times and high reads or writes etc.  These have been industry standard ways to diagnose issues, but did you know you can integrate the two to get the best of bost worlds in a pretty little interface?  In this post I would like to go over how you can integrate the two together and dynamically see what is causing the various spikes to occur.

 

Setting up Perfmon

Open up PerfMon by going to Control Panel -> Administrative tools -> Performance.  Next go to the Performance Logs and Alerts section. Right click Counter Logs and select New Log Settings. Type a name for the log. 

 

 Add all the counters you want to monitor with the Add Counters button.  I typically like to include the following:

MemoryAvailable MBytes
 MemoryCache Bytes
 MemoryPages/sec
 MemoryPool Paged Resident Bytes
 MemorySystem Cache Resident Bytes
 MemorySystem Code Resident Bytes
 MemorySystem Driver Resident Bytes
 PhysicalDisk(_Total) Disk Write Time
 PhysicalDisk(_Total)Avg. Disk Bytes/Read
 PhysicalDisk(_Total)Avg. Disk Bytes/Write
 PhysicalDisk(_Total)Avg. Disk Queue Length
 PhysicalDisk(_Total)Avg. Disk Read Queue Length
 PhysicalDisk(_Total)Avg. Disk Write Queue Length
 PhysicalDisk(_Total)Disk Read Bytes/sec
 PhysicalDisk(_Total)Disk Write Bytes/sec
 Processor(_Total)Access MethodsCount Lob Readahead
 SQLServer:Access MethodsFull Scans/sec
 SQLServer:Access MethodsPage Splits/sec
 SQLServer:Access MethodsWorktables Created/sec
 SQLServer:Buffer ManagerBuffer cache hit ratio
 SQLServer:Cursor Manager by Type(_Total)Cursor memory usage
 SQLServer:Cursor Manager by Type(_Total)Cursor worktable usage
 SQLServer:Databases(_Total)Active Transactions
 SQLServer:Databases(_Total)Log Flushes/sec
 SQLServer:Databases(_Total)Percent Log Used
 SQLServer:Databases(_Total)Transactions/sec
 SQLServer:General StatisticsActive Temp Tables
 SQLServer:General StatisticsProcesses blocked
 SQLServer:General StatisticsUser Connections
 SQLServer:LatchesLatch Waits/sec
 SQLServer:LatchesTotal Latch Wait Time (ms)
 SQLServer:Locks(_Total)Lock Timeouts/sec
 SQLServer:Locks(_Total)Lock Wait Time (ms)
 SQLServer:Locks(_Total)Lock Waits/sec
 SQLServer:Locks(_Total)Number of Deadlocks/sec
 SQLServer:Memory ManagerConnection Memory (KB)
 SQLServer:Memory ManagerLock Memory (KB)
 SQLServer:Memory ManagerSQL Cache Memory (KB)
 SQLServer:Memory ManagerTarget Server Memory (KB)
 SQLServer:Memory ManagerTotal Server Memory (KB)
 SQLServer:SQL StatisticsSQL Compilations/sec
 SQLServer:SQL StatisticsSQL Re-Compilations/sec

 

Go to the Log Files tab and you can choose the log file type I typically use the Binary File but you can use any means you would like.  Hit Configure button and specify the filename and location

 

On the third tab you can schedule the start time etc.

When you click OK your log starts gathering data immediately. The status is shown with the green icon in the list next to log name in the list.

Set up our SQL Profiler Trace

Next we start our profiler trace.  You can do that by clicking  Start–> Programs–>Microsoft SQL Server–> Performance Tools–>SQL Profiler.

When it opens up go to File -> New Trace. Choose the server you wish to monitor. A window to set the Trace properties will open. 

In the General tab check the save to file box and select the file and location.

For the events you can choose all the ones you are interested in.  There are so many things you can profile for but I typically use just 4 and I include all their columns (You must make sure to include both start and end times or else you will not be able to correlate):

 

Once you have all your data collected you need to stop both the profiler trace and the perfmon trace. 

 Now its time to merge the two together.

To do so make sure you have closed all your profiler windows and then reopen your profiler trace.  (If you do not close Profiler and reload the trace some options will be greyed out.)

Next click File–>Import Performance Data–> and then select your Perfmon trace file.

 

 

Once you do so the two will automagically be linked together and you can select the counters you want to include.

 

 

Conclusion

When you integrate the PerfMon stats with the SQL Profiler Trace data together you can easilly determine the exact SQL Statements that were running when you experiences certain bottlenecks.  Try it out yourself and see how helpful it is.