Wednesday 13 September 2017

Introduction to Locking in SQL Server

Locking is a major part of every RDBMS and is important to understand. It is the functionality without which a multi-user database could not work. The key thing about locking is that it is in essence a logical, not a physical, problem. This means that no amount of hardware will solve it for you. Yes, faster hardware might cut execution times, but that is only a virtual fix. In a heavily concurrent multi-user environment any logical problems will surface sooner or later.

Lock modes

All examples are run under the default READ COMMITTED isolation level. The locks that get taken differ between isolation levels, but these examples are only meant to demonstrate each lock mode. Here's a short explanation of the three columns from sys.dm_tran_locks used in the examples:

resource_type
This tells us what resource in the database the locks are being taken on. It can be one of these values:
DATABASE, FILE, OBJECT, PAGE, KEY, EXTENT, RID, APPLICATION, METADATA, HOBT, ALLOCATION_UNIT.

request_mode
This tells us the mode of our lock.

resource_description
This shows a brief description of the resource. It usually holds the id of the page, object, file, row, etc. It isn't populated for every type of lock.

The filter resource_type <> 'DATABASE' just means that we don't want to see the general shared locks that are always taken on databases. All shown outputs are from the sys.dm_tran_locks dynamic management view. In some examples the output is truncated to display only the locks relevant to the example; for the full output you can run the queries yourself.

Shared locks (S)

Shared locks are held on data being read under the pessimistic concurrency model. While a shared lock is held, other transactions can read but can't modify the locked data. The shared lock is released as soon as the data has been read, unless the transaction is run with a locking hint (READCOMMITTED, READCOMMITTEDLOCK) or under an isolation level equal to or more restrictive than REPEATABLE READ. In the example you can't normally see the shared locks, because they are taken only for the duration of the SELECT statement and are already released by the time we select from sys.dm_tran_locks. That is why the WITH (HOLDLOCK) hint is added, so the locks are held until the end of the transaction and can be observed.

BEGIN TRAN

USE AdventureWorks

SELECT * FROM Person.Address WITH (HOLDLOCK)
WHERE AddressId = 2
 
SELECT resource_type, request_mode, resource_description
FROM   sys.dm_tran_locks
WHERE  resource_type <> 'DATABASE'

ROLLBACK


Update locks (U)

Update locks are a mix of shared and exclusive locks. When a DML statement is executed, SQL Server first has to find the data it wants to modify, so to avoid lock conversion deadlocks an update lock is used. Only one update lock can be held on a piece of data at a time, similar to an exclusive lock, but the difference is that an update lock by itself doesn't allow the data to be modified. It has to be converted to an exclusive lock before the modification takes place. You can also force an update lock with the UPDLOCK hint:
BEGIN TRAN

USE AdventureWorks

SELECT * FROM Person.Address WITH (UPDLOCK)
WHERE AddressId < 2

SELECT resource_type, request_mode, resource_description
FROM   sys.dm_tran_locks
WHERE  resource_type <> 'DATABASE'

ROLLBACK



Exclusive locks (X)

Exclusive locks are used to lock data being modified by one transaction, thus preventing modifications by other concurrent transactions. You can read data held under an exclusive lock only by specifying the NOLOCK hint or by using the READ UNCOMMITTED isolation level. Because DML statements first need to read the data they want to modify, you'll find exclusive locks accompanied by the shared or update locks taken to locate that same data.
BEGIN TRAN

USE AdventureWorks

UPDATE Person.Address 
SET AddressLine2 = 'Test Address 2'
WHERE AddressId = 5

SELECT resource_type, request_mode, resource_description
FROM   sys.dm_tran_locks
WHERE  resource_type <> 'DATABASE'

ROLLBACK
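
To illustrate the read behavior mentioned above, here is a minimal sketch you could run from a second session while the update transaction is still open (before the ROLLBACK). The plain SELECT would block on the exclusive lock; the NOLOCK version returns immediately but performs a dirty read of the uncommitted value. This is just an illustration, not a recommendation for production code.

-- Run in a SECOND session while the transaction above is still open
USE AdventureWorks

-- This would block, waiting on the exclusive lock held by the other session:
-- SELECT AddressLine2 FROM Person.Address WHERE AddressId = 5

-- This returns immediately, but reads the uncommitted (dirty) value:
SELECT AddressLine2
FROM   Person.Address WITH (NOLOCK)
WHERE  AddressId = 5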


Intent locks (I)

Intent locks are a means by which a transaction notifies other transactions that it intends to lock the data, hence the name. Their purpose is to ensure proper data modification by preventing other transactions from acquiring a lock on an object higher in the lock hierarchy. What this means is that before you obtain a lock on a page or a row, an intent lock is set on the table. This prevents other transactions from putting an exclusive lock on the table that would override the row/page lock. In the example we can see the intent exclusive locks being placed on the page and on the table that hold the key, to protect the data from being locked by other transactions.
BEGIN TRAN

USE AdventureWorks

UPDATE TOP(5) Person.Address 
SET AddressLine2 = 'Test Address 2'
WHERE PostalCode = '98011'

SELECT resource_type, request_mode, resource_description
FROM   sys.dm_tran_locks
WHERE  resource_type <> 'DATABASE'

ROLLBACK


Schema locks (Sch)

There are two types of schema locks:
  • Schema stability lock (Sch-S): Used while generating execution plans. These locks don't block access to the object data.
  • Schema modification lock (Sch-M): Used while executing a DDL statement. Blocks access to the object data since its structure is being changed.
In the example we can see the Sch-S and Sch-M locks being taken on TestTable and on the system tables, plus a number of other locks on the system tables.
BEGIN TRAN

USE AdventureWorks

CREATE TABLE TestTable (TestColumn INT)

SELECT resource_type, request_mode, resource_description
FROM   sys.dm_tran_locks
WHERE  resource_type <> 'DATABASE'

ROLLBACK


Bulk Update locks (BU)

Bulk Update locks are used by bulk operations when the TABLOCK hint is specified for the import. This allows multiple fast concurrent bulk inserts into the same table while preventing other transactions from reading the data.
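
There is no sys.dm_tran_locks example here, but as a rough sketch (the target table and file path below are made-up placeholders), a bulk load that requests a BU lock could look like this:

-- Hypothetical example: dbo.ImportTarget and the file path are placeholders
USE AdventureWorks

BULK INSERT dbo.ImportTarget
FROM 'C:\Import\Addresses.txt'
WITH (TABLOCK, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')

While the load runs you should see a BU lock on the target table in sys.dm_tran_locks, provided the table qualifies for bulk update locking.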

Conversion locks

Conversion locks are locks resulting from converting one type of lock to another. There are three types of conversion locks:
  • Shared with Intent Exclusive (SIX). A transaction that holds a Shared lock also has some pages/rows locked with an Exclusive lock.
  • Shared with Intent Update (SIU). A transaction that holds a Shared lock also has some pages/rows locked with an Update lock.
  • Update with Intent Exclusive (UIX). A transaction that holds an Update lock also has some pages/rows locked with an Exclusive lock.
In the example you can see the UIX conversion lock being taken on the page:
BEGIN TRAN

USE AdventureWorks

UPDATE TOP(5) Person.Address 
SET AddressLine2 = 'Test Address 2'
WHERE PostalCode = '98011'

SELECT resource_type, request_mode, resource_description
FROM   sys.dm_tran_locks
WHERE  resource_type <> 'DATABASE'

ROLLBACK


Key-Range locks

Key-range locks protect a range of rows implicitly included in a record set being read by a Transact-SQL statement while using the serializable transaction isolation level. Key-range locking prevents phantom reads. By protecting the ranges of keys between rows, it also prevents phantom insertions or deletions into a record set accessed by a transaction. In the example we can see that there are two types of key-range locks taken:
  • RangeX-X – exclusive lock on the interval between the keys and an exclusive lock on the last key in the range
  • RangeS-U – shared lock on the interval between the keys and an update lock on the last key in the range
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRAN

USE AdventureWorks

UPDATE Person.Address 
SET AddressLine2 = 'Test Address 2'
WHERE AddressLine1 LIKE '987 %'

SELECT resource_type, request_mode, resource_description
FROM   sys.dm_tran_locks
WHERE  resource_type <> 'DATABASE'

ROLLBACK


Lock Granularity

Lock granularity covers TABLE, PAGE and ROW locks. If the table has a clustered index then instead of a ROW lock you get a KEY lock. Locking at a lower level increases concurrency but, if a lot of locks are taken, consumes more memory; locking at a higher level does the opposite. So granularity simply means the level at which SQL Server locks data. Also note that the more restrictive the isolation level, the higher the locking level needed to keep the data in a correct state. You can override the chosen level with the ROWLOCK, PAGLOCK or TABLOCK hints, but their use is discouraged since SQL Server knows which locks are appropriate for each scenario. If you must use them, be aware of the concurrency and data consistency issues you might cause.
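
As a small sketch in the style of the earlier examples, you can compare granularity yourself by running the same query with and without the TABLOCK hint and checking sys.dm_tran_locks; with the default granularity you should see KEY and PAGE level locks, with the hint a single OBJECT level lock:

USE AdventureWorks

BEGIN TRAN

-- Default granularity: expect KEY and PAGE level locks
SELECT * FROM Person.Address WITH (HOLDLOCK)
WHERE AddressId = 2

-- Forced table granularity: expect an OBJECT level lock instead
SELECT * FROM Person.Address WITH (TABLOCK, HOLDLOCK)
WHERE AddressId = 2

SELECT resource_type, request_mode, resource_description
FROM   sys.dm_tran_locks
WHERE  resource_type <> 'DATABASE'

ROLLBACK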

Spinlocks

Spinlocks are a lightweight synchronization mechanism. A spinlock doesn't lock data; instead, if the resource a thread needs is already held, the thread spins (busy-waits) for a short period of time for it to become free. It's a mutual exclusion mechanism used to reduce context switching between threads inside SQL Server.
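
Spinlocks don't show up in sys.dm_tran_locks, but SQL Server exposes cumulative spinlock statistics through the sys.dm_os_spinlock_stats DMV. A quick look at the most contended ones might be done like this (just a sketch; you need VIEW SERVER STATE permission, and the counters accumulate since the last restart):

-- Top spinlocks by total spins
SELECT TOP (10)
       name, collisions, spins, spins_per_collision, backoffs
FROM   sys.dm_os_spinlock_stats
ORDER  BY spins DESC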

Lock Compatibility Matrix

A good resource to have is a lock compatibility matrix, which tells you which lock modes play nicely with which others. It is one of those things you don't think you need up until the moment you need it. The table below (taken from http://msdn2.microsoft.com/En-US/library/ms186396.aspx) shows the compatibility of all the lock modes available in Microsoft SQL Server.

[Diagram: lock compatibility matrix]

Conclusion

Hopefully this article has shed some light on how SQL Server operates with locks and why locking is so important to proper application and database design and operation. Remember that locking problems are of a logical and not a physical nature, so they have to be well thought out. Locking goes hand in hand with transaction isolation levels, so be familiar with those too. In the next article I'll show some ways to resolve locking problems.

References

* https://msdn.microsoft.com/En-US/library/ms186396.aspx
* https://msdn.microsoft.com/en-us/library/jj856598(v=sql.120).aspx
* http://www.sqlteam.com/author/mladen-prajdic

Thank You.

Sunday 9 October 2016

SSIS – One Variable To Rule Them All

Variable Expressions

Back in the DTS days, in order to change the value of a variable we needed to write VBScript.  These scripts were at best irritating to write and were rife with opportunities for misspellings and unrunnable code.
Enter SSIS and its variables, which can be evaluated as expressions.  This was a vast improvement over DTS, and all of the VBScript we had written in the past could be chucked.
But how exactly do we evaluate variables as expressions?  First, what does it mean to do this?  When a variable is evaluated as an expression, its value is calculated at run time from SSIS package properties, system properties, and other variables.
From here, let’s see how this works with our own variable.  To start, create a package and open up the Variables window.  In this window create a new variable named SimpleVariable.



Once the variable is created, bring up the Properties window for the variable.  There will be a number of things to choose from in the properties.  The item we are concerned with is EvaluateAsExpression.  Follow the red arrow below if you cannot find the property.  Change this value to True.



When you change the EvaluateAsExpression property to True, an ellipsis button will be enabled (at the red arrow below) that will allow you to open the expression builder for the variable.



Click on the ellipsis button and the Expression Builder window will appear.  This is pictured below.  There are a couple of areas to pay attention to when building expressions.  First, the area in green below contains all of the system and package variables that are available.  If you are unsure of the name of a variable, just browse through this list to retrieve it.  Next, the yellow area has a list of all of the operations that are possible in expressions.  These can be a bit tricky to get used to at first and I’ll dedicate a later post to some of their nuances.



For our expression, type in "1 + 2", as written in the Expression box above.  Selecting the Evaluate Expression button will return the result of 3.

This was a simple explanation of how to build an expression for a variable.  Now onto something much more useful.

The One Variable

Yesterday’s post focused on the configuration of the SSIS environment from a folder perspective.  The reason for that post was that I wanted to talk today about how I often use one variable in a package as a fulcrum for setting many other variables and configuring other portions of the SSIS package.
In the Folder Structure post, I talked about the WorkFiles folder.  Often I include a variable that holds this value, which I then use to set the values for the files and folders that the package must interact with.  The rest of this post will demonstrate this practice.
In an SSIS package, create the following variables: WorkFolder, ArchiveFolder, FileName, LogFolder, and ResponseFile.  Set the value for the WorkFolder variable to "c:\temp".  At this point your Variables window should look like this:

Now change the EvaluateAsExpression property for ArchiveFolder, FileName, LogFolder, and ResponseFile like we did in the previous example.  What we want to do now is create expressions for these variables based on the value in WorkFolder.  Type in the following expressions for each of the variables:
  • ArchiveFolder: @[User::WorkFolder] + "\\Archive\\"
  • FileName: @[User::WorkFolder] + "\\ImportFile.txt"
  • LogFolder: @[User::WorkFolder] + "\\Log\\"
  • ResponseFile: @[User::WorkFolder] + "\\Response\\"
Once these are all set, the values for each of the variables should change to the following

As you can see, the WorkFolder variable now controls the values for each of the other variables.  By just changing the value of WorkFolder all of the locations for all of the files can be changed.

Variable Wrap-Up

Using a variable or two to control the configuration of your SSIS packages is relatively simple to do.  In the demo above, the one variable reduced the number of configuration points needed for the package.  Instead of setting all of the values individually, they are set from the one variable.
This can help reduce configuration changes when you are moving your SSIS packages from environment to environment.  It can also reduce the amount of work required when troubleshooting a package, since only a few variables will vary between environments.
Now that I’ve laid out some ground work, be prepared for a few packages in the next few days that I’ve used to solve business problems.


Thank You.

Tuesday 31 May 2016

SSIS – Folder Structures

The Folder Structure

The SSIS folder structure is usually contained under a single root directory.  For simplicity, I often name this folder SSIS.  Yeah, I’m creative.  Off of the root folder, I’ll add folders for Packages and WorkFiles.
The Packages folder often contains many folders underneath it.  These folders are for all of the SSIS projects that have been deployed.  For every project there will be a Project folder.  The Project folders may have other folders beneath them, such as a Configurations folder.
The WorkFiles folder will be similar to the Packages folder in the sense that there will be Project folders below it.  The Project folders will have other folders such as Archive, Logs, Response, or Temporary.
If you are good at visualizing, the folder structure will look like this diagram:
[Diagram: SSIS folder structure]

The Folder Purpose

Now that this has been described, let’s put some definitions and understanding behind these folders.
  • Packages: Everything under the Packages folder is considered “code”.  Because of this status these folders are under change management control.  People who are not the gatekeepers should only be able to browse these folders, if that.  These folders should be backed up on a regular basis.
  • Project (under Packages): These folders contain all of the SSIS package files (dtsx) and configuration files (dtsconfig) for each deployed project.  A project could be a single SSIS project or a collection of projects based on a client’s business.  Typically, I store SSIS packages in the file system rather than in the MSDB database.  This has worked best for change management and for security in the environments.
  • WorkFiles: Everything under the WorkFiles folder is considered temporary.  These folders will store files to be imported, exported, or log files.  Business users are often granted access to these folders depending on their projects and whether they need to pick-up, drop-off, or review files.
  • Project (under WorkFiles): These folders contain project, same project definition from above, related files.  They can be logs, data received for import, or data exported.
  • Archive (Workfiles –> Project): After a file has been imported by an SSIS package, the imported file is dated and placed in this folder.
  • Logs (Workfiles –> Project): Log files generated from the SSIS package and the SQL Agent job steps are placed here.  These need to be in a place where more than those running the jobs can find them.
  • Response (Workfiles –> Project): When the SSIS package has data that cannot be imported and is rejected for some reason – that data will go in this folder.
  • Temporary (Workfiles –> Project): This folder stores the raw files that are generated during an SSIS package execution.  This keeps them out of the way and is an easy place to just delete everything in the unlikely scenario that there are space constraints.

Folder Structure Wrap-Up

There are countless other folders that you could add, but this covers most of the basics that I try to use.
This has been about how I do things.  The tips and tricks I’ve learned to keep things going and to reduce headaches.  These aren’t necessarily the right answers, but they are my answers.

SSIS - Data Flow Breakpoints

Data Viewers Definition

Data Viewers are containers that hold a single SSIS buffer’s worth of data.  As the data passes through the Data Path you are able to view what is in each buffer.  As each buffer passes into the Data Viewer, the processing of the SSIS package on that Data Path is paused.  In essence, you have a breakpoint.  But not really.
There are four types of data viewers.  These are:
  • Grid: Displays selected columns in a tabular format.
  • Histogram: Displays the distribution of numeric values; works only with numeric data.
  • Scatter plot: Displays a scatter plot graph; works only with numeric data.  You are able to set the x-axis and y-axis.
  • Chart format: Displays a column chart of the count of discrete values in the selected column.
To be perfectly honest, I’ve only ever found a use for the Grid Data Viewer.  There may be situations I’m unaware of where the others are of great benefit.  Unfortunately, I’ve never come across them.

Data Viewer Setup

Now, setting up a Data Viewer on a Data Path is fairly simple.  First, go into the Data Flow and select the Data Path on which you want to pause the data processing.  You can right-click to open the properties or double-click on the Data Path.  Just make sure your cursor is on the red “X” below.
[Screenshot: selecting the Data Path]
Once you have that, the Data Flow Path Editor will open.  From this screen you can tab into the Data Viewers tab to get the options for Data Viewers.
[Screenshot: Data Flow Path Editor]
The types of Data Viewers will appear on the left-hand side of the screen.  It’s shown below at the green arrow.  All four of the types from above will be listed.  As I mentioned, the only one I use is the Grid.  The other three have never been of use to me.  I’ve looked at them a few times and just thought they were pretty.
[Screenshot: Data Viewer types in the Data Flow Path Editor]
Before moving on, if you are putting these into your SSIS package it’s advisable to name them something meaningful.  The blue arrow and bubble above indicate where the name can be typed.
Now this next step is optional but I do recommend that you consider this.  When you create a Data Viewer Grid you have the option to remove some columns from the grid.  This can be extremely handy when the Data Path has more than a handful of columns.  Since you are likely adding the Data Viewer to look for specific values, you may as well only return the values that are of use.
[Screenshot: selecting columns for the Data Viewer grid]
When you are finished, select OK and you’ll return to the Data Flow.  From here you will notice that a little icon has been added to the Data Flow Path to indicate that there is one or more Data Viewers on the path.
[Screenshot: Data Viewer icon on the Data Flow Path]
Yes, that’s correct.  You can have more than one Data Viewer on the same Data Flow Path.  Depending on what you are troubleshooting, it may be worthwhile to have two or three Data Viewers that return different columns.

Data Viewer In Use

Once you have a Data Viewer in place on a package, go ahead and fire it up.  The execution of the package will change when there are Data Viewers.
When the package begins, a container will open for each Data Viewer.  By default, these will float over the window.  You can dock them if you choose; that is up to you.
On the container there are a few things to take notice of.  First, in the image below the yellow area is all of the data currently contained in the Data Viewer.  Within it you can highlight one row or multiple rows.
[Screenshot: Data Viewer container during package execution]
If you want to copy out the data, you need to use the Copy Data button, indicated at the red arrow.  The use of Ctrl-C to copy doesn’t do a dang thing.  And that sucks.
If the data you are looking for isn’t in the current buffer, you can move the package forward to the next buffer.  You do this with the Play button located at the orange arrow.  Selecting this will clear the Data Viewer and it will populate with the next buffer of data.
Lastly, if you are done with the Data Viewer and want to stop using it, you can detach it from the Data Flow Path.  This is done at the green arrow.  By detaching, the Data Viewer retains its current data while the package continues with processing the rest of the Data Flow.
As I mentioned, Data Viewers are a lot like Breakpoints.  Due to this, you can use the Breakpoint window to disable, enable, and delete data viewers in your project.  This can be extremely handy when you have a half dozen or more data viewers you just want to get rid of.
[Screenshot: Breakpoints window]

Data Viewers Wrap-Up

Now that we’ve gone through this, I hope you can see the power and benefit of using Data Viewers when you are troubleshooting an SSIS package and trying to figure out where some of the data is getting out of whack.
Above, I compared Breakpoints to Data Viewers a couple of times.  In a lot of regards, they function similarly to one another.  There is one way, though, in which they are very different.  When a breakpoint is encountered, the entire package execution is paused until the break is cleared.  When a Data Viewer is encountered, only the data in that Data Flow Path is paused.  While these can be in effect the same thing, in more complicated packages this might cause unexpected results.
Finally, you may have trouble using Data Viewers when you are processing large volumes of data.  If the row that is causing the issue is a few thousand or million rows into the data set, you will likely want to do a little bit more than just add a Data Viewer.  You may need to filter the data some to make the Data Viewer useful for troubleshooting.

Thank You

Monday 25 April 2016

SSIS – Using Breakpoints

Using Breakpoints

Much like running projects in other languages through Visual Studio, you can also stop SSIS packages mid-execution when specific events occur in the Control Flow.  This ability allows you to stop the execution of a package to investigate and troubleshoot the state of the SSIS package.
There are ten events in the Control Flow (definitions from Books Online):
  • OnPreExecute: Called when a task is about to execute. This event is raised by a task or a container immediately before it runs.
  • OnPostExecute: Called immediately after the execution logic of the task finishes. This event is raised by a task or container immediately after it runs.
  • OnError: Called by a task or container when an error occurs.
  • OnWarning: Called when the task is in a state that does not justify an error, but does warrant a warning.
  • OnInformation: Called when the task is required to provide information.
  • OnTaskFailed: Called by the task host when it fails.
  • OnProgress: Called to update progress about task execution.
  • OnQueryCancel: Called at any time in task processing when you can cancel execution.
  • OnVariableValueChanged: Called by the Integration Services runtime when the value of a variable changes. The RaiseChangedEvent property of the variable must be set to true to raise this event.
  • OnCustomEvent: Called by tasks to raise custom task-defined events.
I’ve mentioned this twice now, and I think it should be called out yet again: breakpoints function only in the Control Flow.  If you need to break the action in a Data Flow, this isn’t the solution for that.  This is called foreshadowing.
Breakpoint Demonstration
To implement a breakpoint in the Control Flow, start by right-clicking on a task in the Control Flow.  In the menu, the option to Edit Breakpoints will be about halfway down the list.

Select this menu item and you can choose between any of the breakpoints described above.  For this example, we’ll select the OnPostExecute breakpoint.  As mentioned above, this event will fire after the task’s execution has completed.

Let’s run the package now to see what happens.  First you’ll notice that the task has a red dot added to it; this indicates that there is a breakpoint on the task.  Then, when the breakpoint is reached, a little yellow arrow is added to the breakpoint icon.

Once a breakpoint has been hit, there are a number of windows that can be used to get some information about it.  To start with, there is the Call Stack window.  This window provides information on where the breakpoint is located.

Next there is the Breakpoints window.  Guess what it has?  Yes, all of the breakpoints in your project will be listed here.  There are a couple of points of interest and usefulness with this window.
First, the area in blue is all of the breakpoints in your package.  From this point you can enable and disable them.  A very useful feature in case you have many breakpoints in your package.
Second, the area in red is a control panel for your breakpoints.  There are options to delete single breakpoints, delete all breakpoints in a project, disable all breakpoints in the project, and change the columns returned.  There is additional functionality that can be used as well in advanced scenarios.

There is also an Output window that displays messages on the progress of the package and breakpoints.  This information will help determine what has been run in the package when the breaks occur.

Finally, the most important piece that breakpoints bring to you is the Locals window.  This window contains information on the current status of the package execution and the amount of time the package has been executing.

Most importantly, it contains all of the package variables.  Often when I need to troubleshoot some odd package behavior it is related to variables not being populated as I thought they should.  This is the golden ticket to finding out what the package is doing during execution.

Breakpoint Wrap-Up
Breakpoints are an essential tool in troubleshooting SSIS packages.  Take time to understand how they work and practice using them a few times so that when the time comes you are ready to pull them out.  They’ve gotten me through a few package executions and will probably save the day for you some time in the future.

This post scratches the surface on using breakpoints.  There is a lot more that could be covered and other scenarios for their use.  But those will have to wait for another day.  Maybe one of the other 29 posts in this series.

REFERENCES