Bulk load XML files in SSIS

It was tough to find an efficient and reliable solution to bulk load XML files into SQL Server. I think I have figured out a good solution, so I’d like to share this with anyone else working on this.

SSIS has an XML source. However, it splits out the XML into a lot of different sources.

[Screenshot: the XML Source Editor drop-down for the original file]

Here is the XML file that generated this drop down.

 

This XML file is fairly simple from a human standpoint, but in SSIS, it got split into 7 sources. After you load the data, you have to join all of these tables together to get access to all of the primary keys. It can become quite unwieldy and error-prone.

To make things easier for SSIS, I used XSLT (Extensible Stylesheet Language Transformations). Here’s what the XML looked like after I was done.

Ah, so much easier to read. The good news is that SSIS can read the new files.

Here is the SSIS source task with the new file. There isn’t even a drop down. It just loads one normalized XML file into the database.

You need to count the costs for this. It takes time and disk space to translate the files, and it will temporarily take up memory on your server. For me, those costs were much lower than those caused by the alternative, which was bulk loading the files into an XML column. SSIS loads the transformed files into columns and tables, which makes everything much more efficient than working with the XML data type.

I don’t have a problem with the XML data type or XML indexes.  However, when you are truncating tables and reloading them every night with tons of data, it isn’t efficient.

To be continued, as I describe the pros and cons of the two different solutions.


Buffer pool reports

I’ve been looking at SQL Server buffer pool usage for quite a while, but at my current job, I had the idea of tracking it daily. The stats are taken in the early evening, so they may not capture exactly what happens during the day or at night, but they are meant to give a general idea of memory usage.

I’ll start with an image, to inspire you to read the rest of the post.

[Image: line chart of buffer pool usage by database over time]

The real work here happens with the nightly data collection.

Here’s what you need to set up the collection.  First, set up the tables.

CREATE TABLE [dba].[BufferPool_Current](
[BufferPoolID] [INT] IDENTITY(1,1) NOT NULL,
[CollectionDate] [DATE] NULL,
[DB_NAME] [VARCHAR](50) NULL,
[db_buffer_pages] [INT] NULL,
[db_buffer_MB] [INT] NULL,
[db_buffer_percent] [DECIMAL](6, 3) NULL,
CONSTRAINT [PK_BufferPoolC] PRIMARY KEY CLUSTERED
( [BufferPoolID])
)

CREATE TABLE [dba].[BufferPool_Historical](
[BufferPoolID] [int] IDENTITY(1,1) NOT NULL,
[CollectionDate] [date] NULL,
[DB_NAME] [varchar](50) NULL,
[db_buffer_pages] [int] NULL,
[db_buffer_MB] [int] NULL,
[db_buffer_percent] [decimal](6, 3) NULL,
CONSTRAINT [PK_BufferPoolH] PRIMARY KEY CLUSTERED
( [BufferPoolID] )
)

Schedule this SQL to run every day:

TRUNCATE TABLE dba.BufferPool_Current;

DECLARE @total_buffer INT;

SELECT @total_buffer = cntr_value
FROM sys.dm_os_performance_counters
WHERE RTRIM([object_name]) LIKE '%Buffer Manager'
AND counter_name = 'Total Pages';

;WITH src AS
(
SELECT
database_id, db_buffer_pages = COUNT_BIG(*)
FROM sys.dm_os_buffer_descriptors
--WHERE database_id BETWEEN 5 AND 32766
GROUP BY database_id
)
INSERT dba.BufferPool_Current
( CollectionDate, db_name, db_buffer_pages, db_buffer_MB, db_buffer_percent )
SELECT
GETDATE() AS CollectionDate,
[db_name] = CASE [database_id] WHEN 32767
THEN 'Resource DB'
ELSE DB_NAME([database_id]) END,
db_buffer_pages,
db_buffer_MB = db_buffer_pages / 128,
db_buffer_percent = CONVERT(DECIMAL(6,3),
db_buffer_pages * 100.0 / @total_buffer)
FROM src
ORDER BY db_buffer_MB DESC;

INSERT dba.BufferPool_Historical
( CollectionDate, [db_name], db_buffer_pages, db_buffer_MB, db_buffer_percent )
SELECT CollectionDate, [db_name], db_buffer_pages, db_buffer_MB, db_buffer_percent
FROM dba.BufferPool_Current
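
To schedule it, one option is a SQL Server Agent job. Here is a minimal sketch, assuming the collection script above has been wrapped in a stored procedure named dba.usp_CollectBufferPool in the Admin database (that procedure name and the 6 PM start time are just illustrative):

USE msdb;
GO

-- Create the job and a single T-SQL step that runs the collection
EXEC dbo.sp_add_job @job_name = N'Collect buffer pool stats';

EXEC dbo.sp_add_jobstep
    @job_name = N'Collect buffer pool stats',
    @step_name = N'Collect',
    @subsystem = N'TSQL',
    @database_name = N'Admin',
    @command = N'EXEC dba.usp_CollectBufferPool;'; -- hypothetical wrapper for the script above

-- Run daily in the early evening
EXEC dbo.sp_add_schedule
    @schedule_name = N'Daily 6 PM',
    @freq_type = 4,               -- daily
    @freq_interval = 1,           -- every day
    @active_start_time = 180000;  -- 18:00:00

EXEC dbo.sp_attach_schedule
    @job_name = N'Collect buffer pool stats',
    @schedule_name = N'Daily 6 PM';

-- Target the local server
EXEC dbo.sp_add_jobserver @job_name = N'Collect buffer pool stats';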

Then in SSRS, create a Dataset with the following query.

This will pull the historical data for any databases that either:
1. Have held over 5% of the buffer pool on average over the last 3 months, or
2. Have used over 20% of the buffer pool at any time during the last 3 months.

SELECT CollectionDate, [db_name], db_buffer_percent
FROM [Admin].[dba].[BufferPool_Historical]
WHERE
(
    [db_name] IN
    (
        SELECT DISTINCT [db_name]
        FROM [Admin].[dba].[BufferPool_Historical]
        WHERE CollectionDate >= DATEADD(MONTH, -3, GETDATE())
        GROUP BY [db_name]
        HAVING AVG(db_buffer_percent) > 5
    )
    OR [db_name] IN
    (
        SELECT DISTINCT [db_name]
        FROM [Admin].[dba].[BufferPool_Historical]
        WHERE db_buffer_percent > 20
        AND CollectionDate >= DATEADD(MONTH, -3, GETDATE())
    )
)
AND CollectionDate >= DATEADD(MONTH, -3, GETDATE())

Create a line chart with the dataset.

For the Values area, use db_buffer_percent.
For the Category Groups, use CollectionDate.
For the Series Groups, use db_name.


Track down high CPU queries

Some CPU spikes on our server gave me the opportunity to hone my query investigation skills. I have long known about finding high-CPU queries using DMVs (sys.dm_exec_query_stats). The downside is that you can’t see exactly when they were running, other than the last execution time; there is not much context in terms of time.
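
For reference, here is a typical query against that DMV, sorted by total CPU (a sketch; the TOP 20 cutoff and the millisecond conversion are my choices, not from the post):

SELECT TOP 20
    qs.total_worker_time / 1000 AS total_cpu_ms,   -- total_worker_time is in microseconds
    qs.execution_count,
    qs.total_worker_time / qs.execution_count / 1000 AS avg_cpu_ms,
    qs.last_execution_time,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(st.text)
          ELSE qs.statement_end_offset END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;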

I have a new favorite way to track the queries: a server-side trace. Set a filter of CPU > 10,000, or something appropriate to the system.
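
Setting one up by hand looks roughly like this (a sketch, not the exact trace definition from the post; the file path and the 10,000 ms CPU threshold are illustrative):

DECLARE @TraceID INT, @maxfilesize BIGINT = 1024, @cpu INT = 10000, @on BIT = 1;

-- Create the trace with file rollover (the .trc extension is added automatically)
EXEC sp_trace_create @TraceID OUTPUT, 2, N'C:\Traces\HighCPUQueries', @maxfilesize;

-- Capture SQL:BatchCompleted (event 12) and RPC:Completed (event 10) with the columns read later
EXEC sp_trace_setevent @TraceID, 12, 1,  @on;  -- TextData
EXEC sp_trace_setevent @TraceID, 12, 11, @on;  -- LoginName
EXEC sp_trace_setevent @TraceID, 12, 13, @on;  -- Duration
EXEC sp_trace_setevent @TraceID, 12, 14, @on;  -- StartTime
EXEC sp_trace_setevent @TraceID, 12, 15, @on;  -- EndTime
EXEC sp_trace_setevent @TraceID, 12, 16, @on;  -- Reads
EXEC sp_trace_setevent @TraceID, 12, 17, @on;  -- Writes
EXEC sp_trace_setevent @TraceID, 12, 18, @on;  -- CPU
EXEC sp_trace_setevent @TraceID, 10, 1,  @on;
EXEC sp_trace_setevent @TraceID, 10, 11, @on;
EXEC sp_trace_setevent @TraceID, 10, 13, @on;
EXEC sp_trace_setevent @TraceID, 10, 14, @on;
EXEC sp_trace_setevent @TraceID, 10, 15, @on;
EXEC sp_trace_setevent @TraceID, 10, 16, @on;
EXEC sp_trace_setevent @TraceID, 10, 17, @on;
EXEC sp_trace_setevent @TraceID, 10, 18, @on;

-- Only keep statements that used more than 10,000 ms of CPU (column 18, comparison 2 = greater than)
EXEC sp_trace_setfilter @TraceID, 18, 0, 2, @cpu;

-- Start the trace
EXEC sp_trace_setstatus @TraceID, 1;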

Use a query to read the results of your trace in real-time:

SELECT TOP 1000 LoginName, TextData, duration / 1000 AS Dur, 
CONVERT(VARCHAR, StartTime,109), Writes, CPU, Reads, EndTime
FROM fn_trace_gettable(N'C:\Traces\HighCPUQueries.trc',DEFAULT)
ORDER BY StartTime desc;

Open up Performance Monitor and watch CPU in real time, and see how those high-CPU queries correlate with any spikes in the system.

I used this to focus on some queries that needed to be optimized. I also adjusted the cost threshold for parallelism from 5 to 20, to keep smaller queries from using all of the processors.
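
For reference, that configuration change looks like this (20 was the value that fit this system, not a universal recommendation):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Raise the threshold so cheaper queries stay serial
EXEC sp_configure 'cost threshold for parallelism', 20;
RECONFIGURE;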


Call a web page from SSIS

Here’s how to hit a web page from SSIS.

Create a Script task.

At the top of the Script, add this directive:
using System.Net;

In the Main() function, add the following:

String url = "http://glutenfreesql.wordpress.com";
WebRequest request = null;
WebResponse response = null;

try
{
    // Request the page and wait (up to the timeout) for a response
    request = WebRequest.Create(url);
    request.Timeout = 7200000;
    response = request.GetResponse();
    response.Close();
}
catch
{
    // No response within the timeout (or any other error): fail the task
    Dts.TaskResult = (int)ScriptResults.Failure;
}

My page was for unzipping a file, so I set the Timeout to two hours. With that line omitted, it would default to 100 seconds. Set it to whatever value makes sense for you.

The catch clause sets the result to Failure, so the SSIS task will fail if the page does not return a response within the timeout period.


Computed columns save CPU

There was a query causing some high CPU, with 14 LIKE predicates on a large table, which is a recipe for disaster. But the business dictated that searching the columns this way really was a requirement. For a while, I just let it go, but when some CPU pressure showed up on the server, I took a second look.

The REPLACE that removes the dashes accounted for 70% of the CPU the query needed.

AccountNumber LIKE '%00056127%' OR
REPLACE(AccountNumber, '-', '') LIKE '%00056127%' OR
Address1 LIKE '%00056127%' OR
Address2 LIKE '%00056127%' OR
Address3 LIKE '%00056127%' OR
Address4 LIKE '%00056127%' OR
Address5 LIKE '%00056127%' OR
Name1 LIKE '%00056127%' OR
Name2 LIKE '%00056127%' OR
Name3 LIKE '%00056127%' OR
Street1 LIKE '%00056127%' OR
Street2 LIKE '%00056127%' OR
City LIKE '%00056127%' OR
Zip LIKE '%00056127%'

A computed column ended up being an ideal solution for this.  It drastically improved the performance of the query.

Before/after performance:
Before: CPU time = 18,299 ms, elapsed time = 9,538 ms.
After: CPU time = 6,567 ms, elapsed time = 3,322 ms.

It also caused fewer reads on a second table (Table2) because of a different execution plan:
Before: Table 'Table2'. Scan count 3, logical reads 144041
After: Table 'Table2'. Scan count 1, logical reads 10

The computed column was built on Alpha like this.


ALTER TABLE Table1 ADD AccountNumberNoDash
AS ( REPLACE(AccountNumber, '-', '')) PERSISTED
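
With the persisted column in place, the expensive predicate can be rewritten to reference the computed column directly. A sketch of the change (the rest of the OR list stays as it was):

-- Before: REPLACE is recomputed against every row scanned
REPLACE(AccountNumber, '-', '') LIKE '%00056127%' OR

-- After: search the persisted computed column instead
AccountNumberNoDash LIKE '%00056127%' OR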


Generate Order Number in Bulk

I was asked by developers if we could generate an alphanumeric order number in the database, rather than in the code.  The number is in the style of A00001.  When it hits A99999, then it moves on to B00001.

The main challenge is that we are working with sets of records in this situation.  Most order number generation deals with one order at a time.  And, to foreshadow the solution, I eventually did have to look at them one at a time.

I’ll just dive into it.  It was much more complicated than this, but I’m reducing it to the essentials for this blog post.

Function to get Max Order Number so far…

CREATE FUNCTION [dbo].[getMaxNumberOrder] ()
RETURNS CHAR(6)
AS
BEGIN

    DECLARE @PreviousFullOrderNumber VARCHAR(150);
    DECLARE @PreviousOrderNumber CHAR(6);

    -- Highest order number assigned so far (the format sorts correctly as a string)
    SELECT @PreviousFullOrderNumber = MAX(RTRIM(OrderNumber))
    FROM dbo.[Order];

    SET @PreviousOrderNumber = RIGHT(@PreviousFullOrderNumber, 6);

    -- No orders yet: start at A00000 so the first number generated is A00001
    IF @PreviousOrderNumber IS NULL
        SET @PreviousOrderNumber = 'A00000';

    RETURN @PreviousOrderNumber;

END

GO

Function to get the next number…

CREATE FUNCTION [dbo].[getNextNumber]
(
    @PreviousNumber CHAR(6)
)
RETURNS CHAR(6)
AS
BEGIN

    DECLARE @PreviousInt INT;
    DECLARE @PreviousChar CHAR(1);

    SET @PreviousInt = CAST(RIGHT(@PreviousNumber, 5) AS INT);
    SET @PreviousChar = LEFT(@PreviousNumber, 1);

    DECLARE @NextInt INT;
    DECLARE @NextChar CHAR(1);
    DECLARE @NextNumber CHAR(6);

    IF @PreviousInt = 99999 /* If it hit the max, set the number back to 0 and increase the letter prefix */
    BEGIN
        SET @NextInt = 0;
        IF @PreviousChar = 'Z'
        BEGIN
            SET @NextChar = 'A'; /* wrap around from Z back to A */
        END
        ELSE
        BEGIN
            SET @NextChar = CHAR(ASCII(@PreviousChar) + 1);
        END
    END
    ELSE
    BEGIN
        SET @NextInt = @PreviousInt + 1;
        SET @NextChar = @PreviousChar;
    END

    SET @NextNumber = @NextChar + RIGHT('0000' + CAST(@NextInt AS VARCHAR(5)), 5);

    RETURN @NextNumber;

END

GO

Code to assign the next order number to each order.

Put the data into a temp table, and assign an identity integer to each row.

IF OBJECT_ID('tempdb..#DataTable') IS NOT NULL
DROP TABLE #DataTable

CREATE TABLE #DataTable
(
    [IdentityInt] INT IDENTITY(1,1) NOT NULL,
    OrderNumber CHAR(6) NULL,
    OrderAmount MONEY,
    OrderQuantity INT
);

 

/**********************************
*
* Begin generate Order Number
*
***********************************/

DECLARE @PreviousOrderNumber CHAR(6);
SET @PreviousOrderNumber = dbo.getMaxNumberOrder()

 

DECLARE @NextOrderNumber CHAR(6);

DECLARE @WhileLoopInt_Order INT;
SET @WhileLoopInt_Order = 1;

DECLARE @MaxWhileLoopInt_Order INT;
SELECT @MaxWhileLoopInt_Order = MAX([IdentityInt]) FROM #DataTable

WHILE (@WhileLoopInt_Order <= @MaxWhileLoopInt_Order )

BEGIN

SET @NextOrderNumber = dbo.getNextNumber(@PreviousOrderNumber);

UPDATE #DataTable
SET OrderNumber = @NextOrderNumber
WHERE [IdentityInt] = @WhileLoopInt_Order;

SET @PreviousOrderNumber = @NextOrderNumber;
SET @WhileLoopInt_Order = @WhileLoopInt_Order + 1

END
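
To see it in action with made-up sample data: insert a few rows into #DataTable before the order-number block above runs, then check the result. With an empty Order table, the rows come out as A00001, A00002, A00003.

-- Hypothetical sample rows, just to demonstrate the numbering
INSERT #DataTable (OrderAmount, OrderQuantity)
VALUES (19.99, 1), (5.00, 3), (120.00, 2);

-- ...the "generate Order Number" block above runs here...

SELECT [IdentityInt], OrderNumber, OrderAmount, OrderQuantity
FROM #DataTable
ORDER BY [IdentityInt];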


While loop to check for existence of randomized value

A bar code is randomly generated, yet it cannot be the same as one that has already been used. Collisions are pretty common, so here is how I removed them.

--Generate bar code
UPDATE #Table
SET BarCode = UPPER( dbo.GenerateRandom9CharKey() );

--Check for barcodes that already are used, or for unapproved words
WHILE EXISTS(
    SELECT *
    FROM #Table
    WHERE Barcode IN ( SELECT barcode FROM dbo.Table1 )
)
BEGIN
    UPDATE #Table
    SET Barcode = UPPER( dbo.GenerateRandom9CharKey() )
    WHERE Barcode IN ( SELECT barcode FROM dbo.Table1 );
END
