Convert Varchar to XML

If you ever find yourself trying to get XML data out of a varchar field, you may get this error.

XML parsing: line 1, character 1055, illegal xml character

You could use this:

SELECT CAST(CAST( REPLACE( packet, 'UTF-8', 'UTF-16' ) AS NVARCHAR(MAX)) AS XML)
FROM DataTable

In my case, the data contained characters that fall outside the UTF-8 definition, so when SQL Server went to convert it, it couldn't. The trickery above just pretends the data is stored as UTF-16. In the future, my company just needs to store it correctly. These were the offending characters:

` backtick
’ curly closing quote
“ double curly quote
” double closing curly quote
– long dash
• bullet point
… ellipsis
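
Here is a self-contained sketch of the workaround. The table and value are made up for illustration; the real data was a varchar column holding XML that declared encoding="UTF-8" but contained Windows-1252 characters like the ones above.

-- Hypothetical repro: a varchar value that declares UTF-8 but contains a
-- Windows-1252 curly apostrophe (CHAR(146)), which is not a valid UTF-8 byte.
DECLARE @DataTable TABLE ( packet VARCHAR(MAX) );

INSERT INTO @DataTable ( packet )
VALUES ( '<?xml version="1.0" encoding="UTF-8"?><note>don' + CHAR(146) + 't panic</note>' );

-- A direct cast can fail with the "illegal xml character" error on data like this:
-- SELECT CAST(packet AS XML) FROM @DataTable;

-- The workaround: swap the declared encoding to UTF-16 and cast through NVARCHAR,
-- so the bytes are converted to Unicode before the XML parser ever sees them.
SELECT CAST(CAST(REPLACE(packet, 'UTF-8', 'UTF-16') AS NVARCHAR(MAX)) AS XML) AS packet_xml
FROM @DataTable;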


Nullable bit columns with no default constraint

My manager asked me to find all nullable bit columns with no default constraint.

Here is what I came up with in T-SQL (SQL Server):


SELECT DB_NAME() AS DatabaseName,
    schemas.name + '.' + tables.name AS TableName,
    columns.name AS ColumnName
FROM sys.columns
INNER JOIN sys.tables
    ON columns.object_id = tables.object_id
INNER JOIN sys.schemas
    ON tables.schema_id = schemas.schema_id
WHERE columns.system_type_id = 104    -- 104 = bit
    AND columns.is_nullable = 1
    AND NOT EXISTS (
        SELECT 'x'
        FROM sys.default_constraints
        WHERE parent_object_id = columns.object_id
            AND parent_column_id = columns.column_id
    )


Bulk load XML files in SSIS

It was tough to find an efficient and reliable way to bulk load XML files into SQL Server. I think I have figured out a good solution, so I'd like to share it with anyone else working on this.

SSIS has an XML source. However, it splits out the XML into a lot of different sources.

[Screenshot: XML Source Editor showing the original list of generated outputs]

Here is the XML file that generated this drop down.

This XML file is fairly simple from a human standpoint, but in SSIS, it got split into 7 sources.  After you load the data, you have to join all of these tables together to get access to all of the primary keys.    It can become quite unwieldy and error-prone.

In order to make things easier for SSIS, I used XSLT (Extensible Stylesheet Language Transformations).   Here’s what the XML looked like after I was done.

Ah, so much easier to read.   The good news is that SSIS can read the new files.

Here is the SSIS source task with the new file.   There isn’t even a drop down.  It just loads one normalized XML file into the database.

You need to count the costs for this.  It takes time, disk space, and memory to translate the files.  For me, those costs were much lower than those of the alternative, which was bulk loading the files into an XML column.   With the transformed files, SSIS loads the data into columns and tables, and that makes everything much more efficient than working with the XML data type.

I don’t have a problem with the XML data type or XML indexes.  However, when you are truncating tables and reloading them every night with tons of data, it isn’t efficient.
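
For context, the alternative I'm comparing against looked roughly like this (the file path and table name are made up for illustration): each file gets bulk loaded into an XML column as a single blob, and everything has to be shredded out of it afterward with XQuery.

-- Hypothetical sketch of the rejected approach: load the whole file into an XML column.
INSERT INTO dbo.RawXmlFiles ( packet )
SELECT CAST(BulkColumn AS XML)
FROM OPENROWSET( BULK N'C:\Feeds\SampleFile.xml', SINGLE_BLOB ) AS x;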

To be continued…as I describe the pros and cons of the two different solutions.


Buffer pool reports

I’ve been looking at SQL Server buffer pool usage for quite a while, but at my current job, I had the idea of tracking it daily.   The stats are taken in the early evening, so it may not properly track what happens during the day or at night.   But it is meant to give a general idea of  memory usage.

I’ll start with an image, to inspire you to read the rest of the post.

[Chart: buffer pool usage percentage by database over time]

The real work here happens with the nightly data collection.

Here’s what you need to set up the collection.  First, set up the tables.

CREATE TABLE [dba].[BufferPool_Current](
[BufferPoolID] [INT] IDENTITY(1,1) NOT NULL,
[CollectionDate] [DATE] NULL,
[DB_NAME] [VARCHAR](50) NULL,
[db_buffer_pages] [INT] NULL,
[db_buffer_MB] [INT] NULL,
[db_buffer_percent] [DECIMAL](6, 3) NULL,
CONSTRAINT [PK_BufferPoolC] PRIMARY KEY CLUSTERED
( [BufferPoolID])
)

CREATE TABLE [dba].[BufferPool_Historical](
[BufferPoolID] [int] IDENTITY(1,1) NOT NULL,
[CollectionDate] [date] NULL,
[DB_NAME] [varchar](50) NULL,
[db_buffer_pages] [int] NULL,
[db_buffer_MB] [int] NULL,
[db_buffer_percent] [decimal](6, 3) NULL,
CONSTRAINT [PK_BufferPoolH] PRIMARY KEY CLUSTERED
( [BufferPoolID] )
)

Schedule this SQL to run every day:

-- Refresh the current snapshot, then append it to the history table.
TRUNCATE TABLE dba.BufferPool_Current;

DECLARE @total_buffer INT;

-- Total pages in the buffer pool, from the Buffer Manager performance counters.
SELECT @total_buffer = cntr_value
FROM sys.dm_os_performance_counters
WHERE RTRIM([object_name]) LIKE '%Buffer Manager'
  AND counter_name = 'Total Pages';

;WITH src AS
(
    SELECT database_id, db_buffer_pages = COUNT_BIG(*)
    FROM sys.dm_os_buffer_descriptors
    --WHERE database_id BETWEEN 5 AND 32766
    GROUP BY database_id
)
INSERT dba.BufferPool_Current
    ( CollectionDate, db_name, db_buffer_pages, db_buffer_MB, db_buffer_percent )
SELECT
    GETDATE() AS CollectionDate,
    [db_name] = CASE [database_id] WHEN 32767
                    THEN 'Resource DB'
                    ELSE DB_NAME([database_id]) END,
    db_buffer_pages,
    db_buffer_MB = db_buffer_pages / 128,    -- 8 KB pages, so 128 pages = 1 MB
    db_buffer_percent = CONVERT(DECIMAL(6,3),
        db_buffer_pages * 100.0 / @total_buffer)
FROM src
ORDER BY db_buffer_MB DESC;

INSERT dba.BufferPool_Historical
    ( CollectionDate, [db_name], db_buffer_pages, db_buffer_MB, db_buffer_percent )
SELECT CollectionDate, [db_name], db_buffer_pages, db_buffer_MB, db_buffer_percent
FROM dba.BufferPool_Current;
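
If you don't already have a scheduler for this, a SQL Server Agent job works fine. Here is a rough sketch; the job name, the 6 PM schedule, and the idea of wrapping the script above in a stored procedure (dba.usp_CollectBufferPool in the Admin database) are all placeholders to adapt.

USE msdb;
GO
-- Create a job with a single T-SQL step that runs the collection script.
EXEC dbo.sp_add_job @job_name = N'Collect buffer pool stats';
EXEC dbo.sp_add_jobstep
    @job_name = N'Collect buffer pool stats',
    @step_name = N'Collect',
    @subsystem = N'TSQL',
    @database_name = N'Admin',
    @command = N'EXEC dba.usp_CollectBufferPool;';   -- hypothetical proc wrapping the script above
-- Run it once a day in the early evening (6:00 PM here; pick your own quiet window).
EXEC dbo.sp_add_schedule
    @schedule_name = N'Daily 6 PM',
    @freq_type = 4,          -- daily
    @freq_interval = 1,
    @active_start_time = 180000;
EXEC dbo.sp_attach_schedule
    @job_name = N'Collect buffer pool stats',
    @schedule_name = N'Daily 6 PM';
EXEC dbo.sp_add_jobserver @job_name = N'Collect buffer pool stats';
GO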

Then in SSRS, create a Dataset with the following query.

This will pull the historical data for any database that has either
1. Held over 5% of the buffer pool on average over the last 3 months, or
2. Used over 20% of the buffer pool at any point during the last 3 months.

SELECT CollectionDate, [db_name], db_buffer_percent
FROM [Admin].[dba].[BufferPool_Historical]
WHERE
(
    [db_name] IN (
        SELECT [db_name]
        FROM [Admin].[dba].[BufferPool_Historical]
        WHERE CollectionDate >= DATEADD(MONTH, -3, GETDATE())
        GROUP BY [db_name]
        HAVING AVG(db_buffer_percent) > 5
    )
    OR [db_name] IN (
        SELECT DISTINCT [db_name]
        FROM [Admin].[dba].[BufferPool_Historical]
        WHERE db_buffer_percent > 20
            AND CollectionDate >= DATEADD(MONTH, -3, GETDATE())
    )
)
AND CollectionDate >= DATEADD(MONTH, -3, GETDATE())

Create a line chart with the dataset.

For the values area, map out db_buffer_percent.
For the Category Groups, use CollectionDate.
For the Series Groups, use db_name.


Track down high CPU queries

Some CPU spikes on our server gave me a chance to hone my query-investigation skills.  I have known for a long time about finding high-CPU queries using DMVs (sys.dm_exec_query_stats).  The downside is that you can't see exactly when they were running, other than the last execution time; there is not much context in terms of time.
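
For reference, the DMV approach looks something like this (the usual pattern; TOP 20 and the statement-text offset math are just the conventional way to slice it):

-- Top CPU consumers still in the plan cache, by total worker time.
SELECT TOP 20
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.execution_count,
    qs.last_execution_time,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;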

I have a new favorite way to track these queries: a server-side trace. Set a filter of CPU > 10,000, or something appropriate to the system.

Use a query to read the results of your trace in real-time:

SELECT TOP 1000 LoginName, TextData, duration / 1000 AS Dur, 
CONVERT(VARCHAR, StartTime,109), Writes, CPU, Reads, EndTime
FROM fn_trace_gettable(N'C:\Traces\HighCPUQueries.trc',DEFAULT)
ORDER BY StartTime desc;

Open up Performance Monitor and watch CPU in real time to see how those high-CPU queries correlate with any spikes on the system.

I used this to focus on some queries that needed to be optimized.  I also adjusted the cost threshold for parallelism from 5 to 20, to keep smaller queries from using all of the processors.
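
That setting change is a quick one with sp_configure (20 worked for us; pick a value based on your own workload):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 20;
RECONFIGURE;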


Call a web page from SSIS

Here’s how to hit a web page from SSIS.

Create a Script task.

At the top of the Script, add this directive:

using System.Net;

In the Main() function, add the following:

String url = "http://glutenfreesql.wordpress.com";
WebRequest request = null;
WebResponse response = null;

try
{
    request = HttpWebRequest.Create(url);
    request.Timeout = 7200000;
    response = request.GetResponse();
    response.Close();
}
catch
{
    Dts.TaskResult = (int)ScriptResults.Failure;
}

My page was for unzipping a file, so I set the Timeout to two hours. With that line omitted, it would default to 100 seconds. Set it to whatever value makes sense for you.  The value is in milliseconds.

In the catch block, the script sets the result to failure, so the SSIS task will fail if the page does not return a response within the timeout period.

In order to read the results of that page and check for errors, here is what I am using.  If an error occurred, the SSIS package can report failure via "Dts.TaskResult = (int)ScriptResults.Failure;".  My code mostly came from this MSDN page, and I added the part that looks for the word "Error".

using System.IO;
using System.Text;

...

String url = "http://glutenfreesql.wordpress.com";
WebRequest request = null;
WebResponse response = null;

try
{
    request = HttpWebRequest.Create(url);
    request.Timeout = 7200000;
    response = request.GetResponse();

    Stream receiveStream = response.GetResponseStream();
    Encoding encode = System.Text.Encoding.GetEncoding("utf-8");
    StreamReader readStream = new StreamReader(receiveStream, encode);

    // Read the response 256 characters at a time, checking each chunk for error text.
    Char[] read = new Char[256];
    int count = readStream.Read(read, 0, 256);

    while (count > 0)
    {
        String str = new String(read, 0, count);
        if (str.Contains("Error Exec") || str.Contains("Query"))
        {
            Dts.TaskResult = (int)ScriptResults.Failure;
        }

        count = readStream.Read(read, 0, 256);
    }

    readStream.Close();
    response.Close();
}
catch
{
    Dts.TaskResult = (int)ScriptResults.Failure;
}


Computed columns save CPU

There was a query causing some high CPU, with 14 LIKE predicates on a large table, which is a recipe for disaster.  But the business dictated that searching the columns this way actually was a requirement.  For a while I just let it go, but when some CPU pressure hit the server, I took a second look.

The REPLACE call that strips dashes accounted for 70% of the CPU the query needed.

AccountNumber LIKE '%00056127%' OR
REPLACE(AccountNumber, '-', '') LIKE '%00056127%' OR
Address1 LIKE '%00056127%' OR
Address2 LIKE '%00056127%' OR
Address3 LIKE '%00056127%' OR
Address4 LIKE '%00056127%' OR
Address5 LIKE '%00056127%' OR
Name1 LIKE '%00056127%' OR
Name2 LIKE '%00056127%' OR
Name3 LIKE '%00056127%' OR
Street1 LIKE '%00056127%' OR
Street2 LIKE '%00056127%' OR
City LIKE '%00056127%' OR
Zip LIKE '%00056127%'

A computed column ended up being an ideal solution for this.  It drastically improved the performance of the query.

Before/after performance:
Before: CPU time = 18,299 ms, elapsed time = 9,538 ms.
After: CPU time = 6,567 ms, elapsed time = 3,322 ms.

The new execution plan also caused far fewer reads on another table in the query (Table2):
Before: Table 'Table2'. Scan count 3, logical reads 144041
After: Table 'Table2'. Scan count 1, logical reads 10

The computed column was built on Alpha like this.


ALTER TABLE Table1 ADD AccountNumberNoDash
AS ( REPLACE(AccountNumber, '-', '')) PERSISTED
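
With the persisted column in place, the expensive predicate can reference the precomputed value instead of calling REPLACE per row. This is an assumption about how the search ends up being written; SQL Server can also match a persisted computed column's expression on its own in some cases.

-- Hypothetical rewrite of the two AccountNumber predicates; the rest of the OR list stays the same.
WHERE AccountNumber LIKE '%00056127%'
   OR AccountNumberNoDash LIKE '%00056127%'
   OR Address1 LIKE '%00056127%'
   -- ... remaining columns unchanged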
