SQL Data Generator to update only certain columns

We needed to copy data from production to development, but since there is patient data, the personal information needs to be obfuscated.  I couldn’t imagine this was a rare situation, yet was having trouble finding a good solution.  Luckily, I did have Red Gate SQL Developer Bundle, which includes SQL Data Generator 2.

Now, I played with this tool for a long time, and didn’t think it would work.  It didn’t allow me to update just selected columns with random data, it had to delete all of the data.

Then I thought, what if I create a new database with the same tables, and those CAN be deleted.  Then I needed to figure out a way to update the original table with the random records.   With a Google search, I found a solution from RBarry Young

http://social.technet.microsoft.com/Forums/sqlserver/pt-BR/f58c3bf8-e6b7-4cf5-9466-7027164afdc0/updating-multiple-rows-with-random-values-from-another-table

I altered this, for my situation.  Here is an example of using it to update several columns, like in an address for instance.

Now to back up, here are the steps

1. Create database with the table structures that include the personal or confidential data

2. Use SQL Data Generator to populate these tables with at least as many records as are in the original tables.   Choose the DELETE records option.  I won’t include a tutorial here – play around with it like I did, and let me know if you have questions.

3. Run the following scripts, substituting “DatabaseToChange” with your test database (copy of production in development), and substitute “DatabaseGenerated” with the name of the database that was created in step 1.

;WITH cteOriginalTable AS (
SELECT [Street1] ,[Street2] ,[City] ,
[State] ,[PostalCode] ,[AttnName],[ExtraAddressInfo],
ROW_NUMBER() OVER (ORDER BY NEWID()) AS n
FROM DatabaseToChange.dbo.[Address]
),
cteGeneratedTable AS (
SELECT [Street1] ,[Street2] ,[City] ,
[State] ,[PostalCode] ,[AttnName],[ExtraAddressInfo],
ROW_NUMBER() OVER (ORDER BY NEWID()) AS n
FROM DatabaseGenerated.dbo.[Address] t21
)
UPDATE cteOriginalTable
SET
cteOriginalTable.Street1 = cteGeneratedTable.Street1,
cteOriginalTable.Street2 = cteGeneratedTable.Street2,
cteOriginalTable.City = cteGeneratedTable.City,
cteOriginalTable.[State] = cteGeneratedTable.[State],
cteOriginalTable.PostalCode = cteGeneratedTable.PostalCode,
cteOriginalTable.AttnName = cteGeneratedTable.AttnName,
cteOriginalTable.ExtraAddressInfo = cteGeneratedTable.ExtraAddressInfo
FROM cteGeneratedTable
WHERE cteOriginalTable.n = cteGeneratedTable.n

Then since it is personally identifiable information, I have to make really sure that EVERY record was updated.  So I run this SQL to compare the fields.

SELECT A1.Street1, A2.Street1,
A1.Street2, A2.Street2,
A1.City, A2.City,
A1.State, A2.State,
A1.PostalCode, A2.PostalCode,
A1.AttnName, A2.AttnName,
A1.ExtraAddressInfo, A2.ExtraAddressInfo
FROM DatabaseToChange.dbo.[Address] A1
INNER JOIN DatabaseGenerated.dbo.address A2
  ON A1.AddressGUID = a2.AddressGUID
WHERE A1.Street1 = A2.Street1
OR A1.Street2 = A2.Street2
OR A1.City = A2.City
OR A1.[State] = A2.[State]
OR A1.PostalCode = A2.PostalCode
OR A1.AttnName = A2.AttnName
OR A1.ExtraAddressInfo = A2.ExtraAddressInfo
ORDER BY A1.Street1, A2.Street2
Advertisements
This entry was posted in Uncategorized and tagged . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s