It was tough to find an efficient and reliable solution to bulk load XML files into SQL Server. I think I have figured out a good solution, so I’d like to share this with anyone else working on this.
SSIS has an XML source. However, it splits out the XML into a lot of different sources.
Here is the XML file that generated this drop down.
This XML file is fairly simple from a human standpoint, but in SSIS, it got split into 7 sources. After you load the data, you have to join all of these tables together to get access to all of the primary keys. It can become quite unwieldy and error-prone.
In order to make things easier for SSIS, I used XSLT (Extensible Stylesheet Language Transformations). Here’s what the XML looked like after I was done.
Ah, so much easier to read. The good news is that SSIS can read the new files.
Here is the SSIS source task with the new file. There isn’t even a drop down. It just loads one normalized XML file into the database.
You need to count the costs for this. It takes time and disk space and memory to translate the files. For me, those costs were much lower those caused by the alternative, which was bulk loading the files into an XML column. SSIS loads this into columns and tables, and it makes everything much more efficient than working with the XML data type.
I don’t have a problem with the XML data type or XML indexes. However, when you are truncating tables and reloading them every night with tons of data, it isn’t efficient.
To be continued…as I describe a pros and cons list between two different solutions.