Calling ProcessData in SSIS to incrementally load data into tabular partitions
Previously, I showed a pattern for incrementally loading data by creating a partition, doing a ProcessData on the partition, then doing a ProcessRecalc on the database all in a single transaction. It is possible to do the same thing in an Integration Services package. Let’s take a look.
We cannot use the Analysis Services Processing task for this chore. This is because the task was not updated for SQL Server 2012, so ProcessRecalc is not an option. Therefore, if we want to use a built in Analysis Services task, we must use the Analysis Services Execute DDL task to do this work.
We also need to do this work transactionally. Even though it is is something of a design antipattern to use transactions in an SSIS package, we must use one because we cannot leave the database in an unqueryable state while processing (recall that ProcessData renders a database unqueryable until a Recalc is performed). In theory, you could try to use the built-in transaction support for SSIS, but transactions do not work properly with Analysis Services. The easiest way to send the commands in a single transaction is to use one Analysis Services Execute DDL task and wrap all three commands we need inside of an XMLA Batch statement.
Imagine we want to create a partition for the year 1812 on the DemographicFacts table on the Hans Rosling demo project database. Here is how to do the incremental data load:
- From SSDT, create a new Integration Services project.
- Right click in the Connection Managers pane, add a new connection to your tabular server.
- Drop the Analysis Services Execute DDL task into the Control Flow area (it is under “Other Tasks” in the SSIS Toolbox).
- Double-click to configure the task.
- In the dialog box that appears, click DDL.
- In the DDL properties, set the Connection property to the connection manager you created in step 2.
- Click the … button in the SourceDirect property to launch the DDL dialog box.
- Paste in the following XMLA command:
<Batch xmlns='http://schemas.microsoft.com/analysisservices/2003/engine' Transaction='true'> <!-- Create the partition for the year 1812 on the DemographicFacts table --> <!-- To create the partition, alter the measure group--> <Alter AllowCreate="true" ObjectExpansion="ExpandFull" xmlns="<a href="http://schemas.microsoft.com/analysisservices/2003/engine">http://schemas.microsoft.com/analysisservices/2003/engine</a>"> <Object> <DatabaseID>hans_rosling_project</DatabaseID> <CubeID>Model</CubeID> <MeasureGroupID>DemographicFacts_c2655f3e-3570-4a34-92bf-3f9ed44ad24b</MeasureGroupID> <PartitionID>Partition 1812</PartitionID> </Object> <ObjectDefinition> <Partition xmlns:xsd="<a href="http://www.w3.org/2001/XMLSchema">http://www.w3.org/2001/XMLSchema</a>" xmlns:xsi="<a href="http://www.w3.org/2001/XMLSchema-instance">http://www.w3.org/2001/XMLSchema-instance</a>" xmlns:ddl200_200="<a href="http://schemas.microsoft.com/analysisservices/2010/engine/200/200">http://schemas.microsoft.com/analysisservices/2010/engine/200/200</a>"> <ID>Partition 1812</ID> <Name>Partition 1812</Name> <Source xsi:type="QueryBinding"> <DataSourceID>bc946c5c-ec5b-4fbb-87d9-5e4f721e6adc</DataSourceID> <QueryDefinition>SELECT [dbo].[DemographicFacts].* FROM [dbo].[DemographicFacts] WHERE ([Year] = 1812)</QueryDefinition> </Source> <StorageMode valuens="ddl200_200">InMemory</StorageMode> </Partition> </ObjectDefinition> </Alter> <!-- Now do a ProceessData on the newly created partition --> <Process xmlns="<a href="http://schemas.microsoft.com/analysisservices/2003/engine">http://schemas.microsoft.com/analysisservices/2003/engine</a>"> <Type>ProcessData</Type> <Object> <DatabaseID>hans_rosling_project</DatabaseID> <CubeID>Model</CubeID> <MeasureGroupID>DemographicFacts_c2655f3e-3570-4a34-92bf-3f9ed44ad24b</MeasureGroupID> <PartitionID>Partition 1812</PartitionID> </Object> </Process> <!-- Finally, do a ProcessRecalc on the database to make the DB queryable --> <Process xmlns="<a href="http://schemas.microsoft.com/analysisservices/2003/engine">http://schemas.microsoft.com/analysisservices/2003/engine</a>"> <Type>ProcessRecalc</Type> <Object> <DatabaseID>hans_rosling_project</DatabaseID> </Object> </Process> </Batch>
- Press OK to all dialog boxes, and F5 to run the package. The run should be successful and you should be able to verify in SSMS that the partition was created, populated, and is queryable.
Let’s look a bit closer at that XMLA and look at how the scripts were generated. There are three commands in those 43 lines. The commands were generated by scripting out actions from three different places in the SSMS UI. I manually wrapped these commands in a batch, and inserted some comments which I hope are helpful. Use the comments to find the start of each command.
The first command, the Alter, creates a partition by modifying the DemographicFacts measure group. Notice that you must refer to the measure group by ID. The ID for the measure group was automatically generated by the SSDT UI when the table was created using the Import Wizard. The partition definition is again bound using a query binding. The data source ID is the GUID for the magical “Sandbox” object, which was also automatically created by the SSDT UI. None of these IDs are exposed in the UI in SSMS or SSDT, so you will need to discover them by scripting out.
To generate script for creating a partition, do the following:
- Launch the Partition Manager in SSMS on the desired table.
- Select a partition that is bound via query binding, then click the Copy button in the toolbar.
- From the New Partition dialog, click the Script button. This script will contain both the data source IDs and the measure group IDs necessary for the script.
That takes care of creating the partition. The next command is a ProcessData on the partition. Again, you must pass the measure group ID to do the processing.
To generate script for processing a partition:
- Launch the Partition Manager in SSMS on the desired table.
- Select a partition, click Process.
- Set the desired processing option in the dialog (in this case ProcessData), then script out. This script will contain the measure group ID you need for processing.
The last Process command is the ProcessRecalc on the database. Fortunately the database ID is quite readable, and also exposed via the properties page on the database, so you have some hope of actually being able to write this script yourself. Nevertheless, it is easier to generate the script by right-clicking the database in the SSMS object explorer, selecting Process, choosing ProcessRecalc from the processing options, then scripting out.
Simple, huh? Compare this to the similar code in AMO. The AMO snippet I posted was 26 lines and doesn’t contain any horrible GUIDs, so stylistically it may be a preferable approach to the one I outlined above for creating the DDL. You can call AMO from Integration Services from a scripting task. However, from a performance standpoint, you might want to stick with the ugly DDL to avoid having to deal with the VSTA dependency and overhead associated with the scripting task.
Also compare this to the approach for doing a ProcessAdd in SSIS. ProcessAdd is a lot neater, although there are scenario tradeoffs to be made. Comparing the two processing approaches merits a future post.
From → BI Semantic Model