Upcoming virtual and physical tabular BI sessions this month

Tomorrow morning, at 10 AM Pacific, I will be presenting Building a BI Portal for the Tabular Model in webinar format for the PASS BI virtual chapter. Click here to join the meeting. This is the same talk I will be giving at the regular session of SQLBits X in London at the end of the month.

Next up is SQL Saturday #114 in Vancouver. I was pulled in at the last minute to do a second talk, so I will be doing tabular modeling 101 at 9:30 AM and DirectQuery at 3:45 PM. The morning session is gentle, the afternoon session more challenging. The DirectQuery talk is cool because I have some tricks to model around DirectQuery limitations. Tabular modeling 101 is conceptually similar to the talk I gave at SQL PASS in November. The DirectQuery talk is an hour-long chunk of the pre-conference session I will be giving in London on Thursday, March 29.

Looks like there are still seats available for the tabular BI pre-con on the 29th. If you're around, do feel free to grab a seat and come on in. It should be pretty technical. There are some great DirectQuery and row security samples – I am pretty excited about those demos. The multidimensional vs tabular hour, though entirely devoid of demos, is entertaining and educational enough that you may well be able to stomach an entire hour of PowerPoint! There is another foray into theory when we talk about perf and processing (and oh, we will talk about processing), but we will quickly jump back into practice, and there will be lots and lots of time devoted to managing tabular models.

Hope to see you there.

Automated backups using PowerShell and the SQL Server Agent

One of the least known features of SQL Server 2012 is the addition of several PowerShell cmdlets for managing Analysis Services servers. See my previous blog post on using PowerShell cmdlets for managing tabular models for more information. When I was fiddling with the SQL Server Agent recently, I noticed that the Agent can execute PowerShell scripts automatically. Just for fun, let’s put these two things together and create a basic SQL Agent job that automatically backs up an Analysis Services database using PowerShell.

First – the prerequisites. Make sure the target server satisfies the lengthy list of prerequisites for using AMO for PowerShell. Next, ensure that the user running the SQL Server Agent service has sufficient privileges to back up the desired Analysis Services database. If you need to, run services.msc, change the user running the SQL Agent service, and restart the service.

Now you can go ahead and create the job. To take a backup of a database on the localhost\tabular instance, paste the following script into a PowerShell step in a SQL Server Agent job:

cd sqlas\localhost\tabular
backup-asdatabase "rosling.abf" hans_rosling_project -ApplyCompression

This script works, but it has a problem: the job step always reports success. Try it yourself and see. Because the script does not allow overwriting of the backup file, the backup-asdatabase command fails on the second run. The SQL Agent job, however, still reports success. Only when you drill down into the history of each individual job step do you discover that the backup actually failed.

We definitely want this job to fail if the backup fails so that notifications can be triggered. Here is how to do it. The SQL Agent will fail a job step if an unhandled exception is thrown at any time. Since backup-asdatabase does not throw unhandled exceptions, we need to throw one ourselves if the cmdlet returns an error. We can capture the error output of the backup-asdatabase cmdlet by supplying a variable name to the built-in -ErrorVariable parameter. If this variable is not empty, we throw an exception.

The revised PowerShell script now looks like this:

cd sqlas\localhost\tabular
backup-asdatabase "rosling.abf" hans_rosling_project -ApplyCompression -ErrorVariable myError
if ($myError) {throw $myError}

If you test this code, you will see that the job now fails at the job step if the backup file already exists, which is nice.

Of course, it would be better if the script succeeded when a backup file already exists. The -AllowOverwrite parameter is the one we need:

cd sqlas\localhost\tabular
backup-asdatabase "rosling.abf" hans_rosling_project -ApplyCompression -AllowOverwrite -ErrorVariable myError
if ($myError) {throw $myError}

Now we're done. The script fails when the backup fails, and succeeds the rest of the time, even on repeated runs.
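
As an aside, if you would rather drive the backup from your own tooling instead of a PowerShell job step, the same operation is available through AMO. The following is only a rough C# sketch, not part of the job above: it assumes the Database.Backup overload that takes a file name plus an allowOverwrite flag, and it reuses the server and database names from the PowerShell example.

//connect to the tabular instance (same instance as the PowerShell example)
Server s = new Server();
s.Connect("localhost\\tabular");

//back up the database; the second argument allows an existing backup file to be overwritten
Database db = s.Databases.FindByName("hans_rosling_project");
db.Backup("rosling.abf", true);
s.Disconnect();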

Using views in the tabular model

It is very handy to use views as the table source definition in the tabular model.

The biggest advantage of using views is that columns can be renamed in the underlying data source without breaking your tabular model. If that's not enough to convince you to use views, here are a number of smaller advantages, especially if you are incrementally loading data using ProcessAdd:

  • It is easy to limit the number of rows imported into the workspace database during development – simply add a where clause to your view and change the view definition later.
  • You can safely redeploy over the top of a model in production without worrying about retaining partitions, as the partition definition has not changed.
  • Because the partition definition has not changed, the underlying partition will never be marked as unprocessed and will not ever be reprocessed if a Process Default command is issued on the partition.
  • The scripted-out partition definition will always accurately reflect the query used for processing.

Interested in using a view? Read on for the implementation details.

Creating a table based on a view

To create a table that is based on a view, go to SSDT, open your tabular project, and then launch the Import Wizard. You have the choice of using Table Preview mode or entering a SQL query to import a table. Both of these can use views. The SQL query method is self-explanatory. Table Preview mode could use a picture.
Views are shown in the list with a special icon that’s rather difficult to describe. It looks like a table with a checkbox with dotted lines on top (not sure how this represents a view). I have circled the view in this picture:

image

Once the view is selected, you can import as usual. If you made a mistake at import time and want to update the source of a table in your model to refer to a view instead of a SQL table, this is easy to correct. Simply select the table in either the grid or diagram view, press F4 to view the property grid, then click to edit the Source Data property. You can change the table definition to refer to the view instead of the table. If you have multiple partitions, you should update the partitions in the Partition Manager as well. Redeploy your model after making this change.
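
If you need to make the same change on a model that is already deployed to a server (rather than in the SSDT project), the partition source can also be rebound in AMO. This is just a sketch that borrows the pattern from my processing posts; the view name dbo.vDemographicFacts and the partition name are assumptions, so substitute your own object names.

//connect to the server and navigate to the partition, as in the other AMO examples
Server s = new Server();
s.Connect(".\\TABULAR");
Database db = s.Databases.FindByName("hans_rosling_project");
DataSourceView dsv = db.DataSourceViews.FindByName("Sandbox");
Cube c = db.Cubes.FindByName("Model");
MeasureGroup mg = c.MeasureGroups.FindByName("Demographic Facts");
Partition p = mg.Partitions.FindByName("DemographicFacts");

//rebind the partition source to the view instead of the base table, then save the change
p.Source = new QueryBinding(dsv.DataSourceID, "SELECT * FROM [dbo].[vDemographicFacts]");
p.Update(UpdateOptions.ExpandFull);
s.Disconnect();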

Creating a partition based on a view

You can use the Partition Manager in SSDT or SSMS to create a partition based on a view, even if the other partitions in the table are not based on views. Again, this is done using a SQL query (in either SSMS or SSDT) or in preview mode in SSDT.

Processing data incrementally into a partition based on a view

Processing data incrementally into partitions that are based on views is also pretty straightforward. First, update the definition of the view in SQL so that it returns just the rows you want to load, taking care not to introduce any duplicate rows when processing. Then do a ProcessAdd in SSMS, PowerShell, AMO, or SSIS to load the data into the partition.

Using AMO to incrementally load data into partitions based on views

If you are using AMO, the C# program I previously showed for incrementally loading data changes a little bit. The only thing that changes in the AMO code is that you no longer need to specify a binding when calling ProcessAdd. This means you no longer need to care about the mysterious “Sandbox” data source view, which is nice. The AMO code looks a bit nicer; however, you do need more code overall to deal with updating the SQL view.

Here is the code:

//update the SQL Server view so that only the data for the year 1867 is returned
//requires references to Microsoft.AnalysisServices (AMO) and System.Data
OleDbConnection conn = new OleDbConnection("Provider=SQLNCLI11;Data Source=.\\PowerPivot;Initial Catalog=hans_rosling_project;Integrated Security=SSPI;Persist Security Info=false");
String sql = "ALTER VIEW [dbo].[vDemographicFacts] AS SELECT ID, Country, Year, LifeExpectancy, GDPPerCapita, ChildMortalityPerThousand, ChildrenPerWoman, CountryKey, ArmsImport, ArmsExport FROM dbo.DemographicFacts WHERE (Year = 1867)";
OleDbCommand cmd = new OleDbCommand(sql, conn);
conn.Open();
cmd.ExecuteNonQuery();
conn.Close();

//connect to Analysis Services
Server s = new Server();
s.Connect(".\\TABULAR");

//navigate to the partition we want to process
Database db = s.Databases.FindByName("hans_rosling_project");
Cube c = db.Cubes.FindByName("Model");
MeasureGroup mg = c.MeasureGroups.FindByName("Demographic Facts");
Partition p = mg.Partitions.FindByName("DemographicFacts");

//incrementally add data to the pre-existing partition
p.Process(ProcessType.ProcessAdd);
s.Disconnect();

Using SSIS to incrementally load data into partitions based on views

The Integration Services package that I previously showed for loading data changes as well. Now, you need an Execute SQL task to run before the Analysis Services Processing Task. Also, just as in the AMO example above, you no longer need to configure the Analysis Services Processing Task to use a query binding when doing the Process Add.

The new procedure for adding data incrementally looks like this:

  1. Create a new Integration Services project.
  2. Create two connection managers in the Connection Manager pane – one to the Analysis Services database you want to process, and an OLE DB connection to the data source you want to modify.
  3. Drag an Execute SQL Task from the SSIS toolbox to the Control Flow pane of the SSIS designer.
  4. Configure the Execute SQL Task to use the OLE DB connection defined in step 2, and set the SQL to ALTER VIEW [dbo].[vDemographicFacts] AS SELECT ID, Country, Year, LifeExpectancy, GDPPerCapita, ChildMortalityPerThousand, ChildrenPerWoman, CountryKey, ArmsImport, ArmsExport FROM dbo.DemographicFacts WHERE (Year = 1867).
  5. Drag an Analysis Services Processing Task from the SSIS toolbox to the Control Flow pane of the SSIS designer. Connect the output of the SQL task to the Analysis Services task.
  6. Double-click to configure the Analysis Services Processing Task. Use the Analysis Services connection you defined in step 2, then, as described in my previous post, add the partition to the list of objects to be processed by drilling down through the cube and the measure group.
  7. Set the Processing Option to Process Add for the partition, then click OK.

You can now run this package and see the view updated and data loaded incrementally.

Calling ProcessData in SSIS to incrementally load data into tabular partitions

Previously, I showed a pattern for incrementally loading data by creating a partition, doing a ProcessData on the partition, then doing a ProcessRecalc on the database all in a single transaction. It is possible to do the same thing in an Integration Services package. Let’s take a look.

We cannot use the Analysis Services Processing Task for this chore. This is because the task was not updated for SQL Server 2012, so ProcessRecalc is not an option. Therefore, if we want to use a built-in Analysis Services task, we must use the Analysis Services Execute DDL Task to do this work.

We also need to do this work transactionally. Even though it is something of a design antipattern to use transactions in an SSIS package, we must use one because we cannot leave the database in an unqueryable state while processing (recall that ProcessData renders a database unqueryable until a Recalc is performed). In theory, you could try to use the built-in transaction support in SSIS, but those transactions do not work properly with Analysis Services. The easiest way to send the commands in a single transaction is to use one Analysis Services Execute DDL Task and wrap all three commands we need inside an XMLA Batch statement.

Imagine we want to create a partition for the year 1812 on the DemographicFacts table on the Hans Rosling demo project database. Here is how to do the incremental data load:

  1. From SSDT, create a new Integration Services project.
  2. Right click in the Connection Managers pane, add a new connection to your tabular server.
  3. Drop the Analysis Services Execute DDL task into the Control Flow area (it is under “Other Tasks” in the SSIS Toolbox).
  4. Double-click to configure the task.
  5. In the dialog box that appears, click DDL.
  6. In the DDL properties, set the Connection property to the connection manager you created in step 2.
  7. Click the … button in the SourceDirect property to launch the DDL dialog box.
  8. Paste in the following XMLA command:
    
    <Batch xmlns='http://schemas.microsoft.com/analysisservices/2003/engine' Transaction='true'>    
    <!-- Create the partition for the year 1812 on the DemographicFacts table -->  
    <!-- To create the partition, alter the measure group-->  
     <Alter AllowCreate="true" ObjectExpansion="ExpandFull" xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
       <Object>
        <DatabaseID>hans_rosling_project</DatabaseID>
        <CubeID>Model</CubeID>
        <MeasureGroupID>DemographicFacts_c2655f3e-3570-4a34-92bf-3f9ed44ad24b</MeasureGroupID>
        <PartitionID>Partition 1812</PartitionID>
       </Object>
       <ObjectDefinition>
        <Partition xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200">
         <ID>Partition 1812</ID>
         <Name>Partition 1812</Name>
         <Source xsi:type="QueryBinding">
          <DataSourceID>bc946c5c-ec5b-4fbb-87d9-5e4f721e6adc</DataSourceID>
          <QueryDefinition>SELECT [dbo].[DemographicFacts].*   FROM [dbo].[DemographicFacts]  WHERE ([Year] = 1812)</QueryDefinition>
         </Source>
         <StorageMode valuens="ddl200_200">InMemory</StorageMode>
        </Partition>
       </ObjectDefinition>
      </Alter>
    
     <!-- Now do a ProcessData on the newly created partition -->
      <Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
       <Type>ProcessData</Type>
       <Object>
        <DatabaseID>hans_rosling_project</DatabaseID>
        <CubeID>Model</CubeID>
        <MeasureGroupID>DemographicFacts_c2655f3e-3570-4a34-92bf-3f9ed44ad24b</MeasureGroupID>
        <PartitionID>Partition 1812</PartitionID>
       </Object>
      </Process>
    
    <!-- Finally, do a ProcessRecalc on the database to make the DB queryable -->
      <Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
       <Type>ProcessRecalc</Type>
       <Object>
        <DatabaseID>hans_rosling_project</DatabaseID>
       </Object>
      </Process>
     </Batch>
    
  9. Click OK on all dialog boxes, then press F5 to run the package. The run should succeed, and you should be able to verify in SSMS that the partition was created, populated, and is queryable.

Let's take a closer look at that XMLA and at how the scripts were generated. There are three commands in that batch. The commands were generated by scripting out actions from three different places in the SSMS UI. I manually wrapped the commands in a batch and inserted some comments which I hope are helpful. Use the comments to find the start of each command.

The first command, the Alter, creates a partition by modifying the DemographicFacts measure group. Notice that you must refer to the measure group by ID. The ID for the measure group was automatically generated by the SSDT UI when the table was created using the Import Wizard. The partition definition is again bound using a query binding. The data source ID is the GUID for the magical “Sandbox” object, which was also automatically created by the SSDT UI. None of these IDs are exposed in the UI in SSMS or SSDT, so you will need to discover them by scripting out.

To generate script for creating a partition, do the following:

  1. Launch the Partition Manager in SSMS on the desired table.
  2. Select a partition that is bound via query binding, then click the Copy button in the toolbar.
  3. From the New Partition dialog, click the Script button. The generated script contains both the data source ID and the measure group ID you need.

That takes care of creating the partition. The next command is a ProcessData on the partition. Again, you must pass the measure group ID to do the processing.

To generate script for processing a partition:

  1. Launch the Partition Manager in SSMS on the desired table.
  2. Select a partition, click Process.
  3. Set the desired processing option in the dialog (in this case ProcessData), then script out. This script will contain the measure group ID you need for processing.

The last Process command is the ProcessRecalc on the database. Fortunately the database ID is quite readable, and also exposed via the properties page on the database, so you have some hope of actually being able to write this script yourself. Nevertheless, it is easier to generate the script by right-clicking the database in the SSMS object explorer, selecting Process, choosing ProcessRecalc from the processing options, then scripting out.

Simple, huh? Compare this to the similar code in AMO. The AMO snippet I posted was 26 lines and doesn't contain any horrible GUIDs, so stylistically it may be a preferable approach to the one I outlined above for creating the DDL. You can call AMO from Integration Services using a Script Task. However, from a performance standpoint, you might want to stick with the ugly DDL to avoid the VSTA dependency and the overhead associated with the Script Task.
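
For reference, the Script Task approach looks roughly like the sketch below. It is only a sketch, not a drop-in solution: it assumes you have added a reference to the Microsoft.AnalysisServices (AMO) assembly inside the script project, and it simply wraps the capture-log pattern from my earlier AMO post inside the Script Task's Main method.

//this goes in ScriptMain.cs of the SSIS Script Task; add a reference to Microsoft.AnalysisServices in the script project
using Microsoft.AnalysisServices;

public void Main()
{
    //same capture-log pattern as the AMO example: create the partition,
    //ProcessData it, then ProcessRecalc the database, all in one transaction
    Server s = new Server();
    s.Connect(".\\TABULAR");
    Database db = s.Databases.FindByName("hans_rosling_project");
    DataSourceView dsv = db.DataSourceViews.FindByName("Sandbox");
    MeasureGroup mg = db.Cubes.FindByName("Model").MeasureGroups.FindByName("Demographic Facts");

    s.CaptureXml = true;
    Partition p = mg.Partitions.Add("Partition 1812");
    p.Source = new QueryBinding(dsv.DataSourceID, "SELECT [dbo].[DemographicFacts].* FROM [dbo].[DemographicFacts] WHERE ([Year] = 1812)");
    p.StorageMode = StorageMode.InMemory;
    p.Update(UpdateOptions.ExpandFull);
    p.Process(ProcessType.ProcessData);
    db.Process(ProcessType.ProcessRecalc);
    s.CaptureXml = false;

    //execute the captured commands transactionally but not in parallel
    XmlaResultCollection results = s.ExecuteCaptureLog(true, false);
    s.Disconnect();

    //inspect the results collection and set Failure here if you want XMLA errors to fail the task
    Dts.TaskResult = (int)ScriptResults.Success;
}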

Also compare this to the approach for doing a ProcessAdd in SSIS. ProcessAdd is a lot neater, although there are scenario tradeoffs to be made. Comparing the two processing approaches merits a future post.

Doing a ProcessAdd in SSIS

In my post on doing a ProcessAdd in AMO, I promised something ugly in Integration Services to do a ProcessAdd. Actually I just tried it and it’s not too bad. Aesthetically, it’s as pleasing as using AMO (which is to say it’s a little painful but once you figure it out it’s maintainable). Here is how to do the same ProcessAdd with Integration Services.

ProcessAdd is one of the tabular processing operations that can be performed using the Analysis Services Processing Task, so we will use that task to load the data. The major problem with using this task in SQL Server 2012 is that there is no way to create a connection to a project in your existing solution. You therefore cannot deploy both the tabular model and the IS package together in a single solution and have the connection string get updated. Instead, if you want to use the graphical interface for the Analysis Services Processing Task, you must first connect to a live server, manipulate the model, then change the connection string manually at deployment time to point to the production server. Not ideal, but not a total blocker.

That said, let’s do the ProcessAdd.

  1. From SSDT, create a new Integration Services project.
  2. Right click in the Connection Managers pane, add a new connection to your tabular server.
  3. Drop the Analysis Services Processing Task into the Control Flow area.
  4. Double-click to configure the task.
  5. In the dialog box that appears, click Processing Settings.
  6. Set the Analysis Services connection manager to the connection you created in step 2.
  7. Click the Add button. You see a dialog like this one:
  8. As in AMO, you must drill down from the cube level to find the measure group and partition you want to process. In our case, let's go ahead and add more data to that Partition 1800 we created. Let's select that partition as pictured below, then click OK.
  9. Now we're back in the Processing Settings dialog, with Partition 1800 selected for processing. Set the Process Options for this partition to Process Add, then click the Configure link that appears.
  10. Lo and behold, a dialog appears that allows you to set the out of line query binding for the Process Add. I think we are up to the year 1803, so let's set the query binding to SELECT [dbo].[DemographicFacts].* FROM [dbo].[DemographicFacts] WHERE ([Year] = 1803), as pictured below:
  11. Click OK on all dialogs, then press F5 to execute the package. Processing should succeed, and you can now go and verify from SSMS that the data was loaded successfully.

See? Not so bad. Where it gets uglier is when you do incremental processing using the ProcessData + ProcessRecalc pattern, as I will show in a future post.

See also: Using Integration Services with the tabular model

Pasted tables – a poor woman's display folder

One limitation of the tabular model is that display folders are not supported. Another limitation of the tabular model is that you cannot add simple aggregation measures (sum, average, etc.) on the fly in pivot tables in Excel. Because all aggregations must be pre-defined in your model, the number of measures in your fact table quickly proliferates. Since neither Power View nor Excel supports searching the field list for tabular models, it rapidly becomes hard to find measures when there are many different facts in your model, and the field list becomes unusable. What to do?

One thing you can do is use empty tables to hold related measures. Measures can be placed on any table in the tabular model with no ill effect. The only difference between placing the measure directly on the fact table vs. placing the measure elsewhere is how you reference the columns and measures in the calculation – names must be fully qualified if the measure is outside the fact table. Personally, I find that a small price to pay for a more organized field list.

You can easily construct an empty table using pasted tables. This works for all models except DirectQuery models, which do not support pasted data. Here is how to do it.

Step 1: Create some dummy data for the empty table's column. This is easy enough to do in Excel; a 1×1 piece of data will do:

image

Step 2: Copy this data. If you are working in SSDT, go back to the SSDT designer window and click inside the designer. Then go to the Edit menu and select Paste:

image

Step 3: The Paste Preview dialog box appears. Name your “display folder”, ensure the “Use first row as column headers” box is checked, and click OK.

image

Step 4: To ensure this dummy column is not visible to end users, hide it by right-clicking the column and selecting “Hide from Client Tools”.

Step 5: Add measures. One thing to note: although you can copy and paste cells within the measure grid, it only works for cells in the same table. You cannot right-click a cell in the measure grid, copy it, and paste it into another table. If you are migrating measures from your fact table to the new table, you will have to select the text of the measure in the formula bar, copy the text, and paste it into the formula bar in the new table. Make sure you fully qualify any column or measure names when you move measures from table to table. You might find it easier to use the DAX Editor to move the measures instead.

Step 6: Open Excel or Power View. Enjoy your newly organized measures.

Basic date handling in DAX

One of the things that you learn very quickly in the tabular model is that you always need to have a date table. I thought that for my Hans Rosling project I could get away without one, since the data is captured only annually. Surely you should be able to do year-over-year calculations and get the previous year in DAX from the row context, right? Wrong. After much trial and error (read: error), I made the date table and got the calculations to work.

There are a couple of things to note about the date table. If your date requirements are simple enough (and mine are as simple as you can get), there is no need to have a date table in the data source. You can just construct it in a SQL query. Also, your date key cannot be a calculated column. In my case, it was therefore necessary to manufacture a key as part of the table definition and use it when marking the date table.

The SQL for my table definition is indeed very basic:

select distinct d.Year,
DateKey=CAST(DATEFROMPARTS(d.Year, 1, 1) AS DATE)
from dbo.DemographicFacts d

The next step is to explicitly mark this table as a date table. You can find the “Mark as Date Table” button easily enough in the PowerPivot ribbon. In SSDT (formerly BIDS), you can find this feature under Table -> Date -> Mark as Date Table. In the dialog box that appears, select DateKey and click OK.

A few more modeling tasks remain. Hide the DateKey column by right-clicking it and selecting “Hide from Client Tools”. Finally, create the relationship between the date table and the Demographic Facts table. You can create the relationship on the Year column with no ill effects. The time intelligence functions still work.

Now, to write the DAX formulas. My task was to fetch the data for the previous year so I could do a year-over-year calculation. In my previous post on learning DAX I linked to this post on using time intelligence functions. Armed with the information on that page, I could solve my problem very simply using the DATEADD() function.

For example, to calculate the year-over-year growth in Average GDP Per Capita, I defined these measures:

Avg GDP Per Capita:=AVERAGE('Demographic Facts'[GDP Per Capita])
Avg GDP Previous Year:=CALCULATE([Avg GDP Per Capita], DATEADD('Date'[DateKey], -1, YEAR))
YOY Growth GDP:=IF([Avg GDP Per Capita]&&[Avg GDP Previous Year], ([Avg GDP Per Capita]-[Avg GDP Previous Year])/[Avg GDP Previous Year], BLANK())

And there you have it: basic date/time intelligence. Unfortunately, in performing these calculations I found that the data did not support my hypothesis (that there would be a relationship between GDP growth, growth in life expectancy, and reduction in fertility). DAX won't help me make better hypotheses; I still have to do that sort of thinking myself. On to guessing other relationships and finding more supporting data…

To master the tabular model, master DAX

The AdventureWorks tutorial for tabular models may be good for some things (especially for teaching row security), but one thing it is not good at is teaching DAX. Learning DAX is an absolute must before you can be successful at tabular modeling. One thing I get asked about is where people can start with learning DAX. Resources are scattered all over the place, so it’s hard to isolate just one to get started with. Here are some resources that I have found helpful.

Owen Duncan on the Analysis Services documentation team just put up a DAX quick start guide, which introduces some core DAX concepts in a 30 minute overview.

Then, once you have looked at the very basics, you can buy the book PowerPivot for Excel 2010: Give Your Data Meaning by Marco Russo and Alberto Ferrari. Although marketed as a PowerPivot book, it is incredibly useful for tabular modelers. This book can sit on your lap while you mentally CALCULATE() your FILTER() context in ALL() scenarios. The book does not include all the DAX features available in SQL Server 2012, so you will need to supplement it with more material.

Once you have the book, move to the blogs. The DAX whitepaper for SQL Server 2012 is now available, which is great, but I personally find the white paper hard to wrap my head around. I find the blogs much more accessible. One nice thing that the MSDN team did was aggregate a list of DAX blogs and learning resources on a page called the DAX Resource Center. This page serves as a clearinghouse for DAX blog posts from people like Rob Collie, Chris Webb, Kasper de Jonge, Marco Russo, and Alberto Ferrari. I don't suggest reading these in one sitting, but they are good resources to have on hand when you are trying to solve problems.

Speaking of solving problems: there are a couple of problems that show up time and time again in DAX. One is time intelligence, and the other is many-to-many relationships. You can get started with time intelligence using this post on time intelligence on the Analysis Services team blog. You will also want to check out Kasper's post on what is new for time intelligence in SQL Server 2012. For many-to-many relationships, check out The Many-to-Many Revolution white paper by Marco and Alberto.

Finally, a couple of videos. Last year, Kasper and Ashvini Sharma did a DAX talk at TechEd. Alberto also did a nice talk on many-to-many relationships at SQLBits 9. Both are well worth watching for the in-depth explanations.

This should be enough to get you up and running with a DAX side project (or even a DAX real project!). Ultimately, the best DAX resource is trial and error: there is no replacement for writing the formulas yourself against your own data sets. Although DAX looks kind of friendly up front, it gets pretty gnarly pretty fast. It's worth taking some learning time on a side project to come up to speed before attempting a full-on implementation.

Processing data transactionally in AMO

There are a few processing commands that leave the tabular model in an unqueryable state until a ProcessRecalc is done at the database level. These commands are ProcessData and ProcessClear. Also, merging partitions in code or script leaves the model unqueryable until a ProcessRecalc occurs. Be mindful of this when implementing your automation for processing. One common processing pattern is to create a new partition, ProcessData on the partition, then ProcessRecalc on the database. If you want your model to be queryable at all times (and you probably do), you must perform this task transactionally.

I will show this transactional processing pattern in AMO on the Hans Rosling project database. Recall the previous demonstration of creating a partition and processing it. Imagine now we want a partition for the year 1802, created and populated in a single transaction. To do this in AMO, we capture the commands in a capture log, then execute the commands in one shot. This technique was illustrated in a code sample by Adrian Dumitrascu in 2006 in a Usenet post (remember Usenet?). The technique still works with tabular. Now let’s look at the code to use it with the ProcessData/ProcessRecalc pattern.

//connect to server, substitute the server/instance name with yours as necessary
Server s=new Server();
s.Connect(".\\TABULAR");

//navigate to the Demographic Facts measure group, we must add the partition to the measure group
Database db = s.Databases.FindByName("hans_rosling_project");
DataSourceView dsv = db.DataSourceViews.FindByName("Sandbox");
Cube c = db.Cubes.FindByName("Model");
MeasureGroup mg = c.MeasureGroups.FindByName("Demographic Facts");

//enter capture mode
s.CaptureXml = true;

//log the commands we want to perform - create partition, Process Data, Process Recalc
Partition p = mg.Partitions.Add("Partition 1802");
p.Source = new QueryBinding(dsv.DataSourceID, "SELECT [dbo].[DemographicFacts].*   FROM [dbo].[DemographicFacts]  WHERE ([Year] = 1802)");
p.StorageMode = StorageMode.InMemory;
p.Update(UpdateOptions.ExpandFull);
p.Process(ProcessType.ProcessData);
db.Process(ProcessType.ProcessRecalc);

//stop logging
s.CaptureXml = false;

//execute commands transactionally but not in parallel
XmlaResultCollection results = s.ExecuteCaptureLog(true, false);

We’ve seen the code to navigate to the measure group several times before. The new code, which starts logging the commands, begins with the s.CaptureXml line. When we finally go to execute the captured commands by calling ExecuteCaptureLog, we make sure we set the first parameter to true so that the commands are transactional. In this case, it doesn’t really matter if you set the second parameter to true or false – the engine will look at this particular processing job and schedule it in sequence anyway. I set the parameter to false to make it clearer that the code is sequential.

If you are using this code, be sure to leave lots of comments. This particular piece of code is not self-documenting. Nowhere here is there any method or enum called something helpful like “Transaction” or “Batch”. AMO is what it is at this point, so we just need to learn to live with its quirks.

See also: Process Add example. Although Process Add does not require a Recalc, the query binding and the Process Add should happen in a single transaction as well. I will leave improving that example as an exercise for the reader.
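
If you want a head start on that exercise, one way to read it is to push the bound ProcessAdd through the same capture log so that it is submitted as a single transactional batch. The following is an untested sketch only; it assumes the out-of-line query binding serializes into the capture log the same way the other commands do.

//connect and navigate to the partition, as in the ProcessAdd example
Server s = new Server();
s.Connect(".\\TABULAR");
Database db = s.Databases.FindByName("hans_rosling_project");
DataSourceView dsv = db.DataSourceViews.FindByName("Sandbox");
MeasureGroup mg = db.Cubes.FindByName("Model").MeasureGroups.FindByName("Demographic Facts");
Partition p = mg.Partitions.FindByName("Partition 1800");

//capture the bound ProcessAdd instead of executing it immediately
s.CaptureXml = true;
p.Process(ProcessType.ProcessAdd, new QueryBinding(dsv.DataSourceID, "SELECT [dbo].[DemographicFacts].* FROM [dbo].[DemographicFacts] WHERE ([Year] = 1801)"));
s.CaptureXml = false;

//execute the captured command transactionally but not in parallel
XmlaResultCollection results = s.ExecuteCaptureLog(true, false);
s.Disconnect();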

Doing a ProcessAdd in AMO

You can use ProcessAdd to add more data to an existing partition. There are a few ways to use ProcessAdd:

  • Add a new query binding, then process the query binding
  • Change the partition definition query to reflect the rows you want to add, then do a ProcessAdd on the partition
  • If your partition definition query is based on a view, update the definition of the view in the data source to reflect the rows that you want to add, then do a ProcessAdd on the partition

Unfortunately, unlike multidimensional, the tabular UI does not have a place to define a new query binding for a partition. If you want to use the query binding method, you need to do it using automation or script.

Adding the new query binding and doing the ProcessAdd in AMO is pretty simple. Recall my previous AMO example for creating a new partition. In that example, we created a new partition called “Partition 1800” which added the data for the year 1800. Now imagine we wanted to update this partition and add the data for the year 1801 using a query binding. Here is the code:

//connect to server, substitute the server/instance name with yours as necessary
Server s=new Server();
s.Connect(".\\TABULAR");

//navigate to the partition we want to process
Database db = s.Databases.FindByName("hans_rosling_project");
DataSourceView dsv = db.DataSourceViews.FindByName("Sandbox");
Cube c = db.Cubes.FindByName("Model");
MeasureGroup mg = c.MeasureGroups.FindByName("Demographic Facts");
Partition p = mg.Partitions.FindByName("Partition 1800");

//do the ProcessAdd for the year 1801, defining the new query binding inline. then disconnect
p.Process(ProcessType.ProcessAdd, new QueryBinding(dsv.DataSourceID, "SELECT [dbo].[DemographicFacts].*   FROM [dbo].[DemographicFacts]  WHERE ([Year] = 1801)" ));
s.Disconnect();

Once you navigate to the right place, it is one line to do the ProcessAdd, including the definition of the new query binding. This is neater than any solution using XMLA.

One thing to note about this example: you will see that there is no ProcessRecalc after the ProcessAdd. As it turns out, ProcessAdd does the necessary recalculations so the database is queryable after this operation completes. Therefore it is not necessary to wrap this statement in a batch or to do the recalc.

Of course, the neatest solution of all is to update the definition of your view and then do a ProcessAdd on that. That is especially true if you are using Integration Services, as I will demonstrate in a future post.