
Doing a ProcessAdd in SSIS

In my post on doing a ProcessAdd in AMO, I promised something ugly in Integration Services to do a ProcessAdd. Actually I just tried it and it’s not too bad. Aesthetically, it’s as pleasing as using AMO (which is to say it’s a little painful but once you figure it out it’s maintainable). Here is how to do the same ProcessAdd with Integration Services.

ProcessAdd is one of the tabular processing operations that can be performed using the Analysis Services Processing Task, so we will use it to load the data. The major problem with using this task in SQL Server 2012 is that there is no way to create a connection to a tabular project in your existing solution. You therefore cannot deploy the tabular model and the IS package together in a single solution and have the connection string updated automatically. Instead, if you want to use the graphical interface for the Analysis Services Processing Task, you must first connect to a live server, configure the task against that deployed model, then change the connection string manually at deployment time to point to the production server. Not ideal, but not a total blocker.
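For reference, the connection manager just holds an ordinary Analysis Services OLE DB connection string, so the deployment-time edit is a one-liner. A rough sketch of what the production value might look like (the server name is a placeholder; the database name assumes the Hans Rosling project database used elsewhere in these posts):

Provider=MSOLAP.5;Data Source=PRODSERVER\TABULAR;Initial Catalog=hans_rosling_project;Integrated Security=SSPI;

Only the Data Source (and possibly the Initial Catalog) should need to change between environments.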

That said, let’s do the ProcessAdd.

  1. From SSDT, create a new Integration Services project.
  2. Right-click in the Connection Managers pane and add a new connection to your tabular server.
  3. Drop the Analysis Services Processing Task into the Control Flow area.
  4. Double-click to configure the task.
  5. In the dialog box that appears, click Processing Settings.
  6. Set the Analysis Services connection manager to the connection you created in step 2.
  7. Click the Add button. A dialog appears listing the objects available for processing.
  8. As in AMO, drill down from the cube level to find the measure group and partition you want to process. In our case, let's add more data to the Partition 1800 we created earlier. Select that partition and click OK.
  9. Back in the Processing Settings dialog, Partition 1800 is now selected for processing. Set the Process Options for this partition to Process Add, then click the Configure link that appears.
  10. Lo and behold, a dialog appears that allows you to set the out of line query binding for the Process Add. I think we are up to the year 1803, so set the query binding to SELECT [dbo].[DemographicFacts].* FROM [dbo].[DemographicFacts] WHERE ([Year] = 1803).
  11. Click OK on all dialogs, then press F5 to execute the package. Processing should succeed, and you can now go and verify from SSMS that the data was loaded successfully.
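If you want to check the result without browsing the model, you can run a quick DAX query from an MDX query window in SSMS. A minimal sketch, assuming the fact table is exposed in the model as 'Demographic Facts' with a Year column, as in the Hans Rosling project:

EVALUATE
ROW(
    "Rows for 1803",
    CALCULATE(COUNTROWS('Demographic Facts'), 'Demographic Facts'[Year] = 1803)
)

If the row count comes back non-zero (and matches the relational source), the ProcessAdd worked.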

See? Not so bad. Where it gets uglier is when you do incremental processing using the ProcessData + ProcessRecalc pattern, as I will show in a future post.

See also: Using Integration Services with the tabular model

Pasted tables–a poor woman’s display folder

One limitation of the tabular model is that display folders are not supported. Another is that you cannot add simple aggregation measures (sum, average, etc.) on the fly in Excel pivot tables. Because all aggregations must be pre-defined in your model, the number of measures on your fact table quickly proliferates. Since neither Power View nor Excel supports searching the field list for tabular models, it rapidly becomes hard to find anything when your model contains many different facts, and the field list becomes unusable. What to do?

One thing you can do is use empty tables to hold related measures. Measures can be placed on any table in the tabular model with no ill effect. The only difference between placing a measure directly on the fact table and placing it elsewhere is how you reference columns in the calculation – column references must be fully qualified with the table name when the measure lives outside the fact table. Personally, I find that a small price to pay for a more organized field list.
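For example, the Avg GDP Per Capita measure from the Hans Rosling model works just as well on an empty "display folder" table as on the fact table; the only change is the table qualifier on the column reference. A quick sketch, assuming the fact table is exposed as 'Demographic Facts'. On the fact table itself, this is enough:

Avg GDP Per Capita:=AVERAGE([GDP Per Capita])

On any other table, the column reference must carry the table name:

Avg GDP Per Capita:=AVERAGE('Demographic Facts'[GDP Per Capita])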

You can easily construct an empty table using pasted tables. This works for all models except DirectQuery models, which do not support pasted data. Here is how to do it.

Step 1: Create some dummy data for the empty table's column. This is easy enough to do in Excel; a 1×1 piece of data will do.

Step 2: Copy this data. If you are working in SSDT, go back to the designer window and click inside the designer, then go to the Edit menu and select Paste.

Step 3: The Paste Preview dialog box appears. Name your “display folder”, ensure the “Use first row as column headers” box is checked, and click OK.

Step 4: To ensure this dummy column is not visible to end users, hide it by right-clicking the column and selecting "Hide from Client Tools".

Step 5: Add measures. One thing to note: although you can copy and paste cells in the measure grid, it only works within a single table. You cannot right-click a cell in the measure grid, copy it, and paste it into another table. If you are migrating measures from your fact table to the new table, you will have to select the text of the measure in the formula bar, copy it, and paste it into the formula bar of the new table. Make sure you fully qualify any column references when you move measures from table to table. You might find it easier to use the DAX Editor to move the measures instead.

Step 6: Open Excel or Power View. Enjoy your newly organized measures.

Basic date handling in DAX

One of the things that you learn very quickly in the tabular model is that you always need to have a date table. I thought that for my Hans Rosling project I could get away without one, since the data is captured only annually. Surely you should be able to do year-over-year calculations and get the previous year in DAX from the row context, right? Wrong. After much trial and error (read: error), I made the date table and got the calculations to work.

There are a couple of things to note about the date table. If your date requirements are simple enough (and mine are as simple as you can get), there is no need to have a date table in the data source; you can just construct it in a SQL query. Also, your date key cannot be a calculated column. In my case, it was therefore necessary to manufacture a key as part of the table definition and use it when marking the date table.

The SQL for my table definition is indeed very basic:

select distinct d.Year,
DateKey=CAST(DATEFROMPARTS(d.Year, 1, 1) AS DATE)
from dbo.DemographicFacts d

The next step is to explicitly mark this table as a date table. You can find the "Mark as Date Table" command easily enough in the PowerPivot ribbon. In SSDT (formerly BIDS), the feature is at Table->Date->Mark as Date Table. In the dialog box that appears, select DateKey and click OK.

A few more modeling tasks remain. Hide the DateKey column by right-clicking it and selecting “Hide from Client Tools”. Finally, create the relationship between the date table and the Demographic Facts table. You can create the relationship on the Year column with no ill effects. The time intelligence functions still work.

Now, to write the DAX formulas. My task was to fetch the data for the previous year so I could do a year-over-year calculation. In my previous post on learning DAX I linked to this post on using time intelligence functions. Armed with the information on that page, I could solve my problem very simply using the DATEADD() function.

For example, to calculate the year-over-year growth in Average GDP Per Capita, I defined these measures:

Avg GDP Per Capita:=AVERAGE('Demographic Facts'[GDP Per Capita])
Avg GDP Previous Year:=CALCULATE([Avg GDP Per Capita], DATEADD('Date'[DateKey], -1, YEAR))
YOY Growth GDP:=IF([Avg GDP Per Capita] && [Avg GDP Previous Year], ([Avg GDP Per Capita] - [Avg GDP Previous Year]) / [Avg GDP Previous Year], BLANK())

And there you have it, basic date/time intelligence. Unfortunately, in performing these calculations I found that the data did not support my hypothesis (that there would be a relationship between GDP growth, growth in life expectancy, and reduction in fertility). DAX won't help me make better hypotheses; I still have to do that sort of thinking myself. On to guessing other relationships and finding more supporting data…

To master the tabular model, master DAX

The AdventureWorks tutorial for tabular models may be good for some things (especially for teaching row security), but one thing it is not good at is teaching DAX. Learning DAX is an absolute must before you can be successful at tabular modeling. One thing I get asked about is where people can start with learning DAX. Resources are scattered all over the place, so it’s hard to isolate just one to get started with. Here are some resources that I have found helpful.

Owen Duncan on the Analysis Services documentation team just put up a DAX quick start guide, which introduces some core DAX concepts in a 30 minute overview.

Then, once you have looked at the very basics, you can buy the book PowerPivot for Excel 2010: Give Your Data Meaning by Marco Russo and Alberto Ferrari. Although marketed as a PowerPivot book, it is incredibly useful for tabular modelers. This book can sit on your lap while you mentally CALCULATE() your FILTER() context in ALL() scenarios. The book does not cover all of the DAX features available in SQL Server 2012, so you will need to supplement it with more material.

Once you have the book, move to the blogs. The DAX white paper for SQL Server 2012 is now available, which is great, but I personally find the white paper hard to wrap my head around. I find the blogs much more accessible. One nice thing that the MSDN team did was aggregate a list of DAX blogs and learning resources on a page called the DAX resource centre. This page serves as a clearing house for DAX blog posts from people like Rob Collie, Chris Webb, Kasper de Jonge, Marco Russo, and Alberto Ferrari. I don't suggest reading these in one sitting, but they are good resources to have on hand when you are trying to solve problems.

Speaking of solving problems: there are a couple of problems that show up time and time again in DAX. One is time intelligence, and the other is many-to-many relationships. You can get started with time intelligence using this post on time intelligence on the Analysis Services team blog. You will also want to check out Kasper's post on what is new for time intelligence in SQL Server 2012. For many-to-many relationships, check out The Many-to-Many Revolution white paper by Marco and Alberto.

Finally, a couple of videos. Last year, Kasper and Ashvini Sharma did a DAX talk at TechEd. Also, Alberto did a nice talk on many-to-many relationships at SQLBits 9. Both are well worth watching for the in-depth explanations.

This should be enough to get you up and running with a DAX side project (or even a DAX real project!). Ultimately, the best DAX resource is trial and error – there is no replacement for writing the formulas yourself against your own data sets. Although DAX looks kind of friendly up front, it gets pretty gnarly pretty fast. It's worth taking some learning time on a side project to come up to speed before attempting a full-on implementation.

Processing data transactionally in AMO

There are a few processing commands that leave the tabular model in an unqueryable state until a ProcessRecalc is done at the database level. These commands are ProcessData and ProcessClear. Also, merging partitions in code or script leaves the model unqueryable until a ProcessRecalc occurs. Be mindful of this when implementing your automation for processing. One common processing pattern is to create a new partition, ProcessData on the partition, then ProcessRecalc on the database. If you want your model to be queryable at all times (and you probably do), you must perform this task transactionally.

I will show this transactional processing pattern in AMO on the Hans Rosling project database. Recall the previous demonstration of creating a partition and processing it. Imagine now we want a partition for the year 1802, created and populated in a single transaction. To do this in AMO, we capture the commands in a capture log, then execute the commands in one shot. This technique was illustrated in a code sample by Adrian Dumitrascu in 2006 in a Usenet post (remember Usenet?). The technique still works with tabular. Now let’s look at the code to use it with the ProcessData/ProcessRecalc pattern.

//requires a project reference to the AMO assembly
using Microsoft.AnalysisServices;

//connect to server, substitute the server/instance name with yours as necessary
Server s = new Server();
s.Connect(".\\TABULAR");

//navigate to the Demographic Facts measure group, we must add the partition to the measure group
Database db = s.Databases.FindByName("hans_rosling_project");
DataSourceView dsv = db.DataSourceViews.FindByName("Sandbox");
Cube c = db.Cubes.FindByName("Model");
MeasureGroup mg = c.MeasureGroups.FindByName("Demographic Facts");

//enter capture mode
s.CaptureXml = true;

//log the commands we want to perform - create partition, Process Data, Process Recalc
Partition p = mg.Partitions.Add("Partition 1802");
p.Source = new QueryBinding(dsv.DataSourceID, "SELECT [dbo].[DemographicFacts].*   FROM [dbo].[DemographicFacts]  WHERE ([Year] = 1802)");
p.StorageMode = StorageMode.InMemory;
p.Update(UpdateOptions.ExpandFull);
p.Process(ProcessType.ProcessData);
db.Process(ProcessType.ProcessRecalc);

//stop logging
s.CaptureXml = false;

//execute commands transactionally but not in parallel
XmlaResultCollection results = s.ExecuteCaptureLog(true, false);

We’ve seen the code to navigate to the measure group several times before. The new code, which starts logging the commands, begins with the s.CaptureXml line. When we finally go to execute the captured commands by calling ExecuteCaptureLog, we make sure we set the first parameter to true so that the commands are transactional. In this case, it doesn’t really matter if you set the second parameter to true or false – the engine will look at this particular processing job and schedule it in sequence anyway. I set the parameter to false to make it clearer that the code is sequential.

If you are using this code, be sure to leave lots of comments. This particular piece of code is not self-documenting. Nowhere here is there any method or enum called something helpful like “Transaction” or “Batch”. AMO is what it is at this point, so we just need to learn to live with its quirks.

See also: Process Add example. Although Process Add does not require a Recalc, the query binding and the Process Add should happen in a single transaction as well. I will leave improving that example as an exercise for the reader.
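If you want a starting point for that exercise, one possible shape for it is simply to capture the ProcessAdd with its query binding and execute the capture log transactionally, reusing the names from the ProcessAdd example below. A sketch, not a finished solution:

//requires a project reference to the AMO assembly
using Microsoft.AnalysisServices;

//connect and navigate to the partition, as in the ProcessAdd example
Server s = new Server();
s.Connect(".\\TABULAR");
Database db = s.Databases.FindByName("hans_rosling_project");
DataSourceView dsv = db.DataSourceViews.FindByName("Sandbox");
Cube c = db.Cubes.FindByName("Model");
MeasureGroup mg = c.MeasureGroups.FindByName("Demographic Facts");
Partition p = mg.Partitions.FindByName("Partition 1800");

//capture the ProcessAdd with its out of line query binding instead of executing it immediately
s.CaptureXml = true;
p.Process(ProcessType.ProcessAdd, new QueryBinding(dsv.DataSourceID, "SELECT [dbo].[DemographicFacts].* FROM [dbo].[DemographicFacts] WHERE ([Year] = 1801)"));
s.CaptureXml = false;

//execute the captured command transactionally, then disconnect
XmlaResultCollection results = s.ExecuteCaptureLog(true, false);
s.Disconnect();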

Doing a ProcessAdd in AMO

You can use ProcessAdd to add more data to an existing partition. There are a few ways to use ProcessAdd:

  • Add a new query binding, then process the query binding
  • Change the partition definition query to reflect the rows you want to add, then do a ProcessAdd on the partition
  • If your partition definition query is based on a view, update the definition of the view in the data source to reflect the rows that you want to add, then do a ProcessAdd on the partition

Unfortunately, unlike multidimensional, the tabular UI does not have a place to define a new query binding for a partition. If you want to use the query binding method, you need to do it using automation or script.

Adding the new query binding and doing the ProcessAdd in AMO is pretty simple. Recall my previous AMO example for creating a new partition. In that example, we created a new partition called “Partition 1800” which added the data for the year 1800. Now imagine we wanted to update this partition and add the data for the year 1801 using a query binding. Here is the code:

//requires a project reference to the AMO assembly
using Microsoft.AnalysisServices;

//connect to server, substitute the server/instance name with yours as necessary
Server s = new Server();
s.Connect(".\\TABULAR");

//navigate to the partition we want to process
Database db = s.Databases.FindByName("hans_rosling_project");
DataSourceView dsv = db.DataSourceViews.FindByName("Sandbox");
Cube c = db.Cubes.FindByName("Model");
MeasureGroup mg = c.MeasureGroups.FindByName("Demographic Facts");
Partition p = mg.Partitions.FindByName("Partition 1800");

//do the ProcessAdd for the year 1801, defining the new query binding inline. then disconnect
p.Process(ProcessType.ProcessAdd, new QueryBinding(dsv.DataSourceID, "SELECT [dbo].[DemographicFacts].*   FROM [dbo].[DemographicFacts]  WHERE ([Year] = 1801)" ));
s.Disconnect();

Once you navigate to the right place, it is one line to do the ProcessAdd, including the definition of the new query binding. This is neater than any solution using XMLA.

One thing to note about this example: you will see that there is no ProcessRecalc after the ProcessAdd. As it turns out, ProcessAdd does the necessary recalculations so the database is queryable after this operation completes. Therefore it is not necessary to wrap this statement in a batch or to do the recalc.

Of course, the neatest solution of all is to update the definition of your view and then do a ProcessAdd on that. That is especially true if you are using Integration Services, as I will demonstrate in a future post.

Building a BI Portal for the Tabular Model

Thanks to the folks that braved the snow and freezing rain to see the session on building a BI portal for the tabular model. Here is the deck: Building a BI Portal.

There is some reference material in this deck that is useful to those who were not there this morning. What I did was walk through four major Microsoft reporting/data visualization tools (Excel, Power View, Reporting Services, and PerformancePoint) and look at the strengths and weaknesses of each. I sliced the decision making on two axes: technological considerations (requirements such as SharePoint, mobile, ad hoc reporting, etc.) and data visualization capabilities (availability of the different types of charts and graphs, Analysis Services capabilities, etc.) in each tool. A number of slides stack the four tools against each other in each of these areas and compare and contrast their capabilities in one place. Check it out, and if there are questions let me know.

Bottom line: Power View just adds to the offerings in the Microsoft BI stack. No existing tools, including older tools like PerformancePoint, are killed or displaced by Power View. In some ways, Power View actually makes BI pros' jobs harder. The slick data visualizations that people want to see front and centre are actually incredibly difficult to integrate into a full-fledged BI dashboard, as there is no custom web part for Power View and no way to link in child reports from Power View reports. BI pros wanting to make use of the full set of reporting capabilities available for the tabular model will have to do a delicate and creative dance between four tools and try to glue the bits and pieces together.

This presentation was based on the data set for the Hans Rosling project. I have extended this data set and added arms import and export data (to get some additive measures); I will post an updated data set in the future. I will also post some DAX learning resources for building the tabular model driving the Hans Rosling project. Links to the date/time functions in particular will be very helpful for those approaching DAX for the first time.

Looking forward to doing this one again at SQLBits.