Skip to content

Process Defrag

April 30, 2012

As I mentioned in my previous post on processing tabular models, it is a good idea to do a Process Defrag on your tables if you are doing incremental processing. I didn’t really explain why before. Now is the time.

First, think about how the xVelocity in-memory analytics engine (VertiPaq engine) processes and stores your data. There is a dictionary, stored in memory, that tells the xVelocity engine where to find your data. That dictionary is contains encoded entries, which were encoded using hash encoding (for strings) or value encoding. The more data that the engine can see at encoding time, the better the compression and the smaller the dictionary. If you do a lot of incremental processing, encoding can be done inefficiently (since engine can’t see all data at once and choose a smarter encoding method) and the dictionary can get a lot bigger for no reason. Also, when you clear data from your tables, the corresponding dictionary entries are not removed. Over time, cruft accumulates in the dictionary, thus taking up precious memory.

There are a few ways to figure out how much room your tables are using:

  • Use Kasper’s memory tracking workbook.
  • Use a DMV query. The most useful one in this scenario is DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS.
  • Look on disk. For example, on a 64-bit OS with default Data directory location for an instance called .\Tabular, I can look for my database in C:\Program Files\Microsoft SQL Server\MSAS11.TABULAR\OLAP\Data\ and navigate the directories to the table that you want to monitor. For example, if I wanted to monitor the FactInvoice table of my TabularProject20, I would navigate to C:\Program Files\Microsoft SQL Server\MSAS11.TABULAR\OLAP\Data\TabularProject20.0.db, right-click the FactInvoice<blah> folder, and then click Properties to see the file size.

There are two ways to get rid of cruft – either Process Full, which lets the engine see all the data at encoding time and thus do a good job of compressing it, or process defrag.

Let’s see the effect of a Process Defrag. Here is a table that I have been processing. I did some Process Add, Process Data, Process Clear, delete partitions, merge partitions, etc – probably 10 incremental processing operations in all. Here’s my file size for the dimension – 118 MB for two partitions on a table with 1.7 million rows.

image

After a Process Defrag, look at the results:

image

The table is down to 55 MB, under half its original size! Defrag is worth its weight in gold. Just make sure you have enough RAM to complete the defrag operation – because you are dealing with uncompressed data, you will need at least 2x the size of the table you are defragmenting available in RAM or else the defrag will fail with an out of memory error. For this example, with our 118 MB table we should have 236 MB free RAM before we try the defrag.

Of course no processing example would be complete without some automation examples. Here is the AMO code for a Process Defrag:

//connect to Analysis Services
Server s = new Server();
s.Connect(".\\TABULAR");

//process defrag the Demographic Facts dimension
Database db = s.Databases.FindByName("hans_rosling_project");
Dimension dim = db.Dimensions.FindByName("Demographic Facts");
dim.Process(ProcessType.ProcessDefrag);

s.Disconnect();

Compared to some of our previous AMO samples, this is pretty clean. No Sandboxes, no GUIDs, just get the table, defrag it, and get out. Nice.

You can do a ProcessDefrag from PowerShell using the Invoke-ProcessDimension cmdlet. You cannot do a ProcessDefrag from the Analysis Services Processing Task in SSIS, as the option is not exposed in the UI. You can, however, use the Analysis Services DDL task to do the defrag. Here’s some example XMLA:


  <Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Type>ProcessDefrag</Type>
  
    TabularProject20
    FactInvoice_c0bedb71-3d9d-4163-9b05-a53a128c64d2
  
</Process>  

Moral of the story – it is really important that a good defrag is part of your processing routine. Make sure you do it before it’s too late, and you run out of RAM to clean up after yourself.

About these ads
2 Comments
  1. Chris permalink

    thanks for the tip

  2. One addition. Be sure to run ProcessRecalc after running ProcessDefrag.

    I’m sure in your partitioning scenario above the ProcessDefrag did shrink the sizes of some files, but I suspect that a large part of why it saved so much disk space is that you haven’t rerun ProcessRecalc yet. Probably best to do that in the same transaction so that your model isn’t left unquerable.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 35 other followers

%d bloggers like this: