In part 1 of this article, I showed how to use a hierarchy in a pivot table when row security is applied to a tabular model. In that example, I showed you a measure that displays the head count for all employees in the organization, regardless of the security model.
In this post, I will show you how to blank out the totals for the bosses instead of displaying them. Doing this with headcount is a bit of a contrived example, as it shouldn’t be a secret how many people work in a company. A better scenario is something like salary information, but head count will have to do for now. Again, this is complicated. You have been warned.
As a prerequisite to this calculation, you must first construct the calculated columns L1HeadCount through L6HeadCount as instructed in Part 1.
Next, we will construct seven measures. The first six check which level of the hierarchy is currently showing in the pivot table. The seventh performs the head count calculation itself.
The first measure looks like this:
Is L1 Filtered:=
IF (
    HASONEVALUE ( Employees[Level1] ),
    NOT ( ISBLANK ( VALUES ( Employees[Level1] ) ) )
        && ISBLANK (
            COUNTROWS (
                FILTER (
                    ALL ( Employees ),
                    [UserAliasWithDomain] = VALUES ( Employees[Level1] )
                )
            )
        ),
    FALSE ()
)
This measure checks whether the selected level in the hierarchy has been filtered out by row security. It returns TRUE if there is a single level selected in the hierarchy and that level has been filtered out by row security, and FALSE otherwise. The COUNTROWS() FILTER() bit is what checks whether the level in the hierarchy has been affected by row security. I can't give a more detailed explanation of why each part is necessary, because I wrote this measure at the end of March and I have forgotten the details. Needless to say, it's not self-evident.
You can now add five more measures, Is L2 Filtered through Is L6 Filtered, simply substituting [Level2] through [Level6] for the column name [Level1].
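For example, a Level2 version of this measure (a straightforward substitution of [Level2] for [Level1], following the pattern above) would look like this:

```dax
Is L2 Filtered:=
IF (
    HASONEVALUE ( Employees[Level2] ),
    NOT ( ISBLANK ( VALUES ( Employees[Level2] ) ) )
        && ISBLANK (
            COUNTROWS (
                FILTER (
                    ALL ( Employees ),
                    [UserAliasWithDomain] = VALUES ( Employees[Level2] )
                )
            )
        ),
    FALSE ()
)
```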
Now, you can do the final calculation for the filtered head count:
Filtered Head Count:=
IF (
    ISFILTERED ( [Level6] ) && NOT ( [Is L6 Filtered] ), MAX ( [L6HeadCount] ),
    IF (
        ISFILTERED ( [Level5] ) && NOT ( [Is L5 Filtered] ), MAX ( [L5HeadCount] ),
        IF (
            ISFILTERED ( [Level4] ) && NOT ( [Is L4 Filtered] ), MAX ( [L4HeadCount] ),
            IF (
                ISFILTERED ( [Level3] ) && NOT ( [Is L3 Filtered] ), MAX ( [L3HeadCount] ),
                IF (
                    ISFILTERED ( [Level2] ) && NOT ( [Is L2 Filtered] ), MAX ( [L2HeadCount] ),
                    IF ( NOT ( [Is L1 Filtered] ), MAX ( [L1HeadCount] ), BLANK () )
                )
            )
        )
    )
)
This measure says: if there is a single level of the hierarchy showing in the pivot table, and that level has not been filtered out by row security, show the head count; otherwise, show a blank. Test this out the same way you tested Part 1, and you'll see that it works as advertised.
Please, if you can optimize this DAX, post something better in the comments. Otherwise, copy/paste it into your model and just fiddle around with it blindly until something works.
Fair warning: this is going to be complicated. In this post, I am going to assume that you've read the row security whitepaper, you are familiar with how parent/child hierarchies work in the tabular model, and you can deal with fairly complicated DAX concepts. For a primer on hierarchies, see Kasper's post. For the DAX, I recommend having a few months of experience under your belt. If you don't have that much DAX experience, let your eyes gently glaze over while I try to explain the DAX formulas, blindly copy/paste the formulas into your model, and hope for the best without trying to achieve true understanding (the latter is generally how I deal with DAX). Now to the post.
Hierarchies don’t work very well when row security is applied to a model. I went into details on this problem in the tabular security white paper, see the section “Example: Securing a model using parent/child hierarchies” for a detailed writeup.
I have a simple problem: in an org chart that can be up to 6 levels deep, count the number of reports in a user’s organization (both direct and indirect). The catch is that this model has been secured so that users can see employee data only for themselves and for their reports; not for their peers or superiors.
Defining a parent/child hierarchy as Kasper explains and then creating a measure using COUNT([User Name]) does not work properly here. If I drop the hierarchy onto a pivot table while connected to the model as a worker bee (individual contributor, in Microsoft parlance), I see a Count of Employees of 1 for every employee in the organization, including the bosses, because all of the other rows in the Employees table have been filtered out and cannot be counted. This is a side effect of the fact that visual totals always apply; unlike in multidimensional models, these visual totals cannot be turned off. This is why the white paper from Microsoft recommends that you not use hierarchies in pivot tables – DAX does not natively handle hierarchies as a first-class citizen.
However, it is possible to work around this limitation in DAX. Greg Galloway helped me to devise this solution.
First step – open the Organizational Security example from http://securitytabularbi.codeplex.com/ and apply row security as described in the white paper.
Next – add some calculated columns. We will insert one calculated column per level of the hierarchy. Each calc column will count all of the employees under that level of the hierarchy. Calculated columns work even when row security is applied, because the values in calc columns are evaluated when the model is processed, which occurs before security is applied. Therefore, the calc columns are static, and their values do not change even when dynamic security is applied at runtime.
Create a calc column called L1HeadCount. Set the formula to this:
=COUNTROWS( FILTER( ALL(Employees), PATHCONTAINS([OrgHierarchyPath], EARLIER([Level1])) ) )
So what does this do?
Recall that calc columns are evaluated in a row context – that is, a row in the table is selected, and the value is computed for that particular row. Let’s take an example row:
This formula uses COUNTROWS() to count the rows of a table constructed using the FILTER() function. That table contains all rows in the Employees table (this is what the ALL() function does) that match a certain criterion. The criterion is specified using the PATHCONTAINS() function, which checks the hierarchy path of every row in the Employees table for a particular manager. The second parameter to PATHCONTAINS(), EARLIER([Level1]), is the value of the Level1 manager (aka the big boss) for the selected row. So when ascott0 is the selected row, domain\andreas0 is the Level1 manager. This value, domain\andreas0, is fixed – so even though PATHCONTAINS() is evaluated 87 times, the parameter stays the same. The FILTER() function for the row ascott0 returns a table of all the employees with domain\andreas0 in their management chain. Counting the rows in this table yields 86 rows, which is the correct head count for that manager.
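To make this concrete: [OrgHierarchyPath] is a pipe-delimited string of the kind produced by the PATH() function, so for ascott0 it would look something like the first argument below (the middle manager name here is made up for illustration):

```dax
PATHCONTAINS ( "domain\andreas0|domain\kweil0|domain\ascott0", "domain\andreas0" )
// returns TRUE, because the big boss appears in ascott0's management chain
```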
Go ahead and create five more calc columns along these lines – L2HeadCount through L6HeadCount. Each time, simply change the column passed to the EARLIER() function, using [Level2] through [Level6] as appropriate. These calculated columns work systematically through the org chart to find the number of reports, finally terminating at 1 for the leaf nodes.
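For example, L2HeadCount (assuming the same column names as above) would be:

```dax
=COUNTROWS (
    FILTER (
        ALL ( Employees ),
        PATHCONTAINS ( [OrgHierarchyPath], EARLIER ( [Level2] ) )
    )
)
```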
So we have the calculated columns now, but we can’t use those in a pivot table. We need to create a measure for that to work. Hold on to your hats, here’s the measure you need:
Head Count:=
IF (
    ISFILTERED ( [Level6] ), MAX ( [L6HeadCount] ),
    IF (
        ISFILTERED ( [Level5] ), MAX ( [L5HeadCount] ),
        IF (
            ISFILTERED ( [Level4] ), MAX ( [L4HeadCount] ),
            IF (
                ISFILTERED ( [Level3] ), MAX ( [L3HeadCount] ),
                IF (
                    ISFILTERED ( [Level2] ), MAX ( [L2HeadCount] ),
                    MAX ( [L1HeadCount] )
                )
            )
        )
    )
)
The ISFILTERED() function checks to see if that level of the hierarchy is selected and, if so, returns the appropriate calculated column value. I am not going to try to explain this in detail. Alberto Ferrari wrote a post about clever hierarchy handling in DAX, this is pretty much the same methodology.
Now you have a measure that you can use when you drop the Employee hierarchy into a pivot table with row security applied. Because all of the calc columns were computed before row security was applied, the correct head count for each manager is returned. To verify, use Analyze in Excel to connect as domain\ascott0, and you will see that this is true.
In part 2 of this post I will show you how to blank out those numbers instead of displaying them. It is not so important (or perhaps undesirable) to blank them out for headcount, but perhaps if you had salary information you would want to blank them out.
If you would like this experience for using hierarchies changed, please feel free to vote up the Connect item I filed on this - http://connect.microsoft.com/SQLServer/feedback/details/733766/measure-values-in-hierarchies-do-not-roll-up-effectively-when-row-security-is-applied-to-a-tabular-model
Microsoft published the second of two white papers I wrote on the tabular model. This one is called Securing the Tabular BI Semantic Model. You can download the samples for the whitepaper from http://securitytabularbi.codeplex.com/.
When I signed up to write the two white papers (this one and DirectQuery), I thought this one would be the easier of the two to write. After all, the basic concepts of the security model can be summarized in a blog post, and there’s even a tutorial on MSDN on how to implement dynamic security. Easy, right? Wrong!
Turns out, the examples were incredibly difficult to write. Even though there is already an example of dynamic security on MSDN, the LOOKUPVALUE() function it uses to implement the dynamic row filters does not perform very well. The dynamic security examples in the white paper use different mechanisms to implement the dynamic row filters. I won't spoil the surprise – download the white paper and see for yourself.
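For reference, the LOOKUPVALUE() pattern looks roughly like this as a row filter on a fact table (the table and column names here are illustrative, not taken from the tutorial or the sample): the lookup runs for every row being filtered, which is where the performance cost comes from.

```dax
='Sales'[EmployeeKey] =
    LOOKUPVALUE (
        Employees[EmployeeKey],
        Employees[LoginId], USERNAME ()
    )
```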
Also, using row security with hierarchies is really hard. It is so hard that I couldn’t even fit everything I learned about using row security with hierarchies into the white paper. Over the next little while I will publish some supplemental blog posts with my findings.
I talk a bit in there about security designs, when to use Kerberos, etc. Not included is how to configure Kerberos for use with Analysis Services. If you need to do that, check out blog posts for previous releases, such as this Kerberos configuration guide and this Kerberos configuration checklist. Also not included in the paper is a discussion of security and DirectQuery; go read the DirectQuery white paper if you need to learn about that.
Again, hope you like the paper. Send your feedback to me or Microsoft. Watch this space for leftovers spilling over from the white paper.
As I mentioned in my previous post on processing tabular models, it is a good idea to do a Process Defrag on your tables if you are doing incremental processing. I didn’t really explain why before. Now is the time.
First, think about how the xVelocity in-memory analytics engine (the VertiPaq engine) processes and stores your data. There is a dictionary, stored in memory, that tells the xVelocity engine where to find your data. The dictionary contains encoded entries, which were encoded using hash encoding (for strings) or value encoding. The more data the engine can see at encoding time, the better the compression and the smaller the dictionary. If you do a lot of incremental processing, encoding can be done inefficiently (since the engine can't see all the data at once and choose a smarter encoding method) and the dictionary can get a lot bigger for no reason. Also, when you clear data from your tables, the corresponding dictionary entries are not removed. Over time, cruft accumulates in the dictionary, taking up precious memory.
There are a few ways to figure out how much room your tables are using:
- Use Kasper’s memory tracking workbook.
- Use a DMV query. The most useful one in this scenario is DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS.
- Look on disk. For example, on a 64-bit OS with the default Data directory location for an instance called .\Tabular, I can look for my database in C:\Program Files\Microsoft SQL Server\MSAS11.TABULAR\OLAP\Data\ and navigate the directories to the table I want to monitor. For example, to monitor the FactInvoice table of my TabularProject20, I would navigate to C:\Program Files\Microsoft SQL Server\MSAS11.TABULAR\OLAP\Data\TabularProject20.0.db, right-click the FactInvoice<blah> folder, and then click Properties to see the file size.
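For the DMV route, you can run a query like the following from an MDX query window in SSMS (I'm listing a few columns from memory; check the DMV's actual schema, as column names are an assumption here):

```sql
SELECT DIMENSION_NAME, COLUMN_ID, SEGMENT_NUMBER, RECORDS_COUNT, USED_SIZE
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS
```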
There are two ways to get rid of the cruft – either a Process Full, which lets the engine see all the data at encoding time and thus do a good job of compressing it, or a Process Defrag.
Let's see the effect of a Process Defrag. Here is a table that I have been processing. I did some Process Add, Process Data, Process Clear, deleted partitions, merged partitions, etc. – probably 10 incremental processing operations in all. Here's the file size for the table – 118 MB for two partitions on a table with 1.7 million rows.
After a Process Defrag, look at the results:
The table is down to 55 MB, under half its original size! Defrag is worth its weight in gold. Just make sure you have enough RAM to complete the defrag operation – because you are dealing with uncompressed data, you will need at least 2x the size of the table you are defragmenting available in RAM or else the defrag will fail with an out of memory error. For this example, with our 118 MB table we should have 236 MB free RAM before we try the defrag.
Of course no processing example would be complete without some automation examples. Here is the AMO code for a Process Defrag:
//connect to Analysis Services
Server s = new Server();
s.Connect(".\\TABULAR");

//process defrag the Demographic Facts dimension
Database db = s.Databases.FindByName("hans_rosling_project");
Dimension dim = db.Dimensions.FindByName("Demographic Facts");
dim.Process(ProcessType.ProcessDefrag);

s.Disconnect();
Compared to some of our previous AMO samples, this is pretty clean. No Sandboxes, no GUIDs, just get the table, defrag it, and get out. Nice.
You can do a ProcessDefrag from PowerShell using the Invoke-ProcessDimension cmdlet. You cannot do a ProcessDefrag from the Analysis Services Processing Task in SSIS, as the option is not exposed in the UI. You can, however, use the Analysis Services DDL task to do the defrag. Here’s some example XMLA:
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Type>ProcessDefrag</Type>
  <Object>
    <DatabaseID>TabularProject20</DatabaseID>
    <DimensionID>FactInvoice_c0bedb71-3d9d-4163-9b05-a53a128c64d2</DimensionID>
  </Object>
</Process>
Moral of the story – it is really important to make a regular defrag part of your processing routine. Make sure you do it before it's too late and you no longer have enough RAM to clean up after yourself.
Say you have already installed an Analysis Services instance of SQL Server 2012. Maybe you made a mistake in setup and accidentally set it up as a multidimensional instance. What do you do?
This is unsupported but it works. You can change the DeploymentMode property in the msmdsrv.ini file to change the mode of the server instance.
Note that you can only run an Analysis Services instance in a single mode at a time. You cannot host multidimensional and tabular databases on the same instance. If you must have both types of databases, you need two instances.
To change the DeploymentMode property:
- Back up any multidimensional databases on your server, and either detach or delete them. You will not be able to load them on the tabular instance.
- Copy the msmdsrv.ini file to your desktop. For my instance (which I named TABULAR during setup), the config file is in C:\Program Files\Microsoft SQL Server\MSAS11.TABULAR\OLAP\Config.
- Open the config file in Notepad. Change the DeploymentMode property from 0 (multidimensional) to 2 (tabular), as pictured. Save and close the file.
- Copy the msmdsrv.ini file back to the OLAP\Config directory.
- From services.msc, restart the Analysis Services instance.
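The msmdsrv.ini file is XML, so the edit is a one-element change. The fragment looks something like this (surrounding elements omitted; the exact position of the element within the file may vary):

```xml
<ConfigurationSettings>
  <!-- 0 = multidimensional, 1 = PowerPivot for SharePoint, 2 = tabular -->
  <DeploymentMode>2</DeploymentMode>
</ConfigurationSettings>
```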
You can verify that the server mode has changed from SSMS. When you connect to the server instance, you can see that the icon for the server instance changed from the yellow cube icon to the blue tabular icon.
And if you are wondering why it is DeploymentMode=2 for tabular and not 1, DeploymentMode=1 is for PowerPivot for SharePoint instances. The difference between these two deployment modes is subtle. PowerPivot for SharePoint supports only impersonating the current user for processing, which makes sense because refresh is often an interactive process initiated from the browser. Tabular supports only impersonating a specific Windows user, impersonating the service account, or inheriting permissions from the database when processing data into the model (see my previous post on impersonation for details). This makes sense because processing is mostly an unattended process on server instances. Also, based on the deployment mode, some features in SSMS are blocked, because some activities do not make sense on a PowerPivot for SharePoint instance (e.g., Restore, Restore from PowerPivot).
Because there are three different deployment modes in all, if you need multidimensional, PowerPivot for SharePoint, and tabular, you must have three different instances of Analysis Services.
Tabular models have cube names. This is unintuitive – we’re supposed to be done with cubes, right? Well as we have seen in previous AMO and SSIS posts, under the covers tabular models are still composed of cubes, data sources, data source views, dimensions, and measure groups. The cube concept has not gone away.
The cube name of the tabular model gets exposed to end users in a few places. One is in the “Cube Browser” in SQL Server Management Studio. (I put “Cube Browser” in air quotes as the MDX query designer is not a great component for browsing cubes. Feel free to vote up and add comments to an existing bug tracking the “Cube Browser” on Connect. I digress).
Another place is in Excel when connecting to a model:
All cubes are named Model by default. You can change the default cube name as follows:
- Open a tabular project in SSDT.
- Right-click the project in the Solution Explorer and then click Properties.
- Change the Cube Name property to the new name.
PS – in tabular models, it is one cube per model only. The engine enforces this restriction.
Today Microsoft published a white paper that I wrote on Using DirectQuery in the Tabular BI Semantic Model. The paper is 46 pages long, and it covers everything from model development to partitioning strategies to security to configuring SharePoint environments for DirectQuery.
I learned a lot writing this white paper. The most interesting parts (to me, at least) are the sections where you learn how to replace time intelligence functions in DirectQuery models. I'm sure that you DAX experts out there will come up with even cleverer solutions over time to work around the formula limitations for DirectQuery, but hopefully the formulas I wrote are a good starting point. You will get the most out of these sections if you are already familiar with DAX.
I hope you enjoy it; please send any feedback on it to either me or Microsoft.