
Developing and Managing a BI Semantic Model
John C. Hancock, Lead Program Manager, SQL Server

JOHN C. HANCOCK: Hi, this session is Developing and Managing a BI Semantic Model. My name is John Hancock, I'm a lead program manager on the SQL Server Analysis Services team. Before viewing this session, you might want to take a look at the session entitled "What's New In Analysis Services and PowerPivot," because that will give you a lot of the foundation for what the BI semantic model is about and how the technology works. For this session, we're going to be focusing on the steps to actually develop and then manage a BI semantic model.

Starting with some context to set the stage, the BI semantic model is one BI model for all end user experiences. And what that means is that once you've developed a BI semantic model, you can deploy that and then use that in any client tool for BI, including Excel, and PowerView, and scorecarding, and custom applications.

The BI semantic model consists of a data model, business logic and queries that can be executed against it, and data access to the sources behind the model.

In SQL Server 2012, with the BI semantic model, whether you're working in PowerPivot for Excel and doing personal BI applications, or sharing those with your team in SharePoint, or building full-sized, and full-scale corporate BI applications using Analysis Services, it's the same model that underpins all of those scenarios. For today's session, I'll be focusing on building the model using the organizational BI tools, or the professional tools, but the same kinds of modeling can be done using PowerPivot for Excel.

So, I've briefly mentioned there that we have two possibilities for developing models. Here, I would just like to talk a little about how you choose which one is right for your needs. Information workers who are comfortable with Excel as a technology, who need a rapid response to business problems, and who are very used to the Excel paradigm should be using PowerPivot for Excel to build their models. It's a much more familiar environment, and when they want to share the model, they can just publish it up to SharePoint. BI developers, who have a project and need to build a full BI solution, will typically be using the BI Development Studio, which is being renamed to SQL Server Data Tools in SQL Server 2012. It will feel like Visual Studio. You have source code. You have TFS integration. And the experience is much more optimized for a BI developer. So, as I said, I will be focusing on the right-hand side for today's session.

So, let me spend the rest of today's session just walking you through the process of building these models, and switch over to a demo. So, the first thing I'll do is show you how someone can actually use a model that has already been built by a professional model builder. So, I'll start out as an analyst in Excel. I'm going to go and connect Excel to an existing Analysis Services model: I'll choose From Other Sources, From Analysis Services, specify the server where this model has been deployed, and then choose the model. So, for this example, I'm an analyst in a small police service, and the model that I'm going to be using has a lot of information in it on things like training courses that members of my police service have undertaken. So, once I've connected in Excel, I can now do ad hoc analysis. And on the right-hand side, you can see a very rich model that has things like KPIs and measures. It has lots of different dimensions of the data to look at, like training courses and dates. And, pretty easily in Excel, I can just go ahead and pick things like the average score, and then the position of the employee who has taken the course, and then things like the employee's full name. So, it's a very interactive experience. As you can see, it's very fast, and it's all powered by this BI semantic model that I've built. So, that was an example of using this in Excel. Let me show you the same experience using PowerView. Here I'm in SharePoint. I have a Reporting Services data connection to that same
model that I was using in Excel. And now I can go ahead and create a PowerView report to do ad hoc reporting and data visualization on top of that model. As you can see, the model is the same. It's presented differently to PowerView users in the sense that it shows a list of tables rather than measures and dimensions as it does in Excel, but it's powered by the same model under the covers. So, I can do the same kind of analysis in here. I can do position, and employees. I can do ad hoc reporting right there in the browser using the PowerView tools. Because it's PowerView, of course, I can build much more interesting and sophisticated data visualizations. So, for example, I can go start building interactively, show the max scores, and then start showing the column charts, and really have a much more interactive and fun way of working with the model to build some great looking data visualizations. Okay, so that's showing you how end users can actually connect to the model in Excel, in PowerView, and in other tools, and use that model. So, let's switch gears now and show you how we go and build the model that I've just demonstrated from scratch. So, let's go ahead and open up SQL Server Data Tools, and here in SQL Server Data Tools, I can go and create a new project. And under the business intelligence folder, there is an analysis services section. And we can take a quick look at the choices here for building models. I can choose to build an analysis services model either as a multidimensional model, or in SQL Server 2012 as a tabular model, which is what we'll be focusing on today. Another important one to call out is that if I already have a PowerPivot model that an information worker has built, I can import that into Visual Studio, into SQL Server Data Tools, and use that as the starting point for a project. In this case, I'm just going to go ahead and do the start from scratch with a tabular project. 
Now, the first time that you use SQL Server Data Tools after you've installed it on a machine, it's going to prompt you for an Analysis Services instance to use for modeling against. What this means is that in order to build a model, you actually have to have an Analysis Services instance running, in tabular mode in this case, even to start building a model. Many people will install a local instance of Analysis Services on their dev machine for this purpose, but you can also have a shared server, with an instance installed on it, that multiple developers use. And, once I specify this, it goes into the project's settings, and from then on I don't need to respecify it. So, here I have created a new BI semantic model project, and it's in the Visual Studio shell, so it supports solutions with multiple projects. What I'm actually working with is source code. So, if I go and edit this as code, you can see the XMLA source code that is behind the model. And, therefore, I can use things like TFS and source control systems to manage this. Let's just go back to the designer. So, the first thing I'm going to do to start off this model is actually connect to my data warehouse, and bring in some of the tables that I need. Let's say Import From Data Source. One very important feature of the BI semantic model is that it allows you to connect to many, many different data sources: classic relational sources, as well as more interesting sources like Azure DataMarket connections, OData feeds, and Reporting Services reports. I can
connect to Excel files and text files. So, it really allows you to bring together data from many different sources, and put them into one model. In this case, I'll just choose SQL Server, and I want to connect to my local machine, and pick the data warehouse database that I have for this police service. So, I'm just going to specify my credentials to connect. And that now gives me a list of tables that are in this data warehouse. I could go and write queries if I would like to have more control. But in this case, I'm going to go and pick from the tables and views that have already been defined in this data warehouse. So, there are a number of different subject areas. I'm going to start with the first one which is training attendance, which is the model that I showed you from Excel and PowerView. So, let's choose select training attendance, and I know that connected to that fact table there are several dimension tables. So, I will select related tables, and go and pull those in as well. I happen to know that position is another table I'm interested in, so I can add that, too. Okay, so as part of the import process, I can also, if I'd like, rename the tables that will be created in the model. And another important thing that I'll probably do during this import process is filter some of the data. So, my fact table in this case may be very large, it may be billions of rows in practice. And while I'm modeling, I don't necessarily want to use that whole dataset to model against. So, as I'm importing it, I'll just go and filter it so that I'm only bringing in, for example, data from a certain date range. So, in my case, it's where the date key is less than 1,000, and now when I bring in the data to model against, that filter will be applied, and I'll only load data from the filter that I've specified. Okay, let me go and hit finish and import that data. Okay, so that's completed. All right. So, let's take a look at what it has actually done. 
So, the tabular designer allows you to work against the actual data in your model, so it's a very interactive designer that lets you see the data, make changes to the data, and see the results straight away. So, I can look at the list of tables here, and I can start to do things like filter and sort the data right here in the designer, so that I can actually understand the data that's being brought in, and the effects as I change the model. I can also work against the model in a diagram mode, which is very useful for quite a lot of scenarios, such as, for example, figuring out the relationships between tables. In this diagram I can zoom in and out, and start to work with larger models. In this case, I can notice straight away that the training attendance and the employee tables have a relationship between them, and that was imported when I brought in the tables from the data warehouse, because of referential integrity rules. When I look at the employee and position tables, I can see there's no relationship defined, but I want to create one. So, I'm going to go ahead and drag position key from the position table over to employee, and now I've defined the relationship between those two tables. So, the diagram view is very handy for working with things where you need to think about the shape of your model rather than the data. Whether I'm working in the diagram view or in the
data view, I still have access to the property grid. So, for example, if I wanted to do things like rename the employee table to be "Employee" rather than "DimEmployee," I can use the property grid when I'm in diagram mode. And when I switch to data mode, the same property grid is available. So, I can really choose which are the right places to work with the properties as I'm defining the model. Okay. So, let's take a look at what has actually been brought in, and how we would like to extend this model. The first thing I see is that the employee table has a lot of great information in it, but it has first name and last name as two separate columns, and what I really want in my model is a single column that has the employee's full name in it. So, as part of the BI semantic model, I can create calculated columns that let me use the Data Analysis Expressions, or DAX, language to add additional columns to the model to supply more sophisticated calculations. So, in my new column I'm going to go ahead and type equals, and specify a DAX expression, and then choose first name; in this case it's a very simple expression that just concatenates the first name with a space and then the last name. Just hit enter. As you can see, I'm working against the data here, so as soon as I've made that calculation, I can see what the results are without having to define a calculation, deploy a model, and then go and test it somewhere else. As you'll notice, it's been named Calculated Column 1. I can name it something else, like FullName. And there you can see how you can go ahead and add columns to the data that's been imported to build a more sophisticated model. There is a wide range of DAX functions that you can use here. It's a very full-fledged formula language, and there's a lot of material available with the SQL Server 2012 launch on the DAX expression language. Okay. So, let's take a look at some of the other operations I can do in the data grid.
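The calculated column described above is a one-line DAX formula typed into the formula bar. A minimal sketch, assuming the source columns are named FirstName and LastName:

```dax
-- Calculated column added to the Employee table
-- (the column names FirstName/LastName are assumptions based on the demo)
= [FirstName] & " " & [LastName]
-- ...then rename it from "Calculated Column 1" to FullName in the grid
```

Because calculated columns are evaluated row by row, each row of the table immediately shows its own concatenated value.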
I'll switch to the date table. And I can see, for example, the date business key column is showing the time format as well, and really what I want to do is just make that short date. So, I can change the format of these columns. So, this is the kind of process you'll go through as you're building a BI semantic model. You'll take the data that has come in from the source system. You'll rename some things, you'll add some things to mock it up. You'll change formats and names, and things like that in order to build a model that really is a great experience for the end users. There I've gone and formatted the date column. You can also do other things like, for example, the month column, it's quite handy for calculations, but I don't really want to show that directly to end users. So, I can go and hide that from client tools. So, I can have columns in my model that I use for calculations and relationships, but they aren't necessarily shown to end users. Okay, so the model is starting to look pretty interesting. Next, let's start looking at some of the business logic that I would like to add to this. So, I have a training attendance table, and it shows me for every employee that has been on a training course what the result was, and what the score was. So, individual scores are interesting, but what I'm really interested in is to look at things like average scores. So, what I'm going to do is define what we call a measure. A measure allows you to do complex calculations across data rather than being a single column that just uses the current row context.

So, what I will do is go into the measure grid, and just start typing, equals to start a DAX expression, and then say max of the score column. Actually, in this case, I really don't want the max. Let me just look for average, rather. I can do average of score, and hit enter. What it's done there is, it's actually created a measure in this model, and it's showing me the results for that measure of all of the data that's currently shown in the grid. If I go and filter the view here, so that I only show, for example, a specific training course, it will update the measure so that it's only showing the filtered values rather than the whole set. You'll notice that it named it Measure1, because I didn't specify a name. So, I can either edit it in the expression itself, or use the measure grid to go and change that to something like average score. Okay, so averages are interesting, but there are other kinds of business logic I would like to add here, too. So, for example, I want to add the max score, max of score. And I'm probably going to need the min score for some scenarios as well. There's a shortcut to some of the more simple calculations that I've shown called auto sum, where I can just go and choose and say, min, and it will automatically write the formula for me. So, I can just change the title here. Okay, the measure grid that I'm working against as well is free-form. So, I can choose to arrange the measures in it in whatever shape I would like. So, let's say I have a new section here, which is I want to define a bunch of counts. I can put in a text or a caption in the grid, and then just start adding additional measures here like count, or in this case distinct count, of my employees. So, it really lets you have a very free-format approach to defining your measures. This is going to be EmployeeCount. Great. So, I've defined some of these measures. I've got a very interesting model. 
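Sketched as DAX measure definitions, the business logic built up above would look roughly like this; the table and column names are assumptions based on the demo:

```dax
Average Score := AVERAGE ( TrainingAttendance[Score] )
Max Score     := MAX ( TrainingAttendance[Score] )
Min Score     := MIN ( TrainingAttendance[Score] )
EmployeeCount := DISTINCTCOUNT ( TrainingAttendance[EmployeeKey] )
```

Unlike a calculated column, each of these is evaluated over whatever filter context the client applies, which is why the grid updates when the view is filtered.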
So, at this point, I would really like to start testing it out, and see what the effects are in client tools. What we've done is, we've actually supplied a button in the designer itself that lets you launch Excel, and connects to the model that I'm building. Now, what's actually happening here is that because the designer that I'm using here is working against data behind the scenes there is actually an Analysis Services database created called a workspace database, which is driving this model. So, as I'm using this, and as I'm defining these measures, that workspace database is going to be updated and kept in sync with the designer. So, if I go over to SQL Server Management Studio and show the list here, there you can see the project name, there's the workspace database, and if I have a look, there are all the tables. And so, as I'm working with this model, that workspace database is kept up to date, and I can use that in other tools. I'll go back to Visual Studio and just launch Excel here, it lets me open up an Excel instance against the same model. Now, I can keep both open. I can run in Excel, and in Visual Studio at the same time. So, there's the model that I'm building. So, let's go and try it out. We had that average score. Let's go and put position in as well. One thing straight away that jumps out is that it would be really nice if the average score was formatted with two decimals. I could change that in Excel, but then all the users would have to
do that. So, let me switch back to SQL Server Data Tools and change it directly in the designer. So, let me choose decimal number here, and without any further changes, I can just switch back to Excel and refresh, and as you can see it's working against the live model. So, it's a very easy way to interactively change the model as I'm using it, and testing it in Excel. The next thing I notice is that average score is not a very interesting display at the moment. It's just showing the average score for each of these different positions. What I would rather have is something more like a key performance indicator that shows how the different groups are doing against their targets. So, I'll go back to the model, take that average score measure, and create a key performance indicator from it. What I can do is either choose a target value, like another measure, and say I want to see the average as a percentage of the maximum; or I can define an absolute value, like in my case I'll just choose 100, and see how the score is doing against that value. I have a way of defining the thresholds here, where I can say, okay, below 50 is red, between 50 and 80 is yellow, and above 80 is green. And I can choose from a range of icon styles for the different statuses. Let me just choose that. And now that measure has turned into a key performance indicator. And if I switch over to Excel and refresh, now I can see a KPI in my Excel client. And so now, as well as just choosing the value, I can also choose the status here, which will show me an indicator of red, green or yellow depending on the score. And this KPI, just like all measures, works at all levels of granularity. So, I can pick the employee table, and choose that full name column that I added earlier, and now I can see how the average scores are doing across all those people. Okay, so it's pretty interesting with what I have already.
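The KPI dialog generates the status logic for you, but with the thresholds chosen above (below 50 red, 50 to 80 yellow, above 80 green), it boils down to something roughly equivalent to this DAX sketch:

```dax
-- -1 = red, 0 = yellow, 1 = green; thresholds taken from the demo
Average Score Status :=
IF ( [Average Score] < 50, -1, IF ( [Average Score] < 80, 0, 1 ) )
```

Client tools like Excel then map that status value onto the icon style selected in the dialog.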
And because I'm in Excel, I can do a lot of other things like, for example, add slicers. So, let me go ahead and pick training categories, and the training results, and put slicers into this as well. And so, no matter how I slice this data, pick a particular category or whatever, the key performance indicators and measures are showing the results of those calculations dependent on the slicers and the rows and columns that have been defined. Okay. So, the model is looking pretty good at the moment. I'm pretty happy with how that's turning out, and I think users are going to get a lot of value out of it. But I'm going to need to define some of the more technical details of the model, in terms of how it's stored and how it's secured. So, the first thing I will do is look at partitions. A table comes in as one large table, and if I refresh it, I refresh all the data in that table. For some of the larger fact tables, or larger transaction tables that I have, instead of having one large table, I would like to split that up in order to manage the process of refreshing that data. So, I'm going to go ahead and choose partitions. And, as I can see here, the training attendance table was brought in with a single partition. It's a very small dataset at this point, it's only 862 rows. I filtered it on the way in when I imported it, so the first partition is actually filtered to only numbers that are less than 1,000. Let me switch to the SQL view to show that a bit more clearly. So, there you can see the query that is actually driving this.
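In the SQL view, each partition is just a source query with its own filter; the two partitions set up in this demo (the second is created in a moment by copying the first and flipping the filter) would end up as complementary queries along these lines, where the table and column names are assumptions based on the demo:

```sql
-- Partition 1: created by the filtered import
SELECT * FROM [dbo].[FactTrainingAttendance] WHERE [DateKey] < 1000

-- Partition 2: a copy of the first, with the filter flipped
SELECT * FROM [dbo].[FactTrainingAttendance] WHERE [DateKey] >= 1000
```

Because the table is the union of its partitions, the filters should not overlap, or rows would be loaded twice.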

Now, what I can do is create multiple partitions so that I can refresh them separately. I'll just copy the existing one, and say, okay, this partition is going to hold all the other data, where the date key is greater than or equal to 1,000. As for the partitioning schemes that people can use here, you can partition across dates, or times, or other categories, or geographical regions, whatever makes sense for your business process. So, now that I've defined those two partitions, let me hit okay, and actually process them. As I've just created the new partition, it has no data in it yet, so I can say process partitions. And since the first partition was already processed when I imported it, I can just process the new one I've added, and click on okay, which will prompt me for my credentials. And now I have the other 138 rows, so now I can see that the total set that I'm working against in this table includes both the rows that are less than 1,000 and those greater than or equal to 1,000, because the table has the union of all the rows that are in the different partitions. So, these partitions will really let me manage it much more easily. The next thing I would like to do is define security. So, I'm going to go ahead and build security roles into my model. The first role that I would like is for administrators, so let's just call this Admin. And the permission that people in this role have is to administer the database. As I'm defining this, I could go and define members of these roles right now, but typically what I do is build the roles as part of the model, and then deploy it, and then in Management Studio I would go ahead and add members to that. So, that's the admin. The other thing I'm going to do is put in users and say, okay, users don't have administrator rights, but they do have rights to read the data.
Now, one interesting thing about this database is that in the model I have data for multiple levels in the organization, going up to the chief of police, and what I would like to do is actually do row level security so that I can actually define a role that filters the data so that users who are managers can see all of the training results, but users that are below that level can only see data, for example, up until position level 4 here, and above. Let me go back to my security dialog and make that change. So, I'll say, Users, I will define a DAX filter that filters the position table to say, okay, the DIM position sequence key has to be greater than 3 for this user to be able to see it. So, this user won't be able to see chief of police or deputy of police, they will be able to see everyone below that. And I'll also define a manager's role that can actually read everything, that has no filter defined. Okay, so there's my security scheme for this model. I have administrators, I have users who can read everything for employees above a certain level, and I have managers who can read everything. The first thing I'm going to want to do here is actually test that out. So, when I go and launch Excel, it will prompt me and let me specify either a specific user or a role that I've defined in order to test it out. So, let me see what the effect is for users. Now, we'll launch Excel as the user role that I defined there. So, let me pick the average score, or the employee count, and show that the positions have filtered. So now I can see, if I showed
the sequence here, I can only see anything with a sequence above 3. And because the row level security is defined, it's also filtered out all the related rows that go along with that. So, I haven't just filtered the position table, I've also filtered the employee table, which is connected to the position table. So, the row level security that we have will actually filter out all the data that the user is not allowed to see beyond the current table, into related tables as well. Okay, so my model is now pretty interesting. It has the security defined. It has partitions set up to administer it. The last thing I want to do before I deploy is look at some of the properties that are specifically designed to improve the PowerView experience that users have when connecting to the model. If I go to the employee table, I can see here the columns I've defined, and the properties that are being specified. There is a section in the table properties that lets you define reporting properties that affect the PowerView experience, and other client tools that will use this. So, the first thing is, what is the default field set? And what this means is that if an end user double clicks the employee table in PowerView, what is the set of columns that will be displayed by default? In this case, I'm going to want it to be the full name as well as the phone number and the e-mail address of the user, rather than all the other ones. Okay, so once I add that property to the table, that will control the experience in PowerView. If I needed more sophisticated control over the PowerView experience, I can use the table behavior dialog. What this does is, it allows me to define things like a primary key or row identifier for the table, which in this case will be the employee key. And it lets me specify behavior for columns that you would like to keep unique.
As an example of this, if I have two employees with the same full name, like John Hancock, and I drag the full name column onto the canvas in PowerView, it should always keep those two rows separate. It should never collapse two employees with the same name into one row. And defining this property in the model will actually control that behavior. I can also define which column in this table I would like to be the default label, and this affects experiences such as the card view in PowerView, which, by default, will show the employee's full name on top of the card. If I had images defined in this model, I could do the same thing for an image column, which would control the image that's displayed on the card by default. Okay, so my model is exactly how I would like it now. The next step is to deploy it to a server. So, I'll look at the project properties, and by default it's using local host as my deployment server, which is the same one I'm using for my workspace database. I'm going to name the database here Academy Demo. I can change the name of the cube if I'd like; I'm just going to leave that as Model for the moment. And I can also choose some of the options for this model by changing the processing option to do not process. What I would like here is that when I deploy this model, I only deploy the metadata, and don't actually physically process the data until later. So, I'll hit OK, and then go to debug, build and deploy the solution. So, what it's doing now is connecting to that Analysis Services server instance running in tabular mode, and deploying the model that I've built here in SQL Server Data Tools. Okay. So,
switching over now to the administration experience, what I have here is SQL Server Management Studio connected to that instance. And I'll just refresh to show the latest one that's been deployed. There I can see the workspace database that was loaded, and, since I deployed it to the same server, I can also see the final model that I've deployed for end users to connect to. Because I chose not to process this, there's no data loaded in the model yet. So, the first thing I'll do in Management Studio is to go ahead and process the database to load the data into it. So, that's successful, and now I'll just browse the database to check that it's all working. The cube browser, or model browser, lets me, from within Management Studio, just go and test whether the data that I'm expecting has been loaded correctly into the model, so let me just pull in some of the dimensions and measures here. And, as you can see, this is another example of the BI semantic model in action. The cube or model browser is showing a multidimensional view of the model, in terms of breaking it out into measures, and dimensions, and what-have-you. Whereas, if I connected PowerView to the same model, it would actually show tables like I originally defined in the tabular designer. Great. So, there are a couple of things I can do in Management Studio. I can also look at and define partitions. So, even after I've deployed the model to the server, I can go and define partitions, and modify them, from Management Studio. I can also control processing, so that I have a lot more control over the partitions and how they're processed. I can use scripting. I can do all of the features that you'd expect in Management Studio for managing the partitions, and the processing of them. And, finally, the last thing I can do in Management Studio is define further roles, or manage the ones that are in there already.
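Any of these management operations can be scripted from Management Studio. For example, the processing step above, scripted as an XMLA command, would look roughly like this (the DatabaseID here is an assumption based on the database name used in the demo):

```xml
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>Academy Demo</DatabaseID>
  </Object>
  <Type>ProcessFull</Type>
</Process>
```

A script like this can be run in an XMLA query window or scheduled with SQL Server Agent to automate data refreshes.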
So, let me go and choose the users role that I defined already, and look at the properties, and here I can see the role name, I can see that it's defined as a read role, as I did in SQL Server Data Tools. And now, this is a point when I would actually go and define the members of this role. And, in this case, I'm going to use an individual user, but I could, of course, here also use groups, so that I could define an authorization group, and then just add that to the role. Okay. So, there you have it. That's the experience for managing the model that I've deployed using the SQL Server Management Tools. Okay, so that ends today's session. Thank you for attending.

END
