Microsoft Excel has a powerful and relatively new tool to add to the arsenal of data analysis tools that we are already accustomed to using, this is the Excel Data Model. Working together with Power Query and Power Pivot, the Data Model enables Excel users to work with much larger data sets than we could previously. Here we describe the high-level workflow of obtaining data from various sources, importing into the data model, and through to analysis and visualization.
Most users of Microsoft’s spreadsheet program, Excel, are comfortable with organizing information into columns and performing calculations with formulas, and using the wide range of built-in functions provided in Excel. A worksheet can include up to 1,048,576 rows and 16,384 columns, which has been sufficient space for many data analysis tasks.
Data sets have grown larger, and Microsoft has added a new a very powerful mechanism to Excel to enable analysis of even larger data sets, this is called the Data Model. The Data Model allows us to bring in multiple tables of data from a variety of sources, create relationships between those tables and then perform analyses on those related tables using Pivot Tables and Pivot Charts. The Data Model does an excellent job of compressing data so we no longer have a limit of a million rows, in fact, we can load tables with many millions of rows, limited only by the resources (memory and storage) available on our computers.
Fig. 1 shows a schematic of data analysis using the Data Model.
We connect to data located in various sources using Power Query. This is located on the Data tab in the latest Excel, as the two groups: Get & Transform Data and Queries & Connections as shown in Fig. 2.
The full range of connectors is available through the Get Data command button, which allows us to connect to data contained in local files, remote databases, or through web services. Power Query provides tools for “shaping” the data before bringing it into Excel – this includes removing columns that aren’t necessary for our analyses, changing data types and combining tables. Once the data have been shaped to our needs, they are added to the Excel Data model.
The Data Model holds the collection of tables that we import from all the different sources. The Data Model becomes useful by creating relationships between the tables using Power Pivot. We can also create calculated columns, hierarchies, measures and KPIs (key performance indicators) with Power Pivot, which are all part of the Data Model. All of this sounds like we are building a relational database inside an Excel workbook, and that is what we are doing. The Data Model database engine is SQL Server Analysis Services (SSAS), which is referred to as VertiPaq in quite a bit of online documentation. Each Excel workbook can have one Data Model. Analysis and visualization of the data is commonly done with Pivot Tables and Pivot Charts – tools that many Excel users are very familiar with already.
Excel’s Data Model allows us to import volumes of data limited only by the resources available on our computers. In order to process that data as efficiently as possible we can allow Excel to use multiple processors at the same time. All modern computers have more than a single processor. To check that this ability is enabled on our computer, go to the File tab and select “Options”. Click on “Advanced” and scroll down to the “Formulas” section to see the number of cores that Excel is allowed to use when doing calculations on the machine.
Fig. 3 shows the setting for a computer that is able to use 16 cores. We can set the number that Excel can use to any value from 1 to 16 in this case, so this can be adjusted for optimum performance of the computer. The number of threads available on your computer may be different – common values will be 4 and 8 cores for modern computers.
Excel’s Data Model allows us to work with data in a way that almost seems to sidestep our traditional use of some of Excel’s functions. In particular, the ability to create relationships between tables effectively removes the need for lookup functions in many cases. This is a very high level look at data flow through Excel using the Data Model. We will look at details of data import with Power Query and data modeling with Power Pivot in future articles.