Excel’s pivot table is a tool that allows us to summarize and explore data held in one worksheet or Excel table, and display the results of the summarization in a second table. In this post we consider the basic workflow of creating pivot table reports by dragging fields to the appropriate sections of the PivotTable Fields interface, which we explain below.
The most confusing part of pivot tables when starting to work with them is the summarization interface and ability to move fields to rows or columns. The categorical (or qualitative) data are what we use as row and column labels. These fields provide a finite set of groups (categories) into which the data are summarized. Categories are also referred to as dimensions while numerical data are referred to as measures or facts. Note that measures are numerical data that make sense for doing calculations of some sort. Say we have the dimensions, Male and Female. We could assign numerical values, 1 and 2 to these instead of the names. These values remain dimensions because it doesn’t make sense to add these particular values – they remain useful as labels for grouping data. We could summarize the number of Males and Females by counting them, and those summaries are measures. We will see examples below of how dimensions and measures fit together in a pivot table report.
Example data used in Excel articles are often related to business – this is perfectly reasonable since that is where Excel is most used. Excel is also very useful for other data sets and there is a great deal of data freely available on the Internet. We will be using a data set about nuclear ballistic missiles extracted from publications by the Federation of American Scientists, who keep a record of the Status of World Nuclear Forces. The footnotes of the linked document provide details about nuclear capabilities for nuclear capable states. We have extracted data for nuclear ballistic missiles, both land based (LBBM) and submarine launched (SLBM). This ignores a variety of other nuclear weapons including bombs, cruise missiles and any other existing configurations. A ballistic missile is basically a rocket that fires until its propellant is used up, and then the payload flies freely until it arrives at the target. Ballistic missiles make up a significant fraction of the world’s nuclear arsenal and the bulk of its explosive power. Remember though, that this article is not attempting to provide a detailed analysis of nuclear weapons. The point here is to examine the basics of pivot tables in Microsoft Excel with some publicly available data.
Data were extracted for the following countries: Russia, USA, China, France, United Kingdom, India, and Pakistan. No entries are included when the number of launchers weren’t verified or at least estimated. Israel and North Korea don’t have verified ballistic missiles in the publications referenced. Our data set accounts for 5751 warheads out of the total estimated number of roughly 13,000 nuclear explosives in the world.
The current data set can be downloaded here. The data are formatted as an Excel table with the table name “BallisticMissiles”. To see the name of the table, click on a cell in the table so the Table Design contextual tab appears. The table name is in the Properties group as shown in Fig. 1.
To insert a pivot table for the current data set, click on a cell in the table, or select the entire table, and go the Insert tab -> Tables group. Click the PivotTable down arrow and select From Table/Range as shown in Fig. 2.
The PivotTable dialog box appears and allows you to choose the source data from either a named table, which is filled in automatically in our case, or from a range. We will also place the new pivot table on a New Worksheet so ensure that radio button is selected as in Fig. 3.
Do not check the box to add the data to the Data Model – we won’t be using any particular features of the Excel data model since all of our data fits into a small worksheet, and we only have a single table. Click the OK button and our new pivot table will open on a new worksheet as shown in Fig. 4.
The pivot table in Excel has two distinct areas. The area labelled 1 in Fig. 4, on the left side of the worksheet, is the report area where the summarized report will appear. The contents of this area will change as we move fields around in the PivotTable fields area on the right side of the worksheet. Area 2 contains all the fields from our Excel table. These fields can be dragged into the 4 sections of area 3.
The full list of fields in the current data set can be seen in Fig. 5. Some of these fields can act as dimensions, so they provide group names for the values that will be summarized. The other fields are measures – the numerical data that will be summarized in the pivot table report. When we begin exploring a data set, it is useful to take a look at the source data to understand the contents of each of the fields.
We can drag any of these fields to one of the areas in section 3 of Fig. 4. To construct a report in section 1, we drag fields from the fields list to the Rows area. As an example, the values from the Country field become labels for rows A4 to A10 in the report area. The 7 possible values we have for country names are: China, France, India, Pakistan, Russia, United Kingdom and USA. This field is an example of a “dimension”, which provides a finite set of groups (categories) into which the data are summarized. The names of the countries are the categories in this case. In order to see a summary of values for each country, we can drag the yield field to the Values area and the result is shown in Fig. 6.
The yields of all the warheads for each country are summed and presented as a total available yield for that country. The yields are numerical values, and we can do calculations with them – these types of fields are referred to as “measures”. For example, Pakistan has 6 types of nuclear warheads listed in the original Excel table. The summary yield for Pakistan’s weapons is then calculated as: 80+128+128+300+600+192 = 1,428 kilotons. The pivot table does this summary calculation for us automatically for each of the 7 countries saving us a great deal of time. We can change the type of summary calculation Excel does for a particular measure – options include Sum, Count Average, and Max among others. The calculation we need will depend on the specific measure we are looking at. In the case of Fig. 6, the pivot table has summarized the yield of the nuclear ballistic missiles owned by each country in our data set and provided us with a total. US’s total available yield is the largest, and is 571,272 kilotons. The other countries in our data set have been similarly summarized. The calculation applied to the measure will be part of the name shown in the Values area – in Fig. 6 we have “Sum of yield (kilotons)” so we know the yield values are being summed.
With the simple act of dragging and dropping fields into the Rows, Columns and Values areas of the PivotTable Fields pane, we can efficiently explore our data set and create reports to highlight the important features.
We’ve looked at the mechanics of creating a simple pivot table report. In order to extract meaningful information from the data set, we need to have a look at the columns in our data set to see what we have to work with. There are 11 columns, from A to K. We have 39 rows of data below the column headings with the following information.
- Column A: Country – one of 7 possibilities: China, France, India, Pakistan, Russia, United Kingdom, USA
- Column B: Missile type – Land based ballistic missile (LBBM) or submarine launched ballistic missile (SLBM)
- Column C: Designation – The name of the specific model of missile
- Column D: Year deployed – The year the missiles went into service
- Column E: Range (km) – The maximum distance the missile can travel from the launch point
- Column F: # of launchers – The number of loaded missile tubes with a ready missile
- Column G: # of warheads per missile – Some missiles carry a single warhead to a target; others have multiple warheads that can each be sent to a different target
- Column H: 1 warhead yield – The explosive yield in kilotons of a single warhead of this type
- Column I: # of warheads – The total number of warheads of this type, this is calculated as the number of launchers (missiles) times the # of warheads per missile; Column F times Column G
- Column J: yield (kilotons) – The total explosive yield of all the warheads of that type, this is calculated as the 1 warhead yield (kilotons) times the # of warheads; Column H times Column I
- Column K: # of Nagasaki bombs – The equivalent number of Nagasaki bombs for the same total yield as Column J. The yield of the bomb dropped on Nagasaki had a yield of about 20 kilotons; Column J divided by 20
Some easy summaries available to us with the current data set include:
- The number of warheads available to each country as shown in Fig. 7 – (Drag Country to the Rows area and # of warheads to the Values area.) You should see a “Sum of # of warheads” column in the report area of the pivot table
- For each type of missile (LBBM or SLBM): the number of launchers for each type of missile, the number of warheads and the total yield of each type of missile, is shown in Fig. 8 – (For this one drag 3 fields into the Values area, the # of launchers, # of warheads and the yield (kilotons)) – each of these becomes a column.
A count of the number of warheads available in the world is interesting but doesn’t give us a feel for the actual total destructive power available because the yield of warheads varies considerably. In order to get a sense of the total destructive power, we can convert the total yields to a single unit that we can perhaps have a bit of intuition about. The bomb that was dropped on Nagasaki, Japan on August 6, 1945, had a yield of about 20 kilotons so we will look at the yields of our current data set in terms of the number of Nagasaki-equivalent bombs available today. To do this we can find the yields of warheads in units of kilotons and divide that by 20 kilotons to get the number of Nagasaki equivalent warheads. Note that few people have a sense of the devastation that a 20 kiloton nuclear warhead can cause, but we need a way to try to understand the explosive power of warheads ranging from 8 kilotons to 4.5 megatons in the current data set. We can do this best by finding a single unit.
Our next sample pivot table looks at 3 measures summarized for the countries in our data set (see Fig. 9):
- The # of warheads
- Yield (kilotons), and
- # of Nagasaki bombs
Recall that the number of Nagasaki equivalent bombs is calculated by dividing the total yield in kilotons by the approximate yield of the nuclear weapon used at Nagasaki, Japan, which was 20 kilotons. Comparing these columns, this report allows us to see some interesting things including:
- Pakistan has the smallest total yield which is easy to see when we look at the # of Nagasaki equivalent bombs. The number of Nagasaki Equivalent bombs is smaller than the number of warheads that Pakistan has available so the average warhead yield for Pakistan is less than 20 kilotons. In fact, the average warhead yield for Pakistan is about 14 kilotons (calculate this by dividing the yield by the # of warheads), which is close to the yield of the weapon used at Hiroshima. This is significantly smaller than the yields of weapons held by most of the other countries.
- China does not have the largest total yield, but if we calculate the average yield per warhead, each of China’s warheads has a yield of over 400 kilotons on average – this is the largest in our data set.
You can likely find other interesting information in the report of Fig. 9.
Finally, we can dive even deeper and break the report of Fig. 9 down to show all of that data and also separate land based ballistic missiles from submarine launched ballistic missiles. We accomplish this by arranging our fields as shown in Fig. 10.
Note that Country and Missile type are both in the Rows area. The result is shown in Fig. 11.
Organizing a report with Excel’s pivot table feature is quick and easy once we are able to differentiate between dimensions and measures and can control the type of summarization applied to the measures. Pivot tables are a powerful tool to explore data for new insights. Of course, pivot table reports can be made even easier to understand by applying styles, conditional formatting, filters and/or slicers. Pivot charts provide a visual representation of the pivot table report. These will be the subjects of future articles.
If the reader wishes further practice, the following questions can be answered by constructing pivot tables with this data set:
- What are the total number of launchers and warheads in the data set? (Launchers: 1708, warheads: 5751) Note that a launcher is a single missile tube, so on average, a missile has 3.4 warheads on it.
- How many launchers of each type (land based or sea launched) does Russia have? (LBBM: 306, SLBM: 160)
- How many different designations of missiles does China have? (10 total, LBBM: 9, SLBM: 1) (Drag country to Rows and Designation to Columns. Designation is a text field so the default summarization should be Count instead of Sum)
- What is the total yield of warheads available to the US in kilotons? The world? (US: 571,272, World: 1,270,952)
- Which 2 countries have the smallest number of Nagasaki-equivalent bombs? (India: 108, Pakistan: 71)