What is a Data Structure?  Part 1
An Explanation Using Tables in Excel 2010
SC_excel_DataStructureP1_header


Whether you are just going to look at your data and draw conclusions visually, or you are going to writing a computer program to manipulate it, even a very simple structure can help.

When analyzing data, it is useful to decide on the data that needs to be processed, and then organize that data into a data structure. A well designed data structure can make the operations that need to be done on the data very efficient.

Microsoft Excel has a very simple structure, without many checks on what goes into the cells. An empty spreadsheet is just a bunch of cells in a grid.
SC_excel_DataStructureP1_img01

We can put just about anything into any cell and Excel won’t complain about it.
SC_excel_DataStructureP1_img02

In order to do analysis on data, we need to have the same type of data in either the rows or columns. The data that we want to do operations on should be grouped together.

Here we have assumed that the data in each row is related and just rearranged the data in the columns:
SC_excel_DataStructureP1_img03

Also, let’s move the data over a cell so that we have some maneuvering room:
SC_excel_DataStructureP1_img04

One thing we can do is get some totals, for example we can count the cells:
SC_excel_DataStructureP1_img05

In cells B5, C5 and D5 we have counted the number of cells in those columns. We know that we have 3 items in column B, 3 in C and 3 in D. However, we could change the total in column C to be a sum, i.e., we are going to add the numbers in that column:
SC_excel_DataStructureP1_img06

There is a limit to the number of operations a person can do based on the data types column. For example in D2 we have the number 1.33 while in D3 we have a structure containing 3 numbers and D4 has text of some sort – it doesn’t make sense to add them. In this case we can count the number of cells that have stuff in them but that is about all.

In column C we just have numbers so we can add them, find the average, find minimum and maximum values … we can do a variety of different types of calculation:
SC_excel_DataStructureP1_img07

At this point we would need more information about the data before we can do any meaningful analysis.

Suppose we have checked other sources about the data and we now have the following. (Notice that we have added column names in order to make the data more meaningful.)
SC_excel_DataStructureP1_img08

This is just a simple spread-sheet but Excel does have a data structure called a table which allows us to do additional things with a data set easily. So let’s turn this small example into a table structure and explore the offerings.
SC_excel_DataStructureP1_img09

SC_excel_DataStructureP1_img10b

Note the downward arrows on the right side of each column header cell.
SC_excel_DataStructureP1_img11

This is a drop-down list which allows for both sorting and filtering options. Sorting on a column results in all rows being sorted appropriately. This simplifies sorting compared to when the data isn’t in a table format.

When we look at the drop-down for column B, “Animal Type”:
SC_excel_DataStructureP1_img12
We can see that Excel recognizes that the data in the column are text, and offers to build Text Filter rules.

When we choose the drop-down arrow for Weight:
SC_excel_DataStructureP1_img13
Excel recognizes the data in the column as numbers and offers Number Filters.

Here is the table after we have sorted the weight from smallest to largest:
SC_excel_DataStructureP1_img14

Summary:

Arranging data in meaningful groups, checking the data, annotating the data and working with table structures allows for quick and simple data exploration.

Next Articles and How-to’s:

  • Why do we care really?
    • With Excel this leads to
      • importing from databases
      • Pivot tables
      • PowerPivot
    • In the world of analytics:
      • Users of the R language place tables into a “data frame” structure
      • Users of the Python Pandas module use “pandas.DataFrame”s.
      • Both of these structures hold the contents of tables from Excel or databases.