What is a Data Structure? Part 1
An Explanation Using Tables in Excel 2010
Whether you are just going to look at your data and draw conclusions visually, or you are going to writing a computer program to manipulate it, even a very simple structure can help.
When analyzing data, it is useful to decide on the data that needs to be processed, and then organize that data into a data structure. A well designed data structure can make the operations that need to be done on the data very efficient.
Microsoft Excel has a very simple structure, without many checks on what goes into the cells. An empty spreadsheet is just a bunch of cells in a grid.
We can put just about anything into any cell and Excel won’t complain about it.
In order to do analysis on data, we need to have the same type of data in either the rows or columns. The data that we want to do operations on should be grouped together.
Here we have assumed that the data in each row is related and just rearranged the data in the columns:
Also, let’s move the data over a cell so that we have some maneuvering room:
One thing we can do is get some totals, for example we can count the cells:
In cells B5, C5 and D5 we have counted the number of cells in those columns. We know that we have 3 items in column B, 3 in C and 3 in D. However, we could change the total in column C to be a sum, i.e., we are going to add the numbers in that column:
There is a limit to the number of operations a person can do based on the data types column. For example in D2 we have the number 1.33 while in D3 we have a structure containing 3 numbers and D4 has text of some sort – it doesn’t make sense to add them. In this case we can count the number of cells that have stuff in them but that is about all.
In column C we just have numbers so we can add them, find the average, find minimum and maximum values … we can do a variety of different types of calculation:
At this point we would need more information about the data before we can do any meaningful analysis.
Suppose we have checked other sources about the data and we now have the following. (Notice that we have added column names in order to make the data more meaningful.)
This is just a simple spread-sheet but Excel does have a data structure called a table which allows us to do additional things with a data set easily. So let’s turn this small example into a table structure and explore the offerings.
Note the downward arrows on the right side of each column header cell.
This is a drop-down list which allows for both sorting and filtering options. Sorting on a column results in all rows being sorted appropriately. This simplifies sorting compared to when the data isn’t in a table format.
When we look at the drop-down for column B, “Animal Type”:
We can see that Excel recognizes that the data in the column are text, and offers to build Text Filter rules.
When we choose the drop-down arrow for Weight:
Excel recognizes the data in the column as numbers and offers Number Filters.
Here is the table after we have sorted the weight from smallest to largest:
Summary:
Arranging data in meaningful groups, checking the data, annotating the data and working with table structures allows for quick and simple data exploration.
Next Articles and How-to’s:
- Why do we care really?
- With Excel this leads to
- importing from databases
- Pivot tables
- PowerPivot
- In the world of analytics:
- Users of the R language place tables into a “data frame” structure
- Users of the Python Pandas module use “pandas.DataFrame”s.
- Both of these structures hold the contents of tables from Excel or databases.
- With Excel this leads to
Leave A Comment