Types of data: Nominal, Ordinal, Interval/Ratio
Data is central to statistical analysis
When we wish to find out more about a phenomenon or process we collect data.
Usually we collect several measures on each person or thing of interest.
Each thing we collect data about is called an observation.
If we are interested in how people respond,
then each observation will be a person.
Or an observation could be a business
or a product, or a period in time, such as a week.
Variables record the measurements we are interested in.
Age, sex and chocolate preference can all be stored as variables.
For each observation we record a score or value for each of the variables.
When we store this data in a spreadsheet or database,
each row corresponds to a single observation
and each column is a variable.
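As a rough sketch of that layout, the Python snippet below stores three made-up observations as rows, with one key per variable. The values and the names `observations` and `column` are invented for illustration and do not come from any real dataset.

```python
# Each dict is one row (one observation); each key is one column (one variable).
# All values here are made up for illustration.
observations = [
    {"age": 34, "sex": "F", "chocolate": "dark"},
    {"age": 51, "sex": "M", "chocolate": "milk"},
    {"age": 27, "sex": "F", "chocolate": "white"},
]


def column(rows, variable):
    """Return every observation's value for one variable."""
    return [row[variable] for row in rows]


ages = column(observations, "age")
print(ages)
```

Reading across a dict gives everything recorded for one observation. Calling `column` gives one variable across all observations.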
Level of measurement
The level of measurement used for a variable
determines which summary statistics,
graphs and analysis are possible and sensible.
The nominal level is the most basic level of measurement.
Nominal is also known as categorical or qualitative.
Examples of nominal variables
are sex,
preferred type of chocolate
and colour.
These are descriptions or labels with no sense of order.
Nominal values can be stored as a word or text or given a numerical code.
However, the numbers do not imply order.
To summarise nominal data we use a frequency or percentage.
You cannot calculate a mean or average value for nominal data.
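A frequency and percentage summary of this kind could look something like the sketch below. The ten responses are hypothetical, and `collections.Counter` is just one convenient way to count labels in Python.

```python
from collections import Counter

# Hypothetical responses: preferred type of chocolate for ten people.
preferences = ["dark", "milk", "dark", "white", "milk",
               "dark", "milk", "dark", "white", "dark"]

counts = Counter(preferences)   # frequency of each label
total = len(preferences)

# Percentage of responses in each category.
percentages = {label: 100 * n / total for label, n in counts.items()}

for label, n in counts.most_common():
    print(f"{label}: {n} ({percentages[label]:.0f}%)")
```

The labels are only counted, never added or averaged, which is all that nominal data allows.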
The next level of measurement is ordinal.
Examples of ordinal variables are rank, satisfaction,
and fanciness!
Ordinal variables have a meaningful order,
but the intervals between the values in the scale may not be equal.
For example, the gap between first and second runners in a race may be small,
whereas there is a bigger gap between second and third.
Similarly, there may be a big difference between satisfied and unsatisfied,
but a smaller difference between unsatisfied and very unsatisfied.
Like nominal data, ordinal data can be given as frequencies.
Some people state that you should never calculate a mean or average for ordinal data.
However, it is quite common practice, particularly in research regarding
people's behaviour, to find mean values for ordinal data.
If you do this, think carefully about what the mean represents and whether it is justifiable.
The most precise level of measurement is interval/ratio.
This label includes things that can be measured rather than classified or
ordered,
such as number of customers,
weight, age and size.
Interval/ratio data is also known as scale, quantitative or parametric.
Interval/ratio data can be discrete, with whole numbers,
or continuous, with fractional numbers.
Interval/ratio data is very mathematically versatile.
The most common summary measures
are the mean, the median and the standard deviation.
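As an illustration, these three measures can be computed with Python's standard `statistics` module. The ages below are hypothetical, and `stdev` is the sample standard deviation, which is an assumption about which version is wanted.

```python
import statistics

# Hypothetical ages in years for eight people.
ages = [23, 31, 35, 38, 40, 44, 47, 62]

mean_age = statistics.mean(ages)      # arithmetic average
median_age = statistics.median(ages)  # middle value of the sorted data
sd_age = statistics.stdev(ages)       # sample standard deviation (n - 1 divisor)

print(f"mean={mean_age}, median={median_age}, sd={sd_age:.1f}")
```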
The way data should be represented in a graph or chart depends on the level of measurement.
Nominal data can be displayed as a pie chart,
column or bar chart
or stacked column or bar chart.
In most cases the best choice for a single set of nominal data
is a column chart.
Ordinal data must not be represented as a pie chart,
but is best shown as a column or bar chart.
Interval/ratio data
is best represented as a bar chart or a histogram.
For these the data is grouped.
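Grouping can be sketched as below: each value is assigned to a group of fixed width and the groups are counted. The ages and the group width of 10 are assumptions chosen for illustration, and `bin_start` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical ages in years.
ages = [19, 23, 27, 31, 34, 35, 38, 41, 44, 47, 52, 58, 63]

BIN_WIDTH = 10  # assumed group width; other widths are equally valid


def bin_start(value, width=BIN_WIDTH):
    """Return the lower edge of the group that contains value."""
    return (value // width) * width


# Count how many ages fall into each group.
grouped = Counter(bin_start(age) for age in ages)

# Print a rough text histogram, one row per group.
for start in sorted(grouped):
    print(f"{start}-{start + BIN_WIDTH - 1}: {'#' * grouped[start]}")
```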
Box plots illustrate the summary statistics for a variable in a neat way.
Data which occurs over time is best displayed as a line chart.
Here is an example using different types of data.
Helen sells choconutties.
Helen is interested in developing a new product to add to her line of choconutties.
She develops a questionnaire and asks a random sample of 50 of her customers
to fill it out.
She asks them their age and sex, how much they spend on groceries each week,
how many chocolate bars they buy in a week,
and which they like best out of dark, milk and white chocolate.
She asks them how satisfied they are with choconutties:
very satisfied, satisfied, not satisfied, very unsatisfied.
And she asks them how likely they are to buy a whole box
of 10 packets of choconutties.
Helen enters the data in a spreadsheet.
Each row has responses from one customer.
Each column contains the measurements or scores for one variable.
The type of chocolate preferred is nominal data.
This can be shown in a pie chart or bar chart.
We can summarise by saying that 46% of customers prefer dark chocolate,
40% prefer milk chocolate,
and 14% prefer white chocolate.
The measures of satisfaction and likelihood are ordinal level data.
These should not be shown in a pie chart.
The values should be put in a logical order in a column chart.
We could say that 32% are very satisfied with choconutties and 72% of people are satisfied or very satisfied.
The average satisfaction score comes to 2.06,
which could be interpreted as satisfied.
However, it is debatable whether it is sensible to calculate a mean satisfaction score.
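The 2.06 figure can be reproduced under some assumptions that the source does not state. The sketch below assumes the responses are coded 1 for very satisfied through 4 for very unsatisfied, and that all 50 customers answered. The counts of 16 and 20 follow from the stated 32% and 72%. The counts of 9 and 5 for the two lower categories are not given anywhere in the source and are chosen only because they reproduce the stated mean.

```python
import statistics

# Assumed coding (not stated in the source): lower score = more satisfied.
codes = {
    "very satisfied": 1,
    "satisfied": 2,
    "not satisfied": 3,
    "very unsatisfied": 4,
}

# 16 and 20 follow from the stated 32% and 72% of 50 customers.
# 9 and 5 are NOT given in the source; they are chosen to reproduce the 2.06 mean.
counts = {
    "very satisfied": 16,
    "satisfied": 20,
    "not satisfied": 9,
    "very unsatisfied": 5,
}

n = sum(counts.values())
mean_score = sum(codes[label] * counts[label] for label in counts) / n

# The median only uses the order of the categories, so it avoids
# the equal-interval assumption that the mean relies on.
scores = [codes[label] for label in counts for _ in range(counts[label])]
median_score = statistics.median_low(scores)

print(f"n={n}, mean={mean_score:.2f}, median={median_score}")
```

Under these assumptions both the mean and the median land on "satisfied", but the mean treats the step from 1 to 2 as the same size as the step from 3 to 4, which is exactly the point under debate.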
Age, amount spent on groceries
and number of chocolate bars are all interval/ratio data.
These can be displayed on bar charts or histograms.
We can say that for the customers in the sample,
the mean age is 38 years, the mean amount spent on groceries is $192,
and the mean number of chocolate bars bought per week is 3.3.
These are all meaningful summary statistics.
The type of analysis that is sensible for a given dataset
depends on the level of measurement.
You can find out more about this in the video, "Choosing the test".