Uploaded by CreativeHeuristics on 13.12.2011

Transcript:

Types of data: Nominal, Ordinal, Interval/Ratio

Data is central to statistical analysis

When we wish to find out more about a phenomenon or process we collect data.

Usually we collect several measures on each person or thing of interest.

Each thing we collect data about is called an observation.

If we are interested in how people respond,

then each observation will be a person.

OR an observation could be a business

or a product, or a period in time, such as a week.

Variables record the measurements we are interested in.

Age, sex and chocolate preference can all be stored as variables.

For each observation we record a score or value for each of the variables.

When we store this data in a spreadsheet or database,

each row corresponds to a single observation

and each column is a variable.
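
As a minimal sketch of this layout, the Python snippet below stores a tiny dataset as a list of rows. The three customers and their values are invented for illustration, and the helper name `column` is hypothetical.

```python
# Hypothetical mini-dataset: each dict is one observation (a row),
# and each key is a variable (a column). Values are invented.
observations = [
    {"age": 34, "sex": "F", "chocolate": "dark"},
    {"age": 51, "sex": "M", "chocolate": "milk"},
    {"age": 27, "sex": "F", "chocolate": "white"},
]


def column(rows, variable):
    """Return the values of one variable, one per observation."""
    return [row[variable] for row in rows]


ages = column(observations, "age")
print(len(observations), "observations")
print("age column:", ages)
```

Reading across a dict gives everything recorded for one customer; reading down a column gives one variable for all customers.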

Level of measurement

The level of measurement used for a variable

determines which summary statistics,

graphs and analysis are possible and sensible.

The Nominal level is the most basic level of measurement.

Nominal is also known as categorical or qualitative.

Examples of nominal variables

are sex,

preferred type of chocolate

and colour.

These are descriptions or labels with no sense of order.

Nominal values can be stored as a word or text or given a numerical code.

However, the numbers do not imply order.

To summarise nominal data we use a frequency or percentage.

You cannot calculate a meaningful mean or average value for nominal data.
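
The sketch below shows one way to summarise a nominal variable and why a mean is not sensible for it. The colour values and both numerical codings are made up for illustration; `frequency_table` is a hypothetical helper name.

```python
from collections import Counter


def frequency_table(values):
    """Map each category to (count, percentage of all values)."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}


# Invented responses for a nominal variable.
colours = ["red", "blue", "red", "green", "red", "blue", "red", "green"]
table = frequency_table(colours)
for cat, (n, pct) in table.items():
    print(f"{cat}: {n} ({pct:.1f}%)")

# Two equally arbitrary numerical codings of the same labels.
coding_a = {"red": 1, "blue": 2, "green": 3}
coding_b = {"red": 3, "blue": 1, "green": 2}
mean_a = sum(coding_a[c] for c in colours) / len(colours)
mean_b = sum(coding_b[c] for c in colours) / len(colours)
print(mean_a, mean_b)  # the "mean" changes with the coding
```

The frequencies stay the same whichever codes are used, but the mean depends entirely on the arbitrary choice of codes, so it carries no information about the data.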

The next level of measurement is ordinal.

Examples of ordinal variables are rank, satisfaction,

and fanciness!

Ordinal variables have a meaningful order,

but the intervals between the values in the scale may not be equal.

For example the gap between first and second runners in a race may be small,

whereas there is a bigger gap between second and third.

Similarly there may be a big difference between satisfied and unsatisfied,

but a smaller difference between unsatisfied and very unsatisfied.
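
The race example can be sketched in Python as follows. The runner names and finishing times are invented; the point is only that ranks keep the order and discard the size of the gaps.

```python
# Hypothetical finishing times in seconds (invented for illustration).
times = {"Ana": 61.2, "Ben": 61.5, "Cal": 68.0}

# Ranking keeps the order of the runners but not the gaps between them.
ordered = sorted(times, key=times.get)
ranks = {name: position for position, name in enumerate(ordered, start=1)}

gap_1_2 = times[ordered[1]] - times[ordered[0]]
gap_2_3 = times[ordered[2]] - times[ordered[1]]

print(ranks)
print(f"gap 1st to 2nd: {gap_1_2:.1f} s, gap 2nd to 3rd: {gap_2_3:.1f} s")
```

The ranks are evenly spaced (1, 2, 3) even though the underlying gaps are very different, which is why arithmetic on ordinal codes needs care.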

Like Nominal data, ordinal data can be given as frequencies.

Some people state that you should never calculate a mean or average for ordinal data.

However, it is quite common practice, particularly in research regarding

people's behaviour, to find mean values for ordinal data.

If you do this, think carefully about what the result means and whether it is justifiable.

The most precise level of measurement is interval/ratio.

This label includes things that can be measured rather than classified or

ordered,

such as number of customers,

weight, age and size.

Interval/ratio data is also known as scale, quantitative or parametric.

Interval/Ratio data can be discrete, with whole numbers

or continuous, with fractional numbers.

Interval/Ratio data is very mathematically versatile.

The most common summary measures

are the mean, the median and the standard deviation.
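
These three summary measures can be computed with Python's standard `statistics` module. The values below are invented counts of chocolate bars bought per week, not data from the video.

```python
from statistics import mean, median, stdev

# Invented interval/ratio data: chocolate bars bought per week.
bars_per_week = [2, 3, 3, 4, 8]

avg = mean(bars_per_week)       # arithmetic mean
mid = median(bars_per_week)     # middle value when sorted
spread = stdev(bars_per_week)   # sample standard deviation (n - 1 divisor)

print(f"mean={avg}, median={mid}, sd={spread:.2f}")
```

Note that `stdev` is the sample standard deviation; `statistics.pstdev` gives the population version.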

The way data should be represented in a graph or chart depends on the level of measurement.

Nominal data can be displayed as a pie chart,

column or bar chart

or stacked column or bar chart.

In most cases the best choice for a single set of nominal data

is a column chart.

Ordinal data must not be represented as a pie chart,

but is best shown as a column or bar chart.

Interval/ratio data

is best represented as a bar chart or a histogram.

For these the data is grouped.

Box plots illustrate the summary statistics for a variable in a neat way.

Data which occurs over time is best displayed as a line chart.

Here is an example using different types of data.

Helen sells choconutties.

Helen is interested in developing a new product to add to her line of choconutties.

She develops a questionnaire and asks a random sample of 50 of her customers

to fill it out.

She asks them their age and sex, how much they spend on groceries each week,

how many chocolate bars they buy in a week,

and which they like best out of dark, milk and white chocolate.

She asks them how satisfied they are with choconutties:

very satisfied, satisfied, not satisfied, very unsatisfied.

And she asks them how likely they are to buy a whole box

of 10 packets of choconutties.

Helen enters the data in a spreadsheet.

Each row has responses from one customer.

Each column contains the measurements or scores for one variable.

The type of chocolate preferred is nominal data.

This can be shown in a pie chart or bar chart.

We can summarise by saying that 46% of customers prefer Dark chocolate,

40% prefer milk chocolate,

and 14% prefer white chocolate.

The measures of satisfaction and likelihood are ordinal level data.

These should not be shown in a pie chart.

The values should be put in a logical order in a column chart.

We could say that 32% are very satisfied with choconutties

and 72% of people are satisfied or very satisfied.

The average satisfaction score comes to 2.06,

which could be interpreted as satisfied.

However it is debatable whether it is sensible to calculate a mean satisfaction score.
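
The figures above can be reproduced with a short Python sketch. The transcript does not give the coding or the full counts, so two things here are assumptions: the codes run from 1 (very satisfied) to 4 (very unsatisfied), and the 14 customers below "satisfied" split as 9 and 5, a split chosen because it yields the quoted mean of 2.06. The counts 16 and 20 follow from 32% and 72% of 50 customers.

```python
from statistics import median

# Assumed coding: 1 = very satisfied, 2 = satisfied,
# 3 = not satisfied, 4 = very unsatisfied.
# 16 and 20 follow from 32% and 72% of 50 customers; 9 and 5 are an
# assumed split chosen so that the mean matches the quoted 2.06.
counts = {1: 16, 2: 20, 3: 9, 4: 5}

n = sum(counts.values())
scores = [code for code, k in counts.items() for _ in range(k)]

pct_very = 100 * counts[1] / n
pct_top_two = 100 * (counts[1] + counts[2]) / n
mean_score = sum(scores) / n
median_score = median(scores)

print(f"very satisfied: {pct_very:.0f}%")
print(f"satisfied or very satisfied: {pct_top_two:.0f}%")
print(f"mean score: {mean_score:.2f}, median score: {median_score}")
```

Under these assumptions the median lands on "satisfied" without treating the codes as equally spaced, which is one reason some analysts prefer it to the mean for ordinal data.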

Age, amount spent on groceries

and number of chocolate bars are all interval/ratio data.

These can be displayed on bar charts or histograms.

We can say that for the customers in the sample,

the mean age is 38 years, the mean amount spent on groceries is $192,

and the mean number of chocolate bars bought per week is 3.3.

These are all meaningful summary statistics.

The type of analysis that is sensible for a given dataset

depends on the level of measurement.

You can find out more about this in the video, "Choosing the test".
