te tauanga

statistics

MS3081 introduction to time series ncea level 3

2013/1


Expected time to complete work
This work will take you about 10 hours to complete.

You will work towards the following standards:
Achievement Standard 91580 (Version 1) Mathematics and Statistics 3.8
Investigate time series data
Level 3, External
4 credits

In this booklet you will focus on this learning outcome:
• investigating time series data with odd point moving means.

You will continue to work towards this standard in booklets MS3082 and MS3083.

Copyright © 2013 Board of Trustees of Te Aho o Te Kura Pounamu, Private Bag 39992, Wellington Mail Centre, Lower Hutt 5045, New Zealand. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means without the written permission of Te Aho o Te Kura Pounamu.


contents

1 Introduction to time series
2 First steps in smoothing the raw data
3 Using a spreadsheet to find the trend
4 Linear trend lines
5 First steps of writing a statistical report
6 Review activity
7


how to do the work

When you see:

1A Complete the activity.

Your teacher will assess this work.

Check the website.

You will need:
• access to the Internet to get on to the MS3000 OTLE website, where you will find the data sets you need and the spreadsheet answers
• any type of appropriate mathematical and statistical technology, such as a graphing calculator, netbook or tablet, that has a spreadsheet.

resource overview

In this booklet you will learn how to comment on the features of time series graphs. You will learn how to smooth the raw data using odd point moving means and then find the equation of the trend line. It is expected that you will use a spreadsheet for this work. In the following booklet, MS3082, you will learn about even point moving means.


introduction to time series

learning intentions

In this lesson you will learn to:
• identify the features in a time series
• relate the features to the context.

introduction

Do you:
• measure your heartbeat?
• log your sports performance?
• watch the share market?
If you do, then you’re already into time series.

features of time series

A time series is a set of observations or measurements made at regular intervals over a period of time. Daily temperatures, commodity prices, records of growth, population patterns and exchange rates are all examples of time series.


The graph of an electrocardiogram (ECG) is a time series graph.

Throughout the booklets on time series, you’ll be looking at various types of time series, learning the skills and techniques needed to interpret what has happened and forecast what may happen next. You’ll have a chance to apply the skills as you learn them, then in the last lesson you will put it all together and work through the assessment task.


features of time series graphs

Time series are made up of four components.

Two long-term:
T the overall trend
C the long-term cycle

and two short-term:
S the seasonal variation
R (or I) the residual; that is, any random irregularity.

Any or all of these components may be present in a time series. Let’s have a look at some time series graphs and identify the components.

the trend

T, the trend of a series is sometimes called the secular trend (secular means ‘over a long period of time’). The easiest way to discover a trend is to use a spreadsheet to graph the raw data and trend line. The trend may be a straight line or a curve. Try to think what the trend implies about the data.

[Graph: New Zealand Consumer Price Index (quarterly). Vertical axis: Price ($), 960 to 1020. Horizontal axis: Year, in quarters from Mar-92 to Dec-94.]

You can see from the graph that the trend shows a gradual increase in the cost of living from March 1992 to December 1994. The steeper slope of the graph line from June 1994 shows that the rate of increase was greater during the second half of 1994 than it was during the first half. The secular trend tells you that the cost of living is rising.


the cycle

Looking at the graph of a series is also the best way to pick up any long-term cycles, C, which are hard to spot from the data. For a cycle to be considered long-term, it has to be over two years in length. Anything shorter is called a seasonal variation. You’ll be able to work out the length of a cycle by finding the time between peaks or dips in the graph line. Here, you can see that the graph line peaks roughly every seven years.

[Graph: Sound and Sight Company share price. Vertical axis: Price (cents), 0 to 200. Horizontal axis: Year, 1965 to 2000.]

The Sound and Sight Company share price time series has a seven-year cycle. There’s also a long-term or secular trend showing an overall increase in the share price.


seasonal variation

S, the seasonal variations, are often known about before the data is collected. With their regular and recurring patterns, they are easy to spot from the graph. Look at the time axis (horizontal) to work out the period of the variations. The period is the length of time between two peaks (or troughs). It may be a week, a month or any fraction of a year. Even regular daily variations over the period of a day are classed as seasonal features, as in this example.

[Graph: Patient temperature chart May 13–16. Vertical axis: Temperature (ºC), 35 to 40, with normal marked at 36.8. Horizontal axis: Time, with readings at 6.00 am, 2.00 pm and 9.00 pm each day.]

You can see that the patient’s temperature rises to a peak at 9.00 pm, then falls steadily during the night to an early-morning low. This marked seasonal pattern diminishes as she recovers. Seasonal variations are often present in time series connected with climate or season.


residual effect

R, the residual effect, describes the irregular or random variations that occur in all kinds of time series. They show up as ‘spikes’ or ‘dips’ in the graph line and are often described as noise in a time series. They can be caused by unpredictable rare events (earthquake, fire, etc.), measurement error, or just plain random human behaviour. The residual components make it impossible to forecast future values with total accuracy. Who knows what spanner may get thrown in the works!

[Graph: Price of audio component ($). Vertical axis: Price ($), 0 to 350. Horizontal axis: Months, Mar-90 to Jul-93.]

The spike in the graph shows a temporary rise in price. A factory fire halted production of the component. This led to a shortage, which in turn pushed prices up temporarily.


Other irregular features you may see from time to time are:

[Diagrams: Ramps; Steps]

These cause the trend line to be displaced. They can be caused by:
• a change in the time period
• a change in collection method or reporting procedure
• opening or closing of a sales outlet
• a change in price
• a change in the law
• and so on.

[Graph: Quarterly price index for tobacco products (Base Dec 93, index 1000). Vertical axis: Price index, 200 to 600. Horizontal axis: Quarter, Dec-82 to Dec-88.]

In the July 1986 budget, there was a large increase in the tax on tobacco products. This is evident in the step feature indicating a dramatic rise in the price index recorded over the next quarter (three months).


reading time scales on graphs

Sometimes it can be confusing to know exactly when changes are occurring, especially when data points are not graphed separately, but joined by lines. For most data measured or collected, the time point stated usually means ‘up to the end of the period’.

This is a small part of a data set on mackerel caught in New Zealand waters.

Year    Tonnes
1977    18 492
1978    6 605
1979    7 589

Between the end of 1977 and the end of 1978, there were 6 605 tonnes of mackerel caught. That means that during 1978, there was a decrease of almost 12 000 tonnes. You can see this represented on the graph in activity 1B, question 3.
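The arithmetic behind ‘almost 12 000 tonnes’ can be checked with a short sketch. The figures come from the table above; the variable and function names are made up for this illustration.

```python
# Mackerel catches in tonnes, copied from the table above.
mackerel_tonnes = {1977: 18492, 1978: 6605, 1979: 7589}


def change_from_previous_year(catches, year):
    """Return the change in catch compared with the year before."""
    return catches[year] - catches[year - 1]


change_1978 = change_from_previous_year(mackerel_tonnes, 1978)
print(change_1978)  # -11887, a decrease of almost 12 000 tonnes
```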

identifying features in the data

Look at the time series graph of the raw data to identify features. Consider all four components of time series, although you may find that not all four are present. Often the long-term cycle, C, is not apparent. Make sure that you write your comments in context. This means that you will need to identify the variables and relate these to your comments. Your comments should be about the data set that you are investigating and not generic comments alone. You need to be able to interpret what you see in the graphs and any statistical summaries that you create from the spreadsheet. You are a data detective and your mission is to unravel the story that is hidden within and between data points.

1A

An ECG graph is made by recording the electrical impulses made while the heart is beating and printing these on a paper strip where the side of each small square represents 1 mm (each larger square is 5 mm in length). Each larger square represents 0.2 seconds. Look at the bottom time series in the ECG graph below. This is the rhythm strip. The data was collected over approximately five seconds and the voltage is measured on the vertical axis, with 10 mm being equal to 1 mV. Comment on the features of this time series by filling in the gaps in the passage below.


The variables for the ECG graph are 1. ________ measured in mV and time measured in 2. ________. There is a clear seasonal 3. ________ that begins with a small wave, a large spike and a second wave that is 4. ________ than the first. This indicates that the resting 5. ________ recorded from the heart makes a small 6. ________ and decays, then rapidly spikes by about 1 mV followed by an equally rapid 7. ________ that drops below the resting voltage. Finally there is a second slower increase of just under 8. ________ mV.

The trend is neither increasing nor 9. ________ so we say that the trend for voltage is constant over this 10. ________ period. As this data covers only a few seconds we are not able to determine the 11. ________ effect. The third spike is slightly lower than the others recorded on this graph but is probably not significant enough to consider as 12. ________.

To find out more about the ECG refer to the links on the MS3000 OTLE website.


1B

Describe the features you see in these time series graphs. Make sure that you write your comments in context.

1. [Graph: Yearly cigarette consumption in New Zealand. Vertical axis: Number (millions), 3000 to 6500. Horizontal axis: Year, 1980 to 1993.]

2. [Graph: Quarterly retail chemists' sales in New Zealand. Vertical axis: Sales ($m), 200 to 280. Horizontal axis: Year, in quarters from Mar-91 to Sep-94.]

3. [Graph: Mackerel and barracouta catches in New Zealand waters. Vertical axis: Tonnes, 0 to 45 000. Horizontal axis: Year, 1971 to 1991. Two series: mackerel and barracouta.]

4. Graph matching

Choose the correct title from the list at the end of this exercise, and write it above the appropriate graph.

[Five graphs, each showing Milk (10 000 L) on the vertical axis against Time period on the horizontal axis.]

1. No trend with random variation
2. Trend plus seasonal variation plus random variation
3. Perfect trend
4. Spike
5. Perfect trend plus seasonal variation

Work through the PPT ‘Belle’s Dairy Farm’ on the MS3000 OTLE site and use this to check your answers.

first steps in smoothing the raw data

learning intention

In this lesson you will learn to:
• smooth the raw data using odd point moving means.

introduction

Often it is hard to see the secular trend in a time series graph because of the fluctuations in the graph line. Smoothing the data makes the trend more evident.

moving means

It is easy to use a spreadsheet to calculate moving means. The advantages of this method are:
• it gives a smoother curve than the raw data
• it spreads the effect of extreme values.
We will use moving means in our analysis of time series. In this example, the average or mean of every three data values is found.

Death rate per thousand of young American males from 1912 to 1922

Year    Rate    Mean 3 years
1912    4.5
1913    4.7     4.5
1914    4.4     4.4
1915    4.2     4.4
1916    4.5     4.6
1917    5.0     7.2
1918    12.2    7.5
1919    5.3     7.4
1920    4.8     4.6
1921    3.8     4.1
1922    3.8

(5.0 + 12.2 + 5.3) ÷ 3 = 7.5

The smoothed rate for 1918 is the mean of the rates for 1917, 1918 and 1919. See how the effect of the ‘spike’ has been spread over the three years.

[Graph: Death rate per thousand of young American males. Vertical axis: Rate per thousand, 0.0 to 14.0. Horizontal axis: Year, 1912 to 1922. Two series: raw data and moving means (3).]
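The three-year means for the death rate data can also be reproduced outside a spreadsheet. The sketch below is only one way to do it: it uses the rates from the example above, and the function name is made up for this illustration.

```python
# Death rates per thousand for 1912 to 1922, from the example above.
years = list(range(1912, 1923))
rates = [4.5, 4.7, 4.4, 4.2, 4.5, 5.0, 12.2, 5.3, 4.8, 3.8, 3.8]


def three_point_means(values):
    """Mean of each value with its two neighbours, rounded to 1 d.p.

    The first and last values have only one neighbour, so they get no mean.
    """
    return [round(sum(values[i - 1:i + 2]) / 3, 1)
            for i in range(1, len(values) - 1)]


smoothed = three_point_means(rates)
for year, mean in zip(years[1:-1], smoothed):
    print(year, mean)  # the 1918 line reads: 1918 7.5
```

The spike of 12.2 in 1918 shows up as three raised means (1917, 1918 and 1919) instead of one extreme value.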

number of points – series without a seasonal component

The number of values the data is averaged over is often referred to as the number of points. So ‘a three-point moving average’ tells you that means of groups of three values have been used to smooth the data. We say that we calculate a moving mean of order three.

For time series without a seasonal pattern, the number of time periods chosen is not critical. But remember, the larger the number of time periods or points chosen for averaging, the more values you lose off the ends. With three points you lose one value from each end. With five points you lose two, and with seven you lose three. This can be a drawback if you don't have many data values.

If the number of periods averaged is too small, the graph may still have too many variations for the trend to be evident. If the number of periods is too large, many features of the data may be lost. You'll get the idea by looking at the example shown for the value of exports. The 12-point moving average smooths the data well, while the six-point moving average leaves quite large variations remaining.
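To see how many values are lost off the ends, here is a minimal sketch of a moving mean for any odd number of points. The function name and the data are made up for this illustration; a spreadsheet gives the same results.

```python
def moving_means(values, order):
    """Return the centred moving means of odd `order`; the ends are dropped."""
    if order < 1 or order % 2 == 0:
        raise ValueError("this sketch handles odd orders only")
    half = (order - 1) // 2  # number of values lost from each end
    return [sum(values[i - half:i + half + 1]) / order
            for i in range(half, len(values) - half)]


data = list(range(1, 12))  # 11 made-up values: 1, 2, ..., 11
for order in (3, 5, 7):
    smoothed = moving_means(data, order)
    lost_each_end = (len(data) - len(smoothed)) // 2
    print(order, len(smoothed), lost_each_end)
```

With 11 values, three points leaves 9 means, five points leaves 7 and seven points leaves 5, so one, two and three values are lost from each end.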

series with a seasonal pattern

For time series with a seasonal pattern, the number of time periods chosen for averaging is usually the number of observations in one cycle of the raw data. For quarterly raw data, one cycle has four pieces of data so a moving mean of order four (four-point moving mean) is used. For data collected on each of seven days of a week, one cycle is made up of seven pieces of data so a seven-point moving mean is used. We would say that we calculate a moving mean of order seven.

[Graph: Value of exports: fish, crustaceans, dairy and other animal products. Vertical axis: Value, 100 to 500. Horizontal axis: Mar–85 to Mar–95. Key: monthly, 6 pt moving mean, 12 pt moving mean.]

2A

In the chart below, means of five adjacent data values have been found. (The means have been rounded to the nearest whole number.) Complete the chart by calculating the missing five-point means, d, e and f.

Thickness of the ozone layer above New Zealand (measured in Dobson Units in July each year)

Year    Data    Mean 5
1974    289
1975    305
1976    300     305
1977    326     309
1978    307     d
1979    308     309
1980    313     308
1981    292     311
1982    320     312
1983    322     311
1984    313     306
1985    306     300
1986    271     e
1987    290     290
1988    292     287
1989    292     288
1990    289     f
1991    276
1992    291

using a spreadsheet to find the trend learning intention

In this lesson you will learn to:
• use a spreadsheet to find the trend.

introduction


the trend

One of the main uses for time series analysis is to make a forecast. We need to graph the raw data and the moving means, and this is easier when using a spreadsheet. Once the raw data and the moving means have been graphed on the same axes, it is a small step to find the equation of the trend line. The trend equation is based on the smoothed data only, so it is vital to ensure that only the graph of the moving means is highlighted. In Excel, once the graph of the moving means is highlighted, a dropdown menu gives a choice of models to fit as the trend and gives the option to find its equation. This booklet will focus on a linear trend so you will choose the linear model from the dropdown menu. You will find help for using Excel on the MS3000 OTLE site. Contact your Mathematics and Statistics teacher if you need help with this.
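If you are curious about what a linear trend line does with the smoothed data, the sketch below fits a least-squares straight line, which is the usual method behind a spreadsheet's linear trend line. The function name and the moving means are made up for this illustration; they are not taken from any of the MS3000 data sets.

```python
def fit_line(xs, ys):
    """Least-squares gradient and intercept for y = gradient * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


# Hypothetical moving means for time periods 2 to 6. The raw data is not used:
# the trend line is based on the smoothed values only.
periods = [2, 3, 4, 5, 6]
smoothed_values = [10.0, 9.5, 9.0, 8.5, 8.0]

gradient, intercept = fit_line(periods, smoothed_values)
print(f"y = {gradient:.2f}x + {intercept:.2f}")  # y = -0.50x + 11.00
```

Note that the x values are the time period numbers of the moving means, so the first period (which has no moving mean) is left out.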

3A

Absences data

1. Save the data set Absences data from the MS3000 OTLE site to your computer. Add your name to the beginning of the file name. Open the file by double-clicking on the left of the file name.

This data is a time series of the number of absences by Year 12 students over a four-week period. As the data were collected on each of the five days of the school week, we calculate moving means of order five. We say that the data has seasonality of order five.

Using your spreadsheet, you are to:
• calculate odd point moving means to smooth the data
• draw the graph of the raw data and the smoothed data on the same graph
• draw the trend line on the graph (this must be based on the smoothed data, not the raw data)
• comment on the features of this time series data
• find the equation of the trend line
• check the MS3000 OTLE website to mark and correct your work.

Make sure that you save your work regularly. (Press the save icon or use CTRL + S in Excel.)

Name all work clearly.

3. For the Absences data, write your equation of the trend line in the space provided.

y =

Check the website.

4. If your answer is not correct, comment on the changes that you think you need to make to your spreadsheet so that you get the correct answer.

It is best to correct your work before you send it to your teacher.


naming files

When you save a file to your computer, it’s best to personalise it. Make sure that the file name begins with your name so that your teacher can read your name without opening the file. Then write the name of the task and whether it is an assignment or assessment. Your Te Kura ID number is useful too. For example:
Jo Brown-10339906TimeSeriesAssignment.doc
Remember that your teacher gets lots of these and if they are all named ‘Retail data’ it would be a real problem to try to identify who each file belongs to.

How to name files (*for some computers):
• left-click to highlight the file name
• left-click again so that the box around the file name appears
• then hover over the name of the file with the mouse to the position you want and click again to place a cursor in the name box, then you can type what you want.
*NB: For Mac users don’t left/right-click.


linear trend lines

learning intention

introduction

A line of best fit can be drawn through the smoothed data, enabling us to see the trend clearly. This line is used for the first step when calculating a forecast. This lesson will focus on describing linear trends.

linear trend lines

Before we attempt to describe the gradient we note:
• the trend (it can often be described as linear in this course)
• the units on the y-axis – number of absences
• the units on the x-axis – day
• the numerical value of the gradient (rounding is usually sensible)
• whether the gradient is positive or negative
  – if positive, use increasing to describe the trend
  – if negative, use decreasing to describe the trend
• all comments need to be in context. This means that you need to write a sentence using the variables and units given with the data – in this case number of absences per day.

using the absences data

[Graph: Number of absences by day. Vertical axis: Number of absences, 0 to 35. Horizontal axis: Mon to Fri for each of four weeks. Key: Raw, cmm, Linear (cmm). Trend line equation shown on the graph: y = –0.0618x + 16.424]

I notice that:
• the trend is linear
• the units on the y-axis – number of absences
• the units on the x-axis – days of a five-day week
• the numerical value of the gradient is –0.0618
• the gradient is negative so use decreasing to describe the trend.


This means that the quantitative description of the gradient is: Absences for Year 12 students at this school are decreasing by approximately 0.0618 absences each day of a five-day week.

A secondary school is required by law to be open 190 days. We could multiply 0.0618 by 190, to give a yearly approximation (190 × 0.0618 = 11.74). Absences are decreasing by approximately 12 absences per year for a school year of 190 days.

Your answers must be written in context and rounded sensibly.
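The yearly approximation above is a single multiplication, sketched here so you can check the rounding. The gradient and the 190 days come from the text; the variable names are made up for this illustration.

```python
gradient_per_day = -0.0618   # from the trend line y = -0.0618x + 16.424
school_days_per_year = 190   # days a secondary school is required to be open

change_per_year = gradient_per_day * school_days_per_year
print(round(change_per_year, 2))  # -11.74
print(round(change_per_year))     # -12, a decrease of about 12 absences
```

The negative sign is what tells you to use the word decreasing in your sentence.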

4A

Sleep data

1. Save the file Sleep data from the MS3000 OTLE website to your computer. Add your name to the beginning of the file name. This time series data records the number of hours of sleep a Year 13 student had over a four-week period (moving means of order 7). Note that in this case you will have to leave three empty cells at the top of the moving average column and three empty cells at the bottom.

Remember the units must be correct and it is wise to round your answer sensibly.

You are to:
• calculate odd point moving means
• draw the graph of the raw data and the moving means
• draw the trend line on the graph
• comment on the features of this time series data
• find the equation of the trend line
• write a quantitative comment about the gradient.
Make sure that you save your work regularly.

2. Write your comments on the features of this time series here:

3. For the Sleep data, write your equation of the trend line in the space provided. y=

5. If your answer is not correct, comment on the changes you need to make:
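The note in activity 4A about leaving three empty cells at the top and bottom of the moving average column can be pictured with the sketch below. It builds a column the same length as the raw data, with empty entries where no seven-point mean exists. The hours of sleep here are made up and are not the Sleep data file; the function name is also just for this illustration.

```python
def moving_mean_column(values, order):
    """Moving means of odd `order`, padded with None so each mean lines up
    with the middle value of its group, as in a spreadsheet column."""
    half = (order - 1) // 2
    column = [None] * len(values)
    for i in range(half, len(values) - half):
        column[i] = sum(values[i - half:i + half + 1]) / order
    return column


# Made-up hours of sleep for two weeks (not the Sleep data file).
hours = [7, 8, 7, 6, 8, 5, 9, 7, 8, 6, 7, 8, 5, 9]
column = moving_mean_column(hours, 7)
for raw, mean in zip(hours, column):
    print(raw, "" if mean is None else round(mean, 2))
```

The first mean sits beside the fourth raw value because that value is the middle of the first group of seven.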

significant figures and rounding numbers

Computers and calculators will give us many more decimal places for a number than is sensible for the problem. If we are working with data from an experiment that has been recorded to three significant figures, is it sensible to give the answer to more than three significant figures? No, it is not. The third significant figure has itself arisen as a result of rounding and could contain errors. It is perhaps more sensible to round the final answer to two significant figures.

Think carefully about the number of decimal places you will use in your spreadsheets and always look at the number of significant figures in the original data as a guide. It is wise to consider how the data were gathered and why a given number of significant figures or decimal places was selected.


first steps of writing a statistical report

learning intentions

In this lesson you will learn to:
• present your report in a structured manner.

introduction

So that your report is cohesive, it is advisable to follow the structure outlined below:
• Introduction: the purpose of the investigation
• Data source: a brief description of the data and where it comes from
• Results from your data including analysis
• Discussion of your findings.
Remember the statistical enquiry cycle (PPDAC). You may wish to check the PPDAC poster on the MS3000 OTLE site.

5A

the report introduction

1. Begin your report with a statement describing the purpose of your investigation.

The following could be used as an introduction for the absences data used in chapter three. Complete the paragraph by filling in the gaps: ‘The purpose of this 1. _______________ is to determine the features of the 2. ___________ series formed by the number of 3. _____________________ by Year 12 students over a four-week period. I will find the 4. ____________ of the trend line after smoothing the raw 5. _________________ .’


the data

2. Describe briefly the data, where it comes from and state the variables used.

Here’s an example that could be used for the Absences data. Complete the paragraph by filling in the gaps: This data set I am using is of the 1. ________ of absences by Year 12 2. ________ over a four 3. ________ period for the five 4. ________ of a school week. The 5. ________ are days of the week for the independent variable and number of absences for the 6. ________ variable. I found this data on the Te Kura MS3000 7. ________ website.


review activity

learning intention

In this lesson you will learn to:
• investigate time series data with odd point moving means.

the hot stuff bakery

Save the file Hot Stuff Bakery data from the MS3000 OTLE website to your computer. Add your name and ID number to the beginning of the file name.

The Hot Stuff Bakery kept a record of bread sales for a four-week period. The Hot Stuff Bakery is a small bakery situated on the main highway on the outskirts of a small rural town. It specialises in bread, but also sells other bakery products. Carry out a time series statistical investigation using the Hot Stuff bakery data.

statistical investigation process

Undertake an investigation to determine what pattern there is to bread sales and use this to determine the trend.
• Select and use appropriate displays in your analysis and discuss these.
• Identify features in the data and relate these to the context.
• Find an appropriate model.

report structure

• Introduction: the purpose of the investigation.
• Data source: a brief description of the data and where it comes from.
• Results from your data including analysis.
• Discussion of your findings.

the results

Include all of the analysis you have carried out on the data set. Use the statistical graphs and numerical statistics you have obtained from the time-series graph, and make sure that you include clear headings in the spreadsheet and that your work is set out in a logical way. Refer to lessons 1–4 to review the concepts of:
• smoothing a time-series graph using odd point moving means
• using a spreadsheet to view the overall trend
• adding linear trend lines
• evaluating the value of the correlation coefficient and other features evident in the data set.


what to do now

Complete the cover sheet and self-assessment form and send with this booklet to your Te Kura Mathematics and Statistics teacher. Remember to include a printout of your spreadsheet with all of your working for this review activity.


7

1. introduction to time series

1A

1. voltage 2. seconds 3. effect 4. larger 5. voltage 6. increase 7. decrease 8. 0.5 mV 9. decreasing 10. time 11. cyclical 12. unusual

1B

Your answers could be different from these.

1. Until about 1984 the overall trend appears to be increasing slightly. After 1984 the trend is decreasing and it appears that it could be linear. This means that the consumption of cigarettes was decreasing from 1984 to 1993. There appears to be some kind of pattern with peaks and troughs and, as the time period is approximately three years, this indicates that there could be a cyclical effect, but it does not appear to be strong. There are no obvious spikes.

2. There appears to be a linear trend that is increasing. A clear seasonal pattern is present indicating that there is a strong seasonal effect. The peak of the last cycle could be higher than expected indicating that some kind of noise is present.

3. The large spike or irregularity in the barracouta catches was caused by foreign fishing vessels catching as much as possible before the introduction of the 200 nautical mile Exclusive Economic Zone (EEZ) to New Zealand waters in 1978. The effect is visible in the marked drop in mackerel catches during 1978, too. Neither graph has any obvious cyclical or seasonal component. The trend in the barracouta catches shows a slight overall increase since 1978, and mackerel catches are on the rise since 1980 and at a faster rate than the barracouta catch. More mackerel than barracouta were caught in 1991.

4. Check out the file Belle's Dairy Farm on the MS3000 OTLE website.


2. first steps in smoothing the raw data

2A

Year    Data    Mean 5
1974    289
1975    305
1976    300     305
1977    326     309
1978    307     311
1979    308     309
1980    313     308
1981    292     311
1982    320     312
1983    322     311
1984    313     306
1985    306     300
1986    271     294
1987    290     290
1988    292     287
1989    292     288
1990    289     288
1991    276
1992    291

d = 311
That is the mean of the five data values centred on 1978.
(300 + 326 + 307 + 308 + 313) ÷ 5 = 310.8 (311 to nearest whole number)
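The same calculation gives e and f. The sketch below checks all three missing means using the ozone data from the chart; the function and variable names are made up for this illustration.

```python
# Ozone thickness in Dobson Units for 1974 to 1992, from the chart above.
ozone = [289, 305, 300, 326, 307, 308, 313, 292, 320, 322,
         313, 306, 271, 290, 292, 292, 289, 276, 291]
first_year = 1974


def five_point_mean(values, start_year, centre_year):
    """Mean of the five values centred on `centre_year`."""
    i = centre_year - start_year
    return sum(values[i - 2:i + 3]) / 5


for label, year in (("d", 1978), ("e", 1986), ("f", 1990)):
    mean = five_point_mean(ozone, first_year, year)
    print(label, year, mean, round(mean))
```

This prints 310.8, 294.4 and 288.0, which round to the 311, 294 and 288 shown in the chart.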

3. using the spreadsheet to find the trend

3A

1. Check the Absences data spreadsheet answers on the MS3000 OTLE site.

2. There is a linear trend that is decreasing slightly. This means that over this four-week period, the number of absences is decreasing slightly.

Seasonal variation is present, with a peak in the number of absences for Year 12 students on Fridays and decrease in the number of absences on Wednesdays.

Random variation (residual effect) is present and is shown by the irregular pattern in the peaks and troughs. The data for the third week shows a trough that is greater than expected and a peak that is higher than expected. This means that on the third Wednesday there were fewer absences than expected whereas on the Friday there were more absences than expected.


4. linear trend lines

4A

1. Check the Sleep data spreadsheet answers on the MS3000 OTLE site.

2. There is a linear trend that is decreasing slightly.

Seasonal variation is present, with a decrease in the hours of sleep for Year 13 students on Saturdays followed by a peak in the hours of sleep for Year 13 students on Sundays.

Random variation (residual effect) is present and is shown by the irregular pattern in the peaks and troughs and in particular the second weekend showing a deeper trough and larger peak.

3. y = –0.0285x + 7.2609
If you did not get this answer, check the Excel file on the MS3000 OTLE site.

4. I notice that:
• the trend is linear
• the units on the y-axis – hours of sleep
• the units on the x-axis – days of a seven-day week
• the numerical value of the gradient is –0.0285
• the gradient is negative so use decreasing to describe the trend.

This means that: The hours of sleep for Year 13 students at this school are decreasing by approximately 0.0285 hours each day over this four-week period. This is better given as: the hours of sleep for Year 13 students at this school are ‘decreasing’ by approximately 1 minute and 43 seconds each day of this four-week period.
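Changing 0.0285 hours into minutes and seconds is a unit conversion, sketched below. The figure comes from the gradient above; the variable names are made up for this illustration.

```python
hours_per_day = 0.0285  # size of the gradient, in hours of sleep per day

total_seconds = round(hours_per_day * 3600)  # 3600 seconds in an hour
minutes, seconds = divmod(total_seconds, 60)
print(minutes, seconds)  # 1 43
```

Giving the answer as 1 minute and 43 seconds is easier for a reader to picture than 0.0285 hours.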

5. first steps of the report

5A

1. 1. investigation 2. time 3. absences 4. equation 5. data

2. 1. number 2. students 3. week 4. days 5. variables 6. dependent 7. OTLE


acknowledgments Every effort has been made to acknowledge and contact copyright holders. Te Aho o Te Kura Pounamu apologises for any omissions and welcomes more accurate information. Data sets: All from Stage 1 team, Department of Statistics, The University of Auckland except for the Hot Stuff Bakery data that is imaginary. Images: iStock: Cover: Generic chart – 18869948 Detective – 18432196 ECG graph (BW closeup) – 171684 Annual report – 6372295 Happy students attending class – 17757341 Student sleeping – 1739813 Bread – 17790922.


Student name:

Investigate time series data with odd point moving means.

Not yet attempted
Didn’t understand
Understood some
Understood most
Very confident in my understanding

Contact your teacher if you want to talk about any of this work. Freephone 0800 65 99 88

teacher use only
Please find attached letter
Teacher comment

cover sheet – ms3081

students – place student address label below or write in your details.
Full name
ID no.
Address (If changed)

authentication statement I certify that the assessment work is the original work of the student named above.

Signed

Signed

(Student)

(Supervisor)

for school use only assessment

www.tekura.school.nz

MS3081

Maths Statistics NCEA Level 3