
UNIT 6 DATA MANAGEMENT MATH 421A 15 HOURS

Revised June 1, 00

100

UNIT 6: Data Management

Previous Knowledge

With the implementation of APEF Mathematics at the Intermediate level, students should be able to:

Grade 7
- distinguish between biased and unbiased sampling
- select appropriate data collection methods
- construct a histogram
- read and make inferences for data displays
- determine measures of central tendency
- create and solve problems using the numerical definition of probability
- identify all possible outcomes of two independent events

Grade 8
- develop and apply the concept of randomness
- construct and interpret box and whisker plots
- determine the effect of variations in data on the mean, median and mode

Grade 9
- determine probabilities involving dependent and independent events
- determine theoretical probabilities of compound events

Overview:
- Sampling Techniques and Bias
- Measures of Central Tendency and 50% Box Plots
- 90% Box Plots and Applications
- Probability and Applications (Expected Values)

101

SCO: By the end of grade 10 students will be expected to:

F1 design and conduct experiments using statistical methods and scientific inquiry

F2 demonstrate an understanding of concerns and issues that pertain to the collection of data

F12 draw inferences about a population/sample and any bias that can be identified

F14 demonstrate an understanding of how the size of a sample affects the variation in sample results

G5 develop an understanding of sampling variability

Elaborations - Instructional Strategies/Suggestions

Sampling Techniques (8.1)

Invite student groups to explore the following questions: If you want to know what percent of high school students on PEI know the capitals of the Canadian provinces, how would you do this and who would you ask? Would the results represent the views of the entire grade 10 population?

Class discussion might touch on these topics: What does the term population mean? Is it reasonable to survey the entire population? If the response is no, then how do we select a representative sample to be surveyed?

The concept of bias should be introduced at this point. Bias is some influence that prevents the sample from being representative of the entire population. Challenge student groups to determine possible ways to select a biased sample (ex. the sample selected could be only grade 12 Canadian Studies classes). Invite students to explore ways of selecting an unbiased sample. Students should read pp.365-367 in Math Power 10.

Probability sampling
- simple random: every member of the population has an equal chance of being selected. Ex: All students' names are put in a hat and 30 are selected.
- systematic: every nth member of a population is selected. Ex: If the school population is 630 and you want to select a sample of 30 students, 630 ÷ 30 = 21. Therefore, in an alphabetical student list, select every 21st student.
- stratified: the population is divided into groups, or strata, from which random samples are taken. Ex: The school is divided into grades and you want 30 people. Randomly pick 10 people from each grade.
- cluster: choose a random sample from one group within a population. Ex: The school is subdivided by classes. A class is chosen randomly and all members are selected.

Non-Probability sampling (not random)
- convenience: no thought or effort has been put into selecting the sample. It is designed to be convenient for the sampler. Ex: Samplers survey their friends at the cafeteria table.
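The four probability sampling methods described above can be sketched in Python for teachers who want to demonstrate them with technology. This is only an illustrative sketch: the numbered list of 630 students, the three equal grade groups, the 21 classes of 30 and all function names are assumptions made for the example, and the random starting point in the systematic sample is an added detail (the text simply selects every 21st student).

```python
import random

def simple_random(population, k, rng):
    # Every member has an equal chance of being selected (names in a hat).
    return rng.sample(population, k)

def systematic(population, k, rng):
    # Select every nth member; the random starting point is an assumption.
    step = len(population) // k          # 630 // 30 = 21
    start = rng.randrange(step)
    return population[start::step][:k]

def stratified(strata, per_stratum, rng):
    # Take a random sample of the same size from each group (stratum).
    return [m for group in strata for m in rng.sample(group, per_stratum)]

def cluster(clusters, rng):
    # Choose one group at random and select all of its members.
    return list(rng.choice(clusters))

rng = random.Random(1)                   # fixed seed so the run is repeatable
students = list(range(1, 631))           # hypothetical alphabetical list of 630 students
grades = [students[0:210], students[210:420], students[420:630]]   # assumed 3 grades
classes = [students[i:i + 30] for i in range(0, 630, 30)]          # assumed 21 classes

hat_sample = simple_random(students, 30, rng)
every_21st = systematic(students, 30, rng)
by_grade = stratified(grades, 10, rng)
one_class = cluster(classes, rng)
print(every_21st)
```

Each call returns 30 students, so the samples can be compared directly in a class discussion about which method is easiest to carry out and which is most likely to be representative.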

102

Worthwhile Tasks for Instruction and/or Assessment

Sampling Techniques (8.1)

Journal/Pencil/Paper
A survey result indicates that most Canadians feel that the Senate is a waste of taxpayers' money. What are some of the questions you should ask about this survey? (Who was surveyed, and was it random across Canada? What age groups were surveyed? What socio-economic groups were surveyed?)

Pencil/Paper
Identify the population you would sample for an opinion on each topic:
a) minimum driving age
b) student parking spaces
c) fees for athletic teams
d) cafeteria food

Pencil/Paper
You intend to survey the school population to determine whether the students would attend another dance this month. Describe a sampling method for each sampling technique:
a) systematic
b) convenience
c) simple random
d) stratified

Presentation
Bring an example of a recent survey in a newspaper or magazine to class and discuss the validity of the survey. Was there bias in the survey question(s)? What sampling method do you think was used?

Project
Try to find out what company does the surveys during the election campaign and ask questions relating to bias and sampling methods.

Suggested Resources Sampling Techniques Mathpower 10 p.368 # 1,6,11,14,17, 21,24

103

SCO: By the end of grade 10 students will be expected to:

F12 draw inferences about a population/sample and any bias that can be identified

Elaborations - Instructional Strategies/Suggestions

Sampling Techniques (cont'd) (8.1)
- volunteers: members of a population choose to participate in a survey. Ex: Interested students volunteer to participate (mail-in or phone-in surveys fall under this category).

Various Types of Bias (8.2)
- Selection (Sampling) Bias: This type of bias is created by faulty sample selection. It generally would not happen in probability sampling procedures.
- Response Bias: This bias is created by faulty question or survey construction. In other words, the wording of the question influences the response. This can occur in all sampling techniques. Ex: In the question "Is it really fair that young people are not allowed to drive until they are 16?" the phrase "really fair" shows a bias in the question.
- Non-Response Bias: This bias is created when a large number of people do not complete a survey. Ex: Mail-out questionnaires commonly have a poor response. People do not mail them back; therefore, a bias is created because inferences are made on sketchy results.

G2 design yes/no type questions

F4 construct various displays of data

Measures of Central Tendency

Generate discussion to see what students' current knowledge is on mean, median and mode.

Mean: The arithmetic average.

Median: The middle number. Once the list is in ascending order, the median is the middle value. If there is an even number of values, the median is the average of the middle two. Half of the data is below the median and half of the data is above the median.
Ex: 1, 3, 4, 4, 6, 7, 8, 8, 8, 9, 10 (Median = 7)
Ex: 1, 2, 2, 3, 5, 5, 6, 6, 6, 7, 8, 9 (Median = 5.5)

Mode: The most frequently occurring number.
Ex: In the first list above the mode = 8 and in the second list the mode = 6.
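The two example lists above can be checked with a short Python sketch. It uses the standard `statistics` module; the variable names are chosen only for this illustration.

```python
from statistics import mean, median, mode

first_list = [1, 3, 4, 4, 6, 7, 8, 8, 8, 9, 10]
second_list = [1, 2, 2, 3, 5, 5, 6, 6, 6, 7, 8, 9]

for data in (first_list, second_list):
    # mean = arithmetic average, median = middle value, mode = most frequent value
    print("mean =", round(mean(data), 2),
          " median =", median(data),
          " mode =", mode(data))
```

Students can compare the printed medians and modes with the values they found by hand.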

Worthwhile Tasks for Instruction and/or Assessment

104

Various Types of Bias (8.2)

Journal
In a short paragraph describe in your own words the types of bias that can occur and give an example of each.

Group Activity
Study newspapers, magazines, TV commercials, etc. Find as many statements as possible that you feel are biased. Identify each one as a response, non-response, or selection bias.

Project
Contact a polling company and ask for copies of the questions used to survey political party popularity during the last election. Study the questions for any bias and determine the method of sampling.

Measures of Central Tendency

Pencil/Paper (See p.112 for explanation on constructing box plots)
Each student in the class picks a number from 1 to 10. Write the data from the entire class on the board and find the mean, median and mode. Draw a 50% box plot.

Pencil/Paper/Estimation
A random generator (TI-83) is used to generate 20 numbers from 1 to 100. Estimate the mean, median and mode from the data below. Calculate the mean, median and mode and relate these to your estimates. Draw a 50% box plot.
55 100 91 95 46 75 94 17 19 53
72 71 24 75 80 24 98 6 77 19

Pencil/Paper
Listed below are the heights, in centimetres, of 35 competitors in an Olympics event. Examine the data to determine the spread (range) of the data, where the data was centred, and if any extreme heights existed. Construct a 50% box plot on the data below.
190 175 180 185 192 195 187 167 175 185
183 184 180 183 185 188 189 187 198 183
184 185 181 185 184 182 185 189 187 184
180 175 178 195 189

Suggested Resources

Various Types of Bias: Mathpower 10 p.372 #1-13

Measures of Central Tendency

Note to teachers: To use the TI-83 as a random number generator, press Math, arrow over to PRB and choose 5:randInt(. For example, randInt(1,100,20) generates a group of 20 numbers from 1 to 100.
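For classes without a TI-83, a similar list of random numbers can be produced in Python. This is a minimal sketch; `rand_int` is a hypothetical helper named after the calculator command and is not part of any library.

```python
import random

def rand_int(lower, upper, count):
    # Return `count` random whole numbers from lower to upper, inclusive.
    return [random.randint(lower, upper) for _ in range(count)]

numbers = rand_int(1, 100, 20)
print(sorted(numbers))   # sorted so the median and mode are easier to estimate
```

The output changes on every run, so each group of students can work with its own data set.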

105

SCO: By the end of grade 10 students will be expected to:

Elaborations - Instructional Strategies/Suggestions

Measures of Central Tendency (cont'd)

For box plots we must look at the data in quarters, or quartiles.

Q1 (first quartile): the mid-value of the first half of the data (i.e. up to but not including the median).
Ex: 1, 3, 4, 4, 6, 7, 8, 8, 8, 9, 10 (Q1 = 4)

F5 calculate various statistics using appropriate technology, analyze and interpret displays and describe the relationships

Q3 (third quartile): the mid-value of the second half of the data (i.e. after the median).
Ex: 1, 3, 4, 4, 6, 7, 8, 8, 8, 9, 10 (Q3 = 8)

G4 interpret and report on the results obtained from surveys and polls, and from experiments

Once we have determined the median and the quartiles we can plot this data in a box plot. A box plot has 50% of the values inside the box. The left whisker represents the first quarter of the data and the right whisker represents the fourth quarter of the data.
Ex: 1, 3, 4, 4, 6, 7, 8, 8, 8, 9, 10
Now we have Q1 = 4, Median = 7, and Q3 = 8. For this example we will use a number line from 1 to 10 with a scale of 1.
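The five values needed for a 50% box plot can be computed with a short Python sketch. It follows the quartile method described in this unit (the median is left out of both halves when the number of values is odd); the function name is chosen for this illustration, and the handling of an even number of values (splitting the list into two equal halves) is an assumption, since the text only works an odd-length example. The sketch assumes at least two data values.

```python
from statistics import median

def five_number_summary(data):
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]      # first half, not including the median when n is odd
    upper = values[-half:]     # second half, after the median
    return min(values), median(lower), median(values), median(upper), max(values)

summary = five_number_summary([1, 3, 4, 4, 6, 7, 8, 8, 8, 9, 10])
print(summary)   # (minimum, Q1, median, Q3, maximum)
```

For the example list this gives a minimum of 1, Q1 = 4, median = 7, Q3 = 8 and a maximum of 10, matching the values found by hand.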

In general, more valid inferences can be made when the measures of central tendency are all close together. The more they are dispersed the less valid the inferences.

106

Worthwhile Tasks for Instruction and/or Assessment

Measures of Central Tendency

Pencil/Paper
An experiment to determine the effect of temperature on the speed of sound in air consisted of taking nine measurements at 10°C and nine at 22°C. The data is displayed below.

Suggested Resources Measures of Central Tendency

a) Draw a 50% box plot for each set of data.
b) What is the median speed at the lower temperature? At the higher temperature?
c) Between what two speeds do 50% of the data lie for each plot?
d) From your results, what do you think is the effect of an increase in temperature on the speed of sound in air?

Pencil/Paper
A survey of weekly television viewing time of 25 female and 26 male teenagers produced the following data.

a) Find the measures of central tendency (mean, median and mode).
b) What type of sampling technique would you assume was used?
c) What types of conclusions can you make about the survey?

107

SCO: By the end of grade 10 students will be expected to:

F4 construct various displays of data

Elaborations - Instructional Strategies/Suggestions

Measures of Central Tendency (cont'd)

Re-doing the previous example using the TI-83:
- Press Stat 1:Edit, clear all lists, then enter the data in L1.
- If the data must be arranged in ascending order, press Stat 2:SortA(L1), where A is ascending.

F26 construct, interpret and apply 90% box plots F30 organize and display information in many different ways with and without technology

To graph a 50% box plot, press 2nd Stat Plot 1:Plot 1 and choose the plot settings. The 4th graph choice doesn't connect the outliers to the box, while the 5th choice does. Typically we will be using the 5th choice. It is a box and whiskers plot with outliers.

To graph, set the appropriate window dimensions or press Zoom 9:ZoomStat.

Press Trace to see the minimum, Q1, the median, Q3 and the maximum by cursoring across the box plot.

108

Worthwhile Tasks for Instruction and/or Assessment

Measures of Central Tendency

Pencil/Paper/Technology
A teacher has the following results, in percent, in a class test.
76, 43, 56, 74, 96, 89, 55, 66, 49, 80, 85, 93, 95, 77, 96, 70, 98, 46, 78, 55, 76, 95, 95, 96, 52, 98, 73, 95, 81, 96, 59, 94, 44, 92, 96.
Sort the data in ascending order and draw a 50% box and whiskers plot.

Solution
Enter the data in the TI-83. Sort the data with Stat 2:SortA(L1). Graph the data on the TI-83. To see the graph, set the window dimensions by pressing Zoom 9:ZoomStat.

Suggested Resources Measures of Central Tendency

To see the mean, minimum, Q1, median, Q3 and the maximum, press Stat, arrow over to Calc, choose 1:1-Var Stats, press Enter and scroll down. Looking at the sorted data, determine the mode.

What inferences can be made from the graph? Half the class has a mark over the median of 80, and 1/4 of the class has a mark over Q3, which is 95. Because the median is 80 and the upper whisker is short, many students have very high marks. The lower whisker is long, which means that a few students with really low marks are dragging the mean down.

Note to teacher: If the box is really short then the middle 50% have marks very close together. If the box is long then there is a large range of marks in the middle 50% of students.

Communication/Journal
Make inferences about the following box plot.

The median is around 85%, and the short upper whisker shows that many marks are clustered there. The range of the upper half is very small, so the upper half of the class have marks very close together. The lower half has a greater range and thus a greater dispersion of marks. Marks in the upper half are high because the median is 85%.
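The class test data from the Pencil/Paper/Technology task above can also be summarized in Python to confirm the calculator results. This is a sketch only: it uses the quartile method from this unit (median excluded from each half), and the variable names are chosen for the illustration.

```python
from statistics import mean, median, mode

marks = [76, 43, 56, 74, 96, 89, 55, 66, 49, 80, 85, 93, 95, 77, 96, 70, 98, 46,
         78, 55, 76, 95, 95, 96, 52, 98, 73, 95, 81, 96, 59, 94, 44, 92, 96]

values = sorted(marks)
half = len(values) // 2
q1 = median(values[:half])       # mid-value of the lower half
med = median(values)
q3 = median(values[-half:])      # mid-value of the upper half

print("min", values[0], "Q1", q1, "median", med, "Q3", q3, "max", values[-1])
print("mean", round(mean(values), 1), "mode", mode(values))
print("lower whisker length", q1 - values[0], "upper whisker length", values[-1] - q3)
```

The upper whisker is much shorter than the lower one and the mean falls below the median, which supports the inferences given in the solution.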

109

SCO: By the end of grade 10 students will be expected to:

F26 construct, interpret and apply 90% box plots

Elaboration - Instructional Strategies/Suggestions

90% Box Plots

Binomial Population: A population that has two possible outcomes. In other words, in response to a question the answer is either YES or NO.
Ex: Toss of a coin
Ex: Did you pass your test?
Ex: Are you a band student?

90% box plots combine the results of many small samples of the population. These box plots then allow us to make inferences on the population as a whole, or backwards from population to sample. The box plots given are for sample sizes 20, 40, and 100.

Ex: In a school of 1000 students, a sample of 20 students is surveyed. This procedure is repeated 100 times and each time the 20 students are randomly chosen (not necessarily the same students). This gives us the data to create a 90% box plot for sample size 20.

In the above example, assume the population is known to be 70% enrolled in the English Program and 30% in French Immersion. When conducting a survey (as explained in the above paragraph) the following data is obtained and placed in a frequency table.

# marked      8   9  10  11  12  13  14  15  16  17  18  19
Frequency     1   2   6   2  14  11  21  17  15   8   2   1

In a 90% box plot, 10% of the values are contained in the two whiskers together. Out of 100 trials, 10% would be 10. In the table above we need to count frequencies from both ends until we are as close to 10 as possible. Working our way in from both sides, the closest we get to 10 is 12, which is obtained when using the first three columns on the left (1 + 2 + 6 = 9) and the last two columns on the right (2 + 1 = 3). The rest of the values are contained in the box.

Now would be a good time to show the students the entire 90% Box Plot table for sample size 20 and let them realize that all this work has generated only 1 of the box plots in this table. So instead of doing all this work, from now on use the tables provided.

In order to do the worthwhile tasks you will need to be able to read the box plot tables. Instructions are given in the Addison-Wesley 10 text, p. 548 and 556.

110

Worthwhile Tasks for Instruction and/or Assessment

90% Box Plots (population to sample, sample to population)

Group Activity/Paper/Pencil
Divide the class into groups and have each group create a 90% box plot based on a different percent of marked items.

Note to teachers: To generate the data using the TI-83 for a situation where 80% of the school population is enrolled in the English Program, press Math, arrow over to PRB and choose 7:randBin( (random binomial).

Suggested Resources

90% Box Plots: see worksheet at end of unit

Activity: Estimating the size of a wildlife population, Math 10 p.560. Instructions for this activity are at the back of the unit.

Math 10 p.561 # 1, 3-5

The arguments of randBin( are (sample size, probability, number of samples). In this example the sample size is 20, the probability is 80% and the sampling is repeated 100 times. 80% of 20 = 16, so we would expect that out of every 20 people surveyed, 16 would be in the English Program. The calculator generates 100 numbers with this restriction while taking into account that there is some uncertainty in the sampling process. In the first 20 people you survey it might happen that most (or very few) of them are in the English Program, so you may not have exactly 16 out of 20 in the English Program. If enough groups of 20 students are surveyed, the average should move closer to 16.

For the following problems and those in the Suggested Resources use the Box Plot tables at the end of this unit.

Pencil/Paper
20% of the school population take Canadian Studies. In a random sample of 20 students, what range of students might be taking Canadian Studies?

Pencil/Paper
If 34% of the student population regularly attends school dances, is it likely that a random sample of 40 students would contain 20 students who attend dances?

Pencil/Paper
In a random sample of 20 grade 10 students, 7 said they have a driver's license. Make an inference about the percent of grade 10 students who have a driver's license. (ex. Math 10 p.556)
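The randBin( simulation described in the note to teachers can also be sketched in Python. This is an illustration under stated assumptions: `rand_bin` is a hypothetical helper that imitates the calculator command by counting "yes" results in each simulated sample, and the seed value is arbitrary.

```python
import random

def rand_bin(sample_size, probability, samples, rng):
    # For each sample, count how many of the sample_size students are a "yes".
    return [sum(rng.random() < probability for _ in range(sample_size))
            for _ in range(samples)]

rng = random.Random(2024)                 # fixed seed so the run is repeatable
counts = rand_bin(20, 0.80, 100, rng)     # 100 samples of 20 students, 80% English Program

average = sum(counts) / len(counts)
print(sorted(counts))
print("average number in the English Program per sample of 20:", average)
```

Individual samples vary, but the average over the 100 samples should land close to the expected 16 out of 20. The sorted counts can be tallied into a frequency table like the one used to build the 90% box plot.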

Problem Solving Strategies Math Power 10 p.397 #1,3,6

111

SCO: By the end of grade 10 students will be expected to:

G10 find probability given various conditions

Elaborations - Instructional Strategies/Suggestions Probability (p.374) A simple way of introducing students to the study of probability is to do an activity like the following: Each card has a letter written on it

If the cards were placed in a hat, what is the chance (or probability) that you will draw (assume that after each draw the cards are replaced):
a) a vowel
b) a consonant
c) an E
d) an X

Now challenge the students to come up with a definition of probability.

Probability: The ratio of the number of favourable outcomes to the total number of possible outcomes. P(outcome) is the probability of getting that outcome. For example, when rolling a die, P(3) is the probability of rolling a 3, which equals 1/6.

Using a deck of 52 cards a person draws a jack.
a) What are the chances of drawing a second jack if the first jack has been replaced? (4/52 = 1/13) This is an example of an independent event. Events are independent when the outcome of one does not affect the probability of the other.
b) What are the chances of drawing a second jack if the first jack was not replaced? (3/51 = 1/17) This is an example of a dependent event.

Expected Values (8.3)

Have students play the game as described in example 2, p. 381. Students need to keep track of the number of rolls needed to win. The table included on p.110 at the end of this unit is to help students record their results with this activity. When students have completed the activity, record the number of rolls it took each student to win and then find the class mean (experimental solution). Now go through the solution to the example to calculate the expected value of each roll. Use the expected value to find the number of rolls expected to win (theoretical solution).
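The die and jack examples in the Probability section above can be checked with exact fractions in Python. This is a small sketch; the `probability` helper is a name chosen for the illustration.

```python
from fractions import Fraction

def probability(favourable, possible):
    # Ratio of favourable outcomes to the total number of possible outcomes.
    return Fraction(favourable, possible)

p_three = probability(1, 6)                # rolling a 3 on a die
p_jack_replaced = probability(4, 52)       # first jack put back: 4 jacks in 52 cards
p_jack_not_replaced = probability(3, 51)   # first jack kept out: 3 jacks in 51 cards

print(p_three, p_jack_replaced, p_jack_not_replaced)
```

The second and third values differ because removing the first jack changes both the number of jacks and the number of cards, which is what makes the second draw a dependent event.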

112

Worthwhile Tasks for Instruction and/or Assessment

Probability (p.374)

Pencil/Paper
A jack is drawn from a deck of 52 cards.
a) What is the probability of drawing a second jack from the deck if the first jack is replaced?
b) What is the probability of drawing a second jack from the deck if the first jack is not replaced?

Journal
How do independent and dependent events differ?

Expected Values (8.3)

Pencil/Paper
In a contest at a local coffee/donut store the prizes are as shown. What is the expected value for this contest?

Suggested Resources Probability Mathpower 10 Scrabble p.362#1, 2d,g, i-k Rock, Scissors, Paper all p.374 #1 do any three p.375 #3 a-c use chart p.381

Expected Values Mathpower 10 p.382 # 2-6,9,10,13 Math 10 p.575 # 1-6

= $0.94
If you spend more than $0.94 at the store then you will spend more than you win on average.

Pencil/Paper
At the Old Home Week Exhibition there is a game of chance where you toss 2 coins. If both come up heads you will win $4. If only one comes up heads you will win $1. If neither comes up heads they pay you nothing. It costs $2 to play this game. Complete the table below to determine the expected value for this game. Should you play this game?
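For teachers checking student tables, the expected value of the two-coin game can be worked out in Python. This sketch assumes both coins are fair, which the task implies but does not state; the variable names are chosen for the illustration.

```python
from fractions import Fraction

# Outcomes of tossing 2 fair coins: (probability, prize in dollars)
outcomes = [
    (Fraction(1, 4), 4),   # both heads
    (Fraction(1, 2), 1),   # exactly one head
    (Fraction(1, 4), 0),   # no heads
]
cost_to_play = 2

# Expected value = sum of (probability x prize) over all outcomes
expected_winnings = sum(p * prize for p, prize in outcomes)
expected_net = expected_winnings - cost_to_play

print(float(expected_winnings), float(expected_net))
```

Under these assumptions the expected winnings are $1.50 per game, which is less than the $2 cost to play, so a player loses $0.50 per game on average.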

Journal
Design a game where you will raise money for the school council during the winter carnival. (Make sure you don't lose money for the school but still give participants a reasonable chance of winning.)

113

# of rolls 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Sum

Points

Total

114

Estimating the size of a Wildlife Population (binomial population: tagged or not tagged)

To estimate the number of animals in a species, wildlife biologists use a capture-recapture sampling technique. To simulate this process popcorn can be used. Have approximately 100 kernels in each ziplock bag, where 10 in each bag have been spray painted black. (See Math 10 p.560 for detailed instructions.)

Students do not know how many popcorn are in the bag that they have. Don't allow them to count them yet; that is done in step 8.

1. Place the unmarked popcorn (natural colour) in a styrofoam cup. This represents the population at large.
2. Count the number of marked popcorn (black). This is the number captured and released.
3. Place the marked popcorn in the cup and mix the popcorn up. This represents the release of the captured animals into the wild, where they mix with the rest of the population.
4. Pick 40 popcorn from the cup (don't look, as this is the random sample). This is the recapture.
5. Count the number of marked popcorn. This represents the number of marked items in the sample.
6. Use the chart (sample size 40) to determine the percentage range of marked items in the population. For example, if there were 6 marked popcorn kernels, then by using the table we would get a percentage range of 8% to 26%.
7. Use the steps below to estimate the size of the entire population (the total number of kernels in the bag). Using the 8% to 26% range: we know that 10 kernels are marked, so the total population could range from 38 to 125.
   0.08n = 10, so n = 125
   0.26n = 10, so n ≈ 38
   Therefore there is a 90% probability that there are between 38 and 125 popcorn (marked and unmarked) in your bag.
8. Count the total number of popcorn in your bag. Does your prediction fall in an acceptable range?
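The calculation in step 7 can be sketched in Python. The function name is hypothetical, and the percentage range passed in (8% to 26%) is the example value read from the sample size 40 box plot table, not something the code computes.

```python
def population_range(marked_total, low_percent, high_percent):
    # Solve (percent / 100) * n = marked_total for n at each end of the range.
    smallest = round(marked_total * 100 / high_percent)
    largest = round(marked_total * 100 / low_percent)
    return smallest, largest

# 10 marked kernels; the table gives 8% to 26% marked in the population
print(population_range(10, 8, 26))
```

A higher percentage of marked kernels means a smaller total population, which is why the 26% end of the range gives the lower estimate of 38 and the 8% end gives the upper estimate of 125.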

115

Construction of box plots

If we look at 50% box plots, then 50% of the data (values) are contained in the box and the remaining 50% are contained in the two whiskers combined. For our example, 50% of 100 trials is 50. In the frequency table below (from p.106) we must try to get the two whiskers adding to as close to 50 as possible (it can't be less than 50). If we work inward from the outside columns in the table we see this development:

Column        1   2   3   4   5   6   7   8   9  10  11  12
# marked      8   9  10  11  12  13  14  15  16  17  18  19
Frequency     1   2   6   2  14  11  21  17  15   8   2   1

Combining the values of columns 1 and 12 we get a total of 2.
Adding columns 2 and 11 to the above total we get 6.
Adding columns 3 and 10 to the above total we get 20.
Adding columns 4 and 9 to the above total we get 37.
Now, as we approach 50 (the total we want), we will probably only be able to add one extra column at a time.
Adding column 5 to the above total we get 51.
If we had chosen to add column 8 to the above total instead, we would have gotten 54.
So we can see that the best result comes from adding column 5 last to get a total of 51. The whiskers therefore cover columns 1 to 5 and 9 to 12, and the box covers columns 6 to 8.

Column        1   2   3   4   5   6   7   8   9  10  11  12
# marked      8   9  10  11  12  13  14  15  16  17  18  19
Frequency     1   2   6   2  14  11  21  17  15   8   2   1

The same procedure of working from outside to inside is used for 90% box plots.
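The outside-to-inside procedure can be sketched in Python and applied to both the 50% and the 90% box plot. This is one possible reading of the procedure, not a definitive rule: the function name is hypothetical, and the tie-breaking choice in the single-column stage (take the smaller of the two candidate columns) is an assumption that happens to reproduce the worked examples in this unit.

```python
def whisker_columns(frequencies, whisker_total):
    left, right = 0, len(frequencies) - 1    # next unused column at each end
    total = 0
    # Add outside columns in pairs while a pair does not overshoot the target.
    while left < right and total + frequencies[left] + frequencies[right] <= whisker_total:
        total += frequencies[left] + frequencies[right]
        left += 1
        right -= 1
    # Then add one column at a time, taking the smaller of the two candidates (assumption).
    while total < whisker_total and left <= right:
        if frequencies[left] <= frequencies[right]:
            total += frequencies[left]
            left += 1
        else:
            total += frequencies[right]
            right -= 1
    # (columns in left whisker, columns in right whisker, whisker total)
    return left, len(frequencies) - 1 - right, total

frequencies = [1, 2, 6, 2, 14, 11, 21, 17, 15, 8, 2, 1]   # 8 to 19 marked, 100 trials

print(whisker_columns(frequencies, 50))   # 50% box plot: whiskers hold about 50 trials
print(whisker_columns(frequencies, 10))   # 90% box plot: whiskers hold about 10 trials
```

For the 50% box plot this gives five columns on the left and four on the right for a total of 51, and for the 90% box plot three columns on the left and two on the right for a total of 12, matching the results worked out by hand.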

116

90% Box Plot Problems

Population to Sample (given the % of population, find # possible in a sample)
1) 30% of students at Three Oaks take Physics. In a random sample of 20 students, estimate how many students could possibly be taking Physics.
2) At a certain school, 80% of the students take History. In a random sample of 40 students, estimate how many students might be taking History.
3) In the town of Montague 18% of people speak two languages. In a random sample of 100 residents, estimate how many people might speak two languages.
4) If 28% of 16 year-old people smoke, is it possible that a random sample of 40 people would contain 19 smokers?
5) The probability (chances) of correctly answering a true/false question is 50%. If you guess the answers, can you guess 24 out of 40 questions correctly, 90% of the time?
6) The probability of correctly guessing a multiple choice question (each question has 5 possible answers) is 20%. If you guess the answers, can you guess 15 out of 40 questions correctly, 90% of the time?

Sample to Population (given the # possible in a sample, find the % of the population)
1) 8 out of 40 randomly selected grade 10 students say that they have a part-time job. Make an inference about the percent of grade 10 students that have a part-time job.
2) In Westisle High School a survey showed that 12 out of 20 randomly selected students come from a farm home. Use the box plots to estimate the percent of students in Westisle who come from a farm background.
3) Bluefield has 900 students. A survey showed that 26 out of 40 students were bussed to school:
   a) make an inference about the percent of students who go to school by bus.
   b) use the answer from (a) to estimate how many students are bussed.

Project/Presentation
4) Design a one question yes/no survey about a topic of your choice. Conduct your survey with a random sample of 40 people. Use the results to make an inference about the percent of people who would answer yes on the survey question. Explain how you chose your random sample. Which method of sampling did you use? How were you able to eliminate bias in your question?
