SHIN TAKAHASHI
TREND-PRO, CO., LTD.

TABLE OF CONTENTS

PREFACE
OUR PROLOGUE: STATISTICS WITH HEART-POUNDING EXCITEMENT
1 DETERMINING DATA TYPES
  1. Categorical Data and Numerical Data
  2. An Example of Tricky Categorical Data
  3. How Multiple-Choice Answers Are Handled in Practice
  Exercise and Answer
  Summary
2 GETTING THE BIG PICTURE: UNDERSTANDING NUMERICAL DATA
  1. Frequency Distribution Tables and Histograms
  2. Mean (Average)
  3. Median
  4. Standard Deviation
  5. The Range of Class of a Frequency Table
  6. Estimation Theory and Descriptive Statistics
  Exercise and Answer
  Summary
3 GETTING THE BIG PICTURE: UNDERSTANDING CATEGORICAL DATA
  1. Cross Tabulations
  Exercise and Answer
  Summary
4 STANDARD SCORE AND DEVIATION SCORE
  1. Normalization and Standard Score
  2. Characteristics of Standard Score
  3. Deviation Score
  4. Interpretation of Deviation Score
  Exercise and Answer
  Summary
5 LET'S OBTAIN THE PROBABILITY
  1. Probability Density Function
  2. Normal Distribution
  3. Standard Normal Distribution
     Example I
     Example II
  4. Chi-Square Distribution
  5. t Distribution
  6. F Distribution
  7. Distributions and Excel
  Exercise and Answer
  Summary
6 LET'S LOOK AT THE RELATIONSHIP BETWEEN TWO VARIABLES
  1. Correlation Coefficient
  2. Correlation Ratio
  3. Cramer's Coefficient
  Exercise and Answer
  Summary
7 LET'S EXPLORE THE HYPOTHESIS TESTS
  1. Hypothesis Tests
  2. The Chi-Square Test of Independence
     Explanation
     Exercise
     Thinking It Over
     Answer
  3. Null Hypotheses and Alternative Hypotheses
  4. P-value and Procedure for Hypothesis Tests
  5. Tests of Independence and Tests of Homogeneity
     Example
     Procedure
  6. Hypothesis Test Conclusions
  Exercise and Answer
  Summary
APPENDIX: LET'S CALCULATE USING EXCEL
  1. Making a Frequency Table
  2. Calculating Arithmetic Mean, Median, and Standard Deviation
  3. Making a Cross Tabulation
  4. Calculating the Standard Score and the Deviation Score
  5. Calculating the Probability of the Standard Normal Distribution
  6. Calculating the Point on the Horizontal Axis of the Chi-Square Distribution
  7. Calculating the Correlation Coefficient
  8. Performing Tests of Independence
INDEX

PREFACE

This is an introductory book on statistics. The intended readers are:

• Those who need to conduct data analysis for research or business
• Those who do not necessarily need to conduct data analysis now but are interested in getting an idea of what the world of statistics is like
• Those who have already acquired general knowledge of statistics and want to learn more

Statistics is one of the areas of mathematics most closely related to everyday life and business. Familiarizing yourself with statistics may come in handy in situations like these:

• Estimating how many servings of fried noodles you can sell at a food stand you are planning to set up during a school festival
• Estimating whether you will be able to pass a certification exam
• Comparing the probability that a sick person will get better when medicine X is used with the probability when it is not used

This book consists of seven chapters. Each chapter is basically organized into the following sections:

• Cartoon
• Text explanation to supplement the cartoon
• Exercise and answer
• Summary

You can learn a lot just by reading the cartoon sections, but you will gain deeper understanding and knowledge if you read the other sections as well.

I would be very pleased if you start feeling that statistics is fun and useful after reading this book.

I would like to thank the staff in the development department of Ohmsha, Ltd., who offered me the opportunity to write this book.
I would also like to thank TREND-PRO, Co., Ltd. for turning my manuscript into a cartoon, as well as the scenario writer, re_akino, and the illustrator, Iroha Inoue. Last but not least, I would like to thank Dr. Sakaori Fumitake of the College of Social Relations at Rikkyo University, who provided me with invaluable advice while I was preparing the manuscript for this book.

SHIN TAKAHASHI

OUR PROLOGUE: STATISTICS WITH HEART-POUNDING EXCITEMENT

WELL, WELL. WELCOME TO OUR HOME SWEET HOME.
I'M HOME, RUI! SAY HELLO TO MR. IGARASHI. HE WORKS FOR ME.
YOU FLATTER ME!
MR. IGARASHI, WHAT KIND OF WORK DO YOU DO?
WELL, AS I WORK FOR THE SAME COMPANY AS YOUR FATHER...
TO EXPLAIN IT FULLY, I DO MARKET RESEARCH USING STATISTICS... BUT I GUESS MARKETING IS AN UNFAMILIAR WORD FOR A HIGH SCHOOL GIRL LIKE YOU. DID I MAKE IT TOO DIFFICULT?
SORRY, I HAVE NEVER HEARD OF IT.
YOU ARE HONEST! DO YOU KNOW WHAT STATISTICS IS? MAYBE YOU DON'T KNOW THAT EITHER.
ROUGHLY SPEAKING, STATISTICS IS A STUDY THAT ESTIMATES THE STATUS OF A POPULATION BY USING INFORMATION GAINED FROM SAMPLES.
LOOK AT TODAY'S PAPER. IT SAYS THAT "ACCORDING TO A CHOMAI TIMES SURVEY, THE CABINET APPROVAL RATING IS 39%."
I'VE NEVER BEEN CONTACTED BY THE CHOMAI TIMES ABOUT THE CABINET. HOW ABOUT YOU, MR. TAKATSU?
THAT'S WEIRD. YOU TWO BOTH HAVE THE RIGHT TO VOTE, DON'T YOU? NEITHER OF YOU WAS SURVEYED, BUT THE CABINET APPROVAL RATING IS IN THE PAPER.
THAT'S MY POINT. THAT'S WHERE STATISTICS COMES IN.
RUI, DO YOU KNOW HOW MANY VOTERS THERE ARE IN JAPAN?
RIGHT. YOU CAN GET THE PRECISE CABINET APPROVAL RATING IF YOU SURVEY EVERY SINGLE VOTER. HOWEVER, IT IS UNREALISTIC TO SURVEY EVERYONE. THERE ARE TOO MANY PEOPLE. THAT IS WHY ONLY A LIMITED NUMBER OF PEOPLE ARE SURVEYED.
DAD IS TORMENTING ME BY TALKING ABOUT HARD STUFF...
SEE, RUI? THE REAL GROUP THAT SHOULD BE SURVEYED IS CALLED A POPULATION. A GROUP MADE OF SAMPLES SELECTED FROM THE POPULATION IS CALLED A SAMPLE. THOSE ARE STATISTICAL TERMS.
WHAT HE IS SAYING IS... IN THE CASE OF THE APPROVAL RATING OF THE CABINET, THE POPULATION IS ALL VOTERS. IT SAYS THAT THE SURVEY WAS CONDUCTED WITH 2,000 PEOPLE, SO IN THIS CASE, THE SAMPLE IS THOSE 2,000 PEOPLE.
[Figure: SAMPLING, a sample drawn from the population]
IF POSSIBLE, I WANT TO EXAMINE THE POPULATION... BUT THAT IS TECHNICALLY IMPOSSIBLE. WHAT AM I GOING TO DO? HOW CAN I GET AN IDEA OF THE POPULATION'S STATUS? IT DOES NOT HAVE TO BE STRICTLY PRECISE, BUT IT HAD BETTER BE AS ACCURATE AS POSSIBLE.
THAT IS TOO DIFFICULT!
THAT'S WHERE STATISTICS COMES IN.
PLEASE! PLEASE TELL ME MORE.
WELL, MAYBE NEXT TIME.
I HAVE TO THINK OF SOME WAY TO GET CLOSE TO MR. IGARASHI... THINKING OF HIM MAKES ME FEEL HAPPY.
YES, THANKS TO YOU.
HERE YOU ARE, DAD.
DAD... COULD YOU HIRE A STATISTICS TUTOR FOR ME? SO I CAN LEARN MORE ABOUT YOUR JOB?
YOU? INTERESTED IN MY JOB?
I'LL HAVE A TUTOR TEACH YOU EVERY SATURDAY.
THANK YOU, DAD. THE TUTOR COULD BE ONE OF YOUR WORKERS. (LIKE MR. IGARASHI...)
THANKS FOR COMING. COME IN.
RUI! YOUR TEACHER IS HERE.
RUI, THIS IS MAMORU YAMAMOTO.
DAD... ISN'T IT MR. IGARASHI?!
MAMORU LIVES NEAR OUR HOUSE. HE IS GOOD AT TEACHING, TOO.
THIS IS A NIGHTMARE.
SHALL WE BEGIN, RUI?
MR. IGARASHI, I WORKED HARD TO MASTER STATISTICS!
GREAT! WHY DON'T YOU WORK WITH ME?

DETERMINING DATA TYPES

1. CATEGORICAL DATA AND NUMERICAL DATA

SO, MR. YAMAMOTO, WHAT MUST I LEARN FIRST?
A SIMPLE EXAMPLE WILL BE BETTER FOR A BEGINNER... OH! YOU HAVE THE WHOLE SERIES OF MELON HIGH SCHOOL STORY!
YES, IT'S MY FAVORITE COMIC.
I CONFESS, I KIND OF LIKE THIS COMIC TOO. BUT WHAT DOES IT HAVE TO DO WITH STATISTICS?
AHA, FOUND IT!

Melon High School Story Vol. 5 Reader Questionnaire
Q1. What is your impression of Melon High School Story Vol. 5?
    1. Very fun   2. Rather fun   3. Average   4. Rather boring   5. Very boring
Q2. Sex
    1. Female   2. Male
Q3. Age
    ____ years old
Q4. How many comics do you purchase per month?
    ____ titles
A Rina keychain will be given away to 30 lucky winners among those who send back this questionnaire!

WHAT ABOUT THIS QUESTIONNAIRE?
YOU MIGHT BE ABLE TO GET A RINA KEYCHAIN! (IF YOU ARE LUCKY ENOUGH...)
NOW, BACK TO STUDYING... JUST ANSWER THAT QUESTIONNAIRE.

QUESTIONNAIRE RESULTS
Respondent   Your impression of Melon High School Story
A            Very fun
B            Rather fun
C            Average
D            Rather boring
E            Rather fun
F            Very boring
G            Very fun
H            Rather fun
I            Average
J            Average

SUPPOSE THE RESULTS OF THE QUESTIONNAIRE LOOKED LIKE THIS.
FOR EXAMPLE, THE ANSWERS TO THE QUESTIONNAIRE CAN BE CATEGORIZED LIKE THIS.
[Figure: the questionnaire with its questions sorted into two groups. Q1 (impression) and Q2 (sex) CANNOT BE MEASURED; Q3 (age, e.g., 17 years old) and Q4 (comics purchased per month) CAN BE MEASURED.]
DATA THAT CANNOT BE MEASURED IS CALLED CATEGORICAL DATA, AND DATA THAT CAN BE MEASURED IS CALLED NUMERICAL DATA.*
* CATEGORICAL DATA IS ALSO SOMETIMES CALLED QUALITATIVE DATA, AND NUMERICAL DATA IS SOMETIMES CALLED QUANTITATIVE DATA.
THE FIRST QUESTION DOES NOT LOOK LIKE CATEGORICAL DATA...
I UNDERSTAND WHY YOU FEEL THAT WAY. THOUGH IT DOES NOT LOOK LIKE CATEGORICAL DATA, THIS IS INDEED DATA THAT CANNOT BE MEASURED.
[Figure: Q1's answer choices, "Very fun," "Rather fun," and so on, lined up on a scale]
THIS DATA CANNOT BE SEPARATED INTO EQUAL INTERVALS.
NOW, TELL ME YOUR... UMM. THEN, COULD YOU TELL ME YOUR HEIGHT?
WHEN YOU MEASURE HEIGHT, YOU...
SO THE GRADUATION ABOVE 151 CM IS 152 CM, AND IT LEADS TO 153, 154, AND SO ON, IN EQUAL INTERVALS.
IF THE INTERVALS BETWEEN EACH GRADUATION ARE EQUAL...
THIS IMPLIES THAT HEIGHT CAN BE MEASURED AND THUS IS A NUMERICAL DATA TYPE.
LET ME FIND SOME INFORMATION ON THE WEB...
"MELON HIGH SCHOOL STORY" WALLPAPER?!
I BELIEVE YOU HAVE PASSED GRADE PRE-2 OF THE TEST IN PRACTICAL ENGLISH PROFICIENCY BY THE SOCIETY FOR TESTING ENGLISH PROFICIENCY.
I MEAN, THE SO-CALLED STEP TEST.
GUESS WHICH TYPE OF DATA THE GRADES OF THE STEP TEST ARE.

THE STEP TEST GRADES
Grade     Requirements
Grade 1   Advanced university graduate level, vocabulary 10,000-15,000 words
Grade 2   High school graduate level, vocabulary 5,100 words
Grade 3   Junior high school graduate level, vocabulary 2,100 words
Grade 4   Intermediate junior high school level, vocabulary 1,300 words
Grade 5   Beginner junior high school level, vocabulary 600 words
(From the Society for Testing English Proficiency, http://www.eiken.or.jp)

LOOK AT THE DIFFICULTY OF THE STEP TEST GRADES. THERE ARE BIG DIFFERENCES IN THE REQUIRED VOCABULARY BETWEEN GRADES.
RIGHT. BUT VOCABULARY IS NOT THE ONLY DIFFERENCE BETWEEN GRADES. THERE ARE OTHER ASPECTS.
THUS, THE INTERVAL BETWEEN EACH GRADE IS NOT THE SAME. STEP TEST GRADES ARE IMMEASURABLE DATA. IN OTHER WORDS, THEY ARE CATEGORICAL DATA.
NOW, YOU SHOULD BE ABLE TO ANSWER THIS.
[Figure: Q1 of the questionnaire, with answers from "1. Very fun" to "5. Very boring"]
DO THE ANSWERS TO Q1 HAVE EQUAL INTERVALS?
THAT DEPENDS ON EACH PERSON'S TASTE... CATEGORICAL!
SO, LET ME GIVE YOU A QUIZ.
[Quiz panel about the print run of Melon High School Story]
VERY GOOD! NOW LET'S END TODAY'S LESSON.
WOULD YOU MIND IF... I ANSWER THIS QUESTIONNAIRE?
I'LL BE ABLE TO WORK WITH HIM SOON.
WAIT FOR ME, MR. IGARASHI! STATISTICS IS MORE FUN THAN I THOUGHT.

3. HOW MULTIPLE-CHOICE ANSWERS ARE HANDLED IN PRACTICE

As mentioned on page 25, the multiple-choice answers to the first question of the readers' questionnaire are categorical data. In practice, however, it is possible to handle such data as numerical data when processing consumer questionnaires and similar surveys. Two examples are below.

Very fun        = 5 points
Rather fun      = 4 points
Average         = 3 points
Rather boring   = 2 points
Very boring     = 1 point

Very fun        = 2 points
Rather fun      = 1 point
Average         = 0 points
Rather boring   = −1 point
Very boring     = −2 points

The same data is handled differently in theory and in practice. Keep in mind that data may be categorized differently in different situations.

EXERCISE
Determine whether each kind of data in the following table is categorical data or numerical data.

Respondent   Blood type   Opinion on sports drink X   Comfortable air conditioning temperature (°C)   100m track race record (seconds)
Mr./Ms. A                 Not good                    25                                               14.1
Mr./Ms. B                 Good                        24                                               12.2
Mr./Ms. C                 Good                        25                                               17.0
Mr./Ms. D                 Average                     27                                               15.6
Mr./Ms. E                 Not good                    24                                               18.4

ANSWER
Blood type and opinion on sports drink X are examples of categorical data. Comfortable air conditioning temperature and 100m track race record are examples of numerical data.

SUMMARY
• Data is classified as categorical data or numerical data.
• Some data, such as "very fun" or "very boring," is theoretically categorical data. However, in practice, it is possible to treat it as numerical data.

GETTING THE BIG PICTURE: UNDERSTANDING NUMERICAL DATA

WHAT ARE YOU READING?
I WAS LOOKING AT THIS MAGAZINE TO CHOOSE WHICH RESTAURANT TO GO TO.
YOU LIKE RAMEN?
WHAT DO YOU THINK ABOUT THIS TABLE?
YOU START THE LESSON SO SUDDENLY.
WELL, ONE THING I CAN SEE IS THAT THE PRICES VARY WIDELY. BUT THIS TABLE IS JUST A BUNCH OF NUMBERS. IT DOESN'T MEAN MUCH.
HOW CAN WE MAKE THIS TABLE MORE MEANINGFUL?
TO MAKE A GRAPH, WE FIRST MUST MAKE A FREQUENCY TABLE.
THERE IS A HUGE SHOPPING MALL CONSISTING OF 50 RAMEN SHOPS... AND ONLY RAMEN SHOPS.
SOMEHOW RUI HAS TURNED INTO AN ELEVATOR GIRL?
EACH SHOP SERVES ONLY ONE TYPE OF RAMEN, AND THE SHOPS ARE LOCATED ON DIFFERENT FLOORS ACCORDING TO THE PRICE OF THEIR RAMEN.
[Figure: floor guide. Each floor holds one price class (equal or greater/less than), from 500-600 yen on the first floor up to 900-1000 yen on the top floor.]
SUCH A GROUP DIVISION IS CALLED A CLASS IN STATISTICAL TERMS.
ON EACH FLOOR, THERE IS A SIGN INDICATING THE MIDDLE PRICE OF EACH CLASS. THE SECOND FLOOR IS CLASS 600-700 YEN, SO THE DISPLAY SAYS 650 YEN.
THIS IS CALLED A CLASS MIDPOINT.
SINCE THIS SHOPPING MALL PLACES EACH SHOP ON A DIFFERENT FLOOR ACCORDING TO PRICES, THE NUMBER OF SHOPS ON EACH FLOOR VARIES: 4 ON THE FIRST FLOOR, 13 ON THE SECOND FLOOR...
THE NUMBER OF SHOPS ON EACH FLOOR IS CALLED FREQUENCY.
THE FLOOR WITH THE MOST SHOPS IS THE THIRD FLOOR. THERE ARE 18 SHOPS.
NOW, TRY CALCULATING THE RELATIVE FREQUENCY OF SHOPS ON THE THIRD FLOOR.
WHAT'S THAT?
IT'S SOMETHING SIMILAR TO A PERCENTAGE. PERCENTAGE IS FAMILIAR TO YOU, ISN'T IT? RELATIVE FREQUENCY IS THE RATIO AGAINST THE TOTAL WHEN THE TOTAL IS CONSIDERED 1.

$$\text{relative frequency} = \frac{\text{the number of values included in a class}}{\text{the total number of values}}$$

THERE ARE 18 SHOPS ON THE THIRD FLOOR, AND THERE ARE 50 SHOPS IN THIS MALL...
VERY GOOD! THE RELATIVE FREQUENCY OF RAMEN SHOPS IN CLASS 700-800 YEN (IN OTHER WORDS, SHOPS WITH CLASS MIDPOINT 750 YEN) IS 18 ÷ 50 = 0.36. YOU GET THE PERCENTAGE BY MULTIPLYING BY 100, SO THIS IS EQUAL TO 36%.
OH, NO! THIS IS GETTING MATHEMATICAL...
TO SUMMARIZE WHAT I HAVE EXPLAINED UP TO THIS POINT:
[Table: 50 BEST RAMEN SHOPS FREQUENCY TABLE, with columns for class (equal or greater/less than, in yen), class midpoint, frequency, and relative frequency. The same table is reprinted as Table 2-1 in section 5.]
ARE YOU FOLLOWING?
SIGH... THIS IS ALL MATH AFTER ALL.
MAYBE THIS SEEMS DIFFICULT BECAUSE THERE ARE TOO MANY NUMBERS.
IT MAY BECOME EASIER IF WE USE A GRAPH. A BAR CHART CALLED A HISTOGRAM...
[Figure: two histograms based on the 50 BEST RAMEN SHOPS FREQUENCY TABLE. In the first, the vertical axis is frequency; in the second, the vertical axis is relative frequency. On both, the horizontal axis shows the price classes, labeled by their midpoints from 550 to 950 yen.]
OUR HORIZONTAL AXIS SHOWS THE VARIABLE, IN THIS CASE THE PRICE OF RAMEN. THE VERTICAL AXIS SHOWS THE FREQUENCY IN THE FIRST HISTOGRAM AND THE RELATIVE FREQUENCY IN THE SECOND HISTOGRAM.
THE WIDTH OF EACH BAR IS THE RANGE OF THE CLASS, AND THE CENTER OF EACH BAR IS THE CLASS MIDPOINT.
I FEEL LIKE I AM SORT OF BEGINNING TO... GRASP THE OVERALL IMAGE OF RAMEN PRICES.
TO "FEEL LIKE" YOU ARE GRASPING IT IS IMPORTANT. THE FREQUENCY TABLE AND HISTOGRAM EXIST TO GIVE YOU A BETTER SENSE OF ALL THE DATA.

2. MEAN (AVERAGE)

THE OTHER DAY, I WENT BOWLING WITH ALL THE GIRLS IN MY HOMEROOM CLASS.
WHAT?! ALL THE GIRLS IN YOUR HOMEROOM CLASS... THAT MUST BE A LOT.
I'LL KNOCK YOU OVER, YOU PINHEAD!
WELL, THERE WERE 18 OF US, SO WE FORMED 3 TEAMS OF 6 AND PLAYED.

RESULTS OF BOWLING TOURNAMENT
TEAM A            TEAM B            TEAM C
RUI-RUI   86      KIMIKO    84      SHINOBU  229
JUN       73      MEGUMI    71      YUKA      77
YUMI     124      YOSHIMI  103      SAKURA    59
SHIZUKA  111      MEI       85      KANAKO    95
TOUKO     90      KAORI     90      KUMIKO    70
KAEDE     38      YUKIKO    89      HIRONO    88

OH! THIS IS GOOD MATERIAL FOR TODAY'S CLASS.
IS THIS YOU, "RUI-RUI"? IS THAT YOUR NICKNAME?
YES, THEY CALL ME "RUI-RUI," AND I SCORED 86.
ROUGHLY SPEAKING, YOUR SCORE SEEMS TO BE AROUND THE AVERAGE FOR YOUR TEAM.
IF I AM ABOVE AVERAGE, YOU HAVE TO BUY ME A PIECE OF CAKE.
I'LL HAVE TO THINK ABOUT THAT. DO YOU UNDERSTAND THAT THE AVERAGE IS THE SCORE WE CAN EXPECT A PERSON ON YOUR TEAM TO RECEIVE?
OF COURSE I DO. THE AVERAGE IS THE MIDDLE SCORE OF A TEAM.
WHY DON'T WE TRY CALCULATING?
SINCE THE TOURNAMENT WAS PLAYED BETWEEN TEAMS, I GUESS YOU COMPARED THE SUM OF THE SCORES OF EACH TEAM. YOU GET THE AVERAGE BY DIVIDING THE SUM OF THE SCORES BY THE NUMBER OF PLAYERS.

$$\text{Team A: } \frac{86 + 73 + 124 + 111 + 90 + 38}{6} = \frac{522}{6} = 87$$

$$\text{Team B: } \frac{84 + 71 + 103 + 85 + 90 + 89}{6} = \frac{522}{6} = 87$$

$$\text{Team C: } \frac{229 + 77 + 59 + 95 + 70 + 88}{6} = \frac{618}{6} = 103$$

THUS, YOUR TEAM'S AVERAGE IS 87.
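The three team averages above can be checked with a short program. The sketch below is our own illustration in Python, not the book's method; the book works these out by hand. It assumes the scores are listed in the same order as in the three sums above, and it uses the standard `statistics.mean` function only as a cross-check on the hand-written formula.

```python
from statistics import mean

# Bowling scores from the tournament results above (6 players per team)
scores = {
    "Team A": [86, 73, 124, 111, 90, 38],
    "Team B": [84, 71, 103, 85, 90, 89],
    "Team C": [229, 77, 59, 95, 70, 88],
}

def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

for team, values in scores.items():
    # The hand-written formula and the standard library should agree
    assert arithmetic_mean(values) == mean(values)
    print(f"{team}: sum = {sum(values)}, mean = {arithmetic_mean(values)}")
```

Running it prints a mean of 87.0 for Teams A and B and 103.0 for Team C, which matches the hand calculation.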
INSTEAD OF BUYING YOU CAKE, I'LL GIVE YOU A TIP.
YOU HAVE NO RIGHT TO CALL ME RUI-RUI.
THE AVERAGE IS CALLED THE MEAN IN STATISTICS. AND THE MEAN WE WERE TALKING ABOUT JUST NOW IS SOMETHING CALLED THE ARITHMETIC MEAN, TO BE PRECISE.
TOO BAD THIS TIP IS NOT AS DELICIOUS AS CAKE.
THERE ARE OTHER TYPES OF MEAN, SUCH AS THE GEOMETRIC MEAN AND THE HARMONIC MEAN. YOU DO NOT HAVE TO LEARN THE FORMULAS NOW, BUT I SUGGEST YOU REMEMBER THESE NAMES.

3. MEDIAN

[Table: RESULTS OF BOWLING TOURNAMENT, with each player's score. In Team C, one player scored 229 and the other five scored below 100.]
TEAM C'S AVERAGE IS ABOVE 100... BUT 5 PEOPLE SCORED BELOW 100.
I AGREE. IN CASES LIKE THIS, WHEN THERE IS A VALUE THAT IS EXTREMELY LARGE OR SMALL, I DON'T THINK YOU CAN REALLY SAY THAT THE AVERAGE IS "ROUGHLY THE SCORE OF EACH PERSON." HERE, IT IS MORE APPROPRIATE TO USE THE MEDIAN INSTEAD OF THE MEAN.
THE MEDIAN IS THE VALUE THAT COMES IN THE MIDDLE WHEN YOU PUT THE VALUES IN ORDER FROM SMALLEST TO LARGEST.
FIRST, WE SORT THE SCORES OF EACH TEAM BY SIZE.
[Figure: two examples on number lines. With an odd number of values, the middle value is the median. With an even number of values, the median is the average of the two middle values.]
IF THE NUMBER OF VALUES IS ODD, THE SCORE THAT IS IN THE MIDDLE IS THE MEDIAN. IF THE NUMBER OF VALUES IS EVEN, AS IN THE CASE OF THIS BOWLING GAME, THE AVERAGE OF THE TWO VALUES IN THE MIDDLE IS THE MEDIAN.
ONE MORE TIP IN REGARD TO THIS...
TRUE, YOU CAN'T EAT A TIP, BUT THIS ONE IS AS GOOD AS CAKE... ARE YOU SAVING ANY MONEY?
YES, I DO.
THEN YOU MUST WONDER WHY THE "AVERAGE SAVINGS" REPORTED IN NEWSPAPERS AND ON TV NEWS IS SO HIGH.
MY SAVINGS DON'T COME CLOSE, AND EVEN MY DAD DOES NOT SEEM TO BE THAT RICH.
THE AVERAGE IS HIGH BECAUSE OF SOME MILLIONAIRES.
YOU NEED NOT BE DISAPPOINTED THAT YOUR SAVINGS ARE MUCH LESS THAN THE AVERAGE. IN SUCH CASES, THE MEDIAN IS MUCH CLOSER TO A TYPICAL PERSON'S SAVINGS.
I MUST MARRY A RICH GUY WHOSE SAVINGS ARE WAY HIGHER THAN THE MEDIAN!

4. STANDARD DEVIATION

THIS TIME, LOOK AT TEAM A AND TEAM B. WRITE DOWN THE NAMES OF THE PLAYERS ON A NUMBER LINE ACCORDING TO THEIR SCORES.
[Figure: number lines for Team A and Team B, with each player's name placed at her score.]
EVEN THOUGH THE AVERAGE SCORE OF EACH TEAM WAS 87... THE TRENDS DESCRIBED BY THE NUMBER LINES ARE QUITE DIFFERENT.
THEY SURE ARE. TEAM A'S SCORES VARY FROM LOW TO HIGH, BUT TEAM B'S SCORES ARE MORE SIMILAR.
STANDARD DEVIATION IS USED TO DESCRIBE SUCH SCATTERING OF DATA. IN SHORT, STANDARD DEVIATION IS AN INDICATOR THAT SHOWS HOW FAR THE VALUES IN A SET TEND TO BE FROM THE MEAN OF THAT SET.
AN INDICATOR TO SHOW THE DIFFERENCE...?
THE MINIMUM STANDARD DEVIATION IS ZERO, AND AS THE "SCATTERING OF DATA" INCREASES, SO DOES THE STANDARD DEVIATION.
[Figure: a standard deviation scale starting at 0 (minimum), where there is no scattering and all values are the same; the standard deviation grows as the scattering grows.]
GUESS WHICH STANDARD DEVIATION IS LARGER, TEAM A'S OR TEAM B'S?
TEAM A'S?
RIGHT. THE FORMULA IS AS FOLLOWS.

$$\text{standard deviation} = \sqrt{\frac{\text{sum of }(\text{each value} - \text{mean})^2}{\text{number of values}}}$$

IT IS SUDDENLY STARTING TO SOUND LIKE MATHEMATICS.
IT'S EASY. YOU JUST PUT SOME NUMBERS INTO THE FORMULA. LET'S TRY IT TOGETHER.

$$\text{Team A: } \sqrt{\frac{(86-87)^2 + (73-87)^2 + (124-87)^2 + (111-87)^2 + (90-87)^2 + (38-87)^2}{6}} = \sqrt{\frac{(-1)^2 + (-14)^2 + 37^2 + 24^2 + 3^2 + (-49)^2}{6}} = \sqrt{\frac{4552}{6}} \approx 27.5$$

THEN TRY FIGURING OUT THE STANDARD DEVIATION OF TEAM B BY YOURSELF.
I PUT THIS NUMBER HERE, AND...

$$\text{Team B: } \sqrt{\frac{(84-87)^2 + (71-87)^2 + (103-87)^2 + (85-87)^2 + (90-87)^2 + (89-87)^2}{6}} = \sqrt{\frac{(-3)^2 + (-16)^2 + 16^2 + (-2)^2 + 3^2 + 2^2}{6}} = \sqrt{\frac{538}{6}} \approx 9.5$$

CORRECT!
STANDARD DEVIATION: TEAM A = 27.5, TEAM B = 9.5
MEMBERS OF TEAM B HAD SCORES SIMILAR TO EACH OTHER. THUS, THEIR STANDARD DEVIATION IS SMALLER THAN TEAM A'S.
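Here is a minimal Python sketch of the median and standard deviation calculations from sections 3 and 4, again using the bowling scores. The helper names `median_of` and `standard_deviation` are our own illustrative choices. So is the `ddof` switch, whose name is borrowed from NumPy's "delta degrees of freedom". With `ddof=0` the function divides by the number of values, which is the formula used in this lesson. With `ddof=1` it divides by the number of values minus 1. The standard library's `statistics.median`, `statistics.pstdev`, and `statistics.stdev` serve only as cross-checks.

```python
import math
from statistics import median, pstdev, stdev

# Bowling scores from the tournament results (6 players per team)
team_a = [86, 73, 124, 111, 90, 38]
team_b = [84, 71, 103, 85, 90, 89]
team_c = [229, 77, 59, 95, 70, 88]

def median_of(values):
    """Middle value after sorting; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def standard_deviation(values, ddof=0):
    """sqrt(sum of (value - mean)^2 / (n - ddof)).

    ddof=0 divides by the number of values (the formula used in the lesson);
    ddof=1 divides by the number of values minus 1.
    """
    m = sum(values) / len(values)
    squared = sum((v - m) ** 2 for v in values)
    return math.sqrt(squared / (len(values) - ddof))

print("Team C mean:", sum(team_c) / len(team_c), "median:", median_of(team_c))
print("Team A SD:", round(standard_deviation(team_a), 1))
print("Team B SD:", round(standard_deviation(team_b), 1))

# Cross-check against the standard library
assert median_of(team_c) == median(team_c)
assert math.isclose(standard_deviation(team_a), pstdev(team_a))
assert math.isclose(standard_deviation(team_a, ddof=1), stdev(team_a))
```

It prints a median of 82.5 for Team C. That is well below the team's mean of 103, which is pulled up by the single score of 229. It also prints standard deviations of 27.5 for Team A and 9.5 for Team B.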
I TOLD YOU THAT THE FORMULA FOR STANDARD DEVIATION IS:

$$\sqrt{\frac{\text{sum of }(\text{each value} - \text{mean})^2}{\text{number of values}}}$$

THERE'S ALSO A DIFFERENT FORMULA, WHICH IS:

$$\sqrt{\frac{\text{sum of }(\text{each value} - \text{mean})^2}{\text{number of values} - 1}}$$

YOU SUBTRACT 1 FROM THE TOTAL NUMBER OF VALUES.
GENERALLY SPEAKING... THE FIRST FORMULA IS APPLIED WHEN CALCULATING THE STANDARD DEVIATION OF AN ENTIRE STATISTICAL POPULATION. THE SECOND FORMULA IS APPLIED WHEN CALCULATING THE STANDARD DEVIATION OF A SAMPLE.
A POPULATION IS THE REAL GROUP YOU ACTUALLY WANT TO SURVEY, AND A SAMPLE IS A GROUP OF PEOPLE SELECTED FROM THE POPULATION.
GOOD MEMORY! THERE IS NO PROBLEM WHEN YOU CAN GET THE DATA FOR ALL THE MEMBERS OF A GROUP, AS WE DID WITH THE 6 MEMBERS OF YOUR BOWLING TEAM. BUT USUALLY THAT IS IMPOSSIBLE. THUS, THE SECOND FORMULA IS ACTUALLY USED MORE FREQUENTLY.
THIS IS THE END OF TODAY'S LESSON.

5. THE RANGE OF CLASS OF A FREQUENCY TABLE

If you felt that something was unclear in "Frequency Distribution Tables and Histograms" on page 32, take another look here at the table introduced on page 38.

TABLE 2-1: 50 BEST RAMEN SHOPS FREQUENCY TABLE
Class (yen, equal or greater/less than)   Class midpoint   Frequency   Relative frequency
500-600                                   550              4           0.08
600-700                                   650              13          0.26
700-800                                   750              18          0.36
800-900                                   850              12          0.24
900-1000                                  950              3           0.06
Sum                                                        50          1.00

As you can see, the range of class (the width of each class) in this table is 100 yen. The range was not determined according to any kind of mathematical standard; I set it subjectively. Determining the range of class is up to the person who is analyzing the data.

But shouldn't there be a way to set the range of class mathematically? A frequency table may seem invalid if its range is determined subjectively.

There is a way to figure out the range of class mathematically. It is explained on the following pages, along with a sample calculation using the data in Table 2-1.
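The Python sketch below reproduces the midpoints and relative frequencies of Table 2-1. It also reproduces the two Sturges' Rule steps that the text goes on to describe. It is our own illustration and rests on a few stated assumptions. The class counts come from Table 2-1, and the maximum (980 yen) and minimum (500 yen) come from Step 2. Rounding the class count to the nearest whole number is our assumption; it gives the 7 classes used in Step 2. The 50 individual prices are not listed, so the sketch cannot rebuild the counts in Table 2-2.

```python
import math

# Table 2-1: (lower bound, upper bound) in yen -> number of shops.
# A class includes its lower bound and excludes its upper bound.
table_2_1 = {
    (500, 600): 4,
    (600, 700): 13,
    (700, 800): 18,
    (800, 900): 12,
    (900, 1000): 3,
}

def frequency_table(classes):
    """Return rows of (class, midpoint, frequency, relative frequency)."""
    total = sum(classes.values())
    rows = []
    for (low, high), freq in classes.items():
        midpoint = (low + high) / 2
        rows.append(((low, high), midpoint, freq, freq / total))
    return rows

def sturges_class_count(n_values):
    """Sturges' Rule: 1 + log10(n) / log10(2), which equals 1 + log2(n)."""
    return 1 + math.log10(n_values) / math.log10(2)

def class_width(max_value, min_value, n_classes):
    """(maximum value - minimum value) / number of classes."""
    return (max_value - min_value) / n_classes

for (low, high), midpoint, freq, rel in frequency_table(table_2_1):
    print(f"{low}-{high}  midpoint {midpoint:.0f}  frequency {freq:2d}  relative {rel:.2f}")

k = sturges_class_count(50)                # 6.6438...
n_classes = round(k)                       # assumed rounding rule: nearest whole number -> 7
width = class_width(980, 500, n_classes)   # 68.57... -> about 69 yen
print(f"Sturges: {k:.4f} -> {n_classes} classes, width {width:.4f}")
```

The loop prints the relative frequencies 0.08, 0.26, 0.36, 0.24, and 0.06. The last line reports about 6.64 classes, rounded to 7, with a width of about 68.57 yen.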
Step 1
Calculate the number of classes using Sturges' Rule:

$$1 + \frac{\log_{10}(\text{number of values})}{\log_{10} 2} = 1 + \frac{\log_{10} 50}{\log_{10} 2} = 1 + 5.6438\ldots = 6.6438\ldots \approx 7$$

Step 2
Calculate the range of class using the formula below:

$$\frac{(\text{the maximum value}) - (\text{the minimum value})}{\text{the number of classes calculated from Sturges' Rule}} = \frac{980 - 500}{7} = \frac{480}{7} = 68.5714\ldots \approx 69$$

Below is a frequency table organized according to the range of class calculated with the formula in Step 2.

TABLE 2-2: 50 BEST RAMEN SHOPS FREQUENCY TABLE (RANGE OF CLASS DETERMINED MATHEMATICALLY)
Class (yen, equal or greater/less than)   Class midpoint   Frequency   Relative frequency
500-569                                   534.5            2           0.04
569-638                                   603.5            5           0.10
638-707                                   672.5            15          0.30
707-776                                   741.5            6           0.12
776-845                                   810.5            10          0.20
845-914                                   879.5            10          0.20
914-983                                   948.5            2           0.04
Sum                                                        50          1.00

What do you think of this? Does this table seem even less convincing than Table 2-1? And why is the interval 69 yen? If you try to explain to people that "this was calculated by a formula called Sturges' Rule," they will only get mad and say, "Who cares about Stur... whatever! Why did you set the interval to a weird amount like 69 yen?"

To summarize, some people may hesitate to set the range of class subjectively. However, as the table above indicates, determining the range of class with Sturges' Rule does not necessarily produce a convincing table. A frequency table is, after all, a tool to help you visualize data. The analyst should set the range of class to whatever amount he or she thinks is appropriate.

6. ESTIMATION THEORY AND DESCRIPTIVE STATISTICS

In the prologue, we explained that statistics estimates the status of a population based on information collected from samples. To tell the truth, this explanation is not entirely accurate.

Statistics can be roughly classified into two categories: estimation theory and descriptive statistics.
The one introduced in the prologue is the former. What, then, is descriptive statistics? It is a kind of statistics that aims to describe the status of a group simply and clearly by organizing data. Descriptive statistics regards the group itself as the population.

Perhaps this explanation of descriptive statistics is abstract and difficult to understand. Here is an example to help clarify things. Remember when I figured out the mean and standard deviation of Rui's bowling team? I was not trying to estimate the status of a population from the information collected from Rui's team. I calculated the mean and standard deviation purely because I wanted to describe the status of Rui's team simply. That kind of statistics is descriptive statistics.

EXERCISE AND ANSWER

EXERCISE
The table below is a record of a high school girls' 100m track race.

Runner   100m track race (seconds)
Ms. A    16.3
Ms. B    22.4
Ms. C    18.5
Ms. D    18.7
Ms. E    20.1

1. What is the average?
2. What is the median?
3. What is the standard deviation?

ANSWER
1. The arithmetic mean is

$$\frac{16.3 + 22.4 + 18.5 + 18.7 + 20.1}{5} = \frac{96.0}{5} = 19.2$$

2. Sorted from smallest to largest, the times are 16.3, 18.5, 18.7, 20.1, and 22.4, so the median is 18.7.

3. The standard deviation is

$$\sqrt{\frac{(16.3-19.2)^2 + (22.4-19.2)^2 + (18.5-19.2)^2 + (18.7-19.2)^2 + (20.1-19.2)^2}{5}} = \sqrt{\frac{(-2.9)^2 + 3.2^2 + (-0.7)^2 + (-0.5)^2 + 0.9^2}{5}} = \sqrt{\frac{8.41 + 10.24 + 0.49 + 0.25 + 0.81}{5}} = \sqrt{\frac{20.2}{5}} = \sqrt{4.04} \approx 2.0$$

SUMMARY
• To see the big picture of the data intuitively, create a frequency table or draw a histogram.
• When making a frequency table, the range of class may be determined by Sturges' Rule.
• To summarize the data mathematically, calculate the arithmetic mean, median, and standard deviation.
• When there is an extremely large or small value in the data set, it is more appropriate to use the median than the arithmetic mean.
• Standard deviation is an index that describes "the size of scattering" of the data.

GETTING THE BIG PICTURE: UNDERSTANDING CATEGORICAL DATA

1. CROSS TABULATIONS

YES!
AT LEAST UP TO TODAY.
DO YOU REMEMBER THAT CATEGORICAL DATA IS A TYPE OF DATA YOU CANNOT MEASURE?
LOOK, THIS IS OUR NEW UNIFORM DESIGN.
A PLAID SAILOR OUTFIT? THAT'S RATHER UNUSUAL.
WE CONDUCTED A SURVEY ON THE UNIFORM DESIGN IN OUR CLASS. HERE ARE THE RESPONSE RESULTS.
[Table: each student's response (LIKE, NEITHER, or DISLIKE) to the question DO YOU LIKE OR DISLIKE THE NEW UNIFORM DESIGN?]
THE RESULTS OF THIS SURVEY ARE CATEGORICAL DATA. YOU CAN'T "MEASURE" LIKES AND DISLIKES, RIGHT?
LET'S MAKE A TABLE TO GET THE BIG PICTURE OF ALL THE DATA.
[Table: RESPONSE (LIKE, NEITHER, DISLIKE), with the number of students giving each response]
THIS IS CALLED A CROSS TABULATION.
BY THE WAY, WHAT WAS YOUR ANSWER TO THIS SURVEY QUESTION?
THERE WERE 28 PEOPLE WHO ANSWERED "LIKE."
LET'S MAKE THIS INTO A GRAPH. IT SHOULD BECOME EASIER TO UNDERSTAND.
[Figure: graph of the cross tabulation]
THE GRAPH SHOWS THAT MORE THAN HALF OF THE STUDENTS ANSWERED THAT THEY "LIKE" IT. I GUESS THE DESIGN OF THIS UNIFORM WAS FAIRLY WELL RECEIVED.
WELL, IF I WERE ASKED, I WOULD ANSWER THAT I "LIKE" IT TOO.
DON'T WORRY, WE WILL NEVER ASK YOU THAT QUESTION.

EXERCISE AND ANSWER

EXERCISE
A newspaper took a survey on political party A, which hopes to win the next election. The results are below.

Respondent   Do you expect party A to win or lose against party B?
1            Lose
2            Lose
3            Lose
4            I don't know
5            Win
6            Lose
7            Win
8            I don't know
9            Lose
10           Lose

Make a cross tabulation from these survey results.

ANSWER
Below is the cross tabulation.
Response       Frequency   %
Win            2           20
I don't know   2           20
Lose           6           60
Sum            10          100

SUMMARY
One way to see the big picture of all the data is to make a cross tabulation.

STANDARD SCORE AND DEVIATION SCORE

TODAY'S LESSON IS IN A COFFEE SHOP. RUI'S FRIEND YUMI JOINS THE CLASS.
EXCUSE ME. AM I BOTHERING YOU TWO LOVEBIRDS?
NOTHING LIKE THAT.
SO, WHAT SHOULD I TEACH YOU TODAY?
SIR! PLEASE TEACH US ABOUT STANDARD SCORE!*
WE BOTH SCORED 90, BUT ON DIFFERENT TESTS.
* ADJUSTING TEST RESULTS BASED ON STANDARD SCORE IS COMMONLY KNOWN AS GRADING ON A CURVE.
BUT SOMEHOW, YUMI GOT A HIGHER ADJUSTED SCORE (AND STANDARD SCORE) IN CLASSICAL JAPANESE THAN I GOT IN ENGLISH.
THAT IS BECAUSE THE RANGE OF SCORES DIFFERS BETWEEN ENGLISH AND CLASSICAL JAPANESE.
[Table: each student's scores in English and Classical Japanese]
CALCULATE THE MEAN OF EACH SUBJECT.
ENGLISH: MEAN = 81.3
CLASSICAL JAPANESE: MEAN = 74.3
[Figure: each subject's scores on a number line, with the mean and the score of 90 marked]
COMPARE EACH SCORE'S DISTANCE FROM THE AVERAGE: YOUR SCORE OF 90 AND HER SCORE OF 90.
THIS IS SO DISCOURAGING...
I'LL BUY A PIECE OF CAKE FOR EACH OF YOU.
AND MY HISTORY SCORE AND HER BIOLOGY SCORE WERE THE SAME, BUT OUR ADJUSTED SCORES WERE DIFFERENT IN THIS CASE AS WELL.
WHAT IS THE STANDARD DEVIATION OF THESE SUBJECTS?
WELL, STANDARD DEVIATION IS... "RANGE OF SCATTERING"!

$$\sqrt{\frac{\text{sum of }(\text{each value} - \text{mean})^2}{\text{number of values}}}$$

THE SMALLER THE STANDARD DEVIATION IS, THE SMALLER THE "RANGE OF SCATTERING" OF THE DATA... SO, YOUR CLASSMATES HAD MORE SIMILAR SCORES IN BIOLOGY THAN IN HISTORY.
IF I WERE A HIGH SCHOOL JUNIOR APPLYING FOR COLLEGE, I'D STUDY HARD FOR BIOLOGY.
WHAT DO YOU MEAN?
ONE OR TWO POINTS MAY AFFECT YOUR RANK GREATLY.
A HIGH SCHOOL UNIFORM SUITS HIM SO WELL...
I UNDERSTAND... OUR SCORES WERE THE SAME, BUT YOUR SCORE IN BIOLOGY WAS MORE VALUABLE...
IT WOULD BE SO ANNOYING TO COMPARE SCORES LIKE THIS.
THAT IS WHY NORMALIZATION WAS INTRODUCED.
IT IS ALSO CALLED STANDARDIZATION.
NORMALIZATION?
IT IS A CALCULATION USING THE DISTANCE FROM THE MEAN AND THE STANDARD DEVIATION OF THE DATA... THAT MAKES IT EASIER TO FIGURE OUT HOW MUCH A SCORE IS WORTH.
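The dialogue describes normalization only in words, as a calculation that combines a value's distance from the mean with the standard deviation. A common way to write that calculation is the standard score, (value − mean) ÷ standard deviation. We assume here that this is the form the chapter goes on to use. The English and Classical Japanese score tables above are not legible, so the sketch below reuses the Chapter 2 bowling scores as an illustrative example instead. In those scores, one player on Team A and one on Team B each scored 90. The function name `standard_score` is our own.

```python
from statistics import mean, pstdev

# Bowling scores from Chapter 2 (both teams have a mean of 87)
team_a = [86, 73, 124, 111, 90, 38]
team_b = [84, 71, 103, 85, 90, 89]

def standard_score(value, values):
    """(value - mean) / standard deviation, using the divide-by-n standard deviation."""
    return (value - mean(values)) / pstdev(values)

# One player on each team scored 90
print(f"90 in Team A: {standard_score(90, team_a):.2f}")
print(f"90 in Team B: {standard_score(90, team_b):.2f}")
```

In Team A, a raw score of 90 gives a standard score of about 0.11. In Team B, the same score gives about 0.32, because Team B's scores are much less scattered (standard deviation 9.5 versus 27.5). This mirrors Rui and Yumi's situation: equal raw scores can be worth different amounts in different groups.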
