# Chapter 1

Data and Statistics

1.1 Using Data to Answer Statistical Questions
1.2 Sample Versus Population
1.3 Using Calculators and Computers

What is Statistics?
To baseball fans, a ball player's statistics are the numbers on the back of the player's card. These numbers may be the player's hits.

To business people, statistics may refer to sales charges or business expenses. These statistics can be in the form of numbers or shown graphically using charts or diagrams.

Statistics is the art and science of designing studies, analyzing the data produced by these studies, and translating data into knowledge and understanding of the world around us.

Components of Statistics
• Design: Planning how to obtain data
• Description: Summarizing and analyzing the data obtained
• Inference: Making decisions and predictions
  • Estimation
  • Hypothesis Testing

Why is Statistics important?
• Health study: Does a low-carbohydrate diet result in significant weight loss?
• Market analysis: Are people more likely to stop at a Starbucks if they've seen a recent TV advertisement for their coffee?
• Heart health: Does regular aspirin intake reduce deaths from heart attacks?
• Cancer research: Are smokers more likely than non-smokers to develop lung cancer?

To search for answers to these questions, we need … Statistics!

Example: Harvard Medical School Study of Aspirin and Heart Attacks
Study participants were divided into two groups:
• Group 1: assigned to take aspirin
• Group 2: assigned to take a placebo

Results: The percentage of each group that had heart attacks during the study:
• 0.9% for those taking aspirin
• 1.7% for those taking placebo

Can you conclude that it is beneficial for people to take aspirin regularly? Yes?
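The two reported percentages can be compared directly. The sketch below is a descriptive comparison only: the slide does not give the group sizes, so it cannot show whether the difference is larger than chance variation would produce. The variable names are illustrative.

```python
# Heart attack rates reported in the study, written as proportions.
aspirin_rate = 0.009   # 0.9% of the aspirin group
placebo_rate = 0.017   # 1.7% of the placebo group

# Absolute difference between the two rates.
risk_difference = placebo_rate - aspirin_rate

# Ratio of the two rates (aspirin relative to placebo).
relative_risk = aspirin_rate / placebo_rate

print(f"Risk difference: {risk_difference * 100:.1f} percentage points")
print(f"Relative risk:   {relative_risk:.2f}")
```

A relative risk below 1 means the aspirin group had the lower heart attack rate. Deciding whether that difference carries over to the population is an inference question.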

Population & Sample
• Population: All subjects of interest
• Sample: Subset of the population from whom we have data
• Subjects: The entities (individuals, schools, rats, countries, or days) that we measure in a study

The information we gather with experiments and surveys is collectively called data (for example, the education level for each individual, or the average class size for each school).

Important! Question: Why consider a sample?

Example: Preferred Car Color
The purpose was to discover the typical car color preferred by Singapore residents.
• We could define the population as Singapore adult residents OR…
• The sample of residents we surveyed should be representative of the population.

Descriptive Statistics and Inferential Statistics
• Descriptive statistics refers to methods for summarizing the collected data. Summaries consist of graphs and numbers such as averages and percentages.
• Inferential statistics refers to methods of making decisions or predictions about a population based on data obtained from a sample of that population.

Example: Preferred Car Color
Descriptive statistics
• About 42% of residents preferred the color "silver"
• Pie chart
• Or…

Inferential statistics
• Singapore residents preferred blue cars. Correct?
• Or…

Example: Descriptive Statistics
Figure: Types of U.S. households, based on a sample of 50,000 households in the 2005 Current Population Survey.

Example: Inferential Statistics
Suppose we'd like to know what people think about controls over the sales of handguns.
• We can study results from a recent poll of 834 Florida residents.
• In that poll, 54.0% of the sampled subjects said they favored controls over the sales of handguns.
• We are 95% confident that the percentage of all adult Floridians favoring controls over the sales of handguns falls between 50.6% and 57.4%.
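The interval quoted for the poll can be reproduced with a short calculation. The slide does not say which method produced it; the sketch below assumes the standard large-sample interval for a proportion, with the multiplier 1.96 for 95% confidence.

```python
import math

n = 834        # sample size from the poll
p_hat = 0.540  # sample proportion favoring controls
z = 1.96       # assumption: normal multiplier for 95% confidence

# Standard error of the sample proportion.
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)

# Margin of error and interval endpoints.
margin = z * standard_error
lower, upper = p_hat - margin, p_hat + margin

print(f"95% confidence interval: {lower:.1%} to {upper:.1%}")
```

Under that assumption the endpoints round to 50.6% and 57.4%, matching the figures in the example.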

Class Problem #1
Inferential statistics are used:
1. To predict the sample data we will get when we know the population. X
2. To describe whether a sample has more females or males. X
3. To reduce a data file to easily understood summaries. X
4. To make predictions about populations using sample data. √

Parameter & Statistic
• A parameter is a numerical summary of the population.
• A statistic is a numerical summary of a sample taken from the population.
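The distinction can be illustrated with a small simulation. Everything in this sketch is hypothetical: the population of heights is generated for illustration, and in practice the parameter is usually unknown because the whole population cannot be measured.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: heights in cm for 10,000 subjects.
population = [random.gauss(170, 10) for _ in range(10_000)]

# Parameter: a numerical summary of the whole population.
parameter = statistics.mean(population)

# Statistic: the same summary computed from a random sample of 100 subjects.
sample = random.sample(population, 100)
statistic = statistics.mean(sample)

print(f"Population mean (parameter): {parameter:.1f}")
print(f"Sample mean (statistic):     {statistic:.1f}")
```

The two values are close but not equal, which is why a statistic is used to estimate a parameter rather than to replace it.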

Randomness and Variability
Random sampling allows us to make powerful inferences about populations. Randomness is also crucial to performing experiments well.
• Measurements may vary from person to person.
• Measurements may also vary from sample to sample: just as people vary, so do samples.
• Because larger samples vary less from one sample to the next, predictions will be more accurate for larger samples.
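Sample-to-sample variability can be seen by drawing many random samples from one population and comparing the results. The population below is hypothetical, and `spread_of_sample_proportions` is an illustrative helper, not part of any library.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

# Hypothetical population: 54% of 10,000 subjects favor a proposal (1 = favor).
population = [1] * 5400 + [0] * 4600

def spread_of_sample_proportions(sample_size, repeats=500):
    """Standard deviation of the sample proportion over repeated random samples."""
    proportions = [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(proportions)

small = spread_of_sample_proportions(50)
large = spread_of_sample_proportions(800)

print(f"Spread with samples of 50:  {small:.3f}")
print(f"Spread with samples of 800: {large:.3f}")
```

The sample proportions cluster more tightly around the population value when each sample is larger.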

Using Calculators and Computers
Using (and Misusing) Statistics Software and Calculators
• MINITAB and SPSS are two popular statistical software packages.

Using Technology
• The problem is that a computer will perform the statistical analysis you request whether or not its use is valid for the given situation. You, not technology, must select valid analyses.

Data files
• Large sets of data are typically organized in a spreadsheet format known as a data file.
• Each column contains measurements for a particular characteristic.
• Each row contains measurements for a particular subject.

Databases
• A database is an existing archived collection of data files.
• Not all databases give reliable information. Before you give credence to such data, verify that the data are from a trustworthy source.
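The row and column layout of a data file can be sketched in a few lines. The file contents and column names here are hypothetical, and Python's standard `csv` module stands in for a statistical package.

```python
import csv
import io

# Hypothetical data file: one row per subject, one column per characteristic.
data_file = io.StringIO(
    "subject,age,education_years\n"
    "A,34,16\n"
    "B,51,12\n"
    "C,27,18\n"
)

rows = list(csv.DictReader(data_file))

# Each row holds all measurements for one subject.
first_subject = rows[0]

# Each column holds one characteristic across all subjects.
ages = [int(row["age"]) for row in rows]

print(first_subject)
print(ages)
```

Reading a row gives everything recorded about one subject. Reading a column gives one characteristic that can then be summarized, for example with an average.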