0% found this document useful (0 votes)

20 views32 pages

Summarizing Data Graphically and Numerically: Concepts in Statistics

The document outlines the steps of statistical investigation, including producing data, exploring it, and drawing conclusions. It distinguishes between categorical and quantitative data, describes methods for visualizing distributions such as dot plots and histograms, and explains measures of center and spread like mean, median, and standard deviation. Additionally, it discusses the identification of outliers and the use of boxplots for summarizing data distributions.

Uploaded by

serkan özel

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

20 views32 pages

Summarizing Data Graphically and Numerically: Concepts in Statistics

Uploaded by

serkan özel

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

Summarizing Data Graphically

and Numerically
Concepts in Statistics
Steps of Statistical Investigation
Begin with a research question, then
proceed with these steps:

1. Produce Data: Determine what to

measure, then collect data.

2. Explore the Data: Analyze and

summarize the data (also
called exploratory data analysis).

3. Draw a Conclusion: Use the data,

probability, and statistical inference
to draw a conclusion about the
population.
Categorical vs. Quantitative Data

Data consist of individuals and variables that give us information about

those individuals.

• An individual can be an object or a person

• A variable is an attribute, such as a measurement or a label

Example: Medical Records
In this example, the individuals are the patients. There are 6 variables in
this dataset:
• Mother’s age at delivery (years)
• Mother’s weight prior to pregnancy (pounds)
• Whether mother smoked during pregnancy (yes, no)
• Number of doctor visits during first trimester of pregnancy
• Mother’s race (Caucasian, African American, Asian, etc.)
• Baby’s birth weight (grams)
Variables

There are two types of variables: quantitative and categorical.

• Categorical variables take category or label values and place an

individual into one of several groups. Each observation can be placed in
only one category, and the categories are mutually exclusive.
• Medical records example: smoking is a categorical variable, with two groups
(smoker or nonsmoker). Gender and race are the two other categorical variables.

• Quantitative variables take numerical values and represent some kind

of measurement.
• Medical records example: age is a quantitative variable because it can take on
multiple numerical values, a person can be 18 years old or 80 years old. Weight
and height are also quantitative variables.
Distribution of Quantitative Data

In data analysis, the goal is to describe patterns in the data and create a
useful summary about a group. A table is not a useful way to view data
because patterns are hard to see in a table. For this reason, the first step in
data analysis is to create a graph of the distribution of the variable.

In a graph that summarizes the distribution of a variable, we can see

• the possible values of the variable.
• the number of individuals with each variable value or interval of values.
Distribution of Quantitative Data using a Dot Plot

When we describe patterns in data, we use descriptions of shape, center,

and spread. We also describe exceptions to the pattern. We call these
exceptions outliers.
Dot plots

• Individual variable values are visible, particularly when the data set is
small.
• Shape, center, and spread are not affected by how the dot plot is
constructed.
• The overall range can be calculated (largest value – smallest value).

• Above, each dot is a children’s cereal. The numbers on the horizontal axis
are the variable values. The vertical axis gives the count of cereals. We
can see that 10 children’s cereals have 2 grams of protein in a serving.
Distribution of quantitative data using a dot plot:
Shape
Shape: To describe the shape of a distribution, imagine sketching the
outline of the data to emphasize the general trend.
Common Descriptions of Shape Distribution

Right skewed: A cluster of data on the left with

a tail of data tapering off to the right. A right-
skewed distribution has a lot of data at lower
variable values.
Left skewed: A cluster of data on the right with a
tail of data tapering off to the left. A left skewed
distribution has a lot of data at higher variable
values with smaller amounts of data at lower
variable values.
Symmetric with a central peak (bell-
shaped): A central peak with a tail in both
directions. A bell-shaped distribution has a lot of
data in the center with smaller amounts of data
tapering off in each direction.
Uniform: A rectangular shape, the same amount of
data for each variable value.
Distribution of quantitative data using a dot plot:
Center and Spread
Center and Spread
To describe the pattern in a distribution of a quantitative variable we also
describe the center and spread.
When we describe a distribution of a quantitative variable, it is helpful to
identify a typical value. We choose a single value of the variable to
represent the entire group. This is one way to think about the center of the
distribution.
• We also want to describe how much the data varies among individuals in
the group. Variability is another word for spread. We describe the spread
in two ways:
• We look at the smallest value and the largest value to describe
an overall range in the data.
• We also describe a range of typical values to represent common
variable values for the group.
Outliers

Outliers are observations that fall outside the overall pattern. More precise
methods for identifying outliers will be described later.

In the distribution of wrist measurements below, there are two women with
unusually large wrists. These women might be outliers. They are marked in
red. The man with the smallest wrist measurement is shown in yellow. This
man is probably not an outlier.
Histograms

Our goal in data analysis is to describe patterns in data and create a useful
summary about a group. When a graph summarizes the distribution of a
variable, we can see
• the possible values of the variable.
• the number of individuals with each variable value or interval of values.
As we have seen, a dot plot is a useful graphical summary of a distribution.
A histogram is an alternative way to display the distribution of a
quantitative variable. Histograms are particularly useful for large data sets.
A histogram divides the variable values into equal-sized intervals. We can
see the number of individuals in each interval.
Histograms: Using Dotplot with Bins

To create a histogram, divide the variable values into equal-sized intervals

called bins. In this graph, we chose bins with a width of 5 cm. Each bin
contains a different number of individuals. For example, 48 adults have hip
measurements between 85 and 90 cm, and 97 adults have hip
measurements between 100 and 105 cm.
Histograms: Count and Frequency
In this histogram each bin is now a bar. The height of the bar indicates the
number of individuals with hip measurements in the interval for that bin. As
before, we can see that 48 adults have hip measurements between 85 and
90 cm, and 97 adults have hip measurements between 100 and 105 cm.

In the histogram, the count is the number of individuals in each bin. The
count is also called the frequency. From these counts, we can determine a
percentage of individuals with a given interval of variable values. This
percentage is called a relative frequency.
Which histogram gives the most helpful
summary of the distribution?
When we change the bins, the data gets grouped differently. The different
grouping affects the appearance of the histogram

For this situation, the middle histogram is probably the most useful
summary because the intervals correspond to letter grades.
General advice regarding histograms

• Avoid histograms with large bin widths that group data into only a few
bins. A histogram constructed with large bin widths will show the
distribution as a “skyscraper.” This does not give good information about
variability in the distribution.

• Avoid histograms with small bin widths that group data into lots of bins. A
histogram constructed with small bin widths will show the distribution as
a “pancake.” This does not help us see the pattern in the data.
Using a Histogram

The ages of the actresses who won an Oscar for Best Actress from 1970 to
2001:

Shape: The distribution of ages appears skewed to the right. Most are
young.
Center: The distribution of ages appears to be centered between 30 and 35
years
Spread: The data range from about 20 to about 80, so the overall range is
approximately 60.
Outliers: Winners older than 60 years are unusual. There are three outliers:
Choosing a Measure of
Center

We develop two different measurements for identifying the center of a

distribution: the mean and the median. Each measure has special
properties.
The mean is the average. It is written as and pronounced “x-bar.” To
calculate the mean, we add the data values and divide by the number of
data points.
We can write this as a formula:

In this formula, the symbol means sum (add up the values). The represents
the data values. The letter represents the number of data values.
Calculating the Mean

Let’s find the mean of a set of three quiz scores: 70, 85, 82.

In this situation, n is 3 because there are 3 quiz scores. We add the “x”
values, 70 + 85 + 82 to get 237, then divide by 3 to get a mean of 79.

We could write this calculation using the formula:

Calculating the Median

The median is another way to identify a typical value. The median is the
middle of the data when all the values are listed in order. The median
divides the data into two equal-sized groups. There is as much data below
the median as above it.

Homework scores: 70, 80, 80, 80, 85, 86, 90, 90, 95.
The median score is 85. This is the center score. There are four homework
scores below 85 and four homework scores above 85.
Mean vs Median

• Use the mean as a measure of center only for distributions that are
reasonably symmetric with a central peak. When outliers are present, the
mean is not a good choice.
• Use the median as a measure of center for all other cases.
• Always plot the data.
• We need to use a graph to determine the shape of the distribution. By
looking at the shape, we can determine which measures of center best
describe the data.
Interquartile Range and Boxplots

• The range measures the variability of a distribution by looking at the

interval covered by all the data. The IQR measures the variability of a
distribution by giving us the interval covered by the middle 50% of the
data.
• The five-number summary of a distribution consists of the minimum,
quartile 1, median, quartile 3, and maximum.
• The IQR is the measure of spread we should use when using the median
to measure center.
Interquartile Range and Boxplots (cont.)

• When using the median and IQR to measure center and spread, a data
point is considered an outlier if it satisfies one of the following conditions.
• The data value is more than 1.5 IQRs greater than Q3 (i.e., the value is greater
than Q3 + 1.5 * IQR)
• The data value is more than 1.5 IQRs less than Q1 (i.e., the value is less than Q1 −
1.5 * IQR)
Boxplots

Here are the two sets of exam scores. The data was divided into
quartiles. In a data set, each quartile contains the same number of
scores.
Here is the five-number summary for these two distributions:
• Class A: Min: 40 Q1: 71 Q2: 74.5 Q3: 78.5 Max: 95
• Class B: Min: 40 Q1: 61 Q2: 74.5 Q3: 89 Max: 95
To create the boxplot for each distribution,
• Draw a box from Q1 to Q3.
• Draw a vertical line in the box at the median.
• Extend a tail from Q1 to the smallest value that is not an outlier
and from Q3 to the largest value that is not an outlier.
• Indicate outliers with asterisks (*).
Boxplots (cont.)

Actors: Min=31, Q1=37.75, M=42.5, Q3=48.75, Max=76

Actresses: Min=21, Q1=32, M=35, Q3=41.5, Max=80

In general, actresses win the Best Actress Oscar at a younger age than do
actors. The median age for actresses is 35, compared to 42.5 for actors. Not
only do actresses win at a younger age, the Oscar is awarded more
consistently to younger actresses. There is less variability in the middle half
of the actresses’ ages (IQR = 9.5) than in the actors’ ages (IQR = 11). Both
distributions have older winners that are outliers. These older winners are
unusual and skew the distribution of ages to the right.
Measuring Spread about the Mean

Let’s consider the sample data set 2, 2, 4, 5, 6, 7, 9. The mean of this data
set is

Here is a dotplot of this data set with the mean marked by the vertical blue
line.
Measuring Spread about the Mean (continued)

When visualized on a dotplot, these differences are viewed as distances

between each point and the mean.
A negative difference indicates that the data point is to the left of the mean
(shown in blue on the graph below).
A positive difference indicates that the data point is to the right of the mean
(shown in green on the graph below).

Calculating differences
2 − 5 = −3
2 − 5 = −3
4 − 5 = −1
5−5=0
6−5=1
7−5=2
9−5=4
Average Distance from the Mean (ADM)

• The ADM (average devition from the mean) is a measurement of spread

about the mean. More precisely, ADM measures the average distance of
the data from the mean.
• We can use the ADM to define a typical range of values about the mean.
We mark the mean, then we mark 1 ADM below the mean and 1 ADM
above the mean. This interval is centered at the mean and captures
typical values about the mean.
• The ADM is a reasonable measure of spread about the mean, but there is
another measure that is used much more often: the standard deviation
(SD). The standard deviation behaves very much like the average
deviation.
Standard Deviation

The standard deviation (SD) is a measurement of spread about the mean

that is similar to the average deviation. We think of standard deviation as
roughly the average distance of data from the mean.
The formula for the standard deviation of a data set can be described by the
following expression:

The symbols in the expression are defined as follows:

• n is the number of values in the data set (the count)
• Recall that means to add up (compute the sum)
• is the mean of the data set
• The individual values are denoted by x
Choosing Numerical Summaries

• Use the mean and the standard deviation as measures of center and
spread only for distributions that are reasonably symmetric with a central
peak. When outliers are present, the mean and standard deviation are not
a good choice.
• Use the five-number summary (which gives the median, IQR, and range)
for all other cases.
• Always plot the data. We need to use a graph to determine the shape of
the distribution. By looking at the shape, we can determine which
measures of center and spread best describe the data.
Quick Review

• How do you describe the shape of a distribution?

• What are outliers?
• What is standard deviation a measure of?
• What needs to be done to determine the shape of the distribution?
• What is the five-number summary used for?
• What are categorical variables?
• What is a right skewed distribution?

Biostatistics: Variables and Graphs Guide
No ratings yet
Biostatistics: Variables and Graphs Guide
39 pages
Analytical Techniques Lec 1
No ratings yet
Analytical Techniques Lec 1
42 pages
Statistics for Scientists
No ratings yet
Statistics for Scientists
35 pages
Statistical Foundations for Data Analysis
No ratings yet
Statistical Foundations for Data Analysis
108 pages
Descriptive Statistics Overview
No ratings yet
Descriptive Statistics Overview
67 pages
ADBM Statistical Techniques Overview
No ratings yet
ADBM Statistical Techniques Overview
39 pages
Data Presentation in Statistics for Engineers
No ratings yet
Data Presentation in Statistics for Engineers
65 pages
Statistics: Basics and Key Concepts
100% (1)
Statistics: Basics and Key Concepts
33 pages
Basic Statistics Workshop Overview
No ratings yet
Basic Statistics Workshop Overview
85 pages
Understanding Levels of Measurement in Statistics
No ratings yet
Understanding Levels of Measurement in Statistics
38 pages
Statistics
100% (1)
Statistics
11 pages
Unit 4 Quantitative Analysis and Interpretation
No ratings yet
Unit 4 Quantitative Analysis and Interpretation
10 pages
Univariate and Multivariate Data Analysis
No ratings yet
Univariate and Multivariate Data Analysis
152 pages
Introduction To Biostatistics
No ratings yet
Introduction To Biostatistics
59 pages
Understanding Statistics and Data Management
No ratings yet
Understanding Statistics and Data Management
4 pages
Interpreting Test Score: Online Workshop 8602 Aiou
100% (1)
Interpreting Test Score: Online Workshop 8602 Aiou
39 pages
Understanding Basic Statistics Concepts
No ratings yet
Understanding Basic Statistics Concepts
13 pages
Unit 1 - Examining Distributions
No ratings yet
Unit 1 - Examining Distributions
80 pages
SPSS Data Analysis Basics Explained
100% (1)
SPSS Data Analysis Basics Explained
110 pages
Understanding Statistics: Key Concepts
No ratings yet
Understanding Statistics: Key Concepts
11 pages
Understanding Descriptive Statistics
No ratings yet
Understanding Descriptive Statistics
36 pages
Statistics in Nursing Research Overview
No ratings yet
Statistics in Nursing Research Overview
37 pages
Pharmacology and Biostatistics Overview
No ratings yet
Pharmacology and Biostatistics Overview
38 pages
Data Analytics: Statistical Methods Overview
No ratings yet
Data Analytics: Statistical Methods Overview
38 pages
Data Managementmmw
No ratings yet
Data Managementmmw
26 pages
SLIDES Statistics-Chapter 2
No ratings yet
SLIDES Statistics-Chapter 2
31 pages
Introduction to Statistics and Data Analysis
No ratings yet
Introduction to Statistics and Data Analysis
37 pages
Business Statistics and Data Analysis
No ratings yet
Business Statistics and Data Analysis
14 pages
Introduction to Statistics Basics
No ratings yet
Introduction to Statistics Basics
26 pages
Basic Statistics
100% (10)
Basic Statistics
73 pages
Statistics and Probability Formulas Guide
No ratings yet
Statistics and Probability Formulas Guide
47 pages
Appendix B: Introduction To Statistics: Eneral Terminology
No ratings yet
Appendix B: Introduction To Statistics: Eneral Terminology
15 pages
Biostats - PST 426.sister HO Fawole
No ratings yet
Biostats - PST 426.sister HO Fawole
85 pages
Understanding Statistics and Data Types
100% (1)
Understanding Statistics and Data Types
60 pages
Understanding Biostatistics Basics
No ratings yet
Understanding Biostatistics Basics
49 pages
Understanding Test Scores and Statistics
No ratings yet
Understanding Test Scores and Statistics
39 pages
Overview of Statistical Methods and Analysis
No ratings yet
Overview of Statistical Methods and Analysis
52 pages
Understanding Business Statistics Basics
No ratings yet
Understanding Business Statistics Basics
83 pages
BIO 401 FINAL MCQs AND QUESTION
No ratings yet
BIO 401 FINAL MCQs AND QUESTION
22 pages
Presentation of Data
No ratings yet
Presentation of Data
36 pages
Data Presentation in Statistics
No ratings yet
Data Presentation in Statistics
36 pages
Quantitative Data Analysis Techniques
No ratings yet
Quantitative Data Analysis Techniques
31 pages
Insem AIML
No ratings yet
Insem AIML
8 pages
Understanding Data Processing and Analysis
No ratings yet
Understanding Data Processing and Analysis
132 pages
Week 2
No ratings yet
Week 2
27 pages
Sampling Design and Statistical Analysis
No ratings yet
Sampling Design and Statistical Analysis
34 pages
Descriptive Statistics Overview
No ratings yet
Descriptive Statistics Overview
74 pages
Understanding Statistics: Basics & Types
No ratings yet
Understanding Statistics: Basics & Types
3 pages
Understanding Basic Statistics Concepts
No ratings yet
Understanding Basic Statistics Concepts
8 pages
Types and Characteristics of Statistical Data
No ratings yet
Types and Characteristics of Statistical Data
6 pages
Analyzing Quantitative Data Distributions
No ratings yet
Analyzing Quantitative Data Distributions
75 pages
Understanding Data Analysis Basics
No ratings yet
Understanding Data Analysis Basics
31 pages
Unit 6 MSG On-Level
No ratings yet
Unit 6 MSG On-Level
19 pages
Comprehensive Statistics Guide
No ratings yet
Comprehensive Statistics Guide
81 pages
PDF Notes
No ratings yet
PDF Notes
28 pages
Intro to Basic Statistics Concepts
No ratings yet
Intro to Basic Statistics Concepts
13 pages
MMW Data Management
No ratings yet
MMW Data Management
2 pages
IntroductoryStatistics2e PowerPoint Ch09
No ratings yet
IntroductoryStatistics2e PowerPoint Ch09
20 pages
Examining Relationships: Quantitative Data: Concepts in Statistics
No ratings yet
Examining Relationships: Quantitative Data: Concepts in Statistics
21 pages
IntroductoryStatistics2e PowerPoint Ch08
No ratings yet
IntroductoryStatistics2e PowerPoint Ch08
22 pages
DCMP STAT 14B Instructor InClassActivity
No ratings yet
DCMP STAT 14B Instructor InClassActivity
10 pages
DCMP STAT 14A Instructor InClassActivity
No ratings yet
DCMP STAT 14A Instructor InClassActivity
10 pages
IntroductoryStatistics2e PowerPoint Ch07
No ratings yet
IntroductoryStatistics2e PowerPoint Ch07
19 pages
IntroductoryStatistics2e PowerPoint Ch02
No ratings yet
IntroductoryStatistics2e PowerPoint Ch02
64 pages
ConceptsinStatistics 10 InterferenceforMeans
No ratings yet
ConceptsinStatistics 10 InterferenceforMeans
38 pages
DCMP STAT 04A Instructor InClassActivity
No ratings yet
DCMP STAT 04A Instructor InClassActivity
10 pages
Relationships in Categorical Data With Intro To Probability
No ratings yet
Relationships in Categorical Data With Intro To Probability
17 pages
DCMP STAT 13D Instructor InClassActivity
No ratings yet
DCMP STAT 13D Instructor InClassActivity
14 pages
DCMP STAT 17C Instructor InClassActivity
No ratings yet
DCMP STAT 17C Instructor InClassActivity
12 pages
DCMP STAT 14D Instructor InClassActivity
No ratings yet
DCMP STAT 14D Instructor InClassActivity
10 pages
DCMP STAT 02E Instructor InClassActivity
No ratings yet
DCMP STAT 02E Instructor InClassActivity
12 pages
CalculusVolume2 03
No ratings yet
CalculusVolume2 03
24 pages
DCMP STAT 02A Instructor InClassActivity
No ratings yet
DCMP STAT 02A Instructor InClassActivity
15 pages
DCMP STAT 03A Instructor InClassActivity
No ratings yet
DCMP STAT 03A Instructor InClassActivity
10 pages
CalculusVolume2 05
No ratings yet
CalculusVolume2 05
20 pages
DCMP STAT 18B Instructor InClassActivity
No ratings yet
DCMP STAT 18B Instructor InClassActivity
14 pages
DCMP STAT 17A Instructor InClassActivity
No ratings yet
DCMP STAT 17A Instructor InClassActivity
14 pages
CalculusVolume2 04
No ratings yet
CalculusVolume2 04
28 pages
CalculusVolume2 02
No ratings yet
CalculusVolume2 02
90 pages
DCMP STAT 02B Instructor InClassActivity
No ratings yet
DCMP STAT 02B Instructor InClassActivity
6 pages
Math g4 m5 Topic C Lesson 14
No ratings yet
Math g4 m5 Topic C Lesson 14
21 pages
CalculusVolume2 01
No ratings yet
CalculusVolume2 01
42 pages
Grade 4 Division with Remainders
No ratings yet
Grade 4 Division with Remainders
12 pages
Math g4 m3 Topic e Lesson 21
No ratings yet
Math g4 m3 Topic e Lesson 21
16 pages
Food Waste Reduction App Overview
No ratings yet
Food Waste Reduction App Overview
20 pages
CENG 197 Competency Appraisal 2 Guide
No ratings yet
CENG 197 Competency Appraisal 2 Guide
2 pages
Additive Manufacturing - Lecture Notes
No ratings yet
Additive Manufacturing - Lecture Notes
134 pages
Project Proposal: Sangguniang Kabataan 2021
100% (1)
Project Proposal: Sangguniang Kabataan 2021
3 pages
FDTP Guideline Winter 2024
No ratings yet
FDTP Guideline Winter 2024
3 pages
Cambridge O Level Mark Scheme 2019
No ratings yet
Cambridge O Level Mark Scheme 2019
15 pages
CleanSeal Brochure
No ratings yet
CleanSeal Brochure
3 pages
Auditing in Computerized Environment Introduction
100% (2)
Auditing in Computerized Environment Introduction
4 pages
Memory Organization Lecture
No ratings yet
Memory Organization Lecture
91 pages
Instruction and Operation Manual: Display and Data Logger
No ratings yet
Instruction and Operation Manual: Display and Data Logger
64 pages
Bathula Soumya BS
No ratings yet
Bathula Soumya BS
3 pages
MIL-HDBK-217 Revision Overview
No ratings yet
MIL-HDBK-217 Revision Overview
20 pages
Limits, Tolerances, and Fits Explained
No ratings yet
Limits, Tolerances, and Fits Explained
62 pages
BCM4360
No ratings yet
BCM4360
8 pages
Student Cumulative Record: CZARYNA BUTAC
No ratings yet
Student Cumulative Record: CZARYNA BUTAC
4 pages
Merus HPQ Hybrid Power Quality Technical Brochure Datasheet 5 24
No ratings yet
Merus HPQ Hybrid Power Quality Technical Brochure Datasheet 5 24
13 pages
Final Marks and Exam Viewing Details
No ratings yet
Final Marks and Exam Viewing Details
4 pages
Sustainability 14 03259
No ratings yet
Sustainability 14 03259
37 pages
Intermediate English Exam Guide
No ratings yet
Intermediate English Exam Guide
6 pages
Section VI
No ratings yet
Section VI
44 pages
Complete Guide To Social Video
No ratings yet
Complete Guide To Social Video
49 pages
Soal Reading 2
No ratings yet
Soal Reading 2
7 pages
Electrical Engineer Resume Overview
No ratings yet
Electrical Engineer Resume Overview
5 pages
Counseling Scheme NEET 2016
No ratings yet
Counseling Scheme NEET 2016
21 pages
Google Cloud Security Insights
No ratings yet
Google Cloud Security Insights
22 pages
GTL Economics SPE-94380-MS
No ratings yet
GTL Economics SPE-94380-MS
8 pages
Microsoft Learn Skilling Playbook
No ratings yet
Microsoft Learn Skilling Playbook
19 pages
Operating Systems Course Overview
No ratings yet
Operating Systems Course Overview
42 pages
Literature Review Network Security
100% (2)
Literature Review Network Security
6 pages
GMDSS AIS Inspection Cheat Sheet 1745600918
No ratings yet
GMDSS AIS Inspection Cheat Sheet 1745600918
10 pages

Summarizing Data Graphically and Numerically: Concepts in Statistics

Uploaded by

Summarizing Data Graphically and Numerically: Concepts in Statistics

Uploaded by

Summarizing Data Graphically

1. Produce Data: Determine what to

2. Explore the Data: Analyze and

3. Draw a Conclusion: Use the data,

Data consist of individuals and variables that give us information about

• An individual can be an object or a person

• A variable is an attribute, such as a measurement or a label

There are two types of variables: quantitative and categorical.

• Categorical variables take category or label values and place an

• Quantitative variables take numerical values and represent some kind

In a graph that summarizes the distribution of a variable, we can see

When we describe patterns in data, we use descriptions of shape, center,

Right skewed: A cluster of data on the left with

To create a histogram, divide the variable values into equal-sized intervals

We develop two different measurements for identifying the center of a

We could write this calculation using the formula:

• The range measures the variability of a distribution by looking at the

Actors: Min=31, Q1=37.75, M=42.5, Q3=48.75, Max=76

When visualized on a dotplot, these differences are viewed as distances

• The ADM (average devition from the mean) is a measurement of spread

The standard deviation (SD) is a measurement of spread about the mean

The symbols in the expression are defined as follows:

• How do you describe the shape of a distribution?

You might also like