

C955 Formulas and Key Concepts

This summary is intended to supplement the MindEdge textbook and should not be treated as a
replacement. (Please use the document to help you review material, not to learn it in the first
place.)

Module 1:

Data classification:
A. Discrete data- has distinct values, can be counted, has unconnected points (think “dots”)
B. Continuous data- has values within a range, it is measured (not counted), does not have
gaps between data points (think data is “connected lines or curves”)

Intervals:
A. “Less than” or “greater than” correspond to open intervals- marked with parentheses.
B. “Less than or equal to” and “greater than or equal to” correspond to closed intervals-
marked with brackets.

Sign rule for multiplication and division:

A. (positive)*(positive)=(positive); (negative)*(negative)=(positive)
(a product or quotient of two numbers with the same sign is positive)
B. (negative)*(positive)=(negative); (positive)*(negative)=(negative)
(a product or quotient of two numbers with different signs is negative)

Prime and composite numbers:

A. A prime number is a whole number that has exactly two positive factors, 1 and the number
itself.
B. A composite number is a whole number greater than 1 that is not prime (so it has 3 or more
positive factors, including 1 and the number itself). The number 1 is neither prime nor composite.

Prime factorization- writing the number as a product of only prime numbers

The greatest common factor (GCF) of two or more numbers- the largest number that divides all
of the given numbers evenly
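The definitions above can be checked with a short Python sketch. The function names (prime_factorization, is_prime, gcf) are illustrative and do not come from the course materials. Factorization uses simple trial division, which is fine for the small numbers used in this course. The GCF uses Python's built-in math.gcd.

```python
import math

def prime_factorization(n):
    """Return the prime factors of n (n >= 2) in ascending order, with repeats."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:  # whatever remains is itself prime
        factors.append(n)
    return factors

def is_prime(n):
    """A prime has exactly two positive factors: 1 and itself."""
    return n >= 2 and prime_factorization(n) == [n]

def gcf(*numbers):
    """Greatest common factor of two or more positive integers."""
    result = numbers[0]
    for value in numbers[1:]:
        result = math.gcd(result, value)
    return result

print(prime_factorization(60))     # [2, 2, 3, 5]
print(is_prime(13), is_prime(15))  # True False
print(gcf(24, 36, 60))             # 12
```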
Module 2:

Multiples of a number- numbers that can be obtained by multiplying the given number by 1, 2,
3, 4, ...

Least common multiple (LCM) of two or more numbers- the smallest positive number that can
be divided evenly by each of the given numbers
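Here is a minimal Python sketch of multiples and the LCM. It assumes positive whole numbers and uses the identity LCM(a, b) = a*b / GCF(a, b). This standard shortcut gives the same answer as listing multiples until one is shared. The helper names are hypothetical.

```python
import math

def first_multiples(n, count=5):
    """The first few multiples of n: n*1, n*2, n*3, ..."""
    return [n * k for k in range(1, count + 1)]

def lcm(*numbers):
    """Least common multiple of two or more positive integers."""
    result = numbers[0]
    for value in numbers[1:]:
        # For positive integers, lcm(a, b) * gcd(a, b) == a * b
        result = result * value // math.gcd(result, value)
    return result

print(first_multiples(4))  # [4, 8, 12, 16, 20]
print(first_multiples(6))  # [6, 12, 18, 24, 30]
print(lcm(4, 6))           # 12, the first number on both lists
print(lcm(4, 6, 10))       # 60
```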

Unit Conversions
(only the most common; a complete list can be found in MindEdge on page 2.15)

Unit Conversions for Household Measures of Volume:

● 1 tablespoon = 3 teaspoons
● 1 fluid ounce = 2 tablespoons
● 1 cup = 8 fluid ounces
● 1 pint = 2 cups
● 1 quart = 2 pints
● 1 gallon = 4 quarts

Common Metric Conversions:


● 1 L = 1000 mL
● 1 kg = 1000 g
● 1 g = 1000 mg (milligrams)
● 1 mg = 1000 mcg (micrograms)

Conversion between Household and Metric units:


● 1 cc (cubic cm) = 1 mL
● 1 fl oz = 30 mL
● 1 L = 1.057 qt
● 1 tsp = 5 mL
● 1 kg = 2.2 lb
● 1 oz = 28.35 g
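Conversions chain together by multiplying or dividing by one factor at a time. The Python sketch below uses the factors listed above. The constant and function names are made up for illustration. Rounded factors such as 1 fl oz = 30 mL and 1 kg = 2.2 lb give approximate results.

```python
# Conversion factors copied from the lists above (some are rounded).
TSP_PER_TBSP = 3
TBSP_PER_FLOZ = 2
FLOZ_PER_CUP = 8
ML_PER_FLOZ = 30
LB_PER_KG = 2.2
G_PER_KG = 1000
MG_PER_G = 1000

def cups_to_teaspoons(cups):
    # cups -> fl oz -> tbsp -> tsp, multiplying by each factor in turn
    return cups * FLOZ_PER_CUP * TBSP_PER_FLOZ * TSP_PER_TBSP

def floz_to_ml(floz):
    return floz * ML_PER_FLOZ

def pounds_to_kg(pounds):
    # Divide: since 1 kg = 2.2 lb, a weight is a smaller number in kg than in lb
    return pounds / LB_PER_KG

def kg_to_mg(kg):
    return kg * G_PER_KG * MG_PER_G

print(cups_to_teaspoons(1))          # 48
print(floz_to_ml(8))                 # 240
print(round(pounds_to_kg(154), 2))   # 70.0
print(kg_to_mg(0.5))                 # 500000.0
```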

Temperature Conversions:
Fahrenheit = Celsius * (9/5) + 32
Celsius = (Fahrenheit - 32) * (5/9)

Shorthand version of the Fahrenheit formula: F = 1.8C + 32
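The two temperature formulas translate directly into Python. The function names in this sketch are hypothetical. Note that 9/5 = 1.8, so the first function matches the shorthand F = 1.8C + 32.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32      # same as 1.8 * C + 32

def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

print(celsius_to_fahrenheit(100))            # 212.0
print(round(celsius_to_fahrenheit(37), 1))   # 98.6
print(fahrenheit_to_celsius(212))            # 100.0
```

A quick self-check is that -40 degrees is the same on both scales, so fahrenheit_to_celsius(-40) returns -40.0.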
Module 3:

Like terms- terms that have the same variable(s) raised to the same power(s); they can be
combined using addition and subtraction

Addition/subtraction principle- we can add the same number to (or subtract it from) both sides
of an equation, and the resulting expressions remain equal

Multiplication/division principle- we can multiply or divide both sides of an equation by the
same number, and the resulting expressions remain equal (division by zero is not allowed)

The Butterfly Method (“cross-multiply”):

If a/b = c/d (with b and d not zero), then a*d = b*c
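Cross-multiplying is how a proportion with one unknown gets solved. From a/b = x/d, cross-multiplying gives a*d = b*x, so x = a*d/b. The Python sketch below assumes whole-number inputs and uses the standard fractions module so the answer stays exact. The function names are hypothetical.

```python
from fractions import Fraction

def solve_proportion(a, b, d):
    """Solve a/b = x/d for x: cross-multiply to a*d = b*x, then divide by b."""
    if b == 0 or d == 0:
        raise ValueError("denominators must be nonzero")
    return Fraction(a * d, b)

def is_proportion(a, b, c, d):
    """a/b = c/d exactly when the cross products a*d and b*c are equal."""
    return a * d == b * c

print(solve_proportion(3, 4, 20))   # 15, since 3/4 = 15/20
print(is_proportion(2, 3, 10, 15))  # True: 2*15 == 3*10
```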

Slope-intercept equation of a line:

y = mx + b, where m is slope and b is the y-intercept

Slope of a line- given two points (x1,y1) and (x2,y2), the slope m of the line going through these
two points is

m = (y2 - y1) / (x2 - x1), where x2 ≠ x1 (a vertical line has an undefined slope)
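The slope formula and y = mx + b combine into a short recipe. First compute m from two points. Then plug one point into y = mx + b and solve for b = y1 - m*x1. This Python sketch assumes whole-number coordinates so that Fraction keeps the slope exact. The function names are hypothetical.

```python
from fractions import Fraction

def slope(p1, p2):
    """Slope m = (y2 - y1) / (x2 - x1) of the line through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("vertical line: slope is undefined")
    return Fraction(y2 - y1, x2 - x1)

def line_through(p1, p2):
    """Return (m, b) for y = mx + b, using b = y1 - m*x1."""
    m = slope(p1, p2)
    x1, y1 = p1
    return m, y1 - m * x1

m, b = line_through((1, 3), (4, 9))
print(m, b)  # 2 1, so the line is y = 2x + 1
```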
Module 4:

Types of Data:
A. Quantitative (numerical) data- consists of data values that are numerical, quantities
that can be counted or measured (additions/subtractions make sense)
B. Categorical (qualitative) data- consist of data that are groups or labels, and are not
necessarily numerical (additions/subtractions do not make sense)

Single Variable    Display
Categorical        Pie Chart or Bar Chart
Quantitative       Histogram, Stemplot, Dotplot, or Boxplot

Displays for Categorical data:

A. Pie Chart- displays the different parts of a whole
B. Bar Chart- displays the counts or frequencies of each category


Displays for Quantitative data:
A. Dot plot- distribution of data, particularly clusters, gaps, and outliers, most useful for
smaller data sets
a. Each piece of data is represented by a dot above the corresponding data value
on a number line.
b. Repeated data values have a mark for each instance stacked vertically.

B. Stem Plot (AKA Stem-and-Leaf Plot)- shape of data according to place values; contains
the actual data values
a. two columns, one for stems and one for leaves
b. Stems are usually the leftmost digit (for data between -99 and 99).
c. Leaves are usually the rightmost digit (for data between -99 and 99).

C. Box Plot- center, quartiles, spread, and outliers in a given data set
a. four parts: the first whisker, two rectangles (the box), and another whisker
b. each part covers 25% of the data, regardless of length
c. can be horizontal or vertical
d. Modified box plot- displays outliers as points outside the whiskers
e. Five-number summary (the values a box plot is built from): minimum, first quartile (Q1),
median, third quartile (Q3), maximum
D. Histogram- shape and spread
a. heights of vertical bars represent the number of data points within the interval
represented by the width of the bar
b. heights can be frequencies or relative frequencies (percentage of data in interval)

Histogram Shape
A. Symmetric- left half is (roughly) same as right half

B. Right Skewed (positively skewed)- long tail stretches to the right of the peak

C. Left Skewed (negatively skewed)- long tail stretches to the left of the peak
D. U-Shaped- contains a “valley” rather than a single peak

E. Uniform- flat across; all values appear about the same number of times

F. Multimodal- two (bimodal) or more clear peaks

Measures of Center- value which represents the “typical” data point in a data set
A. Mode- value that occurs most often in a data set
a. there may be more than one mode (multimodal)
b. there is no mode for a uniform distribution
B. Median- the halfway point, with an equal number of data points above and below it; always
order the data from smallest to largest first
a. Odd number of data points- value in the exact middle of the data points
b. Even number of data points- average of the middle two data points
c. resistant to skew- extreme values do not greatly affect the median
C. Mean (common average)- add up all the data points and divide by how many data points
there are
a. extreme values greatly influence the mean, pulling it in the direction of the skew
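Python's standard statistics module computes all three measures of center. This sketch uses a small hypothetical data set with one extreme value. It shows how the mean gets pulled toward that value while the median does not. statistics.median sorts the data internally, so the ordering step is handled for you.

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 30]  # hypothetical data set with one extreme value

print(statistics.mode(data))                  # 3
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2] (more than one mode)
print(statistics.median(data))                # 5 (odd count: the middle value)
print(statistics.median([1, 2, 3, 4]))        # 2.5 (even count: average of 2 and 3)
print(round(statistics.mean(data), 2))        # 8.29, pulled up by the 30
```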

Measures of Spread (variability)- describe how much the data vary from the center
A. Range- difference between the largest data value and the smallest data value
B. Interquartile Range (IQR)- difference between the third quartile, Q3, and the first
quartile, Q1 (IQR = Q3 - Q1)
a. Quartiles- break the data into four equal-size groups
b. Second quartile (Q2)- the median
c. First quartile (Q1)- the median of the data below Q2
d. Third Quartile (Q3)- the median of the data above Q2
C. Standard deviation- roughly the “average” distance of the data points from the mean
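The quartile steps above can be written out directly. This Python sketch follows the "median of each half" method described here, leaving the middle value out of both halves when the count is odd. That is an assumption: calculators and software packages use several other quartile conventions, so their answers can differ slightly. It needs at least two data points. statistics.stdev gives the sample standard deviation (dividing by n - 1), and statistics.pstdev gives the population version.

```python
import statistics

def quartiles(values):
    """Q1, Q2, Q3 by the median-of-halves method (needs at least 2 values)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # data below Q2
    upper = data[half + n % 2:]    # data above Q2 (skips the middle value if n is odd)
    return statistics.median(lower), statistics.median(data), statistics.median(upper)

data = [1, 3, 4, 6, 7, 8, 9, 12, 15]   # hypothetical data set
q1, q2, q3 = quartiles(data)
print("five-number summary:", min(data), q1, q2, q3, max(data))  # 1 3.5 7 10.5 15
print("range:", max(data) - min(data))   # 14
print("IQR:", q3 - q1)                   # 7.0
print("sample SD:", round(statistics.stdev(data), 2))
```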
Empirical Rule (68-95-99.7 Rule)- for Normal Distributions (bell-shaped curves)

a. About 68% of the data under the normal curve is within 1 standard deviation of
the mean.
b. About 95% of the data under the normal curve is within 2 standard deviations of
the mean.
c. About 99.7% of the data under the normal curve is within 3 standard deviations
of the mean.
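Applying the rule is just arithmetic on the mean and the standard deviation (SD). The Python sketch below computes mean ± 1, 2, and 3 SDs. The example values, a roughly normal set of scores with mean 100 and SD 15, are hypothetical.

```python
def empirical_rule_intervals(mean, sd):
    """Intervals holding about 68%, 95%, and 99.7% of a normal distribution."""
    return {pct: (mean - k * sd, mean + k * sd)
            for k, pct in ((1, "68%"), (2, "95%"), (3, "99.7%"))}

for pct, (low, high) in empirical_rule_intervals(100, 15).items():
    print(f"about {pct} of the data between {low} and {high}")
# about 68% of the data between 85 and 115
# about 95% of the data between 70 and 130
# about 99.7% of the data between 55 and 145
```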

Which measures of center and spread should we use?


A. Normal, symmetric data- use the mean and standard deviation
B. Skewed data- use the median and IQR

Misrepresenting Data with Graphical Displays


A. Scale of Axis- The vertical scale should start at zero.
B. Omitting Labels or Units- leaves size and categories unspecified
C. Using a 2-Dimensional Graph to Represent a 1-Dimensional Measurement- When a picture is
scaled in both width and height to show a single quantity, our eyes compare areas, which
distorts the true differences we are trying to illustrate. We should avoid using such graphs!
Module 5:

Relationship Between Two Variables


A. Explanatory Variable (x)- presumed to possibly cause changes in the response
variable; also known as the independent variable
B. Response Variable (y)- presumed to be affected by the explanatory variable; also
known as the dependent variable

Graphical Displays of Two Variable Data

Explanatory     Response        Notation    Display
Categorical     Categorical     (C → C)     Two-way Table
Categorical     Quantitative    (C → Q)     Side-by-side Box Plots
Quantitative    Quantitative    (Q → Q)     Scatterplot

A. Two-way Frequency Table (AKA contingency table) (C→C)- the rows show one variable’s
categories and the columns show the other variable’s categories
a. Joint frequencies (values in middle of table)- amount of data falling into both the
corresponding row and column
b. Marginal frequencies (values on right and bottom sides of table)- totals of
corresponding row or column
c. Grand total (bottom right corner of table)- total size of the data set
d. Conditional percentages- computed by dividing each joint frequency by the
corresponding explanatory variable marginal frequency
e. Overall percentages- computed by dividing each frequency by the grand total
B. Side-by-Side Box Plots (C→Q)- a box plot is displayed for each category of the
explanatory variable on the same graph
C. Scatterplot (Q→Q)- data create ordered pairs graphed on the coordinate plane
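The two-way table vocabulary maps onto a few sums and divisions. In this Python sketch, the categories (smoker status as the explanatory variable, having a condition as the response) and all counts are hypothetical. The rows hold the explanatory variable, so each conditional percentage divides by a row total.

```python
# Hypothetical two-way table: rows = explanatory variable, columns = response.
table = {
    "smoker":     {"yes": 30, "no": 70},
    "non-smoker": {"yes": 20, "no": 180},
}

row_totals = {row: sum(cells.values()) for row, cells in table.items()}  # marginal
col_totals = {col: sum(table[row][col] for row in table) for col in ("yes", "no")}  # marginal
grand_total = sum(row_totals.values())
print(row_totals, col_totals, grand_total)

for row, cells in table.items():
    for col, joint in cells.items():
        conditional = joint / row_totals[row]  # joint / explanatory marginal
        overall = joint / grand_total          # joint / grand total
        print(f"{row} & {col}: conditional {conditional:.0%}, overall {overall:.0%}")
```

Here 30% of smokers have the condition, compared with 10% of non-smokers. The overall percentages instead describe each cell as a share of all 300 people.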

Correlation (Q → Q)

A. Positive Correlation- As the explanatory variable increases, the response variable
increases.
B. Negative Correlation- As the explanatory variable increases, the response variable
decreases.
C. No Correlation- scatterplot reveals no trend between the variables
D. Non-linear Relationship- scatterplot reveals a trend that is not a straight line
E. Correlation Coefficient (r)- measures the strength of the linear relationship between
the variables
a. r is always between -1 and 1.
b. The closer r is to 1, the stronger the positive linear correlation.
c. The closer r is to -1, the stronger the negative linear correlation.
d. r = 0 indicates no linear correlation, but that does not rule out non-linear
relationships (the graph may still show a curvilinear relationship)

F. Effect of Outliers- when far off the regression line, outliers weaken r
G. On a scatterplot, the closer the points are laid out in a line, the stronger the correlation.
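This summary does not give a formula for r. The Python sketch below uses the standard Pearson correlation formula, written out from its definition. The paired data are hypothetical. The last example shows why r = 0 does not rule out a relationship: points lying on the curve y = x² give r = 0.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired quantitative data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]        # hypothetical explanatory values
score = [52, 60, 61, 70, 77]   # hypothetical response values
print(round(correlation(hours, score), 2))             # 0.98 (strong positive)
print(correlation([1, 2, 3], [6, 4, 2]))               # -1.0 (perfect negative line)
print(correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])) # 0.0, yet y = x**2 exactly
```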
Module 6:

Sampling Methods
A. Collecting Data:
a. Population - the group you want to study
b. Sampling Frame - the list of people or things you pull the sample from
c. Sample - the subset of the population that is actually being studied
B. Bias occurs when the Sampling Frame does not accurately represent the Population
C. Sampling Methods
a. Simple Random - participants are randomly chosen from the entire population
b. Voluntary - researchers invite everyone in the sampling frame to participate,
those who respond make up the sample.
c. Stratified - every group is included, but only some people within each group are
studied.
d. Cluster - some groups are chosen, and all people within those groups are studied.

Study Design
A. Observational Study - someone observes what is happening in a situation. There is no
“treatment”. We are not comparing how two groups do with or without some key
difference.
a. Only association can be determined.
B. Experimental Study - researchers apply the treatment to one group and no treatment
(placebo) to a control group.
a. Causation can be determined in a well-designed, controlled experiment.

Association
A. Relationship between variables
B. Scatterplots can show the pattern of the relationship between quantitative variables
C. Can be established by an observational study.

Causation- A change in one variable creates a change in the other variable.


A. Difficult to establish
B. Can be established by an Experimental Study
C. Association does not always mean causation.
D. Correlation does not always mean causation.
E. Lurking Variable- variable not included in the study, but affects the variables that were
included in the study

Simpson’s Paradox
A. occurs when a trend that appears within each group of data disappears or reverses when
the groups are combined
B. can only occur when the groups have unequal sizes
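A small numerical case makes the paradox concrete. In this Python sketch, the success counts for two treatments, split into mild and severe cases, are hypothetical and were chosen so the group sizes are very unequal.

```python
# Hypothetical (successes, attempts) for two treatments, split by case severity.
results = {
    "A": {"mild": (81, 87),   "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}

def rate(successes, attempts):
    return successes / attempts

for group in ("mild", "severe"):
    print(group, "A:", f"{rate(*results['A'][group]):.0%}",
          "B:", f"{rate(*results['B'][group]):.0%}")

for treatment, groups in results.items():
    successes = sum(s for s, _ in groups.values())
    attempts = sum(n for _, n in groups.values())
    print("combined", treatment + ":", f"{rate(successes, attempts):.0%}")
```

Treatment A has the higher success rate within each group: about 93% vs 87% for mild cases and 73% vs 69% for severe cases. Yet B comes out ahead once the groups are combined, about 83% vs 78%. This happens because A was given mostly to the harder, severe cases.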
Regression Analysis
A. Regression equation- equation modeling relationship between quantitative variables
B. Simple linear equation (regression line or line of best fit)- models the data with a line
a. x is the explanatory variable
b. y is the response variable
c. Equation is given by y = mx + b where m is the slope and b is the y-intercept
d. The sign of the slope (positive or negative) matches the sign of r (positive or
negative).
C. Used to predict data
a. Plug explanatory values in for x and calculate corresponding response values
for y.
b. Linear Interpolation- predictions between known data points
c. Linear Extrapolation- predictions for data larger than the maximum x-value or
smaller than the minimum x-value of the known data points
d. Potential Problems
i. Inappropriate Extrapolation- Trends do not always continue indefinitely.
ii. Association is not Causation- Watch for lurking variables.
iii. Not a Representative Sample
iv. Small Sample Size
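This summary does not say how the line of best fit is computed. The Python sketch below assumes the usual least-squares line, which passes through the point (mean of x, mean of y). The data are hypothetical (hours studied vs. test score). The last prediction shows inappropriate extrapolation: the line happily predicts an impossible score.

```python
def least_squares_line(xs, ys):
    """Slope m and intercept b of the least-squares line y = mx + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x   # the line passes through (mean_x, mean_y)
    return m, b

hours = [1, 2, 3, 4, 5]        # hypothetical explanatory values (x)
score = [52, 60, 61, 70, 77]   # hypothetical response values (y)
m, b = least_squares_line(hours, score)
print(f"y = {m:.1f}x + {b:.1f}")   # y = 6.0x + 46.0 (positive slope, positive r)

def predict(x):
    return m * x + b

print(predict(3.5))   # 67.0, interpolation inside the observed x-range 1 to 5
print(predict(40))    # 286.0, extrapolation far outside the data: not believable
```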
Module 7:

Probability
A. Experiment- a situation for which a probability is being examined
B. Outcome- a possible result of an experiment
C. Event- a collection of desired outcomes
D. Sample Space (universe)- set of all possible outcomes
E. Fair- an experiment where all outcomes are equally likely
F. Complement of a set- everything NOT in the set
G. Disjoint events- contain no common outcomes, cannot happen simultaneously
H. Dependent events- The occurrence of one event changes the probability of the
occurrence of the other event.
I. Independent events- The occurrence of one event does NOT change the probability of
the occurrence of the other event.

Theoretical (Classical) Probability- For a particular event, count the number of outcomes in
the event and divide by the total number of possible outcomes for the experiment. Theoretical
probability requires that the experiment be fair.

Empirical Probability (relative frequency)- Perform the experiment multiple times (called trials),
count the number of times the event occurs and divide by the total number of trials. Empirical
probability does not require the experiment to be fair.

Law of Large Numbers- As the number of trials of an experiment increases, the empirical
probability gets closer to the true probability.
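A short simulation shows theoretical probability, empirical probability, and the Law of Large Numbers side by side. This Python sketch rolls a simulated fair die using the standard random module, with a fixed seed so runs repeat. It compares the relative frequency of sixes with the theoretical 1/6. The function name is hypothetical. The empirical value tends to settle closer to 1/6 as the number of trials grows, though not necessarily at every step.

```python
import random

def empirical_probability(trials, seed=0):
    """Relative frequency of rolling a 6 in `trials` rolls of a simulated fair die."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sixes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return sixes / trials

theoretical = 1 / 6  # 1 favorable outcome out of 6 equally likely outcomes
for trials in (10, 100, 10_000, 100_000):
    estimate = empirical_probability(trials)
    print(f"{trials:>7} trials: empirical {estimate:.4f}, theoretical {theoretical:.4f}")
```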

General Addition Rule-


● P(A or B) = P(A) + P(B) - P(A and B)
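The addition rule can be checked by brute-force counting. This Python sketch builds a standard 52-card deck with itertools.product and computes P(heart or king) two ways: by counting directly, and by the formula. The king of hearts is subtracted once because P(A) and P(B) both count it. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) cards

def prob(event):
    """Theoretical probability: favorable outcomes / total outcomes."""
    favorable = sum(1 for card in DECK if event(card))
    return Fraction(favorable, len(DECK))

def is_heart(card):
    return card[1] == "hearts"

def is_king(card):
    return card[0] == "K"

direct = prob(lambda card: is_heart(card) or is_king(card))
by_rule = (prob(is_heart) + prob(is_king)
           - prob(lambda card: is_heart(card) and is_king(card)))
print(direct, by_rule)  # 4/13 4/13  (13/52 + 4/52 - 1/52 = 16/52)
```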

Conditional Probability- P(B|A) is read “the probability of event B happening given that event
A has happened”.
● P(B|A) = P(A and B) / P(A), where P(A) > 0
● Can determine independence. Events A and B are independent if either of the following
is true:
a. P(B|A) = P(B)
b. P(A and B) = P(A) x P(B)
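Here is a minimal Python sketch of the conditional probability formula and the independence test. It uses exact fractions so that the equality check is reliable, which floating-point numbers can break. The function names are hypothetical. The die events are "even" and "greater than 3"; the coin events are "first flip heads" and "second flip heads".

```python
from fractions import Fraction

def conditional(p_a_and_b, p_a):
    """P(B|A) = P(A and B) / P(A); requires P(A) > 0."""
    if p_a == 0:
        raise ValueError("P(A) must be positive")
    return p_a_and_b / p_a

def independent(p_a, p_b, p_a_and_b):
    """A and B are independent exactly when P(A and B) = P(A) x P(B)."""
    return p_a_and_b == p_a * p_b

# One roll of a fair die: A = "even" {2, 4, 6}, B = "greater than 3" {4, 5, 6}.
p_a, p_b, p_a_and_b = Fraction(3, 6), Fraction(3, 6), Fraction(2, 6)  # A and B = {4, 6}
print(conditional(p_a_and_b, p_a))       # 2/3, which differs from P(B) = 1/2
print(independent(p_a, p_b, p_a_and_b))  # False: the events are dependent

# Two fair coin flips: P(first heads) = P(second heads) = 1/2, P(both) = 1/4.
print(independent(Fraction(1, 2), Fraction(1, 2), Fraction(1, 4)))  # True
```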

General Multiplication Rule-


● P(A and B) = P(A)P(B|A)
Probability Trees- display all the possible outcomes in a sample space; each path (sequence
of branches) represents a possible outcome
A. Probabilities are placed on the corresponding branches of each path.
B. Probabilities of each individual outcome in sample space can be found by
multiplying along the path.
C. Probabilities of events that include more than one outcome can be found by
adding the products from each corresponding path of each outcome.
D. Law of Total Probability- For any probability tree, if you multiply along each
path and then sum all of the resulting products, you will get 1 = 100%.
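The tree rules can be sketched for a hypothetical bag of 3 red (R) and 2 blue (B) marbles, drawing two without replacement. The second-draw branch probabilities depend on the first draw, so this is a tree for dependent events. In this Python sketch, each path probability is a product along the path, an event is a sum of matching paths, and all paths together sum to 1. The function name is hypothetical.

```python
from fractions import Fraction

def two_draw_tree(red, blue):
    """Path probabilities for drawing two marbles without replacement."""
    total = red + blue
    paths = {}
    for first in ("R", "B"):
        p_first = Fraction(red if first == "R" else blue, total)
        # The first marble is not put back, so the second-draw branches change.
        red_left = red - (1 if first == "R" else 0)
        blue_left = blue - (1 if first == "B" else 0)
        for second in ("R", "B"):
            p_second = Fraction(red_left if second == "R" else blue_left, total - 1)
            paths[first + second] = p_first * p_second  # multiply along the path
    return paths

paths = two_draw_tree(red=3, blue=2)
for path, p in paths.items():
    print(path, p)                  # RR 3/10, RB 3/10, BR 3/10, BB 1/10
print(paths["RB"] + paths["BR"])    # exactly one red: 3/5 (add matching paths)
print(sum(paths.values()))          # 1 (all paths together)
```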

Complement Formulas (calculating the probability of something NOT happening)


A. P(not A) = 1 - P(A)
B. P(“at least one”) = 1 - P(“none”)
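The "at least one" shortcut avoids adding up many separate cases. This Python sketch finds the probability of rolling at least one six in four rolls of a fair die. It treats the rolls as independent, so P(no six in four rolls) = (5/6)^4.

```python
from fractions import Fraction

p_six = Fraction(1, 6)
p_no_six = 1 - p_six                 # P(not A) = 1 - P(A)
p_none_in_four = p_no_six ** 4       # independent rolls: multiply, 625/1296
p_at_least_one = 1 - p_none_in_four  # P(at least one) = 1 - P(none)
print(p_at_least_one, round(float(p_at_least_one), 3))  # 671/1296 0.518
```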

Probability Formulas which can only be used when events are Disjoint
A. P(A and B) = 0
B. P(A or B) = P(A) + P(B)
C. P(A|B) = P(B|A) = 0

Probability Formulas which can only be used when events are Independent
A. P(A and B) = P(A) x P(B)
B. P(A or B) = P(A) + P(B) - [P(A) x P(B)]
C. P(A|B) = P(A)
D. P(B|A) = P(B)
