2.1 Introduction to Statistics
1. Types of data
a. Categorical (qualitative)
b. Quantitative
2. Data described by two variables is called bivariate data.
a. Bivariate categorical data: data that consists of two categorical
variables.
To represent the relationship between the two variables, we can use:
Bar graph (side-by-side or segmented)
Frequency chart
Mosaic Plot
b. Bivariate quantitative data
We use a scatter plot to represent the relationship between two quantitative variables.
A scatter plot has two variables: the explanatory variable (x) and the response variable (y).
To describe the relationship (association) between the two variables, we
need to identify:
Direction: positive, negative, or none
Form: linear or non-linear
Strength: strong, moderate, or weak
Influential point: any point that, if removed, would substantially change the
linear relationship.
Outlier: a point with an extreme response value (y is unusually large or
small); an outlier does not follow the general trend.
High leverage point: a point with an extreme explanatory value (x is
unusually large or small).
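The idea of an influential point can be checked numerically. The sketch below is only an illustration, not part of the original notes: it fits a least-squares line with and without one point and compares the slopes. The data set and the function names (`least_squares`, `slope_change_without`) are made up for this example.

```python
def least_squares(points):
    """Return (slope, intercept) of the least-squares line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def slope_change_without(points, index):
    """Fit the line twice: with all points, and with points[index] removed."""
    slope_all, _ = least_squares(points)
    remaining = points[:index] + points[index + 1:]
    slope_without, _ = least_squares(remaining)
    return slope_all, slope_without


# Hypothetical data: five points on the line y = x, plus one point with an
# extreme x value (high leverage) that does not follow the trend.
data = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (20, 2)]

slope_all, slope_without = slope_change_without(data, index=5)
print("slope with all points:     ", round(slope_all, 3))
print("slope without the last one:", round(slope_without, 3))
```

With this made-up data the slope is 1 without the last point and close to 0 with it, so removing that single point changes the fitted line substantially. That is what makes it influential.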
Example: there is a strong, negative, non-linear association between the number of
minutes spent and the number of remaining questions.
2.2 Representing Two Categorical Variables
Two-way table: represents two variables
Side-by-side bar graph
Segmented bar graph
Correlated/Dependent
Marginal relative frequency: the relative frequency of all the subjects in a certain
category of one variable, regardless of the other variable.
Conditional relative frequency: the relative frequency of a particular
category, given that we know the subject is in a certain category of the other variable.
independent category:
dependent category:
1. Two-way tables: A statistical table that shows the frequencies or relative
frequencies of two categorical variables in a cross-tabulated format, with
one variable represented by rows and the other by columns.
2. Side-by-side bar graphs: A graphical display that shows the distribution of
a categorical variable by displaying the frequency or relative frequency of
each category as a bar, with the bars for each group of a second variable
placed next to each other for comparison.
3. Mosaic plots: A graphical display that shows the relationship between two
categorical variables by dividing the area of a rectangle into tiles that
represent the different categories of both variables.
4. Segmented bar graphs: A graphical display that shows the relationship
between two categorical variables by dividing the bars in a bar graph into
segments that represent the different categories of one of the variables.
5. Categorical variable:
A variable that can take on a limited number of categories or values, such as
"male" or "female," but cannot be meaningfully measured on a numerical
scale.
6. Quantitative variable:
A variable that is measured on a numerical scale, such as
height or weight.
7. Bivariate data:
Data that records two variables for each subject, used to describe the
association between the two variables (categorical or quantitative).
8. Marginal relative frequency: The relative frequency of a
particular category of a categorical variable, calculated by dividing the
total frequency of that category (its row or column total) by the total
frequency for the entire sample or population.
9. Conditional relative frequency: The relative frequency of a particular
category of a categorical variable within a category of the other variable,
calculated by dividing the frequency of the first category within
the second category by the total frequency of the
second category.
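As a worked illustration of the last two definitions, the sketch below computes marginal and conditional relative frequencies from a small two-way table. The counts, the category names, and the function names are all invented for this example; they do not come from the notes.

```python
# Hypothetical two-way table of counts:
# rows = grade level, columns = preferred sport.
table = {
    "junior": {"soccer": 30, "tennis": 20},
    "senior": {"soccer": 10, "tennis": 40},
}


def grand_total(table):
    """Total number of subjects in the whole table."""
    return sum(sum(row.values()) for row in table.values())


def marginal_relative_frequency(table, row=None, column=None):
    """Share of ALL subjects that fall in one row category or one column category."""
    if (row is None) == (column is None):
        raise ValueError("give exactly one of row or column")
    if row is not None:
        count = sum(table[row].values())                    # row total
    else:
        count = sum(r[column] for r in table.values())      # column total
    return count / grand_total(table)


def conditional_relative_frequency(table, column, given_row):
    """Share of subjects in `column` among those known to be in `given_row`."""
    return table[given_row][column] / sum(table[given_row].values())


print(marginal_relative_frequency(table, column="soccer"))       # 40 / 100
print(conditional_relative_frequency(table, "soccer", "junior"))  # 30 / 50
print(conditional_relative_frequency(table, "soccer", "senior"))  # 10 / 50
```

In this made-up table, 40% of all students prefer soccer, but the share is 60% among juniors and 20% among seniors. When the conditional relative frequencies differ this much from each other and from the marginal one, the two variables appear to be associated (dependent); if they were all about equal, the variables would appear independent.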