This document provides an overview of basic statistical concepts including definitions of statistics, populations and samples, descriptive and inferential statistics. It discusses various methods for summarizing data including measures of central tendency (mean, median, mode) and measures of dispersion (range, variance, standard deviation). It also provides examples of how to calculate and interpret these statistical measures to analyze data and make inferences about populations.
Statistics • Statistical Thinking – thought processes that focus on ways to understand, manage, and reduce variation. – Gather information – Conduct a survey – Control for variation – Example: shopping, where you survey stores to find where the items you want are cheapest
IS 119 SY 2022-2023 1st Semester
Statistics • Definition – (Plural sense) a set of numerical data – (Singular sense) a branch of science which deals with the collection, presentation, analysis, and interpretation of data – Applications: • Building macroeconomic models to estimate economic relationships and evaluate government policies • Studying the effects of specific interventions on the welfare of households or institutions
Statistics – Applications: • In financial matters, regression and correlation analysis may be used to understand the relationship of a financial ratio to a set of other business variables • Statistical models may be used to forecast sales in the coming year • A political party may want to study the effects of campaign expenditures on voting outcomes • Academic institutions may use statistics to profile their graduates via tracer studies
Basic Terminologies • Population VS sample (the collection of all elements VS a part or subset of the population) • Parameter VS statistic – Parameter: a numerical characteristic of the population (usually denoted by Greek letters) – Statistic: a numerical characteristic of the sample • Descriptive VS Inferential statistics
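The distinction between a parameter and a statistic can be sketched in a few lines of Python. The income figures, the sample size of 4, and the seed below are all hypothetical, chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: monthly incomes (thousand pesos) of 10 households.
population = [12, 15, 9, 22, 30, 18, 11, 25, 14, 20]

# Parameter: a numerical characteristic of the whole population.
mu = statistics.mean(population)

# Sample: a subset of the population, drawn without replacement.
rng = random.Random(119)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, k=4)

# Statistic: the same characteristic computed from the sample only.
x_bar = statistics.mean(sample)

print("population mean (parameter):", mu)
print("sample:", sample)
print("sample mean (statistic):", x_bar)
```

The parameter is fixed for a given population, while the statistic changes from one sample to another; changing the seed shows this.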
Basic Terminologies • Descriptive VS Inferential statistics – Descriptive: composed of methods concerned with collecting, describing, and analyzing a set of data without drawing conclusions or inferences about a larger group. – Inferential: composed of methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data
Basic Terminologies • Descriptive statistics – Simply describe what is, or what the data show – Used to describe what is going on in our data • Inferential statistics – Reach conclusions that extend beyond the immediate data alone – Used to infer from the sample data what the population might think – Used to judge whether an observed difference between groups is dependable or might have happened by chance – Used to make inferences from the sample data to more general conditions
Types of Data • Qualitative or categorical data – objects being studied are grouped into labeled categories based on some qualitative traits – Examples: sex (male or female); civil status (single, married, separated, widowed, etc.) – Commonly summarized as percentages or proportions • Quantitative or numerical data – refers to any attribute that is measured numerically – Discrete: counts (no. of poor households, etc.) – Continuous: numerical responses from measurements (income, etc.) – Commonly summarized using averages or means
Methods of Summarizing the Data • Textual presentation • Tabular presentation • Graphical presentation • Computation of summary measures – Measures of location or central tendency (mean, median, or mode) – Measures of statistical dispersion such as standard deviation, variance, or range – Measures of location such as percentiles or quartiles
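As a small illustration of the summaries mentioned above (percentages for categorical data, means for numerical data), here is a minimal Python sketch. The survey responses are hypothetical and the variable names are assumptions made for this example.

```python
from collections import Counter
import statistics

# Hypothetical survey responses (illustrative only).
civil_status = ["single", "married", "married", "single", "widowed", "married"]
household_income = [18.5, 22.0, 30.25, 15.0, 27.5, 21.75]  # thousand pesos

# Qualitative data: count each category, then convert counts to percentages.
counts = Counter(civil_status)
percentages = {
    category: 100 * count / len(civil_status)
    for category, count in counts.items()
}

# Quantitative (continuous) data: summarize with the mean.
mean_income = statistics.mean(household_income)

for category, pct in percentages.items():
    print(f"{category}: {pct:.1f}%")
print("mean income:", mean_income)
```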
Measure of Central Tendency • Mean: the ratio of the sum of all values of observations to the number of observations in the data set – Properties: reflects the magnitude of every observation, since each contributes to the value of the mean – Affected by extreme values – Weighted mean: the means of subgroups can be combined into an overall mean when properly weighted (for example, by subgroup size)
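The properties of the mean listed above can be checked with a short Python sketch. The two subgroups are hypothetical, and weighting by subgroup size is an assumption about what "properly weighted" means here.

```python
def mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Each value contributes in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical subgroups of unequal size.
group_a = [2, 4, 6]  # mean 4, size 3
group_b = [10]       # mean 10, size 1

# Mean of all observations pooled together.
overall = mean(group_a + group_b)

# Combining the subgroup means, weighted by subgroup size, gives the same value.
from_groups = weighted_mean(
    [mean(group_a), mean(group_b)],
    [len(group_a), len(group_b)],
)

# Ignoring the weights does not reproduce the overall mean.
naive = mean([mean(group_a), mean(group_b)])

# A single extreme value pulls the mean strongly toward it.
with_outlier = mean(group_a + [100])

print(overall, from_groups, naive, with_outlier)
```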
Measure of Central Tendency • Median: the value which divides the ordered data set into two equal parts – Example (even number of observations): the 4 observations 1, 3, 2, 1, ordered from lowest to highest, are 1, 1, 2, 3; thus md = (1 + 2)/2 = 1.5 – Example (odd number of observations): for the 5 observations 1, 5, 9, 11, 12, md = 9 – Properties: a positional value, hence not affected by extreme values; not amenable to further computation (medians of subgroups cannot be combined)
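The two cases above (even and odd number of observations) can be sketched as a small Python function. The function name is an assumption for this example; the last call uses a hypothetical extreme value to show that the median does not move.

```python
def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


even_case = median([1, 3, 2, 1])            # ordered: 1, 1, 2, 3
odd_case = median([1, 5, 9, 11, 12])
with_extreme = median([1, 5, 9, 11, 1200])  # largest value replaced by an extreme one

print(even_case, odd_case, with_extreme)
```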
Measure of Central Tendency • Mode: the value which occurs most often – Example: for the data set {1, 1, 1, 1, 2, 3, 4, 4}, the mode is 1 – For {1, 1, 2, 2, 3}, the modes are 1 and 2 – For {1, 2, 3, 4}, there is no mode – Properties: determined by the frequency of occurrence and not by the values of the observations – Cannot be manipulated algebraically – Can be defined for both qualitative and quantitative data
Measures of Dispersion • Measure how scattered the data values are around the mean/average • Range: the length of the interval which contains all the data – Calculated by subtracting the lowest value from the highest – A poor and weak measure of dispersion, since it depends on only two observations, except when sample size is large
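The mode and range definitions above can be sketched in Python as follows. The helper names are assumptions, and the rule "no mode when every value occurs equally often" is taken from the {1, 2, 3, 4} example; the value 40 added at the end is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # If several distinct values all occur equally often, treat it as "no mode".
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


def data_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)


one_mode = modes([1, 1, 1, 1, 2, 3, 4, 4])
two_modes = modes([1, 1, 2, 2, 3])
no_mode = modes([1, 2, 3, 4])
qualitative_mode = modes(["married", "single", "married"])

base_range = data_range([1, 1, 1, 1, 2, 3, 4, 4])
# One extreme observation is enough to change the range completely.
stretched_range = data_range([1, 1, 1, 1, 2, 3, 4, 4, 40])

print(one_mode, two_modes, no_mode, qualitative_mode)
print(base_range, stretched_range)
```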
Measures of Dispersion • Range: the length of the interval which contains all the data – Simple and easy to understand – Gives a quick overall picture, since it states the limits of the data – Says nothing about how the values cluster between those limits – Depends only on the values of the extreme items – Sensitive to sampling variation – Not tractable mathematically
Measures of Dispersion • Variance: a measure of dispersion of data with respect to the mean. – Measures the average squared distance between each of a set of data points and their mean value.
– Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
– Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
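The two variance formulas above differ only in the divisor (N for a population, n - 1 for a sample). A minimal Python sketch, using a hypothetical data set and cross-checking against the standard library's `statistics.pvariance` and `statistics.variance`:

```python
import statistics


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Hypothetical data: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = population_variance(data)   # 32 / 8
samp_var = sample_variance(data)      # 32 / 7

print(pop_var, statistics.pvariance(data))
print(samp_var, statistics.variance(data))
```

The sample variance is always slightly larger than the population variance computed on the same numbers, because it divides by n - 1 instead of N.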
Measures of Dispersion • Standard Deviation: – Measures, on the average, the dispersion of each observation from the mean. A large amount of variation means that the data values are far from the mean, hence the sd is large
– Population sd: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
– Sample sd: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
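To see how the standard deviation reflects spread, the sketch below compares two hypothetical data sets that share the same mean. The function names are assumptions for this example; the results are cross-checked against `statistics.pstdev` and `statistics.stdev`.

```python
import math
import statistics


def population_sd(values):
    """Square root of the population variance (divisor N)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


def sample_sd(values):
    """Square root of the sample variance (divisor n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


# Two hypothetical data sets with the same mean (10) but different spread.
tight = [9, 10, 10, 11]
spread = [0, 5, 15, 20]

for name, data in (("tight", tight), ("spread", spread)):
    print(name, round(population_sd(data), 4), round(sample_sd(data), 4))
```

Values far from the mean, as in `spread`, produce the larger standard deviation even though both means are equal.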
Measures of Relative Dispersion • Measures of relative dispersion are unitless and are used to compare the scatter of one distribution with the scatter of another. • Coefficient of Variation: a statistical measure of the dispersion of data points in a data series relative to the mean. It is the ratio of the standard deviation to the mean. – Useful when interest is in the size of the variation relative to the size of the observations.
$CV = (\sigma / \mu) \times 100\%$
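Because the coefficient of variation is unitless, it allows comparisons across variables measured in different units. The sketch below uses hypothetical height and weight data whose standard deviations happen to be equal, and it assumes the population standard deviation is the one intended in the formula.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


# Hypothetical measurements in different units.
heights_cm = [150, 160, 170, 180, 190]  # mean 170
weights_kg = [50, 60, 70, 80, 90]       # mean 70

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"heights: CV = {cv_heights:.2f}%")
print(f"weights: CV = {cv_weights:.2f}%")
```

Both data sets have the same standard deviation, yet the weights vary more relative to their mean, which is what the larger CV reports.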
Example 1 The foreign exchange rate is an indicator of the stability of the peso and also of economic performance. Market forces, not government policy, have determined the level of the peso, since the government intervenes, through the Bangko Sentral ng Pilipinas, only when there are speculative elements in the market. Given below are the means and standard deviations of the quarterly peso-dollar exchange rate for the periods 1998 to 1999 and 2000 to 2001. Which of the two periods is more stable?
Period       Mean    Standard Deviation
1998-1999    40.4    2.01
2000-2001    48.6    1.21
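The comparison asked for here can be sketched from the summary figures alone, assuming the coefficient of variation defined earlier is the yardstick for stability (a smaller CV meaning a more stable period). The dictionary layout and function name are assumptions for this example.

```python
def cv_percent(sd, mean):
    """Coefficient of variation: standard deviation as a percentage of the mean."""
    return sd / mean * 100


# Summary figures from the example: period -> (mean, standard deviation).
periods = {
    "1998-1999": (40.4, 2.01),
    "2000-2001": (48.6, 1.21),
}

cv = {period: cv_percent(sd, mean) for period, (mean, sd) in periods.items()}

# The period with the smaller CV varies less relative to its own mean.
more_stable = min(cv, key=cv.get)

for period, value in cv.items():
    print(f"{period}: CV = {value:.2f}%")
print("more stable:", more_stable)
```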
Solution to Example 1 • 1998-1999: $CV_{1998\text{-}1999} = (2.01 / 40.4) \times 100\% = 4.98\%$
• 2000-2001: $CV_{2000\text{-}2001} = (1.21 / 48.6) \times 100\% = 2.49\%$
• Thus the period 2000-2001, which has the smaller CV, is more stable with respect to the peso-dollar exchange rate
Inferential Statistics • Deals with the methods to generalize what the sample data show • Inferential statistics answer the question: Can I generalize to the population the patterns, differences, or profile that I see in my sample? • Note that there is no need to do inferential statistics if the data are already population data
Next topic… • Studying Relationships – Correlation Analysis – Regression Analysis – Important Statistics and Tests