TRANSCRIBED BY: BUCYOT (BSN 1 – Y1 – 37)
MATM111 – MATHEMATICS IN THE MODERN WORLD
1ST SEMESTER – MIDTERM – A.Y. 2023-2024
LESSON: INTRODUCTION TO STATISTICS

Statistics
→ science that deals with the collection, presentation, analysis, and interpretation of data.
→ fundamentally concerned with understanding the structure of data.
→ methods range from simple to more systematic procedures for describing and summarizing data. These methods enable us to develop a way of thinking.
→ describes or characterizes persons, objects, situations, and phenomena with some reliability (facts).
→ lets us make statements and comparisons in an objective manner.
→ lets us make evidence-based decisions.

Steps in Statistical Investigation
1. Identify the Problem
2. Collection of Data
→ refers to the different methods and techniques of gathering the data.
3. Presentation of Data
→ refers to the tabulation and organization of data in tables, graphs, and charts.
4. Analysis of Data
→ process of deriving relevant information from the gathered data through the different statistical tools.
5. Interpretation of Data
→ refers to the task of drawing conclusions or inferences from the analyzed data.

Population and Sample
→ universe – set of all entities under study.
→ population – complete collection or totality of all possible values of the variable.
→ sample – subset or subcollection of elements drawn from a population; refers to a portion of the population.

Data and Variable
→ data – refers to any information concerning a population or sample.
→ variable – attribute of interest that is observable for each entity in the universe.
→ parameter – numerical measure that describes the population of interest.
→ statistic – numerical measure that describes a sample.

Divisions of Statistics
1. Descriptive Statistics
→ division of statistics that summarizes or describes the important characteristics of a given set of data.
2. Inferential Statistics
→ division of statistics that aims to give information about the population by studying the characteristics of the sample drawn from it.

Categories of Data
1. Qualitative Data
→ uses categories or attributes that are distinguished by some nonnumeric characteristic (e.g., sex, religion, race, color of the skin, etc.).
2. Quantitative Data
→ consists of numbers representing counts or measurements (e.g., weights, heights, temperature, scores, etc.).

Classification of Variables
1. According to Source
1.1 Primary Data
→ refers to information which is gathered directly from the original source.
1.2 Secondary Data
→ refers to information which is taken from a secondary source.

2. According to Functional Relationship
2.1 Independent Data
→ refers to any controlling data; it affects the dependent data; sometimes called the predictor variable.
2.2 Dependent Data
→ any data that is affected by the controlling data; sometimes called the criterion variable.

3. According to Continuity of Values
3.1 Discrete Data
→ quantitative data which can assume a finite or countable number of values; cannot be represented by fractions or decimal numbers, only by whole numbers.
3.2 Continuous Data
→ quantitative data which can assume infinitely many possible values corresponding to the points on a line interval; can be represented by fractions and decimals.

4. According to Scale of Measurements
4.1 Nominal
→ data that consists of names, labels, or categories only; numbers, when used, serve only to categorize the data.
4.2 Ordinal
→ measurements which deal with order or rank; the degree of difference between ranks is not available.
4.3 Interval
→ similar to ordinal, but this level of measurement does not only show likeness or difference between data; it also gives meaningful amounts of difference between data. It does not have a "true zero" starting point; instead, the zero is arbitrarily assigned.
4.4 Ratio
→ a modified interval level that includes a true "zero" starting point; ratios and proportions are meaningful.

Measure of Central Tendency
→ an index of the central location of a distribution. It is a single value that is used to identify the "center" of the data or the typical value.
→ precise yet simple.
→ most representative value of the data.

Arithmetic Mean
→ most frequently used measure of central tendency.
→ the sum of all observations divided by the total number of observations.
→ notations:
a. μ – used to denote the population mean; parameter
b. x̄ – used to denote the sample mean; statistic
→ can be computed in two ways:
a. ungrouped data
b. grouped data
→ formula: $\bar{x} = \frac{\sum x}{n}$

Advantages of the Mean
→ takes all observations into account.
→ can be used for further statistical calculation and mathematical manipulation.
→ the mean always exists and is unique.
→ widely understood measure of central tendency.

Disadvantages of the Mean
→ may or may not be an actual observed value in the data set.
→ easily affected by extreme values, especially if the number of observations is small.
→ cannot be computed if there are missing values due to omission or non-response.
→ in grouped data with open-ended class intervals, the mean cannot be computed, because it depends on all observed values.

When to Use the Mean
→ data is of interval or ratio scale.
→ the value of each score is to be taken into account.
→ further statistical computation is needed.
→ distribution of the data is normal.

Median
→ central value of a distribution.
→ value that divides the distribution into two equal parts.
→ formula (with the n observations arranged in order, $x_{(k)}$ being the kth value):
a. if n is odd: $\tilde{x} = x_{\left(\frac{n+1}{2}\right)}$, the $\left(\frac{n+1}{2}\right)$th value
b. if n is even: $\tilde{x} = \frac{1}{2}\left(x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right)$, the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th values

Advantages of the Median
→ not affected by extreme values.
→ exact middle value of the distribution.
→ can be computed even for grouped data with open-ended class intervals.

Disadvantages of the Median
→ medians of different distributions with similar variates cannot be combined to obtain an overall median.
→ median value does not have a direct relation to the total number of observations and their total value. It merely indicates the value that divides the population into two parts.

When to Use the Median
→ data is in ordinal scale.
→ middle value is desired.
→ a measure of central tendency that is not affected by extreme values is needed.
→ data distribution is skewed.
→ the distribution has open-ended intervals.

Mode
→ value of the variable that occurs most frequently in a distribution.
→ also referred to as the nominal average.
→ determine the mode by counting the frequency of each observed value and finding the observed value with the highest frequency of occurrence.
→ unimodal – one mode
→ bimodal – two modes
→ multimodal – more than two modes
→ no mode

Advantages of the Mode
→ extreme values do not easily affect the mode.
→ value is always one of the observed values in the data set.
→ can be obtained for both qualitative and quantitative types of data.

Disadvantages of the Mode
→ mode is sometimes not unique or does not exist.
→ does not possess the desired algebraic property of the mean that allows further manipulation.
→ to obtain the mode of several distributions combined, all the raw data of the different distributions have to be merged first.
When to Use the Mode
→ data is in nominal scale.
→ most frequent value is desired.
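
The three measures of central tendency in these notes can be tried out with a short Python sketch. It is only an illustration under stated assumptions: the scores are invented, the helper names `median_by_position` and `modes` are hypothetical, the median for even n is taken as the average of the two middle values, and "no mode" is taken to mean that no value repeats.

```python
from collections import Counter
from statistics import mean, median


def median_by_position(values):
    """Median by the position rule: the middle value if n is odd,
    the average of the two middle values if n is even."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # the ((n + 1) / 2)th value, counting from 1
        return ordered[(n + 1) // 2 - 1]
    # average of the (n / 2)th and (n / 2 + 1)th values, counting from 1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def modes(values):
    """Return every value tied for the highest frequency.
    An empty list means "no mode" (assumed convention: no value repeats)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1 and len(counts) > 1:
        return []
    return [value for value, count in counts.items() if count == highest]


# Hypothetical scores, invented for illustration only.
scores = [70, 75, 75, 80, 85, 95]
with_extreme = scores + [200]  # the same data plus one extreme value

print(mean(scores), median_by_position(scores), modes(scores))
print(mean(with_extreme), median_by_position(with_extreme), modes(with_extreme))

# The mode also works for nominal (qualitative) data.
print(modes(["blue", "brown", "blue", "green"]))  # unimodal
print(modes([1, 1, 2, 2, 3]))                     # bimodal
print(modes([1, 2, 3]))                           # no mode
```

Adding the single extreme value 200 moves the mean from 80 to about 97.1, while the median only moves from 77.5 to 80 and the mode stays at 75. This matches the notes: the mean is easily affected by extreme values, the median and mode are not.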