Prepared by Henry Brandon (23/09/2023)

PRESIDENT’S COLLEGE
DEPARTMENT OF MATHEMATICS
Integrated Mathematics
Module 2 – Statistics
Handout 1: Data Collection & Presentation
Week 3
Organization of Data
The organization of data refers to the structured arrangement and management of information or data elements to
make them accessible, understandable, and useful for various purposes. It involves the design and implementation
of a systematic framework or structure that allows data to be stored, retrieved, processed, and analyzed efficiently.
Organizing data is crucial for decision-making, information retrieval, data analysis, and overall data management.
Several key aspects of data organization include:
1. Data Structure: Data can be organized using various structures such as tables, lists, trees, graphs,
databases, and more, depending on the nature and requirements of the data. These structures help
establish relationships and hierarchies among data elements.
2. Data Classification: Data is categorized into different groups or classes based on its characteristics and
attributes. Classification helps in organizing data into meaningful categories, making it easier to manage
and retrieve.
3. Data Naming and Labeling: Properly naming and labeling data elements, fields, or variables is essential for
clarity and consistency. Clear and meaningful names make it easier to understand and work with the data.
4. Data Hierarchies: Data often has hierarchical relationships, where some data elements are more granular
or detailed, while others are higher-level summaries or aggregations. Organizing data hierarchically can
facilitate navigation and analysis.
5. Data Indexing: Indexing involves creating data structures (e.g., indexes) to quickly locate specific data
records or entries within a large dataset. Indexing improves data retrieval performance, especially in
databases.
6. Data Relationships: Establishing relationships between different data elements or entities is crucial for
relational databases. Relationships define how data elements are related to each other and are key for
maintaining data integrity.
7. Data Storage: Appropriate storage mechanisms and technologies must be chosen to store data efficiently,
whether in traditional databases, data warehouses, data lakes, or cloud-based storage solutions.
8. Data Security: Implementing measures to protect data from unauthorized access or breaches is a
fundamental aspect of data organization. This includes user authentication, encryption, access controls, and
data auditing.
9. Data Documentation: Properly documenting data, including metadata (data about data), is essential for
understanding its context, source, quality, and usage. Documentation is crucial for data governance and
compliance.
10. Data Retrieval and Querying: Creating mechanisms for retrieving and querying data is vital for extracting
meaningful insights from it. This may involve the use of query languages, search algorithms, or data
analysis tools.
11. Data Maintenance: Data must be regularly updated, cleaned, and maintained to ensure its accuracy,
consistency, and reliability over time.
12. Data Coding: Data coding involves assigning numerical or categorical codes to represent specific categories,
attributes, or values within a dataset. This coding process simplifies the data, making it more manageable
and facilitating analysis. For example, in a survey, you might code responses like "Male" as 1 and "Female"
as 2. Coding is essential for quantitative analysis and statistical operations.
13. Data Entry: Data entry is the process of manually inputting data into a digital system or database. This can
involve entering data from paper forms, surveys, or other sources into a computerized system. Accurate
and consistent data entry is critical for maintaining data quality and integrity. Errors in data entry can lead
to problems down the line when analyzing or using the data.
14. Data Validation: During the data entry process, data validation checks can be implemented to ensure that
the entered data adheres to predefined rules or standards. This helps identify and correct errors or
inconsistencies in real-time, contributing to data quality.
15. Data Transformation: In some cases, data entry and coding may also involve transforming data from one
format or structure to another. For example, converting dates from different formats into a standardized
date format for consistency.
16. Data Cleaning: Data cleaning often follows data entry, and it involves identifying and correcting errors,
missing values, duplicates, or outliers in the dataset. Cleaning is essential to ensure data accuracy and
reliability.
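As a rough illustration of items 12 to 16, the sketch below codes, validates, transforms and cleans a few survey records. The records, the field names, the accepted date formats and the plausible age range (10 to 25) are all hypothetical assumptions made for this example; only the coding of "Male" as 1 and "Female" as 2 comes from the handout.

```python
from datetime import datetime

# Hypothetical raw survey records, as they might be typed in during data entry.
raw_records = [
    {"id": 1, "sex": "Male", "age": "17", "date": "23/09/2023"},
    {"id": 2, "sex": "female", "age": "16", "date": "2023-09-24"},
    {"id": 2, "sex": "female", "age": "16", "date": "2023-09-24"},   # duplicate entry
    {"id": 3, "sex": "Female", "age": "160", "date": "25/09/2023"},  # implausible age
]

SEX_CODES = {"male": 1, "female": 2}       # coding scheme from the handout's example
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d")    # assumed input formats


def standardize_date(text):
    """Transformation: convert an accepted date format to YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def process(records):
    """Return (clean records, ids rejected by validation)."""
    clean, rejected, seen_ids = [], [], set()
    for rec in records:
        if rec["id"] in seen_ids:           # cleaning: skip duplicate ids
            continue
        seen_ids.add(rec["id"])
        age = int(rec["age"])
        if not 10 <= age <= 25:             # validation: assumed plausible range
            rejected.append(rec["id"])
            continue
        clean.append({
            "id": rec["id"],
            "sex": SEX_CODES[rec["sex"].strip().lower()],  # coding
            "age": age,
            "date": standardize_date(rec["date"]),         # transformation
        })
    return clean, rejected


clean, rejected = process(raw_records)
print(clean)
print("rejected ids:", rejected)
```

In practice a rejected record would normally be sent back for correction rather than silently dropped; the list of rejected ids is kept here for that reason.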
Instruments Used to Collect Data
1. Surveys: Surveys involve asking structured questions to a sample of individuals or organizations. They can
be conducted through various means, including paper surveys, online surveys, telephone interviews, or
face-to-face interviews.
2. Questionnaires: Questionnaires are a type of survey instrument that typically consists of a set of
standardized questions. Respondents provide written or verbal responses to these questions.
3. Interviews: Interviews involve one-on-one or group interactions with participants to gather information.
They can be structured (with predetermined questions) or unstructured (more open-ended and
conversational).
4. Observations: Observational data is collected by directly watching and recording events, behaviors, or
phenomena. It can be done in a controlled setting (controlled observations) or in natural environments
(naturalistic observations).
5. Experiments: Experiments involve manipulating one or more variables to observe their effects on other
variables. Experiments are commonly used in scientific research to establish cause-and-effect relationships.
6. Focus Groups: Focus group discussions involve a small group of participants who engage in a facilitated
discussion about a specific topic. They are often used to gather qualitative insights and opinions.
7. Content Analysis: Content analysis involves systematically analyzing written, visual, or audio materials
(e.g., texts, videos, images) to extract and code relevant information.
8. Case Studies: Case studies involve an in-depth examination of a single case or a few cases. Researchers
collect detailed information about the case(s) to gain insights into specific phenomena.
9. Secondary Data: Secondary data is data that has been previously collected by someone else for a different
purpose. Researchers use existing datasets, documents, or records for their analysis.
10. Sensor Data: Sensors and instruments like GPS devices, temperature sensors, accelerometers, and more can
collect data automatically in real-time, often used in scientific research and environmental monitoring.
11. Web Scraping: Web scraping involves extracting data from websites or online sources. It is commonly used
in data collection for web-based research.
12. Diaries and Journals: Participants keep records of their activities, thoughts, or experiences in diaries or
journals. These can provide valuable insights into daily life and attitudes.
13. Photography and Videography: Images and videos can capture visual data, which is especially useful in
fields like anthropology, ecology, and art analysis.
14. Biometric Data: Biometric instruments can collect physiological data like heart rate, EEG
(electroencephalogram), or eye-tracking data to study human behavior and physiological responses.
15. Social Media Data: Data from social media platforms can be collected for various purposes, such as
sentiment analysis, trend tracking, or studying online behavior.
16. Geographic Information Systems (GIS): GIS tools collect and analyze spatial data, including maps,
geographic coordinates, and geographic features.
17. Telemetry: Telemetry involves remotely collecting data from sensors or instruments and transmitting it to
a central location for monitoring or analysis. It's commonly used in fields like environmental science and
engineering.
18. Biological Samples: In fields like biology and medicine, researchers collect biological samples (e.g., blood,
tissue, DNA) for laboratory analysis.
19. Economic Indicators: Economic data, such as GDP, unemployment rates, and inflation, are collected by
government agencies and organizations to monitor economic conditions.
20. Psychometric Tests: Psychometric instruments are used in psychology and education to measure cognitive
abilities, personality traits, and other psychological constructs.

Presenting and Organizing Data Visually


A. Tally Tables: Tally tables, also known as tally sheets or tally charts, are simple tools used for counting and
recording occurrences or frequencies of specific events or items. Each tally mark represents one
occurrence, and the marks are usually bundled in groups of five (four strokes crossed by a fifth) so that
totals are quick to count. Tally tables are often used in manual data collection to track quantities efficiently.
B. Frequency Tables: Frequency tables are organized data tables that display the frequency or count of each
unique value or category within a dataset. They provide a summary of how many times each value appears.
Frequency tables are commonly used in statistics and data analysis to help understand the distribution of
data.
C. Cumulative Frequency Tables: Cumulative frequency tables, also known as cumulative frequency
distributions, extend the concept of frequency tables by showing not only the individual frequencies but
also the cumulative or running total of frequencies as values are grouped or sorted in ascending or
descending order. Cumulative frequency tables are used to analyze and understand the cumulative
distribution of data, which is useful for various statistical purposes, such as finding percentiles or drawing
cumulative frequency curves (ogives).
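The three tables described above can be built from the same raw data. The following is a minimal sketch, assuming a small hypothetical data set (the number of siblings reported by 20 students) and a text rendering of tally marks in which `||||/` stands for a bundle of five.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical data: number of siblings reported by 20 students.
data = [2, 1, 0, 3, 2, 2, 1, 4, 0, 1, 2, 3, 1, 2, 0, 1, 2, 3, 1, 2]


def tally(count):
    """Draw tally marks: one stroke per observation, bundled in fives."""
    bundles, rest = divmod(count, 5)
    return " ".join(["||||/"] * bundles + (["|" * rest] if rest else []))


freq = Counter(data)                         # frequency of each distinct value
values = sorted(freq)                        # ascending order
frequencies = [freq[v] for v in values]
cumulative = list(accumulate(frequencies))   # running total of frequencies

print(f"{'Value':<6}{'Tally':<12}{'f':>3}{'cf':>4}")
for v, f, cf in zip(values, frequencies, cumulative):
    print(f"{v:<6}{tally(f):<12}{f:>3}{cf:>4}")
```

The last cumulative frequency always equals the total number of observations, which is a quick check that no value has been missed.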
Tally Table

Frequency Table

Cumulative Frequency Table

Presentation of Data
The presentation of data refers to the process of visually and graphically representing information, facts, or
findings in a clear, understandable, and often engaging manner. The primary objective of data presentation is to
communicate data-driven insights effectively to an audience, making it easier for them to grasp the meaning,
patterns, and implications of the data. Effective data presentation enhances data interpretation and aids decision-
making. Common methods of presenting data include charts, graphs, tables, maps, infographics, and narratives,
among others. The choice of presentation format depends on the nature of the data, the audience, and the specific
objectives of conveying the information.
1. Bar Chart:
• Definition: A bar chart represents data using rectangular bars of varying lengths. The length of each bar
corresponds to the value it represents.
• When to Use: Use a bar chart to compare discrete categories or data points. For example, you can use a
bar chart to compare the sales performance of different products in a store over a month.
2. Pie Chart:
• Definition: A pie chart is a circular graph divided into slices, with each slice representing a portion of a
whole. Each slice's size represents the proportion or percentage of a category.
• When to Use: Use a pie chart to show the composition of a whole when you want to emphasize relative
proportions. For instance, you can use a pie chart to display the distribution of expenses in a budget.
3. Line Graph:
• Definition: A line graph displays data points as a series of connected dots, forming a line. It is suitable
for showing trends and changes over time or a continuous variable.
• When to Use: Use a line graph to illustrate the stock price of a company over several years, highlighting
the trend in its value.
4. Histogram:
• Definition: A histogram displays the distribution of numerical data using vertical bars or bins. Each bar
represents a range of values, and the height indicates the frequency of data points in that range.
• When to Use: Use a histogram to visualize the distribution of exam scores in a classroom, showing how
many students scored within each score range.
5. Frequency Polygon:
• Definition: A frequency polygon is a line graph that represents data frequencies using line segments
connected to data points or bins.
• When to Use: Use a frequency polygon in conjunction with a histogram to provide a smoother
representation of the data's distribution, making it easier to identify trends.
6. Ogive:
• Definition: An ogive, or cumulative frequency curve, displays cumulative frequencies or percentages of
data values.
• When to Use: Use an ogive to visualize the cumulative distribution of data, such as the cumulative
number of customers who have purchased a product at different price points.
7. Box-and-Whisker Plot:
• Definition: A box-and-whisker plot summarizes the distribution of numerical data by displaying the
median, quartiles, range, and outliers in a graphical format.
• When to Use: Use a box-and-whisker plot to compare the salary distributions of employees in different
departments of a company.
8. Stem-and-Leaf Plot:
• Definition: A stem-and-leaf plot organizes and displays numerical data by separating each data point
into a "stem" (leading digits) and a "leaf" (trailing digits).
• When to Use: Use a stem-and-leaf plot to visualize the distribution of ages of participants in a survey.
9. Scatter Plots:
• Definition: A scatter plot shows individual data points as dots on a two-dimensional plane. It is used to
visualize relationships between two continuous variables.
• When to Use: Use a scatter plot to explore the correlation between the number of hours spent studying
and exam scores for a group of students.
10. Cross Tabulation of Nominal or Categorical Data:
• Definition: Cross tabulation, or a contingency table, summarizes the relationships between two or more
categorical variables by presenting the frequency or counts in a table format.
• When to Use: Use cross tabulation to analyze the relationship between gender (male or female) and the
preference for a particular type of smartphone (iPhone or Android) among survey respondents.
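To make the cross tabulation in item 10 concrete, here is a minimal sketch that counts each (gender, smartphone) pair and prints a contingency table with row and column totals. The ten survey responses are hypothetical and serve only to illustrate the layout.

```python
from collections import Counter

# Hypothetical survey responses: (gender, preferred smartphone).
responses = [
    ("Male", "iPhone"), ("Female", "Android"), ("Female", "iPhone"),
    ("Male", "Android"), ("Male", "Android"), ("Female", "iPhone"),
    ("Female", "iPhone"), ("Male", "iPhone"), ("Female", "Android"),
    ("Male", "Android"),
]

counts = Counter(responses)                  # frequency of each pair
rows = sorted({g for g, _ in responses})     # row categories (gender)
cols = sorted({p for _, p in responses})     # column categories (phone)

# Header row, one line per gender, then the column totals.
print(f"{'':<8}" + "".join(f"{c:>9}" for c in cols) + f"{'Total':>9}")
for r in rows:
    cells = [counts[(r, c)] for c in cols]
    print(f"{r:<8}" + "".join(f"{n:>9}" for n in cells) + f"{sum(cells):>9}")
col_totals = [sum(counts[(r, c)] for r in rows) for c in cols]
print(f"{'Total':<8}" + "".join(f"{n:>9}" for n in col_totals)
      + f"{sum(col_totals):>9}")
```

The grand total in the bottom-right cell should equal the number of respondents, whichever way the table is summed.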
A Bar Chart, Pie Chart & Line Graph

Frequency Polygon

Histogram

Ogive

Box-and-Whisker Plot

Stem-and-Leaf Plot

Scatter Plot

Cross Tabulation of Nominal/Categorical Data or Contingency Table



The Shape of A Distribution


The shape of a distribution refers to the overall pattern that the data values form when they are represented
graphically. It is a fundamental feature of a data set and gives insight into how the data behave.
There are several common shapes that data distributions can exhibit, including:
A. Symmetric: A symmetric distribution is one in which the data is evenly distributed on both sides of the
central point or mean. It appears like a mirror image when folded in half. The classic example of a
symmetric distribution is the normal distribution (bell-shaped curve), where data is concentrated around
the mean, and tails on either side extend equally.
B. Skewed: A skewed distribution is asymmetrical, meaning that it is not evenly balanced around the mean.
There are two types of skewness:
• Positively skewed (right-skewed): In a positively skewed distribution, the tail on the right side (the
higher values) is longer or stretched compared to the left side. This indicates that the majority of
data points are concentrated on the lower end, and there are relatively few high values.
• Negatively skewed (left-skewed): In a negatively skewed distribution, the tail on the left side (the
lower values) is longer or stretched compared to the right side. This indicates that the majority of
data points are concentrated on the higher end, and there are relatively few low values.
C. Bimodal: A bimodal distribution has two distinct peaks or modes, indicating that there are two separate
groups or clusters within the data. This suggests that the data may come from two different populations or
sources.
D. Uniform: A uniform distribution, also known as a rectangular distribution, is characterized by data points
being evenly distributed across the entire range of values, resulting in a flat and constant shape.
E. Multimodal: A multimodal distribution has more than two distinct peaks or modes, indicating the presence
of multiple subgroups or sources in the data.
F. Irregular or Combined Shapes: Some distributions may exhibit complex or irregular shapes, combining
elements of skewness, multimodality, and other patterns.
The shape of a distribution provides valuable information about the central tendency, variability, and patterns in
the data. Understanding the shape helps researchers and analysts make informed decisions about data analysis
techniques, statistical tests, and modeling approaches that are appropriate for the data at hand. It also aids in
identifying outliers, understanding the underlying processes generating the data, and drawing meaningful
conclusions from the dataset.
Negative & Positive Skew

Normal or Symmetric Distribution

Bimodal Distributions

Uniform Distribution

Multimodal Distribution

Examples:
Stem-and-Leaf Plots

0 |
1 | 8
2 | 0 2 2 3 4 7 8 8 9 9
3 | 0 0 0 0 1 1 1 2 3 4 4 4 5 6 6 6 6 7 7 7 8 9 9 9 9
4 | 0 0 0 0 1 1 1 1 1 2 2 4 5 6 6 7 7 9
5 | 0 0 1 1 2 2 3 3 3 5 5 8 9 9
6 | 0 1 3 5 8 8
7 | 3 4
8 |
9 | 4

Key: 6|8 means 68.


Interpretation:
The median is observed to be 40. The mode is observed to be 41. The shape of the distribution is positively skewed.
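The readings above can be checked by expanding the plot back into raw values. The sketch below is one way to do this, assuming the stems and leaves are exactly as printed in the plot; it rebuilds the 77 observations, then finds the median and the mode. Comparing the mean with the median is used here only as an informal check on the direction of skew.

```python
from statistics import mean, median, multimode

# Stems and leaves copied from the plot above (key: 6|8 means 68).
PLOT = """
0
1 8
2 0 2 2 3 4 7 8 8 9 9
3 0 0 0 0 1 1 1 2 3 4 4 4 5 6 6 6 6 7 7 7 8 9 9 9 9
4 0 0 0 0 1 1 1 1 1 2 2 4 5 6 6 7 7 9
5 0 0 1 1 2 2 3 3 3 5 5 8 9 9
6 0 1 3 5 8 8
7 3 4
8
9 4
"""

values = []
for line in PLOT.strip().splitlines():
    stem, *leaves = line.split()          # first token is the stem
    values.extend(10 * int(stem) + int(leaf) for leaf in leaves)

print("n      =", len(values))            # number of observations
print("median =", median(values))         # middle value of the ordered data
print("mode   =", multimode(values))      # most frequent value(s)
print("mean   =", round(mean(values), 1))
```

With 77 observations the median is the 39th value in order, which is 40, and the most frequent value is 41 (it occurs five times). The mean is larger than the median, which is consistent with the long tail towards the high values.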

Line Graph

Interpretation: Look for trends: increasing, decreasing, fluctuating, or constant.


At first, the number of times mothers need to remind teenagers to do their chores increases by 12. It then
decreases sharply by 10. The highest frequency observed is 14 and the lowest frequency is 2.

Bar Charts – Simple: Vertical & Horizontal, Stacked: Horizontal & Vertical, Comparative: Vertical &
Horizontal

Interpretation:
The highest bar was observed to be the age group of 13-25, while the lowest bar was observed to be the age group
of 45-64.

Histograms

Interpretation:
Look for increasing and decreasing trends in the frequencies, and note the shape of the distribution (here,
approximately normal).

Frequency Polygons

Interpretation:
A frequency polygon is interpreted in the same way as a simple line graph.

Pie Charts:
Interpretation:
a. Bicycle = 125
b. Does not walk = 415
c. Bus & Car = 290
Coming to school by car is the most common mode of travel, while walking is the least common. The majority of
students (58%) use a form of transportation that does not involve any physical energy.
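The figures quoted in this interpretation can be cross-checked with a little arithmetic. The sketch below assumes that the chart has only three categories (Walk, Bicycle, and Bus & Car) and that the 58% refers to the Bus & Car group; the total number of students and the number who walk are not stated in the handout and are inferred here from those assumptions.

```python
# Figures quoted in the interpretation: Bicycle = 125, Bus & Car = 290,
# "does not walk" = 415, and 58% of students use Bus & Car.
bicycle, bus_and_car, share_bus_car = 125, 290, 0.58

total = round(bus_and_car / share_bus_car)   # inferred total number of students
not_walking = bicycle + bus_and_car          # should match the quoted 415
walking = total - not_walking                # inferred number who walk

# Each slice: count, percentage of the whole, and angle at the centre.
for label, count in [("Walk", walking), ("Bicycle", bicycle),
                     ("Bus & Car", bus_and_car)]:
    pct = 100 * count / total
    angle = 360 * count / total
    print(f"{label:<10}{count:>5}{pct:>7.1f}%{angle:>8.1f} deg")
```

Under these assumptions the total is 500 students, of whom 85 walk. The same two formulas (percentage = count / total x 100, angle = count / total x 360) work in reverse when a count has to be read off a slice.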

Box-and-Whisker Plots:

Interpretation: The plot shows the highest observation, the lowest observation, the median Q2, the first quartile Q1
and the third quartile Q3. The 1st whisker is bounded by the lowest observation and the 1st quartile. The
rectangular box is bounded by the 1st and 3rd quartiles. The 2nd whisker is bounded by the 3rd quartile and the
highest observation. Finally, the median is represented by a stroke through the rectangular box. Together these
five values are called the five-number summary.
The shape of the distribution can be determined from the Box-and-Whisker plot. To decide whether a distribution is
positively or negatively skewed, compare the mean with the median.
Measuring Skewness:
If the mean = median, then we have a symmetric distribution (for example, a normal distribution).
If the mean > median, then we have a positively skewed distribution.
If the mean < median, then we have a negatively skewed distribution.
Further, we can look at the Box-and-Whisker plot to determine the amount of data that is above or below a certain
point (percentile).
Range = Highest observation – lowest observation.
Interquartile range (IQR) = 3rd quartile – 1st quartile.
Semi-interquartile range (SIQR) = IQR ÷ 2.
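The measures in this section can be computed together. The following is a minimal sketch on a hypothetical data set of 11 observations. It assumes the "median of each half" rule for the quartiles (the median is left out of both halves when the number of observations is odd); other textbooks and software use slightly different quartile rules and may give slightly different values.

```python
from statistics import mean, median


def five_number_summary(data):
    """Return (lowest, Q1, median, Q3, highest) using median-of-halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                               # values below the median
    upper = xs[half + 1:] if n % 2 else xs[half:]   # values above the median
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def skew_direction(data):
    """Compare the mean with the median, as in the rule above."""
    m, med = mean(data), median(data)
    if m > med:
        return "positively skewed"
    if m < med:
        return "negatively skewed"
    return "symmetric"


# Hypothetical data set of 11 observations.
data = [3, 5, 7, 8, 9, 11, 12, 13, 15, 18, 42]

lowest, q1, q2, q3, highest = five_number_summary(data)
data_range = highest - lowest      # Range = highest - lowest observation
iqr = q3 - q1                      # interquartile range
siqr = iqr / 2                     # semi-interquartile range

print("five-number summary:", (lowest, q1, q2, q3, highest))
print("range =", data_range, " IQR =", iqr, " SIQR =", siqr)
print("shape:", skew_direction(data))
```

For this data set the single large value (42) pulls the mean above the median, so the rule reports a positive skew even though most of the values lie close together.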
Ogive:

Interpretation:
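An ogive is interpreted in much the same way as a Box-and-Whisker plot: the median, the quartiles and other
percentiles can be read from the curve at the corresponding cumulative frequencies.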
