
Polytechnic University of the Philippines

College of Business Administration


Department of Human Resources Management

Group 7
Written Report

E-book on Essentials of
Business Analytics

BSBA HRM 2-1N


Members:
Bambico, Dindo
Gerilla, Mark Gabriel
Guerrero, Jaris Troy
Manon-og, Diana Marie
Buenaventura, Francis Luis
What is Statistics?

Statistics is a branch of applied mathematics that deals with collecting,
organizing, analyzing, interpreting, and presenting data. Descriptive statistics
summarize data. In addition to being the name of a field of study, the word "statistics"
can also mean the numbers that are used to describe data or relationships. According to the
Oxford dictionary, statistics is the practice or science of collecting and analyzing
numerical data in large quantities, especially for the purpose of inferring proportions in a
whole from those in a representative sample.

What is data?

Data is a collection of information and statistics used for reference or analysis.
According to Kids Do Ecology, data are the information gained from observing and
testing in an experiment. Scientists use data to gain understanding and draw conclusions.
Here is an example of data presented using a graph.

What is descriptive statistics?

Descriptive statistics help describe and understand the features of a specific
data set by giving short summaries of the sample and measures of the data. The
most recognized types of descriptive statistics are the measures of center: the mean,
median, and mode, which are used at almost all levels of math and statistics. This type
of statistics helps us understand the collective properties of the elements of a data
sample.
Types of Data

Data can be categorized in several ways based on how they are collected and
the type of data collected. In many cases, it is not feasible to collect data from the
population of all elements of interest. In such instances, we collect data from a
subset of the population known as a sample. For example, with the thousands
of publicly traded companies in the United States, tracking and analyzing all of
these stocks every day would be too time-consuming and expensive. The Dow
represents a sample of 30 stocks of large public companies based in the
United States, and it is often interpreted to represent the larger population of
all publicly traded companies.
It is very important to collect sample data that are representative of the
population data so that generalizations can be made from them. In most cases,
although not in the case of the Dow, a representative sample can be gathered by
random sampling of the population data. Dealing with populations and
samples can introduce subtle differences in how we calculate and interpret
summary statistics. In almost all practical applications of business analytics, we
will be dealing with sample data.
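The idea of drawing a random sample and using it to describe a population can be sketched in a few lines of Python. This is only an illustration: the population values are made up, and the helper name `draw_sample` and the seeds are assumptions chosen so that the result is repeatable.

```python
import random
import statistics

# Hypothetical population: 1,000 made-up daily sales figures.
rng = random.Random(7)  # fixed seed so the sketch is repeatable
population = [rng.randint(100, 500) for _ in range(1000)]

def draw_sample(values, size, seed):
    """Simple random sample without replacement (hypothetical helper)."""
    return random.Random(seed).sample(values, size)

sample = draw_sample(population, 50, seed=1)

# The sample mean is a point estimate of the population mean.
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
print(f"population mean: {population_mean:.1f}")
print(f"sample mean:     {sample_mean:.1f}")
```

The two printed means will usually be close but not identical, which is the "subtle difference" between working with a population and working with a sample.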
Quantitative and Categorical Data
• Quantitative Data: Data are quantitative if numeric and arithmetic operations such as
addition, subtraction, multiplication, and division can be performed on them.
• Categorical Data: Data are categorical if numeric and arithmetic operations cannot be
performed on them. Categorical data can be summarized by counting the number of
observations or computing the proportion of observations in each category.
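As a small illustration of summarizing categorical data, the sketch below counts the observations in each category and converts the counts to proportions. The department labels are made-up values, not data from the report.

```python
from collections import Counter

# Hypothetical categorical observations (department of each employee).
departments = ["HR", "Sales", "Sales", "IT", "HR", "Sales", "IT", "Sales"]

counts = Counter(departments)  # number of observations per category
n = len(departments)
proportions = {category: count / n for category, count in counts.items()}

for category in sorted(counts):
    print(f"{category:<6} count={counts[category]}  proportion={proportions[category]:.3f}")
```

Arithmetic such as an average of the labels themselves would be meaningless here, which is why counts and proportions are used instead.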

Cross-Sectional Data and Time Series Data

• Cross-Sectional Data: Data collected from several entities at the same or
approximately the same point in time.
• Time Series Data: Data collected over several time periods. Graphs of time
series data are frequently found in business and economic publications. Such
graphs help analysts understand what happened in the past, identify trends
over time, and project future levels for the time series.
What are the items under the following categories and their definitions?

1) Creating distributions from data - Distributions help summarize many
characteristics of a data set by describing how often certain values for a variable
appear in a data set. Distributions can be created for both categorical and
quantitative data, and they assist the analyst in gauging variation.

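a) Frequency Distributions for Categorical Data - It is often useful to create a
frequency distribution for a data set. A frequency distribution shows the number
(frequency) of items in each of several nonoverlapping bins.
b) Relative Frequency and Percent Frequency Distributions - A relative frequency
distribution shows the proportion of items in each bin, that is, the bin's frequency
divided by the total number of items. A percent frequency distribution expresses the
same proportion as a percentage (relative frequency multiplied by 100).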
c) Frequency Distributions for Quantitative Data - We can also create frequency
distributions for quantitative data, but we must be more careful in defining the
nonoverlapping bins to be used in the frequency distribution.
d) Histograms - A common graphical presentation of quantitative data is a
histogram. This graphical summary can be prepared for data previously
summarized in either a frequency, a relative frequency, or a percent frequency
distribution.
e) Cumulative Distributions - A variation of the frequency distribution that
provides another tabular summary of quantitative data is the cumulative
frequency distribution, which uses the number of classes, class widths, and
class limits developed for the frequency distribution.
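The distributions described in items a) to e) can be built by hand. The sketch below is one possible way to do it, assuming whole-number data and equal-width bins; the data values, the bin settings, and the function name `frequency_table` are all illustrative assumptions. The row of `#` marks at the end of each line is a rough text version of a histogram.

```python
# Hypothetical quantitative data: 12 made-up completion times in days.
data = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27, 22, 23]

# Assumed nonoverlapping bins of width 5: 10-14, 15-19, 20-24, 25-29.
start, width, n_bins = 10, 5, 4

def frequency_table(values, start, width, n_bins):
    """Return rows of (bin label, frequency, relative, percent, cumulative).

    Assumes integer data so that each value falls in exactly one bin.
    """
    freqs = [0] * n_bins
    for v in values:
        index = (v - start) // width  # which bin v falls in
        if not 0 <= index < n_bins:
            raise ValueError(f"{v} falls outside the bins")
        freqs[index] += 1

    n = len(values)
    rows, running = [], 0
    for i, f in enumerate(freqs):
        low = start + i * width
        running += f  # cumulative frequency up to and including this bin
        rows.append((f"{low}-{low + width - 1}", f, f / n, 100 * f / n, running))
    return rows

table = frequency_table(data, start, width, n_bins)
for label, f, rel, pct, cum in table:
    print(f"{label:<6} freq={f:>2} rel={rel:.3f} pct={pct:5.1f}% cum={cum:>2} {'#' * f}")
```

Choosing different bin widths changes the shape of the table and the histogram, which is why the bins for quantitative data need to be defined with care.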

2) Measures of Location and their definitions

a) Mean (Arithmetic Mean) - The most commonly used measure of location is the
mean (arithmetic mean), or average value, for a variable. The mean provides a
measure of central location for the data. If the data are for a sample (typically
the case), the mean is denoted by x̄ (x-bar). The sample mean is a point estimate of
the (typically unknown) population mean for the variable of interest. If the
data for the entire population are available, the population mean is computed
in the same manner, but denoted by the Greek letter μ (mu).
b) Median - The median, another measure of central location, is the value in the
middle when the data are arranged in ascending order (smallest to largest
value). With an odd number of observations, the median is the middle value.
An even number of observations has no single middle value. In this case, we
follow convention and define the median as the average of the values for the
middle two observations.
c) Mode - A third measure of location, the mode, is the value that occurs most
frequently in a data set. If two or more values tie for the highest frequency, the
data set has more than one mode.
d) Geometric Mean - The geometric mean is a measure of location that is
calculated by finding the nth root of the product of n values.
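The four measures of location can be checked quickly with Python's built-in `statistics` module. The class sizes and growth factors below are made-up numbers used only for illustration.

```python
import statistics

# Hypothetical sample: five made-up class sizes.
class_sizes = [46, 54, 42, 46, 32]

mean = statistics.mean(class_sizes)      # sum of the values divided by n
median = statistics.median(class_sizes)  # middle value of the sorted data
mode = statistics.mode(class_sizes)      # most frequent value

# With an even number of observations, the median is the average
# of the two middle values.
even_median = statistics.median([1, 2, 3, 4])

# Geometric mean: nth root of the product of n values. Shown here for
# hypothetical yearly growth factors (1.10 means +10 percent).
growth_factors = [1.10, 0.95, 1.20]
geo_mean = statistics.geometric_mean(growth_factors)

print(mean, median, mode, even_median, round(geo_mean, 4))
```

The geometric mean is the usual choice for averaging growth rates, because the growth factors multiply rather than add over time.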

3) Measures of Variability and their definitions - In addition to measures of location, it
is often desirable to consider measures of variability, or dispersion.

a) Range - The simplest measure of variability is the range. The range can be
found by subtracting the smallest value from the largest value in a data set.
b) Variance - The variance is a measure of variability that utilizes all the data. The
variance is based on the deviation about the mean, which is the difference
between the value of each observation (xi) and the mean.
c) Standard Deviation - The standard deviation is defined to be the positive
square root of the variance. We use s to denote the sample standard deviation
and σ (sigma) to denote the population standard deviation.
d) Coefficient of Variation - In some situations we may be interested in a
descriptive statistic that indicates how large the standard deviation is relative
to the mean. This measure is called the coefficient of variation and is usually
expressed as a percentage.
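The measures of variability can be sketched as follows, using made-up data. One detail not spelled out above is assumed here: the sample variance divides the sum of squared deviations by n - 1, while the population variance divides by n. Python's `statistics.variance` and `statistics.pvariance` follow that convention.

```python
import statistics

# Hypothetical sample of five made-up values.
data = [46, 54, 42, 46, 32]

data_range = max(data) - min(data)       # largest value minus smallest value
sample_var = statistics.variance(data)   # squared deviations summed, divided by n - 1
pop_var = statistics.pvariance(data)     # same sum, divided by n
sample_sd = statistics.stdev(data)       # positive square root of the sample variance

# Coefficient of variation: standard deviation relative to the mean, in percent.
cv = sample_sd / statistics.mean(data) * 100

print(f"range={data_range}  s^2={sample_var}  sigma^2={pop_var}")
print(f"s={sample_sd:.2f}  CV={cv:.1f}%")
```

Because the coefficient of variation is a ratio, it can be used to compare the variability of variables measured in different units.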

4) Analyzing Distributions and their definitions - Distributions are very useful for
interpreting and analyzing data. A distribution describes the overall variability of
the observed values of a variable.

a) Percentiles - A percentile is the value of a variable at which a specified
(approximate) percentage of observations are below that value. The pth
percentile tells us the point in the data where approximately p percent of the
observations have values less than the pth percentile; hence, approximately
(100 – p) percent of the observations have values greater than the pth
percentile.
b) Quartiles - It is often desirable to divide data into four parts, with each part
containing approximately one-fourth, or 25 percent, of the observations. These
division points are referred to as the quartiles and are defined as:
Q1 = first quartile, or 25th percentile
Q2 = second quartile, or 50th percentile (also the median)
Q3 = third quartile, or 75th percentile.
c) Z-Scores - A z-score allows us to measure the relative location of a value in the
data set. More specifically, a z-score helps us determine how far a particular
value is from the mean relative to the data set’s standard deviation.
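d) Empirical Rule - When the data have a bell-shaped distribution, the empirical rule
can be used to determine the percentage of data values that are within a specified
number of standard deviations of the mean: approximately 68 percent within one
standard deviation, 95 percent within two, and almost all within three.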
e) Identifying Outliers - Sometimes a data set will have one or more observations
with unusually large or unusually small values. These extreme values are called
outliers. Experienced statisticians take steps to identify outliers and then
review each one carefully.
f) Box Plots - A box plot is a graphical summary of the distribution of data. A box
plot is developed from the quartiles for a data set.
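The sketch below puts several of these tools together on made-up data. Two choices in it are assumptions rather than anything stated above: percentiles are located at position p/100 × (n + 1) in the sorted data (the default method of Python's `statistics.quantiles`), and outliers are flagged with the common box-plot convention of 1.5 times the interquartile range beyond Q1 or Q3. Other conventions exist, such as flagging z-scores below -3 or above +3.

```python
import statistics

# Hypothetical sample of 11 made-up values; 60 is unusually large.
data = [12, 15, 17, 18, 20, 21, 23, 24, 26, 28, 60]

def percentile(values, p):
    """p-th percentile (p = 1..99), located at position p/100 * (n + 1)."""
    return statistics.quantiles(values, n=100)[p - 1]

# Quartiles are the 25th, 50th, and 75th percentiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # interquartile range, the width of the box in a box plot

# z-score: distance from the mean in units of standard deviations.
mean = statistics.mean(data)
sd = statistics.stdev(data)
z_scores = [(x - mean) / sd for x in data]

# Assumed outlier rule: more than 1.5 * IQR below Q1 or above Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1={q1} Q2={q2} Q3={q3} IQR={iqr}")
print(f"z-score of largest value: {z_scores[-1]:.2f}")
print(f"outliers: {outliers}")
```

In this example the largest value is flagged by the interquartile range rule even though its z-score is below 3, which shows why a flagged value should be reviewed rather than removed automatically.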

5) Measures of Association Between Two Variables and their definitions

a) Scatter Charts - A scatter chart is a useful graph for analyzing the relationship
between two variables.
b) Covariance - Covariance is a descriptive measure of the linear association
between two variables.
c) Correlation Coefficient - The correlation coefficient measures the strength of the
linear relationship between two variables. It ranges from -1 to +1 and, unlike
covariance, is not affected by the units of measurement for x and y.
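To see the difference between covariance and correlation, the sketch below computes both by hand for made-up paired data and then repeats the calculation after changing the units of y. The function names and the data are illustrative assumptions; the sample versions of the formulas (dividing by n - 1) are used.

```python
import math

def sample_covariance(x, y):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance divided by the product of the two sample standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))  # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_scaled = [value * 100 for value in y]  # same data in units 100 times smaller

print(f"covariance:            {sample_covariance(x, y):.2f}")
print(f"covariance (rescaled): {sample_covariance(x, y_scaled):.2f}")
print(f"correlation:           {correlation(x, y):.4f}")
print(f"correlation (rescaled): {correlation(x, y_scaled):.4f}")
```

The covariance grows by a factor of 100 when the units change, while the correlation coefficient stays the same.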
