
EXECUTIVE SUMMARY

This report investigates the statistical methods used in SSI's business planning
processes for managing quality, inventory, and capacity, with SSI Securities
Corporation serving as the case study. Several kinds of visual representation are
also examined and applied to a dataset. Management statistics are critical when
analyzing raw business data, and it is my sincere hope that after reading this
report the reader will have a much better understanding of why this is the case.
I. INTRODUCTION
SSI Securities Corporation (SSI - HOSE) began doing business in December 1999 as the
first privately licensed securities firm and the smallest one operating on the market. After
21 years of operating in Vietnam's financial market, the Company's charter capital has
expanded by a factor of more than 1,500, establishing it as a major financial institution
with the highest growth rate and one of the leading securities firms in Vietnam.
In addition to trading in securities and derivatives, this company also provides various other
services, including brokerage, underwriting for the issuance of securities, custodian services,
financing, investment counseling, and margin lending. Additionally, it provides services
associated with trading futures, warrants, bonds, and shares. The establishment of trading
accounts, the transfer of funds, the auctioning of securities, and the conduct of electronic
trading are all made easier by the presence of this organization. In addition to offering
services related to portfolio management, it also offers advisory services for the stock and
debt capital markets, mergers and acquisitions, and corporate finance.
The report is divided into two major sections. The first section applies a variety of
statistical techniques used in business planning for quality, inventory, and capacity
management. The second section applies different visual representations to the dataset,
using frequency distribution tables, graphs, and charts to explain and evaluate the data
precisely.
II. MAIN CONTENTS
1. Apply a range of statistical methods used in business planning for quality/inventory/capacity management
1.1. Statistical methods used to measure the variability in business processes or quality
management
1.1.1. Range
Definition: The range is the gap between the lowest and highest values in a data set.
Formula: Range = Maximum value − Minimum value
Example: In the set {4, 6, 9, 3, 7}, the minimum value is 3 and the maximum value is 9,
hence the range is 9 − 3 = 6.
1.1.2 Standard deviation
Definition: The standard deviation (σ) provides a numerical indication of the extent to
which values deviate from the mean. When the standard deviation is small, the data points
cluster near the mean; when it is large, the data are more dispersed. A standard deviation
near zero means the data points lie relatively close to the mean.
Example: Consider the following data: 2, 1, 3, 4, 2. The mean of the observations is 2.4
and the sum of squared deviations from the mean is 5.2. The standard deviation therefore
equals √(5.2/5) ≈ 1.02.
1.1.3 Variance
Definition: Variance quantifies the degree of dispersion in a data set. The variance is
zero only if all the data points are identical, and it is never negative. When the variance
is low, the data points cluster near the mean and one another; when it is high, they are
spread far apart. The variance is the mean squared deviation of the data points from their
mean.
Example:
Let's determine the variance for the following data set: 2, 7, 3, 12, 9.
Calculating the mean is the first step. There are 5 data points and the sum is 33. Therefore,
33/5 = 6.6 is the mean. Then, you remove the mean from each value in the data set and square
the resulting difference. For the first value, for example:
(2 − 6.6)² = 21.16
The squared differences are added for each value:
21.16 + 0.16 + 12.96 + 29.16 + 5.76 = 69.20
The total is then divided by the number of observations:
69.20 ÷ 5 = 13.84
The variance is therefore 13.84.
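These calculations can be reproduced in a few lines of Python. The sketch below is only illustrative (the function name is mine, not from the report) and uses the population (divide-by-n) formulas applied in the examples above:

def descriptive_stats(data):
    n = len(data)
    mean = sum(data) / n
    # Population variance: the mean squared deviation from the mean.
    variance = sum((x - mean) ** 2 for x in data) / n
    # Return range, variance, and population standard deviation.
    return max(data) - min(data), variance, variance ** 0.5

rng, var, sd = descriptive_stats([2, 7, 3, 12, 9])
print(rng, var, sd)        # 10 13.84 ~3.72

_, _, sd2 = descriptive_stats([2, 1, 3, 4, 2])
print(round(sd2, 2))       # ~1.02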
1.2. Probability distributions and application to business operations and processes (P4)
1.2.1. Definition
Probability distributions are statistical functions that characterize all possible values of a
random variable within a given interval and their probabilities. The feasible values lie
between the minimum and maximum, but where a given value falls on the probability
distribution depends on a variety of factors, which are measured by the mean, standard
deviation, skewness, and kurtosis of the distribution (Andrea, Alberto, Luca, 2015).
1.2.2. Types of Probability distributions and their applications
1.2.2.1. Uniform Distribution
Definition: A uniform distribution is a type of probability distribution in statistics in which
every possible result has the same chance of happening. There is an equal chance of picking a
heart, club, diamond, or spade from a standard deck of playing cards. The chances of getting
a head or a tail when you toss a coin are the same, giving the coin a uniform distribution
(James, 2021).
Example:
A basic playing card deck contains 52 cards in four suits: Hearts, Diamonds, Clubs, and
Spades. Each suit holds the A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, and K, and most decks also
include 2 jokers. For the sake of simplicity, we will ignore the jokers and face cards and
only look at the four copies of each card, ace through ten, in the deck. This leaves us with
a set of forty cards, which is a discrete dataset.
Let's say you are interested in the chance of drawing the 2 of hearts from this new deck.
There is a 2.5% chance, or 1 in 40, that you will draw it. Because every card in the
deck is distinct, the odds of picking any one card are the same.
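This claim is easy to check by simulation. A minimal Python sketch (the card-to-index mapping is an assumption for illustration):

import numpy as np

gen = np.random.default_rng(0)
deck = np.arange(40)                     # 40 distinct cards, ace through ten in four suits
draws = gen.choice(deck, size=100_000)   # repeated draws with replacement

# Empirical probability of one specific card, say index 0 = the 2 of hearts.
print((draws == 0).mean())               # close to 1/40 = 0.025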
1.2.2.2. Normal Distribution
Definition: Data near the mean are more frequent occurrences than data distant from the
mean, as shown by the normal distribution, also known as the Gaussian distribution, which is
a probability distribution that is symmetric around the mean (James, 2022).
Example:
The normal distribution can also be illustrated with a fair game of dice. Experiments show
that the proportion of rolls showing a "1" after rolling a die 100 times typically falls
between 15% and 18%, and that this proportion settles at 16.7% (1/6) after rolling the die
1,000 times. When two dice are rolled at the same time, there are 36 possible outcomes;
since six of those combinations produce a total of seven, the probability of rolling a seven
is roughly 16.7% (6/36). Throwing additional dice produces a sum whose distribution looks
ever more like the bell-shaped normal curve.
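The bell shape can be seen numerically by simulating sums of dice; a brief numpy sketch (the parameters are mine, for illustration):

import numpy as np

gen = np.random.default_rng(0)

# Sum k fair dice many times; by the central limit theorem the
# distribution of the sum approaches a normal shape as k grows.
for k in (1, 2, 10):
    sums = gen.integers(1, 7, size=(100_000, k)).sum(axis=1)
    print(k, round(sums.mean(), 2), round(sums.std(), 2))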
1.2.2.3. Poisson Distribution
Definition: In statistics, a Poisson distribution is a probability distribution that
represents the number of times an event occurs; in other words, it is a distribution over
counts. Poisson distributions are widely used to model independent events that occur at a
constant average rate over a given interval of time (Shaun, 2022).
Examples
In one of the early applications of the Poisson distribution, the statistician Ladislaus
Bortkiewicz played a pivotal role. In the late 1800s he investigated deaths from inadvertent
horse kicks among Prussian soldiers. The data he analyzed spanned twenty years for
ten separate army corps, the equivalent of two centuries of observation of a single corps.

A histogram of hypothetical data in line with what Bortkiewicz observed shows that horse
kicks caused the deaths of an average of 0.61 men per corps per year. In most years,
however, no soldiers were killed by horse kicks; at the other extreme, four men in the
same corps were killed by horse kicks in a single dreadful year.
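Under a Poisson model with this rate, those observations are straightforward to verify; a short scipy sketch:

from scipy import stats

# Bortkiewicz's horse-kick data: on average 0.61 deaths per corps per year.
deaths = stats.poisson(mu=0.61)

print(deaths.pmf(0))   # ~0.543: most corps-years see no deaths
print(deaths.pmf(4))   # ~0.003: four deaths in one year is very rare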

1.2.2.4. Exponential Distribution


Definition: The exponential distribution is a continuous probability distribution used
extensively in probability theory and statistics, and it is often related to the amount of
time until a certain event occurs. It describes processes in which independent events take
place at a steady, consistent rate. A unique feature of the exponential distribution is that
it is memoryless. Exponentially distributed random variables have a large proportion of
very small values and a small proportion of very large values. An exponential distribution
describes, for instance, how much money a customer spends on a single trip to the grocery
store (Rather & Rather, 2017).
Examples: Let's imagine that, according to a survey, the typical amount of time that
a person spends having a conversation on a public telephone is about fifteen minutes. In this
circumstance, the exponential distribution function can be utilized to ascertain the possibility
that the person in front of you will wrap up their chat in fewer than ten minutes.
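With a mean of fifteen minutes, this probability can be computed directly; a minimal scipy sketch:

from scipy import stats

# Mean call length of 15 minutes -> exponential with scale = 15.
call_time = stats.expon(scale=15)

# Probability that the current call ends within 10 minutes:
print(call_time.cdf(10))   # 1 - exp(-10/15) ~ 0.487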
1.2.2.5. Statistical Quality Control
Definition: Statistical process control (SPC) is a form of quality control that uses
statistical approaches for process monitoring and control. It helps the process run
smoothly, producing more products that meet requirements with less waste (rework or scrap).
To put it another way, SPC can be used for any process where the output of "conforming
goods" (items that satisfy standards) can be measured. SPC relies on run charts, control
charts, a focus on continuous improvement, and the design of experiments. Assembly lines
are one application of statistical process control (Muhammad, Prybutok, Abdullah,
Talukder, 2010).
Examples: Statistical quality control enables factories to maximize profitability through
thorough testing of finished goods. Using this method, a manufacturing organization can
study the range of goods with projected values under current conditions. This information
is precisely assessed for many identical products and then shared with the manufacturer and
buyer.
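As an illustration of the control-chart idea at the heart of SPC (a sketch with simulated, assumed data, not taken from the report):

import numpy as np

gen = np.random.default_rng(0)
# Simulated measurements of a product characteristic.
samples = gen.normal(loc=50.0, scale=2.0, size=200)

center = samples.mean()
sigma = samples.std(ddof=1)
ucl, lcl = center + 3 * sigma, center - 3 * sigma   # 3-sigma control limits

flagged = (samples > ucl) | (samples < lcl)         # points signaling trouble
print(f"UCL={ucl:.2f}, LCL={lcl:.2f}, flagged={flagged.sum()}")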
1.3. Inferential statistics
1.3.1. Definitions
Inferential statistics is the statistical subfield that uses analytical tools to draw
conclusions about a population as a whole from data collected by random sampling. With
inferential statistics, we can extrapolate findings from a sample to the entire population.
An inferential statistician uses a sample statistic (e.g., the sample mean) to infer the
value of a population parameter (e.g., the population mean) (Zhang, Wang, Cai, Zhao,
2018).
1.3.2. Sampling methods
In statistics, a sampling method is a procedure for studying a population by collecting
and examining data from a selected subset of it (Taherdoost, 2016).
a. Random sampling
Definition: The term "random sample" refers to a selection of individuals from a
larger population that is made at random, so that all members of the population have the
same chance of being sampled (Mitra, Pathak, 1984).
Example: To illustrate a straightforward random sampling, consider the following
scenario: a corporation with 250 workers chooses 25 of those workers at random from a hat
containing their names. In this particular scenario, the sample is representative of random
selection because each of the 250 employees stands an equal chance of being chosen.
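That scenario is a one-liner in Python; a minimal sketch (the names are placeholders):

import random

employees = [f"employee_{i}" for i in range(1, 251)]   # the 250 workers
sample = random.sample(employees, k=25)                # each name has an equal chance
print(sample[:5])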
b. Stratified sampling
Definition: Stratified sampling is a method for drawing a statistically valid sample
from a population that has been partitioned into subpopulations based on shared traits (strata).
Stratified sampling is a method used by researchers to ensure the inclusion of all relevant
demographic categories. In addition, it aids in the estimate of group characteristics. This
method is used in many surveys to better understand differences between segments of the
population (Shen, 2015).
Example: Suppose we are comparing test scores across income levels to evaluate
standardized testing. We can stratify by household income, so that students from each
income bracket are represented in proportion to the state's population.

We need a random sample for objective population estimates. With simple random sampling,
however, income levels containing few pupils could end up with very small sample sizes by
chance, and smaller samples give imprecise estimates.
To avoid this, we use stratified sampling: we could sample 100 students from each income
bracket. This technique presupposes that we know each student's income, which can be
difficult in practice; a sketch of the idea follows.
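A minimal Python sketch of stratified sampling (the records and bracket labels are assumptions for illustration):

import random
from collections import defaultdict

# Hypothetical student records: (student id, income bracket).
students = [(f"s{i}", random.choice(["low", "middle", "high"])) for i in range(3000)]

by_bracket = defaultdict(list)
for sid, bracket in students:
    by_bracket[bracket].append(sid)

# Stratified sample: a fixed number of students from every bracket.
sample = {b: random.sample(ids, k=100) for b, ids in by_bracket.items()}
print({b: len(s) for b, s in sample.items()})   # 100 per stratum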
c. Cluster sampling
Definition: Cluster sampling is a type of probability sampling that is used for
research purposes. In this method, the population is segmented into several separate groups or
clusters. The researchers then select groups at random using a process that is either
fundamentally random or systematic random sampling. This allows them to collect and
analyze data more effectively (Simkus, 2022).
Example: Let's say a corporation is interested in studying the efficacy of cell phones in
Germany. It can group everyone in the country by city (clusters), select the cities with
the highest populations, and then narrow the list down even further within the selected
clusters.
d. Systematic sampling techniques
Definition: Researchers utilize the statistical method of systematic sampling to zero
in on a specific subset of the population of interest. If we know how many people we want to
sample from the population, we can determine the sampling interval. To establish a sample, a
systematic sampler selects a predetermined number of individuals at regular intervals,
expanding on the concept of probability sampling (Mostafa, Ahmad, 2017).
Example: To illustrate systematic sampling, let's pretend a statistician takes a sample
of one person out of every one hundred in a population of 10,000. Similarly, it is possible to
methodically establish the sampling intervals; for example, a fresh sample could be selected
once every 12 hours.
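Selecting every k-th individual is a single slice in Python; a brief sketch (the random start is an assumed detail):

population = list(range(10_000))   # 10,000 individuals
k = 100                            # sampling interval: one in every hundred
start = 7                          # random start within the first interval

sample = population[start::k]
print(len(sample))                 # 100 evenly spaced individuals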
1.3.3. Inferential statistics illustrating the differences between population and sample
based on different sampling techniques and methods (P4)
1.3.3.1. One sample T-test: Estimation and Hypotheses testing
Definition: The one-sample t-test is a statistical hypothesis test applied to data to
evaluate whether an unknown population mean differs from a given value (Parthiban,
Gajivaradhan, 2016).
Formula:
t = (x̄ − μ0) / (s / √n)
μ0 = the test value (the proposed constant for the population mean)
x̄ = sample mean
n = sample size (i.e., number of observations)
s = sample standard deviation
Examples
Let's say we're interested in finding out whether 310 pounds is the typical weight for a
certain species of turtle. The procedure for a one-sample t-test at the α = 0.05
significance level is as follows:
Step 1 is to collect the sample data.
Consider the following data collected from a random sample of turtles:
Sample size: n = 40
Sample mean weight: x̄ = 300
Sample standard deviation: s = 18.5
Step 2 is to specify the hypothesis.
Using the following hypotheses, we will conduct the one-sample t-test:
 H0: μ = 310 (population mean is equal to 310 pounds)
 H1: μ ≠ 310 (population mean is not equal to 310 pounds)

Step 3 involves computing the test statistic t.


t = (x̄ − μ0) / (s/√n) = (300 − 310) / (18.5/√40) = −3.4187
Test statistic t's p-value will be calculated in step 4.
The T Score to P Value Calculator indicates that the p-value associated with t = −3.4187
and degrees of freedom = n − 1 = 40 − 1 = 39 is 0.00149.
Step 5 is to reach a conclusion.
The null hypothesis is rejected since this p-value is smaller than our significance level
α = 0.05. There is adequate evidence to suggest that the average weight of this species of
turtle is not 310 pounds.
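The same test can be run from the summary statistics with scipy; a minimal sketch:

from math import sqrt
from scipy import stats

# Summary statistics from the turtle example above.
n, xbar, s, mu0 = 40, 300.0, 18.5, 310.0

t = (xbar - mu0) / (s / sqrt(n))
p = 2 * stats.t.sf(abs(t), df=n - 1)   # two-sided p-value
print(round(t, 4), round(p, 5))        # -3.4187 0.00149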
1.3.3.2. Two sample T-test: Estimation and Hypotheses testing
Definition: The two-sample t-test, sometimes referred to as the independent-samples
t-test, is a statistical tool that can be used to assess whether the unknown population
means of two groups are equal (Fralick, Zheng, Wang, Tu, Feng, 2017).
Formula
T-statistic:
t = (x̄1 − x̄2) / (sp · √(1/n1 + 1/n2))
x̄1 and x̄2 are the sample means
n1 and n2 are the sample sizes
sp is the pooled standard deviation, calculated as:
• If σ1² and σ2² are known: sp = √[((n1 − 1)σ1² + (n2 − 1)σ2²) / (n1 + n2 − 2)]
• If σ1² and σ2² are unknown: sp = √[((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)],
where s1² and s2² are the sample variances.
Examples
Let's say we're interested in comparing the average weights of two different kinds of
turtles. The following procedure conducts a two-sample t-test at the α = 0.05 significance
level to examine this:
Step 1 is to collect the sample data.
Consider the following information about a hypothetical representative sample of turtles
from each population:
Sample 1: sample size n1 = 40; sample mean weight x̄1 = 300; sample standard deviation s1 = 18.5
Sample 2: sample size n2 = 38; sample mean weight x̄2 = 305; sample standard deviation s2 = 16.7
Step 2 is to specify the hypothesis.
Using the following hypotheses, we will conduct the two-sample t-test:
 H0: μ1 = μ2 (the two population means are equal)
 H1: μ1 ≠ μ2 (the two population means are not equal)
Step 3 involves computing the test statistic t.
First, the pooled standard deviation sp is computed:
sp = √[((40 − 1)·18.5² + (38 − 1)·16.7²) / (40 + 38 − 2)] = 17.647
The next step is to calculate the test statistic t:
t = (x̄1 − x̄2) / (sp · √(1/n1 + 1/n2)) = (300 − 305) / (17.647 · √(1/40 + 1/38)) = −1.2508
Test statistic t's p-value will be calculated in step 4.
The T Score to P Value Calculator indicates that the p-value associated with t = -1.2508 and
degrees of freedom = n1+n2-2 = 40+38-2 = 76 is 0.21484.
Step 5 is to reach a conclusion.
We cannot reject the null hypothesis because this p-value exceeds our significance
threshold α = 0.05. There is insufficient evidence to conclude that the average weights of
turtles in these two populations are different.
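scipy can reproduce this pooled test directly from the summary statistics; a short sketch:

from scipy import stats

# Pooled (equal-variance) two-sample t-test from summary statistics.
res = stats.ttest_ind_from_stats(
    mean1=300.0, std1=18.5, nobs1=40,
    mean2=305.0, std2=16.7, nobs2=38,
    equal_var=True,
)
print(round(res.statistic, 4), round(res.pvalue, 5))   # -1.2508 0.21484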
1.3.3.3. Regression analysis
Simple linear regression
Definition: In statistics, simple linear regression is a method for describing and
assessing relationships between two quantitative (continuous) variables (Kumari, Yadav,
2018).
Formula

Y = β0 + β1x + ε

 y is the predicted value of the dependent variable for any given value of the
independent variable (x).
 β0 is the intercept: the predicted value of y when x is 0.
 β1 is the regression coefficient: how much we expect y to change as x increases.
 x is the independent variable (the variable we expect to influence y).
 ε is the error of the estimate: how much variation there is in our estimate of the
regression coefficient.

Examples
Consider the hypothetical situation where weight was determined solely by height. A
somewhat linear relationship between height (the independent or 'predictor' variable) and
body weight (the dependent or 'outcome' variable) may be shown if we plot these two
variables against one another, as seen in the example below.

This relationship can also be represented by the line equation Y = β0 + β1x, where β0 is
the Y-intercept and β1 is the slope of the line. The formula could be used to estimate a
person's weight given only their height. With an intercept of 80 and a slope of 2, a person
70 inches tall would be expected to weigh:

Weight = 80 + 2 × 70 = 220 lbs.

In this linear regression with a single independent variable, we examine the effect of the
independent variable on the dependent variable. If height were the most important factor in
determining body weight, the individual data points would lie relatively near the line. If
there were additional factors controlling body weight besides height (e.g., age, calorie
consumption, and exercise level), we would expect the points for individual participants to
be more widely dispersed around the line, given that we are only considering height.
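A sketch of fitting such a line with scipy (the data are synthetic, generated to match the example's assumed line Weight = 80 + 2 × Height):

import numpy as np
from scipy import stats

gen = np.random.default_rng(0)
height = gen.uniform(60, 75, size=50)                   # inches
weight = 80 + 2 * height + gen.normal(0, 5, size=50)    # lbs, with noise

fit = stats.linregress(height, weight)
print(fit.intercept, fit.slope)         # close to 80 and 2
print(fit.intercept + fit.slope * 70)   # predicted weight at 70 inches, ~220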
Multiple regression
Definition: When trying to establish a connection between multiple independent variables
and a single dependent variable, multiple linear regression is used (Zsuzsanna, Marian, 2012).
Formula

Y = β0 + β1X1 + β2X2 + … + βiXi

Y: dependent variable
β0: intercept
βi: slope for Xi
Xi: independent variable
Examples
Let's say we're interested in the association between BMI and systolic blood pressure,
adjusting for potential confounders such as age (a continuous variable measured in years),
male gender (yes/no), and hypertension therapy (yes/no) in a multiple linear regression
analysis. Hypertension treatment status is recorded as 1 for yes and 0 for no; males are
represented by the value 1, and females by 0.
Results from a multiple regression analysis show:
Independent Variable       | Regression Coefficient | T     | P-value
Intercept                  | 68.15                  | 26.33 | 0.0001
BMI                        | 0.58                   | 10.30 | 0.0001
Age                        | 0.65                   | 20.22 | 0.0001
Male gender                | 0.94                   | 1.58  | 0.1133
Treatment for hypertension | 6.44                   | 9.74  | 0.0001
The multiple regression model is:
Y = 68.15 + 0.58(BMI) + 0.65(Age) + 0.94(Male gender) + 6.44(Hypertension treatment)
After adjusting for age, gender, and hypertension treatment, the association between BMI
and systolic blood pressure becomes weaker (0.58 as opposed to 0.67 in the unadjusted
model). After correction, the link between BMI and systolic blood pressure remains
statistically significant (p = 0.0001), although the magnitude of the association is
reduced: the regression coefficient decreases by about 13%.
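Such a model can be fitted with statsmodels; a sketch with synthetic (assumed) data mirroring the example's variables:

import numpy as np
import statsmodels.api as sm

gen = np.random.default_rng(0)
n = 200
bmi = gen.normal(27, 4, n)
age = gen.uniform(30, 70, n)
male = gen.integers(0, 2, n)
treated = gen.integers(0, 2, n)
sbp = (68.15 + 0.58 * bmi + 0.65 * age + 0.94 * male
       + 6.44 * treated + gen.normal(0, 8, n))

X = sm.add_constant(np.column_stack([bmi, age, male, treated]))
model = sm.OLS(sbp, X).fit()
print(model.params)   # estimates close to the assumed coefficients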
1.4. Applications of mean comparison to dataset
1.4.1. One sample t-test
2. Different types of visual representations
2.1. Types of visual representations for effective communication (P5)
2.1.1. Data table
Definitions: In the sciences, a data table serves as a common graphic organizer. The
technique is frequently used in qualitative and/or quantitative research laboratory studies. Not
all data tables have the same structure, but typically there are at least two rows or columns
and data is entered into each one.
Example

2.1.2. Frequency tables


Definitions: A frequency table shows the distribution of data points across the values or
combinations of values of one or more variables. By analyzing frequency tables, you can
determine which options appear more or less often in the data. This may help you learn
more about each variable and determine whether any of them require recoding. Because it
merely provides the counts for each possible value of a variable, creating a frequency
table necessitates no extra calculations (Gries, 2014).
Example

2.1.3. Pie charts


Definition: Pie charts are a specialized type of graph that display data in a circular
shape. This data is represented graphically by a pie chart, where the size of each portion of
the pie corresponds to a distinct component of the data. It is vital to have a set of categories
and statistics to work with while building a pie chart. In this context, "pie" represents the
total, while "slices" represent the separate components (Kozak, Hartley, Tartanus, Wnuk,
2015).
Example

2.1.4. Histograms
Definitions: A histogram, which displays the data as a sequence of vertical bars, one
for each numerical range, is one approach to viewing numerical data (Kaplan, Gabrosek,
Curtiss, Malone, 2014).
Example

2.2. Application of different types of visual representations to the dataset


This section demonstrates the application of visual representations to the dataset using
MS Excel.
2.2.1. Data table
Number of males and females who took part in the questionnaire “Factors affecting
sustainable consumption intention in Vietnamese”:
Male | Female
228  | 323

2.2.2. Frequency table


The frequency of each attitude variable following the 5-point Likert scale in the
questionnaire “Factors affecting sustainable consumption intention in Vietnamese”:

Attitude (ATT)                                                                          | Strongly disagree | Disagree | Neutral | Agree | Strongly agree
I care about the quality of the environment where I live (e.g. water, clean air, land, forests, etc.). | 15 | 7 | 65  | 252 | 212
I support environmentally friendly products (recycled materials, green label products). | 15 | 6 | 60  | 234 | 236
I am willing to reuse plastic, bottles, and paper items.                                | 9  | 6 | 106 | 255 | 175
I support increased use of renewable energy sources.                                    | 12 | 2 | 58  | 260 | 219
I am willing to participate in programs that promote sustainable consumption.           | 11 | 7 | 125 | 266 | 142

2.2.3. Pie chart

[Pie chart] Percentages answered on a 5-point Likert scale for the variable “I care about
the quality of the environment where I live (e.g. water, clean air, land, forests, etc.)”:
strongly disagree 3%, disagree 1%, neutral 12%, agree 46%, strongly agree 38%.
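The same chart can be rebuilt outside Excel; a minimal matplotlib sketch using the counts from the frequency table above:

import matplotlib.pyplot as plt

labels = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
counts = [15, 7, 65, 252, 212]   # responses to the first attitude item

plt.pie(counts, labels=labels, autopct="%1.0f%%")
plt.title("I care about the quality of the environment where I live")
plt.show()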

2.2.4. Histogram
[Column chart] The frequency of each attitude variable (ATT1 to ATT5) following the
5-point Likert scale; the horizontal axis shows the rating categories (strongly disagree
to strongly agree) and the vertical axis the frequency (0 to 300).

III. Conclusion
In my role as an analytical researcher, I have analyzed SSI data and compiled
statistics to give the company a fuller picture of its performance. Business
planning for quality, inventory, and capacity management is explored in detail,
as are the many statistical approaches used in business planning, so that
legitimate recommendations and assessments can be made to improve corporate
planning through statistical methods. This has helped me better understand the
value of statistics in management. The study uses several types of statistical
analysis to assess raw business data, and the findings are displayed graphically.
