You are on page 1of 78

STATISTICS &

PROBABILITY
Lecture #1
RANDOM VARIABLES AND PROBABILITY
DISTRIBUTION

JANICE A. BULAO
Nambaran Agro-Industrial National High School
Quote of the day..

"I avoid looking forward or backward, and try


to keep looking upward."

 CHARLOTTE BRONTE, an English novelist and poet

CS 40003: Data Analytics 2


Today’s discussion…
Probability vs. Statistics
Concept of random variable
Probability distribution concept
Discrete probability distribution
Mean and variance of Probability Distribution

CS 40003: Data Analytics 3


CS 40003: Data Analytics 4
Word hunt
There are 30 vocabularies related to our lesson today
which is Statistics & probability found in the puzzle.

CS 40003: Data Analytics 5


Real Life examples
1. What is the probability of getting infected to Covid-19
if there 1 infected in the classroom having 25 people?

2. Recall all your first and second quarter grades in all
the different subjects, what is your assurance and your
probability that you will not be promoted to Grade 12?

CS 40003: Data Analytics 6


Probability and Statistics
Probability is the chance of an outcome in an experiment (also called event).

Event: Tossing a fair coin


Outcome: Head, Tail

Statistics involves the analysis of the


Probability deals with predicting the
frequency of past events
likelihood of future events.

Example: Consider there is a drawer containing 100 socks: 30 red, 20 blue and
50 black socks.
We can use probability to answer questions about the selection of a
random sample of these socks.
 PQ1. What is the probability that we draw two blue socks or two red socks from
the drawer?
 PQ2. What is the probability that we pull out three socks or have matching pair?
 PQ3. What is the probability that we draw five socks and they are all black?

CS 40003: Data Analytics 7


Statistics
Instead, if we have no knowledge about the type of socks in the drawers, then
we enter into the realm of statistics. Statistics helps us to infer properties about
the population on the basis of the random sample.

Questions that would be statistical in nature are:

 SQ1: A random sample of 10 socks from the drawer produced one blue, four red, five
black socks. What is the total population of black, blue or red socks in the drawer?
 SQ2: We randomly sample 10 socks, and write down the number of black socks and
then return the socks to the drawer. The process is done for five times. The mean
number of socks for each of these trial is 7. What is the true number of black socks in
the drawer?
 etc.

CS 40003: Data Analytics 8


Recall that the probability of a simple event describes the chance of a single event occurring because the event can only
happen in only one way. On the other hand, the probability of compound events describes the chances of more than
one simple event.

In general, when all the outcomes are equally likely, the
probability of a particular event occurs is
 Probability of an event = Number of outcomes in which the event occurs
total number of possible outcomes

CS 40003: Data Analytics 9


Example: Suppose a fair coin is tossed, there are two possible
outcomes which are head and tail. It is assumed that the
outcomes are equally likely because the coin is fair. Therefore,
the probability of tossing a head is

𝑷𝒓𝒐𝒃𝒂𝒃𝒊𝒍𝒊𝒕𝒚 𝒐𝒇 𝒕𝒐𝒔𝒔𝒊𝒏𝒈 𝒂 𝒉𝒆𝒂𝒅 = 𝑵𝒖𝒎𝒃𝒆𝒓 𝒐𝒇 𝒐𝒖𝒕𝒄𝒐𝒎𝒆𝒔 𝒘𝒉𝒊𝒄𝒉 𝒂𝒓𝒆 𝑯𝒆𝒂𝒅


𝑻𝒐𝒕𝒂𝒍 𝒏𝒖𝒎𝒃𝒆𝒓 𝒐𝒇 𝒐𝒖𝒕𝒄𝒐𝒎𝒆𝒔
𝑷(𝑯𝒆𝒂𝒅) = 𝟏 𝟐 𝒐𝒓 𝟎. 𝟓 𝒐𝒓 𝟓𝟎%

CS 40003: Data Analytics 10


When outcomes are not equally likely, the relative
frequency of historic data is used and the
probability of an event occurs is
𝑷𝒓𝒐𝒃𝒂𝒃𝒊𝒍𝒊𝒕𝒚 𝒐𝒇 𝒂𝒏 𝑬𝒗𝒆𝒏𝒕 = 𝑵𝒖𝒎𝒃𝒆𝒓 𝒐𝒇 𝒕𝒊𝒎𝒆𝒔 𝒊𝒏 𝒘𝒉𝒊𝒄𝒉 𝒕𝒉𝒆 𝑬𝒗𝒆𝒏𝒕 𝒐𝒇 𝒊𝒏𝒕𝒆𝒓𝒆𝒔𝒕 𝒐𝒄𝒄𝒖𝒓𝒆𝒅
𝑻𝒐𝒕𝒂𝒍 𝒏𝒖𝒎𝒃𝒆𝒓 𝒐𝒇 𝒕𝒊𝒎𝒆𝒔
 Example: Suppose that one wants to calculate the probability that an
electronic chip produced by a machine is defective. If records show that
out of 8000 electronic chips already produced by the machine only 80
were defective then an estimate of the probability of a defective chip is

𝑷𝒓𝒐𝒃𝒂𝒃𝒊𝒍𝒊𝒕𝒚 𝒐𝒇 𝒂 𝒅𝒆𝒇𝒆𝒄𝒕𝒊𝒗𝒆 𝒄𝒉𝒊𝒑 = 𝑵𝒖𝒎𝒃𝒆𝒓 𝒐𝒇 𝒅𝒆𝒇𝒆𝒄𝒕𝒊𝒗𝒆 𝒄𝒉𝒊𝒑𝒔 𝒑𝒓𝒐𝒅𝒖𝒄𝒆𝒅


𝑻𝒐𝒕𝒂𝒍 𝒏𝒖𝒎𝒃𝒆𝒓 𝒐𝒇 𝒄𝒉𝒊𝒑𝒔 𝒑𝒓𝒐𝒅𝒖𝒄𝒆𝒅 .
𝑷(𝑫𝒆𝒇𝒆𝒄𝒕𝒊𝒗𝒆) = 𝟖𝟎 𝒐𝒓 𝟎. 𝟎𝟏 𝒐𝒓 𝟏%
𝟖𝟎𝟎𝟎

CS 40003: Data Analytics 11


Defining Random Variable
Definition 4.1: Random Variable
A random variable is a rule that assigns a numerical value to an outcome of
interest.
-It is the outcome of the probability experiment

Values (𝒙) of
Random Variable (𝑿)
Random Variable 𝑿
The number of boys in a family of 3 children 0, 1, 2, or 3
The temperature in Baguio City on a particular day From 8.5 oC to 29 oC
The percentage of SHS students infected by COVID
From 0 % to 100 %
19 in CAR
The age of a child selected at random From 1 to 17 years
old
The number of phone calls received by a food
0, 1, 2, 3, …, 𝑛
delivery service crew in a day
CS 40003: Data Analytics 12
Activity: Say yes if it is a random variable and
no, if it’s not.
Situation Answer
1. The color of the next car to go past
my house
2. The height of a school building
selected at random
3. The number of attempts a student
needs to pass his or her NC II
assessment
4. The brand of COVID-19 vaccines
authorized and recommended by the
Food and Drug Administration (FDA)
of the Philippines
5. The price of a particular face mask

CS 40003: Data Analytics 13


Random Variables
 A random experiment is a process by which we observe
something uncertain. A sample space is a collection of all
possible outcomes of a random experiment. A sample space may
be finite or infinite. Infinite sample spaces may be discrete or
continuous. A random variable is a function defined on a sample
space. It takes specific values and can be thought as a variable
whose value depends on the outcome of an uncertain event.

 We usually denote random variables by capital letters of the
English Alphabet like 𝑋, 𝑌, 𝑜𝑟 𝑍 and its values can be denoted by
its corresponding small letters.

 The word “random” in the term “random variable” simply means
that the value is uncertain.

CS 40003: Data Analytics 14


Illustrative Example:
Random Experiment
Tossing of a coin

Sample Space
𝑆 = {𝐻𝑒𝑎𝑑, 𝑇𝑎𝑖𝑙} or 𝑆 = {𝐻, 𝑇}

Random Variables
(below are two examples from tossing a coin)

Let 𝑋 be the number of heads Let 𝑌 be the number of tails

Values (𝒙) Values (𝒚)


𝑋 = {0,1} 𝑌 = {0,1}

CS 40003: Data Analytics 15


Discrete and Continuous RV
Random variables fall into two distinct types – discrete
and continuous. A discrete random variable can only
take some values within a range whereas a continuous
random variable can take any value within a range. The
following illustrations show the difference between
discrete and continuous variables.

CS 40003: Data Analytics 16


It’s your turn! Stand or Sit!
You stand if you think the statement is Continuous
Random Variable and sit down if it is Discrete.

1 . The speed of car

Continuous RV

CS 40003: Data Analytics 17


2. The number of female
athletes-

Discrete RV

CS 40003: Data Analytics 18


3. Amount of sugar in a cup of coffee

Continuous RV

CS 40003: Data Analytics 19


4. The average amount of electricity
consumed per household

Continuous RV

CS 40003: Data Analytics 20


5. The number of patients arrival per
hour at the clinic.

Discrete RV

CS 40003: Data Analytics 21


Discrete Random Probability Distribution

 Each value of a discrete random variable can be
assigned a probability. By listing each value of the
random variable with its corresponding probability, a
discrete probability distribution is formed.
The probability distribution of a discrete random
variable, 𝑋, is a list of all possible values of 𝑋 can take
and the associated probabilities.

CS 40003: Data Analytics 22


The probability of each value of the discrete random
variable is from 0 to 1. In notation,

The sum of all the probabilities in the probability


distribution is 1. In notation,

The capital Greek letter sigma (Σ) denotes summation.

CS 40003: Data Analytics 23


Because probabilities represent relative frequencies, a discrete
probability distribution can be graphed with a relative frequency
histogram.
Guidelines in Constructing a Discrete Probability
Distribution
Let 𝑋 be a discrete random variable with possible
outcomes 𝑥1,𝑥2, . . . , 𝑥𝑛.
1. Make a frequency distribution for the possible
outcomes
2. 2. Find the sum of the frequencies.
3. Find the probability of each possible outcome by
dividing its frequency by the sum of the frequencies.
4. Check that each probability is from 0 to 1 and that the
sum of all probabilities is 1.
CS 40003: Data Analytics 24
Example: A coin is tossed twice. Let 𝑋 be the number of
heads observed in tossing of the coin. Construct a
probability distribution of the random variable 𝑋.

CS 40003: Data Analytics 25


Frequency distribution
No. of Heads (𝒙) Frequency of Outcome (𝒇)
0 1
1 2
2 1

Probability distribution
𝑿 𝑷(𝒙)
0 1/4
1 2/4 or ½
2 ¼
CS 40003: Data Analytics 26
Computing the probability of each
outcome
1. Sum of frequencies of outcomes is 4.
2. Probability of each outcome

a. Probability of getting no head:


Since there is 1 outcome of getting no head and the sum of
frequencies is 4, then
𝑃(0) = 1/4

b. Probability of getting 1 head:


There are two (2) outcomes of getting 1 head and the sum of
frequencies is 4; thus, the probability of getting 1 head is
𝑃(1) = 2/4 or 1/2.


c. Probability of getting 2 heads:
The outcome of getting 2 heads is 1 and the sum of frequencies is 4; thus, the probability of getting 2 heads is
𝑃(2) = 1/4

CS 40003: Data Analytics 27


Self Assessment:
Write whether the distribution is a probability
distribution or not. State your reason.

𝑿 𝑷(𝒙)
1 0.30
2 0.25
3 0.25
4 0.05
5 0.15

CS 40003: Data Analytics 28


Assignment:
Read and answer your next modules 3,4 & 5. For next
meeting, I will be giving you an assessment.

CS 40003: Data Analytics 29


Thank You
and
God bless!

CS 40003: Data Analytics 30


CS 40003: Data Analytics 31
CS 40003: Data Analytics 32
CS 40003: Data Analytics 33
CS 40003: Data Analytics 34
Example 4.2: In “measles Study”, we define a random variable as the number
of parents in a married couple who have had childhood measles.
This random variable can take values of .
Note:
 Random variable is not exactly the same as the variable defining a data.

 The probability that the random variable takes a given value can be computed
using the rules governing probability.
 For example, the probability that means either mother or father but not both has had
measles is . Symbolically, it is denoted as P(X=1) = 0.32

CS 40003: Data Analytics 35


Probability Distribution
Definition 4.2: Probability distribution
A probability distribution is a definition of probabilities of the values of
random variable.

Example 4.3: Given that is the probability that a person (in the ages between
17 and 35) has had childhood measles. Then the probability distribution is given
by
X Probability

?
0 0.64
1 0.32
2 0.04

CS 40003: Data Analytics 36


Probability Distribution
 In data analytics, the probability distribution is important with which
many statistics making inferences about population can be derived .

 In general, a probability distribution function takes the following form

0.64
Example: Measles Study
0 1 2
0.64 0.32 0.04 0.32
f(x)
0.04

CS 40003: Data Analytics 37


Taxonomy of Probability Distributions
Discrete probability distributions
Binomial distribution
Multinomial distribution
Poisson distribution
Hypergeometric distribution

Continuous probability distributions


Normal distribution
Standard normal distribution
Gamma distribution
Exponential distribution
Chi square distribution
Lognormal distribution
Weibull distribution

CS 40003: Data Analytics 38


Usage of Probability Distribution
 Distribution (discrete/continuous) function is widely used in simulation studies.
 A simulation study uses a computer to simulate a real phenomenon or process as
closely as possible.

 The use of simulation studies can often eliminate the need of costly experiments and
is also often used to study problems where actual experimentation is impossible.

Examples 4.4:
1) A study involving testing the effectiveness of a new drug, the number of cured
patients among all the patients who use such a drug approximately follows a
binomial distribution.

2) Operation of ticketing system in a busy public establishment (e.g., airport), the


arrival of passengers can be simulated using Poisson distribution.

CS 40003: Data Analytics 39


Discrete Probability Distributions

CS 40003: Data Analytics 40


Binomial Distribution
 In many situations, an outcome has only two outcomes: success and failure.
 Such outcome is called dichotomous outcome.
 An experiment when consists of repeated trials, each with dichotomous outcome is called
Bernoulli process. Each trial in it is called a Bernoulli trial.

Example 4.5: Firing bullets to hit a target.


 Suppose, in a Bernoulli process, we define a random variable X number of successes in trials.
 Such a random variable obeys the binomial probability distribution, if the experiment satisfies
the following conditions:
1) The experiment consists of n trials.
2) Each trial results in one of two mutually exclusive outcomes, one labelled a “success” and
the other a “failure”.
3) The probability of a success on a single trial is equal to . The value of remains constant
throughout the experiment.
4) The trials are independent.

CS 40003: Data Analytics 41


Defining Binomial Distribution
Definition 4.3: Binomial distribution

The function for computing the probability for the binomial probability
distribution is given by

for x = 0, 1, 2, …., n
Here, where denotes “the number of success” and denotes the number of
success in trials.

CS 40003: Data Analytics 42


Binomial Distribution
Example 4.6: Measles study
X = having had childhood measles a success
p = 0.2, the probability that a parent had childhood measles
n = 2, here a couple is an experiment and an individual a trial, and the
number of trials is two.

Thus,

CS 40003: Data Analytics 43


Binomial Distribution
Example 4.7: Verify with real-life experiment
Suppose, 10 pairs of random numbers are generated by a computer (Monte-Carlo method)

15 38 68 39 49 54 19 79 38 14

If the value of the digit is 0 or 1, the outcome is “had childhood measles”, otherwise,
(digits 2 to 9), the outcome is “did not”.
For example, in the first pair (i.e., 15), representing a couple and for this couple, x = 1. The
frequency distribution, for this sample is

x 0 1 2
f(x)=P(X=x) 0.7 0.3 0.0

Note: This has close similarity with binomial probability distribution!

CS 40003: Data Analytics 44


The Multinomial Distribution
The binomial experiment becomes a multinomial experiment, if we let each trial has
more than two possible outcome.

Definition 4.4: Multinomial distribution

If a given trial can result in the k outcomes with probabilities then the
probability distribution of the random variables representing the number of
occurrences for in n independent trials is

where =

and

CS 40003: Data Analytics 45


The Hypergeometric Distribution
• Collection of samples with two strategies
• With replacement
• Without replacement
• A necessary condition of the binomial distribution is that all trials are
independent to each other.
• When sample is collected “with replacement”, then each trial in sample collection is
independent.

Example 4.8:
Probability of observing three red cards in 5 draws from an ordinary deck of 52
playing cards.
You draw one card, note the result and then returned to the deck of cards
Reshuffled the deck well before the next drawing is made
• The hypergeometric distribution does not require independence and is based on the
sampling done without replacement.

CS 40003: Data Analytics 46


The Hypergeometric Distribution
 In general, the hypergeometric probability distribution enables us to find the
probability of selecting successes in trials from items.

Properties of Hypergeometric Distribution


• A random sample of size is selected without replacement from items.
• of the items may be classified as success and items are classified as failure.
Let denotes a hypergeometric random variable defining the number of successes.

Definition 4.5: Hypergeometric Probability Distribution

The probability distribution of the hypergeometric random variable , the


number of successes in a random sample of size selected from items of
which are labelled success and labelled as failure is given by

CS 40003: Data Analytics 47


Multivariate Hypergeometric Distribution
The hypergeometric distribution can be extended to treat the case where the N
items can be divided into classes with elements in the first class , … and
elements in the class. We are now interested in the probability that a random
sample of size yields elements from elements from elements from

Definition 4.6: Multivariate Hypergeometric Distribution

If items are partitioned into classes respectively, then the probability


distribution of the random variables , representing the number of
elements selected from in a random sample of size , is

with

CS 40003: Data Analytics 48


The Poisson Distribution
There are some experiments, which involve the occurring of the number of
outcomes during a given time interval (or in a region of space).
Such a process is called Poisson process.

Example 4.9:
Number of clients visiting a ticket selling counter in a metro station.

CS 40003: Data Analytics 49


The Poisson Distribution
Properties of Poisson process
 The number of outcomes in one time interval is independent of the number that occurs
in any other disjoint interval [Poisson process has no memory]
 The probability that a single outcome will occur during a very short interval is
proportional to the length of the time interval and does not depend on the number of
outcomes occurring outside this time interval.
 The probability that more than one outcome will occur in such a short time interval is
negligible.

Definition 4.7: Poisson distribution

The probability distribution of the Poisson random variable , representing


the number of outcomes occurring in a given time interval , is

where is the average number of outcomes per unit time and

CS 40003: Data Analytics 50


Descriptive measures
Given a random variable X in an experiment, we have denoted the probability that .
For discrete events for all values of except

Properties of discrete probability distribution

1. [ is the mean ]
2. [ is the variance ]

In summation is extended for all possible discrete values of .


Note: For discrete uniform distribution, with

and

CS 40003: Data Analytics 51


Descriptive measures
1. Binomial distribution
The binomial probability distribution is characterized with (the probability of
success) and (is the number of trials). Then

2. Hypergeometric distribution
The hypergeometric distribution function is characterized with the size of a sample ,
the number of items and labelled success. Then

CS 40003: Data Analytics 52


Descriptive measures
3. Poisson Distribution
The Poisson distribution is characterized with where and .

CS 40003: Data Analytics 53


Continuous Probability
Distributions

CS 40003: Data Analytics 54


Continuous Probability Distributions

f(x)

x1 x2 x3 x4
X=x
Discrete Probability distribution

f(x)

X=x
Continuous Probability Distribution

CS 40003: Data Analytics 55


Continuous Probability Distributions

 When the random variable of interest can take any value in an interval, it is
called continuous random variable.
 Every continuous random variable has an infinite, uncountable number of possible
values (i.e., any value in an interval)

 Consequently, continuous random variable differs from discrete random


variable.

CS 40003: Data Analytics 56


Properties of Probability Density Function
The function is a probability density function for the continuous random
variable , defined over the set of real numbers , if

1.

f(x)

a b
X=x

CS 40003: Data Analytics 57


Continuous Uniform Distribution
 One of the simplest continuous distribution in all of statistics is the
continuous uniform distribution.

Definition 4.8: Continuous Uniform Distribution

The density function of the continuous uniform random variable on the


interval is:

CS 40003: Data Analytics 58


Continuous Uniform Distribution

f(x)
c

A B
X=x

Note:
a)
b) )= where both and are in the interval (A,B)

CS 40003: Data Analytics 59


Normal Distribution
 The most often used continuous probability distribution is the normal
distribution; it is also known as Gaussian distribution.
 Its graph called the normal curve is the bell-shaped curve.

 Such a curve approximately describes many phenomenon occur in nature,


industry and research.
 Physical measurement in areas such as meteorological experiments, rainfall
studies and measurement of manufacturing parts are often more than adequately
explained with normal distribution.

 A continuous random variable X having the bell-shaped distribution is called


a normal random variable.

CS 40003: Data Analytics 60


Normal Distribution
• The mathematical equation for the probability distribution of the normal variable
depends upon the two parameters and , its mean and standard deviation.

f(x)

Definition 4.9: Normal distribution

The density of the normal variable with mean and variance is

where and , the  Naperian constant


CS 40003: Data Analytics 61
Normal Distribution
σ1 = σ2
σ1

σ2

µ1
µ1 µ2
µ2 µ1 = µ2
Normal curves with µ1< µ2 and σ1 = σ2 Normal curves with µ1 = µ2 and σ1< σ2

σ1

σ2

µ1 µ2
CS 40003: Data Analytics Normal curves with µ1<µ2 and σ1<σ2 62
Properties of Normal Distribution
 The curve is symmetric about a vertical axis through the mean
 The random variable can take any value from
 The most frequently used descriptive parameter s define the curve itself.
 The mode, which is the point on the horizontal axis where the curve is a
maximum occurs at .
 The total area under the curve and above the horizontal axis is equal to .

denotes the probability of x in the interval ().


x1 x2

CS 40003: Data Analytics 63


Standard Normal Distribution
 The normal distribution has computational complexity to calculate for any two , and
given and
 To avoid this difficulty, the concept of -transformation is followed.

[Z-transformation]

 X: Normal distribution with mean and variance .

 Z: Standard normal distribution with mean and variance = 1.

 Therefore, if f(x) assumes a value, then the corresponding value of is given by


:
=

CS 40003: Data Analytics 64


Standard Normal Distribution
Definition 4.10: Standard normal distribution
The distribution of a normal random variable with mean and variance is called a
standard normal distribution.

0.09
0.4
0.08 σ σ=1
0.07
0.3
0.06

0.05
0.2
0.04

0.03

0.02 0.1

0.01

0.00 0.0
-5 0 5 10 15 20 25 -3 -2 -1 0 1 2 3

x=µ µ=0
f(x: µ, σ) f(z: 0, 1)

CS 40003: Data Analytics 65


Gamma Distribution
The gamma distribution derives its name from the well known gamma function in
mathematics.

Definition 4.11: Gamma Function

for

Integrating by parts, we can write,

Thus function is defined as a recursive function.

CS 40003: Data Analytics 66


Gamma Distribution
When , we can write,

Further,

Note:
[An important property]

CS 40003: Data Analytics 67


Gamma Distribution
Definition 4.12: Gamma Distribution

The continuous random variable has a gamma distribution with parameters and
such that:

where and >0

1.0
σ=1, β=1

0.8

0.6

f(x)
0.4
σ=2, β=1

0.2
σ=4, β=1

0.0
0 2 4 6 8 10 12
x

CS 40003: Data Analytics 68


Exponential Distribution
Definition 4.13: Exponential Distribution

The continuous random variable has an exponential distribution with parameter ,


where:
where > 0

Note:
1) The mean and variance of gamma distribution are

2) The mean and variance of exponential distribution are

CS 40003: Data Analytics 69


Chi-Squared Distribution
Definition 4.14: Chi-squared distribution

The continuous random variable has a Chi-squared distribution with degrees of


freedom, is given by

where is a positive integer.

• The Chi-squared distribution plays an important role in statistical inference .


• The mean and variance of Chi-squared distribution are:

and

CS 40003: Data Analytics 70


Lognormal Distribution
The lognormal distribution applies in cases where a natural log transformation
results in a normal distribution.

Definition 4.15: Lognormal distribution

The continuous random variable has a lognormal distribution if the random


variable has a normal distribution with mean and standard deviation The
resulting density function of is:

CS 40003: Data Analytics 71


Lognormal Distribution

0.7

µ=0, σ=1
0.6

0.5

0.4
f(x)
0.3

0.2
µ=1, σ=1
0.1

0.0
0 5 10 15 20
x

CS 40003: Data Analytics 72


Weibull Distribution
Definition 4.16: Weibull Distribution

The continuous random variable has a Weibull distribution with parameter and
such that.

where and

1.0
ß=1
The mean and variance of Weibull
0.8
distribution are:

0.6

f(x)
0.4
ß=2

0.2
ß=3.5

0.0
0 2 4 6 8 10 12
x

CS 40003: Data Analytics 73


Reference

The detail material related to this lecture can be found in

Probability and Statistics for Enginneers and Scientists (8 th Ed.) by


Ronald E. Walpole, Sharon L. Myers, Keying Ye (Pearson), 2013.

CS 40003: Data Analytics 74


Any question?

You may post your question(s) at the “Discussion Forum”


maintained in the course Web page!

CS 40003: Data Analytics 75


Questions of the day…
1. Give some examples of random variables? Also, tell the
range of values and whether they are with continuous or
discrete values.

2. In the following cases, what are the probability


distributions are likely to be followed. In each case, you
should mention the random variable and the parameter(s)
influencing the probability distribution function.
a) In a retail source, how many counters should be opened at a given
time period.
b) Number of people who are suffering from cancers in a town?

CS 40003: Data Analytics 76


Questions of the day…
2. In the following cases, what are the probability
distributions are likely to be followed. In each case, you
should mention the random variable and the parameter(s)
influencing the probability distribution function.
c) A missile will hit the enemy’s aircraft.
d) A student in the class will secure EX grade.
e) Salary of a person in an enterprise.
f) Accident made by cars in a city.
g) People quit education after i) primary ii) secondary and iii) higher
secondary educations.

CS 40003: Data Analytics 77


Questions of the day…
3. How you can calculate the mean and standard deviation of
a population if the population follows the following
probability distribution functions with respect to an event.
a) Binomial distribution function.
b) Poisson’s distribution function.
c) Hypergeometric distribution function.
d) Normal distribution function.
e) Standard normal distribution function.

CS 40003: Data Analytics 78

You might also like