
Assignment No. 02
Semester: Fall 2020
Name: Talal Amjad    Roll num: ADCS-F20-003
COURSE: PROBABILITY AND STATISTICS
INSTRUCTOR: Dr. M. RASHAD

Question #1
The importance of statistics in different fields of life?
Answers

Statistics plays a vital role in every field of human activity. It helps in determining the
existing position of per capita income, unemployment, population growth rates, housing,
schooling, medical facilities, etc., in a country.

Now statistics holds a central position in almost every field, including industry, commerce,
trade, physics, chemistry, economics, mathematics, biology, botany, psychology,
astronomy, etc., so the application of statistics is very wide. Now we shall discuss some
important fields in which statistics is commonly applied.
(1) Business
Statistics plays an important role in business. A successful businessman must be
very quick and accurate in decision making. He must know what his customers want; he
should therefore know what to produce and sell, and in what quantities.

(2) Economics
 Economics largely depends upon statistics. National income accounts are
multipurpose indicators for economists and administrators, and statistical methods
are used to prepare these accounts. In economics research, statistical methods are
used to collect and analyze the data and test hypotheses. The relationship between
supply and demand is studied by statistical methods; imports and exports, inflation
rates, and per capita income are problems which require a good knowledge of
statistics. 

(3) Mathematics
Statistics plays a central role in almost all natural and social sciences. The methods
used in natural sciences are the most reliable but conclusions drawn from them are
only probable because they are based on incomplete evidence.

(4) Banking
Statistics plays an important role in banking. Banks make use of statistics for a
number of purposes. They work on the principle that not everyone who deposits money
with the bank withdraws it at the same time. The bank earns profits out of these
deposits by lending them to others on interest. Bankers use statistical approaches based
on probability to estimate the number of deposits and the claims against them for a
certain day.

(5) State Management (Administration)


Statistics is essential to a country. Different governmental policies are based on
statistics. Statistical data are now widely used in making all administrative
decisions. Suppose the government wants to revise the pay scales of employees in view
of an increase in the cost of living; statistical methods will be used to determine the
rise in the cost of living.

(6) Accounting and Auditing


Accounting is impossible without exactness. But for decision making purposes, so
much precision is not essential; the decision may be made on the basis of
approximation, known as statistics. The correction of the values of current assets is
made on the basis of the purchasing power of money or its current value.
(7) Natural and Social Sciences
Statistics plays a vital role in almost all the natural and social sciences. Statistical
methods are commonly used for analyzing experimental results and testing their
significance in biology, physics, chemistry, mathematics, meteorology, research,
chambers of commerce, sociology, business, public administration,
communications and information technology, etc.

(8) Astronomy
Astronomy is one of the oldest branches of statistical study; it deals with the
measurement of distances, sizes, masses, and densities of heavenly bodies by
means of observations. During these measurements errors are unavoidable, so the
most probable measurements are found by using statistical methods.

Question #2
What do you know about scales of measurement?
Answers
A measurement scale, in statistical analysis, refers to the type of information provided by
numbers. Each of the four scales (i.e., nominal, ordinal, interval, and ratio) provides a
different type of information. Measurement refers to the assignment of numbers in a
meaningful way, and understanding measurement scales is important for interpreting the
numbers assigned to people, objects, and events.
Nominal scales
In nominal scales, numbers, such as driver’s license numbers and product serial
numbers, are used to name or identify people, objects, or events. Gender is an example
of a nominal measurement in which a number (e.g., 1) is used to label one gender, such
as males, and a different number (e.g., 2) is used for the other gender, females.
Numbers do not mean that one gender is better or worse than the other; they simply are
used to classify persons. In fact, any other numbers could be used, because they do not
represent an amount or a quality.
Ordinal scales
In ordinal scales, numbers represent rank order and indicate the order of quality or
quantity, but they do not provide an amount of quantity or degree of quality. Usually, the
number 1 means that the person (or object or event) is better than the person labeled 2;
person 2 is better than person 3, and so forth; for example, persons may be rank ordered
in terms of potential for promotion, with the person assigned the 1 rating having more
potential than the person assigned a rating of 2.
Interval scale

In interval scales, numbers form a continuum and provide information about the amount
of difference, but the scale lacks a true zero. The differences
between adjacent numbers are equal or known. If zero is used, it simply serves as a
reference point on the scale but does not indicate the complete absence of the
characteristic being measured. The Fahrenheit and Celsius temperature scales are
examples of interval measurement. In those scales, 0 °F and 0 °C do not indicate an
absence of temperature.

Ratio scales

Ratio scales have all of the characteristics of interval scales as well as a true zero,
which refers to complete absence of the characteristic being measured. Physical
characteristics of persons and objects can be measured with ratio scales, and, thus,
height and weight are examples of ratio measurement. A score of 0 means there is
complete absence of height or weight.

Question #3
What do you know about variables? Write down the types and applications of
variables in computer science.
Answers
A variable is a way of referring to a storage area in a computer program. This memory
location holds values—numbers, text or more complicated types of data like payroll
records.
Operating systems load programs into different parts of the computer's memory so there
is no way of knowing exactly which memory location holds a particular variable before
the program is run. When a variable is assigned a symbolic name like
"employee_payroll_id," the compiler or interpreter can work out where to store the
variable in memory.
Variable Types

When you declare a variable in a program, you specify its type, which can be chosen
from integral, floating point, decimal, boolean or nullable types. The type tells the
compiler how to handle the variable and check for type errors. The type also determines
the position and size of the variable's memory, the range of values that it can store and
the operations that can be applied to the variable. A few basic variable types include:
int - Int is short for "integer." It is used to define numeric variables holding whole
numbers; only negative and positive whole numbers can be stored in int variables.
int? (nullable int) - A nullable int has the same range of values as int, but it can store
null in addition to whole numbers.
char - A char type holds Unicode characters, the letters that represent most of
the written languages.
bool - A bool is a fundamental variable type that can take only two values, true and false
(often thought of as 1 and 0).
float, double and decimal - These three types of variables handle whole numbers,
numbers with decimals and fractions. The difference between the three lies in their range
and precision. For example, double is twice the size of float, and it accommodates more
digits.
Declaring Variables

Before you can use a variable, you have to declare it, which means you have to assign
it a name and a type. After you declare a variable, you can use it to store the type of
data you declared it to hold. If you try to use a variable that hasn't been declared, your
code won't compile. Declaring a variable in C# takes the form:

<data_type> <variable_list>;

The variable list consists of one or more identifier names separated by commas. For
example:

 int i, j, k;
 char c, ch;
Initializing Variables

Variables are assigned a value using an equal sign followed by a constant. The form is:

<data_type> <variable_name> = value;

You can assign a value to a variable at the same time you declare it or at a later time.

For example:

 int i = 100;
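The text above also mentions assigning a value at a later time; a minimal illustrative
sketch of that case (the name j and the value are arbitrary):

 int j;      // declared, but not yet initialized
 j = 250;    // value assigned later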
Question #4
By use of MS Excel, present the wickets taken (in Test, ODI & T20) cricket matches.
Consider the data of any four players (Shaheen Afridi, Shadab, M Amir, Wahab Riaz):
• Simple bar chart
• Multiple bar chart
• Pie chart
Answers

Question #5
What are the important points of an average?
Answers

“Data is a representation of real life. It’s an abstraction, and it’s impossible to
encapsulate everything in a spreadsheet, which leads to uncertainty in the numbers.
How well does a sample represent a full population? How likely is it that a dataset
represents the truth? How much do you trust the numbers? Statistics is a game where
you figure out these uncertainties and make estimated judgements based on your
calculations.”
Statistics is all about “compressing” a lot of information into a few numbers so our brain
can process it more easily.

Statistical Averages

Let’s start simple! Statistical averages. It’s an easy-to-understand concept, and very
commonly used. The point of using averages is to get a central value of a dataset. Of
course, there is more than one way to decide which value is the most central… That’s
why we have more than one average type.
The three most common statistical averages are:

1. Mean
2. Median
3. Mode
Mean

In everyday language, the word ‘average‘ refers to the value that in statistics we call
‘arithmetic mean.’ To calculate the arithmetic mean, we take a set, add together all its
elements, then divide the resulting value by the number of elements. For example, the
arithmetic mean of this list: [1,2,6,9] is (1+2+6+9)/4=4.5.
This is middle-school mathematics, so I assume that you are very well aware of this
calculation anyway.
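The same calculation can be sketched in code, following the C# style used in Question #3
(the class name MeanExample is just for illustration):

 using System;
 using System.Linq;

 class MeanExample
 {
     static void Main()
     {
         double[] values = { 1, 2, 6, 9 };
         double mean = values.Average();   // (1 + 2 + 6 + 9) / 4
         Console.WriteLine(mean);          // prints 4.5
     }
 }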

Note: calling the ‘arithmetic mean’ the ‘average’ is improper. The word ‘average’ can
refer to any of the average types — mode and median are averages, too. If you hear
anyone using ‘average’ wrong in the coffee shop, don’t correct her… but in a data
science meeting (or especially in a job interview), please stick to ‘arithmetic mean.’

• “Average.” No good.
• “Mean.” Okay.
• “Arithmetic mean.” Perfect.

Median

Here comes the most commonly used metaphor for showing the importance of the
Median!

Ten workers sit in a room. Their yearly salaries are:

Worker #1   €15.000
Worker #2   €18.000
Worker #3   €18.000
Worker #4   €18.000
Worker #5   €18.000
Worker #6   €19.000
Worker #7   €20.000
Worker #8   €22.000
Worker #9   €22.000
Worker #10  €22.000

What’s the central value of their salary?


Let’s try the arithmetic mean first! The result is €19.200.
Is it a good enough “compressed value” of the information in the table above? Can we
say that these 10 workers are all making around €19.200 per year? I’d say: yes,
€19.200 is not too far from €15.000 nor from €22.000.
Now the median: sort the salaries and take the middle of the list. With ten values, the
median is the average of the 5th and 6th values (€18.000 and €19.000), which is €18.500.
The two averages are close here, but if one extremely high salary joined the list, the
mean would jump while the median would barely move; that robustness to extreme values
is exactly why the median is so often used for incomes.
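A small C# sketch of the median calculation for this salary list (class and variable
names are illustrative; the salaries are in euros):

 using System;
 using System.Linq;

 class MedianExample
 {
     static void Main()
     {
         int[] salaries = { 15000, 18000, 18000, 18000, 18000,
                            19000, 20000, 22000, 22000, 22000 };
         int[] sorted = salaries.OrderBy(s => s).ToArray();
         int n = sorted.Length;
         // Even number of elements: average the two middle values.
         double median = n % 2 == 0
             ? (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
             : sorted[n / 2];
         Console.WriteLine(median);        // prints 18500, i.e. €18.500
     }
 }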

Mode

The third famous average type is Mode. The definition is simple: mode is the element
that occurs most often in a list. In theory, it’s useful when the numerical values in your
data set are used as categorical values. But to be honest, we rarely use it; and even
when we do, we don’t really call it the mode. We just say: the most frequent element in
a list.
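Even so, the mode is easy to compute; a short C# sketch using the same salary list
(the names are illustrative):

 using System;
 using System.Linq;

 class ModeExample
 {
     static void Main()
     {
         int[] values = { 15000, 18000, 18000, 18000, 18000,
                          19000, 20000, 22000, 22000, 22000 };
         // Group equal values and take the key of the largest group.
         int mode = values.GroupBy(v => v)
                          .OrderByDescending(g => g.Count())
                          .First()
                          .Key;
         Console.WriteLine(mode);          // prints 18000, the most frequent value
     }
 }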

Conclusion
The 3 most common statistical averages are (arithmetic) mean, median and mode. You
will use mean and median all the time, so it’s good to be confident in calculating them!

This was our first baby step in discovering the great universe of statistics for data
science! The next step is to understand statistical variability.


Question #6
Explain the concepts of “missing values” and “outliers”?
Answers
Missing values and outliers are frequently encountered while collecting data. The
presence of missing values reduces the data available to be analyzed, compromising
the statistical power of the study and, eventually, the reliability of its results. Outliers,
in turn, can distort statistical estimates and lead to misleading conclusions.

Types of Missing Values


According to previous studies, missing values are divided into three categories: missing
completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR),
depending on the type of missingness that occurred (Table 1) [1].

Table 1: Types of Missing Values

Missing completely at random (MCAR)
Description: Missing data occur completely at random, without being influenced by other data.
Possible causes: Consent withdrawal, omission of major exams, death, discontinued follow-up,
and serious adverse reactions.

Missing at random (MAR)
Description: Missing data occur at a specific time point in conjunction with participant
dissatisfaction with study outcomes and ongoing participation.
Possible causes: Refusal to continue measurements.

Not missing at random (NMAR)
Description: Missing data occur when a patient who is not satisfied with study outcomes
performs the required measurements on his own, before the scheduled measurement.
Possible causes: If a patient finds the results of self-measurement dissatisfactory in
addition to dissatisfaction related to the study, the patient may refuse further measurements.

To explain the types of missing values, the following elements are defined under the
assumption that the total number of study participants is ‘I’ and the total number of
measurements is ‘J’.
Yij: the ‘j’th measurement value for the ‘i’th patient, i = 1, … , I, j = 1, … , J
Yi(observation): a vector created based on the observed measurement values of the ‘i’th patient
Yi(missing): a vector created based on the missing values of the ‘i’th patient
Ri: a vector indicating, for the ‘i’th patient, which measurements are observed and which are missing
Missing completely at random (MCAR)
If Yi(observation), Yi(missing), and Ri are independent, Yi(missing) is a value missing completely at
random. That is, any particular data are missing independently of other data in the data set.
The missingness occurs completely at random during the course of the study, for example if
a study participant is suddenly absent from measurements or drops out at any time before
the study ends.

Missing at random (MAR)


While Ri and Yi(observation) are dependent and Ri and Yi(missing) are independent, Yi(missing) is
the value missing at random. Thus, the occurrence of MAR is associated more
with Yi(observation) and has no relation with Yi(missing). Data missing at random can occur at a
specific time in conjunction with participant dissatisfaction with study outcomes.

Not missing at random (NMAR)


If Ri and Yi(observation) are dependent and Ri and Yi(missing) are also
dependent, Yi(missing) represents a value not missing at random; that is, the occurrence
of NMAR is associated with both Yi(observation) and Yi(missing). The data not missing at
random can also occur in conjunction with participant dissatisfaction with study
outcomes. The difference from MAR is that the participants perform the required
measurements on their own.

Methods for Identifying Outliers


Given that outliers are data points lying far away from the majority of other data
points, outliers in data that are not normally distributed do not require identification. As
most statistical tests assume that data are normally distributed, outlier identification
should precede data analysis. Different methods are used to identify outliers in a normal
distribution. One of the methods measures the distance between a data point and the
center of all data points to determine whether it is an outlier.
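One widely used distance measure of this kind is the z-score, i.e. how many standard
deviations a point lies from the mean. A minimal C# sketch (the data values and the cutoff
of 2 are illustrative assumptions, not taken from the text above):

 using System;
 using System.Linq;

 class ZScoreOutliers
 {
     static void Main()
     {
         double[] data = { 10, 12, 11, 13, 12, 11, 95 };   // 95 is deliberately extreme
         double mean = data.Average();
         double sd = Math.Sqrt(data.Average(x => (x - mean) * (x - mean)));

         foreach (double x in data)
         {
             double z = (x - mean) / sd;   // distance from the center in standard deviations
             if (Math.Abs(z) > 2)          // a common, but arbitrary, cutoff
                 Console.WriteLine($"{x} looks like an outlier (z = {z:F2})");
         }
     }
 }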
Treatment of Outliers
There are basically three methods for treating outliers in a data set. One method is to
remove outliers as a means of trimming the data set. Another method involves replacing
the values of outliers or reducing the influence of outliers through outlier weight
adjustments. The third method is used to estimate the values of outliers using robust
techniques.

Trimming
Under this approach, a data set that excludes outliers is analyzed. Trimmed
estimators, such as the trimmed mean, decrease the variance in the data and cause a bias
through under- or overestimation. Given that the outliers are also observed values, excluding
them from the analysis makes this approach inadequate for the treatment of outliers.

Winsorization
This approach involves modifying the weights of outliers or replacing the values being
tested for outliers with expected values. The weight modification method allows weight
modification without discarding or replacing the values of outliers, limiting the influence
of the outliers. 
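A minimal sketch of the value-replacement idea in C#: values outside chosen lower and upper
limits are pulled back to those limits (the limits here are hard-coded for illustration; in
practice they are usually derived from percentiles of the data):

 using System;
 using System.Linq;

 class WinsorizeExample
 {
     // Replace values below 'lower' or above 'upper' with those limits.
     static double[] Winsorize(double[] data, double lower, double upper) =>
         data.Select(x => Math.Min(Math.Max(x, lower), upper)).ToArray();

     static void Main()
     {
         double[] data = { 10, 12, 11, 13, 12, 11, 95 };
         double[] adjusted = Winsorize(data, lower: 10, upper: 15);
         Console.WriteLine(string.Join(", ", adjusted));   // the 95 is pulled back to 15
     }
 }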

Robust estimation method


When the nature of the population distributions is known, this approach is considered
appropriate because it produces estimators robust to outliers, and estimators are
consistent. In recent years, many studies have proposed a variety of statistical
models for robust estimation; however, their adoption has been slow because of their
complicated methodological aspects.

Question #7
Explain the “application of statistical features” in computer science?

Answers
 Statistics is used for data mining, speech recognition, vision and image analysis, data
compression, artificial intelligence, and network and traffic modeling. A statistical
background is essential for understanding algorithms and statistical properties that form
the backbone of computer science.

Roles of Statisticians
Statistician John Tukey (1915-2000) was key in developing ideas embraced by
statisticians, such as exploratory techniques in order to better understand the data,
which then leads to procedures such as hypothesis testing. Statisticians put much
importance on the rigor of their analyses and incorporate theory into solving problems of
uncertainty. These theories inform the methods to help establish scientific
underpinnings to problems and their solutions.

Roles of Computer Scientists


Computer scientists tend to focus on data acquisition/cleaning, retrieval, mining, and
reporting. They are often tasked with the development of algorithms for prediction and
systems efficiency. Focus is also placed on machine learning (an aspect of artificial
intelligence), particularly for the purposes of data mining (finding patterns and
associations in data for a variety of purposes, such as marketing and finance).

Application of Statistics in Computer Science


There are a number of ways the roles of statisticians and computer scientists merge;
consider the development of models and data mining. Typically, the statistical approach
to models tends to involve stochastic (random) models with prior knowledge of the data.
The computer science approach, on the other hand, leans more toward algorithmic models
without prior knowledge of the data. Ultimately, these come together in attempts to
solve problems.

Data mining processes for computer science have statistical counterparts. Consider the
following:

Steps in Computer Science          Steps in Statistics

Data acquisition/enrichment        Experimental design for the collection of data / noise reduction

Data exploration                   Discerning the distribution / variability

Analysis and modeling              Group differences; dimension reduction; prediction; classification

Representation and reporting       Visualization; communication

How else is statistics used in computer science? Simulations (used to gain a greater
understanding of a variety of systems) are truly a marriage of computing capability and
statistics: the use of statistics within programming improves understanding of the
underlying system, leading to more meaningful results. Statistics in software engineering
leads to more conclusive determinations of quality and optimal performance.
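As a small illustration of that marriage of computing and statistics, here is a Monte Carlo
sketch in C# that estimates π by sampling random points in the unit square (the sample size
and the fixed seed are arbitrary choices for the example):

 using System;

 class MonteCarloPi
 {
     static void Main()
     {
         var rng = new Random(42);          // fixed seed so the run is repeatable
         int samples = 1_000_000;
         int insideCircle = 0;

         for (int i = 0; i < samples; i++)
         {
             double x = rng.NextDouble();
             double y = rng.NextDouble();
             if (x * x + y * y <= 1.0)      // point falls inside the quarter circle
                 insideCircle++;
         }

         double piEstimate = 4.0 * insideCircle / samples;
         Console.WriteLine(piEstimate);     // close to 3.14159 for large samples
     }
 }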

THANK YOU FOR READING MY ASSIGNMENT
