Unit 2. Statistical Approaches for Pattern
Recognition

What this unit is about


This unit explains probability distributions with real-time examples, linear models for regression and other modelling techniques, and linear models for classification.

What you should be able to do


After completing this unit, you should be able to:
• Understand the concept of probability distributions
• Gain knowledge on example of statistical approaches
• Understand linear models for regression
• Learn about linear models for classification

How you will check your progress


• Checkpoint

References
IBM Knowledge Center


Unit objectives
After completing this unit, you should be able to:

• Understand the concept of probability distributions

• Gain knowledge on example of statistical approaches

• Understand linear models for regression

• Learn about linear models for classification

Figure 2-1. Unit objectives PAD011.0

Notes:
Unit objectives are stated above.


Understanding statistics

• Statistics is a method of research that uses computational models, descriptions, and summaries to study theoretical or real-life data.

• Statistics is the practice of drawing inferences from evidence and feedback and reporting conclusions. Such statistical measures include the mean, regression analysis, skewness, kurtosis, and variance.

Figure 2-2. Understanding statistics PAD011.0

Notes:
Statistical approaches for pattern recognition
Understanding statistics: Statistics is a term used to summarize the process an analyst uses to characterize a data set. If the data set depends on a sample of a larger population, the analyst can develop interpretations about that population based on the statistical outcomes from the sample. The objective of statistical research is the collection and analysis of data in mathematical form.
Statistics is used in many fields, including psychology, business, the physical and social sciences, education, policy, and manufacturing. Statistical data are gathered using a sampling procedure or some other method. Two types of statistical approaches are used in data analysis: descriptive statistics and inferential statistics. Inferential statistics are used when the data are viewed as a subset of a larger population.
Types of statistics:
• Mean.
• Regression analysis.
• Skewness.
• Kurtosis.
• Variance.


Mean
An average of two or more numbers is the basic statistical measure called the mean. For a given set of numbers, the mean may be determined in several ways, including the arithmetic mean, which uses the sum of the numbers in the series, and the geometric mean, which is the average of a set of products. All the basic methods of computing a simple average typically produce the same approximate result.
Mean = ∑X ÷ N
Here, ∑X = sum of all the individual values and N = total number of items.
Mean = A + (∑d ÷ N)
Here, A = assumed value of the mean, ∑d = sum of the deviations of the values from A, and N = number of observations.
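As a quick illustration, the following Python sketch computes the mean both ways; the data values are hypothetical and exist only to show the two formulas.

# Minimal sketch of the two mean formulas above; the values are hypothetical.
values = [12, 15, 11, 18, 14]                 # individual values X
n = len(values)                               # N, total number of items

direct_mean = sum(values) / n                 # Mean = sum(X) / N

assumed = 10                                  # A, an assumed value of the mean
deviations = [x - assumed for x in values]    # d = X - A
assumed_mean = assumed + sum(deviations) / n  # Mean = A + (sum(d) / N)

print(direct_mean, assumed_mean)              # both print 14.0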
Regression analysis
Regression analysis determines the extent to which external variables such as interest rates, the price of a good or service, or particular businesses or industries affect the price movements of an asset. This relationship is depicted in the form of a straight line, called linear regression.

The equation has the form Y= a + bX, where Y is the dependent variable (that's the variable that goes on the
Y axis), X is the independent variable (i.e. it is plotted on the X axis), b is the slope of the line and a is the
y-intercept.
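A minimal Python sketch of fitting the line Y = a + bX by least squares is shown below; the X and Y values are hypothetical and exist only to illustrate how the slope and intercept are estimated.

import numpy as np

# Hypothetical data; fit Y = a + b*X by ordinary least squares.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable (X axis)
Y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])   # dependent variable (Y axis)

b, a = np.polyfit(X, Y, deg=1)            # returns the slope b, then the intercept a
print(f"Y = {a:.2f} + {b:.2f} * X")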
Skewness
Skewness describes the degree to which a set of data points in a statistical series deviates from a normal distribution. Some data collections, including asset returns and market values, have either positive skew, a curve skewed to the left of the data mean, or negative skew, a curve skewed to the right of the data mean.
The formula for momental skewness (α(m)) is:
α(m) = (1/2) γ₁ = μ₃ ÷ (2σ³)
Here, μ₃ is the third moment about the mean, σ is the standard deviation, and γ₁ is the Fisher skewness.
Kurtosis
Kurtosis measures whether the data are light-tailed or heavy-tailed relative to a normal distribution. High-kurtosis data sets have heavy tails, or outliers, which in finance implies elevated risk in the form of occasional extreme returns. Low-kurtosis data sets have light tails, or a lack of outliers, suggesting lower financial risk.

Variance
Variance is a measure of the spread of the numbers within a data set. The variance measures how far each number in the set lies from the mean. It can help assess the risk an investor takes on when purchasing an asset.
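The short Python sketch below, assuming NumPy and SciPy are available, computes variance, skewness, and kurtosis for a small, hypothetical return series.

import numpy as np
from scipy import stats

# Hypothetical return series used only to illustrate the three measures.
returns = np.array([0.010, -0.020, 0.015, 0.030, -0.010, 0.005, 0.020, -0.030])

print("variance:", np.var(returns, ddof=1))  # sample variance (spread around the mean)
print("skewness:", stats.skew(returns))      # asymmetry relative to a normal curve
print("kurtosis:", stats.kurtosis(returns))  # excess kurtosis; about 0 for normal data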

The British statistician and geneticist Ronald Fisher established a framework for the analysis of variance (ANOVA). It is used to evaluate the impact that discrete factors have on a dependent variable and can, for example, be used to compare the performance of different investments over time.


T-test

Figure: T-test example


Source: https://images.app.goo.gl/PAJFuF1hSWmx9y5o7

Figure 2-3. T-test PAD011.0

Notes:
T-test: A t-test is a method used to decide whether two groups differ significantly from each other in some measured characteristic. It is applied where the data sets follow a normal distribution and may have unknown variances, such as a data set recorded from flipping a coin 100 times. T-tests are generally used to compare the mean values of two data sets and to determine whether they could have come from the same population. For example, students in class A and students in class B would not be expected to have exactly the same mean and standard deviation. Similarly, the measurements of a group fed a placebo and of a group fed the product under test will differ in mean and standard deviation. The t-test takes a sample from each of the two sets and establishes the problem statement by assuming a null hypothesis that the two means are equal. Test-statistic values are then calculated from the applicable formulas and compared with the standard critical values, and the assumed null hypothesis is accepted or rejected accordingly. Statisticians use t-tests when examining small samples; for larger samples, z-tests are used instead. There are three types of t-tests, categorized as dependent (paired) and independent t-tests.
Ambiguous test results: Imagine a drug maker testing a newly discovered medication. It follows the standard procedure of giving the drug to one group of patients and a placebo to another group, known as the control group. The placebo given to the control group is a substance with no intended therapeutic value and serves as a benchmark for the reactions of the group that takes the actual drug. Suppose the drug group recorded an average life expectancy increase of four years, while the control group recorded an increase of three years. An immediate evaluation might suggest that the medication works very well, since the treated group fared better; however, the result may simply be a stroke of luck arising from a random occurrence.


A t-test is useful for assessing whether a result is consistent and applicable to the entire population. Suppose 100 students in class A of a school scored an average of 85% with a standard deviation of 3%, while 100 students in class B scored an average of 87% with a standard deviation of 4%. Although the average of class B is higher than the average of class A, it may be incorrect to conclude that students in class B performed better than students in class A, because the exam scores of both groups vary, and the difference could be due to chance alone. With the help of a t-test, we can decide whether the difference between the two classes is genuine or merely accidental; a code sketch of this comparison follows.
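A possible sketch of that comparison, assuming SciPy is available, works directly from the summary statistics given above.

from scipy import stats

# Class A: mean 85%, standard deviation 3%, 100 students.
# Class B: mean 87%, standard deviation 4%, 100 students.
result = stats.ttest_ind_from_stats(
    mean1=85.0, std1=3.0, nobs1=100,
    mean2=87.0, std2=4.0, nobs2=100,
    equal_var=False)                    # Welch form, since the variances differ

print(result.statistic, result.pvalue)  # a small p-value suggests a real difference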
Assumptions of T-test
• The first assumption concerns the scale of measurement: the data must follow a continuous or ordinal scale, such as the scores from an IQ test.
• The second assumption is that the data come from a simple random sample, that is, from a randomly selected portion of the total population.
• The third assumption is that the data, when plotted, result in a normal, bell-shaped distribution curve.
• The final assumption is homogeneity of variance: the variances of the samples are approximately equal.
T-test calculation
Calculating a t-test requires three data values:
• The difference between the mean values of the two data sets (the mean difference).
• The standard deviation of each group.
• The number of data values in each group.
The t-test produces a t-value, which is then compared against a value obtained from a table of critical values. This comparison helps assess the effect of chance on the difference and whether the difference lies outside that chance range.

Table: T-test Calculation


Source: https://images.app.goo.gl/e1UCeNgNcN77t88k7
T-distribution tables are available in one-tailed and two-tailed formats. These values are used to assess cases where the statistic is expected to fall in a particular (positive or negative) direction at a given level of significance, or within a given range. The t-value indicates the relationship between the two samples:
• A large t-score indicates that the groups are different.
• A small t-score indicates that the groups are similar.


Degrees of freedom refer to the values in a study that are free to vary and are essential in assessing the importance and validity of the null hypothesis. They are usually computed from the number of data records in the sample sets.
For a paired (correlated) t-test, the t-value and degrees of freedom are calculated as:
t = mean(d) ÷ ( s(d) ÷ √n ),  degrees of freedom = n − 1
Here, d is the set of differences between paired observations, mean(d) is their average, s(d) is their standard deviation, and n is the number of pairs.
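A brief sketch of a paired t-test, assuming SciPy; the before and after measurements are hypothetical.

from scipy import stats

# Hypothetical paired measurements: the same subjects before and after a treatment.
before = [72, 75, 70, 68, 74, 71, 69, 73]
after = [70, 72, 71, 65, 70, 69, 66, 71]

result = stats.ttest_rel(before, after)  # t-test on the paired differences
print(result.statistic, result.pvalue)   # degrees of freedom = n - 1 = 7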

The two remaining forms are independent t-tests. Samples of these types are selected independently of each other: the data sets in the two groups do not refer to the same values. They include cases such as a group of 200 diabetic patients being split into two groups of 100 patients each. One group becomes the control group and is given a placebo, while the other group receives the prescribed treatment. This constitutes two independent sample groups that are unpaired with each other.
Placebo: "A medicine or procedure prescribed for the psychological benefit to the patient rather than for any physiological effect".
Equal variance (or pooled) t-test
The equal variance t-test is used when the number of observations in each group is the same, or the variances of the two data sets are similar. The t-value and degrees of freedom are calculated as:
t = (mean1 − mean2) ÷ ( sp × √(1/n1 + 1/n2) ),  where sp² = ((n1 − 1)·s1² + (n2 − 1)·s2²) ÷ (n1 + n2 − 2)
degrees of freedom = n1 + n2 − 2
Here, mean1 and mean2 are the sample means, s1 and s2 are the sample standard deviations, and n1 and n2 are the numbers of observations in each group.

Unequal variance t-test
The unequal variance t-test is used when the number of observations and the variance differ between the two groups. This test is also known as Welch's t-test. The t-value and degrees of freedom are calculated as:
t = (mean1 − mean2) ÷ √( s1²/n1 + s2²/n2 )
degrees of freedom = ( s1²/n1 + s2²/n2 )² ÷ ( (s1²/n1)² ÷ (n1 − 1) + (s2²/n2)² ÷ (n2 − 1) )
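The sketch below contrasts the pooled and Welch forms on two hypothetical, unpaired samples; SciPy's equal_var flag switches between the two formulas.

from scipy import stats

# Hypothetical unpaired samples of different sizes.
group_a = [14.2, 15.1, 13.8, 16.0, 14.9, 15.4]
group_b = [16.8, 17.2, 15.9, 18.1, 17.5, 16.4, 17.9, 18.3]

pooled = stats.ttest_ind(group_a, group_b, equal_var=True)   # equal variance (pooled)
welch = stats.ttest_ind(group_a, group_b, equal_var=False)   # unequal variance (Welch)

print("pooled:", pooled.statistic, pooled.pvalue)
print("welch:", welch.statistic, welch.pvalue)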

Determining the correct t-test to use
The figure below is used, depending on the characteristics of the samples, to decide which t-test should be applied. The key considerations are whether the sample records are paired, the number of data records in each sample set, and the variance of each sample set.


Unequal variance t-test example

Suppose we take a diagonal measurement of paintings received in an art gallery. One group of samples contains 10 paintings, while the other contains 20 paintings.

Data set as below:

Although the mean of Set 2 is higher than the mean of Set 1, we cannot conclude that the population corresponding to Set 2 has a higher mean than the population corresponding to Set 1. Is the difference between 19.4 and 21.6 due to chance alone, or does it reflect a genuine difference in the overall populations of all the paintings received by the art gallery? We establish the problem by assuming the null hypothesis that the means of the two samples are the same and perform a t-test to decide whether the hypothesis is plausible.


By assuming that the means of the two samples are equal, we define the problem and test whether the hypothesis is plausible.
The calculated value of t is -2.24787. The minus sign can be ignored when comparing the t-values of two samples, so the value used is 2.24787.
Under the formula, which requires the result to be rounded down to the nearest whole number, the degrees of freedom come to 24.30, which is reduced to 24.
A level of probability (alpha level, significance level, p) must then be specified as the acceptance criterion. In most cases, a 5% level can be used.


Z-test

• It is a method that tests whether two population means differ when the variances are known and the sample size is large.

• The test statistic is assumed to follow a normal distribution.

• To perform an accurate z-test, nuisance parameters such as the standard deviation need to be known.

• The z-statistic, or z-value, is a number representing how many standard deviations above or below the population mean a result produced by the z-test lies.

Figure 2-4. Z-test PAD011.0

Notes:
Z-test: Examples of tests that can be run as z-tests include a one-sample location test, a two-sample location test, a paired difference test, and a maximum likelihood estimate. Z-tests are closely related to t-tests, but t-tests are better suited to experiments with small sample sizes. T-tests also assume that the standard deviation is unknown, while z-tests assume that it is known. If the standard deviation of the population is unknown, the sample variance is assumed to equal the population variance.
The z-statistic is assumed to follow a normal distribution. The z-test is used when there are more than 30 samples, because under the central limit theorem the samples are considered to be approximately normally distributed as the number of samples grows. When conducting a z-test, the null and alternative hypotheses, the alpha level, and the z-score should be stated. Next, the test statistic should be calculated, and the results and conclusions stated.
One-sample z-test example: Assume an investor wants to test whether a stock's average daily return is 1%. A simple random sample of 50 returns is calculated and has an average of 2%. Assume the standard deviation of the returns is 2.5%. The null hypothesis is that the mean return equals 1%; the alternative hypothesis is that the mean return is not equal to 1%. Suppose an alpha of 0.05 is chosen with a two-tailed test, so 0.025 of the samples lie in each tail, and the critical values are 1.96 and -1.96. If the value of z is greater than 1.96 or less than -1.96, the null hypothesis is rejected. The value of z is calculated by subtracting the hypothesized average daily return, 1% in this case, from the measured sample average.
Exhibit z-test one-sample: Assume that an investor must check whether a stock's daily average return is 1%.
A simple random sample of 50 reports is calculated and an average of 2%. Assume that the standard range is
2.5%. The null hypothesis then is that the average or total is below 3%. On the other hand, the alternative
interpretation is that the median return is over 3%. Suppose an alpha of 0.05% is chosen for a two-tailed
search. Consequently, 0.025 percent of the samples are each tail and 1.96 or -1.96 are important for alpha
samples. If z is greater than or below -1.96, the null argument would otherwise be rejected. A value for z is
calculated by extracting the sum of the average daily return chosen for the analysis, or 1% in this case, from
the measured average measurements.


Next, divide the resulting difference by the standard deviation divided by the square root of the number of observations. The test statistic is (0.02 - 0.01) ÷ (0.025 ÷ (50)^(1/2)) = 2.83. Since z is greater than 1.96, the investor rejects the null hypothesis and concludes that the average daily return is greater than 1%.
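A small Python sketch of this z-test calculation, assuming SciPy for the normal distribution, follows.

import math
from scipy import stats

# One-sample z-test: sample mean 2%, hypothesized mean 1%, sigma 2.5%, n = 50.
sample_mean, mu0, sigma, n = 0.02, 0.01, 0.025, 50

z = (sample_mean - mu0) / (sigma / math.sqrt(n))  # test statistic
p_two_tailed = 2 * (1 - stats.norm.cdf(abs(z)))   # two-tailed p-value

print(round(z, 2), round(p_two_tailed, 4))        # z is about 2.83, beyond +/- 1.96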
P-test: A P-test is a statistical method used to test the validity of the commonly accepted claim known as the null hypothesis. Although the word null is a little confusing, the intention is to check the accepted claim by attempting to disprove, or nullify, it. The P-test produces evidence that either rejects or fails to reject (statistics-speak for 'inconclusive') the commonly accepted claim.
Understanding the P-test
A P-test calculates a value that allows the researcher to assess the credibility of the accepted claim. The resulting p-value is compared with a statistically significant level (the confidence level), alpha (α), chosen by the researcher, to gauge the randomness of the results. The P-test statistic typically follows a standard normal distribution when large sample sizes are used.

Figure: P-test example


Source: https://images.app.goo.gl/SaSvTTfLjgnrC5Z88
Researchers will usually choose alpha levels of 5% or lower, which translates to confidence levels of 95% or higher. In other words, a p-value below an alpha level of 5% indicates that there is more than a 95% probability that the results are not random, which increases the significance of the results. This is evidence that would allow the researcher to reject the null hypothesis.
• The lower the p-value (p-value < alpha), the stronger the evidence that the null hypothesis can be rejected and the more plausible the alternative hypothesis becomes.
• The higher the p-value (p-value > alpha), the weaker the evidence against the null hypothesis, which renders the study inconclusive.
When conducting a hypothesis test to validate a claim, the researcher postulates two hypotheses - null (H0)
and alternate (H1). Formulating the null and alternate hypotheses is key to the usefulness that a P-test can
offer the researcher.


The null hypothesis states the widely accepted theory or claim that the study checks to see whether it can be rejected. The key point to remember is that the researcher always aims to refute the null hypothesis, and the P-test helps fulfil this aim. Another point to note is that if the P-test fails to refute the null hypothesis, the study is deemed inconclusive and is not in any way intended as an endorsement of the null hypothesis. The alternative hypothesis is the explanation the researcher has put forward to account for the phenomenon under study; as such, it must be the only, or the best, alternative explanation imaginable. In this way, if a rejection of the null hypothesis is supported by the p-value, the alternative hypothesis can be treated as true.


Self evaluation: Exercise 6

• To continue with the training, after learning the various steps involved in pattern recognition and anomaly detection, you are instructed to utilize the concepts to perform the following activity.

• You are instructed to write the following activity using Python code.

• Exercise 6: Polynomial regression for classification.

Figure 2-5. Self evaluation: Exercise 6 PAD011.0

Notes:


Z-test and t-test difference

• A z-test is a common and simple form of statistical analysis that measures whether a sample mean differs significantly from an expected population mean, but it requires knowledge of the population standard deviation, which is not always feasible.

• The t-test is a more practical form of analysis, since it requires only the standard deviation of the sample rather than that of the population.

Figure 2-6. Z-test and t-test difference PAD011.0

Notes:
Z-test and T-test
Knowing how statistics can affect the production of goods, especially in biotechnology, can help guide
investors to make more educated investment choices. For example, a basic understanding of the statistical
results for a promising drug's clinical trial can be invaluable in assessing a biotech stock's potential returns.


P-value

• In statistics, the p-value is the probability of obtaining results at least as extreme as those observed in a statistical hypothesis test, given that the null hypothesis is true.

• A lower p-value indicates stronger support for the alternative hypothesis.

Figure 2-7. P-value PAD011.0

Notes:
P-value
How is the p-value calculated?
P-values are found using p-value tables or statistical software and spreadsheets. Because different studies apply different levels of significance when examining a question, a reader may at times find it difficult to compare results from two separate tests. P-values solve this problem. For instance, if a study of the returns from two particular assets were undertaken by different researchers who used the same data but different significance levels, the researchers might come to opposite conclusions about whether the assets differ. To avoid this problem, the researchers should report the p-value of the hypothesis test and allow readers to judge the statistical significance themselves. This is called a p-value approach to hypothesis testing.
P-value approach to hypothesis testing
The p-value approach to hypothesis testing uses the calculated probability to determine whether there is evidence to reject the null hypothesis. The null hypothesis, also known as the conjecture, is the initial claim about a population (or data-generating process). The alternative hypothesis states that the population parameter differs from the value stated in the conjecture. The significance level is declared in advance to determine how small the p-value must be to reject the null hypothesis. A type I error is the incorrect rejection of the null hypothesis: it occurs when the null hypothesis is true but is rejected because a p-value smaller than the significance level (often 0.05) is obtained. The probability of a type I error is the significance level (again, often 0.05), which is the conditional probability of obtaining a p-value less than the significance level, given that the null hypothesis is true.
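A minimal sketch of this decision rule follows; the t statistic, degrees of freedom, and alpha are hypothetical, and SciPy supplies the t-distribution.

from scipy import stats

# Hypothetical two-tailed test result: t statistic, degrees of freedom, and alpha.
t_stat, dof, alpha = 2.3, 24, 0.05

p_value = 2 * stats.t.sf(abs(t_stat), dof)  # two-tailed p-value from the t-distribution

if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject (inconclusive)")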


Real-world example of P-value

Suppose an investor claims that the performance of their investment portfolio is equivalent to that of the Standard & Poor's (S&P) 500 Index. The investor performs a two-tailed test to evaluate this. The null hypothesis states that the returns of the portfolio are equivalent to the returns of the S&P 500 over a specified period, while the alternative hypothesis states that the returns of the portfolio and the returns of the S&P 500 are not equivalent. (If the investor carried out a one-tailed test, the alternative hypothesis would state that the portfolio's returns are either less than or greater than those of the S&P 500.) One commonly used level of significance is 0.05. If the investor finds that the p-value is less than 0.05, there is evidence against the null hypothesis. The investor would then reject the null hypothesis and accept the alternative hypothesis. The lower the p-value, the stronger the evidence against the null hypothesis. So if the investor finds that the p-value is 0.001, there is strong evidence against the null hypothesis, and the investor can confidently conclude that the portfolio's returns and the S&P 500's returns are not equivalent.
Alternatively, a p-value greater than 0.05 indicates that there is (at best) weak evidence against the conjecture, so the investor would not reject the null hypothesis. In that case, the differences observed between the investment portfolio data and the S&P 500 data are explainable by chance alone.


Descriptive statistics

• Descriptive statistics are brief descriptive coefficients that summarize a given data set, which can represent either the entire population or a sample of it.

• Descriptive statistics are divided into measures of central tendency and measures of variability (spread).

• Measures of central tendency include the mean, median, and mode, while measures of variability include the standard deviation, variance, the minimum and maximum variables, and the kurtosis and skewness.

Figure 2-8. Descriptive statistics PAD011.0

Notes:
Descriptive statistics:
Understanding descriptive statistics
In brief, descriptive statistics help describe and understand the features of a specific data set by giving short summaries of the data sample and its measures. The most recognized types of descriptive statistics are the measures of centre: the mean, median, and mode, which are used at almost all levels of maths and statistics. The mean, or average, is calculated by adding all the figures within the data set and then dividing by the number of figures in the set. For example, the sum of the data set (2, 3, 4, 5, 6) is 20, so the mean is 4 (20 ÷ 5). The mode of a data set is the value appearing most often, and the median is the figure situated in the middle of the data set: the figure that separates the higher figures from the lower figures within the set. Less common types of descriptive statistics, however, are also quite important.
Descriptive statistics are used to repurpose hard-to-understand quantitative insights from a large data set into bite-sized descriptions. For example, a student's Grade Point Average (GPA) provides a good illustration of descriptive statistics. The idea of a GPA is that it takes data points from a wide variety of exams, classes, and grades and averages them together to provide a general view of a student's overall academic ability. A student's GPA reflects their mean academic performance.


Measures of descriptive statistics: All descriptive statistics are either measures of central tendency or measures of variability, also known as measures of dispersion. Measures of central tendency focus on the average or middle values of data sets, while measures of variability focus on the dispersion of the data. These two measures use graphs, tables, and general discussions to help people understand the meaning of the data being analyzed. Measures of central tendency describe the centre position of a distribution for a data set. A person analyzes the frequency of each data point in the distribution and describes it using the mean, median, or mode, which measure the most common patterns of the data set being analyzed.
Measures of variability, or measures of spread, help determine how spread out the distribution of a data set is. For example, while measures of central tendency may give a person the average of a data set, they do not describe how the data are distributed within the set. So although the average of the data may be 65 out of 100, there can still be data points at both 1 and 100. Measures of variability help communicate this by describing the shape and spread of the data set. Range, quartiles, absolute deviation, and variance are all examples of measures of variability. Consider the data set 5, 19, 24, 62, 91, 100. The range of that data set is 95, obtained by subtracting the lowest number (5) from the highest (100); a small code sketch follows.
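The short sketch below computes central-tendency and spread measures for that data set; the statistics module is part of the Python standard library.

import statistics

data = [5, 19, 24, 62, 91, 100]  # the data set from the example above

print("mean:", statistics.mean(data))          # central tendency
print("median:", statistics.median(data))      # middle value
print("range:", max(data) - min(data))         # 100 - 5 = 95
print("variance:", statistics.variance(data))  # sample variance (spread)
print("stdev:", statistics.stdev(data))        # sample standard deviation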


Self evaluation: Exercise 7

• To continue with the training, after learning the various steps involved in pattern recognition and anomaly detection, you are instructed to utilize the concepts to perform the following activity.

• You are instructed to write the following activity using Python code.

• Exercise 7: Neural networks.

Figure 2-9. Self evaluation: Exercise 7 PAD011.0

Notes:

Type I error

• A type I error occurs when the null hypothesis is rejected during the hypothesis-testing phase even though it is true and should not be rejected.

Figure: Type I error


Source: https://images.app.goo.gl/h3GAMnAgikgkrQ3GA

Figure 2-10. Type I error PAD011.0

Notes:
Type I error
In hypothesis testing, a null hypothesis is established before the start of a study. In some cases, the null hypothesis assumes that there is no cause-and-effect relationship between the item being tested and the stimulus applied to the test subject to trigger a response. However, errors can occur whereby the null hypothesis is rejected, implying that a cause-and-effect relationship exists between the test variables, when in fact the result is a false positive. These false positives are called type I errors.
Understanding a type I error: Hypothesis testing is a process whereby a conjecture is tested using sample data. The test is designed to provide evidence that the conjecture or hypothesis is supported by the data being tested. A null hypothesis is the belief that there is no statistical significance or effect between the two data sets, variables, or populations considered in the hypothesis. Typically, a researcher will try to disprove the null hypothesis. For example, suppose the null hypothesis states that an investment strategy does not perform any differently than a market index, such as the S&P 500. The researcher would take samples of data and test the historical performance of the investment strategy to determine whether the strategy performed at a level higher than the S&P 500. If the test results showed that the strategy performed at a higher rate than the index, the null hypothesis would be rejected.
This condition is often denoted "n = 0." If, when the test is conducted, the result seems to indicate that the stimulus applied to the test subject caused a reaction, the null hypothesis, which states that the stimulus does not affect the test subject, would in turn have to be rejected. Ideally, a null hypothesis should never be rejected if it is found to be true, and it should always be rejected if it is found to be false. However, there are situations where errors can occur.


False positive type I error: Sometimes, rejecting the null hypothesis that there is no relationship between the test subject, the stimulus, and the outcome can be incorrect. If something other than the stimulus causes the outcome of the test, it produces a "false positive" result: it appears that the stimulus acted on the subject, but the outcome was actually caused by chance. Such a "false positive," leading to an incorrect rejection of the null hypothesis, is called a type I error. A type I error rejects an idea that should not have been rejected.
Examples of type I errors: Consider the trial of an accused criminal, for instance. The null hypothesis is that the person is innocent, while the alternative is that they are guilty. A type I error in this case would mean that the person is not found innocent and is sent to jail despite actually being innocent. In medical testing, a type I error would create the appearance that a treatment for an illness reduces its severity when in fact it does not. When a new medicine is being tested, the null hypothesis is that the medicine does not affect the progression of the disease. Suppose a laboratory is researching a new cancer drug. Its null hypothesis might be that the drug does not affect the growth rate of cancer cells.
After the drug is administered to the cancer cells, the cancer cells stop growing. This would cause the researchers to reject their null hypothesis that the drug would have no effect. If the drug caused the growth to stop, the conclusion to reject the null would be correct in this case. However, if something else during the test, rather than the administered drug, caused the growth to stop, it would be an example of an incorrect rejection of the null hypothesis, that is, a type I error.


Type II error

• In statistical analysis, rejecting a true null hypothesis is a type I error, while a type II error describes the error that occurs when a null hypothesis that is actually false is not rejected.

• In other words, a type II error produces a false negative. The error rejects the alternative hypothesis even though the result does not arise out of chance.

Figure: Confusion Matrix


Source: https://images.app.goo.gl/vjRF7VdSvThqaGbY9

Figure 2-11. Type II error PAD011.0

Notes:
Type II error: A type II error is a statistical term referring to the failure to reject a null hypothesis that is actually false. It arises within the context of hypothesis testing.
Understanding type II errors: A type II error confirms an idea that should have been rejected, claiming that two observations are the same even though they differ. A type II error does not reject the null hypothesis even though the alternative hypothesis is the true state of nature. In other words, a false finding is accepted as true. A type II error is sometimes called a beta error. The chance of a type II error can be reduced by making the criteria for rejecting the null hypothesis less stringent. For example, if an analyst considers anything outside a ±95 percent confidence interval as statistically significant rather than requiring ±99 percent, the likelihood of a false negative decreases. Doing this, however, raises the odds of committing a type I error at the same time. When running a hypothesis test, attention should be paid to the probability of committing either a type I error or a type II error.


Differences between type I and type II errors

Figure: type I and II error


Source: https://images.App.Goo.Gl/sr5bvbv93kw9hm378

Figure 2-12. Differences between type I and type II errors PAD011.0

Notes:
Differences between type I and type II errors
A type I error (false positive) occurs when a true null hypothesis is rejected. The probability of committing a type I error is equal to the significance level set for the hypothesis test; if the significance level is 0.05, there is a 5 percent chance of a type I error. The probability of committing a type II error is equal to one minus the power of the test, also known as beta (β). The power of the test can be increased by increasing the sample size, which decreases the risk of committing a type II error; a code sketch of this trade-off follows.
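A hedged sketch of this trade-off, assuming the statsmodels package is available: it estimates the sample size per group needed to keep the type II error low for a given effect size and alpha, and shows how power drops when the sample is small.

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group for a medium effect size (0.5), alpha = 0.05, power = 0.8
# (that is, beta, the type II error probability, is 0.2).
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print("samples per group:", round(n_per_group))   # roughly 64

# Power achieved with only 30 samples per group: beta rises accordingly.
power_30 = analysis.power(effect_size=0.5, nobs1=30, alpha=0.05)
print("power with n=30:", round(power_30, 2), "beta about", round(1 - power_30, 2))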
Example of a type II error
Suppose a pharmaceutical company wants to compare how effective two of its drugs are for treating diabetes. The null hypothesis (H0) states that the two medications are equally effective; this is the claim the company hopes to reject using a one-tailed test. The alternative hypothesis (HA) states that the two drugs are not equally effective; it is the claim that would be supported by rejecting the null hypothesis. The company runs a clinical trial on 3,000 diabetes patients to compare the treatments and expects the two groups of patients to show that both medications are effective. A significance level of 0.05 is chosen, which means the company accepts a 5 percent risk of rejecting the null hypothesis when it is in fact true, that is, of committing a type I error. Suppose the beta is 0.025, or 2.5%; then the probability of committing a type II error is 2.5%. If the two medications are not equally effective, the null hypothesis should be rejected. However, if the company fails to reject the null hypothesis when the drugs are not equally effective, a type II error occurs.


Null hypothesis

Figure: Null Hypothesis


Source: https://www.thoughtco.com/null-hypothesis-examples-609097

Figure 2-13. Null hypothesis PAD011.0

Notes:
Null hypothesis: A null hypothesis is a type of hypothesis used in statistics proposing that no statistical significance exists in a given population parameter (or data-generating process).
For example, a gambler may want to know whether a game of chance is fair. If it is fair, the expected earnings per play are zero for all players. If the game is not fair, the expected earnings are positive for one player and negative for another. To test whether the game is fair, the gambler collects earnings data from many repetitions of the game, calculates the average earnings from these samples, and tests the null hypothesis that the expected earnings do not differ from zero. If the average of the sample data is sufficiently far from zero, the gambler rejects the null hypothesis and accepts the alternative hypothesis, namely that the expected earnings per play are different from zero. If the average of the sample data is near zero, the gambler does not reject the null hypothesis, concluding instead that the difference between the average and zero is explainable by chance alone.
How a null hypothesis works: The null hypothesis, also known as the conjecture, assumes that any kind of difference between the chosen characteristics seen in a set of data is due to chance. For example, if the expected earnings for the gambling game are truly zero, then any difference between the average earnings in the data and zero is due to chance.
Statistical hypotheses are tested using a four-step process. The first step is to state the two hypotheses so that only one can be right. The next step is to formulate an analysis plan, outlining how the data will be evaluated. The third step is to carry out the plan and physically analyze the sample data. The fourth and final step is to analyze the results and either reject the null hypothesis or state that the observed differences are explainable by chance alone. Analysts aim to reject the null hypothesis because doing so is a strong conclusion; the alternative conclusion, that the results are "explainable by chance alone," is a weak one, since it allows that factors other than chance may be at work but not strongly enough for the test to detect them.


Null hypothesis example

Here is a simple example: A school principal claims that students at her school score an average of 7 out of 10 in examinations. The null hypothesis is that the population mean is 7.0. To test this null hypothesis, we record the scores of, say, 30 students sampled from the whole student population of the school (say 300) and calculate the mean of that sample. We can then compare the (calculated) sample mean to the (hypothesized) population mean of 7.0 and attempt to reject the null hypothesis. (Note that the null hypothesis, that the population mean is 7.0, cannot be proven using the sample data; it can only be rejected.)
Take another example: the mean annual return of a particular mutual fund is claimed to be 8 percent. Assume the mutual fund has been in existence for 20 years. The null hypothesis is that the mean annual return is 8% for the mutual fund. We take a random sample of the fund's annual returns over, say, ten years and calculate the sample mean. We then compare the (calculated) sample mean to the (claimed) population mean (8%) to test the null hypothesis.
Null hypotheses for the two cases are:
• Example A: Students at the school score an average of 7 out of 10 in examinations.
• Example B: The mutual fund's mean annual return is 8% per year.
For the purposes of the analysis, the null hypothesis (abbreviated H0) is assumed to be true in order to decide whether it should be rejected. This assumption determines the set of possible values of the sample estimate (for example, the average score of 30 student tests) that is consistent with it, say an acceptable range from 6.2 to 7.8 when the population mean is 7.0. If the sample average falls outside this range, the null hypothesis is rejected. Otherwise, the difference is said to be "explainable by chance alone," falling within the range allowed by chance. A code sketch of this one-sample check follows.
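A possible sketch of that check with SciPy; the scores for the 30 sampled students are hypothetical, since the actual scores are not given in the text.

from scipy import stats

# Hypothetical scores (out of 10) for 30 sampled students; H0: population mean = 7.0.
scores = [7, 8, 6, 7, 9, 5, 7, 8, 6, 7, 7, 8, 5, 6, 7,
          9, 7, 6, 8, 7, 7, 5, 8, 7, 6, 7, 8, 7, 6, 7]

result = stats.ttest_1samp(scores, popmean=7.0)
print(result.statistic, result.pvalue)  # a large p-value gives no reason to reject H0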
One crucial point to note is that the null hypothesis is assumed to be true for testing purposes, and any claim the researcher wishes to establish is stated in the alternative hypothesis (H1). For the two cases above, the alternative hypotheses are:
• The average student score is not 7.
• The mean annual return of the mutual fund is not equal to 8% per year. In other words, the alternative hypothesis is a direct contradiction of the null hypothesis.
Hypothesis testing for investments
As an example related to financial markets, assume Alice believes that her investment strategy produces higher average returns than simply buying and holding a stock. The null hypothesis states that there is no difference between the two average returns, and Alice must assume this until she proves otherwise. Refuting the null hypothesis requires showing statistical significance, which can be checked with a variety of tests. The alternative hypothesis states that the investment strategy's returns are higher than those of a traditional buy-and-hold approach. The p-value is used to determine the statistical significance of the results; a p-value less than or equal to 0.05 is often taken to indicate evidence against the null hypothesis. If Alice conducts one of these tests, such as a t-test on the returns, and shows that her returns and the buy-and-hold returns differ significantly (p-value less than or equal to 0.05), she can reject the null hypothesis and accept the alternative hypothesis.


Statistical significance

• Statistical significance refers to an analyst's determination that the findings in the data cannot be explained by chance alone.

• The means by which the analyst makes this determination is statistical hypothesis testing.

Figure: Probability and Significance overview


Source: https://images.app.goo.gl/jAxPTMP9VVxHhdHs6

Figure 2-14. Statistical significance PAD011.0

Notes:
Statistical significance
Understanding statistical significance: Statistical significance is a determination about the null hypothesis, which hypothesizes that the findings are due to chance alone. A data set provides statistical significance when its p-value is small enough. If the p-value is large, the test results can be explained by chance alone and the data are deemed consistent with (while not proving) the null hypothesis. When the p-value is sufficiently small (for example, 5 percent or less), the results are not easily explained by chance alone and the data are deemed inconsistent with the null hypothesis; in this case the null hypothesis of chance alone as an explanation of the findings is rejected in favour of a more systematic explanation.
Example of statistical significance

Suppose Joe Sample, a financial analyst, is curious as to whether some investors had advance knowledge of a company's sudden collapse. Joe decides to compare the average daily stock returns just before the collapse with those from well before it, to determine whether there is a statistically significant difference between the two figures. Suppose the p-value of the study is 28 percent (greater than 5 percent), indicating that a difference as large as the one observed (-0.0033 to +0.0007) is not unusual under the chance-only explanation. On the other hand, if the p-value is 0.01 percent (far less than 5 percent), the observed difference would be very unusual under the chance-only explanation. In that case, Joe may decide to reject the null hypothesis and investigate further whether some traders had advance knowledge.


Statistical significance is also often used when evaluating new pharmaceutical products, including drugs, devices, and vaccines. Publicly available reports of statistical significance inform investors about how successful a company is at launching new products. For example, a pharmaceutical leader in diabetes medication reported that there was a statistically significant reduction in type 1 diabetes when it tested its new insulin. The trial consisted of 30 weeks of supervised monitoring of diabetes patients, and the findings gave a p-value of less than 5%. This indicates to investors and regulatory agencies that the results show a statistically significant reduction in type 1 diabetes. Announcements of the statistical significance of new drugs often greatly affect the stock prices of pharmaceutical companies.


Self evaluation: Exercise 8

• To continue with the training, after learning the various steps involved in pattern recognition and anomaly detection, you are instructed to utilize the concepts to perform the following activity.

• You are instructed to write the following activity using Python code.

• Exercise 8: Sparse kernel machines.

Figure 2-15. Self evaluation: Exercise 8 PAD011.0

Notes:


Hypothesis testing

Figure: Hypothesis testing Process


Source: https://images.app.goo.gl/DcJE1FPHEm9xLeT89

Figure 2-16. Hypothesis testing PAD011.0

Notes:
Hypothesis testing: In statistics, hypothesis testing is an act whereby an analyst tests an assumption about a population parameter. The methodology the analyst uses depends on the nature of the data and the reason for the analysis. Hypothesis testing uses sample data to assess the plausibility of a hypothesis. Such data may come from a larger population or from a data-generating process; the word "population" will be used for both of these cases in the descriptions below.
How hypothesis testing works?
In hypothesis testing, an analyst tests a statistical sample, with the goal of providing evidence on the plausibility of the null hypothesis. Statistical analysts test a hypothesis by measuring and examining a random sample of the population being analyzed. All analysts use a random population sample to test two different hypotheses:
• The null hypothesis.
• The alternative hypothesis.
The null hypothesis is usually a hypothesis of equality between population parameters; for example, a null hypothesis may state that the population mean return is equal to zero. The alternative hypothesis is effectively the opposite of the null hypothesis; for example, that the population mean return is not equal to zero. The two hypotheses are therefore mutually exclusive, and only one can be true; one of the two, however, will always be true.


Four steps of hypothesis testing

• The first step is to state the two hypotheses so that only one can be right.

• The next step is to formulate an analysis plan outlining how the data will be evaluated.

• The third step is to carry out the plan and physically analyze the sample data.

• The fourth and final step is to analyze the results and either reject the null hypothesis or state that the null hypothesis is plausible given the data.

Figure 2-17. Four steps of hypothesis testing PAD011.0

Notes:
Four steps of hypothesis testing: All hypotheses are tested using the four-step process stated above.


Real-world example of hypothesis testing

Figure: Beta Risk


Source: https://images.app.goo.gl/d8JcqwQ4Lws3oXN19

Figure 2-18. Real-world example of hypothesis testing PAD011.0

Notes:
Real-world example of hypothesis testing
For example, if a person wants to test whether a penny has exactly a 50 percent chance of landing on heads, the null hypothesis would be that it does, and the alternative hypothesis would be that it does not (it does not land on heads half the time). Mathematically, the null hypothesis would be written as H0: P = 0.5. The alternative hypothesis would be denoted "Ha" and would be identical to the null hypothesis except with the equals sign struck through, meaning that the probability is not equal to 50%. A random sample of 100 coin flips is taken, and the null hypothesis is then tested. If the 100 coin flips turn out to be distributed as 40 heads and 60 tails, the analyst would conclude that the penny does not have a 50 percent chance of landing on heads and would reject the null hypothesis and accept the alternative hypothesis.
If, on the other hand, there were 48 heads and 52 tails, the coin could plausibly be fair and still produce such a result. In cases such as this, where the null hypothesis is "accepted," the analyst states that the difference between the expected results (50 heads and 50 tails) and the observed results (48 heads and 52 tails) is "explainable by chance alone." A sketch of this coin-flip check follows.
Beta risk: Beta risk is the probability that a statistical test will accept a false null hypothesis. This is also known as a type II error. In this context, the word "risk" refers to the chance or likelihood of making an incorrect decision. The primary determinant of beta risk is the sample size used for the test.


Understanding beta risk: Beta risk can be described as the risk found in incorrectly accepting the null hypothesis when the alternative hypothesis is actually true. Put simply, it is the position that there is no difference when, in fact, there is one. A statistical test is used to detect differences, and beta risk is the probability that the statistical test will be unable to do so. For example, if beta risk is 0.05, there is a 5 percent likelihood of the test failing to detect the difference.
Beta risk is sometimes referred to as "beta error" and is often paired with "alpha risk," also known as a type I error. Alpha risk is the error that occurs when a true null hypothesis is rejected. The easiest way to decrease alpha risk is to increase the size of the sample being tested, in the hope that the larger sample will be more representative of the population. Beta risk is based on the characteristics and nature of the decision being taken and may be determined by a company or individual. It depends on the magnitude of the variance between sample means. The way to manage beta risk is by increasing the test's sample size. An acceptable level of beta risk in decision making is about 10 percent; anything higher should trigger an increase in the sample size.
Examples of beta risk
The Altman Z-score offers a notable application of hypothesis testing in finance. The Z-score is a statistical model intended to forecast the future bankruptcy of companies based on certain financial indicators. Statistical tests of the Z-score's accuracy have shown reasonably good results, predicting bankruptcy within one year. These tests show a beta risk (firms predicted to go bankrupt but that did not) ranging from roughly 15 percent to 20 percent, depending on the sample being tested.
Beta risk vs. beta
Beta, in the financial sense, also refers to the beta coefficient, which measures the volatility, or systematic risk, of a security or portfolio relative to the market as a whole. In short, the beta of an investment indicates how sensitive it is to market movements. It is a component of the capital asset pricing model (CAPM), which calculates the expected return of an asset based on its beta and expected market returns. As such, beta is only tangentially related to beta risk in the decision-making sense.


Bonferroni test

• A Bonferroni test is a type of multiple comparison test used in statistical analysis. When many hypothesis tests are performed on a set of variables, a result can appear statistically significant for a dependent variable purely by chance, even though no real effect exists.

Figure 2-19. Bonferroni test PAD011.0

Notes:
Bonferroni test: Even if a single test is correct 99 percent of the time, running 100 such tests is likely to produce a false result somewhere in the mix. By adjusting the comparison threshold, the Bonferroni test attempts to prevent spurious findings from appearing statistically significant. The Bonferroni procedure, also known as the "Bonferroni correction" or "Bonferroni adjustment", requires the p-value for each individual test to be compared against the alpha level divided by the number of tests.
Understanding the Bonferroni test: The Bonferroni test is named after the Italian mathematician who devised it, Carlo Emilio Bonferroni (1892–1960). The Scheffe test and the Tukey-Kramer method are other types of multiple comparison tests. A criticism of the Bonferroni test is that it is too conservative and may fail to flag results that are genuinely significant. In statistics, a null hypothesis is simply the claim that there is no significant difference between two data sets. To test a hypothesis, a statistical analysis is performed on a randomly chosen sample of the population in order to confirm or reject the null. Whenever the null hypothesis is tested, the alternative hypothesis is also evaluated, since the two outcomes are mutually exclusive. There is, however, a chance that a test of a null hypothesis produces a false-positive result. Such an error is called a type I error, and its expected frequency is the error rate. A test is usually assigned a 5% error rate, meaning that its conclusion is expected to be wrong 5% of the time; this 5% error rate is known as the alpha level. However, when several comparisons are made on the same sample, the error rate accumulates across the comparisons and generates false-positive results. Bonferroni devised a way to control this inflated error rate: the alpha value is divided by the number of comparisons. Using the 5% error rate, two tests would each be given an error rate of 0.025 (0.05/2), and for four tests the error rate would be 0.0125 (0.05/4).
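The division of alpha by the number of tests can be written out in a few lines of Python. The p-values below are invented for illustration; statsmodels also offers this correction, but the plain version keeps the arithmetic from the paragraph above visible.

# Illustrative sketch: Bonferroni correction applied to a set of raw p-values.
p_values = [0.001, 0.020, 0.030, 0.700]    # assumed p-values from 4 independent tests
alpha = 0.05
adjusted_alpha = alpha / len(p_values)      # 0.05 / 4 = 0.0125, as in the text

for i, p in enumerate(p_values, start=1):
    decision = "reject H0" if p < adjusted_alpha else "fail to reject H0"
    print(f"test {i}: p = {p:.3f} vs alpha/m = {adjusted_alpha:.4f} -> {decision}")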


Check of one-tailed IBM ICE (Innovation Centre for Education)


IBM Power Systems

• A one-tail test is a statistical test in which a distribution's critical area is unilateral so that the
value is either greater or lower than a certain value but not both.

• If the test sample falls into the critical unilateral zone, an alternative hypothesis rather than
the null hypothesis is accepted.

Figure 2-20. Check of one-tailed PAD011.0

Notes:
Check of one-tailed
The essence of a one-tailed test: Hypothesis testing is a central principle of inferential statistics. Hypothesis tests are carried out to determine whether a claim about a population parameter is true or not. A test designed to show whether the sample mean is significantly greater than or significantly less than the population mean is called a two-tailed test. When the test is set up to show only that the sample mean is higher than (or only lower than) the population mean, it is called a one-tailed test. The one-tailed test gets its name from testing the area under one of the tails (sides) of a normal distribution, although the approach can also be used for other, non-standard distributions. Before the one-tailed test can be performed, null and alternative hypotheses must be established. The null hypothesis is the claim the researcher attempts to reject; the alternative hypothesis is the statement that contradicts the null hypothesis.
One-tailed test example
Let us say an analyst wants to show that a portfolio manager outperformed the S&P 500 index, which returned 16.91% for the year. The hypotheses may be defined as follows:
H0: μ = 16.91
Ha: μ > 16.91
The null hypothesis is the claim the analyst hopes to reject. The alternative hypothesis is the analyst's claim that the portfolio manager beat the S&P 500. If the one-tailed test results in rejecting the null, the alternative hypothesis is supported.


On the other hand, if the outcome of the test fails to reject the null, the analyst may carry out further analysis of the portfolio manager's performance. In a one-tailed test, the rejection region lies on only one side of the sampling distribution. To determine how the fund's return compares with the market index, the analyst must run an upper-tailed significance test in which extreme values fall in the upper tail (right side) of the normal distribution curve. The one-tailed test carried out in the upper tail of the curve will show the analyst how much greater the portfolio return is than the index return and whether the difference is significant.
The significance levels (p-value thresholds) most commonly used in a one-tailed test are 1%, 5%, or 10%.
Significance in a one-tailed test
To determine how significant the difference in returns is, a significance level must be established. The significance level is almost always expressed by the letter p, which stands for probability: it is the probability of wrongly rejecting the null hypothesis when it is actually true. The significance level in a one-tailed test can be 1%, 5%, or 10%, although the researcher may choose another threshold. The p-value is computed on the assumption that the null hypothesis is true; the smaller the p-value, the stronger the evidence that the null hypothesis is false.
If the resulting p-value is less than 5%, then the difference between the two observations is statistically significant, and the null hypothesis is rejected. Following our example above, if p-value = 0.03, or 3%, then the analyst can be 97% confident that the portfolio returns did not equal or fall below the return of the market for the year. He will, therefore, reject H0 and accept the claim that the portfolio manager outperformed the index. Note that the p-value reported by a one-tailed test is half the p-value that a two-tailed test would report on the same data.
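An upper-tailed test of the portfolio example can be sketched as follows. The sample of yearly returns is invented purely to make the code runnable; only the hypothesized mean of 16.91 comes from the example above, and scipy is assumed to be available.

# Illustrative sketch: one-tailed (upper-tail) t-test of H0: mu = 16.91 vs Ha: mu > 16.91.
import numpy as np
from scipy import stats

mu0 = 16.91                                      # index return from the example
returns = np.array([18.2, 17.5, 19.1, 16.8, 20.3, 17.9, 18.6])  # assumed sample

t_stat = (returns.mean() - mu0) / (returns.std(ddof=1) / np.sqrt(len(returns)))
p_one_tailed = stats.t.sf(t_stat, df=len(returns) - 1)   # area in the upper tail only

print(f"t = {t_stat:.3f}, one-tailed p-value = {p_one_tailed:.4f}")
# Reject H0 at the 5% significance level only if p_one_tailed < 0.05.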
By using a one-tailed approach, the analyst tests for the possibility of a relationship in one direction and completely ignores the possibility of a relationship in the other direction. Using our example above, the analyst is interested in whether the portfolio return is greater than the market return. In this case, the analyst does not particularly care about the situation in which the portfolio underperformed the S&P 500 index. Therefore, a one-tailed test is appropriate only when the outcome in the other direction of the spectrum does not matter.
When you roll a fair die, the outcomes are 1 to 6, and each outcome is equally likely; this is the basis of the uniform distribution. Unlike the Bernoulli distribution, all n possible outcomes of a uniform distribution are equally likely.
A variable X is said to be uniformly distributed if its density function is f(x) = 1/(b - a) for a ≤ x ≤ b (and 0 otherwise), where a and b are the parameters of the uniform distribution.

Figure: Uniform distribution curve


Source: https://images.app.goo.gl/yZKbbZ61owgMWRs89
The number of bikes sold daily in a showroom is uniformly distributed, with a maximum of 40 and a minimum of 10 per day.
Let us try to estimate the likelihood that daily sales will fall between 15 and 30.
The probability that daily sales fall between 15 and 30 is (30 - 15) * (1/(40 - 10)) = 0.5.
Likewise, the probability of daily sales above 20 is (40 - 20) * (1/(40 - 10)) = 0.667.
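The bike-sales probabilities above can be checked with scipy's uniform distribution, which is parameterized by loc = a and scale = b - a; this sketch is a direct transcription of the arithmetic above.

# Verifying the uniform-distribution example: X ~ Uniform(a = 10, b = 40).
from scipy.stats import uniform

a, b = 10, 40
X = uniform(loc=a, scale=b - a)

p_15_to_30 = X.cdf(30) - X.cdf(15)     # expected 0.5
p_above_20 = 1 - X.cdf(20)             # expected 0.667
print(f"P(15 <= X <= 30) = {p_15_to_30:.3f}, P(X > 20) = {p_above_20:.3f}")
print(f"mean = {X.mean():.1f}, variance = {X.var():.1f}")   # (a+b)/2 and (b-a)^2/12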


The mean and variance of a uniform distribution are:


Mean = (a+b)/2
Variance = (b-a)²/12
The standard uniform distribution has parameters a = 0 and b = 1, so its PDF (Probability Density Function) is given by f(x) = 1 for 0 ≤ x ≤ 1, and f(x) = 0 otherwise.

Student Notebook

Probability distributions IBM ICE (Innovation Centre for Education)


IBM Power Systems

Figure: Discrete data and continuous data


Source: https://images.app.goo.gl/JdPuZvCNUxvNRV9S7

Figure 2-21. Probability distributions PAD011.0

Notes:
Probability distributions: As per Wikipedia (https://en.wikipedia.org/wiki/probability_distribution), “In
probability theory and statistics, a probability distribution is the mathematical function that gives the
probabilities of occurrence of different possible outcomes for an experiment. More specifically, the probability
distribution is a mathematical description of a random phenomenon in terms of the probabilities of events. For
instance, if the random variable X is used to denote the outcome of a coin toss, then the probability
distribution of X would take the value 0.5 for X = heads, and 0.5 for X = tails (assuming the coin is fair).
Examples of random phenomena can include the results of an experiment or survey.
A probability distribution is a mathematical function that has a sample space as its input and gives a
probability as its output. The sample space is the set of all possible outcomes of a random phenomenon
being observed; it may be the set of real numbers or a set of vectors, or it may be a list of non-numerical
values. For example, the sample space of a coin flip would be {heads, tails}.
Probability distributions are generally divided into two classes. A discrete probability distribution (applicable to
the scenarios where the set of possible outcomes is discrete, such as a coin toss or a roll of dice) can be
encoded by a discrete list of the probabilities of the outcomes, known as a probability mass function. On the
other hand, a continuous probability distribution (applicable to the scenarios where the set of possible
outcomes can take on values in a continuous range (e.g. real numbers), such as the temperature on a given
day) is typically described by probability density functions (with the probability of any individual outcome
being 0). The normal distribution is a commonly encountered continuous probability distribution. More
complex experiments, such as those involving stochastic processes defined in continuous time, may demand
the use of more general probability measures.


A probability distribution whose sample space is one-dimensional (for example real numbers, list of labels, ordered labels, or binary) is called univariate, while a distribution whose sample space is a vector space of dimension 2 or more is called multivariate. A univariate distribution gives the probabilities of a single random variable taking on various alternative values; a multivariate distribution (a joint probability distribution) gives the probabilities of a random vector (a list of two or more random variables) taking on various combinations of values. Important and commonly encountered univariate probability distributions include the binomial distribution, the hypergeometric distribution, and the normal distribution. The multivariate normal distribution is a commonly encountered multivariate distribution”.
The mathematical representation of the binomial distribution is given by P(X = k) = C(n, k) * p^k * (1 - p)^(n - k), where n is the number of trials, k is the number of successes, and p is the probability of success on each trial.
The figure below shows a binomial distribution graph in which the probability of success does not equal the probability of failure.

Figure: Binomial distribution


Source: https://images.app.goo.gl/syLF4xUhbBuTvaTi7
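A minimal sketch of the binomial formula above, using scipy; the coin-toss numbers (10 tosses of a fair coin) are assumed for illustration.

# Illustrative sketch: binomial probabilities P(X = k) = C(n, k) * p^k * (1 - p)^(n - k).
from scipy.stats import binom

n, p = 10, 0.5                               # assumed: 10 coin tosses, fair coin
print(binom.pmf(3, n, p))                    # probability of exactly 3 heads
print(binom.cdf(3, n, p))                    # probability of at most 3 heads
print(binom.mean(n, p), binom.var(n, p))     # n*p and n*p*(1 - p)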
If you work in a call center, how many calls will you receive in a day? It can be any number, and the total number of telephone calls in a day follows a Poisson distribution. Additional examples are:
• The number of emergency calls recorded at a hospital in one day.
• The number of robberies reported in an area in one day.
• The number of clients who arrive at the lounge in one hour.
• The number of suicides registered in a specific place.
• The number of printing errors on each page of a book.
The Poisson distribution applies in cases where events occur at random points in time and space, and our interest lies only in the number of occurrences of the event.


When the following assumptions are valid, a distribution is called a Poisson distribution:
• Any successful event should not influence the outcome of another successful event.
• The probability of success is the same for any two intervals of equal length (the event rate is constant).
• The probability of success in an interval approaches zero as the interval becomes smaller.
Formula with example:
P(X = x) = e^(-λt) * (λt)^x / x!, where:
X = number of successes = 35.
t = length of the time interval = 10 min.
λ = average number of successes per unit time.
e = Euler's constant (approximately 2.71828).
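A short sketch of the Poisson calculation set up above. The text gives x = 35 successes over t = 10 minutes but not the average rate, so a rate of 4 calls per minute is assumed here purely to make the example concrete; scipy is assumed to be available.

# Illustrative sketch: Poisson probability P(X = x) = e^(-lambda*t) * (lambda*t)^x / x!
from scipy.stats import poisson

lam = 4      # assumed average number of calls per minute (not given in the text)
t = 10       # length of the interval in minutes, from the example
x = 35       # number of successes, from the example

mu = lam * t                              # expected number of calls in the interval
print(f"P(X = 35) = {poisson.pmf(x, mu):.4f}")
print(f"P(X <= 35) = {poisson.cdf(x, mu):.4f}")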

Figure: Poisson distribution


Source: https://images.app.goo.gl/V3kMk5Utv2SsVzEx5


Self evaluation: Exercise 9 IBM ICE (Innovation Centre for Education)


IBM Power Systems

• To continue with the training, after learning the various steps involved in pattern recognition
and anomaly detection, it is instructed to utilize the concepts to perform the following activity.

• You are instructed to write the following activities using python code.

• Exercise 9: Sampling methods for pattern recognition.

Figure 2-22. Self evaluation: Exercise 9 PAD011.0

Notes:


Types of distributions IBM ICE (Innovation Centre for Education)


IBM Power Systems

• Bernoulli distribution:
– The Bernoulli distribution describes events with exactly two possible real-life outcomes. Several illustrations of such events are as follows: a team wins a tournament or it does not, a student passes or fails an assessment, and a rolled die shows either a 6 or another number.
– A Bernoulli distribution has only two possible outcomes, namely 1 (success) and 0 (failure), and a single trial.

• Therefore, a random variable X with a Bernoulli distribution takes the value 1 with the probability of success (p), and the value 0 with the probability of failure (q, or 1 - p).

• In a coin toss, the occurrence of a head is a success and the occurrence of a tail is a failure. The probability of getting a head = 0.5 = the probability of getting a tail, since only these two outcomes are possible. A small coding sketch of this Bernoulli variable follows.
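The coin-toss Bernoulli variable in the bullet above can be written out with scipy; only p = 0.5 comes from the text, and the simulation size is an arbitrary choice.

# Illustrative sketch: Bernoulli(p = 0.5) for a single fair coin toss.
from scipy.stats import bernoulli

p = 0.5
print(bernoulli.pmf(1, p))       # P(success, i.e. head) = 0.5
print(bernoulli.pmf(0, p))       # P(failure, i.e. tail) = 0.5
print(bernoulli.rvs(p, size=10, random_state=0))   # ten simulated tosses (0s and 1s)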

Figure 2-23. Types of distributions PAD011.0

Notes:
Uniform distribution: When you roll a fair die, the outcomes are 1 to 6 and each outcome is equally likely; this is the basis of the uniform distribution. Unlike the Bernoulli distribution, all n possible outcomes of a uniform distribution are equally likely.
Exponential distribution: Take the call center example once again. What about the time interval between calls? Here the exponential distribution comes to our rescue: the time interval between calls is modeled by an exponential distribution.
Additional examples are:
• The time between arrivals of metro trains.
• The lifetime of a laptop.
The exponential distribution is also widely used for survival analysis, from the expected life of a machine to the expected life of a human.


Example: Let x be the amount of time (in minutes) taken by an office peon to deliver a document from the manager's desk to the clerk's desk. The time taken is assumed to have an exponential distribution, with the average amount of time equal to five minutes.
Given that x is a continuous random variable since time is measured.
Average, μ = 5 minutes
Therefore, scale parameter, λ = 1 / μ = 1 / 5 = 0.20
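Using the office example above (mean 5 minutes, so λ = 0.2), a short sketch with scipy; the 4-minute and 10-minute query points are assumed values chosen only for illustration.

# Illustrative sketch: exponential distribution with mean = 5 minutes (lambda = 0.2).
from scipy.stats import expon

mean_time = 5.0
X = expon(scale=mean_time)        # scipy parameterizes the exponential by scale = 1/lambda

print(f"P(delivery takes <= 4 min) = {X.cdf(4):.3f}")
print(f"P(delivery takes > 10 min) = {X.sf(10):.3f}")
print(f"mean = {X.mean():.1f} min, standard deviation = {X.std():.1f} min")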
Binomial distribution: A binomial distribution describes the probability of a SUCCESS or FAILURE outcome in an experiment or survey that is repeated multiple times. The binomial is a type of distribution with two possible outcomes (the "bi" prefix means two, or twice). A coin toss, for instance, has only two possible results: heads or tails; likewise, a test can have only two possible results: pass or fail. There are many real-life examples of binomial distributions. For instance, if a new medication is developed to treat a disease, it either cures the disease (a success) or it does not (a failure); when you buy a lottery ticket, you either win money or you do not. Anything you can think of that can only succeed or fail can be represented by a binomial distribution.
Normal distribution: The normal distribution is the most important probability distribution because it fits many natural phenomena. For instance, heights, blood pressure, measurement error, and IQ scores all follow a normal distribution. It is also known as the Gaussian distribution or the bell curve. The standard deviation controls the spread of the distribution: a smaller standard deviation indicates that the data are clustered tightly around the mean, making the normal distribution taller; a larger standard deviation indicates that the data are spread out around the mean, making the normal distribution flatter and wider.
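A small sketch of the effect of the standard deviation described above; the height figures (mean 170 cm, standard deviations of 10 cm and 5 cm) are assumed values.

# Illustrative sketch: normal distribution of heights and the effect of the spread.
from scipy.stats import norm

heights = norm(loc=170, scale=10)            # assumed mean 170 cm, sd 10 cm
print(f"P(height < 180 cm)       = {heights.cdf(180):.3f}")
print(f"P(160 < height < 180 cm) = {heights.cdf(180) - heights.cdf(160):.3f}")

# A smaller standard deviation concentrates the probability around the mean.
narrow = norm(loc=170, scale=5)
print(f"With sd = 5: P(160 < height < 180 cm) = {narrow.cdf(180) - narrow.cdf(160):.3f}")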
Poisson distribution: The Poisson distribution is the discrete probability distribution of the number of events occurring in a given time period, given the average number of times the event occurs over that time period. For example, a certain fast-food restaurant gets an average of 3 visitors to the drive-through per minute; the number of visitors arriving per minute follows a Poisson distribution.
Exponential distribution: In probability theory and statistics, the exponential distribution is the
probability distribution of the time between events in a Poisson point process, i.e., a process in which events
occur continuously and independently at a constant average rate. It is a particular case of the
gamma distribution.


Regression models IBM ICE (Innovation Centre for Education)


IBM Power Systems

Figure: Regression Analysis


Source: https://images.app.goo.gl/yAcBF7tq1zwh4gWaA

Figure 2-24. Regression models PAD011.0

Notes:
Regression models
What is the study of regression?
Regression analysis is a form of predictive modeling technique that explores the relationship between a dependent (target) variable and one or more independent (predictor) variables. The technique is used to detect cause-and-effect relationships between variables, to model time series, and to forecast.
Why do we use regression analysis?
Suppose we want to predict a company's sales growth based on current economic conditions. We have recent company data showing that sales growth is around two and a half times the growth of the economy. Using this insight, we can forecast the company's future sales based on present and past information. Regression analysis also lets us compare the effects of variables measured on different scales, such as the effect of price changes and the number of promotional activities. These benefits help market researchers, business analysts, and data scientists to evaluate and select the best set of variables for building predictive models.
Benefits of using regression analysis:
• It indicates the significant relationships between the dependent variable and the independent variables.
• It indicates the strength of the impact of multiple independent variables on the dependent variable.


Self evaluation: Exercise 10 IBM ICE (Innovation Centre for Education)


IBM Power Systems

• To continue with the training, after learning the various steps involved in pattern recognition
and anomaly detection, it is instructed to utilize the concepts to perform the following activity.

• You are instructed to write the following activities using python code.

• Exercise 10: Decision tree.

Figure 2-25. Self evaluation: Exercise 10 PAD011.0

Notes:


Types of regression IBM ICE (Innovation Centre for Education)


IBM Power Systems

Figure: Regression types


Source: https://images.app.goo.gl/59CM8MmjMP1sGPQJA

Figure 2-26. Types of regression PAD011.0

Notes:
Linear regression: Linear regression finds the best-fit line describing the relationship between a dependent variable (Y) and one or more independent variables (X); this line is known as the regression line. It is represented by the equation Y = a + b*X + e, where a is the intercept, b is the slope of the line, and e is the error term. This equation can be used to predict the value of the target variable from given predictor variable(s). The difference between simple linear regression and multiple linear regression is that multiple linear regression has more than one independent variable, whereas simple linear regression has only one. The next question is: how do we obtain the best-fit line?
How do we get the best fit (the values of a and b)?
This task is easily accomplished by the least squares method, the most common technique used for fitting a regression line. It finds the best-fit line for the observed data by minimizing the sum of the squared vertical deviations from each data point to the line.
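The least squares fit described above can be sketched in a few lines of Python; the x and y values are invented for illustration, and numpy's polyfit is used only to cross-check the hand-computed slope and intercept.

# Illustrative sketch: simple linear regression Y = a + b*X fitted by least squares.
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)      # assumed predictor values
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])      # assumed target values

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # slope
a = y.mean() - b * x.mean()                                                # intercept
print(f"hand-computed:  a = {a:.3f}, b = {b:.3f}")

slope, intercept = np.polyfit(x, y, deg=1)          # cross-check with numpy
print(f"numpy.polyfit:  a = {intercept:.3f}, b = {slope:.3f}")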
Important points:
• The relationship between the independent and dependent variables must be linear.
• Multiple regression suffers from multicollinearity, autocorrelation, and heteroscedasticity.
• Linear regression is very sensitive to outliers, which can badly affect the regression line and ultimately the predicted values.
• In the case of multiple independent variables, we can use forward selection, backward elimination, and stepwise approaches to select the most significant independent variables.


Figure: Simple Linear regression


Source: https://images.app.goo.gl/xJUXRt6w8tiUDNjx7
Logistic regression: Logistic regression is used to estimate the probability of an event (success/failure). We use logistic regression when the dependent variable is binary or categorical (0/1, True/False, Yes/No).
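A minimal sketch of logistic regression on a binary outcome using scikit-learn; the tiny hours-studied versus pass/fail data set is assumed for illustration and is not from the course.

# Illustrative sketch: logistic regression for a binary (pass/fail) outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression

hours = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]).reshape(-1, 1)
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])      # assumed labels

model = LogisticRegression().fit(hours, passed)

print(model.predict([[2.0], [4.0]]))          # predicted class after 2 and 4 hours of study
print(model.predict_proba([[2.0], [4.0]]))    # estimated probabilities of fail/pass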
Important points:
• Classification problems can be solved with the help of logistic regression.
• We should include all significant variables, to avoid both overfitting and underfitting; a stepwise method can be used effectively to estimate the logistic regression.
• Logistic regression requires large sample sizes, because maximum likelihood estimates are less reliable at small sample sizes than ordinary least squares estimates. The independent variables should not be correlated with one another (no multicollinearity); however, we do have the option of including interaction effects of categorical variables in the analysis and in the model.
• If the values of the dependent variable are ordinal, it is called ordinal logistic regression.
• If the dependent variable is multi-class, it is known as multinomial logistic regression.

Figure: Logistic function


Source: https://images.app.goo.gl/VqEkoDXP6aEMbFsq5


Polynomial regression: Polynomial regression models the relationship between an independent variable x and a dependent variable y as an nth degree polynomial. It fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y|x). In this technique the best fit is not a straight line but a curve that passes through the data points. Although a higher polynomial degree may appear to produce a lower error, it can lead to overfitting. Always plot the relationships to see the fit, and make sure that the fitted curve matches the nature of the problem.
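A short sketch of polynomial regression with numpy; the quadratic data are generated for illustration, and the chosen degree of 2 is an assumption, not a rule.

# Illustrative sketch: fitting a 2nd-degree polynomial y = c2*x^2 + c1*x + c0.
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 30)
y = 0.5 * x**2 - 2.0 * x + 3.0 + rng.normal(scale=2.0, size=x.size)   # assumed noisy data

coeffs = np.polyfit(x, y, deg=2)             # highest-degree coefficient comes first
poly = np.poly1d(coeffs)
print("fitted coefficients:", np.round(coeffs, 3))
print("prediction at x = 7:", round(float(poly(7)), 3))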

Figure: Polynomial regression


Source: https://images.app.goo.gl/jEoNRKpAyP3LDx1D8
Stepwise regression: This form of regression is used when we deal with multiple independent variables. Under this technique, the selection of independent variables is done by an automated process that requires no human intervention. The selection is driven by statistical measures such as R-square, t-statistics, and the AIC metric (Akaike information criterion) to identify significant variables. Stepwise regression essentially fits the regression model by adding or dropping covariates one at a time based on a specified criterion. A forward-selection sketch follows below.
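Forward stepwise selection based on AIC can be sketched as below with statsmodels. The synthetic data and the stopping rule (stop when AIC no longer improves) are assumptions made for illustration, not a prescribed procedure.

# Illustrative sketch: forward stepwise selection of predictors using AIC.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))                            # 4 candidate predictors
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=n)     # only x0 and x2 actually matter

def aic_for(columns):
    design = sm.add_constant(X[:, columns]) if columns else np.ones((n, 1))
    return sm.OLS(y, design).fit().aic

selected, remaining = [], [0, 1, 2, 3]
best_aic = aic_for(selected)
while remaining:
    aic, candidate = min((aic_for(selected + [j]), j) for j in remaining)
    if aic >= best_aic:                                 # stop once AIC stops improving
        break
    selected.append(candidate)
    remaining.remove(candidate)
    best_aic = aic

print("selected predictors:", selected, "AIC:", round(best_aic, 2))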
Ridge regression: Ridge regression is a technique used when the data suffer from multicollinearity (the independent variables are highly correlated). In multicollinearity, even though the ordinary least squares (OLS) estimates are unbiased, their variances are large, which makes the observed values deviate far from the true value. Above, we saw the linear regression equation y = a + b*x. This equation also has an error term, so the complete equation becomes y = a + b*x + e, where the error term e is the difference between the observed and the predicted value. For multiple independent variables, the equation becomes y = a + b1*x1 + b2*x2 + ... + e.
The prediction error of a linear model can be broken down into two subcomponents: the first is due to bias and the second is due to variance. Prediction error can occur because of either or both of these components. Ridge regression addresses the multicollinearity problem through a shrinkage parameter λ (lambda). The objective being minimized has two components: the least squares term, plus λ times the summation of β² (beta squared), where β denotes the coefficients; this second term shrinks the coefficients towards zero in order to achieve a lower variance.
Important points:
• The assumptions of this regression are the same as those of least squares regression, except that normality is not to be assumed.
• Ridge regression shrinks the value of the coefficients but never reaches exactly zero, so it does not perform feature selection. (A small sketch with correlated predictors follows below.)
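A minimal ridge sketch with scikit-learn on deliberately correlated predictors; the data and the penalty value alpha = 1.0 are assumptions chosen only to show the shrinkage behavior.

# Illustrative sketch: ridge regression shrinks coefficients but does not zero them out.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)      # x2 is almost identical to x1
X = np.column_stack([x1, x2])
y = 2 * x1 + rng.normal(scale=0.5, size=100)

print("OLS coefficients:  ", LinearRegression().fit(X, y).coef_.round(3))
print("Ridge coefficients:", Ridge(alpha=1.0).fit(X, y).coef_.round(3))

With multicollinearity the OLS coefficients typically become large and unstable, while the ridge coefficients stay small; neither set is exactly zero, which is the point made in the bullet above.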


Figure: Ridge regression


Lasso regression: Like ridge regression, lasso (Least Absolute Shrinkage and Selection Operator) penalizes the absolute size of the regression coefficients. It also reduces the variability and improves the accuracy of linear regression models.
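A companion sketch for lasso using scikit-learn: with irrelevant predictors present, lasso drives some coefficients exactly to zero, which is its feature-selection behavior; the data and alpha = 0.1 are assumed values.

# Illustrative sketch: lasso regression can shrink some coefficients exactly to zero.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))                   # 5 candidate predictors
y = 4 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=100)   # only 2 matter

lasso = Lasso(alpha=0.1).fit(X, y)
print("lasso coefficients:", lasso.coef_.round(3))   # irrelevant predictors come out ~ 0.0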
Discriminative classification
Discriminative classifiers learn which features of the input are most useful for distinguishing between the various possible classes. Logistic regression is an example of a discriminative classifier. Mathematically, the posterior probability P(y|x) is estimated directly, or the mapping from x to y is learned.

Table: Discriminative and Generative classifier


Source: https://images.app.goo.gl/tTTX4E3MnpEbkjEh9
Generative classifier
A generative classifier tries to learn the model that generates the data behind the scenes, by estimating the assumptions and distributions of the model. It is then used to predict unseen data, because the learned model is assumed to capture the true data-generating process.
Bayes classifier
A Bayes classifier assigns an observation to the class with the highest posterior probability, computed via Bayes' theorem from the class priors and the class-conditional likelihoods.


MLE for Gaussians


Maximum Likelihood Estimation (MLE) is a statistical method for estimating the parameters of a probability distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data are the most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, which makes the method a dominant means of statistical inference.

Computing the mean


In general, you calculate the mean or average of a set of numbers by adding them all up and dividing by how many numbers you have. This can be defined as follows: for a set of numbers {x1, x2, x3, ..., xj}, the mean or average is the sum of all the x values divided by j. For a Gaussian, the maximum likelihood estimate of the mean is exactly this sample average.
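A small numeric check of these estimates: the sketch below computes the MLE of a Gaussian's mean (the sample average) and, anticipating the next subsection, the MLE of its variance; the simulated sample is an assumption for illustration.

# Illustrative sketch: maximum likelihood estimates of a Gaussian's mean and variance.
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(loc=5.0, scale=2.0, size=1000)   # assumed sample drawn from N(5, 2^2)

mu_hat = x.mean()                                # MLE of the mean
var_hat = np.mean((x - mu_hat) ** 2)             # MLE of the variance (divide by j)
print(f"estimated mean = {mu_hat:.3f}, estimated variance = {var_hat:.3f}")
print(f"unbiased sample variance (divide by j - 1) = {x.var(ddof=1):.3f}")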
Computing the variance
The variance is the average of the squared deviations from the mean: for the same set of numbers, variance = Σ(xi - mean)² / j. This is the maximum likelihood estimate of a Gaussian's variance (the unbiased sample variance divides by j - 1 instead).
Gaussian discriminant analysis
Gaussian discriminant analysis is a generative classification approach: it models each class-conditional density p(x|y) as a Gaussian, estimates the class priors and the Gaussian parameters by maximum likelihood, and then classifies a new observation using Bayes' theorem.


Important points:

• The assumptions of lasso regression are the same as those of least squares regression, except that normality is not to be assumed.
• Lasso regression shrinks coefficients all the way to zero (exactly zero), which certainly helps with feature selection.
• If a group of predictors is highly correlated, lasso picks only one of them and shrinks the others to zero.

Figure: Lasso regression


Elastic net regression: Elastic net is a hybrid of the lasso and ridge regression techniques. It is trained with both the L1 and L2 regularization penalties. Elastic net is useful when there are multiple features that are correlated with one another. The trade-off between lasso and ridge gives elastic net a practical advantage: it inherits some of ridge's stability under rotation. (A small sketch follows after the points below.)
Important points:
• It encourages group effects in the presence of highly correlated variables.
• There is no limitation on the number of selected variables.
• It can suffer from double shrinkage.
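A minimal elastic net sketch with scikit-learn; l1_ratio controls the mix of the lasso (L1) and ridge (L2) penalties, and the data and parameter values are assumptions for illustration.

# Illustrative sketch: elastic net combines the L1 (lasso) and L2 (ridge) penalties.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = 4 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=100)

enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)   # 50/50 mix of L1 and L2
print("elastic net coefficients:", enet.coef_.round(3))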

Figure: Elastic net regression


How to select the best model for


regression? IBM ICE (Innovation Centre for Education)
IBM Power Systems

• A training institute tells its students to apply linear regression whenever the outcome variable is continuous.

• When the dependent variable is binary, they are told to use logistic regression! However, the more options you have at your disposal, the more difficult it becomes to choose the right one.

Figure 2-27. How to select the best model for regression? PAD011.0

Notes:
How to select the best model for regression?
• It is important to select a suitable approach from among the many regression models, according to the type of independent and dependent variables, the dimensionality of the data, and other fundamental characteristics of the data, as outlined below (a small comparison sketch using AIC follows this list):
- Data exploration is an essential part of building a predictive model. Your first step, before choosing the right model, should be to identify the relationships between variables and their impact.
- To compare the goodness of fit of different models, evaluate metrics such as the statistical significance of the parameters, R-square, adjusted R-square, AIC, BIC, and the error term.
- You should not use an automated model selection method if your data set has several confounding variables, because you do not want to put all of them into a model at the same time.
- It can also happen that a less powerful model is easier to implement than one that is highly statistically significant; the choice will also depend on your objective.
- Regression regularization methods (lasso, ridge, and elastic net) work well when the data set has high dimensionality and multicollinearity among the variables.
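The comparison sketch referred to above: two candidate models are compared by adjusted R-square and AIC using statsmodels. The data are simulated, so the numbers are illustrative only and the conclusion (the smaller model wins) holds only for this toy example.

# Illustrative sketch: comparing two regression models by adjusted R-square and AIC.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 150
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2.0 * x1 + rng.normal(size=n)               # x2 is actually irrelevant

model_1 = sm.OLS(y, sm.add_constant(np.column_stack([x1]))).fit()
model_2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

for name, m in [("x1 only", model_1), ("x1 + x2", model_2)]:
    print(f"{name}: adjusted R2 = {m.rsquared_adj:.3f}, AIC = {m.aic:.1f}")
# Other things being equal, prefer the model with lower AIC / higher adjusted R-square.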


Common questions IBM ICE (Innovation Centre for Education)


IBM Power Systems

• How many regression types do we have?

• How much mathematical knowledge is required to understand regression?

• Ridge vs. Lasso Regression - what is the difference?

• Which types of problems can be solved using regression?

• What are the major challenges faced by regression techniques?

• Is Regression Analysis relevant in the industry?

• Which programming language works best for regression?

Figure 2-28. Common questions PAD011.0

Notes:
Common questions
How many regression types do we have?
• Linear regression.
• Logistic regression.
• Ridge regression.
• Lasso regression.
• Polynomial regression.
• Stepwise regression.
• Elastic net regression.
How much mathematical knowledge is required to understand regression?
The mathematics behind regression analysis is quick and straightforward to grasp. The mathematical knowledge required includes:
• Probability.
• Partial derivation.
• Linear algebra.
• Statistics.


Ridge vs. lasso regression: what is the difference?

In statistical terms, ridge regression penalizes the loss function with the squared values of the coefficients, while lasso regression penalizes the loss function with the absolute values of the coefficients.
Which types of problems can be solved using regression?
Any problem that involves a cause-and-effect relationship can be addressed by regression analysis. Regression techniques allow you to solve both continuous-outcome (regression) and classification problems. Several practical applications include:
• Predicting prices of a commodity.
• Predicting demand of a commodity.
• Predicting binary outcomes such as credit default.
• Predicting multi-class problems such as genre of movie, etc.
What are the major challenges faced by regression techniques?
Some of the challenges that regression techniques face include:
• Multicollinearity: a condition in which the predictor variables are correlated with one another.
• Correlation of error terms: this occurs when the error terms, plotted on a graph, form a pattern rather than being independent.
• Underfitting/overfitting: if there is an excess of predictor variables, the regression model may overfit; if there is not enough data, it may underfit.
Is regression analysis relevant in the industry?
Regression analysis is one of the most widely employed techniques in project management, data science, and computer science. Despite the enormous number of breakthroughs in machine learning and the many other algorithms available, linear regression remains the most common methodology in many organizations.
Which programming language works best for regression?
That is the beauty of regression analysis: you can build regression models using almost any tool or programming language. You can perform regression analysis in MS Excel, R, Python, Minitab, KNIME, and so on; the list goes on and on.


Linear models for classification IBM ICE (Innovation Centre for Education)
IBM Power Systems

Figure: Classification and regression Model


Source: https://images.app.goo.gl/fzwieTpyn2DHXAwB9

Figure 2-29. Linear models for classification PAD011.0

Notes:
Linear models for classification
Classification: Classification is a process in which a machine learning algorithm learns how to assign a class label to examples from the problem domain. A simple, easy-to-understand example is classifying emails as "spam" or "not spam" (a small coding sketch of this example appears after the figure below).


A real-life example of classification:

Figure: Classification real-life example


Source: https://images.app.goo.gl/RS1vmd5PcW9Pj9Jz8
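The spam example can be sketched with a tiny bag-of-words naive Bayes classifier in scikit-learn; the handful of messages and labels below are invented for illustration, so the resulting model is only a toy.

# Illustrative sketch: classifying short messages as "spam" or "not spam".
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["win a free prize now", "claim your free reward", "lunch at noon tomorrow",
            "meeting notes attached", "free entry win cash", "project status update"]
labels = ["spam", "spam", "not spam", "not spam", "spam", "not spam"]   # assumed labels

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)            # bag-of-words features
classifier = MultinomialNB().fit(X, labels)

test = vectorizer.transform(["free cash prize", "see you at the meeting"])
print(classifier.predict(test))                   # expected: ['spam' 'not spam']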

Example of positive linear
regression IBM ICE (Innovation Centre for Education)
IBM Power Systems

• Price elasticity research.

• Risk evaluation in an insurance company.

• Sports analysis.

Figure 2-30. Example of positive linear regression PAD011.0

Notes:
Example of positive linear regression
Price elasticity research: Price changes often affect consumer behavior, and linear regression can help you analyze how. For instance, if the price of a product keeps rising, you can use regression analysis to determine whether consumption drops as the price increases. What if consumption drops substantially as prices rise? At what price do customers stop buying the product? This knowledge is of great benefit to retail executives.
Risk evaluation in an insurance company: Risk analysis can be carried out using linear regression techniques. For example, an insurance company with limited resources to investigate homeowners' insurance claims could build a model that estimates the cost of claims using linear regression. The analysis helps company executives understand what risks they are taking on.
Sports analysis: Linear regression is not just a matter of business; it is relevant in sports, too. For example, you might ask whether the number of games a basketball team wins in a season is related to the average number of points the team scores per game. A scatter plot reveals a linear relationship between these variables. There is also a linear relationship between the number of games won and the total number of points scored by the opponents; this relationship is negative: the total number of points scored by the opponents decreases as the number of games won increases. You can model the relationship between these variables with linear regression, and use a reasonable model to estimate how many games a team will win.


Checkpoint (1 of 2) IBM ICE (Innovation Centre for Education)


IBM Power Systems

Multiple choice questions:

1. Memory decay affects what kind of memory?


a) Short term memory in general
b) Older memory in general
c) Can be short term or older
d) None of the mentioned

2. How is pattern information distributed?


a) It is distributed across the weights
b) It is distributed in localized weights
c) It is distributed in certain proactive weights only
d) None of the above

3. What are the requirements of learning laws?


a) Learning should be able to capture more & more patterns
b) Convergence of weights
c) All the mentioned
d) None of the above

Figure 2-31. Checkpoint (1 of 2) PAD011.0

Notes:
Write your answers here:
1.
2.
3.


Checkpoint (2 of 2) IBM ICE (Innovation Centre for Education)


IBM Power Systems

Fill in the blanks:

1. _______factors affect the performance of learner system does not include?


2. In language understanding, the levels of knowledge that does not include ___.
3. _______consists of the categories which does not include structural units.
4. A search algorithm takes _____as an input and returns ____as an output.

True or False:

1. In pattern mapping problem in neural nets, is there any kind of generalization involved
between input & output? True/False
2. Linear neurons can be useful for application such as interpolation, is it true? True/False
3. Does pattern classification & grouping involve same kind of learning? True/False

Figure 2-32. Checkpoint (2 of 2) PAD011.0

Notes:
Write your answers here:
Fill in the blanks:
1.
2.
3.
4.
True or false:
1.
2.
3.


Question bank IBM ICE (Innovation Centre for Education)


IBM Power Systems
Two mark questions:
1. What are probability distributions?
2. What are the components of probability distributions?
3. List any 3 types of linear models for regression.
4. What is the y = mx + c formula for linear regression?

Four mark questions:


1. What is a multiple regression model?
2. Describe the R-squared method.
3. Describe classification techniques.
4. Describe any 3 types of classification methods.

Eight mark questions:


1. Explain linear models for classification.
2. Explain probability distributions with chart.

Figure 2-33. Question bank PAD011.0

Notes:


Unit summary IBM ICE (Innovation Centre for Education)


IBM Power Systems

Having completed this unit, you should be able to:

• Understand the concept of probability distributions

• Gain knowledge on example of statistical approaches

• Understand linear models for regression

• Learn about linear models for classification

Figure 2-34. Unit summary PAD011.0

Notes:
Unit summary is as stated above.

