
Homework # 2 – CYS 607

Submission date: 24-03-21


Total Marks: 10

 In this homework, use the internet to search for the answers, BUT do NOT copy and paste the answer.
 To avoid plagiarism, use an appropriate citation and referencing style.
 Use more than one resource.
Q1- The supervised classification algorithm you choose will typically output
a real-valued score, and you will need to choose a threshold or thresholds
above which to block the activity or show additional friction. How do you
choose this threshold?

Classification is a supervised learning technique for predictive analytics with a categorical outcome, which may be binary or multiclass. An imbalanced class distribution in a data set poses a serious difficulty for most classifier learning algorithms, which assume a relatively balanced distribution.

Classification with a two-class classifier has four possible outcomes:
True Positive (TP): an outcome where the model correctly predicts the positive class.
False Positive (FP, also known as a Type I error): an outcome where the model incorrectly predicts the positive class.
True Negative (TN): an outcome where the model correctly predicts the negative class.
False Negative (FN, also known as a Type II error): an outcome where the model incorrectly predicts the negative class.
The choice of a threshold depends on the relative importance of the TPR (True Positive Rate) and the FPR (False Positive Rate) for the classification problem at hand. For example, if your classifier decides which criminal suspects receive a death sentence, false positives are very bad (innocents will be killed!). You would therefore choose a threshold that yields a low FPR while keeping a reasonable TPR (so you actually catch some true criminals). If there is no external concern about a low TPR or a high FPR, one option is to weight them equally by choosing the threshold that maximizes TPR − FPR.
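This rule can be sketched in a few lines of Python. The sketch below is a minimal illustration, not production code: the scores and labels are made-up example data, and in practice you would sweep candidate thresholds over a held-out validation set.

```python
# Sketch: choosing a classification threshold by maximizing TPR - FPR
# (Youden's J statistic). Scores and labels are made-up example data.

def tpr_fpr(scores, labels, threshold):
    """Compute (TPR, FPR) when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

def best_threshold(scores, labels):
    """Return the candidate threshold that maximizes TPR - FPR."""
    best, best_j = None, float("-inf")
    for t in sorted(set(scores)):  # every observed score is a candidate
        tpr, fpr = tpr_fpr(scores, labels, t)
        if tpr - fpr > best_j:
            best, best_j = t, tpr - fpr
    return best

scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0,   0,   1,    0,   1,   1,   0,   1]
print(best_threshold(scores, labels))  # prints 0.35
```

If false positives are costlier than false negatives (as in the death-sentence example), you would instead maximize a weighted combination such as TPR − c·FPR with c > 1.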

(Quan Zou, 2015)

https://en.wikipedia.org/wiki/Receiver_operating_characteristic

Q2- Now suppose that you have two versions of your model with different
parameters (e.g., different regularization) or even different model families
(e.g., logistic regression versus random forest). Which one is better?

Which model is better is generally determined by several factors:

 Determined on the basis of the problem: each problem has different requirements, and according to those requirements we decide which model is best, whether that means the same model family with different regularization or entirely different model families (for example, logistic regression versus random forest).

 Determined on the basis of complexity: model complexity often refers to the number of features or terms included in a given predictive model, and to whether the chosen model is linear or non-linear; it can also refer to the computational complexity of learning. Generally, a more complex model works better for more complex problems but is more than is necessary for simpler problems, while a simpler model does the opposite.
 Determined on the basis of prediction time: the time taken to predict an output after the algorithm has been trained on a historical data set and is applied to new data, for example when predicting the probability of a particular outcome, such as whether a customer will take a given action within a specific time window (e.g., 30 days).

 Determined on the basis of training time: the time it takes to learn good values for all the weights and biases from labeled examples.

Higher accuracy typically means longer training time, and algorithms require more time to train on large training data sets. In real-world applications, the choice of algorithm is driven predominantly by these two factors. Algorithms like Naïve Bayes and linear and logistic regression are easy to implement and quick to run. Algorithms like SVMs, which involve parameter tuning, neural networks, which have long convergence times, and random forests need a lot of time to train.

 Determined on the basis of model size: once training is complete, the size of the resulting model can vary according to the data set and the algorithm used.

 Determined on the basis of linearity: many algorithms work on the assumption that classes can be separated by a straight line (or its higher-dimensional analog); examples include logistic regression and support vector machines. Linear regression algorithms assume that data trends follow a straight line. If the data is linear, these algorithms perform quite well.
However, the data is not always linear, so we require other algorithms that can handle high-dimensional and complex data structures; examples include kernel SVMs, random forests, and neural nets.
The best way to check for linearity is to fit a linear model, or to run a logistic regression or SVM, and inspect the residual errors. A high error means the data is not linear and needs a more complex algorithm to fit it.
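The linearity check described above can be sketched as follows. This is a minimal illustration under simplifying assumptions: a one-dimensional data set, an ordinary least-squares line fit, and two made-up trends (one linear, one quadratic) to compare residual errors.

```python
# Sketch: rough linearity check by fitting a least-squares line and
# inspecting the residual error. High error suggests non-linear data.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return a, b

def mean_squared_residual(xs, ys):
    """Mean squared error of the best-fit line over the data."""
    a, b = fit_line(xs, ys)
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = list(range(10))
linear_ys = [2 * x + 1 for x in xs]   # perfectly linear trend
nonlinear_ys = [x * x for x in xs]    # quadratic trend

print(mean_squared_residual(xs, linear_ys))     # ~0: a line fits well
print(mean_squared_residual(xs, nonlinear_ys))  # large: data is not linear
```

The same idea carries over to classification: fit a linear model, measure the error, and switch to a kernel SVM, random forest, or neural net only when the linear fit is clearly inadequate.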
https://www.kdnuggets.com/2020/05/guide-choose-right-machine-
learning-algorithm.html
https://www.datasciencecentral.com/profiles/blogs/how-to-choose-a-
machine-learning-model-some-guidelines

Bibliography
Quan Zou, S. X. (2015). Finding the Best Classification Threshold in Imbalanced Classification. Xiamen, China: www.elsevier.com/locate/bdr.
