Professional Documents
Culture Documents
Machine Learning and Data Mining
Machine Learning and Data Mining
Machine Learning is a subset of AI, which enables the machine to automatically learn from data,
improve performance from past experience and make predictions.
Overfitting makes the model relevant to its own data set only and irrelevant to any other data sets. few
of the methods used to prevent overfitting include assembling, data augmentation, data simplification
and cross validation.
3. What is training set and test set in a machine learning model? How much Data will you allocate
for your training, validation and test sets?
Training Data
The observations in the training set from the experience that the algorithm uses to learn. In
supervised learning problems, each observation consists of an observed output variable and one or
more observed input variable.
Test Data
The test set is a set of observations used to evaluate the performance of the model using some
performance metric. It is very important that no observations from the training set are included in
the test sets. It will be difficult to assess whether the algorithm has learned to generalize from the
training set or has simply mentioned it.
It is common to allocate 50 percent or more to training set, 25 percent to the test data and the
remainder to the validation set.
Machine learning uses algorithms to parse data, learn from that data, and make informed decisions
based on what it has learned.
Deep learning structures algorithms in layers to create an “artificial neural network” that can learn
and make intelligent decisions on its own.
Dimensionality Reduction— the task of reducing the amount of input features in a dataset,
Anomaly Detection— the task of detecting instances that are very different from the norm, and