Model fitting is a measure of how well a machine learning model generalizes to data similar to
that on which it was trained. A well-fitted model accurately approximates the output when it is
given unseen inputs.
Fitting refers to adjusting the model's parameters to improve accuracy. The process involves
running an algorithm on data for which the target variable is known (“labeled” data) to produce
a machine learning model. The model’s outputs are then compared with the real, observed values
of the target variable to determine its accuracy.
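As a minimal sketch of this fit-then-compare process, the example below fits a line y = mx + c to a few labeled points by ordinary least squares and then compares its predictions with the observed targets. The data values, the helper names `fit_line` and `mean_squared_error`, and the use of mean squared error as the measure of accuracy are all assumptions made for illustration.

```python
# Fit y = m*x + c to labeled data, then compare predictions with observations.
# All data values below are made up for illustration.

def fit_line(xs, ys):
    """Return (m, c) that minimize the squared error of y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

def mean_squared_error(xs, ys, m, c):
    """Average squared difference between observed and predicted targets."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Labeled training data: the target y is known for every x.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.1, 2.9, 5.2, 6.8, 9.1]   # roughly y = 2x + 1 plus noise

# Held-out data the model never saw while fitting.
test_x = [5.0, 6.0]
test_y = [11.0, 12.9]

m, c = fit_line(train_x, train_y)
train_mse = mean_squared_error(train_x, train_y, m, c)
test_mse = mean_squared_error(test_x, test_y, m, c)
print(f"m={m:.3f} c={c:.3f} train MSE={train_mse:.4f} test MSE={test_mse:.4f}")
```

A low error on the held-out points, not just on the training points, is what indicates a good fit.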
Overfitting negatively impacts the performance of the model on new data. It occurs when a
model learns the details and noise in the training data too closely. When random fluctuations
or noise in the training data are picked up and learned as concepts by the model, the model
“overfits”: it performs well on the training set but poorly on the test set, because it can no
longer generalize and make accurate predictions for new data.
Underfitting happens when the machine learning model can neither model the training data
sufficiently nor generalize to new data. An underfit model is not a suitable model; this is
usually obvious because it performs poorly even on the training data.
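The contrast between underfitting, a good fit, and overfitting can be illustrated with three deliberately simple models on the same noisy, roughly linear data. This is only a sketch under stated assumptions: the data are made up, a constant predictor stands in for an underfit model, and a nearest-neighbour "memorizer" that replays the closest training target stands in for an overfit model. The names `make_constant`, `make_line`, and `make_memorizer` are hypothetical.

```python
def mse(model, xs, ys):
    """Mean squared error of a model (a function x -> prediction)."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def make_constant(xs, ys):
    """Underfit stand-in: always predicts the mean training target."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def make_line(xs, ys):
    """Least-squares line y = m*x + c."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return lambda x: m * x + c

def make_memorizer(xs, ys):
    """Overfit stand-in: returns the target of the nearest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

# Made-up data: y = 2x + 1 plus fixed noise values.
train_x = [float(i) for i in range(10)]
train_noise = [0.8, -1.1, 0.5, -0.7, 1.2, -0.9, 0.6, -1.0, 0.9, -0.4]
train_y = [2 * x + 1 + e for x, e in zip(train_x, train_noise)]

test_x = [i + 0.5 for i in range(9)]
test_noise = [-0.3, 0.4, -0.6, 0.2, 0.5, -0.5, 0.3, -0.2, 0.1]
test_y = [2 * x + 1 + e for x, e in zip(test_x, test_noise)]

makers = {"constant": make_constant, "line": make_line, "memorizer": make_memorizer}
mse_train, mse_test = {}, {}
for name, make in makers.items():
    model = make(train_x, train_y)
    mse_train[name] = mse(model, train_x, train_y)
    mse_test[name] = mse(model, test_x, test_y)
    print(f"{name:10s} train MSE={mse_train[name]:7.3f} test MSE={mse_test[name]:7.3f}")
```

On this data the memorizer reaches zero training error because it has learned the noise, yet it does worse than the line on the test points. The constant model does poorly on both sets.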
RANSAC
RANSAC is a resampling technique that generates candidate solutions by using the minimum
number of observations (data points) required to estimate the underlying model parameters.
Put another way, random sample consensus (RANSAC) is an iterative method for estimating a
mathematical model from a data set that contains outliers. The algorithm works by identifying
the outliers in the data set and estimating the desired model using only the remaining data
(the inliers).
Figure: data points are shown in blue; the line of the form y = mx + c estimated using RANSAC
is shown in red.
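For the line-fitting case, the idea can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `ransac_line`, its parameters (`tolerance`, `min_inlier_fraction`, `max_iterations`), and the data are all assumptions. The cap on the number of iterations is added here so that the loop always ends.

```python
import random

def fit_line(points):
    """Least-squares fit of y = m*x + c to a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx
    return m, mean_y - m * mean_x

def ransac_line(points, tolerance, min_inlier_fraction, max_iterations, rng):
    """Return ((m, c), inliers), or None if no consensus set is found."""
    for _ in range(max_iterations):
        # A line needs two points, so that is the minimal sample size.
        sample = rng.sample(points, 2)
        if sample[0][0] == sample[1][0]:
            continue  # a vertical line cannot be written as y = m*x + c
        m, c = fit_line(sample)
        # Points whose vertical distance to the candidate line is small enough.
        inliers = [(x, y) for x, y in points if abs(y - (m * x + c)) <= tolerance]
        if len(inliers) / len(points) > min_inlier_fraction:
            # Enough support: re-estimate from all inliers and stop.
            return fit_line(inliers), inliers
    return None

# Made-up data: 20 points near y = 2x + 1 and 5 gross outliers.
good = [(float(i), 2.0 * i + 1.0 + (0.1 if i % 2 == 0 else -0.1)) for i in range(20)]
bad = [(2.5, 30.0), (7.5, -20.0), (12.5, 60.0), (15.5, -5.0), (18.5, 90.0)]
points = good + bad

result = ransac_line(points, tolerance=0.5, min_inlier_fraction=0.75,
                     max_iterations=200, rng=random.Random(0))
naive_m, naive_c = fit_line(points)  # ordinary least squares on everything
if result is not None:
    (m, c), inliers = result
    print(f"RANSAC: m={m:.3f} c={c:.3f} using {len(inliers)} inliers")
print(f"Plain least squares on all points: m={naive_m:.3f} c={naive_c:.3f}")
```

Comparing the two printed fits shows how much the outliers pull an ordinary least-squares line away from the one estimated from the consensus set.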
As pointed out by Fischler and Bolles, unlike conventional sampling techniques that use as much
of the data as possible to obtain an initial solution and then proceed to prune outliers, RANSAC
uses the smallest set possible and proceeds to enlarge this set with consistent data points. The
basic algorithm is summarized as follows:
RANSAC Algorithm
1: Select randomly the minimum number of points required to determine the model parameters.
2: Solve for the parameters of the model.
3: Determine how many points from the set of all points fit the model within a predefined
tolerance.
4: If the fraction of the number of inliers over the total number of points in the set exceeds a
predefined threshold τ, re-estimate the model parameters using all the identified inliers and
terminate.