
Model fitting and RANSAC

Model fitting is the measure of how well a machine learning model generalizes to data similar to
that on which it was trained. A well-fitted model accurately approximates the output when it is
given unseen inputs.

Fitting refers to adjusting the parameters of the model to improve accuracy. The process
involves running an algorithm on “labeled” data, that is, data for which the target variable is
known, to produce a machine learning model. The model’s outputs are then compared with the
real, observed values of the target variable to determine its accuracy.
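As a minimal sketch of this fit-then-compare process, the following Python snippet fits a straight line to labeled data and measures the error on held-out points. The synthetic data (y = 2x + 1 plus noise), the 150/50 split, and the use of mean squared error as the accuracy measure are assumptions made for illustration only.

```python
import numpy as np

# Synthetic labeled data (an assumption for illustration): y = 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=200)

# Hold out the last 50 points to play the role of unseen data.
x_train, y_train = x[:150], y[:150]
x_test, y_test = x[150:], y[150:]

# Fitting: choose slope and intercept that minimize squared error on the training data.
slope, intercept = np.polyfit(x_train, y_train, deg=1)

# Compare the model's predictions with the observed target values.
y_pred = slope * x_test + intercept
test_mse = float(np.mean((y_pred - y_test) ** 2))
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test MSE={test_mse:.3f}")
```

A low error on the held-out points suggests the fitted parameters generalize beyond the training data.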

Overfitting and Underfitting

Overfitting negatively impacts the performance of the model on new data. It occurs when a
model learns the details and noise in the training data too closely. When random fluctuations
or noise in the training data are picked up and learned as concepts by the model, the model
“overfits”. It will perform well on the training set but poorly on the test set, because what it
has learned does not generalize to new data.

Underfitting happens when the machine learning model can neither model the training data
adequately nor generalize to new data. An underfit machine learning model is not a suitable
model; this is usually easy to detect, because it performs poorly even on the training data.
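To make the contrast concrete, here is a small sketch that fits polynomials of increasing degree to the same noisy data and reports training and test error. The sine-shaped target, the noise level, the sample sizes, and the degrees 0, 3, and 12 are all assumptions chosen for illustration; the helper names make_data and mse are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    """Draw n noisy samples of sin(pi * x) on [-1, 1] (illustrative target)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(np.pi * x) + rng.normal(0.0, 0.2, size=n)
    return x, y

def mse(coeffs, x, y):
    """Mean squared error of a polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_train, y_train = make_data(15)    # small training set
x_test, y_test = make_data(200)     # unseen data from the same source

errors = {}
for degree in (0, 3, 12):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    errors[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree {degree:2d}: train MSE={errors[degree][0]:.4f}, "
          f"test MSE={errors[degree][1]:.4f}")
```

In a run like this, the degree-0 model typically underfits (high error on both sets), while the degree-12 model typically overfits (very low training error, noticeably higher test error).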

RANSAC

RANSAC is a resampling technique that generates candidate solutions by using the minimum
number of observations (data points) required to estimate the underlying model parameters.

Put differently, random sample consensus, or RANSAC, is an iterative method for estimating a
mathematical model from a data set that contains outliers. The RANSAC algorithm works by
identifying the outliers in a data set and estimating the desired model using only the
remaining data, the inliers.

RANSAC is accomplished with the following steps:

1. Randomly selecting a subset of the data set
2. Fitting a model to the selected subset
3. Determining the number of outliers
4. Repeating steps 1-3 for a prescribed number of iterations

For example, the equation of a line that best fits a set of points can be estimated using RANSAC.

Data points shown in blue, with the line of form y = mx+c estimated using RANSAC indicated
in red.
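The line-fitting example can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not a reference one: the function name ransac_line, the tolerance of 0.5, the 200 iterations, and the synthetic data (80 points near y = 3x + 2 plus 20 gross outliers) are all hypothetical choices.

```python
import numpy as np

def ransac_line(x, y, n_iters=200, tol=0.5, seed=0):
    """Estimate y = m*x + c with a basic RANSAC loop (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    best_m, best_c, fewest_outliers = None, None, len(x) + 1
    # Step 4: repeat steps 1-3 for a prescribed number of iterations.
    for _ in range(n_iters):
        # Step 1: randomly select a subset (two points define a line).
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue  # a vertical pair cannot be written as y = m*x + c
        # Step 2: fit the model to the selected subset.
        m = (y[j] - y[i]) / (x[j] - x[i])
        c = y[i] - m * x[i]
        # Step 3: count outliers, i.e. points farther than tol from the line.
        n_outliers = int(np.sum(np.abs(y - (m * x + c)) > tol))
        if n_outliers < fewest_outliers:
            best_m, best_c, fewest_outliers = m, c, n_outliers
    return best_m, best_c, fewest_outliers

# Synthetic data: points near y = 3x + 2, with 20 of them replaced by outliers.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, size=100)
y = 3.0 * x + 2.0 + rng.normal(0.0, 0.1, size=100)
y[:20] = rng.uniform(-20.0, 50.0, size=20)

m, c, n_outliers = ransac_line(x, y)
print(f"m={m:.2f}, c={c:.2f}, outliers={n_outliers}")
```

Keeping the candidate with the fewest outliers is one simple way to pick a winner among the iterations; an ordinary least-squares fit on the same data would instead be pulled toward the outliers.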

As pointed out by Fischler and Bolles, unlike conventional sampling techniques that use as much
of the data as possible to obtain an initial solution and then proceed to prune outliers, RANSAC
uses the smallest set possible and proceeds to enlarge this set with consistent data points. The
basic algorithm is summarized as follows:

RANSAC Algorithm

1: Select randomly the minimum number of points required to determine the model parameters.

2: Solve for the parameters of the model.

3: Determine how many points from the set of all points fit the model within a predefined
tolerance.

4: If the fraction of the number of inliers over the total number of points in the set exceeds a
predefined threshold τ, re-estimate the model parameters using all the identified inliers and
terminate.

5: Otherwise, repeat steps 1 through 4 (maximum of N times). The number of iterations, N, is
chosen high enough to ensure, with probability p (usually set to 0.99), that at least one of the
sets of random samples does not include an outlier.
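The five steps, including the choice of N, might be sketched as below for the line model. The formula used for N follows from requiring 1 - (1 - u^s)^N >= p, where u is the fraction of inliers in the data and s is the sample size; both symbols are introduced here for illustration, and u must be assumed because it is not known in advance. The function names, the assumed inlier ratio of 0.5, the tolerance, and the synthetic data are likewise hypothetical.

```python
import math
import numpy as np

def num_iterations(p, inlier_ratio, sample_size):
    """Smallest N such that, with probability at least p, one sample is outlier-free.

    Assumes 0 < inlier_ratio < 1 and 0 < p < 1.
    """
    all_inliers = inlier_ratio ** sample_size  # chance one sample has no outlier
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - all_inliers))

def ransac_line_threshold(x, y, tol, tau, p=0.99, assumed_inlier_ratio=0.5, seed=0):
    """Steps 1-5 for the model y = slope*x + intercept (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n_max = num_iterations(p, assumed_inlier_ratio, sample_size=2)
    for _ in range(n_max):  # step 5: at most N attempts
        # Step 1: minimum number of points needed for a line is two.
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue
        # Step 2: solve for the model parameters.
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        # Step 3: find the points that fit within the tolerance.
        inliers = np.abs(y - (slope * x + intercept)) <= tol
        # Step 4: if the inlier fraction exceeds tau, refit on all inliers and stop.
        if inliers.mean() > tau:
            slope, intercept = np.polyfit(x[inliers], y[inliers], deg=1)
            return float(slope), float(intercept), inliers
    return None  # no sample reached the threshold within N attempts

# Synthetic data: 80 points near y = 3x + 2 and 20 gross outliers.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, size=100)
y = 3.0 * x + 2.0 + rng.normal(0.0, 0.1, size=100)
y[:20] = rng.uniform(-20.0, 50.0, size=20)

result = ransac_line_threshold(x, y, tol=0.5, tau=0.7)
print(num_iterations(0.99, 0.5, 2), result[:2] if result else None)
```

Assuming a lower inlier ratio than the data really has makes N larger than strictly necessary, which costs time but not correctness.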
