
1.

In this question I use the route from our school, National Tsing Hua University, to Big City, a department store in Hsinchu.

The picture below shows the route recommended by Google Maps.

Based on the map above, the breakpoints of the line segments are selected by hand, and each line segment is computed using the least-squares method. Here I divide the route into 6 segments, which corresponds to 7 breakpoints, and set seg_idx = [1, 9, 17, 54, 70, 102, 109].
[Figure 1 | Figure 2]

Figure 1 is an example of compressing a trajectory using the least-squares method. Figure 2 is also an example of compressing a trajectory using the least-squares method, but it shows the approximated trajectory built from the several line segments computed with the least-squares method.

2. numpy.linalg.lstsq is a function that computes the least-squares solution to a linear matrix equation. It solves the equation ax = b by computing a vector x that minimizes the Euclidean 2-norm ∥b − ax∥^2.
a. In the first sub-question, it finds the values of w and z that minimize min_w ∥Aw − X∥^2 + min_z ∥Az − Y∥^2. The errors of this equation are obtained from the residuals, the object returned as numpy.linalg.lstsq(a, b, rcond)[1], which contains the sums of squared residuals of the fit. In other words, the residuals are the differences between the observed values of the dependent variables and the values predicted by the solution. So we first find the error of min_w ∥Aw − X∥^2 and the error of min_z ∥Az − Y∥^2 separately, and the final error of the solution is the sum of these two errors (see the sketch after this question).
b. rcond is one of the parameters of the numpy.linalg.lstsq function. It is a float that sets the cut-off ratio for small singular values of a (the input matrix): for the purposes of rank determination, singular values of a are treated as zero if they are smaller than rcond times the largest singular value of a. Setting a smaller value for rcond therefore makes the function prone to numerical error for poorly conditioned matrices, such as singular matrices, whose determinant is 0 (a poorly conditioned matrix is one where small changes to the matrix can cause huge changes in its inverse, and therefore in the solution). On the other hand, if we set a larger value for rcond, the chance that the function works properly (computes a stable solution even for poorly conditioned matrices) is higher; however, the solution might be less accurate because more singular values are discarded. In short, the smaller the value of rcond, the more accurate the computed solution can be, but the greater the risk of numerical error, and vice versa. An example of a very small value of rcond is 1e-15: if we set rcond to this value and take A = np.array([[1, 1], [1, 1]]), which is a singular matrix, the chance of getting a numerically meaningless result is higher than if we set rcond to 1.
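
To make both points concrete, here is a minimal sketch (the sample data, the [t, 1] design matrix, and the nearly singular matrix are my own illustrative assumptions, not values from the assignment):

import numpy as np

# Hypothetical GPX-like points, parameterized by index t (illustrative only).
t = np.arange(5, dtype=float)
x = np.array([0.0, 1.1, 1.9, 3.2, 4.0])
y = np.array([0.0, 0.9, 2.1, 2.9, 4.1])

# Assumed design matrix for one line segment: each row is [t_i, 1].
A = np.vstack([t, np.ones_like(t)]).T

# lstsq returns (solution, residuals, rank, singular_values);
# index [1] holds the sum of squared residuals described in (a).
w, res_x, _, _ = np.linalg.lstsq(A, x, rcond=None)
z, res_y, _, _ = np.linalg.lstsq(A, y, rcond=None)
total_error = res_x[0] + res_y[0]   # final error: sum of the two errors
print("w =", w, "z =", z, "total error =", total_error)

# (b) rcond: with a nearly singular matrix, a tiny rcond keeps the
# near-zero singular value and the solution blows up, while a larger
# rcond truncates it and yields a stable (but less exact) solution.
B = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-12]])   # nearly singular
b = np.array([1.0, 2.0])
unstable, *_ = np.linalg.lstsq(B, b, rcond=1e-15)
stable, *_ = np.linalg.lstsq(B, b, rcond=1e-6)
print("rcond=1e-15:", unstable)   # huge, unstable components
print("rcond=1e-6 :", stable)     # small, stable components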

3. a. As stated in question 2, we can find the errors of the linear system from the residuals object returned by numpy.linalg.lstsq. So we first calculate the total error of the equation min_w ∥Aw − X∥^2 + min_z ∥Az − Y∥^2, and then compare it to our error bound ϵ. The error of each line segment must be no more than the given error bound ϵ, which means

min_w ∥Aw − X∥^2 + min_z ∥Az − Y∥^2 ≤ ϵ

for each line segment.


The code below is how I solve this.

If the total error is greater than ϵ, we append idx2, the index of the current point of x, to seg_idx. Here we only track indices of x because x and y have the same length, but the residuals are still computed with respect to both x and y. The first and the last index of x (or y) must also be appended to seg_idx. Therefore, the procedure automatically breaks a trajectory into line segments.
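
Since the original code screenshot is not reproduced here, the following is a minimal sketch of the procedure just described (the function names and the exact breakpoint rule are my reconstruction):

import numpy as np

def segment_error(t, x, y):
    # Sum of squared residuals of fitting x(t) and y(t) each with a line.
    A = np.vstack([t, np.ones_like(t)]).T
    _, res_x, _, _ = np.linalg.lstsq(A, x, rcond=None)
    _, res_y, _, _ = np.linalg.lstsq(A, y, rcond=None)
    # residuals come back empty for an exactly determined two-point fit
    return (res_x[0] if res_x.size else 0.0) + (res_y[0] if res_y.size else 0.0)

def break_trajectory(x, y, eps):
    # Greedily extend the current segment until its error exceeds eps.
    n = len(x)
    t = np.arange(n, dtype=float)
    seg_idx = [0]              # the first index is always a breakpoint
    idx1 = 0
    for idx2 in range(2, n):
        if segment_error(t[idx1:idx2 + 1], x[idx1:idx2 + 1], y[idx1:idx2 + 1]) > eps:
            seg_idx.append(idx2)   # error bound exceeded: append idx2
            idx1 = idx2            # and start a new segment there
    if seg_idx[-1] != n - 1:
        seg_idx.append(n - 1)      # the last index is always a breakpoint
    return seg_idx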
The code below shows how all the line segments are connected.

Here, we plot based on x and y because x and y are the coordinates of the GPX points. By doing this, the line segments are always connected, because every breakpoint is one of the GPX points, i.e. one of the actual points that generate the original map.
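
A sketch of the plotting step under the same assumptions (x and y as NumPy arrays, seg_idx from break_trajectory above):

import numpy as np
import matplotlib.pyplot as plt

seg_idx = np.array(break_trajectory(x, y, eps=1e-5))

# Drawing straight lines between consecutive breakpoints keeps the
# segments connected: every breakpoint is an actual GPX point shared
# by the segment that ends there and the segment that starts there.
plt.plot(x, y, ".", alpha=0.4, label="original GPX points")
plt.plot(x[seg_idx], y[seg_idx], "-o", label="approximated trajectory")
plt.legend()
plt.show()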

b. The time complexity of this algorithm is O(nmF(k)). This is because my code performs a linear search through the input lists to find the segment breakpoints, and for each candidate segment it calls the np.linalg.lstsq function once. Thus, if the np.linalg.lstsq function has a time complexity of O(k^2), the overall time complexity of the algorithm is O(nmk^2).

c. In my case, when I set:

ϵ = 1e-4, it needs only 3 points, so the compression ratio is 3/109 ≈ 2.8%.
ϵ = 1e-5, it needs only 6 points, so the compression ratio is 6/109 ≈ 5.5%.
ϵ = 1e-6, it needs 11 points, so the compression ratio is 11/109 ≈ 10.1%.
ϵ = 1e-7, it needs 24 points, so the compression ratio is 24/109 ≈ 22.0%.
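
The ratios follow directly from the breakpoint count; continuing the sketch above (seg_idx and x are the hypothetical names used there):

# Compression ratio: kept breakpoints over the original point count.
ratio = len(seg_idx) / len(x)
print(f"compression ratio: {ratio:.1%}")   # e.g. 6 / 109 ≈ 5.5% for eps = 1e-5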
4. The given form of the quadratic curve is

x = a1t^2 + a2t + a3,
y = b1t^2 + b2t + b3.

Here, a1, a2, a3 and b1, b2, b3 are constants that can be determined by solving the following equations:

a1t1^2 + a2t1 + a3 = x1
a1t2^2 + a2t2 + a3 = x2
…
a1tn^2 + a2tn + a3 = xn

b1t1^2 + b2t1 + b3 = y1
b1t2^2 + b2t2 + b3 = y2
…
b1tn^2 + b2tn + b3 = yn
Here, t is the independent variable, and x and y are the dependent variables, which are the position coordinates.
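
Written in matrix form (my restatement of the system above), these are two independent ordinary least-squares problems:

\min_a \left\| \begin{bmatrix} t_1^2 & t_1 & 1 \\ \vdots & \vdots & \vdots \\ t_n^2 & t_n & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} - \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \right\|^2 \quad\text{and}\quad \min_b \left\| T b - Y \right\|^2 ,

where T denotes the same matrix of t-powers and Y = (y_1, …, y_n)^T.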
The next step is to find the values of the coefficients that minimize the sum of the squares of the differences between the observed values and the predicted values. The observed values are the actual values of the dependent variables (here, x and y) that are measured in the data set. The predicted values are the values of the dependent variables that are predicted by the model, which is the quadratic curve that needs to fit the data.

The goal of the least squares method is to find the values of the coefficients of the quadratic curve that produce the best fit to the observed data. This is done by minimizing the sum of the squares of the differences between the observed values and the predicted values; the smaller this sum, the better the model fits the data.

Therefore, we use numpy.polyfit to find the coefficient vectors a and b, because this function fits a polynomial p(x) = p[0] * x**deg + ... + p[deg] of degree deg to points (x, y) and returns a vector of coefficients p that minimizes the squared error, ordered deg, deg-1, …, 0. Then we use a and b as the inputs of the function drawCurve, and the x and y values returned by drawCurve are plotted. Thus, we can see the plotted graph in the figure below.
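
A sketch of this fitting step (the sample data are synthetic, and np.polyval stands in for the assignment's drawCurve helper, whose code is not shown here):

import numpy as np
import matplotlib.pyplot as plt

# Synthetic trajectory points parameterized by t (illustrative only).
t = np.linspace(0.0, 1.0, 20)
x = 2.0 * t**2 + 0.5 * t + 1.0
y = -1.0 * t**2 + 3.0 * t

# Degree-2 fits: a = [a1, a2, a3] and b = [b1, b2, b3], highest degree first.
a = np.polyfit(t, x, 2)
b = np.polyfit(t, y, 2)

# Evaluate the fitted curve on a dense grid and plot it.
tt = np.linspace(t.min(), t.max(), 200)
plt.plot(x, y, ".", label="data points")
plt.plot(np.polyval(a, tt), np.polyval(b, tt), label="fitted quadratic curve")
plt.legend()
plt.show()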
5. The total least squares method is a regression method that fits a curve to a set of data points by minimizing the sum of the squared perpendicular distances between the points and the fitted line or plane, rather than just the distance along one dimension as in linear least squares.

The algorithm for total least squares is similar to linear least squares, but it uses a different criterion to calculate the best fit. Instead of finding the line or curve that minimizes the sum of the squared residuals, which are the distances along one dimension from the data points to the fitted line or plane, total least squares finds the line or plane that minimizes the sum of the squared perpendicular distances from the data points to the fitted line or plane. The picture below illustrates how the total least squares method and the least squares method differ.

[Figure: total least squares method | least squares method]

We can see that the total least squares method accounts for errors in both the x and y dimensions, while the linear least squares method only accounts for errors in the y dimension. This makes the total least squares method more accurate, especially when dealing with data that has errors in both variables. The results of using the total least squares method to compress trajectory data would therefore likely be more accurate than using the linear least squares method, since the total least squares method handles errors in both the x and y dimensions.

In terms of applications, the linear least squares method is often used when the data is linear and the errors are small, whereas the total least squares method is more suitable for data that is non-linear or has larger errors, and it can also be used to find the best-fit line for a set of data points in higher dimensions (see the sketch below).
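
As a small illustration of the difference (synthetic data; the SVD-based line fit below is one standard way to compute a 2-D total least squares fit, not code from the assignment):

import numpy as np

# Synthetic points roughly along y = 2x + 1, with noise in both x and y.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)
pts = np.column_stack([t + rng.normal(0, 0.3, t.size),
                       2.0 * t + 1.0 + rng.normal(0, 0.3, t.size)])

# Total least squares: the best-fit direction is the first right singular
# vector of the centered data, which minimizes the sum of squared
# perpendicular distances from the points to the line.
center = pts.mean(axis=0)
_, _, vt = np.linalg.svd(pts - center)
direction = vt[0]
slope_tls = direction[1] / direction[0]

# Ordinary least squares for comparison: minimizes vertical distances only.
slope_ols, intercept_ols = np.polyfit(pts[:, 0], pts[:, 1], 1)

print("TLS slope:", slope_tls, "OLS slope:", slope_ols)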
Sources:
https://realpython.com/k-means-clustering-python/
https://numpy.org/doc/stable/reference/generated/numpy.linalg.lstsq.html
https://towardsdatascience.com/total-least-squares-in-comparison-with-ols-and-odr-f050ffc1a86a
