Professional Documents
Culture Documents
Perform the following things and predict using Time series analysis (Write the code
using Python and explain every step) [4 marks]
2. Write a python code to identify the clusters by applying the k-means algorithm. Represent the
Clusters and centroids after each passthrough k-means algorithm. Data Set: Income. [3 marks]
3. Explain Decision Tree Algorithm in Machine learning using an example. Mention the dataset
link and python programming details in your answer sheet. [3 marks]
The first and last 5 rows have been printed using head() and Tail() command.
Next, Data has been visualised using matplotlib plot command.
2. Evaluate and plot the Rolling Statistics (mean and standard deviation)
3. Check stationarity of the dataset (Dickey Fuller Test, Augmented Dickey Fuller Test)
The p-value is greater than 0.05, so we cannot reject the null hypothesis. This suggests that the
time series is non-stationary.
4. Make the data stationary by (a) taking Log.
The p-value is greater than 0.05, indicating that the null hypothesis of non-stationarity cannot be rejected.
The time series is still non-stationary.
(b) Make the data stationary by subtracting Rolling Average
The p-value is less than 0.05, indicating that the null hypothesis of non-stationarity can be rejected. The
time series is now stationary.
The p-value is greater than 0.05, indicating that the null hypothesis of non-stationarity cannot be rejected.
The time series is still non-stationary.
The p-value is very small, indicating that the null hypothesis of non-stationarity can be rejected. The residual
component is now stationary.
5. Make the prediction of passengers for next 8 years using ARIMA model check the
confidence interval.
6. Check ARIMA (2,1,0), (0,1,2), (2,1,2)
The results show that the ARIMA(0,1,2) model has the lowest AIC value, which means it is the best fit for the data.
2. Write a python code to identify the clusters by applying the k-means algorithm. Represent the
Clusters and centroids after each passthrough k-means algorithm. Data Set: Income. [3 marks]
Decision tree is a popular machine learning algorithm used for both classification and regression tasks. It is a tree-like
model that makes decisions based on the input features to arrive at a prediction. Each internal node in the decision
tree corresponds to a feature, while the edges represent the decision rules based on that feature. The leaf nodes
represent the output or the predicted value.
here's an example of a dataset from Kaggle that we can use to demonstrate the decision tree algorithm:
We will use the famous Iris dataset that contains measurements for 150 iris flowers from three different species. The
dataset can be found at https://www.kaggle.com/uciml/iris
We will use Python and scikit-learn library to implement the decision tree algorithm.
STEP-4: creating the decision tree classifier and fit it on the training data:
STEP-5: now using the trained classifier to make predictions on the testing data:
STEP-6: Finally, we can evaluate the performance of the classifier using various metrics:
SAMPLE PREDICTIONS :
Karthik reddy
Signature of the Student