CERTIFICATE
Name: Yash Vijay Panwal
Roll No.: 237738
Exam No.: 237738
Batch: A2
External Examiner
INDEX
Sr. No.  Practical Name                                Sign
1.       Practical of Principal Component Analysis
2.       Practical of Clustering
Program :
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Generate synthetic data
np.random.seed(0)
X = np.random.rand(100, 3)  # 100 samples with 3 features

# Standardise the data
scaler = StandardScaler()
X_std = scaler.fit_transform(X)
# Create a PCA object with the number of components you want
pca = PCA(n_components=2) # Reduce to 2 principal components
# Fit the PCA model to the standardised data
pca.fit(X_std)
# Transform the data to the first 2 principal components
X_pca = pca.transform(X_std)
explained_variance = pca.explained_variance_ratio_
print("Explained Variance Ratios:", explained_variance)
plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1])
plt.title('PCA of Data')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.show()
Explanation :
Import the necessary libraries:
numpy for numerical operations.
sklearn.decomposition.PCA for performing PCA.
matplotlib.pyplot for data visualisation.
sklearn.preprocessing.StandardScaler for standardising the data.
Generate synthetic data:
A random 3-dimensional dataset with 100 samples is created using
np.random.rand.
Standardise the data:
The data is standardised using StandardScaler to have a mean of 0 and a
standard deviation of 1 for each feature.
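The effect of standardisation described above can be checked directly: after StandardScaler, each feature has (approximately) zero mean and unit standard deviation. A minimal, self-contained sketch of that check:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small synthetic dataset: 100 samples, 3 features
rng = np.random.default_rng(0)
X = rng.random((100, 3))

# Standardise: subtract each feature's mean, divide by its standard deviation
X_std = StandardScaler().fit_transform(X)

# Every column now has mean ~0 and standard deviation ~1
print(np.allclose(X_std.mean(axis=0), 0.0))  # True
print(np.allclose(X_std.std(axis=0), 1.0))   # True
```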
Program :
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Generate synthetic data with three clusters
n_samples = 300
n_features = 2
n_clusters = 3
X, _ = make_blobs(n_samples=n_samples, n_features=n_features,
                  centers=n_clusters, random_state=42)

# Fit K-Means and assign each point to a cluster
kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
kmeans.fit(X)
labels = kmeans.predict(X)

# Plot the clustered points and the cluster centres
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            c='red', marker='x', s=100, label='Centres')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.title('Clustering Results')
plt.show()
Output :
Explanation :
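One way to judge the clustering produced above is the silhouette score, which measures how well each point fits its own cluster versus the nearest other cluster. This is a supplementary sketch; silhouette_score is not used in the original program:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Same kind of synthetic data as in the program above
X, _ = make_blobs(n_samples=300, n_features=2, centers=3, random_state=42)

# Fit K-Means and score the clustering (silhouette ranges from -1 to 1)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
labels = kmeans.fit_predict(X)
score = silhouette_score(X, labels)
print(f"Silhouette score: {score:.3f}")  # closer to 1 means better-separated clusters
```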
Program :
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA

# Generate a synthetic random-walk time series with a daily date index
np.random.seed(0)
n = 100
values = np.cumsum(np.random.randn(n))
dates = pd.date_range('2023-01-01', periods=n, freq='D')
time_series = pd.Series(values, index=dates)

# Plot the series
plt.figure(figsize=(10, 6))
plt.plot(time_series)
plt.xlabel('Date')
plt.ylabel('Value')
plt.show()

# Fit an ARIMA(1, 1, 1) model on the full series
train_data = time_series
p, d, q = 1, 1, 1
model = ARIMA(train_data, order=(p, d, q))
results = model.fit()

# Forecast the next 10 periods
forecast_periods = 10
forecast_values = results.forecast(steps=forecast_periods)

# Plot the history together with the forecast
plt.figure(figsize=(10, 6))
plt.plot(train_data, label='Observed')
plt.plot(forecast_values, label='Forecast')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()
Output :
Program :
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Generate data with a linear relationship plus noise
np.random.seed(0)
X = np.random.rand(100, 1)
y = 2 * X + 1 + np.random.randn(100, 1)

# Fit the model and make predictions
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)

# Plot the data and the fitted line
plt.scatter(X, y, label='Data')
plt.plot(X, y_pred, color='red', linewidth=2, label='Linear regression')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()
Explanation :
Random data is generated for the independent variable (X) and dependent
variable (y). The relationship between X and y is linear with some random
noise. The np.random.randn function is used to add random noise to the linear
relationship.
The model.fit(X, y) method is used to fit the linear regression model to the data.
This process estimates the coefficients of the linear relationship between X and
y.
Make predictions:
The model is used to make predictions on the same independent variable (X),
and the predicted values are stored in the y_pred variable using the
model.predict(X) method.
A scatter plot is created to visualize the original data points. The data points are
represented by blue dots using plt.scatter.
A red line is plotted to represent the linear regression model's fit to the data. The
y_pred values are plotted against the original X values. This line is drawn using
plt.plot with a specified color and linewidth.
X and y-axis labels are set using plt.xlabel and plt.ylabel. A legend is added to
distinguish between the data and the linear regression line using plt.legend.
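Since the data is generated as y = 2x + 1 plus noise, the fitted coefficients should land near slope 2 and intercept 1. A sketch of reading them back from the fitted model (the exact values depend on the random noise):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same data-generating process as the program above
np.random.seed(0)
X = np.random.rand(100, 1)
y = 2 * X + 1 + np.random.randn(100, 1)

model = LinearRegression().fit(X, y)
slope = model.coef_[0][0]        # estimated coefficient of X
intercept = model.intercept_[0]  # estimated constant term
print(f"slope ~ {slope:.2f}, intercept ~ {intercept:.2f}")
```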
Program :
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Points along the sigmoid curve
x = np.linspace(-6, 6, 100)
y = sigmoid(x)

# Sample data points to overlay on the curve
# (x-positions are illustrative, chosen to match the nine y-values)
data_x = np.linspace(-4, 4, 9)
data_y = np.array([0.05, 0.1, 0.15, 0.4, 0.6, 0.7, 0.85, 0.9, 0.95])

plt.figure(figsize=(8, 6))
plt.plot(x, y, label='Sigmoid curve')
plt.scatter(data_x, data_y, color='red', label='Data points')
plt.title('S-shaped Curve with Data Points')
plt.xlabel('X')
plt.ylabel('Sigmoid(X)')
plt.legend()
plt.grid(True)
plt.show()
Output :
Explanation :
import numpy as np: This imports the NumPy library and gives it the alias np to
make it easier to use in the code.
The sigmoid function takes a value x as input and returns the sigmoid function's
output, which is calculated as 1 / (1 + np.exp(-x)). This is the formula for the
sigmoid function.
y = sigmoid(x): This line computes the corresponding y-values for the sigmoid
curve by applying the sigmoid function to each value in the x array.
data_x and data_y represent some data points that you want to plot on the same
graph.
plt.figure(figsize=(8, 6)): This line creates a figure for the plot with a specified
size of 8x6 inches.
plt.xlabel('X') and plt.ylabel('Sigmoid(X)'): These set the labels for the x-axis
and y-axis, respectively.
plt.title('S-shaped Curve with Data Points'): This sets the title for the plot.
plt.legend(): This displays the legend on the plot to label the sigmoid curve and
data points.
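Two properties of the sigmoid follow directly from the formula 1 / (1 + exp(-x)): the curve passes through (0, 0.5), and it is symmetric about that point, i.e. sigmoid(-x) = 1 - sigmoid(x). A quick numerical check:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# The curve passes through (0, 0.5)
print(sigmoid(0))  # 0.5

# Symmetry about the midpoint: sigmoid(-x) = 1 - sigmoid(x)
x = np.linspace(-6, 6, 100)
print(np.allclose(sigmoid(-x), 1 - sigmoid(x)))  # True

# The output always lies strictly between 0 and 1
y = sigmoid(x)
print(bool(np.all((y > 0) & (y < 1))))  # True
```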
Program :
import numpy as np
from scipy import stats

np.random.seed(0)
# Two samples to compare (illustrative values)
sample_A = np.random.normal(50, 5, 30)
sample_B = np.random.normal(52, 5, 30)
alpha = 0.05

# Two-sample t-test at the 5% significance level
t_statistic, p_value = stats.ttest_ind(sample_A, sample_B)
print(f'T-Statistic: {t_statistic}')
print(f'P-Value: {p_value}')
if p_value < alpha:
    print('Reject the null hypothesis: the sample means differ significantly.')
else:
    print('Fail to reject the null hypothesis: no significant difference.')
Explanation :
Importing libraries:
import numpy as np: This imports the NumPy library and gives it the alias np to
make it easier to use in the code.
from scipy import stats: This imports the stats module from SciPy, which
provides the two-sample t-test.
alpha = 0.05: This sets the significance level used to judge the test result.
The t-statistic measures how far apart the two sample means are relative to the
variability within the samples; the p-value is the probability of observing a
difference at least this large if the null hypothesis of equal means were true.
print(f'T-Statistic: {t_statistic}') and print(f'P-Value: {p_value}'): These
lines print the test statistic and the p-value.
If p_value is less than alpha, the null hypothesis is rejected; otherwise, the
code reports that there is no significant difference between the sample means.
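The t-statistic printed by the program above can be cross-checked by hand using the pooled two-sample formula, which is what scipy's ttest_ind computes by default. A sketch with illustrative sample values:

```python
import numpy as np
from scipy import stats

# Two illustrative samples of equal size
a = np.array([5.1, 4.9, 5.3, 5.0, 5.2, 4.8])
b = np.array([5.6, 5.4, 5.7, 5.5, 5.8, 5.3])

# Pooled two-sample t-statistic (equal variances assumed)
n1, n2 = len(a), len(b)
sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

# scipy's ttest_ind uses the same pooled formula by default
t_scipy, p_value = stats.ttest_ind(a, b)
print(np.isclose(t_manual, t_scipy))  # True
```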
Program :
import numpy as np
from scipy import stats

# Five sample values per group (illustrative values)
group_A = [23, 25, 27, 22, 26]
group_B = [30, 31, 29, 32, 28]
group_C = [18, 20, 19, 21, 17]
alpha = 0.05

# One-way ANOVA across the three groups
f_statistic, p_value = stats.f_oneway(group_A, group_B, group_C)
print("F-statistic:", f_statistic)
print("p-value:", p_value)
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference among group means.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference among group means.")
Explanation :
1. Importing libraries:
- `import numpy as np`: Imports the NumPy library with the alias `np` for
numerical operations.
- `from scipy import stats`: Imports the `stats` module from the SciPy library,
which contains statistical functions and tests, including ANOVA.
- `group_A`, `group_B`, and `group_C` are Python lists representing the data
for three groups. Each group contains five sample values.
- The F-statistic is a test statistic that measures the variance between group
means relative to the variance within the groups.
- `print("p-value:", p_value)`: This line prints the p-value obtained from the
ANOVA test.
- `if p_value < alpha:`: This condition checks whether the p-value is less than
the significance level. If it is, you reject the null hypothesis.
- If the condition is not met, the code prints, "Fail to reject the null hypothesis:
There is no significant difference among group means."
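The F-statistic described above is the ratio of the between-group mean square to the within-group mean square, and it can be computed by hand and compared with scipy's f_oneway. A sketch with illustrative group data:

```python
import numpy as np
from scipy import stats

# Three illustrative groups of five values each
groups = [np.array([23., 25, 27, 22, 26]),
          np.array([30., 31, 29, 32, 28]),
          np.array([18., 20, 19, 21, 17])]

k = len(groups)                      # number of groups
n = sum(len(g) for g in groups)      # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))
f_scipy, p_value = stats.f_oneway(*groups)
print(np.isclose(f_manual, f_scipy))  # True
```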
Program :
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from sklearn import tree

# Load the Iris dataset and split it into training and testing sets
data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the classifier and evaluate it on the held-out test set
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

# Visualise the fitted tree
plt.figure(figsize=(12, 8))
tree.plot_tree(clf, filled=True, feature_names=data.feature_names,
               class_names=data.target_names)
plt.show()
Explanation :
1. Importing libraries:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from sklearn import tree
load_iris: This function is used to load the Iris dataset, a well-known dataset in
machine learning.
DecisionTreeClassifier: This class is used to create a Decision Tree Classifier
model.
train_test_split: This function is used to split the dataset into training and testing
sets.
accuracy_score: This function is used to calculate the accuracy of the model.
matplotlib.pyplot: This library is used for data visualization.
tree: This module is used for visualizing the decision tree.
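To make the roles of train_test_split and accuracy_score concrete, here is a small self-contained sketch using toy labels (the values are illustrative, not from the Iris program):

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# accuracy_score is simply the fraction of matching labels
y_true = [0, 1, 1, 2]
y_pred = [0, 1, 2, 2]
print(accuracy_score(y_true, y_pred))  # 0.75 (3 of 4 correct)

# train_test_split partitions the samples; test_size=0.25 holds out 1 of 4
X = [[0], [1], [2], [3]]
X_train, X_test, y_train, y_test = train_test_split(
    X, y_true, test_size=0.25, random_state=42)
print(len(X_train), len(X_test))  # 3 1
```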