Important Questions

Data Manipulation

a) Write a Python function to find the sum of all even numbers in a given list of integers.
Example:
Input: [1, 2, 3, 4, 5, 6]
Output: 12

b) Write a Python function that takes a sentence as input and returns the number of vowels (both
uppercase and lowercase) in it.
Example:
Input: "Hello, World!"
Output: 3
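One possible approach to parts a) and b) is sketched below. It is a minimal illustration rather than a model answer; the function names sum_even and count_vowels are assumptions chosen for this example.

```python
def sum_even(numbers):
    """Return the sum of the even integers in `numbers`."""
    return sum(n for n in numbers if n % 2 == 0)


def count_vowels(sentence):
    """Count the vowels a, e, i, o, u in `sentence`, ignoring case."""
    vowels = set("aeiou")
    return sum(1 for ch in sentence.lower() if ch in vowels)


print(sum_even([1, 2, 3, 4, 5, 6]))    # 12
print(count_vowels("Hello, World!"))   # 3
```

Lowercasing the sentence once keeps the vowel set small while still covering uppercase letters.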

Car price prediction using Multivariable Regression Machine Learning model

You are given a dataset (model_selection_data.csv) with features 'fueltype', 'horsepower',
'compressionratio' and target variable 'price'. Perform the following tasks:

a) Load the dataset and split it into 70% training and 30% testing sets.
b) Create a multivariable regression model.
c) Determine which model performs better and report your findings.

Clustering of the 'iris' dataset using the K-Means Clustering Machine Learning model

Consider the 'iris' dataset available in scikit-learn. Perform the following tasks:

a) Implement k-means clustering on the dataset with 'n_clusters' set to 3.
b) Visualize the clustered data with different colors for each cluster.
c) Calculate the silhouette score to evaluate the clustering performance.

Case Study Question: Building a Handwritten Digit Recognizer with a Feedforward Neural
Network

In this case study, you are tasked with building a handwritten digit recognizer using a feedforward neural
network. The goal is to accurately classify the digits from the MNIST dataset. Follow these steps to
complete the task:

1. Data Preparation:
o Load the MNIST dataset using the TensorFlow library.
o Preprocess the data by normalizing pixel values and splitting it into training and testing sets.
2. Network Architecture:
o Design a feedforward neural network architecture for the digit recognizer.
o Specify the number of input neurons, hidden layers, activation functions, and output neurons.
3. Model Compilation:
o Compile the model by selecting an appropriate optimizer, loss function, and evaluation
metric.
4. Model Training:
o Train the neural network using the training data.
o Monitor the training process by tracking the loss and accuracy on the validation set.
5. Model Evaluation:
o Evaluate the trained model on the testing dataset.
o Calculate and report the accuracy of the model's predictions.
6. Predictions:
o Select a few images from the testing dataset.
o Use your trained model to make predictions on these images.
o Display the images along with their predicted and true labels.
7. Hyperparameter Tuning:
o Experiment with different hyperparameters, such as learning rate, number of hidden neurons,
and batch size.

Create various visualizations to analyze the sales of products by month

You are given a dataset (sales_dataset.csv) containing information about the sales of different products over a
year. The dataset includes the following columns:

• Product: The name of the product.
• Month: The month in which the sales were recorded.
• Sales: The total sales of the product in that month.

Your task is to create various visualizations to analyze the sales data and answer the following questions:

a) Visualize the total sales of all products over the months using a line plot. Which month had the
highest sales?
b) Create a bar plot to compare the sales of different products in a specific month (you can choose any
month). Which product had the highest sales in that month?
c) Create a scatter plot to examine the relationship between the sales of two products. Is there any
correlation between their sales?
Make sure to add appropriate labels, titles, and legends to make the plots informative and visually
appealing.
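Before plotting, it can help to compute the quantities the plots are meant to show. The sketch below does this with a handful of made-up rows in (Product, Month, Sales) form. The values and the helper names total_sales_by_month and pearson are assumptions for illustration and are not taken from sales_dataset.csv. A plotting library such as matplotlib would then draw the line, bar, and scatter plots from these results.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical rows; the numbers are invented purely for illustration.
rows = [
    ("A", "Jan", 10), ("B", "Jan", 20),
    ("A", "Feb", 20), ("B", "Feb", 40),
    ("A", "Mar", 30), ("B", "Mar", 60),
]


def total_sales_by_month(rows):
    """Sum the sales of all products for each month."""
    totals = defaultdict(int)
    for _product, month, sales in rows:
        totals[month] += sales
    return dict(totals)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


totals = total_sales_by_month(rows)
best_month = max(totals, key=totals.get)  # month with the highest total sales

# Monthly sales of two products, in the same month order, for the scatter plot.
sales_a = [s for p, _m, s in rows if p == "A"]
sales_b = [s for p, _m, s in rows if p == "B"]

print(totals, best_month, pearson(sales_a, sales_b))
```

A coefficient near 1 or -1 suggests a strong linear relationship between the two products, while a value near 0 suggests little linear relationship.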

Student GPA prediction using Machine Learning

You are given a dataset of student exam scores (exam_scores.csv) with columns 'study_hour' and 'gpa'.
Perform the following tasks:

a) Load the dataset and visualize the relationship between 'study_hour' and 'gpa'.
b) Split the dataset into 80% training and 20% testing sets.
c) Train a linear regression model on the training data to predict the 'gpa' based on the 'study_hour'.
d) Calculate the model's performance.
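Part d) does not name a metric. Two common choices for regression are mean squared error and the coefficient of determination (R squared), and the sketch below assumes those two. The helper names mse and r_squared and the sample values are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical test-set GPAs and model predictions, for illustration only.
y_true = [2.0, 3.0, 4.0]
y_pred = [2.5, 3.0, 3.5]

print(mse(y_true, y_pred))        # about 0.1667
print(r_squared(y_true, y_pred))  # 0.75
```

Both metrics should be computed on the 20% testing set so that they reflect performance on data the model has not seen.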

Image Classification with CNN on CIFAR-10

You are tasked with building a Convolutional Neural Network (CNN) model to classify images from the
CIFAR-10 dataset. CIFAR-10 consists of 60,000 32x32 color images in 10 different classes, with 6,000
images per class. Your goal is to design and train a CNN architecture that achieves high accuracy on this
dataset.

• Data Preparation:
a. Load the CIFAR-10 dataset using suitable libraries.
b. Preprocess the data by normalizing pixel values and performing data augmentation.
• CNN Architecture:
a. Design a CNN architecture with suitable layers, such as convolutional, pooling, and fully
connected layers.
b. Experiment with different hyperparameters like kernel size, number of filters, activation
functions, etc.
• Model Training:
a. Compile the model with an appropriate loss function and optimizer.
b. Train the model on the training data and validate it using the validation data.
c. Monitor training progress and plot training/validation accuracy and loss curves.
• Evaluation and Analysis:
a. Evaluate your trained model on the test set and calculate the overall accuracy.
b. Display a confusion matrix to analyze the performance of your model on each class.
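The evaluation step can be illustrated independently of any deep learning library. The sketch below builds a confusion matrix from integer class labels and derives the overall accuracy from it. The labels are hypothetical and use 3 classes for brevity (CIFAR-10 would use 10); the helper names are assumptions for this example.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Return a matrix where entry [i][j] counts samples of true class i
    that were predicted as class j."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Overall accuracy: correct predictions (the diagonal) over all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical true and predicted labels, for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
for row in cm:
    print(row)
print(accuracy_from_matrix(cm))
```

Off-diagonal entries show which pairs of classes the model confuses, which the overall accuracy alone does not reveal.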

Answer the scenario-based questions according to your understanding

Scenario 1: You are working on a machine learning project that involves predicting house prices based
on various features such as area, number of bedrooms, and location. Which machine learning algorithm
would you choose for this regression task, and why? Discuss the factors that influenced your decision.

Scenario 2: You are working on a project that requires reading data from a CSV file and performing
various operations on the data. Which Python libraries or modules would you use to efficiently handle
CSV files and process the data? Explain your choices.
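As one illustration for Scenario 2, Python's standard library csv module can read rows without any third-party dependency; pandas is a common alternative for larger analyses. In the sketch below, an in-memory string stands in for a file on disk. The column names, values, and helper names are hypothetical.

```python
import csv
import io

# Stand-in for a CSV file on disk; contents are invented for illustration.
csv_text = "name,score\nalice,80\nbob,90\n"


def read_rows(file_obj):
    """Read all rows as dictionaries keyed by the header line."""
    return list(csv.DictReader(file_obj))


def average(rows, column):
    """Average of a numeric column; csv yields strings, so convert first."""
    values = [float(row[column]) for row in rows]
    return sum(values) / len(values)


rows = read_rows(io.StringIO(csv_text))
print(rows[0]["name"])         # alice
print(average(rows, "score"))  # 85.0
```

With a real file, the same read_rows function can be given the object returned by open(path, newline="").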

Scenario 3: You are working on a project that involves classifying emails as either spam or not spam. The
dataset is imbalanced, with a significantly higher number of non-spam emails compared to spam emails.
How would you handle this class imbalance issue when training a machine learning model? Explain your
approach.
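One option among several for Scenario 3 is to weight classes inversely to their frequency; resampling is another. The sketch below computes weights as n_samples / (n_classes * class_count), which is the heuristic scikit-learn documents for class_weight="balanced". The label counts and the helper name balanced_class_weights are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count),
    so that rarer classes receive larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced label set: 90 non-spam and 10 spam emails.
labels = ["ham"] * 90 + ["spam"] * 10
weights = balanced_class_weights(labels)
print(weights)
```

With imbalanced data, metrics such as precision, recall, and F1 on the spam class are usually more informative than plain accuracy.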

Scenario 4: You are working on a sentiment analysis project and have collected a large dataset of movie
reviews. You want to preprocess the text data by removing stop words, performing lemmatization, and
converting the text into numerical features. Which Python libraries or techniques would you use for
these preprocessing tasks? Justify your choices.

Write programs for the following questions. Each question carries equal marks. (marks 5*4)

1. Write a Python function to compute the mean and standard deviation of a dataset X. (No
libraries allowed)
2. Implement a Python program to perform simple linear regression using the closed-form
solution. Given X and y, calculate the slope and intercept of the line.
3. Create a Python function that calculates the sigmoid function for a given input value z.
4. Implement a function in Python to calculate the Euclidean distance between two points p1 and
p2.
5. Implement an SVM classifier using scikit-learn for a dataset with two classes. Load the data, fit
the model, and predict the class of a new sample.
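Items 1 to 4 can be sketched in plain Python as shown below; item 5 depends on scikit-learn and is not sketched here. Two assumptions are made. The standard deviation is the population form (dividing by n), since the question does not say whether the sample form is wanted. X and y in item 2 are taken to be one-dimensional lists of equal length. The function names are illustrative.

```python
import math


def mean_and_std(X):
    """Item 1: mean and population standard deviation, using no libraries."""
    n = len(X)
    mean = sum(X) / n
    variance = sum((x - mean) ** 2 for x in X) / n  # divide by n (assumption)
    return mean, variance ** 0.5


def simple_linear_regression(X, y):
    """Item 2: closed-form slope and intercept for y = slope * x + intercept."""
    n = len(X)
    mean_x = sum(X) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(X, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in X)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sigmoid(z):
    """Item 3: 1 / (1 + e^(-z)), written to avoid overflow for very negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def euclidean_distance(p1, p2):
    """Item 4: straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))


print(mean_and_std([2, 4, 4, 4, 5, 5, 7, 9]))            # (5.0, 2.0)
print(simple_linear_regression([1, 2, 3], [3, 5, 7]))    # (2.0, 1.0)
print(sigmoid(0))                                        # 0.5
print(euclidean_distance((0, 0), (3, 4)))                # 5.0
```

The slope is the covariance of X and y divided by the variance of X, and the intercept follows from requiring the line to pass through the point of means.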
