0% found this document useful (0 votes)

34 views22 pages

ml2 Paper

The document is a term paper report submitted by Sujeet Kumar Behera for the Bachelor of Technology degree in Computer Science Engineering at Lovely Professional University. It focuses on the development of a personal finance machine learning model, detailing the objectives, methodology, and expected outcomes of the project. The report includes sections on theoretical background, hardware and software requirements, and a structured approach to implementing the machine learning model for financial management.

Uploaded by

Sujeet Kumar Behera

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

34 views22 pages

ml2 Paper

Uploaded by

Sujeet Kumar Behera

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Annexure-I

A Term Paper Report

Submitted in partial fulfilment of the requirements for the award of degree of
Bachelor of Technology
(Computer Science Engineering)
Submitted to

LOVELY PROFESSIONAL UNIVERSITY

PHAGWARA ,PUNJAB
From 1st August, 2024 to October 25, 2024
SUBMITTED BY
Name of student :Sujeet Kumar Behera
Registration number:12105351
Faculty:Sajjad Manzoor Mir

1
Annexure-II: Student Declaration

To whom so ever it may concern

I, Sujeet Kumar Behera, 12105351, hereby declare that the work done by me on " Personal
finances Ml model" from August 1, 2024 to October 25, 2024, is a record of original work for
the partial fulfilment of the requirements for the award of the degree, Bachelor of Technology.
Name of the student: Sujeet Kumar Behera

Registration Number: 12105351

Dated: 22 October 2024

ACKNOWLEDGEMENT

Primarily I would like to thank God for being able to learn a new technology. Then I would
like to express my special thanks of gratitude to the teacher and instructor of the course
Machine Learning who provided me the golden opportunity to learn a new technology.
I would like to also thank my own college Lovely Professional University for offering such a
course which not only improve my programming skill but also taught me other new
technology.
Then I would like to thank my parents and friends who have helped me with their valuable
suggestions and guidance for choosing this course.
Finally, I would like to thank everyone who have helped me a lot.

2
SUPERVISOR’S CERTIFICATE
This is to certify that the work reported in the B. Tech Dissertation/dissertation proposal
entitled “Personal finances ML model”, submitted by Sujeet kumar Behera at Lovely
Professional University, Phagwara, India is a bonafide record of his original work carried out
under my supervision. This work has not been submitted elsewhere for any other degree.

Signature of Supervisor
Sajjad Manzoor Mir

3
[Link] Contents Page

1 TITLE 1
2 STUDENT DECLARATION 2

3 ACKNOWLEGDEMENT 2
4 TABLE OF CONTENT 3
5 ABSTRACT 4

6 OBJECTIVE 4
7 INTRODUCTION 5

8 THEORETICAL BACKGROUND 7

9 HARDWARE & SOFTWARE 9

10 METHODOLOGY 9

11 RESULTS 21

12 SUMMARY 21

13 COLCLUSION 22

14 BIBLIOGRAPHY 23

4
Abstract
Personal finances represent the individual or familial funds that one autonomously
oversees. Mastery in managing personal finances necessitates specialized training.
This project aims to devise a personal finance simulator and establish a machine
learning-based system to discern optimal financial strategies for individuals.
Personal finance is a multifaceted domain that encompasses the management of
individual or familial financial resources. In today's complex economic landscape,
effective management of personal finances is paramount for achieving financial
stability, security, and long-term prosperity. This abstract explores the fundamental
principles and challenges of personal finance, including budgeting, saving, investing,
debt management, and risk mitigation. It delves into the importance of financial
literacy and education in empowering individuals to make informed decisions about
their money. Furthermore, it discusses emerging technologies and tools, such as
mobile apps and online platforms, that facilitate financial management and planning.

Objectives of model
In the future, personal finance is expected to undergo significant transformations
driven by emerging trends and evolving consumer demands. One prominent trend is
the continued digitalization of financial services, with a growing integration of
technology and fintech solutions. This integration is anticipated to enhance
accessibility, automation, and personalization of financial products and services.
Additionally, there is a rising demand for holistic financial wellness programs,
emphasizing education, budgeting, saving, and retirement planning, offered by both
employers and financial institutions. Another notable trend is the increasing interest in
impact investing and consideration of environmental, social, and governance (ESG)
criteria in investment decisions. Personalized financial advice is also expected to
become more prevalent, facilitated by advanced data analytics, machine learning,
and artificial intelligence technologies.

5
Introduction (Personal finances )
As financial matters increasingly pervade our lives, it's crucial to stay abreast of
current economic trends and market dynamics. Equally important is the mastery of
personal finance management, an essential skill set for individuals and families alike.
Personal finances encompass one's own capital, managed autonomously,
necessitating dedicated training in effective financial management techniques.
incorporating the applicable criteria that follow.
Various scientific and applied sources worldwide extensively discuss a wide range
of approaches to personal finance management and planning. Typically, this issue is
analyzed through the lens of effectively managing and ensuring the safety of personal
fund liquidity.

Theoretical Background
What is Machine Learning?
Machine learning is a subfield of artificial intelligence (AI) that uses algorithms trained on
data sets to create self-learning models that are capable of predicting outcomes and
classifying information without human intervention. Machine learning is used today for a wide
range of commercial purposes, including suggesting products to consumers based on their
past purchases, predicting stock market fluctuations, and translating text from one language
to another. In common usage, the terms “machine learning” and “artificial intelligence” are
often used interchangeably with one another due to the prevalence of machine learning for
AI purposes in the world today. But, the two terms are meaningfully distinct. While AI refers
to the general attempt to create machines capable of human-like cognitive abilities, machine
learning specifically refers to the use of algorithms and data sets to do so.
1)Supervised Learning Models:

Supervised learning is a type of machine learning where the model learns a mapping between input
features and output labels based on labeled training data. In supervised learning, the algorithm learns
from a dataset that contains input-output pairs, where the inputs are the features or attributes, and the
outputs are the corresponding labels or target variables. The goal is to learn a mapping function from
the input variables to the output variable.

Regression Models: Predicts continuous values based on input features. Examples include Linear
Regression, Polynomial Regression, and Support Vector Regression.

Classification Models: Predicts class labels or discrete outcomes. Examples include Logistic
Regression, Decision Trees, Random Forests, and Support Vector Machines.

2) Unsupervised Learning Models:

Clustering Models: Groups similar data points together based on some similarity metric. Examples
include K-Means Clustering.

6
3)Semi-Supervised Learning Models:

Combines Supervised and Unsupervised Learning: Uses both labeled and unlabeled data for training.
Examples include Self-training and Co-training algorithms.

MODELS USED IN PROJECTS:

Linear Regression: A simple regression model that models the relationship between
a dependent variable and one or more independent variables by fitting a linear
equation to observed data.
Logistic Regression: A classification model used to model the probability of a
binary outcome based on one or more independent variables. Despite its name,
logistic regression is a linear model used for classification tasks.
Decision Trees: A non-linear model that makes decisions based on a set of rules
learned from the data. Decision trees partition the feature space into regions, with
each region representing a decision.
Random Forests: An ensemble learning method that builds multiple decision trees
during training and combines their predictions through averaging or voting to improve
predictive performance and reduce overfitting.
Support Vector Machines (SVM): A supervised learning algorithm used for
classification and regression tasks. SVMs find the hyperplane that best separates
the data points into different classes or predicts continuous values.

What is KNN?
The K-Nearest Neighbors (KNN) algorithm is a supervised machine learning method
employed to tackle classification and regression problems. Evelyn Fix and Joseph
Hodges developed this algorithm in 1951, which was subsequently expanded by
Thomas Cover. The article explores the fundamentals, workings, and implementation
of the KNN algorithm.
What is the K-Nearest Neighbors Algorithm?
KNN is one of the most basic yet essential classification algorithms in machine
learning. It belongs to the supervised learning domain and finds intense application
in
pattern recognition, data mining, and intrusion detection.
It is widely disposable in real-life scenarios since it is non-parametric, meaning it
does
not make any underlying assumptions about the distribution of data (as opposed to
other algorithms such as GMM, which assume a Gaussian distribution of the given
data). We are given some prior data (also called training data), which classifies
coordinates into groups identified by an attribute.

7
HARDWARE & SOFTWARE REQUIREMENTS
HARDWARE REQUIREMENTS

GPU (Graphics Processing Unit):

Deep learning models, especially large ones like convolutional neural networks (CNNs) or recurrent
neural networks (RNNs), benefit significantly from GPU acceleration.

NVIDIA GPUs are the most commonly used for deep learning due to their robust support for
frameworks
like TensorFlow and PyTorch.

The choice of GPU depends on your budget and requirements. High-end GPUs like NVIDIA GeForce
RTX
series or NVIDIA Tesla series are popular choices for deep learning workstations and servers.

CPU (Central Processing Unit):

Although not as crucial as GPUs for deep learning, CPUs are still important for tasks like data
preprocessing, model deployment, and handling non-GPU accelerated operations.
Multi-core CPUs with high clock speeds are preferable to speed up data processing tasks.

SOFTWARE REQUIREMENTS
Python:
Most deep learning frameworks are Python-based, so a working Python installation (preferably the
latest
version) is necessary.

Package management tools like pip or conda are useful for installing and managing Python packages
and
dependencies.

Development Environment:
Set up a development environment with integrated development environments (IDEs) like PyCharm,
Visual Studio Code, or Jupyter Notebooks for coding and experimentation.
Containerization tools like Docker can help manage project dependencies and ensure reproducibility
across different environments

METHODOLOGY
Importing libraries
1)import pandas as pd -pandas is a fast, powerful, flexible and easy to use open source
data analysis and manipulation tool, built on top of the Python programming language.
2)import numpy - Fast and versatile, the NumPy vectorization, indexing, and broadcasting
concepts are the de-facto standards of array computing today.

8
3)import ploty -Plotly's Python graphing library makes interactive, publication-quality
graphs. Examples of how to make line plots, scatter plots, area charts, bar charts, error bars,
box plots, histograms, heatmaps, subplots, multiple-axes, polar charts, and bubble charts.
4)import matplotlib - Matplotlib is a comprehensive library for creating static, animated, and
interactive visualizations in Python. Matplotlib makes easy things easy and hard things
possible.
5)import warnings - Warning messages are typically issued in situations where it is useful
to alert the user of some condition in a program, where that condition (normally) doesn’t
warrant raising an exception and terminating the program.
6) from sklearn.model_selection import train_test_split - This function allows you to
easily split your dataset into training and testing sets, which is crucial for evaluating the
performance of your machine learning models.
7) from sklearn.linear_model import LinearRegression- Linear regression is a simple yet
powerful technique used for predicting a continuous target variable based on one or more
predictor variables.
8) from [Link] import mean_squared_error, r2_score-Imports two important
functions, mean_squared_error and r2_score, from the [Link] library in Python.
These functions are commonly used for evaluating the performance of regression models.
9) from sklearn.linear_model import LogisticRegression-Imports the LogisticRegression
class from the sklearn.linear_model library in Python. This class implements the Logistic
Regression algorithm, which is a popular machine learning method used for classification
problems.
10) from [Link] import accuracy_score, classification_report,
confusion_matrix- Imports three beneficial functions for evaluating the performance of
classification models in Python, all from the [Link] library. These functions are
instrumental in assessing how well your classification model is performing.
accuracy_score: This function calculates the overall accuracy of your classification model.
It determines the proportion of predictions that your model got correct. It's calculated by
dividing the number of correctly classified samples by the total number of samples. A higher
accuracy score indicates better model performance.

classification_report: This function provides a more comprehensive assessment of your

model's performance for each class. It presents metrics like precision, recall, F1-score, and
support for each class label.

11) from [Link] import DecisionTreeRegressor- Imports the DecisionTreeRegressor

class from the [Link] library in Python. This class is used to create decision tree
regression models, which are a type of machine learning model well-suited for predicting
continuous target variables.

9
12) from [Link] import RandomForestRegressor-
Imports the RandomForestRegressor class from the [Link] library in Python. This
class is used to create random forest regression models, which are a powerful ensemble
machine learning technique for regression tasks.
13) from [Link] import SVR- Imports the SVR class from the [Link] library in
Python. This class is used for implementing Support Vector Regression (SVR), which is a
powerful technique for regression problems from the world of Support Vector Machines
(SVMs).

APPROACH
Phase 1: Data Collection and Preprocessing (Month 1):

Gather financial data including bank transactions, investment portfolios, income sources,
expenses, and credit card statements from users through secure APIs or data integrations.
Preprocess the financial data to handle missing values, categorize transactions, identify
recurring expenses, and aggregate data into meaningful features for analysis.
Collect external data sources such as economic indicators, market trends, and financial
news to provide contextual information for financial decision-making.

Phase 2: Machine Learning Model Development (Month 2):

Develop ML models to analyze financial data and provide personalized recommendations for
budgeting, saving, investing, and debt management.
Implement NLP algorithms to understand user queries and provide relevant responses,
allowing users to interact with the assistant through natural language interfaces (e.g.,
chatbots, voice assistants).
Train the ML models using historical financial data and user interactions, optimizing model
parameters to maximize accuracy and relevance of recommendations

Phase 3: System Integration and Evaluation (Month 3):

Integrate the ML models into the intelligent personal finance assistant platform, allowing
seamless interaction with users across various devices and channels.
Develop a user-friendly interface or mobile application for users to access their financial data,
receive personalized recommendations, and track progress towards their financial goals.
Evaluate the performance of the personal finance assistant through user testing and
validation studies, assessing metrics such as accuracy of recommendations, user
satisfaction ratings, and adherence to financial goals.

10
Deploy the personal finance assistant in real-world settings, collaborating with financial
institutions, fintech companies, and consumer platforms to promote adoption and integration
into daily financial routines.

4. Expected Outcomes:
Development of an intelligent personal finance assistant leveraging ML techniques to
provide personalized financial guidance and automate routine tasks.
Empowerment of individuals to make informed financial decisions, achieve their financial
goals, and improve their financial well-being.
Potential applications in financial planning, wealth management, and consumer banking to
enhance customer engagement and satisfaction.

5. Resources Required:
Financial data sources including bank APIs, investment platforms, and third-party data
providers.
Computational resources for model training and evaluation (e.g., cloud-based servers).
Collaboration with financial experts, data scientists, and software engineers for system
development and validation.

6. Timeline:
Month 1: Data collection, preprocessing, and exploration.
Month 2: ML model development, and training.
Month 3: System integration, evaluation, and deployment.

11
Exploratory Data Analysis

import numpy as np
import pandas as pd
import os
for dirname, _, filenames in [Link]('/kaggle/input'):
for filename in filenames:
print([Link](dirname, filename))
import seaborn as sns
import [Link] as plt
df = pd.read_csv("/content/[Link]")

1) # Let's visualize the distribution of Total Household Income

[Link](figsize=(10, 6))
[Link](df['Total Household Income'], bins=30, kde=True)
[Link]('Distribution of Total Household Income')
[Link]('Total Household Income')
[Link]('Frequency')
[Link]()

12
2) # Visualizing the relationship between Total Food Expenditure and Total Household
Income
[Link](figsize=(10, 6))
[Link](x='Total Household Income', y='Total Food Expenditure', data=df)
[Link]('Total Food Expenditure vs Total Household Income')
[Link]('Total Household Income')
[Link]('Total Food Expenditure')
[Link]()

13
3) Distribution of Household Head Age
[Link](figsize=(10, 6))
[Link](df['Household Head Age'], bins=20, kde=True)
[Link]('Distribution of Household Head Age')
[Link]('Household Head Age')
[Link]('Frequency')
[Link]()

14
MACHINE LEARNING
1)Feature selections

import pandas as pd

from sklearn.model_selection import train_test_split

from [Link] import DecisionTreeClassifier

X = df[['Total Food Expenditure', 'Bread and Cereals Expenditure','Total Rice Expenditure', 'Meat Expenditure' 'Total Fish and
marine products Expenditure','Fruit Expenditure', 'Vegetables Expenditure''Restaurant and hotels Expenditure', 'Alcoholic
Beverages Expenditure', 'Tobacco Expenditure', 'Clothing, Footwear and Other Wear Expenditure', 'Housing and water
Expenditure', 'Education Expenditure','Household Head Age', 'Total Number of Family members', 'Members with age less than
5 year old','Members with age 5 - 17 years old', 'House Floor Area', 'Number of bedrooms', 'Number of Television', 'Number of
Cellular phone']]

y = df['Type of Household'] # Assuming 'Type of Household' is the target variable

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Decision Tree Classifier

dt_classifier = DecisionTreeClassifier(random_state=42)

# Fit the model on the training data

dt_classifier.fit(X_train, y_train)

# Get feature importances

feature_importances = dt_classifier.feature_importances_

# Create a DataFrame to display feature importances

feature_importance_df = [Link]({'Feature': [Link], 'Importance': feature_importances})

feature_importance_df.sort_values(by='Importance', ascending=False, inplace=True)

# Display the top N features by importance

top_n = 10 # Number of top features to display

print("Top", top_n, "features by importance:")

print(feature_importance_df.head(top_n))

Top 10 features by importance:

Feature Importance

13 Household Head Age 0.145237

14 Total Number of Family members 0.118960

16 Members with age 5 - 17 years old 0.096319

15 Members with age less than 5 year old 0.078877

11 Housing and water Expenditure 0.044962

5 Fruit Expenditure 0.043983

10 Clothing, Footwear and Other Wear Expenditure 0.043794

12 Education Expenditure 0.042492

6 Vegetables Expenditure 0.041309

4 Total Fish and marine products Expenditure 0.03922

15
2)LINEAR REGRESSION
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from [Link] import mean_squared_error, r2_score

features = df[['Total Food Expenditure', 'Bread and Cereals Expenditure',

'Total Rice Expenditure', 'Meat Expenditure',
'Total Fish and marine products Expenditure',
'Fruit Expenditure', 'Vegetables Expenditure',
'Restaurant and hotels Expenditure',
'Alcoholic Beverages Expenditure', 'Tobacco Expenditure',
'Clothing, Footwear and Other Wear Expenditure',
'Housing and water Expenditure', 'Imputed House Rental Value',
'Medical Care Expenditure', 'Transportation Expenditure',
'Communication Expenditure', 'Education Expenditure',
'Miscellaneous Goods and Services Expenditure',
'Special Occasions Expenditure', 'Crop Farming and Gardening expenses',
'Total Income from Entrepreneurial Acitivites', 'Household Head Age',
'Total Number of Family members', 'Members with age less than 5 year old',
'Members with age 5 - 17 years old', 'Total number of family members employed',
'House Floor Area', 'House Age', 'Number of bedrooms',
'Number of Television', 'Number of CD/VCD/DVD',
'Number of Component/Stereo set', 'Number of Refrigerator/Freezer',
'Number of Washing Machine', 'Number of Airconditioner',
'Number of Car, Jeep, Van', 'Number of Landline/wireless telephones',
'Number of Cellular phone', 'Number of Personal Computer',
'Number of Stove with Oven/Gas Range', 'Number of Motorized Banca',
'Number of Motorcycle/Tricycle']]

target = df['Total Household Income']

X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)

model = LinearRegression()
[Link](X_train, y_train)
y_pred = [Link](X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error:", mse)
print("R^2 Score:", r2)
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

Mean Squared Error: 9314005171.223476

R^2 Score: 0.871480868737331
Coefficients: [ 5.32581252e-01 -5.70678620e-02 -1.72753381e-01 1.22236995e-01
8.08661461e-01 3.74836547e-01 -1.03792001e+00 1.92292019e-01
5.52998249e-02 -3.39374536e-02 2.50634826e+00 6.63569309e-01
5.76781809e-01 7.38353636e-01 1.19903438e+00 4.06569099e+00
8.63597659e-01 2.78391521e+00 1.16252278e+00 4.63944657e-02
6.86203432e-01 5.53156546e+02 -5.42222318e+03 9.36880439e+03
1.28265028e+03 2.70172213e+04 3.78973689e+01 -4.78977218e+01
4.43899140e+03 1.48705584e+03 -9.83015970e+02 -3.33790241e+03
3.02957799e+03 -1.48191745e+03 2.08026636e+04 2.04822929e+04
-1.50885409e+04 7.67770979e+02 9.93547413e+03 1.32974185e+04
-5.00932214e+03 1.68024307e+03]
Intercept: -36612.75464024517

16
3) Logistic regressions
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from [Link] import accuracy_score, classification_report,confusion_matrix
features = df[['Total Food Expenditure', 'Bread and Cereals Expenditure',
'Total Rice Expenditure', 'Meat Expenditure',
'Total Fish and marine products Expenditure',
'Fruit Expenditure', 'Vegetables Expenditure',
'Restaurant and hotels Expenditure',
'Alcoholic Beverages Expenditure', 'Tobacco Expenditure',
'Clothing, Footwear and Other Wear Expenditure',
'Housing and water Expenditure', 'Imputed House Rental Value',
'Medical Care Expenditure', 'Transportation Expenditure',
'Communication Expenditure', 'Education Expenditure',
'Miscellaneous Goods and Services Expenditure',
'Special Occasions Expenditure', 'Crop Farming and Gardening expenses',
'Total Income from Entrepreneurial Acitivites', 'Household Head Age',
'Total Number of Family members', 'Members with age less than 5 year old',
'Members with age 5 - 17 years old', 'Total number of family members employed',
'House Floor Area', 'House Age', 'Number of bedrooms',
'Number of Television', 'Number of CD/VCD/DVD',
'Number of Component/Stereo set', 'Number of Refrigerator/Freezer',
'Number of Washing Machine', 'Number of Airconditioner',
'Number of Car, Jeep, Van', 'Number of Landline/wireless telephones',
'Number of Cellular phone', 'Number of Personal Computer',
'Number of Stove with Oven/Gas Range', 'Number of Motorized Banca',
'Number of Motorcycle/Tricycle'
]]
target = df['Total Household Income']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)

# Initialize the Logistic Regression model

model = LogisticRegression()

# Fit the model on the training data

[Link](X_train, y_train)

# Predict on the testing data

y_pred = [Link](X_test)

# Evaluate the model

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Additional evaluation metrics

print("Classification Report:")
print(classification_report(y_test, y_pred))

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

17
4)Decision trees
import pandas as pd

from sklearn.model_selection import train_test_split

from [Link] import DecisionTreeRegressor

from [Link] import mean_squared_error, r2_score

y = df['Type of Household'] # Assuming 'Type of Household' is the target variable

X_train, X_test, y_train, y_test = train_test_split(features, target,␣

↪ test_size=0.2, random_state=42)

model = DecisionTreeRegressor(random_state=42)

[Link](X_train, y_train)

y_pred = [Link](X_test)

# Evaluate the model

mse = mean_squared_error(y_test, y_pred)

r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)

print("R^2 Score:", r2)

Mean Squared Error: 18294548455.50716

R^2 Score: 0.7475630052677185

18
5)Random forests
import pandas as pd
from sklearn.model_selection import train_test_split
from [Link] import RandomForestRegressor
from [Link] import mean_squared_error, r2_score
X = df[['Total Food Expenditure', 'Bread and Cereals Expenditure','Total Rice Expenditure',
'Meat Expenditure' 'Total Fish and marine products Expenditure','Fruit Expenditure',
'Vegetables Expenditure''Restaurant and hotels Expenditure', 'Alcoholic Beverages
Expenditure', 'Tobacco Expenditure', 'Clothing, Footwear and Other Wear Expenditure',
'Housing and water Expenditure', 'Education Expenditure','Household Head Age', 'Total
Number of Family members', 'Members with age less than 5 year old','Members with age 5 -
17 years old', 'House Floor Area', 'Number of bedrooms', 'Number of Television', 'Number of
Cellular phone']]
y = df['Type of Household'] # Assuming 'Type of Household' is the target variable
X_train, X_test, y_train, y_test = train_test_split(features, target,␣
↪ test_size=0.2, random_state=42)
# Initialize the Random Forest Regressor
model = RandomForestRegressor(random_state=42)
# Fit the model to the training data
[Link](X_train, y_train)
# Make predictions on the testing data
y_pred = [Link](X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error:", mse)
print("R^2 Score:", r2)

Mean Squared Error: 9024413126.92843

R^2 Score: 0.8754767993030954

19
6)K -means clustering
import pandas as pd
from [Link] import KMeans
import [Link] as plt

# Assuming 'df' is your DataFrame containing features

X = df[['Total Food Expenditure', 'Bread and Cereals Expenditure','Total Rice Expenditure', 'Meat Expenditure' 'Total
Fish and marine products Expenditure','Fruit Expenditure', 'Vegetables Expenditure''Restaurant and hotels
Expenditure', 'Alcoholic Beverages Expenditure', 'Tobacco Expenditure', 'Clothing, Footwear and Other Wear
Expenditure', 'Housing and water Expenditure', 'Education Expenditure','Household Head Age', 'Total Number of
Family members', 'Members with age less than 5 year old','Members with age 5 - 17 years old', 'House Floor Area',
'Number of bedrooms', 'Number of Television', 'Number of Cellular phone']]

# Perform k-means clustering

kmeans = KMeans(n_clusters=3, random_state=42) # You can choose the number of clusters based on your data
cluster_labels = kmeans.fit_predict(features)

# Add cluster labels to the DataFrame

df['Cluster'] = cluster_labels

# Visualize the clusters (example for 2D data)

[Link](df['Total Food Expenditure'], df['Total Household Income'], c=df['Cluster'], cmap='viridis')
[Link]('Total Food Expenditure')
[Link]('Total Household Income')
[Link]('K-means Clustering')
[Link](label='Cluster')
[Link]()

from [Link] import silhouette_score

# Assuming 'df' is your DataFrame containing features and cluster labels

silhouette_avg = silhouette_score(features, cluster_labels)
print("Silhouette Score:", silhouette_avg)

Silhouette Score: 0.7608313629296152

20
RESULTS
Models ACCURACY
Linear regressions 87%
Logistic regressions 75%
Decision trees 74%
Random forests 87%
k-means clustering 76%

Summary
The benefits of ML over traditional methods as illustrated above
together with the existing but still limited number of ML applications
in finance suggest a still mostly untapped potential for future
research. However, it is unclear whether the usage of ML methods
will actually gain broad popularity in the finance community.
Furthermore, prospective users of ML need to know whether ML
applications can also reach the most prestigious journals of the
profession or if they tend to be published only in specialty journals.
Finally, the different application categories of ML described by our
taxonomy and the wide variety of research fields in finance make it
difficult to pinpoint exactly where the most promising applications
of ML in finance research lie. In this section, we give indicative
answers to these questions by systematically analysing the
existing finance literature that already uses ML methods. In
particular, we investigate the publication success of such papers
and how it differs by research field and application type. Our
results may not only indicate the future prospects of ML in finance
but also show where and how researchers can apply ML to
maximise its future potential.

21
Conclusion
ML applications in finance by analysing the ML papers published
in major finance journals. Over the last few years, there has been
a strong growth in the number of ML applications in finance, and
many of these applications reached the highest-ranked journals of
the profession. Our results suggest that ML may become even
more widespread in finance research in the coming years. They
also indicate a particularly large potential of applying ML to
unconventional data to construct superior and novel measures of
topics related to the field of corporate finance and governance.
The fields of behavioural and household finance may also offer a
mostly untapped potential for ML in future research.

BIBLIOGRAPHY

Kaggle
Geeksforgeeks
Upgrad

Overview of Machine Learning Concepts
No ratings yet
Overview of Machine Learning Concepts
101 pages
Understanding Machine Learning Concepts
No ratings yet
Understanding Machine Learning Concepts
16 pages
Training Report On Machine Learning
No ratings yet
Training Report On Machine Learning
32 pages
Machine Learning 101 Course Overview
No ratings yet
Machine Learning 101 Course Overview
39 pages
Human vs Machine Learning Explained
No ratings yet
Human vs Machine Learning Explained
34 pages
Overview of Machine Learning Concepts
100% (1)
Overview of Machine Learning Concepts
4 pages
Machine Learning for Loan Approval Prediction
No ratings yet
Machine Learning for Loan Approval Prediction
71 pages
Types of Machine Learning Explained
No ratings yet
Types of Machine Learning Explained
37 pages
ML Chatgpt
No ratings yet
ML Chatgpt
6 pages
Sat - 34.Pdf - A Systematic Approach Towards Description and Classification of Crime Incidents
No ratings yet
Sat - 34.Pdf - A Systematic Approach Towards Description and Classification of Crime Incidents
11 pages
ML Unit 1
No ratings yet
ML Unit 1
21 pages
Machine Learning.
No ratings yet
Machine Learning.
50 pages
Machine Learning For Data Science Unit-4
No ratings yet
Machine Learning For Data Science Unit-4
16 pages
ML Unit 2
No ratings yet
ML Unit 2
23 pages
Machine Learning
100% (2)
Machine Learning
31 pages
Machine Learning Internship Report
No ratings yet
Machine Learning Internship Report
27 pages
Introduction To Machine Learning
No ratings yet
Introduction To Machine Learning
19 pages
Overview of Machine Learning Types
No ratings yet
Overview of Machine Learning Types
25 pages
Introduction to Machine Learning Basics
No ratings yet
Introduction to Machine Learning Basics
11 pages
Machine Learning Is A Branch of Artificial Intelligence (AI)
No ratings yet
Machine Learning Is A Branch of Artificial Intelligence (AI)
80 pages
Prakhar Report
No ratings yet
Prakhar Report
17 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
15 pages
Module 1 (ML)
No ratings yet
Module 1 (ML)
17 pages
Machine Learning in Attendance Management
No ratings yet
Machine Learning in Attendance Management
51 pages
IJNRD2407347
No ratings yet
IJNRD2407347
5 pages
SupportcoursesM DLearning
No ratings yet
SupportcoursesM DLearning
118 pages
Unit-1 ML (Reference Guide For Students)
No ratings yet
Unit-1 ML (Reference Guide For Students)
22 pages
Book of 843 - AI - Student - HandbookXI-104-127
No ratings yet
Book of 843 - AI - Student - HandbookXI-104-127
24 pages
Machine Learning
No ratings yet
Machine Learning
33 pages
Intro To ML
No ratings yet
Intro To ML
4 pages
Machine Learning Fundamentals Course
No ratings yet
Machine Learning Fundamentals Course
18 pages
Aiml Report
No ratings yet
Aiml Report
70 pages
Loan Eligibility Prediction with ML
No ratings yet
Loan Eligibility Prediction with ML
28 pages
Understanding Machine Learning Basics
No ratings yet
Understanding Machine Learning Basics
7 pages
Machine Learning Basics: An Illustrated Guide For Non-Technical Readers
100% (5)
Machine Learning Basics: An Illustrated Guide For Non-Technical Readers
27 pages
What Is Machine Learning?
No ratings yet
What Is Machine Learning?
6 pages
Machine Learning Basics: An Illustrated Guide For Non-Technical Readers
No ratings yet
Machine Learning Basics: An Illustrated Guide For Non-Technical Readers
16 pages
Deep Learning for Stock Price Prediction
No ratings yet
Deep Learning for Stock Price Prediction
37 pages
Machine Learning Course Overview
No ratings yet
Machine Learning Course Overview
333 pages
Internship Report On Machine Learing
No ratings yet
Internship Report On Machine Learing
30 pages
Deep Learning Exam Questions Guide
No ratings yet
Deep Learning Exam Questions Guide
19 pages
Machine Learning vs Artificial Intelligence
No ratings yet
Machine Learning vs Artificial Intelligence
9 pages
Machine Learning, History and Types of ML
No ratings yet
Machine Learning, History and Types of ML
18 pages
Machine Learning Basics: An Illustrated Guide For Non-Technical Readers
50% (2)
Machine Learning Basics: An Illustrated Guide For Non-Technical Readers
27 pages
Understanding Machine Learning Basics
No ratings yet
Understanding Machine Learning Basics
54 pages
Machine Learning Course Overview and Concepts
No ratings yet
Machine Learning Course Overview and Concepts
225 pages
Machine Learning Life Cycle
No ratings yet
Machine Learning Life Cycle
25 pages
AI and ML Training Report 2024
No ratings yet
AI and ML Training Report 2024
70 pages
Introduction To Data Science Module 3
No ratings yet
Introduction To Data Science Module 3
24 pages
Machine Learning Trends and Applications
No ratings yet
Machine Learning Trends and Applications
19 pages
SK Sahidur Rahaman Bba504a 2024
No ratings yet
SK Sahidur Rahaman Bba504a 2024
9 pages
Summer Internship in AI & Machine Learning
No ratings yet
Summer Internship in AI & Machine Learning
27 pages
What Is Machine Learning
No ratings yet
What Is Machine Learning
5 pages
AI UNIT - 4 Notes
No ratings yet
AI UNIT - 4 Notes
9 pages
Project
No ratings yet
Project
36 pages
ML CH 1 Notes
No ratings yet
ML CH 1 Notes
6 pages
Module 1 - Intro To ML - V2
No ratings yet
Module 1 - Intro To ML - V2
47 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
104 pages
Lotte Chemical Indonesia Cause Diagram
No ratings yet
Lotte Chemical Indonesia Cause Diagram
18 pages
Team Charter - Excelerate Team 22
100% (1)
Team Charter - Excelerate Team 22
7 pages
ISO 17025:2017 Non-Conformities in Lab
No ratings yet
ISO 17025:2017 Non-Conformities in Lab
2 pages
Brahmagupta: Pioneer of Mathematics
No ratings yet
Brahmagupta: Pioneer of Mathematics
3 pages
Nation-States: Dynamics and Challenges
No ratings yet
Nation-States: Dynamics and Challenges
4 pages
Racial Fluidity and U.S. Inequality Insights
No ratings yet
Racial Fluidity and U.S. Inequality Insights
53 pages
PSAR and Super Trend Trading Strategy
No ratings yet
PSAR and Super Trend Trading Strategy
6 pages
Summative Test No. 4 Grade Iv - Science: Describe How Light Reflects or Refracts
0% (1)
Summative Test No. 4 Grade Iv - Science: Describe How Light Reflects or Refracts
36 pages
Affidavit of Identity - Bangladesh
No ratings yet
Affidavit of Identity - Bangladesh
1 page
Assessment Submission Declaration Form Amended
No ratings yet
Assessment Submission Declaration Form Amended
7 pages
Personal Statement: Economics for Wellbeing
No ratings yet
Personal Statement: Economics for Wellbeing
2 pages
Does Identity Precede Intimacy? Testing Erikson's Theory On Romantic Development in Emerging Adults of The 21st Century
No ratings yet
Does Identity Precede Intimacy? Testing Erikson's Theory On Romantic Development in Emerging Adults of The 21st Century
29 pages
UNIT-1 (Mathematical Basis of Managerial Decision)
No ratings yet
UNIT-1 (Mathematical Basis of Managerial Decision)
11 pages
Stormwater Design Course Outline
No ratings yet
Stormwater Design Course Outline
44 pages
BSL2 and BSL3 and Certification-10
No ratings yet
BSL2 and BSL3 and Certification-10
5 pages
Renkema - 2007 - Validation of Wind Turbine Wake Models-Annotated
No ratings yet
Renkema - 2007 - Validation of Wind Turbine Wake Models-Annotated
115 pages
Intensity-Modulated Radiation Therapy (Webb)
100% (1)
Intensity-Modulated Radiation Therapy (Webb)
454 pages
Introduction to Thermodynamics Concepts
No ratings yet
Introduction to Thermodynamics Concepts
23 pages
Stat and Prob - Q4 - Week 3 - Module 3 - Identifying The Appropriate Rejection Region For A Given Level of Significance
100% (2)
Stat and Prob - Q4 - Week 3 - Module 3 - Identifying The Appropriate Rejection Region For A Given Level of Significance
25 pages
2019 Author Index: Leonardo Volume 52 and Leonardo Music Journal Volume 29
No ratings yet
2019 Author Index: Leonardo Volume 52 and Leonardo Music Journal Volume 29
7 pages
Bowtie Analysis For Risk Assessment of Confined SP
No ratings yet
Bowtie Analysis For Risk Assessment of Confined SP
14 pages
Energy Conversions in Car Engines
No ratings yet
Energy Conversions in Car Engines
1 page
White-Dwarf Mass in Intermediate Polars
No ratings yet
White-Dwarf Mass in Intermediate Polars
14 pages
Cookery 11DLL Third Quarter
No ratings yet
Cookery 11DLL Third Quarter
28 pages
Force and Pressure Class8th Dazzle
No ratings yet
Force and Pressure Class8th Dazzle
35 pages
2022 NS Grade 7 Term 4 Common Exam QP
No ratings yet
2022 NS Grade 7 Term 4 Common Exam QP
12 pages
Frog Anatomy Dissection Report
100% (1)
Frog Anatomy Dissection Report
29 pages
Community Service Organization Analysis
No ratings yet
Community Service Organization Analysis
2 pages
The Research, Development, Extension and Training Experiences of Marinduque State College, Philippines: Basis For Excellence in Education
No ratings yet
The Research, Development, Extension and Training Experiences of Marinduque State College, Philippines: Basis For Excellence in Education
15 pages
Gma Handbook
No ratings yet
Gma Handbook
76 pages

ml2 Paper

Uploaded by

ml2 Paper

Uploaded by

Annexure-I

A Term Paper Report

LOVELY PROFESSIONAL UNIVERSITY

To whom so ever it may concern

Registration Number: 12105351

Dated: 22 October 2024

9 HARDWARE & SOFTWARE 9

2) Unsupervised Learning Models:

MODELS USED IN PROJECTS:

GPU (Graphics Processing Unit):

CPU (Central Processing Unit):

classification_report: This function provides a more comprehensive assessment of your

11) from [Link] import DecisionTreeRegressor- Imports the DecisionTreeRegressor

Phase 2: Machine Learning Model Development (Month 2):

Phase 3: System Integration and Evaluation (Month 3):

1) # Let's visualize the distribution of Total Household Income

from sklearn.model_selection import train_test_split

from [Link] import DecisionTreeClassifier

y = df['Type of Household'] # Assuming 'Type of Household' is the target variable

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Decision Tree Classifier

# Fit the model on the training data

# Get feature importances

# Create a DataFrame to display feature importances

feature_importance_df = [Link]({'Feature': [Link], 'Importance': feature_importances})

feature_importance_df.sort_values(by='Importance', ascending=False, inplace=True)

# Display the top N features by importance

top_n = 10 # Number of top features to display

print("Top", top_n, "features by importance:")

Top 10 features by importance:

13 Household Head Age 0.145237

14 Total Number of Family members 0.118960

16 Members with age 5 - 17 years old 0.096319

15 Members with age less than 5 year old 0.078877

11 Housing and water Expenditure 0.044962

5 Fruit Expenditure 0.043983

10 Clothing, Footwear and Other Wear Expenditure 0.043794

12 Education Expenditure 0.042492

6 Vegetables Expenditure 0.041309

4 Total Fish and marine products Expenditure 0.03922

features = df[['Total Food Expenditure', 'Bread and Cereals Expenditure',

target = df['Total Household Income']

X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)

Mean Squared Error: 9314005171.223476

# Initialize the Logistic Regression model

# Fit the model on the training data

# Predict on the testing data

# Evaluate the model

# Additional evaluation metrics

from sklearn.model_selection import train_test_split

from [Link] import DecisionTreeRegressor

from [Link] import mean_squared_error, r2_score

y = df['Type of Household'] # Assuming 'Type of Household' is the target variable

X_train, X_test, y_train, y_test = train_test_split(features, target,␣

# Evaluate the model

mse = mean_squared_error(y_test, y_pred)

print("Mean Squared Error:", mse)

print("R^2 Score:", r2)

Mean Squared Error: 18294548455.50716

R^2 Score: 0.7475630052677185

Mean Squared Error: 9024413126.92843

# Assuming 'df' is your DataFrame containing features

# Perform k-means clustering

# Add cluster labels to the DataFrame

# Visualize the clusters (example for 2D data)

from [Link] import silhouette_score

# Assuming 'df' is your DataFrame containing features and cluster labels

Silhouette Score: 0.7608313629296152

You might also like