FINALSECOND
Guide:
Mr. Himanshu Pabbi (Assistant Professor)
Ms. Suman Singh (Assistant Professor)
Submitted by:
Sakshi Ujjlayan (BCA-V), 35413702021
Nivedita Agarwal (BCA-V), 09413702021
Date:
Certified that the Project Report (BCA-331) entitled “120 years of Olympic Dataset Using
Logistic and Random Forest Algorithm” done by the above students was completed under
our guidance.
Signature of the Guide:
Date:
We would like to express our sincere gratitude to everyone who played an important
role in the successful completion of our project.
Our dedicated project guides Mr. Himanshu Pabbi and Ms. Suman Singh whose guidance,
support, and invaluable insights were instrumental in shaping this project. Your
unwavering commitment to excellence and your willingness to share your knowledge have
been truly inspiring.
Our esteemed summer training teacher Dr. Prateek Gupta, whose expertise and mentorship
during the training period have enriched our understanding of the subject matter. Your
encouragement and constructive feedback were invaluable in honing our skills.
Our esteemed Head of Department Prof. (Dr.) Sudhir Kumar Sharma, whose leadership
and vision have created an environment conducive to learning and innovation. Your
support for academic endeavors has been a constant source of motivation.
We are also grateful to all the faculty members, friends, and family who supported us
throughout this journey.
This project would not have been possible without the collective wisdom and
encouragement of these individuals. We thank each one of you from the bottom of our hearts
for your contributions.
This study applies logistic regression and random forest analysis to 120 years of
Olympic Games data, spanning from 1896 to 2016. Using these statistical techniques,
we aim to predict medal-winning probabilities for countries while considering
factors including population size, host country advantage, and historical
performance. The analysis provides insight into the interplay of factors that
influence a nation's performance in the world's most prestigious sporting event,
and into the changing landscape of global sports dominance. The best-performing
model in this project achieves an accuracy of 0.89.
TABLE OF CONTENTS
1. CERTIFICATE
2. ACKNOWLEDGEMENT
3. ABSTRACT
4. SYNOPSIS
5. CHAPTER-1 INTRODUCTION
6. CHAPTER-2 LITERATURE REVIEW
4.3 Results
5.1 Conclusion
Title: 120 years of Olympics Dataset Using Logistic and Random Forest Algorithms
The questions posed by the Olympic 120 Years Dataset are complex and fascinating,
providing a unique opportunity to analyze and understand a century of Olympic history.
Researchers and data scientists can explore a wide range of questions and challenges within
this dataset, including predictive modeling and ethical considerations. By leveraging this
rich dataset, we can uncover historical trends, make predictions about future Olympic
events, and gain valuable insight into the evolution of sports, athletes, and countries'
performance on the world stage.
The project involving the application of logistic regression to the "120 Years of Olympics"
dataset holds significant importance as it combines the power of data analytics with the
historical legacy of the Olympic Games. By employing logistic regression, researchers can
unravel intricate patterns and relationships within this extensive dataset, particularly in the
context of predicting medal outcomes. Logistic regression can uncover factors that
influence an athlete's likelihood of winning a medal, providing insights into the nuanced
dynamics of the Olympics.
4. Objective
The objectives of analyzing the "120 Years of Olympics" dataset using logistic regression
and Random Forest are to predict future medal winners, understand historical trends in
Olympic performance, assess the influence of athlete demographics, evaluate the impact of
hosting the Olympics, promote fairness and inclusivity, allocate resources effectively, and
uncover insights into the evolution of Olympic participation.
5. Scope
The scope of this project encompasses the following aspects:
CPU: 1.60 GHz
Software Specifications
Frontend: Python
Data collection is one of the most important aspects of data analysis. The dataset can be
taken from www.kaggle.com, www.brightdata.com, etc. Despite the wide adoption of
machine learning models, simply having a large dataset for a domain-specific task does not
ensure superior performance. Because machine learning models learn from the data they
are trained with, their automatic predictions are likely to mirror any human disagreement
introduced during annotation. Therefore, the dataset must be properly cleaned and
preprocessed before training.
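The cleaning steps described above can be illustrated with a minimal sketch. It rests on several assumptions: the column names (ID, Sex, Age, Team, Year, Medal) and the use of "NA" for missing values are assumed to follow the Kaggle file layout, the sample rows are made up for illustration, and median imputation is just one possible choice for filling missing ages. It is not the exact procedure used in this project.

```python
import csv
import io
from statistics import median

# Made-up sample rows for illustration only. Column names and the "NA"
# missing-value marker are assumptions about the Kaggle file layout.
SAMPLE_CSV = """ID,Sex,Age,Team,Year,Medal
1,M,24,China,1992,NA
2,F,NA,Denmark,2012,Gold
2,F,NA,Denmark,2012,Gold
3,M,31,Finland,2000,Bronze
4,F,27,Norway,1994,NA
"""

def clean_rows(csv_text):
    """Drop exact duplicates, impute missing ages, and add a binary target."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # 1. Remove exact duplicate records while preserving order.
    seen, unique = set(), []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # 2. Impute missing ages ("NA") with the median of the known ages.
    known_ages = [float(r["Age"]) for r in unique if r["Age"] != "NA"]
    fill_age = median(known_ages)
    for r in unique:
        r["Age"] = float(r["Age"]) if r["Age"] != "NA" else fill_age

    # 3. Encode the categorical Sex column and define the binary target:
    #    1 if the athlete won any medal, 0 otherwise.
    for r in unique:
        r["Sex"] = 1 if r["Sex"] == "M" else 0
        r["Medal"] = 0 if r["Medal"] == "NA" else 1
    return unique

cleaned = clean_rows(SAMPLE_CSV)
for row in cleaned:
    print(row["ID"], row["Age"], row["Sex"], row["Medal"])
```

On the real dataset, the same three steps (deduplication, imputation, encoding) would typically be applied to every column used as a model feature, not only to the ones shown here.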
8. Algorithm
The algorithm for analyzing the "120 Years of Olympic" dataset using logistic regression
begins with data preprocessing, which involves cleaning the dataset, encoding categorical
variables, and defining a binary target variable (medal or no medal). Next, the dataset is
split into training and testing sets for model evaluation. Random forest models are also
built to predict medal outcomes using an ensemble of decision trees, with fine-tuned hyperparameters.
The algorithm will be explained in detail, including the mathematical foundation and
implementation.
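The logistic regression part of this pipeline can be sketched from first principles as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the data are synthetic two-feature points standing in for cleaned Olympic features, the helper names (train_test_split, fit_logistic, predict, accuracy) are hypothetical, and the learning rate and epoch count are arbitrary. The model estimates P(medal = 1 | x) = sigmoid(w . x + b) and is fitted by batch gradient descent on the mean log-loss. The random forest step is not covered here; in practice both models are usually taken from a library such as scikit-learn.

```python
import math
import random

def sigmoid(z):
    """Logistic function: maps a real-valued score to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # numerically stable branch for negative z
    return ez / (1.0 + ez)

def train_test_split(X, y, test_ratio=0.25, seed=0):
    """Shuffle indices reproducibly and split into train and test sets."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_ratio))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log-loss.

    The gradient of the loss with respect to w_j is mean((p_i - y_i) * x_ij),
    and with respect to b it is mean(p_i - y_i).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b, threshold=0.5):
    """Label a row 1 (medal) when its predicted probability reaches the threshold."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold
            else 0 for xi in X]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Synthetic stand-in data (an assumption, not Olympic records):
# the label is 1 when the two features sum to a positive value.
rng = random.Random(42)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [1 if x0 + x1 > 0 else 0 for x0, x1 in X]

X_train, y_train, X_test, y_test = train_test_split(X, y)
w, b = fit_logistic(X_train, y_train)
test_accuracy = accuracy(y_test, predict(X_test, w, b))
print(f"test accuracy: {test_accuracy:.2f}")
```

Accuracy is computed only on the held-out test rows, which mirrors the train/test evaluation described above. The 0.5 threshold is a common default and can be adjusted when medal winners are much rarer than non-winners.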
In conclusion, this project aims to provide businesses with an improved methodology for
medal prediction using logistic regression. The project's limitations will be acknowledged.
The future scope includes enhancing data collection methods, exploring advanced machine
learning techniques, and implementing real-time data integration for more dynamic
predictions.
11. References
All sources and references used in this project documentation will be listed in accordance
with the chosen citation style to ensure zero plagiarism.
LIST OF FIGURES
LIST OF TABLES