Analysis On Olympic Dataset

ANALYSIS ON OLYMPIC DATASET
Using Logistic Regression and Random Forest Algorithms
Submitted in partial fulfillment of the requirements
For the award of the degree of
Bachelor of Computer Applications
To
Guru Gobind Singh Indraprastha University, Delhi
Guide:
Mr. Himanshu Pabbi (Assistant Professor)
Ms. Suman Singh (Assistant Professor)
Submitted by:
Nivedita Agarwal (BCA-V), 09413702021
Sakshi Ujjlayan (BCA-V), 35413702021
Institute of Information Technology & Management
New Delhi- 110058
Batch (2021-2024)
CERTIFICATE
We, Nivedita Agarwal (09413702021) and Sakshi Ujjlayan (35413702021), certify that the
Summer Training Project Report (BCA-331) entitled “Analysis on OLYMPIC DATASET
Using Logistic Regression, Random Forest Algorithm” is done by us and is an authentic
work carried out by us at the Institute of Information Technology & Management. The matter
embodied in this project work has not been submitted earlier for the award of any degree or
diploma, to the best of our knowledge and belief.
Signature of the Student Signature of the Student
Date:
Certified that the Project Report (BCA-331) entitled “Analysis on OLYMPIC DATASET
Using Logistic Regression, Random Forest Algorithm” done by the above students is
completed under our guidance.
Signature of the Guide:
Date:
Name of the Guide: Mr. Himanshu Pabbi
Ms. Suman Singh
Designation: Assistant Professor
Countersign, HOD (Computer Science): Prof. (Dr.) Sudhir Kumar Sharma
Countersign, Director: Prof. (Dr.) Rachita Rana

Acknowledgement
We would like to express our sincere gratitude to everyone who played an important role
in the successful completion of our project.
We thank our dedicated project guides, Mr. Himanshu Pabbi and Ms. Suman Singh, whose
guidance, support, and invaluable insights were instrumental in shaping this project. Your
unwavering commitment to excellence and your willingness to share your knowledge have been
truly inspiring.
We thank our esteemed summer training teacher, Dr. Prateek Gupta, whose expertise and
mentorship during the training period enriched our understanding of the subject matter. Your
encouragement and constructive feedback were invaluable in honing our skills.
We thank our esteemed Head of Department, Prof. (Dr.) Sudhir Kumar Sharma, whose leadership
and vision have created an environment conducive to learning and innovation. Your support for
academic endeavors has been a constant source of motivation.
We are also grateful to all the faculty members, friends, and family who supported us
throughout this journey.
This project would not have been possible without the collective wisdom and encouragement
of these individuals. We thank each one of you from the bottom of our hearts for your
contributions.
Nivedita Agarwal (BCA-V), 09413702021
Sakshi Ujjlayan (BCA-V), 35413702021
ABSTRACT
This study applies logistic regression and random forest analysis to 120 years of Olympic
Games data, spanning from 1896 to 2016. Using these techniques, we aim to predict
medal-winning probabilities for countries while considering a range of factors, including
population size, host country advantage, and historical performance. The analysis provides
insight into the interplay of factors that influence a nation's performance in the world's
most prestigious sporting event and sheds light on the changing landscape of global sports
dominance. The best-performing model in the project achieves an accuracy of 0.89.
TABLE OF CONTENTS
S. No. TOPIC
1. CERTIFICATE
2. ACKNOWLEDGEMENT
3. ABSTRACT
4. SYNOPSIS
5. CHAPTER 1 – INTRODUCTION
1.1 Description of the topic
1.2 Problem Statement
1.3 Objectives
1.4 Scope of the Project
1.5 Project planning Activities
1.5.1 Team-Member wise work distribution table
1.5.2 PERT Chart
1.6 Organization of the report
6. CHAPTER 2 – LITERATURE REVIEW
7. CHAPTER 3 – SYSTEM DESIGN AND METHODOLOGY
3.1 System Design
3.2 Algorithm Used
8. CHAPTER 4 – IMPLEMENTATION & RESULT
4.1 Hardware and Software Requirement
4.2 Implementation Details
4.3 Results
9. CHAPTER 5 – CONCLUSION AND FUTURE WORK
5.1 Conclusion
5.2 Future Scope
10. REFERENCES
Synopsis
1. Title of the Project
Title: 120 Years of Olympics Dataset Using Data Science Algorithms
2. Statement about the Problem
The questions posed by the Olympic 120 Years Dataset are complex and fascinating,
providing a unique opportunity to analyze and understand more than a century of Olympic
history. Researchers and data scientists can explore a wide range of questions and
challenges within this dataset, including predictive modeling and ethical considerations.
By leveraging this rich dataset, we can uncover historical trends, make predictions about
future Olympic events, and gain valuable insight into the evolution of sports, athletes, and
countries' performance on the world stage.
3. Significance of the Project
The project involving the application of logistic regression to the "120 Years of Olympics"
dataset holds significant importance as it combines the power of data analytics with the
historical legacy of the Olympic Games. By employing logistic regression, researchers can
unravel intricate patterns and relationships within this extensive dataset, particularly in the
context of predicting medal outcomes. Logistic regression can uncover factors that influence
an athlete's likelihood of winning a medal, providing insights into the nuanced dynamics of the
Olympics.
4. Objective
The objectives of analyzing the "120 Years of Olympics" dataset using logistic regression and
Random Forest are to predict future medal winners, understand historical trends in Olympic
performance, assess the influence of athlete demographics, evaluate the impact of hosting the
Olympics, promote fairness and inclusivity, allocate resources effectively, and uncover insights
into the evolution of Olympic participation.
5. Scope
The scope of this project encompasses the following aspects:
• Data collection covering diverse sports categories.

• Preprocessing and cleaning of the collected data.
• Development and training of a logistic regression model and random forest.
• Comparative analysis of performance across different sports categories.
• Presentation of results and insights.
6. Hardware and Software Specification
Hardware Specifications
Minimum Hardware Requirements
Processor: Intel(R) Core(TM) i5 or equivalent
CPU: 1.60 GHz
Memory: At least 2.00 GB
Hard Disk: 500 GB
Display: Super VGA (1366 × 768) or higher resolution monitor
Input Devices: Keyboard, Mouse
Software Specifications
Minimum Software Requirements
Frontend: Python
Browser: Mozilla Firefox, Google Chrome, etc.
Development tool: Jupyter Notebook, Google Colab, Anaconda
6. Data Collection and Methodology
Data collection is one of the most important aspects of data analysis. The dataset can be
taken from www.kaggle.com, www.brightdata.com, etc. Despite the wide adoption of machine
learning models, simply having a large dataset for a domain-specific task does not ensure
superior performance. Therefore, the dataset must be cleaned and preprocessed before training.
Because machine learning models learn from the data they are trained on, their automatic
predictions are likely to mirror any human disagreement identified during annotation. As a
result, proper cleaning and preprocessing of the dataset is required.
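The cleaning and preprocessing step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code. The column names, the "NA" marker for missing values, and the inline sample rows are all assumptions made for the example; the sample values are made up.

```python
import csv
import io
import statistics

# Tiny inline sample standing in for the real file (all values are made up).
SAMPLE_CSV = """Name,Sex,Age,Team,Year,Sport,Medal
Athlete A,M,24,Team X,1996,Swimming,Gold
Athlete B,F,NA,Team Y,2000,Athletics,NA
Athlete C,F,31,Team X,2004,Rowing,Bronze
Athlete D,M,27,Team Z,2008,Judo,NA
Athlete A,M,24,Team X,1996,Swimming,Gold
"""

def clean(rows):
    """Drop exact duplicates, impute missing ages, and add a binary target."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))
    # Impute missing ages with the median of the known ages.
    known_ages = [float(r["Age"]) for r in unique if r["Age"] != "NA"]
    median_age = statistics.median(known_ages)
    for r in unique:
        r["Age"] = float(r["Age"]) if r["Age"] != "NA" else median_age
        r["Year"] = int(r["Year"])
        r["Sex"] = 1 if r["Sex"] == "M" else 0           # simple binary encoding
        r["won_medal"] = 0 if r["Medal"] == "NA" else 1  # binary target
    return unique

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned = clean(rows)
for r in cleaned:
    print(r["Name"], r["Age"], r["won_medal"])
```

Median imputation is only one possible choice for missing ages; dropping incomplete rows or imputing per sport are equally plausible alternatives.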
7. Algorithm
The algorithm for analyzing the "120 Years of Olympics" dataset using logistic regression
begins with data preprocessing, which involves cleaning the dataset, encoding categorical
variables, and defining a binary target variable (medal or no medal). Next, the dataset is split
into training and testing sets, the logistic regression model is fitted on the training set, and
it is evaluated on the testing set. Random forest models are also built to predict medal
outcomes using an ensemble of decision trees, with their hyperparameters fine-tuned. The
algorithm will be explained in detail, including the mathematical foundation and
implementation.
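The steps in this section can be sketched in plain Python as follows. This is an illustrative sketch, not the project's implementation. The data is synthetic, with two made-up numeric features and a binary label. The logistic regression is fitted by batch gradient descent on the log-loss. The "forest" is a deliberately simplified stand-in for a random forest: depth-1 trees (decision stumps), each trained on a bootstrap sample with one randomly chosen feature, combined by majority vote. All function names and parameter values are assumptions chosen for the example.

```python
import math
import random

def sigmoid(z):
    """Logistic function: maps a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def split(X, y, test_fraction=0.25, seed=0):
    """Shuffle the row indices and hold out a fraction for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    tr, te = idx[:cut], idx[cut:]
    return [X[i] for i in tr], [y[i] for i in tr], [X[i] for i in te], [y[i] for i in te]

def score(w, b, xi):
    """Linear score w.x + b for one row."""
    return sum(wj * xj for wj, xj in zip(w, xi)) + b

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi  # gradient of log-loss w.r.t. score
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_logistic(w, b, xi):
    return 1 if sigmoid(score(w, b, xi)) >= 0.5 else 0

def fit_stump(X, y, feature):
    """Depth-1 tree: best threshold on one feature by training accuracy."""
    best = (0.0, 1, -1.0)  # (threshold, label when value > threshold, accuracy)
    for t in sorted({xi[feature] for xi in X}):
        for above in (0, 1):
            hits = sum((above if xi[feature] > t else 1 - above) == yi
                       for xi, yi in zip(X, y))
            if hits / len(y) > best[2]:
                best = (t, above, hits / len(y))
    return feature, best[0], best[1]

def fit_forest(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample with one random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        feature = rng.randrange(len(X[0]))
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def predict_forest(forest, xi):
    """Majority vote over all stumps."""
    votes = sum((above if xi[f] > t else 1 - above) for f, t, above in forest)
    return 1 if 2 * votes >= len(forest) else 0

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Synthetic stand-in data: two numeric features and a noisy binary label.
rng = random.Random(42)
X, y = [], []
for _ in range(200):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    X.append([x1, x2])
    y.append(1 if x1 + 0.5 * x2 + rng.gauss(0, 0.3) > 0 else 0)

X_train, y_train, X_test, y_test = split(X, y)
w, b = fit_logistic(X_train, y_train)
logit_acc = accuracy(y_test, [predict_logistic(w, b, xi) for xi in X_test])
forest = fit_forest(X_train, y_train)
forest_acc = accuracy(y_test, [predict_forest(forest, xi) for xi in X_test])
print(f"logistic regression accuracy: {logit_acc:.2f}")
print(f"stump forest accuracy: {forest_acc:.2f}")
```

In practice a library implementation with deeper trees would normally replace the hand-written stump ensemble; the sketch only shows the bootstrap-and-vote idea and the split, fit, evaluate sequence.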
8. Limitations of the Project
• Possible issues with missing or incomplete historical records.

• Risk of overfitting due to a complex dataset with limited data points.
• May not account for external factors like technology advancements or geopolitical
events.
• Assumes linear relationships between features and outcomes, which may not hold in
all cases.
• Limited ability to capture nonlinear relationships between variables.
• Logistic regression might not handle highly complex interactions between features
effectively.
• Olympic contexts may change, requiring continuous model updates to maintain
accuracy and relevance.
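To make the linearity assumption in the list above concrete: standard logistic regression models the log-odds of the positive class (here, winning a medal) as a linear function of the features. In the notation below, x is a feature vector, w a weight vector, and b a bias term; these symbols are chosen for illustration, since the synopsis does not define its own notation.

```latex
P(y = 1 \mid x) = \sigma(w^\top x + b) = \frac{1}{1 + e^{-(w^\top x + b)}},
\qquad
\log \frac{P(y = 1 \mid x)}{1 - P(y = 1 \mid x)} = w^\top x + b
```

Because the right-hand side is a weighted sum, any effect that is not additive in the chosen features (for example, a hypothetical interaction between age and sport) must be supplied as an explicit extra feature, whereas tree ensembles such as random forests can capture such interactions without that step.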
9. Conclusion and Future Scope for Modification
In conclusion, this project aims to provide an improved methodology for medal prediction
using logistic regression. The project's limitations will be acknowledged. The future
scope includes enhancing data collection methods, exploring advanced machine learning
techniques, and implementing real-time data integration for more dynamic predictions.
10. References
All sources and references used in this project documentation will be listed in accordance with
the chosen citation style to ensure zero plagiarism.
