
CHAPTER 1

INTRODUCTION

1. Introduction

The modern Olympic Games, or simply the Olympics, are the leading international sporting event, featuring both summer and winter competitions in which thousands of athletes from around the world take part. The Olympics are widely considered the world's foremost sports competition, with more than 200 nations participating. A number of data sets related to the Olympics provide information on various aspects of the Games, including historical results, athlete profiles, and more. These data sets are valuable for conducting analyses, research, and data visualizations related to the Olympic Games.

1.1 Purpose

To analyze Olympic data, we use a multi-faceted approach: we collect data on modules such as medal counts, individual athlete performance, and news regarding the Olympics, and then apply various analytical techniques to uncover patterns and trends in the collected data.

1.2 Overview

The Olympic Games are undoubtedly among the most widely followed and closely watched sporting events in the world. Every four years, athletes from around the globe come together to compete in a wide range of sports. The event has a long history, dating back to ancient Greece, and has undergone significant change and evolution over time. In recent years, there has been growing interest in using data and analytics to understand the performance of the athletes and countries participating in the Olympics.

1.3 Research Gap

In this paper, we aim to analyze Olympics data from the past 60 years in order to uncover patterns and trends related to the event. We collected data on modules such as medal counts, individual athlete performance, and news coverage, and analyzed it using Exploratory Data Analysis (EDA). Our study provides a comprehensive analysis of the Olympics data and offers insights into the historical development of the Games. In particular, we look at medal tallies by country, overall analysis of Olympic events, country-wise analysis of medals and performance over the years, athlete-wise analysis, and finally Olympics news presented in our web application. We believe this analysis delivers a deeper understanding of the Olympic Games, as well as of the countries and athletes that have been most successful in these competitions over the years.

CHAPTER 2

System Design and Formulation

2. Functional Requirements

2.1 Pandas

2.1.1 Introduction to Pandas:

Pandas is an important Python library for working with data sets. It provides features for analysing, cleaning, exploring, and manipulating collected data. The name "Pandas" refers both to "Panel Data" and to "Python Data Analysis"; the library was created by Wes McKinney in 2008.

2.1.2 Uses of Pandas:


Pandas allows us to analyse big data and draw useful conclusions based on statistical theory. Pandas can clean messy data sets and make them readable and relevant to use; relevancy is very important in data science.

Pandas can also delete rows that are not relevant to the dataset or that contain wrong values, such as empty or NULL values. This is called cleaning the data.
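As a minimal sketch of this kind of cleaning (the column names and values here are illustrative, not from the project's actual dataset):

```python
import pandas as pd
import numpy as np

# A small frame with the kinds of problems described above.
df = pd.DataFrame({
    "Name": ["A. Athlete", "B. Athlete", None, "C. Athlete"],
    "Age": [24, np.nan, 31, 27],
    "Medal": ["Gold", None, "Silver", None],
})

# Drop rows that are missing a Name entirely; they cannot be used.
df = df.dropna(subset=["Name"])

# Replace missing medals with an explicit label instead of NULL.
df["Medal"] = df["Medal"].fillna("No medal")

print(df)
```

`dropna` and `fillna` are the two basic tools: one discards unusable rows, the other replaces missing values with a sensible default.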

2.2 The Classic Notebook: Jupyter Notebook

The Jupyter Notebook is the original web application for creating and sharing computational
documents. It offers a simple, streamlined, document-centric experience.

2.2.1 Introduction to Jupyter Notebook


The Jupyter Notebook is an open-source web application that allows data scientists to create and share documents integrating live code, equations, computational output, visualizations, other multimedia resources, and explanatory text in a single document. Jupyter Notebooks can be used for all sorts of data science tasks, including data cleaning and transformation, numerical simulation, exploratory data analysis, data visualization, statistical modelling, machine learning, and deep learning.

Jupyter Notebook provides an easy-to-use, interactive data science environment that not only works as an integrated development environment (IDE) but can also serve as a presentation or educational tool. The Notebook is a way of working with Python inside a virtual "notebook" and is rapidly growing in popularity with data scientists, in large part due to its flexibility. It lets you combine code, images, plots, and comments in line with the steps of the data science process. It is also a form of interactive computing: users execute their code, see what happens, modify it, and repeat, in a kind of iterative conversation between the data scientist and the collected data. Data scientists can also use notebooks to create tutorials or interactive manuals for their own software.

A Jupyter Notebook has two components. First, the data scientist enters programming code or text in rectangular "cells" on a front-end web page. The browser then passes this code to a back-end "kernel", which runs the code and returns the results. Many Jupyter kernels have already been created, supporting dozens of programming languages. The kernel need not reside on the data scientist's computer: notebooks can also run in the cloud, for example on Google's Colaboratory project, commonly known as Google Colab. You can also run Jupyter without network access on your own computer and perform your work locally.

2.2.2 Uses of Jupyter Notebooks

Jupyter is widely used for Python machine learning work. It is a great environment both for developing code and for communicating results.

The name "Jupyter" was chosen to bring to mind the ideas and traditions of science and the scientific method. In addition, the core programming languages supported by the Jupyter Notebook are Julia, Python, and R. While "Jupyter" is not a direct acronym for these languages (Julia (Ju), Python (Py), and R), it establishes a firm alignment with them.

Summarizing the pros and cons:

These notebooks are excellent for visualization, data cleansing, and data science or Python projects in general.

Pros:

• Best Platform for getting started with data science.

• It is Easy to share the notebooks and visualizations.

• Easy Availability of mark-downs and other additional functionalities.

Cons:

• Lack of powerful features that are included in some IDEs.

2.3 NumPy

2.3.1 Introduction to NumPy


NumPy is a Python library mainly used for working with arrays. It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices. NumPy was created in 2005 by Travis Oliphant; it is an open-source project and is free to use. NumPy stands for "Numerical Python".

2.3.2 Uses of NumPy


 Data manipulation and analysis
 Machine learning and artificial intelligence
 Scientific computing and simulation
 Image processing and computer vision
 Signal processing and audio analysis
 Data visualization and graphics
 Education and research
 Prototyping and experimentation
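A tiny sketch of what makes NumPy useful for work like this: vectorized arithmetic over whole arrays, with linear-algebra helpers in `np.linalg` and `np.dot`. The medal figures below are made up for illustration.

```python
import numpy as np

# A NumPy array supports fast, vectorized arithmetic without Python loops.
medals = np.array([10, 7, 3])          # e.g. gold, silver, bronze counts
weights = np.array([3, 2, 1])          # weight each medal type
points = medals * weights              # element-wise multiplication

total = points.sum()

# np.dot computes the same weighted total as a dot product.
same_total = np.dot(medals, weights)

print(total)
```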

2.4 Streamlit

Streamlit is a Python-based library that allows data scientists to easily create free machine learning applications. Streamlit lets us display descriptive text and model outputs, visualize data and model performance, and modify model inputs through a GUI using sidebars.

Matplotlib: Visualization with Python


Matplotlib is a library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible.

 Create publication-quality plots.
 Make interactive figures that can zoom, pan, and update.
 Customize the visual style and layout.
 Export to many file formats.
 Embed in JupyterLab and graphical user interfaces (GUIs).

Matplotlib is an amazing visualization library in Python for plotting 2D arrays. It is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy (scientific Python) stack. It was introduced by John Hunter in 2002.

One of the greatest benefits of this visualization library is that it gives visual access to huge amounts of data in easily digestible formats. Matplotlib offers a variety of plots, such as line, bar, scatter, and histogram.
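A small sketch of a typical line plot, using hypothetical gold-medal counts (not real figures) and exporting the result, as the feature list above describes:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs anywhere
import matplotlib.pyplot as plt

# Hypothetical gold-medal counts per Games, just to demonstrate a line plot.
years = [2008, 2012, 2016, 2020]
golds = [36, 46, 46, 39]

fig, ax = plt.subplots()
ax.plot(years, golds, marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Gold medals")
ax.set_title("Gold medals per Games (illustrative)")

# Export: Matplotlib can write PNG, SVG, PDF, and more.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```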

Fig. 1

Fig. 2

2.5 Seaborn

Seaborn is a Python data visualization library built on top of Matplotlib, effectively an extended version of it. It provides a high-level interface for drawing attractive and informative statistical graphics.
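A small sketch of that high-level interface; the country names and counts are illustrative, not the project's data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import pandas as pd
import seaborn as sns

# Illustrative data: gold-medal counts for a few countries.
df = pd.DataFrame({"Country": ["USA", "CHN", "GBR"], "Gold": [39, 38, 22]})

# One high-level call produces a styled statistical plot.
ax = sns.barplot(data=df, x="Country", y="Gold")
ax.set_title("Gold medals (illustrative)")
```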

Fig. 3

Fig. 4

Fig. 5

CHAPTER 3

CODING

APP.py

Fig. 3.1

Fig. 3.2

Fig. 3.3

Fig. 3.4

Fig. 3.5

Fig. 3.6

Fig. 3.7

Fig. 3.8

Fig. 3.9

PREPROCESSOR.py

Fig. 3.10
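The preprocessing code itself appears only as a screenshot (Fig. 3.10). As a rough, hedged sketch of the steps such a preprocessor typically performs on the Kaggle Olympics data (the column names `Season`, `NOC`, `Medal`, and `region` are assumptions about that dataset's schema, and `preprocess` is a hypothetical helper name):

```python
import pandas as pd

def preprocess(df: pd.DataFrame, region_df: pd.DataFrame) -> pd.DataFrame:
    """Filter to Summer Games, attach region names, one-hot encode medals."""
    df = df[df["Season"] == "Summer"]               # keep Summer editions only
    df = df.merge(region_df, on="NOC", how="left")  # add a region per NOC code
    df = df.drop_duplicates()                       # remove exact repeat rows
    # Turn the Medal column into Gold/Silver/Bronze indicator columns.
    df = pd.concat([df, pd.get_dummies(df["Medal"])], axis=1)
    return df
```

The one-hot medal columns make later tallies a simple `groupby(...).sum()`.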

HELPER.py

Fig. 3.11

Fig. 3.12

Fig. 3.13

Fig. 3.14

Fig. 3.15

Fig. 3.16

Fig. 3.17

STEPWISE IMPLEMENTATION

Fig. 3.18

MEDAL TALLY
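The screenshots that follow show the app's medal-tally pages. As a hedged sketch of how such a tally can be computed with pandas (the column names are assumptions, not the project's actual schema, and the rows are made up):

```python
import pandas as pd

# Tiny stand-in for the real medal data.
df = pd.DataFrame({
    "region": ["USA", "USA", "India", "India", "USA"],
    "Gold":   [1, 0, 0, 1, 1],
    "Silver": [0, 1, 0, 0, 0],
    "Bronze": [0, 0, 1, 0, 0],
})

# Sum each medal type per country, ranked by gold medals.
tally = (
    df.groupby("region")[["Gold", "Silver", "Bronze"]]
      .sum()
      .sort_values("Gold", ascending=False)
      .reset_index()
)
tally["Total"] = tally[["Gold", "Silver", "Bronze"]].sum(axis=1)
print(tally)
```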

Fig. 3.19

Fig. 3.20

Fig. 3.21

OVERALL ANALYSIS
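One statistic an overall analysis of this kind typically tracks is how participation has grown over the years. A hedged sketch (illustrative rows; the real analysis would use the full dataset, and the column names are assumptions):

```python
import pandas as pd

# Illustrative rows: one entry per athlete appearance.
df = pd.DataFrame({
    "Year":   [2012, 2012, 2016, 2016, 2016],
    "region": ["USA", "India", "USA", "India", "Japan"],
})

# Number of distinct participating nations per edition of the Games.
nations_over_time = df.groupby("Year")["region"].nunique()
print(nations_over_time)
```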

Fig. 3.22

Fig. 3.23

Fig. 3.24

Fig. 3.25

COUNTRY-WISE ANALYSIS
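A country-wise view filters the data to one country and counts its medals per year. A hedged sketch with made-up rows (column names are assumptions about the dataset):

```python
import pandas as pd

df = pd.DataFrame({
    "Year":   [2012, 2016, 2016, 2020],
    "region": ["India", "India", "India", "India"],
    "Medal":  ["Bronze", None, "Silver", "Gold"],
})

# Keep only actual medal wins for the chosen country, counted per year.
country = df[(df["region"] == "India") & df["Medal"].notna()]
per_year = country.groupby("Year")["Medal"].count()
print(per_year)
```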

Fig. 3.26

Fig. 3.27

Fig. 3.28

Fig. 3.29

ATHLETE-WISE ANALYSIS
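Athlete-wise analysis typically compares athlete attributes, such as age, between all competitors and medallists. A hedged sketch with made-up rows (column names are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "Name":  ["A", "A", "B", "C"],
    "Age":   [24.0, 28.0, None, 31.0],
    "Medal": ["Gold", None, "Silver", "Gold"],
})

# One row per athlete for the overall age distribution,
# versus the ages recorded for gold-medal-winning appearances.
athletes = df.drop_duplicates(subset=["Name"])
overall_ages = athletes["Age"].dropna()
gold_ages = df[df["Medal"] == "Gold"]["Age"].dropna()
print(overall_ages.mean(), gold_ages.mean())
```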

Fig. 3.30

Fig. 3.31

Fig. 3.32

Fig. 3.33

Fig. 3.34

CHAPTER 4

RESULT AND CONCLUSION

4.1 Result

Data analysis of the Olympic dataset has yielded valuable insights into the trends and patterns that have occurred throughout the history of the Olympic Games. Through thorough examination and statistical analysis, the key findings include the distribution of medals across different countries, the evolution of participation in Olympic sports over the years, and the impact of various factors on athletic performance.

4.2 Conclusion

In conclusion, this minor project on data analysis of the Olympic dataset has provided an effective understanding of the dynamics at play throughout the Olympic Games. Several trends have been identified, and these trends are helpful and crucial for stakeholders, policymakers, and especially sports enthusiasts, as they support informed decisions for forthcoming Olympics. Further, the project highlights the significance of leveraging data analytics to extract meaningful information from large datasets, contributing to the growing field of sports analytics. Future research could dig deeper into the specific aspects revealed in this analysis, opening avenues for more targeted investigations into the world of Olympic sports.

CHAPTER 5

Future Scope

5. Future Scope

The minor project on the "Data Analysis of Olympic Dataset" lays the foundation for several potential avenues of future exploration:

 Predictive Modeling: Implement predictive models to forecast medal outcomes based on historical data, incorporating variables such as economic indicators, host-country influence, and advancements in sports science.
 Athlete Performance Analysis: Conduct a detailed analysis of individual athlete performance trends, considering factors like age, training methodologies, and the impact of changing sports technologies.
 Country-Specific Analysis: Explore in-depth analyses for a specific country, examining its performance over time, its investments in sports infrastructure, and the socio-economic factors influencing its success in the Olympic Games.
 Impact of Host City on Participation: Investigate the influence of the host city on athletes' participation and performance, taking into account geographical, cultural, and logistical factors.
 Incorporate Qualitative Data: Integrate qualitative data, such as athlete interviews, coach feedback, and crowd sentiment, to provide a more holistic understanding of the Olympic experience.
 Longitudinal Study: Extend the analysis over a longer period to identify long-term trends and patterns, allowing for a more comprehensive understanding of the evolution of the Olympic Games.

REFERENCES

1. Rahul Pradhan, Kartik Agrawal and Anubhav Nag, "Analyzing Evolution of the Olympics by Exploratory Data Analysis using R".
2. Leonardo De Marchi, "Data Mining of Sports Performance Data".
3. https://www.kaggle.com/search?q=+dataset+olympics

