
08/09/2023, 16:05 59e75f7212024afe8a2f519b64228155.

knit

::: {.cell .markdown _cell_guid=“24781183-092e-152a-5972-f3f2a712c76a”}
## LAB 5 FDA
:::

::: {.cell .code execution_count=“1” _cell_guid=“fa424d3c-5da1-4fac-de31-cf7782e6c3b9” execution=“{"iopub.execute_input":"2023-09-08T10:19:25.787923Z","iopub.status.busy":"2023-09-08T10:19:25.785374Z","iopub.status.idle":"2023-09-08T10:19:30.705090Z"}” trusted=“true”}

library(ggplot2)   # Data visualization
library(readr)     # CSV file I/O, e.g. the read_csv function
library(plyr)      # ddply() for grouped sums
library(rworldmap) # World map plotting
library(repr)      # Plot size options in the notebook

summer=read.csv('../input/summer.csv')

Loading required package: sp

The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
which was just loaded, will retire in October 2023.
Please refer to R-spatial evolution reports for details, especially
https://r-spatial.org/r/2023/05/15/evolution4.html.
It may be desirable to make the sf package available;
package maintainers should consider adding sf to Suggests:.
The sp package is now running under evolution status 2
(status 2 uses the sf package in place of rgdal)

Please note that 'maptools' will be retired during October 2023,
plan transition at your earliest convenience (see
https://r-spatial.org/r/2023/05/15/evolution4.html and earlier blogs
for guidance);some functionality will be moved to 'sp'.
Checking rgeos availability: TRUE

### Welcome to rworldmap ###

For a short introduction type : vignette('rworldmap')

:::

::: {.cell .code execution_count=“2” _cell_guid=“b82ba573-3654-024f-139d-f3788076e1cc” execution=“{"iopub.execute_input":"2023-09-08T10:19:31.532660Z","iopub.status.busy":"2023-09-08T10:19:31.404093Z","iopub.status.idle":"2023-09-08T10:19:31.770611Z"}” trusted=“true”}

head(summer)
str(summer)

A data.frame: 6 × 9

|   | Year | City | Sport | Discipline | Athlete | Country | Gender | Event | Medal |
|---|------|------|-------|------------|---------|---------|--------|-------|-------|
|   | `<int>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| 1 | 1896 | Athens | Aquatics | Swimming | HAJOS, Alfred | HUN | Men | 100M Freestyle | Gold |
| 2 | 1896 | Athens | Aquatics | Swimming | HERSCHMANN, Otto | AUT | Men | 100M Freestyle | Silver |
| 3 | 1896 | Athens | Aquatics | Swimming | DRIVAS, Dimitrios | GRE | Men | 100M Freestyle For Sailors | Bronze |
| 4 | 1896 | Athens | Aquatics | Swimming | MALOKINIS, Ioannis | GRE | Men | 100M Freestyle For Sailors | Gold |
| 5 | 1896 | Athens | Aquatics | Swimming | CHASAPIS, Spiridon | GRE | Men | 100M Freestyle For Sailors | Silver |
| 6 | 1896 | Athens | Aquatics | Swimming | CHOROPHAS, Efstathios | GRE | Men | 1200M Freestyle | Bronze |

'data.frame': 31165 obs. of 9 variables:
$ Year : int 1896 1896 1896 1896 1896 1896 1896 1896 1896 1896 ...
$ City : chr "Athens" "Athens" "Athens" "Athens" ...
$ Sport : chr "Aquatics" "Aquatics" "Aquatics" "Aquatics" ...
$ Discipline: chr "Swimming" "Swimming" "Swimming" "Swimming" ...
$ Athlete : chr "HAJOS, Alfred" "HERSCHMANN, Otto" "DRIVAS, Dimitrios" "MALOKINIS, Ioannis" ...
$ Country : chr "HUN" "AUT" "GRE" "GRE" ...
$ Gender : chr "Men" "Men" "Men" "Men" ...
$ Event : chr "100M Freestyle" "100M Freestyle" "100M Freestyle For Sailors" "100M Freestyle For Sailors" ...
$ Medal : chr "Gold" "Silver" "Bronze" "Gold" ...

:::

::: {.cell .markdown _cell_guid=“70a1c72a-50dd-7fa2-609e-9cb4a52c8bcd”}
We learn the following from the tables above:

- The dataset contains 31,165 records.
- There is no numeric column that can be summed to count medals. We need to add a proxy count column so that medals can be rolled up at any desired level.
- Country names appear as three-letter codes.
- Year values are read in as integers. We need to convert them to a factor variable.

Let's add the count variable and make changes to the dataset.
:::

::: {.cell .code execution_count=“5” _cell_guid=“ed15d834-c3c5-abf4-f859-f8fb8b4696d6” execution=“{"iopub.execute_input":"2023-09-08T10:20:05.565661Z","iopub.status.busy":"2023-09-08T10:20:05.563447Z","iopub.status.idle":"2023-09-08T10:20:05.584072Z"}” trusted=“true”}

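# Each row is one medal, so a constant column of 1s can be summed at any level
summer$MedalCount=1
# Athlete is already read as character here, so this is only a safeguard
summer$Athlete=as.character(summer$Athlete)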

:::
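The notes above call for converting Year to a factor, but the cell only adds the count column. A minimal sketch of both steps together, using a small made-up data frame (`toy` and its values are hypothetical stand-ins for `summer`, not rows from the real file):

```r
# Hypothetical stand-in for the `summer` data frame (illustrative values only)
toy <- data.frame(
  Year    = c(1896L, 1896L, 1900L),
  Country = c("HUN", "AUT", "GRE"),
  Medal   = c("Gold", "Silver", "Bronze"),
  stringsAsFactors = FALSE
)

# One medal per row, so a constant column of 1s can be summed at any level
toy$MedalCount <- 1

# Treat Year as a categorical variable rather than a number
toy$Year <- as.factor(toy$Year)

str(toy)
```

Whether a factor Year is actually wanted depends on the chart: a line chart over time may read better with Year left as a number.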


::: {.cell .markdown _cell_guid=“5361bb3b-850a-887c-05e3-c5f171e69fa0”}
## Top Countries Overall in the Olympic Games

The dataset contains data for 148 countries.
:::

::: {.cell .code execution_count=“6” _cell_guid=“97f095f8-fb7a-5dcd-974e-dc8e00c6ce67” execution=“{"iopub.execute_input":"2023-09-08T10:20:08.539920Z","iopub.status.busy":"2023-09-08T10:20:08.537760Z","iopub.status.idle":"2023-09-08T10:20:09.293043Z"}” trusted=“true”}

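options(repr.plot.width=6, repr.plot.height=6)
world <- map_data(map="world")
world <- world[world$region != "Antarctica",] # drop Antarctica (world is not used below)
# Sum the numeric columns, including MedalCount, per country and medal type
y=ddply(summer, .(Country,Medal), numcolwise(sum))
# Join the totals to the world map by three-letter country code
sPDF <- joinCountryData2Map(y,
                            joinCode = "ISO3",
                            nameJoinColumn = "Country")
# Shade each country by its medal count
mapCountryData(sPDF, nameColumnToPlot='MedalCount')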

209 codes from your data successfully matched countries in the map
140 codes from your data failed to match with a country code in the map
159 codes from the map weren't represented in your data

You asked for 7 quantiles, only 4 could be created in quantiles classification

:::
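The join report above counts more codes (209 matched plus 140 failed) than the 148 countries in the data, most likely because `y` holds one row per Country and Medal combination rather than one row per country. The Country column also appears to hold IOC codes, which differ from ISO3 for some countries (for example GER versus DEU), so some joins fail. A minimal sketch of one way to prepare the data before the join; `toy` and the small `ioc_to_iso3` lookup are hypothetical and deliberately incomplete:

```r
# Hypothetical rows standing in for `summer` (illustrative values only)
toy <- data.frame(
  Country    = c("USA", "USA", "GER", "GER", "NED"),
  Medal      = c("Gold", "Silver", "Gold", "Bronze", "Gold"),
  MedalCount = 1,
  stringsAsFactors = FALSE
)

# One row per country: total medals regardless of medal type
totals <- aggregate(MedalCount ~ Country, data = toy, FUN = sum)

# Small, incomplete IOC-to-ISO3 lookup for illustration; extend as needed
ioc_to_iso3 <- c(GER = "DEU", NED = "NLD", SUI = "CHE")
recoded <- unname(ioc_to_iso3[totals$Country])

# Keep the original code wherever the lookup has no entry
totals$ISO3 <- ifelse(is.na(recoded), totals$Country, recoded)

print(totals)
```

A table shaped like `totals` could then be passed to `joinCountryData2Map()` with `nameJoinColumn = "ISO3"` in place of the per-medal table.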

::: {.cell .markdown _cell_guid=“d919546d-3790-f979-6721-2fc9f4056cf3”}
As we can see, the countries that win a lot of medals at the Olympic Games are the developed countries in North America, Asia and Europe. Poorer countries in Africa and South America do not win a lot of medals.

To enable effective visualization of the data, we will filter it down to the top 5 countries overall. We will identify the 5 countries with the highest number of medals and then plot various charts to better understand their performance over the years.
:::


::: {.cell .markdown _cell_guid=“acc4d0ed-7076-885a-74e4-1b011865065d”}
## Performance of Top Countries over the years
:::

::: {.cell .code execution_count=“7” _cell_guid=“de7be1e8-763e-9388-fabe-4a57382eb1dd” execution=“{"iopub.execute_input":"2023-09-08T10:20:14.051741Z","iopub.status.busy":"2023-09-08T10:20:14.049893Z","iopub.status.idle":"2023-09-08T10:20:14.873397Z"}” trusted=“true”}

# Count medals per country (MedalCount is always 1, so this gives one row per country)
Countries=as.data.frame(table(summer$Country,summer$MedalCount))
colnames(Countries)=c("Country","a","MedalCount")
# Sort by medal count, highest first, and keep the top 5
Countries=Countries[order(-Countries$MedalCount),]
CountriesFilter=head(Countries,n=5)
topCountryFilter=summer[summer$Country %in% CountriesFilter$Country,]

options(repr.plot.width=6, repr.plot.height=3)
# Total medals per country per Games, drawn as one line per country
x=ddply(topCountryFilter, .(Country,Year), numcolwise(sum))
ggplot(x,aes(Year,MedalCount,color=Country,group=Country))+geom_point()+geom_line()

:::
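The top-5 filter above goes through a two-way `table()` with a throwaway column `a`. As an alternative sketch, the same ranking can be obtained from a one-way table sorted in decreasing order. The `country` vector below is hypothetical and only illustrates the idea:

```r
# Hypothetical country codes, one element per medal row (illustrative only)
country <- c("USA", "URS", "USA", "GBR", "USA", "URS", "FRA")

# Count rows per country and sort from most to fewest medals
medal_totals <- sort(table(country), decreasing = TRUE)

# Names of the top 2 countries (the notebook keeps the top 5)
top_countries <- names(medal_totals)[seq_len(2)]

print(medal_totals)
print(top_countries)
```

Applied to the real data, `summer$Country` would take the place of `country`, and the resulting names could be used with `%in%` exactly as `CountriesFilter$Country` is used above.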

::: {.cell .markdown _cell_guid=“c2c668e2-29f2-be78-9b34-05b3fd678179”}
Here are my observations from the chart above:

1. The USA has been the consistent high performer in the Olympic Games, except for a period between 1950 and 1980, when the Soviet Union won more medals than the USA.
2. The Soviet Union won about 450 medals at the 1980 Games, the highest ever by a country.
3. Interestingly, China does not appear in the list of top medal-winning countries.
4. Germany's performance skyrocketed between 1950 and 1980, before declining again in the Games thereafter.
5. Great Britain and France have performed very similarly in terms of the number of medals won.
:::

::: {.cell .markdown _cell_guid=“6431437b-6a0a-837c-f1cf-077688d33b11”}
## Who won the highest number of medals overall?
:::

::: {.cell .code execution_count=“8” _cell_guid=“14e43e7e-fe5f-4af4-789d-7b248d0193bc” execution=“{"iopub.execute_input":"2023-09-08T10:20:18.469789Z","iopub.status.busy":"2023-09-08T10:20:18.467994Z","iopub.status.idle":"2023-09-08T10:20:18.943172Z"}” trusted=“true”}

y=ddply(topCountryFilter, .(Country,Medal), numcolwise(sum))
ggplot(y,aes(x=reorder(Country,MedalCount),y=MedalCount,fill=Medal,group=Medal))+geom_bar(stat='identity')


:::

::: {.cell .markdown _cell_guid=“eddace5f-0e2a-744e-65ae-e15da74fa202”}
1. The USA has the highest overall number of medals, with a total above 4,000. Gold medals make up the largest share of the USA medal tally.
2. Even though the Soviet Union dissolved in 1991, it is still the country with the second-highest number of overall medals.
:::

::: {.cell .markdown _cell_guid=“b3ad627d-b8ba-e615-cb88-de64acdd842c”}
## Top Athletes in the History of Olympic Games

Let's now try to figure out who the best-performing athletes were over the entire history of the Olympic Games. Let's also look at their sports, their countries and the total number of medals they won.
:::

::: {.cell .markdown _cell_guid=“0495f094-5a09-3466-c01c-1852681d6cb7”}
## Who are the Top 20 athletes in the History of Olympic Games ? {#who-are-the-top-20-athletes-in-the-history-of-olympic-games-}
:::

::: {.cell .code execution_count=“9” _cell_guid=“0901528c-0797-e5fb-2d14-09d0d7080a42” execution=“{"iopub.execute_input":"2023-09-08T10:20:23.898513Z","iopub.status.busy":"2023-09-08T10:20:23.896428Z","iopub.status.idle":"2023-09-08T10:20:24.748305Z"}” trusted=“true”}

# Count medals per athlete and keep the 20 athletes with the most medals
tab=as.data.frame(table(summer$Athlete,summer$MedalCount))
colnames(tab)=c("Athlete","a","MedalCount")
topAthelete=tab[order(-tab$MedalCount),]
topAthelete=head(topAthelete,n=20)
topAthelete$Athlete=as.character(topAthelete$Athlete)

topAtheleteFilter=summer[summer$Athlete %in% topAthelete$Athlete,]

# Medals per athlete and medal type, drawn as horizontal stacked bars
y=ddply(topAtheleteFilter, .(Athlete,Medal), numcolwise(sum))
ggplot(y,aes(x=reorder(Athlete,MedalCount),y=MedalCount,fill=Medal,group=Medal))+geom_bar(stat='identity') +coord_flip()

:::

::: {.cell .markdown _cell_guid=“1dc8deb7-9816-1dd7-527f-9f7ffd4fd435”}
As we can see in the chart above, Michael Phelps, Nikolay Andrianov and Larisa Latynina have been the most successful athletes over the entire history of the Olympic Games. Michael Phelps is well ahead of every other athlete with 22 medals, the vast majority of which are gold.
:::

::: {.cell .markdown _cell_guid=“54f1ed65-8a37-f296-9b3e-cf6e7e9cf5af”}
## Which Countries produce the most successful athletes?
:::

::: {.cell .code execution_count=“10” _cell_guid=“975dd6bf-9071-8af0-be4f-fe6f75149eb3” execution=“{"iopub.execute_input":"2023-09-08T10:20:29.059039Z","iopub.status.busy":"2023-09-08T10:20:29.056593Z","iopub.status.idle":"2023-09-08T10:20:29.425519Z"}” trusted=“true”}

y=ddply(topAtheleteFilter, .(Country,Medal), numcolwise(sum))
ggplot(y,aes(x=reorder(Country,MedalCount),y=MedalCount,fill=Medal,group=Medal))+geom_bar(stat='identity')

:::

::: {.cell .markdown _cell_guid=“90ca45a0-c246-8fa2-72bb-638644e0832f”}
The above chart suggests that the USA and the Soviet Union have contributed the highest number of top athletes. Together, these two countries have won more than 160 medals through their top-20 athletes. It also seems that these athletes tend to win gold more often than any other medal.
:::
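The claim that top athletes win gold more often than other medals can be checked numerically rather than read off the chart. A minimal sketch, assuming a vector of medal labels like the `Medal` column of `topAtheleteFilter`; the `medals` values below are made up for illustration:

```r
# Hypothetical medal rows for a handful of top athletes (illustrative only)
medals <- c("Gold", "Gold", "Silver", "Gold", "Bronze", "Gold", "Silver", "Gold")

# Share of each medal type among these rows
medal_share <- prop.table(table(medals))

# Medal type with the largest share
most_common <- names(which.max(medal_share))

print(round(medal_share, 3))
print(most_common)
```

On the real data, `topAtheleteFilter$Medal` would replace `medals`.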

::: {.cell .markdown _cell_guid=“d673eed9-3c03-f0da-a9ee-6b39924d75fa”}
## Which sports produce the most successful athletes?
:::

::: {.cell .code execution_count=“11” _cell_guid=“29a9cd2d-c1f4-dc85-d47e-451881095e58” execution=“{"iopub.execute_input":"2023-09-08T10:20:34.997385Z","iopub.status.busy":"2023-09-08T10:20:34.995532Z","iopub.status.idle":"2023-09-08T10:20:35.335782Z"}” trusted=“true”}

options(repr.plot.width=6, repr.plot.height=3)
y=ddply(topAtheleteFilter, .(Sport,Medal), numcolwise(sum))
ggplot(y,aes(x=reorder(Sport,MedalCount),y=MedalCount,fill=Medal,group=Medal))+geom_bar(stat='identity')

:::


::: {.cell .markdown _cell_guid=“986bd7cf-620d-7aac-e6e0-0a594b2908d6”}
The majority of the medals won by the top 20 athletes have come from Gymnastics and Aquatics, indicating that a single individual can win medals in multiple events in these sports.
:::

::: {.cell .markdown _cell_guid=“358549fd-af50-f37f-4333-7033b5ef1c4e”}
# Total number of Medals per Sport
:::

::: {.cell .markdown _cell_guid=“edc862ee-d372-2b0a-4a8f-f50778ff96bb”}
After looking at the performance of top athletes, let us also look at how the total number of medals is distributed across sports, and which sports offer the highest number of medals.
:::

::: {.cell .code execution_count=“12” _cell_guid=“74b0010b-6fc2-23bf-ce1c-167576f2f253” execution=“{"iopub.execute_input":"2023-09-08T10:20:43.949312Z","iopub.status.busy":"2023-09-08T10:20:43.947614Z","iopub.status.idle":"2023-09-08T10:20:44.475877Z"}” trusted=“true”}

options(repr.plot.width=6, repr.plot.height=6)
y=ddply(summer, .(Sport,Medal), numcolwise(sum))
ggplot(y,aes(x=reorder(Sport,MedalCount),y=MedalCount,fill=Medal,group=Medal))+geom_bar(stat='identity') +coord_flip()

:::
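The per-sport totals behind a chart like this can also be inspected as numbers. A minimal sketch using a base R cross-tabulation instead of `ddply()`; the `toy` data frame is hypothetical and stands in for `summer`:

```r
# Hypothetical rows standing in for `summer` (illustrative values only)
toy <- data.frame(
  Sport = c("Aquatics", "Aquatics", "Athletics", "Gymnastics", "Athletics", "Aquatics"),
  Medal = c("Gold", "Silver", "Gold", "Bronze", "Gold", "Gold"),
  stringsAsFactors = FALSE
)

# Rows are sports, columns are medal types, cells are medal counts
by_sport <- table(toy$Sport, toy$Medal)

# Total medals per sport, from most to fewest
sport_totals <- sort(rowSums(by_sport), decreasing = TRUE)

print(by_sport)
print(sport_totals)
```

Each row of `by_sport` corresponds to one stacked bar in the chart, and `sport_totals` gives the bar lengths in the order `reorder()` would place them.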
