
APPLICATIONS OF TRIGONOMETRY IN

REAL LIFE

A dissertation submitted to

Mahatma Gandhi University

in partial fulfilment of the requirements for the degree of

BACHELOR OF SCIENCE IN MATHEMATICS

Submitted by

ANNU MARIA VARKEYACHAN 200021032320

ASNA P A 200021032321

BIJI JOSEKUTTY 200021032322

UNDER THE GUIDANCE OF


Ms THERESA J PUZHAKKARA

CBCS SEMESTER VI

DEPARTMENT OF MATHEMATICS,

ALPHONSA COLLEGE, PALA

MARCH 2023
CBCS SEMESTER VI

DEPARTMENT OF MATHEMATICS

ALPHONSA COLLEGE, PALA
ARUNAPURAM

CERTIFICATE

I do hereby certify that the dissertation entitled “APPLICATIONS OF TRIGONOMETRY IN REAL LIFE” is a bonafide record of work done by ANNU MARIA VARKEYACHAN, ASNA P A and BIJI JOSEKUTTY under my supervision, in partial fulfilment of the requirements for the degree (CBCS) of Bachelor of Science in Mathematics of M.G. University, and that no part of this work has been submitted earlier for the award of any degree.

Place:Arunapuram Ms Theresa J Puzhakkara

Date: (Supervising Guide)

Forwarded:

Dr. Sr. Sonia K Thomas

Head of the Department


ACKNOWLEDGMENT

First and foremost, we thank GOD ALMIGHTY for all the blessings he bestowed on us during the course of our work. We praise him for his great wisdom and guidance throughout the endeavour. We also thank our principal.

We express our sincere gratitude to Ms Theresa J Puzhakkara, our guide, who took the utmost care in every phase of this project and helped us with valuable guidance and support in finishing our project successfully.

We wish to extend our profound thanks to all the teachers of the Department of Mathematics for their help during the period of this study.

We thank our parents for their support in doing our project. We are grateful to all the members of the staff in the library of Alphonsa College, who helped us a great deal in collecting materials for the project.

We express our heartfelt thanks to all our friends and well-wishers for their keen interest and encouragement.

ANNU MARIA VARKEYACHAN

ASNA P A

BIJI JOSEKUTTY
DECLARATION

We do hereby affirm that this dissertation entitled “APPLICATIONS OF TRIGONOMETRY IN REAL LIFE” is the result of our research work. It is prepared in partial fulfilment of the B.Sc. Mathematics syllabus of Mahatma Gandhi University. We declare that this has not been submitted for the award of any other degree/diploma from any other University. We humbly submit this dissertation for evaluation.

ANNU MARIA VARKEYACHAN

ASNA P A

BIJI JOSEKUTTY
CONTENTS

• INTRODUCTION

• CHAPTER 1: APPLICATIONS IN OCEANOGRAPHY

• CHAPTER 2: APPLICATIONS IN GAME DEVELOPMENT

• CHAPTER 3: APPLICATIONS IN AVIATION

• CHAPTER 4: APPLICATION OF MATHEMATICS IN SPORTS

    Basketball

• CONCLUSION

• BIBLIOGRAPHY
INTRODUCTION

Mathematics is a fascinating subject. India has long been regarded as a country of great mathematicians. Mathematics helps us run errands, manage our money and measure progress, and it also helps us in sport. In recent years, with the development of technology, maths has played an increasingly important role in sport. As technology to measure and improve performance gains momentum, even sport cannot escape maths. From amateur athletic training to high-level sporting competition, similar technology is used to give athletes feedback.

People think of mathematics as being applied only in the fields of science and engineering. Yet mathematics plays a large role in the efficiency of sports. Within every sport, there are multiple mathematical concepts which allow athletes to compete and be successful within their chosen sport. Whether it is through discussing statistics or talking tactics, deciding where players are going to be positioned on the pitch, or through the way in which the game is scored, there is mathematics involved. Behind every shot, tackle, sprint, kick, hit or throw, there is a mathematical idea which has allowed the athlete to decide why they are carrying out the skill the way they are.

Coaches constantly try to find ways to get the most out of their athletes, and sometimes they turn to mathematics for help. This may include finding the best batting order for a team to maximize the number of runs. Bridge, whist, chess, baseball, football, basketball, soccer and cricket are some of the games and sports that use maths. Calculus can be used to improve players' performance in training. In addition, calculus can be used to calculate the projectile motion of a baseball's trajectory. This can be used in baseball to optimise a pitcher's throwing mechanics to maximise efficiency. Calculus can also be used in basketball to find the arc length of the shot from the shooter's release to the net. By finding the release angle that gives the most consistent percentage of shots made, one can find out which player will score the most baskets. In this project we discuss the above topics in detail. From this project one will gain an awareness of the mathematics of sports, projectile motion and calculus, and their applications.

CHAPTER-1

MATHEMATICS IN SPORTS

Mathematics plays a very important role in sports. Whether discussing a player's statistics, a coach's formula for drafting certain players, or even a judge's score for a particular athlete, mathematics is involved. Even concepts such as the likelihood of a particular athlete or team winning, a mere case of probability, and the maintenance of equipment are mathematical in nature.

Let’s begin by looking at the throwing of a basketball. Now, we can use the equation

$$f(x) = \left[\frac{-16}{v^2\cos^2\alpha}\right]x^2 + (\tan\alpha)\,x + h$$

This equation is helpful in finding out the velocity at which a basketball player must throw the ball in order for it to land perfectly in the basket. When shooting a basketball, the ball should hit the basket at as close to a right angle as possible. For this reason, most players attempt to shoot the ball at a 45° angle. To find the velocity at which a player would need to throw the ball in order to make the basket, the range of the ball is to be determined when it is thrown at a 45° angle.

The formula for the range of the ball is

$$\text{Range} = \frac{v^2\sin(2\alpha)}{32}$$

But since the angle at which the ball is thrown is 45°,

$$\text{Range} = \frac{v^2\sin(2\alpha)}{32} = \frac{v^2\sin(2\cdot 45^\circ)}{32} = \frac{v^2}{32}$$

Now, if a player is shooting a 3-point shot, then he is approximately 25 feet from the basket. The graph of the range function gives an idea of how hard the player must throw the ball in order to make a 3-point shot.

fig 1.1

So, by solving the formula knowing that the range of the shot must be 25 feet, we have

$$25 = \frac{v^2}{32}$$

$$v^2 = 800$$

$$v \approx 28.2843$$

So in order to make the 3-point shot, the player must throw the ball at approximately 28 feet per second, or about 19 mph.
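As a rough check of this calculation, the range formula can be inverted for the launch speed. The sketch below is illustrative only: the function names are hypothetical, and it assumes, as the range formula itself does, that the ball leaves the hand and arrives at the same height.

```python
import math

G_FT = 32.0  # gravitational constant in ft/s^2, as used in the range formula


def launch_speed_for_range(range_ft, angle_deg):
    """Speed in ft/s needed to cover range_ft at launch angle angle_deg.

    Inverts Range = v^2 * sin(2 * alpha) / 32.
    """
    return math.sqrt(G_FT * range_ft / math.sin(math.radians(2 * angle_deg)))


def ft_per_s_to_mph(speed_ft_s):
    """Convert feet per second to miles per hour (5280 ft per mile)."""
    return speed_ft_s * 3600.0 / 5280.0


speed = launch_speed_for_range(25.0, 45.0)
print(round(speed, 4), "ft/s")
print(round(ft_per_s_to_mph(speed), 1), "mph")
```

Changing the angle argument shows how much harder the ball must be thrown when the release angle moves away from 45°.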

When throwing and hitting a baseball, the pitcher wants to throw the ball so that he will strike out the batter. If his throw is too high or too low then it is a ball, and the batter still has at least three more opportunities to hit the ball. Similarly, when the batter hits the ball, he wants to hit it so that it will be as far away from any of the other players as possible, if not outside of the ball field itself. The players must take into consideration the speed and height of the ball to ensure that they will throw or hit it properly. Here is the equation for the projectile motion of a baseball:

$$f(x) = \left[\frac{-16}{v^2\cos^2\alpha}\right]x^2 + (\tan\alpha)\,x + h$$

where all distances are measured in feet, h is the height from which the ball is
thrown, α is the angle at which the ball is thrown, v is the speed at which the ball is thrown,
and x is the distance that the ball travels. The distance that the ball will travel can be found by
using

$$y = \frac{v^2\sin(2\alpha)}{32}$$

Now, a batter would be more concerned with the range of the ball, wanting it to
travel far enough to allow him to at least make it to first base safely. Several graphs of the
range with different α's and a fixed v and h are shown in fig 1.2.

fig 1.2

The black graph is when α = 30°, the blue graph when α = 45°, and the red graph when α = 60°. It can be inferred from the graph that an angle of 45° will send the ball the furthest. So, a batter would want to hit the ball as close to a 45° angle as possible, while a pitcher, who is more concerned about the ball veering off path, would want to throw the ball so that it would travel as close to a straight line as possible. Now, it is approximately 420 feet from home plate to the edge of a baseball field. The batter wants to hit the ball hard enough so that it will travel out of the field, over the approximately 7-foot wall at the back of the outfield. If the batter hits the ball at a 40° angle and the ball is approximately 5 feet in the air when struck, how hard must he hit the ball in order to have a home run?

In the projectile equation, f(x) is the height of the ball, so

$$7 = \left[\frac{-16}{v^2\cos^2 40^\circ}\right]420^2 + (\tan 40^\circ)\,420 + 5$$

$$v^2 = \frac{-16\cdot 420^2}{\left(7 - (\tan 40^\circ)\cdot 420 - 5\right)\cos^2 40^\circ}$$

$$v^2 \approx 13725.2$$

$$v \approx 117.2\ \text{ft/sec}$$

Therefore, the batter must hit the ball at approximately 117 feet per second, which is approximately 80 mph, in order to hit a home run when he hits the ball at an angle of 40°.
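The same rearrangement of the projectile equation can be sketched in code, which makes it easy to try other distances, wall heights and angles. This is a minimal sketch, not a definitive implementation: the function name and argument names are hypothetical, and air resistance is ignored as in the equation itself.

```python
import math


def speed_to_clear(distance_ft, target_height_ft, launch_height_ft, angle_deg):
    """Launch speed in ft/s so that the ball is at target_height_ft after distance_ft.

    Solves f(x) = -16 x^2 / (v^2 cos^2(a)) + x tan(a) + h for v.
    """
    a = math.radians(angle_deg)
    # Height of the target relative to the straight launch line; must be negative
    # because gravity can only pull the ball below that line.
    drop = target_height_ft - launch_height_ft - distance_ft * math.tan(a)
    if drop >= 0:
        raise ValueError("target is on or above the launch line; no speed reaches it")
    v_squared = -16.0 * distance_ft ** 2 / (drop * math.cos(a) ** 2)
    return math.sqrt(v_squared)


# Values from the home-run question: 420 ft away, 7 ft wall, struck 5 ft up, 40 degrees.
v = speed_to_clear(420.0, 7.0, 5.0, 40.0)
print(round(v, 1), "ft/s")
print(round(v * 3600.0 / 5280.0, 1), "mph")
```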

Many people consider bowling to be quite simple. However, you must consider the angle of the ball and the velocity with which the ball is thrown when trying to get a strike. The path of a bowling ball, thrown in a straight line, can be represented by the following equation:

$$f(t) = \left[\frac{v}{r}\right]\left(1 - e^{-(r\cdot t)}\right)$$

where v is the velocity of the ball, t is the time in seconds that the ball travels, r is a constant that represents the friction, and f(t) is the distance in feet that the ball travels after t seconds.

Now, the length of a bowling lane is approximately 60 feet. Let's say that the friction constant for the bowling ball on the slick surface of the bowling lane is approximately 0.3 and the ball is rolled at approximately 15 mph, or 22 feet per second. The equation can be graphed as

fig 1.3
From the graph, it is understood that the bowling ball, if thrown at 15 mph, should make it all the way down the bowling lane.
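The conclusion read from the graph can also be checked numerically. The sketch below assumes the bowling equation has the form f(t) = (v / r)(1 - e^(-r t)), so that the ball can never travel further than v / r; the function names are hypothetical.

```python
import math


def bowling_distance(v, r, t):
    """Distance in feet after t seconds: f(t) = (v / r) * (1 - exp(-r * t))."""
    return (v / r) * (1.0 - math.exp(-r * t))


def time_to_reach(v, r, distance):
    """Seconds needed to cover `distance`, or None if the ball stops short.

    The limiting distance as t grows is v / r.
    """
    if distance >= v / r:
        return None
    return -math.log(1.0 - r * distance / v) / r


# Values from the text: 22 ft/s, friction constant 0.3, 60 ft lane.
t_lane = time_to_reach(22.0, 0.3, 60.0)
print("limiting distance:", round(22.0 / 0.3, 1), "ft")
print("time to reach the pins:", None if t_lane is None else round(t_lane, 2), "s")
```

With these values the limiting distance exceeds the lane length, which agrees with the statement that the ball reaches the end of the lane.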

Mathematics is also used in ranking players and determining playoff scenarios. From something as simple as using a matrix to the formulas used to determine a player's or team's statistics, mathematics is an integral part of this system. For example, in the Olympics, most sports have players draw numbers to see who they will be competing against. If the number of contestants n is exactly a power of two, $2^k$, then all athletes participate in the first round of play; if not, then some of the participants enter during the second round of play. The number of athletes entering during the second round of play will be $2^k - n$, where n is the number of contestants and $2^k$ is the smallest power of two that is not less than n. Rankings are also an important aspect of sports. In sports such as tennis, when rating athletes, an integral estimator is used which is based on a player's performance in a series of matches over a certain period of time. Even horse racing uses mathematics to rank the horses based on how well they have performed in previous races, and these rankings go into determining the value of a horse when a bet is placed. Mathematics is very prevalent in sports, from the most complex of formulas to the simplest ideas such as betting.
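The count of athletes who enter in the second round can be sketched as follows. The sketch assumes that the power of two in the formula is the smallest one that is at least the number of contestants; the function name is hypothetical.

```python
def second_round_entrants(n):
    """Number of contestants who skip round one: 2^k - n.

    2^k is taken to be the smallest power of two that is not less than n.
    """
    if n < 1:
        raise ValueError("need at least one contestant")
    bracket = 1
    while bracket < n:
        bracket *= 2
    return bracket - n


for contestants in (16, 11, 5):
    print(contestants, "contestants:", second_round_entrants(contestants), "enter in round two")
```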

♦♦♦

CHAPTER-2

CALCULUS IN SPORTS TO IMPROVE PERFORMANCE

Mathematics plays an important role in the field of sports. Coaches, athletes and trainers often use mathematics to gain a competitive advantage over their counterparts. With statistics of games, statistics of players, and probabilities of winning or losing games, mathematics is everywhere. The applications of calculus in sports are numerous.

CALCULUS IN SPORTS: RUNNING RACES

Mathematics is involved in running. To optimize the run, runners must keep themselves at the right speed in order to finish in the shortest time possible. According to Joseph Keller’s A Theory of Competitive Running, the physiological running capacity of a human body can be measured using a set of differential equations.

According to this theory, to win a running race shorter than 291 meters, the optimum strategy is to sprint at full effort (100% acceleration) for the entire distance. Races longer than 291 meters require a different strategy to optimize performance.

fig 2.1: Running Races

FINDING THE OPTIMAL VELOCITY FOR THE RUN WHILE CONSERVING ENERGY

Keller’s theory, which is based on Newton’s second law and the calculus of variations, provides an optimum strategy for running one-lap and half-lap races. Keller wrote the equation of motion as:

$$\frac{dv}{dt} + \frac{v}{\tau} = f(t)$$

where v is the runner’s speed as a function of time t, τ is a constant characterizing the resistance to running, which is assumed to be proportional to running speed, and f(t) ≤ F is the propulsive force per unit mass.

Empirical knowledge of human exercise physiology is expressed in the assumed relation between propulsive force and energy supply,

$$\frac{dE}{dt} = \sigma - fv$$

where E represents the runner’s energy supply, which has a finite initial value $E_0$ and is replenished at a constant rate σ. In spite of this replenishment, the energy supply reaches zero at the end of the race. The constants τ, σ, $E_0$ and F are found by comparing the optimal race times predicted by the theory with recorded race times.
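For the short-race strategy of running at full force throughout, the equation of motion can be solved in closed form. The sketch below assumes f(t) = F for the whole race and a start from rest; the function names and the numerical values of F and τ are illustrative assumptions chosen for the example, not values taken from the text.

```python
import math


def sprint_speed(t, F, tau):
    """Speed at time t when f(t) = F and v(0) = 0.

    Solution of dv/dt + v/tau = F, namely v(t) = F * tau * (1 - exp(-t / tau)).
    """
    return F * tau * (1.0 - math.exp(-t / tau))


def sprint_distance(t, F, tau):
    """Distance covered by time t, the integral of sprint_speed from 0 to t."""
    return F * tau * (t - tau * (1.0 - math.exp(-t / tau)))


F_ASSUMED = 12.0    # propulsive force per unit mass in m/s^2 (illustrative assumption)
TAU_ASSUMED = 0.9   # resistance time constant in s (illustrative assumption)

for seconds in (1.0, 5.0, 10.0):
    print(seconds, "s:",
          round(sprint_speed(seconds, F_ASSUMED, TAU_ASSUMED), 2), "m/s,",
          round(sprint_distance(seconds, F_ASSUMED, TAU_ASSUMED), 1), "m")
```

The speed levels off at F·τ, which is why the resistance term matters more and more as the sprint goes on.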

CALCULUS IN BASEBALL

In baseball, calculus can be used to optimize the pitcher’s throw to achieve maximum
efficiency. Also, calculus can be used to calculate the projectile motion of baseball’s
trajectory and to predict if runners can make it to the next base on time given their running
speed and the speed of a hit ball.

FINDING THE WORK REQUIRED TO THROW THE BASEBALL

The work W done on a moving ball from a position s0 to s1 is equal to the change in the ball’s kinetic energy. The kinetic energy K of a baseball of mass m and velocity v is given by

$$K = \frac{1}{2}mv^2$$

fig 2.2 : Baseball Field


The work done on the ball is then

$$W = \int_{s_0}^{s_1} F(s)\,ds = \frac{1}{2}mv_1^2 - \frac{1}{2}mv_0^2$$

where $v_0$ and $v_1$ are the initial and final velocities. Using this, baseball players can figure out how much force they need to exert on the ball to reach the place where they want the ball to go.
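A small numerical illustration of the work-energy relation follows. It is a sketch under stated assumptions: the ball mass, release speed and the distance over which the hand accelerates the ball are illustrative values, not figures from the text, and the function names are hypothetical.

```python
def kinetic_energy(mass_kg, speed_m_s):
    """K = (1/2) m v^2 in joules."""
    return 0.5 * mass_kg * speed_m_s ** 2


def work_done(mass_kg, v0, v1):
    """Work in joules needed to change the ball's speed from v0 to v1."""
    return kinetic_energy(mass_kg, v1) - kinetic_energy(mass_kg, v0)


def average_force(mass_kg, v0, v1, distance_m):
    """Average force in newtons if the work is delivered over distance_m."""
    return work_done(mass_kg, v0, v1) / distance_m


BALL_MASS = 0.145      # kg, assumed mass of a baseball
RELEASE_SPEED = 40.0   # m/s, assumed release speed
THROW_PATH = 2.0       # m, assumed distance over which the hand pushes the ball

print(round(work_done(BALL_MASS, 0.0, RELEASE_SPEED), 1), "J")
print(round(average_force(BALL_MASS, 0.0, RELEASE_SPEED, THROW_PATH), 1), "N")
```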

FINDING THE AVERAGE FORCE ON THE BAT DURING THE COLLISION

The collision of ball and bat is quite complex, and models of it are discussed in detail in Robert Adair's book, The Physics of Baseball.

fig 2.3

The above image shows an overhead view of the position of a baseball bat, shown
every fiftieth of a second during a typical swing. We can calculate the average force on the
bat during this collision by first calculating the change in the ball’s momentum.

It is known that the momentum p of an object is the product of its mass m and its velocity v, that is, $p = mv$. Suppose an object, moving along a straight line, is acted on by a force F = F(t) which is a continuous function of time t. Then the change in momentum between times $t_0$ and $t_1$ is

$$p(t_1) - p(t_0) = \int_{t_0}^{t_1} F(t)\,dt$$

Using the above formula, one can find the average force on the bat during the collision, F = ma, where $a = \frac{\Delta v}{\Delta t}$. The application of calculus in sports does not end with running and baseball; it can be applied in basketball too.

CALCULUS IN BASKETBALL

Calculus can be used in basketball to find the exact arc length of a shot from the
shooter’s hands to the basket. The moment the basketball is released from the shooter’s
hands, its travelling path creates an arc all the way to the net.

fig 2.4 : Basketball throw

Using the angle of release and strength of the release, one can mathematically
predict the travelling path and the length of the arc. While the ball is in the air, it is affected
by only one force, which is gravity.

FINDING THE ARC LENGTH OF A BASKETBALL THROW

The travel path of a basketball can be divided into two components, the horizontal
(x) direction and the vertical (y) direction. These two components can be represented by the
following parametric equations:
For horizontal, $x(t) = x_0 + v_0\cos(\theta)\,t$

For vertical, $y(t) = y_0 + v_0\sin(\theta)\,t + \frac{1}{2}gt^2$

where,

𝑥0 is the initial horizontal position of the basketball.

𝑦0 is the initial vertical position of the basketball.

𝑣0 is the initial velocity of the basketball.

𝜃 is the angle the ball is projected with respect to the x-axis.

g is the acceleration due to gravity, -9.81 m/s^2

t is the time travelled.

The derivatives of x(t) and y(t) with respect to time t are:

$$\frac{dx}{dt} = v_0\cos(\theta)$$

$$\frac{dy}{dt} = v_0\sin(\theta) - 9.81\,t$$

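Now, the distance travelled by the basketball can be found using the arc length equation (here α and β denote the start and end times of the flight)

$$L = \int_{\alpha}^{\beta}\sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\;dt,\qquad \alpha \le t \le \beta$$

Now, by inserting the derivatives of x(t) and y(t) in the arc length equation:

$$L = \int_{\alpha}^{\beta}\sqrt{(v_0\cos(\theta))^2 + (v_0\sin(\theta) - 9.81\,t)^2}\;dt$$

This equation can be expanded using $(a - b)^2 = a^2 - 2ab + b^2$:

$$L = \int_{\alpha}^{\beta}\sqrt{v_0^2\cos^2(\theta) + v_0^2\sin^2(\theta) - 19.62\,t\,v_0\sin(\theta) + 96.24\,t^2}\;dt$$

Since $\cos^2(\theta) + \sin^2(\theta) = 1$, the formula becomes

$$L = \int_{\alpha}^{\beta}\sqrt{v_0^2 - 19.62\,t\,v_0\sin(\theta) + 96.24\,t^2}\;dt$$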

Example: If the initial velocity of a basketball throw is 2.24 m/s, the angle of release is 45°, and the time t required for the ball to travel is about 2 seconds, then the arc length can be calculated using the above formula:

$$L = \int_0^2 \sqrt{(2.24)^2 - 19.62\,t\,(2.24)\sin(45^\circ) + 96.24\,t^2}\;dt \approx 17.34\ \text{m}$$
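The integral in this example has no convenient elementary shortcut, so it is natural to evaluate it numerically. The sketch below uses composite Simpson's rule, which is a choice made here for illustration and not a method named in the text; the function names are hypothetical. With the example's values it should land close to the 17.34 m quoted above.

```python
import math

G = 9.81  # magnitude of the acceleration due to gravity in m/s^2


def speed(t, v0, theta_deg):
    """Instantaneous speed sqrt((dx/dt)^2 + (dy/dt)^2) at time t."""
    theta = math.radians(theta_deg)
    vx = v0 * math.cos(theta)
    vy = v0 * math.sin(theta) - G * t
    return math.hypot(vx, vy)


def arc_length(v0, theta_deg, t_end, n=1000):
    """Arc length from t = 0 to t_end by composite Simpson's rule (n even)."""
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even integer")
    h = t_end / n
    total = speed(0.0, v0, theta_deg) + speed(t_end, v0, theta_deg)
    for i in range(1, n):
        weight = 4 if i % 2 else 2
        total += weight * speed(i * h, v0, theta_deg)
    return total * h / 3.0


print(round(arc_length(2.24, 45.0, 2.0), 2), "m")
```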

fig 2.5
The above figure shows different angles and entry points of a basketball into a basketball hoop. The diameter of the hoop ring is 18 inches. As the basketball is smaller than the hoop ring, there is always a hoop margin. The hoop margin is the amount of space left in the hoop ring after the basketball enters it.

Free throws, jump shots, and three-pointers enter at an angle, which makes the entrance to the hoop appear oval. This changes the hoop margin. The apparent hoop size is the apparent opening of the hoop as seen by the ball. So, the flatter the arc of the throw, the smaller the ellipse of the hoop ring.

The apparent hoop margin is the apparent hoop size minus the basketball’s diameter. A basketball can be thrown in different ways and at different angles. So, the apparent hoop margin varies with each shot.
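The dependence of the apparent hoop margin on the entry angle can be sketched with simple projection geometry. Two assumptions are made here that are not stated in the text: the apparent opening along the ball's direction of travel is taken as the hoop diameter times the sine of the entry angle (measured from the horizontal), and the ball diameter is taken as roughly 9.4 inches. The function names are hypothetical.

```python
import math

HOOP_DIAMETER_IN = 18.0  # hoop ring diameter, from the text
BALL_DIAMETER_IN = 9.4   # assumed ball diameter in inches (not given in the text)


def apparent_hoop_size(entry_angle_deg, hoop_diameter=HOOP_DIAMETER_IN):
    """Apparent opening of the hoop for a ball arriving at entry_angle_deg."""
    return hoop_diameter * math.sin(math.radians(entry_angle_deg))


def apparent_hoop_margin(entry_angle_deg, ball_diameter=BALL_DIAMETER_IN):
    """Apparent hoop size minus the ball's diameter (negative means no clean entry)."""
    return apparent_hoop_size(entry_angle_deg) - ball_diameter


for angle in (90, 60, 45, 30):
    print(angle, "degrees:", round(apparent_hoop_margin(angle), 2), "inches of margin")
```

Under these assumptions the margin shrinks as the entry angle gets flatter, which is the behaviour described in the paragraph above.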

FINDING THE VELOCITY REQUIRED FOR THE BASKETBALL TO ENTER THE BASKET

The velocity required for the basketball shot, given the height of the player’s throw and the distance from the hoop, can also be found. The following equation describes the path the ball must follow in order to enter the basket perfectly:

$$f(x) = \left[\frac{-16}{v^2\cos^2\alpha}\right]x^2 + (\tan\alpha)\,x + h$$

where,

h is the height from which the ball is thrown

𝛼 is the angle at which the ball is thrown.

𝑣 is the speed at which the ball is thrown

x is the distance the ball travels.

And the formula for the range of a basketball trajectory is

$$\text{Range} = \frac{v_0^2\sin(2\alpha)}{32}$$

Once the range and the angle of throw α are known, the velocity required for the throw can be calculated using the above formulae.

The application of calculus in sports does not end with running, baseball and basketball. Calculus can be applied to any physical sport to optimize performance.

♦♦♦

CHAPTER-3

PREDICTION OF SPORTS INJURIES BY MATHEMATICAL MODELS

A number of different methodological approaches have been used to describe the inciting event for sports injuries. These include interviews of injured athletes, analysis of video recordings of actual injuries, and clinical studies. Sports injuries can affect any part of the body, depending on the particular repetitive movement performed, just like any repetitive motion injury. While there are factors that raise the risk of injury, there are also elements that predispose athletes to sports injuries. Rehabilitation and preventative efforts should be centered on a thorough knowledge of risk factor etiology as well as knowledge of how such factors contribute to sports injuries.

Predictive factors of sports injuries

Predictive factors of sports injuries are biological variables and the relations
between them that can be indicators for creating a health profile or diagnosis. For example,
weight can be a predictive factor of diabetes, arteriosclerosis, and other metabolic illnesses. It
is even more useful when associated with height, BMI, and waist-hip ratio since it can then
be used in predicting hypertension, myocardial infarction, diabetes, and strokes. In order to
effectively predict health complications, the WHO recommends using anthropometry to
monitor risk factors of chronic diseases and to perform studies that define the association
between the aforementioned factors and specific outcomes, such as arterial hypertension.
Predictive factors of sports injuries can be grouped into two types: intrinsic factors and extrinsic factors.

Extrinsic factors

Sports injuries are most commonly caused by poor training methods; structural
abnormalities; weakness in muscles, tendons, ligaments; and unsafe exercising environments.
The most common cause of injury is poor training. For example, muscles need 48 hours to
recover after a workout. Increasing exercise intensity too quickly and not stopping when pain
develops while exercising also cause injury.

Intrinsic factors

Everyone’s bone architecture is a little different, and almost all of us have one or two weak points where the arrangement of bone and muscle leaves us prone to injury. Injuries to the locomotor system of children and adolescents occur more often when they try to perform more ambitiously in the hope of improving their short-term performance. As age and competition level increase, so does the risk of injury.

Predictive factors of injuries

When an injury occurs, biomechanical, kinematic, and body composition analyses tend to provide more predictive information than analyses focused on training intensity, resistance, muscle tone, agility, physical maturity, previous injuries or training methods. Unevenness in the length of lower limbs, misalignments, anatomical abnormalities, club foot, genu valgum, support type, or posture defects are typically cited as injury predictors. Footprints have also been examined: the average arch, the foot’s plantar flexion and dorsiflexion, excessive pronation, as well as the quadriceps’ Q angle.

The relationship between lower limb structure and sports injuries

Common predisposing factors in injuries to the ankles, legs, knees, and hips include: bilateral weight and structural symmetry, quadriceps and calf girth, patella alta (a kneecap that is higher than usual), Q-angle of the knee (high Q angle: kneecap displaced to one side, as with knock knees), forefoot varus, rear foot valgus, true and apparent leg length, uneven leg length, excessive pronation (flat feet), cavus foot (over-high arches), and bowlegged or knock-knee alignment.

(a) Uneven leg length may lead to awkward running and increases the chance of injury,
but many people with equal-length legs suffer the same effects by running on tilted running
tracks or along the side of a road that is higher in the centre. The hip of the leg that strikes the
higher surface will suffer more strain.

(b) Pronation is the inward rolling of the foot after the heel strikes the ground, before
the weight is shifted forward to the ball of the foot. By rolling inwards, the foot spreads the
shock of impact with the ground. If it rolls too easily, however, it can place uneven stress on
muscles and ligaments higher in the leg.

While an overly flexible ankle and foot can cause excessive pronation, a too-rigid ankle will cause the effects of cavus foot. Although the arch of the foot itself may be normal, it appears very high because the foot doesn’t flatten inward when weight is placed on it. Such feet are poor shock absorbers and increase the risk of fractures higher in the legs. Bowlegs or knock knees add extra stress through knees and ankles over time, and may make ankle sprains more likely. Other structural conditions that make sports injuries more common include lumbar lordosis. Overuse injuries are caused by repeated, microscopic injuries to a part of the body. Many long distance runners experience overuse injuries even after years of running. For road runners, the surface is hard and sometimes uneven, and the running movements are repetitive. In addition, there are usually both uphill and downhill elements, and these increase the stress on tendons and muscles in the lower leg. These conditions can lead to running injuries, so runners should use footwear that doesn’t allow side-to-side movement of the heel and that adequately cushions the foot.

Logistic regression equations

The purpose of regression techniques is two-fold:

1) To estimate the relation between two variables while taking the presence of other factors
into account

2) To construct a model that allows for the prediction of the value of the dependent variable (in logistic regression, the probability of success) for specific values of a group of predictor variables

The concept of logistic regression

The benefit of logistic regression comes from its capacity to analyse clinical and epidemiological research data. The primary objective of this technique is to model how the presence or absence of diverse factors, and their values, influence the probability of the occurrence of an event, which is typically dichotomic. The technique can also be used to estimate the probability of the occurrence of an event with more than two categories. Situations of this sort are approached using regression techniques. Nonetheless, linear regression methodology is not applicable, since the outcome variable takes only two values, such as the presence/absence of a knee sprain or the presence/absence of injury. Suppose we classify the value of the outcome variable as 0 when the event does not occur (the absence of a knee sprain) and as 1 when it does occur (the athlete sprains his or her knee), and we wish to calculate the possible relation between the occurrence of a sprained knee and, for example, the difference in the thickness of both thighs (considered a possible risk factor). We could attempt to determine this relation using a linear regression:

𝐾𝑛𝑒𝑒 𝑠𝑝𝑟𝑎𝑖𝑛 = 𝑎 + 𝑏 ∗ [𝑑𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒 𝑖𝑛 𝑡ℎ𝑖𝑔ℎ 𝑡ℎ𝑖𝑐𝑘𝑛𝑒𝑠𝑠]

Based on the data, we could then estimate the coefficients a and b of the equation through the normal procedure of least squares. However, although this is mathematically possible, we arrive at nonsensical results: upon evaluating the resulting equation for different values of thigh thickness, we obtain results that generally differ from 0 and 1, while the only results actually possible in this case are 0 and 1. Since this restriction is not imposed in linear regression, the outcome can theoretically take on any value. If p is the probability that an athlete suffers a knee sprain, we can instead construct the quantity

$$\ln\frac{p}{1-p}$$

Since this quantity can take any real value, a traditional regression equation can be proposed for it:

$$\ln\frac{p}{1-p} = a + b\,(\text{difference in thigh thickness})$$

which, with a slight algebraic manipulation, can be turned into

$$\text{Injury probability} = \frac{1}{1 + e^{-a - b\,(\text{difference in thigh thickness})}}$$

This is exactly the kind of equation known as a logistic model, where the number of factors can be greater than one. In that case, the exponent in the denominator can contain a combination of several factors, such as

$$b_1\cdot\text{difference in thickness} + b_2\cdot\text{age} + b_3\cdot\text{sex} + b_4\cdot\text{height}$$
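The step from the log-odds equation to the injury probability can be illustrated with a short sketch. The coefficient values used below are hypothetical and chosen only to show the shape of the model; they are not estimates from any data set, and the function names are likewise assumptions.

```python
import math


def logit(p):
    """Log-odds ln(p / (1 - p)) of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def injury_probability(a, b, thigh_difference):
    """Logistic model: p = 1 / (1 + exp(-(a + b * thigh_difference)))."""
    return 1.0 / (1.0 + math.exp(-(a + b * thigh_difference)))


A_HYPOTHETICAL = -2.0  # intercept, illustrative only
B_HYPOTHETICAL = 0.8   # coefficient for the thigh thickness difference, illustrative only

for difference_cm in (0.0, 1.0, 2.5, 5.0):
    p = injury_probability(A_HYPOTHETICAL, B_HYPOTHETICAL, difference_cm)
    print(difference_cm, "cm ->", round(p, 3))
```

Unlike the straight-line model, the output always lies between 0 and 1, and applying `logit` to it recovers the linear expression a + b × (difference in thigh thickness).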

Logistic model coefficients as risk quantifiers

One of the features that make logistic regression so interesting is the relation between the logistic model coefficients and a risk quantification parameter known in the field as the “odds ratio”. The odds associated with an event is the quotient of the probability that it occurs and the probability that it does not occur:

$$\text{Odds} = \frac{p}{1-p}$$

with p being the probability of occurrence. Therefore, we can calculate the odds of an injury occurring when the difference in thigh thickness is equal to or greater than a specific quantity, which expresses how much more probable it is that an injury occurs than that it does not occur in this situation. Likewise, we can calculate the odds of an injury occurring when the difference in thigh thickness is less than that same figure. Dividing the first odds by the second gives an odds quotient, or odds ratio, which quantifies how much higher the odds of an injury are when the difference in thickness is greater than the specific figure (first odds) relative to when the difference in thickness is less. The notion being measured is similar to what we find in the relative risk, which corresponds to the quotient of the probability that an injury occurs when a specific factor is present (difference in thickness) and the probability that it occurs when the factor is not present. In fact, when the prevalence of the event is low (<20 %), the odds ratio and the relative risk are very similar; but such is not the case when the occurrence of the event is quite common, a fact that is often ignored.

$$\text{Relative risk} = \frac{\text{probability of injury in the presence of the risk factor}}{\text{probability of injury in the absence of the risk factor}}$$

Absolute risk increase = (post-test probability if the risk factor is present) − (post-test probability if the risk factor is absent)
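The difference between the odds ratio and the relative risk can be seen in a small numerical sketch. The injury probabilities used here are hypothetical figures chosen for illustration, and the function names are assumptions.

```python
def odds(p):
    """Odds p / (1 - p) of an event with probability p."""
    return p / (1.0 - p)


def odds_ratio(p_with_factor, p_without_factor):
    """Odds of injury with the risk factor divided by odds without it."""
    return odds(p_with_factor) / odds(p_without_factor)


def relative_risk(p_with_factor, p_without_factor):
    """Probability of injury with the risk factor divided by probability without it."""
    return p_with_factor / p_without_factor


def absolute_risk_increase(p_with_factor, p_without_factor):
    """Difference between the two injury probabilities."""
    return p_with_factor - p_without_factor


# Hypothetical probabilities: a rare injury and a common injury, both with relative risk 2.
for label, p1, p0 in (("rare", 0.04, 0.02), ("common", 0.60, 0.30)):
    print(label,
          "RR =", round(relative_risk(p1, p0), 2),
          "OR =", round(odds_ratio(p1, p0), 2),
          "ARI =", round(absolute_risk_increase(p1, p0), 2))
```

For the rare event the two measures nearly coincide, while for the common event the odds ratio is clearly larger than the relative risk.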

If there is a dichotomic factor in the regression equation, for example whether or not the subject is a jumper, the b coefficient of the equation for this factor is directly related to the odds ratio OR of being a jumper compared to not being one:

OR = exp(b)

where exp(b) is a measurement that quantifies the risk presented when the corresponding factor is present compared to when it is not, assuming that the rest of the model’s variables remain constant.

When the variable is numerical, for example age or body mass index, exp(b) is a measurement that quantifies the change in risk when the variable changes its value by one unit while the rest of the variables remain constant. Thus the odds ratio for moving from age X1 to age X2, with b being the coefficient that corresponds to age in the logistic model, is:

OR = exp [b * (X2 - X1)]

This is a model in which the increase or decrease of risk is proportional to the change from one value of the factor to another. In other words, it depends on the difference between the two values, but not on the starting point, meaning that the change in risk, in the logistic model, is the same when we move from 20 years old to 30 years old as when we move from 40 to 50. When the variable’s coefficient b is positive, we obtain an odds ratio greater than 1, which therefore corresponds to a risk factor. On the other hand, if b is negative the odds ratio will be less than 1 and will correspond to a protective factor.
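The fact that the odds ratio depends only on the size of the change can be checked directly. In this sketch the coefficient value is hypothetical and the function name is an assumption.

```python
import math


def odds_ratio_for_change(b, x1, x2):
    """Odds ratio for moving a numerical variable from x1 to x2: exp(b * (x2 - x1))."""
    return math.exp(b * (x2 - x1))


B_AGE_HYPOTHETICAL = 0.05  # illustrative coefficient for age, not an estimate from data

print(round(odds_ratio_for_change(B_AGE_HYPOTHETICAL, 20, 30), 4))
print(round(odds_ratio_for_change(B_AGE_HYPOTHETICAL, 40, 50), 4))
print(round(odds_ratio_for_change(-B_AGE_HYPOTHETICAL, 20, 30), 4))
```

The first two lines print the same value, and the third, with a negative coefficient, falls below 1.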

$$\text{Pre-test odds} = \frac{\text{pre-test probability of injury}}{1 - \text{pre-test probability of injury}}$$

Post-test odds = pre-test odds × likelihood ratio (positive or negative, according to the test result), where

$$\text{Positive likelihood ratio} = \frac{\text{sensitivity}}{1 - \text{specificity}}$$

$$\text{Negative likelihood ratio} = \frac{1 - \text{sensitivity}}{\text{specificity}}$$

$$\text{Post-test probability} = \frac{\text{post-test odds}}{\text{post-test odds} + 1}$$
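The chain from pre-test probability to post-test probability can be sketched as below. It uses the standard definitions of the likelihood ratios, sensitivity / (1 − specificity) and (1 − sensitivity) / specificity; the sensitivity, specificity and pre-test probability are hypothetical values for illustration, and the function names are assumptions.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def negative_likelihood_ratio(sensitivity, specificity):
    """LR- = (1 - sensitivity) / specificity."""
    return (1.0 - sensitivity) / specificity


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert to odds, multiply by the likelihood ratio, convert back to probability."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (post_test_odds + 1.0)


# Hypothetical screening test: 20 % pre-test probability, sensitivity 0.8, specificity 0.9.
lr_pos = positive_likelihood_ratio(0.8, 0.9)
lr_neg = negative_likelihood_ratio(0.8, 0.9)
print("after a positive result:", round(post_test_probability(0.2, lr_pos), 3))
print("after a negative result:", round(post_test_probability(0.2, lr_neg), 3))
```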

Qualitative variables in the logistic model

Given that the methodology employed for calculations with the logistic model is based on quantitative variables, in the same way as in any other regression process, it is incorrect to use qualitative variables, whether nominal or ordinal, directly in regression processes. Assigning a number to each category does not solve the problem. Suppose, for example, that the physical exercise variable has three possible answers: sedentary, sporadically performing exercise, and frequently performing exercise; and that we assign the values 0, 1, 2, respectively, to these answers. Then performing frequent exercise has twice the value of performing exercise sporadically, which makes little sense. The problem is worse for variables such as civil status, which have no ordering relation among their outputs.

The solution to this problem is to create as many dichotomic variables as the number of outputs minus one.

These new, artificially created variables are called “dummy”, indicator, internal, or design variables. Therefore, if the variable in question produces exposure data with the following outputs: Never ran, Ex-runner, Runs less than 10 kilometers per day, Runs 10 or more kilometers per day, we have 4 possible answers, from which we construct 3 dichotomic internal variables (values 0, 1), with different possibilities for codification that lead to different interpretations. The most frequent of these is the following:

                                I1   I2   I3
Never ran                        0    0    0
Ex-runner                        1    0    0
Runs less than 10 km per day     0    1    0
Runs 10 or more km per day       0    0    1

Table 3.1: Design variables.
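
The coding in Table 3.1 can be sketched as a small function. This is only an illustration: the function name is hypothetical, and the first category is assumed to be the reference level, as in the table.

```python
CATEGORIES = [
    "Never ran",                      # reference level: all design variables are 0
    "Ex-runner",
    "Runs less than 10 km per day",
    "Runs 10 or more km per day",
]


def design_variables(answer: str, categories=CATEGORIES) -> list:
    """Return the c - 1 dichotomic (0/1) design variables for one answer."""
    index = categories.index(answer)  # raises ValueError for an unknown answer
    return [1 if index == k else 0 for k in range(1, len(categories))]


for category in CATEGORIES:
    print(category, design_variables(category))
```

Four possible answers produce three design variables, and the reference category is the one coded with zeros only.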

In this type of codification the regression equation’s coefficient for each design
variable (always transformed with the exponential function), corresponds to the odds ratio for
this category given the reference level (the first output). In our example, it quantifies how the
risk changes given the situation of never having run. There are other possibilities, among
which we will highlight an example with a qualitative variable and three outputs:

            I1   I2
Output 1     0    0
Output 2     1    0
Output 3     1    1

Table 3.2: Qualitative variable and three outputs.

With this codification, each coefficient is interpreted as an average of the change in
risk upon moving from one category to the next. In the event that a category cannot naturally
be considered a reference level, for example blood group, a possible classification system is:

            I1   I2
Output 1    -1   -1
Output 2     1    0
Output 3     0    1

Table 3.3: Classification system when no category is a natural reference level.

where each coefficient of the indicator variables has a direct interpretation as a
change in risk regarding the average of the three outputs.

Representation of logistic regression results

It is common to present logistic regression results in a table wherein each variable
is shown with its coefficient value, its standard error, a statistic (labeled chi² Wald)
that allows us to check whether the coefficient is significantly different from 0, and the p
value for this test. The odds ratio of each variable is also presented, together with
its 95% confidence interval.

Term              Coefficient   Standard Error   Chi²     P        Interpretation
Indepen.          -1.2168       0.9557           1.621    0.2029   NO
Age               -0.0465       0.0374           1.545    0.2138   NO
Race*                                            *5.684   0.0583   Almost (p<0.1)
Race 1             1.0735       0.5151           4.343    0.0372   P<0.05
Race 2             0.8154       0.4453           3.353    0.0671   Almost (p<0.1)
Runner             0.8072       0.4044           3.983    0.0460   P<0.05
Injury             1.4352       0.6483           4.902    0.0268   P<0.05
Dissymmetry        0.6576       0.4666           1.986    0.1587   NO
Q Angle            0.8421       0.4055           4.312    0.0379   P<0.05
Thigh Thickness    1.2817       0.4621           7.692    0.0055   P<0.01

Table 3.4: Example of Logistic Regression Presentation.

Variable          Odds ratio   OR lower 95% limit   OR upper 95% limit
Age               0.95         0.89                 1.03
Race 1            2.93         1.07                 8.03
Race 2            2.26         0.94                 5.41
Runner            2.24         1.01                 4.95
Injury            4.20         1.18                 14.97
Dissymmetry       1.93         0.77                 4.82
Q Angle           2.32         1.05                 5.14
Thigh Thickness   3.60         1.46                 8.91

Table 3.5: Odds Ratio.
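
The link between the two tables can be sketched as follows, taking the "Race 1" row of Table 3.4 as input. The function names are hypothetical, and the factor 1.96 (normal approximation for a 95% interval) is an assumption about how the limits were obtained; it does reproduce the tabulated values to two decimals.

```python
import math


def wald_chi2(coefficient: float, standard_error: float) -> float:
    """Wald statistic: the squared ratio of a coefficient to its standard error."""
    return (coefficient / standard_error) ** 2


def odds_ratio_with_ci(coefficient: float, standard_error: float, z: float = 1.96):
    """Odds ratio exp(b) with limits exp(b - z*SE) and exp(b + z*SE)."""
    odds_ratio = math.exp(coefficient)
    lower = math.exp(coefficient - z * standard_error)
    upper = math.exp(coefficient + z * standard_error)
    return odds_ratio, lower, upper


# "Race 1" row of Table 3.4: coefficient 1.0735, standard error 0.5151.
print(round(wald_chi2(1.0735, 0.5151), 3))
print([round(value, 2) for value in odds_ratio_with_ci(1.0735, 0.5151)])
```

When the interval contains 1 (as for Race 2 or Dissymmetry in Table 3.5), the corresponding coefficient is not significantly different from 0 at the 5% level.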

Goodness of fit

Whenever we deal with a regression model, it is fundamental to check that the model
fits the data used in the calculation appropriately before drawing
conclusions (Bender, 1996).

In the case of logistic regression, a rather intuitive idea is to calculate the probability
of an event, the occurrence of an injury or knee sprain in our case, for all athletes from the
sampling. If the goodness of fit is acceptable, one would expect a high probability value to be
associated with the presence of an injury, and vice-versa, if the calculated probability value is
low, one would likewise expect the absence of injury. This intuitive idea is formally realized
through the Hosmer-Lemeshow test, which basically consists in dividing the range of
probability into deciles of risk (which would be injury probability 0.1, 0.2, and so forth up
to 1) and calculating the distribution of both injured and uninjured athletes, as predicted
by the equation and as actually observed. These distributions, calculated and
observed, are compared with each other through a chi² test. In the final presentation of logistic
regression data, a goodness of fit test should be included as well as a commented conclusion
drawn from the same test. For this purpose, the result of the Hosmer-Lemeshow test is more
illustrative than the mere distribution values obtained.
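
A minimal sketch of the statistic behind this test is given below. It assumes the common variant in which athletes are sorted by predicted probability and split into groups of equal size, rather than using fixed cut points. The function name and the toy data are hypothetical, and obtaining a p value would additionally require a chi² distribution with (groups - 2) degrees of freedom, which is not computed here.

```python
def hosmer_lemeshow_statistic(probabilities, outcomes, groups=10):
    """Sum over risk groups of (observed - expected)^2 / (n * mean_p * (1 - mean_p))."""
    pairs = sorted(zip(probabilities, outcomes))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # injuries predicted by the equation
        observed = sum(y for _, y in chunk)   # injuries actually observed
        mean_p = expected / size
        variance = size * mean_p * (1 - mean_p)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic


# Toy data: 10 low-risk and 10 high-risk athletes (1 = injured, 0 = not injured).
predicted = [0.1] * 10 + [0.9] * 10
observed = [1, 1] + [0] * 8 + [1] * 9 + [0]
print(hosmer_lemeshow_statistic(predicted, observed, groups=2))
```

A small value of the statistic means that the calculated and observed distributions are close, that is, the fit is acceptable.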

Logistic regression analysis

Although accidents are unavoidable in sports, injury prediction and prevention is a
practical aspect of sports medicine, and prevention is considered to be the best treatment.
Regression models encompass mathematical techniques that measure the relation
between an outcome variable and predictive variables. When the outcome variable is
continuous, the preferred model is linear regression. However, when the outcome variable
is dichotomic (injured/not injured) and the object of study is the relation between this and one
or more predictive variables (right Q angle, left Q angle, the difference in thigh thickness,
lower limb dissymmetry, age, sex, hours of training, kilometers run, etc...) the chosen
regression model is a simple logistic regression model (for one factor) or a multiple logistic
regression model (for more than one factor). Therefore, the logistic regression analysis
technique is used when it is suspected that one of the values of specific categorical variables
depends on a series of predictive or independent variables, along with the goal of finding a
mathematical function that expresses such a relation.

When the goal is to calculate the relation or association between two variables, the
regression models allow for the consideration that there may be other factors that affect this
relation. So, if the possible relation between lower limb dissymmetry and the probability of
suffering a knee injury is being studied as a risk factor, that relation can be different if other
variables are taken into account such as age, sex, or body mass index. Because of this, these
factors could be included in a logistic regression model as independent variables in addition
to dissymmetry. The variables other than the factor of interest (in this example AGE,
SEX, BMI) are called by several names: control variables, external variables, covariates, or
confounding variables.

Interaction

When the relation between the factor being studied and the dependent variable is
modified by the value of a third variable, we are then dealing with interaction. In our
example, we assume that the probability of suffering a sports injury increases with age when
there is lower limb dissymmetry.
In this case there is an interaction between the variables Age and
Dissymmetry.

Independent variable and probability direction

One of the first considerations we must take into account is that the relation
between the independent variable and the event probability must not change direction. If it
does, the logistic model doesn’t work for us. A very clear example of this situation arises
when we evaluate the probability of an athlete’s sports injuries in relation to the age when he
or she first began sports competitions. Up to a certain age, the probability of injury can
increase the earlier the athlete began competing. Beyond a certain mature age, the
likelihood of injury also increases the later the athlete begins competing. In
this case, a logistic model would be inadequate.

Collinearity

Another problem that may arise in regression models, and not only logistic models,
is that the variables involved may be correlated, which would lead us to a nonsensical model
and therefore to some values of the coefficients that cannot be interpreted. This situation,
with correlated independent variables, is called collinearity.

In order to understand it, an extreme case is discussed in which the same variable is
introduced in the model twice. Then the model contains the term
exp (-b0 – b1 * X – b2 * X) or, equivalently,
exp [-b0 – (b1 + b2) * X ]

where only the sum b1 + b2 is determined. There are infinitely many ways of dividing
the value of that sum into two addends, and therefore the individual values calculated for
b1 and b2 don’t make sense.
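
The extreme case can be illustrated numerically. The sketch below assumes the usual logistic form P = 1 / (1 + exp(-b0 - b1*X - b2*X)); all coefficient values are hypothetical.

```python
import math


def probability(b0: float, b1: float, b2: float, x: float) -> float:
    """Logistic probability when the same variable x enters the model twice."""
    return 1.0 / (1.0 + math.exp(-b0 - b1 * x - b2 * x))


b0, x = -1.0, 2.0  # hypothetical intercept and variable value

# Three different ways of splitting the same sum b1 + b2 = 0.5.
splits = [(0.5, 0.0), (0.25, 0.25), (-1.0, 1.5)]
for b1, b2 in splits:
    print(b1, b2, probability(b0, b1, b2, x))
```

All three pairs give the same probability, so the data cannot distinguish between them; this is why the individual coefficients are not interpretable under collinearity.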

An example of this situation could be given if we include variables such as the
length of the lower limbs and the length of the calves in the equation, two variables that are
closely correlated.

Sample size

As a basic rule, it is necessary to have at least 10 * (k + 1) cases to
estimate a model with k independent variables; in other words, at least 10 cases for each
variable that takes part in the model, counting the dependent variable (the probability of
the event) as well. It is useful to point out that a qualitative variable with c categories
appears as c – 1 variables in the model, when constructing the corresponding internal
variables based on the qualitative variable.

Model selection

When talking about models that can be multivariable, an interesting topic is how to
choose the best set of independent variables to include in the model . The definition of the
“best” model depends on the type and objective of the study. In a case where something will
be predicted, the best model would be one that produces the most reliable predictions. And in
a case where the relation between two variables is being calculated (correcting the effect of
other variables), the best model will be one that obtains the most precise calculation of the
coefficient of the variable in question.

Types of differences

Whenever data is analysed, it is important to distinguish between numerical
differences, statistically significant differences, and clinically relevant differences. These
three concepts do not always coincide.

Number of variables

One must consider the maximum model, or the maximum number of independent
variables that can be included in the equation, while taking their interactions into account
when appropriate. Although there are different processes for choosing a model, there are only
three basic mechanisms for doing so. The first is to start with only one independent variable
and add more, one by one, according to pre-established criteria (forward selection). The
second is to start with the maximum model and eliminate the variables one by one according to
pre-established criteria (backward elimination).
The third method, called “stepwise”, combines the two previous mechanisms: in
each step, a variable already present in the equation can be eliminated or another can be
added. In the case of logistic regression, the criterion for deciding at each step whether we
should choose a new model or stay with the currently used one is established by the logarithm
of the models’ likelihood ratio.

The likelihood equation

A model’s likelihood is a measurement of how compatible the model is
with the actual outcome data. If, upon adding a new variable to the model, the likelihood does
not increase in a statistically significant way, then that variable will not be included in the
equation. To evaluate the statistical significance of a particular variable within the model, we
will focus on the Wald chi² value corresponding to the variable’s coefficient and on its level
of probability.
To develop this equation it is necessary to perform a prior monitoring of a
representative group of athletes taking into account their age, sex, and sport during a
sufficiently long observation period that could be called a season. During this period it is
crucial to differentiate the subjects into two groups: injured and non-injured. Consequently,
the relation between the different measured variables and the final outcome of injury or no-
injury is established. In order to determine the predictive variables, we should identify those
that show significant differences among the two groups, thus establishing the relation
between the injury/no injury dependent variable given the distinct anthropometric and sports
variables (activity time, training time, team position, etc...).

Sensitivity, specificity, positive predictive value and negative predictive value

It is useful to use control techniques to evaluate the fit of the results obtained with
the mathematical equations defined in the logistic regression analysis. The results should be
analysed in all studied subjects, for the studied group of athletes in question, and for a control
group of both sexes, differentiating the success rate by sex.

Sensitivity

Proportion of injured subjects whom the equation predicted would
be injured.

$$\text{Sensitivity}\,(Sn) = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}$$

The following table summarizes these calculations:

                        POSITIVE TEST (T+)     NEGATIVE TEST (T-)
INJURY PRESENT (I+)     TRUE POSITIVE (TP)     FALSE NEGATIVE (FN)
INJURY ABSENT (I-)      FALSE POSITIVE (FP)    TRUE NEGATIVE (TN)

Table 3.6: Sensitivity.

$$Sn = P[T+ \text{ if } I+]$$

$$Sn = \frac{TP}{TP + FN}$$

Specificity

Proportion of uninjured subjects whom the equation predicted
would not be injured.

$$\text{Specificity}\,(Sp) = \frac{\text{True Negative}}{\text{True Negative} + \text{False Positive}}$$

Think of specificity as 1 minus the false positive rate. Notice that the denominator for
specificity is the number of healthy players. Using conditional probabilities, we can also
define specificity as:

Sp = P[Test is negative if Patient is healthy]

𝑆𝑝 = 𝑃[𝑇 − 𝑖𝑓 𝐼− ]

The following table summarizes these calculations:

                        POSITIVE TEST (T+)     NEGATIVE TEST (T-)
INJURY PRESENT (I+)     TRUE POSITIVE (TP)     FALSE NEGATIVE (FN)
INJURY ABSENT (I-)      FALSE POSITIVE (FP)    TRUE NEGATIVE (TN)

Table 3.7: Specificity

$$Sp = P[T- \text{ if } I-]$$

$$Sp = \frac{TN}{TN + FP}$$

False positives

Proportion of uninjured subjects whom the equation predicted
would be injured.

False negatives

Proportion of injured subjects whom the equation predicted would
not be injured.

In order to know the probability that a subject is or is not injured
given the prediction of the equation, we must know the positive predictive value (PPV)
and the negative predictive value (NPV), which are defined as follows:

Positive predictive value: The probability that an athlete is injured
when an injury has been predicted by the equation. To calculate this we use the equation:

$$PPV = \frac{S \cdot PL}{(S \cdot PL) + (FP \cdot PNL)}$$

where S = Sensitivity; PL = Probability of injury; FP = False Positives; PNL = Probability of
noninjury.
The following table summarizes these calculations:

                        POSITIVE TEST (T+)     NEGATIVE TEST (T-)
INJURY PRESENT (I+)     TRUE POSITIVE (TP)     FALSE NEGATIVE (FN)
INJURY ABSENT (I-)      FALSE POSITIVE (FP)    TRUE NEGATIVE (TN)

Table 3.8: False negatives.

$$PPV = P[I+ \text{ if } T+]$$

$$PPV = \frac{TP}{TP + FP}$$

Negative predictive value: The probability that the athlete does not injure him or herself
when the model has predicted a situation of non-injury. To calculate this we use the
equation:

$$NPV = \frac{E \cdot PNL}{(E \cdot PNL) + (FN \cdot PL)}$$

                        POSITIVE TEST (T+)     NEGATIVE TEST (T-)
INJURY PRESENT (I+)     TRUE POSITIVE (TP)     FALSE NEGATIVE (FN)
INJURY ABSENT (I-)      FALSE POSITIVE (FP)    TRUE NEGATIVE (TN)

Table 3.9: Negative predictive values.

$$NPV = P[I- \text{ if } T-]$$

$$NPV = \frac{TN}{TN + FN}$$

where E = Specificity; PNL = Probability of noninjury; FN = False Negatives; PL = Probability of
injury.

It is always necessary to find the false negatives and false positives beforehand, as well as
the probability of injury or non-injury for each athlete, before determining the positive and
negative predictive values.
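
The four measures can be computed directly from the cells of the 2 x 2 table, and the predictive values can be cross-checked against the formulas based on the probability of injury. The sketch below is only illustrative: the function names and the counts (TP = 40, FN = 10, FP = 20, TN = 130) are hypothetical, and the false positive and false negative terms are read as proportions (1 - specificity and 1 - sensitivity), which is an assumption.

```python
def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from the cells of the 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def predictive_values(sensitivity: float, specificity: float, probability_of_injury: float):
    """PPV and NPV from sensitivity, specificity and the probability of injury."""
    pl = probability_of_injury
    pnl = 1 - pl                   # probability of non-injury
    fp_rate = 1 - specificity      # proportion of false positives
    fn_rate = 1 - sensitivity      # proportion of false negatives
    ppv = (sensitivity * pl) / (sensitivity * pl + fp_rate * pnl)
    npv = (specificity * pnl) / (specificity * pnl + fn_rate * pl)
    return ppv, npv


# Hypothetical counts for 200 athletes.
metrics = diagnostic_metrics(tp=40, fn=10, fp=20, tn=130)
print(metrics)
print(predictive_values(metrics["sensitivity"], metrics["specificity"], 50 / 200))
```

Both routes give the same predictive values when the probability of injury is taken as the observed proportion of injured athletes.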

In order to perform this type of calculation, the probability that an individual
exhibits the characteristic in question (suffering an injury) is expressed as a function of the
predictive variable or variables; if we call this probability P, the model is expressed as
follows:

𝑃 = 𝛽0 + 𝛽1 𝑋

where β0 and β1 are the model parameters and X is the predictive variable. The probability (P) is
equal to a constant β0 plus the product of the other constant β1 multiplied by the value of the
predictive variable X. The coefficient β0 is an independent or constant term and it is the value
of the outcome variable’s average when X = 0. The coefficient β1 is the regression coefficient
and it is interpreted as the change in the outcome variable’s average per unit of increase of the
predictive variable. The change will be an increase if the regression coefficient value is
positive and it will be a decrease if the value is negative.
It is possible that once the model parameters are calculated, the substitution of some
values of the predictive variable gives way to values that aren’t allowed for a probability.
This is why one should perform a probability transformation for the probability of showing
the characteristics in question.

This logit transformation, which consists of the logarithm of the odds P/(1 − P) that a
characteristic will present itself, is modelled by the following formula:

$$\log\left[\frac{P}{1-P}\right] = \beta_0 + \beta_1 X$$

The quantity $\log\left[\frac{P}{1-P}\right]$ is called logit(P).
In the logistic regression model, the coefficient is the logarithm of the odds ratio
between two individuals that are differentiated in a unit in terms of the predictive variable.

Likewise, by raising e to β1, we obtain the OR value between those two
individuals:

$$\log(OR) = \beta_1$$

or:

$$OR = e^{\beta_1}$$
where e is the number that serves as the base of the Napierian (natural) logarithm,
approximately 2.72. In the case where β1 = 0, it is implied
that logit(P) = β0 + (0)X = β0; in other words, it does not change with X. Equally,
O.R. = e^0 = 1, which indicates that the two variables are independent and there is no relation
between them. The estimate of β1 is called the logistic regression coefficient.
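
The relation between the logit, the probability and the odds ratio can be verified with a short sketch. The parameter values β0 = -3.0 and β1 = 0.8 are hypothetical, as are the function names.

```python
import math


def logit(p: float) -> float:
    """Logarithm of the odds p / (1 - p)."""
    return math.log(p / (1 - p))


def inverse_logit(value: float) -> float:
    """Probability whose logit is the given value; always between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-value))


beta0, beta1 = -3.0, 0.8  # hypothetical model parameters


def probability(x: float) -> float:
    return inverse_logit(beta0 + beta1 * x)


def odds(p: float) -> float:
    return p / (1 - p)


# Two individuals who differ by one unit in the predictive variable.
x = 2.0
print(odds(probability(x + 1)) / odds(probability(x)), math.exp(beta1))

# The linear expression itself can leave the range allowed for a probability.
print(beta0 + beta1 * 10, probability(10))
```

The ratio of the odds for the two individuals equals e raised to β1, whatever the starting value of X, and the transformed value remains a valid probability even when β0 + β1 X exceeds 1.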

If we have several predictive variables and we try to study the relation between the
outcome variable and the whole set of predictive variables simultaneously, a multiple logistic
regression model will be used.
$$\log\left[\frac{P}{1-P}\right] = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p$$

where P is also the probability of presenting the characteristic in question.
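
As a purely illustrative sketch, the multiple model can be evaluated for one athlete. The intercept and the three coefficients below are copied from Table 3.4, but only a subset of the terms is used, the 0/1 coding of "runner" and "injury" is an assumption, and the athlete is hypothetical, so the resulting number should not be read as a real prediction.

```python
import math


def predicted_probability(intercept: float, coefficients: dict, values: dict) -> float:
    """P = 1 / (1 + exp(-(b0 + b1*X1 + ... + bp*Xp)))."""
    linear = intercept + sum(coefficients[name] * values[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))


# Intercept and three of the coefficients listed in Table 3.4.
intercept = -1.2168
coefficients = {"age": -0.0465, "runner": 0.8072, "injury": 1.4352}

# Hypothetical athlete: 25 years old, runner, no previous injury (assumed 0/1 coding).
athlete = {"age": 25, "runner": 1, "injury": 0}
print(predicted_probability(intercept, coefficients, athlete))

# The same athlete with a previous injury.
print(predicted_probability(intercept, coefficients, {**athlete, "injury": 1}))
```

Because the coefficient of the injury term is positive, the second probability is higher than the first.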

♦♦♦

CHAPTER-4

APPLICATION OF MATHEMATICS IN SPORTS

Mathematics is indeed a fascinating subject. We Indians have a long history of
being regarded as a country with great mathematicians. Math is introduced to
children at a very young age: from Vedic maths to the abacus, children are enrolled in these
classes early to encourage faster learning. Math is a part of everyday life. From
calculating the profits of a company to the proportions and ratios for measuring each ingredient
while cooking, it is used in almost all aspects of life. When people think or hear about
math, the first thing that comes to their mind is that mathematics can only be applied in
sciences and engineering, but mathematics also has a very big role in sports.
Athletes and their coaches constantly try to find different ways of improving at their sport
and turn towards mathematics for help.

It is not always realised that mathematics has a crucial role in most sports.
From discussing a player’s statistics, to coaches formulating and drafting certain players, to
judges scoring a particular athlete, mathematics is dynamic in nature. For that matter, even the
possibility of a team or an athlete winning is a mere case of probability. From something as
simple as using a matrix to applying formulas which help in determining a player’s or team’s
statistics, mathematics is a part of the system.

Application of mathematics in sports -

Basketball -

At first glance, basketball and math seemingly have little in common. However, a
closer look at the sport reveals that there is a considerable amount of math in basketball.

Geometry in basketball

Whether they realize it or not, basketball players make use of many geometric
concepts while playing a game. The most basic of these ideas is in the dimensions of the
basketball court. The diameter of the hoop (18 in), the diameter of the ball (9.4 in), the width
of the court (50 ft.) and the length from the three point line to the hoop (19 ft.) are all
standard measures that must be adhered to in any basketball court.

The path the basketball will take once it’s shot comes down to the angle at which it
is shot, the force applied and the height of the player’s arms. When shooting from behind the
free throw line, a smaller angle is necessary to get the ball through the hoop. However, when
making a field throw, a larger angle is called for. Understanding arcs will help determine how
best to shoot the ball.

Basketball players understand that throwing the ball right at the basket will not help
it go into the hoop. On the other hand, shooting the ball in an arc will increase its chances of
falling through the hoop. Getting the arc right is important to ensure that the ball does not fall
in the wrong place. The best height to dribble can also be determined mathematically.
Understanding geometry is also important for good defense. This will help predict the
player’s moves, and also determine how to face the player. Mathematics can also be used to
decide how to stand while going on defense. The more you bend your knees, the quicker you
can move. Utilizing geometry, math in basketball plays a crucial role in the actual playing of
the sport.

Statistics in basketball

fig 4.1

Statistics is essential for analysing a game of basketball. For players, statistics can
be used to determine individual strengths and weaknesses. For spectators, statistics is used to
determine the value of players and analyse the performance of an individual or the entire
team. Percentages are a common way of comparing players’ performances. They are used to get
values like the rebound rate, which is the percentage of missed shots a player rebounds while
on the court. Statistics is also used to rank a player based on the number of shots, steals and
assists made during a game. Averages are used to get values like the points per game average,
and ratios are used to get values like the turnover to assist ratio.

Baseball –

fig 4.2

Baseball is a game that lends itself very well to all kinds of statistics and math
calculations. Here are a few important baseball numbers and how they are calculated.

Hitters Batting Average (BA)

This is perhaps the most commonly calculated statistic. You might hear more about
home runs or strike-outs, but those are really just totals. You will hear and see the hitter’s
Batting Average for every player that comes up to the plate. Basically, the Average is a
percentage of how many times a batter hits the ball safely (reaching base – not the number of
non-injury plays) divided by the number of times he comes to the plate. Sounds easy right?
Just divide the total safe hits by the number of plate appearances and you have the Batting
Average?

$$\text{Batting Average} = \frac{\text{Hits}}{\text{At-bats}}$$

Well, yes, but not quite. That is basically true, but there are a couple of things that
change those two basic numbers. The number of plate appearances isn’t exactly the number
used, and you have to be sure you understand what counts as a safe hit. The number used as
the number of plate appearances to be used in the Batting Average calculation is called an At-
bat. Take a walk, for example, (no don’t leave – just use the ‘walk’ as an example), if the
batter walks, it does not count as an At-bat. Also, if a batter gets hit by a pitch, it is not
counted as an At-bat. If the batter hits the ball and gets out, but advances a baserunner, it is
called a Sacrifice and is not counted as an At-bat. Other instances not counted in the At-bat
total are when a batter gets to go to 1st base because of an obstruction or interference call, if
the batter is still batting and a baserunner is called out for some reason to end the inning, or if
he gets replaced during his plate appearance (there are specific rules for 2 strikes, but that
is too much detail for us…). An At-bat is counted whenever the batter hits the ball and reaches
safely, or is safe due to an error on the play, or when the batter is called out for any reason
after the ball is put into play, or on a fielder’s choice (where the batter is safe but another
baserunner is called out, like a force-out at second). Out of all of those possible outcomes,
only actual hits (singles, doubles, triples, home runs) are counted in the Hits total – even if
the batter makes it safely on base due to some other reason. So, now that you know all of that,
the calculation is easy…

Batting Average = Hits/At-bats, and is rounded to the third decimal place. The
number is said as if multiplied by 1000, so a hitter that had 30 hits out of 100 At-bats
(30/100=.300) is said to be “batting three-hundred”, which is quite good at the Major League
level. The closer you get to .300 and above, the more likely you are a well known star. Hitters
near .200 are not doing so well. The highest BA ever through an entire season was .406 (Ted
Williams of the Red Sox in 1941) and the highest career average is .366 (Ty Cobb from
1905-1928). Among Active players, Joe Mauer has the highest career BA with .323.
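
A minimal sketch of the calculation, using the 30 hits in 100 At-bats mentioned above; the function name is hypothetical.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits divided by At-bats, rounded to the third decimal place."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return round(hits / at_bats, 3)


# 30 hits out of 100 At-bats: "batting three-hundred".
average = batting_average(30, 100)
print(f"{average:.3f}", int(round(average * 1000)))
```

Walks, hit-by-pitch and sacrifices are excluded before calling the function, since they do not count as At-bats.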

On-Base Percentage (OBP)

On-base percentage is similar to the Batting Average, but includes more. This
number gives a better idea of how often a hitter reaches base – which is a useful statistic for
deciding who the lead-off batter should be, since you want them on base as much as possible.
The OBP includes not only hits (H), but walks (or Base on Balls, BB) and the number of times the
batter is hit by a pitch (HBP). The sum of these is divided by the total At-bats (AB) plus BB
plus HBP plus Sacrifice Flies (SF).

On-base percentage is calculated using this formula:

$$\text{On-Base Percentage} = \frac{\text{Hits} + BB + HBP}{AB + BB + HBP + SF}$$

The career leader in OBP is Ted Williams with an OBP of .4817 and the highest
single season OBP was Barry Bonds with .6094 in 2004. Any hitter with OBP of .350 or
more is doing pretty well.
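
The formula can be sketched as follows; the function name and the season totals in the example are hypothetical.

```python
def on_base_percentage(hits: int, walks: int, hit_by_pitch: int,
                       at_bats: int, sacrifice_flies: int) -> float:
    """(H + BB + HBP) / (AB + BB + HBP + SF)."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sacrifice_flies
    return times_on_base / opportunities


# Hypothetical season: 150 hits, 60 walks, 5 HBP, 500 At-bats, 5 sacrifice flies.
print(round(on_base_percentage(150, 60, 5, 500, 5), 3))
```

Unlike the Batting Average, walks and hit-by-pitch appear in both the numerator and the denominator.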

Slugging Percentage (SLG)

The Slugging Percentage of a hitter tells you how many bases the hitter generates
per At-bat. It is simply a total of the bases gained by way of the hits, divided by the number
of At-bats (see above to know what counts as an At-bat). The total bases is pretty
straightforward. A single is 1 base, a double is 2, a triple is 3, and a home run is 4. Multiply
each of these by the number a hitter has of each, add them all together, and divide by the AB.

$$\text{Slugging Percentage} = \frac{\text{Singles} + (2 \cdot \text{Doubles}) + (3 \cdot \text{Triples}) + (4 \cdot \text{Home Runs})}{AB}$$

This measurement is useful for power hitters since they may not have as high of a
batting average, but the hits they get are usually for extra bases. This number is a good way to
compare their production, even if their BA is lower than desired.

The theoretical maximum is 4.000, if a hitter hits a home run every At-Bat – which
may happen the first time they ever come to the plate as a Major Leaguer, but it doesn’t take
long to drop from there. The highest for a season is .8634 by Barry Bonds in 2001. The
highest over a career (at least more than a single AB, that is) is Babe Ruth with .6897. A
really good power hitter would have a SLG of .500 or more.
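
A short sketch of the total-bases calculation; the function name and the season totals are hypothetical.

```python
def slugging_percentage(singles: int, doubles: int, triples: int,
                        home_runs: int, at_bats: int) -> float:
    """Total bases (1, 2, 3 and 4 per type of hit) divided by At-bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# Hypothetical season: 100 singles, 30 doubles, 5 triples, 25 home runs in 500 At-bats.
print(slugging_percentage(100, 30, 5, 25, 500))

# A home run in every At-bat gives the theoretical maximum.
print(slugging_percentage(0, 0, 0, 10, 10))
```

Adding this value to the On-Base Percentage gives the On Base Plus Slugging figure (OPS = OBP + SLG).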

On Base Plus Slugging (OPS)

With use of the above numbers, baseball people determined that the OBP and the
SLG combined to give a pretty good idea of the hitter’s overall production in a way that
neither of the values did individually. Eventually, the simplest combination was to simply add
the two values together, and this became known as On Base Plus Slugging, or OPS.

𝑂𝑛 − 𝐵𝑎𝑠𝑒 𝑃𝑙𝑢𝑠 𝑆𝑙𝑢𝑔𝑔𝑖𝑛𝑔 = 𝑂𝐵𝑃 + 𝑆𝐿𝐺

Pitchers Earned Run Average (ERA)

The most frequently referenced calculated statistic is the Earned Run Average. This
is the average number of earned runs allowed per 9 innings. Any run that is the result of a
defensive error is not included in the ERA, so the basic calculation is the number of earned
runs divided by the number of innings pitched, with that ratio then multiplied by 9.

$$\text{Earned Run Average} = \frac{\text{Earned Runs}}{\text{Innings Pitched}} \times 9$$

This shows the average runs a pitcher would give up if they pitched the entire 9
innings and there were no errors on defense. Most pitchers throw much less than 9 innings
per outing, and so this is a good method to compare them across their different inning counts.
Really good pitchers will have an ERA below 4.00. While career ERA is important, the ERA
is usually more meaningful for the season as it shows how well the pitcher is currently
keeping other teams from scoring.

The active pitcher with the lowest career ERA is Mariano Rivera with 2.215.
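
A minimal sketch of the ERA calculation. Counting innings as outs recorded (three outs per inning) is a design choice made here so that partial innings are handled exactly; the function name and the example totals are hypothetical.

```python
def earned_run_average(earned_runs: int, outs_recorded: int) -> float:
    """Earned runs per 9 innings, with innings pitched expressed as outs / 3."""
    innings_pitched = outs_recorded / 3
    return earned_runs / innings_pitched * 9


# Hypothetical pitcher: 20 earned runs over 60 innings (180 outs).
print(earned_run_average(20, 180))

# 7 earned runs over 20 innings and one out (61 outs).
print(round(earned_run_average(7, 61), 2))
```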

Walks and Hits per Innings Pitched (WHIP)

This is a straight-forward calculation that helps us understand how well a pitcher is
keeping runners off of the bases. All hits (singles, doubles, triples, home-runs) count as 1, so
it is that total plus the number of walks issued, divided by the number of innings the pitcher
has pitched (partial innings are measured in thirds, by the number of outs recorded).

$$\text{Walks and Hits per Innings Pitched} = \frac{BB + H}{\text{Innings Pitched}}$$

Mariano Rivera is also the active leader in WHIP at 1.0026 and the top 50 active
pitchers are under 1.35. This metric has much smaller differences between pitchers, but if you
think about what this represents, it can be a big deal. Consider a difference of a third of a
runner every inning: that is a runner every 3 innings, or 3 runners per 9-inning game. Even
though the differences in the average are small, those extra runners could make a big
difference in the outcome of a game.
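
The calculation, and the effect of a small difference in WHIP over a full game, can be sketched as follows. The function name and the totals are hypothetical, and innings are again expressed as outs recorded divided by 3.

```python
def whip(walks: int, hits: int, outs_recorded: int) -> float:
    """Walks plus hits per inning pitched, with partial innings measured in thirds."""
    innings_pitched = outs_recorded / 3
    return (walks + hits) / innings_pitched


# Hypothetical pitcher: 15 walks and 50 hits over 60 innings (180 outs).
value = whip(15, 50, 180)
print(round(value, 3))

# Extra baserunners per 9-inning game for a pitcher whose WHIP is higher by one third.
print(round((value + 1 / 3) * 9 - value * 9, 3))
```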

Wins (W)

So the number of Wins is a simple total, but it gets really confusing when you try to
understand what counts as a win. Only a single pitcher is credited with a win in any given
game, but many pitchers on the winning team could contribute to the victory – so how do you
decide? To get the W, you have to be the pitcher at the time your team takes a lead that it
does not give up for the remainder of the game. So a starting pitcher may pitch beautifully,
but if his reliever gives up the lead, the starter will not
get the W even if his team retakes the lead. If the starter doesn’t complete 5 innings, he
cannot get the W no matter what, and the official scorer determines which reliever was most
effective and he is given the W.

The overall leader in Wins is Cy Young with 511. Pitchers pitch much less often
than they did when Cy Young played, so his Win total may never be surpassed. The current
active Win leaders are Andy Pettitte with 255 and Tim Hudson with 205. 300 wins in a career
is a very big deal, and 20 wins in a single season is a fantastic mark.

Fielders Fielding Percentage

We can’t ignore the defence – there aren’t a lot of things to measure but they do
have Fielding Percentage. This is the number of putouts and assists divided by their total
number of chances. A putout is making an out – like catching a fly ball or touching a base on
a force out. An assist would be throwing the ball to another who gets the putout. If the player
makes a mistake doing either of these two things, they can be assigned an error. The errors
are part of the total number of chances.

$$\text{Fielding Percentage} = \frac{\text{Putouts} + \text{Assists}}{\text{Putouts} + \text{Assists} + \text{Errors}}$$
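
A brief sketch of the calculation; the function name and the totals are hypothetical.

```python
def fielding_percentage(putouts: int, assists: int, errors: int) -> float:
    """Successful plays (putouts + assists) divided by total chances."""
    total_chances = putouts + assists + errors
    return (putouts + assists) / total_chances


# Hypothetical fielder: 200 putouts, 300 assists and 10 errors.
print(round(fielding_percentage(200, 300, 10), 3))
```

A fielder with no errors has a Fielding Percentage of exactly 1.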

♦♦♦

CONCLUSION
As has been observed, there really exists a very close relationship between sports
and maths, in which the playing of all sports has been found to apply mathematical principles
like calculus, arithmetic, geometry, percentages, etc. When researching the math behind sports,
we found that there are a multitude of formulas that lie behind the simple actions in sports
such as basketball and baseball. To be successful in these sports, one must make baskets
and hit the ball a certain way. The point of this exploration is to delve into the math behind
these sports, and see what formulas occur during a game. We chose this topic because we
love sports.

Calculus is the part of mathematics that has various applications in real life, and we have
observed numerous applications of calculus in different types of sports. It plays an important
role in the field of sports. Baseball is one type of sport in which calculus is applied.
Athletes, trainers, and coaches often use calculus to gain an advantage over their
counterparts. Calculus can also be used to calculate the projectile motion of a baseball's
trajectory and the speed of the baseball when hit, and to predict whether runners can make it
to the next base on
time, given their running speed. Sports injuries affecting the lower extremities in high impact
sports, such as athletics or basketball, can be predicted by means of logistic regression
equations. The first injury score was described by Shambaugh in 1991, using imbalance in
bilateral weight and deviation of the Q-angle of the quadriceps as dependent variables.
Salazar (2000) developed a mathematical equation to predict lesions based on Shambaugh's
score and constructed through logistic regression analysis, while Fernández (2004) introduced
thigh thickness as a transcendence variable in the prediction of injuries, leading to a more
precise equation. From these investigations, we observed that logistic regression analysis can
be a valid method for discriminating among anthropometric parameters related to sports
injuries, providing a simple and reliable method that could be used in the routine practice of
sports medicine.

Sport and maths are very different activities, but some aspects of the mindset
required to be successful in one can certainly help us to achieve success in the
other. Mathematics plays an essential role in sport at all levels, whether it be through human
intelligence or through the use of technology to monitor working levels. As technology and
techniques continue to evolve, the data available and performance analysis can only improve
further. Mathematics is everywhere, from our daily lives to sports. When we sit down to watch
our favourite sports star or team, we should recognize the behind-the-scenes role that
mathematics plays in bringing these events to us and making it possible to have fair,
competitive and efficient sports events. This project gives us a brief insight into the world of
mathematics and how it influences the world of sports.

BIBLIOGRAPHY

1) https://digitash.com/engineering/mathematics/how-to-apply-calculus-in-sports/

2) http://jwilson.coe.uga.edu/EMAT6680/Huffman/Mathematics%20in%20Sports/MathematicsSports.html

3) Prediction of Sports Injuries by Mathematical Model, University of Granada, Department of Physical Education and Sports, Spain

4) https://hillarydoshi.blogspot.com/2021/03/application-of-mathematics-in-sports.html?m=1

5) http://www.makemathagame.com/everyday_math/baseball-math/
