
Business Analytics

Report on
Faithful

Submitted by:
Rahul Gupta

18021141087
Section-B
Old Faithful is a cone geyser located in Yellowstone National Park in Wyoming,
United States. It was named in 1870 during the Washburn-Langford-Doane
Expedition and was the first geyser in the park to receive a name. It is a highly
predictable geothermal feature, and has erupted every 44 to 125 minutes since
2000. The geyser and the nearby Old Faithful Inn are part of the Old Faithful Historic
District.

Why are geysers important?


There are three reasons why it’s important to study geysers. First, geysers are a
model for how volcanoes erupt. And we care about how they erupt, what initiates the
eruption, how everything rises to the surface, how it gets transported in the
atmosphere. Volcanoes are big and dangerous, and they don’t erupt very often.
Geysers are small, and much less dangerous, and they erupt many times. And one
of the things we hope to learn from geysers is how to understand and model
eruptions more generally. We can also deploy a range of geophysical instruments at
geysers. We can use seismometers to measure ground motion, we can measure
electric and magnetic fields, we can take videos, and we can try and integrate all
these different types of measurements to understand what happens during an
eruption. And then we can try to transfer this understanding from small geysers to
big volcanoes.
The second reason that we care about geysers is that they are a window into how
the Earth transports hot water. There are features called geothermal systems which
we use for geothermal energy, and geothermal systems make materials like gold
deposits. By transporting hot fluids, you can transport all of the elements that
dissolve in water. And when we look at a geyser, we get a window into how the
Earth is transporting a mixture of steam and water.
And the third reason is that they are interesting, fascinating natural phenomena. If
we understand how the Earth transports fluids and energy, we should be able to
explain how geysers work. And the extent to which we can’t do so tells us that
there are basic things about the heat transport of the Earth that we don’t know yet.

Description

Waiting time between eruptions and the duration of the eruption for the Old Faithful
geyser in Yellowstone National Park, Wyoming, USA.

Usage

faithful

Format

A data frame with 272 observations on 2 variables.

[,1]  eruptions  numeric  Eruption time in mins
[,2]  waiting    numeric  Waiting time to next eruption (in mins)

Details

A closer look at faithful$eruptions reveals that these are heavily rounded times
originally in seconds, where multiples of 5 are more frequent than expected under
non-human measurement. For a better version of the eruption times, see the
Examples section of the R help page for this data set (?faithful).
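
The rounding described above can be checked directly. The following is a minimal sketch along the lines of the example in R's help page for faithful; the variable names e60, ne60 and better_eruptions are chosen here for illustration, and the code assumes the times were originally recorded in whole seconds.

```r
# Convert the eruption durations from minutes to seconds.
e60 <- 60 * faithful$eruptions
# Round to whole seconds (assumption: the times were recorded in whole seconds).
ne60 <- round(e60)
# Largest gap between the converted and the rounded values, in seconds.
max(abs(e60 - ne60))
# Share of durations that are a multiple of 5 seconds.
mean(ne60 %% 5 == 0)
# Eruption durations in minutes, rebuilt from whole seconds.
better_eruptions <- ne60 / 60
```

If the recorded times were free of human rounding, only about one in five durations would be expected to be a multiple of 5 seconds, so a clearly larger share supports the remark above.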

There are many versions of this dataset around: Azzalini and Bowman (1990) use a
more complete version.

Analysis

> faithful  # print the whole faithful data set



> nrow(faithful)  # number of rows in the data set
[1] 272
> ncol(faithful)  # number of columns in the data set
[1] 2
> str(faithful)   # structure: number of observations, variables and their types
'data.frame':	272 obs. of  2 variables:
 $ eruptions: num  3.6 1.8 3.33 2.28 4.53 ...
 $ waiting  : num  79 54 74 62 85 55 88 85 51 85 ...

The str function describes the structure of the data set. It shows at a glance how
many observations and how many variables the data set contains, together with the
type of each variable.

> head(faithful)  # first 6 rows of the data set
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55

> tail(faithful)  # last 6 rows of the data set
    eruptions waiting
267     4.750      75
268     4.117      81
269     2.150      46
270     4.417      90
271     1.817      46
272     4.467      74

> summary(faithful)  # summary statistics for each variable
   eruptions        waiting
 Min.   :1.600   Min.   :43.0
 1st Qu.:2.163   1st Qu.:58.0
 Median :4.000   Median :76.0
 Mean   :3.488   Mean   :70.9
 3rd Qu.:4.454   3rd Qu.:82.0
 Max.   :5.100   Max.   :96.0
> range(faithful$eruptions)
[1] 1.6 5.1
> range(faithful$waiting)
[1] 43 96

Analysis: The summary function gives a compact picture of the data set. The
eruptions variable ranges from 1.6 to 5.1 minutes and the waiting variable from
43 to 96 minutes. For both variables the mean lies clearly below the median
(3.488 against 4.000 for eruptions, 70.9 against 76.0 for waiting), so the
distributions are not symmetric.

> mode_eruptions <- names(sort(-table(faithful$eruptions)))[1]  # most frequent value
> mode_eruptions
[1] "1.867"

This means that 1.867 minutes is the eruption duration that occurs most often in
the data set.

> mode_waiting <- names(sort(-table(faithful$waiting)))[1]  # most frequent value
> mode_waiting
[1] "78"

This tells us that the most common waiting time in the faithful data set is 78
minutes.
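
Taking only the first name after sorting hides ties, because a second value with the same count is silently dropped. A minimal sketch of a helper that returns every tied value follows; stat_modes is a hypothetical name used here for illustration, not a base R function.

```r
# Return every value that ties for the highest frequency in x.
# stat_modes is a hypothetical helper, not part of base R.
stat_modes <- function(x) {
  counts <- table(x)
  # Keep the names of all entries whose count equals the maximum count.
  as.numeric(names(counts)[as.vector(counts) == max(counts)])
}

stat_modes(faithful$eruptions)
stat_modes(faithful$waiting)
```

If either call returns more than one value, the variable has several equally frequent values and a single mode would be misleading.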

> cor(faithful$eruptions, faithful$waiting)
[1] 0.9008112

Analysis: cor gives the correlation coefficient between two variables. It measures
the strength and direction of their linear association and ranges from -1 to 1:
values close to -1 or 1 indicate a strong linear relationship, and values near 0 a
weak one. In this data set the correlation of about 0.90 shows a strong positive
linear relationship between eruption duration and waiting time.

> sd(faithful$eruptions)
[1] 1.141371
> sd(faithful$waiting)
[1] 13.59497

Analysis: sd is the standard deviation. It measures how far the values typically
lie from the mean. Relative to the means (3.488 and 70.9 minutes), the standard
deviations are about 33% for eruptions and 19% for waiting, so the values are
fairly spread out around their means rather than tightly clustered.
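
Whether a standard deviation is large or small is easier to judge relative to the mean. The sketch below computes the coefficient of variation (standard deviation divided by mean) for both variables; cv is a hypothetical helper name introduced for this illustration.

```r
# Coefficient of variation: standard deviation as a fraction of the mean.
# cv is a hypothetical helper name, not a base R function.
cv <- function(x) sd(x) / mean(x)

cv_eruptions <- cv(faithful$eruptions)
cv_waiting <- cv(faithful$waiting)
# Show both ratios side by side, rounded to three decimals.
round(c(eruptions = cv_eruptions, waiting = cv_waiting), 3)
```
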
> cov(faithful$eruptions, faithful$waiting)
[1] 13.97781

Analysis: cov is the covariance, a measure of how much two variables vary
together. Here the covariance is positive, which means that smaller values of one
variable mainly correspond to smaller values of the other, and larger values to
larger values. The magnitude depends on the units of both variables, so it cannot
by itself be judged small or large. The sign of the covariance shows the direction
of the linear relationship between the variables.
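
Covariance and correlation are linked by cor(x, y) = cov(x, y) / (sd(x) * sd(y)), which is why the correlation, unlike the covariance, does not depend on units. A minimal sketch that checks this identity on the data follows; the names x, y and r_from_cov are chosen here for illustration.

```r
x <- faithful$eruptions
y <- faithful$waiting
# Correlation is the covariance rescaled by both standard deviations.
r_from_cov <- cov(x, y) / (sd(x) * sd(y))
# Compare with the value returned by cor(), allowing for floating-point error.
all.equal(r_from_cov, cor(x, y))
```
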
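> library(ggplot2)  # load the ggplot2 package
> ggplot(faithful, aes(eruptions)) +
+   geom_histogram(binwidth = 1) +
+   xlab("eruptions") +
+   ylab("count")

Analysis: The histogram shows the distribution of eruption durations only, so it
cannot show a relationship with waiting time. It does show that eruptions fall
into two groups, short ones of around 2 minutes and long ones of around 4 to 4.5
minutes, with few eruptions of around 3 minutes in between.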
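
A bin width of 1 minute gives only a handful of bars. As a sketch, the counts can also be tabulated with narrower bins using base R's hist function; the quarter-minute width and the names bin_edges, h and bin_counts are assumptions made for this illustration.

```r
# Quarter-minute bins covering the observed range of 1.6 to 5.1 minutes.
bin_edges <- seq(1.5, 5.5, by = 0.25)
# plot = FALSE returns the bin counts without drawing anything.
h <- hist(faithful$eruptions, breaks = bin_edges, plot = FALSE)
# One row per bin: lower edge, upper edge and number of eruptions.
bin_counts <- data.frame(from = head(h$breaks, -1),
                         to = tail(h$breaks, -1),
                         count = h$counts)
bin_counts
# plot(h, xlab = "eruptions (min)", main = "") would draw the same bins.
```
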

> plot(faithful$eruptions, faithful$waiting)  # scatter plot of waiting against eruptions


Analysis: The plot shows two clusters, which suggest two different types of
eruption, short and long. There is, however, a very clear pattern that runs
through both sections of the data. An imagined straight line could be added
fairly easily to the diagram, so there is a positive correlation between the x
and y co-ordinates. In other words, as the length of time between eruptions
increases, so does the duration of the eruption itself. Physical arguments
suggest this is sensible, as a longer time allows a bigger build-up of pressure.
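
The two clusters can be separated with a simple cut-off. The sketch below labels eruptions shorter than 3 minutes as "short" and the rest as "long", then compares the groups; the 3-minute threshold and the names eruption_type and mean_wait are assumptions chosen for illustration, read off the gap in the scatter plot rather than estimated from the data.

```r
# Label each eruption as short or long, using 3 minutes as an assumed cut-off.
eruption_type <- ifelse(faithful$eruptions < 3, "short", "long")
# Number of eruptions in each group.
table(eruption_type)
# Mean waiting time to the next eruption in each group.
mean_wait <- tapply(faithful$waiting, eruption_type, mean)
mean_wait
```
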
> lm1 <- lm(eruptions ~ waiting, data=faithful)
> summary(lm1)

Call:
lm(formula = eruptions ~ waiting, data = faithful)

Residuals:
     Min       1Q   Median       3Q      Max
-1.29917 -0.37689  0.03508  0.34909  1.19329

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.874016   0.160143  -11.70   <2e-16 ***
waiting      0.075628   0.002219   34.09   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4965 on 270 degrees of freedom
Multiple R-squared:  0.8115,	Adjusted R-squared:  0.8108
F-statistic:  1162 on 1 and 270 DF,  p-value: < 2.2e-16

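Analysis: The usual threshold for statistical significance is a p-value of at
most 0.05. In our regression summary the p-values for both the intercept and the
waiting coefficient are below 2*10^-16, far below that threshold, so we reject
the null hypothesis that the coefficients are zero: waiting time is a significant
predictor of eruption duration. The slope of 0.0756 means that each additional
minute of waiting is associated with an eruption about 0.076 minutes (roughly 4.5
seconds) longer, and the R-squared of 0.8115 means that the model explains about
81% of the variance in eruption duration.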
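
The fitted model can be used for prediction. As a minimal sketch, the code below predicts the eruption duration after a wait of 80 minutes, which by the coefficients above is -1.874016 + 0.075628 * 80, about 4.18 minutes. The 80-minute value and the names new_wait and pred are assumptions chosen for illustration.

```r
lm1 <- lm(eruptions ~ waiting, data = faithful)
# Hypothetical visitor who has waited 80 minutes.
new_wait <- data.frame(waiting = 80)
# Point prediction plus a 95% prediction interval for a single eruption.
pred <- predict(lm1, newdata = new_wait, interval = "prediction", level = 0.95)
pred
# 95% confidence intervals for the intercept and the slope.
confint(lm1)
```

The prediction interval is wider than a confidence interval for the mean would be, because it also accounts for the scatter of individual eruptions around the line.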

> plot(faithful$waiting, faithful$eruptions, pch = 19, col = "blue",
+      xlab = "Waiting", ylab = "eruptions")
> lines(faithful$waiting, fitted(lm1), lwd = 3)  # add the fitted regression line

Summary

Our analysis shows a strong positive correlation (0.9008) between waiting time
and eruption time: longer eruptions go together with longer waits between
eruptions. The scatter plot gives a visual representation of this. If you are
planning on visiting Old Faithful, on average you are going to wait around 70.9
minutes to see it erupt for about 3.49 minutes.

The scatter plot also shows that an eruption of around 3.0 minutes is rare. This
is because there are two clusters of data: the first is eruptions over 3.5
minutes and the second is eruptions under 2.5 minutes. Very few eruptions last
between 2.5 and 3.5 minutes, so you are most likely to see an eruption over 3.5
minutes or under 2.5 minutes, and the 3.49-minute average is not a typical
eruption. The time you wait is closely related to how long Old Faithful erupts:
the correlation between wait time and eruption time is 0.90, which is very close
to one. Thus, on average, a short eruption goes with a shorter wait than a long
eruption does. In this data set the wait time varies from 43 minutes to 96
minutes.
