
Mini Project

Collect daily temperature data for a specific location. Calculate the mean,
median, and mode of the temperatures. Determine the probability of
having a temperature above or below a certain threshold.

Subject: Statistical Methods Using R (20SMT-460)

Section: 20BCS34-A

Submitted by:
Harshpal Singh (20BCS1639)
Rahul Garg (20BCS1864)
Manpreet Singh (21BCS10740)

Submitted to:
Er. Gehna Sachdeva (E16466)

Chandigarh University
JUNE 2024

ABSTRACT

This project analyzes daily temperature data for a particular region. By collecting data over a designated timeframe, the project aims to extract meaningful insights into the region's temperature patterns.

The project will encompass the following steps:

Data Acquisition: Daily temperature data will be meticulously gathered from reliable
sources such as weather websites, Application Programming Interfaces (APIs), or local
weather stations.
Descriptive Statistical Analysis: Measures of central tendency (mean, median, and mode) will be calculated to characterize the temperature data. This will provide a clear picture of the 'typical' temperature. The quartiles reported by R's summary function will show how widely temperatures vary within the dataset.
Temperature Distribution Analysis: To understand the likelihood of experiencing
specific temperature ranges, the project will determine the probability of
temperatures exceeding or falling below a user-defined threshold.

By employing these analytical techniques, the project will provide valuable insights
into the typical temperature range of a specific region, the extent of temperature
fluctuations, and the probability of encountering temperatures above or below a
specific value. This information can be beneficial for various purposes, such as
understanding the local climate, planning outdoor activities, or informing agricultural
decisions.

Aim: Collect daily temperature data for a specific location. Calculate the mean,
median, and mode of the temperatures. Determine the probability of having a
temperature above or below a certain threshold.

Description:
The project aims to collect daily temperature data for a specific location and compute statistical measures such as the mean, median, and mode. These measures describe the central tendency of the temperature data. A line chart is then created to visualize temperature trends over time.

Algorithms:

[1]. Data Import and Overview:

The project starts by importing a CSV dataset of daily temperature records with the read_csv function from the readr package (part of the tidyverse). The dataset includes columns such as 'Month', 'Day', 'Year', and 'AvgTemperature', and a 'Date' column is built from the first three. The summary function is applied to gain an initial understanding of the dataset. It reports key statistics such as the minimum, quartiles, median, mean, and maximum.
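The import-and-overview step can be tried without the real file. The sketch below uses three hypothetical rows in the same column layout as the CODE section. It calls base R's read.csv() with its `text` argument, so no extra packages are needed. The row values and the `csv_text` name are illustrative assumptions.

```r
# Hypothetical sample: three days of readings in the column layout used by
# the CODE section (Month, Day, Year, AvgTemperature); values are made up
csv_text <- "Month,Day,Year,AvgTemperature
1,1,2020,20.5
1,2,2020,22.0
1,3,2020,19.5"

# read.csv() can parse literal CSV text passed through its `text` argument
temperature_data <- read.csv(text = csv_text)

# summary() reports min, quartiles, median, mean, and max for each column
summary(temperature_data)
```

With the real dataset, only the read call changes: pass the file path instead of `text`.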

[2]. Data Processing:

The dataset is divided into subsets based on criteria such as date range or location (for example, a single city). Each analysis can then focus on the period and place of interest.
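A minimal sketch of subsetting by location and by date follows. The `City` column, the city names "CityA" and "CityB", and all values are hypothetical. The sketch also assumes that missing readings are stored as the sentinel value -99; check how missing values are actually coded in the file before relying on that filter.

```r
# Hypothetical records for two cities; the City column is an assumption
temperature_data <- data.frame(
  City = c("CityA", "CityA", "CityA", "CityB", "CityB"),
  Month = c(1, 1, 1, 1, 1),
  Day = c(1, 2, 3, 1, 2),
  Year = c(2020, 2020, 2020, 2020, 2020),
  AvgTemperature = c(20.5, -99, 19.5, 30.1, 31.4)
)

# Assumption: -99 marks a missing reading, so drop those rows
clean <- temperature_data[temperature_data$AvgTemperature != -99, ]

# Subset by location: keep only one city's rows
city_a <- subset(clean, City == "CityA")

# Subset by date: build a Date column, then keep days from 2 Jan 2020 on
city_a$Date <- as.Date(paste(city_a$Month, city_a$Day, city_a$Year),
                       format = "%m %d %Y")
recent <- city_a[city_a$Date >= as.Date("2020-01-02"), ]
```

The mean, median, mode, and probability calculations can then run on `city_a` or `recent` instead of the full table.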

[3]. Statistical Measures:


Statistical measures are computed to understand the characteristics of the
temperature data. This includes calculating the mean, median, and mode of
the temperatures.
Mean: The average temperature value across all recorded days.
Median: The middle temperature value, separating the higher and lower
halves of the dataset.
Mode: The temperature value(s) that appear most frequently in the dataset.
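The three measures can be checked on a small vector. The sketch below uses a hypothetical week of temperatures. Because the text allows for more than one mode, the mode step keeps every value tied for the highest count rather than only the first.

```r
# Hypothetical week of daily temperatures
temps <- c(21, 23, 23, 25, 26, 23, 28)

mean_temp <- mean(temps)      # sum of the values divided by their count
median_temp <- median(temps)  # middle of the sorted values

# Mode: tabulate each distinct value and keep all values with the top count
counts <- table(temps)
mode_temp <- as.numeric(names(counts)[counts == max(counts)])
```

Here 23 occurs three times, so it is the only mode. It also equals the median.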

[4]. Probability Calculation:


The probability of experiencing temperatures above or below a specified threshold is estimated as a relative frequency. For instance, with the threshold set at 25 (the value used in the code), the probability of exceeding it is the fraction of recorded days with a temperature above 25. The probability of not exceeding it is the fraction of days at or below 25.
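The relative-frequency idea fits in one line of R: the mean of a logical vector is the proportion of TRUE values. The sketch below reuses a hypothetical week of temperatures and the threshold 25 from the code. It counts days exactly at the threshold separately, so the three shares can be compared.

```r
temps <- c(21, 23, 23, 25, 26, 23, 28)  # hypothetical daily temperatures
threshold <- 25                          # same threshold as the CODE section

# Proportion of days strictly above, strictly below, and exactly at threshold
p_above <- mean(temps > threshold)
p_below <- mean(temps < threshold)
p_equal <- mean(temps == threshold)
```

The three proportions sum to 1. Computing "below" as `1 - p_above` therefore gives the share of days at or below the threshold, which includes the day exactly at 25.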

[5]. Graphical Representation:


The ggplot2 library is employed to create visualizations, such as line charts,
showcasing temperature trends over time. These visual representations
facilitate the interpretation of temperature patterns, making it easier to
identify trends, fluctuations, and potential outliers.
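The sketch below uses the same ggplot2 calls as the CODE section, applied to hypothetical data, with one assumed addition. Colour is mapped to a hypothetical `City` column, so a file holding several locations gets one line per city. Without this, all locations would be joined into a single zigzag series.

```r
library(ggplot2)

# Hypothetical data: two cities over three consecutive days
temps <- data.frame(
  Date = rep(as.Date("2020-01-01") + 0:2, times = 2),
  City = rep(c("CityA", "CityB"), each = 3),
  AvgTemperature = c(20.5, 22.0, 19.5, 30.1, 31.4, 29.8)
)

# One line per city, plus a dashed reference line at the threshold
p <- ggplot(temps, aes(x = Date, y = AvgTemperature, colour = City)) +
  geom_line() +
  geom_hline(yintercept = 25, linetype = "dashed", colour = "red") +
  labs(title = "Daily Average Temperature Trends", x = "Date",
       y = "Average temperature") +
  theme_minimal()
print(p)
```

Filtering to a single city first, as in the Data Processing step, gives one clean line instead.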
Statistical Measurements:
1. Mean Temperature (μ):

$$\mu = \frac{1}{n}\sum_{i=1}^{n} X_i$$

where $X_i$ is the temperature for day i, and n is the number of days.

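2. Median:

Sort the temperatures in ascending order, $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$.

If the number of observations (n) is odd, the median is the middle value:

$$\text{Median} = X_{\left(\frac{n+1}{2}\right)}$$

If the number of observations is even, the median is the average of the two middle values:

$$\text{Median} = \frac{X_{(n/2)} + X_{(n/2+1)}}{2}$$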

3. Mode:

The mode is the value that appears most frequently in the dataset.

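4. Probability of Temperature above a Threshold (T):

$$P(X > T) = \frac{\text{number of days with } X_i > T}{n}$$

5. Probability of Temperature below a Threshold (T):

Days exactly at the threshold are counted on this side, so the two probabilities sum to 1:

$$P(X \le T) = \frac{\text{number of days with } X_i \le T}{n} = 1 - P(X > T)$$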

These formulas help analyze the central tendency of the temperature data and assess the likelihood of temperatures exceeding or falling below a specified threshold.
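The formulas above can be translated term by term into R and cross-checked against the built-in functions. The sketch below does this with hypothetical data. The helper `median_by_formula` is an illustrative name, not part of the project code.

```r
# Hypothetical daily temperatures and threshold (illustrative values only)
x <- c(21, 23, 23, 25, 26, 23, 28)
threshold <- 25
n <- length(x)

# Mean: (1/n) * sum of X_i
mu <- sum(x) / n

# Median from the order statistics X_(1) <= ... <= X_(n)
median_by_formula <- function(values) {
  xs <- sort(values)
  m <- length(xs)
  if (m %% 2 == 1) {
    xs[(m + 1) / 2]                  # odd n: the middle value
  } else {
    (xs[m / 2] + xs[m / 2 + 1]) / 2  # even n: mean of the two middle values
  }
}
med <- median_by_formula(x)

# Threshold probabilities as counts divided by n
p_above <- sum(x > threshold) / n         # P(X > T)
p_at_or_below <- sum(x <= threshold) / n  # P(X <= T) = 1 - P(X > T)

# Cross-check the hand-written formulas against R's built-ins
stopifnot(isTRUE(all.equal(mu, mean(x))), med == median(x),
          median_by_formula(c(21, 25, 23, 28)) == 24)
```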
CODE:

# Install the package only if it is not already available
if (!requireNamespace("tidyverse", quietly = TRUE)) install.packages("tidyverse")

# Load required libraries (readr for read_csv, ggplot2 for plotting)
library(tidyverse)

# Read the CSV file (adjust the path to where the file is stored)
file_path <- "C:/Users/garg0/OneDrive/Documents/city_temperature.csv"
temperature_data <- read_csv(file_path)

# Build a 'Date' column from the separate Month, Day, and Year columns
temperature_data$Date <- as.Date(paste(temperature_data$Month,
                                       temperature_data$Day,
                                       temperature_data$Year),
                                 format = "%m %d %Y")

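# Statistical Analysis (na.rm = TRUE skips any missing readings)
mean_temperature <- mean(temperature_data$AvgTemperature, na.rm = TRUE)
median_temperature <- median(temperature_data$AvgTemperature, na.rm = TRUE)
# Mode: the most frequent value (the smallest one if several values tie)
temperature_counts <- table(temperature_data$AvgTemperature)
mode_temperature <- as.numeric(names(which.max(temperature_counts)))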

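# Probability Calculation
threshold <- 25
# Share of days above the threshold (mean of a logical vector = share of TRUE)
probability_above_threshold <- mean(temperature_data$AvgTemperature > threshold,
                                    na.rm = TRUE)
# Complement: share of days at or below the threshold
probability_below_threshold <- 1 - probability_above_threshold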

# Visualization - Line Chart
ggplot(temperature_data, aes(x = Date, y = AvgTemperature)) +
  geom_line(color = 'blue') +
  geom_hline(yintercept = threshold, linetype = "dashed", color = 'red') +
  labs(title = 'Daily Average Temperature Trends', x = 'Date',
       y = 'AvgTemperature') +
  theme_minimal()

# Summary
cat(sprintf("Mean Average Temperature: %.2f\n", mean_temperature))
cat(sprintf("Median Average Temperature: %.2f\n", median_temperature))
cat(sprintf("Mode Average Temperature: %.2f\n", mode_temperature))
cat(sprintf("Probability (Above %g): %.3f\n", threshold,
            probability_above_threshold))
cat(sprintf("Probability (At or below %g): %.3f\n", threshold,
            probability_below_threshold))
OUTPUT:

DATASET:
GRAPHICAL REPRESENTATION:

