Introduction
creating vectors, manipulating data and basic statistics. The analysis consisted of solving
data. The summary of the analysis and main results is given in this report.
Key Findings
Mathematical Operations
multiplication, exponentiation and logical operations. It is worth noting that the evaluation of
expressions such as 123 * 453 and TRUE | FALSE was simple, involving basic arithmetic or
Problem 1 Results:
cat("123 * 453 =", result1, "\n")
123 * 453 = 55719
cat("5^2 * 40 =", result2, "\n")
5^2 * 40 = 1000
cat("TRUE & FALSE =", result3, "\n")
TRUE & FALSE = FALSE
cat("TRUE | FALSE =", result4, "\n")
TRUE | FALSE = TRUE
cat("75 %% 10 =", result5, "\n")
75 %% 10 = 5
cat("75 / 10 =", result6, "\n\n")
75 / 10 = 7.5
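The listing above prints result1 through result6 without showing how they were assigned. A minimal sketch of assignments that would reproduce the printed values, assuming each variable simply holds the expression named in its label (the assignments themselves are not in the report):

```r
# Assumed definitions; the report shows only the cat() calls and their output.
result1 <- 123 * 453      # multiplication
result2 <- 5^2 * 40       # exponentiation is evaluated before multiplication
result3 <- TRUE & FALSE   # logical AND
result4 <- TRUE | FALSE   # logical OR
result5 <- 75 %% 10       # modulo: remainder after division
result6 <- 75 / 10        # ordinary division

cat("123 * 453 =", result1, "\n")
cat("75 %% 10 =", result5, "\n")
```

Note that %% and / give different answers for the same operands: the first returns the remainder, the second the quotient.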
Vector construction was an integral part of the analysis. Functions such as c(), seq() and rep()
were used to build vectors that followed particular patterns and sequences. For example, the seq()
function made it easy to create vectors containing even numbers within specified bounds.
The code snippet below shows how vectors can be manipulated in R, and the comments
# Problem 14
> # Adding 20 to each element of the 'second_vector'.
> second_vector + 20
[1] 30 32 34 36 38 40 42 44 46 48 50
>
> # Multiplying each element of the 'second_vector' by 20.
> second_vector * 20
[1] 200 240 280 320 360 400 440 480 520 560 600
>
> # Creating a logical vector indicating whether each element of
> # 'second_vector' is greater than or equal to 20.
> second_vector >= 20
[1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
>
> # Creating a logical vector indicating whether each element of
> # 'second_vector' is not equal to 20.
> second_vector != 20
[1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
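The report does not show how second_vector was created. The sketch below assumes it holds the even numbers from 10 to 30, which is consistent with the outputs printed above, and also illustrates c() and rep() with made-up values:

```r
# Assumption: second_vector is the even numbers from 10 to 30.
second_vector <- seq(from = 10, to = 30, by = 2)

# c() combines individual values; rep() repeats a value or a pattern.
combined_vector <- c(1, 5, 9)                # hypothetical values
repeated_vector <- rep(c(0, 1), times = 3)   # 0 1 0 1 0 1

# Arithmetic and comparisons are applied element by element.
shifted <- second_vector + 20
is_at_least_20 <- second_vector >= 20

print(shifted)
print(sum(is_at_least_20))   # TRUE counts as 1, so this is the number of matches: 6
```

Because TRUE is treated as 1 and FALSE as 0, sum() on a logical vector is a quick way to count how many elements satisfy a condition.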
In the snippet below, brackets were used to extract elements from the first_vector, and the
# Problem 23
> vector_from_boolean_brackets <- first_vector[c(FALSE, TRUE, FALSE, TRUE)]
> cat("Problem 23 Result:\n")
Problem 23 Result:
> print(vector_from_boolean_brackets)
[1] 12 5
> # Comment: Elements at positions where the corresponding logical values
> # are TRUE are extracted from first_vector.
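Logical indexing as used in Problem 23 can be sketched as follows. The contents of first_vector are not shown in the report, so the values here are hypothetical, chosen only so that positions 2 and 4 hold 12 and 5 as in the printed result:

```r
# Hypothetical contents; only positions 2 and 4 are known from the output.
first_vector <- c(3, 12, 8, 5)

# A logical index keeps the elements whose position lines up with TRUE.
picked <- first_vector[c(FALSE, TRUE, FALSE, TRUE)]
print(picked)

# If the logical index is shorter than the vector, R recycles it,
# so c(FALSE, TRUE) selects every second element.
every_second <- first_vector[c(FALSE, TRUE)]
print(every_second)

# which() converts a logical index into the matching positions.
print(which(c(FALSE, TRUE, FALSE, TRUE)))
```

Recycling is convenient for regular patterns, but it also means a logical index of the wrong length may not raise an error, so it is worth checking lengths when the index is built by hand.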
In Problems 24, 25 and 26, we examined the given pieces of code and wrote a one-sentence
comment explaining what each one does, as shown in the code snippets below:
> # Problem 24
> cat(second_vector >= 20)
FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
> # Prints a logical vector indicating whether each element of second_vector is
> # greater than or equal to 20.
# Problem 25
> cat(ages_vector <- seq(from = 10, to = 30, by = 2))
10 12 14 16 18 20 22 24 26 28 30
> cat("ages_vector is a sequence of numbers from 10 to 30 with a step of 2.\n\n")
ages_vector is a sequence of numbers from 10 to 30 with a step of 2.
> # Problem 26
> subset_ages_vector <- ages_vector[ages_vector >= 20]
> cat("A subset of ages_vector containing only elements greater than or equal to
+ 20 is created.\n")
A subset of ages_vector containing only elements greater than or equal to
20 is created.
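The subsetting in Problem 26 happens in two steps that the one-line form hides. A minimal sketch, reusing the ages_vector definition from Problem 25; the variable keep and the bounds 14 and 24 are illustrative additions, not part of the original exercise:

```r
ages_vector <- seq(from = 10, to = 30, by = 2)

# Step 1: the comparison yields one TRUE/FALSE per element.
keep <- ages_vector >= 20

# Step 2: indexing with that logical vector keeps only the TRUE positions.
subset_ages_vector <- ages_vector[keep]
print(subset_ages_vector)

# Conditions can be combined with & (and) and | (or).
# The bounds 14 and 24 are arbitrary, chosen only for illustration.
middle_ages <- ages_vector[ages_vector > 14 & ages_vector < 24]
print(middle_ages)
```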
In Problem 30, we wrote code and explained in a comment what we thought the code was doing,
as shown below:
> # Problem 30
> set.seed(5)
> random_vector <- runif(n = 10, min = 0, max = 1000)
> cat("A vector of 10 random numbers between 0 and 1000 is generated with a fixed
+ seed of 5.\n")
A vector of 10 random numbers between 0 and 1000 is generated with a fixed
seed of 5.
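What the fixed seed buys can be checked directly. The sketch below repeats the Problem 30 call; the names first_draw, second_draw and third_draw are introduced here for illustration only:

```r
set.seed(5)
first_draw <- runif(n = 10, min = 0, max = 1000)

# Resetting the same seed restarts the generator, so the draw repeats exactly.
set.seed(5)
second_draw <- runif(n = 10, min = 0, max = 1000)
print(identical(first_draw, second_draw))   # TRUE

# Without resetting the seed, the next draw continues the stream and differs.
third_draw <- runif(n = 10, min = 0, max = 1000)
print(identical(first_draw, third_draw))

# Every value lies inside the requested bounds.
print(range(first_draw))
```

This is why set.seed() is usually called once at the top of an analysis: anyone rerunning the script gets the same "random" numbers.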
For problem 37, we explained in a comment what we thought the code was set to do and the
# Problem 37
> set.seed(5)
> random_vector <- rnorm(n = 1000, mean = 50, sd = 15)
> cat("A vector of 1000 random numbers from a normal distribution with mean 50 and
+ standard deviation 15 is generated with a fixed seed of 5.\n\n")
A vector of 1000 random numbers from a normal distribution with mean 50
and standard deviation 15 is generated with a fixed seed of 5.
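One way to confirm that the generated values behave as described is to compare sample statistics against the parameters passed to rnorm(). This is an illustrative check, not part of the original problem; within_one_sd is a name introduced here:

```r
set.seed(5)
random_vector <- rnorm(n = 1000, mean = 50, sd = 15)

# The sample statistics should be near, but not exactly equal to,
# the parameters passed to rnorm().
print(mean(random_vector))
print(sd(random_vector))

# Roughly 68% of normally distributed values fall within one
# standard deviation of the mean.
within_one_sd <- mean(abs(random_vector - 50) <= 15)
print(within_one_sd)
```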
Summary Statistics
Descriptive statistics including the sum, mean, median, maximum and minimum were computed for vectors.
The analysis demonstrated the use of R functions to gain useful information from the
data.
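The functions named above can be sketched on a small vector. The values in scores are hypothetical and serve only to show the calls:

```r
# Hypothetical data, used only to show the calls.
scores <- c(12, 7, 3, 20, 8)

print(sum(scores))      # 50
print(mean(scores))     # 10
print(median(scores))   # 8
print(max(scores))      # 20
print(min(scores))      # 3

# summary() reports several of these at once, along with the quartiles.
print(summary(scores))
```

The mean (10) sits above the median (8) here because the single large value 20 pulls the average up, which is a useful thing to look for when comparing the two.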
For problem 38, we used the hist function and provided it with random_vector. The
The hist function is used to create a histogram of the values in random_vector. The
histogram visually represents the distribution of the random numbers generated with mean 50
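A minimal sketch of the hist() call described for Problem 38, assuming random_vector is the rnorm() vector from Problem 37. The title and axis label are illustrative choices, and bins is a name introduced here:

```r
set.seed(5)
random_vector <- rnorm(n = 1000, mean = 50, sd = 15)

# plot = FALSE returns the binning without drawing, which is handy
# for inspecting the numbers behind the picture.
bins <- hist(random_vector, plot = FALSE)
print(bins$breaks)   # bin edges
print(bins$counts)   # number of values in each bin

# Drawing the histogram itself.
hist(random_vector,
     main = "Distribution of random_vector",
     xlab = "Value")
```

The counts should peak in the bins around 50 and fall off on both sides, which is the bell shape expected from a normal distribution.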
In the analysis, conditional subsetting and filtering were used prominently. Data manipulation
skills were highlighted through operations such as extracting specific elements from vectors
Random number generation with the set.seed() and runif() functions demonstrated how to
produce reproducible random values for use in further analysis.
The analysis also demonstrated data exploration and visualization using the ggplot2 package. The
given data set (ds_salaries.csv) was loaded into R and examined to demonstrate proficiency
in reading data into R and extracting useful information from it.
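The loading and plotting workflow might look like the sketch below. Because the report does not show the columns of ds_salaries.csv, the example writes a small stand-in file first; the column names experience_level and salary_in_usd, the values, and the choice of a box plot are all assumptions made for illustration:

```r
# Hypothetical stand-in for ds_salaries.csv; column names and values are assumed.
example_salaries <- data.frame(
  experience_level = c("EN", "MI", "SE", "SE", "MI", "EN"),
  salary_in_usd    = c(60000, 95000, 140000, 155000, 88000, 52000)
)
csv_path <- tempfile(fileext = ".csv")
write.csv(example_salaries, csv_path, row.names = FALSE)

# Read the file and take a first look at its structure.
first_dataframe <- read.csv(csv_path)
print(head(first_dataframe))   # first rows (up to 6)
str(first_dataframe)           # column names and types

# Plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  salary_plot <- ggplot2::ggplot(
    first_dataframe,
    ggplot2::aes(x = experience_level, y = salary_in_usd)
  ) +
    ggplot2::geom_boxplot()
  print(salary_plot)
}
```

With the real file, csv_path would simply be replaced by the path to ds_salaries.csv and the aes() mapping adjusted to the actual column names.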
In Problem 42, explanations for the given chunk of code were provided, and the resulting
# Problem 42
>
> # Display the first 6 rows of the dataframe
> cat("head(first_dataframe):\n")
head(first_dataframe):
The analysis was an extensive exercise in essential R programming skills. Competence in
vector construction, modification and data visualization shows a developing ability to
analyze data with R. Moreover, the calculation capabilities for summary statistics along with
datasets that better represent real-life situations with more complex structures can make a