

40 R Programming Interview Questions & Answers For All Levels
Learn 40 fundamental R programming interview questions and their answers for all levels of
seniority: entry-level, intermediate, and advanced.
Nov 2023 · 20 min read

Elena Kosourova

TOPICS

R Programming

Career Services

Being well prepared for an R programming interview is crucial to succeeding in it. This
success has two sides: for a job hunter, it means getting hired by the company, while for the
company itself, it means finding a perfect fit for the position.

To increase your chances of succeeding in an R interview, it's always helpful to know in
advance what questions you may be asked if you're a job hunter, or what questions you can
ask a candidate if you're a hiring manager or a recruiter.

This article discusses 40 fundamental R programming interview questions and their answers
for all levels of seniority, as well as some general interview questions. For convenience, all
technical questions are divided into three levels: entry-level, intermediate, and advanced.

As additional resources for your R programming interview preparation, consider the following:

Practicing Statistics Interview Questions in R

Data Science Interview Preparation

21 Top Data Scientist Interview Questions

General R Programming Interview Questions


At the beginning of an R interview, an interviewer may ask a candidate some general,
non-technical questions about their overall experience with R. For example:

How long have you been working in R?

What kind of tasks do you perform in R?

How do you estimate your level of proficiency in R?

If you're a job hunter, you should think in advance about these and similar questions and
prepare your answers. Don't worry if you haven't had any real working experience in R yet:
describing your internship in R programming or your individual or group R projects that you
completed during your studies works just fine.

Besides, if you're interviewing for an entry-level position, your interviewer doesn't necessarily
expect extensive (or even any) work experience in R from you. Remember that the company
invited you to this interview because it found your resume attractive.
Entry-Level R Programming Interview Questions
Let's start with some of the basic technical R interview questions that you might face from
your potential employer. These require you to have mastered the basics and have some
practical experience using R.

1. What is R, and what are its main characteristics?


R is a programming language and environment widely used for solving data science
problems and particularly designed for statistical computing and data visualization. Its main
characteristics include:

Open source

Interpreted (i.e., code is executed directly by the interpreter, without a separate compilation
step)

Highly extensible due to its large collection of data science packages

Functional and flexible (it supports both functional and object-oriented programming; users
can define their own functions, as well as tune various parameters of existing functions)

Compatible with many operating systems

Can be easily integrated with other programming languages and frameworks

Allows powerful statistical computing

Offers a variety of data visualization tools for creating publication-quality charts

Equipped with a command-line interface

Supported by a strong online community

2. What are some disadvantages of using R?


Non-intuitive syntax and hence a steep learning curve, especially for beginners in
programming

Relatively slow

Inefficient memory usage

Inconsistent and often hard-to-read documentation of packages

Some packages are of low quality or poorly maintained

Potential security concerns due to its open-source nature

3. List and define some basic data types in R.


1. Numeric: decimal (real) numbers, e.g., 3.14.

2. Integer: whole numbers, written with the L suffix, e.g., 42L.

3. Character: a letter, number, or symbol, or any combination of them, enclosed in double
or single quotation marks.

4. Factor: categories from a predefined set of possible values, often with an intrinsic
order.

5. Logical: the Boolean values TRUE and FALSE, represented under the hood as 1 and 0,
respectively.
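A quick way to check these types is the class() function. The short sketch below (all variable names are illustrative) creates one value of each type; typeof() additionally shows that a factor is stored as integer codes.

```r
num <- 3.14                      # numeric (double)
int <- 42L                       # integer: the L suffix marks an integer literal
chr <- "R is fun"                # character
fac <- factor(c("low", "high", "low"), levels = c("low", "high"))  # factor
lgl <- TRUE                      # logical

# class() reports the type of each object
sapply(list(num, int, chr, fac, lgl), class)
# "numeric" "integer" "character" "factor" "logical"

typeof(fac)                      # "integer": factor levels are stored as integer codes
sum(c(TRUE, FALSE, TRUE))        # 2: logical values count as 1 and 0 in arithmetic
```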

4. List and define some basic data structures in R.


1. Vector: a one-dimensional data structure used for storing values of the same data
type.

2. List: a one-dimensional (but possibly nested) data structure used for storing values of
any data type and/or other data structures.

3. Matrix: a two-dimensional data structure used for storing values of the same data
type.

4. Data frame: a two-dimensional data structure used for storing values of any data type,
but each column must store values of the same data type.
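The sketch below creates one small object of each structure (names and values are illustrative). The last line shows why a vector holds a single type: mixing types silently coerces them to a common one.

```r
vec <- c(1, 2, 3)                              # vector: all values share one type
lst <- list(1, "a", TRUE, c(4, 5))             # list: any types, including other vectors
mat <- matrix(1:6, nrow = 2)                   # matrix: 2 rows x 3 columns, one type
df  <- data.frame(id = 1:3, name = c("a", "b", "c"))  # data frame: one type per column

dim(mat)       # 2 3
length(lst)    # 4: a list is one-dimensional, even when it holds vectors
c(1, "a")      # "1" "a": mixing types in a vector coerces them to a common type
```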
5. How to import data in R?
Base R provides essential functions for importing data:

read.table(): the most general base R function for importing data; it takes in tabular
data with any kind of field separator, including less common ones, such as |.

read.csv(): comma-separated values (CSV) files with . as the decimal separator.

read.csv2(): semicolon-separated values files with , as the decimal separator.

read.delim(): tab-separated values (TSV) files with . as the decimal separator.

read.delim2(): tab-separated values (TSV) files with , as the decimal separator.

In practice, any of these functions can be used to import tabular data with any kind of field
and decimal separators: using them for the specified file formats is only a matter of
convention and default settings. For example, here is the syntax of the first function:
read.table(file, header = FALSE, sep = "", dec = "."). The other functions have the same
parameters with different default values, which can always be explicitly overridden.
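To illustrate the point about defaults, here is a small self-contained sketch: instead of a file, it passes a made-up string through the text argument that read.table() and its wrappers accept. The sample data uses | as the field separator and , as the decimal separator.

```r
# Made-up sample data: | separates fields, and , is the decimal separator
raw <- "id|price\n1|2,50\n2|3,75"

# read.table() reads it once sep and dec are set explicitly
df1 <- read.table(text = raw, header = TRUE, sep = "|", dec = ",")

# read.csv2() already defaults to header = TRUE and dec = ","; only sep needs overriding
df2 <- read.csv2(text = raw, sep = "|")

df1$price            # 2.50 3.75
all.equal(df1, df2)  # TRUE
```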

The tidyverse packages readr and readxl provide some other functions for importing specific
file formats. Each of those functions can be further fine-tuned by setting various optional
parameters.

readr

read_tsv(): tab-separated values (TSV) files.

read_fwf(): fixed-width files.

read_log(): web log files.

read_table(), read_csv(), read_csv2(), and read_delim(): equivalents of the corresponding
base R functions.

readxl

read_excel(): Excel files (both .xls and .xlsx).

read_xls() and read_xlsx(): .xls and .xlsx files, respectively, when the format is known in
advance.

To dive deeper into data loading in R, you can go through the tutorial on How to Import
Data Into R.

6. What is a package in R, and how do you install and load packages?


An R package is a collection of functions, code, data, and documentation, representing an
extension of the R programming language and designed for solving specific kinds of tasks. R
comes with a set of preinstalled packages, and users can install other packages from
repositories. The most popular centralized repository, storing thousands of R packages, is
the Comprehensive R Archive Network (CRAN).

To install an R package directly from CRAN, we need to pass the package name enclosed in
quotation marks to the install.packages() function, as follows:
install.packages("package_name"). To install more than one package from CRAN in one go,
we need to use a character vector containing the package names enclosed in quotation
marks, as follows: install.packages(c("package_name_1", "package_name_2")). To install an
R package manually, we first need to download the package as a zip file to our computer
and then run the install.packages() function:

install.packages("path_to_the_locally_stored_zipped_package_file", repos=NULL)


To load an installed R package into the working R environment, we can use either the library()
or the require() function. Each of them takes in the package name without quotation marks and
loads the package, e.g., library(caret). However, these functions behave differently when they
can't find the requested package: library() throws an error and stops the program execution,
while require() outputs a warning, returns FALSE, and continues the program execution.
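The difference is easy to see with a package that doesn't exist. In the sketch below, notARealPackage123 is a deliberately made-up name, and tryCatch() is used only so that the error from library() doesn't stop the script.

```r
# library() throws an error; catch it so the rest of the script still runs
res <- tryCatch(library(notARealPackage123), error = function(e) "error raised")
res   # "error raised"

# require() only warns and returns FALSE (its message and warning are silenced here)
ok <- suppressMessages(suppressWarnings(require(notARealPackage123)))
ok    # FALSE
```

Because require() returns a logical value, it is often used in conditions, e.g., if (!require(caret)) install.packages("caret").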
7. How to create a data frame in R?
1. From one or more vectors of the same length—by using the data.frame() function:

df <- data.frame(vector_1, vector_2)


2. From a matrix—by using the data.frame() function:

df <- data.frame(my_matrix)


3. From a list of vectors of the same length—by using the data.frame() function:

df <- data.frame(list_of_vectors)


4. From other data frames:

To combine the data frames horizontally (only if the data frames have the same
number of rows, and the records are the same and in the same order) —by using the
cbind() function:

df <- cbind(df1, df2)


To combine the data frames vertically (only if they have identically named columns;
rbind() matches data frame columns by name, so their order may differ), by using the
rbind() function:

df <- rbind(df1, df2)
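A compact, runnable sketch of the four approaches with made-up data (all names are illustrative). It also shows that rbind() accepts a data frame whose columns come in a different order, since it matches them by name.

```r
# 1. From vectors of the same length
ids    <- c(1, 2, 3)
people <- c("Ann", "Bob", "Cy")
df_a   <- data.frame(id = ids, name = people)

# 2. From a matrix: unnamed columns get the default names X1, X2
df_b <- data.frame(matrix(1:4, nrow = 2))
names(df_b)   # "X1" "X2"

# 3. From a named list of equal-length vectors: the names become column names
df_c <- data.frame(list(id = 4:5, name = c("Dee", "Eve")))

# 4. rbind() matches data frame columns by name, so a different column order is fine
df_d <- rbind(df_a, df_c[, c("name", "id")])
nrow(df_d)    # 5
df_d$name     # "Ann" "Bob" "Cy" "Dee" "Eve"
```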


8. How do you add a new column to a data frame in R?


1. Using the $ symbol:

df <- data.frame(col_1=10:13, col_2=c("a", "b", "c", "d"))
print(df)

df$col_3 <- c(5, 1, 18, 16)
print(df)


Output:

  col_1 col_2
1    10     a
2    11     b
3    12     c
4    13     d
  col_1 col_2 col_3
1    10     a     5
2    11     b     1
3    12     c    18
4    13     d    16


2. Using square brackets:

df <- data.frame(col_1=10:13, col_2=c("a", "b", "c", "d"))
print(df)

df["col_3"] <- c(5, 1, 18, 16)
print(df)


Output:

  col_1 col_2
1    10     a
2    11     b
3    12     c
4    13     d
  col_1 col_2 col_3
1    10     a     5
2    11     b     1
3    12     c    18
4    13     d    16


3. Using the cbind() function:

df <- data.frame(col_1=10:13, col_2=c("a", "b", "c", "d"))
print(df)

df <- cbind(df, col_3=c(5, 1, 18, 16))
print(df)


Output:

  col_1 col_2
1    10     a
2    11     b
3    12     c
4    13     d
  col_1 col_2 col_3
1    10     a     5
2    11     b     1
3    12     c    18
4    13     d    16


In each of the three cases, we can assign a single value or a vector or calculate the new
column based on the existing columns of that data frame or other data frames.
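For instance, the sketch below uses the same starting data frame and adds a single value, which R repeats for every row, plus a column computed from an existing one. The new column names flag and col_1_doubled are illustrative.

```r
df <- data.frame(col_1 = 10:13, col_2 = c("a", "b", "c", "d"))

df$flag <- TRUE                       # a single value is repeated for every row
df$col_1_doubled <- df$col_1 * 2      # computed from an existing column
print(df)
```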

9. How to remove columns from a data frame in R?


1. By using the select() function of the dplyr package of the tidyverse collection. The name
of each column to delete is passed in with a minus sign before it:

library(dplyr)
df <- select(df, -col_1, -col_3)


If, instead, we have too many columns to delete, it makes more sense to list the columns to
keep rather than the columns to delete. In this case, the syntax is similar, but the names of
the columns to keep aren't preceded by a minus sign:

df <- select(df, col_2, col_4)


2. By using the built-in subset() function of base R. If we need to delete only one column,
we assign to the select parameter of the function the column name preceded by a minus
sign. To delete more than one column, we assign to this parameter a vector of the
necessary column names, preceded by a minus sign:

# Delete a single column
df <- subset(df, select=-col_1)

# Alternatively, delete several columns at once
df <- subset(df, select=-c(col_1, col_3))


If, instead, we have too many columns to delete, it makes more sense to list the columns to
keep rather than the columns to delete. In this case, the syntax is similar, but no minus sign
is added:

# Keep a single column
df <- subset(df, select=col_2)

# Alternatively, keep several columns at once
df <- subset(df, select=c(col_2, col_4))
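The two subset() variants are interchangeable, as this small sketch with a made-up four-column data frame suggests: dropping col_1 and col_3 gives the same result as keeping col_2 and col_4.

```r
df <- data.frame(col_1 = 1:2, col_2 = 3:4, col_3 = 5:6, col_4 = 7:8)

# Drop col_1 and col_3 ...
dropped <- subset(df, select = -c(col_1, col_3))
# ... or, equivalently, keep col_2 and col_4
kept <- subset(df, select = c(col_2, col_4))

identical(dropped, kept)   # TRUE
names(kept)                # "col_2" "col_4"
```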


10. What is a factor in R?


A factor in R is a specific data type that accepts categories (aka levels) from a predefined
set of possible values. These categories look like characters, but under the hood, they are
stored as integers. Often, such categories have an intrinsic order. For example, a column in a
data frame that contains the options of the Likert scale for assessing views ("strongly
agree," "agree," "somewhat agree," "neither agree nor disagree," "somewhat disagree,"
"disagree," "strongly disagree") should be a factor with its levels set in this order (or an
ordered factor) to capture this intrinsic order and adequately reflect it in categorical plots.
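Here is a minimal sketch of why the level order matters, using a shortened four-point version of the Likert scale for brevity (the answers are made up). Without explicit levels, factor() sorts them alphabetically, which breaks the natural order of the scale.

```r
answers <- c("agree", "strongly agree", "disagree", "agree")

# Without explicit levels, factor() sorts the levels alphabetically
f1 <- factor(answers)
levels(f1)       # "agree" "disagree" "strongly agree"

# Listing the levels in their natural order (with ordered = TRUE) keeps the scale
scale_levels <- c("strongly disagree", "disagree", "agree", "strongly agree")
f2 <- factor(answers, levels = scale_levels, ordered = TRUE)
as.integer(f2)   # 3 4 2 3: the integer codes stored under the hood
f2[1] < f2[2]    # TRUE: "agree" ranks below "strongly agree"
```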

11. What is RStudio?


RStudio is an open-source IDE (integrated development environment) that is widely used as
a graphical front end for working with the R programming language (version 3.0.1 and later).
It has many helpful features that make it very popular among R users:

User-friendly

Flexible

Multifunctional

Allows creating reusable scripts

Tracks operational history

Autocompletes the code

Offers detailed and comprehensive help on any object


Provides easy access to all imported data and built objects

Makes it easy to switch between terminal and console

Allows plot previewing

Supports efficient project creation and sharing

Can be used with other programming languages (Python, SQL, etc.)

To learn more about what RStudio is and how to install it and begin using it, you can follow
the RStudio Tutorial.

12. What is R Markdown?


R Markdown is a free and open-source R package that provides an authoring framework for
building data science projects. Using it, we can write a single .Rmd file that combines
narrative, code, and data plots, and then render this file in a selected output format. The
main characteristics of R Markdown are:

The resulting documents are shareable, fully reproducible, and of publication quality.

A wide range of static and dynamic outputs and formats, such as HTML, PDF, Microsoft
Word, interactive documents, dashboards, reports, articles, books, presentations,
applications, websites, reusable templates, etc.

Easy version control tracking.

Multiple programming languages are supported, including R, Python, and SQL.

13. How to create a user-defined function in R?


To create a user-defined function in R, we use the keyword function and the following
syntax:

function_name <- function(parameters) {
  # function body
}


1. Function name: the name of the function object that will be used for calling the
function after its definition.

2. Function parameters: the variables, separated by commas and placed inside the
parentheses, that will be set to actual argument values each time we call the function.

3. Function body: a chunk of code in curly brackets containing the operations to be
performed in a predefined order on the input arguments each time we call the function.
The function returns the value of the last evaluated expression in its body; the return()
statement (or statements) can be used to return the output explicitly, e.g., to exit the
function early. A print() statement (or statements) can additionally display the output in
the console.

An example of a simple user-defined function in R:

my_function <- function(x, y) {
  return(x + y)
}
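A slightly extended sketch (the function names add and safe_sqrt are illustrative): the first function relies on the implicit return of the last expression and gives y a default value, while the second uses return() to exit early.

```r
# Without return(), the value of the last evaluated expression is returned
add <- function(x, y = 10) {   # y has a default value, used when it isn't supplied
  x + y
}

# return() is handy for exiting a function early
safe_sqrt <- function(x) {
  if (x < 0) return(NA)
  sqrt(x)
}

add(1, 2)       # 3
add(1)          # 11
safe_sqrt(9)    # 3
safe_sqrt(-4)   # NA
```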


14. List some popular data visualization packages in R.


ggplot2—the most popular R data visualization package allowing the creation of a wide
variety of plots.

Lattice—for displaying multivariate data as a tiled panel (trellis) of several plots.

Plotly—for creating interactive, publication-quality charts.

highcharter—for easy dynamic plotting, offers many flexible features, plugins, and
themes; allows charting different R objects with one function.
Leaflet—for creating interactive maps.

ggvis—for creating interactive and highly customizable plots that can be accessed in
any browser by using Shiny's infrastructure.

patchwork—for combining several plots, usually of various types, on the same graphic.

Intermediate R Programming Interview Questions


For more experienced practitioners, it’s likely that the interviewer will ask some questions
that require more detailed knowledge of R. Here are some to prepare for:

15. How to assign a value to a variable in R?


1. Using the assignment operator <-, e.g., my_var <- 1: the most common way of
assigning a value to a variable in R.

2. Using the equal operator =, e.g., my_var = 1: it also works for assignment, but it's mostly
used for assigning values to arguments inside function calls and definitions.

3. Using the rightward assignment operator ->, e.g., 1 -> my_var: the value comes first and
the variable name second; it can be used at the end of pipes.

4. Using the global assignment operators, either leftward (<<-) or rightward (->>), e.g.,
my_var <<- 1: typically used inside a function definition to assign a value to a variable in
an enclosing environment (the global environment, if the variable isn't found elsewhere).
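The sketch below tries each operator; the names x, y, z, counter, and increment are illustrative. Note that <<- modifies a variable that lives outside the function, which is why counter keeps growing between calls.

```r
x <- 5          # leftward assignment
y = 6           # also assigns, though = is mostly used for function arguments
7 -> z          # rightward assignment: the value comes first

counter <- 0
increment <- function() {
  counter <<- counter + 1   # updates counter in the enclosing (here, global) environment
}
increment()
increment()
counter   # 2
```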

16. What are the requirements for naming variables in R?


A variable name can be a combination of letters, digits, dots, and underscores. It can't
contain any other symbols, including white spaces.

A variable name must start with a letter or a dot.

If a variable name starts with a dot, this dot can't be followed by a digit.

Reserved words in R ( TRUE , for , NULL , etc.) can't be used as variable names.

Variable names are case-sensitive.
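One handy way to check these rules is base R's make.names(), which returns a syntactically valid version of a name: valid names come back unchanged, while invalid ones are repaired (invalid characters become dots, an X is prepended where needed, and a dot is appended to reserved words). The candidate names below are illustrative.

```r
candidates <- c("my_var", ".hidden", "my var", "2nd_var", ".2var", "for")

make.names(candidates)
# "my_var" ".hidden" "my.var" "X2nd_var" "X.2var" "for."

# A name is valid exactly when make.names() leaves it unchanged
candidates == make.names(candidates)
# TRUE TRUE FALSE FALSE FALSE FALSE
```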

In the course Writing Efficient R Code, you'll find further best practices for writing code in R.

17. What types of loops exist in R, and what is the syntax of each type?
1. For loop: iterates over a sequence as many times as the sequence's length (unless a
break statement stops it earlier) and performs the same set of operations on each item of
that sequence; the next statement can skip the rest of an individual iteration. This is the
most common type of loop. The syntax of a for loop in R is the following:

for (variable in sequence) {
  # operations
}


2. While loop: performs the same set of operations as long as a predefined logical condition
(or several logical conditions) is met, i.e., until it evaluates to FALSE, unless a break
statement stops it earlier. Unlike for loops, we don't know in advance how many iterations a
while loop is going to execute. Before running a while loop, we need to assign a variable (or
several variables) and then update its value inside the loop body at each iteration. The
syntax of a while loop in R is the following:

# variable assignment

while (logical condition) {
  # operations
  # variable update
}


3. Repeat loop—repeatedly performs the same set of operations until a predefined break
condition (or several break conditions) is met. To introduce such a condition, a repeat loop
has to contain an if-statement code block, which, in turn, has to include the break
statement in its body. Like while loops, we don't know in advance the number of iterations a
repeat loop is going to execute. The syntax of a repeat loop in R is the following:

repeat {
  # operations
  if (break condition) {
    break
  }
}


You can read more about Loops in R with our separate tutorial.
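The three loop templates can be compared on one toy task, summing the numbers 1 to 5. The sketch below is purely illustrative; in real code, sum(1:5) does the same job without a loop.

```r
# for loop: the number of iterations is known in advance
total_for <- 0
for (i in 1:5) {
  total_for <- total_for + i
}

# while loop: runs as long as the condition is TRUE
total_while <- 0
i <- 1                         # variable assignment before the loop
while (i <= 5) {
  total_while <- total_while + i
  i <- i + 1                   # variable update inside the loop
}

# repeat loop: runs until the break condition is met
total_repeat <- 0
i <- 1
repeat {
  total_repeat <- total_repeat + i
  i <- i + 1
  if (i > 5) {
    break
  }
}

c(total_for, total_while, total_repeat)   # 15 15 15
```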

18. How to aggregate data in R?


To aggregate data in R, we can use the aggregate() function of base R. This function has the
following essential parameters, in this order:

x: the data frame (or the selected columns of it) to aggregate.

by: a list of grouping elements (usually factors) to group by, each as long as the number of
rows in x.

FUN: a function to compute the summary statistics for each group (e.g., mean, max, min,
length, or sum).
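A minimal sketch with made-up scores for two teams: the first call uses the x, by, and FUN parameters described above, and the second uses the formula interface of the same function, with FUN = length to count the rows per group.

```r
df <- data.frame(
  team  = c("A", "A", "B", "B", "B"),
  score = c(10, 20, 5, 15, 40)
)

# x: the column(s) to summarise; by: a named list of grouping vectors; FUN: the statistic
by_team <- aggregate(x = df["score"], by = list(team = df$team), FUN = mean)
by_team
#   team score
# 1    A    15
# 2    B    20

# Formula interface: FUN = length counts the rows per group (A: 2, B: 3)
counts <- aggregate(score ~ team, data = df, FUN = length)
counts
```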

19. How to merge data in R?


1. Using the cbind() function—only if the data frames have the same number of rows, and
the records are the same and in the same order:

df <- cbind(df1, df2)


2. Using the rbind() function to combine the data frames vertically: only if they have
identically named columns (rbind() matches data frame columns by name, so their order
may differ):

df <- rbind(df1, df2)


3. Using the merge() function to merge data frames by a column in common, usually an ID
column:

Inner join:

df <- merge(df1, df2, by="ID")


Left join:

df <- merge(df1, df2, by="ID", all.x=TRUE)


Right join:
df <- merge(df1, df2, by="ID", all.y=TRUE)


Outer join:

df <- merge(df1, df2, by="ID", all=TRUE)


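4. Using the join() function of the plyr package (not to be confused with dplyr, which
provides the separate functions inner_join(), left_join(), right_join(), and full_join()) to
merge data frames by a column in common, usually an ID column:

library(plyr)
df <- join(df1, df2, by="ID", type="type_of_join")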


The type parameter takes in one of the following values: "inner", "left", "right", or "full".
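To see how the four merge() variants differ, the sketch below joins two made-up data frames that share only IDs 2 and 3. Rows without a match on the other side get NA in the columns that come from that side.

```r
df1 <- data.frame(ID = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
df2 <- data.frame(ID = c(2, 3, 4), score = c(80, 90, 70))

inner <- merge(df1, df2, by = "ID")                 # IDs 2 and 3 only
left  <- merge(df1, df2, by = "ID", all.x = TRUE)   # IDs 1 to 3; score is NA for ID 1
right <- merge(df1, df2, by = "ID", all.y = TRUE)   # IDs 2 to 4; name is NA for ID 4
full  <- merge(df1, df2, by = "ID", all = TRUE)     # IDs 1 to 4, NA where there's no match

sapply(list(inner = inner, left = left, right = right, full = full), nrow)
# inner  left right  full
#     2     3     3     4
```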

20. How to concatenate strings in R?


We can concatenate two or more strings in R by using the paste() or cat() functions. The
first approach is more popular, since paste() returns the resulting string, while cat() only
prints it to the console (its return value is NULL). Both functions take in any number of
strings to be concatenated and can also take in an optional parameter sep (along with some
other optional parameters): a character or a sequence of characters that will separate the
joined strings in the result (a white space by default).
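A short sketch of both functions (the strings are illustrative). It also shows paste0(), a base R shorthand for paste() with no separator.

```r
first  <- "Hello"
second <- "world"

paste(first, second)                # "Hello world": sep = " " by default
paste(first, second, sep = "_")     # "Hello_world"
paste0(first, second)               # "Helloworld": paste0() uses no separator
result <- cat(first, second, "\n")  # prints Hello world to the console
is.null(result)                     # TRUE: cat() returns NULL
```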

21. How to transpose two-dimensional data in R?


We can transpose a data frame or a matrix in R so that the columns become the rows and
vice versa. For this purpose, we use the t() function of base R. For example:

df <- data.frame(col_1=c(10, 20, 30), col_2=c(11, 22, 33))
print(df)

transposed_df <- t(df)
print(transposed_df)


Output:

  col_1 col_2
1    10    11
2    20    22
3    30    33
      [,1] [,2] [,3]
col_1   10   20   30
col_2   11   22   33
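One detail worth knowing: t() always returns a matrix, even for a data frame input, so mixed column types get coerced to a common type. A small sketch with illustrative names:

```r
df <- data.frame(col_1 = c(10, 20, 30), col_2 = c(11, 22, 33))

transposed <- t(df)
is.matrix(transposed)        # TRUE: t() returns a matrix, not a data frame
as.data.frame(transposed)    # convert back if a data frame is needed

# With mixed column types, everything is coerced to a common type (here, character)
mixed <- data.frame(n = 1:2, s = c("x", "y"))
typeof(t(mixed))             # "character"
```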


22. How to chain several operations together in R?


We can chain several operations in R by using the pipe operator (%>%) from the magrittr
package, which is re-exported by dplyr and other tidyverse packages. Using this operator
allows creating a pipeline of functions where the output of the first function is passed as the
input into the second function, and so on, until the pipeline ends. This eliminates the need
for creating intermediate variables and significantly enhances the overall code readability.

An example of using the pipe operator on a data frame:


library(dplyr)

df <- data.frame(a=1:4, b=11:14, c=21:24)
print(df)

df_new <- df %>% select(a, b) %>% filter(a > 2)
print(df_new)


Output:

  a  b  c
1 1 11 21
2 2 12 22
3 3 13 23
4 4 14 24
  a  b
1 3 13
2 4 14
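Since R 4.1.0, base R also has a native pipe operator, |>, which works without loading any package. The sketch below is a rough base R counterpart of the dplyr pipeline above, using subset() in place of select() and filter(); unlike dplyr, subset() keeps the original row names (3 and 4).

```r
df <- data.frame(a = 1:4, b = 11:14, c = 21:24)

# |> passes the left-hand side as the first argument of the next call
df_new <- df |>
  subset(a > 2, select = c(a, b))
df_new   # rows with a = 3 and a = 4, columns a and b

# Pipes work with any function whose first argument takes the piped value
c(1, 4, 9) |> sqrt() |> sum()   # 6
```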


23. What types of data plots can be created in R?


Since data visualization is one of the strong sides of the R programming language, we can
create virtually any type of data plot in R:

Common types of data plots:

Bar plot: shows the numerical values of categorical data.

Line plot: shows the progression of a variable, usually over time.

Scatter plot: shows the relationship between two variables.

Area plot: a line plot with the area below the line colored or filled with a pattern.

Pie chart: shows the proportion of each category of categorical data as a part of the whole.

Box plot: shows a set of descriptive statistics of the data.

Advanced types of data plots:

Violin plot: shows both a set of descriptive statistics of the data and the distribution
shape of that data.

Heatmap: shows the magnitude of each numeric data point within the dataset.

Treemap: shows the numerical values of categorical data, often as a part of the whole.

Dendrogram: shows the inner hierarchy and clustering of the data.

Bubble plot: shows the relationships between three variables.

Hexbin plot: shows the relationship between two numerical variables in a relatively large
dataset.

Word cloud: shows the frequency of words in an input text.

Choropleth map: shows aggregate thematic statistics of geodata.

Circular packing chart: shows the inner hierarchy of the data and the values of the data
points.

etc.

The skill track Data Visualization with R will help you broaden your horizons in the field of R
graphics. If you prefer to learn data visualization in R in a broader context, explore a
thorough and beginner-friendly career track Data Scientist with R.

24. What is vector recycling in R?


If we try to perform an operation on two R vectors of different lengths, R recycles the items
of the shorter vector in the same order until the lengths of the two vectors match, and only
then performs the operation on them. If the length of the longer vector isn't a multiple of the
length of the shorter one, R also issues a warning about the mismatch (the operation is still
performed).

For example, if we try to run the following addition:

c(1, 2, 3, 4, 5) + c(1, 2, 3)


Due to vector recycling, the second vector will effectively be converted into c(1, 2, 3, 1, 2).
Hence, the final result of this operation will be c(2, 4, 6, 5, 7).

While sometimes vector recycling can be beneficial (e.g., when we expect the cyclicity of
values in the vectors), more often, it's inappropriate and misleading. Hence, we should be
careful and mind the vectors' lengths before performing operations on them.
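The sketch below contrasts the cases: a warning appears only when the longer length isn't a multiple of the shorter one, while a single value is recycled silently, which is the most common and usually intended form of recycling.

```r
# Lengths 5 and 3: recycled, with a warning because 5 isn't a multiple of 3
c(1, 2, 3, 4, 5) + c(1, 2, 3)   # 2 4 6 5 7

# Lengths 4 and 2: recycled silently because 4 is a multiple of 2
c(1, 2, 3, 4) + c(10, 20)       # 11 22 13 24

# A single value is recycled to every element
c(1, 2, 3) * 2                  # 2 4 6
```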

25. What is the use of the next and break statements in R?


The next statement is used to skip a particular iteration and jump to the next one if a
certain condition is met. The break statement is used to stop and exit the loop at a
particular iteration if a certain condition is met. When used in one of the inner loops of a
nested loop, this statement exits only that inner loop.

Both the next and break statements can be used in any type of loop in R: for loops, while
loops, and repeat loops. They can also be used in the same loop, e.g.:

for (i in 1:10) {
  if (i < 5)
    next
  if (i == 8)
    break
  print(i)
}


Output:

[1] 5
[1] 6
[1] 7
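To illustrate the point about nested loops, in the sketch below break exits only the inner loop, so the outer loop keeps going. The variable names are illustrative.

```r
pairs <- c()
for (i in 1:3) {
  for (j in 1:3) {
    if (j > i) break              # exits only the inner loop
    pairs <- c(pairs, paste0(i, "-", j))
  }
}
pairs   # "1-1" "2-1" "2-2" "3-1" "3-2" "3-3"
```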


26. What is the difference between the str() and summary() functions in R?
The str() function displays the structure of an R object and the overall information about it,
the exact contents of which depend on the data structure of that object. For example, for a
vector, it shows the data type of its items, the range of item indices, and the item values (or
the first several values, if the vector is too long). For a data frame, it shows its class
(data.frame), the number of observations and variables, the column names, the data type of
each column, and the first several values of each column.

The summary() function returns the summary statistics for an R object. It's mostly applied
to data frames and matrices, for which it returns the minimum, maximum, mean, and median
values, and the 1st and 3rd quartiles for each numeric column, while for the factor columns, it
returns the count of each level.
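A small made-up data frame with one numeric and one factor column shows both functions side by side. Note that str() only prints its output (it returns NULL invisibly), while summary() returns an object whose values can be extracted.

```r
df <- data.frame(
  height = c(150, 160, 170, 180),
  group  = factor(c("a", "b", "a", "a"))
)

str(df)      # class, 4 obs. of 2 variables, each column's type and first values
summary(df)  # min, quartiles, median, mean, and max for height; level counts for group

summary(df$height)[["Mean"]]  # 165
summary(df$group)             # a: 3, b: 1
```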

27. What is the difference between the subset() and sample() functions in R?
The subset() function in R is used for extracting rows and columns from a data frame or a
matrix, or elements from a vector, based on certain conditions, e.g.: subset(my_vector,
my_vector > 10) .

In contrast, the sample() function in R is typically applied to vectors. It extracts a random
sample of a predefined size from the elements of a vector, with or without replacement.
For example: sample(my_vector, size=5, replace=TRUE).
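The sketch below contrasts the two on one made-up vector: subset() is deterministic and driven by a condition, while sample() is random, so set.seed() is used to make the draws reproducible.

```r
my_vector <- c(4, 15, 8, 23, 42, 16)

subset(my_vector, my_vector > 10)   # 15 23 42 16: the elements meeting the condition

set.seed(1)                         # makes the random draws reproducible
sample(my_vector, size = 3)         # 3 random elements, without replacement by default
sample(my_vector, size = 10, replace = TRUE)  # size can exceed the length only with replacement
```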

Advanced R Programming Interview Questions


28. How to create a new column in a data frame in R based on other columns?
1. Using the transform() and ifelse() functions of base R:

df <- data.frame(col_1 = c(1, 3, 5, 7), col_2 = c(8, 6, 4, 2))
print(df)

# Adding the column col_3 to the data frame df
df <- transform(df, col_3 = ifelse(col_1 < col_2, col_1 + col_2, col_1 * col_2))
print(df)


Output:

  col_1 col_2
1     1     8
2     3     6
3     5     4
4     7     2
  col_1 col_2 col_3
1     1     8     9
2     3     6     9
3     5     4    20
4     7     2    14


2. Using the with() and ifelse() functions of base R:

df <- data.frame(col_1 = c(1, 3, 5, 7), col_2 = c(8, 6, 4, 2))
print(df)

# Adding the column col_3 to the data frame df
df["col_3"] <- with(df, ifelse(col_1 < col_2, col_1 + col_2, col_1 * col_2))
print(df)


Output:

  col_1 col_2
1     1     8
2     3     6
3     5     4
4     7     2
  col_1 col_2 col_3
1     1     8     9
2     3     6     9
3     5     4    20
4     7     2    14


3. Using the apply() function of base R:

df <- data.frame(col_1 = c(1, 3, 5, 7), col_2 = c(8, 6, 4, 2))
print(df)

# Adding the column col_3 to the data frame df
df["col_3"] <- apply(df, 1, FUN = function(x) if (x[1] < x[2]) x[1] + x[2] else x[1] * x[2])
print(df)


Output:

  col_1 col_2
1     1     8
2     3     6
3     5     4
4     7     2
  col_1 col_2 col_3
1     1     8     9
2     3     6     9
3     5     4    20
4     7     2    14


4. Using the mutate() function of the dplyr package and the ifelse() function of base R:

library(dplyr)

df <- data.frame(col_1 = c(1, 3, 5, 7), col_2 = c(8, 6, 4, 2))
print(df)

# Adding the column col_3 to the data frame df
df <- mutate(df, col_3 = ifelse(col_1 < col_2, col_1 + col_2, col_1 * col_2))
print(df)


Output:

  col_1 col_2
1     1     8
2     3     6
3     5     4
4     7     2
  col_1 col_2 col_3
1     1     8     9
2     3     6     9
3     5     4    20
4     7     2    14


29. How to parse a date from its string representation in R?


To parse a date from its string representation in R, we can use the lubridate package of
the tidyverse collection. This package offers various functions for parsing a string and
extracting a standard date from it based on the date pattern in that string. These
functions are ymd(), ymd_hm(), ymd_hms(), dmy(), dmy_hm(), dmy_hms(), mdy(),
mdy_hm(), mdy_hms(), etc., where y, m, d, h, m, and s correspond to year, month, day,
hours, minutes, and seconds, respectively.

For example, if we run the dmy() function, passing to it any of the strings "05-11-2023",
"05/11/2023", or "05.11.2023", which represent the same date, we'll receive the same result:
2023-11-05. This is because in all three cases, despite the different separators, we have the
same pattern: the day followed by the month followed by the year.
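For comparison, base R can parse the same strings with as.Date(), but the pattern has to be spelled out with format codes (%d, %m, %Y). In the sketch below, the lubridate part runs only if that package is installed.

```r
# Base R: as.Date() needs the pattern spelled out with format codes
as.Date("05-11-2023", format = "%d-%m-%Y")  # "2023-11-05"
as.Date("05.11.2023", format = "%d.%m.%Y")  # "2023-11-05"

# lubridate infers the separator from the string
if (requireNamespace("lubridate", quietly = TRUE)) {
  print(lubridate::dmy(c("05-11-2023", "05/11/2023", "05.11.2023")))
}
```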

30. What is the use of the switch() function in R?


The switch() function in R is a multiway branch control statement that evaluates an
expression against items of a list. It has the following syntax:

switch(expression, case_1, case_2, case_3, ...)



The expression passed to the switch() function can evaluate to either a number or a
character string, and depending on this, the function behavior is different.

1. If the expression evaluates to a number, the switch() function returns the item from the
list based on positional matching (i.e., its index is equal to the number the expression
evaluates to). If the number is greater than the number of items in the list, the switch()
function returns NULL . For example:

switch(2, "circle", "triangle", "square")


Output:

"triangle"


2. If the expression evaluates to a character string, the switch() function returns the value
based on its name:

switch("red", "green"="apple", "orange"="carrot", "red"="tomato", "yellow"="lemon")


Output:

"tomato"


If there are multiple matches, the first matched value is returned. It's also possible to add an
unnamed item as the last argument of the switch() function that will be a default fallback
option in the case of no matches. If this default option isn't set, and if there are no matches,
the function returns NULL .

The switch() function is an efficient alternative to long if-else statements since it makes the
code less repetitive and more readable. Typically, it's used for evaluating a single expression.
We can still write more complex nested switch constructs for evaluating multiple
expressions. However, in this form, the switch() function quickly becomes hard to read and
hence loses its main advantage over if-else constructs.
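The sketch below wraps a character-based switch() in a small function with an unnamed default as the last argument; the function name fruit_by_color is illustrative. It also confirms the NULL result for an out-of-range number.

```r
fruit_by_color <- function(color) {
  switch(color,
    "green"  = "apple",
    "red"    = "tomato",
    "yellow" = "lemon",
    "unknown color"        # unnamed last item: the default fallback
  )
}

fruit_by_color("red")      # "tomato"
fruit_by_color("purple")   # "unknown color"

# With a numeric expression out of range, switch() returns NULL (invisibly)
is.null(switch(4, "circle", "triangle", "square"))   # TRUE
```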

31. What is the difference between the functions apply(), lapply(), sapply(), and tapply()?
While all these functions apply the same operation to each element of a data structure without an explicit loop, they differ in the type of input they take, the type of output they return, and the way they iterate.

apply() —takes in a data frame, a matrix, or an array and returns a vector, a list, a matrix, or an array. It can be applied row-wise (MARGIN = 1), column-wise (MARGIN = 2), or over both dimensions (MARGIN = c(1, 2)). A data frame is first converted to a matrix.

lapply() —takes in a vector, a list, or a data frame and always returns a list of the same length as the input. In the case of a data frame as an input, the function is applied column by column.

sapply() —takes in a vector, a list, or a data frame, like lapply(), but tries to simplify the result: it returns a vector if every result has length 1, a matrix if all results have the same length greater than 1, and a list otherwise.

tapply() —applies a function (typically a summary statistic such as mean() or sum()) to subsets of a vector defined by one or more factors (i.e., categorical variables), returning one result per group.
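A minimal sketch contrasting the four functions on small made-up inputs (the matrix, data frame, scores, and groups are hypothetical); the comments show the values each call returns.

```r
m <- matrix(1:6, nrow = 2)                  # 2 x 3 matrix: columns (1,2), (3,4), (5,6)
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))

row_sums <- apply(m, 1, sum)                # row-wise: 9 12
col_sums <- apply(m, 2, sum)                # column-wise: 3 7 11

col_means_list <- lapply(df, mean)          # always a list: $x 2, $y 5
col_means_vec <- sapply(df, mean)           # simplified to a named vector: x = 2, y = 5
col_ranges <- sapply(df, range)             # each result has length 2, so a 2 x 2 matrix

scores <- c(10, 20, 30, 40)
group <- factor(c("a", "b", "a", "b"))
group_means <- tapply(scores, group, mean)  # one value per group: a = 20, b = 30
```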

32. List and define the control statements in R.


There are three groups of control statements in R: conditional statements, loop statements,
and jump statements.

Conditional statements:

if —tests whether a given condition is true and, if it is, executes the associated operations.

if-else —tests whether a given condition is true, executes one set of operations if it is and another set of operations otherwise.

if... else if... else —tests a series of conditions one by one, executes the operations associated with the first condition that is true, and executes a fallback set of operations if none of those conditions is true.

switch —evaluates an expression against the items of a list and returns a value from
the list based on the results of this evaluation.

Loop statements:

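for —iterates over the elements of a sequence (e.g., a vector or a list), executing the same operations for each element.

while —repeatedly executes a set of operations as long as a predefined logical condition (or several logical conditions) is true; the condition is checked before each iteration.

repeat —repeatedly executes the same set of operations with no built-in stopping condition, so the loop must contain a break statement that runs once a predefined condition (or several conditions) is met.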

Jump statements:

next —skips a particular iteration of a loop and jumps to the next one if a certain
condition is met.

break —stops and exits the loop at a particular iteration if a certain condition is met.

return —exits a function and returns the result.
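The following sketch puts the conditional, loop, and jump statements together in one place. The helper functions classify() and sum_odd_until() are hypothetical examples, not part of base R.

```r
# if ... else if ... else: only the first branch whose condition is TRUE runs
classify <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}

# for + next + break + return (hypothetical helper):
# sum the odd numbers in x, stopping as soon as the total exceeds `limit`
sum_odd_until <- function(x, limit) {
  total <- 0
  for (value in x) {               # for: iterate over a sequence
    if (value %% 2 == 0) next      # next: skip even numbers
    total <- total + value
    if (total > limit) break       # break: leave the loop early
  }
  return(total)                    # return: exit the function with a result
}

# while: the condition is checked before each iteration
countdown <- numeric(0)
n <- 3
while (n > 0) {
  countdown <- c(countdown, n)
  n <- n - 1
}

# repeat: runs until an explicit break
attempts <- 0
repeat {
  attempts <- attempts + 1
  if (attempts >= 5) break
}

classify(-2)                       # "negative"
sum_odd_until(1:10, limit = 10)    # 16 (1 + 3 + 5 + 7)
countdown                          # 3 2 1
attempts                           # 5
```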

33. What are regular expressions, and how do you work with them in R?
A regular expression, or regex, in R or other programming languages, is a character or a sequence of characters that describes a certain text pattern. Regular expressions are used to search, validate, extract, and replace text, for example when mining text data. In R, there are two main ways of working with regular expressions:

1. Using base R functions (such as grep(), grepl(), regexpr(), gsub(), regmatches(), etc.) to locate, match, extract, and replace text that matches a pattern.

2. Using the specialized stringr package from the tidyverse collection. This is often a more convenient way to work with regex in R, since the stringr functions have more intuitive names (all starting with str_) and a more consistent syntax.
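As a rough illustration of the base R approach, the sketch below detects, extracts, and masks a simple phone-number pattern. The sample strings and the pattern are made up; the commented stringr calls are approximate counterparts and assume that package is installed.

```r
# Hypothetical sample strings and a pattern for 7-digit numbers like 555-1234
phones <- c("Call 555-1234 today", "No number here", "Office: 555-9876")
pattern <- "[0-9]{3}-[0-9]{4}"

has_number <- grepl(pattern, phones)                     # TRUE FALSE TRUE
numbers <- regmatches(phones, regexpr(pattern, phones))  # "555-1234" "555-9876" (non-matches dropped)
masked <- gsub(pattern, "XXX-XXXX", phones)              # replace every match

# Approximate stringr counterparts (str_extract() returns NA for non-matches instead of dropping them):
# str_detect(phones, pattern); str_extract(phones, pattern); str_replace_all(phones, pattern, "XXX-XXXX")
```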

A Guide to R Regular Expressions provides more detail about how to work with regex in R.

34. What packages are used for machine learning in R?


caret—for various classification and regression algorithms.

e1071—for support vector machines (SVM), naive Bayes classifier, bagged clustering,
fuzzy clustering, and k-nearest neighbors (KNN).

kernlab—provides kernel-based methods for classification, regression, and clustering algorithms.

randomForest—for random forest classification and regression algorithms.

xgboost—for extreme gradient boosting with tree-based or linear base learners, used for classification, regression, and ranking.

rpart—for recursive partitioning in classification, regression, and survival trees.


glmnet—for lasso and elastic-net regularization methods applied to linear regression,
logistic regression, and multinomial regression algorithms.

nnet—for feed-forward neural networks with a single hidden layer and multinomial log-linear models.

tensorflow—the R interface to TensorFlow, for deep neural networks and numerical computation using data flow graphs.

keras—the R interface to Keras, for deep neural networks.
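To show what using one of these packages looks like, here is a minimal sketch that fits a classification tree with rpart, which ships with standard R installations as a recommended package. The choice of the built-in iris dataset is ours, and the accuracy is computed on the training data, so it is an optimistic estimate.

```r
library(rpart)   # recommended package, included with standard R installations

# Classification tree predicting iris species from the four flower measurements
fit <- rpart(Species ~ ., data = iris, method = "class")

# Predicted classes on the same data the tree was trained on
pred <- predict(fit, newdata = iris, type = "class")
train_accuracy <- mean(pred == iris$Species)
train_accuracy
```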

35. How to select features for machine learning in R?


Let's consider three different approaches and how to implement them in the caret package.

1. By detecting and removing highly correlated features from the dataset.

We need to create a correlation matrix of all the features and then identify the highly correlated ones, for example, those with an absolute correlation coefficient greater than 0.75:

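library(caret)
# features: a data frame containing only numeric predictors
corr_matrix <- cor(features)
# Indices of the columns that findCorrelation() flags for removal
highly_correlated <- findCorrelation(corr_matrix, cutoff=0.75)
print(highly_correlated)
# Drop them only if any were flagged (features[, -integer(0)] would drop every column)
if (length(highly_correlated) > 0) features <- features[, -highly_correlated, drop=FALSE]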


2. By ranking the data frame features by their importance.

We need to create a training scheme to control the parameters for train(), use it to build a selected model, and then estimate the variable importance for that model:

library(caret)
# Repeated 10-fold cross-validation (5 repeats)
control <- trainControl(method="repeatedcv", number=10, repeats=5)
# Learning vector quantization ("lvq") classifier; response_variable must be a factor
model <- train(response_variable~., data=df, method="lvq", preProcess="scale", trControl=control)
importance <- varImp(model)
print(importance)


3. By automatically selecting the optimal features.

One of the most popular methods provided by caret for automatically selecting the optimal
features is a backward selection algorithm called Recursive Feature Elimination (RFE).

We need to define the control object using a selected resampling method and a predefined set of helper functions, run the RFE algorithm by passing it the features, the target variable, the candidate numbers of features to evaluate, and the control object, and then extract the selected predictors:

# 10-fold cross-validation; caretFuncs fits each candidate model with train()
control <- rfeControl(functions=caretFuncs, method="cv", number=10)
# Evaluate feature subsets of size 1 to 8
results <- rfe(features, target_variable, sizes=1:8, rfeControl=control)
print(predictors(results))


If you need to strengthen your machine learning skills in R, here is a solid and
comprehensive resource: Machine Learning Scientist with R.

36. What are correlation and covariance, and how do you calculate them in R?
Correlation is a measure of the strength and direction of the linear relationship between two variables. It takes values from -1 (a perfect negative correlation) to 1 (a perfect positive correlation). Covariance measures how two variables vary together and indicates the direction of the linear relationship between them. Unlike correlation, covariance has no range limit: its magnitude depends on the units of the variables.

In R, correlation is calculated with the cor() function and covariance with the cov() function. Both functions have the same syntax: we pass in either two variables (vectors) for which we want to calculate the measure (e.g., cor(vector_1, vector_2) or cov(vector_1, vector_2)), or a whole data frame if we want to calculate the measure between all of its (numeric) variables (e.g., cor(df) or cov(df)). For two vectors, the result is a single value; for a data frame, it is a correlation (or covariance) matrix.
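A small sketch of both functions on made-up vectors, plus the built-in mtcars dataset for the matrix case. It also illustrates the point about ranges: rescaling a variable changes the covariance but not the correlation, since correlation is the covariance divided by the product of the two standard deviations.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)             # y = 2 * x: a perfect positive linear relationship

r <- cor(x, y)                     # 1
s <- cov(x, y)                     # 5
s_scaled <- cov(10 * x, y)         # 50: covariance changes with the units of x
r_scaled <- cor(10 * x, y)         # still 1: correlation is unit-free

# Correlation is covariance rescaled by the two standard deviations
all.equal(r, s / (sd(x) * sd(y)))  # TRUE

# Data frame of numeric columns: a matrix of pairwise correlations
cor_matrix <- cor(mtcars[, c("mpg", "wt", "hp")])
dim(cor_matrix)                    # 3 3
```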

37. List and define the various approaches to estimating model accuracy in R.
Below are several approaches and how to implement them in the caret package of R.

Data splitting—the entire dataset is split into a training dataset and a test dataset. The first one is used to fit the model, and the second one to test its performance on unseen data. This approach works particularly well with large datasets, where holding out a test set still leaves plenty of data for training. To implement data splitting in R, we can use the createDataPartition() function and set its p parameter to the proportion of data that goes to training.

Bootstrap resampling—random samples of the same size as the dataset are drawn from it with replacement, the model is fitted on each sample, and its performance is evaluated (typically on the observations not included in that sample). This resampling is repeated many times, and the results are averaged. To implement bootstrap resampling in R, we need to set the method parameter of the trainControl() function to "boot" when defining the training control of the model.

Cross-validation methods

k-fold cross-validation —the dataset is split into k subsets (folds) of roughly equal size. The model is trained on k-1 subsets and tested on the remaining one. This process is repeated so that each subset serves as the test set exactly once, and the final model accuracy is estimated as the average over all k runs.

Repeated k-fold cross-validation —the principle is the same as for k-fold cross-validation, except that the dataset is randomly split into k subsets several times. For each repetition, the model accuracy is estimated, and then the final model accuracy is calculated as the average of the model accuracy values for all repetitions.

Leave-one-out cross-validation (LOOCV) —one data observation is put aside and the model is trained on all the other data observations. The same process is repeated for all data observations.

To implement these cross-validation methods in R, we need to set the method parameter of the trainControl() function to "cv", "repeatedcv", or "LOOCV", respectively, when defining the training control of the model.
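To make the k-fold procedure concrete without depending on caret, here is a hand-written base R sketch of 5-fold cross-validation for a linear model on the built-in mtcars dataset. The model formula, the choice of k = 5, and RMSE as the accuracy metric are assumptions for illustration; in caret, the same idea is expressed with trainControl(method = "cv", number = 5).

```r
set.seed(42)
k <- 5
# Randomly assign each row of mtcars to one of k folds of roughly equal size
folds <- sample(rep(1:k, length.out = nrow(mtcars)))

fold_rmse <- numeric(k)
for (i in 1:k) {
  train_set <- mtcars[folds != i, ]
  test_set <- mtcars[folds == i, ]
  fit <- lm(mpg ~ wt + hp, data = train_set)   # train on the other k - 1 folds
  pred <- predict(fit, newdata = test_set)     # test on the held-out fold
  fold_rmse[i] <- sqrt(mean((test_set$mpg - pred)^2))
}

cv_rmse <- mean(fold_rmse)   # final estimate: average error over the k folds
cv_rmse
```

Setting k to nrow(mtcars) gives each observation its own fold, which turns the same loop into leave-one-out cross-validation.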

38. What is the chi-squared test, and how do you perform it in R?


The chi-squared test of independence is a statistical hypothesis test used to determine whether two categorical variables are independent or associated with each other. To perform the chi-squared test in R, we use the chisq.test() function of the stats package. The steps are as follows:

1. Create a contingency table of the categorical variables of interest using the table() function of base R:

contingency_table <- table(df$var_1, df$var_2)

2. Pass the contingency table to the chisq.test() function:

chisq.test(contingency_table)
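A runnable version of these two steps, using two categorical columns of the built-in mtcars dataset (transmission type am and engine shape vs) in place of the placeholders df$var_1 and df$var_2. The variable choice is ours, for illustration only.

```r
# Contingency table: transmission type (am) by engine shape (vs)
contingency_table <- table(mtcars$am, mtcars$vs)

result <- chisq.test(contingency_table)
result            # test statistic, degrees of freedom, and p-value
result$expected   # expected counts if the two variables were independent
```

For a 2 x 2 table like this one, chisq.test() applies Yates' continuity correction by default (correct = TRUE). R also warns when expected counts are small, because the chi-squared approximation may then be unreliable.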


You can refresh your knowledge of chi-squared tests and other hypothesis tests in our Hypothesis Testing in R course.

39. What is Shiny in R?


Shiny is an open-source R package that makes it easy and fast to build fully interactive web applications and webpages for data science using only R, without requiring any knowledge of HTML, CSS, or JavaScript. Shiny offers numerous basic and advanced features, widgets, and layouts, as well as example web apps with their underlying code to build upon and customize, and user showcases from various fields (technology, sports, banking, education, etc.) gathered and categorized by the Shiny app developer community.
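For a sense of what building an app "using only R" looks like in practice, here is a minimal sketch of a Shiny app: a slider controlling the sample size of a histogram. The input and output IDs (n, hist) are arbitrary names chosen for this example, and it assumes the shiny package is installed.

```r
library(shiny)

# UI: a slider input and a placeholder for the plot
ui <- fluidPage(
  sliderInput("n", "Number of observations:", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

# Server: re-draws the histogram whenever the slider value changes
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = "Random normal sample")
  })
}

app <- shinyApp(ui = ui, server = server)
```

Calling runApp(app), or printing app at the console, starts a local server and opens the app.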

40. What is the difference between the with() and within() functions?
The with() function evaluates an R expression using one or more variables (columns) of a data frame and returns the result without modifying the data frame. The within() function also evaluates an R expression on the variables of a data frame, but it returns a modified copy of the data frame that includes the changes made by the expression. Below we can see how these functions work using a sample data frame as an example:

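df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
print(df)

# with(): evaluate a * b using the columns of df; df itself is not changed
with(df, a * b)

# within(): return a copy of df with a new column c = a * b
print(within(df, c <- a * b))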


Output:

a b
1 1 10
2 2 20
3 3 30

10 40 90
a b c
1 1 10 10
2 2 20 40
3 3 30 90


When using the within() function, to save the modifications, we need to assign the output of
the function to a variable.
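A short sketch of that last point, reusing the sample data frame from above:

```r
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

within(df, c <- a * b)        # prints a modified copy with a new column c
ncol(df)                      # still 2: df itself is unchanged

df <- within(df, c <- a * b)  # assign the result to keep the new column
ncol(df)                      # 3
```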

Conclusion
To conclude, in this article, we considered the 40 most common R programming interview
questions and what answers are expected for each of them. Hopefully, with this information
in hand, you feel more confident and ready for a successful R interview, whether you're
looking for a job in R or the right candidate for an open position in your company.

To get some hands-on experience answering questions, check out our Practicing Statistics
Interview Questions in R course.

AUTHOR

Elena Kosourova


© 2024 DataCamp, Inc. All Rights Reserved.
