
Challenges:

Day - 1 (Session - 1)

Challenge - I

Create vectors named height and weight using the following data:
height : 160.3, 134.2, 159, 149, 145, and 147.1
weight : 83.8, 37.2, 71.7, 72.8, 50.5, and 42.9.

i) Based on the above vectors, answer the following questions:

a) The average height is __________

b) The variance of height is __________

c) The SD of height is __________

d) The average weight is __________

e) The variance of weight is __________

f) The SD of weight is __________

ii) Extract the 4th element of the weight and height vectors

a) 4th element in weight __________

b) 4th element in height __________

iii) Based on the above vectors, calculate BMI

a) Calculate BMI __________

b) Extract the 4th element in BMI vector __________
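One possible way to approach Challenge I is sketched below. The challenge does not state units, so the BMI line assumes height is in centimetres and weight is in kilograms; adjust the conversion if your data differ.

```r
# Data copied from the challenge statement
height <- c(160.3, 134.2, 159, 149, 145, 147.1)
weight <- c(83.8, 37.2, 71.7, 72.8, 50.5, 42.9)

# Summary statistics; var() and sd() use the sample (n - 1) denominator
mean(height)
var(height)
sd(height)
mean(weight)
var(weight)
sd(weight)

# R indexing starts at 1, so [4] is the 4th element
weight[4]
height[4]

# Assumption: height in cm, weight in kg, so convert cm to m before squaring
bmi <- weight / (height / 100)^2
bmi[4]
```

Because arithmetic on vectors is element-wise, bmi has one value per person, in the same order as height and weight.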

Challenge - II

Create a matrix using the following table, and answer the following questions using matrix
operations

a) The total number of smokers __________.

(Hint rowSums(___))

b) The total number of non-smokers __________.

(Hint rowSums(___))

c) Incidence of CHD among smokers ( A/(A+B) ) __________.

d) Incidence of CHD among non-smokers ( C/(C+D) ) __________.

e) Risk ratio of CHD ( [A/(A+B)] / [C/(C+D)] ) __________.
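The matrix operations for this challenge might look like the sketch below. The counts are hypothetical placeholders, not the values from the exercise table, and the layout (rows for smoking status, columns for CHD status) is an assumption based on the rowSums() hint.

```r
# Hypothetical counts, NOT the values from the exercise table
# Row 1: A (smokers with CHD), B (smokers without CHD)
# Row 2: C (non-smokers with CHD), D (non-smokers without CHD)
chd <- matrix(c(30, 70,
                10, 90),
              nrow = 2, byrow = TRUE,
              dimnames = list(smoking = c("Smoker", "Non-smoker"),
                              chd = c("CHD", "No CHD")))

# Row totals: A+B for smokers, C+D for non-smokers
totals <- rowSums(chd)

# Incidence in each row: A/(A+B) and C/(C+D)
incidence <- chd[, "CHD"] / totals

# Risk ratio: incidence among smokers divided by incidence among non-smokers
risk_ratio <- incidence["Smoker"] / incidence["Non-smoker"]
risk_ratio
```

Naming the rows and columns with dimnames lets you index by label instead of by position, which makes the formulas easier to check against A, B, C, and D.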

Challenge - III

Represent the following table using an array, and answer the following questions using array operations

a) In rural, incidence of CHD among smokers __________.

b) In rural, incidence of CHD among non-smokers __________.

c) In rural, what is the risk ratio of CHD __________.

d) In urban, incidence of CHD among smokers __________.

e) In urban, incidence of CHD among non-smokers __________.

f) In urban, risk ratio of CHD __________.
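A three-dimensional array can hold one 2x2 table per area. The sketch below uses hypothetical counts (not the exercise table) and assumes the dimensions are smoking status, CHD status, and area, in that order.

```r
# Hypothetical counts, NOT the values from the exercise table.
# array() fills column by column within each 2x2 slice.
chd_arr <- array(c(20, 10, 80, 90,    # rural slice
                   30, 15, 70, 85),   # urban slice
                 dim = c(2, 2, 2),
                 dimnames = list(smoking = c("Smoker", "Non-smoker"),
                                 chd = c("CHD", "No CHD"),
                                 area = c("Rural", "Urban")))

# Pull out one area as an ordinary matrix and work on it
rural <- chd_arr[, , "Rural"]
rural_incidence <- rural[, "CHD"] / rowSums(rural)
rural_rr <- rural_incidence["Smoker"] / rural_incidence["Non-smoker"]

# Or compute the incidence for every area at once over the 3rd dimension
incidence <- apply(chd_arr, 3, function(m) m[, "CHD"] / rowSums(m))
rr <- incidence["Smoker", ] / incidence["Non-smoker", ]
rr
```

Leaving an index blank, as in chd_arr[, , "Rural"], selects everything along that dimension.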

Challenge - IV

4) Create a list that contains the overall risk ratio (Challenge IIe), the rural risk ratio
(Challenge IIIc), and the urban risk ratio (Challenge IIIf)
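A named list is one way to keep the three results together. The numbers below are placeholders only; substitute the ratios you computed in Challenges II and III.

```r
# Placeholder values, NOT real results: replace with your own risk ratios
risk_ratios <- list(overall = 1.5, rural = 1.2, urban = 1.8)

# Elements can be accessed by name with $ or [[ ]]
risk_ratios$rural
risk_ratios[["urban"]]

# str() gives a compact overview of the list
str(risk_ratios)
```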

Challenge - V

5) Write an R program to create a data frame for the following data:

unique_id vector: C10001, C10002, and C10003;
treatment vector: A, B, and C;
age vector: 29, 30, and 28.

Then extract the entire 3rd row.
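A minimal sketch of one way to build the data frame and pull out a row. The object name trial is an arbitrary choice for illustration.

```r
# Data copied from the challenge statement; "trial" is an arbitrary name
trial <- data.frame(
  unique_id = c("C10001", "C10002", "C10003"),
  treatment = c("A", "B", "C"),
  age = c(29, 30, 28)
)

# Row index first, column index left blank to keep every column
third_row <- trial[3, ]
third_row
```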

Challenge - VI

Find the error in each of the following pieces of R code

a) temp <- c(99.4, 102.3; 100.3)

b) Suppose mat is a 2x2 matrix. To extract the element in the 2nd row and 1st column, will
the command mat(2;1) work?

c) hba1c% <- c(16.4, 11.0, 10.3, 12.4)

d) vector <- c(13, 7A, 11, 30)

e) The R command to view the last 6 rows of the data frame df is str(df).

Day - 1 (Session - 2)

In this hypothetical study, data from 25 individuals have been collected to explore the
relationship between demographic factors, systolic blood pressure, hypertension, and the
effectiveness of two types of drugs, A and B.

Let's work through these questions to carry out the data cleaning process.

1) Import the exercise data from the directory (File name is Exercise_data-Day1.csv)

i) How many variables are there in the datasheet? __________ (Hint ____ %>%
dim())

ii) How many observations are there in the datasheet? __________

2) Rename the variables as follows (Hint ___ %>% rename())

i) “Height.in.cms” as height

ii) “Weight.in.kgs” as weight

iii) “Type.of.drug.given” as drugType

3) Assign variable labels as follows (Hint ___ %>% set_variable_labels())

i) height as Height (in Cms)

ii) weight as Weight (in Kgs)

iii) drugType as Type of Drug

4) Recode the values of the following variables (Hint ___ %>% recode())

i) Hypertension, Yes=1 and No=0

ii) Gender, Male=1 and Female=2

5) Assign value labels to the following (Hint ___ %>% set_value_labels())

i) In Hypertension 0 as “NO” and 1 as “YES”

ii) In Gender 1 as “Male” and 2 as “Female”

6) How many people participated in the study from urban? __________ (Hint ___
%>% filter( ))

7) How many individuals took drug A? __________ (Hint ___ %>% filter( ))

8) How many individuals took drug B? __________ (Hint ___ %>% filter( ))

9) Find the duplicates. How many duplicate pairs did you find? __________

(Hint ___ %>% filter(duplicated(-----)))

10) Find the missing data for the variable Systolic Blood pressure (mmHg). (Hint
filter(is.na(-----)))

How many missing values were discovered?__________

11) Identify the outliers in Systolic Blood Pressure (mmHg). (Hint use the range
80-160)

How many outliers were found? __________

12) Prepare a summary table of diastolic blood pressure by drug type, with count, mean,
median, and SD (Hint ___ %>% group_by(___) %>% summarise(___))

i) What is the mean diastolic blood pressure for A __________

ii) What is the median diastolic blood pressure for B __________
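The dplyr hints above can be tried out on a small made-up data frame before running them on the real file. Everything in the sketch below is hypothetical: the toy values, and the column names id, sbp, and dbp, are assumptions for illustration and may not match Exercise_data-Day1.csv.

```r
library(dplyr)

# Hypothetical toy data; the real exercise uses Exercise_data-Day1.csv
bp <- data.frame(
  id = c(1, 2, 2, 3, 4, 5),
  Type.of.drug.given = c("A", "B", "B", "A", "B", "A"),
  sbp = c(120, 135, 135, NA, 210, 118),
  dbp = c(80, 85, 85, 78, 95, 76)
)

# rename() takes new_name = old_name
bp <- bp %>% rename(drugType = Type.of.drug.given)

# dim() returns rows (observations), then columns (variables)
bp %>% dim()

# Count rows matching a condition
n_drug_a <- bp %>% filter(drugType == "A") %>% nrow()

# duplicated() flags the second and later occurrences of a value
dupes <- bp %>% filter(duplicated(id))

# Rows where systolic BP is missing
missing_sbp <- bp %>% filter(is.na(sbp))

# Rows outside the 80-160 range; filter() drops rows where the test is NA
outliers <- bp %>% filter(sbp < 80 | sbp > 160)

# Grouped summary of diastolic BP
dbp_summary <- bp %>%
  group_by(drugType) %>%
  summarise(n = n(),
            mean_dbp = mean(dbp),
            median_dbp = median(dbp),
            sd_dbp = sd(dbp))
dbp_summary
```

On the real data you may want to remove duplicates and handle missing values before summarising, since both affect the counts and means.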

Day - 2 (Session - 1)

Let us create some data visualizations to understand how effective the drug is in the
treatment of blood pressure, and to see whether there are any baseline differences or
differences in outcomes (hypertension, systolic BP, and diastolic BP).

Import exercise data (Exercise_data-Day2.csv) from the directory.

1. Use the ggplot2 package to plot a bar graph of the hypertension response (Univariate
bar graph). Which response is the most frequent? __________

2. Add drug type to the bar chart for hypertension (Bivariate grouped bar chart). How
many people who reported hypertension took drug A?
__________

(Hint ggplot(aes(x=____,y=____), fill=____))

3. Now add the dwelling type to the previous bar graph, using the facet_wrap() function
to include the location. What type of distribution does the graph show for large
city? __________

(Hint facet_wrap(~____))

4. Draw a density chart for systolic blood pressure (Univariate chart). What type of
distribution does the graph look like? __________

a) Right skewed-distribution

b) Left skewed-distribution

c) Normal distribution

d) Uniform distribution

5. Create a box plot to represent systolic blood pressure by drug type (Bivariate box
plot). What is the median blood pressure for each drug type? __________

6. Using facet_wrap(), add the type of dwelling to the previous graph. Which sort of
dwelling has the highest blood pressure when using drug B? __________

7. Use a scatter chart to plot the graph for systolic and diastolic pressure (Bivariate
graph).

What is the relationship between systolic and diastolic blood pressure? __________

a) No association

b) Positive association

c) Negative association
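The grouped, faceted bar chart and the box plot described above could be sketched as follows. The toy data are made up, and the column name sbp is an assumption; hyper, drugType, and dwelling follow the variable names used elsewhere in this handout, but check them against Exercise_data-Day2.csv.

```r
library(ggplot2)

# Hypothetical toy data; the real exercise uses Exercise_data-Day2.csv
bp <- data.frame(
  hyper = c("Yes", "No", "Yes", "No", "Yes", "No"),
  drugType = c("A", "A", "B", "B", "A", "B"),
  dwelling = c("Town", "Town", "Large city", "Large city",
               "Small city", "Small city"),
  sbp = c(150, 118, 145, 122, 160, 125)
)

# Grouped bar chart of counts: fill goes inside aes() so it maps to a variable,
# position = "dodge" places the bars side by side, facet_wrap() adds panels
bar_plot <- ggplot(bp, aes(x = hyper, fill = drugType)) +
  geom_bar(position = "dodge") +
  facet_wrap(~ dwelling)

# Box plot of systolic BP by drug type
box_plot <- ggplot(bp, aes(x = drugType, y = sbp)) +
  geom_boxplot()

# Printing a plot object draws it
bar_plot
box_plot
```

geom_bar() counts rows itself, so no y aesthetic is needed when the data have one row per person.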

Day - 2 (Session - 2)

Create summary tables for the following conditions. Then, fill in the blanks.

Import exercise data (Exercise_data-Day2.csv) from the directory.

1) Prepare summary statistics for the variables sex, dwelling, and drugType.

Variable n(%)
Gender

- Male __________

- Female __________

Location

- Small city __________

- Large city __________

- Town __________

Drug type

- Type A __________

- Type B __________

2) Prepare summary statistics by type of drug for the following variables: sex, dwelling,
and hyper. Include statistical tests.

Variable Type A Type B p-value


Gender __________

- Male __________ __________

- Female __________ __________

Location __________

- Small city __________ __________

- Large city __________ __________

- Town __________ __________

Hypertension __________

- Yes __________ __________

- No __________ __________

3) Prepare summary statistics for the numeric variables systolic and diastolic blood
pressure by drug type. Include statistical tests.

Variable Type A Type B p-value

Systolic BP __________ __________ __________

Diastolic BP __________ __________ __________
