Day - 1 (Session - 1)
Challenge - I
Create vectors named height and weight using the following data:
height : 160.3, 134.2, 159, 149, 145, and 147.1
weight : 83.8, 37.2, 71.7, 72.8, 50.5, and 42.9.
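A minimal sketch of this challenge in R, using only the values listed above. `c()` combines the numbers into a numeric vector.

```r
# Build the two numeric vectors from the values given in the challenge
height <- c(160.3, 134.2, 159, 149, 145, 147.1)
weight <- c(83.8, 37.2, 71.7, 72.8, 50.5, 42.9)

# Both vectors should hold six measurements
length(height)
length(weight)
```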
Challenge - II
Create a matrix from the following table, and answer the questions below using matrix
operations.
(Hint rowSums(___))
c) Incidence of CHD among smokers ( A/(A+B) ) __________.
d) Incidence of CHD among non-smokers ( C/(C+D) ) __________.
e) Risk ratio of CHD ( [A/(A+B)] / [C/(C+D)] ) __________.
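The matrix workflow can be sketched as follows. The counts are hypothetical placeholders, not the workbook's table, and the row and column names are assumptions chosen for illustration. Row 1 holds the smokers (cells A and B) and row 2 the non-smokers (cells C and D).

```r
# Hypothetical counts (NOT the workbook's data): rows = exposure, columns = outcome
chd <- matrix(c(30, 70,
                10, 90),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Smoker", "NonSmoker"), c("CHD", "NoCHD")))

# Row totals give A+B for smokers and C+D for non-smokers
row_totals <- rowSums(chd)

# Incidence in each row: CHD cases divided by the row total
incidence <- chd[, "CHD"] / row_totals

# Risk ratio: incidence among smokers over incidence among non-smokers
risk_ratio <- unname(incidence["Smoker"] / incidence["NonSmoker"])
risk_ratio
```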
Challenge - III
Represent the following table using an array, and answer the questions below using array operations.
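A rough sketch of a stratified table stored as a three-dimensional array. All counts, the dimension names, and the helper `stratum_rr()` are hypothetical. The rural and urban strata are assumed from the risk ratios requested later in the workbook.

```r
# Hypothetical counts (NOT the workbook's data); array() fills column by column
counts <- array(c(20, 5, 80, 95,    # rural slice: CHD column, then NoCHD column
                  10, 5, 90, 95),   # urban slice
                dim = c(2, 2, 2),
                dimnames = list(exposure = c("Smoker", "NonSmoker"),
                                outcome  = c("CHD", "NoCHD"),
                                area     = c("Rural", "Urban")))

# Hypothetical helper: risk ratio for one 2x2 slice
stratum_rr <- function(m) {
  incidence <- m[, "CHD"] / rowSums(m)
  unname(incidence["Smoker"] / incidence["NonSmoker"])
}

# Index the third dimension by name to pull out each stratum
rural_rr <- stratum_rr(counts[, , "Rural"])
urban_rr <- stratum_rr(counts[, , "Urban"])
```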
Challenge - IV
4) Create a list that contains the results of the overall risk ratio (Challenge II), the rural risk
ratio (Challenge IIIc), and the urban risk ratio (Challenge IIIf).
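A short sketch of collecting results in a named list. The three numbers are hypothetical placeholders standing in for whatever the earlier challenges produce.

```r
# Hypothetical values; substitute the results computed in Challenges II and III
risk_ratios <- list(overall = 3, rural = 4, urban = 2)

# Elements can be read back by name with $ or [[ ]]
risk_ratios$rural
risk_ratios[["urban"]]
names(risk_ratios)
```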
Challenge - V
Challenge - VI
b) Suppose mat is a 2×2 matrix. To extract the element in the 2nd row and 1st column, will the
command mat(2;1) work?
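For reference, a small sketch of how R indexes a matrix. The contents of `mat` are hypothetical. R uses square brackets with a comma between the row and column index, and the last two lines check whether the parenthesised form with a semicolon even parses.

```r
# Hypothetical 2x2 matrix; matrix() fills column by column
mat <- matrix(1:4, nrow = 2)

# Row index first, then column index, inside square brackets
mat[2, 1]

# Try to parse the command from the question without running it
parsed <- try(parse(text = "mat(2;1)"), silent = TRUE)
inherits(parsed, "try-error")
```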
Day - 1 (Session - 2)
In this hypothetical study, data from 25 individuals have been collected to explore the
relationship between demographic factors, systolic blood pressure, hypertension, and the
effectiveness of two types of drugs, A and B.
Let's work through the following questions to carry out the data cleaning process.
1) Import the exercise data from the directory (File name is Exercise_data-Day1.csv)
i) How many variables are there in the datasheet? __________ (Hint ____ %>%
dim())
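A minimal sketch of the import step, assuming the dplyr package is installed. Because the real Exercise_data-Day1.csv is not available here, the sketch writes a tiny hypothetical stand-in file first. In practice the path would point at the course file instead.

```r
library(dplyr)

# Hypothetical stand-in for Exercise_data-Day1.csv so the sketch runs anywhere
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, Height.in.cms = c(160, 150, 170)),
          csv_path, row.names = FALSE)

# Import the file, then ask for its dimensions
exercise <- read.csv(csv_path)
exercise %>% dim()   # number of rows first, then number of columns (variables)

n_variables <- ncol(exercise)
```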
2) Rename the variables as follows (Hint ___ %>% rename())
i) “Height.in.cms” as height
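A sketch of renaming with dplyr, assuming the package is installed. The data frame is a hypothetical stand-in. Only the column name Height.in.cms comes from the exercise. Note that `rename()` takes the new name on the left.

```r
library(dplyr)

# Hypothetical stand-in data with the column name used in the exercise
exercise <- data.frame(id = 1:3, Height.in.cms = c(160, 150, 170))

# rename(new_name = old_name)
renamed <- exercise %>% rename(height = Height.in.cms)
names(renamed)
```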
3) Label the variables as follows (Hint ___ %>% set_variable_labels())
4) Recode the values of the following variables (Hint ___ %>% recode())
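A sketch of recoding with dplyr's `recode()`, assuming the package is installed. The variable name `sex` and the coding 1 = Male, 2 = Female are assumptions for illustration, since the workbook's coding table is not shown here.

```r
library(dplyr)

# Hypothetical data: the numeric coding below is an assumption
exercise <- data.frame(id = 1:4, sex = c(1, 2, 2, 1))

# Numeric codes are quoted with backticks on the left-hand side
recoded <- exercise %>%
  mutate(sex = recode(sex, `1` = "Male", `2` = "Female"))
recoded$sex
```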
6) How many participants in the study were from urban areas? __________ (Hint ___
%>% filter( ))
7) How many individuals took drug A? __________ (Hint ___ %>% filter( ))
8) How many individuals took drug B? __________ (Hint ___ %>% filter( ))
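The counting questions above follow one pattern: filter the rows, then count them. A sketch assuming dplyr is installed, with hypothetical column names (`dwelling`, `drug`) and hypothetical values:

```r
library(dplyr)

# Hypothetical data; column names and values are assumptions
exercise <- data.frame(dwelling = c("Urban", "Rural", "Urban", "Urban"),
                       drug     = c("A", "B", "A", "B"))

# Keep the matching rows, then count them
n_urban  <- exercise %>% filter(dwelling == "Urban") %>% nrow()
n_drug_a <- exercise %>% filter(drug == "A") %>% nrow()
n_drug_b <- exercise %>% filter(drug == "B") %>% nrow()
```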
9) Find the duplicates. How many pairs of identical records did you find? __________
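One possible way to spot duplicated records, sketched in base R on hypothetical data. `duplicated()` flags every row that repeats an earlier row exactly.

```r
# Hypothetical data with one repeated record (id 2 appears twice)
exercise <- data.frame(id  = c(1, 2, 2, 3),
                       sbp = c(120, 135, 135, 110))

# Rows that are exact copies of an earlier row
dup_rows <- exercise[duplicated(exercise), ]
n_duplicates <- nrow(dup_rows)

# Data with the repeats dropped
deduplicated <- exercise[!duplicated(exercise), ]
```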
10) Find the missing data for the variable Systolic Blood pressure (mmHg). (Hint
filter(is.na(-----)))
11) Identify the outliers in Systolic Blood Pressure (mmHg). (Hint use the range
80-160)
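The missing-value and outlier checks can be sketched as two filters, assuming dplyr is installed. The column name `sbp` and the values are hypothetical. The 80-160 range comes from the hint above.

```r
library(dplyr)

# Hypothetical data; sbp stands in for systolic blood pressure (mmHg)
exercise <- data.frame(id  = 1:5,
                       sbp = c(120, NA, 210, 75, 135))

# Rows where systolic blood pressure is missing
missing_sbp <- exercise %>% filter(is.na(sbp))

# Rows outside the 80-160 range; filter() drops rows where the test is NA
outlier_sbp <- exercise %>% filter(sbp < 80 | sbp > 160)
```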
12) Prepare a summary table of diastolic blood pressure by drug type with count, mean,
median, and SD (Hint ___ %>% group_by(___) %>% summarise(___))
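A sketch of the grouped summary, assuming dplyr is installed. The column names `drug` and `dbp` and all values are hypothetical.

```r
library(dplyr)

# Hypothetical data; dbp stands in for diastolic blood pressure
exercise <- data.frame(drug = c("A", "A", "B", "B", "B"),
                       dbp  = c(80, 90, 70, 85, 85))

# One row per drug type with count, mean, median and standard deviation
dbp_summary <- exercise %>%
  group_by(drug) %>%
  summarise(n          = n(),
            mean_dbp   = mean(dbp),
            median_dbp = median(dbp),
            sd_dbp     = sd(dbp))
dbp_summary
```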
Day - 2 (Session - 1)
Let us create some data visualizations to understand how effective the drug is in the treatment
of blood pressure, and to see whether there are any baseline differences or differences in the
outcomes (hypertension, systolic BP, and diastolic BP).
1. Use the ggplot2 package to plot a bar graph of the hypertension response (univariate
bar graph). Which response is the most frequent? __________
2. Add drug type to the bar chart for hypertension (bivariate grouped bar
chart). How many people who reported hypertension took drug A?
__________
3. Now add the dwelling type to the previous bar graph. To include the location in the
bar graph, use the facet_wrap() function. What type of distribution does the
graph show for the large city? __________
(Hint facet_wrap(~____))
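The three bar charts build on each other, which can be sketched as below, assuming ggplot2 is installed. The column names (`hyper`, `drug`, `dwelling`) and all values are hypothetical.

```r
library(ggplot2)

# Hypothetical data; column names and values are assumptions
exercise <- data.frame(
  hyper    = c("Yes", "No", "Yes", "No", "Yes", "No"),
  drug     = c("A", "A", "B", "B", "A", "B"),
  dwelling = c("Town", "Town", "Large city", "Large city", "Large city", "Town")
)

# Univariate bar graph: one bar per hypertension response
p_uni <- ggplot(exercise, aes(x = hyper)) + geom_bar()

# Bivariate grouped bar chart: bars split by drug type, placed side by side
p_grouped <- ggplot(exercise, aes(x = hyper, fill = drug)) +
  geom_bar(position = "dodge")

# One panel per dwelling type
p_facet <- p_grouped + facet_wrap(~dwelling)
```

Printing any of the three objects (for example `p_facet`) draws the plot.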
4. Draw a density chart for systolic blood pressure (univariate chart). What type of
distribution does the graph show? __________
a) Right skewed-distribution
b) Left skewed-distribution
c) Normal distribution
d) Uniform distribution
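A sketch of a density chart, assuming ggplot2 is installed. The column name `sbp` and the values are hypothetical, so the shape of this curve says nothing about the exercise data.

```r
library(ggplot2)

# Hypothetical systolic blood pressure values
exercise <- data.frame(sbp = c(110, 118, 122, 125, 130, 134, 141, 150))

# Smoothed estimate of the distribution of a single numeric variable
p_density <- ggplot(exercise, aes(x = sbp)) + geom_density()
```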
5. Create a box plot to represent systolic blood pressure by drug type (bivariate box
plot). What is the median blood pressure for each drug type? __________
6. Using facet_wrap(), add the type of dwelling to the previous graph. Which sort of
dwelling has the highest blood pressure when using drug B? __________
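The box plot and its faceted version can be sketched as follows, assuming ggplot2 is installed. Column names and values are hypothetical. `tapply()` is included as one way to read off the group medians numerically rather than from the plot.

```r
library(ggplot2)

# Hypothetical data; column names and values are assumptions
exercise <- data.frame(
  drug     = rep(c("A", "B"), each = 4),
  dwelling = rep(c("Town", "Large city"), times = 4),
  sbp      = c(120, 130, 125, 135, 140, 150, 145, 155)
)

# Box plot of systolic blood pressure for each drug type
p_box <- ggplot(exercise, aes(x = drug, y = sbp)) + geom_boxplot()

# Same plot with one panel per dwelling type
p_box_facet <- p_box + facet_wrap(~dwelling)

# Median systolic blood pressure per drug type
medians <- tapply(exercise$sbp, exercise$drug, median)
```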
7. Use a scatter chart to plot systolic against diastolic blood pressure (bivariate
graph).
What is the relationship between systolic and diastolic blood pressure? __________
a) No association
b) Positive association
c) Negative association
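A sketch of the scatter chart, assuming ggplot2 is installed. The column names (`sbp`, `dbp`) and values are hypothetical, so the pattern here is not the answer for the exercise data. `cor()` is shown as an optional numeric check on the direction of the association.

```r
library(ggplot2)

# Hypothetical paired measurements
exercise <- data.frame(sbp = c(110, 120, 130, 140, 150),
                       dbp = c(72, 78, 80, 88, 95))

# One point per individual
p_scatter <- ggplot(exercise, aes(x = sbp, y = dbp)) + geom_point()

# Pearson correlation: the sign indicates the direction of the association
r <- cor(exercise$sbp, exercise$dbp)
```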
Day - 2 (Session - 2)
Create summary tables for the following conditions. Then, fill in the blanks.
Variable n(%)
Gender
- Male __________
- Female __________
Location
- Town __________
Drug type
- Type A __________
- Type B __________
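One base R way to produce the n(%) entries for a categorical variable, sketched on hypothetical data. The workbook may intend a dedicated table package instead, so this shows only the underlying arithmetic.

```r
# Hypothetical data; the column name sex is an assumption
exercise <- data.frame(sex = c("Male", "Female", "Female", "Male", "Female"))

# Counts per category (categories are ordered alphabetically)
counts <- table(exercise$sex)

# Percentages of the total, rounded to one decimal place
percent <- round(100 * prop.table(counts), 1)

# Combine into the "n (%)" form used in the table
n_pct <- paste0(as.vector(counts), " (", as.vector(percent), "%)")
names(n_pct) <- names(counts)
n_pct
```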
2) Prepare summary statistics for the following variables (sex, dwelling, hyper) by type of
drug. Include statistical tests.
Location __________
Hypertension __________
- No __________ __________
3) Prepare the summary statistics for the numerical vectors systolic and diastolic blood
pressure by drug type. Include statistical tests.
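A base R sketch of a numeric summary by drug type with a test. The data are hypothetical. Using Welch's two-sample t-test, the default of `t.test()`, is an assumption, since the workbook does not name the test it expects.

```r
# Hypothetical data; column names and values are assumptions
exercise <- data.frame(drug = rep(c("A", "B"), each = 4),
                       sbp  = c(120, 130, 125, 135, 140, 150, 145, 155))

# Mean systolic blood pressure for each drug type
mean_by_drug <- aggregate(sbp ~ drug, data = exercise, FUN = mean)

# Two-sample t-test comparing the drug types (Welch's test by default)
sbp_test <- t.test(sbp ~ drug, data = exercise)
sbp_test$p.value
```

The same pattern applies to diastolic blood pressure by swapping the outcome column.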