All content following this page was uploaded by Satyanarayana Labani on 02 September 2015.
study is transformed into Z (Z = [x − µ]/σ); the converted Z has zero mean and variance one. When the population SD is not available or unknown and is replaced by the sample SD, the distribution is almost the same but has a different name, Student's t-distribution [Figure 3]. For large samples, t-tests give virtually identical results to normal tests. Unlike the normal distribution, the t-distribution uses a quantity called degrees of freedom (df), which depends on the sample size. In general, df is the sample size minus the number of parameters under estimation. For estimating the mean with a sample of n observations, the df is n − 1.

Estimation
Sample estimates such as a mean or proportion tend to vary from sample to sample due to sampling variability or sampling fluctuation. It is important to understand how much uncertainty is attached to point estimates such as a mean or proportion. The measure of sampling variability, called the standard error (SE), can be estimated for a mean, a proportion, or any other estimate of interest. If the estimate is reported as an interval rather than a single value, such an interval is called a confidence interval (CI). An example of computation of the SE of a mean and the 95% CI of a mean is illustrated below. Replacing SD with SE in the expression mean ± 2SD provides the 95% CI for the mean (i.e. mean ± 2SE).

Suppose we have data on the cholesterol level in 300 children of 3–12 years of age; what is the 95% CI of the mean? The computed mean and SD of the cholesterol level are 130 and 25, respectively. The SE of the mean for the sample size of n = 300 is computed as SD/√(n) or 25/√(300) or 1.44. The 95% CI for the mean cholesterol level is (mean ± 2SE) or (mean − 2SE) to (mean + 2SE) or (130 − 2 × 1.44) to (130 + 2 × 1.44) or 127–133. This is interpreted as a 95% chance that the population mean cholesterol level in children of 3–12 years of age is included in the interval (127–133). The interval (mean ± 2SD) or 130 ± 2 × 25 or (130 − 50) to (130 + 50) or 80–180 is the 95% range of observations in the sample under study. This shows the clear distinction between the interval of observations and the CI.

Concept of statistical significance and P value
Statistical significance is closely related to a confidence statement such as the 95% CI. A threshold of 95% confidence indicates that there remains an uncertainty of 5%, which defines a critical region that becomes the basis for hypothesis testing. In statistical inference, hypotheses are formulated so that the hypothesis to be tested can be refuted; this is called the null hypothesis or statistical hypothesis. The "null" indicates zero: in the null hypothesis either no difference or zero difference is assumed. For any null hypothesis there could be a one-sided or a two-sided alternative. Suppose our interest is to examine whether the hemoglobin (Hb) level of children with chronic diarrhea is the same as that of healthy children. This is an example of a one-sided test because the Hb level in chronic diarrhea is not expected to be higher than the normal Hb level among healthy children. An example of a two-sided hypothesis is a comparison of the Hb level in children undergoing two types of feeding practices. Figures 4-6 depict errors in decision making in the context of marketing a new drug, diagnostic, and statistical test settings. Wrongly rejecting a true null hypothesis is an error (type I) in statistical decision-making.
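The cholesterol example from the Estimation section (n = 300, mean = 130, SD = 25) can be checked with a few lines of Python. This is only a minimal sketch: the function names `mean_confidence_interval` and `observation_range` are illustrative and not from the article, and the multiplier 2 follows the article's approximation rather than the exact normal value of 1.96.

```python
import math

def mean_confidence_interval(mean, sd, n, multiplier=2.0):
    """Return (SE, lower, upper) for the interval mean ± multiplier × SE."""
    se = sd / math.sqrt(n)  # standard error of the mean
    return se, mean - multiplier * se, mean + multiplier * se

def observation_range(mean, sd, multiplier=2.0):
    """Return (lower, upper) for the range of observations mean ± multiplier × SD."""
    return mean - multiplier * sd, mean + multiplier * sd

se, ci_low, ci_high = mean_confidence_interval(mean=130, sd=25, n=300)
obs_low, obs_high = observation_range(mean=130, sd=25)

print(f"SE of mean = {se:.2f}")                                # 1.44
print(f"95% CI of mean = {ci_low:.1f} to {ci_high:.1f}")        # 127.1 to 132.9
print(f"95% range of observations = {obs_low:.0f} to {obs_high:.0f}")  # 80 to 180
```

The CI is much narrower than the range of observations because the SE shrinks with √n while the SD does not.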
Figure 2: Standard normal distribution (source: http://www.regentsprep.org/regents/math/algtrig/ats2/normallesson.htm. [Accessed 22 Jul 2015])

Figure 3: Standard normal and t-distribution (source: http://www.sjsu.edu/faculty/gerstman/StatPrimer/probability. [Accessed 22 Jul 2015])

Figure 4: Errors in marketing a new drug setup

| Marketing a new drug | Marketed | Not marketed |
|---|---|---|
| Drug effective | Correct decision | Error |
| Drug ineffective | Error | Correct decision |

Figure 5: Errors in diagnostic test setup

| Disease status/gold standard | Test positive | Test negative |
|---|---|---|
| Disease positive | No error (true positive) | Error (false negative) |
| Disease negative | Error (false positive) | No error (true negative) |
Figure 6: Errors in hypothesis testing setup

| Actual position | Reject null hypothesis | Do not reject null hypothesis |
|---|---|---|
| Null hypothesis false | Correct decision | Error (type II) |
| Null hypothesis true | Error (type I) | Correct decision |

The probability of a type I error is also referred to as the P value. The value of this error is generally kept at 0.05. This threshold of 5% is also called the level of significance. A result is called "statistically significant" when P < 0.05. The other important concept is not rejecting a false null hypothesis, which is another error (type II) in statistical decision-making. The type I and type II errors can be viewed as false positive and false negative, respectively, in the setting of diagnostic accuracy testing. The probability of rejecting a false null hypothesis is called the power of the test. The power is also described as the probability of getting a statistically significant result when the null hypothesis is false.

General Significance Test Procedure
The basis of any test procedure is to judge a sample mean against a hypothesized value (μ) in relation to the SE of the mean; this is the test criterion in the one-sample context. The test criterion (based on Student's t-test) is:

$$ t = \frac{\bar{x} - \mu}{\mathrm{SE}(\bar{x})} $$

with (n − 1) df. This ratio is used to reject or not to reject the null hypothesis depending on the computed value of t. The null hypothesis is rejected if the calculated value of t is more than the critical value of t given in the t-distribution table corresponding to a prefixed level of significance for either a one-tailed or a two-tailed test. A calculated value of t more than the critical value (1.96 for very large n in a two-tailed test at the 5% level) indicates that P is less than the threshold level of significance such as 0.05 (5%). A value of P less than the threshold probability (0.05) is interpreted as significant (P < 0.05). The above basic procedure is the same in any test of significance; the difference lies in the test statistic used for different comparisons. Testing a sample mean against a hypothetical population mean and testing the difference between two means are the common situations in tests of hypothesis using the t-test for quantitative data. There could be two settings in such tests of hypothesis: one is between two independent groups and the other is in paired group data. For assessing significance in qualitative data, tests such as Chi-square, Fisher's exact, and McNemar tests are used. Where quantitative data are not normally distributed, nonparametric tests such as rank tests are used in assessing statistical significance. Where more than two means are to be compared, a procedure called analysis of variance (ANOVA) is used, with the F-test for assessing overall significance and several other choices for pairwise comparisons. Critical values are available for these distributions for decision making. For details of all these tests, other references may be consulted.[3,4]

Assessment of strength of association
There are more profound uncertainties in the assessment of the relationship between disease and exposure. For categorical variables, the association between disease and exposure is measured as relative risk (RR) or risk difference, and odds ratio (OR). For continuous variables, the strength of association is measured by the correlation coefficient (R) and the coefficient of determination (R-square).

Relative Risk
The RR is measured as the ratio of the incidence rate among the exposed to that among the unexposed. The RR or rate ratio for an outcome event such as disease is calculated using the exposed and unexposed categories. Consider a prospective study following up women with HPV and without HPV to observe the outcome of the cervical precancerous state, cervical intraepithelial neoplasia (CIN). The hypothetical data tabulated on 600 women are shown in Table 3. The relation between HPV status and CIN is obtained as RR = 19.25, 95% CI = 7.1–51.9, Chi-square = 76.1, P < 0.001.

The exact P value obtained from the statistical package is very low. The RR = 19.25 is interpreted as follows: women with HPV have a 19-fold risk of developing CIN as compared to women without HPV. That the null value of RR = 1 is not included in the 95% CI also indicates the significance of RR = 19.25. The risk difference is the difference in incidence or risk between the exposed and the unexposed.

Odds Ratio
Case-control studies assess the frequency of exposure in cases with the disease and in controls without the disease. These frequencies are expressed as odds, and their ratio is the OR. The OR is approximately the same as the RR when the disease is rare. Consider a case-control study that evaluates the role of low birth weight in early neonatal mortality. Hypothetical data tabulated in 2 × 2 contingency form are shown in Table 4.

The OR, its 95% CI, and the Chi-square test are as follows. OR = (47 × 55)/(15 × 11) = 15.7 and the 95% CI of the OR is 6.5–37.4. Chi-square = 45.1 at 1 df; the P value, displayed as 0.000 by the package, is very low and can be reported as P < 0.001. The 95% CI is computed using statistical packages such as SPSS, Epi-info[5], etc. The OR = 15.7 indicates that the odds of death in neonates with low birth weight (<2000 g) are 15.7 times the odds of death in neonates without low birth weight. This is not the same as the RR.

Correlation coefficient
Scatter diagrams[4] are important for initial exploration of the relationship between two quantitative variables. The measure that assesses the strength or degree of linear (straight line) relationship between two quantitative variables is called the correlation coefficient or Pearson's correlation coefficient.
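The RR and OR results quoted for the HPV/CIN and low birth weight examples can be reproduced approximately with a short Python script. The sketch below rests on assumptions: the article only states that a statistical package was used, so the function name `two_by_two` and the log-scale 95% CIs (standard errors of ln RR and ln OR) are choices made here for illustration, and the Chi-square is the uncorrected Pearson statistic.

```python
import math

def two_by_two(a, b, c, d, z=1.96):
    """Association measures for a 2 × 2 table.

    a, b: first row (exposed) with and without the outcome
    c, d: second row (unexposed) with and without the outcome
    """
    n = a + b + c + d
    rr = (a / (a + b)) / (c / (c + d))              # ratio of risks
    se_log_rr = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
    odds_ratio = (a * d) / (b * c)                  # cross-product ratio
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
    # Pearson Chi-square (1 df, no continuity correction)
    chi2 = n * (a*d - b*c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    def log_ci(estimate, se):
        # Assumed method: symmetric interval on the log scale
        return estimate * math.exp(-z * se), estimate * math.exp(z * se)

    return {
        "rr": rr, "rr_ci": log_ci(rr, se_log_rr),
        "or": odds_ratio, "or_ci": log_ci(odds_ratio, se_log_or),
        "chi2": chi2,
    }

# Cohort data: HPV present (77 CIN, 223 non-CIN), HPV absent (4 CIN, 296 non-CIN)
hpv = two_by_two(77, 223, 4, 296)
print(f"RR = {hpv['rr']:.2f}, Chi-square = {hpv['chi2']:.1f}")

# Case-control data: <2000 g (47 deaths, 11 alive), >=2000 g (15 deaths, 55 alive)
# Only the OR is meaningful for a case-control design.
lbw = two_by_two(47, 11, 15, 55)
print(f"OR = {lbw['or']:.2f}, Chi-square = {lbw['chi2']:.1f}")
```

With these assumptions the RR interval comes out close to the reported 7.1–51.9 and the OR interval close to the reported 6.5–37.4; small differences from package output are expected when a different CI method is used.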
Table 3: Presence of HPV status in relation to CIN

| Exposure | CIN | Non-CIN | Total |
|---|---|---|---|
| HPV present | 77 | 223 | 300 |
| HPV absent | 4 | 296 | 300 |
| Total | 81 | 519 | 600 |

HPV: Human papillomavirus, CIN: Cervical intraepithelial neoplasia

Table 4: Relationship between low birth weight and early neonatal mortality

| Birth weight | Neonatal outcome: Death | Neonatal outcome: Alive | Total |
|---|---|---|---|
| <2000 g | 47 | 11 | 58 |
| ≥2000 g | 15 | 55 | 70 |
| Total | 62 | 66 | 128 |

The value of the correlation coefficient lies between −1 and +1, with the sign indicating negative or positive correlation. On the other hand, the relationship between two or more quantitative variables expressed in a structural form, for the prediction of one variable when the other variable is given, is called regression. The square of the correlation coefficient is called the coefficient of determination and is interpreted as the percent of variation in one variable (dependent) explained by the other variable (independent) on which it is regressed.

Evaluation of Diagnostic Test Performance
Statistical measures for assessing the performance of a clinical (screening/diagnostic) test are sensitivity, specificity, and positive and negative predictive values. Sensitivity and specificity are useful to identify or to rule out the disease and indicate the inherent quality of the test. These indicators do not depend on the prevalence of the disease in a population. In contrast, predictive values depend on the prevalence of the disease in the community in which the test is applied. Predictive values speak about the probability that the test will give the correct diagnosis.

Sensitivity, specificity, and predictive values: Sensitivity of a screening/diagnostic test means the ability of the test to correctly identify those patients who have the disease. The specificity of a screening/diagnostic test refers to the ability of the test to correctly rule out persons without the disease.

Positive and negative predictive values: Positive predictive value is the proportion of patients with positive test results who are correctly diagnosed. Negative predictive value is the proportion of patients with negative test results who are correctly diagnosed.

Positive and negative likelihood ratios
The positive likelihood ratio is the probability of a person who has the disease testing positive (sensitivity) divided by the probability of a person who does not have the disease testing positive (1 − specificity). The negative likelihood ratio is the probability of a person who has the disease testing negative (1 − sensitivity) divided by the probability of a person who does not have the disease testing negative (specificity).

Sample size determination
The number of subjects to be included in a sample of a research investigation is called the sample size. Sample size plays an important role in estimation and test of hypothesis. The sample size of the proposed investigation should be calculated with the help of essential information based on scientific knowledge. The determination of sample size depends on a variety of considerations. In the setup of estimation these are as follows: (i) the proposed method of sampling through which subjects are to be enrolled, as sample size is usually determined on the basis of simple random sampling, (ii) the level of precision within which the estimate is desired to fall, (iii) knowledge of variability through the SD, which is required when estimating a mean, and (iv) the desired confidence level such as 95% or 99%. In the determination of sample size for test of hypothesis situations the required considerations are: (i) the desired magnitude of difference that is considered to be clinically significant, (ii) the assumption of normal distribution of the data being considered and the extent of variability through the SD when the interest is in a quantitative variable, (iii) the level of significance or maximum tolerable type I error and the required statistical power for a specified clinically important difference, and (iv) whether the alternative hypothesis is one-tailed or two-tailed. Free statistical packages such as Epi-info and other available websites can be used to determine sample size after providing the required input for that purpose.[6]

Conclusions
Findings from medical research need to be understood in order to put the emerging evidence into medical practice. Whatever the area of medicine in which investigations are done, knowledge of biostatistics as part and parcel of research methodology is essential. The account given in this article, beginning with the research question and proceeding through the design of the study, the choice of the sample of observations, data analysis, and interpretation to the final conclusions of the research investigation, is helpful in understanding the advancement taking place in a particular area of medicine. This brief overview of the subject would serve as a quick summary of methods for UG and PG medical students in their short projects and thesis works.

Financial support and sponsorship
Nil.

Conflicts of interest
There are no conflicts of interest.

References
1. Random Number Generator. Available from: http://www.stattrek.com/statistics/random-number-generator.aspx. [Last accessed on 2015 Jul 22].
2. Sealed Envelope Ltd. Simple Randomisation Service; 2015. Available from: https://www.sealedenvelope.com/simple-randomiser/v1/. [Last accessed on 2015 Jul 22].
3. Satyanarayana L, Asthana S. Relevance of statistical significance in medical research. Ganga Ram J 2014:3;107-213.
4. Indrayan A, Satyanarayana L. Simple Biostatistics for MBBS, PG Entrance and USMLE. 4th ed. Delhi: Academa Publishers; 2013.
5. Epi Info™ 7.1.5. Available from: http://www.cdc.gov/Epiinfo/7/index.htm. [Last accessed on 2015 Jul 22].
6. Web-based Sample Size/Power Calculations. Available from: http://www.stat.ubc.ca/~rollin/stats/ssize/. [Last accessed on 2015 Jul 22].
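As a worked sketch of the measures defined under Evaluation of Diagnostic Test Performance, the Python snippet below computes sensitivity, specificity, predictive values, and likelihood ratios from the four cells of a diagnostic 2 × 2 table. The counts (90 true positives, 10 false negatives, 30 false positives, 170 true negatives) are hypothetical and chosen only for illustration, and the function names are not from the article. The second function applies Bayes' rule to show how the positive predictive value changes with prevalence while sensitivity and specificity stay fixed.

```python
def diagnostic_measures(tp, fn, fp, tn):
    """Compute test performance measures from the cells of a 2 × 2 table."""
    sensitivity = tp / (tp + fn)   # diseased persons who test positive
    specificity = tn / (tn + fp)   # non-diseased persons who test negative
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),     # positive results that are correct
        "npv": tn / (tn + fn),     # negative results that are correct
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }

def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value for a given disease prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical counts, for illustration only
m = diagnostic_measures(tp=90, fn=10, fp=30, tn=170)
for name, value in m.items():
    print(f"{name}: {value:.3f}")

# Same test applied where the disease is rare (1% prevalence)
ppv_rare = ppv_at_prevalence(m["sensitivity"], m["specificity"], 0.01)
print(f"PPV at 1% prevalence: {ppv_rare:.3f}")
```

In this hypothetical sample one third of subjects have the disease and the PPV is 0.75; at 1% prevalence the same sensitivity and specificity give a PPV below 0.06, which illustrates the dependence of predictive values on prevalence.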