Research Method - Assignmt
INSTITUTE OF TECHNOLOGY
February/2022
1. Yields in wells penetrating rock units without fractures were measured by Wright
(1985), and are given below.
Unit well yields (in gal/min/ft.) in Virginia (Wright, 1985)
0.001 0.030 0.10 0.003 0.040 0.454
0.007 0.041 0.49 0.020 0.077 1.02
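Calculate the
a) Mean
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{2.283}{12} = 0.19$
b) Geometric mean
$\sqrt[12]{0.001 \times 0.030 \times 0.10 \times 0.003 \times 0.040 \times 0.454 \times 0.007 \times 0.041 \times 0.49 \times 0.020 \times 0.077 \times 1.02}$
$= 0.043$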
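c) Median
Sorted data, $x_1$ to $x_{12}$:
0.001  0.003  0.007  0.02  0.03  0.04  0.041  0.077  0.1  0.454  0.49  1.02
For an even number of observations,
$\text{Median} = \frac{x_{(n/2)} + x_{(n/2+1)}}{2}$
Or, as the 50th percentile, the median is the item at position
$P_{0.5} = (n+1) \times 0.5 = 6.5$
which lies between $x_6$ and $x_7$; therefore
$x_6 + 0.5\,(x_7 - x_6) = 0.04 + 0.5\,(0.041 - 0.04) = 0.0405$
d) Compare these estimates of location. Why do they differ?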
They differ because the data are skewed. The more robust estimates (the geometric mean and
the median) are similar, while the mean is larger because it is pulled upward by the few high yields.
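The three estimates of location can be checked with a short script. This is a minimal sketch using only Python's standard `statistics` module; the variable names (`yields`, `gmean`) are illustrative and not part of the assignment.

```python
import statistics

# Unit well yields (gal/min/ft) from the table above (Wright, 1985)
yields = [0.001, 0.030, 0.10, 0.003, 0.040, 0.454,
          0.007, 0.041, 0.49, 0.020, 0.077, 1.02]

mean = statistics.mean(yields)             # arithmetic mean
gmean = statistics.geometric_mean(yields)  # n-th root of the product of the values
median = statistics.median(yields)         # average of the 6th and 7th sorted values

print(f"mean = {mean:.2f}, geometric mean = {gmean:.3f}, median = {median:.4f}")
```

The mean comes out several times larger than the other two estimates, which is the pattern expected for right-skewed data.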
a) Standard deviation, s
$s = \sqrt{s^2}$
where $s^2$ is the sample variance,
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
$s^2 = 0.0975$
$s = 0.31$
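b) Interquartile range, IQR
Upper quartile position: $(12+1) \times 0.75 = 9.75$
$x_9 + 0.75\,(x_{10} - x_9) = 0.1 + 0.75\,(0.454 - 0.1) = 0.3655$
Lower quartile position: $(12+1) \times 0.25 = 3.25$
$x_3 + 0.25\,(x_4 - x_3) = 0.007 + 0.25\,(0.02 - 0.007) = 0.01025$
IQR = 0.3655 − 0.01025 = 0.36
c) MAD (mean absolute deviation about the mean, 0.19)
= (|0.001−0.19| + |0.003−0.19| + |0.007−0.19| + |0.02−0.19| + |0.03−0.19| + |0.04−0.19| + |0.041−0.19|
+ |0.077−0.19| + |0.1−0.19| + |0.454−0.19| + |0.49−0.19| + |1.02−0.19|)/12
= 2.785/12
= 0.232
d) Skew
Skew = 2.07
e) Quartile skew
$\frac{(0.3655 - 0.0405) - (0.0405 - 0.01025)}{0.3655 - 0.01025} = 0.83$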
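The measures of spread and skewness can be recomputed in a few lines. This is a sketch under stated assumptions: quartiles use the $(n+1)p$ position rule (Python's `method="exclusive"`), MAD is taken as the mean absolute deviation about the mean, and the skew uses the adjusted sample coefficient $\frac{n}{(n-1)(n-2)}\sum\left(\frac{x_i-\bar{x}}{s}\right)^3$. The source does not state which skew convention it used, so that choice is an assumption.

```python
import math
import statistics

# Unit well yields (gal/min/ft) from the table above (Wright, 1985)
yields = [0.001, 0.030, 0.10, 0.003, 0.040, 0.454,
          0.007, 0.041, 0.49, 0.020, 0.077, 1.02]
n = len(yields)
mean = statistics.mean(yields)

s2 = statistics.variance(yields)  # sample variance, divisor n - 1
s = math.sqrt(s2)                 # sample standard deviation

# method="exclusive" places the quartiles at positions (n + 1) * p
q1, q2, q3 = statistics.quantiles(yields, n=4, method="exclusive")
iqr = q3 - q1

# Mean absolute deviation about the mean
mad = sum(abs(x - mean) for x in yields) / n

# Adjusted sample skewness coefficient (assumed convention)
skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in yields)

# Quartile skew: difference of the two quartile-to-median distances over the IQR
quartile_skew = ((q3 - q2) - (q2 - q1)) / (q3 - q1)

print(f"s = {s:.2f}, IQR = {iqr:.2f}, MAD = {mad:.3f}, "
      f"skew = {skew:.2f}, quartile skew = {quartile_skew:.2f}")
```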
b) The median would be a better "typical" concentration, and the IQR a better "typical"
variability, than the mean and standard deviation. This is due to the strong effect of the one
unusual point on these traditional measures.
3. The following chemical and biological data were reported by Frenzel (1988) above and
below a waste treatment plant (WTP). Graph and compare the two sets of multivariate data.
What effects has the WTP appeared to have?
Therefore, the bar chart shows that the waste treatment plant appears to have had no effect.
Correlation measures observed co-variation. It does not provide evidence for a causal
relationship between the two variables. Evidence for causation must come from outside the
statistical analysis, from knowledge of the processes involved.
The correlation coefficient lies in the range −1 ≤ ρ ≤ 1.
When one variable is a measure of time or location, correlation becomes a test for temporal
or spatial trend.
5. Are uranium concentrations correlated with total dissolved solids in the following
groundwater samples? If so, describe the strength of the relationship.
Uranium    TDS
682.65     0.9315
819.12     1.938
303.76     0.2919
1151.4     11.9042
582.42     1.5674
1043.39    2.0623
634.84     3.8858
1087.25    0.9772
1123.51    1.9354
688.09     0.4367
1174.54    10.1142
599.5      0.7551
1240.81    6.8559
538.35     0.4806
607.75     1.1452
705.89     6.0876
1290.57    10.8823
526.09     0.1473
784.68     2.6741
953.14     3.0918
1149.31    0.7592
1074.22    3.7101
1116.59    7.2446
[Figure: uranium versus total dissolved solids; x-axis TDS (0 to 14), y-axis Uranium (0 to 1400)]
The plot of uranium versus total dissolved solids looks like it could be nonlinear near
the 0 TDS boundary.
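A rank-based measure is therefore preferred, so Spearman's rho is used:
rho = 0.72
$t_r = 4.75$ (test statistic for rho, n = 23)
p < 0.001
Pearson's r
$r = \frac{\sum x_i y_i - n\,\bar{x}\,\bar{y}}{(n-1)\,s_x s_y}$, with $\sum x_i y_i = 83292.78$ and $n = 23$
6|Page HiT, SCHOOL OF WATER RESOURCE AND ENVIRONMENTAL ENGINEERING,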
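Both correlation coefficients can be recomputed from the 23 pairs in the table. The following is a minimal sketch in plain Python: the helper names `pearson` and `ranks` are illustrative, the column assignment follows the table header as printed, and the ranking helper assumes there are no tied values (which holds for this data set). The statistic $t = \rho\sqrt{n-2}/\sqrt{1-\rho^2}$ is the usual large-sample approximation.

```python
import math

# Columns as labelled in the table above
uranium = [682.65, 819.12, 303.76, 1151.4, 582.42, 1043.39, 634.84, 1087.25,
           1123.51, 688.09, 1174.54, 599.5, 1240.81, 538.35, 607.75, 705.89,
           1290.57, 526.09, 784.68, 953.14, 1149.31, 1074.22, 1116.59]
tds = [0.9315, 1.938, 0.2919, 11.9042, 1.5674, 2.0623, 3.8858, 0.9772,
       1.9354, 0.4367, 10.1142, 0.7551, 6.8559, 0.4806, 1.1452, 6.0876,
       10.8823, 0.1473, 2.6741, 3.0918, 0.7592, 3.7101, 7.2446]

def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n in ascending order; assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

n = len(uranium)
r = pearson(uranium, tds)                   # linear correlation
rho = pearson(ranks(uranium), ranks(tds))   # Spearman's rho = Pearson's r on ranks
t_rho = rho * math.sqrt(n - 2) / math.sqrt(1 - rho ** 2)

print(f"n = {n}, Pearson r = {r:.2f}, Spearman rho = {rho:.2f}, t = {t_rho:.2f}")
```

Computing rho as Pearson's r on the ranks keeps the two measures directly comparable: any difference between them reflects the nonlinearity or outlying points in the original units.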
6. Discuss simple linear regression and multiple linear regression.
7. Define outlier. What are the causes of outliers, and how do you identify outliers in a
data set?
Outliers, observations whose values are quite different than others in the data set,
often cause concern or alarm. Outliers may be the most important points in the data
set, and should be investigated further.
Outliers can have one of three causes:
1. A measurement or recording error.
2. An observation from a population not similar to that of most of the data, such as a
flood caused by a dam break rather than by precipitation.
3. A rare event from a single population that is quite skewed.
The graphical methods are very helpful in identifying outliers. Whenever outliers
occur, first verify that no copying, decimal point, or other obvious error has been
made. If not, it may not be possible to determine if the point is a valid one.
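One common graphical method is the boxplot, which marks points lying more than 1.5 times the IQR beyond the quartiles. The sketch below applies that rule to the well yield data from question 1. The 1.5 multiplier is the usual boxplot convention and is an assumption here, not something the text above specifies; `boxplot_outliers` is a hypothetical helper name.

```python
import statistics

def boxplot_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]

# Unit well yields (gal/min/ft) from question 1 (Wright, 1985)
yields = [0.001, 0.030, 0.10, 0.003, 0.040, 0.454,
          0.007, 0.041, 0.49, 0.020, 0.077, 1.02]

flagged = boxplot_outliers(yields)
print(flagged)
```

A flagged point is only a candidate: as noted above, it should first be checked for copying or decimal point errors before deciding how to treat it.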
Answer: A quality assurance process comprises the methods followed to achieve a specified
quality and the methods used to check the quality of an existing data set.
Measurement (measured parameter checks): These tests represent the heart of the data
validation process and normally consist of range tests, relational tests, and trend tests.
Data acquisition is a method to acquire and store all types of data from a wide range of
parameters.
Data integration: Data integration procedures are the methods for combining various data
sets into a unified, geographically harmonious data set.
Data validation is defined as the inspection of all the collected data for completeness and
reasonableness, and the elimination of erroneous values. This step transforms raw data into
validated data.
Correction: Proper correction of data is crucial for data processing. To obtain good-quality
data, it is necessary to go through each record.
Dissemination: This means that the data can be accessed through special views so that
users can execute queries.
Use: Handling of observational data until they are in a form ready to be used for a specific
purpose. To make proper use of data, they should be stored in such a way that all possible
errors are avoided and the data are easily accessible.
Variability check: To check for obvious errors in water level, discharge, groundwater,
sediment, rainfall, and evaporation data, three main characteristics of time series data are
used as criteria. These are:
Quality check: If too many errors crop up, or if the surveyed area has changed greatly, the
work is updated and corrected. Preliminary data checking can be done in different ways, such
as:
a. cross checking
b. general observation
c. graphical presentation, and
d. simple statistical analysis
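The range, relational, and trend tests named under measured parameter checks can be sketched as simple screening functions. This is an illustrative sketch only: the function names, the thresholds, and the two water level series are hypothetical and would need to be set from knowledge of the actual station and parameter.

```python
def range_test(values, lower, upper):
    """Return indices of values outside the plausible range [lower, upper]."""
    return [i for i, v in enumerate(values) if not lower <= v <= upper]

def relational_test(a, b, max_difference):
    """Return indices where two related series differ by more than max_difference."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if abs(x - y) > max_difference]

def trend_test(values, max_step):
    """Return indices where the change from the previous value exceeds max_step."""
    return [i for i in range(1, len(values))
            if abs(values[i] - values[i - 1]) > max_step]

# Hypothetical water levels (m) at two neighbouring gauges
gauge_a = [2.10, 2.12, 2.15, 9.99, 2.18, 2.20]
gauge_b = [2.05, 2.08, 2.11, 2.13, 2.15, 2.60]

range_flags = range_test(gauge_a, lower=0.0, upper=5.0)
relational_flags = relational_test(gauge_a, gauge_b, max_difference=0.25)
trend_flags = trend_test(gauge_a, max_step=1.0)

print(range_flags, relational_flags, trend_flags)
```

Each function only flags suspect records; deciding whether a flagged value is erroneous remains part of the validation and correction steps described above.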