Data analysis for sampling

Apr 21, 2013

© Attribution Non-Commercial (BY-NC)


DATA ANALYSIS

(Part 1)

HANIM AWAB Department of Chemistry Faculty of Science UTM

assessment on fewer data (generally >25) or data accumulated from the analysis of similar samples
The problem is examined with respect to the precision, accuracy and reliability required of the results
Analysis of the results obtained is resolved into two stages:
- examination of the reliability of the results
- assessment of the meaning of the results

TYPES OF ERROR

1. GROSS ERROR (eg. contaminated reagents, faulty instrument)
- Serious, obvious errors that give outlier readings
- Detectable with sufficient replicate measurements
- Experiments with gross errors must be repeated

2. RANDOM/INDETERMINATE ERROR (eg. inaccurate manipulation of procedure)
- Data scattered symmetrically about a mean value
- Deviations of measurements from the mean are described by the Gaussian or normal error curve
- Cannot be eliminated but can be minimized
- Error can be assessed by statistical tests

and calculate the size of the errors

minimized and approximated to an acceptable precision

Some ways to overcome errors:
- Carry out replicate measurements
- Analyse accurately using known standards or standard reference materials (SRM)
- Perform statistical tests on data

3. SYSTEMATIC/DETERMINATE ERROR (operator error, instrument error, method error)
- All data too high or too low, or the error increases with the magnitude of the measurement
- Causes bias in the technique (either +ve or -ve)
- Affects accuracy
- May be detected by:
  - blank determinations
  - analysis of standard samples
  - independent analyses by alternative/dissimilar methods
- Can be avoided or eliminated by correcting instrument, method and personal errors*

*Ways to minimize/eliminate systematic errors

Instrument errors:
- Careful recalibration and good maintenance of apparatus (eg glassware) and instruments (eg AAS, GC)

Method errors:
- Analysis of standard reference materials (SRM)
- Use 2 or more independent methods
- Analysis of blanks

Personal errors:
- Training of operator, care and self-discipline

when a standard sample is analyzed (value estimated from results of varying precision depending on the method used)
Accuracy - nearness of a measurement or result to the true value (expressed in terms of error)
Precision - variability of a measurement (standard deviations are precision indicators)
Spread - difference between the highest and lowest results in a set (spread is a measure of precision)
Mean - average of a replicate set of results
Median - middle value of a replicate set of results

Degrees of Freedom - number of independent results in a set (each time another quantity is derived from the set, the degrees of freedom are reduced by 1)
Range - difference between the highest and lowest value of the results
Deviation - difference, with respect to sign, between an individual result and the mean or median of the set
Standard Deviation (s or σ) - measure of the spread of the individual results about the mean
Relative Standard Deviation (RSD) - also known as the coefficient of variation, often used in comparing precisions
Variance (V) - square of the standard deviation (σ² or s²)

Determinations/Formula

MEAN (AVERAGE)
Sum of all measurements divided by the number of measurements, N:

$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$$

MEDIAN
Arrange the data in ascending order. If the number of data is odd, the middle value is the median; if it is even, the median is the average of the two middle values.

STANDARD DEVIATION
Measure of spread about the mean; estimates the variability of an individual measurement. (The standard deviation is better estimated by pooling results from more than one set.)

Sample standard deviation (N - 1 = degrees of freedom):

$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$$

Population standard deviation (N = number of replicates, μ = population mean):

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

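RELATIVE STANDARD DEVIATION (RSD) / COEFFICIENT OF VARIATION (CV)
Standard deviation divided by the mean, RSD = s / x̄, often quoted as a percentage. Being a ratio, it is dimensionless, which makes precisions measured in different units comparable.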

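VARIANCE
The square of the standard deviation
- Sample variance (N ≤ 30): V = s²
- Population variance (large N): V = σ²

Example: Se content of nine samples

| Sample | Se (mg/g) | (xi - mean)² |
|---|---|---|
| 1 | 0.07 | 4.9 x 10⁻⁵ |
| 2 | 0.07 | 4.9 x 10⁻⁵ |
| 3 | 0.08 | 9.0 x 10⁻⁶ |
| 4 | 0.07 | 4.9 x 10⁻⁵ |
| 5 | 0.07 | 4.9 x 10⁻⁵ |
| 6 | 0.08 | 9.0 x 10⁻⁶ |
| 7 | 0.08 | 9.0 x 10⁻⁶ |
| 8 | 0.09 | 1.69 x 10⁻⁴ |
| 9 | 0.08 | 9.0 x 10⁻⁶ |

Mean = Σxi/N = 0.077
Σ(xi - mean)² = 4.01 x 10⁻⁴

$$\mathrm{S.D.} = s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N - 1}} = \sqrt{\frac{4.01 \times 10^{-4}}{8}} = 0.007$$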
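The Se calculation can be checked with a few lines of Python. This is a minimal sketch using only the standard-library `statistics` module; the variable names are illustrative and not part of the slides. It works from the unrounded mean, so the intermediate sum of squares differs slightly from the slide's 4.01 x 10⁻⁴, but s still rounds to 0.007.

```python
import statistics

# Se results (mg/g) for the nine samples in the worked example
se = [0.07, 0.07, 0.08, 0.07, 0.07, 0.08, 0.08, 0.09, 0.08]

mean = statistics.mean(se)
median = statistics.median(se)
s = statistics.stdev(se)            # sample standard deviation (N - 1 in the denominator)
variance = statistics.variance(se)  # sample variance, V = s squared
rsd_percent = 100 * s / mean        # relative standard deviation as a percentage

print(f"mean = {mean:.3f}, median = {median:.2f}")
print(f"s = {s:.4f}, V = {variance:.2e}, RSD = {rsd_percent:.1f}%")
```

`statistics.pstdev` and `statistics.pvariance` give the population versions (N in the denominator) if the data set is treated as the whole population.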

STD. DEV. FOR POOLED DATA (s_pooled)
To obtain a good approximation to σ when each set is small (N ≤ 30), it is sometimes necessary to pool data from a number of sets of measurements. Suppose there are t small sets of data, comprising N1, N2, ..., Nt measurements. The resultant sample standard deviation is:

$$s_{pooled} = \sqrt{\frac{\sum_{i=1}^{N_1}(x_i - \bar{x}_1)^2 + \sum_{j=1}^{N_2}(x_j - \bar{x}_2)^2 + \sum_{k=1}^{N_3}(x_k - \bar{x}_3)^2 + \cdots}{N_1 + N_2 + N_3 + \cdots - t}}$$

Example: analysis of 6 bottles for sugar

| Bottle | Sugar (%) | Obs | Deviations from mean |
|---|---|---|---|
| 1 | 0.94 | 3 | 0.05, 0.10, 0.08 |
| 2 | 1.08 | 4 | 0.06, 0.05, 0.09, 0.06 |
| 3 | 1.20 | 5 | 0.05, 0.12, 0.07, 0.00, 0.08 |
| 4 | 0.67 | 4 | 0.05, 0.10, 0.06, 0.09 |
| 5 | 0.83 | 3 | 0.07, 0.09, 0.10 |
| 6 | 0.76 | 4 | 0.06, 0.12, 0.04, 0.03 |

Sum of squares for bottle 1 = (0.05)² + (0.10)² + (0.08)² = 0.0189, so s1 = √(0.0189/2) = 0.097
Total sum of squares for the 6 bottles = 0.1326; total number of observations = 23

$$s_{pooled} = \sqrt{\frac{0.1326}{23 - 6}} = 0.088\%$$
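The pooled calculation for the sugar data can be sketched in Python as follows. The function name `pooled_std` and the dictionary layout are illustrative choices, not something the slides prescribe; the deviations are those listed for the six bottles.

```python
import math

# Deviations from each bottle's own mean, as listed for the six bottles
deviations = {
    1: [0.05, 0.10, 0.08],
    2: [0.06, 0.05, 0.09, 0.06],
    3: [0.05, 0.12, 0.07, 0.00, 0.08],
    4: [0.05, 0.10, 0.06, 0.09],
    5: [0.07, 0.09, 0.10],
    6: [0.06, 0.12, 0.04, 0.03],
}

def pooled_std(dev_sets):
    """Pooled standard deviation from the per-set deviations."""
    sum_sq = sum(d * d for devs in dev_sets for d in devs)
    n_total = sum(len(devs) for devs in dev_sets)
    dof = n_total - len(dev_sets)  # one degree of freedom is lost per set mean
    return math.sqrt(sum_sq / dof)

s_pooled = pooled_std(list(deviations.values()))
print(f"s_pooled = {s_pooled:.3f} %")
```

Pooling uses 17 degrees of freedom here (23 observations minus 6 set means), which is why it gives a steadier estimate than any single bottle's s.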

Solve this Problem
Given a set of diameters of four cells in units of µm: 120, 135, 160, 150
(a) Use the functions available in your calculator
(b) Use an Excel spreadsheet (in your own time; submit the data and a printout of the results)
Calculate the following:
- Mean
- Median
- Standard Deviation
- Relative Standard Deviation (RSD)
- Variance

PRECISION

- Reproducibility (repeatability) of repeated measurements, ie how similar are values obtained in exactly the same way?
- Usefully measured by the deviation from the mean:

$$d_i = x_i - \bar{x}$$

ACCURACY
Nearness (proximity) to the true value, ie a measure of the agreement between the experimental mean and the true value (which may not be known!)
Measures of accuracy (x_t = true or accepted value):

- Absolute error: $E = x_i - x_t$
- Relative error: $E_R = \left|\dfrac{x_i - x_t}{x_t}\right| \times 100\%$
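As a small illustration of the two accuracy measures, the sketch below assumes the usual definitions (absolute error as measured minus true value, relative error as its magnitude divided by the true value, in percent). The function names and the numbers are hypothetical and are not taken from the slides.

```python
def absolute_error(x, x_true):
    """Signed difference between a result and the true (accepted) value."""
    return x - x_true

def relative_error_percent(x, x_true):
    """Magnitude of the error relative to the true value, in percent."""
    return abs(x - x_true) / abs(x_true) * 100

# Illustrative values only (hypothetical, not from the slides)
measured, true_value = 10.10, 10.00
print(f"E = {absolute_error(measured, true_value):+.2f}")
print(f"E_R = {relative_error_percent(measured, true_value):.1f}%")
```

Keeping the sign on the absolute error shows the direction of the bias, while the relative error is reported as a magnitude.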

Discussion Question 1
Four students analyzed the Fe content in a sample. Each student performed 5 replicates and the results are illustrated below. Comment on the accuracy and precision of each set of results (Hint: Student C obtained the best results)

[Figure: replicate results of students A, B, C and D plotted on a scale from 9.80 to 10.20 with the true value marked; means listed as 10.10, 9.90, 10.01, 10.01]

Discussion Question 2
- Comment on the accuracy and precision of the following results. Explain or show proof.
- Which set of data has to be thrown out (discarded)? Why?

| Student | Data values | Mean (x̄) | s |
|---|---|---|---|
| A | 10.00, 10.00, 10.00, 10.00, 10.00 | 10.00 | 0.00 |
| B | 10.10, 10.08, 10.09, 10.07, 10.08 | 10.10 | 0.01 |
| C | 9.65, 9.75, 9.78, 10.07, 10.24 | 9.90 | 0.25 |
| D | 9.97, 9.98, 10.02, 10.03, 10.05 | 10.01 | 0.03 |
| E | 9.80, 9.89, 10.01, 10.13, 10.22 | 10.01 | 0.17 |

CONFIDENCE LIMIT & CONFIDENCE INTERVAL
- Confidence Interval (CI) is the range of values surrounding the sample mean, x̄, within which the population mean, μ, is expected to lie with a certain degree of probability
- The boundaries of the range are called the Confidence Limits
- Confidence Level (CL) is the probability that the true mean lies within a certain interval (expressed as %)

Example: It is 99% probable that μ for a set of measurements is 7.25 ± 0.15 mg. Thus, the true mean should lie in the interval from 7.10 mg to 7.40 mg with 99% probability

CI for a large number of data (N > 30) with known population standard deviation, σ:

$$\mu = \bar{x} \pm \frac{z\sigma}{\sqrt{N}}$$

CI for a small number of data (N ≤ 30) where σ is not known (only s is known):

$$\mu = \bar{x} \pm \frac{ts}{\sqrt{N}}$$

N = number of measurements
z = value from the normal distribution curve (read from the z-table)
t = value from Student's t distribution, which depends on the degrees of freedom, N - 1 (read from the t-table); Student's t is also generally used in hypothesis tests

Values of z for determining confidence limits

| Confidence level (%) | z |
|---|---|
| 50 | 0.67 |
| 68 | 1.0 |
| 80 | 1.29 |
| 90 | 1.64 |
| 95 | 1.96 |
| 96 | 2.00 |
| 99 | 2.58 |
| 99.7 | 3.00 |
| 99.9 | 3.29 |

Values of t for determining confidence limits

| Degrees of freedom (N - 1) | 80% | 90% | 95% | 99% |
|---|---|---|---|---|
| 1 | 3.08 | 6.31 | 12.7 | 63.7 |
| 2 | 1.89 | 2.92 | 4.30 | 9.92 |
| 3 | 1.64 | 2.35 | 3.18 | 5.84 |
| 4 | 1.53 | 2.13 | 2.78 | 4.60 |
| 5 | 1.48 | 2.02 | 2.57 | 4.03 |
| 6 | 1.44 | 1.94 | 2.45 | 3.71 |
| 7 | 1.42 | 1.90 | 2.36 | 3.50 |
| 8 | 1.40 | 1.86 | 2.31 | 3.36 |
| 9 | 1.38 | 1.83 | 2.26 | 3.25 |
| 19 | 1.33 | 1.73 | 2.10 | 2.88 |
| 59 | 1.30 | 1.67 | 2.00 | 2.66 |
| ∞ | 1.29 | 1.64 | 1.96 | 2.58 |

SAMPLE QUESTION (CONFIDENCE INTERVAL)
Calculate the confidence interval (CI) at the 90%, 95% and 99% confidence levels given the following data for the analysis of Ca in a rock sample: 14.35, 14.41, 14.40, 14.32, 14.37

Mean = 14.37, s = 0.037
From the t-table, at the 95% confidence level with N - 1 = 4: t = 2.78

$$\mathrm{CI} = \bar{x} \pm \frac{ts}{\sqrt{N}} = 14.37 \pm \frac{2.78 \times 0.037}{\sqrt{5}} = 14.37 \pm 0.05$$

The confidence interval is 14.37 ± 0.05, or 14.32 < μ < 14.42

Summary of results (calculate the rest by yourselves):

| Confidence level | Confidence interval (CI) |
|---|---|
| 90% | μ = 14.37 ± 0.04 |
| 95% | μ = 14.37 ± 0.05 |
| 99% | μ = 14.37 ± 0.08 |

If the confidence level increases, the CI widens, and the probability that μ lies within the interval also increases
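The Ca example can be reproduced with a short Python sketch. The t values are copied from the 4-degrees-of-freedom row of the t-table; the function name `confidence_interval` is an illustrative choice. Because the code keeps the unrounded s (about 0.0367 rather than 0.037), the half-widths are printed to three decimals and agree with the slide's figures to within rounding.

```python
import math
import statistics

# Two-sided t values for 4 degrees of freedom, copied from the t-table
T_4_DOF = {90: 2.13, 95: 2.78, 99: 4.60}

ca = [14.35, 14.41, 14.40, 14.32, 14.37]

def confidence_interval(data, t):
    """Return (mean, half-width) of the interval mean ± t*s/sqrt(N)."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    half_width = t * s / math.sqrt(len(data))
    return mean, half_width

for level, t in T_4_DOF.items():
    mean, hw = confidence_interval(ca, t)
    print(f"{level}%: {mean:.2f} ± {hw:.3f}")
```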

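AAS analysis of Cu in aircraft engine oil gave a mean value of 8.53 mg Cu/mL. Pooled results of many analyses showed that s = 0.32 mg Cu/mL. Calculate the confidence intervals (CI) at the 90% and 99% confidence levels based on (a) 1, (b) 4 and (c) 16 measurements

Because s comes from the pooled results of many analyses, t is read at infinite degrees of freedom (t = z): 1.64 at 90% and 2.58 at 99%

$$\mathrm{CL} = \bar{x} \pm \frac{ts}{\sqrt{N}}$$

(a) N = 1
90%, CL = 8.53 ± 0.52 mg Cu/mL
99%, CL = 8.53 ± 0.83 mg Cu/mL

(b) N = 4
90%, CL = 8.53 ± 0.26 mg Cu/mL
99%, CL = 8.53 ± 0.41 mg Cu/mL

(c) N = 16
90%, CL = 8.53 ± 0.13 mg Cu/mL
99%, CL = 8.53 ± 0.21 mg Cu/mL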
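A minimal sketch of the Cu calculation follows. It assumes that the pooled s is a good estimate of σ, so the z values (1.64 and 2.58) are used. The helper name `half_width` is illustrative.

```python
import math

Z = {90: 1.64, 99: 2.58}    # z values from the z-table
x_bar, sigma = 8.53, 0.32   # mg Cu/mL; pooled s is taken as sigma (assumption)

def half_width(z, sigma, n):
    """Half-width of the confidence interval, z*sigma/sqrt(N)."""
    return z * sigma / math.sqrt(n)

for n in (1, 4, 16):
    for level, z in Z.items():
        print(f"N = {n:2d}, {level}%: {x_bar} ± {half_width(z, sigma, n):.2f} mg Cu/mL")
```

Quadrupling the number of measurements halves the interval, because the half-width scales with 1/√N.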

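Analysis of an insecticide gave the following values for % of Lindane: 7.47, 6.98, 7.27. Calculate the CL for the mean value at the 90% confidence level

$$\bar{x} = \frac{\sum x_i}{N} = \frac{21.72}{3} = 7.24$$

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N - 1}} = 0.25$$

At the 90% confidence level with N - 1 = 2, t = 2.92:

$$\mathrm{CL} = \bar{x} \pm \frac{ts}{\sqrt{N}} = 7.24 \pm \frac{2.92 \times 0.25}{\sqrt{3}} = 7.24 \pm 0.42$$

OTHER USAGE OF CONFIDENCE INTERVAL
- To determine the number of replicates (N) needed for the mean to be within the confidence interval
- To determine systematic error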

Example 1: Calculate the number of replicates needed to bring the confidence interval to within ± 1.5 µg/mL at the 95% confidence level, given s = 2.4 µg/mL
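The slide does not show the working for Example 1. One common reading is to ask for the smallest N whose 95% limits are no wider than ± 1.5, treating s as a good estimate of σ so that z = 1.96 applies. Under that assumption, rearranging z·s/√N ≤ 1.5 gives N ≥ (z·s/1.5)². The function name is illustrative.

```python
import math

def replicates_needed(z, s, half_width):
    """Smallest N for which z * s / sqrt(N) <= half_width (assumes s ~ sigma)."""
    return math.ceil((z * s / half_width) ** 2)

n = replicates_needed(z=1.96, s=2.4, half_width=1.5)
print(f"replicates needed: {n}")
```

If s cannot be treated as σ, t depends on N itself and the calculation has to be iterated with the t-table.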

Example 2: Ten measurements on a sample gave a mean of 0.461, with a standard deviation of 0.003. A solution gave a reading of 0.470. Show whether systematic error exists at the 95% confidence level

At the 95% confidence level with N - 1 = 9, t = 2.26

$$\mu = \bar{x} \pm \frac{ts}{\sqrt{N}} = 0.461 \pm \frac{2.26 \times 0.003}{\sqrt{10}} = 0.461 \pm 0.002$$

This means 0.459 < μ < 0.463, ie 95% of the time the true value lies between 0.459 and 0.463. The reading 0.470 is NOT in this range, therefore systematic error EXISTS
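The check in Example 2 amounts to building the 95% limits and testing whether the reading falls inside them. A minimal sketch with illustrative names:

```python
import math

def confidence_limits(mean, s, n, t):
    """Lower and upper confidence limits, mean ± t*s/sqrt(N)."""
    hw = t * s / math.sqrt(n)
    return mean - hw, mean + hw

low, high = confidence_limits(mean=0.461, s=0.003, n=10, t=2.26)
reading = 0.470
systematic_error = not (low <= reading <= high)

print(f"{low:.3f} < mu < {high:.3f}; systematic error: {systematic_error}")
```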

DISTRIBUTION OF ERRORS

The NORMAL or GAUSSIAN distribution (bell shaped, symmetrical curve) gives limits within which the population mean (μ) is expected to lie with a given degree of probability (in the absence of systematic error)

[Figure: three normal error curves, dN/N plotted against deviation from the mean on an axis from -4s to +4s, with the central areas marked for 50% (±0.67s), 80% (±1.29s) and 95% (±1.96s)]

Based on the curve, the percentages of area under the curve between certain limits of z are as follows:
- 50% of the area lies between ±0.67s
- 80% " ±1.29s
- 90% " ±1.64s
- 95% " ±1.96s
- 99% " ±2.58s

When we say that at a confidence level of 80% the confidence limits are ±1.29s, we mean that:
- 80% of the time the true mean will lie within ±1.29s of the measurements made
- or, in other words, 20% of the time the true mean will NOT lie within ±1.29s
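The quoted areas can be checked numerically. For a normal distribution, the fraction of the area within ±z standard deviations is erf(z/√2). The sketch below evaluates this with the standard-library `math.erf`; the function name is illustrative.

```python
import math

def area_within(z):
    """Fraction of a normal distribution lying within ±z standard deviations."""
    return math.erf(z / math.sqrt(2))

for z in (0.67, 1.29, 1.64, 1.96, 2.58):
    print(f"±{z:.2f}s encloses {100 * area_within(z):.1f}% of the area")
```

The computed areas match the listed percentages to within the rounding of the z values.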

SIGNIFICANCE TESTS

Tests whether the difference between two results is significant (or merely due to random variations)
- used to decide whether the difference between the measured and known values can be explained by random errors

The NULL HYPOTHESIS, Ho
- If Ho is accepted: there is NO significant difference between the observed and known values (other than that due to random variation)
- If Ho is rejected: the difference is significant

Has two uses:

(1) Comparison of the true value, μ, and the mean, x̄, to detect whether the difference is significant
- Used to detect the existence of systematic error or bias
- Calculate t (generally for the 95% confidence level) by rearranging the confidence-limit expression μ = x̄ ± ts/√N:

$$t_{calc} = \frac{|\bar{x} - \mu|\sqrt{N}}{s}$$

- If tcalc < tcritical (ie tcalc < ttable), ACCEPT the null hypothesis, Ho: μ = x̄
- Accepting Ho means that there is NO significant difference (or no systematic error) at the 95% confidence level, but there is a 5% probability that there is a significant difference

(2) Comparison of the means (x̄1 and x̄2) of two samples
- eg compare the mean of a new method with that of a reference (or standard) method
- Accept the null hypothesis (Ho) if there is NO significant difference between the methods, ie the results are the same, or x̄1 - x̄2 = 0
- Calculate t; if tcalc < ttable, accept Ho to show that there is NO significant difference in the results
- Use the pooled estimate of the standard deviation:

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
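Both uses of the t-test can be sketched in Python. The one-sample statistic is the confidence-limit expression rearranged for t. The two-sample statistic shown is the standard pooled form, |x̄1 - x̄2| / (s_pooled·√(1/n1 + 1/n2)). The slides do not print either formula, so both are assumptions about what is intended, and all function names are illustrative. The one-sample call reuses the numbers from Example 2 (mean 0.461, s 0.003, N 10, reading 0.470).

```python
import math

def t_one_sample(mean, mu, s, n):
    """t for comparing an experimental mean with a known value mu."""
    return abs(mean - mu) * math.sqrt(n) / s

def pooled_s(s1, n1, s2, n2):
    """Pooled standard deviation of two samples."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def t_two_sample(mean1, s1, n1, mean2, s2, n2):
    """t for comparing two means using the pooled standard deviation."""
    sp = pooled_s(s1, n1, s2, n2)
    return abs(mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))

# Use (1): numbers from Example 2; t_table is for 95% and 9 degrees of freedom
t_calc = t_one_sample(mean=0.461, mu=0.470, s=0.003, n=10)
t_table = 2.26
print(f"t_calc = {t_calc:.2f}:", "reject Ho" if t_calc > t_table else "accept Ho")
```

For use (2), `t_two_sample` is compared with the table value at n1 + n2 - 2 degrees of freedom.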

The F Test

- F is the ratio of two sample variances:

$$F = \frac{s_1^2}{s_2^2}$$

- One-tailed test: tests whether method A is more precise than method B (assumes in advance that only A can be the more precise)
- Two-tailed test: tests whether methods A and B differ in their precision (ie either method may be the more precise)

Ho: the population variances are equal (σ1²/σ2² = 1)
[F is always > 1, thus the smaller variance, ie the more precise method, is always the denominator]
If Fcalc < Ftable, accept Ho, which means that there is NO significant difference in precision between the two methods; the critical value is read from the F-table

Example Question: ONE-TAILED F TEST
A proposed method for the COD of wastewater was compared with a standardized method. The results are as follows:
- Standardized method (8 determinations): mean = 72 mg/L, s = 3.31 mg/L
- Proposed method (9 determinations): mean = 72 mg/L, s = 1.51 mg/L (set as denominator)

Is the proposed method significantly more precise than the standardized method?

$$F = \frac{s_{Std}^2}{s_{Prop}^2} = \frac{(3.31)^2}{(1.51)^2} = 4.8$$

Data values: 8 for the standardized and 9 for the proposed method, thus the degrees of freedom (N - 1) are 7 for the numerator and 8 for the denominator; from the F-table, Fcrit = 3.50
Since Fcalc > Ftable, reject Ho. Thus there is a significant difference between the methods, and the proposed method is significantly more precise
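The COD comparison reduces to one ratio and one table lookup. A minimal sketch, with the critical value 3.50 taken from the example and an illustrative function name:

```python
def f_ratio(s_a, s_b):
    """F = larger variance / smaller variance, so that F >= 1."""
    v_a, v_b = s_a**2, s_b**2
    return max(v_a, v_b) / min(v_a, v_b)

f_calc = f_ratio(3.31, 1.51)   # standardized vs proposed method, mg/L
f_table = 3.50                 # 7 (numerator) and 8 (denominator) degrees of freedom
significant = f_calc > f_table

print(f"F = {f_calc:.2f}; significant difference in precision: {significant}")
```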

Example: Determination of CO using a standard procedure gave an s value of 0.21 ppm. The method was modified twice, giving s1 of 0.15 and s2 of 0.12 (both with 9 degrees of freedom). Are the modified methods significantly more precise than the standard?

Ho: s1 = sstd and Ho: s2 = sstd

$$F_1 = \frac{s_{std}^2}{s_1^2} = \frac{(0.21)^2}{(0.15)^2} = 1.96 \qquad F_2 = \frac{s_{std}^2}{s_2^2} = \frac{(0.21)^2}{(0.12)^2} = 3.06$$

In standard methods the number of data is large, thus s → σ and the degrees of freedom become infinite. From the F-table, with numerator = ∞ and denominator = 9, Fcrit = 2.71
F1 < Ftable: accept Ho, but F2 > Ftable: reject Ho
Only the 2nd modified method is significantly more precise than the standard method
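The same comparison for the two modified CO methods can be written as a short loop. The dictionary keys and variable names are illustrative; the s values and Fcrit = 2.71 are those given in the example.

```python
def f_ratio(s_num, s_den):
    """F statistic with the standard method's variance in the numerator."""
    return s_num**2 / s_den**2

s_std = 0.21                 # ppm, standard procedure
modified = {"modification 1": 0.15, "modification 2": 0.12}
f_table = 2.71               # numerator dof = infinity, denominator dof = 9

results = {}
for name, s_mod in modified.items():
    f_calc = f_ratio(s_std, s_mod)
    results[name] = (f_calc, f_calc > f_table)
    print(f"{name}: F = {f_calc:.2f}, significantly more precise: {f_calc > f_table}")
```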

The Q TEST or DIXON'S TEST (Detection of gross errors)
The Q-test is used for detecting an outlier (suspected unreasonable datum) which statistically does not belong to the set

Example: 10.05, 10.10, 10.15, 10.05, 10.45, 10.10

The value 10.45 lies outside the normal range (more easily observed when the numbers are arranged in decreasing or increasing order):
10.05, 10.05, 10.10, 10.10, 10.15, 10.45

Qcal is compared with Qtable and the null hypothesis, Ho, is checked

$$Q_{expt} = \frac{|\text{suspect value} - \text{nearest value}|}{\text{largest value} - \text{smallest value}} = \frac{|10.45 - 10.15|}{10.45 - 10.05} = 0.75$$

From the Q-table (at 95% and N = 6), Q = 0.625 (see the Q TABLE)
Qcal > Qtable, so the datum (10.45) can be rejected

will change from the original value if changed!)


Q TABLE

| No. of observations | 90% | 95% | 99% |
|---|---|---|---|
| 3 | 0.941 | 0.970 | 0.994 |
| 4 | 0.765 | 0.829 | 0.926 |
| 5 | 0.642 | 0.710 | 0.821 |
| 6 | 0.560 | 0.625 | 0.740 |
| 7 | 0.507 | 0.568 | 0.680 |
| 8 | 0.468 | 0.526 | 0.634 |
| 9 | 0.437 | 0.493 | 0.599 |
| 10 | 0.412 | 0.466 | 0.568 |

EXAMPLE QUESTION: Q-TEST
The following data were obtained for the determination of nitrite concentration (mg/L) in a sample of river water: 0.403, 0.410, 0.401, 0.380, 0.400, 0.413, 0.411
Should the datum 0.380 be retained?

$$Q = \frac{|0.380 - 0.400|}{0.413 - 0.380} = 0.606$$

From the Q-table: sample size 7, Qtable = 0.570
Qcalc > Qtable, thus the suspect outlier is rejected
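Both Q-test examples can be run through one small function. This is a sketch under stated assumptions: it tests only the end value with the larger gap to its neighbour, it expects the data not to be all identical, and it uses the 95% column of the Q table (0.568 for N = 7, slightly different from the 0.570 quoted in the example, with the same conclusion). The names `dixon_q` and `Q_95` are illustrative.

```python
def dixon_q(data):
    """Return (suspect, Q) for the end value farthest from its nearest neighbour."""
    x = sorted(data)
    spread = x[-1] - x[0]   # largest minus smallest value (must be non-zero)
    gap_low, gap_high = x[1] - x[0], x[-1] - x[-2]
    if gap_low > gap_high:
        return x[0], gap_low / spread
    return x[-1], gap_high / spread

# Critical Q values at the 95% level, copied from the Q table
Q_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}

for data in ([10.05, 10.10, 10.15, 10.05, 10.45, 10.10],
             [0.403, 0.410, 0.401, 0.380, 0.400, 0.413, 0.411]):
    suspect, q = dixon_q(data)
    verdict = "reject" if q > Q_95[len(data)] else "retain"
    print(f"suspect {suspect}: Q = {q:.3f}, {verdict}")
```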
