Filtering Data to Find Outliers and Calculating Uncertainty of Km Value

Research · April 2016

Instructor: Prof. Dr. Ir. Harinaldi, M.Eng (HRN)
Adian Fatchur Rochim
PhD Student - Department of Electrical Engineering, Faculty of Engineering
University of Indonesia, Depok, Indonesia
Email: adian@ieee.org

C. F. Oduoza, A. A. Wragg and M. A. Patrick showed in Chemical Engineering Journal, 68 (1997) that the local coefficient of convective mass transfer in an electrochemical process between two electrodes can be determined by the limiting diffusion current method. They pointed out that the local mass transfer coefficients, $K_m$ (m/s), were then calculated using the equation:

$$K_m = \frac{I_{lim}}{zFAC} \qquad (1)$$

where $I_{lim}$ is the local limiting current (A) measured at the local cathode, $z$ is the number of electrons exchanged, $F$ is the Faraday number (96 487 A.s/mol), $A$ is the exposed mini-cathode area (m2) and $C$ is the bulk electrolyte concentration (mol/m3). (Course homework; I use the R language.)

Applying the above method, the attached time series data is the result of $I_{lim}$ measurement using a high-precision digital multimeter (the manufacturer's calibration documents note that the uncertainty is 0.5 %). If $z = 2$, $A = 1.8 \times 10^{-6}$ m2 (uncertainty 2 %) and $C = 500$ mol/m3 (uncertainty 2 %), find out:
(a) Which filtered data are suitable for further analysis?
(b) What is the uncertainty in the $K_m$ value?

I. Methodology and Data Analysis

Theorem 1. Peirce's criterion is an in-depth derivation formulated directly from the theory of probability. It was based on the following principle: "...the proposed observations should be rejected when the probability of the system of errors (my note: the actual deviations from the mean) obtained by retaining them is less than that of the system of errors obtained by their rejection multiplied by the probability of making so many, and no more, abnormal observations" [2].

1) Method: Peirce's criterion is used to discard "outlier" data.
2) Programming language: R.
3) The time series has 303 samples; for N > 60, I assume the N = 60 row of Peirce's table.
4) Remove the outlier data.
5) Calculate the standard deviation and mean.
6) Find the uncertainty of the $K_m$ value using the Combining Uncertainty - Simple Method.

The actual method of calculation used by Peirce was mathematically cumbersome, and a later paper by Gould [1] took Peirce's criterion and presented it in a more easily usable format, with tables he derived from Peirce's work. Peirce proposed three tables. The first, containing values of $x$ for the double argument $m' = m/N$ and $y = n/N$, would include the values corresponding to any given number of unknown quantities. The second table, a table of single entry, would give log (Br.) Q for different values of $y$, and the third would contain log R for argument $x$, being deduced from the solution of equation (D) [1].

II. Filter Data Using Peirce's Criterion

Algorithm 1 Peirce's Method [2]
1) Calculate the mean and the sample standard deviation (SD) of the complete data set.
2) Obtain R corresponding to the number of measurements from Peirce's table.
3) Calculate the maximum allowable deviation: $|x_i - x_m|_{max} = R \cdot SD$.
4) For any suspicious data measurements, obtain $|x_i - x_m|$.
5) Eliminate the suspicious measurements if:

$$|x_i - x_m| > |x_i - x_m|_{max} \qquad (2)$$

6) If this results in the rejection of one measurement, assume the case of two doubtful observations, keeping the original mean, standard deviation and number of measurements. Go to step 8.
7) If more than one measurement is rejected in the above test, assume the next highest number of doubtful observations.
8) Repeat the above calculations (steps 2 – 5), sequentially increasing the number of doubtful observations, until no more measurements need to be eliminated.
9) Now obtain the new value of the mean and sample standard deviation of the reduced data set.

The time series data is the result of measurement using a high-precision digital multimeter, 303 samples. Figure 1 shows the time series of the original data set. Identifying which data should be deleted or discarded from an observation is not simple. Assume a data set with a numeric variable x and n observations. We want to throw away all observations which are "not good enough". Several data samples look suspicious in this sense.
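Steps 1 to 5 of Algorithm 1 reduce to a threshold test on the absolute deviations from the mean. The following is a minimal R sketch of one such pass, not the author's program: the helper name `peirce_pass`, the sample readings and the value `R_value = 2.0` are assumptions made for illustration. In practice R must be read from Peirce's table for the actual sample size and number of doubtful observations.

```r
# One pass of the threshold test in Algorithm 1 (steps 1-5).
# peirce_pass() is a hypothetical helper, not part of any package.
peirce_pass <- function(x, R_value) {
  x_mean  <- mean(x)             # step 1: mean of the complete data set
  x_sd    <- sd(x)               # step 1: sample standard deviation
  max_dev <- R_value * x_sd      # step 3: maximum allowable deviation
  dev     <- abs(x - x_mean)     # step 4: |xi - xm| for every measurement
  keep    <- dev <= max_dev      # step 5: reject where |xi - xm| > max_dev
  list(kept = x[keep], rejected = x[!keep], max_dev = max_dev)
}

# Illustrative readings: nine values near 54.5 and one stray value.
x <- c(54.48, 54.50, 54.49, 54.51, 54.50, 54.47, 54.52, 54.49, 54.50, 55.40)

# R_value = 2.0 is a placeholder, NOT a value taken from Peirce's table.
res <- peirce_pass(x, R_value = 2.0)
print(res$rejected)
print(length(res$kept))
```

The function keeps the mean and standard deviation of the complete data set for the whole pass, which matches the instruction in Algorithm 1 to recompute them only after all rejections are done.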
Figure 1. Time Series Data Source (x-axis: time elapsed; y-axis: measurement result)

Several suspicious data points are visible in Figure 1 at times 150, 250 and 300. I try to discard the outlier data with Peirce's criterion. My program code follows the algorithm below, which is based on Peirce's criterion.

Algorithm 2 Removing Outlier

Require: data source (ds) and Peirce's R table
Calc sd, mean, and max
Calc ds's row number -> row1
rowx <- row1 + 1
rowy <- row1
while rowx ≠ rowy do
    abs_t = 0
    for i in 1:rowy do
        abs_t[i] <- abs(t[i] - mean)
    end for
    max_dev <- max(abs_t)
    Calc allowable deviation = sd * R
    Filter: remove every xi whose deviation is not below the allowable deviation
    Count the remaining rows, store to row2
    rowx <- rowy
    rowy <- row2
end while

Number of outliers removed, standard deviation and mean of the data:

1) The data were iterated; at the 5th and 6th iterations the data are the same.
2) Outliers removed = 14
3) Standard deviation = 0.071
4) Mean of the data = 54.49

The following shows the result of each iteration.

Figure 2. Iteration step-by-step

No   Iteration   Data   Outlier Removed
1    1st         300    3
2    2nd         298    2
3    3rd         295    3
4    4th         291    4
5    5th         289    2
6    6th         289    0
Total outliers removed (through the 5th iteration): 14

Table I
Outlier removed on each iteration

III. Combining Uncertainty - Simple Method

Uncertainty calculation using the Simple Method:

$$K_m = \frac{I_{lim}}{zFAC} \qquad (3)$$
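The simple method evaluates Eq. (3) at the combination of input extremes that gives the largest and the smallest $K_m$, then takes half the spread as the uncertainty. The sketch below is one way to do this in R under the values stated in the problem (filtered mean 54.49 with 0.5 % uncertainty, 2 % on A and C). The function name `km` and the variable names are assumptions for illustration, not the author's code.

```r
# Simple (max/min) method for the uncertainty of Km = Ilim / (z * F * A * C).
# km() and all variable names are illustrative assumptions.
km <- function(i_lim, z, faraday, area, conc) {
  i_lim / (z * faraday * area * conc)
}

z       <- 2
faraday <- 96487      # Faraday number, A.s/mol
i_lim   <- 54.49      # mean of the filtered data
area    <- 1.8e-6     # exposed mini-cathode area, m2
conc    <- 500        # bulk electrolyte concentration, mol/m3
u_i <- 0.005          # relative uncertainty of Ilim (0.5 %)
u_a <- 0.02           # relative uncertainty of A (2 %)
u_c <- 0.02           # relative uncertainty of C (2 %)

km_nom <- km(i_lim, z, faraday, area, conc)

# Km is largest with the largest numerator and the smallest denominator.
km_max <- km(i_lim * (1 + u_i), z, faraday, area * (1 - u_a), conc * (1 - u_c))

# Km is smallest with the smallest numerator and the largest denominator.
km_min <- km(i_lim * (1 - u_i), z, faraday, area * (1 + u_a), conc * (1 + u_c))

u_km <- (km_max - km_min) / 2
cat("Km =", km_nom, " Km(max) =", km_max, " Km(min) =", km_min,
    " U_Km =", u_km, "\n")
```

Pairing the maximum current with the minimum area and concentration matters here: A and C sit in the denominator, so their lower bounds push $K_m$ up.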
Figure 3. Data with their outliers omitted (at the 5th iteration) (x-axis: time elapsed; y-axis: measurement result)

z = 2
Ilim = 54.49 (uncertainty 0.5 %)
A = 1.8 x 10^-6 m2 (uncertainty 2 %)
C = 500 mol/m3 (uncertainty 2 %)
F = 96 487 A.s/mol

Ilim:
Ilim(max) = 54.49 + 0.27245 = 54.76245
Ilim(min) = 54.49 - 0.27245 = 54.21755

A:
A(max) = (1.8 + 0.036) x 10^-6 m2 = 1.836 x 10^-6 m2
A(min) = (1.8 - 0.036) x 10^-6 m2 = 1.764 x 10^-6 m2

C:
C(max) = 500 + 10 = 510 mol/m3
C(min) = 500 - 10 = 490 mol/m3

Use the Simple Method to calculate the Km uncertainty:

$$K_m(\mathrm{max}) = \frac{I_{lim}(\mathrm{max})}{zF\,A(\mathrm{min})\,C(\mathrm{min})} = \frac{54.76245}{2 \cdot 96487 \cdot 1.764 \cdot 10^{-6} \cdot 490} = 0.328$$

$$K_m(\mathrm{min}) = \frac{I_{lim}(\mathrm{min})}{zF\,A(\mathrm{max})\,C(\mathrm{max})} = \frac{54.21755}{2 \cdot 96487 \cdot 1.836 \cdot 10^{-6} \cdot 510} = 0.300$$

$$U_{K_m} = \frac{K_m(\mathrm{max}) - K_m(\mathrm{min})}{2} = \frac{0.328 - 0.300}{2} = 0.014$$

Result:

$$U_{K_m} = 0.014$$

References

[1] B. A. Gould Jr. On Peirce's criterion for the rejection of doubtful observations, with tables for facilitating its application. The Astronomical Journal, IV(83), 1855.

[2] Stephen M. Ross. Peirce's criterion for the elimination of suspect experimental data. Journal of Engineering Technology, 20(2):1–12, 2003.

Appendix

#===================================
# Created by Adian March 23, 2016 ( R language )
#-----------------------------------
# Method Reference: Article (Ross2003) Ross, S. M.
# Peirce's Method for the elimination of suspect data
# Journal of Engineering Technology, 2003, 20, 1-12
#-----------------------------------
# Data file input is read from a csv file -> d1.csv
# sd = standard deviation
# xm = mean
# R Peirce's Table by Ross 2003 ( N > 60, using N = 60 )
# (Ref: Mathworks Discussion Forum, 17 Nov 2015)
# Reference: http://www.mathworks.com/matlabcentral/
# /fileexchange/21562-peirce-s-method
# accessed 22 March, 2016
#
# D[row index, col index]
# row index = number of samples
# col index = iteration number
#===================================
usr.filename <- function()
{
  readline("Please enter filename (csv format): ")
}
data.filename <- usr.filename()
# ------------
# R Table Ross
# ------------
D <- read.csv("/data/Ross.csv", header = FALSE,
              strip.white = TRUE)
# -----------
# Data Source
# -----------
t <- read.csv(data.filename,
              header = FALSE, strip.white = TRUE)
t <- t[,1]
data_awal <- t   # initial (unfiltered) data
xm <- mean(t)
sd <- sd(t, na.rm = FALSE)
max <- max(t)
row1 <- NROW(t)
# Row of the R table: the sample size, capped at N = 60
c <- 60
if (row1 < 60) {
  c <- row1
}
row_x <- NROW(t) + 1
row_y <- NROW(t)
iteration <- 1
# Repeat until an iteration removes no further measurement
while (row_x != row_y) {
  # abs_t[i] = |xi - xm|, the absolute deviation of the i-th element of t
  abs_t <- abs(t - xm)
  max_deviation <- max(abs_t)
  # Maximum allowable deviation = sd * R
  allow1 <- sd * D[c, iteration]
  # Keep only measurements whose deviation is below the allowable one
  t <- t[abs_t < allow1]
  row2 <- NROW(t)
  cat("ITERATION -", iteration, "\n")
  print(t)
  iteration <- iteration + 1
  row_x <- row_y
  row_y <- row2
}
# Statistics of the reduced data set
xm <- mean(t)
sd <- sd(t, na.rm = FALSE)
max <- max(t)
outlier_removed <- abs(NROW(t) - NROW(data_awal))
cat("Standard Deviation =", sd, "\n")
cat("Mean Data =", xm, "\n")
cat("Outlier Removed =", outlier_removed, "\n")
cat("done...\n")
