# Applied Statistics and Statistical Software, 02441

Effect of hardness and detergent on enzymatic catalysis

Technical University of Denmark, Dept of Mathematics and Computer

Table of Contents

- Summary
- Introduction
- Data
- Methods
- Results
- Discussion
- Conclusion
- Appendix I
- Appendix II

## Summary

Different enzymes improve textile washing, and it is of interest to know how they perform under various conditions. The performance of five enzymes was measured as the amount of protein removed from a surface when exposed to the factors hardness and detergent and to a range of enzyme concentrations. A replication of each experiment yields a grand total of 160 samples. Using analysis of variance with factorial interactions, we construct a model that accounts for about 90% of the variance in the data. We show that water hardness has no significant influence on enzyme performance. We conclude that addition of detergent, enzyme type and enzyme concentration significantly influence protein removal, and that addition of detergent yields the highest protein removal. Adding more enzyme to the samples also increases protein removal, but the effect saturates. Under the best conditions (15 nM enzyme with detergent added), enzyme A removes protein from the surface significantly better than the other enzymes.

## Introduction

Efficient protein removal is of vital importance in the textile cleaning industry. Enzymatic activity enhances this process by means of catalysis, and the concentration and type of enzyme are of significant importance in stain removal processes. It is known that factors such as detergent and water hardness affect the catalysis rate, and it is therefore of interest to quantify these effects. These factors are some of the conditions that appear in normal laundry wash processes.

To optimize the removal of certain stains from textile surfaces, the effect is analyzed by means of a laboratory experiment. The experiment measures how much protein is removed from a surface with Surface Plasmon Resonance (SPR) technology, a biosensor that measures a resonance signal on a gold surface and translates it into a protein removal response. The experiment is conducted using 5 different enzymes under different conditions: with and without detergent, in hard and soft water, and at a range of enzyme concentrations.

The aim of the analysis is to describe and compare the performance of the enzymes and how the different conditions affect it. We use statistical methods to examine differences and trends in the samples, to determine whether the day-to-day measurements are reliable, and to test the robustness of the experimental set-up.

## Data

The dataset consists of 160 samples, with 80 different experimental combinations. The enzymes were labeled A, B, C, D and E and analyzed one at a time, with the experiment running for 2 consecutive days. No enzymes were run on the same day, so each specific enzyme is labeled with a single date. After the experiment has run, the result is sampled randomly into a variable called cycle.

The factors included in the data are hardness of water and addition of detergent. Both variables are binary and are denoted Det+, Det0, Ca+ and Ca0 for detergent and calcium respectively. The enzyme concentrations were tested at 4 different levels: 0 nM, 2.5 nM, 7.5 nM and 15 nM. The output is the response, given as the amount of protein removed in RU (10^-6 g m^-2). A summary of all data included in our analysis can be seen in Table 1.

Figure 1 shows log10(Response) together with detergent and hardness.

Figure 1: A) Boxplot of the protein response with detergent (Det+) and without (Det0). Detergent alone seems to have a positive effect on the response. B) The response for each level of water hardness. There does not seem to be a correlation between hardness and response.

Increasing the concentration of enzymes in the samples also increases the response. There is some indication in the data that the enzyme catalysis saturates at the high concentrations (Figure 2). This may yield two problems: fitting the data with a linear model gives less precise results at high concentrations, and extrapolation beyond the existing data points is very uncertain.

Figure 2: Response vs concentration of the 5 different enzymes. Black = 0 nM, Red = 2.5 nM, Green = 7.5 nM and Blue = 15 nM. Increasing concentration increases the response, but there is some saturation towards the high concentrations.
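The mean and variance listed for Enzyme Concentration in Table 1 follow from the design described above. The R sketch below assumes the design is fully balanced, with each of the 4 levels appearing in 40 of the 160 samples (5 enzymes x 2 detergent levels x 2 hardness levels x 2 replicates), and it ignores the outlier that is later removed.

```r
# Assumed balanced design: each concentration level appears 40 times (160 samples).
conc <- rep(c(0, 2.5, 7.5, 15), times = 40)

n_samples <- length(conc)  # 160
conc_mean <- mean(conc)    # (0 + 2.5 + 7.5 + 15) / 4 = 6.25 nM
conc_var  <- var(conc)     # sample variance, divides by n - 1 = 159

round(c(n = n_samples, mean = conc_mean, variance = conc_var), 2)
```

Rounded to one decimal place, these values agree with the 6.3 nM and 33.0 reported in Table 1.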

Table 1: Variables and response variable in the model. Detergent: with (Det+) or without (Det0). Hardness: with (Ca+) or without (Ca0).

| Variable name | Type | Categories | Mean | Variance | Levels |
|---|---|---|---|---|---|
| Run date | Categorical | 3/12/2008, 5/12/2008, … | | | 10 |
| Cycle | Categorical | 1, 2, 3, …, 34 | | | 34 |
| Enzyme type | Categorical | A, B, C, D, E | | | 5 |
| Enzyme concentration | Continuous | | 6.3 | 33.0 | |
| Detergent | Categorical | Det+, Det0 | | | 2 |
| Hardness | Categorical | Ca+, Ca0 | | | 2 |
| Response (output) | Continuous | | 434.3 | 154254.4 | |

## Methods

Our first approach was to assume that the response is a function of the various factors and enzymes. An ANCOVA of the data was conducted using two-way interactions. Concentration was used as a continuous variable, as this allowed us to interpolate between levels. The ANCOVA model is

$$Y_{ij} = \alpha_i + \beta_i X_{ij} + \varepsilon_{ij} \qquad (1)$$

where $Y_{ij}$ is the response of observation $j$ in group $i$, $\alpha_i$ is the intercept for factor level $i$, $\beta_i$ is the slope, $X_{ij}$ is the corresponding observed concentration and $\varepsilon_{ij}$ is noise. When subsetting the data and testing only one factor, we use a regular one-way ANOVA

$$Y_{ij} = \alpha_i + \varepsilon_{ij} \qquad (2)$$

The assumptions for these models are that the residuals are independent, have constant variance and are normally distributed. Initially we assumed that the response measurements had no bias from either cycle or day. To verify the assumptions we inspected the residuals of the fit, a normal QQ plot and Cook's distance for the observations (Appendix I). These figures revealed some inconsistencies in variance, so we log-transformed the response variable before fitting the model. This gave acceptable distributions, with the exception of a single outlier: the observation for enzyme B at concentration 0 with Det0 and Ca0 appears biased in the response and was therefore removed from the data set. All statistical analysis was performed using R 2.14; scripts and analysis are documented in Appendix II.
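Equations (1) and (2) map directly onto R `lm()` formulas. The sketch below is illustrative only: the enzyme labels, concentrations and log10-responses are made up (they are not the SPR data). The formula `enzyme * conc` gives each enzyme its own intercept $\alpha_i$ and slope $\beta_i$, while `- 1` in the one-way model makes each coefficient a group mean $\alpha_i$.

```r
# Made-up log10-responses: 3 enzymes x 4 concentrations x 2 replicates.
conc   <- rep(c(0, 2.5, 7.5, 15), times = 6)
enzyme <- factor(rep(c("A", "B", "C"), each = 8))
alpha  <- c(A = 1.0, B = 0.8, C = 0.9)     # hypothetical intercepts alpha_i
beta   <- c(A = 0.10, B = 0.05, C = 0.08)  # hypothetical slopes beta_i
noise  <- rep(c(-0.02, 0.03, 0.01, -0.02, 0.01, -0.01, 0.02, 0.00), times = 3)
y <- unname(alpha[as.character(enzyme)] + beta[as.character(enzyme)] * conc) + noise

# Equation (1): ANCOVA with an intercept and a slope per enzyme
ancova <- lm(y ~ enzyme * conc)
anova(ancova)
coef(ancova)                        # slopes of B and C are reported relative to A
round(cooks.distance(ancova), 3)    # one of the diagnostics listed above

# Equation (2): one-way ANOVA on a subset, here the 15 nM samples only
sub <- conc == 15
oneway <- lm(y[sub] ~ enzyme[sub] - 1)   # each coefficient is a group mean alpha_i
coef(oneway)
anova(lm(y[sub] ~ enzyme[sub]))          # F-test for equal means
```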

## Results

We tested a multi-way ANCOVA model with the log-response as a function of the continuous variable concentration, the three categorical variables Enzyme, Detergent and Hardness, and all their two-way interactions. The model showed that Hardness had no significant effect on the response (p > 0.29). After Hardness was removed as a factor, the model showed that the two-way interaction between enzyme type and enzyme concentration had no effect either (p > 0.30). A summary of the final model is shown in Table 2, and all of its terms are significant (p < 0.01). This shows that detergent, enzyme type and concentration significantly influence protein removal. The sums of squares show that detergent is the most important factor for protein removal. The final model explains 89% of the variation in the data.

Table 2: Terms of the final ANCOVA model, with enzyme type, enzyme concentration and detergent. Interactions are denoted with a ":".

| Term | Degrees of freedom |
|---|---|
| Enzyme type | 4 |
| Enzyme concentration | 1 |
| Detergent | 1 |
| Enzyme:Detergent | 4 |
| Enzyme concentration:Detergent | 1 |

Secondly, we wanted to determine whether the interactions were the same for all enzymes included in the model. As we had already concluded that detergent significantly influenced the data, we used a more graphical approach. Figure 3 shows that some of the enzymes react slightly differently to the addition of detergent; enzyme B appears a little less efficient relative to the others when detergent is added. Since the differences between the curves are not the same with and without detergent, not all the enzymes interact with detergent in the same way.
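Removing a term such as Hardness amounts to comparing two nested linear models with an F-test. A minimal sketch on a made-up balanced design, where the column names `conc`, `det` and `ca` are hypothetical and the simulated response depends on detergent and concentration but not on hardness:

```r
# Made-up design: 4 concentrations x 2 detergent x 2 hardness levels x 2 replicates
d <- expand.grid(conc = c(0, 2.5, 7.5, 15),
                 det  = c("Det0", "Det+"),
                 ca   = c("Ca0", "Ca+"),
                 rep  = 1:2)
# Made-up log10-response. The 8 noise values are recycled over the 64 rows,
# so they repeat identically across the two hardness levels.
noise <- c(0.02, -0.01, 0.00, 0.01, -0.02, 0.01, 0.03, -0.01)
d$y <- 1 + 0.05 * d$conc + 0.6 * (d$det == "Det+") + noise

full    <- lm(y ~ (conc + det + ca)^2, data = d)  # all two-way interactions
reduced <- lm(y ~ conc * det, data = d)           # hardness and its interactions dropped
anova(reduced, full)        # a large p-value supports dropping the hardness terms
summary(reduced)$r.squared  # share of variance explained by the reduced model
```

The same comparison, applied one step at a time, removes Hardness first and then the Enzyme:Concentration interaction.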

Figure 3: log10 response vs enzyme concentration for the 5 different enzymes. A) With detergent added to the sample. B) Without detergent added. Structural differences between the two plots suggest stronger interactions with detergent for some enzymes than for others.

The next step was to investigate whether one enzyme removed proteins significantly better than the others. To do so we restricted the data set to the optimal conditions for protein removal: detergent added and 15 nM enzyme. An ANOVA test showed that enzyme A was the best enzyme, with a mean protein removal of about 1527 RU (95% confidence interval 1392 to 1676 RU). In comparison, the worst enzyme was D, with a mean protein removal of about 663 RU. The ANOVA shows that all the other enzymes are significantly different from A. A summary of all the enzymes is given in Table 3, which also states the p-values for the comparison with enzyme A.

Table 3: The 5 enzymes with detergent and 15 nM enzyme added, compared to enzyme A. Estimates are mean protein removal (RU) with 95% confidence intervals. Significance levels: *** p < 0.001, ** p < 0.01, * p < 0.05.

| Enzyme | Estimate | 2.5% | 97.5% | P-value vs A |
|---|---|---|---|---|
| A | 1527 | 1392 | 1676 | |
| B | 980 | 893 | 1076 | 3.15E-06 *** |
| C | 1023 | 932 | 1122 | 1.00E-05 *** |
| D | 663 | 605 | 728 | 8.39E-10 *** |
| E | 1101 | 1003 | 1208 | 8.87E-05 *** |
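Assuming Table 3 reports log10-scale estimates back-transformed to RU (which would explain why the intervals are not symmetric around the estimates), numbers of this kind can be produced as sketched below. The data are made-up log10-responses for three enzymes, not the measured ones.

```r
# Made-up log10-responses for the Det+ / 15 nM subset: 3 enzymes x 4 samples
y   <- c(3.18, 3.20, 3.19, 3.17,  2.99, 3.00, 2.98, 3.01,  2.82, 2.83, 2.81, 2.80)
enz <- factor(rep(c("A", "B", "C"), each = 4))

cellmeans <- lm(y ~ enz - 1)      # one mean (log10 scale) per enzyme
est_ru <- 10^coef(cellmeans)      # back-transformed estimates in RU
ci_ru  <- 10^confint(cellmeans)   # back-transformed 95 % intervals, asymmetric in RU
cbind(Estimate = est_ru, ci_ru)

vsA <- lm(y ~ enz)                # intercept = enzyme A; other rows are differences from A
summary(vsA)$coefficients[, "Pr(>|t|)"]
```

Back-transforming a mean taken on the log scale gives a geometric mean, which is never larger than the arithmetic mean of the raw RU values.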

### Analysis of experimental bias

To investigate whether there was any systematic error in the data sampling, we also tested the response against the random variable cycle. Furthermore, we tested day-to-day variation by comparing the samples with 0 nM enzyme concentration against one another.

To test whether cycle sampling influenced protein removal, we made an ANOVA with cycle as the explanatory variable. As all cycles are sampled randomly, there should be no significant difference between them. This turns out to be true, as the difference between cycles is insignificant (p > 0.47).

The day-to-day runs were explored with an ANOVA using hardness, enzyme type and detergent as explanatory variables, restricted to the samples with concentration 0. As these samples should be equal (no enzyme added), there should be no difference between them. The analysis shows, however, that the enzyme types are significantly different (p < 0.0013). Since each enzyme was run on its own date, this strongly indicates experimental bias in the set-up, or a sampling error on the day enzyme B was analyzed. The implications and possible solutions are evaluated in the discussion.

### Using concentration as a factor instead of a continuous variable

Enzyme concentration was given at four different levels, so the variable could be used either as a factor or as a continuous variable. In the previous sections we used it as a continuous variable. Fitting a model with concentration as a factor and comparing it with the continuous-concentration model using an ANOVA shows that the factor gives a significantly better fit (p < 1.31E-21). Concentration was nevertheless kept as a continuous variable, because this makes it possible to interpolate between experiments with different concentrations.
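The factor-versus-continuous comparison is an F-test between nested models: a straight line in concentration is a special case of one free mean per concentration level. The appendix does this with `anova(fm1, fmfac, test = 'F')`. The sketch below shows the same idea on a made-up saturating response.

```r
conc  <- rep(c(0, 2.5, 7.5, 15), each = 6)                     # 6 made-up replicates per level
noise <- rep(c(-0.03, 0.02, 0.01, -0.01, 0.02, -0.01), times = 4)
y <- 3 * conc / (4 + conc) + noise                             # made-up, saturating in conc

lin <- lm(y ~ conc)          # concentration as a continuous covariate (one slope)
fac <- lm(y ~ factor(conc))  # concentration as a factor (one mean per level)
anova(lin, fac)              # F-test: is the straight line an adequate simplification?
```

The factor model fits better, but unlike the line it says nothing about concentrations between the tested levels. That is the trade-off described above.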
## Discussion

The few data points in each group give large statistical uncertainty in the validation of the data. More data would have given us more samples for the statistical analysis and for checking for errors within the different groups. Notably, the sample or experimental bias on the day enzyme B was analyzed yields high uncertainty and therefore reduces the credibility of the results. With the data set available, it would be reasonable to adjust the data to an expected mean with 0 enzyme added, if this is needed in the future. Adding reference samples would also help with this particular problem.

For our analysis to be valid with respect to hardness and detergent in the water, we assume that the concentrations of these two factors are the same in all observations. If the main purpose of the experiment is to determine the catalytic capabilities of the enzymes, adding less detergent would be appropriate, as detergent turns out to be the main factor explaining protein removal from surfaces.

To make a better model, a model more appropriate for enzyme kinetics than a linear fit should be considered. The data look more like a Michaelis-Menten saturation curve, and such a model would also make it possible to extrapolate.
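A Michaelis-Menten curve, $v = V_m S / (K + S)$, can be fitted with R's `nls()`. The sketch below uses made-up responses generated from $V_m = 1600$ and $K = 5$, with noise that averages to zero at each concentration. The starting values are a rough guess, and real data would need them chosen with care. The sketch also shows how a straight line extrapolates past the saturation level.

```r
conc <- rep(c(0, 2.5, 7.5, 15), each = 3)
resp <- 1600 * conc / (5 + conc) + rep(c(-10, 5, 5), times = 4)  # made-up responses in RU

mm  <- nls(resp ~ Vm * conc / (K + conc), start = list(Vm = max(resp), K = 3))
lin <- lm(resp ~ conc)

coef(mm)                      # close to Vm = 1600 and K = 5 by construction
new <- data.frame(conc = 30)  # twice the highest tested concentration
c(michaelis_menten = unname(predict(mm, new)),
  linear           = unname(predict(lin, new)))
```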

Future work with the data would include a more thorough analysis of the interactions for the individual enzymes. All five enzymes react significantly to detergent as well as to concentration, but the degree to which they do so is important knowledge.

## Conclusion

We conclude that hardness does not affect protein removal in our experiment. In contrast, detergent and enzyme concentration have a highly significant positive effect, and enzyme type also has a highly significant effect. Their two-way interactions are also significant, apart from the interaction between concentration and enzyme type, which was omitted from the model. We also conclude that enzyme A has the highest catalytic capability under optimal conditions, and that enzyme D is the worst enzyme for removing proteins.

## Appendix I

Figure 1 shows the residuals of the log-transformed analysis.

## Appendix II

R code used to calculate the statistics in the report.

```r
# Script for case 1: detergent

############ Load and view data ############
setwd("~/My Dropbox/02441 Applied Statistics and Statistical Software/Case_1") # Set working directory
graphics.off()
# Assumed column order: run date, cycle, response, enzyme, concentration, detergent, hardness
mydata <- read.table('SPR.txt', header = TRUE,
                     colClasses = c('factor', 'numeric', 'numeric', 'factor',
                                    'numeric', 'factor', 'factor'))

############ Manipulate data for cleaner analysis ############
mydata$Cycle <- factor(mydata$Cycle, levels = 1:34)  # Put cycles consecutively
mydata$lResponse <- log10(mydata[, 3])               # Log response for equal variance
mydata <- mydata[-14, ]                              # Remove outlier 14
# Concentration as a factor, for the extra model comparison below
mydata$EnzymeConcF <- factor(mydata$EnzymeConc, levels = c('15', '2.5', '7.5', '0'))

############ Make a linear model of all variables ############
fm <- lm(lResponse ~ (Enzyme + EnzymeConc + DetStock + CaStock)^2, data = mydata)
m <- summary(fm)  # Assign and view the summary
m
anova(fm)         # ANCOVA table

#### Leave calcium out of the final model, and also the interaction between EnzymeConc and Enzyme ####
fm1 <- lm(lResponse ~ (Enzyme + EnzymeConc + DetStock)^2 - Enzyme:EnzymeConc, data = mydata)
summary(fm1)
an <- anova(fm1)
an

#### Use concentration as a factor to compare the two models ####
fmfac <- lm(lResponse ~ (Enzyme + EnzymeConcF + DetStock)^2 - Enzyme:EnzymeConcF, data = mydata)
summary(fmfac)
aF <- anova(fmfac)
aF
anf <- anova(fm1, fmfac, test = 'F')
anf
#### Using concentration as a factor describes the data significantly better ####

#### Show graphically that the residuals have equal variance and are normally distributed ####
graphics.off()            # Close existing graphics devices
dev.new(width = 7)        # New device 7 inches wide (portable alternative to windows())
par(mfrow = c(2, 2))      # 4 subplots in a 2-by-2 matrix
par(mar = c(4, 4, 2, 1))  # Small margins
plot(fm1, which = 1:4)    # Diagnostic plots, including Cook's distance

#### See the response as a function of enzyme concentration ####
graphics.off()
enZ <- levels(mydata$Enzyme)
plot(lResponse ~ EnzymeConc, data = mydata, type = 'n')
for (k in seq_along(enZ)) {
  points(lResponse ~ EnzymeConc, data = mydata, subset = Enzyme == enZ[k], col = k)
}
#### Looks like a saturating function ####

#### Test if the days differ with 0 enzyme added ####
fm0 <- lm(lResponse ~ (Enzyme + CaStock + DetStock)^2, data = mydata, subset = EnzymeConc == 0)
summary(fm0)
anova(fm0)
#### There are significant differences between the days with 0 enzyme ####

#### See if the enzymes differ under optimal conditions (Det+, 15 nM) ####
mydatadet <- mydata[mydata$DetStock == 'Det+', ]
# '- 1' gives one intercept (mean log10 response) per enzyme
fmHigh <- lm(lResponse ~ Enzyme - 1, data = mydatadet, subset = EnzymeConc == 15)
p <- summary(fmHigh)                  # Put summary into a variable
ints <- confint(fmHigh)               # 95 % confidence intervals (log10 scale)
mens <- coef(fmHigh)                  # Means (log10 scale)
anova(update(fmHigh, . ~ Enzyme))     # F-test for differences among enzymes
summary(update(fmHigh, . ~ Enzyme))   # Differences from enzyme A
#### A is significantly better than all the other enzymes ####

#### Do the enzymes interact the same way with detergent and concentration? ####
for (e in enZ) {
  fmE <- lm(lResponse ~ (DetStock + EnzymeConc)^2, data = mydata[mydata$Enzyme == e, ])
  print(summary(fmE))
  print(anova(fmE))
}
#### They are all significantly correlated with both effects ####

############ Plot different figures ############
#### Bar plot of concentration vs response for all 5 enzymes (Figure 2) ####
graphics.off()
dev.new(width = 3.5, height = 2)
par(mar = c(2.5, 4, 0.5, 0.5))
dat <- tapply(mydata$lResponse, list(mydata$EnzymeConc, mydata$Enzyme), mean)
barplot(dat, beside = TRUE, col = 1:4, xlab = 'Enzyme', ylab = 'log10 Response', ylim = c(0, 3))
abline(h = 0)

#### Mean response vs concentration, with and without detergent (Figure 3) ####
lvl <- sort(unique(mydata$EnzymeConc))  # Numeric levels in increasing order
meanLines <- function(d) {
  # Rows: concentrations; columns: enzymes
  out <- matrix(0, length(lvl), length(enZ), dimnames = list(lvl, enZ))
  for (i in seq_along(lvl)) for (j in seq_along(enZ)) {
    sub1 <- d[d$EnzymeConc == lvl[i], ]
    out[i, j] <- mean(sub1$lResponse[sub1$Enzyme == enZ[j]])
  }
  out
}
Detline  <- meanLines(mydata[mydata$DetStock == 'Det+', ])
Detline2 <- meanLines(mydata[mydata$DetStock == 'Det0', ])

graphics.off()
dev.new(width = 7, height = 3.5)
par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))
plot(lResponse ~ EnzymeConc, data = mydata, type = 'n', ylim = c(0, 4), main = 'With detergent',
     xlab = 'Enzyme Concentration (nM)', ylab = 'log10 Response (RU)')
for (i in seq_along(enZ)) points(lvl, Detline[, i], type = 'o', col = i, pch = 16)
text(0.5, 3.8, labels = 'A')
plot(lResponse ~ EnzymeConc, data = mydata, type = 'n', ylim = c(0, 4), main = 'Without detergent',
     xlab = 'Enzyme Concentration (nM)', ylab = '')
for (i in seq_along(enZ)) points(lvl, Detline2[, i], type = 'o', col = i, pch = 16)
legend('bottomright', legend = enZ, fill = seq_along(enZ), horiz = TRUE, cex = 0.8, title = 'Enzyme')
text(0.5, 3.8, labels = 'B')

#### Box plots of response vs detergent and hardness (Figure 1) ####
graphics.off()
dev.new(width = 7, height = 2.5)
par(mfrow = c(1, 2), mar = c(2.5, 4, 1, 1))
plot(lResponse ~ DetStock, data = mydata, xlab = 'Detergent', ylab = 'log10 Response (RU)')
text(0.5, max(mydata$lResponse), labels = 'A', pos = 4)
plot(lResponse ~ CaStock, data = mydata, xlab = 'Hardness', ylab = '')
text(0.5, max(mydata$lResponse), labels = 'B', pos = 4)
```