PROTEIN SPRAY-DRYING (Full Fac)

Investigating the effect of process factors on the degradation of spray-dried protein

Background
Spray-drying is a process often used for drugs intended for inhalation. When spray-drying
proteins, the main aim is to produce particles of a specified size. In addition, it is important
that the protein temperature remains relatively low to avoid unnecessary denaturation.
Protein degradation may involve many complicated physical and chemical processes,
including denaturation. Therefore, we wish to study protein stability at a molecular level in
order to facilitate formulation applications. This example is based on a model protein (D7599)
developed by AstraZeneca where protein powders of D7599 were produced by spray-drying.

Objective
The experimental objective of this study was to determine which process parameters
influence the quality of the spray-dried product. The data analysis will introduce the use of
the optimizer functionality as there are some conflicting demands to resolve. Original data
source: Cronholm, M., The Effect of Process Variables on a Spray-dried Protein Intended for
Inhalation, Undergraduate Research Study, Department of Pharmaceutics, Uppsala
University, Uppsala, Sweden, 1998.

Data
Spray-drying conditions were varied using a full factorial design in four factors:
• Inlet Temperature – temperature of drying air at the inlet of the equipment. The high
and low levels of this factor were set such that degradation would be expected at the
high level (220 °C) but not at the low level (100 °C).
• Atomization gas flow – for this factor the low level (500 l/h) of the atomization gas
(nitrogen) was the minimum required to provide sufficient energy for atomization.
The high level (800 l/h) was the maximum achievable flow with this spray-dryer.
• Aspiration rate – the aspirator draws air through the instrument and this was varied
from 60% to 100% (full capacity).
• Feed-flow – indicates the material flow through the equipment. Here, the high level
of 5 ml/min was the maximum rate which could be used at the low temperature
without condensation appearing in the drying chamber; the low level (2 ml/min) was
chosen as the slowest practical rate.
To characterize the outcome of spray-drying, the following five responses were measured:
• Yield – the amount of product produced. This should be maximized.
• Size – particle size. Ideally, particles should be in the range 0.5–3.3 µm in order to
reach the lower airways.
• Water – water content in the spray-dried protein. This should be minimized.
• Outlet temperature – outlet air temperature. This temperature may influence protein
degradation and was therefore included. No specific target value was specified for
this response.
• HMWP – high molecular weight proteins. Measures the extent of aggregation, i.e.,
the formation of dimers and oligomers of the protein. This should be as low as
possible.

Copyright Sartorius Stedim Data Analytics AB, 20-04-20 Page 1 (14)


Tasks
Task 1
Initiate a new investigation in MODDE. Define the four factors and the five responses
according to the information above. Select Screening and the full factorial design in 16 runs
supplemented by three center points. Enter the response data or copy them from the file
Raw data for DOE computer exercises.XLS.
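For readers who want to inspect the design outside MODDE, the following is a minimal Python sketch that enumerates the 16 corner runs and appends three center points. The factor abbreviations (InT, Ato, Asp, FF) used as keys and the function name are illustrative choices, and the runs are listed in standard order rather than in the randomized run order a DOE package would normally propose.

```python
import itertools

# Low and high levels taken from the Data section of this exercise.
FACTORS = {
    "InT": (100, 220),  # inlet temperature, degrees C
    "Ato": (500, 800),  # atomization gas flow, l/h
    "Asp": (60, 100),   # aspiration rate, %
    "FF": (2, 5),       # feed-flow, ml/min
}


def full_factorial_with_centers(factors, n_center=3):
    """Return all low/high combinations plus n_center center-point runs."""
    names = list(factors)
    levels = [factors[name] for name in names]
    corners = [dict(zip(names, combo)) for combo in itertools.product(*levels)]
    center = {name: (low + high) / 2 for name, (low, high) in factors.items()}
    return corners + [dict(center) for _ in range(n_center)]


design = full_factorial_with_centers(FACTORS)
for run_number, run in enumerate(design, start=1):
    print(run_number, run)
```

The list holds 2^4 = 16 corner runs followed by three identical center runs, 19 runs in total.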

Task 2
Use the analysis wizard to work through the responses. Evaluate the raw data. Is there any
need for data pretreatment, for example transforming the response data? Consider the
responses one by one and try to find the best possible model for each. Which factors are
important? Are there any non-significant model terms? Are the residuals approximately
normally distributed? Refine the model, if necessary.

Task 3
Interpreting how the factors influence the responses can be difficult from the coefficients
alone, especially when the model contains interactions and higher-order terms. An
alternative way to judge how the factors influence a response is the contour plot. In this
case, with more than two factors, it can be convenient to use the 4D contour plot. Compare,
for instance, Yield with HMWP to see the factor influences.

View the difference in interpretation of HMWP when the factors InT and Asp are set as 1st
and 2nd axis in the 4D contour plot.
Create a 4D sweetspot plot accordingly and discuss the possibilities for a solution to the
problem. Can we get a solution where all criteria are fulfilled?

Task 4
With many factors and responses it may be difficult to find the optimal solution to a problem,
and in many cases it will be a compromise. The Optimizer in MODDE will help with this
search for a solution.
To obtain an overview of what is possible, the Dynamic Profile Plot in the optimizer can be
used. Discuss what is needed to get a solution with the current data.
If it is impossible to reach a solution, alternative options involve extending the factor ranges
or relaxing the response specifications. Discuss and test the two options based on the
optimizer result and the Dynamic Profile Plot.
The last option is to consider a follow-up design in an interesting area, with some of the
current factors locked and optionally including some other factors in the investigation.

Task 5
When you have an investigation with more than one response it can be of interest to know
how the responses are correlated. Correlated responses should be influenced approximately
equally by the factors (approximately the same model) and uncorrelated responses will have
different models.
Correlations can, for example, be shown by plotting responses against each other. However,
when the number of responses goes up it is better to show the correlation matrix (Worksheet
| Correlation Matrix).
Where can we detect a conflict based on the response correlations?
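As a sketch of why correlated responses tend to share a model, the Python example below builds three synthetic responses from a coded two-level design and prints their correlation matrix. The responses are invented for illustration only (they are not the D7599 data), the factor columns are generic, and numpy.corrcoef is used here as a stand-in for the Worksheet | Correlation Matrix view.

```python
import itertools

import numpy as np

# Coded 2^4 full factorial (-1/+1); the columns are generic factors x1..x4.
design = np.array(list(itertools.product([-1.0, 1.0], repeat=4)))
x1, x2, x3 = design[:, 0], design[:, 1], design[:, 2]

# Synthetic responses (illustration only, not measured data).
resp_a = 2.0 * x1             # driven by x1 only
resp_b = 2.0 * x1 + 0.5 * x2  # nearly the same model as resp_a
resp_c = 3.0 * x3             # driven by a different factor

# Each column is one response; rowvar=False correlates columns.
responses = np.column_stack([resp_a, resp_b, resp_c])
corr = np.corrcoef(responses, rowvar=False)
print(np.round(corr, 2))
```

In this sketch resp_a and resp_b correlate strongly because they depend on the same factor, while resp_c, which depends on another factor, is uncorrelated with both.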

Solutions to PROTEIN SPRAY DRYING

Task 2
The analysis wizard was used to work through the analysis steps for all responses.
Response 1 (Yield)

The model is good and the residuals are normally distributed with no outliers, which
supports the validity of the model. However, the coefficient plot shows that some
interaction terms are small, so these can be removed to simplify the model, gain degrees
of freedom (Df), and get the most accurate predictions (good Q2).

In the coefficient plot page the exclude tool was used to remove the small and insignificant
interaction terms. Q2 was used as the criterion to determine whether the model improved,
starting with the smallest coefficient and excluding them one at a time. This procedure
resulted in the following model diagnostics.
Note: If you run the autotune option, the model will contain eight coefficients. The
model seen below has only six coefficients. The difference in Q2 is marginal (0.8%). By
the parsimony principle, the model with six coefficients is the preferred one.
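The sketch below illustrates how R2 and Q2 can be compared when model terms are dropped. It assumes the common leave-one-out definition Q2 = 1 - PRESS/SS_total for a least-squares fit. The response values are synthetic (a known linear signal plus a small deterministic disturbance), and the helper name r2_q2 is hypothetical rather than a MODDE function.

```python
import itertools

import numpy as np


def r2_q2(X, y):
    """Return (R2, Q2) for an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    leverage = np.diag(X @ np.linalg.pinv(X))  # diagonal of the hat matrix
    # Leave-one-out prediction residuals via the leverage shortcut.
    press = np.sum((resid / (1.0 - leverage)) ** 2)
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(resid ** 2) / ss_total, 1.0 - press / ss_total


# Coded design: 16 corner runs plus 3 center points (19 runs in total).
corners = np.array(list(itertools.product([-1.0, 1.0], repeat=4)))
design = np.vstack([corners, np.zeros((3, 4))])

# Synthetic response (NOT the exercise data): two active factors plus a
# small deterministic disturbance standing in for experimental noise.
y = 50.0 + 10.0 * design[:, 0] - 5.0 * design[:, 1] + 0.5 * np.cos(np.arange(19))

ones = np.ones((19, 1))
X_full = np.hstack([ones, design])            # constant + 4 main effects
X_reduced = np.hstack([ones, design[:, :2]])  # constant + 2 active factors

r2_full, q2_full = r2_q2(X_full, y)
r2_red, q2_red = r2_q2(X_reduced, y)
print(f"full model:    R2={r2_full:.4f}  Q2={q2_full:.4f}")
print(f"reduced model: R2={r2_red:.4f}  Q2={q2_red:.4f}")
```

R2 can only decrease when terms are removed, whereas Q2 may stay the same or improve, which is why Q2 is the more useful criterion when trimming a model.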

Response 2 (Size)

The replicate error is small but the distribution is skewed and a log-transform
of the response data might be preferable for the model performance. The skewness test
does not recommend a transformation, however.

The square test has detected a non-linear
phenomenon. The term Ato^2 was added to
the model because Ato is the largest main
effect. This square term is confounded with
the other square terms, and more
experiments would be needed to fully
resolve the confounded square terms.
The final model after autotuning has good
overall modelling statistics and normally
distributed residuals.
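The confounding of the square terms can be checked numerically. In the minimal sketch below (coded units, with the column order InT, Ato, Asp, FF assumed for illustration), every squared factor column of a two-level design with center points turns out to be the same vector, so the data alone cannot tell which factor causes the curvature.

```python
import itertools

import numpy as np

# Coded design: 16 corner runs (-1/+1) plus 3 center points (0).
# Assumed column order for illustration: InT, Ato, Asp, FF.
corners = np.array(list(itertools.product([-1.0, 1.0], repeat=4)))
design = np.vstack([corners, np.zeros((3, 4))])

# Squaring maps both -1 and +1 to 1, and the center level 0 stays 0.
squares = design ** 2
identical = all(
    np.array_equal(squares[:, 0], squares[:, j]) for j in range(1, 4)
)

# A constant column plus the four square columns spans only 2 dimensions.
rank = np.linalg.matrix_rank(np.column_stack([np.ones(19), squares]))
print("square columns identical:", identical)
print("rank of [constant, squares]:", rank)
```

Only one square term can therefore be estimated from this design. Assigning it to Ato is a judgment based on effect size, and separating the square terms would need additional runs at intermediate levels of the individual factors.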

Response 3 (Water)

In this case everything looks good. The only thing to do is to remove insignificant terms
from the model.

Removing the insignificant terms produced the following result.

Response 4 (Outlet Temp)

The replicate plot shows that there is limited spread in the replicates.
The histogram shows a peculiar distribution and indicates one dominating factor.
The residuals are mainly normally distributed, but one experiment (10) is a borderline
outlier and should be checked for typos or other problems.
Insignificant terms should be removed from the model.

Removing the insignificant terms produced the following diagnostics.

The model validity is a little low and experiment 10 is a statistical outlier. Overall, the
model is very good (high R2 and Q2). Removing this experiment leads to higher validity, but
would that give a more reliable model? The influence of the deviating run number 10 is quite
small because Df is rather high. The recommendation is to keep all data unless an obvious
fault is confirmed.

Response 5 (HMWP)

The replicate plot shows some experiments out of specification, with an accelerating
pattern. The histogram shows a correspondingly skewed distribution, so the
response data should be log-transformed.
The untransformed model is poor (low Q2 and negative model validity).
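To see why a log transform helps with a right-skewed response, the sketch below compares the sample skewness before and after a base-10 log transform. The values are made up for illustration and are not the measured HMWP data. The choice of log10 and the helper name skewness are assumptions.

```python
import numpy as np


def skewness(values):
    """Third central moment divided by the second central moment to the power 1.5."""
    v = np.asarray(values, dtype=float)
    centered = v - v.mean()
    return np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5


# Hypothetical right-skewed response: most values small, a few large ones.
raw = np.array([0.8, 0.9, 1.0, 1.1, 1.3, 1.6, 2.5, 4.0, 7.5])
logged = np.log10(raw)

print(f"skewness of raw values:    {skewness(raw):.2f}")
print(f"skewness of log10 values:  {skewness(logged):.2f}")
```

The transform compresses the few large values, so the distribution becomes more symmetric and the residuals of a linear model are more likely to be approximately normal.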

Log-transforming the response data results in the following diagnostics.

The transformation improves model
statistics significantly.
The model needs trimming, i.e. insignificant
terms need to be excluded.

After trimming, the model statistics improved significantly. Note: If you are running the
autotune option the model will contain four coefficients. The model seen below has only
three coefficients. There is a marginal difference in Q2, less than 1%. Due to the parsimony
principle, the model seen below is the preferred one.

Task 3
4D contour plots for Yield and HMWP using factors InT and Asp on the inner axes show that
high yield and low HMWP are achieved when InT is low, Asp high, FF high and Ato low.
4D contour plot for response Yield.

4D contour plot for response HMWP.

Below is a 4D sweetspot plot using responses Yield and HMWP confirming the information extracted
from the 4D plots.

Task 4
The optimizer was used to find factor combinations with good operating conditions. The
response specifications were automatically set according to the experimental goals entered
in the response definition.

The results of running the optimizer are shown below. The desired values have not been
achieved for all responses, but many of the Alternative setpoints give results where most
of the goals are met. The main problem is achieving the Water requirements. The following
approximate operating parameters were suggested in order to comply with as many of the
endpoints as possible: Inlet temperature 118 °C, Atomization gas-flow 559 l/h, Aspiration
rate 100%, and Feed-flow 2 ml/min.
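It can help to express the suggested setpoint in coded units, where -1 and +1 are the low and high factor levels of the design. The short sketch below does this conversion with the factor ranges given in the Data section. The dictionary keys and the function name are illustrative.

```python
# Factor ranges (low, high) from the Data section of this exercise.
RANGES = {
    "InT": (100.0, 220.0),  # inlet temperature, degrees C
    "Ato": (500.0, 800.0),  # atomization gas flow, l/h
    "Asp": (60.0, 100.0),   # aspiration rate, %
    "FF": (2.0, 5.0),       # feed-flow, ml/min
}

# Approximate setpoint suggested by the optimizer.
SETPOINT = {"InT": 118.0, "Ato": 559.0, "Asp": 100.0, "FF": 2.0}


def to_coded(value, low, high):
    """Map a physical setting onto the coded scale (-1 = low, +1 = high)."""
    center = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return (value - center) / half_range


coded = {name: to_coded(SETPOINT[name], *RANGES[name]) for name in RANGES}
for name, value in coded.items():
    print(f"{name}: {value:+.2f}")
```

Aspiration rate and Feed-flow come out at +1 and -1, i.e. on the edge of the investigated region, which is worth keeping in mind when deciding whether to extend the factor ranges.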

The optimizer cannot find factor settings that fulfill all the criteria set up for the
responses. This is visualized in the graph, where the black point must lie in the white area
for the criteria to be met. In such a case the options are to change the response
specifications, exclude less important responses, and/or change the factor or response
intervals to get inside specifications.
In this situation, the Dynamic Profile Plot in the optimizer can be used to provide a system
overview (i.e., to understand what is possible and what is not possible). Compare the factor
contribution and the slope in the Dynamic Profile Plot. Atomization Gas Flow (AGF) has the
highest factor contribution and the largest effect across all responses.
For example, an increase in the AGF setpoint will move the prediction inside the Size
specification but outside the Yield specification, with no effect on HMWP. If it is
impossible to reach a solution, one possibility is either to extend the factor settings or to
relax the response specifications; test these two options based on the optimizer result and
the Dynamic Profile Plot. If Yield and Water content are the most critical responses, the
obvious setpoint to increase is Inlet Temperature.

Task 5
The correlation matrix was opened to view how the variables are correlated. An excerpt from
the correlation matrix is shown below. This table indicates that there are two groups of
responses. The first group contains Yield and Size, which have a correlation coefficient > 0.7.
The second group is made up of Water, Outlet Temp and HMWP, which also have high
pairwise correlation coefficients. This grouping of the responses means that we should
expect similar models within each group but different models between the groups. For
example, there is a negative correlation between HMWP and Water, and because both should
be minimized, that implies a conflict.

Conclusions
It is possible to develop strong models for the five responses. The best operating conditions
predicted by the models are: Inlet temperature 118 °C, Atomization gas-flow 559 l/h,
Aspiration rate 100%, and Feed-flow 2 ml/min. However, at these factor settings, not all
responses are predicted to be inside the specification ranges, so a discussion about response
priorities is needed. A further experiment needs to be conducted to verify the results at
this point, and future work could involve an optimization study based around these settings.
