DACA1
Continuous Assessment 1
Contents
INDEX
3. APPENDIX
1. BUSINESS UNDERSTANDING
1.1 BUSINESS OBJECTIVES
The organizational structure is a start-up team consisting of a software engineer and a
manufacturing engineer, as shown in the data. The software engineer knows several programming
languages, a skill that will be important for developing the predictive analysis models. The
manufacturing engineer has good business domain knowledge, which will be useful for
interpreting the models' outputs and integrating them into day-to-day business operations. The
team also intends to include maintenance technicians, who will inspect the failure modes after
the predictive analytics models detect a machine failure. The team structure is therefore
collaborative and cross-functional.
The project faces challenges in five areas:
1. Data Analytics Expertise: The team lacks the data analytics expertise needed to do the
forecasting.
2. Imbalanced Data: The data is heavily skewed towards normal operation, which makes the rare
failure cases hard to predict.
3. Synthetic Training Data: Synthetic data may not perfectly simulate real-world situations,
which can reduce the model's effectiveness.
4. Failure Prediction Rate: The 50% prediction rate may need to be reconsidered if the costs of
false positives and false negatives change.
5. Maintenance Response: The success of the project depends on whether the technicians respond
efficiently to the failures that are flagged.
The expertise and imbalance issues need to be addressed, possibly by involving a data scientist or
analyst and by applying methods for handling imbalanced data. The success of the project depends
on this.
The business objectives can be summarized as follows:
➢ Predictive Maintenance: The main purpose is to build a model that can forecast machine
failures from operating conditions.
➢ Reduce Downtime: By forecasting machine failures, the team aims to minimize downtime,
which is very expensive. Even a 50% prediction rate would already benefit the business.
➢ Business Expansion: The team has chosen well-known systems and believes that if it can
simulate how these systems work and forecast their failures, the business will succeed.
➢ Leverage Domain and Technical Expertise: The team intends to take advantage of the
software engineer's programming skills and the manufacturing engineer's domain knowledge
to meet its goals.
The success criteria for this project can be outlined as follows:
I. Predictive Accuracy
II. Reduction in Downtime
III. Business Growth
IV. Real-world Deployment
In conclusion, the success of the project will be determined by the predictive accuracy of the
model, its ability to reduce downtime, the business growth that follows, and the ease of
deployment in the real world.
1.2 ASSESS THE SITUATION
The resources available for the project are team expertise, synthetic training data, and
real-world machinery.
Requirements:
1. Modelling Machine Failure: The project requires a model that can anticipate machine failure
from the operating conditions.
2. Use of Synthetic Training Data: The model should be trained on the synthetic data provided
by the team.
3. Deployment of Predictive Analytics Models: The ultimate goal is to run these models on real
data so that machine failure conditions are detected.
Assumptions:
1. Expertise of the Team: It is assumed that the software engineer has the required coding skills
and that the manufacturing engineer fully understands the business area.
2. Business Success: The team assumes that if it can correctly model the normal and failure
operation of the equipment, the business is bound to succeed.
3. Machine Operation and Failure: Machines are assumed to operate without failure 95-99% of the
time. When a failure does happen, the resulting downtime is very expensive.
Constraints:
1. Data Availability: At the moment the team only has synthetic training data for the hardware
it wants to simulate.
2. Failure Prediction Rate: The business would benefit even from a failure prediction rate of
50%.
Risk & Contingency:
Data Quality: The team is modelling machine failure using synthetic training data. The quality and
representativeness of this synthetic data may vary, which may have a significant impact on the
predictive power of the model. Contingency Measure: The team should have the synthetic data
validated by an expert to confirm that it is representative of real-world conditions.
Imbalanced Data: Because machines work without any failure 95-99% of the time, the data is highly
imbalanced. This could make it difficult for the model to learn to predict breakdowns
reliably. Contingency Measure: The team could use techniques such as undersampling the majority
class (normal operation) or oversampling the minority class (failures).
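The two resampling options named above can be sketched in base R. This is a minimal illustration under stated assumptions, not the team's method: the data frame `toy`, its columns `torque` and `failure`, and the 30-in-1,000 failure count are all hypothetical, and plain random resampling stands in for whatever technique the team finally adopts.

```r
# Hypothetical toy data: 1,000 rows, 30 of them failures (not the report's dataset).
set.seed(42)
toy <- data.frame(
  torque  = rnorm(1000, mean = 40, sd = 10),
  failure = rep(c(TRUE, FALSE), times = c(30, 970))
)

ok_rows   <- which(!toy$failure)  # majority class: normal operation
fail_rows <- which(toy$failure)   # minority class: failures

# Random undersampling: keep every failure, draw an equal number of normal rows.
keep_ok <- ok_rows[sample.int(length(ok_rows), length(fail_rows))]
under   <- toy[c(fail_rows, keep_ok), ]

# Random oversampling: keep every normal row, resample failures with replacement.
extra_fail <- fail_rows[sample.int(length(fail_rows), length(ok_rows), replace = TRUE)]
over       <- toy[c(ok_rows, extra_fail), ]

table(under$failure)  # both classes now have 30 rows
table(over$failure)   # both classes now have 970 rows
```

Resampling would normally be applied to the training split only, so that the evaluation data keeps the original class balance.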
High Expectations: The team reckons that predicting failures, even at a 50% rate, would be
beneficial for the business. However, the other 50% of failures may be detected too late, and
many false alarms may cause unnecessary checks and maintenance. Contingency Measure: The team
should set a realistic target and aim to improve the prediction rate progressively.
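The trade-off between missed failures and false alarms can be made concrete with two standard measures. The sketch below assumes the "50% prediction rate" means recall (the share of real failures that are caught), which the report does not state explicitly; the vectors `actual` and `predicted` are hypothetical labels for 20 machine runs.

```r
# Hypothetical labels for 20 machine runs (illustrative only).
actual    <- c(rep(TRUE, 4), rep(FALSE, 16))
predicted <- c(TRUE, TRUE, FALSE, FALSE,   # 2 of the 4 real failures are caught
               TRUE, TRUE, TRUE,           # 3 false alarms
               rep(FALSE, 13))             # 13 correct "no failure" calls

tp <- sum(actual & predicted)    # failures correctly flagged
fn <- sum(actual & !predicted)   # failures missed
fp <- sum(!actual & predicted)   # false alarms

recall    <- tp / (tp + fn)  # share of real failures that were caught
precision <- tp / (tp + fp)  # share of alarms that were real failures

c(recall = recall, precision = precision)
```

In this toy case the model reaches the 50% recall target, yet only 40% of its alarms are real, which is the kind of unnecessary maintenance load the risk above describes.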
2. DATA UNDERSTANDING
2.1 INITIAL DATA
The machine failure dataset contains the following variables:
1. Air temperature
2. Process temperature
3. Rotational speed
4. Torque
5. Tool wear
The dataset was formed by combining all the test case files into a single data frame, with one
row per test and one column per variable. The data acquisition process ensured that the test data
was free from issues such as missing data, faulty data, and errors. Because no such issues were
found, no solutions are needed here.
2.2 DATA DESCRIPTION & STANDARDISE
The data consists of 10,000 observations across 6 variables.
Air temperature K: This is a numeric variable with values between 295.3 and 304.5. The
average air temperature is approximately 300.00 K.
Process temperature K: This is a numeric variable with values between 304.3 and 322.6. The
average process temperature is about 310.01 K.
Rotational speed rpm: This is an integer variable with values between 1168 and 2915. The average
rotational speed is approximately 1538.88 rpm.
Torque Nm: This is a numeric variable with values between 3.2 and 76.6. The mean torque is
about 39.98 Nm.
Tool wear min: This is an integer variable with values between 0 and 255. The mean tool wear
is 107.97 min (about 1.8 hr).
Machine failure: This is a logical variable denoting whether a machine fault was detected. Out of
10,000 observations, 339 failures were registered (3.4%).
Data format: air temperature and process temperature are numeric variables in Kelvin, rotational
speed is an integer variable in revolutions per minute, torque is a numeric variable in newton
metres, tool wear is an integer variable in minutes, and machine failure is a logical variable
that indicates whether a machine failure occurred (TRUE or FALSE).
Quantity: The dataset has 10,000 observations across 6 variables: rotational speed, process
temperature, tool wear, torque, air temperature, and machine failure. The dataset is divided into
two subsets based on machine failure: one has 9,661 observations (no failure) and the other has
339 observations (failure). The dataset is complete and free of missing values.
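The class split quoted above can be checked with a few lines of arithmetic. The counts are taken from the report; the small data frame `toy` and its column name `Machine_failure` are hypothetical and only show how the same checks might be run on a real data frame.

```r
# Counts reported in the text.
n_total   <- 10000
n_failure <- 339

n_ok          <- n_total - n_failure   # 9661 observations without failure
failure_share <- n_failure / n_total   # 0.0339
round(100 * failure_share, 1)          # 3.4 (percent)

# The same checks on a data frame (hypothetical toy example).
toy <- data.frame(
  Torque_.Nm.     = c(38.5, 41.2, 62.3, 39.7),
  Machine_failure = c(FALSE, FALSE, TRUE, FALSE)
)
table(toy$Machine_failure)   # observations per class
colSums(is.na(toy))          # missing values per column (all zero here)
```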
Central tendency describes the value around which a distribution is centred, that is, the
middle point of a data set. There are three measures of central tendency: the mean, the median,
and the mode. The mean and median of each variable are reported below as the central tendency
values.
Overall Data:
Air temperature K: Mean = 300.00 K, Median = 300.10 K
Process temperature K: Mean = 310.01 K, Median = 310.10 K
Rotational speed rpm: Mean = 1538.88 rpm, Median = 1503.00 rpm
Torque Nm: Mean = 39.98 Nm, Median = 40.10 Nm
Tool wear min: Mean = 107.97 min, Median = 108.00 min
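Means and medians like those listed above, both overall and per failure class, can be obtained in base R as sketched below. The six-row data frame `toy` is hypothetical, and the column name `Machine_failure` is an assumption (only the predictor names appear in the appendix), so the numbers it produces are not the report's.

```r
# Hypothetical toy data (not the report's dataset).
toy <- data.frame(
  Torque_.Nm.     = c(38.5, 41.2, 40.1, 62.3, 12.8, 39.7),
  Machine_failure = c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)
)

# Overall central tendency of one variable.
overall <- c(mean = mean(toy$Torque_.Nm.), median = median(toy$Torque_.Nm.))

# The same measures split by failure class.
group_mean   <- tapply(toy$Torque_.Nm., toy$Machine_failure, mean)
group_median <- tapply(toy$Torque_.Nm., toy$Machine_failure, median)

overall
rbind(mean = group_mean, median = group_median)
```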
OK Operation Data (Machine failure = FALSE):
3. APPENDIX
*Summary of Regressions
Call:
lm(formula = Torque_.Nm. ~ Air_temperature_.K. + Process_temperature_.K. +
Rotational_speed_.rpm. + Tool_wear_.min., data = data)
Residuals:
    Min      1Q  Median      3Q     Max
-21.716  -3.270  -0.848   2.414  30.349
Coefficients:
                          Estimate Std. Error  t value Pr(>|t|)
(Intercept)              1.357e+02  1.045e+01   12.985  < 2e-16 ***
Air_temperature_.K.      1.874e-01  4.912e-02    3.816 0.000137 ***
Process_temperature_.K. -2.489e-01  6.557e-02   -3.795 0.000149 ***
Rotational_speed_.rpm.  -4.860e-02  2.694e-04 -180.355  < 2e-16 ***
Tool_wear_.min.         -5.586e-05  7.600e-04   -0.073 0.941414
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
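A regression summary of this kind comes from fitting the formula shown in the call with `lm()` and passing the result to `summary()`. The sketch below reuses that formula on simulated data, because the original data frame is not included here. The simulated values, the intercept of 115, and the noise level are all assumptions, so the output will not match the table above.

```r
# Simulated stand-in for the report's data frame (illustrative only).
set.seed(1)
n <- 200
data <- data.frame(
  Air_temperature_.K.     = rnorm(n, 300, 2),
  Process_temperature_.K. = rnorm(n, 310, 1.5),
  Rotational_speed_.rpm.  = round(rnorm(n, 1540, 180)),
  Tool_wear_.min.         = sample(0:250, n, replace = TRUE)
)
# Toy torque that falls with speed; the slope -0.0486 is borrowed from the table above.
data$Torque_.Nm. <- 115 - 0.0486 * data$Rotational_speed_.rpm. + rnorm(n, 0, 5)

# Same formula as in the appendix call.
fit <- lm(Torque_.Nm. ~ Air_temperature_.K. + Process_temperature_.K. +
            Rotational_speed_.rpm. + Tool_wear_.min., data = data)
summary(fit)$coefficients

# Reading the reported estimate: with the other predictors held fixed, a 100 rpm
# increase shifts the predicted torque by 100 * -0.0486 = -4.86 Nm.
delta_torque <- 100 * -4.860e-02
```

In the reported table, rotational speed is by far the strongest predictor of torque (t value of -180.355), while tool wear shows no significant linear effect (p = 0.941414).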