DACA1

Technological University of the Shannon: Midlands Midwest
Master of Engineering in Engineering Management

(Level 9) Masters
Module: Data Analytics - (AL_ENGMF_9_1)
Continuous Assessment 1
Name Rinto Raphel

Student ID A00316049
Lecturer Name Fiona M. Walshe
Date Assignment due 25th February 2024
Declaration of Authenticity:
I confirm the attached Continuous Assessment is original and represents all my own work.
PgD in Eng in Eng Management
SN: A00304740
CONTENTS
1. BUSINESS UNDERSTANDING
1.1 BUSINESS OBJECTIVES
1.2 ASSESS THE SITUATION
1.3 DATA MINING GOALS
1.4 PROJECT PLAN
2. DATA UNDERSTANDING
2.1 INITIAL DATA
2.2 DATA DESCRIPTION & STANDARDISE
2.3 DATA ASSOCIATIONS
2.4 DATA GRAPHS
2.5 DATA REGRESSION
2.6 DATA QUALITY
3. APPENDIX
1. BUSINESS UNDERSTANDING
1.1 BUSINESS OBJECTIVES
The organization is a start-up team made up of a software engineer and a manufacturing engineer.
The software engineer knows several programming languages, a skill that will be important when
developing models for predictive analysis. The manufacturing engineer has good business domain
knowledge, which will be useful in interpreting the models' outputs and integrating them into
day-to-day business operations. The team also intends to include maintenance technicians, who will
inspect the failure modes after the predictive analytics models detect a machine failure. The team
structure is therefore collaborative and cross-functional.
The project faces challenges in five areas:
1. Data Analytics Expertise: The team lacks the data analytics expertise needed to do the
forecasting.
2. Imbalanced Data: The data is heavily skewed towards normal operation, which makes the rare
failure cases hard to predict.
3. Synthetic Training Data: Synthetic data may not perfectly simulate real-world situations,
which can reduce the model's effectiveness.
4. Failure Prediction Rate: The 50% prediction rate may need to be reconsidered if the costs of
false positives and false negatives change.
5. Maintenance Response: The success of the project depends on how efficiently the technicians
respond to the predicted failures.
The first two challenges need to be addressed, possibly by bringing in a data scientist or analyst
and by applying methods for handling imbalanced data. The success of the project depends on this.
The business objectives can be summarized as follows:
➢ Predictive Maintenance: The main purpose is to construct a model that can forecast machine
failures from operating conditions.
➢ Reduce Downtime: By forecasting machine failures, the team aims to minimize downtime, which
is very expensive. Even a 50% failure prediction rate would benefit the business.
➢ Business Expansion: The team has chosen well-known systems and believes that if it can
simulate how these systems work and forecast their failures, the business will succeed.
➢ Leverage Domain and Technical Expertise: The team intends to take advantage of the
software engineer's programming skills and the manufacturing engineer's domain knowledge
to meet its goals.
The success criteria for this project can be outlined as follows:
I. Predictive Accuracy
II. Reduction in Downtime
III. Business Growth
IV. Real-world Deployment
In conclusion, the success of the project will be judged by the predictive accuracy of the model,
its ability to reduce downtime, the business growth that follows, and the ease of deployment in the
real world.
1.2 ASSESS THE SITUATION
The resources available for the project are the team's expertise, the synthetic training data and
the real-world machinery.
Requirements:
1. Modelling Machine Failure: The project requires a model that can anticipate machine failure
from the operating conditions.
2. Use of Synthetic Training Data: The model should be trained on the synthetic data provided
by the team.
3. Deployment of Predictive Analytics Models: The ultimate goal is to run these models on real
data so that they detect machine failure conditions.
Assumptions:
1. Expertise of the Team: It is assumed that the software engineer has the required coding skills
and that the manufacturing engineer fully understands the business area.
2. Business Success: The team assumes that if it can correctly model the normal and failure
operations of the equipment, the business is destined to succeed.
3. Machine Operation and Failure: Machines are assumed to run without failure 95-99% of the
time. When a failure does happen, the resulting downtime is very expensive.
Constraints:
1. Data Availability: At the moment the team only has synthetic training data for the hardware it
wants to simulate.
2. Failure Prediction Rate: The business would benefit even from a failure prediction rate of
50%.
Risk & Contingency:
Data Quality: The team is modelling machine failure using synthetic training data. This synthetic
data may vary in quality and representativeness, which may have a significant impact on the
predictive power of the model. Contingency Measure: The team should have the synthetic data
validated by an expert to confirm that it is representative of real-world conditions.
Imbalanced Data: Because machines run without failure 95-99% of the time, the data is highly
imbalanced. This could make it difficult for the model to learn to predict breakdowns reliably.
Contingency Measure: The team could use techniques such as undersampling the majority class
(normal operation) or oversampling the minority class (failures).
High Expectations: The team reckons that predicting failures, even at a 50% rate, would be
beneficial for the business. However, a 50% rate also means that half of the failures may be missed
or detected too late, and the model may raise many false alarms that cause unnecessary checks and
maintenance. Contingency Measure: The team should set a realistic target and aim to improve the
prediction rate progressively.
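The resampling contingency for imbalanced data can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame `machine_data`, its columns and the 3% failure share are made up for the example and are not the project dataset.

```r
# Hypothetical data: 1,000 rows with roughly 3% failures (not the report's dataset).
set.seed(42)
n <- 1000
machine_data <- data.frame(
  torque  = rnorm(n, mean = 40, sd = 10),
  failure = runif(n) < 0.03
)

fail_rows <- which(machine_data$failure)
ok_rows   <- which(!machine_data$failure)

# Undersampling: keep every failure and draw an equal number of OK rows.
under_idx    <- c(fail_rows, sample(ok_rows, length(fail_rows)))
undersampled <- machine_data[under_idx, ]

# Oversampling: keep every OK row and resample failures with replacement.
over_idx    <- c(ok_rows, sample(fail_rows, length(ok_rows), replace = TRUE))
oversampled <- machine_data[over_idx, ]

# Both resampled sets now contain the two classes in equal numbers.
table(undersampled$failure)
table(oversampled$failure)
```

In practice, resampling would normally be applied to the training split only, so that the model is still evaluated on data with the original class balance.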

Costs & Benefits:

Data Acquisition and Preparation: Additional real-world data may be needed for checking and
testing, which would increase the cost.
Model Development and Deployment: Time, resources and effort are needed to model, train and
deploy the system, with possible additional expenses for specialized tools such as computing
resources.
Maintenance: Deployed models often require regular updates and maintenance, which may carry
hidden costs.
Preventive Maintenance: Early prediction of machine failures helps businesses avoid unplanned
downtime and loss of profit.
Business Opportunity: Accurate prediction of equipment failure at industrial scale could be a
significant business opportunity.
Expertise Utilization: The team's combined competence in software and manufacturing engineering
is a valuable asset for developing a technologically sound and commercially viable solution.
1.3 DATA MINING GOALS
1. Predictive Modelling: The primary aim is to build a predictive model that can correctly
classify machine failure from the operating conditions. This entails using the synthetically
generated training data to train a model that can detect whether a machine will fail or not.
2. Anomaly Detection: A further aim is to find patterns or anomalies in the machine operation
data that could indicate a possible failure, given that machines run normally 95-99% of the
time.
3. Improving Prediction Rate: Because downtime due to machine failure is highly expensive, a
50% prediction rate is already worthwhile. A further aim is to increase the accuracy of the
model through gradual refinement.
4. Feature Importance: Identify the operating conditions or characteristics that contribute most
to machine failure. This may give insight into the failure modes and, consequently, lead to
better prevention.
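How a prediction rate might be measured can be sketched with a small confusion-matrix calculation. The labels below are made up purely for illustration (20 runs, 4 of them failures). Reading "prediction rate" as recall, the share of real failures that the model catches, is an assumption, since the report does not define the term.

```r
# Hypothetical labels for 20 machine runs (TRUE = failure); not real results.
actual    <- c(rep(TRUE, 4), rep(FALSE, 16))
predicted <- c(TRUE, TRUE, FALSE, FALSE, TRUE, rep(FALSE, 15))

tp <- sum(predicted & actual)     # failures correctly flagged
fp <- sum(predicted & !actual)    # false alarms
fn <- sum(!predicted & actual)    # missed failures
tn <- sum(!predicted & !actual)   # normal runs correctly left alone

recall    <- tp / (tp + fn)            # share of real failures detected
precision <- tp / (tp + fp)            # share of alarms that were real failures
accuracy  <- (tp + tn) / length(actual)

c(recall = recall, precision = precision, accuracy = accuracy)
```

With imbalanced data, accuracy alone is misleading: in this toy example a model that never predicts a failure would still be right for 16 of the 20 runs, which is why recall and precision are reported alongside it.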
1.4 PROJECT PLAN
The tools and techniques required for the project could include:
1. Data Analysis Tools
2. Machine Learning Techniques
3. Model Evaluation Techniques
4. Data Visualization Tools
The duration and plan of each phase of a data analytics project depend on the project's complexity,
the size and quality of the data, and the available resources. However, here is a general guideline:
Data Discovery and Collection: a couple of days to a few weeks.
Data Cleaning and Preprocessing: a couple of weeks up to a couple of months.
Data Exploration and Visualisation: a couple of days to a couple of weeks.
Data Modelling and Analysis: a few weeks to months.
Interpretation and Communication: a few days to a few weeks.
Implementation and Integration: several weeks to a few months.
Monitoring and Maintenance: ongoing. During this stage, the model is continuously tuned to maintain
its performance over time.

2. DATA UNDERSTANDING
2.1 INITIAL DATA
The machine failure dataset contains the following variables: 1. Air temperature. 2. Process
temperature. 3. Rotational speed. 4. Torque. 5. Tool wear.
The dataset was formed by combining all the test case files into a single data frame, with one row
per test and one column per variable. The data acquisition process ensured that the test data was
free from issues such as missing data, faulty data and errors. As a result, no corrective action is
needed at this stage.
2.2 DATA DESCRIPTION & STANDARDISE
The data consists of 10,000 observations across 6 variables.
Air temperature K: This is a numeric variable with values between 295.3 and 304.5. The average air
temperature is approximately 300.00 K.
Process temperature K: This is a numeric variable with values between 304.3 and 322.6. The average
process temperature is about 310.01 K.
Rotational speed rpm: This is an integer variable with values in the range 1168 to 2915. The
average rotational speed is approximately 1538.88 rpm.
Torque Nm: This is a numeric variable with values between 3.2 and 76.6. The mean torque is about
39.98 Nm.
Tool wear min: This is an integer variable with values in the range 0 to 255. The mean tool wear is
107.97 min (about 1.8 hr).
Machine failure: This is a logical variable denoting whether a machine fault was detected. Out of
10,000 observations, 339 failures were registered (3.4%).
Format: Air temperature and process temperature are numeric variables in Kelvin, rotational speed is
an integer variable in revolutions per minute, torque is a numeric variable in Newton metres, tool
wear is an integer variable in minutes, and machine failure is a logical variable that indicates
whether a machine failure occurred (TRUE or FALSE).
Quantity: The dataset has 10,000 observations across 6 variables: rotational speed, process
temperature, tool wear, torque, air temperature and machine failure. It is divided into two subsets
based on machine failure: one has 9,661 observations (no failure) and the other has 339 observations
(failure). The dataset is complete and free of missing values.
Central tendency describes the value around which a distribution is centred. There are three
measures of central tendency: the mean, the median and the mode. The mean and median of each
variable are reported below as the central tendency values for the data.
Overall Data:
Air temperature K: Mean = 300.00K, Median = 300.10K
Process temperature K: Mean = 310.01K, Median = 310.10K
Rotational speed rpm: Mean = 1538.88 rpm, Median = 1503.00 rpm
Torque Nm: Mean = 39.98 Nm, Median = 40.10 Nm
Tool wear min: Mean = 107.97 min, Median = 108.00 min
OK Operation Data (Machine failure = FALSE):
Air temperature K: Mean = 299.97K, Median = 300.00K
Failure Data (Machine failure = TRUE):
Variability refers to the dispersion of data points around the mean or median, that is, how spread
out the data is. The standard deviation measures how far the values in a data set are spread around
the mean. Quartiles are the values that divide the data into quarters once it is sorted in
ascending order, and the interquartile range (IQR) is the difference between the third and first
quartiles.
Air temperature K:
• Range: 9.20 K
• Standard deviation: all data 2.00; OK operation data 1.99; failure data 2.07
• IQR: all data 3.20; OK operation data 3.20; failure data 3.40
Process temperature K:
• Range: 18.30 K
• Standard deviation: all data 1.50; OK operation data 1.50; failure data 1.42
• IQR: all data 2.30; OK operation data 2.30; failure data 1.75
Rotational speed rpm:
• Range: 1747 rpm
Torque Nm:
• Range: 73.40 Nm
• Standard deviation: all data 9.981; OK operation data 9.48; failure data 16.50
• IQR: all data 13.525; OK operation data 13.20; failure data 15.25
Tool wear min:
• Range: 253 min
The variability in each of these variables can be broken down further by machine failure status. For
instance, the standard deviation of air temperature is 1.99 K for operations without machine failure
and 2.07 K for operations with machine failure, which indicates that air temperature varies slightly
more when the machine fails. The quartiles give a good picture of how the data is distributed. The
difference between Q3 and Q1, the interquartile range (IQR), can also be used to identify potential
outliers.
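The measures above can be reproduced with a few base R calls. The sketch below uses a small made-up data frame; `machine_data` and its column names are assumptions for illustration, not the project data. The 1.5 × IQR fences are the usual Tukey rule of thumb for flagging outliers, which the report does not name explicitly.

```r
# Hypothetical data frame standing in for the machine dataset.
set.seed(1)
machine_data <- data.frame(
  air_temp = rnorm(200, mean = 300, sd = 2),
  torque   = rnorm(200, mean = 40, sd = 10),
  failure  = rep(c(FALSE, TRUE), times = c(190, 10))
)

# Central tendency and variability for one numeric vector.
describe <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x),
    iqr = IQR(x), range = diff(range(x)))
}

# Statistics for all data, one column per variable.
overall <- sapply(machine_data[c("air_temp", "torque")], describe)

# Standard deviation split by failure status (OK vs failure).
sd_by_status <- aggregate(cbind(air_temp, torque) ~ failure,
                          data = machine_data, FUN = sd)

# Flag potential torque outliers with the 1.5 * IQR rule.
q        <- quantile(machine_data$torque, c(0.25, 0.75))
fence    <- 1.5 * IQR(machine_data$torque)
outliers <- machine_data$torque < q[1] - fence |
            machine_data$torque > q[2] + fence

round(overall, 2)
sd_by_status
sum(outliers)
```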

2.3 DATA ASSOCIATIONS

Based on the correlation matrices, Pearson's correlations are summarised below.
For machine failure cases:
Air temperature and Process temperature are strongly positively correlated (0.79).
Rotational speed and Torque are strongly negatively correlated (-0.88).
Air temperature and Tool wear are weakly negatively correlated (-0.21).
For normal operation cases:
Air temperature and Process temperature are strongly positively correlated (0.87).
Rotational speed and Torque are strongly negatively correlated (-0.89).
For overall operation:
Air temperature and Process temperature are strongly positively correlated (0.87).
Rotational speed and Torque are strongly negatively correlated (-0.87).
A positive correlation means that as one variable rises, the other also rises, and a negative
correlation means that as one variable rises, the other declines. The closer the correlation
coefficient is to 1 or -1, the stronger the association.
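Correlation matrices of this kind can be computed with `cor()` from base R. The sketch below is a minimal illustration on made-up data, built so that the two temperatures move together and torque falls as speed rises; the data frame and column names are assumptions, not the project data.

```r
# Hypothetical data with a positive and a negative relationship built in.
set.seed(7)
n            <- 500
air_temp     <- rnorm(n, mean = 300, sd = 2)
process_temp <- air_temp + 10 + rnorm(n, sd = 1)
speed        <- rnorm(n, mean = 1500, sd = 180)
torque       <- 115 - 0.05 * speed + rnorm(n, sd = 4)
machine_data <- data.frame(air_temp, process_temp, speed, torque,
                           failure = runif(n) < 0.1)

num_cols <- c("air_temp", "process_temp", "speed", "torque")

# Pearson correlation matrix for all observations.
cor_all <- cor(machine_data[num_cols], method = "pearson")

# One matrix per failure status (list elements named "FALSE" and "TRUE").
cor_by_status <- lapply(split(machine_data[num_cols], machine_data$failure), cor)

round(cor_all, 2)
```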
2.4 DATA GRAPHS
See the Appendix for the graphs.
2.5 DATA REGRESSION
A multiple linear regression was used to find the relationship between Torque (the dependent
variable) and the other parameters (the independent variables). Each coefficient gives the change in
Torque for a one-unit change in that independent variable, with the other variables held constant.
To illustrate, the coefficient of Air temperature is 0.1874, which means that torque increases by
0.1874 Nm for every 1 K increase in air temperature. The R-squared value of 0.765 indicates that the
independent variables explain 76.5% of the variance in Torque. The p-value of the F-statistic
(< 2.2 × 10^−16) indicates that the model as a whole is statistically significant. However, the
Tool wear term is not significant (p-value: 0.941414), which means it does not affect Torque
significantly. Residuals, which are the observed values minus the predicted values, are used to test
the model's assumptions. If they are normally distributed and show no pattern against the predicted
values, this lends support to the model.
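A regression of this form can be fitted with `lm()`. The sketch below uses simulated data, so its variable names and numbers are illustrative assumptions and will not reproduce the figures reported above; it only shows where the coefficients, p-values and R-squared come from and how the residuals might be inspected.

```r
# Simulated data: torque depends on speed only; tool wear has no real effect.
set.seed(123)
n <- 1000
sim <- data.frame(
  air_temp  = rnorm(n, mean = 300, sd = 2),
  speed     = rnorm(n, mean = 1540, sd = 180),
  tool_wear = runif(n, min = 0, max = 250)
)
sim$torque <- 115 - 0.05 * sim$speed + rnorm(n, sd = 5)

# Fit the multiple linear regression and summarise it.
fit         <- lm(torque ~ air_temp + speed + tool_wear, data = sim)
fit_summary <- summary(fit)

coef(fit_summary)        # estimates, standard errors, t values, p-values
fit_summary$r.squared    # share of variance in torque explained

# Residual diagnostics: look for patterns and departures from normality.
if (interactive()) {
  plot(fitted(fit), resid(fit), xlab = "Fitted torque", ylab = "Residual")
  qqnorm(resid(fit))
  qqline(resid(fit))
}
```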
2.6 DATA QUALITY
The dataset was confirmed to be properly formatted and error-free before it was loaded into R. Such
a check involves looking for missing values, incorrect data types and inconsistencies in the data.
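The checks described above could look like the following in base R. The tiny data frame is made up and deliberately contains one missing value and one implausible speed so that each check has something to find; the column names and the "speed must be positive" rule are assumptions for illustration.

```r
# Hypothetical sample with one missing value and one impossible speed.
machine_data <- data.frame(
  air_temp = c(298.1, 299.4, NA, 301.2),
  speed    = c(1551L, 1408L, 1498L, -5L),
  failure  = c(FALSE, FALSE, TRUE, FALSE)
)

# Missing values per column.
missing_per_column <- colSums(is.na(machine_data))

# Data type of each column (numeric, integer, logical).
column_types <- sapply(machine_data, class)

# Fully duplicated rows.
duplicate_rows <- sum(duplicated(machine_data))

# Simple consistency rule: rotational speed should be positive.
bad_speed <- which(machine_data$speed <= 0)

missing_per_column
column_types
duplicate_rows
bad_speed
```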

3. APPENDIX
Data Graphs
[Figures: graphs for Normal Operations, OK Operations and Failed Operations]

Summary of Regression
Call:
lm(formula = Torque_.Nm. ~ Air_temperature_.K. + Process_temperature_.K. +
    Rotational_speed_.rpm. + Tool_wear_.min., data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-21.716  -3.270  -0.848   2.414  30.349

Coefficients:
                          Estimate Std. Error  t value Pr(>|t|)
(Intercept)              1.357e+02  1.045e+01   12.985  < 2e-16 ***
Air_temperature_.K.      1.874e-01  4.912e-02    3.816 0.000137 ***
Process_temperature_.K. -2.489e-01  6.557e-02   -3.795 0.000149 ***
Rotational_speed_.rpm.  -4.860e-02  2.694e-04 -180.355  < 2e-16 ***
Tool_wear_.min.         -5.586e-05  7.600e-04   -0.073 0.941414
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.839 on 9995 degrees of freedom
Multiple R-squared: 0.765, Adjusted R-squared: 0.7649
F-statistic: 8135 on 4 and 9995 DF, p-value: < 2.2e-16
