Predict Impurity in Mined Ore - 1

Predict Impurity in Mined Ore | IME672A
Group - 2
Prashant Sinha (18807542)
Prashant (180538)
Harsh Raj (18807280)
Shyam Bahmani (180756)
Desarda Rushabh (18807230)
About Data
Context
The main goal is to use this data to predict how much impurity is in the ore concentrate. Because this
impurity is measured only once an hour, predicting the silica (impurity) content of the ore concentrate gives
the engineers early information to act on. They can then take corrective action in advance (reducing the
impurity where necessary) and also help the environment, since lowering silica in the ore concentrate
reduces the amount of ore that goes to tailings.
Content
The first column shows the date and time (from March 2017 until September 2017). Some columns
were sampled every 20 seconds; others were sampled on an hourly basis.
The second and third columns are quality measures of the iron ore pulp right before it is fed into the flotation
plant. Columns 4 to 8 are the most important variables affecting ore quality at the end of the
process. Columns 9 to 22 hold process data (level and air flow inside the flotation
columns), which also affect ore quality. The last two columns are the final iron ore pulp quality
measurements from the lab. The target is the last column, the % of silica in the iron ore
concentrate.
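Because some columns are logged every 20 seconds while the lab values change only once per hour, one way to put everything on a common time base is to average each hour. The following is a minimal sketch on synthetic data, not a step taken from the notebook: the frame `demo`, its values, and the name `hourly` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the plant data: two hours of 20-second readings.
# Column names follow the dataset; the values are made up for illustration.
n = 2 * 180  # 180 readings per hour at one reading every 20 seconds
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "date": pd.date_range("2017-03-10 01:00:00", periods=n,
                          freq=pd.Timedelta(seconds=20)),
    "Ore Pulp Flow": 395 + rng.normal(0, 2, n),            # fast sensor signal
    "% Silica Concentrate": np.repeat([1.31, 1.11], 180),  # hourly lab value
})

# Average every column within each one-hour bin.
hourly = demo.set_index("date").resample(pd.Timedelta(hours=1)).mean()
print(hourly)
```

Averaging is only one possible choice; keeping the 20-second rows and treating the repeated hourly lab value as the label for each of them is another, and the two lead to very different sample counts.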
In [4]: df.head()
Out[4]:
                  date  % Iron Feed  % Silica Feed  Starch Flow  Amina Flow  Ore Pulp Flow  Ore Pulp pH  Ore Pulp Density  Flotation Column 01 Air Flow  Flotation Column 02 Air Flow  ...
0  2017-03-10 01:00:00         55.2          16.98      3019.53     557.434        395.713      10.0664              1.74                       249.214                       253.235  ...
1  2017-03-10 01:00:00         55.2          16.98      3024.41     563.965        397.383      10.0672              1.74                       249.719                       250.532  ...
2  2017-03-10 01:00:00         55.2          16.98      3043.46     568.054        399.668      10.0680              1.74                       249.741                       247.874  ...
3  2017-03-10 01:00:00         55.2          16.98      3047.36     568.665        397.939      10.0689              1.74                       249.917                       254.487  ...
4  2017-03-10 01:00:00         55.2          16.98      3033.69     558.167        400.254      10.0697              1.74                       250.203                       252.136  ...
5 rows × 24 columns
In [5]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 737453 entries, 0 to 737452
Data columns (total 24 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 737453 non-null datetime64[ns]
1 % Iron Feed 737453 non-null float64
2 % Silica Feed 737453 non-null float64
3 Starch Flow 737453 non-null float64
4 Amina Flow 737453 non-null float64
5 Ore Pulp Flow 737453 non-null float64
6 Ore Pulp pH 737453 non-null float64
7 Ore Pulp Density 737453 non-null float64
8 Flotation Column 01 Air Flow 737453 non-null float64
15 Flotation Column 01 Level 737453 non-null float64
22 % Iron Concentrate 737453 non-null float64
23 % Silica Concentrate 737453 non-null float64
dtypes: datetime64[ns](1), float64(23)
memory usage: 135.0 MB
In [9]: plt.figure(figsize=(10,6))
df.corr()['% Silica Concentrate'].sort_values(ascending=False).plot(kind='bar')
Out[9]: <matplotlib.axes._subplots.AxesSubplot at 0x7f3c18c8dc70>
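The bar chart above ranks every column by its Pearson correlation with the target. The same ranking can be read as numbers instead of a plot. The sketch below uses synthetic data, so the frame `demo`, its coefficients, and the name `ranking` are assumptions for illustration only. It also drops the target's own entry, which is always 1.0 and would otherwise dominate the chart.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: one column tied to the target, one unrelated column.
rng = np.random.default_rng(0)
n = 500
silica = rng.normal(2.3, 1.0, n)
demo = pd.DataFrame({
    "% Iron Concentrate": 65 - 0.8 * silica + rng.normal(0, 0.3, n),
    "Ore Pulp pH": rng.normal(9.8, 0.4, n),
    "% Silica Concentrate": silica,
})

target = "% Silica Concentrate"
ranking = (
    demo.select_dtypes("number")        # keep numeric columns only
        .corr()[target]                 # Pearson correlation with the target
        .drop(target)                   # the target's self-correlation is trivial
        .sort_values(ascending=False)
)
print(ranking)
```

Calling `ranking.plot(kind='bar')` afterwards would give the same kind of chart as In [9], minus the target's own bar.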

In [13]: ax = sns.scatterplot(data=feed,x="% Iron Feed",y="% Silica Feed")
In [14]: pred.corr()
Out[14]:
% Iron Concentrate % Silica Concentrate
% Iron Concentrate 1.000000 -0.800598
% Silica Concentrate -0.800598 1.000000
In [15]: ax = sns.scatterplot(data=pred,x="% Iron Concentrate",y="% Silica Concentrate")
In [19]: pcaAirFlow = pd.DataFrame(data = pcaAirFlow1,columns = ["PCA Air Flow 1"])
pcaAirFlow['PCA Air Flow 2'] = pcaAirFlow2
pcaAirFlow['PCA Air Flow 3'] = pcaAirFlow3
pcaAirFlow.head()
Out[19]:
PCA Air Flow 1 PCA Air Flow 2 PCA Air Flow 3
0 1.678936 -2.45599 1.960631
1 1.713537 -2.45599 2.009270
2 1.773932 -2.45599 2.003772
3 1.652554 -2.45599 1.949543
4 1.693926 -2.45599 2.014497
hm = sns.heatmap(pcaAirFlow.corr(),annot=True)
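The cells that produced `pcaAirFlow1`, `pcaAirFlow2` and `pcaAirFlow3` are not shown here. One plausible reading is that each array holds the scores of a group of correlated air-flow columns on a single principal component. The sketch below illustrates that idea with plain NumPy on synthetic data; the helper `first_component`, the grouping of three columns, and all values are assumptions rather than the authors' code.

```python
import numpy as np
import pandas as pd

def first_component(frame: pd.DataFrame) -> np.ndarray:
    """Project standardised columns onto their first principal component."""
    z = ((frame - frame.mean()) / frame.std(ddof=0)).to_numpy()  # zero mean, unit variance
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[0]

# Synthetic stand-in: three air-flow columns driven by one shared signal.
rng = np.random.default_rng(0)
shared = rng.normal(0, 1, 1000)
air = pd.DataFrame({
    f"Flotation Column 0{i} Air Flow": 250 + 10 * shared + rng.normal(0, 1, 1000)
    for i in (1, 2, 3)
})

pca_demo = pd.DataFrame({"PCA Air Flow 1": first_component(air)})
print(pca_demo.head())
```

The sign of a principal component is arbitrary, so scores from a different PCA implementation may come out negated while carrying the same information.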
In [25]: pcaLevel = pd.DataFrame(data = pcaLevel1,columns = ["PCA Level 1.1","PCA Level 1.2"])
pcaLevel2 = pd.DataFrame(data = pcaLevel2,columns = ["PCA Level 2.1","PCA Level 2.2"])
pcaLevel["PCA Level 2.1"] = pcaLevel2["PCA Level 2.1"]
pcaLevel["PCA Level 2.2"] = pcaLevel2["PCA Level 2.2"]
pcaLevel.head()
Out[25]:
PCA Level 1.1 PCA Level 1.2 PCA Level 2.1 PCA Level 2.2
0 1.083831 -0.007463 1.312343 -0.183747
1 1.093430 -0.061362 1.143928 -0.204352
2 0.914977 0.154693 0.847732 -0.173428
3 0.926126 0.055801 0.507921 -0.215997
4 0.920496 -0.008825 0.512332 -0.178874
hm = sns.heatmap(pcaLevel.corr(),annot=True)
In [27]: pcaAirLevel = pd.DataFrame(pcaAirFlow)

pcaAirLevel["PCA Level 1.1"] = pcaLevel["PCA Level 1.1"]
In [28]: df2 = pcaAirLevel

df2["% Silica Concentrate"] = df["% Silica Concentrate"]
df2.head()
Out[28]:
   PCA Air Flow 1  PCA Air Flow 2  PCA Air Flow 3  PCA Level 1.1  PCA Level 1.2  PCA Level 2.1  PCA Level 2.2  % Silica Concentrate
0        1.678936        -2.45599        1.960631       1.083831      -0.007463       1.312343      -0.183747                  1.31
1        1.713537        -2.45599        2.009270       1.093430      -0.061362       1.143928      -0.204352                  1.31
2        1.773932        -2.45599        2.003772       0.914977       0.154693       0.847732      -0.173428                  1.31
3        1.652554        -2.45599        1.949543       0.926126       0.055801       0.507921      -0.215997                  1.31
4        1.693926        -2.45599        2.014497       0.920496      -0.008825       0.512332      -0.178874                  1.31
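In In [28], `df2 = pcaAirLevel` binds a second name to the same object, so adding the target column to `df2` also adds it to `pcaAirLevel`. If the intermediate frames need to stay unchanged, the combined table can instead be built in one step with `pd.concat`. This is a minimal sketch, not the notebook's code: it uses the first two rows shown in Out[19], Out[25] and Out[28], and the names `air_scores`, `level_scores` and `model_frame` are hypothetical.

```python
import pandas as pd

# Small stand-ins holding the first two rows shown in the outputs above.
air_scores = pd.DataFrame({
    "PCA Air Flow 1": [1.678936, 1.713537],
    "PCA Air Flow 2": [-2.45599, -2.45599],
    "PCA Air Flow 3": [1.960631, 2.009270],
})
level_scores = pd.DataFrame({
    "PCA Level 1.1": [1.083831, 1.093430],
    "PCA Level 1.2": [-0.007463, -0.061362],
    "PCA Level 2.1": [1.312343, 1.143928],
    "PCA Level 2.2": [-0.183747, -0.204352],
})
target = pd.Series([1.31, 1.31], name="% Silica Concentrate")

# Concatenate column-wise; rows are matched on the index and the inputs are left untouched.
model_frame = pd.concat([air_scores, level_scores, target], axis=1)
print(model_frame.columns.tolist())
```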
