You are on page 1of 6

Predict Impurity in Mined Ore | IME672A

Group - 2
Prashant Sinha (18807542)
Prashant (180538)
Harsh Raj (18807280)
Shyam Bahmani (180756)
Desarda Rushabh (18807230)

About Data

Context

The main goal is to use this data to predict how much impurity is in the ore concentrate. As this impurity is
measured every hour, if we can predict how much silica (impurity) is in the ore concentrate, we can help the
engineers, giving them early information to take actions (empowering!). Hence, they will be able to take
corrective actions in advance (reduce impurity, if it is the case) and also help the environment (reducing the
amount of ore that goes to tailings as you reduce silica in the ore concentrate).

Content

The first column shows time and date range (from march of 2017 until september of 2017). Some columns
were sampled every 20 second. Others were sampled on a hourly base.

The second and third columns are quality measures of the iron ore pulp right before it is fed into the flotation
plant. Column 4 until column 8 are the most important variables that impact in the ore quality in the end of the
process. From column 9 until column 22, we can see process data (level and air flow inside the flotation
columns, which also impact in ore quality. The last two columns are the final iron ore pulp quality
measurement from the lab. Target is to predict the last column, which is the % of silica in the iron ore
concentrate.
In [4]: df.head()

Out[4]:
Flotation Flotation
% % Ore Ore Ore
Starch Amina Column Column
date Iron Silica Pulp Pulp Pulp ...
Flow Flow 01 Air 02 Air
Feed Feed Flow pH Density
Flow Flow

2017-
0 03-10 55.2 16.98 3019.53 557.434 395.713 10.0664 1.74 249.214 253.235 ...
01:00:00

2017-
1 03-10 55.2 16.98 3024.41 563.965 397.383 10.0672 1.74 249.719 250.532 ...
01:00:00

2017-
2 03-10 55.2 16.98 3043.46 568.054 399.668 10.0680 1.74 249.741 247.874 ...
01:00:00

2017-
3 03-10 55.2 16.98 3047.36 568.665 397.939 10.0689 1.74 249.917 254.487 ...
01:00:00

2017-
4 03-10 55.2 16.98 3033.69 558.167 400.254 10.0697 1.74 250.203 252.136 ...
01:00:00

5 rows × 24 columns

In [5]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 737453 entries, 0 to 737452
Data columns (total 24 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 737453 non-null datetime64[ns]
1 % Iron Feed 737453 non-null float64
2 % Silica Feed 737453 non-null float64
3 Starch Flow 737453 non-null float64
4 Amina Flow 737453 non-null float64
5 Ore Pulp Flow 737453 non-null float64
6 Ore Pulp pH 737453 non-null float64
7 Ore Pulp Density 737453 non-null float64
8 Flotation Column 01 Air Flow 737453 non-null float64
9 Flotation Column 02 Air Flow 737453 non-null float64
10 Flotation Column 03 Air Flow 737453 non-null float64
11 Flotation Column 04 Air Flow 737453 non-null float64
12 Flotation Column 05 Air Flow 737453 non-null float64
13 Flotation Column 06 Air Flow 737453 non-null float64
14 Flotation Column 07 Air Flow 737453 non-null float64
15 Flotation Column 01 Level 737453 non-null float64
16 Flotation Column 02 Level 737453 non-null float64
17 Flotation Column 03 Level 737453 non-null float64
18 Flotation Column 04 Level 737453 non-null float64
19 Flotation Column 05 Level 737453 non-null float64
20 Flotation Column 06 Level 737453 non-null float64
21 Flotation Column 07 Level 737453 non-null float64
22 % Iron Concentrate 737453 non-null float64
23 % Silica Concentrate 737453 non-null float64
dtypes: datetime64[ns](1), float64(23)
memory usage: 135.0 MB
In [9]: plt.figure(figsize=(10,6))
df.corr()['% Silica Concentrate'].sort_values(ascending=False).plot(kind
='bar')

Out[9]: <matplotlib.axes._subplots.AxesSubplot at 0x7f3c18c8dc70>


In [13]: ax = sns.scatterplot(data=feed,x="% Iron Feed",y="% Silica Feed")

In [14]: pred.corr()

Out[14]:
% Iron Concentrate % Silica Concentrate

% Iron Concentrate 1.000000 -0.800598

% Silica Concentrate -0.800598 1.000000

In [15]: ax = sns.scatterplot(data=pred,x="% Iron Concentrate",y="% Silica Concentr


ate")
In [19]: pcaAirFlow = pd.DataFrame(data = pcaAirFlow1,columns = ["PCA Air Flow 1"])
pcaAirFlow['PCA Air Flow 2'] = pcaAirFlow2
pcaAirFlow['PCA Air Flow 3'] = pcaAirFlow3
pcaAirFlow.head()

Out[19]:
PCA Air Flow 1 PCA Air Flow 2 PCA Air Flow 3

0 1.678936 -2.45599 1.960631

1 1.713537 -2.45599 2.009270

2 1.773932 -2.45599 2.003772

3 1.652554 -2.45599 1.949543

4 1.693926 -2.45599 2.014497

In [20]: plt.figure(figsize=(6,3))
hm = sns.heatmap(pcaAirFlow.corr(),annot=True)
In [25]: pcaLevel = pd.DataFrame(data = pcaLevel1,columns = ["PCA Level 1.1","PCA L
evel 1.2"])
pcaLevel2 = pd.DataFrame(data = pcaLevel2,columns = ["PCA Level 2.1","PCA
Level 2.2"])
pcaLevel["PCA Level 2.1"] = pcaLevel2["PCA Level 2.1"]
pcaLevel["PCA Level 2.2"] = pcaLevel2["PCA Level 2.2"]
pcaLevel.head()

Out[25]:
PCA Level 1.1 PCA Level 1.2 PCA Level 2.1 PCA Level 2.2

0 1.083831 -0.007463 1.312343 -0.183747

1 1.093430 -0.061362 1.143928 -0.204352

2 0.914977 0.154693 0.847732 -0.173428

3 0.926126 0.055801 0.507921 -0.215997

4 0.920496 -0.008825 0.512332 -0.178874

In [26]: plt.figure(figsize=(8,4))
hm = sns.heatmap(pcaLevel.corr(),annot=True)

In [27]: pcaAirLevel = pd.DataFrame(pcaAirFlow)


pcaAirLevel["PCA Level 1.1"] = pcaLevel["PCA Level 1.1"]
pcaAirLevel["PCA Level 1.2"] = pcaLevel["PCA Level 1.2"]
pcaAirLevel["PCA Level 2.1"] = pcaLevel["PCA Level 2.1"]
pcaAirLevel["PCA Level 2.2"] = pcaLevel["PCA Level 2.2"]

In [28]: df2 = pcaAirLevel


df2["% Silica Concentrate"] = df["% Silica Concentrate"]
df2.head()

Out[28]:
PCA Air PCA Air PCA Air PCA PCA PCA PCA % Silica
Flow 1 Flow 2 Flow 3 Level 1.1 Level 1.2 Level 2.1 Level 2.2 Concentrate

0 1.678936 -2.45599 1.960631 1.083831 -0.007463 1.312343 -0.183747 1.31

1 1.713537 -2.45599 2.009270 1.093430 -0.061362 1.143928 -0.204352 1.31

2 1.773932 -2.45599 2.003772 0.914977 0.154693 0.847732 -0.173428 1.31

3 1.652554 -2.45599 1.949543 0.926126 0.055801 0.507921 -0.215997 1.31

4 1.693926 -2.45599 2.014497 0.920496 -0.008825 0.512332 -0.178874 1.31

You might also like