In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
In [121]:
RCD_path='C:\\Users\\Paco\\Desktop\\Code.Hub\\Assignment\\RawConstructionData.csv'
rcd=pd.read_csv(RCD_path,delimiter=';')
In [3]:
rcd.dtypes
Out[3]:
Scope object
Construction Element Type object
ID int64
Construction Element Family object
ConstructionElementPart object
BOQCategory object
BOQ object
BOQDescription object
Quantity float64
Unit object
UnitPrice float64
TotalCost float64
Length float64
Thickness float64
Height float64
X float64
Y float64
Z float64
Scope_ElementType_BOQ object
dtype: object
In [4]:
rcd.shape
Out[4]:
(2295, 19)
http://localhost:8890/notebooks/Project_F.ipynb 1/40
16/04/2018 Project_F
In [5]:
rcd.describe()
Out[5]:
For each categorical column, we find the number of unique values and how many times each value appears.
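The counting cells below repeat the same few lines for every column. As a minimal sketch, the pattern could be wrapped in one helper; the name summarize_categories and the toy data are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd

def summarize_categories(series):
    """Print each distinct value with its count and return the number of distinct values."""
    # dropna=False keeps missing values in the tally instead of silently skipping them.
    counts = series.value_counts(dropna=False).sort_index()
    for value, count in counts.items():
        print('{}: {}'.format(value, count))
    print('Number of unique values: {}'.format(len(counts)))
    return len(counts)

# Toy data standing in for a column such as rcd['Scope'].
demo = pd.Series(['PRS_L0_STR', 'PRS_L1_STR', 'PRS_L0_STR', 'PRS_RF_STR'])
n_unique = summarize_categories(demo)
```

With the real data the call would be summarize_categories(rcd['Scope']), and likewise for the other categorical columns.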
Scope:
In [6]:
i=0
u, c = np.unique(rcd['Scope'], return_counts=True)
for z in zip(list(u), list(c)):
    i+=1
    print('{}: {}'.format(z[0], z[1]))
print("Number of unique values: {}".format(i))
PRB_BO_STR: 18
PRS_BO_STR: 658
PRS_L0_STR: 608
PRS_L1_STR: 534
PRS_L2_STR: 457
PRS_RF_STR: 20
Number of unique values: 6
In [9]:
i=0
u, c = np.unique(rcd['Construction Element Type'], return_counts=True)
for z in zip(list(u), list(c)):
    i+=1
    print('{}: {}'.format(z[0], z[1]))
print("Number of unique values: {}".format(i))
Beam: 597
Columns: 222
ConcreteWall: 339
Earthwork: 7
Formwork: 146
Mat Foundation: 148
Parapet: 132
Protection Layer: 531
Ramp: 18
Retaining Wall: 48
Slab: 27
Stair: 80
Number of unique values: 12
In [10]:
i=0
u, c = np.unique(rcd['Construction Element Family'], return_counts=True)
for z in zip(list(u), list(c)):
    i+=1
    print('{}: {}'.format(z[0], z[1]))
print("Number of unique values: {}".format(i))
ConcreteWork: 1611
EarthWork: 7
FormWork: 146
ProtectionWork: 531
Number of unique values: 4
In [11]:
i=0
u, c = np.unique(rcd['ConstructionElementPart'], return_counts=True)
for z in zip(list(u), list(c)):
    i+=1
    print('{}: {}'.format(z[0], z[1]))
print("Number of unique values: {}".format(i))
BOQCategory:
In [12]:
i=0
u, c = np.unique(rcd['BOQCategory'], return_counts=True)
for z in zip(list(u), list(c)):
    i+=1
    print('{}: {}'.format(z[0], z[1]))
print("Number of unique values: {}".format(i))
Concrete: 551
Earth Moving: 7
Formwork: 640
Protection Layers: 595
Reinforcement: 502
Number of unique values: 5
BOQ:
In [13]:
i=0
u, c = np.unique(rcd['BOQ'], return_counts=True)
for z in zip(list(u), list(c)):
    i+=1
    print('{}: {}'.format(z[0], z[1]))
print("Number of unique values: {}".format(i))
BOQ Description:
In [14]:
i=0
u, c = np.unique(rcd['BOQDescription'], return_counts=True)
for z in zip(list(u), list(c)):
    i+=1
    print('{}: {}'.format(z[0], z[1]))
print("Number of unique values: {}".format(i))
Unit:
In [15]:
i=0
u, c = np.unique(rcd['Unit'], return_counts=True)
for z in zip(list(u), list(c)):
    i+=1
    print('{}: {}'.format(z[0], z[1]))
print("Number of unique values: {}".format(i))
kg: 551
m: 62
m2: 1173
m3: 509
Number of unique values: 4
In [17]:
i=0
u, c = np.unique(rcd['Scope_ElementType_BOQ'], return_counts=True)
for z in zip(list(u), list(c)):
    i+=1
    print('{}: {}'.format(z[0], z[1]))
print("Number of unique values: {}".format(i))
PRS_L1_STR_Beam_ReinforcementforBeams: 63
PRS_L1_STR_Columns_C30/37forColumns&Con.Walls: 22
PRS_L1_STR_Columns_FormworkforColumns,CW&RW: 22
PRS_L1_STR_Columns_ReinforcementforColumns: 22
PRS_L1_STR_ConcreteWall_C30/37forColumns&Con.Walls: 34
PRS_L1_STR_ConcreteWall_FormworkforColumns,CW&RW: 34
PRS_L1_STR_ConcreteWall_ReinforcementforConcreteWalls: 34
PRS_L1_STR_Formwork_FormworkforSlabs: 48
PRS_L1_STR_ProtectionLayer_PolysterineInsulation(DOW)6cm: 66
PRS_L1_STR_ProtectionLayer_PolystyreneInsulation(DOW)3cm: 38
PRS_L1_STR_Slab_C30/37forSlabs: 1
PRS_L1_STR_Slab_FormworkforSlabs: 1
PRS_L1_STR_Slab_ReinforcementforSlabs: 1
PRS_L1_STR_Stair_C30/37forStairs: 7
PRS_L1_STR_Stair_FormworkforStairs: 5
PRS_L1_STR_Stair_ReinforcementforStairs: 7
PRS_L2_STR_Beam_C30/37forBeams: 61
PRS_L2_STR_Beam_FormworkforBeams: 61
PRS_L2_STR_Beam_FormworkforColumns,CW&RW: 1
PRS_L2_STR_Beam_FormworkforSlabs: 2
For the numeric columns, we find the range of the values and also check for missing values.
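The range and missing-value checks for the numeric columns can also be gathered into one table. This is only a sketch on toy data: the helper name numeric_overview and the demo values are assumptions, not something the notebook defines.

```python
import numpy as np
import pandas as pd

def numeric_overview(df, columns):
    """Return min, max and missing-value count for each listed numeric column."""
    # min and max skip NaN by default, so the range reflects observed values only.
    overview = df[columns].agg(['min', 'max']).T
    overview['missing'] = df[columns].isnull().sum()
    return overview

# Toy frame standing in for rcd; the numbers are made up.
demo = pd.DataFrame({
    'Quantity': [1.5, 20.0, 7.25],
    'UnitPrice': [10.0, np.nan, 12.0],
})
overview = numeric_overview(demo, ['Quantity', 'UnitPrice'])
print(overview)
```

On the real data the column list would be the numeric columns shown by rcd.dtypes, for example Quantity, UnitPrice, TotalCost, Length, Thickness, Height, X, Y and Z.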
Quantity:
In [86]:
x=round((rcd.Quantity.min()),2)
y=rcd.Quantity.max()
print("The range is {} - {}".format(x,y))
x=rcd.Quantity.isnull().sum()
print("number of missing values is: {}".format(x))
Unit Price:
In [21]:
x=rcd.UnitPrice.min()
y=rcd.UnitPrice.max()
print("The range is {} - {}".format(x,y))
x=rcd.UnitPrice.isnull().sum()
print("number of missing values is: {}".format(x))
We notice that UnitPrice has 22 missing values. We will fill them in the next part.
Total Cost:
In [24]:
x=rcd.TotalCost.min()
y=round((rcd.TotalCost.max()),2)
print("The range is {} - {}".format(x,y))
x=rcd.TotalCost.isnull().sum()
print("number of missing values is: {}".format(x))
Length:
In [25]:
x=rcd.Length.min()
y=rcd.Length.max()
print("The range is {} - {}".format(x,y))
x=rcd.Length.isnull().sum()
print("number of missing values is: {}".format(x))
Thickness:
In [26]:
x=rcd.Thickness.min()
y=rcd.Thickness.max()
print("The range is {} - {}".format(x,y))
x=rcd.Thickness.isnull().sum()
print("number of missing values is: {}".format(x))
Height:
In [27]:
x=rcd.Height.min()
y=rcd.Height.max()
print("The range is {} - {}".format(x,y))
x=rcd.Height.isnull().sum()
print("number of missing values is: {}".format(x))
We notice that there are 454 missing values in Thickness, Height and Length. We will fill them in the next part.
X,Y,Z :
In [28]:
x=rcd.X.min()
y=rcd.X.max()
print("The range is {} - {}".format(x,y))
x=rcd.X.isnull().sum()
print("number of missing values is: {}".format(x))
In [29]:
x=rcd.Y.min()
y=rcd.Y.max()
print("The range is {} - {}".format(x,y))
x=rcd.Y.isnull().sum()
print("number of missing values is: {}".format(x))
In [30]:
x=rcd.Z.min()
y=rcd.Z.max()
print("The range is {} - {}".format(x,y))
x=rcd.Z.isnull().sum()
print("number of missing values is: {}".format(x))
Path:
In [31]:
S_path='C:\\Users\\Paco\\Desktop\\Code.Hub\\Assignment\\Schedule.csv'
s=pd.read_csv(S_path,delimiter=';')
In [32]:
s.dtypes
Out[32]:
Scope object
ConstructionElementType object
ID object
Act_Code object
Activity_Desc object
BOQ object
START object
FINISH object
Scope_ConstructionElementType object
Scope_ConstructionElementType_BOQ object
Cost Overrrun object
Delay object
dtype: object
START and FINISH hold dates but were read as object (string) columns, so we convert them to datetime.
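When the text format of the dates is known, passing it explicitly avoids day/month ambiguity. The sketch below assumes a day-first format such as 16/04/2018. That format is an assumption for illustration and is not confirmed for Schedule.csv.

```python
import pandas as pd

# Toy schedule; the day-first text format is an assumption, not taken from Schedule.csv.
demo = pd.DataFrame({'START': ['03/04/2018', '16/04/2018'],
                     'FINISH': ['05/04/2018', 'not a date']})

for col in ['START', 'FINISH']:
    # An explicit format removes day/month ambiguity; errors='coerce' turns
    # unparseable entries into NaT instead of raising an exception.
    demo[col] = pd.to_datetime(demo[col], format='%d/%m/%Y', errors='coerce')

print(demo.dtypes)
```

Counting the NaT values afterwards (demo['FINISH'].isna().sum()) shows how many entries failed to parse.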
In [34]:
s.START=pd.to_datetime(s.START)
s.FINISH=pd.to_datetime(s.FINISH)
In [35]:
s.dtypes
Out[35]:
Scope object
ConstructionElementType object
ID object
Act_Code object
Activity_Desc object
BOQ object
START datetime64[ns]
FINISH datetime64[ns]
Scope_ConstructionElementType object
Scope_ConstructionElementType_BOQ object
Cost Overrrun object
Delay object
dtype: object
For each categorical column, we again find the number of unique values and how many times each value appears.
In [36]:
i=0
u, c = np.unique(s['Scope'], return_counts=True)
for z in zip(list(u), list(c)):
    i+=1
    print('{}: {}'.format(z[0], z[1]))
print("Number of unique values: {}".format(i))
PRS_BO_STR: 27
PRS_L0_STR: 22
PRS_L1_STR: 16
PRS_L2_STR: 22
PRS_RF_STR: 7
Number of unique values: 5
In [37]:
i=0
u, c = np.unique(s['ConstructionElementType'], return_counts=True)
for z in zip(list(u), list(c)):
    i+=1
    print('{}: {}'.format(z[0], z[1]))
print("Number of unique values: {}".format(i))
Beam: 12
Columns: 9
ConcreteWall: 9
Earthwork: 1
Mat Foundation: 3
Parapet: 7
Protection Layer: 21
Ramp: 3
Retaining Wall: 3
Slab: 14
Stair: 12
Number of unique values: 11
In [38]:
i=0
u, c = np.unique(s['ID'], return_counts=True)
for z in zip(list(u), list(c)):
    i+=1
    print('{}: {}'.format(z[0], z[1]))
print("Number of unique values: {}".format(i))
PRS_BO_STR_CWIR01: 1
PRS_BO_STR_CWIR02: 1
PRS_BO_STR_CWIR03: 1
PRS_BO_STR_CWPC01: 1
PRS_BO_STR_CWPC02: 1
PRS_BO_STR_CWPC03: 1
PRS_BO_STR_CWPF01: 1
PRS_BO_STR_CWPF02: 1
PRS_BO_STR_EWEX01: 1
PRS_BO_STR_MFIR01: 1
PRS_BO_STR_MFIR02: 1
PRS_BO_STR_MFPC01: 1
PRS_BO_STR_MFPC02: 1
PRS_BO_STR_MFPF01: 1
PRS_BO_STR_PWMF02: 1
PRS_BO_STR_PWMF03: 1
PRS_BO_STR_PWMF04: 1
PRS_BO_STR_PWMF05: 1
PRS_BO_STR_PWMF06: 1
PRS_BO_STR_PWMF07: 1
PRS_BO_STR_PWMF08: 1
PRS_BO_STR_SRIR01: 1
PRS_BO_STR_SRIR02: 1
PRS_BO_STR_SRPC01: 1
PRS_BO_STR_SRPC02: 1
PRS_BO_STR_SRPF01: 1
PRS_BO_STR_SRPF02: 1
PRS_L0_STR_BSIR01: 1
PRS_L0_STR_BSIR05: 1
PRS_L0_STR_BSPC01: 1
PRS_L0_STR_BSPC05: 1
PRS_L0_STR_BSPF01: 1
PRS_L0_STR_BSPF05: 1
PRS_L0_STR_CPIR01: 1
PRS_L0_STR_CPPC01: 1
PRS_L0_STR_CPPF01: 1
PRS_L0_STR_CWIR01: 1
PRS_L0_STR_CWIR03: 1
PRS_L0_STR_CWPC01: 1
PRS_L0_STR_CWPC03: 1
PRS_L0_STR_CWPF01: 1
PRS_L0_STR_PWPL01: 1
PRS_L0_STR_PWPL02: 1
PRS_L0_STR_PWPL03: 1
PRS_L0_STR_PWPL04: 1
PRS_L0_STR_PWPL05: 1
PRS_L0_STR_STIR01: 1
PRS_L0_STR_STPC01: 1
PRS_L0_STR_STPF01: 1
PRS_L1_STR_BSIR01: 1
PRS_L1_STR_BSIR02: 1
PRS_L1_STR_BSPC01: 1
PRS_L1_STR_BSPC02: 1
PRS_L1_STR_BSPF01: 1
PRS_L1_STR_BSPF02: 1
PRS_L1_STR_CWIR01: 1
PRS_L1_STR_CWIR03: 1
PRS_L1_STR_CWPC01: 1
PRS_L1_STR_CWPC03: 1
PRS_L1_STR_CWPF01: 1
PRS_L1_STR_PWPL01: 1
PRS_L1_STR_PWPL02: 1
PRS_L1_STR_STIR01: 1
PRS_L1_STR_STPC01: 1
PRS_L1_STR_STPF01: 1
PRS_L2_STR_BSIR01: 1
PRS_L2_STR_BSIR02: 1
PRS_L2_STR_BSPC01: 1
PRS_L2_STR_BSPC02: 1
PRS_L2_STR_BSPF01: 1
PRS_L2_STR_BSPF02: 1
PRS_L2_STR_CPIR01: 1
PRS_L2_STR_CPPC01: 1
PRS_L2_STR_CPPF01: 1
PRS_L2_STR_CWIR01: 1
PRS_L2_STR_CWPC01: 1
PRS_L2_STR_CWPF01: 1
PRS_L2_STR_PWPL01: 1
PRS_L2_STR_PWPL02: 1
PRS_L2_STR_PWPL03: 1
PRS_L2_STR_PWPL04: 1
PRS_L2_STR_PWPL05: 1
PRS_L2_STR_PWPL06: 1
PRS_L2_STR_PWPL07: 1
PRS_L2_STR_STIR01: 1
PRS_L2_STR_STPC01: 1
PRS_L2_STR_STPF01: 1
PRS_RF_STR_BSIR01: 1
PRS_RF_STR_BSIR02: 1
PRS_RF_STR_BSPC01: 1
PRS_RF_STR_BSPC02: 1
PRS_RF_STR_BSPF01: 1
PRS_RF_STR_BSPF02: 1
PRS_RF_STR_PWPL01: 1
Number of unique values: 94
In [39]:
i=0
u, c = np.unique(s['Act_Code'], return_counts=True)
for z in zip(list(u), list(c)):
    i+=1
    print('{}: {}'.format(z[0], z[1]))
print("Number of unique values: {}".format(i))
BSIR01: 5
BSIR02: 3
BSPC01: 5
BSPC02: 3
BSPF01: 5
BSPF02: 3
CPIR01: 2
CPPC01: 2
CPPF01: 2
CWIR01: 4
CWIR02: 1
CWIR03: 3
CWPC01: 4
CWPC02: 1
CWPC03: 3
CWPF01: 4
CWPF02: 1
EWEX01: 1
MFIR01: 1
MFIR02: 1
MFPC01: 1
MFPC02: 1
MFPF01: 1
PWMF02: 1
PWMF03: 1
PWMF04: 1
PWMF05: 1
PWMF06: 1
PWMF07: 1
PWMF08: 1
PWPL01: 12
PWPL02: 1
PWPL03: 1
PWPL04: 1
SRIR01: 1
SRIR02: 1
SRPC01: 1
SRPC02: 1
SRPF01: 1
SRPF02: 1
STIR01: 3
STPC01: 3
STPF01: 3
Number of unique values: 43
In [40]:
i=0
u, c = np.unique(s['Activity_Desc'], return_counts=True)
for z in zip(list(u), list(c)):
    i+=1
    print('{}: {}'.format(z[0], z[1]))
print("Number of unique values: {}".format(i))
Excavation AA: 1
Installing Reinf MF AA: 1
Installing Reinf. Beams L0: 1
Installing Reinf. Col. BO: 1
Installing Reinf. Columns L0: 1
Installing Reinf. Columns L1: 1
Installing Reinf. Con. Walls L0: 1
Installing Reinf. Con. Walls L1: 1
Installing Reinf. Conc.Walls BO: 1
Installing Reinf. MF AA: 1
Installing Reinf. Parapets L0: 1
Installing Reinf. Parapets L2: 1
Installing Reinf. Ramps BO: 1
Installing Reinf. Ret. Walls BO: 1
Installing Reinf. Slabs L0: 1
Installing Reinf. Slabs L1: 1
Installing Reinf. Slabs L2: 1
Installing Reinf. Slabs RF: 1
Installing Reinf. Stairs BO: 1
Installing Reinf. Stairs L0: 1
Installing Reinf. Stairs L1: 1
Installing Reinf. Stairs L2: 1
Installing Reinf.Beams L1: 1
Installing Reinf.Beams L2: 1
Installing Reinf.Beams RF: 1
Installing Reinf.Columns L2: 1
Placing FormWork Beams L0: 1
Placing FormWork Beams L1: 1
Placing FormWork Beams L2: 1
Placing FormWork Beams RF: 1
Placing FormWork Col. BO: 1
Placing FormWork Columns L0: 1
Placing FormWork Columns L1: 1
Placing FormWork Columns L2: 1
Placing FormWork MF AA: 1
Placing FormWork Parapets L0: 1
Placing FormWork Parapets L2: 1
Placing FormWork Ramps BO: 1
Placing FormWork Ret. Walls BO: 1
Placing FormWork Slabs L0: 1
Placing FormWork Slabs L1: 1
Placing FormWork Slabs L2: 1
Placing FormWork Slabs RF: 1
Placing FormWork Stairs BO: 1
Placing FormWork Stairs L0: 1
Placing FormWork Stairs L1: 1
Placing FormWork Stairs L2: 1
Pouring Concrete Beams L0: 1
Pouring Concrete Beams L1: 1
Pouring Concrete Beams L2: 1
Pouring Concrete Beams RF: 1
Pouring Concrete Col. BO: 1
Pouring Concrete Columns L0: 1
Pouring Concrete Columns L1: 1
Pouring Concrete Columns L2: 1
Pouring Concrete Con. Walls L0: 1
Pouring Concrete Con. Walls L1: 1
Pouring Concrete Conc.Walls BO: 1
Pouring Concrete MF AA: 2
Pouring Concrete Parapets L0: 1
Pouring Concrete Parapets L2: 1
Pouring Concrete Ramps BO: 1
Pouring Concrete Ret. Walls BO: 1
Pouring Concrete Slabs L0: 1
Pouring Concrete Slabs L1: 1
Pouring Concrete Slabs L2: 1
Pouring Concrete Slabs RF: 1
Pouring Concrete Stairs BO: 1
Pouring Concrete Stairs L0: 1
Pouring Concrete Stairs L1: 1
Pouring Concrete Stairs L2: 1
Pr.Mat Foundation & Retaining Walls AA: 7
Protection Layers L0 AA: 5
Protection Layers L1 AA: 2
Protection Layers L2 AA: 7
Protection Layers RF AA: 1
Number of unique values: 76
In [41]:
i=0
u, c = np.unique(s['BOQ'], return_counts=True)
for z in zip(list(u), list(c)):
    i+=1
    print('{}: {}'.format(z[0], z[1]))
print("Number of unique values: {}".format(i))
In [42]:
i=0
u, c = np.unique(s['Scope_ConstructionElementType'], return_counts=True)
for z in zip(list(u), list(c)):
    i+=1
    print('{}: {}'.format(z[0], z[1]))
print("Number of unique values: {}".format(i))
PRS_BO_STR_Columns: 3
PRS_BO_STR_ConcreteWall: 2
PRS_BO_STR_Earthwork: 1
PRS_BO_STR_Mat Foundation: 3
PRS_BO_STR_Protection Layer: 7
PRS_BO_STR_Ramp: 3
PRS_BO_STR_Retaining Wall: 3
PRS_BO_STR_Slab: 2
PRS_BO_STR_Stair: 3
PRS_L0_STR_Beam: 3
PRS_L0_STR_Columns: 3
PRS_L0_STR_ConcreteWall: 2
PRS_L0_STR_Parapet: 4
PRS_L0_STR_Protection Layer: 4
PRS_L0_STR_Slab: 3
PRS_L0_STR_Stair: 3
PRS_L1_STR_Beam: 3
PRS_L1_STR_Columns: 3
PRS_L1_STR_ConcreteWall: 2
PRS_L1_STR_Protection Layer: 2
PRS_L1_STR_Slab: 3
PRS_L1_STR_Stair: 3
PRS_L2_STR_Beam: 3
PRS_L2_STR_ConcreteWall: 3
PRS_L2_STR_Parapet: 3
PRS_L2_STR_Protection Layer: 7
PRS_L2_STR_Slab: 3
PRS_L2_STR_Stair: 3
PRS_RF_STR_Beam: 3
PRS_RF_STR_Protection Layer: 1
PRS_RF_STR_Slab: 3
Number of unique values: 31
In [43]:
i=0
u, c = np.unique(s['Scope_ConstructionElementType_BOQ'], return_counts=True)
for z in zip(list(u), list(c)):
    i+=1
    print('{}: {}'.format(z[0], z[1]))
print("Number of unique values: {}".format(i))
PRS_BO_STR_Columns_C30/37forColumns&Con.Walls: 1
PRS_BO_STR_Columns_FormworkforColumns,CW&RW: 1
PRS_BO_STR_Columns_ReinforcementforColumns: 1
PRS_BO_STR_ConcreteWall_C30/37forColumns&Con.Walls: 1
PRS_BO_STR_ConcreteWall_ReinforcementforConcreteWalls: 1
PRS_BO_STR_Earthwork_Excavation: 1
PRS_BO_STR_MatFoundation_C30/37forMatFoundation: 1
PRS_BO_STR_MatFoundation_FormworkforMatFoundation: 1
PRS_BO_STR_MatFoundation_ReinforcementforMatFoundation: 1
PRS_BO_STR_ProtectionLayer_BituminousWPPaint: 1
PRS_BO_STR_ProtectionLayer_C20forBlinding: 1
PRS_BO_STR_ProtectionLayer_C20forLeveling: 1
PRS_BO_STR_ProtectionLayer_C20forShotcrete: 1
PRS_BO_STR_ProtectionLayer_Geotextile: 1
PRS_BO_STR_ProtectionLayer_HDPEDrainageMembrane: 1
PRS_BO_STR_ProtectionLayer_Waterstop: 1
PRS_BO_STR_Ramp_C30/37forSlabs: 1
PRS_BO_STR_Ramp_FormworkforSlabs: 1
PRS_BO_STR_Ramp_ReinforcementforSlabs: 1
PRS_BO_STR_RetainingWall_C30/37forRetainingWalls: 1
PRS_BO_STR_RetainingWall_FormworkforColumns,CW&RW: 1
PRS_BO_STR_RetainingWall_ReinforcementforRet.Walls: 1
PRS_BO_STR_Slab_C30/37forSlabs: 1
PRS_BO_STR_Slab_ReinforcementforSlabs: 1
PRS_BO_STR_Stair_C30/37forStairs: 1
PRS_BO_STR_Stair_FormworkforStairs: 1
PRS_BO_STR_Stair_ReinforcementforStairs: 1
PRS_L0_STR_Beam_C30/37forBeams: 1
PRS_L0_STR_Beam_FormworkforBeams: 1
PRS_L0_STR_Beam_ReinforcementforBeams: 1
PRS_L0_STR_Columns_C30/37forColumns&Con.Walls: 1
PRS_L0_STR_Columns_FormworkforColumns,CW&RW: 1
PRS_L0_STR_Columns_ReinforcementforColumns: 1
PRS_L0_STR_ConcreteWall_C30/37forColumns&Con.Walls: 1
PRS_L0_STR_ConcreteWall_ReinforcementforConcreteWalls: 1
PRS_L0_STR_Parapet_BituminousWPPaint: 1
PRS_L0_STR_Parapet_C30/37forParapets: 1
PRS_L0_STR_Parapet_FormworkforParapets: 1
PRS_L0_STR_Parapet_ReinforcementforParapets: 1
PRS_L0_STR_ProtectionLayer_BituminousWPPaint: 1
PRS_L0_STR_ProtectionLayer_HDPEDrainageMembrane: 1
PRS_L0_STR_ProtectionLayer_PolysterineInsulation(DOW)6cm: 1
PRS_L0_STR_ProtectionLayer_PolystyreneInsulation(DOW)3cm: 1
PRS_L0_STR_Slab_C30/37forSlabs: 1
PRS_L0_STR_Slab_FormworkforSlabs: 1
PRS_L0_STR_Slab_ReinforcementforSlabs: 1
PRS_L0_STR_Stair_C30/37forStairs: 1
PRS_L0_STR_Stair_FormworkforStairs: 1
PRS_L0_STR_Stair_ReinforcementforStairs: 1
PRS_L1_STR_Beam_C30/37forBeams: 1
PRS_L1_STR_Beam_FormworkforBeams: 1
PRS_L1_STR_Beam_ReinforcementforBeams: 1
PRS_L1_STR_Columns_C30/37forColumns&Con.Walls: 1
PRS_L1_STR_Columns_FormworkforColumns,CW&RW: 1
PRS_L1_STR_Columns_ReinforcementforColumns: 1
PRS_L1_STR_ConcreteWall_C30/37forColumns&Con.Walls: 1
PRS_L1_STR_ConcreteWall_ReinforcementforConcreteWalls: 1
PRS_L1_STR_ProtectionLayer_PolysterineInsulation(DOW)6cm: 1
PRS_L1_STR_ProtectionLayer_PolystyreneInsulation(DOW)3cm: 1
PRS_L1_STR_Slab_C30/37forSlabs: 1
PRS_L1_STR_Slab_FormworkforSlabs: 1
PRS_L1_STR_Slab_ReinforcementforSlabs: 1
PRS_L1_STR_Stair_C30/37forStairs: 1
PRS_L1_STR_Stair_FormworkforStairs: 1
PRS_L1_STR_Stair_ReinforcementforStairs: 1
PRS_L2_STR_Beam_C30/37forBeams: 1
PRS_L2_STR_Beam_FormworkforBeams: 1
PRS_L2_STR_Beam_ReinforcementforBeams: 1
PRS_L2_STR_ConcreteWall_C30/37forColumns&Con.Walls: 1
PRS_L2_STR_ConcreteWall_FormworkforColumns,CW&RW: 1
PRS_L2_STR_ConcreteWall_ReinforcementforColumns: 1
PRS_L2_STR_Parapet_C30/37forParapets: 1
PRS_L2_STR_Parapet_FormworkforParapets: 1
PRS_L2_STR_Parapet_ReinforcementforParapets: 1
PRS_L2_STR_ProtectionLayer_Bituminous(asphalt)WaterProof: 1
PRS_L2_STR_ProtectionLayer_BituminousWPPaint: 1
PRS_L2_STR_ProtectionLayer_C20forlightweightConcrete: 1
PRS_L2_STR_ProtectionLayer_NylonVaporBarrier: 1
PRS_L2_STR_ProtectionLayer_PolysterineInsulation(DOW)6cm: 1
PRS_L2_STR_ProtectionLayer_PolystyreneInsulation(DOW)3cm: 1
PRS_L2_STR_ProtectionLayer_Screed: 1
PRS_L2_STR_Slab_C30/37forSlabs: 1
PRS_L2_STR_Slab_FormworkforSlabs: 1
PRS_L2_STR_Slab_ReinforcementforSlabs: 1
PRS_L2_STR_Stair_C30/37forStairs: 1
PRS_L2_STR_Stair_FormworkforStairs: 1
PRS_L2_STR_Stair_ReinforcementforStairs: 1
PRS_RF_STR_Beam_C30/37forBeams: 1
PRS_RF_STR_Beam_FormworkforBeams: 1
PRS_RF_STR_Beam_ReinforcementforBeams: 1
PRS_RF_STR_ProtectionLayer_BituminousWPPaint: 1
PRS_RF_STR_Slab_C30/37forSlabs: 1
PRS_RF_STR_Slab_FormworkforSlabs: 1
PRS_RF_STR_Slab_ReinforcementforSlabs: 1
Number of unique values: 94
In [46]:
i=0
u, c = np.unique(s['Cost Overrrun'], return_counts=True)
for z in zip(list(u), list(c)):
    i+=1
    print('{}: {}'.format(z[0], z[1]))
print("Number of unique values: {}".format(i))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-46-ce44a94148b9> in <module>()
      1 i=0
----> 2 u, c = np.unique(s['Cost Overrrun'], return_counts=True)
      3 for z in zip(list(u), list(c)):
      4     i+=1
      5     print('{}: {}'.format(z[0], z[1]))

~\Anaconda3\lib\site-packages\numpy\lib\arraysetops.py in unique(ar, return_index, return_inverse, return_counts, axis)
    208     ar = np.asanyarray(ar)
    209     if axis is None:
--> 210         return _unique1d(ar, return_index, return_inverse, return_counts)
    211     if not (-ar.ndim <= axis < ar.ndim):
    212         raise ValueError('Invalid axis kwarg specified for unique')

~\Anaconda3\lib\site-packages\numpy\lib\arraysetops.py in _unique1d(ar, return_index, return_inverse, return_counts)
    275         aux = ar[perm]
    276     else:
--> 277         ar.sort()
    278         aux = ar
    279     flag = np.concatenate(([True], aux[1:] != aux[:-1]))
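The traceback is cut off before the error message. A TypeError raised from ar.sort() inside np.unique typically appears when an object column mixes strings with float NaN, because Python 3 cannot order those against each other. Assuming that is the cause here, a minimal sketch of a count that does not need sorting follows; the toy values are made up and are not taken from Schedule.csv.

```python
import numpy as np
import pandas as pd

# Toy column mixing strings and NaN; the values are invented for illustration.
col = pd.Series(['Yes', np.nan, 'No', 'Yes', np.nan], name='Cost Overrrun')

# value_counts hashes the values instead of sorting them, so mixed types are fine;
# dropna=False keeps the missing entries in the tally.
counts = col.value_counts(dropna=False)
for value, count in counts.items():
    print('{}: {}'.format(value, count))
print('Number of unique values: {}'.format(col.nunique(dropna=False)))
```

The same call on the real column would be s['Cost Overrrun'].value_counts(dropna=False).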
In [91]:
x=oldest_Date=s['START'].min()
y=newest_Date=s['START'].max()
x = str(x)
y = str(y)
date1 = x.split(" ")[0]
date2 = y.split(" ")[0]
print("Date range is: {} - {}".format(date1,date2))
In [92]:
x=oldest_Date=s['FINISH'].min()
y=newest_Date=s['FINISH'].max()
x = str(x)
y = str(y)
date1 = x.split(" ")[0]
date2 = y.split(" ")[0]
print("Date range is: {} - {}".format(date1,date2))
In [ ]:
In [53]:
# Fill each missing unit price with TotalCost / Quantity. Writing through .loc
# updates rcd itself and avoids the SettingWithCopyWarning of chained assignment.
missing = rcd.UnitPrice.isna()
rcd.loc[missing, 'UnitPrice'] = rcd.loc[missing, 'TotalCost'] / rcd.loc[missing, 'Quantity']
In [55]:
any(rcd.UnitPrice.isna())
Out[55]:
False
Our approach (an assumption) is to fill the missing values of Height, Length and Thickness with the average of the existing values that share the same BOQCategory.
Example: a missing Height in the BOQ category "Concrete" is filled with the average of the existing Height values in that category.
In [57]:
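# Collect the BOQ categories of the rows whose Height is missing
# (renamed from `list` so the built-in is not shadowed).
missing_categories=[]
for r in range(len(rcd.Height)):
    if np.isnan(rcd.Height[r]):
        missing_categories.append(rcd.BOQCategory[r])
print(np.unique(missing_categories))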
For the next step, we calculate the average Height, Length and Thickness of every category.
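The per-category averages can also be computed in one step with groupby. This is a sketch on toy data, not the notebook's own method: the column names match rcd, but the numbers are made up.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for rcd; Earth Moving has no observed dimensions at all.
demo = pd.DataFrame({
    'BOQCategory': ['Concrete', 'Concrete', 'Formwork', 'Earth Moving'],
    'Height': [1.0, 2.0, np.nan, np.nan],
    'Length': [4.0, np.nan, 6.0, np.nan],
    'Thickness': [0.3, 0.5, 0.2, np.nan],
})

# mean() skips NaN; a category with no observed values yields NaN rather than
# raising a ZeroDivisionError like a hand-written sum / count would.
averages = demo.groupby('BOQCategory')[['Height', 'Length', 'Thickness']].mean().round(2)
print(averages)
```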
In [58]:
1.69
In [59]:
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-59-0b2560786b2d> in <module>()
      8             j=j+1
      9             sum1=sum1+rcd.Height[r]
---> 10 y=round((sum1/j),2)
     11 print(y)
This means that j is zero, i.e. there are no existing Height values in the category Earth Moving.
In [60]:
sum1=0
j=0
for r in range(len(rcd.Height)):
    if rcd.BOQCategory[r]=='Formwork':
        if (np.isnan(rcd.Height[r])==False):
            j=j+1
            sum1=sum1+rcd.Height[r]
heb=round((sum1/j),2)
print(heb)
1.28
In [61]:
sum1=0
j=0
for r in range(len(rcd.Height)):
    if rcd.BOQCategory[r]=='Protection Layers':
        if (np.isnan(rcd.Height[r])==False):
            j=j+1
            sum1=sum1+rcd.Height[r]
hplb=round((sum1/j),2)
print(hplb)
1.87
In [62]:
1.68
Now we fill the missing values with these averages, except for rows with BOQCategory Earth Moving, which we decided to fill with 0.
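The fill rule described above (category average, with 0 for categories that have no observed values) can be sketched without an explicit loop. The toy frame is an assumption for illustration. With the real data the same lines would be applied to rcd for Height, Length and Thickness.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for rcd; the numbers are made up.
demo = pd.DataFrame({
    'BOQCategory': ['Concrete', 'Concrete', 'Concrete', 'Earth Moving'],
    'Height': [1.0, 2.0, np.nan, np.nan],
})

# Per-row category average, aligned with the original index.
category_mean = demo.groupby('BOQCategory')['Height'].transform('mean').round(2)

# Fill gaps with the category average; categories without any observed value
# (Earth Moving here) have a NaN average and fall back to 0.
demo['Height'] = demo['Height'].fillna(category_mean).fillna(0)
print(demo)
```

Because the assignment replaces the whole column, it does not trigger a SettingWithCopyWarning.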
In [63]:
# Writing through .loc updates rcd itself and avoids the SettingWithCopyWarning
# raised by chained assignment such as rcd.Height[r] = ...
for r in range(len(rcd)):
    if np.isnan(rcd.Height[r]):
        if rcd.BOQCategory[r]=='Concrete':
            # x is presumably the Concrete average (1.69) from In [58]; that cell's code is not shown
            rcd.loc[r, 'Height'] = x
        elif rcd.BOQCategory[r]=='Formwork':
            rcd.loc[r, 'Height'] = heb
        elif rcd.BOQCategory[r]=='Protection Layers':
            rcd.loc[r, 'Height'] = hplb
        elif rcd.BOQCategory[r]=='Reinforcement':
            rcd.loc[r, 'Height'] = hrb
        elif rcd.BOQCategory[r]=='Earth Moving':
            rcd.loc[r, 'Height'] = 0
In [64]:
any(rcd.Height.isna())
Out[64]:
False
In [65]:
5.32
In [66]:
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-66-62367c54e052> in <module>()
      8             j=j+1
      9             sum1=sum1+rcd.Length[r]
---> 10 leb=round((sum1/j),2)
     11 print(leb)
In [67]:
5.23
In [68]:
6.43
In [69]:
5.15
In [70]:
# Writing through .loc updates rcd itself and avoids the SettingWithCopyWarning
# raised by chained assignment such as rcd.Length[r] = ...
for r in range(len(rcd)):
    if np.isnan(rcd.Length[r]):
        if rcd.BOQCategory[r]=='Concrete':
            rcd.loc[r, 'Length'] = lcb
        elif rcd.BOQCategory[r]=='Formwork':
            rcd.loc[r, 'Length'] = lfb
        elif rcd.BOQCategory[r]=='Protection Layers':
            rcd.loc[r, 'Length'] = lplb
        elif rcd.BOQCategory[r]=='Reinforcement':
            rcd.loc[r, 'Length'] = lrb
        elif rcd.BOQCategory[r]=='Earth Moving':
            rcd.loc[r, 'Length'] = 0
In [71]:
any(rcd.Length.isna())
Out[71]:
False
In [72]:
0.4
In [73]:
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-73-11c4aa3fe771> in <module>()
      8             j=j+1
      9             sum1=sum1+rcd.Thickness[r]
---> 10 emb=round((sum1/j),2)
     11 print(emb)
In [74]:
sum1=0
j=0
for r in range(len(rcd.Thickness)):
    if rcd.BOQCategory[r]=='Formwork':
        if (np.isnan(rcd.Thickness[r])==False):
            j=j+1
            sum1=sum1+rcd.Thickness[r]
teb=round((sum1/j),2)
print(teb)
0.28
In [75]:
sum1=0
j=0
for r in range(len(rcd.Thickness)):
    if rcd.BOQCategory[r]=='Protection Layers':
        if (np.isnan(rcd.Thickness[r])==False):
            j=j+1
            sum1=sum1+rcd.Thickness[r]
tplb=round((sum1/j),2)
print(tplb)
0.12
In [76]:
sum1=0
j=0
for r in range(len(rcd.Thickness)):
    if rcd.BOQCategory[r]=='Reinforcement':
        if (np.isnan(rcd.Thickness[r])==False):
            j=j+1
            sum1=sum1+rcd.Thickness[r]
trb=round((sum1/j),2)
print(trb)
0.39
In [77]:
# Use the Thickness averages here; the original cell reused the Length averages
# (lcb, lfb, lplb, lrb). Writing through .loc avoids the SettingWithCopyWarning.
for r in range(len(rcd)):
    if np.isnan(rcd.Thickness[r]):
        if rcd.BOQCategory[r]=='Concrete':
            # tcb is an assumed name for the Concrete average (0.4) printed in In [72];
            # that cell's code is not shown
            rcd.loc[r, 'Thickness'] = tcb
        elif rcd.BOQCategory[r]=='Formwork':
            rcd.loc[r, 'Thickness'] = teb
        elif rcd.BOQCategory[r]=='Protection Layers':
            rcd.loc[r, 'Thickness'] = tplb
        elif rcd.BOQCategory[r]=='Reinforcement':
            rcd.loc[r, 'Thickness'] = trb
        elif rcd.BOQCategory[r]=='Earth Moving':
            rcd.loc[r, 'Thickness'] = 0
In [78]:
any(rcd.Thickness.isna())
Out[78]:
False
Now we find out what is used for building walls in the schedule table: we collect the ConstructionElementType of every row whose BOQ mentions walls.
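The same lookup can be written with a vectorized string match. This is a sketch on toy rows, not the notebook's own cell: the column names match the schedule table, but the BOQ texts are made up for illustration.

```python
import pandas as pd

# Toy rows standing in for the schedule table s.
demo = pd.DataFrame({
    'ConstructionElementType': ['ConcreteWall', 'Beam', 'Retaining Wall', 'ConcreteWall'],
    'BOQ': ['Reinforcement for Concrete Walls', 'Formwork for Beams',
            'C30/37 for Retaining Walls', 'C30/37 for Columns & Con. Walls'],
})

# Case-insensitive substring match; na=False treats missing BOQ entries as non-matches.
mask = demo['BOQ'].str.contains('walls', case=False, na=False)
wall_types = sorted(demo.loc[mask, 'ConstructionElementType'].unique())
print(wall_types)
```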
In [79]:
def contains_walls(x):
    return 'walls' in x.lower()
In [80]:
list1=[]
for i in range(len(s.BOQ)):
    if contains_walls(s.BOQ[i]):
        list1.append(s.ConstructionElementType[i])
print(np.unique(list1))
We split the variable Quantity into two groups: low if the quantity is below 10, high if it is 10 or more. The labels are stored in a new column called BinQuantity.
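The binning rule can also be expressed in a single vectorized step with np.where. This is a sketch on toy quantities: the threshold of 10 comes from the text above, and the numbers are made up.

```python
import numpy as np
import pandas as pd

# Toy quantities standing in for rcd['Quantity'].
demo = pd.DataFrame({'Quantity': [0.5, 9.99, 10.0, 250.0]})

# 'high' at 10 or above, 'low' otherwise; a missing quantity compares as False
# and therefore ends up in 'low', the same as in a row-by-row loop.
demo['BinQuantity'] = np.where(demo['Quantity'] >= 10.0, 'high', 'low')
print(demo)
```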
In [81]:
BinQuantity=[]
for r in range(len(rcd.Quantity)):
    if rcd.Quantity[r]>=10.0:
        BinQuantity.append('high')
    else:
        BinQuantity.append('low')
rcd['BinQuantity'] =BinQuantity
In [82]:
rcd.head()
Out[82]:
        Scope  Construction Element Type     ID  Construction Element Family  ConstructionElementPart  BOQCategory  B...
0  PRS_RF_STR                       Beam  47933                 ConcreteWork         RC_Beam C-C/C-CW     Concrete  C30... Bea...
1  PRS_RF_STR                       Beam  47951                 ConcreteWork         RC_Beam C-C/C-CW     Concrete  C30... Bea...
2  PRS_RF_STR                       Beam  47942                 ConcreteWork         RC_Beam C-C/C-CW     Concrete  C30... Bea...
3  PRS_RF_STR                       Beam  47960                 ConcreteWork         RC_Beam C-C/C-CW     Concrete  C30... Bea...
4  PRS_L2_STR                       Beam  46084                 ConcreteWork         RC_Beam C-C/C-CW     Concrete  C30... Bea...
In [94]:
In [95]:
In [96]:
labels = pd.value_counts(rcd.BOQCategory).keys()
sizes = pd.value_counts(rcd.BOQCategory)
plt.title('BOQ Category')
plt.legend(labels, bbox_to_anchor=(1, 1.05))
Out[96]:
<matplotlib.legend.Legend at 0x1449ec0fe10>
In [97]:
# Statistical information:
# print('Average age: {:.2f}%'.format(df.horsepower.mean()*100))
# print('Standard deviation: {:.2f}%'.format(df.horsepower.std()*100))
# print('Skewness: {:.2f}%'.format(df.horsepower.skew()*100))
# print('Kurtosis: {:.2f}%'.format(df.horsepower.kurtosis()*100))
# Distplot:
ax = sns.distplot(rcd.Length )
# Auxiliary information:
mn = rcd.Length.mean()
mx = ax.lines[0].get_ydata().max()
# Title:
ax.set_title('Length in the DataFrame')
# Annotation:
plt.annotate('mean', [mn, mx], xytext=[mn*1.1, mx*1.1], fontsize=10,
arrowprops=dict(arrowstyle="->", connectionstyle="arc3,rad=.2", color=
Out[97]:
Text(5.6112,0.199828,'mean')
In [110]:
#plt.scatter(x=Q,y=T)
rcd[rcd.Quantity < 3000]
#[Q<5000,T<50000]
sns.regplot(rcd[(rcd.Quantity < 3000) & (rcd.TotalCost < 30000)].Quantity,rcd[(rcd.Q
Out[110]:
<matplotlib.axes._subplots.AxesSubplot at 0x144a25eb0b8>
In [ ]:
In [118]:
Out[118]:
Text(0.5,1,'pie BOQ')
In [126]:
rcd[rcd.TotalCost<1000]
sns.boxplot(rcd[rcd.TotalCost < 1000].BOQCategory,rcd[rcd.TotalCost < 1000].TotalCos
plt.title("TotalCost-BOQCategory")
rcd[rcd.TotalCost<1000]
rcd[rcd.TotalCost<1000]
sns.barplot(rcd[rcd.TotalCost < 1000].BOQCategory,rcd[rcd.TotalCost < 1000].TotalCos
plt.title("TotalCost-BOQCategory")
Out[126]:
Text(0.5,1,'TotalCost-BOQCategory')
In [106]:
Out[106]:
Text(0.5,1,'TotalCost vs BOQCategory')
In [ ]:
In [ ]: