Condition Survey Data Warehouse: Analysing Data For Component Life Estimation
P Giblin
Building Performance Group, London, United Kingdom

Summary: During the 1990s, the British Government required all providers of social housing to
carry out surveys on their stock to ascertain its condition and maintenance requirements. As a
result, a large body of data has been collected nationally. This study investigates the feasibility of
analysing the data to estimate components’ lives.

A data warehouse was built using data from one of the surveying firms involved in collecting the
data (Building Performance Group). The dataset was processed to reduce ambiguity and
standardise repair descriptions. Window and roof replacement data were extracted and the years
to renew the components were analysed.

The results seem to be broadly in line with existing sources of durability information and indicate
that collating data from different sources can give a good estimate of the service lives of
components and assemblies.

Keywords. Data warehouse, stock condition survey, durability, data mining.

1 INTRODUCTION
During the 1990s, the British Government ordered all providers of social housing, such as local authorities and housing
associations, to carry out surveys of their housing stock to assess funding requirements for repairs. These surveys were carried
out directly by the local authority or by consultant surveying firms.
Building Performance Group has undertaken many stock condition surveys over this period. The collected data represent
the aggregated estimates of repair needs and times as perceived by a number of surveyors.
Compiling the data from these surveys provided a large data set that could be analysed to determine component lives. Six
surveys were chosen to test the hypothesis that condition survey data could be used to estimate component lives. These
covered a number of different dwelling types such as flats, houses and high-rise buildings.
2 METHOD

2.1 Data warehouse concept


A data warehouse is an interface for end-users to access specially prepared data to be used with decision support systems or
executive information systems. It is typically a collection of data from a number of databases, categorised and denormalised
into a flat-file database (table). In this project, the flat-file table was used to extract the relevant information.

2.2 Software used


The prototype used Microsoft Access as both the database management system (DBMS) and the user interface, with Minitab
producing the statistical output. MS Access was chosen because it is in widespread use in the organisation. Minitab was
used to get a feel for the type of reports required without writing routines in a dedicated report package such as
Crystal Reports.

2.3 Databases
The survey data were stored in a number of formats: SQL Server (mdf), dBase (dbf) and MS Access (mdb) file types. Open
Database Connectivity (ODBC) was used to link the tables so that the data could be processed and cleaned and new tables made.
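
As an illustration of consolidating linked sources into one flat table, the following is a minimal sketch only. It uses Python's built-in sqlite3 module as a stand-in for the ODBC-linked Access tables, and every table name, column name and value in it is hypothetical rather than taken from the project databases.

```python
import sqlite3

# In-memory stand-in for two linked survey sources. The real project linked
# SQL Server, dBase and MS Access tables through ODBC; all names and values
# here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE survey_a (property_id TEXT, repair_desc TEXT, repair_year INTEGER);
    CREATE TABLE survey_b (prop_ref TEXT, description TEXT, year_due INTEGER);
    INSERT INTO survey_a VALUES ('A001', 'Replace windows - PVCu', 2005);
    INSERT INTO survey_b VALUES ('B017', 'renew wndw softwood', 2003);
""")

# Build one flat table with a common column layout, tagging each row with
# the survey (project) it came from. The source tables are left unchanged.
conn.execute("""
    CREATE TABLE flat_repairs AS
    SELECT 'survey_a' AS project_id, property_id, repair_desc, repair_year
    FROM survey_a
    UNION ALL
    SELECT 'survey_b', prop_ref, description, year_due
    FROM survey_b
""")

flat_rows = conn.execute(
    "SELECT project_id, property_id, repair_year FROM flat_repairs ORDER BY project_id"
).fetchall()
print(flat_rows)
```

Copying rows into a new table, rather than editing the sources, reflects the paper's concern with keeping the original survey tables intact.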

9DBMC-2002 Paper 080 Page 1



[Diagram: SQL Server, Access and dBase databases feed a 'Cleanse data' step, which populates the data mart.]

Figure 1. Data warehouse architecture for condition survey data access

2.4 Data warehouse


The warehouse database was used to query the survey databases and build a new flat-file table containing the data
fields described in Section 2.6. This table held the original (unchanged) data and served as a 'look up' table, both to
speed up processing and to ensure that the integrity of the original tables was maintained.
The flat-file table was copied for further data cleansing and processing. This proved to be a time-consuming
exercise. In particular, repair descriptions needed to be standardised. There were numerous ways of describing a
particular repair, and these were reduced to a generic component-type repair field and a material field. For example,
‘repaint external door’ and ‘gloss paint to external door’ were both altered to ‘decorate external door’.
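
The standardisation step can be pictured as a lookup from free-text descriptions to a standard repair and a material. The sketch below is an assumption about how such a lookup might be written, not the procedure actually used (the paper relied on search and replace in MS Access and manual inspection). Only the two door descriptions come from the text; the window entry and all function names are hypothetical.

```python
# Hypothetical lookup table: free-text description -> (standard repair, material).
# Only the two external door entries are taken from the paper.
STANDARD_REPAIRS = {
    "repaint external door": ("decorate external door", None),
    "gloss paint to external door": ("decorate external door", None),
    "renew pvcu windows": ("replace windows", "PVCu"),  # illustrative entry
}


def standardise(description):
    """Return (standard repair, material), or None if the text is not recognised."""
    key = " ".join(description.lower().split())  # normalise case and whitespace
    return STANDARD_REPAIRS.get(key)


raw = ["Repaint  external door", "Gloss paint to external door", "misc works"]
cleaned = [standardise(text) for text in raw]
# Unrecognised descriptions come back as None and would be discarded or reviewed by hand.
usable = [item for item in cleaned if item is not None]
print(usable)
```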

2.5 Components analysed


Because of the amount of data cleansing in the repair description field, it was decided to use a subset of the
components to test the system. External windows and pitched roof coverings were chosen as typical cases. This
paper discusses the data and analysis on external windows.

2.6 Survey data


The data present in the database consist of a property identifier, post code, year band of construction of the
property, Building Cost Information Service (BCIS) code (BCIS, 2001), repair and replacement requirements as
determined by the surveyor, quantities and the year when the repair should be undertaken. The fields selected for
inclusion in the data mart were: construction year band, repair year, survey year, post code, project identifier,
property identifier, BCIS code, repair description, repair quantity and property type. The BCIS Standard Form of
Cost Analysis for Building Projects is used throughout the UK to provide data which allow comparisons to be
made between the cost of achieving various building functions in one project and that of achieving equivalent
functions in other projects.
The quantity information was not used in this exercise because it was felt that it could skew the results. Each
record therefore represents one individual observation by the surveyor. The quantity data may be used for other analyses.

2.7 Data representation

Construction year band


The dates of construction were represented by the following year bands.
Pre 1920
1920-1935
1936-1945
1946-1960
1961-1980
Post 1981


In order to determine the years from construction to component renewal, it was necessary to represent the year bands by a
specific year.
An adjusted construction year field was also added to provide an estimate for prior renewal of components. For
example, if the construction year was earlier than 1975 and the windows are PVCu, the adjusted construction year is 1985;
otherwise the adjusted construction year is equal to the construction year. The adjustments are described in the figure
notes where appropriate.
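
The calculation described in this section can be sketched as follows. Two points are assumptions rather than statements from the paper: the representative year of a band is taken as its midpoint (the paper does not say which year was used), and years to renew is taken as the repair year minus the adjusted construction year. The function names are hypothetical. The PVCu rule follows the wording above (earlier than 1975); the note to Figure 8 states the threshold as <= 1975.

```python
def representative_year(band):
    """Midpoint of a closed band such as '1946-1960'.

    Assumption: the paper does not state which year represented each band.
    """
    start, end = (int(part) for part in band.split("-"))
    return (start + end) // 2


def adjusted_construction_year(construction_year, window_material):
    """Apply the PVCu adjustment quoted in the text."""
    if window_material == "PVCu" and construction_year < 1975:
        return 1985  # windows assumed to have been renewed in 1985
    return construction_year


def years_to_renew(repair_year, construction_year, window_material):
    """Assumed definition: repair year minus adjusted construction year."""
    return repair_year - adjusted_construction_year(construction_year, window_material)


built = representative_year("1946-1960")  # midpoint of the band
print(years_to_renew(2003, built, "PVCu"))      # measured from the assumed 1985 renewal
print(years_to_renew(2003, built, "Softwood"))  # measured from construction
```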

2.8 Data processing


The data was collected over a number of surveys, in various locations in the United Kingdom over a period of 5 years. As a
result, component and repair descriptions varied widely, particularly where the site surveyors recorded their results in free text
rather than choosing from a predefined list. This was more marked where the data was collected using paper forms rather than
electronically.
A repair record included a BCIS code. While this is an industry standard, it is a high-level code and does not adequately
describe repairs. The data were filtered by BCIS code and the repair descriptions were then standardised in two passes: first
by using the search and replace capabilities of MS Access to insert standard repair descriptions such as 'Replace windows -
PVCu', and then by inspecting the remaining individual records and replacing the repair data with the standardised text.
Where ambiguity still existed after this exercise, the record was discarded. From a total of 250,000 records, 150,000 were
considered usable for data mining. The discarded records were either ambiguous or appeared to be incorrect entries.
The database was then filtered for window data, using the BCIS coding system and filtering for 'window', 'wndw' and so on in
the repair data field. This resulted in 6590 records from 3014 properties where window replacement was unambiguously
described.
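
The window filter described above might look like the following sketch. The regular expression, the field names and the BCIS code value are all assumptions made for illustration; in particular, "WINDOW-ELEMENT" is a placeholder, since the text does not give the BCIS code used.

```python
import re

# Matches 'window', 'windows', 'wndw' or 'wndws' as whole words, in any case.
WINDOW_PATTERN = re.compile(r"\b(?:window|wndw)s?\b", re.IGNORECASE)

# Placeholder only: the BCIS element code for windows is not given in the text.
WINDOW_BCIS_CODES = {"WINDOW-ELEMENT"}


def is_window_record(record):
    """True if the record has a window BCIS code and its description mentions windows."""
    return (
        record["bcis_code"] in WINDOW_BCIS_CODES
        and WINDOW_PATTERN.search(record["repair_desc"]) is not None
    )


records = [
    {"property_id": "P1", "bcis_code": "WINDOW-ELEMENT", "repair_desc": "Replace windows - PVCu"},
    {"property_id": "P2", "bcis_code": "WINDOW-ELEMENT", "repair_desc": "renew wndw softwood"},
    {"property_id": "P3", "bcis_code": "WINDOW-ELEMENT", "repair_desc": "decorate external door"},
]
window_records = [r for r in records if is_window_record(r)]
print([r["property_id"] for r in window_records])
```

Requiring both the code and the keyword reflects the point made above that the BCIS code alone is too high-level to identify a window repair.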
3 RESULTS FOR WINDOWS
Window Frame Material    Count    Percentage of Total (%)
Aluminium                  354     5.37
Hardwood                   108     1.64
PVCu                       242     3.67
Softwood                  5559    84.36
Steel                      327     4.96
N =                       6590

Table 1: Sample size and percentages of window types
Descriptive Statistics
Variable: Years To Renew

Anderson-Darling Normality Test
A-Squared: 319.789
P-Value: 0.000

Mean: 41.9698
StDev: 19.8913
Variance: 395.662
Skewness: 1.46547
Kurtosis: 1.79333
N: 6590

Minimum: 6.000
1st Quartile: 28.000
Median: 36.000
3rd Quartile: 50.000
Maximum: 128.000

95% Confidence Interval for Mu: 41.489 to 42.450
95% Confidence Interval for Sigma: 19.557 to 20.237
95% Confidence Interval for Median: 36.000 to 36.000

Figure 2. All windows.
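
As a worked check on the summary statistics for all windows, the 95% confidence interval for the mean can be approximated from the reported mean, standard deviation and sample size. The sketch below uses the normal approximation (z = 1.96), which is an assumption made here for simplicity; statistical packages typically use the t distribution, so the last decimal place may differ slightly.

```python
import math

# Values reported for all windows (Figure 2).
n = 6590
mean = 41.9698
stdev = 19.8913

z = 1.96  # normal approximation to the 95% critical value (an assumption)
half_width = z * stdev / math.sqrt(n)  # z times the standard error of the mean

ci_low = mean - half_width
ci_high = mean + half_width
print(f"95% CI for the mean: {ci_low:.3f} to {ci_high:.3f}")
```

The result agrees with the reported interval of 41.489 to 42.450 to within about 0.001, which suggests the interval is the usual one based on the standard error of the mean.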


Descriptive Statistics
Variable: Years To Renew

Anderson-Darling Normality Test
A-Squared: 273.170
P-Value: 0.000

Mean: 41.8120
StDev: 20.1995
Variance: 408.021
Skewness: 1.39353
Kurtosis: 1.50736
N: 5559

Minimum: 6.000
1st Quartile: 28.000
Median: 36.000
3rd Quartile: 53.000
Maximum: 128.000

95% Confidence Interval for Mu: 41.281 to 42.343
95% Confidence Interval for Sigma: 19.831 to 20.582
95% Confidence Interval for Median: 34.000 to 36.000

Figure 3. All softwood windows


The distribution is positively skewed: most values are concentrated at the lower end, with a long tail towards the right.
This is possibly due to the lack of information on component renewal after construction. However, there seem to be four
distinct peaks, which may indicate softwood windows of differing quality.

Descriptive Statistics
Variable: Years To Renew

Anderson-Darling Normality Test
A-Squared: 76.881
P-Value: 0.000

Mean: 30.6844
StDev: 8.0351
Variance: 64.5632
Skewness: 1.18170
Kurtosis: 2.18589
N: 3631

Minimum: 6.0000
1st Quartile: 26.0000
Median: 29.0000
3rd Quartile: 34.0000
Maximum: 59.0000

95% Confidence Interval for Mu: 30.4229 to 30.9458
95% Confidence Interval for Sigma: 7.8545 to 8.2243
95% Confidence Interval for Median: 29.0000 to 30.0000

Figure 4. Softwood windows. Date of construction >=1970


Descriptive Statistics
Variable: Years To Renew

Anderson-Darling Normality Test
A-Squared: 11.447
P-Value: 0.000

Mean: 83.6204
StDev: 24.0061
Variance: 576.294
Skewness: -1.15945
Kurtosis: -0.15
N: 108

Minimum: 31.000
1st Quartile: 69.000
Median: 97.000
3rd Quartile: 97.000
Maximum: 112.000

95% Confidence Interval for Mu: 79.041 to 88.200
95% Confidence Interval for Sigma: 21.176 to 27.717
95% Confidence Interval for Median: 92.000 to 97.000

Figure 5. Hardwood windows


Although the sample is relatively small, the mean is very high, which may indicate that the windows had already been
replaced at some time after construction.

Descriptive Statistics
Variable: Years To Renew

Anderson-Darling Normality Test
A-Squared: 1.446
P-Value: 0.001

Mean: 36.3955
StDev: 9.5574
Variance: 91.3446
Skewness: 0.487065
Kurtosis: 0.622464
N: 354

Minimum: 19.0000
1st Quartile: 29.0000
Median: 37.0000
3rd Quartile: 42.0000
Maximum: 70.0000

95% Confidence Interval for Mu: 35.3964 to 37.3945
95% Confidence Interval for Sigma: 8.9014 to 10.3186
95% Confidence Interval for Median: 35.0000 to 37.0000

Figure 6. Aluminium windows.


It was assumed that if the construction year <= 1960, then the windows were renewed in 1970.


Descriptive Statistics
Variable: Years To Renew

Anderson-Darling Normality Test
A-Squared: 14.027
P-Value: 0.000

Mean: 37.9235
StDev: 6.9836
Variance: 48.7702
Skewness: 0.105785
Kurtosis: 3.40473
N: 327

Minimum: 18.0000
1st Quartile: 35.0000
Median: 39.0000
3rd Quartile: 40.0000
Maximum: 74.0000

95% Confidence Interval for Mu: 37.1638 to 38.6833
95% Confidence Interval for Sigma: 6.4862 to 7.5642
95% Confidence Interval for Median: 39.0000 to 39.0000

Figure 7. Steel windows.


It was assumed that if the construction year <= 1955, then the windows were renewed in 1965.

Descriptive Statistics
Variable: Years To Renew

Anderson-Darling Normality Test
A-Squared: 8.880
P-Value: 0.000

Mean: 40.6281
StDev: 10.2152
Variance: 104.351
Skewness: 0.649125
Kurtosis: 0.548616
N: 242

Minimum: 13.0000
1st Quartile: 37.0000
Median: 38.0000
3rd Quartile: 45.0000
Maximum: 63.0000

95% Confidence Interval for Mu: 39.3346 to 41.9216
95% Confidence Interval for Sigma: 9.3790 to 11.2165
95% Confidence Interval for Median: 37.2792 to 40.0000

Figure 8. PVCu windows.


It was assumed that if the construction year <= 1975, then the windows were renewed in 1985.


4 DISCUSSION
The descriptive statistics figures show, for each window type, the histogram of years to renew by number of records, with a
fitted normal curve. The Anderson-Darling P-value is low (0.001 or less) for every window type, indicating that none of the
distributions is normal.
The component lives seem to be higher than those indicated by Ahluwalia and Shackford (1993), who state the life expectancy
of wood or aluminium casements to be 10-20 years, and by the HAPM Component Life Manual (HAPM Publications Ltd, 2000)
(0-35 years), but seem to be broadly in line with the estimated service life of components (ESLC) as determined by the
factoring method (BS ISO 15686-1:2000). While not investigated in this preliminary study, there is scope to investigate the
effect of environmental conditions on components using the demographic and environmental information available for
post codes.
5 RECOMMENDATIONS

5.1 Classification system


The use of a common code to categorise component type and material is essential to ease data analysis. Cleaning the repair
description data is too time-consuming for a data warehouse and, as stated previously, the BCIS code is not sufficiently
‘low level’ for this categorisation.
An alternative coding system such as the Uniclass classification system provides a suitable level for this application. For
instance, softwood windows in external walls can be represented succinctly by the code G251:G321, which comprises the
codes for External Wall (G251) and Windows (G321). It is recommended that this classification is added to the data capture
software or inserted as part of post-survey data processing. A publicly available classification system also enables the data
to be linked to other databases, such as the HAPM component database, for comparison against the insured life of similar
components (HAPM Publications Ltd, 2000).
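
To illustrate how a composite code of this kind could be read by software, the sketch below splits a colon-separated code into its parts and looks each one up. The lookup table contains only the two codes quoted above, and the function name is hypothetical; a real implementation would load the full published classification tables.

```python
# Only the two codes quoted in the text; a real table would be far larger.
UNICLASS_ELEMENTS = {
    "G251": "External Wall",
    "G321": "Windows",
}


def describe(composite_code):
    """Split a colon-separated composite code and look up each part."""
    return [UNICLASS_ELEMENTS[part] for part in composite_code.split(":")]


print(describe("G251:G321"))
```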

5.2 Repair and renewal history


Unfortunately the date of renewal of a component was not available to the survey teams. It may be that this information is held
on the local authority or housing association databases and should be requested at the start of a survey. While this information
is generally not needed for a condition survey, it would enable much more accurate results on component durability to be
obtained.

5.3 Data sharing


Data from a large number of stock condition surveys exist nationally. All too often, once the data have been processed and
the results delivered to the client, the database goes unused.
A national condition survey data repository (similar to the BCIS scheme for construction and maintenance costs) would ensure
that the data are made available to interested parties. For confidentiality, identifying fields such as addresses could be
removed, and the system could be made available to participants and subscribers.
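
Removing identifying fields before sharing could be as simple as the following sketch. The choice of which fields count as identifying is an assumption made for illustration, as are the field names and the sample record; a real repository would need an agreed list.

```python
# Hypothetical list of fields to withhold from a shared repository.
IDENTIFYING_FIELDS = {"address", "occupier_name"}


def anonymise(record):
    """Return a copy of the record without the identifying fields."""
    return {key: value for key, value in record.items() if key not in IDENTIFYING_FIELDS}


record = {
    "address": "1 Example Street",
    "construction_year_band": "1946-1960",
    "repair_desc": "Replace windows - PVCu",
    "repair_year": 2005,
}
shared = anonymise(record)
print(shared)
```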
6 CONCLUSIONS
Data about quality were not collected, so comparisons of components of varying standards cannot be made.
The component lives seem to be higher than those indicated by Ahluwalia and Shackford (1993) but broadly in line with
component lives estimated using the factoring method.
The time taken to clean the data vastly exceeded expectations. While it was expected that data cleaning would be
time-consuming, it turned out to be an extremely onerous task. A large number of records were unusable because of
missing or ambiguous data: 150,000 useful records were retrieved from the original 250,000 records for all repair and
replacement types.
The results seem to be broadly in line with existing sources of durability information and indicate that collating data
from different sources can give a good estimate of the service lives of components and assemblies.
7 REFERENCES
1. Ahluwalia, G. and Shackford, A. (1993) Life Expectancy of Housing Components. Journal of Housing Economics.
2. BCIS (2001) Standard Form of Cost Analysis, http://www.bcis.co.uk/sfca.html
3. BS ISO 15686-1:2000 (2000) Buildings and constructed assets - Service life planning.
4. HAPM Publications Ltd (2000) HAPM Component Life Manual, London: E & F N Spon, ISBN 0419249109
