
Making Statistical Data More Available

Author(s): Bo Sundgren
Source: International Statistical Review / Revue Internationale de Statistique, Vol. 64, No. 1 (Apr., 1996), pp. 23-38
Published by: International Statistical Institute (ISI)
Stable URL: http://www.jstor.org/stable/1403422
Accessed: 01-02-2016 18:58 UTC


Wiley and International Statistical Institute (ISI) are collaborating with JSTOR to digitize, preserve and extend access to
International Statistical Review / Revue Internationale de Statistique.

http://www.jstor.org

This content downloaded from 136.145.187.76 on Mon, 01 Feb 2016 18:58:20 UTC
All use subject to JSTOR Terms and Conditions

International Statistical Review (1996), 64, 1, 23-38, Printed in Mexico
International Statistical Institute

Making Statistical Data More Available

Bo Sundgren
Statistics Sweden, S-115 81 Stockholm, Sweden

Summary
Will statistical offices be able to meet new challenges from the users to make statistical data more
available by means of modern technology? Can they do this within existing budget restrictions, and with
due consideration to the interests of data providers? These are questions addressed here. Problems and
opportunities are illustrated by examples from Sweden.

Key words: Statistics production; Official statistics; Data dissemination; Metadata; Standard interfaces; Standardized software; System development; Statistical databases; Statistical information systems; Confidentiality.

1 New Challenges for Statistics Producers
Statistics producers in national statistical offices are facing new expectations, demands, and
requirements from several directions:
* from statistics users, who want faster, easier, and less expensive access to statistical data through media and routines that are better adapted to their own processing needs;
* from data providers, who demand less burdensome reporting - through media and routines
that are better adapted to their own information systems;
* from governments and tax-payers, who want "more value for less money";
* from international organisations, requesting member countries to provide timely, comparable,
good quality statistics, which comply with international standards.
Technological progress is taking place as rapidly as ever. All the above-mentioned stake-holders
in statistics production expect statistics producers to take full advantage of advances in technology.
This paper will discuss how statistics producers can respond to some of the challenges. The paper
focuses on how statistical offices can make statistical data more available to statistics users, while
satisfying restrictions given by scarce resources and the willingness of data providers to co-operate.

2 User-Orientation and User-Friendliness

There is a need to review the concepts of user-orientation and user-friendliness. It has become a
widely accepted dogma that information should be user-oriented and user-friendly. All information
system designers pay lip service to this dogma. To be fair, most designers sincerely believe they are
developing systems characterised by user-orientation and user-friendliness, although they have long
since stopped thinking more deeply about the meaning of these concepts.
In the early ages of computer usage, that is in the 1960's, the direct user of a computer had to
be a computer programmer. Since most computer applications in those days were mathematically
oriented (as suggested by the word computer itself), it meant a step forward from the user's point of
view, when the user/mathematician could communicate with the computer by means of mathematical
formulae (like in FORTRAN) rather than having to program in machine code or assembler languages.
The programming language COBOL meant a similar step forward for users/programmers oriented
towards administrative applications.
In a statistical office there are numerous information systems applications of more or less the same
kind: statistics production. As systematised by figure 1, a statistical production process includes a
number of very typical functions like frame administration, sampling, data collection, data entry,
coding, editing, estimation, tabulation, analysis, and presentation. In the late 1960's there were few
other organisations, if any, which had a similar opportunity to exploit economies of scale in the
development of computer applications. Thus, not surprisingly, statistical offices became pioneers in
the development of generalised software. These software products often supported high-level, non-procedural command languages, which enabled non-programmers to develop applications within a
certain application area by simply specifying
(i) the input data to the application, e.g. a so-called flat file with a certain record layout; and
(ii) the requested output from the application, e.g. a statistical table with certain contents
and a certain layout.
The variability of applications developed with tools of this type has to be relatively limited. This
condition is satisfied by the functions corresponding to production steps of a typical statistical survey.
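The two-part specification described above, an input record layout plus a requested output table, can be illustrated with a minimal sketch. The layout, the field names, the table request, and the sample records below are all hypothetical and are not taken from any actual generalised software product; the point is only that the "application" consists of two declarations, while the procedural work is done by a general-purpose routine.

```python
from collections import Counter

# Hypothetical record layout: field name -> (start, end) column positions.
LAYOUT = {"region": (0, 2), "sex": (2, 3), "age": (3, 6)}

# Hypothetical table request: count records by region (rows) and sex (columns).
TABLE_SPEC = {"rows": "region", "columns": "sex"}


def parse_record(line, layout):
    """Cut one fixed-width line into named fields according to the layout."""
    return {name: line[start:end].strip() for name, (start, end) in layout.items()}


def tabulate(lines, layout, spec):
    """Count records per (row value, column value) cell of the requested table."""
    counts = Counter()
    for line in lines:
        record = parse_record(line, layout)
        counts[(record[spec["rows"]], record[spec["columns"]])] += 1
    return counts


# Hypothetical flat file with one record type (illustrative data only).
FLAT_FILE = ["01F034", "01M051", "02F027", "01F062"]
table = tabulate(FLAT_FILE, LAYOUT, TABLE_SPEC)
for (region, sex), n in sorted(table.items()):
    print(region, sex, n)
```

A different table would, under these assumptions, require only a different `TABLE_SPEC`, not a new program.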
The high-level, non-procedural command languages represented a certain degree of end-user orientation in a computing environment that was based upon mainframe computer centres operated as
closed shops and in batch mode. In the early 1970's user-orientation and user-friendliness became
more or less synonymous with person/computer interaction through menu-driven information systems. Certainly these systems helped to bridge the gap between the computer and its non-programmer
end-users. Nevertheless it was still very much the computer that controlled the user rather than the
other way around. The user could choose his route through the hierarchy implied by the menus of
the menu-driven system, but he could not affect the hierarchy as such, and he had to go through the
hierarchy level by level in a rather rigid way.
The introduction of powerful, inexpensive micro-computers in the beginning of the 1980's added
several new dimensions to the concepts of user-orientation and user-friendliness. First of all the
new technology meant that the closed mainframe shops could be closed for good as far as many of
the users were concerned. The users suddenly found themselves in control of computer resources
in much the same way as they already were in control of other resources necessary for their daily
work. The computer became demystified. Furthermore, the new technology finally enabled the user
to take control of the computer rather than the other way around. This possibility materialised in the
windowing techniques pioneered by Xerox, followed up by Apple, and successfully mass-marketed
by Microsoft.
Today practically every user of statistics is a user of computers as well. He has his own computer
in the office, at home, and when travelling. He demands to choose whatever software he prefers to
retrieve, process, and analyse statistical data. Through standardised network services (in his own
office as well as world-wide) he is able to communicate and co-operate with other human beings and
other computers, and he is able to do this very much on his own conditions.
Naturally, in this situation there is not - and cannot be - a single concept of user-orientation and
user-friendliness. Different users have different needs, different resources, and different preferences.
There is indeed a wide variety of user profiles, as suggested by figure 2. It would be futile for
a statistical office to try and satisfy all these different requirements with one and the same notion
of user-orientation and user-friendliness. On the other hand, it would be equally futile to try and
tailor specific products and services for each potential user of statistics. The challenge for a modern
statistical office is to offer a multitude of products and services ranging from
* simple free-of-charge products based on self-service; over
* standard, off-the-shelf product/service packages charged according to price-lists; to
* sophisticated, tailor-made services provided to individual customers on the basis of tenders.

[Figure 1: diagram of a statistical information system with three main functions. INPUT ACQUISITION includes survey preparation, frame preparation, sampling, data collection, data entry, coding, and data editing, ending in the final observation register. AGGREGATION includes statistical modelling, point estimations, estimation of sampling errors, estimation of other quality characteristics, and other estimations and analyses. OUTPUT DELIVERY includes presentation (tables, graphs, other presentations) through traditional publications, online databases, and other electronic media.]
Figure 1. A functionally oriented model of a statistical information system.

3 Standard Interfaces: Decreased Complexity and Increased Flexibility

It is a challenge for a modern statistical office to be responsive to expectations, demands, and
requirements from an ever more dynamic environment. Society itself, which is to be reflected by
statistical data, is changing at an ever faster rate. This leads to needs for more variability, more
flexibility, on the input side as well as on the output side of statistical information systems managed
by statistical offices.
In order to manage requirements for greater variability in the exchange of data with the external
world, and in order to do this with the same or even fewer financial resources, a statistical office must
consider system level actions. It is not enough just to do "more of the same thing" or to "run faster".
It is necessary to undertake more drastic redesign actions.
Making more extensive and more systematic use of standard interfaces is an action that may
lead to desirable system changes. Such actions may lead to a combination of the following two
consequences:
* a drastic decrease in the complexity of data exchange between statistical information systems
and their environments as well as between the internal components of the individual statistical
information systems themselves;
* a drastic increase in the (actual or potential) variability and flexibility in the (external and
internal) behaviour of the statistical information systems.
Both types of consequences are highly desirable. Figure 3 from Malmborg & Sundgren (1994)
illustrates the differences in terms of complexity and variability between
* a situation where two sets of systems interact directly in the absence of a standard interface
(figure 3a); and
* a situation where the same two sets of systems interact via a standard interface (figure 3b).
In the situation illustrated by figure 3a, the interaction format will have to be negotiated for
each combination of systems that need to interact. This will typically lead to many different, tailor-made interaction formats that require a lot of resources to develop and maintain. The situation is
inconvenient from an operational point of view as well, since every individual actor will have to remember
different interaction formats for different interaction partners. If a new system is added to any of the
two sets of systems, a new interaction format will have to be negotiated for each other system with
which the new system needs to interact.
In the situation illustrated by figure 3b, every system will need to develop, maintain, and operate
one single interaction process, the interaction with the standard interface. Through this process, every
system will be able to communicate with all other systems, including systems that do not yet exist
but will be introduced later. Thus, in comparison with the situation in figure 3a, this situation is
both less complex (to develop, maintain, and operate) and more flexible vis-à-vis growth and other
changes in the system environment.
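The difference between the two situations can be made concrete with a small counting sketch. It assumes the worst case for figure 3a, in which every system in one set needs to interact with every system in the other set; the text itself only speaks of "each combination of systems that need to interact", so the product below is an upper bound rather than a figure from the paper. The function names and the set sizes are illustrative.

```python
def direct_formats(m, n):
    """Upper bound on negotiated formats when each of m systems interacts directly with each of n systems."""
    return m * n


def standard_adapters(m, n):
    """Each of the m + n systems maintains one interaction process with the standard interface."""
    return m + n


for m, n in [(3, 3), (10, 10), (10, 50)]:
    print(f"{m} x {n} systems: direct={direct_formats(m, n)}, via standard={standard_adapters(m, n)}")

# Extra work when one new system joins a set of 10 that faces 50 partners.
extra_direct = direct_formats(11, 50) - direct_formats(10, 50)
extra_standard = standard_adapters(11, 50) - standard_adapters(10, 50)
print("new system:", extra_direct, "new formats directly,", extra_standard, "new adapter via the standard")
```

Under this assumption the direct approach grows with the product of the two set sizes, while the standard interface grows with their sum, which is the sense in which the second situation is both less complex and more open to growth.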
Figure 4 indicates a number of places where a statistical information system could and should
contain well designed, preferably standardised interfaces. One may distinguish between
* external, inter-system interfaces; and
* internal, intra-system interfaces.
External interfaces are interfaces between, on the one hand, the statistical information system
under consideration and, on the other hand
* statistics users: human end-users as well as other (statistical) information systems; these are
output-oriented interfaces;
* data providers: human respondents as well as other (administrative) information systems; these
are input-oriented interfaces.

[Figure 2: matrix of characteristic by user category. User categories: "Ministry of finance", "Researcher/scientist", "Analyst - public sector", "Analyst - private sector", "Actor on the finance market", "International organisation". Characteristics: competence (subject matter, statistical, EDP); knowledge about relevant data sources (broad, deep); quality requirements (contents, accuracy, availability); needs for search systems, documentation, and metainformation; resources (hardware, software, expertise, money, "trading objects").]
Figure 2. A scheme for analysing the profiles of different categories of statis

An example of an output-oriented standard interface for statistical information systems is the
GESMES format for representation of statistical macroinformation and accompanying meta-information. "GESMES" stands for "GEneric Statistical MESsage", and the standard is developed by the
UN/EDIFACT Message Development Group 6.1.
Similarly, on the input side, there are several UN/EDIFACT standard formats corresponding to
typical documents of different branches of activity in society, e.g. trade. A generic standard for input
messages to statistical information systems is the Raw Data Reporting Message; see UN/EDIFACT
(1994).
By providing a statistical information system with standardised external interfaces, the designer
makes the system open and easy to integrate with other systems, e.g. the local systems of users
and providers of statistical data. This is indeed a practical application of the theoretical principles
illustrated in figure 3. By accepting data and metadata through standardised interfaces, a
statistics producer makes it easier for respondents to provide statistical raw data as a natural side effect
of their own administrative routines. Analogously, by making (aggregated or anonymised) data
and metadata available through standardised interfaces, a statistics producer makes it easier for statistics
users to integrate statistical data from the statistics producer with the user's own (statistical and
administrative) data for analyses and decision-making.
Statistical offices began to realise the importance of standardised internal interfaces, at least
implicitly, when they started to exploit the benefits of generalised software on a large scale in the
middle of the 1970's. As long as statistical information systems were completely tailor-made by
professional programmers, who were using procedural programming languages, there was not a
strong enough incentive to define and use standardised interfaces between software components.
It was up to the individual programmer to define suitable data structures as well as formats and
procedures for data interchange. When generalised software products gained in popularity, much
on the initiative of non-programmers, one problem was the enormous variability in data structures
and data interchange formats and procedures that were exhibited by existing applications and data
files. The first idea was to develop the generalised software tools further in order to make them
capable of handling this variability. It was soon realised that this would be a Sisyphean task. Instead
some statistical offices decided to standardise data structures on the basis of the concept of a "flat
file", that is, a file containing only one record type, adhering to a record layout with a fixed number
of fields containing the (single) values of the attributes, or variables, of one particular instance of a
certain object type, e.g. a person, a household, or an enterprise. Multiple record types, hierarchical
records, and repeating groups were among the data structure phenomena that were banned in this
standardisation process.
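To illustrate what banning hierarchical records and repeating groups implies in practice, the following sketch splits a nested household record into two flat files, each with a single record type and a fixed number of single-valued fields. The paper does not describe how offices actually converted their files, so the approach, the field names, and the sample data here are assumptions made purely for illustration.

```python
# Hypothetical hierarchical input: household records with a repeating group of persons.
households = [
    {"hh_id": "H1", "region": "01", "persons": [{"age": 34, "sex": "F"}, {"age": 36, "sex": "M"}]},
    {"hh_id": "H2", "region": "02", "persons": [{"age": 71, "sex": "F"}]},
]

HOUSEHOLD_FIELDS = ("hh_id", "region")
PERSON_FIELDS = ("hh_id", "person_no", "age", "sex")


def flatten(households):
    """Split nested records into two flat files, each with a single record type."""
    household_file, person_file = [], []
    for hh in households:
        household_file.append(tuple(hh[f] for f in HOUSEHOLD_FIELDS))
        for person_no, person in enumerate(hh["persons"], start=1):
            # The household identifier is repeated on every person record,
            # so the link between the object types survives without nesting.
            person_file.append((hh["hh_id"], person_no, person["age"], person["sex"]))
    return household_file, person_file


household_file, person_file = flatten(households)
for row in person_file:
    print(row)
```

Each resulting file describes instances of exactly one object type, which is the property that let generalised tools read any file from its record layout alone.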
This standardisation of data structures and data interchange can be seen as a first step towards
database-oriented information systems. Technically speaking, there was no physical database visible
in those systems, where data were stored and exchanged in sequential files stored on magnetic tapes.
Nevertheless the "flat file" standard started to play the same role as the relational data model (with the
SQL interface) has in today's database-oriented systems. Different processes, controlled by different
generalised or tailor-made software products, exchanged data as flat files - within and between
statistical information systems. The generalised software products were often developed within the
statistical offices themselves, but the same principles could easily be applied to commercial software
as well. In fact commercial software could very seldom handle more complex data structures than
flat files anyhow.

Figure 3a. One way of organising the interaction between two sets of systems.

Figure 3b. Interaction between two sets of systems via a standardised interface.

[Figure 4: diagram. Components shown: data providers (observation sources and administrative systems) and statistics users; primary statistics production, secondary statistics production, and register production, exchanging observations, register data, microdata, macrodata, statistics, and metadata; the database of the statistical office, containing base registers, observation registers (microdata and metadata), statistics collections (macrodata and metadata), and global metadata and retrieval mechanisms; other statistical information systems.]
Figure 4. A database-oriented statistical information system with clearly defined internal and external interfaces.

In a modern statistical information system the relational data model and the SQL standard for data
interchange between application software and the database management system are obvious choices
for internal interfaces. All commercial software products that want to survive on the market have to
adhere to these standards. Another de facto standard (though limited to PC software) is Microsoft's
Object Linking and Embedding (OLE) for transferring data and control between different software
components.
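The role of SQL as an internal interface can be sketched with Python's built-in `sqlite3` module, used here only as a convenient stand-in for a relational database management system. The table names, columns, and values are hypothetical. One component derives macrodata from microdata, and a second component reads the result; neither needs to know anything about the other beyond the shared tables and the SQL language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical microdata: one row per observed person.
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, region TEXT, income REAL)")
conn.executemany(
    "INSERT INTO person (region, income) VALUES (?, ?)",
    [("01", 200.0), ("01", 300.0), ("02", 150.0)],
)

# An "aggregation" component derives macrodata from microdata using only SQL.
conn.execute(
    "CREATE TABLE income_by_region AS "
    "SELECT region, COUNT(*) AS n, AVG(income) AS mean_income "
    "FROM person GROUP BY region"
)

# A separate "presentation" component reads the macrodata through the same interface.
rows = conn.execute(
    "SELECT region, n, mean_income FROM income_by_region ORDER BY region"
).fetchall()
for region, n, mean_income in rows:
    print(region, n, mean_income)
conn.close()
```

Either component could in principle be replaced by another product that speaks SQL without changing the other, which is the practical benefit of a standard internal interface.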
Figure 5 indicates how the different functions of a statistical information system (cf. figure 1) could
be designed to interface with the database including microdata, macrodata, and metadata.
No standards are for ever. Maybe in five or ten years' time today's de facto standards will have
been replaced by others, e.g. a widely accepted standard for object-oriented database management.
This is not a great problem. It is relatively simple to move from one standard to another. It is much
more difficult to live in a non-standardised situation, and to make the first-time move to a standard.
Nor does it matter very much if standards are formally agreed upon by standardisation bodies. What
is critical is that standards should neither prevent software manufacturers from taking part in
competition, nor force software users to be faithful to any particular hardware or software vendor.

[Figure 5: diagram. Data providers (observation sources and administrative systems) and statistics users are linked to the three functions INPUT ACQUISITION, AGGREGATION, and OUTPUT DELIVERY. These exchange observations, microdata, macrodata, statistics, and metadata with the function MANAGEMENT OF DATA AND METADATA in the statistical database, which contains base registers, code registers, observation registers (microdata and metadata), and statistics collections (macrodata and metadata).]
Figure 5. A functionally oriented model of a database-oriented statistical information system.

4 Standard Components: Off-the-Shelf Software

Statistical offices were among the first companies and organisations to make systematic use of standard components (e.g. generalised software) in the development of information system applications.
Already during the sixties statistical offices started to use commercially available and/or in-house
developed statistical packages for common statistical operations like data editing, tabulation, and
statistical analysis. During the seventies some statistical offices could start reducing the number
of application programmers, encouraging subject matter statisticians to develop (part of) their own
applications by means of high-level, non-procedural, generalised software tools. This development
was intensified during the eighties.
With the advent of inexpensive PC technology and software, the boundary between "user programming" and "professional programming" has become blurred - in statistical offices as well as in
the data processing community at large. Major companies are closing down their central application
development departments, advising business departments to use ready-made software packages for
auxiliary functions, and to "put together" business-critical applications from software components
that can be bought off-the-shelf from commercial software vendors.
Welke (1994) has predicted that we shall see a paradigm shift in how information systems are
typically developed:
"There is a fundamental paradigm shift underway in how (information) systems and
the software which supports them, is developed. The shift is away from a craft-based
structure in which user requirements are specified and custom solutions developed, to
a market-product based approach in which the users themselves select and arrange
meaningful-to-them components as a solution to their requirements."
The paradigm shift is likely to imply an even greater future for such things as
* inexpensive, generalised software, available "off-the-shelf"
* "tool-boxes" containing generalised standard components
* rapid application development (RAD) methods and tools.
In connection with RAD, it should be noted that tools for Computer-Assisted Systems Engineering
(CASE) are likely to become more domain-specific than today. Jackson (1994) has articulated the
importance of domain-specific knowledge for software development:
"The large aspiration to place the whole of software development ... as one more
branch of engineering is misconceived. Our aspiration should be to develop specialised
branches of software engineering ..."
" ... there are no casual builders of cars or bridges. But in software development it is
not easy to draw a clear line between the casual developer and the serious, professional
developer. As a result, ... software development is still largely an amateur activity in a
very important sense."
5 Metadata

There are many potential users of statistical data in a modern society. Many of them have the competence as well as the hardware and software resources needed to take full responsibility for their
own usage of statistical data. They are eager, and sometimes impatient, to exploit the information
potential of statistical offices, and to do this on their own conditions - as far as permitted by confidentiality restrictions. One major obstacle, which often prevents them from doing so, is the present
inadequacy of available metadata, that is, the absence or inadequacy of systematic descriptions of
statistical data and the processes behind them.
A (potential) user of statistical data will need metadata for three major purposes:
1. searching for potentially relevant and useful statistical data;
2. evaluating the adequacy of available data and the cost/benefit of using them;
3. retrieving, interpreting, and analysing statistical data.
First, statistical metadata are needed as a basis for search operations. The (potential) user is looking
for statistical data that could be relevant and useful for him in describing, analysing, or solving a
certain problem. The traditional approach is for the user to turn to a statistical office. Staff members
of statistical offices are often very helpful, but today this approach is not sufficient. There are far too
many potential users for any statistical office to cope with face-to-face. In addition, many users need
to combine statistical data (and other data) from several sources, and no particular staff member, or
even organisational unit, of a statistical office will have the necessary overview. Moreover, manual
help-functions are relatively expensive and slow, even if they are computer-assisted. Today a user
will expect the metadata needed for search tasks to be organised and disseminated in such ways
that he himself can search for relevant data on the basis of widely available, computerised metadata.
The process may start from a relatively vaguely expressed information need. The computerised,
metadata-supported process should help the user to better understand his own needs, and it should
result in explicit references to available statistical data, which are likely to be relevant for the user's
problem.
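A very simple form of such a metadata-supported search can be sketched as follows. The catalogue entries, their reference codes, and the word-overlap scoring are all hypothetical and chosen only for illustration; a real system would need richer metadata and a better ranking method. The sketch shows the essential shape of the process: a vaguely worded need goes in, and explicit references to data sets come out.

```python
import re

# Hypothetical catalogue: each entry describes one data set in free text.
CATALOGUE = [
    {"ref": "LFS-1994", "title": "Labour force survey",
     "text": "employment, unemployment and hours worked by age, sex and region"},
    {"ref": "HBS-1992", "title": "Household budget survey",
     "text": "household income and consumption expenditure by household type"},
    {"ref": "INC-1993", "title": "Income distribution survey",
     "text": "disposable income of persons and households by age and region"},
]

# Common words that carry no search information (illustrative list).
STOPWORDS = {"and", "by", "of", "the", "for"}


def tokens(text):
    """Lower-case word set of a text, without punctuation and stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def search(query, catalogue):
    """Rank entries by how many query words occur in their title or description."""
    wanted = tokens(query)
    scored = []
    for entry in catalogue:
        hits = len(wanted & tokens(entry["title"] + " " + entry["text"]))
        if hits:
            scored.append((hits, entry["ref"]))
    # Highest score first; ties are broken alphabetically by reference.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [ref for _, ref in scored]


results = search("income by region and age", CATALOGUE)
print(results)
```

The returned references are what the user would then carry into the second and third steps, evaluation and retrieval.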
Second, once the user has identified some statistical data of potential relevance for his problem,
he will have to determine if the data are really adequate for the intended purpose. This means that
the user has to evaluate the quality of the data, and to consider whether it is really worth the effort
and cost to retrieve, interpret, and analyse the data.
Third, if and when the user has come to the conclusion that certain available data are of sufficient
quality to justify the efforts and costs to use them, he will need metadata in order to actually
retrieve, interpret, and analyse the data. Retrieval may be accomplished by downloading data and
accompanying metadata to the user's own PC or by obtaining a disk or CD-ROM copy. Interpretation
and analysis will require the same kind of metadata as were needed for making the preliminary
judgement of the quality of the data. However, at this stage it may be necessary to obtain deeper and
more precise information about how the data were collected and processed, before they resulted in
the available statistics.
The documentation templet in figure 6 identifies metadata items that are desirable or even necessary
as a basis for responsible usage of statistical data emanating from a particular statistical survey. If
appropriately compiled with the corresponding metadata for other surveys they may also serve as a
basis for search operations. The survey documentation templet is part of the documentation system
SCBDOK, developed by Statistics Sweden. See also Sundgren (1991a, 1991b, 1992, 1993a, 1993b).
It is an equally important task for a statistical office to produce metadata concerning its surveys as
to produce the survey data themselves. In order to be able to accomplish this task in an efficient way,
the statistical office must carefully design its metadata flows. Metadata should be captured when
they naturally arise for the first time, e.g. as the result of a design decision. At later stages it should
be possible to have them automatically transferred and transformed when survey data are transferred
or transformed. Furthermore, it should be possible to have the metadata consistently updated, when
the survey processes are changed, e.g. as the result of new design decisions.
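The idea that metadata should travel with the data and be transformed in the same step can be sketched as below. The `Dataset` structure and the `derive` function are hypothetical and are not part of SCBDOK or any other system named in the text; they only show one way of making it impossible to derive a variable without also recording its description and origin.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Survey data bundled with the metadata that describes them (hypothetical structure)."""
    records: list
    variables: dict                               # variable name -> verbal description
    history: list = field(default_factory=list)   # processing steps, oldest first


def derive(dataset, name, description, func):
    """Add a derived variable and record the derivation in the metadata in the same step."""
    records = [{**rec, name: func(rec)} for rec in dataset.records]
    variables = {**dataset.variables, name: description}
    history = dataset.history + [f"derived {name}: {description}"]
    return Dataset(records, variables, history)


# Metadata are captured when they first arise, here at data collection.
raw = Dataset(
    records=[{"age": 34}, {"age": 71}],
    variables={"age": "age in years at the reference date"},
    history=["collected by questionnaire"],
)

# The transformation of the data carries the metadata along automatically.
coded = derive(raw, "age_group", "age coded as 0-64 or 65+",
               lambda rec: "65+" if rec["age"] >= 65 else "0-64")
print(coded.variables)
print(coded.history)
```

Because `derive` returns a new `Dataset`, the original data and their documentation remain unchanged, and the processing history grows by one entry per transformation.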
The metadata describing a statistical survey and its data outputs are a combination of formalised
metadata, e.g. code lists and record descriptions, and free-text metadata like verbal descriptions
of variables and processes. Thus software systems for handling statistical metadata may require
different types of software components to be combined, e.g. relational database management systems
and software for managing and searching large amounts of text data. Hypertext software (like in
advanced help functions and high-level Internet-tools) will also have a great potential for enabling
the users to navigate and associate in available statistical data and metadata and to process them in
efficient and intelligent ways.
This content downloaded from 136.145.187.76 on Mon, 01 Feb 2016 18:58:20 UTC
All use subject to JSTOR Terms and Conditions

B. SUNDGREN

34

TEMPLETFOR A STATISTICALSURVEY
DOCUMENTATION
0 Administrative information
0.0 Documentationtemplet
0.1 Survey name and identification,
organisationand persons responsible

1 Survey contents
1.1 Domainof interestand targetdomain,
verbaldescription
1.2 Targetdomain,formaldescription

0.2 Documentation modules and subsystems

andobjectgraph
1.2.1 Targetobjects,description

0.3 Archiveddata sets and publishedstatistics


0.4 References to other relevantdocumentation

populations
1.2.2 Target
variables
1.2.3Target

1.3 Surveyoutputs

1.3.1 Structuredoverviewof the tabulationplan


1.3.2 Publicationsin printedform
1.3.3 Electronicdistribution

1.3.4 Databasestorage

2 Survey plan
2.1 Frameprocedureand observationobjects
2.1.1 Overview
2.1.2 Frame and its links to objects
2.1.3 Frame production
2.1.4 Overcoverage and undercoverage
2.2 Sampling procedure (if applicable)
2.3 Data collection procedure
2.3.1 Observation objects, description and object graph
2.3.2 Data sources, including contact procedures
2.3.3 Observation variables and measurement instruments
2.3.4 Interruptions (including actions at overcoverage)
2.3.5 Non-response actions
2.4 Planned data preparation (coding, data entry, editing and correction)
2.5 Planned observation register
2.5.1 Overview
2.5.2 Object types, including derived object types
2.5.3 Object graph
2.5.4 Object/variable-matrixes, including derived variables
2.5.5 Data set descriptions
2.5.6 Derivation procedures (in complicated cases)

3 Completed data collection
3.1 Frame production
3.2 Sampling
3.3 Data collection
3.3.1 Communication with the data providers
3.3.2 Measurements, experiences of instruments
3.3.3 Interruptions/overcoverage, actions taken
3.3.4 Non-response, causes and actions taken
3.3.5 Editing and correction at data collection time
3.4 Data preparation (coding, data entry, editing and correction)
3.5 Production of final observation register
3.5.1 [...] objects
3.5.2 Treatment of [...] interruption/overcoverage
3.5.3 Treatment of partial non-response
3.5.4 Frequency counts of overcoverage, responses, non-responses etc
3.5.5 Completed derivations of derived objects and variables

4 Statistical processing and presentation
4.1 Observation models
4.1.1 Sampling
4.1.2 Non-response
4.1.3 Measurement/observation
4.1.4 Frame coverage
4.1.5 Total model
4.2 Population models
4.3 Computation formulae for estimations
4.3.1 Point estimations
4.3.2 Estimations of sampling errors (variance estimations)
4.3.3 Estimation/judgment of other quality characteristics
4.4 Analyses
4.5 Presentation and dissemination procedures

5 Data processing system
5.0 System overview
5.0.1 Verbal description
5.0.2 System flow
5.1* Subsystem description
5.1.1 Overview
5.1.1.1 Verbal description
5.1.1.2 System flow
5.1.2 Component descriptions
5.1.2.1 Data sets
5.1.2.2 Processes
5.1.2.3 Other components

6 Log-book

Figure 6. Documentation templet for a statistical survey and its production system.



6 Confidentiality
Statistical data can only be made available to the users within the limitations of certain confidentiality restrictions. The most fundamental purpose of these restrictions is to preserve the data
provider's confidence in the statistics producer's willingness and ability to ensure that data submitted to a statistics producer will be used for statistical purposes only. Among other things the
statistics producer must be able to ensure that statistical outputs will not, thanks to the input submitted, directly or indirectly, enable a statistics user to associate sensitive information with the data
provider or anyone whom the data provider would like to protect.
Statistical confidentiality can only be ensured by a combination of technical and legislative actions.
Advanced statistical and mathematical methods alone will never be sufficient, however sophisticated they may be. This has been clearly demonstrated by massive research efforts during the last 25
years. Basically, statistical confidentiality is about confidence. A data provider who does not trust a
particular statistics producer will not change his mind just because the statistics producer promises
to apply a "perfectly safe" statistical method, if there were such a method (which there is not).
An adequate combination of technical and legislative rules for protecting the confidentiality of
statistical data could be something along the following lines:
* It should be forbidden by law to use data submitted to a statistics producer for other than
statistical purposes.
* Data submitted to a statistics producer for statistical purposes should be protected against
sabotage, theft, and intrusion by physical and technical measures. Data that are associated
with identified subjects (persons or organisations) must be handled only by authorised persons,
"sworn in" by the statistical office.
* Statistical data must be anonymised (microdata) or aggregated (macrodata) before they can
be distributed to users outside the statistical office. Anonymised microdata and aggregated
macrodata must be checked by the statistics producer, so that they do not contain "obvious"
disclosures of sensitive data for individual, easily identifiable subjects (persons, enterprises
and other organisations). A disclosure is "obvious" if it does not require any conscious effort.
* It should be forbidden by law to make any conscious efforts to derive sensitive data about
identified, individual subjects from statistical data.
* It should always be less attractive for a potential intruder, who considers all costs and benefits,
to obtain information about identified subjects from protected statistical data than to obtain
the same information from some other source.
* Statistical data that are not accompanied by adequate documentation (metadata) should be
destroyed.
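As a minimal sketch of the check for "obvious" disclosures in aggregated macrodata, the following Python fragment aggregates microdata into cells and marks cells that rest on very few contributors. The threshold of three contributors, the function name `aggregate_with_check` and the sample records are assumptions made for illustration only; the rules above prescribe no particular threshold, and a technical check of this kind can only complement the legislative measures.

```python
from collections import defaultdict

MIN_CONTRIBUTORS = 3  # hypothetical threshold, not taken from the paper


def aggregate_with_check(records, min_contributors=MIN_CONTRIBUTORS):
    """Aggregate microdata records into cells and mark risky cells.

    records: iterable of (cell_key, contributor_id, value) tuples.
    Returns a dict mapping cell_key to (total, n_contributors, publishable).
    """
    totals = defaultdict(float)
    contributors = defaultdict(set)
    for cell_key, contributor_id, value in records:
        totals[cell_key] += value
        contributors[cell_key].add(contributor_id)

    result = {}
    for cell_key, total in totals.items():
        n = len(contributors[cell_key])
        # A cell with too few contributors is held back for review.
        result[cell_key] = (total, n, n >= min_contributors)
    return result


# Invented illustrative records: (region, enterprise id, turnover)
sample = [
    ("North", "e1", 10.0),
    ("North", "e2", 20.0),
    ("North", "e3", 30.0),
    ("South", "e4", 40.0),
    ("South", "e5", 50.0),
]
cells = aggregate_with_check(sample)
```

In this invented example the "South" cell has only two contributors and is therefore marked as not publishable; what happens to such a cell (suppression, merging with a neighbouring cell) is left to the statistics producer.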
7 Experiences from Statistics Sweden

This paper has pointed to a number of problems and opportunities that need to be tackled by
a statistics producer who wants to make statistical data more available to a user, while satisfying
restrictions given by scarce resources and the willingness of data providers to co-operate. The topics
covered were:
* the "fuzzy" concepts of user-orientation and user-friendliness
* standard interfaces as instruments for simplicity and flexibility
* standard, "off-the-shelf" software components as instruments for speedy and inexpensive
application development
* good quality metadata enabling the user to retrieve and process data independently of the
producer
* technical and legislative measures for protecting the confidentiality of statistical data.

Statistics Sweden is an example of a statistical agency that has been working very actively
in all these areas over the last three decades. In the late 1960's and early 1970's Statistics Sweden
developed the TAB68 suite of high-level, non-procedural software products. These tools, which
covered many important production steps, e.g. editing and tabulation, became extensively used at
Statistics Sweden, first by non-programmers and then (after some initial hesitation) even by the
programmers themselves. Many production systems are still heavily dependent on these software
products.
After gaining important experience from using the Canadian time series database system, CANSIM, Statistics Sweden developed its own AXIS system for making cross-sectional data as well as
time series data available on-line to internal and external users. The system was put into regular operation in 1976, and it is still running successfully, although many users now demand data to be made
available in many other ways than through relatively expensive and rigid mainframe communication.
During the next few years the system will be phased out, and a new, client/server based system will
be phased in. The new system is entirely PC based; it makes extensive use of standard interfaces,
e.g. SQL and GESMES, as well as a wide range of "off-the-shelf" software products, favoured by
internal and external users.
Figure 7 illustrates how the new statistical database system at Statistics Sweden is intended to co-operate with the survey-based production system within a client/server framework.
The new database system will make available a lot of aggregated macrodata (time series as well
as cross-sectional), some anonymised microdata, and the metadata needed for efficient searching
and responsible interpretation and analysis by external users. Microdata and macrodata will be
stored in SQL databases. At a later stage object-oriented database management systems (OODBMS)
and so-called on-line analytical processing (OLAP) products may be considered as alternatives or
complements to SQL databases for certain types of usages.
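How aggregated macrodata might sit in an SQL database and serve both time series and cross-sectional retrieval can be sketched as follows. This is an illustration under stated assumptions, not the design of the system at Statistics Sweden: the table `macrodata`, its columns, the helper functions and all figures are hypothetical, and Python's built-in `sqlite3` module stands in for whatever SQL product is actually used.

```python
import sqlite3

# In-memory database used as a stand-in for the statistical database.
macro_db = sqlite3.connect(":memory:")
macro_db.execute(
    """
    CREATE TABLE macrodata (
        series_id TEXT NOT NULL,   -- hypothetical identifier of a statistic
        region    TEXT NOT NULL,   -- cross-sectional dimension
        period    TEXT NOT NULL,   -- time dimension
        value     REAL NOT NULL,
        PRIMARY KEY (series_id, region, period)
    )
    """
)

# Invented figures, for illustration only.
macro_db.executemany(
    "INSERT INTO macrodata VALUES (?, ?, ?, ?)",
    [
        ("population", "North", "1993", 1.0),
        ("population", "North", "1994", 2.0),
        ("population", "South", "1993", 3.0),
        ("population", "South", "1994", 4.0),
    ],
)


def time_series(db, series_id, region):
    """Return (period, value) pairs for one region, ordered by period."""
    cursor = db.execute(
        "SELECT period, value FROM macrodata "
        "WHERE series_id = ? AND region = ? ORDER BY period",
        (series_id, region),
    )
    return cursor.fetchall()


def cross_section(db, series_id, period):
    """Return (region, value) pairs for one period, ordered by region."""
    cursor = db.execute(
        "SELECT region, value FROM macrodata "
        "WHERE series_id = ? AND period = ? ORDER BY region",
        (series_id, period),
    )
    return cursor.fetchall()


north_series = time_series(macro_db, "population", "North")
section_1994 = cross_section(macro_db, "population", "1994")
```

The same table answers both kinds of question; only the dimension held fixed in the `WHERE` clause differs.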
The main sources of metadata will be survey documentations, following the SCBDOK documentation templet shown in figure 6 above, complemented by product overviews, quality declarations,
and some other types of documentation, which are available for statistical products produced within
the Swedish Statistical System. The bulk of metadata will be textual data with limited structuring.
These data are most likely to be handled as a text database by free text searchers and document
handling systems. A small but important part of the metadata is to be used for controlling the
operation of various software products. These metadata need to be stored in an SQL database, so that
they can be handled formally and automatically communicated and transformed between different
software components inside and outside the database system.
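The idea of formally stored metadata controlling the operation of software can be illustrated with a small sketch. Everything named here is an assumption for the sake of the example: the table `variable_metadata`, its columns, the function `present` and the sample entry are hypothetical, and `sqlite3` again stands in for an unspecified SQL database.

```python
import sqlite3

meta_db = sqlite3.connect(":memory:")
meta_db.execute(
    """
    CREATE TABLE variable_metadata (
        variable TEXT PRIMARY KEY,  -- hypothetical variable name
        label    TEXT NOT NULL,     -- text shown to the user
        unit     TEXT NOT NULL,     -- unit of measurement
        decimals INTEGER NOT NULL   -- number of decimals to present
    )
    """
)
# Invented entry, for illustration only.
meta_db.execute(
    "INSERT INTO variable_metadata VALUES (?, ?, ?, ?)",
    ("turnover", "Net turnover", "SEK million", 1),
)


def present(db, variable, value):
    """Format a value for presentation, driven entirely by stored metadata."""
    row = db.execute(
        "SELECT label, unit, decimals FROM variable_metadata WHERE variable = ?",
        (variable,),
    ).fetchone()
    if row is None:
        # Data without documentation are not presented at all.
        raise KeyError(f"no metadata for variable {variable!r}")
    label, unit, decimals = row
    return f"{label}: {value:.{decimals}f} {unit}"


formatted = present(meta_db, "turnover", 12.345)
```

Because label, unit and precision are read from the database rather than written into the program, any other software component that can issue the same SQL query would present the variable consistently.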
The total size of the new statistical database, including metadata, macrodata, and anonymised
microdata, may turn out to be in the order of 100 GB.
Many different channels will be utilised for disseminating data from the new statistical database
to the users, including self-service PCs on the premises of Statistics Sweden, available to external
users who want to download data and metadata from the statistical database to their own storage
media; World Wide Web (WWW) databases; CD-ROM products; diskettes; etc.
As for confidentiality problems concerning statistical data (anonymised microdata and aggregated
data with few contributors), the situation in Sweden has improved dramatically for both users
and producers as well as for data providers, thanks to new legislation which criminalises all attempts
to derive identified data from statistical data. The particular paragraph about this in the Swedish Law
on Official Statistics reads as follows:
"Official statistics must not be combined with other information for the purpose of
finding out the identity of individual subjects."
In summary, on-going developments within the Swedish Statistical System provide good illustrations of the general principles that have been discussed in this paper. The practical results, which
have been achieved so far, indicate that statistical offices will be able to meet the challenges from
the users to make statistical data more available by means of modern technology, with due consideration to the interests of data providers and the public at large.

Figure 7. Client-server architecture of a system of statistical information systems.

References
Jackson, M. (1994). Problems, Methods, and Specialization, IEEE Software.
Johannesson, P. (1993). Schema Integration, Schema Translation, and Interoperability in Federated Information Systems,
University of Stockholm.
Lebaube, P. (1991). EDI and Statistics - A Challenge for statisticians. In Proc. 48th Session of the International Statistical
Institute, Cairo.
Malmborg, E. (1986). On the Semantics of Aggregated Data. In Proc. Third Int. Workshop on Statistical and Scientific
Database Management, Luxembourg.
Malmborg, E. (1992). Matrix-based Interchange of Aggregated Statistical Data. In Proc. Sixth International Working Conference on Scientific and Statistical Database Management, Ascona, Switzerland.
Malmborg, E. & Lisagor, L. (1993). Implementing a Statistical Meta-Information System. In Eurostat Conference on Statistical
Meta Information, Luxembourg, 2-4 Feb. 93, also in Statistical Journal of the United Nations UN/ECE 2/1993.
Malmborg, E. & Sundgren, B. (1994). Integration of Statistical Information Systems - Theory and Practice. In Proc. Seventh
International Working Conference on Scientific and Statistical Database Management, Charlottesville, Virginia, USA.
Shoshani, A. (1982). Statistical Databases: Characteristics, Problems and some Solutions. In Proc. 8th Int. Conf. on Very
Large Data Bases.
Sundgren, B. (1973). An Infological Approach to Data Bases, Statistics Sweden, Urval Nr 7.
Sundgren, B. (1991a). Statistical Metainformation and Metainformation Systems, Statistics Sweden R&D Report 1991:11;
also in Statistical Journal of the UN/ECE 2/1992.
Sundgren, B. (1991b). What metainformation should accompany statistical macrodata? Statistics Sweden R&D Report 1991:9.
Sundgren, B. (1992). Organizing the Metainformation Systems of a Statistical Office, Statistics Sweden R&D Report 1992:10;
also in the documentation from the UN/ECE Worksession on Statistical Metadata 1992 (METIS).
Sundgren, B. (1993a). Statistical Metainformation Systems - pragmatics, semantics, syntactics. In Eurostat Conference on
Statistical Meta Information Systems, Luxembourg; also in Statistical Journal of the UN/ECE 2/1993.
Sundgren, B. (1993b). Guidelines on the Design and Implementation of Statistical Metainformation Systems, Statistics
Sweden R&D Report 1993:4. ECE Worksession on Statistical Metadata Nov. 1993, Revised versions 1994 and 1995.
UN/EDIFACT and Eurostat (1993). GESMES 93 Guidance to Users & Reference Guide (separate volumes), Eurostat,
Luxembourg.
UN/EDIFACT (1994). Raw Data Reporting Message, Draft document.
Welke, R. J. (1994). The Shifting Software Development Paradigm. In Proc. of the Baltic Workshop on National Infrastructure
Databases, Vilnius, Lithuania.

Résumé
Les bureaux des statistiques peuvent-ils répondre aux demandes des utilisateurs de rendre les données statistiques plus
accessibles par les technologies modernes? Peuvent-ils le faire sous les restrictions imposées par le budget et par l'intérêt des
répondants? Ce sont des questions adressées ici. Les problèmes et les possibilités sont illustrés par des exemples de la Suède.

[Received November, 1995, accepted November, 1995]
