DOI 10.1007/s10270-012-0252-1
REGULAR PAPER
Received: 5 July 2011 / Revised: 9 April 2012 / Accepted: 21 May 2012 / Published online: 23 June 2012
© Springer-Verlag 2012
Abstract Enterprise Architecture (EA) is an approach used to provide decision support based on organization-wide models. The creation of such models is, however, cumbersome as multiple aspects of an organization need to be considered, making manual efforts time-consuming and error-prone. Thus, the EA approach would be significantly more promising if the data used when creating the models could be collected automatically, a topic not yet properly addressed by either academia or industry. This paper proposes network scanning for automatic data collection and uses an existing software tool for generating EA models (ArchiMate is employed as an example) based on the IT infrastructure of enterprises. While some manual effort is required to make the models fully useful to many practical scenarios (e.g., to detail the actual services provided by IT components), empirical results show that the methodology is accurate and (in its default state) requires little effort to carry out.

Keywords Enterprise architecture · Automatic data collection · Network scanning

Communicated by Prof. Richard Paige.

H. Holm (B) · M. Buschle · R. Lagerström · M. Ekstedt
Industrial Information and Control Systems, Royal Institute of Technology, 100 44 Stockholm, Sweden
e-mail: hannesh@ics.kth.se

1 Introduction

In recent years, Enterprise Architecture (EA) has become an established discipline for business and software system management [1]. EA describes the fundamental artifacts of business and IT as well as their interrelationships [1–5], typically through dimensions such as business, application, technology and information [4]. Architecture models constitute the core of the approach and serve the purpose of making the complexities of the real world understandable and manageable [5]. EA ideally aids the stakeholders of the enterprise to effectively plan, design, document, and communicate IT and business related issues [6].

As these models are intended to provide reliable management support, it is imperative that they capture all the aspects of an organization which are of relevance. Thus, the models often grow very large and contain several thousands of entities and an even larger number of relationships between these entities. The creation of such large models is often both costly and time-consuming, as various stakeholders are involved and many different pieces of information have to be gathered. During the creation process, the EA models are also likely to become (partly) outdated [7]. Consequently, in order to provide the best possible support, it needs to be ensured that EA models both are holistic and reflect the organization's current state.

Automatic data collection for model instantiation would be preferable as this could mean a reduced modeling effort and possibly an increased quality of the collected data. In current EA initiatives, few approaches regarding data collection for the instantiation of models are proposed. In the most widespread EA frameworks, little discussion regarding data collection is available [2–4]. In the EA tools community, two approaches are applied to some extent: importing models from third-party software, or allowing the usage of SQL queries (or similar) in order to load information from databases [8,9]. Both approaches, however, assume that the modeled data is already available and updated (e.g., in third-party applications or databases). As the main problem generally is to collect this data in the first place and these approaches have this as a pre-condition, significant manual effort is still required. In the research community, the focus
826 H. Holm et al.
has been on proposing methods and principles for model creation and maintenance [10–13]. None of these academic initiatives have, however, yet resulted in actual implementation of automatic data collection for EA modeling, and none have validated their proposed approaches.

This paper proposes the integration of network scanners and EA tools in order to assist the data collection process, especially for infrastructure assets, in EA modeling and maintenance. The contribution presented in this paper is twofold. First, to show that integrating an EA modeling tool with a network scanner can provide automatic data collection for the modeling process. Second, to show that the information in these models is correct with respect to the reality it is supposed to model.

A pilot study of the proposed approach can be found in [14]. This pilot study features automatic model generation using network scanning for a more specific EA metamodel, focusing on cyber security analyses. This pilot study somewhat illustrates the usefulness of the proposed approach for an EA metamodel closely related to the metamodel of the scanner. However, it does not thoroughly explore the reliability or validity of the approach, or how useful it is for a more general EA metamodel. This work explores the usefulness of the approach for a more general EA metamodel, exemplified using ArchiMate, arguably the most well-known EA metamodel. The analysis covers multiple key areas of EA modeling: (i) the comprehensiveness of the approach in terms of covered ArchiMate concepts, (ii) how the ambiguity in ArchiMate is handled, (iii) how much effort can be saved compared to manual modeling, (iv) how generated models can be maintained, (v) how accurate gathered data are, and (vi) the validity of the approach for EA metamodels in general.

Empirical data collected through an experiment estimates the accuracy and effort required to generate EA models through the proposed approach. Aspects not statistically studied are discussed accordingly. The Enterprise Architecture Analysis Tool [15] was extended in order to enable model transformation between automated network scanning and the ArchiMate metamodel.

The remainder of this paper is structured as follows: Sect. 2 presents related work. In Sect. 3 enterprise architecture metamodels are introduced and the metamodel of the ArchiMate language is described. The following Sect. 4 presents the concept of network scanners. Next, in Sect. 5 a mapping between the ArchiMate metamodel entities and network scanner elements is proposed. The approach is studied by analyzing empirical data from an experiment, as presented in Sect. 6. This section also contains the actual instantiations made through the approach. Section 7 discusses the advantages and shortcomings of the approach, and guidelines for practical usage of it. Finally, Sect. 8 concludes the paper.

2 Related works

In the enterprise architecture community, there are few initiatives focusing on the data collection process for model instantiation and maintenance. Among the most well-known frameworks, data collection is almost completely left to the modeler to handle. Some tool vendors provide support for data collection. However, in most cases, it is required that the needed information is compiled somewhere else, implying that the data must be collected by someone at some point in time. In the academic EA community, most researchers have put their focus on deriving principles and designing methods for model creation and maintenance. No researcher claims to have the focus on automatic data collection. Regarding automated network scanners, there is no previous work that has estimated their accuracy in terms of finding software and computer user accounts.

2.1 EA frameworks

There are many frameworks presenting and discussing EA modeling [2–4]. However, none of these describe and discuss the data collection process used when creating the architecture models. No practical help is presented in these frameworks regarding the data collection for as-is models or for updating already existing models (maintaining the architecture).

2.2 EA tools

In current EA tools, some approaches addressing automatic data collection can be found. The most common way is to import models that are made in third-party software. For example, BizzDesign Architect [8] can import data from office applications and with this data instantiate models. Thereby, the automation aspect actually means that data is reused and does not need to be manually entered if it is already available. The interpretation of data documented in the third-party software can however be resource- and time-consuming, thus contradicting parts of the purpose of automatic data collection.

Troux [9], for example, allows the usage of SQL queries in order to load information from databases. This approach focuses on the extraction of the data model and process descriptions, and thereby on the automatic creation of the information architecture as well as the business architecture.

Both approaches assume that the data entered in the third-party applications or databases is already available and updated. However, this data still needs to be manually collected in the first place before it can be used. Thus, both BizzDesign Architect and Troux can automatically instantiate models, but not automatically collect the information in the organization.
Automatic data collection for enterprise architecture models 827
MooD Business Architect [16] is another enterprise architecture modeling tool on the market. This tool has a component called Synchronization Activation Technology that enables import and export of data between architecture models and various data sources, for example, Microsoft Excel, Microsoft SQL Server, and ODBC compliant databases. However, all connections between architecture models and data sources need to be deployed and managed manually.

Sousa et al. [10] present the Blueprint Management System (BMS), a software tool and methodology used for collecting architecture information. The tool collects data from IT project plans. This approach thus aims to provide automatic data collection for architecture models. The approach is, however, still time-consuming, since the data documentation process still needs to be formalized according to its specific format.

ARIS Business Architect for SAP [17] supports the reuse and import of SAP process models out of the SAP Solution Manager. This is similar to the method proposed in the present paper as no manual interpretation of SAP data is needed to generate EA models. An issue for the approach is, however, that not all organizations use SAP Solution Manager. In addition, the SAP process model covers only certain parts of the complete enterprise architecture, while other aspects of EA, such as infrastructure, are not considered. The method proposed in the present paper can be applied to nearly any organization and involves collection of, in particular, infrastructure assets. Thus, the method used by ARIS Business Architect can be seen as a potential complement to the proposed approach.

There are other IT management tools, such as Configuration Management Data Bases (CMDBs) [18,19], available that like scanners can collect information in an organization. If these were to be integrated with an EA tool, the combination could be used for instantiating architectural models.

2.3 EA methods and principles

Buckl et al. [13] present a Wiki-based approach to enterprise architecture documentation and analysis. According to Buckl et al., companies who start an enterprise architecture initiative usually do not have a pre-defined information model for this. Many companies start with regular spreadsheets or similar. Instead, Buckl et al. propose a Wiki-based approach for collecting and sorting the information needed for enterprise architecture management. The main benefit of the Wiki-approach is that the data collection is distributed but still managed formally. Although the proposed Wiki-based approach seems interesting, there is still a need for data collection to provide input to the Wiki.

In [12], an approach for handling change is proposed. The approach is called Living Models and it is based on theories of model based software development and enterprise architecture management. The idea is to connect IT management, System Operation, and Software Engineering. According to the author, three major research challenges have to be met in order to materialize this: (i) provide a coherent view of the quality status of the systems; (ii) keep track of the quality status as the systems evolve over time; and (iii) support the collaboration of stakeholders for achieving the necessary quality level. Enterprise architecture models can be used to achieve the first of the three challenges. Automatic data collection would be appropriate for the second challenge. In the paper, Breu presents ten principles that are crucial for Living Models. Principle no. 2, Close Coupling of Models and Code, states: “Models are generated out of the code (e.g. architecture models)”. This would mean automatic generation of models at some architecture levels. Furthermore, Principle no. 3, Bidirectional Information Flow between Models and Code, focuses on the idea that information from code can be used to build models, just as information in models can be used to generate code. Throughout the ten principles, patterns and metamodel elements supporting these ideas are discussed. However, there is no tool today that can implement and use these principles.

In [20], the focus is on the design of the enterprise architecture. The main result is a framework for engineering driven EA design and a software tool implementation of this framework. There is, however, no description of the data collection process for the instantiation of models. The design of EA in [20] essentially concerns deriving the metamodel needed for an enterprise. The software tool implementation proposed is a tool incorporating the framework and metamodel. In [21], the focus is on model maintenance and the main finding reported in this publication is a discussion of the shortcomings of existing model maintenance approaches. The authors present a federated approach to deal with these shortcomings. However, there is no discussion regarding the data collection part of model maintenance.

2.4 Accuracy of automated network scanners

There is, to the authors' knowledge, no previous work that has studied the accuracy of automated network scanners regarding the detection of software and user accounts. There are, however, two studies on the topic of detecting vulnerabilities. Holm et al. [22] analyzed how many existing true vulnerabilities are properly detected by scanners, and how many non-existing vulnerabilities are falsely reported by them. In another study, Holm [23] also analyzed how many vulnerabilities would be removed if one were to follow all guidelines provided by vulnerability scanners (oftentimes thousands of pages). None of these studies, however, evaluate the scanners' accuracy in terms of finding software and user accounts, an objective of great importance towards measuring the value of the proposed approach.
2.5 Analysis and conclusions

The EA frameworks available today, such as TOGAF and the Zachman framework, provide very little data collection support. The research initiatives have ideas and suggestions for principles and methods regarding data collection support, but none has implemented these ideas in any EA modeling tool. When it comes to available EA modeling tools, some, like Troux and BizzDesign Architect, help the instantiation of models based on data already existing in other software or databases. Thus, these do not help in collecting data in the organization; rather, they make use of data already available in other tools. Comparable to network scanners, CMDBs also collect information about applications and their relations. However, we are aware of no approach that has integrated a CMDB tool with an EA modeling tool. The only comparable work found is the Blueprint Management System (BMS) that collects data from IT project plans in an organization. It is however unclear how accurate the retrieved information is and how the information in these IT project plans is gathered and entered in the first place. For organizations using SAP systems, ARIS Business Architect can aid in the collection and modeling on a business process level. It thus covers some of the EA layers, though not the same layers as those covered by the scanner and tool presented in this paper. These two might thus be complements to each other in order to provide a more complete EA model.

concerns regarding their business and the supporting IT systems. ArchiMate is extensively presented in [4] and is partly based on the ANSI/IEEE 1471-2000, Recommended Practice for Architecture Description of Software-Intensive Systems, also known as the IEEE 1471 standard [30]. The Open Group accepted the ArchiMate metamodel as a technical standard [31] and as a part of The Open Group Architecture Framework (TOGAF) in 2009 [2].

The ArchiMate metamodel consists of three layers: the Business layer, the Application layer and the Technology layer, where the technology supports the applications, which in turn support the business. Each layer consists of a number of entities and defined entity relationships.

The entities in each layer are categorized into three aspects of enterprise architecture: (i) the passive structure, modeling informational objects; (ii) the behavioral structure, modeling the dynamic events of an enterprise; and (iii) the active structure, modeling the components in the architecture that perform the behavioral aspects.

Figure 1 presents the ArchiMate metamodel. An overview of the different types of relations in the language can be found in Fig. 2; the entities utilized in the paper are described in Sect. 5.2. For detailed descriptions, cf. [4].

4 Automated network scanning
[Figure 1: diagram not reproduced. Recoverable labels: layers Business, Application, Technology; elements Product, Value, Data Object, Application Function/Interaction, Application Component, Infrastructure Service, Infrastructure Interface]
Fig. 1 The ArchiMate metamodel in the notation that was suggested in [4]
A network scan can be either authenticated or unauthenticated. During an authenticated scan the scanner is given authentication parameters (i.e., credentials) of systems to enable more detailed and presumably also more accurate scans. Authenticated scans are typically less disruptive on deployed services as they do not need to probe as much and thus are less intense. However, it is not always the case that credentials are readily available for the individual(s) performing a scan, and it can be seen as intrusive as the systems' local files are probed. This study assesses
System information
  MAC address                                                             000C290326CC
  IP address                                                              173.18.3.1
User information
  User accounts on system                                                 John Doe
Software
  Operating system type and version                                       Windows XP SP2
  Application server (i.e. end-point) port, protocol, type and version    80, HTTP, Apache HTTP Server 2.2.1
  Application client type and version                                     Adobe Reader 9.0
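
As a rough illustration, the kinds of data listed above could be held in a record such as the following Python sketch. The class and field names (ScannedHost, EndPoint and so on) are hypothetical and are not taken from NeXpose or from the tool used in this paper; only the example values come from the table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EndPoint:
    """An application server listening on a port (hypothetical structure)."""
    port: int
    protocol: str
    software: str  # type and version, e.g. "Apache HTTP Server 2.2.1"


@dataclass
class ScannedHost:
    """One scanned system, grouped as in the table above (hypothetical structure)."""
    mac_address: str
    ip_address: str
    user_accounts: List[str] = field(default_factory=list)
    operating_system: str = ""
    end_points: List[EndPoint] = field(default_factory=list)
    client_applications: List[str] = field(default_factory=list)


# Example values copied from the table.
host = ScannedHost(
    mac_address="000C290326CC",
    ip_address="173.18.3.1",
    user_accounts=["John Doe"],
    operating_system="Windows XP SP2",
    end_points=[EndPoint(80, "HTTP", "Apache HTTP Server 2.2.1")],
    client_applications=["Adobe Reader 9.0"],
)
print(host)
```

Keeping end-points as their own small record reflects that a port, a protocol and a software product are reported together for each application server.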
Fig. 4 Definition of the mapping within the tool; the ArchiMate metamodel can be found to the left, the XSD describing NeXpose reports to the right, and the specified mapping in the center

Each scanner has a slightly different way of denominating concepts in the XSD; however, the main characteristics of an XSD stay the same. Part of the XSD utilized by the scanner NeXpose can be seen in Fig. 5. The model transformation is carried out by specifying which aspects of the XSD should be mapped to which concepts in ArchiMate. Figure 4 illustrates how the mapping specification is done within the tool. Once a mapping has been specified, the scanner output is parsed using the Document Object Model and thereafter queried using a self-developed algorithm. Based on the result of the querying, the instantiated EA model is created. An example translation, utilized in the present study, is given in Sect. 6.1.

5.2 Translated concepts

The key characteristic in terms of translating contents of the XSD and XML to the ArchiMate metamodel is that of ambiguity. That is, the concepts of ArchiMate are, like those of other enterprise architecture metamodels, possible to interpret in many different ways. As a consequence, one type of mapping can be useful for one purpose and useless for another. For example, some enterprises might not be interested in modeling software such as Adobe Reader and Apache Webserver, or computer system accounts such as “John Doe”. Such an enterprise might only need data about larger applications such as SAP Solution Manager and Oracle Enterprise Manager, and departments rather than computer user accounts. Manual effort is needed to fulfill this type of scenario. Naturally, more complex requirements need more manual work for specifying the ruleset of the model transformation.

The rest of this section describes the concepts of the ArchiMate metamodel that can be mapped to the output of an automated network scanner. Given the purpose of this study, the translation is carried out from the viewpoint of ArchiMate. ArchiMate relations can be derived to connect all entities modeled except business actors; the scanner's metamodel relates business actors to devices and system software, and as ArchiMate prescribes relating business actors to services, this study does not consider it an adequate representation of ArchiMate's metamodel.

A Business Actor is an organizational entity capable of (actively) performing behavior [4]. A scanner collects all user accounts of computer systems. It is possible to relate these different actors to, for example, departments. However, such a mapping would naturally require additional effort from the modeler performing the translation.

An Application Component is a modular, deployable, and replaceable part of a system that encapsulates its contents and exposes its functionality through a set of interfaces [4]. A scanner collects data on various application components such as different ERP system modules and application clients such as Adobe Reader.
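
The parsing step described in Sect. 5.1 can be pictured with a minimal Python sketch that uses the standard-library DOM implementation. The report layout below is an assumption made for illustration; only the "node" element and its "address" and "hardware-address" attributes are named in this paper (Sect. 6.1), and the simple loop is a stand-in for, not a reproduction of, the self-developed querying algorithm used in the tool.

```python
from xml.dom import minidom

# Hypothetical scanner report. Only the "node" element and its "address" and
# "hardware-address" attributes are mentioned in the paper; the rest is assumed.
REPORT = """<report>
  <node address="172.18.2.15" hardware-address="000C290326CC"/>
  <node address="172.18.2.16" hardware-address="000C290326CD"/>
</report>"""


def devices_from_report(xml_text):
    """Parse the report with the DOM and return one Device record per node."""
    dom = minidom.parseString(xml_text)
    devices = []
    for node in dom.getElementsByTagName("node"):
        devices.append({
            "archimate_type": "Device",
            "ip": node.getAttribute("address"),
            "mac": node.getAttribute("hardware-address"),
        })
    return devices


devices = devices_from_report(REPORT)
for device in devices:
    print(device)
```

Other ArchiMate entities would presumably be produced the same way, with one rule per mapped XSD concept; which rules exist depends on the mapping the modeler specifies.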
An Application Interface declares how a component can connect with its environment [4]. If a scanner finds an Application Component running on an end-point (i.e., a port) it will provide information about how it communicates (e.g., type of protocol).

A System Software is a software environment for specific types of application components and data objects that are deployed on it in the form of artifacts [4]. A scanner identifies several types of system software such as web servers and operating systems.

An Infrastructure Interface is a point of access where the functionality offered by a node can be accessed by other nodes and application components [4]. A network scanner details the protocol (e.g., SMTP) and port (e.g., 25) which a software end-point utilizes. It is also used to relate an operating system to the software employed on it.

A Device is a physical computational resource upon which artifacts may be deployed for execution [4]. A network scanner provides information about the hardware address and IP address of a system.

A Network is a physical communication medium between two or more devices [4]. As network scanners provide data regarding IP addresses of computer systems it is possible to detail networks.

All categories of the ArchiMate Structural aspects are thus possible to map, and one of three parts of the metamodel's Behavioral aspects. All but one of these entities (System Software) belong to the Structural aspects, which is quite logical, considering that an automated scanner only gathers information available through remote queries on a computer network.

6 Empirical study of the proposed approach

This section describes how the proposed approach was tested through an experiment on an actual network. Due to the ambiguity of the ArchiMate metamodel (cf. Sect. 5.2) there is a need to define the actual translation that is employed. The translation which is used for the empirical study is described
in Sect. 6.1. Section 6.2 provides details regarding the experimental setup. Section 6.3 analyzes the reliability of the proposed approach. Sections 6.4 and 6.5 provide descriptive results about the models created during authenticated and unauthenticated scans of the network. The scanner NeXpose [33] was chosen as it has demonstrated good results in previous tests [22]. However, as almost all scanners have very similar network scanning methodologies and signatures (built on Nmap [34]) we believe that this study should be indicative of the accuracy of other available automated network scanners as well.

6.1 An example translation

This paper provides an example mapping that illustrates the usefulness of the proposed approach. The exemplified translation covers all ArchiMate related information that a scanner can provide, given that a minimal manual effort is utilized to perform the translation. That is, there has been no manual effort conducted to, for example, detail any services provided or to relate computer user accounts into groups. As a consequence, it can be seen as a general type of mapping which requires very little effort to perform. There was no focus placed on more enterprise-critical systems such as ERP systems simply because these were not available in the experiment network. The translation between scanner output (XML) and ArchiMate input was handled through the software tool presented in Sect. 5.1, but any model transformation tool capable of translating XML to ArchiMate concepts would suffice.

A summary of the utilized mapping can be seen in Table 2. This mapping was made from the perspective of the ArchiMate metamodel as the purpose of the study is to evaluate how enterprise architecture modeling can benefit from automated network scanning. Six different ArchiMate entities can be automatically generated based on this framework, namely: Business Actor, Infrastructure Interface, System Software (Operating System and end-point), Application Component, Device, and Network. Relations can be instantiated to connect all entities except business actors, for reasons described in Sect. 5.2.

A Business Actor is translated as a user account on a computer system. An application server is translated to a System Software (end-point). An Application Component is modeled as an application client residing on a scanned system (for example, Adobe Reader 7.0). A System Software (OS) is seen as the actual operating system employed on the probed system. An Infrastructure Interface is modeled as the application protocol of a System Software (end-point). It is also needed to relate different Application Components to System Software (OS). That is, a scanner does not detail the specific Infrastructure Interfaces between different Application Components and System Software (OS), but such interfaces are needed to connect them. A Device is described through its IP and MAC address. A Network is a range of IP addresses found during the scan. For example, if the scanner finds IPs 172.18.3.2, 172.18.3.5 and 172.18.4.5 then two Network entities would be instantiated: 172.18.3.* and 172.18.4.*.

A more detailed version of the mapping between the NeXpose XSD and the ArchiMate implementation in the software tool used for the model transformation (cf. Sect. 5.1) can be seen in Table 3. For example, a Device is instantiated by the XSD concept “node (nodeType)”, denominated through two of the attributes of this concept: “address” and “hardware-address”. An overview of all relations instantiated can be seen in Table 4. The directions of the relations are described through their source entities and target entities, using both ArchiMate terminology and NeXpose XSD terminology. The reader is referred to Sect. 3 for information about these types of relations.

6.2 The experimental setup

The main experimental setup was designed by the Swedish Defence Research Agency with the support of the Swedish National Defence College. The environment was set to describe a simplified critical information infrastructure at a small electrical power utility and was composed of 20 physical computer servers running a total of 28 virtual machines, divided into four network segments. Various operating systems and versions thereof were used in the network, e.g.,
Windows XP SP2, Debian 5.0 and Windows Server 2003 SP1. Each host had several different network services operating, e.g. web-, mail-, media-, remote connection- and file sharing services. More information about the environment can be found in [35,36].

6.3 Data quality provided by the scanner

Accuracy is of great importance to the reliability of the proposed approach. If the output from an automated network scanner is erroneous then the resulting generated EA model(s) will not be reliable. Some aspects can however be more important than others. For instance, it is typically of great importance that the correct applications and systems are identified, but it might not be important that their actual version numbers are correct.

This study assesses the accuracy of Devices, System Software, Application Components and Infrastructure Interfaces. The accuracy of Business Actors was unfortunately not possible to study because the virtual images of the studied systems were mistakenly manipulated before many users had been manually logged. However, the difference between the number of generated Business Actors during authenticated (cf. Sect. 6.4) and unauthenticated (cf. Sect. 6.5) scans displays the relative difference between the two scan types. Also, the accuracy of the Network entity was not assessed as all the Devices in the network were part of the same LAN, thus generating only a single Network entity (with relations to all devices). Several important concepts not available in the current ArchiMate metamodel were also analyzed: as it can be valuable for enterprises to know the versions of their software (e.g., for patch management), the accuracy of System Software versions and Application Component versions was also assessed. All Devices, System Software (OS), System Software (end-point) and Infrastructure Interfaces in the experimental setup were analyzed. Studied client-side Application Components were however selected from the entire pool of available software clients through simple random sampling as there were too many to study all such components.

An accurate assessment is one that is correct, i.e., if a system has the System Software Windows XP with Service Pack 2 and it is identified as Windows XP Service Pack 1 then it will be accurate in terms of System Software (OS), but inaccurate in terms of System Software (OS version). Thus, the accuracy of each of the different variables is the mean value of an independent sequence of correct/incorrect (0/1) answers. These characteristics make it reasonable to assume binomial distributions, fulfilling the requirements for performing two-tailed hypothesis tests of the assessed results. When a

and a p-value of less than 0.05 is a commonly used reference value for claiming that there is a significant difference between the compared sources of data. A p-value of less than 0.05 implies that there is less than 5 % probability that the assessed differences between two or more sources of data are due to random variation.

As can be seen in Table 5, authenticated scanning is equally accurate or more accurate than unauthenticated scanning in all aspects. Application Components show extremely significant differences because client-side software (the tested type of Application Component) cannot be assessed without giving the scanner credentials. The authenticated scan is also clearly better suited for assessing version numbers of System Software, but fairly similar when it comes to the type of System Software. The statistical tests support this assessment; there were significant differences when assessing System Software (OS version) (p = 0.0089) and System Software (end-point version) (p < 0.001) but no significant differences for System Software (OS) (100 % for both types of scans) and System Software (end-point) (p = 0.27).

6.4 Model created using an authenticated scan

An authenticated automated scan was performed on the architecture described in Sect. 6.2 using NeXpose. The resulting model consists of 110 Application Components, 335 Infrastructure Interfaces, 890 Business Actors (the large majority of these actors on the server systems, e.g. a DNS server, a web server and a mail server), 28 Devices, 195 System Software entities (of which 28 were operating systems), 1 Network, and 679 relations between them. These entities and relations were instantiated through the mapping described in Sect. 5. Figure 6 graphically illustrates the result for one of the systems (172.18.2.15). While the complete model is too large to display in the paper, the interested reader is welcome to receive it through contact with the authors.

6.5 Model created using an unauthenticated scan

The generated model from an unauthenticated automated scan can be seen in Fig. 7 (the Device 172.18.2.15). The resulting model consists of 227 Infrastructure Interfaces, 106 Business Actors, 28 Devices, 198 System Software entities (28 of which were operating systems), 1 Network, and 462 relations between them. As for the model generated by the authenticated scan, the interested reader is welcome to contact the authors. While the accuracy of Business actors could not be
p value is mentioned in this chapter it refers to results from evaluated, it is clear that the authenticated scan is much more
two-tailed hypothesis tests [37]. The p value is used to potent in this aspect (890 compared to 106 generated Busi-
explain the statistical difference between sources of data, ness Actors).
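The two-tailed comparison of accuracies described in Sect. 6.3 can be illustrated with a short sketch. The text only states that binomial distributions are assumed and that two-tailed hypothesis tests are used; the specific test below (a pooled two-proportion z-test with a normal approximation) is an assumption made for illustration, as are the function name and the example counts, which are hypothetical and not taken from the experiment.

```python
import math


def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Two-tailed z-test for the difference between two binomial proportions.

    correct_a / n_a and correct_b / n_b are the accuracies of two scan types
    (e.g. unauthenticated and authenticated). Returns (z, p_value).
    This is an assumed test, not necessarily the one used in the study.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    # Pooled proportion under the null hypothesis of equal accuracy.
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both samples are all correct or all incorrect: no observable difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-tailed p value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts for illustration only: 18 of 28 correct vs. 27 of 28 correct.
    z, p = two_proportion_z_test(18, 28, 27, 28)
    print(f"z = {z:.3f}, p = {p:.4f}")
```

When both scan types reach 100 % accuracy on a variable, as reported for System Softwares (OS), the standard error is zero and the sketch reports no difference (p = 1.0) rather than dividing by zero. For small samples an exact test may be preferable to the normal approximation used here.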
836 H. Holm et al.
Table 5 Accuracy of unauthenticated and authenticated scans for different types of automatically assessed ArchiMate related data
Variable Accuracy unauth. (%) Accuracy auth. (%) p value Samples
Automatic data collection for enterprise architecture models 837
require various effort to model and effort does not likely scale linearly with entities/relations), it clearly highlights the applicability of automatic modeling: it is simply less time-consuming than manual modeling. A notable limitation of the current implementation of the proposed approach is however that the result from a new scan cannot be appended to a previous one; each new scan generates a completely new model. This is an issue which we aim to address in the future.

Automated network scanning only generates models representing single points in time. This problem is however also present for manual approaches, but in a much more evident way, as manual modeling is more resource demanding and thus unlikely to be
Validity and reliability of the field test. The network architecture tested was a virtual environment. Using a virtual environment can decrease performance, among other things through packet loss. This problem is however mainly evident in very large virtual environments and not in small subnets as evaluated in this study, and should thus be a minor issue [39–41]. This study can only evaluate the automated scanner's ability to assess the entities in the studied network architecture, which only covers a very small amount of the operating systems and external and local application services that are currently available on the market. The different products implemented in the network architecture were however of diverse nature. Thus, we believe that this study gives a good hint towards the general accuracy of an automated network scan. A shortcoming of the experiment is that it did not evaluate the effort required to specify large scale applications and application suites such as customer relationship management systems, geographical information systems, and asset management systems. This delimitation was chosen as there were no such systems in place in the experiment network. In terms of accuracy this delimitation should have a very minor influence; there is no reason to believe that there is a difference in accuracy for such components compared to what was addressed during the experiment. The required effort depends much on the requirements of the modeler. That is, it would not take more than a minute's effort to only generate ArchiMate concepts for e.g. the vendor SAP, whereas relating different user accounts to departments might be very time consuming (depending on the number of users and departments).

8 Conclusions

This paper proposes a method for automatic generation of EA models with respect to the complex IT architectures of enterprises. A previous study tested the approach on an EA metamodel for cyber security analysis [14]. The present study mapped the metamodel of ArchiMate to the output of automatic network scanners. The proposed method was empirically investigated through studying several different variables: (i) how reliable the results provided by the method are, (ii) how much of the metamodel context is captured and (iii) how resource efficient the proposed method is. The proposed method offers reliable results, especially when the scanner is given system credentials. The generated entities can represent different ArchiMate interpretations, depending on the stated requirements (cf. Sect. 5).

There are several implications of this study, of which the arguably most important are the following. This study clearly displays the need for, and applicability of, automatic data collection for EA models. Furthermore, it provides both academia and industry with a readily available model transformation tool for the purpose of automatic generation of EA models (cf. Sect. 5.1) capable of reducing large amounts of effort. As an automated network scanner is used by most organizations and some scanners are available for free (e.g. [34]), the required resources to implement the proposed approach should be small.

8.1 Future work

There are however some issues with the proposed approach that should be further researched. Different enterprises (and environments) will likely require various degrees of manual effort in terms of specifying the mapping and managing generated models. One method for decreasing the effort for the modeler could be to create a set of common default mappings that could be more easily manipulated to suit the needs of different contexts. Another option could be to have a detailed guide for usage of the model (some foundations for such a guide could be the contents of Sect. 7). We believe that the optimal solution lies with a mix of these methods, but that it depends on the general requirements of different enterprises and the manual effort needed to fulfill these requirements. That is, if most enterprises have a few key requirements that need major effort to manage, it would be beneficial to have default mappings that fulfill these needs. Similarly, if the requirements of each enterprise are unique, most effort should be spent on specifying a user-friendly guide for mapping. In future work we aim to analyze these aspects through case studies at different enterprises.

In terms of future work for automatic generation of EA models in general, there are various aspects that are of importance. A number of significant topics have been discussed in this paper: (i) how comprehensive the approach is (e.g., in terms of covered ArchiMate concepts), (ii) how the translation is managed (e.g., in terms of ambiguity), (iii) how much effort is required (and can be saved), (iv) how generated models can be maintained, and (v) how accurate the gathered data are. Future work should focus on aspects that are cumbersome to model and maintain manually, yet available to collect with little effort in most enterprises. As a first step to enable such work, a collection of commonly required data that are cumbersome to model should be compiled. This collection could then be used to identify the key areas in terms of automatic data collection support. As discussed in the previous paragraph, it could also enable default mappings that could significantly improve the usability of existing solutions such as MooD Business Architect [16].

Future work would also benefit from mapping to a holistic data collection methodology. A common standard would enable comparisons of different approaches in terms of, for example, EA metamodel comprehensiveness, data sources, and actors required to involve when gathering and maintaining data. Unfortunately, there is no such accepted standard
as of yet. While a methodology such as Living Models [12,42] (cf. Sect. 2.3) could become a standard for categorization of automatic data collection methods, it (and others like it) is currently not tailored for categorizations. Thus, future research should also address categorization of data collection methods for EA models.

References

1. Ross, J.W., Weill, P., Robertson, D.: Enterprise Architecture As Strategy: Creating a Foundation for Business Execution. Harvard Business School Press, Boston (2006)
2. The Open Group: The Open Group Architecture Framework (TOGAF), version 9, The Open Group (2009)
3. Zachman, J.A.: A framework for information systems architecture. IBM Syst. J. 26, 276–292 (1987)
4. Lankhorst, M.M.: Enterprise Architecture at Work: Modelling, Communication and Analysis, 2nd edn. Springer, Berlin (2009)
5. Winter, R., Fischer, R.: Essential layers, artifacts, and dependencies of enterprise architecture. J. Enterp. Archit. 3, 7–18 (2007)
6. Kurpjuweit, S., Winter, R.: Viewpoint-based meta model engineering. In: Enterprise Modelling and Information Systems Architectures (EMISA 2007)
7. Aier, S., Buckl, S., Franke, U., Gleichauf, B., Johnson, P., Närman, P., Schweda, C., Ullberg, J.: A survival analysis of application life spans based on enterprise architecture models. In: 3rd International Workshop on Enterprise Modelling and Information Systems Architectures, Ulm, Germany, pp. 141–154 (2009)
8. BiZZdesign: BiZZdesign Architect. http://www.bizzdesign.com (2011). Accessed on March 2011
9. Troux Technologies: Metis. http://www.troux.com/products/ (2011). Accessed on March 2011
10. Sousa, P., Lima, J., Sampaio, A., Pereira, C.: An approach for creating and managing enterprise blueprints: a case for IT blueprints. Advances in Enterprise Engineering III. Lecture Notes Bus. Inf. Process. 34, 70–84 (2009)
11. Hafner, M., Winter, R.: Processes for enterprise application architecture management. In: Proceedings of the 41st Hawaii International Conference on System Sciences, pp. 396–406 (2008)
12. Breu, R.: Ten principles for living models - a manifesto of change-driven software engineering. In: International Conference on Complex, Intelligent and Software Intensive Systems, pp. 1–8 (2010)
13. Buckl, S., Matthes, F., Neubert, C., Schweda, C.M.: A wiki-based approach to enterprise architecture documentation and analysis. In: 17th European Conference on Information Systems, pp. 1–13 (2009)
14. Buschle, M., Holm, H., Sommestad, T., Ekstedt, M., Shahzad, K.: A tool for automatic enterprise architecture modeling. In: Proceedings of the CAiSE Forum 2011, pp. 25–32 (2011)
15. Buschle, M., Ullberg, J., Franke, U., Lagerström, R., Sommestad, T.: A tool for enterprise architecture analysis using the PRM formalism. In: CAiSE2010 Forum PostProceedings, pp. 108–121 (2010)
16. MooD International: MooD Business Architect. http://www.moodinternational.com/ (2012). Accessed on March 2012
17. Software AG: ARIS for SAP. http://www.softwareag.com/corporate/products/aris_platform/aris_implementation/aris_sap (2011)
18. Lokomo Systems AB: OneCMDB. http://www.onecmdb.org (2011)
19. FrontRange Solutions: FrontRange CMDB. http://www.frontrange.com/cmdb.aspx (2011)
20. Aier, S., Kurpjuweit, S., Saat, J., Winter, R.: Enterprise architecture design as an engineering discipline. AIS Trans. Enterp. Syst. 1(1), 36–43 (2009)
21. Fischer, R., Aier, S., Winter, R.: A federated approach to enterprise architecture model maintenance. Enterp. Modell. Inf. Syst. Archit. 2(2), 14–22 (2007)
22. Holm, H., Sommestad, T., Almroth, J., Persson, M.: A quantitative evaluation of vulnerability scanning. Inf. Manage. Comput. Secur. 19(4), 231–247 (2011)
23. Holm, H.: Performance of automated network vulnerability scanning at remediating security issues. Comput. Secur. 31(2), 164–175 (2012)
24. Johnson, P., Ekstedt, M.: Enterprise Architecture: Models and Analyses for Information Systems Decision Making. Studentlitteratur (2007)
25. Lagerström, R., Johnson, P., Ekstedt, M.: Architecture analysis of enterprise systems modifiability - a metamodel for software change cost estimation. Softw. Qual. J. 18, 437–468 (2010)
26. Närman, P., Johnson, P., Nordström, L.: Enterprise architecture: a framework supporting system quality analysis. In: Proceedings of the International Annual Enterprise Distributed Object Computing Conference, pp. 130–142 (2007)
27. Ullberg, J., Lagerström, R., Johnson, P.: A framework for service interoperability analysis using enterprise architecture models. In: IEEE International Conference on Services Computing, pp. 99–107 (2008)
28. Franke, U., Flores, W.R., Johnson, P.: Enterprise architecture dependency analysis using fault trees and bayesian networks. In: Proceedings of 42nd Annual Simulation Symposium (ANSS), pp. 209–216 (2009). http://www.scs.org
29. Gustafsson, P., Höök, D., Ericsson, E., Lilliesköld, J.: Analyzing IT impacts on organizational structure - a case study. In: Portland International Center for Management of Engineering and Technology (PICMET) Conference Proceedings, pp. 3197–3210 (2009)
30. IEEE: 1471-2000 - IEEE Recommended Practice for Architectural Description for Software-Intensive Systems (2000). http://standards.ieee.org
31. The Open Group: ArchiMate 1.0 Specification (2009). http://www.opengroup.org/archimate
32. Manzuik, S., Pfeil, K., Gold, A., Gatford, C.: Network security assessment: from vulnerability to patch. Syngress (2006)
33. Rapid7: Nexpose. http://www.rapid7.com (2011)
34. Network mapper: Nmap. http://nmap.org (2011)
35. Hammervik, M., Andersson, D., Hallberg, J.: Capturing a cyber defence exercise. In: Proceedings of the Symposium on Technology and Methodology for Security and Crisis Management, p. 36 (2010)
36. Geers, K.: Live fire exercise: preparing for cyber war. J. Homeland Secur. Emerg. Manage. 7(1), 74 (2010)
37. Warner, R.: Applied statistics: from bivariate through multivariate techniques. Sage Publications, Inc, Thousand Oaks (2008)
38. Närman, P., Holm, H., Johnson, P., König, J., Chenine, M., Ekstedt, M.: Data accuracy assessment using enterprise architecture. Enterp. Inf. Syst. 5(1), 37–58 (2011)
39. Ye, K., Jiang, X., Chen, S., Huang, D., Wang, B.: Analyzing and modeling the performance in xen-based virtual cluster environment. In: 2010 12th IEEE International Conference on High Performance Computing and Communications, IEEE, pp. 273–280 (2010)
40. McDougall, R., Anderson, J.: Virtualization performance: perspectives and challenges ahead. ACM SIGOPS Oper. Syst. Rev. 44(4), 40–56 (2010)
41. Wang, G., Ng, T.: The impact of virtualization on network performance of amazon ec2 data center. In: INFOCOM, 2010 Proceedings IEEE, IEEE, pp. 1–9 (2010)
42. Farwick, M., Agreiter, B., Breu, R., Ryll, S., Voges, K., Hanschke, T.: Automation processes for enterprise architecture management. In: 2011 15th IEEE International Enterprise Distributed Object Computing Conference Workshops, IEEE, pp. 340–349 (2011)

tainability, IT-management, also he is a co-author of the book Enterprise Architecture: Models and Analyses for Information Systems Decision Making. Robert is a partner and consultant at Management Doctors, an IT-management consultancy firm.