
OPC UA Hardware Offloading Engine as Dedicated Peripheral IP Core

Chris Paul Iatrou and Leon Urbas
Technical University of Dresden, Chair for Process Control Systems Engineering
Email: Chris_Paul.Iatrou@tu-dresden.de, Leon.Urbas@tu-dresden.de

Abstract—OPC UA is a promising candidate for achieving a vertical semantic integration of field devices in the next generation of industrial automation topologies. Microprocessing platforms embedded in sensors and actuators do, however, not provide the memory and computing resources required to integrate OPC UA communication stacks. To enable the usage of OPC UA on limited platforms, this article introduces a dedicated, highly scalable hardware server stack, synthesizable on 75,800 μm² using 28 nm CMOS technology as well as on FPGAs, that can process OPC UA as a peripheral component.

I. INTRODUCTION

OPC Unified Architecture (OPC UA) is an industrial communications protocol with object-oriented information meta-modeling capabilities. Aside from accessing data values, the relationships between instances and types convey semantic information pertaining to server and data [1], [2]. These attributes make OPC UA a prime candidate as an information carrier for IoT and next-generation industrial communications [3]. Due to the protocol's object-oriented design, it does however not scale well to computing platforms with limited memory and computing resources. This limitation particularly impacts field devices such as sensors and actuators, which are vital information carriers in any automation process [4]. Attempts to implement software stacks on common and industrial microcomputing platforms result either in OPC UA servers with severely limited capabilities [5] or non-specification-conformant behavior [6]. Even in these cases, the implementation of OPC UA servers still demands powerful System-on-a-Chip based products with external memory components [7].

To enable actuators and sensors to become part of a dynamic, heterogeneous networked ecosystem both in industrial and commercial applications, the authors designed an OPC UA server implemented as a hardware IP capable of offloading OPC UA communications in an autonomous, on-chip peripheral component. The design enables highly parallel processing of OPC UA communications independently of the chip's application-bound main processor. This peripheral component allows energy-efficient, widely employed microcomputing platforms to make use of OPC UA as an information carrier without requiring significant redesign or changes to proven, optimized or time-critical software.

978-1-5090-2339-4/16/$31.00 © 2016 European Union

II. ARCHITECTURAL OVERVIEW

The OPC UA hardware server can be divided into three key components (Fig. 1a): a hierarchic set of coders, a coder-interconnecting bus fabric and a data access component handling information stored in the namespace. The hierarchic structure is derived from the OPC UA-Binary protocol, divided into the transport layer, a security & sequencing layer and a services layer. Transport layer coders exchange data streams with an OSI layer 1–4 component. The design includes connection-dedicated TCP/IP offloading engines, making packet processing deterministic without implying real-time capability on the side of the underlying Ethernet fabric. Each coder is able to process one specific OPC UA data stream format, e.g. that of a read service request. The number and type of coders present in each layer is variable at design time. Coders between hierarchies are interconnected using a bidirectional, packet-based, connection-oriented bus fabric. Additionally, coders of the service layer are able to communicate with the namespace engine, requesting information about attributes or relations of nodes stored in the binary namespace image. The namespace engine also provides for method call interrupts, value updating and subscriptions.

Figure 1: Design overviews of the peripheral IP with exemplary coder distribution and a service processor. (a) Architectural design overview. (b) Internal structure of a service coder.

A. Service Coder Units

The OPC UA-Binary encoding is a serialized representation of individual data fields. The encoding rules for 37 base types are explicitly specified by OPC UA, and more complex types are derived by creating structures of said types [8]. As a rule, when decoding a serialized, binary data stream, the type of the structure contained must be known beforehand, as the data stream does not provide type information or synchronization markers. Standard processing cores operating with fixed word widths additionally require multiple instructions to align operands for calculations or manipulations when reading these 8-bit data streams [9].

This issue was addressed by creating highly specialized coder units based on a newly developed OPC UA stream processor core. The core natively parses all OPC UA base types as part of its instruction set. It also provides hardware memory management for a configurable amount of internal memory. All registers of the core treat data as streams composed of 8-bit words. The OPC UA stream processor core allows the processing of messages to be defined programmatically, without the requirement to align, index or otherwise manage the transfer and storage of base data types. Stream processors operate in a request-response scheme, in which the core enters a low-power idle state until a message arrives. The data stream is then parsed and responded to by the stream processor according to its program, after which the processor returns into the idle state. Each cycle clears the internal memory, registers, flags and program counters.

Service coders (Fig. 1b) are composed of a stream processor core, a bus terminal, a namespace engine linkup, a limited set of peripherals, persistent registers and a timer. Every OPC UA service coder uses this same hardware arrangement and only differs in the program used to process data streams. Each coder also has a unique bus address and a broadcast key set during synthesis.

Large memory components are only required in coders that assemble fragmented messages, which is only the case for the TCP/IP offloading engine and the OPC UA sequencing layer that assembles message chunks. The memories are constructed as FIFO buffers and are made accessible to subsequent coders, to which the contents are forwarded on demand as a data stream. New data can easily be written back into the buffers by subsequent coders, so that local memory is not required in coders of the service layer.

Every service coder can process exactly one stream at any time. Coupled with the forwardable memory architecture, deterministic program execution by the stream processor cores and the non-blocking packet-switching bus fabric, this design principle results in the worst-case processing time for any message being solely dependent on the length of the message itself. This property enables fulfilling hard real-time requirements, as both the maximum message length and the clock rate of the system are known at design time.

B. Coder interconnect fabric

Communication between coders is facilitated by a non-blocking, connection-oriented network-on-chip bus fabric (NoC). A terminal handles the access to this bus on the coder side, making communications appear like FIFO contents. Physical access to the bus is controlled by a round-robin scheduler, providing equally distributed bus access. This allows for deterministic worst-case behavior, which is essential for guaranteeing hard real-time capabilities. The bus scheduler can also function as a router towards other schedulers. The addressing scheme allows for 256 bus participants, which can be located on a single bus or distributed over multiple bridged buses. The MTU is 256 bytes, transferred as 32-bit words. The packet-based transactions permit different coders to process data in parallel, allowing for continuous operation and minimizing busy-waiting of stream processors that are halted due to the lack of data in their terminal.

A single coder in the hierarchy can only decode its respective part of the data stream; i.e., the security coder is able to sequence the chunks of a message and remove the relevant headers, but it needs to forward the contained service request to the appropriate service coder. The discovery of an appropriate subsequent coder is handled by the bus system using a key broadcast mechanism. The key corresponds to the decoding requirements of the message, which for service decoders is the service NodeId, while the transport coder uses the SecureChannelId. Coders that can process the message automatically reply to the broadcast query, and the first free coder is chosen to process the message. Adding new services at design time only entails connecting new service coders to the bus system, automatically allowing their dynamic discovery at runtime.
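The key-broadcast dispatch described above can be illustrated with a minimal software sketch. This is a behavioral analogue only: the class and method names (Coder, BusFabric, dispatch) are invented for the example, and the round-robin arbitration of physical bus access is deliberately omitted.

```python
# Behavioral sketch of key-broadcast coder discovery (illustrative names only).

class Coder:
    """A coder answers broadcast queries for the one stream format it decodes."""
    def __init__(self, address, key):
        self.address = address  # unique bus address, fixed at synthesis
        self.key = key          # broadcast key, e.g. a service NodeId
        self.busy = False       # each coder processes exactly one stream at a time

class BusFabric:
    """Broadcasts a decoding key; the first free coder matching it is chosen."""
    def __init__(self, coders):
        self.coders = coders

    def dispatch(self, key):
        # Coders whose key matches reply to the broadcast query; the first
        # free responder is granted the message.
        for coder in self.coders:
            if coder.key == key and not coder.busy:
                coder.busy = True
                return coder.address
        return None  # no free coder: the message waits in its FIFO terminal

bus = BusFabric([Coder(0x01, "ReadRequest"), Coder(0x02, "ReadRequest"),
                 Coder(0x03, "GetEndpointsRequest")])
first = bus.dispatch("ReadRequest")   # -> 0x01
second = bus.dispatch("ReadRequest")  # -> 0x02 (the first read coder is busy)
```

Instantiating a service twice, as with the two read coders here, is all that service parallelism requires in this scheme; the dispatcher needs no central service table, mirroring how new service coders attach to the NoC at design time.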
C. Binary namespace encoding

One of the key strengths of OPC UA is its information modeling capabilities. Type definitions, instances and methods are abstracted as nodes in a graph and interconnected using typed references [1]. Runtime representation of this data requires large amounts of memory. Data models in small devices such as actuators and sensors are largely static and composed of preinitialized objects and variables [10]. Only a small subset of this information is changed at runtime. It was hence opted to store the information model on a static memory component (such as EEPROM or Flash) and to mask any runtime changes to the data using a small associative cache. The default XML OPC UA namespace representation proves too complex to parse on demand, while the OPC UA binary encoding imposes a severe limitation: the serial decoding requirement forces the reading process to either parse the entire image up to the information it wants to access, or to index certain memory locations.

As a solution, a specialized binary encoding for OPC UA was developed that reduces persistent memory requirements to 116 kB for namespace 0¹. Besides reducing memory requirements, the image format was designed to ease parsing the contents of the linear 8-bit memory. Nodes are arranged in ascending order of their numeric ID, and CRC synchronization markers in the stream permit locating node positions, allowing for O(log(n)) search complexity [11] for arbitrary nodes. Relative data offsets of node attributes (references, type payload, compressed string data) are inserted to enable quick navigation to the desired attributes. Search operations for references are obsoleted by using a 24-bit direct addressing scheme, effectively reducing search complexity for referenced nodes to O(1). Frequently referenced nodes, such as referenceTypes, are addressed indirectly using 6-bit fields. The binary encoding is automatically generated by a Python namespace compiler using XML descriptions of the namespace.

D. Namespace engine

Service coders access the contents of the namespace by using an extension of their instruction set to interact with the namespace engine, whose core component is an interface for parsing the binary image. Its contents are assumed to be static, variable contents being preallocated by the namespace compiler. On write operations, the information is stored in a separate memory, which masks the data stored at that address of the static image. The data retrieval logic builds on an address-associative cache, thereby providing for a fixed amount of maskable memory. A Feistel-network-based hash table provides fast identification of masked memory areas. A prefetch cache is used to accelerate read access to the comparatively slow serial memory (predictive read).

The namespace engine also provides subscription tracking by monitoring bus operations and automatically storing the information necessary to generate publish responses. A general-purpose shared memory, accessible to all service coders, provides an inter-process communication mechanism. The shared memory is currently used to store and verify session IDs across all service coders.

The application processor has access to the namespace engine using memory-mapped interface registers. These registers allow interacting with the OPC UA peripheral IP and enable the application to write data to or read data from the namespace. For the application processor, updating OPC UA values is reduced to a periodic write operation to a special function register in its address space.

Figure 2: Internal architecture of the namespace engine.

A bus system similar to the NoC is used to communicate with service coders. Only the round-robin bus scheduler, but not the overlying packet switching protocol, is used. Service coders request access to the namespace engine using their instruction set, thereby accessing the bus. The namespace engine's bus scheduler then allocates a fixed time slice for each coder interaction. If the time slice elapses before the service coder concludes its transactions, the state of the transaction is stored in a stack and later restored when the coder is re-granted access to the namespace engine, effectively preventing any coder from blocking the namespace access.

III. VALIDATION AND HARDWARE CHARACTERISTICS

The design, composed of a TCP/IP offloading engine, a transport coder, a security coder and service coders covering the GetEndpoints, Read and Open-/CloseSecureSession services, underwent successful synthesis studies for FPGAs and the GlobalFoundries 28 nm SLP CMOS technology. The design, excluding the namespace engine, was extensively simulated at 125 MHz.

Simulated network traffic included full TCP/IP connection handling, OPC UA transport layer connection handling (HEL/ACK/CLO), managing the secure session (Open-/CloseSecureSession), reading server endpoints (GetEndpoints) and reading the server-status node (Read). The complete testbench was concluded in 245 μs, a single read transaction taking 25.52 μs². The server responses were verified to be equivalent to those of software implementations. Several error scenarios were run for all implemented services as well as the TCP/IP engine, which were all successfully recognized and handled by their respective coders.

Synthesis studies for FPGAs and the GF 28 nm SLP CMOS technology confirm the high efficiency and low footprint of the OPC UA hardware peripheral IP. Synthesis targeting GF 28 nm SLP CMOS resulted in a footprint of 75,800 μm² at 125 MHz³. On Xilinx FPGAs the maximum clock rate was 70 MHz⁴,⁵, while the Altera Cyclone V reaches 60 MHz⁶.

Energy consumption and footprint scale linearly with the number and types of service coders of the server (profiles). Dynamic coder identification over the bus system enables the addition or removal of services to the network-on-chip fabric at design time. Since no other modifications to the hardware are necessary, the process of configuring profiles and service parallelism could be automated using IP core generators that determine the appropriate service coder configuration depending on the use case. Further means to tune the stack's characteristics include adjustments of cache sizes, stack and in-processor buffer depths and the maximum number of chunks/shared FIFO sizes.

IV. CURRENT WORKS

Current works focus on implementing and improving the namespace engine, in particular in regard to supporting model changes that insert or delete segments of memory from the static binary image. The NoC protocol is being revised to allow message prioritization, effectively enabling quality-of-service support. The large number of configurable components, such as memory sizes or the number of service coders, is being examined to determine the appropriate service multiplicity and memory configuration in dependence of several use cases. Since all service coders are identical from a hardware standpoint, one concept being examined more closely is implementing floating service coders by assigning the appropriate program just in time as a message requires processing, which would allow for a load-dependent, adaptive coder availability. The bus protocol is being redesigned to include message prioritization so that messages of certain clients gain priority when searching for or assigning message coders.

V. CONCLUSION

The works presented in this paper demonstrate the feasibility and benefits of using an OPC UA hardware server. Employing a hierarchic coder structure derived from the OPC UA binary protocol structure allows the fully parallel and autonomous processing of individual messages. The independence of the processing coders, fixed-size shared FIFO buffers and a non-blocking, round-robin-controlled bus fabric provide deterministic and predictable processing times, making the protocol usable in real-time-critical applications without mandating additional protocols. The design provides a high degree of configurability in regard to its functionality, energy consumption and footprint. Access to large information models is made possible by recoding the information to fit on a serial, non-volatile memory component, with written data being masked in a small memory area during access. The sum of these efforts makes OPC UA accessible to even the simplest microcomputing platforms in actuators and sensors, enabling their vertical integration into information networks. If scaled, the design also provides a means for implementing high-throughput, autonomous aggregating servers and other use cases.

¹1.6 MB of equivalent XML description or 196 kB for OPC UA binary encoding
²First bit reception to last bit transmission on the Ethernet PHY
³Areas for slow-corner models at 85 °C
⁴Virtex 6 XC6VLX75T: 36% logic use, 57% memory use
⁵Artix7 XCA100T-CSG324: 20% logic use, 19% memory use
⁶CycloneV 5CGFXC7D7F31C8: 15% logic use, 7% memory use

REFERENCES

[1] M. Damm, W. Mahnke, and S.-H. Leitner, OPC Unified Architecture. Springer, Apr. 2009.
[2] L. Zheng and H. Nakagawa, "OPC (OLE for Process Control) specification and its developments," in Proceedings of the 41st SICE Annual Conference (SICE 2002), vol. 2, pp. 917–920, Aug. 2002.
[3] S. H. Jeanne Schweder, "Platform Industrie 4.0 proposes reference architecture model for Industrie 4.0: OPC-UA confirmed as one and only standard in category 'communication layer'," Apr. 2015.
[4] P. Reboredo and M. Keinert, "Integration of discrete manufacturing field devices data and services based on OPC UA," in IECON 2013 – 39th Annual Conference of the IEEE Industrial Electronics Society, pp. 4476–4481, IEEE, 2013.
[5] J. Imtiaz and J. Jasperneite, "Scalability of OPC-UA down to the chip level enables Internet of Things," in 2013 11th IEEE International Conference on Industrial Informatics (INDIN), Jul. 2013.
[6] G. Shrestha, J. Imtiaz, and J. Jasperneite, "An optimized OPC UA transport profile to bringing Bluetooth Low Energy device into IP networks," in 2013 IEEE 18th Conference on Emerging Technologies & Factory Automation (ETFA), pp. 1–5, Sep. 2013.
[7] MatrikonOPC, OPC-UA Device Server Hardware Development Kit (HDK). MatrikonOPC, Jun. 2013.
[8] OPC Foundation, OPC-UA Specification Part 6, vol. 6 of OPC-UA Specification. OPC Foundation, Nov. 2008.
[9] J. Gummaraju and M. Rosenblum, "Stream programming on general-purpose processors," in MICRO 38: Proceedings of the 38th Annual ACM/IEEE International Symposium on Microarchitecture, Barcelona, Spain, Nov. 2005.
[10] C. Legat, C. Seitz, and B. Vogel-Heuser, "Unified sensor data provisioning with semantic technologies," in 2011 IEEE 16th Conference on Emerging Technologies & Factory Automation (ETFA), pp. 1–8, IEEE, 2011.
[11] K. Weicker and N. Weicker, Algorithmen und Datenstrukturen. Springer-Verlag, 2013.
