
http://www.embedded.com/print/4429865

Christian Legare, Micrium - April 12, 2014

Developers often believe that since a communication protocol stack is called a TCP/IP stack, porting it to
an embedded target provides the target with all TCP/IP functionalities and performance. This is far from
true.

A TCP/IP stack requires such resources as sockets and buffers to achieve its goal. These resources,
however, consume RAM, a scarce resource on an embedded target. Deprived of sufficient resources, a
TCP/IP stack will not work better than an RS-232 connection.

When performance is not an issue and the primary requirements are connectivity and functionality,
implementing TCP/IP on a target with scarce resources (RAM and CPU) is an option. Today, however,
when an Ethernet port is available on a device, expectations are that performance will be on the order of
megabits per second. This is achievable on small embedded devices, although certain design rules need to
be observed.

By using a Transmission Control Protocol (TCP) example, this article demonstrates design rules to be
considered when porting a TCP/IP stack to an embedded device.

Network buffers
A TCP/IP stack places received packets in network buffers to be processed by the upper protocol layers
and also places data to send in network buffers for transmission. Network buffers are data structures
defined in RAM.

A buffer contains a header portion used by the protocol stack. This header provides information regarding
the contents of the buffer. The data portion contains data that has either been received by the Network
Interface Card (NIC) and thus will be processed by the stack, or data that is destined for transmission by
the NIC.

Figure 1 – Network buffer

The data portion of the network buffer contains the protocol data and protocol headers. For example:


Figure 2 – Encapsulation process

The maximum network buffer size is determined by the maximum size of the data that can be transported
by the networking technology used. Today, Ethernet is the ubiquitous networking technology used for
Local Area Networks (LANs).

Originally, Ethernet standards defined the maximum frame size as 1518 bytes. Removing the Ethernet, IP,
and TCP encapsulation overhead leaves a maximum of 1460 bytes for the TCP segment. A segment is the
data structure used to encapsulate TCP data. Carrying an Ethernet frame in one of the TCP/IP stack
network buffers requires network buffers of approximately 1600 bytes each. The difference between the
Ethernet maximum frame size and the network buffer size is the space required for the network buffer
metadata.
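The frame-size arithmetic above can be sketched in C. This is a minimal sketch; the constant names are illustrative, and it assumes an untagged Ethernet II frame carrying option-less IPv4 and TCP headers:

```c
#include <assert.h>

#define ETH_HDR_LEN   14u  /* destination MAC + source MAC + EtherType */
#define ETH_FCS_LEN    4u  /* frame check sequence (CRC)               */
#define IP_HDR_LEN    20u  /* IPv4 header, no options                  */
#define TCP_HDR_LEN   20u  /* TCP header, no options                   */

/* Largest TCP payload (the MSS) that fits in one Ethernet frame. */
static unsigned tcp_mss_from_frame(unsigned max_frame_len)
{
    unsigned mtu = max_frame_len - ETH_HDR_LEN - ETH_FCS_LEN; /* 1518 -> 1500 */
    return mtu - IP_HDR_LEN - TCP_HDR_LEN;                    /* 1500 -> 1460 */
}
```

For a 1518-byte frame this yields the 1460-byte maximum segment quoted above; the difference between that frame size and the roughly 1600-byte network buffer is what holds the stack's metadata.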

It is possible to use smaller network buffers. For example, if the application is not streaming multimedia
data but rather transferring small sensor data periodically, network buffers smaller than the maximum
allowed can be used.

TCP segment size is negotiated between the two devices that are establishing a logical connection. It is
known as the Maximum Segment Size (MSS). An embedded system could take advantage of this protocol
capability. On an embedded target with 32K of RAM, when you account for all the middleware RAM
usage, there is not much left for network buffers!

Network operations
Many networking operations affect system performance. For example, network buffers are not released as
soon as their task is completed. Within the TCP acknowledgment process, a TCP segment is kept until its
reception is acknowledged by the receiving device. If it is not acknowledged within a certain timeframe,
the segment is retransmitted and kept again.

If a system has a limited number of network buffers, network congestion (packets being dropped) will
affect the usage of these buffers and the total system performance. When all the network buffers are
assigned to packets (being transmitted, retransmitted or acknowledging received packets), the TCP/IP
stack will slow down while it waits for available resources before resuming a specific function.

The advantage of defining smaller network buffers is that more buffers exist that allow TCP (and UDP) to


have more protocol exchanges between the two devices. This is ideal for applications where the
information exchanged fits in smaller packets, such as a data logging device sending periodic sensor data.

A disadvantage is that each packet carries less data. For streaming applications, this is less than desirable.
HTTP, FTP and other such protocols will not perform well with this configuration model.

Ultimately, if there is insufficient RAM to define a few network buffers, the TCP/IP stack will crawl.

TCP Performance
Windowing. TCP has a flow control mechanism called Windowing that is used for Transmit and Receive.
A field in the TCP header is used for the Windowing mechanism so that:

1. The Window field indicates the quantity of information (in bytes) that the recipient is able
to accept. This enables TCP to control the flow of data.
2. Data receiving capacity is related to memory (network buffers) and to the hardware's processing
capacity.
3. The maximum size of the window is 65,535 bytes (a 16-bit field).
4. A value of 0 (zero) halts the transmission.
5. The source host sends a series of bytes to the destination host.

Figure 3 – TCP Windowing

Within Figure 3, the following occurs:

1. Bytes 1 through 512 have been transmitted (and pushed to the application using the TCP PSH flag)
and have been acknowledged by the destination host.
2. The window is 2,048 bytes long.
3. Bytes 513 through 1,536 have been transmitted but have not been acknowledged.
4. Bytes 1,537 through 2,560 can be transmitted immediately.
5. Once an acknowledgement is received for bytes 513 through 1,536, the window will move 1,024
bytes to the right, and bytes 2,561 through 3,584 may then be sent.

On an embedded device, the window size should be configured in terms of the network buffers available.
For example, with an embedded device that has eight network buffers with an MSS of 1460, let’s
reserve 4 buffers for transmission and 4 buffers for reception. Transmit and receive window sizes will be 4
times 1460 (4 * 1460 = 5840 bytes).
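As a small sanity check, the buffer split above can be expressed in C. The names are illustrative; the figures come straight from the example:

```c
#include <assert.h>

#define MSS        1460u          /* negotiated Maximum Segment Size     */
#define NUM_BUFS      8u          /* network buffers available on target */
#define RX_BUFS  (NUM_BUFS / 2u)  /* 4 buffers reserved for reception    */
#define TX_BUFS  (NUM_BUFS / 2u)  /* 4 buffers reserved for transmission */

/* A window sized to the number of buffers backing it, in bytes. */
static unsigned window_bytes(unsigned num_bufs)
{
    return num_bufs * MSS;
}
```

With four buffers on each side, both the transmit and receive windows come out to 5840 bytes.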


On every packet received, TCP decreases the Receive Window size by 1460 and advertises the newly
calculated Receive Window size to the transmitting device. Once the stack has processed the packet, the
Receive Window size will be increased by 1460, the network buffer will be released, and the Receive
Window size will be advertised with the next packet transmitted.

Typically, the network can transport packets faster than the embedded target can process them. If the
Receiving device has received four packets without being able to process them, the Receive Window Size
will be decreased to zero. A zero Receive Window Size advertised to the Transmitting device tells that
device to stop transmitting until the Receiving device is able to process and free at least one network
buffer. On the transmit side, the stack will stop if network buffers are not available. Depending on how
the stack is designed/configured, the transmitting function will retry, time out, or exit (blocking/non-blocking
sockets).
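The receive-side bookkeeping described above can be modeled with two hypothetical callbacks. This is a sketch only; a real stack folds this accounting into its packet path:

```c
#include <assert.h>

#define MSS      1460u
#define RX_BUFS     4u

/* Advertised receive window, in bytes. */
static unsigned rx_win = RX_BUFS * MSS;

/* NIC delivered a segment into a network buffer: shrink the window. */
static void on_segment_received(void)
{
    rx_win -= MSS;
}

/* Stack consumed the segment and freed its buffer: grow the window,
   to be advertised with the next transmitted packet. */
static void on_segment_processed(void)
{
    rx_win += MSS;
}
```

Four back-to-back receives with no processing drive rx_win to zero, which is exactly the zero-window advertisement that stops the transmitter.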

UDP does not have such a mechanism. If there are insufficient network buffers to receive the transmitted
data, packets are dropped. The application needs to handle these situations.
TCP connection bandwidth product
The number of TCP segments being received/transmitted by a host has an approximate upper bound equal
to the TCP window sizes (in packets) multiplied by the number of TCP connections:

Tot # TCP Pkts ~= Tot # TCP Conns * TCP Conn Win Sizes

This is the TCP connection bandwidth product.

The number of internal NIC packet buffers/channels limits the target host's overall packet
bandwidth. Since most targets are slower consumers, data sent by a faster producer will consume most
or all NIC packet buffers/channels, and some packets will be dropped. However, even when
performance/throughput is exceptionally low, TCP connections should still be able to transfer data via
retransmission.

Windowing with multiple sockets


The given Windowing example assumes that the embedded device has one socket (one logical connection)
with a foreign host. Imagine a system where multiple parallel connections are required.

The discussion above can be applied to each socket. With proper application code, each connection's
throughput is a fraction of the total connection bandwidth. This means that the TCP/IP stack's configured
window size needs to take into consideration the maximum number of sockets running at any point in
time.

Using the same example with 5 sockets and providing a Receive Window size of 5840 bytes to every
socket, 20 network buffers (4 buffers per Window * 5 sockets) will have to be configured. Assuming that
the largest network buffers possible (about 1600 bytes) are used, this means about 32K RAM of network
buffers (20 * 1600) is required; otherwise, the system will slow down due to excessive retransmission.

In practice, the calculation is usually done in reverse. How does one find the Tx and Rx
window sizes for a system?

When 20 network buffers are reserved for reception and the system needs a maximum of 5 sockets at
any point in time, then:

Rx Window Size = (Number of buffers * MSS) / Number of sockets

If the result is less than one MSS, more RAM for additional buffers is required.
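That formula, together with its one-MSS floor, might look like this in C (the function name is illustrative):

```c
#include <assert.h>

#define MSS 1460u

/* Per-socket receive window from a shared pool of network buffers.
   Returns 0 when the pool cannot give each socket even one MSS,
   meaning more RAM for additional buffers is required. */
static unsigned rx_window(unsigned num_bufs, unsigned num_socks)
{
    unsigned win = (num_bufs * MSS) / num_socks;
    return (win < MSS) ? 0u : win;
}
```

The 20-buffer, 5-socket example yields the 5840-byte window used throughout the article; 4 buffers shared among 5 sockets would fall below one MSS and signal the need for more RAM.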


Delayed Acknowledgement
Another important factor needs to be taken into consideration with TCP: the network congestion state.
TCP keeps each transmitted network buffer until it is acknowledged by the receiving host. When packets
are dropped or never delivered because of network problems, TCP retransmits the packets.
This means that unacknowledged buffers are set aside and used for this purpose.

TCP does not necessarily acknowledge every packet received, a mechanism called Delayed
Acknowledgement. Without delayed acknowledgement, half of the buffers used for transmission are
consumed acknowledging every received packet. With delayed acknowledgement, this number is reduced
to 33%.

Knowing the number of buffers that can be used for transmission, based on the quantity of RAM that can
be used for network buffers and the maximum number of sockets in use at any point in time, the Transmit
Window size can be calculated:

Without Delayed Acknowledgement:


Tx Window Size = (Number of buffers * MSS) / (Number of sockets * 2)

With Delayed Acknowledgement:


Tx Window Size = (Number of buffers * MSS) / (Number of sockets * 1.5)
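Both formulas can be folded into one hypothetical helper; integer math replaces the 1.5 divisor with a multiply-by-2, divide-by-3:

```c
#include <assert.h>

#define MSS 1460u

/* Per-socket transmit window. Without delayed ACK, half the buffers
   are consumed acknowledging received packets (divisor 2); with it,
   only about a third are (divisor 1.5, computed here as * 2 / 3). */
static unsigned tx_window(unsigned num_bufs, unsigned num_socks,
                          int delayed_ack)
{
    if (delayed_ack)
        return (num_bufs * MSS * 2u) / (num_socks * 3u);
    return (num_bufs * MSS) / (num_socks * 2u);
}
```

For the 20-buffer, 5-socket example, delayed acknowledgement raises the per-socket transmit window from 2920 to 3893 bytes.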

Note that a similar analysis can be done with a UDP application. Flow control and congestion control,
instead of being implemented in the Transport Layer protocol, are moved to the Application Layer
protocol; TFTP (Trivial File Transfer Protocol) is one example. Acknowledgement and retransmission are
part of any data communications protocol. If they are not performed by the communication protocol, the
application must take care of them.

It is the developer’s decision to use UDP or TCP. If TCP is not required, it can be removed from the stack
(reducing the application ROM usage); however, the application will need to handle the network
problems responsible for the non-delivery of packets.

DMA and CPU speed


As stated previously, most targets are slow consumers. Packets generated by a faster producer and
received by the target will consume most or all NIC network buffers, and some packets will be dropped.
Hardware features such as DMA, along with CPU speed, can improve this situation. The latter is trivial:
the faster the target can receive and process the packets, the faster the network buffers can be freed.

DMA support for the NIC is another means to improve packet processing. It is easy to understand that
when packets are transferred quickly to and from the stack, that network performance improves. DMA
also relieves the CPU from the transfer task, allowing the CPU to perform more of the protocol
processing.

Conclusion
When implementing a TCP/IP stack, the design intentions need to be clear. If the goal is to use the Local
Area Network without any consideration for performance, a TCP/IP stack or a subset of it can be
implemented with very little RAM (approximately 32K).

However, if the application requires the capabilities of the TCP protocol at a few megabits per second, a
more complete TCP/IP stack is required. In this case, when embedded system requirements are in the
range of 96K of RAM, resources need to be allocated to the protocol stack so that it can perform its
duties.

Christian Legare is Executive Vice-President and Chief Technology Officer at Micrium. He has a
Master's degree in Electrical Engineering from the University of Sherbrooke, Quebec, Canada. In his 22


years in the telecom industry, he deployed networks and taught classes. Christian was involved as an
executive in large-scale organizations as well as start-ups, mainly in Engineering and R&D. He was in
charge of an IP (Internet Protocol) certification program at the International Institute of
Telecom (IIT) in Montreal, Canada, as their IP systems expert. Mr. Legare joined Micrium, home of the
uC/OS-II and uC/OS-III real-time kernels, in 2002 as Executive Vice-President and Chief Technology
Officer, and was instrumental in the development of the majority of the kernel services.

This paper was presented at the Embedded Systems Conference as part of a class taught by Christian
Legare on "Achieving TCP-IP performance in embedded systems (ESC-106)."

