(IJCNS) International Journal of Computer and Network Security, 41

Vol. 2, No. 5, May 2010

Security Issues for Voice over IP Systems


Ali Tahir¹, Asim Shahzad²

¹ Faculty of Telecommunication and Information Engineering, UET TAXILA, Pakistan
ali.tahir.nsn@hotmail.com

² Faculty of Telecommunication and Information Engineering, UET TAXILA, Pakistan
asim.shahzad@uettaxila.edu.pk

Abstract: Firewalls, Network Address Translation (NAT), and encryption create Quality of Service (QoS) problems, and QoS is a basic building block of the operation of a VOIP system. There are two major non-proprietary standards used for VoIP communications: H.323 and the Session Initiation Protocol (SIP). Firewalls, NAT, and Intrusion Detection Systems (IDS) are used to filter unwanted packets in a VoIP environment. To provide defense against an internal hacker, another layer of defense is necessary at the protocol level to protect the data itself. In VOIP, as in data networks, this can be accomplished by encrypting the packets at the IP level using IPsec. Because VoIP is an IP-based technology that uses the Internet, it also inherits all associated IP vulnerabilities. This paper focuses on the security issues associated with voice over IP systems.

Keywords: Voice-Over Internet Protocol (VoIP), Firewalls.

1. Overview of VOIP

The Voice-Over Internet Protocol (VoIP) technology allows voice information to pass over IP data networks. This technology results in huge savings on the amount of physical resources required to communicate by voice over long distance. It does so by exchanging the information in packets over a data network. The basic functions performed by a VoIP system include signaling, data basing, call connect and disconnect, and coding/decoding. The steps involved in originating an Internet telephone call are the conversion of the analogue voice signal to digital format and the compression and translation of the signal into Internet Protocol (IP) packets for transmission over the Internet; the process is reversed at the receiving end. VoIP software such as VocalTec or Net2Phone is available to the user [1]. With the exception of phone-to-phone calls, the user must possess an array of equipment, which should at minimum include VoIP software, an Internet connection, and a multimedia computer with a sound card, speakers, a microphone, and a modem [1]. The VoIP network acts as a gateway to the existing PSTN network. This gateway forms the interface for transporting voice content over IP networks. Gateways are responsible for all call origination, call detection, analogue-to-digital conversion of voice, and creation of voice packets.

Figure 1. How VOIP works [1]

VoIP networks replace the traditional public-switched telephone networks (PSTNs), as they can perform the same functions as the PSTN networks. The functions performed include signaling, data basing, call connect and disconnect, and coding-decoding.

2. Quality of Service Issues

Quality of Service (QoS) is fundamental to the operation of a VOIP network. Despite all the money VOIP can save users and the network elegance it provides, if it cannot deliver at least the same quality of call setup, voice relay functionality, and voice quality as a traditional telephone network, then it will provide little added value. Various security measures can degrade QoS, for example the blocking of call setups by firewalls, and encryption-produced latency and delay variation (jitter). QoS issues are central to VOIP security. If QoS were assured, then most of the same security measures currently implemented in today’s data networks could be used in VOIP networks. But because of the time-critical nature of VOIP, and its low tolerance for disruption and packet loss, many security measures implemented in traditional data networks just aren’t applicable to VOIP in their current form. The main QoS issues associated with VOIP that security affects are presented here [4].

2.1 Latency
Latency in VOIP refers to the time it takes for a voice transmission to go from its source to its destination. We would like to keep latency as low as possible for ideal
systems but there are practical lower bounds on the delay of VOIP. The ITU-T Recommendation G.114 establishes a number of time constraints on one-way latency. The upper bound is 150 ms [2] for one-way traffic experienced in domestic calls across PSTN lines in the continental United States. For international calls, a delay of up to 400 ms was deemed tolerable, but since most of the added time is spent routing and moving the data over long distances, we consider here only the domestic case and assume our solutions are upwards compatible in the international realm. VOIP calls must achieve the 150 ms bound to successfully emulate the QoS that today’s phones provide. This places a genuine constraint on the amount of security that can be added to a VOIP network. The encoding of voice data can take between 1 and 30 ms, and voice data traveling across the North American continent can take upwards of 100 ms, although actual travel time is often much faster. Assuming the worst case (100 ms transfer time), 20-50 ms remain for queuing and security implementations [4].

Figure 2. Sample Latency Budget [4]

Delay is not confined to the endpoints of the system. Each hop along the network introduces a new queuing delay and possibly a processing delay if it is a security checkpoint (e.g., a firewall or encryption/decryption point). Also, larger packets tend to cause bandwidth congestion and increased latency. In light of these issues, VOIP tends to work best with small packets on a logically abstracted network to keep latency at a minimum.

2.2 Jitter
Jitter refers to non-uniform packet delays. It is often caused by low bandwidth situations in VOIP and can be exceptionally detrimental to the overall QoS. Variations in delays can be more detrimental to QoS than the actual delays themselves. Jitter can cause packets to arrive and be processed out of sequence. RTP, the protocol used to transport voice media, is based on UDP, so packets out of order are not reassembled at the protocol level. However, RTP allows applications to do the reordering using the sequence number and timestamp fields. The overhead in reassembling these packets is non-trivial, especially when dealing with the tight time constraints of VOIP.

When jitter is high, packets arrive at their destination in spurts. The general prescription to control jitter at VOIP endpoints is the use of a buffer, but such a buffer has to release its voice packets at least every 150 ms (usually a lot sooner given the transport delay), so the variations in delay must be bounded. The buffer implementation issue is compounded by the uncertainty of whether a missing packet is simply delayed an anomalously long amount of time or is actually lost.

Jitter can also be controlled throughout the VOIP network by using routers, firewalls, and other network elements that support QoS. These elements process and pass along time-urgent traffic like VOIP packets sooner than less urgent data packets. Another method for reducing delay variation is to pattern network traffic to diminish jitter by making as efficient use of the bandwidth as possible. This constraint is at odds with some security measures in VOIP. Chief among these is IPsec, whose processing requirements may increase latency, thus limiting effective bandwidth and contributing to jitter.

The window of delivery for a VOIP packet is very small, so it follows that the acceptable variation in packet delay is even smaller. Thus, although we are concerned with security, the utmost care must be given to assuring that delays in packet deliveries caused by security devices are kept uniform throughout the traffic stream [4].

2.3 Packet Loss
VOIP is exceptionally intolerant of packet loss. Packet loss can result from excess latency, where a group of packets arrives late and must be discarded in favor of newer ones. It can also be the result of jitter, that is, when a packet arrives after its surrounding packets have been flushed from the buffer, making the received packet useless. Compounding the packet loss problem is VOIP’s reliance on RTP, which uses the unreliable UDP for transport and thus does not guarantee packet delivery. However, the time constraints do not allow for a reliable protocol such as TCP to be used to deliver media. The good news is that VOIP packets are very small, containing a payload of only 10-50 bytes, which is approximately 12.5-62.5 ms of speech [4], with most implementations tending toward the shorter range. The loss of such a minuscule amount of speech is not discernible, or at least not worthy of complaint, for a human VOIP user. The bad news is that these packets are usually not lost in isolation. Bandwidth congestion and other such causes of packet loss tend to affect all the packets being delivered around the same time. So although the loss of one packet is fairly inconsequential, probabilistically the loss of one packet means the loss of several packets, which severely degrades the quality of service in a VOIP network.

Despite the infeasibility of using a guaranteed delivery protocol such as TCP, there are some remedies for the packet loss problem. One cannot guarantee all packets are delivered, but if bandwidth is available, sending redundant information can reduce the probability of loss.
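
The two figures in this section can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from [4]. It assumes a codec bitrate of 6.4 kbit/s, chosen solely because it reproduces the 12.5-62.5 ms range quoted above for 10-50 byte payloads, and it assumes packet losses are independent, which the text notes is optimistic because real losses arrive in bursts. The function names are hypothetical.

```python
def payload_duration_ms(payload_bytes: int, codec_bitrate_bps: int) -> float:
    """Milliseconds of speech carried by one payload at the given codec bitrate."""
    if payload_bytes < 0 or codec_bitrate_bps <= 0:
        raise ValueError("payload must be non-negative and bitrate positive")
    return payload_bytes * 8 * 1000 / codec_bitrate_bps


def residual_loss(p_loss: float, copies: int) -> float:
    """Probability that every copy of a frame is lost.

    Assumes each copy is lost independently with probability p_loss.
    Bursty loss, as described in the text, makes the real figure worse.
    """
    if not 0.0 <= p_loss <= 1.0 or copies < 1:
        raise ValueError("p_loss must be in [0, 1] and copies >= 1")
    return p_loss ** copies


if __name__ == "__main__":
    ASSUMED_BITRATE_BPS = 6400  # assumption: not stated in the source

    for size in (10, 50):
        ms = payload_duration_ms(size, ASSUMED_BITRATE_BPS)
        print(f"{size}-byte payload carries {ms} ms of speech")

    for copies in (1, 2, 3):
        print(f"{copies} copies, 5% loss each: residual {residual_loss(0.05, copies):.6f}")
```

Each extra copy multiplies the bandwidth used by the stream, which is the cost the redundancy remedy has to pay for.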
Such bandwidth is not always accessible, and the redundant information will have to be processed, introducing even more latency to the system and, ironically, possibly producing even greater packet loss.

2.4 Bandwidth & Effective Bandwidth
As in data networks, bandwidth congestion can cause packet loss and a host of other QoS problems. Thus, proper bandwidth reservation and allocation is essential to VOIP quality. One of the great attractions of VOIP, data and voice sharing the same wires, is also a potential headache for implementers who must allocate the necessary bandwidth for both networks in a system normally designed for one. Congestion of the network causes packets to be queued, which in turn contributes to the latency of the VOIP system. Low bandwidth can also contribute to non-uniform delays (jitter) [1], since packets will be delivered in spurts when a window of opportunity opens up in the traffic.

Because of these issues, VOIP network infrastructures must provide the highest amount of bandwidth possible. On a LAN, this means having modern switches running at 100 Mbit/s and other architectural upgrades that will alleviate bottlenecks within the LAN. Percy and Hommer suggest that if network latencies are kept below 100 milliseconds and maximum jitter is never more than 40 milliseconds, then packet loss should not occur. With these properties assured, one can calculate the necessary bandwidth for a VOIP system on the LAN in a worst-case scenario using statistics associated with the worst-case bandwidth-congesting codec. This is fine when dealing simply with calls across the LAN, but the use of a WAN complicates matters. Bandwidth usage varies significantly across a WAN, so a much more complex methodology is needed to estimate required bandwidth usage.

Methods for reducing the bandwidth usage of VOIP include RTP header compression and Voice Activity Detection (VAD). RTP compression condenses the media stream traffic so less bandwidth is used. However, an inefficient compression scheme can cause latency or voice degradation, causing an overall downturn in QoS. VAD prevents the transmission of empty voice packets (i.e., when a user is not speaking, their device does not simply send out white noise). However, by definition VAD will contribute to jitter in the system by causing irregular packet generation [4].

2.5 The Need for Speed
The key to conquering QoS issues like latency and bandwidth congestion is speed. By definition, faster throughput means reduced latency and probabilistically reduces the chances of severe bandwidth congestion. Thus every facet of network traversal must be completed quickly in VOIP. The latency often associated with tasks in data networks will not be tolerated. Chief among the latency producers that must improve performance are firewall/NAT traversal and traffic encryption/decryption [4]. Traditionally, these are two of the most effective ways for administrators to secure their networks. However, they are also two of the greatest contributors to network congestion and throughput delay. Inserting traditional firewall and encryption products into a VOIP network is not feasible, particularly when VOIP is integrated into existing data networks. Instead, these data-network solutions must be adapted to support security in the new fast-paced world of VOIP.

2.6 Power Failure and Backup Systems
Conventional telephones operate on 48 volts supplied by the telephone line itself. This is why home telephones continue to work even during a power failure. Most offices use PBX systems with their conventional telephones, and PBXs require backup power systems so that they continue to operate during a power failure [4]. These backup systems will continue to be required with VOIP, and in many cases will need to be expanded. An organization that provides uninterruptible power systems for its data network and desktop computers may have much of the power infrastructure needed to continue communication functions during power outages, but a careful assessment must be conducted to ensure that sufficient backup power is available for the office VOIP switch, as well as each desktop instrument. Costs may include electrical power to maintain UPS battery charge, periodic maintenance costs for backup power generation systems, and the cost of UPS battery replacement. If emergency/backup power is required for more than a few hours, electrical generators will be required. Costs for these include fuel, fuel storage facilities, and the cost of fuel disposal at end of storage life.

2.7 QoS Implications for Security
The strict performance requirements of VOIP have significant implications for security, particularly denial of service (DoS) issues. VOIP-specific attacks (e.g., floods of specially crafted SIP messages) may result in DoS for many VOIP-aware devices. For example, SIP phone endpoints may freeze and crash when attempting to process a high rate of packet traffic. SIP proxy servers also may experience failure and intermittent log discrepancies with a VOIP-specific signaling attack of under 1 Mb/s. In general, the packet rate of the attack may have more impact than the bandwidth; i.e., a high packet rate may result in a denial of service even if the bandwidth consumed is low [4].

3. Standards and Protocols

Over the next few years, the industry will address the bandwidth limitations by upgrading the Internet backbone to asynchronous transfer mode (ATM), the switching fabric designed to handle voice, data, and video traffic. Such network optimization will go a long way toward eliminating network congestion and the associated packet loss. The Internet industry also is tackling the problems of network reliability and sound quality on the Internet through the gradual adoption of standards. Standards-setting efforts are focusing on the three central elements of Internet telephony: the audio codec format, transport protocols, and directory services.

In May 1996, the International Telecommunication Union (ITU) ratified the H.323 specification, which defines how
voice, data, and video traffic will be transported over IP-based local area networks; it also incorporates the T.120 data conferencing standard. The recommendation is based on the Real-time Transport Protocol/Real-time Transport Control Protocol (RTP/RTCP) for managing audio and video signals.

As such, H.323 addresses the core Internet-telephony applications by defining how delay-sensitive traffic (i.e., voice and video) gets priority transport to ensure real-time communications service over the Internet [1].

Figure 3: H.323 Architecture [4]

4. Firewalls, NAT AND IDS

Firewalls and NAT present a formidable challenge to VOIP implementers. However, there are solutions to these problems, if one is willing to pay the price. It is important to note that all three major VOIP protocols, SIP, H.323, and H.248, have similar problems with firewalls and NATs. Although the use of NATs may be reduced as IPv6 is adopted [4], they will remain a common component in networks for years to come, and IPv6 will not alleviate the need for firewalls, so VOIP systems must deal with the complexities of firewalls and NATs. Some VOIP issues with firewalls and NATs are unrelated to the call setup protocol used. Both network devices make it difficult for incoming calls to be received by a terminal behind the firewall/NAT. Both devices also affect QoS and can seriously disrupt the RTP stream.

4.1 Firewalls
Firewalls are a staple of security in today’s IP networks. Whether protecting a LAN or WAN, encapsulating a DMZ, or just protecting a single computer, a firewall is usually the first line of defense against attackers. Traffic not meeting the requirements of the firewall is dropped. Processing of traffic is determined by a set of rules programmed into the firewall by the network administrator. These may include such commands as “Block all FTP traffic (port 21)” or “Allow all HTTP traffic (port 80)”. Much more complex rule sets are available in almost all firewalls. A useful property of a firewall, in this context, is that it provides a central location for deploying security policies. It is the ultimate bottleneck for network traffic because, when properly designed, no traffic can enter or exit the LAN without passing through the firewall. The introduction of firewalls to the VOIP network complicates several aspects of VOIP, most notably dynamic port trafficking and call setup procedures.

4.1.1 Stateful Firewalls
Most VOIP traffic travels across UDP ports. Firewalls typically process such traffic using a technique called packet filtering. Packet filtering investigates the headers of each packet attempting to cross the firewall and uses the IP addresses, port numbers, and protocol type contained therein to determine the packets’ legitimacy [4]. In VOIP and other media streaming protocols, this information can also be used to distinguish between the start of a connection and an established connection. There are two types of packet filtering firewalls, stateless and stateful. Stateless firewalls retain no memory of traffic that has occurred earlier in the session. Stateful firewalls do remember previous traffic and can also investigate the application data in a packet. Thus, stateful firewalls can handle application traffic that may not be destined for a static port.

4.1.2 VOIP specific Firewall Needs
In addition to the standard firewall practices, firewalls are often deployed in VOIP networks with the added responsibility of brokering the data flow between the voice and data segments of the network. This is a crucial functionality for a network containing PC-based IP phones that are on the data network but need to send voice messages. If no such firewall were present, all voice traffic emanating from or traveling to such devices would have to be explicitly allowed in, because RTP (Real-time Transport Protocol) makes use of dynamic UDP ports (of which there are thousands). Leaving this many UDP ports open is an egregious breach of security. Thus, it is recommended that all PC-based phones be placed behind a stateful firewall to broker VOIP media traffic. Without such a mechanism, a UDP DoS attack could compromise the network by exploiting the excessive number of open ports [10].

Firewalls could also be used to broker traffic between physically segmented networks (one network for VOIP, one network for data), but such an implementation is fiscally and physically unacceptable for most organizations, since one of the benefits of VOIP is voice and data sharing the same physical network.

4.2 Network Address Translation (NAT)
NAT is commonly performed by firewalls to conserve IP addresses and hide internal IP addresses/ports from direct, external access. This causes issues for VoIP. When VoIP endpoints negotiate ports for media exchange, they communicate these ports to one another in packet payloads. So a conventional NAT remains unaware of the ports selected, cannot translate them meaningfully, and blocks them at the firewall. A Real Time Mixed Media (RTMM) firewall uses the Back-to-Back User Agent (B2BUA) to work with the NAT to rewrite the addresses in the signaling stream, providing NAT pathways for the media streams. To do this,
the RTMM must be able to perform NAT at media speeds, to prevent latency/jitter or loss of packets [11].

All of the benefits of NAT come at a price. NATs “violate the fundamental semantic of the IP address, that it is a globally reachable point for communications”. This design has significant implications for VOIP. For one thing, an attempt to make a call into the network becomes very complex when a NAT is introduced. The situation is analogous to a phone network where several phones have the same phone number, such as in a house with multiple phones on one line (see Figure 4). There are also several issues associated with the transmission of the media itself across the NAT, including an incompatibility with IPsec [4].

Figure 4. IP Telephones behind NAT and Firewall [4]

Problem:
• Simple NAT devices, which are not VoIP aware, perform NAT on the IP headers only.
• VoIP packets contain private IP addresses in the payload.
• Therefore, VoIP sessions cannot be established.

Solution:
• Translate both the IP header and the packet’s payload to a routable IP address (Far End NAT).
• Relay media until full session establishment.

Session Border Controller (SBC)
• SBCs started as a solution to connectivity problems caused by NAT done by non-VoIP-aware devices.
• SBCs are usually used by carriers and located at the border of their core networks [7].

Figure 5. Far End NAT [7]

4.3 Intrusion Detection System (IDS)
Intrusion detection is a second line of defense behind other security mechanisms (firewalls, encryption). Supplying the detector engine with application-specific knowledge makes it more effective and powerful.

An Intrusion Detection System (IDS) helps administrators to monitor and defend against security breaches. Intrusion detection techniques are generally divided into two paradigms, anomaly detection and misuse detection. In anomaly detection techniques, the deviation from normal system behavior is detected, whereas misuse detection is based on the matching of attack signatures. Unlike signature-based intrusion detection, anomaly detection has the advantage of detecting previously unknown attacks, but at the cost of a relatively high false alarm rate.

Sekar et al. introduced a third category, specification-based intrusion detection. The specification-based approach relies on a manually developed specification that captures legitimate system behavior, and it detects any deviation from that specification. This approach can detect unseen attacks with a low false alarm rate. However, these previous approaches fall short of defending VoIP applications, because of the cross-protocol interaction and distributed nature of VoIP.

Figure 6. IDS Engine sits on or close to the end-point [12]

VoIP systems use multiple protocols for call control and data delivery. For example, in SIP-based IP telephony, the Session Initiation Protocol (SIP) is used to control call setup and teardown, while the Real-time Transport Protocol (RTP) is used for media delivery. A VoIP system is distributed in nature, consisting of IP phones, SIP proxies, and many other servers. Defending against malicious attacks on such a heterogeneous and distributed environment is far from trivial [2].

4.3.1 Cross-protocol Methodology for Detection
An IDS that uses cross-protocol detection accesses packets from multiple protocols in a system to perform its detection. This methodology is suitable for systems that use multiple protocols and where attacks spanning these multiple protocols are possible. An important design consideration is that such access to information across protocols must be made efficiently. A VoIP system incorporates multiple protocols. A typical example is the use of SIP to establish a connection, followed by the use of RTP to transfer voice data. Also, RTCP and ICMP are used to monitor the health of the connection. VoIP systems typically have application-level software for billing purposes and therefore may have accounting software and a database. Using the cross-protocol methodology for detection, one can create a cross-protocol rule to look at the SIP messages, the
transaction messages between the accounting software and the database, and the RTP flows later on. Specifically, each of the following three conditions must hold [12].
1. The SIP message should follow the correct format.
2. When the accounting software sends out a transaction to denote a call from user A to user B, check whether user A has sent a SIP Call Initialization message to user B. If user A has not set up the call with a legitimate SIP Call Initialization message, then this condition will be violated.
3. Check the source/destination IP addresses of the subsequent RTP flows. Together with information from DNS and SIP Location Servers, we can reconfirm that each RTP flow has a corresponding legitimate call setup.

4.3.2 Stateful Methodology for Detection
A second abstraction useful for VoIP systems in particular is stateful detection. Stateful detection implies building up relevant state within a session and across sessions and using the state in matching for possible attacks. It is important that the state aggregation be done efficiently so that the technique is applicable in high-throughput systems, such as VoIP systems.

A VoIP system maintains a considerable amount of system state. The client side maintains state about all the active connections: when the connection was initiated, when it can be torn down, and what the properties of the connection are. The server side also maintains state relevant to billing, such as the duration of the call.

5. Encryption and IPsec
Focusing only on the security of the network, that is, protecting endpoints and other components from malicious attacks, is not enough for VoIP systems. Firewalls, IDS, NAT, and other such devices can help keep intruders from compromising a network, but firewalls are no defense against an internal hacker [4]. Another layer of defense is necessary at the protocol level to protect the data itself. In VOIP, as in data networks, this can be accomplished by encrypting the packets at the IP level using IPsec. This way, if anyone on the network, authorized or not, intercepts VOIP traffic not intended for them, these packets will be impossible to understand. The IPsec suite of security protocols and encryption algorithms is the standard method for securing packets against unauthorized viewers over data networks and will be supported by the protocol stack in IPv6. Hence, it is both logical and practical to extend IPsec to VOIP, encrypting the signal and voice packets on one end and decrypting them only when needed by their intended recipient.

However, several factors, including the expansion of packet size, ciphering latency, and a lack of QoS urgency in the cryptographic engine itself, can cause an excessive amount of latency in VOIP packet delivery. This leads to degraded voice quality, so once again there is a tradeoff between security and voice quality, and a need for speed. Fortunately, the difficulties are not insurmountable. IPsec can be incorporated into a SIP network with roughly a three-second additional delay in call setup times, an acceptable delay for many applications. This section explains the issues involved in successfully incorporating IPsec encryption into VOIP services.

5.1 IPsec
IPsec is the preferred form of VPN tunneling across the Internet. There are two basic protocols defined in IPsec: Encapsulating Security Payload (ESP) and Authentication Header (AH) (see Figure 7). Both schemes provide connectionless integrity, source authentication, and an anti-replay service. The tradeoff between ESP and AH is the increased latency in the encryption and decryption of data in ESP and a “narrower” authentication in ESP, which normally does not protect the IP header “outside” the ESP header, although Internet Key Exchange (IKE, which is responsible for key agreement using public key cryptography) can be used to negotiate the security association (SA), which includes the secret symmetric keys. In this case, the addresses in the header (transport mode) or new/outer header (tunnel mode) are indirectly protected, since only the entity that negotiated the SA can encrypt/decrypt or authenticate the packets. Both schemes insert an IPsec header (and optionally other data) into the packet for purposes such as authentication [4].

IPsec also supports two modes of delivery: Transport and Tunnel. Transport mode encrypts the payload (data) and upper layer headers in the IP packet. The IP header and the new IPsec header are left in plain sight. So if an attacker were to intercept an IPsec packet in transport mode, they could not determine what it contained, but they could tell where it was headed, allowing rudimentary traffic analysis. On a network entirely devoted to VOIP, this would equate to logging which parties were calling each other, when, and for how long. Tunnel mode encrypts the entire IP datagram and places it in a new IP packet [9]. Both the payload and the IP header are encrypted. The IPsec header and the new IP header for this encapsulating packet are the only information left in the clear.

Figure 7. IPsec Tunnel and Transport Modes [9]

IPsec supports the Triple DES encryption algorithm (168-bit) in addition to 56-bit encryption. Triple DES (3DES) is a strong form of encryption that allows sensitive information to be transmitted over untrusted networks. It enables
customers, particularly in the finance industry, to utilize network layer encryption.

Table 1: Encryption Algorithms for VoIP [9]

Encryption Algorithms for IPSec
Algorithm   Type         Key                    Strength
DES         Symmetric    56-bit                 Weak
3DES        Symmetric    168-bit                Medium
AES         Symmetric    128, 192, or 256-bit   Strong
RSA         Asymmetric   1024-bit minimum       Strong

5.2 The Role of IPSec in VOIP
The prevalence and ease of packet sniffing and other techniques for capturing packets on an IP-based network make encryption a necessity for VOIP. Security in VOIP is concerned both with protecting what a person says and with protecting to whom the person is speaking. IPSec can be used to achieve both of these goals as long as it is applied with ESP using the tunnel method. This secures the identities of both endpoints and protects the voice data from unauthorized users once packets leave the corporate intranet. The incorporation of IPSec into IPv6 will increase the availability of encryption, although there are other ways to secure this data at the application level. VOIPsec (VOIP using IPSec) helps reduce the threat of man-in-the-middle attacks, packet sniffers, and many types of voice traffic analysis. Combined with the firewall implementations, IPsec makes VOIP more secure than a standard phone line. It is important to note, however, that IPsec is not always a good fit for some applications, so some protocols will continue to rely on their own security features.

5.3 Local VPN Tunnels
Virtual Private Networks (VPNs) are “tunnels” between two endpoints that allow for data to be securely transmitted between the nodes. The IPSec ESP tunnel is a specific kind of VPN used to traverse a public domain (the Internet) in a private manner. The use and benefits of VPNs in IPSec have been great enough for some to claim “VOIP is the killer app for VPNs”. VPN tunnels within a corporate LAN or WAN are much more secure and generally faster than the IPSec VPNs across the Internet because data never traverses the public domain, but they are not scalable [4]. Also, no matter how the VPN is set up, the same types of attacks and issues associated with IPSec VPNs are applicable, so we consider here only the case of IPSec tunneling and assume the security solutions can be scaled down to an internal network if needed.

5.4 Memory and CPU Considerations
Packets that are processed by IPSec are slower than packets that are processed through classic crypto. There are several reasons for this, and they might cause significant performance problems:
1. IPSec introduces packet expansion, which is more likely to require fragmentation and the corresponding reassembly of IPSec datagrams.
2. Encrypted packets are typically also authenticated, which means that there are two cryptographic operations that are performed for every packet.
3. The authentication algorithms are slow, although work has been done to speed up operations such as the Diffie-Hellman computations [4].

5.5 Difficulties Arising from VOIPsec
IPSec has been included in IPv6. It is a reliable, robust, and widely implemented method of protecting data and authenticating the sender. However, there are several issues associated with VOIP that are not applicable to normal data traffic. Of particular interest are the Quality of Service (QoS) issues and some others like latency, jitter, and packet loss. These issues are introduced into the VOIP environment because it is a real-time media transfer, with only 150 ms to deliver each packet. In standard data transfer over TCP, if a packet is lost, it can be resent by request. In VOIP, there is no time to do this. Packets must arrive at their destination and they must arrive fast. Of course the packets must also be secure during their travels. However, the price of this security is a decisive drop in QoS caused by a number of factors.
A study by researchers focused on the effect of VOIPsec on various QoS issues and on the use of header compression as a solution to these problems. They studied several codecs, encryption algorithms, and traffic patterns to garner a broad description of these effects. Some empirical results developed by Cisco are available as well in [4].
Delay
• Processing: PCM to G.729 to packet
• Encryption: ESP encapsulation + 3DES
• Serialization: time it takes to get a packet out of the router; each “hop” generally has a fixed delay
• IPsec overhead: about 40 bytes (depending on configuration)
• IP header: 20 bytes
• UDP + RTP headers: 20 bytes
• RTP header compression: 3 bytes for IP+UDP+RTP

Effects on 8 kbps CODEC (voice data: 20 bytes)
• Clear text voice has an overhead of 3 bytes, which suggests a required bandwidth of approximately 9 kbps
• IPsec encrypted voice: overhead of 80 bytes and a required bandwidth of 40 kbps

5.6 Encryption / Decryption Latency
The cryptographic engine is a bottleneck for voice traffic transmitted over IPsec. The driving factor in the degraded performance produced by the cryptography was the scheduling algorithms in the crypto-engine itself. However, there still was significant latency due to the actual encryption and decryption.
Encryption/decryption latency is a problem for any cryptographic protocol, because much of it results from the computation time required by the underlying encryption.
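To put numbers on the load discussed in Sections 5.5 and 5.6, the byte counts quoted in Section 5.5 can be turned into a per-call packet rate and bandwidth. The following is a minimal sketch, not part of the original study: it assumes a 20 ms packetization interval (implied by 20 bytes of voice at 8 kbps), and the function and constant names are chosen here for illustration.

```python
# Per-call packet rate and bandwidth for an 8 kbps codec, using the byte
# counts quoted in Section 5.5. Names are illustrative, not from the paper.

CODEC_RATE_BPS = 8_000       # 8 kbps codec
VOICE_PAYLOAD_BYTES = 20     # voice data carried in each packet

CLEAR_TEXT_OVERHEAD = 3      # compressed IP+UDP+RTP header
IPSEC_OVERHEAD = 40 + 20 + 20  # IPsec + IP header + UDP/RTP headers = 80 bytes


def packets_per_second(codec_rate_bps=CODEC_RATE_BPS,
                       payload_bytes=VOICE_PAYLOAD_BYTES):
    """Packets needed each second to carry the codec's bit rate."""
    return codec_rate_bps / (payload_bytes * 8)


def required_bandwidth_bps(overhead_bytes,
                           codec_rate_bps=CODEC_RATE_BPS,
                           payload_bytes=VOICE_PAYLOAD_BYTES):
    """Bits per second on the wire once per-packet overhead is added."""
    pps = packets_per_second(codec_rate_bps, payload_bytes)
    return (payload_bytes + overhead_bytes) * 8 * pps


if __name__ == "__main__":
    print("packets per second:", packets_per_second())
    print("clear text bps:", required_bandwidth_bps(CLEAR_TEXT_OVERHEAD))
    print("IPsec bps:", required_bandwidth_bps(IPSEC_OVERHEAD))
```

Under these assumptions the arithmetic gives 50 packets per second, about 9.2 kbps for clear text voice and 40 kbps for IPsec-protected voice, which agrees with the approximate figures listed in Section 5.5. Each of those packets, in each direction, has to pass through the crypto-engine, which is why per-packet latency matters as much as raw throughput.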

With VOIP’s use of small packets at a fast rate and intolerance for packet loss, maximizing throughput is critical. However, this comes with a price, because although DES is the fastest of these encryption algorithms, it is also the easiest to crack [4]. Thus, designers are once again forced to walk the line between security and voice quality.

5.7 Expanded Packet Size
IPsec also increases the size of packets in VOIP, which leads to more QoS issues. The increase is actually just an increase in the header size due to the encryption and encapsulation of the old IP header and the introduction of the new IP header and encryption information. This leads to several complications when IPsec is applied to VOIP. First, the effective bandwidth is decreased by as much as 63% [4]. Thus connections to single users in low-bandwidth areas (e.g., via modem) may become infeasible. The size discrepancy can also cause latency and jitter issues as packets are delayed by decreased network throughput or bottlenecked at hub nodes on the network (such as routers or firewalls).

5.8 IPsec and NAT Incompatibility
IPsec and NAT compatibility is far from ideal. NAT traversal completely invalidates the purpose of AH because the source address of the machine behind the NAT is masked from the outside world. Thus, there is no way to authenticate the true sender of the data. The same reasoning demonstrates the inoperability of source authentication in ESP. We have defined this as an essential feature of VOIPsec, so this is a serious problem. There are several other issues that arise when ESP traffic attempts to cross a NAT. If only one of the endpoints is behind a NAT, the situation is easier. If both are behind NATs, IKE negotiation can be used for NAT traversal, with UDP encapsulation of the IPsec packets.

6. Solutions to the VOIPsec Issues

We have raised a number of significant concerns with IPsec’s role in VOIP. However, many of these technical problems are solvable. Despite the difficulty associated with these solutions, they are very important for establishing a secure implementation of VOIPsec.

6.1 Encryption at the End Points
One proposed solution to the bottlenecking at the routers due to the encryption issues is to handle encryption/decryption solely at the endpoints in the VOIP network. One consideration with this method is that the endpoints must be computationally powerful enough to handle the encryption mechanism. Though ideally encryption should be maintained at every hop in a VOIP packet’s lifetime, this may not be feasible with simple IP phones with little in the way of software or computational power. In such cases, it may be preferable for the data to be encrypted between the endpoint and the router (or vice versa), but unencrypted traffic on the LAN is slightly less damaging than unencrypted traffic across the Internet [4]. Fortunately, the increased processing power of newer phones is making endpoint encryption less of an issue.

6.2 Secure Real-time Transport Protocol (SRTP)
The Secure Real-time Transport Protocol (SRTP) is a profile of the Real-time Transport Protocol (RTP) offering not only confidentiality, but also message authentication and replay protection for the RTP traffic as well as RTCP (Real-time Transport Control Protocol).
SRTP provides a framework for encryption and message authentication of RTP and RTCP streams. SRTP can achieve high throughput and low packet expansion.
SRTP is independent of a specific RTP stack implementation and of a specific key management standard, but Multimedia Internet Keying (MIKEY) has been designed to work with SRTP.
In comparison to the security options for RTP there are some advantages to using SRTP. The advantages over the RTP standard security and also over the H.235 security for media stream data are listed below [4, 12].

SRTP provides increased security, achieved by
• Confidentiality for RTP as well as for RTCP by encryption of the respective payloads;
• Integrity for the entire RTP and RTCP packets, together with replay protection;
• The possibility to refresh the session keys periodically, which limits the amount of cipher text produced by a fixed key, available for an adversary to cryptanalyze;
• An extensible framework that permits upgrading with new cryptographic algorithms;
• A secure session key derivation with a pseudo-random function at both ends;
• The usage of salting keys to protect against pre-computation attacks;
• Security for unicast and multicast RTP applications.

SRTP has improved performance attained by
• Low computational cost assured by pre-defined algorithms;
• Low bandwidth cost and a high throughput by limited packet expansion and by a framework preserving RTP header compression efficiency;
• Small footprint, that is, a small code size and data memory for keying information and replay lists.

6.3 Key Management for SRTP – MIKEY
SRTP uses a set of negotiated parameters from which session keys for encryption, authentication and integrity protection are derived. MIKEY describes a key management scheme that addresses real-time multimedia scenarios (e.g. SIP calls and RTSP sessions, streaming, unicast, groups, multicast). The focus lies on the setup of a security association for secure multimedia sessions including key management and update, security policy data, etc., such that requirements in a heterogeneous environment are fulfilled. MIKEY also supports the negotiation of single and multiple crypto sessions.
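The idea in Section 6.3 that separate session keys are derived from a small set of negotiated parameters can be sketched as follows. This is an illustration only, under stated assumptions: SRTP specifies its own key derivation function, and HMAC-SHA-256 from the Python standard library is used here purely as a stand-in pseudo-random function. The labels, key length, and function name are hypothetical.

```python
import hashlib
import hmac

# Hypothetical one-byte labels, one per purpose. They only need to differ
# from each other so that each derived key is independent of the others.
LABELS = {
    "encryption": b"\x00",
    "authentication": b"\x01",
    "salting": b"\x02",
}


def derive_session_keys(master_key: bytes, master_salt: bytes,
                        key_len: int = 16) -> dict:
    """Derive one session key per purpose from a master key and salt.

    Stand-in PRF: HMAC-SHA-256 keyed with the master key, applied to the
    salt followed by the purpose label, truncated to key_len bytes.
    """
    keys = {}
    for purpose, label in LABELS.items():
        digest = hmac.new(master_key, master_salt + label,
                          hashlib.sha256).digest()
        keys[purpose] = digest[:key_len]
    return keys


if __name__ == "__main__":
    session = derive_session_keys(b"\x11" * 16, b"\x22" * 14)
    for purpose, key in session.items():
        print(purpose, key.hex())
```

Both endpoints that hold the same master key and salt compute the same three session keys without sending them over the network, and refreshing the master material (for example through a new MIKEY exchange) yields a fresh set.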

MIKEY has some important properties [4]:
• MIKEY can be implemented as an independent software library to be easily integrated in a multimedia communication protocol. It offers independence from a specific communication protocol (SIP, H.323, etc.)
• Establishment of key material within a 2-way handshake, therefore best suited for real-time multimedia scenarios
There are four options for Key Distribution:
• Pre-shared key
• Public-key encryption
• Diffie-Hellman key exchange protected by public-key encryption
• Diffie-Hellman key exchange protected with pre-shared key and keyed hash functions (using a MIKEY extension (DHHMAC))

6.4 Better Scheduling Schemes
Without a way for the crypto-engine to prioritize packets, the engine will still be susceptible to DoS attacks and starvation from data traffic impeding the time-urgent VOIP traffic. A few large packets can clog the queue long enough to make the VOIP packets over 150 ms late (sometimes called head-of-line blocking), effectively destroying the call. Ideally, the crypto-engine would implement QoS scheduling to favor the voice packets, but this is not a realistic scenario due to speed and compactness constraints on the crypto-engine. One solution implemented in the latest routers is to schedule the packets with QoS in mind prior to the encryption phase.
It is not surprising that for voice traffic the crypto-engine can be a serious bottleneck. Rather than the expected constraints on the crypto-engine throughput, the critical factor turned out to be the impossibility of controlling and scheduling access to the crypto-engine so as to favor real-time traffic over regular traffic. This applies regardless of whether the scheduler is implemented as a software module or a hardware component. Therefore, if voice traffic is interleaved with other types of traffic, e.g., ftp or http traffic, during a secure session, it may happen that the latter (usually characterized by big packets) is scheduled in the crypto-engine before voice traffic. In this case voice traffic might be delayed to the point that packets are discarded most of the time.

6.5 Compression of Packet Size
Compression of packet size in turn results in considerably less jitter and latency and in better crypto-engine performance. There is, of course, a price for these speedups. The compression scheme puts more strain on the CPU and memory capabilities of the endpoints in order to achieve the compression, and, of course, both ends of a connection must use the same compression algorithm.
An efficient solution for packet header compression of VoIPsec traffic is known as cIPsec. Simulation results from different sources show that the proposed compression scheme significantly reduces the overhead of packet headers, thus increasing the effective bandwidth used by the transmission. In particular, when cIPsec is adopted, the average packet size is only 2% bigger than in the plain case, rather than 50% bigger as with plain VoIPsec packets, which makes VoIPsec and VoIP equivalent from the bandwidth usage point of view. However, packet loss does have an exacerbated detrimental effect on packets compressed under the cIPsec scheme. When packets are lost, they cannot be re-sent and the endpoints need to resynchronize. However, the time saved in the crypto-engine and the security provided may be well worth the price of this approach [5].

6.6 Resolving NAT/IPSec Incompatibilities
The most likely widespread solution to the problem of NAT traversal is UDP encapsulation of IPSec. This implementation is supported by the IETF and effectively allows all ESP traffic to traverse the NAT. In tunnel mode, this model wraps the encrypted IPSec packet in a UDP packet with a new IP header and a new UDP header, usually using port 500. This port was chosen because it is currently used by IKE peers to communicate, so overloading the port does not require any new holes to be punched in the firewall. The SPI field within the UDP-encapsulated packet is set to zero to differentiate it from an actual IKE communication. This solution allows IPsec packets to traverse standard NATs in both directions. The adoption of this standard method should allow VOIPsec traffic to traverse NATs cleanly, although some extra overhead is added in the encapsulation/decapsulation process. IKE negotiation will also be required to allow for NAT traversal. The problem still remains that IP-based authentication of the packets cannot be assured across the NAT (although fully qualified domain names could be used), but the use of a shared secret (symmetric key) negotiated through IKE could provide authentication. It is important to note that IP-based authentication is weak compared with methods using cryptographic protocols.
There are several other solutions to the IPsec/NAT incompatibility problem, including Realm-Specific IP (RSIP), IPv6 Tunnel Broker, IP Next Layer (IPNL), and UDP encapsulation. RSIP is designed as a replacement for NAT and provides a clear tunnel between hosts and the RSIP Gateway. RSIP supports both AH and ESP, but implementing RSIP would require a significant overhaul of the current LAN architecture, so while it is quite an elegant solution, it is currently infeasible. Perhaps as a result of these problems, RSIP is not widely used. The IPv6 tunnel broker method uses an IPv6 tunnel as an IPSec tunnel, and encapsulates an IPv6 packet in an IPv4 packet. But this solution also requires LAN upgrades and doesn’t work in situations where multiple NATs are used. IPNL introduces a new layer into the network protocols between IP and TCP/UDP to solve the problem, but IPNL is in competition with IPv6 and IPv6 is a much more widely used standard [4].
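The UDP encapsulation described in Section 6.6 can be sketched in a few lines. This is a hedged illustration, not the implementation discussed in [4]: it assumes the layout later standardized in RFC 3948, which runs on UDP port 4500 rather than the port 500 mentioned in Section 6.6, and in which four zero bytes in the position of the ESP SPI mark a datagram as IKE rather than ESP. The function names are hypothetical, the ESP payload is treated as an opaque byte string, and the UDP checksum is left at zero.

```python
import struct

NAT_T_PORT = 4500                      # assumption: port used by RFC 3948
NON_ESP_MARKER = b"\x00\x00\x00\x00"   # zero "SPI" marks an IKE message
UDP_HEADER_LEN = 8


def udp_encapsulate(payload: bytes, src_port: int = NAT_T_PORT,
                    dst_port: int = NAT_T_PORT) -> bytes:
    """Prepend a UDP header (checksum 0 = not computed) to the payload."""
    length = UDP_HEADER_LEN + len(payload)
    return struct.pack("!HHHH", src_port, dst_port, length, 0) + payload


def encapsulate_esp(spi: int, seq: int, encrypted: bytes) -> bytes:
    """Wrap an ESP packet (SPI, sequence number, ciphertext) in UDP."""
    if spi == 0:
        raise ValueError("SPI 0 would be mistaken for the IKE marker")
    return udp_encapsulate(struct.pack("!II", spi, seq) + encrypted)


def encapsulate_ike(ike_message: bytes) -> bytes:
    """Wrap an IKE message in UDP, prefixed with the zero marker."""
    return udp_encapsulate(NON_ESP_MARKER + ike_message)


def classify(datagram: bytes) -> str:
    """Tell IKE, ESP and one-byte keepalive datagrams apart on receipt."""
    payload = datagram[UDP_HEADER_LEN:]
    if payload == b"\xff":
        return "keepalive"
    if payload[:4] == NON_ESP_MARKER:
        return "ike"
    return "esp"


if __name__ == "__main__":
    esp = encapsulate_esp(spi=0x1234, seq=1, encrypted=b"\xaa" * 32)
    print(classify(esp), len(esp))
    print(classify(encapsulate_ike(b"ike-payload")))
```

Because the NAT only sees an ordinary UDP datagram, it can rewrite addresses and ports without touching the protected ESP contents. The cost is the extra eight-byte UDP header on every voice packet, which adds to the overhead discussed in Section 5.7.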

7. Security Threats in VoIP

In the early days of VoIP, there was no big concern about security issues related to its use. People were mostly concerned with its cost, functionality and reliability. Now that VoIP is gaining wide acceptance and becoming one of the mainstream communication technologies, security has become a major issue.
The security threats cause even more concern when we consider that VoIP is in fact replacing the oldest and most secure communication system the world has ever known, POTS (Plain Old Telephone System). Let us have a look at the threats VoIP users face.

7.1 Identity and Service Theft
Service theft can be exemplified by phreaking, which is a type of hacking that steals service from a service provider, or uses the service while passing the cost to another person. Encryption is not very common in SIP, which controls authentication over VoIP calls, so user credentials are vulnerable to theft.

Eavesdropping is how most hackers steal credentials and other information. Through eavesdropping, a third party can obtain names, passwords and phone numbers, allowing them to gain control over voicemail, calling plan, call forwarding and billing information. This subsequently leads to service theft.
Stealing credentials to make calls without paying is not the only reason behind identity theft. Many people do it to get important information like business data.
A phreaker can change calling plans and packages, add more credit, or make calls using the victim’s account. He can of course also access confidential elements like voice mail and change personal settings such as a call forwarding number [10].

7.2 Vishing
Vishing is another word for VoIP phishing, which involves a party calling you while posing as a trustworthy organization (e.g. your bank) and requesting confidential and often critical information.

7.3 Viruses and Malware
VoIP deployments involving soft-phones and software are vulnerable to worms, viruses and malware, just like any Internet application. Since these soft-phone applications run on user systems like PCs and PDAs, they are exposed and vulnerable to malicious code attacks in voice applications.

7.4 DoS (Denial of Service)

A DoS attack is an attack on a network or device that denies it a service or connectivity. It can be done by consuming its bandwidth or overloading the network or the device’s internal resources.
In VoIP, DoS attacks can be carried out by flooding a target with unnecessary SIP call-signaling messages, thereby degrading the service. This causes calls to drop prematurely and halts call processing.
Why would someone launch a DoS attack? Once the target is denied service and ceases operating, the attacker can get remote control of the administrative facilities of the system [10, 12].

7.5 Spamming over Internet Telephony (SPIT)
If you use email regularly, then you must know what spamming is. Put simply, spamming is sending emails to people against their will. These emails consist mainly of online sales calls. Spamming in VoIP is not very common yet, but is starting to be, especially with the emergence of VoIP as an industrial tool.
Every VoIP account has an associated IP address. It is easy for spammers to send their messages (voicemails) to thousands of IP addresses. Voice mailing will suffer as a result. With spamming, voicemails will be clogged, and more space as well as better voicemail management tools will be required. Moreover, spam messages can carry viruses and spyware along with them [10].

7.6 Call Tampering
Call tampering is an attack which involves tampering with a phone call in progress. For example, the attacker can simply spoil the quality of the call by injecting noise packets into the communication stream. He can also withhold the delivery of packets so that the communication becomes spotty and the participants encounter long periods of silence during the call.

7.7 Man-in-the-Middle Attacks
VoIP is particularly vulnerable to man-in-the-middle attacks, in which the attacker intercepts call-signaling SIP message traffic and masquerades as the calling party to the called party, or vice versa. Once the attacker has gained this position, he can hijack calls via a redirection server [10].

8. Conclusion
This paper provides an overview of VoIP and of the issues related to VoIP security. There are numerous challenges to the secure implementation, deployment and use of VoIP on the same network that is used to carry other network data. These challenges are related to service availability, Quality of Service, firewalls, intrusion detection, IPSec, VoIP threats, and privacy. The next step for the ongoing research project is to identify specific areas that will be addressed in the near-term research efforts.

References
[1] Rinzam A. VoIP: Voice Over Internet Protocol. Department of Computer Science and Engineering, M G College of Engineering.
[2] H. Sengar, D. Wijesekera. Center for Secure Information Systems, George Mason University, Fairfax, USA. H. Wang, S. Jajodia. Department of CS, College of William and Mary, Williamsburg, USA. “IDS: VoIP
Intrusion Detection through Interacting Protocol State Machines”.
[3] M. Marjalaakso. Security Requirements and Constraints of VoIP. Helsinki University of Technology, Department of Electrical Engineering and Telecommunications.
[4] D. Richard Kuhn, Thomas J. Walsh, and Steffen Fries, “Security Considerations for Voice Over IP Systems: Recommendations of the National Institute of Standards and Technology” Special Publication 800-58, Sections 8 and 9, January 2005.
[5] R. Barbieri, D. Bruschi, E. Rosti. Voice over IPsec: Analysis and Solutions. Department of Science, University of Milano, Italy.
[6] G. Egeland, “Introduction to IPsec in IPv6”.
http://www.eurescom.de/~publicwebdeliverables/P1100
series/P1113/D1/pdfs/pir1/41_IPsec_intro.pdf
[7] Check Point Software Technologies Ltd. “Check Point Solution for Secure VoIP”. 2003 - 2007.
[8] NAT: Network Address Translation and Voice Over Internet Protocol. http://www.voip-info.org/wiki/view/NAT+and+VOIP
[9] Jeremy Stretch. IPSEC: Internet Protocol Security. http://www.packetlife.net
[10] M. Hurley. “VoIP VULNERABILITIES”. Centre for Critical Infrastructure Protection (CCIP). Information Note, Issue 06, Wellington, New Zealand, January 2007.
[11] Mark D. Collier. Firewall Requirements for Securing VoIP, SecureLogix Corporation. www.securelogix.com
[12] Yu-Sung Wu, S. Bagchi. School of Electrical & Computer Eng, Purdue University, S. Garg, N. Singh, Tim Tsai, Avaya Labs. “A Stateful and Cross Protocol Intrusion Detection Architecture for VoIP Environments”.

Authors Profile

Ali Tahir is an MS Scholar at University of Engineering and Technology, Taxila, Pakistan. He did his B.Sc Software Engineering from UET TAXILA in 2005. He has worked in Nokia Siemens Network for two years as an implementation engineer. Currently he is working in cryptography and network security center at a Government organization. His areas of interest are digital image processing, computer vision, network security and wireless communication.

Asim Shahzad is pursuing his PhD from University of Engineering and Technology Taxila, Pakistan. He has also completed his MS in Computer Engineering from UET TAXILA. He has done MS in Telecommunication Engineering from institute of communication technologies, Islamabad, Pakistan. He is working as Assistant Professor at UET TAXILA in Telecommunication Engineering Department. His areas of interest are optical communication, Data and network Security.
