
VoIP: How It Works

VoIP is voice communication that uses the Internet Protocol (IP) for transport. Traditional phone networks, i.e.
Public Switched Telephone Networks (PSTN), use circuit switching: resources are reserved along the entire
communication channel for the duration of the call. IP, however, uses packet switching: information is digitized and
divided into one or more packets, each of which carries its destination address and may reach that destination by a
different path. Because IP is agnostic to the physical medium, VoIP can run as an application over wired or wireless
networks. A wired network could be a public switched telephone network (PSTN), cable, digital subscriber line (DSL)
or Ethernet; a wireless network could be a wireless carrier’s network, such as a code division multiple access (CDMA),
time division multiple access (TDMA) or GSM network, or a private network such as Wi-Fi, Bluetooth or WiMAX.
The simplest VoIP implementation uses IP-capable end-user equipment, such as IP phones or computers, connected to a
LAN; it does not rely on a standard telephone switch, and voice calls can be made locally over the LAN. The IP phones
include codecs that digitize, encode and decode speech; they also packetize the encoded speech into IP packets and
depacketize it on receipt. Calls between different sites can be made over the wide area IP network. Proxy servers
perform IP phone registration and coordinate call signaling, especially between sites. Connections to the PSTN can
be made through VoIP gateways. Below is a diagram of an end-to-end VoIP call.
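To make the packetization step concrete, here is a small worked calculation. It assumes, purely as an illustration, the G.711 codec (64 kbit/s), 20 ms of speech per packet, IPv4, RTP over UDP, and no link-layer framing; none of these values are fixed by the text above.

```python
# Worked example of voice packetization overhead.
# Assumptions (illustrative only): G.711 at 64 kbit/s, 20 ms per packet,
# IPv4 header 20 bytes, UDP 8 bytes, RTP 12 bytes, link layer ignored.

CODEC_BITRATE_BPS = 64_000   # G.711 payload bit rate
FRAME_MS = 20                # milliseconds of speech carried in each packet
HEADER_BYTES = {"IPv4": 20, "UDP": 8, "RTP": 12}


def packetization(bitrate_bps: int, frame_ms: int) -> dict:
    """Return payload size, packet size, packet rate and IP-layer bit rate."""
    payload_bytes = bitrate_bps * frame_ms // 1000 // 8   # bits per frame -> bytes
    packet_bytes = payload_bytes + sum(HEADER_BYTES.values())
    packets_per_second = 1000 // frame_ms
    ip_bitrate_bps = packet_bytes * 8 * packets_per_second
    return {
        "payload_bytes": payload_bytes,
        "packet_bytes": packet_bytes,
        "packets_per_second": packets_per_second,
        "ip_bitrate_bps": ip_bitrate_bps,
    }


if __name__ == "__main__":
    print(packetization(CODEC_BITRATE_BPS, FRAME_MS))
    # {'payload_bytes': 160, 'packet_bytes': 200, 'packets_per_second': 50, 'ip_bitrate_bps': 80000}
```

Under these assumptions a 64 kbit/s voice stream needs about 80 kbit/s at the IP layer in each direction, because 40 bytes of headers ride on every 160-byte payload.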

VoIP Call Components


VoIP call components can be simplified into four major components: the Signaling Gateway Controller (also known
as the Call Agent or the Media Gateway Controller), the Media Gateway, the Media Server and the Application
Server.
i. The Signaling Gateway Controller (SGC)
The SGC is at the heart of the VoIP platform. Its role is to connect the PSTN (public switched
telephone network) world with the IP network. It provides:
- Support of the Signaling System 7 (SS7) protocol stack, which is the PSTN world's main
signaling protocol suite.
- Support of voice call control protocols such as H.323 or SIP.
- Full support of media control protocols such as MGCP or Megaco (H.248), which are used
for controlling Media Gateway session connections and parameters.
- Generation of Call Detail Records (CDRs) for billing purposes.
- Bandwidth management through admission control: a new session is admitted only if the
system has enough bandwidth to give it acceptable service.
- Support of bandwidth policing mechanisms: using media flow profiles, the Signaling Gateway
Controller instructs the Media Gateway to monitor the RTP media flow and apply rate-limit
policies to aggressive flows. This mechanism also preserves appropriate Quality of Service
levels.
- Provisioning of media connections, i.e. allocating media connection characteristics such
as coding and packetization to Media Gateways, as well as allocating specific DS0 channels
to reserve media resources.
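The admission control rule above (admit a session only if the bandwidth it needs is still available) can be sketched as follows. This is a minimal, hypothetical model: the `AdmissionController` class, its method names, and the single fixed capacity figure are illustrative assumptions, not the interface of any real SGC.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AdmissionController:
    """Hypothetical bandwidth-based call admission control (CAC).

    A new session is admitted only if the bandwidth it needs still fits
    within the capacity reserved for voice (an assumed single number).
    """
    capacity_kbps: int
    sessions: Dict[str, int] = field(default_factory=dict)  # session id -> kbps in use

    def in_use_kbps(self) -> int:
        return sum(self.sessions.values())

    def admit(self, session_id: str, required_kbps: int) -> bool:
        # Reject the call if admitting it would exceed the capacity.
        if self.in_use_kbps() + required_kbps > self.capacity_kbps:
            return False
        self.sessions[session_id] = required_kbps
        return True

    def release(self, session_id: str) -> None:
        # Free the bandwidth when the call ends.
        self.sessions.pop(session_id, None)


if __name__ == "__main__":
    cac = AdmissionController(capacity_kbps=200)
    print(cac.admit("call-1", 80))  # True
    print(cac.admit("call-2", 80))  # True
    print(cac.admit("call-3", 80))  # False: 240 kbps would exceed 200
    cac.release("call-1")
    print(cac.admit("call-3", 80))  # True
```

Rejecting the third call protects the two calls already in progress, which is the point of admission control: it is better to refuse a new call than to degrade every active one.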
ii. The Media Gateway
Its main role is to transmit voice packets using the Real-time Transport Protocol (RTP). Its
functions include:
- Support of MGCP or Megaco for call control under the administration of the Media
Gateway Controller.
- Transmission of voice data using RTP, including packetization of voice that arrives on
TDM trunks interfacing with the Media Gateway.
- Support of T1/E1 trunks for carrying voice to and from the SS7-signaled PSTN.
- Support of different compression algorithms (codecs) to meet the requirements of the call
as instructed by the SGC.
- Management of Digital Signal Processing (DSP) resources for optimal service delivery.
iii. The Media Server
It is used where added features such as voicemail or video conferencing are needed. Its
functions include:
- Transmission of call progress tones and special service announcements.
- Voicemail functionality.
- Voice-activated dialing.
- Voicemail-to-email transmission: voicemail can be sent as an attachment to an email
address.
- Support for Interactive Voice Response (IVR): call routing or even service activation
can be performed based on dialed Dual-Tone Multi-Frequency (DTMF) digits, where the caller
follows voice menus and presses the DTMF digit that triggers the required
service.
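Each DTMF digit is sent as a pair of tones, one from a low (row) group and one from a high (column) group of the standard keypad. The sketch below looks up that pair and then routes the digit through an IVR menu. The `IVR_MENU` contents and service names are hypothetical examples, not part of any real Media Server.

```python
from typing import Tuple

# Standard DTMF keypad: each key = one row (low) tone + one column (high) tone, in Hz.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")

# Hypothetical IVR menu: the digit pressed selects a service.
IVR_MENU = {"1": "voicemail", "2": "call-forwarding-setup", "0": "operator"}


def dtmf_frequencies(key: str) -> Tuple[int, int]:
    """Return the (low, high) tone pair in Hz for a single DTMF key."""
    if len(key) != 1:
        raise ValueError(f"expected one key, got {key!r}")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col != -1:
            return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def route(key: str) -> str:
    """Map a detected digit to a service; unknown digits replay the menu."""
    return IVR_MENU.get(key, "replay-menu")


if __name__ == "__main__":
    print(dtmf_frequencies("5"))  # (770, 1336)
    print(route("1"))             # voicemail
```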
iv. The Application Server
The major role of an Application Server is to provide value-added services to the IP network;
this is where global and customer-specific services are provisioned. It communicates with the
Signaling Gateway Controller through protocols such as H.323 or SIP. Its functions include:
- Support of customized private dialing plans.
- Basic service offering: basic services such as call forward always, call forward on busy, call
waiting, call transfer, call park and voicemail are offered through the Application Server.
- Advanced service offering: advanced features such as call authorization using a PIN and
remote office can be offered by this component.
- Generation of Call Detail Records (CDRs).
- FreePhone service.

How VoIP protocols work to provide service


VoIP operates over IP using different protocols to perform specific functions. These functions can be
categorized into VoIP Signaling, Media Gateway Control and Media. Below is a diagram showing the protocols
used for each function.

1. VoIP Signaling
VoIP signaling is the process of setting up, managing and tearing down VoIP calls. Session control
protocols are responsible for the establishment, maintenance and teardown of call sessions. They include
the H.323 and SIP protocol suites.
a. H.323 Protocol Suite
H.323 is a communication protocol from the ITU-T. It is a call control protocol that allows for the
establishment, maintenance, and teardown of multimedia sessions across H.323 endpoints. H.323 is
a suite of specifications that controls the transmission of voice, video, and data over IP networks. It
uses other protocols in its operation, including:
i. H.225: H.225 handles call setup and teardown between H.323 endpoints and is also
responsible for peering with H.323 gatekeepers via the Registration, Admission, and Status (RAS)
protocol.
ii. H.245: H.245 acts as a peer protocol to H.225 and is used to negotiate the characteristics
of the media session, such as media format, the method of DTMF relay, the media type
(audio, video, fax, and so on), and the IP address/port pair for media.
iii. H.450: H.450 controls supplementary services between H.323 entities. These supplementary
services include call hold, call transfer, call park, and call pickup.
H.323 has the following devices:
- H.323 gateways: H.323 gateways are endpoints that are capable of interworking between a
packet network and a traditional Plain Old Telephone Service (POTS) network (analog or
digital). Since these H.323 endpoints can implement their own call routing logic, they are
considered to be “intelligent” and, as such, operate in a peer-to-peer mode. H.323 gateways
are capable of registering to a gatekeeper and interworking calls with a gatekeeper by using
the RAS protocol.
- H.323 gatekeepers: H.323 gatekeepers function as devices that provide lookup services.
They indicate via signaling to which endpoint or endpoints a particular called number
belongs. Gatekeepers also provide functionality such as Call Admission Control and security.
Endpoints register to the gatekeeper by using the RAS protocol.
- H.323 terminals: Any H.323 device that is capable of setting up a two-way, real-time media
session is an H.323 terminal. H.323 terminals include voice gateways, H.323 trunks, video
conferencing stations, and IP phones. H.323 terminals use H.225 for session setup, progress,
and teardown. They also use H.245 to define characteristics of the media session such as the
media format, the method of DTMF relay, and the media type.
- Multipoint control units: These H.323 devices handle multiparty conferences, and each
device is composed of a multipoint controller (MC) and multipoint processor (MP). The MC
is responsible for H.245 exchanges, and the MP is responsible for the switching and
manipulation of media.
H.323 Call
The illustration above shows a basic H.323 slow start call between two H.323 terminals. The
calling terminal first initiates a TCP connection to the called terminal, using destination port 1720.
Once this connection is established, H.225 messages are exchanged between the two terminals to set
up the call. To negotiate parameters that define call characteristics, such as the media types
(for example, audio, video, fax), media formats, and DTMF types, an H.245 exchange must then take place
between the terminals.
Normally a separate TCP connection is established between the endpoints for the H.245
exchange. In some cases, however, as an optimization, H.245 messages are tunneled over the same
TCP connection as H.225, a procedure known as H.245 tunneling. When a separate TCP
connection is used for H.245, the called terminal advertises the TCP port number on which it intends to
carry out the H.245 exchange. The ports used for H.245 are ephemeral and are
not dictated by the H.323 specification.
The H.245 exchange results in the establishment of the media channels required to transmit
and receive real-time information.
Another kind of H.323 call setup, known as Fast Connect (or Fast Start), is a quicker and more efficient
mechanism: the call can be set up in as few as two messages. The Fast Connect setup is illustrated below.

When transmitting a Call Setup message, the calling endpoint populates a field, known as the
fastStart element, with H.245 messages. The called endpoint can accept Fast Connect by selecting
one of the fastStart elements in the Setup message, populating the necessary data fields (as specified in
H.323), and returning a fastStart element to the caller in any H.225 message up to and including
Connect, i.e. Call Proceeding, Alerting or Connect.
The called endpoint can also reject Fast Connect and fall back to the traditional slow start call
process, either by explicitly indicating so using a flag, by initiating any H.245 communications, or by
providing an H.245 address for the purposes of initiating H.245 communications.
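The called endpoint's accept-or-fall-back decision can be modeled in a few lines. This is a toy sketch under heavy simplification: real fastStart elements are encoded H.245 OpenLogicalChannel structures, whereas here each proposal is an assumed `(media type, codec)` tuple, and the called side simply takes the first proposal it supports. `choose_fast_start` and `answer_setup` are hypothetical names.

```python
from typing import List, Optional, Tuple

# Simplified stand-in for one fastStart proposal: (media type, codec).
Proposal = Tuple[str, str]


def choose_fast_start(offered: List[Proposal], supported: List[str]) -> Optional[Proposal]:
    """Return the first offered proposal whose codec the called side supports.

    None means Fast Connect is declined and the call falls back to slow start.
    """
    for media, codec in offered:
        if codec in supported:
            return (media, codec)
    return None


def answer_setup(offered: List[Proposal], supported: List[str]) -> dict:
    """Decide how the called endpoint answers an incoming Setup message."""
    choice = choose_fast_start(offered, supported)
    if choice is None:
        # Slow start: media is negotiated later in a separate H.245 exchange.
        return {"mode": "slow-start", "fastStart": None}
    # Fast Connect: the accepted element is returned in Call Proceeding,
    # Alerting or Connect, so media can flow without a separate H.245 exchange.
    return {"mode": "fast-connect", "fastStart": choice}


if __name__ == "__main__":
    offer = [("audio", "G.729"), ("audio", "G.711")]
    print(answer_setup(offer, ["G.711"]))  # fast-connect with ('audio', 'G.711')
    print(answer_setup(offer, ["G.722"]))  # slow-start
```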
b. Session Initiation Protocol (SIP)
SIP is a signaling protocol used to create, manage and terminate sessions in an IP-based
network. A session could be a simple two-way telephone call or a collaborative multimedia
conference session. SIP is an application layer protocol that incorporates many elements of
the Hypertext Transfer Protocol (HTTP) and the Simple Mail Transfer Protocol (SMTP). It is
standardized by the Internet Engineering Task Force (IETF) in RFC 3261 and was originally designed
by Mark Handley, Henning Schulzrinne, Eve Schooler, and Jonathan Rosenberg in 1996. Beyond
basic call setup, SIP provides the following features:
- SIP supports user location, i.e. translating a user's name to their current network address.
- SIP provides feature negotiation, so that all of the participants in a session can agree on
the features to be supported among them.
- SIP provides call management, for example adding, dropping, or transferring
participants.
- SIP allows the features of a session to be changed while it is in progress.
SIP Components
1. User Agent Client (UAC): the caller application that initiates and sends SIP requests.
2. User Agent Server (UAS): receives and responds to SIP requests; it accepts, redirects, or refuses
calls.
3. Proxy Server: when a request is generated, the exact address of the recipient is often not known in
advance, so the client sends the request to a proxy server. Acting on behalf of the client, the proxy
forwards the request to another proxy server or to the recipient itself. A proxy therefore acts as both a
server (toward the sender) and a client (toward the next hop).
4. Redirect Server: a redirect server answers the client with a 3xx response giving an alternative
address, indicating that the client needs to try a different route to reach the recipient. This generally
happens when a recipient has moved from its original location, either temporarily or permanently.
5. Registrar: servers need to know the location of a user in the network, so users register their
locations with a Registrar server. Users periodically refresh their registrations by sending a REGISTER
request to the Registrar.
6. Location Server: the addresses registered with a Registrar are stored in a Location Server.
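The Registrar and Location Server roles can be sketched as a small table of bindings with expiry times. This is a minimal, hypothetical model: the `LocationService` class is illustrative, it keeps only one contact per address-of-record (real SIP allows several), and the `now` parameter exists only so the example is deterministic. It follows the RFC 3261 convention that a registration with an expiry of 0 removes the binding.

```python
import time
from typing import Dict, Optional, Tuple


class LocationService:
    """Hypothetical in-memory location service fed by a SIP Registrar.

    Maps an address-of-record (e.g. sip:alice@example.com) to the contact
    address the user last registered and the time that binding expires.
    """

    def __init__(self) -> None:
        self._bindings: Dict[str, Tuple[str, float]] = {}

    def register(self, aor: str, contact: str, expires_s: int,
                 now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        if expires_s == 0:
            # An expiry of 0 de-registers the user.
            self._bindings.pop(aor, None)
        else:
            self._bindings[aor] = (contact, now + expires_s)

    def lookup(self, aor: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        binding = self._bindings.get(aor)
        if binding is None or binding[1] <= now:
            return None  # unknown user, or the registration was not refreshed in time
        return binding[0]


if __name__ == "__main__":
    loc = LocationService()
    loc.register("sip:alice@example.com", "sip:alice@192.0.2.10:5060", 3600, now=0)
    print(loc.lookup("sip:alice@example.com", now=10))    # sip:alice@192.0.2.10:5060
    print(loc.lookup("sip:alice@example.com", now=3600))  # None (expired)
```

The expiry is why users must periodically re-register: a binding that is not refreshed silently disappears, so calls are not routed to a device that has gone away.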
SIP Commands
1. INVITE: invites a user to a call.
2. ACK: acknowledges a final response to an INVITE, making the INVITE exchange reliable.
3. BYE: terminates a session between users.
4. CANCEL: terminates a pending request, or search, for a user. It is used if a client sends an INVITE
and then changes its decision to call the recipient.
5. OPTIONS: solicits information about a server's capabilities.
6. REGISTER: registers a user's current location.
7. INFO: used for mid-session signaling.
SIP Operation
SIP works on a request/response framework and follows a model similar to HTTP, with a client/server
exchange. A node that generates the request is called a user agent client (UAC), and a node that
processes the request and sends out at least one response is called a user agent server (UAS). A SIP
transaction consists of a single request and all responses to that request, which may include zero or more
provisional responses (1XX) and one or more final responses (2XX, 3XX, 4XX, 5XX, 6XX). A SIP dialog is
a peer-to-peer relationship between user agents that persists for some time.
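The split between provisional and final responses in a transaction can be expressed directly from the status-code ranges given above. The helper names below are hypothetical; the code only encodes the 1XX versus 2XX to 6XX rule.

```python
from typing import List


def response_class(status: int) -> str:
    """Classify a SIP status code: 1XX is provisional, 2XX-6XX is final."""
    if 100 <= status <= 199:
        return "provisional"
    if 200 <= status <= 699:
        return "final"
    raise ValueError(f"invalid SIP status code: {status}")


def summarize_transaction(method: str, statuses: List[int]) -> dict:
    """Group the responses received for one request into provisional and final."""
    provisional = [s for s in statuses if response_class(s) == "provisional"]
    final = [s for s in statuses if response_class(s) == "final"]
    # A transaction is only complete once at least one final response arrives.
    return {"request": method, "provisional": provisional,
            "final": final, "complete": len(final) >= 1}


if __name__ == "__main__":
    print(summarize_transaction("INVITE", [100, 180, 200]))
    # {'request': 'INVITE', 'provisional': [100, 180], 'final': [200], 'complete': True}
```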
How SIP Calls Work
Step 1. Phone A initiates a communication session with Phone B by sending a SIP INVITE message
to Communications Manager (Unified CM), which functions both as the registrar server for the phones
and as their UAS for all outbound requests.

Step 2. The SIP INVITE carries several header fields. It can be sent in one of two ways:
- With a Session Description Protocol (SDP) body, in which case the call is classified as an
“early offer” call.
- Without an SDP body, in which case the call is classified as a “delayed offer” call.
SDP is used to encode the characteristics of a media session, such as the types of media stream
supported (for example, audio or video), the IP addresses and port numbers for the media streams, and
the set of supported codecs for each media stream type.
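A minimal SDP audio offer, of the kind an early-offer INVITE would carry, can be assembled as shown below. The addresses, port, and session name are placeholder assumptions (192.0.2.0/24 is a documentation range); payload types 0 (PCMU) and 8 (PCMA) are the standard static RTP assignments for G.711. `build_sdp_offer` is a hypothetical helper, not a library function.

```python
from typing import Dict


def build_sdp_offer(session_ip: str, audio_port: int,
                    payload_types: Dict[int, str],
                    session_id: int = 1, username: str = "-") -> str:
    """Build a minimal SDP audio offer listing the codecs we can receive."""
    lines = [
        "v=0",                                              # protocol version
        f"o={username} {session_id} 1 IN IP4 {session_ip}",  # origin
        "s=VoIP call",                                      # session name (placeholder)
        f"c=IN IP4 {session_ip}",                           # where to send media
        "t=0 0",                                            # unbounded session time
        # One audio stream: port, transport profile, and offered payload types.
        f"m=audio {audio_port} RTP/AVP " + " ".join(str(pt) for pt in payload_types),
    ]
    for pt, encoding in payload_types.items():
        lines.append(f"a=rtpmap:{pt} {encoding}")           # codec name / clock rate
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_sdp_offer("192.0.2.10", 49170, {0: "PCMU/8000", 8: "PCMA/8000"}))
```

The `m=` line is where the offerer lists its codecs in order of preference; the answerer picks from that list, which is how the codec choice in the later 200 OK comes about.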

Sending an early offer INVITE allows the UAC to enforce characteristics of the session up front by
including its supported media stream types, the relevant media formats per media stream, and any
SDP-based extensions. With delayed offer INVITE messages, the UAC has to tailor its session
characteristics in accordance with the SDP body advertised by the UAS.
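The only difference between an early-offer and a delayed-offer INVITE is whether an SDP body is attached. The sketch below builds a bare-bones INVITE either way. The addresses, the use of UDP in the Via header, and the `build_invite` and `offer_type` helpers are illustrative assumptions; a production user agent adds more headers and manages tags, branches, and CSeq numbers across the whole dialog.

```python
import uuid
from typing import Optional


def build_invite(caller: str, callee: str, local_host: str,
                 sdp: Optional[str] = None) -> str:
    """Build a minimal SIP INVITE: with an SDP body it is an early offer,
    without one it is a delayed offer."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 branch magic cookie prefix
    body = sdp or ""
    lines = [
        f"INVITE {callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{caller}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <{callee}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_host}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{local_host}>",
    ]
    if body:
        lines.append("Content-Type: application/sdp")
    lines.append(f"Content-Length: {len(body.encode())}")
    # A blank line separates the headers from the (possibly empty) body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


def offer_type(invite: str) -> str:
    """Report whether an INVITE is an early or a delayed offer."""
    _, _, body = invite.partition("\r\n\r\n")
    return "early offer" if body else "delayed offer"


if __name__ == "__main__":
    delayed = build_invite("sip:alice@example.com", "sip:bob@example.com", "192.0.2.10")
    print(offer_type(delayed))  # delayed offer
```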

Step 3. On receiving the SIP INVITE message, Unified CM sends a 100 Trying response to Phone A.
The 100 Trying response informs Phone A that the INVITE has been received and that processing is
underway.
After sending the 100 Trying response, Unified CM examines the Request-URI in the received INVITE
message and performs a database lookup to determine the location information (IP address and port)
of Phone B. Unified CM holds Phone B's location information because it also functions as the
registrar server.

Step 4. Having obtained Phone B's location information, Unified CM operates as a SIP back-to-back
user agent (B2BUA) and sends a new SIP INVITE to Phone B. Phone B sends a 100 Trying response to
Unified CM.

Step 5. When Phone B has processed the request and begins ringing, it sends a 180 Ringing
message to Unified CM, which relays it to Phone A.
At this stage, an audible ringback tone must be generated at Phone A. The ringback tone might be
generated locally on the phone or by Unified CM.
Alternatively, if Phone B wants to stream a custom ringback tone or pre-connect announcement, it
sends a 183 Session Progress message with an SDP body.
This scenario, known as “early media,” allows Phone A to hear media packets carrying
custom ringback tones or pre-connect announcements even before Phone B goes off-hook.

Step 6. Once Phone B goes off-hook, it sends a 200 OK response to Unified CM, indicating that the
call has been answered. The 200 OK includes an SDP body indicating the chosen media stream(s) and
codecs. The 200 OK response is then sent on to Phone A, and the phones can begin to exchange media
packets with one another. The 200 OK response must be followed by a SIP ACK, sent end-to-end, to
confirm that the 200 OK was reliably received.

At this stage, the SIP dialog is established. Note that Unified CM is only responsible for setting up
the communication session; in most scenarios it does not place itself in the path of the media packets.

Step 7. The SIP call terminates when one of the phones transmits a SIP BYE message.
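Seen from Phone A, Steps 1 to 7 form a small state machine. The sketch below is a deliberately simplified, hypothetical model: the state names and event labels are invented for illustration, and it ignores retransmissions, error responses, CANCEL, and the separate call leg that Unified CM runs toward Phone B as a B2BUA.

```python
from typing import Iterable

# Simplified caller-side view of the SIP call flow (Steps 1-7).
TRANSITIONS = {
    ("idle", "send INVITE"): "calling",      # Step 1
    ("calling", "100"): "proceeding",        # Step 3: 100 Trying
    ("proceeding", "180"): "ringing",        # Step 5: 180 Ringing
    ("proceeding", "183"): "early-media",    # Step 5: 183 Session Progress
    ("ringing", "200"): "answered",          # Step 6: 200 OK
    ("early-media", "200"): "answered",
    ("answered", "send ACK"): "confirmed",   # Step 6: ACK, dialog established
    ("confirmed", "BYE"): "terminated",      # Step 7
}


def run(events: Iterable[str]) -> str:
    """Feed events through the state machine and return the final state."""
    state = "idle"
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {event!r} in state {state!r}")
        state = TRANSITIONS[key]
    return state


if __name__ == "__main__":
    print(run(["send INVITE", "100", "180", "200", "send ACK", "BYE"]))  # terminated
```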

2. Media Gateway Control


Media Gateways are network elements that convert between the audio signals carried on
telephone circuits and data packets carried over the Internet or other packet networks. They act as
gateways between the PSTN and IP Multimedia Subsystem (IMS) networks, translating between different
technologies, protocols, and networks. Networks such as the Public Switched Telephone Network (PSTN),
Next Generation Networks (NGN), and wireless networks can all be interconnected using a VoIP Media
Gateway. A VoIP Media Gateway's functions include analog-to-digital conversion of voice and the creation
of voice IP packets. Media Gateways integrate data, audio, video, and fax transmission through a single
platform. They are usually placed at the edge of two or more different transport media, such as ATM,
IP, TDM or even two-wire PSTN lines, and their main duty is to convert media (voice/video) between these
domains. Two protocols are used for media gateway control: the Media Gateway Control Protocol (MGCP)
and Megaco, also known as H.248.
a. MGCP
MGCP is designed as an internal protocol within a distributed system that appears to the
outside as a single VoIP gateway. This system is composed of a Call Agent (or Media
Gateway Controller), at least one Media Gateway (MG) that performs the conversion of
media signals between circuits and packets, and at least one Signaling gateway (SG) when
connected to the PSTN.
MGCP assumes a connection model whose basic constructs are endpoints and
connections; connections can be either point-to-point or multipoint. Endpoints are sources or sinks of
data and can be physical or virtual. Creating a physical endpoint requires hardware
installation, while a virtual endpoint can be created in software. Examples of
physical endpoints include:
- An interface on a gateway that terminates a trunk connected to a PSTN switch, known as a
trunk gateway.
- An interface on a gateway that terminates an analog POTS connection to a phone, key
system, PBX, etc. A gateway that terminates residential POTS lines (to phones) is
called a residential gateway.
An example of a virtual endpoint is an audio source in an audio-content server. Connections
can be established over several types of bearer network, for example:
- Transmission of audio packets using RTP and UDP over a TCP/IP network as is the
case for VoIP.
- Transmission of audio packets using an ATM network.
- Transmission of packets over an internal connection, for example the TDM backplane
or the interconnection bus of a gateway. This is used, in particular, for "hairpin"
connections, connections that terminate in a gateway but are immediately rerouted
over the telephone network.
The MGCP system is composed of a Call Agent that implements the "signaling" layers of the
H.323 standard or SIP. MGCP implements the media gateway control interface as a set of
transactions. Each transaction is composed of a command and a mandatory response. There
are nine types of command:
1. CreateConnection (CRCX)
2. ModifyConnection (MDCX)
3. DeleteConnection (DLCX)
4. NotificationRequest (RQNT)
5. Notify (NTFY)
6. AuditEndpoint (AUEP)
7. AuditConnection (AUCX)
8. RestartInProgress (RSIP)
9. EndpointConfiguration (EPCF)
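MGCP commands are plain text: a command line (verb, transaction ID, endpoint name, protocol version) followed by one `code: value` line per parameter. The sketch below formats a CreateConnection in that style. The gateway name `rgw.example.net`, the call ID, and the option values are made-up examples; the parameter codes used (C for call ID, L for local connection options, M for connection mode) and the verb codes follow RFC 3435, but `build_mgcp_command` itself is a hypothetical helper.

```python
from typing import Dict

# MGCP command verbs as defined in RFC 3435.
VERBS = {"EPCF", "CRCX", "MDCX", "DLCX", "RQNT", "NTFY", "AUEP", "AUCX", "RSIP"}


def build_mgcp_command(verb: str, transaction_id: int, endpoint: str,
                       params: Dict[str, str]) -> str:
    """Format an MGCP command: command line, then one 'code: value' line per parameter."""
    if verb not in VERBS:
        raise ValueError(f"unknown MGCP verb: {verb}")
    lines = [f"{verb} {transaction_id} {endpoint} MGCP 1.0"]
    lines += [f"{code}: {value}" for code, value in params.items()]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Ask a (hypothetical) residential gateway to create a receive-only
    # connection on analog line 1, using PCMU with 20 ms packetization.
    print(build_mgcp_command(
        "CRCX", 1204, "aaln/1@rgw.example.net",
        {"C": "A3C47F21456789F0", "L": "p:20, a:PCMU", "M": "recvonly"},
    ))
```

The gateway must answer each such command with a response carrying the same transaction ID, which is what makes every MGCP exchange a command/response transaction.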

b. MEGACO/H.248
Megaco is a protocol for the control of elements in a physically decomposed multimedia gateway,
enabling the separation of call control from media conversion. It is defined by both the IETF, which
calls the protocol Megaco, and the ITU-T, which calls it H.248. A Media Gateway Controller
(MGC) controls one or more Media Gateways (MG). Similar to MGCP, Megaco is a master/slave
protocol that controls gateway functions at the edge of the packet network.
Its function is to allow gateway decomposition into a call agent (call control) part, known as the
Media Gateway Controller (MGC), which acts as master, and a gateway interface part, known as the
Media Gateway (MG), which acts as slave.
Megaco is complementary to H.323 and SIP: most softswitch vendors use MGCP or Megaco to control
gateways, but use SIP at the application layer.
The Megaco protocol gives the call agent more flexibility of transport type and control over the
media gateway, as well as some hooks for applications such as video conferencing.
In security and quality of service, Megaco is more flexible than MGCP. While MGCP supports
only IPsec, Megaco also supports an authentication header. Both protocols support
authentication of the source address. While MGCP only supports UDP for signaling messages,
Megaco supports UDP, TCP, ATM, and SCTP, and also has better stream management and
resource allocation mechanisms.

Either Megaco or MGCP may be used for a master-slave VoIP architecture. Both protocols provide the
following functions:
i. Create, modify and delete connections using any combination of transit networks, including frame
relay, ATM (Asynchronous Transfer Mode), TDM (Time Division Multiplexing), Ethernet or analog.
Connections can be established for transmission of audio packets over several types of bearer
networks, such as:
- IP networks using RTP and/or UDP
- an internal connection, such as the TDM backplane or the interconnection bus of a gateway.
This is used for connections that terminate in a gateway but are immediately rerouted over
the telephone network (“hairpin” connections).
ii. Detect or generate events on endpoints or connections. For example, a gateway may detect dialed
digits or generate a ringback tone on a connection.
iii. Collect digits according to a digit map received from the call agent, and send a complete set of
dialed digits to the call agent.
iv. Allow mid-call changes, such as call hold, playing announcements, and conferencing.
v. Report call statistics.
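Digit collection against a digit map can be sketched by translating the map into a regular expression. The example dial plan is written in the style of RFC 3435 digit maps and is an assumption, not a plan from this document. The translator handles only a subset of the syntax (digits, `#`, `*`, `A`-`D`, `x` for any digit, `.` for "zero or more of the preceding element", `[...]` ranges, and `|` alternatives) and simply drops the `T` timer marker; a real gateway also tracks partial matches and timers to decide when to stop collecting.

```python
import re


def digit_map_to_regex(digit_map: str) -> "re.Pattern":
    """Translate a simple MGCP-style digit map into a compiled regex (subset only)."""
    out = []
    i = 0
    while i < len(digit_map):
        ch = digit_map[i]
        if ch == "[":
            end = digit_map.index("]", i)
            out.append(digit_map[i:end + 1])   # ranges such as [1-7] are valid regex
            i = end + 1
            continue
        if ch == "x":
            out.append("[0-9]")                # any single digit
        elif ch == ".":
            out.append("*")                    # zero or more of the preceding element
        elif ch == "T":
            pass                               # inter-digit timer: not modeled here
        elif ch in "()|":
            out.append(ch)                     # grouping and alternatives
        else:
            out.append(re.escape(ch))          # digits, '#', '*', A-D
        i += 1
    return re.compile("".join(out))


# Example dial plan (assumed): operator, local extensions, feature codes, external calls.
DIAL_PLAN = "(0T|00T|[1-7]xxx|8xxxxxxx|#xxxxxxx|*xx|91xxxxxxxxxx|9011x.T)"
PATTERN = digit_map_to_regex(DIAL_PLAN)


def is_complete(digits: str) -> bool:
    """True if the collected digits form a complete match for the dial plan."""
    return PATTERN.fullmatch(digits) is not None


if __name__ == "__main__":
    for dialed in ("1234", "12", "*69"):
        print(dialed, is_complete(dialed))
```

Once `is_complete` returns true, the gateway would report the whole digit string to the call agent in a single notification, instead of sending every key press separately.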
3. Media (Transport)
VoIP uses a combination of RTP (Real-time Transport Protocol) and UDP over IP. UDP provides an
unreliable, connectionless delivery service that uses IP to transport messages between endpoints on the Internet.
RTP, used in conjunction with UDP, provides end-to-end network transport functions for applications
transmitting real-time data, such as audio and video, over unicast and multicast network services. RTP does
not reserve resources and does not guarantee quality of service.
The companion RTP Control Protocol (RTCP) allows the quality of a link to be monitored, but most VoIP
applications send a continuous stream of RTP/UDP/IP packets without regard to packet loss or delay in
reaching the receiver.
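Every RTP packet starts with a fixed 12-byte header whose sequence number and timestamp let the receiver reorder packets, detect loss, and measure jitter. The sketch below packs that header as laid out in RFC 3550 (no CSRC list, no header extension). The SSRC value and the 160-sample timestamp step, which corresponds to 20 ms of 8 kHz audio, are illustrative assumptions; `build_rtp_header` is a hypothetical helper.

```python
import struct


def build_rtp_header(seq: int, timestamp: int, ssrc: int,
                     payload_type: int = 0, marker: bool = False) -> bytes:
    """Pack the 12-byte fixed RTP header (RFC 3550) with no CSRCs or extension."""
    version = 2
    byte0 = version << 6                                  # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)    # M bit + payload type
    # Network byte order: 1 + 1 + 2 + 4 + 4 = 12 bytes.
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


if __name__ == "__main__":
    ssrc = 0x12345678  # hypothetical stream identifier
    for n in range(3):
        # Payload type 0 is PCMU; each 20 ms packet advances the 8 kHz clock by 160.
        header = build_rtp_header(seq=n, timestamp=n * 160, ssrc=ssrc)
        print(n, header.hex())
```

Nothing in the header asks the network for resources, which is why RTP by itself cannot guarantee quality of service; it only gives the endpoints enough information to notice when quality degrades.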
