
REAL-TIME END-TO-END SECURE VOICE COMMUNICATIONS OVER GSM VOICE CHANNEL
N.N. Katugampala, K.T. Al-Naimi, S. Villette, and A.M. Kondoz
Centre for Communication Systems Research, University of Surrey
Guildford, GU2 7XH, United Kingdom
phone: + 44 1483 689843, fax: +44 1483 686011, email: a.kondoz@surrey.ac.uk
web: www.ee.surrey.ac.uk/CCSR/

ABSTRACT

GSM is the most widespread mobile communications system in the world. However, the security of GSM voice traffic is not guaranteed, especially over the core network. It is highly desirable to have end-to-end secure communications over the GSM voice channel. In order to achieve end-to-end security, speech must be encrypted before it enters the GSM network. A modulation scheme that enables the transmission of encrypted voice and data over the GSM voice channel was designed¹. A real-time prototype is implemented, demonstrating end-to-end secure voice communications over the GSM voice channel.

The modem technology presented facilitates the transmission of encrypted data and an encryption algorithm is not specified. The users may choose an algorithm and a hardware platform as necessary.

1. INTRODUCTION

The GSM system ensures subscriber identity confidentiality, subscriber authentication as well as confidentiality of user traffic and signalling. The ciphering algorithms used in GSM [1] have proved to be effective in ensuring traffic confidentiality. However, the traffic confidentiality is only ensured across the radio access channel. Voice traffic is transmitted across the core circuit switched networks 'in clear' in the form of PCM or ADPCM speech, which opens up the possibility of unauthorised access to GSM-to-GSM or GSM-to-PSTN conversations. Moreover, the encryption on the GSM speech channel is optional and controlled by the network operator, not the end user. Control by the end user may be preferable in some applications. For guaranteed end-to-end security the speech signal must be encrypted before entering the communications system.

Although the GSM data channel can be used for encrypted speech transmission, this approach suffers from a number of disadvantages. The GSM data channel has interoperability problems, especially across the international networks [2]. The GSM data channel typically requires 28-31 seconds to establish a connection, of which approximately 18 seconds are taken up by the GSM modem handshaking time. In addition, the GSM data channel uses Automatic Repeat Request (ARQ) for error correction and has zero errors at the expense of increased delay. The average round-trip time of the GSM data channel is 0.5 seconds [3]. This value depends upon the size of the packets transmitted. In practice this translates into a delay which exceeds the ITU-T specification of 150 ms for one-way transmission time for telephony services [4]. Although the proposed 3GPP standards specify the provision of low-latency data bearer channels, which could be used for end-to-end secure communications or telemetry operations, the deployment dates of such systems are as yet uncertain, and it will be quite some time before 3G mobile systems are ubiquitously available.

On the other hand, the use of encryption on the speech channel is not straightforward. The GSM terminal has a speech compression/decompression process for efficient use of the bandwidth, and this is heavily based on the assumption that the input signal will be speech. It uses the usual speech production model parameters, such as pitch and vocal tract model parameters, to efficiently compress the input speech. If the speech signal is encrypted before it comes to the encoding block, it will be randomised by the encryption process and will not satisfy the expected speech characteristics. Hence it will fail to go through the GSM speech transcoding process with sufficient accuracy. A method was presented where after the encryption process the resultant bits are modulated back onto speech-like waveforms [5], which possess the required speech characteristics. This paper presents the progress made since the publication [6], in terms of developing a real-time prototype with reduced complexity and the results from testing on public GSM network voice calls.
¹ This work was partially supported by Engineering and Physical Sciences Research Council (EPSRC) UK Portfolio Partnership in Integrated Electronics grant (reference number: GR/S72320/01).

2. GSM MODEM

The standard modems used in PSTN are not suitable for the compressed low bit rate speech channels. The main objective of speech compression is to reduce the number of bits required to represent speech, in order to accommodate in the available bandwidth, whilst still retaining an acceptable speech quality level [7]. A side-effect of this approach is that the resulting synthesised speech, whilst perceptually being similar to the input speech, i.e. it sounds very similar to the original, may have a fairly different waveform on a sample-by-sample basis. This objective difference prevents most data modems from operating over channels which employ speech compression systems, e.g. mobile communication systems. This problem is compounded by the fact that in many networks the speech signal may undergo more than one set of compression/decompression stages, e.g. GSM handset to GSM handset, a phenomenon known as tandeming. Therefore it was necessary to design a different modem for low bit rate speech channels [5], [6]. This modem can be used to transmit any form of general digital data, and in particular, encrypted speech.

Figure 1 depicts the relationship between the modem and the transmission path in a low bit rate voice communication system. The example shown is a typical mobile terminal to mobile terminal communications path, which includes a radio link, a speech decoder at the base station, a core transmission network, a speech encoder at the second base station and a downlink radio channel. For simplicity only simplex communication is illustrated, however full duplex secure voice communication is possible using the same techniques.

[Figure 1: Modulation over the speech channel of a communications network. Blocks: input data, data modulator, speech-like waveform, speech encoder, compressed 'speech', communications network (data may be subjected to bit errors and packet/block loss), speech decoder, speech-like waveform, data demodulator, output data.]

Figure 2 depicts a more detailed example for the GSM system. The input speech signal is first compressed using a very low bit rate speech encoder [8], e.g. 1.2 kbps. The output bit stream of the speech encoder can be encrypted. The encrypted speech data is fed into the modulator, which converts it into a speech-like waveform to feed into the GSM handset. The GSM speech encoder in the handset compresses the modulated waveform. The resulting digital bit stream is transmitted over the communications channel. The bit stream is received by the decoder of the receive terminal, which converts it back to a speech-like waveform. The speech compression and the transcoding that takes place within the network cause the waveform generated by the decoder to differ from that produced by the modulator at the transmit end. The demodulator is still able to extract the original transmitted data.

[Figure 2: Overview of the complete system. Transmitter add-on module to be connected to a standard GSM handset: input speech, speech compression, data encryption, data modulator. Network: speech encoder, Base Station Subsystem, GSM to PSTN, 64 kbps PCM waveform, PSTN to GSM, Base Station Subsystem, speech decoder. Receiver add-on module to be connected to a standard GSM handset: data demodulator, data decryption, speech decompression, output speech.]

3. REAL-TIME PROTOTYPE

There are several methods to interface the service access point of the communications network.

1. The secure voice system is implemented as a separate add on module and the interface provided with cables, e.g. using the hands free sockets of the GSM handsets.
2. The secure voice system is implemented as a separate add on module and the interface provided with a Bluetooth audio link.
3. An integrated PDA implementation. The modem, encryption/decryption, and the speech codec may be implemented on a personal digital assistant (PDA) with a GSM connection, which directly accesses the GSM voice buffer. Then the modulated waveform could be directly copied onto the buffers, which will not add any distortion due to the interface.

It should be noted that Bluetooth provides a digital connection, while the hands free cables provide an analogue connection. Analogue connections add extra distortion and perform worse than the digital connections.

The demodulator needs to be frame synchronised before any demodulation of data begins. Synchronisation of the frame boundaries is achieved by using a different modulated signal with a much lower data rate (400 bps). This synchronisation signal is derived from a known set of data stored at both the modulator and the demodulator and transmitted at the beginning. This signal passes through a GSM voice call with virtually no errors.
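The frame synchronisation step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, purely for illustration, that the known synchronisation signal is available as a list of samples (`sync_reference`, a hypothetical name) and that the frame start is taken as the offset that maximises the normalised cross-correlation with the received samples. The sequence length, noise level and offset in the demo are likewise made up.

```python
import math
import random


def normalised_correlation(a, b):
    """Normalised cross-correlation of two equal-length sequences (-1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    energy = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / energy if energy > 0.0 else 0.0


def find_frame_start(received, sync_reference):
    """Return (offset, score) where sync_reference best matches received."""
    n = len(sync_reference)
    best_offset, best_score = 0, -1.0
    for offset in range(len(received) - n + 1):
        window = received[offset:offset + n]
        score = normalised_correlation(window, sync_reference)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


# Hypothetical demo: a 64-sample known sequence hidden 37 samples into noise.
rng = random.Random(0)
sync_reference = [rng.choice((-1.0, 1.0)) for _ in range(64)]
received = [rng.gauss(0.0, 0.3) for _ in range(200)]
for i, sample in enumerate(sync_reference):
    received[37 + i] += sample

offset, score = find_frame_start(received, sync_reference)
print(offset, round(score, 3))
```

An exhaustive search over all offsets is used here only for clarity; a real-time demodulator would more likely restrict the search window once an approximate frame position is known.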

An additional problem associated with the analogue connections is drifting of the digital samples of the received signal. The clock frequencies of the Analogue to Digital Converters (ADC) and the Digital to Analogue Converters (DAC) used may be slightly different, which results in stretching or shrinking of the received signal. The demodulator will lose the frame synchronisation due to this effect. Therefore the drift is continuously measured and corrected at the demodulator, as depicted in Figure 3. The output data bits are encoded and modulated using exactly the same processes as at the transmitter. Then the cross-correlation between the input to the demodulator and the output of the local modulator is estimated. If the maximum normalised cross-correlation value corresponds to a non-zero lag, the frame boundaries are adjusted using that lag value. However, the output bit stream may contain bit errors, and in order to avoid unnecessary jitter due to those errors, the correction is applied only when the maximum normalised cross-correlation value is greater than a suitable threshold. Correlation values less than the set threshold indicate frames with bit errors, which may also be used by the user application as a bad frame indicator.

[Figure 3: Drift compensation. Blocks: received signal, buffer, demodulator, Viterbi decoder, output data, convolutional encoder, local modulator, cross-correlation, lag.]

A real-time prototype of the system is implemented. The modem includes channel coding, Viterbi decoding, drift compensation, and synchronisation. Three configurations 1, 2, and 3 of the modem are available with different complexity and memory requirements. The second and the third configurations use sub optimum search techniques in the demodulator and have reduced complexity and memory requirements at the expense of increased bit error rates. Table 1 shows the complexity and the memory requirements of the components of the system. Complexity is given in fixed point MIPS. Once the complexity reduction techniques currently being investigated are implemented, the complete full duplex secure voice system, including encryption and decryption, is expected to run on a modern PDA, e.g. 200 to 400 MHz and 64 MB of RAM.

Figure 4 depicts the real-time prototype implementation, which demonstrates full duplex secure voice communication on GSM-to-GSM voice calls. Two laptop Personal Computers (PC) are used, each with a 2.8 GHz Intel Celeron processor, 1.18 GB of RAM, and Microsoft Windows XP operating system. Creative Sound Blaster Audigy 2 sound cards and various standard Nokia handsets are used. Microsoft Visual C++ library functions are used to read and write to the sound cards. The interface to the GSM handsets is provided using hands-free cables.

[Figure 4: Real-time prototype implementation. Portable PC with multiple audio channels connected to the GSM handset by a hands free cable.]

A state of the art proprietary 1.2 kbps speech codec [8], [9] is used to provide communication quality speech on GSM-to-GSM voice calls. The speech codec does not use Long Term Prediction (LTP) [10], [11] and there is no propagation of errors. The error resilience techniques employed in the speech decoder ensure that the frame erasures are concealed on typical GSM-to-GSM voice calls. There is no noticeable degradation in the speech quality when tested with the worst-case modem channel configuration that has a 1% Frame Error Rate (FER) at 1.2 kbps, see Tables 1 and 2.

4. RESULTS

Table 2 shows the results obtained on GSM-to-GSM cross network voice calls on UK public networks, namely Vodafone and O2. This is the most challenging scenario for the proposed system due to the double tandem speech transcoding involved in GSM-to-GSM voice calls. The system works better on GSM-to-PSTN, PSTN-to-GSM, or PSTN-to-PSTN connections, due to one or no speech transcoding stages involved.

In order to avoid the potential problems associated with analogue interfacing at the transmitter side, a digital interface was emulated by copying a modulated waveform file onto a GSM handset, and playing the file while on a voice call to a second GSM handset. The second handset was connected to a PC via a hands-free cable. This process transmits the modulated signal on a GSM-to-GSM voice call, however the modulated signal was transferred to the handset as a data file. The PC runs the demodulator and the speech decoder and plays the output speech in real-time.

It is quite clear from Table 2 that digital interfacing or an integrated PDA implementation would significantly improve the demodulation accuracy. Tables 1 and 2 clearly demonstrate the trade-off between the complexity and memory requirements, and the Bit Error Rate (BER).

The extra end-to-end delay introduced by the system stays reasonable. 95 ms for the algorithmic delay of the 1.2 kbps speech coder plus 40 ms for the modem give an overall extra delay of 135 ms in addition to the normal GSM speech channel delay. This is significantly less than the delay of the GSM data channel. As a result the proposed system provides a better quality of service than the existing systems, in addition to the improved security.
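The drift compensation rule described in Section 3 (adjust the frame boundary by the lag of the maximum normalised cross-correlation, but only when that maximum exceeds a threshold) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the 160-sample frame, the lag search range of ±3 samples and the threshold of 0.7 are all hypothetical, and the locally re-modulated frame is simply passed in as a list of samples.

```python
import math
import random


def normalised_correlation(a, b):
    """Normalised cross-correlation of two equal-length sequences (-1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    energy = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / energy if energy > 0.0 else 0.0


def estimate_drift(received, frame_start, local_frame, max_lag=3, threshold=0.7):
    """Compare the received frame with the locally re-modulated frame.

    Returns (correction, peak, bad_frame). The correction is the lag with the
    highest normalised cross-correlation, or 0 when the peak is below the
    threshold, in which case the frame is flagged as bad instead.
    """
    n = len(local_frame)
    best_lag, best_peak = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        begin = frame_start + lag
        if begin < 0 or begin + n > len(received):
            continue  # this lag would read outside the buffer
        peak = normalised_correlation(received[begin:begin + n], local_frame)
        if peak > best_peak:
            best_lag, best_peak = lag, peak
    bad_frame = best_peak < threshold
    correction = 0 if bad_frame else best_lag
    return correction, best_peak, bad_frame


# Hypothetical demo: the frame actually starts 2 samples later than expected.
rng = random.Random(1)
local_frame = [rng.uniform(-1.0, 1.0) for _ in range(160)]
received = [0.0] * 102 + local_frame + [0.0] * 50
correction, peak, bad_frame = estimate_drift(received, 100, local_frame)

# A buffer of pure noise should fall below the threshold and be flagged.
noise = [rng.uniform(-1.0, 1.0) for _ in range(312)]
noise_correction, noise_peak, noise_bad = estimate_drift(noise, 100, local_frame)
print(correction, bad_frame, noise_correction, noise_bad)
```

Returning the bad frame flag alongside the correction mirrors the text's remark that low correlation values can double as a bad frame indicator for the user application.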

TABLE 1: Complexity and memory requirements

Component      Complexity (MIPS)     RAM        ROM        Executable
               Average    Worst      (Kbytes)   (Kbytes)   (Kbytes)
Speech codec   42         50         8          12         50
Modem1         250        260        15         700        35
Modem2         200        210        15         700        35
Modem3         40         50         15         35         35

TABLE 2: Results on GSM-to-GSM Voice Calls

[Table 2 columns: Interface; before channel decoding: Rate (kbps), BER (%); after channel decoding: Rate (kbps), BER (%), FER (%). Rows: Digital/Analogue and Analogue/Analogue interfaces for modem configurations 1 to 3.]

5. CONCLUSION

It has been shown in this paper that end-to-end secure communication over the GSM voice channel is achievable. Secure voice and data communications over the GSM voice channel has been enabled by modulating the encrypted data onto speech-like waveforms. A throughput of 3 kbps has been achieved with 2.9% Bit Error Rate (BER) on GSM-to-GSM connections. With the addition of error correcting codes a throughput of 1.2 kbps with 0.03% BER and 0.2% Frame Error Rate (FER) has been achieved. A state of the art proprietary 1.2 kbps speech codec, which does not noticeably degrade the speech quality on GSM-to-GSM connections, is used in the prototype. A real-time prototype has been implemented which produces communication quality speech using the secure channel on GSM-to-GSM connections. The latency of the secure channel is significantly lower than the latency of the GSM data channel, making it a more attractive alternative for real-time secure voice communications.

REFERENCES

[1] C. Lo and Y. Chen, "Secure communication mechanisms for GSM networks," IEEE Transactions on Consumer Electronics, Vol. 45, No. 4, pp. 1074-1079, November 1999.
[2] M. Street, "Interoperability and international operation: An introduction to end to end mobile security," IEE Secure GSM and Beyond: End to End Security for Mobile Communications, London, February 2003.
[3] P. Challans, R. Gover, and J. Thorlby, "End to end data bearer performance characterisation for communications over wide area mobile networks," IEE Secure GSM and Beyond: End to End Security for Mobile Communications, London, February 2003.
[4] ITU-T Recommendation G.114, "One-way transmission time," Revision 3, May 2000.
[5] N.N. Katugampala, S. Villette, and A.M. Kondoz, "Secure voice over GSM and other low bit rate systems," IEE Secure GSM and Beyond: End to End Security for Mobile Communications, London, February 2003.
[6] N.N. Katugampala, K.T. Al-Naimi, S. Villette, and A.M. Kondoz, "Real-time data transmission over GSM voice channel for secure voice & data applications," The 2nd IEE Secure Mobile Communications Forum, London, September 2004.
[7] A.M. Kondoz, "Digital speech: coding for low bit rate communication systems," J. Wiley, New York, 1994.
[8] M. Stefanovic, Y.D. Cho, S. Villette, and A.M. Kondoz, "A 2.4/1.2 kb/s speech coder with noise pre-processor," proceedings EUSIPCO 2000, Tampere, Finland, 4-8 September 2000.
[9] S. Villette, K.T. Al-Naimi, C. Sturt, A.M. Kondoz, and H. Palaz, "A 2.4/1.2 kbps SB-LPC based speech coder: the Turkish NATO STANAG candidate," Proceedings of the IEEE Speech Coding Workshop 2002, Tsukuba, Japan, October 2002.
[10] S. Singhal and B.S. Atal, "Amplitude optimisation and pitch prediction in multipulse coders," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 37, No. 3, pp. 317-327, March 1989.
[11] R.P. Ramachandran and P. Kabal, "Pitch prediction filters in speech coding," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 37, No. 4, pp. 467-478, April 1989.