Shijia Mei, Zhihong Liu, Yong Zeng, Lin Yang, Jian Feng Ma
School of Cyber Engineering
Xidian University
Xi’an, China
e-mail: sjmei@stu.xidian.edu.cn
Abstract: Context-based zero-interaction pairing has become the trend for smart IoT device pairing. In this paper, we propose a secure and usable mechanism to authenticate devices co-located in a smart home scenario and to build a secure communication channel between legitimate devices by utilizing on-board microphones to capture a common audio context. After receiving randomly generated sound signals, a smart IoT device uses the time intervals between salient sound signals to derive an audio fingerprint, which can be matched among co-present devices and then used to bootstrap trust between the devices. The protocol is based on the idea that devices co-located within a physical security boundary (e.g., a single-family house) can hear similar sounds, while devices outside miss parts of the sound signals due to attenuation as the sound passes through the wall. To accelerate the generation rate of the audio fingerprint, an extra sound source is introduced. We implement our protocol on Android devices, and the experimental results show that the protocol can distinguish malicious devices outside a security boundary from the legitimate devices located inside it and can quickly establish a strong secret key between legitimate devices.
Keywords: IoT security; context-based pairing; zero-interaction; smart home
I. INTRODUCTION
With the remarkable growth of the Internet of Things (IoT) industry, security problems in IoT are becoming increasingly prominent. Most smart IoT devices utilize Wi-Fi, Bluetooth, ZigBee and other wireless communication technologies to share data and coordinate tasks [1]. The data transmitted between smart IoT devices may involve the user's health information, property privacy and location privacy. However, the open nature of wireless channels makes device communication vulnerable to various security attacks such as Man-in-the-Middle (MITM) attacks, eavesdropping and message tampering [2] [3]. It is therefore critical to establish secure communication channels between smart IoT devices.
Some early secure device pairing schemes need human intervention, such as entering a password [4]. However, most emerging smart IoT devices, such as implantable chips in smart medical care [5] and pacemakers [6], do not provide user interfaces like keyboards and displays. Besides, the large number of tiny smart IoT devices makes manual configuration infeasible. Moreover, solutions based on key predistribution are also unsuitable for a myriad of smart IoT devices: updating and revoking keys is difficult, and interconnecting devices from different vendors is inconvenient.
Many context-based zero-interaction secure device pairing schemes have recently been proposed [7], [8], [9]. These solutions usually rely on the sensors of the smart IoT devices to perceive the surrounding physical environment. Based on the principle that coexisting devices sense a similar context, the devices to be paired use their sensors to collect information from the physical environment and transform that information into a common key.
Among the many entropy sources, acceleration and sound signals are of great interest because their entropy is relatively large [7] [10]. Acceleration is usually used as the entropy source for pairing wearable devices or cars [8]. However, the acceleration entropy source is only suitable for devices that can move.
Sound allows any device to participate in the pairing process as long as it is equipped with a microphone. Schurmann et al. [11] proposed a pairing scheme which uses ambient noise as the entropy source. The use of ambient noise is based on the fact that the sound signals produced by random events in the real world are also random, and an algorithm from a music recognition system is leveraged to extract the ambient noise fingerprint. Before processing the sound signals, the algorithm transforms them from the time domain to the frequency domain, which incurs a high computational cost. In practice, the sound signals can be processed in the time domain, using the time interval between random events as the source of entropy. However, the rate of key generation using time intervals between random real-world events is relatively low [2].
To address this problem, we design a new audio-based secure device pairing solution, called Listen!, that is applicable to pairing scenarios, e.g., the smart home scenario, where smart IoT devices are co-located within a security boundary. As demonstrated in Fig. 1, devices inside a security boundary perceive the same audio context over their coexistence time, whereas an adversary outside the security boundary cannot accurately observe or predict the sound signals within the boundary.
The drawback of pairing based solely on ambient sound is the long pairing time when the ambient sound is sparse. To shorten the pairing time, an extra sound source is introduced into Listen!. The sound source can be a home laptop or a mobile phone with speakers to generate
The program that produces the extra sound signals can be run on a smart IoT device equipped with speakers, such as a personal computer, a smartphone or a smart speaker. The user can place the extra sound source in the vicinity of the legitimate devices to be paired. Since the extra sound source is controllable, it is feasible to generate intermittent sound signals with random time intervals within a short time, which shortens the pairing procedure.
The signals broadcast by the extra sound source can be divided into two categories: audible sound and ultrasound. The frequency of audible sound lies within the range of 20 Hz to 20 kHz. Ultrasound refers to acoustic waves in the frequency range above audible sound (20 kHz to 20 MHz).
At present, general devices with speakers can only broadcast audible sound. To produce ultrasound, more specialized equipment is required. Given that the extra sound source can only emit audible sound signals, and to simplify the generation of a specific sound, we choose a sine wave that lasts for tS milliseconds, with frequency f within 20 Hz to 20 kHz and amplitude fixed at Am dB. The last parameter is the random time interval tR, which can be calculated as follows:
[...]
[...] information with other devices, she initiates the pairing process by broadcasting a request message ActivePair. After receiving the pairing request, Bob replies with a PairIsOk message. If Alice also wishes to pair with Bob, she responds with a PairStart message, and both devices start to sample the ambient sound signals. After sampling, both devices enter the Key Generation phase. First, they generate the audio fingerprint independently, then use reconciliation to correct fingerprint mismatches and obtain the same audio fingerprint. Finally, the two devices enter the Key Confirmation phase to derive the secret key from the audio fingerprint. To verify that the derived secret keys actually match, Bob uses the secret key to challenge Alice. If Alice solves the challenge correctly, they consider the pairing successful.
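As an illustration of the extra sound source described above, the following Python sketch builds a sequence of sine bursts separated by random silent gaps. It is a minimal sketch under stated assumptions, not the authors' implementation: the formula for tR is not available in this text, so the code assumes tR is a random integer multiple of a time unit tU; the function names, the linear amplitude value, the `max_units` bound and the 8000 Hz output rate are all hypothetical choices made for illustration.

```python
import math
import secrets


def make_beep(freq_hz, t_s_ms, sample_rate=8000, amplitude=0.8):
    """Sine burst lasting t_s_ms milliseconds.

    The amplitude is a linear scale factor here (an assumption; the
    paper fixes the amplitude at Am dB without giving a value).
    """
    n = int(sample_rate * t_s_ms / 1000)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


def random_interval_ms(t_u_ms, max_units=16):
    """ASSUMPTION: t_R is a random integer multiple of the time unit t_U."""
    return t_u_ms * (1 + secrets.randbelow(max_units))


def make_signal(num_beeps, freq_hz=1000, t_s_ms=50, t_u_ms=20, sample_rate=8000):
    """Concatenate beeps, each followed by a random silent gap t_R."""
    samples, intervals = [], []
    for _ in range(num_beeps):
        samples.extend(make_beep(freq_hz, t_s_ms, sample_rate))
        t_r = random_interval_ms(t_u_ms)
        intervals.append(t_r)
        samples.extend([0.0] * int(sample_rate * t_r / 1000))
    return samples, intervals
```

The gaps are drawn from the operating system's random source via `secrets`, since predictable intervals would let an outside observer guess the fingerprint.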
To diminish the influence of background noise and improve signal fidelity, a low-pass filter is applied to the collected sound data. The result after noise reduction is shown in Fig. 4(b).
Because salient sound signals have higher energy than the ambient noise, the short-time energy is used to detect them. First, we divide the sound signals into N frames, each containing M samples. To reduce spectral leakage, each frame is multiplied by a Hanning window function. Then the short-time energy of each frame is computed as follows:
$$ E_n = \sum_{m=0}^{M-1} x_n^2(m), \quad 1 \le n \le N \tag{2} $$
E_n is the short-time energy of the n-th frame of the collected data, and x_n(m) is the value of the m-th sample in the n-th frame.
Next, we normalize the short-time energy of each frame to the range [0,1]. Finally, thresholding is performed as shown in Fig. 4(c) to detect the starting points of salient sound signals. We concatenate the binary representations of the time intervals between adjacent starting points to form the audio fingerprint FAudio = {t1 || t2 || ... || tn}.
B. Information Reconciliation
Because Alice and Bob derive the audio fingerprint independently, there may be some time interval mismatches between their fingerprints FA and FB due to ambient noise. There are two types of mismatches: quantitative difference and interval shifting.
Quantitative difference occurs when Alice and Bob quantify the time duration between two adjacent salient sound signals and obtain two different durations. We use the Hamming distance hamming(·,·) to represent quantitative differences. If hamming(FA, FB) is less than a predefined threshold H, fuzzy commitment [15] can be used to reconcile the audio fingerprints. However, if the number of mismatches exceeds the correction capability of the algorithm, interval shifting must be taken into consideration.
[Figure 5: (a) FAudio of Alice, where a_i is a time interval computed by Alice; (b) FAudio of Bob, where b_i is a time interval computed by Bob; (c) the audio fingerprints are similar in edit distance while very different in Hamming distance.]
Figure 5. Interval shifting example
Interval shifting occurs when a salient sound signal is detected at one device but missed at the other. Interval shifting causes all subsequent time intervals to mismatch. We use edit distance to quantify the differences under interval shifting. The edit distance between two strings is the minimum number of edit operations (insertion, deletion or replacement of a single character) needed to transform one into the other. Fig. 5 shows that when interval shifting happens, the audio fingerprints are similar in edit distance but very different in Hamming distance. Since fuzzy commitment does not work in this situation, we use string reconciliation to reconcile the two audio fingerprints.
C. Key Confirmation
After reconciliation, Alice and Bob have obtained a common audio fingerprint. Let the audio fingerprints after reconciliation be F'A and F'B, respectively. Both parties use a key derivation function KDF(·) to obtain the secret keys KAlice = KDF(F'A) and KBob = KDF(F'B).
Bob generates a random nonce nB and computes the Message Authentication Code MAC_KBob(nB) with key KBob, then sends the message hash(F'B) || nB || MAC_KBob(nB) to Alice.
Alice computes MAC_KAlice(nB) and compares it with MAC_KBob(nB).
If MAC_KAlice(nB) = MAC_KBob(nB), Alice generates a random nonce nA and sends Bob the message nA || MAC_KAlice(nB || nA).
Bob then computes MAC_KBob(nB || nA) and compares it with MAC_KAlice(nB || nA). If they are equal, the pairing is successful. The secret key can then be used by symmetric key algorithms such as AES to ensure secure communication between Alice and Bob.
VI. EVALUATION AND DISCUSSION
To analyze the feasibility of our approach, we performed several experiments in different real-world contexts. We implemented Listen! as an Android application. The system is implemented in Java, and the Android version is 8.0.0. The Message Authentication Code algorithm is implemented with HMAC-MD5.
A. Experiment Set-up
We install the prototype on XiaoMi smartphones with MEMS microphones whose frequency response ranges from 100 Hz to 20 kHz. The sampling rate of the devices used in the experiments is 8000 Hz. We conducted the experiments in an office whose outer walls act as the security boundary for the legitimate devices inside. The collection software on the legitimate devices Alice and Bob continuously measures the sound signal levels in the device's context and attempts to establish a secret key between Alice and Bob. In these experiments an extra sound source is placed inside the boundary. We also assume that there is an adversary Eve located outside the security boundary, who can launch
eavesdropping and sound signal injection attacks, as depicted in Fig. 6. Moreover, a Man-in-the-Middle (MITM) attack against our protocol is unlikely to succeed, as Alice and Bob exchange only a small part of the audio fingerprint information during the reconciliation stage. Therefore, the audio fingerprint will not be compromised by a MITM attacker.
Set-up 1. Experiment with ambient noise. Alice and Bob collect only the ambient sound signals produced by random events in the surroundings.
Set-up 2. Experiment with extra sound source. In this setting, as shown in Fig. 6, an extra sound source is placed inside the security boundary, broadcasting sound signals at random time intervals. The time unit of the interval for the extra sound source is tU, which can be set to a value in the range [10,100] milliseconds.
[Fig. 6 (diagram): Device 1, Device 2 and the Extra Sound Source inside the Wall; the Adversary outside.]
[...] parameter of interest is the threshold Thr used to detect salient sound signals, which have comparatively large energy. In this experiment, we tested the method in different set-ups with different values of Thr.
The experiments are conducted in an office where the two legitimate devices and an extra sound source are placed on a table located in the middle of the office. The adversary Eve is deployed outside the office. We let someone increase the ambient noise by performing the following actions: knocking on the door, walking across the room, clapping hands, talking and colliding two objects.
Fig. 7 depicts the sound signals detected by the legitimate devices and the adversary. The fingerprints of the co-located devices Alice and Bob are clearly similar to each other, whereas their similarity with Eve's fingerprint is relatively low.
The experimental results show that the thresholds Thr = 0.3, Thr = 0.4 and Thr = 0.5 detect the salient sound signals more accurately. This suggests an improvement to the information reconciliation stage: pairing devices can make up to three attempts to reconcile the audio fingerprint, using the thresholds 0.3, 0.4 and 0.5 respectively. With this improvement, the pairing success rate between legitimate devices is 85 percent, far higher than the pairing success rate with the adversary, which is only 10 percent.
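The salient-sound detection step (short-time energy per Eq. (2), normalization, thresholding) and the three-threshold retry discussed above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' Java implementation: the low-pass filter is omitted, intervals are counted in frames, each interval is encoded in a fixed 8 bits, and all function names and the 80-sample frame length (10 ms at 8000 Hz) are hypothetical.

```python
import math


def short_time_energy(samples, frame_len):
    """E_n: sum of squared, Hann-windowed samples of each frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * m / (frame_len - 1))
              for m in range(frame_len)]
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum((w * x) ** 2 for w, x in zip(window, frame)))
    return energies


def normalize(energies):
    """Scale the energies to the range [0, 1]."""
    if not energies:
        return []
    lo, hi = min(energies), max(energies)
    if hi == lo:
        return [0.0] * len(energies)
    return [(e - lo) / (hi - lo) for e in energies]


def onset_frames(norm_energy, thr):
    """Frames where the energy first rises above thr (starting points)."""
    onsets, prev_above = [], False
    for n, e in enumerate(norm_energy):
        above = e > thr
        if above and not prev_above:
            onsets.append(n)
        prev_above = above
    return onsets


def fingerprint(samples, thr, frame_len=80, bits=8):
    """Concatenate the binary intervals between adjacent starting points.

    ASSUMPTION: intervals are measured in frames and clipped to `bits` bits.
    """
    onsets = onset_frames(normalize(short_time_energy(samples, frame_len)), thr)
    intervals = [b - a for a, b in zip(onsets, onsets[1:])]
    return "".join(format(min(t, 2 ** bits - 1), f"0{bits}b") for t in intervals)


def candidate_fingerprints(samples, thresholds=(0.3, 0.4, 0.5)):
    """One fingerprint per threshold, for up to three reconciliation attempts."""
    return {thr: fingerprint(samples, thr) for thr in thresholds}
```

A device would attempt reconciliation with the fingerprint for the first threshold and fall back to the next one only if that attempt fails.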
Figure 8. Effect of distances to pairing success rate.
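To make the distinction drawn in the Information Reconciliation section concrete, the following Python sketch compares Hamming distance and edit distance on a pair of interval sequences in which Bob misses one salient sound. It is illustrative only: it works on sequences of interval values rather than on the binary fingerprint, and the example values are invented for the demonstration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance (insert, delete, replace), one row at a time."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # replace or match
        prev = cur
    return prev[-1]


# Hypothetical intervals: Bob misses the sound between the 2nd and 3rd
# intervals, so his 2nd interval is the sum 5 + 2 = 7 and the rest shift.
alice = [3, 5, 2, 7, 4, 6]
bob = [3, 7, 7, 4, 6]

shifted_hamming = hamming(alice[:len(bob)], bob)  # compared position by position
shifted_edit = edit_distance(alice, bob)
```

In this example a single missed sound makes four of the five compared positions differ, while the edit distance is only two, which is why string reconciliation is the better fit under interval shifting.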
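The Key Confirmation exchange between Alice and Bob can be sketched in Python as follows. This is a hedged illustration, not the prototype's code: the paper does not specify KDF(·), so PBKDF2 with a fixed label as salt stands in for it; HMAC-SHA256 is substituted for the HMAC-MD5 used in the prototype; the hash(F'B) field of Bob's message is omitted; and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def kdf(fp: bytes) -> bytes:
    """Stand-in KDF (ASSUMPTION): PBKDF2-HMAC-SHA256 with a fixed label as salt."""
    return hashlib.pbkdf2_hmac("sha256", fp, b"listen-sketch-salt", 1000)


def mac(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA256, substituted here for the prototype's HMAC-MD5."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def bob_challenge(k_bob):
    """Bob sends n_B together with MAC_KBob(n_B)."""
    n_b = secrets.token_bytes(16)
    return n_b, mac(k_bob, n_b)


def alice_respond(k_alice, n_b, tag_b):
    """Alice checks Bob's tag, then answers with n_A and MAC_KAlice(n_B || n_A)."""
    if not hmac.compare_digest(mac(k_alice, n_b), tag_b):
        return None
    n_a = secrets.token_bytes(16)
    return n_a, mac(k_alice, n_b + n_a)


def bob_verify(k_bob, n_b, n_a, tag_a):
    """Bob recomputes MAC_KBob(n_B || n_A) and compares it in constant time."""
    return hmac.compare_digest(mac(k_bob, n_b + n_a), tag_a)


def confirm(fp_alice: bytes, fp_bob: bytes) -> bool:
    """Run the whole exchange locally; True means the pairing is confirmed."""
    k_a, k_b = kdf(fp_alice), kdf(fp_bob)
    n_b, tag_b = bob_challenge(k_b)
    response = alice_respond(k_a, n_b, tag_b)
    if response is None:
        return False
    n_a, tag_a = response
    return bob_verify(k_b, n_b, n_a, tag_a)
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not reveal how many leading bytes of a tag were correct.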
[13] L. Li, X. Zhao, and G. Xue, “A proximity authentication system for smartphones,” IEEE Transactions on Dependable & Secure Computing, vol. 13, no. 6, pp. 605–616, 2016.
[14] S. Jarecki and X. Liu, “Fast secure computation of set intersection,” in International Conference on Security & Cryptography for Networks, 2010.
[15] A. Alsaggaf and H. Acharya, “A fuzzy commitment scheme,” Computer Science, no. 3, pp. 367–377, 2008.
[16] S. Agarwal, V. Chauhan, and A. Trachtenberg, “Bandwidth efficient string reconciliation using puzzles,” IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 11, pp. 1217–1225, Nov 2006.