
INVESTIGATING MOBILE AND PERIPHERAL SIDE CHANNELS

FOR ATTACK AND DEFENSE

by

Patrick Timothy Cronin

A dissertation submitted to the Faculty of the University of Delaware in partial
fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and
Computer Engineering

Summer 2021

© 2021 Patrick Timothy Cronin


All Rights Reserved
INVESTIGATING MOBILE AND PERIPHERAL SIDE CHANNELS

FOR ATTACK AND DEFENSE

by

Patrick Timothy Cronin

Approved:
Jamie D. Phillips, Ph.D.
Chair of the Department of Electrical and Computer Engineering

Approved:
Levi T. Thompson, Ph.D.
Dean of the College of Engineering

Approved:
Louis F. Rossi, Ph.D.
Vice Provost for Graduate and Professional Education and
Dean of the Graduate College
I certify that I have read this dissertation and that in my opinion it meets the
academic and professional standard required by the University as a dissertation
for the degree of Doctor of Philosophy.

Signed:
Chase Cotton, Ph.D.
Professor in charge of dissertation

I certify that I have read this dissertation and that in my opinion it meets the
academic and professional standard required by the University as a dissertation
for the degree of Doctor of Philosophy.

Signed:
Haining Wang, Ph.D.
Member of dissertation committee

I certify that I have read this dissertation and that in my opinion it meets the
academic and professional standard required by the University as a dissertation
for the degree of Doctor of Philosophy.

Signed:
Xing Gao, Ph.D.
Member of dissertation committee

I certify that I have read this dissertation and that in my opinion it meets the
academic and professional standard required by the University as a dissertation
for the degree of Doctor of Philosophy.

Signed:
Fouad Kiamilev, Ph.D.
Member of dissertation committee
ACKNOWLEDGEMENTS

I would like to thank my advisor Dr. Chase Cotton for his invaluable insight
into the graduate school process, his continual encouragement, and always believing in
me. Without his help, this dissertation, and the opportunities that I will enjoy because
of it, would never be possible.
Next I would like to thank Dr. Haining Wang and Dr. Xing Gao for their
immense help in teaching me to become a better researcher. Their long hours of
guidance and mentorship have taught me invaluable lessons about problem formulation
and presentation.
I would also like to thank Dr. Fouad Kiamilev, my very first computer engineering
professor, and the first UD ECE professor that I met on a discovery day nine years ago
for helping to put me on the path to becoming a computer engineer.
Next, I would like to thank my parents, my brother, and my wife for their
continuous support throughout my academic career. Without their encouragement I
never would have made it.
Finally, I would like to thank God for granting the countless blessings and opportu-
nities that have enabled me to achieve so much.

TABLE OF CONTENTS

LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv

Chapter

1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1 Charger Surfing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2


1.2 Time-Print . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 ARM SoC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 CHARGER-SURFING: EXPLOITING A POWER LINE
SIDE-CHANNEL FOR SMARTPHONE INFORMATION
LEAKAGE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.1 Threat Model and Background . . . . . . . . . . . . . . . . . . . . . . 7

2.1.1 Threat Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 8


2.1.2 USB Charging . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.3 LCD/OLED Touchscreen Technology . . . . . . . . . . . . . . 10
2.1.4 Animations on the Touchscreen . . . . . . . . . . . . . . . . . 11

2.2 Power Line Leakage Exploration . . . . . . . . . . . . . . . . . . . . . 12

2.2.1 Button Press Detection . . . . . . . . . . . . . . . . . . . . . . 13


2.2.2 Button Press Location Identification . . . . . . . . . . . . . . 13
2.2.3 Impact of Battery Charging . . . . . . . . . . . . . . . . . . . 15

2.3 Sensitive Information Inference . . . . . . . . . . . . . . . . . . . . . 16

2.3.1 Raw Signal Acquisition . . . . . . . . . . . . . . . . . . . . . . 16

2.3.2 Button Sequence Detection . . . . . . . . . . . . . . . . . . . 17
2.3.3 Individual Button Isolation . . . . . . . . . . . . . . . . . . . 18
2.3.4 Phone Detection . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3.5 Signal Preprocessing . . . . . . . . . . . . . . . . . . . . . . . 20
2.3.6 Animation Inference . . . . . . . . . . . . . . . . . . . . . . . 20

2.4 Case Study: Passcode Inference . . . . . . . . . . . . . . . . . . . . . 21

2.4.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . 22


2.4.2 Classifier Configuration and Training . . . . . . . . . . . . . . 23
2.4.3 Phone Identification . . . . . . . . . . . . . . . . . . . . . . . 24
2.4.4 Single Button Inference . . . . . . . . . . . . . . . . . . . . . . 25
2.4.5 Misclassification Analysis . . . . . . . . . . . . . . . . . . . . . 27
2.4.6 Passcode Inference . . . . . . . . . . . . . . . . . . . . . . . . 29
2.4.7 Impact of Sampling Frequency . . . . . . . . . . . . . . . . . . 31
2.4.8 Detection Granularity Analysis . . . . . . . . . . . . . . . . . 33

2.5 Attack Practicality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2.5.1 A Portable Data Collection System . . . . . . . . . . . . . . . 34


2.5.2 Testing of Varied Device Settings . . . . . . . . . . . . . . . . 35
2.5.3 Cross Device Testing . . . . . . . . . . . . . . . . . . . . . . . 36

2.6 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.7 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3 TIME-PRINT: AUTHENTICATING USB FLASH DRIVES
WITH NOVEL TIMING FINGERPRINTS . . . . . . . . . . . . . . 44

3.1 Threat Model and Background . . . . . . . . . . . . . . . . . . . . . . 47

3.1.1 Threat Model and Attacker Capabilities . . . . . . . . . . . . 47


3.1.2 USB 2.0 Versus 3.0 . . . . . . . . . . . . . . . . . . . . . . . . 49
3.1.3 USB Mass Storage Devices and Flash Storage Controllers . . . 50
3.1.4 USB Security . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3.2 Timing Side-Channel Exploration . . . . . . . . . . . . . . . . . . . . 52

3.2.1 Motivation of Time-Print . . . . . . . . . . . . . . . . . . . . . 53


3.2.2 Creation of a Reliable Fingerprint . . . . . . . . . . . . . . . . 54

3.2.3 Preliminary Classification . . . . . . . . . . . . . . . . . . . . 55

3.3 Time-Print Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

3.3.1 Performing Precise Timing Measurements . . . . . . . . . . . 56


3.3.2 Exercising the USB Flash Drive . . . . . . . . . . . . . . . . . 59
3.3.3 Preprocessing Timing Values . . . . . . . . . . . . . . . . . . . 59
3.3.4 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

3.4 Evaluation Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

3.4.1 Experimental Devices . . . . . . . . . . . . . . . . . . . . . . . 61


3.4.2 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.4.3 Fingerprint Script . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.4.4 Training and Testing Datasets . . . . . . . . . . . . . . . . . . 64

3.5 Time-Print Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

3.5.1 Scenario ①: Brand Identification . . . . . . . . . . . . . . 64


3.5.2 Scenario ②: Same Brand Device Identification . . . . . . . . . 66
3.5.3 Scenario ③: Auditing / Classification . . . . . . . . . . . . . . 68

3.6 Practicality of Time-Print . . . . . . . . . . . . . . . . . . . . . . . . 69

3.6.1 System Latency . . . . . . . . . . . . . . . . . . . . . . . . . . 70


3.6.2 Fingerprints with Hardware Variation . . . . . . . . . . . . . . 72
3.6.3 Fingerprint Robustness with Device Usage . . . . . . . . . . . 73
3.6.4 Spoofing A Fingerprint . . . . . . . . . . . . . . . . . . . . . . 75
3.6.5 Other Considerations . . . . . . . . . . . . . . . . . . . . . . . 76
3.6.6 Fingerprint the Flash Controller . . . . . . . . . . . . . . . . . 76
3.6.7 Real-World Deployment of Time-Print . . . . . . . . . . . . . 77

3.7 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

3.7.1 Device Fingerprinting . . . . . . . . . . . . . . . . . . . . . . . 77


3.7.2 Flash Based Fingerprints . . . . . . . . . . . . . . . . . . . . . 79
3.7.3 USB Attacks and Defenses . . . . . . . . . . . . . . . . . . . . 79

3.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

4 AN EXPLORATION OF ARM SYSTEM LEVEL CACHE AND
GPU SIDE CHANNELS . . . . . . . . . . . . . . . . . . . . . . . . . . 82

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

4.2.1 Caching and Side-Channel Attacks . . . . . . . . . . . . . . . 84


4.2.2 Consumer ARM System Design . . . . . . . . . . . . . . . . . 85
4.2.3 Website Fingerprinting and Timer Restrictions . . . . . . . . . 88

4.3 Threat Model and Challenges in ARM . . . . . . . . . . . . . . . . . 89

4.3.1 Threat Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 89


4.3.2 Cache Occupancy Challenges in ARM . . . . . . . . . . . . . 90

4.4 Optimizing ARM Cache Occupancy . . . . . . . . . . . . . . . . . . . 93

4.4.1 Test Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93


4.4.2 Cache Access Pattern . . . . . . . . . . . . . . . . . . . . . . . 94
4.4.3 Foreground vs. Background Activity . . . . . . . . . . . . . . 94
4.4.4 Browser Memory Management . . . . . . . . . . . . . . . . . . 96

4.5 Attacks on ARM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

4.5.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.5.2 Optimizing Cache Occupancy Attack . . . . . . . . . . . . . . 100
4.5.3 Novel GPU Channel . . . . . . . . . . . . . . . . . . . . . . . 102

4.6 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

4.6.1 Web-Based Attacker Results . . . . . . . . . . . . . . . . . . . 105


4.6.2 App-Based Attacker Results . . . . . . . . . . . . . . . . . . . 107
4.6.3 Comparison to Prior Work . . . . . . . . . . . . . . . . . . . . 108
4.6.4 GPU Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.6.5 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . 110

4.7 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110


4.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

5 CONCLUSION AND FUTURE WORK . . . . . . . . . . . . . . . . 114

5.1 Summary and Contributions . . . . . . . . . . . . . . . . . . . . . . . 114


5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

Appendix

A ADDITIONAL FIGURES AND TABLES FOR
CHARGER-SURFING: EXPLOITING A POWER LINE
SIDE-CHANNEL FOR SMARTPHONE INFORMATION
LEAKAGE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
B ADDITIONAL FIGURES AND TABLES FOR TIME-PRINT:
AUTHENTICATING USB FLASH DRIVES WITH NOVEL
TIMING FINGERPRINTS . . . . . . . . . . . . . . . . . . . . . . . . 137

B.1 Additional Figures and Tables . . . . . . . . . . . . . . . . . . . . . . 137

C ADDITIONAL FIGURES AND TABLES FOR AN
EXPLORATION OF ARM SYSTEM LEVEL CACHE AND GPU
SIDE CHANNELS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

LIST OF TABLES

2.1 Single Button Accuracy . . . . . . . . . . . . . . . . . . . . . . . . 25

2.2 Cumulative Accuracy of 3 Classification Attempts for Single User


Trained Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.3 Impact of sampling frequency on row, column, and overall classification


accuracy, based on 3-user data of Motorola G4. . . . . . . . . . . . 32

2.4 Single Button and Passcode Inference Accuracy (5 training users / 15


testing users). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2.5 Single Button Inference Accuracy (5 training users / 1 testing user)


with Varied Configurations. . . . . . . . . . . . . . . . . . . . . . . 36

2.6 Cross-device training and testing configurations. . . . . . . . . . . . 37

2.7 iPhone 6+ cross device testing classification results. 2 training users on


an iPhone 6+ and 10 testing users on a different iPhone 6+. . . . . 37

2.8 iPhone 8+ cross device testing classification results. 2 training users on


an iPhone 8+ and 10 testing users on a different iPhone 8+. High
initial accuracy meant that subsequent attempts realized minimal
improvement. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.1 USB mass storage devices utilized in the evaluation of Time-Print. 58

3.2 Percentage of samples accepted when trained for each device model. 66

3.3 Average True Accept Rate (TAR) and True Reject Rate (TRR) for
same model device identification. . . . . . . . . . . . . . . . . . . . 68

3.4 Classification accuracy for each drive type in Scenario ③. . . . . . 69

3.5 System configurations for cross host investigation. . . . . . . . . . . 73

4.1 Devices and High Power (HP) and Low Power (LP) core configurations
utilized in this work. . . . . . . . . . . . . . . . . . . . . . . . . . . 91

4.2 Accuracy for web-based cache occupancy website fingerprint on


multiple ARM devices . . . . . . . . . . . . . . . . . . . . . . . . . 105

4.3 Accuracy for native application cache occupancy website fingerprint on


multiple ARM devices . . . . . . . . . . . . . . . . . . . . . . . . . 105

4.4 Accuracy for GPU based website fingerprinting on ARM devices . . 105

A.1 Smartphones Used For Evaluation . . . . . . . . . . . . . . . . . . . 135

A.2 Classification Network Used for iPhone . . . . . . . . . . . . . . . . 136

A.3 Classification Network Used for Android . . . . . . . . . . . . . . . 136

B.1 Neural network architecture used for classification. . . . . . . . . . 137

C.1 1D Convolutional Neural Network Configuration . . . . . . . . . . . 138

LIST OF FIGURES

2.1 USB charging in public or shareable environments. . . . . . . . . . 11

2.2 Power leakage on the USB power line when charging a Motorola G4.
Sampling rate is 125 KHz. The signal is filtered with a moving mean
filter to increase clarity. . . . . . . . . . . . . . . . . . . . . . . . . 12

2.3 Averaged voltage readings for (a) Motorola G4 with LCD screen and
(b) Samsung Galaxy Nexus with AMOLED screen, when displaying
flickering white bars on the top, middle, and bottom rows, as well as
left, middle, and right columns, of a black screen. . . . . . . . . . . 12

2.4 Comparison of voltage readings when pressing buttons on the lock


screen of a Motorola G4 in two cases: fully charged vs charging.
Sampling rate is 125 KHz. (a) depicts the raw unfiltered signal. (b)
utilizes a high pass filter with a cutoff frequency of 60 Hz to remove
the offset. (c) presents the Fourier transform of the filtered signal,
demonstrating that the charging status of the phone does not affect
the signal integrity. . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.5 Overview of Charger-Surfing’s working flow. . . . . . . . . . . . . . 16

2.6 The top displays the raw signal of multiple overlapping button presses.
The bottom demonstrates how peak detection can be utilized to
determine non-overlapping portions of individual button presses. The
signal is collected from Motorola G4 and filtered for clarity. . . . . 19

2.7 Passcode lock screen layout and animation. . . . . . . . . . . . . . 23

2.8 Android’s animation on touching different parts of a button. . . . 25

2.9 Breakdown of actual and predicted button classifications for the


Galaxy Nexus when trained with one user’s data. An entry on row i
and column j corresponds to button i being classified as j. . . . . . 27

2.10 Accuracy of 4-digit passcode inference. . . . . . . . . . . . . . . . . 28

2.11 Accuracy of 6-digit passcode inference. . . . . . . . . . . . . . . . . 28

2.12 Impact of different sampling rates on single button accuracy, based on


3-user data of Motorola G4. . . . . . . . . . . . . . . . . . . . . . . 31

2.13 Android and iOS keyboards. Each keyboard has a similar layout, with
4 rows of buttons. Each keyboard contains a maximum of 10 buttons
per row (top row). . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2.14 The portable, low-cost data collection setup. A WiFi enabled


microcontroller can send acquired data to a custom webserver in
real-time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.1 Three security scenarios of USB fingerprinting for device


authentication. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

3.2 Histograms of read timings for 16 different USB mass storage drives.
Each plot contains 20 different samples. . . . . . . . . . . . . . . . 52

3.3 The design of Time-Print. . . . . . . . . . . . . . . . . . . . . . . . 55

3.4 A USB SCSI command sequence. . . . . . . . . . . . . . . . . . . . 57

3.5 Flow of generating 1D features from the raw fingerprint samples of a


drive as used for different model identification (top) and 2D features as
used for individual device classification (bottom). . . . . . . . . . . 62

3.6 Classification accuracy degradation as the number of samples is


reduced (10 SanDisk Ultra USB 3.0 drives). . . . . . . . . . . . . . 71

3.7 Classification accuracy degradation as the number of samples is


reduced (10 SanDisk Blade USB 2.0 drives). . . . . . . . . . . . . . 71

4.1 Overview of ARM’s DynamIQ architecture featuring heterogeneous


processor cores organized into high (big) and low (LITTLE)
performance clusters. The CPU clusters and accelerators (GPU, ISP,
and DSP) are all connected to a shared system level cache. . . . . . 86

4.2 Google Pixel 3 Cache Average Memory Access Time . . . . . . . . 95

4.3 iPhone SE 2 Cache Average Memory Access Time . . . . . . . . . . 96

4.4 M1 MacBook Air Cache Average Memory Access Time in Chrome . 97

4.5 M1 MacBook Air Cache Average Memory Access Time in Safari . . 97

4.6 M1 MacBook Air Cache Average Memory Access Time in Firefox . 98

ABSTRACT

Advances in computing technology and convenience have dramatically altered the


way people conduct business. The reliance on face-to-face meetings, phone calls, postal
mail, large filing rooms, and hand computation has largely fallen out of favor, replaced
by constantly connected mobile smartphones that increase productivity, convenience,
and revenue. When all of this data becomes too unwieldy or too sensitive to move
across the internet, modern computers have eschewed the many competing transfer
standards (parallel, serial, etc.) and settled on USB mass storage devices.
While these mobile devices have enabled incredible convenience, they have done
little to thwart insidious data stealing attacks. This dissertation focuses on a specific
type of cyber attack: side channels. Side channel attacks identify components of a system
that are shared by multiple processes and then monitor an observable characteristic
(timing, temperature, power, acoustics) of that component to identify information about
the system. In this dissertation we examine the utilization of side channels for both
attack and defense in mobile and peripheral platforms.
First, we examine the security of the content that appears on mobile phone
screens, identifying that many user input actions result in large onscreen animations
which cause noticeable disturbances in the power trace of the device. We discover that
these disturbances can be accurately classified with a machine learning system, enabling
an attacker to learn exactly what buttons a user presses and perform an in-depth study
on stealing user lock screen passcodes.
Next, we investigate how users move data, examining the security issues brought
about by USB flash drives. We examine the low-level timing characteristics of USB
flash drives and determine that each individual device contains a uniquely identifiable
timing pattern that can be exploited to construct a defense that identifies whether a
drive is authorized for use on the target device. We examine the robustness of this
defensive method in multiple scenarios.
Finally, we notice a trend in computing towards more mobile devices and more
accessible architectures, specifically noting a recent move of some laptop designers from
x86 to ARM. We identify that the driving force behind this shift is the rapid improvement
in ARM performance, power usage, and heat dissipation, partly brought about by major
modifications to the core and cache architectures. We examine whether these major
changes now enable attacks that were mainly feasible on x86 devices. We specifically
examine an attack which fingerprints the websites that users visit and use its success
to construct a novel GPU-based channel within the ARM architecture for website
fingerprinting.

Chapter 1

INTRODUCTION

The landscape of computing has dramatically shifted over the last 20 years.
Users are no longer restricted to desk-bound terminals and bulky file transfer media;
now computing moves with them. Smartphones and the ubiquity of wireless internet
access have greatly increased the flow of and access to information, allowing new
internet-based businesses to flourish and enabling numerous quality-of-life enhancements
such as online bill payment, portfolio management, and turn-by-turn navigation. However,
these benefits do not come without a cost. The centralization of personal and financial
data into a single mobile device has made smartphones an attractive target for cyber
criminals.
With the desire of stealing credentials, spying on a user for potential blackmail
purposes, or simply forcing the user to view advertisements, cyber criminals have
attacked mobile phones in many of the same ways that they attacked desktop and laptop
platforms. The malware targets some form of operating system vulnerability, exploits it,
and then gains privileged access to the device. Unlike their desktop counterparts, mobile
operating systems usually exert much more control over which APIs programmers
have access to, vastly limiting the attack surface. To this end, attackers have begun
to utilize side channel attacks, that is, attacks that observe components or shared
hardware resources that differ based on user behavior.
This dissertation investigates exploiting side channels in both offensive and
defensive settings with regards to mobile devices and peripherals. Our first study
examines how mobile smartphones leak information about their screen content over
the charging cable, and I demonstrate how this power side channel can be utilized
to infer user passcodes. I then examine a defensive setting, where I propose a novel

timing side channel to defend computer systems from rogue USB flash drives. Next,
I examine the newer cache architectures that ARM has developed for mobile phones
and laptops/desktops and investigate how older cache-based channels from x86 can be
utilized on these newer architectures. Finally, I utilize these findings to create a new
side channel via shared access to the GPU. In Sections 1.1 to 1.3 I briefly introduce
each topic, and the organization of this dissertation is provided in Section 1.4.

1.1 Charger Surfing


Mobile phones have enabled incredible convenience, allowing users to easily
access the internet from almost anywhere. Mobile operating system developers have
also worked to add increased convenience by providing an application store and APIs
for developers to design applications that allow direct interface with user accounts on
various websites in a far more convenient way than mobile websites. This has enabled
users to install applications on their phones that allow access to all aspects of their
lives, such as health tracking, managing and paying utility bills, portfolio management,
banking, and many more, with further functionality planned, including turning the
mobile device into the only device necessary to unlock and start new cars. Part of
the added convenience of many applications is that they remember user logins, not
requiring users to re-enter their passwords after they are initially set up.
To protect all of this sensitive information, mobile operating system providers
have created numerous security mechanisms to prevent unauthorized access to a device.
These take multiple forms (fingerprint, faceID, etc.) but usually ask for a simple 4-6
digit pin code if the user is unable to utilize other options. I investigate how users
interact with their devices, identifying that when users are entering input to the screen,
multiple fixed animations occur to alert the user that the input has been accepted. As
the screen on a mobile device is one of the largest consumers of power, I investigate
how these animations can create classifiable differences in the power trace of the device.
I show that a moderately equipped attacker can steal user PIN codes with very high
accuracy through this power side channel.

1.2 Time-Print
The increased reliance on computing technology has led to an increased
necessity for security of large databases and confidential information. Where old
confidential records or critical equipment like power generators may have been kept in
physically guarded facilities where the only security concerns were fire and unauthorized
entry, new secure facilities must contend with many more security threats. To keep as
small of an attack surface as possible, many of these high security areas are physically
isolated from the internet, preventing the ingress of malware from the internet that
could wreak havoc on the facility. To facilitate the transfer of information from system
to system in these areas, many facilities utilize USB flash drives.
While initially an excellent lightweight and easily portable storage medium, USB
has developed to become a direct security threat. Attackers can store malicious code
that runs as soon as the device is plugged in or simply utilize unauthorized drives to
copy files and take them from the facility. This chapter investigates USB flash drives,
uncovering new methods to reliably identify authorized devices while maintaining
usability.

1.3 ARM SoC


Over the years ARM has developed a number of optimizations and design
methods for their systems on a chip (SoCs) which have enabled them to be competitive
with x86 laptop and desktop CPUs. As x86 has been the dominant computing platform,
a large amount of effort has gone into developing advanced side channel attacks
to steal information and invade user privacy. With the rising popularity of ARM
processors this chapter examines whether chip designers are learning from the mistakes
of previous designs or whether the ARM architecture will make the same mistakes as
x86. Specifically, I examine the new shared cache on ARM SoCs and re-implement
and upgrade a cache contention channel for website fingerprinting. This chapter also
examines the accelerators added to SoCs and their tight integration with the CPU
cache, uncovering and evaluating a new GPU contention side channel.

1.4 Organization
The remainder of this dissertation is organized as follows. In Chapter 2, I
investigate power leakage of mobile phone screens, demonstrating an economical attack
that can accurately steal user passcodes. Chapter 3 examines the security of USB flash
drives, creating a timing side channel to accurately identify individual USB flash drives
and protect systems from unauthorized devices. Chapter 4 analyzes new design patterns
that are enabling ARM processors to perform competitively against x86 processors in
the laptop/desktop market and observes how previously proposed cache timing channels
from x86 can be modified to perform well on ARM. I also design and investigate a GPU
channel within the SoC. Finally, I summarize the dissertation and discuss future work
in Chapter 5.

Chapter 2

CHARGER-SURFING: EXPLOITING A POWER LINE
SIDE-CHANNEL FOR SMARTPHONE INFORMATION LEAKAGE

Touchscreen devices such as smartphones and tablets have become a daily tool
for a variety of business and entertainment activities, including mailing, banking,
browsing, gaming, and photography. While these devices have ushered in an era of
great convenience, their rich functionality has lead to ever-increasing usage, draining
batteries faster, and necessitating that users seek out areas to charge their smartphones.
One study suggests that city dwellers charge their phones multiple times per day [6].
To allow users to conveniently charge their devices, facilities such as USB power lines
and charging stations have been widely deployed in public areas, including airports [10],
parks [2, 11], hotels [3], and hospitals [1]. The market for shareable power banks is also
thriving [7], allowing users to simply scan a QR code to rent a public power bank and
charge their devices.
Despite their convenience, USB charging interfaces and stations also introduce
a number of security threats, as the USB interface in a public area is not under the
user’s control [8]. A typical USB interface is composed of one or more (depending on
the protocol) differential data lines for data transmission and a 5V and ground line for
delivering power. Previously it has been demonstrated that the data transmitted over
the data line can be sniffed [121] or monitored through the crosstalk leakage on the
power line [157]. Adversaries can also extract power consumption information from the
power line to infer coarse-grained information, such as internet browsing history [42]
or password length [182]. These disclosed security threats, however, do not stop users
from heavily utilizing USB charging facilities in public areas, since charging usually
involves no data transfer over the USB data line.

In this work, I reveal that USB charging in public areas can pose far more serious
threats than previously believed. I show, for the first time, that the signals on the power
line form a side channel and leak far more fine-grained information than previously
known. Specifically, the power consumption information is highly correlated with
the activities on the touchscreen. Leveraging this side channel, built on the dynamic
power signals, adversaries can precisely identify the location of virtual button presses
on the touchscreen, with which they can steal extremely sensitive data such as a user’s
passcode. I call this security threat Charger-Surfing. I conduct a series of experiments
to demonstrate the existence of fine-grained information leakage tied to smartphone
touchscreen activity. For the construction of the Charger-Surfing channel, I develop
a wireless, low cost, and portable power trace capture system using commercial-off-
the-shelf (COTS) hardware. To further demonstrate that Charger-Surfing is a real
threat, I perform a case study on a numeric passcode unlock screen and show that
Charger-Surfing is able to extract a passcode on both Android and iOS devices by
leveraging signal processing and neural network techniques. I thoroughly assess this
security threat on different types of smartphones, multiple phones of the same model,
and across different users. Our results show that Charger-Surfing can achieve an average
accuracy of 98.7% for single button inference on all the tested smartphones. For an
unknown user,¹ Charger-Surfing has, on average, a 95.1% or 92.8% chance to accurately
crack a 4-digit or 6-digit passcode on its first attempt, respectively, and a 99.3% (4-digit)
or 96.9% (6-digit) success rate within five trials.
In a nutshell, this is the first work that demonstrates fine-grained information
leakage over the power line of the USB charging cable regarding the content of the
touchscreen. More importantly, our studies show that the effectiveness of Charger-
Surfing is victim-independent, meaning that adversaries can train the neural network
using touchscreen data on their own smartphones with different configurations without
any prior knowledge of a victim.

¹To show the effectiveness of Charger-Surfing, the model of a target device is trained
with the data created by an adversary and tested with victim users whose data were
not used to train the model.

The major contributions of this work include:

• A comprehensive study on the dynamic power usage of the touchscreen to demon-


strate the location, causes, and granularity of information leakage over the USB
power line. To the best of our knowledge, this is the first work to explore the
classification of dynamic screen animations and induced information leakage.

• A new security threat, Charger-Surfing, which exploits a side channel through


the USB power line to infer user interactions with the content on the touchscreen.
The techniques used by Charger-Surfing for signal processing and model learning
are given.

• A portable microcontroller-based power trace capture system using COTS hard-


ware, which demonstrates the feasibility of exploiting the disclosed leakage channel
at a low cost.

• A thorough evaluation on multiple smartphones, showing high accuracy in inferring


a victim’s private information, such as their passcode, without any prior knowledge
of the victim, and that this leakage vulnerability is not tied to a specific smartphone
or mobile OS.

The rest of this chapter is organized as follows. Section 2.1 presents our threat
model and a brief primer on USB charging, touchscreen technology, and touchscreen
animations. The existence of fine-grained information leakage over the USB power line is
demonstrated in Section 2.2. The security threat posed by Charger-Surfing is detailed in
Section 2.3, followed by an in-depth case study in Section 2.4. Section 2.5 discusses the
attack practicality of Charger-Surfing. Section 2.6 describes countermeasures against
Charger-Surfing. Section 2.7 surveys related work, and finally, Section 2.8 concludes
the chapter.

2.1 Threat Model and Background


This section first presents the threat model, and then discusses the various
components of a smart device involved in the new side channel, including (1) USB

charging, (2) touchscreen technology, focusing on the dynamic power consumed when
displaying different colors, and (3) the dynamic content of the touchscreen that could
be potentially leaked.

2.1.1 Threat Model


The objective of this work is to highlight the vulnerabilities of the power line
side-channel in smartphones which, if exploited, can lead to serious information leakage.
Consider a realistic scenario in public places, where users charge their smartphones
with a USB charger that is not owned/controlled by themselves. The USB charger
could be a charging station in a public area, such as airports (Figure 2.1a), or simply
an interface where users bring their own USB cables (Figure 2.1b). It could also be
a shareable power bank rented from a third-party (Figure 2.1c), or the USB outlets
provided in a hotel (Figure 2.1d). The USB charger provides the standard functionality
(i.e., charging) and looks ordinary.
However, since these chargers are controlled by third-parties, the power con-
sumption of the connected device could be monitored by a device hidden inside the
packaging or behind the charging interface. The voltage monitor would not cause any
adverse impact to the charging speed, and would thus be quite stealthy. With a low
power microcontroller concealed inside the packaging, power traces can be recorded, or
streamed wirelessly, for analysis.
Finally, assume that adversaries have no prior knowledge of a specific victim,
and have no need or have never had the chance to collect the power trace of the victim’s
smartphone. However, I assume that adversaries can easily profile the power dynamics
of most popular smartphone models beforehand, enabling them to attack a wide range
of smartphone users.
Security Threats Posed by Leakage. I observe that the dynamic power
trace of a smartphone is highly correlated to the animation played on the touchscreen.
Unfortunately, leaking the animations played on the touchscreen could cause severe
security threats. The owner of such a specialized “surfing” charger can steal a victim’s

private data entered through the touchscreen, such as passcode, credit card number,
and banking information. To expose such threats, I demonstrate Charger-Surfing’s
capability in inferring a numeric passcode.
While there are a myriad of potential biometric lock mechanisms available
(fingerprints, faceID, etc.), many of these can be deceived [9, 5] and require a backup
PIN (personal identification number) code if they are unavailable (gloves, sweat, etc.).
Other authentication mechanisms such as Android’s pattern-based lock are not available
on all phones and have been shown to be less secure than a PIN code [16]. Thus, I
focus on the passcode-based lock as it is the most widely used primary or secondary
authentication mechanism to unlock touchscreen devices, and it acts as one of the only
barriers to gain complete control of a smartphone.
A passcode is extremely valuable to a dedicated adversary. When a victim
can be easily identified (e.g., using a USB port at a hotel room), knowledge of the
passcode would be sufficient for an adversary with physical access (e.g., evil maid
attack [4]) to the victim’s smartphone to steal private information or even reset other
online passwords (e.g., Apple ID and iCloud passwords). Even for an adversary without
physical access (e.g., a shareable power bank), a compromised passcode could still lead
to severe consequences, as users tend to reuse their passcodes (recent studies show
that each passcode is reused around 5 times [60]) and a smartphone’s passcode may be
reused as the PIN code of a credit/debit card or online payment system (e.g., Apple
Pay or Alipay). Overall, there are many possible real scenarios, where this type of
information would be very useful to law enforcement or an adversary for espionage,
fraud, identity theft, etc.

2.1.2 USB Charging


USB has become a standard interface for charging portable devices such as
smartphones, while enabling serial communications at the same time. Standard USB
plugs contain four pins and a shield: one pin delivers +5VDC [149], one pin connects
to the shield forming the ground, while the other two pins are used for differential
data transmission and carry negligible current when charging the battery. Newer USB
protocols include more differential data pairs, but leave the +5VDC and ground pins the
same. When charging a device, its battery enters the charging state, and the device’s
power is supplied not from the battery but from the power source connected by the
USB power line.

2.1.3 LCD/OLED Touchscreen Technology


The two major touchscreen technologies are Liquid Crystal Display (LCD) and
Organic Light Emitting Diodes (OLED). Both technologies have many improvements
or extensions, such as Active-Matrix Organic Light-Emitting Diode (AMOLED), Super
AMOLED, and In-Plane Switching (IPS) LCD. The power consumption profile of these
touchscreen technologies is reviewed below [37].
LCD has three major components: a backlight that is always on, vertically
polarized filters, and liquid crystals. The liquid crystals are charged to different voltages
to display different colors. Specifically, to display a black pixel, the crystals are charged
with the highest voltage. This voltage aligns the crystals horizontally, allowing only
horizontally polarized light through. As the filter layer is vertically polarized, no light
can shine through and a black pixel is produced. To display a white pixel, the crystal
layer voltage is relaxed, aligning it vertically, allowing light to pass through the filter.
OLED displays utilize organic molecules to produce holes and electrons to create light
in an emissive layer. Individual OLEDs are used to produce each pixel. To display a
black pixel, the OLED must enter a low power state, while displaying a white pixel
requires the OLED to enter a high power state.
As LCDs and OLEDs use dissimilar mechanisms to produce an image on a screen,
they generate vastly different power traces to produce the same image. Specifically, to
create an animation of a white dot, most pixels will be black. The black LCD pixels
will be in a high power state, and the pixels that make up the white dot in a low power
state. OLEDs, on the other hand, will have their black pixels in a low power state, and
their white pixels in a high power state. Thus, if it were possible to observe the voltages
applied to the individual pixels, the two screen technologies should have inverse values
when they are utilized to display an identical image.

Figure 2.1: USB charging in public or shareable environments. (a) USB charging
station. (b) USB charging interface. (c) Shareable power bank. (d) USB charging in
hotel.

2.1.4 Animations on the Touchscreen


Smartphones with touchscreen technology always provide graphical interfaces
(e.g., the lock screen, the telephone dial pad, and the text entry keyboard in applications)
for users to input data, and also use real-time animations to inform the users that
their inputs have been registered. Most of these animations occur on a static screen
(i.e., no other animation is playing) and always at the same location on the screen
(i.e., the digit/letter does not move around). As reviewed before, displaying lighter or
darker pixels consumes different amounts of power in LCD and OLED technologies.
Furthermore, LCD and OLED screens refresh from left to right, row by row, leading to
the potential that the dynamic power consumption, which can be measured through the
USB charging cable, may leak the location on the touchscreen where a virtual button is
pressed.

Figure 2.2: Power leakage on the USB power line when charging a Motorola G4.
Sampling rate is 125 KHz. The signal is filtered with a moving mean filter to increase
clarity.

Figure 2.3: Averaged voltage readings for (a) Motorola G4 with LCD screen and (b)
Samsung Galaxy Nexus with AMOLED screen, when displaying flickering white bars
on the top, middle, and bottom rows, as well as left, middle, and right columns, of a
black screen.

2.2 Power Line Leakage Exploration


Smartphones are sophisticated computing platforms with a complex multi-core
System-on-a-Chip (SoC) handling various device drivers for touchscreens, cameras,
sensors, etc. Previous research has shown that the display (i.e., touchscreen) and
CPU/GPU are among the top contributors to the overall power consumption in a
smartphone [36]. While previous work has shown that the power consumption of a
smartphone leaks information regarding the activities on the touchscreen [182, 180],
such information leakage is of coarse granularity (e.g., internet browsing history [182]
or password length [180]). In comparison, the goal of this work is to demonstrate
fine-grained information leakage, specifically, the ability to identify the exact locations
of button presses and extract a user’s input (e.g., a passcode) with dynamic power
traces.
To examine the power leakage, I conduct a series of experiments utilizing a
Motorola G4 connected to a USB charging cable in which the ground cable has been cut
and spliced with a small resistor. An oscilloscope is used to monitor the voltage across
this resistor and thereby the current utilization of the device. This section presents our

experimental findings, highlights the leakage patterns, and further shows that the state
of the smartphone’s battery will not cause any attenuation effects on the side channel.

2.2.1 Button Press Detection


To explore the potential for identifying button presses, our first study observes
the signal on the USB cable while charging a smartphone, utilizing the aforementioned
oscilloscope and charging cable setup. The dynamic power signal is highly correlated
with device activity, as illustrated in Figure 2.2. When the smartphone is asleep, there
is a steady current utilization with minimal noise. Once the phone is perturbed from
the sleep state, there is an immediate increase in its current utilization. When the
phone enters the lock screen, the signal shows large spikes at different intervals. Finally,
when the user starts to tap the screen and enter a passcode, the signal exhibits a clear
rise and fall upon each button press.
This experiment not only demonstrates the information leakage on the power
line, but furthermore illustrates two important properties underpinning our following
studies: (1) from the signal measured on the USB power line, one can clearly detect the
powering-on of the screen and the exact starting point of the button-press sequence; (2)
in the lock screen mode, each button press made by a user is clearly observable and
separable.

2.2.2 Button Press Location Identification


The power usage in Figure 2.2 shows a significant elevation when a button is
pressed. This elevated usage is caused by the activities of the mobile OS. Specifically,
once the mobile OS has captured an input action from the user, it provides visual
feedback by rendering and drawing an animation on the screen, causing pixels to
rapidly change colors and inducing two significantly different voltage states. On the
lock screen, the animation for each button press is similar, albeit in a different location.
These similar animations cause the power leakage to exhibit similar signals for different
buttons, as the blue and grey areas in Figure 2.2 depict.



Figure 2.4: Comparison of voltage readings when pressing buttons on the lock screen of
a Motorola G4 in two cases: fully charged vs charging. Sampling rate is 125 KHz. (a)
depicts the raw unfiltered signal. (b) utilizes a high pass filter with a cutoff frequency
of 60 Hz to remove the offset. (c) presents the Fourier transform of the filtered signal,
demonstrating that the charging status of the phone does not affect the signal integrity.

The unique contribution of this work is to discriminate the “similar-looking”


signals and extract the location of the animation via power leakage. To examine this
potential, I have designed a custom Android application running on two smartphones
with different screen technologies: the Motorola G4 with an LCD screen, and the
Samsung Galaxy Nexus with an AMOLED screen. The application divides the screen
into six portions (i.e., top, middle, and bottom rows, as well as left, middle, and right
columns) and displays, on a black background, flickering white bars that fill each portion
of the screen in their respective tests. To mimic the way the Android OS renders user
interface elements and the lock screen, I set the hardwareAccelerated developer flag
to ensure that the GPU is involved in image rendering.
The gathered signal exhibits a steady 60Hz signal that denotes the beginning and
end of a refresh cycle.² I isolate the 60Hz signal within the sample stream and average
all of the frames to reduce noise for better visual effects. The results are presented in
Figure 2.3, which zooms in on a 2ms portion of the signals to better display the subtle
differences. As can be seen, the voltage readings show that in both LCD and OLED
technologies, there is an appreciable difference in the power usage of displaying the
same image on different portions of the screen. These experiments demonstrate the
great potential for inferring the location of the animation played on the screen when a
user presses a virtual button, by exploiting the power leakage on the USB power line.

²The screen constantly refreshes all pixels with a specific rate (typically 60Hz), in a
manner from left to right, and from top to bottom. This phenomenon can be observed
with a slow motion camera, such as the one on an iPhone, which films at 240 frames
per second.
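To make the averaging procedure concrete, the following is a minimal sketch (in Python,
with assumed variable and function names, and assuming the 125 KHz sampling rate
used in this work) of folding a trace into 60Hz refresh periods and averaging them, as
was done to produce Figure 2.3:

```python
# Minimal sketch (assumed names): fold the trace into 60 Hz refresh periods
# and average them to suppress noise.
import numpy as np

def average_frames(trace, fs=125_000, refresh_hz=60):
    frame_len = int(round(fs / refresh_hz))            # samples per refresh cycle
    n_frames = len(trace) // frame_len
    frames = trace[: n_frames * frame_len].reshape(n_frames, frame_len)
    return frames.mean(axis=0)                          # one averaged refresh frame
```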

2.2.3 Impact of Battery Charging


One important question is whether the state of the smartphone’s battery will
cause any attenuation effects on the power side channel. This is critical as the smart-
phones charged at public USB charging facilities will likely have arbitrary battery levels.
Once plugged into a charger, the smartphone draws its power from the charger and
uses any excess power to charge its battery. Not only will a charging battery lead to
a higher power draw than a fully charged battery, but the battery charging circuitry
might attenuate the power leakage information, since high frequency signals contained
in current spikes might be filtered by the reactive components of the battery charging
controller.
To study the difference between a fully charged phone and a charging phone,
I collect the power traces under the same workload, i.e., when entering a single digit
on the virtual keypad repeatedly. The power traces are presented in Figure 2.4a. The
figure shows a positive offset for the “charging” case, demonstrating that a larger base
amount of current is being drawn by the phone to perform its tasks and additionally
charge the battery. However, upon applying a high-pass filter to remove all frequencies
under 60Hz that correspond to the DC offset in the signal, the filtered signals of the
two phones match each other quite well, as shown in Figure 2.4b. I also conduct a
Fourier transform on both signals, and display the resulting frequency spectrum in
Figure 2.4c. In the figure, the high-band frequency signals still exist in both cases,
preserving the high speed dynamic fluctuations attributed to the user touching the
screen. Although the charging battery illustrates a slightly smoothed frequency signal,
there is no obvious visual difference in the frequency spectrum between a charging
phone and a fully charged phone.
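The filtering and frequency analysis used for this comparison can be sketched as follows;
the filter order and helper names are illustrative assumptions rather than the exact
analysis code:

```python
# Sketch: remove the DC/charging offset with a 60 Hz high-pass filter and
# compare the spectra of the two traces (fully charged vs. charging).
import numpy as np
from scipy.signal import butter, filtfilt

FS = 125_000   # sampling rate (Hz)

def highpass(trace, cutoff=60, fs=FS, order=4):
    b, a = butter(order, cutoff, btype="highpass", fs=fs)
    return filtfilt(b, a, trace)

def spectrum(trace, fs=FS):
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / fs)
    return freqs, np.abs(np.fft.rfft(trace))

# charged and charging are the two raw voltage traces under the same workload:
# freqs, mag_charged = spectrum(highpass(charged))
# _, mag_charging = spectrum(highpass(charging))
```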

Figure 2.5: Overview of Charger-Surfing’s working flow.

2.3 Sensitive Information Inference


This section presents the method with which Charger-Surfing exploits the fine-
grained power line leakage described above to infer button presses made by a smartphone
user.
Figure 2.5 presents the working mechanism of Charger-Surfing. An adversary first
acquires raw signals from a “surfing” charger with a hidden voltage monitor (step ①).
The raw signal is searched to detect a button sequence (step ②), which is further
isolated into individual button presses (step ③). Next, a neural network processes the
signal to determine the target device model (step ④). This information is used to select
the exact model for button identification from a set of pre-trained neural networks. The
button press signal is preprocessed (step ⑤) for the phone-model-specific neural network,
which finally infers the virtual buttons pressed by the user on the touchscreen (step ⑥).
The rest of this section details the techniques used in each step of Charger-Surfing.

2.3.1 Raw Signal Acquisition


The prerequisite for sensitive information inference is to covertly and compre-
hensively capture the power trace of the user’s smartphone without losing any useful
information. In Charger-Surfing, this is performed at step ①, as shown in Figure 2.5,
via a hidden voltage monitor that is attached to the charger without a user’s knowledge.

The voltage monitor should be able to collect the raw signal of the charging
device at a sampling frequency that is carefully determined. Utilizing a very high
frequency will result in unnecessarily large and cumbersome data, while sampling
too slowly will miss key information. There are two factors that affect the sampling
frequency: the refresh cycle of the screen and the resolution of the screen. As mentioned
in Section 2.2.2, screens typically refresh pixel by pixel, from left to right and from top
to bottom. To observe both the row and column portion of an animation, it is preferable
to sample at a rate that is slightly greater (or less) than the per row update speed, so
that (1) the power utilization can be monitored on a per row basis, and (2) samples can
be taken in different columns as the refresh moves down the screen. Most of today’s
flagship smartphones use a screen resolution between 1920×1080 and 2960×1440 and
have a refresh rate of 60 Hz. A single sample per row would require a sample rate in
the range of 115–178 KHz. Our design uses a sample rate of 125 KHz, which takes one
sample every 0.9–1.4 rows on many flagship smartphones. This rate ensures that
consecutive samples are not taken on the same vertical line, thus providing more useful
location information.
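The arithmetic behind these numbers is straightforward; the snippet below simply
reproduces the 115–178 KHz range and the 0.9–1.4 rows-per-sample figure, using the
row counts of the quoted resolutions:

```python
# Rows drawn per second = rows per frame * 60 Hz refresh rate.
REFRESH_HZ = 60
SAMPLE_RATE = 125_000  # Hz, the rate used by Charger-Surfing

for rows in (1920, 2960):                     # 1920x1080 and 2960x1440 screens
    row_rate = rows * REFRESH_HZ              # rows per second
    print(f"{rows} rows: {row_rate / 1000:.1f} KHz row rate, "
          f"{row_rate / SAMPLE_RATE:.2f} rows per sample")
# 1920 rows: 115.2 KHz row rate, 0.92 rows per sample
# 2960 rows: 177.6 KHz row rate, 1.42 rows per sample
```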

2.3.2 Button Sequence Detection


Step ② of Charger-Surfing processes the captured power trace and isolates the
portions of the signal corresponding to the sequence of button presses.
When the user presses a virtual button on the touchscreen, the mobile OS
determines the location of the input and acknowledges the user by lighting up the
button (or playing an animation around it). With a text or numeric entry, it also
displays the corresponding letter or number on the screen.
Each of these activities increases the power consumption, collectively generating
a visible spike in the captured raw power utilization signal, as shown in Figure 2.2. To
detect these signals, Charger-Surfing utilizes a moving mean filter and a level detector.
The filter removes noise from signals, allowing the level detector to isolate portions of
the signal belonging to a button press sequence once the level is above an empirically
determined threshold.
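A minimal sketch of this detection step is given below. The window length and threshold
are placeholders that would have to be tuned per device, and the function names are
illustrative:

```python
# Smooth the raw trace with a moving mean, then mark regions that stay above
# an empirically chosen level as candidate button-press sequences.
# Assumes the trace starts and ends below the threshold.
import numpy as np

def moving_mean(signal, window=2000):
    return np.convolve(signal, np.ones(window) / window, mode="same")

def detect_sequences(signal, threshold, window=2000):
    smoothed = moving_mean(signal, window)
    active = smoothed > threshold
    edges = np.diff(active.astype(int))
    starts = np.flatnonzero(edges == 1) + 1    # rising edges: sequence begins
    ends = np.flatnonzero(edges == -1) + 1     # falling edges: sequence ends
    return list(zip(starts, ends))             # (start, end) index pairs
```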

2.3.3 Individual Button Isolation


Upon detecting a sequence of button presses, Charger-Surfing moves to step ③
which detects and isolates each individual button press. Since users press buttons at
different rates, inferring individual button signals is much easier and more practical
than blindly classifying the entire sequence with button presses possibly occurring at
any arbitrary speed.
The process of detecting individual presses also utilizes a combination of a moving
average filter and a level detector. When passed through a moving average filter, the
button sequence displays spikes, each of which corresponds to the beginning of a button
press, as shown in Figure 2.6.
Depending on the button press rate, the raw power signal (e.g., the top picture in
Figure 2.6) may show either a single and isolated press, or multiple overlapping button
presses. In the latter case, it is important to select the signal portion containing the
most distinctive information. The lower picture of Figure 2.6 shows the pattern of a
single button press, wherein the biggest changes occur at the beginning of the signal.
This trend is consistent with the typical behavior of the screen, which is usually static
but comes alive as soon as a button is pressed. Accordingly, for overlapping button
presses, Charger-Surfing discards the end of the signal and keeps the beginning, which
is the most important, distinctive, and potentially identifiable portion.
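The isolation logic can be sketched as follows; the minimum peak spacing and the number
of samples kept per press are illustrative parameters, and overlapping presses are
truncated at the start of the next press, as described above:

```python
# Find the peak that marks the start of each press in the smoothed signal and
# keep only its (non-overlapping) beginning.
from scipy.signal import find_peaks

def isolate_presses(smoothed, min_gap, keep_len):
    peaks, _ = find_peaks(smoothed, distance=min_gap)
    segments = []
    for i, start in enumerate(peaks):
        end = start + keep_len
        if i + 1 < len(peaks):                 # overlapping press: truncate at
            end = min(end, peaks[i + 1])       # the start of the next one
        segments.append((start, end))
    return segments
```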

2.3.4 Phone Detection


In the envisioned threat model, adversaries can profile the power charging
dynamics of most popular smartphone models beforehand, and pre-train a neural
network model for each of these popular phones. A victim’s signal collected over
the USB power line can be fed into the pre-trained model, once the phone type is
determined.

Figure 2.6: The top displays the raw signal of multiple overlapping button presses. The
bottom demonstrates how peak detection can be utilized to determine non-overlapping
portions of individual button presses. The signal is collected from Motorola G4 and
filtered for clarity.

While the steps ①–③ performed up to this point are generally applicable to all
smartphones, step ④ of Charger-Surfing focuses on detecting the phone type. This
task is much easier than classifying individual button presses as the screen technology,
the screen resolution, and different components within the phone (CPU, GPU, screen
driver, etc.) lead to vastly different power trace patterns, as demonstrated in Figure 2.3.
To accomplish this identification task, I utilize a neural network that is trained with
the isolated button press signals. The raw signal is passed through a high-pass filter
to preserve the high-frequency components, which are highly correlated with the phone
model, while removing the less informative DC offsets that can result from brightness
changes, charging state (charging versus fully charged), or different charging rates.

As the victim’s phone model may not belong to the set that the attacker utilized
to train Charger-Surfing, the system further examines the confidence values of each
output class when inferring the phone model. If the confidence values are all low, it
will not pass the samples to the phone-specific neural networks for classification.
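The sketch below illustrates this identification step, assuming a 125 kHz sampling rate and a pre-trained classifier exposing an sklearn-style predict_proba(); the 100 Hz cutoff and the 0.9 confidence threshold are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def highpass(signal, fs=125_000, cutoff=100.0, order=4):
    b, a = butter(order, cutoff, btype="highpass", fs=fs)
    return filtfilt(b, a, signal)   # strips DC offsets from brightness/charging state

def identify_phone(press_signal, phone_classifier, min_confidence=0.9):
    filtered = highpass(press_signal)
    probs = phone_classifier.predict_proba(filtered[np.newaxis, :])[0]
    if probs.max() < min_confidence:
        return None                 # unknown model: do not forward to the per-phone networks
    return int(probs.argmax())
```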

2.3.5 Signal Preprocessing


After determining the phone model, Charger-Surfing then scales and standardizes
the power signal in step 5 following the characteristics of the specific phone model.
The signals gathered from the USB power line are commonly between 0 and 100 mV.
After passing through the high-pass filter, the signal is mostly distributed between
-50 mV and 50 mV. I preprocess the data with a scaler designed for the target phone
model, which is created by pre-training with a few samples from the adversary’s own
device. The resultant signal’s range is between -1 and 1, which typically leads to the
best inference results for most neural networks.
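A minimal sketch of such a per-model scaler is given below; the min-max mapping into [-1, 1] is one plausible realization, fit on the adversary's own samples and then reused unchanged on the victim's signal.

```python
import numpy as np

class PhoneScaler:
    """Min-max scaler fit on the adversary's samples for one phone model."""

    def fit(self, adversary_samples):
        stacked = np.concatenate(adversary_samples)
        self.low, self.high = stacked.min(), stacked.max()
        return self

    def transform(self, signal):
        # Map the filtered signal into [-1, 1] using the pre-trained range.
        return 2.0 * (signal - self.low) / (self.high - self.low) - 1.0
```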

2.3.6 Animation Inference


In the final step (i.e., step 6 in Figure 2.5), the preprocessed power signal is
sent to a neural network trained for that specific type of device, to reconstruct the
multi-press sequence that the victim types into the device.
As the collected signal is a one-dimensional time series of voltage measurements,
Charger-Surfing utilizes a one-dimensional convolutional neural network (CNN). The
network includes a repeated series of convolutional and max-pooling layers, followed
by a softmax regression layer, which classifies the input signal into one of the possible
buttons and provides a confidence value associated with each class.
Why Utilize a CNN? CNNs are known for their high accuracy when processing
data with spatial correlation and classifying time series data [84].
Furthermore, as discussed in Section 2.3.1, Charger-Surfing uses a single sampling
rate for all the phones, and this rate (125KHz) is chosen so that successive samples sweep
across different screen rows rather than continually sampling the same pixels. This implies that for

phones with different screen resolutions, features of button presses appear at different
locations of the power signal. CNNs are well suited to recognize features that can be
found in any area of a signal.
Model Classifier Configuration. An important consideration of any CNN
is the size of the convolutional kernels. Small kernels may not be able to recognize
features that manifest themselves over a large portion of the input signal, while large
kernels may be too coarse, missing the fine details and features of an input signal.
The ideal size of the convolutional kernels depends on the size of the features in
the power trace, which in turn depends on the sampling rate, screen layout, and size
of the animation to be detected. If one desires to classify individual keys on the device
text entry keyboard, for example, it would be necessary to calculate the size of the key
press animation with respect to the screen size and modify the kernel size accordingly.
This allows the first layer of the network to capture features that are large enough to
identify a button press, while not being so large as to oversimplify or miss a feature,
and not being so small as to only capture noise. Furthermore, our CNN design adopts a
typical architecture consisting of sets of a convolutional layer followed by a max-pooling
layer, which potentially increases the receptive field3 of the network. This allows the
subsequent layers of the network to leverage the highlighted features and correlate their
location across multiple frames of the signal when inferring the key press.
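To make the architecture concrete, the sketch below builds such a one-dimensional CNN in Keras; only the initial kernel size (e.g., 50 for the iPhone network) follows from the discussion above, while the filter counts, pooling sizes, and number of layers are illustrative assumptions (the exact configurations are listed in Appendix A).

```python
from tensorflow.keras import layers, models

def build_button_cnn(input_length, first_kernel=50, n_buttons=10):
    """1-D CNN that maps an isolated press signal to per-button confidence values."""
    model = models.Sequential([
        layers.Input(shape=(input_length, 1)),
        layers.Conv1D(32, first_kernel, activation="relu", padding="same"),
        layers.MaxPooling1D(4),                         # enlarges the receptive field
        layers.Conv1D(64, 25, activation="relu", padding="same"),
        layers.MaxPooling1D(4),
        layers.Flatten(),
        layers.Dense(n_buttons, activation="softmax"),  # softmax regression layer
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```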

2.4 Case Study: Passcode Inference


To demonstrate that Charger-Surfing poses a genuine security threat, I conduct
a case study of passcode inference, dividing my evaluation into two major sections. This
section details the experimental evaluation for a broader range of devices, including
data collection, single button inference, 4- and 6-digit passcode inference, and impact
of sampling frequency upon inference accuracy, demonstrating the wide applicability
of Charger-Surfing. Section 2.5 tightens the scope of our evaluations, focusing on a

3 The receptive field is the portion of the input signal affecting the current convolutional layer.

low-cost hardware implementation of the Charger-Surfing attack, its insensitivity to
different smartphone configuration variables (wallpapers, brightness, vibration, charging
status), and the transferability of the attack between different smartphones of the same
model. In total, I gather data from 33 volunteers4 across 6 different devices. Our
participants are about 30% female, including members of varied races, heights, and
weights. The age of our participants ranges from 20 to 60 years old. This section utilizes
the data of 15 volunteers and four devices, while Section 2.5 uses an additional set of
18 volunteers and two devices.

2.4.1 Data Collection


To ensure that Charger-Surfing is not tied to a specific phone model, screen
technology, or mobile OS, I collect data from a spectrum of smartphones running both
iOS and Android OS, listed in Table A.1 in Appendix A. For Android devices, the
Galaxy Nexus represents smartphones with aging hardware, while the Motorola G4
provides an example of a more recent and advanced smartphone. A similar strategy is
applied in selecting the iOS devices. The iPhone 6+ represents an aging but still widely
used device, while the iPhone 8+ provides an example of a more recent smartphone
that shares a large amount of hardware with the current iPhone SE 2nd generation
released in 2020.
To assess the impact that individual users might have on the accuracy of Charger-
Surfing, I collected input data from 15 volunteers who regularly use passcode based
authentication in smartphones. Our participants have diverse backgrounds and are
varied in height, weight, gender, race, and age. The goal is to demonstrate that
Charger-Surfing is victim-independent, as the different users likely interact with the
same smartphone differently (e.g., placing their finger on different areas of the button
or holding their finger on the screen for different amounts of time), which could lead to
variations in the duration of the animations played on the smartphones tested. Each

4 The human-user-involved experiments have been filed and approved by the Institutional Review Board (IRB) to ensure participants are treated ethically.

(a) iPhone (b) Android

Figure 2.7: Passcode lock screen layout and animation.

user was tasked to input a pre-determined sequence of 200+ buttons on the numerical
lock screen. The sequence was designed to gather a uniform distribution of button
presses such that no button had a disproportionate number of samples.
Our data collection utilizes a modified charging cable and a Tektronix MDO4024C
oscilloscope. The charging cable is modified by cutting the ground wire and inserting a
0.3Ω resistor. The oscilloscope is used to measure the voltage drop across the resistor,
providing a fine-grained and repeatable method of observation. It is configured to
sample at a rate of 125,000 samples per second.

2.4.2 Classifier Configuration and Training


As discussed in Section 2.3.6, for the best performance, it is necessary to tune
the kernel sizes of the CNN based on the screen layouts and animations that are being
classified.
Figure 2.7 presents the typical lock screen layouts implemented by Android and
iOS systems as well as the animations on the lock screen.
As shown, the animations caused by a button press range from about 1/10 of
the vertical screen height on iPhones (button 5 in Figure 2.7a) to about 1/5 of the
vertical screen height on Android phones (button 5 in Figure 2.7b). With a sampling

rate of 2,083 samples per frame5, the most pertinent features for button identification
are within 208 (iPhone) - 416 (Android) samples. Thus, when considering the receptive
field of the network, I choose an initial kernel size of 50 for the iPhone network and 100
for the Android network. This sizing configuration ensures that I capture the smaller
features of the signal in the initial layers of the network while still considering both
the larger features of the signal in intermediate layers and the location on the screen
across multiple frames of animation in the final layer. Detailed network configurations
are listed in Tables A.2 and A.3 in Appendix A.
Our threat model assumes that adversaries are unable to obtain the victim’s
data before training the system, and thus can only train the classifier using their own
collected data. To emulate this scenario, I divide the users into two separate sets: one
set for training (i.e., adversary) and the other set for testing (i.e., victim). To examine
the robustness of the network to the composition of the training data, I randomly
select five users to create the training set. The remaining 10 users form the testing set,
ensuring that there is no overlap between the training and testing users. I train five
neural networks for each device such that the ith (1 ≤ i ≤ 5) network is trained with
the data from i different users.
In testing, each network’s performance is evaluated on the 10 testing users, and
the average accuracy is reported.

2.4.3 Phone Identification


Our experimental steps closely follow the process in Figure 2.5. After the signal
is acquired, it is passed through button isolation (as described in Section 2.3.3). The
next step is to correctly identify the target phone model so that the signal can be
processed by the appropriate preprocessing system (Section 2.3.5) and classifier.

5 The power trace signal is sampled at 125KHz, and the lock screen refreshes at a rate of 60Hz. Under this configuration, 2,083 samples are gathered within each refresh cycle. Each sample contains information about the content of the screen progressing vertically, as the screen refreshes from top to bottom.

Table 2.1: Single Button Accuracy

# of Training Users   Motorola G4   Galaxy Nexus   iPhone 6+   iPhone 8+
        1                82.0%          50.0%        23.8%       44.6%
        2                90.0%          95.0%        93.3%       67.1%
        3                99.6%          99.1%        96.9%       88.7%
        4                99.7%          99.4%        98.5%       94.5%
        5                99.9%          99.6%        99.5%       95.8%

(a) Press button on the left side. (b) Press button on the right side.

Figure 2.8: Android’s animation on touching different parts of a button.

I train a primary neural network using high-pass filtered data from a subset
of the collected users and test on the data from the remaining users. Our results
show that the network can determine the correct phone model 100% of the time. This
identification step is also applicable to phones that might run multiple OS versions.
Different OS versions would be detected and classified at this step before being passed
to the more specific secondary neural networks.

2.4.4 Single Button Inference


I first evaluate the accuracy for inferring a single button press, which is the most
fundamental aspect of the system, as, without the ability to robustly classify a single
button, it is impossible to accurately infer the entire passcode.
Table 2.1 lists the accuracy of a single button inference for each smartphone.
When the training data was collected from only one user, I observe divergent accuracy

results for different phones, ranging from 23.8% for iPhone 6+ to 82.0% for Motorola
G4. Once I increase the training data size to two users, however, there is a significant
accuracy improvement for single button inference: 67% for iPhone 8+ and more than
90% for all the other phones. The increasing accuracy trend is mainly attributed to the
differences in user behavior when interacting with touchscreens, which can have direct
effects on the power usage of the screen. More specifically, Android devices demonstrate
spatial and temporal variations while iOS devices demonstrate temporal and processing
variations. On the Android lock screen, the screen plays an animation that depends
on where users place their finger. An example of this scenario is shown in Figure 2.8,
where a user placing the finger on the left or right side of the button can create different
animations. Furthermore, the longer the user holds their finger in this position, the
larger the animated circle grows. On iOS devices, when users press a button on the
lock screen, no matter where exactly they press it, the entire button lights up completely
and immediately. This animation does not end until the user removes their finger,
imparting temporal variations to the recorded power trace. Furthermore, devices newer
than the iPhone 6S (such as the tested iPhone 8+) make use of so-called “3D-Touch”
to measure the force of the screen press. This extra processing and information further
introduces subtle noise or processing variations into the measured signals.
The aforementioned user-oriented uncertainties and randomness can be dramati-
cally mitigated by integrating more users into the training process. Once the neural
network is presented with a robust dataset demonstrating diverse user behaviors, these
abnormalities can be recognized and classified correctly. Table 2.1 confirms that by
training on four users’ data, Charger-Surfing can achieve more than 94% accuracy when
classifying the single button presses of new users (i.e., the victims) for all devices. The
average accuracy across all four test phones for single button inference further reaches
98.7% when there are five training users. By this point, the improvements demonstrate
diminishing returns as more users are included. This indicates that our system only
requires a few users’ training data to achieve near optimal accuracy.

Figure 2.9: Breakdown of actual and predicted button classifications for the Galaxy
Nexus when trained with one user’s data. An entry on row i and column j corresponds
to button i being classified as j.

2.4.5 Misclassification Analysis


To further evaluate the effectiveness of Charger-Surfing, I examine how the
neural networks perform when they guess incorrectly. Figure 2.9 presents the confusion
matrix of the inference results of the Galaxy Nexus, when trained on only one user’s
data. The figure shows the actual pressed buttons as rows and predicted buttons as
columns. An entry on row i and column j corresponds to button i being classified as j.

(a) 1st Trial (b) 5th Trial (c) 10th Trial

Figure 2.10: Accuracy of 4-digit passcode inference.

(a) 1st Trial (b) 5th Trial (c) 10th Trial

Figure 2.11: Accuracy of 6-digit passcode inference.


Figure 2.9 shows the highest prediction rate in the diagonal for all buttons except
for button 7, which can be classified as 7 or 8 with equal probability of 0.45. Five
buttons (0, 1, 6, 7, 9) demonstrate performance lower than 50%; however, the incorrect
inference is usually off by only a single row or column, indicating that the guessed
screen region is close to the correct button. Excellent examples of this phenomenon are the pairs (0,9)
and (7,8) that are frequently mis-predicted as one another.
In many buttons, the mis-predictions are not uniformly distributed but tend to
cluster into one or two buttons, implying that a second or third guess would result in the
correct prediction for these buttons. The results of the first three guesses of the system
trained by only one user’s data are shown in Table 2.2. The second guess achieves an
average accuracy increase of 11.7%, and the third guess further increases accuracy by
an average of 9.9%. This rapid accumulation trend will assist in reducing the search
space when classifying a user’s passcode.

Table 2.2: Cumulative Accuracy of 3 Classification Attempts for Single User Trained
Model

Attempts   Motorola G4   Galaxy Nexus   iPhone 6+   iPhone 8+
   1          82.0%          50.0%        23.9%       44.6%
   2          86.6%          63.0%        40.6%       57.3%
   3          89.0%          72.0%        51.9%       65.5%

2.4.6 Passcode Inference


With the ability to classify single button presses, it is possible to infer passcodes.
Many Android and iOS smartphones allow up to ten passcode attempts before erasing
the content of the device, so I report the accuracy of Charger-Surfing in inferring 4-digit
and 6-digit passcodes within 10 trials.
4-digit passcode: I select 1,000 random 4-digit combinations to test the
classifier. To construct the candidates for a passcode guess, I examine the confidence
vectors of each single button inference in the passcode. I rank these confidence vectors

to produce the top candidates for each press and then construct combinations of the
top candidates to produce guesses for the passcode.
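The sketch below shows one plausible way to build and rank these candidate passcodes from the per-press softmax outputs; scoring a candidate by the product of its digits' confidences and keeping only the top few digits per press are assumptions, not necessarily the exact scheme used in Charger-Surfing.

```python
from itertools import product
import numpy as np

def rank_passcodes(confidence_vectors, top_k=3, max_guesses=10):
    """confidence_vectors: one length-10 softmax vector per pressed digit."""
    # Keep only the top-k most likely digits for each press.
    per_press = [np.argsort(v)[::-1][:top_k] for v in confidence_vectors]
    candidates = []
    for digits in product(*per_press):
        score = float(np.prod([v[d] for v, d in zip(confidence_vectors, digits)]))
        candidates.append((score, digits))
    candidates.sort(key=lambda c: c[0], reverse=True)   # most confident passcode first
    return [digits for _, digits in candidates[:max_guesses]]
```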
Figure 2.10 illustrates the accuracy for 4-digit passcode inference. I utilize the
networks trained in Section 2.4.4, where each phone is trained on its own network with
i (1 ≤ i ≤ 5) users. Figures 2.10 (a), (b), and (c) show the accuracy results after the
first, fifth, and tenth trials, respectively.
In a brute force attack scenario, the success rate on the first trial is only 0.01%.
By contrast, with only one user in the training set, Charger-Surfing achieves an average
success rate of 13.9% on the first trial and a 20.8% success rate after the 10th trial.
Clearly, there is a strong trend towards improved accuracy as the number of training users
increases, showing that with more users, Charger-Surfing can develop a more general
and accurate model that is robust against irregularities caused by user interactions
with the smartphone. When two users are involved in training, the average success rate
increases substantially, scoring 59.5% on the first trial and 75.8% by the tenth trial.
This improvement trend continues but slows down as more users are included. Finally, it
achieves an average success rate of 95.1% on the first trial and 99.5% on the tenth trial
when trained with five users. The diminishing return indicates a strong convergence of
Charger-Surfing’s inference accuracy with only a few users in the training set.
6-digit passcode: I further evaluate the effectiveness of Charger-Surfing when
cracking a longer, 6-digit passcode. Similarly to the 4-digit case, I select 1,000 random
6-digit combinations and test them against our inference system. Figures 2.11 (a),
(b), and (c) illustrate the accuracy after the first, fifth, and tenth trials, respectively.
Although the search space for a 6-digit passcode is much larger (a 6-digit passcode
has 1,000,000 combinations), Charger-Surfing demonstrates high success rates similar
to those achieved when cracking a 4-digit passcode. When trained on five users, the
success rate of the first trial is greater than 90% for all phones except the iPhone 8+,
which has an accuracy of 77.0%. Even for iPhone 8+, the success rate then increases to
90.3% after the fifth trial; and the accuracy for all phones is more than 96% by the
tenth trial. In comparison to a brute force approach that has a success rate of 0.001%

within ten trials, Charger-Surfing is more than 96,000 times more effective.

Figure 2.12: Impact of different sampling rates on single button accuracy, based on
3-user data of Motorola G4.

2.4.7 Impact of Sampling Frequency


As mentioned in Section 2.3.1, Charger-Surfing utilizes a sampling rate of 125
KHz, which takes about 1 sample every 0.9–1.4 rows on many flagship smartphone
screens. As sampling at a higher frequency requires more expensive and powerful
equipment, I examine the impact of sampling at lower frequencies on single button
inference accuracy. I downsample the raw signal to different frequencies, and preprocess
the signal in the manner described in Section 2.3.5. The neural networks are resized
and retrained to work with the data collected at reduced sampling rates.
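As a small illustration, the downsampling can be performed with an anti-aliased decimation as sketched below; using scipy.signal.decimate with an integer factor is an assumption about the resampling method, which is not specified above.

```python
from scipy.signal import decimate

def downsample(trace, original_rate=125_000, target_rate=31_250):
    """Reduce the sampling rate by an integer factor (125 kHz -> 31.25 kHz here)."""
    factor = original_rate // target_rate
    return decimate(trace, factor)   # applies an anti-aliasing filter before discarding samples
```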
Figure 2.12 illustrates the accuracy of single button inference on a Motorola G4
using networks trained with three users. A decreasing trend in accuracy can be seen
when lowering the sampling frequency. The drop is slow at first: when the sampling
rate decreases to 31.3 KHz, the accuracy degrades from 99.6% to 99.5%, a drop of
only 0.1%. When the sampling rate is reduced to 15.6 KHz, there is a larger drop in
accuracy, but it still remains above 90%. However, further decreases in the sampling
rate lead to dramatic losses in accuracy.
To better understand the reason for the accuracy drop, I further examine the

Table 2.3: Impact of sampling frequency on row, column, and overall classification
accuracy, based on 3-user data of Motorola G4.

Frequency (KHz)   Row      Column   Overall
     62.5         99.4%    99.4%    99.3%
     31.3         99.8%    99.6%    99.5%
     15.6         98.5%    92.4%    92.3%
     10.4         94.1%    62.3%    61.3%
      7.8         85.3%    46.9%    43.0%
      6.3         59.5%    38.5%    26.0%
      3.9         30.8%    33.4%     9.9%

(a) iOS Keyboard (b) Android Keyboard

Figure 2.13: Android and iOS keyboards. Each keyboard has a similar layout, with 4
rows of buttons. Each keyboard contains a maximum of 10 buttons per row (top row).

row and column accuracy degradation6 as the sampling rate decreases. The results are
listed in Table 2.3. It turns out that the column accuracy is the limiting factor. While
the row accuracy remains above 94% even at 10.4KHz, the column accuracy degrades
from 99.5% at 31.3KHz to 62.3% at 10.4KHz. Such a result is consistent with the screen
refresh behavior: as the screen refreshes row by row and from left to right on each
row, the row signal changes much slower than the column signal. Thus, a decreased
sampling rate can still capture the row signal, but becomes incapable of fully capturing
the column signal.

6 Row (column) accuracy is defined as the percentage of classifications that fall within the correct row (column) (e.g., a ‘1’ that is misclassified as a ‘2’ is still in the correct row).

2.4.8 Detection Granularity Analysis
So far I have demonstrated that by monitoring the power usage of a charging
smartphone, an adversary can extract the location of animations on the touch screen,
compromising a user’s passcode. Another particularly enticing target is the onscreen
virtual keyboard. Each press of the keyboard provides feedback to the user by either
displaying an enlarged version of the pressed character or by darkening the pressed key.
Thus, an adversary with a voltage monitoring setup might attempt to infer a user’s
input by locating and classifying the animations of the onscreen keyboard. However,
one important question remains: is Charger-Surfing able to achieve sufficient precision
for classifying smaller animations on the screen?
To gain a better understanding of the achievable precision of Charger-Surfing, I
examine the relationship among animation positioning, animation size, and inference
accuracy at different sampling rates. Specifically, the results in Table 2.3 show that the
column accuracy is the limiting factor in classification accuracy. Using the examples of
the onscreen keyboard in Figure 2.13, I can see that both iOS and Android keyboards
have a maximum of 10 columns (top row) that must be classified accurately. Table 2.3
shows that a sampling rate of 31.3 KHz is required to accurately classify 3 columns.
Thus, to classify 10 columns, the sampling rate should be increased by at least 10/3
times to around 105 KHz.
While this sampling rate ensures that the signal contains enough information,
it is equally important to tune the filter size in the neural network for identifying the
patterns present in the data. As previously discussed in Section 2.4.2, the convolutional
kernels must be sized such that they are smaller than the number of samples that
encompass the animation. For example, in the iOS keyboard presented in Figure 2.13a,
each key takes up about 1/17th of the vertical space on the screen. Using the sampling
rate determined above, of 105 KHz, 1,750 samples are taken during each screen refresh.
Thus, each keypress animation can be recorded in about 103 samples. Leveraging
our experience in training the CNN for passcode inference (a kernel size of 50 for 208

samples, as described in Section 2.4.2), a kernel size close to 25 should provide an
adequate starting point for tuning the network to detect keyboard press animations.

2.5 Attack Practicality


The analysis of sampling rate shows the potential of developing a low-cost data
acquisition system with cheap and compact commercial off-the-shelf (COTS) hardware,
which can be easily integrated and hidden inside shared power banks or public USB
charging facilities, making the Charger-Surfing attack more practical. In this section, I
demonstrate the practicality of Charger-Surfing by (1) detailing a portable, low-cost
power trace collection system, and (2) testing the system under different smartphone
settings and across different devices of the same model.

2.5.1 A Portable Data Collection System


I design and develop a portable, low-cost microcontroller-based system for data
acquisition, as shown in Figure 2.14. It consists of an Espressif ESP32 chip with a
dual-core Tensilica Xtensa LX6 processor, built-in WiFi, and a Bluetooth radio. In the
system, the microcontroller is connected to a 10-bit analog-to-digital converter (ADC)
manufactured by Analog Devices (AD7813). One of the ESP32 cores is dedicated to
gathering samples from the ADC, while the other core handles all WiFi communication
and data storage needs. The sampling rate is configurable (up to 62.5KHz) and, as
each sample is only 10 bits, the maximum data rate is quite low, at only 78.125KBps.
The cost of the whole data collection system is less than $20.
A Motorola G4 is used to test the accuracy and effectiveness of this portable,
low-cost data collection system. I set the sampling rate to 62.5KHz, and collect button
press data from 20 different users. Based on the studies in Section 2.4, I randomly
select five users to train the network and validate with the remaining 15 users. The
results are shown in Table 2.4. I can see that even with a low-end (less than $20) data
acquisition setup, an adversary can correctly identify single button presses with 98.6%
accuracy on the first attempt: a drop of only 1.3% compared to a much more expensive,

Figure 2.14: The portable, low-cost data collection setup. A WiFi enabled microcon-
troller can send acquired data to a custom webserver in real-time.
Table 2.4: Single Button and Passcode Inference Accuracy (5 training users / 15 testing
users).

Single Button                  Passcode
Attempt   Press         Trial   4-Digit   6-Digit
   1      98.6%           1      94.9%     92.4%
   2      99.4%           5      97.4%     94.9%
   3      99.6%          10      97.5%     96.3%

faster sampling and bulky setup (e.g., an oscilloscope). For cracking a 4-digit passcode,
the system achieves an average accuracy of 94.9% in the first attempt and 97.4% by the
fifth attempt. The results of cracking a 6-digit passcode are also promising: an average
accuracy of 92.4% in the first attempt and 96.3% by the tenth attempt.

2.5.2 Testing of Varied Device Settings


In an attack scenario, it is unlikely that a victim’s device is configured exactly like
the attacker’s training device. For example, it is likely that a victim has a different screen
background, brightness setting, etc. To examine how these configuration variations may
affect the accuracy of the attacker network, I test the network on a victim with different

configurations. I gather data from a Motorola G4 in which I, one at a time, change
the wallpaper (two different wallpapers), modify the brightness (0%, 50%, 100%), use
an uncharged phone, and enable haptic feedback. I then test the data against the
network trained with 5 users in Section 2.5.1. The results listed in Table 2.5 indicate
that the configuration difference has very little impact upon the inference accuracy,
which remains above 97% for single button inference in all cases. This demonstrates
that Charger-Surfing is quite robust against device configuration changes.

2.5.3 Cross Device Testing


To further demonstrate that Charger-Surfing poses a real threat, I launch attacks
under a more strict cross-device scenario wherein attackers can only train the classifiers
on their own phone and then test them against a different phone (i.e., a victim’s phone).
Also, while attackers can collect data from multiple different wallpapers during training,
they might not know the exact wallpaper used by the victim. This set of ‘cross-device’
experiments are conducted given two phone models, iPhone 6+ (iOS 12.4) and iPhone
8+ (iOS 13.4). Under each model, there are two phones (e.g., two iPhone 6+ phones)
used separately for training and testing. For each training phone at the attacker side, I
have two users who gather 100 presses for each button. I then train the model using
three different wallpapers: black, white, and multi-colored. For each testing phone at
the victim side, I gather 200 test presses from 10 users (different from the two users at
the attacker side), with wallpapers that are not used in training. The exact training
and testing configurations are listed in Table 2.6.
The obtained accuracy results of the two phone models, iPhone 6+ and iPhone

Table 2.5: Single Button Inference Accuracy (5 training users / 1 testing user) with
Varied Configurations.

Configuration           Accuracy (1st Attempt)
Static Wallpaper 1            99.3%
Static Wallpaper 2            98.0%
Brightness 0%                 98.0%
Brightness 50%                97.3%
Brightness 100%               100%
Charge (uncharged)            99.2%
Haptics (enabled)             100%

Table 2.6: Cross-device training and testing configurations.

Training (Phone A)              Testing (Phone B)
Users: 1, 2                     Users: 3-12
Wallpapers: 1, 2, 3             Wallpaper: 4
100 presses of each button      Balanced 200-press button sequence
Total: 6,000 presses            Total: 2,000 presses

Table 2.7: iPhone 6+ cross-device testing classification results. 2 training users on an
iPhone 6+ and 10 testing users on a different iPhone 6+.

Single Button                  Passcode
Attempt   Press         Trial   4-Digit   6-Digit
   1      99.1%           1      96.5%     94.6%
   2      99.4%           5      97.4%     95.6%
   3      99.4%          10      97.4%     96.2%

Table 2.8: iPhone 8+ cross-device testing classification results. 2 training users on an
iPhone 8+ and 10 testing users on a different iPhone 8+. High initial accuracy meant
that subsequent attempts realized minimal improvement.

Single Button                  Passcode
Attempt   Press         Trial   4-Digit   6-Digit
   1      99.7%           1      99.0%     98.6%
   2      99.8%           5      99.1%     98.6%
   3      99.8%          10      99.1%     98.7%

8+, are presented in Tables 2.7 and 2.8, respectively, demonstrating that both cross-
device tests achieve greater than 99% accuracy on the first attempt when classifying
single buttons and greater than 94% accuracy when classifying 6-digit passcodes. Note
that the accuracy results here are slightly higher than those in the oscilloscope-based
experiments shown in Section 2.4. This slight difference could be caused by the different
iOS versions (the oscilloscope experiments are performed on older iOS versions) or by
differences between oscilloscope and ADC quantization at low voltages.
Overall, this set of experiments clearly indicates that Charger-Surfing works well
not only across different users but also across different devices of the same model, posing a
real security threat.

2.6 Countermeasures
Our experiments show that on different smartphones, Charger-Surfing is highly
effective at locating the button presses on a touchscreen and inferring sensitive informa-
tion such as a user’s passcode. While it would be difficult to completely fix the leakage
channel, which is related to USB charging and hardware, there exist some possible
countermeasures.
The side channel exploited by Charger-Surfing leaks information about dynamic
motion on the touchscreen. The attack is so effective because the layout of the lock screen
is fixed: the buttons for a passcode are in the same positions every time the screen
is activated. By contrast, randomizing each number’s position on the keypad for
code entry would likely hamper Charger-Surfing’s ability to detect a user’s sensitive
information. However, this position randomization may inconvenience users as it will
take more time for them to locate each button. Furthermore, this approach scales poorly;
randomizing a keyboard layout, for example, would be highly undesirable to users.
Likewise, it is possible for smartphone vendors to remove button input animations,
a change that would significantly reduce the information leakage in the power line,
but provide minimal feedback to users as to whether they have correctly pressed the

intended button. While both features are available in some customized versions of
Android, they are not widely deployed in currently available devices.
At first glance, one likely solution is not to eliminate the leakage, but to drown
it out via noise. One such option would be to utilize a moving background such as
the readily available live/dynamic wallpapers on Android/iOS, which act similarly to
videos and constantly animate the screen. While this idea seems initially attractive, it
has a few major drawbacks: 1) the live wallpaper only works on the lock or home screen
and would not prevent similar attacks against onscreen keyboards in applications, and
2) the noise generated by this system is random and can be filtered out with sufficient
samples. In a preliminary study of this defense technique, I built a neural network
trained with 100 samples per button taken with two live wallpapers and tested on
another live wallpaper. The network was able to realize greater than 98% single button
accuracy, demonstrating that with sufficient samples of live wallpapers, Charger-Surfing
can discern the true user input signal from the noise signal of the moving background.
To fully address the leakage channel exploited by Charger-Surfing, one solution
is to eliminate the leakage channel by inserting a low pass filter in the charging circuitry
of the device. This modification will remove the informative high frequency component
from the signal. In preliminary testing, I applied a low-pass filter with a cutoff of
60Hz to the collected iPhone 6+ cross-device data and the accuracy dropped to 10%
(expected accuracy of random guessing). This result demonstrates that this approach
can effectively mitigate the information leakage that Charger-Surfing relies upon.
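A minimal sketch of this defensive filtering, as applied offline to recorded traces in the preliminary test, is shown below; the filter order and the assumed 125 kHz sampling rate are illustrative, while the 60Hz cutoff follows from the text.

```python
from scipy.signal import butter, filtfilt

def lowpass_defense(trace, fs=125_000, cutoff=60.0, order=4):
    """Remove the high-frequency component that Charger-Surfing relies upon."""
    b, a = butter(order, cutoff, btype="lowpass", fs=fs)
    return filtfilt(b, a, trace)
```

In a deployed countermeasure, the equivalent filtering would be implemented in the charging circuitry itself rather than in software.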
Until an effective countermeasure is widely adopted, it is important for users
to be increasingly aware of the security threats associated with USB charging. Users
should avoid inputting a passcode or other sensitive information while charging their
smartphones in public or shared environments.

2.7 Related Work


In this section, I briefly survey the research efforts that inspire our work and
highlight the differences between our work and previous research. I mainly discuss

research work in the following four areas:

Smartphone authentication. Smartphones are commonly equipped with two popular
authentication methods: numeric-based passcodes or pattern-based passcodes. Both
methods, however, are vulnerable to various types of attacks, including shoulder
surfing [141], smudges [17], and keyloggers [34]. Previous work has demonstrated that
sensory data (e.g., accelerometer, gyroscope, and orientation) can be used to extract
a user’s input on the touchscreen [179, 118, 128]. In addition to in-device sensors,
attackers can also utilize acoustic signals to infer keystroke information on physical
keyboards [25, 195]. Recently, Zhou et al. [193] proposed PatternListener to crack
Android’s pattern lock password through the acoustic signals gathered by a malicious
application accessing the in-device microphone. Unlike these works, our work does not
require malicious apps to be installed on the target smartphone.
Another type of keystroke inference on smartphone devices leverages video
recording [184], where attackers use a camera to record finger behaviors [137, 178, 185]
or the users’ movements [158]. The reflections off of an eyeball, captured by special
equipment, can also be exploited to leak device passwords [19, 18, 43]. Our work differs
from these in that our approach does not require attackers to be in close physical
proximity to the victim.
Other authentication methods utilize physiological biometrics (e.g., face [161])
and behavioral biometrics for authentication, including touch patterns [192], gait [113,
177], hand movements, and grasp features [100, 152]. However, these approaches
can suffer from replay attacks and insufficient accuracy and do not satisfy industry
requirements.

Power analysis. Extensive efforts have been devoted to analyzing the power consump-
tion of smartphones [132, 131, 31, 112]. Carroll et al. [36] presented a detailed analysis
showing that the touchscreen is one of the major consumers of power in a smartphone.
Furthermore, many works [41, 55, 188] attempt to understand the energy consumed by
the touchscreen.

The power consumption of a smartphone could be exploited as a side channel
to extract information such as mobile application usage [42] or password length [180].
Yang et al. demonstrated that public USB charging stations allow attackers to identify
the webpages being loaded when a smartphone is being charged [182]. Michalevsky
et al. [116] demonstrated that power consumption could be used to infer the location
of mobile devices. Spolaor et al. [154] showed that the USB charging cable can be
used to build a covert channel on smartphones by controlling a CPU-intensive app
over 20 minutes. To the best of our knowledge, this is the first work to show that the power
consumption of a smartphone can be used to infer animations on a touchscreen and
steal sensitive data, such as a user’s passcode.

Other side channel attacks. Chen [40] demonstrated that the shared procfs in the
Linux system could be exploited to infer an Android device’s activities and launch UI
inference attacks. Without procfs (e.g., iOS devices), attackers can still infer sensitive
information and private data by exploiting exposed APIs [189]. Genkin et al. [65]
acquired secret-key information from electromagnetic signals by attaching a magnetic
probe to a smartphone. Radiated RF signals can also be used to eavesdrop screen
contents remotely [114]. Recent research [67] has also shown the possibility to infer
broad information on large computer monitors via acoustic emanations from the voltage
regulator. Similar to traditional computers, smartphones are also vulnerable to classical
cache-based side-channel attacks [191]. Our work differs from these prior works by
showing much finer grained information leakage of screen animation locations through
the power line.

USB and other power vulnerabilities. As modern smartphones rely on USB to


charge their batteries, multiple vulnerabilities have been found in the USB interface [170],
including traffic monitoring [121], crosstalk leakage [157], keylogging side channels [119],
malicious command execution [166], and trust exploitation [20]. While prior research
has tried to filter malicious USB actions [167, 164], our work demonstrates that, even
without any data transmission over the USB cable, the power consumed can be exploited

to extract fine-grained information such as user passcodes.
While Ethernet-over-power-line techniques have been utilized in both homes and
data centers [38], Guri et al. demonstrated the possibility of building covert channels
over a power line [75]. Prior research has also shown that power consumption information
can lead to various privacy issues, including key extraction on cryptographic systems [94]
and laptops [68], state inference of home appliances [59], webpage identification of
computers [45] and laptop user recognition [50]. Unlike these attacks, our work classifies
ten on-screen animations in real time, directly exposing precise user input over the
charging port.

2.8 Summary
This chapter reveals a serious security threat, called Charger-Surfing, which
exploits the power leakage of smartphones to infer the location of animations played on
the touchscreen and steal sensitive information such as a user’s passcode. The basic
mechanism of Charger-Surfing monitors the power trace of a charging smartphone and
extracts button presses by leveraging signal processing and neural network techniques
on the acquired signals. To assess the security risk of Charger-Surfing, I conduct a
comprehensive evaluation of different types of smartphones and different users. My
evaluation results indicate that Charger-Surfing is victim-independent and achieves
high accuracy when inferring a smartphone passcode (an average of 99.3% and 96.9%
success rates when cracking a 4-digit and 6-digit passcode in five attempts, respectively).
Furthermore, I build and test a portable, low-cost power trace collection system to
launch a Charger-Surfing attack in practice. I then utilize this system to demonstrate
that Charger-Surfing works well in real settings across different user configurations and
devices. Finally, I present different countermeasures to thwart Charger-Surfing and
discuss their feasibility.
Having demonstrated the offensive capabilities of side channels, I next turn my
attention to developing a side channel based defense. Specifically, I realize that many
computer systems are vulnerable to a curious user problem, where well-intentioned users

may plug unknown USB devices into a computer, potentially infecting it with malware.
I design a system to thwart this type of attack in the following chapter.

Chapter 3

TIME-PRINT: AUTHENTICATING USB FLASH DRIVES WITH NOVEL TIMING FINGERPRINTS

The Universal Serial Bus (USB) has been a ubiquitous and advanced peripheral
connection standard for the past two decades. USB has standardized the expansion of
computer functions by providing a means for connecting phones, cameras, projectors,
and many more devices. Recent advancements in USB have increased data transfer
speeds above 10 Gbps, making the USB mass storage device (flash drive) a popular
method for moving data between systems. In particular, USB is commonly used in air-
gapped systems where security policies prohibit data transfer via the Internet, such as
military, government, and financial computing systems [33, 125, 134].
While USB has made the usage and development of various peripheral devices
far simpler, it has recently been scrutinized for security issues [21, 58, 76, 86]. USB
is an inherently trusting protocol, immediately beginning to set up and communicate
with a peripheral device as soon as it is connected. This has many advantages, as users
are not required to undertake a difficult setup process, but has recently been exploited
by attackers to compromise host systems. The discovery of Stuxnet [58], Flame and
Gauss [97] has demonstrated that malware can be designed to spread via USB stick.
Unwitting and curious employees might pick up dropped (infected) flash drives and plug
them into their computers, which allowed the malicious code on the drives to infect the
hosts and then propagate across the network, wreaking havoc on the targeted industrial
control systems. More recently, attackers have investigated the ability to modify the
firmware of a USB device [76, 86] such that an outwardly appearing generic USB flash
drive can act as an attacker-controlled, automated, mouse and keyboard. The behavior
of the USB driver can also be utilized as a side-channel to fingerprint a host device and

launch tailored drive-by attacks [21, 52]. While many defense mechanisms have been
proposed, these techniques generally require user input [165], new advanced hardware
capabilities [24, 160], or utilize features (device product ID, vendor ID, or serial number)
that could be forged by an advanced attacker with modified firmware [12, 86].
In this paper, I propose a new device authentication method for accurately
identifying USB mass storage devices. I reveal that read operations on a USB mass
storage device contain enough timing variability to produce a unique fingerprint. To
generate a USB mass storage device’s fingerprint, I issue a series of read operations to
the device, precisely record the device’s response latency, and then convert this raw
timing information to a statistical fingerprint. Based on this design rationale, I develop
Time-Print, a software-based device authentication system. In Time-Print, I devise a
process for transforming the raw timing data to a statistical fingerprint for each device.
Given device fingerprints, Time-Print then leverages one-class classification via K-Means
clustering and multi-class classification via neural networks for device identification. To
the best of our knowledge, this is the first work to expose a timing variation within USB
mass storage devices, which can be observed completely in software and be utilized to
generate a unique fingerprint1.
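As a rough illustration of this measurement idea (not the exact procedure used by Time-Print), the sketch below times a series of reads from a raw block device on Linux and summarizes the latencies as a normalized histogram; the device path, read size, read count, and bin count are assumptions, and reading the raw device typically requires elevated privileges.

```python
import os
import time
import numpy as np

def read_latencies(device="/dev/sdb", block_size=512 * 1024, n_reads=200):
    """Time a series of reads issued to the flash drive's raw block device."""
    fd = os.open(device, os.O_RDONLY)
    latencies = []
    try:
        for i in range(n_reads):
            # Ask the kernel to drop cached pages so each read actually hits the drive.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
            os.lseek(fd, i * block_size, os.SEEK_SET)
            start = time.perf_counter()
            os.read(fd, block_size)
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return np.array(latencies)

def fingerprint(latencies, n_bins=100):
    """Convert raw read timings into a normalized histogram fingerprint."""
    hist, _ = np.histogram(latencies, bins=n_bins)
    return hist / hist.sum()
```

Such histogram fingerprints could then be fed to the one-class (K-Means) or multi-class (neural network) classifiers mentioned above.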
To validate the efficacy of Time-Print, I first provide evidence that statistical
timing variations exist on a broad range of USB flash drives. Specifically, I gather
fingerprints from more than 40 USB flash drives. Then I examine three common security
scenarios, assuming that attackers have different knowledge levels about the targeted
victim, from least to most: (1) identifying known/unknown devices with different models,
(2) identifying seen/unseen devices within the same model, and (3) classifying individual
devices within the same model. I demonstrate compelling accuracy for each case, greater
than 99.5% identification accuracy between known/unknown devices with different
brands and models, 95% identification accuracy between seen and unseen drives of the

1 USB Type-C has provisions to identify device models [172] via a specialized key system; Time-Print does not make use of any specialized hardware and works on both legacy and new devices.

same model, and 98.7% accuracy in classifying individual devices of the same model.
I finally examine the robustness of Time-Print in multiple hardware configurations.
I observe that Time-Print experiences a small accuracy degradation when measured on
different USB ports, hubs, and host systems. I also examine the stability of Time-Print
and present a strategy to make the fingerprints robust to write operations. Additionally,
I investigate the authentication latency of Time-Print, demonstrating that while precise
authentication can be achieved in 6-11 seconds, an accuracy greater than 94% can be
achieved in about one second.
The major contributions of this work include:

• The first work to demonstrate the existence of a timing channel within USB mass
storage devices, which can be utilized for device fingerprinting.

• The design and development of a completely software-based fingerprinting system,
Time-Print, for authenticating USB mass storage devices without requiring
additional hardware or burdensome user interaction.

• A thorough evaluation of more than 40 USB mass storage devices, showing that
the ability to fingerprint with high accuracy is not dependent upon the device
brand, protocol, or flash controller.

The remainder of this chapter is organized as follows. Section 3.1 describes the
threat model, including an attacker’s capabilities, and provides a primer on the USB
protocol, USB mass storage devices, and USB security threats/defenses. Section 3.2
demonstrates the existence of a fingerprintable timing channel within USB mass storage
devices. Section 3.3 details the method for generating and gathering a USB mass storage
fingerprint. Section 3.4 presents the experimental setup for evaluation. Section 3.5
evaluates the Time-Print system. Section 3.6 examines the practicality of Time-Print
under different use configurations. Section 3.7 surveys related work in USB security,
device fingerprints, and device authentication. Finally, Section 3.8 summarizes the
chapter.

3.1 Threat Model and Background
This section presents the threat model and introduces various components of the
new timing-based side-channel, including the USB protocol stack, USB mass storage
devices, and current USB security.

3.1.1 Threat Model and Attacker Capabilities


The objective of this work is to highlight the applicability of a security primitive
that can physically and reliably identify USB mass storage devices through a new
timing-based side-channel. I consider a series of realistic scenarios, in which an entity
attempts to either prevent its computing assets from engaging with unauthorized USB
flash drives or better track the usage of flash drives inside an organization. The desired
security level of a computing system inside the organization varies from the least (e.g.,
an open environment) to the highest (e.g., an ‘air-gap’ protection).
Under the lowest security level, I assume that attackers also have the least
knowledge/privilege to launch an attack. For example, a computer at a reception desk
or an open laboratory may have access to some assets on the organization’s network and
is in a high traffic area, where an attacker may be able to physically plug a malicious
device into the temporarily unattended machine. However, compared to computing
systems at the higher security level, it is less challenging (and there is less motivation) to
protect such computers at the lower security level. Moreover, the defense methodologies
developed for a high-security system can be applied for the protection of a low-security
system.
Therefore, the main focus of our work is on air-gapped systems that have the
highest security level, such as computer systems in military, government, or financial
organizations, which are frequently air-gapped and isolated from the Internet. Industrial
control systems or life-critical systems (e.g., medical equipment) might also be air-
gapped [125, 134]. While the air-gap is effective at thwarting the vast majority of
outside attacks, it is very difficult to transfer data to and from an air-gapped system.
To this end, USB mass storage devices offer an excellent, low-cost solution, but are

not without their drawbacks. Attacks such as Stuxnet [58] were injected into target
systems via USB, and recent research has demonstrated the creation of malicious USB
devices which can negatively affect system security [21, 76, 86].
I then assume that attackers attempt to compromise the target air-gapped
computer via USB drives. Attackers have the ability to design malicious USB devices
so that once the USB handshake is completed, malicious scripts or activities can
be executed on the host. According to the organization’s security policies, system
administrators only issue access to a few approved USB devices (i.e., insider devices)
belonging to particular brands and models (e.g., SanDisk Cruzer Blade). Thus, a USB
fingerprinting mechanism must be integrated into the host to accept/classify approved
USB devices and reject other devices. For a specific air-gapped computer system, system
administrators can train fingerprints for all approved devices. Also, they can pre-collect
multiple devices from popular brands or models to augment the device authentication
system with examples of unapproved drives.
With these settings in mind, I envision three typical scenarios as shown in Fig-
ure 3.1, in which Time-Print offers enhanced security benefits for device authentication.
Note that Time-Print is designed to augment current USB security, and it can greatly
assist existing USB security mechanisms such as GoodUSB [165] and USBFilter [168].
Scenario 1: Attackers have no knowledge of the approved USB devices, and
thus a random USB device could be connected to the target host. Such a random USB
device likely does not belong to one of the approved device models. Time-Print should
thus reject any device whose model is not approved. In this minimal knowledge scenario,
administrators can also prevent system infection from irresponsible employees that
plug in non-approved devices (dropped devices) or computers in an open environment
(reception computers).
Scenario 2: Attackers (e.g., former employees who are aware of the security
measure) know the brand and model of the approved USB devices and purchase one with
the same brand and model. Time-Print should be able to reject unseen devices of the
same brand/model.

Scenario 3: Auditing user authentication. A system administrator should have
the ability to identify specific devices that were issued to employees. For approved
devices, different authorization levels might be assigned. In this case, the system
administrator needs to audit which specific devices are connected to the target system
to trace employee activities and detect data exfiltration attacks. Therefore, Time-Print
should be able to classify all approved devices with high confidence.
Attacker Capabilities. I examine Time-Print against attackers at multiple
levels. A weak attacker may simply attempt to plug a device into the victim system
with little knowledge (e.g., Scenario 1). A stronger attacker may know the device model
allowed at the victim side and attempt to connect a device of the same model (e.g.,
Scenario 2/3). The strongest attacker may be able to steal a legitimate device and
attempt to replicate the physical fingerprint with an FPGA based system. While the
FPGA based system may present different firmware, the firmware for current USB flash
drives is a closely guarded and proprietary secret. I do not consider a case in which
an attacker is able to significantly modify the firmware of a (stolen) legitimate device.
In addition, I must also exclude authorized users who attempt to maliciously harm
their own computing systems. This is a reasonable assumption, as authorized users who
have privileges to access any system resources likely have little need for mounting such a
complicated USB attack.
Defender Preparations. To use Time-Print, defenders (e.g., system admin-
istrators) should first have a security policy for limiting the employee usage of USB
devices to specific models. Then, they need to gather fingerprint samples for their
legitimate devices to enroll them into Time-Print beforehand.

3.1.2 USB 2.0 Versus 3.0


The USB standard consists of software and driver specifications that control the
communication between two devices and has undergone several revisions. One major
revision of the protocol, USB 2.0 [49], enables high data speeds (e.g., the High-Speed
specification of 480 Mbit/s), and adds support for diverse peripheral devices including

Figure 3.1: Three security scenarios of USB fingerprinting for device authentication
(insider devices of Brand X, unauthorized devices of Brand X, and unauthorized devices
of other brands connecting to an air-gapped computer).

cameras, network adapters, Bluetooth, etc. The later introduced USB 3.0 [79] standard
offers an increased 5 Gbit/s data rate and additional support for new types of devices.
Also, USB 3.0 devices are backwards compatible with USB 2.0 ports, but at 2.0’s speed.
USB 3.1 [80] further increases the data transfer rate to 10 Gbit/s with a modified power
specification that increases the maximum power delivery to 100W [13]. In this paper, I
focus on USB devices with standards 2.0, 3.0, and 3.1.

3.1.3 USB Mass Storage Devices and Flash Storage Controllers


USB mass storage devices are a form of removable storage media that allow a user
to transfer files between a host and the device. As a recognized device class [61], mass
storage devices follow a well-defined process when connected to a host. The host queries
the device to discover its class code. Upon determining that it is a mass storage device,
the host launches an instance (on Linux host systems) of the usb-storage driver. The
driver scans the device, determining its file system, and launches the appropriate file
system drivers.
To enable the communication between a device and the host, each USB mass
storage device contains a microprocessor(s) that handles communications and manages

the flash storage of the device. Flash storage is generally made up of many blocks. As
flash has a limited write endurance and is usually designed in such a way that individual
bits cannot be selectively cleared, the flash controller typically conducts a series of
operations to modify the stored data in the flash medium. It first locates a new unused
block and copies the data from the old block to the new block while incorporating any
data changes. The flash controller then marks the old block as dirty, and eventually
reclaims these dirty blocks as part of the garbage collection process. The controller
(as the ‘flash translation layer’) maintains the mapping information between logical
addresses (addresses used by the host system to access files) and the physical addresses
of the actual pages, and the frequent remapping of blocks is an invisible process to the
host system. Thus, the time required for the USB mass storage device to access large
chunks of data is potentially unique and suitable to fingerprint the device.
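
To make this behavior concrete, the following toy model (written in Python purely for illustration; real flash controllers implement this logic in proprietary firmware and with far more sophisticated wear leveling) sketches how an overwrite is served by copying data to a fresh block and silently updating the logical-to-physical mapping.

    # Toy flash-translation-layer model: overwrites are served by writing to a
    # fresh physical block and remapping the logical address.  This is an
    # illustrative simplification, not any vendor's actual firmware.
    class ToyFTL:
        def __init__(self, num_physical_blocks):
            self.free = list(range(num_physical_blocks))  # unused physical blocks
            self.l2p = {}                                 # logical -> physical map
            self.dirty = set()                            # blocks awaiting garbage collection
            self.flash = {}                               # physical block -> data

        def write(self, logical, data):
            new_phys = self.free.pop(0)                   # locate a new unused block
            old_phys = self.l2p.get(logical)
            if old_phys is not None:
                self.dirty.add(old_phys)                  # mark the old block as dirty
            self.flash[new_phys] = data
            self.l2p[logical] = new_phys                  # remap: invisible to the host

        def read(self, logical):
            return self.flash[self.l2p[logical]]

    ftl = ToyFTL(num_physical_blocks=8)
    ftl.write(0, "v1")             # logical block 0 -> physical block 0
    ftl.write(0, "v2")             # overwrite: moved to physical block 1, block 0 marked dirty
    print(ftl.l2p, ftl.dirty)      # {0: 1} {0}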

3.1.4 USB Security


With its rapid adoption, USB has also become a popular target for attackers.
Previous studies have shown that users are likely to plug in devices that they find on
the ground [85, 156, 171], especially those modified to look ‘official’ (e.g., contain a
government logo) [144]. Meanwhile, researchers have also proposed numerous defenses,
ranging from firewall and permissions systems [14, 165, 168] to device fingerprinting [82].
Many of these systems rely upon the reported device descriptors (e.g. product/vendor
ID, serial number) [12, 26] which can be modified by a skilled attacker [76, 86]. Mag-
neto [82] attempts to identify USB devices via electromagnetic fingerprinting of the
microcontroller within the USB mass storage device, which is hard for attackers to ma-
nipulate. However, the system required expensive, bulky, and highly sensitive spectrum
analyzers and EM probes to identify devices. Instead, in this work, I uncover a new
timing channel within USB mass storage devices and require no extra equipment to
uniquely identify devices.

[Figure: a 4×4 grid of read-timing histograms (frequency vs. histogram bin) for SanDisk Blade #0–#3, Generic #0–#3, SanDisk Ultra #0–#3, and Samsung Bar Plus #0–#3.]

Figure 3.2: Histograms of read timings for 16 different USB mass storage drives. Each
plot contains 20 different samples.

3.2 Timing Side-Channel Exploration


USB mass storage devices are sophisticated systems that contain at least one
microprocessor (i.e., flash storage controller), some form of embedded firmware, and
one or more flash memory devices. The microprocessor(s) is utilized to maintain the
flash translation layer and the flash endurance (via wear leveling) and to communicate
with a host computer. Whenever the USB mass storage device connects to the host,
a series of transactions provide the host with information about its size, capabilities,
name, partition table, etc. For those transactions, individual physical devices may
demonstrate small variations (e.g., timing variations) within a tolerance boundary that
does not affect normal operations. One common method for observing these variations
is through unintentional electromagnetic emissions [44, 46, 47, 52, 82].
While prior works have demonstrated that the USB device enumeration process
can be used to identify individual host computers / OSes [21, 52, 99], I attempt to
explore a timing channel to accurately identify USB devices. In particular, this work
searches for observable timing differences between the interactions of a mass storage

device and its host. If the flash controller of one device can respond faster or slower
than that of a different device, it is possible that this variation can be used to identify
a device. Furthermore, if a large chunk of data is requested from the device, the flash
translation layer may access multiple locations to return all of the data at once. The
time taken for this action (e.g., consult translation table, access one or multiple flash
blocks within the device, coalesce data, respond to host) may also create observable
timing differences.

3.2.1 Motivation of Time-Print


Previous works [21, 52, 99] have demonstrated that the USB handshake and
enumeration process can leak information about the host, including the host’s operating
system (different command sequence) or the host itself (timing differences between
packets). I first attempt to check whether such a handshake and enumeration process
can also generate a stable fingerprint for USB devices.
Within the Linux operating system, this handshake entails the loading of a series
of drivers, each providing more specialized functionality to the USB device. Once the
device is initially connected to the host system, the USB core driver accesses the device
and requests its descriptors. The device responds with its descriptors and identifies its
class (e.g., human interface device, mass storage, etc.). A device object is created, and
the specific class driver is then instantiated. In the case of a USB mass storage device,
the USB storage driver is initiated, and the USB storage driver probes the device via
its communication interface, the Small Computer System Interface (SCSI). The host
utilizes SCSI commands to probe the filesystem, the appropriate filesystem driver is
then loaded depending on format (e.g. FAT, exFAT, NTFS, ext4, etc.), and the drive
is finally mounted and enumerated. With the drive handshake completed, the drive
remains idle until the user opens the drive to access it.
I utilize the usbmon [186] driver within Linux and the Wireshark [162] program
to capture and analyze the raw packet transmissions during the device enumeration
and mount process. I find that the behaviors of packet transmissions between similar

devices do not vary significantly enough to create a unique profile. In addition, the file
contents of the same device greatly influence the behaviors of the device enumeration
process, such as addresses, sizes, and the number of packets. Therefore, the device
enumeration process cannot be leveraged to generate a reliable fingerprint.

3.2.2 Creation of a Reliable Fingerprint


To remedy this issue, I seek a new approach for creating a reliable fingerprint.
While the timing of USB setup packets does not seem to provide fingerprintable
information, the interfacing with the flash controller can. Each time the host system
requests data from a USB device, the flash controller must access the flash translation
layer, determining and accessing the location of the block (or blocks if the files are
fragmented across multiple physical locations). It then coalesces those areas into USB
packets and sends them to the host. Our intuition is that this access time varies based
upon the locations of the blocks on a device as well as the size of a read. To examine
whether this assumption is valid, I issue a known series of read requests of different sizes
and locations via SCSI commands on the device. By recording the timestamp for each
read action, I attempt to construct a statistical fingerprint for the timing characteristics
of each device. I utilize the sg read utility [56] to achieve low-level control of the read
commands. Each read sets the Direct IO (DIO), Disable Page Out (DPO), and Force
Unit Access (FUA) flags of the sg read utility to ‘1’. This combination of flags forces
the system to access the USB drive with each read and disallows the operating system
from utilizing cached read data. In particular, the DPO flag forces the USB device to fetch
the read from the physical media and keeps the flash controller from responding with
cached reads. This flag combination is necessary to ensure that each read physically
probes the specified flash blocks (allowing for true timing values to be gathered), instead
of simply reading cached data.
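
As an illustration, a single timed read could be issued from userspace roughly as follows. This is a minimal sketch: the device node, block count, and userspace timing method are placeholders, and the actual prototype records its timestamps inside the usb-storage driver rather than in userspace.

    # Minimal sketch: issue one raw read through the sg_read utility (sg3_utils)
    # with the DIO, DPO, and FUA flags set, and time it from userspace.
    # The device node (/dev/sg1) and block counts are placeholders.
    import subprocess, time

    def timed_sg_read(device="/dev/sg1", lba=0x0, num_blocks=32):
        cmd = [
            "sg_read",
            f"if={device}",
            "bs=512",                   # 512-byte logical blocks
            f"count={num_blocks}",
            f"skip={lba}",              # starting logical block address
            "dio=1", "dpo=1", "fua=1",  # force access to the physical medium
        ]
        start = time.perf_counter_ns()
        subprocess.run(cmd, check=True, capture_output=True)
        return time.perf_counter_ns() - start

    print(timed_sg_read(lba=0x140000))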

[Figure: Time-Print, implemented in the host's USB driver, issues extra read SCSI commands to physical blocks on the USB device and feeds the resulting measurements through timing acquisition, preprocessing, fingerprint generation, and identification.]

Figure 3.3: The design of Time-Print.

3.2.3 Preliminary Classification


To investigate whether a timing fingerprint might be possible, I conduct prelimi-
nary experiments by gathering timing readings from 16 different devices: 4 devices for
each of 4 different models (i.e., a generic device found on Amazon, SanDisk Cruzer Blade,
SanDisk Ultra, and Samsung Bar Plus). A histogram of these readings is presented in
Figure 3.2.
Each graph contains the histograms of 20 separate readings. The high overlap
between readings implies that the timing measurement is stable from reading to reading,
and thus may be a good candidate for fingerprinting. Visual inspection demonstrates
that different brand/model devices exhibit different timing characteristics, indicating
that read timings will enable us to differentiate devices with different models. Further
inspection of the variations among devices of the same model shows that some clear
differences still exist. For example, SanDisk Blade 1 and SanDisk Blade 3 in the first row
demonstrate differently shaped distributions. Thus, the preliminary results motivate us
to develop a timing-based device authentication mechanism.

3.3 Time-Print Design
In this section, I detail the design and implementation of Time-Print and describe
how Time-Print generates device fingerprints. In general, Time-Print extends the USB
driver to generate a number of extra reads on randomly chosen blocks on USB devices via
the SCSI commands (as shown in Figure 3.3) and then measures the timing information
of these read operations. The process of Time-Print consists of four steps, namely, (1)
performing precise timing measurements, (2) exercising the USB flash drive to generate
a timing profile, (3) preprocessing the timing profile, and (4) conducting classification
based on the timing profile for device acceptance/rejection.

3.3.1 Performing Precise Timing Measurements


As shown in Figure 3.3, Time-Print enables the fingerprinting technique within
the driver using SCSI commands. Such a design allows the fingerprint data to be
acquired before the device is fully connected to the host system (thus allowing for
rejection if the device is deemed unrecognized). Also, the driver has visibility into every
packet exchanged between the device and the host with minimal delay, which reduces
the overhead and latency for the authentication process while simultaneously increasing
the precision of the timing measurements.
The USB mass storage driver and the USB SCSI command sequence maintain a
complex series of objects within the Linux operating system to control the command
and data transactions communicated between the host and peripherals. Every data
read consists of three parts, as visually presented in Figure 3.4: (1) the host issues a
read command to the device, which specifies the size and location of the data to be
read; (2) the peripheral responds with the requested data; (3) the peripheral responds
with a status packet to indicate that the transfer is either successful or unsuccessful.
Within the USB driver, two different methods control these three transactions.
The usb_stor_msg_common method transfers the command packet and receives the
status packet, while the usb_stor_bulk_transfer_sglist function receives the actual
data from the device. To perform precise timing measurements of these transactions,

[Figure: the host sends a Command packet to the peripheral, the peripheral returns the Data and then a Status Transfer packet, and the timing information is measured across these exchanges over time.]

Figure 3.4: A USB SCSI command sequence.

Time-Print leverages a low overhead and high granularity timing source, the CPU
timestamp counter (TSC), which is a monotonic 64-bit register present in all recent
x86 processors. While initially designed to count at the clock speed of the CPU, most
recent systems implement a ‘constant TSC’, which ticks at a set frequency regardless
of the actual CPU speed. This feature enables Time-Print to precisely time the data
transmission phase, regardless of the underlying CPU frequency. I utilize the built-in
kernel function rdtsc() both before and after each transaction to record the precise
amount of time it takes for the execution of each interaction.
With the collected timing information, Time-Print further integrates a low-
overhead storage and reporting component for this timing information. This component
modifies the USB driver to maintain a continuous stream of timing information for the
drive. Specifically, I augment the us_data structure present in the USB storage header
to contain arrays to keep track of command opcode, size, address, and TSC value for
each transaction.

Device Manufacturer | Device Name | Size | Flash Controller | Number of Devices | USB Protocol
SanDisk | Cruzer Blade | 8GB | SanDisk | 10 | USB 2.0
Generic | General UDisk | 4GB | ChipsBank CBM2199S | 10 | USB 3.0
SanDisk | Ultra | 16GB | SanDisk | 10 | USB 3.0
Samsung | BAR Plus | 32GB | Unknown | 4 | USB 3.1
PNY | USB 3.0 FD | 32GB | Innostor IS902E A1 | 1 | USB 3.0
Kingston | DataTraveler G4 | 32GB | SSS 6131 | 1 | USB 3.0
Kingston | DataTraveler SE9 | 64GB | Phison PS2309 | 1 | USB 3.0
PNY | Elite-X Fit | 64GB | Phison PS2309 | 1 | USB 3.1
SMI | USB Disk | 64GB | Silicon Motion SM3269 AB | 1 | USB 3.0
SMI | USB Disk | 64GB | Silicon Motion SM3267 AE | 1 | USB 3.0
SanDisk | Cruzer Switch | 8GB | SanDisk | 1 | USB 2.0
SanDisk | Cruzer Glide | 16GB | SanDisk | 2 | USB 2.0

Table 3.1: USB mass storage devices utilized in the evaluation of Time-Print.
To transfer the timing values and record them (for prototype purposes), I
implement a character device within the USB storage driver to transfer the timing
information to the userspace for further processing. Since accessing the TSC is designed
to be a low overhead function, the induced overhead is negligible (more discussion on
the overhead is presented in Section 3.6). To ensure minimal performance impact, once
a device has been approved, the timing and storage functionality can be disabled.
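
The userspace side of this logging path might look roughly like the sketch below. The character device name and the record layout (phase, opcode, size, address, TSC value) are hypothetical placeholders; the real format is whatever the modified driver emits.

    # Sketch of the userspace collection side, assuming the modified driver exposes
    # its per-transaction log through a character device.  The device path and the
    # comma-separated record layout are hypothetical placeholders.
    CHAR_DEV = "/dev/usb_timeprint"      # assumed name of the exported log device

    def read_timing_log(path=CHAR_DEV):
        records = []
        with open(path, "r") as dev:
            for line in dev:
                phase, opcode, size, addr, tsc = line.strip().split(",")
                records.append({
                    "phase": phase,                # 'cmd', 'data', or 'status'
                    "opcode": int(opcode, 16),
                    "size": int(size),
                    "addr": int(addr, 16),
                    "tsc": int(tsc),
                })
        return records

    if __name__ == "__main__":
        import json
        with open("fingerprint_sample.json", "w") as out:
            json.dump(read_timing_log(), out)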

3.3.2 Exercising the USB Flash Drive


As discussed in Section 3.2.1, it is difficult to build a reliable timing-based
fingerprint based on the information leaked from the USB handshake and enumeration
process, due to its variable nature. Instead, I develop a common test pattern that can
be applied to any USB device. In particular, I generate a script with a random pattern
of reads in different sizes from different offsets within the drive. The script is executed
whenever a new USB device is detected by the host system. This procedure ensures a
consistent number of reads from different locations on the drive, allowing for the creation
of a statistical, timing-based fingerprint. Meanwhile, reading from multiple locations
with different sizes is necessary as it provides a better chance of generating a unique
fingerprint for the flash drive. According to Micron [117], the NAND flash blocks built
into a USB flash drive are at least 128KB, while each logical block address that can be
accessed by the host system corresponds to a 512-byte chunk. As the logical to physical
mapping is opaque to the user, it is challenging to know whether a large read from
a specific location involves any accesses to multiple contiguous flash blocks, multiple
blocks in different locations, or only a single block. By attempting to generate as many
different types of accesses as possible, Time-Print can better extract the subtle timing
differences caused by those accesses.

3.3.3 Preprocessing Timing Values


As shown in Figure 3.4, there are three packets exchanged between the host
and peripheral: the original command, the responding data packet, and the transfer

status. I need to capture and record the timing values for each packet from the host’s
perspective. Specifically, a timestamp is recorded upon the entry and exit of each of
the two functions listed above. Each timestamp also includes the following meta-data:
command opcode, the size of the packet, and the offset the data is coming from. The
preprocessing step of Time-Print filters any commands that are not read commands
from the recording, and searches for the beginning of the commands from the read
script to discount any packets that are issued as part of the drive enumeration. As the
goal of the fingerprint system is to focus specifically on the time it takes for the drive
to access blocks of the USB device, not the timing between packets, I calculate the
time latency between when the host finishes sending the command packet and when
the host finishes receiving the data response packet from the drive.
The next step is to organize this raw timing information, which contains timing
data from a multitude of locations and sizes. I group them into separate bins where
each contains one size and address offset. Grouping the timing results by read size and
offset ensures that each timing sample within a group corresponds to a single action or
group of actions within the drive, allowing for meaningful statistical analysis.
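
A minimal sketch of this filtering and grouping step is shown below, assuming each logged record carries the hypothetical fields introduced in the collection sketch above (phase, opcode, size, address, and TSC value).

    # Preprocessing sketch: keep only READ(10) transactions (SCSI opcode 0x28),
    # compute the latency between the end of the command packet and the end of
    # the data packet, and group the latencies by (read size, offset).
    from collections import defaultdict

    READ10 = 0x28

    def preprocess(records):
        groups = defaultdict(list)       # (size, addr) -> [latency, ...]
        pending = {}                     # command timestamps awaiting their data packet
        for rec in records:
            if rec["opcode"] != READ10:
                continue                 # discard non-read traffic
            key = (rec["size"], rec["addr"])
            if rec["phase"] == "cmd":
                pending[key] = rec["tsc"]
            elif rec["phase"] == "data" and key in pending:
                groups[key].append(rec["tsc"] - pending.pop(key))
        return groups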

3.3.4 Classification
With the timing information grouped by size and offset, I can leverage features
and machine learning techniques to create a fingerprint for each device. Based on the
trained fingerprints, Time-Print can reject or accept devices. For the different security
scenarios mentioned in the threat model, Time-Print uses different algorithms for better
performance. Section 3.5 further presents the details for different scenarios.

3.4 Evaluation Setup


To demonstrate the effectiveness and potential applications of Time-Print, I
build a testbed to extract fingerprints from 43 USB mass storage devices. In this section,
I describe the equipment utilized, the detailed data collection methodology, the read
sequence utilized, and how I denote the training and testing datasets.

3.4.1 Experimental Devices
I utilize the following devices and system configurations to gather fingerprints.
Host System, OS, and Driver Modifications. Our host system is a DELL
T3500 Precision tower. The system contains an Intel Xeon E5507 4 core processor with
a clock speed of 2.27GHz and 4GB of RAM. The USB 2.0 controllers are Intel 82801JI
devices. I utilize a Renesas uPD720201 USB 3.0 controller (connected via PCI) for
USB 3.0 experiments.
The host runs Ubuntu 18.04 LTS and I modify the USB storage drivers as
detailed in Section 3.3.1.² Namely, I modify the USB driver to record the timing
information for the start and completion of each USB packet transmission that is a
part of the USB storage stack. Each time a device is connected, a data structure is
created to store the timestamp and packet metadata information. This data structure
is deleted upon device disconnect. A character device is inserted into the USB driver
code to facilitate the transfer of this timing information to log files after the completion
of drive fingerprinting operations.
USB Devices. I test the performance and applicability of Time-Print with 12
unique USB models and 43 different USB devices. Table 3.1 lists the device manufacturer,
name, size, controller, number of devices, and protocol for every device used in our
experiments. I select these brands to create a broad dataset that contains a number of
the most popular devices on the market (purchased by users on Amazon as of September
2020). Each device is analyzed with no modifications to the firmware of the device.
To ensure fairness, all devices are zeroed and formatted as FAT32 with an
allocation size of 4KB, and are identically named as ‘USB 0’. I extract the device
controller name by using Flash Drive Information Extractor [153]. Of note, SanDisk
does not publicly identify the versions of their flash controllers and simply reports the
name ‘SanDisk’.

² Since Time-Print is entirely software-based, it could reasonably be extended to macOS
and Windows with cooperation from developers.

[Figure: raw samples are grouped by read size and location (16KB/32KB/64KB at offsets 0 through M); group means form 1D features for K-Means in Scenario ❶, while group histograms form 2D features for neural networks in Scenarios ❷ and ❸.]

Figure 3.5: Flow of generating 1D features from the raw fingerprint samples of a drive
as used for different model identification (top) and 2D features as used for individual
device classification (bottom).

USB Hub and Ports. To facilitate testing of the USB drives, I utilize an
Amazon Basics USB-A 3.1 10-Port Hub that I connect to the inbuilt USB 2.0 Intel
82801JI hub on the host for USB 2.0 experiments and to the Renesas uPD720201 USB
3.0 hub for USB 3.0 testing.

3.4.2 Data Collection


Given our setup, I implement a script to gather data from multiple USB devices
at once. The Amazon Basics USB hub utilized in our experiment can selectively
enable/disable the power connected to each specific port. I implement this functionality
through the uhubctl [173] library and simulate the physical unplugging and replugging
of each USB device between every sample.
To reduce any impact on the precision of the timing within the driver, which is
of the utmost importance for fingerprinting, I utilize the Linux cpuset utility to isolate
the USB storage driver process to its own CPU core. This largely prevents interference
from context switches. Furthermore, since some CPUs do not guarantee that the TSC is
synchronized between cores, it is necessary to ensure that all measurements are gathered
from the same core.

To better explain the overall testing methodology, I further present the sample
acquisition process with an example of 10 different USB drives. Before testing, each
port on the USB hub is disabled such that no power is provided to a plugged-in device.
I then plug each drive into a port on the USB hub and record the mapping of the hub
port to drive ID (to match each sample to a specific drive). The fingerprint gathering
script enables the first port on the USB hub and waits for the USB driver process to be
launched. Upon launch, the driver process is isolated to a single core of the CPU to
ensure maximum timing precision. Next, I launch the fingerprinting script that initiates
a series of reads of different sizes and in different locations on the drive. The returned
data is not recorded because only the timing information of these reads is important.
Once the collection script completes, I mount the character device and write all of the
recorded timing information to a log file. The system then unmounts the character
device and USB device and disables the USB port to simulate unplugging the device. I
also simulate non-idle system states: the Linux stress utility is run to fully utilize one
CPU core on every other sample. The above process is repeated for the next port on
the USB hub. All drives are tested in a round-robin fashion.
Once 20 fingerprints have been gathered from each drive, I physically unplug
each drive and plug it into a different port on the USB hub; this ensures that any
difference observed in the readings is caused by the individual USB drives, not the USB
port.
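
The collection loop could be scripted roughly as follows. The hub location, port count, and helper script name are placeholders for the actual testbed, and details such as cpuset core pinning and character-device log extraction are omitted for brevity.

    # Collection-loop sketch: power-cycle each hub port with uhubctl, gather one
    # fingerprint, and repeat in a round-robin fashion.  Hub location '1-1' and
    # 'read_script.py' are placeholders for the real testbed configuration.
    import subprocess, time

    HUB = "1-1"            # uhubctl hub location (assumed)
    PORTS = range(1, 11)   # ten drives, one per port

    def set_port_power(port, on):
        subprocess.run(["uhubctl", "-l", HUB, "-p", str(port),
                        "-a", "on" if on else "off"], check=True)

    def collect_one(port, stressed):
        set_port_power(port, True)
        time.sleep(5)                                   # wait for enumeration
        if stressed:                                    # load one core on every other sample
            stress = subprocess.Popen(["stress", "--cpu", "1"])
        subprocess.run(["python3", "read_script.py"], check=True)  # issue the extra reads
        if stressed:
            stress.terminate()
        set_port_power(port, False)                     # simulate unplugging the drive

    for sample in range(20):                            # one 'session' of 20 fingerprints
        for port in PORTS:
            collect_one(port, stressed=(sample % 2 == 1))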

3.4.3 Fingerprint Script


To gather a fingerprint, I utilize a script of 2,900 reads. Each read is randomly
chosen to be of size 16KB, 32KB, or 64KB, and to start at one of six logical block addresses:
0x0, 0x140000, 0x280000, 0x3c0000, 0x500000, or 0x640000. The block addresses are spread
out in an attempt to access diverse physical locations on the drive. To ensure that
any uniqueness observed in the fingerprint is caused by physical variations in the drive
accesses and not script variations, the script is randomly generated once and then used
for each device.
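
A possible way to generate such a script is sketched below; the seed value is arbitrary, and only the fact that the same sequence is reused for every device matters.

    # Sketch of the fixed read sequence: 2,900 reads with sizes drawn from
    # {16KB, 32KB, 64KB} and offsets drawn from six spread-out logical block
    # addresses.  The script is generated once and replayed for every device.
    import random

    SIZES = [16 * 1024, 32 * 1024, 64 * 1024]
    OFFSETS = [0x0, 0x140000, 0x280000, 0x3C0000, 0x500000, 0x640000]

    def generate_read_script(num_reads=2900, seed=0):
        rng = random.Random(seed)
        return [(rng.choice(SIZES), rng.choice(OFFSETS)) for _ in range(num_reads)]

    script = generate_read_script()
    print(script[:3])      # e.g. [(65536, 0x500000), (16384, 0x0), ...]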

3.4.4 Training and Testing Datasets
As mentioned above, in our experiments, fingerprints are gathered in a round-
robin fashion from devices in a set of 20. After collecting 20 fingerprints for all drives, all
devices are physically unplugged and then plugged into different ports. I thus refer to a
group of 20 fingerprints as a ‘session’ of data. For all devices listed in Table 3.1, I gather
4 sessions of fingerprints (i.e., 80 fingerprints). I then conduct 4-fold cross-validation by
selecting 3 sessions for training, and 1 session for testing.

3.5 Time-Print Results


To evaluate the effectiveness of Time-Print, I conduct a series of experiments
in the three scenarios listed in Section 3.1, namely, identifying devices with different
brands, identifying unseen devices of the same brand, and auditing (i.e., classification
on all insider devices).

3.5.1 Scenario ❶: Brand Identification


I first examine the accuracy for identifying a random (unknown) USB device of
a brand different from approved devices. For instance, a system administrator would
like to prevent a dropped device attack where a careless employee plugs in a malicious
unauthorized device. While Figure 3.2 (in Section 3.2) shows that this timing-based
fingerprint has the potential to be very effective, here I quantitatively evaluate all
devices listed in Table 3.1.
Approach. To accomplish this task, I expect that Time-Print trained on a
specific device should always accept that device while rejecting all other devices with
different models and brands. Thus, I design a single-class classification system using
the K-Means algorithm. The one class classification system creates clusters of samples
from the approved device and draws a decision boundary to reject any readings from
devices of other brands or models.
Particularly, K-Means requires that each data sample is presented as a 1D feature
list. K-Means utilizes this feature list and a distance metric to calculate a sample to

sample distance by examining the features of each sample, and groups the samples
into clusters. Once the algorithm converges, I calculate the distance of each training
sample to its closest cluster. The maximum distance value is then used to set a decision
boundary. In this case, for a fingerprint to be accepted by the clustering algorithm,
it must be within the decision boundary of one of the pre-trained clusters. I first
preprocess each sample into different chunks by separating each reading based on the
size and location offset of the measurement. With the size and locations grouped, I
calculate the mean of each group, generating a 1D feature list for each sample, as
illustrated in the upper part of Figure 3.5.
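
A minimal sketch of this one-class acceptance test, using scikit-learn's K-Means, is shown below. The cluster count and the synthetic stand-in data are illustrative assumptions rather than values taken from the evaluation.

    # One-class acceptance sketch with K-Means: cluster the legitimate drive's
    # 1D feature vectors, set the decision boundary to the maximum training
    # distance, and accept a new sample only if it falls inside that boundary.
    import numpy as np
    from sklearn.cluster import KMeans

    def train_one_class(train_features, n_clusters=4):
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(train_features)
        dists = km.transform(train_features).min(axis=1)   # distance to nearest cluster
        return km, dists.max()                              # model and decision boundary

    def accept(km, boundary, sample):
        return km.transform(sample.reshape(1, -1)).min() <= boundary

    # Synthetic data standing in for real per-group mean features (18 groups).
    legit = np.random.normal(50, 1, size=(60, 18))
    km, boundary = train_one_class(legit)
    print(accept(km, boundary, legit[0]), accept(km, boundary, legit[0] + 20.0))
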
Training and testing. I train the one-class classifier on four types of devices:
the Generic Drives (10 devices), Samsung Bar Plus (4 devices), SanDisk Ultra (10
devices), and SanDisk Cruzer Blade devices (10 devices). I then test the classifier
against all other devices listed in Table 3.1. For clarity of presenting the results, I group
all extra devices with the USB 3.X protocol into a set called ‘other USB3’, and all extra
devices with the USB 2.0 protocol into a set called ‘other USB2’.
For example, to test the accuracy for the Generic Drives, I have four sessions (80
fingerprints in total) of data for all ten devices in this model. For Generic Drive #1, I
train the classifier using three sessions of data and test the classifier using the remaining
one session of data, and the data from all other devices from different brands/models. I
repeat the experiment for each Generic device and report the average accuracy.
Accuracy. The results are presented in Table 3.2, showing very high accuracy:
an average true accept rate of 99.5% while rejecting all drives of different models and
brands (i.e., zero false accept rate). As mentioned in the threat model, Time-Print is
mainly designed for use in a high-security system. Such a system should always reject
unknown models to minimize security risks. While the true accept rate of 99.5% may
still reject a legitimate device with a very small chance for the first trial, the user can
simply re-plugin the USB drive and re-authenticate with the system. The probability
of being rejected twice in a row is only 0.0025%. In other words, the probability of a
legitimate device being accepted after two trials is 99.9975%, which is very close to one.

Testing Device | Trained on Generic | Trained on SanDisk Cruzer Blade | Trained on Samsung Bar Plus | Trained on SanDisk Ultra
Generic | 99.9% | 0.0 | 0.0 | 0.0
SanDisk Cruzer Blade | 0.0 | 98.8% | 0.0 | 0.0
Samsung Bar Plus | 0.0 | 0.0 | 99.7% | 0.0
SanDisk Ultra | 0.0 | 0.0 | 0.0 | 99.9%
Other USB2 | 0.0 | 0.0 | 0.0 | 0.0
Other USB3 | 0.0 | 0.0 | 0.0 | 0.0

Table 3.2: Percentage of samples accepted when trained for each device model.

Overall, these results show that Time-Print can accurately distinguish unknown
devices with different brands and models from legitimate devices.

3.5.2 Scenario ❷: Same Brand Device Identification


The second scenario requires Time-Print to identify unseen devices of the same
brand and model, which is a much more difficult task as all devices share the same
design.
Approach. To this end, I utilize a 2-D convolutional neural network for the
classification task. As our task is not to locate the best possible network for classification
but to demonstrate that fingerprinting a USB mass storage device is possible, I adopt
a standard classification network design. For reference, the network architecture is
provided in Table B.1 in the Appendix.
For preprocessing, similarly to Scenario ❶, I separate the raw timing information
by size and location. As the script contains six possible locations and three possible
sizes, the separation procedure produces 18 distinct collections of timing data for each
fingerprint gathered.
To utilize these values within a neural network, I transform their raw format
(a collection of numbers ranging from one to ten million) to a value range that works

for neural networks (e.g., 0 to 1). Specifically, I convert the data from each group to
a histogram, with all data being scaled by the group global minimum and maximum
values, from the entire training set. Such a method creates a fine-grained representation
of the signal. This also makes sense as large reads take much longer to complete than
short reads, and a full ranged histogram would contain a large amount of unimportant
zero values. To ensure experimental integrity, the individual minimum and maximum
ranges are recorded and used to process the testing set.
Each histogram can be represented as a 1D vector of measurement frequency,
and the histograms for all groups can be concatenated together to create a 2D input
vector to the classification network. This process is illustrated in the lower part of
Figure 3.5. Another advantage of the histogram and neural network combination is
that the network can rapidly be tuned to work for different drives, since the number of
histogram bins, readings per size and location, or input trace can easily be adjusted
while maintaining a consistent preprocessing pipeline.
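
The feature construction could be implemented roughly as follows; the bin count is an illustrative assumption.

    # Sketch of the 2D feature construction: each (size, offset) group becomes a
    # fixed-width histogram scaled by that group's min/max over the training set,
    # and the 18 group histograms are stacked into one 2D input.
    import numpy as np

    def fit_group_ranges(train_groups):
        """train_groups: dict mapping (size, offset) -> all training latencies."""
        return {k: (min(v), max(v)) for k, v in train_groups.items()}

    def sample_to_2d(sample_groups, ranges, bins=100):
        rows = []
        for key in sorted(ranges):                      # fixed group order
            lo, hi = ranges[key]
            hist, _ = np.histogram(sample_groups[key], bins=bins, range=(lo, hi))
            rows.append(hist / max(hist.sum(), 1))      # normalize to frequencies
        return np.stack(rows)                           # shape: (num_groups, bins)
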
Training and testing. To achieve accurate identification, system administra-
tors can purchase multiple devices from the same brand and model to serve as ‘malicious’
devices to train the classifier. I emulate this scenario by examining the SanDisk Cruzer
Blade, SanDisk Ultra, and the Generic drives. I have 10 devices for each model. Among
the 10 devices, for training, one device is selected as the ‘legitimate’ drive, and 8 of
the remaining 9 devices are chosen as ‘malicious’ drives; then the last is used as the
‘unseen’³ device for testing purposes. During training, I use 60 samples of each drive
involved. During testing, I utilize the remaining 20 samples of each ‘legitimate’ drive
and 20 samples of each ‘unseen’ drive. To ensure fairness and remove any influence
of randomness, I test all 90 possible combinations (10 possible ‘legitimate’ drives × 9
possible ‘unseen’ drives) and cross-validate each by rotating the samples utilized for
training and testing.

³ The ‘unseen’ device is equivalent to an attacker’s ‘malicious’ device, and I use a
different term to differentiate the malicious device in testing from those used in training.

Training Data | Generic (TAR / TRR) | SanDisk Cruzer Blade (TAR / TRR) | SanDisk Ultra (TAR / TRR)
Raw | 92.2% / 93.8% | 96.5% / 89.2% | 97.6% / 90.6%
Augment | 97.3% / 91.7% | 98.0% / 93.5% | 98.7% / 91.4%

Table 3.3: Average True Accept Rate (TAR) and True Reject Rate (TRR) for same
model device identification.

Accuracy. Table 3.3 presents the results, showing a compelling average true
accept rate (TAR) of 95.4% and an average true reject rate (TRR) of 91.2%.
After investigating the false acceptances, I find that most false acceptances occur
in pairs. I realize that the problem of classifying an unknown drive is likely to benefit
from synthetic data. Augmenting the training set with random variations (in an attempt
to simulate more unknown devices), or with samples from more ‘malicious’ devices
may better solidify the decision boundary of the network, leading to higher overall
accuracy. I also augment the samples of the ‘legitimate’ drives, albeit with much smaller
perturbations, to increase the true accept rate. I randomly select samples from the
training set and perturb them with noise. This augmentation procedure improves the
results, increasing the overall average accuracy to 95%. More specifically, the average
true accept rate increases to 98.0%, and the average true reject rate increases to 92.2%.
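
A sketch of this augmentation step is shown below; the noise scales and the number of synthetic copies are illustrative assumptions, not the tuned values used in the evaluation.

    # Augmentation sketch: synthesize extra training samples by perturbing
    # histogram features with random noise -- larger perturbations to stand in
    # for unseen 'malicious' devices, smaller ones for the legitimate drive.
    import numpy as np

    def augment(histograms, copies=5, scale=0.05, rng=None):
        rng = rng or np.random.default_rng(0)
        out = []
        for h in histograms:
            for _ in range(copies):
                noisy = h + rng.normal(0.0, scale, size=h.shape)
                out.append(np.clip(noisy, 0.0, None))   # frequencies stay non-negative
        return np.array(out)

    # Synthetic stand-in data: 60 samples of 18 groups x 100 bins each.
    legit = np.random.rand(60, 18, 100)
    legit_aug = augment(legit, scale=0.01)              # small perturbations for the legitimate drive
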
Overall, these results indicate that this approach has enough information to
uniquely fingerprint USB drives and that Time-Print can even detect unseen devices of
the exact same brand and model.

3.5.3 Scenario ❸: Auditing / Classification


I finally evaluate the effectiveness of Time-Print on the auditing scenario, in
which a system administrator needs to determine exactly which device had files copied
to/from it (to track/identify an insider threat). I evaluate the accuracy for Time-Print
to uniquely identify a single device from a pool of devices that are authorized for use.
I employ a network with a similar architecture to the one employed in Scenario ❷ and shown in Table B.1 of the Appendix.

Device Name (# of Devices) Classification Accuracy
SanDisk Cruzer Blade (10) 98.6%
Generic Drive (10) 99.1%
SanDisk Ultra (10) 98.7%
Samsung Bar Plus (4) 98.4%

Table 3.4: Classification accuracy for each drive type in Scenario ❸.

Since the goal is to identify each individual
drive, I modify the final output layer of the network to contain the same number of
neurons as devices that I attempt to classify. I utilize the same histogram transformation
from Scenario ❷, where each sample is separated by size and location and then converted
to a histogram for utilization in the neural network.
Similarly to Scenario ❷, I train and test (with cross-validation) a classifier for
each model (i.e., only drives in one model are trained and tested), as I expect that
an organization that adopts a device authentication system like Time-Print will limit
the usage of USB drives to a particular model. The classification results are listed in
Table 3.4 for the SanDisk Cruzer Blade, Generic, SanDisk Ultra, and Samsung Bar Plus
devices. I can see that Time-Print achieves accuracy above 98.4% for varied devices,
including those from some of the best selling manufacturers (SanDisk and Samsung).
Furthermore, the data for SanDisk and the Generic devices demonstrates that the
variability between drives is rich enough to create distinct classification boundaries
among different drives. Finally, this data shows that USB fingerprinting is not limited
to a single manufacturer or USB protocol. In short, Time-Print is able to fingerprint a
USB drive within the same brand and model for accurate classification.

3.6 Practicality of Time-Print


With the viability of fingerprinting USB mass storage devices demonstrated, I
further examine the practicality of Time-Print in multiple aspects, including the latency
of fingerprint acquisition, the impact of host system hardware variations on fingerprint
accuracy, device usage, location accesses, whether just the flash controller itself can be

utilized for fingerprinting, and how Time-Print might be deployed in the real world.

3.6.1 System Latency


The time to acquire the USB fingerprint varies depending upon the number of
reads and the protocol used by the device (e.g., USB 2.0 or 3.0). I measure the time
required to capture the fingerprint from a SanDisk Cruzer Blade USB 2.0 device and a
SanDisk Ultra USB 3.0 device. The time cost of achieving the results in Section 3.5
is an average of 11 seconds on the USB 2.0 drive and 6 seconds on a USB 3.0 drive,
respectively. The time difference is expected as the components of the USB 3.0 drives
are faster to support the enhanced speed of the protocol.
On the other hand, intuitively, fewer extra reads in the driver should save time,
but degrade the identification accuracy. I further evaluate how the number of observed
reads affects the accuracy of Time-Print by truncating the gathered samples and
examining the accuracy in Scenario ❸ with the SanDisk Blade and Ultra devices. The
results are presented in Figures 3.6 and 3.7. Both figures show that the accuracy
decreases by at least a full percentage point when the number of samples is halved. The
degradation continues gradually on the USB 2.0 device (down to 95% accuracy when
30x fewer samples are taken) and more steeply on the USB 3.0 device (reducing to 90%
accuracy when 30x fewer samples are taken). Overall, even with 10x fewer samples
being used, Time-Print can still achieve more than 94.5% accuracy while reducing the
latency to only about 1 second, since the time required to acquire a fingerprint scales
linearly with the number of extra reads.
Such a result indicates that there exists a trade-off between the time required
to generate a fingerprint and the ability to use the fingerprint for unique device
authentication. The system administrator can use this knowledge to balance the time
required to obtain a fingerprint against the desired security level of their system.

Figure 3.6: Classification accuracy degradation as the number of samples is reduced
(10 SanDisk Ultra USB 3.0 drives).

Figure 3.7: Classification accuracy degradation as the number of samples is reduced
(10 SanDisk Blade USB 2.0 drives).

3.6.2 Fingerprints with Hardware Variation
When the fingerprint data is acquired, it must pass through a myriad of system
components. For example, the data transmission, beginning with the USB drive, must
go through any ports and hubs along its path, through the USB controller on the
motherboard, and finally through the bridge between the motherboard USB controller
and the processor. Each of these system components may contain varying levels of
routing logic and create timing variations in the fingerprint. As such, I conduct several
experiments to understand the impact of hardware variations on fingerprint accuracy.
Different Ports and Hubs. To understand the impact of using different ports and
hubs, I utilize the training data from Section 3.5, but gather new testing sets with
both Generic and SanDisk Blade
devices. I conduct two tests: (1) the USB hub is plugged into a different host port
and (2) another Amazon Basics USB-A 3.1 10-Port Hub is used to test the accuracy of
these configurations with the classifier and training data of Scenario ❸. I observe that
utilizing a different host port or a different hub slightly reduces the accuracy from 99%
to about 95% for the Generic devices but has no effect on the SanDisk Blade devices.
Different Host. I further investigate the impact of different host machines:
can the same fingerprint be transferred between different host machines? I expect to
see a degradation in accuracy as many factors (e.g., variations in the clock speed of the
processor, motherboard, etc.) are likely to alter the fingerprint. To assess the impact,
I gather a dataset on a second host system with a different configuration (system
comparison is listed in Table 3.5) using both the Generic and SanDisk Blade devices.
Again, I utilize Scenario ❸ as an example to measure the accuracy degradation.
The main difference between the two host systems lies in the different CPUs.
The TSC tick rate (i.e., the rate at which the TSC increments) is directly dependent on
the base clock speed of the CPU. Thus, I prescale the data gathered on the testing machine
by multiplying the timing values by a factor of 0.7386, which is the ratio of 2.26 GHz
on our training machine to 3.06 GHz on the testing machine.

Component | Training System | Testing System
Processor | Intel Xeon E5507, 4C/4T @ 2.27 GHz | Intel Xeon W3550, 4C/8T @ 3.06 GHz
Motherboard | Dell 09KPNV | Dell 0XPDFK
RAM | 2x2GB | 1x8GB
USB Controller | Intel 82801JI | Intel 82801JI

Table 3.5: System configurations for cross host investigation.

With this preprocessing step, the SanDisk Blade devices experience no accuracy
degradation, and the Generic drives experience an 11% accuracy decrease to 88%, which
is still a promising finding. To understand the reason for these different behaviors, I
uncover that the Generic devices appear to produce noisier distributions with more
similar peak locations than the SanDisk Blade devices, as shown in Figure 3.2. I infer
that such increased noise coupled with different electrical paths (e.g., different hubs,
ports, machines) makes the Generic devices harder to classify in a cross host scenario.
However, it should be noted that in an enterprise environment, people usually purchase
a number of identical host machines with the same model of processor, motherboard,
USB controllers, etc. As a result, even better fingerprint transfer might be expected
between hosts. Meanwhile, this host transfer is not required in our threat model, as
system administrators can train an authentication system for each protected computer.

3.6.3 Fingerprint Robustness with Device Usage


Flash devices utilize a logical to physical mapping within the flash translation
layer to ensure that the flash blocks are evenly used within a device (a process called
wear-leveling). When the usb-storage driver attempts to write data to an address,
it specifies a logical address which the flash translation layer converts to a physical
address. Because flash blocks are modified at the block level, instead of the bit level, a
write operation requires the data to be written to a new empty block and the logical
to physical address mapping is updated. Since Time-Print utilizes the physical timing
characteristics of specific physical blocks (accessed via logical addresses), this remapping

might degrade the accuracy of the fingerprint as the device is written to.
To investigate the impact of this remapping, I conduct an experiment by writing
hundreds of random files to five SanDisk Cruzer Blade devices and track the accuracy
of the classification system by gathering a sample between each write. In total, I write
6,520MB of data to each 8GB drive.
The results demonstrate that Time-Print is somewhat resilient to drive writes,
experiencing no accuracy degradation until about 2.3GB at which point the accuracy
rapidly decreases. To better understand the cause of this sudden accuracy degradation,
I examine the behavior of the actual flash drive. I utilize the tool hdparm to observe
the actual logical block address (LBA) of each file, and notice that the drive attempts
to write files to the lowest available LBA. The classification neural network essentially
performs a matching task, attempting to classify the trace as the class that is the closest
to the training samples. After more than half of the LBAs utilized for the fingerprint
are written, the neural network is no longer able to perform this task reliably, since the
majority of the LBAs are no longer the same. To address this problem, there are two
solutions: LBA reservation and manufacturer support.
LBA reservation. If Time-Print can prevent the drive from updating the
virtual to physical mapping of the blocks utilized for fingerprinting, it can prevent drive
writes from affecting the fingerprint, as the drive will not reassign pages that are in
use. This can be accomplished by placing small placeholder files at their locations for
LBA reservation. I implement this mechanism by copying large files (to occupy large
swaths of LBAs) and small files into the chosen fingerprint locations, and then deleting
the large files. I use the hdparm tool to check the LBAs used by the small fingerprint
files. All of the small files combined together are only 768KB in total, thus inducing
low overhead. I then write 7.3GB of data (the capacity of the drive) to the drive in 16MB
chunks, and observe no changes in the histograms and no accuracy degradation. This
solution can adequately accommodate the normal drive usage as long as the small
fingerprint files are not deleted (by users).
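
The reservation procedure could be scripted roughly as follows. The mount point, file sizes, and file names are placeholders; hdparm --fibmap is used only to confirm which LBAs the placeholder files occupy.

    # LBA-reservation sketch: occupy the low LBAs with large throw-away files,
    # drop a small placeholder file into each fingerprint region, then delete
    # the large files so later writes cannot remap the reserved blocks.
    import os, subprocess

    MOUNT = "/media/USB_0"          # assumed mount point of the drive

    def reserve_lbas(num_fillers=6, filler_mb=320, placeholder_kb=128):
        chunk = b"\0" * (16 * 1024 * 1024)
        fillers = []
        for i in range(num_fillers):
            filler = os.path.join(MOUNT, f"filler_{i}.bin")
            with open(filler, "wb") as f:
                for _ in range(filler_mb // 16):
                    f.write(chunk)                          # push the allocator forward
            fillers.append(filler)
            keep = os.path.join(MOUNT, f"fp_{i}.bin")
            with open(keep, "wb") as f:
                f.write(b"\0" * placeholder_kb * 1024)      # small placeholder that stays
            subprocess.run(["hdparm", "--fibmap", keep])    # report the LBAs it occupies
        for filler in fillers:
            os.remove(filler)                               # free the bulk space again

    reserve_lbas()
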
Manufacturer Support. This is the most resilient solution but requires

collaboration with drive manufacturers. Manufacturers already provide extra flash
blocks that are hidden from users to facilitate better wear leveling and drive performance.
They can similarly reserve extra blocks for fingerprinting on new devices. This solution
can ensure that Time-Print fingerprints are unaffected by write operations and further
ensure that accidental deletion of the contents of the drive will not interfere with the
fingerprint.

3.6.4 Spoofing A Fingerprint


An advanced attacker might design a malicious device to deceive Time-Print
by mimicking a legitimate drive (e.g., replicate the physical fingerprint with an FPGA
based system). While all of the experiments in this study utilize a static read sequence
of 2,900 reads, in a real deployment, the read-sequence, including the specific locations
and number of reads, can be either a secret (stored on the protected system) or randomly
generated based upon a device identifier (e.g., use the serial number as a random seed).
Since attackers are unable to know the exact locations utilized by Time-Print, they can
only fingerprint random locations and hope that Time-Print would accept the spoofed
values.
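
One way to derive such a device-specific read sequence is sketched below; the drive capacity, address granularity, and choice of hash are illustrative assumptions.

    # Sketch of a per-device read sequence: seed a PRNG with the drive's serial
    # number so the probed locations are unpredictable to an attacker but
    # reproducible on the protected host.
    import hashlib, random

    def locations_for(serial, drive_blocks=15_000_000, num_locations=6):
        seed = int.from_bytes(hashlib.sha256(serial.encode()).digest()[:8], "big")
        rng = random.Random(seed)
        # choose spread-out logical block addresses (duplicates are unlikely but not prevented)
        return sorted(rng.randrange(0, drive_blocks, 0x1000) for _ in range(num_locations))

    print(locations_for("4C530001230607113142"))   # example serial number string
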
To assess the security of Time-Print against this type of advanced attack, I run
an experiment where I generate random choices of locations to test whether Time-Print
accepts a legitimate drive fingerprinted in the wrong locations. To emulate an attacker
who is unaware of the correct sample locations, I gather a new dataset for the drives
that are sampled in the wrong locations. More specifically, I generate a script that
randomly chooses 6 locations on a drive and generates reads every time the drive is
plugged in. I test Time-Print similarly to Scenario ❷ wherein I train Time-Print to
accept samples in the correct locations of the legitimate drive and to reject samples
from other devices. To further augment the training set, I add random noise to some of
the training samples from the legitimate drive (similarly to Scenario ❷). Our testing
set consists of the samples from the legitimate device taken in the correct locations,

which should be accepted, and the samples from the legitimate device taken in the
wrong locations (to emulate a spoofing attack) that should be rejected.
I test this setup with the SanDisk Blade, Ultra, and Generic devices and observe
an average of 96.4% true accept rate and 99.6% true reject rate. This result indicates
that Time-Print is very robust against such ‘spoofing’ attacks.

3.6.5 Other Considerations


I further investigate whether better accuracy could be obtained by increasing the
number of addresses accessed by Time-Print. Theoretically, accessing more locations
on the drive should provide more information to better identify drives. To this end, I
conduct experiments on accessing 18 locations (as opposed to 6), while maintaining
the same number of total extra reads. I gather data on the SanDisk Cruzer Blade,
Generic, and SanDisk Ultra drives, and evaluate the performance in Scenarios ❷ and
❸. I observe that while the individual accuracy of each drive type fluctuates slightly,
the average performance (across all three models) in each scenario remains similar.
Another consideration is the modification of the access order. I run an experiment
with five SanDisk Cruzer Blade devices by randomizing the access order for each sample.
There is no accuracy degradation. I also examine whether the device format affects
Time-Print. I reformat all of the devices to EXT2 and retrain Time-Print. Similarly, no
accuracy degradation is observed. This is expected as the file system format is another
virtual layer above the physical pages of the USB device and therefore should not affect
the fingerprint.

3.6.6 Fingerprint the Flash Controller


I also examine whether the timing information from only the flash controller could
be utilized to identify the drives. I investigate this by utilizing the timing information
of the ‘transfer status’ packet (a packet that comes only from the USB controller on
the mass storage device), instead of the timing information for returning the data. I
test this on both the SanDisk Cruzer Blade and the SanDisk Ultra devices. I find that

utilizing only this information reduces accuracy from greater than 98% to 65% and 45%
for the two types of devices, respectively. This shows that while the timing information
of the flash controller can be utilized to identify some devices, it alone is insufficient to
create a robust fingerprint.

3.6.7 Real-World Deployment of Time-Print


I have demonstrated that Time-Print can be utilized in various scenarios for
USB drive authentication. Each of the scenarios can serve as a module in a more
complete security system that might be deployed in the real world. For example, a
system administrator concerned mainly about protecting systems from stray external
devices can employ our system as demonstrated in Scenario ❶, while an administrator
with concerns about targeted attacks might choose to utilize Scenarios ❶ and ❷
together, first rejecting unknown models and then ensuring that the device is legitimate.
Scenario ❸ can be further employed to track user activities for auditing purposes.
Time-Print can also be integrated into other USB security systems, which offer firewall
like protections [14, 165, 168] but rely on the drive to correctly report its identification.
The identification capability of Time-Print will provide a stronger defense against skilled
attackers who can alter device identifiers [76, 86].

3.7 Related Work


In this section, I survey the research efforts that inspired this work as well as
highlight the key differences between this work and previous research.

3.7.1 Device Fingerprinting


Uniquely identifying individual physical devices has long been of interest to
the security community [28, 30, 95, 136, 187]. The ability to track and authenticate
a physical device accurately can help increase security and serve as another factor in
multi-factor authentication. As such, many different methods for device fingerprinting
have been presented.

One of the most common methods for fingerprinting is the utilization of (un)intentional
electromagnetic emissions. Cobb et al. [46, 47] showed that variations in the
manufacturing process cause subtle variations in the unintentional
electromagnetic emissions, which can be utilized to generate a valid fingerprint for
similar embedded devices. Cheng et al. [44] further found that unique fingerprints can
be created for more sophisticated systems like smartphones and laptops. Other prior
works [27, 57, 129, 138] study the fingerprint generation in radiating electromagnetic
signals for communication (e.g. Zigbee, WiFi, etc.). The most similar work to Time-
Print is Magneto [82], which uses the unintentional electromagnetic emissions during
device enumeration on a host to fingerprint USB mass storage devices. While their
work demonstrates the ability to classify different brands and models accurately, the
system requires expensive measurement equipment. By contrast, this work requires
no special equipment and uncovers a novel timing channel that can be used to further
identify devices within the same brand and model.
Device serial numbers, descriptors, and passwords are also used to thwart the
connection of unauthorized USB devices [12, 83]. These defenses inherently trust that
the USB device is accurately reporting software values. TMSUI [181], DeviceVeil [160],
and WooKey [24] use specialized hardware to uniquely identify individual USB mass
storage devices, and as a result, most of these systems are not compatible with legacy
devices. Instead, Time-Print is completely software-based and does not require any
extra or specialized hardware. The USB 3.0 Promoter Group has proposed a USB 3.0
Type-C PKI-based authentication scheme [172] to identify genuine products, but these
mechanisms are not designed to uniquely identify individual devices. Other prior works
utilize a USB protocol analyzer [99] or smart devices [21] to identify a host system and
its specific operating system by inspecting the order of enumeration requests and timing
between packets [52]. Unlike those works, the objective of Time-Print is to identify the
peripheral device, instead of the host.

3.7.2 Flash Based Fingerprints
Several prior works have investigated whether the properties of flash devices can
be utilized for fingerprinting. For example, device fingerprints are constructed using
programming time and threshold voltage variations [135, 174]. Others [74, 88, 92, 123,
143, 159, 176] further investigate the design of physically unclonable functions in flash
chips and explore the impact of write disturbances, write voltage threshold variation,
erase variations, and read voltage threshold variation. Sakib et al. [142] designed a
watermark into flash devices by program-erase stressing certain parts of a device.
The above techniques work at a physical level, which requires control and
functionalities that may not be available in a cost-constrained, mass-market device
like a USB flash drive. Time-Print only utilizes read operations (a common function
available on all USB flash drives) and thus is non-intrusive. In addition, while these
technologies could be incorporated into new devices, Time-Print is fully compatible
with existing devices and only requires a slight modification to the host driver.

3.7.3 USB Attacks and Defenses


USB is an easy-to-use and trusting protocol, which immediately begins to
communicate with and set up devices when they are plugged in. Tian et al. [169]
surveyed the landscape of USB threats and defenses from USB 1.0 to USB C, showing
that most existing defenses that require extra hardware do not adequately work with
legacy devices. Several attacks [86, 76] have demonstrated that modifying the firmware
of USB devices can rapidly subvert a system.
Many defenses have been proposed to mitigate the problem. For example, the
TPM (trusted platform module) has been used to protect sensitive information [22, 32].
GoodUSB [165] attempts to thwart firmware modification attacks [86] by creating
a permission system so that users can specify permissions for devices.
VIPER [101] proposes a method to verify peripheral firmware and detect proxy
attacks via latency based attestation. Hernandez et al. [78] automatically scanned USB

firmware for malicious behaviors. USBFILTER [168] presents a firewall in the USB
driver stack to drop/allow USB packets based on a set of rules.
Similarly, Cinch [14] creates a virtual machine layer between USB devices and
the host machine to act as a firewall. Johnson et al. [89] designed a packet parser to
protect the system from malformed USB packets. Tian et al. [163] proposed a unified
framework to protect against malicious peripherals.
Other prior works like USBeSafe [91] and USBlock [122] utilize machine learning
algorithms to analyze the characteristics of USB packet traffic to prevent keyboard
mimicry attacks [86]. Like those works, Time-Print is a software-based approach to
enhancing USB security.

3.8 Summary
This chapter presents Time-Print, a novel timing-based fingerprinting mechanism
for identifying USB mass storage devices. Time-Print creates device fingerprints by
leveraging the distinctive timing differences of read operations on different devices. I
develop the prototype of Time-Print as a completely software-based solution, which
requires no extra hardware and thus is compatible with all current USB mass storage
devices. To assess the potential security benefits of Time-Print, I present a comprehen-
sive evaluation of over 40 USB drives in three different security scenarios, demonstrating
Time-Print’s ability to (1) identify known/unknown device models with greater than
99.5% accuracy, (2) identify seen/unseen devices within the same model with 95% accu-
racy, and (3) individually classify devices within the same model with 98.7% accuracy.
I further examine the practicality of Time-Print, showing that Time-Print can retain
high accuracy under different circumstances while incurring low system latency.
Now that the offensive and defensive capabilities of side channels have been
demonstrated, I next turn my attention to how side channels can be utilized to invade
user privacy. Unlike the first attack in Chapter 2, which required the attacker to utilize
extra hardware to monitor the user device, the next attack will take place completely

remotely, requiring no extra hardware and attacking user privacy from across the
internet.

Chapter 4

AN EXPLORATION OF ARM SYSTEM LEVEL CACHE AND GPU SIDE CHANNELS

4.1 Introduction
While Advanced RISC Machines (ARM) processors have dominated the mobile
device market over the past decade, recently they have also gained market share in both
cloud computing and desktop applications. Enterprises like Apple and Samsung have
announced plans to develop ARM based laptop devices that function with the complete
MacOS and Windows operating systems. Apple has already released its M1 ARM chip
to power its newest laptop and desktop devices. Spurring this rapid expansion of ARM
devices into new markets is the adoption of a more peripheral based design that attaches
a number of coprocessors and accelerators to the System-on-a-Chip (SoC). ARM has
also adopted a System-Level Cache to serve as a shared cache between the CPU-cores
and peripherals. This design works to alleviate the memory bottleneck issues that exist
between data sources and the accelerators, allowing higher speed communication and
increased performance.
If the market share of ARM processors in desktop and laptop systems continues
to increase, attackers can be expected to pay more attention to ARM and to explore more
of its vulnerabilities. While extensive research has been conducted on exploring
and securing microarchitectural side channels on Intel’s x86 systems, far less research
has been focused on the ARM architecture. Furthermore, as mobile OSes tend to deny
low level control over the hardware, most vulnerabilities are usually within non-essential
APIs [190, 87, 54, 39, 103, 194] and are rapidly patched. ARM designers must be careful
to ensure that their designs are not vulnerable to malicious attacks when exposed to a

full fledged operating system, where OS developers are able to exert far fewer restrictions
on potential attacker activities.
In this paper, we present an in-depth security study on recent personal computing
devices (e.g., mobile phones and laptops) equipped with ARM processors with the
recent DynamIQ [110] design. Unlike previous designs that only share cache within
core clusters, these devices contain multiple levels of cache and share the last level
cache with other core clusters and accelerators (e.g., graphics processing unit). Unlike
x86 processors, these ARM devices utilize heterogeneous core architectures, different
caching policies, and advanced energy aware scheduling to increase performance and
battery life. We endeavor to examine whether those advancements (e.g., new cache
architectures, the tight integration of accelerators, etc.) make the ARM platform more
difficult to attack compared with x86 platforms.
Specifically, we focus on investigating cache occupancy channels [151], which
continually monitor shared cache activities, to fingerprint websites. We design a series
of microbenchmarks to better understand how ARM system behaviors (e.g., energy
aware scheduling, core selection, and different browsers) affect the cache occupancy
channels. Based on our preliminary study, we further optimize the attack for these
new ARM cache designs and consider multiple different browsers, including Chrome,
Safari, and Firefox. The redesigned attack significantly reduces the attack duration
while increasing accuracy over previous cache occupancy attacks. Furthermore, we
introduce a novel GPU contention channel in mobile devices, which can achieve similar
accuracy as the cache occupancy channel. To evaluate the proposed attacks, we conduct
a thorough evaluation on these attacks across multiple devices, including iOS, Android,
and MacOS with the new, ARM-based, M1 MacBook Air. The experimental results
show that the System-Level Cache based website fingerprinting technique can achieve
promising accuracy in both open (up to 90%) and closed (up to 95%) world scenarios.
Overall, the main contributions of this work are summarized as follows:

• An examination of the system level cache within new ARM SoCs that utilize the DynamIQ design principle, especially how different components and software scheduling affect cache behaviors.

• A thorough evaluation of the cache occupancy side channel attack on Android, iOS, and MacOS platforms implemented in both native and JavaScript attack vectors.

• An analysis of JavaScript engine memory management and how it impacts attack effectiveness.

• The discovery of a new GPU side channel attack that can be utilized to fingerprint user behaviors on MacOS and Android.

The rest of this chapter is organized as follows: Section 4.2 provides necessary
background information. Section 4.3 presents the threat model and discusses the unique
challenges that the ARM architecture creates for attackers in a shared cache occupancy
attack. Section 4.4 details our system design and Section 4.5 describes our experimental
setup. Section 4.6 analyzes our findings and Section 4.7 surveys related works. Finally,
Section 4.8 concludes the paper.

4.2 Background
4.2.1 Caching and Side-Channel Attacks
Modern computer systems utilize a tiered memory system to enhance their
performance, from the smallest and fastest (i.e., L1) to larger and slower (e.g., L2
and L3). Two important distinctions in caching are exclusive and inclusive caching.
Inclusive caching guarantees that any memory address that is included in a cache tier
is also present in the cache tiers below it. For example, a value in the L1 cache is
also present in the L2 and L3 caches. By contrast, an exclusive caching policy ensures
that items are only present in one level of the cache (e.g., an item in the L1 cache is
not present in the L2 or L3 cache). Both caching policies have various pros and cons;
Intel x86 processors mostly employ inclusive caching, while recent ARM
processors tend to utilize exclusive caching policies.

As portions of the cache are shared between all processes, the cache has been widely
exploited for side channel attacks. By determining whether specific memory is in the
cache (e.g., by timing its access time), attackers can infer information about the victim.
The ‘prime+probe’ attack [106, 127] attempts to identify vulnerable data locations that
indicate specific program flows. With a high resolution timer and a predictable program,
cache-based side channel attacks allow attackers to extract private information such as
encryption keys.
Cache Occupancy Channel. Shusterman et al. [151] suggested two versions of
the cache occupancy channel, cache occupancy and cache sweeping. In cache occupancy,
they designated a sample rate (every 2ms) and accessed the entire buffer. If the buffer
is accessed faster than 2ms, the total time to access the buffer is recorded. If the
access takes longer than 2ms, a miss is recorded. In cache sweeping, the cache buffer is
continually accessed and the number of full ‘sweeps’ in each sampling period is recorded.
At the beginning of each sample period, the system starts accessing the cache from the
first location. They demonstrated that such techniques can be used for robust website
fingerprinting in x86 systems.
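To make the two measurement styles concrete, a minimal JavaScript sketch is shown below; the buffer size, stride, and sampling period are illustrative assumptions rather than the exact parameters used in [151].

```javascript
// Minimal sketch of the two measurement styles described above (sizes are illustrative).
const SAMPLE_MS = 2;                                   // sampling period from the description above
const buf = new Int32Array((8 * 1024 * 1024) / 4);     // buffer sized roughly to the last level cache
const LINE = 16;                                       // 16 Int32s = one 64 byte cache line

// 'Cache occupancy': time one full pass over the buffer each sampling period.
function occupancySample() {
  const start = performance.now();
  for (let i = 0; i < buf.length; i += LINE) buf[i]++;
  const elapsed = performance.now() - start;
  return elapsed <= SAMPLE_MS ? elapsed : -1;          // -1 marks a 'miss' (pass took longer than 2ms)
}

// 'Cache sweeping': count how many complete passes ('sweeps') fit in one sampling period.
function sweepSample() {
  let sweeps = 0;
  const end = performance.now() + SAMPLE_MS;
  while (performance.now() < end) {
    for (let i = 0; i < buf.length; i += LINE) buf[i]++;   // restart from the first location
    sweeps++;
  }
  return sweeps;                                       // fewer sweeps => more contention from the victim
}
```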

4.2.2 Consumer ARM System Design


Unlike x86 systems which utilize homogeneous core designs in their processors,
consumer ARM devices (as opposed to ARM based server platforms which are out of
the scope of this work) differ greatly and utilize a heterogeneous architecture.
ARM big.LITTLE and DynamIQ. In ARM, the big.LITTLE design was
first developed to overcome the battery limitation in mobile devices. The big.LITTLE
architecture consists of a SoC made from two discrete computing clusters, one low
power group of cores and one high power group [107]. With a number of new scheduling
techniques, the architecture allows the mobile OS to utilize high and low power cores
for different tasks to extend battery life. In ARM, the cache system is also redesigned.
Instead of having a private and shared cache architecture with an identical size across
all cores, big.LITTLE utilizes differently sized caches, wherein the high performance

Figure 4.1: Overview of ARM's DynamIQ architecture featuring heterogeneous processor cores organized into high (big) and low (LITTLE) performance clusters, each with per-core L1 caches and cluster L2 caches joined by the Dynamic Shared Unit. The CPU clusters and accelerators (GPU, ISP, and DSP) are all connected to a shared system level cache.

cores have access to larger L1/L2 caches than their lower performance counterparts.
As the L2 caches of the different core clusters are not shared between clusters, a large
amount of cache coherency traffic is necessary to facilitate switching tasks between the
high and low performance cores, resulting in suboptimal performance.
To overcome this performance limitation, a newer system ‘DynamIQ’ [110] was
developed for ARM. The DynamIQ system allows greater modularity and design freedom
than the original big.LITTLE system. DynamIQ allows the processor designers to
create multiple clusters of heterogeneous processors (instead of just two), and employs
a shared L3 cache to improve computational performance between processor clusters,
as shown in Figure 4.1. Our work explores the potential security vulnerabilities in this
shared cache architecture.
Accelerators. Due to the explosive popularity of machine learning applications
in image and signal processing domains, mobile devices have begun to require a low
power method for executing neural network inference functions. To resolve this issue,
current mobile devices make use of a number of accelerators or co-processors to enable
advanced functionalities within their energy budget. Recent versions of Apple’s custom
A series chips, Qualcomm’s Snapdragon, and Samsung’s Exynos chips have begun to
increase their reliance on accelerator peripherals. Those chips include dedicated digital
signal processors, image signal processors, motion co-processors, neural processing units,
and graphics processing units.
The inclusion of numerous accelerators creates a major system design issue.
To utilize a co-processor, it must be supplied with data and a set of instructions to
operate on. The co-processor must then complete its calculations and return the data
to the main processor. In a non-integrated SoC, communication with co-processors
must take place over a bus, and this can severely limit performance speedup. Nvidia
has attempted to resolve part of this problem on x86 with GPUDirect [70], allowing
for direct transfer of data to the GPU without the CPU. To speed up co-processor
performance in ARM, the DynamIQ system utilizes a system level cache that is shared
with these accelerators. ARM calls this technology cache stashing [108], which allows

tightly coupled accelerators (such as GPUs) to directly access the shared L3 cache and
in some cases directly access L2 caches.

4.2.3 Website Fingerprinting and Timer Restrictions


Website fingerprinting attacks identify the websites that a user visits. Usually
this involves training a classification system to distinguish a series of sensitive websites
that the attacker is interested in. The motivations for website fingerprinting can range
from a desire of learning information about a target (e.g., political views, health issues,
and gambling activity) to the construction of a user profile for advertisement tracking.
Typically, website fingerprinting attacks involve an attacker that observes encrypted
network traffic and attempts to classify the user’s activities through features extracted
from the packet stream (e.g., timing, packet size, and packet order) [35, 69, 81, 130, 139].
However, such attacks require access to the network traffic of the victim. To sidestep this
requirement, researchers have identified that the action of downloading and rendering a
website inevitably leaves a trace in the CPU and cache activities of the victim system,
which can be monitored via a local side channel to identify the victim’s website visiting
activities [115, 151].
Motivated by high profile side channel attacks like Spectre [93] and Melt-
down [105] that utilize the JavaScript performance.now() command to perform
nanosecond resolution timing measurements, browser and mobile operating system
designers have worked to limit access to system APIs and high resolution timer resources.
Specifically, in response to the Spectre and Meltdown attacks, browser manufacturers
have greatly reduced the precision of the
performance.now() counter [133] to between 50 microseconds and 1 millisecond. With
the typical difference between cache misses and hits being defined in 10s of nanoseconds,
this resolution is insufficient to successfully launch most side channel attacks.
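As a quick illustration of this limitation, the effective timer granularity can be estimated empirically with a few lines of JavaScript (a rough sketch, not code taken from the attack itself):

```javascript
// Rough sketch: estimate the smallest observable increment of performance.now().
function timerResolution(iterations = 100000) {
  let minStep = Infinity;
  let last = performance.now();
  for (let i = 0; i < iterations; i++) {
    const now = performance.now();
    if (now > last) {
      minStep = Math.min(minStep, now - last);
      last = now;
    }
  }
  return minStep;   // typically tens of microseconds to 1ms on current browsers,
                    // orders of magnitude coarser than a single cache hit/miss difference
}
```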

4.3 Threat Model and Challenges in ARM
4.3.1 Threat Model
This work studies the ability of an attacker to fingerprint a user’s website
browsing activity via a low frequency contention channel in either the shared cache or
the GPU of an ARM SoC. The attacker is motivated to track the user’s web activity for
some malicious purposes, such as to better identify the victim’s interests for targeted
advertising or to covertly determine sensitive information (e.g., medical condition,
sexual/political preferences, etc.) for the purpose of discrimination or blackmail. We
consider two typical scenarios in website fingerprinting: (1) closed world, where the
victim only visits websites from the list of sensitive websites; and (2) open world, where
users might also visit some non-sensitive websites. To accomplish the fingerprinting task,
the attacker can pre-profile a list of sensitive websites and build a model based on specific
browsers (e.g., Chrome/Firefox/Safari) and devices (e.g., MacBook/Smartphone).
To evaluate the potential threat from this attack, we mainly examine a web-based
attacker who is only capable of delivering JavaScript from a website. We also conduct
an investigation of an app based attacker who is able to trick a user into installing
malware, but we impose additional limits, analyzing how well the attack would function if
the OS clock functions were similarly limited to those of the web browser.1
Web-Based Attacker. The web-based attacker attempts to exploit the cache
occupancy channel in the context of the web browser, delivering a JavaScript file to
the user via a malicious advertisement on a legitimate page or by tricking the user
into visiting a malicious web page. We assume that the attacker is unable to exploit
any vulnerabilities in the browser. Instead, (s)he attempts to create a cross tab attack
scenario, wherein the user leaves the tab with the malicious JavaScript open and
continues to browse other websites in a different tab. The malicious JavaScript in the
background tab continues to run and attempts to monitor the user’s activity. This is

1
Researchers have demonstrated that the high precision timers available to native
programs can produce very accurate attacks. OS developers may move to reduce the
attack surface by reducing the granularity of available timers in the future.

reasonable as all current web browsers enable users to visit multiple websites at the same
time in different browser tabs. While tabs are isolated from each other in software, they
are not necessarily segregated in hardware. Furthermore, the weak attacker is restricted
by the privileges granted to JavaScript, and is subject to the reduced precision timers,
memory management, and scheduling constraints that the browser enforces.
App-Based Attacker. We assume that the app-based attacker is capable
of tricking the user into installing an application or program onto their device that
contains the malicious observation code. The code can be integrated into a benign
application such as a music player, fitness tracker, or social media application, and
is therefore capable of running a disguised process to monitor user activities. Unlike
the web-based attacker, the app-based attacker is not restricted to only JavaScript
and has access to the APIs provided by the operating system, allowing better control
over memory management and scheduling. However, the attacker is not granted any
super-user privileges and does not utilize any exploit to access privileged commands.
Note that, in both scenarios, the application/JavaScript does not necessarily
need to be sourced from a purely malicious entity. Such a tracking service could
be deployed in social media applications to better identify and profile user activities.
Large ad-supported companies like Google or Facebook could also greatly benefit from
deploying a similar script on their webpages, continually monitoring users’ browsing
activities to better target advertisements.

4.3.2 Cache Occupancy Challenges in ARM


Exploiting the occupancy statistics of the last level cache has been studied
with varying degrees of success across x86 systems [151, 140, 48]. In parallel to this
work, Shusterman et al. [150] performed a cursory proof that the cache occupancy channel
could also be applied to ARM systems. We greatly expand upon their work, investigating
a number of different configurations and optimizations across multiple browsers and
devices. To motivate these optimizations, we first describe unique challenges that the
ARM ecosystem presents to the cache occupancy channel.

Table 4.1: Devices and High Power (HP) and Low Power (LP) core configurations utilized in this work.

Device | Core Configuration | High Power L1/L2 | Low Power L1/L2 | System Level Cache
iPhone SE 2 | 2x Lightning (HP), 4x Thunder (LP) | 128KB L1i / 128KB L1D per core, 8MB L2 shared | Unknown L1i / 48KB L1D per core, 4MB L2 shared | 16MB
Android (Google Pixel 3) | 4x Kryo 385 Gold (HP), 4x Kryo 385 Silver (LP) | 64KB L1i / 64KB L1D per core, 256KB L2 per core | 64KB L1i / 64KB L1D per core, 128KB L2 per core | 2MB
MacBook Air | 4x FireStorm (HP), 4x IceStorm (LP) | 192KB L1i / 128KB L1D per core, 12MB L2 shared | 128KB L1i / 64KB L1D per core, 4MB L2 shared | 16MB

ARM Cache Contention. ARM systems differ from common x86 architectures
in multiple aspects. ARM offers exclusive and inclusive caching at different levels, and
utilizes heterogeneous architectures in which multiple different core architectures and
cache layouts may be present on the same chip. Also, each type of core may run
at different frequencies. Those factors increase the difficulty of exploiting the cache
occupancy channel in the ARM architecture. Since the system level cache is the only
cache level shared by all processor cores in ARM, if the scheduler moves the spy and
victim processes between different core types, it can greatly affect the observed cache
profile.
Due to the exclusive nature of the last level cache in ARM, when a process
migrates, the data in its L1/L2 caches will not be present in the last level
cache, but only in the L1/L2 caches of its previous location. Upon migrating a process from
one core type to another, some ARM processors invalidate the entirety of the previous
core’s caches, while others may allow that data to remain until it is evicted.
in an exclusive cache setup, any reads to locations that were in the L1/L2 cache of the
previous location will be serviced from the L1/L2 and have no impact on the L3 cache.
This greatly hinders the cache occupancy channel: while in an inclusive cache, one
could reliably observe L3 occupancy (if the value were removed from L3, it would be
removed from all higher levels), the exclusive cache can serve the value from either the
previous L1/L2 or main memory, giving no indication as to the status of the L3 cache2 .

2
The L3 cache on ARM also maintains the ability to be selectively inclusive if an
item is utilized by more than one core [111], however, the cache occupancy JavaScript
channel does not utilize shared memory and should not experience this behavior.

Exclusive caching also has drawbacks with respect to buffer size. In an x86
system with inclusive caching, the spy process evicting the entire L3 cache would also
remove any data in the L1/L2 caches. Thus, when the victim process accesses data,
it always causes activities in the L3 cache3 . However, in an ARM system, if a victim
process accesses a buffer small enough to fit in the L1/L2 cache, a spy process that is
monitoring the entirety of the L3 cache will never see this activity. While this behavior
might be unnoticed, and even preferable, to a program under normal circumstances, it
is not ideal for the cache occupancy channel. The cache occupancy channel assumes
that continually accessing a large buffer in cache will completely evict any data of the
victim process from the L3. Also, it assumes that any access to memory will bring
data back into the L3, making it observable. Thus, to better suit ARM processors, the
access patterns and buffer sizes for the cache occupancy channel should be carefully
considered.
Browser Differences. Further complicating the applicability of the cache
occupancy channel is the memory management of a web browser. The web-based
attacker must work within the constraints of the JavaScript engine within each web
browser. Today’s popular web browsers, including Google Chrome, Apple Safari, and
Mozilla Firefox, utilize different JavaScript engines. Furthermore, these JavaScript
engines must interact with the system scheduler. Different OSes (e.g., Google’s Android,
Apple’s iOS, and MacOS) likely utilize carefully tuned schedulers to maximize the
performance. Finally, the JavaScript engines of the major browsers will manage memory
in different ways, and the garbage collector of each JavaScript engine will handle memory
management in a way that is not accessible to the attacker. Thus, a one-size-fits-all
approach to cache occupancy fingerprinting is certainly not ideal as each browser may
act very differently, even on the same hardware.

3
In some x86 server CPUs (specifically Skylake-X CPUs from Intel), the L3 is ‘non-
inclusive’, meaning that it is neither fully inclusive nor exclusive. Consumer CPUs from
Intel have not yet adopted this layout.

4.4 Optimizing ARM Cache Occupancy
We first design a series of microbenchmarks to better understand ARM system
behavior. In particular, we investigate how energy aware scheduling, core selection, and
different browsers impact the cache occupancy channel.

4.4.1 Test Devices


We select three commonly utilized devices: an iPhone SE 2 to test iOS, a Google
Pixel 3 for Android, and a MacBook Air 2020 with M1 chip for MacOS. Detailed
information about each device is included in Table 4.1. In particular, the Google
Pixel 3 utilizes a Qualcomm Snapdragon 845 with high and low power cores based
on the Cortex-A75 and Cortex-A55 respectively, making the acquisition of cache sizes
straightforward. The iPhone SE2 utilizes Apple’s A13 Bionic processor which contains
completely custom designed cores from Apple. These cache sizes are mainly acquired
from community microbenchmarking [63] and die image analysis. The MacBook Air
2020 contains Apple’s newest ARM chip, the M1. This contains high and low power
processor cores that are one generation newer than those in the A13. Apple is known
to design custom CPU cores and cache designs to increase their performance over their
competitors. Community benchmarking [63, 64] has found that Apple limits the amount
of the shared cache that single cores can utilize. This has large implications for shared
and system level cache usage, indicating that it may be quite difficult to reliably evict
cache entries.
We employ a Node.js server to serve HTML and JavaScript resources to our
test devices. As JavaScript is single threaded, our JavaScript microbenchmarks run
within a web worker4 context.

4
Web workers were designed to facilitate background processing off of the main UI
thread, allowing for complex computation to take place in the background while keeping
a website responsive.
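For illustration, a minimal worker setup of this kind might look as follows; the file name, message format, and measureOneWindow() helper are assumptions used only to sketch the structure.

```javascript
// main.js: spawn the measurement loop off of the main UI thread (names are illustrative).
const worker = new Worker('spy.js');
worker.onmessage = (e) => console.log('collected', e.data.length, 'samples');
worker.postMessage({ cmd: 'start', windowMs: 4, durationMs: 8000 });

// spy.js: run the microbenchmark/spy loop and return the trace when finished.
onmessage = (e) => {
  if (e.data.cmd !== 'start') return;
  const trace = [];
  const end = performance.now() + e.data.durationMs;
  while (performance.now() < end) {
    trace.push(measureOneWindow(e.data.windowMs));   // hypothetical per-window measurement
  }
  postMessage(trace);
};
```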

4.4.2 Cache Access Pattern
Modern ARM processors utilize cache prefetchers to learn data access patterns
and bring data into the cache beforehand. To accurately measure the cache performance
of a device, it is necessary to develop a cache access pattern that defeats these prefetchers.
While exact prefetching algorithms are closely guarded secrets, current systems broadly
utilize two types of prefetcher, the next line prefetcher and the stride prefetcher.
The next line prefetcher exploits spatial locality, assuming that the processor
will want to access the next data line and therefore fetches it from memory. The stride
prefetcher actively learns patterns in data access and fetches the data based on the pattern.
For example, the stride prefetcher observing that a program accesses every 10th element
of an array will begin to bring future elements into the cache before they are requested.
It has been demonstrated that the stride prefetcher is limited in recognizing
patterns within memory pages and can only keep track of a certain number of patterns
before the hardware pattern matching is exhausted [51]. To evade the two prefetchers,
we follow a similar access pattern to that of [51]. We create a large array of buffers
which spans multiple memory pages. We then access the first line of every page, then
the third line, then the fifth line, etc. By accessing every other line, we avoid any
impact from the next line prefetcher. By accessing one item from each buffer before
looping back to the first buffer, we exhaust the ability of the stride prefetcher to learn
a pattern.
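A minimal sketch of this prefetcher-evading access order is shown below; the page count, page size, and cache line size are assumptions chosen only to illustrate the pattern.

```javascript
// Sketch of the prefetcher-evading access order (all sizes are illustrative assumptions).
const PAGE_BYTES = 4096;                   // typical memory page
const LINE_BYTES = 64;                     // typical cache line
const LINES_PER_PAGE = PAGE_BYTES / LINE_BYTES;
const INTS_PER_LINE = LINE_BYTES / 4;

// A large array of page-sized buffers spanning many memory pages.
const pages = [];
for (let p = 0; p < 512; p++) pages.push(new Int32Array(PAGE_BYTES / 4));

// Access the first line of every page, then the third line, then the fifth, and so on.
// Skipping every other line defeats the next line prefetcher, and touching only one line
// per page before moving on exhausts the stride prefetcher's pattern tracking.
function sweepOnce() {
  for (let line = 0; line < LINES_PER_PAGE; line += 2) {
    const offset = line * INTS_PER_LINE;
    for (const page of pages) page[offset]++;
  }
}
```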

4.4.3 Foreground vs. Background Activity


With an access pattern that can largely mitigate the influence of the prefetchers,
we then design a microbenchmark to check cache behavior differences between foreground
and background activity. Specifically, we seek to understand whether a tab being in the
foreground (likely running on a higher performance core) or in the background (likely
running on a lower performance core) significantly affects the behavior of memory
accesses.

Figure 4.2: Google Pixel 3 Cache Average Memory Access Time (foreground and background access time in ns versus buffer size in MB).

To this end we create a large buffer and access increasingly large portions in the
prefetcher thwarting manner described previously, and record the required time in each
iteration. We then normalize the access times with respect to the number of memory
accesses to better understand the cache sizes and memory management. To assess
background activity, we run the script in a background tab while the foreground tab is
set to www.google.com. We also find that writing to the accessed buffer (e.g., incrementing
a counter stored at each array location) increases the consistency of the experiments.
This can be attributed to a more complex instruction stream reducing the amount of
optimization and/or reordering that can occur, and thereby better exposing the cache
sizes.
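In outline, the microbenchmark can be expressed as the following sketch; the size range, repetition count, and the simplified strided access here are assumptions, and the real benchmark uses the prefetcher-evading order described above.

```javascript
// Outline of the microbenchmark: time accesses over increasingly large buffers and
// normalize to ns per access (sizes and repetitions are illustrative assumptions).
function averageAccessTimeNs(sizeBytes, reps = 10) {
  const buf = new Int32Array(sizeBytes / 4);
  const STRIDE = 16;                                        // one 64 byte line per access (simplified)
  const accesses = Math.floor(buf.length / STRIDE) * reps;
  const start = performance.now();
  for (let r = 0; r < reps; r++) {
    for (let i = 0; i < buf.length; i += STRIDE) buf[i]++;  // write to improve consistency
  }
  return ((performance.now() - start) * 1e6) / accesses;    // normalized ns per access
}

// Sweep buffer sizes (e.g., 0.5MB to 16MB) and record the normalized timing curve.
const curve = [];
for (let mb = 0.5; mb <= 16; mb += 0.5) {
  curve.push([mb, averageAccessTimeNs(mb * 1024 * 1024)]);
}
```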
Figure 4.2 shows the result for the Google Pixel 3. We observe a large difference
between the behavior of the foreground and background tabs. Cache accesses in the
background take about 5-10 times longer than those in the foreground. Also, the shape
of the distribution is different, clearly demonstrating that the higher and lower power
processor cores behave differently. The iPhone SE2 also demonstrates very different
foreground and background cache behavior, as shown in Figure 4.3. Background accesses
are nearly 10x slower than foreground accesses, and the background memory access time

Figure 4.3: iPhone SE 2 Cache Average Memory Access Time (foreground and background access time in ns versus buffer size in MB).

curve is significantly different from the foreground access curve. The foreground curve
experiences multiple sharp increases in cache access time, indicating that multiple levels
of cache are present (e.g., L1, L2, L3), while the behavior of the background process
shows far less distinguishable increases in timing.

4.4.4 Browser Memory Management


In a desktop operating system like MacOS, major browsers (e.g., Google Chrome,
Apple Safari, Mozilla Firefox) typically utilize their own rendering and JavaScript engine.
Also, the M1 Macbook Air is the first device running a desktop/laptop operating system
utilizing a heterogeneous processor. We therefore examine cache behavior on the M1
MacBook Air across these major browsers.
Figures 4.4, 4.5, and 4.6 show the results for different browsers. Most notably,
Apple’s Safari is the only browser that seems to take advantage of the heterogeneous
cores with a 10x slowdown in access speed and a noticeably different timing pattern
for the background tab. It indicates that the foreground and background process
were impacting different caches (the different cache architectures of the high vs. low
power cores). Both Google Chrome and Mozilla Firefox seem to maintain the same

Figure 4.4: M1 MacBook Air Cache Average Memory Access Time in Chrome (foreground and background access time in ns versus buffer size in MB).

Figure 4.5: M1 MacBook Air Cache Average Memory Access Time in Safari (foreground and background access time in ns versus buffer size in MB).

Figure 4.6: M1 MacBook Air Cache Average Memory Access Time in Firefox (foreground and background access time in ns versus buffer size in MB).

access speed for their respective foreground and background processes, indicating that
background tabs are not relegated to the low power cores.
We also observe that the overall shape of the timing curves for cache accesses is
unique to each browser, indicating that even though the access pattern was the same,
the memory allocation algorithms for each JavaScript engine are vastly different. Thus,
understanding how these allocation strategies affect cache timing can greatly increase
the accuracy of a potential cache occupancy attack. Specifically, when searching for an
optimal buffer size used in the cache occupancy attack, we expect that sharp increases
in memory access time indicate a buffer overflowing a cache level, and therefore presents
a suitable target size to begin testing.
Figures 4.4, 4.5, and 4.6 also show that each browser has different locations
for the increases in memory access time. This can be attributed to the differences in
JavaScript engines, which utilize different interpreters and compilers. Also, JavaScript is
a prototype based language, and there is a variable amount of overhead for creating any
type of buffer or array (e.g. an Int32Array will contain extra bytes describing the usage).
Furthermore, the garbage collection system in a JavaScript engine prevents users from
directly controlling their memory allocations. Current garbage collection algorithms are

designed to reduce memory fragmentation and reclaim / reduce the memory footprint
of the programs running. This step is frequently referred to as a ‘compact’ step and
many algorithms will physically copy the memory to a new (different) location without
any warning.
In addition, each JavaScript engine utilizes a unique allocation strategy and thus
allocates different amounts of memory for the same size object. We use the developer
tools within Mozilla Firefox and Google Chrome to examine this in a more fine grained
manner. We find that a 1,024 element Int32 array in Google Chrome utilizes 4,220
Bytes as opposed to an expected 4,096, an excess of 124 Bytes, while a single element
array utilizes 136 Bytes, an excess of 132 Bytes. In Firefox, these same arrays utilize
4,224 Bytes and 96 Bytes, respectively. These differing overheads would require that an
attacker design specific code for each browser.
In summary, the memory management of the JavaScript engine has a large effect
on the actual allocations and stability of memory addresses utilized in a browser based
JavaScript attack. The combination of these effects with the aforementioned issues of
prefetchers, differing cache sizes with heterogeneous core designs, and exclusive cache
policies greatly deteriorate the ability of an attacker to exploit a cache occupancy
channel.

4.5 Attacks on ARM


In this section we present our optimizations to the cache occupancy channel
for various ARM devices. To determine the effectiveness of various modifications
to the channel, we design a robust data collection setup with the Appium [15] and
Selenium [147] frameworks to control our iOS, Android, and MacOS devices.

4.5.1 Setup
Data Sets. To monitor the accuracy of the cache occupancy channel, we utilize
an abbreviated open world dataset, which consists of multiple accesses to sensitive and
non-sensitive websites. It marks all non-sensitive websites as a single class, regardless

of domain. Particularly, we utilize a dataset of 1,500 website accesses, containing 100
accesses to each of the top 10 Alexa websites (i.e., sensitive websites) and 1 access to 500 other
websites not within the Alexa top 100 (i.e., non-sensitive websites). To prevent any
biasing of the dataset, we generate a random order for these 1,500 accesses and then
utilize the same order for every experiment. We believe this randomization is important,
as previous works do not discuss the access order. If all websites are visited in the
same order repeatedly, it might lead to invalid accuracy data when dealing with a cache
channel. Unlike network based fingerprinting attacks, the CPU cache may retain some
of its state between website accesses causing the machine learning system to identify
incorrect features and boost the accuracy of the test. Note that this abbreviated dataset
is used in this section to optimize the side channel attacks on ARM. In the next section,
we conduct a thorough evaluation using a much larger dataset.
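For clarity, the abbreviated dataset and its fixed random access order can be sketched as follows; the shuffle shown here is a generic Fisher-Yates, and the exact generation procedure is an assumption.

```javascript
// Sketch: build the 1,500-access list (10 sensitive sites x 100 visits + 500 single visits)
// and shuffle it once; the resulting order is saved and replayed in every experiment.
const accesses = [];
for (let site = 0; site < 10; site++)
  for (let visit = 0; visit < 100; visit++) accesses.push({ site, sensitive: true });
for (let site = 0; site < 500; site++) accesses.push({ site, sensitive: false });

// Generic Fisher-Yates shuffle (performed a single time, then reused for all runs).
for (let i = accesses.length - 1; i > 0; i--) {
  const j = Math.floor(Math.random() * (i + 1));
  [accesses[i], accesses[j]] = [accesses[j], accesses[i]];
}
```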
Machine Learning Approaches. We evaluate the performance utilizing
multiple supervised learning algorithms. Specifically, we utilize the Rocket [53] transform
paired with Ridge regression and a convolutional neural network. We rely on similar
hyperparameters to the original cache occupancy paper [151] as our starting point. All
classifiers are trained and tested with a cross validation strategy, wherein we utilized
90% of the data for training, and 10% of the data for testing. We report the average of
5 rounds of training and testing.

4.5.2 Optimizing Cache Occupancy Attack


We have demonstrated in Section 4.4.3 that, unlike previous studies on homogeneous
CPU architectures, the accesses in background tabs can be nearly 10x slower.
This indicates that the system may never actually be able to access the entire buffer in
the allotted time period. Furthermore, browser manufacturers may continue to decrease
the granularity of their timing sources, resulting in less useful information.
Examining the Snapdragon 845 processor in the Google Pixel 3, we find that
the low power cores are based off of the Cortex A55 design from ARM and that the
Snapdragon 845 processor has been configured to utilize 2MB of system level cache.

Using the information from Figure 4.2, accessing the cache takes about 60ns at a 2MB
buffer size. Since the Snapdragon 845 employs a 64 Byte cache line size, to avoid
prefetching, we should access every 32nd integer in our 2MB buffer. As the buffer can
hold ≈500,000 integers, this results in ≈16,000 accesses. At 60ns per access, this
equates to just under 1ms. While the Snapdragon 845 has configured the system level
cache to be 2MB, the Cortex A55 supports up to 4MB of shared cache [109], and the
accesses may take almost 2ms with no background activity, and will almost certainly
take more than 2ms if the processor is performing another task. Thus, if the system
described in [151] is used without modification, every trace would be nearly identical,
with only overlong accesses, and no identification would be possible. To this end, we
propose a series of modifications that work for devices regardless of their access speed
to cache. This enables the attacker to adjust the buffer size for the device and not have
to worry about adjusting the sample rate if the device happens to be very slow.
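For reference, the back-of-the-envelope estimate above (≈16,000 accesses at 60ns each) works out as follows:

```javascript
// Reproducing the estimate above for the Pixel 3's 2MB system level cache.
const bufferBytes = 2 * 1024 * 1024;
const ints = bufferBytes / 4;                    // ~524,288 Int32 elements
const accesses = ints / 32;                      // every 32nd integer (every other 64B line) => ~16,384
const nsPerAccess = 60;                          // background access time at a 2MB buffer (Figure 4.2)
const totalMs = (accesses * nsPerAccess) / 1e6;  // ~0.98ms, just under the 1ms window
```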
Modifications. The first modification entails recording the number of cache
accesses within the time frame, instead of the time to complete accesses. This is
advantageous for a few reasons. First, this system is far less affected by changes in the
accuracy of the clock. The system will always record the number of actual cache accesses,
a number that is far more fine grained than the time to access the cache.
system performance on slower devices, we also increase the access window time to 4ms
to increase the number of possible accesses. With these initial modifications, we achieve
75% open world accuracy in the abbreviated 10 site test (Section 4.5.1) on the Google
Pixel 3.
With the first enhancement, the system checks the number of total cache accesses
in the time period. It then needs to frequently check the clock to see if the time period
has been completed. We find that the Android system only completes about 2,500
accesses per 4ms window, which is far lower than the original predicted value of ≈16,000
accesses per 1ms window. We find that when profiling the page, the vast majority of the
code runtime is taken by the script performing the performance.now() call to check
whether the time window has elapsed. Since the ARM last level caches are exclusive, the

attack might have several issues if the cache occupancy system continually accesses
the same beginning elements of the buffer without ever accessing the entirety of the
buffer. In the worst case, if the number of accesses can fit in the L1 and L2 cache,
the script may never actually impact the L3 cache. Therefore, it can only observe
minimal information about the L3 occupancy, and thus performs poorly in the website
fingerprinting. If the accesses overflow into the L3 but do not fill it, the system will
perform sub-optimally as it is unable to fully observe the L3 cache. Furthermore, it will
continue to observe the same portion of the L3 cache, which may not provide useful
information.
We thus further employ two enhancements. The first enhancement accesses the
buffer in a circular fashion: if the script only completes 2,500 accesses in the time
window, it will access the 2,501st element at the beginning of the next window. It
only returns to the first element once all elements have been visited. This ensures
that the buffer eventually fills the L3 cache and that sequential observations observe
different parts of the cache. We find that this technique increases the accuracy of the
10 site open world dataset to about 83%. The next enhancement is to decrease the
amount of time that the script spends checking the time. Instead of checking after every
access, we check after every 20 cache accesses. This enhancement (without circular
accesses) increases the accuracy to 84%. We then combine both enhancements and
further increase the accuracy to 86%. We present a thorough evaluation in Section 4.6.
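Taken together, the modified per-window measurement can be sketched as follows; the window length, 20-access batching, and circular walk follow the description above, while the remaining constants and names are illustrative assumptions.

```javascript
// Sketch of the modified spy loop: count accesses per 4ms window, walk the buffer
// circularly across windows, and only consult the clock every 20 accesses.
const WINDOW_MS = 4;
const CHECK_EVERY = 20;
const STRIDE = 32;                                    // every 32nd Int32 = every other 64B line
const buf = new Int32Array((2 * 1024 * 1024) / 4);    // 2MB buffer for the Pixel 3
let pos = 0;                                          // persists across windows => circular access

function measureWindow() {
  let accesses = 0;
  const end = performance.now() + WINDOW_MS;
  while (true) {
    for (let k = 0; k < CHECK_EVERY; k++) {           // batch accesses between clock checks
      buf[pos]++;
      pos += STRIDE;
      if (pos >= buf.length) pos = 0;                 // wrap only after the whole buffer is visited
      accesses++;
    }
    if (performance.now() >= end) return accesses;    // higher count => less contention from the victim
  }
}

// One trace is a sequence of windows recorded while the victim tab loads a page.
function collectTrace(windows = 2000) {               // 2,000 x 4ms = 8 seconds of observation
  const trace = [];
  for (let i = 0; i < windows; i++) trace.push(measureWindow());
  return trace;
}
```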

4.5.3 Novel GPU Channel


The DynamIQ CPU design not only adds the L3 shared cache between all of the
processing cores within a cluster, but also allows for the L3 cache to be shared with any
other peripherals contained within the SoC. This means that peripherals/accelerators
like the Graphics Processing Unit (GPU), Digital Signal Processor (DSP), and Image
Signal Processor (ISP) are all able to impact the shared cache. In particular, the GPU is
heavily utilized to display a web page to the user. Newer web browsers employ hardware
acceleration when rendering and displaying web pages. Elements like HTML5 Canvas,

WebGL or WebGL2 animations, and videos are also usually hardware accelerated. Thus,
we endeavor to explore whether the GPU and shared cache architecture of current ARM
DynamIQ can be utilized to create a website fingerprinting side channel.
It is challenging to exploit a GPU cache occupancy channel. WebGL2 and basic
HTML5 canvas elements only update at a low frequency of 60Hz. While these sampling
rates can be increased, working with the canvas element in a background tab further
increases the complexity and overhead. Also, it is not straightforward to determine the
amount of memory that a GPU process consumes. GPU programming within JavaScript
is mainly designed around graphical interfaces and smooth animations. An ideal attack
should instead minimize useless image display and focus primarily on exploiting
the side channel. Therefore, we utilize a JavaScript library called GPU.js [71], which is
designed to enable the creation and deployment of GPU computational kernels from
JavaScript to WebGL compatible code. It can reduce the amount of boilerplate code
and other timing elements for an attacker.
We thus create a two dimensional buffer of data and repeatedly utilize the GPU
to process this buffer with different mathematical kernels.
Unlike our improved cache occupancy channel, accelerator based channels cannot
provide us with high granularity measurements. The accelerator based workload requires
that the CPU first declare the work, pass it to the accelerator (GPU), and wait until
the GPU completes its task. This means that the sizing and complexity of the kernel
task must be tuned for the optimal performance for fingerprinting.
To understand the performance of different settings, we create a spy script similar
in nature to the cache occupancy spy script. The GPU script reports the number of
kernel executions that it can complete in the monitoring time period. We conduct
experiments using multiple kernels, including matrix multiplication and dot product
computation. We find that a kernel that sums each row of the input array
delivers far superior performance. This might be due to massively decreased complexity
and time in this GPU kernel: the reduced complexity enables more possible kernel
executions, which in turn leads to better observability of GPU usage. We also check the

optimal size for the computation. A small size might result in mainly observing GPU
startup overhead, while a large size results in too much time spent in GPU computation,
decreasing observation granularity. We find that an overall compute array of between
20KB (Android) and 40KB (MacOS) organized into 5x4KB or 10x4KB arrays works
best. Finally, we examine the observation window, but limit our experiments to a
maximum 10 second duration to maintain a realistic approach. Again, we find disparate
sizes depending on platforms. The Google Pixel 3 provides the best performance
with 500 20ms observations and the M1 MacBook Air achieves its best results with
1,000 10ms observations. We believe this is caused by the speed of the processors:
the SnapDragon 845 functions much slower and thus requires more time to manifest
observable differences in computation performance as opposed to simply observing GPU
overhead.
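A sketch of how such a GPU spy can be expressed with GPU.js is shown below; the row-sum kernel and the 5x4KB input follow the description above, while the kernel code, names, and window handling are illustrative assumptions.

```javascript
// Sketch of the GPU contention spy: repeatedly run a cheap row-sum kernel via GPU.js
// and record how many executions complete per observation window.
import { GPU } from 'gpu.js';    // in a browser/worker the library exposes a global GPU object

const ROWS = 5, COLS = 1024;     // 5 rows x 4KB of 32-bit values (~20KB total, as on Android)
const gpu = new GPU();
const rowSum = gpu.createKernel(function (data) {
  let sum = 0;
  for (let i = 0; i < this.constants.cols; i++) {
    sum += data[this.thread.x][i];
  }
  return sum;
}, { constants: { cols: COLS }, output: [ROWS] });

const data = [];
for (let r = 0; r < ROWS; r++) data.push(Array.from({ length: COLS }, () => 1));

// Count kernel executions in one observation window (20ms on the Pixel 3, 10ms on the M1).
function gpuWindow(windowMs) {
  let executions = 0;
  const end = performance.now() + windowMs;
  while (performance.now() < end) {
    rowSum(data);                // contention from the renderer's GPU work slows this down
    executions++;
  }
  return executions;
}
```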

4.6 Evaluation
In this section, we provide detailed performance results for the cache occupancy
and GPU contention channels. Unlike the previous section which utilized open world
testing of 10 sensitive sites and 500 open world sites, this section utilizes a much larger
dataset containing 100 accesses to each of 100 sensitive sites (Alexa Top 100), and 1 access
to 5,000 other websites. We report both closed (only the sensitive websites) and open
(all websites) world accuracy. As before, to remove any bias from the experimentation,
the collection process is conducted via Appium or Selenium automation of the target
platform. The list of 15,000 total website accesses was randomized to ensure that there
were no unintentional ordering effects and the same random access order was utilized
for each experiment for better comparison.
To compute the accuracy of the fingerprinting, we utilize 10-fold cross validation
with a 90/10 train/test split. We report accuracy for two machine learning algorithms,
a ridge regression with a minirocket [53] transform and a minirocket transform with a
1D CNN (configuration presented in Appendix Table C.1). The ridge regression with
minirocket transform is a recent advancement in time series machine learning and is
able to achieve results close to those of the 1D CNN in less than a minute.

Table 4.2: Accuracy for web-based cache occupancy website fingerprinting on multiple ARM devices

Device | CPU | Browser | Closed World: Ridge Regression | Closed World: CNN | Open World: Ridge Regression | Open World: CNN
Macbook Air | Apple M1 | Chrome 89 | 95.6 | 92.2 | 88.1 | 89.8
Macbook Air | Apple M1 | Safari 14 | 94.3 | 89.4 | 78.4 | 85.1
Macbook Air | Apple M1 | Firefox 88 | 88.1 | 83.9 | 68.2 | 77.8
iPhone SE2 | Apple A13 | Safari 14 | 80.2 | 75.7 | 65.8 | 72.7
iPhone SE2 | Apple A13 | Chrome 90 | 80.2 | 75.9 | 65.0 | 73.3
Google Pixel 3 | Snapdragon 845 | Chrome 90 | 88.0 | 81.8 | 66.0 | 75.9

Table 4.3: Accuracy for native application cache occupancy website fingerprinting on multiple ARM devices

Device | CPU | Browser | Closed World: Ridge Regression | Closed World: CNN | Open World: Ridge Regression | Open World: CNN
Macbook Air | Apple M1 | Chrome 89 | 92.5 | 85.7 | 84.3 | 85.7
Macbook Air | Apple M1 | Safari 14 | 91.1 | 87.0 | 72.4 | 81.7
Macbook Air | Apple M1 | Firefox 88 | 89.3 | 85.9 | 70.5 | 81.1
iPhone SE2 | Apple A13 | WebKit View | 71.5 | 68.7 | 64.0 | 69.1
Google Pixel 3 | Snapdragon 845 | WebView | 81.9 | 76.3 | 67.7 | 74.1

4.6.1 Web-Based Attacker Results


Table 4.2 presents the accuracy of the web based cache occupancy fingerprinting
experiments. Our approach can achieve impressive results across all of the devices
in the closed world scenario (i.e., 100 sensitive websites), with accuracy ranging from
80%-95% when utilizing the ridge regression classifier.
The open world scenario (i.e., 100 sensitive websites and 5,000 non-sensitive
websites) also demonstrates impressive accuracy. In the open world cases, we find that
the 1D CNN generally performs better than the Ridge Regression classifier. This
behavior is expected as the 1D CNN utilizes multiple convolutional and pooling layers
to extract features from the dataset and learn both spatial and temporal patterns.

Table 4.4: Accuracy for GPU based website fingerprinting on ARM devices

Device | GPU | Browser | Closed World: Ridge Regression | Closed World: CNN | Open World: Ridge Regression | Open World: CNN
Macbook Air | Apple 7 Core | Chrome 89 | 90.5 | 85.3 | 76.6 | 81.4
Google Pixel 3 | Adreno 630 | Chrome 89 | 88.2 | 82.6 | 67.6 | 77.3
We note that the cache occupancy channel performs best on the Macbook Air,
and worst on the iPhone SE 2. This is likely related to the design of both the cache
systems and schedulers. The CPU core designs in the MacBook Air are one generation
newer, and the M1 chip was designed specifically for desktop/laptop workloads and
was likely tuned for multiprocess workloads.
prevent single cores from dominating the cache [64], and the A13 has been discovered
to use part of the shared high performance L2 cache as an extra L2 cache for the low
performance cores [63]. Apple also changes the amount of the cache that the high and
low power cores have access to depending on the DVFS states of the core [63].
To analyze these effects, we conduct experiments with different buffer sizes. The
iPhone SE2 and Google Pixel 3 offer relatively straightforward values. The Google
Pixel 3 reports 2MB of shared cache and we find that a 2MB buffer performs best in
the fingerprinting task. While the iPhone SE2 is unclear about the actual amount of
shared cache provided to the low power cores, we find that a 4MB buffer performs the
best in both tested configurations. Interestingly, this 4MB buffer seems to indicate that
the cache occupancy channel is solely utilizing the L2 cache of the low power cores,
potentially indicating that Apple schedules foreground browser rendering processes
to these low power cores or that the ‘extra’ L2 cache that is shared with the high
performance cores is not exclusively owned by either core type. The Macbook Air,
however, demonstrates vastly different behavior. Specifically, we find that a 4MB buffer
performs best for Google Chrome, a 10MB buffer for Mozilla Firefox, and a 24MB
buffer for Apple’s Safari. As previously mentioned, these differences may be caused by a
number of reasons, including renderers and JavaScript engines. In general, attackers need
to adjust attack strategies based on various factors to achieve good overall performance.
Another possible factor in the reduced performance of the mobile devices vs. the
laptop form factor could be the trend of websites to deliver different pages to different
devices. When a laptop visits a website, it views the full desktop site, which usually contains
much more detailed content than the corresponding mobile version that is served
to mobile devices. The vastly simplified mobile websites may appear more similar to
one another through the cache occupancy channel, resulting in the observed decrease in accuracy.

4.6.2 App-Based Attacker Results


We further investigate whether an attacker that has the ability to run a background
process, but experiences similar limitations in timing granularity to the browser (a
possible security feature in future OS updates), can achieve similar results exploiting the
proposed channels. To this end, we develop platform specific applications to examine
such attacks on each platform.
We create applications for the iOS and Android systems which feature two
processes, one which drives a ‘webview’5 and another which acts as the spy process.
This method has been used to study native side channel performance in website
fingerprinting before [115]. On the Macbook Air, a spy process written in C is launched
alongside the web browser to monitor traffic.
The results are reported in Table 4.3. Overall, we notice an interesting trend.
The buffer sizes that offered the best performance in the web setting still performed best; however, in all
but the Firefox browser, the channel generally performed worse in the native setting.
This reduced performance is likely caused by the idiosyncrasies of the OS scheduler and
how the scheduler was designed.
The scheduler of a mobile phone is tasked with providing the best performance
to the foreground process and imposes strict limitations on background processes,
whereas the scheduler of a laptop/desktop operating system like MacOS should allow
more aggressive scheduling of background processes as they are important to user
satisfaction (severely diminishing the performance of background file sync, application
updates/installs, etc. would be unacceptable). MacOS, specifically, offers a number of

5
Both iOS and Android provide a mechanism called a webview to display web content
to users within an application. The webview functions as a web browser without the
navigation controls. Both iOS and Android webview components are nearly identical
to the system web browser.

different process priorities that have recently been shown to greatly affect which cores a
specific task is executed on [124] and thus mixing native and web browser processes may
result in unexpected scheduling. While the process in a background tab is very likely to
end up on the low power cores, the native process may be scheduled on either depending
on how the operating system interprets its priority/whether it is a user-facing process.
The impact that the OS scheduler has is particularly evident when the perfor-
mance of the native and web-based attacks is examined in the context of different
devices. The MacBook Air (with a more friendly background process scheduler) experi-
ences an average 2.5% drop in closed world accuracy while the mobile devices exhibit an
average 7.4% drop in closed world accuracy. Thus, it is possible that in most cases the
web browser actually provides a more stable attack surface than the native application.

4.6.3 Comparison to Prior Work


The cache occupancy channel has been studied from the perspective of website
fingerprinting attacks before, however, those attacks utilized variable timing windows
and gathered data over the course of 30 seconds [151, 150]. The results in this work
improve upon both the amount of data and attack duration dramatically, utilizing 4ms
attack periods over the course of only 8 seconds. Reducing the total attack time by
75% improves the realism of the attack. It is unlikely that a user will navigate to a
page and not interact with it for 30 seconds, and decreasing this to a more realistic 8
seconds makes the attack far more viable. Furthermore, this work provides the most
extensive study of the cache occupancy channel on ARM to date, examining both native
and web based attacks, providing an in depth discussion of cross-platform accuracy
enhancements, examining multiple MacOS and iOS browsers, and presenting the first
examination of the attack on iOS.
The only direct comparison that can be made is the performance of the closed
world Chrome attack on the M1 chip, wherein this work performs 5% better than the
work in [150]. Our Android performance is also 3% better in the closed world setting,
however, the devices are different.

While it is difficult to directly compare to works done on homogeneous x86
systems such as [151], our open world Safari performance is 4% better than
their best neural network configuration, and the closed world attack is 22% better. One
item that complicates comparison to [151] is their open world data. Their work claims
99% accuracy in delineating between a sensitive and non-sensitive website, which could
indicate significant differences between the open and closed world datasets. Contrarily,
our work combines and randomizes the order of the collection of the open and closed
world datasets to ensure that there are no cross-sample ordering artifacts which might
artificially increase accuracy.

4.6.4 GPU Results


We utilize the same testing setup as the cache occupancy channel to evaluate the
GPU contention channel. We only modify the spy process to utilize the GPU as opposed
to the CPU. While all major browsers support web workers, only Google Chrome on
Android and MacOS allowed for unrestricted access to the GPU in a background web
worker via GPU.js, thus limiting our experiments to these two platforms. While the
GPU channel does not perform as well as the standard cache occupancy channel on
MacOS, the accuracy on Android improves slightly in both the open and closed world
scenarios, as listed in Table 4.4. This difference may be related to the different system
architectures which make up the Adreno GPU vs. the Apple designed GPU found on
the M1 chip.
Overall, these results strongly indicate that the GPU/cache contention channel
is capable of executing website fingerprinting attacks and should serve as a red flag to
device manufacturers. As more accelerators are tightly integrated into the standard
ARM SoC and web technologies rush to enable access (e.g. WebGPU [175]), special
care should be taken to ensure that these additions do not jeopardize user privacy.

4.6.5 Countermeasures
There are several approaches to potentially protect an ARM system from those
contention based side channels. For example, the system can introduce noise to the
measurement channel via extra operations, or manipulate timers and array accesses
via obfuscation such as in Chrome Zero [145]. However, introducing extra noise has
been shown to be ineffective [151] and leads to increased energy usage, which is unacceptable
for mobile devices. Also, Shusterman et al. [150] demonstrated that the protections
of Chrome Zero are largely ineffective and impose significant performance penalties.
Furthermore, browser based defenses cannot thwart App-based attacks.
Another defensive approach for energy restricted devices is to remove process
contention via hardware segmentation. This can guarantee that the processes are
unable to interact with one another. However, it requires complete redesigns of the
operating system scheduler and hardware. In future work, we plan to develop
effective defensive solutions to detect significant contention and large swings in cache
occupancy (similar to [23]) for ARM devices.

4.7 Related Work


In this section, we briefly survey the research efforts in several areas. Specifically,
we conduct a detailed comparison with previous cache occupancy fingerprinting work.
Cache Occupancy. The most similar work is Shusterman et al.'s cache occupancy fin-
gerprinting work [150, 151], which was the first to exploit a cache occupancy channel
for website fingerprinting. While previous work [73] had examined individual eviction
sets, these fine-grained attacks were mitigated by modern browsers limiting timer
resolution. Shusterman et al. [151] proposed that contention over the entire cache may
provide enough information to fingerprint websites on the x86 platform. In parallel
with our work, Shusterman et al. published another work [150] that provides a cursory
examination of the cache occupancy channel on the Apple M1 and a Samsung S21 with
the Chrome browser in a closed world scenario.
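
For context, the basic probe behind such a cache occupancy channel can be sketched in a few lines of JavaScript: the spy repeatedly sweeps a buffer roughly the size of the shared cache and records how long each sweep takes under the coarse browser timer, with victim activity appearing as slower sweeps. The constants below are illustrative placeholders, not the tuned parameters of Section 4.5.2.

    // Illustrative cache occupancy probe; the constants are placeholders.
    const LLC_BYTES = 4 * 1024 * 1024;   // assumed shared-cache size
    const LINE_BYTES = 64;               // assumed cache-line size
    const buf = new Uint8Array(LLC_BYTES);

    function sweep() {
      let sum = 0;
      // Touch one byte per cache line so each access maps to a distinct line.
      for (let i = 0; i < buf.length; i += LINE_BYTES) {
        sum += buf[i];
      }
      return sum; // returned so the JIT cannot optimize the loop away
    }

    // Record how long each sweep takes while the victim page loads; slower
    // sweeps indicate that the victim has evicted parts of the buffer.
    const trace = [];
    const start = performance.now();
    while (performance.now() - start < 8000) {
      const t0 = performance.now();
      sweep();
      trace.push(performance.now() - t0);
    }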

Our work provides a much deeper investigation of the cache occupancy channel
on ARM devices. In addition to Android and MacOS, we also study the iOS platform.
Furthermore, our approach differs from Shusterman's in that we develop a vastly
different method for cache accesses (Section 4.5.2), which increases accuracy on budget
devices with slower processors. We also study the effect of different browsers and their
memory management, demonstrating that simply sizing the eviction buffer based on
the shared cache yields suboptimal results in different browser engines on the same
hardware (Section 4.6.1). In addition, we increase the attack's effectiveness, utilizing
only 8 seconds of observation to identify a website instead of the 30 seconds required
in [150, 151]. Even with nearly 75% less sampling time, our experiments
outperform Shusterman's work by more than 5% in testing on the M1 MacBook Air
with Google Chrome. Finally, we also propose (Section 4.5.3) and test (Section 4.6.4)
the novel GPU-based contention channel and demonstrate that it is nearly as effective
as the cache occupancy channel on ARM SoC devices, raising the alarm on continued
access to SoC accelerator components from JavaScript.
Website Fingerprinting. Website fingerprinting has long been an attractive target
for attackers. As desktop browsers were the original way to browse the web, many
website fingerprinting attacks focused on breaking privacy-enhancing technologies
like HTTPS and Tor through attacks targeting features extracted from the packet
stream [35, 69, 81, 130, 139]. With the rise of mobile devices, more effort has shifted
toward examining them. MagneticSpy [115] examines both JavaScript- and app-based
CPU activity channels that employ the magnetometer. They perform similar open
and closed world examinations (albeit with fewer websites) and demonstrate impressive
fingerprinting accuracy. However, the JavaScript APIs that allowed access to these
sensors have since been removed from Firefox and Safari [148]. Furthermore,
iOS requires that users explicitly grant permission to a website before it is allowed
to access accelerometer data [96]. Several works [102, 183] examine power-based
website fingerprinting on smartphones; however, they require much higher-frequency
sampling and cannot mount the attack from a JavaScript platform. Jana et al. [87]
examined the memory allocations of website traffic, but required privileged access to
process memory data (now removed from standard user access). Spreitzer et al. [155]
utilized the data usage API within Android to fingerprint websites, but this must be
done from a native application.
ARM Attacks. Gulmezoglu et al. [73] built a similar contention-based channel on
ARM devices, but mainly focused on finding contention among specific sets within the
cache ways of the device. Their attack only examines the Google Pixel 5 and only
utilizes native APIs within the system. While the work presents impressive results,
their system relies upon identifying eviction sets within the cache. With a high-
resolution timer this can take a few seconds; however, with the low-resolution timer
available from JavaScript [133], this task would take infeasibly long. Lipp et al. [104] and
Gruss et al. [72] similarly construct memory-based JavaScript attacks, but require either
privileged system calls or higher-resolution timers than are currently available [133].
Timing Attacks from JavaScript. Genkin et al. [66] executed encryption side-channel
attacks from the browser, but utilized WebAssembly and shared array buffers to construct
a high-frequency timer. Oren et al. [126] similarly demonstrated that eviction sets could
be created via JavaScript timers and utilized this to provide a cursory examination of
website fingerprinting (not on ARM). Bosman et al. [29] demonstrated page dedupli-
cation attacks from JavaScript. Each of these attacks requires high-resolution timers
that have since been removed from JavaScript [133]. Schwarz et al. [146] demonstrated
a number of interesting methods to achieve high-resolution timing; however, many of
these techniques have been disabled or hindered within major browsers.
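
The shared-array-buffer timer mentioned above can be illustrated with a short sketch: a dedicated worker increments a shared counter as fast as possible, and the attacker reads that counter in place of a timestamp. This is a minimal illustration only; modern browsers additionally gate SharedArrayBuffer behind cross-origin isolation, which is one reason such timers are no longer freely available.

    // Counter-thread timer sketch (requires SharedArrayBuffer, which modern
    // browsers only expose to cross-origin-isolated pages).
    const sab = new SharedArrayBuffer(4);
    const counter = new Int32Array(sab);

    // Worker source: spin forever, incrementing the shared counter.
    const workerSrc = `
      onmessage = (e) => {
        const c = new Int32Array(e.data);
        while (true) { Atomics.add(c, 0, 1); }
      };
    `;
    const blob = new Blob([workerSrc], { type: 'text/javascript' });
    const worker = new Worker(URL.createObjectURL(blob));
    worker.postMessage(sab);

    // The counter now acts as a fine-grained clock: the difference in counter
    // readings around an operation approximates its duration.
    function timeOperation(fn) {
      const before = Atomics.load(counter, 0);
      fn();
      const after = Atomics.load(counter, 0);
      return after - before;
    }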
GPU Attacks. Lee et al. [98] examined website fingerprinting via shared memory
within the GPU. Frigo et al. [62] executed a number of side-channel attacks from a
mobile GPU; however, these attacks require timing primitives that have been removed.
He et al. [77] uncovered a register leakage within Intel GPUs and exploited it to
identify websites. Naghibijouybari et al. [120] utilized GPU memory allocation APIs
within CUDA or OpenGL to track memory allocations and fingerprint websites. They
did not explore ARM integrated GPUs or execution from a JavaScript environment,
and instead employed a spy program that ran as a native process with full access to
CUDA/OpenGL. Karimi et al. [90] examined a side-channel attack against an ARM
SoC GPU and extracted AES keys by exploiting cache behavior; however, the attack
requires a long execution time and a stable system that is not running other tasks,
and it was not examined from a JavaScript perspective.

4.8 Conclusion
This chapter investigates whether the new ARM DynamIQ system design, specif-
ically the inclusion of a shared last-level cache between all CPU cores and accelerators,
poses a security threat to individuals. We examine the information leakage in the
context of a website fingerprinting attack, demonstrating that a cache occupancy side
channel can be constructed to reliably fingerprint user website activity. We reveal this
security threat on Android, iOS, and MacOS, delving into how the channel responds to
different browser environments and proposing enhancements over previous work. In
addition, we unveil an accelerator-based website fingerprinting channel, showing that
the SoC GPU can be exploited in a contention-based side channel from JavaScript. Our
evaluation results indicate that both channels can achieve high website fingerprinting
accuracy on different browsers in Android, iOS, and MacOS systems in both open and
closed world scenarios.

Chapter 5

CONCLUSION AND FUTURE WORK

This dissertation has examined the security aspects of mobile devices and pe-
ripherals from the perspective of side channels. Specifically, we have identified numerous
side channels that can be utilized both offensively and defensively in the realm of
mobile devices, uncovering novel techniques for stealing user input, defending secure
computers, and invading user privacy.

5.1 Summary and Contributions


In Chapter 2 we investigated mobile phones and the way that users interact
with them. Specifically, we identified that user input almost always displays a unique
animation for each action. We examined how the power utilization of a mobile phone is
highly correlated with its onscreen activity and uncovered that high-speed acquisition of
the phone's power trace can identify the location of animations on the screen. We
developed an attack framework to surreptitiously gather these power traces and classify
the input animations with high accuracy, demonstrating that user passcodes can be
identified with accuracy above 99%.
In Chapter 3 we examined the security of laptop and desktop computers when
they interface with USB flash drives. We identified the threat that flash drives pose
to businesses and secure data and developed a novel timing channel to identify
individual devices. We tested this system across numerous scenarios and devices,
achieving high accuracy in rejecting unknown devices. We also examined methods for
ensuring that the drive remains usable, utilizing small portions of the drive to preserve
unimpeded classification accuracy while the device is used as normal.

Finally, in Chapter 4 we analyzed the current laptop/desktop marketplace, noting
that ARM chips are now being targeted at laptop and desktop systems. We identified
the major architectural changes that ARM SoCs have undergone, noting an increased
number of cache levels and shared caches as well as the tight integration of accelerators
into the memory subsystem. We observed that these new shared caches share some
design elements with the x86 processors that have proven vulnerable to cache contention
side channels, and demonstrated that, with some optimization, the cache contention
channel can be ported to ARM platforms with high accuracy. We further developed a
novel GPU contention channel in the SoC to fingerprint website visits by the user.

5.2 Future Work


In this section we elaborate on future research directions for the work featured
in this dissertation.

• Investigating screen leakage from applications and text input fields. Charger
Surfing demonstrated that animations on the touch screen of a mobile phone can
lead to a leakage channel that allows attackers to identify the location of onscreen
animations. The case study demonstrates significant success in identifying user
passcodes. One barrier in the work is the amount of time and training data
required to create an accurate classifier. An investigation into designing an
automated process to extract animation content from applications that utilize
the system keyboard would be an ideal extension. Further work investigating
the precision of the channel and whether it could be utilized to extract content
from the screen that is not related to animations (e.g., identifying text on the
screen) would also be valuable.

• Extending the timing channel utilized in Time-Print to other forms of flash
memory has the potential to create a difficult-to-forge identification scheme for
more complex devices. While Time-Print only examines USB flash drives, a similar
timing channel may be identifiable in the flash memory used by modern
laptop/desktop computers and mobile phones. If possible, this could serve as another
layer of authentication to ensure that a user is logging in from their own device. If an
attacker can find a way to reliably read the same location from disk (an infrequently
utilized file or system component), this type of channel may also be extended to
work from JavaScript and act as another method for device fingerprinting.

• The ARM cache and GPU contention channels can be extended by examining
a native implementation and by broadening the types of user activity analysis to
potentially identify which applications a user is running. Manipulating the GPU
from JavaScript is far less exact than manipulating the CPU cache; however, further
study into how the GPU is shared between rendering processes would be beneficial.
Interestingly, the push toward machine learning on the web has led to draft standards
like WebGPU [175], which may actually make the GPU contention channel far easier
to realize (see the sketch below). Finally, investigating the other peripherals that share
the last-level cache could be beneficial. Currently, image signal processors, digital signal
processors, and neural processing units all have access to the cache; determining whether
these peripherals can be reached from a JavaScript context is an important next step
toward better understanding the risks of tight accelerator integration into SoCs.
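
As a simple illustration of why WebGPU lowers the bar, the draft API lets an ordinary page or worker request a GPU device with only a few calls. The snippet below performs just this feature check and device acquisition; it follows the draft specification at the time of writing and is not tied to any particular browser's implementation.

    // Minimal WebGPU device acquisition, per the draft specification; any page
    // (or worker) that can run this gains a direct handle to the SoC GPU.
    async function getGpuDevice() {
      if (!('gpu' in navigator)) {
        return null; // WebGPU not exposed by this browser
      }
      const adapter = await navigator.gpu.requestAdapter();
      if (adapter === null) {
        return null;
      }
      // The returned device can allocate buffers and dispatch compute shaders,
      // i.e., exactly the primitives a contention probe needs.
      return adapter.requestDevice();
    }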

BIBLIOGRAPHY

[1] Behind The Charge: A Big Challenge for Hospitals. http://www.mkelements.


com/blog/behind-charge-big-challenge-hospitals.
[2] Briant Park Blog: Solar-Powered Charging Stations Land
in Bryant Park. http://blog.bryantpark.org/2014/07/
solar-powered-charging-stations-land-in.html.
[3] Chargeport Hotel Charging Station. http://www.teleadapt.com/
hospitality-products/powercharging/chargeport.
[4] Evil Maid Attack. https://en.wikipedia.org/wiki/Evil_maid_attack.
[5] Hackers Claim ‘Any’ Smartphone Fingerprint Lock Can Be Bro-
ken In 20 Minutes. https://www.forbes.com/sites/daveywinder/
2019/11/02/smartphone-security-alert-as-hackers-claim-any-
fingerprint-lock-broken-in-20-minutes/.
[6] Phone Battery Statistics Across Major US Cities. https://veloxity.us/
phone-battery-statistics/.
[7] Phone Chargers: China’s Latest Sharing Economy Fad. http://www.sixthtone.
com/news/2182/phone-chargers-chinas-latest-sharing-economy-fad.
[8] Please Stop Charging Your Phone in Public Ports. https://money.cnn.com/
2017/02/15/technology/public-ports-charging-bad-stop/index.html.
[9] Politician’s Fingerprint ’Cloned from Photos’ by Hacker. https://www.bbc.com/
news/technology-30623611.
[10] Power Up: A Guide to US Airport Charging Stations. http://www.cheapflights.
com/news/power-up-a-guide-to-us-airport-charging-stations/#ewr.
[11] Solar-Powered Phone Charging Stations Launch in Union Square. https://www.
dnainfo.com/new-york/20130620/union-square/solar-powered-phone-
chargingstations-launch-union-squarer.
[12] USBGuard. https://github.com/USBGuard/usbguard.
[13] USB Implementers Forum. Universal Serial Bus Power Delivery Specification, Revision 2.0, 2016.

[14] Sebastian Angel, Riad S. Wahby, Max Howald, Joshua B. Leners, Michael Spilo,
Zhen Sun, Andrew J. Blumberg, and Michael Walfish. Defending against malicious
peripherals with cinch. In USENIX Security Symposium, pages 397–414, 2016.

[15] Appium. https://github.com/appium/appium.

[16] Adam Aviv, John Davin, Flynn Wolf, and Ravi Kuber. Towards Baselines for
Shoulder Surfing on Mobile Authentication. In Proceedings of the 33rd Annual
Computer Security Applications Conference, 2017.

[17] Adam Aviv, Katherine Gibson, Evan Mossop, Matt Blaze, and Jonathan M Smith.
Smudge Attacks on Smartphone Touch Screens. Proceedings of 4th USENIX
Workshop on Offensive Technologies, 2010.

[18] Michael Backes, Tongbo Chen, Markus Duermuth, Hendrik Lensch, and Martin
Welk. Tempest in a Teapot: Compromising Reflections Revisited. In Proceedings
of the 30th IEEE Symposium on Security and Privacy, 2009.

[19] Michael Backes, Markus Dürmuth, and Dominique Unruh. Compromising


Reflections-or-How to Read LCD monitors around the Corner. In Proceedings of
the 29th IEEE Symposium on Security and Privacy, 2008.

[20] Darrin Barrall and David Dewey. Plug and Root, the USB Key to the Kingdom.
Presentation at Black Hat Briefings, 2005.

[21] Adam Bates, R. Leonard, Hannah Pruse, Daniel Lowd, and K. Butler. Leveraging
USB to Establish Host Identity Using Commodity Devices. In ISOC Network
and Distributed System Symposium (NDSS), 2014.

[22] Adam Bates, Dave (Jing) Tian, Kevin R.B. Butler, and Thomas Moyer. Trustwor-
thy whole-system provenance for the linux kernel. In USENIX Security Symposium,
pages 319–334, 2015.

[23] M. Bazm, T. Sautereau, M. Lacoste, M. Sudholt, and J. Menaud. Cache-based


side-channel attacks detection through intel cache monitoring technology and
hardware performance counters. In Third International Conference on Fog and
Mobile Edge Computing, 2018.

[24] Ryad Benadjila, Arnauld Michelizza, Mathieu Renard, Philippe Thierry, and
Philippe Trebuchet. WooKey: Designing a Trusted and Efficient USB Device. In
ACM Computer Security Applications Conference (ACSAC), page 673–686, 2019.

[25] Yigael Berger, Avishai Wool, and Arie Yeredor. Dictionary Attacks Using Key-
board Acoustic Emanations. In Proceedings of the 13th ACM conference on
Computer and Communications Security, 2006.

[26] H. Bhargava and S. Sharma. Secured use of USB over the Intranet with anonymous
device Identification. In IEEE Conference on Communication Systems and
Network Technologies (CSNT), pages 49–53, 2018.

[27] T. J. Bihl, K. W. Bauer, and M. A. Temple. Feature Selection for RF Fingerprint-


ing With Multiple Discriminant Analysis and Using ZigBee Device Emissions.
IEEE Transactions on Information Forensics and Security, pages 1862–1874,
2016.

[28] Hristo Bojinov, Yan Michalevsky, Gabi Nakibly, and Dan Boneh. Mobile device
identification via sensor fingerprinting. https://arxiv.org/pdf/2002.05905.pdf,
2014.

[29] Erik Bosman, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. Dedup est
machina: Memory deduplication as an advanced exploitation vector. In IEEE
Symposium on Security and Privacy (SP), 2016.

[30] Vladimir Brik, Suman Banerjee, Marco Gruteser, and Sangho Oh. Wireless device
identification with radiometric signatures. In Conference on Mobile Computing
and Networking (MobiCom), page 116–127, 2008.

[31] Niels Brouwers, Marco Zuniga, and Koen Langendoen. NEAT: a Novel Energy
Analysis Toolkit for Free-Roaming Smartphones. In Proceedings of the 12th ACM
Conference on Embedded Network Sensor Systems, 2014.

[32] Kevin R. B. Butler, Stephen E. McLaughlin, and Patrick D. McDaniel. Kells: A


Protection Framework for Portable Data. In ACM Annual Computer Security
Applications Conference (ACSAC), page 231–240, 2010.

[33] Eric Byres. The Air Gap: SCADA’s Enduring Security Myth. Commun. ACM,
page 29–31, August 2013.

[34] Liang Cai and Hao Chen. TouchLogger: Inferring Keystrokes on Touch Screen
from Smartphone Motion. Proceedings of the USENIX HotSec, 2011.

[35] Xiang Cai, Xin Cheng Zhang, Brijesh Joshi, and Rob Johnson. Touching from a
distance: Website fingerprinting attacks and defenses. In Computer and Commu-
nications Security (CCS), 2012.

[36] Aaron Carroll and Gernot Heiser. An Analysis of Power Consumption in a


Smartphone. In Proceedings of the USENIX Annual Technical Conference, 2010.

[37] Hai-Wei Chen, Jiun-Haw Lee, Bo-Yen Lin, Stanley Chen, and Shin-Tson Wu.
Liquid Crystal Display and Organic Light-Emitting Diode Display: Present Status
and Future Perspectives. Light: Science & Applications, 2018.

[38] Li Chen, Jiacheng Xia, Bairen Yi, and Kai Chen. PowerMan: An Out-of-Band
Management Network for Datacenters Using Power Line Communication. In
Proceedings of the 15th USENIX Symposium on Networked Systems Design and
Implementation, 2018.

[39] Qi Alfred Chen, Zhiyun Qian, and Z. Morley Mao. Peeking into your app without
actually seeing it: UI state inference and novel android attacks. In 23rd USENIX
Security Symposium (USENIX Security 14), 2014.

[40] Qi Alfred Chen, Zhiyun Qian, and Zhuoqing Morley Mao. Peeking into Your App
without Actually Seeing It: UI State Inference and Novel Android Attacks. In
Proceedings of the 23rd USENIX Security Symposium, 2014.

[41] Xiang Chen, Yiran Chen, Zhan Ma, and Felix Fernandes. How is Energy Consumed
in Smartphone Display Applications? In Proceedings of the 14th ACM Workshop
on Mobile Computing Systems and Applications, 2013.

[42] Yimin Chen, Xiaocong Jin, Jingchao Sun, Rui Zhang, and Yanchao Zhang.
POWERFUL: Mobile App Fingerprinting via Power Analysis. In Proceedings of
the IEEE Conference on Computer Communications, 2017.

[43] Yimin Chen, Tao Li, Rui Zhang, Yanchao Zhang, and Terri Hedgpeth. Eye-
Tell: Video-Assisted Touchscreen Keystroke Inference from Eye Movements. In
Proceedings of the 2018 IEEE Symposium on Security and Privacy, 2018.

[44] Yushi Cheng, Xiaoyu Ji, Juchuan Zhang, Wenyuan Xu, and Yi-Chao Chen.
DeMiCPU: Device Fingerprinting with Magnetic Signals Radiated by CPU. In
ACM Conference on Computer and Communications Security (CCS), 2019.

[45] Shane Clark, Hossen Mustafa, Benjamin Ransford, Jacob Sorber, Kevin Fu, and
Wenyuan Xu. Current Events: Identifying Webpages by Tapping the Electrical
Outlet. In European Symposium on Research in Computer Security. Springer,
2013.

[46] W. E. Cobb, E. W. Garcia, M. A. Temple, R. O. Baldwin, and Y. C. Kim. Physical


layer identification of embedded devices using RF-DNA fingerprinting. In IEEE
Military Communications Conference (MILCOM), pages 2168–2173, 2010.

[47] W. E. Cobb, E. D. Laspe, R. O. Baldwin, M. A. Temple, and Y. C. Kim. Intrinsic


Physical-Layer Authentication of Integrated Circuits. IEEE Transactions on
Information Forensics and Security, pages 14–24, 2012.

[48] David Cock, Qian Ge, Toby Murray, and Gernot Heiser. The last mile: An
empirical study of timing channels on sel4. In Computer and Communications
Security (CCS), 2014.

[49] Compaq, Hewlett-Packard, Intel, Lucent, Microsoft, NEC, and Philips. Universal
Serial Bus Specification, Revision 2.0, 2000.

[50] Mauro Conti, Michele Nati, Enrico Rotundo, and Riccardo Spolaor. Mind the
Plug! Laptop-User Recognition Through Power Consumption. In Proceedings of
the 2nd ACM International Workshop on IoT Privacy, Trust, and Security, 2016.

[51] Patrick Cronin and Chengmo Yang. A fetching tale: Covert communication with
the hardware prefetcher. In IEEE International Symposium on Hardware Oriented
Security and Trust (HOST), 2019.

[52] Andy Davis. Revealing Embedded Fingerprints: Deriving Intelligence from USB
Stack Interactions. Technical report, nccgroup, 2013.

[53] Angus Dempster, Daniel F Schmidt, and Geoffrey I Webb. MINIROCKET:


A very fast (almost) deterministic transform for time series classification.
arXiv:2012.08791, 2020.

[54] Wenrui Diao, Xiangyu Liu, Zhou Li, and Kehuan Zhang. No pardon for the
interruption: New inference attacks on android through interrupt timing analysis.
In IEEE Symposium on Security and Privacy (SP), pages 414–432, 2016.

[55] Mian Dong and Lin Zhong. Chameleon: a Color-Adaptive Web Browser for
Mobile OLED Displays. In Proceedings of the 9th International Conference on
Mobile Systems, Applications, and Services, 2011.

[56] Douglas Gilbert. sg3_utils. https://github.com/hreinecke/sg3_utils.

[57] C. K. Dubendorfer, B. W. Ramsey, and M. A. Temple. An RF-DNA verification


process for ZigBee networks. In IEEE Military Communications Conference
(MILCOM), pages 1–6, 2012.

[58] N. Falliere. Exploring Stuxnet's PLC Infection Process. https://community.broadcom.com/symantecenterprise/communities/community-home/librarydocuments/viewdocument?DocumentKey=ad4b3d10-b808-414c-b4c3-ae4a2ed85560&CommunityKey=1ecf5f55-9545-44d6-b0f4-4e4a7f5f5e68&tab=librarydocuments, 2010.

[59] Jingyao Fan, Qinghua Li, and Guohong Cao. Privacy Disclosure Through Smart
Meters: Reactive Power Based Attack and Defense. In Proceedings of the 47th An-
nual IEEE/IFIP International Conference on Dependable Systems and Networks,
2017.

[60] Dinei Florencio and Cormac Herley. A Large-Scale Study of Web Password Habits.
In Proceedings of the 16th International Conference on World Wide Web. ACM,
2007.

[61] USB Implementers Forum. Defined class codes. https://www.usb.org/defined-
class-codes.

[62] Pietro Frigo, Cristiano Giuffrida, Herbert Bos, and Kaveh Razavi. Grand pwning
unit: Accelerating microarchitectural attacks with the gpu. In IEEE Symposium
on Security and Privacy (SP), 2018.

[63] Andrei Frumusanu. The Apple iPhone 11, 11 Pro & 11


Pro Max Review: Performance, Battery, & Camera Elevated.
https://www.anandtech.com/show/14892/the-apple-iphone-11-pro-and-max-
review/3, Oct 2019.

[64] Andrei Frumusanu. The 2020 Mac Mini Unleashed: Putting Apple Silicon M1 To
The Test. https://www.anandtech.com/show/16252/mac-mini-apple-m1-tested,
Nov 2020.

[65] Daniel Genkin, Lev Pachmanov, Itamar Pipman, Eran Tromer, and Yuval Yarom.
ECDSA Key Extraction from Mobile Devices via Nonintrusive Physical Side
Channels. In Proceedings of the 2016 ACM SIGSAC Conference on Computer
and Communications Security, 2016.

[66] Daniel Genkin, Lev Pachmanov, Eran Tromer, and Yuval Yarom. Drive-by
key-extraction cache attacks from portable code. In Bart Preneel and Fred-
erik Vercauteren, editors, Applied Cryptography and Network Security. Springer
International Publishing, 2018.

[67] Daniel Genkin, Mihir Pattani, Roei Schuster, and Eran Tromer. Synesthesia:
Detecting screen content via remote acoustic side channels. In IEEE Symposium
on Security and Privacy (SP), 2019.

[68] Daniel Genkin, Itamar Pipman, and Eran Tromer. Get Your Hands off My Laptop:
Physical Side-Channel Key-Extraction Attacks on PCs. Journal of Cryptographic
Engineering, 2015.

[69] Xun Gong, Nikita Borisov, Negar Kiyavash, and Nabil Schear. Website detection
using remote traffic analysis. In Privacy Enhancing Technologies Symposium
(PETS), 2012.

[70] GPUDirect. https://developer.nvidia.com/gpudirect, Mar 2021.

[71] GPU.js. https://github.com/gpujs/gpu.js.

[72] Daniel Gruss, David Bidner, and Stefan Mangard. Practical memory deduplication
attacks in sandboxed javascript. In Günther Pernul, Peter Y A Ryan, and Edgar
Weippl, editors, Computer Security – ESORICS 2015. Springer International
Publishing, 2015.

[73] Berk Gulmezoglu, Andreas Zankl, M. Caner Tol, Saad Islam, Thomas Eisenbarth,
and Berk Sunar. Undermining user privacy on mobile devices using ai. In
Proceedings of the 2019 ACM Asia Conference on Computer and Communications
Security, Asia CCS ’19. Association for Computing Machinery, 2019.

[74] Zimu Guo, Xiaolin Xu, Mark M. Tehranipoor, and Domenic Forte. Ffd: A
framework for fake flash detection. In ACM Design Automation Conference
(DAC), 2017.

[75] Mordechai Guri, Boris Zadov, Dima Bykhovsky, and Yuval Elovici. PowerHammer:
Exfiltrating Data from Air-Gapped Computers through Power Lines. IEEE
Transactions on Information Forensics and Security, 2020.

[76] hak5darren. USB Rubber Ducky. https://github.com/hak5darren/USB-Rubber-Ducky, 2016.

[77] Wenjian He, Wei Zhang, Sharad Sinha, and Sanjeev Das. iGPU Leak: An information
leakage vulnerability on Intel integrated GPU. In 2020 25th Asia and South
Pacific Design Automation Conference (ASP-DAC), 2020.

[78] Grant Hernandez, Farhaan Fowze, Dave (Jing) Tian, Tuba Yavuz, and Kevin R.B.
Butler. Firmusb: Vetting usb device firmware using domain informed symbolic
execution. In ACM Conference on Computer and Communications Security
(CCS), page 2245–2262, 2017.

[79] Hewlett-Packard, Intel, Microsoft, NEC, ST-NXP Wireless, and Texas Instruments.
Universal Serial Bus 3.0 Specification, Revision 1.0, 2008.

[80] Hewlett-Packard, Intel, Microsoft, Renesas, ST-Ericsson, and Texas Instruments.


Universal Serial Bus 3.1 Specification, 2013.

[81] Andrew Hintz. Fingerprinting websites using traffic analysis. In Workshop on


Privacy Enhancing Technologies, 2002.

[82] Omar Adel Ibrahim, Savio Sciancalepore, Gabriele Oligeri, and Roberto Di Pietro.
Magneto: Fingerprinting usb flash drives via unintentional magnetic emissions.
ACM Trans. Embed. Comput. Syst., 2020.

[83] Advanced Systems International. USB-Lock-RP. https://www.usb-lock-rp.com/.

[84] Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane Idoumghar,
and Pierre-Alain Muller. Deep Learning for Time Series Classification: A Review.
Data Mining and Knowledge Discovery, 2019.

[85] Jeffrey Robert Jacobs. Measuring the Effectiveness of the USB Flash Drive as a
Vector for Social Engineering Attacks on Commercial and Residential Computer
Systems. Master’s Thesis Embry-Riddle Aeronautical University, 2011.

[86] Karsten Nohl and Jakob Lell. BadUSB - On Accessories that Turn Evil. Black Hat
USA, 2014.

[87] Suman Jana and Vitaly Shmatikov. Memento: Learning secrets from process
footprints. In IEEE Symposium on Security and Privacy (SP), 2012.

[88] Shijie Jia, Luning Xia, Zhan Wang, Jingqiang Lin, Guozhu Zhang, and Yafei
Ji. Extracting robust keys from nand flash physical unclonable functions. In
Conference on Information Security (ISC), page 437–454. Springer-Verlag, 2015.

[89] Peter C. Johnson, Sergey Bratus, and Sean W. Smith. Protecting against malicious
bits on the wire: Automatically generating a usb protocol parser for a production
kernel. In ACM Annual Computer Security Applications Conference (ACSAC),
page 528–541, 2017.

[90] Elmira Karimi, Zhen Hang Jiang, Yunsi Fei, and David Kaeli. A timing side-
channel attack on a mobile gpu. In IEEE 36th International Conference on
Computer Design (ICCD), 2018.

[91] Amin Kharraz, Brandon L. Daley, Graham Z. Baker, William Robertson, and
Engin Kirda. USBESAFE: An end-point solution to protect against usb-based
attacks. In USENIX Research in Attacks, Intrusions and Defenses (RAID), 2019.

[92] M. Kim, D. Moon, S. Yoo, S. Lee, and Y. Choi. Investigation of physically


unclonable functions using flash memory for integrated circuit authentication.
Transactions on Nanotechnology, pages 384–389, 2015.

[93] Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner
Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael
Schwarz, and Yuval Yarom. Spectre attacks: Exploiting speculative execution. In
40th IEEE Symposium on Security and Privacy (S&P), 2019.

[94] Paul Kocher, Joshua Jaffe, and Benjamin Jun. Differential Power Analysis. In
Proceedings of the Annual International Cryptology Conference. Springer, 1999.

[95] T. Kohno, A. Broido, and K. C. Claffy. Remote physical device fingerprinting.


IEEE Transactions on Dependable and Secure Computing, pages 93–108, 2005.

[96] Andy Kong. Accessing the iphone accelerometer with javascript in ios 14 and
13. https://kongmunist.medium.com/accessing-the-iphone-accelerometer-with-
javascript-in-ios-14-and-13-e146d18bb175, Nov 2020.

[97] David Kushner. The real story of stuxnet, Feb 2013.

[98] Sangho Lee, Youngsok Kim, Jangwoo Kim, and Jong Kim. Stealing webpages
rendered on your browser by exploiting gpu vulnerabilities. In IEEE Symposium
on Security and Privacy, 2014.

[99] Lara Letaw, Joe Pletcher, and Kevin Butler. Host Identification via USB Finger-
printing. In International Workshop on Systematic Approaches to Digital Forensic
Engineering (SADFE), page 1–9, 2011.

[100] Lingjun Li, Xinxin Zhao, and Guoliang Xue. Unobservable Re-Authentication for
Smartphones. In Proceedings of the 20th Network and Distributed System Security
Symposium, 2013.

[101] Yanlin Li, Jonathan M. McCune, and Adrian Perrig. Viper: Verifying the integrity
of peripherals’ firmware. In ACM Conference on Computer and Communications
Security (CCS), page 3–16, 2011.

[102] Pavel Lifshits, Roni Forte, Yedid Hoshen, Matt Halpern, Manuel Philipose, Mohit
Tiwari, and Mark Silberstein. Power to peep-all: Inference attacks by malicious
batteries on mobile devices. Proceedings on Privacy Enhancing Technologies,
2018.

[103] Chia-Chi Lin, Hongyang Li, Xiao-yong Zhou, and XiaoFeng Wang. Screenmilker:
How to milk your android screen for secrets. In 21st Annual Network and
Distributed System Security Symposium, NDSS, 2014.

[104] Moritz Lipp, Daniel Gruss, Raphael Spreitzer, Clémentine Maurice, and Stefan
Mangard. ARMageddon: Cache attacks on mobile devices. In 25th USENIX
Security Symposium (USENIX Security 16). USENIX Association, 2016.

[105] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas,
Anders Fogh, Jann Horn, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval
Yarom, and Mike Hamburg. Meltdown: Reading kernel memory from user space.
In 27th USENIX Security Symposium (USENIX Security 18), pages 973–990,
Baltimore, MD, August 2018. USENIX Association.

[106] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B. Lee. Last-level
cache side-channel attacks are practical. In 2015 IEEE Symposium on Security
and Privacy, 2015.

[107] Arm Ltd. big.little. https://www.arm.com/why-arm/technologies/big-little.

[108] ARM Ltd. Cache Stashing. https://developer.arm.com/documentation/100453/0401/functional-description/l3-cache/cache-stashing.

[109] ARM Ltd. Cortex-A55. https://developer.arm.com/ip-products/processors/cortex-a/cortex-a55.

[110] Arm Ltd. Dynamiq. https://www.arm.com/why-arm/technologies/dynamiq.

[111] ARM Ltd. L3 cache allocation policy. https://developer.arm.com/documentation/100453/0002/functional-description/l3-cache/l3-cache-allocation-policy.

[112] Xiao Ma, Peng Huang, Xinxin Jin, Pei Wang, Soyeon Park, Dongcai Shen,
Yuanyuan Zhou, Lawrence Saul, and Geoffrey Voelker. Edoctor: Automatically
Diagnosing Abnormal Battery Drain Issues on Smartphones. In Proceedings of
the 10th USENIX Symposium on Networked Systems Design and Implementation,
2013.

[113] Jani Mantyjarvi, Mikko Lindholm, Elena Vildjiounaite, S-M Makela, and
HA Ailisto. Identifying Users of Portable Devices from Gait Pattern with Ac-
celerometers. In Proceedings of IEEE International Conference on Acoustics,
Speech, and Signal Processing, 2005.

[114] Martin Marinov. TempestSDR. https://github.com/martinmarinov/


TempestSDR, 2013.

[115] Nikolay Matyunin, Yujue Wang, Tolga Arul, Kristian Kullmann, Jakub Szefer,
and Stefan Katzenbeisser. Magneticspy: Exploiting magnetometer in mobile
devices for website and application fingerprinting. In Proceedings of the 18th
ACM Workshop on Privacy in the Electronic Society. Association for Computing
Machinery, 2019.

[116] Yan Michalevsky, Aaron Schulman, Gunaa Arumugam Veerapandian, Dan Boneh,
and Gabi Nakibly. PowerSpy: Location Tracking Using Mobile Device Power
Analysis. In Proceedings of the 24th USENIX Security Symposium, 2015.

[117] Micron. NAND Flash 101: An Introduction to NAND Flash and How to Design
It In to Your Next Product, TN-29-19. Technical report, 2010.

[118] Emiliano Miluzzo, Alexander Varshavsky, Suhrid Balakrishnan, and Romit Roy
Choudhury. Tapprints: Your Finger Taps Have Fingerprints. In Proceedings of
the 10th ACM International Conference on Mobile Systems, Applications, and
Services, 2012.

[119] John Monaco. SoK: Keylogging Side Channels. In Proceedings of the 2018 IEEE
Symposium on Security and Privacy. IEEE, 2018.

[120] Hoda Naghibijouybari, Ajaya Neupane, Zhiyun Qian, and Nael Abu-Ghazaleh.
Rendered insecure: Gpu side channel attacks are practical. In Proceedings of
the 2018 ACM SIGSAC Conference on Computer and Communications Security.
Association for Computing Machinery, 2018.

[121] Matthias Neugschwandtner, Anton Beitler, and Anil Kurmus. A Transparent


Defense Against USB Eavesdropping Attacks. In Proceedings of the 9th ACM
European Workshop on System Security, 2016.

[122] Sebastian Neuner, Artemios G. Voyiatzis, Spiros Fotopoulos, Collin Mulliner, and
Edgar R. Weippl. USBlock: Blocking USB-Based Keypress Injection Attacks. In

Data and Applications Security and Privacy, pages 278–295. Springer International
Publishing, 2018.

[123] T. Nguyen, S. Park, and D. Shin. Extraction of device fingerprints using built-in
erase-suspend operation of flash memory devices. IEEE Access, pages 98637–98646,
2020.

[124] Howard Oakley. How m1 macs feel faster than intel models: it’s about
qos. https://eclecticlight.co/2021/05/17/how-m1-macs-feel-faster-than-intel-
models-its-about-qos/, May 2021.

[125] National Institute of Standards and Technology. Security and privacy controls for
federal information systems and organizations, 2020.

[126] Yossef Oren, Vasileios P. Kemerlis, Simha Sethumadhavan, and Angelos D.


Keromytis. The spy in the sandbox: Practical cache attacks in javascript and their
implications. In Proceedings of the 22nd ACM SIGSAC Conference on Computer
and Communications Security. Association for Computing Machinery, 2015.

[127] Dag Arne Osvik, Adi Shamir, and Eran Tromer. Cache attacks and counter-
measures: The case of aes. In David Pointcheval, editor, Topics in Cryptology –
CT-RSA 2006. Springer Berlin Heidelberg, 2006.

[128] Emmanuel Owusu, Jun Han, Sauvik Das, Adrian Perrig, and Joy Zhang. ACCes-
sory: Password Inference Using Accelerometers on Smartphones. In Proceedings
of the 12th ACM Workshop on Mobile Computing Systems and Applications, 2012.

[129] J.L. Padilla, P. Padilla, J.F. Valenzuela-Valdés, J. Ramı́rez, and J.M. Górriz.
RF fingerprint measurements for the identification of devices in wireless com-
munication networks based on feature reduction and subspace transformation.
Measurement, pages 468 – 475, 2014.

[130] Andriy Panchenko, Fabian Lanze, Andreas Zinnen, Martin Henze, Jan Pennekamp,
Klaus Wehrle, and Thomas Engel. Website fingerprinting at internet scale. In
Network and Distributed Systems Symposium (NDSS), 2016.

[131] Abhinav Pathak, Charlie Hu, and Ming Zhang. Where is the Energy Spent Inside
My App?: Fine Grained Energy Accounting on Smartphones with eprof. In
Proceedings of the 7th ACM European Conference on Computer Systems, 2012.

[132] Abhinav Pathak, Charlie Hu, Ming Zhang, Paramvir Bahl, and Yi-Min Wang.
Fine-Grained Power Modeling for Smartphones Using System Call Tracing. In
Proceedings of the 6th ACM Conference on Computer Systems, 2011.

[133] Filip Pizlo. What spectre and meltdown mean for webkit.
https://webkit.org/blog/8048/what-spectre-and-meltdown-mean-for-webkit/, Jan
2018.

[134] Raymond Pompon. Attacking Air-Gap-Segregated Computers.
https://www.f5.com/labs/articles/cisotociso/attacking-air-gap-segregated-
computers, 2018.

[135] Pravin Prabhu, Ameen Akel, Laura M. Grupp, Wing-Kei S. Yu, G. Edward
Suh, Edwin Kan, and Steven Swanson. Extracting device fingerprints from flash
memory by exploiting physical variations. In Trust and Trustworthy Computing.
Springer Berlin Heidelberg, 2011.

[136] S. V. Radhakrishnan, A. S. Uluagac, and R. Beyah. GTID: A Technique for Phys-


ical Device and Device Type Fingerprinting. IEEE Transactions on Dependable
and Secure Computing, pages 519–532, 2015.

[137] Rahul Raguram, Andrew White, Dibyendusekhar Goswami, Fabian Monrose,


and Jan-Michael Frahm. iSpy: Automatic Reconstruction of Typed Input from
Compromising Reflections. In Proceedings of the 18th ACM Conference on
Computer and Communications Security, 2011.

[138] B. W. Ramsey, M. A. Temple, and B. E. Mullins. PHY foundation for multi-


factor ZigBee node authentication. In IEEE Global Communications Conference
(GLOBECOM), pages 795–800, 2012.

[139] Vera Rimmer, Davy Preuveneers, Marc Juarez, Tom Van Goethem, and Wouter
Joosen. Automated website fingerprinting through deep learning. In Network and
Distributed Systems Symposium (NDSS), 2018.

[140] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey, you,
get off of my cloud: exploring information leakage in third-party compute clouds.
In Computer and Communications Security (CCS), 2009.

[141] J Rogers. Please Enter Your Four-Digit Pin. Financial Services Technology, US
Edition, 2007.

[142] S. Sakib, A. Milenković, and B. Ray. Flash watermark: An anticounterfeiting


technique for nand flash memories. IEEE Transactions on Electron Devices, pages
4172–4177, 2020.

[143] S. Sakib, M. T. Rahman, A. Milenković, and B. Ray. Flash memory based


physical unclonable function. In IEEE SoutheastCon, pages 1–6, 2019.

[144] Paul Sawers. US Govt. plant USB sticks in security study, 60% of subjects take
the bait. https://thenextweb.com/insider/2011/06/28/us-govt-plant-usb-sticks-
in-security-study-60-of-subjects-take-the-bait/, 2011.

[145] Michael Schwarz, Moritz Lipp, and Daniel Gruss. Javascript zero: Real javascript
and zero side-channel attacks. In Network and Distributed System Security
Symposium, 2018.

[146] Michael Schwarz, Clémentine Maurice, Daniel Gruss, and Stefan Mangard. Fantas-
tic timers and where to find them: High-resolution microarchitectural attacks in
javascript. In Aggelos Kiayias, editor, Financial Cryptography and Data Security.
Springer International Publishing, 2017.

[147] Selenium. https://github.com/SeleniumHQ/selenium.

[148] Sensor - Web APIs: MDN. https://developer.mozilla.org/en-US/docs/Web/API/Sensor.

[149] Len Sherman. The Basics of USB Battery Charging: A Survival Guide. Maxim
Integrated Products, Inc, 2010.

[150] Anatoly Shusterman, Ayush Agarwal, Sioli O’Connell, Daniel Genkin, Yossi Oren,
and Yuval Yarom. Prime+probe 1, javascript 0: Overcoming browser-based
side-channel defenses. In 30th USENIX Security Symposium (USENIX Security),
2021.

[151] Anatoly Shusterman, Lachlan Kang, Yarden Haskal, Yosef Meltser, Prateek Mittal,
Yossi Oren, and Yuval Yarom. Robust website fingerprinting through the cache
occupancy channel. In 28th USENIX Security Symposium (USENIX Security 19),
2019.

[152] Zdeňka Sitová, Jaroslav Šeděnka, Qing Yang, Ge Peng, Gang Zhou, Paolo Gasti,
and Kiran Balagani. HMOG: New Behavioral Biometric Features for Continuous
Authentication of Smartphone Users. IEEE Transactions on Information Forensics
and Security, 2016.

[153] ANTSpec Software. Flash drive information extractor, 2019.

[154] Riccardo Spolaor, Laila Abudahi, Veelasha Moonsamy, Mauro Conti, and Radha
Poovendran. No Free Charge Theorem: A Covert Channel via USB Charging
Cable on Mobile Devices. In International Conference on Applied Cryptography
and Network Security. Springer, 2017.

[155] Raphael Spreitzer, Simone Griesmayr, Thomas Korak, and Stefan Mangard.
Exploiting data-usage statistics for website fingerprinting attacks on android. In
Proceedings of the 9th ACM Conference on Security & Privacy in Wireless and
Mobile Networks, WiSec ’16, 2016.

[156] Steve Stasiukonis. Social Engineering, the USB Way.


https://www.darkreading.com/attacks-breaches/social-engineering-the-usb-
way/d/d-id/1128081, 2006.

[157] Yang Su, Daniel Genkin, Damith Ranasinghe, and Yuval Yarom. USB Snooping
Made Easy: Crosstalk Leakage Attacks on USB Hubs. In Proceedings of the 26th
USENIX Security Symposium, 2017.

[158] Jingchao Sun, Xiaocong Jin, Yimin Chen, Jinxue Zhang, Yanchao Zhang, and
Rui Zhang. VISIBLE: Video-Assisted Keystroke Inference from Tablet Backside
Motion. In Proceedings of the 23rd Network and Distributed System Security
Symposium, 2016.

[159] S. Sutar, A. Raha, and V. Raghunathan. Memory-based combination pufs


for device authentication in embedded systems. Transactions on Multi-Scale
Computing Systems, pages 793–810, 2018.

[160] K. Suzaki, Y. Hori, K. Kobara, and M. Mannan. DeviceVeil: Robust Authen-


tication for Individual USB Devices Using Physical Unclonable Functions. In
Conference on Dependable Systems and Networks (DSN), pages 302–314, 2019.

[161] Di Tang, Zhe Zhou, Yinqian Zhang, and Kehuan Zhang. Face Flashing: a Secure
Liveness Detection Protocol based on Light Reflections. Proceedings of the 25th
Network and Distributed System Security Symposium, 2018.

[162] The Wireshark Team. Wireshark. https://www.wireshark.org/.

[163] D. J. Tian, G. Hernandez, J. I. Choi, V. Frost, P. C. Johnson, and K. R. B. Butler.


Lbm: A security framework for peripherals within the linux kernel. In IEEE
Symposium on Security and Privacy (S&P), pages 967–984, 2019.

[164] Dave Jing Tian, Adam Bates, and Kevin Butler. Defending Against Malicious
USB Firmware with GoodUSB. In Proceedings of the 31st Annual Computer
Security Applications Conference, 2015.

[165] Dave Jing Tian, Adam Bates, and Kevin Butler. Defending against malicious
usb firmware with goodusb. In ACM Annual Computer Security Applications
Conference (ACSAC), page 261–270. Association for Computing Machinery, 2015.

[166] Dave (Jing) Tian, Grant Hernandez, Joseph I. Choi, Vanessa Frost, Christie Raules,
Patrick Traynor, Hayawardh Vijayakumar, Lee Harrison, Amir Rahmati, Michael
Grace, and Kevin Butler. ATtention Spanned: Comprehensive Vulnerability
Analysis of AT Commands Within the Android Ecosystem. In Proceedings of the
27th USENIX Security Symposium, 2018.

[167] Dave (Jing) Tian, Nolen Scaife, Adam Bates, Kevin Butler, and Patrick Traynor.
Making USB Great Again with USBFILTER. In Proceedings of the 25th USENIX
Security Symposium, 2016.

[168] Dave (Jing) Tian, Nolen Scaife, Adam Bates, Kevin Butler, and Patrick Traynor.
Making USB great again with USBFILTER. In USENIX Security Symposium,
pages 415–430, 2016.

[169] J. Tian, N. Scaife, D. Kumar, M. Bailey, A. Bates, and K. Butler. SoK: “Plug &
Pray” Today – Understanding USB Insecurity in Versions 1 Through C. In IEEE
Symposium on Security and Privacy (S&P), pages 1032–1047, 2018.

[170] J. Tian, N. Scaife, D. Kumar, M. Bailey, A. Bates, and K. Butler. SoK: ”Plug
Pray” Today – Understanding USB Insecurity in Versions 1 Through C. In 2018
IEEE Symposium on Security and Privacy, May 2018.

[171] M. Tischer, Z. Durumeric, S. Foster, S. Duan, A. Mori, E. Bursztein, and M. Bailey.


Users Really Do Plug in USB Drives They Find. In IEEE Symposium on Security
and Privacy (S&P), pages 306–319, 2016.

[172] USB-3.0-Promoter-Group. Universal Serial Bus Type-C Authentication Specification Release 1.0 with ECN and Errata, 2017.

[173] Vadim Mikhailov. uhubctl. https://github.com/mvp/uhubctl.

[174] Y. Wang, W. Yu, S. Wu, G. Malysa, G. E. Suh, and E. C. Kan. Flash memory
for ubiquitous hardware security functions: True random number generation and
device fingerprints. In IEEE Symposium on Security and Privacy (S&P), pages
33–47, 2012.

[175] Feature: WebGPU. https://www.chromestatus.com/feature/6213121689518080.

[176] S. Q. Xu, W. Yu, G. E. Suh, and E. C. Kan. Understanding sources of variations


in flash memory for physical unclonable functions. In International Memory
Workshop (IMW), pages 1–4, 2014.

[177] Weitao Xu, Guohao Lan, Qi Lin, Sara Khalifa, Neil Bergmann, Mahbub Hassan,
and Wen Hu. Keh-Gait: Towards a Mobile Healthcare User Authentication
System by Kinetic Energy Harvesting. In Proceedings of the 24th Network and
Distributed System Security Symposium, 2017.

[178] Yi Xu, Jared Heinly, Andrew White, Fabian Monrose, and Jan-Michael Frahm.
Seeing Double: Reconstructing Obscured Typed Input from Repeated Compro-
mising Reflections. In Proceedings of the 2013 ACM SIGSAC conference on
Computer and communications security, 2013.

[179] Zhi Xu, Kun Bai, and Sencun Zhu. Taplogger: Inferring User Inputs on Smart-
phone Touchscreens Using On-Board Motion Sensors. In Proceedings of the 5th
ACM Conference on Security and Privacy in Wireless and Mobile Networks, 2012.

[180] Lin Yan, Yao Guo, Xiangqun Chen, and Hong Mei. A Study on Power Side
Channels on Mobile Devices. In Proceedings of the 7th Asia-Pacific Symposium
on Internetware, 2015.

[181] Bo Yang, Yu Qin, Zhang Yingjun, Weijin Wang, and Dengguo Feng. TMSUI:
A Trust Management Scheme of USB Storage Devices for Industrial Control
Systems. In Information and Communications Security, pages 152–168, 2016.
[182] Qing Yang, Paolo Gasti, Gang Zhou, Aydin Farajidavar, and Kiran Balagani.
On Inferring Browsing Activity on Smartphones via USB Power Analysis Side-
Channel. IEEE Transactions on Information Forensics and Security, 2017.
[183] Qing Yang, Paolo Gasti, Gang Zhou, Aydin Farajidavar, and Kiran S Balagani.
On inferring browsing activity on smartphones via usb power analysis side-channel.
IEEE Transactions on Information Forensics and Security, 2016.
[184] Guixin Ye, Zhanyong Tang, Dingyi Fang, Xiaojiang Chen, Kwang In Kim, Ben
Taylor, and Zheng Wang. Cracking Android Pattern Lock in Five Attempts. In
Proceedings of the 24th Network and Distributed System Security Symposium,
2017.
[185] Qinggang Yue, Zhen Ling, Xinwen Fu, Benyuan Liu, Kui Ren, and Wei Zhao.
Blind Recognition of Touched Keys on Mobile Devices. In Proceedings of the 2014
ACM SIGSAC Conference on Computer and Communications Security, 2014.
[186] Pete Zaitcev. The usbmon: USB monitoring framework, 2005.
[187] J. Zhang, A. R. Beresford, and I. Sheret. SensorID: Sensor Calibration Finger-
printing for Smartphones. In IEEE Symposium on Security and Privacy (S&P),
pages 638–655, 2019.
[188] Lide Zhang, Birjodh Tiwana, Zhiyun Qian, Zhaoguang Wang, Robert Dick, Morley
Mao, and Lei Yang. Accurate Online Power Estimation and Automatic Battery
Behavior Based Power Model Generation for Smartphones. In Proceedings of the
8th IEEE/ACM/IFIP International Conference on Hardware/Software Codesign
and System Synthesis, 2010.
[189] Xiaokuan Zhang, Xueqiang Wang, Xiaolong Bai, Yinqian Zhang, and XiaoFeng
Wang. OS-level Side Channels without Procfs: Exploring Cross-App Information
Leakage on iOS. In Proceedings of the 25th Network and Distributed System
Security Symposium, 2018.
[190] Xiaokuan Zhang, Xueqiang Wang, Xiaolong Bai, Yinqian Zhang, and XiaoFeng
Wang. Os-level side channels without procfs: Exploring cross-app information leak-
age on ios. In 25th Annual Network and Distributed System Security Symposium,
NDSS. The Internet Society, 2018.
[191] Xiaokuan Zhang, Yuan Xiao, and Yinqian Zhang. Return-Oriented Flush-Reload
Side Channels on ARM and their Implications for Android Devices. In Proceedings
of the 2016 ACM SIGSAC Conference on Computer and Communications Security,
2016.

[192] Nan Zheng, Kun Bai, Hai Huang, and Haining Wang. You Are How You Touch:
User Verification on Smartphones via Tapping Behaviors. In Proceedings of the
IEEE 22nd International Conference on Network Protocols, 2014.

[193] Man Zhou, Qian Wang, Jingxiao Yang, Qi Li, Feng Xiao, Zhibo Wang, and
Xiaofen Chen. PatternListener: Cracking Android Pattern Lock Using Acoustic
Signals. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and
Communications Security, 2018.

[194] Xiaoyong Zhou, Soteris Demetriou, Dongjing He, Muhammad Naveed, Xiaorui
Pan, XiaoFeng Wang, Carl A. Gunter, and Klara Nahrstedt. Identity, location,
disease and more: Inferring your secrets from android public resources. In
Proceedings of the ACM SIGSAC Conference on Computer & Communications
Security, 2013.

[195] Li Zhuang, Feng Zhou, and Doug Tygar. Keyboard Acoustic Emanations Revisited.
ACM Transactions on Information and System Security, 2009.

Appendix A

ADDITIONAL FIGURES AND TABLES FOR CHARGER-SURFING:
EXPLOITING A POWER LINE SIDE-CHANNEL FOR SMARTPHONE
INFORMATION LEAKAGE

Table A.1: Smartphones Used For Evaluation

Phone (Release Year) | OS | Processor | GPU | Screen Resolution | Screen Technology
Motorola G4 (2016) | Android 6.0.1 | 4 x 1.5 GHz A-53 + 4 x 1.2 GHz A-53 | Adreno 405 | 1920x1080 | LCD
Samsung Galaxy Nexus (2012) | Android 6.0.1 | 2 x 1.2 GHz A-9 | PowerVR SGX540 | 1280x720 | Super AMOLED
Apple iPhone 6+ (2014) | iOS 12.1 | 2 x 1.4 GHz Typhoon | PowerVR GX6450 | 1920x1080 | LCD
Apple iPhone 8+ (2017) | iOS 12.1.2 | 2 x 2.3 GHz Monsoon + 4 x 1.4 GHz Mistral | Apple GPU | 1920x1080 | LCD

Table A.2: Classification Network Used for iPhone
iPhone Classification Network
Layer Operation Kernel Size
1 Input 100000x1
2 Convolution 50x50
3 MaxPool 5
4 Convolution 50x50
5 MaxPool 5
6 Convolution 50x50
7 MaxPool 5
8 Convolution 50x50
9 GlobalAveragePool -
10 Dropout 0.5
11 Dense 10

Table A.3: Classification Network Used for Android

Android Classification Network
Layer Operation Kernel Size
1 Input 800000x1
2 Convolution 100x100
3 MaxPool 5
4 Convolution 50x75
5 MaxPool 5
6 Convolution 50x75
7 MaxPool 5
8 Convolution 50x75
9 GlobalAveragePool -
10 Dropout 0.5
11 Dense 10

Appendix B

ADDITIONAL FIGURES AND TABLES FOR TIME-PRINT:
AUTHENTICATING USB FLASH DRIVES WITH NOVEL TIMING
FINGERPRINTS

B.1 Additional Figures and Tables

Layer Type Kernel Size # of Filters/Neurons
1 2D Convolution (1,3) 8
2 2D Max Pool (1,2) -
3 2D Convolution (1,3) 16
4 2D Max Pool (1,2) -
5 2D Convolution (1,3) 128
6 2D Max Pool (1,2) -
7 Flatten - -
8 Dropout .1 -
9 Dense - 50
10 Dense - 50
11 Dense - 2

Table B.1: Neural network architecture used for classification.

Appendix C

ADDITIONAL FIGURES AND TABLES FOR AN EXPLORATION OF
ARM SYSTEM LEVEL CACHE AND GPU SIDE CHANNELS

Table C.1: 1D Convolutional Neural Network Configuration

Layer Operation Kernel Size
1 Input 10000x1
2 Convolution 256x8
3 MaxPool 8
4 Convolution 256x8
5 MaxPool 8
6 Convolution 256x8
7 MaxPool 8
8 Flatten -
9 Dropout 0.2
10 Dense Number of Classes
