
# Using GPU Technologies to Drastically Accelerate FDTD Simulations

Introduction
Wireless technologies play a significant part in the world we live in and
have quickly developed from obscure and mysterious to openly accepted
and often demanded. Cell phones, once a status commodity, are now
common worldwide. GPS devices, communicating with satellites over
10,000 miles away traveling at thousands of miles per hour, are found in
innumerable devices, giving users real-time, precise location. Wi-Fi stations
communicate with a host of devices, providing untold conveniences.
Doctors gain precise detail about the inner workings of patients for
diagnosis and treatment. Uncounted devices operating simultaneously
and in close proximity necessitate precision and intelligence to ensure
correct and safe functionality. However, time to market for high-tech
devices directly affects competitiveness and profitability. Is it possible
to be accurate and still get to market quickly? While the answer may
be complex, many may benefit from using a GPU-accelerated Finite
Difference Time Domain method as described in this paper.

Overview of Finite Difference Time Domain Method
The Finite Difference Time Domain (FDTD) method has been utilized
over the last several decades and has become increasingly prevalent
in scientific research and technical industry. The origin of the FDTD
method is generally attributed to Kane Yee who, in a paper published
in 1966 [1], described a method for computing Maxwell’s Equations
discretely in a time-stepping manner. This technique was later expanded
and named by Allen Taflove [2]-[4]. The FDTD method is virtually unique
among EM simulation methods because it directly implements Maxwell’s Curl
Equations, which model electromagnetic fields at the most elementary
level. Since the method is fundamentally sound, FDTD is often used to
verify results originating from faster, assumption-based techniques. The
method has been applied to problems ranging from kilohertz frequencies to
visible light. While accurate, the FDTD method has inherent obstacles that
have kept it from being universally used. For example, the entire computation
space must be evaluated at each time step, and the grid cells must
be sufficiently small to accurately model the signal propagation. For
electrically large project spaces, FDTD-based codes may become memory
intensive and relatively slow.
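
To make the time-stepping idea concrete, the following is a minimal sketch of a one-dimensional Yee-style leapfrog update in free space, written in plain Python. It is not XFdtd's implementation. The function name, grid size, step count, Gaussian source parameters, and the normalized units with a Courant number of 0.5 are all assumptions chosen only for illustration.

```python
import math

def run_fdtd_1d(num_cells=200, num_steps=150, courant=0.5, source_cell=100):
    """Minimal 1D FDTD (Yee leapfrog) in free space with normalized units.

    ez holds the electric field at integer grid points and hy holds the
    magnetic field at the half-integer points between them. All default
    parameter values are illustrative assumptions.
    """
    ez = [0.0] * num_cells
    hy = [0.0] * (num_cells - 1)
    for step in range(num_steps):
        # Update H from the spatial difference (1D curl) of E.
        for i in range(num_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # Update E from the spatial difference (1D curl) of H.
        # The two end points are never updated, so they stay at zero
        # and behave like perfectly conducting walls.
        for i in range(1, num_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # Soft source: add a Gaussian pulse in time at a single cell.
        ez[source_cell] += math.exp(-((step - 30) / 10.0) ** 2)
    return ez, hy

ez, hy = run_fdtd_1d()
print(max(abs(v) for v in ez))  # peak field magnitude after the run
```

Note that every cell is visited at every time step, which is the cost described above: a three-dimensional grid repeats this work over all cells and all field components.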

Figure 1: A CAD representation of an F/A-22 Raptor rendered in XFdtd version
7.0.3. CAD images may be imported or created in geometry space and easily
converted into an FDTD grid.

Remcom’s XFdtd® is an EM solver based on the FDTD method and has
been used extensively to model structures that require a high level of
fidelity. For the FDTD results of this paper, Remcom’s XFdtd version 7.0.3
was utilized.

Overview of GPU Technology
Graphics Processing Unit (GPU) technology has exploded over the last
decade. Why? Because video gaming connoisseurs have demanded it and
been willing to pay for it. High-end GPUs perform significant quantities
of computations in order to render high resolution images and action
sequences in a seemingly seamless manner. GPUs take advantage
of parallel processing, or threading, in order to perform calculations
simultaneously. In fact, GPUs may have hundreds of threads performing
calculations at any given time.

Figure 2: GPU performance has grown at a much faster rate
than the modern CPU. Data provided by NVIDIA.

Years ago, the concept of general-purpose computing on a GPU (sometimes
referred to as GPGPU) began. Initially, engineers had to “trick” the GPU
by casting a problem into a graphics format even though graphics were not
the desired result. While there were a number of successes in these
attempts, the difficulty in developing was prohibitive to many. Those who
did succeed were able to see significant speed improvements that kept
development interest high.
Towards the end of 2006 NVIDIA launched its CUDA (an acronym for
Compute Unified Device Architecture) technology, which was designed
to make GPU computing truly general-purpose. Today, GPU technology is
used increasingly across many industries as a way of speeding up
time-intensive calculations. Generally, methods that involve inherently
parallel computations, such as FDTD, exhibit a significant amount of speedup
using this method.
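
One way to see why FDTD is inherently parallel is that, within a single half-step, each cell's new value depends only on values from the previous half-step, so the cells can be updated in any order or all at once. The sketch below illustrates this in plain Python by updating a one-dimensional magnetic-field array once in natural order and once in shuffled order. The function name, array sizes, and coefficient are hypothetical, and a real GPU code would assign each index to a thread rather than loop over them.

```python
import random

def update_h(hy, ez, coeff, order):
    """Return a new H array in which each cell is computed independently.

    `order` is the sequence in which cell indices are visited. On a GPU,
    each index would be handled by its own thread instead of a loop pass.
    """
    new_hy = list(hy)
    for i in order:
        # Reads only ez (fixed during this half-step) and the cell's own
        # old value, so no cell depends on another cell's new value.
        new_hy[i] = hy[i] + coeff * (ez[i + 1] - ez[i])
    return new_hy

rng = random.Random(0)  # fixed seed so the run is repeatable
ez = [rng.uniform(-1.0, 1.0) for _ in range(65)]
hy = [rng.uniform(-1.0, 1.0) for _ in range(64)]

in_order = list(range(len(hy)))
shuffled = in_order[:]
rng.shuffle(shuffled)

serial_result = update_h(hy, ez, 0.5, in_order)
shuffled_result = update_h(hy, ez, 0.5, shuffled)
print(serial_result == shuffled_result)  # True: visiting order does not matter
```

Because the visiting order does not change the result, the work can be divided among as many threads as the hardware provides.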

Quantifying Speedups Using FDTD on the GPU
Several variables determine the speedup of GPU-accelerated simulations.
One factor is the specific hardware used in simulation. Figure 2 describes
GPU performance over the last seven years; there is a significant
performance increase with each device release. One would expect to see
approximately a 2x performance benefit when comparing the Tesla T10 with
the Tesla G80.

To the left is NVIDIA’s Tesla C1060 computing board, their latest
computation-specific GPU with 4 GB memory capacity.

To the right is the Tesla S1070 computing system with four times the
capacity of the single C1060.

Another consideration when comparing CPU and GPU timing is identifying
which calculations are performed on the GPU(s). XFdtd, like many scientific
tools, saves data at designated intervals depending on the types of results
requested by the user. When saving data, field results are pulled from
the GPU and saved by the CPU rather than the graphics card(s). Typical
saves may not radically alter the overall simulation time, but they may when
significant amounts of data are requested, such as large volume SAR
calculations or multiple steady state frequency extractions. Bus speeds and
system RAM also contribute to the overall performance of GPU simulations;
Remcom generally recommends twice as much system RAM as the memory available
on the GPUs. To most accurately compare technology performance,
data saves should be minimized. However, when justifying a change of
technology for an individual case, data saves should be considered to the
extent that they are used in real application simulations.
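
The effect of data saves on the measured speedup can be approximated with a simple Amdahl-style model in which only the field-update portion of the run is accelerated, while the time spent transferring and saving results is not. This is an illustrative assumption rather than a model taken from XFdtd, and the function name and timings below are made-up values.

```python
def effective_speedup(cpu_compute_s, save_s, gpu_speedup):
    """Overall speedup when only the compute portion is accelerated.

    cpu_compute_s: CPU time spent on field updates, in seconds
    save_s:        time spent saving data, assumed equal on CPU and GPU runs
    gpu_speedup:   speedup of the field updates alone
    """
    cpu_total = cpu_compute_s + save_s
    gpu_total = cpu_compute_s / gpu_speedup + save_s
    return cpu_total / gpu_total

# Hypothetical one-hour computation whose field updates run 50x faster.
light_saves = effective_speedup(3600.0, 10.0, 50.0)   # little data saved
heavy_saves = effective_speedup(3600.0, 300.0, 50.0)  # much data saved
print(round(light_saves, 1), round(heavy_saves, 1))   # 44.0 10.5
```

Under these assumed numbers, the same 50x compute speedup shows up as roughly 44x overall with light saves but only about 10x with heavy saves, which is why the amount of requested data matters when comparing timings.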

Now, we consider the actual timing comparisons for some examples.
The combination of these examples begins to showcase the benefits of
coupling the GPU technology with the accuracy of FDTD simulations.

8x8 Array of Patch Antennas
For the first example, we consider a patch antenna array built into an 8x8
configuration. This specific array is detailed more fully by S. Bellofiore,
et al. [5]. The overall memory requirement to simulate this project was
about 233 MB. Steady state far field data was requested during simulation
time and a single frequency source was used.

Figure 3: Amplified portion of 8x8 patch antenna array with far field
pattern representation.

The benchmark for this simulation was an HP xw9400 with dual-quad Opteron
2216 running 64-bit Red Hat Linux.
By contrasting simulation times with the GPU accelerated runs using
NVIDIA’s Tesla C870, Quadro FX 5600, and Tesla C1060, the resulting
speedups were on the order of 14x, 18x, and 47x, respectively. At a
47x speedup, simulations that would typically take an hour to complete
would be finishing up in just over one minute and 16 seconds.
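
The conversion from a speedup factor to wall-clock time is a single division. The small helper below, whose name is hypothetical, shows the arithmetic behind the one-hour example and can be reused for any other CPU time and speedup factor.

```python
def accelerated_time(cpu_seconds, speedup):
    """Return (hours, minutes, seconds) for a run accelerated by `speedup`."""
    total = cpu_seconds / speedup
    minutes, seconds = divmod(total, 60)
    hours, minutes = divmod(int(minutes), 60)
    # Seconds are rounded to one decimal place for display.
    return hours, minutes, round(seconds, 1)

# One hour of CPU time at a 47x speedup.
print(accelerated_time(3600, 47))  # (0, 1, 16.6)
```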

Rotman Lens
The Rotman Lens can be costly to simulate since the device is electrically
large and contains a relatively complex geometry along one plane. The
lens shown in Figure 4 is resolved in a geometry that requires about 1 GB
of RAM. A broadband source was used and only S-Parameters were requested
during simulation time.

GPU simulations were run using one and two NVIDIA Tesla C1060 cards.
This produced a performance increase of about 49x and 75x, respectively.

Figure 4: Rotman lens as generated by Remcom’s RLD software and imported
into XFdtd.

Due to its nature, the Rotman Lens may run for multiple days to resolve a
single device. With a 75x speedup, a simulation requiring three days to
complete would complete in just less than one hour.

Vivaldi Quad Flared Horn Antenna Array
Figure 5: Array of Vivaldi Quad Flared Horn Antennas with 3D antenna
pattern displayed.

The array for this next analysis came from a 1994 paper written by E.
Thiele and A. Taflove [6]. The paper goes through a number of examples
using a Vivaldi Flared Horn, but for this example, only the final quad horn
antenna array is simulated. The project space for this example is a grid
region of 873 x 559 x 174 and requires about 2.5 GB of RAM. A broadband
source was used and a steady-state far field pattern was requested for
10 GHz.
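
The grid dimensions and memory figure quoted for this project can be related with some rough arithmetic. The sketch below counts the cells and estimates the storage for the field values alone. It assumes six field components per cell stored in single precision (4 bytes each) and decimal gigabytes; these are illustrative assumptions, not documented details of XFdtd's memory layout, and the function name is hypothetical.

```python
def grid_memory_summary(nx, ny, nz, total_bytes,
                        fields_per_cell=6, bytes_per_value=4):
    """Summarize cell count and memory per cell for an FDTD grid.

    fields_per_cell and bytes_per_value are assumptions (Ex, Ey, Ez,
    Hx, Hy, Hz in single precision), not values stated in the paper.
    """
    cells = nx * ny * nz
    field_bytes = cells * fields_per_cell * bytes_per_value
    return {
        "cells": cells,
        "field_gb": field_bytes / 1e9,
        "bytes_per_cell": total_bytes / cells,
    }

summary = grid_memory_summary(873, 559, 174, total_bytes=2.5e9)
print(summary["cells"])                     # 84913218
print(round(summary["field_gb"], 2))        # 2.04
print(round(summary["bytes_per_cell"], 1))  # 29.4
```

Under these assumptions the field arrays alone account for about 2 GB of the quoted 2.5 GB, with the remainder plausibly used for material data, boundaries, and requested results.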

When running this on the Tesla cards, performance speedups of 43x
and 54x were achieved for one and two cards, respectively. This was a
large project and a noticeable amount of data was requested,
but we were still able to realize more than a 50x speedup. This could be
the difference between two weeks and about six and one quarter hours.

Cell Phone
The simulation of a cell phone represents an interesting challenge for
EM simulation tools. The modeling of internal conductors and dielectric
components for most handheld devices requires a high degree of fidelity.
A typical simulation may require the calculation of SAR information, which
involves a significant amount of data transfer.

Figure 6: Image of cell phone with SAM head.

For this case, a project was used which carried a memory footprint of about
750 MB of RAM. By contrasting simulation times with those achieved
by using the NVIDIA Quadro FX 5600, speedups of 29x and 49x were realized
by using one and two GPUs, respectively. When changing to the NVIDIA
Tesla C1060, the speedup values increased to 54x and 88x. To put that in
perspective, a simulation that requires 24 hours to run on a single CPU
would be reduced to only requiring 16 minutes and 22 seconds for an 88x
speedup.

Summary
The marriage of the FDTD method and GPU technology offers
a strong combination of accuracy and speed. The overall time
saved using this combination should benefit users of FDTD. By taking
advantage of GPU speeds, weeks, if not months, could be saved
in getting research or product to market.

GPU Speedup Over CPU

| Project | Quadro FX 5600 | Tesla C870 | Tesla C1060 | 2x Tesla C1060 |
|---|---|---|---|---|
| 8x8 Patch Array | 17.81 | 13.74 | 46.79 | 45.83 |
| Rotman Lens | | | 48.94 | 74.45 |
| Cell Phone | 29.19 | 13.17 | 54.26 | 87.99 |
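
The tabulated values also allow a rough look at how much a second Tesla C1060 helps. The sketch below simply divides the two-card speedup by the one-card speedup for each project, using the figures from the table; the variable and function names are hypothetical, and the interpretation in the note that follows is one possible reading rather than a conclusion stated in the paper.

```python
# (one Tesla C1060, two Tesla C1060) speedups over the CPU, from the table.
speedups = {
    "8x8 Patch Array": (46.79, 45.83),
    "Rotman Lens": (48.94, 74.45),
    "Cell Phone": (54.26, 87.99),
}

def second_card_gain(one_card, two_cards):
    """Ratio of two-card to one-card speedup (2.0 would be ideal scaling)."""
    return two_cards / one_card

gains = {name: second_card_gain(*pair) for name, pair in speedups.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:.2f}x")
```

The ratios come out near 0.98 for the patch array, 1.52 for the Rotman lens, and 1.62 for the cell phone. The smallest project (about 233 MB) shows no gain from a second card, which may indicate that larger problems are needed to keep two cards busy.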

References
[1] K. Yee, “Numerical Solution of Initial Boundary Value Problems Involving Maxwell’s Equations in
Isotropic Media,” IEEE Trans. Antennas Prop., AP-14, 1966, pp. 302-307.

[2] A. Taflove and M. E. Brodwin, “Numerical Solution of Steady-State Electromagnetic Scattering
Problems using the Time-Dependent Maxwell’s Equations,” Microwave Theory and Techniques,
IEEE Transactions, 1975, pp. 623-630.

[3] A. Taflove and M. E. Brodwin, “Computation of the Electromagnetic Fields and Induced
Temperatures within a Model of the Microwave-Irradiated Human Eye,” Microwave Theory and
Techniques, IEEE Transactions, 1975, pp. 888-896.

[4] A. Taflove, “Application of the Finite-Difference Time-Domain Method to Sinusoidal Steady State
Electromagnetic Penetration Problems,” Electromagnetic Compatibility, IEEE Transactions, 1980,
pp. 191-202.

[5] S. Bellofiore, J. Foutz, R. Govindarajula, I. Bahçeci, C. Balanis, A. Spanias, J. Capone, and
T. Duman, “Smart Antenna System Analysis, Integration and Performance for Mobile Ad-Hoc
Networks (MANETs),” IEEE Trans. Antennas Prop., AP-50, 2002, pp. 571-581.

[6] E. Thiele and A. Taflove, “FD-TD Analysis of Vivaldi Flared Horn Antennas and Arrays,” IEEE
Trans. Antennas Prop., AP-42, 1994, pp. 633-641.

The Remcom Difference
Remcom has been leading the EM market with innovative simulation
and wireless propagation tools for 15 years. In addition to our flagship
product, XFdtd, we offer a suite of innovative software and services,
accessible and responsive support provided by a staff of experts, and
demonstration, visit www.remcom.com

Customer Focused
Remcom is enthusiastically devoted to listening to our customers and
understanding their needs, building requested features directly into
the software with each new release. And since we’ve been providing
EM expertise and solutions since simulation software became a reality,
you can be confident that many years of experience have gone into
the design and functionality of the products we create and the way we
support them.

Personal Attention
Our reputation for providing excellent and accessible technical support
is a result of the talent we recruit and our willingness to put our best
people in touch with customers in need. When you call Remcom
for support or even just for advice, you speak directly with our most
respected engineers.

XSite: Remcom’s Monthly e-Newsletter for EM Professionals
Subscribe to XSite, Remcom’s monthly e-newsletter, to be notified of
product announcements and special offers, new whitepapers and
technical articles, support tips, and upcoming events.

Remcom, Inc.
315 S. Allen St., Suite 222
State College, PA 16801 USA

+1.888.7.REMCOM (US/CAN)
+1.814.861.1299 phone
+1.814.861.1308 fax

sales@remcom.com
XStreamWhitepaper-1009