
Power and Performance Aware Electronic System Level Design
*Amal Ben Ameur, *Didier Martinot, *Patricia Guitton-Ouhamou, $Valerio Frascolla
Intel Mobile Communications
*Sophia Antipolis, France, $Neubiberg, Germany

François Verdier, Michel Auguin
University Cote d’Azur, CNRS, LEAT
Sophia Antipolis, France

Abstract—System-on-Chip (SoC) designers face many challenges in improving performance and energy efficiency at the same time, due to the continuous increase of architecture complexity. Designers use Electronic System Level (ESL) tools and virtual prototyping to face this complexity in the early steps of system design. Power consumption includes dynamic power and static power. Power consumption and performance are affected in opposing ways by supply voltage and frequency, so the two sides of this trade-off cannot be studied separately. Our work enhances an existing industrial performance model with the introduction of a new power-aware library, which allows a combined early power and performance analysis.

Keywords—System-on-Chip (SoC), Electronic System Level (ESL), performance, energy efficiency.

I. INTRODUCTION

Following Moore’s law, the complexity of SoCs increases constantly due to the continuous addition of new features that require the integration of a growing number of processing cores and IPs in a single chip. Between 2006 and 2014 the average number of IPs embedded in a SoC increased from 30 to 120, whereas integrated processor cores increased from 1 to 20 [1]. As a consequence, the underlying power consumption becomes more and more important. To face this issue, several means of power estimation exist in the industry, but they are still based on low-level CAD tools, take time and require a rather detailed software model of the hardware architecture blocks. In fact, even if cycle-accurate tools give quite accurate power estimates, they require a large number of simulations. Thus, transaction-level architecture exploration and simulation appear as the most effective approach to assess designs with rapid power estimation tools.

At the same time, the literature shows that the power consumed by the memory system represents a growing part of the total chip power consumption. For example, [2] reports that during video playback about 35% of the total energy of a Samsung Galaxy S3 I9300 is due to movements of data in the memory hierarchy. Therefore it becomes more and more important to design memory systems that optimize at the same time the energy consumption and the performance of each IP accessing the memory hierarchy.

Hence, our first objective is to study how a simulation approach can be used as an effective support to the analysis of the performance and the power consumption at platform level, taking into consideration the memory system. The approach developed in [3] and [4], used to describe power- and clock-aware systems, is considered in our proposed simulation model with the aim of bringing out changes of state, which can induce different behaviors in the memory access sequences. On the basis of this model, we compare, in terms of performance and power dissipation, not only the efficiency of various techniques integrated in the memory controllers, but also the dimensioning and the organization of the memory system.

This paper focuses on enhancing an industrial internal design framework with results taken from academia. It thus shows a direct impact of more theoretical results on a development-oriented workflow, creating a virtuous cycle of industry-academia collaboration.

The rest of the paper describes the use cases in focus in Section II, the power library used in Section III and the obtained results in Section IV. Finally, Section V summarizes the paper and hints at the next steps.

II. USE CASE PRESENTATION

To validate our new methodology, we use an Intel proprietary pre-silicon simulation environment, used for cellular modem performance assessment and analysis. Fig. 1 shows an abstracted architecture of a cellular modem, which includes several IPs such as the CPU, on-chip and off-chip memories, interconnects and other functional blocks. Among the latter, we focus on the L2 Copro block, which is composed of the hardware accelerators needed for layer two data processing. The block includes a set of hardware components that we call Tiles, for the sake of simplicity.

We use a stochastic modeling approach to model the CPU behavior (traffic generated, processing duration, etc.) before the modem software itself is available. As a result, the platform model is not meant to execute real software, but to carry out architecture exploration and validation.
978-1-5386-3166-9/17/$31.00©2017 European Union

Authorized licensed use limited to: University of Management & Technology Lahore. Downloaded on July 16,2020 at 20:13:56 UTC from IEEE Xplore. Restrictions apply.
Fig. 1. Simplified representation of a cellular modem performance model

Therefore, it is important that the behavior (traffic patterns and processing duration) of the different IP models accurately matches what can be observed on a real silicon SoC. The platform model activity is driven by a task scheduler fed by a task graph describing the various processing flows involved in the targeted use case. Fig. 2 shows a simplified diagram that maps software tasks to the platform blocks. All information included in the task graph is collected during the SoC profiling phase on the previous generation silicon. The main goal of the model is thus to produce an accurate traffic pattern, which is necessary for analyzing the performance of the concurrent execution of the different IPs.

Fig. 2. Platform level blocks mapped into simulation tasks

This simulation environment also provides an accurate memory model, including all internal buffering and arbitration stages, memory controllers and memory devices.

The use case considered is a cellular modem running in a high data rate LTE FDD mode, capable of reaching up to 1 Gbps data throughput.

Performance analysis mainly aims to verify that IPs are designed and sized properly, i.e. they have sufficient resources to fulfil their tasks on time. This is done by defining relevant performance indicators for each IP (e.g. number of instructions per cycle, processing durations, and internal FIFO levels).

Yet, in order to explore power and performance trade-offs, it is essential to also model the impact of the power management strategy in the various performance models and to be able to collect power dissipation information from each key IP model.

This simulation framework fits quite well our need to explore, in a first stage, the impact of various Tile scheduling strategies as well as various power management strategies on the memory system performance.

III. PWCLKARCH LIBRARY

A recently proposed power modeling library called PwClkARCH implements the high-level modeling approach proposed in [3] and [4], so as to assist SoC designers with a system-level SystemC-TLM framework inspired by the UPF standard [5]. It is a generic framework (Fig. 3) that defines all the concepts of power- and clock-aware systems needed to control power dissipation, and that enables the exploration of different power management strategies at ESL. PwClkARCH is based on partitioning IPs into power domains (PDs) and clock domains (CDs). A CD is a group of IPs that share the same clock source [3]. It helps in defining the clock tree structure, which is composed of Digital Phase Locked Loop units (DPLLs) and Clock Managers (CMs). Each CM takes its input clock from the corresponding DPLL and provides clock signals to the list of associated IPs. When the CM receives a control signal from the Power Manager (PM), in which the PM requests an adjustment of one domain state or one particular IP state, the CM changes the current power state by applying division factors to its clock input. After processing the request, the CM sends an acknowledgment to the PM through a Clock Activity Status signal. A PD is a group of IPs that share the same primary supply net [4]. Power monitors are defined to capture power events and automatically update the appropriate power equations (1) and (2), with the final aim of providing, at the end of the simulation, the power log files, which can be plotted in diagrams to analyze the power behavior. In equations (1) and (2), V denotes the supply voltage, C is the capacitance that needs to be charged at each clock transition, F is the clock frequency and R_leakage is the leakage resistance, which causes the leakage current in gates.

P_dynamic = C * V² * F    (1)

P_static = V² / R_leakage    (2)

To connect the power model to the functional model, PwClkARCH uses a SystemC module named Power Management Unit (PMU) that includes a PM and a set of Power Domain Controllers (PDCs). The PM implements the power management strategy that we want to apply. In this paper we apply a clock gating strategy only.

The library allows the implementation of a power management strategy on top of a functional SystemC-TLM model, described within a virtual prototype environment. Therefore, our aim is to add a power model using this library to the performance environment introduced in Section II. By enriching this new environment with the necessary features to

study the memory system parameters and behaviors, we provide a new methodology to explore different memory systems, where power and performance are primary constraints.

Fig. 3. PwClkARCH Library Structure

IV. SIMULATION RESULTS

One of the first aims of this work is to prove that the power-aware library described in Section III and the performance model described in Section II fit together, and then to show the benefit of applying clock gating on the L2 Copro block in order to reduce dynamic power consumption while verifying that performance and timing still fulfil the hard real-time requirements.

In our case, we have defined only one PD and one CD including the eight Tiles of the L2 Copro block, numbered from 1 to 8. Thus in our PMU, we instantiate one CM and one PDC. The connection between the performance model and the power model is done through the PM of the PMU. The scheduler knows the assignment of tasks to the different IPs. When it schedules a task on a Tile, the scheduler sends the information (Tile number and Tile state) to the PM. A Tile is inactive when it is not running a task and there is no task in its pending task queue. The PM takes this information and turns off the clock of the corresponding Tile if the state received is inactive, or turns it on if the state received is active. In this use case, only clock gating is applied because the idle periods of the L2 Copro, which would allow power gating to be applied, are not long enough to save power.

The objective of this use case is to identify the sufficient clock frequency associated with the scheduling technique developed in the scheduler, which allows the deadline constraints to be satisfied, and further, to obtain in parallel the associated power profiles. The task graph activation period is 1 ms. The deadline is set to 0.9 ms for some Tiles and to 0.75 ms for the other blocks.

Using PwClkARCH, we plot the frequency variation per Tile, as illustrated in Fig. 4, after applying clock gating on each Tile according to its active/inactive state. It can be observed that at the beginning of each time slot, all the Tiles are busy, so their clock frequency is set to the maximum value. As soon as a Tile switches to an idle state, its clock is gated. The frequency is no longer constant and depends on the state of the Tile: active or inactive. This has a direct impact on the dynamic power consumption of each Tile and, in the same way, of the L2 Copro block, as presented in Fig. 5. We can see that in time slots where all the Tiles are inactive, the dynamic power consumption is equal to zero. In Fig. 4 the maximum clock frequency of Tiles is equal to 288 MHz. With the chosen clock frequency start time, both the periodicity and the deadline values used in the task graph can be met. However, with a maximum Tile clock frequency set to 250 MHz (green curve in Fig. 5), the dynamic power is reduced but the deadline of the task assigned to Tile 1 is violated. Reducing the clock frequency further to 200 MHz (blue curve in Fig. 5) improves the power consumption, but tasks on 3 Tiles go beyond their deadlines. This experiment illustrates that such an approach also helps in defining the best power management strategy (here limited to clock gating applied to Tiles) associated with the functional activity of the Tiles (due to the scheduling of tasks on Tiles). As soon as the performance constraints are met, we can deduce the expected power profile generated by the system. If more than one scheduling algorithm is available and they all show similar performance, then the power figures become the key parameter for the final decision on which one to pick.

Fig. 4. Clock Frequency Variation per Tile (x = time [ns], y = Freq [Hz])

Fig. 5. Total Dynamic Power Consumption of all Tiles (x = time [ns], y = P_dyn [mW])

We need to tune our power model and to include the missing models, especially the memory system. Results show that this is feasible and there is an interest in doing so.
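
As an illustration of the frequency versus deadline trade-off discussed in this section, the following minimal Python sketch applies equations (1) and (2) to a few clock-gated Tiles. It is not PwClkARCH code and does not use SystemC. The names Tile, busy_cycles, C_EFF and V_DD, as well as the capacitance, voltage and cycle-count values, are assumptions chosen for illustration only. Only the three clock frequencies and the 0.9 ms and 0.75 ms deadlines are taken from the use case.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    name: str
    busy_cycles: int   # hypothetical workload per activation period, in clock cycles
    deadline_s: float  # deadline measured from the start of the period, in seconds


def dynamic_power(c_farad: float, v_volt: float, f_hz: float) -> float:
    """Equation (1): P_dynamic = C * V^2 * F. A gated clock corresponds to F = 0."""
    return c_farad * v_volt ** 2 * f_hz


def static_power(v_volt: float, r_leakage_ohm: float) -> float:
    """Equation (2): P_static = V^2 / R_leakage."""
    return v_volt ** 2 / r_leakage_ohm


def finish_time(tile: Tile, f_max_hz: float) -> float:
    """Time at which the Tile goes idle, assuming it starts at t = 0 and runs at f_max."""
    return tile.busy_cycles / f_max_hz


def missed_deadlines(tiles, f_max_hz):
    """Names of the Tiles whose task finishes after its deadline."""
    return [t.name for t in tiles if finish_time(t, f_max_hz) > t.deadline_s]


# Assumed values, for illustration only (not measurements from the paper).
C_EFF = 1e-9  # effective switched capacitance per Tile, in farads
V_DD = 1.0    # supply voltage, in volts
TILES = [
    Tile("Tile 1", busy_cycles=200_000, deadline_s=0.75e-3),
    Tile("Tile 2", busy_cycles=190_000, deadline_s=0.90e-3),
    Tile("Tile 3", busy_cycles=100_000, deadline_s=0.75e-3),
]

for f_max in (288e6, 250e6, 200e6):
    active_w = dynamic_power(C_EFF, V_DD, f_max)  # Tile active, clock running
    gated_w = dynamic_power(C_EFF, V_DD, 0.0)     # Tile idle, clock gated
    print(f"{f_max / 1e6:.0f} MHz: active {active_w:.3f} W per Tile, "
          f"gated {gated_w:.3f} W, missed deadlines: {missed_deadlines(TILES, f_max)}")
```

Under these assumed cycle counts, lowering the maximum frequency lowers the dynamic power of an active Tile in proportion, while more Tiles overrun their deadlines. The sketch only mirrors the qualitative behavior. The actual Tile workloads and electrical parameters of the platform are not given in the paper.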

V. CONCLUSION
In this paper we present intermediate results of our work on providing a joint performance and energy consumption simulation environment. This model helps in making early decisions on the choice of the most suitable architecture (memory size, bandwidth, frequency, etc.) and of the scheduling strategies in communications SoCs.

The next step of our work will focus on the introduction of a memory model within our power model, which allows a global analysis of the system in terms of performance and power, knowing that the power strategy influences the frequency of the memory requests.

REFERENCES
[1] H. Foster, “Navigating the Perfect Storm: New Trends in Functional Verification”, 10th International Haifa Verification Conference (HVC 2014), Haifa, Israel, November 18-20, 2014.
[2] D. Pandiyan, C.-J. Wu, “Quantifying the energy cost of data movement for emerging smart phone workloads on mobile platforms”, IEEE International Symposium on Workload Characterization, October 26-28, Raleigh, North Carolina, 2014.
[3] H. Affes, M. Auguin, F. Verdier, A. Pégatoquet, “Methodology for inserting Clock-Management strategies in Transaction-Level Models of System-on-Chips”, Forum on specification & Design Languages (FDL), September 14-16, Barcelona, Spain, 2015.
[4] O. Mbarek, A. Pégatoquet, M. Auguin, “A methodology for power-aware transaction-level models of systems-on-chip using UPF standard concepts”, PATMOS 2011, J. L. Ayala, B. García-Cámara, M. Prieto, M. Ruggiero, and G. Sicard, Eds., vol. 6951 of Lecture Notes in Computer Science, Springer, pp. 226–236.
[5] Unified Power Format (UPF 2.0) Standard: “IEEE standard for design and verification of low power integrated circuits”, IEEE 1801™, March 2009.
