Ultra Low Power Design Techniques for FPGAs

Alongside design complexity, power consumption is one of the most significant concerns in FPGA design today. Implementing System-on-Chip (SoC) functionality in an FPGA reduces design complexity, yet power consumption is quickly becoming the most critical issue for the design community.

The consumer electronics market is moving towards portable products such as Portable Media Players (PMPs), MP3 players, TV-on-mobile, digital cameras, and camcorders, where battery life is one of the most attractive specifications for customers. Moreover, the need for lower power consumption rises sharply because these products are enclosed without any active cooling mechanism. Therefore, power reduction techniques must be considered during the design and component selection phase.

FPGA Power Components

There are four basic power components that need to be considered when selecting the appropriate FPGA technology for a low power application. They are Power Up or In-Rush power, Configuration power, Static power, and Dynamic power.

In-rush power: - During the startup stage, the device requires a substantial amount of logic array current for a specific duration in order to ramp up VDD to the correct voltage. This initial high current is called inrush current and is often seen as a power spike. The duration of the VDD ramp-up depends on the amount of current available from the power supply. Once the VDD supply reaches 90 percent of its final value, this initial high current is no longer required.

Configuration Power: - Configuration power is the power required to configure the device. This phase is typically required only for SRAM-based FPGAs, where the configuration data for the device is stored in an external non-volatile memory such as an EPROM or Flash device. During configuration and initialization, the device draws current as the routing and look-up table (LUT) configurations are read from memory into the device, registers are reset, I/O pins are enabled, and the device enters operating mode.

Static power: - Static power is proportional to the static current that flows when the device is powered up, configured, and idle, with no activity at the I/Os and clock inputs.

Dynamic Power: - Dynamic power is very sensitive to switched capacitance, mainly routing capacitance. It is a function of the operating frequency and of the switching of capacitive loads such as I/Os, internal gates, registers, clock lines, buffers, and internal memory accesses.
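
The dependence on capacitance and frequency can be made concrete with the standard CMOS switching-power estimate, P = a x C x V^2 x f, where a is the switching activity factor. The short Python sketch below is only an illustration: the function name and all numeric values are assumptions chosen for the example and are not taken from any device datasheet.

```python
def dynamic_power_watts(activity, capacitance_farads, vdd_volts, freq_hz):
    """Standard CMOS switching-power estimate: P = a * C * V^2 * f."""
    return activity * capacitance_farads * vdd_volts ** 2 * freq_hz


# Hypothetical operating point, for illustration only.
base = dynamic_power_watts(0.25, 100e-12, 1.8, 50e6)

# Halving the clock frequency halves the dynamic power.
half_clock = dynamic_power_watts(0.25, 100e-12, 1.8, 25e6)

# Lowering the supply voltage reduces power quadratically.
lower_vdd = dynamic_power_watts(0.25, 100e-12, 1.2, 50e6)

print(f"base       : {base * 1e3:.3f} mW")
print(f"half clock : {half_clock * 1e3:.3f} mW")
print(f"lower VDD  : {lower_vdd * 1e3:.3f} mW")
```

The model shows why the techniques discussed later target the three factors a designer can influence: switching activity, switched capacitance, and clock frequency.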

Technology and Device Selection

Technology selection is the first and most important step in low power design, and it has a significant impact on the result. There are three main FPGA interconnect technologies available: Antifuse, SRAM, and Flash.

Antifuse-based FPGAs consume less static power than SRAM-based devices due to their fine-grained architecture. An SRAM-based FPGA requires more transistors than an Antifuse-based FPGA to implement equivalent functionality, because SRAM-based FPGAs use six-transistor SRAM cells to configure the interconnect and logic cells. Antifuse-based FPGAs do not require such transistors, and therefore draw orders of magnitude less power in static mode.

Antifuse FPGAs offer much lower dynamic power than SRAM-based FPGAs due to the low switching capacitance and resistance of the antifuse element. Antifuse FPGAs can reduce dynamic power consumption by up to a factor of five compared to an equivalent SRAM-based solution.

In addition to static and dynamic power, SRAM-based FPGAs have two more power components than Antifuse FPGAs: in-rush power and configuration power.

It is therefore clear that Antifuse FPGAs can offer ultra low power thanks to their architecture and interconnect features. Among Antifuse FPGA vendors, QuickLogic is the market leader in supplying the lowest-power programmable solutions for this very reason.

The anti-fuse technology implemented in QuickLogic devices is a proprietary technology called ViaLink. The ViaLink anti-fuse permits low routing capacitance and resistance enabling low overall power consumption and better performance than any other FPGA technology.

QuickLogic has recently released the PolarPro FPGA family, with standby currents < 10 mA, to address applications that demand ultra-low power, small packaging, and high design security. PolarPro FPGAs exceed the functionality previously addressed by CPLDs and FPGAs, with significant power and cost savings. PolarPro offers a new and innovative logic cell architecture, versatile embedded memory with built-in FIFO control logic, and advanced clock management control units.

Architecture of PolarPro FPGAs

The low power architecture of PolarPro provides a feature-rich alternative to CPLDs and ASICs for handheld and portable applications. In addition, the low power consumption of PolarPro FPGAs enables designers to reduce system costs by using smaller, less costly voltage regulators and power sources.

Power Reduction techniques

Low power design techniques fall into two classes: device dependent techniques and device independent techniques. The device dependent techniques are Very Low Power (VLP) Mode, Memory Power Reduction, and Clock Power Management.

FPGAs consume a significant amount of power during standby due to leakage currents. PolarPro devices have a special Very Low Power (VLP) pin that enables a low power sleep mode, which substantially lowers the power consumption of the device. When the device is placed in VLP mode, the state of all I/Os, memory contents, and register values are retained. Because the previous values are retained, the device resumes full operation approximately 250 µs after exiting VLP mode.

The embedded memories in FPGAs account for a substantial share of a design's total dynamic power. Memory blocks consume dynamic power as a result of internal RAM clocking. Unlike traditional embedded SRAM, PolarPro devices offer write chip select (WCS) and read chip select (RCS) signals along with write enable. Memory read and write operations are disabled when these chip selects are de-asserted, which reduces the power consumed by the memory and its peripheral logic.

Clock networks also contribute a significant portion of dynamic power consumption due to their high switching activity and long routes across the device. PolarPro devices offer five column clock buffers in each macro cell column. Dynamic power consumption can therefore be reduced by using fewer column clock buffers and disabling the unused ones.

PowerAware Placer, part of QuickLogic QuickWorks development software, is a tool that reduces dynamic power consumption by giving priority to power consumption throughout logic placement. The tool applies unique placement algorithms to cut down the number of column clock buffers by minimizing the number of logic columns in the design and disabling the unused column clock buffers for power savings.

The Device Independent techniques are Glitch Power minimization, FSM decomposition, Guarded Evaluation, Retiming, State assignment and re-encoding, Signal Gating, and Bus Power.

Glitch Power Reduction

Glitches within the configurable logic cells cause a significant amount of power consumption because of the extra switching activity they create. Glitching depends on the logic depth of the circuit and usually occurs when the arrival times of a gate's inputs are mismatched due to delay imbalances. Pipelining and Logic Depth Reduction are the techniques used to minimize glitches.

Pipelining reduces design glitches by inserting registers into a long combinational path. This increases the speed of operation and reduces the logic depth, but adds latency.


 

Logic Depth Reduction is the second technique used to reduce glitches and their propagation. The designer has to change the coding style so that fast-switching signals enter only at the bottom level of the logic. This can easily be achieved by re-ordering the constructs in Verilog/VHDL.

 

Equal Switching

 
Unequal Switching

 

Signal Gating and Clock Gating

  

Signal gating is the most commonly used method for power reduction: it masks unwanted switching activity and stops it from propagating. Address or data buses, and signals with high frequency or heavy glitching, are the usual candidates for gated logic. A typical gated logic circuit is shown in the figure.

The clock network in a sequential circuit dissipates a large share of the power, because the clock is the only signal that toggles all the time and it tends to be highly loaded. Clock gating, which masks the clock whenever a circuit is not in use, can be applied to sub-circuits as well as to the whole system to reduce clock power. However, clock gating may cause glitches and adds some delay to the clock network, which can lead to setup and hold time violations. The designer therefore has to take care of these issues during place and route of the gating logic.
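
As a rough illustration of the saving, the following Python sketch counts how many clock edges reach a sub-circuit with and without gating. It is a behavioural model only, not HDL; the function name and the enable pattern are hypothetical, and it ignores the glitch and timing issues described above.

```python
def clock_edges_seen(enable_per_cycle, gated):
    """Count the rising clock edges that reach a sub-circuit.

    enable_per_cycle holds one flag per clock cycle; a true value means
    the sub-circuit has useful work to do in that cycle.
    """
    if not gated:
        # Free-running clock: every cycle toggles the clock load.
        return len(enable_per_cycle)
    # Gated clock: only cycles with the enable asserted pass an edge.
    return sum(1 for enable in enable_per_cycle if enable)


# Hypothetical activity pattern: the block is busy in 3 of 8 cycles.
enable = [1, 0, 0, 0, 1, 1, 0, 0]

ungated_edges = clock_edges_seen(enable, gated=False)
gated_edges = clock_edges_seen(enable, gated=True)

print("edges without gating:", ungated_edges)
print("edges with gating   :", gated_edges)
```

In this model the clock load of the gated block switches only when the block is in use, so the saving grows with the fraction of idle cycles.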

Bus Power

Buses are an essential part of any digital system that performs heavy data processing. Address bus, data bus, and control bus are common signal groups in most digital systems. A bus consumes a significant amount of power because of its high switching activity and large capacitive loading. The more buses a design has, the higher its power consumption.

Reducing the number of buses in a design is therefore the most effective way to reduce bus power. The figure shows a long data bus shared on a time-multiplexed basis using MUX/DEMUX logic, where source S1 uses the even cycles and source S2 uses the odd cycles.
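
The even/odd sharing scheme can be sketched behaviourally in Python. The function names and data words below are hypothetical and serve only to show the interleaving; a real design would implement the MUX and DEMUX in HDL.

```python
def mux_shared_bus(s1_words, s2_words):
    """Interleave two sources onto one bus: S1 on even cycles, S2 on odd."""
    bus = []
    for word1, word2 in zip(s1_words, s2_words):
        bus.append(word1)  # even cycle carries S1
        bus.append(word2)  # odd cycle carries S2
    return bus


def demux_shared_bus(bus):
    """Recover both streams at the receiving end of the shared bus."""
    return bus[0::2], bus[1::2]


s1 = [1, 2, 3]
s2 = [10, 20, 30]

shared = mux_shared_bus(s1, s2)
recovered_s1, recovered_s2 = demux_shared_bus(shared)

print("bus traffic :", shared)
print("recovered S1:", recovered_s1)
print("recovered S2:", recovered_s2)
```

One long bus replaces two, at the cost of each source getting only every second cycle.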

Guarded Evaluation

 

Guarded evaluation is a commonly used power reduction method that stops input switching activity from propagating into a block whose output is not being used. The switching is masked by adding latches at the inputs of the block. The figure shows an example of guarded evaluation: consider an adder whose output is used only under certain conditions. Input switching can be stopped whenever the condition is not satisfied.
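
The adder example can be modelled with a small Python sketch that counts the bit toggles arriving at the adder inputs. It assumes the guard latches simply hold their previous values whenever the result is not needed; the names and sample values are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def adder_input_toggles(samples, guarded):
    """Count bit toggles seen at the adder inputs.

    samples is a list of (a, b, use_result) tuples, one per cycle.
    With guarding, the input latches hold their old values in cycles
    where the adder result is not used.
    """
    toggles = 0
    held_a = held_b = 0
    for a, b, use_result in samples:
        if guarded and not use_result:
            continue  # latches closed: adder inputs do not change
        toggles += hamming(held_a, a) + hamming(held_b, b)
        held_a, held_b = a, b
    return toggles


# The middle sample changes every input bit, but its sum is never used.
samples = [
    (0b1111, 0b0000, True),
    (0b0000, 0b1111, False),
    (0b1111, 0b0000, True),
]

unguarded_toggles = adder_input_toggles(samples, guarded=False)
guarded_toggles = adder_input_toggles(samples, guarded=True)

print("toggles without guarding:", unguarded_toggles)
print("toggles with guarding   :", guarded_toggles)
```

The guarded version still computes every result that is actually consumed; only the wasted evaluations are suppressed.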

Retiming

The datapath is the highest contributor to total dynamic power consumption in any design. Its power is directly proportional to the input clock frequency and to the length of the path, which in turn is directly proportional to capacitance. This leads to significant power consumption in the critical path. Retiming can be applied to shorten the critical path and thereby reduce power consumption.

Circuit before retiming

Circuit after retiming

The figures illustrate a typical retiming example. Flip-flops are moved across the combinational circuit without changing the combinational structure.
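
The effect of moving a flip-flop can be sketched with a simple delay model in Python. Each inner list holds the combinational block delays between two registers; the delay values and names are hypothetical and are not taken from the figures.

```python
def critical_path_ns(stages):
    """Longest register-to-register delay, given block delays per stage."""
    return max(sum(stage) for stage in stages)


def all_blocks(stages):
    """Flatten the stages to list every combinational block in order."""
    return [delay for stage in stages for delay in stage]


# Hypothetical block delays in nanoseconds.
before = [[4.0, 4.0], [2.0]]  # register placed after the second block
after = [[4.0], [4.0, 2.0]]   # register moved back across the second block

before_ns = critical_path_ns(before)
after_ns = critical_path_ns(after)

print("critical path before retiming:", before_ns, "ns")
print("critical path after retiming :", after_ns, "ns")
```

The combinational blocks are identical in both cases; only the register position changes, which is what shortens the longest register-to-register path.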

Retiming can be done either by modifying the RTL code or by setting a constraint in Precision RTL.

Specifying Retiming in Precision RTL: - Retiming is not automatic in the Precision RTL tool; by default, it is turned off. The designer therefore has to turn on the retiming algorithm for a design. This can be done in one of the following ways:

Open the Project Setting dialog box, as shown in figure, by clicking on the Setup Design icon on the Design Bar.

For batch operations, the designer can enable retiming with the setup_design retiming command.

Specify the dont_retime attribute on specific registers or modules in the VHDL/Verilog source code.

State assignment and re-encoding

The power consumption of an FSM design depends on the number of flip-flop toggles during state transitions. Power consumption can therefore be reduced by choosing a suitable FSM encoding style.

An FSM can be implemented in a number of ways; encoded state (Binary and Gray) and one-hot encoding are commonly used. Encoded state machines (Binary and Gray) need fewer flip-flops than one-hot state machines, but need more combinatorial logic to generate the flip-flop inputs. Among the encoded state machines, Gray code is the better choice because it toggles fewer flip-flops. The table below compares the state transitions of a binary encoded FSM and a Gray encoded FSM; in the Gray coded FSM only one flip-flop toggles at any transition.

Binary Code    # Toggles    Gray Code    # Toggles
000            3            000          1
001            1            001          1
010            2            011          1
011            1            010          1
100            3            110          1
101            1            111          1
110            2            101          1
111            1            100          1
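
The toggle counts tabulated above can be reproduced with a few lines of Python. The sketch assumes a 3-bit counter-style FSM that steps through its eight states in order and wraps around, and it counts the flip-flops that change when each state is entered; the helper names are hypothetical.

```python
def to_gray(n):
    """Convert a binary number to its reflected Gray code."""
    return n ^ (n >> 1)


def toggles_into_each_state(codes):
    """Flip-flops that change when entering codes[i] from the previous state.

    Index 0 is entered from the last state, since the sequence wraps around.
    """
    return [bin(codes[i] ^ codes[i - 1]).count("1") for i in range(len(codes))]


binary_codes = list(range(8))
gray_codes = [to_gray(n) for n in binary_codes]

binary_toggles = toggles_into_each_state(binary_codes)
gray_toggles = toggles_into_each_state(gray_codes)

print("binary:", [format(c, "03b") for c in binary_codes], binary_toggles)
print("gray  :", [format(c, "03b") for c in gray_codes], gray_toggles)
print("total toggles per full cycle:", sum(binary_toggles), "vs", sum(gray_toggles))
```

Over one full pass through the eight states, the binary encoding toggles 14 flip-flop bits in total while the Gray encoding toggles 8.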

 

A one-hot state machine, on the other hand, needs more flip-flops than an encoded state machine, but it uses less combinational logic and causes less switching. One-hot state machines are therefore well suited to FPGAs, with their register-rich architecture and speed.

Although Precision RTL provides an option to select the desired encoding style during synthesis, the most effective way is to write the encoding into the HDL code.

FSM decomposition

State assignment, gated clocks, and guarded evaluation (input disabling) are the most commonly used low-power design techniques for FSM implementation. Of these, clock gating gives the FSM the lowest dynamic power consumption. The designer should decompose the FSM into two or more sub-FSMs in such a way that only one sub-FSM is clocked at a time. A simple communication protocol is needed for the sub-FSMs to interact with each other; it handles the activation and de-activation of the sub-FSMs. The figure shows a typical example of FSM decomposition.

QuickLogic's ViaLink technology dramatically reduces both dynamic and static power compared to conventional FPGAs based on SRAM and Flash technology, without compromising performance. The additional power reduction techniques described above can be applied to systems where low power is most critical.
