Inria / Raweb 2004
Team: R2D2




Section: New Results

Keywords: reconfigurable architecture, grain of calculation, low-power consumption, Network-on-Chip, NoC, CDMA, sensor network, multiple-valued logic, MVL, System-on-Chip, SoC.

New architectures and technologies

Our studies, motivated by the constraints of high performance, flexibility, and low power consumption, focus on the following aspects:

DART reconfigurable architecture

Participants: Sébastien Pillement, Olivier Sentieys.

The definition of the DART architecture led to the Ph.D. thesis of Raphael David in 2003 [3]. To validate the theoretical aspects and simulated performance of this new computation paradigm on a silicon prototype, a collaboration has been started with the LIST laboratory of CEA. The aim of this joint research project is to integrate a DART cluster implementing channel estimation for the 802.11a wireless LAN standard. The algorithmic complexity of channel estimation is 1784 MOPS and 356 MDPS (millions of divisions per second). A VHDL model of DART at the register-transfer level has been designed. It is compatible with the SystemC cycle-true, bit-true simulator. Synthesis of a DART cluster including six reconfigurable datapaths and two dedicated dividers in a 130 nm CMOS technology from STMicroelectronics yields a 200 MHz clock frequency (i.e. 4800 32-bit MOPS plus 400 MDPS) in less than 10 square millimeters.
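The throughput figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the text. The even split of the peak operation rate across the six datapaths and across the two dividers is an assumption made for illustration, not a statement about the DART microarchitecture.

```python
# Figures quoted in the text.
CLOCK_MHZ = 200        # synthesized clock frequency
PEAK_MOPS = 4800       # peak 32-bit operations per second (millions)
PEAK_MDPS = 400        # peak divisions per second (millions)
REQUIRED_MOPS = 1784   # channel estimation workload
REQUIRED_MDPS = 356
DATAPATHS = 6
DIVIDERS = 2

# Operations completed per clock cycle by the whole cluster.
ops_per_cycle = PEAK_MOPS / CLOCK_MHZ

# Assumption: the peak rate is spread evenly over the resources.
ops_per_cycle_per_datapath = ops_per_cycle / DATAPATHS
divs_per_cycle_per_divider = PEAK_MDPS / CLOCK_MHZ / DIVIDERS

# Ratio of peak capability to the workload requirement.
mops_headroom = PEAK_MOPS / REQUIRED_MOPS
mdps_headroom = PEAK_MDPS / REQUIRED_MDPS

print(ops_per_cycle, ops_per_cycle_per_datapath, divs_per_cycle_per_divider)
print(round(mops_headroom, 2), round(mdps_headroom, 2))
```

Under these assumptions, the peak rates correspond to 24 operations per cycle for the cluster and one division per cycle per divider, and both peak rates exceed the requirements of the channel estimation workload.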

Other studies, such as the coupling of a DART cluster with other resources on a SoC, are being pursued. A first model of coupling with a general-purpose microprocessor is proposed in [16]. It demonstrates the potential of reconfigurable accelerators to increase the computing power and flexibility of a SoC. This work is carried out in collaboration with ENIS (École Nationale d'Ingénieur de Tunis, F. Ben Abdallah Ph.D.).

Memory hierarchy in specialized SoC

Participants: Daniel Chillet, Olivier Sentieys.

Our research aims at defining a global memory organization model suited to SoC and a methodology which allows the designer to explore different memory organization solutions.

SoC architectures already offer large on-chip memories, with several memory banks and several levels of memory hierarchy (e.g. cache memory). In these systems, the main problem is exploring memory organizations in relation to the needs of the application. Several problems can be addressed in this context, such as cache, scratch-pad, and multi-bank memory design. We focus our research on design methodologies for optimal memory hierarchies. A first model has been defined for dedicated SoCs and for large reconfigurable architectures such as FPGA circuits. This work is carried out in collaboration with the École Nationale Polytechnique d'Alger (L. Abdelouel Ph.D.).

Reconfigurable architecture for control intensive applications

Participants: Stéphane Chevobbe, Olivier Sentieys.

Previous work has shown that reconfigurable architectures are particularly well suited to implementing regular processing applications. Nevertheless, they are inefficient for designing complex control systems. We are working on a new concept of reconfigurable architecture dedicated to control, which can manage instruction parallelism as well as task parallelism. As the parallelism of an application can be well described by Petri nets, the architecture is able to implement a Petri net directly on its structure. The core of the architecture is a network of asynchronous automata. As the number of cells is limited, the concept of dynamic reconfiguration has been introduced to virtually increase the size of the architecture. Furthermore, the architecture is self-adaptive: it manages its own reconfigurations according to the geometry of the application graph. After defining the paradigm of the RAMPASS architecture, our work consisted in evaluating the gain (performance, flexibility, power) of such an architecture. We built two models of the architecture, one in SystemC to simulate its behavior, and another in RTL VHDL to estimate its performance and area.
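To illustrate how a Petri net describes task parallelism, the following minimal sketch implements the standard token-firing rule in software. It is not the RAMPASS implementation: the net, the place and transition names, and the dictionary representation are all hypothetical and chosen only for illustration.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    inputs, _outputs = transition
    return all(marking.get(place, 0) >= n for place, n in inputs.items())


def fire(marking, transition):
    """Return the marking reached by firing an enabled transition."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    inputs, outputs = transition
    new_marking = dict(marking)
    for place, n in inputs.items():
        new_marking[place] = new_marking.get(place, 0) - n
    for place, n in outputs.items():
        new_marking[place] = new_marking.get(place, 0) + n
    return new_marking


# Hypothetical fork/join net: each transition maps to (inputs, outputs).
NET = {
    "fork": ({"start": 1}, {"a_ready": 1, "b_ready": 1}),
    "task_a": ({"a_ready": 1}, {"a_done": 1}),
    "task_b": ({"b_ready": 1}, {"b_done": 1}),
    "join": ({"a_done": 1, "b_done": 1}, {"end": 1}),
}

marking = fire({"start": 1}, NET["fork"])

# After the fork, both tasks are enabled at once: this is the parallelism
# that the net makes explicit.
concurrent = sorted(name for name, t in NET.items() if enabled(marking, t))
print(concurrent)  # ['task_a', 'task_b']
```

The join transition only becomes enabled once both tasks have fired, which is how the net expresses synchronization between parallel branches.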

NoC design using advanced mobile telecommunication techniques

Participants: Jean-Marc Philippe, Sébastien Pillement, Olivier Sentieys.

The increasing need for low-power, high-speed interconnects has led us to investigate new signaling concepts. Among them is the PAM (Pulse Amplitude Modulation) technique, which consists in encoding multiple voltage levels on a single wire. We have designed a quaternary link using custom transistors to overcome some of the interconnect problems. The foundry process is modified to meet the threshold voltage requirements of our transistors. The quaternary link is composed of a binary-to-quaternary encoder and a quaternary-to-binary decoder. The encoder converts two binary signals into one quaternary signal, and the decoder converts the quaternary signal back into two binary signals.
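The behavior of the encoder and decoder can be sketched at the logical level as follows. The text does not specify which pair of bits maps to which level, so the natural binary weighting used here is an assumption, as are the evenly spaced voltage levels and the supply value `VDD`.

```python
VDD = 1.2  # hypothetical supply voltage in volts, for illustration only


def encode(msb, lsb):
    """Map two binary signals to one quaternary level in {0, 1, 2, 3}.

    Assumption: natural binary weighting (level = 2 * msb + lsb).
    """
    if msb not in (0, 1) or lsb not in (0, 1):
        raise ValueError("inputs must be 0 or 1")
    return 2 * msb + lsb


def decode(level):
    """Recover the two binary signals from a quaternary level."""
    if level not in (0, 1, 2, 3):
        raise ValueError("level must be in 0..3")
    return level >> 1, level & 1


def level_voltage(level, vdd=VDD):
    """Assumption: the four levels are evenly spaced between 0 and vdd."""
    return level * vdd / 3


# Round trip over all four input combinations.
for msb in (0, 1):
    for lsb in (0, 1):
        assert decode(encode(msb, lsb)) == (msb, lsb)
```

With this mapping, one wire carries the information of two binary wires, at the cost of distinguishing four voltage levels instead of two.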

SPICE simulations of the circuit show a large improvement in energy consumption for global interconnects. This is due to the reduced voltage swing of some transitions. Compared to a full-swing binary system, the energy consumption is reduced by about 50% with a 10 mm wire, and our system consumes less energy even for a 1 mm wire. Another advantage of this approach is that the transistor count for the whole system is very low: 22 transistors (10 for the encoder and 12 for the decoder). This link also halves the number of wires needed to transmit the information. This enables us to increase the inter-wire distance to reduce crosstalk noise, or alternatively to reduce the interconnect area.

We have proposed an analytical energy consumption model for quaternary links. This model makes it possible to predict the dynamic energy consumption of a complete link as a function of the wire length, the electrical and technological parameters, and the statistical distribution of the binary inputs.
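The analytical model itself is not reproduced in this report. As a rough illustration of the kind of dependence it captures, the sketch below computes the expected energy dissipated on the wire alone under strongly simplifying assumptions: each level is driven by an ideal source, so a transition of amplitude dV dissipates C dV^2 / 2; successive symbols are independent; the wire capacitance grows linearly with its length. The encoder, decoder and level-generation energy are ignored, so this sketch is not expected to reproduce the figures quoted above. All names and parameter values are hypothetical.

```python
from itertools import product


def expected_wire_energy(levels_v, probs, length_mm, cap_f_per_mm):
    """Expected energy (joules) dissipated per symbol on one wire.

    levels_v     : voltage of each signal level
    probs        : probability of each level (independent symbols assumed)
    length_mm    : wire length in millimeters
    cap_f_per_mm : wire capacitance per millimeter, in farads
    """
    capacitance = cap_f_per_mm * length_mm
    pairs = list(zip(levels_v, probs))
    energy = 0.0
    # Average 1/2 * C * dV^2 over all (previous, next) level pairs.
    for (v_prev, p_prev), (v_next, p_next) in product(pairs, repeat=2):
        energy += p_prev * p_next * 0.5 * capacitance * (v_next - v_prev) ** 2
    return energy


VDD = 1.2             # hypothetical supply voltage (V)
CAP_PER_MM = 0.2e-12  # hypothetical wire capacitance (F/mm)

binary = expected_wire_energy([0.0, VDD], [0.5, 0.5], 10.0, CAP_PER_MM)
quaternary = expected_wire_energy(
    [0.0, VDD / 3, 2 * VDD / 3, VDD], [0.25] * 4, 10.0, CAP_PER_MM
)
```

Changing `probs` shows how a skewed input distribution alters the expected energy, and changing `length_mm` shows the linear dependence on wire length in this simplified setting.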

Wireless sensor networks

Participants: Mickaël Cartron, Olivier Sentieys.

The aim of our research is to optimize the energy efficiency of a wireless sensor network at the architectural and algorithmic levels. We modeled the behavior of a low-level communication system, from the physical layer to the packet retransmission system. The processing targets a dedicated architecture (ASIC), because of its lower power consumption compared to microprocessors or FPGAs. We modeled the bit-error-rate performance of the communication system and its power consumption as a function of several parameters (noise power, distance, packet size, amplification level). From separate analytical expressions of the performance and of the power, we can deduce the power required under a performance constraint, which can be expressed as the energy consumed per successfully transmitted useful bit. With the help of this expression, we have identified an optimal operating point as a function of the input parameters. Recent results have shown that using this configuration for the architecture and for the communication system parameters saves up to 75% of the power compared to the worst-case technique.
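The idea of an optimal operating point can be illustrated with a toy model of the energy per successfully transmitted useful bit. Everything below is an assumption made for illustration and is not the model developed in this work: a BPSK-like bit error rate over an additive white Gaussian noise channel, a power-law path loss, retransmission of a packet until it is received without error, and arbitrary values for the header size, bit rate and circuit power.

```python
import math


def bit_error_rate(tx_power_w, distance_m, noise_power_w, path_loss_exp=2.0):
    """Assumed BPSK-like BER with a simple power-law path loss."""
    rx_power = tx_power_w / distance_m ** path_loss_exp
    snr = rx_power / noise_power_w
    return 0.5 * math.erfc(math.sqrt(snr))


def energy_per_useful_bit(tx_power_w, distance_m, noise_power_w, payload_bits,
                          header_bits=32, bit_rate=1e6, circuit_power_w=1e-3):
    """Expected energy (J) per payload bit delivered without error."""
    packet_bits = payload_bits + header_bits
    ber = bit_error_rate(tx_power_w, distance_m, noise_power_w)
    p_success = (1.0 - ber) ** packet_bits
    if p_success == 0.0:
        return math.inf
    packet_energy = (tx_power_w + circuit_power_w) * packet_bits / bit_rate
    # With retransmission until success, a packet is sent 1 / p_success
    # times on average.
    return packet_energy / (payload_bits * p_success)


# Sweep the amplification level (transmit power) for fixed conditions.
candidates = [1e-8 * 1.2 ** k for k in range(80)]


def cost(tx_power_w):
    return energy_per_useful_bit(tx_power_w, distance_m=10.0,
                                 noise_power_w=1e-9, payload_bits=1000)


best = min(candidates, key=cost)
```

In this toy model, too little transmit power wastes energy on retransmissions and too much wastes it on amplification, so the minimum lies between the two extremes of the sweep.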

Multiple-Valued Logic architectures and circuits

Participants: Daniel Chillet, Ekue Kinvi-Boh, Olivier Sentieys.

The design of MVL functions rests on the SUpplementary Symmetrical LOgic Circuit structure (SUS-LOC). A library of basic MVL logic, memory and arithmetic cells has been designed, characterized and compared with classical binary CMOS implementations.

We described VHDL models of basic ternary logic and arithmetic cells and of some arithmetic processing units (adder, multiplier, shifter). These models take into account the variation of power consumption and delay with the capacitive load at the cell output, and they are used as components in the structural description of a ternary DSP core.

A collaboration with the SOI group at the Catholic University of Louvain-La-Neuve (UCL) has made it possible to implement ternary functions in a 2 µm SOI CMOS process using the SUS-LOC concepts, with the initial aim of validating our estimations experimentally. A 64-tert SRAM and a 4-tert adder have been designed and fabricated at UCL. These two circuits are the very first fully ternary circuits ever fabricated. They have been successfully tested using specifically fabricated test equipment.