
A heterogeneous reconfigurable platform for cognitive radio systems
Amor Nafkha, Julien Delorme, Renaud Seguier, Christophe Moy, Jacques Palicot
SUPELEC/IETR, Av. de la Boulaie, CS 47601, F-35576 Cesson-Sévigné CEDEX, France
Tel. : + Fax : + Email : (amor.nafkha,julien.delorme,renaud.seguier,christophe.moy,jacques.palicot)

Abstract— This paper first presents a unified multi-standard baseband structure in which multiple hardware (HW) blocks are shared between several communication standards to minimize the HW/SW implementation. It then describes a rapid prototyping methodology based on the SynDEx tool, suitable for transformation-oriented systems and heterogeneous multi-component architectures. To manage the different radio contexts (multi-granularity of configuration), we adopt a hierarchical configuration management model. The last part of the article deals with cross-layer design in the context of a video application running on a Sundance FPGA/DSP platform, which illustrates the validity of the approach for Cognitive Radio.

Index Terms—SDR, SoC, partial reconfiguration, telecommunication, SynDEx, AAM, GPP, DSP, FPGA.

I. INTRODUCTION

As already foreseen by Mitola [1], a Cognitive Radio is the final point of software-defined radio platform evolution: a fully reconfigurable radio that changes its communication modules depending on network and/or user demands. His definition of reconfigurability is very broad, and we focus only on the heterogeneous reconfigurable hardware platform for Cognitive Radio. Software Defined Radio (SDR) basically refers to a set of techniques that permit the reconfiguration of a communication system without the need to change any hardware element. The goal of SDR is to produce communication devices that can support several different services. These terminals must adapt their hardware structure to the wireless network in use, such as GSM, UMTS, or wireless LAN standards like IEEE 802.11a/b/g.

As a consequence, communication systems have to integrate more and more functionalities under a time-to-market constraint that has to be kept as short as possible. For example, a cell phone integrates plenty of applications such as an mp3 player, camera, video recorder, wireless communication (Bluetooth, WiFi, IR, ...) and games in addition to its basic functionality. A System on Chip (SoC) has to embed all these new applications and the different communication standards (GSM, UMTS, 3G, ...). As a result, the silicon requirements of the SoC increase dramatically, which also implies higher power consumption. Flexible solutions are needed for these SoC application requirements in order to reach the best possible trade-off between performance, area, power consumption and hardware resources.

The SDR approach represents an essential evolution of communication systems to tackle the growing demand for system interoperability and service convergence in a multi-standard environment. This evolution is mainly driven by silicon progress, which offers more and more hardware resources to future electronic devices. Our approach is to analyze several telecommunication standards such as GSM, UMTS or 3G and to point out the operators that are common to these standards. An SDR platform can then be realized by reallocating processing elements onto the heterogeneous platform, so as to switch from one standard to another while reusing the common elements. Software Defined Radio systems (SDR systems) therefore need to be reconfigurable to address the flexibility and adaptability needs of SDR applications [3].

Moreover, current wireless application architectures are composed of several heterogeneous resources (DSP, GPP, FPGA, ...). There are several reasons for dealing with the different types of resources that compose an SDR system. Application requirements, for example processing demands, have to be managed properly to optimize the use of hardware resources so as to reach the expected performance, reduce power consumption and increase efficiency. These constraints can be met by designing systems that mix processors, DSPs, FPGAs, or any other type of reconfigurable computing resource. With this kind of heterogeneous architecture, it is easier to meet efficiently the requirements of a large variety of signal processing functions. On the other hand, a methodology has to be used to manage dynamic resource allocation correctly.
In this article, we propose to use the SynDEx tool to dynamically reconfigure the hardware resource assignment according to the application requirements. The hardware architecture used is a Sundance platform composed of a DSP and an FPGA. The Algorithm Architecture Matching (AAM) tool is used to find the best mapping onto this architecture for the modules that compose the signal processing chain currently in use.

The remainder of this paper is organized as follows. In Section II we present the video application together with the telecommunication standards used. In Section III we present the SynDEx tool for the AAM. Then, in Section IV, we present the Sundance hardware platform used to implement our SDR approach. Finally, in Section V, we present the video application used to demonstrate our method.

II. MULTI-STANDARD BASEBAND STRUCTURE

The concept of this hardware emulation is to send a video flow using a chosen telecommunication standard executed by a heterogeneous platform (DSP and FPGA). Communication between the two platforms (TX and RX parts) is realized by an Ethernet link. The telecommunication standards retained here are GSM, UMTS and 802.11g. In each of these standards, the data bandwidth can be increased or decreased according to user requirements and channel transmission quality. In our demonstration platform, no RF transmission is supported, so the data rate is driven only by user requirements. As mentioned before, the communication medium between the two hosts is an Ethernet link using the TCP protocol. Figure 1 shows the basis of the video demonstration platform.

[Figure 1: video demonstration platform. Labels: Host 1 (Video Encoder); Host 2 (Video Decoder); TCP/IP; PCI; FPGA-DSP Sundance Platform (GSM / UMTS / WLAN 802.11).]

Focusing on the three telecommunication standards, an increase or decrease of the data bit rate directly impacts the bandwidth required between the processing units mapped on the heterogeneous platform. In this context, switching from one mode to another means that the previous hardware mapping no longer fits the constraints of the new context. Some processing units may have to be moved from the DSP to the FPGA, or vice versa. To find the best trade-off between architecture and performance, we propose to use the SynDEx tool to find the best possible adequacy every time a context change is requested by the user. This part is detailed in Section III.

Instead of designing multiple architectures for multiple standards, we propose to develop a single reconfigurable architecture for the baseband transmitter based on SDR principles. Depending on the requirements of the use case of Figure 7, an optimum access method has to be selected for each step of the scenario. The digital baseband architecture of the reconfigurable transmitter consequently has to deal with the following algorithmic challenges:
– Different modulation schemes: OFDM for Wireless LAN, W-CDMA for UMTS and GMSK for GSM.
– Different data rates: up to 54 Mbit/s for Wireless LAN and 13 kbit/s for GSM.
Reconfiguration of the digital baseband in this case means using the same hardware platform for the three standards, with as little standard-specific (i.e. parallel) hardware as possible.

To define a common framework structure for the transmitter, we need to start from a multi-standard analysis. Across an entire multi-standard baseband chain, the functions are very different, so we classify them into three classes, as presented in Table I. Commonalities are easier to define inside a class than across classes. Each class of functions is dedicated to a specific reconfigurable hardware resource in order to optimize the implementations. The functions of a class that have the same structure are viewed as generic functions: they are handled through a few parameters that are adapted to fit the requirements of a standard.

TABLE I
THREE STANDARDS BASEBAND PROCESSING: FUNCTIONAL

Classes listed: Coding, Data handling

Basic block        Standards
CRC coding         UMTS, WLAN 802.11g
Scrambling         UMTS, WLAN 802.11g
Interleaving       GSM, UMTS, WLAN 802.11g
Conv. coding       GSM, UMTS, WLAN 802.11g
Diff. encoding     GSM
Reordering         GSM
Burst building     GSM, UMTS, WLAN 802.11g
Rate matching      UMTS, WLAN 802.11g
Frame segment.     UMTS, WLAN 802.11g
Puncturing         GSM, UMTS, WLAN 802.11g
GMSK               GSM
IFFT               WLAN 802.11g
BPSK/M-QAM         UMTS, WLAN 802.11g

To explore our classification approach, a unified baseband transmitter structure is presented in Figure 2. The highlighted HW blocks may be shared between multiple communication standards. Depending on the standard in use, only its associated blocks are reconfigured. All unused functions are simply maintained as transparent blocks.

Consequently, switching from one standard to another by reconfiguring the hardware platform requires configuration management. In our method, two configuration features are combined to create the complete configuration framework of our multi-standard platform: configuration of the data path and hierarchical management.

As communication applications are data-flow oriented [2], our approach is based on a data-path model. The functions of the multi-standard transmitting chain are mapped into several Processing Block Units (PBU). Each PBU is optimized using specific reconfigurable hardware resources. In addition, a configuration path, also split into several Configuration Manager Units (CMU), controls the reconfigurable processing path. Each CMU is dedicated to one type of PBU and manages the configuration of one type of baseband function in the chain. The split configuration path makes it possible to partially reconfigure the transmitting chain by reconfiguring each PBU independently.

The hierarchical configuration management model presented in [2], [8] is based on this configuration data path approach. The model is necessary to manage the multi-granularity of configuration required by the different contexts discussed in Section V. It is composed of three levels of hierarchy, detailed below:
• Level 1: This first, high-level classification allows the control of category-specific functions in order to manage parameters at the standard level. The Configuration Manager L1 (L1 CM) works at the standard level as a host towards the underlying levels of management. This entity is in charge of choosing the functional units that will constitute the entire configuration of the baseband processing chain. At this level, generic functions are handled as generic components; no hardware implementation is considered yet.
• Level 2: The generic functions selected at level 1 are parameterized at level 2 in accordance with the standard specifications. The set of attributes of each function is handled by the Configuration Manager Unit L2 (L2 CMU) in order to create each functional context of the entire processing chain.
• Level 3: The processing data path architecture of this third level depends on the reconfigurable computing resources of the hardware architecture.
The main task of the L3 CMUs in the configuration path is to find the available processing resources and configure them to enable the execution of the functional context created at the level above.

III. SYNDEX TOOL

The AAA methodology (Adequation Algorithm Architecture, or AAM) and its SynDEx tool [5] provide a formal framework based on graphs, together with system-level CAD software. On the one hand, they specify the functions of the application, the distributed resources (processors and/or specific integrated circuits, and communication media), and the non-functional requirements such as real-time performance. On the other hand, they assist the designer in implementing the functions onto the resources while satisfying timing requirements and, as far as possible, minimizing the resources. This is achieved through a graphical environment (Figure 3), which allows the designer to explore the design space manually and/or automatically using optimization heuristics. Exploration is mainly carried out through timing analysis and simulations. The result of these predictions is the real-time behavior of the application functions executed on the various resources, such as processors, integrated circuits or communication media. This approach conforms to the typical hardware/software co-design process. Finally, for the software part of the application, code is automatically generated as a dedicated real-time executive.

[Figure 3: SynDEx design flow. Labels: User; Algorithm Graph; Architecture Graph; Adequation (Distribution / Scheduling Heuristic); Timing graph (predictions); Generic synchronized distributed executives; Target 1, Target 2, ..., Target N; Com 1, ..., Com M.]


The matching step of SynDEx consists in mapping and scheduling the algorithm operations and data transfers onto the processing components and communication media of the architecture. It is carried out by a heuristic that takes into account the durations of computations and of inter-component communications in order to optimize the global application latency.
• Application algorithm graph: The application algorithm is represented by a dataflow graph (DFG) in order to exhibit the potential parallelism between operations. The algorithm model is a directed data-dependence graph. An operation is executed as soon as its inputs are available, and this DFG is repeated infinitely. SynDEx supports a hierarchical algorithm representation, conditional statements and iterations of algorithm parts. The application can thus be described hierarchically by the algorithm graph, and the lowest hierarchical level is always composed of indivisible operations. Operations have several input and output ports. Special inputs are used to create conditional statements: an alternative sub-graph is selected for execution according to the value of the conditional entry. Data dependencies between operations are represented by valued arcs. Each input and output port has to be defined with its length and data type. These lengths express either the total amount of data required by the operation before it starts its computation, or the total amount of data generated by the operation on each output port.
• Architecture graph: The architecture is also modeled by a directed graph, in which the vertices are computation operators (e.g. processors, DSP, FPGA) or media (e.g. SHD, PCI, Ethernet) and the edges are the connections between them. The architecture structure therefore exhibits the actual parallelism between operators. Computation vertices have no internal computation parallelism available. An example is shown in Figure 4. In order to perform the graph matching process, computation vertices have to be characterized with the execution times of the algorithm operations. Execution times are determined by profiling each operation. The media are likewise characterized with the time needed to transmit a given data type.
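The matching step described above can be illustrated with a small greedy list-scheduling sketch. This is not the heuristic implemented in SynDEx, only a simplified stand-in that uses the same inputs: a dataflow graph, per-operator execution times, and a transfer time on the medium. The function name `greedy_adequation`, the operation names and all durations are hypothetical, and a single uniform transfer time is assumed.

```python
from collections import defaultdict


def greedy_adequation(ops, deps, exec_time, comm_time):
    """Map and schedule a dataflow graph onto operators (greedy sketch).

    ops       -- operation names, listed in topological order
    deps      -- dict: operation -> list of predecessor operations
    exec_time -- dict: (operation, operator) -> duration; a missing pair
                 means that operator cannot execute that operation
    comm_time -- duration of one transfer between two different operators
    """
    operators = sorted({opr for (_, opr) in exec_time})
    free_at = defaultdict(float)   # time at which each operator becomes idle
    placed = {}                    # operation -> (operator, start, end)
    for op in ops:
        best = None
        for opr in operators:
            if (op, opr) not in exec_time:
                continue
            # Inputs are ready once every predecessor has finished, plus a
            # transfer delay when the predecessor ran on another operator.
            ready = 0.0
            for pred in deps.get(op, []):
                p_opr, _, p_end = placed[pred]
                ready = max(ready, p_end + (comm_time if p_opr != opr else 0.0))
            start = max(ready, free_at[opr])
            end = start + exec_time[(op, opr)]
            # Keep the operator on which this operation finishes earliest.
            if best is None or end < best[2]:
                best = (opr, start, end)
        if best is None:
            raise ValueError(f"no operator can execute {op!r}")
        placed[op] = best
        free_at[best[0]] = best[2]
    latency = max(end for (_, _, end) in placed.values())
    return placed, latency


# Hypothetical three-stage chain; "interleaving" is only available on the DSP.
ops = ["coding", "interleaving", "modulation"]
deps = {"interleaving": ["coding"], "modulation": ["interleaving"]}
exec_time = {
    ("coding", "DSP"): 4, ("coding", "FPGA"): 2,
    ("interleaving", "DSP"): 1,
    ("modulation", "DSP"): 10, ("modulation", "FPGA"): 2,
}
placed, latency = greedy_adequation(ops, deps, exec_time, comm_time=1.0)
for name, (operator, start, end) in placed.items():
    print(name, operator, start, end)
print("latency:", latency)
```

Because each operation is placed where it finishes earliest, the result is only locally optimal; a real adequation heuristic also has to schedule the transfers themselves on the shared media.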



The output files generated by SynDEx are used by our platform to manage the hardware reconfiguration correctly. These text files are handled by the L1 CM of the transmitter (the platform in charge of sending the video stream) and by the L1 CM of the receiver, so that both sides are standard-compatible for the current or future transmission. In the next section, we present the hardware platform used and its specifications.

IV. PROTOTYPING PLATFORM

A typical Sundance device is made up of a host (PC) with one or more motherboards, each supporting one or more TIMs (Texas Instruments Modules). A TIM is the basic building block from which a system is built. It contains one processing element, which is not necessarily a DSP; it may also be an I/O device or an FPGA. A TIM also provides mechanisms to transfer data from module to module. These mechanisms, such as SHBs (400 MB/s), ComPorts (CP, 20 MB/s), or a global bus (to access a PCI bus at up to 40 MB/s), are implemented on the TIMs using FPGAs.

The Sundance SMT310Q motherboard is modular, flexible and scalable. Up to four different modules can be plugged into the motherboard and connected using CP or SDB cables. The SMT365 TIM module, with a C6416 DSP (600 MHz, 8 Mbytes of high-speed ZBTRAM), is well suited to image processing, as the C64xx has special functions for handling graphics. The SMT348 provides a 'Base' module for a range of applications and functions. The Virtex4 LX has the highest number of logic blocks in the Virtex4 range, and the FPGA also has direct access to 16 Mbytes of QDRII RAM, which provides storage for the majority of applications. The addition of a Xilinx configuration PROM enables the module to be used 'Stand-Alone'. Alternatively, the FPGA can be configured from a connected DSP module.

Data transmission between these modules goes through the Sundance High-Speed Bus (SHB), which works at a clock frequency of 100 MHz and has a total width of 32 bits, i.e. the SHB provides an overall maximum transfer rate of 400 MB/s. The SHB is divided into two 16-bit Sundance Digital Buses (SDB). A schematic view of our platform is presented in Figure 5.

[Figure 5: schematic view of the platform. Labels: Sundance Motherboard SMT310Q; Virtex II 2000; DSP Module SMT365 (DSP TI C6416); FPGA Module SMT348 (Virtex4 LX160); PCI Bus 32 bits.]
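The SHB transfer rate quoted in the platform description follows directly from the clock frequency and the bus width. The short sketch below reproduces that arithmetic, assuming one bus word is transferred per clock cycle and taking 1 MB as 10^6 bytes. The function name is hypothetical, and the per-SDB figure is a derived value that assumes each 16-bit half runs at the same 100 MHz clock.

```python
def bus_rate_mb_per_s(clock_hz, width_bits):
    """Peak transfer rate in MB/s (1 MB = 10**6 bytes), one word per cycle."""
    return clock_hz * width_bits / 8 / 1e6


shb = bus_rate_mb_per_s(100e6, 32)  # full SHB: 100 MHz x 32 bits
sdb = bus_rate_mb_per_s(100e6, 16)  # one 16-bit SDB half, same clock assumed
print(shb, sdb)                     # 400.0 200.0
```

These are peak figures; the sustained rate seen by an application also depends on how the transfers are scheduled between the DSP and the FPGA.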

As mentioned before, the communication link between the two host platforms is an Ethernet link using the TCP protocol (easy to implement and commonly used for streaming media applications). As presented in [?], [4], a hierarchical configuration management is proposed to map processing elements onto our heterogeneous platform. This hierarchical configuration management is illustrated in Figure 6.
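The three-level hierarchy can be sketched in a few lines of code. The example below is only an illustration under stated assumptions, not the implementation used on the platform. The block-to-standard table is transcribed from Table I, and the access schemes and data rates come from Section II. The function names, the slot-based resource model and the "first free resource" placement rule are all hypothetical.

```python
# Standards that use each basic block, transcribed from Table I.
BLOCK_STANDARDS = {
    "CRC coding": {"UMTS", "802.11g"},
    "Scrambling": {"UMTS", "802.11g"},
    "Interleaving": {"GSM", "UMTS", "802.11g"},
    "Conv. coding": {"GSM", "UMTS", "802.11g"},
    "Diff. encoding": {"GSM"},
    "Reordering": {"GSM"},
    "Burst building": {"GSM", "UMTS", "802.11g"},
    "Rate matching": {"UMTS", "802.11g"},
    "Frame segment.": {"UMTS", "802.11g"},
    "Puncturing": {"GSM", "UMTS", "802.11g"},
    "GMSK": {"GSM"},
    "IFFT": {"802.11g"},
    "BPSK/M-QAM": {"UMTS", "802.11g"},
}
# Attributes quoted in Section II; the UMTS data rate is not given there.
ACCESS = {"GSM": "GMSK", "UMTS": "W-CDMA", "802.11g": "OFDM"}
MAX_RATE_BPS = {"GSM": 13e3, "802.11g": 54e6}


def l1_select(standard):
    """Level 1: choose the generic functions that make up the chain."""
    known = set().union(*BLOCK_STANDARDS.values())
    if standard not in known:
        raise ValueError(f"unknown standard: {standard!r}")
    active = [b for b, stds in BLOCK_STANDARDS.items() if standard in stds]
    # Unused functions stay in the chain as transparent blocks.
    transparent = [b for b in BLOCK_STANDARDS if b not in active]
    return active, transparent


def l2_parameterize(blocks, standard):
    """Level 2: attach standard-specific attributes to each function."""
    return [{"block": b, "standard": standard, "access": ACCESS[standard],
             "max_rate_bps": MAX_RATE_BPS.get(standard)} for b in blocks]


def l3_configure(contexts, free_slots):
    """Level 3: bind each functional context to a resource with a free slot."""
    slots = dict(free_slots)  # work on a copy, leave the caller's dict intact
    plan = []
    for ctx in contexts:
        target = next((r for r, n in slots.items() if n > 0), None)
        if target is None:
            raise RuntimeError("no processing resource left for " + ctx["block"])
        slots[target] -= 1
        plan.append((ctx["block"], target))
    return plan


def configure(standard, free_slots):
    """Run the three levels in sequence for one standard."""
    active, transparent = l1_select(standard)
    contexts = l2_parameterize(active, standard)
    return l3_configure(contexts, free_slots), transparent


plan, transparent = configure("GSM", {"FPGA": 4, "DSP": 8})
print(plan)
print("transparent:", transparent)
```

Keeping the three levels as separate functions mirrors the split configuration path: a change of parameters touches only level 2, while a change of resource availability touches only level 3.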

Of the two components (DSP and FPGA), the FPGA needs special management. FPGA configuration is realized by loading a total or a partial bitstream to change the processing functionalities inside it. At boot time, the configuration management of the GPP downloads the DSP boot program. The latter includes in its data memory the initial full configuration of the FPGA, which is the FPGA design architecture. It includes an internal configuration controller (MicroBlaze soft processor), the internal reconfiguration interface (ICAP), the initial instantiations of the PBUs and the communication interfaces with the DSP.

Figure 6 illustrates this platform architecture with details of the internal FPGA design, which enables two types of dynamic partial reconfiguration depending on the granularity level. The first stays internal to the FPGA and is used for limited-scale reconfiguration (co-accelerator configuration) or design parameterization. It requires the MicroBlaze to be connected to the internal ICAP configuration interface. This kind of reconfiguration of the FPGA by a processor (MicroBlaze) embedded in the FPGA itself is called self-reconfiguration or auto-reconfiguration [6]. In this case, small partial bitstreams are stored inside the FPGA, and auto-configuration leaves the other HW resources of the platform free. At a larger scale, the reconfiguration of HW accelerators is external. It requires the DSP to be connected to the external SelectMap or internal ICAP reconfiguration interfaces. The bitstreams corresponding to the HW accelerator designs are stored in an external SRAM memory.

As a consequence, the host (the application that manages the whole platform) is represented by the L1 level, module management (FPGA and DSP) is represented by the L2 level, and the tasks running on the DSP or the FPGA (processing elements such as mapping, channel coder, FFT, ...) are represented by the L3 level.

This hardware architecture is represented by an architecture graph under SynDEx, and in the same manner a DFG is built for the application task (the telecommunication chain to be used). The SynDEx heuristic then performs the adequation between these two graphs and generates constraint files for the DSP and the FPGA that give the information needed to reconfigure the platform. The host of the platform then sends a new bitstream to the DSP and a new partial reconfiguration order to the FPGA, using the architecture described above. In the next section we present the application used to demonstrate our method.

V. APPLICATION

The platform illustrates the adaptation of the radio link according to the compression of the source in a video-telephony context. A person switches on his terminal in order to start an audio-video conversation with another person. At the beginning of the communication, the face of the speaker and the background of the image are transmitted using a traditional compression mode. This requires a relatively high data rate. Over time, a model of the person's face is generated at the transmitter side and sent to the receiver. Once this model is known to the receiver, the transmitted parameters of the face model (orientation, opening of the mouth and of the eyes, direction of the glance) are enough to reproduce the behavior of the face at the receiver. This very significantly reduces the amount of data that has to be transmitted over the air to convey the face of the speaker. The stepwise data rate variations as well as the dynamic reconfigurations of the radio link are illustrated in Figure 7.

[Figure 7: data rate versus scenario step. Labels: Data rate; 802.11g; Step 1; Image parameters transmission; Face parameters and error transmission; Mouth / face parameters and error transmission; Eyes / mouth / face parameters and error transmission.]
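The scenario of Figure 7, whose four steps are described in the text that follows, can be summarized as a small table from which the required radio-link reconfigurations are derived. The sketch below is only a reading aid: the standards and error-coding levels are taken from the step descriptions, while the data layout and the function name `link_changes` are hypothetical.

```python
# One entry per scenario step, as described in the text of Section V.
SCENARIO = [
    {"step": 1, "payload": "image, traditional compression",
     "standard": "802.11g", "coding": "standard"},
    {"step": 2, "payload": "face parameters and errors",
     "standard": "UMTS", "coding": "standard"},
    {"step": 3, "payload": "mouth and face parameters",
     "standard": "UMTS", "coding": "very robust"},
    {"step": 4, "payload": "face, mouth and eyes parameters and errors",
     "standard": "GSM", "coding": "standard"},
]


def link_changes(scenario):
    """List, for each step, which radio-link settings differ from the previous step."""
    changes = []
    for prev, cur in zip(scenario, scenario[1:]):
        diff = {key: (prev[key], cur[key])
                for key in ("standard", "coding") if prev[key] != cur[key]}
        if diff:
            changes.append((cur["step"], diff))
    return changes


for step, diff in link_changes(SCENARIO):
    print("step", step, diff)
```

In this reading, steps 2 and 4 require a change of standard, whereas step 3 only changes the error coding within UMTS, which is the kind of finer-grained reconfiguration the hierarchical management is meant to handle.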


At the start, the person switches on his terminal and starts a video-conference service. The video coder starts learning the face model, as well as the models for eyes and mouth reconstruction. The radio link then goes through the following steps.

Step 1: The image is transmitted using a traditional compression mode. The terminal learns the 3D model of the person's face and performs an 802.11g modulation with standard error coding.

Step 2: The face model has been learned. Only high-level parameters of the face are transmitted (location, size, orientation), so that the receiver can reconstruct the 3D model of the face with its texture on the background that has already been sent. In order to improve the reconstruction at the receiver, the errors between the model and the real image are also transmitted, using a UMTS modulation with standard error coding.

Step 3: The mouth variations are modeled. The mouth characteristics, as well as the high-level parameters of the face model, are transmitted over UMTS with very robust error coding on the data of the mouth model.

Step 4: In this last step, all face feature models have been learned. Only the high-level parameters of the three models (face, mouth and eyes) are transmitted, together with the errors with respect to the real image to help the reconstruction process. GSM modulation with standard error coding can then be used.

The last step is the longest period of the video call, which makes it possible to reduce very efficiently the global mean throughput needed for the communication. This justifies the effort accepted at the beginning of the call in terms of adaptation complexity. Changing from one data rate to another is made possible by continually reconfiguring the air link characteristics, to a significant degree.

VI. CONCLUSION

This paper illustrates the pertinence of the SynDEx flow for Cognitive Radio. A signal processing analysis makes it possible to take decisions on the reconfiguration of hardware resources onto heterogeneous components at the PHY layer. Our prototyping platform for Cognitive Radio is general-purpose enough to support a wide range of applications and scenarios, both in terms of radio reconfigurability and of video processing. Future demonstrations will highlight major features of this global approach to Cognitive Radio.

ACKNOWLEDGMENT

This work was performed in project E2R II, which has received research funding from the Community's Sixth Framework Programme.
This paper reflects only the authors' views and the Community is not liable for any use that may be made of the information contained therein. The contributions of colleagues from the E2R II consortium are hereby acknowledged.

REFERENCES

[1] J. Mitola, "Cognitive Radio: An Integrated Agent Architecture for Software Defined Radio," PhD Thesis, Royal Institute of Technology, Sweden, May 2000.
[2] J.P. Delahaye, J. Palicot, P. Leray, "A Hierarchical Modeling Approach in Software Defined Radio System Design," IEEE Workshop on Signal Processing Systems, SIPS 2005, Athens, Greece, Nov. 2005.
[3] A. Polydoros et al., "Wind-flex: Developing a novel testbed for exploring flexible radio concepts in an indoor environment," IEEE Comms. Mag., vol. 41, pp. 116-122, July 2003.
[4] A. Nafkha et al., "A reconfigurable baseband transmitter for adaptative image coding," IST Mobile Summit, July 2007.
[5] T. Grandpierre, C. Lavarenne, and Y. Sorel, "Optimized Rapid Prototyping for Real-Time Embedded Heterogeneous Multiprocessors," in CODES'99, Rome, Italy, May 1999, pp. 74-78.
[6] B. Blodget, P. James-Roxby, E. Keller, S. McMillan and P. Sundararajan, "A Self-reconfiguration Platform," in Proc. of 13th International Conference on Field-Programmable Logic and Applications, FPL 2003, Lisbon, Portugal, Sept. 2003, pp. 565-574.
[7] J.P. Delahaye, C. Moy, P. Leray, J. Palicot, "Managing Dynamic Partial Reconfiguration on Heterogeneous SDR Platforms," SDR Forum 2005, Los Angeles, USA, November.
[8] J.P. Delahaye, C. Moy, P. Leray, J. Palicot, "Partial Reconfiguration of FPGAs for Dynamical Reconfiguration of a Software Radio Platform," in Proc. of IST Mobile and Wireless Communications Summit, Budapest, Hungary, June 2007.

