TOOLS OF XCELLENCE
by Chris Eddington, Senior Technical Marketing Manager, Synopsys Inc., email@example.com
Baijayanta Ray, Corporate Application Engineer for Synphony Model Compiler, Synopsys Inc., firstname.lastname@example.org
Very high-speed fast Fourier transform (FFT) cores are an essential requirement for any real-time spectral-monitoring system. As the demand for monitoring bandwidth grows in pace with the proliferation of wireless devices in different parts of the spectrum, these systems must convert time-domain samples to spectra ever more rapidly, necessitating faster FFT operations. Indeed, most modern monitoring systems need parallel FFTs to sustain sample throughputs that are several times the highest clock rate achievable in state-of-the-art FPGAs like the Xilinx® Virtex®-7, taking advantage of wideband A/D converters that can easily attain sample rates of 12.5 gigasamples/second and more.
At the same time, as communications protocols become increasingly packetized, the duty cycles of the signals that need to be monitored are decreasing. Catching such short bursts requires a dramatic decrease in scan repeat time, which in turn calls for low-latency FFT cores. Parallel FFTs can help in this regard as well, since the latency scales down almost proportionally to the ratio of sample rate to clock speed.
For all of these reasons, let's delve into the design of a parallel FFT (PFFT) with runtime-configurable transform length, taking note of the throughput and utilization numbers that are achievable when using a parallel FFT.
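To make the ratio of sample rate to clock speed concrete, here is a minimal back-of-the-envelope sketch. The 400-MHz fabric clock and the 4,096-point transform length are assumptions chosen for illustration, not figures from this article, and the helper names `required_parallelism` and `frame_load_cycles` are hypothetical. The sketch also assumes the parallelism is rounded up to a power of two, which is a common but not mandatory choice.

```python
import math


def required_parallelism(sample_rate_hz, clock_rate_hz):
    """Samples per clock needed to keep up with the A/D converter.

    Rounds the ratio up to the next power of two (an assumption made here
    because power-of-two lane counts map naturally onto FFT structures).
    """
    lanes = math.ceil(sample_rate_hz / clock_rate_hz)
    return 1 << (lanes - 1).bit_length()


def frame_load_cycles(n_points, lanes):
    """Clock cycles needed just to feed one N-point frame into the core."""
    return math.ceil(n_points / lanes)


# Hypothetical numbers: 12.5-GSPS converter, 400-MHz FPGA clock (assumed).
lanes = required_parallelism(12.5e9, 400e6)
serial_cycles = frame_load_cycles(4096, 1)
parallel_cycles = frame_load_cycles(4096, lanes)
print(lanes, serial_cycles, parallel_cycles)
```

Under these assumed numbers the core must accept 32 samples per clock, and the time to load one frame shrinks by the same factor of 32. This simple model counts only frame loading; the pipeline depth of a real core adds latency that does not shrink with parallelism, which is why the scaling is only "almost" proportional.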
HARDWARE PARALLELISM FOR FFTS
Due to the complexity of implementing FFTs directly in logic, many hardware designers use off-the-shelf FFT cores from various vendors. However, most off-the-shelf FFT cores use "streaming" or "block" architectures that process at most one sample per clock, which limits the throughput to the maximum clock speed achievable by the FPGA or ASIC device.
A PFFT offers a faster alternative. A PFFT can accept multiple samples per clock and process them in parallel, to deliver multiple output samples per clock. This architecture multiplies the throughput beyond the achievable device clock speed, but comes at an additional cost in area and complexity. Thus, to use a PFFT you will have to trade off throughput against area. The trade-offs for a typical Virtex-7 FPGA design are outlined in Figure 1 and Table 1.
First Quarter 2013
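One common way to see how several samples per clock can be processed in parallel is the textbook decimation-in-time split: deal the input stream across P lanes, run a shorter FFT on each lane, then recombine the lane results with twiddle factors. The sketch below is a software model of that idea only, under the assumption that the transform length is a multiple of the lane count. It is not the architecture of the PFFT discussed in this article, and the names `dft` and `parallel_fft` are hypothetical. A direct DFT stands in for each lane's FFT to keep the example short.

```python
import cmath


def dft(x):
    """Reference O(N^2) DFT, used here in place of a per-lane FFT core."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def parallel_fft(x, lanes):
    """N-point transform built from `lanes` independent (N/lanes)-point ones.

    Lane p receives samples p, p + lanes, p + 2*lanes, ... which models an
    input bus that delivers `lanes` consecutive samples on every clock.
    """
    n = len(x)
    if n % lanes != 0:
        raise ValueError("transform length must be a multiple of the lane count")
    m = n // lanes
    # Each lane transform is independent, so hardware could run them side by side.
    sub = [dft(x[p::lanes]) for p in range(lanes)]
    out = []
    for k in range(n):
        acc = 0j
        for p in range(lanes):
            # Twiddle factor W_N^(p*k) aligns lane p with output bin k.
            acc += cmath.exp(-2j * cmath.pi * p * k / n) * sub[p][k % m]
        out.append(acc)
    return out


# Usage: a 16-point frame processed as 4 lanes of 4 points each.
frame = [complex(i % 5, -(i % 3)) for i in range(16)]
spectrum = parallel_fft(frame, 4)
```

The recombination stage is where the extra area goes in a model like this: every additional lane adds twiddle multipliers and adders on top of the per-lane transforms, which mirrors the throughput-versus-area trade-off described above.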