Embedded Computing Design Spring 2021 with Embedded World Profiles

MOORE’S LAW VERSUS PARALLELISM

Beyond Moore’s Law: Parallel Processing in Heterogeneous SoCs

By Brandon Lewis, Editor-in-Chief

With the dependable performance-per-watt gains of transistor scaling drawing to a close, how will future generations of processors access the compute necessary to efficiently execute demanding workloads? The answer may come via parallel processing on heterogeneous SoCs.

“We’ve been working on 7 nm for a long time, and during that time we not only saw the end of Moore’s law, but we also saw the end of Amdahl’s law and Dennard scaling,” says Manuel Uhm, Director of Silicon Marketing at Xilinx. “What that means is, if all we did was take an FPGA and just shrink those transistors to 7 nm from our previous node, which was 16 nm, and just call it a day, many customers trying to move over the exact same design might quite possibly end up with a design that quite frankly does not have any increase in performance and may, in fact, increase power consumption.

“And clearly that’s going totally the wrong way.”

To be clear, it’s not impossible to shrink silicon transistors below 7 nm; 5 nm devices are already in production. The problem is that the underlying metal isn’t running any faster, and current leakage is on the rise. Meanwhile, in the other direction, traditional multicore devices have hit scaling limitations of their own. Of course, those parallel processors have historically been homogeneous, “and the reality is there is no single processor architecture that can do every task optimally,” Uhm contends. “Not an FPGA, not a CPU, not a GPU.”

This isn’t to say parallelism can’t be advantageous in tackling the complex processing tasks presented by modern applications. Indeed, beyond Moore’s law and Dennard scaling, parallel computing may be our best option in high-performance computing (HPC) and other demanding use cases.

Yes, we still need parallel processing. But of the heterogeneous variety.

Heterogeneous Processing: Not Just for Data Center

As mentioned, the bleeding edge of heterogeneous parallel processing technology is a response to performance walls in high-end applications. But these architectures are also becoming more commonplace in embedded computing environments. Dan Mandell, Senior Analyst at VDC Research, points out that while “it is true that many heterogeneous processing architectures have been focused on high-end applications, particularly for the datacenter and HPC … miniaturization of FPGA SoCs and other heterogeneous accelerated silicon is top of mind for companies like Microsemi and Xilinx to bring more of these devices into intelligent edge infrastructure like edge/industrial servers and IoT gateways.”

According to Mandell, a key driver of general-purpose heterogeneous computing platforms in the embedded market “is a lot of hesitancy among OEMs and others today about committing to a hardware architecture.” The hesitation, he says, is a product of rapid evolutions in specialized accelerated silicon, as well as uncertainty in the frameworks and workloads that will be produced by the edge software and AI ecosystems in the coming years.

He expects all of these circumstances to “have a great influence in future semiconductor sourcing,” as well as how chip suppliers approach their processor roadmaps. “The price and power envelope of most of these FPGA SoCs today will force suppliers to initially focus on relatively high-end, high-resource embedded and edge applications,” Mandell posits. “However, there is an active effort to make FPGA SoCs ‘size agnostic’ to eventually support even battery-powered connectivity devices.”

So as heterogeneous parallel processing becomes more commonplace, should embedded engineers prepare for a paradigm shift in system design? Deepu Talla, Vice President and General Manager of Embedded & Edge Computing at Nvidia, doesn’t think so.

“If you think about it, embedded processors have always used accelerators,” Talla says. “Even 20 years ago, there was an Arm CPU, there was a DSP, and then there was video encode/decode done in specific hardware, right? They’re fixed-function in some sense, but they’re all processing things in parallel.


www.embedded-computing.com
