HiPEACinfo 68


HiPEAC conference 2023 · January 2023

HiPEAC Vision 2023: A race against time
Subhasish Mitra's NanoSystems and Dream Chips
Drive through of adaptive computing, with Ivo Bolsens
Liliana Cucu-Grosjean on probability and safety-critical systems

Contents

Welcome (Koen De Bosschere)
News
HiPEAC voices: 'NanoSystems are an opportunity for 1,000× energy efficiency' (Subhasish Mitra)
HiPEAC voices: 'FPGAs are like 4x4 SUVs: highly customizable and can take you anywhere' (Ivo Bolsens)
HiPEAC voices: 'Probabilities can be used to support worst-case reasoning as they come with strong mathematical proofs' (Liliana Cucu-Grosjean)
Technology watch: A race against time: The HiPEAC Vision 2023 (Marc Duranton and the HiPEAC Vision editorial board)
Compute continuum special feature: 'Distributing computations across the continuum opens up huge possibilities' (Elli Kartsakli)
Compute continuum special feature: Using WebAssembly for a more interoperable, secure cloud-edge continuum (Jämes Ménétrey, Pascal Felber, Marcelo Pasin and Valerio Schiavoni)
Compute continuum special feature: Building a workable compute continuum (Lorenzo Blasi, Emanuele Carlini, Patrizio Dazzi, Konstantinos Tserpes, Narges Mehran, Dragi Kimovski, Radu Prodan, Souvik Sengupta, Anthony Simonet-Boulgone, Ioannis Plakas, Giannis Ledakis, Dumitru Roman and Georgios Zacharopoulos)
Innovation impact: 'In the compiler field, we can make a difference while remaining small and specialist' (Andrew Richards)
Innovation impact: Competitive edge: How Axelera turns leading-edge research into AI innovation (Evangelos Eleftheriou and Fabrizio Del Maffeo)
Innovation impact: POP-LEC: Better power management thanks to SMART4ALL support (Tamás Kerekes)
Innovation Europe: 'You can't transfer control to the driver if the car doesn't have a steering wheel': Jaume Abella on how SAFEXPLAIN will deliver safer, smarter mobility
Innovation Europe: Go with the flow: How eFlows4HPC is harnessing compute power, data analytics and AI (Rosa M. Badia)
Innovation Europe: RISER: Raising RISC-V to the cloud (Manolis Marazakis and Stelios Louloudakis)
Innovation Europe: SPACE: Making astrophysics codes fit for the exascale era (Andrea Mignone)
Innovation Europe: CONVOLVE: Seamless design of smart edge processors (Manil Dev Gomony, Sander Stuijk, Henk Corporaal, Victor Sanchez and Marc Geilen)
Technology opinion: In praise of posits (Tim Fernandez-Hart)
HiPEAC futures: Make a DATE with the Young People Programme; Three-minute thesis: Profiling tools for data locality

HiPEAC is the European network on high performance embedded architecture and compilation.


@hipeac hipeac.net/linkedin


HiPEAC has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 871174.

Cover image: Suppa on AdobeStock Design: www.magelaan.be

Editor: Madeleine Gray

Email: communication@hipeac.net

First of all, I would like to wish you a healthy and prosperous 2023 – both personally and professionally. I will remember 2022 as the year of the Russia-Ukraine war and the energy crisis that followed. Those of us who grew up in Belgium during the Cold War were taught that there was an evil empire on the other side of the Iron Curtain that we had to defend ourselves against, but it never came to a real conflict. I was shocked when I saw the Russian army invading Ukraine and the death and destruction it caused. I am not accustomed to the reality and brutality of war, or to seeing courage and determination like that of the Ukrainians defending their country and freedom. Compared to the suffering inflicted on the Ukrainian people, all the European crises combined are pretty insignificant.

Scientifically, 2022 was an excellent year. Medical science made several major breakthroughs in the areas of cancer, vaccines, and even reviving dead organs. In computing, the most remarkable event was the announcement of the OpenAI language model ChatGPT. Most of us are just experimenting with it, making fun of the mistakes it makes. However, we should not forget that the first version of this technology was released in 2019 and there has been a lot of progress since then. It clearly has the potential to disrupt several sectors of the economy, including software development. I am sure we will see more of it in the coming year. This introduction has, for example, been co-authored by ChatGPT.

For HiPEAC, the year 2023 begins favourably with the start of HiPEAC7. Founded in 2004, HiPEAC has finally come of age. The focus of HiPEAC7 will be on cloud, edge, and IoT computing, giving the acronym a different meaning – ‘High Performance Edge And Cloud computing’. Although the objectives of HiPEAC7 slightly differ from those of HiPEAC6, our major public events and services will continue in HiPEAC7, and we invite you to keep taking part in these.

You may be reading this magazine at the HiPEAC 2023 conference, which is the last event of HiPEAC6 and the first event of HiPEAC7. At this conference, we will also launch the HiPEAC Vision 2023, our roadmapping document, which includes a new set of recommendations to tackle the challenges of our times. I encourage you to delve into it and to get inspired by it.


In December 2022, HiPEAC7 was officially launched. The new phase of the long-running HiPEAC project has a new title (High Performance, Edge And Cloud computing), reflecting HiPEAC’s ambition to cover the full compute continuum from edge to cloud to high-performance computing (HPC).

The main objective of HiPEAC7 is to stimulate and reinforce the development of the dynamic European computing ecosystem that supports the digital transformation of Europe. This will be achieved by guiding research and innovation of key digital, enabling, and emerging technologies, sectors, and value chains.

The longer-term goal is to strengthen European leadership in the global data economy and to accelerate and steer the digital and green transitions through human-centred technologies and innovations. To do so, HiPEAC will mobilize and connect European partnerships and stakeholders to be involved in the research, innovation and development of computing and systems technologies. They will provide roadmaps supporting the creation of next-generation computing technologies, infrastructures, and service platforms.

The project also plans to support rapid technological development, market uptake and digital autonomy for Europe in advanced digital technology (hardware and software) and applications across the whole European digital value chain.

Activities in HiPEAC7 fall into two broad categories: networking and roadmapping. On the networking side, in addition to its annual conference and ACACES summer school, HiPEAC will organize meetings to facilitate cross-industrial alignment with the major industry associations. HiPEAC Jobs activities will continue to support career development and competence building for people in the ecosystem. With regard to roadmapping, the project will build on its experience creating the HiPEAC Vision document through a comprehensive process with various steps: ecosystem consultation, state-of-the-art and technology-trend analysis, consultation on drafts, and tailored dissemination to different target audiences.

Partners in the new project consortium are as follows: University of Ghent (coordinator), Barcelona Supercomputing Center (BSC), CEA, CloudFerro, Eclipse Foundation, IDC, Inria, Inside Industry Association, RWTH Aachen, SINTEF and Thales.

HiPEAC was created in 2004 and has been running continuously since then; this is the seventh time that HiPEAC has been successful in obtaining competitive funding from the European Commission. HiPEAC is grateful to the European Commission for their continued confidence in the project. With HiPEAC7, the project aims to take computing systems in Europe to the next level.

The 2023 edition of the HiPEAC Vision, the biennial roadmap for computing systems in Europe, has been published. This is the ninth edition of the HiPEAC Vision; the first was released in 2008.

The focus of the 2023 HiPEAC Vision is on six 'races': the 'next web', artificial intelligence, new hardware, cybersecurity, sovereignty and sustainability. The Vision provides a comprehensive guide to the state of the art, along with recommendations for future research and development in Europe. See pp. 20-21 for an in-depth discussion of this edition.

In related news, a white paper linked to work on the HiPEAC Vision has also recently been published. Titled 'Safety-Critical Collaborative Systems: Convergence to future Cyber-Physical Systems', the white paper may be found in the 'downloads' section of the HiPEAC Vision website. hipeac.net/vision

HiPEAC news
HiPEAC Vision 2023 published
Cartoon: Arnout Fierens
HiPEAC7: High Performance, Edge And Cloud computing


Known as 'la ville rose' for its terracotta architecture, Toulouse is an innovation hotspot and the focal point for the European aerospace industry. HiPEAC 2023 local hosts Christine Rochange and Thomas Carle (University of Toulouse 3 / IRIT) give us the lowdown on the local area and what we should do while we're in Toulouse.

What makes Toulouse such a good location for HiPEAC 2023?

In the heart of southwest France, Toulouse boasts multiple attributes that seduce entrepreneurs and travellers alike. It combines a strong economic outlook, a stunningly beautiful heritage, a rich culture, high-quality gastronomy and much more. The city is knowledgeable and vibrant, thanks to a sun-kissed climate and a local temperament that is always inclined to celebrate.

Its architecture is dominated by brickwork and Roman tiles; three sites feature on the UNESCO World Heritage list, and its museums recount two thousand years of history. Around one hundred private mansion houses are testament to the prosperity of the golden age of woad (pastel), an unusual plant grown in the 16th century for its blue pigment, used in dyeing. And that's without forgetting the allure of its well-preserved façades and narrow streets.

The Technopole is up there with the very best urban hubs in Europe, and the local area is home to major economic projects and world-leading companies: Airbus, ATR, Thales, Actia, Evotec and NXP. As the global capital of the aeronautics industry, at the cutting edge of the space and digital sectors, the transport of tomorrow and the technologies of the future, Toulouse has a unique ecosystem that is particularly adept at inspiring the work of businesses, universities and research centres.

What are some of the most interesting research projects in the area?

The University of Toulouse is part of ANITI, an institute for artificial intelligence (3IA) created under a major national initiative on AI. ANITI brings together researchers from universities, engineering schools and research labs in the Toulouse area, as well as from major companies (e.g. Airbus, Renault, Thales, NXP, Liebherr and IBM), around three main axes: acceptable AI, certifiable AI and collaborative AI.

Toulouse is also the host of IRT Saint-Exupéry, one of the eight French technological research institutes that bring together researchers from academia and industrial partners to facilitate and accelerate technology transfer.

The aerospace domain is driven by Airbus and other major companies located in Toulouse and Blagnac, such as Continental and Thales. Other companies and startups also drive innovation in the internet of things (IoT) and embedded systems domains, such as Smile, Leroy Automation, GreenSocs and EasyMile, the last of which has recently deployed fully autonomous transport vehicles on the university campus.

What should HiPEAC 2023 delegates do while in Toulouse?

The historic centre of Toulouse is the perfect place to wander: from its narrow streets to the majestic Capitole plaza, the banks of the Garonne river and the quiet and beautiful Brienne Canal. The most famous culinary specialty is cassoulet, a delicious dish made of beans, Toulouse sausage and duck, perfect to warm oneself in January before giving a talk.

Adjacent to Toulouse, the city of Blagnac is home to Airbus, whose assembly lines can be visited (background checks are performed prior to the visits), and to the Aeroscopia aeronautics museum. Finally, Toulouse is home to the Stade Toulousain, arguably one of the finest rugby teams in Europe.

HiPEAC news
Bienvenue à Toulouse! Welcome to Toulouse!
The city is known as ‘la ville rose’ for its brickwork

New CINI focus group on the compute continuum

The CINI Italian National Laboratory on High-Performance Computing: Key Technologies and Tools (HPC-KTT) brings together Italian universities and research institutes active in the high-performance computing (HPC) area. Its key mission is the establishment of a national organization where research and academic communities working on the various aspects related to HPC can meet, discuss, and develop joint research activities.

Recently, within the HPC-KTT laboratory, focus groups on different topics have been created, one of which targets the compute continuum. The compute continuum focus group started working in July 2022 and fosters the development of research initiatives in the field. Activities include the definition of a common conceptual framework for the compute continuum, research support, events, and the development of the research community at the national and international levels.

The group is working on a compute continuum manifesto, with the objective of establishing common ground to help identify first-class research challenges and pave the way to future initiatives and activities.

The focus group is also organizing a special session at the 31st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP2023), which will take place from 1-3 March in Naples, Italy. This event will provide a venue for the discussion of novel solutions and technologies relevant to the development of compute continuum research and to future innovation in the field.


HPC: Key Technologies and Tools at CINI: bit.ly/CINI-HPC-KTT
PDP 2023: pdp2023.org

Contacts: Patrizio Dazzi, University of Pisa: patrizio.dazzi@unipi.it Maria Fazio, University of Messina: maria.fazio@unime.it

PlanV: Open-source hardware design made easy

From intellectual property (IP) cores like RISC-V central processing units (CPUs) to process design kits like Sky130, via electronic design automation (EDA) tools like Yosys: open source is changing the face of system-on-chip design, lowering the barriers to entry and allowing more and more companies to play the silicon game.

At the same time, it can be overwhelming to navigate the possibilities offered by these new technologies. Enter PlanV, a young design house offering application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) development services with a focus on open source. At the core of the company's offering are the selection, customization, verification and integration of IPs, along with the setup of workflows based on open tools.

Founded in summer 2022 in Munich, PlanV is already active in highly innovative projects. These include Culsans, which is developing a tightly coupled, low-latency cache-coherence unit for a multicore system based on CVA6.

In another project, robo-v-mcu, in partnership with Spanish robotics leader Acceleration Robotics, we are developing a RISC-V-based microcontroller unit (MCU) which natively hosts a ROS 2 node. This allows offloading of some functionalities to the system’s peripheries, in a customized MCU which can perform dedicated operations faster and more efficiently. This is only possible thanks to the flexibility offered by the RISC-V instruction set architecture (ISA) and by the existence of high-quality open-source IPs.


HiPEAC news

HiPEAC Tech Transfer Awards 2022 winners announced

• Design space exploration methodology for tensor train decomposition

Georgios Keramidas, Aristotle University of Thessaloniki

The THESSIS group at Aristotle University of Thessaloniki and the University of Plymouth (Professor Vasilios Kelefouras) developed a design space exploration (DSE) methodology for employing low-rank factorization (LRF), a well-known compression technique, in the dense layers of a deep neural network (DNN). The DSE methodology drastically prunes the huge LRF design space. The technology has been transferred to Think Silicon, a provider of ultra-low-power graphics processing units (GPUs) and machine learning accelerators.

• RWRoute: Open-source, timing-driven routing for commercial FPGAs

Yun Zhou, Ghent University

The Hardware and Embedded Systems (HES) research group at Ghent University has been investigating how to reduce routing runtime for field-programmable gate arrays (FPGAs) and thereby achieve a shorter FPGA design cycle, which would improve designer productivity. Building on previous work, together with Xilinx (now the AMD Adaptive & Embedded Computing Group), the group developed RWRoute, an open-source, timing-driven router for commercial FPGAs (FPT 2021). RWRoute has been integrated into the AMD RapidWright framework under the Apache 2.0 licence, laying the groundwork for fast, domain-specific FPGA implementation solutions.

• Device-aware testing: The road towards one defective part per billion

Said Hamdioui, Delft University of Technology

The Device-Aware Test (DAT) methodology (patent pending) developed at TU Delft seeks to overcome the shortcomings of existing commercial solutions for outgoing product quality in, for example, memory devices. DAT incorporates the impact of a manufacturing physical defect into the technology parameters of the device, and thereafter into its electrical parameters, before integrating the model in a memory fault-analysis simulation platform. Using the insights generated, optimal and appropriate test solutions are developed. Collaborations with industrial partners, including imec, have demonstrated the advantages of DAT, which has won several best paper awards.

• Full-stack trusted WebAssembly runtime using Intel SGX enclaves for secure crypto-currency credit scoring

Valerio Schiavoni, University of Neuchâtel

As part of the EU-funded project VEDLIoT, the team at the University of Neuchâtel (UniNE) contributed novel libraries and support to deploy WebAssembly applications inside trusted execution environments (such as Intel SGX or Arm TrustZone). Credora (formerly X-Margin Inc.) collaborated with UniNE to develop and use these novel libraries in its distributed architecture and technology, powering provably private credit evaluations that promote transparent crypto credit markets. Several of the enhancements to the WebAssembly runtime and libraries have been merged into open-source repositories, further transferring the results towards widespread public use.

• Alive2: Automatic verification of LLVM optimizations

Nuno P. Lopes, Instituto Superior Técnico (IST), ULisboa

Alive2 is an automatic compiler verification tool that uses translation validation to verify optimizations of the LLVM compiler without requiring any change to LLVM or to developers' workflows. It has been integrated into LLVM, developed as part of joint research with Azul and Google, and used by AMD, Apple, Microsoft and Sony PlayStation, among others.
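The translation-validation idea is easy to demonstrate in miniature: check that a 'source' expression and its rewritten form agree for every possible input. The sketch below brute-forces this over all 8-bit machine integers; Alive2 itself reasons symbolically over LLVM IR semantics with an SMT solver, so this toy (with invented example rewrites) only conveys the concept.

```python
def tv_check(src, tgt, bits=8):
    """Toy translation validation: exhaustively compare a source expression
    with its rewritten form over every bits-wide machine integer, returning
    a counterexample on mismatch."""
    mask = (1 << bits) - 1
    for x in range(1 << bits):
        if (src(x) & mask) != (tgt(x) & mask):
            return False, x  # the rewrite is unsound: x is a witness
    return True, None

# A sound rewrite: x + x is equivalent to x << 1 on machine integers
ok, _ = tv_check(lambda x: x + x, lambda x: x << 1)
# An unsound rewrite: x * 2 is not equivalent to x | 1
bad, cex = tv_check(lambda x: x * 2, lambda x: x | 1)
```

Exhaustive checking only scales to tiny bit-widths, which is precisely why Alive2 uses symbolic reasoning instead.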

• O(n) key-value sort with processing in memory

Petar Radojković, Barcelona Supercomputing Center (BSC)

Under a bilateral sponsored research agreement, BSC provided dedicated services to Micron Technology, Inc. (Micron) to design and evaluate processing-in-memory architectures for high-performance computing. BSC provided its expertise in HPC applications, programming models, system simulation and performance analysis to evaluate the potential benefits of processing-in-memory architectures capable of performing key-value sort directly in the DRAM. Through modest enhancements to DRAM, the team exploited the parallelism inherently available in memory devices to enable sort, leading to significant performance improvements.
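An O(n) bound for key-value sort is achievable when keys are bounded integers, via counting/bucket sort: each record is inspected exactly once. The sketch below shows only this host-side algorithmic idea; the DRAM-level parallelism that the BSC/Micron work exploits is not modelled here.

```python
def counting_kv_sort(pairs, key_bits=8):
    """O(n) key-value sort for bounded integer keys (counting/bucket sort).
    Illustrative host-side sketch only; the research performs the
    equivalent reordering inside the memory devices themselves."""
    buckets = [[] for _ in range(1 << key_bits)]
    for k, v in pairs:              # single pass over the input: O(n)
        buckets[k].append(v)        # stable: insertion order preserved
    return [(k, v) for k in range(len(buckets)) for v in buckets[k]]

pairs = [(3, 'c'), (1, 'a'), (3, 'd'), (0, 'z')]
sorted_pairs = counting_kv_sort(pairs)
```

The trade-off is memory proportional to the key range, which is exactly where in-memory parallelism pays off.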

Congratulations to all!

HiPEAC news
The HiPEAC Tech Transfer Awards recognize examples of leading-edge technology being transferred to industry. This year, six candidates won an award, as detailed above.

HiPEAC in Bosnia and Herzegovina: Industry 4.0 workshop

Isak Karabegović, University of Bihać

As reported in HiPEACinfo 58, in 2019 HiPEAC Coordinator Koen De Bosschere and TETRAMAX Coordinator Rainer Leupers visited Sarajevo to participate in a roundtable entitled 'Development and Implementation of Innovative Technologies'. This event marked the beginning of the HiPEAC group in Bosnia and Herzegovina, which brought together representatives from academia from all over the country, united in disseminating knowledge and spurring the adoption of innovative technologies.

Since then, members of this group have continued their activities to spread knowledge about industry 4.0 technologies and digitalization. Despite the COVID-19 pandemic, as well as the energy and economic crises – which are hitting small, developing countries like Bosnia and Herzegovina particularly hard – the HiPEAC group has been working to realize its goals. To this end, the group organizes scientific conferences and promotes knowledge exchange across the region. In addition to the annual New Technology Conference, which attracts an increasing number of participants from around the world every year, the group also organizes workshops.

In October 2022, members of this group, in cooperation with the Foreign Trade Chamber of Bosnia and Herzegovina and the Society of Robotics, held a workshop named ‘Implementation of Industry 4.0: Road to the Future’. Over 30 talks were held during this event, both from academia and industry.

Members of the HiPEAC group presented papers on automation of production processes, collaborative robots, industry 4.0 and education, among other topics. The paper authors were: Isak Karabegović, Ermin Husak, Edina Karabegović, Mehmed Mahmić, Lejla Banjanović-Mehmedović, Mirha Bičo Ćar, Munira Šestić, Savo Stupar, Safet Isić, Samir Vojić, Amra Bratovčić, Anes Hrnić, Selma Hodžić, Samir Lemeš, Nermina Zaimović-Uzunović and Kenan Varda.

In his welcome address, Professor Karabegović emphasized the importance of organizing the workshop with the support of HiPEAC and in collaboration with the Foreign Trade Chamber of Bosnia and Herzegovina. By joining forces, the aim is to jointly influence decision makers in the country to adopt policies and programmes promoting faster participation in the global trends of industry 4.0 and innovation implementation. Through numerous examples, Professor Karabegović showed how positive public policies can accelerate the implementation of industry 4.0, while the absence of such policies is recognized as an obstacle.

The joint action of the academic community and chambers of commerce represents an important step forward in focusing efforts towards persuading decision makers in Bosnia and Herzegovina to formulate and adopt affirmative public policies. This would open the way to make the most of the potential and opportunities in industry 4.0 in the country. Connecting with networks such as HiPEAC, which can help speed up the transfer of knowledge and experience, is of great importance in this process.

HiPEAC news

State-of-the-practice central processing units (CPUs) / graphics processing units (GPUs) are very inefficient in comparison to biological brains, which are honed by natural selection. The recently started NimbleAI project will leverage key principles of energy-efficient light sensing in eyes and information processing in brains to create an integral sensing-processing neuromorphic chip that builds upon the latest advances in 3D-stacked silicon integration.

Following bio-inspired principles, NimbleAI will emulate the conscious and unconscious processes behind deciding which stimuli to process (and which to discard without processing), as well as when and how to process them. In NimbleAI, a frugal, always-on sensing stage will build a basic understanding of the visual scene and drive a multi-tiered collection of highly specialized, event-driven downstream processing kernels and neural networks to perform visual inference using the minimum amount of energy.

NimbleAI envisions a highly integrated 3D stacked silicon architecture where sensing, memory, communications, and processing are physically fused and accuracy, energy, resources, and time are dynamically traded off to enhance the overall perception, i.e. maximize the amount of valuable visual information that can be captured and processed in a timely manner.

As in biological visual systems, sensing and processing components will be adjusted at runtime to match each other and operate jointly at the optimal temporal and data resolution scale across image regions. Hence, the NimbleAI solution will dynamically identify regions of interest of arbitrary size in the sensor that match the information distribution in the scene and apply specific settings to each sensor region to generate meaningful and minimal visual event-flows that are delivered through dedicated pathways across the 3D architecture to the downstream processing components.
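The idea of matching regions of interest to the information distribution in the scene can be illustrated with a toy event-density filter. In the sketch below, the sensor is split into square tiles, and only tiles where event activity crosses a threshold are flagged for downstream processing; the event format, tile size and threshold are invented for illustration, not NimbleAI design parameters.

```python
from collections import Counter

def select_rois(events, tile=16, threshold=8):
    """Toy event-driven region-of-interest selection: count event-camera
    events (x, y pixel coordinates) per square tile and keep only tiles
    whose activity exceeds the threshold, so downstream kernels spend
    energy only where the scene is actually changing."""
    counts = Counter((x // tile, y // tile) for x, y in events)
    return sorted(t for t, c in counts.items() if c > threshold)

# A burst of events in the top-left corner, sparse noise elsewhere
events = [(3, 4), (5, 2), (7, 7), (1, 1), (2, 6), (4, 4), (6, 3), (0, 5),
          (3, 3), (40, 40), (50, 12)]
rois = select_rois(events)
```

In a real pipeline the tile granularity and thresholds would themselves be adapted at runtime, per sensor region.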

We invite all members of the HiPEAC community interested in learning more about the NimbleAI project to get in touch with the project coordinator. Email: xiturbe@ikerlan.es


NimbleAI website: nimbleai.eu · @NimbleAI_EU · NimbleAI.eu

NimbleAI has received funding from the EU's Horizon Europe research and innovation programme (grant no. 101070679) and from UK Research and Innovation (UKRI) under the UK government's Horizon Europe funding guarantee (grant no. 10039070).

“NimbleAI will leverage key principles of energy-efficient light sensing in eyes and information processing in brains to create a neuromorphic chip”
HiPEAC news
NimbleAI to design 3D integrated chips that sustain event-based 3D vision
The NimbleAI team: normal view (left) and perceived in event-based vision (right)

IEEE Cluster 2022

Clusters remain the primary system architecture for building many of today's rapidly evolving computing infrastructures and are used to solve some of the most complex problems. Making them scalable, efficient, productive, and increasingly effective requires a community effort spanning cluster system design, the capabilities of the software stack, system management and monitoring, and the design of algorithms, methods, and applications to leverage the overall infrastructure. Following the success of previous IEEE Cluster conferences, IEEE Cluster 2022, held 6-9 September 2022 in Heidelberg, Germany, once again solicited high-quality original work to advance the state of the art in clusters and closely related fields.

IEEE Cluster 2022 was co-organized by general co-chairs Abhinav Bhatele (University of Maryland) and Felix Wolf (TU Darmstadt), programme co-chairs Trilce Estrada (University of New Mexico) and Torsten Hoefler (ETH Zürich), and local arrangements co-chairs Holger Fröning (Heidelberg University), Felix Zahn (CERN) and Kazem Shekofteh (Heidelberg University). The conference venue was Heidelberg University, with talks delivered in its "New University" building.

IEEE Cluster is a long-running, top-tier conference on cluster computing, spanning hardware, middleware, software, and scientific applications of cluster computing. Four workshops and two tutorials on the first day preceded the main conference.

The conference featured three keynote talks. Luca Benini (ETH Zürich / University of Bologna) illuminated how memory and processors can be arranged in proximity to each other to further improve the energy efficiency of processor architectures. Kristel Michielsen (Jülich Supercomputing Centre / RWTH Aachen) gave an outlook on how quantum computers will be integrated into classic high-performance computing (HPC) infrastructures. Finally, Rio Yokota (Tokyo Institute of Technology) demonstrated algorithmic improvements in basic matrix operations needed for artificial intelligence.

In a separate talk, Min Li (Huawei) highlighted the technical challenges the HPC community will face during the next decade. In addition, the main conference included 43 high-quality paper presentations, selected for publication in a rigorous peer-review process, on topics ranging from applications, algorithms, and libraries to architecture, networks / communication, and management, from programming and systems software to data, storage, and visualization. A poster session and expert panel on the programmability of novel hardware rounded off the programme.

The comprehensive technical programme of keynotes and contributed articles was complemented by various social events, including a visit to a local brewery in Heidelberg's old town, a welcome reception in the university's formal reception rooms, and a gala dinner at Heidelberg Castle.



IEEE Cluster 2022 website clustercomp.org/2022

HiPEAC news
Holger Fröning, Heidelberg University and Felix Wolf, TU Darmstadt

Third ITEM workshop held in conjunction with ECML

In September, the third edition of the workshop on the IoT, Edge, and Mobile for Embedded Machine Learning (ITEM) took place, again co-located with ECML-PKDD, the premier European machine learning and data mining conference. This year was the first edition to be held in person, which was greatly appreciated by attendees.

The first session of the workshop focused on tools, while the second session covered methods and applications. The tools session covered topics such as design space exploration, automated partitioning, deep convolutional neural networks, and predictive modelling. Methods considered included recurrent neural networks, cell optimization, and quantization, while applications discussed covered document localization, speech enhancement, fault diagnosis, de-noising, and image processing.

During the discussions, several insights emerged, such as an increasing trend towards edge computing and heterogeneity, resulting in growing interest in model partitioning. Similarly, the complexity of distributing processing over edge and cloud has sparked work on automated search, to avoid substantial complexity increases for users. Predictive performance modelling was also mentioned several times as a way of reasoning about partitioning decisions.

Recurrent neural networks, as well as convolutional network architectures, were again the focus of acceleration efforts. Quantization remains a major method of model compression, and work was presented proposing a method that avoids costly training when quantizing a model. Lastly, the discussions showed a community-wide interest in benchmarking and performance evaluation, particularly with regard to energy.
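Training-free ('post-training') quantization can be as simple as mapping a trained layer's weights onto a uniform low-bit integer grid, with no retraining involved. The sketch below illustrates that general idea only; it is not the specific method presented at the workshop, and the example weights are invented.

```python
def quantize(weights, num_bits=8):
    """Uniform affine post-training quantization: map float weights onto a
    signed num_bits integer grid with a per-tensor scale and zero point.
    No retraining is involved."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (qmax - qmin) or 1.0   # guard against constant tensors
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [scale * (v - zero_point) for v in q]

weights = [0.82, -1.13, 0.05, 2.47, -0.66, 1.9]
q, s, z = quantize(weights)
restored = dequantize(q, s, z)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

At 8 bits this cuts storage 4x versus float32, with reconstruction error bounded by roughly half the quantization step.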

ITEM2022 was organized by Holger Fröning (Heidelberg University), Gregor Schiele (University of Duisburg-Essen), Franz Pernkopf (Graz University of Technology), and Michaela Blott (AMD Research, Dublin). Kazem Shekofteh (Heidelberg University) helped as technical programme committee chair. Another edition of ITEM is planned for 2023, so if you are interested, stay tuned for updates or ask the organizers to add you to the mailing list.

Didem Unat named 2021 ACM SIGHPC Emerging Woman Leader in Technical Computing

Associate Professor Didem Unat (Koç University) has won the ACM SIGHPC Emerging Woman Leader in Technical Computing award, becoming the first person outside of the United States to receive the award. The award was presented at the Supercomputing Conference in November 2022 in recognition of her innovative work in designing programming abstractions and tools for data locality in high-performance and scientific computing, as well as her leadership role in the international HPC community.

Since 2014, Professor Unat has been leading her own parallel and multicore computing research group at Koç University. She received the Marie Sklodowska-Curie Individual Fellowship from the European Commission in 2015, the BAGEP Award from the Turkish Academy of Sciences in 2019, the British Royal Society Newton Advanced Fellowship in 2020, and the young scientist of the year award in Turkey in 2021. She is also the first researcher from Turkey to receive funding from the European Research Council in the field of computer science for her project BeyondMoore. Professor Unat is currently coordinator of the EuroHPC Joint Undertaking (JU) project, SparCity.

Aside from her research, Professor Unat has taken an active role in empowering women in the fields of STEM and is a dedicated advocate for gender equality in her research lab. She hopes to inspire young women and girls with her work and achievements in programming models, performance tools, and system software for emerging parallel architectures.

HiPEAC news

Torsten Hoefler receives the Sidney Fernbach Award 2022

HiPEAC member Torsten Hoefler (ETH Zürich) has been honoured with the IEEE Computer Society Sidney Fernbach Memorial Award for his pioneering contributions to large-scale parallel processing systems and supercomputers. According to IEEE, the ideas and software professor Hoefler and his group developed are actively used by tens of thousands of scientists today to power large-scale scientific simulations and artificial intelligence systems.

“One key architectural feature that distinguishes standard computers and commodity clouds from modern supercomputers [..] is networking, and Prof Hoefler has made numerous innovative, groundbreaking contributions to enable high performance and programmability in such machines,” said Satoshi Matsuoka, director of RIKEN R-CCS and professor at Tokyo Institute of Technology, the recipient of the 2014 Sidney Fernbach Award. “In fact his contributions have been comprehensive, from work on innovative and scalable network topologies, various network routing algorithms, and performance modelling, to making key contributions to the MPI standard, on which practically every scalable parallel code is programmed.”

Other high-performance computing (HPC) luminaries and previous Sidney Fernbach Award recipients David Keyes (KAUST) and Jack Dongarra (University of Tennessee) also paid tribute to Professor Hoefler.

Professor Hoefler has won numerous awards for his work, including being named as an IEEE Fellow and a member of Academia Europaea. He also received the Latsis Prize from ETH Zurich, the ACM Gordon Bell Prize, and two European Research Council (ERC) Grants. On behalf of the HiPEAC community, huge congratulations!

Dimitris Gizopoulos receives Meta award on Silent Data Corruptions at Scale

In February 2022, Meta launched the Silent Data Corruptions at Scale request for proposals (RFP) in response to a major computing systems challenge that the company (along with other hyperscalers like Google) recently revealed to the computing systems community. The company is seeking contributions from academic researchers to the alarming problem of hardware errors leading to completely undetected (i.e. silent) corruptions of program execution. Such corrupted program outcomes can propagate at large scale and can remain undetected for long periods of time, severely affecting applications and system services. Both Meta and Google have reported an unexpectedly high rate of such incidents, on the order of one in a thousand central processing units (CPUs).

Within this novel research domain, Meta has identified a range of research opportunities, including: architectural solutions to data corruption; fleetwide testing strategies and distributed computing resiliency models; software and library resiliency; and silicon-level design, simulation, and manufacturing approaches.

Meta’s request for proposals attracted 62 proposals from 54 universities and institutions around the world. The only research proposal awarded from outside North America was that submitted by HiPEAC member Professor Dimitris Gizopoulos (University of Athens), entitled “Hardware failures root causing: Harnessing microarchitectural modeling”. The five winning institutions are the University of Athens, Stanford University, Carnegie Mellon University, Northeastern University, and the University of British Columbia.

In December, Professor Gizopoulos was also named ACM Distinguished Member for his outstanding contributions to scientific computing. Congratulations to Professor Gizopoulos on behalf of the HiPEAC community!

Photo credit: Jo Ramsey, SC Photography

ACM NanoCom Outstanding Milestone Award presented to Sergi Abadal

In October 2022, HiPEAC member Professor Sergi Abadal was presented with the ACM NanoCom Outstanding Milestone Award in recognition of his transversal contributions in the field of wireless chip-scale communications, from electromagnetics to wireless-enabled computer architectures.

Professor Abadal is the recipient of a European Research Council (ERC) Starting Grant (WINC) and project coordinator of the WIPLASH H2020 FET-OPEN project. He is area editor of the Nano Communication Networks (Elsevier) journal, where he was selected Editor of the Year 2019. Since 2020, he has been an ambassador of the European Innovation Council (EIC) through its programme of National Champions. His current research interests are in the areas of chip-scale wireless communications, including channel modelling and protocol design, and the application of these techniques to the creation of novel architectures for next-generation computing systems in the classical and quantum domains.

‘I am extremely grateful to my advisors, colleagues, collaborators and the students who have brought me to this point in my career. Also, many thanks to the steering committee of the ACM NanoCom for their nomination and support, especially Massimiliano Pierobon, Laura Galluccio, and Josep Miquel Jornet,’ said Professor Abadal.

On behalf of HiPEAC, many congratulations to Professor Abadal!

Read our interview with Professor Abadal in HiPEACinfo 67 bit.ly/HiPEACinfo67_Sergi_Abadal

What’s on HiPEAC TV

The HiPEAC YouTube channel features talks – including conference keynotes – interviews and animated shorts, across a range of horizontal and vertical application areas.

Featured video: HiPEAC22 Keynote 1: Efficient Machine Learning: Algorithms-Hardware Co-design – Hai 'Helen' Li. More accurate machine learning requires larger models – but large models pose problems in both the training and inference phases. In this compelling keynote talk from the 2022 HiPEAC conference, Hai 'Helen' Li (Duke University) describes how her research group uses a hardware co-design approach to radically improve the efficiency of machine learning through sparsity and partitioning.


Dates for your diary

ASAP 2023: 34th IEEE International Conference on Application-specific Systems, Architectures, and Processors 19-21 July 2023, Porto, Portugal

General chair: HiPEAC member João M.P. Cardoso Abstract submission: 20 February asap2023.org

ECRTS 2023: 35th Euromicro Conference on Real-Time Systems 11-14 July 2023, Vienna, Austria Paper submission: 1 March ecrts.org/call-for-papers

ISLPED 2023: ACM/IEEE International Symposium on Low Power Electronics and Design 7-8 August 2023, Vienna, Austria Abstract submission: 6 March islped.org

embedded world 2023 14-16 March 2023, Nuremberg, Germany embedded-world.de

DATE 2023: Design, Automation and Test in Europe 17-19 April 2023, Antwerp, Belgium

HiPEAC Jobs activities with the Young People Programme (see p. 42) date-conference.com

DSN 2023: 53rd IEEE/IFIP International Conference on Dependable Systems and Networks 27-30 June 2023, Porto, Portugal Programme committee co-chair: HiPEAC member Onur Mutlu dsn.org


HiPEAC 2023 keynote speaker Subhasish Mitra is professor of electrical engineering and of computer science at Stanford University and has won numerous awards for his work. We asked Subhasish about reinventing computer architecture, future technologies, and delivering system security and robustness.

‘NanoSystems are an opportunity for 1,000 × energy efficiency’

Why is it imperative to develop new computer architectures?

The first era of computing was dominated by relays, vacuum tubes, and discrete transistors. Then, in 1958, Jack Kilby described ‘the monolithic idea’ – and modern-day integrated-circuit chips were born. Today, abundant-data applications such as AI, augmented reality / virtual reality, and analytics are demanding large gains in computing energy efficiency and throughput for increasingly powerful systems. And, at this exact moment, conventional approaches are stalling. Existing computing systems use large off-chip memory and spend enormous time and energy shuttling data back and forth, especially for abundant-data applications – the memory wall. This memory wall gets worse as traditional 2D scaling gets increasingly difficult – the miniaturization wall. This deadly combination poses a massive challenge.

Just as integrated circuits brought together then-disparate discrete components, the next leap in computing architecture requires the next leap in integration, which must seamlessly fuse disparate parts of a system – compute, memory, inter-chip connections – synergistically into new ‘NanoSystem’ architectures.

How are ‘NanoSystems’ fundamental to your vision of new computing chips?

NanoSystems are systems built using nanotechnologies – new transistors and memories, new nanofabrication techniques or even new sensors. The key question is: how do you exploit new nanotechnologies to build new architectures that overcome fundamental bottlenecks but are highly difficult, or even impossible, using existing technologies? That requires synergistic innovations – i.e. co-design – across multiple levels: material, device and integration technology levels, circuit and architecture levels, and application and algorithm levels. Such NanoSystems are the key to unprecedented functionality, throughput, and energy efficiency of future computing systems.

What are the advantages of 3D architectures?

Our vision is N3XT 3D MOSAIC: N3XT stands for Nano-Engineered Computing Systems Technology, and MOSAIC is an acronym for MOnolithic / Stacked / Assembled IC. N3XT 3D MOSAIC exploits unique properties of nanotechnologies to create new N3XT 3D chip architectures – which use ultra-dense vertical connections (e.g. monolithic 3D, beyond existing through-silicon vias (TSVs)) to integrate heterogeneous logic and memory technologies (that are themselves high performance, dense, and energy efficient) to create new architectures for computation immersed in memory.

Multiple N3XT 3D chips are integrated through a continuum of chip stacking- / interposer- / wafer-level assembly / integration for packaged NanoSystems. This continuum – from ultra-dense monolithic 3D all the way to interposer assembly – is a key departure from today’s siloed development of logic / memory / chiplets / chip-stacking technologies and separate optimizations at the chip, package, and board levels. We target 100-1,000× system-level energy-delay-product benefits over 2D / 2.5D / TSV 3D across abundant-data workloads from the edge to the cloud.

How would you like to see computer architecture evolving over the next 10 years?

N3XT 3D MOSAIC will play a critical role for new computing architectures. The three pillars are: N3XT 3D MOSAIC technologies and architectures, Illusion scaleup on N3XT 3D MOSAIC, and co-design.

N3XT 3D MOSAIC technologies and architectures

A wide variety of nanotechnologies are key enablers. Monolithic 3D can achieve ultra-dense 3D integration by fabricating (several) layers of heterogeneous technologies (memory, logic, sensing, and others) directly atop or below previous layers. Nano-scale inter-layer vias – already used for vertical routing in today’s chips – are used for dense 3D connectivity. For monolithic 3D based on sequential fabrication of 3D layers, logic and memory technologies on upper layers must be fabricated at temperatures < 400°C and must be physically thin to be connected using inter-layer vias with limited aspect ratios. Sequential 3D fabrication challenges include yield concerns (which may require architecture-level repair) and long chip fabrication times (parallel fabrication of 3D layers may be required to shorten total process time).

HiPEAC voices

Several beyond-silicon logic technologies – carbon nanotube FETs, 2D materials, oxide semiconducting FETs – are compatible with such 3D fabrication. Carbon nanotube FETs have been implemented in multiple industrial facilities (Analog Devices, SkyWater Technology Foundry) at mature process nodes. Various memory technologies compatible with dense 3D fabrication, such as Magnetoresistive RAM (MRAM), Resistive RAM (RRAM), and Ferroelectric FETs, exhibit complementary properties with respect to density, retention, write endurance, and ease of integration. Monolithic 3D integration of RRAM, carbon nanotube FETs and silicon CMOS has been established at SkyWater.

Illusion scaleup on N3XT 3D MOSAIC

Illusion orchestrates application execution on a system of multiple identical or non-identical chips – i.e., multiple N3XT 3D chips in N3XT 3D MOSAIC – each with its local on-chip memory, compute and quick ON/OFF mechanisms, to create an illusion of a ‘Dream Chip’ with near-Dream energy and execution time. A Dream Chip densely co-locates all memory and compute on a single chip, quickly accessible at low energy. Dream Chips overcome the memory wall with large energy and throughput benefits.

Multiple Illusion hardware prototypes have been demonstrated for deep learning inference with < 5% additional energy and < 4% additional execution time vs. corresponding Dream Chips. Illusion follows the mantra: move computation, not data. By ensuring enough local (on-chip) memory (through N3XT 3D) and quick chip ON/OFF (e.g. using on-chip RRAM), Illusion ensures that:

1. Computations are performed where their data reside, avoiding massive inter-chip traffic.

2. Idle energy is eliminated by quickly turning ON/OFF individual chips.
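To make the arithmetic behind ‘move computation, not data’ concrete, here is a toy cost model; all the numbers, layer names and energy costs are invented for illustration and are not N3XT project data:

```python
# Toy cost model -- all numbers are illustrative assumptions, not N3XT data.
E_INTERCHIP = 10.0   # pJ per byte moved over an inter-chip link (assumed)
E_LOCAL     = 0.1    # pJ per byte accessed in local on-chip memory (assumed)

# Hypothetical network: (layer name, weight bytes), one layer per chip.
layers = [("conv1", 4_000_000), ("conv2", 8_000_000), ("fc", 16_000_000)]
ACTIVATION_BYTES = 100_000   # bytes handed between consecutive layers (assumed)

# Baseline: a single compute chip pulls every layer's weights over the link.
baseline = sum(size * (E_INTERCHIP + E_LOCAL) for _, size in layers)

# Illusion-style: each layer runs on the chip that already holds its weights,
# so only the small activations cross chips; idle chips are switched off.
illusion = (sum(size * E_LOCAL for _, size in layers)
            + (len(layers) - 1) * ACTIVATION_BYTES * E_INTERCHIP)

print(f"baseline: {baseline/1e6:.1f} uJ, illusion: {illusion/1e6:.1f} uJ")
```

Because weights vastly outweigh activations in this sketch, keeping the weights put and shipping only activations cuts the modelled energy by well over an order of magnitude, which is the intuition behind the mantra.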


Co-design

Co-design through synergistic innovations across abstraction levels – material / device / integration, circuit, architecture, software, algorithm – is critical for N3XT 3D MOSAIC. New co-design opportunities exist for almost every aspect of N3XT 3D MOSAIC, including workload mapping, architectures exploiting memories with complementary properties, thermal management for dense compute on 3D layers, architecture-level repair techniques for yield, and error-resilient system architectures.

How can we respond to challenges such as robustness and cybersecurity?

Beyond energy efficiency and throughput, we must ensure that future NanoSystems are also robust. Malfunctions of computing systems have serious consequences that continue to increase as systems become more complex, interconnected, and pervasive. Hardware failures, especially, are a growing concern. Today’s verification and test methods to screen design flaws (bugs) and manufacturing defects cannot meet the levels of thoroughness demanded by today's (and future) systems. At remarkably small feature sizes, several reliability failure mechanisms, largely benign in the past, are becoming visible at the system level. Hardware security is a growing concern. These barriers appear at a time when robust operation is essential, from (autonomous) cars all the way to cloud data centres. How do we overcome these challenges? Here are a few examples from my research group and my collaborators.

Design bugs: Our QED techniques enable highly thorough pre-silicon verification and post-silicon validation. Backed by theory, QED also enables drastic verification productivity gains – from months to days – while detecting highly critical bugs that escape existing approaches, as demonstrated by various industrial results including Infineon in Europe.

Manufacturing defects and reliability failures: Our soft error resilience techniques (BISER, LEAP) improve soft error rates by three to five orders of magnitude. Our circuit failure prediction concept signals impending circuit failures before errors occur. Its adoption is growing rapidly under various names such as predictive maintenance, predictive health monitoring, and system lifecycle management. Our CASP approach enables a system to predict and detect hard failures in logic (i.e. it complements soft error resilience and memory error correction) by testing itself thoroughly, concurrently during normal operation, without downtime visible to the end-user. CASP derivatives have been implemented in several industrial products, such as Intel’s recent in-field scan announcement.

Side-channel security attacks: The UPEC technique from TU Kaiserslautern (by Professors Wolfgang Kunz and Dominik Stoffel), derived from QED concepts, is highly effective in detecting (previously unknown) hardware side-channel security vulnerabilities.

SkyWater Technology Foundry

Originally from Belgium, HiPEAC 2023 keynote speaker Ivo Bolsens worked at Imec for several years before joining Xilinx as chief technology officer; more recently, he became senior vice president at AMD when the chip giant acquired Xilinx. We caught up with Ivo to find out about the main trends driving processor development, his vision for future computing systems and what field-programmable gate arrays (FPGAs) have in common with 4x4s.

What are the main trends in computing systems, and what’s driving them?

The reach of artificial intelligence (AI) continues to expand. AI is not about running existing applications better; it’s about solving a whole new class of problems that were beyond reach before. Behind this rapid expansion, entirely new programming paradigms are quickly evolving, and with those come new frameworks and even new languages.

Software is truly leading hardware in this segment. Providers of compute and acceleration technology – such as AMD – stay relentlessly focused on the latest AI software developments, so that the greatest software innovations can be optimized and accelerated in tomorrow’s platforms.

Machine learning (ML) training requirements are now driving future supercomputer architectures, which were previously driven by the requirements of high-performance computing (HPC) applications. AI data-intensive compute requires high-bandwidth memory (HBM), while the scale of AI networks requires model parameters to be shared across 1000s of compute devices. This necessitates interconnects between compute nodes that offer significantly more bandwidth than PCIe, and the amount of scale-out bandwidth outside the compute node is increasing accordingly.

There is also a trend towards a pod-level scale as a new hierarchy in the scale-out interconnect. A three-tiered interconnect hierarchy – node-pod-cluster – is emerging, driven by the needs of AI training systems.

Novel chip packaging approaches promise to integrate one level of off-package interconnect and focus on increasing the amount of memory reachable at SRAM-level speeds. Mainstream training graphics processing units (GPUs) are growing the compute capabilities of devices at a rate faster than Moore’s Law, with a similar goal: to increase the compute locally prior to scaling with multi-device interconnects. One example is AMD Instinct™ accelerators, data centre GPUs built from multiple dice to increase per-device compute performance.

Driven by ML applications, traditional floating-point compute is being replaced by lower-precision datatypes. Training and inference recipes are starting to adapt to these new datatypes. Sparsity offers the promise of further gains. AMD training devices offer support for BF16, FP16, and FP8 datatypes (MI300), while AMD Versal™ inference devices demonstrate the use of even lower-precision datatypes such as INT8 and INT4.
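As a rough illustration of what a lower-precision datatype gives up relative to float32 (this snippet is ours, not AMD’s, and hardware normally rounds to nearest rather than truncating), BF16 keeps the full float32 exponent but only 7 mantissa bits:

```python
import struct

def to_bf16(x: float) -> float:
    """Approximate BF16 by keeping the top 16 bits of the float32 encoding:
    sign, the full 8 exponent bits, and only 7 mantissa bits (truncated)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

print(to_bf16(3.141592653589793))  # 3.140625 -- only ~2-3 decimal digits survive
print(to_bf16(1.0e30))             # dynamic range of float32 is preserved
```

Keeping the exponent and sacrificing mantissa bits is why BF16 suits training: gradients span a huge dynamic range but tolerate coarse relative precision.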

What’s your vision for future network-centric architectures?

The latest neural networks, such as transformers, scale to 1000s of processors. Depending on their configuration, they will drive different resource mixes, i.e. the memory / compute / networking ratio is not constant. This requires a data centre consisting of disaggregated resources, whereby the compute and storage resources can be scaled in the appropriate ratio and connected in the optimal network topology.

SmartNICs (network interface cards) will enable the processing of data in motion; central processing units (CPUs) and peer accelerators process the data in use; and in- or near-memory compute processes data at rest.

For network-centric architectures to be widely adopted, the security domain also needs to be extended to include all the disaggregated resources. Confidential computing using CPU-based technologies such as AMD Secure Encrypted Virtualization (SEV) helps protect data in use, but these technologies are restricted to compute nodes. Existing interconnect protocols including Compute Express Link (CXL) Integrity and Data Encryption (IDE) and Internet Protocol Security (IPsec) provide security for data in motion. Security protocols are usually layered on top of each other, but encryption and authentication, which are quite expensive operations, need to become more efficient.

‘FPGAs are like 4x4 SUVs: highly customizable and can take you anywhere’

In disaggregated environments with dynamic application creation, the secure key distribution and coordination function is becoming a bottleneck, so new techniques are required.

What are the benefits and challenges associated with FPGAs?

Reconfigurable, adaptive computing devices provide an extra degree of freedom for implementing compute tasks, rather than mapping the algorithm to a predefined instruction set. Adaptive computing also allows users to adapt the architecture of the compute hardware. However, leveraging this extra degree of optimization gives rise to greater complexity for the programmer. The main challenge is to abstract the complexity while allowing the programmer to unleash the full potential of optimization.

What are some examples of FPGA applications?

The AMD Versal architecture spans the compute continuum from edge to cloud to HPC. For edge and automotive, there are self-contained system-on-chip (SoC) single devices (Versal AI Edge and AI Core series). For data centres, you’ll find Alveo accelerator cards and SmartNICs attached to EPYC CPUs in the cloud for offloading AI and networking functions. Then there are the VCK5000 Versal compute cards deployed in AMD Heterogeneous Accelerated Compute Clusters (formerly known as the Xilinx Adaptive Compute Clusters) systems at leading universities and national labs for scientific computing applications.

Finally, if processors were a car, what would FPGAs be?

CPUs are commuter vehicles that get you to the office every day; they are the reliable workhorses of the industry. GPUs are sportscars, fast and glitzy; you can drive them to work, but it’s hard to fit the family inside, and they are gas gobblers. FPGAs are 4x4 SUVs. They are highly customizable, can be used by soccer moms or to transport big loads of goods, and can take you anywhere, including off road where other cars can’t go.

Ivo’s top futuristic technologies

• Trusted internet: Blockchain technology will play a central role in raising digital trust and making interactions more secure, ultimately leading to the trusted internet.

• Composable data centre: SmartNIC technology will enable the next-generation server infrastructure that provides the ability to flexibly create, adapt and deploy servers using pools of disaggregated, heterogeneous compute, storage and network fabric. This will allow you to configure exactly the servers needed per application or workload and connect them in a workload-specific topology. This will lead to cloud agility and scale, and more efficient economics.

• Artificial intelligence: New architectures will emerge to close the gap between the incremental increase of compute capabilities delivered by traditional platforms and the exponential growth in compute requirements of innovative AI algorithms. In a departure from today’s conventional training and inference practice, we will see a scalable, unified architecture and continual learning throughout the lifetime of a deployed application. Dataflow architectures are required to break down the wall between distinctive phases of training and inference and edge and cloud. The unified architecture ensures that compatible data formats are used and optimizations for data formats such as sparsity representations don’t break between cloud and edge.

• Data processing: New memory architectures are required that enable efficient data processing and data movement. The efficient processing of data, be it data in motion in the network, data at rest in storage nodes, or data in use in the compute nodes, requires rethinking the interaction between memory and compute. Data movement is an essential factor in the efficiency of future scalable systems and will require the further introduction of foundational technologies such as high-density and high-bandwidth chip-to-chip connectivity in a compute node, enabled by 3D stacking and silicon photonics, to efficiently scale out the compute infrastructure over 1000s of compute nodes.

Growing AI pervasiveness from cloud to edge

Alveo SmartNIC and VCK500

HiPEAC 2023 keynote speaker Liliana Cucu-Grosjean (Inria) has dedicated much of her career to researching probabilistic approaches to critical real-time systems. In this interview, Liliana talks to HiPEAC about safety-critical systems, launching a start-up and the importance of diversity monitoring.

‘Probabilities can be used to support worst-case reasoning as they come with strong mathematical proofs’

Can you explain a bit about probabilistic approaches to critical real-time systems – including to the nervous flyers in the audience?

Probabilistic approaches existed well before my co-authors and I started working on their application to critical real-time systems. Indeed, since the 1980s, different average-based reasoning results have been proposed, as critical real-time designers felt more comfortable associating probabilities with values that could be obtained by simulation. Our main contribution is the evolution of probabilistic descriptions by replacing average-based reasoning with worst-case reasoning.

Obviously, critical real-time systems need to guarantee that computations are completed within a certain timeframe. Current time budgets for this kind of system are often calculated with average-based reasoning, where the largest value observed in a limited simulation is taken as a baseline. The worst-case time bound is then obtained by adding some high-water factor, without a mathematically supported justification. Probabilistic worst-case reasoning brings this mathematical justification by calculating the probability that a time budget is insufficient. I guess our revolutionary statement is that probabilities can be used to support worst-case reasoning as they come with strong mathematical proofs supporting complex qualification processes.
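As a toy illustration of this kind of reasoning (our own sketch; industrial probabilistic WCET analysis relies on extreme-value theory rather than simple counting, and the distribution below is invented), one can estimate from measurements the probability that a candidate time budget is insufficient:

```python
import random

random.seed(42)
# Synthetic execution-time measurements in ms (assumed distribution).
measurements = [random.gauss(10.0, 1.0) + random.expovariate(2.0)
                for _ in range(100_000)]

budget = 14.0  # candidate time budget in ms
exceed = sum(t > budget for t in measurements)
p_hat = exceed / len(measurements)

# 'Rule of three': with zero observed overruns, claim at most ~3/n at 95%
# confidence instead of probability zero.
p_bound = max(p_hat, 3 / len(measurements))
print(f"P(time > {budget} ms) ~= {p_hat:.2e}, 95% upper bound {p_bound:.2e}")
```

The point of the worst-case probabilistic approach is precisely to replace such naive counting with a mathematically justified tail bound that remains valid far beyond the largest observed value.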

How has the field developed over the last few years? How do you see it evolving over the next few years?

Probabilities have provoked polarized reactions within the critical real-time systems community, either strong opposition or immediate curiosity to understand when – and when not – to use such approaches. In fact, one should only use probabilistic approaches when static or exact models cannot be proposed. Probabilistic models should not be used when exact models exist as they give a more pessimistic result, as well as requiring a major modelling effort and strong mathematical background to produce.

However, the trade-off becomes extremely interesting for complex architectures and for users with performance-oriented expectations, where composability guarantees are harder or impossible to provide with non-probabilistic approaches. Displaying statistics on observed parameters like time latencies, energy consumption or hardware events is done naturally by designers without necessarily providing associated proofs or being able to identify what the result obtained corresponds to, due to the large amount of data there is to analyse.

Nevertheless, the area of probabilistic approaches has gained maturity, with numerous research groups mixing probabilistic and statistical approaches, indicating that this research direction is becoming more established. One aspect that, I believe, will accelerate the use of these approaches is the increased societal expectation for decreased energy consumption and carbon footprint of computing systems, in general.

HiPEAC 2023 takes place in Toulouse, a renowned hub for avionics. How is your research being used by companies such as Airbus?

Toulouse is, indeed, the location of my main industry collaborators, like Airbus for avionics and space but also EasyMile for autonomous vehicles. They mainly use our hybrid worst-case execution time estimators, where statistics are used to decrease the high complexity of the problems under consideration by identifying execution paths that are important with respect to the time behaviour. A static approach completes this estimation for those execution paths, making the overall estimation compliant with the highest qualification standards. Moreover, there is a noticeable side effect of the probabilistic approaches: they reveal software regression properties, which is excellent news for these users.


Today, both Airbus and EasyMile are sponsor customers of our spin-off, meaning that they support us with many hours of joint work to understand how much and where probabilities may help within the time validation of their systems.

How did your spin-off come about? What advice would you have for HiPEAC members seeking to launch companies based on their research?

StatInf is an Inria spin-off that I co-founded together with Adriana Gogonel, a post-doctoral researcher within my research group at Inria. She brought the perspective of a pure statistician to our domain and helped us re-formulate our problem, making it more “appealing” for mathematicians. By the end of her post-doctoral stay at Inria, we had a nice academic prototype, protected by two patents and registered with the French software protection agency APP. Several industrial collaborators indicated that they would be interested in buying this kind of technology if it were translated into a robust, industrial-level tool.

At that time, we had the opportunity of transferring it to existing small / medium enterprises (SMEs), but in the end launching our own start-up looked like the best option. One of the main reasons was that if you transfer technology to an existing SME, then you lose out on public funding that is available to new start-ups in France and Europe. Moreover, creating your own start-up allows you to make early decisions about the technology roadmap that are not always an option within larger SMEs.

My best tip? If you intend to create a start-up within a HiPEAC area, talk to those who have done it in industry domains close to yours. Do not look at companies like Meta or Google, since the business models and the markets may be very different. It seems obvious said like that, but some start-up founders may dream of large funding rounds that are not necessarily a good idea in embedded systems markets. Such markets have sufficient money but lack sufficient human resources. The limited number of unicorns in our industry domains is actually a revealing sign.

You’ve been active in equality and diversity work at Inria. Do you have any advice for how HiPEAC can ensure everyone feels welcome in all aspects of their diversity?

Statistics! In order to propose appropriate solutions, you need to identify the causes and continuously monitor the impact of proposed solutions. I have done such work at Inria as co-founding chair of the equal opportunities committee from 2015 to 2021, but also at IEEE as an active member of the Technical Community on Real-Time Systems (TCRTS) since 2016. Not all solutions Inria has implemented would work for IEEE, as Inria is a research institute and IEEE is a scientific association, but both have in common the fact that once diversity statistics are monitored, people become aware of inequalities and act by making internal selection processes as transparent as possible.

Why is diversity important? The critical real-time systems field faces stiff competition from other research and industry domains, which can make it seem less attractive because of lower salaries or because of the technical knowledge needed to work in it. Being able to attract more diverse people is key to drawing in young students when salaries or difficult problems to solve could otherwise be a barrier.

HiPEAC voices
Airbus works with Liliana’s spin-off, StatInf. Photo: © Airbus - Master Films - Hervé Goussé

The latest edition of the HiPEAC Vision analyses trends in computing across six ‘races’: for the “next web”, for artificial intelligence (AI), for new hardware, for cybersecurity, for sovereignty and for sustainability. Once again, the format of this roadmap document is a set of recommendations followed by a series of articles analysing each topic in more depth. In this article, the HiPEAC editorial board sets out the main themes of this edition.

A race against time: The HiPEAC Vision 2023

The last few years have seen rapid, profound changes across the world. In technology, we are witnessing breakthroughs in artificial intelligence (AI) nearly every week; new paradigms like quantum computing are attracting significant amounts of funding; and the (industrial) metaverse is on the horizon.

From the geopolitical point of view, technology is increasingly seen as a strategic asset, with different world regions competing for leadership. With global supply chains under unprecedented pressure, more world regions are aiming for technological sovereignty. Meanwhile, reports of new cyberattacks, cybercrimes and cyberwar are becoming more frequent. Moreover, now that climate change is becoming increasingly evident, sustainability is finally being taken seriously, and optimizing processes to use less raw material and energy is now a major objective.

This HiPEAC Vision explores the urgency of these themes in terms of six races for leadership: races both with other world regions and against time. With technology evolving faster than humans can adapt, and environmental pressures intensifying, it has never been more important for Europe to identify clear priorities for computing.

Each chapter is dedicated to a different leadership race and features a series of articles laying out the key issues. In addition, the editorial board has developed three global recommendations which cut across the different ‘races’. Throughout, cartoons by the Belgian comic artist Arnulf provide a light-hearted, often satirical take on the content.

Global recommendations

• Break silos to gain a holistic view, which is necessary for global optimizations. Promote collaboration between teams, launch joint project calls to create synergies between domains, establish cross-disciplinary European competence centres and promote open source (hardware and software)

• Support the development of tooling for cross-disciplinary and multi-dimensional challenges. While taking a global view is necessary for enhanced optimization, it is more complex. We should use AI as “helpers” to propose solutions – which should be thoroughly validated and tested before implementation.

• Develop trustable runtime orchestrators able to manage complex systems, with the help of experts in system orchestration. In some cases the system will have to select among a large number of options “in real time”. This will require the development of trustable orchestrators which are loyal to their users.

HiPEAC Vision
Editorial board: Marc Duranton, Koen De Bosschere, Bart Coppens, Christian Gamrat, Madeleine Gray, Thomas Hoberg, Harm Munk, Charles Robinson, Tullio Vardanega, Olivier Zendra.

Technology races

The race for the “next web”

Increasing amounts of data are being generated by machines that are interconnected and linked with the physical world to provide new optimizations and services. The “next web” will have to take this into account and smoothly integrate the “web of machines” with the “web of humans”, executing computations where they make most sense and taking into account new non-functional requirements. Europe should develop, standardize, test and validate such related technologies running on the continuum of computing, including trustable orchestrators to manage this complexity and serve both personal and industrial users by enabling a European industrial metaverse.

The race for artificial intelligence (AI)

The “next web” will require intelligent data processing across the compute continuum. Power- and data-hungry large language models are delivering incredible results. At the other end of the spectrum, it is increasingly important to have efficient AI systems at the edge. Europe should focus on AI solutions that enhance its strengths in the embedded sector, keep up with developments in large models, prepare for them to be squeezed into edge devices, and use AI judiciously as a “helper”, for example to develop new software and hardware, although AI-generated results should always be validated before use.

The race for innovative and new hardware

The widespread use of AI and the “next web” will only be possible if the systems on which they are executed are both efficient and affordable. Emerging paradigms, such as quantum computing, neuromorphic computing, spintronics and photonic devices, appear to offer efficient solutions for particular problem areas, at a much lower energy consumption than current von Neumann architecture-based semiconductor devices. Research in this area should be coupled to practical applications and environmental impact, and should tackle heterogeneous integration with current technologies (in hardware and software).

The race for cybersecurity

Computing systems will only be practically usable if they are secure and safe to use. Given that modern societies depend almost entirely on digital technology, the stakes are higher than ever. Critical infrastructure and supply chains should have both their hardware and software components hardened against cyber-attacks. The EU should build on its strengths in cybersecurity, including for post-quantum systems, and should broaden mandatory security and privacy requirements, with EU-based audit and certification of IT systems.

The race for sovereignty

It is becoming increasingly obvious that the globalization of previous decades is falling apart. Europe should strive for digital sovereignty, both in hardware and in software and should promote open source. It is also vital to keep investing in talent, research and innovation, and in a more entrepreneurial ecosystem.

The race for sustainability

To reduce the footprint of computing, designers should use a full lifecycle assessment when designing new computing systems, and computer architects should focus on embodied energy. Europe should develop new economic models that also take into account lifecycle environmental costs, and should continue searching for ways to develop more environmentally friendly (digital) goods and services, e.g. through dematerialization.
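The arithmetic behind such lifecycle accounting can be sketched in a few lines. This is a toy model, and every figure below is an invented placeholder, not a real measurement:

```python
# Toy full-lifecycle footprint model for a computing device.
# All numbers are illustrative placeholders, not real measurements.

def lifecycle_footprint_kgco2(embodied_kg: float,
                              power_w: float,
                              hours_per_year: float,
                              years: float,
                              grid_kgco2_per_kwh: float) -> float:
    """Embodied emissions plus use-phase emissions over the device lifetime."""
    use_phase_kwh = power_w / 1000.0 * hours_per_year * years
    return embodied_kg + use_phase_kwh * grid_kgco2_per_kwh

# Example: a device with 200 kgCO2e embodied, 30 W average draw,
# used 2,000 h/year for 4 years on a 0.3 kgCO2e/kWh grid.
total = lifecycle_footprint_kgco2(200.0, 30.0, 2000.0, 4.0, 0.3)
print(f"{total:.0f} kgCO2e")  # 200 embodied + 72 use-phase = 272
```

Even in this invented example, the embodied share dominates the use phase, which illustrates why the text singles out embodied energy.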

FURTHER INFORMATION: hipeac.net/vision

HiPEAC Vision HiPEAC INFO 68 21

Elli Kartsakli is a senior researcher at Barcelona Supercomputing Center (BSC).

A recipient of a prestigious Ramón y Cajal grant from the Spanish Government, Elli has been active in a number of European Union-funded projects, including ELASTIC and CLASS, and is now coordinating two projects funded by the Spanish government, PROXIMITY and AIR-URBAN. We caught up with Elli to find out more about creating a viable compute continuum and powering safer, cleaner mobility.

‘Distributing computations across the continuum opens up huge possibilities’

So Elli, what can you tell us about the compute continuum? Well, I’m from a telecommunications background, so the first time I heard of the concept of the compute continuum was when I began working with the computation community.

Ah… so it’s like learning a whole new vocabulary. Yes: in telecommunications, you have multi-access edge computing, where users are connected to a base station and the traffic goes to the edge. In computation, what I understand by the compute continuum is the use of different resources, from data centres to edge devices, according to the needs of the application. This means that data is processed where it makes most sense; in many cases, there is no need to send data to the cloud, as it can be processed locally, reducing latency and energy costs.

Well, that sounds pretty simple. Not really. Some of the main issues are around interoperability and orchestration. Developing an application that can be distributed and executed seamlessly across the compute continuum is not trivial. That involves looking for the right programming frameworks and application programming interfaces (APIs), and then handling the orchestration of the computing tasks to meet the application requirements. For example, at BSC we’ve been using our in-house COMPSs programming environment for the distribution aspect.
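The decision of where a task "makes most sense" to run can be made concrete with a small sketch. This is not the COMPSs API; the function names, thresholds and numbers below are invented for illustration. The idea is simply to compare a task's latency budget against the cloud round trip plus the time to move its input data:

```python
# Minimal sketch of a continuum placement policy: decide whether a task
# runs on an edge node or in the cloud. Not the COMPSs API; all names
# and thresholds are invented for illustration.

from dataclasses import dataclass

@dataclass
class Task:
    name: str
    input_mb: float           # data that would have to travel to the cloud
    latency_budget_ms: float  # deadline for a response

def place(task: Task,
          cloud_rtt_ms: float = 60.0,
          uplink_mbps: float = 50.0) -> str:
    """Prefer the edge when the deadline rules out cloud round trip + upload."""
    transfer_ms = task.input_mb * 8.0 / uplink_mbps * 1000.0  # MB -> Mbit -> ms
    if cloud_rtt_ms + transfer_ms > task.latency_budget_ms:
        return "edge"
    return "cloud"

print(place(Task("detect-pedestrian", input_mb=2.0, latency_budget_ms=50.0)))  # edge
print(place(Task("retrain-model", input_mb=2.0, latency_budget_ms=5000.0)))    # cloud
```

A real runtime also weighs energy, load and data-residency constraints, but the latency/transfer trade-off above is the core of the "process data where it makes most sense" argument.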

Oh, I see. But you’ve already done a lot of work here, right? Our research group, headed by [HiPEAC member] Eduardo Quiñones, has experience in researching and developing software frameworks that address these issues. We’re starting a new project funded by the European Union (EU) in January 2023, called EXTRACT, which will tackle interoperability between edge and cloud – bear in mind that even the same tool, such as Kubernetes, has different implementations depending on where it’s executed. In the EU-funded CLASS and ELASTIC projects, we worked on frameworks for smart-city mobility solutions – for connected cars and connected trams respectively – and the national projects PROXIMITY and AIR-URBAN build upon this approach.

I’m glad you mentioned smart cities, as this seems an area where distributed resources would be a big plus… Being able to distribute computations across the continuum opens up huge possibilities for meaningful applications with real social value. The main thing is to be able to extract value from data. In the case of smart cities, you have sensors collecting massive amounts of local data which could be transformed into meaningful information: sensors measuring air quality, cameras, parking monitoring systems, etc. In addition, there’s data from cellphone users, as well as macroscopic information such as bus timetables and weather forecasts. Not to mention that there will be increasing amounts of data from connected vehicles in the future.

If we could collect all this data and process it in an intelligent way, we could deliver services to enhance safety, manage traffic or adapt to specific events, such as how a major congress in the city might impact upon congestion. You could also provide personalized services, as we’ll be doing with one EXTRACT use case, which focuses on providing personalized evacuation plans for mobility-impaired people. On the collective scale, this kind of intelligence could help local authorities understand the impact of the actions they take. For example, if traffic-calming measures result in safer streets and better air quality or if they have unintended consequences, such as causing traffic jams elsewhere that worsen air quality.

Compute continuum special feature HiPEAC INFO 68 22


It sounds like edge processing has a major role here. Yes, but most applications are currently cloud-based, which is why research projects like ours are investigating how best to move computations to the edge. As an example of some of the issues we’ve encountered, in CLASS we used the LTE (long-term evolution) communications standard, but data had to go through the telecommunications operator core network before coming back to the edge. Also, the synchronization of multiple sensing sources was a challenge.

Learning from this experience helped us in ELASTIC, where we wanted to complement data from trams with, for example, data from external cameras which might spot things not ‘seen’ by the tram. However, in ELASTIC we used a WiFi connection, which had reliability issues and was not suitable for establishing fast connections with moving vehicles. This is obviously unacceptable in safety-critical, real-time systems. That’s why in PROXIMITY we will be trying integration with the 5G network.

Tell us more about PROXIMITY and AIR-URBAN. PROXIMITY applies the same principles as ELASTIC but in a local setting, in this case the city of Barcelona. It will provide a software framework for the development, deployment and execution of data-analytics applications. We’ll use programming models to distribute the computing and APIs to connect to the 5G network. We’ll also create a converged platform that integrates edge and cloud resources and provide dynamic orchestration – with joint compute and communication resource allocation – to meet application demands. This framework will be used to identify accident blackspots and generate alerts – for example, providing visual reminders to motorists and tram drivers.

AIR-URBAN will leverage a similar architecture, but in this case we will investigate the impact of traffic on air quality and emissions. Today, there are many models estimating vehicle emissions, but these are based on simulations. Our goal in this project is to analyse real-time traffic data using artificial intelligence (AI) and detect how specific traffic events impact air quality at small time scales, as well as improve current models for air-quality prediction. This will be set up in Barcelona, and the information will be used to help evaluate the impact of measures introduced by the city council.

So many things, and we still haven’t discussed privacy or security… Yes, that’s a massive area of research. We took steps towards secure edge applications in ELASTIC, and security will also feature in EXTRACT. But we are open to collaborations with colleagues who specialize in this area.

PROXIMITY is funded under the Spanish Government’s Proyectos de Generación de Conocimiento 2021 scheme (PID2021-124122OA-I100) while AIR-URBAN is funded under the Proyectos de Transición Ecológica y Transición Digital 2021 call (TED2021-130210A-I00). CLASS and ELASTIC received funding from the European Union’s Horizon 2020 research and innovation programme under the grant agreement numbers 780622 and 825473 respectively, while EXTRACT is funded under the EU’s Horizon Europe research and innovation programme (grant agreement number: 101093110).


CLASS project class-project.eu ELASTIC project elastic-project.eu

The EU-funded ELASTIC project provided a data-analytics framework for connected trams in Florence

While the promise of the compute continuum – being able to run applications anywhere from tiny edge devices to large computing infrastructures – is enticing, one of the main issues dogging the realization of this promise is interoperability. In a recent paper, researchers from the Complex Systems Group at the University of Neuchâtel advocated the WebAssembly binary instruction format as a solution to this problem. We caught up with Jämes Ménétrey, Pascal Felber, Marcelo Pasin and Valerio Schiavoni to find out more.

Using WebAssembly for a more interoperable, secure cloud-edge continuum

While the term ‘cloud-edge continuum’ gives the impression of a smooth spectrum of compute possibilities, the reality is more complicated. ‘The cloud-edge continuum is a complex system comprising various kinds of applications and hardware,’ says Marcelo Pasin. ‘These components are typically managed by large teams using an extensive collection of software tools, which are used by different customers. Today, the maintenance of such ecosystems is far from seamless, partially due to the ubiquity of proprietary products. This means that specific solutions are often developed, leading to the use of incompatible software.’

Such distributed systems also pose pressing security challenges, explains Valerio Schiavoni. ‘The cloud-edge continuum is accessed through the internet and hosted in various locations. Unlike mainstream cloud architectures, edge and internet-of-things (IoT) devices are deployed in local infrastructures, close to the end users, where physical security cannot be guaranteed.’

Yet providers and users still need guarantees that the confidentiality and integrity of their data will be preserved, he says. To ensure that systems are trustworthy, therefore, semiconductor companies like Intel, AMD and Arm provide trusted execution environments (TEEs), says Valerio, which ensure the correct execution of pieces of software while verifying their authenticity and integrity. This is accompanied by remote attestation for trustworthy communication between distributed devices.

For the development of future distributed systems, therefore, there is a clear need for a more interoperable environment which ‘would run seamlessly across hardware devices and software stacks while preserving good performance and a high level of security’, says Marcelo. WebAssembly fulfils all of these requirements: it is ‘a lightweight, general-purpose binary code format that enables you to compile source code from different programming languages into portable bytecode, developed by a consortium of industry leaders, including Microsoft, Google and Mozilla,’ he notes.

Flexibility through migration

One of the main benefits of the compute continuum is the ability to run software where it makes most sense, from local processing on edge devices to data crunching in the cloud. ‘Migration takes advantage of the cloud-edge continuum by moving running software across computing nodes. To keep latency down, you might want to relocate a service closer to end users. Online video games, which operate time-critical player interactions, are one example. Or think of machine-learning algorithms, which often require levels of computational power only available in the cloud, but for which you might need to migrate code to process data closer to the users due to local legal regulations,’ explains Jämes Ménétrey. ‘However, migration is much more of a challenge if the environment is as heterogeneous as the continuum.’

By offering homogeneity, WebAssembly reduces the effort of migration, according to Jämes. ‘Cloud infrastructures use different technologies and hardware to those found in small devices, so hardware abstraction is crucial. Likewise, having the same software stack across the continuum facilitates migration regardless of the underlying operating system. For example, a binary compiled in WebAssembly is compatible with different TEE technologies.’ This gives WebAssembly the versatility to handle various different tasks across the continuum, he adds.

This versatility results in a number of advantages as a common execution unit for the cloud-edge continuum, Jämes notes. ‘First, many programming languages can define WebAssembly as a compilation target. This means that software developers can use their favourite programming languages instead of being constrained by those natively supported by the underlying platforms. Second, unlike other frameworks like Java or .NET, WebAssembly is compact and has a small memory footprint, while leveraging a sandbox that protects the hosting platform from malicious applications. Third, WebAssembly uses a standardized operating system interface called WASI, enabling applications to operate regardless of the underlying system. Finally, developers can adjust the performance and memory usage depending on the execution modes of the WebAssembly runtime.’

Cooperation for better technology

Many of the advantages of WebAssembly can be attributed to its open-source nature, Jämes notes: ‘While proprietary solutions usually keep their intellectual property closed to maintain an advantage over competitors, the industry leaders involved in WebAssembly realized that cooperation leads to a better standard with higher adoption rates – as shown by the number of browsers supporting this technology.’ Indeed, consortium members actively contribute to the development of WebAssembly, while researchers and software engineers are working together to build better WebAssembly runtime systems, such as WAMR, he adds, leading to significant improvements in the bytecode execution speed.

The community also welcomes proposals for extensions, as well as integration with technologies such as machine learning and cryptography, says Jämes. ‘These proposals focus on adding capabilities while remaining independent of a particular library. As such, future software can use a standardized application programming interface (API), for instance, to train a model and compute a hash while leaving the implementation to the runtime system, with the potential to leverage accelerators where available,’ Jämes adds.

As a young technology (version 2.0 was released in April 2022), WebAssembly still faces challenges before it can be widely adopted, Jämes points out. ‘For example, managed programming languages like C#, Java or Python still do not support direct compilation to WebAssembly, relying on workarounds instead,’ he notes. ‘We also observed performance overheads when benchmarking WebAssembly software compared to the native version.’ However, he suggests that the benefits of WebAssembly outweigh the drawbacks.

As for the future of the cloud-edge continuum, the research group’s vision is clear. ‘We envision the cloud-edge continuum as an interoperable, scalable and distributed ecosystem, where software may be executed on any device and moved as required, irrespective of the platform or operating system,’ says group leader Pascal Felber. ‘This will transform the development lifecycle of future applications, enabling developers to focus on the business value instead of dealing with the complexity of each piece of infrastructure.’

WebAssembly is the ideal technology for this task, according to Pascal, thanks to its abstraction of the operating system, hardware and programming languages, combined with the security guarantees it can provide using TEEs. ‘While challenges still exist, we are confident WebAssembly and trusted computing will be a versatile, reliable and efficient foundation for software development in the years to come.’


J. Ménétrey et al. ‘WebAssembly as a Common Layer for the Cloud-edge Continuum’, FRAME ’22 bit.ly/FRAME22_WebAssembly

This work includes results from the VEDLIoT project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 957197.

An ideal cloud-edge continuum would run software where it makes sense, from cloud infrastructures to edge devices
“We envision the cloud-edge continuum as an interoperable, scalable and distributed ecosystem”

Building a workable compute continuum

What are the main elements of a workable compute continuum? Here we explore some of the work being done by the HiPEAC community, including orchestration, resource management and next-generation cloud computing.

Smart orchestration of applications in the compute continuum

Lorenzo Blasi (HPE Italy), Emanuele Carlini (ISTI-CNR), Patrizio Dazzi (University of Pisa), Konstantinos Tserpes (Harokopio University of Athens)

The adoption of cloud computing is spreading at an incredible pace. This transition is not only cost effective for application providers, avoiding the set-up and maintenance costs of the server-side infrastructure of services, but is also convenient for end users, who benefit from up-to-date applications via lightweight clients.

However, not all modern applications can be effortlessly migrated onto cloud computing infrastructures. Some application features and constraints, like latency sensitivity, can prevent straightforward cloud-based deployment and execution of services, as in the case of virtual reality applications. Other applications may present issues concerning their migration to cloud environments due to constraints on data, either because there is too much data to be moved or because there are legal restrictions on where the data has to be stored.

Edge computing is an attempt to overcome these limitations; it is a service computing paradigm aimed at bringing the computation as close as possible to the data producers and consumers (e.g. the end users of an application). This approach reduces the latency between the data producers (or consumers) and the computational and/or storage resources.

From an edge-computing perspective, the infrastructure revolves around a compute continuum that goes from one (or more) cloud(s) to a potentially large number of edge resources geographically distributed over a broad area. Managing such a complex set of distributed, often heterogeneous resources can pose practical problems, as well as interesting research challenges.

Among these is the nontrivial task of selecting adequate resources for a given application service, conditioned by its users and their dynamically varying locations. Many approaches have been proposed to tackle the problem, some based on traditional optimization techniques, whereas others exploit data-driven approaches (e.g. machine learning). There are solutions based on centralized solvers and others with a distributed nature.

The ACCORDION H2020 EU project aims to create a computing platform to support the execution of next-generation applications (i.e. applications characterized by stringent requirements in terms of latency and network bandwidth) at the edge. As part of the project, several tools and technologies aimed at supporting the correct placement and management of applications have been developed.

Among these are the SmartORC orchestrator and the Edge Minicloud. SmartORC (Smart Orchestration of Resources in the Compute Continuum) aims to support the efficient placement of applications in a cloud/edge federation. It embeds modules for processing application descriptions and for indexing and discovering computational resources, plus an optimization engine based on mixed-integer linear programming. In the context of ACCORDION, it has also been enriched with a module for rendering the application descriptions given as input to Kubernetes, and with lifecycle management of applications running on the ACCORDION platform.

The Edge Minicloud is an active entity that, by providing a set of services (such as resource management, application deployment, monitoring, etc.), supports the implementation of a distributed resource ecosystem. Each Edge Minicloud manages a localized set of resources representing an aggregation of edge devices involved in the ACCORDION platform, accessed and managed by means of a virtual infrastructure manager (VIM). The VIM oversees and plays an active role in the allocation of VMs and containers on the resources belonging to the edge devices. The Edge Minicloud abstracts the different VIM technologies that can be used for this activity. In ACCORDION the actual VIM is K3s, extended with KubeVirt to also manage virtual machines (for workloads that cannot easily be containerized) as well as unikernels.

Both SmartORC and the Edge Minicloud are actively developed with the aim of improving their efficiency and reliability. Furthermore, a key aspect that characterises these tools, which is continuously improved and developed, is their propensity to enable fully decentralized management of edge resources with the ambition of realizing a completely autonomous and decentralized compute continuum.
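The placement problem such an optimization engine targets can be illustrated with a tiny brute-force solver. Real engines use mixed-integer linear programming and scale far better; the node names, capacities and latencies below are invented:

```python
# Toy version of the placement problem a continuum orchestrator solves:
# assign each application component to a node, minimizing total latency
# while respecting node capacity. Brute force for illustration only;
# real engines (e.g. MILP-based) handle far larger instances.

from itertools import product

nodes = {"edge-1": {"capacity": 2, "latency_ms": 5},
         "cloud":  {"capacity": 8, "latency_ms": 60}}
components = {"sensor-ingest": 1, "analytics": 2, "archive": 1}  # CPU demand

def best_placement(nodes, components):
    names = list(components)
    best, best_cost = None, float("inf")
    for assignment in product(nodes, repeat=len(names)):
        load = {n: 0 for n in nodes}
        for comp, node in zip(names, assignment):
            load[node] += components[comp]
        if any(load[n] > nodes[n]["capacity"] for n in nodes):
            continue  # violates a capacity constraint
        cost = sum(nodes[node]["latency_ms"] for node in assignment)
        if cost < best_cost:
            best, best_cost = dict(zip(names, assignment)), cost
    return best, best_cost

placement, cost = best_placement(nodes, components)
print(placement, cost)  # analytics is pushed to the cloud: edge-1 is full
```

The exhaustive loop makes the structure of the problem visible: a discrete assignment, capacity constraints, and a latency objective, exactly the shape a MILP formulation captures.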

FURTHER INFORMATION: ACCORDION project accordion-project.eu SmartORC orchestrator smartorc.org

ACCORDION has received funding from the European Union’s Horizon 2020 ICT Cloud Computing programme under grant agreement no. 871793.

DataCloud: Resource provisioning, scheduling and deployment of big-data pipelines

Narges Mehran, Dragi Kimovski, Radu Prodan (all University of Klagenfurt), Souvik Sengupta, Anthony Simonet-Boulgone (iExec Blockchain Tech), Ioannis Plakas, Giannis Ledakis (UBITECH) and Dumitru Roman (SINTEF)

Modern big-data pipeline applications, such as machine learning, encompass complex workflows for real-time data gathering, storage and analysis. Big-data pipelines often have conflicting requirements, such as low communication latency and high computational speed. These require different kinds of computing resource, from cloud to edge, distributed across multiple geographical locations – in other words, the computing continuum. The Horizon 2020 DataCloud project is creating a novel paradigm for big-data pipeline processing over the computing continuum, covering the complete lifecycle of big-data pipelines.

To overcome the runtime challenges associated with automating big-data pipeline processing on the computing continuum, we’ve created the DataCloud architecture. By separating the discovery, definition, and simulation of big-data pipelines from runtime execution, this architecture empowers domain experts with little infrastructure or software knowledge to take an active part in defining big-data pipelines. The DataCloud runtime bundle explores the allocation, scheduling and orchestration of data processing on the computing continuum. The bundle includes a number of tools to lower technological barriers to the incorporation of big-data pipelines in organizations’ business processes, as described below.

DataCloud runtime bundle architecture

The DataCloud runtime bundle consists of three tools that manage the big-data runtime lifecycle on the compute continuum:

1. ADA-PIPE provides a data-aware algorithm allowing smart and adaptable provisioning of resources and services.

2. R-MARKET provides a decentralized, trusted marketplace for resources (software appliances and hardware devices).

3. DEP-PIPE enables flexible and scalable deployment and orchestration of big-data pipelines.

The DataCloud architecture integrates these runtime tools with the design bundle, comprising the DEF-PIPE for big-data pipeline definition and SIM-PIPE for simulation and analysis.


Overall, the DataCloud runtime bundle architecture defines 14 tool interaction steps interconnected with the two design bundle tools through a visualization interface. The diagram above shows an integrated workflow of the tools.

• Step 1: The data pipeline user sends a stream or batch of data produced by heterogeneous sources.

• Step 2: DEF-PIPE translates and defines the data pipeline stages and structure from the user input.

• Step 3: SIM-PIPE simulates the pipeline execution based on requirements including processing speed, memory and storage size.

• Steps 4-6, 8: ADA-PIPE receives the structure of the pipeline and analyses the dependencies between the pipeline stages, explores the requirements of each specific stage – such as processing speed, memory and storage size – and sends them to DEP-PIPE for deployment.

• Steps 7, 9, 10: R-MARKET creates a decentralized marketplace based on a permissionless blockchain. This federates the set of heterogeneous, virtualized resources along the computing continuum and reserves the appropriate ones based on the requirements identified.

• Steps 11-13: DEP-PIPE deploys the big-data pipeline based on the schedules and execution plans defined by ADA-PIPE. In addition, it continuously monitors and reports on the pipeline’s execution on the computing continuum. This monitoring information is provided to DEF-PIPE for visualization and ADA-PIPE for adaptation in response to execution anomalies.
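The tool interactions above can be mocked as a plain function pipeline. The tool names are kept, but all the logic below is invented for illustration and does not reflect the real implementations:

```python
# Mock of the DataCloud tool interaction flow described above.
# Each function stands in for a tool; the behaviour is invented for
# illustration and does not reflect the real implementations.

def def_pipe(user_input):                      # steps 1-2: define pipeline stages
    return [{"stage": s, "cpu": 1} for s in user_input["stages"]]

def sim_pipe(pipeline):                        # step 3: estimate stage requirements
    for stage in pipeline:
        stage["mem_gb"] = 2 * stage["cpu"]
    return pipeline

def ada_pipe(pipeline):                        # steps 4-6, 8: schedule each stage
    return [(s["stage"], "edge" if s["mem_gb"] <= 2 else "cloud")
            for s in pipeline]

def r_market(schedule):                        # steps 7, 9-10: reserve resources
    return {node for _, node in schedule}

def dep_pipe(schedule, reserved):              # steps 11-13: deploy and monitor
    return [f"{stage} deployed on {node}"
            for stage, node in schedule if node in reserved]

pipeline = sim_pipe(def_pipe({"stages": ["ingest", "clean", "train"]}))
schedule = ada_pipe(pipeline)
report = dep_pipe(schedule, r_market(schedule))
print(report)
```

The point of the mock is the data flow: definitions feed simulation, simulated requirements drive scheduling, the schedule drives resource reservation, and deployment closes the loop with monitoring feedback.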

In terms of future work, we plan to improve the performance of the runtime bundle by validating the requirements of the business use cases in the DataCloud project.

DataCloud’s business use cases include predicting deformations in ceramics and analysing manufacturing assets

DataCloud has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 101016835.

Compute continuum special feature

Supercloud design challenges

Cloud computing services are expected to dominate the future of computation. Hyperscale cloud architectures are already offered by every major tech company (Amazon, Apple, Cisco, Google, Huawei, Meta, Microsoft etc.). Almost every enterprise (around 94%) uses cloud services, and the global market is expected to reach an astonishing US$832.1 billion in value by 2025. However, cloud computing is still mostly limited to running business and web applications. Why are cloud services not used for more demanding computations? And what will be the future of the cloud?

Supercloud has joined the chat.

The term ‘supercloud’ was coined by Bill McColl to refer to clouds that could support high-performance computing (HPC), artificial intelligence (AI) / machine learning applications, big-data processing and so on. Currently, cloud resources are mostly rented as virtual machines or servers, used for storage or for running business and web applications. However, they are not yet designed to run large, demanding HPC or AI applications that require big amounts of data and must scale to thousands or tens of thousands of cores. So what is the issue?

Let’s discuss fault and tail tolerance.

There are actually two critical problems: fault tolerance and tail tolerance. The first can be roughly translated as: “What happens in our system when one of the nodes of computation fails?” A fault-tolerant design, for instance a fault-tolerant cloud, allows a system to continue its operation in the event of a fault, even at the expense of slightly lower performance, instead of failing completely. The second refers to the latency required to respond to a request: the longer the tail of the latency distribution, the longer the worst-case response time. A tail-tolerant system manages to respond to nearly every request with low latency.
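To make tail tolerance concrete, here is a minimal Python sketch of one classic technique, hedged requests: the same query is sent to several replicas and the first answer wins, so a single slow node (a straggler) cannot inflate the response time. The replica model and delays are invented for illustration.

```python
import concurrent.futures
import time

def query_replica(delay_s, value):
    # Simulated replica that responds after `delay_s` seconds.
    time.sleep(delay_s)
    return value

def hedged_request(replicas):
    # Send the same request to every replica and return the first
    # response, bounding tail latency.
    with concurrent.futures.ThreadPoolExecutor(len(replicas)) as pool:
        futures = [pool.submit(query_replica, d, v) for d, v in replicas]
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        return next(iter(done)).result()

# One healthy replica (10 ms) and one straggler (200 ms): the caller
# still gets a fast answer because the fast replica wins the race.
answer = hedged_request([(0.2, "slow"), (0.01, "fast")])
print(answer)  # fast
```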

How can we design the superclouds of the future?

Two of the main trends in computing have been centralized systems and decentralized, distributed systems. In the effort to increase performance, there have been shared-memory parallel systems (uniform memory access (UMA) and non-uniform memory access (NUMA)), which do not scale well when adding more resources (e.g. more central processing unit (CPU) cores), and scalable parallel computation using distributed components and distributed memory. Accordingly, from the software perspective, programming models have emerged to fit these approaches: the multithreading model (POSIX Threads, OpenMP etc.) for the former, and BSP, MPI and others for the latter. However, neither seems capable of addressing fault and tail tolerance.
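The contrast between the two software models can be illustrated with a toy Python example: the threading half stands in for shared-memory models such as POSIX Threads or OpenMP, while the multiprocessing half mimics the explicit message passing of MPI-style programs. It is a sketch of the programming styles, not of real HPC code.

```python
import threading
from multiprocessing import Process, Queue

# Shared-memory style (cf. POSIX Threads / OpenMP): all threads see the
# same memory and must synchronize their accesses to it.
total = 0
lock = threading.Lock()

def add(n):
    global total
    for _ in range(n):
        with lock:          # without the lock, updates could be lost
            total += 1

threads = [threading.Thread(target=add, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total)  # 4000: four threads, 1000 increments each

# Distributed-memory style (cf. BSP / MPI): workers share nothing and
# exchange explicit messages instead.
def worker(q, n):
    q.put(sum(range(n)))    # send a partial result as a message

if __name__ == "__main__":
    q = Queue()
    procs = [Process(target=worker, args=(q, 10)) for _ in range(2)]
    for p in procs:
        p.start()
    combined = q.get() + q.get()   # receive both partial results
    for p in procs:
        p.join()
    print(combined)  # 90: each worker contributes sum(range(10)) == 45
```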

Supercloud design and automation

The term ‘supercloud design’ describes the process of analysing an application’s requirements, using cost models in order to perform performance estimation and finally mapping an application’s computational components to the available resources of a supercloud.

Understanding the challenges of future superclouds, overcoming them and automating their design so that HPC and AI applications can run efficiently will shape the future of computing. Domain-specific compilers, the automation of design space exploration and intelligent memory management seem promising domains for this cause. Advances in these areas can drive the design of future supercloud systems.


McColl, Bill. “Superclouds: Scalable High Performance Nonstop Infrastructure for AI and Smart Societies.” (2017).

Cloudwards. 26 Cloud Computing Statistics, Facts & Trends for 2022. Last accessed October 2022. cloudwards.net/cloud-computing-statistics/#Sources

Markets and Markets. Cloud Computing Market. Last accessed October 2022. bit.ly/3hdQ6fY

Innovation impact

Based in Edinburgh, Codeplay Software provides development tools allowing engineers to build software capable of running on a variety of different processors. For years, Codeplay has championed open standards, as outlined by Chief Executive Andrew Richards in HiPEACinfo 58. In June 2022, the company was acquired by Intel. HiPEAC caught up with Andrew to find out more about this groundbreaking company’s path to innovation.

‘In the compiler field, we can make a difference while remaining small and specialist’

How does open source promote innovation? How has Codeplay’s work with open standards been advantageous to the company?

The strength a small company has is its innovative ideas, but its weakness is being too small to take on a large challenge. With open systems, a small company can work on its specialism and combine with other teams building the rest of the system. Maybe you’re writing a plug-in for some larger software package, or adding your own customization for an open-source project. Or, in our case, working with international partners on defining an industry-standard programming model. Open systems and open interfaces let specialist teams work together.

With closed systems, only a tiny number of innovators have the resources and capability to deliver a product. That makes innovation extremely hard. Innovation thrives in small teams with the freedom to try new things. The challenge is getting those small teams to work together to deliver something amazing!

For example, Codeplay defined the SYCL programming model. SYCL enables people to use the latest high-performance software approaches that are being adopted in C++, but also to build new high-performance processors to run that software. This enabled the US Exascale Computing Project to adopt SYCL as a key programming model for its new supercomputers: bringing in both innovative new hardware (like the Ponte Vecchio graphics processing unit (GPU) from Intel) and scientific software delivering new breakthroughs (like GROMACS or LAMMPS).

This scale of project was far beyond what a small Edinburgh company could achieve on its own. But at the same time, the level of innovation required was more challenging than large companies could achieve: many had tried and failed to deliver a programming model that could open up new processor architectures and still support software at scale. Codeplay, with the help of HiPEAC and other European funding sources, was able to deliver this unique value.

What technical challenges are you most proud of solving?

We’ve worked for many years on the challenge of performance portability. There is a widespread belief that, to achieve high performance, you need to sacrifice portability. That means you can either have software that runs very fast on one kind of processor, or software that runs slowly on lots of different processors. That would make it very difficult to innovate in processor architecture: you could bring an innovative processor to market, but then you would have to either sacrifice performance or spend years porting software to the processor, by which time it’s obsolete. If performance portability is impossible, it becomes financially unviable to bring new processors to market. We showed it’s possible, as long as you’re practical about it.

To achieve this goal, we surveyed all the different techniques used in the field. We did this because we saw people already achieving performance portability to some extent using C++ in videogames. We also saw techniques in GPUs that made performance easier. So, we integrated all these techniques into a standard programming model (SYCL), which is based on C++. Then we worked with people across the industry to build out this vision and prove it worked. Today, people achieve very high performance across a wide range of processor architectures using SYCL and these techniques. You can particularly see it in scientific high-performance computing (HPC), where there are now plenty of papers proving the approach works.

Innovation impact

What has Codeplay been doing with SYCL™ and oneAPI recently?

SYCL and oneAPI are already being used extensively in HPC and especially the US Exascale Computing Project to support supercomputers across the three main GPU architectures: Intel, AMD and NVIDIA.

What we’re doing right now is opening up oneAPI as much as possible to make it even easier to use on different hardware platforms. Recently, we opened up the governance of oneAPI to bring in new participants: this is going to make oneAPI an ecosystem that isn’t governed by any one organization. That’s great for the industry and also a fantastic opportunity for HiPEAC members to get involved and drive the direction of what we’re doing.

What oneAPI brings innovators is the benefit of Intel’s huge resources, along with the US Exascale Computing Project building out the software ecosystem, but in an open way where you can add your own new ideas. It’s a fabulous opportunity to bring new compiler techniques to scientific and AI software at scale, as well as bringing accelerated computing into new markets.

How do you attract and retain talent?

We put a lot of effort into hiring new talent from university and training people up in these new skills of compilers for accelerator processors. HiPEAC is a really great place to find talented compiler people. It’s harder to hire experienced compiler people, but we manage to do it from time to time by being a very visible, respected innovator in the field.

We also focus a lot on how to retain great people. Codeplay is a very friendly and helpful place to work. It’s also a place to switch between lots of interesting projects. And because we’re very public in the industry, it’s a place to make a mark on the industry.

A lot of companies go through cycles of hiring loads of compiler developers to work on some new processor architecture or idea. Then they find it’s really hard to build compilers and get them working at scale, leading to rounds of redundancies. Codeplay has been growing steadily through all these cycles: being cautious when there’s crazy growth (such as all the ‘magic’ AI graph compilers) and then being focused on the next big thing when there’s a downturn. It looks like we’re just finishing a crazy boom in magic compilers and going through a tougher period. But for us at Codeplay, this means that people are being realistic about what compilers can and can’t do right now and that’s good for a company like us that focuses really hard on compiler technologies that work.

Any other advice for people wishing to build a successful deep tech company?

You can try to sell lots of very deep tech products, but it requires huge amounts of investment and large sales teams. The alternative is to have huge global impact, which for deep techies like us may be easier and require less investment. So my advice is: think carefully about the route you go down. The obvious route of building a massive company is very tough. In the field of compilers, we can make a real difference while remaining relatively small and specialist. There are different kinds of success, so think about what works for you. What worked for us was massive impact.

Codeplay at the 2022 HiPEAC conference. The company is also sponsoring HiPEAC 2023

Founded in 2021, the artificial intelligence (AI) technology company Axelera AI has had a meteoric rise, recently announcing that it had successfully won US$27 million in Series A investment. With HiPEAC member Evangelos Eleftheriou as its chief technology officer (CTO) and fellow HiPEAC members (and hardware luminaries) Marian Verhelst and Luca Benini on the scientific advisory board, the company’s innovation potential is evident. We caught up with Evangelos and Axelera Chief Executive Fabrizio Del Maffeo to find out more.

Competitive edge

How Axelera turns leading-edge research into AI innovation

How did Axelera come about?

Fabrizio: Axelera AI was founded in July 2021 by a core team from the company Bitfury AI, led by myself as its new CEO and co-founder, along with a core team from imec, the global nanotechnology leader. Evangelos Eleftheriou, CTO and co-founder, and a group of researchers from IBM Zurich Lab joined the company a few months later, completing the founding team.

Today's AI technology has been designed primarily for cloud computing operations, a sector with relatively few constraints on cost, power, and scalability. The key players have so far delivered inefficient and expensive technologies based on standard computing and graphic computing architectures that are poorly suited to the needs of AI/edge workloads. Hardware for edge applications requires an entirely innovative design considering specific computational performance, power, and economic conditions.

What is the main technology created by the company?

Evangelos: Axelera has created an AI platform for AI-inference acceleration of computer vision workloads at the edge. The platform comprises a powerful AI processing unit (AIPU) chip and a versatile end-to-end integrated software stack. The AIPU design is based on a four AI-core architecture. Each AI-core can execute all layers of a standard neural network without external interactions. The AI-cores can either be stitched together to boost the throughput of a complex workload or operate independently on the same neural network to reduce latency or process concurrently different neural networks for applications featuring pipelines of neural networks.

Using the application software development kit (SDK), developers can build computer-vision applications for the edge quickly and easily; in principle, deep learning expertise is not required. The SDK turns a high-level declarative definition of a computer vision pipeline into optimized code that can run on a wide range of host architectures, and provides the building blocks for constructing end-to-end solutions in different application environments, from fully embedded use cases all the way to scale-up solutions running in edge servers that process thousands of concurrent streams.

How are disruptive architectural concepts like in-memory computing related to the Axelera offering?

Evangelos: Each AI-core is a RISC-V-controlled dataflow engine delivering up to 53.5 TOPs (tera operations per second), for a compound throughput of up to 214 TOPs across Axelera’s four-core AIPU. Central to the operation of each AI-core is an in-memory-computing-based matrix-vector multiplier (MVM) for accelerating matrix operations, offering unprecedentedly high energy efficiency of 15 TOPs/W. In contrast to analogue in-memory computing approaches, Axelera’s disruptive digital in-memory computing design is immune to the noise and memory non-idealities that affect the precision of analogue matrix-vector operations, as well as the deterministic nature and repeatability of the MVM results. By accumulating the partial products in full precision, the design delivers state-of-the-art FP32 iso-accuracy for a wide range of computer vision applications without retraining.
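As a rough illustration of why full-precision accumulation matters (a toy model in plain Python, not Axelera's actual design), the following sketch multiplies 8-bit integer operands while accumulating the partial products exactly, giving a bit-identical result on every run, whereas a simulated analogue MVM picks up fresh noise each time.

```python
import random

# Toy 8-bit matrix-vector multiply. The digital version accumulates
# exact partial products; the analogue version perturbs each partial
# product with noise, as analogue in-memory computing would.
random.seed(0)
W = [[random.randint(-128, 127) for _ in range(8)] for _ in range(4)]
x = [random.randint(-128, 127) for _ in range(8)]

def mvm_digital(W, x):
    # Full-precision accumulation: exact and repeatable.
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def mvm_analogue(W, x):
    # Each partial product is perturbed by per-run noise.
    return [sum(w_ij * x_j + random.gauss(0, 5) for w_ij, x_j in zip(row, x))
            for row in W]

print(mvm_digital(W, x) == mvm_digital(W, x))    # True: bit-identical
print(mvm_analogue(W, x) == mvm_analogue(W, x))  # False: noise differs per run
```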

What kinds of application could Axelera’s solutions power?

Fabrizio: Axelera AI is building the technology to support growth in sectors like smart cities, retail, security, the internet of things (IoT), medicine, mobility and more. Many businesses are keen to expand into these technologies but face limitations due to inadequate data-processing capabilities, an extensive onboarding process, high energy demands, or steep adoption costs. We are committed to delivering an AI platform that reflects the needs of industry and does not require an arduous process to adopt or integrate.

Our AI platform will deliver flexibility and the ability to run multiple state-of-the-art neural networks while maintaining high accuracy. This will unlock better data usage and understanding for companies to hone their offerings and meet market demand while remaining efficient and powerful at a fraction of the cost.

Axelera has been very successful in obtaining funding. Do you have any advice for other HiPEAC start-ups who want to pitch to venture capitalists (VCs)?

Fabrizio: If a startup is still in the pre-product phase, it is extremely important to have a strong team with complementary skills and experience – research and development (R&D), hardware, and software engineering – in order to lend credibility to the plan and timeline, and to de-risk the execution. It is also important to get customer feedback to validate the product-market fit as soon as possible. While software is relatively easy and cheap to adjust, or even pivot, during product development, hardware is extremely difficult and expensive to change at this stage. Having at least one executive with a proven track record of bringing new product lines to market will give credibility to the overall project.

Finally, some startups should be more ambitious: investors are interested in multiplying their investment at least 10 times in five to 10 years and thus it is important to demonstrate fast conversion of the capital invested into enterprise value in the financial projections. Long-term value creation often wins over short-term profit generation; therefore, one should not save money to reach the breakeven point but invest the money to create twice the value year after year.

Do you see Axelera as contributing to European technological sovereignty? If so, how?

Evangelos: In the Key enabling technologies for Europe's technological sovereignty report (see ‘Further information’, below), AI is one of the six key enabling technologies identified as critical for Europe to reach technological sovereignty. Axelera AI is a high-tech European company focusing on developing and productizing cutting-edge software and hardware solutions for AI-inference acceleration of computer vision and natural-language processing (NLP) workloads at the edge.

As such, with its spectacular growth in the year and a half since its establishment and a strong intellectual property portfolio under development, Axelera AI is becoming a European champion in AI. In this way, Axelera AI is contributing to the European Union's mandate, in the report’s words, ‘to develop, provide, protect and retain the critical technologies required for the welfare of European citizens, and the ability to act and decide independently in a globalised environment’.


Axelera AI website axelera.ai

Tiana Ramahandry et al. Key enabling technologies for Europe’s technological sovereignty, EPRS, 2021 (pdf) bit.ly/EPRS_KET_sovereignty_2021

Catch Axelera in the industry exhibition at HiPEAC 2023


As shown in the case studies reported in HiPEACinfo 61, 64 and 65, the SMART4ALL Innovation Action has been powering technology transfer across Europe. In this article, Tamás Kerekes explains how the Italian company NplusT teamed up with Hungarian design house PCB Design to build an equipment prototype, supported by SMART4ALL.

POP-LEC: Better power management thanks to SMART4ALL support

Developers of devices for the internet of things (IoT) and cyber-physical systems (CPS) face a paradox: with increasing compute and storage demands, these devices are often power hungry, yet many – especially smaller devices at the edge – have severe power constraints. ‘Power use is strongly impacted not only by the hardware but also by device management algorithms,’ explains Tamás Kerekes, the president and chief executive of NplusT. ‘However, given the constant pressure of catching market opportunities, it’s difficult for companies to apply the latest development methodologies, which could help them reduce power use.’

This is how the POP-LEC, or ‘Power Profiler Instrument Accelerates Time-to-Market of IoT/CPS Devices’, project was born. Based on their experience in semiconductor component and solid-state drive (SSD) design, the NplusT team invented a new concept to validate device-management algorithms for power optimization. This methodology could be applied at the early stages of development, even before the implementation of device hardware prototypes.

‘POP-LEC is an all-in-one instrument which is able to emulate the target hardware and run the device-management algorithms,’ says Tamás. ‘During the execution of the algorithms, a specific parametric measurement unit (PMU) is activated. This reports the power use linked to specific algorithm segments, enabling decisions to be made to optimize power use.’ The architecture of the instrument is shown in the figure below.
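The idea of attributing power samples to specific algorithm segments can be sketched as follows. All class and method names here are invented for illustration; they are not NplusT's actual firmware interfaces.

```python
# Toy profiler: credit each power sample to the algorithm segment that
# was running when the sample was taken, then report per-segment power.
class PowerProfiler:
    def __init__(self):
        self.samples = {}
        self.segment = None

    def enter(self, name):
        # Mark the start of a new algorithm segment.
        self.segment = name
        self.samples.setdefault(name, [])

    def record(self, milliwatts):
        # A PMU sample is credited to the current segment.
        self.samples[self.segment].append(milliwatts)

    def report(self):
        # Average power per segment, so costly segments stand out.
        return {seg: sum(v) / len(v) for seg, v in self.samples.items()}

profiler = PowerProfiler()
profiler.enter("wear-levelling"); profiler.record(120); profiler.record(130)
profiler.enter("garbage-collection"); profiler.record(300)
print(profiler.report())
# {'wear-levelling': 125.0, 'garbage-collection': 300.0}
```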

The POP-LEC equipment prototype

The design challenge, according to Tamás, is related to the PMU. ‘It cannot be designed for a specific application, so it needs to support measurement ranges from the μA to the A domains. Fast switching between ranges is also required in order to provide accuracy without loss of information, while fast data sampling (MHz range) with firmware-driven filtering and waveform capture is also essential.’

PCB and NplusT teamed up to design a new PMU and build an equipment prototype, supported by the SMART4ALL technology transfer programme. This project builds on several successful collaborations between the two companies on leading-edge memory-characterization products – see for example the NanoCycler for NAND characterization reported in HiPEACinfo 59, supported by the TETRAMAX Innovation Action.

The POP-LEC PMU architecture invented by NplusT has a patent pending. PCB Design has designed and implemented the hardware prototype, which has been integrated with the firmware and software provided by NplusT, explains Tamás. By mid-2022, the prototype had been successfully validated and characterized. Data analysis and correlation were supported by the SMART4ALL HPC data centre at ESDA Lab, which provided the computing resources for the execution.

NplusT is working on market analysis for several applications. In addition to IoT / CPS devices, other possible applications include a standalone instrument for generic use, and a dedicated instrument for non-volatile memory technology evaluation. This will be followed by an industrialization phase, continuing the collaboration between the two companies’ engineering teams, which aims to create new products. ‘With this work, we will extend the NplusT portfolio and contribute to our strategic objectives of growth and diversification,’ says Tamás.

POP-LEC architecture

Fully autonomous vehicles will only be possible thanks to artificial intelligence (AI), yet the approach to AI is radically different to the approach to functional safety, a prerequisite for mobility solutions. The European Union (EU) project SAFEXPLAIN seeks to change that, delivering explainable – and therefore certifiable – AI fit for use in safety-critical contexts. HiPEAC caught up with SAFEXPLAIN coordinator Jaume Abella (Barcelona Supercomputing Center) to find out more.

‘You can’t transfer control to the driver if the car doesn’t have a steering wheel’

Why is it essential to include AI in safety-critical systems such as cars, trains and satellites?

Autonomous systems are the next big revolution in transportation and related domains. So far, only AI-based systems have been proven capable of providing accurate-enough perception and navigation solutions for transportation.

AI therefore has to work in these systems but, as of today, there is no general way to build safe AI-based systems. Until now, there have been workarounds for when the AI system fails, for example transferring control back to the driver. That’s not an option if your car doesn’t even have a steering wheel.

What are the current roadblocks to applying AI in safety-critical systems?

The problem is that the AI software development process is fundamentally at odds with software development processes for safety-critical systems. The latter take a top-down approach, starting from specific safety goals and requirements, and ensuring that the software is correct by design.

In the case of AI, however, the implementation is based on experimentation and intuition. It uses a bottom-up approach using representative data, which is then tuned empirically. This makes it very difficult to trace the reasons behind the decisions it makes, which means that the AI can’t be certified for use in safety-critical systems without the aforementioned workarounds.

What does SAFEXPLAIN set out to do?

SAFEXPLAIN will devise deep learning (DL) solutions – a key subset of all AI solutions – starting from functional safety requirements and providing explainability and traceability. The project will also provide recommendations to adapt functional safety standards so that they can certify aspects of software that haven’t been foreseen until now. This will allow the certification of DL software, as long as it can demonstrate that it has followed a particular methodology.

This implies a shift in thinking on the part of certification authorities. Traditionally, certification has permitted failure rates for hardware to account for external factors such as radiation, but software has been assumed correct (infallible) by design. The new approach would allow certification of software that has a possibility of failing, but with multiple levels of diverse redundancy implemented to ensure that safety is maintained.
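The notion of diverse redundancy can be illustrated with a toy majority vote over the outputs of independently implemented channels; this is a generic safety pattern sketched for illustration, not SAFEXPLAIN's specific design.

```python
from collections import Counter

# Toy diverse-redundancy pattern: several independently implemented
# channels compute the same decision, and a majority vote masks a
# single faulty channel instead of letting it corrupt the output.
def vote(outputs):
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority: fall back to a safe state")
    return value

# One of three channels misbehaves; the vote still yields the safe answer.
print(vote(["brake", "brake", "accelerate"]))  # brake
```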

SAFEXPLAIN will also deliver DL software implementations that both meet the above requirements and run efficiently and with time predictability on relevant high-performance computing (HPC) platforms. In addition, it will provide multiple safety patterns with their corresponding DL realizations to meet varying safety requirements, along with industrial tools and case-study integrations proving the feasibility of the SAFEXPLAIN approach.

To achieve this, the project is bringing the functional safety and AI communities together and helping them to understand one another. In effect, my main role is to act as a translator, facilitating the definition of common terms that everyone can understand. I’m confident that the baseline solutions already exist within the AI community; it’s just a case of identifying the necessary properties for safety-critical systems and bridging the gap between the two worlds.

Innovation Europe
SAFEXPLAIN project consortium

Data-hungry modern applications, such as climate simulations, increasingly require a combination of high-performance computing (HPC), artificial intelligence (AI) and data analytics (DA). To facilitate the design of these applications, eFlows4HPC is creating a European workflow platform to integrate all three, as well as developing methodologies to widen access to HPC. Here, eFlows4HPC coordinator Rosa M. Badia, manager of the Workflows and Distributed computing group at the Barcelona Supercomputing Center, tells HiPEAC all about it.

Go with the flow

How did eFlows4HPC come about?

The motivation is twofold: on one hand, supercomputing systems are becoming more and more complex, with fatter nodes that include more processors and heterogeneity.

On the other hand, application providers aiming to use these resources are designing increasingly complex workflow applications that comprise not only traditional HPC modelling and simulation but also artificial intelligence (AI) and / or data analytics (DA) components.

However, traditional programming environments for HPC, AI and DA are quite different and do not support their integration well. eFlows4HPC came about with the goal of providing a software stack that provides these functionalities.

How will eFlows4HPC promote the use of heterogeneous compute resources?

eFlows4HPC aims at providing an easy-to-use programming interface with a powerful underlying runtime stack that is able to make the most of HPC systems. To leverage heterogeneous resources, WP3 focuses on identifying bottlenecks in the project’s applications and optimizing them with specialized implementations for graphics processing units (GPUs), field-programmable gate arrays (FPGAs) and the European Processor Initiative (EPI) processor.

As part of the WP3 work package, we identified computational kernels (e.g. sparse linear system solvers, singular value decomposition) in the applications and AI kernels (e.g. convolutional neural networks) that are currently being optimized and ported.

How will the HPC workflow-as-a-service concept improve access to HPC resources?

HPC workflow-as-a-service (HPCWaaS) aims at providing a methodology to facilitate the development, deployment and execution of complex workflows in HPC systems. This service brings the function-as-a-service (FaaS) concept to HPC environments, with the aim of hiding the complexity of HPC workflow execution from end users.

The project software stack offers different programming tools to constitute the workflows. Once developed, these workflows can be deployed and executed through the HPCWaaS interface, which also manages user credentials. This means that, for the end user, executing an application workflow simply requires selecting it from the workflow repository, selecting the HPC system on which to run it, and starting its deployment and execution.
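The end-user experience described above can be mimicked in a few lines of Python. Every name in this sketch is hypothetical; it is not the real eFlows4HPC HPCWaaS API, just an illustration of hiding deployment details behind a repository of ready-made workflows.

```python
# Hypothetical sketch: the user only picks a workflow from a repository
# and a target system; deployment and execution details stay hidden.
class HPCWaaS:
    def __init__(self, repository):
        self.repository = repository          # workflow name -> runner

    def execute(self, workflow, system):
        runner = self.repository[workflow]    # select from the repository
        return runner(system)                 # deploy and run, details hidden

repo = {"climate-esm": lambda system: f"climate-esm running on {system}"}
service = HPCWaaS(repo)
print(service.execute("climate-esm", "cluster-a"))
# climate-esm running on cluster-a
```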

HPC is recognized as crucial in addressing grand societal challenges, and widening its access will enable the adoption of HPC by new user communities, increasing the number of challenges able to benefit from it.

What kind of use cases could the eFlows4HPC results be applied to?

While eFlows4HPC aims at providing generic tools for all kinds of application, within the project the methodologies focus on applications grouped into three different ‘pillars’ chosen for their significant industrial and social relevance: manufacturing, climate, and urgent computing for natural hazards.

Pillar I focuses on the construction of digital twins for the prototyping of complex manufactured objects, integrating state-of-the-art adaptive solvers with machine learning and data mining, thereby contributing to the ‘industry 4.0’ vision. Specifically, a reduced-order model of the cooling system of an industrial electrical engine is under development. To ensure safe motor operation, the temperature must not exceed a critical threshold, since the engine’s electrical insulation can be damaged by thermal degeneration. The availability of the digital twin will result in optimized engine operation, reducing the risk of damage from overheating.


Pillar II focuses on the development of intelligent and novel end-to-end Earth system model (ESM) workflows able to: (i) rapidly adapt and evolve according to the dynamic conditions of the climate simulations, and (ii) make better use of computational and storage resources by performing smart (AI-driven) pruning of ensemble members (and releasing resources accordingly) at runtime. This pillar is also devoting effort to the development of machine-learning models that enhance understanding of climate simulations and produce added-value products, with an application in tropical cyclone detection.

Finally, Pillar III aims at the development and optimization of workflows for the urgent simulation of earthquakes and tsunamis. The goal is to develop faster, more reliable workflows that can be managed by end users. The partners are working on specific cases from the Mediterranean region, Mexico, Chile and Iceland.

What made you choose the use cases in the eFlows4HPC project?

The EuroHPC call specified a series of priority application areas, which narrowed the field. Within these areas, we looked for use cases that presented challenges in integrating computational aspects with artificial intelligence and data analytics.

We also looked for workflows that could leverage the dynamism and reactive aspects offered by our runtime components.

How is eFlows4HPC linking up with other EU-funded projects and initiatives?

The project has carried out collaboration activities with multiple projects, especially those funded in the same call: ACROSS, HEROES, MICROCARD and REGALE. For example, we participated in a HiPEAC 2022 workshop organized by ACROSS where all these projects presented an overview of their technical developments. Two additional EuroHPC projects (RED-SEA and DEEP-SEA) were invited, as well as the B-CRATOS project. The event helped project partners to identify areas of common interest and possible collaborations.

A similar HiPEAC 2022 workshop was organized by the EVEREST project, where I was invited to give a keynote. This workshop also featured presentations from the DAPHNE, LEXIS and EVEREST projects.

The SKA Regional Centres, the regional support network for the international Square Kilometre Array telescope initiative, invited me to present at their WG5 Compute and Storage Workshop. Attendees showed an interest in evaluating the HPCWaaS methodology, and the eFlows4HPC consortium plans to invite them to one of the community workshops to be delivered in the second phase of the project.

The project also collaborates with the IS-ENES (Earth system modelling), ESiWACE (climate and weather simulations) and PerMedCoE (personalized medicine) communities. BSC is one of the project partners in DT-GEO, which aims to build a digital twin for geophysical extremes and where eFlows4HPC methodologies will be used for the development of workflows. Finally, eFlows4HPC has also supported the ChEESE Centre of Excellence in solid Earth by developing complex workflows for earthquake impact simulation.

eFlows4HPC applications include climate modelling and urgent computing for natural disasters. © Cyclone Idai (2019) photo courtesy of ESA, CC BY-SA 3.0 IGO
Rosa presented eFlows4HPC at the 2022 HiPEAC conference

RISER: Raising RISC-V to the cloud

RISC-V is on a roll, and there are plenty of European Union-funded projects using the open-source instruction set architecture, as reported in HiPEACinfo 66. One of the latest is RISER, which will develop the first all-European RISC-V cloud server infrastructure, significantly enhancing Europe's strategic autonomy in open-source technologies.

RISER will leverage and validate open hardware high-speed interfaces combined with a fully featured operating system environment and runtime system, enabling the integration of low-power components, including RISC-V processor chips from the European Processor Initiative (EPI) and EUPILOT projects, in a novel energy-efficient cloud architecture.

RISER brings together seven partners from industry and academia to jointly develop and validate open-source designs for standardized form-factor system platforms, suitable for supporting cloud services. Specifically, RISER will build two cloud-focused platforms:

(1) An accelerator PCIe card integrating up to four RISC-V chips derived from the EUPILOT project (eupilot.eu). This accelerator can plug into any PCIe-enabled system on chip (SoC), whether Arm or x86.

(2) A cloud services platform, interconnecting microserver boards developed by the project, each one supporting up to four RISC-V chips coupled with high-speed storage and networking. Embracing hyperconvergence, the RISER microserver architecture allows for distributed storage and memory to be used by any processor in the system with low overhead. The open-source system board designs of RISER will be accompanied by open-source low-level firmware and systems software, and a representative Linux-based software stack to support cloud services, facilitating uptake and enhancing the commercialization path of project results.

Three use cases will be developed to evaluate and demonstrate the capabilities of RISER platforms:

a) Acceleration of compute workloads;

b) Networked object and key-value storage;

c) Containerized execution as part of a provider-managed IaaS environment.

PROJECT NAME: RISER: RISC-V for cloud services START/END DATE: 01/01/2023 – 31/12/2025

KEY THEMES: RISC-V, open hardware interfaces, PCIe accelerator board, microserver, boot firmware, Linux operating system, open-source software stack for cloud services


WEBSITE: riser-project.eu CONTACTS: Manolis Marazakis, FORTH maraz@ics.forth.gr Stelios Louloudakis, FORTH slouloudak@ics.forth.gr

RISER has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement no. 101092993. The project is funded under the call on ‘Digital and emerging technologies for competitiveness and fit for the green deal’.

FURTHER INFORMATION: bit.ly/HaDEA_call_digital-Green-Deal


SPACE: Making astrophysics codes fit for the exascale era

A newly funded Centre of Excellence (CoE), SPACE (Scalable Parallel Astrophysical Codes for Exascale), aims to prepare astrophysics and cosmology applications for the transition to exascale and beyond. The centre will enable eight of the most widely used high-performance codes for this kind of application to be used on pre-exascale systems funded by the EuroHPC Joint Undertaking (JU).

Numerical simulations based on HPC are essential tools for modelling complex, dynamic systems, interpreting observations, and making theoretical predictions in astrophysics and cosmology. Contrasting the results of numerical simulations with the stream of complex observational data produced by the latest generations of ground- and space-based observatories will provide new insights into astronomical phenomena, the formation and evolution of the universe, and the fundamental laws of physics.

Evolving eight of the most used European codes in this field so they can harness the power of exascale machines requires new co-design approaches to scientific computing. In these approaches, hardware and software evolve jointly, with the design and development of each taking into account the other’s goals, requirements and constraints. To achieve this, SPACE will bring together scientists, community code developers, HPC experts, hardware manufacturers and software developers to evolve state-of-the-art codes into new software capable of exploiting future computer architectures efficiently.

In addition, the related data-processing and visualization applications and workflows will be advanced and their exascale capabilities enhanced. Innovative solutions based on in-situ or in-transit technologies will be used, together with solutions leveraging machine learning, to help overcome challenges relating to the storage, access and processing of large data volumes.

The SPACE CoE will also promote the adoption of general and community standards for data products adhering to FAIR principles, as well as promoting the interoperability of data and applications based on the International Virtual Observatory Alliance (IVOA) technological standards and best practice.

These objectives will be achieved by creating a multi-disciplinary environment that brings together the following:

• science and high-performance computing (HPC) knowledge from the astrophysics and cosmology domain

• HPC expertise from four European Union (EU) HPC centres hosting either pre-exascale or petascale facilities

• knowledge of and access to cutting-edge technologies

• workflow integration

• machine learning

• visualization

PROJECT NAME: SPACE CoE: Centre of Excellence in Scalable Parallel Astrophysical Codes for Exascale START/END DATE: 01/01/2023 – 31/12/2026


PROJECT COORDINATOR: Andrea Mignone andrea.mignone@unito.it

KEY THEMES: high-performance computing (HPC), exascale, parallel computing, space, astrophysics

PARTNERS: Italy: University of Turin, Italian National Institute for Astrophysics (INAF), CINECA, E4, EnginSoft; Belgium: KU Leuven; Czech Republic: IT4Innovations at VSB – Technical University of Ostrava; France: Bull Atos, Centre national de la recherche scientifique (CNRS); Germany: LMU München, Goethe University Frankfurt; Greece: Foundation for Research and Technology – Hellas (FORTH); Norway: University of Oslo; Spain: Barcelona Supercomputing Center (BSC) BUDGET: € 7,994,812.50 (€ 3,997,406.00 EU contribution)

SPACE CoE has received funding from the European High Performance Computing Joint Undertaking (EuroHPC JU) under grant agreement no. 101093441.


CONVOLVE: Seamless design of smart edge processors

How can the European Union (EU) secure its lead in embedded ultra-low-power secure processors for edge computing? As the world braces for smart applications powered by artificial intelligence (AI) in almost every edge device, there is an urgent need for ultra-low-power edge AI systems-on-chip (SoCs), or smart edge processors. This kind of processor moves computing closer to the source of data generation to address the limitations (e.g. latency, bandwidth) of cloud or centralized computing. Based on current projections, the edge AI hardware market is expected to grow by about 40% per year, reaching beyond US$70 billion by 2026.

Unlike existing solutions, this smart edge hardware needs to support high throughput, reliable, and secure AI processing at ultra-low power, with a very short time to market. With its strong legacy in edge solutions and open processing platforms, the EU is well positioned to lead in the smart processor market for edge devices. However, this can only be realized when the EU can make AI edge processing at least 100 times more energy efficient, while offering sufficient flexibility and scalability to deal with AI as a fast-moving target. Since the design spaces of these complex smart edge processors are huge, advanced tooling is needed to make their design tractable.

The EU-funded project CONVOLVE addresses these roadblocks, paving the way to EU leadership in edge AI by making smart edge processors more efficient while ensuring security by design to protect the data and privacy of European society. It takes a holistic approach, with innovations at all levels of the design stack: ultra-low-power memristive circuits; computing in memory and approximate computing; more advanced deep-learning models; online learning; dynamism and reconfiguration at the deep-learning, architecture and circuit levels; and a rethink of the whole compiler stack. The CONVOLVE project is led by Eindhoven University of Technology, and the consortium includes 18 partners from eight countries across the European continent, featuring a good mixture of academic partners, large enterprises, and small and medium enterprises (SMEs).


PROJECT NAME: CONVOLVE: Seamless design of smart edge processors START/END DATE: 01/11/2022 – 30/10/2025

KEY THEMES: microprocessors, AI, ultra-low power, design space exploration

PARTNERS: Netherlands: Eindhoven University of Technology, Technische Universiteit Delft, ViNotion, CognitiveIC; Switzerland: ETH Zürich, Friedrich Miescher Institute for Biomedical Research Foundation; Spain: Thales Alenia Space España, Universidad de Murcia; Belgium: Katholieke Universiteit Leuven, Confederation of Laboratories for Artificial Intelligence Research in Europe, Axelera AI; Germany: Robert Bosch, NXP Semiconductors Germany, Ruhr-Universität Bochum; Denmark: GN Store Nord; UK: University of Manchester, University of Edinburgh; Morocco: Université Internationale de Rabat. BUDGET: € 11 million WEBSITE: convolve.eu

CONVOLVE has received funding under the European Union’s Horizon Europe research and innovation programme under grant agreement number 101070374.


In this opinion piece, Tim Fernandez-Hart (Brunel University London / Sundance Multiprocessor Technology) makes the case for posits as an alternative to floating-point operations.

In praise of posits

Arithmetic is at the heart of what every computer does, whether at the edge, in a laptop or in a supercomputer cluster; it is fundamental to their operation and defines their speed, power consumption and precision. Currently, floating point is the most common number representation, particularly for machine learning and scientific computing. But increasingly, the effectiveness of floating point is being questioned and new number formats are being investigated.

The IEEE 754 floating-point standard was first ratified in 1985 and represents numbers in scientific notation (e.g. 1.54 × 10¹¹). By allowing the decimal point to float, these numbers have a huge dynamic range (64-bit floats can represent ~5 × 10⁻³²⁴ to ~2 × 10³⁰⁸). They have a sign bit, exponent bits and fraction bits (Figure 1). If we add extra exponent bits, we widen the dynamic range, while if we add fraction bits we improve precision. The standard defines several formats, of which the 16-bit, 32-bit and 64-bit versions are the most common.

Figure 1. A representation of a 32-bit floating-point number, S = sign bit, E = exponent bit, F = fraction bit
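The field layout in Figure 1 can be pulled apart directly. A minimal Python sketch (an illustration added here, not part of the original article) splits a 32-bit float into its three fields:

```python
import struct

def float32_fields(x):
    # Split a 32-bit float into its sign, exponent and fraction fields
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF      # 23 bits
    return sign, exponent, fraction

# 1.0 is stored as sign 0, biased exponent 127, fraction 0
print(float32_fields(1.0))  # → (0, 127, 0)
```

Dropping to 16 or 64 bits only changes the field widths (5/10 and 11/52 exponent/fraction bits respectively); the structure is the same.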

So what’s the problem? Well, over a quadrillion bit patterns (in the 64-bit 754 format) are reserved for Not-a-Number (NaN). This is wasteful, as these bit patterns could be used to represent values. If a number is too large to be represented, it gets rounded up to infinity, and there are many other quirks which hardware engineers must deal with, requiring extra silicon, time and energy to process.
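The NaN count follows directly from the field sizes: any pattern with an all-ones exponent and a non-zero fraction is a NaN, for either sign. A quick sanity check:

```python
# 64-bit IEEE 754: 11 exponent bits, 52 fraction bits.
# NaN = exponent all ones, fraction non-zero, either sign bit.
nan_patterns = 2 * (2**52 - 1)
print(nan_patterns)            # just over nine quadrillion patterns
print(nan_patterns > 10**15)   # → True: "over a quadrillion"
```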

In 2017, John Gustafson proposed posits, a rethink of the floating-point format and the third iteration of his UNUM format. It takes the idea of floating point and improves on it. First, it reduces the NaN representations to a single bit pattern. Second, it does away with oddities like negative zero. Third, it has the remarkable property that increasing the number of bits simultaneously increases the dynamic range and the precision. Figure 2 outlines their structure.

Figure 2. A representation of a 32-bit posit, S = sign bit, R = regime bit, E = exponent bit, F = fraction bit

In his original paper, Gustafson shows that for many mathematical operations, posits are more accurate than floats with the same number of bits. They pull off this trick by introducing the regime bits, which act like a super-exponent. These are not fixed in length and grow or shrink to scale the exponent, which, in turn, scales the fraction, similarly to the exponent in floating point. This gives posits tapered accuracy and means they do not overflow or underflow. As the regime bits grow, they crowd out the fraction bits and so, just like with floats, as the value gets large, precision reduces. But, unlike floats, if the values are scaled close to zero, the regime bits are shorter, increasing the number of fraction bits and therefore the precision (Figure 3).

Figure 3. The value of pi represented as a. a 16-bit floating-point number; b. a 16-bit posit. The value encoded by the posit is 3.1416015625, while the one encoded by the float is 3.140625. The posit is accurate to three decimal places, while the float is accurate only to two
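To make the variable-length regime concrete, here is a toy decoder for Gustafson's original scheme. This is an illustrative sketch, not a reference implementation; it assumes a 16-bit posit with one exponent bit (es = 1), the configuration that reproduces the pi example in Figure 3:

```python
def decode_posit(bits, n=16, es=1):
    # Toy decoder for an n-bit posit with es exponent bits.
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float('nan')  # NaR: the single Not-a-Real bit pattern
    sign = (bits >> (n - 1)) & 1
    if sign:  # negative posits are the two's complement of their magnitude
        bits = (-bits) & ((1 << n) - 1)
    # Regime: a run of identical bits after the sign, ended by the opposite bit
    first = (bits >> (n - 2)) & 1
    run, i = 0, n - 2
    while i >= 0 and ((bits >> i) & 1) == first:
        run, i = run + 1, i - 1
    k = run - 1 if first else -run
    i -= 1  # skip the terminating bit, if one was present
    # Exponent: up to es bits; any that don't fit are taken as zero
    e_bits = max(0, min(es, i + 1))
    exp = ((bits >> (i + 1 - e_bits)) & ((1 << e_bits) - 1)) << (es - e_bits) if e_bits else 0
    i -= e_bits
    # Fraction: whatever bits remain, with an implied leading 1
    frac = 1 + (bits & ((1 << (i + 1)) - 1)) / (1 << (i + 1)) if i >= 0 else 1.0
    useed = 2 ** (2 ** es)  # regime scale factor: 4 when es = 1
    value = useed ** k * 2 ** exp * frac
    return -value if sign else value

# The 16-bit posit 0x5922 encodes pi: a short regime leaves 12 fraction bits
print(decode_posit(0x5922))  # → 3.1416015625, as in Figure 3
```

Because the regime for values near 1 is only two bits long, twelve bits are left for the fraction; a 16-bit float spends a fixed five bits on its exponent and keeps only ten.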

Reducing the number of bits while maintaining accuracy allows more complex computation to take place at the edge. It also speeds up those computations, which in turn reduces energy consumption. Posits are being investigated by researchers in climate physics who want to reduce the run time of their simulations, which run on supercomputers that cost millions in electricity, with a carbon footprint to match. It is very early days, but Google, IBM and Facebook have all shown an interest, meaning posits may be finding their way to a supercomputer or edge device near you soon.

Technology opinion

Building on the success of previous years, the 2023 edition of the DATE (Design, Automation and Test in Europe) conference will once again include the Young People Programme, and HiPEAC is once again participating. Young People Programme Coordinator Anton Klotz (Cadence) tells us more.

Make a DATE with the Young People Programme

The largest electronic design automation (EDA) conference in Europe, DATE will be held on 17–19 April 2023 in Antwerp, Belgium. The Young People Programme (YPP) at the DATE conference is an initiative to support PhD and master’s students in their career development. Several initiatives are planned to support that goal.

Sponsorship of attendance

To enable current master’s and PhD students to attend such a high-profile conference, sponsoring companies and the IEEE Council on Electronic Design Automation (CEDA) will fund full-conference registration for Young People Programme participants. In addition to the numerous YPP events, participants will have the opportunity to attend keynotes, focus sessions, tutorials and networking events.

Careers Fair - Industry

Participating companies will post open positions on the HiPEAC Jobs portal and students will be able to upload their CVs. At the conference, the companies will introduce themselves and participate in a speed-dating recruitment event, where recruiters and students can have a short first interview, which will hopefully lead to a follow-up. There will be a keynote by Professor Dragomir Milojevic who will talk about the 3D integration of integrated circuits and job opportunities resulting from that technology. The event will also feature a panel of industry experts, who will discuss different career paths in microelectronics and adjacent industries.

Student Teams Fair

Another initiative of the YPP is the Student Teams Fair, which brings together university student teams with EDA and microelectronic companies. Student teams will have the opportunity to present their activities, success stories and challenges, while companies can provide support (such as free tool licences, personalized webinars and financial support) for future activities. There will also be a workshop for system design and analysis.

PhD Forum

The DATE PhD Forum is a great opportunity for PhD students to present their work to a broad audience in the system design

and design automation community, and to establish contacts for entering the job market. For their part, representatives from industry and academia get an insight into state-of-the-art research in the system design and design automation space.

Careers Fair - Academia

It’s not just industry representatives who are looking for top graduates. Academic institutions also need high-potential candidates to support their research in government-funded projects. Programme leaders will promote their research projects and open positions to YPP attendees, so those who want to stay on the academic research path get the chance to learn about these opportunities. There will also be a panel discussing academic career paths in different countries.

University Fair

The DATE University Fair provides a platform to disseminate government-funded projects. This is also of interest to industry, as the outcomes of fundamental research can be potentially applied to commercial products. As there are often follow-up projects, which again need a new generation of researchers, this closes the loop with the Academic Careers Fair.



DATE Conference Young People Programme date-conference.com/young-people-programme

For further details about individual events, please contact:
Careers Fair – Industry and Student Teams Fair: young-people-program@date-conference.com
PhD Forum: phd-forum@date-conference.com
Career Fairs – Academic and University Fair: university-fair@date-conference.com
HiPEAC Jobs portal: hipeac.net/jobs

Check out the videos from the 2022 DATE YPP on HiPEAC TV: Careers Fair: bit.ly/DATE22_YPP_Careers_Fair_video Careers Panel: bit.ly/DATE22_YPP_Careers_Panel_video

HiPEAC futures

Three-minute thesis

Featured research: Profiling tools for data locality

NAME: Muhammad Aditya Sasongko




THESIS TITLE: Precise Event Sampling: In-depth Analysis and Sampling-based Profiling Tools for Data Locality

Precise event sampling is a low-overhead profiling feature of current commodity central processing units (CPUs) that allows hardware events to be sampled and identifies the instructions that trigger the sampled events. While a number of profiling tools have been developed using this feature, none of them detects inter-thread data movement or measures data locality in multithreaded applications, which have become increasingly important due to the ubiquity of multicore architectures. What is more, only a few works have analysed this feature in terms of accuracy, overhead, stability and functionality, and most focus exclusively on Intel architectures.

In this thesis, we present three major contributions. First, we performed the most comprehensive and in-depth qualitative and quantitative analyses to date of processor event-based sampling (PEBS) and instruction-based sampling (IBS), the precise event sampling facilities of Intel and AMD respectively. Next, we showed the potential for imaginative use of precise event sampling in developing low-overhead yet accurate profiling tools for multicore systems. Lastly, we designed two diagnostic tools with a particular focus on data movement, as it constitutes the main source of inefficiencies.

The first tool, ComDetective, is a communication-matrix generation tool that leverages performance monitoring units (PMUs) and debug registers to detect inter-thread data movement on a sampling basis. It avoids the drawbacks of prior work by being more accurate and introducing low time and memory overheads (1.30× and 1.27×, respectively). Using ComDetective, we generated insightful communication matrices from several microbenchmarks, the PARSEC benchmark suite and some CORAL applications, and compared the produced matrices against those of their MPI counterparts. We also identified communication bottlenecks in a number of codes and achieved up to 13% speedup.
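As a rough illustration of what a communication matrix captures, the sketch below counts, for a full access trace, how often each thread touches a cache line last written by another thread. This is a toy trace-based model added for this article, not ComDetective's sampling-based PMU mechanism, and all names in it are made up:

```python
def communication_matrix(events, num_threads, line_bytes=64):
    # events: (thread_id, address, is_write) tuples in program order.
    # Count a communication when a thread touches a cache line whose
    # last writer was a different thread (true and false sharing alike).
    matrix = [[0] * num_threads for _ in range(num_threads)]
    last_writer = {}
    for tid, addr, is_write in events:
        line = addr // line_bytes
        writer = last_writer.get(line)
        if writer is not None and writer != tid:
            matrix[tid][writer] += 1
        if is_write:
            last_writer[line] = tid
    return matrix

# Thread 0 writes a line; thread 1 reads then writes it; thread 0 reads it back
trace = [(0, 0, True), (1, 8, False), (1, 8, True), (0, 0, False)]
print(communication_matrix(trace, 2))  # → [[0, 1], [2, 0]]
```

Note that addresses 0 and 8 fall in the same 64-byte line, so the second pair of accesses counts as communication even though the bytes differ: exactly the false-sharing case a matrix like this exposes.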

Our second tool, ReuseTracker, measures reuse distance – a widely used data-locality metric – in multithreaded applications, also taking cache-coherence effects into account, with much lower overheads than existing tools while retaining high accuracy. We demonstrated in two use cases how ReuseTracker can guide code refactoring by detecting spatial reuses in shared caches that are in fact false sharing, and how it can predict whether certain applications can benefit from adjacent-cache-line prefetch optimization.
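Reuse distance itself is simple to state: for each access, it is the number of distinct data items touched since the previous access to the same item. A naive single-threaded sketch (illustrative only; ReuseTracker's contribution is measuring this at low overhead, across threads and with coherence effects):

```python
def reuse_distances(trace):
    # For each access, count the distinct addresses touched since the
    # previous access to the same address (inf on first use = cold miss).
    last_seen, distances = {}, []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            distances.append(len(set(trace[last_seen[addr] + 1:i])))
        else:
            distances.append(float('inf'))
        last_seen[addr] = i
    return distances

print(reuse_distances(['a', 'b', 'c', 'a', 'a']))  # → [inf, inf, inf, 2, 0]
```

A fully associative cache of size S with LRU replacement hits exactly when the reuse distance is below S, which is why the metric is so useful for locality tuning.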

To analyse the key differences between Intel’s PEBS and AMD’s IBS, we first developed a series of carefully designed microbenchmarks, which we used for the analysis. We found that Intel PEBS samples hardware events more accurately and with higher stability in terms of the number of samples it captures, while AMD IBS records a richer set of information at each sample. We also discovered that both PEBS and IBS are afflicted with bias when sampling the same event across different instructions in a code.

Aditya’s PhD supervisor, Assoc. Prof. Didem Unat, commented: ‘We expect that the analysis, algorithms, and the tools devised in Aditya’s thesis will benefit hardware architects in designing new precise event-sampling features and performance engineers in performance tuning of their software, while also paving the way for a new generation of low-overhead profiling tools.’

In the latest in our series on HiPEAC PhD students, Aditya Sasongko describes his thesis on profiling tools for data locality.
Thanks to all our sponsors for making #HiPEAC23 a success! Join the community: @hipeac | hipeac.net/linkedin | hipeac.net. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 871174. Sponsors correct at time of going to print. For the full list, see hipeac.net/2023/toulouse