
Challenges facing HPC and the associated R&D priorities: a roadmap for HPC research in Europe

Authors: Mark Sawyer, Business Development and Project Manager, EPCC; Mark Parsons, Executive Director, EPCC, Associate Dean for e-Research, University of Edinburgh


PlanetHPC is supported under the Objective “Computing Systems” of Challenge 3 “Components and Systems” of the ICT Programme of the European Commission


Foreword

In November 2011, the PlanetHPC project issued the report “A Strategy for Research and Innovation through High Performance Computing” [1], which made the case for investment in High Performance Computing (HPC) at the European level, and suggested a strategy for HPC research, development and innovation. The case for investment in HPC is founded on three main premises. First is the proven track-record of HPC in delivering benefits on both an economic and societal level. Second is the increased investment in HPC being made by Europe’s global competitors and by many emerging economies; in the coming decades, to out-compute will be to out-compete. Third is the recognition that there are fundamental technological challenges ahead that threaten the progress of ICT in general and HPC in particular. Since these technologies are key enablers that underpin much of our economic and social wellbeing, it is vital that we identify, analyse and overcome these barriers.

This document mainly addresses the barriers that need to be overcome to fully exploit HPC. These challenges are both technology and business related. The roadmap attempts to analyse what is being done now to overcome them, and to suggest future actions and objectives. The roadmap should be viewed as complementary to the report from November 2011, and much of the content follows discussions with the same team of contributors. The document will be periodically updated, based on consultations with the European community of users and experts in HPC.


[1] http://cordis.europa.eu/fp7/ict/computing/documents/planethpc-strategy.pdf



Contents

Foreword
Contents
Executive Summary
Opportunities and Impact
  HPC as a Key Enabling Technology
  Societal and Economic Benefits
Challenges
  Challenge 1: Mastering Parallelism and Heterogeneity
    Processor Technologies
    Programming Models
    Modernising Applications
  Challenge 2: Reaching New Markets
    Real-time HPC
    Robust HPC
    Secure HPC
  Challenge 3: Data Deluge
    Data-intensive Computing
    Data Transfer: Remote and Mobile HPC
  Challenge 4: Developing Skills
    Education, Training and Outreach
    Tools and Productivity
Towards 2020: Key Recommendations
  Industry and Community Viewpoint
  Analysis
  Key Recommendations
  Implementation
Conclusion
References



Executive Summary

High Performance Computing (in common with the computing domain in general) is at a crossroads; technological challenges threaten to disrupt three decades of continuous exponential growth in the computational power of HPC systems. Europe must act to counter this threat. HPC is a proven technology for delivering economic and societal benefits, and many developed and emerging economies outside the European Union are investing heavily in it. Many countries have recognised that to out-compute is to out-compete.

The European Union is well placed to retain a world-leading position in HPC. It has expertise in embedded systems, from which it is anticipated that many technological breakthroughs that will affect HPC will emerge. It has world-class HPC facilities and support services for academia and industry provided by initiatives such as the PRACE and DEISA projects, together with national HPC initiatives. It has a proven track record of technology transfer that has led to successful communities of HPC software vendors and established user bases.

The PlanetHPC project has carried out a programme of engagement with industry and academic stakeholders to assess the areas of research that should be prioritised for Horizon 2020 and beyond; these are presented in this document, together with suggestions for future research objectives and goals. We see challenges in four key areas:

• Mastering massive parallelism and heterogeneous systems
• Reaching new markets
• Dealing with massive data
• Developing skills

Mastering massive parallelism and heterogeneous systems

This challenge is driven by technology, and arises because of the need for the microprocessor industry to produce more energy efficient components. Multicore systems are now ubiquitous, and the number of processing cores per microprocessor is firmly on an upward trend. Components such as GPUs and FPGAs are becoming commonplace in HPC systems. We can be sure that from now on HPC systems will feature ever increasing amounts of heterogeneity and parallelism. With the increase in parallelism comes the further challenge of moving data, which takes time and requires energy. Successful HPC applications must be designed not only to exploit parallelism on a massive scale, but also to minimise data movement. Modernising applications, particularly those used by industry, to exploit fully the characteristics of future HPC systems must be a priority for any R&D and Innovation strategy. The disruption that this challenge will cause both in the academic and business use of HPC will be significant and will require innovation at all levels: from components, processors and systems to programming models and high-level design tools and application software.



Reaching new markets

Industry stands to benefit enormously from wider application of HPC, but although HPC has been exploited in many industrial application areas, there are still many others where it has great potential but as yet little penetration. HPC has historically succeeded in scientific, engineering and manufacturing applications where high performance by itself is sufficient to provide a solution. There are many application areas however where high performance alone is not sufficient. Non-functional aspects of computing such as reliability, robustness and security are needed in application areas such as real-time financial trading, intelligent transport and the management of utilities. The HPC community must adapt to meet these types of requirements if new markets and applications are to benefit from it.

There are clear incentives for both the HPC supply and application demand sides to meet these requirements. For the suppliers, untapped markets will provide a revenue stream for hardware and software providers. On the demand side there are the benefits that HPC has been proven to deliver in other sectors.

The HPC community must also get better at selling HPC as an enabling technology that can achieve results in industrial applications, not as something that should be used because it is intrinsically clever or advanced. HPC must also be seen to deliver value beyond academic research. PlanetHPC encountered a view expressed by many industrialists that HPC was not a technology for them. Many failed to see its potential and so lacked interest. HPC must be promoted intelligently so as to raise awareness of opportunity while avoiding the pitfalls of hype.

Dealing with massive data

There is a relentless trend of massive data generation across all human activities: business, government, personal and academic. It is estimated that the amount of data produced each year is greater than the sum of all that previously created. The growth of data has outpaced the development of tools to deal with it. This has left us struggling, not only with the sheer volume of data being generated, but also with the ever-increasing complexity of data interactions.

This is a significant issue for HPC in terms of applications and systems, and it impacts in a number of ways. Computing systems have typically been designed for compute-intensive workloads, but different approaches to system design are needed for data-intensive workloads. New tools and methods are needed to manage, curate and extract value from data. This is important across the ICT domain, but is of particular significance for HPC applications which typically generate large data sets.

Developing skills

PlanetHPC has identified a shortage in HPC skills, with only a very small fraction of the ICT workforce having HPC and parallel computing expertise or experience. There also tends to be a reliance on people who have combined expertise in both HPC and specific applications. For example, an engineer working on a computational fluid problem often needs detailed knowledge of how the software they are using works, and how its performance can be optimised on particular machines or for certain data sets. This is not an optimal use of skills, and places too much reliance on key individuals.

Key recommendations

Based on discussions with leading industry and academic HPC experts, the PlanetHPC project makes a number of key recommendations related to these four challenges:

Key recommendation 1: Carry out research into new simulation and numerical methods that scale to millions of processors and exploit data locality. Encourage co-design with processor development.

Key recommendation 2: Develop programming models that can be efficiently implemented on massively parallel and heterogeneous systems, including emerging low-power computing platforms.

Key recommendation 3: Investigate the migration of existing codes to accelerator and multicore technologies, and develop re-usable libraries and software modules optimised for these architectures.

Key recommendation 4: Demonstrate to innovative software companies the need to modernise applications and to engage in research programmes. Ensure that collaboration models protect the intellectual property of software vendors.

Key recommendation 5: Promote the image of HPC as a key enabling technology, particularly aiming at SMEs; dispel the image that HPC is only for technically-knowledgeable large companies.

Key recommendation 6: Understand non-functional requirements of new application areas.

Key recommendation 7: Make it easier for small companies to evaluate HPC technology; establish a network of HPC pilot projects.

Key recommendation 8: Investigate system architectures suitable for data-intensive computing.

Key recommendation 9: Develop tools for managing data on a massive scale and extracting value from it; the HPC community should engage with those involved in big-data research.

Key recommendation 10: Make training and education an essential part of infrastructure and research initiatives.

Key recommendation 11: Adopt an inter-disciplinary approach in tackling research challenges; involve application specialists, mathematicians and software engineers in R&D&I activities.



Opportunities and Impact

HPC as a Key Enabling Technology

In the 21st Century, High Performance Computing (HPC) is without doubt a key enabling technology for technologically advanced nations. Many countries world-wide are investing in HPC and some, most notably the USA, China and Japan, are investing vast sums of public money in related infrastructure. In a statement from the European Commission in February 2012, Neelie Kroes, European Commission Vice-President responsible for the Digital Agenda, said:

“HPC is a crucial enabler for European industry and for more jobs in Europe. It’s investments like HPC that deliver innovations improving daily life. We’ve got to invest smartly in this field because we cannot afford to leave it to our competitors.”

HPC is a crucial enabling technology for Europe: we must invest to keep pace with our competitors

HPC has had a major impact on industry and commerce and is an established and indispensable tool in many industrial and societal sectors. New applications are constantly emerging which are able to exploit HPC. The benefits made possible by HPC go beyond a positive return on investment. Improvements in healthcare, the development of efficient transport systems, the quest for renewable and clean energy sources, and support to decision making through faster-than-real-time simulations on real data in emergency response are examples of how HPC could transform our lives. One of the most prevalent trends in ICT over the last decade has been the convergence of technologies. Previously it was possible to distinguish products, services and applications as being particular to one ICT niche, such as HPC, mobile, embedded or database. This is no longer the case. We have moved to a world in which a hand-held device can have the same performance as a powerful supercomputer of 20 years ago, in which connectivity of devices is the norm rather than the exception, and in which applications are expected to interact seamlessly rather than to stand alone.

Converging technologies offer challenges and opportunities

This opens great opportunities for HPC. The increasing interoperation of applications will create a demand for HPC in areas where it has previously had little impact, and use of HPC will add value to existing products and services. Research and development of HPC should take place in the context of this convergence, and must take account of the non-functional requirements, such as reliability and security, that will be required to expand the range of applications that can use HPC.

Societal and Economic Benefits

There are numerous examples of the economic and societal benefits that have been gained through HPC. In its communication to the European Parliament “High-Performance Computing: Europe’s place in a Global Race” [1], the Commission notes many such benefits:


97% of the industrial companies that employ HPC consider it indispensable for their ability to innovate, compete, and survive. HPC has enabled automakers to reduce the time for developing new vehicle platforms from an average of 60 to 24 months, while greatly improving crashworthiness, environmental friendliness, and passenger comfort. Some of these firms have cited savings of EUR 40 billion from HPC usage. HPC lies behind the weather forecasts we rely on to plan our daily activities and to deal with severe weather conditions that can devastate lives and property. Hospitals in Germany use HPC to predict which expectant mothers will require surgery for Caesarean births, with the goal of avoiding traditional, riskier last-minute decisions during childbirth. Thus, HPC is vital for the EU’s industrial capabilities as well as for its citizens.

At a macroeconomic level, it has been shown that returns on investment in HPC are extremely high and that the companies and countries that invest the most in HPC lead in science and economic success. Furthermore, advances in the area of HPC such as new computing technologies, software, energy efficiency, storage applications, etc. feed into the broader ICT industry and the consumer mass market, becoming available in households within five years of their introduction in high-end HPC. Conversely, advanced computing technologies developed for the consumer domain (e.g. energy efficient chips, graphic cards) are increasingly used in HPC.

There are opportunities for much greater benefits across a wider spectrum of industrial sectors. For these benefits to be realised, HPC must continue its recent trend of providing ever more computational performance while at the same time introducing features such as reliability, security and real-time capabilities. There must be a raising of awareness in industry that HPC is a technology that is not restricted to large companies with big budgets, or to academic researchers working in Grand Challenge science. HPC is a key enabling technology that can play a vital part in ensuring prosperity for SMEs and large companies alike, across almost all industry sectors.
Some of the many examples of the use of HPC are described in the next sections, together with some visionary scenarios that could be enabled by HPC in the future.

Healthcare

The development of new medicines and treatments relies on an understanding of how complex biological systems interact. Heart modelling is one example. The structure of a patient’s heart can be obtained through MRI scans; this data is then placed on a computer and used to construct a model heart. Researchers can study the electrical activity that controls heartbeats in this model heart, allowing problems such as arrhythmia to be identified and possible surgical interventions to be tested on the model before being used on the patient.

HPC Vision: Healthcare

HPC offers the prospect of understanding the interaction of systems of organs which at present is only partially feasible. So-called ‘in silico’ testing of drugs and therapies will be essential if major disorders like cancer, neurological and cardiovascular diseases are to be tackled. The next generation of HPC offers the promise of personalised medicine, where a treatment will be tailored to a specific individual.

Intelligent transport

HPC has been used in the design and operation of transport systems. UK company Quadstone Paramics produces HPC software that simulates traffic at the level of individual vehicles. The software has been used for diverse applications, from modelling vehicle emissions in the Antwerp district in Belgium to evaluating evacuation plans for hurricanes in the USA.

HPC Vision: Intelligent transport

Road networks across Europe are at, or near, full capacity, and intelligent traffic management is needed to avoid gridlock without the cost of building new roads. HPC systems must be developed that can deal with data from millions of sensors and vehicles to control and manage road networks on large scales. The impact of this would be enormous: better fuel efficiency, lower emissions, safer roads and more predictable journey times.

Automotive

The automotive industry is a long-time user of HPC systems to design cheaper, more efficient and safer vehicles. HPC has been used to model the mechanical and aerodynamic properties of the design, together with the combustion processes in the engine. The greater the fidelity of the simulations that HPC allows, the better the designs. Testing the safety of vehicles and how well they protect their occupants is an essential part of the design process. Carrying out crash tests using real vehicles is expensive and time-consuming. Simulation using HPC has allowed manufacturers to reduce the number of tests needed by up to 90%. This reduces cost and means that safer models can be brought to market more rapidly.

HPC Vision: Automotive

Although HPC is a well-established tool in the automotive industry, there are many more benefits that could be gained by further developments of the technology. The need for alternative fuels for cars has created new opportunities for HPC. How to make efficient motors and energy storage devices will increasingly become the focus for research.

In-vehicle systems for safety and navigation which complement intelligent traffic management systems will rely on robust and ubiquitous access to HPC.

Energy

HPC has been used in the exploration and production of oil and gas. Processing seismic data to find new sources of fossil fuels, and planning the production of oil once the fields are operational, would be impossible without HPC. Finding sources of energy which are sustainable and not damaging to the environment is an urgent challenge. Only HPC can provide engineering simulations of the structural and aerodynamic properties of systems designed to harness wind and wave power. For example, an essential part of designing an offshore wind turbine is an assessment of the combined wind loading plus wave loading on the structure. Aberdeen-based company Prospect FS evaluated the benefits of using parallel computing to carry out numerical simulation of offshore wind turbines. The company foresees a factor of 20 return on the investment, with additional income of €1.5M over three years.

HPC Vision: Energy

Research programmes are underway in Europe with the long term objective of making nuclear fusion a feasible option as a future energy source. This makes heavy demands on computing power that will require the next generation of computers to achieve exascale performance (10^18 operations per second) and beyond.

The electricity generation and distribution infrastructure of the future will require intelligent management. This will only be feasible with highly robust HPC that is able to deal with many sources of generation (down to the scale of domestic wind and solar generators, together with smart metering). Oil and gas remain important sources of energy. The continued search for fossil fuels requires ever more HPC capacity as the fidelity of the processing becomes more demanding and the economics of extraction even more challenging.

Manufacturing

Manufacturing is still the driving force of the European economy, contributing over €6,500 billion in GDP and providing more than 30 million jobs. It covers more than 25 different industrial sectors, largely dominated by SMEs, and generates annually €1,500 billion of economic added value. HPC is used in manufacturing mainly for design of products, and has been successfully used in specific sectors such as automotive and aerospace.

HPC Vision: Manufacturing

The vision of Factories of the Future is to have intelligent ICT embedded in every aspect of manufacturing from design and manufacturing processes to maintenance of products after sale.

Manufacturers will be able to bring products to market much faster by incorporating HPC in the design and production cycle. It will be possible to customise products and make them safer, giving manufacturers a competitive advantage. This is of particular relevance to Europe, with its focus on high quality, high value manufacturing which can benefit most from the application of HPC.

Massive amounts of operational data from machinery and production lines will need to be processed to keep factories running efficiently and with low maintenance costs.

Businesses will be able to respond to customer needs by analysing data on their behaviour in order to provide a better service and retain market share.

Challenges

Information and Communication Technologies (ICT), which have propelled Western economies for decades, will face a number of challenges over the coming years. These challenges have been analysed in different ICT contexts by projects such as the High Performance and Embedded Architecture and Compilation Network of Excellence (HiPEAC) and the European Exascale Software Initiative (EESI). HiPEAC’s focus is towards system-on-chip based applications and embedded systems, while EESI is focused on the high end of HPC. PlanetHPC has attempted to cover the large territory between these two ends of the HPC spectrum by consulting with industrial users of HPC, ISVs and infrastructure providers.

Technology disruption will affect HPC as much as other aspects of ICT

What emerges from these analyses is that the challenges that are foreseen at the various levels of HPC have common roots, although the effects may be different across the ICT domain. The major challenges facing HPC are:

• Mastering massive parallelism and heterogeneous systems
• Reaching new markets
• Dealing with massive data
• Developing skills

Mastering massive parallelism and heterogeneous systems

This challenge is driven by technology, and arises because of the need for the microprocessor industry to produce more energy efficient components. Multicore systems are now ubiquitous, and the number of processing cores per microprocessor is firmly on an upward trend. Components such as GPUs and FPGAs are becoming commonplace in HPC systems. We can be sure that from now on HPC systems will feature ever increasing amounts of heterogeneity and parallelism. With the increase in parallelism comes the further challenge of moving data, which takes time and requires energy. Successful HPC applications must be designed not only to exploit parallelism on a massive scale, but also to minimise data movement. Modernising applications, particularly those used by industry, to exploit fully the characteristics of future HPC systems must be a priority for any R&D and Innovation strategy. The disruption that this challenge will cause both in the academic and business use of HPC will be significant and will require innovation at all levels: from components, processors and systems to programming models and high-level design tools and application software.

Reaching new markets

Industry stands to benefit enormously from wider application of HPC, but although HPC has been exploited in many industrial application areas, there are still many others where it has great potential but as yet little penetration. HPC has historically succeeded in scientific, engineering and manufacturing applications where high performance by itself is sufficient to provide a solution. There are many application areas however where high performance alone is not sufficient. Non-functional aspects of computing such as reliability, robustness and security are needed in application areas such as real-time financial trading, intelligent transport and the management of utilities. The HPC community must adapt to meet these types of requirements if new markets and applications are to benefit from it.

There are clear incentives for both the HPC supply and application demand sides to meet these requirements. For the suppliers, untapped markets will provide a revenue stream for hardware and software providers. On the demand side there are the benefits that HPC has been proven to deliver in other sectors.

The HPC community must also get better at selling HPC as an enabling technology that can achieve results in industrial applications, not as something that should be used because it is intrinsically clever or advanced. HPC must also be seen to deliver value beyond academic research. PlanetHPC encountered a view expressed by many industrialists that HPC was not a technology for them. Many failed to see its potential and so lacked interest. HPC must be promoted intelligently so as to raise awareness of opportunity while avoiding the pitfalls of hype.

Dealing with massive data

There is a relentless trend of massive data generation across all human activities: business, government, personal and academic. It is estimated that the amount of data produced each year is greater than the sum of all that previously created. The growth of data has outpaced the development of tools to deal with it. This has left us struggling, not only with the sheer volume of data being generated, but also with the ever-increasing complexity of data interactions.

This is a significant issue for HPC in terms of applications and systems, and it impacts in a number of ways. Computing systems have typically been designed for compute-intensive workloads, but different approaches to system design are needed for data-intensive workloads. New tools and methods are needed to manage, curate and extract value from data. This is important across the ICT domain, but is of particular significance for HPC applications which typically generate large data sets.

Developing skills

PlanetHPC has identified a shortage in HPC skills, with only a very small fraction of the ICT workforce having HPC and parallel computing expertise or experience. There also tends to be a reliance on people who have combined expertise in both HPC and specific applications. For example, an engineer working on a computational fluid problem often needs detailed knowledge of how the software they are using works, and how its performance can be optimised on particular machines or for certain data sets. This is not an optimal use of skills, and places too much reliance on key individuals.


Challenge 1: Mastering Parallelism and Heterogeneity

Energy efficiency has been identified across the computing systems spectrum as being a massive challenge for the future. It will no longer be feasible to pursue performance at any price because the energy consumption will be too high. Instead, the efficiency of systems in terms of processing operations per unit of energy (FLOPs/Joule) will be the measure that must be maximised. The result of this will be a change in emphasis for processor design away from high performance on single processors to energy-efficient computing in parallel using heterogeneous devices. This change is already having an impact on HPC, with the emergence of processing based on new architectures.

Future HPC systems will be massively parallel and heterogeneous

Historically it has been possible to achieve better performance year-on-year by using the double win of faster processors in greater numbers. However the days of ever increasing processor speeds are over. The impact on industry of this challenge must not be underestimated. HPC applications already form a key part of many industrial processes, representing major investments over many years. However most commercial HPC software does not scale well enough to be able to exploit further parallelism, and has not been designed to exploit heterogeneous systems. As the limits of scalability are reached, industry is threatened by a performance dead-end in which application software cannot exploit next generation computers. This scenario must be avoided. Mastering massive parallelism and heterogeneity will involve exploiting new processor technologies, some of which are emerging from the embedded systems domain. It will require the development of new programming models and languages, and the modernisation of applications.
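The performance dead-end described above can be illustrated with Amdahl's law. The report does not name this model; it is used here only as a standard, simplified way to show why software that scales poorly cannot benefit from more cores. The sketch below is a minimal illustration in Python, and the 1% serial fraction is a hypothetical value chosen for the example, not a figure from the report.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Upper bound on speedup for a fixed-size problem under Amdahl's law.

    serial_fraction: share of the runtime that cannot be parallelised (0..1).
    processors: number of processing cores applied to the parallel part.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Hypothetical application: 1% of the runtime is inherently serial.
    serial = 0.01
    for cores in (16, 256, 4096, 65536, 1_000_000):
        print(f"{cores:>9} cores -> speedup <= {amdahl_speedup(serial, cores):.1f}")
```

Under this simplified model, a code with a 1% serial fraction can never run more than 100 times faster, however many cores are added. Reducing the serial fraction, which is what application modernisation aims at, raises that ceiling.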

Processor Technologies

Background

Power consumption will be a major challenge for the future of computing systems and HPC systems in particular. At the high end we see that the power consumption of a large HPC system capable of petaflop performance is in the order of 1-2 MW. Therefore a system capable of exaflop performance would have an unfeasibly large power requirement (in terms of both cost and logistics) if based on today’s technology. At the system-on-chip level the motivation is for lower power to avoid overheating and to extend battery life. The power budget for an exascale system is suggested at 20 MW by the US Department of Energy [2], which implies that the number of floating point operations per unit of energy must increase by two orders of magnitude compared to today. Power in HPC systems is consumed by functional units (CPUs and GPUs), main memory (DRAM and cache), interconnect (on-chip, between chips, between boards, and between racks), secondary storage, cooling and inefficiencies in power distribution. Efficiency improvements are required in all these areas to make exascale computing systems feasible; simply focusing on low-power CPUs, for example, will not be enough.
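The "two orders of magnitude" claim can be checked with a short calculation. The sketch below uses only the figures quoted in the text (1-2 MW for a petaflop system, 20 MW for an exascale system). It assumes that "petaflop" and "exaflop" mean exactly 10^15 and 10^18 floating point operations per second, and the variable names are illustrative.

```python
PETAFLOP = 1e15  # floating point operations per second (assumed exact)
EXAFLOP = 1e18   # floating point operations per second (assumed exact)


def flops_per_joule(flops_per_second: float, power_watts: float) -> float:
    """Energy efficiency: operations per second divided by joules per second."""
    return flops_per_second / power_watts


# Today's petaflop systems at the 1-2 MW range quoted in the text.
today_low = flops_per_joule(PETAFLOP, 2e6)   # 2 MW system: 5e8 FLOPs/Joule
today_high = flops_per_joule(PETAFLOP, 1e6)  # 1 MW system: 1e9 FLOPs/Joule

# Exascale target within the 20 MW budget quoted in the text.
target = flops_per_joule(EXAFLOP, 20e6)      # 5e10 FLOPs/Joule

# Required efficiency improvement relative to today's range.
improvement_min = target / today_high        # 50x
improvement_max = target / today_low         # 100x

# Power needed if today's efficiency were simply scaled up 1000-fold, in MW.
naive_exascale_power_mw = (1000 * 1, 1000 * 2)  # 1000 to 2000 MW

if __name__ == "__main__":
    print(f"today:  {today_low:.1e} to {today_high:.1e} FLOPs/Joule")
    print(f"target: {target:.1e} FLOPs/Joule")
    print(f"improvement needed: {improvement_min:.0f}x to {improvement_max:.0f}x")
    print(f"naive exascale power: {naive_exascale_power_mw} MW")
```

On these assumptions the required improvement is between 50 and 100 times, which matches the roughly two orders of magnitude stated above. An exascale system built by scaling today's technology would draw in the region of 1 to 2 GW.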



Current activities

The energy efficient data centre The EUROCLOUD [9] project aims to scale a platform based on low-energy ARM processors together with novel memory technology to support hundreds of cores in a single server. The intention is to show the way to a viable data centre with a million cores.

Accelerators Graphical Processing Units (GPUs) have been seen for some time as a potential low-power means of achieving high computing power. GPUs are being used as accelerators for numerically-intensive applications from the desktop up to some of the world’s most powerful computers.

Future directions

The EC-funded LPGPU project [3] aims to produce a low-power GPU architecture suitable for running future graphics software; the emphasis in this project is on games and smartphone-based applications. The CARP project [4] aims at improving the programmability of accelerated systems, particularly those using GPUs. Multicore The EC-funded project DEEP (Dynamical Exascale Entry Platform) [5] intends to use Intel’s Many Integrated Core (MIC) [6] technology for applications such as brain simulation, space-weather simulation, climate simulation, computational fluid engineering, high temperature superconductivity and seismic imaging. Intel announced the MIC architecture in 2011, with the Knights Corner processor being the first to be based on the MIC architecture. The MIC concept is to host many low-power processing units (up to 50 initially) in a single processor. It targets highly parallel applications such as oil exploration, scientific research, financial analyses, and climate simulation. Heterogeneous cores and low power systems An alternative approach is exemplified by the big.LITTLE processing concept from ARM [7], which combines powerful processing elements with less powerful but extremely low-energy elements. The processors are aimed at the system-on-chip market where the heterogeneous workload is well suited to such a processor. ARM is the partner in the Mont-Blanc FP7 project [8], which aims to produce exascale performance with a power reduction factor of 15-30. The FLEXTILES project [12] aims to define and develop an energy-efficient and easily programmable heterogeneous many-core platform with selfadaptive capabilities. The architecture is a 3D stacked chip with a many-core layer and a reconfigurable layer. This project is aimed more at embedded applications, but could extend to HPC.

New technologies such as silicon photonics, which uses optical signals to carry out on-chip communications, may increase the speed and reduce the energy requirements of data transfer within integrated circuits. Likewise 3D stacking, in which multiple layers of components are laid one on top of another, offers the prospect of shorter interconnects and more on-processor (and hence lower energy) data transfers.

Objectives and goals

Short and medium term: The HPC community should continue to engage with the communities doing basic research into energy efficiency at the component level (e.g. HiPEAC). HPC and data centre design for higher energy efficiency should be the subject of R&D. This lies mainly within the domain of engineering.

Long term: Research into novel computing such as quantum computing and bio-computing. Analysis of potential architectures based on these technologies and the types of applications for which they could be used.

Who should be involved: This is an area which will require interaction between many stakeholders, including modelling and simulation experts, mathematicians, computational scientists, application designers and users, independent software vendors and HPC system providers.

Reconfigurable computing is being investigated as a possible option for low-energy, high-performance processing. To date, reconfigurable computing using FPGAs has only been successful in a limited number of applications; development costs and issues with data transfer between the CPU and the FPGA have hampered progress in many cases.

Novel computing technologies such as quantum computing and bio-computing, which offer the prospect of high computational performance and very low power, are still largely research topics and it will take some considerable time before usable technologies emerge. The long term benefits for HPC could be enormous however, and hence investment in these high-risk, high-reward research areas should be strengthened. The EC is supporting bio-computing through projects such as MOLOC, and quantum computing through the QUIE2T project, the QAP project, the QUERG project and the QUEVADIS project.

The CRISP project [10] and the FASTER project [11] are both investigating reconfigurable computing, the former focussing on stream processing and the latter on transforming existing designs in high-level languages.

The industrial mass market for processing and memory components will be consumer products, servers and embedded systems, rather than HPC. This means that HPC applications will need to adapt in order to be able to use the components which will evolve primarily to serve these mass markets. However, the underlying issue of energy efficiency is relevant at all levels of computing; at the data-centre and HPC-centre level the issue is about the cost and engineering logistics of delivering power, while at the embedded and consumer end it is about extending battery life and remaining within the operating limits. This is in fact good news for HPC, because it means that there is a driver from the mass market to find a solution to power consumption that will subsequently benefit the niche of HPC. The challenge for HPC will be to successfully exploit the solution that emerges.

Reconfigurable computing

The processor architectures which have emerged in recent years must be mastered

Moore’s Law, which predicts that the number of components per unit area of an integrated circuit doubles every two years, is set to continue for a few more years. However, this will not be accompanied by increases in switching frequency, which peaked around 2006, because it is not possible to reduce the voltage at which the components operate. In addition, there is a trend towards simpler processing elements with less power-consuming logic. The trend is therefore towards greater numbers of simpler processors, not faster and more complex processors.

HPC will need to adapt to use components aimed at the mass market

Programming Models

Background

Programming models are a key component in achieving high productivity and ease of application development. The HPC community has predominantly used the Message Passing Interface (MPI) and OpenMP frameworks as the models for parallel programs. The adoption of these standards was a major breakthrough in productivity; it gave application and run-time system developers stable interfaces, and thus protected investments by guaranteeing portability. These lessons must be remembered. Programming models must be developed which form an interface between the applications on one side, and compilers and run-time systems on the other. These programming models must allow applications to exploit parallelism on a massive scale while taking account of data locality.

Current activities

This area is evolving through programming languages such as CUDA (proprietary) and OpenCL for GPU programming, and through a class of languages termed Partitioned Global Address Space (PGAS) languages, which present a single address space to the program while retaining the notion of data locality. Several FP7 projects are developing programming models: the VELOX project [13] has produced technology aimed at simplifying parallel programming using transactional memory. The PEPPHER [14], 2PARMA [15], ENCORE [16] and APPLE-CORE [17] projects are all involved in the area of programming models for multicore. These projects target embedded systems, although the results could be extended to high-end HPC. The EC-funded TeXT project (Towards ExaFLOP Applications) [18] is developing a hybrid approach based on MPI and SMP superscalar (SMPSs). The SMP superscalar programming environment is aimed at exploiting parallelism within a single multi-core processing node. Hybridising this with MPI allows for data movement between nodes. The objectives of the approach are to establish a migration path for existing applications, and the ability to exploit heterogeneous computing (through SMPSs) in a global model (through MPI).

Future directions

The inevitable consequence of the trends in processor technology and design described earlier in this section is that future systems will be massively parallel. Therefore mastering parallelism at the scale of billions of threads has to be the goal. Careful consideration must also be given to data transfer within systems. Historically developers have striven to reduce data transfer because latency and bandwidth limitations affect performance. It is now the case that the energy consumed in transferring data must be considered in addition to bandwidth and latency issues. The two main features that must be present in successful programming models are therefore support for massive parallelism and the ability to exploit data locality. High levels of abstraction will be needed to support the complexity of future applications. Hybridisation should be investigated whereby two or more models can be combined to exploit heterogeneous systems. Programming models that are data-driven with dynamic scheduling of instructions and adaptive distribution of work should be investigated.

Programming models are needed which boost productivity and can be efficiently implemented on new architectures

Whereas programming models of the past focused on enabling high programmer productivity and application performance, those of the future must also enable high energy efficiency and the identification of high levels of parallelism.

Objectives and goals

Short and medium term: Investigation of newly developed models that support concurrency and data locality, such as PGAS methods, and their suitability to support migration for existing applications. Programming models and tools to support development for new applications on new architectures.

Long term: Tools with a high degree of automatic parallelisation and data placement capability.

Who should be involved: Research and development in this area should involve applications developers who will need to use programming models as a productivity tool, and those who design compilers and run-time systems which realise the programming models on future architectures.
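The idea behind PGAS-style models, a single global index space in which every element nonetheless has a known owner, can be illustrated with a minimal sketch. The code below is not taken from any PGAS language or from the projects cited in this section; the class `BlockDistribution`, the term "place" and the three-point stencil are hypothetical names chosen for illustration. It shows how a block distribution lets a program decide which accesses are local and which would require (energy-consuming) data transfer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockDistribution:
    """Hypothetical block distribution of a global 1D array over 'places'."""
    n_elements: int
    n_places: int

    def block_size(self) -> int:
        # Ceiling division: every place holds at most this many elements.
        return -(-self.n_elements // self.n_places)

    def owner(self, global_index: int) -> int:
        """Return the place that holds the element at global_index."""
        if not 0 <= global_index < self.n_elements:
            raise IndexError(global_index)
        return global_index // self.block_size()

    def local_range(self, place: int) -> range:
        """Return the global indices stored at the given place."""
        start = place * self.block_size()
        stop = min(start + self.block_size(), self.n_elements)
        return range(start, stop)  # empty if the place holds no elements


def stencil_reads(dist: BlockDistribution, place: int) -> list:
    """Global indices read by a three-point stencil over a place's elements."""
    reads = []
    for i in dist.local_range(place):
        for j in (i - 1, i, i + 1):
            if 0 <= j < dist.n_elements:
                reads.append(j)
    return reads


def count_remote_reads(dist: BlockDistribution, place: int) -> int:
    """Count the reads that touch data owned by a different place."""
    return sum(1 for j in stencil_reads(dist, place) if dist.owner(j) != place)


if __name__ == "__main__":
    dist = BlockDistribution(n_elements=10, n_places=4)
    for p in range(dist.n_places):
        print(p, list(dist.local_range(p)), count_remote_reads(dist, p))
```

In this sketch only the reads at block boundaries are remote, which is the property a locality-aware programming model would aim to expose to the compiler and run-time system.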

Modernising Applications

Background

Many of the current methods of modelling and simulation are already reaching scalability limits with today’s hardware. Modern-day supercomputers contain hundreds of thousands of computing elements; the number of applications that scale to this level of parallelism is very small and restricted to research applications. Most commercial software packages reach scaling limits at much lower processor core counts. If scalability cannot be improved for applications, then it will not be possible to use computers with millions of processing elements to speed up single high-fidelity application cases (often termed high capability). Instead use will be restricted to parameter studies (often termed high capacity), where many simulations of lower fidelity are run at the same time. The feedback that PlanetHPC has received from industrial users is that both high capability and high capacity modes of computing will be needed in the future. Thus the scalability challenge must be overcome.

Applications must be modernised to exploit more parallelism



Europe has a thriving community of Independent Software Vendors (ISVs), many of which originated as spin-off companies from academic institutions. The software applications which these companies have produced are key components in the HPC value chain, and the right conditions must be created for the companies to be able to adapt and modernise their codes in response to other technology changes.

Objectives and goals

Short term: Investigate alternative algorithms that can exploit new processing technologies without changing the fundamental modelling techniques. Investigate performance-modelling techniques and develop tools to analyse performance of applications.

Medium term: Provide migration paths not just for individual applications but also for the whole working practice. Develop tools to support the migration process.

Current activities

There are a number of EC-funded initiatives which are developing applications for future computers with capability towards exascale. Much of the emphasis is towards industrial use of HPC, and investigating how applications can be migrated to future systems. The EC-funded CRESTA project [19] is investigating migration pathways to exascale computing for a sample of HPC applications covering engineering, fusion research, weather prediction and life sciences. The project is taking both incremental and disruptive approaches to application models focussed around a process of co-design.

Research into exascale software for science must also have an impact on industrial applications

Long term: Investigate alternative modelling approaches to the current methods that are limited by scalability. Aim for scalability to millions of threads.

Who should be involved: Computer scientists, application specialists, mathematicians, software engineers.

The EC-funded project NextMuse [20] is attempting to initiate a paradigm shift in computational fluid dynamics (CFD) and computational multi-mechanics (CMM) simulation software. The project is focussing on a meshfree method, Smoothed Particle Hydrodynamics (SPH). This is fundamentally different from conventional finite element or finite volume techniques, which are regarded by many as being at the limits of their scalability on today’s systems. In the field of research into nuclear fusion, the FP7 EXASCALEPLASMATURB project [21] is developing novel numerical tools in an attempt to carry out simulation of plasma turbulence on exascale systems of the future. The FP7 PESM project [22] aims to exploit exascale computing in the field of climate modelling and simulation. The project will use a new approach to CFD which will be better suited to future exascale computers, and novel approaches such as probabilistic computing. The FP7 EU-Russia project APOS-EU [23] is investigating massive and heterogeneous parallelism for the application areas of seismic modelling, oil- and gas-reservoir simulation, computational fluid dynamics, fusion energy and molecular dynamics. The project is focussing on GPU-like processors, and is collaborating with the peer project APOS-RU, which is funded by the Ministry of Education and Science of the Russian Federation.


Future directions

Producing efficient applications that can run on highly parallel systems (from millions up to billions of cores) is absolutely essential if future HPC is to be exploited to the full. This is an area where HPC research should make strenuous efforts and invest heavily. There is no point in striving for ever higher system performance without applications that can use that processing capability. The key barrier to overcome is scalability, but it is not the only one. Applications must also be optimised for the underlying processor and memory architecture.


Challenge 2: Reaching New Markets

Industry has successfully exploited HPC in many areas of science and engineering. However, to realise many of the opportunities described earlier in this report, HPC must become easier to use. Future HPC systems, infrastructures and software must fulfil new non-functional requirements such as real-time processing, robust and highly available computing, and security in order to succeed in new industrial application areas.

Current activities

There are two HiPEAC projects in the area of real-time embedded systems:

HPC’s success in science and engineering must be replicated in other sectors

As greater computing power has become available over the last three decades, so the uses to which it has been put have expanded, with the result that systems (hardware and software) have become increasingly complex. If HPC is to realise its full economic and societal potential it must be made more usable.

The MULTIPARTES project [26] is developing technology to support mixed criticality for trusted embedded systems. Potential applications are signalling and communications in railways. At the applications level, the FP7 HiPerDNO project [24] is developing a new generation of electricity distribution network management systems that exploit novel near-real-time HPC solutions.

Future directions

Reaching new markets will involve developing tools to manage these complex systems, addressing non-functional requirements, and providing education and training.

The characteristics of relevant applications need to be understood, with requirements being fed back to the system design level. Many applications are likely to be highly distributed and involve large numbers (possibly millions) of external sensors and actuators. Technologies to handle security of applications, systems and data will be required.

Current HPC use is predominantly based on the concept of an HPC centre which operates in batch mode. This is not the best paradigm for many applications, and creates a barrier to uptake of HPC. Enterprise computing is undergoing a revolution as the cloud computing paradigm has come to the fore. This has raised expectations among industrial end-users, and HPC needs to find a model which delivers the benefits of cloud computing without compromising on performance.

For safety-critical applications (for example in the aerospace and automotive sectors), systems must be developed with provable worst-case execution time (WCET). The development work underway in the embedded systems field should be extended to cover larger systems and applications. High-frequency financial trading is an HPC application which requires millisecond timing capability. Real-time decision support applications form a potentially large market, and technology to support this could be developed.

With cloud computing, the responsibility for reliability, maintenance and provision is pushed to the service provider, with the end-user only needing to worry about the application. HPC needs a similar model where the resources are available on-demand. However, it is unrealistic to assume that this problem will simply be solved by adopting cloud computing; there is a large difference between capacity on demand, which the cloud can deliver, and capability on demand, which HPC applications need.

Objectives and goals

Short term: Better understanding of requirements of potential applications. Outreach and dissemination activities.

Medium term: Technologies to handle dynamically changing load in real-time applications. Mechanisms to migrate applications while maintaining continuous operation.

Real-time and Embedded HPC

Background

There are many emerging industrial applications of HPC that require real-time capabilities. In addition to safety-critical applications where ‘hard’ real-time constraints exist, there are applications which need responses fast enough to control large infrastructures. One such application is monitoring and controlling electricity supply. In this application, tens of millions of meters provide data on the usage of an electricity distribution network. Supply network operators plan to use this data to make decisions in real time about the control of the network, leading to energy and cost savings. On a national or international scale these savings could be substantial, with a corresponding reduction in carbon dioxide footprint. Applications such as this require HPC with different characteristics from centralised systems operating in batch mode. Research and development in this area has the potential to open up many new application areas which will have societal and economic benefits.

The MERASA project [25] has produced a design for a time-predictable multicore processor and worst-case execution time analysis tools. The target applications are safety-critical, embedded applications.

The real world is real-time: HPC must adapt to this

Long term: Highly scalable real-time applications, with national, international or global reach.

Who should be involved: A key requirement is to understand the needs of applications, and to what extent ‘hard’ real-time capability is required. Therefore application users must be involved.

Robust HPC

Motivation

As HPC systems become ever more complex, the likelihood of failure or faulty operation of components grows. At present the responsibility for dealing with failures lies with the users. In the event of a failure, applications must either be restarted from scratch, or from a previously saved state or checkpoint. This is becoming an increasingly inefficient process as system complexity grows and saving the state of an application becomes unmanageable. It simply does not work as a strategy for error recovery for many real-world applications.

Lack of robustness prevents HPC being used in many applications
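The checkpoint-and-restart strategy described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of how any particular HPC centre or application implements it: the function names, the dictionary used as application state and the `fail_at` parameter (which simulates a component failure) are all hypothetical. The sketch also makes the cost visible: all work done since the last checkpoint is lost and must be repeated, and the whole state must be written out every time.

```python
import os
import pickle


def save_checkpoint(path, state):
    """Write the state to a temporary file, then rename it into place.

    The rename means a failure during the write leaves the previous
    checkpoint intact rather than a half-written file.
    """
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or None if no checkpoint exists."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, total_steps, checkpoint_every, fail_at=None):
    """Toy iterative computation that resumes from a checkpoint if present.

    fail_at is a hypothetical knob used only to simulate a failure.
    """
    state = load_checkpoint(path) or {"step": 0, "value": 0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated component failure")
        state["value"] += state["step"]  # stand-in for one unit of real work
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["value"]
```

A user would call `run` once, and after a failure call it again with the same checkpoint path; the second call repeats only the steps performed after the last checkpoint.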

Secure HPC

As described earlier in this report, many emerging and visionary applications will require robust and highly available HPC systems. For example, a traffic management system controlling a metropolitan area must be based on an infrastructure that is close to 100% reliable. The same is true for managing any large infrastructure such as electricity generation and distribution.

Many applications of HPC require security for various reasons, such as commercial confidentiality, data-protection or in the interests of national security.

Robustness is something that can be applied to all levels of abstraction; each layer should strive to deliver a reliable service to layers above, in the face of unreliable service from lower layers.

Current activity

The FP7 FLEXTILES project [12] aims to define and develop an energy-efficient yet programmable heterogeneous many-core platform with self-adaptive capabilities. The architecture is a 3D stacked chip with a many-core layer and a reconfigurable layer. The RELEASE project [27] aims to scale the concurrency-oriented programming paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines.

Future directions

As the complexity of HPC systems increases, malfunctions will arise more frequently. For HPC to be more usable, the responsibility for dealing with failures and faults should move away from the end-users and towards the application and system developers. If this is not achieved then HPC will be viewed as difficult to use and unreliable, and users will become disillusioned. Methods of detecting and correcting errors in runtime systems should be investigated, together with numerical methods that can be used on hardware that is not 100% reliable.

Objectives and goals

Short term: Lightweight mechanisms to detect errors and suspend and restart processes.

Medium term: Applications which can tolerate system malfunctions and uncertainties. Robustness at each level of abstraction, from application to electronic circuits.

Long term: Seamless continuous functioning of HPC systems, similar to other utilities, with near 100% reliability from the user’s point of view.

Motivation

Current activity

System and application security tends not to be a particular focus for HPC research. Security is generally implemented by the HPC centres and takes the form of physical security of the systems, due diligence when selecting users and the use of well-established encryption technologies and operating system features.

Lack of security prevents HPC being used in many applications

Future directions

Security should be properly considered for applications and data archives. The future complexity of systems means that software and hardware security measures will need to be taken if HPC is to be trusted. As access to HPC becomes more widespread, it can be expected that systems will be subject to more attacks of various kinds, and protection must be provided. The effect of successful attacks could be extremely serious in some future applications. A malicious attack against a system that controls a transport network could cause widespread disruption and endanger lives. In addition to attacks on the live operation of HPC infrastructure, the security of data will become increasingly important. This is strongly related to the so-called data deluge, and is discussed in the next section.

Objectives and goals

Short term: Analysis of risks and threat sources. Best practice in place for HPC infrastructures.

Medium term: Threat and intrusion detection and removal technology.

Long term: Adaptive intrusion detection and threat removal. Highly secure and trusted HPC infrastructures.

Who should be involved: This area of research should involve the computer security community, together with application users to quantify risks and the impact of security breaches. Some of the research will be driven by regulations, and hence the appropriate expertise must be brought in.

Who should be involved: Application developers, system tools providers and processor designers have a role to play in meeting the robustness challenge.



Challenge 3: Data Deluge

There is a relentless trend of massive data generation that is happening across all human activities: business, government, personal and academic. It is estimated that the amount of data produced each year is greater than the sum of all that previously created. The growth of data has outpaced the development of tools to deal with it. This has left us struggling, not only with the sheer volume of data being generated, but also with the ever-increasing complexity of data interactions. In the following sections we consider data-intensive computing, and the related issue of data transfer for mobile computing, which is of great importance for the industrial use of HPC.

Data-intensive Computing

Motivation

The so-called data deluge has been identified by many, including the authors of the HiPEAC roadmap [28], as being a major challenge for science and society. HPC applications typically generate huge amounts of data, and the quantities will only increase as computational power increases and new applications are found. Effective tools must be found to manage, curate and extract value from this data. Data-intensive computing will make demands on technology at the infrastructure and system levels. At the infrastructure scale the issues include location of data with respect to processing to avoid costly transfer, security, regulation and provenance. At the system level the issues relate to memory and I/O bandwidth balanced against processing power.

Current activity

In response to concerns over data management there are a number of EC initiatives underway to develop technology and infrastructure to support data-intensive computing. At the infrastructure level, the EUDAT project [29] is building tools to support a Common Data Infrastructure for EU researchers. It will deploy a range of mature technologies as shared services within a Collaborative Data Infrastructure for e-science alongside GÉANT, TERENA, EGI/EMI, DEISA, and PRACE and make those services available to researchers from a range of fields of science. At the system level the IOLANES project [30] aims to develop I/O paths that will scale as the number of cores per processor increases. The Exascale IO working group (EIOW) is developing the next generation of the HPC data storage software stack.

Future directions

HPC should be generating use cases and requirements for data management research and development. There should be research into the architectural needs for HPC systems that will manage and process data on a massive scale.

The data deluge will be a major challenge for HPC applications

There are major opportunities for industry in this area. Digital manufacturing will produce enormous amounts of data covering all aspects of product lifecycle: design, manufacturing processes, condition monitoring of factories, product support and disposal. Tools to manage and extract value from this data will be in high demand, and the European manufacturing industry must respond.

Objectives and goals

Short term: Align HPC infrastructures with initiatives such as EUDAT. Develop tools for migration to standards-based infrastructure.

Medium term: Develop system architectures tuned for data-intensive workloads.

Long term: Develop advanced tools for knowledge extraction/data mining.

Who should be involved: The HPC community must engage with data management experts to overcome these challenges.

Data Transfer: Remote and Mobile HPC

Motivation

Network infrastructure is just as important as computation for HPC applications that will be accessed remotely or from mobile devices. Users may be limited in the bandwidth available to them either due to limitations in network capability (for example in rural areas) or the costs of connections. Telecommunications providers offer various payment models for businesses, including payment only for the bandwidth used, and offer high guarantees of availability. These options must be made attractive for small businesses, and together with the proliferation of mobile devices represent a great opportunity for HPC applications with ubiquitous access.

HPC applications must be able to adapt to variable network capacity

Current activity

Data transfer across telecommunications networks is costly in terms of both the supporting infrastructure (cables, routers, wireless access points etc.) and the energy needed to operate the network. The motivation for low power technologies is as relevant to telecommunications as it is to processing and storage. The capital and operational costs of networks are reflected in the price that consumers have to pay to use services, and will determine whether service providers will find it cost-effective to provide the same level of service in every geographical location. The HPC business-user community will need to adapt to this market-driven scenario of available services. Application developers will need to consider scenarios such as mobile clients moving from areas where connectivity is good to areas where it is poor or which are disconnected completely. Until high-bandwidth, reliable communication becomes truly ubiquitous, application designs must take network limitations into consideration.

Future directions

The mass markets of enterprise business computing and consumer mobile use give telecommunications industries a clear motivation for investing in R&D for low-energy technologies. The EUDAT project is investigating technologies such as replication and data staging to overcome the challenges of moving large amounts of data between HPC resources. This is being done mainly in the context of scientific data-sets for communities which use Research Infrastructure networking such as GÉANT. Transferring the know-how to industry users will be beneficial, and EUDAT is working towards this goal. The EIOW is focussing on solutions for exascale high-performance data management and storage technologies.

Objectives and goals

Short term: Develop applications that are tolerant of restrictions to network connectivity.

Medium and long term: Develop technologies that will cut the costs of ubiquitous high-speed reliable communications.

Who should be involved: Mobile telecommunications experts and the HPC community must work together.

Challenge 4: Developing Skills

This section is concerned with providing HPC training and education, not just at the technical level but also to promote the capabilities of HPC to industry. The need for tools to improve the productivity of HPC application developers is also dealt with in this section; this is an area in which Europe should maintain its position of strength.

There is a skill shortage in HPC which must be overcome

Education, Training and Outreach

Motivation

Education and training are needed so that HPC can be understood at the appropriate level, and used by business and science in the same way that other enabling technologies are. The dependence on people with dual expertise (in HPC and their own application field) must be broken, although the value of such experts will of course remain high.

Current activity

Education and training are core elements of European HPC initiatives such as DEISA and PRACE. These projects offer on-line training material, organise periodic training events such as summer and winter schools, and hold conferences and symposia. The training and education programmes within PRACE have been successful, and should be continued. Similar programmes should be integral parts of infrastructure development and provision projects.

A small but growing number of European universities offer graduate and postgraduate courses in HPC. It is also taught as part of some science and engineering courses. The MSc course in HPC run by EPCC at the University of Edinburgh has been running for over a decade and has produced several hundred graduates. Typically these graduates go on to find employment in industry or continue their studies using their HPC skills. The course run by EPCC is modular, which enables new developments to be easily incorporated into the curriculum.

Commercial training courses are becoming available from system vendors, application vendors and tool vendors. Vendors clearly have an interest in educating and training professionals to use their tools and systems. The learning process represents a considerable investment for the individuals and their employers, and the move away from one set of applications and tools to those of another vendor represents a major cost.

Future directions Graduate and post-graduate training in HPC should be further encouraged. The capabilities of HPC should be taught to students of other disciplines; knowledge of this tends to be restricted to students of physical sciences and engineering. The intention should not be to teach all students to program supercomputers, but rather to raise awareness of the capabilities of HPC as a tool for research. Industrial outreach programmes to promote HPC should also be established. PlanetHPC ran an intensive campaign to find out HPC’s potential in a wide spectrum of European industries by contacting trade and research associations. 28



The results showed that the capabilities of HPC are not well enough understood by many industries, with many thinking that HPC is simply not important for them.

Objectives and goals

Short term: Continuation of HPC training as part of infrastructure and R&D&I projects. Use real success stories to promote HPC to industry. Aim at industries where take-up is low.

Medium and long term: Training focussed on heterogeneous massively parallel systems: application development, performance prediction and optimisation. Ambitious industrial technology transfer programmes looking towards the exascale era. Promote standards to avoid lock-in.

Who should be involved: Universities, infrastructure initiatives, industry associations.

Tools and Productivity

Motivation

Tools are one of the keys to generating powerful, robust, efficient and maintainable applications. Greater complexity of HPC systems and a more diverse application field will require more sophisticated tools. Tools for data management on a massive scale will also be required; this is dealt with in section 8.

Current activity

Europe has strengths in the tools area with many successful companies and world-class products. The tools available include debuggers, performance analysis tools and application development toolkits.

Standard interfaces have emerged that allow the developers of applications, runtime systems and compilers to achieve high productivity and to protect their investments. HPC application development benefited greatly in the 1990s following the adoption of MPI. Prior to MPI there were multiple proprietary message passing standards, which made porting applications extremely labour intensive and acted as a disincentive to developing parallel applications in the first place. Higher-level tools that allow the interoperation of applications are also needed. Europe invested in MpCCI (Mesh-based parallel Code Coupling Interface) [31], which provides a means for different simulation packages to be coupled by providing an application-independent interface. The software, developed by Fraunhofer-Institut für Algorithmen und Wissenschaftliches Rechnen SCAI, has been employed by over 150 users worldwide since its launch as a commercial product in 2002. MpCCI now supports most major industrial simulation packages, and has a thriving research community which contributes to its development. Europe was influential in the development of the MPI standard, and MpCCI arose as a result of inward investment by the European Commission. This successful model of engagement in standards and investment can clearly succeed and should be a part of the future strategy.

Future directions

Developments such as MPI and MpCCI have had a very positive effect on HPC. From the application development side, MPI gave programmers a standard interface to target, which protected the investment in developing parallel codes, made skills more transferable across the industry and made training easier. MpCCI has seen its impact at the application-user level, where it allows users to couple together existing software to create powerful multi-physics applications.

However, neither MPI nor MpCCI is immune from the disruption that HPC is facing. Many question the future viability of MPI (certainly in its current form) as an efficient way to use the massive parallelism anticipated in the future, while MpCCI is based on the use of mesh-based simulation tools. Successors to MPI are likely to emerge, but because of its maturity and simplicity, it is likely to remain important for attracting new users to HPC.

New methods of simulation may require major development work for MpCCI. However, the research and development community that has built up around it means that such development work can be well planned and executed.

Objectives and goals

Short term: Consolidation of EU efforts in tools and training.

Medium term: Tools focussed on heterogeneous massively-parallel systems: application development and debugging, performance prediction and optimisation. Promote standards to avoid lock-in.

Long term: Tools with the capability to support application development by those with little or no technical knowledge of HPC.

Who should be involved: Research and development in this area will require close collaboration between application developers, tools providers and systems developers.

Europe should maintain its strong position as a provider of HPC tools


Towards 2020: Key Recommendations

Industry and Community Viewpoint

PlanetHPC members from across the HPC spectrum were asked for their views on what key research priorities the EC should focus on. Here are some of the key responses:

“The major issues are scalability and programmability. The EU should encourage developments which lead to standards. The US is too much driven by hardware vendors, rather than by end-users. Consequently there is no group to moderate the roadmaps these vendors are setting out. Systems vendors have become divorced from language developers. Somehow the two need to be brought together. The goal has to be scalability and virtualisation and this will not happen quickly if things are left to the hardware people. We have discussed several times the issue of scalability. Virtualisation is important because it will enable developers to protect their software investments rather than be forced into developments with a short-term life. Only when there are clear standards for virtualisation will it be possible to engage with ISVs to address licensing issues which have become a major obstacle to the more flexible use of third-party codes.” Dr Alfred Geiger, Head of Solutions & Innovations Scientific & Technical ICT, T-Systems.

“There is clearly a role for government-funded research into basic algorithms and technologies that will enable software scaling to extend to the next level, and for programs that demonstrate this scalability on important ‘challenge applications’. Perhaps the most important industrial need is to encourage adoption of HPC, particularly by SMEs who can take significant advantage of advanced simulation enabled by HPC, but may not be aware of the benefits and may not have the expertise to manage a HPC infrastructure.” Barbara Hutchings, Director of the Alliance Program at ANSYS.

“Our priority would be to investigate the potential of GPGPUs for running applications. A further priority would be the development of new solvers and techniques which could address new architectures including GPGPUs and multi-core devices.” Eloi Gaudry, software developer for Belgian SME FFT.


“There is a lot of HPC expertise in the EU and HPC facilities. This should be better used by industry. The EU has a role to play here. A network of HPC resources, academic experts and industrial users should be set up. A key issue would be how to set up this network and how to ensure that applications could move easily from the systems at one centre to those at another. There also needs to be research to adapt neural network and artificial intelligence systems to complex real-world applications. Such applications would require significant levels of computer power. Furthermore efforts should be made to port significant libraries such as Matlab and SciLab to high-performance systems.” Jesús García, Director of Scientific Relations and Advanced Computing at CTR, the Repsol Technology Centre.

“There needs to be considerable effort expended in HPC to address challenges in the usability of systems. The development of new use cases and application codes for Exaflop systems will create new business opportunities for HPC. Programming methods will need to be reviewed. New pre and post-processing methodologies and applications will need to be developed. European initiatives need to be part of a global effort from the HPC-user community. Hardware developments should be considered. Notwithstanding this, software is the key issue to be tackled.” Jean-Yves Berthou, Information Technologies Program Director, EDF Research and Development Division.

“New languages will certainly be needed as we move towards exascale systems. There are several important issues around this which include the ease of programming offered by a flat address space and the performance issues resulting from a global memory model. PGAS languages will probably be very important in resolving this. We see programming issues such as the development of libraries as very important because they will help support portability. Methodologies to increase the scalability of applications on heterogeneous multicore systems and the development of related programming languages will be very important too.” François Bodin, CTO, CAPS-Enterprise.

“There need to be improved programming paradigms to address efficient large-scale parallelism. These need to be developed in parallel with architectures for heterogeneous systems. Real benefits will come from heterogeneous systems programmed in a homogeneous way. The principles of CSP remain true today and point the way to programming large-scale systems. It will also be important to develop numerical algorithms which exploit parallelism by combined data and functional decomposition. Finally there should be a reconsideration of the potential of hybrid analog/digital computers. This approach could be used to develop very powerful systems. Accelerators have clear benefits, but they are overhyped and difficult to program. FPGAs may offer a better return on effort. Nevertheless, real performance improvements will only come through accelerators. We need to consider the whole computing paradigm. Heterogeneous computing is the only way forward where systems will be evaluated in terms of number of floating point operations per Joule. Multicore is probably not going to deliver in this regard.” Dr Chris Jones, Technologist Consultant, BAE Systems.

“Invest massively in basic research, ranging from new numerical algorithms which combine numerical efficiency with good scalability and fault tolerance, to programming models beyond MPI 3.0 for Exascale systems, and to innovative postprocessing techniques for large-scale datasets.” PlanetHPC member.

“Focus on the software aspects of HPC and the pervasion of industry with HPC” PlanetHPC member.

“Application of HPC to the modelling and understanding of economic and financial risks, impacts, behaviours would be an essential context. Developments in low power, highly concurrent architectures need watching, if not encouraging.” PlanetHPC member.

“I can see that users want guarantees that their data will not be stolen, and that espionage can be an issue in some government agencies and big corporations. I do however not think the attack will happen on the HPC machine, but rather on the clients accessing the machine and which are more difficult to protect. Attacks are almost always coming from the outside world, and most (high-end) HPC systems have limited accessibility from the internet I think. Implementing software protection can considerably slow down a machine, so I doubt this will ever happen inside a supercomputer.” PlanetHPC member.

Analysis

The comments from industry and the community reinforce the view that future HPC systems (in the 5–10 year time frame) will be massively parallel and heterogeneous. This is an inescapable consequence of the energy challenge facing ICT as a whole. Research programmes must address how best to exploit these architectures. Software is viewed as the most important aspect of HPC by many experts.

The experts agree that the productivity of developers must be enhanced by new programming models, paradigms and languages that allow algorithms to be implemented in efficient and maintainable ways. Tools and libraries aimed at massively parallel and heterogeneous systems will be needed to improve productivity and protect investments.

Involving industry is seen by the community as a key element in R&D&I activity. This should include attracting users in areas where HPC is not currently pervasive. The needs of SMEs should be considered, particularly by trying to replicate the success of cloud computing in reducing capital cost and providing pay-per-use models. Non-functional aspects of HPC should be developed to meet the requirements of industry. These requirements need to be well understood.

Key Recommendations

Challenge 1: Mastering massive parallelism and heterogeneous systems

Key recommendation 1: Research into new simulation and numerical methods that scale to millions of processors and exploit data-locality. Encourage co-design with processor development.

Key recommendation 2: Develop programming models that can be efficiently implemented on massively parallel and heterogeneous systems, including emerging low-power computing platforms.

Key recommendation 3: Investigate migration of existing codes to accelerator and multicore technologies, and develop re-usable libraries and software modules optimised for these architectures.

Key recommendation 4: Demonstrate to innovative software companies the need to modernise applications and to engage in research programmes. Ensure that collaboration models protect the IP of ISVs.

Challenge 2: Reaching new markets

Key recommendation 5: Promote the image of HPC as a key enabling technology, particularly aiming at SMEs; dispel the image that HPC is only for technically knowledgeable large companies.

Key recommendation 6: Understand non-functional requirements of new application areas.

Key recommendation 7: Streamline the process for small companies to evaluate HPC technology; establish a network of HPC pilot projects.

Challenge 3: Dealing with massive data

Key recommendation 8: Investigate system architectures suitable for data-intensive computing.

Key recommendation 9: Develop tools for managing data on a massive scale and extracting value from it; the HPC community should engage with those involved in big-data research.

Challenge 4: Developing skills

Key recommendation 10: Make training and education an essential part of infrastructure and research initiatives.

Key recommendation 11: Adopt an inter-disciplinary approach in tackling research challenges; involve application specialists, mathematicians and software engineers in R&D&I activities.
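Key recommendation 1 asks for methods that scale to millions of processors. One way to see why this may demand new algorithms rather than incremental tuning is Amdahl's law, which bounds the speedup of a program whose serial fraction is s on N processors by 1 / (s + (1 - s) / N). The short Python sketch below is an illustration only: the function name and the sample serial fractions are assumptions chosen for the example, not figures from the PlanetHPC consultation.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Upper bound on speedup from Amdahl's law: 1 / (s + (1 - s) / N).

    serial_fraction (s) is the share of the run time that cannot be
    parallelised; workers (N) is the number of processors or threads.
    Both names are illustrative, not taken from the roadmap.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


if __name__ == "__main__":
    # Hypothetical serial fractions, evaluated at one million workers.
    for s in (0.01, 0.0001, 0.000001):
        bound = amdahl_speedup(s, 1_000_000)
        print(f"serial fraction {s:g}: speedup bound {bound:,.0f}")
```

Under this simple model, a code that is 1% serial can never run more than 100 times faster, however many processors are added, which is why scaling to millions of threads depends on driving the serial and communication-bound portions of an application close to zero.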



Implementation

The European Commission’s next programme of research and innovation, Horizon 2020, begins in January 2014 with the first calls for proposals. This programme runs until 2020 and has a proposed budget of €80 billion. It brings together all of the European Commission’s research and innovation funding under one programme and will massively simplify the rules governing such funding. Horizon 2020 has a strong focus on societal challenges and strengthening European scientific excellence and industrial leadership in innovation. As HPC has a key role to play in helping to model and test solutions to many of the societal challenges we face, it is important that the research objectives described in this document are properly funded and managed within Horizon 2020’s programme structure. In order to implement the research programmes we outline in the recommendations listed above, the following will be required:

Broad scale of projects: experience in FP7 and earlier programmes has shown the importance of a variety of project sizes and shapes. One size does not fit all in research and innovation and a variety of project types should be considered from very large collaborative projects through to much smaller, targeted research activities. In tandem with these collaborative research projects, more projects focussed on take-up of their results by European businesses should be created. All projects, other than the basic science projects funded through the ERC, should bring together Europe’s leading experts from across the industrial and scientific research domains. Ongoing major initiatives such as PRACE should continue to be supported to demonstrate the ambition of Europe to maintain its position as a global player in HPC. This should be complemented by involvement and leadership in international standards.
It has been noted in this report that there is a convergence of technologies across ICT, and that many of the challenges that arise in this domain have common roots. Research programmes should encourage strong interactions between embedded systems, HPC, mobile computing, data-intensive computing and other disciplines that will contribute to Next Generation Computing. The PlanetHPC community, supported by the finding of ESSI, recognises that software should be a major priority over the coming decade; there should be major investments in this area.

Industry involvement: a clear route to commercialisation and exploitation is necessary to stimulate economic growth in Europe. Committed industry involvement, with a clear route to market, in the research and innovation projects we have discussed in this document should be a pre-requisite. The research programme must be implemented so that industry partners, in particular SMEs, have low barriers for involvement and can see clear benefits and opportunities from the outset. The R&D&I strategy should be to extend the application range of HPC beyond its traditional areas of science and engineering and into new areas where opportunities exist but take-up has been slow. This will involve innovations to make HPC easy to use, in terms of developing and deploying applications, and in terms of new access models to HPC resources. Industry involvement must involve all parts of the HPC value chain: end-users, software and hardware suppliers, service providers and systems integrators.

Cross-disciplinary approach: HPC is a technology and as such cannot generate economic growth (the overall market for HPC hardware and software in Europe and world-wide is quite small). Economic growth is generated through the use of HPC to develop new products and services and new solutions to scientific, industrial and societal challenges. The European Commission should develop new project models which allow communities of expertise to form around areas of technological excellence in Europe whilst connecting these communities to the disciplines that require the use of the new and innovative technologies.

Coordination and outreach activities: PlanetHPC and HiPEAC have demonstrated the importance of coordination activities to bring together multiple projects, to share experience and to discuss innovative approaches to new challenges. Such activities should be strengthened and expanded in Horizon 2020 with a greater emphasis on bringing together different communities from across the programme. In support of greater industry involvement and cross-disciplinary action, the R&D&I strategy must incorporate outreach activities that promote the image of HPC as an accessible technology with a proven track record that is applicable in all domains. The experience of PlanetHPC shows that there is a persistent lack of understanding of the capabilities and potential of HPC. Those in the HPC community must act to overcome this barrier, but do so in ways that avoid hype and unrealistic expectations. Success stories which demonstrate how industry has benefited from HPC will play a vital role in this. All development projects should be strongly encouraged to disseminate their results, and there should be coordinated publicity and outreach actions to promote HPC to business.


Conclusion

This document outlines the most significant challenges faced by HPC in the coming years. The primary challenge is that future HPC systems will be characterised by massive parallelism and heterogeneous processing elements. Future applications must therefore scale to high degrees of parallelism (millions of threads), and be able to exploit the specialised capabilities of heterogeneous cores. This may require fundamental rewriting of applications, and research into modelling and simulation techniques and the associated numerical methods. Secondly, there are great opportunities for HPC in new applications, which are being opened up by the convergence of ICT and greater computing performance. Non-functional properties of HPC, such as reliability and security, will be extremely important if these opportunities are to be exploited. Thirdly, the so-called data deluge is significant for HPC. The growth of data has outpaced the capability of tools to deal with it. HPC applications generally either take large data-sets as input or produce them as output, or do both.

The root causes of the challenges are not unique to HPC; their effect will be felt in all areas of ICT. This gives reasons for the ICT community as a whole to tackle the challenges in a concerted fashion so that all aspects of ICT will benefit. Technology solutions that benefit a narrow aspect of ICT will be less useful than those that have broad application. It is important for those working in the HPC field to engage with other research communities such as embedded systems, mobile computing and data-intensive computing. In the case of high performance and embedded computing, there is already an initiative underway in the form of the HiPEAC network of excellence. More interaction between those involved in high-end HPC and HiPEAC should be encouraged. The HPC community should also be involved in the EUDAT common data services initiative.
Horizon 2020, with its focus on R&D and innovation, offers an important opportunity for the European HPC and embedded computing industries and research organisations to build on their world-leading expertise, ensuring that Europe continues to be a world-class player in the HPC domain.



References

[1] High-Performance Computing: Europe’s Place in a Global Race: http://www.etp4hpc.eu/HPC/docs/CommunicationHigh-PerformanceComputingEuropesplaceinaGlobalRace-COM.pdf
[2] US Dept of Energy Report, The Opportunities and Challenges of Exascale Computing: http://science.energy.gov/~/media/ascr/ascac/pdf/reports/Exascale_subcommittee_report.pdf
[3] The Low-power GPU (LPGPU) project: http://www.lpgpu.org
[4] The CARP project: http://www.carproject.eu
[5] The DEEP project: http://cordis.europa.eu/projects/101347_en.html
[6] Intel website: http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html
[7] ARM: http://www.arm.com/products/processors/technologies/bigLITTLEprocessing.php
[8] The Mont-Blanc project: http://www.montblanc-project.eu
[9] The EUROCLOUDS project: http://www.eurocloudserver.com
[10] The CRISP project: http://www.crisp-project.eu
[11] The FASTER project: http://www.fp7-faster.eu
[12] The FLEXTILES project: http://www.flextiles.eu
[13] The VELOX project: http://www.velox-project.eu
[14] The PEPPHER project: http://www.peppher.eu
[15] The 2PARMA project: http://www.2parma.eu
[16] The ENCORE project: http://www.encore-project.eu
[17] The APPLE-CORE project: http://www.apple-core.info
[18] The TEXT project: http://www.project-text.eu
[19] The CRESTA project: http://cresta-project.eu
[20] The NextMuse project: http://nextmuse.cscs.ch
[21] The EXASCALEPLASMATURB project: http://cordis.europa.eu/projects/100621_en.html
[22] The PESM project: http://cordis.europa.eu/projects/102370_en.html
[23] The APOS-EU project: http://apos-project.eu/about
[24] The HiPerDNO project: http://dea.brunel.ac.uk/hiperdno
[25] The MERASA project: http://ginkgo.informatik.uni-augsburg.de/merasa-web
[26] The MULTIPARTES project: http://www.multipartes.eu
[27] The RELEASE project: http://www.release-project.eu
[28] The HiPEAC roadmap: http://www.hipeac.net/system/files/hipeac-roadmap2011.pdf
[29] The EUDAT project: http://www.eudat.eu
[30] The IOLANES project: http://www.iolanes.eu
[31] MpCCI: http://www.mpcci.de



www.planethpc.eu

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.



Planet HPC Roadmap 2013