DCD>Magazine Issue 32 - Chilled Efficiency


"This platform is designed to tackle the largest AI training and inference problems that we know about. And as part of the Exascale Computing Project, there's a new effort around exascale machine learning, and that activity is feeding into the requirements for Aurora."

That effort is ExaLearn, led by Francis J. Alexander, deputy director of the Computational Science Initiative at Brookhaven National Laboratory. "We're looking at both machine learning algorithms that themselves require exascale resources, and/or where the generation of the data needed to train the learning algorithm is exascale," Alexander told DCD. In addition to Brookhaven, the team brings together experts from Argonne, LLNL, Lawrence Berkeley, Los Alamos, Oak Ridge, Pacific Northwest and Sandia in a formidable co-design partnership.

LLNL's informatics group leader, and project lead for the Livermore Big Artificial Neural Network (LBANN) open-source deep learning toolkit, Brian Van Essen, added: "We're focusing on a class of machine learning problems that are relevant to the Department of Energy's needs… we have a number of particular types of machine learning methods that we're developing that I think are not being focused on in industry.

"Using machine learning, for example, for the development of surrogate models to simplify computation, using machine learning to develop controllers for experiments very relevant to the Department of Energy."

Those experiments include hugely ambitious research efforts in manufacturing, healthcare and energy. Some of the most data-intensive tests are held at the National Ignition Facility, a large inertial confinement fusion research facility at LLNL that uses lasers to heat and compress a small amount of hydrogen fuel, with the goal of inducing nuclear fusion reactions for nuclear weapons research.

"So it's not like - and I'm not saying it's not a challenging problem - but it's not like recommending the next movie you should see. Some of these things have very serious consequences," Alexander said. "So if you're wrong, that's an issue."

DCD>Debates: Are we ready for the high-density, AI-ready future? Watch on demand: bit.ly/AreYouAIReady
Dr Suvojit Ghosh from McMaster University's Computing Infrastructure Research Centre (CIRC) and Chris Orlando, CEO of ScaleMatrix, discuss the growth of high-density computing workloads, and how to power and cool your data center as racks get denser.

Van Essen concurred, adding that the machine learning demands of their systems also require far more computing power: "If you're a Google or an Amazon or a Netflix, you can train good models that you then use for inference billions of times. Facebook doesn't have to develop a new model for every user to classify the images that they're uploading - they use a well-trained model and they deploy it."

Despite the enormous amount of time and money Silicon Valley giants pump into AI, and their reputation for embracing the technology, they mainly exist in an inference-dominated environment - simply using models that have already been trained.

"We're continuously developing new models," Van Essen said. "We're primarily in a training-dominated regime for machine learning… we are typically developing these models in a world where we have a massive amount of data, but a paucity of labels, and an inability to label the datasets at scale, because it typically requires a domain expert to be able to interpret what you're looking at."
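The surrogate-model idea Van Essen mentions - training a neural network to stand in for an expensive computation so it can be queried cheaply - can be pictured with a minimal sketch. This is a hypothetical toy example in PyTorch, not LBANN or any real DOE code; expensive_simulation is just a placeholder function.

import torch
import torch.nn as nn

def expensive_simulation(x):
    # Toy stand-in for a costly physics code - purely illustrative.
    return torch.sin(3 * x) + 0.5 * x ** 2

# Run the "simulation" offline to build a training set.
inputs = torch.linspace(-2, 2, 200).unsqueeze(1)
targets = expensive_simulation(inputs)

# A small network learns to mimic the simulation's input/output behavior.
surrogate = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                          nn.Linear(64, 64), nn.Tanh(),
                          nn.Linear(64, 1))
optimizer = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

for step in range(2000):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(surrogate(inputs), targets)
    loss.backward()
    optimizer.step()

# The trained surrogate now answers new queries without re-running the simulation.
x_new = torch.tensor([[0.5]])
print(surrogate(x_new).item(), expensive_simulation(x_new).item())

Once fitted, the surrogate returns approximate answers in a fraction of the time the original computation would take, which is what makes the approach attractive for tasks like design sweeps and experiment control.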

Working closely with experimenters and subject experts, ExaLearn is "looking at combinations of unsupervised and semi-supervised and self-supervised learning techniques - we're pushing really hard on generative models as well," Van Essen said. Take inertial confinement fusion research: "We have a small handful of tens to maybe a hundred experiments. And you want to couple the learning of these models across this whole range of different fidelity models using things like transfer learning. Those are techniques that we're developing in the labs and applying to new problems through ExaLearn. It's really the goal here."
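That coupling across fidelities can be sketched in the same hypothetical toy style - not ExaLearn's actual workflow: pre-train a network on plentiful low-fidelity data, then fine-tune only its final layer on a handful of "experimental" points.

import torch
import torch.nn as nn

# Hypothetical stand-ins: a cheap low-fidelity model and the trusted high-fidelity response.
low_fidelity = lambda x: torch.sin(3 * x)
high_fidelity = lambda x: torch.sin(3 * x) + 0.3 * x

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))

# Stage 1: pre-train on thousands of cheap low-fidelity samples.
x_lo = torch.linspace(-2, 2, 2000).unsqueeze(1)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(1000):
    opt.zero_grad()
    nn.functional.mse_loss(net(x_lo), low_fidelity(x_lo)).backward()
    opt.step()

# Stage 2: freeze the learned features and fine-tune only the last layer
# on a handful of trusted "experimental" points.
x_hi = torch.linspace(-2, 2, 10).unsqueeze(1)
for p in net[:-1].parameters():
    p.requires_grad = False
opt = torch.optim.Adam(net[-1].parameters(), lr=1e-3)
for step in range(500):
    opt.zero_grad()
    nn.functional.mse_loss(net(x_hi), high_fidelity(x_hi)).backward()
    opt.step()

The frozen layers keep what the cheap data could teach, so the few trusted points are spent only on correcting the gap between the two fidelities.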

"It's not like recommending the next movie you should see... these things have very serious consequences"

Working closely with experimenters and subject experts, ExaLearn is “looking at combinations of unsupervised and semisupervised and self-supervised learning techniques - we're pushing really hard on generative models as well,” Van Essen said. Take inertial confinement fusion research: “We have a small handful of tens to maybe a hundred experiments. And you want to couple the learning of these models across this whole range of different fidelity


