
Tensor Comprehensions uses the Halide compiler as a library. We build on Halide's intermediate representation (IR) and analysis tools, and pair it with polyhedral compilation techniques, so that you can write layers using similar high-level syntax but without the need to explicitly say how it is going to run. We also found ways to make our language even more concise, eliminating the need to specify loop bounds for reductions.
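As a concrete illustration, here is a minimal sketch assuming the PyTorch integration shipped with the release (the tensor_comprehensions package and its tc.define entry point); exact names may differ:

```python
# Minimal sketch, assuming the release-era tensor_comprehensions package.
import torch
import tensor_comprehensions as tc

# The reduction (+=!) iterates over kk implicitly: no loop bounds are written
# anywhere; they are inferred from the sizes of the tensors passed in.
lang = """
def matmul(float(M, K) A, float(K, N) B) -> (C) {
    C(m, n) +=! A(m, kk) * B(kk, n)
}
"""

matmul = tc.define(lang, name="matmul")
A, B = torch.randn(32, 64).cuda(), torch.randn(64, 128).cuda()
C = matmul(A, B)  # a CUDA kernel is generated and run for these sizes
```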

Automatically synthesizing (efficient) GPU kernels

Tensor Comprehensions uses Halide and polyhedral compilation techniques to automatically synthesize CUDA kernels with delegated memory management and synchronization. This translation performs optimizations for general operator fusion, fast local memory, fast reductions and JIT specialization for specific sizes. Since we do not try to own or optimize memory management, our flow is easily and efficiently integrated into any ML framework and any language that allows calling C++ functions.
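For example, a fully connected layer followed by a ReLU, written as one comprehension, can be fused into a single kernel and specialized to the concrete sizes it is first called with. The sketch below reuses the assumed tc.define API from above and follows the fcrelu example in the Tensor Comprehensions paper:

```python
# Hedged sketch: several statements in one comprehension can be fused into a
# single CUDA kernel, JIT-specialized to the sizes of the input tensors.
import torch
import tensor_comprehensions as tc

fcrelu = tc.define("""
def fcrelu(float(B, M) I, float(N, M) W, float(N) bias) -> (O1) {
    O1(b, n) = bias(n)
    O1(b, n) += I(b, m) * W(n, m)
    O1(b, n) = fmax(O1(b, n), 0)
}
""", name="fcrelu")

I    = torch.randn(128, 1024).cuda()   # tensors are allocated by the framework;
W    = torch.randn(512, 1024).cuda()   # Tensor Comprehensions never owns memory
bias = torch.randn(512).cuda()
O1   = fcrelu(I, W, bias)              # one fused kernel for these exact sizes
```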

Contrary to classical compiler technology and library approaches, polyhedral compilation allows Tensor Comprehensions to schedule computations of individual tensor elements on demand for each new network.

At the CUDA level, it combines affine loop transformations, fusion/fission and automatic parallelization while ensuring data is correctly moved through the memory hierarchy. The numbers in the accompanying figure show the order in which tensor elements were initially computed, and arrows represent dependencies between them. In this example, the figure rotation corresponds to loop interchange, which enables deep operator fusion.
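To see why interchange matters for fusion, here is a purely illustrative sketch in plain Python (not TC internals): once the second operator's loops are interchanged to match the first, both operators can share one loop nest and each intermediate value is consumed right after it is produced.

```python
# Illustrative only: loop interchange aligning iteration orders so that two
# operators can be fused into a single loop nest.
import numpy as np

def unfused(A):
    # Operator 1 and operator 2 run in separate loop nests over a temporary.
    M, N = A.shape
    T = np.empty((M, N)); C = np.empty((N, M))
    for i in range(M):
        for j in range(N):
            T[i, j] = A[i, j] * 2.0        # operator 1, order (i, j)
    for j in range(N):
        for i in range(M):
            C[j, i] = T[i, j] + 1.0        # operator 2, order (j, i)
    return C

def interchanged_and_fused(A):
    # After interchanging operator 2's loops to (i, j), both operators share
    # the same nest and the intermediate value never touches a temporary array.
    M, N = A.shape
    C = np.empty((N, M))
    for i in range(M):
        for j in range(N):
            t = A[i, j] * 2.0              # operator 1
            C[j, i] = t + 1.0              # operator 2, fused
    return C
```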

To drive the search procedure, we also provide an integrated multithreaded, multi-GPU autotuning library which uses evolutionary search to generate and evaluate thousands of implementation alternatives and select the best-performing ones. You can call the tune function on your Tensor Comprehension and watch the performance improve, live. The best strategy is serialized via protocol buffers and is reusable immediately or in offline scenarios.
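In the release-era PyTorch package the corresponding entry point was named autotune, with a cache argument for persisting the tuned strategy; the sketch below assumes that API and a hypothetical cache filename:

```python
# Hedged sketch of autotuning with result caching (API names may differ).
import torch
import tensor_comprehensions as tc

matmul = tc.define("""
def matmul(float(M, K) A, float(K, N) B) -> (C) {
    C(m, n) +=! A(m, kk) * B(kk, n)
}
""", name="matmul")

A, B = torch.randn(128, 256).cuda(), torch.randn(256, 512).cuda()

# Evolutionary search over thousands of candidate mappings; the best strategy
# is serialized to the cache file for immediate or offline reuse.
matmul.autotune(A, B, cache="matmul_128x256x512.tc")

C = matmul(A, B)  # subsequent calls reuse the tuned mapping from the cache
```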

Peak performance

On the performance side, while we still have many improvements in the works, Tensor Comprehensions can already match or beat the performance of current ML frameworks integrated with hand-tuned libraries, in favourable cases. This is mainly achieved by the ability to adapt code generation strategies to specific problem sizes. We are constantly conducting performance evaluations of the kernels produced automatically by Tensor Comprehensions. The early results demonstrate strong improvements on a variety of neural network models, compared to the default usage of vendor libraries such as cuDNN in the Caffe2 and PyTorch frameworks.

As we extend our contribution to more hardware backends, Tensor Comprehensions will complement fast libraries written by hardware manufacturers such as NVIDIA and Intel, and will be used in conjunction with libraries such as cuDNN, MKL or NNPACK.

What to expect next

This release will allow researchers and programmers to write layers in a notation that is similar to the maths they use in their papers and to communicate concisely the intent of their program. They will also be able to take that notation and translate it easily into a fast implementation in a matter of minutes rather than days. As the toolchain grows, we expect usability and performance to increase and benefit the whole community.

Tensor Comprehensions is integrated with the popular PyTorch and Caffe2 machine learning frameworks. We welcome feedback from other frameworks and teams. Email: tensorcomp@fb.com

Facebook has sponsored the HiPEAC conference in the past, and this work builds on earlier work by a long-term industry-academia collaboration called Polly Labs, supported by Arm, which won a HiPEAC Technology Transfer Award in 2015.

Access Tensor Comprehensions via github: github.com/facebookresearch/TensorComprehensions

The full version of this article originally appeared on the Facebook Research blog: bit.ly/Facebook_Research_Tensor_Comp

Further reading: Vasilache et al. 'Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions.' Cornell University Library, Ithaca, New York, 2018. arxiv.org/abs/1802.04730
