GPU-based acceleration of FVM-based solver for incompressible transient flows with OpenFOAM
Internal Document, Copyright Vratis Ltd. 2011
Vratis
• Founded in 2006 as a spin-off of Wroclaw University of Technology.
• Main activity: machine vision in biology, medicine and industry; hardware acceleration in science.
• Collaboration:
  - SlidePath Ltd. (Digital Pathology)
  - ETH Zurich (Life Cell Imaging)
  - Engys, iconCFD, IT’IS Foundation
• Product released in 2009.
• Objective: hardware acceleration of Computational Fluid Dynamics (CFD) simulations.
  - Motivation: time, cost and size constraints of HPC systems.
  - Typical simulations need hundreds of GB of storage.
  - Hospitals have no infrastructure for expensive clusters and supercomputers.
  - Typical duration of patient-specific simulations: several days or weeks.
SpeedIT in Action
Proposed solution: a hardware accelerator that can be placed close to the patient.
Workflow: Data acquisition (MRI, CT) → Determination of the geometry → Model building → Solving of the PDEs → Post-processing → Visualisation
SpeedIT Toolkit
• GPU-based library of iterative solvers commonly used in CFD simulations:
  - BiCGSTAB for non-symmetric matrices.
  - CG for symmetric matrices.
  - Preconditioners: diagonal.
• Support for complex numbers.
• OpenFOAM plugin.
• Classic and eXtreme versions.
• Testing environment:
  - Matrices from the University of Florida Sparse Matrix Collection and from our own CFD problems.
  - Single and double precision, CSR matrix format.
  - Performance measured with and without data transfer.
  - GPU: GTX 275; CPU: Intel Core 2 Quad 2.66 GHz.
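To make the solver and storage choices above concrete, here is a minimal CPU-side sketch of a conjugate gradient solver with a diagonal (Jacobi) preconditioner operating on a CSR matrix. It is plain Python for illustration only and is not the SpeedIT implementation or its API; all function names (`csr_matvec`, `pcg_diagonal`, `laplacian_1d_csr`) are hypothetical, and the 1D Laplacian test matrix is an assumption chosen because it is symmetric positive definite.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Return y = A @ x for a matrix A stored in CSR form."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def csr_diagonal(values, col_idx, row_ptr):
    """Extract the main diagonal of a CSR matrix."""
    diag = [0.0] * (len(row_ptr) - 1)
    for i in range(len(diag)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                diag[i] = values[k]
    return diag


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg_diagonal(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b with CG and a diagonal preconditioner M = diag(A)."""
    inv_diag = [1.0 / d for d in csr_diagonal(values, col_idx, row_ptr)]
    x = [0.0] * len(b)
    r = list(b)                                  # residual for the zero initial guess
    z = [m * ri for m, ri in zip(inv_diag, r)]   # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    stop = tol * (dot(b, b) ** 0.5 or 1.0)       # relative residual threshold
    for iteration in range(max_iter):
        if dot(r, r) ** 0.5 <= stop:
            return x, iteration
        ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def laplacian_1d_csr(n):
    """Build tridiag(-1, 2, -1) of size n in CSR form (test matrix only)."""
    values, col_idx, row_ptr = [], [], [0]
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr
```

Calling `pcg_diagonal(*laplacian_1d_csr(10), [1.0] * 10)` returns the solution vector and the number of iterations used. The sparse matrix-vector product in `csr_matvec` dominates the cost of each iteration, which is why it is the natural candidate for GPU offloading.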
License: Creative Commons: Attribution
Test matrices

Matrix name      Nonzero elements   Rows      Memory, double [MB]   Memory, single [MB]
af_shell10       52672325           1508065   608.54                407.61
apache1          542184             80800     6.51                  4.44
audikw_1         77651847           943695    892.25                596.04
bmw3_2           11288630           227362    130.06                86.99
bmwcra_1         10644002           148770    122.38                81.77
bone010          71666325           986703    823.92                550.53
bone010_M        23888775           986703    277.15                186.02
boneS10          55468422           914898    638.28                426.68
boneS10_M        18489474           914898    215.09                144.55
cage14           27130349           1505785   316.23                212.73
crankseg_1       10614210           52804     121.67                81.18
crankseg_2       14148858           63838     162.16                108.19
F1               26837113           343791    308.44                206.06
Freescale1       18920347           3428755   229.61                157.43
hood             10768436           220542    124.08                83.00
inline_1         36816342           503712    423.25                282.81
ldoor            46522475           952203    536.04                358.57
msdoor           20240935           415863    233.23                156.01
nd12k            14220946           36000     162.88                108.63
nd24k            28715634           72000     328.90                219.36
p029_66k_run9    10646210           323029    123.07                82.46
run20            3928604            165911    45.59                 30.61
test_nliter      24271103           713969    280.48                187.90
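The memory figures listed for the test matrices appear to match the usual CSR storage estimate: one value and one 32-bit column index per nonzero element, plus a row-pointer array of (rows + 1) 32-bit integers, with 1 MB taken as 2^20 bytes. This is an inference from the numbers rather than something the slides state, and the helper name `csr_memory_mb` is hypothetical. A small sketch of the arithmetic:

```python
def csr_memory_mb(nnz, rows, value_bytes, index_bytes=4):
    """Estimate CSR storage in MB (2**20 bytes).

    Assumes one value and one column index per nonzero,
    plus a row-pointer array with rows + 1 entries.
    """
    total_bytes = nnz * (value_bytes + index_bytes) + (rows + 1) * index_bytes
    return total_bytes / 2 ** 20


# Matrix F1 from the test set: 26837113 nonzeros, 343791 rows.
f1_double = csr_memory_mb(26837113, 343791, value_bytes=8)
f1_single = csr_memory_mb(26837113, 343791, value_bytes=4)
print(f"F1: {f1_double:.2f} MB (double), {f1_single:.2f} MB (single)")
```

Because the index arrays do not shrink with the value type, switching from double to single precision saves only about a third of the memory, not half.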
Iterative solvers
SpeedIT 1.2 vs. CUDA 4.0 February 2011
CUSPARSE developed by
Integration with OpenFOAM
Integration is possible through the OpenFOAM plugin (GNU GPL). The GPU-based iterative solvers (SpeedIT) can be enabled at run time with a few changes to the configuration files. The GPU-based solver (Arael) is a standalone application that can be run in an OpenFOAM directory.
Validation in OpenFOAM
Fig. Cavity3D run with OpenFOAM and SpeedIT (lines) and from literature [31] for Re=100 (dotted line) and Re=400 (solid line) without turbulence model.
Simulation of the blood flow in human aorta
Fig. Blood flow simulations run with the potential flow solver, SIMPLEfoam and PISOfoam. For more information see: Z. Malecha, L. Miroslaw, T. Tomczak, Z. Koza, M. Matyka, W. Tarnawski, D. Szczerba. GPU-based simulation of 3D blood flow in abdominal aorta using OpenFOAM. Arch. Mech. 63, 137-161 (2011).
OpenFOAM speed-up
Poor acceleration due to memory bottleneck
Z. Malecha, L. Miroslaw, T. Tomczak, Z. Koza, M. Matyka, W. Tarnawski, D. Szczerba. GPU-based simulation of 3D blood flow in abdominal aorta using OpenFOAM. Arch. Mech. 63, 137-161 (2011).
ARAEL
• Full GPU acceleration: all of the most time-consuming operations are performed on the GPU (NVIDIA).
• Iteratively solves the Navier-Stokes equations with the Finite Volume Method.
• Support for 3D unstructured meshes in OpenFOAM and VTK formats.
• Handles geometries of up to 9 million cells (estimated for NVIDIA C2070).
• Supported platforms: Linux 32-/64-bit.
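For reference, the incompressible Navier-Stokes equations and their finite-volume form for a single cell can be written as below. The notation is the standard textbook one and is an assumption, not taken from the slides: u is velocity, p pressure, ρ density, ν kinematic viscosity, V_P the volume of cell P, and S_f the outward area vector of face f.

```latex
% Continuity and momentum for incompressible flow
\begin{align}
\nabla \cdot \mathbf{u} &= 0, \\
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\,\mathbf{u}
  &= -\frac{1}{\rho}\nabla p + \nu \nabla^{2}\mathbf{u}.
\end{align}

% Momentum equation integrated over cell P, with surface integrals
% replaced by sums over the cell faces f
\begin{equation}
V_P \frac{d\mathbf{u}_P}{dt}
  + \sum_f \mathbf{u}_f \,(\mathbf{u}_f \cdot \mathbf{S}_f)
  = -\frac{1}{\rho}\sum_f p_f\,\mathbf{S}_f
  + \nu \sum_f (\nabla \mathbf{u})_f \cdot \mathbf{S}_f .
\end{equation}
```

The face sums are what make the method applicable to unstructured meshes: each cell only needs its list of faces and neighbours.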
ARAEL 1.0
• PISO (Pressure-Implicit with Splitting of Operators): transient solver for incompressible flow.
• SIMPLE (Semi-Implicit Method for Pressure-Linked Equations): steady-state solver for incompressible flow.
• Boundary conditions: zero gradient, fixed value.
• Easy-to-use C-style interface.
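The two boundary condition types listed above can be illustrated on a much simpler problem than Navier-Stokes. The sketch below discretises steady 1D diffusion with a uniform source using cell-centred finite volumes, with a fixed-value condition on the left wall and a zero-gradient condition on the right wall. The model problem, the function names and the direct tridiagonal solve are all assumptions made for illustration; this is not how Arael is implemented.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def diffusion_1d(n_cells, source, fixed_value, length=1.0):
    """Solve -phi'' = source with phi(0) = fixed_value and phi'(length) = 0."""
    h = length / n_cells
    lower = [-1.0] * n_cells
    diag = [2.0] * n_cells
    upper = [-1.0] * n_cells
    rhs = [source * h * h] * n_cells

    # Fixed value (left wall): the wall sits h/2 from the first cell centre,
    # so the wall flux is 2 * (phi_0 - fixed_value) / h.
    lower[0] = 0.0
    diag[0] += 1.0
    rhs[0] += 2.0 * fixed_value

    # Zero gradient (right wall): no diffusive flux through the wall face,
    # so the missing neighbour contribution is simply dropped.
    diag[-1] -= 1.0
    upper[-1] = 0.0

    centres = [(i + 0.5) * h for i in range(n_cells)]
    return centres, solve_tridiagonal(lower, diag, upper, rhs)


centres, phi = diffusion_1d(n_cells=50, source=1.0, fixed_value=2.0)
```

The exact solution of this model problem is phi(x) = fixed_value + source * (x - x²/2) on the unit interval, which gives a simple check on the discrete values.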
Road Map
• Support for multi-GPU.
• Turbulence (RANS, kOmegaSST model with robust hybrid wall function).
Cavity3D in OpenFOAM
Compared against Intel Q8400

Cells       1 CPU    4 CPU    4 CPU DIC   Tesla C2070
1.00E+03    0.012    0.007    0.006       0.02
1.06E+04    0.165    0.065    0.055       0.055
1.04E+05    4.1      3.85     2.7         0.249
1.00E+06    92.2     90.4     67.4        2.9
Fig. Time [s] needed for a single iteration of the PISO solver for different resolutions of the cavity3D OpenFOAM case (icofoam) and different preconditioners on the CPU side (diagonal and DIC) and on the GPU (diagonal only). Logarithmic scale.
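Using the per-iteration timings listed for the Intel Q8400 comparison, the speed-up of the Tesla C2070 over each CPU configuration is simply the ratio of the two times. The short sketch below does that arithmetic; the variable and function names are illustrative only, and the timings are copied from the data table under the plot.

```python
# Time [s] for a single PISO iteration, per mesh size (cavity3D, icofoam).
cells = [1.00e3, 1.06e4, 1.04e5, 1.00e6]
times = {
    "1 CPU": [0.012, 0.165, 4.1, 92.2],
    "4 CPU": [0.007, 0.065, 3.85, 90.4],
    "4 CPU DIC": [0.006, 0.055, 2.7, 67.4],
    "Tesla C2070": [0.02, 0.055, 0.249, 2.9],
}


def speedup(baseline, accelerated):
    """Element-wise ratio baseline / accelerated; values above 1 favour the GPU."""
    return [b / a for b, a in zip(baseline, accelerated)]


gpu = times["Tesla C2070"]
for name in ("1 CPU", "4 CPU", "4 CPU DIC"):
    ratios = speedup(times[name], gpu)
    row = ", ".join(f"{n:.0e} cells: {r:.2f}x" for n, r in zip(cells, ratios))
    print(f"Tesla C2070 vs. {name}: {row}")
```

On the smallest mesh the ratio is below 1, meaning the GPU is slower than the CPU there; the advantage only appears once the mesh is large enough to keep the GPU busy.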
Cavity3D in OpenFOAM
Compared against Intel Q8400
[Plot: GPU memory requirements. x-axis: number of cells for cavity 3D; y-axis: GPU memory [MB]. Data labels on the plot: 2, 9, 74.4, 642.5, 5683.2.]
Fig. GPU memory required for different resolutions of the cavity3D OpenFOAM case (icofoam). Logarithmic scale.
Cavity3D in OpenFOAM
Compared against Intel Q8400
[Plot: Time required for computation of the first 20 solver iterations. x-axis: number of cells for cavity 3D; y-axis: time [s]. Series: CPU 4 cores DIC, Tesla C2070.]
Fig. Time required to perform the first 20 iterations for different resolutions of the cavity3D OpenFOAM case (icofoam). Logarithmic scale.
Cavity3D in OpenFOAM
Compared against Intel Q8400
[Plot: Time needed for single iteration [sec.]. Mesh sizes on the x-axis: 1000, 10648, 103823, 1000000, 8998912. Series: CPU 4 cores DIC, Tesla C2070.]
Fig. Time required to perform a single iteration for different resolutions of the cavity3D OpenFOAM case (icofoam). Logarithmic scale.
Cavity3D in OpenFOAM
Compared against Intel Q8400
[Plot: Acceleration over Intel Q8400 @ 2.66GHz. Mesh sizes on the x-axis: 1000, 10648, 103823, 1000000, 8998912. Series: Tesla C2070 over 4 core CPU, diagonal preconditioner; Tesla C2070 over 4 core CPU, DILU/DIC preconditioner.]
Fig. Acceleration over the 4-core Intel Q8400 CPU for different resolutions of the cavity3D OpenFOAM case (icofoam).
Cavity3D in OpenFOAM
Compared against Intel Xeon E5620 2.4GHz, DDR3 PC3-10666
Time needed for computation of one solver iteration [s]

Cells      Xeon E5620 (1 core, DIC)   Xeon E5620 (4 cores, DIC)   Tesla C2070 (diagonal)
1000       0.0048275                  0.005645                    0.02
10648      0.0644525                  0.033275                    0.055
103823     1.3695                     0.469925                    0.249
1000000    31.122                     9.309                       2.9
8998912    662.504                    271.8                       51.8
Fig. Time [s] required to perform a single iteration for different resolutions of the cavity3D OpenFOAM case (icofoam). Logarithmic scale.
Cavity3D in OpenFOAM
Compared against Intel Xeon E5620 2.4GHz, DDR3 PC3-10666
Achieved acceleration for one solver iteration

Cells      Xeon E5620, 4 cores vs. 1 core   Tesla C2070 vs. Xeon E5620, 4 cores
1000       0.855                            0.282
10648      1.937                            0.605
103823     2.914                            1.887
1000000    3.343                            3.210
8998912    2.437                            5.247
Fig. Acceleration of one solver iteration for different resolutions of the cavity3D OpenFOAM case (icofoam).
Cavity3D in OpenFOAM
Compared against Intel Xeon E5620 2.4GHz, DDR3 PC3-10666
Time needed for computation of 20 solver iterations [s]

Cells      Xeon E5620 (1 core, DIC)   Xeon E5620 (4 cores, DIC)   Tesla C2070
1000       0.0825                     0.06175                     0.64
10648      0.78                       0.325                       0.73
103823     12.025                     4.6425                      2.29
1000000    404.41                     94.17                       24.3
8998912    8704.84                    3290.18                     600
Fig. Time [s] required to perform 20 iterations for different resolutions of the cavity3D OpenFOAM case (icofoam). Logarithmic scale.
Cavity3D in OpenFOAM
Compared against Intel Xeon E5620 2.4GHz, DDR3 PC3-10666
Achieved acceleration for 20 solver iterations

Cells      Xeon E5620, 4 cores vs. 1 core   Tesla C2070 vs. Xeon E5620, 4 cores
1000       1.336                            0.096
10648      2.400                            0.445
103823     2.590                            2.027
1000000    4.294                            3.875
8998912    2.646                            5.484
Fig. Acceleration of 20 solver iterations for different resolutions of the cavity3D OpenFOAM case (icofoam).
Visualization (Arael vs. OpenFOAM)
Evaluators, customers
Don’t wait. Just SpeedIT.
Lukasz Miroslaw, PhD CEO lukasz.miroslaw@vratis.com Tel. 0048 796 997 288 info@vratis.com www.vratis.com & speedit.vratis.com