3D FVM-based solver for transient incompressible flows finally on GPU


GPU-based acceleration of FVM-based solver for incompressible transient flows with OpenFOAM

Internal Document, Copyright Vratis Ltd. 2011


Vratis ¡ Founded in 2006 as spin-off of Wroclaw University of Technology. ¡ Main activity: machine vision in biology, medicine and industry, hardware acceleration in science. ¡  Collaboration ¡ SlidePath Ltd. (Digital Pathology) ¡ ETH Zurich (Life Cell Imaging) ¡ Engys, iconCFD, IT’IS Foundation


• Product released in 2009.
• Objective: hardware acceleration of Computational Fluid Dynamics (CFD) simulations.
• Motivation: time, cost and size constraints of HPC systems.
• Typical simulations need hundreds of GB of storage.
• Hospitals have no infrastructure for expensive clusters and supercomputers.
• Typical duration of patient-specific simulations: several days or weeks.


SpeedIT in Action
Proposed solution: a hardware accelerator that can be placed close to the patient.

Workflow stages:
• Data acquisition (MRI, CT)
• Determination of the geometry
• Model building
• Solving of PDE equations
• Post-processing
• Visualisation


SpeedIT Toolkit
• GPU-based library of iterative solvers commonly used in CFD simulations:
  • BiCGSTAB for non-symmetric matrices.
  • CG for symmetric matrices.
  • Preconditioners: diagonal.
  • Support for complex numbers.
  • OpenFOAM plugin.
  • Classic and eXtreme version.

• Testing environment:
  • The University of Florida Sparse Matrix Collection and matrices from our CFD problems.
  • Single and double precision, CSR matrix format.
  • Performance with and without transfer.
  • GPU: GTX 275; CPU: Intel Core 2 Quad 2.66 GHz.
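As an illustration of the kind of solver listed above, here is a minimal CPU-side sketch of the conjugate gradient method with a diagonal (Jacobi) preconditioner on a matrix stored in CSR format. It is not SpeedIT code and does not use the SpeedIT API; the function names (`csr_matvec`, `csr_diagonal`, `pcg_diagonal`) and the tolerance defaults are assumptions made for this example only.

```python
import math


def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix by a dense vector x."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def csr_diagonal(values, col_idx, row_ptr):
    """Extract the main diagonal of a CSR matrix."""
    diag = [0.0] * (len(row_ptr) - 1)
    for i in range(len(diag)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                diag[i] = values[k]
    return diag


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def pcg_diagonal(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Diagonally preconditioned CG for a symmetric positive definite matrix.

    Returns (solution, number_of_iterations).
    """
    n = len(b)
    inv_diag = [1.0 / d for d in csr_diagonal(values, col_idx, row_ptr)]
    x = [0.0] * n
    r = list(b)                                  # residual for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]   # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for iteration in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, iteration
        ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rz / dot(p, ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


# Small symmetric positive definite test system:
# [[4, -1, 0], [-1, 4, -1], [0, -1, 4]] * [1, 2, 3] = [2, 4, 10]
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
solution, iterations = pcg_diagonal(values, col_idx, row_ptr, [2.0, 4.0, 10.0])
```

The sparse matrix-vector product dominates the cost of each iteration, which is why a memory-bound operation of this kind is a natural candidate for GPU offloading.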

License: Creative Commons: Attribution


Test matrices

Matrix name     Nonzero elements  Rows     Memory double precision [MB]  Memory single precision [MB]
F1              26837113          343791   308.44                        206.06
Freescale1      18920347          3428755  229.61                        157.43
hood            10768436          220542   124.08                        83.00
inline_1        36816342          503712   423.25                        282.81
ldoor           46522475          952203   536.04                        358.57
msdoor          20240935          415863   233.23                        156.01
nd12k           14220946          36000    162.88                        108.63
boneS10         55468422          914898   638.28                        426.68
nd24k           28715634          72000    328.90                        219.36
boneS10_M       18489474          914898   215.09                        144.55
p029_66k_run9   10646210          323029   123.07                        82.46
cage14          27130349          1505785  316.23                        212.73
run20           3928604           165911   45.59                         30.61
crankseg_1      10614210          52804    121.67                        81.18
crankseg_2      14148858          63838    162.16                        108.19
af_shell10      52672325          1508065  608.54                        407.61
audikw_1        77651847          943695   892.25                        596.04
bmw3_2          11288630          227362   130.06                        86.99
bmwcra_1        10644002          148770   122.38                        81.77
bone010         71666325          986703   823.92                        550.53
bone010_M       23888775          986703   277.15                        186.02
apache1         542184            80800    6.51                          4.44
test_nliter     24271103          713969   280.48                        187.90
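The memory figures for the test matrices appear consistent with plain CSR storage: one value and one 4-byte column index per nonzero, plus a row-pointer array of 4-byte integers, reported in units of 2^20 bytes. This is an inference from the numbers, not something the slides state, so the helper below (with the hypothetical name `csr_memory_mib`) should be read as a sketch under that assumption.

```python
def csr_memory_mib(nonzeros, rows, value_bytes, index_bytes=4):
    """Estimate the CSR storage footprint in units of 2**20 bytes.

    Assumed layout (not stated in the slides):
      values      : nonzeros   * value_bytes
      col indices : nonzeros   * index_bytes
      row pointer : (rows + 1) * index_bytes
    """
    total_bytes = nonzeros * (value_bytes + index_bytes) + (rows + 1) * index_bytes
    return total_bytes / 2**20


# F1 entry from the test matrices: 26837113 nonzeros, 343791 rows
f1_double = csr_memory_mib(26837113, 343791, value_bytes=8)
f1_single = csr_memory_mib(26837113, 343791, value_bytes=4)
print(f"F1 double precision: {f1_double:.2f} MB")
print(f"F1 single precision: {f1_single:.2f} MB")
```

Under this assumption the F1 entry rounds to the listed 308.44 MB (double) and 206.06 MB (single). Note that single precision saves only about a third of the memory, because the index arrays do not shrink.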


Iterative solvers




SpeedIT 1.2 vs. CUDA 4.0 February 2011

CUSPARSE developed by NVIDIA


Integration with OpenFOAM
• Integration is possible with the help of the OpenFOAM plugin (GNU GPL).
• GPU-based iterative solvers (SpeedIT) can be turned on at run time with a few changes in the configuration files.
• The GPU-based solver (Arael) is a standalone application that can be run in an OpenFOAM directory.


Validation in OpenFOAM

Fig. Cavity3D results computed with OpenFOAM and SpeedIT (lines) compared with literature data [31] for Re=100 (dotted line) and Re=400 (solid line), without a turbulence model.




Simulation of the blood flow in human aorta

Fig. Blood flow simulations run with the potential flow solver, simpleFoam and pisoFoam. For more information see Z. Malecha, L. Miroslaw, T. Tomczak, Z. Koza, M. Matyka, W. Tarnawski, D. Szczerba. GPU-based simulation of 3D blood flow in abdominal aorta using OpenFOAM. Arch. Mech. 63, 137-161 (2011).


OpenFOAM speed-up

Poor acceleration due to memory bottleneck

Z. Malecha, L. Miroslaw, T. Tomczak, Z. Koza, M. Matyka, W. Tarnawski, D. Szczerba. GPU-based simulation of 3D blood flow in abdominal aorta using OpenFOAM. Arch. Mech. 63, 137-161 (2011).


ARAEL ¡ Full GPU-acceleration. All most time-consuming operations are performed on the GPU card (NVIDIA) ¡ Iteratively solves Navier-Stokes equation with Finite Volume Method. ¡ Support for 3D unstructured meshes in OpenFOAM and VTK format. ¡ Solves problem with geometries up to 9 millions cells (estimated for NVIDIA C2070). ¡ Supported platforms: Linux 32-/64bit.


ARAEL 1.0 ¡ PISO (Pressure Implicit with Split Operator): Transient solver for incompressible flow. ¡ SIMPLE (Semi-implicit Method for Pressure-Linked Equation): Steady-state solver for incompresssible flow. ¡ Boundary conditions: zero gradient, fixed-value. ¡ Easy to use C-style interface.

Road Map
• Support for multi-GPU.
• Turbulence (RANS, kOmegaSST model with robust hybrid wall function).
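To make the two boundary-condition types listed for ARAEL 1.0 concrete, the sketch below shows how fixed-value and zero-gradient conditions typically enter a cell-centred finite-volume discretisation. It uses a deliberately simple model problem (steady 1D diffusion on a unit interval with unit diffusivity) rather than the Navier-Stokes equations, and it is not ARAEL code: the function names and the `("fixedValue", v)` / `("zeroGradient", None)` tuples are assumptions chosen for this example.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[i] multiplies x[i-1], upper[i] multiplies x[i+1]."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def diffusion_1d(n_cells, left_bc, right_bc):
    """Steady 1D diffusion on [0, 1] with unit diffusivity, cell-centred FVM."""
    dx = 1.0 / n_cells
    a = 1.0 / dx  # face coefficient between two neighbouring cell centres
    lower = [0.0] * n_cells
    diag = [0.0] * n_cells
    upper = [0.0] * n_cells
    rhs = [0.0] * n_cells
    for i in range(n_cells):
        if i > 0:
            lower[i] = -a
            diag[i] += a
        if i < n_cells - 1:
            upper[i] = -a
            diag[i] += a
    for idx, (kind, value) in ((0, left_bc), (n_cells - 1, right_bc)):
        if kind == "fixedValue":
            # Boundary face lies half a cell from the centre: coefficient 2/dx.
            diag[idx] += 2.0 * a
            rhs[idx] += 2.0 * a * value
        elif kind == "zeroGradient":
            # Zero flux through the boundary face: no contribution.
            pass
        else:
            raise ValueError(f"unknown boundary condition: {kind}")
    return solve_tridiagonal(lower, diag, upper, rhs)


linear = diffusion_1d(4, ("fixedValue", 1.0), ("fixedValue", 0.0))
flat = diffusion_1d(4, ("fixedValue", 1.0), ("zeroGradient", None))
```

With fixed values on both ends the cell values fall linearly between the two boundary values; replacing the right-hand condition with zero gradient removes the flux through that face, so the field becomes uniform at the left boundary value.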


Cavity3D in OpenFOAM
Compared against Intel Q8400

Time [s] needed for a single iteration of the PISO solver:

Number of cells  1.00E+03  1.06E+04  1.04E+05  1.00E+06
1 CPU            0.012     0.165     4.1       92.2
4 CPU            0.007     0.065     3.85      90.4
4 CPU DIC        0.006     0.055     2.7       67.4
Tesla C2070      0.02      0.055     0.249     2.9

Fig. Time [s] needed for a single iteration of the PISO solver for different resolutions of the cavity3D OpenFOAM case (icoFoam) and different preconditioners on the CPU (diagonal and DIC) and on the GPU (diagonal only). Logarithmic scale.
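The per-iteration timings above translate directly into speed-up factors. The short sketch below divides the "4 CPU DIC" times by the "Tesla C2070" times for each mesh size; the variable and function names are assumptions made for this example, and the numbers are copied from the timing data on this slide.

```python
# Time [s] for a single PISO iteration, per mesh size (from the slide data)
cells = [1.00e3, 1.06e4, 1.04e5, 1.00e6]
times = {
    "1 CPU": [0.012, 0.165, 4.1, 92.2],
    "4 CPU": [0.007, 0.065, 3.85, 90.4],
    "4 CPU DIC": [0.006, 0.055, 2.7, 67.4],
    "Tesla C2070": [0.02, 0.055, 0.249, 2.9],
}


def speedup(baseline, accelerated):
    """Element-wise ratio baseline / accelerated; values above 1 mean faster."""
    return [b / a for b, a in zip(baseline, accelerated)]


gpu_vs_cpu = speedup(times["4 CPU DIC"], times["Tesla C2070"])
for n, s in zip(cells, gpu_vs_cpu):
    print(f"{n:10.0f} cells: {s:6.2f}x")
```

On these figures the GPU is slower than the four-core CPU for the smallest mesh (a ratio of about 0.3), breaks even at roughly 10^4 cells, and reaches a factor of about 23 at 10^6 cells.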


Cavity3D in OpenFOAM
Compared against Intel Q8400

GPU memory requirements
[Chart: GPU memory [MB] vs. number of cells for cavity 3D. Data labels in increasing order: 2, 9, 74.4, 642.5, 5683.2 MB.]

Fig. GPU memory required for different resolutions of the cavity3D OpenFOAM case (icoFoam). Logarithmic scale.


Cavity3D in OpenFOAM
Compared against Intel Q8400

Time required for computation of the first 20 solver iterations
[Chart: time [s] vs. number of cells for cavity 3D. Series: CPU 4 cores DIC, Tesla C2070.]

Fig. Time required to perform the first 20 iterations for different resolutions of the cavity3D OpenFOAM case (icoFoam). Logarithmic scale.


Cavity3D in OpenFOAM
Compared against Intel Q8400

Time needed for a single iteration [s]
[Chart: time [s] vs. number of cells (1000, 10648, 103823, 1000000, 8998912). Series: CPU 4 cores DIC, Tesla C2070.]

Fig. Time required to perform a single iteration for different resolutions of the cavity3D OpenFOAM case (icoFoam). Logarithmic scale.


Cavity3D in OpenFOAM
Compared against Intel Q8400

Acceleration over Intel Q8400 @ 2.66 GHz
[Chart: acceleration vs. number of cells (1000, 10648, 103823, 1000000, 8998912). Series: Tesla C2070 over 4-core CPU, diagonal preconditioner; Tesla C2070 over 4-core CPU, DILU/DIC preconditioner.]

Fig. Acceleration over the 4-core Intel Q8400 CPU for different resolutions of the cavity3D OpenFOAM case (icoFoam).


Cavity3D in OpenFOAM
Compared against Intel Xeon E5620 2.4 GHz, DDR3 PC3-10666

Time [s] needed for computation of one solver iteration:

Number of cells                                 1000       10648      103823    1000000  8998912
Intel Xeon E5620 (1 core, DIC preconditioner)   0.0048275  0.0644525  1.3695    31.122   662.504
Intel Xeon E5620 (4 cores, DIC preconditioner)  0.005645   0.033275   0.469925  9.309    271.8
Tesla C2070, diagonal preconditioner            0.02       0.055      0.249     2.9      51.8

Fig. Time [s] required to perform a single iteration for different resolutions of the cavity3D OpenFOAM case (icoFoam). Logarithmic scale.


Cavity3D in OpenFOAM
Compared against Intel Xeon E5620 2.4 GHz, DDR3 PC3-10666

Achieved acceleration for one solver iteration:

Number of cells                                         1000   10648  103823  1000000  8998912
Intel Xeon E5620, 4 cores vs. Intel Xeon E5620, 1 core  0.855  1.937  2.914   3.343    2.437
Tesla C2070 vs. Intel Xeon E5620, 4 cores               0.282  0.605  1.887   3.210    5.247

Fig. Acceleration for different resolutions of the cavity3D OpenFOAM case (icoFoam).


Cavity3D in OpenFOAM
Compared against Intel Xeon E5620 2.4 GHz, DDR3 PC3-10666

Time [s] needed for computation of 20 solver iterations:

Number of cells         1000     10648  103823  1000000  8998912
Xeon E5620 1 core DIC   0.0825   0.78   12.025  404.41   8704.84
Xeon E5620 4 cores DIC  0.06175  0.325  4.6425  94.17    3290.18
Tesla C2070             0.64     0.73   2.29    24.3     600

Fig. Time [s] required to perform 20 iterations for different resolutions of the cavity3D OpenFOAM case (icoFoam). Logarithmic scale.


Cavity3D in OpenFOAM
Compared against Intel Xeon E5620 2.4 GHz, DDR3 PC3-10666

Achieved acceleration for 20 solver iterations:

Number of cells                                         1000   10648  103823  1000000  8998912
Intel Xeon E5620, 4 cores vs. Intel Xeon E5620, 1 core  1.336  2.400  2.590   4.294    2.646
Tesla C2070 vs. Intel Xeon E5620, 4 cores               0.096  0.445  2.027   3.875    5.484

Fig. Acceleration for different resolutions of the cavity3D OpenFOAM case (icoFoam).


Visualization (Arael vs. OpenFOAM)


Evaluators, customers


Don’t wait. Just SpeedIT.

Lukasz Miroslaw, PhD, CEO
lukasz.miroslaw@vratis.com
Tel. 0048 796 997 288
info@vratis.com
www.vratis.com & speedit.vratis.com

