Page of Kees Vuik with information on Scientific Computing on GPUs

Scientific Computing on GPUs

The Graphics Processing Unit (GPU) is increasingly used for Scientific Computing. For relatively low cost one can obtain supercomputer performance (1 Teraflop). However, some work is needed to make an ordinary program suitable for the GPU. One of the important tools is CUDA (Compute Unified Device Architecture), an extension of the C programming language that makes it easier to program the GPU. Furthermore, in many cases algorithms have to be adapted to make them suitable for GPU computing. Finally, the algorithm, its implementation, and its use of the hierarchy of the GPU must be optimized to obtain really high speedups.

Please find below some items which are important for the Numerical Analysis Group of the TU Delft.

GPU Teaching
GPU Research
GPU Publications
GPU Software
GPU Hardware

GPU Teaching

Delft University of Technology is a recognized NVIDIA GPU Education Center.

We teach the course "Introduction/advanced course Programming on the GPU with CUDA" a number of times. The teachers are Prof.dr.ir. C. Vuik, Ir. C.W.J. Lemmens, and Dr. M. Möller.

The next course will be given on November 2-3, 2022 (registration). Please consult the flyer for more details.

GPU Research

The core of our research is how to invent and implement algorithms that solve systems of discretized partial differential equations efficiently. Below we list some of the work that has been done and some new Bachelor, Master, and PhD thesis projects. I am a member of the NIRICT Reconnaissance Topic: Performance and Correctness of GPGPU Applications team.

Bachelor Projects

Acceleration of the CONTACT-package by using a GPU (Pieter Loof, start February 2011)
Acceleration of the CONTACT-package by using a GPU (Michiel de Reus, August 2010). A short version of this report has appeared in Machazine, 15(2), 26-27, 2010.

Master Projects

Parallel GPU solver for PLAXIS (Jorn Hoofwijk, PLAXIS)
Fast calculation of portfolio credit losses and the sensitivities on GPU (Arvind Nayak, ING)
GPU implementation of cellular traction forces in agent-based models (Wilbert Gorter, Vortech)
Performance comparison of implicit and explicit schemes for the shallow water equations on a GPU with FORTRAN90 code (Floris Buwalda, Deltares)
GPU acceleration of the PWTD algorithm for application in high-frequency communication and photonics (Rory Gravendeel)
Implementing Ecological Models on GPU (Lotte Peeters, NIOZ)
Parallelization of Flow Modelling Code on a GPU (Simao Pereira)

Simao visited our department to write his Master's thesis (July-September 2011). He is a student of Prof. Miguel Nobrega from the University of Minho, Portugal.

Developing a fast (fast-time) solver for large sparse matrices for MARIN (Martijn de Jong)

A GPU implementation of a bubbly flow solver (Rohit Gupta, August 2010). This research was presented at the GPU Technology Conference 2010 (slides). A TUD report has also appeared.

PhD Projects

GPU accelerated iterative solvers for discretized partial differential equations (Rohit Gupta, start September 2010)

This work was presented at the GPU Technology Conference 2012, May 14-17, 2012, San Jose, California, USA (presentation and slides).

On a GPU implementation of shifted Laplace preconditioned solvers for the Helmholtz equation (Hans Knibbe, start September 2009). He obtained one of the first results on the Little Green Machine (LGM). A paper has also appeared, and a presentation was given at the ENUMATH Conference 2011.

This work was presented at the GPU Technology Conference 2012, May 14-17, 2012, San Jose, California, USA (presentation and slides).

Presentations

At the International Conference On Preconditioning Techniques For Scientific And Industrial Applications, May 16-18, 2011, Bordeaux, France, the following presentation was given:

A fast GPU Implementation of the Deflated Preconditioned Conjugate Gradient method

Kees Vuik, Rohit Gupta and Kees Lemmens

Part of the motivation to enhance computing by GPUs comes from the film and gaming industry. To illustrate this, please view the presentation Fast GPU Preconditioning for Fluid Simulations in Film Production by Dan Bailey of the company Double Negative. Also check how their key artists use Maths and/or Science from day to day.

GPU Software

This zip file contains software to solve a linear system Ax = b by the Deflated Preconditioned Conjugate Gradient Method on the GPU under certain assumptions. Please save the file and unzip it. In the final directory there are two readme files which have to be read in the following order:

preREADME
operatingREADME

Deflation is also used in the PARALUTION library, which lets you run various sparse iterative solvers and preconditioners on multi/many-core CPU and GPU devices. The open source variant of this code is available on GitHub.
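For readers new to the method, the sketch below shows the preconditioned Conjugate Gradient iteration that underlies the solver mentioned above. It is only an illustration under stated assumptions: it runs on the CPU in plain Python, uses a simple Jacobi (diagonal) preconditioner, and leaves out the deflation step entirely. It is not the code from the zip file, and the names matvec, dot, and pcg_jacobi are hypothetical.

```python
# Minimal sketch of Jacobi-preconditioned Conjugate Gradient for A x = b.
# Assumptions: A is a dense, symmetric positive definite list of lists;
# deflation is NOT included; all names are illustrative only.

def matvec(A, x):
    """Return the matrix-vector product A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    """Return the inner product of u and v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b; return (x, number of iterations performed)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                    # residual for x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]   # Jacobi preconditioner
    z = [d * ri for d, ri in zip(inv_diag, r)]     # preconditioned residual
    p = list(z)                                    # first search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0               # avoid dividing by zero
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:       # relative residual test
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                    # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz                         # direction update factor
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

if __name__ == "__main__":
    # 1D Laplacian test matrix (tridiagonal: -1, 2, -1) of size 5.
    n = 5
    A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
          for j in range(n)] for i in range(n)]
    b = matvec(A, [1.0] * n)       # right-hand side with known solution
    x, iters = pcg_jacobi(A, b)
    print(x, iters)
```

Each iteration consists of one matrix-vector product, a few inner products, and vector updates, which are the operations a GPU implementation would typically offload.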

GPU Hardware

The Little GREEN Machine II

A description of the machine is given here.

Press bulletins

Little Green Machine-II is funded by NWO.

The Little GREEN Machine I

Press bulletins

Volkskrant 22 april 2013
Technisch Weekblad 3-04-2010 (Dutch)
NRC Handelsblad 11-03-2010 (final part, Dutch)
Explosie in de rekenwereld (Dutch)
The Little Green Machine (Dutch)

64-bit Linux clusters

In addition, DIAM has two 64-bit Linux clusters, one with 8 nodes and the other with 16 nodes. Both have state-of-the-art dual or quad core Intel processors with about 16 GByte internal memory for each node. These systems are mainly used for heavy computations that cannot be done on an ordinary desktop. These applications run either standalone or in an MPI-based cluster environment.

GPU processors

The most recent (Nov 2010) GPU processor is the "Nvidia Tesla C2070", also known as the Fermi, with 6 Gigabyte internal memory.

Recently, all cluster nodes were equipped with GPU processors by Nvidia, which are known to give a performance boost of a factor of 20-100 for several mathematical operations that do not involve recurrence. We also acquired one of the fastest architectures available at the time (Feb 2010): the "Nvidia Tesla C1060", which will be used in the near future for advanced mathematical computations.
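To make the remark about recurrence concrete, the following sketch contrasts an elementwise operation with a first-order linear recurrence. It is plain Python on the CPU and only illustrates the data dependencies; the function names and the particular recurrence are assumptions chosen for illustration, not taken from our codes.

```python
# Illustrative only: contrasts an operation without recurrence against one with.

def scale_and_shift(x, a, b):
    """y[i] = a * x[i] + b.

    Every output depends only on its own input, so all elements could be
    computed independently, for example one GPU thread per element.
    """
    return [a * xi + b for xi in x]

def linear_recurrence(x, a):
    """y[i] = a * y[i-1] + x[i], with y[-1] taken as 0.

    Every output depends on the previous output, so this straightforward
    loop has to run in order and cannot simply be split over threads.
    """
    y = []
    prev = 0.0
    for xi in x:
        prev = a * prev + xi
        y.append(prev)
    return y

if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print(scale_and_shift(data, 2.0, 1.0))
    print(linear_recurrence(data, 0.5))
```

The first function maps directly onto the GPU, while the second needs a reformulated algorithm before it can benefit.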

Student lab facility

DIAM also has a student lab facility with 16 simple Linux desktops (now also equipped with a simple GPU). This lab room is used for instructions connected with our math courses, but also to organize courses where we teach our students and new researchers how to use e.g. MPI on the clusters and the GPUs.

Contact information: Kees Vuik
