Improving the Reliability of Computer Simulations in the Prediction of Environmental Risks

Internally funded project


Start date: 01.08.2004

End date: 31.07.2006


Project details

Scientific Abstract

This project studies the use of clusters of clusters for the numerical solution of partial differential equations. From an abstract point of view, clusters of clusters are a special case of heterogeneous parallel machines and hence of grid computing. However, in contrast to the extreme vision of grid computing, in which ideally all sorts of computing resources are combined to solve a task, we deliberately restrict ourselves to a comparatively small number of parallel machines. Each cluster in the computational grid (to use the buzzword) has a dedicated internal network that is much more powerful than the standard internet connection between the participating machines.

Programming a cluster of clusters is similar to programming single-CPU machines with deep memory hierarchies: in both cases, the heterogeneous nature of the hardware has to be taken into account in order to use the resources efficiently.
The experiences and results of the Dime project make it clear that algorithms and programs developed on (and for) homogeneous parallel machines have to be adapted to the heterogeneous setting. It turns out that the differences in the network connections are more difficult to accommodate than the differences in node performance between the various architectures involved. The node differences are taken into account in the partitioning step as a form of a priori load balancing.
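
As an illustration of a priori load balancing in the partitioning step, the following minimal sketch splits a number of grid cells among clusters in proportion to their relative speeds. The function name `partition_by_speed` and the speed values are hypothetical and not taken from the project; the actual partitioner presumably also has to account for the geometry of the subdomains and the communication between them.

```python
def partition_by_speed(n_cells, speeds):
    """Split n_cells among clusters in proportion to their relative speeds.

    Hypothetical helper for illustration only: 'speeds' are assumed,
    externally measured performance weights (larger means faster).
    """
    total = sum(speeds)
    # Ideal (fractional) share of cells for each cluster.
    ideal = [n_cells * s / total for s in speeds]
    # Round down first, then hand out the remaining cells one by one.
    counts = [int(x) for x in ideal]
    leftover = n_cells - sum(counts)
    # Clusters with the largest fractional remainder get an extra cell.
    order = sorted(range(len(speeds)),
                   key=lambda i: ideal[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# Three clusters, the second assumed to be twice as fast as the others.
cells_per_cluster = partition_by_speed(1000, [1.0, 2.0, 1.0])
```

With these assumed weights, the faster cluster receives half of the cells, so that all clusters would finish a sweep at roughly the same time if the speed estimates were accurate.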

The project covers the development of both algorithms and programs. For the program development, we build on PACX-MPI, the MPI implementation from the partners in Stuttgart. On the side of numerical algorithms, we concentrate on extrapolation-based acceleration techniques for domain decomposition methods. The project has set up a cluster of clusters with machines from Houston, Stuttgart and Erlangen and has successfully completed the first test runs.
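
To indicate why extrapolation can help in this setting, here is a minimal sketch under stated assumptions. The abstract does not name the extrapolation scheme; Aitken's delta-squared formula is used here as one well-known example. The model problem (u'' = 0 on [0, 1] with u(0) = 0, u(1) = 1, two overlapping subdomains [0, b] and [a, 1]) and all function names are illustrative and not taken from the project.

```python
def schwarz_sweep(g, a=0.4, b=0.6):
    """One alternating Schwarz sweep for u'' = 0, u(0) = 0, u(1) = 1.

    g is the current guess for the interface value u(b).
    Subdomains are [0, b] and [a, 1] with overlap (a, b).
    """
    # Left solve on [0, b] with u(0) = 0, u(b) = g is linear: u(x) = g*x/b.
    left_at_a = g * a / b
    # Right solve on [a, 1] with u(a) = left_at_a, u(1) = 1, evaluated at b.
    return left_at_a + (1.0 - left_at_a) * (b - a) / (1.0 - a)


def aitken(g0, g1, g2):
    """Aitken delta-squared extrapolation from three successive iterates."""
    denom = (g2 - g1) - (g1 - g0)
    if denom == 0.0:
        return g2  # iterates already stationary
    return g2 - (g2 - g1) ** 2 / denom


# Two plain sweeps starting from a zero interface guess ...
g0 = 0.0
g1 = schwarz_sweep(g0)
g2 = schwarz_sweep(g1)
# ... followed by one extrapolation step.
accelerated = aitken(g0, g1, g2)
exact = 0.6  # the exact solution u(x) = x gives u(b) = b
```

In this model problem the interface value changes by a fixed factor per sweep, so the extrapolated value agrees with the exact interface value up to rounding after only two sweeps, whereas the plain iterate `g2` is still far from it. Each sweep requires an exchange of interface data between subdomains, so saving sweeps matters most where that exchange crosses the slow connection between clusters.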

Involved: