Parallel Computing

Parallel computing is the use of multiple computing resources to solve a problem. These problems can be very wide ranging, from small examples through to highly complex cosmological simulations and weather prediction codes. Utilising parallel computing adds additional complexities and challenges to programming: the programmer must consider a wide variety of new concepts and change their mindset from sequential to parallel. Having said that, the world we live in is predominantly parallel, and as such it is natural to model problems in this way.

The Problem

Current parallel languages are either conceptually simple or efficient, but not both. These aims have, until this point, been contradictory. If parallel computing is to grow (as we predict, given current advances in CPU and GPU technology) then this issue must be addressed. The problem is that we are using current, sequential ways of thinking to try to solve this programmability problem; instead we need to think outside the box and come up with a completely new solution.

Current Solutions

There are numerous parallel language solutions currently in existence; we will consider just a few:

Message Passing Interface

The MPI standard is extremely popular within this domain. Although bindings exist for many languages, it is most commonly used with C. The result is low-level, highly complex, difficult-to-maintain but efficient code. As the programmer must control all aspects of parallelism, they can often get caught up in low-level details which are uninteresting but important. Additionally, the programmer is completely responsible for ensuring all communications complete correctly, or else they run the risk of deadlock, livelock and similar problems.
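
As a minimal illustrative sketch (not drawn from any particular code) of the style MPI imposes, consider a simple point-to-point exchange in C, where the programmer must match every send with a receive by hand:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {          /* this sketch needs at least two processes */
        MPI_Finalize();
        return 1;
    }

    if (rank == 0) {
        value = 42;
        /* the programmer must order sends and receives manually;
           mismatched or circular calls can deadlock */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}

Even in this trivial example the programmer deals with ranks, tags, datatypes and buffer sizes explicitly; getting the ordering of sends and receives wrong in a larger code is a common source of deadlock.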

Bulk Synchronous Parallel

The BSP standard was once touted as the solution to parallel computing. Implementations of this standard are most commonly used in conjunction with C. The program is split into supersteps, and each superstep consists of three stages: computation, communication and global synchronisation via barriers. However, this synchronisation is very expensive, and as such the performance of BSP is generally much poorer than that of MPI. In addition, although the communication model adopted by BSP is simpler, the programmer must still address low-level issues (such as pointers) imposed by the underlying language.
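
A minimal sketch of the superstep structure, written against the BSPlib C interface (the neighbour exchange here is purely illustrative):

#include <bsp.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    bsp_begin(bsp_nprocs());            /* start of the SPMD section */

    int left = -1;
    bsp_push_reg(&left, sizeof(int));   /* make 'left' remotely writable */
    bsp_sync();                         /* superstep boundary: registration takes effect */

    /* computation phase: work out who my right-hand neighbour is */
    int me = bsp_pid();
    int dest = (me + 1) % bsp_nprocs();

    /* communication phase: write my pid into my neighbour's 'left' */
    bsp_put(dest, &me, &left, 0, sizeof(int));
    bsp_sync();                         /* barrier: all puts are now visible */

    printf("Process %d: my left neighbour is %d\n", me, left);
    bsp_end();
    return 0;
}

Note how a communication only becomes visible at the barrier closing its superstep; it is exactly these global barriers that make BSP simple to reason about but expensive in practice.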

High Performance Fortran

In HPF the programmer just specifies the general distribution of data, with the compiler taking care of all other aspects of parallelism (such as computation distribution and communication). Although HPF is a simple, abstract language, efficiency suffers because so much emphasis is placed upon the compiler to deduce parallelism. The programmer, who is often in a far better position to indicate parallel aspects, lacks control and is limited. One useful feature of HPF is that all parallel aspects are expressed via comments, such that an HPF program is also acceptable to a normal Fortran compiler.

Co-Array Fortran

This language is more explicit than HPF. The programmer, via co-arrays, distributes computation and data but must rely on the compiler to determine communication (which is often one-sided). Because of this one-sided communication, messages are often short, which results in the overhead of sending many different messages. Having said this, things are improving with reference to CAF: the upcoming Fortran standard is said to include co-arrays, which will see the integration of the CAF concepts into standard Fortran.

Unified Parallel C

UPC is certainly a parallel language to keep an eye on; much development time and effort is being spent on it at the moment. UPC uses an explicit parallel execution model with a shared address space. Memory management primitives, shared-memory keywords and pointers are added to the language. Adding all these keywords bloats the language and results in a brittle, tightly coupled design. Additionally, C's array model is inherited, which is limiting in data-intensive parallel computations, and one must still deal with pointers and the low-level challenges that these impose.
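
To give a flavour of these extensions, a small illustrative sketch (UPC is a superset of C, so this is ordinary C plus the shared qualifier, upc_forall and the built-in MYTHREAD and THREADS identifiers):

#include <upc.h>
#include <stdio.h>

/* 4 elements per thread, distributed cyclically by default;
   THREADS is a UPC built-in */
shared int a[4*THREADS];

int main(void) {
    int i;
    /* the affinity expression &a[i] means each thread updates
       only the elements it owns locally */
    upc_forall (i = 0; i < 4*THREADS; i++; &a[i]) {
        a[i] = i * 2;
    }
    upc_barrier;            /* wait for all threads before reading */
    if (MYTHREAD == 0)
        printf("a[1] = %d (with %d threads)\n", a[1], THREADS);
    return 0;
}

The distribution and affinity keywords keep simple data-parallel loops concise, but they are precisely the additions that bloat the language, and any irregular data structure drops the programmer straight back to raw pointers.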

Titanium

This is an explicitly parallel version of Java; it is safe, portable and allows one to build complex data structures. Similar to UPC, it uses a global address space, with numerous keywords and constructs added to the language to support parallelism. However, the object-oriented model has an imposed (hidden) cost in terms of serialising and deserialising objects. There is also literature which indicates that the JRE does not consider memory locality, which is important for performance in HPC applications working on large data sets.