Parallel Computing

Parallel computing is the use of multiple computing resources to solve a problem. These problems range widely, from small examples to highly complex cosmological simulations and weather prediction codes. Utilising parallel computing adds complexities and challenges to programming: the programmer must consider a wide variety of new concepts and change their mindset from sequential to parallel. Having said that, the world we live in is predominantly parallel, and as such it is natural to model problems in this way.

The Problem

Current parallel languages are either conceptually simple or efficient, but not both; these aims have, until this point, been contradictory. If parallel computing is to grow (as we predict, given current advances in CPU and GPU technology) then this issue must be addressed. The problem is that we are using current, sequential ways of thinking to try to solve this programmability problem; instead we need to think "outside the box" and come up with a completely new solution.

Current Solutions

There are numerous parallel language solutions currently in existence; we will consider just a few:

Message Passing Interface

The MPI standard is extremely popular within this domain. Although bindings exist for many languages, it is most commonly used with C. The result is low-level, highly complex, difficult-to-maintain but efficient code. Because the programmer must control all aspects of parallelism, they can often get caught up in low-level details which are uninteresting but important. Additionally, the programmer is completely responsible for ensuring that all communications complete correctly, or else they run the risk of deadlock, livelock and similar failures.
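
As a flavour of this explicitness, consider the minimal sketch below (our own illustrative example in C, not code from the standard itself): rank 0 sends a single integer to rank 1, and the programmer alone is responsible for ensuring the send and receive match.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, value = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* The programmer must ensure rank 1 posts a matching
               receive, otherwise this send may block forever */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("Rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }

Even this trivial exchange forces the programmer to manage ranks, tags, datatypes and matching explicitly; real codes multiply this burden considerably.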

Bulk Synchronous Parallel

The BSP standard was once touted as the solution to parallel computing. Implementations of this standard are most commonly used in conjunction with C. The program is split into supersteps, and each superstep is split into three stages: computation, communication and global synchronisation via barriers. However, this synchronisation is very expensive, and as such the performance of BSP is generally much poorer than that of MPI. Additionally, although the communication model adopted by BSP is simpler, the programmer must still address low-level issues (such as pointers) imposed by the underlying language used.
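
The superstep structure looks roughly as follows. This is a hedged sketch against the Oxford BSPlib C interface (bsp_begin, bsp_put, bsp_sync and friends are BSPlib calls; the neighbour exchange itself is our own example):

    #include <stdio.h>
    #include "bsp.h"

    int main(void) {
        bsp_begin(bsp_nprocs());
        int left = 0;
        bsp_push_reg(&left, sizeof(int));  /* make 'left' remotely writable */
        bsp_sync();

        /* Superstep: local computation... */
        int mine = bsp_pid() * bsp_pid();
        /* ...then communication: write our value to our right neighbour */
        int dest = (bsp_pid() + 1) % bsp_nprocs();
        bsp_put(dest, &mine, &left, 0, sizeof(int));
        /* ...then global synchronisation; only after the barrier is
           'left' guaranteed to hold the incoming value */
        bsp_sync();

        printf("Process %d received %d\n", bsp_pid(), left);
        bsp_pop_reg(&left);
        bsp_end();
        return 0;
    }

Note that every process must reach every bsp_sync; it is this global barrier at the end of each superstep that simplifies reasoning but dominates the cost.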

High Performance Fortran

In HPF the programmer just specifies the general distribution of data, with the compiler taking care of all other aspects of parallelism (such as computation distribution and communication). Although it is a simple, abstract language, because so much emphasis is placed upon the compiler to deduce parallelism, efficiency suffers. The programmer, who is often in a far better position to indicate parallel aspects, lacks control and is limited. One useful feature of HPF is that all parallel aspects are expressed via comments, so that an HPF program is also acceptable to a normal Fortran compiler.

Co-Array Fortran

This language is more explicit than HPF. The programmer, via co-arrays, distributes computation and data, but must rely on the compiler to determine communication (which is often one-sided). Because of this one-sided communication, messages are often short, which results in the overhead of sending many separate messages. Having said this, things are improving for CAF: the upcoming Fortran standard is said to include co-arrays, which will see the integration of CAF concepts into standard Fortran.

Unified Parallel C

UPC is certainly a parallel language to keep an eye on; much development time and effort is being spent on it at the moment. UPC uses an explicit parallel execution model with a shared address space, adding memory management primitives, shared-memory keywords and pointers to the language. Adding all these keywords bloats the language and results in a brittle, tightly coupled design. C's array model is also inherited, which is limiting in data-intensive parallel computations, and one must still deal with pointers and the low-level challenges that these impose.
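
To give a flavour of the added keywords, here is a small sketch (UPC extends C; shared, upc_forall, upc_barrier, MYTHREAD and THREADS are standard UPC constructs, though the example itself is our own):

    #include <upc.h>
    #include <stdio.h>

    #define N 100

    shared int a[N], b[N], c[N];  /* live in the global shared address space */

    int main(void) {
        int i;
        /* Each thread executes only the iterations whose affinity
           expression (&a[i]) resides in its portion of shared memory */
        upc_forall(i = 0; i < N; i++; &a[i])
            a[i] = b[i] + c[i];
        upc_barrier;              /* explicit global synchronisation */
        if (MYTHREAD == 0)
            printf("computed with %d threads\n", THREADS);
        return 0;
    }

The convenience is real, but each such construct is a new keyword bolted onto C, which is exactly the bloat referred to above.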

Titanium

This is an explicitly parallel version of Java; it is safe, portable and allows one to build complex data structures. Similarly to UPC, it uses a global address space, with numerous keywords and constructs added to the language to support parallelism. However, OO has an imposed (hidden) cost in terms of serialising and deserialising objects. There is also literature which indicates that the JRE does not consider memory locality, which is important for performance in HPC applications working on large data sets.

ZPL

ZPL is an array programming language. The authors of this language observed that a large majority of parallel programming is done with respect to arrays of data, and to this end they created a language with specific keywords and constructs to assist with this. For instance, the expression A=B*C will set array A to the elementwise product of arrays B and C. Whilst this is a useful abstraction, parallelism itself is unfortunately implicit, with limited control on behalf of the programmer. The net result is that much emphasis is placed upon the compiler to find the best solution and, with limited information, performance is adversely affected. Incidentally, in Mesham the types have been written such that a concept like array programming can be easily included; the same expression is perfectly acceptable to Mesham, with the complexity of the operation being handled in the type library.
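
For comparison, what the single expression A=B*C abstracts away is an explicit elementwise loop; in plain C (our own sketch) the sequential form is:

    #define N 1024

    /* A = B * C written out as the elementwise loop the array
       notation hides; a parallel implementation must additionally
       decide how these iterations and the arrays are distributed */
    void multiply(double a[N], double b[N], double c[N]) {
        for (int i = 0; i < N; i++)
            a[i] = b[i] * c[i];
    }

In ZPL the distribution decisions are made by the compiler; in Mesham they are expressed and controlled through the types.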

NESL

NESL is a functional parallel language. Numerous people believe that functional programming is the answer to the problem of parallel languages. However, the programmer is so far abstracted from the actual machine that it is not possible to hand-optimise their code (they are completely reliant on the compiler's efficiency), nor is it often possible to determine directly the runtime cost of an algorithm (although it is often possible to derive this theoretically). This high level of abstraction means that it is difficult, and in some cases impossible, for the NESL programmer to elicit high performance with current compiler technology. There is also the, sometimes misguided, belief amongst programmers that functional languages are difficult to learn. Whilst this is not always the case, it does put many programmers off, especially when the performance benefits of learning NESL are mediocre at best.