Difference between revisions of "Tutorial - RMA"

From Mesham

Latest revision as of 15:45, 15 April 2019

Tutorial number eight

Introduction

The default behaviour in Mesham is for communication involving variables to be performed via Remote Memory Access (RMA). This is one sided: the source process remotely retrieves data from, or writes data to, a target process. We briefly looked at this in the shared memory tutorial and here we build on that to consider the concepts in more depth.

Data visibility

function void main() {
  var a:Int::allocated[single[on[1]]];
  var b:Int::allocated[multiple[]];
  var c:Int::allocated[multiple[commgroup[0,1]]];
  var d:Int::allocated[single[on[0]]];

  b:=a;
  proc 1 {
     c:=a;
  };
  d:=a;
  proc 1 {
     d:=a;
  };
};

In the code snippet above, exactly what communications are occurring, i.e. are processes reading remote data or writing to remote data? The best way to think about this is via a simple visibility rule: all variables marked multiple (including those with an extra commgroup type) are private to the processes that contain them, and all variables marked single are publicly visible to all processes. Therefore in the assignment b:=a each process will remotely read a, held on process one, and write it into its own local (private) copy of b. In the assignment c:=a, only process one will write the value of a (a local read, as a is held on the same process) into its own local (private) version of c; the value of c on process zero will remain unchanged. For variables marked single, assignment favours reading the value remotely if possible rather than writing remotely. For instance, the first d:=a (outside any proc block) will result in process zero reading the value of a from process one. The second d:=a sits inside a proc 1 block, so the only process that can execute it is process one, and this results in a remote write of a to variable d held on process zero.

Synchronisation

By default RMA is non-blocking, so remote reads or writes might complete at any point and need to be synchronised before values are available. This approach is adopted for performance and scalability, such that many reads and/or writes can occur between synchronisation points. The sync keyword provides synchronisation in Mesham and there are two ways to use it. Firstly, sync on its own will result in a barrier synchronisation, where each process will complete all of its outstanding RMA and then wait (barrier) for all other processes to reach that same point. The other use is with a variable, for instance sync v (assuming variable v already exists), which will ensure that all outstanding RMA involving only variable v will complete. This second use does not involve any form of barrier, so it is far more efficient. It is fine to synchronise on a variable which has no outstanding RMA communications; in this case the process will continue immediately.

Completion of outstanding RMA means that all communications have fully completed, i.e. remote writes have completed and the data is visible on the target process.

function void main() {
  var a:Int::allocated[single[on[1]]];
  var b:Int::allocated[multiple[]];

  b:=a;
  sync b;
};

The code snippet above raises a potential question. Given the assignment b:=a (which involves RMA), if the programmer wishes to synchronise the RMA for this assignment, should they issue sync b or sync a? The simple answer is that it doesn't matter. For synchronisation purposes an assignment ties the variables together so that, for instance, sync b will synchronise RMA for variable b, RMA for variable a, and any other tied RMA for both these variables and their own tied variables.
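
As a minimal sketch of the tying rule, the variant below reuses the same declarations but issues sync a instead of sync b. Going by the rule just described, it should complete the same remote read. The // comments assume C-style comment syntax:

function void main() {
  var a:Int::allocated[single[on[1]]];
  var b:Int::allocated[multiple[]];

  // each process issues a non-blocking remote read of a into its private b
  b:=a;
  // a and b are tied by the assignment, so this also completes the RMA on b
  sync a;
};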

Eager RMA

var a:array[Int,10]::allocated[single[on[1]]];
proc 0 {
    var i;
    for i from 0 to 9 {
        a[i]:=i;
    };
    sync a;
};   

We saw this example previously: process zero will most likely write out the value of 10 (variable i after the loop) to all elements of the array. This is because the remote write is issued based on the variable rather than on the value the variable held when the assignment was made. You could instead place the sync a call directly after the assignment, or alternatively remove this call altogether and append the eageronesided type to the type chain of variable a, which will ensure the RMA communication and completion is atomic.
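
The first of these two fixes can be sketched as follows, with sync a moved inside the loop so that each remote write completes before i changes. This is an illustrative rearrangement of the snippet above rather than code from the original tutorial; the loop bounds are chosen here to cover all ten elements and the // comments assume C-style comment syntax:

var a:array[Int,10]::allocated[single[on[1]]];
proc 0 {
    var i;
    for i from 0 to 9 {
        // issue the remote write of the current value of i
        a[i]:=i;
        // complete the write before the next iteration updates i
        sync a;
    };
};

Synchronising on every iteration gives up the ability to batch many writes between synchronisation points, so this trades performance for the expected values.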

Bulk Synchronous RMA

Many of the RMA examples we have seen in these tutorials follow a bulk synchronous approach (similar to fences), where all processes will synchronise, then communicate and then synchronise again before continuing with computation.

function void main() {
  var a:Int::allocated[single[on[1]]];
  var b:Int::allocated[multiple[]];

  proc 1 {
     a:=55;
  };
  sync;
  b:=a;
  sync;
  proc 1 {
     a:=15;
  };
};

Because RMA communication is non-blocking and may complete at any point from issuing the communication up until the synchronisation, the example here needs two sync calls. The first one ensures that process zero doesn't race ahead and issue the remote read before process one has written the value of 55 into variable a. The second synchronisation call ensures that process one doesn't then rush ahead and overwrite the value of a with 15 until process zero has finished remotely reading it. If this last assignment (a:=15) did not exist, then the last synchronisation could be weakened into sync b (or sync a), which would complete RMA on process zero at that point and leave process one free to rush ahead.
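
The weakened form described in the last sentence can be sketched like this. It is the example above with the final assignment removed and the second barrier replaced by sync b; the // comments assume C-style comment syntax:

function void main() {
  var a:Int::allocated[single[on[1]]];
  var b:Int::allocated[multiple[]];

  proc 1 {
     a:=55;
  };
  // barrier: no process reads a before process one has written 55
  sync;
  b:=a;
  // completes the read of a into b without any barrier
  sync b;
};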

Notify and wait

The bulk synchronous approach is simple but not very scalable. It is certainly possible to play with different synchronisation options (for instance putting them inside the process selection blocks), but care must be taken over data consistency. Another approach is to use the notify and wait support of the parallel function library. The notify function will send a notification to a specific process and the wait function will block and wait for a notification from a specific process.

#include <io>
#include <string>
#include <parallel>

function void main() {
    var j:array[Int,10]::allocated[single[on[1]]];	

    proc 0 {
        var d:array[Int,10];
        var i;
        for i from 0 to 9 {
            d[i]:=i;
        };
        j:=d;
        sync j;
        notify(1);
    };

    proc 1 {
        wait(0);
        var i;
        for i from 0 to 9 {
            print(itostring(j[i])+"\n");
        };
    };
};

In the example here process zero will issue a remote write to variable j (held on process one), then synchronise (complete) this RMA before sending a notification to process one. Process one will block waiting for a notification from process zero, and once it has received the notification it will display its local values of j. Due to the notification and waiting, these values will be those written by process zero. If you comment out the wait call then process one will just display zeros.

There are some variations of these calls: notifyall to notify all processes, waitany to wait for a notification from any process, and test_notification to test whether or not there is a notification from a specific process.

#include <io>
#include <string>
#include <parallel>

function void main() {
    var j:array[Int,10]::allocated[single[on[2]]];	

    proc 0 {
        var d:array[Int,10];
        var i;
        for i from 0 to 9 {
            d[i]:=i;
        };
        j:=d;
        sync j;
        notifyall();
    };
    proc 1 {
        var m:array[Int,10];
        var p:=waitany();
        m:=j;
        sync m;
        var i;
        for i from 0 to 9 {
            print(itostring(m[i])+" written by process "+itostring(p)+"\n");
        };		
    };
    proc 2 {
        while (!test_notification(0)) { };
        var i;
        for i from 0 to 9 {
            print("Local value is "+itostring(j[i])+"\n");
        };
    };
};

This example extends the previous one. Here j is held on process two; process zero remotely writes to it and then issues notifyall to send a notification to every other process. The other two processes could have used the wait call as per the previous example. Instead, process one waits on a notification from any process (waitany returns the ID of the process that issued the notification, which is displayed) and process two tests for a notification and loops whilst this returns false.
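
For comparison, here is a sketch of the same program with both receiving processes using the plain wait call, as the paragraph above says they could. It only rearranges calls already shown in this tutorial, and the // comments assume C-style comment syntax:

#include <io>
#include <string>
#include <parallel>

function void main() {
    var j:array[Int,10]::allocated[single[on[2]]];

    proc 0 {
        var d:array[Int,10];
        var i;
        for i from 0 to 9 {
            d[i]:=i;
        };
        j:=d;
        // complete the remote write before telling anyone about it
        sync j;
        notifyall();
    };
    proc 1 {
        // block until process zero has notified
        wait(0);
        var m:array[Int,10];
        // remote read of j (held on process two) into a local copy
        m:=j;
        sync m;
        var i;
        for i from 0 to 9 {
            print(itostring(m[i])+"\n");
        };
    };
    proc 2 {
        // blocking wait replaces the test_notification polling loop
        wait(0);
        var i;
        for i from 0 to 9 {
            print("Local value is "+itostring(j[i])+"\n");
        };
    };
};

With wait(0) the source of the notification is fixed in advance, so process one no longer needs the process ID that waitany returned.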