Thursday, July 7, 2011

Implementing concurrent loops in ParaSail

The implementation of the ParaSail compiler and virtual machine (see http://parasail-programming-language.blogspot.com/2010/11/virtual-machine-for-parasail-with.html) are proceeding apace.  Basic support is there for interfaces, classes, operators, functions, procedures, module instantiations, object and type declarations, if, case, while, and assignment statements, user-defined indexing, parallel evaluation of expressions, parallel execution of statements on either side of a "||", etc.  However, a big gap at the moment is for loops, and in particular concurrent for loops.

For loops are a bit unusual in ParaSail, as they support both a continue loop and an exit  loop (i.e. "break") capability in the presence of parallelism.  Here is an example using continue loop:
for X => Root while X not null loop
    case X.Kind of
      [#leaf] => Process(X.Data);
      [#unary] =>
          Process(X.Data);
        ||
          continue loop with X => X.Operand;
      [#binary] =>
          Process(X.Data);
        ||
          continue loop with X => X.Left;
        ||
          continue loop with X => X.Right;
    end case;
end loop;
Here is an example using exit loop:
const Result : Id_Type; 
for X => Root then X.Left || X.Right while X not null 
  concurrent loop
    if X.Value == Desired_Value then
        exit loop with Result => X.Id;
    end if;
end loop with Result => null;
This first example processes an expression tree, and creates parallelism by using an explicit continue loop statement in concert with a concurrent sequence ("||") construct. The second example searches a binary tree and creates parallelism by both the use of concurrent loop and by having two values in the then part of the for loop header, which will result in the execution of the loop splitting into two separate threads at each binary node.

The semantics of saying concurrent loop rather than simply loop is that the next iteration starts before even starting the current iteration.  If we had said simply loop in the second example, then it would first check the Root node to see if it has the Desired_Value, and only if it did not would we proceed on to the next iteration(s), at which point we would check both the Left and Right subtrees (presuming both were non-null).  Since we said concurrent loop, the iteration on the Left and Right subtrees begins in parallel with executing the loop body on the Root.

The semantics of the exit loop statement are that all other threads within the loop being exited are stopped, and when the thread first reaching the exit loop is the only thread remaining, the assignments specified by the with-values clause are performed.  If the loop ends normally without an exit loop, then the assignments specified by the end with-values clause are performed.  The net effect is that if the Desired_Value is in the tree, Result ends up with the Id of that node.  If it is not found, Result ends up null.

See http://parasail-programming-language.blogspot.com/2010/06/generalized-for-loops-in-parasail.html and http://parasail-programming-language.blogspot.com/2010/06/intentional-race-condition-in-parasail.html for more details about the generalized for loop capabilities.

So given all of the above, what should be the implementation model for a for loop in ParaSail?  As mentioned in the blog entry on the ParaSail virtual machine (PSVM -- see link in first paragraph above), the implementation uses pico-threads to represent the potential parallelism in the code.  There are PSVM instructions for invoking a separate operation, or a local block, as a separate parallel pico-thread.  Currently this is used to support parallel evaluation of expressions or concurrent sequences ("||"), where each operand of a call on an operation, or each side of "||",  can be evaluated in its own pico-thread.  In fact, the compiler is a bit smarter than that, and it doesn't bother to create a pico-thread if the computation of the operand involves only simple built-in operations (such as basic arithmetic, numeric comparisons, etc.).  If it involves an out-of-line call then it becomes a candidate for being turned into a pico-thread (this is clearly an area for tuning of the compiler as more experience is gained).

So it makes sense for the body of a concurrent loop, or a loop that gains parallelism through multiple parallel "next" values or explicit uses of continue loop inside a concurrent sequence, to be executed as a separate pico-thread.  The compiler can turn the loop body into a local operation, with the parameter(s) being the loop index (or indices).  A continue loop is essentially a (tail) recursive call on this local loop-body operation, with the parameter being the next value specified in the with-values clause.  Similarly, if there are explicit "next" values given after then in the loop header, these can be handled using (parallel) calls on the loop-body operation, with the "next" value as the parameter.

Rather than using tail recursion, an alternative is to use a parallel call followed by a return, with the pico-thread for the parallel call being associated with a pico-thread master that controls the entire loop.  At the outer level, the whole loop can then be executed by creating the pico-thread master, and then starting the loop off with a parallel call on the loop-body operation, followed by waiting for the pico-thread master to be complete. 

The overall flow would be:
  • Create pico-thread master.
  • Initiate parallel-call on loop-body operation with parameter referring to initial value of loop index, presuming initial value satisfies the while condition (if any).
  • Wait for pico-thread master to be complete.
At the end of an iteration of a loop, or at a continue loop statement, the flow would be:
  • Determine next value(s) for loop index.  
  • Initiate parallel call(s) on loop-body operation with parameter referring to next value, presuming value satisfies the while condition (if any).
  • Return from the loop-body operation (don't wait for the call(s) to complete).
At the beginning of an iteration of a concurrent loop, the flow would be:
  • Determine next value(s) for loop index.  
  • Initiate parallel call(s) on loop-body operation with parameter referring to next value, presuming value satisfies the while condition (if any).
  • Perform the loop body for the current value.
  • Return from the loop-body operation (don't wait for the call(s) to complete).
When reaching an exit loop statement, the flow would be:
  • Tell pico-thread master of loop to terminate all but the current thread.
  • Wait for other threads to terminate.
  • Perform with-values assignment(s).
  • Return from the loop-body operation.

When the master is complete:
  • If there is a with-values clause on the end loop and the master completed normally, as opposed to being terminated early by an exit loop statement:
    • Perform the assignment(s) specified by the end loop with-values clause.

No comments:

Post a Comment