Designing ParaSail, a new programming language: October 2009

Tuesday, October 20, 2009

Using "=?" in ParaSail to implement "==", "!=", "<=", ">=", etc.

Here is a simple ParaSail feature. It is often the case that a user-defined type will have its own definition for equality and comparison. However, it is always a bit of an issue to make sure that "==" and "!=" are complements, and similarly for "<" and ">=", ">" and "<=". ParaSail skirts this issue by requiring the user to define only one operator, "=?", which returns a value from the enumeration Less, Equal, Greater, Unordered. The comparison and equality operators are defined in terms of "=?" (which is pronounced "compare"), in the natural way:

"==" -- Result of "=?" is Equal
"!=" -- Result of "=?" is not Equal
"<" -- Result of "=?" is Less
">" -- Result of "=?" is Greater
"<=" -- Result of "=?" is Less or Equal
">=" -- Result of "=?" is Greater or Equal

"Unordered" is used for types with only a partial ordering (or no ordering at all). "Ordered" is defined as the subtype containing only the values "Equal," "Less," and "Greater." The typical generic "Sort" operation would expect a "=?" operator whose result is guaranteed (via a postcondition) to be within the "Ordered" subtype.

The "=?" operator can be used explicitly if desired, which is often convenient when searching a binary tree:

    case Desired_Key =? Node.Key of
        Equal :   return Node;
        Less :    return Search(Node.Left, Desired_Key);
        Greater : return Search(Node.Right, Desired_Key);
    end case;

This would be another situation where the "=?" would be required to return an Ordered result, since we don't have an alternative that handles "Unordered."

Saturday, October 17, 2009

ParaSail's implicit parallelism

As mentioned in the original overview of ParaSail, implicit parallelism is at the heart of ParaSail (and of its name!). Every procedure/function call with multiple parameters involves implicit parallelism, in that all of the parameters are evaluated in parallel. Handoff semantics is used for writable parameters, meaning that a (non-concurrent) object that is writable by one parameter evaluation, isn't available to any other parameter evaluation. Operations on concurrent objects are not ordered.

Furthermore, in a sequence of statements, the only default limitation on parallelism is that imposed by operations on non-concurrent objects. Parallelism can be inhibited by using ";;" instead of merely ";" to separate two statements. This forces sequencing for operations on concurrent objects, which would otherwise not have any specified order. By contrast, "||" can be used to indicate that parallelism is expected, and it is an error if there are conflicting operations on non-concurrent objects in the statements on either side of the "||". That is, two statements separated by "||" are executed effectively in parallel, just as though there were embedded within two different parameter evaluations to a single procedure or function call.

For example:

    X := F(A, B);
    Y := G(B, C);;
    Z := H(R, S) || Z2 := Q(R, T);

As implied above by the formatting, "||" has highest precedence (binds most tightly), ";" next, and ";;" lowest (binds least tightly). So in the above, the assignments to X and Y are ordered only if that would be required by the shared use of B (at least one of them has B as a writable parameter). The assignments to Z and Z2 occur in parallel, and it would be an error if the shared use of R is unsafe (e.g. because R is writable by one or both, and is not a concurrent object). The assignments to X and Y will complete before beginning the assignments to Z and Z2 (though of course, a "very intelligent" compiler might still find parts that can be executed in parallel because they can't possibly affect one another). In all cases the evaluations of the parameters to a single procedure/function call happen in parallel.

Parentheses can be used to override the precedence for the statement separators:

    (M := N(A, B) ;; P := T(A, X)) ||
      (J := K(X, Y); M2 := N2(Y, Z));

In the above the assignment to M completes before the assignment to P starts. The assignments to J and M2 occur in parallel unless the shared use of Y makes that unsafe. And the assignments to M and P occur in parallel with the assignments to J and M2, and it is an error if there are conflicts due to shared use of X.

Note that ";" is both a statement separator and a statement terminator. On the other hand, ";;" and "||" are only meaningful as statement separators.

Implicit parallelism also shows up in ParaSail loops. The iterations of a for loop in ParaSail are executed as though each iteration were separated from the next by a ";", but with no particular ordering on the iterations. A particular ordering may be specified by using the forward or reverse keyword, which effectively puts a ";;" between the iterations. Alternatively, the concurrent keyword may be used to indicate that the iterations should be effectively separated by "||", in which case it is an error if there are any conflicts due to operations on a non-concurrent variable. For example:

    // compute sum of array A 
    for I in 1..10 loop
        Sum += A[I];
    end loop;

    // find last element equal to Z
    for I in 1..10 reverse loop
        if A[I] == Z then
             Last_Z := I;
             exit loop;
        end if;
    end loop;

    // sum the arrays B and C to produce A
    for I in 1..10 concurrent loop
        A[I] := B[I] + C[I];
    end loop;

Not too surprisingly, an exit loop statement is only permitted in a forward or reverse loop.

As illustrated above, parallelism is the default in ParaSail. It is our belief that that is the only way that programmers will learn to take full advantage of the multicore world. That is, in ParaSail, it is more work to create non-parallel algorithms than to create parallel ones, so programmers, being fundamentally a lazy lot, will end up writing mostly parallel algorithms in ParaSail.

We will cover ParaSail's structured synchronization mechanism (concurrent interfaces) in another post.

Monday, October 12, 2009

ParaSail has no global variables

ParaSail has no global variables. So how does that work? Global variables have long been recognized as trouble makers in the software world, and they get much worse in a language with lots of parallelism. But can we eliminate global variables completely? ParaSail tries to do that by eliminating any notion of statically allocated variables. In C or C++, global variables are those marked static or extern, or in a language with modules/packages like Ada, global variables are those declared at the module level. Although their declaration might be hidden, such variables still represent global, variable state, and still create trouble.

In ParaSail, interface modules are generic types, and class modules implement interfaces. That is, after providing generic parameters to an interface, you end up with a type. Any variable declared in a class is what is often called an instance variable, that is, there is one per object of the type defined by the class. There is nothing corresponding to what are called class variables. In other words, the variables declared in a class will always be effectively fields/components of some enclosing object. You can of course also have local variables inside the body of a procedure or function, but these are never statically allocated; they are stack-resident variables only (automatic in C parlance).

So what do we have instead? In many cases, we simply substitute local variables declared in some relatively high-level procedure or function body, which are then passed (by reference) to other operations as parameters. In ParaSail, passing a variable as a parameter generally has handoff semantics, meaning that it can't be simultaneously passed to multiple operations that might execute in parallel, nor can it be passed twice to the same operation as two different parameters (thereby eliminating certain kinds of aliasing problems).

When it is necessary to have a data structure that can be accessible to multiple operations running in parallel, the data structure must be explicitly designed for concurrent access. In other words it must be a concurrent data structure. In ParaSail, an interface may be specified as a concurrent interface, and then all objects of a type defined by the interface support concurrent access.

Containers represent a kind of middle ground, where the enclosing container might be designed for concurrent access, but the individual elements might not be, and so handoff semantics would again apply to the individual elements.

There may be some concurrent interfaces that typically have exactly one object; that is they are typically singleton interfaces. For example, the file system, or the database manager, or the system clock. ParaSail provides a conceptual global singleton Context interface for these, whose singleton object has one component for each such singleton interface, identified using the name of the interface. The Context must appear as a parameter for operations that ultimately need access to one of these singleton interfaces, signifying that the operation has potential side effects on these singleton objects. It would always be preferable to pass in the actual singleton objects needed, but as a convenience, the Context pseudo-object is available.

The Context pseudo-object is provided as a parameter to the main entry point of a ParaSail application, so that it can gain access to these singletons and pass them on to callers. If an operation has a Context parameter, and its caller also has a Context parameter, then the caller's parameter is passed automatically to the operation if not specified explicitly. The Context parameter may be read-only ("in") or read-write ("in-out"/"var"). Not surprisingly, the actual Context parameter must be read-write if the formal is read-write.

Saturday, October 3, 2009

ParaSail Topics to come

The prior entries gave an overview of ParaSail. Over the next few weeks I hope to write additional entries about:

ParaSail has no global variables, so how does that work?
how iterators, streams and for-loops are interrelated in ParaSail;
using a single compare primitive ("=?") in ParaSail to implement "==", "!=", "<=", ">=", etc.
ParaSail's implicit parallelism and structured synchronization;
ParaSail private interfaces and interface namespace hierarchy;
ParaSail's use of region-based storage management rather than object-at-a-time garbage collection;
ParaSail's use of compile-time checks (mostly), run-time checks (goal is to never need them), and exceptions (only when complexity or race-conditions make them necessary);
how attributes like taintedness (i.e. might this data come from a malicious source?) or physical units (e.g. meters/second, years, ...) can be "layered" onto existing types, along with appropriate pre- and postconditions, to provide additional security, safety, or correctness checking. Add-on attributes can also be associated with statements or other ParaSail constructs, such as an attribute indicating whether a given statement or operation is potentially blocking (that is, it might wait indefinitely on some queue or some other synchronization structure), or the total amount of stack space required by a given statement or operation.

So stay tuned. If you as a reader have other topics you would be interested in seeing further elaborated, please post a comment to that effect.

Thursday, October 1, 2009

ParaSail pass-by-reference

In ParaSail, operations like array indexing generally take and return a ref. What exactly is a ref? The intent is that it is roughly an address, but we would also like to use them as representing an arbitrary place in a more general container-like structure (such as a hash table). And we might want to use them to query whether an object is present, without actually inserting into the container as a side-effect. So it might be useful to think of a ref as a special kind of object with three operations:

fetch contents
store contents
check if present/exists/is-initialized

One could define the "Ref" interface roughly as follows:

  abstract interface Ref<Target_Type is Any<>> 
  is
      function Exists(R : Ref) -> Boolean;
      function Fetch(R : Ref {Exists(R)}) -> Target_Type;
      procedure Store(R : Ref; 
        New_Contents : Target_Type) {Exists(R)};
  end interface Ref;

A ref is sort of a place-holder, or perhaps a prescription for where to store an object, without necessarily making room for the object until it is actually stored. For a hash-table like container, a ref to one of its elements might be the key value, which isn't actually inserted into the hash-table until the Store occurs. One could imagine a hashed-mapping or a sparse array where you specify a "default" value for elements that have never been stored, and for such a structure, Exists would always return True, and Fetch would return the default value if no prior Store into the same ref had occurred. And conceivably Store would be a no-op if the default value is being stored and no prior non-default value had been stored.

Another characteristic of a ref is whether the target object can be written, or only read. That is, a read-only Ref would not have the Store operation. It is desirable if the read-only-ness of a ref automatically flows through a function that takes the ref as a parameter. That is, if the parameter to a function is a read-only ref, then clearly if the function returns a ref, then that returned ref should also be considered read-only.

This flow-through is common in many programming languages, where if an array is a constant, then so are its elements. It is desirable not to have to define two versions of operator "[]", one for read-only refs, and one for writable refs, since presumably they will have an essentially identical implementation. This argues for using ref as a parameter only when this flow-through semantics is desired, and presumably there will be an output that is also a ref. The read-only-ness of the result ref would be determined from that of the ref parameter(s), and only if all of the ref parameters are writable would the result ref be writable.

If we don't want this flow-through, and instead really want to specify that a parameter be writable, then we might as well use something more explicit such as "var" or "in out" for such a parameter mode.

Rather than building in the flow-through rule, we could instead add another query into our hypothetical Ref interface, namely Is_Writable, and then use appropriate pre- and postconditions to constrain how writability flows through. E.g. the postcondition for operator "[]" would be {Is_Writable(Result) = Is_Writable(Arr)}. Clearly also the precondition of the Store procedure would be {Is_Writable(R)}. This approach sounds more flexible, but it may be that we want the above flow-through rules as the default in the absence of some other specification.

Finally, we might want some default rules about whether Exists() should be True for a given ref parameter or result. One would think that in general one would expect Exists(X) to be true if X is an incoming ref parameter, but one would not necessarily expect Exists(Result) to be true if Result is a ref result of an operation like "[]". That is, the container should exist if you are indexing into it, but the selected element of the container doesn't necessarily exist just because you have used its key to index into the container.

Note that this general notion of Ref as an object in its own right would support object persistence fairly naturally, in that the Ref could be some kind of disk address, and not until you actually invoked Fetch or Store would you actually need to be sure that the relevant page of data is in memory. These Refs are analogous to the "smart pointers" of C++, but they have the added advantage that they automatically have limited lifetimes (like "regular" C++ references), so they won't outlive the container into which they refer.

When inside a function, the non-ref result(s) of the function are in fact probably represented as refs provided by the callers, indicating where the results should be stored. Such a ref would want to carry information about what region (in the region-based-storage-management sense) the result should be stored/built, so region can be thought of as another thing that is associated with a ref. These are sort of write-only refs, or at least ones where the function wouldn't want to presume that Fetch is well-defined before performing at least one Store via the ref. That is, Exists() would quite likely be False on entry to the function for each of the refs representing its results (aka "out" parameters).

Note that a general Ref object could also be used to represent an element of a packed array, where Fetch and Store would do the right thing in terms of unpacking/packing. Of course this is getting dangerously close to pass-by-name of Algol 60 fame, and clearly sounds like it would require a thunk or equivalent. But a (static or dynamic) polymorphic object and a thunk are not that much different anyway. The expectation would be that the types of these Ref objects would often be statically known, so a compiler could choose to inline as appropriate to achieve the desired level of performance.

We might want to generalize Ref objects further by adding Create_Referenced_Object and Delete_Referenced_Object operations, since we already have an operation for testing whether the referenced object Exists. This would make sense for a Ref to an element of a hash table, for example. Essentially for any Ref where Exists might return False, it might make sense to be able to Create or Delete the referenced object. In the above discussion, "Store" is presumed to create the referenced object if necessary, which is perhaps adequate for our purposes. But Delete would be useful as a way to restore the original state where Exists is False. This would only make sense for Refs where Exists can return False.

Designing ParaSail, a new programming language