Thursday, April 18, 2013

Systems Programming with Go, Rust, and ParaSail

Here is a paper that goes with the talk I am giving this coming Tuesday, April 23, at DESIGN West (aka the Embedded Systems Conference) in San Jose, comparing Go, Rust, and ParaSail.

If you are in the Bay Area, come on down.  The talk is ESC-218, on Tuesday, April 23, from 2:00-3:00 PM in Salon 3:  http://www.ubmdesign.com/sanjose/schedule-builder/session-id/98


NOTE: Some browsers are having trouble with this version of the paper.  It was cut and pasted from a Word document, which is always dicey.  A PDF version of this paper is available on the ParaSail Google Group:

   http://groups.google.com/group/parasail-programming-language


Systems Programming in the Distributed, Multicore World with Go, Rust, and ParaSail 
S. Tucker Taft
AdaCore
24 Muzzey Street, 3rd Floor
Lexington, MA  02421
taft@adacore.com

Abstract                
The distributed, multicore train is stopping for no programmer, and the systems programmer especially will need to be ready to hop onto the distributed parallel programming paradigm to keep their systems running as efficiently as possible on the latest hardware environments.  Three new systems programming languages have appeared in the last few years, each attempting to provide a safe, productive, and efficient parallel programming capability.  Go is a new language from Google, Rust is a new language from Mozilla Research, and ParaSail is a new language from AdaCore.  This talk will describe the challenges these languages are trying to address, and the similar and differing choices each has made in solving them.
Keywords multicore, distributed, parallel programming, systems programming language, Go language, Rust language, ParaSail language

1. Introduction
The distributed, multicore train is stopping for no programmer, and the systems programmer especially will need to be ready to hop onto the distributed parallel programming paradigm to keep their systems running as efficiently as possible on the latest hardware environments.  Three new systems programming languages have appeared in the last few years, each attempting to provide a safe, productive, and efficient parallel programming capability.  Go is a new language from Google [1], Rust is a new language from Mozilla [2], and ParaSail is a new language from AdaCore [3][4].
The designers of Go, Rust, and ParaSail all face a common challenge -- how to help programmers address the new distributed and multicore architectures, without letting the complexity of programming grow past what is manageable by the professional, yet still human, programmer.  All programming languages evolve, and as a rule, they tend to get more complex, not less so.  If every time a new hardware architecture becomes important the programming language is enhanced to provide better support for that architecture, the language can become totally unwieldy, even for the best programmers.  When the architecture changes radically, as with the new massively distributed and/or multicore/manycore architectures, this may mean that the language no longer hangs together at all, and instead has become a federation of sublanguages, much like a house that has been added onto repeatedly, with a different style for each addition.
Because of the complexity curse associated with language evolution, when there is a significant shift in the hardware landscape, there is a strong temptation to start over in programming language design.  After many years of a relatively stable programming language world, we now see a new burst of activity on the language design front, inspired in large part by the sense that our existing mainstream languages are either not going to be supportive enough, or that they are becoming simply too complex in trying to support both the old and new hardware paradigms through a series of language extensions.
2. Go from Google
Go, Rust, and ParaSail all emerged over the past few years, each with its own approach to managing complexity while supporting parallelism.  Go from Google is the brainchild of Rob Pike and his colleagues at Google.  Rob was at Bell Labs in the early Unix and C days, and in many ways Go inherits the C tradition of simplicity and power.  Unlike C, storage management has been taken over by the Go run-time through a general-purpose garbage collection approach, but like C, care is still needed in other areas to ensure overall program safety.
From the multicore perspective, Go uses goroutines for structuring large computations into a set of smaller potentially parallel computations.  Goroutines are easy to create – essentially any stand-alone call on a function or a method can be turned into a goroutine by simply prefixing it with the word “go.”  Once a goroutine is spawned, it executes independently of the rest of the code.  A goroutine is allowed to outlive the function that spawns it, thanks to the support of the garbage collector; local variables of the spawning function will live as long as necessary if they are visible to the spawned goroutine.
For goroutines to be useful, they need to communicate their results back to the spawning routine.  This is generally done using strongly-typed channels in Go.  A channel can be passed as a parameter as part of spawning a goroutine, and then as the goroutine performs its computation it can send one or more results into the channel.  Meanwhile, at some appropriate point after spawning the goroutine, the spawner can attempt to receive one or more values from the channel.  A channel can be unbuffered, providing a synchronous communication between sender and receiver, or it can provide a buffer of a specified size, effectively creating a message queue. 
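For illustration, here is a minimal sketch of both kinds of channel (the names here are hypothetical, not part of the word-count example below):

package main

import "fmt"

func main() {
    // Unbuffered channel: a send blocks until a receiver
    // is ready, so sender and receiver synchronize.
    done := make(chan int)
    go func() { done <- 42 }()
    fmt.Println(<-done)

    // Buffered channel of capacity 3: acts as a message
    // queue; sends do not block until the buffer is full.
    queue := make(chan string, 3)
    queue <- "a"
    queue <- "b"
    close(queue)
    for msg := range queue { // drains the remaining messages
        fmt.Println(msg)
    }
}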

Communication between goroutines can also go directly through shared global variables.  However, some sort of synchronization through channels or explicit locks is required to ensure that the shared variables are updated and read in an appropriate sequence.
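As a minimal sketch of that explicit-locking alternative, using the standard sync package (the shared counter variable is hypothetical):

package main

import (
    "fmt"
    "sync"
)

var (
    counter int        // shared global variable
    mu      sync.Mutex // lock protecting counter
)

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            mu.Lock() // serialize updates to the shared variable
            counter++
            mu.Unlock()
        }()
    }
    wg.Wait()
    fmt.Println(counter) // always prints 10
}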

Here is an example Go program that counts the number of words in a string, presuming words are separated by one or more separator characters.  It divides multi-character strings in half and passes the halves off to goroutines for recursive word counts:
func Word_Count(s string, separators string) int {
    slen := len(s)
    switch slen {
    case 0:
        return 0 // Empty string
    case 1: // single-char string
        if strings.ContainsRune(separators, rune(s[0])) {
            return 0 // A single separator
        } else {
            return 1 // A single non-separator
        }
    default: // divide string and recurse
        half_len := slen / 2
        // Create two chans and two goroutines
        left_sum := make(chan int)
        right_sum := make(chan int)
        go func() {
            left_sum <- Word_Count(s[0:half_len], separators)
        }()
        go func() {
            right_sum <- Word_Count(s[half_len:slen], separators)
        }()
        // Read the partial sums,
        // adjusting the total if a word was divided
        if strings.ContainsRune(separators, rune(s[half_len-1])) ||
            strings.ContainsRune(separators, rune(s[half_len])) {
            // No adjustment needed
            return <-left_sum + <-right_sum
        } else {
            // Minus 1 because a word was divided
            return <-left_sum + <-right_sum - 1
        }
    }
}
2.1 Unique Features of Go
Go has some unusual features.  Whether a declaration is exported is determined by whether its name begins with an upper-case letter (as defined by Unicode): if a package-level declaration, or the declaration of a field or a method, has a name starting with an upper-case letter, then it is visible outside the current package.
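For example (a sketch, with hypothetical names):

package wordcount

// Count begins with an upper-case letter, so it is exported:
// other packages may call wordcount.Count.
func Count(s string) int { return count(s) }

// count begins with a lower-case letter, so it is visible
// only within the wordcount package.
func count(s string) int { /* ... */ return 0 }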

Every Go source file begins with a specification of the package it is defining (possibly only in part).  One source file may import declarations from another by specifying the path to the file that contains the declarations, but within the importing code the exported declarations are referenced using the imported file’s package name, which need not match the imported file’s name.  Of course, projects typically establish naming conventions that align source file names and package names.
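A minimal sketch of this arrangement, with hypothetical file and package names:

// File myproject/textutil/words.go:
package textutil

func WordCount(s string) int { /* ... */ return 0 }

// File main.go, elsewhere in the project:
package main

import "myproject/textutil" // path to the imported source

func main() {
    // Referenced by the package name, not the file name:
    _ = textutil.WordCount("two words")
}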

Go provides a reflection capability, which is used, for example, to convert an object of an arbitrary type into a human-readable representation.  The “%v” format in Go’s version of printf does this, allowing arbitrarily complex structs to be written out with something as simple as:

fmt.Printf("%v", my_complex_object)

Printf is implemented in Go itself, using the “reflect” package.

There are no uninitialized variables in Go.  If a variable is not explicitly initialized when declared, it is initialized by default to the zero of its type, where each type has an appropriately-defined zero, typically either the zero numeric value or the nil (aka “null”) pointer value, or some composite object with all components having their appropriate zero value.
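For example, a minimal sketch of the default zero values:

func main() {
    var i int     // initialized to 0
    var s string  // initialized to "" (the empty string)
    var p *int    // initialized to nil
    var a [3]int  // initialized to [0 0 0]
    fmt.Println(i, s == "", p == nil, a) // prints: 0 true true [0 0 0]
}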
2.2 What Go Leaves Out
Because complexity was a major concern during all three of these language designs, some of the most important design decisions were about what to leave out of the language.  Here we mention some of the features that Go does not have.

Go does not permit direct cyclic dependencies between packages.  However, the Go interface capability permits the construction of recursive control or data structures that cross packages, because an interface declared in one package can be implemented by a type declared in another package without either package being directly dependent on the other.

Like C, Go has no generic template facility.  There are some built-in type constructors, such as array, map, and chan, which are effectively parameterized type constructors, but there is no way for the user to create their own such type constructor.  Unlike C, there is no macro facility which might be used to create something like a parameterized type.  Nevertheless, Go’s flexible interface and reflection capabilities allow the creation of complex data structures that depend only on the presence of a user-provided method such as Hash, together with functions like DeepEqual from the reflect package.
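For example, a container usable with any user-defined type can be coded against an interface rather than a type parameter.  Here is a minimal sketch (the Hashable interface and HashSet type are hypothetical, not from any Go library):

package hashset

import "reflect"

// Any type providing a Hash method can be stored in a HashSet.
type Hashable interface {
    Hash() uint32
}

type HashSet struct {
    buckets map[uint32][]Hashable
}

func (hs *HashSet) Add(x Hashable) {
    if hs.buckets == nil {
        hs.buckets = make(map[uint32][]Hashable)
    }
    h := x.Hash()
    for _, y := range hs.buckets[h] {
        if reflect.DeepEqual(x, y) {
            return // already present
        }
    }
    hs.buckets[h] = append(hs.buckets[h], x)
}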

Go does not allow user-defined operators.  Various operators are built in for the built-in types, such as int and float32.  Interestingly enough, Go does include built-in complex types (complex64 and complex128) with appropriate operators.
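For example:

var z complex128 = complex(1, 2) // the complex value 1+2i
fmt.Println(z * z)               // prints (-3+4i)
fmt.Println(real(z), imag(z))    // prints 1 2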

Go does not have exceptions.  However, functions and methods may return multiple results, and errors are often represented by a second return value, of the built-in type error, that is non-nil on error.  Unlike in C, you cannot silently discard such an extra result by assigning only the first one: you must either use it or explicitly assign it to the blank identifier “_”.  When things go really wrong in Go, a run-time panic ensues, and presumably during development, you are tossed into a debugger.
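A minimal sketch of this convention (the file name is hypothetical):

package main

import (
    "log"
    "os"
)

func main() {
    f, err := os.Open("words.txt")
    if err != nil {
        log.Fatal(err) // handle the error explicitly
    }
    defer f.Close()

    buf := make([]byte, 64)
    n, _ := f.Read(buf) // "_" explicitly discards the error result
    log.Printf("read %d bytes", n)
}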
3. Rust from Mozilla Research
The modern web browser represents one of the most complex and critical pieces of software of the internet era.  The browser is also a place where performance is critical, and there are many opportunities for using parallelism as a web page is “rendered.”  The Rust language arose originally as a personal project by one of the engineers at Mozilla Research (Graydon Hoare), and has now grown into a Mozilla-sponsored research effort.  Rust has been designed to help address the complexity of building components of a modern browser-centric software infrastructure, in the context of the new distributed multicore hardware environment.

Like Go, Rust has chosen to simplify storage management by building garbage collection into the language.  Unlike Go, Rust has chosen to restrict garbage collection to per-task heaps, and adopt a unique ownership policy for data that can be exchanged between tasks.  What this means is that data that can be shared between tasks is visible to only one of them at a time, and only through a single pointer at a time (hence an owning pointer).  This eliminates the possibility of data races between tasks, and eliminates the need for a garbage collector for this global data exchange heap.  When an owning pointer is discarded, the storage designated by the pointer may be immediately reclaimed – so no garbage accumulates in the global exchange heap.

Here is a Rust version of the Word Count program, recursing on multi-character strings with subtasks encapsulated as futures computing the subtotals of each string slice:

fn Word_Count(S: &str, Separators: &str) -> uint {
    let Len = S.len();
    match Len {
      0 => 0,   // Empty string
      1 =>      // one-char string
        if Separators.contains_char(S[0] as char) { 0 }
        else { 1 },  // 0 or 1 words
      _ => {    // Divide and recurse
        let Half_Len = Len/2;
        let Left_Sum = future::spawn(
          || Word_Count(S.slice(0, Half_Len), Separators));
        let Right_Sum = future::spawn(
          || Word_Count(S.slice(Half_Len, Len), Separators));
        // Adjust sum if a word is divided
        if Separators.contains_char(S[Half_Len-1] as char) ||
           Separators.contains_char(S[Half_Len] as char) {
            // No adjustment needed
            Left_Sum.get() + Right_Sum.get()
        } else {
            // Subtract one because a word was divided
            Left_Sum.get() + Right_Sum.get() - 1
        }
      }
    }
}

Rust does not have special syntax for spawning a “task” (Rust’s equivalent of a “goroutine”) nor for declaring the equivalent of a “channel”; instead it relies on its generic template facility and a run-time library of threading and synchronization capabilities.  The above example illustrates the use of futures, which are essentially a combination of a task and an unbuffered channel used to capture the result of a computation.  There are several other mechanisms for spawning and coordinating tasks, but they all depend on the basic tasking model mentioned above: each task has its own garbage-collected heap for task-local computation (manipulated by what Rust calls managed pointers), plus access via owning pointers to data that can be shared between tasks (by sending an owning pointer in a message).
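For comparison, the essence of such a future can be sketched in Go as a goroutine paired with a channel.  The Future helper below is hypothetical, not part of either language’s library:

// Future launches f in a goroutine and returns a function
// that blocks until the result is available.  (This simple
// version supports only one call to the returned function.)
func Future(f func() int) func() int {
    c := make(chan int, 1)
    go func() { c <- f() }()
    return func() int { return <-c }
}

With this helper, left := Future(func() int { ... }) followed later by left() would mirror the Left_Sum.get() calls above.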
3.1 Rust Memory Management Performance
One of the major advantages of the Rust approach to memory management is that garbage collection is local to a single task.  By contrast, in Go each garbage collector thread has to operate on data that is potentially visible to all goroutines, requiring a garbage collection algorithm that synchronizes properly with all of the active goroutines, as well as with any other concurrent garbage collector threads (presuming garbage collection itself needs to take advantage of parallel processing to keep up with multithreaded garbage generation). 

In Rust, a conventional single-threaded garbage collector algorithm is adequate, because any given garbage collector is working on a single per-task heap.  Furthermore, storage visible via owning pointers needs no garbage collection at all, as releasing an owning pointer means that the associated storage can also be released immediately.
3.2 The Costs of Safety and Performance
One of the downsides of added safety and performance can be added complexity.  As we have seen, Rust gains safety by allowing access to sharable data only via pointers that give exclusive access to one task at a time, and gains performance because garbage collection is single-threaded.  However, as a result, Rust needs several kinds of pointers.  In fact, there are four:
·       managed pointers (identified by ‘@’ as a prefix on a type) for per-task garbage-collected storage;
·       owning pointers (identified by ‘~’) for data that is sharable between tasks;
·       borrowed pointers (identified by ‘&’) that can temporarily refer to either per-task or sharable data; and
·       raw pointers (identified by ‘*’), analogous to typical C pointers, with no guarantees of safety.
4. ParaSail from AdaCore
The ParaSail language from AdaCore takes support for parallelism one step further than Go or Rust, by treating all expression evaluation in the language as implicitly parallel, while also embracing full type and data-race safety.  Rather than adding complexity to accomplish this, the explicit goal for ParaSail was to achieve safety and pervasive parallelism by simplifying the language, removing many of the features that make safe, implicit parallelism harder.
4.1 What ParaSail Leaves Out
Some of the features left out of ParaSail include the following:

·       No pointers
·       No global variables
·       No run-time exception handling
·       No explicit threads, no explicit locking nor signaling
·       No explicit heap, no garbage collection needed
·       No parameter aliasing

So what is left?  ParaSail provides a familiar class-and-interface-based object-oriented programming model, with mutable objects and assignment statements.  But ParaSail also supports a highly functional style of programming, aided by the lack of global variables, where the only variable data visible to a function is via parameters explicitly specified as var parameters.   This means that the side-effects of a function are fully captured by its parameter profile, which together with the lack of parameter aliasing allows the compiler to verify easily whether two computations can safely be performed in parallel.

By design, every expression in ParaSail can be evaluated in parallel.  If two parts of the same expression might conflict, the expression is not a legal ParaSail expression.  The net effect is that the compiler can choose where and when to insert parallelism strictly based on what makes the most sense from a performance point of view.  Here, for example, is the Word Count example in ParaSail, where parallelism is implicit in the recursive calls on Word Count, without any explicit action by the programmer:

func Word_Count
  (S : Univ_String;
   Separators :
        Countable_Set := [' '])
   -> Univ_Integer is
    case |S| of
      [0] => return 0  // Empty string
      [1] =>
        if S[1] in Separators then
            return 0   // No words
        else
            return 1   // One word
        end if
      [..] =>          // Divide and recurse
        const Half_Len := |S|/2
        const Sum := Word_Count
            (S[1 .. Half_Len], Separators) +
          Word_Count
            (S[Half_Len <.. |S|], Separators)
        if S[Half_Len] in Separators or else
          S[Half_Len+1] in Separators then
            return Sum  // No adjustment needed
        else
            return Sum-1   // Adjust sum
        end if
    end case
end func Word_Count

Although there is no explicit use of a parallel construct, the sum of the two recursive calls on Word_Count can be evaluated in parallel, with the compiler automatically creating a picothread for each recursive call, waiting for their completion, and then summing the results, without the programmer having to add explicit directives.
4.2 Implicit and Explicit Parallelism in ParaSail
Explicit parallelism may be specified if desired in ParaSail, or the programmer can simply rely on the compiler to insert it where it makes the most sense.  The general philosophy is that the semantics are parallel by default, and the programmer needs to work a bit harder if they want to force sequential semantics.  For example, statements in ParaSail can be separated as usual with “;” (which is implicit at the end of the line when appropriate), or by “||” if the programmer wants to request explicit parallelism, or by “then” if the programmer wants to disallow implicit parallelism.  By default the compiler will evaluate two statements in parallel if there are no data dependences between them. 

As another example of ParaSail’s implicit and explicit parallelism, the iterations of “for I in 1..10 loop” are by default executed in any order, including in parallel if there are no data dependences between the loop iterations.  “for I in 1..10 forward loop” or “for I in 1..10 reverse loop” may be specified to prevent parallel evaluation, and “for I in 1..10 concurrent loop” may be used to specify that parallel evaluation is desired, in which case it is an error if there are any data dependences between the iterations.  In all these cases, the compiler will ensure that any parallel evaluation is safe and data-race free, and will complain if there are potential race conditions when parallel evaluation semantics are specified.
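For contrast, the explicit scaffolding a Go programmer would write for the equivalent of a concurrent loop is sketched below (a minimal sketch using the standard sync package; the slice v is hypothetical):

package main

import "sync"

func main() {
    v := make([]int, 10)
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func(i int) { // pass i so each goroutine has its own copy
            defer wg.Done()
            v[i] = 2 * i // iterations are independent of one another
        }(i)
    }
    wg.Wait() // a ParaSail "concurrent loop" implies all of this
}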
4.3 Simplicity breeds Simplicity in ParaSail
There is somewhat of a virtuous cycle that occurs when a programming language is simplified, in that one simplification can lead to another.  By eliminating pointers and a global heap from ParaSail, the language can provide fully automatic storage management without the need for a garbage collector.  Objects in ParaSail have value semantics, meaning that assignment copies the value of the right-hand side into the left-hand side, with no sharing of data.  A built-in move operation is provided for moving the value of the right-hand side into the left-hand side, along with a swap for swapping values, thereby reducing the cost of copying while still preserving value semantics.

Every type in ParaSail has an extra value, called null, which is used to represent an empty or otherwise uninitialized value.  An object or component may have a null value of its type T only if it is declared to be “optional T”.  Optional components may be used to implement trees and linked lists, without the use of explicit pointers, and without the potential sharing issues associated with pointers, even if behind the scenes the compiler uses pointers to implement optional objects or components.  The availability of optional components effectively allows an object to grow and shrink, meaning that dynamic structures like hash tables, and higher level notions such as maps and sets, can be implemented in ParaSail without any explicit pointers, with the advantages of purely value semantics.

The lack of explicit pointers means that all objects in ParaSail effectively live on the stack, even though they may still grow and shrink.  Each scope has its own region, effectively a local heap that expands and contracts to hold the objects associated with the scope.  No garbage collector is needed because when an object goes out of scope, or an object or one of its components is set back to null, the associated storage may be immediately reclaimed, much like storage designated by owning pointers in Rust.  By simplifying the type model, ParaSail makes storage management dramatically simpler and higher performance, with no complex parallel garbage collection algorithm required.
5. Implementation Status and Conclusions
The programming language design world has been rejuvenated by the new challenges of distributed and multicore architectures.  Three new programming languages designed for building industrial-strength systems have emerged, Go, Rust, and ParaSail.  Each of these languages tries to make parallel programming simpler and safer, while still providing the level of power and performance needed for the most critical systems development tasks. 

From an implementation status point of view, Go is the most stable of these three new languages, with two compilers available and a number of systems built with Go now being deployed.  Rust and ParaSail are still under development, but both are available for trial use, with Rust having early on achieved the compiler bootstrap milestone, where the Rust compiler is itself implemented in Rust.  All three languages provide opportunities for the professional programmer to expand their horizons, and to see what sort of new paradigms and idioms will become more important as we leave behind the era of simple sequential processing and enter the era of massively parallel, widely distributed computing.
References        
[1]    The Go Programming Language, Google Corporation, http://golang.org
[2]    The Rust Programming Language, Mozilla Research, http://www.rust-lang.org
[3]    ParaSail: Less is More with Multicore, S. Tucker Taft, http://www.embedded.com/design/other/4375616/ParaSail--Less-is-more-with-multicore (retrieved 4/1/2013).
[4]    Designing ParaSail: A New Programming Language, S. Tucker Taft, http://parasail-programming-language.blogspot.com