Thursday, June 12, 2014

Linkers and Types and Built-ins! Oh, my!

Hello, It's Justin again. See the previous blog post for a bit of context. (Hint: we're building the compiler)

Linking to the Built-ins

The interpreter has a library of functions it uses to evaluate ParaSail code. We don't want to and can't write every ParaSail operation directly in llvm. So, it was necessary to link the generated llvm code with the interpreter's built-in functions. At first we thought the built-ins needed to be translated to llvm code to successfully link with our generated llvm. To accomplish this, we used a tool called AdaMagic to convert the Ada source code (in which the built-ins are currently written) and Ada's run time system (RTS) to C source code then used the llvm C front-end "clang" to compile the rest of the way. Clang complained with hundreds of warnings, but, it worked. We were able to print integers, floats, and characters! We couldn't do much more with the built-ins at the time because we didn't yet have type descriptors working (see below).

After all the effort and dealing with the messy generated C code, we found out that the system linker could link object files compiled with the llvm backend called "llc" together with object files produced by the GNAT Ada compiler with no problem. That was a relief, as we really didn't want to go mucking around in the generated C code more than we had to.

Types

ParaSail types are represented in the interpreter with Type_Descriptors. Basically,a type descriptor is a large record with everything you need to know about a type. Most importantly, it holds a list of operations, a "null" value to be used for the type, and descriptors for any other type it depends on (such as the component type for an array type). When we want to execute compiled ParaSail code that has types in it, we need much of this information at run time, particularly when executing code that is "generic" in some way (such as a Set where we need to get the Hash operation for Some_Hashable_Type, and perhaps its representation for "null").

Here's an example:

func Count_Nulls(Numbers : Basic_Array<optional Univ_Integer>) -> Univ_Integer is
   var Count := 0;
   var I := 1;
   while I <= Length(Numbers) loop
      if Numbers[I] is null then
         Count += 1;
      end if;
      I += 1;
   end loop;
   return Count;
end func Count_Nulls;

func main(Args : Basic_Array<Univ_String>) is
   var A : Basic_Array<optional Univ_Integer> := Create(10, null);
   A[1] := 42;
   A[5] := 0;
   Print("There are ");
   Print(Count_Nulls(A));
   Println(" nulls");
end func main;

There are quite a few places that need type information, but the easiest to show is "Numbers[I] is null". There is a specific 64 bit value that represents null for the Univ_Integer type that needs to be checked against Numbers[I].

To accomplish this, we encode the type descriptors as "linkonce" global byte arrays in llvm. Before the ParaSail main routine is invoked, these byte streams are reconstructed into Type_Descriptors for use at run time. We reconstruct these by appending our own parameterless function to the @llvm.global_ctors global variable. But, we need to call into the Ada RTS while reconstructing type descriptors, and, much to our dismay, the functions in @llvm.global_ctors are called before the Ada RTS has a chance to initialize. To solve this problem, the parameterless function in global_ctors adds a node to a todo (linked) list. Then, after the Ada RTS has been initialized, we walk the "todo" list and reconstruct the type descriptors from the streams.

A very similar method is used to reconstruct the appropriate run-time representations for Univ_Strings and Univ_Enumerations from their stream representations.

Back to the example

First of all: It compiles and runs! Here's what it prints out:

There are 8 nulls

Woohoo!
(that part was me)

You might wonder why we used a 'while' loop instead of a 'for each' loop. Well, that's because for loops use parallel stuff we haven't implemented yet. Even a forward/reverse for loop uses parallel stuff because it allows you to do things like this

for I := 1 while I < N forward loop
   block
      continue loop with I => I * 2;
   ||
      continue loop with I => I + 1;
   end block;
end loop;

While loops can be implemented with simple llvm 'br' instructions.

We can't compile aaa.psi (the ParaSail standard library) yet, so we don't have access to all the modules defined there, but Basic_Array::Create, Basic_Array::"var_indexing", and Basic_Array::"indexing" are all built-ins, so we're able to use that container.

Other Updates

Main

If a ParaSail function has the name "main," with one argument of type Basic_Array<Univ_String>, and no return value, then that is treated as a special case by the llvm code generator, and it's name will be changed to "_parasail_main_routine". A real main function  "int main(int argc, char** argv)" function exists in ParaSail's RTS that builds up the Args array (well actually not yet, but it will!)  and calls this _parasail_main_routine. Why the preceding underscore you ask? Well, it's not legal to start an identifier with an underscore in ParaSail, so, we're insuring no name collisions this way. However, one downside of this is that it is currently not possible to call main from elsewhere in your program. However, that's not unprecedented, C++ disallows it as well. This isn't set in stone, though.

Reflection

   "Reflection" is a ParaSail module that allows ParaSail code to inspect itself and retrieve the ParaSail Virtual Machine (PSVM) instructions for any routine that is in the Interpreter's memory at the same time. This is how we compile: the user loads compiler.psl (the llvm code generator written in ParaSail itself), its dependent module(s), and the ParaSail source file to be compiled. Then, the main function in compiler.psl, called "Compile," searches through all the routines in the interpreter's memory (using the reflection module) until it finds those that come from the file whose name matches the file name given as an argument to Compile.  It then proceeds to translate each of those routines from PSVM instructions into LLVM instructions, as well as build up the various tables need for reconstructing type descriptors, strings, etc.

Static Link

   The Static Link passed implicitly in each ParaSail function call is used by the called function to retrieve data from outside the function.  If the function is nested inside another function, then the static link points to the local data area of that enclosing function.  If the function is a top-level operation of some module, then the static link points to a type descriptor for some instance of that module.  In a call on a stand-alone function, the static link is null.

We're making tremendous headway on this compiler and I'm very excited about where we could reach by the end of this summer.