We’ve come a long way, from our first look at the π-calculus, to modeling a web service, and exploring the semantics. Now let’s start moving towards something we can actually execute. We’ll turn the calculus into a little programming language and start working on our compiler.
Let’s start with the language syntax. Spaces, tabs, carriage returns and line feeds are ignored. Identifiers are strings of lowercase letters and digits, although the first character must be a letter; and new is a reserved word—it isn’t an identifier. Then the syntax of our little π programming language is (I’ve put the terminal symbols in bold:)
Process ::=
0
Process | Process
ID ( ID ) . Process
ID < ID > . Process
( new ID ) Process
! Process
( Process )
Does this look familiar? It’s essentially the structure of the calculus outlined when we started, except we’ve replaced ν, which is hard to type, with the word new.
This grammar is ambiguous. For example, if we write x(y).P|Q, does this mean: receive y on x, then continue with P and Q concurrently; or does it mean concurrently wait for y on x while running Q? Let’s assume it means that last one—if you look at the example in Parts II and III we wrote some long strings of sequential sends and receives for the client and server, and then ran them concurrently with the | symbol. Making send/receive bind tighter than | does coincides with the intuition there.
Of course, sometimes the programmer might really mean that first thing—receive y on x, then continue with P and Q concurrently—and for that purpose I added a bracketed expression. Now we can write x(y).(P|Q) to mean that thing.
With syntax out of the way we need to define what these phrases in our programming language actually mean: semantics. This is something we basically defined in Part I as →. We’ll just use that!
From Programming Language to Compiler
Let’s get our hands dirty and write a compiler for our programming language. I recommend using F#—it’s an excellent programming language for writing compilers. There’s also a great library, AbsIL, for generating .NET executables. You can get both of these from the Microsoft Research Downloads page.
Our compiler needs to do three things: Convert source code into abstract syntax which is easier to manipulate than source code, work out what that abstract syntax means, and then write a .NET executable that does what the compiler worked out. Because the abstract syntax sits in the middle of all of these moving parts, let’s start by defining it. I’m putting this in a file called ast.fs:
type id = string
type proc =
| Inert
| Par of proc * proc
| Recv of id * id * proc
| Send of id * id * proc
| New of id * proc
| Repeat of proc
We’re really just repeating the grammar of our programming language in F#: IDs are strings; processes (proc) are one of six different things—the do-nothing process (Inert); two processes concurrently (Par); receive a message (Recv); send a message (Send); use a new name (New); do something forever (Repeat.)
Now let’s work on translating text the programmer writes into these new F# terms—Inert, Par, and so on. F# comes with a couple of tools, fsyacc and fslex, for doing this translation. The lexer file defines the low-level translation from characters into larger, but intrinsically meaningless, tokens; the parser file defines how to interpret sequences of tokens as phrases. Luckily there’s an example in the F# distribution, at samples\fsharp\parsing; let’s work from there:
{
open System
open Pars
open Lexing
(* Fslex generated parsers follow the same pattern as OCamllex *)
(* and Mossmllex generated parsers, and do not update line number *)
(* information automatically, partly because the knowledge of when *)
(* a newline has occured is best placed in the lexer rules. *)
(* Thus the following boiler-plate code is very useful *)
let inc_lnum bol pos =
let lnum = pos.pos_lnum in
{pos with pos_lnum = lnum+1; pos_bol = bol }
let newline lexbuf =
lexbuf_set_curr_p lexbuf
( inc_lnum (lexeme_end lexbuf) (lexeme_end_p lexbuf))
}
let whitespace = [' ' '\t']
let newline = ('\n' | '\r' '\n')
rule token = parse
| whitespace { token lexbuf }
| newline { newline lexbuf; token lexbuf }
| "(" { LPAREN }
| ")" { RPAREN }
| "<" { LANGLE }
| ">" { RANGLE }
| "0" { INERT }
| "." { SEQ }
| "|" { PAR }
| "!" { REP }
| "new" { NEW }
| ['a'-'z']['a'-'z' '0'-'9']* { ID(lexeme lexbuf) }
| eof { EOF }
That was a mouthful! The first part, between the curly braces, is just boilerplate copied from the sample. The second defines how we skip spaces, but are interested in parentheses and angle brackets; the digit zero; dots, vertical bars and bangs; the word new; strings of letters and numbers (which is what the dreadful-looking ['a'-'z']['a'-'z' '0'-'9']* means); and the end-of-file marker. The right-hand column between the curly braces define names for these things—LPAREN, INERT, ID, and so on. These are our terminals.
Next we map sequences of terminals into phrases:
%{
open Ast
%}
%start start
%token <string> ID
%token LPAREN RPAREN LANGLE RANGLE INERT SEQ PAR REP NEW EOF
%type <Ast.proc> start
%%
start : Proc EOF { $1 }
Proc : Proc1 { $1 }
| Proc1 PAR Proc { Par ($1, $3) }
Proc1 : INERT { Inert }
| ID LPAREN ID RPAREN SEQ Proc1 { Recv ($1, $3, $6) }
| ID LANGLE ID RANGLE SEQ Proc1 { Send ($1, $3, $6) }
| LPAREN NEW ID RPAREN SEQ Proc1{ New ($3, $6) }
| REP Proc1 { Repeat $2 }
| LPAREN Proc RPAREN { $2 }
This file has three sections. The preamble opens the Ast module—this imports the definitions in the ast.fs file. We’re going to be using the terms in that file.
The next section is mostly book keeping. Here’s a quick commentary: Start parsing at the start non-terminal; IDs are terminals with associated strings (notice how, in the lexer file above, we produce ID slightly differently to other terminals?); the other terminals are called LPAREN, RPAREN, LANGLE, and so on; and the parser produces a value of type proc (the type proc defined in the Ast module.)
In the final section we define three non-terminals, start, Proc, and Proc1; a program (start) is just a process and then the end-of-file marker; processes are one or more processes concurrently, or inert/send/receive/and so on. In the curly braces to the right of each production we construct the matching F# term. $1, $2 and so forth is fsyacc magic meaning the first thing on the left-hand-side, the second thing, and so on.
Why did we break apart Proc and Proc1? In order to get sending and receiving binding tighter than processes-running-concurrently do. Notice how Proc1 doesn’t refer to Proc, except when the programmer explicitly uses parentheses.
Let’s take a break now. We’ve made solid progress on convert source code into abstract syntax. Next time we’ll switch to building the compiler top-down for a while, which means we should be able to run the parser and see it produce AST.