A Calculator Language
Synopsis
Calc illustraties the basic facilities of TypePal
Description
By developing a type checker for Calc, a tiny pocket calculator language, we illustrate elementary usage of TypePal. The full source code of Calc can be found at https://github.com/cwi-swat/typepal/tree/master/src/examples/calc. See Examples of Typecheckers for a list of all available type checker examples.
Syntax of Calc
module lang::calc::Syntax
extend lang::CommonLex; ❶
start syntax Calc
= Decl+ ❷
;
syntax Decl
= "var" Id "=" Exp ";" ❸
;
syntax Exp
= Id ❹
| Integer ❺
| Boolean ❻
| bracket "(" Exp ")" ❼
> left Exp "*" Exp ❽
> left Exp "+" Exp ❾
> "if" Exp "then" Exp "else" Exp ❶⓿
;
keyword Reserved
= "var" | "if" | "then" | "else"
;
- ❶ For brevity we use some common lexical definitions in
lang::CommonLex
. They take, amongst others, care of whitespace, comments and definitions for integer and Boolean constants (Integer
andBoolean
) as well as identifiers (Id
). - ❷ A Calc program consists of one or more declarations (
Decl
). - ❸ Each declaration introduces a new variable and binds it to the value of an expression.
- ❹ An expression can be an identifier (that should have been introduced in a preceding declaration), or
- ❺ an integer constant, or
- ❻ a Boolean constant, or
- ❼ be surrounded by parentheses, or
- ❽ a multiplication (arguments should have the same type), or
- ❾ an addition (arguments should have the same type), or
- ❶⓿ a conditional expression (the first expression should have type Boolean, the other two should have the same type).
Typechecking Calc
Type checking Calc amounts to checking the following:
- Every name that is used in an expression should have been defined in a declaration and has the type of the expression in that declaration.
- Addition or multiplication on integers yields an integer type.
- Addition or multiplication on Booleans yields a Boolean type.
- The condition in a condional expression should be Boolean.
- The then and else part of a conditional expression should have the same type; that type becomes the type of the conditional expression.
Let's now move to the Calc type checker described in lang::calc::Checker
. We break it up in parts
and describe the parts as we go. You should be explicitly aware of the fact that type checking consists of two phases:
- Collecting facts and constraints: this is all about formulating constraints and attaching them to parts of the source program. In this phase types are unknown and constraints can only be created but not yet solved. All computations on types have therefore to be postponed until the solving phase. We will see how in a minute.
- Solving constraints: constraints are solved (if possible) and types are assigned to parts of the source program.
Begin of type checker module for Calc
module lang::calc::Checker
import lang::calc::Syntax; // The Calc syntax
extend analysis::typepal::TypePal; // TypePal
We import the Calc syntax lang::calc::Syntax
and then we extend the TypePal analysis::typepal::TypePal
.
NOTE: There is a technical reason why we have to extend TypePal rather than import it. Some data types (e.g., AType
)
and functions (e.g., collect
) are already defined in TypePal itself and they are being extended in each type checker.
Extend AType
TypePal has a built-in data type to represent types: AType
. This data type can be extended to represent language-specific types.
data AType
= boolType()
| intType()
;
str prettyAType(boolType()) = "bool";
str prettyAType(intType()) = "int";
The values in Calc are Booleans and integers and we create two AType
constants boolType()
and intType()
to represent these types.
prettyAType
converts them back to a string representation; this is used in error reporting.
Collecting facts and constraints
Writing a type checker in TypePal amounts to collecting facts and constaints from a source program to be checked.
This is achieved by the collect
function that has the form
void collect(current: _syntactic pattern_, Collector c){ ... }`
where syntactic pattern corresponds with a rule from the language syntax, in this case lang::calc::Syntax
.
The Collector
argument provides methods to add facts and constraints.
NOTE: One collect
declaration is needed per alternative in the syntax but some rules (i.e., chain rules) are handled automatically.
NOTE: By convention the syntactic pattern is named current
and the Collector is named c
.
WARNING: Each collect
declaration is responsible for calling collect
on relevant subparts (a run-time error is given for a missing collect
).
Check Declaration
A declaration introduces a new name and associates the type of the righthand side expression with it.
void collect(current: (Decl) `var <Id name> = <Exp exp> ;`, Collector c){
c.define("<name>", variableId(), current, defType(exp));
collect(exp, c);
}
Here we define name
as a variable and define its type as the same type as exp
. Let's digest this:
c.define(...)
calls thedefine
function in theCollector
and adds this definition to the internal state of the Collector.name
is a parse tree for the lexical syntax ruleId
."<name>"
reduces this tree to a string and this is how the variable can be used later on.variableId()
defines that the role of this name (IdRole
) is a variable.variableId
is a built-in roleIdRole
.current
is the parse tree of the whole declaration, we use it as unique identification of the definition ofname
.defType
is used to define the type ofname
and it comes in several flavours. Here, we use another parse tree fragmentexp
as argument.defType(exp)
should be read as the same type asexp
and the effect is thatname
is defined having the same type asexp
. This implies that the type ofname
can only be known when the type ofexp
is known.
Check Exp: Id
An expression consisting of a single identifier, refers to a name introduced in another declaration and gets the type introduced in that declaration.
void collect(current: (Exp) `<Id name>`, Collector c){
c.use(name, {variableId()});
}
An expression consisting of a single identifier represents a use of that identifier.
c.use(name, roles)
records this. There are then two possibilities:
- a matching define is found for one of the given roles: use and definition are connected to each other.
- no matching define is found and an error is reported.
NOTE: In larger languages names may be defined in different scopes. Scopes do not play a role in Cal.
NOTE: We do not enforce define-before-use in this example, but see XXX how to achieve this.
There is a "TODO" in the documentation source: :add reference for XXX (((TODO:add reference for XXX)))
Check Exp: Boolean and Integer constants
void collect(current: (Exp) `<Boolean boolean>`, Collector c){
c.fact(current, boolType());
}
void collect(current: (Exp) `<Integer integer>`, Collector c){
c.fact(current, intType());
}
When encountering a Boolean or integer constant we record their type using c.fact(current, _its type_)
.
NOTE: The second argument of fact
maybe an AType
, an arbitrary parse tree (in which case the type of that tree will be used),
or a function that returns a type.
Check Exp: parentheses
void collect(current: (Exp) `( <Exp e> )`, Collector c){
c.fact(current, e);
collect(e, c);
}
The type of an expression enclosed by parentheses is the same as the same of the enclosed expression.
A final, essential, step is to collect constraints from the subpart e
.
Check Exp: addition
Addition of integers given an integer result and addition of Booleans gives a Boolean result.
void collect(current: (Exp) `<Exp e1> + <Exp e2>`, Collector c){
c.calculate("addition", current, [e1, e2],
AType (Solver s) {
switch(<s.getType(e1), s.getType(e2)>){
case <intType(), intType()>: return intType();
case <boolType(), boolType()>: return boolType();
default:
s.report(error(current, "`+` not defined for %t and %t", e1, e2));
}
});
collect(e1, e2, c);
}
Checking addition is more complex since we need the type of sub expressions e1
and e2
in order to do our job.
The type of an arbitrary parse tree can be computed using
c.calculate(_informative label_, current, _list of dependencies_, AType(Solver s) { ... }
.
where
- informative label is a label that can be used in error messages and log files.
current
is the parse tree for which we are computing a type.- list of dependencies is a list of other parse trees whose type is needed in the computation of the type of
current
. AType(Solver s) { ... }
performs the type computation:- it is only called when the types of all dependencies are known.
- it has a
Solver
as argument: aSolver
manages the constraint solving process and is aware of all facts and solved constraints so far and knows, for instance, about the existence of type information for some parse tree. - it either returns an
AType
or reports an error.
The above code could be paraphrased as follows:
The type of current
can only be computed when the types of e1
and e2
are known. When known,
get their types (using s.getType
) and distinguish cases: two integer arguments give an integer result type and two Boolean arguments a Boolean result type;
otherwise report an error.
A final, essential, step is to collect constraints from the subparts e1
and e2
.
Check Exp: multiplication
void collect(current: (Exp) `<Exp e1> * <Exp e2>`, Collector c){
c.calculate("multiplication", current, [e1, e2],
AType (Solver s) {
switch(<s.getType(e1), s.getType(e2)>){
case <intType(), intType()>: return intType();
case <boolType(), boolType()>: return boolType();
default:
s.report(error(current, "`*` not defined for %t and %t", e1, e2));
}
});
collect(e1, e2, c);
}
Checking multiplication follows exactly the same pattern as checking addition. Even in this simple example we see repetition of code that could be factored out.
Check Exp: conditional expression
void collect(current: (Exp) `if <Exp cond> then <Exp e1> else <Exp e2>`, Collector c){
c.calculate("if Exp", current, [cond, e1, e2],
AType(Solver s){
s.requireEqual(cond, boolType(),
error(cond, "Condition should be Boolean, found %t", cond));
s.requireEqual(e1, e2,
error(current, "Equal types required, found %t and %t", e1, e2));
return s.getType(e1);
});
collect(cond, e1, e2, c);
}
Checking a conditional expression amounts to checking that the condition has type Boolean and
that the then and else branch have the same type (which also becomes the type of the conditional expression as a whole).
In the above code we see s.requireEqual(arg1, arg2, message)
.
Here arg1 and arg2 may either be a subtree (in which case its type is used) or an AType
.
requireEqual
requires that both types are equal or reports an error.
A final, essential, step is to collect constraints from the subparts cond
, e1
and e2
.
Getting started
module lang::calc::Test
extend lang::calc::Checker;
extend analysis::typepal::TestFramework; // TypePal test utilities
import ParseTree; // In order to parse tests
We need three ingredients for testing:
lang::calc::Checker
, the Calc type checker we have just completed.analysis::typepal::TestFramework
, the TypePal test framework which enables test automation.ParseTree
that provides parsing functionality that is needed to parse Calc source text.
Manual testing
TModel calcTModelFromTree(Tree pt){
return collectAndSolve(pt);
}
Given a parse tree pt
for a Calc program, we apply Collect And Solve to it. This creates a Collector, uses it to collect constraints from pt
and then creates a Solver
to solve all constraints. The result is a TModel
. calcTModelFromTree
will also be used during automated testing.
TModel calcTModelFromStr(str text){
pt = parse(#start[Calc], text).top;
return calcTModelFromTree(pt);
}
In order to obtain a parse tree, we need to parse it as shown above.
TModels contain much information but here we are only interested in the messages that have been generated during constraint solving.
They are contained in the messsages
field of a TModel.
With the above machinery in place we can perform some experiments:
calcTModelFromStr("var x = 3;"),messages
gives no messages.calcTModelFromStr("var x = 3+true;").messsages
reports that addition of integer and Boolean is illegal.
Automated testing
bool calcTests() {
return runTests([|project://typepal-examples/src/lang/calc/tests.ttl|],
#Calc, calcTModelFromTree);
}
TypePal's testing framework enables scripting of tests. Each script is written in TTL (TypePal Testing Language).
Use runtests(_list of TTL files_, _start non-terminal_, _function to create a TModel_)
to execute the scripts.
Here is a sample script:
test Ok1 [[ var x = 1; ]]
test Ok2 [[ var x = 1; var y = x + 2; ]]
test Undef1 [[ var x = y; ]]
expect { "Undefined variable `y`" }
test Add1 [[ var x = 1 + 2; ]]
test Add2 [[ var x = true + false; ]]
test Add3 [[ var x = 1 + true; ]]
expect { "`+` not defined for `int` and `bool`" }
test If1 [[ var x = if true then 1 else 2;
var y = x + 1;]]
test If2 [[ var x = if true then 1 else false; ]]
expect { "Equal types required, found `int` and `bool`" }
test If3 [[ var x = if 1 then 2 else 3; ]]
expect { "Condition should be Boolean, found `int`" }
and here is the produced output:
Test Ok1: true
Test Ok2: true
Test Undef1: true
Test Add1: true
Test Add2: true
Test Add3: true
Test If1: true
Test If2: true
Test If3: true
Test summary: 9 tests executed, 9 succeeded, 0 failed