Parse Trees
Synopsis
An algebraic data-type for parse trees; produced by all parsers generated from syntax definitions.
Description
Below is the full definition of Tree and Production and Symbol. A parse tree is a nested tree structure of type Tree.
Most internal nodes are applications (
appl) of aProductionto a list of childrenTreenodes.Productionis the abstract representation of a Syntax Definition rule, which consists of a definition of an alternative for aSymbolby a list ofSymbols.The leaves of a parse tree are always characters (
char), which have an integer index in the UTF8 table.Some internal nodes encode ambiguity (
amb) by pointing to a set of alternativeTreenodes.
The Production and Symbol types are an abstract notation for rules in Syntax Definitions, while the Tree type is the actual notation
for parse trees.
Parse trees are called parse forests when they contain amb nodes.
You can analyze and manipulate parse trees in three ways:
- Directly on the
Treelevel, just like any other Algebraic Data Type - Using Concrete Syntax
- Using Actions
The type of a parse tree is the symbol that it's production produces, i.e. appl(prod(sort("A"),[],{}),[]) has type A. Ambiguity nodes
Each such a non-terminal type has Tree as its immediate super-type.
Examples
// the following definition
syntax A = "a";
// would make the following test succeed:
test a() = parse(#A,"a") ==
appl(prod(
sort("A"),
[lit("a")],
{}),
[appl(
prod(
lit("a"),
[\char-class([range(97,97)])],
{}),
[char(97)])]);
// you see that the defined non-terminal A ends up as the production for the outermost node. As the only child is the tree for recognizing the literal a, which is defined to be a single a from the character-class [ a ].
// when we use labels in the definitions, they also end up in the trees:
// the following definition
lexical A = myA:"a" B bLabel;
lexical B = myB:"b";
// would make the following [Test] succeed:
test a() = parse(#A,"ab") == appl(prod(label("myA",lex("A")),[lit("a"),sort("bLabel",lex("B"))],{}),[appl(prod(lit("a"),[\char-class([range(97,97)]),[char(97)]),appl(prod(label("myB", lex("B"),[lit("b")],{}),[appl(prod(lit("b"),[\char-class([range(98,98)]),[char(98)])]) ]);
// here you see that the alternative name is a label around the first argument of `prod` while argument labels become labels in the list of children of a `prod`.