Syntax definition and parser generation for new languages.
All source code analysis projects need to extract information directly from the source code. There are two main approaches to this:
- Lexical information: Use regular expressions to extract useful, but somewhat superficial, flat information; see Regular Expression Patterns.
- Structured information: Use syntax analysis to extract the complete, nested, structure of the source code in the form of a syntax tree. Rascal can directly manipulate the parse trees, but it also enables user-defined mappings from parse tree to abstract syntax tree.
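To make the lexical approach concrete, here is a minimal sketch of a regular expression pattern in Rascal that collects integer literals from a string. The function name `integersIn` is hypothetical and chosen only for illustration. Note that the result is a flat list: it says nothing about how the numbers are nested inside operators, which is what syntax analysis adds.

```rascal
// Hypothetical helper: collect every run of digits found in src.
list[str] integersIn(str src) {
  list[str] found = [];
  // The regular expression pattern binds each successive match to n.
  for (/<n:[0-9]+>/ := src) {
    found += [n];
  }
  return found;
}
```

Calling `integersIn("2+3*4")` should return the three literals as strings, with no record of operator precedence.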
Using Syntax Definitions you can define the syntax of any (programming) language. Then Rascal:
- will generate the parser, and
- will provide pattern matching and pattern construction on parse trees and abstract syntax trees; see Abstract Patterns and Concrete Patterns.
Let's use the Exp language as an example. It contains the following elements:
- Integer constants, e.g., 123.
- A multiplication operator, e.g., 3*4.
- An addition operator, e.g., 3+4.
- Multiplication is left-associative and has precedence over addition.
- Addition is left-associative.
- Parentheses can be used to override the precedence of the operators.
Here are some examples: 123, 2+3, 2+3*4, and (2+3)*4.
The Exp language can be defined as follows:
layout Whitespace = [\t-\n\r\ ]*;     // whitespace allowed between tokens
lexical IntegerLiteral = [0-9]+;
start syntax Exp
  = IntegerLiteral
  | bracket "(" Exp ")"
  > left Exp "*" Exp                   // binds tighter than "+"
  > left Exp "+" Exp
  ;
Now you can parse and manipulate programs in the Exp language. Let's demonstrate parsing an expression:
start[Exp]: (start[Exp]) `2+3*4`
First we import the syntax definition and the link:/Libraries/Prelude-ParseTree[ParseTree] module, which provides the parsing functionality.
Finally, we parse 2+3*4 using the start symbol Exp.
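The steps just described can be sketched as a single module. The module name `ExpParse` and the function name `parseExp` are assumptions made for this illustration; the grammar is the one defined above, and `parse` comes from the ParseTree module.

```rascal
// Hypothetical module name, used only for this sketch.
module ExpParse

import ParseTree;  // provides the parse function

layout Whitespace = [\t-\n\r\ ]*;
lexical IntegerLiteral = [0-9]+;

start syntax Exp
  = IntegerLiteral
  | bracket "(" Exp ")"
  > left Exp "*" Exp
  > left Exp "+" Exp
  ;

// Parse a complete Exp program; invalid input raises a ParseError.
start[Exp] parseExp(str src) = parse(#start[Exp], src);
```

Evaluating `parseExp("2+3*4")` at the Rascal prompt should print a result like the one shown above. No separate parser generation step appears anywhere in the module.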
Don't be worried; we are just showing the resulting parse tree here. It is intended for programs, not for humans. The points we want to make are:
- Rascal grammars are relatively easy to read and write (unfortunately, writing grammars will never become simple).
- Parser generation is completely implicit.
- Given a syntax definition, it can be used immediately for parsing.
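As an illustration of the pattern matching on parse trees mentioned earlier, the following sketch evaluates Exp programs with concrete syntax patterns. The module name `ExpEval` and the function `eval` are assumptions for this example, not something the text above defines; the grammar is repeated so that the module stands on its own.

```rascal
// Hypothetical module name, used only for this sketch.
module ExpEval

import ParseTree;  // parse
import String;     // toInt

layout Whitespace = [\t-\n\r\ ]*;
lexical IntegerLiteral = [0-9]+;

start syntax Exp
  = IntegerLiteral
  | bracket "(" Exp ")"
  > left Exp "*" Exp
  > left Exp "+" Exp
  ;

// Entry point: parse the text, then evaluate the Exp below the start symbol.
int eval(str txt) = eval(parse(#start[Exp], txt).top);

// One alternative per grammar rule, each matched by a concrete pattern.
int eval((Exp)`<IntegerLiteral l>`) = toInt("<l>");
int eval((Exp)`<Exp e1> * <Exp e2>`) = eval(e1) * eval(e2);
int eval((Exp)`<Exp e1> + <Exp e2>`) = eval(e1) + eval(e2);
int eval((Exp)`( <Exp e> )`) = eval(e);
```

Because multiplication has precedence over addition in the grammar, `eval("2+3*4")` yields 14, while `eval("(2+3)*4")` yields 20.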