RAP 16 - Support for error trees in Rascal

rascal-0.42.2

RAP	16
Title	Error Tree Support
Author	Davy Landman, Jurgen Vinju, Pieter Olivier
Status	Draft
Type	Runtime System

Abstract

With the addition of error recovery in Rascal comes the need to actually handle trees with error nodes from Rascal programs. Error trees introduce two new production rules (error and skipped). Also, error trees can be highly ambiguous making any algorithm that visits all nodes without taking advantage of the sharing of nodes infeasible. This RAP documents a number of design considerations (and decisions) for implementing support to work with error trees in a convenient and concise manner.

Motivation

Currently we only have low-level support to work with error trees: matching on the Tree level (using appl(error(... and appl(skipped(... patterns) and parse error functions on trees (hasParseErrors, findAllParseErrors, ...). To handle highly ambiguous trees they first have to be reduced to something more manageable. The only built-in support is currently in the form of the disambiguateParseErrors function that uses simple heuristics ("keep the skipped char count as low as possible") to trim down the parse forest.

We want to be able to work with parse trees in a more "natural" way at a higher level of abstraction and more in the “Rascal spirit”. This RAP discusses options to reach this goal. More concrete, we want to make sure:

Existing code still functions normally when working with non-error tree
Existing code can be made to work with error trees with relatively minor adaptions (for instance adding cases, adding a try-catch block, or explicitly check for error trees)
It should be possible to write clear and concise code that works for both error trees and non-error trees.

This RAP is split into three parts to answer three design questions:

How to handle regular code that tries to access parts of error trees that are not present ("beyond the dot").
How to handle erroneous access to error trees inside backtracking/generator constructions.
How to deal with highly ambiguous parse trees.

How to handle attempts at accessing non-existing subtrees

There are a limited number of ways code can try to access or analyze subtrees of parse trees (not counting Tree level access, then you are on your own):

Checking if a field exists (tree has name, tree.name?)
Checking if a tree is generated by a certain labeled production (tree is label)
Named field access (tree.field, tree.name?alt)
Named field assignment (tree.field=subtree)
Indexed field access (tree[index])
Indexed field assignment (tree[index] = subtree)
Matching using concrete syntax ((Stat)\if \<Expr e> then \<StatList s>`)`

Checking for field existence

If a field exists before the dot we can just return true. For a field after the dot we choose to let the operation return false because the field is not actually present. We believe this best matches with what the user expects in most cases.

Checking if a tree is generated by a labeled production

In case of an error tree we could look at the original production and decide based on that. But that would violate the assumption that if the “is” operator succeeds all fields of the production are available. In order to support this very reasonable assumption we choose to let the “is” operator always fail when presented with an error tree.

Named field access and assignment

It seems natural to honor named field access on an error tree if the field accessed is present ("before the dot"). When the field access is not present ("after the dot"), we see three options to handle the situation:

Return a special tree that can be handled by the user.
Throw a generic exception (NoSuchField(str field))
Throw a special exception wrapping the triggering exception (ParseErrorRecovery(RuntimeException fault, loc src), making it clear this exception is caused by an incomplete tree due to parse error recovery.

The first solution would result in detection of the issue coming much later, potentially causing hard to track down bugs. The second solution makes it difficult to differentiate between problems caused by error recovery or by other causes. So we choose the last solution allowing the user to handle parse error related problems at the abstraction level of his/her choice.

Note that tree.name?alt is a special case that is a mix of field access and field existence checking. We propose to return tree.name when name is before the dot and alt when name is after the dot.

Indexed field access and assignment

Here it also seems natural to let access to children of error trees proceed normally when the index is before the dot. Following the same logic as in the "named field" case, an index after the dot should result in a special exception. An extra design choice to be made here is: When the index is equal to the dot, should we return the skipped part? This might introduce subtle bugs. It would also work in unexpected ways as literals are not indexable so the dot would be unreachable when the error tree is a literal. So we opt not to do this and throw an exception instead.

Matching using concrete syntax

This basically already works "out-of-the-box": matching fails on error trees but typed holes can match error trees. Deep matching inside error trees works. Note that highly ambiguous trees still pose a problem in this area. We do not have a way to explicitly match error trees. Experience will show if we need one.

Conclusion:

has should succeed before the dot, fail after the dot.
is should always return false for error trees.
tree.name?alt should return alt after the dot.
Named field access to non-existing fields should throw a special exception
Indexed field access to non-existing fields should throw a generic exception
In the future, for matching we might introduce a way to explicitly match error trees without descending to the appl/prod level. Example:

case error[Statement] s : println("dot came to: <s.prod.dot>");
case error[&T] _ : println("skipping an error tree");

The error[...] type wrapper/modifier would modify a syntax tree type to an error syntax tree type, and thus never match anything else but an error tree of the given type. When the type is an open parameter, it would match on any error tree and bind the type when matching.

Handle faulty access to error trees in backtracking contexts and generators

What should we do when erroneous access to subtrees is detected in the body of a backtracking construct (visit, switch, the body of a matched function, ...) or a generator body? There are many aspects here but in general there are two options:

Propagate an exception so the user has to handle error cases explicitly or handle exceptions.
Transparently catch the exception and fail the backtracking call as if the match did not succeed (or skip the element in a generator) and continue with the next case.

On the one hand the automatic skipping is probably what the user intends to do. On the other hand, this would be the first case in which backtracking kicks in without a failing match or an explicit fail statement. Also, side effects in the body of the construct that is already executed before the erroneous access occurs will be visible. This could get confusing for the user.

We opted for the first option because transparently catching exceptions can cause too much confusion.

Conclusion: When erroneous access to subtrees is detected in the body of a backtracking construct, the exception will be propagated normally. It is up to the programmer to handle these exceptions at the appropriate level. A framework like TypePal can assist by providing suitable default handling where appropriate.

Ambiguous parse trees

We need a strategy to trim down trees in a way that still leaves the user in control. The current approach ("just use disambiguateParseErrors") is not good enough as too much useful information is thrown away for many uses of error trees we envision.

A first step will be to limit the level of ambiguities in parse forests. Usually, having more than 2 levels of ambiguities is not useful. By providing a keyword parameter to the parse functions (int maxAmbDepth=2) we offer the user the flexibility to handle deeper parse trees while limiting the size of parse forests by default. Ambiguities below the chosen depth are collapsed by culling all but the first (=random) alternative of deeper ambiguities.

Even with such a limit, parse forests can get quite big. Depending on the syntax each level of ambiguity can increase the size of a parse forest by a factor of 10 to a 100.

To be able to cope with such large forests in visits and deep matches we propose to implement memoization for these operations.

Conclusion:

We will limit the maximum ambiguity depth of parse trees.
We will implement memoization of visits and deep matches.

Implementation

We will start by writing tests for these semantics. Both the interpreter and compiler can then be modified so the tests will succeed.

Abstract​

Motivation​

How to handle attempts at accessing non-existing subtrees​

Checking for field existence​

Checking if a tree is generated by a labeled production​

Named field access and assignment​

Indexed field access and assignment​

Matching using concrete syntax​

Handle faulty access to error trees in backtracking contexts and generators​

Ambiguous parse trees​

Implementation​