Ambiguity

rascal-0.34.0

drambiguity-0.2.0

So what is ambiguity? Where does it come from?

Ambiguity happens when there are more than a single parse tree, produced by a (generated) parser, for a given input (sub)sentence. Comparable to a parse error, ambiguity is reported after parsing has completed. A typical example:

rascal>syntax E = [a-z] | E "+" E;
ok
rascal>parse(#E, "e+e+e");
warning, ambiguity predicted: E  "+"  E  lacks left or right associativity
|TODO:///|: Ambiguity(
  |unknown:///|(0,5,<1,0>,<1,5>),
  "E",
  "e+e+e")
ok

The parser threw the Ambiguity exception, but we could let it build all the alternatives as well:

rascal>t = parse(#E,"e+e+e", allowAmbiguity=true);
E: (E) `e+e+e`
rascal>prettyTree(t)
str: " ❖\n ├─ E = E  \"+\"  E \n │  ├─ E = [a-z] \n │  │  └─ e\n │  └─ E = E  \"+\"  E \n │     ├─ E = [a-z] \n │     │  └─ e\n │     └─ E = [a-z] \n │        └─ e\n └─ E = E  \"+\"  E \n    ├─ E = E  \"+\"  E \n    │  ├─ E = [a-z] \n    │  │  └─ e\n    │  └─ E = [a-z] \n    │     └─ e\n    └─ E = [a-z] \n       └─ e\n"
---
 ❖
 ├─ E = E  "+"  E 
 │  ├─ E = [a-z] 
 │  │  └─ e
 │  └─ E = E  "+"  E 
 │     ├─ E = [a-z] 
 │     │  └─ e
 │     └─ E = [a-z] 
 │        └─ e
 └─ E = E  "+"  E 
    ├─ E = E  "+"  E 
    │  ├─ E = [a-z] 
    │  │  └─ e
    │  └─ E = [a-z] 
    │     └─ e
    └─ E = [a-z] 
       └─ e

---

This shows a pretty representation of the parse forest, where under the ❖ diamond we spot two alternatives derivations. They correspond to these two interpretations:

(e+e)+e
e+(e+e)

We can also show the raw parse tree representation:

rascal>Tree x = t;
Tree: amb(
  {appl(
      prod(
        sort("E"),
        [
          sort("E"),
          layouts("$default$"),
          lit("+"),
          layouts("$default$"),
          sort("E")
        ],
        {}),
      [appl(
          prod(
            sort("E"),
            [\char-class([range(97,122)])],
            {}),
          [char(101)],
          src=|unknown:///|(0,1,<1,0>,<1,1>)),appl(
          prod(
            layouts("$default$"),
            [],
            {}),
          [],
          src=|unknown:///|(1,0,<1,1>,<1,1>)),appl(
          prod(
            lit("+"),
            [\char-class([range(43,43)])],
            {}),
          [char(43)]),appl(
          prod(
            layouts("$default$"),
            [],
            {}),
          [],
          src=|unknown:///|(2,0,<1,2>,<1,2>)),appl(
          prod(
            sort("E"),
            [
              sort("E"),
              layouts("$default$"),
              lit("+"),
              layouts("$default$"),
              sort("E")
            ],
            {}),
          [appl(
              prod(
                sort("E"),
                [\char-class([range(97,122)])],
                {}),
              [char(101)],
              src=|unknown:///|(2,1,<1,2>,<1,3>)),appl(
              prod(
                layouts("$default$"),
                [],
                {}),
              [],
              src=|unknown:///|(3,0,<1,3>,<1,3>)),appl(
              prod(
                lit("+"),
                [\char-class([range(43,43)])],
                {}),
              [char(43)]),appl(
              prod(
                layouts("$default$"),
                [],
                {}),
              [],
              src=|unknown:///|(4,0,<1,4>,<1,4>)),appl(
              prod(
                sort("E"),
                [\char-class([range(97,122)])],
                {}),
              [char(101)],
              src=|unknown:///|(4,1,<1,4>,<1,5>))],
          src=|unknown:///|(2,3,<1,2>,<1,5>))],
      src=|unknown:///|(0,5,<1,0>,<1,5>)),appl(
      prod(
        sort("E"),
        [
          sort("E"),
          layouts("$default$"),
          lit("+"),
          layouts("$default$"),
          sort("E")
        ],
        {}),
      [appl(
          prod(
            sort("E"),
            [
              sort("E"),
              layouts("$default$"),
              lit("+"),
              layouts("$default$"),
              sort("E")
            ],
            {}),
          [appl(
              prod(
                sort("E"),
                [\char-class([range(97,122)])],
                {}),
              [char(101)],
              src=|unknown:///|(0,1,<1,0>,<1,1>)),appl(
              prod(
                layouts("$default$"),
                [],
                {}),
              [],
              src=|unknown:///|(1,0,<1,1>,<1,1>)),appl(
              prod(
                lit("+"),
                [\char-class([range(43,43)])],
                {}),
              [char(43)]),appl(
              prod(
                layouts("$default$"),
                [],
                {}),
              [],
              src=|unknown:///|(2,0,<1,2>,<1,2>)),appl(
              prod(
                sort("E"),
                [\char-class([range(97,122)])],
                {}),
              [char(101)],
              src=|unknown:///|(2,1,<1,2>,<1,3>))],
          src=|unknown:///|(0,3,<1,0>,<1,3>)),appl(
          prod(
            layouts("$default$"),
            [],
            {}),
          [],
          src=|unknown:///|(3,0,<1,3>,<1,3>)),appl(
          prod(
            lit("+"),
            [\char-class([range(43,43)])],
            {}),
          [char(43)]),appl(
          prod(
            layouts("$default$"),
            [],
            {}),
          [],
          src=|unknown:///|(4,0,<1,4>,<1,4>)),appl(
          prod(
            sort("E"),
            [\char-class([range(97,122)])],
            {}),
          [char(101)],
          src=|unknown:///...

The above diagnosis hints at a treatment, which when applied indeed solves the problem:

rascal>syntax F = [a-z] | left F "+" F;
ok
rascal>prettyTree(parse(#F, "e+e+e"));
str: " F = F  \"+\"  F \n ├─ F = F  \"+\"  F \n │  ├─ F = [a-z] \n │  │  └─ e\n │  └─ F = [a-z] \n │     └─ e\n └─ F = [a-z] \n    └─ e\n"
---
 F = F  "+"  F 
 ├─ F = F  "+"  F 
 │  ├─ F = [a-z] 
 │  │  └─ e
 │  └─ F = [a-z] 
 │     └─ e
 └─ F = [a-z] 
    └─ e

---

The simple example aside, ambiguity of context-free grammars is not decidable in general. There do exist static analyses that do a good approximation, but they are slow and take a lot of memory. Nevertheless for human beings predicting it is also hard, because an ambiguity emerges from the accidental combination of many production rules in the grammar and their accidental application for an example input. It can be a bit hard, and a lot boring, if you don't use the right tools.

The goal of "disambiguation" is to change a Rascal grammar such that one of the trees in this forest goes away, and the other remains. Preferably this choice is implemented via a declarative mechnism in Rascal syntax definitions. It is also possible to program a tree filter in Rascal "manually".