
module lang::box::util::Tree2Box


The default formatting rules for any parsetree.

Usage

import lang::box::util::Tree2Box;

Dependencies

import ParseTree;
import lang::box::\syntax::Box;
import String;
import IO;

Description

This module is meant to be extended to include rules specific for a language.

The main goal of this module is to minimize the number of necessary specializations for any specific programming language.

This module is a port of the original default formatting rules of "Pandora" in the ASF+SDF Meta-Environment, which were implemented in C with the ATerm library and APIgen, as described in:

M.G.J. van den Brand, A.T. Kooiker, Jurgen J. Vinju, and N.P. Veerman. A Language Independent Framework for Context-sensitive Formatting. In CSMR '06: Proceedings of the Conference on Software Maintenance and Reengineering, pages 103-112, Washington, DC, USA, 2006. IEEE Computer Society Press.

However, because pattern matching in Rascal is more powerful than in C with the ATerm library, we can specialize for more cases, and more easily, than in the original paper. For example, single-line and multi-line comment styles are recognized automatically.

The current algorithm, when not extended, additionally guarantees that no comments are lost, as long as their grammar rules have been tagged with @category="comment" or the legacy @category="Comment".

Another new feature is the normalization of case-insensitive literals. When toUpper() or toLower() is provided in the format options, the mapping algorithm changes every instance of a case-insensitive literal accordingly before translating it to an L box expression. With asIs(), the literal is printed as it occurred in the source code.
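As a minimal sketch of how this option could be passed along, the helper below wraps toBox and format. The name formatUpper is hypothetical and not part of the module; the sketch assumes only the FormatOptions constructor, the opts keyword parameter, and the format function documented on this page.

```rascal
import ParseTree;
import lang::box::util::Tree2Box;
import lang::box::util::Box2Text;

// Hypothetical helper (not part of Tree2Box): format any parse tree while
// printing every case-insensitive literal in upper case.
str formatUpper(Tree t)
  = format(toBox(t, opts=formatOptions(ci=toUpper())));
```

For a grammar without case-insensitive literals, such as the Pico example used on this page, this option should make no visible difference.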

Examples

rascal>import lang::box::\syntax::Box;
ok
rascal>extend lang::box::util::Tree2Box;
ok

Note that we used extend rather than import; this becomes important below, when we add our own toBox rule.

rascal>import lang::pico::\syntax::Main;
ok

First, let's get an example program text:

rascal>example = "begin
|1 >>>> '%% this is an example Pico program
|2 >>>> ' declare
|3 >>>> ' a : %inline comment% natural,
|4 >>>> ' b : natural;
|5 >>>> ' a := a + b;
|6 >>>> ' b := a - b;
|7 >>>> ' a := a - b
|8 >>>> 'end";
str: "begin\n%% this is an example Pico program\n declare\n a : %inline comment% natural,\n b : natural;\n a := a + b;\n b := a - b;\n a := a - b\nend"
───
begin
%% this is an example Pico program
declare
a : %inline comment% natural,
b : natural;
a := a + b;
b := a - b;
a := a - b
end
───

Now we parse it:

rascal>program = [start[Program]] example;
value: appl(
prod(
start(sort("Program")),
[
layouts("Layout"),
label(
"top",
sort("Program")),
layouts("Layout")
],
{}),
[appl(
prod(
layouts("Layout"),
[conditional(
\iter-star(lex("WhitespaceAndComment")),
{\not-follow(\char-class([
range(9,10),
range(13,13),
range(32,32),
range(37,37)
]))})],
{}),
[appl(
regular(\iter-star(lex("WhitespaceAndComment"))),
[],
src=|prompt:///|(0,0,<1,0>,<1,0>))],
src=|prompt:///|(0,0,<1,0>,<1,0>)),appl(
prod(
label(
"program",
sort("Program")),
[
lit("begin"),
layouts("Layout"),
label(
"decls",
sort("Declarations")),
layouts("Layout"),
label(
"body",
\iter-star-seps(
sort("Statement"),
[
layouts("Layout"),
lit(";"),
layouts("Layout")
])),
layouts("Layout"),
lit("end")
],
{}),
[appl(
prod(
lit("begin"),
[
\char-class([range(98,98)]),
\char-class([range(101,101)]),
\char-class([range(103,103)]),
\char-class([range(105,105)]),
\char-class([range(110,110)])
],
{}),
[char(98),char(101),char(103),char(105),char(110)]),appl(
prod(
layouts("Layout"),
[conditional(
\iter-star(lex("WhitespaceAndComment")),
{\not-follow(\char-class([
range(9,10),
range(13,13),
range(32,32),
range(37,37)
]))})],
{}),
[appl(
regular(\iter-star(lex("WhitespaceAndComment"))),
[appl(
prod(
lex("WhitespaceAndComment"),
[\char-class([
range(9,10),
range(13,13),
range(32,32)
])],
{}),
[char(10)],
src=|prompt:///|(5,1,<1,5>,<2,0>)),appl(
prod(
lex("WhitespaceAndComment"),
[
lit("%%"),
conditional(
\iter-star(\char-class([
range(1,9),
range(11,1114111)
])),
{\end-of-line()})
],
{tag("category"("comment"))}),
[appl(
prod(
lit("%%"),
[
\char-class([range(37,37)]),
\char-class([range(37,37)])
],
{}),
[char(37),char(37)]),appl(
regular(\iter-star(\char-class([
range(1,9),
range(11,1114111)
]))),
[char(32),char(116),char(104),char(105),char(115),char(32),char(105),char(115),char(32),char(97),char(110),char(32),char(101),char(120),char(97),char(109),char(112),char(108),char(101),char(32),char(80),char(105),char(99),char(111),char(32),char(112),char(114),char(111),char(103),char(114),char(97),char(109)],
src=|prompt:///|(8,32,<2,2>,<2,34>))],
src=|prompt:///|(6,34,<2,0>,<2,34>)),appl(
prod(
lex("WhitespaceAndComment"),
[\char-class([
...

Then we can convert it to a Box tree:

rascal>b = toBox(program);
Box: HV([
U([]),
V([
H([L("begin")]),
I([V([
U([]),
V([
H([L("declare")]),
I([V([
U([]),
HOV([
H(
[
HV([
L("a"),
U([]),
L(":"),
U([]),
HV([L("natural")])
]),
H(
[L(",")],
hs=1)
],
hs=0),
H(
[HV([
L("b"),
U([]),
L(":"),
U([]),
HV([L("natural")])
])],
hs=0)
]),
U([])
])]),
L(";")
]),
U([]),
V([
H(
[
HV([
L("a"),
U([]),
L(":="),
U([]),
HOV([
HV([L("a")]),
H([
U([]),
L("+"),
U([]),
HV([L("b")])
])
])
]),
H(
[L(";")],
hs=1)
],
hs=0),
H(
[
HV([
L("b"),
U([]),
L(":="),
U([]),
HOV([
HV([L("a")]),
H([
U([]),
L("-"),
U([]),
HV([L("b")])
])
])
]),
H(
[L(";")],
hs=1)
],
hs=0),
H(
[HV([
L("a"),
U([]),
L(":="),
U([]),
HOV([
HV([L("a")]),
H([
U([]),
L("-"),
U([]),
HV([L("b")])
])
])
])],
hs=0)
]),
U([])
])]),
L("end")
]),
U([])
])

Finally, we can format the box tree to get a prettier format:

rascal>import lang::box::util::Box2Text;
ok
rascal>format(b)
str: "begin\n declare\n a : natural, b : natural\n ;\n a := a + b;\n b := a - b;\n a := a - b\nend\n"
───
begin
declare
a : natural, b : natural
;
a := a + b;
b := a - b;
a := a - b
end

───

If you are not happy with this result, you can write a specialization:

rascal>Box toBox((Program) `begin <Declarations decls> <{Statement ";"}* body> end`, FormatOptions opts=formatOptions())
|1 >>>> = V([
|2 >>>> L("begin"),
|3 >>>> I([
|4 >>>> toBox(decls)
|5 >>>> ], is=2),
|6 >>>> I([
|7 >>>> toBox(body)
|8 >>>> ], is=4),
|9 >>>> L("end")
|10 >>>> ]);
Box (Program, FormatOptions opts = ...): function(|prompt:///|(0,277,<1,0>,<11,7>))

and we see the result here:

rascal>format(toBox(program));
str: "begin\n declare\n a : natural, b : natural\n ;\n a := a + b;\n b := a - b;\n a := a - b\nend\n"
───
begin
declare
a : natural, b : natural
;
a := a + b;
b := a - b;
a := a - b
end

───

data FormatOptions

Configuration options for toBox

data FormatOptions  
= formatOptions(
CaseInsensitivity ci = asIs()
)
;

data CaseInsensitivity

Normalization choices for case-insensitive literals.

data CaseInsensitivity  
= toLower()
| toUpper()
| toCapitalized()
| asIs()
;

function toBox

This is the generic default formatter

default Box toBox(t:appl(Production p, list[Tree] args), FO opts = fo())

This generic formatter is meant to be overridden by someone constructing a formatting tool for a specific language. The goal is that this default toBox rule maps syntax trees to plausible Box expressions, so that only a minimal amount of specialization by the user is necessary.

function toBox

For ambiguity clusters an arbitrary choice is made.

default Box toBox(amb({Tree t, *Tree _}), FO opts=fo())

function toBox

When we end up here, we simply render the Unicode code point back.

default Box toBox(c:char(_), FormatOptions opts=fo() )

function toBox

Cycles are invisible and have zero length.

default Box toBox(cycle(_, _), FO opts=fo())

alias FO

Private type alias for legibility's sake

FormatOptions

function delabel

Removing production labels first avoids repeating near-identical patterns in the main toBox function.

Production delabel(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs))

default Production delabel(Production p)

list[Symbol] delabel(list[Symbol] syms)

Symbol delabel(label(_, Symbol s))

default Symbol delabel(Symbol s)

function fo

This is a short-hand for legibility's sake

FO fo()

function ci

Implements normalization of case-insensitive literals

str ci(str word, toLower())

str ci(str word, toUpper())

str ci(str word, toCapitalized())

str ci(str word, asIs())
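The four signatures above suggest one alternative per normalization choice. A minimal sketch of that shape, under the assumption that the standard String functions toLowerCase, toUpperCase and capitalize do the work, could look as follows. The name ciSketch is hypothetical, and the module's actual implementation may differ.

```rascal
import String;
import lang::box::util::Tree2Box; // for the CaseInsensitivity constructors

// Hypothetical stand-in for ci: one alternative per CaseInsensitivity choice.
str ciSketch(str word, toLower())       = toLowerCase(word);
str ciSketch(str word, toUpper())       = toUpperCase(word);
str ciSketch(str word, toCapitalized()) = capitalize(word); // upper-cases the first character only
str ciSketch(str word, asIs())          = word;             // leave the literal untouched
```

Dispatching on the constructor in the parameter list keeps each case a one-liner and mirrors the overloaded signatures listed above.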

function words

Split a text by the supported whitespace characters

list[str] words(str text)
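The page does not say which whitespace characters are supported. As a rough sketch, assuming that any character matched by the regular expression class \s separates words, a comparable function can be written with a regular expression match inside a list comprehension. The name wordsSketch is hypothetical.

```rascal
// Hypothetical stand-in for words: collect every maximal run of
// non-whitespace characters, in order of appearance.
// Assumption: "whitespace" means whatever the regex class \s matches.
list[str] wordsSketch(str text) = [w | /<w:\S+>/ := text];
```

In the comprehension, the regular expression match acts as a generator that yields each successive match, so leading, trailing and repeated whitespace produce no empty strings.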