module analysis::diff::edits::HiFiLayoutDiff

Compare equal-modulo-layout parse trees and extract the exact whitespace text edits that will format the original file.

Usage

import analysis::diff::edits::HiFiLayoutDiff;

Dependencies

extend analysis::diff::edits::HiFiTreeDiff;
import ParseTree;
import String;

Description

This algorithm is the final component of a declarative high fidelity source code formatting pipeline.

We have the following assumptions:

  1. One original text file exists.
  2. One Parse Tree of the original file to be formatted, containing all original layout, source code comments, and case-insensitive literals in the exact order of the original text file. In other words, nothing may have happened to the parse tree after parsing.
  3. One Parse Tree of the same file, but formatted. This is typically obtained by translating the tree to a str using a formatting tool (such as Tree2Box followed by Box2Text, or string templates) and then re-parsing the result.
  4. Typically comments and the specific capitalization of case-insensitive literals have been lost while producing the formatted tree of assumption 3.
  5. We use TextEdits to communicate the effect of formatting to the IDE context.

Benefits

  • Recovers source code comments which have been lost during earlier steps in the formatting pipeline. This makes losing source code comments an independent concern of a declarative formatter.
  • Recovers the original capitalization of case-insensitive literals which may have been lost during earlier steps in the formatting pipeline.
  • Can standardize the capitalization of case-insensitive literals to ALLCAPS, all lowercase, or capitalized, or can leave each literal as it was formatted by an earlier stage.
  • Is agnostic towards the design of earlier steps in the formatting pipeline, so long as formattedTree := originalTree. This means that the pipeline may change layout (whitespace, comments, and the capitalization of case-insensitive literals), but nothing else.

Pitfalls

  • if originalTree !:= formattedTree the algorithm will produce junk. It will break the syntactical correctness of the source code and forget source code comments.
  • if comments are not marked with @category("Comment") in the original grammar, then this algorithm cannot recover them.
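
To illustrate the second pitfall, here is a minimal sketch of a layout definition whose comment alternative carries the category tag. The module name and the nonterminal names Layout and WhitespaceOrComment are hypothetical, and the "//" line-comment syntax is only an example; adapt them to your own grammar.

```rascal
module lang::example::Layout  // hypothetical module name

// Layout is a sequence of whitespace characters and comments.
// The follow restrictions make the layout match greedily.
layout Layout = WhitespaceOrComment* !>> [\ \t\n\r] !>> "//";

lexical WhitespaceOrComment
    = [\ \t\n\r]
    // Tagged alternative: this is the marker that comment recovery relies on.
    | @category="Comment" "//" ![\n]* $
    ;
```

Without the tag on the comment alternative, a comment is indistinguishable from any other layout text, which is presumably why nothing can be recovered in that case.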

data CaseInsensitivity

Normalization choices for case-insensitive literals.

data CaseInsensitivity  
= toLower()
| toUpper()
| toCapitalized()
| asIs()
| asFormatted()
;
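
A minimal sketch of selecting one of these choices through the ci keyword parameter of layoutDiff, using the signature documented in this module. The helper name upperCaseLiteralEdits is hypothetical, and reading toUpper() as "write every case-insensitive literal in upper case" is an assumption based on the constructor name.

```rascal
module CaseChoiceSketch  // hypothetical module name

import analysis::diff::edits::HiFiLayoutDiff;
import analysis::diff::edits::TextEdits;
import ParseTree;

// Hypothetical helper: same diff as the default call,
// but with the toUpper() normalization choice instead of asIs().
list[TextEdit] upperCaseLiteralEdits(Tree original, Tree formatted)
    = layoutDiff(original, formatted, ci = toUpper());
```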

function layoutDiff

Extract TextEdits for the differences in whitespace between two otherwise identical parse trees.

list[TextEdit] layoutDiff(Tree original, Tree formatted, bool recoverComments = true, CaseInsensitivity ci = asIs())

See HiFiLayoutDiff.
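
The pipeline described by the assumptions of this module can be sketched as follows. This is an illustration under stated assumptions, not the library's own driver: formatKeepingComments, grammar and prettyPrint are hypothetical names, and the sketch assumes that executeTextEdits(str, list[TextEdit]) is available from analysis::diff::edits::ExecuteTextEdits.

```rascal
module FormatSketch  // hypothetical module name

import analysis::diff::edits::HiFiLayoutDiff;
import analysis::diff::edits::TextEdits;
import analysis::diff::edits::ExecuteTextEdits;  // assumed to provide executeTextEdits
import ParseTree;
import IO;

// Hypothetical helper, not part of the library.
// `grammar` is the reified start symbol of your language; `prettyPrint` is
// whatever formatter you use (Tree2Box and Box2Text, string templates, ...).
str formatKeepingComments(type[&T <: Tree] grammar, loc file, str (Tree) prettyPrint) {
    // The untouched parse tree of the original file.
    Tree original = parse(grammar, file);

    // Format to a string, then re-parse with the same grammar.
    Tree formatted = parse(grammar, prettyPrint(original), file);

    // Guard for the first pitfall, written with the same match the
    // documentation uses: the trees may differ in layout only.
    if (original !:= formatted) {
        throw "formatter changed more than layout in <file>";
    }

    // Default settings: recoverComments = true, ci = asIs().
    list[TextEdit] edits = layoutDiff(original, formatted);

    // Apply the edits to the original text; comments should survive.
    return executeTextEdits(readFile(file), edits);
}
```

In an IDE setting you would typically hand the list of edits to the editor instead of applying them to a string yourself.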

function learnComments

Make sure the new layout still contains all the source code comments of the original layout.

str learnComments(Tree original, Tree replacement)

This algorithm uses the @category("Comment") tag to detect source code comments inside layout substrings. If the original layout contains comments, we re-introduce the comments at the expected level of indentation. New comments present in the replacement are kept and will overwrite any original comments.

This trick is complicated by the syntax of multiline comments and single line comments that have to end with a newline.

Benefits

  • if comments are kept and formatted by tools like Tree2Box, then this algorithm does not overwrite these.
  • if comments were completely lost, then this algorithm always puts them back (under the assumptions of HiFiLayoutDiff)
  • recovered comments are indented according to the indentation discovered in the formatted replacement tree.

Pitfalls

  • if comments are not marked with @category("Comment") in the original grammar, then this algorithm recovers nothing.

function delabel

Symbol delabel(label(_, Symbol t))

default Symbol delabel(Symbol x)