module analysis::text::search::Grammars
rascal-0.41.2
org.rascalmpl.rascal-lucene-0.1.0
Bridges Rascal grammars and parser generation to the Lucene "Analyzer" and "Tokenizer" interfaces.
Usage
import analysis::text::search::Grammars;
Dependencies
extend analysis::text::search::Lucene;
import ParseTree;
import String;
Description
By using the information in parse trees, we can selectively provide tokens for any source file for which we have a grammar:
- analyzerFromGrammar combines tokenizerFromGrammar with a lower-case filter. It makes an entire source file searchable.
- identifierAnalyzerFromGrammar selects only the identifiers in the source text, ignoring keywords, punctuation and comments.
- commentAnalyzerFromGrammar focuses on the words in source code comments.
This functionality builds on the Lucene module and its underlying adapter, which bridges Rascal callbacks to Lucene's search framework.
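As a minimal sketch of how the three analyzers might be obtained, the module below defines a small grammar and passes its reified start type to each function. The module name Example and the nonterminals Layout, Keywords, Id, Stmt and Program are hypothetical, made up for this illustration. Only the analyzer functions come from this page; how the resulting Analyzer values are handed to an index is left to the Lucene module and is not shown.

```rascal
module Example

import ParseTree;
// Grammars extends the Lucene module, so the Analyzer type is visible here as well.
import analysis::text::search::Grammars;

// Hypothetical toy grammar: a program is a sequence of print statements.
layout Layout = [\ \t\n\r]* !>> [\ \t\n\r];
keyword Keywords = "print";
lexical Id = ([a-z] !<< [a-z]+ !>> [a-z]) \ Keywords;
syntax Stmt = "print" Id;
start syntax Program = Stmt*;

// Full-text analyzer: every token of the source, lower-cased.
Analyzer fullTextAnalyzer() = analyzerFromGrammar(#start[Program]);

// Identifier-only analyzer: keywords, punctuation and comments are ignored.
Analyzer idAnalyzer() = identifierAnalyzerFromGrammar(#start[Program]);

// Comment analyzer: shown for completeness; this toy grammar has no comment syntax.
Analyzer commentAnalyzer() = commentAnalyzerFromGrammar(#start[Program]);
```

All three functions take the same reified grammar type, so one grammar definition can feed several differently focused indexes.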
function analyzerFromGrammar
Analyzer analyzerFromGrammar(type[&T <: Tree] grammar)
function identifierAnalyzerFromGrammar
Analyzer identifierAnalyzerFromGrammar(type[&T <: Tree] grammar)
function commentAnalyzerFromGrammar
Analyzer commentAnalyzerFromGrammar(type[&T <: Tree] grammar)
function tokenizerFromGrammar
Use a generated parser as a Lucene tokenizer, skipping nothing.
Tokenizer tokenizerFromGrammar(type[&T <: Tree] grammar)
function identifierTokenizerFromGrammar
Use a generated parser as a Lucene tokenizer, and skip all keywords and punctuation.
Tokenizer identifierTokenizerFromGrammar(type[&T <: Tree] grammar)
function commentTokenizerFromGrammar
Use a generated parser as a Lucene tokenizer, and produce tokens only from source code comments.
Tokenizer commentTokenizerFromGrammar(type[&T <: Tree] grammar)
function tokens
list[Tree] tokens(amb({Tree x, *_}), bool(Tree) isTokenPredicate)
default list[Tree] tokens(Tree tok, bool(Tree) isTokenPredicate)
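The two tokens alternatives suggest a straightforward recursive traversal: an ambiguity cluster is reduced to one of its alternatives, a node that satisfies the predicate becomes a single token, and any other application node is searched through its children. The module below is a hypothetical re-implementation of that idea under the name collectTokens (our name, not the library's), intended only to illustrate the technique; the library's actual body may differ.

```rascal
module CollectTokens

import ParseTree;

// Hypothetical sketch for illustration; not the library's own code.
// An ambiguity cluster is reduced to one arbitrary alternative.
list[Tree] collectTokens(amb({Tree x, *_}), bool(Tree) isTokenPredicate)
  = collectTokens(x, isTokenPredicate);

// Every other tree: stop at nodes the predicate accepts, otherwise descend.
default list[Tree] collectTokens(Tree t, bool(Tree) isTokenPredicate) {
  if (isTokenPredicate(t)) {
    // The whole subtree counts as one token; its children are not visited.
    return [t];
  }
  if (appl(_, list[Tree] args) := t) {
    // Concatenate the tokens found in all children, in source order.
    return [*collectTokens(a, isTokenPredicate) | Tree a <- args];
  }
  // Characters and cycles rejected by the predicate contribute nothing.
  return [];
}
```

Because the traversal stops at the first node the predicate accepts, the predicate alone decides the granularity of the tokens: accepting lexical nodes yields whole identifiers rather than individual characters.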
function isTokenType
bool isTokenType(lit(_))
bool isTokenType(cilit(_))
bool isTokenType(lex(_))
bool isTokenType(layouts(_))
bool isTokenType(label(str _, Symbol s))
default bool isTokenType(Symbol _)
function isToken
bool isToken(appl(prod(Symbol s, _, _), _))
bool isToken(char(_))
default bool isToken(Tree _)
function isLexical
bool isLexical(appl(prod(Symbol s, _, _), _))
default bool isLexical(Tree _)
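Combining tokens with the predicates above is presumably what separates the full tokenizer from the identifier-only one. The sketch below assumes, based only on the names and signatures on this page, that isToken accepts literal, lexical and layout trees and that isLexical accepts trees produced by lex(_) rules. The module name TokenDemo, the toy grammar and the helpers texts, allTokens and identifierTokens are hypothetical.

```rascal
module TokenDemo

import ParseTree;
import analysis::text::search::Grammars;

// Hypothetical toy grammar used only for this illustration.
layout Layout = [\ \t\n\r]* !>> [\ \t\n\r];
keyword Keywords = "print";
lexical Id = ([a-z] !<< [a-z]+ !>> [a-z]) \ Keywords;
syntax Stmt = "print" Id;
start syntax Program = Stmt*;

// Unparse each selected token back to its source text.
list[str] texts(list[Tree] toks) = ["<t>" | Tree t <- toks];

// Every token the isToken predicate accepts (assumed: keywords, identifiers, layout).
list[str] allTokens(str src)
  = texts(tokens(parse(#start[Program], src), isToken));

// Only lexical trees such as Id (assumed); keywords are literals and are skipped.
list[str] identifierTokens(str src)
  = texts(tokens(parse(#start[Program], src), isLexical));
```

Under those assumptions, identifierTokens("print foo print bar") should return ["foo", "bar"], while allTokens on the same input should also include the keyword "print" and the (possibly empty) layout between tokens.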
function isComment
bool isComment(Tree t)
default bool isComment(Tree _)