Peggi

PEG parser generator in Vimscript

Install / Use

/learn @EinfachToll/Peggi

Peggi is a parsing framework. On its own, it is not of much use, but it can serve you when you write a Vim script that has to deal with data too complicated to parse with regular expressions alone.

Installation

Use one of the many plugin managers for Vim, or install it manually by putting the folder peggi/ into any autoload/ directory in your runtimepath.
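A quick way to check a manual install is to ask Vim where it finds the autoload file. This is only a sketch: it assumes the parser lives in peggi/peggi.vim, which is the file Vim's autoload naming rules imply for a function called peggi#peggi#parse, and the function name PeggiLocation is made up for this example.

```vim
" Return the path(s) of Peggi's autoload file as a string, or '' if Vim
" cannot see it. By Vim's autoload rules, peggi#peggi#parse() is looked up
" in autoload/peggi/peggi.vim below some runtimepath entry.
function! PeggiLocation()
    return globpath(&runtimepath, 'autoload/peggi/peggi.vim')
endfunction

if empty(PeggiLocation())
    echomsg 'Peggi not found in runtimepath'
else
    echomsg 'Peggi found at: ' . PeggiLocation()
endif
```

If the message says that Peggi was not found, the peggi/ folder is probably not directly inside an autoload/ directory.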

Basic Usage

  1. In your Vim script, first specify the formal grammar of your data as a Parsing Expression Grammar (PEG).
  2. Then start Peggi: let result = peggi#peggi#parse(grammar, data, start-nonterminal)

See the section “Grammar” for what exactly a grammar should look like. See the other sections for how to use Peggi.

As an example, let's look at a script that uses Peggi for processing arithmetic expressions:

let s:grammar = '
            \ Expression = ( Term , /\s*[+-]/.strip() , Expression ).g:compute()  |  Term
            \ Term = ( Factor , /\s*[*\/]/.strip() , Term ).g:compute()  |  Factor
            \ Factor = ( /\s*(/ , Expression , /\s*)/ ).take("1")  |  /\s*[-0-9.]\+/.str2float()
            \ '

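" a:list is [left operand, operator, right operand]; the operator arrives
" as a string that .strip() has already freed from whitespace.
function! g:compute(list)
    if a:list[1] ==# '*'
        return a:list[0] * a:list[2]
    elseif a:list[1] ==# '/'
        return a:list[0] / a:list[2]
    elseif a:list[1] ==# '+'
        return a:list[0] + a:list[2]
    elseif a:list[1] ==# '-'
        return a:list[0] - a:list[2]
    endif
endfunction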

let s:example = '-10 * (2* 2)/(0.6 +2)'

echo peggi#peggi#parse(s:grammar, s:example, 'Expression')

Some other parsing frameworks return their result as some kind of abstract syntax tree (AST), which the user can then process easily. In Vimscript, however, it's not really fun to build complicated data structures, so in Peggi the processing of the result happens while parsing. To this end, you can specify transformation functions in the grammar which process the text elements that have just been parsed. These transformation functions can, in principle, take and return arbitrary types (strings, numbers, lists, dictionaries, …), so you have to make sure yourself that the types match.

Regarding the given example, what exactly happens when Peggi attempts to match a Factor against part of a string? First, it tries to match the regular expression \s*(, that is, any amount of whitespace followed by an opening parenthesis. If that succeeds, it matches an Expression (which I skip in this explanation to avoid recursion). If this also succeeds, more whitespace and a closing parenthesis are matched. These three items (the opening parenthesis, the result of matching the Expression, and the closing parenthesis) are put into a list because of the two commas. The list is handed to the function take() (built into Peggi), which returns the item at position 1 of that list, that is, the result of the Expression, which should be a number. If any of these three matches fails, Peggi instead attempts to match the regular expression \s*[-0-9.]\+, which stands for a number, and passes the matched string to the function str2float() (built into Vim), which, as the name suggests, turns it into a number. So in either case, applying the nonterminal Factor to a string yields a number (assuming Expression yields a number and parsing does not fail).
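To make the two alternatives of Factor a bit more tangible, here is a plain-Vimscript imitation of what each of them does with its input. This is not Peggi's code, just a sketch: the function names FactorNumberByHand and TakeByHand are invented, the string 'Fail' is only a stand-in for whatever Peggi uses internally, and it is an assumption that take() receives its index as a string.

```vim
" Second alternative of Factor, imitated by hand: match the regexp at the
" start of the input, then hand the matched text to str2float().
function! FactorNumberByHand(input)
    let l:matched = matchstr(a:input, '^\s*[-0-9.]\+')
    if l:matched ==# ''
        " Stand-in only; this is not Peggi's real Fail value.
        return 'Fail'
    endif
    return str2float(l:matched)
endfunction

" First alternative, imitated by hand: the three comma-separated results
" form a list, and take("1") picks the item at index 1.
function! TakeByHand(list, index)
    return a:list[str2nr(a:index)]
endfunction

echo FactorNumberByHand('  -10 * 2')
echo TakeByHand(['(', 4.0, ')'], '1')
```

Note that str2float() skips leading whitespace itself, which is why the regexp may include the \s* without any extra stripping.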

Grammar

The grammar is specified as one (big) string. It is best to format it like this:

let grammar = '
    \ Nonterminal1 = grammar expression ...
    \ Nonterminal2 = other expression ...
    \'

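Because the standard Vimscript syntax file is rather sloppy about highlighting, a grammar written this way is highlighted nicely when displayed in Vim. Notice the space between the \ and the nonterminals. Vim joins continuation lines without inserting anything in between, so this space is the only thing that separates one rule from the next in the resulting string.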

For more information about Parsing Expression Grammar, see PEG. In Peggi, a grammar expression has one of the following forms:

.---------------+------------------------------+------------------------------.
| Form          | Function                     | Yields                       |
+===============+==============================+==============================+
| /regexp/      | matches and consumes the     | the matched string or Fail   |
|               | regexp in the input string   | if it doesn't match          |
+---------------+------------------------------+------------------------------+
| "string"      | matches and consumes the     | the matched string or Fail   |
|               | string                       | if it doesn't match          |
+---------------+------------------------------+------------------------------+
| Nonterminal   | matches the right side of    | whatever the right side of   |
|               | this nonterminal             | the nonterminal yields       |
+---------------+------------------------------+------------------------------+
| Expr1 Expr2   | matches first Expr1 and if   | the concatenated results of  |
|               | it's successful, match Expr2 | Expr1 and Expr2 if both are  |
|               | afterwards                   | lists or strings, or Fail if |
|               |                              | one of them fails            |
+---------------+------------------------------+------------------------------+
| Expr1, Expr2  | matches first Expr1 and if   | the results of Expr1 and     |
|               | it's successful, match Expr2 | Expr2 as list of strings, or |
|               | afterwards                   | Fail if one of them fails    |
+---------------+------------------------------+------------------------------+
| Expr1 | Expr2 | matches first Expr1 and, if  | either what Expr1 or what    |
|               | it's not successful, match   | Expr2 yields, or Fail if     |
|               | Expr2 at the same position   | both of them fail            |
|               | as Expr1                     |                              |
+---------------+------------------------------+------------------------------+
| Expr?         | matches Expr 0 or 1 times    | what Expr returns if it      |
|               |                              | matches, '' if it fails      |
+---------------+------------------------------+------------------------------+
| Expr*         | matches Expr 0 or more times | a (possibly empty) list of   |
|               | (greedy)                     | what Expr yields             |
+---------------+------------------------------+------------------------------+
| Expr°         | matches Expr 0 or more times | the concatenated results of  |
|               | (greedy)                     | Expr if the results are      |
|               |                              | strings or lists             |
+---------------+------------------------------+------------------------------+
| Expr+         | matches Expr 1 or more times | a list of what Expr yields,  |
|               | (greedy)                     | or Fail if it matches not a  |
|               |                              | single time                  |
+---------------+------------------------------+------------------------------+
| Expr#         | matches Expr 1 or more times | the concatenated results of  |
|               | (greedy)                     | Expr if the results are      |
|               |                              | strings or lists, or Fail if |
|               |                              | it matches not a single time |
+---------------+------------------------------+------------------------------+
| &Expr         | matches Expr, but doesn't    | '' if Expr matches, Fail     |
|               | consume it                   | otherwise                    |
+---------------+------------------------------+------------------------------+
| !Expr         | matches Expr, but doesn't    | Fail if Expr matches, ''     |
|               | consume it                   | otherwise                    |
+---------------+------------------------------+------------------------------+
| (Expr)        | matches Expr                 | whatever Expr yields         |
'---------------+------------------------------+------------------------------'
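The difference between the list-yielding and the concatenating repetition operators is easiest to see side by side. The following sketch uses made-up nonterminal names (AsList, AsString, Digit) and a made-up wrapper function RepetitionDemo; it has not been run against Peggi, so take it as an illustration of the table rather than as verified behavior.

```vim
" Parse the same input once with Expr+ and once with Expr#.
" Requires Peggi to be installed; the grammar is hypothetical.
function! RepetitionDemo(input)
    let l:grammar = '
        \ AsList = Digit+
        \ AsString = Digit#
        \ Digit = /[0-9]/
        \ '
    return [peggi#peggi#parse(l:grammar, a:input, 'AsList'),
          \ peggi#peggi#parse(l:grammar, a:input, 'AsString')]
endfunction
```

Going by the table, calling RepetitionDemo('123') should give a list of three one-character strings for AsList and one concatenated string for AsString.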

(Note: “Match and consume” means that the string is matched and the internal pointer moves to the position right after the matched string, where the next tokens are then matched. “Matching without consuming” means that the next expression is matched against the very same part of the string. So &Expr1 &Expr2 Expr3 means that all three expressions are matched against the same part of the string.)
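A typical use of matching without consuming is to reject certain inputs before the real match happens. The sketch below is an assumption-laden example: the nonterminals Name and Keyword and the function LookaheadDemo are invented, and it has not been run against Peggi.

```vim
" Accept a word only if it is not one of two keywords.
" !Keyword looks at the input without consuming it, so the regexp after it
" starts matching at the very same position.
" Requires Peggi to be installed; the grammar is hypothetical.
function! LookaheadDemo(input)
    let l:grammar = '
        \ Name = !Keyword /\a\w*/
        \ Keyword = /endif\>/ | /if\>/
        \ '
    return peggi#peggi#parse(l:grammar, a:input, 'Name')
endfunction
```

According to the table, !Keyword yields '' when Keyword does not match, so concatenating it with the result of the regexp should leave just the matched word.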

(Note 2: Due to Vim's strange behavior concerning line endings, use /\r/ instead of /\n/ to match a line break, in the grammar as well as in transformation functions.)

(Note 3: Expr1 Expr2 binds more tightly than Expr1, Expr2, which in turn binds more tightly than Expr1 | Expr2.)
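To see what the binding rules of Note 3 mean in practice, one can write the same rule once without and once with explicit parentheses. This is a sketch with an invented rule Start and an invented function PrecedenceDemo, not tested against Peggi.

```vim
" Two grammars that should be equivalent according to Note 3:
" sequence binds tightest, then the comma, then the choice.
" Requires Peggi to be installed; the grammars are hypothetical.
function! PrecedenceDemo(input)
    let l:implicit = '
        \ Start = "a" "b" , "c" | "d"
        \ '
    let l:explicit = '
        \ Start = ( ( "a" "b" ) , "c" ) | "d"
        \ '
    return [peggi#peggi#parse(l:implicit, a:input, 'Start'),
          \ peggi#peggi#parse(l:explicit, a:input, 'Start')]
endfunction
```

If the two results ever differ for the same input, the parentheses are where to look first.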

Special Items

Comments

e.g. Nonterminal = Expr1 | Expr2 Expr3 { Comment }

Comments can only appear at the end of a rule definition. They are useful, for example, for noting the data type a rule yields.

Transformation functions

e.g. Expr.function("arg1", "arg2")

The result of matching Expr is passed as the first argument to function(), followed by the given additional arguments. Additional arguments must always be enclosed in double quotes (use \" for a literal double quote). Functions can be chained: Expr.fu1("arg1").fu2("arg2"). Unfortunately, only global functions can be used as transformation functions, that is, functions whose name starts with a capital letter or with g:, or functions that live in an autoload directory. Script-local functions (starting with s:) won't work, because Peggi is a separate script from yours.
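As a small illustration of a user-defined transformation function with additional arguments, consider the following sketch. The function Wrap, the rule Word and the wrapper TransformDemo are all invented for this example, and it is an assumption that the quoted arguments arrive in the function as strings.

```vim
" Hypothetical transformation function: gets the match result first,
" then the additional arguments given in the grammar.
" The name starts with a capital letter, so it is global.
function! Wrap(text, left, right)
    return a:left . a:text . a:right
endfunction

" Chain a function built into Vim (toupper) with the user-defined one.
" Requires Peggi to be installed; the grammar is hypothetical.
function! TransformDemo(input)
    let l:grammar = '
        \ Word = /\w\+/.toupper().Wrap("<", ">")
        \ '
    return peggi#peggi#parse(l:grammar, a:input, 'Word')
endfunction
```

Since Wrap is an ordinary global function, it can also be tried out without Peggi, e.g. with :echo Wrap('hi', '<', '>').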

Peggi has some functions built in:

.------------------------------+------------------------------.
| Function                     | Function (I mean, function   |
|                              | of the function)             |
+==============================+==============================+
| Expr.strip()                 | cuts whitespace off from     |
|                              | left and right               |
+----------