pyVHDLParser

This is a token-stream based parser for VHDL-2008.

This project requires Python 3.8+.

Introduction

Main Goals

Parsing
- slice an input document into tokens and text blocks which are categorized
- preserve case, whitespace and comments
- recover on parsing errors
- good error reporting / throw exceptions
Fast Processing
- multi-pass parsing and analysis
- delay analysis if not needed at current pass
- link tokens and blocks for fast-forward scanning
Generic VHDL Language Model
- Assemble a document-object-model (Code-DOM)
- Provide an API for code introspection

Use Cases

generate documentation by using the fast-forward scanner
generate a document/language model by using the grouped text-block scanner
extract compile orders and other dependency graphs
generate highlighted syntax
re-annotate documenting comments to their objects for doc extraction

Parsing approach

slice an input document into tokens
assemble tokens to text blocks which are categorized
assemble text blocks for fast-forward scanning into groups
translate groups into a document-object-model (DOM)
provide a generic VHDL language model

Long time goals

A Sphinx language plugin for VHDL

TODO: Move the following documentation to ReadTheDocs and replace it with a more lightweight version.

Basic Concept

Example 1

This is an input file:

-- Copryright 2016
library IEEE;
use     IEEE.std_logic_1164.all;

entity myEntity is
  generic (
    BITS : positive := 8
  );
  port (
    Clock   : in   std_logic;
    Output  : out  std_logic_vector(BITS - 1 downto 0)
  );
end entity;

architecture rtl of myEntity is
  constant const0 : integer := 5;
begin
  process(Clock)
  begin
  end process;
end architecture;

library IEEE, PoC;
use     PoC.Utils.all, PoC.Common.all;

package pkg0 is
  function func0(a : integer) return string;
end package;

package body Components is
  function func0(a : integer) return string is
    procedure proc0 is
    begin
    end procedure;
  begin
  end function
end package body;

Step 1

The input file (stream of characters) is translated into stream of basic tokens:

StartOfDocumentToken
LinebreakToken
SpaceToken
- IndentationToken
WordToken
CharacterToken
- FusedCharacterToken
CommentToken
- SingleLineCommentToken
- MultiLineCommentToken
EndOfDocumentToken

The stream looks like this:

<StartOfDocumentToken>
<SLCommentToken '-- Copryright 2016\n'  ................ at 1:1>
<WordToken      'library'  ............................. at 2:1>
<SpaceToken     ' '  ................................... at 2:8>
<WordToken      'IEEE'  ................................ at 2:9>
<CharacterToken ';'  ................................... at 2:13>
<LinebreakToken ---------------------------------------- at 2:14>
<WordToken      'use'  ................................. at 3:1>
<SpaceToken     '     '  ............................... at 3:4>
<WordToken      'IEEE'  ................................ at 3:9>
<CharacterToken '.'  ................................... at 3:13>
<WordToken      'std_logic_1164'  ...................... at 3:14>
<CharacterToken '.'  ................................... at 3:28>
<WordToken      'all'  ................................. at 3:29>
<CharacterToken ';'  ................................... at 3:32>
<LinebreakToken ---------------------------------------- at 3:33>
<LinebreakToken ---------------------------------------- at 4:1>
<WordToken      'entity'  .............................. at 5:1>
<SpaceToken     ' '  ................................... at 5:7>
<WordToken      'myEntity'  ............................ at 5:8>
<SpaceToken     ' '  ................................... at 5:16>
<WordToken      'is'  .................................. at 5:17>
<LinebreakToken ---------------------------------------- at 5:19>
<IndentToken    '\t'  .................................. at 6:1>
<WordToken      'generic'  ............................. at 6:2>
<SpaceToken     ' '  ................................... at 6:9>
<CharacterToken '('  ................................... at 6:10>
<LinebreakToken ---------------------------------------- at 6:11>
<IndentToken    '\t\t'  ................................ at 7:1>
<WordToken      'BITS'  ................................ at 7:3>
<SpaceToken     ' '  ................................... at 7:7>
<CharacterToken ':'  ................................... at 7:8>
<SpaceToken     ' '  ................................... at 7:8>
<WordToken      'positive'  ............................ at 7:10>
<SpaceToken     ' '  ................................... at 7:18>
<FusedCharToken ':='  .................................. at 7:19>
<SpaceToken     ' '  ................................... at 7:21>
<WordToken      '8'  ................................... at 7:22>
<LinebreakToken ---------------------------------------- at 7:23>
<IndentToken    '\t'  .................................. at 8:1>
<CharacterToken ')'  ................................... at 8:2>
<CharacterToken ';'  ................................... at 8:3>
<LinebreakToken ---------------------------------------- at 8:4>
<IndentToken    '\t'  .................................. at 9:1>
<WordToken      'port'  ................................ at 9:2>
<SpaceToken     ' '  ................................... at 9:6>
<CharacterToken '('  ................................... at 9:7>
<LinebreakToken ---------------------------------------- at 9:8>
<IndentToken    '\t\t'  ................................ at 10:1>
<WordToken      'Clock'  ............................... at 10:3>
<SpaceToken     '   '  ................................. at 10:8>
<CharacterToken ':'  ................................... at 10:11>
<SpaceToken     ' '  ................................... at 10:11>
<WordToken      'in'  .................................. at 10:13>
<SpaceToken     '  '  .................................. at 10:15>
<WordToken      'std_logic'  ........................... at 10:17>
<CharacterToken ';'  ................................... at 10:26>
<LinebreakToken ---------------------------------------- at 10:27>
<IndentToken    '\t\t'  ................................ at 11:1>
<WordToken      'Output'  .............................. at 11:3>
<SpaceToken     '       '  ................................... at 11:9>
<CharacterToken ':'  ................................... at 11:10>
<SpaceToken     ' '  ................................... at 11:10>
<WordToken      'out'  ................................. at 11:12>
<SpaceToken     '       '  ................................... at 11:15>
<WordToken      'std_logic_vector'  .................... at 11:16>
<CharacterToken '('  ................................... at 11:32>
<WordToken      'BITS'  ................................ at 11:33>
<SpaceToken     ' '  ................................... at 11:37>
<CharacterToken '-'  ................................... at 11:38>
<SpaceToken     ' '  ................................... at 11:38>
<WordToken      '1'  ................................... at 11:40>
<SpaceToken     ' '  ................................... at 11:41>
<WordToken      'downto'  .............................. at 11:42>
<SpaceToken     ' '  ................................... at 11:48>
<WordT

PyVHDLParser

Install / Use

README