PyBasic
Simple interactive BASIC interpreter written in Python
A BASIC Interpreter - Program like it's 1979!
Introduction
A simple interactive BASIC interpreter written in Python 3. It is based heavily on material in the excellent book Writing Interpreters and Compilers for the Raspberry Pi Using Python by Anthony J. Dos Reis. However, I have had to adapt the Python interpreter presented in the book, both to work with the BASIC programming language and to produce an interactive command line interface. The interpreter therefore adopts the key techniques for interpreter and compiler writing: a lexical analysis stage followed by a recursive descent parser that implements the context-free grammar of the target programming language.
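As a rough illustration of the two stages named above, the sketch below tokenises an arithmetic expression and evaluates it with a recursive descent parser. It is not taken from the PyBasic sources: the names `tokenize`, `Parser` and `evaluate`, and the tiny grammar in the docstring, are assumptions made for this example only.

```python
import re

# Hypothetical number pattern for this sketch: digits with an optional fraction.
NUMBER_RE = re.compile(r"\d+(\.\d+)?")


def tokenize(text):
    """Lexical analysis: turn source text into a list of (kind, value) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
        elif ch.isdigit():
            match = NUMBER_RE.match(text, pos)
            lexeme = match.group()
            tokens.append(("NUM", float(lexeme) if "." in lexeme else int(lexeme)))
            pos = match.end()
        elif ch in "+-*/()":
            tokens.append((ch, ch))
            pos += 1
        else:
            raise SyntaxError(f"Unexpected character: {ch!r}")
    tokens.append(("EOF", None))
    return tokens


class Parser:
    """Recursive descent parser, one method per grammar rule:

    expr   -> term (('+' | '-') term)*
    term   -> factor (('*' | '/') factor)*
    factor -> NUM | '-' factor | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op, _ = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op, _ = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        kind, value = self.advance()
        if kind == "NUM":
            return value
        if kind == "-":
            return -self.factor()
        if kind == "(":
            value = self.expr()
            if self.advance()[0] != ")":
                raise SyntaxError("Expected ')'")
            return value
        raise SyntaxError(f"Unexpected token: {kind}")


def evaluate(text):
    """Run both stages: tokenise, then parse and evaluate in one pass."""
    parser = Parser(tokenize(text))
    result = parser.expr()
    if parser.peek() != "EOF":
        raise SyntaxError("Unexpected trailing input")
    return result
```

Because `expr` calls `term` and `term` calls `factor`, multiplication and division bind more tightly than addition and subtraction, and parentheses restart the process at `expr`.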
The interpreter is a homage to the home computers of the early 1980s, and when executed, presents an interactive prompt ('>') typical of such a home computer. Commands to run, list, save and load BASIC programs can be entered at the prompt as well as program statements themselves.
The BASIC dialect that has been implemented is slightly simplified, and naturally avoids machine-specific instructions, such as those concerned with sound and graphics.
There is reasonably comprehensive error checking. Syntax errors will be picked up and reported on by the lexical analyser as statements are entered. Runtime errors will highlight the cause and the line number of the offending statement.
The interpreter can be invoked as follows:
$ python interpreter.py
Although this started off as a personal project, it has been enhanced considerably by some other GitHub users. You can see them in the list of contributors! It's very much a group endeavour now.
Recent Updates
Array INPUT and READ Support
The interpreter now supports reading data directly into array elements using both INPUT and READ statements:
- Array Elements as Targets: Use syntax like INPUT A(1), B(I+2) and READ N$(1), N$(2)
- Expression Indices: Support for complex expressions in array indices like A(I*2+1)
- Mixed Data Types: Read numeric and string data into appropriate array types
- Error Handling: Proper SUBSCRIPT ERROR and OUT OF DATA error messages
- Specification Compliant: Follows classic BASIC behavior for all edge cases
See the comprehensive test program examples/array_input_read_tests.bas for demonstrations of all functionality.
Operators
A limited range of arithmetic operators is provided. Addition and subtraction have the lowest precedence, but the order of evaluation can be changed with parentheses.
- + - Addition
- - - Subtraction
- * - Multiplication
- / - Division
- MOD (or %) - Modulo
> 10 PRINT 2 * 3
> 20 PRINT 20 / 10
> 30 PRINT 10 + 10
> 40 PRINT 10 - 10
> 50 PRINT 15 MOD 10
> RUN
6
2.0
20
0
5
>
Additional numerical operations may be performed using numeric functions (see below).
Note also that '+' does extra duty as a string concatenation operator, while '*' can be used to repeat strings.
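One way an interpreter written in Python could give '+' and '*' this double duty is to hand the already-evaluated operands straight to Python's own operators, which behave the same way for numbers and strings. The function below is a hypothetical sketch (the name `apply_operator` is not from the PyBasic sources), and it assumes division maps to Python's true division, which is consistent with the 2.0 shown in the session above.

```python
def apply_operator(op, left, right):
    """Evaluate one binary operation on already-evaluated operands."""
    if op == "+":
        # Numeric addition, or concatenation when both operands are strings.
        return left + right
    if op == "-":
        return left - right
    if op == "*":
        # Numeric multiplication, or repetition for a string and an integer.
        return left * right
    if op == "/":
        # True division: always yields a float in Python 3.
        return left / right
    if op in ("MOD", "%"):
        return left % right
    raise ValueError(f"Unknown operator: {op}")
```

For example, `apply_operator("+", "AB", "CD")` concatenates the two strings, and `apply_operator("*", "AB", 3)` repeats the string three times.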
Commands
Programs may be listed using the LIST command:
> LIST
10 LET I = 10
20 PRINT I
>
The LIST command can take arguments to restrict which lines are listed:
LIST 50 Lists only line 50.
LIST 50-100 Lists lines 50 through 100 inclusive.
LIST 50 100 Also lists lines 50 through 100 inclusive; almost any delimiter works here.
LIST -100 Lists from the start of the program through line 100 inclusive.
LIST 50- Lists from line 50 to the end of the program.
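The argument forms above could be interpreted along the following lines. This is a sketch under stated assumptions, not PyBasic's actual parsing code: `parse_list_range` is a hypothetical helper, and `first_line` and `last_line` stand for the lowest and highest line numbers in the stored program.

```python
import re


def parse_list_range(args, first_line, last_line):
    """Return the inclusive (start, end) line range selected by LIST arguments."""
    args = args.strip()
    if not args:
        # Plain LIST: the whole program.
        return first_line, last_line

    numbers = re.findall(r"\d+", args)
    if len(numbers) == 2:
        # LIST 50-100 or LIST 50 100: any delimiter between the numbers.
        return int(numbers[0]), int(numbers[1])
    if len(numbers) != 1:
        raise ValueError(f"Cannot interpret LIST arguments: {args!r}")

    value = int(numbers[0])
    if args[0].isdigit() and args[-1].isdigit():
        # LIST 50: a single line.
        return value, value
    if args[-1].isdigit():
        # LIST -100: a leading delimiter means "from the start".
        return first_line, value
    # LIST 50-: a trailing delimiter means "to the end".
    return value, last_line
```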
A program is executed using the RUN command:
> RUN
10
>
A program may be saved to disk using the SAVE command. Note that the full path must be specified within double quotes:
> SAVE "C:\path\to\my\file"
Program written to file
>
The program may be re-loaded from disk using the LOAD command, again specifying the full path using double quotes:
> LOAD "C:\path\to\my\file"
Program read from file
>
When loading or saving, the .bas extension is assumed if not provided. If the file has a simple name (letters and digits only) and is in the current working directory, the quotes can be omitted:
> LOAD regression
will load regression.bas from the current working directory.
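The default-extension rule can be sketched as follows. `resolve_program_path` is a hypothetical helper, not a function from the PyBasic sources, and it only illustrates the naming rule; it does not open or validate the file.

```python
import os


def resolve_program_path(name):
    """Strip optional double quotes and append .bas when no extension is given."""
    name = name.strip().strip('"')
    _, extension = os.path.splitext(name)
    return name if extension else name + ".bas"
```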
Individual program statements may be deleted by entering their line number only:
> 10 PRINT "Hello"
> 20 PRINT "Goodbye"
> LIST
10 PRINT "Hello"
20 PRINT "Goodbye"
> 10
> LIST
20 PRINT "Goodbye"
>
The program may be erased entirely from memory using the NEW command:
> 10 LET I = 10
> LIST
10 LET I = 10
> NEW
> LIST
>
Program line numbers can be renumbered using the RENUMBER command:
> 10 A = 1
> 20 IF A = 1 THEN 100
> 30 GOTO 10
> 100 PRINT "DONE"
> LIST
10 A = 1
20 IF A = 1 THEN 100
30 GOTO 10
100 PRINT "DONE"
> RENUMBER 100,100
Program renumbered
> LIST
100 A = 1
200 IF A = 1 THEN 400
300 GOTO 100
400 PRINT "DONE"
>
The RENUMBER command supports various parameter combinations:
- RENUMBER - Renumber whole program starting at 10 with increments of 10
- RENUMBER 100 - Start renumbering at 100 with increments of 10
- RENUMBER 50,5 - Start at 50 with increments of 5
- RENUMBER 100,10,200,300 - Renumber only lines 200-300, starting at 100
- RENUMBER ,,200 - Renumber lines 200 and above using defaults
The RENUMBER command automatically updates line number references in GOTO, GOSUB, IF...THEN, ON...GOTO, ON...GOSUB, and RESTORE statements while preserving line numbers in string literals and comments.
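The core idea behind renumbering, building a map from old to new line numbers and then rewriting references, can be sketched as below. This is a simplified illustration rather than PyBasic's implementation: `renumber` and `REF_RE` are hypothetical names, the program is assumed to be a dictionary of upper-case statements keyed by line number, only a single target after GOTO, GOSUB, THEN or RESTORE is rewritten, and ON...GOTO lists, REM comments and partial ranges are not handled.

```python
import re

# A keyword followed by one numeric target, e.g. "GOTO 10" or "THEN 100".
REF_RE = re.compile(r"\b(GOTO|GOSUB|THEN|RESTORE)(\s+)(\d+)")


def renumber(program, start=10, step=10):
    """Renumber a {line_number: statement} program and update simple references."""
    mapping = {old: start + i * step for i, old in enumerate(sorted(program))}

    def fix_refs(segment):
        def replace(match):
            target = int(match.group(3))
            # Targets that do not exist in the program are left unchanged.
            return f"{match.group(1)}{match.group(2)}{mapping.get(target, target)}"

        return REF_RE.sub(replace, segment)

    renumbered = {}
    for old, statement in program.items():
        # After splitting on double quotes, even-indexed pieces are code and
        # odd-indexed pieces are string literals, which must be preserved.
        pieces = statement.split('"')
        pieces[::2] = [fix_refs(piece) for piece in pieces[::2]]
        renumbered[mapping[old]] = '"'.join(pieces)
    return renumbered
```

Applied to the four-line program in the session above with `start=100` and `step=100`, this produces the same listing as RENUMBER 100,100.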
Finally, it is possible to terminate the interpreter by issuing the EXIT command:
> EXIT
c:\
On occasion, it might be necessary to force termination of a program and return to the interpreter, for example, because it is caught in an infinite loop. This can be achieved by using Ctrl-C to force the program to stop:
> 10 PRINT "Hello"
> 20 GOTO 10
> RUN
"Hello"
"Hello"
"Hello"
...
...
<Ctrl-C>
Program terminated
> LIST
10 PRINT "Hello"
20 GOTO 10
>
Programming language constructs
Statement structure
As per usual in old school BASIC, all program statements must be prefixed with a line number which indicates the order in which the statements may be executed. A statement may be modified or replaced by re-entering a statement with the same line number:
> 10 LET I = 10
> LIST
10 LET I = 10
> 10 LET I = 200
> LIST
10 LET I = 200
>
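The replace-on-re-entry behaviour, together with deletion by entering a bare line number as described under Commands, falls out naturally if the program is held in a dictionary keyed by line number. The `Program` class below is a hypothetical sketch of that idea and is not taken from the PyBasic sources.

```python
class Program:
    """Stores statements keyed by line number; listing order is numeric."""

    def __init__(self):
        self.lines = {}

    def enter(self, text):
        """Add, replace, or (for a bare line number) delete a statement."""
        number, _, statement = text.strip().partition(" ")
        line_number = int(number)
        statement = statement.strip()
        if statement:
            # Re-entering an existing line number overwrites the old statement.
            self.lines[line_number] = statement
        else:
            # A line number on its own removes that line, if present.
            self.lines.pop(line_number, None)

    def listing(self):
        """Return the program text in ascending line-number order."""
        return [f"{number} {self.lines[number]}" for number in sorted(self.lines)]
```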
Multiple statements may appear on one line separated by a colon:
> 10 LET X = 10: PRINT X
NOTE: Inline loops are NOT currently supported. A line such as
10 FOR I = 1 to 10: PRINT I: NEXT
must be split into separate numbered lines.
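Splitting a line into its colon-separated statements has one subtlety: a colon inside a string literal must not act as a separator. The sketch below shows one way to handle that; `split_statements` is a hypothetical helper and not necessarily how PyBasic does it.

```python
def split_statements(line):
    """Split a program line on colons that lie outside double-quoted strings."""
    statements = []
    current = []
    in_string = False
    for ch in line:
        if ch == '"':
            # Toggle on every double quote; the quote itself is kept.
            in_string = not in_string
        if ch == ":" and not in_string:
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    statements.append("".join(current).strip())
    return statements
```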
Variables
Variable types follow the typical BASIC convention. Simple variables may contain either strings or numbers (the latter may be integers or floating point numbers). Likewise array variables may contain arrays of either strings or numbers, but they cannot be mixed in the same array.
Note that all keywords and variable names are case insensitive (and will be converted to upper case internally by the lexical analyser). String literals will retain their case, however. There is no inherent limit on the length of variable names or string literals; any limit is dictated by Python. The range of numeric values is also dependent upon the underlying Python implementation.
Note that variable names may only consist of alphanumeric characters and underscores. However, they must all begin with an alphabetic character. For example:
- MY_VAR
- MY_VAR6$
- VAR77(0, 0)
are all valid variable names, whereas:
- 5_VAR
- _VAR$
- 66$
are all invalid.
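The naming rule can be expressed as a regular expression: a letter, then any number of letters, digits or underscores, then an optional '$' marking a string variable. The pattern and helper names below are assumptions made for illustration and are not taken from the PyBasic lexer; array subscripts such as (0, 0) are not part of the name and are not matched here.

```python
import re

# Letter first, then letters, digits or underscores, then an optional '$'.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*\$?")


def is_valid_name(name):
    """Return True if the whole of name is a legal variable name."""
    return NAME_RE.fullmatch(name) is not None


def is_string_variable(name):
    """Return True for a legal name that carries the '$' string suffix."""
    return is_valid_name(name) and name.endswith("$")
```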
Numeric variables have no suffix, whereas string variables are always suffixed by '$'. 'I' and 'I$' are therefore considered to be separate variables. String literals must always be enclosed within double quotes (not single quotes); omitting the quotes will result in a syntax error.
Array variables are defined using the DIM statement, which explicitly lists how many dimensions the array has, and the sizes of those dimensions:
> REM DEFINE A THREE DIMENSIONAL NUMERIC ARRAY
> 10 DIM A(3, 3, 3)
Note that the index of each dimension always starts at zero, but for compatibility with some BASIC dialects the bounds of each dimension are expanded by one, so that the declared size is itself a valid index. So in the above example, valid index values for array A will be 0, 1, 2 or 3 for each dimension. Arrays may have a maximum of three dimensions. Numeric arrays will be initialised with each element set to zero, while string arrays will be initialised with each element set to the empty string "".
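The sizing and initialisation rules above can be sketched with nested Python lists. `make_array` is a hypothetical helper written for this illustration, not PyBasic's internal representation; it allocates size + 1 elements per dimension and picks the default value from the name's suffix.

```python
def make_array(name, *sizes):
    """Build a nested list for DIM name(sizes...), with indices 0..size inclusive."""
    if not 1 <= len(sizes) <= 3:
        raise ValueError("Arrays must have between one and three dimensions")
    # String arrays (names ending in '$') start empty; numeric arrays start at zero.
    default = "" if name.endswith("$") else 0

    def build(dims):
        if not dims:
            return default
        # One extra slot so that the declared size itself is a valid index.
        return [build(dims[1:]) for _ in range(dims[0] + 1)]

    return build(sizes)
```

With this sketch, `make_array("A", 3, 3, 3)` yields a 4 x 4 x 4 structure, so `a[3][3][3]` is a valid element.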
As for simple variables, a string array has its name suffixed by a '$' character, while a numeric array does not carry a suffix. An attempt to assign a string value to a numeric array or vice versa will generate an error.
Array variables with the same name but different dimensionality are treated as the same variable. For example, using a DIM statement to define I(5) and then a second DIM statement to define I(5, 5) will result in the second definition (the two dimensional array) overwriting the first definition (the one dimensional array).
Array values may be used within any expression, such as in a PRINT statement for string values, or in any numerical expression for number values. However, you must be specific about which array element you are referencing, using the correct number of in-range indexes. If that particular array value has not yet been assigned, then an error message will be printed.
> 10 DIM MYARRAY(2, 2, 2)
> 20 LET MYARRAY(0, 1, 0) = 56
> 30 PRINT MYARRAY(0, 1, 0)
> RUN
56
> 30 PRINT MYARRAY(0, 0, 0)
> RUN
Empty array value returned in l
