If you want to write a compiler, you can start small. The most simple language to implement is Forth. It consists of words separated by whitespace. A word contains anything except whitespace.

exerciseRead Forth code from some input stream. There shall be a get_next function to get the next word. Make sure to skip the whitespace correctly.

Now those programs should do something. Forth maintains a dictionary of words, which maps words to their function. We need to initialize this dictionary with a few builtin words, then we can execute words we read from a stream. Forth uses a stack, where executed words push and pop numbers. For example, you could have a word + which pops two numbers from the stack and pushs the sum of them. There could be some print word, which pops and outputs. If a word is not found in the dictionary and it consists of digits, it is a number, which is pushed on the stack.

exerciseCreate a dictionary of some sort. Initialize it with the words + and print. Execute programs like 2 3 4 + + print.

At this point, you have something like a stack-based bytecode interpreter or more precisely a wordcode interpreter. This is the core of what a Java Virtual Machine does.

Every serious language requires control flow. In Forth, the primitives are usually called branch, for a simple jump within the program, and branch?, for conditional jump depending on the top stack value. To implement this, you need at least the currently executed word, if not the whole program, available as a sequential random access data structure (array, list, vector, etc).

exerciseProvide builtin branch and branch? words.

A Forth interpreter has two modes: Execute and compile. You already have the execute mode working from the last exercise. The compile mode is started with the word : and ended with ;. More precisely, : sets the compile-flag and reads one more word, which will now be compiled. If the interpreter is in compile mode, then words are not executed, but appended to the currently compiling words body. The only exception is ;, which is always executed and clears the compile-flag and makes the currently compiled word available for execution. As an example, you could create a word, which adds one to the top stack element, via : ++ 1 add ;.

Now that there is a difference between builtin and compiled words, you need a way to return from words like you return from functions. Forths usually employ a second stack, where return addresses are pushed. The ; implicitly adds a return command to the currently compiled word. However, if Forth is implemented in a high-level language, you can use the host call stack for this, which means you use recursive calls for words.

exerciseImplement : and ; and compile-mode in your interpreter.

For the previous exercise, we made ; special. Actually, Forth calls such words "immediate" and they are not special. Any word can be marked as immediate or not. Depending on this flag, an interpreter in compile mode either executes a word or pushs it to the currently compiled word. Different Forths do this in different ways. Some provide a word immediate, which makes the last compiled word immediate. Others provide an alternative word for :, which compiles immediate words instead.


Implement a way to compile immediate words.

TODO do something interesting with immediate words

Congratulations, you now have a nice Forth interpreter. The following exercises make you play around with it. If you want to look at another implementation, I recommend Jonesforth.

exerciseDefine words ( and ) in Forth. Make ( read words until ) and do nothing with them. This is how Forth implements comments.
exerciseIn Forth if, while, and similar control flow constructs are not builtin but implemented. Implement them in your Forth.