We've now seen context-free grammars and the motivation for developing a top-down parsing algorithm. In this section, we will introduce the simplest top-down parsing algorithm using an LL(1) parser.
The following illustrates the setup of an LL(1) parse table.
For Σ = {σ1, σ2, ..., σn}, N = {n1, n2, ..., nn}, T = {t1, t2, ..., tn}, P = {p1, p2, ..., pn}, our table looks like:
t1 | t2 | ... | tn | |
n1 | an1,t1 | an1,t2 | ... | an1,tn |
n2 | an2,t1 | an2,t2 | ... | an2,tn |
... | ||||
nn | ann,t1 | ann,t2 | ... | ann,tn |
Here, ti ∈ Σ ∪ $. ani,tj ∈ A ∪ ø.
A = {apply1, apply2, ..., applyn}, representing the set of all possible applications of some production Pi from our grammar.
$ is a special end of string symbol that will be explained later. ø here is not the empty set, but instead a null element.
The funamental question an LL(1) parse table attempts to answer is:
Given a nonterminal n ∈ N and remaining unprocessed input string sremaining, which production should we apply to n?
Let's represent our nonterminal to expand as nnext and the next input symbol as tnext - taken from the beginning of sremaining. Our next parse action anext can be found by simply looking up the table entry corresponding to nnext and tnext.We can represent this as:
anext = annext,tnext
The result can be interpreted as:
Example
Σ = {a, b}
S → a
S → b
Our parse table is:
a | b | |
S | 1 | 2 |
Note
In our table, we use numbers as a short-form notation. In particular, a number i in the table represents an action of applyi.
Using this table, if we have nonterminal S and are trying to produce the string a, it indicates that grammar rule 1 should be applied. This leads to a derivation that looks like:
Conceptually, we know that we need to start with nonterminal S and end with string a. In order to make the correct choice of production, we use the parse table which has encoded information about which production rule to apply to our nonterminal symbol S to produce that next terminal symbol a.
Example
Σ = {a, b}
S → aZ
Z → B
B → b
Our parse table is:
a | b | |
S | 1 | |
Z | 2 | |
B | 3 |
Note
The empty cells represent ø. This notation is used by convention for the sake of convenience.
Using the rules from the above parse table, we can derive the string ab as follows:
Trying to parse the string aa using the same parse table:
The idea here is the same as in our simpler example. Note that we are always looking to choose a production to apply to the leftmost remaining nonterminal symbol.
We've now seen the basic architecture of the LL(1) parsing table and the basic selection method for our next parse action. In the next section, we will formally introduce the LL(1) parsing algorithm.
GitHub Repository: https://github.com/bprollinson/ripal
Copyright © 2017 Brendan Rollinson-Lorimer