RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Pages

The LL(1) parse table

Background

We've now seen context-free grammars and the motivation for developing a top-down parsing algorithm. In this section, we will introduce the simplest top-down parsing algorithm using an LL(1) parser.

Anatomy of an LL(1) parse table

The following illustrates the setup of an LL(1) parse table.

For Σ = {σ1, σ2, ..., σn}, N = {n1, n2, ..., nn}, T = {t1, t2, ..., tn}, P = {p1, p2, ..., pn}, our table looks like:

t1 t2 ... tn
n1 an1,t1 an1,t2 ... an1,tn
n2 an2,t1 an2,t2 ... an2,tn
...
nn ann,t1 ann,t2 ... ann,tn

Here, ti ∈ Σ ∪ $. ani,tj ∈ A ∪ ø.

A = {apply1, apply2, ..., applyn}, representing the set of all possible applications of some production Pi from our grammar.

$ is a special end of string symbol that will be explained later. ø here is not the empty set, but instead a null element.

The fundamental question

The funamental question an LL(1) parse table attempts to answer is:

Given a nonterminal n ∈ N and remaining unprocessed input string sremaining, which production should we apply to n?

The fundamental answer

Let's represent our nonterminal to expand as nnext and the next input symbol as tnext - taken from the beginning of sremaining. Our next parse action anext can be found by simply looking up the table entry corresponding to nnext and tnext.We can represent this as:

anext = annext,tnext

The result can be interpreted as:

  1. Reject the input string entirely if anext = ø
  2. Accept the input string if anext = accept
  3. Apply the production specified by anext if anext ∈ A

Parse table example

Example

Σ = {a, b}

S → a
S → b

Our parse table is:

a b
S 1 2

Note

In our table, we use numbers as a short-form notation. In particular, a number i in the table represents an action of applyi.

Using this table, if we have nonterminal S and are trying to produce the string a, it indicates that grammar rule 1 should be applied. This leads to a derivation that looks like:

S → a (via rule 1)

Conceptually, we know that we need to start with nonterminal S and end with string a. In order to make the correct choice of production, we use the parse table which has encoded information about which production rule to apply to our nonterminal symbol S to produce that next terminal symbol a.

More complex parse table example

Example

Σ = {a, b}

S → aZ
Z → B
B → b

Our parse table is:

a b
S 1
Z 2
B 3

Note

The empty cells represent ø. This notation is used by convention for the sake of convenience.

Using the rules from the above parse table, we can derive the string ab as follows:

S → aZ (via rule 1, looking up the table entry for S and a) → aB (via rule 2, looking up the table entry for Z and b) → ab (via rule 3, looking up the table entry for B and b)

Trying to parse the string aa using the same parse table:

S → aZ (via rule 1, looking up the table entry for S and a) reject! (since there is not table entry for Z and a)

The idea here is the same as in our simpler example. Note that we are always looking to choose a production to apply to the leftmost remaining nonterminal symbol.

Conclusion

We've now seen the basic architecture of the LL(1) parsing table and the basic selection method for our next parse action. In the next section, we will formally introduce the LL(1) parsing algorithm.


GitHub Repository: https://github.com/bprollinson/ripal

Copyright © 2017 Brendan Rollinson-Lorimer