RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Background

We've now seen context-free grammars and the motivation for developing a top-down parsing algorithm. In this section, we will introduce the parse table used by LL(1) parsing, the simplest top-down parsing algorithm.

Anatomy of an LL(1) parse table

The following illustrates the setup of an LL(1) parse table.

For Σ = {σ₁, σ₂, ..., σ_n}, N = {n₁, n₂, ..., n_n}, T = {t₁, t₂, ..., t_n}, P = {p₁, p₂, ..., p_n} (the subscript n only marks the last element of each set; the sets need not have the same size), our table looks like:

	t₁	t₂	...	t_n
n₁	a_{n₁,t₁}	a_{n₁,t₂}	...	a_{n₁,t_n}
n₂	a_{n₂,t₁}	a_{n₂,t₂}	...	a_{n₂,t_n}
...	...	...	...	...
n_n	a_{n_n,t₁}	a_{n_n,t₂}	...	a_{n_n,t_n}

Here, t_i ∈ Σ ∪ {$} and a_{n_i,t_j} ∈ A ∪ {ø}.

A = {apply₁, apply₂, ..., apply_n} is the set of all possible applications of a production from our grammar, where apply_i means "apply production p_i ∈ P".

$ is a special end-of-string symbol that will be explained later. ø here is not the empty set, but a null element marking a table cell with no action.

The fundamental question

The fundamental question an LL(1) parse table attempts to answer is:

Given a nonterminal n ∈ N and remaining unprocessed input string s_remaining, which production should we apply to n?

The fundamental answer

Let's represent the nonterminal to expand as n_next and the next input symbol, taken from the beginning of s_remaining, as t_next. Our next parse action a_next can be found by simply looking up the table entry corresponding to n_next and t_next. We can represent this as:

a_next = a_{n_next,t_next}

The result can be interpreted as:

Reject the input string entirely if a_next = ø
Accept the input string if a_next = accept
Apply the production specified by a_next if a_next ∈ A
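
The lookup and its three possible outcomes can be sketched in Python. This is a minimal sketch rather than a definitive implementation: it assumes the table is stored as a dictionary keyed by (nonterminal, terminal) pairs, that a missing key stands for ø, and that the accept action is stored as the string "accept". The names lookup and interpret are illustrative and do not come from the text above.

```python
from typing import Dict, Optional, Tuple, Union

# An entry is a production number i (meaning apply_i) or the string "accept".
# A missing key plays the role of the null element ø (an assumption of this sketch).
Action = Union[int, str]
ParseTable = Dict[Tuple[str, str], Action]


def lookup(table: ParseTable, n_next: str, t_next: str) -> Optional[Action]:
    """Return the entry a_{n_next,t_next}, or None when the cell is ø."""
    return table.get((n_next, t_next))


def interpret(action: Optional[Action]) -> str:
    """Describe what the parser should do with a looked-up action."""
    if action is None:
        return "reject"
    if action == "accept":
        return "accept"
    return f"apply production {action}"
```

Storing only the non-empty cells keeps the table small, since most cells of a typical parse table are ø.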

Parse table example

Example

Σ = {a, b}

S → a
S → b

Our parse table is:

	a	b
S	1	2

Note

In our table, we use numbers as a short-form notation: productions are numbered in the order they are listed, and a number i in the table represents the action apply_i.

Using this table, if we are expanding nonterminal S and the next input symbol is a, the entry for S and a tells us to apply grammar rule 1. This leads to a derivation that looks like:

S → a (via rule 1)

Conceptually, we know that we need to start with nonterminal S and end with the string a. To choose the correct production, we consult the parse table, which encodes which production to apply to the nonterminal S in order to produce the next terminal symbol a.
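
The same example can be written out as data. The following is a minimal sketch, assuming productions are numbered in the order they are listed and that an empty cell is modelled as a missing dictionary key; the names productions, table and choose_production are hypothetical.

```python
# Productions, numbered in the order they are listed in the grammar.
productions = {
    1: ("S", ["a"]),  # S → a
    2: ("S", ["b"]),  # S → b
}

# Parse table: (nonterminal, next input symbol) -> production number.
table = {
    ("S", "a"): 1,
    ("S", "b"): 2,
}


def choose_production(nonterminal, next_symbol):
    """Return (rule number, right-hand side), or None if the cell is empty."""
    rule = table.get((nonterminal, next_symbol))
    if rule is None:
        return None
    return rule, productions[rule][1]


print(choose_production("S", "a"))  # (1, ['a'])
```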

More complex parse table example

Example

Σ = {a, b}

S → aZ
Z → B
B → b

Our parse table is:

	a	b
S	1
Z		2
B		3

Note

The empty cells represent ø. This notation is used by convention for the sake of convenience.

Using the rules from the above parse table, we can derive the string ab as follows:

S → aZ (via rule 1, looking up the table entry for S and a)
  → aB (via rule 2, looking up the table entry for Z and b)
  → ab (via rule 3, looking up the table entry for B and b)

Trying to parse the string aa using the same parse table:

S → aZ (via rule 1, looking up the table entry for S and a)
  → reject! (since there is no table entry for Z and a)

The idea here is the same as in our simpler example. Note that we are always looking to choose a production to apply to the leftmost remaining nonterminal symbol.
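
The two hand derivations above can be reproduced with a short Python sketch. It is only an illustration of the table lookups for this particular grammar, not the formal LL(1) algorithm: it assumes there are no empty productions, it does not model the end-of-string symbol $ (running out of input is simply treated as a rejection), and the names PRODUCTIONS, TABLE and derive are hypothetical.

```python
PRODUCTIONS = {
    1: ("S", ["a", "Z"]),  # S → aZ
    2: ("Z", ["B"]),       # Z → B
    3: ("B", ["b"]),       # B → b
}
NONTERMINALS = {"S", "Z", "B"}

# Only non-empty cells are stored; a missing key stands for ø.
TABLE = {
    ("S", "a"): 1,
    ("Z", "b"): 2,
    ("B", "b"): 3,
}


def derive(text, start="S"):
    """Return the list of sentential forms, or None if the string is rejected."""
    form = [start]
    steps = ["".join(form)]
    pos = 0  # everything before pos in form is a terminal already matched against text
    while pos < len(form):
        symbol = form[pos]
        if symbol in NONTERMINALS:
            # form[pos] is the leftmost nonterminal; text[pos] is the next input symbol.
            if pos >= len(text):
                return None  # no input left to guide the choice
            rule = TABLE.get((symbol, text[pos]))
            if rule is None:
                return None  # empty cell: reject
            form[pos:pos + 1] = PRODUCTIONS[rule][1]
            steps.append("".join(form))
        else:
            if pos >= len(text) or text[pos] != symbol:
                return None  # terminal does not match the input
            pos += 1
    return steps if pos == len(text) else None


print(derive("ab"))  # ['S', 'aZ', 'aB', 'ab']
print(derive("aa"))  # None
```

Because the sketch always expands the symbol at position pos, it mirrors the rule that the production is chosen for the leftmost remaining nonterminal.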

Conclusion

We've now seen the basic architecture of the LL(1) parsing table and the basic selection method for our next parse action. In the next section, we will formally introduce the LL(1) parsing algorithm.