RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Pages

Basic LL(1) parser construction

Background

We now understand the construction of first sets from a context-free grammar. Now, we can illustrate how to construct an LL(1) parse table using these first sets.

Extending our concept of the first set

In order to outline our table construction algorithm, we must first slightly extend our concept of first sets.

Let's say we have context-free grammar G with productions P = {p1, p2, ..., pn}.

We will define a specialized first set of production pi, FIRSTPROD(i), that captures the set of terminal symbols that can be the first derived starting with the specified production.

Specifically:

  1. If pi is of the form Ni → T sym2 sym3 ... symn for some terminal t
    1. then t ∈ FIRSTPROD(i)
  2. If pi is of the form Ni → Nfirst sym2 sym3 ... symn and T ∈ FIRST(Nfirst) for terminal t
    1. then t ∈ FIRSTPROD(i)

Example

S → a
S → b
S → C
C → c
C → D
C → E
D → d
E → e

p4 = C → c

From this, we can simply derive FIRSTPROD(4) = {c}.

p5 = C → D

We can calculate FIRST(D) as {d}. From this, we can determine that FIRSTPROD(5) = {d}. Note that this does not include all terminals from FIRST(C) - only those reachable starting with production 5.

Algorithm

This section illustrates the algorithm for constructing an LL(1) parse table.

Note

The algorithm outlined here is designed to work for CFGs with no ε productions. The handling of such productions will be outlined in a later section.

Algorithm

First, we want to initialize our parse table PT as an empty table with:

  1. One row for each nonterminal that appears in the CFG
  2. One column for each terminal that appears in the CFG
    1. Later on, we will also be adding a column for $, the end of string symbol, though it's not required yet

Then, we can populate the table as follows from our grammar G:

For each production pi = n → sym1, sym2, ..., symn in G For each terminal t ∈ FIRSTPROD(i) If PTn,t is already set Fail, as the grammar is not LL(1) Set PTn,t = i

By running through this process, we can construct the relevant LL(1) parse table and subsequently use it in our parser. Note that we are still using i as a short-form for applyi in our description of the parse table entries.

Parse table construction example

Example

Given the grammar:

S → A
A → B d
A → D d
B → b
D → d
D → e

We can start by calculating our first sets as follows:

  1. FIRST(S) = {b, d, e}
  2. FIRST(A) = {b, d, e}
  3. FIRST(B) = {b}
  4. FIRST(D) = {d, e}

Then, we can calculate the first sets for productions as follows:

  1. FIRSTPROD(1) = FIRST(A) = {b, d, e}
  2. FIRSTPROD(2) = FIRST(B) = {b}
  3. FIRSTPROD(3) = FIRST(D) = {d, e}
  4. FIRSTPROD(4) = {b}
  5. FIRSTPROD(5) = {d}
  6. FIRSTPROD(6) = {e}

From this information, we can simply fill in the parse table as follows:

b d e
S 1 1 1
A 2 3 3
B 4
D 5 6

Example

Observe the following grammar:

S → ab
S → a

Note how both productions' right-hand side starts with the terminal a.

Constructing the parse table, we end up with:

a
S 1/2

From this table, we clearly have a conflict. If the next nonterminal is S and our next input symbol is a, it's not clear whether we take the action of apply1 or apply2.

Due to this ambiguity, we know that the provided grammar is not LL(1). As a result, we would not be able to construct an LL(1) parser for it. Note that the 1/2 notation is simply used to illustrate the conflict - this parse table is not valid.

Conclusion

Now, we have the tools to construct an LL(1) parse table for any LL(1) grammar that does not contain the symbol ε. Next, we will introduce the underlying concepts needed to handle ε transitions.


GitHub Repository: https://github.com/bprollinson/ripal

Copyright © 2017 Brendan Rollinson-Lorimer