RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Pages

ε handling in LL(1) parser construction

Background

We now have a basic understanding of how to construct an LL(1) parser as well as a good understanding of the calculations for first and follow sets. In this section, we will outline the process for handling ε transitions in LL(1) parse table construction.

Motivating examples

Example

Observe the following grammar:

S → A B
A → ε
B → b

Using our LL(1) parse table construction rules as outlined so far, our parse table would be:

b
S
A
B 3

Clearly, we have a problem here because the productions S → A B and A → ε are not captured anywhere in the parse table.

Further examining the grammar, when observing the production S → A B, we know to apply that production followed by the production A → ε if the first symbol in our input queue is b. This can be captured with the following parse table which includes additional entries to handle this:

b
S 1
A 2
B 3

Example

Observe the following grammar:

S → A
A → B
B → C
C → ε

Using our LL(1) parse table construction rules as outlined so far, we would have an empty parse table. However, this grammar is LL(1) and can be recognized using the following parse table:

$
S 1
A 2
B 3
C 4

Epsilon handling in first sets

Further observing the grammar:

S → A B
A → ε
B → b

we would like our definition of FIRST(n) to be extended such that it handles ε transitions correctly.

For the production S → A B, we would like to further define FIRST(n) in such a way that FIRST(S) = {b}, since b would be our lookahead for nonterminal S when processing input string b.

Specifically, our definition of FIRST(n) should be smart enough to pass through nullable nonterminals to find the next real symbol for inclusing in our parse table.

Definition

  1. If G has a production in the form: n → n1 n2 n3 ... nn t sym1 sym2 sym3 ... symn | n1, n2 ... nn ∈ N, NULLABLE(n1), NULLABLE(n2), NULLABLE(n3), ..., NULLABLE(nn), t ∈ T
    1. then t ∈ FIRST(n)
  2. If G has a production in the form: n → n1 n2 n3 ... nn nfirst sym1 sym2 sym3 ... symn | n1, n2 ... nn ∈ N, NULLABLE(n1), NULLABLE(n2), NULLABLE(n3), ..., NULLABLE(nn), t ∈ FIRST(nfirst), t ∈ T
    1. then t ∈ FIRST(n)

Applying this logic to the production S → A B, we can further fill in our parse table as follows:

$
S 1
A
B 3

Observing the production A → ε, it's clear that no terminal first set elements can be obtained from this production, since it contains no terminals. Hence, we state that FIRST(A) = {ε}. Contrasting this with the production S → A B, we know that we only want the first set of a nonterminal to include ε if all elements in one of its productions contains only nullable nonterminals.

Definition

  1. If G has a production in the form: n → n1 n2 n3 ... nn | n1, n2 ... nn ∈ N, NULLABLE(n1), NULLABLE(n2), NULLABLE(n3), ..., NULLABLE(nn)
    1. then t ε FIRST(n)

Using the follow set

How can we handle the production A → ε? If we have the input symbol b at the beginning of our input queue, we know it's safe to use the production A → ε since B → b.

The rule for this can be formalized as follows:

Definition

  1. If pi is of the form n1 n2 n3 ... nn | n1, n2 ... nn ∈ N, NULLABLE(n1), NULLABLE(n2), NULLABLE(n3), ..., NULLABLE(nn)
    1. then ε ∈ FIRSTPROD(i)

We can use this to further extend the construction of our LL(1) parse table. In particular, we can find nullable productions and use the left-hand nonterminal's follow set to anticipate when to apply an epsilon production.

For each production pi = n → sym1, sym2, ..., symn in G For each terminal t ∈ FIRSTPROD(i) If PTn,t is already set Fail, as the grammar is not LL(1) Set PTn,t = i If ε ∈ FIRSTPROD(i) For each terminal t ∈ FOLLOW(n) If PTn,t is already set Fail, as the grammar is not LL(1) Set PTn,t = i

In our first grammar, FOLLOW(A) = {b}, so we can apply the production A → ε if the next input symbol is b. Using this rule, we arrive at our final parse table:

b
S 1
A 2
B 3

Null string example

Applying this updated algorithm to the following grammar:

S → A
A → B
B → C
C → ε

note that FOLLOW(S) = FOLLOW(A) = FOLLOW(B) = FOLLOW(C).

This results in the following parse table:

$
S 1
A 2
B 3
C 4

In a subsequent section, we will illustrate the correct process for handling end-of-string actions within the parsing process. For now, note conceptually that, when the end of string is reached, the production chain S → A → B → C → ε will occur based on the corresponding $ table entries.

Follow set conflicts

Much like when using the first set (more specifically, FIRSTPROD) to construct a parse table, the usage of the follow set can result in a conflict if the grammar is not LL(1).

The following grammar:

S → A b
S → b
A → ε

results in the following parse table:

b
S 1/2
A 3

This occurs because, for nonterminal S and next input symbol b, it's not clear which of the two following derivations should be used:

S → A b → ε b → b

S → b

This is an example of a first / follow conflict, since FIRST(S) and FOLLOW(A) both result in the same production.

Conclusion

Now, we've illustrated how to construct parse tables for all LL(1) grammars, including those with ε transitions. Next, we will illustrate some final details on applying the LL(1) parsing algorithm when encountering the end of string symbol.


GitHub Repository: https://github.com/bprollinson/ripal

Copyright © 2017 Brendan Rollinson-Lorimer