RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Pages

LL(1) ε motivations

Background

We've seen the process for constructing and using LL(1) parse tables. However, we haven't considered the handling of ε productions. This section will present some background and motivations for the handling of such transitions.

Looking ahead

Example

Observe the following CFG:

S → AB
A → a
A → ε
B → b

If we were to follow the LL(1) table construction algorithm as understood so far, we would end up with:

a b
S 1
A 2
B 4

However, if we were to parse the the string b, this table would not be sufficient, even when just considering the start symbol S. Our ideal derivation would be:

S → AB (via rule 1) → εB (via rule 3) → B (since ε represents an empty set of symbols) → b (via rule 4)

When we have start symbol S at the top of the parse stack and the symbol b at the start of the input queue, we should know to apply rule 1 by looking at this single symbol b.

Similarly, when we have symbol A at the top of the parse stack and the symbol b at the start of the input queue, we should know to apply rule 3 by looking at the single symbol b.

In fact, the following augmented parse table can be illustrated to correctly represent the specified language:

a b
S 1 1
A 2 3
B 4

This is a good illustration that the specified CFG is LL(1), since we never use more than one symbol of lookahead with this parse table.

Embracing the end

Example

Observe the following CFG:

S → A
A → B
B → C
C → ε

If we were to attempt to construct an LL(1) parse table for the above grammar using the rules we've learned so far, we would have no way to add any table entries. This happens here because FIRST(S) = FIRST(A) = FIRST(B) = FIRST(C) = {}.

However, we have a straightforward of derivation of the empty string ε as follows:

S → A → B → C → ε

If we categorize the end of string as a symbol that can be used as a lookahead (denoted as $ below), we can construct the following parse table for this language:

$
S 1
A 2
B 3
C 4

This is a good illustration that the specified CFG is LL(1), since we never use more than one symbol of lookahead (here, always $) with this parse table.

Conclusion

From the constructions above, it can be illustrated that we should be able to handle ε transitions in the construction of LL(1) parse tables. Next, we will go over some underlying constructions needed to do so.


GitHub Repository: https://github.com/bprollinson/ripal

Copyright © 2017 Brendan Rollinson-Lorimer