RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Pages

Follow sets

Background

We've seen the need to look forward through epsilon productions to obtain the information needed to construct an LL(1) parse table. This section will outline the main underlying concepts needed to do this.

First sets and nullability

Definition

For context-free grammar G, if we have a production in the form:

N → ε

then ε ∈ FIRST(N). This is an extension of our earlier definition of first sets.

We say that N is nullable. This can be formulated as:

NULLABLE(N) = true

The fundamental question

Given a nonterminal symbol n within grammar G, which set of symbols (FOLLOW(n)) can be produced directly following n?

Here, the elements of FOLLOW(n) belong to the set T ∪ {ε}.

Handling the start symbol

Example

Given the grammar

S → a

what can follow nonterminal S?

FOLLOW(S) = {$}

This is true because, once the start symbol has been handled, we should be at the end of the input string.

Definition

We will build up the definition for FOLLOW(n), the follow set of a nonterminal n. We can start with the following rule:

  1. If G has start symbol S ∈ N
    1. then $ ∈ FOLLOW(S)

Trailing terminals

Example

Given the grammar

S → A b
S → A c
A → a

what can follow the nonterminal A?

From the production S → A b, we can see that the terminal b can follow A. Similarly, from the production S → A c, we can see that the terminal c can follow A.

FOLLOW(A) = {b, c}

Definition

We will further build up the definition for FOLLOW(n) as follows:

  1. If G has a production in the form n' → n t | n' ∈ N, t ∈ T
    1. then t ∈ FOLLOW(n)

Trailing nonterminals

Example

Given the grammar

S → A B
A → a
B → b
B → c

what can follow the nonterminal A?

Since the nonterminal B always follows A and B derives the strings b and c, we can easily determine the follow set.

FOLLOW(A) = {b, c}

Definition

We will further build up the definition for FOLLOW(n) as follows:

  1. If G has a production in the form n' → n n2 | n', n2 ∈ N, t ∈ T, t ∈ FIRST(n2)
    1. then t ∈ FOLLOW(n)

Passing through ε

Example

Given the grammar

S → A B C d
A → A
B → ε
C → ε

what can follow the nonterminal A?

From the production S → A B C d, it would appear that we should include elements from FIRST(B) in our follow set. However, there are no terminals in FIRST(B), and subsequently, in FIRST(C).

Since NULLABLE(B) and NULLABLE(C), we can bypass these expressions and include the subsequent terminal d.

FOLLOW(A) = {d}

Definition

We will further build up the definition for FOLLOW(n) as follows:

  1. If G has a production in the form n' → n n2 n3 ... nn t | n', n2, n3, ..., nn ∈ N, NULLABLE(n2), NULLABLE(n3), ..., NULLABLE(nn), t ∈ T
    1. then t ∈ FOLLOW(n)

Example

Given the grammar

S → A B C D
A → A
B → ε
C → ε
D → d

what can follow the nonterminal A?

Similar to in our last example, there are no terminals in FIRST(B), and subsequently, in FIRST(C).

Since NULLABLE(B) and NULLABLE(C), we can bypass these expressions and include terminals from FIRST(D).

FOLLOW(A) = {d}

Definition

We will further build up the definition for FOLLOW(n) as follows:

  1. If G has a production in the form n' → n n2 n3 ... nn | n', n2, n3, ..., nn ∈ N, NULLABLE(n2), NULLABLE(n3), ..., NULLABLE(nn - 1), t ∈ T, t ∈ FIRST(nn)
    1. then t ∈ FOLLOW(n)

Parental inheritance

Example

Given the grammar

S → A
A → B
A → B a
B → C
C → c

what can follow nonternal C?

Observe the following two derivations starting with the start symbol S:

S → A → B → C → c

S → A → B a → C a → c a

From this, we can see that FOLLOW(C) = {$, a}.

In particular, the nonterminal C inherits the follow sets from the nonterminals that derive it.

Definition

We will further build up the definition for FOLLOW(n) as follows:

  1. If G has a production in the form n' → sym1 sym2 sym3 ... symn n | sym ∈ FOLLOW(n')
    1. then sym ∈ FOLLOW(n)

Parental inheritance through ε

Example

Given the grammar

S → A
S → A a
A → B C D
B → b
C → ε
D → ε

what can follow nonterminal B?

Observe the following two derivations starting with the start symbol S:

S → A → B C D → B ε ε → b ε ε → b

S → A a → B C D a → B ε ε a → b ε ε a → b a

From this, we can see that follow A = {$, a}.

In particular, the nonterminal C inherets the end of string follow symbol from its more direct derivation as well as the terminal follow symbol from its parent .

Definition

We will further build up the definition for FOLLOW(n) as follows:

  1. If G has a production in the form n' → sym1 sym2 sym3 ... symn n n2 n3 ... nn | n', n2 ... nn ∈ N, NULLABLE(n2), NULLABLE(n3) ... NULLABLE(nn), sym ∈ FOLLOW(n')
    1. then sym ∈ FOLLOW(n)

Conclusion

Now, we've explored the definition of the follow set. Next, we will look at using it in LL(1) parse table construction.


GitHub Repository: https://github.com/bprollinson/ripal

Copyright © 2017 Brendan Rollinson-Lorimer