RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Pages

LR(0) closure definition

Background

We've see a few examples of grammar rule set closures.

In this section, we will formally define this closure and provide some more examples of closure calculations using this definition.

Defining qualified productions

Definition

We will define a qualified production as:

Ni → symprei1 symprei2 ... sympreinpre · symposti1 symposti2 ... sympostinpre | symj ∈ N ∪ T ∪ {ε}

This matches the construct we used in our earlier examples. Typically, this production (minus ·) belongs to our language's grammar.

Closure calculation goal

In our closure calculation, we start with a set of qualified productions. Let's call this set Pqualifiedinitial. These productions take the form illustated above.

At the end of our closure calculation, we also end up with a set of qualified productions. Let's call this set Pqualifiedfinal. As a result, we can represent this mapping as:

Pqualifiedfinal = closure(G, Pqualifiedinitial)

Here, G is our language's grammar.

We know that Pqualifiedinitial ⊆ Pqualifiedfinal, since we start a closure calcuation with our initial qualified production set and extend it through the closure calulation process.

Defining the closure calculation

Given a set of qualified productions Pqualifiedinitial and our context-free grammr G, we can define the result of our closure calculation (i.e. our production closure) Pqualifiedfinal with the following three rules:

  1. If some qualified production Pi ∈ Pqualifiedinitial then Pi ∈ Pqualifiedfinal
  2. If
    1. some qualified production Pcurrent = Ncurrent → symprei1 symprei2 ... sympreinpre · Nnext symposti2 ... sympostinpre ∈ Pqualifiedfinal AND
    2. some production Nnext → symnext1 symnext2 ... symnextn ∈ G
    then Pnext = Nnext → · symnext1 symnext2 ... symnextn ∈ Pqualifiedfinal
  3. If some qualified production Pcurrent = Ncurrent → · ε ∈ Pqualifiedfinal then Pnext = Nnext → ε · ∈ Pqualifiedfinal

Intuitively, the first rule simply states that our initial qualified rules are also part of our final production closure.

The second rule describes the main mechanism for expanding the closure. This is the main mechanism we used in previous examples to allow our goto / reduce actions to chain through multiple productions in our language's grammar.

The third rule allows us to extend our closures past ε symbols since they do not represent an actual terminal symbol.

Simple example

Example

Given the augmented grammar

S' → S $
S → A C
A → a
A → B
B → b

and initial set of qualified production rules Pqualifiedinitial

{S' → · S $}

let's calculate Pqualifiedfinal.

From rule 1, we can initialize Pqualfiedfinal = Pqualifiedinitial.

Pqualifiedfinal = {S' → · S $}

From rule 2, since Pqualifiedfinal contains rule S' → · S $ and G contains production S → A C, we can add qualified production S → · A C to Pqualifiedfinal.

Pqualifiedfinal = {S' → · S $, S → · A C}

From rule 2, since Pqualifiedfinal contains rule S → · A B and G contains productions A → a and A → B, we can apply the same process and extend Pqualifiedfinal to add the qualified versions of these two rules.

Pqualifiedfinal = {S' → · S $, S → · A C, A → · a, A → · B}

Finally, we can complete the closure calculation by applying this same expansion process using qualified production A → · B and grammr rule B → b.

Pqualifiedfinal = {S' → · S $, S → · A C, A → · a, A → · B, B → · b}

From this set of qualified productions, there is nothing more from our grammar that we can add, so we have our closure.

Cycle example

Example

Given the augmented grammar

S' → S $
S → A B
A → a
B → C B
B → b
C → B
C → c

and initial set of qualified production rules Pqualifiedinitial

{B → · C B}

let's calculate Pqualifiedfinal.

From rule 1, we can initialize Pqualifiedfinal = Pqualifiedinitial.

Pqualifiedfinal = {B → · C B}

From rule 2, since Pqualifiedfinal contains rule B → · C B and G contains productions C → B and C → c, we can extend our closure as follows:

Pqualifiedfinal = {B → · C B, C → · B, C → · c}

Finally, we should look at expanding the closure via the qualified production C → · B. From this qualified production, rule 2 allows us to expand the closure via the following set of productions with nonterminal B on the left-hand side:

  1. B → C B
  2. B → b

Here, note how our closure already contains qualified production B → · C B. Since a closure is a set, we don't track the same element multiple times. Intuitively, this makes sense since we can already apply the reduction chain for productions already covered within our parser state. Our final closure set is:

Pqualifiedfinal = {B → · C B, C → · B, C → · c, B → · b}

ε example

Example

Given the augmented grammar:

S' → S $
S → A
A → B
B → b
B → ε

and initial set of qualified rules Pqualifiedinitial

{S' → · S $}

let's calculate Pqualifiedfinal.

From rule 1, we can initialize Pqualifiedfinal = Pqualifiedinitial.

Pqualifiedfinal = {S' → · S $}

From rule 2, since Pqualifiedfinal contains rule S' → · S $ and G contains production S → A, we can extend our closure as follows:

{S' → · S $, S → · A}

From rule 2, since Pqualifiedfinal contains rule S → · A and G contains production A → B, we can extend our closure as follows:

{S' → · S $, S → · A, A → · B}

From rule 2, since Pqualifiedfinal contains rule A → · B and G contains productions B → b and B → ε, we can extend our closure as follows:

{S' → · S $, S → · A, A → · B, B → · b, B → · ε}

From rule 3, since Pqualifiedfinal contains rule B → · ε, we can create a new qualified production rule with the dot shifted past the ε symbol. Our final closure set is:

{S' → · S $, S → · A, A → · B, B → · b, B → · ε, B → ε ·}

Observations

There are a few observations we can immediately make from this process. In particular:

  1. When adding a production starting with a terminal symbol on the right-hand side, there are no subsequent additions to be made
  2. When adding a production starting with a nonterminal symbol on the right-hand side, there may be subsequent additions to be made, and the process may be chained across multiple productions
  3. When adding a qualified production that already exists as part of the closure set, the chaining process ends (at least for that one production chain)
  4. The ε symbol is not a terminal symbol, and does not require a separate state for processing as part of reducing via a production rule

Conclusion

We've now seen the formal definition of the LR(0) closure calculation, which is relevant to parser construction.

Next, we will look at expanding this concept to handle the construction of a state machine which can subsequently be used to generate our LR(0) parse table.


GitHub Repository: https://github.com/bprollinson/ripal

Copyright © 2017 Brendan Rollinson-Lorimer