RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Pages

LR(0) complex closure calculation example

Background

We've seen a very basic LR(0) production closure calculation.

In this section, we will illustrate a more complex closure calculation.

Motivating example

Example

Observe the following grammar:

S → A
A → B
B → C D
C → c
D → d

This grammar has the following augmented grammar:

S' → S
S → A
A → B
B → C D
C → c
D → d

It has the following LR(0) parse table:

c d $ S A B C D S'
state1 shift2 goto8 goto7 goto6 goto3
state2 reduce5 reduce5 reduce5
state3 shift4 goto5
state4 reduce6 reduce6 reduce6
state5 reduce4 reduce4 reduce4
state6 reduce3 reduce3 reduce3
state7 reduce2 reduce2 reduce2
state8 accept

Parsing input string cd:

Input queue Parse stack Action
c d 1 Apply action of shift2 which corresponds to state1 and c in our parse table
d 1 c 2 Apply action of reduce5 which corresponds to state2 and d in our parse table
d 1 C Apply action of goto3 which corresponds to state1 and C in our parse table
d 1 C 3 Apply action of shift4 which corresponds to state3 and d in our parse table
1 C 3 d 4 Apply action of reduce6 which corresponds to state4 and $ in our parse table
1 C 3 D Apply action of goto5 which corresponds to state3 and D in our parse table
1 C 3 D 5 Apply action of reduce4 which corresponds to state5 and $ in our parse table
1 B Apply action of goto6 which corresponds to state1 and B in our parse table
1 B 6 Apply action of reduce3 which corresponds to state6 and $ in our parse table
1 A Apply action of goto7 which correspoinds to state1 and A in our parse table
1 A 7 Apply action of reduce2 which corresponds to state7 and $ in our parse table
1 S Apply action of goto8 which corresponds to state1 and S in our parse table
1 S 8 Accept, since this action corresponds to state8 and $ in our parse table

Intuitive observations

Similar to in the previous example, we can see here how the parsing procedure is rooted in state1. In particular, this can be seen with the following parse stack states.

  1. 1 C
  2. 1 B
  3. 1 A
  4. 1 S

However, before our reduction to nonterminal B, we must first arrive in the following parse stack state:

1 C 3 D 5

Because the production B → C D has two symbols on the right-hand side, we end up pushing two states onto the parse stack before finally reducing to:

1 B

In short, in order to process this production, we leave the previously displayed paradigm of an alternating sequence of goto and reduce actions. This happens because we must first process C before D. Also, C and D both produce a terminal, which requires input.

However, once we reduce to B, this chaining process can proceed. We are able to do this easily because in the reduction to B, we discard all of the additional state information (states 3 and 5) from the parse stack along with nonterminals C and D.

The fundamental question

Using our observations here, we can make our fundamental question more flexible in comparison to the previous example as follows:

Given a qualified production as follows:

N → sympre1 sympre2 ... symprenpre · Nnext sympost1 sympost2 ... sympostnpost

The fundamental question to assess in this context is:

How far can we progress through our productions without processing any input symbols OR with shifting and reducing input symbols such that we arrive back in the initial parse state?

The fundamental answer - start position

The direct chain

Starting with the qualified production:

S' → · S $

we can see from above that, after reducing to state S, our parse stack contains:

1 S

This state of the parse stack is reached through our typical chaining of goto and reduce actions with the following sequence of stack states:

  1. 1 B
  2. 1 B 6
  3. 1 A
  4. 1 A 7
  5. 1 S

These states of parse stack content are reachable without processing any additional input symbols. As a result, we add the following qualified productions to the closure of S' → · S:

  1. S → · A
  2. A → · B

Notice how all such nonterminals are reachable with only state1 below it on the parse stack.

The indirect chain

Next, let's consider the following sequence of stack states:

  1. 1 C
  2. 1 C 3
  3. 1 C 3 d 4
  4. 1 C 3 D
  5. 1 C 3 D 5
  6. 1 B

Note here how 1 C is eventually reduced to 1 B. Although additional nonterminals and terminals are processed in between those two stack states, all of that information is removed from the stack when 1 C 3 D 5 is reduced to 1 B.

This is a form of indirect chaining. Although additional steps are taken in between stack states 1 C and 1 B, we do eventually arrive back in state 1.

As a result, we add the following qualified production to our closure:

B → · C D

Even though the nonterminal D was not considered as part of this additional closure production, this closure extension can take place. Nonterminal D will be handled in a subsequent parse state which is resolved before the stack state of 1 B is reached.

The terminal production

Finally, because we have the production:

C → c

in our grammar, we add the following to our closure:

C → · c

since no additional symbols are needed to reach this point in the parsing process.

The closure

In conclusion, the closure of the qualified production set:

{S → · S $}

is:

{S → · S $, S → · A, A → · B C, C → · c}

The fundamental answer - subsequent position

Starting with the qualified production:

B → C · D

we know that nonterminal D appears on the left-hand side of the following production:

D → d

Before processing this production, we are in the following qualified production state:

D → · d

As a result, we can say that the closure of:

{B → C · D}

is

{B → C · D, D → · d}

Conclusion

We've seen a couple of examples of qualified production closure calculations.

Next, we will inspect a closure calculation for a production chain containing an ε production.


GitHub Repository: https://github.com/bprollinson/ripal

Copyright © 2017 Brendan Rollinson-Lorimer