RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Background

We've seen a very basic LR(0) production closure calculation.

In this section, we will illustrate a more complex closure calculation.

Motivating example

Example

Observe the following grammar:

S → A
A → B
B → C D
C → c
D → d

This grammar has the following augmented grammar:

S' → S
S → A
A → B
B → C D
C → c
D → d

It has the following LR(0) parse table:

	c	d	$	S	A	B	C	D	S'
state₁	shift₂			goto₈	goto₇	goto₆	goto₃
state₂	reduce₅	reduce₅	reduce₅
state₃		shift₄						goto₅
state₄	reduce₆	reduce₆	reduce₆
state₅	reduce₄	reduce₄	reduce₄
state₆	reduce₃	reduce₃	reduce₃
state₇	reduce₂	reduce₂	reduce₂
state₈			accept

Parsing input string cd:

Input queue	Parse stack	Action
c d	1	Apply action of shift₂ which corresponds to state₁ and c in our parse table
d	1 c 2	Apply action of reduce₅ which corresponds to state₂ and d in our parse table
d	1 C	Apply action of goto₃ which corresponds to state₁ and C in our parse table
d	1 C 3	Apply action of shift₄ which corresponds to state₃ and d in our parse table
	1 C 3 d 4	Apply action of reduce₆ which corresponds to state₄ and $ in our parse table
	1 C 3 D	Apply action of goto₅ which corresponds to state₃ and D in our parse table
	1 C 3 D 5	Apply action of reduce₄ which corresponds to state₅ and $ in our parse table
	1 B	Apply action of goto₆ which corresponds to state₁ and B in our parse table
	1 B 6	Apply action of reduce₃ which corresponds to state₆ and $ in our parse table
	1 A	Apply action of goto₇ which correspoinds to state₁ and A in our parse table
	1 A 7	Apply action of reduce₂ which corresponds to state₇ and $ in our parse table
	1 S	Apply action of goto₈ which corresponds to state₁ and S in our parse table
	1 S 8	Accept, since this action corresponds to state₈ and $ in our parse table

Intuitive observations

Similar to in the previous example, we can see here how the parsing procedure is rooted in state₁. In particular, this can be seen with the following parse stack states.

1 C
1 B
1 A
1 S

However, before our reduction to nonterminal B, we must first arrive in the following parse stack state:

1 C 3 D 5

Because the production B → C D has two symbols on the right-hand side, we end up pushing two states onto the parse stack before finally reducing to:

1 B

In short, in order to process this production, we leave the previously displayed paradigm of an alternating sequence of goto and reduce actions. This happens because we must first process C before D. Also, C and D both produce a terminal, which requires input.

However, once we reduce to B, this chaining process can proceed. We are able to do this easily because in the reduction to B, we discard all of the additional state information (states 3 and 5) from the parse stack along with nonterminals C and D.

The fundamental question

Using our observations here, we can make our fundamental question more flexible in comparison to the previous example as follows:

Given a qualified production as follows:

N → sym_pre₁ sym_pre₂ ... sym_{pre_{n_pre}} · N_next sym_post₁ sym_post₂ ... sym_{post_{n_post}}

The fundamental question to assess in this context is:

How far can we progress through our productions without processing any input symbols OR with shifting and reducing input symbols such that we arrive back in the initial parse state?

The fundamental answer - start position

The direct chain

Starting with the qualified production:

S' → · S $

we can see from above that, after reducing to state S, our parse stack contains:

1 S

This state of the parse stack is reached through our typical chaining of goto and reduce actions with the following sequence of stack states:

1 B
1 B 6
1 A
1 A 7
1 S

These states of parse stack content are reachable without processing any additional input symbols. As a result, we add the following qualified productions to the closure of S' → · S:

S → · A
A → · B

Notice how all such nonterminals are reachable with only state₁ below it on the parse stack.

The indirect chain

Next, let's consider the following sequence of stack states:

1 C
1 C 3
1 C 3 d 4
1 C 3 D
1 C 3 D 5
1 B

Note here how 1 C is eventually reduced to 1 B. Although additional nonterminals and terminals are processed in between those two stack states, all of that information is removed from the stack when 1 C 3 D 5 is reduced to 1 B.

This is a form of indirect chaining. Although additional steps are taken in between stack states 1 C and 1 B, we do eventually arrive back in state 1.

As a result, we add the following qualified production to our closure:

B → · C D

Even though the nonterminal D was not considered as part of this additional closure production, this closure extension can take place. Nonterminal D will be handled in a subsequent parse state which is resolved before the stack state of 1 B is reached.

The terminal production

Finally, because we have the production:

C → c

in our grammar, we add the following to our closure:

C → · c

since no additional symbols are needed to reach this point in the parsing process.

The closure

In conclusion, the closure of the qualified production set:

{S → · S $}

is:

{S → · S $, S → · A, A → · B C, C → · c}

The fundamental answer - subsequent position

Starting with the qualified production:

B → C · D

we know that nonterminal D appears on the left-hand side of the following production:

D → d

Before processing this production, we are in the following qualified production state:

D → · d

As a result, we can say that the closure of:

{B → C · D}

is

{B → C · D, D → · d}

Conclusion

We've seen a couple of examples of qualified production closure calculations.

Next, we will inspect a closure calculation for a production chain containing an ε production.