RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Background

In a previous section, we illustrated the method for building the LR(0) state machine for a grammar producing a nonterminal symbol in one of its productions.

In this section, we will expand this construction to illustrate the process for constructing an LR(0) state machine for a grammar producing multiple nonterminal symbols in a single production.

Our grammar

The language we will be analyzing is specified by the following grammar:

S → A B
A → a
B → b

This grammar has the following augmented grammar:

S' → S $
S → A B
A → a
B → b

Our parse table

This grammar has the following LR(0) parse table:

	a	b	$	S	A	B	S'
state₁	shift₅			goto₆	goto₂
state₂		shift₃				goto₄
state₃	reduce₄	reduce₄	reduce₄
state₄	reduce₂	reduce₂	reduce₂
state₅	reduce₃	reduce₃	reduce₃
state₆			accept

Parsing input string ab:

Input queue	Parse stack	Action
ab	1	Apply action of shift₅, which corresponds to state₁ and a in our parse table
b	1 a 5	Apply action of reduce₃, which corresponds to state₅ and b in our parse table
b	1 A	Apply action of goto₂, which corresponds to state₁ and A in our parse table
b	1 A 2	Apply action of shift₃, which corresponds to state₂ and b in our parse table
	1 A 2 b 3	Apply action of reduce₄, which corresponds to state₃ and $ in our parse table
	1 A 2 B	Apply action of goto₄, which corresponds to state₂ and B in our parse table
	1 A 2 B 4	Apply action of reduce₂, which corresponds to state₄ and $ in our parse table
	1 S	Apply action of goto₆, which corresponds to state₁ and S in our parse table
	1 S 6	Accept, since this action corresponds to state₆ and $ in our parse table

Initial state

As per our standard process, we will construct our initial qualified production rule.

Our initial set of qualified productions is:

{S' → · S $}

Calculating the closure set, we end up with:

{S' → · S $, S → · A B, A → · a}

This is as far as we can go, since we would need to process nonterminal A or terminal a to proceed further.

As a result, our initial parse state is:

DFA state 1. This state contains the following productions: S prime produces dot S dollar, S produces dot A B, A produces dot a.

Handling terminal a

Observe the following sequence of parse stack states encountered in the parsing example above:

1
1 a 5
1 A

Here, we first shift a onto our parse stack and move into state₅. Here, state₅ represent being at the point in our parsing procedure where the application of a reduction corresponding to the production A → a can take place.

In applying this reduction, we are able to return to our initial parse state and continue the standard chaining process.

To calculate the closure for state₅, we take the following qualified production:

A → · a

We shift the dot past the terminal a that we are processing and create the following qualified production set from this action:

{A → a ·}

Since the dot is at the end of the production, this closure can't be further expanded. We end up with state₅ being defined as follows:

DFA state 5. This state contains the following production: A produces a dot.

Finally, we connect state₁ and state₅ with the appropriate transition as follows:

DFA containing two states. State 1 contains the following productions: S prime produces dot S dollar, S produces dot A B, A produces dot a. State 5 contains the following production: A produces a dot. State 1 transitions to state 5 on terminal symbol a.

Handling terminal b

Observe the following sequence of parse stack states encountered in the parsing example above:

1 A
1 A 2
1 A 2 b 3
1 A 2 B

After reducing terminal a to nonterminal A, we want to further make a reduction via the rule S → A B. However, we need to first get nonterminal B onto our parse stack to use this rule.

First, we use the goto₂ action to transition into a state where we are ready to process more input symbols.

We shift terminal b onto the parse stack and transition into state₃, in preparation for applying a reduction via the rule B → b. Then we can apply our reduction as shown in the last stack state listed above.

To calculate the closure for state₂, we take the following qualified production:

S → · A B

We shift the dot past the nonterminal A that we are processing and create the following qualified production set from this action:

{S → A · B}

Expanding via our closure calculation, we end up with a closure of:

{S → A · B, B → · b}

We end up with state₂ being defined as follows:

DFA state 2. This state contains the following productions: S produces A dot B, B produces dot b.

Finally, we connect state₁ and state₂ with the appropriate transition as follows:

DFA containing three states. State 1 contains the following productions: S prime produces dot S dollar, S produces dot A B, A produces dot a. State 5 contains the following production: A produces a dot. State 1 transitions to state 5 on terminal symbol a. State 2 contains the following productions: S produces A dot B, B produces dot b. State 1 transitions to state 2 on nonterminal symbol A.

Next, we create state₃ by taking qualified production:

B → · b

We shift the dot past the terminal b that we are processing and create the following qualified production set from this action:

{B → b ·}

Since the dot is as the end of the production, this closure can't be further expanded. We end up with state₃ being defined as follows:

DFA state 3. This state contains the following production: B produces b dot.

Finally, we connect state₂ and state₃ with the appropriate transition as follows:

DFA containing four states. State 1 contains the following productions: S prime produces dot S dollar, S produces dot A B, A produces dot a. State 5 contains the following production: A produces a dot. State 1 transitions to state 5 on terminal symbol a. State 2 contains the following productions: S produces A dot B, B produces dot b. State 1 transitions to state 2 on nonterminal symbol A. State 3 contains the following production: B produces b dot. State 2 transitions to state 3 on terminal symbol b.

Handling nonterminal S

Observe the following sequence of parse stack states encountered in the parsing example above:

1 A 2 B
1 A 2 B 4
1 S

After reducing terminal b to nonterminal B, we arrive back in state₂.

From here, we use our standard goto action corresponding to nonterminal processing and transition into state₄.

To calculate the closure for state₄, we start with the qualified production:

S → A · B

We shift the dot past the nonterminal B that we are processing and create the following qualified production set from this action:

{S → A B ·}

Since the dot is at the end of the production, this closure can't be further expanded. We end up with state₄ being defined as follows:

DFA state 4. This state contains the following production: S produces A B dot.

Finally, we connect state₂ and state₄ with the appropriate transition as follows:

DFA containing five states. State 1 contains the following productions: S prime produces dot S dollar, S produces dot A B, A produces dot a. State 5 contains the following production: A produces a dot. State 1 transitions to state 5 on terminal symbol a. State 2 contains the following productions: S produces A dot B, B produces dot b. State 1 transitions to state 2 on nonterminal symbol A. State 3 contains the following production: B produces b dot. State 2 transitions to state 3 on terminal symbol b. State 4 contains the following production: S produces A B dot. State 2 transitions to state 4 on nonterminal symbol B.

At this point, we are able to reduce nonterminals A and B to nonterminal S via the production S → A B, discarding the state information attached to those nonterminals.

Handling end of string

After reducing A B to S, from state₁, we want to process the end of string according to our standard approach.

Extending our DFA, we end up with a final diagram of:

DFA containing seven states. State 1 contains the following productions: S prime produces dot S dollar, S produces dot A B, A produces dot a. State 5 contains the following production: A produces a dot. State 1 transitions to state 5 on terminal symbol a. State 2 contains the following productions: S produces A dot B, B produces dot b. State 1 transitions to state 2 on nonterminal symbol A. State 3 contains the following production: B produces b dot. State 2 transitions to state 3 on terminal symbol b. State 4 contains the following production: S produces A B dot. State 2 transitions to state 4 on nonterminal symbol B. State 6 contains the following production: S prime produces S dot $. State 1 transitions to state 6 on nonterminal symbol S. State 7 contains the following production: S prime produces S $ dot. State 6 transitions to state 7 on end of string symbol $. State 7 is an accepting state.

Conclusion

We've seen the LR(0) DFA construction process for a grammar containing a multi-nonterminal production.

In the next section, we will start to explore the process of converting an LR(0) DFA into a parse table.

RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Pages

LR(0) multi-nonterminal production state machine

Background

Our grammar

Our parse table

Initial state

Handling terminal a

Handling terminal b

Handling nonterminal S

Handling end of string

Conclusion