RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Pages

The LR(0) parse table

Background

We've now seen LR(0) states and parsing actions at a conceptual level. In this section, we will introduce the simplest bottom-up parsing algorithm using an LR(0) parser.

Anatomy of an LR(0) parse table

The following illustrates the setup of an LR(0) parse table.

For Σ = {σ1, σ2, ..., σn}, N = {n1, n2, ..., nn}, T = {t1, t2, ..., tn}, States = {state1, state2, ..., staten}, our table looks like:

sym1 sym2 ... symn
state1 an1,t1 an1,t2 ... an1,tn
state2 an2,t1 an2,t2 ... an2,tn
...
staten ann,t1 ann,t2 ... ann,tn

Here, ti ∈ Σ ∪ $. symi ∈ T ∪ N ∪ $. ani,tj ∈ A ∪ ø.

A = {shift} ∪ Reduce ∪ {accept} ∪ Goto, representing all of the possible actions in our parse table.

Here, shift represents shifting the first (terminal) symbol from the input queue onto the top of the stack.

Reduce = {reducep1, reducep2>, ..., reducepn}, where reducepi represents the replacement of the right-hand side of production pi with its left-hand nonterminal on the top of the parse stack.

accept represent accepting the input string as part of the specified language.

Goto = {gotostate1, gotostate2, ..., gotostaten}, where gotostatei represents changing the parser start to the specified state.

The fundamental question

The fundamental question an LR(0) parse table attempts to answer is:

Given parse stack {stack1, stack2, ..., stackn} | stacki ∈ T ∪ N ∪ State and input queue sremaining, what action should we apply to our parse stack and / or input queue?

The fundamental answer

Let's represent our parse stack as {stack1, stack2, ..., stackn} and the next input symbol as tnext - taken from the beginning of sremaining. Our next parse action anext can be found as follows:

First, determine statecurrent as follows:

Start with the top symbol of the parse stack and move downward until some state is encountered. This is statecurrent.

Note that, technically, digging deeper into the parse stack is not a valid stack operation, but could be implemented by having a second stack to temporarily offload stack elements to.

Next, determine symcurrent as follows:

  1. If stackn ∈ T ∪ N ∪ $ then symcurrent = stackn
  2. Otherwise, symcurrent = tnext

In short, we determine the next terminal symbol to be handled by checking the top of the parse stack, then defaulting to the next input symbol if the top of the parse stack is a state and not a symbol that can be processed.

Our next parse action, anext, can be found by simply looking up the table entry corresponding to statecurrent and symcurrent. We can represent this as:

anext = astatenext, symnext

The result can be interpreted as:

  1. Reject the input string entirely if anext = ø
  2. Accept the input string if anext = accept
  3. Shift tnext from the front of the input queue and onto the top of the parse stack and go to state statei if anext = shiftstatei
  4. Reduce pi by replacing its right-hand symbols with its left-hand nonterminal on the top of the parse stack if anext = reducei
  5. Goto statei by placing it on top of the parse stack if anext = gotoi

Reject and accept actions

The reject and accept actions are intuitive - they simply reject or accept the input string.

Example shift action

Example

If we have parse stack

(top of stack) state1

and input queue

(head of queue) a b

then an action of shift2 will result in a parse stack of

(top of stack) state2 a state1

and input queue

(head of queue) b

In an action of shiftstatei, we:

  1. shift tnext onto the top of the parse stack and
  2. push statei onto the top of the parse stack

Note that, after this shift operation, we are in a new parse state of statei.

Example reduce action

Example

If we have augmented grammar

S' → S $
S → a b

and parse stack

(top of stack) state3 b state2 a state1

then an action of reduce2 will result in a parse stack of

(top of stack) S

In an action of reducepi, we:

  1. remove 2 * n elements from the top of the parse stack, where n is the number of elements in the right-hand side of pi
  2. push the left-hand nonterminal symbol n from pi onto the top of the parse stack

After this action, we have reverted to the previous parse state that represents where the parser was before handling the input symbols from the right-hand side of pi. We remove 2 * n elements from the top of the parse stack to handle those right-hand symbols, plus clear out one state per such symbol.

Example goto action

Example

If we have parse stack

(top of stack) S

then an action of gotostate1 will result in a parse stack of

(top of stack) S state1

In an action of gotostatei, we:

  1. push statei onto the top of the parse stack

After this action, we are in a new parse state and can continue on through parsing our string. Note that this action is useful after a reduce action, since a nonterminal symbol and not a parse state will be on the top of the stack at that point.

Conclusion

You've now seen all of the fundamental LR(0) parse actions and the nature of an LR(0) parse table.

Since these operation are significantly more complex than those used in LL(1) parsing, it's not as easy to intuitively understand them. These operations will be further clarified through some fundamental parsing examples in subsequent sections.


GitHub Repository: https://github.com/bprollinson/ripal

Copyright © 2017 Brendan Rollinson-Lorimer