RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Background

We've now seen LR(0) states and parsing actions at a conceptual level. In this section, we will introduce the simplest bottom-up parsing algorithm using an LR(0) parser.

Anatomy of an LR(0) parse table

The following illustrates the setup of an LR(0) parse table.

For Σ = {σ₁, σ₂, ..., σ_n}, N = {n₁, n₂, ..., n_n}, T = {t₁, t₂, ..., t_n}, States = {state₁, state₂, ..., state_n}, our table looks like:

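	sym₁	sym₂	...	sym_n
state₁	a_{state₁,sym₁}	a_{state₁,sym₂}	...	a_{state₁,sym_n}
state₂	a_{state₂,sym₁}	a_{state₂,sym₂}	...	a_{state₂,sym_n}
...
state_n	a_{state_n,sym₁}	a_{state_n,sym₂}	...	a_{state_n,sym_n}

Here, t_i ∈ Σ ∪ {$}, sym_j ∈ T ∪ N ∪ {$}, and each entry a_{state_i,sym_j} ∈ A ∪ {ø}, where ø denotes an empty entry.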

A = Shift ∪ Reduce ∪ {accept} ∪ Goto, representing all of the possible actions in our parse table.

Shift = {shift_state₁, shift_state₂, ..., shift_{state_n}}, where shift_{state_i} represents shifting the first (terminal) symbol from the input queue onto the top of the parse stack and then entering state_i.

Reduce = {reduce_p₁, reduce_p₂, ..., reduce_{p_n}}, where reduce_{p_i} represents the replacement of the right-hand side of production p_i with its left-hand nonterminal on the top of the parse stack.

accept represents accepting the input string as part of the specified language.

Goto = {goto_state₁, goto_state₂, ..., goto_{state_n}}, where goto_{state_i} represents changing the parser state to the specified state.
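One way to picture the action set A and the table is as plain data. The sketch below is only an illustration: the class names, the use of integers for states, and the sample entries are assumptions rather than anything defined on this page. A missing dictionary key stands in for the empty entry ø.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Shift:
    state: int        # state to enter after shifting (shift_{state_i})


@dataclass(frozen=True)
class Reduce:
    production: int   # index i of the production p_i to reduce by


@dataclass(frozen=True)
class Accept:
    pass


@dataclass(frozen=True)
class Goto:
    state: int        # state to push (goto_{state_i})


Action = Union[Shift, Reduce, Accept, Goto]

# Sparse table keyed by (state, symbol); absent keys mean "empty entry".
ParseTable = Dict[Tuple[int, str], Action]

# Hypothetical sample entries, for illustration only.
table: ParseTable = {
    (1, "a"): Shift(2),
    (2, "b"): Shift(3),
}


def lookup(table: ParseTable, state: int, sym: str) -> Optional[Action]:
    """Return the table entry for (state, sym), or None for an empty entry."""
    return table.get((state, sym))
```

Storing only the filled cells keeps the table small, since most cells of a real parse table are empty.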

The fundamental question

The fundamental question an LR(0) parse table attempts to answer is:

Given parse stack {stack₁, stack₂, ..., stack_n}, where each stack_i ∈ T ∪ N ∪ States, and input queue s_remaining, what action should we apply to our parse stack and / or input queue?

The fundamental answer

Let's represent our parse stack as {stack₁, stack₂, ..., stack_n} and the next input symbol as t_next - taken from the beginning of s_remaining. Our next parse action a_next can be found as follows:

First, determine state_current as follows:

Start with the top symbol of the parse stack and move downward until some state is encountered. This is state_current.

Note that, technically, digging deeper into the parse stack is not a valid stack operation, but could be implemented by having a second stack to temporarily offload stack elements to.

Next, determine sym_current as follows:

If stack_n ∈ T ∪ N ∪ $ then sym_current = stack_n
Otherwise, sym_current = t_next

In short, we determine the next symbol to be handled by checking the top of the parse stack, then defaulting to the next input symbol if the top of the parse stack is a state and not a symbol that can be processed.

Our next parse action, a_next, can be found by simply looking up the table entry corresponding to state_current and sym_current. We can represent this as:

a_next = a_{state_current, sym_current}
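The lookup just described can be sketched in a few lines. This is a minimal illustration under several assumptions that are not part of the page: the parse stack is a Python list whose last element is the top, states are integers, grammar symbols are strings, and the input queue always ends with "$". The function names are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple, Union

StackItem = Union[int, str]   # int = parser state, str = grammar symbol


def current_state(stack: List[StackItem]) -> int:
    """Scan from the top of the stack downward until a state is found."""
    for item in reversed(stack):
        if isinstance(item, int):
            return item
    raise ValueError("no state on the parse stack")


def current_symbol(stack: List[StackItem], queue: Sequence[str]) -> str:
    """Use the top of the stack if it is a symbol, else the next input symbol."""
    top = stack[-1]
    if isinstance(top, str):
        return top
    return queue[0]


def next_action(
    table: Dict[Tuple[int, str], Hashable],
    stack: List[StackItem],
    queue: Sequence[str],
) -> Optional[Hashable]:
    """Look up the entry for (state_current, sym_current); None means empty."""
    return table.get((current_state(stack), current_symbol(stack, queue)))
```

Because a Python list can be read at any index, the scan in current_state does not need the second temporary stack mentioned above.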

The result can be interpreted as:

Reject the input string entirely if a_next = ø
Accept the input string if a_next = accept
Shift t_next from the front of the input queue onto the top of the parse stack and enter state_i if a_next = shift_{state_i}
Reduce p_i by replacing its right-hand symbols with its left-hand nonterminal on the top of the parse stack if a_next = reduce_{p_i}
Goto state_i by placing it on top of the parse stack if a_next = goto_{state_i}
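Putting the five outcomes together gives a small driver loop. The sketch below is an assumption-heavy illustration, not a table generated by any algorithm on this page: the state numbers, the TABLE entries, and the choice to accept on "$" in state 4 are all hypothetical, chosen so that the grammar S' → S $, S → a b parses the string "a b".

```python
from collections import deque

# Production 2 is S -> a b (left-hand side, right-hand side).
PRODUCTIONS = {2: ("S", ("a", "b"))}

# Hypothetical hand-written table; a missing key is the empty entry.
TABLE = {
    (1, "a"): ("shift", 2),
    (2, "b"): ("shift", 3),
    (3, "a"): ("reduce", 2),   # LR(0): reduce regardless of the next symbol
    (3, "b"): ("reduce", 2),
    (3, "$"): ("reduce", 2),
    (1, "S"): ("goto", 4),
    (4, "$"): ("accept", None),
}


def parse(tokens):
    """Return True if tokens are accepted, False if they are rejected."""
    stack = [1]                              # start in state 1
    queue = deque(list(tokens) + ["$"])      # "$" marks the end of input
    while True:
        # state_current: first state found scanning down from the top
        state = next(x for x in reversed(stack) if isinstance(x, int))
        # sym_current: top of stack if it is a symbol, else next input
        top = stack[-1]
        sym = top if isinstance(top, str) else queue[0]
        action = TABLE.get((state, sym))
        if action is None:
            return False                     # empty entry: reject
        kind, arg = action
        if kind == "accept":
            return True
        if kind == "shift":
            stack.append(queue.popleft())    # move the input symbol
            stack.append(arg)                # then enter the new state
        elif kind == "reduce":
            lhs, rhs = PRODUCTIONS[arg]
            if rhs:
                del stack[-2 * len(rhs):]    # one symbol + one state each
            stack.append(lhs)
        else:                                # goto
            stack.append(arg)
```

Calling parse(["a", "b"]) walks through two shifts, one reduce, one goto, and then accepts; any other input reaches an empty entry and is rejected.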

Reject and accept actions

The reject and accept actions are intuitive - they simply reject or accept the input string.

Example shift action

Example

If we have parse stack

(top of stack)

state₁

and input queue

(head of queue)

a

b

then an action of shift_{state₂} will result in a parse stack of

(top of stack)

state₂

a

state₁

and input queue

(head of queue)

b

In an action of shift_{state_i}, we:

shift t_next onto the top of the parse stack and
push state_i onto the top of the parse stack

Note that, after this shift operation, we are in a new parse state of state_i.
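The two steps of a shift can be sketched as follows. The representation is an assumption made for illustration: the parse stack is a list whose last element is the top, the input queue is a collections.deque, and states are integers. The function name is hypothetical.

```python
from collections import deque


def shift(stack, queue, state):
    """Apply shift_{state_i}: move t_next onto the stack, then push the state."""
    stack.append(queue.popleft())   # t_next leaves the head of the queue
    stack.append(state)             # the parser is now in the new state


# Mirrors the example above: stack holds state 1, queue holds a, b.
stack = [1]
queue = deque(["a", "b"])
shift(stack, queue, 2)
```

After the call, stack reads 1, a, 2 from bottom to top and the queue holds only b, matching the example.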

Example reduce action

Example

If we have augmented grammar

S' → S $
S → a b

and parse stack

(top of stack)

state₃

b

state₂

a

state₁

then an action of reduce_{p₂} (reducing by S → a b) will result in a parse stack of

(top of stack)

S

state₁

In an action of reduce_{p_i}, we:

remove 2 * k elements from the top of the parse stack, where k is the number of symbols in the right-hand side of p_i
push the left-hand nonterminal symbol of p_i onto the top of the parse stack

After this action, the topmost state on the parse stack is the one the parser was in before it handled the symbols from the right-hand side of p_i. We remove 2 * k elements because each of those k right-hand symbols sits on the stack together with one state.
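A reduce can be sketched as a slice deletion followed by a push. As before, this is an illustration under assumptions: the stack is a list with its top at the end, states are integers, and the productions dictionary and function name are hypothetical. The guard for an empty right-hand side is there because a slice starting at -0 would otherwise delete the whole list.

```python
# Productions of the augmented grammar, numbered as in the example above.
productions = {
    1: ("S'", ["S", "$"]),
    2: ("S", ["a", "b"]),
}


def reduce(stack, productions, i):
    """Apply reduce_{p_i}: pop the right-hand side, push the left-hand side."""
    lhs, rhs = productions[i]
    count = 2 * len(rhs)        # one symbol plus one state per rhs symbol
    if count:                   # stack[-0:] would be the entire list
        del stack[-count:]
    stack.append(lhs)


# Mirrors the example above, bottom to top: state 1, a, state 2, b, state 3.
stack = [1, "a", 2, "b", 3]
reduce(stack, productions, 2)
```

Under the removal rule, state 1 is left in place beneath the new S, so the stack reads 1, S from bottom to top.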

Example goto action

Example

If we have parse stack

(top of stack)

S

then an action of goto_state₁ will result in a parse stack of

(top of stack)

state₁

S

In an action of goto_{state_i}, we:

push state_i onto the top of the parse stack

After this action, we are in a new parse state and can continue on through parsing our string. Note that this action is useful after a reduce action, since a nonterminal symbol and not a parse state will be on the top of the stack at that point.
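In code, a goto is the smallest of the actions: a single push. The sketch assumes the same list-based stack as a typical Python implementation would use, with the top at the end and integer states; the function name is hypothetical.

```python
def goto(stack, state):
    """Apply goto_{state_i}: push the state so it becomes the current state."""
    stack.append(state)


# Mirrors the example above: a lone S on the stack, followed by goto state 1.
stack = ["S"]
goto(stack, 1)
```

The pushed state is now the first one found when scanning down from the top, so the next table lookup uses it as the current state.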

Conclusion

You've now seen all of the fundamental LR(0) parse actions and the nature of an LR(0) parse table.

Since these operations are significantly more complex than those used in LL(1) parsing, they are harder to understand intuitively. They will be further clarified through some fundamental parsing examples in subsequent sections.
