We've now seen LR(0) states and parsing actions at a conceptual level. In this section, we will introduce the simplest bottom-up parsing algorithm using an LR(0) parser.
The following illustrates the setup of an LR(0) parse table.
For Σ = {σ1, σ2, ..., σn}, N = {n1, n2, ..., nn}, T = {t1, t2, ..., tn}, States = {state1, state2, ..., staten}, our table looks like:
sym1 | sym2 | ... | symn | |
state1 | an1,t1 | an1,t2 | ... | an1,tn |
state2 | an2,t1 | an2,t2 | ... | an2,tn |
... | ||||
staten | ann,t1 | ann,t2 | ... | ann,tn |
Here, ti ∈ Σ ∪ $. symi ∈ T ∪ N ∪ $. ani,tj ∈ A ∪ ø.
A = {shift} ∪ Reduce ∪ {accept} ∪ Goto, representing all of the possible actions in our parse table.
Here, shift represents shifting the first (terminal) symbol from the input queue onto the top of the stack.
Reduce = {reducep1, reducep2>, ..., reducepn}, where reducepi represents the replacement of the right-hand side of production pi with its left-hand nonterminal on the top of the parse stack.
accept represent accepting the input string as part of the specified language.
Goto = {gotostate1, gotostate2, ..., gotostaten}, where gotostatei represents changing the parser start to the specified state.
The fundamental question an LR(0) parse table attempts to answer is:
Given parse stack {stack1, stack2, ..., stackn} | stacki ∈ T ∪ N ∪ State and input queue sremaining, what action should we apply to our parse stack and / or input queue?
Let's represent our parse stack as {stack1, stack2, ..., stackn} and the next input symbol as tnext - taken from the beginning of sremaining. Our next parse action anext can be found as follows:
First, determine statecurrent as follows:
Start with the top symbol of the parse stack and move downward until some state is encountered. This is statecurrent.
Note that, technically, digging deeper into the parse stack is not a valid stack operation, but could be implemented by having a second stack to temporarily offload stack elements to.
Next, determine symcurrent as follows:
In short, we determine the next terminal symbol to be handled by checking the top of the parse stack, then defaulting to the next input symbol if the top of the parse stack is a state and not a symbol that can be processed.
Our next parse action, anext, can be found by simply looking up the table entry corresponding to statecurrent and symcurrent. We can represent this as:
anext = astatenext, symnext
The result can be interpreted as:
The reject and accept actions are intuitive - they simply reject or accept the input string.
Example
If we have parse stack
(top of stack) | state1 |
and input queue
(head of queue) | a | b |
then an action of shift2 will result in a parse stack of
(top of stack) | state2 | a | state1 |
and input queue
(head of queue) | b |
In an action of shiftstatei, we:
Note that, after this shift operation, we are in a new parse state of statei.
Example
If we have augmented grammar
S' → S $
S → a b
and parse stack
(top of stack) | state3 | b | state2 | a | state1 |
then an action of reduce2 will result in a parse stack of
(top of stack) | S |
In an action of reducepi, we:
After this action, we have reverted to the previous parse state that represents where the parser was before handling the input symbols from the right-hand side of pi. We remove 2 * n elements from the top of the parse stack to handle those right-hand symbols, plus clear out one state per such symbol.
Example
If we have parse stack
(top of stack) | S |
then an action of gotostate1 will result in a parse stack of
(top of stack) | S | state1 |
In an action of gotostatei, we:
After this action, we are in a new parse state and can continue on through parsing our string. Note that this action is useful after a reduce action, since a nonterminal symbol and not a parse state will be on the top of the stack at that point.
You've now seen all of the fundamental LR(0) parse actions and the nature of an LR(0) parse table.
Since these operation are significantly more complex than those used in LL(1) parsing, it's not as easy to intuitively understand them. These operations will be further clarified through some fundamental parsing examples in subsequent sections.
GitHub Repository: https://github.com/bprollinson/ripal
Copyright © 2017 Brendan Rollinson-Lorimer