RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Background

In the last section, we saw the underlying idea behind bottom-up parsing. In this section, we will introduce the specific core constructs and processing rules used in the bottom-up LR parsing algorithm. We will keep our constructs simple at first, and begin to explore the construction of LR(0) parsers.

Setup

To apply the LR(0) parsing algorithm, we need the following data structures set up:

The LR(0) parse table

Much like in an LL(1) parser, an LR(0) parser uses a parse table to make decisions. However, the LR(0) parse table is somewhat more complex than the LL(1) equivalent.

As a result, we will build up a foundation of more LR parsing concepts before introducing it.

The input queue

As we done in the LL(1) parsing algorithm, we place our input string into an input queue, with one input symbol in each slot.

The queue looks like:

(head of queue)

σ₁

σ₂

...

σ₂

σ_i ∈ Σ

The input queue here has a very similar setup to the LL(1) input queue.

The operations we need to support on the input queue are:

Initialize a queue with a fixed set of symbols
Peek - return the element at the head of the queue
Dequeue - remove and return the element at the head of the queue
Check if empty - determine whether the queue is empty

Note that we've added the peek operation.

The parse stack

Similar to in LL(1) parsing, our parse stack looks like:

(top of stack)

sym₁

sym₂

...

sym_n

sym_i ∈ T ∪ N

As before, we still want to support the following operations:

Initialize a stack with a fixed set of symbols
Push - add an element to the top of the stack
Pop - remove and return the element at the top of the stack
Check if empty - determine whether the stack is empty

Shift & reduce actions

In order to understand bottom-up LR(0) parsing, we need to first understand the two key LR parse actions: shift and reduce.

Definition

In the LR(0) shift action, we dequeue the first symbol from the input queue and push it onto the top of the parse stack.

Definition

Given a production N_i → sym_i₁, sym_i₂, ..., sym_{i_n} in our context-free grammar G, we can apply an LR(0) reduce action as follows:

Pop n symbols from the top of the parse stack (which should match s_i₁ ... sym_{i_n}) and push N_i onto the top of the parse stack.

The accept action

Since we need to accept the input string after it has been reduced to the grammar's start symbol, our LR(0) parser must have an accept action.

Shift & reduce example

Given the grammar

S → ABC
A → a
B → b
C → cc

we can derive the string abcc in reverse as follows:

Input queue	Parse stack	Action
abcc		Apply shift action
bcc	a	Apply reduce action, since a matches the right-hand side of A → a
bcc	A	Apply shift action, since the parse stack does not contain the entire right-hand side of any production
cc	Ab	Apply reduce action, since b matches the right-hand size of B → b
cc	AB	Apply shift action, since we want ABC on the parse stack to reduce to S
c	ABc	Apply shift action, since we need two cs to reduce to C
	ABcc	Apply reduce action for production C → cc
	ABC	Apply reduce action for production S → ABC
	S	Accept, since we've reduced our terminal string to the start symbol

Conclusion

We've gone over some of the fundamental operations of LR parsing which will be first used in an illustration of the LR(0) parsing algorithm. Next, we will look at the motivation for requiring state managemnt in our parsing process.