RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Pages

Core LR parsing process

Background

In the last section, we saw the underlying idea behind bottom-up parsing. In this section, we will introduce the specific core constructs and processing rules used in the bottom-up LR parsing algorithm. We will keep our constructs simple at first, and begin to explore the construction of LR(0) parsers.

Setup

To apply the LR(0) parsing algorithm, we need the following data structures set up:

The LR(0) parse table

Much like in an LL(1) parser, an LR(0) parser uses a parse table to make decisions. However, the LR(0) parse table is somewhat more complex than the LL(1) equivalent.

As a result, we will build up a foundation of more LR parsing concepts before introducing it.

The input queue

As we done in the LL(1) parsing algorithm, we place our input string into an input queue, with one input symbol in each slot.

The queue looks like:

(head of queue) σ1 σ2 ... σ2

σi ∈ Σ

The input queue here has a very similar setup to the LL(1) input queue.

The operations we need to support on the input queue are:

  1. Initialize a queue with a fixed set of symbols
  2. Peek - return the element at the head of the queue
  3. Dequeue - remove and return the element at the head of the queue
  4. Check if empty - determine whether the queue is empty

Note that we've added the peek operation.

The parse stack

Similar to in LL(1) parsing, our parse stack looks like:

(top of stack) sym1 sym2 ... symn

symi ∈ T ∪ N

As before, we still want to support the following operations:

  1. Initialize a stack with a fixed set of symbols
  2. Push - add an element to the top of the stack
  3. Pop - remove and return the element at the top of the stack
  4. Check if empty - determine whether the stack is empty

Shift & reduce actions

In order to understand bottom-up LR(0) parsing, we need to first understand the two key LR parse actions: shift and reduce.

Definition

In the LR(0) shift action, we dequeue the first symbol from the input queue and push it onto the top of the parse stack.

Definition

Given a production Ni → symi1, symi2, ..., symin in our context-free grammar G, we can apply an LR(0) reduce action as follows:

Pop n symbols from the top of the parse stack (which should match si1 ... symin) and push Ni onto the top of the parse stack.

The accept action

Since we need to accept the input string after it has been reduced to the grammar's start symbol, our LR(0) parser must have an accept action.

Shift & reduce example

Given the grammar

S → ABC
A → a
B → b
C → cc

we can derive the string abcc in reverse as follows:

Input queue Parse stack Action
abcc Apply shift action
bcc a Apply reduce action, since a matches the right-hand side of A → a
bcc A Apply shift action, since the parse stack does not contain the entire right-hand side of any production
cc Ab Apply reduce action, since b matches the right-hand size of B → b
cc AB Apply shift action, since we want ABC on the parse stack to reduce to S
c ABc Apply shift action, since we need two cs to reduce to C
ABcc Apply reduce action for production C → cc
ABC Apply reduce action for production S → ABC
S Accept, since we've reduced our terminal string to the start symbol

Conclusion

We've gone over some of the fundamental operations of LR parsing which will be first used in an illustration of the LR(0) parsing algorithm. Next, we will look at the motivation for requiring state managemnt in our parsing process.


GitHub Repository: https://github.com/bprollinson/ripal

Copyright © 2017 Brendan Rollinson-Lorimer