RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Pages

The LL(1) parsing algorithm

Background

We've seen what an LL(1) parse table looks like. Now we can solidify a parsing algorithm using an LL(1) parse table.

Caveat

For now, we will ignore CFGs with ε productions to keep the setup of this algorithm easier to explain. CFGs with ε productions will be covered later.

Setup

To apply the LL(1) parsing algorithm, we will need the following data structures set up:

The LL(1) parse table

As shown in the previous section, we will need a parse table that maps combinations of nonterminal symbols and terminal symbols to the application of a production rule.

The input queue

We place our input string into an input queue, with one input symbol in each slot.

The queue looks like:

(head of queue) σ1 σ2 ... σ2

σi ∈ Σ

Example

Σ = {e, s, t}

s = test

Our queue looks like:

(head of queue) t e s t

The operations we need to support on the input queue are:

  1. Initialize a queue with a fixed set of symbols
  2. Dequeue - remove and return the element at the head of the queue
  3. Check if empty - determine whether the queue is empty

The parse stack

We use a parse stack to store the symbols we are still in the process of handling.

The stack looks like:

(top of stack) sym1 sym2 ... symn

symi ∈ T ∪ N

The operations we need to support on the parse stack are:

  1. Initialize a stack with a fixed set of symbols
  2. Push - add an element to the top of the stack
  3. Pop - remove and return the element at the top of the stack
  4. Check if empty - determine whether the stack is empty

Initialization

Initialize the input queue with the symbols from the input string.

Initialize the parse stack with the grammar's start symbol.

Input processing

At each step of our parsing process, we pop and observe the top symbol from the parse stack.

If the top symbol is a terminal:

  1. Dequeue the first terminal in the input queue (tnext)
    1. If it matches the symbol from the top of the stack, continue parsing, discarding both symbols
    2. If it does not match the symbol from the top of the stack, reject

If the top symbol is a nonterminal (nnext):

  1. Look up the parse table entry (anext) corresponding to the top symbol from the stack (nnext) and the first terminal from the input queue (tnext)
    1. If anext = ø, reject
    2. Otherwise, look up our next production pnext = nnext → symnext1 symnext2 ... symnextn
    3. Add symnext1, symnext2, ..., symnextn to the top of the parse stack in reverse order such that symnext1 is at the top of the stack

Handing the end of input

Outside of an input symbol or table lookup mismatch, the following conditions signal and end of processing:

  1. If the input queue and parse stack are both empty at the end of a processing step, accept
  2. If the input queue is empty and the parse stack is not empty at the end of a processing step, reject
  3. If the input queue is not empty and the parse stack is empty at the end of the processing step, reject

Examples

S → ab

a b
S 1

Example

Parsing input string ab:

Input queue Parse stack Action
ab S Apply rule 1 since S and a correspond to this production in our table
ab ab Remove a from head of input queue and parse stack and continue parsing since terminal symbol matches
b b Remove b from head of input queue and parse stack and continue parsing since terminal symbol matches
Accept, since input queue and parse stack are both empty

Example

Parsing input string b:

Input queue Parse stack Action
b S Reject, since S and b correspond to ø in our table

Example

Parsing input string aa:

Input queue Parse stack Action
aa S Apply rule 1 since S and a correspond to this production in our table
aa ab Remove a from head of input queue and parse stack and continue parsing since terminal symbol matches
a b Reject, since the symbol from the head of the input queue (a) doesn't match the symbol from the top of the parse stack (b)

Example

Parsing input string a:

Input queue Parse stack Action
a S Apply rule 1 since S and a correspond to this production in our table
a ab Remove a from head of input queue and parse stack and continue parsing since terminal symbol matches
b Reject, since input queue is empty but parse stack is not

Example

Parsing string abc:

Input queue Parse stack Note
abc S Apply rule 1 since S and a correspond to this production in our table
abc ab Remove a from head of input queue and parse stack and continue parsing since terminal symbol matches
bc b Remove b from head of input queue and parse stack and continue parsing since terminal symbol matches
c Reject, since parse stack is empty but input queue is not

Intuitive explanation

When applying the LL(1) parsing algorithm, we are making a selection of a production rule to apply based on the next terminal from the input string and the next nonterminal from the parse stack.

In general, we need the string produced by replacing nonterminals with terminals to match the input string exactly for a parse to be successful. Although an LL(1) parse table is a powerful tool, it is not always powerful enough to parse all context-free languages correctly.

Conclusion

We've now learned the basic rules of LL(1) parsing. Next, we will look at how to interpret the results.


GitHub Repository: https://github.com/bprollinson/ripal

Copyright © 2017 Brendan Rollinson-Lorimer