RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

The LL(1) parsing algorithm

Background

We've seen what an LL(1) parse table looks like. Now we can define a parsing algorithm that uses an LL(1) parse table.

Caveat

For now, we will ignore CFGs with ε productions to keep this algorithm simpler to explain. CFGs with ε productions will be covered later.

Setup

To apply the LL(1) parsing algorithm, we will need the following data structures set up:

The LL(1) parse table

As shown in the previous section, we need a parse table that maps each combination of a nonterminal symbol and a terminal symbol to the production rule to apply, or to ø if no rule applies.

The input queue

We place our input string into an input queue, with one input symbol in each slot.

The queue looks like:

(head of queue)

σ₁

σ₂

...

σ_n

σ_i ∈ Σ

Example

Σ = {e, s, t}

Input string: test

Our queue looks like:

(head of queue)

t

e

s

t

The operations we need to support on the input queue are:

Initialize a queue with a fixed sequence of symbols
Dequeue - remove and return the element at the head of the queue
Check if empty - determine whether the queue is empty
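As a minimal sketch of these three operations, assuming Python's `collections.deque` as the queue (the helper name `init_queue` is our own, not part of RIPAL), the example input `test` behaves like this:

```python
from collections import deque

def init_queue(input_string):
    """Initialize the input queue with one input symbol per slot (head = leftmost)."""
    return deque(input_string)

queue = init_queue("test")
print(list(queue))       # ['t', 'e', 's', 't']
print(queue.popleft())   # dequeue the head of the queue: t
print(len(queue) == 0)   # check if empty: False
```

A deque is used because removing from its left end is cheap, which matches the dequeue operation.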

The parse stack

We use a parse stack to store the symbols we are still in the process of handling.

The stack looks like:

(top of stack)

sym₁

sym₂

...

sym_n

sym_i ∈ T ∪ N

The operations we need to support on the parse stack are:

Initialize a stack with a fixed sequence of symbols
Push - add an element to the top of the stack
Pop - remove and return the element at the top of the stack
Check if empty - determine whether the stack is empty
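One possible way to support these operations, sketched here under the assumption that a plain Python list serves as the stack with its top at the end of the list (the helper `init_stack` is hypothetical):

```python
def init_stack(symbols):
    """Initialize the parse stack so that symbols[0] is on top.

    The top of the stack is the END of the Python list, so the
    initial symbols are stored in reverse order.
    """
    return list(reversed(symbols))

stack = init_stack(["S"])
stack.append("a")     # push: a is now on top
print(stack.pop())    # pop: a
print(stack.pop())    # pop: S
print(not stack)      # check if empty: True
```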

Initialization

Initialize the input queue with the symbols from the input string.

Initialize the parse stack with the grammar's start symbol.
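Both initialization steps can be sketched together as follows; the function name `initialize` is an illustrative assumption, and the same deque/list representation as the sketches for the queue and stack operations is assumed:

```python
from collections import deque

def initialize(input_string, start_symbol):
    """Return (input_queue, parse_stack) as described in Initialization."""
    input_queue = deque(input_string)   # one input symbol per slot, head on the left
    parse_stack = [start_symbol]        # list used as a stack; top is the last element
    return input_queue, parse_stack

queue, stack = initialize("ab", "S")
print(list(queue), stack)   # ['a', 'b'] ['S']
```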

Input processing

At each step of our parsing process, we pop and observe the top symbol from the parse stack.

If the top symbol is a terminal:

Dequeue the first terminal in the input queue (t_next)
1. If it matches the symbol from the top of the stack, continue parsing, discarding both symbols
2. If it does not match the symbol from the top of the stack, reject
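The terminal case can be sketched as a small helper, assuming the input queue is a `collections.deque`; `match_terminal` is a hypothetical name, and it returns `True` to mean "continue parsing" and `False` to mean "reject":

```python
from collections import deque

def match_terminal(top, input_queue):
    """Handle a terminal popped from the parse stack.

    Dequeues t_next and reports whether it matches the popped terminal.
    Both symbols are discarded either way. This assumes the queue is not
    empty, which the end-of-input checks guarantee before each step.
    """
    t_next = input_queue.popleft()
    return t_next == top

queue = deque("ab")
print(match_terminal("a", queue))   # True: a matches, queue now holds only b
print(match_terminal("a", queue))   # False: b does not match a, so reject
```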

If the top symbol is a nonterminal (n_next):

Look up the parse table entry (a_next) corresponding to the top symbol from the stack (n_next) and the first terminal in the input queue (t_next), without dequeuing t_next
1. If a_next = ø, reject
2. Otherwise, a_next gives the next production p_next = n_next → sym_next₁ sym_next₂ ... sym_nextₖ
3. Push sym_nextₖ, ..., sym_next₂, sym_next₁ onto the parse stack, in that reverse order, so that sym_next₁ ends up at the top of the stack
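The nonterminal case can be sketched in Python as below. The dictionary encoding is an assumption for illustration: `RULES` maps a rule number to its left-hand side and right-hand symbols, `TABLE` maps a (nonterminal, terminal) pair to a rule number, and a missing key stands for ø. The contents encode the single-rule grammar S → ab from the examples on this page.

```python
from collections import deque

# Hypothetical encoding: rule number -> (left-hand nonterminal, right-hand symbols)
RULES = {1: ("S", ["a", "b"])}
# (nonterminal, terminal) -> rule number; a missing key stands for ø
TABLE = {("S", "a"): 1}

def expand_nonterminal(n_next, input_queue, parse_stack, table=TABLE, rules=RULES):
    """Apply the table entry for (n_next, t_next); return False to reject."""
    t_next = input_queue[0]               # peek only: t_next stays in the queue
    a_next = table.get((n_next, t_next))  # None plays the role of ø
    if a_next is None:
        return False
    _, body = rules[a_next]               # p_next = n_next -> sym_next1 ... sym_nextk
    parse_stack.extend(reversed(body))    # push in reverse so sym_next1 ends up on top
    return True

stack = []
print(expand_nonterminal("S", deque("ab"), stack))   # True
print(stack)                                         # ['b', 'a'] (a is on top)
print(expand_nonterminal("S", deque("b"), []))       # False: the table entry is ø
```

Because the top of the list-based stack is its last element, `extend(reversed(body))` places the first right-hand symbol on top, which is exactly step 3.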

Handling the end of input

Apart from a terminal mismatch or an empty (ø) table entry, the following conditions signal the end of processing:

If the input queue and parse stack are both empty at the end of a processing step, accept
If the input queue is empty and the parse stack is not empty at the end of a processing step, reject
If the input queue is not empty and the parse stack is empty at the end of a processing step, reject
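These three conditions can be checked with one small function; this is a sketch, and the name `end_of_step_status` and its return values (`"accept"`, `"reject"`, or `None` for "keep parsing") are our own choices:

```python
def end_of_step_status(input_queue, parse_stack):
    """Return 'accept', 'reject', or None (keep parsing) after a processing step."""
    if not input_queue and not parse_stack:
        return "accept"
    if not input_queue or not parse_stack:
        return "reject"   # exactly one of the two is empty
    return None

print(end_of_step_status([], []))        # accept
print(end_of_step_status([], ["b"]))     # reject
print(end_of_step_status(["c"], []))     # reject
print(end_of_step_status(["b"], ["b"]))  # None
```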

Examples

Consider a grammar with a single production rule:

1. S → ab

Its LL(1) parse table is:

	a	b
S	1	ø

Example

Parsing input string ab:

Input queue	Parse stack	Action
ab	S	Apply rule 1, since the table entry for S and a is rule 1
ab	ab	Remove a from the head of the input queue and the top of the parse stack, and continue parsing since the terminal symbols match
b	b	Remove b from the head of the input queue and the top of the parse stack, and continue parsing since the terminal symbols match
		Accept, since the input queue and parse stack are both empty

Example

Parsing input string b:

Input queue	Parse stack	Action
b	S	Reject, since the table entry for S and b is ø

Example

Parsing input string aa:

Input queue	Parse stack	Action
aa	S	Apply rule 1, since the table entry for S and a is rule 1
aa	ab	Remove a from the head of the input queue and the top of the parse stack, and continue parsing since the terminal symbols match
a	b	Reject, since the symbol at the head of the input queue (a) doesn't match the symbol at the top of the parse stack (b)

Example

Parsing input string a:

Input queue	Parse stack	Action
a	S	Apply rule 1, since the table entry for S and a is rule 1
a	ab	Remove a from the head of the input queue and the top of the parse stack, and continue parsing since the terminal symbols match
	b	Reject, since the input queue is empty but the parse stack is not

Example

Parsing input string abc:

Input queue	Parse stack	Action
abc	S	Apply rule 1, since the table entry for S and a is rule 1
abc	ab	Remove a from the head of the input queue and the top of the parse stack, and continue parsing since the terminal symbols match
bc	b	Remove b from the head of the input queue and the top of the parse stack, and continue parsing since the terminal symbols match
c		Reject, since the parse stack is empty but the input queue is not
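Putting the pieces together, here is a minimal sketch of the whole algorithm in Python, run on the five example strings above. The encoding is an assumption for illustration: `RULES`, `TABLE`, and `NONTERMINALS` are hypothetical names, a missing table key stands for ø, and the function returns the list of actions it took so the traces can be compared with the tables. One addition of ours: the end-of-input checks also run once before the first step, so an empty input is rejected instead of crashing.

```python
from collections import deque

# Hypothetical encoding of the example grammar S -> ab (rule 1)
RULES = {1: ("S", ["a", "b"])}
TABLE = {("S", "a"): 1}          # missing (nonterminal, terminal) keys mean ø
NONTERMINALS = {"S"}

def ll1_parse(input_string, start="S", rules=RULES, table=TABLE,
              nonterminals=NONTERMINALS):
    """Run the LL(1) algorithm; return (accepted, list of actions taken)."""
    queue = deque(input_string)   # head of queue = leftmost input symbol
    stack = [start]               # top of stack = last element of the list
    actions = []
    while True:
        # End-of-input checks. Running them at the top of the loop is the same
        # as running them at the end of each step, plus one check before the
        # first step.
        if not queue and not stack:
            actions.append("accept")
            return True, actions
        if not queue:
            actions.append("reject: input queue is empty but parse stack is not")
            return False, actions
        if not stack:
            actions.append("reject: parse stack is empty but input queue is not")
            return False, actions

        top = stack.pop()
        if top in nonterminals:
            t_next = queue[0]                     # peek, do not dequeue
            a_next = table.get((top, t_next))
            if a_next is None:
                actions.append(f"reject: table entry for {top}, {t_next} is ø")
                return False, actions
            _, body = rules[a_next]
            stack.extend(reversed(body))          # first body symbol ends up on top
            actions.append(f"apply rule {a_next}")
        else:
            t_next = queue.popleft()
            if t_next != top:
                actions.append(f"reject: expected {top}, found {t_next}")
                return False, actions
            actions.append(f"match {top}")

for s in ["ab", "b", "aa", "a", "abc"]:
    accepted, actions = ll1_parse(s)
    print(s, "accept" if accepted else "reject", actions)
```

For `ab` the actions are `['apply rule 1', 'match a', 'match b', 'accept']`, which follows the same steps as the first example table; the other four strings are rejected for the same reasons given in their tables.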

Intuitive explanation

When applying the LL(1) parsing algorithm, we select the production rule to apply based on the next terminal in the input queue and the nonterminal popped from the top of the parse stack.

For a parse to succeed, the string of terminals produced by repeatedly replacing nonterminals according to the chosen production rules must match the input string exactly. Although an LL(1) parse table is a powerful tool, it is not powerful enough to parse every context-free language: some context-free languages have no LL(1) grammar at all.

Conclusion

We've now learned the basic rules of LL(1) parsing. Next, we will look at how to interpret the results.
