RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Pages

Handling end-of-string productions

Background

We've learned how to construct LL(1) parse tables for context-free grammars containing epsilon productions. This section will introduce the parsing techniques needed to handle these parse table additions.

Example grammar and parse table

Example

The following grammar

S → a B
B → C
C → ε

has the following corresponding parse table:

a $
S 1
B 2
C 3

Example parse

Example

Parsing input string a:

Input queue Parse stack Action
a$ S Apply rule 1 since S and a correspond to this production in our table
a$ aB Remove a from head of input queue and parse stack and continue parsing since terminal symbol matches
$ B Apply rule 2 since B and $ correspond to this production in our table
$ C Apply rule 3 since C and $ correspond to this production in our table
$ Accept, since $ is at head of input queue and parse stack is empty

A few notes about the parsing procedure here:

  1. Adding the end of string symbol $ to the end of the input queue allows us to apply ε productions when there are no more terminal nodes to process
  2. Since the input queue always starts with $ at the end, we accept when this symbol is encountered and the parse stack is empty

Additional parsing rules

First, in order to handle applying all of the necessary productions, we will change the nonterminal processing rule to:

If the top symbol is a nonterminal (nnext):

  1. Look up the parse table entry (anext) corresponding to the top symbol from the stack (nnext) and the first symbol (terminal or $) from the input queue (symnext)
    1. If anext = ø, reject
    2. Otherwise, look up our next production pnext = nnext → symnext1 symnext2 ... symnextn
    3. Add symnext1, symnext2, ..., symnextn to the top of the parse stack in reverse order such that symnext1 is at the top of the stack

Here, we've simply allowed the use of the symbol $ as a lookup in our parse table. Note that the rule used for removing terminals from the input queue is unchanged.

In addition to this, since we allow productions to still occur when the end of the input string has been encountered, we need to change the accept / reject conditions to the following:

  1. If $ is at the head of the input queue and parse stack is empty at the end of a processing step, accept
  2. If $ is at the head of the input queue and a terminal is at the top of the parse stack at the end of a processing step, reject
  3. If $ is not at the head of the input queue and the parse stack is empty at the end of the processing step, reject

Note that, as before, parsing can also fail upon not finding a parse table entry.

Another successful parse example

Example

S → A B C
A → a
A → ε
B → b
C → c
C → ε

has the following parse table:

a b c $
S 1 1
A 2 3
B 4
C 5 6

Parsing the input string abc:

Input queue Parse stack Action
abc$ S Apply rule 1 since S and a correspond to this production in our table
abc$ ABC Apply rule 2 since A and a correspond to this production in our table
abc$ aBC Remove a from head of input queue and parse stack and continue parsing since terminal symbol matches
bc$ BC Apply rule 4 since B and b correspond to this production in our table
bc$ bC Remove b from head of input queue and parse stack and continue parsing since terminal symbol matches
c$ C Apply rule 5 since C and c correspond to this production in our table
c$ c Remove c from head of input queue and parse stack and continue parsing since terminal symbol matches
$ Accept, since $ is at head of input queue and parse stack is empty

Parsing input string b:

Input queue Parse stack Action
b$ S Apply rule 1 since S and b correspond to this production in our table
b$ ABC Apply rule 3 since A and b correspond to this production in our table
b$ BC Apply rule 4 since B and b correspond to this production in our table
b$ bC Remove b from head of input queue and parse stack and continue parsing since terminal symbol matches
$ C Apply rule 6 since B and $ correspond to this production in our table
$ Accept, since $ is at head of input queue and parse stack is empty

Parse failure examples

Example

S → ab

has the following corresponding parse table:

a $
S 1

Note that the $ column is empty here and is only included to be explicit in this illustration.

Parsing input string a:

Input queue Parse stack Action
a$ S Apply rule 1 since S and a correspond to this production in our table
a$ ab Remove a from head of input queue and parse stack and continue parsing since terminal symbol matches
$ b Reject, since $ is at head of input queue and parse stack has a terminal at the top

Parsing input string abc:

Input queue Parse stack Action
abc$ S Apply rule 1 since S and a correspond to this production in our table
abc$ ab Remove a from head of input queue and parse stack and continue parsing since terminal symbol matches
bc$ b Remove b from head of input queue and parse stack and continue parsing since terminal symbol matches
c$ Reject, since terminal c is at head of input queue and parse stack is empty

Conclusion

The parsing techniques used here complete the theory needed to fully understand LL(1) parsing. Next, we will introduce the concept of bottom-up parsing in order to facilitate discussion of a wider variety of parsing algorithms.


GitHub Repository: https://github.com/bprollinson/ripal

Copyright © 2017 Brendan Rollinson-Lorimer