RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Pages

Bottom-up parsing motivations

Background

We've explored top-down parsing via the LL(1) parsing algorithm. Next, we will explore the concepts behind bottom-up parsing.

Bottom-up parsing

Definition

In a bottom-up parsing procedure for context-free languages, we attempt to recognize string s as part of language L using grammar G. To do so, we start with s, the string to be accepted, and apply a sequence of productions {p1, p2, ..., pn} in reverse to reduce s to start symbol S.

Example

Σ = {a, b, c}

S → AB
A → a
B → b
B → bc

The following reverse derivation of the string abc corresponds to a bottom-up parsing procedure:

abc → Abc (via rule 2) → AB (via rule 4) → S (via rule 1)

Production choice

Similar to in top-down parsing, decisions on which production to use need to be made by the parser.

Example

When we have produced Abc and are looking to reduce the terminals b and c, we have two possible rules that could be applied:

  1. B → b
  2. B → bc

Because we have b and c in our input string, we would choose rule 4 in this case.

A dead end

Example

Σ = {a, b, c}

S → AC
A → ab
C → cc

Let's say we have the input string abc and try to derive it (in reverse) using this grammar:

abc → Ac (via rule 2)

Here, we choose to apply rule 2 since it matches nonterminals a and b which appear at the beginning of our input string. However, the single terminal c doesn't match the output of any production in our grammar. The above is not a valid derivation of abc, which is in fact not part of the specified language.

Ambiguity

Similar to top-down parsers, bottom-up parsers may encounter ambiguous situations when trying to match an input string to a language specification.

Example

Σ = {a, b}

S → aa
S → ab

Deriving the string ab in reverse:

ab → S (via rule 2)

In this example, looking at the first input symbol a is not enough to decide which production should be used to reduce the input symbols to a nonterminal. Here, we have to look ahead to the next symbol, b, to determine that rule 2 is the correct rule to apply.

Example

Σ = {a}

S → a
S → a

Deriving the string a in reverse:

a → S (via rule ?)

Similar to in top-down parsing, we know that a is part of our language. However, we can't determine its meaning because it's not clear which production should be used to derive a. This language is ambiguous.

Conclusion

We now have a basic understanding of what bottom-up parsing is. Next, we will look at the parsing actions that are used by LR parsers - a common type of bottom-up parsers.


GitHub Repository: https://github.com/bprollinson/ripal

Copyright © 2017 Brendan Rollinson-Lorimer