RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Pages

Top-down parsing motivations

Background

We've seen how context-free grammars can be used to derive strings as part of a context-free language. Next, we will informally introduce some of the concepts needed to construct and utilize the simplest of all context-free language parsers - the LL(1) parser.

Top-down parsing

Definition

In a top-down parsing procedure for context-free languages, we attempt to recognize string s as part of language L using grammar G. To do so, we start with S, the start symbol for G and apply a sequence of productions {p1, p2, ..., pn} such that S →* s.

Example

All previous examples of context-free derivations followed this top-down approach. Regardless, here's another example:

Σ = {a, b, c}

S → ACD
A → ab
A → b
C → c
D → dE
E → e

The following derivation of the string abcde corresponds to a top-down parsing procedure:

S → ACD (via rule 1) → abCD (via rule 2) → abcD (via rule 4) → abcdE (via rule 5) → abcde (via rule 6)

Production choice

Example

When we have produced ACD and are looking to expand the nonterminal A, we have two possible rules that could be applied:

  1. A → ab
  2. A → b

Because the next symbol in our input string is a, we choose to use the rule whose right side begins with the terminal a. We choose rule 2 over rule 3 in this case.

Handling subsequent nonterminals

Example

Σ = {a, b, c}

S → AB
A → a
B → bc
B → cc

Let's derive the string acc:

S → AB (via rule 1) → aB (via rule 2) → acc (via rule 4)

Here, we apply rule 4 rather than rule 3 for our final production because the remaining input that has not yet been produced (cc) better matches that production.

Looking deeper through multiple productions

Example

Σ = {x, y, z}

S → AZ
A → X
A → Y
X → x
Y → y
Z → z

Let's derive the string yz:

S → AZ (via rule 1) → YZ (via rule 3) → yZ (via rule 5) → yz (via rule 6)

When we have produced AZ and are looking to expand the nonterminal A, we choose rule 3 because Y →* y. This line of thinking can be used for even longer nonterminal chains.

A dead end

Example

Σ = {a, b}

S → A
A → ab

Let's say we have the input string aa and we try to derive it using this grammar:

S → A (via rule 1) → ab (via rule 2)

Here, we chose to apply rule 1 because it looks like rule 2 subequently procudes the a symbol we need. However, it causes a mismatch later in the string. The above is not a valid derivation of aa, which is in fact not part of the specified language.

Ambiguity

Example

Σ = {a, b}

S → aa
S → ab

Let's derive the string ab:

S → ab (via rule 2)

Here, we choose to apply rule 2 because although the first input character (a) would be matched in either case, the second input character (b) would only be matched via rule 2. As shown here, sometimes we need to look more than one symbol ahead to decide how to derive our input string via context-free grammar rules.

Example

Σ = {a}

S → a
S → a

Let's say we have the input string a and we try to derive it using this grammar:

S → a (via rule ?)

Even though a is clearly in our language, it's not clear which production rule should be used. As a result of this, it's not clear what the input string means in this language provided by the CFG. This grammar is ambiguous.

Conclusion

Hopefully, this page introduced you to some of the motivations for digging deeper into the details of top-down parsing algorithms. Specific information on these algorithms will be provided subsequently.


GitHub Repository: https://github.com/bprollinson/ripal

Copyright © 2017 Brendan Rollinson-Lorimer