We've seen how context-free grammars can be used to derive strings as part of a context-free language. Next, we will informally introduce some of the concepts needed to construct and utilize the simplest of all context-free language parsers - the LL(1) parser.
Definition
In a top-down parsing procedure for context-free languages, we attempt to recognize string s as part of language L using grammar G. To do so, we start with S, the start symbol for G and apply a sequence of productions {p1, p2, ..., pn} such that S →* s.
Example
All previous examples of context-free derivations followed this top-down approach. Regardless, here's another example:
Σ = {a, b, c}
S → ACD
A → ab
A → b
C → c
D → dE
E → e
The following derivation of the string abcde corresponds to a top-down parsing procedure:
Example
When we have produced ACD and are looking to expand the nonterminal A, we have two possible rules that could be applied:
Because the next symbol in our input string is a, we choose to use the rule whose right side begins with the terminal a. We choose rule 2 over rule 3 in this case.
Example
Σ = {a, b, c}
S → AB
A → a
B → bc
B → cc
Let's derive the string acc:
Here, we apply rule 4 rather than rule 3 for our final production because the remaining input that has not yet been produced (cc) better matches that production.
Example
Σ = {x, y, z}
S → AZ
A → X
A → Y
X → x
Y → y
Z → z
Let's derive the string yz:
When we have produced AZ and are looking to expand the nonterminal A, we choose rule 3 because Y →* y. This line of thinking can be used for even longer nonterminal chains.
Example
Σ = {a, b}
S → A
A → ab
Let's say we have the input string aa and we try to derive it using this grammar:
Here, we chose to apply rule 1 because it looks like rule 2 subequently procudes the a symbol we need. However, it causes a mismatch later in the string. The above is not a valid derivation of aa, which is in fact not part of the specified language.
Example
Σ = {a, b}
S → aa
S → ab
Let's derive the string ab:
Here, we choose to apply rule 2 because although the first input character (a) would be matched in either case, the second input character (b) would only be matched via rule 2. As shown here, sometimes we need to look more than one symbol ahead to decide how to derive our input string via context-free grammar rules.
Example
Σ = {a}
S → a
S → a
Let's say we have the input string a and we try to derive it using this grammar:
Even though a is clearly in our language, it's not clear which production rule should be used. As a result of this, it's not clear what the input string means in this language provided by the CFG. This grammar is ambiguous.
Hopefully, this page introduced you to some of the motivations for digging deeper into the details of top-down parsing algorithms. Specific information on these algorithms will be provided subsequently.
GitHub Repository: https://github.com/bprollinson/ripal
Copyright © 2017 Brendan Rollinson-Lorimer