We've learned methods for recognizing regular languanges. However, not all languages are regular. In this section, we will introduce a more complex class of languages: context-free languages.
Example
The following language is not regular:
L = {anbn | n >= 0}
To clarify, L = {ε, ab, aabb, aaabbb, ...}
No matter how hard you try, you will not be able to come up with a regular expression that specifies it or DFA that recognizes it. This language, however, does belong to the class of context-free languages.
The easiest way to specify a context-free language is via a context-free grammar (CFG).
Example
This language can be specified with the following CFG:
S → aSb
S → ε
A context-free grammar consists of four pieces of information:
Example
S → aSb
S → ε
Here, we have two productions.
S is our only nonterminal. Our terminals are a, b. Note that these can be easily read from the productions and are typically not stated explicitly. Also note that ε is not a terminal and is a special symbol.
S is our start symbol. We can determine that by observing the nonterminal on the left-hand side of the first production as per the usual convention.
Here, we can say that S produces aSb and S produces ε.
We can derive a sample string, aabb, as follows:
In the above example, we can say that S derives aabb, so aabb ∈ L. This is symbolized as:
S →* aabb
Example
SENTENCE → SUBJECT VERB OBJECT.
SUBJECT → John | Jane
VERB → threw | kicked
OBJECT → the ball | the puck
Note: S → sym1 | sym2 is just short form for:
S → sym1
S → sym2
Here, N = {SENTENCE, SUBJECT, VERB, OBJECT}, T = {John, Jane, threw, kicked, the, ball, puck, .}. SENTENCE is our start symbol.
Example derivation:
SENTENCE →* Jane kicked the ball.
If we can construct a context-free grammar that produces the strings S in a language L, then L is a context-free language.
GitHub Repository: https://github.com/bprollinson/ripal
Copyright © 2017 Brendan Rollinson-Lorimer