RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Background

We've learned methods for recognizing regular languanges. However, not all languages are regular. In this section, we will introduce a more complex class of languages: context-free languages.

Non-regular language example

Example

The following language is not regular:

L = {aⁿbⁿ | n >= 0}

To clarify, L = {ε, ab, aabb, aaabbb, ...}

No matter how hard you try, you will not be able to come up with a regular expression that specifies it or DFA that recognizes it. This language, however, does belong to the class of context-free languages.

Context-free grammar

The easiest way to specify a context-free language is via a context-free grammar (CFG).

Example

This language can be specified with the following CFG:

S → aSb
S → ε

A context-free grammar consists of four pieces of information:

A set of nonterminals, N
1. A nonterminal is a symbol that can become some other set of symbols through a production
2. By convention, we represent a nonterminal with a capital letter
A set of terminals, T
1. Terminals are symbols we are trying to produce from our CFG corresponding to symbols in our input string
2. T ⊂ Σ, our alphabet
A set of productions, P
1. P = {p₁, p₂, ..., p_n}
2. p_i can be expressed as N_i → sym_i₁, sym_i₂, ..., sym_{i_n}, where sym_{i_j} ∈ T ∪ N ∪ ε
3. Here, a production p_i represents the replacement of a nonterminal with some combination of terminal and nonterminal symbols
A start symbol, S
1. This symbol provides the starting symbol for producing strings within our language L

Re-examining the context-free grammar

Example

S → aSb
S → ε

Here, we have two productions.

S is our only nonterminal. Our terminals are a, b. Note that these can be easily read from the productions and are typically not stated explicitly. Also note that ε is not a terminal and is a special symbol.

S is our start symbol. We can determine that by observing the nonterminal on the left-hand side of the first production as per the usual convention.

Here, we can say that S produces aSb and S produces ε.

We can derive a sample string, aabb, as follows:

S → aSb (via rule 1) → aaSbb (via rule 1) → aabb (via rule 2)

In the above example, we can say that S derives aabb, so aabb ∈ L. This is symbolized as:

S →^* aabb

Another example context-free grammar

Example

SENTENCE → SUBJECT VERB OBJECT.
SUBJECT → John | Jane
VERB → threw | kicked
OBJECT → the ball | the puck

Note: S → sym₁ | sym₂ is just short form for:

S → sym₁
S → sym₂

Here, N = {SENTENCE, SUBJECT, VERB, OBJECT}, T = {John, Jane, threw, kicked, the, ball, puck, .}. SENTENCE is our start symbol.

Example derivation:

SENTENCE → SUBJECT VERB OBJECT. (via rule 1) → Jane VERB OBJECT. (via rule 2) → Jane kicked OBJECT. (via rule 3) → Jane kicked the ball. (via rule 3)

SENTENCE →^* Jane kicked the ball.

Context-free languages

If we can construct a context-free grammar that produces the strings S in a language L, then L is a context-free language.

RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Pages

Context-free grammars

Background

Non-regular language example

Context-free grammar

Re-examining the context-free grammar

Another example context-free grammar

Context-free languages