RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Pages

Context-free grammars

Background

We've learned methods for recognizing regular languanges. However, not all languages are regular. In this section, we will introduce a more complex class of languages: context-free languages.

Non-regular language example

Example

The following language is not regular:

L = {anbn | n >= 0}

To clarify, L = {ε, ab, aabb, aaabbb, ...}

No matter how hard you try, you will not be able to come up with a regular expression that specifies it or DFA that recognizes it. This language, however, does belong to the class of context-free languages.

Context-free grammar

The easiest way to specify a context-free language is via a context-free grammar (CFG).

Example

This language can be specified with the following CFG:

S → aSb
S → ε

A context-free grammar consists of four pieces of information:

  1. A set of nonterminals, N
    1. A nonterminal is a symbol that can become some other set of symbols through a production
    2. By convention, we represent a nonterminal with a capital letter
  2. A set of terminals, T
    1. Terminals are symbols we are trying to produce from our CFG corresponding to symbols in our input string
    2. T ⊂ Σ, our alphabet
  3. A set of productions, P
    1. P = {p1, p2, ..., pn}
    2. pi can be expressed as Ni → symi1, symi2, ..., symin, where symij ∈ T ∪ N ∪ ε
    3. Here, a production pi represents the replacement of a nonterminal with some combination of terminal and nonterminal symbols
  4. A start symbol, S
    1. This symbol provides the starting symbol for producing strings within our language L

Re-examining the context-free grammar

Example

S → aSb
S → ε

Here, we have two productions.

S is our only nonterminal. Our terminals are a, b. Note that these can be easily read from the productions and are typically not stated explicitly. Also note that ε is not a terminal and is a special symbol.

S is our start symbol. We can determine that by observing the nonterminal on the left-hand side of the first production as per the usual convention.

Here, we can say that S produces aSb and S produces ε.

We can derive a sample string, aabb, as follows:

S → aSb (via rule 1) → aaSbb (via rule 1) → aabb (via rule 2)

In the above example, we can say that S derives aabb, so aabb ∈ L. This is symbolized as:

S →* aabb

Another example context-free grammar

Example

SENTENCE → SUBJECT VERB OBJECT.
SUBJECT → John | Jane
VERB → threw | kicked
OBJECT → the ball | the puck

Note: S → sym1 | sym2 is just short form for:

S → sym1
S → sym2

Here, N = {SENTENCE, SUBJECT, VERB, OBJECT}, T = {John, Jane, threw, kicked, the, ball, puck, .}. SENTENCE is our start symbol.

Example derivation:

SENTENCE → SUBJECT VERB OBJECT. (via rule 1) → Jane VERB OBJECT. (via rule 2) → Jane kicked OBJECT. (via rule 3) → Jane kicked the ball. (via rule 3)

SENTENCE →* Jane kicked the ball.

Context-free languages

If we can construct a context-free grammar that produces the strings S in a language L, then L is a context-free language.


GitHub Repository: https://github.com/bprollinson/ripal

Copyright © 2017 Brendan Rollinson-Lorimer