RIPAL: Responsive and Intuitive Parsing for the Analysis of Language

Pages

Regular expressions

Background

In order to define a regular language in a concise way, we can use a regular expression.

Base symbols

Similar to in our definition of a regular language, for any alphabet Σ, we can construct R = ε as a regular expression.

Similarly, for an σ ∈ Σ, we can construct R = σ as a regular expression.

Example

Σ = {}

R = ε

This regular expression recognizes the empty string ε.

Example

Σ = {a}

R = a

This regular expression recognizes the string a.

Concatenation of expressions

We want to allow for the inclusion of strings consisting of multiple symbols in our regular language. In order to support this, we want to allow the concatenation (squishing together) of two expressions to create a more complex expression. This is exactly matches the concatenation operator as described in our definition of regular languages.

Example

Σ = {a, b}

We know that R1 = {a} and R2 = {b} are regular expressions.

R = R1R2 = ab

This regular expression recognizes the string ab.

Union of expressions

We want to allow for the choice of two expressions in our regular language. In order to support this, we want to allow the union ("or" operation) of two expressions to create a more complex expression. This exactly matches the union operator as described in our definition of regular languages.

Example

Σ = {a, b}

We know that R1 = {a} and R2 = {b} are regular expressions.

R = R1 ∪ R2 = a ∪ b

This regular expression recognizes the strings a and b.

Kleene closure of an expression

We want to allow the repetition of expressions in our regular language. In order to support this, we want to allow the Kleene closure (*) operator to be applied to an expression to create a more complex expression. This exactly matches the Kleene closure operator as described in our definition of regular languages.

Example

Σ = {a}

We know that R1 = {a} is a regular expression.

R = R1* = a*

This regular expression recognizes the strings ε, a, aa, ..., with no limit on the string length.

Special case: The empty language

We should include a representation of the empty language, since that is a valid regular language as described in our definition. We can represent it with a special symbol as follows:

R = ø

This regular expression never recognizes any string and allows us to reprsent the empty set of strings which corresponds to the language:

L = {}


GitHub Repository: https://github.com/bprollinson/ripal

Copyright © 2017 Brendan Rollinson-Lorimer