Next: Lexical Analysis
Up: Finite Automata
Previous: Languages recognized by finite automata
REGULAR EXPRESSION. Let be an alphabet
such that none of the symbols (, ), |, * belongs
to .
A regular expression over
- is a word r over the alphabet
{(,),|, * },
- denoting (= associated with) a language L(r) over and
- defined by the following inductive rules.
- (R1)
- The empty word
is a regular expression over
denoting the language
{}.
- (R2)
- For every
x the letter x is a regular expression
over denoting the language {x}.
- (R3)
- Let r and s be regular expressions over
denoting languages L(r) and L(s) respectively. Then,
- (r)|(s) is a regular expression over
denoting the language
L(r) L(s),
- (r)(s) is a regular expression over
denoting the language
L(r).L(s),
-
(r) * is a regular expression over
denoting the language
(L(r)) * ,
- (r) is a regular expression over
denoting the language L(r) (i.e., one can add parentheses).
Remark 2
Unnecessary parentheses can be avoided in regular expressions
if we adopt the following conventions:
- 1.
- the unary operator * has the highest precedence
and is left associative,
- 2.
- concatenation has the second the highest precedence
and is left associative,
- 3.
- the operator | has the lowest precedence
and is left associative.
With these conventions the regular expression
(
a)|((
b)*(
c)) is
a|
b*
c.
REGULAR LANGUAGES. Let be an alphabet
(such that none of the symbols (, ), |, * belongs to it).
A language over is regular
if there exists a regular expression r over
denoting , that is if
= L(r).
It follows from Kleene's theorem that a language
is recognized by FA iff it is regular.
FROM A REGULAR EXPRESSION TO A FINITE AUTOMATON. We present now a method that for any regular expression r
constructs a FA
(r) recognizing L(r).
- The FA
(r) is built inductively by following the structure of the
regular expression r.
- At each step of the algorithm a FA is produced from previous ones
starting with FA of
R0() and ending with
(r).
- Every intermediate FA
= (, S, I, F,)
produced by the algorithm
is NORMALIZED.
The method is quite natural.
- 1.
- We first parse r into its constituent subexpressions.
- 2.
- We construct NFAs for each letter or
in r
by applying the above rules (R1) and (R2).
- 3.
- Then guided by the syntaxic structure of the regular expression
r, we combine the NFAs using rules (R3) and
Figures 15,
16,
18.
This method is called Thompson's algorithm.
Remark 3
The number of states of the FA produced by the
Thompson's algorithm is upper bounded by 2| r|
where | r| is the number of symbols and operators in
the regular expression r.
Next: Lexical Analysis
Up: Finite Automata
Previous: Languages recognized by finite automata
Marc Moreno Maza
2004-12-02