Next: Non-deterministic Finite Automata
Up: Finite Automata
Previous: Finite Automata
ALPHABET, STRING, LANGUAGE. We call an alphabet any finite set of symbols.
Let be an alphabet.
A string or word over is any finite sequence
of symbols from .
The empty string is denoted by
.
A language over is any set of strings over .
Note that the above definitions do not ascribe any meaning
to the strings of the language.
DETERMINISTIC FINITE AUTOMATA. A deterministic finite automaton (DFA) consists of five things:
- an input alphabet ,
- a finite set S whose elements are called states,
- a distinguished state
s0 S, called starting state,
- a set
F S of distinguished states, called accepting states (or final states),
- a partial function from
S×
to S called the transition function
(hence for every state-symbol couple either
maps it to a symbol or
is not defined for this couple).
It is convenient
- to say ``let
= (, S, s0, F,)
be a DFA'' for ``let be a DFA with alphabet ,
set of states S, starting state s0, set of accepting states F
and transition function .``
- to represent a DFA by a transition diagram
using the rules shown on Figure 1.
Figure 1:
Pictorial notations for finite automata.
|
LANGUAGE RECOGNIZED BY A DFA. A DFA accepts an input string if when beginning the computation
in the starting state, after reading the entire string, the
automaton is in an accepting state.
Figure 2:
Some finite automata accepting some lexical tokens.
|
The set of the strings recognized by a DFA is called
the language recognized by the DFA
and will be denoted by
().
TRANSITION TABLE. A straightforward way to implement a DFA is to represent
the transition function as a transition table.
This table has a row for each state s and a column for
each input symbol a .
The intersection of row s and column a contains
(s, a).
Figure 3 shows the transition diagram
and the transition table of an automaton.
Figure 3:
Transition table for a DFA.
|
A DFA can be easily converted to a program
to look for tokens specified by it.
COMPLETE DFA. A DFA
= (, S, s0, F,)
is said complete if for every s S and for
every
a the transition
(s, a)
is defined.
The automaton of Figure 3 is complete
whereas none of the automata of Figure 2 are.
As a model for a computer program, it is desirable for a DFA
to be complete.
To transform a non-complete DFA
= (, S, s0, F,)
into a complete DFA one needs only
- to add a new state to S, say E
(for error) and then
- for every
(s, a) S×
such that was not defined
at this couple, define
(s, a) = E.
DFA RECOGNIZING THE UNION OF SEVERAL LANGUAGES. Each of the four DFA in Figure 2
recognizes a simple language over the alphabet
Let us denote these languages by
L1, L2, L3, L4.
- L1 is the language reduced to the single world if,
- L2 is the language of the identifiers made from
letters and digits and starting with a letter,
- L3 is the language of the integer numbers,
- L4 is the language of the floating point numbers.
It is desirable to build a DFA that would recognize
the language
L1 L2 L3 L4.
In other words, one would like to combine the four
transition diagrams in Figure 2
specifying the various patterns into a single
transition diagram.
This is a non-trivial task. Here are some difficulties.
- Consider a word wi which belongs to some Li and
a word
wj such that wiwj belongs to some Lj.
For instance with wi = 123 and
wj = .456
should the combined DFA accept wi or wiwj?
The rule is to choose the longest initial substring
that can match a pattern (think of integers).
Hence wiwj is chosen.
- It may happen that wiwj (the longest match) belongs to
several Lj.
Hence we need to define some priorities.
For instance the word if belongs to L1 and L2.
However we want the word if to be a reserved word,
not an identifier.
So we have to replace L2 by
L2 L1.
Figure 4:
An automaton
accepting
L1 L2 L3 L4
and solving the difficulties mentionned above.
|
N.B. From now on and for simplicity,
we do not consider capital letters
any more in our example (Figure 2).
In other words we reduce to the
union of
[a ... z],
[0 ... 9] and {.}.
Next: Non-deterministic Finite Automata
Up: Finite Automata
Previous: Finite Automata
Marc Moreno Maza
2004-12-02