Deterministic Finite Automata

ALPHABET, STRING, LANGUAGE. An alphabet is any finite set of symbols. Let $\Sigma$ be an alphabet. A string or word over $\Sigma$ is any finite sequence of symbols from $\Sigma$. The empty string is denoted by $\epsilon$. A language over $\Sigma$ is any set of strings over $\Sigma$. Note that these definitions do not ascribe any meaning to the strings of a language.
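
As an illustration only, the Python sketch below lists the strings over a toy alphabet up to a fixed length. The alphabet {a, b} and the name `strings_up_to` are hypothetical. Since the set of all strings over an alphabet is infinite, only a finite slice of it can be enumerated; any subset of it (here, the words ending in b) is a language.

```python
from itertools import product

SIGMA = ("a", "b")  # a hypothetical two-symbol alphabet, for illustration only


def strings_up_to(sigma, n):
    """Return every string over sigma of length 0..n, shortest first.

    The empty string "" plays the role of epsilon.
    """
    words = []
    for length in range(n + 1):
        # product(..., repeat=0) yields a single empty tuple, i.e. epsilon
        for symbols in product(sigma, repeat=length):
            words.append("".join(symbols))
    return words


words = strings_up_to(SIGMA, 2)
print(words)  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']

# A language over SIGMA is any set of strings, e.g. the words above ending in "b".
ends_in_b = {w for w in words if w.endswith("b")}
```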

DETERMINISTIC FINITE AUTOMATA. A deterministic finite automaton (DFA) consists of five things:

an input alphabet $\Sigma$ ,
a finite set S whose elements are called states,
a distinguished state s₀ $\in$ S, called the starting state,
a set F $\subseteq$ S of distinguished states, called accepting states (or final states),
a partial function $\delta$ from S × $\Sigma$ to S, called the transition function (hence for every state-symbol pair, either $\delta$ maps it to a state or $\delta$ is not defined for this pair).
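
One possible way to hold the five components in Python is sketched below, assuming a dictionary keyed by (state, symbol) pairs for the partial function $\delta$, so that a missing key means $\delta$ is undefined there. The class name `DFA`, the state names and the example automaton (identifiers over lowercase letters and digits) are hypothetical.

```python
from typing import Dict, FrozenSet, NamedTuple, Tuple


class DFA(NamedTuple):
    """The five components (Sigma, S, s0, F, delta) of a DFA."""
    sigma: FrozenSet[str]
    states: FrozenSet[str]
    start: str
    accepting: FrozenSet[str]
    delta: Dict[Tuple[str, str], str]  # a missing key means delta is undefined there


LETTERS = "abcdefghijklmnopqrstuvwxyz"
DIGITS = "0123456789"

# Hypothetical example: identifiers = a letter followed by letters or digits.
delta = {}
for c in LETTERS:
    delta[("start", c)] = "ident"
    delta[("ident", c)] = "ident"
for d in DIGITS:
    delta[("ident", d)] = "ident"  # no ("start", d) entry: delta is partial

ident_dfa = DFA(
    sigma=frozenset(LETTERS + DIGITS),
    states=frozenset({"start", "ident"}),
    start="start",
    accepting=frozenset({"ident"}),
    delta=delta,
)

# Sanity checks that the five components fit together as the definition requires.
assert ident_dfa.start in ident_dfa.states
assert ident_dfa.accepting <= ident_dfa.states
assert all(s in ident_dfa.states and a in ident_dfa.sigma and t in ident_dfa.states
           for (s, a), t in ident_dfa.delta.items())
```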

It is convenient

to say ``let $\cal{A} = (\Sigma, S, s_0, F, \delta)$ be a DFA'' for ``let $\cal{A}$ be a DFA with alphabet $\Sigma$, set of states S, starting state s₀, set of accepting states F and transition function $\delta$'',
to represent a DFA by a transition diagram using the rules shown in Figure 1.

**Figure 1:** Pictorial notations for finite automata.

LANGUAGE RECOGNIZED BY A DFA. A DFA accepts an input string if, when the computation begins in the starting state and the entire string has been read, the automaton is in an accepting state. If $\delta$ is undefined at some step of the computation, the string is rejected.
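
The acceptance rule translates into a short loop. This is a sketch assuming $\delta$ is stored as a dictionary from (state, symbol) pairs to states; the function name `accepts` and the integer automaton with states q0 and q1 are illustrative assumptions, not taken from the figures.

```python
def accepts(delta, start, accepting, word):
    """Run a (possibly partial) DFA on word; delta maps (state, symbol) -> state."""
    state = start
    for symbol in word:
        if (state, symbol) not in delta:
            return False  # delta undefined: the computation is stuck, so reject
        state = delta[(state, symbol)]
    return state in accepting  # accept iff the whole word leads to an accepting state


# Hypothetical DFA for unsigned integers: one or more decimal digits.
DIGITS = "0123456789"
int_delta = {}
for d in DIGITS:
    int_delta[("q0", d)] = "q1"
    int_delta[("q1", d)] = "q1"

print(accepts(int_delta, "q0", {"q1"}, "2004"))  # True
print(accepts(int_delta, "q0", {"q1"}, ""))      # False: q0 is not accepting
print(accepts(int_delta, "q0", {"q1"}, "12a"))   # False: delta(q1, 'a') is undefined
```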

**Figure 2:** Some finite automata accepting some lexical tokens.

The set of strings accepted by a DFA $\cal{A}$ is called the language recognized by $\cal{A}$ and is denoted by $\cal{L}(\cal{A})$.

TRANSITION TABLE. A straightforward way to implement a DFA is to represent the transition function $\delta$ as a transition table. This table has a row for each state s and a column for each input symbol a; the entry in row s and column a is $\delta$(s, a). Figure 3 shows the transition diagram and the transition table of an automaton.
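
A table-driven version might look like the sketch below, assuming states are numbered 0, 1, ... so that they can index rows, and symbols are mapped to column numbers. The automaton (strings over {a, b} with an even number of a's) is a hypothetical example, not the one in Figure 3.

```python
# Hypothetical complete DFA over {a, b} accepting strings with an even number of a's.
SYMBOLS = ["a", "b"]
COLUMN = {sym: i for i, sym in enumerate(SYMBOLS)}  # symbol -> column index

# TABLE[s][COLUMN[x]] is delta(s, x); state 0 = "even so far", state 1 = "odd so far".
TABLE = [
    [1, 0],  # state 0: on "a" go to 1, on "b" stay in 0
    [0, 1],  # state 1: on "a" go to 0, on "b" stay in 1
]
START = 0
ACCEPTING = {0}


def run_table(word):
    """Simulate the DFA with one table lookup per input symbol."""
    state = START
    for ch in word:
        state = TABLE[state][COLUMN[ch]]
    return state in ACCEPTING


# Print the table, one row per state.
for s, row in enumerate(TABLE):
    print(s, dict(zip(SYMBOLS, row)))
```

Each input symbol costs one table lookup. For a non-complete DFA, the undefined entries would need a sentinel value such as None, checked before the next lookup.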

**Figure 3:** Transition table for a DFA.

A DFA can easily be converted into a program that recognizes the tokens it specifies.
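
Besides the table-driven approach, a DFA can be hard-coded: each state becomes one branch of the program and the current state is a variable. The sketch below assumes, for illustration, that a floating-point number is one or more digits, a dot, then one or more digits; the function name `is_float` is hypothetical.

```python
DIGITS = frozenset("0123456789")


def is_float(text: str) -> bool:
    """Direct-coded DFA for the hypothetical pattern digit+ "." digit+.

    Each state of the automaton becomes one branch of the if/elif chain.
    """
    state = 0
    for ch in text:
        if state == 0:  # nothing read yet
            if ch in DIGITS:
                state = 1
            else:
                return False
        elif state == 1:  # reading the integer part; a digit keeps us in state 1
            if ch == ".":
                state = 2
            elif ch not in DIGITS:
                return False
        elif state == 2:  # just read the "."
            if ch in DIGITS:
                state = 3
            else:
                return False
        else:  # state 3: reading the fractional part; a digit keeps us in state 3
            if ch not in DIGITS:
                return False
    return state == 3  # state 3 is the only accepting state
```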

COMPLETE DFA. A DFA $\cal{A} = (\Sigma, S, s_0, F, \delta)$ is said to be complete if for every s $\in$ S and every a $\in$ $\Sigma$ the transition $\delta$(s, a) is defined.
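
Completeness is a simple check over all state-symbol pairs. In this sketch $\delta$ is assumed to be a dictionary keyed by (state, symbol), and the function name and the two example automata are hypothetical.

```python
def is_complete(sigma, states, delta):
    """True iff delta(s, a) is defined for every state s and every symbol a."""
    return all((s, a) in delta for s in states for a in sigma)


# Hypothetical examples over the alphabet {a, b}.
parity = {("even", "a"): "odd", ("even", "b"): "even",
          ("odd", "a"): "even", ("odd", "b"): "odd"}
only_a = {("q0", "a"): "q1"}  # with F = {q1}, accepts just the word "a"

print(is_complete({"a", "b"}, {"even", "odd"}, parity))  # True
print(is_complete({"a", "b"}, {"q0", "q1"}, only_a))      # False
```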

The automaton of Figure 3 is complete, whereas none of the automata of Figure 2 is.

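As a model for a computer program, it is desirable for a DFA to be complete. To transform a non-complete DFA $\cal{A} = (\Sigma, S, s_0, F, \delta)$ into a complete DFA, one only needs

to add a new state to S, say E (for error), and then,
for every (s, a) $\in$ S × $\Sigma$ such that $\delta$ is not defined at this pair, including the pairs (E, a), to define $\delta$(s, a) = E.

The starting state s₀ and the set F are left unchanged, so E is not an accepting state: once the automaton enters E, it stays there and the input is rejected.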
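
The two-step transformation can be sketched as follows, again assuming $\delta$ is a dictionary keyed by (state, symbol); the function name `complete` and the default error-state name E are assumptions. Because the loop runs over the enlarged state set, E itself receives a transition to E on every symbol.

```python
def complete(sigma, states, delta, error="E"):
    """Return (states, delta) of an equivalent complete DFA.

    The start state and the accepting states are unchanged, so the new
    error state is non-accepting.
    """
    if error in states:
        raise ValueError(f"state name {error!r} is already in use")
    new_states = set(states) | {error}
    new_delta = dict(delta)  # copy, so the original delta is not modified
    # Iterate over the enlarged set, so E also gets a self-loop on every symbol.
    for s in new_states:
        for a in sigma:
            new_delta.setdefault((s, a), error)
    return new_states, new_delta


only_a = {("q0", "a"): "q1"}  # hypothetical partial DFA over {a, b}
states, delta = complete({"a", "b"}, {"q0", "q1"}, only_a)
```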

DFA RECOGNIZING THE UNION OF SEVERAL LANGUAGES. Each of the four DFAs in Figure 2 recognizes a simple language over the alphabet

$$\Sigma = [a \cdots z] \cup [A \cdots Z] \cup [0 \cdots 9] \cup \{.\}$$

Let us denote these languages by L₁, L₂, L₃, L₄.

L₁ is the language consisting of the single word if,
L₂ is the language of the identifiers made from letters and digits and starting with a letter,
L₃ is the language of the integer numbers,
L₄ is the language of the floating point numbers.

It is desirable to build a DFA that would recognize the language L₁ $\cup$ L₂ $\cup$ L₃ $\cup$ L₄. In other words, one would like to combine the four transition diagrams in Figure 2 specifying the various patterns into a single transition diagram. This is a non-trivial task. Here are some difficulties.

Consider a word $w_i$ that belongs to some $L_i$ and a word $w_j \neq \varepsilon$ such that $w_i w_j$ belongs to some $L_j$. For instance, with $w_i$ = 123 and $w_j$ = .456, should the combined DFA accept $w_i$ or $w_i w_j$? The rule is to choose the longest prefix of the input that matches some pattern (think of integers: otherwise 123 could be read as 1 followed by 23). Hence $w_i w_j$ is chosen.
It may happen that the longest match $w_i w_j$ belongs to several of the languages $L_j$. Hence we need to define priorities. For instance, the word if belongs to both L₁ and L₂. However, we want if to be a reserved word, not an identifier, so we replace L₂ by L₂ $\setminus$ L₁.
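
The two rules can be combined in a scanner that runs the four automata side by side. The sketch below is one possible implementation, not the automaton of Figure 4: each language is given as a hypothetical step function returning None where $\delta$ is undefined, L₄ is assumed to be digits, a dot, then digits, and blanks are not handled since they are not in $\Sigma$. The longest-match rule keeps the last position at which some automaton accepted; priority is obtained by listing L₁ before L₂ and letting the first accepting automaton win ties.

```python
import string

LETTERS = set(string.ascii_letters)  # [a..z] and [A..Z]
DIGITS = set(string.digits)          # [0..9]


# Hypothetical step functions for L1..L4; each returns None where delta is undefined.
def step_if(s, c):     # L1 = {"if"}: 0 -i-> 1 -f-> 2
    return {(0, "i"): 1, (1, "f"): 2}.get((s, c))


def step_ident(s, c):  # L2: a letter, then letters or digits
    return 1 if c in LETTERS or (s == 1 and c in DIGITS) else None


def step_int(s, c):    # L3: one or more digits
    return 1 if c in DIGITS else None


def step_float(s, c):  # L4, assumed here to be digit+ "." digit+
    if c in DIGITS:
        return {0: 1, 1: 1, 2: 3, 3: 3}[s]
    return 2 if (c == "." and s == 1) else None


# (kind, start state, accepting states, step), in priority order: L1 before L2.
TOKEN_DFAS = [
    ("IF", 0, {2}, step_if),
    ("ID", 0, {1}, step_ident),
    ("INT", 0, {1}, step_int),
    ("FLOAT", 0, {3}, step_float),
]


def longest_match(text, pos):
    """Run all DFAs in parallel from pos; return (kind, end) of the longest match, or None."""
    states = [start for _, start, _, _ in TOKEN_DFAS]
    best, i = None, pos
    while i < len(text) and any(s is not None for s in states):
        states = [None if s is None else step(s, text[i])
                  for s, (_, _, _, step) in zip(states, TOKEN_DFAS)]
        i += 1
        for s, (kind, _, accepting, _) in zip(states, TOKEN_DFAS):
            if s in accepting:  # a longer match replaces any shorter one
                best = (kind, i)
                break           # ties go to the DFA listed first
    return best


def tokenize(text):
    """Split text into (kind, lexeme) pairs using longest match plus priorities."""
    tokens, pos = [], 0
    while pos < len(text):
        match = longest_match(text, pos)
        if match is None:
            raise ValueError(f"no token matches at position {pos}")
        kind, end = match
        tokens.append((kind, text[pos:end]))
        pos = end
    return tokens
```

Listing the automata in priority order has the same effect as replacing L₂ by L₂ $\setminus$ L₁ here, since a priority only matters when two languages match a prefix of the same length. For example, `tokenize("ifx")` yields a single identifier, while `tokenize("12if")` yields an integer followed by the keyword.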

**Figure 4:** An automaton accepting L₁ $\cup$ L₂ $\cup$ L₃ $\cup$ L₄ and solving the difficulties mentioned above.
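
For recognizing a union alone, one standard technique, shown here as an illustration and not necessarily how Figure 4 was obtained, is the product construction: a state of the combined automaton is a pair (p, q) of states of the two original automata, and it is accepting when p or q is. The sketch below builds only the reachable pairs and uses None for an automaton that has become stuck; the function names and the example automata (the keyword if and binary integers) are hypothetical.

```python
from collections import deque


def union_dfa(sigma, d1, s1, f1, d2, s2, f2):
    """Product construction: one state (p, q) tracks both DFAs at once.

    p or q is None once that DFA is stuck (its delta was undefined).
    Returns (start, accepting, delta) for the reachable part of the product.
    """
    start = (s1, s2)
    delta, accepting = {}, set()
    seen, queue = {start}, deque([start])
    while queue:
        p, q = queue.popleft()
        if p in f1 or q in f2:  # accept if either component accepts
            accepting.add((p, q))
        for a in sigma:
            nxt = (d1.get((p, a)), d2.get((q, a)))
            if nxt == (None, None):  # both stuck: leave delta undefined
                continue
            delta[((p, q), a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return start, accepting, delta


def accepts(delta, start, accepting, word):
    """Run a partial DFA given as a dict; reject as soon as delta is undefined."""
    state = start
    for a in word:
        if (state, a) not in delta:
            return False
        state = delta[(state, a)]
    return state in accepting


# Hypothetical example over {i, f, 0, 1}: the keyword "if" and binary integers.
SIGMA = "if01"
d_if = {(0, "i"): 1, (1, "f"): 2}
d_bin = {(q, b): 1 for q in (0, 1) for b in "01"}
start, acc, delta = union_dfa(SIGMA, d_if, 0, {2}, d_bin, 0, {1})
```

Repeating the construction yields a single DFA for L₁ $\cup$ L₂ $\cup$ L₃ $\cup$ L₄. Such a DFA only answers whether a whole string is in the union; choosing which token to report still requires the longest-match and priority rules.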

N.B. From now on, for simplicity, we no longer consider capital letters in our example (Figure 2). In other words, we reduce $\Sigma$ to $[a \cdots z] \cup [0 \cdots 9] \cup \{.\}$.

Marc Moreno Maza
2004-12-02