One hundred forty-five full-length aldehyde dehydrogenase-related
sequences were aligned to determine relationships within
the aldehyde dehydrogenase (ALDH) extended family. The
alignment reveals only four invariant residues: two glycines,
a phenylalanine involved in NAD binding, and a glutamic
acid that coordinates the nicotinamide ribose in certain
E-NAD binary complex crystal structures but may
also serve as a general base for the catalytic reaction.
The cysteine that provides the catalytic thiol and its
closest neighbor in space, an asparagine residue, are conserved
in all ALDHs with demonstrated dehydrogenase activity.
Sixteen residues are conserved in at least 95% of the sequences;
12 of these cluster into seven sequence motifs conserved
in almost all ALDHs. These motifs cluster around the active
site of the enzyme. Phylogenetic analysis of these ALDHs
indicates at least 13 ALDH families, most of which have
previously been identified but not grouped separately by
alignment. ALDHs cluster into two main trunks of the phylogenetic
tree. The larger, the “Class 3” trunk, contains
mostly substrate-specific ALDH families, as well as the
class 3 ALDH family itself. The other trunk, the “Class
1/2” trunk, contains mostly variable-substrate ALDH
families, including the class 1 and 2 ALDH families. Divergence
of the substrate-specific ALDHs occurred earlier than the
division between ALDHs with broad substrate specificities.
A site on the World Wide Web has also been devoted to this
alignment project.
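
The distinction drawn above between invariant residues and residues conserved in at least 95% of the sequences can be illustrated with a minimal sketch. It is not the procedure used for the ALDH alignment; the function names, the toy alignment, and the choice to count gaps as non-matching are all assumptions made for illustration only.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """For each alignment column, return (most common residue, fraction).

    The fraction is taken over ALL sequences, so a gap counts as a
    non-matching position (an assumption of this sketch).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    n_sequences = len(alignment)
    result = []
    for col in range(length):
        # Count residues in this column, ignoring gap characters.
        counts = Counter(seq[col] for seq in alignment if seq[col] != gap)
        if counts:
            residue, count = counts.most_common(1)[0]
            result.append((residue, count / n_sequences))
        else:
            # Column made up entirely of gaps.
            result.append((gap, 0.0))
    return result


def conserved_columns(alignment, threshold):
    """Return 0-based indices of columns conserved at or above threshold."""
    return [
        index
        for index, (_, fraction) in enumerate(column_conservation(alignment))
        if fraction >= threshold
    ]


# Hypothetical toy alignment (not real ALDH sequences).
toy_alignment = [
    "GCNEF",
    "GCNEF",
    "GSNEF",
    "G-DEF",
]

invariant = conserved_columns(toy_alignment, 1.0)   # strictly invariant columns
mostly = conserved_columns(toy_alignment, 0.75)     # conserved in >= 75% of sequences
print(invariant, mostly)
```

With a real alignment of 145 sequences, a threshold of 1.0 would correspond to the invariant residues and a threshold of 0.95 to the residues conserved in at least 95% of the sequences.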