The RNA genome of the hepatitis C virus (HCV) undergoes rapid
evolutionary change. Efforts to control this virus would benefit
from simple methods for identifying characteristic features of
HCV RNA and proteins and for condensing the vast amount of
mutational data into a readily interpretable form. Many HCV
sequences are available in GenBank. To facilitate analysis,
consensus sequences were constructed to eliminate the
overrepresentation of certain genotypes, such as genotype 1,
and a novel package of sequence analysis tools, Mutation Master,
was developed. The package generates profiles of point mutations in a
population of sequences and produces a set of visual displays
and tables indicating the number, frequency, and character of
substitutions. It can be used to analyze hundreds of sequences
at a time. When applied to 255 HCV core protein sequences, Mutation
Master identified variable domains and a series of mutations
meriting further investigation. It flagged position 4, for example,
where 90% or more of all sequences in genotypes 1, 2, 4, and
5 have N4, whereas those in genotypes 3, 6, 7, 8, 9, and 10
have L4. This pattern is noteworthy: leucine (L, hydrophobic) to
asparagine (N, polar) substitutions are generally rare, and
genotypes 1, 2, 4, and
5 do not form a recognized superfamily of sequences. Thus,
the L4N substitution probably arose independently several times.
Moreover, not one member of genotypes 1, 2, 4, or 5 has L4 and
not one member of genotypes 3, 6, 7, 8, 9, or 10 has N4. This
nonoverlapping pattern suggests that coordinated changes at
position 4 and a second site are required to yield a viable
virus. The package generated a table of genotype-specific
substitutions whose future analysis may help to identify
interacting amino acids. Three substitutions were present in
100% of genotype 2 members and absent from all others: A68D,
R74K, and R114H. Finally, this study revealed that ARFP, a novel
protein encoded in an overlapping reading frame, is as conserved
as conventional HCV proteins, a result supporting a role for
ARFP in the viral life cycle. Whereas most conventional programs
for phylogenetic analysis of sequences provide information about
overall relatedness of genes or genomes, this program highlights
and profiles point mutations. This is important because
determinants of pathogenicity and drug susceptibility are likely
to result from changes at only one or two key nucleotides or
amino acid sites, and would not be detected by the type of pairwise
comparisons that have usually been performed on HCV to date.
This study is the first application of Mutation Master, which
is now available upon request
(http://tandem.biomath.mssm.edu/mutationmaster.html).
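
The two kinds of output described above (per-position substitution profiles and genotype-specific substitutions) can be illustrated with a minimal Python sketch. This is not Mutation Master's code, whose implementation is not described here. The function names `position_profile` and `group_specific_residues`, the group labels, and the toy sequences are hypothetical and chosen only for illustration, and the sketch assumes the sequences are already aligned, equal in length, and free of gaps.

```python
from collections import Counter


def position_profile(aligned_seqs, reference):
    """For each alignment column, tally the residues that differ from the reference.

    Assumes every sequence has the same length as `reference` (hypothetical input format).
    """
    profile = []
    for i, ref_res in enumerate(reference):
        counts = Counter(seq[i] for seq in aligned_seqs)
        # Keep only residues that differ from the reference at this column.
        subs = {res: n for res, n in counts.items() if res != ref_res}
        profile.append({
            "position": i + 1,  # 1-based, as in "position 4"
            "reference": ref_res,
            "substitutions": subs,
            "frequency": sum(subs.values()) / len(aligned_seqs),
        })
    return profile


def group_specific_residues(seqs_by_group, group):
    """Return (position, residue) pairs present in every member of `group`
    and absent from every member of all other groups."""
    members = seqs_by_group[group]
    others = [s for g, seqs in seqs_by_group.items() if g != group for s in seqs]
    hits = []
    for i in range(len(members[0])):
        residues = {s[i] for s in members}
        if len(residues) == 1:  # all members of the group agree at this column
            res = next(iter(residues))
            if all(s[i] != res for s in others):
                hits.append((i + 1, res))
    return hits


# Invented toy alignment, NOT real HCV data.
toy_groups = {
    "A": ["MSTNPK", "MSTNPK"],
    "B": ["MSTNPR", "MSTNAR"],
    "C": ["MSTLPK", "MSTLPQ"],
}

if __name__ == "__main__":
    all_seqs = [s for seqs in toy_groups.values() for s in seqs]
    for row in position_profile(all_seqs, reference="MSTNPK"):
        if row["substitutions"]:
            print(row)
    print(group_specific_residues(toy_groups, "B"))  # [(6, 'R')]
    print(group_specific_residues(toy_groups, "C"))  # [(4, 'L')]
```

In this sketch a residue counts as group-specific only if it is found in 100% of the group's members and in none of the others, mirroring the criterion stated for A68D, R74K, and R114H. A practical tool would also need to handle gaps, ambiguous residues, and thresholds below 100%.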