AlleleCoder: a PERL script for coding co-dominant polymorphism data for PCA

Published online by Cambridge University Press: 22 July 2011

Angela M. Baldo*
Affiliation:
USDA, ARS, Plant Genetic Resources Unit, 630 W. North St., Geneva, NY 14456, USA
David M. Francis
Affiliation:
Department of Horticulture and Crop Science, The Ohio State University, Ohio Agricultural Research and Development Center, 1680 Madison Ave., Wooster, OH 44691, USA
Martina Caramante
Affiliation:
Dipartimento di Scienze del Suolo, della Pianta, dell'Ambiente e delle Produzioni Animali, Università degli Studi di Napoli ‘Federico II’, 80055 Portici, Napoli, Italy
Larry D. Robertson
Affiliation:
USDA, ARS, Plant Genetic Resources Unit, 630 W. North St., Geneva, NY 14456, USA
Joanne A. Labate
Affiliation:
USDA, ARS, Plant Genetic Resources Unit, 630 W. North St., Geneva, NY 14456, USA
*Corresponding author. E-mail: angela.baldo@ars.usda.gov

Abstract

Diploid genotypes at a co-dominant locus can be usefully interpreted in terms of the dose of the common allele (0, 1 or 2 copies), with heterozygotes carrying one copy. We have developed a PERL script that converts FASTA files into coded spreadsheets suitable for principal component analysis. In combination with R and R Commander, two- and three-dimensional plots can be generated for visualizing genetic relationships. Such plots are useful for characterizing plant genetic resources. The method illustrated the spectrum of genetic diversity in tomato landraces and in varieties categorized according to human-mediated dispersal.
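
The dose coding described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' PERL script: it assumes genotypes have already been extracted into two-character strings such as "AG" (the published tool reads FASTA files), it takes the "common allele" to be the most frequent allele at each locus in the sample, and the names `code_locus`, `code_table` and the example accessions are hypothetical.

```python
from collections import Counter


def code_locus(genotypes):
    """Return the dose (0, 1 or 2) of the most common allele for each genotype.

    `genotypes` is a list of two-character strings such as "AG".
    None or "" marks missing data and is returned as None.
    """
    counts = Counter(allele for g in genotypes if g for allele in g)
    if not counts:
        # Every genotype at this locus is missing.
        return [None] * len(genotypes)
    # Assumption: ties in allele frequency are broken alphabetically
    # so that the coding is deterministic.
    common = min(counts, key=lambda a: (-counts[a], a))
    return [g.count(common) if g else None for g in genotypes]


def code_table(samples):
    """Code every locus of a {sample name: [genotype, ...]} mapping."""
    names = list(samples)
    # Transpose so that each column holds one locus across all samples.
    columns = zip(*(samples[name] for name in names))
    coded_columns = [code_locus(list(column)) for column in columns]
    return {
        name: [column[i] for column in coded_columns]
        for i, name in enumerate(names)
    }


# Hypothetical accessions scored at two loci.
example = {
    "acc1": ["AA", "CT"],
    "acc2": ["AG", "TT"],
    "acc3": ["GG", "TT"],
    "acc4": ["AA", "CC"],
}
coded = code_table(example)
```

Each row of `coded` is a numeric vector per accession, which is the kind of table a principal component analysis routine expects. How missing genotypes should be handled before PCA is left open here.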

Type
Short Communication
Copyright
Copyright © NIAB 2011

Supplementary material

Baldo Supplementary Material (PDF, 249.2 KB)