How to Code a Million Missions: Developing Bespoke Nonprofit Activity Codes Using Machine Learning Algorithms

Published online by Cambridge University Press:  01 January 2026

Francisco J. Santamarina*
Affiliation:
Evans School of Public Policy and Governance, University of Washington, 4105 George Washington Lane Northeast, Seattle, WA 98105, USA
Jesse D. Lecy*
Affiliation:
Watts College, Arizona State University, 411 N. Central Ave., Suite 750, Phoenix, AZ 85004-2163, USA
Eric Joseph van Holm*
Affiliation:
Department of Political Science, Urban Entrepreneurship and Policy Institute, The University of New Orleans, 256 Milneburg Hall, New Orleans, LA 70148, USA

Abstract

National Taxonomy of Exempt Entities (NTEE) codes have become the primary classifier of nonprofit missions since they were developed in the mid-1980s in response to growing demands for a taxonomy of nonprofit activities (Herman in Nonprofit and Voluntary Sector Quarterly 19(3):293–306, 1990; Barman in Social Science History 37:103–141, 2013). However, the increasingly complex nature of nonprofits means that NTEE codes may be outdated or lack specificity. As an alternative, scholars and practitioners can create a bespoke taxonomy for a specific purpose by hand-coding a training dataset and using machine learning classifiers to apply the codes to a large population. This paper presents a framework for determining training set sizes needed to scale custom taxonomies using machine learning algorithms.
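
The training-set-size question raised above can be explored with a learning curve: train a classifier on progressively larger hand-coded samples and track how a held-out score changes. The sketch below is not the authors' framework. It assumes a simple multinomial naive Bayes classifier, balanced accuracy as the score, and a synthetic two-category corpus; every function name, category, and mission statement in it is hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a multinomial naive Bayes model (counts only; smoothing is applied at prediction)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest add-one-smoothed log posterior."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in doc.lower().split():
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare categories count as much as common ones."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def learning_curve(train, test, sizes):
    """Score a model trained on the first n hand-coded examples, for each n in sizes."""
    test_docs, test_labels = zip(*test)
    curve = []
    for n in sizes:
        docs, labels = zip(*train[:n])
        model = train_nb(docs, labels)
        preds = [predict_nb(model, d) for d in test_docs]
        curve.append((n, balanced_accuracy(list(test_labels), preds)))
    return curve


# Hypothetical toy corpus standing in for hand-coded mission statements.
TOPIC_WORDS = {
    "arts": ["theater", "music", "museum", "dance", "gallery"],
    "health": ["clinic", "patients", "hospital", "wellness", "nursing"],
}
FILLER = ["community", "support", "provide", "programs", "our"]


def make_corpus(n, seed):
    """Generate n (text, label) pairs from the word pools above."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(n):
        label = rng.choice(sorted(TOPIC_WORDS))
        words = rng.choices(TOPIC_WORDS[label], k=2) + rng.choices(FILLER, k=4)
        rng.shuffle(words)
        corpus.append((" ".join(words), label))
    return corpus


train = make_corpus(200, seed=1)
test = make_corpus(100, seed=2)
for n, score in learning_curve(train, test, [5, 20, 80, 200]):
    print(n, round(score, 3))
```

On real mission statements the curve would typically flatten much more slowly than on this toy data, and the point at which additional hand-coding stops improving the score is one plausible way to read off a sufficient training set size.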

Information

Type
Research Papers
Copyright
Copyright © International Society for Third-Sector Research 2021

Footnotes

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s11266-021-00420-z.

References

Barman, E. (2013). Classificatory struggles in the nonprofit sector: The formation of the national taxonomy of exempt entities, 1969–1987. Social Science History, 37, 103–141. doi: 10.2307/23361114.
Benoit, K., Watanabe, K., Wang, H., Nulty, P., Obeng, A., Müller, S., & Matsuo, A. (2018). quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30), 774. doi: 10.21105/joss.00774.
Brodersen, K. H., Ong, C. S., Stephan, K. E., & Buhmann, J. M. (2010). The balanced accuracy and its posterior distribution. In 2010 20th international conference on pattern recognition (pp. 3121–3124). IEEE. doi: 10.1109/ICPR.2010.764.
Fyall, R., Moore, M. K., & Gugerty, M. K. (2018). Beyond NTEE codes: Opportunities to understand nonprofit activity through mission statement content coding. Nonprofit and Voluntary Sector Quarterly, 47(4), 677–701. doi: 10.1177/0899764018768019.
Hand, D. J., & Yu, K. (2001). Idiot's Bayes—not so stupid after all? International Statistical Review, 69(3), 385–398.
Herman, R. D. (1990). Methodological issues in studying the effectiveness of nongovernmental and nonprofit organizations. Nonprofit and Voluntary Sector Quarterly, 19(3), 293–306. doi: 10.1177/089976409001900309.
Internal Revenue Service. (2018). Instructions for form 1023-EZ: Streamlined application for recognition of exemption under section 501(c)(3) of the internal revenue code (Cat. No. 66268Y). Retrieved from https://www.irs.gov/pub/irs-pdf/i1023ez.pdf.
Jones, D. (2019). IRS activity codes. Published January 22, 2019. https://nccs.urban.org/publication/irs-activity-codes.
Kuhn, M. (2008). Building predictive models in R using the caret package. Journal of Statistical Software, 28(5), 1–26. doi: 10.18637/jss.v028.i05.
Kuhn, M. (2019). The `caret` package. “17 Measuring Performance.” https://topepo.github.io/caret/measuring-performance.html.
Lecy, J. D., Ashley, S. R., & Santamarina, F. J. (2019a). Do nonprofit missions vary by the political ideology of supporting communities? Some preliminary results. Public Performance & Management Review, 42(1), 115–141. doi: 10.1080/15309576.2018.1526092.
Lecy, J. D., Santamarina, F. J., & van Holm, E. J. (2019b). The political economy of nonprofit entrepreneurship: Using open data to explore geographic and demographic dimensions of nonprofit mission [Paper presentation]. USC CPPP Symposium, Los Angeles, California.
Lewis, D. D., Yang, Y., Rose, T. G., & Li, F. (2004). RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5(Apr), 361–397.
Ma, J. (2021). Automated coding using machine learning and remapping the US nonprofit sector: A guide and benchmark. Nonprofit and Voluntary Sector Quarterly, 50(3), 662–687. doi: 10.1177/0899764020968153.
Manning, C. D., Schütze, H., & Raghavan, P. (2009). Introduction to information retrieval. Cambridge University Press. Online edition. https://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf.
Paxton, P., Velasco, K., & Ressler, R. (2019a). Form 990 Mission Glossary v.1. [Computer file]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor].
Paxton, P., Velasco, K., & Ressler, R. (2019b). Form 990 Mission Stemmer v.1. [Computer file]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor].
R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Saito, T., & Rehmsmeier, M. (n.d.). Basic evaluation measures from the confusion matrix. https://classeval.wordpress.com/introduction/basic-evaluation-measures/.
Salamon, L. M. & Anheier, H. K. (1996). The International classification of nonprofit organizations: ICNPO-Revision 1, 1996. Working Papers of the Johns Hopkins Comparative Nonprofit Sector Project, no. 19. Baltimore: The Johns Hopkins Institute for Policy Studies.
Tierney, L., Rossini, A. J., Li, N., & Sevcikova, H. (2018). snow: Simple network of workstations. R package version 0.4-3. https://CRAN.R-project.org/package=snow.
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org. doi: 10.1007/978-3-319-24277-4.
Wickham, H., & Seidel, D. (2020). scales: Scale functions for visualization. R package version 1.1.1. https://CRAN.R-project.org/package=scales.
Supplementary material: File

Santamarina et al. supplementary material
