
Outliers and Influential Observations in Exponential Random Graph Models

Published online by Cambridge University Press: 01 January 2025

Johan Koskinen*
Affiliation:
University of Manchester; The University of Melbourne; University of Linköping
Peng Wang
Affiliation:
Swinburne University of Technology
Garry Robins
Affiliation:
The University of Melbourne
Philippa Pattison
Affiliation:
The University of Sydney
* Correspondence should be made to Johan Koskinen, The Mitchell Centre for Social Network Analysis and the Department of Social Statistics, School of Social Sciences, University of Manchester, Manchester M13 9PL, UK. Email: johan.koskinen@manchester.ac.uk

Abstract

We discuss measuring and detecting influential observations and outliers in the context of exponential family random graph (ERG) models for social networks. We focus on the level of the nodes of the network and consider those nodes whose removal would result in changes to the model as extreme or “central” with respect to the structural features that “matter”. We construe removal in terms of two case-deletion strategies: the tie-variables of an actor are assumed to be unobserved, or the node is removed resulting in the induced subgraph. We define the difference in inferred model resulting from case deletion from the perspective of information theory and difference in estimates, in both the natural and mean-value parameterisation, representing varying degrees of approximation. We arrive at several measures of influence and propose the use of two that do not require refitting of the model and lend themselves to routine application in the ERGM fitting procedure. MCMC p values are obtained for testing how extreme each node is with respect to the network structure. The influence measures are applied to two well-known data sets to illustrate the information they provide. From a network perspective, the proposed statistics offer an indication of which actors are most distinctive in the network structure, in terms of not abiding by the structural norms present across other actors.
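The two case-deletion strategies named in the abstract can be illustrated on a small undirected network. The following is only a minimal sketch under stated assumptions, not the influence measures proposed in the paper: the helper names (`induced_subgraph`, `mask_tie_variables`, `edge_count`, `triangle_count`) and the toy network are hypothetical, and edge and triangle counts stand in for whichever sufficient statistics a fitted model would use.

```python
from typing import List, Optional, Tuple

# Adjacency matrix with entries 1 (tie), 0 (no tie) or None (unobserved).
Adjacency = List[List[Optional[int]]]


def edges_to_adjacency(n: int, edges: List[Tuple[int, int]]) -> Adjacency:
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    adj: Adjacency = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = 1
        adj[j][i] = 1
    return adj


def induced_subgraph(adj: Adjacency, node: int) -> Adjacency:
    """Node removal: keep only ties among the remaining nodes."""
    keep = [k for k in range(len(adj)) if k != node]
    return [[adj[i][j] for j in keep] for i in keep]


def mask_tie_variables(adj: Adjacency, node: int) -> Adjacency:
    """Missing-data view: every tie-variable involving `node` becomes None."""
    n = len(adj)
    return [
        [None if node in (i, j) and i != j else adj[i][j] for j in range(n)]
        for i in range(n)
    ]


def edge_count(adj: Adjacency) -> int:
    """Number of observed ties (unobserved entries are not counted)."""
    n = len(adj)
    return sum(1 for i in range(n) for j in range(i + 1, n) if adj[i][j] == 1)


def triangle_count(adj: Adjacency) -> int:
    """Number of triangles formed entirely by observed ties."""
    n = len(adj)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        for k in range(j + 1, n)
        if adj[i][j] == 1 and adj[j][k] == 1 and adj[i][k] == 1
    )


# Hypothetical toy network: a triangle 0-1-2 with a pendant node 3 tied to 2.
toy = edges_to_adjacency(4, [(0, 1), (0, 2), (1, 2), (2, 3)])

# Change in the observed statistics when each node is deleted in turn.
changes = {
    v: (
        edge_count(toy) - edge_count(induced_subgraph(toy, v)),
        triangle_count(toy) - triangle_count(induced_subgraph(toy, v)),
    )
    for v in range(4)
}
```

In this toy example, deleting node 2 removes three ties and the only triangle, while deleting node 3 removes a single tie and leaves the triangle intact. The sketch only shows how the two views of the data differ: the induced subgraph has one node fewer, whereas the masked matrix keeps all nodes and leaves the deleted actor's tie-variables unobserved. Turning such changes into an influence measure with MCMC p values requires the model-based machinery the abstract describes.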

Type
Original Paper
Copyright
Copyright © 2018 The Psychometric Society

Footnotes

Johan Koskinen would like to acknowledge financial support from the Leverhulme Trust Grant RPG-2013-140 and SRG2012.
