
ON EFFICIENCY GAINS FROM MULTIPLE INCOMPLETE SUBSAMPLES

Published online by Cambridge University Press: 04 September 2019

Saraswata Chaudhuri*
Affiliation:
McGill University
*Address correspondence to Saraswata Chaudhuri, Department of Economics, McGill University, Montreal, Canada; e-mail: saraswata.chaudhuri@mcgill.ca.

Abstract

Cost-effective survey methods such as multi(R)-phase sampling typically generate samples that are collections of monotonic subsamples, i.e., the variables observed for the units in subsample r are also observed for the units in subsample r + 1 for r = 1,…,R – 1. These subsamples represent subpopulations that can be systematically different if the selection of a unit in each phase of sampling depends on the observed variables for that unit from past phases. Our article is about optimally combining all the subsamples for the efficient estimation of a finite dimensional parameter defined by moment restrictions on a generic target population that is an arbitrary union of these subpopulations. Only the R-th subsample is assumed to contain all the variables that are arguments of the moment function. Semiparametric efficiency bounds for estimation are obtained under a unified framework, allowing for full generality of the selection on observables in the sampling design. Contribution of each subsample toward efficient estimation is analyzed; and this turns out to differ fundamentally from that in setups where the same collection of subsamples is instead generated unplanned by unknown sampling. Uniquely, our setup enables all the subsamples to contribute to the efficient estimation for all the target populations, which we show is not possible in other setups. Efficient estimation is standard. Simulation evidence of substantive efficiency gains from using all the subsamples is provided for all the targets.

Type
ARTICLES
Copyright
Copyright © Cambridge University Press 2019 


Footnotes

I am very much grateful to the editor P.C.B. Phillips, the co-editor P. Guggenberger and three anonymous referees for their detailed insightful comments. The article was circulated before as “A Note on Efficiency Gains from Multiple Incomplete Subsamples” but the title was modified at the suggestion of the editor. Previous versions of the article, some of which are available on the author’s webpage, benefitted from the helpful comments of A. Prokhorov, C. Muris, D. Guilkey, D. Frazier, E. Renault, F. Lange, J. Hill, J. Haushofer, J. MacKinnon, J. Wooldridge, M. Carrasco, M. Chemin, P. Saha Chaudhuri, S.J. Lee, and V. Zinde-Walsh, the seminar participants at Brown, Concordia, McGill (Econ and Biostat), Queen’s, U. Canterbury, U. Montreal, U. New South Wales, UNC Chapel Hill, U. Sydney, West Virginia University and the Midwest Econometrics Group meetings (2013).

Supplementary material: PDF

Chaudhuri supplementary material

Online supplement
