
Bibliography

Published online by Cambridge University Press: 16 May 2025

Vikram Krishnamurthy
Affiliation: Cornell University, New York

Type: Chapter
In: Partially Observed Markov Decision Processes: Filtering, Learning and Controlled Sensing, pp. 605–628
Publisher: Cambridge University Press
Print publication year: 2025

References

Aberdeen, D. and Baxter, J. “Scaling internal-state policy-gradient methods for POMDPs”. In: International Conference on Machine Learning. 2002, pp. 3–10.
Albert, R. and Barabási, A.-L. “Statistical mechanics of complex networks”. In: Reviews of Modern Physics 74.1 (2002), p. 47.
Abounadi, J., Bertsekas, D. P., and Borkar, V. “Learning algorithms for Markov decision processes with average cost”. In: SIAM Journal on Control and Optimization 40.3 (2001), pp. 681–698.
Arjovsky, M., Chintala, S., and Bottou, L. “Wasserstein generative adversarial networks”. In: International Conference on Machine Learning. 2017, pp. 214–223.
Auer, P., Cesa-Bianchi, N., and Fischer, P. “Finite-time analysis of the multiarmed bandit problem”. In: Machine Learning 47.2–3 (2002), pp. 235–256.
Afriat, S. “The construction of utility functions from expenditure data”. In: International Economic Review 8.1 (1967), pp. 67–77.
Afriat, S. Logic of Choice and Economic Theory. Clarendon Press, 1987.
Agrawal, S. and Goyal, N. “Analysis of Thompson sampling for the multi-armed bandit problem”. In: Conference on Learning Theory. Vol. 23. 2012.
Agrawal, S. and Goyal, N. “Thompson sampling for contextual bandits with linear payoffs”. In: International Conference on Machine Learning. 2013, pp. 127–135.
Agarwal, A. et al. “Taming the monster: a fast and simple algorithm for contextual bandits”. In: International Conference on Machine Learning. 2014, pp. 1638–1646.
Altman, E., Gaujal, B., and Hordijk, A. Discrete-Event Control of Stochastic Networks: Multimodularity and Regularity. Springer-Verlag, 2004.
Arasaratnam, I. and Haykin, S. “Cubature Kalman filters”. In: IEEE Transactions on Automatic Control 54.6 (2009), pp. 1254–1269.
Albright, S. “Structural results for partially observed Markov decision processes”. In: Operations Research 27.5 (Sept. 1979), pp. 1041–1053.
Alipourfard, N., Nettasinghe, B., Abeliuk, A., Krishnamurthy, V., and Lerman, K. “Friendship paradox biases perceptions in directed networks”. In: Nature Communications 11.1 (2020), pp. 1–9.
Altman, E. Constrained Markov Decision Processes. Chapman and Hall, 1999.
Anderson, B. D. O. and Moore, J. B. Optimal Filtering. Prentice Hall, 1979.
Anderson, B. D. O. and Moore, J. B. Optimal Control: Linear Quadratic Methods. Prentice-Hall, 1989.
Ambrogioni, L. et al. “Wasserstein variational inference”. In: Advances in Neural Information Processing Systems 31 (2018), pp. 1–11.
Amir, R. “Supermodularity and complementarity in economics: An elementary survey”. In: Southern Economic Journal 71.3 (2005), pp. 636–660.
Audibert, J., Munos, R., and Szepesvári, C. “Exploration–exploitation tradeoff using variance estimates in multi-armed bandits”. In: Theoretical Computer Science 410.19 (2009), pp. 1876–1902.
Abbeel, P. and Ng, A. Y. “Apprenticeship learning via inverse reinforcement learning”. In: International Conference on Machine Learning. 2004, p. 1.
Andradottir, S. “A global search method for discrete stochastic optimization”. In: SIAM Journal on Optimization 6.2 (May 1996), pp. 513–530.
Andradottir, S. “Accelerating the convergence of random search methods for discrete stochastic optimization”. In: ACM Transactions on Modelling and Computer Simulation 9.4 (Oct. 1999), pp. 349–380.
Acemoglu, D. and Ozdaglar, A. “Opinion dynamics and learning in social networks”. In: Dynamic Games and Applications 1.1 (2011), pp. 3–49.
Albore, A., Palacios, H., and Geffner, H. “A translation-based approach to contingent planning”. In: International Joint Conference on Artificial Intelligence. 2009, pp. 1623–1628.
Arapostathis, A., Borkar, V., Fernández-Gaucherand, E., Ghosh, M. K., and Marcus, S. I. “Discrete-time controlled Markov processes with average cost criterion: A survey”. In: SIAM Journal on Control and Optimization 31.2 (1993), pp. 282–344.
Artzner, P., Delbaen, F., Eber, J., Heath, D., and Ku, H. “Coherent multiperiod risk adjusted values and Bellman’s principle”. In: Annals of Operations Research 152.1 (2007), pp. 5–22.
Artzner, P., Delbaen, F., Eber, J., and Heath, D. “Coherent measures of risk”. In: Mathematical Finance 9.3 (July 1999), pp. 203–228.
Åström, K. J. “Optimal control of Markov processes with incomplete state information”. In: Journal of Mathematical Analysis and Applications 10.1 (1965), pp. 174–205.
Andersland, M. S. and Teneketzis, D. “Measurement scheduling for recursive team estimation”. In: Journal of Optimization Theory and Applications 89.3 (June 1996), pp. 615–636.
Athey, S. “Monotone comparative statics under uncertainty”. In: The Quarterly Journal of Economics 117.1 (2002), pp. 187–223.
Atar, R. and Zeitouni, O. “Lyapunov exponents for finite state nonlinear filtering”. In: SIAM Journal on Control and Optimization 35.1 (1997), pp. 36–55.
Banerjee, A. “A simple model of herd behavior”. In: Quarterly Journal of Economics 107.3 (Aug. 1992), pp. 797–817.
Barber, R. F., Candes, E. J., Ramdas, A., and Tibshirani, R. J. “Conformal prediction beyond exchangeability”. In: The Annals of Statistics 51.2 (2023), pp. 816–845.
Baum, L. E., Petrie, T., Soules, G., and Weiss, N. “A maximisation technique occurring in the statistical analysis of probabilistic functions of Markov chains”. In: Annals of Mathematical Statistics 41.1 (1970), pp. 164–171.
Bartlett, P. and Baxter, J. “Estimation and approximation bounds for gradient-based reinforcement learning”. In: Journal of Computer and System Sciences 64.1 (2002), pp. 133–150.
Baras, J. S. and Bensoussan, A. “Optimal sensor scheduling in nonlinear filtering of diffusion processes”. In: SIAM Journal on Control and Optimization 27.4 (July 1989), pp. 786–813.
Bubeck, S. and Cesa-Bianchi, N. “Regret analysis of stochastic and nonstochastic multi-armed bandit problems”. In: arXiv preprint arXiv:1204.5721 (2012).
Brockett, R. W. and Clarke, J. M. C. “The geometry of the conditional density equation”. In: Analysis and Optimization of Stochastic Systems. Ed. by Jacobs, O. L. R. et al. 1980, pp. 299–309.
Bottou, L., Curtis, F. E., and Nocedal, J. “Optimization methods for large-scale machine learning”. In: SIAM Review 60.2 (2018), pp. 223–311.
Bundfuss, S. and Dür, M. “Algorithmic copositivity detection by simplicial partition”. In: Linear Algebra and its Applications 428.7 (2008), pp. 1511–1523.
Bundfuss, S. and Dür, M. “An adaptive linear approximation algorithm for copositive programs”. In: SIAM Journal on Optimization 20.1 (2009), pp. 30–53.
Boyd, S., Diaconis, P., and Xiao, L. “Fastest mixing Markov chain on a graph”. In: SIAM Review 46.4 (2004), pp. 667–689.
Bellman, R. Dynamic Programming. Princeton University Press, 1957.
Benes, V. E. “Exact finite-dimensional filters for certain diffusions with nonlinear drift”. In: Stochastics 5 (1981), pp. 65–92.
Bensoussan, A. Stochastic Control of Partially Observable Systems. Cambridge University Press, 1992.
Bertsekas, D. P. Nonlinear Programming. Athena Scientific, 2000.
Bertsekas, D. P. “Dynamic programming and suboptimal control: a survey from ADP to MPC”. In: European Journal of Control 11.4 (2005), pp. 310–334.
Bertsekas, D. P. Dynamic Programming and Optimal Control. Vol. 1 and 2. Athena Scientific, 2017.
Bertsekas, D. P. Reinforcement Learning and Optimal Control. Athena Scientific, 2019.
Ben-Zvi, T. and Grosfeld-Nir, A. “Partially observed Markov decision processes with binomial observations”. In: Operations Research Letters 41.2 (2013), pp. 201–206.
Bakry, D., Gentil, I., and Ledoux, M. Analysis and Geometry of Markov Diffusion Operators. Vol. 348. Springer Science & Business Media, 2013.
Banerjee, A., Guo, X., and Wang, H. “On the optimality of conditional expectation as a Bregman predictor”. In: IEEE Transactions on Information Theory 51.7 (2005), pp. 2664–2669.
Booth, J. G. and Hobert, J. P. “Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm”. In: Journal of the Royal Statistical Society, B 61 (1999), pp. 265–285.
Benaim, M., Hofbauer, J., and Sorin, S. “Stochastic approximations and differential inclusions”. In: SIAM Journal on Control and Optimization 44.1 (2005), pp. 328–348.
Benaim, M., Hofbauer, J., and Sorin, S. “Stochastic approximations and differential inclusions, Part II: applications”. In: Mathematics of Operations Research 31.3 (2006), pp. 673–695.
Bikhchandani, S., Hirshleifer, D., and Welch, I. “A theory of fads, fashion, custom, and cultural change as information cascades”. In: Journal of Political Economy 100.5 (Oct. 1992), pp. 992–1026.
Bianchi, L., Dorigo, M., Gambardella, L., and Gutjahr, W. “A survey on metaheuristics for stochastic combinatorial optimization”. In: Natural Computing: An International Journal 8.2 (2009), pp. 239–287.
Billingsley, P. Statistical Inference for Markov Processes. University of Chicago Press, 1961.
Billingsley, P. Probability and Measure. Wiley, 1986.
Billingsley, P. Convergence of Probability Measures. 2nd ed. Wiley, 1999.
Blei, D. and Jordan, M. “Variational inference for Dirichlet process mixtures”. In: Bayesian Analysis 1.1 (Mar. 2006), pp. 121–143.
Boström, H., Johansson, U., and Löfström, T. “Mondrian conformal predictive distributions”. In: Symposium on Conformal and Probabilistic Prediction and Applications. Vol. 152. Aug. 2021, pp. 24–38.
Bhatt, S. and Krishnamurthy, V. “Controlled sequential information fusion with social sensors”. In: IEEE Transactions on Automatic Control 66.12 (2020), pp. 5893–5908.
Blei, D. M., Kucukelbir, A., and McAuliffe, J. D. “Variational inference: a review for statisticians”. In: Journal of the American Statistical Association 112.518 (2017), pp. 859–877.
Babaioff, M., Kleinberg, R., and Papadimitriou, C. “Congestion games with malicious players”. In: ACM Conference on Electronic Commerce. 2007, pp. 103–112.
Bensoussan, A. and Lions, J. Impulsive Control and Quasi-variational Inequalities. Gauthier-Villars, 1984.
Blackwell, D. “Comparison of experiments”. In: Proceedings of the 2nd Berkeley Symposium on Mathematical Statistics and Probability. University of California Press. 1951, pp. 93–102.
Blackwell, D. “Equivalent comparisons of experiments”. In: The Annals of Mathematical Statistics (1953), pp. 265–272.
Bar-Shalom, Y., Li, X. R., and Kirubarajan, T. Estimation with Applications to Tracking and Navigation. John Wiley, 2008.
Boucheron, S., Lugosi, G., and Massart, P. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
Björk, T. and Murgoci, A. “A theory of Markovian time-inconsistent stochastic control in discrete time”. In: Finance and Stochastics 18.3 (2014), pp. 545–592.
Benveniste, A., Metivier, M., and Priouret, P. Adaptive Algorithms and Stochastic Approximations. Vol. 22. Applications of Mathematics. Springer-Verlag, 1990.
Bordignon, V., Matta, V., and Sayed, A. H. “Adaptive social learning”. In: IEEE Transactions on Information Theory 67.9 (2021), pp. 6053–6081.
Basseville, M. and Nikiforov, I. Detection of Abrupt Changes – Theory and Applications. Information and System Sciences Series. Prentice Hall, 1993.
Bond, R. et al. “A 61-million-person experiment in social influence and political mobilization”. In: Nature 489 (Sept. 2012), pp. 295–298.
Borkar, V. S. Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press, 2008.
Bose, S., Orosel, G., Ottaviani, M., and Vesterlund, L. “Dynamic monopoly pricing and herding”. In: The RAND Journal of Economics 37.4 (2006), pp. 910–928.
Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. “Distributed optimization and statistical learning via the alternating direction method of multipliers”. In: Foundations and Trends in Machine Learning 3.1 (2011), pp. 1–122.
Baum, L. E. and Petrie, T. “Statistical inference for probabilistic functions of finite state Markov chains”. In: Annals of Mathematical Statistics 37 (1966), pp. 1554–1563.
Blackman, S. and Popoli, R. Design and Analysis of Modern Tracking Systems. Artech House, 1999.
Brunnermeier, M. K., Papakonstantinou, F., and Parker, J. A. “Optimal time-inconsistent beliefs: Misplanning, procrastination, and commitment”. In: Management Science 63.5 (2017), pp. 1318–1340.
Bäuerle, N. and Rieder, U. “More risk-sensitive Markov decision processes”. In: Mathematics of Operations Research 39.1 (2013), pp. 105–120.
Bremaud, P. Point Processes and Queues. Springer-Verlag, 1981.
Bremaud, P. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer-Verlag, 1999.
Broderick, T., Boyd, N., Wibisono, A., Wilson, A. C., and Jordan, M. I. “Streaming variational Bayes”. In: Advances in Neural Information Processing Systems 26 (2013), pp. 1727–1735.
Bertsekas, D. P. and Shreve, S. E. Stochastic Optimal Control: The Discrete-Time Case. Academic Press, 1978.
Barles, G. and Souganidis, P. E. “Convergence of approximation schemes for fully nonlinear second order equations”. In: Asymptotic Analysis 4.3 (1991), pp. 271–283.
Ben-Tal, A. and Teboulle, M. “An old-new concept of convex risk measures: the optimized certainty equivalent”. In: Mathematical Finance 17.3 (2007), pp. 449–476.
Bénabou, R. and Tirole, J. “Mindful economics: the production, consumption, and value of beliefs”. In: Journal of Economic Perspectives 30.3 (2016), pp. 141–164.
Bertsekas, D. P. and Tsitsiklis, J. N. “An analysis of stochastic shortest path problems”. In: Mathematics of Operations Research 16.3 (1991), pp. 580–595.
Bertsekas, D. P. and Tsitsiklis, J. N. Neuro-Dynamic Programming. Athena Scientific, 1996.
Boyd, S. and Vandenberghe, L. Convex Optimization. Cambridge University Press, 2004.
Banerjee, T. and Veeravalli, V. “Data-efficient quickest change detection with on-off observation control”. In: Sequential Analysis 31 (2012), pp. 40–77.
Burnaev, E. and Vovk, V. “Efficiency of conformalized ridge regression”. In: Conference on Learning Theory. 2014, pp. 605–622.
Benaim, M. and Weibull, J. “Deterministic approximation of stochastic evolution in games”. In: Econometrica 71.3 (2003), pp. 873–903.
Bertsekas, D. P. and Yu, H. “Q-learning and enhanced policy iteration in discounted dynamic programming”. In: Mathematics of Operations Research 37.1 (2012), pp. 66–94.
Caines, P. E. Linear Stochastic Systems. Wiley, 1988.
Cassandra, A. R. Tony’s POMDP Page. www.cs.brown.edu/research/ai/pomdp.
Cassandra, A. R. “A survey of POMDP applications”. In: Working Notes of AAAI 1998 Fall Symposium on Planning with Partially Observable Markov Decision Processes. 1998, pp. 17–24.
Cassandra, A. R. “Exact and approximate algorithms for partially observed Markov decision processes”. PhD thesis. Dept. Computer Science, Brown University, 1998.
Cook, J. O. and Barnes, L. W., Jr. “Choice of delay of inevitable shock”. In: Journal of Abnormal and Social Psychology 68.6 (1964), pp. 669–672.
Charpentier, C. J., Bromberg-Martin, E. S., and Sharot, T. “Valuation of knowledge and ignorance in mesolimbic reward circuitry”. In: Proceedings of the National Academy of Sciences 115.31 (2018), E7255–E7264.
Cao, H., Cohen, S., and Szpruch, L. “Identifiability in inverse reinforcement learning”. In: Advances in Neural Information Processing Systems 34 (2021), pp. 12362–12373.
Cairoli, R. and Dalang, R. C. Sequential Stochastic Optimization. John Wiley & Sons, 2011.
Caplin, A. and Dean, M. “Revealed preference, rational inattention, and costly information acquisition”. In: American Economic Review 105.7 (2015), pp. 2183–2203.
Caplin, A., Dean, M., and Leahy, J. “Rational inattention, optimal consideration sets, and stochastic choice”. In: The Review of Economic Studies 86.3 (2019), pp. 1061–1094.
Cherchye, L., De Rock, B., and Vermeulen, F. “The revealed preference approach to collective consumption behaviour: testing and sharing rule recovery”. In: The Review of Economic Studies 78.1 (2011), pp. 176–198.
Cohen, S. N. and Elliott, R. J. Stochastic Calculus and Applications. Vol. 2. Springer, 2015.
Cen, S., Cheng, C., Chen, Y., Wei, Y., and Chi, Y. “Fast global convergence of natural policy gradient methods with entropy regularization”. In: Operations Research 70.4 (2022), pp. 2563–2578.
Cover, T. M. and Hellman, M. E. “The two-armed-bandit problem with time-invariant finite memory”. In: IEEE Transactions on Information Theory 16.2 (1970), pp. 185–195.
Chamley, C. Rational Herds: Economic Models of Social Learning. Cambridge University Press, 2004.
Chiou, W. “A note on estimation algebras on nonlinear filtering theory”. In: Systems and Control Letters 28 (1996), pp. 55–63.
Chu, W., Li, L., Reyzin, L., and Schapire, R. “Contextual bandits with linear payoff functions”. In: International Conference on Artificial Intelligence and Statistics. 2011, pp. 208–214.
Choi, J. and Kim, K. “Inverse reinforcement learning in partially observable environments”. In: Journal of Machine Learning Research 12 (2011), pp. 691–730.
Cassandra, A. R., Kaelbling, L., and Littman, M. L. “Acting optimally in partially observable stochastic domains”. In: AAAI. Vol. 94. 1994, pp. 1023–1028.
Caplin, A. and Leahy, J. “Psychological expected utility theory and anticipatory feelings”. In: The Quarterly Journal of Economics 116.1 (2001), pp. 55–79.
Cassandras, C. G. and Lafortune, S. Introduction to Discrete Event Systems. 3rd ed. Springer-Verlag, 2021.
Coleman, T. F. and Li, Y. “An interior trust region approach for nonlinear minimization subject to bounds”. In: SIAM Journal on Optimization 6.2 (1996), pp. 418–445.
Clark, J. M. C. “The design of robust approximations to the stochastic differential equations of nonlinear filtering”. In: Communication Systems and Random Processes Theory, Darlington 1977. Ed. by Skwirzynski, J. K. Sijthoff and Noordhoff, 1978.
Chen, J., Li, S. E., and Tomizuka, M. “Interpretable end-to-end urban autonomous driving with latent deep reinforcement learning”. In: IEEE Transactions on Intelligent Transportation Systems 23.6 (2022), pp. 5068–5078.
Cassandra, A. R., Littman, M. L., and Zhang, N. L. “Incremental pruning: A simple fast exact method for partially observed Markov decision processes”. In: Annual Conference on Uncertainty in Artificial Intelligence. 1997.
Caplin, A. and Martin, D. “A testable theory of imperfect perception”. In: The Economic Journal 125.582 (2015), pp. 184–202.
Cappe, O., Moulines, E., and Ryden, T. Inference in Hidden Markov Models. Springer-Verlag, 2005.
Cavus, O. and Ruszczynski, A. “Risk-averse control of undiscounted transient Markov models”. In: SIAM Journal on Control and Optimization 52.6 (2014), pp. 3935–3966.
Cao, Y. and Ross, S. “The friendship paradox”. In: Mathematical Scientist 41.1 (2016).
Cover, T. M. and Thomas, J. A. Elements of Information Theory. Wiley-Interscience, 2006.
Candès, E. J. and Tao, T. “The power of convex relaxation: Near-optimal matrix completion”. In: IEEE Transactions on Information Theory 56.5 (May 2009), pp. 2053–2080.
Davis, M. H. A. “On a multiplicative functional transformation arising in nonlinear filtering theory”. In: Z. Wahrscheinlichkeitstheorie verw. Gebiete 54 (1980), pp. 125–139.
Deb, R. Interdependent Preferences, Potential Games and Household Consumption. MPRA Paper 6818. University Library of Munich, Germany, Jan. 2008.
Deb, R. “A testable model of consumption with externalities”. In: Journal of Economic Theory 144.4 (2009), pp. 1804–1816.
Diaconis, P. and Freedman, D. “On the consistency of Bayes estimates”. In: The Annals of Statistics (1986), pp. 1–26.
Doucet, A., Freitas, N. D., and Gordon, N., eds. Sequential Monte Carlo Methods in Practice. Springer-Verlag, 2001.
Dayanik, S. and Goulding, C. “Detection and identification of an unobservable change in the distribution of a Markov-modulated random sequence”. In: IEEE Transactions on Information Theory 55.7 (2009), pp. 3323–3345.
Dorigo, M. and Gambardella, M. “Ant-Q: A reinforcement learning approach to the traveling salesman problem”. In: International Conference on Machine Learning. 1995, pp. 252–260.
Doucet, A., Godsill, S., and Andrieu, C. “On sequential Monte-Carlo sampling methods for Bayesian filtering”. In: Statistics and Computing 10 (2000), pp. 197–208.
Doucet, A., Gordon, N., and Krishnamurthy, V. “Particle filters for state estimation of jump Markov linear systems”. In: IEEE Transactions on Signal Processing 49 (2001), pp. 613–624.
Diewert, W. “Afriat’s theorem and some extensions to choice under uncertainty”. In: The Economic Journal 122.560 (2012), pp. 305–331.
Diewert, W. “Afriat and revealed preference theory”. In: The Review of Economic Studies (1973), pp. 419–425.
Doucet, A. and Johansen, A. M. “A tutorial on particle filtering and smoothing: Fifteen years later”. In: Oxford Handbook on Nonlinear Filtering. Ed. by Crisan, D. and Rozovsky, B. Oxford University Press, 2011.
Dasgupta, A., Kumar, R., and Sivakumar, D. “Social sampling”. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM. 2012, pp. 235–243.
Derman, C., Lieberman, G. J., and Ross, S. M. “Optimal system allocations with penalty cost”. In: Management Science 23.4 (Dec. 1976), pp. 399–403.
Dempster, A. P., Laird, N. M., and Rubin, D. B. “Maximum likelihood from incomplete data via the EM algorithm”. In: Journal of the Royal Statistical Society, B 39 (1977), pp. 1–38.
van Dyk, D. and Meng, X. “The art of data augmentation”. In: Journal of Computational and Graphical Statistics 10.1 (2001), pp. 1–50.
Dean, M. and Martin, D. “Measuring rationality with the minimum cost of revealed preference violations”. In: Review of Economics and Statistics 98.3 (2016), pp. 524–534.
Douc, R., Moulines, E., and Ryden, T. “Asymptotic properties of the maximum likelihood estimator in autoregressive models with Markov regime”. In: The Annals of Statistics 32.5 (2004), pp. 2254–2304.
Demuynck, T. and Rehbeck, J. “Computing revealed preference goodness-of-fit measures with integer programming”. In: Economic Theory 76.4 (2023), pp. 1175–1195.
Dentcheva, D. and Ruszczyński, A. Risk-Averse Optimization and Control. Springer, 2024.
Denardo, E. and Rothblum, U. “Optimal stopping, exponential utility, and linear programming”. In: Mathematical Programming 16.1 (1979), pp. 228–244.
Dudley, R. M. “Sample functions of the Gaussian process”. In: The Annals of Probability 1.1 (1973), pp. 66–103.
Dynkin, E. “Controlled random sequences”. In: Theory of Probability & Its Applications 10.1 (1965), pp. 1–14.
Eagle, J. N. “The optimal search for a moving target when the search path is constrained”. In: Operations Research 32 (1984), pp. 1107–1115.
Elliott, R. J., Aggoun, L., and Moore, J. B. Hidden Markov Models – Estimation and Control. Springer-Verlag, 1995.
Easley, D. and Kleinberg, J. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010.
Ethier, S. N. and Kurtz, T. G. Markov Processes – Characterization and Convergence. Wiley, 1986.
Elliott, R. J. and Krishnamurthy, V. “Exact finite-dimensional filters for maximum likelihood parameter estimation of continuous-time linear Gaussian systems”. In: SIAM Journal on Control and Optimization 35.6 (Nov. 1997), pp. 1908–1923.
Elliott, R. J. and Krishnamurthy, V. “New finite dimensional filters for estimation of discrete-time linear Gaussian models”. In: IEEE Transactions on Automatic Control 44.5 (May 1999), pp. 938–951.
Evans, J. and Krishnamurthy, V. “Hidden Markov model state estimation over a packet switched network”. In: IEEE Transactions on Signal Processing 42.8 (Aug. 1999), pp. 2157–2166.
Evans, R., Krishnamurthy, V., and Nair, G. “Networked sensor management and data rate control for tracking maneuvering targets”. In: IEEE Transactions on Signal Processing 53.6 (June 2005), pp. 1979–1991.
Erkin, Z., Bailey, M. D., Maillart, L. M., Schaefer, A. J., and Roberts, M. S. “Eliciting patients’ revealed preferences: an inverse Markov decision process approach”. In: Decision Analysis 7.4 (2010), pp. 358–365.
Fan, W. et al. “Privacy preserving classification on local differential privacy in data centers”. In: Journal of Parallel and Distributed Computing 135 (2020), pp. 70–82.
Fontaine, X., Berthet, Q., and Perchet, V. “Regularized contextual bandits”. In: International Conference on Artificial Intelligence and Statistics. 2019, pp. 2144–2153.
Feld, S. L. “Why your friends have more friends than you do”. In: American Journal of Sociology 96.6 (1991), pp. 1464–1477.
Fernando, T., Denman, S., Sridharan, S., and Fookes, C. “Deep inverse reinforcement learning for behavior prediction in autonomous driving: accurate forecasts of vehicle motion”. In: IEEE Signal Processing Magazine 38.1 (2020), pp. 87–96.
Ferguson, T. S. “A Bayesian analysis of some nonparametric problems”. In: The Annals of Statistics (1973), pp. 209–230.
Fournier, N. and Guillin, A. “On the rate of convergence in Wasserstein distance of the empirical measure”. In: Probability Theory and Related Fields 162.3 (2015), pp. 707–738.
Fessler, J. A. and Hero, A. O. “Space–alternating generalized expectation–maximization algorithm”. In: IEEE Transactions on Signal Processing 42.10 (1994), pp. 2664–2677.
Fanaswala, M. and Krishnamurthy, V. “Detection of anomalous trajectory patterns in target tracking via stochastic context-free grammars and reciprocal process models”. In: IEEE Journal of Selected Topics in Signal Processing 7.1 (Feb. 2013), pp. 76–90.
Fanaswala, M. and Krishnamurthy, V. “Syntactic models for trajectory constrained track-before-detect”. In: IEEE Transactions on Signal Processing 62.23 (2014), pp. 6130–6142.
Filar, J., Kallenberg, L., and Lee, H. “Variance-penalized Markov decision processes”. In: Mathematics of Operations Research 14.1 (1989), pp. 147–161.
Fudenberg, D. and Levine, D. “Consistency and cautious fictitious play”. In: Journal of Economic Dynamics and Control 19.5–7 (1995), pp. 1065–1089.
Fudenberg, D. and Levine, D. K. The Theory of Learning in Games. MIT Press, 1998.
Fu, J., Luo, K., and Levine, S. “Learning robust rewards with adversarial inverse reinforcement learning”. In: arXiv preprint arXiv:1710.11248 (2017).
Flury, B. D. “Acceptance–rejection sampling made easy”. In: SIAM Review 32.3 (1990), pp. 474–476.
Forges, F. and Minelli, E. “Afriat’s theorem for general budget sets”. In: Journal of Economic Theory 144.1 (2009), pp. 135–145.
Foygel Barber, R., Candes, E. J., Ramdas, A., and Tibshirani, R. J. “The limits of distribution-free conditional predictive inference”. In: Information and Inference: A Journal of the IMA 10.2 (2021), pp. 455–482.
Frazier, P. I. “A tutorial on Bayesian optimization”. In: arXiv preprint arXiv:1807.02811 (2018).
Fleming, W. H. and Soner, H. M. Controlled Markov Processes and Viscosity Solutions. Vol. 25. Springer Science & Business Media, 2006.
Fostel, A., Scarf, H., and Todd, M. “Two new proofs of Afriat’s theorem”. In: Economic Theory 24.1 (2004), pp. 211–219.
Fleissig, A. and Whitney, G. “Testing for the significance of violations of Afriat’s inequalities”. In: Journal of Business & Economic Statistics 23.3 (2005), pp. 355–362.
Gantmacher, F. R.. Matrix Theory. Vol. 2. Chelsea Publishing Company, 1960.Google Scholar
Gassiat, E. and Boucherone, S.. “Optimal error exponents in hidden Markov models order estimation”. In: IEEE Transactions on Information Theory 49.4 (2003), pp. 964980.CrossRefGoogle Scholar
Goel, P. K. and Ginebra, J.. “When is one experiment ‘always better than’ another?” In: Journal of the Royal Statistical Society: Series D (The Statistician) 52.4 (2003), pp. 515537.Google Scholar
Ghosh, D.. “Maximum likelihood estimation of the dynamic shock-error model”. In: Journal of Econometrics 41.1 (1989), pp. 121143.CrossRefGoogle Scholar
Gittins, J. C.. Multi–armed Bandit Allocation Indices. Wiley, 1989.Google Scholar
Globerson, A. and Jaakkola, T.. “Approximate inference using conditional entropy decompositions”. In: International Conference on Artificial Intelligence and Statistics. 2007, pp. 131–138.
Gharehshiran, O. N., Krishnamurthy, V., and Yin, G.. “Adaptive search algorithms for discrete stochastic optimization: A smooth best-response approach”. In: IEEE Transactions on Automatic Control 62.1 (2017), pp. 161–176.
Ghadimi, S. and Lan, G.. “Stochastic first- and zeroth-order methods for nonconvex stochastic programming”. In: SIAM Journal on Optimization 23.4 (2013), pp. 2341–2368.
Garivier, A. and Moulines, E.. “On upper-confidence bound policies for switching bandit problems”. In: Algorithmic Learning Theory. Springer. 2011, pp. 174–188.
Gelfand, S. B. and Mitter, S. K.. “Recursive stochastic algorithms for global optimization in ℝd”. In: SIAM Journal on Control and Optimization 29.5 (1991), pp. 999–1018.
Goggin, E. M.. “Convergence of filters with applications to the Kalman–Bucy case”. In: IEEE Transactions on Information Theory 38.3 (1992), pp. 1091–1100.
Ganuza, J.-J. and Penalva, J. S.. “Signal orderings based on dispersion and the supply of private information in auctions”. In: Econometrica 78.3 (2010), pp. 1007–1030.
Granovetter, M.. “Threshold models of collective behavior”. In: American Journal of Sociology 83.6 (May 1978), pp. 1420–1443.
Grosfeld-Nir, A.. “Control limits for two-state partially observable Markov decision processes”. In: European Journal of Operational Research 182.1 (2007), pp. 300–304.
Goel, S. and Salganik, M. J.. “Respondent-driven sampling as Markov chain Monte Carlo”. In: Statistics in Medicine 28 (2009), pp. 2209–2229.
Gordon, N. J., Salmond, D. J., and Smith, A. F. M.. “Novel approach to nonlinear/non-Gaussian Bayesian state estimation”. In: IEE Proceedings-F 140.2 (1993), pp. 107–113.
Guo, D., Shamai, S., and Verdú, S.. “Mutual information and minimum mean-square error in Gaussian channels”. In: IEEE Transactions on Information Theory 51.4 (2005), pp. 1261–1282.
Ghosal, S. and Van der Vaart, A.. Fundamentals of Nonparametric Bayesian Inference. Vol. 44. Cambridge University Press, 2017.
Hamdi, M., Solman, G., Kingstone, A., and Krishnamurthy, V.. “Social learning in a human society: An experimental study”. In: arXiv preprint arXiv:1408.5378 (2014).
Hauskrecht, M.. “Value-function approximations for partially observable Markov decision processes”. In: Journal of Artificial Intelligence Research 13.1 (2000), pp. 33–94.
Haykin, S.. “Cognitive radio: Brain-empowered wireless communications”. In: IEEE Journal on Selected Areas in Communications 23.2 (Feb. 2005), pp. 201–220.
Haykin, S.. “Cognitive radar”. In: IEEE Signal Processing Magazine (Jan. 2006), pp. 30–40.
Haykin, S.. Adaptive Filter Theory. 5th ed. Prentice Hall, 2013.
Heckathorn, D. D. and Cameron, C. J.. “Network sampling: from snowball and multiplicity to respondent-driven sampling”. In: Annual Review of Sociology 43.1 (2017), pp. 101–119.
Hellman, M. E. and Cover, T. M.. “Learning with finite memory”. In: The Annals of Mathematical Statistics 41.3 (1970), pp. 765–782.
Ho, Y.-C. and Cao, X.-R.. Discrete Event Dynamic Systems and Perturbation Analysis. Kluwer Academic, 1991.
Hsu, S., Chuang, and Arapostathis, A.. “On the existence of stationary optimal policies for partially observed MDPs under the long-run average cost criterion”. In: Systems & Control Letters 55.2 (2006), pp. 165–173.
Huang, M. and Dey, S.. “Stability of Kalman filtering with Markovian packet losses”. In: Automatica 43.4 (2007), pp. 598–607.
Hannan, E. J. and Deistler, M.. The Statistical Theory of Linear Systems. Wiley, 1988.
Heckathorn, D. D.. “Respondent-driven sampling II: deriving valid population estimates from chain-referral samples of hidden populations”. In: Social Problems 49 (2002), pp. 11–34.
Heckathorn, D. D.. “Respondent-driven sampling: a new approach to the study of hidden populations”. In: Social Problems 44 (1997), pp. 174–199.
Heidergott, B.. Max-Plus Linear Stochastic Systems and Perturbation Analysis. Springer, 2007.
Herman, M., Gindele, T., Wagner, J., Schmitt, F., and Burgard, W.. “Inverse reinforcement learning with simultaneous estimation of rewards and dynamics”. In: International Conference on Artificial Intelligence and Statistics. 2016, pp. 102–110.
Hall, P. and Heyde, C.. Martingale Limit Theory and its Application. Academic Press, 1980.
Horn, R. A. and Johnson, C. R.. Matrix Analysis. Cambridge University Press, 2012.
Hoiles, W., Krishnamurthy, V., and Aprem, A.. “PAC algorithms for detecting Nash equilibrium play in social networks: From Twitter to energy markets”. In: IEEE Access 4 (2016), pp. 8147–8161.
Hoiles, W., Krishnamurthy, V., and Pattanayak, K.. “Rationally inattentive inverse reinforcement learning explains YouTube commenting behavior”. In: Journal of Machine Learning Research 21.170 (2020), pp. 1–39.
Hsu, D., Kakade, S., and Zhang, T.. “A spectral algorithm for learning hidden Markov models”. In: Journal of Computer and System Sciences 78.5 (2012), pp. 1460–1480.
Hunter, D. and Lange, K.. “A tutorial on MM algorithms”. In: The American Statistician 58.1 (2004), pp. 30–37.
Higham, N. and Lin, L.. “On pth roots of stochastic matrices”. In: Linear Algebra and its Applications 435.3 (2011), pp. 448–463.
Hernández-Lerma, O. and Lasserre, J. B.. Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, 1996.
Handschin, J. E. and Mayne, D. Q.. “Monte Carlo techniques to estimate the conditional expectation in multi-stage non-linear filtering”. In: International Journal of Control 9.5 (1969), pp. 547–559.
Hoiles, W., Namvar, O., Krishnamurthy, V., Dao, N., and Zhang, H.. “Adaptive caching in the YouTube content distribution network: A revealed preference game-theoretic learning approach”. In: IEEE Transactions on Cognitive Communications and Networking 1.1 (2015), pp. 71–85.
Hong, J., Kveton, B., Zaheer, M., and Ghavamzadeh, M.. “Hierarchical Bayesian bandits”. In: International Conference on Artificial Intelligence and Statistics. 2022, pp. 7724–7741.
Howard, R. A.. Dynamic Probabilistic Systems. Vol. 1 and 2. Wiley, 1971.
Hofbauer, J. and Sandholm, W.. “On the global convergence of stochastic fictitious play”. In: Econometrica 70.6 (Nov. 2002), pp. 2265–2294.
Heyman, D. P. and Sobel, M. J.. Stochastic Models in Operations Research. Vol. 2. McGraw-Hill, 1984.
Hamilton, J. D. and Susmel, R.. “Autoregressive conditional heteroskedasticity and changes in regime”. In: Journal of Econometrics 64.2 (1994), pp. 307–333.
Hansen, O. H. and Torgersen, E. N.. “Comparison of linear normal experiments”. In: The Annals of Statistics (1974), pp. 367–373.
Hastie, T., Tibshirani, R., and Friedman, J.. The Elements of Statistical Learning. Springer-Verlag, 2009.
Iida, K.. Studies on the Optimal Search Plan. Vol. 70. Lecture Notes in Statistics. Springer-Verlag, 1990.
Ibars, C., Navarro, M., and Giupponi, L.. “Distributed demand management in smart grid with a congestion game”. In: IEEE International Conference on Smart Grid Communications. 2010, pp. 495–500.
Jackson, M. O.. Social and Economic Networks. Princeton University Press, 2010.
Jackson, M. O.. “The friendship paradox and systematic biases in perceptions and social norms”. In: Journal of Political Economy 127.2 (2019), pp. 777–818.
Jamison, B.. “Reciprocal processes”. In: Probability Theory and Related Fields 30.1 (1974), pp. 65–86.
Jazwinski, A.. Stochastic Processes and Filtering Theory. Academic Press, 1970.
James, M. R., Baras, J. S., and Elliott, R. J.. “Risk-sensitive control and dynamic games for partially observed discrete-time nonlinear systems”. In: IEEE Transactions on Automatic Control 39.4 (Apr. 1994), pp. 780–792.
Jones, B. E. and Edgerton, D. L.. “Testing utility maximization with measurement errors in the data”. In: Measurement Error: Consequences, Applications and Solutions. Emerald Group Publishing Limited, 2009, pp. 199–236.
Jeon, W. et al. “Regularized inverse reinforcement learning”. In: arXiv preprint arXiv:2010.03691 (2020).
Jewitt, I.. “Applications of likelihood ratio orderings in economics”. In: Lecture Notes – Monograph Series (1991), pp. 174–189.
Johnston, L. and Krishnamurthy, V.. “Opportunistic file transfer over a fading channel - a POMDP search theory formulation with optimal threshold policies”. In: IEEE Transactions on Wireless Communications 5.2 (Feb. 2006), pp. 394–405.
Jain, A. and Krishnamurthy, V.. “Controlling federated learning for covertness”. In: Transactions on Machine Learning Research (2024).
Jain, A. and Krishnamurthy, V.. “Interacting large language model agents. Bayesian social learning based interpretable models.” In: IEEE Access 13 (Feb. 2025), pp. 25465–25504.
James, M. R., Krishnamurthy, V., and LeGland, F.. “Time discretization of continuous-time filters and smoothers for HMM parameter estimation”. In: IEEE Transactions on Information Theory 42.2 (Mar. 1996), pp. 593–605.
Jordan, R., Kinderlehrer, D., and Otto, F.. “The variational formulation of the Fokker–Planck equation”. In: SIAM Journal on Mathematical Analysis 29.1 (1998), pp. 1–17.
Jobert, A. and Rogers, L. C. G.. “Valuations and dynamic convex risk measures”. In: Mathematical Finance 18.1 (2008), pp. 1–22.
Krishnamurthy, V. and Abad, F. V.. “Gradient based policy optimization of constrained unichain Markov decision processes”. In: Stochastic Processes, Finance and Control: A Festschrift in Honor of Robert J. Elliott. Ed. by Cohen, S., Madan, D., and Siu, T.. http://arxiv.org/abs/1110.4946. World Scientific, 2012.
Krishnamurthy, V., Aprem, A., and Bhatt, S.. “Multiple stopping time POMDPs: Structural results & application in interactive advertising on social media”. In: Automatica 95 (2018), pp. 385–398.
Kailath, T.. Linear Systems. Prentice Hall, 1980.
Kallenberg, O.. Probabilistic Symmetries and Invariance Principles. Vol. 9. Springer, 2005.
Kalman, R. E.. “A new approach to linear filtering and prediction problems”. In: Transactions of the ASME, Series D (Journal of Basic Engineering) 82 (Mar. 1960), pp. 35–45.
Kalman, R. E.. “When is a linear control system optimal?” In: Journal of Basic Engineering (Apr. 1964), pp. 51–60.
Kamalaruban, P. et al. “Robust reinforcement learning via adversarial training with Langevin dynamics”. In: arXiv preprint arXiv:2002.06063 (2020).
Karlin, S.. Total Positivity. Vol. 1. Stanford University Press, 1968.
Kingma, D. and Ba, J.. “Adam: a method for stochastic optimization”. In: International Conference on Learning Representations (ICLR). 2015.
Krishnamurthy, V. and Bhatt, S.. “Sequential detection of market shocks with risk-averse CVaR social sensors”. In: IEEE Journal of Selected Topics in Signal Processing 10.6 (2016), pp. 1061–1072.
Kalman, R. E. and Bucy, R. S.. “New results in linear filtering and prediction theory”. In: Transactions of the ASME, Series D (Journal of Basic Engineering) 83 (Mar. 1961), pp. 95–108.
Kushner, H. J. and Clark, D. S.. Stochastic Approximation Methods for Constrained and Unconstrained Systems. Springer-Verlag, 1978.
Korattikara, A., Chen, Y., and Welling, M.. “Austerity in MCMC land: Cutting the Metropolis–Hastings budget”. In: International Conference on Machine Learning. 2014, pp. 181–189.
Krishnamurthy, V. and Djonin, D.. “Structured threshold policies for dynamic sensor scheduling – A partially observed Markov decision process approach”. In: IEEE Transactions on Signal Processing 55.10 (Oct. 2007), pp. 4938–4957.
Krishnamurthy, V. and Djonin, D.. “Optimal threshold policies for multivariate POMDPs in radar resource management”. In: IEEE Transactions on Signal Processing 57.10 (2009), pp. 3954–3969.
Katsikopoulos, K. V. and Engelbrecht, S. E.. “Markov decision processes with delays and asynchronous cost collection”. In: IEEE Transactions on Automatic Control 48.4 (2003), pp. 568–574.
Krishnamurthy, V., Gharehshiran, O. N., and Hamdi, M.. “Interactive sensing and decision making in social networks”. In: Foundations and Trends® in Signal Processing 7.1-2 (2014), pp. 1–196.
Krishnamurthy, V. and Hoiles, W.. “Afriat’s test for detecting malicious agents”. In: IEEE Signal Processing Letters 19.12 (2012), pp. 801–804.
Krishnamurthy, V. and Hoiles, W.. “Online reputation and polling systems: Data incest, social learning and revealed preferences”. In: IEEE Transactions on Computational Social Systems 1.3 (Jan. 2015), pp. 164–179.
Khalil, H. K.. Nonlinear Systems. 3rd ed. Prentice Hall, 2002.
Kurniawati, H., Hsu, D., and Lee, W. S.. “SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces”. In: Robotics: Science and Systems Conference. 2008.
Kijima, M.. Markov Processes for Stochastic Modelling. Chapman and Hall, 1997.
Keilson, J. and Kester, A.. “Monotone matrices and monotone Markov processes”. In: Stochastic Processes and their Applications 5.3 (1977), pp. 231–241.
Kolmogorov, A. N.. “Interpolation and extrapolation of stationary random sequences”. In: Bull. Acad. Sci. U.S.S.R, Ser. Math. 5 (1941), pp. 3–14.
Kolmogorov, A. N.. “Stationary sequences in Hilbert space”. In: Bull. Math. Univ. Moscow 2.6 (1941).
Komenda, J., Lahaye, S., Boimond, J.-L., and van den Boom, T.. “Max-plus algebra in the history of discrete event systems”. In: Annual Reviews in Control 45 (2018), pp. 240–249.
Krishnamurthy, V. and Poor, H. V.. “Social learning and Bayesian games in multiagent signal processing: How do local and global decision makers interact?” In: IEEE Signal Processing Magazine 30.3 (2013), pp. 43–57.
Krishnamurthy, V. and Pareek, U.. “Myopic bounds for optimal policy of POMDPs: An extension of Lovejoy’s structural results”. In: Operations Research 62.2 (2015), pp. 428–434.
Kloeden, P. E. and Platen, E.. Numerical Solution of Stochastic Differential Equations. Springer, 1992.
Kontorovich, L. and Ramanan, K.. “Concentration inequalities for dependent random variables via the martingale method”. In: The Annals of Probability 36.6 (2008), pp. 2126–2158.
Krishnamurthy, V. and Rojas, C.. “Reduced complexity HMM filtering with stochastic dominance bounds: A convex optimization approach”. In: IEEE Transactions on Signal Processing 62.23 (2014), pp. 6309–6322.
Krishnamurthy, V. and Rangaswamy, M.. “How to calibrate your adversary’s capabilities? Inverse filtering for counter-autonomous systems”. In: IEEE Transactions on Signal Processing 67.24 (2019), pp. 6511–6525.
Krishnamurthy, V. and Rojas, C.. “Slow convergence of interacting Kalman filters in word-of-mouth social learning”. In: 60th Annual Allerton Conference on Communication, Control and Computing. IEEE, 2024.
Karlin, S. and Rinott, Y.. “Classes of orderings of measures and related correlation inequalities. I. Multivariate totally positive distributions”. In: Journal of Multivariate Analysis 10.4 (Dec. 1980), pp. 467–498.
Krishnamurthy, V., Bitmead, R., Gevers, M., and Miehling, E.. “Sequential detection with mutual information stopping cost: Application in GMTI radar”. In: IEEE Transactions on Signal Processing 60.2 (2012), pp. 700–714.
Krishnamurthy, V., Angley, D., Evans, R., and Moran, W.. “Identifying cognitive radars – Inverse reinforcement learning using revealed preferences”. In: IEEE Transactions on Signal Processing 68 (2020), pp. 4529–4542.
Krishnamurthy, V., Pattanayak, K., Gogineni, S., Kang, B., and Rangaswamy, M.. “Adversarial radar inference: Inverse tracking, identifying cognition and designing smart interference”. In: IEEE Transactions on Aerospace and Electronic Systems 57.4 (2021), pp. 2067–2081.
Krishnamurthy, V.. “Algorithms for optimal scheduling and management of hidden Markov model sensors”. In: IEEE Transactions on Signal Processing 50.6 (June 2002), pp. 1382–1397.
Krishnamurthy, V.. “Bayesian sequential detection with phase-distributed change time and non-linear penalty – A lattice programming POMDP approach”. In: IEEE Transactions on Information Theory 57.3 (Oct. 2011), pp. 7096–7124.
Krishnamurthy, V.. “Quickest detection POMDPs with social learning: Interaction of local and global decision makers”. In: IEEE Transactions on Information Theory 58.8 (2012), pp. 5563–5587.
Krishnamurthy, V.. “How to schedule measurements of a noisy Markov chain in decision making?” In: IEEE Transactions on Information Theory 59.9 (July 2013), pp. 4440–4461.
Krishnamurthy, V.. “Convex stochastic dominance in Bayesian localization, filtering and controlled sensing POMDPs”. In: IEEE Transactions on Information Theory 66.5 (2019), pp. 3187–3201.
Kuleshov, V. and Schrijvers, O.. “Inverse game theory: learning utilities in succinct games”. In: International Conference on Web and Internet Economics. Springer. 2015, pp. 413–427.
Karatzas, I. and Shreve, S.. Brownian Motion and Stochastic Calculus. 2nd ed. Springer, 1991.
Karatzas, I. and Shreve, S.. Methods of Mathematical Finance. Vol. 39. Springer, 1998.
Kahneman, D. and Tversky, A.. “Prospect theory: An analysis of decision under risk”. In: Econometrica 47.2 (1979), pp. 263–291.
Karlin, S. and Taylor, H. M.. A Second Course in Stochastic Processes. Academic Press, 1981.
Kunsch, H. R.. “Recursive Monte Carlo filters: Algorithms and theoretical analysis”. In: The Annals of Statistics 33.5 (2005), pp. 1983–2021.
Kurniawati, H.. “Partially observable Markov decision processes and robotics”. In: Annual Review of Control, Robotics, and Autonomous Systems 5 (2022), pp. 253–277.
Kurtz, T. G.. Approximation of Population Processes. Vol. 36. SIAM, 1981.
Kushner, H. J.. “A new method of locating the maximum point of an arbitrary multi-peak curve in the presence of noise”. In: Journal of Fluids Engineering 86.1 (1964), pp. 97–106.
Kushner, H. J.. “Dynamical equations for optimal nonlinear filtering”. In: Journal of Differential Equations 3 (1967), pp. 179–190.
Kushner, H. J.. “A robust discrete state approximation to the optimal nonlinear filter for a diffusion”. In: Stochastics 3.2 (1979), pp. 75–83.
Kushner, H. J.. Approximation and Weak Convergence Methods for Random Processes, with Applications to Stochastic Systems Theory. MIT Press, 1984.
Kumar, P. R. and Varaiya, P.. Stochastic Systems – Estimation, Identification and Adaptive Control. Prentice-Hall, 1986.
Krishnamurthy, V. and Wahlberg, B.. “POMDP multiarmed bandits – Structural results”. In: Mathematics of Operations Research 34.2 (May 2009), pp. 287–302.
Kingma, D. P. and Welling, M.. “An introduction to variational autoencoders”. In: Foundations and Trends® in Machine Learning 12.4 (2019), pp. 307–392.
Kwon, M., Daptardar, S., Schrater, P., and Pitkow, X.. “Inverse rational control with partially observable continuous nonlinear dynamics”. In: arXiv preprint arXiv:2009.12576 (2020).
Kushner, H. J. and Yin, G.. Stochastic Approximation and Recursive Algorithms and Applications. 2nd ed. Springer-Verlag, 2003.
Krishnamurthy, V. and Yin, G.. “Langevin dynamics for adaptive inverse reinforcement learning of stochastic gradient algorithms”. In: Journal of Machine Learning Research 22 (2021), pp. 1–49.
Krishnamurthy, V. and Yin, G.. “Multikernel passive stochastic gradient algorithms and transfer learning”. In: IEEE Transactions on Automatic Control 67.4 (2022), pp. 1792–1805.
Lin, X., Adams, S. C., and Beling, P. A.. “Multi-agent inverse reinforcement learning for certain general-sum stochastic games”. In: Journal of Artificial Intelligence Research 66 (2019), pp. 473–502.
Li, T., Bolic, M., and Djuric, P. M.. “Resampling methods for particle filtering: classification, implementation, and strategies”. In: IEEE Signal Processing Magazine 32.3 (2015), pp. 70–86.
Levine, R. and Casella, G.. “Implementations of the Monte Carlo EM algorithm”. In: Journal of Computational and Graphical Statistics 10.3 (Sept. 2001), pp. 422–439.
Liu, J. S. and Chen, R.. “Sequential Monte Carlo methods for dynamic systems”. In: Journal of the American Statistical Association 93 (1998), pp. 1032–1044.
Littman, M., Cassandra, A. R., and Kaelbling, L.. “Learning policies for partially observable environments: scaling up”. In: International Conference on Machine Learning. 1995, pp. 362–370.
Le Cam, L.. “Comparison of experiments: A short review”. In: Lecture Notes – Monograph Series (1996), pp. 127–138.
Lee, K. et al. “Generalized Tsallis entropy reinforcement learning and its application to soft mobile robots.” In: Robotics: Science and Systems. Vol. 16. 2020, pp. 1–10.
Lei, J., G’Sell, M., Rinaldo, A., Tibshirani, R. J., and Wasserman, L.. “Distribution-free predictive inference for regression”. In: Journal of the American Statistical Association 113.523 (2018), pp. 1094–1111.
Lei, J.. “Classification with confidence”. In: Biometrika 101.4 (2014), pp. 755–769.
Leroux, B. G.. “Maximum-likelihood estimation for hidden Markov models”. In: Stochastic Processes and their Applications 40 (1992), pp. 127–143.
Logothetis, A. and Isaksson, A.. “On sensor scheduling via information theoretic criteria”. In: American Control Conference. 1999, pp. 2402–2406.
Lindvall, T.. Lectures on the Coupling Method. Courier Dover Publications, 2002.
Littman, M. L.. “A tutorial on partially observable Markov decision processes”. In: Journal of Mathematical Psychology 53.3 (2009), pp. 119–125.
Littman, M. L.. “Algorithms for sequential decision making”. PhD thesis. Brown University, 1996.
Liu, J. S.. Monte Carlo Strategies in Scientific Computing. Springer-Verlag, 2001.
Ljung, L.. “Analysis of recursive stochastic algorithms”. In: IEEE Transactions on Automatic Control AC-22.4 (1977), pp. 551–575.
Ljung, L.. System Identification. 2nd ed. Prentice Hall, 1999.
Levine, S. and Koltun, V.. “Continuous inverse optimal control with locally optimal examples”. In: arXiv preprint arXiv:1206.4617 (2012).
LeGland, F. and Mevel, L.. “Exponential forgetting and geometric ergodicity in hidden Markov models”. In: Mathematics of Control, Signals and Systems 13.1 (2000), pp. 63–93.
Lobel, I., Acemoglu, D., Dahleh, M., and Ozdaglar, A.. “Preliminary results on social learning with partial observations”. In: International Conference on Performance Evaluation Methodologies and Tools. ACM, 2007.
López-Pintado, D.. “Diffusion in complex social networks”. In: Games and Economic Behavior 62.2 (2008), pp. 573–590.
Louis, T.. “Finding the observed information matrix when using the EM algorithm”. In: Journal of the Royal Statistical Society 44(B) (1982), pp. 226–233.
Lovejoy, W. S.. “On the convexity of policy regions in partially observed systems”. In: Operations Research 35.4 (July 1987), pp. 619–621.
Lovejoy, W. S.. “Ordered solutions for dynamic programs”. In: Mathematics of Operations Research 12.2 (1987), pp. 269–276.
Lovejoy, W. S.. “Some monotonicity results for partially observed Markov decision processes”. In: Operations Research 35.5 (Sept. 1987), pp. 736–743.
Lovejoy, W. S.. “A survey of algorithmic methods for partially observed Markov decision processes”. In: Annals of Operations Research 28 (1991), pp. 47–66.
Lovejoy, W. S.. “Computationally feasible bounds for partially observed Markov decision processes”. In: Operations Research 39.1 (Jan. 1991), pp. 162–175.
Lai, T. and Robbins, H.. “Asymptotically efficient adaptive allocation rules”. In: Advances in Applied Mathematics 6.1 (1985), pp. 4–22.
Liu, C. and Rubin, D.. “The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence”. In: Biometrika 81.4 (1994), pp. 633–648.
Lehmann, E. L., Romano, J. P., and Casella, G.. Testing Statistical Hypotheses. Vol. 3. Springer, 2005.
Ljung, L. and Söderström, T.. Theory and Practice of Recursive Identification. MIT Press, 1983.
Ledoux, M. and Talagrand, M.. Probability in Banach Spaces: Isoperimetry and Processes. Vol. 23. Springer Science & Business Media, 1991.
Luenberger, D.. Optimization by Vector Space Methods. Wiley, 1969.
Liu, Z. and Vandenberghe, L.. “Interior-point method for nuclear norm approximation with application to system identification”. In: SIAM Journal on Matrix Analysis and Applications 31.3 (2009), pp. 1235–1256.
Luenberger, D. and Ye, Y.. Linear and Nonlinear Programming. 4th ed. Springer, 2016.
Liu, K. and Zhao, Q.. “Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access”. In: IEEE Transactions on Information Theory 56.11 (2010), pp. 5547–5567.
Markowitz, H.. “Portfolio selection”. In: The Journal of Finance 7.1 (1952), pp. 77–91.
Marcus, S. I.. “Algebraic and geometric methods in nonlinear filtering”. In: SIAM Journal on Control and Optimization 22.6 (Nov. 1984), pp. 817–844.
Mas-Colell, A.. “On revealed preference analysis”. In: The Review of Economic Studies (1978), pp. 121–131.
Mattila, R., Rojas, C. R., Krishnamurthy, V., and Wahlberg, B.. “Computing monotone policies for Markov decision processes: A nearly-isotonic penalty approach”. In: IFAC-PapersOnLine 50.1 (2017), pp. 8429–8434.
Mattila, R., Rojas, C., Moulines, E., Krishnamurthy, V., and Wahlberg, B.. “Fast and consistent learning of hidden Markov models by incorporating non-consecutive correlations”. In: International Conference on Machine Learning. Vol. 119. 13–18 Jul 2020, pp. 6785–6796.
Mattila, R., Rojas, C. R., Krishnamurthy, V., and Wahlberg, B.. “Inverse filtering for hidden Markov models with applications to counter-adversarial autonomous systems”. In: IEEE Transactions on Signal Processing 68 (Aug. 2020), pp. 4987–5002.
Mayne, D. Q., Rawlings, J. B., Rao, C. V., and Scokaert, P.. “Constrained model predictive control: stability and optimality”. In: Automatica 36.6 (2000), pp. 789–814.
Moulines, E. and Bach, F.. “Non-asymptotic analysis of stochastic approximation algorithms for machine learning”. In: Advances in Neural Information Processing Systems 24 (2011), pp. 451–459.
McFadden, D.. “Economic choices”. In: American Economic Review 91.3 (2001), pp. 351–378.
Meng, X. L.. “On the rate of convergence of the ECM algorithm”. In: The Annals of Statistics 22.1 (1994), pp. 326–339.
Mohler, R. and Hwang, C.. “Nonlinear data observability and information”. In: Journal of Franklin Institute 325.4 (1988), pp. 443–464.
Milgrom, P.. “Good news and bad news: Representation theorems and applications”. In: Bell Journal of Economics 12.2 (1981), pp. 380–391.
MacPhee, I. and Jordan, B.. “Optimal search for a moving target”. In: Probability in the Engineering and Informational Sciences 9 (1995), pp. 159–182.
McLachlan, G. J. and Krishnan, T.. The EM Algorithm and Extensions. Wiley, 1996.
Matějka, F. and McKay, A.. “Rational inattention to discrete choices: a new foundation for the multinomial logit model”. In: American Economic Review 105.1 (2015), pp. 272–298.
Miller, S. M. and Mangan, C. E.. “Interacting effects of information and coping style in adapting to gynecologic stress: Should the doctor tell all?” In: Journal of Personality and Social Psychology 45.1 (1983), pp. 223–236.
MacEachern, S. N. and Müller, P.. “Estimating mixture of Dirichlet process models”. In: Journal of Computational and Graphical Statistics 7.2 (1998), pp. 223–238.
Maćkowiak, B., Matějka, F., and Wiederholt, M.. “Rational inattention: a review”. In: Journal of Economic Literature 61.1 (2023), pp. 226273.CrossRefGoogle Scholar
Molloy, T. L. and Nair, G. N.. “Smoother entropy for active state trajectory estimation and obfuscation in POMDPs”. In: IEEE Transactions on Automatic Control 68.6 (2023), pp. 35573572.CrossRefGoogle Scholar
Mnih, V. et al. “Human-level control through deep reinforcement learning”. In: Nature 518.7540 (2015), pp. 529533.CrossRefGoogle ScholarPubMed
Monahan, G. E.. “A survey of partially observable Markov decision processes: theory, models and algorithms”. In: Management Science 28.1 (Jan. 1982), pp. 116.CrossRefGoogle Scholar
Moral, P. D.. Feynman–Kac Formulae – Genealogical and Interacting Particle Systems with Applications. Springer-Verlag, 2004.Google Scholar
Moustakides, G. B.. “Optimal stopping times for detecting changes in distributions”. In: Annals of Statistics 14 (1986), pp. 13791387.CrossRefGoogle Scholar
Meier, L., Perschon, J., and Dressler, R.. “Optimal control of measurement subsystems”. In: IEEE Transactions on Automatic Control 12.5 (Oct. 1967), pp. 528536.CrossRefGoogle Scholar
Muller, A. and Stoyan, D.. Comparison Methods for Stochastic Models and Risk. Wiley, 2002.Google Scholar
Milgrom, P. and Shannon, C.. “Monotone comparative statics”. In: Econometrica 62.1 (1994), pp. 157180.CrossRefGoogle Scholar
Monderer, D. and Shapley, L. S.. “Potential games”. In: Games and Economic Behavior 14.1 (1996), pp. 124143.CrossRefGoogle Scholar
Manning, C. D. and Schütze, H.. Foundations of Statistical Natural Language Processing. The MIT Press, 1999.Google Scholar
Meyn, S. P. and Tweedie, R. L.. Markov Chains and Stochastic Stability. Cambridge University Press, 2009.CrossRefGoogle Scholar
Makino, T. and Takeuchi, J.. “Apprenticeship learning for model parameters of partially observable environments”. In: arXiv preprint arXiv:1206.6484 (2012).Google Scholar
Molavi, P., Tahbaz-Salehi, A., and Jadbabaie, A.. “A theory of non-Bayesian social learning”. In: Econometrica 86.2 (2018), pp. 445490.CrossRefGoogle Scholar
Muller, A.. “How does the value function of a Markov decision process depend on the transition probabilities?” In: Mathematics of Operations Research 22 (1997), pp. 872885.CrossRefGoogle Scholar
Marcus, S. I. and Willsky, A. S.. “Algebraic structure and finite dimensional nonlinear estimation”. In: SIAM Journal on Mathematical Analysis 9.2 (Apr. 1978), pp. 312327.CrossRefGoogle Scholar
Nakai, T. "The problem of optimal stopping in a partially observable Markov chain". In: Journal of Optimization Theory and Applications 45.3 (1985), pp. 425–442.
Natarajan, S. et al. "Multi-agent inverse reinforcement learning". In: International Conference on Machine Learning. IEEE. 2010, pp. 395–400.
Neal, R. M. "Markov chain sampling methods for Dirichlet process mixture models". In: Journal of Computational and Graphical Statistics 9.2 (2000), pp. 249–265.
Nettasinghe, B., Alipourfard, N., Iota, S., Krishnamurthy, V., and Lerman, K. "Scale-free degree distributions, homophily and the glass ceiling effect in directed networks". In: Journal of Complex Networks 10.2 (2022).
Nettasinghe, B., Chatterjee, S., Tipireddy, R., and Halappanavar, M. "Extending conformal prediction to hidden Markov models with exact validity via de Finetti's theorem for Markov chains". In: International Conference on Machine Learning. 2023.
Neuts, M. F. Structured Stochastic Matrices of M/G/1 Type and Their Applications. Marcel Dekker, 1989.
Neyman, A. "Correlated equilibrium and potential games". In: International Journal of Game Theory 26.2 (1997), pp. 223–227.
Ng, A. and Jordan, M. "PEGASUS: A policy search method for large MDPs and POMDPs". In: Conference on Uncertainty in Artificial Intelligence. 2000, pp. 406–415.
Neu, G., Jonsson, A., and Gómez, V. "A unified view of entropy-regularized Markov decision processes". In: arXiv preprint arXiv:1705.07798 (2017).
Ngo, M. H. and Krishnamurthy, V. "Monotonicity of constrained optimal transmission policies in correlated fading channels with ARQ". In: IEEE Transactions on Signal Processing 58.1 (2010), pp. 438–451.
Nettasinghe, B. and Krishnamurthy, V. "What do your friends think?: Efficient polling methods for networks using friendship paradox". In: IEEE Transactions on Knowledge and Data Engineering (2019).
Nishimura, H., Ok, E. A., and Quah, J. K.-H. "A comprehensive approach to revealed preference theory". In: American Economic Review 107.4 (2017), pp. 1239–63.
Nazin, A. V., Polyak, B. T., and Tsybakov, A. B. "Passive stochastic approximation". In: Automation and Remote Control 50 (1989), pp. 1563–1569.
Ng, A. and Russell, S. "Algorithms for inverse reinforcement learning". In: International Conference on Machine Learning. 2000, pp. 663–670.
Neufeld, A. and Sester, J. "Robust Q-learning algorithm for Markov decision processes under Wasserstein uncertainty". In: Automatica 168 (2024), p. 111825.
Ottaviani, M. and Sørensen, P. "Information aggregation in debate: who should speak first?" In: Journal of Public Economics 81.3 (2001), pp. 393–421.
Pardoux, E. "Équations du filtrage non linéaire, de la prédiction et du lissage". In: Stochastics 6 (1982), pp. 193–231.
Patek, S. "On partially observed stochastic shortest path problems". In: IEEE Conference on Decision and Control. 2001, pp. 5050–5055.
Pflug, G. Optimization of Stochastic Models: The Interface between Simulation and Optimization. Kluwer Academic Publishers, 1996.
Pineau, J., Gordon, G., and Thrun, S. "Point-based value iteration: an anytime algorithm for POMDPs". In: International Joint Conference on Artificial Intelligence. Vol. 3. 2003, pp. 1025–1032.
Poor, H. V. and Hadjiliadis, O. Quickest Detection. Cambridge University Press, 2008.
Pinedo, M. L. Scheduling: Theory, Algorithms, and Systems. Springer-Verlag, 2022.
Polyak, B. T. and Juditsky, A. B. "Acceleration of stochastic approximation by averaging". In: SIAM Journal on Control and Optimization 30.4 (July 1992), pp. 838–855.
Pattanayak, K. and Krishnamurthy, V. "Necessary and sufficient conditions for inverse reinforcement learning of Bayesian stopping time problems". In: Journal of Machine Learning Research 24.52 (2023), pp. 1–64.
Pattanayak, K. and Krishnamurthy, V. "Unifying revealed preference and revealed rational inattention". In: arXiv preprint arXiv:2106.14486 (2023).
Pattanayak, K., Krishnamurthy, V., and Berry, C. M. "Meta-cognitive radar: masking cognition from an inverse reinforcement learner". In: IEEE Transactions on Aerospace and Electronic Systems 59.6 (Dec. 2023), pp. 8826–8844.
Pattanayak, K., Krishnamurthy, V., and Jain, A. "Interpretable deep image classification using rationally inattentive utility maximization". In: IEEE Journal of Selected Topics in Signal Processing 18 (Apr. 2024), pp. 168–183.
Park, D., Khan, H., and Yener, B. "Generation & evaluation of adversarial examples for malware obfuscation". In: IEEE International Conference on Machine Learning and Applications. IEEE. 2019, pp. 1283–1290.
Platzman, L. "Optimal infinite-horizon undiscounted control of finite probabilistic systems". In: SIAM Journal on Control and Optimization 18 (1980), pp. 362–380.
Pollard, D. Convergence of Stochastic Processes. Springer-Verlag, 2012.
Pollock, S. "A simple model of search for a moving target". In: Operations Research 18 (1970), pp. 893–903.
Poor, H. V. "Quickest detection with exponential penalty for delay". In: The Annals of Statistics 26.6 (1998), pp. 2179–2205.
Pötscher, B. M. and Prucha, I. R. Dynamic Nonlinear Econometric Models: Asymptotic Theory. Springer-Verlag, 1997.
Parr, R. and Russell, S. "Approximating optimal policies for partially observable stochastic domains". In: International Joint Conference on Artificial Intelligence. Vol. 95. 1995, pp. 1088–1094.
Prelec, D. "A Bayesian truth serum for subjective data". In: Science 306.5695 (2004), pp. 462–466.
Peskir, G. and Shiryaev, A. Optimal Stopping and Free-Boundary Problems. Springer, 2006.
Piggott, M. J. and Solo, V. "Diffusion LMS with correlated regressors I: realization-wise stability". In: IEEE Transactions on Signal Processing 64.21 (2016), pp. 5473–5484.
Papadimitriou, C. H. and Tsitsiklis, J. "The complexity of Markov decision processes". In: Mathematics of Operations Research 12.3 (1987), pp. 441–450.
Puterman, M. Markov Decision Processes. John Wiley, 1994.
Pastor-Satorras, R. and Vespignani, A. "Epidemic spreading in scale-free networks". In: Physical Review Letters 86.14 (2001), p. 3200.
Pitman, J. and Yor, M. "The two-parameter Poisson–Dirichlet distribution derived from a stable subordinator". In: The Annals of Probability (1997), pp. 855–900.
Quah, J. and Strulovici, B. "Comparative statics, informativeness, and the interval dominance order". In: Econometrica 77.6 (2009), pp. 1949–1992.
Quah, J. and Strulovici, B. "Aggregating the single crossing property". In: Econometrica 80.5 (2012), pp. 2333–2348.
Rabiner, L. R. "A tutorial on hidden Markov models and selected applications in speech recognition". In: Proceedings of the IEEE 77.2 (1989), pp. 257–285.
Ristic, B., Arulampalam, S., and Gordon, N. Beyond the Kalman Filter: Particle Filters for Tracking Applications. Artech House, 2004.
Raginsky, M. "Shannon meets Blackwell and Le Cam: channels, codes, and statistical experiments". In: IEEE International Symposium on Information Theory. IEEE. 2011, pp. 1220–1224.
Raginsky, M. "Channel polarization and Blackwell measures". In: IEEE International Symposium on Information Theory. IEEE. 2016, pp. 56–60.
Ratliff, N. D., Bagnell, J. A., and Zinkevich, M. A. "Maximum margin planning". In: International Conference on Machine Learning. 2006, pp. 729–736.
Robert, C. P. and Casella, G. Monte Carlo Statistical Methods. Springer-Verlag, 2013.
Reny, P. J. "A characterization of rationalizable consumer behavior". In: Econometrica 83.1 (2015), pp. 175–192.
Roy, N., Gordon, G., and Thrun, S. "Finding approximate POMDP solutions through belief compression". In: Journal of Artificial Intelligence Research 23 (2005), pp. 1–40.
Richter, M. K. "Revealed preference theory". In: Econometrica (1966), pp. 635–645.
Riedel, F. "Dynamic coherent risk measures". In: Stochastic Processes and their Applications 112.2 (2004), pp. 185–200.
Rieder, U. "Structural results for partially observed control models". In: Methods and Models of Operations Research 35.6 (1991), pp. 473–490.
Rahimian, M. A. and Jadbabaie, A. "Bayesian learning without recall". In: IEEE Transactions on Signal and Information Processing over Networks 3.3 (2016), pp. 592–606.
Rockafellar, R. Convex Analysis. Princeton University Press, 1970.
Ross, S., Pineau, J., Paquet, S., and Chaib-Draa, B. "Online planning algorithms for POMDPs". In: Journal of Artificial Intelligence Research 32 (2008), pp. 663–704.
Ross, S. Simulation. 5th ed. Academic Press, 2013.
Rosenblatt, M. "Remarks on some nonparametric estimates of a density function". In: The Annals of Mathematical Statistics 27.3 (1956), pp. 832–837.
Ross, S. "Arbitrary state Markovian decision processes". In: The Annals of Mathematical Statistics (1968), pp. 2118–2122.
Ross, S. Introduction to Stochastic Dynamic Programming. Academic Press, 1983.
Raginsky, M., Rakhlin, A., and Telgarsky, M. "Non-convex learning via stochastic gradient Langevin dynamics: a nonasymptotic analysis". In: Conference on Learning Theory. 2017, pp. 1674–1703.
Rakhlin, A., Shamir, O., and Sridharan, K. "Making gradient descent optimal for strongly convex stochastic optimization". In: arXiv preprint arXiv:1109.5647 (2011).
Rockafellar, R. T. and Uryasev, S. "Optimization of conditional value-at-risk". In: Journal of Risk 2 (2000), pp. 21–42.
Rudin, W. Principles of Mathematical Analysis. McGraw-Hill, 1976.
Ruszczyński, A. "Risk-averse dynamic programming for Markov decision processes". In: Mathematical Programming 125.2 (2010), pp. 235–261.
Rust, J. "Structural estimation of Markov decision processes". In: Handbook of Econometrics 4 (1994), pp. 3081–3143.
Raghavan, V. and Veeravalli, V. "Bayesian quickest change process detection". In: IEEE International Symposium on Information Theory. 2009, pp. 644–648.
Rasmussen, C. E. and Williams, C. Gaussian Processes for Machine Learning. MIT Press, 2006.
Rothschild, D. M. and Wolfers, J. "Forecasting elections: voter intentions versus expectations". In: Available at SSRN 1884644 (2011).
Revuz, D. and Yor, M. Continuous Martingales and Brownian Motion. Springer, 2013.
Rieder, U. and Zagst, R. "Monotonicity and bounds for convex stochastic control models". In: Mathematical Methods of Operations Research 39.2 (June 1994), pp. 187–207.
Samuelson, P. "A note on the pure theory of consumer's behaviour". In: Economica 20.4 (1938), pp. 61–71.
Sayed, A. H. Adaptive Filters. Wiley, 2008.
Sayed, A. H. "Adaptation, learning, and optimization over networks". In: Foundations and Trends in Machine Learning 7.4–5 (2014), pp. 311–801.
Sutton, R. and Barto, A. Reinforcement Learning: An Introduction. MIT Press, 2018.
Shani, G., Brafman, R., and Shimony, S. "Forward search value iteration for POMDPs". In: International Joint Conference on Artificial Intelligence. 2007, pp. 2619–2624.
Schulman, J., Levine, S., Abbeel, P., Jordan, M., and Moritz, P. "Trust region policy optimization". In: International Conference on Machine Learning. 2015, pp. 1889–1897.
Seneta, E. Non-negative Matrices and Markov Chains. Springer-Verlag, 1981.
Sennott, L. I. Stochastic Dynamic Programming and the Control of Queueing Systems. Wiley, 1999.
Sethuraman, J. "A constructive definition of Dirichlet priors". In: Statistica Sinica (1994), pp. 639–650.
Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and De Freitas, N. "Taking the human out of the loop: a review of Bayesian optimization". In: Proceedings of the IEEE 104.1 (2015), pp. 148–175.
Shannon, C. E. "A note on a partial ordering for communication channels". In: Information and Control 1.4 (1958), pp. 390–397.
Shiryaev, A. N. "On optimum methods in quickest detection problems". In: Theory of Probability and its Applications 8.1 (1963), pp. 22–46.
Silverman, B. W. Density Estimation for Statistics and Data Analysis. Routledge, 2018.
Sims, C. A. "Implications of rational inattention". In: Journal of Monetary Economics 50.3 (2003), pp. 665–690.
Sims, C. A. "Rational inattention and monetary economics". In: Handbook of Monetary Economics. Vol. 3. Elsevier, 2010, pp. 155–181.
Singh, S., Jaakkola, T., Littman, M. L., and Szepesvári, C. "Convergence results for single-step on-policy reinforcement-learning algorithms". In: Machine Learning 38 (2000), pp. 287–308.
Singh, S. and Krishnamurthy, V. "The optimal search for a Markovian target when the search path is constrained: the infinite horizon case". In: IEEE Transactions on Automatic Control 48.3 (Mar. 2003), pp. 487–492.
Simmons, R. and Koenig, S. "Probabilistic navigation in partially observable environments". In: International Joint Conference on Artificial Intelligence. Morgan Kaufmann, 1995, pp. 1080–1087.
Solo, V. and Kong, X. Adaptive Signal Processing Algorithms: Stability and Performance. Prentice Hall, 1995.
Snow, L., Krishnamurthy, V., and Sadler, B. M. "Identifying coordination in a cognitive radar network – A multi-objective inverse reinforcement learning approach". In: IEEE International Conference on Acoustics, Speech and Signal Processing. 2023, pp. 1–5.
Smith, J. E. and McCardle, K. F. "Structural properties of stochastic dynamic programs". In: Operations Research 50.5 (2002), pp. 796–809.
Sondik, E. J. "The optimal control of partially observed Markov processes". PhD thesis. Electrical Engineering, Stanford University, 1971.
Sondik, E. J. "The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs". In: Operations Research 26.2 (Mar. 1978), pp. 282–304.
Spall, J. Introduction to Stochastic Search and Optimization. Wiley, 2003.
Shani, G., Pineau, J., and Kaplow, R. "A survey of point-based POMDP solvers". In: Autonomous Agents and Multi-Agent Systems 27.1 (2013), pp. 1–51.
Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M. "Gaussian process optimization in the bandit setting: no regret and experimental design". In: International Conference on Machine Learning. 2010, pp. 1015–1022.
Shafieepoorfard, E., Raginsky, M., and Meyn, S. P. "Rationally inattentive control of Markov processes". In: SIAM Journal on Control and Optimization 54.2 (2016), pp. 987–1016.
Smith, T. and Simmons, R. "Heuristic search value iteration for POMDPs". In: Conference on Uncertainty in Artificial Intelligence. AUAI Press. 2004, pp. 520–527.
Shaked, M. and Shanthikumar, J. G. Stochastic Orders. Springer-Verlag, 2007.
Smallwood, R. D. and Sondik, E. J. "Optimal control of partially observable Markov processes over a finite horizon". In: Operations Research 21 (1973), pp. 1071–1088.
Stone, L. "What's happened in search theory since the 1975 Lanchester prize". In: Operations Research 37.3 (May 1989), pp. 501–506.
Stratonovich, R. L. "Conditional Markov processes". In: Theory of Probability and its Applications 5.2 (1960), pp. 156–178.
Strauch, R. E. "Negative dynamic programming". In: The Annals of Mathematical Statistics 37.4 (1966), pp. 871–890.
Surowiecki, J. The Wisdom of Crowds. Anchor, 2005.
Spaan, M. and Vlassis, N. "Perseus: randomized point-based value iteration for POMDPs". In: Journal of Artificial Intelligence Research 24 (2005), pp. 195–220.
Shafer, G. and Vovk, V. "A tutorial on conformal prediction". In: Journal of Machine Learning Research 9.3 (2008).
Segal, M. and Weinstein, E. "A new method for evaluating the log-likelihood gradient, the Hessian, and the Fisher information matrix for linear dynamic systems". In: IEEE Transactions on Information Theory 35.3 (May 1989), pp. 682–687.
Tanner, M. A. Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions. Springer-Verlag, 1993.
Tibshirani, R. "Regression shrinkage and selection via the LASSO". In: Journal of the Royal Statistical Society. Series B (Methodological) (1996), pp. 267–288.
Tierney, L. "Markov chains for exploring posterior distributions". In: The Annals of Statistics (1994), pp. 1701–1728.
Teh, Y. W. and Jordan, M. I. "Hierarchical Bayesian nonparametric models with applications". In: Bayesian Nonparametrics 1 (2010), pp. 158–207.
Tichavsky, P., Muravchik, C. H., and Nehorai, A. "Posterior Cramér–Rao bounds for discrete-time nonlinear filtering". In: IEEE Transactions on Signal Processing 46.5 (May 1998), pp. 1386–1396.
Topkis, D. M. "Minimizing a submodular function on a lattice". In: Operations Research 26 (1978), pp. 305–321.
Topkis, D. M. Supermodularity and Complementarity. Princeton University Press, 1998.
Tartakovsky, A. G. and Veeravalli, V. V. "General asymptotic Bayesian theory of quickest change detection". In: Theory of Probability and its Applications 49.3 (2005), pp. 458–497.
Tsitsiklis, J. N. and Van Roy, B. "Average cost temporal-difference learning". In: Automatica 35.11 (1999), pp. 1799–1808.
Moon, T. and Weissman, T. "Universal filtering via hidden Markov modeling". In: IEEE Transactions on Information Theory 54.2 (2008), pp. 692–708.
Vapnik, V. N. Statistical Learning Theory. Wiley, 1998.
Varian, H. "Revealed preference and its applications". In: The Economic Journal 122.560 (2012), pp. 332–338.
Varian, H. "The nonparametric approach to demand analysis". In: Econometrica 50.1 (1982), pp. 945–973.
Varian, H. "Non-parametric tests of consumer behaviour". In: The Review of Economic Studies 50.1 (1983), pp. 99–110.
Vaswani, A. et al. "Attention is all you need". In: Advances in Neural Information Processing Systems (2017), pp. 5998–6008.
Vega-Redondo, F. Complex Social Networks. Vol. 44. Cambridge University Press, 2007.
Vershynin, R. High-Dimensional Probability: An Introduction with Applications in Data Science. Vol. 47. Cambridge University Press, 2018.
Vovk, V., Gammerman, A., and Shafer, G. Algorithmic Learning in a Random World. Vol. 29. Springer, 2005.
Villani, C. Optimal Transport: Old and New. Vol. 338. Springer, 2009.
Visnevski, N., Krishnamurthy, V., Wang, A., and Haykin, S. "Syntactic modeling and signal processing of multifunction radars: a stochastic context free grammar approach". In: Proceedings of the IEEE 95.5 (May 2007), pp. 1000–1025.
Vives, X. "How fast do rational agents learn?" In: The Review of Economic Studies 60.2 (1993), pp. 329–347.
Vives, X. "Learning from others: A welfare analysis". In: Games and Economic Behavior 20.2 (1997), pp. 177–200.
Wald, A. "Note on the consistency of the maximum likelihood estimate". In: The Annals of Mathematical Statistics (1949), pp. 595–601.
Williams, J., Fisher, J., and Willsky, A. "Approximate dynamic programming for communication-constrained sensor network management". In: IEEE Transactions on Signal Processing 55.8 (2007), pp. 4300–4311.
White, C. C. and Harrington, D. P. "Application of Jensen's inequality to adaptive suboptimal design". In: Journal of Optimization Theory and Applications 32.1 (1980), pp. 89–99.
Wong, E. and Hajek, B. Stochastic Processes in Engineering Systems. 2nd ed. Springer-Verlag, 1985.
Whittle, P. "A simple condition for regularity in negative programming". In: Journal of Applied Probability 16.2 (1979), pp. 305–318.
Whittle, P. "Multi-armed bandits and the Gittins index". In: Journal of the Royal Statistical Society B 42.2 (1980), pp. 143–149.
Whitt, W. "Multivariate monotone likelihood ratio and uniform conditional stochastic order". In: Journal of Applied Probability 19 (1982), pp. 695–701.
Wiener, N. The Extrapolation, Interpolation and Smoothing of Stationary Time Series. Wiley, 1949.
Wan, E. and van der Merwe, R. "The unscented Kalman filter for nonlinear estimation". In: Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000. IEEE. 2000, pp. 153–158.
Wonham, W. M. "Some applications of stochastic differential equations to optimal nonlinear filtering". In: SIAM Journal on Control 2.3 (1965), pp. 347–369.
Wulfmeier, M., Ondruska, P., and Posner, I. "Maximum entropy deep inverse reinforcement learning". In: arXiv preprint arXiv:1507.04888 (2015).
Welling, M. and Teh, Y. W. "Bayesian learning via stochastic gradient Langevin dynamics". In: International Conference on Machine Learning. 2011, pp. 681–688.
Wu, C. F. J. "On the convergence properties of the EM algorithm". In: The Annals of Statistics 11.1 (1983), pp. 95–103.
White, L. B. and Vu, H. X. "Maximum likelihood sequence estimation for hidden reciprocal processes". In: IEEE Transactions on Automatic Control 58.10 (2013), pp. 2670–2674.
Xie, J. et al. "Social consensus through the influence of committed minorities". In: Physical Review E 84.1 (2011), p. 011130.
Ye, Y. "The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate". In: Mathematics of Operations Research 36.4 (2011), pp. 593–603.
Yin, G., Ion, C., and Krishnamurthy, V. "How does a stochastic optimization/approximation algorithm adapt to a randomly evolving optimum/root with jump Markov sample paths". In: Mathematical Programming B (Special Issue dedicated to B. T. Polyak's 70th Birthday) 120.1 (2009), pp. 67–99.
Yin, G. "On extensions of Polyak's averaging approach to stochastic approximation". In: Stochastics and Stochastics Reports 36 (1991), pp. 245–264.
Yin, G. and Krishnamurthy, V. "LMS algorithms for tracking slow Markov chains with applications to hidden Markov estimation and adaptive multiuser detection". In: IEEE Transactions on Information Theory 51.7 (July 2005), pp. 2475–2490.
Yu, F. and Krishnamurthy, V. "Optimal joint session admission control in integrated WLAN and CDMA cellular network". In: IEEE Transactions on Mobile Computing 6.1 (Jan. 2007), pp. 126–139.
Yin, G. and Krishnamurthy, V. "Finite sample and large deviations analysis of stochastic gradient algorithm with correlated noise". In: arXiv preprint arXiv:2410.08449 (2024).
Yin, G., Krishnamurthy, V., and Ion, C. "Regime switching stochastic approximation algorithms with application to adaptive discrete stochastic optimization". In: SIAM Journal on Optimization 14.4 (2004), pp. 1171215.
Yakir, B., Krieger, A. M., and Pollak, M. "Detecting a change in regression: First-order optimality". In: The Annals of Statistics 27.6 (1999), pp. 1896–1913.
Yu, L., Song, J., and Ermon, S. "Multi-agent adversarial inverse reinforcement learning". In: International Conference on Machine Learning. 2019, pp. 7194–7201.
Yin, G. and Yin, K. "Passive stochastic approximation with constant step size and window width". In: IEEE Transactions on Automatic Control 41.1 (1996), pp. 90–106.
Zhao, Q., Tong, L., Swami, A., and Chen, Y. "Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: a POMDP framework". In: IEEE Journal on Selected Areas in Communications (2007), pp. 589–600.
Ziebart, B. D., Maas, A. L., Bagnell, J. A., and Dey, A. K. "Maximum entropy inverse reinforcement learning". In: AAAI Conference on Artificial Intelligence. Vol. 8. Chicago, IL, USA. 2008, pp. 1433–1438.
Ziegler, D. M. et al. "Fine-tuning language models from human preferences". In: arXiv preprint arXiv:1909.08593v2 (2020).
Zou, F., Yen, G. G., and Zhao, C. "Dynamic multiobjective optimization driven by inverse reinforcement learning". In: Information Sciences 575 (2021), pp. 468–484.

  • Bibliography
  • Vikram Krishnamurthy, Cornell University, New York
  • Book: Partially Observed Markov Decision Processes
  • Online publication: 16 May 2025
  • Chapter DOI: https://doi.org/10.1017/9781009449441.034