
Bibliography

Published online by Cambridge University Press: 16 May 2025

Vikram Krishnamurthy
Affiliation: Cornell University, New York

Type: Chapter
In: Partially Observed Markov Decision Processes: Filtering, Learning and Controlled Sensing, pp. 605–628
Publisher: Cambridge University Press
Print publication year: 2025

References

Aberdeen, D. and Baxter, J. “Scaling internal-state policy-gradient methods for POMDPs”. In: International Conference on Machine Learning. 2002, pp. 3–10.
Albert, R. and Barabási, A.-L. “Statistical mechanics of complex networks”. In: Reviews of Modern Physics 74.1 (2002), p. 47.
Abounadi, J., Bertsekas, D. P., and Borkar, V. “Learning algorithms for Markov decision processes with average cost”. In: SIAM Journal on Control and Optimization 40.3 (2001), pp. 681–698.
Arjovsky, M., Chintala, S., and Bottou, L. “Wasserstein generative adversarial networks”. In: International Conference on Machine Learning. 2017, pp. 214–223.
Auer, P., Cesa-Bianchi, N., and Fischer, P. “Finite-time analysis of the multiarmed bandit problem”. In: Machine Learning 47.2–3 (2002), pp. 235–256.
Afriat, S. “The construction of utility functions from expenditure data”. In: International Economic Review 8.1 (1967), pp. 67–77.
Afriat, S. Logic of Choice and Economic Theory. Clarendon Press, 1987.
Agrawal, S. and Goyal, N. “Analysis of Thompson sampling for the multi-armed bandit problem”. In: Conference on Learning Theory. Vol. 23. 2012.
Agrawal, S. and Goyal, N. “Thompson sampling for contextual bandits with linear payoffs”. In: International Conference on Machine Learning. 2013, pp. 127–135.
Agarwal, A. et al. “Taming the monster: a fast and simple algorithm for contextual bandits”. In: International Conference on Machine Learning. 2014, pp. 1638–1646.
Altman, E., Gaujal, B., and Hordijk, A. Discrete-Event Control of Stochastic Networks: Multimodularity and Regularity. Springer-Verlag, 2004.
Arasaratnam, I. and Haykin, S. “Cubature Kalman filters”. In: IEEE Transactions on Automatic Control 54.6 (2009), pp. 1254–1269.
Albright, S. “Structural results for partially observed Markov decision processes”. In: Operations Research 27.5 (Sept. 1979), pp. 1041–1053.
Alipourfard, N., Nettasinghe, B., Abeliuk, A., Krishnamurthy, V., and Lerman, K. “Friendship paradox biases perceptions in directed networks”. In: Nature Communications 11.1 (2020), pp. 1–9.
Altman, E. Constrained Markov Decision Processes. Chapman and Hall, 1999.
Anderson, B. D. O. and Moore, J. B. Optimal Filtering. Prentice Hall, 1979.
Anderson, B. D. O. and Moore, J. B. Optimal Control: Linear Quadratic Methods. Prentice-Hall, 1989.
Ambrogioni, L. et al. “Wasserstein variational inference”. In: Advances in Neural Information Processing Systems 31 (2018), pp. 1–11.
Amir, R. “Supermodularity and complementarity in economics: An elementary survey”. In: Southern Economic Journal 71.3 (2005), pp. 636–660.
Audibert, J., Munos, R., and Szepesvári, C. “Exploration–exploitation tradeoff using variance estimates in multi-armed bandits”. In: Theoretical Computer Science 410.19 (2009), pp. 1876–1902.
Abbeel, P. and Ng, A. Y. “Apprenticeship learning via inverse reinforcement learning”. In: International Conference on Machine Learning. 2004, p. 1.
Andradottir, S. “A global search method for discrete stochastic optimization”. In: SIAM Journal on Optimization 6.2 (May 1996), pp. 513–530.
Andradottir, S. “Accelerating the convergence of random search methods for discrete stochastic optimization”. In: ACM Transactions on Modelling and Computer Simulation 9.4 (Oct. 1999), pp. 349–380.
Acemoglu, D. and Ozdaglar, A. “Opinion dynamics and learning in social networks”. In: Dynamic Games and Applications 1.1 (2011), pp. 3–49.
Albore, A., Palacios, H., and Geffner, H. “A translation-based approach to contingent planning”. In: International Joint Conference on Artificial Intelligence. 2009, pp. 1623–1628.
Arapostathis, A., Borkar, V., Fernández-Gaucherand, E., Ghosh, M. K., and Marcus, S. I. “Discrete-time controlled Markov processes with average cost criterion: A survey”. In: SIAM Journal on Control and Optimization 31.2 (1993), pp. 282–344.
Artzner, P., Delbaen, F., Eber, J., Heath, D., and Ku, H. “Coherent multiperiod risk adjusted values and Bellman’s principle”. In: Annals of Operations Research 152.1 (2007), pp. 5–22.
Artzner, P., Delbaen, F., Eber, J., and Heath, D. “Coherent measures of risk”. In: Mathematical Finance 9.3 (July 1999), pp. 203–228.
Åström, K. J. “Optimal control of Markov processes with incomplete state information”. In: Journal of Mathematical Analysis and Applications 10.1 (1965), pp. 174–205.
Andersland, M. S. and Teneketzis, D. “Measurement scheduling for recursive team estimation”. In: Journal of Optimization Theory and Applications 89.3 (June 1996), pp. 615–636.
Athey, S. “Monotone comparative statics under uncertainty”. In: The Quarterly Journal of Economics 117.1 (2002), pp. 187–223.
Atar, R. and Zeitouni, O. “Lyapunov exponents for finite state nonlinear filtering”. In: SIAM Journal on Control and Optimization 35.1 (1997), pp. 36–55.
Banerjee, A. “A simple model of herd behavior”. In: Quarterly Journal of Economics 107.3 (Aug. 1992), pp. 797–817.
Barber, R. F., Candes, E. J., Ramdas, A., and Tibshirani, R. J. “Conformal prediction beyond exchangeability”. In: The Annals of Statistics 51.2 (2023), pp. 816–845.
Baum, L. E., Petrie, T., Soules, G., and Weiss, N. “A maximisation technique occurring in the statistical analysis of probabilistic functions of Markov chains”. In: Annals of Mathematical Statistics 41.1 (1970), pp. 164–171.
Bartlett, P. and Baxter, J. “Estimation and approximation bounds for gradient-based reinforcement learning”. In: Journal of Computer and System Sciences 64.1 (2002), pp. 133–150.
Baras, J. S. and Bensoussan, A. “Optimal sensor scheduling in nonlinear filtering of diffusion processes”. In: SIAM Journal on Control and Optimization 27.4 (July 1989), pp. 786–813.
Bubeck, S. and Cesa-Bianchi, N. “Regret analysis of stochastic and nonstochastic multi-armed bandit problems”. In: arXiv preprint arXiv:1204.5721 (2012).
Brockett, R. W. and Clarke, J. M. C. “The geometry of the conditional density equation”. In: Analysis and Optimization of Stochastic Systems. Ed. by Jacobs, O. L. R. et al. 1980, pp. 299–309.
Bottou, L., Curtis, F. E., and Nocedal, J. “Optimization methods for large-scale machine learning”. In: SIAM Review 60.2 (2018), pp. 223–311.
Bundfuss, S. and Dür, M. “Algorithmic copositivity detection by simplicial partition”. In: Linear Algebra and its Applications 428.7 (2008), pp. 1511–1523.
Bundfuss, S. and Dür, M. “An adaptive linear approximation algorithm for copositive programs”. In: SIAM Journal on Optimization 20.1 (2009), pp. 30–53.
Boyd, S., Diaconis, P., and Xiao, L. “Fastest mixing Markov chain on a graph”. In: SIAM Review 46.4 (2004), pp. 667–689.
Bellman, R. Dynamic Programming. Princeton University Press, 1957.
Benes, V. E. “Exact finite-dimensional filters for certain diffusions with nonlinear drift”. In: Stochastics 5 (1981), pp. 65–92.
Bensoussan, A. Stochastic Control of Partially Observable Systems. Cambridge University Press, 1992.
Bertsekas, D. P. Nonlinear Programming. Athena Scientific, 2000.
Bertsekas, D. P. “Dynamic programming and suboptimal control: a survey from ADP to MPC”. In: European Journal of Control 11.4 (2005), pp. 310–334.
Bertsekas, D. P. Dynamic Programming and Optimal Control. Vol. 1 and 2. Athena Scientific, 2017.
Bertsekas, D. P. Reinforcement Learning and Optimal Control. Athena Scientific, 2019.
Ben-Zvi, T. and Grosfeld-Nir, A. “Partially observed Markov decision processes with binomial observations”. In: Operations Research Letters 41.2 (2013), pp. 201–206.
Bakry, D., Gentil, I., and Ledoux, M. Analysis and Geometry of Markov Diffusion Operators. Vol. 348. Springer Science & Business Media, 2013.
Banerjee, A., Guo, X., and Wang, H. “On the optimality of conditional expectation as a Bregman predictor”. In: IEEE Transactions on Information Theory 51.7 (2005), pp. 2664–2669.
Booth, J. G. and Hobert, J. P. “Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm”. In: Journal of the Royal Statistical Society, B 61 (1999), pp. 265–285.
Benaim, M., Hofbauer, J., and Sorin, S. “Stochastic approximations and differential inclusions”. In: SIAM Journal on Control and Optimization 44.1 (2005), pp. 328–348.
Benaim, M., Hofbauer, J., and Sorin, S. “Stochastic approximations and differential inclusions, Part II: applications”. In: Mathematics of Operations Research 31.3 (2006), pp. 673–695.
Bikhchandani, S., Hirshleifer, D., and Welch, I. “A theory of fads, fashion, custom, and cultural change as information cascades”. In: Journal of Political Economy 100.5 (Oct. 1992), pp. 992–1026.
Bianchi, L., Dorigo, M., Gambardella, L., and Gutjahr, W. “A survey on metaheuristics for stochastic combinatorial optimization”. In: Natural Computing: An International Journal 8.2 (2009), pp. 239–287.
Billingsley, P. Statistical Inference for Markov Processes. University of Chicago Press, 1961.
Billingsley, P. Probability and Measure. Wiley, 1986.
Billingsley, P. Convergence of Probability Measures. 2nd ed. Wiley, 1999.
Blei, D. and Jordan, M. “Variational inference for Dirichlet process mixtures”. In: Bayesian Analysis 1.1 (Mar. 2006), pp. 121–143.
Boström, H., Johansson, U., and Löfström, T. “Mondrian conformal predictive distributions”. In: Symposium on Conformal and Probabilistic Prediction and Applications. Vol. 152. Aug. 2021, pp. 24–38.
Bhatt, S. and Krishnamurthy, V. “Controlled sequential information fusion with social sensors”. In: IEEE Transactions on Automatic Control 66.12 (2020), pp. 5893–5908.
Blei, D. M., Kucukelbir, A., and McAuliffe, J. D. “Variational inference: a review for statisticians”. In: Journal of the American Statistical Association 112.518 (2017), pp. 859–877.
Babaioff, M., Kleinberg, R., and Papadimitriou, C. “Congestion games with malicious players”. In: ACM Conference on Electronic Commerce. 2007, pp. 103–112.
Bensoussan, A. and Lions, J. Impulsive Control and Quasi-variational Inequalities. Gauthier-Villars, 1984.
Blackwell, D. “Comparison of experiments”. In: Proceedings of the 2nd Berkeley Symposium on Mathematical Statistics and Probability. University of California Press. 1951, pp. 93–102.
Blackwell, D. “Equivalent comparisons of experiments”. In: The Annals of Mathematical Statistics (1953), pp. 265–272.
Bar-Shalom, Y., Li, X. R., and Kirubarajan, T. Estimation with Applications to Tracking and Navigation. John Wiley, 2008.
Boucheron, S., Lugosi, G., and Massart, P. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
Björk, T. and Murgoci, A. “A theory of Markovian time-inconsistent stochastic control in discrete time”. In: Finance and Stochastics 18.3 (2014), pp. 545–592.
Benveniste, A., Metivier, M., and Priouret, P. Adaptive Algorithms and Stochastic Approximations. Vol. 22. Applications of Mathematics. Springer-Verlag, 1990.
Bordignon, V., Matta, V., and Sayed, A. H. “Adaptive social learning”. In: IEEE Transactions on Information Theory 67.9 (2021), pp. 6053–6081.
Basseville, M. and Nikiforov, I. Detection of Abrupt Changes – Theory and Applications. Information and System Sciences Series. Prentice Hall, 1993.
Bond, R. et al. “A 61-million-person experiment in social influence and political mobilization”. In: Nature 489 (Sept. 2012), pp. 295–298.
Borkar, V. S. Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press, 2008.
Bose, S., Orosel, G., Ottaviani, M., and Vesterlund, L. “Dynamic monopoly pricing and herding”. In: The RAND Journal of Economics 37.4 (2006), pp. 910–928.
Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. “Distributed optimization and statistical learning via the alternating direction method of multipliers”. In: Foundations and Trends in Machine Learning 3.1 (2011), pp. 1–122.
Baum, L. E. and Petrie, T. “Statistical inference for probabilistic functions of finite state Markov chains”. In: Annals of Mathematical Statistics 37 (1966), pp. 1554–1563.
Blackman, S. and Popoli, R. Design and Analysis of Modern Tracking Systems. Artech House, 1999.
Brunnermeier, M. K., Papakonstantinou, F., and Parker, J. A. “Optimal time-inconsistent beliefs: Misplanning, procrastination, and commitment”. In: Management Science 63.5 (2017), pp. 1318–1340.
Bäuerle, N. and Rieder, U. “More risk-sensitive Markov decision processes”. In: Mathematics of Operations Research 39.1 (2013), pp. 105–120.
Bremaud, P. Point Processes and Queues. Springer-Verlag, 1981.
Bremaud, P. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer-Verlag, 1999.
Broderick, T., Boyd, N., Wibisono, A., Wilson, A. C., and Jordan, M. I. “Streaming variational Bayes”. In: Advances in Neural Information Processing Systems 26 (2013), pp. 1727–1735.
Bertsekas, D. P. and Shreve, S. E. Stochastic Optimal Control: The Discrete-Time Case. Academic Press, 1978.
Barles, G. and Souganidis, P. E. “Convergence of approximation schemes for fully nonlinear second order equations”. In: Asymptotic Analysis 4.3 (1991), pp. 271–283.
Ben-Tal, A. and Teboulle, M. “An old-new concept of convex risk measures: the optimized certainty equivalent”. In: Mathematical Finance 17.3 (2007), pp. 449–476.
Bénabou, R. and Tirole, J. “Mindful economics: the production, consumption, and value of beliefs”. In: Journal of Economic Perspectives 30.3 (2016), pp. 141–164.
Bertsekas, D. P. and Tsitsiklis, J. N. “An analysis of stochastic shortest path problems”. In: Mathematics of Operations Research 16.3 (1991), pp. 580–595.
Bertsekas, D. P. and Tsitsiklis, J. N. Neuro-Dynamic Programming. Athena Scientific, 1996.
Boyd, S. and Vandenberghe, L. Convex Optimization. Cambridge University Press, 2004.
Banerjee, T. and Veeravalli, V. “Data-efficient quickest change detection with on-off observation control”. In: Sequential Analysis 31 (2012), pp. 40–77.
Burnaev, E. and Vovk, V. “Efficiency of conformalized ridge regression”. In: Conference on Learning Theory. 2014, pp. 605–622.
Benaim, M. and Weibull, J. “Deterministic approximation of stochastic evolution in games”. In: Econometrica 71.3 (2003), pp. 873–903.
Bertsekas, D. P. and Yu, H. “Q-learning and enhanced policy iteration in discounted dynamic programming”. In: Mathematics of Operations Research 37.1 (2012), pp. 66–94.
Caines, P. E. Linear Stochastic Systems. Wiley, 1988.
Cassandra, A. R. Tony’s POMDP Page. www.cs.brown.edu/research/ai/pomdp.
Cassandra, A. R. “A survey of POMDP applications”. In: Working Notes of AAAI 1998 Fall Symposium on Planning with Partially Observable Markov Decision Processes. 1998, pp. 17–24.
Cassandra, A. R. “Exact and approximate algorithms for partially observed Markov decision processes”. PhD thesis. Dept. Computer Science, Brown University, 1998.
Cook, J. O. and Barnes, L. W., Jr. “Choice of delay of inevitable shock”. In: Journal of Abnormal and Social Psychology 68.6 (1964), pp. 669–672.
Charpentier, C. J., Bromberg-Martin, E. S., and Sharot, T. “Valuation of knowledge and ignorance in mesolimbic reward circuitry”. In: Proceedings of the National Academy of Sciences 115.31 (2018), E7255–E7264.
Cao, H., Cohen, S., and Szpruch, L. “Identifiability in inverse reinforcement learning”. In: Advances in Neural Information Processing Systems 34 (2021), pp. 12362–12373.
Cairoli, R. and Dalang, R. C. Sequential Stochastic Optimization. John Wiley & Sons, 2011.
Caplin, A. and Dean, M. “Revealed preference, rational inattention, and costly information acquisition”. In: American Economic Review 105.7 (2015), pp. 2183–2203.
Caplin, A., Dean, M., and Leahy, J. “Rational inattention, optimal consideration sets, and stochastic choice”. In: The Review of Economic Studies 86.3 (2019), pp. 1061–1094.
Cherchye, L., De Rock, B., and Vermeulen, F. “The revealed preference approach to collective consumption behaviour: testing and sharing rule recovery”. In: The Review of Economic Studies 78.1 (2011), pp. 176–198.
Cohen, S. N. and Elliott, R. J. Stochastic Calculus and Applications. Vol. 2. Springer, 2015.
Cen, S., Cheng, C., Chen, Y., Wei, Y., and Chi, Y. “Fast global convergence of natural policy gradient methods with entropy regularization”. In: Operations Research 70.4 (2022), pp. 2563–2578.
Cover, T. M. and Hellman, M. E. “The two-armed-bandit problem with time-invariant finite memory”. In: IEEE Transactions on Information Theory 16.2 (1970), pp. 185–195.
Chamley, C. Rational Herds: Economic Models of Social Learning. Cambridge University Press, 2004.
Chiou, W. “A note on estimation algebras on nonlinear filtering theory”. In: Systems and Control Letters 28 (1996), pp. 55–63.
Chu, W., Li, L., Reyzin, L., and Schapire, R. “Contextual bandits with linear payoff functions”. In: International Conference on Artificial Intelligence and Statistics. 2011, pp. 208–214.
Choi, J. and Kim, K. “Inverse reinforcement learning in partially observable environments”. In: Journal of Machine Learning Research 12 (2011), pp. 691–730.
Cassandra, A. R., Kaelbling, L., and Littman, M. L. “Acting optimally in partially observable stochastic domains”. In: AAAI. Vol. 94. 1994, pp. 1023–1028.
Caplin, A. and Leahy, J. “Psychological expected utility theory and anticipatory feelings”. In: The Quarterly Journal of Economics 116.1 (2001), pp. 55–79.
Cassandras, C. G. and Lafortune, S. Introduction to Discrete Event Systems. 3rd ed. Springer-Verlag, 2021.
Coleman, T. F. and Li, Y. “An interior trust region approach for nonlinear minimization subject to bounds”. In: SIAM Journal on Optimization 6.2 (1996), pp. 418–445.
Clark, J. M. C. “The design of robust approximations to the stochastic differential equations of nonlinear filtering”. In: Communication Systems and Random Processes Theory, Darlington 1977. Ed. by Skwirzynski, J. K. Sijthoff and Noordhoff, 1978.
Chen, J., Li, S. E., and Tomizuka, M. “Interpretable end-to-end urban autonomous driving with latent deep reinforcement learning”. In: IEEE Transactions on Intelligent Transportation Systems 23.6 (2022), pp. 5068–5078.
Cassandra, A. R., Littman, M. L., and Zhang, N. L. “Incremental pruning: A simple fast exact method for partially observed Markov decision processes”. In: Annual Conference on Uncertainty in Artificial Intelligence. 1997.
Caplin, A. and Martin, D. “A testable theory of imperfect perception”. In: The Economic Journal 125.582 (2015), pp. 184–202.
Cappe, O., Moulines, E., and Ryden, T. Inference in Hidden Markov Models. Springer-Verlag, 2005.
Cavus, O. and Ruszczynski, A. “Risk-averse control of undiscounted transient Markov models”. In: SIAM Journal on Control and Optimization 52.6 (2014), pp. 3935–3966.
Cao, Y. and Ross, S. “The friendship paradox”. In: Mathematical Scientist 41.1 (2016).
Cover, T. M. and Thomas, J. A. Elements of Information Theory. Wiley-Interscience, 2006.
Candès, E. J. and Tao, T. “The power of convex relaxation: Near-optimal matrix completion”. In: IEEE Transactions on Information Theory 56.5 (May 2009), pp. 2053–2080.
Davis, M. H. A. “On a multiplicative functional transformation arising in nonlinear filtering theory”. In: Z. Wahrscheinlichkeitstheorie verw. Gebiete 54 (1980), pp. 125–139.
Deb, R. Interdependent Preferences, Potential Games and Household Consumption. MPRA Paper 6818. University Library of Munich, Germany, Jan. 2008.
Deb, R. “A testable model of consumption with externalities”. In: Journal of Economic Theory 144.4 (2009), pp. 1804–1816.
Diaconis, P. and Freedman, D. “On the consistency of Bayes estimates”. In: The Annals of Statistics (1986), pp. 1–26.
Doucet, A., Freitas, N. D., and Gordon, N., eds. Sequential Monte Carlo Methods in Practice. Springer-Verlag, 2001.
Dayanik, S. and Goulding, C. “Detection and identification of an unobservable change in the distribution of a Markov-modulated random sequence”. In: IEEE Transactions on Information Theory 55.7 (2009), pp. 3323–3345.
Dorigo, M. and Gambardella, M. “Ant-Q: A reinforcement learning approach to the traveling salesman problem”. In: International Conference on Machine Learning. 1995, pp. 252–260.
Doucet, A., Godsill, S., and Andrieu, C. “On sequential Monte-Carlo sampling methods for Bayesian filtering”. In: Statistics and Computing 10 (2000), pp. 197–208.
Doucet, A., Gordon, N., and Krishnamurthy, V. “Particle filters for state estimation of jump Markov linear systems”. In: IEEE Transactions on Signal Processing 49 (2001), pp. 613–624.
Diewert, W. “Afriat’s theorem and some extensions to choice under uncertainty”. In: The Economic Journal 122.560 (2012), pp. 305–331.
Diewert, W. “Afriat and revealed preference theory”. In: The Review of Economic Studies (1973), pp. 419–425.
Doucet, A. and Johansen, A. M. “A tutorial on particle filtering and smoothing: Fifteen years later”. In: Oxford Handbook on Nonlinear Filtering. Ed. by Crisan, D. and Rozovsky, B. Oxford University Press, 2011.
Dasgupta, A., Kumar, R., and Sivakumar, D. “Social sampling”. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM. 2012, pp. 235–243.
Derman, C., Lieberman, G. J., and Ross, S. M. “Optimal system allocations with penalty cost”. In: Management Science 23.4 (Dec. 1976), pp. 399–403.
Dempster, A. P., Laird, N. M., and Rubin, D. B. “Maximum likelihood from incomplete data via the EM algorithm”. In: Journal of the Royal Statistical Society, B 39 (1977), pp. 1–38.
van Dyk, D. and Meng, X. “The art of data augmentation”. In: Journal of Computational and Graphical Statistics 10.1 (2001), pp. 1–50.
Dean, M. and Martin, D. “Measuring rationality with the minimum cost of revealed preference violations”. In: Review of Economics and Statistics 98.3 (2016), pp. 524–534.
Douc, R., Moulines, E., and Ryden, T. “Asymptotic properties of the maximum likelihood estimator in autoregressive models with Markov regime”. In: The Annals of Statistics 32.5 (2004), pp. 2254–2304.
Demuynck, T. and Rehbeck, J. “Computing revealed preference goodness-of-fit measures with integer programming”. In: Economic Theory 76.4 (2023), pp. 1175–1195.
Dentcheva, D. and Ruszczyński, A. Risk-Averse Optimization and Control. Springer, 2024.
Denardo, E. and Rothblum, U. “Optimal stopping, exponential utility, and linear programming”. In: Mathematical Programming 16.1 (1979), pp. 228–244.
Dudley, R. M. “Sample functions of the Gaussian process”. In: The Annals of Probability 1.1 (1973), pp. 66–103.
Dynkin, E. “Controlled random sequences”. In: Theory of Probability & Its Applications 10.1 (1965), pp. 1–14.
Eagle, J. N. “The optimal search for a moving target when the search path is constrained”. In: Operations Research 32 (1984), pp. 1107–1115.
Elliott, R. J., Aggoun, L., and Moore, J. B. Hidden Markov Models – Estimation and Control. Springer-Verlag, 1995.
Easley, D. and Kleinberg, J. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010.
Ethier, S. N. and Kurtz, T. G. Markov Processes – Characterization and Convergence. Wiley, 1986.
Elliott, R. J. and Krishnamurthy, V. “Exact finite-dimensional filters for maximum likelihood parameter estimation of continuous-time linear Gaussian systems”. In: SIAM Journal on Control and Optimization 35.6 (Nov. 1997), pp. 1908–1923.
Elliott, R. J. and Krishnamurthy, V. “New finite dimensional filters for estimation of discrete-time linear Gaussian models”. In: IEEE Transactions on Automatic Control 44.5 (May 1999), pp. 938–951.
Evans, J. and Krishnamurthy, V. “Hidden Markov model state estimation over a packet switched network”. In: IEEE Transactions on Signal Processing 42.8 (Aug. 1999), pp. 2157–2166.
Evans, R., Krishnamurthy, V., and Nair, G. “Networked sensor management and data rate control for tracking maneuvering targets”. In: IEEE Transactions on Signal Processing 53.6 (June 2005), pp. 1979–1991.
Erkin, Z., Bailey, M. D., Maillart, L. M., Schaefer, A. J., and Roberts, M. S. “Eliciting patients’ revealed preferences: an inverse Markov decision process approach”. In: Decision Analysis 7.4 (2010), pp. 358–365.
Fan, W. et al. “Privacy preserving classification on local differential privacy in data centers”. In: Journal of Parallel and Distributed Computing 135 (2020), pp. 70–82.
Fontaine, X., Berthet, Q., and Perchet, V. “Regularized contextual bandits”. In: International Conference on Artificial Intelligence and Statistics. 2019, pp. 2144–2153.
Feld, S. L. “Why your friends have more friends than you do”. In: American Journal of Sociology 96.6 (1991), pp. 1464–1477.
Fernando, T., Denman, S., Sridharan, S., and Fookes, C. “Deep inverse reinforcement learning for behavior prediction in autonomous driving: accurate forecasts of vehicle motion”. In: IEEE Signal Processing Magazine 38.1 (2020), pp. 87–96.
Ferguson, T. S. “A Bayesian analysis of some nonparametric problems”. In: The Annals of Statistics (1973), pp. 209–230.
Fournier, N. and Guillin, A. “On the rate of convergence in Wasserstein distance of the empirical measure”. In: Probability Theory and Related Fields 162.3 (2015), pp. 707–738.
Fessler, J. A. and Hero, A. O. “Space–alternating generalized expectation–maximization algorithm”. In: IEEE Transactions on Signal Processing 42.10 (1994), pp. 2664–2677.
Fanaswala, M. and Krishnamurthy, V. “Detection of anomalous trajectory patterns in target tracking via stochastic context-free grammars and reciprocal process models”. In: IEEE Journal of Selected Topics in Signal Processing 7.1 (Feb. 2013), pp. 76–90.
Fanaswala, M. and Krishnamurthy, V. “Syntactic models for trajectory constrained track-before-detect”. In: IEEE Transactions on Signal Processing 62.23 (2014), pp. 6130–6142.
Filar, J., Kallenberg, L., and Lee, H. “Variance-penalized Markov decision processes”. In: Mathematics of Operations Research 14.1 (1989), pp. 147–161.
Fudenberg, D. and Levine, D. “Consistency and cautious fictitious play”. In: Journal of Economic Dynamics and Control 19.5–7 (1995), pp. 1065–1089.
Fudenberg, D. and Levine, D. K. The Theory of Learning in Games. MIT Press, 1998.
Fu, J., Luo, K., and Levine, S. “Learning robust rewards with adversarial inverse reinforcement learning”. In: arXiv preprint arXiv:1710.11248 (2017).
Flury, B. D. “Acceptance–rejection sampling made easy”. In: SIAM Review 32.3 (1990), pp. 474–476.
Forges, F. and Minelli, E. “Afriat’s theorem for general budget sets”. In: Journal of Economic Theory 144.1 (2009), pp. 135–145.
Foygel Barber, R., Candes, E. J., Ramdas, A., and Tibshirani, R. J. “The limits of distribution-free conditional predictive inference”. In: Information and Inference: A Journal of the IMA 10.2 (2021), pp. 455–482.
Frazier, P. I. “A tutorial on Bayesian optimization”. In: arXiv preprint arXiv:1807.02811 (2018).
Fleming, W. H. and Soner, H. M. Controlled Markov Processes and Viscosity Solutions. Vol. 25. Springer Science & Business Media, 2006.
Fostel, A., Scarf, H., and Todd, M. “Two new proofs of Afriat’s theorem”. In: Economic Theory 24.1 (2004), pp. 211–219.
Fleissig, A. and Whitney, G. “Testing for the significance of violations of Afriat’s inequalities”. In: Journal of Business & Economic Statistics 23.3 (2005), pp. 355–362.
Gantmacher, F. R.. Matrix Theory. Vol. 2. Chelsea Publishing Company, 1960.Google Scholar
Gassiat, E. and Boucherone, S.. “Optimal error exponents in hidden Markov models order estimation”. In: IEEE Transactions on Information Theory 49.4 (2003), pp. 964980.CrossRefGoogle Scholar
Goel, P. K. and Ginebra, J.. “When is one experiment ‘always better than’ another?” In: Journal of the Royal Statistical Society: Series D (The Statistician) 52.4 (2003), pp. 515537.Google Scholar
Ghosh, D.. “Maximum likelihood estimation of the dynamic shock-error model”. In: Journal of Econometrics 41.1 (1989), pp. 121143.CrossRefGoogle Scholar
Gittins, J. C.. Multi–armed Bandit Allocation Indices. Wiley, 1989.Google Scholar
Globerson, A. and Jaakkola, T.. “Approximate inference using conditional entropy decompositions”. In: International Conference on Artificial Intelligence and Statistics. 2007, pp. 131–138.
Gharehshiran, O. N., Krishnamurthy, V., and Yin, G.. “Adaptive search algorithms for discrete stochastic optimization: A smooth best-response approach”. In: IEEE Transactions on Automatic Control 62.1 (2017), pp. 161–176.
Ghadimi, S. and Lan, G.. “Stochastic first- and zeroth-order methods for nonconvex stochastic programming”. In: SIAM Journal on Optimization 23.4 (2013), pp. 2341–2368.
Garivier, A. and Moulines, E.. “On upper-confidence bound policies for switching bandit problems”. In: Algorithmic Learning Theory. Springer. 2011, pp. 174–188.
Gelfand, S. B. and Mitter, S. K.. “Recursive stochastic algorithms for global optimization in ℝd”. In: SIAM Journal on Control and Optimization 29.5 (1991), pp. 999–1018.
Goggin, E. M.. “Convergence of filters with applications to the Kalman–Bucy case”. In: IEEE Transactions on Information Theory 38.3 (1992), pp. 1091–1100.
Ganuza, J.-J. and Penalva, J. S.. “Signal orderings based on dispersion and the supply of private information in auctions”. In: Econometrica 78.3 (2010), pp. 1007–1030.
Granovetter, M.. “Threshold models of collective behavior”. In: American Journal of Sociology 83.6 (May 1978), pp. 1420–1443.
Grosfeld-Nir, A.. “Control limits for two-state partially observable Markov decision processes”. In: European Journal of Operational Research 182.1 (2007), pp. 300–304.
Goel, S. and Salganik, M. J.. “Respondent-driven sampling as Markov chain Monte Carlo”. In: Statistics in Medicine 28 (2009), pp. 2209–2229.
Gordon, N. J., Salmond, D. J., and Smith, A. F. M.. “Novel approach to nonlinear/non-Gaussian Bayesian state estimation”. In: IEE Proceedings-F 140.2 (1993), pp. 107–113.
Guo, D., Shamai, S., and Verdú, S.. “Mutual information and minimum mean-square error in Gaussian channels”. In: IEEE Transactions on Information Theory 51.4 (2005), pp. 1261–1282.
Ghosal, S. and Van der Vaart, A.. Fundamentals of Nonparametric Bayesian Inference. Vol. 44. Cambridge University Press, 2017.
Hamdi, M., Solman, G., Kingstone, A., and Krishnamurthy, V.. “Social learning in a human society: An experimental study”. In: arXiv preprint arXiv:1408.5378 (2014).
Hauskrecht, M.. “Value-function approximations for partially observable Markov decision processes”. In: Journal of Artificial Intelligence Research 13.1 (2000), pp. 33–94.
Haykin, S.. “Cognitive radio: Brain-empowered wireless communications”. In: IEEE Journal on Selected Areas in Communications 23.2 (Feb. 2005), pp. 201–220.
Haykin, S.. “Cognitive radar”. In: IEEE Signal Processing Magazine (Jan. 2006), pp. 30–40.
Haykin, S.. Adaptive Filter Theory. 5th ed. Prentice Hall, 2013.
Heckathorn, D. D. and Cameron, C. J.. “Network sampling: from snowball and multiplicity to respondent-driven sampling”. In: Annual Review of Sociology 43.1 (2017), pp. 101–119.
Hellman, M. E. and Cover, T. M.. “Learning with finite memory”. In: The Annals of Mathematical Statistics 41.3 (1970), pp. 765–782.
Ho, Y.-C. and Cao, X.-R.. Discrete Event Dynamic Systems and Perturbation Analysis. Kluwer Academic, 1991.
Hsu, S., Chuang, and Arapostathis, A.. “On the existence of stationary optimal policies for partially observed MDPs under the long-run average cost criterion”. In: Systems & Control Letters 55.2 (2006), pp. 165–173.
Huang, M. and Dey, S.. “Stability of Kalman filtering with Markovian packet losses”. In: Automatica 43.4 (2007), pp. 598–607.
Hannan, E. J. and Deistler, M.. The Statistical Theory of Linear Systems. Wiley, 1988.
Heckathorn, D. D.. “Respondent-driven sampling II: deriving valid population estimates from chain-referral samples of hidden populations”. In: Social Problems 49 (2002), pp. 11–34.
Heckathorn, D. D.. “Respondent-driven sampling: a new approach to the study of hidden populations”. In: Social Problems 44 (1997), pp. 174–199.
Heidergott, B.. Max-Plus Linear Stochastic Systems and Perturbation Analysis. Springer, 2007.
Herman, M., Gindele, T., Wagner, J., Schmitt, F., and Burgard, W.. “Inverse reinforcement learning with simultaneous estimation of rewards and dynamics”. In: International Conference on Artificial Intelligence and Statistics. 2016, pp. 102–110.
Hall, P. and Heyde, C.. Martingale Limit Theory and its Application. Academic Press, 1980.
Horn, R. A. and Johnson, C. R.. Matrix Analysis. Cambridge University Press, 2012.
Hoiles, W., Krishnamurthy, V., and Aprem, A.. “PAC algorithms for detecting Nash equilibrium play in social networks: From Twitter to energy markets”. In: IEEE Access 4 (2016), pp. 8147–8161.
Hoiles, W., Krishnamurthy, V., and Pattanayak, K.. “Rationally inattentive inverse reinforcement learning explains YouTube commenting behavior”. In: Journal of Machine Learning Research 21.170 (2020), pp. 1–39.
Hsu, D., Kakade, S., and Zhang, T.. “A spectral algorithm for learning hidden Markov models”. In: Journal of Computer and System Sciences 78.5 (2012), pp. 1460–1480.
Hunter, D. and Lange, K.. “A tutorial on MM algorithms”. In: The American Statistician 58.1 (2004), pp. 30–37.
Higham, N. and Lin, L.. “On pth roots of stochastic matrices”. In: Linear Algebra and its Applications 435.3 (2011), pp. 448–463.
Hernández-Lerma, O. and Lasserre, J. B.. Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, 1996.
Handschin, J. E. and Mayne, D. Q.. “Monte Carlo techniques to estimate the conditional expectation in multi-stage non-linear filtering”. In: International Journal of Control 9.5 (1969), pp. 547–559.
Hoiles, W., Namvar, O., Krishnamurthy, V., Dao, N., and Zhang, H.. “Adaptive caching in the YouTube content distribution network: A revealed preference game-theoretic learning approach”. In: IEEE Transactions on Cognitive Communications and Networking 1.1 (2015), pp. 71–85.
Hong, J., Kveton, B., Zaheer, M., and Ghavamzadeh, M.. “Hierarchical Bayesian bandits”. In: International Conference on Artificial Intelligence and Statistics. 2022, pp. 7724–7741.
Howard, R. A.. Dynamic Probabilistic Systems. Vol. 1 and 2. Wiley, 1971.
Hofbauer, J. and Sandholm, W.. “On the global convergence of stochastic fictitious play”. In: Econometrica 70.6 (Nov. 2002), pp. 2265–2294.
Heyman, D. P. and Sobel, M. J.. Stochastic Models in Operations Research. Vol. 2. McGraw-Hill, 1984.
Hamilton, J. D. and Susmel, R.. “Autoregressive conditional heteroskedasticity and changes in regime”. In: Journal of Econometrics 64.2 (1994), pp. 307–333.
Hansen, O. H. and Torgersen, E. N.. “Comparison of linear normal experiments”. In: The Annals of Statistics (1974), pp. 367–373.
Hastie, T., Tibshirani, R., and Friedman, J.. The Elements of Statistical Learning. Springer-Verlag, 2009.
Iida, K.. Studies on the Optimal Search Plan. Vol. 70. Lecture Notes in Statistics. Springer-Verlag, 1990.
Ibars, C., Navarro, M., and Giupponi, L.. “Distributed demand management in smart grid with a congestion game”. In: IEEE International Conference on Smart Grid Communications. 2010, pp. 495–500.
Jackson, M. O.. Social and Economic Networks. Princeton University Press, 2010.
Jackson, M. O.. “The friendship paradox and systematic biases in perceptions and social norms”. In: Journal of Political Economy 127.2 (2019), pp. 777–818.
Jamison, B.. “Reciprocal processes”. In: Probability Theory and Related Fields 30.1 (1974), pp. 65–86.
Jazwinski, A.. Stochastic Processes and Filtering Theory. Academic Press, 1970.
James, M. R., Baras, J. S., and Elliott, R. J.. “Risk-sensitive control and dynamic games for partially observed discrete-time nonlinear systems”. In: IEEE Transactions on Automatic Control 39.4 (Apr. 1994), pp. 780–792.
Jones, B. E. and Edgerton, D. L.. “Testing utility maximization with measurement errors in the data”. In: Measurement Error: Consequences, Applications and Solutions. Emerald Group Publishing Limited, 2009, pp. 199–236.
Jeon, W. et al. “Regularized inverse reinforcement learning”. In: arXiv preprint arXiv:2010.03691 (2020).
Jewitt, I.. “Applications of likelihood ratio orderings in economics”. In: Lecture Notes – Monograph Series (1991), pp. 174–189.
Johnston, L. and Krishnamurthy, V.. “Opportunistic file transfer over a fading channel - a POMDP search theory formulation with optimal threshold policies”. In: IEEE Transactions on Wireless Communications 5.2 (Feb. 2006), pp. 394–405.
Jain, A. and Krishnamurthy, V.. “Controlling federated learning for covertness”. In: Transactions on Machine Learning Research (2024).
Jain, A. and Krishnamurthy, V.. “Interacting large language model agents. Bayesian social learning based interpretable models.” In: IEEE Access 13 (Feb. 2025), pp. 25465–25504.
James, M. R., Krishnamurthy, V., and LeGland, F.. “Time discretization of continuous-time filters and smoothers for HMM parameter estimation”. In: IEEE Transactions on Information Theory 42.2 (Mar. 1996), pp. 593–605.
Jordan, R., Kinderlehrer, D., and Otto, F.. “The variational formulation of the Fokker–Planck equation”. In: SIAM Journal on Mathematical Analysis 29.1 (1998), pp. 1–17.
Jobert, A. and Rogers, L. C. G.. “Valuations and dynamic convex risk measures”. In: Mathematical Finance 18.1 (2008), pp. 1–22.
Krishnamurthy, V. and Abad, F. V.. “Gradient based policy optimization of constrained unichain Markov decision processes”. In: Stochastic Processes, Finance and Control: A Festschrift in Honor of Robert J. Elliott. Ed. by Cohen, S., Madan, D., and Siu, T.. http://arxiv.org/abs/1110.4946. World Scientific, 2012.
Krishnamurthy, V., Aprem, A., and Bhatt, S.. “Multiple stopping time POMDPs: Structural results & application in interactive advertising on social media”. In: Automatica 95 (2018), pp. 385–398.
Kailath, T.. Linear Systems. Prentice Hall, 1980.
Kallenberg, O.. Probabilistic Symmetries and Invariance Principles. Vol. 9. Springer, 2005.
Kalman, R. E.. “A new approach to linear filtering and prediction problems”. In: Transactions of the ASME, Series D (Journal of Basic Engineering) 82 (Mar. 1960), pp. 35–45.
Kalman, R. E.. “When is a linear control system optimal?” In: Journal of Basic Engineering (Apr. 1964), pp. 51–60.
Kamalaruban, P. et al. “Robust reinforcement learning via adversarial training with Langevin dynamics”. In: arXiv preprint arXiv:2002.06063 (2020).
Karlin, S.. Total Positivity. Vol. 1. Stanford University Press, 1968.
Kingma, D. and Ba, J.. “Adam: a method for stochastic optimization”. In: International Conference on Learning Representations (ICLR). 2015.
Krishnamurthy, V. and Bhatt, S.. “Sequential detection of market shocks with risk-averse CVaR social sensors”. In: IEEE Journal of Selected Topics in Signal Processing 10.6 (2016), pp. 1061–1072.
Kalman, R. E. and Bucy, R. S.. “New results in linear filtering and prediction theory”. In: Transactions of the ASME, Series D (Journal of Basic Engineering) 83 (Mar. 1961), pp. 95–108.
Kushner, H. J. and Clark, D. S.. Stochastic Approximation Methods for Constrained and Unconstrained Systems. Springer-Verlag, 1978.
Korattikara, A., Chen, Y., and Welling, M.. “Austerity in MCMC land: Cutting the Metropolis–Hastings budget”. In: International Conference on Machine Learning. 2014, pp. 181–189.
Krishnamurthy, V. and Djonin, D.. “Structured threshold policies for dynamic sensor scheduling – A partially observed Markov decision process approach”. In: IEEE Transactions on Signal Processing 55.10 (Oct. 2007), pp. 4938–4957.
Krishnamurthy, V. and Djonin, D.. “Optimal threshold policies for multivariate POMDPs in radar resource management”. In: IEEE Transactions on Signal Processing 57.10 (2009), pp. 3954–3969.
Katsikopoulos, K. V. and Engelbrecht, S. E.. “Markov decision processes with delays and asynchronous cost collection”. In: IEEE Transactions on Automatic Control 48.4 (2003), pp. 568–574.
Krishnamurthy, V., Gharehshiran, O. N., and Hamdi, M.. “Interactive sensing and decision making in social networks”. In: Foundations and Trends® in Signal Processing 7.1-2 (2014), pp. 1–196.
Krishnamurthy, V. and Hoiles, W.. “Afriat’s test for detecting malicious agents”. In: IEEE Signal Processing Letters 19.12 (2012), pp. 801–804.
Krishnamurthy, V. and Hoiles, W.. “Online reputation and polling systems: Data incest, social learning and revealed preferences”. In: IEEE Transactions on Computational Social Systems 1.3 (Jan. 2015), pp. 164–179.
Khalil, H. K.. Nonlinear Systems. 3rd ed. Prentice Hall, 2002.
Kurniawati, H., Hsu, D., and Lee, W. S.. “SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces”. In: Robotics: Science and Systems Conference. 2008.
Kijima, M.. Markov Processes for Stochastic Modelling. Chapman and Hall, 1997.
Keilson, J. and Kester, A.. “Monotone matrices and monotone Markov processes”. In: Stochastic Processes and their Applications 5.3 (1977), pp. 231–241.
Kolmogorov, A. N.. “Interpolation and extrapolation of stationary random sequences”. In: Bull. Acad. Sci. U.S.S.R, Ser. Math. 5 (1941), pp. 3–14.
Kolmogorov, A. N.. “Stationary sequences in Hilbert space”. In: Bull. Math. Univ. Moscow 2.6 (1941).
Komenda, J., Lahaye, S., Boimond, J.-L., and van den Boom, T.. “Max-plus algebra in the history of discrete event systems”. In: Annual Reviews in Control 45 (2018), pp. 240–249.
Krishnamurthy, V. and Poor, H. V.. “Social learning and Bayesian games in multiagent signal processing: How do local and global decision makers interact?” In: IEEE Signal Processing Magazine 30.3 (2013), pp. 43–57.
Krishnamurthy, V. and Pareek, U.. “Myopic bounds for optimal policy of POMDPs: An extension of Lovejoy’s structural results”. In: Operations Research 62.2 (2015), pp. 428–434.
Kloeden, P. E. and Platen, E.. Numerical Solution of Stochastic Differential Equations. Springer, 1992.
Kontorovich, L. and Ramanan, K.. “Concentration inequalities for dependent random variables via the martingale method”. In: The Annals of Probability 36.6 (2008), pp. 2126–2158.
Krishnamurthy, V. and Rojas, C.. “Reduced complexity HMM filtering with stochastic dominance bounds: A convex optimization approach”. In: IEEE Transactions on Signal Processing 62.23 (2014), pp. 6309–6322.
Krishnamurthy, V. and Rangaswamy, M.. “How to calibrate your adversary’s capabilities? Inverse filtering for counter-autonomous systems”. In: IEEE Transactions on Signal Processing 67.24 (2019), pp. 6511–6525.
Krishnamurthy, V. and Rojas, C.. “Slow convergence of interacting Kalman filters in word-of-mouth social learning”. In: 60th Annual Allerton Conference on Communication, Control and Computing. IEEE, 2024.
Karlin, S. and Rinott, Y.. “Classes of orderings of measures and related correlation inequalities. I. Multivariate totally positive distributions”. In: Journal of Multivariate Analysis 10.4 (Dec. 1980), pp. 467–498.
Krishnamurthy, V., Bitmead, R., Gevers, M., and Miehling, E.. “Sequential detection with mutual information stopping cost: Application in GMTI radar”. In: IEEE Transactions on Signal Processing 60.2 (2012), pp. 700–714.
Krishnamurthy, V., Angley, D., Evans, R., and Moran, W.. “Identifying cognitive radars – Inverse reinforcement learning using revealed preferences”. In: IEEE Transactions on Signal Processing 68 (2020), pp. 4529–4542.
Krishnamurthy, V., Pattanayak, K., Gogineni, S., Kang, B., and Rangaswamy, M.. “Adversarial radar inference: Inverse tracking, identifying cognition and designing smart interference”. In: IEEE Transactions on Aerospace and Electronic Systems 57.4 (2021), pp. 2067–2081.
Krishnamurthy, V.. “Algorithms for optimal scheduling and management of hidden Markov model sensors”. In: IEEE Transactions on Signal Processing 50.6 (June 2002), pp. 1382–1397.
Krishnamurthy, V.. “Bayesian sequential detection with phase-distributed change time and non-linear penalty – A lattice programming POMDP approach”. In: IEEE Transactions on Information Theory 57.3 (Oct. 2011), pp. 7096–7124.
Krishnamurthy, V.. “Quickest detection POMDPs with social learning: Interaction of local and global decision makers”. In: IEEE Transactions on Information Theory 58.8 (2012), pp. 5563–5587.
Krishnamurthy, V.. “How to schedule measurements of a noisy Markov chain in decision making?” In: IEEE Transactions on Information Theory 59.9 (July 2013), pp. 4440–4461.
Krishnamurthy, V.. “Convex stochastic dominance in Bayesian localization, filtering and controlled sensing POMDPs”. In: IEEE Transactions on Information Theory 66.5 (2019), pp. 3187–3201.
Kuleshov, V. and Schrijvers, O.. “Inverse game theory: learning utilities in succinct games”. In: International Conference on Web and Internet Economics. Springer. 2015, pp. 413–427.
Karatzas, I. and Shreve, S.. Brownian Motion and Stochastic Calculus. 2nd ed. Springer, 1991.
Karatzas, I. and Shreve, S.. Methods of Mathematical Finance. Vol. 39. Springer, 1998.
Kahneman, D. and Tversky, A.. “Prospect theory: An analysis of decision under risk”. In: Econometrica 47.2 (1979), pp. 263–291.
Karlin, S. and Taylor, H. M.. A Second Course in Stochastic Processes. Academic Press, 1981.
Kunsch, H. R.. “Recursive Monte Carlo filters: Algorithms and theoretical analysis”. In: The Annals of Statistics 33.5 (2005), pp. 1983–2021.
Kurniawati, H.. “Partially observable Markov decision processes and robotics”. In: Annual Review of Control, Robotics, and Autonomous Systems 5 (2022), pp. 253–277.
Kurtz, T. G.. Approximation of Population Processes. Vol. 36. SIAM, 1981.
Kushner, H. J.. “A new method of locating the maximum point of an arbitrary multi-peak curve in the presence of noise”. In: Journal of Fluids Engineering 86.1 (1964), pp. 97–106.
Kushner, H. J.. “Dynamical equations for optimal nonlinear filtering”. In: Journal of Differential Equations 3 (1967), pp. 179–190.
Kushner, H. J.. “A robust discrete state approximation to the optimal nonlinear filter for a diffusion”. In: Stochastics 3.2 (1979), pp. 75–83.
Kushner, H. J.. Approximation and Weak Convergence Methods for Random Processes, with Applications to Stochastic Systems Theory. MIT Press, 1984.
Kumar, P. R. and Varaiya, P.. Stochastic Systems – Estimation, Identification and Adaptive Control. Prentice-Hall, 1986.
Krishnamurthy, V. and Wahlberg, B.. “POMDP multiarmed bandits – Structural results”. In: Mathematics of Operations Research 34.2 (May 2009), pp. 287–302.
Kingma, D. P. and Welling, M.. “An introduction to variational autoencoders”. In: Foundations and Trends® in Machine Learning 12.4 (2019), pp. 307–392.
Kwon, M., Daptardar, S., Schrater, P., and Pitkow, X.. “Inverse rational control with partially observable continuous nonlinear dynamics”. In: arXiv preprint arXiv:2009.12576 (2020).
Kushner, H. J. and Yin, G.. Stochastic Approximation and Recursive Algorithms and Applications. 2nd ed. Springer-Verlag, 2003.
Krishnamurthy, V. and Yin, G.. “Langevin dynamics for adaptive inverse reinforcement learning of stochastic gradient algorithms”. In: Journal of Machine Learning Research 22 (2021), pp. 1–49.
Krishnamurthy, V. and Yin, G.. “Multikernel passive stochastic gradient algorithms and transfer learning”. In: IEEE Transactions on Automatic Control 67.4 (2022), pp. 1792–1805.
Lin, X., Adams, S. C., and Beling, P. A.. “Multi-agent inverse reinforcement learning for certain general-sum stochastic games”. In: Journal of Artificial Intelligence Research 66 (2019), pp. 473–502.
Li, T., Bolic, M., and Djuric, P. M.. “Resampling methods for particle filtering: classification, implementation, and strategies”. In: IEEE Signal Processing Magazine 32.3 (2015), pp. 70–86.
Levine, R. and Casella, G.. “Implementations of the Monte Carlo EM algorithm”. In: Journal of Computational and Graphical Statistics 10.3 (Sept. 2001), pp. 422–439.
Liu, J. S. and Chen, R.. “Sequential Monte Carlo methods for dynamic systems”. In: Journal of the American Statistical Association 93 (1998), pp. 1032–1044.
Littman, M., Cassandra, A. R., and Kaelbling, L.. “Learning policies for partially observable environments: scaling up”. In: International Conference on Machine Learning. 1995, pp. 362–370.
Le Cam, L.. “Comparison of experiments: A short review”. In: Lecture Notes – Monograph Series (1996), pp. 127–138.
Lee, K. et al. “Generalized Tsallis entropy reinforcement learning and its application to soft mobile robots.” In: Robotics: Science and Systems. Vol. 16. 2020, pp. 1–10.
Lei, J., G’Sell, M., Rinaldo, A., Tibshirani, R. J., and Wasserman, L.. “Distribution-free predictive inference for regression”. In: Journal of the American Statistical Association 113.523 (2018), pp. 1094–1111.
Lei, J.. “Classification with confidence”. In: Biometrika 101.4 (2014), pp. 755–769.
Leroux, B. G.. “Maximum-likelihood estimation for hidden Markov models”. In: Stochastic Processes and their Applications 40 (1992), pp. 127–143.
Logothetis, A. and Isaksson, A.. “On sensor scheduling via information theoretic criteria”. In: American Control Conference. 1999, pp. 2402–2406.
Lindvall, T.. Lectures on the Coupling Method. Courier Dover Publications, 2002.
Littman, M. L.. “A tutorial on partially observable Markov decision processes”. In: Journal of Mathematical Psychology 53.3 (2009), pp. 119–125.
Littman, M. L.. “Algorithms for sequential decision making”. PhD thesis. Brown University, 1996.
Liu, J. S.. Monte Carlo Strategies in Scientific Computing. Springer-Verlag, 2001.
Ljung, L.. “Analysis of recursive stochastic algorithms”. In: IEEE Transactions on Automatic Control AC-22.4 (1977), pp. 551–575.
Ljung, L.. System Identification. 2nd ed. Prentice Hall, 1999.
Levine, S. and Koltun, V.. “Continuous inverse optimal control with locally optimal examples”. In: arXiv preprint arXiv:1206.4617 (2012).
LeGland, F. and Mevel, L.. “Exponential forgetting and geometric ergodicity in hidden Markov models”. In: Mathematics of Control, Signals and Systems 13.1 (2000), pp. 63–93.
Lobel, I., Acemoglu, D., Dahleh, M., and Ozdaglar, A.. “Preliminary results on social learning with partial observations”. In: International Conference on Performance Evaluation Methodologies and Tools. ACM, 2007.
López-Pintado, D.. “Diffusion in complex social networks”. In: Games and Economic Behavior 62.2 (2008), pp. 573–590.
Louis, T.. “Finding the observed information matrix when using the EM algorithm”. In: Journal of the Royal Statistical Society 44(B) (1982), pp. 226–233.
Lovejoy, W. S.. “On the convexity of policy regions in partially observed systems”. In: Operations Research 35.4 (July 1987), pp. 619–621.
Lovejoy, W. S.. “Ordered solutions for dynamic programs”. In: Mathematics of Operations Research 12.2 (1987), pp. 269–276.
Lovejoy, W. S.. “Some monotonicity results for partially observed Markov decision processes”. In: Operations Research 35.5 (Sept. 1987), pp. 736–743.
Lovejoy, W. S.. “A survey of algorithmic methods for partially observed Markov decision processes”. In: Annals of Operations Research 28 (1991), pp. 47–66.
Lovejoy, W. S.. “Computationally feasible bounds for partially observed Markov decision processes”. In: Operations Research 39.1 (Jan. 1991), pp. 162–175.
Lai, T. and Robbins, H.. “Asymptotically efficient adaptive allocation rules”. In: Advances in Applied Mathematics 6.1 (1985), pp. 4–22.
Liu, C. and Rubin, D.. “The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence”. In: Biometrika 81.4 (1994), pp. 633–648.
Lehmann, E. L., Romano, J. P., and Casella, G.. Testing Statistical Hypotheses. Vol. 3. Springer, 2005.
Ljung, L. and Söderström, T.. Theory and Practice of Recursive Identification. MIT Press, 1983.
Ledoux, M. and Talagrand, M.. Probability in Banach Spaces: Isoperimetry and Processes. Vol. 23. Springer Science & Business Media, 1991.
Luenberger, D.. Optimization by Vector Space Methods. Wiley, 1969.
Liu, Z. and Vandenberghe, L.. “Interior-point method for nuclear norm approximation with application to system identification”. In: SIAM Journal on Matrix Analysis and Applications 31.3 (2009), pp. 1235–1256.
Luenberger, D. and Ye, Y.. Linear and Nonlinear Programming. 4th ed. Springer, 2016.
Liu, K. and Zhao, Q.. “Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access”. In: IEEE Transactions on Information Theory 56.11 (2010), pp. 5547–5567.
Markowitz, H.. “Portfolio selection”. In: The Journal of Finance 7.1 (1952), pp. 77–91.
Marcus, S. I.. “Algebraic and geometric methods in nonlinear filtering”. In: SIAM Journal on Control and Optimization 22.6 (Nov. 1984), pp. 817–844.
Mas-Colell, A.. “On revealed preference analysis”. In: The Review of Economic Studies (1978), pp. 121–131.
Mattila, R., Rojas, C. R., Krishnamurthy, V., and Wahlberg, B.. “Computing monotone policies for Markov decision processes: A nearly-isotonic penalty approach”. In: IFAC-PapersOnLine 50.1 (2017), pp. 8429–8434.
Mattila, R., Rojas, C., Moulines, E., Krishnamurthy, V., and Wahlberg, B.. “Fast and consistent learning of hidden Markov models by incorporating non-consecutive correlations”. In: International Conference on Machine Learning. Vol. 119. 13–18 Jul 2020, pp. 6785–6796.
Mattila, R., Rojas, C. R., Krishnamurthy, V., and Wahlberg, B.. “Inverse filtering for hidden Markov models with applications to counter-adversarial autonomous systems”. In: IEEE Transactions on Signal Processing 68 (Aug. 2020), pp. 4987–5002.
Mayne, D. Q., Rawlings, J. B., Rao, C. V., and Scokaert, P.. “Constrained model predictive control: stability and optimality”. In: Automatica 36.6 (2000), pp. 789–814.
Moulines, E. and Bach, F.. “Non-asymptotic analysis of stochastic approximation algorithms for machine learning”. In: Advances in Neural Information Processing Systems 24 (2011), pp. 451–459.
McFadden, D.. “Economic choices”. In: American Economic Review 91.3 (2001), pp. 351–378.
Meng, X. L.. “On the rate of convergence of the ECM algorithm”. In: The Annals of Statistics 22.1 (1994), pp. 326–339.
Mohler, R. and Hwang, C.. “Nonlinear data observability and information”. In: Journal of Franklin Institute 325.4 (1988), pp. 443–464.
Milgrom, P.. “Good news and bad news: Representation theorems and applications”. In: Bell Journal of Economics 12.2 (1981), pp. 380–391.
MacPhee, I. and Jordan, B.. “Optimal search for a moving target”. In: Probability in the Engineering and Informational Sciences 9 (1995), pp. 159–182.
McLachlan, G. J. and Krishnan, T.. The EM Algorithm and Extensions. Wiley, 1996.
Matějka, F. and McKay, A.. “Rational inattention to discrete choices: a new foundation for the multinomial logit model”. In: American Economic Review 105.1 (2015), pp. 272–298.
Miller, S. M. and Mangan, C. E.. “Interacting effects of information and coping style in adapting to gynecologic stress: Should the doctor tell all?” In: Journal of Personality and Social Psychology 45.1 (1983), pp. 223–236.
MacEachern, S. N. and Müller, P.. “Estimating mixture of Dirichlet process models”. In: Journal of Computational and Graphical Statistics 7.2 (1998), pp. 223–238.
Maćkowiak, B., Matějka, F., and Wiederholt, M.. “Rational inattention: a review”. In: Journal of Economic Literature 61.1 (2023), pp. 226273.CrossRefGoogle Scholar
Molloy, T. L. and Nair, G. N.. “Smoother entropy for active state trajectory estimation and obfuscation in POMDPs”. In: IEEE Transactions on Automatic Control 68.6 (2023), pp. 35573572.CrossRefGoogle Scholar
Mnih, V. et al. “Human-level control through deep reinforcement learning”. In: Nature 518.7540 (2015), pp. 529533.CrossRefGoogle ScholarPubMed
Monahan, G. E.. “A survey of partially observable Markov decision processes: theory, models and algorithms”. In: Management Science 28.1 (Jan. 1982), pp. 116.CrossRefGoogle Scholar
Moral, P. D.. Feynman–Kac Formulae – Genealogical and Interacting Particle Systems with Applications. Springer-Verlag, 2004.Google Scholar
Moustakides, G. B.. “Optimal stopping times for detecting changes in distributions”. In: Annals of Statistics 14 (1986), pp. 13791387.CrossRefGoogle Scholar
Meier, L., Perschon, J., and Dressler, R.. “Optimal control of measurement subsystems”. In: IEEE Transactions on Automatic Control 12.5 (Oct. 1967), pp. 528536.CrossRefGoogle Scholar
Muller, A. and Stoyan, D.. Comparison Methods for Stochastic Models and Risk. Wiley, 2002.Google Scholar
Milgrom, P. and Shannon, C.. “Monotone comparative statics”. In: Econometrica 62.1 (1994), pp. 157180.CrossRefGoogle Scholar
Monderer, D. and Shapley, L. S.. “Potential games”. In: Games and Economic Behavior 14.1 (1996), pp. 124143.CrossRefGoogle Scholar
Manning, C. D. and Schütze, H.. Foundations of Statistical Natural Language Processing. The MIT Press, 1999.Google Scholar
Meyn, S. P. and Tweedie, R. L.. Markov Chains and Stochastic Stability. Cambridge University Press, 2009.CrossRefGoogle Scholar
Makino, T. and Takeuchi, J.. “Apprenticeship learning for model parameters of partially observable environments”. In: arXiv preprint arXiv:1206.6484 (2012).Google Scholar
Molavi, P., Tahbaz-Salehi, A., and Jadbabaie, A.. “A theory of non-Bayesian social learning”. In: Econometrica 86.2 (2018), pp. 445490.CrossRefGoogle Scholar
Muller, A.. “How does the value function of a Markov decision process depend on the transition probabilities?” In: Mathematics of Operations Research 22 (1997), pp. 872885.CrossRefGoogle Scholar
Marcus, S. I. and Willsky, A. S.. “Algebraic structure and finite dimensional nonlinear estimation”. In: SIAM Journal on Mathematical Analysis 9.2 (Apr. 1978), pp. 312327.CrossRefGoogle Scholar
Nakai, T. "The problem of optimal stopping in a partially observable Markov chain". In: Journal of Optimization Theory and Applications 45.3 (1985), pp. 425–442.
Natarajan, S. et al. "Multi-agent inverse reinforcement learning". In: International Conference on Machine Learning. IEEE. 2010, pp. 395–400.
Neal, R. M. "Markov chain sampling methods for Dirichlet process mixture models". In: Journal of Computational and Graphical Statistics 9.2 (2000), pp. 249–265.
Nettasinghe, B., Alipourfard, N., Iota, S., Krishnamurthy, V., and Lerman, K. "Scale-free degree distributions, homophily and the glass ceiling effect in directed networks". In: Journal of Complex Networks 10.2 (2022).
Nettasinghe, B., Chatterjee, S., Tipireddy, R., and Halappanavar, M. "Extending conformal prediction to hidden Markov models with exact validity via de Finetti's theorem for Markov chains". In: International Conference on Machine Learning. 2023.
Neuts, M. F. Structured Stochastic Matrices of M/G/1 Type and Their Applications. Marcel Dekker, 1989.
Neyman, A. "Correlated equilibrium and potential games". In: International Journal of Game Theory 26.2 (1997), pp. 223–227.
Ng, A. and Jordan, M. "PEGASUS: A policy search method for large MDPs and POMDPs". In: Conference on Uncertainty in Artificial Intelligence. 2000, pp. 406–415.
Neu, G., Jonsson, A., and Gómez, V. "A unified view of entropy-regularized Markov decision processes". In: arXiv preprint arXiv:1705.07798 (2017).
Ngo, M. H. and Krishnamurthy, V. "Monotonicity of constrained optimal transmission policies in correlated fading channels with ARQ". In: IEEE Transactions on Signal Processing 58.1 (2010), pp. 438–451.
Nettasinghe, B. and Krishnamurthy, V. "What do your friends think?: Efficient polling methods for networks using friendship paradox". In: IEEE Transactions on Knowledge and Data Engineering (2019).
Nishimura, H., Ok, E. A., and Quah, J. K.-H. "A comprehensive approach to revealed preference theory". In: American Economic Review 107.4 (2017), pp. 1239–63.
Nazin, A. V., Polyak, B. T., and Tsybakov, A. B. "Passive stochastic approximation". In: Automation and Remote Control 50 (1989), pp. 1563–1569.
Ng, A. and Russell, S. "Algorithms for inverse reinforcement learning". In: International Conference on Machine Learning. 2000, pp. 663–670.
Neufeld, A. and Sester, J. "Robust Q-learning algorithm for Markov decision processes under Wasserstein uncertainty". In: Automatica 168 (2024), p. 111825.
Ottaviani, M. and Sørensen, P. "Information aggregation in debate: who should speak first?" In: Journal of Public Economics 81.3 (2001), pp. 393–421.
Pardoux, E. "Équations du filtrage non linéaire, de la prédiction et du lissage". In: Stochastics 6 (1982), pp. 193–231.
Patek, S. "On partially observed stochastic shortest path problems". In: IEEE Conference on Decision and Control. 2001, pp. 5050–5055.
Pflug, G. Optimization of Stochastic Models: The Interface between Simulation and Optimization. Kluwer Academic Publishers, 1996.
Pineau, J., Gordon, G., and Thrun, S. "Point-based value iteration: an anytime algorithm for POMDPs". In: International Joint Conference on Artificial Intelligence. Vol. 3. 2003, pp. 1025–1032.
Poor, H. V. and Hadjiliadis, O. Quickest Detection. Cambridge University Press, 2008.
Pinedo, M. L. Scheduling: Theory, Algorithms, and Systems. Springer-Verlag, 2022.
Polyak, B. T. and Juditsky, A. B. "Acceleration of stochastic approximation by averaging". In: SIAM Journal on Control and Optimization 30.4 (July 1992), pp. 838–855.
Pattanayak, K. and Krishnamurthy, V. "Necessary and sufficient conditions for inverse reinforcement learning of Bayesian stopping time problems". In: Journal of Machine Learning Research 24.52 (2023), pp. 1–64.
Pattanayak, K. and Krishnamurthy, V. "Unifying revealed preference and revealed rational inattention". In: arXiv preprint arXiv:2106.14486 (2023).
Pattanayak, K., Krishnamurthy, V., and Berry, C. M. "Meta-cognitive radar: masking cognition from an inverse reinforcement learner". In: IEEE Transactions on Aerospace and Electronic Systems 59.6 (Dec. 2023), pp. 8826–8844.
Pattanayak, K., Krishnamurthy, V., and Jain, A. "Interpretable deep image classification using rationally inattentive utility maximization". In: IEEE Journal of Selected Topics in Signal Processing 18 (Apr. 2024), pp. 168–183.
Park, D., Khan, H., and Yener, B. "Generation & evaluation of adversarial examples for malware obfuscation". In: IEEE International Conference on Machine Learning and Applications. IEEE. 2019, pp. 1283–1290.
Platzman, L. "Optimal infinite-horizon undiscounted control of finite probabilistic systems". In: SIAM Journal on Control and Optimization 18 (1980), pp. 362–380.
Pollard, D. Convergence of Stochastic Processes. Springer-Verlag, 2012.
Pollock, S. "A simple model of search for a moving target". In: Operations Research 18 (1970), pp. 893–903.
Poor, H. V. "Quickest detection with exponential penalty for delay". In: The Annals of Statistics 26.6 (1998), pp. 2179–2205.
Pötscher, B. M. and Prucha, I. R. Dynamic Nonlinear Econometric Models: Asymptotic Theory. Springer-Verlag, 1997.
Parr, R. and Russell, S. "Approximating optimal policies for partially observable stochastic domains". In: International Joint Conference on Artificial Intelligence. Vol. 95. 1995, pp. 1088–1094.
Prelec, D. "A Bayesian truth serum for subjective data". In: Science 306.5695 (2004), pp. 462–466.
Peskir, G. and Shiryaev, A. Optimal Stopping and Free-Boundary Problems. Springer, 2006.
Piggott, M. J. and Solo, V. "Diffusion LMS with correlated regressors I: realization-wise stability". In: IEEE Transactions on Signal Processing 64.21 (2016), pp. 5473–5484.
Papadimitriou, C. H. and Tsitsiklis, J. "The complexity of Markov decision processes". In: Mathematics of Operations Research 12.3 (1987), pp. 441–450.
Puterman, M. Markov Decision Processes. John Wiley, 1994.
Pastor-Satorras, R. and Vespignani, A. "Epidemic spreading in scale-free networks". In: Physical Review Letters 86.14 (2001), p. 3200.
Pitman, J. and Yor, M. "The two-parameter Poisson–Dirichlet distribution derived from a stable subordinator". In: The Annals of Probability (1997), pp. 855–900.
Quah, J. and Strulovici, B. "Comparative statics, informativeness, and the interval dominance order". In: Econometrica 77.6 (2009), pp. 1949–1992.
Quah, J. and Strulovici, B. "Aggregating the single crossing property". In: Econometrica 80.5 (2012), pp. 2333–2348.
Rabiner, L. R. "A tutorial on hidden Markov models and selected applications in speech recognition". In: Proceedings of the IEEE 77.2 (1989), pp. 257–285.
Ristic, B., Arulampalam, S., and Gordon, N. Beyond the Kalman Filter: Particle Filters for Tracking Applications. Artech House, 2004.
Raginsky, M. "Shannon meets Blackwell and Le Cam: channels, codes, and statistical experiments". In: IEEE International Symposium on Information Theory. IEEE. 2011, pp. 1220–1224.
Raginsky, M. "Channel polarization and Blackwell measures". In: IEEE International Symposium on Information Theory. IEEE. 2016, pp. 56–60.
Ratliff, N. D., Bagnell, J. A., and Zinkevich, M. A. "Maximum margin planning". In: International Conference on Machine Learning. 2006, pp. 729–736.
Robert, C. P. and Casella, G. Monte Carlo Statistical Methods. Springer-Verlag, 2013.
Reny, P. J. "A characterization of rationalizable consumer behavior". In: Econometrica 83.1 (2015), pp. 175–192.
Roy, N., Gordon, G., and Thrun, S. "Finding approximate POMDP solutions through belief compression". In: Journal of Artificial Intelligence Research 23 (2005), pp. 1–40.
Richter, M. K. "Revealed preference theory". In: Econometrica (1966), pp. 635–645.
Riedel, F. "Dynamic coherent risk measures". In: Stochastic Processes and their Applications 112.2 (2004), pp. 185–200.
Rieder, U. "Structural results for partially observed control models". In: Methods and Models of Operations Research 35.6 (1991), pp. 473–490.
Rahimian, M. A. and Jadbabaie, A. "Bayesian learning without recall". In: IEEE Transactions on Signal and Information Processing over Networks 3.3 (2016), pp. 592–606.
Rockafellar, R. Convex Analysis. Princeton University Press, 1970.
Ross, S., Pineau, J., Paquet, S., and Chaib-Draa, B. "Online planning algorithms for POMDPs". In: Journal of Artificial Intelligence Research 32 (2008), pp. 663–704.
Ross, S. Simulation. 5th ed. Academic Press, 2013.
Rosenblatt, M. "Remarks on some nonparametric estimates of a density function". In: The Annals of Mathematical Statistics 27.3 (1956), pp. 832–837.
Ross, S. "Arbitrary state Markovian decision processes". In: The Annals of Mathematical Statistics (1968), pp. 2118–2122.
Ross, S. Introduction to Stochastic Dynamic Programming. Academic Press, 1983.
Raginsky, M., Rakhlin, A., and Telgarsky, M. "Non-convex learning via stochastic gradient Langevin dynamics: a nonasymptotic analysis". In: Conference on Learning Theory. 2017, pp. 1674–1703.
Rakhlin, A., Shamir, O., and Sridharan, K. "Making gradient descent optimal for strongly convex stochastic optimization". In: arXiv preprint arXiv:1109.5647 (2011).
Rockafellar, R. T. and Uryasev, S. "Optimization of conditional value-at-risk". In: Journal of Risk 2 (2000), pp. 21–42.
Rudin, W. Principles of Mathematical Analysis. McGraw-Hill, 1976.
Ruszczyński, A. "Risk-averse dynamic programming for Markov decision processes". In: Mathematical Programming 125.2 (2010), pp. 235–261.
Rust, J. "Structural estimation of Markov decision processes". In: Handbook of Econometrics 4 (1994), pp. 3081–3143.
Raghavan, V. and Veeravalli, V. "Bayesian quickest change process detection". In: IEEE International Symposium on Information Theory. 2009, pp. 644–648.
Rasmussen, C. E. and Williams, C. Gaussian Processes for Machine Learning. MIT Press, 2006.
Rothschild, D. M. and Wolfers, J. "Forecasting elections: voter intentions versus expectations". In: Available at SSRN 1884644 (2011).
Revuz, D. and Yor, M. Continuous Martingales and Brownian Motion. Springer, 2013.
Rieder, U. and Zagst, R. "Monotonicity and bounds for convex stochastic control models". In: Mathematical Methods of Operations Research 39.2 (June 1994), pp. 187–207.
Samuelson, P. "A note on the pure theory of consumer's behaviour". In: Economica 20.4 (1938), pp. 61–71.
Sayed, A. H. Adaptive Filters. Wiley, 2008.
Sayed, A. H. "Adaptation, learning, and optimization over networks". In: Foundations and Trends in Machine Learning 7.4–5 (2014), pp. 311–801.
Sutton, R. and Barto, A. Reinforcement Learning: An Introduction. MIT Press, 2018.
Shani, G., Brafman, R., and Shimony, S. "Forward search value iteration for POMDPs". In: International Joint Conference on Artificial Intelligence. 2007, pp. 2619–2624.
Schulman, J., Levine, S., Abbeel, P., Jordan, M., and Moritz, P. "Trust region policy optimization". In: International Conference on Machine Learning. 2015, pp. 1889–1897.
Seneta, E. Non-negative Matrices and Markov Chains. Springer-Verlag, 1981.
Sennott, L. I. Stochastic Dynamic Programming and the Control of Queueing Systems. Wiley, 1999.
Sethuraman, J. "A constructive definition of Dirichlet priors". In: Statistica Sinica (1994), pp. 639–650.
Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and De Freitas, N. "Taking the human out of the loop: a review of Bayesian optimization". In: Proceedings of the IEEE 104.1 (2015), pp. 148–175.
Shannon, C. E. "A note on a partial ordering for communication channels". In: Information and Control 1.4 (1958), pp. 390–397.
Shiryaev, A. N. "On optimum methods in quickest detection problems". In: Theory of Probability and its Applications 8.1 (1963), pp. 22–46.
Silverman, B. W. Density Estimation for Statistics and Data Analysis. Routledge, 2018.
Sims, C. A. "Implications of rational inattention". In: Journal of Monetary Economics 50.3 (2003), pp. 665–690.
Sims, C. A. "Rational inattention and monetary economics". In: Handbook of Monetary Economics. Vol. 3. Elsevier, 2010, pp. 155–181.
Singh, S., Jaakkola, T., Littman, M. L., and Szepesvári, C. "Convergence results for single-step on-policy reinforcement-learning algorithms". In: Machine Learning 38 (2000), pp. 287–308.
Singh, S. and Krishnamurthy, V. "The optimal search for a Markovian target when the search path is constrained: the infinite horizon case". In: IEEE Transactions on Automatic Control 48.3 (Mar. 2003), pp. 487–492.
Simmons, R. and Koenig, S. "Probabilistic navigation in partially observable environments". In: International Joint Conference on Artificial Intelligence. Morgan Kaufmann, 1995, pp. 1080–1087.
Solo, V. and Kong, X. Adaptive Signal Processing Algorithms: Stability and Performance. Prentice Hall, 1995.
Snow, L., Krishnamurthy, V., and Sadler, B. M. "Identifying coordination in a cognitive radar network – A multi-objective inverse reinforcement learning approach". In: IEEE International Conference on Acoustics, Speech and Signal Processing. 2023, pp. 1–5.
Smith, J. E. and McCardle, K. F. "Structural properties of stochastic dynamic programs". In: Operations Research 50.5 (2002), pp. 796–809.
Sondik, E. J. "The optimal control of partially observed Markov processes". PhD thesis. Electrical Engineering, Stanford University, 1971.
Sondik, E. J. "The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs". In: Operations Research 26.2 (Mar. 1978), pp. 282–304.
Spall, J. Introduction to Stochastic Search and Optimization. Wiley, 2003.
Shani, G., Pineau, J., and Kaplow, R. "A survey of point-based POMDP solvers". In: Autonomous Agents and Multi-Agent Systems 27.1 (2013), pp. 1–51.
Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M. "Gaussian process optimization in the bandit setting: no regret and experimental design". In: International Conference on Machine Learning. 2010, pp. 1015–1022.
Shafieepoorfard, E., Raginsky, M., and Meyn, S. P. "Rationally inattentive control of Markov processes". In: SIAM Journal on Control and Optimization 54.2 (2016), pp. 987–1016.
Smith, T. and Simmons, R. "Heuristic search value iteration for POMDPs". In: Conference on Uncertainty in Artificial Intelligence. AUAI Press. 2004, pp. 520–527.
Shaked, M. and Shanthikumar, J. G. Stochastic Orders. Springer-Verlag, 2007.
Smallwood, R. D. and Sondik, E. J. "Optimal control of partially observable Markov processes over a finite horizon". In: Operations Research 21 (1973), pp. 1071–1088.
Stone, L. "What's happened in search theory since the 1975 Lanchester prize". In: Operations Research 37.3 (May 1989), pp. 501–506.
Stratonovich, R. L. "Conditional Markov processes". In: Theory of Probability and its Applications 5.2 (1960), pp. 156–178.
Strauch, R. E. "Negative dynamic programming". In: The Annals of Mathematical Statistics 37.4 (1966), pp. 871–890.
Surowiecki, J. The Wisdom of Crowds. Anchor, 2005.
Spaan, M. and Vlassis, N. "Perseus: randomized point-based value iteration for POMDPs". In: Journal of Artificial Intelligence Research 24 (2005), pp. 195–220.
Shafer, G. and Vovk, V. "A tutorial on conformal prediction". In: Journal of Machine Learning Research 9.3 (2008).
Segal, M. and Weinstein, E. "A new method for evaluating the log-likelihood gradient, the Hessian, and the Fisher information matrix for linear dynamic systems". In: IEEE Transactions on Information Theory 35.3 (May 1989), pp. 682–687.
Tanner, M. A. Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions. Springer-Verlag, 1993.
Tibshirani, R. "Regression shrinkage and selection via the LASSO". In: Journal of the Royal Statistical Society. Series B (Methodological) (1996), pp. 267–288.
Tierney, L. "Markov chains for exploring posterior distributions". In: The Annals of Statistics (1994), pp. 1701–1728.
Teh, Y. W. and Jordan, M. I. "Hierarchical Bayesian nonparametric models with applications". In: Bayesian Nonparametrics 1 (2010), pp. 158–207.
Tichavsky, P., Muravchik, C. H., and Nehorai, A. "Posterior Cramér–Rao bounds for discrete-time nonlinear filtering". In: IEEE Transactions on Signal Processing 46.5 (May 1998), pp. 1386–1396.
Topkis, D. M. "Minimizing a submodular function on a lattice". In: Operations Research 26 (1978), pp. 305–321.
Topkis, D. M. Supermodularity and Complementarity. Princeton University Press, 1998.
Tartakovsky, A. G. and Veeravalli, V. V. "General asymptotic Bayesian theory of quickest change detection". In: Theory of Probability and its Applications 49.3 (2005), pp. 458–497.
Tsitsiklis, J. N. and Van Roy, B. "Average cost temporal-difference learning". In: Automatica 35.11 (1999), pp. 1799–1808.
Moon, T. and Weissman, T. "Universal filtering via hidden Markov modeling". In: IEEE Transactions on Information Theory 54.2 (2008), pp. 692–708.
Vapnik, V. N. Statistical Learning Theory. Wiley, 1998.
Varian, H. "Revealed preference and its applications". In: The Economic Journal 122.560 (2012), pp. 332–338.
Varian, H. "The nonparametric approach to demand analysis". In: Econometrica 50.1 (1982), pp. 945–973.
Varian, H. "Non-parametric tests of consumer behaviour". In: The Review of Economic Studies 50.1 (1983), pp. 99–110.
Vaswani, A. et al. "Attention is all you need". In: Advances in Neural Information Processing Systems (2017), pp. 5998–6008.
Vega-Redondo, F. Complex Social Networks. Vol. 44. Cambridge University Press, 2007.
Vershynin, R. High-Dimensional Probability: An Introduction with Applications in Data Science. Vol. 47. Cambridge University Press, 2018.
Vovk, V., Gammerman, A., and Shafer, G. Algorithmic Learning in a Random World. Vol. 29. Springer, 2005.
Villani, C. Optimal Transport: Old and New. Vol. 338. Springer, 2009.
Visnevski, N., Krishnamurthy, V., Wang, A., and Haykin, S. "Syntactic modeling and signal processing of multifunction radars: a stochastic context free grammar approach". In: Proceedings of the IEEE 95.5 (May 2007), pp. 1000–1025.
Vives, X. "How fast do rational agents learn?" In: The Review of Economic Studies 60.2 (1993), pp. 329–347.
Vives, X. "Learning from others: A welfare analysis". In: Games and Economic Behavior 20.2 (1997), pp. 177–200.
Wald, A. "Note on the consistency of the maximum likelihood estimate". In: The Annals of Mathematical Statistics (1949), pp. 595–601.
Williams, J., Fisher, J., and Willsky, A. "Approximate dynamic programming for communication-constrained sensor network management". In: IEEE Transactions on Signal Processing 55.8 (2007), pp. 4300–4311.
White, C. C. and Harrington, D. P. "Application of Jensen's inequality to adaptive suboptimal design". In: Journal of Optimization Theory and Applications 32.1 (1980), pp. 89–99.
Wong, E. and Hajek, B. Stochastic Processes in Engineering Systems. 2nd ed. Springer-Verlag, 1985.
Whittle, P. "A simple condition for regularity in negative programming". In: Journal of Applied Probability 16.2 (1979), pp. 305–318.
Whittle, P. "Multi-armed bandits and the Gittins index". In: Journal of the Royal Statistical Society B 42.2 (1980), pp. 143–149.
Whitt, W. "Multivariate monotone likelihood ratio and uniform conditional stochastic order". In: Journal of Applied Probability 19 (1982), pp. 695–701.
Wiener, N. The Extrapolation, Interpolation and Smoothing of Stationary Time Series. Wiley, 1949.
Wan, E. and van der Merwe, R. "The unscented Kalman filter for nonlinear estimation". In: Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000. IEEE. 2000, pp. 153–158.
Wonham, W. M. "Some applications of stochastic differential equations to optimal nonlinear filtering". In: SIAM Journal on Control 2.3 (1965), pp. 347–369.
Wulfmeier, M., Ondruska, P., and Posner, I. "Maximum entropy deep inverse reinforcement learning". In: arXiv preprint arXiv:1507.04888 (2015).
Welling, M. and Teh, Y. W. "Bayesian learning via stochastic gradient Langevin dynamics". In: International Conference on Machine Learning. 2011, pp. 681–688.
Wu, C. F. J. "On the convergence properties of the EM algorithm". In: The Annals of Statistics 11.1 (1983), pp. 95–103.
White, L. B. and Vu, H. X. "Maximum likelihood sequence estimation for hidden reciprocal processes". In: IEEE Transactions on Automatic Control 58.10 (2013), pp. 2670–2674.
Xie, J. et al. "Social consensus through the influence of committed minorities". In: Physical Review E 84.1 (2011), p. 011130.
Ye, Y. "The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate". In: Mathematics of Operations Research 36.4 (2011), pp. 593–603.
Yin, G., Ion, C., and Krishnamurthy, V. "How does a stochastic optimization/approximation algorithm adapt to a randomly evolving optimum/root with jump Markov sample paths". In: Mathematical Programming B (Special Issue dedicated to B. T. Polyak's 70th Birthday) 120.1 (2009), pp. 67–99.
Yin, G. "On extensions of Polyak's averaging approach to stochastic approximation". In: Stochastics and Stochastics Reports 36 (1991), pp. 245–264.
Yin, G. and Krishnamurthy, V. "LMS algorithms for tracking slow Markov chains with applications to hidden Markov estimation and adaptive multiuser detection". In: IEEE Transactions on Information Theory 51.7 (July 2005), pp. 2475–2490.
Yu, F. and Krishnamurthy, V. "Optimal joint session admission control in integrated WLAN and CDMA cellular network". In: IEEE Transactions on Mobile Computing 6.1 (Jan. 2007), pp. 126–139.
Yin, G. and Krishnamurthy, V. "Finite sample and large deviations analysis of stochastic gradient algorithm with correlated noise". In: arXiv preprint arXiv:2410.08449 (2024).
Yin, G., Krishnamurthy, V., and Ion, C. "Regime switching stochastic approximation algorithms with application to adaptive discrete stochastic optimization". In: SIAM Journal on Optimization 14.4 (2004), pp. 1171215.
Yakir, B., Krieger, A. M., and Pollak, M. "Detecting a change in regression: First-order optimality". In: The Annals of Statistics 27.6 (1999), pp. 1896–1913.
Yu, L., Song, J., and Ermon, S. "Multi-agent adversarial inverse reinforcement learning". In: International Conference on Machine Learning. 2019, pp. 7194–7201.
Yin, G. and Yin, K. "Passive stochastic approximation with constant step size and window width". In: IEEE Transactions on Automatic Control 41.1 (1996), pp. 90–106.
Zhao, Q., Tong, L., Swami, A., and Chen, Y. "Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: a POMDP framework". In: IEEE Journal on Selected Areas in Communications (2007), pp. 589–600.
Ziebart, B. D., Maas, A. L., Bagnell, J. A., and Dey, A. K. "Maximum entropy inverse reinforcement learning". In: AAAI Conference on Artificial Intelligence. Vol. 8. Chicago, IL, USA. 2008, pp. 1433–1438.
Ziegler, D. M. et al. "Fine-tuning language models from human preferences". In: arXiv preprint arXiv:1909.08593v2 (2020).
Zou, F., Yen, G. G., and Zhao, C. "Dynamic multiobjective optimization driven by inverse reinforcement learning". In: Information Sciences 575 (2021), pp. 468–484.

  • Bibliography
  • Vikram Krishnamurthy, Cornell University, New York
  • Book: Partially Observed Markov Decision Processes
  • Online publication: 16 May 2025
  • Chapter DOI: https://doi.org/10.1017/9781009449441.034