Bringing together idiomatic Python programming, foundational numerical methods, and physics applications, this is an ideal standalone textbook for courses on computational physics. All the frequently used numerical methods in physics are explained, including foundational techniques and hidden gems on topics such as linear algebra, differential equations, root-finding, interpolation, and integration. The second edition of this introductory book features several new codes and 140 new problems (many on physics applications), as well as new sections on the singular-value decomposition, derivative-free optimization, Bayesian linear regression, neural networks, and partial differential equations. The last section in each chapter is an in-depth project, tackling physics problems that cannot be solved without the use of a computer. Written primarily for students studying computational physics, this textbook brings the non-specialist quickly up to speed with Python before looking in detail at the numerical methods often used in the subject.
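As a flavour of the foundational techniques listed above, here is a minimal, illustrative Python sketch of root-finding by bisection; the test function and tolerance are chosen purely for illustration and are not taken from the book.

```python
# Illustrative bisection root-finder (not code from the book):
# repeatedly halve an interval [a, b] on which f changes sign.
import math

def bisection(f, a, b, tol=1e-12, max_iter=200):
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (a + b)
        fmid = f(mid)
        if fmid == 0 or 0.5 * (b - a) < tol:
            return mid
        if fa * fmid < 0:
            b, fb = mid, fmid
        else:
            a, fa = mid, fmid
    return 0.5 * (a + b)

if __name__ == "__main__":
    # Root of cos(x) - x, a standard textbook test case
    print(bisection(lambda x: math.cos(x) - x, 0.0, 1.0))
```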
A major concern in the social sciences is understanding and explaining the relationship between two variables. We showed in Chapter 5 how to address this issue using tabular presentations. In this chapter we show how to address the issue statistically via regression and correlation. We first cover the two concepts of regression and correlation. We then turn to the issue of statistical inference and ways of evaluating the statistical significance of our results. Since most social science research is undertaken using sample data, we need to determine whether the regression and correlation coefficients we calculate using the sample data are statistically significant in the larger population from which the sample data were drawn.
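A minimal Python sketch of the chapter's two core quantities, the regression slope and the correlation coefficient, together with a p-value for judging statistical significance in sample data; the numbers below are invented for illustration.

```python
# Illustrative only: fit a simple bivariate regression and report
# the correlation and its p-value (hypothetical data).
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])

result = stats.linregress(x, y)
print(f"slope = {result.slope:.3f}, intercept = {result.intercept:.3f}")
print(f"correlation r = {result.rvalue:.3f}, p-value = {result.pvalue:.4f}")
```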
Community detection is one of the most important methodological fields of network science, and one which has attracted a significant amount of attention over the past decades. This area deals with the automated division of a network into fundamental building blocks, with the objective of providing a summary of its large-scale structure. Despite its importance and widespread adoption, there is a noticeable gap between what is arguably the state-of-the-art and the methods which are actually used in practice in a variety of fields. This Element attempts to address this discrepancy by dividing existing methods according to whether they have a 'descriptive' or an 'inferential' goal. While descriptive methods find patterns in networks based on context-dependent notions of community structure, inferential methods articulate a precise generative model, and attempt to fit it to data. In this way, they are able to provide insights into formation mechanisms and separate structure from noise. This title is also available as open access on Cambridge Core.
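As a hedged illustration of the descriptive/inferential distinction, the sketch below runs a purely descriptive, modularity-based detection with NetworkX on a toy graph; an inferential analysis of the kind advocated here would instead fit a generative model such as a stochastic block model with a dedicated library, which is not shown.

```python
# Descriptive community detection on a toy graph (illustration only).
# An inferential approach would instead fit a generative model
# (e.g. a stochastic block model) and separate structure from noise.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()  # classic benchmark network
communities = greedy_modularity_communities(G)
for i, block in enumerate(communities):
    print(f"community {i}: {sorted(block)}")
```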
From observed data, statistical inference infers the properties of the underlying probability distribution. For hypothesis testing, the t-test and some non-parametric alternatives are covered. Ways to infer confidence intervals and estimate goodness of fit are followed by the F-test (for testing variances) and the Mann-Kendall trend test. Bootstrap sampling and field significance are also covered.
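A small Python sketch, with invented data, of two of the tools listed above: a two-sample t-test and a bootstrap confidence interval for a mean.

```python
# Illustration with synthetic data: two-sample t-test and a
# bootstrap 95% confidence interval for the mean of one sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=50)
b = rng.normal(loc=0.5, scale=1.0, size=50)

t_stat, p_value = stats.ttest_ind(a, b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

boot_means = [rng.choice(a, size=a.size, replace=True).mean() for _ in range(5000)]
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"bootstrap 95% CI for mean of a: ({lo:.3f}, {hi:.3f})")
```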
A link is made between epistemology – that is to say, the philosophy of knowledge – and statistics. Hume's criticism of induction is covered, as is Popper's. Various philosophies of statistics are described.
Katz, King, and Rosenblatt (2020, American Political Science Review 114, 164–178) introduces a theoretical framework for understanding redistricting and electoral systems, built on basic statistical and social science principles of inference. DeFord et al. (2021, Political Analysis, this issue) instead focuses solely on descriptive measures, which lead to the problems identified in our article. In this article, we illustrate the essential role of these basic principles and then offer statistical, mathematical, and substantive corrections required to apply DeFord et al.’s calculations to social science questions of interest, while also showing how to easily resolve all claimed paradoxes and problems. We are grateful to the authors for their interest in our work and for this opportunity to clarify these principles and our theoretical framework.
This chapter focuses on critical infrastructures in the power grid, which often rely on Industrial Control Systems (ICS) to operate and are exposed to vulnerabilities ranging from physical damage to the injection of information that appears to be consistent with industrial control protocols. In this way, infiltration of the firewalls protecting the perimeter of the control network becomes a significant threat. The goal of this chapter is to review identification and intrusion detection algorithms for protecting the power grid, based on knowledge of the expected behavior of the system.
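To make the expected-behavior idea concrete, here is a deliberately simplified Python sketch, not any specific algorithm from the chapter: measurements are compared against a model of normal operation, and readings whose residual exceeds a threshold are flagged as possible intrusions. The signal, model, and threshold are all hypothetical.

```python
# Simplified residual-based detector (illustrative, hypothetical data).
# Flag samples whose deviation from the expected-behavior model
# exceeds a multiple of the standard deviation of historical residuals.
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(1000)
expected = 60.0 + 0.5 * np.sin(2 * np.pi * t / 300)   # modelled grid frequency (Hz)
measured = expected + rng.normal(0, 0.02, t.size)     # normal measurement noise
measured[700:710] += 0.3                              # injected (spoofed) values

residual = measured - expected
threshold = 4 * residual[:500].std()                  # calibrated on clean history
alarms = np.flatnonzero(np.abs(residual) > threshold)
print("suspicious samples:", alarms)
```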
We develop a model that successfully learns social and organizational human network structure using ambient sensing data from distributed plug load energy sensors in commercial buildings. A key goal for the design and operation of commercial buildings is to support the success of organizations within them. In modern workspaces, a particularly important goal is collaboration, which relies on physical interactions among individuals. Learning the true socio-organizational relational ties among workers can therefore help managers of buildings and organizations make decisions that improve collaboration. In this paper, we introduce the Interaction Model, a method for inferring human network structure that leverages data from distributed plug load energy sensors. In a case study, we benchmark our method against network data obtained through a survey and compare its performance to other data-driven tools. We find that unlike previous methods, our method infers a network that is correlated with the survey network to a statistically significant degree (graph correlation of 0.46, significant at the 0.01 level). We additionally find that our method requires only 10 weeks of sensing data, enabling dynamic network measurement. Learning human network structure through data-driven means can enable the design and operation of spaces that encourage, rather than inhibit, the success of organizations.
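As a hedged illustration of the kind of comparison reported above, and not the authors' exact graph-correlation statistic, the sketch below correlates the off-diagonal entries of two adjacency matrices, one standing in for the inferred network and one for the survey network; both matrices are invented.

```python
# Illustration: compare an "inferred" and a "survey" adjacency matrix
# by taking the Pearson correlation of their off-diagonal entries.
# (A stand-in for a graph correlation statistic; data are synthetic.)
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 20
survey = rng.random((n, n))
survey = (survey + survey.T) / 2                      # symmetric "ground truth" ties
inferred = 0.7 * survey + 0.3 * rng.random((n, n))    # noisy estimate

mask = ~np.eye(n, dtype=bool)                         # ignore self-ties
r, p = stats.pearsonr(survey[mask], inferred[mask])
print(f"graph correlation (off-diagonal Pearson) = {r:.2f}, p = {p:.3g}")
```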
Quantitative comparative social scientists have long worried about the performance of multilevel models when the number of upper-level units is small. Adding to these concerns, an influential Monte Carlo study by Stegmueller (2013) suggests that standard maximum-likelihood (ML) methods yield biased point estimates and severely anti-conservative inference with few upper-level units. In this article, the authors seek to rectify this negative assessment. First, they show that ML estimators of coefficients are unbiased in linear multilevel models. The apparent bias in coefficient estimates found by Stegmueller can be attributed to Monte Carlo error and a flaw in the design of his simulation study. Second, they demonstrate how inferential problems can be overcome by using restricted ML estimators for variance parameters and a t-distribution with appropriate degrees of freedom for statistical inference. Thus, accurate multilevel analysis is possible within the framework that most practitioners are familiar with, even if there are only a few upper-level units.
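A hedged Python sketch of the remedy described (restricted ML plus a t reference distribution): fit a random-intercept model with REML in statsmodels and evaluate a coefficient against a t-distribution whose degrees of freedom are tied to the number of upper-level units. The data, variable names, and the particular degrees-of-freedom rule used here are illustrative assumptions, not the authors' exact procedure.

```python
# Illustration only: REML estimation plus t-based inference for a
# random-intercept multilevel model with few upper-level units.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(3)
n_groups, n_per = 15, 30                     # few upper-level units
group = np.repeat(np.arange(n_groups), n_per)
x = rng.normal(size=group.size)
u = rng.normal(scale=0.5, size=n_groups)     # random intercepts
y = 1.0 + 0.4 * x + u[group] + rng.normal(size=group.size)
data = pd.DataFrame({"y": y, "x": x, "group": group})

fit = smf.mixedlm("y ~ x", data, groups=data["group"]).fit(reml=True)
beta, se = fit.params["x"], fit.bse["x"]
df = n_groups - 2                            # hedged df rule based on group count
p = 2 * stats.t.sf(abs(beta / se), df)
print(f"beta = {beta:.3f}, se = {se:.3f}, t-based p (df={df}) = {p:.4f}")
```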
Intensive week-long Summer Schools in Statistics for Astronomers were initiated at Penn State in 2005 and have continued annually. Due to their popularity and high demand, additional full summer schools have been organized in India and Brazil and at the Space Telescope Science Institute.
The Summer Schools seek to give broad exposure to fundamental concepts and a wide range of resulting methods across many fields of statistics. The Summer Schools in statistics and data analysis for young astronomers present concepts and methodologies with hands-on tutorials using data from astronomical surveys.
In this paper, we use queuing theory to model the number of insured households in an insurance portfolio. The model is based on an idea from Boucher and Couture-Piché (2015), who use a queuing theory model to estimate the number of insured cars on an insurance contract. Similarly, the proposed model includes households already insured, but the modeling approach is modified to include new households that could be added to the portfolio. For each household, we also use the queuing theory model to estimate the number of insured cars. We analyze an insurance portfolio from a Canadian insurance company to support this discussion. Statistical inference techniques serve to estimate each parameter of the model, even in cases where some explanatory variables are included in each of these parameters. We show that the proposed model offers a reasonable approximation of what is observed, but we also highlight the situations where the model should be improved. By assuming that the insurance company makes a $1 profit for each one-year car exposure, the proposed approach allows us to determine a global value of the insurance portfolio of an insurer based on the customer equity concept.
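As a loose illustration of the queuing idea, and not the authors' actual model, the sketch below simulates a simple birth-death process in which insured cars join a contract at a Poisson rate and leave after exponentially distributed holding times, tracking the number insured over time; all rates are invented.

```python
# Toy birth-death ("queuing") simulation of the number of insured cars
# on one contract: Poisson additions, exponential holding times.
# Rates are hypothetical and purely illustrative.
import heapq
import numpy as np

rng = np.random.default_rng(4)
arrival_rate, mean_holding = 0.8, 3.0        # cars/year, years insured
horizon = 20.0

t, n_insured = 0.0, 0
departures = []                              # min-heap of departure times
while True:
    t += rng.exponential(1.0 / arrival_rate) # time of the next car added
    if t >= horizon:
        break
    while departures and departures[0] <= t:
        heapq.heappop(departures)            # cars whose coverage has ended
        n_insured -= 1
    n_insured += 1
    heapq.heappush(departures, t + rng.exponential(mean_holding))
    print(f"t = {t:5.2f} yr: {n_insured} insured car(s)")
```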
Visual displays of data in the parasitology literature are often presented in a way that is not very informative about the distribution of the data. An example is simple bar charts with half an error bar on top, used to display the distribution of parasitaemia and biomarkers of host immunity. Such displays obscure the shape of the data distribution by showing too few statistical measures to convey the spread of all the data and by selecting statistical measures that are influenced by skewness and outliers. We describe more informative, yet simple, visual representations of the data distribution commonly used in statistics and provide guidance with regard to the display of estimates of population parameters (e.g. the population mean) and measures of precision (e.g. the 95% confidence interval) for statistical inference. In this article we focus on visual displays for numerical data and demonstrate such displays using an example dataset consisting of total IgG titres in response to three Plasmodium blood antigens measured in pregnant women, together with parasitaemia measurements from the same study. This tutorial aims to highlight the importance of displaying the data distribution appropriately and the role such displays have in selecting statistics to summarize the distribution and perform statistical inference.
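The sketch below, using synthetic antibody-titre-like data rather than the study's measurements, contrasts an uninformative bar-plus-error-bar display with a more informative box plot overlaid with the raw points and a mean with its 95% confidence interval.

```python
# Synthetic data only: compare a bar + half error bar with a box plot
# overlaid with the raw points and the mean with its 95% CI.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(5)
titres = rng.lognormal(mean=1.0, sigma=0.8, size=60)   # skewed, like titre data

mean = titres.mean()
sem = stats.sem(titres)
ci = stats.t.interval(0.95, titres.size - 1, loc=mean, scale=sem)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.bar([0], [mean], yerr=[mean - ci[0]], capsize=5)   # hides the distribution
ax1.set_title("bar + half error bar")
ax2.boxplot(titres, positions=[0], widths=0.4)
ax2.scatter(np.full(titres.size, 0.35) + rng.normal(0, 0.03, titres.size),
            titres, s=10, alpha=0.5)                   # jittered raw data
ax2.errorbar([-0.35], [mean], yerr=[[mean - ci[0]], [ci[1] - mean]],
             fmt="o", capsize=5)                       # mean with 95% CI
ax2.set_title("box plot + data + mean (95% CI)")
plt.tight_layout()
plt.show()
```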
There is burgeoning interest in predicting road development because of the wide-ranging and important socioeconomic and environmental issues that roads present, including the close links between road development, deforestation and biodiversity loss. This is especially the case in developing nations, which are rich in natural resources and where road development is rapid and often not centrally managed. Characterization of large-scale spatio-temporal patterns in road network development has been greatly overlooked to date. This paper examines the spatio-temporal dynamics of road density across the Brazilian Amazon and assesses the relative contributions of local versus neighbourhood effects to temporal changes in road density at regional scales. To achieve this, a combination of statistical analyses and model-data fusion techniques inspired by studies of the spatio-temporal dynamics of populations in ecology and epidemiology was used. The emergent development may be approximated by local growth that is logistic through time combined with directional dispersal. The current rates and dominant direction of development may be inferred, assuming that roads develop at a rate of 55 km per year. Large areas of the Amazon will be subject to extensive anthropogenic change should the observed patterns of road development continue.
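A stylized Python sketch of the two ingredients identified above, local logistic growth of road density combined with directional dispersal from neighbouring cells; the grid, parameters, and dispersal kernel are illustrative and are not the fitted model from the paper.

```python
# Stylized model: road density on a grid grows logistically in each
# cell and receives a directional contribution from the neighbouring
# cell to its west. Parameters are invented for illustration.
import numpy as np

grid = np.zeros((50, 50))
grid[25, 0] = 0.05                    # seed road density at the western edge

r, K, d = 0.4, 1.0, 0.15              # growth rate, carrying capacity, dispersal
for year in range(40):
    growth = r * grid * (1 - grid / K)            # local logistic growth
    from_west = np.zeros_like(grid)
    from_west[:, 1:] = d * grid[:, :-1]           # eastward (directional) dispersal
    grid = np.clip(grid + growth + from_west, 0, K)

print("cells with road density > 0.5 after 40 years:", int((grid > 0.5).sum()))
```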
In this paper, following an open portfolio approach, we show how to estimate the evolution of a bonus-malus system.
Considering a model for the number of new annual policies, we obtain ML estimators, asymptotic distributions and confidence regions for the expected number of new policies entering the portfolio in each year, as well as for the expected number and proportion of insureds in each bonus class, by year of enrollment. Confidence regions for the distribution of policyholders result in confidence regions for optimal bonus scales.
Our treatment is illustrated by an example with numerical results.
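As a hedged, simplified illustration of the kind of estimation involved, assuming for the sake of the example a Poisson model for the number of new annual policies (our assumption, not a statement of the paper's model), the sketch below computes the ML estimate of the expected number of new policies per year and an asymptotic confidence interval; the counts are invented.

```python
# Illustration: ML estimate and asymptotic (Wald) confidence interval
# for the expected annual number of new policies, under an assumed
# Poisson model. The counts below are hypothetical.
import numpy as np
from scipy import stats

new_policies = np.array([410, 385, 442, 398, 427, 415])   # per year (invented)
lam_hat = new_policies.mean()                              # Poisson MLE
se = np.sqrt(lam_hat / new_policies.size)                  # asymptotic std. error
z = stats.norm.ppf(0.975)
print(f"lambda_hat = {lam_hat:.1f}, 95% CI = "
      f"({lam_hat - z * se:.1f}, {lam_hat + z * se:.1f})")
```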
This chapter provides an introduction to several fundamental methods for analyzing data from clinical trials, including an overview of two very important and related concepts: confidence intervals and tests of hypotheses. In making statistical inference, one draws a sample from a population and computes one or more statistics. Most confidence intervals are constructed in a similar way: in general, a confidence interval is made up of a point estimate, the standard error of that estimate, and a tabular value. The method of intention to treat has become the standard for analyzing data from clinical trials. Regulatory organizations such as the FDA and the International Conference on Harmonisation (ICH) recommend that the primary efficacy analysis be based on the intention to treat principle. Multiple methods have been used to impute missing data. A method that was very commonly used in the past is called 'last observation carried forward' (LOCF).
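A short Python sketch of two of the ideas mentioned: a confidence interval built from a point estimate, its standard error, and a tabular value, and last-observation-carried-forward imputation of missing visits. The data are fabricated for illustration.

```python
# Illustration with fabricated data: (1) CI = estimate +/- t * SE,
# (2) last observation carried forward (LOCF) within each subject.
import numpy as np
import pandas as pd
from scipy import stats

outcome = np.array([5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 5.2, 6.1])
est = outcome.mean()
se = outcome.std(ddof=1) / np.sqrt(outcome.size)
t_tab = stats.t.ppf(0.975, df=outcome.size - 1)      # tabular value
print(f"95% CI: ({est - t_tab * se:.2f}, {est + t_tab * se:.2f})")

visits = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2],
    "visit":   [1, 2, 3, 1, 2, 3],
    "score":   [10.0, 12.0, None, 8.0, None, None],   # missing later visits
})
visits["score_locf"] = visits.groupby("subject")["score"].ffill()
print(visits)
```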
Based on the concept of multipower variation we establish a class of easily computable and robust estimators for the integrated volatility, especially including the squared integrated volatility, in Lévy-type stochastic volatility models. We derive consistency and feasible distributional results for the estimators. Furthermore, we discuss the applications to time-changed CGMY, normal inverse Gaussian, and hyperbolic models with and without leverage, where the time-changes are based on integrated Cox-Ingersoll-Ross or Ornstein-Uhlenbeck-type processes. We deduce which type of market microstructure does not affect the estimates.
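A hedged numerical sketch of the multipower idea on simulated returns, not the estimators or models of the paper: realized variance picks up a jump, whereas realized bipower variation, a sum of products of adjacent absolute returns scaled by pi/2, stays close to the integrated volatility.

```python
# Simulated illustration: realized variance vs. realized bipower
# variation (pi/2 * sum |r_i||r_{i-1}|) on returns with one jump.
import numpy as np

rng = np.random.default_rng(7)
n = 1000
sigma = 0.2                                  # constant volatility for simplicity
returns = sigma * np.sqrt(1 / n) * rng.standard_normal(n)
returns[500] += 0.1                          # a single jump

rv = np.sum(returns ** 2)                                        # realized variance
bv = (np.pi / 2) * np.sum(np.abs(returns[1:]) * np.abs(returns[:-1]))
print(f"integrated variance = {sigma**2:.4f}, RV = {rv:.4f}, BV = {bv:.4f}")
```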
S. James Press's many contributions to statistical research, lecturing, the mentoring of students, the statistics profession, and more are summarized. Some new developments in Bayesian analysis are then described, and remarks on the future of Bayesian analysis are presented.