The way networks grow and change over time is called network evolution. Numerous off-the-shelf algorithms have been developed to study network evolution. These can give us insight into the way systems grow and change over time. However, what off-the-shelf algorithms often lack is knowledge of the behavioral details surrounding a specific problem. Here we will develop a simple case that we will revisit over the next few chapters: How do children learn words from exposure to a sea of language? One possibility is that the words children learn first influence the words they learn next. Another possibility is that the structure of language itself facilitates the learning of some words over others. Indeed, we know that adults speak differently to children in ways that facilitate language learning, with semantically informative words tending to appear more often around words that children learn earliest. This invites the question: To what extent does the semantic structure of language predict word learning? This chapter will provide a general framework for building models and pitting them against one another, with a specific application to the network evolution of child vocabularies.
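As a toy illustration of this kind of model comparison (not the chapter's actual models or data; the network, the acquisition order, and the degree-based rule below are all made up), one can score an observed order of word learning under two candidate growth rules and compare their log-likelihoods:

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)

# Toy semantic network and a hypothetical order of word acquisition.
G = nx.erdos_renyi_graph(30, 0.15, seed=1)
observed_order = list(rng.permutation(list(G.nodes())))

def log_likelihood(order, weight_fn):
    """Log-likelihood of an acquisition order: at each step the next word is
    drawn from the unknown words with probability proportional to weight_fn."""
    unknown = set(order)
    ll = 0.0
    for word in order:
        weights = {w: weight_fn(w) for w in unknown}
        ll += np.log(weights[word] / sum(weights.values()))
        unknown.remove(word)
    return ll

# Candidate model 1: every unknown word is equally likely to be learned next.
ll_random = log_likelihood(observed_order, lambda w: 1.0)
# Candidate model 2: well-connected words are learned first (degree + 1 avoids zero weights).
ll_degree = log_likelihood(observed_order, lambda w: G.degree(w) + 1.0)

print(f"log-likelihood, random model: {ll_random:.1f}")
print(f"log-likelihood, degree model: {ll_degree:.1f}")
```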
Network science is a broadly interdisciplinary field, pulling from computer science, mathematics, statistics, and more. The data scientist working with networks thus needs a broad base of knowledge, as network data calls for—and is analyzed with—many computational and mathematical tools. One needs good working knowledge of programming, including data structures and algorithms, to analyze networks effectively. In addition to graph theory, probability theory is the foundation for any statistical modeling and data analysis. Linear algebra provides another foundation for network analysis and modeling because matrices are often the most natural way to represent graphs. Although this book assumes that readers are familiar with the basics of these topics, here we review the computational and mathematical concepts and notation that will be used throughout the book. You can use this chapter as a starting point for catching up on the basics, or as a reference while delving into the book.
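For example, here is a minimal sketch (illustrative graph, not from the book) of how a matrix representation turns graph questions into linear algebra:

```python
import numpy as np
import networkx as nx

# A small undirected graph given as an edge list.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
G = nx.Graph(edges)

# Adjacency matrix representation: A[i, j] = 1 if nodes i and j are connected.
A = nx.to_numpy_array(G, nodelist=sorted(G.nodes()))
print(A)

# Linear algebra on A answers graph questions directly, e.g. the (i, j) entry
# of A^2 counts walks of length 2 between nodes i and j.
print(np.linalg.matrix_power(A, 2))

# Node degrees are simply the row sums of A.
print(A.sum(axis=1))
```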
Separation commonly occurs in political science, usually when a binary explanatory variable perfectly predicts a binary outcome. In these situations, methodologists often recommend penalized maximum likelihood or Bayesian estimation. But researchers might struggle to identify an appropriate penalty or prior distribution. Fortunately, I show that researchers can easily test hypotheses about the model coefficients with standard frequentist tools. While the popular Wald test produces misleading (even nonsensical) p-values under separation, I show that likelihood ratio tests and score tests behave in the usual manner. Therefore, researchers can produce meaningful p-values with standard frequentist tools under separation without the use of penalties or prior information.
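As a concrete, hedged sketch of the contrast (hypothetical data, a hand-rolled likelihood rather than any particular package, and BFGS's approximate inverse Hessian standing in for the usual observed-information covariance), the following fits a logistic regression with a separating binary predictor and compares the Wald p-value with the likelihood ratio p-value:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2, norm

# Hypothetical data: x2 = 1 always implies y = 1 (quasi-complete separation).
rng = np.random.default_rng(0)
n = 60
x1 = rng.normal(size=n)
x2 = np.r_[np.zeros(30), np.ones(30)]
y = np.r_[rng.integers(0, 2, 30), np.ones(30)]
X_full = np.column_stack([np.ones(n), x1, x2])
X_restricted = X_full[:, :2]          # model without the separating variable

def negloglik(beta, X, y):
    # Negative log-likelihood of a Bernoulli GLM with logit link.
    eta = X @ beta
    return -np.sum(y * eta - np.logaddexp(0.0, eta))

def fit(X, y):
    return minimize(negloglik, np.zeros(X.shape[1]), args=(X, y), method="BFGS")

full = fit(X_full, y)
restricted = fit(X_restricted, y)

# Wald test for the separated coefficient: the estimate and its standard error
# both blow up, so the z statistic is tiny and the p-value is uselessly large.
beta_hat = full.x[2]
se = np.sqrt(full.hess_inv[2, 2])     # rough SE from BFGS's inverse Hessian
p_wald = 2 * norm.sf(abs(beta_hat / se))

# Likelihood ratio test: the difference in maximized log-likelihoods is finite
# and well behaved even though beta_hat itself diverges.
lr_stat = 2 * (negloglik(restricted.x, X_restricted, y) - negloglik(full.x, X_full, y))
p_lr = chi2.sf(lr_stat, df=1)

print(f"Wald p-value: {p_wald:.3f}   LR p-value: {p_lr:.4f}")
```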
There is a daunting array of statistical “methods” out there – regression, ANOVA, loglinear models, GLMMs, ANCOVA, etc. They are often treated as different data analysis approaches. We take a more holistic view. Most methods biologists use are variations on a central theme of generalized linear models – relating a biological response to a linear combination of predictor variables. We show how several common “named” methods are related, based on classifying biological response and predictor variables as continuous or categorical. We use simple regression, single-factor ANOVA, logistic regression, and two-dimensional contingency tables to show how these methods all represent generalized linear models with a single predictor. We describe how we fit these models and outline their assumptions.
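The unification can be made concrete in code; the sketch below (simulated data, illustrative variable names) fits all four examples through the same GLM interface, assuming Python's statsmodels is available:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "y_cont": rng.normal(size=120),
    "x_cont": rng.normal(size=120),
    "group": rng.choice(["a", "b", "c"], size=120),
    "y_bin": rng.integers(0, 2, size=120),
})

# Simple linear regression: continuous response, continuous predictor.
m_reg = smf.glm("y_cont ~ x_cont", data=df, family=sm.families.Gaussian()).fit()
# Single-factor ANOVA: continuous response, categorical predictor.
m_anova = smf.glm("y_cont ~ C(group)", data=df, family=sm.families.Gaussian()).fit()
# Logistic regression: binary response, continuous predictor.
m_logit = smf.glm("y_bin ~ x_cont", data=df, family=sm.families.Binomial()).fit()

# A two-way contingency table analysed as a Poisson log-linear model of counts.
counts = df.groupby(["group", "y_bin"]).size().reset_index(name="n")
m_loglin = smf.glm("n ~ C(group) + C(y_bin)", data=counts,
                   family=sm.families.Poisson()).fit()

for name, m in [("regression", m_reg), ("ANOVA", m_anova),
                ("logistic", m_logit), ("log-linear", m_loglin)]:
    print(name, "deviance:", round(m.deviance, 2))
```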
In this chapter, we introduce the design of statistical anomaly detectors. We discuss the types of data – continuous, discrete categorical, and discrete ordinal features – encountered in practice. We then discuss how to model such data, in particular how to form a null model for statistical anomaly detection, with emphasis on mixture densities. The EM algorithm is developed for estimating the parameters of a mixture density, with K-means a specialization of EM for Gaussian mixtures. The Bayesian information criterion (BIC), widely used for estimating the number of components in a mixture density, is discussed and developed. We also discuss parsimonious mixtures, which economize on the number of model parameters in a mixture density (by sharing parameters across components). These models allow BIC to obtain accurate model-order estimates even when the feature dimensionality is huge and the number of data samples is small (a case where BIC applied to traditional mixtures grossly underestimates the model order). Key performance measures are discussed, including the true positive rate, the false positive rate, and the receiver operating characteristic (ROC) curve with its associated area under the curve (ROC AUC). The density models are used in the attack detection defenses of Chapters 4 and 13. The detection performance measures are used throughout the book.
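A minimal sketch of this workflow, using synthetic data and scikit-learn rather than the chapter's own development: fit Gaussian mixtures to "normal" data, pick the number of components by BIC, score test points by their log-density under the null model, and summarize detection performance with ROC AUC.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# "Normal" training data from two clusters; a test set with injected anomalies.
normal = np.vstack([rng.normal(0, 1, (300, 2)), rng.normal(5, 1, (300, 2))])
test_normal = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
anomalies = rng.uniform(-6, 11, (40, 2))
X_test = np.vstack([test_normal, anomalies])
labels = np.r_[np.zeros(len(test_normal)), np.ones(len(anomalies))]

# Choose the number of mixture components for the null model by BIC.
fits = [GaussianMixture(n_components=k, random_state=0).fit(normal)
        for k in range(1, 6)]
best = min(fits, key=lambda m: m.bic(normal))
print("components chosen by BIC:", best.n_components)

# Anomaly score: low log-density under the null model means more anomalous.
scores = -best.score_samples(X_test)
print("ROC AUC:", round(roc_auc_score(labels, scores), 3))
```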
Approximate Bayesian analysis is presented as the solution for complex computational models for which no explicit maximum likelihood estimation is possible. The activation-suppression race model (ASR), which does have a likelihood amenable to Markov chain Monte Carlo methods, is used to demonstrate the accuracy with which parameters can be estimated with the approximate Bayesian methods.
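A minimal sketch of the approximate Bayesian idea (rejection ABC on a toy Gaussian model with made-up numbers, not the ASR model itself): draw parameters from the prior, forward-simulate data, and keep the draws whose simulated summary statistics fall close to the observed ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Observed" data from a toy model with unknown location parameter mu.
true_mu = 0.45
observed = rng.normal(true_mu, 0.1, size=200)
obs_summary = np.array([observed.mean(), observed.std()])

def simulate(mu, n=200):
    """Forward-simulate the toy model; only simulation is needed, no likelihood."""
    sample = rng.normal(mu, 0.1, size=n)
    return np.array([sample.mean(), sample.std()])

# ABC rejection sampling: keep prior draws whose simulated summaries land
# within a tolerance of the observed summaries.
n_draws, tolerance = 20000, 0.03
prior_draws = rng.uniform(0.0, 1.0, size=n_draws)
accepted = np.array([mu for mu in prior_draws
                     if np.linalg.norm(simulate(mu) - obs_summary) < tolerance])

print(f"accepted {len(accepted)} draws")
print(f"approximate posterior mean of mu: {accepted.mean():.3f}")
```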
A good model aims to learn the underlying signal without overfitting (i.e., fitting the noise in the data). This chapter has four main parts: the first covers objective functions and errors; the second covers regularization techniques (weight penalty/decay, early stopping, ensembles, dropout, etc.) used to prevent overfitting; the third covers the Bayesian approach to model selection and model averaging; and the fourth covers recent developments in interpretable machine learning.
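As a small illustration of the weight-penalty idea (simulated data, with scikit-learn ridge regression standing in for the chapter's neural-network setting), compare an unregularized high-degree polynomial fit with an L2-penalized one:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Noisy samples of a smooth signal; a high-degree polynomial will chase the noise.
x = np.sort(rng.uniform(0, 1, 30))[:, None]
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.3, 30)
x_test = np.linspace(0, 1, 200)[:, None]
y_test = np.sin(2 * np.pi * x_test).ravel()

unregularized = make_pipeline(PolynomialFeatures(12), LinearRegression()).fit(x, y)
# Weight penalty (L2 / ridge): alpha controls how strongly large weights are punished.
regularized = make_pipeline(PolynomialFeatures(12), Ridge(alpha=1e-3)).fit(x, y)

for name, model in [("no penalty", unregularized), ("L2 penalty", regularized)]:
    train_err = mean_squared_error(y, model.predict(x))
    test_err = mean_squared_error(y_test, model.predict(x_test))
    print(f"{name}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```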
As probability distributions form the cornerstone of statistics, a survey is made of the common families of distributions, including the binomial distribution, Poisson distribution, multinomial distribution, Gaussian distribution, gamma distribution, beta distribution, von Mises distribution, extreme value distributions, t-distribution and chi-squared distribution. Other topics include maximum likelihood estimation, Gaussian mixtures and kernel density estimation.
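For instance (a toy sketch with simulated data, assuming SciPy is available), maximum likelihood estimation of a gamma distribution and kernel density estimation can be carried out as follows:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.gamma(shape=2.0, scale=1.5, size=500)

# Maximum likelihood estimation of the gamma parameters (location fixed at 0).
shape_hat, loc_hat, scale_hat = stats.gamma.fit(sample, floc=0)
print(f"MLE: shape = {shape_hat:.2f}, scale = {scale_hat:.2f}")

# Kernel density estimation: a nonparametric alternative to fitting a named family.
kde = stats.gaussian_kde(sample)
grid = np.linspace(0, sample.max(), 5)
print("KDE density on a coarse grid:", np.round(kde(grid), 3))
```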
We contribute to the literature on empirical macroeconomic models with time-varying conditional moments by introducing a heteroskedastic score-driven model with Student’s t-distributed innovations, named the heteroskedastic score-driven $t$-QVAR (quasi-vector autoregressive) model. The $t$-QVAR model is a robust nonlinear extension of the VARMA (VAR moving average) model. As an illustration, we apply the heteroskedastic $t$-QVAR model to a dynamic stochastic general equilibrium model, for which we estimate Gaussian-ABCD and $t$-ABCD representations. We use data on economic output, inflation, the interest rate, government spending, aggregate productivity, and consumption for the USA for the period 1954 Q3 to 2022 Q1. Owing to the robustness of the heteroskedastic $t$-QVAR model, even when the sample includes the coronavirus disease 2019 (COVID-19) pandemic and the start of the Russian invasion of Ukraine, we find superior statistical performance, lower policy-relevant dynamic effects, and higher estimation precision of the impulse response functions for US gross domestic product growth and the US inflation rate for the heteroskedastic score-driven $t$-ABCD representation than for the homoskedastic Gaussian-ABCD representation.
One major challenge to behavioral decision research is to identify the cognitive processes underlying judgment and decision making. Glöckner (2009) has argued that, compared to previous methods, process models can be tested more efficiently by simultaneously analyzing choices, decision times, and confidence judgments. The Multiple-Measure Maximum Likelihood (MM-ML) strategy classification method was developed for this purpose and implemented as a ready-to-use routine in STATA, a commercial package for statistical data analysis. In the present article, we describe the implementation of MM-ML in R, a free software environment for statistical computing released under the GNU General Public License, and we provide a practical guide to its application. We also provide MM-ML as an easy-to-use R function. Thus, prior knowledge of R programming is not necessary for those interested in using MM-ML.
The Halphen type B (Hal-B) frequency distribution has been employed for frequency analyses of hydrometeorological and hydrological extremes. This chapter derives this distribution using entropy theory and discusses the estimation of its parameters using the constraints employed in the derivation. The distribution is tested using entropy and the methods of moments and maximum likelihood estimation.
Several generalized frequency distributions have been employed in environmental and water engineering over the years. These distributions are quite versatile and can apply to frequency analysis of a wide variety of random variables, such as flood peaks, volume, duration, and inter-arrival time; extreme rainfall amount, duration, spatial coverage, and inter-arrival time; drought duration, severity, spatial extent, and inter-arrival time; wind speed, duration, direction, and spatial coverage; water quality parameters; and sediment concentration, discharge, and yield. However, because of their relatively complex form, these distributions have not become as popular as the simpler distributions. These distributions have at least three but usually more parameters, which have been estimated using the methods of moments, maximum likelihood, probability weighted moments, and L-moments. In some cases, entropy theory has been used to estimate parameters. This chapter provides a snapshot of the generalized distributions that will be discussed in this book. Moreover, a short discussion of the methods of parameter estimation, goodness-of-fit statistics, and confidence intervals is provided.
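As a small, hedged illustration of the parameter-estimation and goodness-of-fit steps (synthetic data, with the generalized extreme value family standing in for the book's distributions), SciPy can fit parameters by maximum likelihood and run a simple goodness-of-fit check:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic annual flood peaks drawn from a generalized extreme value (GEV) law.
peaks = stats.genextreme.rvs(c=-0.1, loc=100.0, scale=25.0, size=80, random_state=rng)

# Maximum likelihood estimation of the three GEV parameters.
c_hat, loc_hat, scale_hat = stats.genextreme.fit(peaks)
print(f"shape = {c_hat:.3f}, location = {loc_hat:.1f}, scale = {scale_hat:.1f}")

# A simple goodness-of-fit check: Kolmogorov-Smirnov test against the fitted law.
ks = stats.kstest(peaks, "genextreme", args=(c_hat, loc_hat, scale_hat))
print(f"KS statistic = {ks.statistic:.3f}, p-value = {ks.pvalue:.3f}")
```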
This paper proposes a procedure to improve the accuracy of a light-aircraft 6 DOF simulation model by implementing model tuning and aerodynamic database correction using flight test data. In this study, full-scale flight testing of a 2-seater aircraft was performed in specific longitudinal manoeuvres for model enhancement and simulation validation purposes. The baseline simulation model database is constructed using multi-fidelity analysis methods such as wind tunnel (W/T) tests, computational fluid dynamics (CFD) and empirical calculation. The enhancement process starts with identifying the longitudinal equations of motion for sensitivity analysis, where the effect of crucial parameters is analysed and then adjusted using the model tuning technique. Next, the classical Maximum Likelihood (ML) estimation method is applied to calculate aerodynamic derivatives from flight test data, and these parameters are used to correct the initial aerodynamic table. A simulation validation process is introduced to evaluate the accuracy of the enhanced 6 DOF simulation model. The presented results demonstrate that the applied enhancement procedure has improved the simulation accuracy in longitudinal motion. The discrepancy between the simulation and flight test responses is significantly reduced and satisfies the regulatory tolerance.
The problem of tracking the system frequency is ubiquitous in power systems. However, despite numerous empirical comparative studies of various algorithms, the underlying links and commonalities between frequency tracking methods are often overlooked. To this end, we show that the treatments of the two best known frequency estimation methodologies, (i) tracking the rate of change of the voltage phasor angles and (ii) fixed frequency demodulation, can be unified, whereby the former can be interpreted as a special case of the latter. Furthermore, we show that the frequency estimator derived from the difference in the phase angle is the maximum likelihood frequency estimator of a nonstationary sinusoid. Drawing upon the data analytics interpretation of the Clarke and related transforms in power system analysis as practical Principal Component Analyzers (PCA), we then set out to explore commonalities between classic frequency estimation techniques and widely linear modeling. The so-obtained additional degrees of freedom allow us to arrive at the adaptive Smart Clarke and Smart Park transforms (SCT and SPT), which are shown to operate in an unbiased and statistically consistent way for both standard and dynamically unbalanced smart grids. Overall, this work suggests avenues for next-generation solutions for the analysis of modern grids that are not accessible from the Circuit Theory perspective.
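A hedged numerical sketch of the phase-angle-difference estimator (synthetic balanced three-phase signal, illustrative sampling rate and frequency): form the Clarke-transform phasor and average the sample-to-sample phase increments.

```python
import numpy as np

fs = 5000.0                      # sampling rate in Hz (illustrative)
f_true = 50.2                    # off-nominal system frequency to be tracked
t = np.arange(0, 0.2, 1 / fs)

# Balanced three-phase voltages at a slightly off-nominal frequency.
va = np.cos(2 * np.pi * f_true * t)
vb = np.cos(2 * np.pi * f_true * t - 2 * np.pi / 3)
vc = np.cos(2 * np.pi * f_true * t + 2 * np.pi / 3)

# Clarke (alpha-beta) transform gives the complex voltage phasor v = v_alpha + j*v_beta.
v_alpha = (2 * va - vb - vc) / 3
v_beta = (vb - vc) / np.sqrt(3)
v = v_alpha + 1j * v_beta

# Frequency from the rate of change of the phasor angle: the phase increment
# between consecutive samples equals 2*pi*f/fs.
dphi = np.angle(v[1:] * np.conj(v[:-1]))
f_est = fs * np.mean(dphi) / (2 * np.pi)
print(f"estimated frequency: {f_est:.3f} Hz")
```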
Multinomial logit (MNL) differs from many other econometric methods because it estimates the effects of variables upon nominal, not ordered, outcomes. One consequence of this is that the estimated coefficients vary depending upon the researcher’s choice of a reference, or “baseline,” outcome. Most researchers realize this in principle, but many focus upon the statistical significance of MNL coefficients for inference in the same way that they use coefficients from models with ordered dependent variables. In some instances, this leads researchers to report statistics that do not reflect the correct quantities of interest and to reach flawed conclusions. In this note, I argue that researchers should instead orient their analyses toward the substantive and statistical significance of the predicted probabilities that match their research questions.
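The point can be illustrated with a short sketch (simulated data, statsmodels' MNLogit assumed available): recoding the baseline category changes the coefficients, but the predicted probabilities, the quantities of interest, do not.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
# Hypothetical three-category nominal outcome generated from a latent MNL.
utilities = np.column_stack([np.zeros(n), 0.8 * x, -0.5 * x])
probs = np.exp(utilities) / np.exp(utilities).sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=p) for p in probs])
X = sm.add_constant(x)

# Fit with category 0 as the baseline (statsmodels uses the lowest code).
fit_a = sm.MNLogit(y, X).fit(disp=False)
# Recode so a different category becomes the baseline: the coefficients change...
y_recoded = np.where(y == 0, 2, np.where(y == 2, 0, y))
fit_b = sm.MNLogit(y_recoded, X).fit(disp=False)
print(fit_a.params, fit_b.params, sep="\n")

# ...but the predicted probabilities are the same, up to the relabelling of
# outcome columns implied by the recoding.
x_grid = sm.add_constant(np.linspace(-2, 2, 5))
print(np.round(fit_a.predict(x_grid), 3))
print(np.round(fit_b.predict(x_grid), 3))
```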
Quantitative comparative social scientists have long worried about the performance of multilevel models when the number of upper-level units is small. Adding to these concerns, an influential Monte Carlo study by Stegmueller (2013) suggests that standard maximum likelihood (ML) methods yield biased point estimates and severely anti-conservative inference with few upper-level units. In this article, the authors seek to rectify this negative assessment. First, they show that ML estimators of coefficients are unbiased in linear multilevel models; the apparent bias in coefficient estimates found by Stegmueller can be attributed to Monte Carlo error and a flaw in the design of his simulation study. Second, they demonstrate how inferential problems can be overcome by using restricted ML estimators for variance parameters and a t-distribution with appropriate degrees of freedom for statistical inference. Thus, accurate multilevel analysis is possible within the framework most practitioners are familiar with, even if there are only a few upper-level units.
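A minimal sketch of the recommended practice (simulated data; the degrees-of-freedom rule used here, m - l - 1, is illustrative of the article's approach rather than a quotation of it): estimate by REML and refer the test statistic for the group-level coefficient to a t distribution rather than the default normal reference.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(0)

# Simulated two-level data: only 10 upper-level units and one group-level predictor z.
n_groups, n_per = 10, 30
g = np.repeat(np.arange(n_groups), n_per)
z_group = rng.normal(size=n_groups)
u = rng.normal(0, 0.7, n_groups)                     # random intercepts
y = 0.5 * z_group[g] + u[g] + rng.normal(size=n_groups * n_per)
df = pd.DataFrame({"y": y, "z": z_group[g], "g": g})

# Restricted maximum likelihood (REML) estimation of the random-intercept model.
fit = smf.mixedlm("y ~ z", data=df, groups="g").fit(reml=True)
print(fit.summary())

# Inference for the group-level coefficient using a t distribution whose
# degrees of freedom reflect the small number of groups (m - l - 1, with l = 1
# group-level predictor here), instead of the default normal (z) reference.
t_stat = fit.params["z"] / fit.bse["z"]
dof = n_groups - 1 - 1
p_t = 2 * stats.t.sf(abs(t_stat), dof)
print(f"t = {t_stat:.2f}, df = {dof}, p = {p_t:.4f}")
```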
The objective of this study was to identify potential recruitment sources of Prochilodus lineatus from freshwater areas (Paraná and Uruguay rivers) to the estuarine population of the Río de la Plata Estuary (La Plata Basin, South America), considering young (age-1) and adult (age-7) fish. LA-ICP-MS chemical analysis of the otolith core (nine element:Ca ratios) of an unknown mixed sample from the Río de la Plata Estuary (2011 and 2017) was compared with a young-of-year baseline data set (same cohort) and classified into freshwater nurseries (Paraná or Uruguay river) using maximum likelihood (MLE) and maximum classification likelihood (MCL) mixture models and quadratic discriminant analysis (QDA). Across the three models used, the Uruguay River was the most important contributor for both the young and the adult populations. The young population (2011) was highly mixed, with contributions ranging from 31.7 to 68.3%, whereas the degree of mixing decreased in 2017 (adult fish), with contributions of 97.1 to 100%. The three methods produced comparable estimates; however, QDA closely matched the MCL model, suggesting greater sensitivity for evaluating small contributions than the MLE method. Our results show the potential of maximum likelihood mixture models and QDA for determining the relative importance of recruitment sources of fish in estuarine waters of the La Plata Basin.
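As a schematic illustration of the QDA step (entirely made-up element:Ca values, two ratios instead of nine), a baseline of known-origin fish can be used to classify a mixed sample and estimate source contributions:

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(0)

# Hypothetical stand-in for otolith-core chemistry: two element:Ca ratios for a
# known-origin baseline (two freshwater nurseries) and a mixed sample of unknown origin.
parana = rng.multivariate_normal([1.0, 3.0], [[0.10, 0.02], [0.02, 0.08]], 60)
uruguay = rng.multivariate_normal([1.6, 2.2], [[0.08, -0.01], [-0.01, 0.12]], 60)
X_baseline = np.vstack([parana, uruguay])
y_baseline = np.r_[np.zeros(60), np.ones(60)]          # 0 = Parana, 1 = Uruguay

mixed = np.vstack([
    rng.multivariate_normal([1.0, 3.0], [[0.10, 0.02], [0.02, 0.08]], 30),
    rng.multivariate_normal([1.6, 2.2], [[0.08, -0.01], [-0.01, 0.12]], 70),
])

# Quadratic discriminant analysis trained on the baseline, applied to the mixed sample.
qda = QuadraticDiscriminantAnalysis().fit(X_baseline, y_baseline)
assigned = qda.predict(mixed)
print("estimated Uruguay contribution:", round(assigned.mean() * 100, 1), "%")
```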