We generalize a result of Ziv and Merhav on universal estimation of the specific cross (or relative) entropy, originally proved for pairs of multilevel Markov measures, to a broader class of decoupled measures. Our generalization relies on abstract decoupling conditions and covers pairs of suitably regular g-measures, as well as pairs of equilibrium measures arising from the “small space of interactions” in mathematical statistical mechanics.
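For orientation, one standard convention for the quantity being estimated is the following (our notation; the paper's precise setting and normalization may differ). For stationary measures $p$ and $q$ on a shift space:

```latex
% Specific cross entropy of p relative to q (one common convention):
h_q(p) = \lim_{n\to\infty} -\frac{1}{n}\,
         \mathbb{E}_p\bigl[\log q(X_1,\dots,X_n)\bigr],
% and the specific relative entropy is its difference with the
% specific entropy h(p) of p itself:
h(p \,\|\, q) = h_q(p) - h(p).
```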
This chapter introduces the machine learning side of this book. Although we assume some prior experience with machine learning, we start with a full recap of the basic concepts and key terminology. This includes a discussion of learning paradigms, such as supervised and unsupervised learning, and of the machine learning life cycle, articulating the steps that take us from data collection to model deployment. We cover topics such as data preparation and preprocessing, model evaluation and selection, and machine learning pipelines, showing how every stage of this cycle can be compromised in large-scale data analytics. The rest of the chapter is devoted to Spark's machine learning library, MLlib. Basic concepts such as Transformers, Estimators, and Pipelines are presented with an example based on linear regression. The example requires a pipeline of methods to get the data ready for training, which allows us to introduce some of Spark's data preparation packages (e.g., VectorAssembler or StandardScaler). Finally, we explore evaluation packages (e.g., RegressionEvaluator) and how to perform hyperparameter tuning.
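As a taste of what the chapter covers, here is a minimal, self-contained PySpark sketch; the toy data and column names are ours, not the book's. It chains VectorAssembler, StandardScaler, and LinearRegression into a Pipeline, evaluates with RegressionEvaluator, and sets up a small grid for hyperparameter tuning:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.regression import LinearRegression
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

spark = SparkSession.builder.appName("mllib-intro").getOrCreate()

# Toy data: two numeric features and a continuous label.
df = spark.createDataFrame(
    [(1.0, 2.0, 3.5), (2.0, 0.5, 4.1), (3.0, 1.5, 6.2),
     (4.0, 2.5, 8.0), (5.0, 1.0, 9.1), (6.0, 3.0, 12.3)],
    ["x1", "x2", "y"])

# Transformers and an Estimator chained into a single Pipeline.
assembler = VectorAssembler(inputCols=["x1", "x2"], outputCol="raw")
scaler = StandardScaler(inputCol="raw", outputCol="features")
lr = LinearRegression(featuresCol="features", labelCol="y")
pipeline = Pipeline(stages=[assembler, scaler, lr])

model = pipeline.fit(df)     # fits the whole chain at once
preds = model.transform(df)  # in practice, apply to a held-out test set

evaluator = RegressionEvaluator(labelCol="y", metricName="rmse")
print("RMSE:", evaluator.evaluate(preds))

# Hyperparameter tuning: grid search with cross-validation.
grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1]).build()
cv = CrossValidator(estimator=pipeline, estimatorParamMaps=grid,
                    evaluator=evaluator, numFolds=3)
best = cv.fit(df).bestModel
```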
We introduce principles of point estimation, that is, the estimation of a value for the vector of unknown parameters of the density of a variate. The chapter starts by considering some desirable properties of point estimators, a sort of “the good, the bad, and the ugly” classification! The topics covered include bias, efficiency, mean-squared error (MSE), consistency, robustness, invariance, and admissibility. We then introduce methods of summarizing the data via statistics that retain the relevant sample information about the parameter vector, and we see how they achieve the desirable properties of estimators. We discuss sufficiency, Neyman's factorization theorem, ancillarity, Rao–Blackwellization, completeness, the Lehmann–Scheffé theorem and the minimum-variance unbiasedness of an estimator, and Basu's theorem. We consider the exponential family and special cases and conclude by introducing the most common model in statistics, the linear model, which is used for illustrations in this chapter and is covered more extensively in the following chapters.
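As a concrete instance of how these properties interact, the standard bias-variance decomposition of the MSE (stated here in our notation) shows why an estimator may profitably trade a little bias for a large reduction in variance:

```latex
% Mean-squared error of an estimator \hat{\theta} of \theta:
\mathrm{MSE}(\hat{\theta})
  = \mathbb{E}\bigl[(\hat{\theta} - \theta)^2\bigr]
  = \operatorname{Var}(\hat{\theta})
    + \bigl(\mathbb{E}[\hat{\theta}] - \theta\bigr)^2 ,
% where the second term is the squared bias of \hat{\theta}.
```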
We estimate the anisotropic index of an anisotropic fractional Brownian field. For every direction, we give a convergent estimator of the anisotropic index in that direction, based on generalized quadratic variations, and we also prove a central limit theorem. First, we present an identification result that relies on the asymptotic behavior of the spectral density of a process. Then, we define Radon transforms of the anisotropic fractional Brownian field and prove that these processes admit a spectral density satisfying the previous assumptions. Finally, we use simulated fields to test the proposed estimator in different anisotropic and isotropic cases. Results show that the estimator behaves similarly in all cases and is able to detect anisotropy quite accurately.
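To make the quadratic-variation idea concrete, here is a minimal one-dimensional sketch (our own function and lag choices, not the paper's estimator, which works direction by direction on Radon transforms of the field). For fractional Brownian motion, second-order increments at lag $\delta$ have mean square proportional to $\delta^{2H}$, so comparing two lags recovers $H$:

```python
import numpy as np

def hurst_from_quadratic_variations(x):
    """Estimate a Hurst-type index H from a regularly sampled path x.

    For fractional Brownian motion, the generalized quadratic
    variation built from second-order increments at lag d scales
    like d**(2H), so the ratio of the variations at lags 2 and 1
    gives H = 0.5 * log2(V2 / V1).
    """
    d1 = x[2:] - 2.0 * x[1:-1] + x[:-2]   # second-order increments, lag 1
    d2 = x[4:] - 2.0 * x[2:-2] + x[:-4]   # second-order increments, lag 2
    v1 = np.mean(d1 ** 2)
    v2 = np.mean(d2 ** 2)
    return 0.5 * np.log2(v2 / v1)

# Sanity check on ordinary Brownian motion (H = 0.5):
rng = np.random.default_rng(0)
path = np.cumsum(rng.standard_normal(100_000))
print(hurst_from_quadratic_variations(path))  # close to 0.5
```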
Wind direction is a circular variable, so the algorithms used to find its standard deviation differ from those used for linear variables. In particular, the requirement to store all the data points before the standard deviation can be computed strains the limited storage of remote data acquisition systems. Various algorithms have therefore been developed to estimate the standard deviation while reducing the number of terms stored. The following work is a comparative analysis of such estimators, together with the parameters they use. It emerges that some of the assumptions adopted to derive the equations being analysed do not hold in practice, although this does not significantly affect the performance of the estimators that depend on them. On the other hand, the parameter that shows the best trend with the adopted algorithm is the magnitude of the vector to the centre of gravity of the system. However, such a result gives rise to some concerns, since it does not account for the ‘vectorial’ nature of the angle being treated.
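One well-known single-pass estimator of this kind is Yamartino's, which needs only the running means of sin θ and cos θ. The sketch below (illustrative only, and not necessarily one of the exact estimators compared in this paper) shows how the magnitude of the centre-of-gravity vector enters through ε:

```python
import numpy as np

def yamartino_std(theta_rad):
    """Single-pass estimate of the standard deviation of wind
    direction (in radians), following Yamartino's method.

    Only the running means of sin and cos are accumulated, so a
    remote logger need not store the individual samples.
    """
    s = np.mean(np.sin(theta_rad))
    c = np.mean(np.cos(theta_rad))
    # 1 - (s^2 + c^2) measures how far the centre of gravity of the
    # unit direction vectors falls from the unit circle.
    eps = np.sqrt(max(0.0, 1.0 - (s * s + c * c)))
    # Yamartino's empirical correction factor; 2/sqrt(3) - 1 ~ 0.1547.
    return np.arcsin(eps) * (1.0 + 0.1547 * eps ** 3)

# Directions clustered around 350 deg, wrapping through north:
angles = np.deg2rad([350.0, 355.0, 5.0, 10.0, 0.0])
print(np.rad2deg(yamartino_std(angles)))  # a few degrees, despite the wrap
```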
Results of a previous paper (Liebetrau (1977a)) are extended to higher dimensions. An estimator $V^*(t_1, t_2)$ of the variance function $V(t_1, t_2)$ of a two-dimensional process is defined, and its first- and second-moment structure is given assuming the process to be Poisson. Members of a class of estimators of the form $T^{\beta}\bigl(V^*(t_1', t_2') - V(t_1', t_2')\bigr)$, where $t_1' = t_1T^{\alpha_1}$ and $t_2' = t_2T^{\alpha_2}$ for $0 < \alpha_i < 1$, are shown to converge weakly to a non-stationary Gaussian process. Similar results hold when the $t_i'$ are taken to be constants, when $V$ is replaced by a suitable estimator, and when the dimensionality of the underlying Poisson process is greater than two.
The second-moment structure of an estimator $V^*(t)$ of the variance-time curve $V(t)$ of a weakly stationary point process is obtained in the case where the process is Poisson. This result is used to establish the weak convergence of a class of estimators of the form $T^{\beta}\bigl(V^*(tT^{\alpha}) - V(tT^{\alpha})\bigr)$, $0 < \alpha < 1$, to a non-stationary Gaussian process. Similar results are shown to hold when $\alpha = 0$ and in the case where $V(tT^{\alpha})$ is replaced by a suitable estimator.
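For context, the variance-time curve referenced in both of these abstracts is the standard second-moment summary of a stationary point process; a brief reminder, in our notation and under the usual conventions:

```latex
% Variance-time curve of a (weakly) stationary point process,
% where N(0,t] counts the points falling in the interval (0,t]:
V(t) = \operatorname{Var}\bigl(N(0,t]\bigr).
% For a homogeneous Poisson process of rate \lambda, the counts are
% Poisson distributed, so the curve is linear:
V(t) = \lambda t .
```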