Statistical analysis is usually necessary to answer questions with behavioural data. Analysis should be planned and registered before data collection. Once collected, a dataset should be formatted and permanently archived prior to analysis. Data are then checked and visualised with descriptive statistics and graphs. Inferential statistics are used to build and test models representing hypotheses about the true effects in the population from which the dataset is sampled. Many different hypotheses can be captured in a linear modelling framework, in which an outcome variable is predicted from a combination of predictor variables and their interactions. Sources of non-independence in datasets can be addressed with mixed models. The robustness of findings can be examined by comparing the results obtained when the analysis is done in different ways, using model selection and multiverse approaches. Confirmatory analysis designed to test preregistered hypotheses should be clearly differentiated from exploratory analysis that generates new hypotheses.
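As a minimal sketch of the linear modelling framework described above (not taken from the chapter): an outcome is predicted from two predictors and their interaction by ordinary least squares. The data here are synthetic, and plain NumPy is used in place of a dedicated statistics package; the variable names (`condition`, `covariate`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical predictors: a binary condition and a continuous covariate
condition = rng.integers(0, 2, n).astype(float)
covariate = rng.normal(0.0, 1.0, n)

# Simulated outcome with known main effects (0.5, 0.8) and an interaction (0.3)
outcome = (1.0 + 0.5 * condition + 0.8 * covariate
           + 0.3 * condition * covariate + rng.normal(0.0, 0.5, n))

# Design matrix: intercept, both predictors, and their interaction term
X = np.column_stack([np.ones(n), condition, covariate, condition * covariate])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print(beta)  # estimated coefficients, close to the simulated effects
```

Mixed-model extensions of this idea add random effects for grouping factors (e.g., participants), which the least-squares sketch above does not capture.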
This chapter deals with the topic of feature expansion and imputation, with a particular emphasis on computable features. While poor domain modelling may result in too many features being added to the model, there are times when considerable value can be gained by generating new features from existing ones. Excess features can then be removed using the feature selection techniques discussed in the next chapter. Computable Features are particularly useful when we know the underlying ML model is unable to perform certain operations over the features, such as multiplying them (e.g., if the model is a simple linear one). Another type of feature expansion involves calculating a best-effort approximation of values missing in the data (Feature Imputation). The most straightforward expansion happens when the raw data contains multiple items of information under a single column (Decomposing Complex Features). The chapter concludes by borrowing ideas from the kernel trick, a technique used in SVMs: the types of projections that practitioners have found useful lend themselves to being applied directly, without the use of kernels.
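As a small illustrative sketch of why computable features help a linear model (synthetic data, not from the chapter): when the target depends on the *product* of two features, a linear model over the raw features fits poorly, but adding the precomputed product `x1 * x2` as an extra column makes the relationship linear again. The `r2` helper below is an assumed utility, not a chapter function.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = x1 * x2 + rng.normal(scale=0.1, size=n)  # target driven by a product

def r2(X, y):
    """Coefficient of determination for an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

r2_raw = r2(np.column_stack([x1, x2]), y)            # raw features: poor fit
r2_expanded = r2(np.column_stack([x1, x2, x1 * x2]), y)  # with computable feature
print(round(r2_raw, 3), round(r2_expanded, 3))
```

The same pattern applies to ratios, differences, or date decompositions: the feature is computable from existing columns, but the model cannot compute it on its own.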