In this chapter, students learn about the levels of measurement that social scientists engage in when collecting data. The most common system for conceptualizing quantitative data was developed by Stevens, who defined four levels of data, which are (in ascending order of complexity) nominal, ordinal, interval, and ratio-level data. Nominal data consist of mutually exclusive and exhaustive categories, each of which is assigned an arbitrary number. Ordinal data have all of the qualities of nominal data, but the numbers in ordinal data also indicate rank order. Interval data are characterized by all the traits of nominal and ordinal data, but the spacing between numbers is equal across the entire length of the scale. Finally, ratio data have all the traits of interval data and are additionally characterized by the presence of an absolute zero. Higher levels of data contain more information, so it is always possible to convert from one level of data to a lower level, but it is not possible to convert data to a higher level than the one at which it was collected. It is important to recognize the level of data because certain mathematical procedures require certain levels of data. Social scientists who ignore the level of their data risk producing meaningless results or distorted statistics.
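The one-way nature of conversion between levels can be illustrated with a minimal Python sketch. The ages, the cutoff of 30, and the helper names `to_ordinal` and `to_nominal` are hypothetical choices made for this illustration and do not come from the chapter.

```python
# Hypothetical ratio-level data: ages in years (absolute zero, equal spacing).
ages = [23, 35, 35, 51, 18]

def to_ordinal(values):
    """Replace each value with its rank (1 = smallest); tied values share a rank."""
    distinct = sorted(set(values))
    rank_of = {v: i + 1 for i, v in enumerate(distinct)}
    return [rank_of[v] for v in values]

def to_nominal(values, cutoff):
    """Collapse values into two mutually exclusive, exhaustive categories."""
    return ["under" if v < cutoff else "at_or_over" for v in values]

ranks = to_ordinal(ages)             # ordinal: order kept, spacing lost
groups = to_nominal(ages, cutoff=30)  # nominal: only category membership kept

print(ranks)   # [2, 3, 3, 4, 1]
print(groups)  # ['under', 'at_or_over', 'at_or_over', 'at_or_over', 'under']
```

Neither `ranks` nor `groups` can be used to recover the original ages, which is why data cannot be converted back to a higher level once the extra information has been discarded.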
This chapter starts with basic definitions, such as types of machine learning (supervised vs. unsupervised learning, classifiers vs. regressors), types of features (binary, categorical, discrete, continuous), metrics (precision, recall, f-measure, accuracy, overfitting), and raw data, and then defines the machine learning cycle and the feature engineering cycle. The feature engineering cycle hinges on two types of analysis: exploratory data analysis at the beginning of the cycle and error analysis at the end of each cycle. Domain modelling and feature construction conclude the chapter, with particular emphasis on feature ideation techniques.
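As a rough illustration of the classification metrics named above, the following sketch computes precision, recall, f-measure (here the balanced F1 variant), and accuracy for a binary classifier. The function name `binary_metrics` and the label lists are assumptions made for this example, not material from the chapter.

```python
def binary_metrics(y_true, y_pred):
    """Compute precision, recall, F1, and accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Hypothetical gold labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With these labels there are three true positives, one false positive, one false negative, and three true negatives, so all four metrics come out to 0.75.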
When machine learning engineers work with data sets, they may find the results aren't as good as they need. Instead of improving the model or collecting more data, they can use the feature engineering process to improve results by modifying the data's features to better capture the nature of the problem. This practical guide to feature engineering is an essential addition to any data scientist's or machine learning engineer's toolbox, providing new ideas on how to improve the performance of a machine learning solution. Beginning with the basic concepts and techniques, the text builds up to a unique cross-domain approach that spans data on graphs, texts, time series, and images, with fully worked-out case studies. Key topics include binning, out-of-fold estimation, feature selection, dimensionality reduction, and encoding variable-length data. The full source code for the case studies is available on a companion website as Python Jupyter notebooks.
Data analysis and interpretation allow you to test your predictions and interpret your results. This is an exciting time, but it can also be daunting because it's a big change from data collection. It's very unlikely that you will have collected exactly the data you set out to collect, but your analysis plan will keep you on track and help you avoid the dangers of aimlessly exploring your dataset. You will probably need further statistical advice at this stage. This chapter guides you through data preparation, initial data analysis, hypothesis testing, calculating your effect sizes and confidence intervals, interpreting your results and extrapolating from them.
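To make the effect size and confidence interval step concrete, here is a minimal sketch for comparing two independent groups. It assumes Cohen's d with a pooled standard deviation as the effect size and a normal-approximation interval for the difference in means; the chapter may recommend other measures, and the two samples are invented for illustration. For small samples a t-based interval would usually be preferred.

```python
import math
import statistics

def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

def mean_diff_ci(a, b, level=0.95):
    """Normal-approximation confidence interval for mean(a) - mean(b)."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return diff - z * se, diff + z * se

# Hypothetical outcome scores for two groups.
treatment = [5.1, 4.9, 6.2, 5.8, 5.5]
control = [4.2, 4.8, 4.5, 5.0, 4.1]

d = cohens_d(treatment, control)
low, high = mean_diff_ci(treatment, control)
print(f"Cohen's d = {d:.2f}, 95% CI for the difference = ({low:.2f}, {high:.2f})")
```

Reporting the interval alongside the effect size shows both how large the difference is and how precisely it has been estimated.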