6 - Data sets with outliers

from PART II - The kernel method

Published online by Cambridge University Press: 05 January 2013

Wolfgang Härdle
Affiliation:
Rheinische Friedrich-Wilhelms-Universität Bonn

Summary

In exploratory data analysis one might wish instead to discover patterns while making few assumptions about data structure, using techniques with properties that change only gradually across a wide range of noise distributions. Nonlinear data smoothers provide a practical method of finding general smooth patterns for sequenced data confounded with long-tailed noise.

P. Velleman (1980, p. 609)

Suppose that one observes data such as those in Figure 6.1: the main body of the data lies in a strip around zero, and a few observations, which govern the scaling of the scatter plot, lie apart from this region. These few data points are obviously outliers. This terminology does not mean that outliers are not part of the joint distribution of the data or that they contain no information for estimating the regression curve. It means rather that outliers look as if they are too small a fraction of the data to be allowed to dominate the small-sample behavior of the statistics to be calculated. Any smoother based on local averages, when applied to data like those in Figure 6.1, will exhibit a tendency to “follow the outlying observations.” Methods for handling data sets with outliers are called robust or resistant.
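The tendency of a local average to "follow the outlying observations" can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions: the simulated data, the Gaussian kernel, the bandwidth h, the window half-width k, and the helper names kernel_smooth and running_median are all illustrative choices, not taken from the chapter. The running median stands in here as one simple resistant smoother, in the spirit of the nonlinear smoothers mentioned in the Velleman quotation; it is not necessarily the robust method the chapter itself develops.

```python
import math
import random
import statistics


def kernel_smooth(x, y, h):
    """Local weighted average (Nadaraya-Watson form), Gaussian kernel, bandwidth h."""
    fitted = []
    for x0 in x:
        weights = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x]
        total = sum(weights)
        fitted.append(sum(w * yi for w, yi in zip(weights, y)) / total)
    return fitted


def running_median(y, k):
    """Median over a window of up to 2k+1 consecutive points (truncated at the ends)."""
    n = len(y)
    return [statistics.median(y[max(0, i - k): min(n, i + k + 1)]) for i in range(n)]


random.seed(0)  # fixed seed so the sketch is reproducible
n = 200
x = [i / (n - 1) for i in range(n)]

# Main body of the data: a strip around zero, so the true curve is m(x) = 0.
y = [random.gauss(0.0, 0.1) for _ in range(n)]

# A few isolated outliers far from the strip (positions and size are arbitrary).
outlier_idx = [40, 100, 160]
for i in outlier_idx:
    y[i] += 5.0

mean_fit = kernel_smooth(x, y, h=0.02)
median_fit = running_median(y, k=5)

# Largest deviation of each fit from the true curve m(x) = 0.
max_mean_dev = max(abs(v) for v in mean_fit)
max_median_dev = max(abs(v) for v in median_fit)
print(f"local average:  max |fit| = {max_mean_dev:.3f}")
print(f"running median: max |fit| = {max_median_dev:.3f}")
```

In this setup the local average develops a bump at each outlier, because a single extreme response enters the weighted mean in proportion to its size, whereas the running median is unaffected as long as outliers make up fewer than half of the points in any window.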

From a data-analytic viewpoint, nonrobust behavior of the smoother is sometimes undesirable. Suppose that, a posteriori, a parametric model for the response curve is to be postulated. Any erratic behavior of the nonparametric pilot estimate will cause biased parametric formulations. Imagine, for example, a situation in which an outlier has not been identified and the nonparametric smoothing method has produced a slight peak in the neighborhood of that outlier. A parametric model that fitted that “nonexisting” peak would be too high-dimensional.

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 1990

Book: Applied Nonparametric Regression
Chapter DOI: https://doi.org/10.1017/CCOL0521382483.006