Data sets with outliers

Wolfgang Härdle

doi:10.1017/CCOL0521382483.006

6 - Data sets with outliers

from PART II - The kernel method

Published online by Cambridge University Press: 05 January 2013

Affiliation: Rheinische Friedrich-Wilhelms-Universität Bonn

Summary

In exploratory data analysis one might wish instead to discover patterns while making few assumptions about data structure, using techniques with properties that change only gradually across a wide range of noise distributions. Nonlinear data smoothers provide a practical method of finding general smooth patterns for sequenced data confounded with long-tailed noise.

P. Velleman (1980, p. 609)

Suppose that one observes data such as those in Figure 6.1: the main body of the data lies in a strip around zero and a few observations, governing the scaling of the scatter plot, lie apart from this region. These few data points are obviously outliers. This terminology does not mean that outliers are not part of the joint distribution of the data or that they contain no information for estimating the regression curve. It means rather that outliers look as if they are too small a fraction of the data to be allowed to dominate the small-sample behavior of the statistics to be calculated. Any smoother based on local averages, when applied to data like those in Figure 6.1, will exhibit a tendency to “follow the outlying observations.” Methods for handling data sets with outliers are called robust or resistant.
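The tendency of a local-average smoother to follow outliers can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the simulated data (Gaussian noise in a strip around zero plus three planted outliers), the Gaussian-kernel local average, the bandwidth `h`, and the window median used as one simple example of a resistant smoother. The function names `local_mean` and `local_median` are hypothetical.

```python
import math
import random
from statistics import median


def local_mean(x, y, x0, h):
    """Kernel-weighted local average at x0 (Gaussian kernel, bandwidth h)."""
    weights = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


def local_median(x, y, x0, h):
    """Median of the responses whose x lies within h of x0."""
    window = [yi for xi, yi in zip(x, y) if abs(xi - x0) <= h]
    return median(window)


# Simulated data (an assumption): the main body lies in a strip around zero.
random.seed(0)
n = 200
x = [i / (n - 1) for i in range(n)]
y = [random.gauss(0.0, 0.1) for _ in x]

# Plant a few outliers close together near x = 0.5.
for i in (98, 100, 102):
    y[i] = 5.0

h = 0.05          # bandwidth / half-width of the window, chosen arbitrarily
x0 = x[100]       # evaluate both smoothers at an outlier location

mean_fit = local_mean(x, y, x0, h)
median_fit = local_median(x, y, x0, h)

print(f"local average at x0: {mean_fit:.3f}")
print(f"local median at x0:  {median_fit:.3f}")
```

In this sketch the local average is pulled visibly away from zero by the three planted points, while the window median stays close to the main body of the data, because three outliers are a small fraction of the observations in the window.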

From a data-analytic viewpoint, a nonrobust behavior of the smoother is sometimes undesirable. Suppose that, a posteriori, a parametric model for the response curve is to be postulated. Any erratic behavior of the nonparametric pilot estimate will cause biased parametric formulations. Imagine, for example, a situation in which an outlier has not been identified and the nonparametric smoothing method has produced a slight peak in the neighborhood of that outlier. A parametric model which fitted that “nonexisting” peak would be too high-dimensional.

Information

Type: Chapter
Information: Applied Nonparametric Regression, pp. 190-202

DOI: https://doi.org/10.1017/CCOL0521382483.006

Publisher: Cambridge University Press

Print publication year: 1990
