
Multiple feature regularized kernel for hyperspectral imagery classification

Published online by Cambridge University Press:  26 March 2020

Xu Yan
Affiliation:
Department of Electrical and Computer Engineering, Mississippi State University, Starkville, USA
Peng Jiangtao*
Affiliation:
Hubei Key Laboratory of Applied Mathematics, Faculty of Mathematics and Statistics, Hubei University, Wuhan, China
Du Qian
Affiliation:
Department of Electrical and Computer Engineering, Mississippi State University, Starkville, USA
*Corresponding author: J. Peng, Email: pengjt1982@126.com

Abstract

In this paper, a multiple feature regularized kernel is proposed for hyperspectral imagery classification. To exploit the label information, a regularized kernel is used to refine the original kernel in the Support Vector Machine classifier. Furthermore, since spatial features have been widely investigated for hyperspectral imagery classification, different types of features, including the spectral feature, a local feature (i.e. local binary pattern), a global feature (i.e. Gabor feature), and a shape feature (i.e. extended multiattribute profiles), are included to provide distinct discriminative information. Finally, a majority voting-based ensemble approach, which combines different types of features, is adopted to further increase the classification performance. Combining different discriminative feature information can improve the classification performance, since relying on a single type of feature may result in poor performance, especially when the number of training samples is limited. Experimental results demonstrate that the proposed approach has superior performance compared with the state-of-the-art classifiers.

Type
Original Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Author(s), 2020 published by Cambridge University Press in association with Asia Pacific Signal and Information Processing Association

I. INTRODUCTION

Hyperspectral images (HSIs) contain hundreds of contiguous spectral bands ranging from the visible to the infrared, which provide rich spectral information [Reference Plaza1]. Compared with multispectral images, which have only a few spectral bands, they offer an advantage in terms of classification and detection [Reference Xu, Du, Li, Chen and Younan2]. Owing to these rich spectral bands, HSIs have a variety of applications, including environmental pollution control, precision agriculture, and mineral exploration [Reference Lillesand and Kiefer3–Reference Mei, Ji, Geng, Zhang, Li and Du6]. Accurate classification is very important to those applications, and the key to accurate classification is to learn an accurate similarity metric between samples. Traditional pixel-wise algorithms assume that samples from the same class have similar spectral characteristics and simply compare the similarity between pixels. However, in a complicated HSI scene, different materials might have similar spectral signatures, and pixels from the same class often have varying spectral signatures. Thus, it is quite difficult to distinguish them by simply comparing spectral distances. Additionally, the combination of a limited number of training samples and high dimensionality, known as the Hughes phenomenon, limits the performance of HSI classification [Reference Yang, Du and Chen7].

To overcome the above-mentioned problems, kernel-based approaches have been applied to HSI classification and have shown excellent performance [Reference Liu, Fowler and Zhao8–Reference Kuo, Ho, Li, Hung and Taur10]. Kernel methods project the original samples into a high-dimensional space, where the samples are more separable. The commonly used kernels include the Gaussian radial basis function (RBF), polynomial, and linear kernels [Reference Melgani and Bruzzone11]. In [Reference Camps-Valls and Bruzzone12], a kernel-based Support Vector Machine (SVM) is proposed for HSI classification. A region-kernel-based SVM [Reference Peng, Zhou and Chen13], which measures the region-to-region similarity, is able to capture the spatial–spectral similarity and has shown improved performance. Other variants such as the generalized composite kernel [Reference Li, Marpu, Plaza, Bioucas-Dias and Benediktsson14] and sample-cluster composite kernels [Reference Gomez-Chova, Camps-Valls, Bruzzone and Calpe-Maravilla15] have also been introduced for spectral–spatial HSI classification. In [Reference Sun, Liu, Xu, Tian and Li16], a band-weighted SVM, which generates a band weight vector to regularize the original SVM, is presented for HSI classification. Kernel collaborative representation with Tikhonov regularization is presented for HSI classification in [Reference Li, Du and Xiong17, Reference Ma, Mei, Wan, Hou, Wang and Feng18].

Although kernel methods have achieved excellent performance for HSI classification, the kernels are constructed using only the distance between the training samples, ignoring the available label information. The label information among the training samples can be used to help refine the original kernel. In [Reference Kwok and Tsang19–Reference Pan, Chen, Xu and Chen21], the ideal kernel method is presented, which incorporates the label information into the original kernel and has shown better performance than using the original kernel only. An ideal regularized composite kernel (IRCK) framework, which combines spatial information, spectral information, and label information simultaneously, is applied to HSI classification in [Reference Peng, Chen, Zhou and Li22]. Multiple Kernel Learning (MKL) [Reference Mehmet and Alpayd23], which combines multiple kernels to improve the performance, has also been applied to HSI. A composite kernel approach, which balances the spatial and spectral information, is presented in [Reference Camps-Valls, Gomez-Chova, Muñoz-Marí, Vila-Francés and Calpe-Maravilla24], where four types of composite kernels are proposed: the summation kernel, weighted summation kernel, stacked kernel, and cross-information kernel. In [Reference Wang, Gu and Tuia25], a two-step discriminative MKL (DMKL), which increases the between-class scatter while decreasing the within-class scatter in the reproducing kernel Hilbert space, is presented for HSI classification.

The abovementioned IRCK method incorporates the spatial information in terms of the spatial mean pixel of the neighborhood; however, it does not consider other different types of spatial information, which may provide complementary discriminative information. Different types of spatial features have been widely investigated for HSI classification. For instance, in [Reference Li and Du26], a Gabor-filtering-based nearest regularized subspace is presented for HSI classification. In [Reference Li, Chen, Su and Du27], local binary pattern is applied to extract local image features such as edges, corners, and spots, and then extreme learning machine is applied to achieve excellent performance for HSI classification. In [Reference Dalla Mura, Atli Benediktsson, Waske and Bruzzone28], extended multi-attribute profiles feature (EMAP) is presented for the analysis of hyperspectral imagery, which can efficiently extract the spatial information for classification purposes.

Furthermore, for a complicated HSI scene, one type of feature may not guarantee good performance, especially when the number of training samples is small. Different types of features can be used to provide more discriminative information, and multiple feature learning can be a good solution to solve the above limitation. For instance, a multiple feature learning framework, which integrates both linear and non-linear features, is introduced for HSI classification [Reference Li29]. In [Reference Li, Zhang and Zhang30], a non-linear joint collaborative representation model with multiple feature learning is presented to solve the small sample set problem in HSI. An efficient patch alignment framework, which combines multiple features, is proposed in [Reference Zhang, Zhang, Tao and Huang31].

Inspired by the idea of multiple feature learning, in this paper, we propose to combine different types of features with the ideal regularized (IR) kernel. Four types of features, namely the spectral, Gabor, EMAP, and LBP features, are investigated due to their potential for HSI classification. Moreover, a majority voting-based ensemble approach is adopted to make the classification more robust. The rest of the paper is organized as follows. The related work on the regularized kernel is introduced in Section II. Section III presents the proposed framework. In Section IV, the experimental results and analysis are provided. Finally, Section V draws the conclusion.

II. REGULARIZED KERNEL

A) Standard kernel

Suppose we have a set of HSI training samples $\varsigma = \{ (\boldsymbol{x}_1, y_1), (\boldsymbol{x}_2, y_2), \ldots, (\boldsymbol{x}_N, y_N)\}$ with $\boldsymbol{x}_i \in \Re^d$, where d denotes the dimensionality of a sample. The kernel measures the similarity between two samples. The most commonly used kernels are the linear kernel, the polynomial kernel, and the RBF kernel. We choose the RBF kernel in our experiments due to its wide application in HSI classification. The RBF kernel is calculated as

\begin{equation} \boldsymbol{K}_{ij} = \boldsymbol{K}(\boldsymbol{x}_i,\boldsymbol{x}_j) = \exp\left( -\frac{\left\Vert \boldsymbol{x}_i - \boldsymbol{x}_j \right\Vert^2}{2\delta^2} \right), \tag{1} \end{equation}

where $\delta$ is the bandwidth of the RBF kernel.
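As an illustration of equation (1), the following minimal sketch builds the RBF kernel matrix for a handful of samples using only the Python standard library. The function name `rbf_kernel_matrix` and the toy three-band "pixels" are hypothetical and are not taken from the experiments; the experiments themselves rely on LIBSVM.

```python
import math

def rbf_kernel_matrix(samples, delta):
    """Return the N x N matrix K with K[i][j] = exp(-||x_i - x_j||^2 / (2 * delta^2))."""
    n = len(samples)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            value = math.exp(-sq_dist / (2.0 * delta ** 2))
            kernel[i][j] = value
            kernel[j][i] = value  # the kernel matrix is symmetric
    return kernel

# Toy three-band "pixels" (hypothetical values, not from any dataset).
toy_pixels = [[0.1, 0.2, 0.3], [0.1, 0.2, 0.4], [0.9, 0.8, 0.7]]
K0 = rbf_kernel_matrix(toy_pixels, delta=0.5)
```

The first two toy pixels differ in a single band, so their kernel value is close to 1, whereas the third pixel is spectrally distant from both and receives a much smaller value.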

B) Ideal kernel

The ideal kernel is defined as [Reference Pan, Lai and Shen20]

\begin{equation} \boldsymbol{T}_{ij} = \boldsymbol{T}(\boldsymbol{x}_i,\boldsymbol{x}_j) = \left\{ \begin{array}{ll} 1, & y_i = y_j\\ 0, & y_i \ne y_j \end{array} \right.. \tag{2} \end{equation}

The ideal kernel is inspired by the idea that two samples should be considered "similar" if and only if they belong to the same class. The ideal kernel thus encodes the label information.
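Equation (2) can be sketched in a few lines: the ideal kernel depends only on the labels, not on the spectral values. The helper name `ideal_kernel_matrix` and the toy labels below are illustrative assumptions.

```python
def ideal_kernel_matrix(labels):
    """Return T with T[i][j] = 1 if samples i and j share a label, and 0 otherwise."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]

# Hypothetical labels for three training samples: the first two share a class.
toy_labels = [1, 1, 2]
T = ideal_kernel_matrix(toy_labels)
```

The resulting matrix is block-structured when the samples are ordered by class, with ones inside each class block and zeros elsewhere.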

C) Ideal regularized kernel

The similarity measurement between samples in the standard kernel does not consider the label information, and a more desirable kernel can be learned by incorporating it. In [Reference Pan, Chen, Xu and Chen21], an ideal regularization (IR) learning framework is proposed, which can efficiently incorporate the label information into the standard kernel. The following optimization problem learns a more desirable kernel matrix $\boldsymbol{K}$ given the initial kernel ${\boldsymbol{K}_0}$ [Reference Pan, Chen, Xu and Chen21]

\begin{equation} \min_{\boldsymbol{K} \succeq 0} \left( D(\boldsymbol{K},\boldsymbol{K}_0) + \gamma\, \Omega(\boldsymbol{K}) \right), \tag{3} \end{equation}

where D represents the von Neumann divergence between $\boldsymbol{K}$ and ${\boldsymbol{K}_0}$, ${\Omega}$ is a regularization term, and $\gamma$ is the regularization parameter. $\boldsymbol{K}\,{\succeq}\,0$ indicates that $\boldsymbol{K}$ is a symmetric positive semidefinite matrix. The von Neumann divergence is expressed as

\begin{equation} D(\boldsymbol{K},\boldsymbol{K}_0) = {\rm tr}(\boldsymbol{K}\log \boldsymbol{K} - \boldsymbol{K}\log \boldsymbol{K}_0 - \boldsymbol{K} + \boldsymbol{K}_0), \tag{4} \end{equation}

where tr(•) represents the trace of a matrix. The regularization term is defined as $\Omega(\boldsymbol{K}) = - {\rm tr}(\boldsymbol{KT})$, and its role is to incorporate the label information into the standard kernel. Therefore, the final objective function is:

\begin{equation} \min_{\boldsymbol{K} \succeq 0} \; {\rm tr}(\boldsymbol{K}\log \boldsymbol{K} - \boldsymbol{K}\log \boldsymbol{K}_0 - \boldsymbol{K} + \boldsymbol{K}_0) - \gamma\, {\rm tr}(\boldsymbol{KT}). \tag{5} \end{equation}

Taking the derivative of the objective with respect to $\boldsymbol{K}$ and setting it to zero yields

\begin{equation} \boldsymbol{K} = \exp (\log \boldsymbol{K}_0 + \gamma \boldsymbol{T}) = \boldsymbol{K}_0 * \exp (\gamma \boldsymbol{T}), \tag{6} \end{equation}

where * represents the element-wise product between two matrices. Using the Taylor expansion of equation (6), it can also be expressed as:

\begin{equation} \boldsymbol{K} = \boldsymbol{K}_0 + \gamma \boldsymbol{K}_0 * \boldsymbol{T} + \frac{\gamma^2}{2!} \boldsymbol{K}_0 * \boldsymbol{T}^2 + \cdots. \tag{7} \end{equation}

The above equation indicates that the IR kernel $\boldsymbol{K}$ can be considered as a linear combination of the standard kernel and the ideal kernels.
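To make equations (6) and (7) concrete, the sketch below applies the element-wise form $\boldsymbol{K}_0 * \exp(\gamma \boldsymbol{T})$ to a small kernel matrix. It assumes that the exponential and the powers of $\boldsymbol{T}$ are taken element-wise; under that reading, because $\boldsymbol{T}$ is binary, same-class entries of $\boldsymbol{K}_0$ (including the diagonal) are multiplied by $e^{\gamma}$ and all other entries are left unchanged. The function name and the toy values are hypothetical.

```python
import math

def ideal_regularized_kernel(base_kernel, labels, gamma):
    """Element-wise reading of equation (6): K[i][j] = K0[i][j] * exp(gamma * T[i][j])."""
    n = len(labels)
    boost = math.exp(gamma)  # factor applied where T[i][j] = 1 (same class)
    return [
        [base_kernel[i][j] * (boost if labels[i] == labels[j] else 1.0)
         for j in range(n)]
        for i in range(n)
    ]

# Hypothetical standard kernel and labels for three training samples.
K0_toy = [[1.0, 0.8, 0.3],
          [0.8, 1.0, 0.4],
          [0.3, 0.4, 1.0]]
labels_toy = [1, 1, 2]
gamma_toy = 0.005
K_ir = ideal_regularized_kernel(K0_toy, labels_toy, gamma_toy)

# Truncated series of equation (7) for one same-class entry.
series_entry = K0_toy[0][1] * (1.0 + gamma_toy + gamma_toy ** 2 / 2.0)
```

For a small $\gamma$ the first three terms of the series already agree closely with the exponential form, which suggests why the learned kernel stays close to the standard kernel in that regime.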

The out-of-sample extension handles samples that have not been encountered before. Denote $\boldsymbol{S} = \boldsymbol{K}_0^{-1}(\boldsymbol{K} + \boldsymbol{K}_0)\boldsymbol{K}_0^{-1}$; then the kernel between two new points $\boldsymbol{s}$ and $\boldsymbol{t}$ can be calculated as

\begin{equation} \boldsymbol{K}(\boldsymbol{s},\boldsymbol{t}) = - \boldsymbol{K}_0(\boldsymbol{s},\boldsymbol{t}) + \sum\limits_{i,j = 1}^N \boldsymbol{S}(i,j)\, \boldsymbol{K}_0(\boldsymbol{s},\boldsymbol{x}_i)\, \boldsymbol{K}_0(\boldsymbol{x}_j,\boldsymbol{t}). \tag{8} \end{equation}
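A quick sanity check of equation (8) is that, when $\boldsymbol{s}$ and $\boldsymbol{t}$ are themselves training samples, the extension should return the corresponding entry of the learned kernel $\boldsymbol{K}$. The sketch below verifies this numerically on four toy points, using a small hand-written Gauss-Jordan inverse so that it stays within the standard library. All names and values are illustrative assumptions, and the learned kernel is built with the element-wise form of equation (6).

```python
import math

def rbf(x, y, delta):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * delta ** 2))

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (adequate for a small toy matrix)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def out_of_sample_kernel(s, t, train, S, delta):
    """Equation (8): K(s,t) = -K0(s,t) + sum_ij S[i][j] * K0(s,x_i) * K0(x_j,t)."""
    ks = [rbf(s, x, delta) for x in train]
    kt = [rbf(x, t, delta) for x in train]
    size = len(train)
    total = sum(S[i][j] * ks[i] * kt[j] for i in range(size) for j in range(size))
    return -rbf(s, t, delta) + total

# Toy training set (hypothetical 2-D points and labels).
train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
labels = [1, 1, 2, 2]
delta, gamma = 1.0, 0.1
n = len(train)
K0 = [[rbf(train[i], train[j], delta) for j in range(n)] for i in range(n)]
K = [[K0[i][j] * (math.exp(gamma) if labels[i] == labels[j] else 1.0)
      for j in range(n)] for i in range(n)]

# S = K0^{-1} (K + K0) K0^{-1}, as defined just before equation (8).
K0_inv = inverse(K0)
K_plus_K0 = [[K[i][j] + K0[i][j] for j in range(n)] for i in range(n)]
S = matmul(matmul(K0_inv, K_plus_K0), K0_inv)

# Largest deviation between the extension and K over all training pairs.
max_err = max(abs(out_of_sample_kernel(train[a], train[b], train, S, delta) - K[a][b])
              for a in range(n) for b in range(n))
```

In practice a numerical library would replace the hand-written inverse, and an ill-conditioned $\boldsymbol{K}_0$ would need more careful treatment than this toy example provides.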

D) Majority voting-based ensemble approach

We propose to use a majority voting-based ensemble approach to combine the outputs of the SVM using the IR kernel with different types of features. Suppose m features are used and ${f_i}(\boldsymbol{x})$ represents the output obtained with the i-th feature. The final output is then calculated as

\begin{equation} \tilde{y} = {\rm mode}\{ f_1(\boldsymbol{x}), f_2(\boldsymbol{x}), \ldots, f_m(\boldsymbol{x})\}. \tag{9} \end{equation}
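Equation (9) can be sketched as a per-pixel vote over the labels predicted from each feature. With an even number of features a tie is possible, and the text does not specify how ties are resolved; the sketch below assumes, purely for illustration, that the tied label appearing first in the list wins.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label; ties go to the label that appears first."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

# Hypothetical per-feature outputs for one pixel (e.g. spectral, Gabor, EMAP, LBP).
per_feature_labels = [3, 5, 3, 7]
fused_label = majority_vote(per_feature_labels)
```

Under this tie-breaking assumption, the order in which the features are listed acts as a priority order, so a more reliable feature could be placed first.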

III. PROPOSED FRAMEWORK

In this paper, a majority voting-based multi-feature IR kernel method, which can efficiently deal with the small training sample problem, is proposed. Figure 1 shows the proposed framework. First, principal component analysis [Reference Rodarmel and Shan32] is conducted on the original dataset to extract its principal components. Different types of features, including EMAP features, Gabor features, and LBP features, are then extracted from the principal components. Classification is conducted on each type of feature with the IR kernel-based SVM. Finally, a majority voting-based ensemble approach is adopted to combine the classification outputs.

Fig. 1. Proposed framework.

IV. EXPERIMENTAL RESULTS

A) Experimental dataset

Three popular HSI datasets are used in our experiments.

  (1) Indian Pines: This dataset was collected by the Airborne Visible and Infrared Imaging Spectrometer (AVIRIS) sensor over the Indian Pines site in June 1992. The spatial size is 145 × 145 with a spatial resolution of 20 m/pixel. The spectral bands range from 400 to 2500 nm. After bad-band removal, 200 spectral bands remain. There are a total of 16 classes in the scene. The false color infrared composite of bands 50, 27, and 17 is shown in Fig. 2(a), and the groundtruth is shown in Fig. 2(b).

  (2) University of Pavia: The second dataset was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor, and the scene covers the University of Pavia with 115 spectral bands ranging from 0.43 to 0.86 µm. The spatial size is 610 × 340 with a spatial resolution of 1.3 m/pixel, and the scene includes nine classes. After removing the noisy bands, 103 bands remain. Figures 3(a) and 3(b) show the color infrared composite of bands 60, 30, 2 and the groundtruth, respectively.

  (3) Salinas: The third dataset was collected by the AVIRIS sensor over Salinas Valley, California. The spatial size is 512 × 217 with a spatial resolution of 3.7 m/pixel. After removing the 20 water absorption bands, 200 bands remain. The color infrared composite of bands 47, 27, 13 is shown in Fig. 4(a), and the groundtruth of this dataset is shown in Fig. 4(b). There are a total of 16 classes.

Fig. 2. Indian pines dataset. (a) Color-infrared-composite of bands 50, 27, and 17. (b) Groundtruth.

Fig. 3. University of Pavia dataset. (a) Color-infrared-composite of bands 60, 30, and 2. (b) Groundtruth.

Fig. 4. Salinas dataset. (a) Color-infrared-composite of bands 47, 27, and 13. (b) Groundtruth.

B) Experimental setup

The number of training samples per class is denoted as N. In the experiments, N is chosen from $\{3, 5, 7,\ldots,13\}$, and for classes with fewer than N samples, half of the total samples are chosen for training. The rest of the labeled samples are used as the testing set. The Gaussian kernel is investigated, and LIBSVM [Reference Chang and Lin33] is used to implement the SVM. The numbers of PCs for the Gabor features, EMAP features, and LBP features are set to 10, 4, and 4, respectively [Reference Li and Du26–Reference Dalla Mura, Atli Benediktsson, Waske and Bruzzone28]. The EMAP features are generated according to [Reference Dalla Mura, Atli Benediktsson, Waske and Bruzzone28], and the parameters for LBP are set according to [Reference Li, Chen, Su and Du27]. Each experiment is repeated 50 times to avoid bias.
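The per-class sampling rule described above can be sketched as follows. The function name `split_per_class`, the fixed seed, and the rounding down of "half" for classes with an odd number of samples are assumptions made for illustration; the text does not specify them.

```python
import random

def split_per_class(labels, n_per_class, seed=0):
    """Pick N training indices per class; classes with fewer than N samples give half."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        if len(indices) >= n_per_class:
            count = n_per_class
        else:
            count = len(indices) // 2  # assumption: round down for odd class sizes
        train.extend(indices[:count])
        test.extend(indices[count:])  # remaining labeled samples form the test set
    return sorted(train), sorted(test)

# Hypothetical labeled set: 20 samples of class 0 and only 6 samples of class 1.
toy_labels = [0] * 20 + [1] * 6
train_idx, test_idx = split_per_class(toy_labels, n_per_class=7)
```

Repeating the split with different seeds would mirror the repeated-trial protocol, with each trial drawing a fresh random training set.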

C) Results

Tables 1–3 list the overall accuracies (OAs), average accuracies (AAs), and κ coefficients for Indian Pines, University of Pavia, and Salinas, respectively, for different types of features using the standard and the regularized kernels. The majority voting-based ensemble approach is also evaluated with both the standard and the regularized kernel to better utilize the discriminative information.
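For reference, the three reported metrics can be computed from a confusion matrix as sketched below, using their standard definitions: OA is the fraction of correctly labeled samples, AA is the mean of the per-class accuracies, and κ corrects OA for chance agreement. The function name and the toy labels are hypothetical.

```python
def accuracy_metrics(true_labels, predicted_labels):
    """Return (OA, AA, kappa) computed from the confusion matrix."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    index = {c: i for i, c in enumerate(classes)}
    k = len(classes)
    confusion = [[0] * k for _ in range(k)]  # rows: true class, columns: predicted
    for t, p in zip(true_labels, predicted_labels):
        confusion[index[t]][index[p]] += 1
    total = sum(sum(row) for row in confusion)
    oa = sum(confusion[i][i] for i in range(k)) / total
    row_sums = [sum(row) for row in confusion]
    per_class = [confusion[i][i] / row_sums[i] for i in range(k) if row_sums[i] > 0]
    aa = sum(per_class) / len(per_class)
    col_sums = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_sums[i] * col_sums[i] for i in range(k)) / (total ** 2)
    kappa = (oa - expected) / (1.0 - expected)
    return oa, aa, kappa

# Hypothetical ground truth and predictions for eight test pixels.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
oa, aa, kappa = accuracy_metrics(y_true, y_pred)
```

AA weights every class equally, so it is more sensitive than OA to errors on small classes, which is relevant when some classes have very few labeled samples.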

Table 1. Classification accuracies for different features and ensemble approach on Indian Pines.

Table 2. Classification accuracies for different features and ensemble approach on the University of Pavia.

Table 3. Classification accuracies for different features and ensemble approach on Salinas.

Several observations can be drawn from Table 1. First, the regularized versions consistently outperform the standard kernels for all four features. Specifically, for the EMAP feature, the regularized kernel improves the OA of the standard kernel by around 5% when the number of training samples is small. For the spectral, Gabor, and LBP features, the IR kernels also outperform the standard kernel, indicating the advantage of incorporating the label information into the kernel metric. Second, the majority voting-based ensemble approach achieves higher OAs than using a single feature, for both the standard and IR kernels. The explanation is that combining multiple features exploits the complementary discriminative information they carry, so better performance can be achieved. Additionally, the classification performance improves as the number of training samples increases.

From the results for the University of Pavia, it can be observed that the IR kernel versions significantly outperform the standard kernel on both the EMAP and Gabor features. For instance, the OAs increase by approximately 3 and 5% on the EMAP and Gabor features, respectively. The IR kernel also performs better for the spectral and LBP features. Moreover, the ensemble approach using the IR kernel has the best performance, indicating the advantage of combining complementary discriminative information. It can also be observed that the ensemble approach has a smaller variance than using a specific feature alone. This is expected because combining different types of features yields a more robust result.

For the results on Salinas shown in Table 3, several conclusions can be drawn. First, the IR kernel provides better performance than the standard kernel for all types of features. For this dataset, the ensemble approaches perform slightly better than using the EMAP feature alone. The Gabor feature does not perform well on this dataset, so including it can affect the accuracy. We also notice that the regularized kernel performs poorly if the standard kernel performs poorly. In addition, with only seven and 11 training samples per class for Indian Pines, University of Pavia, and Salinas, the OA can be above 90%. The explanation may be that when the number of training samples is limited, the similarity measurement between samples is not reliable, and thus the label similarity helps learn a desirable kernel metric, which is essential for good classification performance. Another conclusion is that the ensemble approach provides good performance with a limited number of training samples.

Figure 5 compares the OAs of IRCK, DMKL, and our proposed method using 15 samples per class. It can be concluded that the proposed method has superior performance over IRCK on the three datasets. More specifically, the proposed method has around 6.1, 4.7, and 1.4% higher OA than IRCK for Indian Pines, University of Pavia, and Salinas, respectively. Compared with DMKL, the proposed method has around 7 and 4% higher OA on Indian Pines and the University of Pavia, respectively. On the Salinas dataset, the proposed method performs slightly worse than DMKL. In Fig. 6, the classification maps of Indian Pines are shown for the four different types of features. The proposed method has around 0.6, 4, and 1.5% higher OA for Indian Pines, University of Pavia, and Salinas, respectively. The observation is that few misclassified samples exist in the classification maps of the ensemble approaches. This is expected since different features can provide complementary discriminative information, which can lead to better performance.

Fig. 5. Overall accuracies of IRCK, DMKL, and the proposed method for three datasets.

Fig. 6. Classification maps for Indian Pines. The first and second rows correspond to Standard and IR kernels. (a) Spectral-Sta (OA = 63.31%). (b) EMAP-Sta (OA = 83.9%). (c) Gabor-Sta (OA = 77.8%). (d) LBP-Sta (OA = 73.2%). (e) Ensemble-Sta (OA = 92.4%). (f) Spectral-IR (OA = 64.4%). (g) EMAP-IR (OA = 86.4%). (h) Gabor-IR (OA = 78.1%). (i) LBP-IR (OA = 76.8%). (j) Ensemble-IR (OA = 93.8%).

D) Parameters analysis

The parameters used in this paper are analyzed in this section. For the SVM, we first analyze the RBF kernel bandwidth σ (written as $\delta$ in equation (1)) and the regularization parameter C. σ is chosen from the range $\{2^{-4}, 2^{-3}, \ldots, 2^{4}\}$, and C is chosen from the range $\{1, 10, 100, 10^{5}\}$. Figures 7(a)–7(c) show the OA with different σ and C for EMAP features on Indian Pines, University of Pavia, and Salinas, respectively, using 13 training samples per class. A holdout validation sample set is used for testing. It can be concluded from Fig. 7 that C does not affect the OA much, especially when it is larger than 100, and any value above 100 gives good performance. For σ, a relatively small value yields good classification performance. In order to analyze the effect of the $\gamma$ parameter in IR, in the following experiment, C is set to 1000 and σ to $2^{-1}$ for the EMAP features of the different datasets to ensure good performance.
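The search over σ and C described above amounts to a small grid search scored on the holdout set. The sketch below builds the two grids, taking the largest listed value of C to be $10^5$, and selects the best pair with a caller-supplied scoring function. The scoring function used here is a placeholder, not a real classifier; in an actual run it would train the SVM and return the holdout OA. All names are hypothetical.

```python
import itertools

sigma_grid = [2.0 ** k for k in range(-4, 5)]  # 2^-4, 2^-3, ..., 2^4
c_grid = [1.0, 10.0, 100.0, 1e5]               # values of C listed in the text

def grid_search(evaluate, sigmas, cs):
    """Return (best_score, sigma, C) over all pairs; `evaluate` is supplied by the caller."""
    return max((evaluate(s, c), s, c) for s, c in itertools.product(sigmas, cs))

def toy_holdout_accuracy(sigma, c):
    # Placeholder score, NOT a real classifier: peaks at sigma = 0.5, saturates in C.
    return 1.0 / (1.0 + abs(sigma - 0.5)) + min(c, 100.0) / 1000.0

best_score, best_sigma, best_c = grid_search(toy_holdout_accuracy, sigma_grid, c_grid)
```

Because the comparison is made on tuples, ties in the score are resolved in favor of the larger σ and then the larger C; a different tie-breaking rule may be preferable in practice.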

Fig. 7. Parameters tuning using EMAP feature. (a) σ and C for Indian Pines. (b) σ and C for the University of Pavia. (c) σ and C for Salinas. (d) $\gamma$ for all datasets.

The parameter $\gamma$ plays an important role in the classification performance. We investigate the effect of $\gamma$ on the EMAP feature for Indian Pines, University of Pavia, and Salinas. Figure 7(d) shows the OAs of the three datasets with respect to different $\gamma$ in the range {1e−3, 5e−3, …, 5e−1}. It can be seen that the OA stays stable when $\gamma$ is very small, and when $\gamma$ increases, the OAs of all three datasets decrease. This may be because the label information becomes dominant in the generated kernel, while the spectral similarity between the training samples is ignored. In our experiment, we set C to 1000, σ to $2^{-1}$, and $\gamma$ to 5e−3 for the EMAP features of the three datasets. For other types of features, a similar parameter tuning process is adopted.

V. CONCLUSIONS

In this paper, a novel multiple feature-based IR kernel is presented for HSI classification. The proposed framework incorporates the label information together with the complementary discriminative information from multiple features. Experimental results show that the proposed approach can achieve superior classification performance.

ACKNOWLEDGEMENT

The authors would like to thank Prof. D. Landgrebe for providing the Indian Pines data set, Prof. P. Gamba for providing the University of Pavia data set, and Dr. J. Anthony Gualtieri for providing the Salinas data set.

FINANCIAL SUPPORT

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61871177 and 11771130.

Yan Xu received his Ph.D. degree in 2019 from the Department of Electrical and Computer Engineering, Mississippi State University, USA.

Jiangtao Peng received the B.S. and M.S. degrees from Hubei University, Wuhan, China, in 2005 and 2008, and his Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, China, in 2011. He is currently a Professor in the Faculty of Mathematics and Statistics, Hubei University, China. His research interests include machine learning and hyperspectral image processing.

Qian Du received the Ph.D. degree in electrical engineering from the University of Maryland – Baltimore County, Baltimore, MD, USA, in 2000. She is currently a Bobby Shackouls Professor with the Department of Electrical and Computer Engineering, Mississippi State University, Starkville, MS, USA. Her research interests include hyperspectral remote-sensing image analysis and applications, pattern classification, data compression, and neural networks. Dr. Du is a fellow of SPIE and IEEE. Since 2016, she has been the Editor-in-Chief of the IEEE JSTARS.

REFERENCES

[1] Plaza, A. et al.: Recent advances in techniques for hyperspectral image processing. Remote Sens. Environ., 113 (2009), S110–S122.
[2] Xu, Y.; Du, Q.; Li, W.; Chen, C.; Younan, N.H.: Nonlinear classification of multispectral imagery using representation-based classifiers. Remote Sens., 9 (7) (2017), 662.
[3] Lillesand, T.M.; Kiefer, R.W.: Remote Sensing and Image Interpretation, vol. 4, Wiley, Hoboken, NJ, USA, 2015.
[4] Bioucas-Dias, J.M.; Plaza, A.; Camps-Valls, G.; Scheunders, P.; Nasrabadi, N.M.; Chanussot, J.: Hyperspectral remote sensing data analysis and future challenges. IEEE Geosci. Remote Sens. Mag., 1 (2) (2013), 6–36.
[5] Nasrabadi, N.M.: Hyperspectral target detection: an overview of current and future challenges. IEEE Signal Process. Mag., 31 (1) (2014), 34–44.
[6] Mei, S.; Ji, J.; Geng, Y.; Zhang, Z.; Li, X.; Du, Q.: Unsupervised spatial-spectral feature learning by 3D convolutional autoencoder for hyperspectral classification. IEEE Trans. Geosci. Remote Sens., 57 (9) (2019), 6808–6820.
[7] Yang, H.; Du, Q.; Chen, G.: Particle swarm optimization-based hyperspectral dimensionality reduction for urban land cover classification. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., 5 (2) (2012), 544–554.
[8] Liu, W.; Fowler, J.E.; Zhao, C.: Spatial logistic regression for support-vector classification of hyperspectral imagery. IEEE Geosci. Remote Sens. Lett., 14 (3) (2017), 439–443.
[9] Gan, L.; Du, P.; Xia, J.; Meng, Y.: Kernel fused representation-based classifier for hyperspectral imagery. IEEE Geosci. Remote Sens. Lett., 14 (5) (2017), 684–688.
[10] Kuo, B.-C.; Ho, H.-H.; Li, C.-H.; Hung, C.-C.; Taur, J.-S.: A kernel-based feature selection method for SVM with RBF kernel for hyperspectral image classification. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., 7 (1) (2014), 317–326.
[11] Melgani, F.; Bruzzone, L.: Classification of hyperspectral remote sensing. IEEE Trans. Geosci. Remote Sens., 42 (2004), 1778–1790.
[12] Camps-Valls, G.; Bruzzone, L.: Kernel-based methods for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens., 43 (6) (2005), 1351–1362.
[13] Peng, J.; Zhou, Y.; Chen, C.L.P.: Region-kernel-based support vector machines for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens., 53 (9) (2015), 4810–4824.
[14] Li, J.; Marpu, P.R.; Plaza, A.; Bioucas-Dias, J.M.; Benediktsson, J.A.: Generalized composite kernel framework for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens., 51 (9) (2013), 4816–4829.
[15] Gomez-Chova, L.; Camps-Valls, G.; Bruzzone, L.; Calpe-Maravilla, J.: Mean map kernel methods for semisupervised cloud classification. IEEE Trans. Geosci. Remote Sens., 48 (1) (2010), 207–220.
[16] Sun, W.; Liu, C.; Xu, Y.; Tian, L.; Li, W.: A band-weighted support vector machine method for hyperspectral imagery classification. IEEE Geosci. Remote Sens. Lett., 14 (10) (2017), 1710–1714.
[17] Li, W.; Du, Q.; Xiong, M.: Kernel collaborative representation with Tikhonov regularization for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett., 12 (1) (2015), 48–52.
[18] Ma, M.; Mei, S.; Wan, S.; Hou, J.; Wang, Z.; Feng, D.D.: Video summarization via block sparse dictionary selection. Neurocomputing, 378 (2020), 197–209.
[19] Kwok, J.T.; Tsang, I.W.: Learning with idealized kernels, in Proc. Int. Conf. Mach. Learn., 2003, 400–407.
[20] Pan, B.; Lai, J.; Shen, L.: Ideal regularization for learning kernels from labels. Neural Netw., 56 (2014), 22–34.
[21] Pan, B.; Chen, W.S.; Xu, C.; Chen, B.: A novel framework for learning geometry-aware kernels. IEEE Trans. Neural Netw. Learn. Syst., 27 (5) (2016), 939–951.
[22] Peng, J.; Chen, H.; Zhou, Y.; Li, L.: Ideal regularized composite kernel for hyperspectral image classification. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., 10 (4) (2017), 1563–1574.
[23] Mehmet, G.; Alpayd, E.: Multiple kernel learning algorithms. J. Mach. Learn. Research, 12 (2011), 2211–2268.
[24] Camps-Valls, G.; Gomez-Chova, L.; Muñoz-Marí, J.; Vila-Francés, J.; Calpe-Maravilla, J.: Composite kernels for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett., 3 (1) (2006), 93–97.
[25] Wang, Q.; Gu, Y.; Tuia, D.: Discriminative multiple kernel learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens., 54 (7) (2016), 3912–3927.
[26] Li, W.; Du, Q.: Gabor-filtering-based nearest regularized subspace for hyperspectral image classification. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., 7 (4) (2014), 1012–1022.
[27] Li, W.; Chen, C.; Su, H.; Du, Q.: Local binary patterns and extreme learning machine for hyperspectral imagery classification. IEEE Trans. Geosci. Remote Sens., 53 (7) (2015), 3681–3693.
[28] Dalla Mura, M.; Atli Benediktsson, J.; Waske, B.; Bruzzone, L.: Extended profiles with morphological attribute filters for the analysis of hyperspectral data. Int. J. Remote Sens., 31 (22) (2010), 5975–5991.
[29] Li, J. et al.: Multiple feature learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens., 53 (3) (2015), 1592–1606.
[30] Li, J.; Zhang, H.; Zhang, L.: A nonlinear multiple feature learning classifier for hyperspectral images with limited training samples. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., 8 (6) (2015), 2728–2738.
[31] Zhang, L.; Zhang, L.; Tao, D.; Huang, X.: On combining multiple features for hyperspectral remote sensing image classification. IEEE Trans. Geosci. Remote Sens., 50 (3) (2012), 879–893.
[32] Rodarmel, C.; Shan, J.: Principal component analysis for hyperspectral image classification. Inf. Syst., 62 (2) (2002), 115–115.
[33] Chang, C.-C.; Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol., 2 (3) (2011), 1–27.