
HAL Id: hal-01062132

https://hal.inria.fr/hal-01062132

Submitted on 9 Sep 2014


Distributed under a Creative Commons Attribution 4.0 International License

A Classification of Remote Sensing Image Based on Improved Compound Kernels of Svm

Jianing Zhao, Wanlin Gao, Zili Liu, Guifen Mou, Lin Lu, Lina Yu

To cite this version:

Jianing Zhao, Wanlin Gao, Zili Liu, Guifen Mou, Lin Lu, et al. A Classification of Remote Sensing Image Based on Improved Compound Kernels of Svm. Third IFIP TC 12 International Conference on Computer and Computing Technologies in Agriculture III (CCTA), Oct 2009, Beijing, China.

pp. 15-20, doi:10.1007/978-3-642-12220-0_3. hal-01062132


A CLASSIFICATION OF REMOTE SENSING IMAGE BASED ON IMPROVED COMPOUND KERNELS OF SVM

Jianing Zhao1, Wanlin Gao1,*, Zili Liu1, Guifen Mou2, Lin Lu3, Lina Yu1

1 College of Information and Electrical Engineering, China Agricultural University, Beijing, P. R. China 100083

2 Yunnan Xin Nan Nan Agricultural Technology Co., Ltd. Yunnan Province, P. R. China 650214

3 Kunming Agriculture machinery research institute, Yunnan Province, P. R. China 650034

* Corresponding author, Address: College of Information and Electrical Engineering, China Agricultural University, No. 17, Qinghua Dong Lu, Haidian, Beijing 100083, P. R. China, Tel: +86-010-62736755, Fax: +86-010-62736755, Email: gaowlin@cau.edu.cn

Abstract: SVM, developed from statistical learning theory, classifies remote sensing (RS) images with high accuracy even from a small number of training samples, which makes SVM-based RS classification satisfactory. The traditional RS classification method combines visual interpretation with computer classification; the SVM-based method improves the accuracy of RS classification considerably while saving much of the labor and time spent interpreting images and collecting training samples. Kernel functions play an important part in the SVM algorithm. This paper uses an improved compound kernel function and therefore achieves higher classification accuracy on RS images; moreover, the compound kernel improves the generalization and learning ability of the kernel.

Keywords: compound kernel, remote sensing, image classification, support vector machine

1. INTRODUCTION

Classification of remote sensing images is an important task in the field of remote sensing processing. Among classification algorithms, SVM (C. J. Burges, 1998) is one of the most accurate methods. SVM stands for Support Vector Machine, a machine learning method proposed by Vapnik on the basis of statistical learning theory (Vapnik, 1995; Vapnik, 1998). It integrates many techniques, including the optimal hyperplane (Cristianini, 2005), Mercer kernels, slack variables, convex quadratic programming and so on. The Support Vector Machine successfully handles many practical difficulties such as small sample sizes, non-linearity, high dimensionality, local minima and so on. In several challenging applications, SVM achieves the best performance so far (ZHANG Xue Gong, 2000). In this paper, a compound kernel is proposed, which improves generalization and learning ability over a single kernel.

2. SVM METHOD

2.1 SVM Basis

Support Vector Machine is a machine learning algorithm based on statistical learning theory. Using the principle of structural risk minimization, it minimizes the training error while shrinking the upper bound on the generalization error of the model, thereby improving the model's generalization. Compared with machine learning algorithms based on the principle of empirical risk minimization, statistical learning theory proposes a new strategy, which is the mechanism of SVM: find an optimal classification hyperplane that satisfies the classification requirement, separates the two classes as much as possible, and maximizes the margin on both sides of the hyperplane, i.e., keeps the separated data farthest from the hyperplane. A training sample can be separated by different hyperplanes; the hyperplane whose margin is largest is the optimal separating hyperplane (Cristianini et al. 2005).

2.1.1 Linear Separability

A two-class classification problem can be stated in the following way: the training samples are represented as a set of pairs $(x_i, y_i)$, $i = 1, 2, \dots, n$, where $y_i \in \{-1, +1\}$ is the class label and $x \in \mathbb{R}^d$ is the feature vector with $d$ components. The hyperplane is defined as $g(x) = w \cdot x + b = 0$.

We seek the optimal hyperplane, the one that maximizes the margin. The optimization problem is then translated into minimizing the function (1) subject to the constraints (2):


$$\Phi(w, \xi) = \frac{1}{2}(w \cdot w) + C \sum_{i=1}^{n} \xi_i \qquad (1)$$

$$y_i[(w \cdot x_i) + b] \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n \qquad (2)$$

where $w$ is the normal to the hyperplane, $C$ is a regularisation parameter, $\xi_i$ are slack variables, and $b$ is the offset.

The dual of the above problem is to maximize the following function:

$$Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \qquad (3)$$

which is constrained by

$$\sum_{i=1}^{n} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, 2, \dots, n \qquad (4)$$

The classification rule based on the optimal hyperplane is to find the optimal classification function:

$$f(x) = \operatorname{sgn}\{(w \cdot x) + b\} = \operatorname{sgn}\Big\{\sum_{i=1}^{n} \alpha_i y_i (x_i \cdot x) + b\Big\} \qquad (5)$$

By solving the above problem, one finds that the coefficient $\alpha_i$ of any non-support vector is zero.
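To make equation (5) concrete, here is a minimal sketch, not from the paper, that trains a linear SVM with scikit-learn's SVC (which wraps libsvm) on a toy dataset and reproduces the decision function from the support vectors, their coefficients $\alpha_i y_i$, and the offset $b$. All data and parameter values are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

# A tiny linearly separable two-class problem; values chosen for illustration.
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

alpha_y = clf.dual_coef_[0]    # alpha_i * y_i, for support vectors only
sv = clf.support_vectors_      # the x_i with nonzero alpha_i
b = clf.intercept_[0]

x_new = np.array([2.5, 2.5])
# Equation (5): f(x) = sgn( sum_i alpha_i y_i (x_i . x) + b )
f = np.sign(alpha_y @ (sv @ x_new) + b)
print(f, np.sign(clf.decision_function([x_new])[0]))  # the two should agree
```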

2.1.2 Non-linear Separability

For a non-linear problem, we use kernel functions that satisfy Mercer's condition to project the data into a higher-dimensional space where they are considered linearly separable. With kernel functions introduced, the non-linear algorithm can be implemented without increasing the complexity of the algorithm. If we use the inner product $K(x, x') = \langle \Phi(x), \Phi(x') \rangle$ to replace the dot products in the optimal hyperplane, which amounts to converting the original feature space into a new feature space, the objective function (3) becomes:

$$Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \qquad (6)$$

and the corresponding discriminant function (5) becomes:

$$f(x) = \operatorname{sgn}\Big\{\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\Big\} \qquad (7)$$


2.2 Kernel Function

Kernel functions have properties as follows:

Property 1: If $K_1$, $K_2$ are two kernels and $a_1$, $a_2$ are two positive real numbers, then $K(u, v) = a_1 K_1(u, v) + a_2 K_2(u, v)$ is also a kernel satisfying Mercer's condition.

Property 2: If $K_1$, $K_2$ are two kernels, then $K(u, v) = K_1(u, v) \cdot K_2(u, v)$ is also a kernel satisfying Mercer's condition.

Property 3: If $K_1$ is a kernel, its exponential is also a kernel, that is, $K(u, v) = \exp(K_1(u, v))$ (XIA Hongxia, 2009).

There are four commonly used types of kernels: linear, polynomial, Gaussian RBF, and sigmoid kernels.

(1) linear kernel: $K(x_i, x_j) = x_i^T x_j$
(2) polynomial kernel: $K(x_i, x_j) = (\gamma x_i^T x_j + r)^d$
(3) RBF kernel: $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$
(4) sigmoid kernel: $K(x_i, x_j) = \tanh(\gamma x_i^T x_j + r)$
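As an illustration, the four kernels above can be written as follows. This is a minimal NumPy sketch, not from the paper; the default values of $\gamma$, $r$ and $d$ are assumptions, and each function returns the Gram matrix whose entry $(i, j)$ equals $K(x_i, z_j)$.

```python
import numpy as np

def linear_kernel(X, Z):
    # (1) K(x_i, z_j) = x_i^T z_j
    return X @ Z.T

def polynomial_kernel(X, Z, gamma=1.0, r=1.0, d=3):
    # (2) K(x_i, z_j) = (gamma * x_i^T z_j + r)^d
    return (gamma * (X @ Z.T) + r) ** d

def rbf_kernel(X, Z, gamma=0.5):
    # (3) K(x_i, z_j) = exp(-gamma * ||x_i - z_j||^2), using the expansion
    #     ||x - z||^2 = x.x + z.z - 2 x.z for all pairs at once
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2 * X @ Z.T
    return np.exp(-gamma * sq)

def sigmoid_kernel(X, Z, gamma=0.5, r=0.0):
    # (4) K(x_i, z_j) = tanh(gamma * x_i^T z_j + r)
    return np.tanh(gamma * (X @ Z.T) + r)
```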

3. COMPOUND KERNELS

Currently there are many kernels, each with its own characteristics, but they can be classified into two main types: local kernels and global kernels (Smits et al. 2002).

(1) Local kernels

Only data points whose values are close to each other influence the kernel value. Basically, all kernels based on a distance function are local kernels. A typical local kernel is the RBF kernel: $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$.

(2) Global kernels

Samples that are far away from each other still influence the kernel value. All kernels based on the dot product are global:

linear: $K(x_i, x_j) = x_i^T x_j$
polynomial: $K(x_i, x_j) = (\gamma x_i^T x_j + r)^d$
sigmoid: $K(x_i, x_j) = \tanh(\gamma x_i^T x_j + r)$
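A small numeric illustration of this distinction, not from the paper ($\gamma = 0.5$ is an arbitrary choice): as two points move apart, the local RBF kernel value decays towards zero, while the global linear kernel value keeps growing.

```python
import numpy as np

x = np.array([1.0, 1.0])
for dist in (0.5, 2.0, 8.0):
    z = x + np.array([dist, 0.0])
    rbf = np.exp(-0.5 * np.sum((x - z) ** 2))  # local: -> 0 with distance
    lin = x @ z                                # global: grows with distance
    print(f"distance {dist}: RBF {rbf:.4f}, linear {lin:.1f}")
```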

The upper bound on the expected risk of SVM is the ratio of the expected number of support vectors to the total number of training samples:

$$E[P(\text{error})] \le \frac{E[\text{number of support vectors}]}{(\text{total number of training samples}) - 1}$$

It follows that if the number of support vectors is reduced, the generalization ability of SVM is improved. Therefore, the Gaussian kernel can be improved as follows:

$$K(x_i, x_j) = \exp(-a \gamma \|x_i - x_j\|^2), \quad a > 1 \qquad (8)$$

By adding a coefficient $a$, a real number greater than 1, the absolute value of the coefficients of the quadratic term of the quadratic programming objective built from the kernel in equation (8) is increased. Hence the optimal values of $\alpha$ are reduced, the number of support vectors is reduced, and the generalization ability is therefore improved.

If the total number of training samples is fixed, the error rate of classification can be reduced by decreasing the number of support vectors.
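A hedged sketch of the improved kernel in equation (8); the values of $\gamma$ and $a$ below (with $a$ set to 3 purely for illustration) are assumptions, not the paper's. Scaling the exponent by $a > 1$ sharpens the Gaussian and, per the argument above, tends to reduce the number of support vectors.

```python
import numpy as np

def improved_rbf_gram(X, Z, gamma=0.5, a=3.0):
    # Equation (8): an RBF kernel with the exponent scaled by a > 1.
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2 * X @ Z.T
    return np.exp(-a * gamma * sq)
```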

If kernel functions satisfy Mercer's condition, a linear combination of them is also an eligible kernel. For example:

$$K(x_i, x_j) = a \cdot K_1(x_i, x_j) + b \cdot K_2(x_i, x_j) \qquad (9)$$

where $a$ and $b$ are positive real numbers and $K_1$, $K_2$ can be any kernels. Global kernels have good generalization ability, while local kernels have good learning ability. Hence, combining the two types of kernels makes full use of their merits, achieving both good learning ability and good generalization.
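Here is a minimal sketch, not the authors' program, of how equation (9) can be used in practice: a weighted sum of a global (linear) and a local (RBF) Gram matrix is passed as a precomputed kernel to scikit-learn's SVC, which wraps libsvm. The toy data are illustrative; the defaults $C=2$, $\gamma=0.5$, $a=1$, $b=3$ echo the values reported in Table 1 below.

```python
import numpy as np
from sklearn.svm import SVC

def compound_gram(X, Z, a=1.0, b=3.0, gamma=0.5):
    # Equation (9) with K1 = linear (global) and K2 = RBF (local).
    linear = X @ Z.T
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2 * X @ Z.T
    return a * linear + b * np.exp(-gamma * sq)

rng = np.random.RandomState(0)
X_train = rng.randn(20, 4)                                # stand-in features
y_train = np.where(X_train[:, 0] + X_train[:, 1] ** 2 > 0.5, 1, -1)

clf = SVC(kernel="precomputed", C=2.0)
clf.fit(compound_gram(X_train, X_train), y_train)         # square train Gram
print("support vectors:", clf.support_.size)

X_test = rng.randn(5, 4)
pred = clf.predict(compound_gram(X_test, X_train))        # rows: test, cols: train
print(pred)
```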

For a compound kernel, four parameters need to be determined. Their values have a great effect on the accuracy of classification, so an optimal tuple $(C, \gamma, a, b)$ is needed for the compound kernel.
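One hedged way to pick this tuple (the paper itself fixes it by repeated trials, as noted in Section 4) is a cross-validated grid search; the candidate grids and toy data below are assumptions for illustration.

```python
import numpy as np
from itertools import product
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def compound_gram(X, Z, a, b, gamma):
    # Equation (9): weighted sum of linear and RBF Gram matrices.
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2 * X @ Z.T
    return a * (X @ Z.T) + b * np.exp(-gamma * sq)

def select_params(X, y, Cs=(1, 2, 4), gammas=(0.1, 0.5), weights=(1, 2, 3)):
    best, best_score = None, -np.inf
    for C, gamma, a, b in product(Cs, gammas, weights, weights):
        gram = compound_gram(X, X, a, b, gamma)           # square train Gram
        scores = cross_val_score(SVC(kernel="precomputed", C=C), gram, y, cv=3)
        if scores.mean() > best_score:
            best, best_score = (C, gamma, a, b), scores.mean()
    return best, best_score

# Toy usage with random data; replace with real training pixels and labels.
rng = np.random.RandomState(0)
X_demo = rng.randn(30, 4)
y_demo = np.where(X_demo[:, 0] + X_demo[:, 1] ** 2 > 0.5, 1, -1)
print(select_params(X_demo, y_demo))
```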

4. EXPERIMENTAL RESULTS AND DISCUSSION

In this paper, a 30-meter-resolution remote sensing image of rice paddies in Guangdong province is selected for the test; the classification program is written on top of libsvm. The results are shown in Table 1:

Table 1. Classification accuracy using different kernels.

Number of pixels | Kernel                                              | Number of SVs | Accuracy
350              | RBF (C=1, gamma=0.1)                                | 310           | 91.2%
350              | Linear                                              | 210           | 85.3%
350              | Compound of linear and RBF (C=2, gamma=0.5, a=1, b=3) | 270         | 93.1%

Table 1 shows the results of classifying the remote sensing image. The compound-kernel method achieves the highest accuracy and needs fewer support vectors than the RBF kernel.

From these results we can see that the compound kernel has good generalization and learning ability. Compared to the RBF kernel, the compound kernel achieves higher classification accuracy with fewer support vectors; hence the compound kernel achieves good generalization ability. In the test, the value of $(C, \gamma, a, b)$ has a great effect on the accuracy of classification. For different remote sensing images, different compound kernels and values of $(C, \gamma, a, b)$ should be selected. The optimal combination of kernels and the value of $(C, \gamma, a, b)$ should be fixed through repeated trials.

5. CONCLUSION

In conclusion, the compound kernels yield better results than single kernels. The compound kernels need fewer support vectors, which means they have good generalization ability, and they achieve higher classification accuracy than single kernels.

REFERENCES

C. J. Burges. A tutorial on support vector machines for pattern recognition. In Data Mining and Knowledge Discovery, U. Fayyad, Ed. Kluwer Academic, 1998, pp. 1-43.

N. Cristianini, J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. House of Electronics Industry, 2005.

G. Smits, E. Jordaan. Improved SVM regression using mixtures of kernels. In IJCNN, 2002.

V. N. Vapnik. The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.

V. N. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.

XIA Hongxia, DING Zichun, LI Zhe, GUO Cuicui, SONG Huazhu. An Adaptive Compound Kernel Function of Support Vector Machines. Journal of Wuhan University of Technology, 2009, 2.

ZHANG Xue Gong. Introduction to statistical learning theory and support vector machines. Acta Automatica Sinica, 2000, 1(26): 32-42.
