Transfer Learning with Kernel Methods


HAL Id: tel-02972361

https://tel.archives-ouvertes.fr/tel-02972361

Submitted on 20 Oct 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Xiaoyi Chen

To cite this version:

Xiaoyi Chen. Transfer Learning with Kernel Methods. Machine Learning [cs.LG]. Université de Technologie de Troyes, 2018. English. ⟨NNT: 2018TROY0005⟩. ⟨tel-02972361⟩


THESIS

submitted for the degree of

DOCTOR of the UNIVERSITÉ DE TECHNOLOGIE DE TROYES

Specialty: SYSTEMS OPTIMIZATION AND SAFETY

presented and defended by

Xiaoyi CHEN

on 16 March 2018

Transfer Learning with Kernel Methods

JURY

Mr. P. LARZABAL, University Professor, Chair

Mr. F. ABDALLAH, Professor, Examiner

Mr. S. CANU, University Professor, Reviewer

Mr. P. HONEINE, University Professor, Examiner

Mr. R. LENGELLÉ, University Professor, Thesis supervisor

Ms. L. OUKHELLOU, Research Director, IFSTTAR, Reviewer


I would like to express my sincere gratitude to Mr. Régis LENGELLÉ, my supervisor during my three and a half years of doctoral research. He has guided me with his knowledge and experience. I highly valued the cooperation and friendship between us.

I would like to thank Mr. and Mrs. Gong, who have helped me a lot since I was an undergraduate.

Thanks to the secretaries of LM2S, Bernadette André and Véronique Banse, for their availability and amiability, as well as to the secretaries of the doctoral school, Pascale Denis, Isabelle Leclercq and Thérèse Kazarian.

Thank you to my friends for accompanying me with much joy, and to my parents, who have supported me all the time.


List of Figures

List of Tables

Introduction

Chapter 1

Machine Learning and Kernels

1.1 Introduction

1.2 Statistical Learning Theory in Classification

1.2.1 Statistical Formulation

1.2.2 Bayes decision rule

1.2.3 Loss Functions for Binary Classification

1.2.4 Expected Generalized Error and Empirical Risk

1.2.5 VC dimension

1.2.6 Structural Risk

1.2.7 Regularization

1.3 Kernel Fundamentals

1.3.1 Positive Definite Kernels

1.3.2 Other Properties of (Positive Definite) Kernels

1.3.3 Reproducing Kernel Hilbert Space (RKHS)

1.3.4 Mercer's Theorem

1.3.5 Examples of Kernels

1.3.6 Representer Theorem

1.4 Machine Learning with Kernels

1.4.1 Hard Margin Support Vector Machines (SVM)

1.4.2 Soft Margin Support Vector Machines (SVM)

1.4.3 Kernel Principal Component Analysis (KPCA)

1.4.4 Maximum Mean Discrepancy (MMD)

1.4.5 Kernel Density Estimation (KDE)

1.4.6 Preimage

1.5 Conclusion

Chapter 2

Overview of Transfer Learning

2.1 Introduction

2.2 Taxonomy

2.2.1 Transfer Learning Categorized by availability of labels in source and target

2.2.2 Transfer Learning Categorized by differences in feature space/label space

2.2.3 Transfer Learning Categorized by transfer approach

2.3 Overview of Homogeneous Transfer Learning

2.3.1 Inductive Transfer Learning

2.3.2 Transductive Transfer Learning

2.3.3 Implicit Transfer Learning

2.4 Overview of Heterogeneous Transfer Learning

2.4.1 Heterogeneous Transfer Learning with Co-occurrences

2.4.2 Heterogeneous Transfer Learning without Co-occurrences

2.5 Comments on Negative Transfer

2.5.1 Negative Transfer

2.5.2 Overview of the literature against negative transfer

2.6 Applications

2.6.1 Applications for Homogeneous Transfer Learning

2.6.2 Applications for Heterogeneous Transfer Learning

2.7 Conclusion

Chapter 3

Covariate Shift and Relaxed Covariate Shift

3.1 Introduction

3.2 Background

3.2.1 Context and Definition of Covariate Shift

3.3 Overview of Covariate Shift

3.3.1 Covariate shift transfer learning with Similarity Criteria integrated

3.3.2 Covariate shift transfer learning with Sample Selection Strategy

3.4 Relaxed Covariate Shift

3.4.1 Inductive Relaxed Covariate Shift

3.4.2 Transductive Relaxed Covariate Shift (MLRCV)

Chapter 4

Domain Adaptation by SVM subject to a MMD-like constraint

4.1 Introduction

4.2 Background

4.2.1 SVM based Transfer Learning

4.2.2 MMD based Transfer Learning

4.3 Domain Adaptation by SVM combined with MMD

4.3.1 Kernel Mean Matching (KMM)

4.3.2 Domain adaptation by SVM with MMD as a regularization term (LM and ARSVM)

4.3.3 Domain adaptation by SVM subject to a MMD-like constraint (SVMMMD)

4.4 Simulations and Analysis

4.4.1 Illustration of Principles of SVMMMD

4.4.2 SVMMMD and LM on Banana-Orange Dataset

4.4.3 Influence of the Kernel Parameter

4.5 Conclusion

Chapter 5

Domain Adaptation by KPCA Alignment

5.1 Introduction

5.2 Background

5.2.1 Dimension Reduction based Domain Adaptation

5.2.2 Domain Adaptation by Subspace Alignment

5.2.3 Domain Adaptation by Kernel Space Alignment

5.2.4 Robust Transfer using Principal Component Analysis

5.3 Domain Adaptation by KPCA Coordinate System Alignment

5.3.1 KPCA Subspace Transformation

5.3.2 KPCA Coordinate System Alignment (KPCA-TL)

5.3.3 Posterior Linear Transformation to further improve the Alignment

5.4 Domain Adaptation by Kernel Space Alignment After a Linear Transformation in the Input Space and its Kernel Representations

5.4.1 Step 1: Linear Transformation in the Original Input Space

5.4.2 Step 2: KPCA

5.4.3 Step 3: Kernel Representation Alignment

5.4.4 Step 4: Linear Classification

5.4.5 Fast Search for Parameter

5.5 Simulations and Analysis

5.5.1 Simulations on Synthetic Datasets

5.5.2 Efficiency Comparison

5.5.3 Tuning of Kernel Parameters

5.6 Experiments

5.6.1 Datasets

5.6.2 Experimental Results

5.6.3 Comparison to other state-of-the-art methods

5.6.4 Analysis of parameters

5.7 Conclusion

Chapter 6

Conclusion and Perspectives

6.1 Conclusion

6.2 Perspectives

Annexes

1 Annexes for Chapter 4

1.1 Dual Form of KMM

1.2 Final Optimization Problem of LM and its Dual Form

2 Annexes for Chapter 5

2.1 Graph Laplacian M

2.2 Definition of Surrogate Kernel

2.3 Nyström Kernel Approximation ([113])

Summary in French

1 Introduction

1.1 Overview of transfer learning

1.2 Homogeneous learning (homogeneous transductive transfer learning)

2 Extended Covariate Shift

3 Domain Adaptation with SVM under an MMD-based constraint

3.1 Fundamental tools

3.2 Domain Adaptation by SVM under a zero-MMD constraint (SVMMMD)

3.3 Domain Adaptation with MMD-regularized SVM (LM)

4 Domain Adaptation with alignment in a KPCA subspace

4.3 A Posteriori Linear Transformation (KPCA-TL-LT)

4.4 A Priori Linear Transformation (KPCAlin)

4.5 Experimental results

5 Conclusion and Perspectives

5.1 Conclusion

5.2 Perspectives

Bibliography