
Journal of Complexity

journal homepage: www.elsevier.com/locate/jco

Vector-valued reproducing kernel Banach spaces with applications to multi-task learning^✩

Haizhang Zhang^{a,b,∗}, Jun Zhang^{c}

^a School of Mathematics and Computational Science, Sun Yat-sen University, Guangzhou 510275, PR China

^b Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou 510275, PR China

^c Department of Psychology, University of Michigan, Ann Arbor, MI 48109, USA

Article info

Article history:

Received 18 February 2012
Accepted 31 May 2012
Available online 20 September 2012

Keywords:

Vector-valued reproducing kernel Banach spaces
Feature maps
Regularized learning
The representer theorem
Characterization equations

Abstract

Motivated by multi-task machine learning with Banach spaces, we propose the notion of vector-valued reproducing kernel Banach spaces (RKBSs). Basic properties of the spaces and the associated reproducing kernels are investigated. We also present feature map constructions and several concrete examples of vector-valued RKBSs. The theory is then applied to multi-task machine learning. In particular, the representer theorem and characterization equations for the minimizer of regularized learning schemes in vector-valued RKBSs are established.

1. Introduction

The purpose of this paper is to establish the notion of vector-valued reproducing kernel Banach spaces and demonstrate its applications to multi-task machine learning. Built on the theory of scalar-valued reproducing kernel Hilbert spaces (RKHSs) [3], kernel methods have proven successful in single-task machine learning [10,14,29,30,33]. Multi-task learning, where the unknown target function to be learned from finite sample data is vector-valued, appears more often in practice. Refs. [13,25] proposed the development of kernel methods for learning multiple related tasks simultaneously. The mathematical foundation used there was the theory of vector-valued RKHSs [5,27]. Recent progress in vector-valued RKHSs can be found in [7–9,38]. In such a framework, both the space of the candidate functions used for approximation and the output space are chosen as Hilbert spaces.

✩ This work was partially supported by the US National Science Foundation under grant 0631541 (PI: Jun Zhang), by Guangdong Provincial Government of China through the "Computational Science Innovative Research Team" program, and by Natural Science Foundation of China under grants 11222103, 11101438 and 91130009.

∗ Corresponding author at: School of Mathematics and Computational Science, Sun Yat-sen University, Guangzhou 510275, PR China.

E-mail addresses: [email protected] (H. Zhang), [email protected] (J. Zhang).

doi:10.1016/j.jco.2012.09.002

There are some occasions where it might be desirable to select the space of candidate functions, the output space, or both as Banach spaces. Hilbert spaces constitute a special and limited class of Banach spaces. Any two Hilbert spaces over a common number field with the same dimension are isometrically isomorphic. By reaching out to other Banach spaces, one obtains more variety in geometric structures and norms that are potentially useful for learning and approximation. Moreover, training data might come with intrinsic structures that make them impossible or inappropriate to be embedded into a Hilbert space. Learning schemes based on features in a Hilbert space may not work well for them. Finally, in some applications, a Banach space norm is engaged for some particular purpose. A typical example is the linear programming regularization in coefficient based regularization for machine learning [29], where the $\ell^1$ norm is employed to obtain sparsity in the resulting minimizer.

There has been considerable work in learning a single task with Banach spaces (see, for example, [4,6,12,15,17,20,24,26,34,36,42]). The difficulty in mapping patterns into a Banach space and making use of these features for learning mainly lies in the lack of an inner product in Banach spaces. In particular, without an appropriate correspondence of the Riesz representation of continuous linear functionals, point evaluations do not have a kernel representation in these studies. Semi-inner products, a mathematical tool discovered by Lumer [23] for the purpose of extending Hilbert space type arguments to Banach spaces, seem to be a natural substitute for inner products in Banach spaces.

An illustrative example is that we were able to extend the classical theory of frames and Riesz bases to Banach spaces via semi-inner products [40]. Semi-inner products were first applied to machine learning by Der and Lee [12] for the study of large margin classification by hyperplanes in a Banach space. With this tool, we established the notion of scalar-valued reproducing kernel Banach spaces (RKBSs) and investigated regularized learning schemes in RKBSs [37,39]. There has been increasing interest in the application of this new theory [19,31,32,41].

We attempt to build a mathematical foundation for multi-task learning with Banach spaces. Specifically, we shall propose a definition of vector-valued RKBSs and investigate their fundamental properties in the next section. Feature map representations and several concrete examples of vector-valued RKBSs will be presented in Sections 3 and 4, respectively. In Section 5, we investigate regularized learning schemes in vector-valued RKBSs.

2. Definition and basic properties

We are concerned with spaces of functions from a fixed set to a vector space. We shall allow the space of functions and the range space both to be Banach spaces. Our key tool in dealing with a general Banach space is the semi-inner product [16,23]. Recall that a semi-inner product on a Banach space $V$ is a function from $V \times V$ to $\mathbb{C}$, denoted by $[\cdot,\cdot]_V$, such that for all $f, g, h \in V$ and $\alpha, \beta \in \mathbb{C}$

1. (linearity with respect to the first variable) $[\alpha f + \beta g, h]_V = \alpha [f,h]_V + \beta [g,h]_V$;

2. (positivity) $[f,f]_V > 0$ for $f \ne 0$;

3. (conjugate homogeneity with respect to the second variable) $[f, \alpha g]_V = \overline{\alpha}\,[f,g]_V$;

4. (Cauchy–Schwarz inequality) $|[f,g]_V| \le [f,f]_V^{1/2}\,[g,g]_V^{1/2}$.

A semi-inner product $[\cdot,\cdot]_V$ on $V$ is said to be compatible if

\[ [f,f]_V^{1/2} = \|f\|_V \quad \text{for all } f \in V, \]

where $\|\cdot\|_V$ denotes the norm on $V$. Every Banach space has a compatible semi-inner product [16,23].
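To make the four axioms and the compatibility condition concrete, the following is a minimal numerical sketch on the finite-dimensional space $\ell^p$ with $p = 3$. It assumes the standard semi-inner product formula for $\ell^p$ (the same formula the paper records later for the spaces $\ell^l_\gamma$). The function name `sip`, the test vectors, and the tolerances are illustrative choices, not part of the paper.

```python
import numpy as np

def sip(u, v, p):
    """Semi-inner product [u, v] on complex l^p, linear in u (assumes 1 < p < inf)."""
    u = np.asarray(u, dtype=complex)
    v = np.asarray(v, dtype=complex)
    norm_v = np.linalg.norm(v, ord=p)
    if norm_v == 0:
        return 0j
    mag = np.abs(v)
    weight = np.zeros_like(mag)
    nonzero = mag > 0
    weight[nonzero] = mag[nonzero] ** (p - 2)  # |v_j|^(p-2), set to 0 where v_j = 0
    return np.sum(u * np.conj(v) * weight) / norm_v ** (p - 2)

p = 3.0
rng = np.random.default_rng(0)
f, g, h = (rng.standard_normal(5) + 1j * rng.standard_normal(5) for _ in range(3))
alpha, beta = 0.7 - 1.2j, -0.3 + 0.5j

# Property 1: linearity in the first variable.
linearity_err = abs(sip(alpha * f + beta * g, h, p)
                    - (alpha * sip(f, h, p) + beta * sip(g, h, p)))
# Property 3: conjugate homogeneity in the second variable.
conj_hom_err = abs(sip(f, alpha * g, p) - np.conj(alpha) * sip(f, g, p))
# Compatibility: [f, f] equals the squared l^p norm (this also gives positivity).
compat_err = abs(sip(f, f, p) - np.linalg.norm(f, ord=p) ** 2)
# Property 4: Cauchy-Schwarz, written with norms thanks to compatibility.
cauchy_schwarz_ok = (abs(sip(f, g, p))
                     <= np.linalg.norm(f, ord=p) * np.linalg.norm(g, ord=p) + 1e-12)
```

A numerical check of this kind does not prove the axioms; it only confirms that the formula behaves as the definition requires on sample vectors.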

Let $[\cdot,\cdot]_V$ be a compatible semi-inner product on $V$. Then one sees by the Cauchy–Schwarz inequality that for each $f \in V$, the linear functional $f^*$ on $V$ defined by

\[ f^*(g) := [g,f]_V, \quad g \in V \tag{2.1} \]

is bounded on $V$. In other words, $f^*$ lies in the dual space $V^*$ of $V$. Moreover, we have

\[ \|f^*\|_{V^*} = \|f\|_V \tag{2.2} \]

and

\[ f^*(f) = \|f\|_V\,\|f^*\|_{V^*}. \tag{2.3} \]
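For convenience, here is a short sketch of why (2.2) and (2.3) hold. It uses only the Cauchy–Schwarz inequality and compatibility, with $f \ne 0$ assumed in the second line.

```latex
\begin{align*}
|f^*(g)| &= |[g,f]_V| \le [g,g]_V^{1/2}\,[f,f]_V^{1/2} = \|g\|_V\,\|f\|_V
  && \Longrightarrow\quad \|f^*\|_{V^*} \le \|f\|_V, \\
f^*(f) &= [f,f]_V = \|f\|_V^2
  && \Longrightarrow\quad \|f^*\|_{V^*} \ge \frac{|f^*(f)|}{\|f\|_V} = \|f\|_V .
\end{align*}
```

The two bounds together give (2.2), and then $f^*(f) = \|f\|_V^2 = \|f\|_V\,\|f^*\|_{V^*}$ is exactly (2.3). The case $f = 0$ is trivial.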

Introduce the duality mapping $J_V$ from $V$ to $V^*$ by setting

\[ J_V(f) := f^*, \quad f \in V. \]

We desire to represent the continuous linear functionals on the vector-valued RKBS by the semi-inner product. However, the semi-inner product might not be able to fulfill this important role for an arbitrary Banach space. For instance, one verifies that the continuous linear functional

\[ \mu(g) := \sum_{j=1}^{\infty} (-1)^j \frac{1}{2^j}\, g\!\left(\frac{1}{2^j}\right), \quad g \in C([0,1]) \]

on $C([0,1])$ endowed with the usual maximum norm cannot be represented as

\[ \mu(g) = [g,f], \quad g \in C([0,1]) \]

for any compatible semi-inner product $[\cdot,\cdot]$ on $C([0,1])$ and any $f \in C([0,1])$.

The above example indicates that the duality mapping might not be surjective for a general Banach space. Other problems such as non-uniqueness of compatible semi-inner products and non-injectivity of the duality mapping may also occur. To overcome these difficulties, we shall focus on Banach spaces that are uniformly convex and uniformly Fréchet differentiable in this preliminary work on vector-valued RKBSs. A Banach space $V$ is uniformly convex if for all $\varepsilon > 0$ there exists a $\delta > 0$ such that

\[ \|f+g\|_V \le 2 - \delta \quad \text{for all } f, g \in V \text{ with } \|f\|_V = \|g\|_V = 1 \text{ and } \|f-g\|_V \ge \varepsilon. \]
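As a small illustration of the definition, the sketch below evaluates the quantity $2 - \|f+g\|$ for one pair of unit vectors in $\mathbb{R}^2$ under the $\ell^1$ and $\ell^2$ norms. The pair and the helper name `convexity_gap` are illustrative assumptions. For $\ell^1$ the pair is at distance 2 yet the gap is zero, so no $\delta > 0$ can work for $\varepsilon = 2$. For $\ell^2$ the single pair only shows a positive gap in this instance and is not a proof of uniform convexity.

```python
import numpy as np

def convexity_gap(f, g, p):
    """Return 2 - ||f + g||_p for unit vectors f, g (the margin available for delta)."""
    return 2.0 - np.linalg.norm(np.asarray(f) + np.asarray(g), ord=p)

# The two coordinate vectors have norm 1 in every l^p norm.
f = np.array([1.0, 0.0])
g = np.array([0.0, 1.0])

separation_l1 = np.linalg.norm(f - g, ord=1)  # distance between f and g in l^1
gap_l1 = convexity_gap(f, g, 1)               # no margin: l^1 fails the definition
gap_l2 = convexity_gap(f, g, 2)               # positive margin for this pair in l^2
```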

Uniform convexity ensures the injectivity of the duality mapping and the existence and uniqueness of the best approximation to a closed convex subset of $V$ [16]. We also say that $V$ is uniformly Fréchet differentiable if for all $f, g \in V$

\[ \lim_{t \in \mathbb{R},\, t \to 0} \frac{\|f + tg\|_V - \|f\|_V}{t} \tag{2.4} \]

exists and the limit is approached uniformly for all $f, g$ in the unit ball of $V$. If $V$ is uniformly Fréchet differentiable then it has a unique compatible semi-inner product [16]. The differentiability (2.4) of the norm is useful for deriving characterization equations for the minimizer of regularized learning schemes in Banach spaces. For simplicity, we call a Banach space uniform if it is both uniformly convex and uniformly Fréchet differentiable. An analogue of the Riesz representation theorem holds for uniform Banach spaces.

Lemma 2.1 (Giles [16]). Let $V$ be a uniform Banach space. Then it has a unique compatible semi-inner product $[\cdot,\cdot]_V$ and the duality mapping $J_V$ is bijective from $V$ to $V^*$. In other words, for each $\mu \in V^*$ there exists a unique $f \in V$ such that

\[ \mu(g) = [g,f]_V \quad \text{for all } g \in V. \]

In this case,

\[ [f^*, g^*]_{V^*} := [g,f]_V, \quad f, g \in V \tag{2.5} \]

defines a compatible semi-inner product on $V^*$.

Let $V$ be a uniform Banach space. We shall always denote by $[\cdot,\cdot]_V$ the unique compatible semi-inner product on $V$. By Lemma 2.1 and Eq. (2.2), the duality mapping is bijective and isometric from $V$ to $V^*$. It is also conjugate homogeneous by property 3 of semi-inner products. However, it is non-additive unless $V$ reduces to a Hilbert space. As a consequence, a compatible semi-inner product is in general conjugate homogeneous but non-additive with respect to its second variable. Namely,

\[ [f, g+h]_V \ne [f,g]_V + [f,h]_V \quad \text{in general.} \]
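The failure of additivity in the second variable can be seen on a two-dimensional example. The sketch below assumes the standard semi-inner product of real $\ell^p$ and compares $p = 3$ with the Hilbert case $p = 2$. The vectors and the helper names are illustrative, not taken from the paper.

```python
import numpy as np

def sip(u, v, p):
    """Real l^p semi-inner product [u, v], linear in u (assumes 1 < p < inf)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    norm_v = np.linalg.norm(v, ord=p)
    if norm_v == 0.0:
        return 0.0
    # sign(v_j)|v_j|^(p-1) equals v_j |v_j|^(p-2) and is safe at v_j = 0.
    return float(np.sum(u * np.sign(v) * np.abs(v) ** (p - 1)) / norm_v ** (p - 2))

f = np.array([1.0, 1.0])
g = np.array([1.0, 0.0])
h = np.array([0.0, 1.0])

def additivity_gap(p):
    """[f, g + h] - ([f, g] + [f, h]); zero exactly when additivity holds here."""
    return sip(f, g + h, p) - (sip(f, g, p) + sip(f, h, p))

gap_p3 = additivity_gap(3.0)  # equals 2**(2/3) - 2, which is nonzero
gap_p2 = additivity_gap(2.0)  # Hilbert case: the gap vanishes
```

By hand, $[f, g+h] = 2/2^{1/3} = 2^{2/3}$ while $[f,g] + [f,h] = 2$ when $p = 3$, which matches the computed gap.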

We are ready to present the definition of vector-valued RKBSs. Let $\Lambda$ be a Banach space, which we shall sometimes call the output space, and $X$ be a prescribed set, which is usually called the input space. A space $\mathcal{B}$ is called a Banach space of $\Lambda$-valued functions on $X$ if it consists of certain functions from $X$ to $\Lambda$ and the norm on $\mathcal{B}$ is compatible with point evaluations in the sense that

\[ \|f\|_{\mathcal{B}} = 0 \text{ if and only if } f(x) = 0 \text{ for all } x \in X. \]

For instance, $L^p([0,1])$, $p \ge 1$, is not a Banach space of functions while $C([0,1])$ is. We restrict our consideration to Banach spaces of functions so that point evaluations (usually referred to as "sampling" in applications) are well-defined.

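Definition 2.2. We call $\mathcal{B}$ a $\Lambda$-valued RKBS on $X$ if both $\mathcal{B}$ and $\Lambda$ are uniform and $\mathcal{B}$ is a Banach space of functions from $X$ to $\Lambda$ such that for every $x \in X$, the point evaluation $\delta_x : \mathcal{B} \to \Lambda$ defined by

\[ \delta_x(f) := f(x), \quad f \in \mathcal{B} \]

is continuous from $\mathcal{B}$ to $\Lambda$.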

We shall derive a reproducing kernel for vector-valued RKBSs so defined. Throughout the rest of the paper, we let $[\cdot,\cdot]_{\mathcal{B}}$ and $[\cdot,\cdot]_\Lambda$ be the unique compatible semi-inner products and $J_{\mathcal{B}}$ and $J_\Lambda$ the associated duality mappings on $\mathcal{B}$ and $\Lambda$, respectively. For two Banach spaces $V_1, V_2$, we denote by $\mathcal{M}(V_1,V_2)$ the set of all the bounded operators from $V_1$ to $V_2$ and by $\mathcal{L}(V_1,V_2)$ the subset of $\mathcal{M}(V_1,V_2)$ of those bounded operators that are also linear. When $V_1 = V_2$, $\mathcal{M}(V_1,V_2)$ is abbreviated as $\mathcal{M}(V_1)$. For each $T \in \mathcal{M}(V_1,V_2)$, we denote by $\|T\|_{\mathcal{M}(V_1,V_2)}$ the greatest lower bound of all the nonnegative constants $\alpha$ such that

\[ \|Tu\|_{V_2} \le \alpha\,\|u\|_{V_1} \quad \text{for all } u \in V_1. \]

When $T$ is also linear, this quantity equals the operator norm $\|T\|_{\mathcal{L}(V_1,V_2)}$ of $T$ in $\mathcal{L}(V_1,V_2)$. In this language, we require that the point evaluation $\delta_x$ on a $\Lambda$-valued RKBS on $X$ belong to $\mathcal{L}(\mathcal{B},\Lambda)$ for all $x \in X$.

Theorem 2.3. Let $\mathcal{B}$ be a $\Lambda$-valued RKBS on $X$. Then there exists a unique function $K$ from $X \times X$ to $\mathcal{M}(\Lambda)$ such that

(1) $K(x,\cdot)\xi \in \mathcal{B}$ for all $x \in X$ and $\xi \in \Lambda$,

(2) for all $f \in \mathcal{B}$, $x \in X$, and $\xi \in \Lambda$

\[ [f(x),\xi]_\Lambda = [f, K(x,\cdot)\xi]_{\mathcal{B}}, \tag{2.6} \]

(3) for all $x, y \in X$

\[ \|K(x,y)\|_{\mathcal{M}(\Lambda)} \le \|\delta_x\|_{\mathcal{L}(\mathcal{B},\Lambda)}\,\|\delta_y\|_{\mathcal{L}(\mathcal{B},\Lambda)}. \tag{2.7} \]

Proof. Let $x \in X$ and $\xi \in \Lambda$. As $\delta_x \in \mathcal{L}(\mathcal{B},\Lambda)$, we see that

\[ |[f(x),\xi]_\Lambda| \le \|f(x)\|_\Lambda\,\|\xi\|_\Lambda \le \|\delta_x\|_{\mathcal{L}(\mathcal{B},\Lambda)}\,\|f\|_{\mathcal{B}}\,\|\xi\|_\Lambda. \tag{2.8} \]

The above inequality together with the linearity of the semi-inner product with respect to its first variable implies that

\[ f \mapsto [f(x),\xi]_\Lambda \]

is a bounded linear functional on $\mathcal{B}$. By Lemma 2.1, there exists a unique function $g_{x,\xi} \in \mathcal{B}$ such that

\[ [f(x),\xi]_\Lambda = [f, g_{x,\xi}]_{\mathcal{B}} \quad \text{for all } f \in \mathcal{B}. \tag{2.9} \]

Define a function $K$ from $X \times X$ to the set of operators from $\Lambda$ to $\Lambda$ by setting

\[ K(x,y)\xi := g_{x,\xi}(y), \quad x, y \in X,\ \xi \in \Lambda. \]

Clearly, $K$ satisfies the two requirements (1) and (2). It is also unique by the uniqueness of the function $g_{x,\xi}$ satisfying (2.9). It remains to show that it is bounded. To this end, we get by (2.8) that

\[ \|K(x,\cdot)\xi\|_{\mathcal{B}} = \sup_{f \in \mathcal{B},\ \|f\|_{\mathcal{B}} \le 1} |[f, K(x,\cdot)\xi]_{\mathcal{B}}| = \sup_{f \in \mathcal{B},\ \|f\|_{\mathcal{B}} \le 1} |[f(x),\xi]_\Lambda| \le \|\delta_x\|_{\mathcal{L}(\mathcal{B},\Lambda)}\,\|\xi\|_\Lambda. \]

It follows that

\[ \|K(x,y)\xi\|_\Lambda \le \|\delta_y\|_{\mathcal{L}(\mathcal{B},\Lambda)}\,\|K(x,\cdot)\xi\|_{\mathcal{B}} \le \|\delta_x\|_{\mathcal{L}(\mathcal{B},\Lambda)}\,\|\delta_y\|_{\mathcal{L}(\mathcal{B},\Lambda)}\,\|\xi\|_\Lambda, \]

which proves (2.7).

We call the above function $K$ the reproducing kernel of $\mathcal{B}$. It coincides with the usual reproducing kernel when $\mathcal{B}$ is a Hilbert space and $\Lambda = \mathbb{C}$, and with the vector-valued reproducing kernel when both $\mathcal{B}$ and $\Lambda$ are Hilbert spaces. We explore basic properties of vector-valued RKBSs and their reproducing kernels for further investigation and applications.

Let $(\delta_x)^*$ be the adjoint operator of $\delta_x$ for all $x \in X$. Denote for a Banach space $V$ by $(\cdot,\cdot)_V$ the bilinear form on $V \times V^*$ defined by

\[ (v,\mu)_V := \mu(v), \quad v \in V,\ \mu \in V^*. \]

Thus, $(\delta_x)^*$ is defined by

\[ (f, (\delta_x)^*\xi^*)_{\mathcal{B}} = (\delta_x(f), \xi^*)_\Lambda = (f(x), \xi^*)_\Lambda = [f(x),\xi]_\Lambda, \quad f \in \mathcal{B},\ \xi \in \Lambda. \tag{2.10} \]

Proposition 2.4. Let $\mathcal{B}$ be a $\Lambda$-valued RKBS on $X$ and $K$ its reproducing kernel. Then there holds for all $x, y \in X$ and $\xi, \eta, \tau \in \Lambda$ that

\[ [K(x,x)\xi,\xi]_\Lambda \ge 0, \quad |[K(x,y)\xi,\eta]_\Lambda| \le [K(x,x)\xi,\xi]_\Lambda^{1/2}\,[K(y,y)\eta,\eta]_\Lambda^{1/2}, \tag{2.11} \]

\[ \|K(x,y)\|_{\mathcal{M}(\Lambda)} \le \|K(x,x)\|_{\mathcal{M}(\Lambda)}^{1/2}\,\|K(y,y)\|_{\mathcal{M}(\Lambda)}^{1/2}, \tag{2.12} \]

\[ K(x,\cdot)\xi = J_{\mathcal{B}}^{-1}(\delta_x)^* J_\Lambda(\xi), \tag{2.13} \]

\[ K(x,y)(\alpha\xi) = \alpha K(x,y)\xi \quad \text{for all } \alpha \in \mathbb{C}, \tag{2.14} \]

\[ \|K(x,\cdot)\xi\|_{\mathcal{B}} \le \|\delta_x\|_{\mathcal{L}(\mathcal{B},\Lambda)}\,\|\xi\|_\Lambda, \quad \|K(x,\cdot)\xi\|_{\mathcal{B}} \le \|K(x,x)\|_{\mathcal{M}(\Lambda)}^{1/2}\,\|\xi\|_\Lambda, \tag{2.15} \]

\[ (K(x,\cdot)\xi)^* + (K(x,\cdot)\eta)^* = (K(x,\cdot)\tau)^* \quad \text{whenever } \tau^* = \xi^* + \eta^*, \tag{2.16} \]

\[ \operatorname{span}\{(K(x,\cdot)\xi)^* : x \in X,\ \xi \in \Lambda\} \text{ is dense in } \mathcal{B}^*. \tag{2.17} \]

Proof. By (2.6),

\[ [K(x,x)\xi,\xi]_\Lambda = [K(x,\cdot)\xi, K(x,\cdot)\xi]_{\mathcal{B}} = \|K(x,\cdot)\xi\|_{\mathcal{B}}^2 \ge 0, \tag{2.18} \]

which proves the first inequality in Eq. (2.11). For the second one, we use the Cauchy–Schwarz inequality of semi-inner products to get that

\[ |[K(x,y)\xi,\eta]_\Lambda| = |[K(x,\cdot)\xi, K(y,\cdot)\eta]_{\mathcal{B}}| \le [K(x,\cdot)\xi, K(x,\cdot)\xi]_{\mathcal{B}}^{1/2}\,[K(y,\cdot)\eta, K(y,\cdot)\eta]_{\mathcal{B}}^{1/2} = [K(x,x)\xi,\xi]_\Lambda^{1/2}\,[K(y,y)\eta,\eta]_\Lambda^{1/2}. \]

It follows from (2.11) that

\[ |[K(x,y)\xi,\eta]_\Lambda| \le \|K(x,x)\xi\|_\Lambda^{1/2}\,\|\xi\|_\Lambda^{1/2}\,\|K(y,y)\eta\|_\Lambda^{1/2}\,\|\eta\|_\Lambda^{1/2} \le \|K(x,x)\|_{\mathcal{M}(\Lambda)}^{1/2}\,\|K(y,y)\|_{\mathcal{M}(\Lambda)}^{1/2}\,\|\xi\|_\Lambda\,\|\eta\|_\Lambda. \]

Since

\[ \|K(x,y)\xi\|_\Lambda = \sup\{|[K(x,y)\xi,\eta]_\Lambda| : \eta \in \Lambda,\ \|\eta\|_\Lambda = 1\}, \]

we have by the above inequality that

\[ \|K(x,y)\xi\|_\Lambda \le \|K(x,x)\|_{\mathcal{M}(\Lambda)}^{1/2}\,\|K(y,y)\|_{\mathcal{M}(\Lambda)}^{1/2}\,\|\xi\|_\Lambda, \]

which proves (2.12).

Turning to (2.13), we notice for each $f \in \mathcal{B}$ that

\[ [f, J_{\mathcal{B}}^{-1}(\delta_x)^* J_\Lambda(\xi)]_{\mathcal{B}} = (f, (\delta_x)^* J_\Lambda(\xi))_{\mathcal{B}} = (\delta_x(f), \xi^*)_\Lambda = (f(x), \xi^*)_\Lambda = [f(x),\xi]_\Lambda, \]

which together with (2.6) confirms (2.13). Since the duality mappings are conjugate homogeneous, we have by (2.13) that

\[ K(x,\cdot)(\alpha\xi) = J_{\mathcal{B}}^{-1}(\delta_x)^* J_\Lambda(\alpha\xi) = \alpha J_{\mathcal{B}}^{-1}(\delta_x)^* J_\Lambda(\xi) = \alpha K(x,\cdot)\xi, \]

which implies (2.14).

Recall that the duality mappings $J_{\mathcal{B}}$ and $J_\Lambda$ are isometric. Note also that a bounded linear operator and its adjoint have equal operator norms. Using these two facts, we obtain from Eq. (2.13) that

\[ \|K(x,\cdot)\xi\|_{\mathcal{B}} \le \|(\delta_x)^*\|_{\mathcal{L}(\Lambda^*,\mathcal{B}^*)}\,\|\xi\|_\Lambda = \|\delta_x\|_{\mathcal{L}(\mathcal{B},\Lambda)}\,\|\xi\|_\Lambda, \]

which is the first inequality in (2.15). The second one follows immediately from (2.18).

Let $\xi, \eta, \tau \in \Lambda$ be such that $\tau^* = \xi^* + \eta^*$. By (2.13),

\[ (K(x,\cdot)\xi)^* + (K(x,\cdot)\eta)^* = (\delta_x)^*\xi^* + (\delta_x)^*\eta^* = (\delta_x)^*(\xi^* + \eta^*) = (\delta_x)^*\tau^* = (K(x,\cdot)\tau)^*. \]

Eq. (2.16) hence holds true.

For the last property, let us assume that there exists some $f \in \mathcal{B}$ that vanishes on $\operatorname{span}\{(K(x,\cdot)\xi)^* : x \in X,\ \xi \in \Lambda\}$. Then

\[ [f(x),\xi]_\Lambda = [f, K(x,\cdot)\xi]_{\mathcal{B}} = (f, (K(x,\cdot)\xi)^*)_{\mathcal{B}} = 0 \quad \text{for all } x \in X,\ \xi \in \Lambda, \]

which implies that $f(x) = 0$ for all $x \in X$. As $\mathcal{B}$ is a Banach space of functions, $f = 0$ as a vector in the Banach space $\mathcal{B}$. Therefore, (2.17) is true. The proof is complete.

We observe by the above proposition that the reproducing kernel of a vector-valued RKBS enjoys many properties similar to those of the reproducing kernel of a vector-valued RKHS. However, there are many significant differences due to the nature of a semi-inner product. First, although for all $x, y \in X$, $K(x,y)$ remains a homogeneous bounded operator on $\Lambda$, it is generally non-additive. This can be seen from (2.13), where $J_\Lambda$ or $J_{\mathcal{B}}^{-1}$ is non-additive. Second, it is well-known that when $\Lambda$ is a Hilbert space, a function $K : X \times X \to \mathcal{L}(\Lambda)$ is the reproducing kernel of some $\Lambda$-valued RKHS on $X$ if and only if for all finitely many $\xi_j \in \Lambda$ and pairwise distinct $x_j \in X$, $j = 1, 2, \ldots, m$,

\[ \sum_{j=1}^{m} \sum_{k=1}^{m} [K(x_j,x_k)\xi_j, \xi_k]_\Lambda \ge 0. \tag{2.19} \]

Although (2.19) still holds for the reproducing kernel of a vector-valued RKBS when $m \le 2$ and the number field is $\mathbb{R}$, it may cease to be true once the number of sampling points $m$ exceeds 2. An example will be constructed in the next section. Finally, the denseness property (2.17) in the dual space $\mathcal{B}^*$ does not necessarily imply that

\[ \overline{\operatorname{span}}\{K(x,\cdot)\xi : x \in X,\ \xi \in \Lambda\} = \mathcal{B}. \tag{2.20} \]

A negative example will also be given in the next section after we present a construction of vector-valued RKBSs through feature maps. Before that, we present another important property of a vector-valued RKBS.

Proposition 2.5. Let $\mathcal{B}$ be a $\Lambda$-valued RKBS on $X$. Suppose that $f_n \in \mathcal{B}$, $n \in \mathbb{N}$, converges to some $f_0 \in \mathcal{B}$. Then $f_n(x)$ converges to $f_0(x)$ in the topology of $\Lambda$ for each $x \in X$. The convergence is uniform on the set where $\|K(x,x)\|_{\mathcal{M}(\Lambda)}$ is bounded.

Proof. Suppose that $\|f_n - f_0\|_{\mathcal{B}}$ converges to 0 as $n$ tends to infinity. We get by (2.15) that

\begin{align*}
\|f_n(x) - f_0(x)\|_\Lambda &= \sup_{\xi \in \Lambda,\ \|\xi\|_\Lambda = 1} |[f_n(x) - f_0(x), \xi]_\Lambda| \\
&= \sup_{\xi \in \Lambda,\ \|\xi\|_\Lambda = 1} |[f_n - f_0, K(x,\cdot)\xi]_{\mathcal{B}}| \le \sup_{\xi \in \Lambda,\ \|\xi\|_\Lambda = 1} \|f_n - f_0\|_{\mathcal{B}}\,\|K(x,\cdot)\xi\|_{\mathcal{B}} \\
&\le \|f_n - f_0\|_{\mathcal{B}}\,\|K(x,x)\|_{\mathcal{M}(\Lambda)}^{1/2}.
\end{align*}

Therefore, $f_n(x)$ converges pointwise to $f_0(x)$ on $X$ and the convergence is uniform on the set where $\|K(x,x)\|_{\mathcal{M}(\Lambda)}$ is bounded.

3. Feature map representations

Feature map representations form the most important way of expressing reproducing kernels. To introduce feature maps for the reproducing kernel of a vector-valued RKBS, we need the notion of the generalized adjoint [22] of a bounded linear operator between Banach spaces. Let $V_1, V_2$ be two uniform Banach spaces with the compatible semi-inner products $[\cdot,\cdot]_{V_1}$ and $[\cdot,\cdot]_{V_2}$, respectively. The generalized adjoint $T^\dagger$ of a $T \in \mathcal{L}(V_1,V_2)$ is an operator in $\mathcal{M}(V_2,V_1)$ defined by

\[ [Tu, v]_{V_2} = [u, T^\dagger v]_{V_1}, \quad u \in V_1,\ v \in V_2. \]

It can be verified that

\[ T^\dagger = J_{V_1}^{-1}\, T^*\, J_{V_2}. \]

Thus, $T^\dagger$ is indeed bounded as

\[ \|T^\dagger\|_{\mathcal{M}(V_2,V_1)} = \|T^*\|_{\mathcal{L}(V_2^*,V_1^*)} = \|T\|_{\mathcal{L}(V_1,V_2)}. \]
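The identity $T^\dagger = J_{V_1}^{-1} T^* J_{V_2}$ can be checked numerically in finite dimensions. The sketch below assumes real scalars, $V_1 = \ell^{p_1}$ on $\mathbb{R}^4$, $V_2 = \ell^{p_2}$ on $\mathbb{R}^3$, and $T$ given by a matrix `A`, so that $T^*$ is the transpose under the bilinear pairing. It also uses the fact that the inverse duality mapping of $\ell^p$ is the duality mapping of $\ell^q$ with $1/p + 1/q = 1$. All names and parameter values are illustrative assumptions.

```python
import numpy as np

def duality_map(v, p):
    """J(v) on real l^p: components sign(v_j)|v_j|^(p-1) / ||v||_p^(p-2)."""
    v = np.asarray(v, dtype=float)
    norm_v = np.linalg.norm(v, ord=p)
    if norm_v == 0.0:
        return np.zeros_like(v)
    return np.sign(v) * np.abs(v) ** (p - 1) / norm_v ** (p - 2)

def sip(u, v, p):
    """Semi-inner product [u, v] = J(v) applied to u."""
    return float(np.dot(u, duality_map(v, p)))

def generalized_adjoint(A, p1, p2):
    """Return v -> T^dagger v for T = A mapping l^{p1} to l^{p2}."""
    q1 = p1 / (p1 - 1.0)  # conjugate exponent: V1^* = l^{q1}
    def T_dagger(v):
        mu = A.T @ duality_map(v, p2)  # T^* J_{V2}(v), an element of V1^*
        return duality_map(mu, q1)     # J_{V1}^{-1} is the duality map of l^{q1}
    return T_dagger

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))  # T : R^4 -> R^3
p1, p2 = 3.0, 1.5
T_dag = generalized_adjoint(A, p1, p2)

u = rng.standard_normal(4)
v = rng.standard_normal(3)
w = rng.standard_normal(3)

# Defining relation [Tu, v]_{V2} = [u, T^dagger v]_{V1}.
adjoint_err = abs(sip(A @ u, v, p2) - sip(u, T_dag(v), p1))
# T^dagger is bounded but, in general, not additive.
additivity_gap = np.linalg.norm(T_dag(v + w) - T_dag(v) - T_dag(w))
```

The nonzero `additivity_gap` illustrates why $T^\dagger$ is placed in $\mathcal{M}(V_2,V_1)$ rather than $\mathcal{L}(V_2,V_1)$.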

We are in a position to present a characterization of the reproducing kernel of a vector-valued RKBS.

Theorem 3.1. A function $K : X \times X \to \mathcal{M}(\Lambda)$ is the reproducing kernel of some $\Lambda$-valued RKBS on $X$ if and only if there exist a uniform Banach space $\mathcal{W}$ and a mapping $\Phi : X \to \mathcal{L}(\mathcal{W},\Lambda)$ such that

\[ K(x,y) = \Phi(y)\Phi^\dagger(x), \quad x, y \in X, \tag{3.1} \]

and

\[ \overline{\operatorname{span}}\{(\Phi^\dagger(x)\xi)^* : x \in X,\ \xi \in \Lambda\} = \mathcal{W}^*. \tag{3.2} \]

Here $\Phi^\dagger$ is the function from $X$ to $\mathcal{M}(\Lambda,\mathcal{W})$ defined by $\Phi^\dagger(x) := (\Phi(x))^\dagger$, $x \in X$.

Proof. Suppose that $K$ is the reproducing kernel of some $\Lambda$-valued RKBS $\mathcal{B}$ on $X$. Set $\mathcal{W} := \mathcal{B}$ and define $\Phi : X \to \mathcal{L}(\mathcal{W},\Lambda)$ by

\[ (\Phi(x))(f) := f(x), \quad f \in \mathcal{B},\ x \in X. \]

To identify $\Phi^\dagger$, we observe by the reproducing property (2.6) for all $\xi \in \Lambda$ and $f \in \mathcal{B}$ that

\[ [f, \Phi^\dagger(x)\xi]_{\mathcal{B}} = [(\Phi(x))f, \xi]_\Lambda = [f(x),\xi]_\Lambda = [f, K(x,\cdot)\xi]_{\mathcal{B}}, \quad x \in X,\ \xi \in \Lambda, \]

which implies that $\Phi^\dagger(x)\xi = K(x,\cdot)\xi$ for all $x \in X$ and $\xi \in \Lambda$. Requirement (3.2) is fulfilled by (2.17). By the forms of $\Phi$ and $\Phi^\dagger$, we obtain that

\[ \Phi(y)\Phi^\dagger(x)\xi = \Phi(y)(K(x,\cdot)\xi) = K(x,y)\xi, \]

which proves (3.1).

On the other hand, suppose that $K$ is of the form (3.1) in terms of some mapping $\Phi$ satisfying the denseness condition (3.2). We shall construct the RKBS that takes $K$ as its reproducing kernel. For this purpose, we let $\mathcal{B}$ be composed of functions from $X$ to $\Lambda$ of the following form

\[ f_u(x) := \Phi(x)u, \quad x \in X, \text{ for some } u \in \mathcal{W}. \]

Since each $\Phi(x)$ is a linear operator, $\mathcal{B}$ is a linear vector space. We impose a norm on $\mathcal{B}$ by setting

\[ \|f_u\|_{\mathcal{B}} := \|u\|_{\mathcal{W}}, \quad u \in \mathcal{W}. \]

To verify that this is a well-defined norm, it suffices to show that the representer $u$ of a function $f_u \in \mathcal{B}$ is unique. Assume that $f_u = 0$. Then for all $x \in X$ and $\xi \in \Lambda$,

\[ (u, (\Phi^\dagger(x)\xi)^*)_{\mathcal{W}} = [u, \Phi^\dagger(x)\xi]_{\mathcal{W}} = [\Phi(x)u, \xi]_\Lambda = [0,\xi]_\Lambda = 0, \]

which combined with (3.2) implies that $u = 0$. The arguments also show that $\mathcal{B}$ is a Banach space of functions. Moreover, it is a uniform Banach space as it is isometrically isomorphic to $\mathcal{W}$. Clearly, we have for each $x \in X$ and $u \in \mathcal{W}$ that

\[ \|f_u(x)\|_\Lambda = \|\Phi(x)u\|_\Lambda \le \|\Phi(x)\|_{\mathcal{L}(\mathcal{W},\Lambda)}\,\|u\|_{\mathcal{W}} = \|\Phi(x)\|_{\mathcal{L}(\mathcal{W},\Lambda)}\,\|f_u\|_{\mathcal{B}}, \]

which shows that point evaluations are bounded on $\mathcal{B}$. We conclude that $\mathcal{B}$ is a $\Lambda$-valued RKBS on $X$. It remains to prove that $K$ is the reproducing kernel of $\mathcal{B}$. To this end, we identify the unique compatible semi-inner product on $\mathcal{B}$ as

\[ [f_u, f_v]_{\mathcal{B}} := [u,v]_{\mathcal{W}}, \quad u, v \in \mathcal{W}, \]

and observe for all $u \in \mathcal{W}$, $x \in X$ and $\xi \in \Lambda$ that

\[ [f_u, K(x,\cdot)\xi]_{\mathcal{B}} = [f_u, \Phi(\cdot)\Phi^\dagger(x)\xi]_{\mathcal{B}} = [u, \Phi^\dagger(x)\xi]_{\mathcal{W}} = [\Phi(x)u, \xi]_\Lambda = [f_u(x),\xi]_\Lambda, \]

which is what we want. The proof is complete.

We call the Banach space $\mathcal{W}$ and the mapping $\Phi$ in Theorem 3.1 a pair of feature space and feature map for $K$, respectively. The proof of Theorem 3.1 contains a construction of vector-valued RKBSs by feature maps, which we state separately as a corollary below.

Corollary 3.2. Let $\mathcal{W}$ be a uniform Banach space and $\Phi : X \to \mathcal{L}(\mathcal{W},\Lambda)$ be a feature map of $K$ that satisfies (3.1) and (3.2). Then the linear vector space

\[ \mathcal{B} := \{\Phi(\cdot)u : u \in \mathcal{W}\} \]

endowed with the norm

\[ \|\Phi(\cdot)u\|_{\mathcal{B}} := \|u\|_{\mathcal{W}}, \quad u \in \mathcal{W} \]

and compatible semi-inner product

\[ [\Phi(\cdot)u, \Phi(\cdot)v]_{\mathcal{B}} := [u,v]_{\mathcal{W}}, \quad u, v \in \mathcal{W} \]

is a $\Lambda$-valued RKBS on $X$ with the reproducing kernel $K$ given by (3.1).
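A finite-dimensional instance of this construction can be sketched as follows. The code assumes real scalars, $\Lambda = \ell^p$ on $\mathbb{R}^2$, $\mathcal{W} = \ell^r$ on $\mathbb{R}^3$, a three-point input set, and randomly drawn matrices for $\Phi(x)$. These choices and all helper names are hypothetical and serve only to check the reproducing property (2.6) and the identity (2.18) numerically.

```python
import numpy as np

def duality_map(v, p):
    """J(v) on real l^p: components sign(v_j)|v_j|^(p-1) / ||v||_p^(p-2)."""
    v = np.asarray(v, dtype=float)
    norm_v = np.linalg.norm(v, ord=p)
    if norm_v == 0.0:
        return np.zeros_like(v)
    return np.sign(v) * np.abs(v) ** (p - 1) / norm_v ** (p - 2)

def sip(u, v, p):
    """Semi-inner product [u, v] = J(v) applied to u."""
    return float(np.dot(u, duality_map(v, p)))

p, r = 3.0, 1.5        # Lambda = l^p on R^2, W = l^r on R^3
s = r / (r - 1.0)      # W^* = l^s
rng = np.random.default_rng(2)
Phi = {x: rng.standard_normal((2, 3)) for x in range(3)}  # Phi(x) in L(W, Lambda)

def Phi_dagger(x, xi):
    """Generalized adjoint: J_W^{-1} Phi(x)^* J_Lambda(xi)."""
    return duality_map(Phi[x].T @ duality_map(xi, p), s)

def K(x, y, xi):
    """Kernel K(x, y) xi = Phi(y) Phi^dagger(x) xi, as in (3.1)."""
    return Phi[y] @ Phi_dagger(x, xi)

# Denseness (3.2) in finite dimensions: the stacked rows must span W^* = R^3.
feature_rank = np.linalg.matrix_rank(np.vstack([Phi[x] for x in range(3)]))

u = rng.standard_normal(3)   # representer of f_u = Phi(.) u
xi = rng.standard_normal(2)
x = 1

# Reproducing property: [f_u(x), xi]_Lambda = [f_u, K(x,.)xi]_B = [u, Phi^dagger(x) xi]_W.
reproducing_err = abs(sip(Phi[x] @ u, xi, p) - sip(u, Phi_dagger(x, xi), r))

# Identity (2.18): [K(x,x) xi, xi]_Lambda = ||K(x,.) xi||_B^2 = ||Phi^dagger(x) xi||_W^2.
diag_value = sip(K(x, x, xi), xi, p)
norm_sq = np.linalg.norm(Phi_dagger(x, xi), ord=r) ** 2
```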

As an interesting application of Corollary 3.2, we shall show that a vector-valued RKBS is always isometrically isomorphic to a scalar-valued RKBS on a different input space.

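Corollary 3.3. If $\mathcal{B}$ is a $\Lambda$-valued RKBS on $X$ then the following linear vector space $\tilde{\mathcal{B}}$ of complex-valued functions $\tilde{f}$ on $\tilde{X} := X \times \Lambda$ of the form

\[ \tilde{f}(x,\xi) := [f(x),\xi]_\Lambda, \quad x \in X,\ \xi \in \Lambda,\ f \in \mathcal{B} \]

is an RKBS on $\tilde{X}$ with the norm

\[ \|\tilde{f}\|_{\tilde{\mathcal{B}}} := \|f\|_{\mathcal{B}}, \quad f \in \mathcal{B} \]

and the compatible semi-inner product

\[ [\tilde{f}, \tilde{g}]_{\tilde{\mathcal{B}}} := [f,g]_{\mathcal{B}}, \quad f, g \in \mathcal{B}. \]

The reproducing kernel $\tilde{K}$ of $\tilde{\mathcal{B}}$ is

\[ \tilde{K}((x,\xi),(y,\eta)) := [K(x,y)\xi, \eta]_\Lambda, \quad x, y \in X,\ \xi, \eta \in \Lambda. \]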

Proof. It suffices to point out that $\tilde{\mathcal{B}}$ is constructed by Corollary 3.2 via the choices

\[ \Lambda := \mathbb{C}, \quad \mathcal{W} := \mathcal{B}, \quad \Phi(x,\xi) := (K(x,\cdot)\xi)^*, \quad (x,\xi) \in \tilde{X}. \]

The feature map satisfies the denseness condition by (2.17).

We shall next construct by Corollary 3.2 a simple vector-valued RKBS to show that the reproducing kernel of a general vector-valued RKBS might not satisfy (2.19) or (2.20). Let $p, q, r, s \in (1,+\infty)$ satisfy

\[ \frac{1}{p} + \frac{1}{q} = \frac{1}{r} + \frac{1}{s} = 1. \tag{3.3} \]

Here, for the sake of convenience in enumerating elements from a finite set, we set $\mathbb{N}_l := \{1, 2, \ldots, l\}$ for $l \in \mathbb{N}$. For each $\gamma \in (1,+\infty)$ and $l \in \mathbb{N}$, $\ell^l_\gamma$ denotes the Banach space of all vectors $u = (u_j : j \in \mathbb{N}_l) \in \mathbb{C}^l$ with the norm

\[ \|u\|_{\ell^l_\gamma} := \Bigl(\sum_{j=1}^{l} |u_j|^\gamma\Bigr)^{1/\gamma} < +\infty. \]

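The space $\ell^l_\gamma$ is a uniform Banach space with the compatible semi-inner product

\[ [u,v]_{\ell^l_\gamma} := \sum_{j=1}^{l} \frac{u_j\,\overline{v_j}\,|v_j|^{\gamma-2}}{\|v\|_{\ell^l_\gamma}^{\gamma-2}}, \quad u, v \in \ell^l_\gamma. \]

The dual element $u^*$ of $u \in \ell^l_\gamma$ is hence given by

\[ u^* := \Bigl(\frac{\overline{u_j}\,|u_j|^{\gamma-2}}{\|u\|_{\ell^l_\gamma}^{\gamma-2}} : j \in \mathbb{N}_l\Bigr), \quad u \in \ell^l_\gamma. \tag{3.4} \]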
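The dual element can be checked against (2.2) and (2.3) directly. The sketch below assumes $\gamma = 1.5$, so that the dual space carries the exponent $3$, and a random complex vector. The function name `dual_element` and the data are illustrative.

```python
import numpy as np

def dual_element(u, gamma):
    """Dual element u^* of u in l^l_gamma: conj(u_j)|u_j|^(gamma-2) / ||u||^(gamma-2)."""
    u = np.asarray(u, dtype=complex)
    norm_u = np.linalg.norm(u, ord=gamma)
    out = np.zeros_like(u)
    if norm_u == 0:
        return out
    mag = np.abs(u)
    nonzero = mag > 0  # avoid 0 ** (negative power) when gamma < 2
    out[nonzero] = np.conj(u[nonzero]) * mag[nonzero] ** (gamma - 2)
    return out / norm_u ** (gamma - 2)

gamma = 1.5
conj_exp = gamma / (gamma - 1.0)  # exponent of the dual space, here 3

rng = np.random.default_rng(3)
u = rng.standard_normal(4) + 1j * rng.standard_normal(4)
u_star = dual_element(u, gamma)

norm_primal = np.linalg.norm(u, ord=gamma)
norm_dual = np.linalg.norm(u_star, ord=conj_exp)  # (2.2): should equal norm_primal
pairing = np.sum(u * u_star)                      # u^*(u), should equal norm_primal ** 2
```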

Non-completeness of the linear span of the reproducing kernel in $\mathcal{B}$. We give a counterexample of (2.20) first. Let $m, n \in \mathbb{N}$. We choose the output space $\Lambda$ and feature space $\mathcal{W}$ as $\ell^n_p$ and $\ell^m_r$, respectively. Thus, we have that $\Lambda^* = \ell^n_q$ and $\mathcal{W}^* = \ell^m_s$. The input space will be chosen as a set of $m$ discrete points $X := \{x_j : j \in \mathbb{N}_m\}$. A feature map $\Phi : X \to \mathcal{L}(\mathcal{W},\Lambda)$ should satisfy the denseness condition (3.2). We note by the definition of the generalized adjoint that this condition is equivalent to

\[ \overline{\operatorname{span}}\{\Phi^*(x)\xi^* : x \in X,\ \xi \in \Lambda\} = \mathcal{W}^*, \tag{3.5} \]

where $\Phi^*(x) := (\Phi(x))^*$ for all $x \in X$.

Let us take a close look at Eq. (2.20). By Corollary 3.2, a general function in $\mathcal{B}$ is of the form $f_u := \Phi(\cdot)u$ for some $u \in \mathcal{W}$. Eq. (2.20) does not hold true if and only if there exists a nontrivial $u \in \mathcal{W}$ such that

\[ [K(x,\cdot)\xi, f_u]_{\mathcal{B}} = [\Phi(\cdot)\Phi^\dagger(x)\xi, \Phi(\cdot)u]_{\mathcal{B}} = [\Phi^\dagger(x)\xi, u]_{\mathcal{W}} = 0 \quad \text{for all } x \in X,\ \xi \in \Lambda, \]

which in turn is equivalent to the statement that $\operatorname{span}\{\Phi^\dagger(x)\xi : x \in X,\ \xi \in \Lambda\}$ is not dense in $\mathcal{W}$. We conclude that to construct a $\Lambda$-valued RKBS for which (2.20) is not true, it suffices to find a feature map $\Phi : X \to \mathcal{L}(\mathcal{W},\Lambda)$ that satisfies (3.5) but

\[ \overline{\operatorname{span}}\{\Phi^\dagger(x)\xi : x \in X,\ \xi \in \Lambda\} \subsetneq \mathcal{W}. \tag{3.6} \]

To this end, we find a sequence of vectors $w_j \in \mathbb{C}^m$, $j \in \mathbb{N}_m$, and set

\[ \Phi^*(x_j)\xi^* := (\xi^*)_1\, w_j, \quad j \in \mathbb{N}_m, \tag{3.7} \]

where $(\xi^*)_1$ is the first component of the vector $\xi^* \in \mathbb{C}^n$. Since for each $j \in \mathbb{N}_m$, $\Phi^*(x_j)$ is a linear operator from $\Lambda^*$ to $\mathcal{W}^*$ and both the spaces are finite-dimensional, $\Phi^*(x_j)$ is bounded. We reformulate (3.5) and (3.6) to get that they are respectively equivalent to

\[ \operatorname{span}\{w_j : j \in \mathbb{N}_m\} = \mathbb{C}^m \tag{3.8} \]

and

\[ \operatorname{span}\{J_{\mathcal{W}}^{-1} w_j : j \in \mathbb{N}_m\} \subsetneq \mathbb{C}^m. \tag{3.9} \]
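To see that (3.8) and (3.9) can hold simultaneously, here is one hypothetical choice, not taken from the source, with $m = 3$ and $s = 3$ (so $r = 3/2$). On $\mathcal{W}^* = \ell^3_s$ with real entries, $J_{\mathcal{W}}^{-1}$ acts componentwise as $w_j \mapsto \operatorname{sign}(w_j)|w_j|^{s-1}$ up to a positive scalar, which here means squaring the positive entries. The vectors below are linearly independent, while their images are not, because the squared third vector is the average of the squared first two.

```python
import numpy as np

def inverse_duality_map(w, s):
    """J_W^{-1} on real W^* = l^s: components sign(w_j)|w_j|^(s-1) / ||w||_s^(s-2)."""
    w = np.asarray(w, dtype=float)
    norm_w = np.linalg.norm(w, ord=s)
    if norm_w == 0.0:
        return np.zeros_like(w)
    return np.sign(w) * np.abs(w) ** (s - 1) / norm_w ** (s - 2)

s = 3.0  # W^* = l^3 on R^3, hence W = l^{3/2}

# Illustrative vectors w_1, w_2, w_3 (rows); chosen so that the squares are dependent.
w = np.array([
    [1.0, 1.0, 1.0],
    [1.0, 2.0, 3.0],
    [1.0, np.sqrt(2.5), np.sqrt(5.0)],
])
images = np.array([inverse_duality_map(row, s) for row in w])

# An explicit tolerance is used because the image matrix is singular only up to rounding.
rank_w = np.linalg.matrix_rank(w, tol=1e-10)            # (3.8): full rank
rank_images = np.linalg.matrix_rank(images, tol=1e-10)  # (3.9): rank deficient
```

The same check over $\mathbb{C}$ gives identical ranks, since the matrices are real. For $m = 2$ no such example exists, because the signed power map preserves linear dependence of two vectors in the plane.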