
Chapter 3

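for any positive $x$ and any point $h$ in $S_m$. Let us define $D_m > 0$ such that
\[
\varphi_m\left(\varepsilon\sqrt{D_m}\right) = \varepsilon D_m \tag{3.2}
\]
and consider some family of weights $\{x_m\}_{m\in\mathcal{M}}$ such that
\[
\sum_{m\in\mathcal{M}} e^{-x_m} = \Sigma < \infty.
\]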

Let $K$ be some constant with $K > 1$ and take
\[
\operatorname{pen}(m) \geq K\varepsilon^2\left(\sqrt{D_m} + \sqrt{2x_m}\right)^2. \tag{3.3}
\]
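As a rough numerical illustration of condition (3.3), the sketch below evaluates the right-hand side $K\varepsilon^2(\sqrt{D_m}+\sqrt{2x_m})^2$ and the sum $\Sigma$ of the weights. The function name `penalty_lower_bound` and the choice $D_m = x_m = m$ for five models are hypothetical and serve only to make the arithmetic concrete; in the text, $D_m$ is determined by (3.2).

```python
import math


def penalty_lower_bound(D_m, x_m, eps, K):
    """Right-hand side of (3.3): K * eps^2 * (sqrt(D_m) + sqrt(2 * x_m))^2."""
    if K <= 1:
        raise ValueError("K must be strictly greater than 1")
    if D_m <= 0 or x_m < 0 or eps <= 0:
        raise ValueError("need D_m > 0, x_m >= 0 and eps > 0")
    return K * eps ** 2 * (math.sqrt(D_m) + math.sqrt(2.0 * x_m)) ** 2


# Hypothetical family of five models with D_m = m and weights x_m = m.
dims = {m: float(m) for m in range(1, 6)}
weights = {m: float(m) for m in dims}

# Sigma is the sum of exp(-x_m) over the family; it must be finite.
Sigma = sum(math.exp(-x) for x in weights.values())

# Smallest penalty allowed by (3.3) for each model, with eps = 0.1 and K = 2.
pens = {m: penalty_lower_bound(dims[m], weights[m], eps=0.1, K=2.0) for m in dims}

for m in sorted(pens):
    print(m, round(pens[m], 6))
print("Sigma =", round(Sigma, 6))
```

Any penalty at least as large as `pens[m]` for every `m` satisfies (3.3) for this illustrative family; larger weights $x_m$ reduce $\Sigma$ at the cost of a larger penalty.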

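We set, for all $g \in \mathbb{H}$, $L_\varepsilon(g) = \|g\|^2 - 2Y_\varepsilon(g)$ and consider some collection of $\rho$-LSEs $(\hat{f}_m)_{m\in\mathcal{M}}$, i.e., for any $m \in \mathcal{M}$,
\[
L_\varepsilon(\hat{f}_m) \leq L_\varepsilon(g) + \rho, \quad \text{for all } g \in S_m.
\]
Defining a penalized $\rho$-LSE as $\tilde{f} = \hat{f}_{\hat{m}}$, the following risk bound holds for all $f \in \mathbb{H}$:
\[
\mathbb{E}_f\left[\|\tilde{f} - f\|^2\right] \leq C(K)\left[\inf_{m\in\mathcal{M}}\left(d^2(f, S_m) + \operatorname{pen}(m)\right) + \varepsilon^2(\Sigma + 1) + \rho\right]. \tag{3.4}
\]

Proof. We first recall that for every $m \in \mathcal{M}$ and any point $f \in \mathbb{H}$, the projection $f_m$ of $f$ onto the closed and convex model $S_m$ satisfies the following properties:
\[
\|f - f_m\| = d(f, S_m), \tag{3.5}
\]
\[
\|f_m - g\| \leq \|f - g\|, \quad \text{for all } g \in S_m. \tag{3.6}
\]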
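The two projection properties can be checked numerically on a simple closed convex set. The sketch below is only an illustration under stated assumptions: the model is taken to be the box $[-1, 1]^5$ in $\mathbb{R}^5$ (a hypothetical choice, not one made in the text), for which the Euclidean projection is coordinate-wise clipping. The helper names `project_onto_box` and `dist` are likewise illustrative.

```python
import math
import random


def project_onto_box(f, lo, hi):
    """Euclidean projection of the vector f onto the box [lo, hi]^n."""
    return [min(max(c, lo), hi) for c in f]


def dist(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


random.seed(0)
lo, hi, dim = -1.0, 1.0, 5

# A target f that typically lies outside the box, and its projection f_m.
f = [random.gauss(0.0, 3.0) for _ in range(dim)]
f_m = project_onto_box(f, lo, hi)
d_f_Sm = dist(f, f_m)

# Random points g of the box, used to test both properties.
points = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(1000)]

# (3.5): no sampled point of the box is closer to f than f_m is.
minimal_ok = all(d_f_Sm <= dist(f, g) + 1e-12 for g in points)

# (3.6): moving from f to f_m never increases the distance to a point g of the box.
contraction_ok = all(dist(f_m, g) <= dist(f, g) + 1e-12 for g in points)

print(minimal_ok, contraction_ok)
```

A random sample can only fail to contradict (3.5) and (3.6); it does not prove them. For the box, both inequalities in fact hold coordinate by coordinate.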

The first property is just the definition of the projection point $f_m$ and the second one is merely the contraction property of the projection on a closed convex set in a Hilbert space. Let us assume for the sake of simplicity that $\rho = 0$. We now fix some $m \in \mathcal{M}$ and define
\[
\mathcal{M}' = \left\{ m' \in \mathcal{M} : L_\varepsilon(\hat{f}_{m'}) + \operatorname{pen}(m') \leq L_\varepsilon(\hat{f}_m) + \operatorname{pen}(m) \right\}.
\]
By definition, for every $m' \in \mathcal{M}'$,
\[
L_\varepsilon(\hat{f}_{m'}) + \operatorname{pen}(m') \leq L_\varepsilon(\hat{f}_m) + \operatorname{pen}(m) \leq L_\varepsilon(f_m) + \operatorname{pen}(m).
\]
Let us now assume that the target $f$ belongs to model $S_m$ (we shall relax this assumption afterwards), which of course means that $f_m = f$. Noticing that
\[
L_\varepsilon(g) = \|g - f\|^2 - \|f\|^2 - 2\varepsilon W(g),
\]

Model Choice and Model Aggregation, F. Bertrand - Editions Techip

For over forty years, choosing a statistical model from data consisted in optimizing a criterion based on penalized likelihood (H. Aka...
