Solutions to Problems of Chapter 8

8.1. Prove the Cauchy-Schwarz inequality in a general Hilbert space.

Solution: We have to show that $\forall x, y \in H$, $|\langle x, y\rangle| \le \|x\|\,\|y\|$, and that equality holds iff $x = ay$ for some $a \in \mathbb{C}$.

The inequality holds trivially if $x$ and/or $y = 0$. Let now $y \ne 0$. For any $\lambda \in \mathbb{C}$ we have
$$0 \le \|x - \lambda y\|^2 = \langle x - \lambda y, x - \lambda y\rangle = \|x\|^2 + |\lambda|^2\|y\|^2 - \lambda^*\langle x, y\rangle - \lambda\langle y, x\rangle.$$
Since the last inequality is valid for any $\lambda \in \mathbb{C}$, let
$$\lambda = \frac{\langle x, y\rangle}{\|y\|^2}.$$
Thus
$$0 \le \|x\|^2 + \frac{|\langle x, y\rangle|^2}{\|y\|^2} - 2\frac{|\langle x, y\rangle|^2}{\|y\|^2} = \|x\|^2 - \frac{|\langle x, y\rangle|^2}{\|y\|^2},$$
from which a) the inequality results and b) the fact that equality holds iff $x = ay$.

Indeed, if $x = ay$, then equality is trivially shown. Let us now assume that equality holds true. Then
$$\langle x, x\rangle\langle y, y\rangle = \langle x, y\rangle^*\langle x, y\rangle,$$
and from the properties of the inner product in a Hilbert space we have
$$\Big\|x - \frac{\langle x, y\rangle}{\|y\|^2}\,y\Big\|^2 = \|x\|^2 - \frac{|\langle x, y\rangle|^2}{\|y\|^2} = 0,$$
from which it is readily seen that
$$x = \frac{\langle x, y\rangle}{\|y\|^2}\,y,$$
which proves the claim.
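The inequality just proved can be checked numerically in the finite-dimensional Hilbert space $\mathbb{C}^2$; this is only an illustrative sketch, and the particular vectors below are arbitrary choices, not taken from the text.

```python
# Numeric sanity check of the Cauchy-Schwarz inequality in C^2.

def inner(x, y):
    # <x, y> = sum_i x_i * conj(y_i)
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return abs(inner(x, x)) ** 0.5

x = [1 + 2j, -0.5 + 1j]
y = [0.3 - 1j, 2 + 0.7j]

lhs = abs(inner(x, y))
rhs = norm(x) * norm(y)
assert lhs <= rhs + 1e-12          # |<x, y>| <= ||x|| ||y||

# Equality holds when x = a*y for a scalar a:
a = 1.5 - 0.4j
xa = [a * c for c in y]
assert abs(abs(inner(xa, y)) - norm(xa) * norm(y)) < 1e-9
```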
8.2. Show a) that the set of points in a Hilbert space $H$,
$$C_1 = \{x : \|x\| \le 1\},$$
is a convex set, and b) that the set of points
$$C_2 = \{x : \|x\| = 1\}$$
is a nonconvex one.
Solution: From the definition of a Hilbert space (see Appendix), the norm is the one induced by the inner product, i.e., $\|x\| = (\langle x, x\rangle)^{1/2}$.

a) Let us consider two points $x_1, x_2 \in H$ such that
$$\|x_1\| \le 1, \quad \|x_2\| \le 1,$$
and let
$$x = \lambda x_1 + (1-\lambda)x_2, \quad \lambda \in [0, 1].$$
Then, by the triangle inequality property of a norm,
$$\|x\| = \|\lambda x_1 + (1-\lambda)x_2\| \le \lambda\|x_1\| + (1-\lambda)\|x_2\|,$$
and since $\lambda \in [0, 1]$,
$$\|x\| \le \lambda\|x_1\| + (1-\lambda)\|x_2\| \le (\lambda + 1 - \lambda)\,1 = 1.$$

b) Let two points be such that
$$\|x_1\| = 1, \quad \|x_2\| = 1,$$
and
$$x = \lambda x_1 + (1-\lambda)x_2.$$
Then we have that
$$\|x\|^2 = \langle\lambda x_1 + (1-\lambda)x_2, \lambda x_1 + (1-\lambda)x_2\rangle = \lambda^2\|x_1\|^2 + (1-\lambda)^2\|x_2\|^2 + 2\lambda(1-\lambda)\langle x_1, x_2\rangle = \lambda^2 + (1-\lambda)^2 + 2\lambda(1-\lambda)\langle x_1, x_2\rangle. \quad (1)$$
From the Schwarz inequality (Problem 8.1), we have that
$$|\langle x_1, x_2\rangle| \le \|x_1\|\,\|x_2\|, \quad (2)$$
or
$$-1 \le \langle x_1, x_2\rangle \le 1. \quad (3)$$
From (1) and (3) it is readily seen that
$$\|x\|^2 \le 1.$$
As a matter of fact, the only way for $\|x\|^2 = 1$ is that $\langle x_1, x_2\rangle = 1 = \|x_1\|\,\|x_2\|$. However, this is not possible. Equality in (2) is attained iff $x_1 = ax_2$, and since $\|x_1\| = \|x_2\| = 1$, this can only happen in the trivial case of $x_1 = x_2$. Hence, for $x_1 \ne x_2$ and $\lambda \in (0, 1)$, $\|x\| < 1$, so $x \notin C_2$ and $C_2$ is nonconvex.
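The two findings of Problem 8.2 can be checked numerically in $\mathbb{R}^2$ (a minimal sketch; the two particular unit vectors below are arbitrary choices):

```python
# The closed unit ball is convex, the unit sphere is not: the midpoint of
# two distinct unit vectors has norm strictly less than 1.
import math

x1 = (1.0, 0.0)
x2 = (0.0, 1.0)
lam = 0.5
mid = tuple(lam * a + (1 - lam) * b for a, b in zip(x1, x2))
norm_mid = math.hypot(*mid)

assert math.hypot(*x1) == 1.0 and math.hypot(*x2) == 1.0
assert norm_mid <= 1.0          # the midpoint stays in the ball (convexity)
assert norm_mid < 1.0 - 1e-6    # ...but leaves the sphere (non-convexity)
```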
8.3. Show the first-order convexity condition.

Solution: a) Assume that $f$ is convex. Then
$$f\big(\lambda y + (1-\lambda)x\big) \le \lambda f(y) + (1-\lambda)f(x),$$
or
$$f\big(x + \lambda(y - x)\big) - f(x) \le \lambda\big(f(y) - f(x)\big).$$
Taking $\lambda \to 0$, we can employ the Taylor expansion and get
$$f\big(x + \lambda(y - x)\big) - f(x) \approx \lambda\nabla^T f(x)(y - x) \le \lambda\big(f(y) - f(x)\big),$$
from which, in the limit, we obtain
$$f(y) \ge f(x) + \nabla^T f(x)(y - x). \quad (4)$$

b) Assume that
$$f(y) \ge f(x) + \nabla^T f(x)(y - x)$$
is valid $\forall x, y \in X$, where $X$ is the domain of definition of $f$. Then we have
$$f(y_1) \ge f(x) + \nabla^T f(x)(y_1 - x), \quad (5)$$
and
$$f(y_2) \ge f(x) + \nabla^T f(x)(y_2 - x). \quad (6)$$
Combining the previous two inequalities together, we obtain
$$\lambda f(y_1) + (1-\lambda)f(y_2) \ge \lambda f(x) + (1-\lambda)f(x) + \lambda\nabla^T f(x)(y_1 - x) + (1-\lambda)\nabla^T f(x)(y_2 - x), \quad (7)$$
for $\lambda \in (0, 1)$. Since this is true for any $x$, it will also be true for
$$x = \lambda y_1 + (1-\lambda)y_2,$$
which results in
$$f\big(\lambda y_1 + (1-\lambda)y_2\big) \le \lambda f(y_1) + (1-\lambda)f(y_2), \quad (8)$$
which proves the claim.

8.4. Show that a function $f$ is convex iff the one-dimensional function
$$g(t) := f(x + ty)$$
is convex, $\forall x, y$ in the domain of definition of $f$.

Solution: Observe that
$$g\big(\lambda t_1 + (1-\lambda)t_2\big) = f\big(x + \lambda t_1 y + (1-\lambda)t_2 y\big) = f\big(\lambda x + (1-\lambda)x + \lambda t_1 y + (1-\lambda)t_2 y\big) = f\big(\lambda(x + t_1 y) + (1-\lambda)(x + t_2 y)\big).$$
Also note that
$$g(t_1) = f(x + t_1 y), \quad g(t_2) = f(x + t_2 y),$$
and taking the definition of convexity, the claim is now straightforward to be shown.
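The first-order condition of Problem 8.3 can be sampled numerically for a simple convex function; the choice $f(x) = x^4$ and the grid of points below are arbitrary illustrative assumptions.

```python
# Check of the first-order convexity condition for the convex function
# f(x) = x^4 (one-dimensional case): f(y) >= f(x) + f'(x)(y - x).

def f(x):
    return x ** 4

def fprime(x):
    return 4 * x ** 3

points = [-2.0, -0.3, 0.0, 0.7, 1.5]
for x in points:
    for y in points:
        assert f(y) >= f(x) + fprime(x) * (y - x) - 1e-12
```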
8.5. Show the second-order convexity condition.

Hint: Show the claim first for the one-dimensional case and then use the result of the previous problem for the generalization.

Solution: We start with the one-dimensional case. Let a function $f(x)$ be convex. Then we know from the first-order convexity condition that
$$f'(x)(y - x) \le f(y) - f(x) \le f'(y)(y - x),$$
and dividing both sides by the positive quantity $(y - x)^2$, we get
$$\frac{f'(y) - f'(x)}{y - x} \ge 0,$$
and taking the limit $y \to x$ we obtain
$$f''(x) \ge 0. \quad (9)$$
Assume now that the second derivative is non-negative everywhere. Then select $y > x$ and we get
$$0 \le \int_x^y f''(z)(y - z)\,dz = f'(z)(y - z)\Big|_{z=x}^{z=y} + \int_x^y f'(z)\,dz = -f'(x)(y - x) + f(y) - f(x). \quad (10)$$
The above is true for $y > x$. Note that we can also show that
$$f(x) \ge f'(y)(x - y) + f(y),$$
by using the identity
$$0 \le \int_x^y f''(z)(z - x)\,dz.$$
Thus we have proved that $f$ is convex. For the more general case, consider
$$g(t) = f(x + ty),$$
from which we get
$$g''(t) = y^T\nabla^2 f(x + ty)\,y.$$
Since this is true for any $x, y$ and $t$, and using the previously obtained results, the claim is readily shown.
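The second-order condition can be illustrated on a small example; the quadratic $f(x) = x_1^2 + x_1 x_2 + x_2^2$ and the sampling scheme below are arbitrary assumptions for the sketch.

```python
# For f(x) = x1^2 + x1*x2 + x2^2 the Hessian is H = [[2, 1], [1, 2]].
# Convexity requires y^T H y >= 0 for every direction y (Problems 8.4, 8.5);
# here this is sampled at random directions.
import random

H = [[2.0, 1.0], [1.0, 2.0]]

def quad_form(H, y):
    return sum(y[i] * H[i][j] * y[j] for i in range(2) for j in range(2))

random.seed(0)
for _ in range(100):
    y = [random.uniform(-5, 5), random.uniform(-5, 5)]
    assert quad_form(H, y) >= 0.0
```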
8.6. Show that a function
$$f : \mathbb{R}^l \to \mathbb{R}$$
is convex iff its epigraph is convex.

Solution: a) Assume $f$ to be convex. We have to show that its epigraph is a convex set. Let two points, $x_1, x_2$. From the convexity of $f$ we have
$$f\big(\lambda x_1 + (1-\lambda)x_2\big) \le \lambda f(x_1) + (1-\lambda)f(x_2). \quad (11)$$
Consider two points in the epigraph, $y_1 = (x_1, r_1)$ and $y_2 = (x_2, r_2)$. Then we have
$$\lambda y_1 + (1-\lambda)y_2 := y = (x, r),$$
with
$$x = \lambda x_1 + (1-\lambda)x_2, \quad (12)$$
$$r = \lambda r_1 + (1-\lambda)r_2, \quad (13)$$
and since $y_1, y_2 \in \operatorname{epi}(f)$,
$$f(x_1) \le r_1, \quad f(x_2) \le r_2. \quad (14)$$
Combining (11) and (14) we get
$$f(x) \le \lambda f(x_1) + (1-\lambda)f(x_2) \le \lambda r_1 + (1-\lambda)r_2 = r,$$
hence $y = (x, r) \in \operatorname{epi}(f)$ and the epigraph is convex.

b) Assume the epigraph to be convex. Then $y = \lambda y_1 + (1-\lambda)y_2 \in \operatorname{epi}(f)$, hence
$$f\big(\lambda x_1 + (1-\lambda)x_2\big) \le \lambda r_1 + (1-\lambda)r_2, \quad (15)$$
for any
$$r_1 \ge f(x_1), \quad r_2 \ge f(x_2).$$
Thus (15) is also valid for $r_1 = f(x_1)$, $r_2 = f(x_2)$ and therefore
$$f\big(\lambda x_1 + (1-\lambda)x_2\big) \le \lambda f(x_1) + (1-\lambda)f(x_2).$$

8.7. Show that if a function is convex, then its lower level set is convex for any $\xi$.

Solution: Let the function $f$ be convex, and take two points, $x, y$, which lie in $\operatorname{lev}_{\le\xi}(f)$. Then
$$f(x) \le \xi, \quad f(y) \le \xi.$$
Hence, by the definition of convexity,
$$f\big(\lambda x + (1-\lambda)y\big) \le \lambda f(x) + (1-\lambda)f(y) \le \lambda\xi + (1-\lambda)\xi = \xi,$$
which proves the claim, that $\lambda x + (1-\lambda)y \in \operatorname{lev}_{\le\xi}(f)$.
8.8. Show that in a Hilbert space $H$ the parallelogram rule,
$$\|x + y\|^2 + \|x - y\|^2 = 2\big(\|x\|^2 + \|y\|^2\big), \quad \forall x, y \in H,$$
holds true.

Solution: The proof is straightforward from the respective definitions and the properties of the inner product:
$$\|x + y\|^2 = \|x\|^2 + \langle x, y\rangle + \langle y, x\rangle + \|y\|^2,$$
$$\|x - y\|^2 = \|x\|^2 - \langle x, y\rangle - \langle y, x\rangle + \|y\|^2,$$
from which, adding the two, the parallelogram rule is obtained.

8.9. Show that if $x, y \in H$, where $H$ is a Hilbert space, then the norm induced by the inner product satisfies the triangle inequality, as required by any norm, i.e., $\|x + y\| \le \|x\| + \|y\|$.

Solution: By the respective definitions we have
$$\|x + y\|^2 = \|x\|^2 + 2\,\mathrm{Real}(\langle x, y\rangle) + \|y\|^2 \le \|x\|^2 + 2|\langle x, y\rangle| + \|y\|^2 \le \|x\|^2 + \|y\|^2 + 2\|x\|\,\|y\| = (\|x\| + \|y\|)^2,$$
where the Cauchy-Schwarz inequality has been used.

8.10. Show that if a point $x_*$ is a local minimizer of a convex function, it is necessarily a global one. Moreover, it is the unique minimizer if the function is strictly convex.

Solution: Let $x_*$ be a local minimizer; that is, there exists $\epsilon > 0$ such that
$$f(x_*) \le f(x_* + \Delta), \quad \forall \Delta \in B[0, \epsilon].$$
Assume it is not a global minimizer; then there exists $y_* \notin B[x_*, \epsilon]$ such that
$$f(y_*) < f(x_*).$$
Let
$$\lambda := \frac{\epsilon}{2\|y_* - x_*\|},$$
so that
$$\lambda(y_* - x_*) \in B[0, \epsilon].$$
Then, by convexity,
$$f\big(x_* + \lambda(y_* - x_*)\big) \le (1-\lambda)f(x_*) + \lambda f(y_*) < f(x_*),$$
which is not possible, since $x_*$ is a local minimizer.

Assume now that $f$ is strictly convex and that there exist two minimizers, $x_* \ne y_*$. Then, by the definition of strict convexity, we have that
$$f\Big(\tfrac{1}{2}x_* + \tfrac{1}{2}y_*\Big) < \tfrac{1}{2}f(x_*) + \tfrac{1}{2}f(y_*) = f(x_*),$$
which is not possible, since $x_*$ is a global minimizer.
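The parallelogram rule of Problem 8.8 can be verified numerically in $\mathbb{R}^3$; the two vectors below are arbitrary illustrative choices.

```python
# Numeric check of ||x + y||^2 + ||x - y||^2 = 2(||x||^2 + ||y||^2).

def norm_sq(v):
    return sum(c * c for c in v)

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

x = [1.0, -2.0, 0.5]
y = [3.0, 0.25, -1.0]
lhs = norm_sq(add(x, y)) + norm_sq(sub(x, y))
rhs = 2 * (norm_sq(x) + norm_sq(y))
assert abs(lhs - rhs) < 1e-12
```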
8.11. Let $C$ be a closed convex set in a Hilbert space $H$. Then show that $\forall x \in H$ there exists a point, denoted as $P_C(x) \in C$, such that
$$\|x - P_C(x)\| = \min_{y \in C}\|x - y\|.$$

Solution: Let $x \notin C$, otherwise the claim is trivial. Let $\rho$ be the greatest lower bound of $\|x - y\|$, $y \in C$, i.e.,
$$\rho := \inf_{y \in C}\|x - y\| > 0.$$
Consider the sequence
$$\rho_n = \rho + \frac{1}{n}.$$
By the definition of the infimum, for each $n$ there will be at least one element $x_n \in C$ such that $\|x - x_n\| < \rho_n$, which then defines a sequence, $\{x_n\}$, of points for which we have that
$$\rho \le \|x - x_n\| < \rho_n,$$
or
$$\rho \le \lim_{n\to\infty}\|x - x_n\| \le \lim_{n\to\infty}\rho_n = \rho,$$
which necessarily leads to
$$\lim_{n\to\infty}\|x - x_n\| = \rho. \quad (16)$$
From the parallelogram law, we can write
$$\|(x - x_m) + (x - x_n)\|^2 + \|(x - x_m) - (x - x_n)\|^2 = 2\big(\|x - x_m\|^2 + \|x - x_n\|^2\big),$$
or
$$\|x_n - x_m\|^2 = 2\big(\|x - x_m\|^2 + \|x - x_n\|^2\big) - 4\Big\|x - \tfrac{1}{2}(x_n + x_m)\Big\|^2.$$
However, since $C$ is convex, the point $\tfrac{1}{2}(x_n + x_m) \in C$, so that $\|x - \tfrac{1}{2}(x_n + x_m)\| \ge \rho$, and we deduce that
$$\|x_n - x_m\|^2 \le 2\big(\|x - x_m\|^2 + \|x - x_n\|^2\big) - 4\rho^2.$$
Taking the limit for $n, m \to \infty$ on both sides we get
$$\lim_{n,m\to\infty}\|x_m - x_n\|^2 \le 0 \;\Rightarrow\; \lim_{n,m\to\infty}\|x_m - x_n\| = 0.$$
That is, $\{x_n\}$ is a Cauchy sequence, and since $H$ is Hilbert the sequence converges to a point $x_*$. Moreover it converges in $C$, since $C$ is closed, i.e., $x_* \in C$. Hence we have
$$\|x - x_*\| = \|x - x_n + x_n - x_*\| \le \|x - x_n\| + \|x_n - x_*\|.$$
Taking the limit and using (16) we obtain
$$\|x - x_*\| \le \rho.$$
However, since $x_* \in C$,
$$\|x - x_*\| \ge \rho,$$
which means that
$$\|x - x_*\| = \rho,$$
that is, the infimum is attained, which proves the claim. Uniqueness has been established in the text.
8.12. Show that the projection of a point $x \in H$ onto a non-empty closed convex set $C \subset H$ lies on the boundary of $C$.

Solution: Let $x \notin C$ and assume that $P_C(x)$ is an interior point of $C$. By the definition of interior points, $\exists\delta > 0$ such that
$$S_\delta := \{y : \|y - P_C(x)\| < \delta\} \subset C.$$
Let
$$z := P_C(x) + \frac{\delta}{2}\cdot\frac{x - P_C(x)}{\|x - P_C(x)\|},$$
where by assumption $\|x - P_C(x)\| > 0$, since $x \notin C$. Obviously $z \in S_\delta$. Hence,
$$\|x - z\| = \Big|1 - \frac{\delta}{2\|x - P_C(x)\|}\Big|\,\|x - P_C(x)\|.$$
However, $\delta$ can be chosen arbitrarily small, thus choose $\delta < \|x - P_C(x)\|$. Then
$$\|x - z\| = \|x - P_C(x)\| - \frac{\delta}{2} < \|x - P_C(x)\|,$$
which violates the definition of the projection. Thus, $P_C(x)$ lies on the boundary of $C$.
8.13. Derive the formula for the projection onto a hyperplane in a (real) Hilbert space $H$.

Solution: Let the hyperplane be
$$H_\theta := \{z : \langle\theta, z\rangle + \theta_0 = 0\}.$$
Let us first show that a hyperplane is a closed convex set. Convexity is shown trivially. To show closedness, let $y_n \in H_\theta \to y_*$. We will show that $y_* \in H_\theta$. Since $\langle\theta, y_n\rangle + \theta_0 = 0$,
$$0 \le |\langle\theta, y_*\rangle + \theta_0|^2 = \lim_{n\to\infty}|\langle\theta, y_* - y_n\rangle|^2 \le \lim_{n\to\infty}\|\theta\|^2\|y_* - y_n\|^2 = 0,$$
or
$$\langle\theta, y_*\rangle + \theta_0 = 0,$$
which proves the claim.

Let now $z \in H_\theta$ be the projection of $x \in H$, i.e., $z = P_C(x)$. Then, by the definition,
$$z := \arg\min_{\langle\theta, z\rangle + \theta_0 = 0}\langle x - z, x - z\rangle.$$
Using Lagrange multipliers, we obtain the Lagrangian
$$L(z, \lambda) = \langle x - z, x - z\rangle - \lambda\big(\langle\theta, z\rangle + \theta_0\big).$$
For those not familiar with infinite-dimensional spaces, it suffices to say that similar rules of differentiation apply, although the respective definitions are different (more general).

After differentiation of the Lagrangian, we obtain
$$2z - 2x - \lambda\theta = 0, \quad \text{or} \quad z = \frac{1}{2}(2x + \lambda\theta).$$
Plugging into the constraint, we obtain
$$\lambda = -2\,\frac{\langle\theta, x\rangle + \theta_0}{\|\theta\|^2},$$
which then results in the solution
$$P_C(x) = x - \frac{\langle\theta, x\rangle + \theta_0}{\|\theta\|^2}\,\theta.$$
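The derived hyperplane-projection formula can be sketched numerically in $\mathbb{R}^3$; the particular $\theta$, $\theta_0$, and $x$ below are arbitrary illustrative choices.

```python
# P(x) = x - (theta^T x + theta0)/||theta||^2 * theta, projection onto the
# hyperplane {z : theta^T z + theta0 = 0}.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

theta = [1.0, 2.0, -1.0]
theta0 = 0.5
x = [3.0, -1.0, 2.0]

scale = (dot(theta, x) + theta0) / dot(theta, theta)
p = [xi - scale * ti for xi, ti in zip(x, theta)]

# The projection satisfies the constraint...
assert abs(dot(theta, p) + theta0) < 1e-12
# ...and x - P(x) is parallel to theta (orthogonal to the hyperplane).
diff = [xi - pi for xi, pi in zip(x, p)]
ratio = diff[0] / theta[0]
assert all(abs(d - ratio * t) < 1e-12 for d, t in zip(diff, theta))
```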
8.14. Derive the formula for the projection onto a closed ball, $B[0, \rho]$.

Solution: The closed ball is defined as
$$B[0, \rho] = \{y : \|y\| \le \rho\}.$$
We have already seen (Problem 8.2) that it is convex. Let us show the closedness. Let
$$y_n \in B[0, \rho] \to y_*.$$
We have to show that $y_* \in B[0, \rho]$. Indeed,
$$\|y_*\| = \|y_* - y_n + y_n\| \le \|y_n\| + \|y_* - y_n\| \le \rho + \|y_* - y_n\|,$$
and taking the limit as $n \to \infty$,
$$\|y_*\| \le \rho,$$
which proves the claim.

To derive the projection, let $\|x\| > \rho$ (otherwise $P_{B[0,\rho]}(x) = x$). We follow similar steps as in Problem 8.13, replacing the constraint by
$$\|z\|^2 = \rho^2,$$
since the projection is on the boundary. Taking the gradient of the Lagrangian we get
$$2(z - x) + 2\lambda z = 0, \quad \text{or} \quad z = \frac{1}{1 + \lambda}\,x.$$
Plugging into the constraint, we get
$$\frac{\|x\|}{|1 + \lambda|} = \rho.$$
When $1 + \lambda = \frac{\|x\|}{\rho}$, we get
$$z = \frac{\rho}{\|x\|}\,x,$$
and when $1 + \lambda = -\frac{\|x\|}{\rho}$,
$$z = -\frac{\rho}{\|x\|}\,x.$$
From the two possible vectors, we have to keep the one that has the smaller distance from $x$. However,
$$\Big\|x - \frac{\rho}{\|x\|}x\Big\| < \Big\|x + \frac{\rho}{\|x\|}x\Big\|,$$
since $\frac{\rho}{\|x\|} < 1$. Thus $1 + \lambda = \frac{\|x\|}{\rho}$, and the projection is equal to
$$P_{B[0,\rho]}(x) = \begin{cases} \dfrac{\rho}{\|x\|}\,x, & \|x\| > \rho, \\[4pt] x, & \text{otherwise.} \end{cases}$$
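The closed-form projection onto the ball can be checked numerically; the radius and test points below are arbitrary illustrative choices.

```python
# P_{B[0,rho]}(x) = (rho/||x||) x if ||x|| > rho, else x.
import math

def project_ball(x, rho):
    nx = math.sqrt(sum(c * c for c in x))
    if nx > rho:
        return [rho * c / nx for c in x]
    return list(x)

rho = 2.0
outside = [3.0, 4.0]                       # norm 5 > rho
p = project_ball(outside, rho)
assert abs(math.hypot(*p) - rho) < 1e-12   # lands on the boundary
assert all(abs(a - b) < 1e-12 for a, b in zip(p, [1.2, 1.6]))  # (2/5)*[3, 4]

inside = [0.5, -0.5]
assert project_ball(inside, rho) == inside  # points inside are left as-is
```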
8.15. Find an example of a point whose projection on the $\ell_1$ ball is not unique.

Solution: The $\ell_1$ ball of radius $\rho$ in $\mathbb{R}^l$ is defined as
$$S_1[0, \rho] = \Big\{y : \sum_{i=1}^l |y_i| \le \rho\Big\}.$$
Take $\rho = 1$ and let the point $x = [1, 1]^T \in \mathbb{R}^2$, which obviously does not lie inside the $\ell_1$ ball of radius $\rho = 1$, since $\|x\|_1 = 2 > 1$. For any point $y \in S_1[0, 1]$, measuring distances in the $\ell_1$ norm, we have
$$\|x - y\|_1 = |1 - y_1| + |1 - y_2| \ge 1 - |y_1| + 1 - |y_2| \ge 2 - 1 = 1.$$
That is, the $\ell_1$ distance of $x$ from any point in the set $S_1[0, 1]$ is bounded below by $1$. Consider the two points $y_1 = [1, 0]^T$ and $y_2 = [0, 1]^T$. For both of them the lower bound is achieved, i.e.,
$$\|x - y_1\|_1 = \|x - y_2\|_1 = 1.$$
Moreover, one can easily check that all points on the line segment
$$\{y : y_1 + y_2 = 1,\ y_1, y_2 \ge 0\}$$
can be projection points of $x$.
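The non-uniqueness just described can be verified numerically (a minimal sketch; the sampled values of $t$ are arbitrary):

```python
# Every point [t, 1-t], t in [0, 1], attains the minimal l1 distance 1 from
# x = [1, 1] within the unit l1 ball.

def l1_dist(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

x = [1.0, 1.0]
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    y = [t, 1.0 - t]
    assert abs(y[0]) + abs(y[1]) <= 1.0      # y is in the l1 ball
    assert abs(l1_dist(x, y) - 1.0) < 1e-12  # all at the same minimal distance
```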
8.16. Show that if $C \subset H$ is a closed convex set in a Hilbert space, then $\forall x \in H$ and $\forall y \in C$, the projection $P_C(x)$ satisfies the following properties:
$$\mathrm{Real}\{\langle x - P_C(x), y - P_C(x)\rangle\} \le 0,$$
$$\|P_C(x) - P_C(y)\|^2 \le \mathrm{Real}\{\langle x - y, P_C(x) - P_C(y)\rangle\}.$$

Solution: We know that $P_C(x) \in C$. Hence, due to the convexity of $C$, $\forall\lambda \in [0, 1]$,
$$\lambda y + (1-\lambda)P_C(x) \in C.$$
Hence, by the definition of the projection,
$$\|x - P_C(x)\|^2 \le \big\|x - \big(\lambda y + (1-\lambda)P_C(x)\big)\big\|^2 = \big\|(x - P_C(x)) - \lambda(y - P_C(x))\big\|^2,$$
which, by the definition of the norm, gives
$$\|x - P_C(x)\|^2 \le \|x - P_C(x)\|^2 + \lambda^2\|y - P_C(x)\|^2 - 2\lambda\,\mathrm{Real}\{\langle x - P_C(x), y - P_C(x)\rangle\},$$
or
$$\mathrm{Real}\{\langle x - P_C(x), y - P_C(x)\rangle\} \le \frac{\lambda}{2}\,\|y - P_C(x)\|^2.$$
Taking the limit $\lambda \to 0$, we prove the first property.

To prove the second property, since $P_C(y) \in C$, we apply the previous property with $P_C(y)$ in place of $y$, i.e.,
$$\mathrm{Real}\{\langle x - P_C(x), P_C(y) - P_C(x)\rangle\} \le 0.$$
Similarly,
$$\mathrm{Real}\{\langle y - P_C(y), P_C(x) - P_C(y)\rangle\} \le 0.$$
After adding the above inequalities together and rearranging the terms, we obtain the second property.

8.17. Prove that if $S \subset H$ is a closed subspace in a Hilbert space $H$, then $\forall x, y \in H$,
$$\langle x, P_S(y)\rangle = \langle P_S(x), y\rangle = \langle P_S(x), P_S(y)\rangle,$$
and
$$P_S(ax + by) = aP_S(x) + bP_S(y).$$
Hint: Use the result of Problem 8.18.

Solution: We have that
$$\langle x, P_S(y)\rangle = \langle P_S(x) + (x - P_S(x)), P_S(y)\rangle = \langle P_S(x), P_S(y)\rangle,$$
since $x - P_S(x) \perp P_S(y)$, from Problem 8.18. Similarly, we can show that
$$\langle P_S(x), y\rangle = \langle P_S(x), P_S(y)\rangle.$$
Hence
$$\langle x, P_S(y)\rangle = \langle P_S(x), y\rangle.$$
For the linearity, we have
$$x = P_S(x) + (x - P_S(x)), \quad y = P_S(y) + (y - P_S(y)),$$
where $P_S(x), P_S(y) \in S$ and $x - P_S(x) \in S^\perp$, $y - P_S(y) \in S^\perp$. Hence
$$ax + by = \big(aP_S(x) + bP_S(y)\big) + \big(a(x - P_S(x)) + b(y - P_S(y))\big),$$
and since the term in the second parenthesis on the right-hand side lies in $S^\perp$, we readily obtain that
$$P_S(ax + by) = aP_S(x) + bP_S(y).$$
8.18. Let $S$ be a closed subspace in a Hilbert space $H$, $S \subset H$. Let $S^\perp$ be the set of all elements $x \in H$ which are orthogonal to $S$. Then show that a) $S^\perp$ is also a closed subspace, b) $S \cap S^\perp = \{0\}$, c) $H = S \oplus S^\perp$; that is, $\forall x \in H$, $\exists x_1 \in S$ and $x_2 \in S^\perp$: $x = x_1 + x_2$, where $x_1, x_2$ are unique.

Solution: a) We will first prove that $S^\perp$ is a subspace. Indeed, if $x_1 \in S^\perp$ and $x_2 \in S^\perp$, then
$$\langle x_1, y\rangle = \langle x_2, y\rangle = 0, \quad \forall y \in S,$$
or
$$\langle ax_1 + bx_2, y\rangle = 0 \;\Rightarrow\; ax_1 + bx_2 \in S^\perp.$$
Also, $0 \in S^\perp$ since $\langle x, 0\rangle = 0$. Hence $S^\perp$ is a subspace.

We will prove that $S^\perp$ is also closed. Let $\{x_n\} \subset S^\perp$ and $\lim_{n\to\infty} x_n = x_*$. We will show that $x_* \in S^\perp$. By the definition, $\langle x_n, y\rangle = 0$, $\forall y \in S$. Moreover,
$$|\langle x_*, y\rangle| = |\langle x_*, y\rangle - \langle x_n, y\rangle| = |\langle x_* - x_n, y\rangle| \le \|x_* - x_n\|\,\|y\| \to 0,$$
where the Cauchy-Schwarz inequality has been used. The last inequality leads to
$$\langle x_*, y\rangle = 0 \;\Rightarrow\; x_* \in S^\perp.$$

b) Let $x \in S \cap S^\perp$. By definition, since it belongs to both subspaces,
$$\langle x, x\rangle = 0 \;\Rightarrow\; x = 0.$$

c) Let $x \in H$. We have that
$$x = P_S(x) + (x - P_S(x)).$$
We will first show that $x - P_S(x) \in S^\perp$. Then we will show that this decomposition is unique. We already know that
$$\mathrm{Real}\{\langle x - P_S(x), y - P_S(x)\rangle\} \le 0, \quad \forall y \in S.$$
Also, since $S$ is a subspace, $ay \in S$, $\forall a \in \mathbb{R}$, hence
$$\mathrm{Real}\{\langle x - P_S(x), ay - P_S(x)\rangle\} \le 0, \quad \text{or} \quad a\,\mathrm{Real}\{\langle x - P_S(x), y\rangle\} \le \mathrm{Real}\{\langle x - P_S(x), P_S(x)\rangle\},$$
which, holding for every real $a$ of either sign, can only be true if
$$\mathrm{Real}\{\langle x - P_S(x), y\rangle\} = 0.$$
We apply the same for $jy \in S$. Then we have that
$$\langle x - P_S(x), jy\rangle = -j\langle x - P_S(x), y\rangle.$$
Recall that if $c \in \mathbb{C}$,
$$\mathrm{Imag}\{c\} = \mathrm{Real}\{-jc\}.$$
Hence,
$$\mathrm{Real}\{-j\langle x - P_S(x), y\rangle\} = 0 = \mathrm{Imag}\{\langle x - P_S(x), y\rangle\}.$$
Thus,
$$\langle x - P_S(x), y\rangle = 0, \quad \forall y \in S, \quad \text{and} \quad x - P_S(x) \in S^\perp.$$
Thus $x = x_1 + x_2$, with $x_1 = P_S(x) \in S$, $x_2 = x - P_S(x) \in S^\perp$. Let us now assume that there is another decomposition,
$$x = x_3 + x_4, \quad x_3 \in S,\ x_4 \in S^\perp.$$
Then
$$x_1 + x_2 = x_3 + x_4,$$
or
$$S \ni x_1 - x_3 = x_4 - x_2 \in S^\perp,$$
which necessarily implies that both are equal to the single point comprising $S \cap S^\perp$, i.e.,
$$x_1 - x_3 = 0 = x_4 - x_2,$$
hence the decomposition is unique and we have proved the claim.

Let us elaborate a bit more. We will show that
$$P_{S^\perp}(x) = x - P_S(x).$$
Indeed, $P_{S^\perp}(x)$ is unique. Also,
$$x = P_S(x) + (x - P_S(x)),$$
or
$$P_{S^\perp}(x) = P_{S^\perp}(P_S(x)) + P_{S^\perp}(x - P_S(x)) = 0 + (x - P_S(x)) = x - P_S(x),$$
since $x - P_S(x) \in S^\perp$. Note that we used the fact that if $y \in S$ then $P_{S^\perp}(y) = 0$. Indeed,
$$\|y - 0\|^2 = \|y\|^2 < \|y - a\|^2, \quad \forall a \ne 0 \in S^\perp,$$
since
$$\|y - a\|^2 = \|y\|^2 + \|a\|^2 - 2\,\mathrm{Real}\langle y, a\rangle = \|y\|^2 + \|a\|^2.$$

8.19. Show that the relaxed projection operator is a non-expansive mapping.

Solution: The relaxed projection operator is $T_C(x) = x + \mu(P_C(x) - x)$, $\mu \in (0, 2)$. By the respective definitions we have
$$T_C(x) - T_C(y) = x + \mu(P_C(x) - x) - y - \mu(P_C(y) - y) = (1 - \mu)(x - y) + \mu\big(P_C(x) - P_C(y)\big).$$

a) $\mu \in (0, 1]$. Recalling the triangle property of a norm (Appendix 8.15) and the non-expansiveness of $P_C$ (Problem 8.16), we get
$$\|T_C(x) - T_C(y)\| \le |1 - \mu|\,\|x - y\| + \mu\|P_C(x) - P_C(y)\| \le (1 - \mu)\|x - y\| + \mu\|x - y\| = \|x - y\|.$$

b) $\mu \in (1, 2)$. In this case,
$$\|T_C(x) - T_C(y)\|^2 = (1 - \mu)^2\|x - y\|^2 + \mu^2\|P_C(x) - P_C(y)\|^2 + 2\mu(1 - \mu)\,\mathrm{Real}\{\langle x - y, P_C(x) - P_C(y)\rangle\}$$
$$\le (1 - \mu)^2\|x - y\|^2 + \mu^2\|P_C(x) - P_C(y)\|^2 + 2\mu(1 - \mu)\|P_C(x) - P_C(y)\|^2$$
$$= (1 - \mu)^2\|x - y\|^2 + \mu(2 - \mu)\|P_C(x) - P_C(y)\|^2 \le (1 - \mu)^2\|x - y\|^2 + \mu(2 - \mu)\|x - y\|^2 = \|x - y\|^2.$$
To derive the bounds we used the second property of Problem 8.16 and the facts that $1 - \mu < 0$ and $2 - \mu > 0$, for $\mu \in (1, 2)$.
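The non-expansiveness of the relaxed projection can be sampled numerically; below $C$ is taken to be the unit ball and $\mu = 1.7$, both arbitrary illustrative assumptions.

```python
# Numeric check that T_C = I + mu*(P_C - I) is non-expansive for mu in (0, 2),
# with C the closed unit ball in R^2.
import math
import random

def proj_ball(x):
    n = math.hypot(*x)
    return [c / n for c in x] if n > 1 else list(x)

def T(x, mu):
    p = proj_ball(x)
    return [xi + mu * (pi - xi) for xi, pi in zip(x, p)]

random.seed(1)
mu = 1.7
for _ in range(200):
    x = [random.uniform(-3, 3), random.uniform(-3, 3)]
    y = [random.uniform(-3, 3), random.uniform(-3, 3)]
    assert math.dist(T(x, mu), T(y, mu)) <= math.dist(x, y) + 1e-12
```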
8.20. Show that the relaxed projection operator is a strongly attractive mapping.

Solution: By the respective definition, for any $y \in C$,
$$\|T_C(x) - y\|^2 = \|x + \mu(P_C(x) - x) - y\|^2 = \|x - y\|^2 + \mu^2\|P_C(x) - x\|^2 + 2\mu\,\mathrm{Real}\{\langle x - y, P_C(x) - x\rangle\}$$
$$= \|x - y\|^2 + \mu^2\|P_C(x) - x\|^2 + 2\mu\,\mathrm{Real}\{\langle x - P_C(x) + P_C(x) - y, P_C(x) - x\rangle\}$$
$$= \|x - y\|^2 + \mu^2\|P_C(x) - x\|^2 - 2\mu\|P_C(x) - x\|^2 + 2\mu\,\mathrm{Real}\{\langle P_C(x) - y, P_C(x) - x\rangle\}$$
$$= \|x - y\|^2 + \mu^2\|P_C(x) - x\|^2 - 2\mu\|P_C(x) - x\|^2 + 2\mu\,\mathrm{Real}\{\langle x - P_C(x), y - P_C(x)\rangle\}$$
$$\le \|x - y\|^2 - \mu(2 - \mu)\|P_C(x) - x\|^2,$$
where (8.15) has been used. Thus,
$$\|T_C(x) - y\|^2 \le \|x - y\|^2 - \frac{\mu(2 - \mu)}{\mu^2}\|T_C(x) - x\|^2 = \|x - y\|^2 - \frac{2 - \mu}{\mu}\|T_C(x) - x\|^2,$$
where we used that $P_C(x) - x = \frac{1}{\mu}\big(T_C(x) - x\big)$.
8.21. Give an example of a sequence in a Hilbert space $H$ which converges weakly but not strongly.

Solution: Define the sequence of points $x_n \in \ell_2$,
$$x_n = \{0, 0, \ldots, 1, 0, 0, \ldots\} := \{\delta_{ni}\}, \quad i = 0, 1, 2, \ldots$$
That is, each point $x_n$ is itself a sequence, with zeros everywhere except at time index $n$, where it is $1$. For every point (sequence) $y \in \ell_2$, we have that
$$\sum_{n=1}^\infty |y_n|^2 = \sum_{n=1}^\infty |\langle x_n, y\rangle|^2 < \infty,$$
by the definition of the $\ell_2$ space (Appendix 8.15). The previous inequality implies that
$$\langle x_n, y\rangle \xrightarrow{n\to\infty} 0 = \langle 0, y\rangle,$$
that is, $x_n$ converges weakly to $0$. On the other hand, the convergence is not strong, since
$$\|x_n - 0\| = \|x_n\| = 1 \not\to 0.$$

8.22. Prove that if $C_1, \ldots, C_K$ are closed convex sets in a Hilbert space $H$, then the operator
$$T = T_{C_K}T_{C_{K-1}}\cdots T_{C_1}$$
is a regular one; that is,
$$\|T^{n-1}(x) - T^n(x)\| \to 0, \quad n \to \infty,$$
where $T^n := TT\cdots T$ is the application of $T$ $n$ successive times.

Solution:
Fact 1:
$$T = T_{C_K}T_{C_{K-1}}\cdots T_{C_1} := T_K\cdots T_1$$
is a non-expansive mapping. Indeed, $\forall x, y \in H$,
$$\|T(x) - T(y)\| = \|T_K(T_{K-1}\cdots T_1)(x) - T_K(T_{K-1}\cdots T_1)(y)\| \le \|T_{K-1}\cdots T_1(x) - T_{K-1}\cdots T_1(y)\| \le \cdots \le \|T_1(x) - T_1(y)\| \le \|x - y\|.$$

Fact 2:
$$\operatorname{Fix}(T) = \bigcap_{k=1}^K C_k := C.$$
Indeed, if $x \in C$ then
$$T_KT_{K-1}\cdots T_1(x) = T_KT_{K-1}\cdots T_2(x) = \cdots = T_K(x) = x.$$
Moreover, let us assume that $\exists x \notin C$: $T(x) = x$. Then $\forall y \in C$ we have
$$\|x - y\| = \|T(x) - T(y)\| \le \|T_1(x) - T_1(y)\| = \|T_1(x) - y\| \le \|x - y\|,$$
as shown before. Thus,
$$\|T_1(x) - y\| = \|x - y\|, \quad \forall y \in C,$$
which, in view of the strong attraction property (Problem 8.20), can only be true if $T_1(x) = x$ and hence $x \in C_1$. Proceeding similarly with $T_2, \ldots, T_K$, we get $x \in C$, which contradicts the assumption.

Fact 3: If the $C_k$ are closed subspaces, then $T_k$, $k = 1, \ldots, K$, and $T = T_KT_{K-1}\cdots T_1$ are linear operators. The proof is trivial from the respective linearity of the projection operators, $P_k$, $k = 1, \ldots, K$. This is also true for general Hilbert spaces.

Fact 4: The operator $T$ is a regular one, i.e.,
$$\|T^n(x) - T^{n-1}(x)\| \xrightarrow{n\to\infty} 0.$$
Recall from Problem 8.20 that $\forall x \in H$ and $\forall y \in C$,
$$\|T_1(x) - y\|^2 \le \|x - y\|^2 - \mu_1(2 - \mu_1)\|x - P_1(x)\|^2,$$
or
$$\|x - P_1(x)\|^2 \le \frac{1}{\mu_1(2 - \mu_1)}\big(\|x - y\|^2 - \|T_1(x) - y\|^2\big),$$
and by the definition
$$T_1(x) = x + \mu_1(P_1(x) - x),$$
we get
$$\|x - T_1(x)\|^2 = \mu_1^2\|x - P_1(x)\|^2 \le \frac{\mu_1}{2 - \mu_1}\big(\|x - y\|^2 - \|T_1(x) - y\|^2\big).$$
Now,
$$\|x - T_2T_1(x)\|^2 = \|x - T_1(x) + T_1(x) - T_2T_1(x)\|^2 \le \big(\|x - T_1(x)\| + \|T_2T_1(x) - T_1(x)\|\big)^2 \le 2\big(\|x - T_1(x)\|^2 + \|T_2T_1(x) - T_1(x)\|^2\big),$$
or
$$\|x - T_2T_1(x)\|^2 \le \frac{2\mu_1}{2 - \mu_1}\big(\|x - y\|^2 - \|T_1(x) - y\|^2\big) + \frac{2\mu_2}{2 - \mu_2}\big(\|T_1(x) - y\|^2 - \|T_2T_1(x) - y\|^2\big).$$
Following a similar rationale and by induction we can show that
$$\|x - T(x)\|^2 \le b_K 2^{K-1}\big(\|x - y\|^2 - \|T(x) - y\|^2\big), \quad (17)$$
where
$$T = T_KT_{K-1}\cdots T_1, \quad \text{and} \quad b_K = \max_{1\le k\le K}\frac{\mu_k}{2 - \mu_k}.$$
Now, applying (17) along the iterates,
$$\|T(x) - T^2(x)\|^2 \le b_K 2^{K-1}\big(\|T(x) - y\|^2 - \|T^2(x) - y\|^2\big), \quad (18)$$
$$\vdots$$
$$\|T^{n-1}(x) - T^n(x)\|^2 \le b_K 2^{K-1}\big(\|T^{n-1}(x) - y\|^2 - \|T^n(x) - y\|^2\big). \quad (19)$$
Summing by parts (17)-(19), the right-hand side telescopes and we obtain
$$\sum_{n=1}^\infty \|T^{n-1}(x) - T^n(x)\|^2 \le b_K 2^{K-1}\|x - y\|^2 < +\infty.$$
Hence,
$$\lim_{n\to\infty}\|T^{n-1}(x) - T^n(x)\| = 0.$$
Note that till now, everything is valid for general Hilbert spaces.
8.23. Show the fundamental POCS theorem for the case of closed subspaces in a Hilbert space $H$.

Solution: Fact 1: The relaxed projection operator is self-adjoint, i.e.,
$$\langle x, T_{C_i}(y)\rangle = \langle T_{C_i}(x), y\rangle, \quad \forall x, y \in H.$$
This is a direct consequence of the self-adjoint property of the projection when $C_k$ is a closed subspace, i.e.,
$$\langle x, P_{C_k}(y)\rangle = \langle P_{C_k}(x), y\rangle = \langle P_{C_k}(x), P_{C_k}(y)\rangle.$$

Fact 2: For a closed subspace $C_k$, the respective relaxed projection operator is linear, i.e.,
$$T_{C_k}(ax + by) = aT_{C_k}(x) + bT_{C_k}(y), \quad \forall x, y \in H.$$
This is also a direct consequence of the linearity of the projection operator onto subspaces. This property is easily checked to carry over to $T := T_{C_K}\cdots T_{C_1}$.

Fact 3: The adjoint of $T$ is
$$T^* = T_{C_1}T_{C_2}\cdots T_{C_K},$$
since
$$\langle x, T(y)\rangle = \langle x, T_{C_K}\cdots T_{C_1}(y)\rangle = \langle T_{C_K}(x), T_{C_{K-1}}\cdots T_{C_1}(y)\rangle = \cdots = \langle T_{C_1}\cdots T_{C_K}(x), y\rangle = \langle T^*(x), y\rangle.$$

Fact 4: Let the operator $T = T_{C_K}\cdots T_{C_1}$, with $\operatorname{Fix}(T) = C$, where $C$ is a closed subspace. Then the set
$$S := \{y : y = (I - T)(z),\ z \in H\}$$
is also a (closed) subspace, and it is the orthogonal complement of $C$, i.e., $S = C^\perp$.

The proof that $S$ is a subspace is trivial, from the linearity of $T$. Also, let $x \in S^\perp$. Then, by the respective definition,
$$0 = \langle x, (I - T)(z)\rangle = \langle x, z\rangle - \langle x, T(z)\rangle = \langle x, z\rangle - \langle T^*(x), z\rangle = \langle(I - T^*)(x), z\rangle, \quad \forall z \in H.$$
Hence,
$$(I - T^*)(x) = x - T^*(x) = 0, \quad \text{or} \quad T^*(x) = x,$$
and since $T^*$ and $T$ have the same fixed point set (the proof is trivial), $S^\perp \subseteq C$.

Let now $x \in C$. Then
$$\langle x, (I - T)(z)\rangle = \langle x, z\rangle - \langle x, T(z)\rangle = \langle x, z\rangle - \langle T^*(x), z\rangle = \langle x - T^*(x), z\rangle = 0,$$
since $T^*(x) = x$; hence $C \subseteq S^\perp$, which, combined with the previous result, proves that $S^\perp = C$, i.e., $S = C^\perp$.

Note that what we have said so far is a generalization of Problem 8.18.

We are now ready to establish strong convergence. The repeated application of $T$ on any $x \in H$ leads to $T^n(x) = (TT\cdots T)(x)$. We know that $\forall x \in H$ there is a unique decomposition into two orthogonal complement (closed) subspaces, i.e.,
$$x = y + z, \quad y \in C,\ z \in C^\perp, \quad \text{with} \quad y = P_C(x).$$
Hence, due to the linearity of $T^n$ ($C$ a subspace in $H$),
$$T^n(x) = T^n(y) + T^n(z) = y + T^n(z),$$
since $C = \operatorname{Fix}(T^n)$. However,
$$T^n(z) = T^n(I - T)(w), \quad \text{for some } w \in H,$$
$$= T^n(w) - T^{n+1}(w),$$
and we know from Problem 8.22 that
$$\|T^n(z)\| = \|T^n(w) - T^{n+1}(w)\| \to 0.$$
Thus,
$$\|T^n(x) - P_C(x)\| \to 0,$$
which proves the claim.
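The POCS behavior can be illustrated with unrelaxed alternating projections ($\mu = 1$) onto two concrete subspaces of $\mathbb{R}^2$; the two lines chosen below are arbitrary assumptions for the sketch.

```python
# Alternating projections onto two one-dimensional subspaces of R^2
# (the x-axis and the line y = x). Their intersection is {0}, so the
# iterates T^n(x) must converge to P_C(x) = 0.
import math

def proj_xaxis(v):
    return [v[0], 0.0]

def proj_diag(v):
    t = (v[0] + v[1]) / 2.0
    return [t, t]

x = [5.0, -3.0]
for _ in range(100):
    x = proj_diag(proj_xaxis(x))

assert math.hypot(*x) < 1e-12  # converged to the intersection point 0
```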
8.24. Derive the subdifferential of the metric distance function $d_C(x)$, where $C$ is a closed convex set, $C \subseteq \mathbb{R}^l$, and $x \in \mathbb{R}^l$.

Solution: By definition we have
$$\partial d_C(x) = \{g : g^T(y - x) + d_C(x) \le d_C(y),\ \forall y \in \mathbb{R}^l\}.$$
Thus let $g$ be a subgradient; then
$$g^T(y - x) \le d_C(y) - d_C(x) \le \|y - x\|.$$
The rightmost inequality is easily shown by the respective definition:
$$d_C(y) = \min_{z\in C}\|y - z\| \le \min_{z\in C}\big(\|y - x\| + \|x - z\|\big) = \|y - x\| + \min_{z\in C}\|x - z\|,$$
or
$$d_C(y) - d_C(x) \le \|y - x\|.$$
Hence,
$$g^T(y - x) \le \|y - x\|.$$
Since this is true $\forall y$, let $y : y - x = g$, which yields
$$\|g\|^2 \le \|g\| \;\Rightarrow\; \|g\| \le 1, \quad \text{or} \quad g \in B[0, 1].$$

a) Let $x \notin C$ and $g$ be any subgradient. For any $y \in \mathbb{R}^l$,
$$g^T\big(y + P_C(x) - x\big) \le d_C\big(y + P_C(x)\big) - d_C(x).$$
However,
$$d_C\big(y + P_C(x)\big) = \min_{z\in C}\|y + P_C(x) - z\| \le \|y\| + \min_{z\in C}\|P_C(x) - z\|,$$
and letting $z = P_C(x)$,
$$d_C\big(y + P_C(x)\big) \le \|y\|.$$
Hence, we can write that
$$g^T\big(y + P_C(x) - x\big) \le \|y\| - d_C(x), \quad \forall y \in \mathbb{R}^l.$$
Set $y = 0$. Then,
$$g^T\big(P_C(x) - x\big) \le -\|x - P_C(x)\|, \quad \text{or} \quad g^T\big(x - P_C(x)\big) \ge \|x - P_C(x)\|.$$
However, $\|g\| \le 1$, and recalling the Cauchy-Schwarz inequality, we obtain
$$\|x - P_C(x)\| \ge \|x - P_C(x)\|\,\|g\| \ge g^T\big(x - P_C(x)\big) \ge \|x - P_C(x)\| \;\Rightarrow\; \|g\| = 1, \ \text{and} \ g^T\big(x - P_C(x)\big) = \|x - P_C(x)\|,$$
which implies (recall the condition for equality in the Cauchy-Schwarz theorem)
$$g = \frac{x - P_C(x)}{\|x - P_C(x)\|},$$
which proves the claim.

b) Let $x \in C$. Then by definition we have
$$g^T(y - x) \le d_C(y) - d_C(x) = d_C(y),$$
and for any $y \in C$,
$$g^T(y - x) \le 0, \quad \|g\| \le 1. \quad (20)$$
If in addition $x$ is an interior point, there will be $\varepsilon > 0$ such that $\forall z \in \mathbb{R}^l$,
$$g^T\big(x + \varepsilon(z - x) - x\big) \le 0,$$
since $x + \varepsilon(z - x) \in C$ and the condition (20) has been used. Thus,
$$g^T(z - x) \le 0, \quad \forall z \in \mathbb{R}^l.$$
Set $z - x = g$, which leads to $g = 0$. This completes the proof.

8.25. Derive the bound in (8.55).

Solution: Subtracting $\theta_*$ from both sides of the recursion, squaring, and taking into account the definition of the subgradient, it is readily shown that
$$\|\theta^{(i)} - \theta_*\|^2 \le \|\theta^{(i-1)} - \theta_*\|^2 - 2\mu_i\big(J(\theta^{(i-1)}) - J(\theta_*)\big) + \mu_i^2\|J'(\theta^{(i-1)})\|^2.$$
Applying the previous recursively, we obtain
$$\|\theta^{(i)} - \theta_*\|^2 \le \|\theta^{(0)} - \theta_*\|^2 - 2\sum_{k=1}^i\mu_k\big(J(\theta^{(k-1)}) - J(\theta_*)\big) + \sum_{k=1}^i\mu_k^2\|J'(\theta^{(k-1)})\|^2.$$
Taking into account the bound of the subgradient, $\|J'(\theta^{(k-1)})\| \le G$, and the fact that the left-hand side of the inequality is a non-negative number, we obtain
$$2\sum_{k=1}^i\mu_k\big(J(\theta^{(k-1)}) - J(\theta_*)\big) \le \|\theta^{(0)} - \theta_*\|^2 + \sum_{k=1}^i\mu_k^2 G^2. \quad (21)$$
However, by the respective definition we get
$$J(\theta^{(k-1)}) - J(\theta_*) \ge J_*^{(i)} - J(\theta_*), \quad k = 1, \ldots, i.$$
Employing the previous bound in (21), the claim is readily obtained, i.e.,
$$J_*^{(i)} - J(\theta_*) \le \frac{\|\theta^{(0)} - \theta_*\|^2}{2\sum_{k=1}^i\mu_k} + \frac{\sum_{k=1}^i\mu_k^2}{2\sum_{k=1}^i\mu_k}\,G^2.$$
8.26. Show that if a function is $\gamma$-Lipschitz, then any of its subgradients is bounded.

Solution: By the definition of the subgradient we have that $\forall u, v$,
$$f(u) - f(v) \ge \langle f'(v), u - v\rangle.$$
Since this is true for all $u, v$, we can always select $u$ so that $u - v$ is parallel to $f'(v)$. Then
$$\langle f'(v), u - v\rangle = |\langle f'(v), u - v\rangle| = \|f'(v)\|\,\|u - v\|.$$
Plugging in the Lipschitz condition, we get
$$\|f'(v)\|\,\|u - v\| \le f(u) - f(v) \le \gamma\|u - v\|,$$
which shows that $\|f'(v)\| \le \gamma$; that is, $\|f'(v)\|$ is bounded.

8.27. Show the convergence of the generic projected subgradient algorithm in (8.61).

Solution: Let us break the iteration into two steps,
$$z^{(i)} = \theta^{(i-1)} - \mu_i J'(\theta^{(i-1)}), \quad (22)$$
$$\theta^{(i)} = P_C(z^{(i)}). \quad (23)$$
Then, following the same arguments as the ones adopted in Problem 8.25, we get
$$\|z^{(i)} - \theta_*\|^2 \le \|\theta^{(0)} - \theta_*\|^2 - 2\sum_{k=1}^i\mu_k\big(J(\theta^{(k-1)}) - J(\theta_*)\big) + \sum_{k=1}^i\mu_k^2\|J'(\theta^{(k-1)})\|^2. \quad (24)$$
However, from the non-expansive property of the projection operator, and taking into account that $\theta_* \in C$, since it is a solution,
$$\|\theta^{(i)} - \theta_*\|^2 = \|P_C(z^{(i)}) - P_C(\theta_*)\|^2 \le \|z^{(i)} - \theta_*\|^2. \quad (25)$$
Combining the last two formulas, the proof proceeds as in Problem 8.25.
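The two-step iteration (22)-(23) can be sketched on a toy problem; the cost, constraint set, and step size below are arbitrary illustrative assumptions (a smooth cost is used, so its gradient serves as the subgradient).

```python
# Projected (sub)gradient sketch: minimize J(theta) = ||theta - a||^2 over the
# unit ball C = B[0, 1], with a = [2, 0]; the minimizer is theta* = [1, 0].
import math

a = [2.0, 0.0]

def grad(t):  # gradient of J, a particular subgradient
    return [2 * (ti - ai) for ti, ai in zip(t, a)]

def proj(t):  # projection onto the unit ball
    n = math.hypot(*t)
    return [c / n for c in t] if n > 1 else list(t)

theta = [-0.5, 0.8]
mu = 0.25
for _ in range(100):
    z = [ti - mu * gi for ti, gi in zip(theta, grad(theta))]  # step (22)
    theta = proj(z)                                           # step (23)

assert math.dist(theta, [1.0, 0.0]) < 1e-6
```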
8.28. Derive equation (8.99).

Solution: By the definition,
$$J_n(\theta) = \frac{1}{n}\sum_{k=1}^n L(y_k, x_k, \theta).$$
Hence,
$$J_n(\theta) = \frac{1}{n}\Big(\sum_{k=1}^{n-1}L(y_k, x_k, \theta) + L(y_n, x_n, \theta)\Big) = \frac{n-1}{n}J_{n-1}(\theta) + \frac{1}{n}L(y_n, x_n, \theta).$$
Hence
$$\nabla J_n(\theta) = \frac{n-1}{n}\nabla J_{n-1}(\theta) + \frac{1}{n}\nabla L(y_n, x_n, \theta),$$
or, evaluating at $\theta_*(n-1)$, where $\nabla J_{n-1}(\theta_*(n-1)) = 0$,
$$\nabla J_n\big(\theta_*(n-1)\big) = 0 + \frac{1}{n}\nabla L\big(y_n, x_n, \theta_*(n-1)\big).$$
Expanding the left-hand side of $\nabla J_n(\theta_*(n)) = 0$ to a first-order Taylor approximation around $\theta_*(n-1)$ we get
$$\nabla J_n\big(\theta_*(n)\big) = 0 = \nabla J_n\big(\theta_*(n-1)\big) + \nabla^2 J_n\big(\theta_*(n-1)\big)\big(\theta_*(n) - \theta_*(n-1)\big),$$
or
$$\nabla J_n\big(\theta_*(n-1)\big) = -\nabla^2 J_n\big(\theta_*(n-1)\big)\big(\theta_*(n) - \theta_*(n-1)\big),$$
which finally proves the claim.

8.29. Consider the online version of PDMb in (8.64), i.e.,
$$\theta_n = \begin{cases} P_C\Big(\theta_{n-1} - \mu_n\dfrac{J(\theta_{n-1})}{\|J'(\theta_{n-1})\|^2}J'(\theta_{n-1})\Big), & \text{if } J'(\theta_{n-1}) \ne 0, \\[6pt] P_C(\theta_{n-1}), & \text{if } J'(\theta_{n-1}) = 0, \end{cases} \quad (26)$$
where we have assumed that $J_* = 0$. If this is not the case, a shift can accommodate for the difference. Thus we assume that we know the minimum. For example, this is the case for a number of tasks, such as the hinge loss function, assuming linearly separable classes, or the linear $\epsilon$-insensitive loss function, for bounded noise. Assume that
$$L_n(\theta) = \sum_{k=n-q+1}^n \frac{\omega_k\,d_{C_k}(\theta_{n-1})}{\sum_{k=n-q+1}^n\omega_k d_{C_k}(\theta_{n-1})}\,d_{C_k}(\theta).$$
Then derive the APSM algorithm of (8.39).

Solution: Let the loss function be written as
$$L_n(\theta) = \sum_{k=n-q+1}^n\beta_k\,d_{C_k}(\theta), \quad \text{where} \quad \beta_k = \frac{\omega_k d_{C_k}(\theta_{n-1})}{\sum_{k=n-q+1}^n\omega_k d_{C_k}(\theta_{n-1})}, \quad \sum_{k=n-q+1}^n\beta_k = 1.$$
Then, for the recursion (26), we need to compute the subgradient of $L_n(\theta)$, which by Example 8.5 (and for $\theta \notin C_k$) becomes
$$L'_n(\theta_{n-1}) = \sum_{k=n-q+1}^n\beta_k\,d'_{C_k}(\theta)\Big|_{\theta=\theta_{n-1}} = \sum_{k=n-q+1}^n\beta_k\,\frac{\theta_{n-1} - P_{C_k}(\theta_{n-1})}{d_{C_k}(\theta_{n-1})},$$
or
$$L'_n(\theta_{n-1}) = \frac{1}{L}\sum_{k=n-q+1}^n\omega_k\big(\theta_{n-1} - P_{C_k}(\theta_{n-1})\big), \quad \text{with} \quad L = \sum_{k=n-q+1}^n\omega_k d_{C_k}(\theta_{n-1}).$$
Hence, (26) now becomes (using $\mu'_n$ instead, for reasons to become apparent soon)
$$\theta_n = \theta_{n-1} - \mu'_n\,\frac{\frac{1}{L}\sum_{k=n-q+1}^n\omega_k d^2_{C_k}(\theta_{n-1})}{\frac{1}{L^2}\big\|\sum_{k=n-q+1}^n\omega_k\big(\theta_{n-1} - P_{C_k}(\theta_{n-1})\big)\big\|^2}\cdot\frac{1}{L}\sum_{k=n-q+1}^n\omega_k\big(\theta_{n-1} - P_{C_k}(\theta_{n-1})\big)$$
$$= \theta_{n-1} + \mu'_n M\sum_{k=n-q+1}^n\omega_k\big(P_{C_k}(\theta_{n-1}) - \theta_{n-1}\big),$$
where
$$M := \frac{\sum_{k=n-q+1}^n\omega_k d^2_{C_k}(\theta_{n-1})}{\big\|\sum_{k=n-q+1}^n\omega_k\big(P_{C_k}(\theta_{n-1}) - \theta_{n-1}\big)\big\|^2}.$$
Setting
$$\mu_n = \mu'_n M \in (0, 2M),$$
the APSM algorithm results.
8.30. Derive the regret bound for the subgradient algorithm in (8.82).

Solution: From the text, we have that
$$\mathcal{L}_n(\theta_{n-1}) - \mathcal{L}_n(h) \le g_n^T(\theta_{n-1} - h) \le \frac{1}{2\mu_n}\big(\|\theta_{n-1} - h\|^2 - \|\theta_n - h\|^2\big) + \frac{\mu_n}{2}G^2. \quad (27)$$
Summing up both sides results in
$$\sum_{n=1}^N\mathcal{L}_n(\theta_{n-1}) - \sum_{n=1}^N\mathcal{L}_n(h) \le \sum_{n=1}^N\frac{1}{2\mu_n}\big(\|\theta_{n-1} - h\|^2 - \|\theta_n - h\|^2\big) + \frac{G^2}{2}\sum_{n=1}^N\mu_n. \quad (28)$$
Carrying out the summations, the right-hand side becomes
$$A := \frac{1}{2\mu_1}\|\theta_0 - h\|^2 - \frac{1}{2\mu_1}\|\theta_1 - h\|^2 + \frac{1}{2\mu_2}\|\theta_1 - h\|^2 - \frac{1}{2\mu_2}\|\theta_2 - h\|^2 + \cdots + \frac{1}{2\mu_N}\|\theta_{N-1} - h\|^2 - \frac{1}{2\mu_N}\|\theta_N - h\|^2 + \frac{G^2}{2}\sum_{n=1}^N\mu_n,$$
or, dropping the last negative term,
$$A \le \frac{1}{2\mu_1}\|\theta_0 - h\|^2 + \sum_{n=2}^N\Big(\frac{1}{2\mu_n} - \frac{1}{2\mu_{n-1}}\Big)\|\theta_{n-1} - h\|^2 + \frac{G^2}{2}\sum_{n=1}^N\mu_n.$$
Taking into account the bound $\|\theta_n - h\|^2 \le F^2$, and selecting the step-size to be a decreasing sequence (so that $\frac{1}{\mu_n} - \frac{1}{\mu_{n-1}} \ge 0$), we readily get
$$A \le F^2\Big(\frac{1}{2\mu_1} + \frac{1}{2}\sum_{n=2}^N\Big(\frac{1}{\mu_n} - \frac{1}{\mu_{n-1}}\Big)\Big) + \frac{G^2}{2}\sum_{n=1}^N\mu_n, \quad (29)$$
which then easily leads to
$$A \le \frac{1}{2\mu_N}F^2 + \frac{G^2}{2}\sum_{n=1}^N\mu_n. \quad (30)$$
Combining the above with (28), the claim is proved.
8.31. Show that a function $f(x)$ is $\sigma$-strongly convex if and only if the function $f(x) - \frac{\sigma}{2}\|x\|^2$ is convex.

Solution: a) Assume that
$$f(x) - \frac{\sigma}{2}\|x\|^2$$
is convex. Then, by the definition of the subgradient at $x$, we have
$$f(y) - \frac{\sigma}{2}\|y\|^2 - f(x) + \frac{\sigma}{2}\|x\|^2 \ge g^T(y - x) - \sigma x^T(y - x), \quad (31)$$
which readily implies that
$$f(y) - f(x) \ge g^T(y - x) + \frac{\sigma}{2}\|y\|^2 + \frac{\sigma}{2}\|x\|^2 - \sigma x^Ty = g^T(y - x) + \frac{\sigma}{2}\|y - x\|^2, \quad (32)$$
from which the strong convexity of $f(x)$ is deduced.

b) Assume that $f(x)$ is strongly convex. Then by its definition we have
$$f(y) - f(x) \ge g^T(y - x) + \frac{\sigma}{2}\|y - x\|^2 = g^T(y - x) + \frac{\sigma}{2}\|y\|^2 + \frac{\sigma}{2}\|x\|^2 - \sigma x^Ty, \quad (33)$$
from which we obtain
$$f(y) - \frac{\sigma}{2}\|y\|^2 - f(x) + \frac{\sigma}{2}\|x\|^2 \ge g^T(y - x) - \sigma x^T(y - x), \quad (34)$$
which proves the claim that $f(x) - \frac{\sigma}{2}\|x\|^2$ is convex.

8.32. Show that if the loss function is $\sigma$-strongly convex, then if $\mu_n = \frac{1}{\sigma n}$, the regret bound for the subgradient algorithm becomes
$$\frac{1}{N}\sum_{n=1}^N\mathcal{L}_n(\theta_{n-1}) \le \frac{1}{N}\sum_{n=1}^N\mathcal{L}_n(\theta_*) + \frac{G^2(1 + \ln N)}{2\sigma N}. \quad (35)$$

Solution: Taking into account the strong convexity we have that
$$\mathcal{L}_n(\theta_{n-1}) - \mathcal{L}_n(\theta_*) \le g_n^T(\theta_{n-1} - \theta_*) - \frac{\sigma}{2}\|\theta_{n-1} - \theta_*\|^2, \quad (36)$$
and following similar arguments as for Problem 8.30, we get
$$\mathcal{L}_n(\theta_{n-1}) - \mathcal{L}_n(\theta_*) \le \frac{1}{2\mu_n}\big(\|\theta_{n-1} - \theta_*\|^2 - \|\theta_n - \theta_*\|^2\big) - \frac{\sigma}{2}\|\theta_{n-1} - \theta_*\|^2 + \frac{\mu_n}{2}G^2. \quad (37)$$
Using $\mu_n = \frac{1}{\sigma n}$ results in
$$2\big(\mathcal{L}_n(\theta_{n-1}) - \mathcal{L}_n(\theta_*)\big) \le \sigma n\big(\|\theta_{n-1} - \theta_*\|^2 - \|\theta_n - \theta_*\|^2\big) - \sigma\|\theta_{n-1} - \theta_*\|^2 + \frac{1}{\sigma n}G^2. \quad (38)$$
Summing up both sides we obtain
$$2\sum_{n=1}^N\big(\mathcal{L}_n(\theta_{n-1}) - \mathcal{L}_n(\theta_*)\big) \le \sigma\big(\|\theta_0 - \theta_*\|^2 - \|\theta_1 - \theta_*\|^2\big) - \sigma\|\theta_0 - \theta_*\|^2$$
$$+\ 2\sigma\big(\|\theta_1 - \theta_*\|^2 - \|\theta_2 - \theta_*\|^2\big) - \sigma\|\theta_1 - \theta_*\|^2 + \cdots + N\sigma\big(\|\theta_{N-1} - \theta_*\|^2 - \|\theta_N - \theta_*\|^2\big) - \sigma\|\theta_{N-1} - \theta_*\|^2 + G^2\sum_{n=1}^N\frac{1}{\sigma n}$$
$$\le G^2\sum_{n=1}^N\frac{1}{\sigma n},$$
since the terms on the right-hand side telescope and cancel. Using now the bound
$$\sum_{n=1}^N\frac{1}{n} \le 1 + \int_1^N\frac{1}{t}\,dt = 1 + \ln N,$$
the claim is proved.
8.33. Consider a batch algorithm that computes the minimum of the empirical loss function, $\theta_*(N)$, having a quadratic convergence rate, i.e.,
$$\ln\ln\frac{1}{\|\theta^{(i)} - \theta_*(N)\|^2} \sim i.$$
Show that an online algorithm, running for $n$ time instants so as to spend the same computational processing resources as the batch one, achieves for large values of $N$ better performance than the batch algorithm ([Bott03]), i.e.,
$$\|\theta_n - \theta_*\|^2 \sim \frac{1}{N\ln\ln N} \ll \frac{1}{N} \sim \|\theta_*(N) - \theta_*\|^2.$$
Hint: Use the fact that
$$\|\theta_n - \theta_*\|^2 \sim \frac{1}{n}, \quad \text{and} \quad \|\theta_*(N) - \theta_*\|^2 \sim \frac{1}{N}.$$

Solution: Let $K$ be the number of operations per iteration for the online algorithm. This amounts to a total of $Kn$ operations. The batch algorithm, in order to make sense, should perform $O(\ln\ln N)$ iterations, so as to get close to $\|\theta^{(i)} - \theta_*(N)\|^2 \sim 1/N$. Assuming that at each iteration it performs, approximately, $K_1N$ operations, this amounts to a total of $K_1N\ln\ln N$ operations. To keep the same load for both algorithms, it should be
$$Kn = K_1N\ln\ln N,$$
i.e., $n \sim N\ln\ln N$. This leads to the following approximate accuracies:
$$\|\theta_n - \theta_*\|^2 \sim \frac{1}{N\ln\ln N} \ll \frac{1}{N} \sim \|\theta_*(N) - \theta_*\|^2,$$
which proves the claim. Note that in practice, the values of $K$ and $K_1$ play an important role as well.

8.34. Show property (8.110) for the proximal operator.

Solution: Assume first that $p = \operatorname{Prox}_{\lambda f}(x)$. By definition,
$$f(p) + \frac{1}{2\lambda}\|x - p\|^2 \le f(v) + \frac{1}{2\lambda}\|x - v\|^2, \quad \forall v \in \mathbb{R}^l.$$
Since the previous inequality holds true for any $v \in \mathbb{R}^l$, it also holds true for $\alpha v + (1-\alpha)p$, where $v$ is any vector in $\mathbb{R}^l$ and $\alpha$ is any real number within $(0, 1)$. Hence,
$$\lambda f(p) + \frac{1}{2}\|x - p\|^2 \le \lambda f\big(\alpha v + (1-\alpha)p\big) + \frac{1}{2}\|x - \alpha v - (1-\alpha)p\|^2$$
$$\le \lambda\alpha f(v) + \lambda(1-\alpha)f(p) + \frac{1}{2}\|x - p\|^2 + \frac{1}{2}\alpha^2\|v - p\|^2 - \alpha\langle x - p, v - p\rangle.$$
After re-arranging terms in the previous relation and dividing by $\alpha$,
$$\lambda f(p) \le \lambda f(v) + \frac{\alpha}{2}\|v - p\|^2 - \langle x - p, v - p\rangle, \quad \forall\alpha \in (0, 1).$$
Application of $\lim_{\alpha\to 0}$ on both sides of the previous inequality results in the desired
$$\langle v - p, x - p\rangle \le \lambda\big(f(v) - f(p)\big), \quad \forall v \in \mathbb{R}^l.$$

Conversely, assume that
$$\langle v - p, x - p\rangle/\lambda \le f(v) - f(p), \quad \forall v \in \mathbb{R}^l.$$
Then,
$$f(p) + \frac{1}{2\lambda}\|x - p\|^2 \le f(v) + \frac{1}{2\lambda}\|x - p\|^2 - \frac{1}{\lambda}\langle v - p, x - p\rangle$$
$$= f(v) + \frac{1}{2\lambda}\|(x - v) + (v - p)\|^2 - \frac{1}{\lambda}\langle v - p, x - p\rangle$$
$$= f(v) + \frac{1}{2\lambda}\|x - v\|^2 + \frac{1}{2\lambda}\|v - p\|^2 + \frac{1}{\lambda}\langle v - p, x - v\rangle - \frac{1}{\lambda}\langle v - p, x - p\rangle$$
$$= f(v) + \frac{1}{2\lambda}\|x - v\|^2 + \frac{1}{2\lambda}\|v - p\|^2 - \frac{1}{\lambda}\|v - p\|^2$$
$$= f(v) + \frac{1}{2\lambda}\|x - v\|^2 - \frac{1}{2\lambda}\|v - p\|^2 \le f(v) + \frac{1}{2\lambda}\|x - v\|^2, \quad \forall v \in \mathbb{R}^l.$$
The previous inequality clearly suggests that $p = \operatorname{Prox}_{\lambda f}(x)$.

8.35. Show property (8.111) for the proximal operator.

Solution: For compact notation, define $p_j := \operatorname{Prox}_{\lambda f}(x_j)$, $j = 1, 2$. Then, by property (8.110),
$$\langle p_2 - p_1, x_1 - p_1\rangle \le \lambda\big(f(p_2) - f(p_1)\big), \quad \langle p_1 - p_2, x_2 - p_2\rangle \le \lambda\big(f(p_1) - f(p_2)\big).$$
Adding the previous inequalities results in
$$\langle p_1 - p_2, (p_1 - p_2) - (x_1 - x_2)\rangle \le 0,$$
which in turn leads to the desired
$$\|p_1 - p_2\|^2 \le \langle p_1 - p_2, x_1 - x_2\rangle.$$
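Property (8.111) can be checked numerically for a concrete proximal operator. For $f$ the $\ell_1$ norm, $\operatorname{Prox}_{\lambda f}$ is the component-wise soft-thresholding operator; the sampling scheme and the value of $\lambda$ below are arbitrary illustrative choices.

```python
# Firm non-expansiveness (8.111) of Prox_{lambda f}, f = l1 norm:
# [Prox(x)]_i = sign(x_i) * max(|x_i| - lambda, 0).
import random

def prox_l1(x, lam):
    return [(abs(c) - lam) * (1 if c > 0 else -1) if abs(c) > lam else 0.0
            for c in x]

random.seed(3)
lam = 0.7
for _ in range(200):
    x1 = [random.uniform(-3, 3) for _ in range(3)]
    x2 = [random.uniform(-3, 3) for _ in range(3)]
    p1, p2 = prox_l1(x1, lam), prox_l1(x2, lam)
    lhs = sum((a - b) ** 2 for a, b in zip(p1, p2))
    rhs = sum((a - b) * (c - d) for a, b, c, d in zip(p1, p2, x1, x2))
    assert lhs <= rhs + 1e-12    # ||p1 - p2||^2 <= <p1 - p2, x1 - x2>
```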
8.36. Prove that the reflected proximal operator is non-expansive and then that the recursion in (8.117) converges to a minimizer of $f$.

Solution: Define the mapping $R := 2\operatorname{Prox}_{\lambda f} - I$. Then (8.117) takes the following form:
$$x_{k+1} = x_k + \frac{\mu_k}{2}\big(R(x_k) - x_k\big) = \Big(1 - \frac{\mu_k}{2}\Big)x_k + \frac{\mu_k}{2}R(x_k).$$
Notice that $R$ is non-expansive: $\forall x_1, x_2 \in \mathbb{R}^l$,
$$\|R(x_1) - R(x_2)\|^2 = \big\|2\big(\operatorname{Prox}_{\lambda f}(x_1) - \operatorname{Prox}_{\lambda f}(x_2)\big) - (x_1 - x_2)\big\|^2$$
$$= 4\|\operatorname{Prox}_{\lambda f}(x_1) - \operatorname{Prox}_{\lambda f}(x_2)\|^2 + \|x_1 - x_2\|^2 - 4\big\langle\operatorname{Prox}_{\lambda f}(x_1) - \operatorname{Prox}_{\lambda f}(x_2), x_1 - x_2\big\rangle$$
$$\le 4\|\operatorname{Prox}_{\lambda f}(x_1) - \operatorname{Prox}_{\lambda f}(x_2)\|^2 + \|x_1 - x_2\|^2 - 4\|\operatorname{Prox}_{\lambda f}(x_1) - \operatorname{Prox}_{\lambda f}(x_2)\|^2 = \|x_1 - x_2\|^2,$$
where property (8.111) (Problem 8.35) has been used.

In turn, let $z$ be a fixed point of $\operatorname{Prox}_{\lambda f}$, equivalently of $R$. Then, using the convexity identity $\|(1-t)a + tb\|^2 = (1-t)\|a\|^2 + t\|b\|^2 - t(1-t)\|a - b\|^2$ with $t = \mu_k/2$,
$$\|x_{k+1} - z\|^2 = \Big\|\Big(1 - \frac{\mu_k}{2}\Big)(x_k - z) + \frac{\mu_k}{2}\big(R(x_k) - z\big)\Big\|^2$$
$$= \Big(1 - \frac{\mu_k}{2}\Big)\|x_k - z\|^2 + \frac{\mu_k}{2}\|R(x_k) - z\|^2 - \frac{\mu_k}{2}\Big(1 - \frac{\mu_k}{2}\Big)\|R(x_k) - x_k\|^2$$
$$\le \|x_k - z\|^2 - \frac{\mu_k}{2}\Big(1 - \frac{\mu_k}{2}\Big)\cdot 4\,\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\|^2 = \|x_k - z\|^2 - \mu_k(2 - \mu_k)\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\|^2,$$
where the non-expansiveness of $R$ (so that $\|R(x_k) - z\| = \|R(x_k) - R(z)\| \le \|x_k - z\|$) and the relation $R(x_k) - x_k = 2\big(\operatorname{Prox}_{\lambda f}(x_k) - x_k\big)$ have been used. Hence, $\forall k$,
$$\mu_k(2 - \mu_k)\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\|^2 \le \|x_k - z\|^2 - \|x_{k+1} - z\|^2.$$
Given any non-negative integer $k_0$, the previous telescoping inequality is utilized for all $k \in \{0, \ldots, k_0\}$ to produce
$$\sum_{k=0}^{k_0}\mu_k(2 - \mu_k)\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\|^2 \le \|x_0 - z\|^2 - \|x_{k_0+1} - z\|^2 \le \|x_0 - z\|^2.$$
Since the previous relation holds for any $k_0$, applying $\lim_{k_0\to\infty}$ on both sides of the inequality results in
$$\sum_{k=0}^{+\infty}\mu_k(2 - \mu_k)\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\|^2 < +\infty. \quad (39)$$
Moreover, notice that
$$\|\operatorname{Prox}_{\lambda f}(x_{k+1}) - x_{k+1}\| = \frac{1}{2}\|R(x_{k+1}) - x_{k+1}\| = \frac{1}{2}\Big\|R(x_{k+1}) - R(x_k) + \Big(1 - \frac{\mu_k}{2}\Big)\big(R(x_k) - x_k\big)\Big\|$$
$$\le \frac{1}{2}\|R(x_{k+1}) - R(x_k)\| + \frac{1}{2}\Big(1 - \frac{\mu_k}{2}\Big)\|R(x_k) - x_k\| \le \frac{1}{2}\|x_{k+1} - x_k\| + \Big(1 - \frac{\mu_k}{2}\Big)\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\|$$
$$= \frac{\mu_k}{2}\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\| + \Big(1 - \frac{\mu_k}{2}\Big)\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\| = \|\operatorname{Prox}_{\lambda f}(x_k) - x_k\|.$$
Since $\big(\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\|\big)_{k\in\mathbb{N}}$ is monotonically non-increasing and bounded from below, it converges. Necessarily, $\lim_{k\to\infty}\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\|^2 = 0$. Otherwise, there would exist an $\epsilon > 0$ and a subsequence $(k_m)_{m\in\mathbb{N}}$ such that
$$\|\operatorname{Prox}_{\lambda f}(x_{k_m}) - x_{k_m}\|^2 \ge \epsilon, \quad \forall m \in \mathbb{N}.$$
This, together with the fact that $\lim_{m\to\infty}\sum_{i=0}^{k_m}\mu_i(2 - \mu_i) = +\infty$, and (39), imply that
$$+\infty > \sum_{k=0}^{+\infty}\mu_k(2 - \mu_k)\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\|^2 \ge \sum_{m=0}^{+\infty}\mu_{k_m}(2 - \mu_{k_m})\|\operatorname{Prox}_{\lambda f}(x_{k_m}) - x_{k_m}\|^2 \ge \epsilon\sum_{m=0}^{+\infty}\mu_{k_m}(2 - \mu_{k_m}) = +\infty,$$
which is clearly absurd.

Let $x_*$ be an arbitrary cluster point, say $x_{k_m} \to x_*$. Expanding
$$\|x_* - \operatorname{Prox}_{\lambda f}(x_*)\|^2 = \big\|x_* - x_{k_m} + x_{k_m} - \operatorname{Prox}_{\lambda f}(x_{k_m}) + \operatorname{Prox}_{\lambda f}(x_{k_m}) - \operatorname{Prox}_{\lambda f}(x_*)\big\|^2,$$
and using the Cauchy-Schwarz inequality together with the non-expansiveness of $\operatorname{Prox}_{\lambda f}$, we obtain
$$\|x_* - \operatorname{Prox}_{\lambda f}(x_*)\|^2 \le \|x_{k_m} - \operatorname{Prox}_{\lambda f}(x_{k_m})\|^2 + 2\|x_{k_m} - \operatorname{Prox}_{\lambda f}(x_{k_m})\|\,\|x_{k_m} - x_*\| + 2\|x_* - x_{k_m}\|\,\|x_* - \operatorname{Prox}_{\lambda f}(x_*)\|.$$
Applying $\lim_{m\to\infty}$ on both sides of the previous inequality results in $x_* = \operatorname{Prox}_{\lambda f}(x_*) \Leftrightarrow x_* \in \operatorname{Fix}(\operatorname{Prox}_{\lambda f})$. Since $x_*$ was chosen arbitrarily within the set of all cluster points of $(x_k)_{k\in\mathbb{N}}$, it can be readily seen that all cluster points belong to $\operatorname{Fix}(\operatorname{Prox}_{\lambda f})$.

We have already seen that the sequence $(\|x_k - x\|^2)_{k\in\mathbb{N}}$ converges for any $x \in \operatorname{Fix}(\operatorname{Prox}_{\lambda f})$. Moreover, any cluster point of $(x_k)_{k\in\mathbb{N}}$ belongs to $\operatorname{Fix}(\operatorname{Prox}_{\lambda f})$. Let us show now that $(x_k)_{k\in\mathbb{N}}$ possesses only one cluster point. To this end, assume two cluster points $x, y$ of $(x_k)_{k\in\mathbb{N}}$. This means that there exist subsequences $(x_{k_m})_{m\in\mathbb{N}}$ and $(x_{l_m})_{m\in\mathbb{N}}$ which converge to $x$ and $y$, respectively. Moreover, notice that
$$\langle x_k, x - y\rangle = \frac{1}{2}\big(\|x_k - y\|^2 - \|x_k - x\|^2 + \|x\|^2 - \|y\|^2\big).$$
Since both $(\|x_k - x\|^2)_{k\in\mathbb{N}}$ and $(\|x_k - y\|^2)_{k\in\mathbb{N}}$ converge, so does also the sequence $(\langle x_k, x - y\rangle)_{k\in\mathbb{N}}$. Hence,
$$\langle x, x - y\rangle = \lim_{m\to\infty}\langle x_{k_m}, x - y\rangle = \lim_{k\to\infty}\langle x_k, x - y\rangle = \lim_{m\to\infty}\langle x_{l_m}, x - y\rangle = \langle y, x - y\rangle,$$
and in turn, $\|x - y\|^2 = 0 \Rightarrow x = y$. To conclude, $(x_k)_{k\in\mathbb{N}}$ converges to a point in $\operatorname{Fix}(\operatorname{Prox}_{\lambda f}) = \arg\min_{v\in\mathbb{R}^l}f(v)$.
8.37. Derive the formula in (8.121) from (8.120).

Solution: Use the matrix inversion lemma,
$$(A + BD^{-1}C)^{-1} = A^{-1} - A^{-1}B(D + CA^{-1}B)^{-1}CA^{-1},$$
with $B = C = I$, $D = A^{-1}$, and with $\epsilon I$ playing the role of $A$ in the lemma, which gives
$$(A + \epsilon I)^{-1} = \frac{1}{\epsilon}I - \frac{1}{\epsilon^2}\Big(\frac{1}{\epsilon}I + A^{-1}\Big)^{-1} = \frac{1}{\epsilon}I - \frac{1}{\epsilon^2}\Big(\frac{1}{\epsilon}A + I\Big)^{-1}A = \frac{1}{\epsilon}I - \frac{1}{\epsilon}(A + \epsilon I)^{-1}A,$$
which finally leads to the result.
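The final identity can be verified numerically on a small matrix; the particular $A$ and $\epsilon$ below are arbitrary illustrative choices, and the matrix routines are hand-rolled to keep the sketch self-contained.

```python
# Numeric check of (A + eps*I)^{-1} = (1/eps)(I - (A + eps*I)^{-1} A)
# for a 2x2 invertible matrix.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

A = [[2.0, 1.0], [0.5, 3.0]]
eps = 0.25
ApeI = [[A[i][j] + (eps if i == j else 0.0) for j in range(2)]
        for i in range(2)]

lhs = inv2(ApeI)
PA = matmul(inv2(ApeI), A)
rhs = [[(float(i == j) - PA[i][j]) / eps for j in range(2)] for i in range(2)]

assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-10
           for i in range(2) for j in range(2))
```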