Solutions to Problems of Chapter 8

8.1. Prove the Cauchy-Schwarz inequality in a general Hilbert space.

Solution: We have to show that $\forall x, y \in H$, $|\langle x, y\rangle| \le \|x\|\,\|y\|$, and that equality holds iff $x = ay$ for some $a \in \mathbb{C}$.

The inequality holds trivially if $x$ and/or $y = 0$. Let now $y \ne 0$. For any $\lambda \in \mathbb{C}$ we have
$$0 \le \|x - \lambda y\|^2 = \langle x - \lambda y, x - \lambda y\rangle = \|x\|^2 + |\lambda|^2\|y\|^2 - \lambda^*\langle x, y\rangle - \lambda\langle y, x\rangle.$$
Since the last inequality is valid for any $\lambda \in \mathbb{C}$, let
$$\lambda = \frac{\langle x, y\rangle}{\|y\|^2}.$$
Thus
$$0 \le \|x\|^2 + \frac{|\langle x, y\rangle|^2}{\|y\|^2} - 2\frac{|\langle x, y\rangle|^2}{\|y\|^2} = \|x\|^2 - \frac{|\langle x, y\rangle|^2}{\|y\|^2},$$
from which a) the inequality results and b) the fact that equality holds iff $x = ay$.

Indeed, if $x = ay$, then equality is trivially shown. Let us now assume that equality holds true. Then
$$\langle x, x\rangle\langle y, y\rangle = \langle x, y\rangle^*\langle x, y\rangle,$$
and from the properties of the inner product in a Hilbert space we have
$$\Big\|x - \frac{\langle x, y\rangle}{\|y\|^2}\,y\Big\|^2 = \|x\|^2 - \frac{|\langle x, y\rangle|^2}{\|y\|^2} = 0,$$
from which it is readily seen that
$$x = \frac{\langle x, y\rangle}{\|y\|^2}\,y,$$
which proves the claim.
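The inequality just proved can be checked numerically in the finite-dimensional Hilbert space $\mathbb{C}^2$; this is only an illustrative sketch, and the particular vectors below are arbitrary choices, not taken from the text.

```python
# Numeric sanity check of the Cauchy-Schwarz inequality in C^2.

def inner(x, y):
    # <x, y> = sum_i x_i * conj(y_i)
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return abs(inner(x, x)) ** 0.5

x = [1 + 2j, -0.5 + 1j]
y = [0.3 - 1j, 2 + 0.7j]

lhs = abs(inner(x, y))
rhs = norm(x) * norm(y)
assert lhs <= rhs + 1e-12          # |<x, y>| <= ||x|| ||y||

# Equality holds when x = a*y for a scalar a:
a = 1.5 - 0.4j
xa = [a * c for c in y]
assert abs(abs(inner(xa, y)) - norm(xa) * norm(y)) < 1e-9
```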
8.2. Show a) that the set of points in a Hilbert space $H$,
$$C_1 = \{x : \|x\| \le 1\},$$
is a convex set, and b) that the set of points
$$C_2 = \{x : \|x\| = 1\}$$
is a nonconvex one.
Solution: From the definition of a Hilbert space (see Appendix), the norm is the one induced by the inner product, i.e., $\|x\| = (\langle x, x\rangle)^{1/2}$.

a) Let us consider two points $x_1, x_2 \in H$ such that
$$\|x_1\| \le 1, \quad \|x_2\| \le 1,$$
and let
$$x = \lambda x_1 + (1-\lambda)x_2, \quad \lambda \in [0, 1].$$
Then, by the triangle inequality property of a norm,
$$\|x\| = \|\lambda x_1 + (1-\lambda)x_2\| \le \lambda\|x_1\| + (1-\lambda)\|x_2\|,$$
and since $\lambda \in [0, 1]$,
$$\|x\| \le \lambda\|x_1\| + (1-\lambda)\|x_2\| \le (\lambda + 1 - \lambda)\,1 = 1.$$

b) Let two points be such that
$$\|x_1\| = 1, \quad \|x_2\| = 1,$$
and
$$x = \lambda x_1 + (1-\lambda)x_2.$$
Then we have that
$$\|x\|^2 = \langle\lambda x_1 + (1-\lambda)x_2, \lambda x_1 + (1-\lambda)x_2\rangle = \lambda^2\|x_1\|^2 + (1-\lambda)^2\|x_2\|^2 + 2\lambda(1-\lambda)\langle x_1, x_2\rangle = \lambda^2 + (1-\lambda)^2 + 2\lambda(1-\lambda)\langle x_1, x_2\rangle. \quad (1)$$
From the Schwarz inequality (Problem 8.1), we have that
$$|\langle x_1, x_2\rangle| \le \|x_1\|\,\|x_2\|, \quad (2)$$
or
$$-1 \le \langle x_1, x_2\rangle \le 1. \quad (3)$$
From (1) and (3) it is readily seen that
$$\|x\|^2 \le 1.$$
As a matter of fact, the only way for $\|x\|^2 = 1$ is that $\langle x_1, x_2\rangle = 1 = \|x_1\|\,\|x_2\|$. However, this is not possible. Equality in (2) is attained iff $x_1 = ax_2$, and since $\|x_1\| = \|x_2\| = 1$, this can only happen in the trivial case of $x_1 = x_2$. Hence, for $x_1 \ne x_2$ and $\lambda \in (0, 1)$, $\|x\| < 1$, so $x \notin C_2$ and $C_2$ is nonconvex.
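The two findings of Problem 8.2 can be checked numerically in $\mathbb{R}^2$ (a minimal sketch; the two particular unit vectors below are arbitrary choices):

```python
# The closed unit ball is convex, the unit sphere is not: the midpoint of
# two distinct unit vectors has norm strictly less than 1.
import math

x1 = (1.0, 0.0)
x2 = (0.0, 1.0)
lam = 0.5
mid = tuple(lam * a + (1 - lam) * b for a, b in zip(x1, x2))
norm_mid = math.hypot(*mid)

assert math.hypot(*x1) == 1.0 and math.hypot(*x2) == 1.0
assert norm_mid <= 1.0          # the midpoint stays in the ball (convexity)
assert norm_mid < 1.0 - 1e-6    # ...but leaves the sphere (non-convexity)
```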
8.3. Show the first-order convexity condition.

Solution: a) Assume that $f$ is convex. Then
$$f\big(\lambda y + (1-\lambda)x\big) \le \lambda f(y) + (1-\lambda)f(x),$$
or
$$f\big(x + \lambda(y - x)\big) - f(x) \le \lambda\big(f(y) - f(x)\big).$$
Taking $\lambda \to 0$, we can employ the Taylor expansion and get
$$f\big(x + \lambda(y - x)\big) - f(x) \approx \lambda\nabla^T f(x)(y - x) \le \lambda\big(f(y) - f(x)\big),$$
from which, in the limit, we obtain
$$f(y) \ge f(x) + \nabla^T f(x)(y - x). \quad (4)$$

b) Assume that
$$f(y) \ge f(x) + \nabla^T f(x)(y - x)$$
is valid $\forall x, y \in X$, where $X$ is the domain of definition of $f$. Then we have
$$f(y_1) \ge f(x) + \nabla^T f(x)(y_1 - x), \quad (5)$$
and
$$f(y_2) \ge f(x) + \nabla^T f(x)(y_2 - x). \quad (6)$$
Combining the previous two inequalities together, we obtain
$$\lambda f(y_1) + (1-\lambda)f(y_2) \ge \lambda f(x) + (1-\lambda)f(x) + \lambda\nabla^T f(x)(y_1 - x) + (1-\lambda)\nabla^T f(x)(y_2 - x), \quad (7)$$
for $\lambda \in (0, 1)$. Since this is true for any $x$, it will also be true for
$$x = \lambda y_1 + (1-\lambda)y_2,$$
which results in
$$f\big(\lambda y_1 + (1-\lambda)y_2\big) \le \lambda f(y_1) + (1-\lambda)f(y_2), \quad (8)$$
which proves the claim.

8.4. Show that a function $f$ is convex iff the one-dimensional function
$$g(t) := f(x + ty)$$
is convex, $\forall x, y$ in the domain of definition of $f$.

Solution: Observe that
$$g\big(\lambda t_1 + (1-\lambda)t_2\big) = f\big(x + \lambda t_1 y + (1-\lambda)t_2 y\big) = f\big(\lambda x + (1-\lambda)x + \lambda t_1 y + (1-\lambda)t_2 y\big) = f\big(\lambda(x + t_1 y) + (1-\lambda)(x + t_2 y)\big).$$
Also note that
$$g(t_1) = f(x + t_1 y), \quad g(t_2) = f(x + t_2 y),$$
and taking the definition of convexity, the claim is now straightforward to be shown.
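The first-order condition of Problem 8.3 can be sampled numerically for a simple convex function; the choice $f(x) = x^4$ and the grid of points below are arbitrary illustrative assumptions.

```python
# Check of the first-order convexity condition for the convex function
# f(x) = x^4 (one-dimensional case): f(y) >= f(x) + f'(x)(y - x).

def f(x):
    return x ** 4

def fprime(x):
    return 4 * x ** 3

points = [-2.0, -0.3, 0.0, 0.7, 1.5]
for x in points:
    for y in points:
        assert f(y) >= f(x) + fprime(x) * (y - x) - 1e-12
```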
8.5. Show the second-order convexity condition.

Hint: Show the claim first for the one-dimensional case and then use the result of the previous problem for the generalization.

Solution: We start with the one-dimensional case. Let a function $f(x)$ be convex. Then we know from the first-order convexity condition that
$$f'(x)(y - x) \le f(y) - f(x) \le f'(y)(y - x),$$
and dividing both sides by the positive quantity $(y - x)^2$, we get
$$\frac{f'(y) - f'(x)}{y - x} \ge 0,$$
and taking the limit $y \to x$ we obtain
$$f''(x) \ge 0. \quad (9)$$
Assume now that the second derivative is non-negative everywhere. Then select $y > x$ and we get
$$0 \le \int_x^y f''(z)(y - z)\,dz = f'(z)(y - z)\Big|_{z=x}^{z=y} + \int_x^y f'(z)\,dz = -f'(x)(y - x) + f(y) - f(x). \quad (10)$$
The above is true for $y > x$. Note that we can also show that
$$f(x) \ge f'(y)(x - y) + f(y),$$
by using the identity
$$0 \le \int_x^y f''(z)(z - x)\,dz.$$
Thus we have proved that $f$ is convex. For the more general case, consider
$$g(t) = f(x + ty),$$
from which we get
$$g''(t) = y^T\nabla^2 f(x + ty)\,y.$$
Since this is true for any $x, y$ and $t$, and using the previously obtained results, the claim is readily shown.
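The second-order condition can be illustrated on a small example; the quadratic $f(x) = x_1^2 + x_1 x_2 + x_2^2$ and the sampling scheme below are arbitrary assumptions for the sketch.

```python
# For f(x) = x1^2 + x1*x2 + x2^2 the Hessian is H = [[2, 1], [1, 2]].
# Convexity requires y^T H y >= 0 for every direction y (Problems 8.4, 8.5);
# here this is sampled at random directions.
import random

H = [[2.0, 1.0], [1.0, 2.0]]

def quad_form(H, y):
    return sum(y[i] * H[i][j] * y[j] for i in range(2) for j in range(2))

random.seed(0)
for _ in range(100):
    y = [random.uniform(-5, 5), random.uniform(-5, 5)]
    assert quad_form(H, y) >= 0.0
```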
8.6. Show that a function
$$f : \mathbb{R}^l \to \mathbb{R}$$
is convex iff its epigraph is convex.

Solution: a) Assume $f$ to be convex. We have to show that its epigraph is a convex set. Let two points, $x_1, x_2$. From the convexity of $f$ we have
$$f\big(\lambda x_1 + (1-\lambda)x_2\big) \le \lambda f(x_1) + (1-\lambda)f(x_2). \quad (11)$$
Consider two points in the epigraph, $y_1 = (x_1, r_1)$ and $y_2 = (x_2, r_2)$. Then we have
$$\lambda y_1 + (1-\lambda)y_2 := y = (x, r),$$
with
$$x = \lambda x_1 + (1-\lambda)x_2, \quad (12)$$
$$r = \lambda r_1 + (1-\lambda)r_2, \quad (13)$$
and since $y_1, y_2 \in \operatorname{epi}(f)$,
$$f(x_1) \le r_1, \quad f(x_2) \le r_2. \quad (14)$$
Combining (11) and (14) we get
$$f(x) \le \lambda f(x_1) + (1-\lambda)f(x_2) \le \lambda r_1 + (1-\lambda)r_2 = r,$$
hence $y = (x, r) \in \operatorname{epi}(f)$ and the epigraph is convex.

b) Assume the epigraph to be convex. Then $y = \lambda y_1 + (1-\lambda)y_2 \in \operatorname{epi}(f)$, hence
$$f\big(\lambda x_1 + (1-\lambda)x_2\big) \le \lambda r_1 + (1-\lambda)r_2, \quad (15)$$
for any
$$r_1 \ge f(x_1), \quad r_2 \ge f(x_2).$$
Thus (15) is also valid for $r_1 = f(x_1)$, $r_2 = f(x_2)$ and therefore
$$f\big(\lambda x_1 + (1-\lambda)x_2\big) \le \lambda f(x_1) + (1-\lambda)f(x_2).$$

8.7. Show that if a function is convex, then its lower level set is convex for any $\xi$.

Solution: Let the function $f$ be convex, and take two points, $x, y$, which lie in $\operatorname{lev}_{\le\xi}(f)$. Then
$$f(x) \le \xi, \quad f(y) \le \xi.$$
Hence, by the definition of convexity,
$$f\big(\lambda x + (1-\lambda)y\big) \le \lambda f(x) + (1-\lambda)f(y) \le \lambda\xi + (1-\lambda)\xi = \xi,$$
which proves the claim, that $\lambda x + (1-\lambda)y \in \operatorname{lev}_{\le\xi}(f)$.
8.8. Show that in a Hilbert space $H$ the parallelogram rule,
$$\|x + y\|^2 + \|x - y\|^2 = 2\big(\|x\|^2 + \|y\|^2\big), \quad \forall x, y \in H,$$
holds true.

Solution: The proof is straightforward from the respective definitions and the properties of the inner product:
$$\|x + y\|^2 = \|x\|^2 + \langle x, y\rangle + \langle y, x\rangle + \|y\|^2,$$
$$\|x - y\|^2 = \|x\|^2 - \langle x, y\rangle - \langle y, x\rangle + \|y\|^2,$$
from which, adding the two, the parallelogram rule is obtained.

8.9. Show that if $x, y \in H$, where $H$ is a Hilbert space, then the norm induced by the inner product satisfies the triangle inequality, as required by any norm, i.e., $\|x + y\| \le \|x\| + \|y\|$.

Solution: By the respective definitions we have
$$\|x + y\|^2 = \|x\|^2 + 2\,\mathrm{Real}(\langle x, y\rangle) + \|y\|^2 \le \|x\|^2 + 2|\langle x, y\rangle| + \|y\|^2 \le \|x\|^2 + \|y\|^2 + 2\|x\|\,\|y\| = (\|x\| + \|y\|)^2,$$
where the Cauchy-Schwarz inequality has been used.

8.10. Show that if a point $x_*$ is a local minimizer of a convex function, it is necessarily a global one. Moreover, it is the unique minimizer if the function is strictly convex.

Solution: Let $x_*$ be a local minimizer; that is, there exists $\epsilon > 0$ such that
$$f(x_*) \le f(x_* + \Delta), \quad \forall \Delta \in B[0, \epsilon].$$
Assume it is not a global minimizer; then there exists $y_* \notin B[x_*, \epsilon]$ such that
$$f(y_*) < f(x_*).$$
Let
$$\lambda := \frac{\epsilon}{2\|y_* - x_*\|},$$
so that
$$\lambda(y_* - x_*) \in B[0, \epsilon].$$
Then, by convexity,
$$f\big(x_* + \lambda(y_* - x_*)\big) \le (1-\lambda)f(x_*) + \lambda f(y_*) < f(x_*),$$
which is not possible, since $x_*$ is a local minimizer.

Assume now that $f$ is strictly convex and that there exist two minimizers, $x_* \ne y_*$. Then, by the definition of strict convexity, we have that
$$f\Big(\tfrac{1}{2}x_* + \tfrac{1}{2}y_*\Big) < \tfrac{1}{2}f(x_*) + \tfrac{1}{2}f(y_*) = f(x_*),$$
which is not possible, since $x_*$ is a global minimizer.
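The parallelogram rule of Problem 8.8 can be verified numerically in $\mathbb{R}^3$; the two vectors below are arbitrary illustrative choices.

```python
# Numeric check of ||x + y||^2 + ||x - y||^2 = 2(||x||^2 + ||y||^2).

def norm_sq(v):
    return sum(c * c for c in v)

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

x = [1.0, -2.0, 0.5]
y = [3.0, 0.25, -1.0]
lhs = norm_sq(add(x, y)) + norm_sq(sub(x, y))
rhs = 2 * (norm_sq(x) + norm_sq(y))
assert abs(lhs - rhs) < 1e-12
```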
8.11. Let $C$ be a closed convex set in a Hilbert space $H$. Then show that $\forall x \in H$ there exists a point, denoted as $P_C(x) \in C$, such that
$$\|x - P_C(x)\| = \min_{y \in C}\|x - y\|.$$

Solution: Let $x \notin C$, otherwise the claim is trivial. Let $\rho$ be the greatest lower bound of $\|x - y\|$, $y \in C$, i.e.,
$$\rho := \inf_{y \in C}\|x - y\| > 0.$$
Consider the sequence
$$\rho_n = \rho + \frac{1}{n}.$$
By the definition of the infimum, for each $n$ there will be at least one element $x_n \in C$ such that $\|x - x_n\| < \rho_n$, which then defines a sequence, $\{x_n\}$, of points for which we have that
$$\rho \le \|x - x_n\| < \rho_n,$$
or
$$\rho \le \lim_{n\to\infty}\|x - x_n\| \le \lim_{n\to\infty}\rho_n = \rho,$$
which necessarily leads to
$$\lim_{n\to\infty}\|x - x_n\| = \rho. \quad (16)$$
From the parallelogram law, we can write
$$\|(x - x_m) + (x - x_n)\|^2 + \|(x - x_m) - (x - x_n)\|^2 = 2\big(\|x - x_m\|^2 + \|x - x_n\|^2\big),$$
or
$$\|x_n - x_m\|^2 = 2\big(\|x - x_m\|^2 + \|x - x_n\|^2\big) - 4\Big\|x - \tfrac{1}{2}(x_n + x_m)\Big\|^2.$$
However, since $C$ is convex, the point $\tfrac{1}{2}(x_n + x_m) \in C$, so that $\|x - \tfrac{1}{2}(x_n + x_m)\| \ge \rho$, and we deduce that
$$\|x_n - x_m\|^2 \le 2\big(\|x - x_m\|^2 + \|x - x_n\|^2\big) - 4\rho^2.$$
Taking the limit for $n, m \to \infty$ on both sides we get
$$\lim_{n,m\to\infty}\|x_m - x_n\|^2 \le 0 \;\Rightarrow\; \lim_{n,m\to\infty}\|x_m - x_n\| = 0.$$
That is, $\{x_n\}$ is a Cauchy sequence, and since $H$ is Hilbert the sequence converges to a point $x_*$. Moreover it converges in $C$, since $C$ is closed, i.e., $x_* \in C$. Hence we have
$$\|x - x_*\| = \|x - x_n + x_n - x_*\| \le \|x - x_n\| + \|x_n - x_*\|.$$
Taking the limit and using (16) we obtain
$$\|x - x_*\| \le \rho.$$
However, since $x_* \in C$,
$$\|x - x_*\| \ge \rho,$$
which means that
$$\|x - x_*\| = \rho,$$
that is, the infimum is attained, which proves the claim. Uniqueness has been established in the text.
8.12. Show that the projection of a point $x \in H$ onto a non-empty closed convex set $C \subset H$ lies on the boundary of $C$.

Solution: Let $x \notin C$ and assume that $P_C(x)$ is an interior point of $C$. By the definition of interior points, $\exists\delta > 0$ such that
$$S_\delta := \{y : \|y - P_C(x)\| < \delta\} \subset C.$$
Let
$$z := P_C(x) + \frac{\delta}{2}\cdot\frac{x - P_C(x)}{\|x - P_C(x)\|},$$
where by assumption $\|x - P_C(x)\| > 0$, since $x \notin C$. Obviously $z \in S_\delta$. Hence,
$$\|x - z\| = \Big|1 - \frac{\delta}{2\|x - P_C(x)\|}\Big|\,\|x - P_C(x)\|.$$
However, $\delta$ can be chosen arbitrarily small, thus choose $\delta < \|x - P_C(x)\|$. Then
$$\|x - z\| = \|x - P_C(x)\| - \frac{\delta}{2} < \|x - P_C(x)\|,$$
which violates the definition of the projection. Thus, $P_C(x)$ lies on the boundary of $C$.
8.13. Derive the formula for the projection onto a hyperplane in a (real) Hilbert space $H$.

Solution: Let the hyperplane be
$$H_\theta := \{z : \langle\theta, z\rangle + \theta_0 = 0\}.$$
Let us first show that a hyperplane is a closed convex set. Convexity is shown trivially. To show closedness, let $y_n \in H_\theta \to y_*$. We will show that $y_* \in H_\theta$. Since $\langle\theta, y_n\rangle + \theta_0 = 0$,
$$0 \le |\langle\theta, y_*\rangle + \theta_0|^2 = \lim_{n\to\infty}|\langle\theta, y_* - y_n\rangle|^2 \le \lim_{n\to\infty}\|\theta\|^2\|y_* - y_n\|^2 = 0,$$
or
$$\langle\theta, y_*\rangle + \theta_0 = 0,$$
which proves the claim.

Let now $z \in H_\theta$ be the projection of $x \in H$, i.e., $z = P_C(x)$. Then, by the definition,
$$z := \arg\min_{\langle\theta, z\rangle + \theta_0 = 0}\langle x - z, x - z\rangle.$$
Using Lagrange multipliers, we obtain the Lagrangian
$$L(z, \lambda) = \langle x - z, x - z\rangle - \lambda\big(\langle\theta, z\rangle + \theta_0\big).$$
For those not familiar with infinite-dimensional spaces, it suffices to say that similar rules of differentiation apply, although the respective definitions are different (more general).

After differentiation of the Lagrangian, we obtain
$$2z - 2x - \lambda\theta = 0, \quad \text{or} \quad z = \frac{1}{2}(2x + \lambda\theta).$$
Plugging into the constraint, we obtain
$$\lambda = -2\,\frac{\langle\theta, x\rangle + \theta_0}{\|\theta\|^2},$$
which then results in the solution
$$P_C(x) = x - \frac{\langle\theta, x\rangle + \theta_0}{\|\theta\|^2}\,\theta.$$
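The derived hyperplane-projection formula can be sketched numerically in $\mathbb{R}^3$; the particular $\theta$, $\theta_0$, and $x$ below are arbitrary illustrative choices.

```python
# P(x) = x - (theta^T x + theta0)/||theta||^2 * theta, projection onto the
# hyperplane {z : theta^T z + theta0 = 0}.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

theta = [1.0, 2.0, -1.0]
theta0 = 0.5
x = [3.0, -1.0, 2.0]

scale = (dot(theta, x) + theta0) / dot(theta, theta)
p = [xi - scale * ti for xi, ti in zip(x, theta)]

# The projection satisfies the constraint...
assert abs(dot(theta, p) + theta0) < 1e-12
# ...and x - P(x) is parallel to theta (orthogonal to the hyperplane).
diff = [xi - pi for xi, pi in zip(x, p)]
ratio = diff[0] / theta[0]
assert all(abs(d - ratio * t) < 1e-12 for d, t in zip(diff, theta))
```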
8.14. Derive the formula for the projection onto a closed ball, $B[0, \rho]$.

Solution: The closed ball is defined as
$$B[0, \rho] = \{y : \|y\| \le \rho\}.$$
We have already seen (Problem 8.2) that it is convex. Let us show the closedness. Let
$$y_n \in B[0, \rho] \to y_*.$$
We have to show that $y_* \in B[0, \rho]$. Indeed,
$$\|y_*\| = \|y_* - y_n + y_n\| \le \|y_n\| + \|y_* - y_n\| \le \rho + \|y_* - y_n\|,$$
and taking the limit as $n \to \infty$,
$$\|y_*\| \le \rho,$$
which proves the claim.

To derive the projection, let $\|x\| > \rho$ (otherwise $P_{B[0,\rho]}(x) = x$). We follow similar steps as in Problem 8.13, replacing the constraint by
$$\|z\|^2 = \rho^2,$$
since the projection is on the boundary. Taking the gradient of the Lagrangian we get
$$2(z - x) + 2\lambda z = 0, \quad \text{or} \quad z = \frac{1}{1 + \lambda}\,x.$$
Plugging into the constraint, we get
$$\frac{\|x\|}{|1 + \lambda|} = \rho.$$
When $1 + \lambda = \frac{\|x\|}{\rho}$, we get
$$z = \frac{\rho}{\|x\|}\,x,$$
and when $1 + \lambda = -\frac{\|x\|}{\rho}$,
$$z = -\frac{\rho}{\|x\|}\,x.$$
From the two possible vectors, we have to keep the one that has the smaller distance from $x$. However,
$$\Big\|x - \frac{\rho}{\|x\|}x\Big\| < \Big\|x + \frac{\rho}{\|x\|}x\Big\|,$$
since $\frac{\rho}{\|x\|} < 1$. Thus $1 + \lambda = \frac{\|x\|}{\rho}$, and the projection is equal to
$$P_{B[0,\rho]}(x) = \begin{cases} \dfrac{\rho}{\|x\|}\,x, & \|x\| > \rho, \\[4pt] x, & \text{otherwise.} \end{cases}$$
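The closed-form projection onto the ball can be checked numerically; the radius and test points below are arbitrary illustrative choices.

```python
# P_{B[0,rho]}(x) = (rho/||x||) x if ||x|| > rho, else x.
import math

def project_ball(x, rho):
    nx = math.sqrt(sum(c * c for c in x))
    if nx > rho:
        return [rho * c / nx for c in x]
    return list(x)

rho = 2.0
outside = [3.0, 4.0]                       # norm 5 > rho
p = project_ball(outside, rho)
assert abs(math.hypot(*p) - rho) < 1e-12   # lands on the boundary
assert all(abs(a - b) < 1e-12 for a, b in zip(p, [1.2, 1.6]))  # (2/5)*[3, 4]

inside = [0.5, -0.5]
assert project_ball(inside, rho) == inside  # points inside are left as-is
```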
8.15. Find an example of a point whose projection on the $\ell_1$ ball is not unique.

Solution: The $\ell_1$ ball of radius $\rho$ in $\mathbb{R}^l$ is defined as
$$S_1[0, \rho] = \Big\{y : \sum_{i=1}^l |y_i| \le \rho\Big\}.$$
Take $\rho = 1$ and let the point $x = [1, 1]^T \in \mathbb{R}^2$, which obviously does not lie inside the $\ell_1$ ball of radius $\rho = 1$, since $\|x\|_1 = 2 > 1$. For any point $y \in S_1[0, 1]$, measuring distances in the $\ell_1$ norm, we have
$$\|x - y\|_1 = |1 - y_1| + |1 - y_2| \ge 1 - |y_1| + 1 - |y_2| \ge 2 - 1 = 1.$$
That is, the $\ell_1$ distance of $x$ from any point in the set $S_1[0, 1]$ is bounded below by $1$. Consider the two points $y_1 = [1, 0]^T$ and $y_2 = [0, 1]^T$. For both of them the lower bound is achieved, i.e.,
$$\|x - y_1\|_1 = \|x - y_2\|_1 = 1.$$
Moreover, one can easily check that all points on the line segment
$$\{y : y_1 + y_2 = 1,\ y_1, y_2 \ge 0\}$$
can be projection points of $x$.
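The non-uniqueness just described can be verified numerically (a minimal sketch; the sampled values of $t$ are arbitrary):

```python
# Every point [t, 1-t], t in [0, 1], attains the minimal l1 distance 1 from
# x = [1, 1] within the unit l1 ball.

def l1_dist(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

x = [1.0, 1.0]
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    y = [t, 1.0 - t]
    assert abs(y[0]) + abs(y[1]) <= 1.0      # y is in the l1 ball
    assert abs(l1_dist(x, y) - 1.0) < 1e-12  # all at the same minimal distance
```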
8.16. Show that if $C \subset H$ is a closed convex set in a Hilbert space, then $\forall x \in H$ and $\forall y \in C$, the projection $P_C(x)$ satisfies the following properties:
$$\mathrm{Real}\{\langle x - P_C(x), y - P_C(x)\rangle\} \le 0,$$
$$\|P_C(x) - P_C(y)\|^2 \le \mathrm{Real}\{\langle x - y, P_C(x) - P_C(y)\rangle\}.$$

Solution: We know that $P_C(x) \in C$. Hence, due to the convexity of $C$, $\forall\lambda \in [0, 1]$,
$$\lambda y + (1-\lambda)P_C(x) \in C.$$
Hence, by the definition of the projection,
$$\|x - P_C(x)\|^2 \le \big\|x - \big(\lambda y + (1-\lambda)P_C(x)\big)\big\|^2 = \big\|(x - P_C(x)) - \lambda(y - P_C(x))\big\|^2,$$
which, by the definition of the norm, gives
$$\|x - P_C(x)\|^2 \le \|x - P_C(x)\|^2 + \lambda^2\|y - P_C(x)\|^2 - 2\lambda\,\mathrm{Real}\{\langle x - P_C(x), y - P_C(x)\rangle\},$$
or
$$\mathrm{Real}\{\langle x - P_C(x), y - P_C(x)\rangle\} \le \frac{\lambda}{2}\,\|y - P_C(x)\|^2.$$
Taking the limit $\lambda \to 0$, we prove the first property.

To prove the second property, since $P_C(y) \in C$, we apply the previous property with $P_C(y)$ in place of $y$, i.e.,
$$\mathrm{Real}\{\langle x - P_C(x), P_C(y) - P_C(x)\rangle\} \le 0.$$
Similarly,
$$\mathrm{Real}\{\langle y - P_C(y), P_C(x) - P_C(y)\rangle\} \le 0.$$
After adding the above inequalities together and rearranging the terms, we obtain the second property.

8.17. Prove that if $S \subset H$ is a closed subspace in a Hilbert space $H$, then $\forall x, y \in H$,
$$\langle x, P_S(y)\rangle = \langle P_S(x), y\rangle = \langle P_S(x), P_S(y)\rangle,$$
and
$$P_S(ax + by) = aP_S(x) + bP_S(y).$$
Hint: Use the result of Problem 8.18.

Solution: We have that
$$\langle x, P_S(y)\rangle = \langle P_S(x) + (x - P_S(x)), P_S(y)\rangle = \langle P_S(x), P_S(y)\rangle,$$
since $x - P_S(x) \perp P_S(y)$, from Problem 8.18. Similarly, we can show that
$$\langle P_S(x), y\rangle = \langle P_S(x), P_S(y)\rangle.$$
Hence
$$\langle x, P_S(y)\rangle = \langle P_S(x), y\rangle.$$
For the linearity, we have
$$x = P_S(x) + (x - P_S(x)), \quad y = P_S(y) + (y - P_S(y)),$$
where $P_S(x), P_S(y) \in S$ and $x - P_S(x) \in S^\perp$, $y - P_S(y) \in S^\perp$. Hence
$$ax + by = \big(aP_S(x) + bP_S(y)\big) + \big(a(x - P_S(x)) + b(y - P_S(y))\big),$$
and since the term in the second parenthesis on the right-hand side lies in $S^\perp$, we readily obtain that
$$P_S(ax + by) = aP_S(x) + bP_S(y).$$
8.18. Let $S$ be a closed subspace in a Hilbert space $H$, $S \subset H$. Let $S^\perp$ be the set of all elements $x \in H$ which are orthogonal to $S$. Then show that a) $S^\perp$ is also a closed subspace, b) $S \cap S^\perp = \{0\}$, c) $H = S \oplus S^\perp$; that is, $\forall x \in H$, $\exists x_1 \in S$ and $x_2 \in S^\perp$: $x = x_1 + x_2$, where $x_1, x_2$ are unique.

Solution: a) We will first prove that $S^\perp$ is a subspace. Indeed, if $x_1 \in S^\perp$ and $x_2 \in S^\perp$, then
$$\langle x_1, y\rangle = \langle x_2, y\rangle = 0, \quad \forall y \in S,$$
or
$$\langle ax_1 + bx_2, y\rangle = 0 \;\Rightarrow\; ax_1 + bx_2 \in S^\perp.$$
Also, $0 \in S^\perp$ since $\langle x, 0\rangle = 0$. Hence $S^\perp$ is a subspace.

We will prove that $S^\perp$ is also closed. Let $\{x_n\} \subset S^\perp$ and $\lim_{n\to\infty} x_n = x_*$. We will show that $x_* \in S^\perp$. By the definition, $\langle x_n, y\rangle = 0$, $\forall y \in S$. Moreover,
$$|\langle x_*, y\rangle| = |\langle x_*, y\rangle - \langle x_n, y\rangle| = |\langle x_* - x_n, y\rangle| \le \|x_* - x_n\|\,\|y\| \to 0,$$
where the Cauchy-Schwarz inequality has been used. The last inequality leads to
$$\langle x_*, y\rangle = 0 \;\Rightarrow\; x_* \in S^\perp.$$

b) Let $x \in S \cap S^\perp$. By definition, since it belongs to both subspaces,
$$\langle x, x\rangle = 0 \;\Rightarrow\; x = 0.$$

c) Let $x \in H$. We have that
$$x = P_S(x) + (x - P_S(x)).$$
We will first show that $x - P_S(x) \in S^\perp$. Then we will show that this decomposition is unique. We already know that
$$\mathrm{Real}\{\langle x - P_S(x), y - P_S(x)\rangle\} \le 0, \quad \forall y \in S.$$
Also, since $S$ is a subspace, $ay \in S$, $\forall a \in \mathbb{R}$, hence
$$\mathrm{Real}\{\langle x - P_S(x), ay - P_S(x)\rangle\} \le 0, \quad \text{or} \quad a\,\mathrm{Real}\{\langle x - P_S(x), y\rangle\} \le \mathrm{Real}\{\langle x - P_S(x), P_S(x)\rangle\},$$
which, holding for every real $a$ of either sign, can only be true if
$$\mathrm{Real}\{\langle x - P_S(x), y\rangle\} = 0.$$
We apply the same for $jy \in S$. Then we have that
$$\langle x - P_S(x), jy\rangle = -j\langle x - P_S(x), y\rangle.$$
Recall that if $c \in \mathbb{C}$,
$$\mathrm{Imag}\{c\} = \mathrm{Real}\{-jc\}.$$
Hence,
$$\mathrm{Real}\{-j\langle x - P_S(x), y\rangle\} = 0 = \mathrm{Imag}\{\langle x - P_S(x), y\rangle\}.$$
Thus,
$$\langle x - P_S(x), y\rangle = 0, \quad \forall y \in S, \quad \text{and} \quad x - P_S(x) \in S^\perp.$$
Thus $x = x_1 + x_2$, with $x_1 = P_S(x) \in S$, $x_2 = x - P_S(x) \in S^\perp$. Let us now assume that there is another decomposition,
$$x = x_3 + x_4, \quad x_3 \in S,\ x_4 \in S^\perp.$$
Then
$$x_1 + x_2 = x_3 + x_4,$$
or
$$S \ni x_1 - x_3 = x_4 - x_2 \in S^\perp,$$
which necessarily implies that both are equal to the single point comprising $S \cap S^\perp$, i.e.,
$$x_1 - x_3 = 0 = x_4 - x_2,$$
hence the decomposition is unique and we have proved the claim.

Let us elaborate a bit more. We will show that
$$P_{S^\perp}(x) = x - P_S(x).$$
Indeed, $P_{S^\perp}(x)$ is unique. Also,
$$x = P_S(x) + (x - P_S(x)),$$
or
$$P_{S^\perp}(x) = P_{S^\perp}(P_S(x)) + P_{S^\perp}(x - P_S(x)) = 0 + (x - P_S(x)) = x - P_S(x),$$
since $x - P_S(x) \in S^\perp$. Note that we used the fact that if $y \in S$ then $P_{S^\perp}(y) = 0$. Indeed,
$$\|y - 0\|^2 = \|y\|^2 < \|y - a\|^2, \quad \forall a \ne 0 \in S^\perp,$$
since
$$\|y - a\|^2 = \|y\|^2 + \|a\|^2 - 2\,\mathrm{Real}\langle y, a\rangle = \|y\|^2 + \|a\|^2.$$

8.19. Show that the relaxed projection operator is a non-expansive mapping.

Solution: The relaxed projection operator is $T_C(x) = x + \mu(P_C(x) - x)$, $\mu \in (0, 2)$. By the respective definitions we have
$$T_C(x) - T_C(y) = x + \mu(P_C(x) - x) - y - \mu(P_C(y) - y) = (1 - \mu)(x - y) + \mu\big(P_C(x) - P_C(y)\big).$$

a) $\mu \in (0, 1]$. Recalling the triangle property of a norm (Appendix 8.15) and the non-expansiveness of $P_C$ (Problem 8.16), we get
$$\|T_C(x) - T_C(y)\| \le |1 - \mu|\,\|x - y\| + \mu\|P_C(x) - P_C(y)\| \le (1 - \mu)\|x - y\| + \mu\|x - y\| = \|x - y\|.$$

b) $\mu \in (1, 2)$. In this case,
$$\|T_C(x) - T_C(y)\|^2 = (1 - \mu)^2\|x - y\|^2 + \mu^2\|P_C(x) - P_C(y)\|^2 + 2\mu(1 - \mu)\,\mathrm{Real}\{\langle x - y, P_C(x) - P_C(y)\rangle\}$$
$$\le (1 - \mu)^2\|x - y\|^2 + \mu^2\|P_C(x) - P_C(y)\|^2 + 2\mu(1 - \mu)\|P_C(x) - P_C(y)\|^2$$
$$= (1 - \mu)^2\|x - y\|^2 + \mu(2 - \mu)\|P_C(x) - P_C(y)\|^2 \le (1 - \mu)^2\|x - y\|^2 + \mu(2 - \mu)\|x - y\|^2 = \|x - y\|^2.$$
To derive the bounds we used the second property of Problem 8.16 and the facts that $1 - \mu < 0$ and $2 - \mu > 0$, for $\mu \in (1, 2)$.
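The non-expansiveness of the relaxed projection can be sampled numerically; below $C$ is taken to be the unit ball and $\mu = 1.7$, both arbitrary illustrative assumptions.

```python
# Numeric check that T_C = I + mu*(P_C - I) is non-expansive for mu in (0, 2),
# with C the closed unit ball in R^2.
import math
import random

def proj_ball(x):
    n = math.hypot(*x)
    return [c / n for c in x] if n > 1 else list(x)

def T(x, mu):
    p = proj_ball(x)
    return [xi + mu * (pi - xi) for xi, pi in zip(x, p)]

random.seed(1)
mu = 1.7
for _ in range(200):
    x = [random.uniform(-3, 3), random.uniform(-3, 3)]
    y = [random.uniform(-3, 3), random.uniform(-3, 3)]
    assert math.dist(T(x, mu), T(y, mu)) <= math.dist(x, y) + 1e-12
```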
8.20. Show that the relaxed projection operator is a strongly attractive mapping.

Solution: By the respective definition, for any $y \in C$,
$$\|T_C(x) - y\|^2 = \|x + \mu(P_C(x) - x) - y\|^2 = \|x - y\|^2 + \mu^2\|P_C(x) - x\|^2 + 2\mu\,\mathrm{Real}\{\langle x - y, P_C(x) - x\rangle\}$$
$$= \|x - y\|^2 + \mu^2\|P_C(x) - x\|^2 + 2\mu\,\mathrm{Real}\{\langle x - P_C(x) + P_C(x) - y, P_C(x) - x\rangle\}$$
$$= \|x - y\|^2 + \mu^2\|P_C(x) - x\|^2 - 2\mu\|P_C(x) - x\|^2 + 2\mu\,\mathrm{Real}\{\langle P_C(x) - y, P_C(x) - x\rangle\}$$
$$= \|x - y\|^2 + \mu^2\|P_C(x) - x\|^2 - 2\mu\|P_C(x) - x\|^2 + 2\mu\,\mathrm{Real}\{\langle x - P_C(x), y - P_C(x)\rangle\}$$
$$\le \|x - y\|^2 - \mu(2 - \mu)\|P_C(x) - x\|^2,$$
where (8.15) has been used. Thus,
$$\|T_C(x) - y\|^2 \le \|x - y\|^2 - \frac{\mu(2 - \mu)}{\mu^2}\|T_C(x) - x\|^2 = \|x - y\|^2 - \frac{2 - \mu}{\mu}\|T_C(x) - x\|^2,$$
where we used that $P_C(x) - x = \frac{1}{\mu}\big(T_C(x) - x\big)$.
8.21. Give an example of a sequence in a Hilbert space $H$ which converges weakly but not strongly.

Solution: Define the sequence of points $x_n \in \ell_2$,
$$x_n = \{0, 0, \ldots, 1, 0, 0, \ldots\} := \{\delta_{ni}\}, \quad i = 0, 1, 2, \ldots$$
That is, each point $x_n$ is itself a sequence, with zeros everywhere except at time index $n$, where it is $1$. For every point (sequence) $y \in \ell_2$, we have that
$$\sum_{n=1}^\infty |y_n|^2 = \sum_{n=1}^\infty |\langle x_n, y\rangle|^2 < \infty,$$
by the definition of the $\ell_2$ space (Appendix 8.15). The previous inequality implies that
$$\langle x_n, y\rangle \xrightarrow{n\to\infty} 0 = \langle 0, y\rangle,$$
that is, $x_n$ converges weakly to $0$. On the other hand, the convergence is not strong, since
$$\|x_n - 0\| = \|x_n\| = 1 \not\to 0.$$

8.22. Prove that if $C_1, \ldots, C_K$ are closed convex sets in a Hilbert space $H$, then the operator
$$T = T_{C_K}T_{C_{K-1}}\cdots T_{C_1}$$
is a regular one; that is,
$$\|T^{n-1}(x) - T^n(x)\| \to 0, \quad n \to \infty,$$
where $T^n := TT\cdots T$ is the application of $T$ $n$ successive times.

Solution:
Fact 1:
$$T = T_{C_K}T_{C_{K-1}}\cdots T_{C_1} := T_K\cdots T_1$$
is a non-expansive mapping. Indeed, $\forall x, y \in H$,
$$\|T(x) - T(y)\| = \|T_K(T_{K-1}\cdots T_1)(x) - T_K(T_{K-1}\cdots T_1)(y)\| \le \|T_{K-1}\cdots T_1(x) - T_{K-1}\cdots T_1(y)\| \le \cdots \le \|T_1(x) - T_1(y)\| \le \|x - y\|.$$

Fact 2:
$$\operatorname{Fix}(T) = \bigcap_{k=1}^K C_k := C.$$
Indeed, if $x \in C$ then
$$T_KT_{K-1}\cdots T_1(x) = T_KT_{K-1}\cdots T_2(x) = \cdots = T_K(x) = x.$$
Moreover, let us assume that $\exists x \notin C$: $T(x) = x$. Then $\forall y \in C$ we have
$$\|x - y\| = \|T(x) - T(y)\| \le \|T_1(x) - T_1(y)\| = \|T_1(x) - y\| \le \|x - y\|,$$
as shown before. Thus,
$$\|T_1(x) - y\| = \|x - y\|, \quad \forall y \in C,$$
which, in view of the strong attraction property (Problem 8.20), can only be true if $T_1(x) = x$ and hence $x \in C_1$. Proceeding similarly with $T_2, \ldots, T_K$, we get $x \in C$, which contradicts the assumption.

Fact 3: If the $C_k$ are closed subspaces, then $T_k$, $k = 1, \ldots, K$, and $T = T_KT_{K-1}\cdots T_1$ are linear operators. The proof is trivial from the respective linearity of the projection operators, $P_k$, $k = 1, \ldots, K$. This is also true for general Hilbert spaces.

Fact 4: The operator $T$ is a regular one, i.e.,
$$\|T^n(x) - T^{n-1}(x)\| \xrightarrow{n\to\infty} 0.$$
Recall from Problem 8.20 that $\forall x \in H$ and $\forall y \in C$,
$$\|T_1(x) - y\|^2 \le \|x - y\|^2 - \mu_1(2 - \mu_1)\|x - P_1(x)\|^2,$$
or
$$\|x - P_1(x)\|^2 \le \frac{1}{\mu_1(2 - \mu_1)}\big(\|x - y\|^2 - \|T_1(x) - y\|^2\big),$$
and by the definition
$$T_1(x) = x + \mu_1(P_1(x) - x),$$
we get
$$\|x - T_1(x)\|^2 = \mu_1^2\|x - P_1(x)\|^2 \le \frac{\mu_1}{2 - \mu_1}\big(\|x - y\|^2 - \|T_1(x) - y\|^2\big).$$
Now,
$$\|x - T_2T_1(x)\|^2 = \|x - T_1(x) + T_1(x) - T_2T_1(x)\|^2 \le \big(\|x - T_1(x)\| + \|T_2T_1(x) - T_1(x)\|\big)^2 \le 2\big(\|x - T_1(x)\|^2 + \|T_2T_1(x) - T_1(x)\|^2\big),$$
or
$$\|x - T_2T_1(x)\|^2 \le \frac{2\mu_1}{2 - \mu_1}\big(\|x - y\|^2 - \|T_1(x) - y\|^2\big) + \frac{2\mu_2}{2 - \mu_2}\big(\|T_1(x) - y\|^2 - \|T_2T_1(x) - y\|^2\big).$$
Following a similar rationale and by induction we can show that
$$\|x - T(x)\|^2 \le b_K 2^{K-1}\big(\|x - y\|^2 - \|T(x) - y\|^2\big), \quad (17)$$
where
$$T = T_KT_{K-1}\cdots T_1, \quad \text{and} \quad b_K = \max_{1\le k\le K}\frac{\mu_k}{2 - \mu_k}.$$
Now, applying (17) along the iterates,
$$\|T(x) - T^2(x)\|^2 \le b_K 2^{K-1}\big(\|T(x) - y\|^2 - \|T^2(x) - y\|^2\big), \quad (18)$$
$$\vdots$$
$$\|T^{n-1}(x) - T^n(x)\|^2 \le b_K 2^{K-1}\big(\|T^{n-1}(x) - y\|^2 - \|T^n(x) - y\|^2\big). \quad (19)$$
Summing by parts (17)-(19), the right-hand side telescopes and we obtain
$$\sum_{n=1}^\infty \|T^{n-1}(x) - T^n(x)\|^2 \le b_K 2^{K-1}\|x - y\|^2 < +\infty.$$
Hence,
$$\lim_{n\to\infty}\|T^{n-1}(x) - T^n(x)\| = 0.$$
Note that till now, everything is valid for general Hilbert spaces.
8.23. Show the fundamental POCS theorem for the case of closed subspaces in a Hilbert space $H$.

Solution: Fact 1: The relaxed projection operator is self-adjoint, i.e.,
$$\langle x, T_{C_i}(y)\rangle = \langle T_{C_i}(x), y\rangle, \quad \forall x, y \in H.$$
This is a direct consequence of the self-adjoint property of the projection when $C_k$ is a closed subspace, i.e.,
$$\langle x, P_{C_k}(y)\rangle = \langle P_{C_k}(x), y\rangle = \langle P_{C_k}(x), P_{C_k}(y)\rangle.$$

Fact 2: For a closed subspace $C_k$, the respective relaxed projection operator is linear, i.e.,
$$T_{C_k}(ax + by) = aT_{C_k}(x) + bT_{C_k}(y), \quad \forall x, y \in H.$$
This is also a direct consequence of the linearity of the projection operator onto subspaces. This property is easily checked to carry over to $T := T_{C_K}\cdots T_{C_1}$.

Fact 3: The adjoint of $T$ is
$$T^* = T_{C_1}T_{C_2}\cdots T_{C_K},$$
since
$$\langle x, T(y)\rangle = \langle x, T_{C_K}\cdots T_{C_1}(y)\rangle = \langle T_{C_K}(x), T_{C_{K-1}}\cdots T_{C_1}(y)\rangle = \cdots = \langle T_{C_1}\cdots T_{C_K}(x), y\rangle = \langle T^*(x), y\rangle.$$

Fact 4: Let the operator $T = T_{C_K}\cdots T_{C_1}$, with $\operatorname{Fix}(T) = C$, where $C$ is a closed subspace. Then the set
$$S := \{y : y = (I - T)(z),\ z \in H\}$$
is also a (closed) subspace, and it is the orthogonal complement of $C$, i.e., $S = C^\perp$.

The proof that $S$ is a subspace is trivial, from the linearity of $T$. Also, let $x \in S^\perp$. Then, by the respective definition,
$$0 = \langle x, (I - T)(z)\rangle = \langle x, z\rangle - \langle x, T(z)\rangle = \langle x, z\rangle - \langle T^*(x), z\rangle = \langle(I - T^*)(x), z\rangle, \quad \forall z \in H.$$
Hence,
$$(I - T^*)(x) = x - T^*(x) = 0, \quad \text{or} \quad T^*(x) = x,$$
and since $T^*$ and $T$ have the same fixed point set (the proof is trivial), $S^\perp \subseteq C$.

Let now $x \in C$. Then
$$\langle x, (I - T)(z)\rangle = \langle x, z\rangle - \langle x, T(z)\rangle = \langle x, z\rangle - \langle T^*(x), z\rangle = \langle x - T^*(x), z\rangle = 0,$$
since $T^*(x) = x$; hence $C \subseteq S^\perp$, which, combined with the previous result, proves that $S^\perp = C$, i.e., $S = C^\perp$.

Note that what we have said so far is a generalization of Problem 8.18.

We are now ready to establish strong convergence. The repeated application of $T$ on any $x \in H$ leads to $T^n(x) = (TT\cdots T)(x)$. We know that $\forall x \in H$ there is a unique decomposition into two orthogonal complement (closed) subspaces, i.e.,
$$x = y + z, \quad y \in C,\ z \in C^\perp, \quad \text{with} \quad y = P_C(x).$$
Hence, due to the linearity of $T^n$ ($C$ a subspace in $H$),
$$T^n(x) = T^n(y) + T^n(z) = y + T^n(z),$$
since $C = \operatorname{Fix}(T^n)$. However,
$$T^n(z) = T^n(I - T)(w), \quad \text{for some } w \in H,$$
$$= T^n(w) - T^{n+1}(w),$$
and we know from Problem 8.22 that
$$\|T^n(z)\| = \|T^n(w) - T^{n+1}(w)\| \to 0.$$
Thus,
$$\|T^n(x) - P_C(x)\| \to 0,$$
which proves the claim.
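The POCS behavior can be illustrated with unrelaxed alternating projections ($\mu = 1$) onto two concrete subspaces of $\mathbb{R}^2$; the two lines chosen below are arbitrary assumptions for the sketch.

```python
# Alternating projections onto two one-dimensional subspaces of R^2
# (the x-axis and the line y = x). Their intersection is {0}, so the
# iterates T^n(x) must converge to P_C(x) = 0.
import math

def proj_xaxis(v):
    return [v[0], 0.0]

def proj_diag(v):
    t = (v[0] + v[1]) / 2.0
    return [t, t]

x = [5.0, -3.0]
for _ in range(100):
    x = proj_diag(proj_xaxis(x))

assert math.hypot(*x) < 1e-12  # converged to the intersection point 0
```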
8.24. Derive the subdifferential of the metric distance function $d_C(x)$, where $C$ is a closed convex set, $C \subseteq \mathbb{R}^l$, and $x \in \mathbb{R}^l$.

Solution: By definition we have
$$\partial d_C(x) = \{g : g^T(y - x) + d_C(x) \le d_C(y),\ \forall y \in \mathbb{R}^l\}.$$
Thus let $g$ be a subgradient; then
$$g^T(y - x) \le d_C(y) - d_C(x) \le \|y - x\|.$$
The rightmost inequality is easily shown by the respective definition:
$$d_C(y) = \min_{z\in C}\|y - z\| \le \min_{z\in C}\big(\|y - x\| + \|x - z\|\big) = \|y - x\| + \min_{z\in C}\|x - z\|,$$
or
$$d_C(y) - d_C(x) \le \|y - x\|.$$
Hence,
$$g^T(y - x) \le \|y - x\|.$$
Since this is true $\forall y$, let $y : y - x = g$, which yields
$$\|g\|^2 \le \|g\| \;\Rightarrow\; \|g\| \le 1, \quad \text{or} \quad g \in B[0, 1].$$

a) Let $x \notin C$ and $g$ be any subgradient. For any $y \in \mathbb{R}^l$,
$$g^T\big(y + P_C(x) - x\big) \le d_C\big(y + P_C(x)\big) - d_C(x).$$
However,
$$d_C\big(y + P_C(x)\big) = \min_{z\in C}\|y + P_C(x) - z\| \le \|y\| + \min_{z\in C}\|P_C(x) - z\|,$$
and letting $z = P_C(x)$,
$$d_C\big(y + P_C(x)\big) \le \|y\|.$$
Hence, we can write that
$$g^T\big(y + P_C(x) - x\big) \le \|y\| - d_C(x), \quad \forall y \in \mathbb{R}^l.$$
Set $y = 0$. Then,
$$g^T\big(P_C(x) - x\big) \le -\|x - P_C(x)\|, \quad \text{or} \quad g^T\big(x - P_C(x)\big) \ge \|x - P_C(x)\|.$$
However, $\|g\| \le 1$, and recalling the Cauchy-Schwarz inequality, we obtain
$$\|x - P_C(x)\| \ge \|x - P_C(x)\|\,\|g\| \ge g^T\big(x - P_C(x)\big) \ge \|x - P_C(x)\| \;\Rightarrow\; \|g\| = 1, \ \text{and} \ g^T\big(x - P_C(x)\big) = \|x - P_C(x)\|,$$
which implies (recall the condition for equality in the Cauchy-Schwarz theorem)
$$g = \frac{x - P_C(x)}{\|x - P_C(x)\|},$$
which proves the claim.

b) Let $x \in C$. Then by definition we have
$$g^T(y - x) \le d_C(y) - d_C(x) = d_C(y),$$
and for any $y \in C$,
$$g^T(y - x) \le 0, \quad \|g\| \le 1. \quad (20)$$
If in addition $x$ is an interior point, there will be $\varepsilon > 0$ such that $\forall z \in \mathbb{R}^l$,
$$g^T\big(x + \varepsilon(z - x) - x\big) \le 0,$$
since $x + \varepsilon(z - x) \in C$ and the condition (20) has been used. Thus,
$$g^T(z - x) \le 0, \quad \forall z \in \mathbb{R}^l.$$
Set $z - x = g$, which leads to $g = 0$. This completes the proof.

8.25. Derive the bound in (8.55).

Solution: Subtracting $\theta_*$ from both sides of the recursion, squaring, and taking into account the definition of the subgradient, it is readily shown that
$$\|\theta^{(i)} - \theta_*\|^2 \le \|\theta^{(i-1)} - \theta_*\|^2 - 2\mu_i\big(J(\theta^{(i-1)}) - J(\theta_*)\big) + \mu_i^2\|J'(\theta^{(i-1)})\|^2.$$
Applying the previous recursively, we obtain
$$\|\theta^{(i)} - \theta_*\|^2 \le \|\theta^{(0)} - \theta_*\|^2 - 2\sum_{k=1}^i\mu_k\big(J(\theta^{(k-1)}) - J(\theta_*)\big) + \sum_{k=1}^i\mu_k^2\|J'(\theta^{(k-1)})\|^2.$$
Taking into account the bound of the subgradient, $\|J'(\theta^{(k-1)})\| \le G$, and the fact that the left-hand side of the inequality is a non-negative number, we obtain
$$2\sum_{k=1}^i\mu_k\big(J(\theta^{(k-1)}) - J(\theta_*)\big) \le \|\theta^{(0)} - \theta_*\|^2 + \sum_{k=1}^i\mu_k^2 G^2. \quad (21)$$
However, by the respective definition we get
$$J(\theta^{(k-1)}) - J(\theta_*) \ge J_*^{(i)} - J(\theta_*), \quad k = 1, \ldots, i.$$
Employing the previous bound in (21), the claim is readily obtained, i.e.,
$$J_*^{(i)} - J(\theta_*) \le \frac{\|\theta^{(0)} - \theta_*\|^2}{2\sum_{k=1}^i\mu_k} + \frac{\sum_{k=1}^i\mu_k^2}{2\sum_{k=1}^i\mu_k}\,G^2.$$
8.26. Show that if a function is $\gamma$-Lipschitz, then any of its subgradients is bounded.

Solution: By the definition of the subgradient we have that $\forall u, v$,
$$f(u) - f(v) \ge \langle f'(v), u - v\rangle.$$
Since this is true for all $u, v$, we can always select $u$ so that $u - v$ is parallel to $f'(v)$. Then
$$\langle f'(v), u - v\rangle = |\langle f'(v), u - v\rangle| = \|f'(v)\|\,\|u - v\|.$$
Plugging in the Lipschitz condition, we get
$$\|f'(v)\|\,\|u - v\| \le f(u) - f(v) \le \gamma\|u - v\|,$$
which shows that $\|f'(v)\| \le \gamma$; that is, $\|f'(v)\|$ is bounded.

8.27. Show the convergence of the generic projected subgradient algorithm in (8.61).

Solution: Let us break the iteration into two steps,
$$z^{(i)} = \theta^{(i-1)} - \mu_i J'(\theta^{(i-1)}), \quad (22)$$
$$\theta^{(i)} = P_C(z^{(i)}). \quad (23)$$
Then, following the same arguments as the ones adopted in Problem 8.25, we get
$$\|z^{(i)} - \theta_*\|^2 \le \|\theta^{(0)} - \theta_*\|^2 - 2\sum_{k=1}^i\mu_k\big(J(\theta^{(k-1)}) - J(\theta_*)\big) + \sum_{k=1}^i\mu_k^2\|J'(\theta^{(k-1)})\|^2. \quad (24)$$
However, from the non-expansive property of the projection operator, and taking into account that $\theta_* \in C$, since it is a solution,
$$\|\theta^{(i)} - \theta_*\|^2 = \|P_C(z^{(i)}) - P_C(\theta_*)\|^2 \le \|z^{(i)} - \theta_*\|^2. \quad (25)$$
Combining the last two formulas, the proof proceeds as in Problem 8.25.
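The two-step iteration (22)-(23) can be sketched on a toy problem; the cost, constraint set, and step size below are arbitrary illustrative assumptions (a smooth cost is used, so its gradient serves as the subgradient).

```python
# Projected (sub)gradient sketch: minimize J(theta) = ||theta - a||^2 over the
# unit ball C = B[0, 1], with a = [2, 0]; the minimizer is theta* = [1, 0].
import math

a = [2.0, 0.0]

def grad(t):  # gradient of J, a particular subgradient
    return [2 * (ti - ai) for ti, ai in zip(t, a)]

def proj(t):  # projection onto the unit ball
    n = math.hypot(*t)
    return [c / n for c in t] if n > 1 else list(t)

theta = [-0.5, 0.8]
mu = 0.25
for _ in range(100):
    z = [ti - mu * gi for ti, gi in zip(theta, grad(theta))]  # step (22)
    theta = proj(z)                                           # step (23)

assert math.dist(theta, [1.0, 0.0]) < 1e-6
```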
8.28. Derive equation (8.99).

Solution: By the definition,
$$J_n(\theta) = \frac{1}{n}\sum_{k=1}^n L(y_k, x_k, \theta).$$
Hence,
$$J_n(\theta) = \frac{1}{n}\Big(\sum_{k=1}^{n-1}L(y_k, x_k, \theta) + L(y_n, x_n, \theta)\Big) = \frac{n-1}{n}J_{n-1}(\theta) + \frac{1}{n}L(y_n, x_n, \theta).$$
Hence
$$\nabla J_n(\theta) = \frac{n-1}{n}\nabla J_{n-1}(\theta) + \frac{1}{n}\nabla L(y_n, x_n, \theta),$$
or, evaluating at $\theta_*(n-1)$, where $\nabla J_{n-1}(\theta_*(n-1)) = 0$,
$$\nabla J_n\big(\theta_*(n-1)\big) = 0 + \frac{1}{n}\nabla L\big(y_n, x_n, \theta_*(n-1)\big).$$
Expanding the left-hand side of $\nabla J_n(\theta_*(n)) = 0$ to a first-order Taylor approximation around $\theta_*(n-1)$ we get
$$\nabla J_n\big(\theta_*(n)\big) = 0 = \nabla J_n\big(\theta_*(n-1)\big) + \nabla^2 J_n\big(\theta_*(n-1)\big)\big(\theta_*(n) - \theta_*(n-1)\big),$$
or
$$\nabla J_n\big(\theta_*(n-1)\big) = -\nabla^2 J_n\big(\theta_*(n-1)\big)\big(\theta_*(n) - \theta_*(n-1)\big),$$
which finally proves the claim.

8.29. Consider the online version of PDMb in (8.64), i.e.,
$$\theta_n = \begin{cases} P_C\Big(\theta_{n-1} - \mu_n\dfrac{J(\theta_{n-1})}{\|J'(\theta_{n-1})\|^2}J'(\theta_{n-1})\Big), & \text{if } J'(\theta_{n-1}) \ne 0, \\[6pt] P_C(\theta_{n-1}), & \text{if } J'(\theta_{n-1}) = 0, \end{cases} \quad (26)$$
where we have assumed that $J_* = 0$. If this is not the case, a shift can accommodate for the difference. Thus we assume that we know the minimum. For example, this is the case for a number of tasks, such as the hinge loss function, assuming linearly separable classes, or the linear $\epsilon$-insensitive loss function, for bounded noise. Assume that
$$L_n(\theta) = \sum_{k=n-q+1}^n \frac{\omega_k\,d_{C_k}(\theta_{n-1})}{\sum_{k=n-q+1}^n\omega_k d_{C_k}(\theta_{n-1})}\,d_{C_k}(\theta).$$
Then derive the APSM algorithm of (8.39).

Solution: Let the loss function be written as
$$L_n(\theta) = \sum_{k=n-q+1}^n\beta_k\,d_{C_k}(\theta), \quad \text{where} \quad \beta_k = \frac{\omega_k d_{C_k}(\theta_{n-1})}{\sum_{k=n-q+1}^n\omega_k d_{C_k}(\theta_{n-1})}, \quad \sum_{k=n-q+1}^n\beta_k = 1.$$
Then, for the recursion (26), we need to compute the subgradient of $L_n(\theta)$, which by Example 8.5 (and for $\theta \notin C_k$) becomes
$$L'_n(\theta_{n-1}) = \sum_{k=n-q+1}^n\beta_k\,d'_{C_k}(\theta)\Big|_{\theta=\theta_{n-1}} = \sum_{k=n-q+1}^n\beta_k\,\frac{\theta_{n-1} - P_{C_k}(\theta_{n-1})}{d_{C_k}(\theta_{n-1})},$$
or
$$L'_n(\theta_{n-1}) = \frac{1}{L}\sum_{k=n-q+1}^n\omega_k\big(\theta_{n-1} - P_{C_k}(\theta_{n-1})\big), \quad \text{with} \quad L = \sum_{k=n-q+1}^n\omega_k d_{C_k}(\theta_{n-1}).$$
Hence, (26) now becomes (using $\mu'_n$ instead, for reasons to become apparent soon)
$$\theta_n = \theta_{n-1} - \mu'_n\,\frac{\frac{1}{L}\sum_{k=n-q+1}^n\omega_k d^2_{C_k}(\theta_{n-1})}{\frac{1}{L^2}\big\|\sum_{k=n-q+1}^n\omega_k\big(\theta_{n-1} - P_{C_k}(\theta_{n-1})\big)\big\|^2}\cdot\frac{1}{L}\sum_{k=n-q+1}^n\omega_k\big(\theta_{n-1} - P_{C_k}(\theta_{n-1})\big)$$
$$= \theta_{n-1} + \mu'_n M\sum_{k=n-q+1}^n\omega_k\big(P_{C_k}(\theta_{n-1}) - \theta_{n-1}\big),$$
where
$$M := \frac{\sum_{k=n-q+1}^n\omega_k d^2_{C_k}(\theta_{n-1})}{\big\|\sum_{k=n-q+1}^n\omega_k\big(P_{C_k}(\theta_{n-1}) - \theta_{n-1}\big)\big\|^2}.$$
Setting
$$\mu_n = \mu'_n M \in (0, 2M),$$
the APSM algorithm results.
8.30. Derive the regret bound for the subgradient algorithm in (8.82).

Solution: From the text, we have that
$$\mathcal{L}_n(\theta_{n-1}) - \mathcal{L}_n(h) \le g_n^T(\theta_{n-1} - h) \le \frac{1}{2\mu_n}\big(\|\theta_{n-1} - h\|^2 - \|\theta_n - h\|^2\big) + \frac{\mu_n}{2}G^2. \quad (27)$$
Summing up both sides results in
$$\sum_{n=1}^N\mathcal{L}_n(\theta_{n-1}) - \sum_{n=1}^N\mathcal{L}_n(h) \le \sum_{n=1}^N\frac{1}{2\mu_n}\big(\|\theta_{n-1} - h\|^2 - \|\theta_n - h\|^2\big) + \frac{G^2}{2}\sum_{n=1}^N\mu_n. \quad (28)$$
Carrying out the summations, the right-hand side becomes
$$A := \frac{1}{2\mu_1}\|\theta_0 - h\|^2 - \frac{1}{2\mu_1}\|\theta_1 - h\|^2 + \frac{1}{2\mu_2}\|\theta_1 - h\|^2 - \frac{1}{2\mu_2}\|\theta_2 - h\|^2 + \cdots + \frac{1}{2\mu_N}\|\theta_{N-1} - h\|^2 - \frac{1}{2\mu_N}\|\theta_N - h\|^2 + \frac{G^2}{2}\sum_{n=1}^N\mu_n,$$
or, dropping the last negative term,
$$A \le \frac{1}{2\mu_1}\|\theta_0 - h\|^2 + \sum_{n=2}^N\Big(\frac{1}{2\mu_n} - \frac{1}{2\mu_{n-1}}\Big)\|\theta_{n-1} - h\|^2 + \frac{G^2}{2}\sum_{n=1}^N\mu_n.$$
Taking into account the bound $\|\theta_n - h\|^2 \le F^2$, and selecting the step-size to be a decreasing sequence (so that $\frac{1}{\mu_n} - \frac{1}{\mu_{n-1}} \ge 0$), we readily get
$$A \le F^2\Big(\frac{1}{2\mu_1} + \frac{1}{2}\sum_{n=2}^N\Big(\frac{1}{\mu_n} - \frac{1}{\mu_{n-1}}\Big)\Big) + \frac{G^2}{2}\sum_{n=1}^N\mu_n, \quad (29)$$
which then easily leads to
$$A \le \frac{1}{2\mu_N}F^2 + \frac{G^2}{2}\sum_{n=1}^N\mu_n. \quad (30)$$
Combining the above with (28), the claim is proved.
8.31. Show that a function $f(x)$ is $\sigma$-strongly convex if and only if the function $f(x) - \frac{\sigma}{2}\|x\|^2$ is convex.

Solution: a) Assume that
$$f(x) - \frac{\sigma}{2}\|x\|^2$$
is convex. Then, by the definition of the subgradient at $x$, we have
$$f(y) - \frac{\sigma}{2}\|y\|^2 - f(x) + \frac{\sigma}{2}\|x\|^2 \ge g^T(y - x) - \sigma x^T(y - x), \quad (31)$$
which readily implies that
$$f(y) - f(x) \ge g^T(y - x) + \frac{\sigma}{2}\|y\|^2 + \frac{\sigma}{2}\|x\|^2 - \sigma x^Ty = g^T(y - x) + \frac{\sigma}{2}\|y - x\|^2, \quad (32)$$
from which the strong convexity of $f(x)$ is deduced.

b) Assume that $f(x)$ is strongly convex. Then by its definition we have
$$f(y) - f(x) \ge g^T(y - x) + \frac{\sigma}{2}\|y - x\|^2 = g^T(y - x) + \frac{\sigma}{2}\|y\|^2 + \frac{\sigma}{2}\|x\|^2 - \sigma x^Ty, \quad (33)$$
from which we obtain
$$f(y) - \frac{\sigma}{2}\|y\|^2 - f(x) + \frac{\sigma}{2}\|x\|^2 \ge g^T(y - x) - \sigma x^T(y - x), \quad (34)$$
which proves the claim that $f(x) - \frac{\sigma}{2}\|x\|^2$ is convex.

8.32. Show that if the loss function is $\sigma$-strongly convex, then if $\mu_n = \frac{1}{\sigma n}$, the regret bound for the subgradient algorithm becomes
$$\frac{1}{N}\sum_{n=1}^N\mathcal{L}_n(\theta_{n-1}) \le \frac{1}{N}\sum_{n=1}^N\mathcal{L}_n(\theta_*) + \frac{G^2(1 + \ln N)}{2\sigma N}. \quad (35)$$

Solution: Taking into account the strong convexity we have that
$$\mathcal{L}_n(\theta_{n-1}) - \mathcal{L}_n(\theta_*) \le g_n^T(\theta_{n-1} - \theta_*) - \frac{\sigma}{2}\|\theta_{n-1} - \theta_*\|^2, \quad (36)$$
and following similar arguments as for Problem 8.30, we get
$$\mathcal{L}_n(\theta_{n-1}) - \mathcal{L}_n(\theta_*) \le \frac{1}{2\mu_n}\big(\|\theta_{n-1} - \theta_*\|^2 - \|\theta_n - \theta_*\|^2\big) - \frac{\sigma}{2}\|\theta_{n-1} - \theta_*\|^2 + \frac{\mu_n}{2}G^2. \quad (37)$$
Using $\mu_n = \frac{1}{\sigma n}$ results in
$$2\big(\mathcal{L}_n(\theta_{n-1}) - \mathcal{L}_n(\theta_*)\big) \le \sigma n\big(\|\theta_{n-1} - \theta_*\|^2 - \|\theta_n - \theta_*\|^2\big) - \sigma\|\theta_{n-1} - \theta_*\|^2 + \frac{1}{\sigma n}G^2. \quad (38)$$
Summing up both sides we obtain
$$2\sum_{n=1}^N\big(\mathcal{L}_n(\theta_{n-1}) - \mathcal{L}_n(\theta_*)\big) \le \sigma\big(\|\theta_0 - \theta_*\|^2 - \|\theta_1 - \theta_*\|^2\big) - \sigma\|\theta_0 - \theta_*\|^2$$
$$+\ 2\sigma\big(\|\theta_1 - \theta_*\|^2 - \|\theta_2 - \theta_*\|^2\big) - \sigma\|\theta_1 - \theta_*\|^2 + \cdots + N\sigma\big(\|\theta_{N-1} - \theta_*\|^2 - \|\theta_N - \theta_*\|^2\big) - \sigma\|\theta_{N-1} - \theta_*\|^2 + G^2\sum_{n=1}^N\frac{1}{\sigma n}$$
$$\le G^2\sum_{n=1}^N\frac{1}{\sigma n},$$
since the terms on the right-hand side telescope and cancel. Using now the bound
$$\sum_{n=1}^N\frac{1}{n} \le 1 + \int_1^N\frac{1}{t}\,dt = 1 + \ln N,$$
the claim is proved.
8.33. Consider a batch algorithm that computes the minimum of the empirical loss function, $\theta_*(N)$, having a quadratic convergence rate, i.e.,
$$\ln\ln\frac{1}{\|\theta^{(i)} - \theta_*(N)\|^2} \sim i.$$
Show that an online algorithm, running for $n$ time instants so as to spend the same computational processing resources as the batch one, achieves for large values of $N$ better performance than the batch algorithm ([Bott03]), i.e.,
$$\|\theta_n - \theta_*\|^2 \sim \frac{1}{N\ln\ln N} \ll \frac{1}{N} \sim \|\theta_*(N) - \theta_*\|^2.$$
Hint: Use the fact that
$$\|\theta_n - \theta_*\|^2 \sim \frac{1}{n}, \quad \text{and} \quad \|\theta_*(N) - \theta_*\|^2 \sim \frac{1}{N}.$$

Solution: Let $K$ be the number of operations per iteration for the online algorithm. This amounts to a total of $Kn$ operations. The batch algorithm, in order to make sense, should perform $O(\ln\ln N)$ iterations, so as to get close to $\|\theta^{(i)} - \theta_*(N)\|^2 \sim 1/N$. Assuming that at each iteration it performs, approximately, $K_1N$ operations, this amounts to a total of $K_1N\ln\ln N$ operations. To keep the same load for both algorithms, it should be
$$Kn = K_1N\ln\ln N,$$
i.e., $n \sim N\ln\ln N$. This leads to the following approximate accuracies:
$$\|\theta_n - \theta_*\|^2 \sim \frac{1}{N\ln\ln N} \ll \frac{1}{N} \sim \|\theta_*(N) - \theta_*\|^2,$$
which proves the claim. Note that in practice, the values of $K$ and $K_1$ play an important role as well.

8.34. Show property (8.110) for the proximal operator.

Solution: Assume first that $p = \operatorname{Prox}_{\lambda f}(x)$. By definition,
$$f(p) + \frac{1}{2\lambda}\|x - p\|^2 \le f(v) + \frac{1}{2\lambda}\|x - v\|^2, \quad \forall v \in \mathbb{R}^l.$$
Since the previous inequality holds true for any $v \in \mathbb{R}^l$, it also holds true for $\alpha v + (1-\alpha)p$, where $v$ is any vector in $\mathbb{R}^l$ and $\alpha$ is any real number within $(0, 1)$. Hence,
$$\lambda f(p) + \frac{1}{2}\|x - p\|^2 \le \lambda f\big(\alpha v + (1-\alpha)p\big) + \frac{1}{2}\|x - \alpha v - (1-\alpha)p\|^2$$
$$\le \lambda\alpha f(v) + \lambda(1-\alpha)f(p) + \frac{1}{2}\|x - p\|^2 + \frac{1}{2}\alpha^2\|v - p\|^2 - \alpha\langle x - p, v - p\rangle.$$
After re-arranging terms in the previous relation and dividing by $\alpha$,
$$\lambda f(p) \le \lambda f(v) + \frac{\alpha}{2}\|v - p\|^2 - \langle x - p, v - p\rangle, \quad \forall\alpha \in (0, 1).$$
Application of $\lim_{\alpha\to 0}$ on both sides of the previous inequality results in the desired
$$\langle v - p, x - p\rangle \le \lambda\big(f(v) - f(p)\big), \quad \forall v \in \mathbb{R}^l.$$

Conversely, assume that
$$\langle v - p, x - p\rangle/\lambda \le f(v) - f(p), \quad \forall v \in \mathbb{R}^l.$$
Then,
$$f(p) + \frac{1}{2\lambda}\|x - p\|^2 \le f(v) + \frac{1}{2\lambda}\|x - p\|^2 - \frac{1}{\lambda}\langle v - p, x - p\rangle$$
$$= f(v) + \frac{1}{2\lambda}\|(x - v) + (v - p)\|^2 - \frac{1}{\lambda}\langle v - p, x - p\rangle$$
$$= f(v) + \frac{1}{2\lambda}\|x - v\|^2 + \frac{1}{2\lambda}\|v - p\|^2 + \frac{1}{\lambda}\langle v - p, x - v\rangle - \frac{1}{\lambda}\langle v - p, x - p\rangle$$
$$= f(v) + \frac{1}{2\lambda}\|x - v\|^2 + \frac{1}{2\lambda}\|v - p\|^2 - \frac{1}{\lambda}\|v - p\|^2$$
$$= f(v) + \frac{1}{2\lambda}\|x - v\|^2 - \frac{1}{2\lambda}\|v - p\|^2 \le f(v) + \frac{1}{2\lambda}\|x - v\|^2, \quad \forall v \in \mathbb{R}^l.$$
The previous inequality clearly suggests that $p = \operatorname{Prox}_{\lambda f}(x)$.

8.35. Show property (8.111) for the proximal operator.

Solution: For compact notation, define $p_j := \operatorname{Prox}_{\lambda f}(x_j)$, $j = 1, 2$. Then, by property (8.110),
$$\langle p_2 - p_1, x_1 - p_1\rangle \le \lambda\big(f(p_2) - f(p_1)\big), \quad \langle p_1 - p_2, x_2 - p_2\rangle \le \lambda\big(f(p_1) - f(p_2)\big).$$
Adding the previous inequalities results in
$$\langle p_1 - p_2, (p_1 - p_2) - (x_1 - x_2)\rangle \le 0,$$
which in turn leads to the desired
$$\|p_1 - p_2\|^2 \le \langle p_1 - p_2, x_1 - x_2\rangle.$$
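Property (8.111) can be checked numerically for a concrete proximal operator. For $f$ the $\ell_1$ norm, $\operatorname{Prox}_{\lambda f}$ is the component-wise soft-thresholding operator; the sampling scheme and the value of $\lambda$ below are arbitrary illustrative choices.

```python
# Firm non-expansiveness (8.111) of Prox_{lambda f}, f = l1 norm:
# [Prox(x)]_i = sign(x_i) * max(|x_i| - lambda, 0).
import random

def prox_l1(x, lam):
    return [(abs(c) - lam) * (1 if c > 0 else -1) if abs(c) > lam else 0.0
            for c in x]

random.seed(3)
lam = 0.7
for _ in range(200):
    x1 = [random.uniform(-3, 3) for _ in range(3)]
    x2 = [random.uniform(-3, 3) for _ in range(3)]
    p1, p2 = prox_l1(x1, lam), prox_l1(x2, lam)
    lhs = sum((a - b) ** 2 for a, b in zip(p1, p2))
    rhs = sum((a - b) * (c - d) for a, b, c, d in zip(p1, p2, x1, x2))
    assert lhs <= rhs + 1e-12    # ||p1 - p2||^2 <= <p1 - p2, x1 - x2>
```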
8.36. Prove that the reflected proximal operator is non-expansive and then that the recursion in (8.117) converges to a minimizer of $f$.

Solution: Define the mapping $R := 2\operatorname{Prox}_{\lambda f} - I$. Then (8.117) takes the following form:
$$x_{k+1} = x_k + \frac{\mu_k}{2}\big(R(x_k) - x_k\big) = \Big(1 - \frac{\mu_k}{2}\Big)x_k + \frac{\mu_k}{2}R(x_k).$$
Notice that $R$ is non-expansive: $\forall x_1, x_2 \in \mathbb{R}^l$,
$$\|R(x_1) - R(x_2)\|^2 = \big\|2\big(\operatorname{Prox}_{\lambda f}(x_1) - \operatorname{Prox}_{\lambda f}(x_2)\big) - (x_1 - x_2)\big\|^2$$
$$= 4\|\operatorname{Prox}_{\lambda f}(x_1) - \operatorname{Prox}_{\lambda f}(x_2)\|^2 + \|x_1 - x_2\|^2 - 4\big\langle\operatorname{Prox}_{\lambda f}(x_1) - \operatorname{Prox}_{\lambda f}(x_2), x_1 - x_2\big\rangle$$
$$\le 4\|\operatorname{Prox}_{\lambda f}(x_1) - \operatorname{Prox}_{\lambda f}(x_2)\|^2 + \|x_1 - x_2\|^2 - 4\|\operatorname{Prox}_{\lambda f}(x_1) - \operatorname{Prox}_{\lambda f}(x_2)\|^2 = \|x_1 - x_2\|^2,$$
where property (8.111) (Problem 8.35) has been used.

In turn, let $z$ be a fixed point of $\operatorname{Prox}_{\lambda f}$, equivalently of $R$. Then, using the convexity identity $\|(1-t)a + tb\|^2 = (1-t)\|a\|^2 + t\|b\|^2 - t(1-t)\|a - b\|^2$ with $t = \mu_k/2$,
$$\|x_{k+1} - z\|^2 = \Big\|\Big(1 - \frac{\mu_k}{2}\Big)(x_k - z) + \frac{\mu_k}{2}\big(R(x_k) - z\big)\Big\|^2$$
$$= \Big(1 - \frac{\mu_k}{2}\Big)\|x_k - z\|^2 + \frac{\mu_k}{2}\|R(x_k) - z\|^2 - \frac{\mu_k}{2}\Big(1 - \frac{\mu_k}{2}\Big)\|R(x_k) - x_k\|^2$$
$$\le \|x_k - z\|^2 - \frac{\mu_k}{2}\Big(1 - \frac{\mu_k}{2}\Big)\cdot 4\,\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\|^2 = \|x_k - z\|^2 - \mu_k(2 - \mu_k)\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\|^2,$$
where the non-expansiveness of $R$ (so that $\|R(x_k) - z\| = \|R(x_k) - R(z)\| \le \|x_k - z\|$) and the relation $R(x_k) - x_k = 2\big(\operatorname{Prox}_{\lambda f}(x_k) - x_k\big)$ have been used. Hence, $\forall k$,
$$\mu_k(2 - \mu_k)\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\|^2 \le \|x_k - z\|^2 - \|x_{k+1} - z\|^2.$$
Given any non-negative integer $k_0$, the previous telescoping inequality is utilized for all $k \in \{0, \ldots, k_0\}$ to produce
$$\sum_{k=0}^{k_0}\mu_k(2 - \mu_k)\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\|^2 \le \|x_0 - z\|^2 - \|x_{k_0+1} - z\|^2 \le \|x_0 - z\|^2.$$
Since the previous relation holds for any $k_0$, applying $\lim_{k_0\to\infty}$ on both sides of the inequality results in
$$\sum_{k=0}^{+\infty}\mu_k(2 - \mu_k)\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\|^2 < +\infty. \quad (39)$$
Moreover, notice that
$$\|\operatorname{Prox}_{\lambda f}(x_{k+1}) - x_{k+1}\| = \frac{1}{2}\|R(x_{k+1}) - x_{k+1}\| = \frac{1}{2}\Big\|R(x_{k+1}) - R(x_k) + \Big(1 - \frac{\mu_k}{2}\Big)\big(R(x_k) - x_k\big)\Big\|$$
$$\le \frac{1}{2}\|R(x_{k+1}) - R(x_k)\| + \frac{1}{2}\Big(1 - \frac{\mu_k}{2}\Big)\|R(x_k) - x_k\| \le \frac{1}{2}\|x_{k+1} - x_k\| + \Big(1 - \frac{\mu_k}{2}\Big)\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\|$$
$$= \frac{\mu_k}{2}\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\| + \Big(1 - \frac{\mu_k}{2}\Big)\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\| = \|\operatorname{Prox}_{\lambda f}(x_k) - x_k\|.$$
Since $\big(\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\|\big)_{k\in\mathbb{N}}$ is monotonically non-increasing and bounded from below, it converges. Necessarily, $\lim_{k\to\infty}\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\|^2 = 0$. Otherwise, there would exist an $\epsilon > 0$ and a subsequence $(k_m)_{m\in\mathbb{N}}$ such that
$$\|\operatorname{Prox}_{\lambda f}(x_{k_m}) - x_{k_m}\|^2 \ge \epsilon, \quad \forall m \in \mathbb{N}.$$
This, together with the fact that $\lim_{m\to\infty}\sum_{i=0}^{k_m}\mu_i(2 - \mu_i) = +\infty$, and (39), imply that
$$+\infty > \sum_{k=0}^{+\infty}\mu_k(2 - \mu_k)\|\operatorname{Prox}_{\lambda f}(x_k) - x_k\|^2 \ge \sum_{m=0}^{+\infty}\mu_{k_m}(2 - \mu_{k_m})\|\operatorname{Prox}_{\lambda f}(x_{k_m}) - x_{k_m}\|^2 \ge \epsilon\sum_{m=0}^{+\infty}\mu_{k_m}(2 - \mu_{k_m}) = +\infty,$$
which is clearly absurd.

Let $x_*$ be an arbitrary cluster point, say $x_{k_m} \to x_*$. Expanding
$$\|x_* - \operatorname{Prox}_{\lambda f}(x_*)\|^2 = \big\|x_* - x_{k_m} + x_{k_m} - \operatorname{Prox}_{\lambda f}(x_{k_m}) + \operatorname{Prox}_{\lambda f}(x_{k_m}) - \operatorname{Prox}_{\lambda f}(x_*)\big\|^2,$$
and using the Cauchy-Schwarz inequality together with the non-expansiveness of $\operatorname{Prox}_{\lambda f}$, we obtain
$$\|x_* - \operatorname{Prox}_{\lambda f}(x_*)\|^2 \le \|x_{k_m} - \operatorname{Prox}_{\lambda f}(x_{k_m})\|^2 + 2\|x_{k_m} - \operatorname{Prox}_{\lambda f}(x_{k_m})\|\,\|x_{k_m} - x_*\| + 2\|x_* - x_{k_m}\|\,\|x_* - \operatorname{Prox}_{\lambda f}(x_*)\|.$$
Applying $\lim_{m\to\infty}$ on both sides of the previous inequality results in $x_* = \operatorname{Prox}_{\lambda f}(x_*) \Leftrightarrow x_* \in \operatorname{Fix}(\operatorname{Prox}_{\lambda f})$. Since $x_*$ was chosen arbitrarily within the set of all cluster points of $(x_k)_{k\in\mathbb{N}}$, it can be readily seen that all cluster points belong to $\operatorname{Fix}(\operatorname{Prox}_{\lambda f})$.

We have already seen that the sequence $(\|x_k - x\|^2)_{k\in\mathbb{N}}$ converges for any $x \in \operatorname{Fix}(\operatorname{Prox}_{\lambda f})$. Moreover, any cluster point of $(x_k)_{k\in\mathbb{N}}$ belongs to $\operatorname{Fix}(\operatorname{Prox}_{\lambda f})$. Let us show now that $(x_k)_{k\in\mathbb{N}}$ possesses only one cluster point. To this end, assume two cluster points $x, y$ of $(x_k)_{k\in\mathbb{N}}$. This means that there exist subsequences $(x_{k_m})_{m\in\mathbb{N}}$ and $(x_{l_m})_{m\in\mathbb{N}}$ which converge to $x$ and $y$, respectively. Moreover, notice that
$$\langle x_k, x - y\rangle = \frac{1}{2}\big(\|x_k - y\|^2 - \|x_k - x\|^2 + \|x\|^2 - \|y\|^2\big).$$
Since both $(\|x_k - x\|^2)_{k\in\mathbb{N}}$ and $(\|x_k - y\|^2)_{k\in\mathbb{N}}$ converge, so does also the sequence $(\langle x_k, x - y\rangle)_{k\in\mathbb{N}}$. Hence,
$$\langle x, x - y\rangle = \lim_{m\to\infty}\langle x_{k_m}, x - y\rangle = \lim_{k\to\infty}\langle x_k, x - y\rangle = \lim_{m\to\infty}\langle x_{l_m}, x - y\rangle = \langle y, x - y\rangle,$$
and in turn, $\|x - y\|^2 = 0 \Rightarrow x = y$. To conclude, $(x_k)_{k\in\mathbb{N}}$ converges to a point in $\operatorname{Fix}(\operatorname{Prox}_{\lambda f}) = \arg\min_{v\in\mathbb{R}^l}f(v)$.
8.37. Derive the formula in (8.121) from (8.120).

Solution: Use the matrix inversion lemma,
$$(A + BD^{-1}C)^{-1} = A^{-1} - A^{-1}B(D + CA^{-1}B)^{-1}CA^{-1},$$
with $B = C = I$, $D = A^{-1}$, and with $\epsilon I$ playing the role of $A$ in the lemma, which gives
$$(A + \epsilon I)^{-1} = \frac{1}{\epsilon}I - \frac{1}{\epsilon^2}\Big(\frac{1}{\epsilon}I + A^{-1}\Big)^{-1} = \frac{1}{\epsilon}I - \frac{1}{\epsilon^2}\Big(\frac{1}{\epsilon}A + I\Big)^{-1}A = \frac{1}{\epsilon}I - \frac{1}{\epsilon}(A + \epsilon I)^{-1}A,$$
which finally leads to the result.
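The final identity can be verified numerically on a small matrix; the particular $A$ and $\epsilon$ below are arbitrary illustrative choices, and the matrix routines are hand-rolled to keep the sketch self-contained.

```python
# Numeric check of (A + eps*I)^{-1} = (1/eps)(I - (A + eps*I)^{-1} A)
# for a 2x2 invertible matrix.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

A = [[2.0, 1.0], [0.5, 3.0]]
eps = 0.25
ApeI = [[A[i][j] + (eps if i == j else 0.0) for j in range(2)]
        for i in range(2)]

lhs = inv2(ApeI)
PA = matmul(inv2(ApeI), A)
rhs = [[(float(i == j) - PA[i][j]) / eps for j in range(2)] for i in range(2)]

assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-10
           for i in range(2) for j in range(2))
```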