Machine Learning: A Bayesian and Optimization Perspective, 1st Edition, Theodoridis. Solutions Manual.


Solutions to Problems of Chapter 8

8.1. Prove the Cauchy-Schwarz inequality in a general Hilbert space.

Solution: We have to show that $\forall x, y \in H$, $|\langle x, y\rangle| \leq \|x\|\,\|y\|$, and that equality holds only if $x = ay$, $a \in \mathbb{C}$.

The inequality holds trivially if $x$ and/or $y = 0$. Let now $x, y \neq 0$. We have, $\forall \lambda \in \mathbb{C}$,
$$0 \leq \|x - \lambda y\|^2 = \langle x - \lambda y, x - \lambda y\rangle = \|x\|^2 + |\lambda|^2 \|y\|^2 - \lambda^* \langle x, y\rangle - \lambda \langle y, x\rangle.$$
Since the last inequality is valid for any $\lambda \in \mathbb{C}$, let
$$\lambda = \frac{\langle x, y\rangle}{\|y\|^2}.$$
Thus
$$0 \leq \|x\|^2 - \frac{|\langle x, y\rangle|^2}{\|y\|^2},$$
from which a) the inequality results and b) the fact that the equality holds iff $x = ay$.

Indeed, if $x = ay$, then equality is trivially shown. Let us now assume that equality holds true. Then
$$\langle x, x\rangle \langle y, y\rangle = \langle x, y\rangle^* \langle x, y\rangle,$$
and from the properties of the inner product in a Hilbert space we have
$$\Big\|x - \frac{\langle x, y\rangle}{\|y\|^2}\, y\Big\|^2 = \|x\|^2 - \frac{|\langle x, y\rangle|^2}{\|y\|^2} = 0,$$
from which it is readily seen that
$$x = \frac{\langle x, y\rangle}{\|y\|^2}\, y,$$
which proves the claim.

8.2. Show a) that the set of points in a Hilbert space $H$,
$$C_1 = \{x : \|x\| \leq 1\},$$
is a convex set, and b) that the set of points
$$C_2 = \{x : \|x\| = 1\}$$
is a nonconvex one.
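As a quick numerical sanity check of the Cauchy-Schwarz inequality of Problem 8.1 (a sketch in $\mathbb{R}^2$ with the standard inner product; the test vectors are arbitrary choices):

```python
import math

def inner(x, y):
    # Standard Euclidean inner product.
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x))

x, y = [3.0, -1.0], [2.0, 5.0]
lhs = abs(inner(x, y))            # |<x, y>|
rhs = norm(x) * norm(y)           # ||x|| ||y||
assert lhs <= rhs

# Equality holds iff x = a*y: scale y and check.
x2 = [2.0 * v for v in y]
assert math.isclose(abs(inner(x2, y)), norm(x2) * norm(y))
```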

Solution: From the definition of a Hilbert space (see Appendix) the norm is the one induced by the inner product, i.e., $\|x\| = \langle x, x\rangle^{1/2}$.

a) Let us now consider two points, $x_1, x_2 \in H$, such that
$$\|x_1\| \leq 1, \quad \|x_2\| \leq 1,$$
and let
$$x = \lambda x_1 + (1 - \lambda) x_2, \quad \lambda \in [0, 1].$$
Then, by the triangle inequality property of a norm,
$$\|x\| = \|\lambda x_1 + (1 - \lambda) x_2\| \leq \lambda \|x_1\| + (1 - \lambda)\|x_2\|,$$
and since $\lambda \in [0, 1]$,
$$\|x\| \leq (\lambda + 1 - \lambda)\cdot 1 = 1.$$

b) Let two points be such that
$$\|x_1\| = 1, \quad \|x_2\| = 1,$$
and
$$x = \lambda x_1 + (1 - \lambda)x_2, \quad \lambda \in (0, 1).$$
Then we have that
$$\|x\|^2 = \langle \lambda x_1 + (1-\lambda)x_2,\, \lambda x_1 + (1-\lambda)x_2\rangle = \lambda^2\|x_1\|^2 + (1-\lambda)^2\|x_2\|^2 + 2\lambda(1-\lambda)\langle x_1, x_2\rangle = \lambda^2 + (1-\lambda)^2 + 2\lambda(1-\lambda)\langle x_1, x_2\rangle. \quad (1)$$
From the Schwarz inequality (Problem 8.1), we have that
$$|\langle x_1, x_2\rangle| \leq \|x_1\|\,\|x_2\|, \quad (2)$$
or
$$-1 \leq \langle x_1, x_2\rangle \leq 1. \quad (3)$$
From (1) and (3) it is readily seen that
$$\|x\|^2 \leq 1.$$
As a matter of fact, the only way for $\|x\|^2 = 1$ is that $\langle x_1, x_2\rangle = 1 = \|x_1\|\,\|x_2\|$. However, this is not possible. Equality in (2) is attained iff $x_1 = a x_2$, and since $\|x_1\| = \|x_2\| = 1$, this can only happen in the trivial case of $x_1 = x_2$. Hence, for two distinct points of $C_2$ the convex combination leaves the set, which proves the claim.
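A minimal numerical illustration of Problem 8.2 in $\mathbb{R}^2$ (the two unit vectors are arbitrary choices): the midpoint of two points of the unit ball stays in the ball, while the midpoint of two distinct points of the unit sphere falls strictly inside it.

```python
import math

def norm(x):
    return math.sqrt(sum(v * v for v in x))

x1, x2 = [1.0, 0.0], [0.0, 1.0]   # both on the unit sphere
mid = [(a + b) / 2 for a, b in zip(x1, x2)]

assert norm(mid) <= 1.0           # the ball is convex: the midpoint stays inside
assert norm(mid) < 1.0            # the sphere is not: the midpoint leaves it
```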

8.3. Show the first order convexity condition.

Solution: a) Assume that $f$ is convex. Then
$$f\big(\lambda y + (1-\lambda)x\big) \leq \lambda f(y) + (1-\lambda)f(x),$$
or
$$f\big(x + \lambda(y - x)\big) - f(x) \leq \lambda\big(f(y) - f(x)\big).$$
Taking $\lambda \longrightarrow 0$, we can employ the Taylor expansion and get
$$f\big(x + \lambda(y - x)\big) - f(x) \approx \lambda\nabla^T f(x)(y - x) \leq \lambda\big(f(y) - f(x)\big),$$
from which, in the limit, we obtain
$$f(y) \geq f(x) + \nabla^T f(x)(y - x). \quad (4)$$

b) Assume that
$$f(y) \geq f(x) + \nabla^T f(x)(y - x)$$
is valid $\forall x, y \in X$, where $X$ is the domain of definition of $f$. Then we have
$$f(y_1) \geq f(x) + \nabla^T f(x)(y_1 - x), \quad (5)$$
and
$$f(y_2) \geq f(x) + \nabla^T f(x)(y_2 - x). \quad (6)$$
Combining the previous two inequalities together, we obtain
$$\lambda f(y_1) + (1-\lambda)f(y_2) \geq \lambda f(x) + (1-\lambda)f(x) + \lambda\nabla^T f(x)(y_1 - x) + (1-\lambda)\nabla^T f(x)(y_2 - x), \quad (7)$$
for $\lambda \in (0, 1)$. Since this is true for any $x$, it will also be true for
$$x = \lambda y_1 + (1-\lambda)y_2,$$
which results in
$$f\big(\lambda y_1 + (1-\lambda)y_2\big) \leq \lambda f(y_1) + (1-\lambda)f(y_2), \quad (8)$$
which proves the claim.

8.4. Show that a function $f$ is convex iff the one-dimensional function
$$g(t) := f(x + ty)$$
is convex, $\forall x, y$ in the domain of definition of $f$.

Solution: Observe that
$$g(\lambda t_1 + (1-\lambda)t_2) = f(x + \lambda t_1 y + (1-\lambda)t_2 y) = f(\lambda x + (1-\lambda)x + \lambda t_1 y + (1-\lambda)t_2 y) = f\big(\lambda(x + t_1 y) + (1-\lambda)(x + t_2 y)\big).$$
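The first order condition (4) of Problem 8.3 can be spot-checked numerically on a smooth convex function (here $f(x) = x^2$; the sample points are arbitrary):

```python
def f(x):
    return x * x

def grad_f(x):
    return 2.0 * x

# For a convex f: f(y) >= f(x) + f'(x)(y - x) at every pair of sample points.
for x in (-2.0, 0.0, 1.5):
    for y in (-3.0, 0.5, 4.0):
        assert f(y) >= f(x) + grad_f(x) * (y - x)
```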

Also note that
$$g(t_1) = f(x + t_1 y), \quad g(t_2) = f(x + t_2 y),$$
and taking the definition of convexity, the claim is now straightforward to be shown.

8.5. Show the second order convexity condition.

Hint: Show the claim first for the one-dimensional case and then use the result of the previous problem for the generalization.

Solution: We start with the one-dimensional case. Let a function $f(x)$ be convex. Then we know from the first order convexity condition that
$$f'(x)(y - x) \leq f(y) - f(x) \leq f'(y)(y - x),$$
and dividing both sides by the positive quantity $(y - x)^2$, we get
$$\frac{f'(y) - f'(x)}{y - x} \geq 0,$$
and taking the limit $y \longrightarrow x$ we obtain
$$f''(x) \geq 0. \quad (9)$$

Assume now that the second derivative is non-negative everywhere. Then select $y > x$ and we get
$$0 \leq \int_x^y f''(z)(y - z)\,dz = f'(z)(y - z)\Big|_{z=x}^{z=y} + \int_x^y f'(z)\,dz = -f'(x)(y - x) + f(y) - f(x). \quad (10)$$
The above is true for $y > x$. Note that we can also show that
$$f(x) \geq f'(y)(x - y) + f(y),$$
by using the identity
$$0 \leq \int_x^y f''(z)(z - x)\,dz.$$
Thus we proved that $f$ is convex. For the more general case, consider
$$g(t) = f(x + ty),$$
from which we get
$$g''(t) = y^T \nabla^2 f(x + ty)\, y.$$
Since this is true for any $x, y$ and $t$, and using the previously obtained results, the claim is readily shown.

8.6. Show that a function $f : \mathbb{R}^l \longrightarrow \mathbb{R}$ is convex iff its epigraph is convex.

Solution: a) Assume $f$ to be convex. We have to show that its epigraph is a convex set. Let two points, $x_1, x_2$. From the convexity of $f$ we have
$$f(\lambda x_1 + (1-\lambda)x_2) \leq \lambda f(x_1) + (1-\lambda)f(x_2). \quad (11)$$
Consider two points in the epigraph, $y_1 = (x_1, r_1)$ and $y_2 = (x_2, r_2)$. Then we have
$$\lambda y_1 + (1-\lambda)y_2 := y = (x, r),$$
with
$$x = \lambda x_1 + (1-\lambda)x_2, \quad (12)$$
$$r = \lambda r_1 + (1-\lambda)r_2, \quad (13)$$
and since $y_1, y_2 \in \mathrm{epi}(f)$,
$$f(x_1) \leq r_1, \quad f(x_2) \leq r_2. \quad (14)$$
Combining (11) and (14) we get
$$f(x) \leq \lambda f(x_1) + (1-\lambda)f(x_2) \leq \lambda r_1 + (1-\lambda)r_2 = r,$$
hence $y = (x, r) \in \mathrm{epi}(f)$ and the epigraph is convex.

b) Assume the epigraph to be convex. Then
$$y = \lambda y_1 + (1-\lambda)y_2 \in \mathrm{epi}(f),$$
hence
$$r = \lambda r_1 + (1-\lambda)r_2 \geq f(\lambda x_1 + (1-\lambda)x_2), \quad (15)$$
for any
$$r_1 \geq f(x_1), \quad r_2 \geq f(x_2).$$
Thus (15) is also valid for $r_1 = f(x_1)$, $r_2 = f(x_2)$, and therefore
$$f(\lambda x_1 + (1-\lambda)x_2) \leq \lambda f(x_1) + (1-\lambda)f(x_2).$$

8.7. Show that if a function is convex, then its lower level set is convex for any $\xi$.

Solution: Let the function $f$ be convex and two points, $x, y$, which lie in $\mathrm{lev}_{\leq\xi}(f)$. Then
$$f(x) \leq \xi, \quad f(y) \leq \xi.$$
Hence, by the definition of convexity,
$$f(\lambda x + (1-\lambda)y) \leq \lambda f(x) + (1-\lambda)f(y) \leq \lambda\xi + (1-\lambda)\xi = \xi,$$
which proves the claim, that $\lambda x + (1-\lambda)y \in \mathrm{lev}_{\leq\xi}(f)$.

8.8. Show that in a Hilbert space $H$ the parallelogram rule,
$$\|x + y\|^2 + \|x - y\|^2 = 2\big(\|x\|^2 + \|y\|^2\big), \quad \forall x, y \in H,$$
holds true.

Solution: The proof is straightforward from the respective definitions and the properties of the inner product:
$$\|x + y\|^2 = \|x\|^2 + \langle x, y\rangle + \langle y, x\rangle + \|y\|^2 = \|x\|^2 + 2\,\mathrm{Re}\langle x, y\rangle + \|y\|^2,$$
$$\|x - y\|^2 = \|x\|^2 - 2\,\mathrm{Re}\langle x, y\rangle + \|y\|^2,$$
from which, adding the two, the parallelogram rule is obtained.

8.9. Show that if $x, y \in H$, where $H$ is a Hilbert space, then the norm induced by the inner product satisfies the triangle inequality, as required by any norm, i.e.,
$$\|x + y\| \leq \|x\| + \|y\|.$$

Solution: By the respective definitions we have
$$\|x + y\|^2 = \|x\|^2 + \langle x, y\rangle + \langle y, x\rangle + \|y\|^2 = \|x\|^2 + 2\,\mathrm{Re}(\langle x, y\rangle) + \|y\|^2 \leq \|x\|^2 + 2|\langle x, y\rangle| + \|y\|^2 \leq \|x\|^2 + \|y\|^2 + 2\|x\|\,\|y\| = \big(\|x\| + \|y\|\big)^2,$$
where the Cauchy-Schwarz inequality has been used.

8.10. Show that if a point $x_*$ is a local minimizer of a convex function, it is necessarily a global one. Moreover, it is the unique minimizer if the function is strictly convex.

Solution: Let $x_*$ be a local minimizer. Then there exists $\epsilon > 0$ such that
$$f(x_*) \leq f(x_* + \Delta), \quad \forall \Delta \in B[0, \epsilon].$$
Assume, for the sake of contradiction, that there exists $y_*$, necessarily with $y_* - x_* \notin B[0, \epsilon]$, such that $f(y_*) < f(x_*)$. Let
$$\lambda := \frac{\epsilon}{2\|y_* - x_*\|}.$$
Then
$$\lambda(y_* - x_*) \in B[0, \epsilon].$$

Hence,
$$f\big(x_* + \lambda(y_* - x_*)\big) \leq (1-\lambda)f(x_*) + \lambda f(y_*) < f(x_*),$$
which is not possible, since $x_*$ is a local minimizer.

Assume now that $f$ is strictly convex and that there exist two minimizers, $x_* \neq y_*$. Then, by the definition of strict convexity, we have that
$$f\Big(\frac{1}{2}x_* + \frac{1}{2}y_*\Big) < \frac{1}{2}f(x_*) + \frac{1}{2}f(y_*) = f(x_*),$$
which is not possible, since $x_*$ is a global minimizer.

8.11. Let $C$ be a closed convex set in a Hilbert space, $H$. Then show that $\forall x \in H$ there exists a point, denoted as $P_C(x) \in C$, such that
$$\|x - P_C(x)\| = \min_{y \in C}\|x - y\|.$$

Solution: Let $x \notin C$, otherwise it is trivial. Let $\rho$ be the largest lower bound of $\|x - y\|$, $y \in C$, i.e.,
$$\rho := \inf_{y\in C}\|x - y\| > 0.$$
Consider the sequence
$$\rho_n = \rho + \frac{1}{n}.$$
By the definition of the infimum, for each $n$ there will be at least one element $x_n \in C$ such that
$$\|x - x_n\| < \rho_n,$$
which then defines a sequence, $\{x_n\}$, of points for which we have that
$$\rho \leq \|x - x_n\| < \rho_n,$$
or
$$\rho \leq \lim_{n\to\infty}\|x - x_n\| \leq \lim_{n\to\infty}\rho_n = \rho,$$
which necessarily leads to
$$\lim_{n\to\infty}\|x - x_n\| = \rho. \quad (16)$$
From the parallelogram law, we can write
$$\|(x - x_m) + (x - x_n)\|^2 + \|(x - x_m) - (x - x_n)\|^2 = 2\big(\|x - x_m\|^2 + \|x - x_n\|^2\big),$$
or
$$\|x_n - x_m\|^2 = 2\big(\|x - x_m\|^2 + \|x - x_n\|^2\big) - 4\Big\|x - \frac{1}{2}(x_n + x_m)\Big\|^2.$$

However, since $C$ is convex, the point $\frac{1}{2}(x_n + x_m) \in C$, so that $\|x - \frac{1}{2}(x_n + x_m)\| \geq \rho$, and we deduce that
$$\|x_n - x_m\|^2 \leq 2\big(\|x - x_m\|^2 + \|x - x_n\|^2\big) - 4\rho^2.$$
Taking the limit for $n, m \to \infty$ on both sides we get
$$\lim_{n,m\to\infty}\|x_m - x_n\|^2 \leq 0 \;\Rightarrow\; \lim_{n,m\to\infty}\|x_m - x_n\| = 0.$$
That is, $\{x_n\}$ is a Cauchy sequence, and since $H$ is Hilbert the sequence converges to a point $x_*$. Moreover, it converges in $C$, since $C$ is closed, i.e., $x_* \in C$. Hence we have
$$\|x - x_*\| = \|x - x_n + x_n - x_*\| \leq \|x - x_n\| + \|x_n - x_*\|.$$
Taking the limit and using (16) we obtain
$$\|x - x_*\| \leq \rho.$$
However, since $x_* \in C$,
$$\|x - x_*\| \geq \rho,$$
which means that $\|x - x_*\| = \rho$; that is, the infimum is attained, which proves the claim. Uniqueness has been established in the text.

8.12. Show that the projection of a point $x \in H$ onto a non-empty closed convex set, $C \subset H$, lies on the boundary of $C$.

Solution: Assume that $P_C(x)$ is an interior point of $C$. By the definition of interior points, $\exists\delta > 0$ such that
$$S_\delta := \{y : \|y - P_C(x)\| < \delta\} \subset C.$$
Let
$$z := P_C(x) + \frac{\delta}{2}\cdot\frac{x - P_C(x)}{\|x - P_C(x)\|},$$
where by assumption $\|x - P_C(x)\| > 0$, since $x \notin C$. Obviously $z \in S_\delta$. Hence,
$$\|x - z\| = \Big|1 - \frac{\delta}{2\|x - P_C(x)\|}\Big|\,\|x - P_C(x)\|.$$
However, $\delta$ can be chosen arbitrarily small; thus choose $\delta < \|x - P_C(x)\|$.

However, then
$$\|x - z\| < \|x - P_C(x)\|,$$
which violates the definition of the projection. Thus, $P_C(x)$ lies on the boundary of $C$.

8.13. Derive the formula for the projection onto a hyperplane in a (real) Hilbert space, $H$.

Solution: Let us first show that a hyperplane,
$$\{z : \langle\theta, z\rangle + \theta_0 = 0\},$$
is a closed convex set. Convexity is shown trivially. To show closedness, let $y_n$ be points of the hyperplane with $y_n \longrightarrow y_*$. We will show that $y_*$ also lies on the hyperplane. Since
$$\langle\theta, y_n\rangle + \theta_0 = 0,$$
we have
$$0 \leq |\langle\theta, y_*\rangle + \theta_0|^2 = \lim_{n\to\infty}|\langle\theta, y_* - y_n\rangle|^2 = 0,$$
or
$$\langle\theta, y_*\rangle + \theta_0 = 0,$$
which proves the claim.

Let now $z$ be the projection of $x \in H$, i.e., $z = P_C(x)$. Then, by the definition,
$$z := \arg\min_{\langle\theta, z\rangle + \theta_0 = 0}\langle x - z, x - z\rangle.$$
Using Lagrange multipliers, we obtain the Lagrangian
$$L(z, \lambda) = \langle x - z, x - z\rangle - \lambda\big(\langle\theta, z\rangle + \theta_0\big).$$
For those not familiar with infinite dimensional spaces, it suffices to say that similar rules of differentiation apply, although the respective definitions are different (more general).

After differentiation of the Lagrangian, we obtain
$$2z - 2x - \lambda\theta = 0,$$
or
$$z = \frac{1}{2}(2x + \lambda\theta).$$
Plugging into the constraint, we obtain
$$\lambda = -2\,\frac{\langle\theta, x\rangle + \theta_0}{\|\theta\|^2},$$
which then results in the solution,
$$P_C(x) = x - \frac{\langle\theta, x\rangle + \theta_0}{\|\theta\|^2}\,\theta.$$
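A small numerical sketch of the derived projection formula in $\mathbb{R}^2$ (the hyperplane parameters $\theta$, $\theta_0$ and the point $x$ are arbitrary choices); the projected point must satisfy the constraint, and $x - P_C(x)$ must be parallel to $\theta$:

```python
def project_hyperplane(x, theta, theta0):
    # P(x) = x - (<theta, x> + theta0) / ||theta||^2 * theta
    ip = sum(t * v for t, v in zip(theta, x)) + theta0
    sq = sum(t * t for t in theta)
    return [v - ip / sq * t for v, t in zip(x, theta)]

theta, theta0 = [1.0, 2.0], -4.0    # hyperplane: x1 + 2*x2 - 4 = 0
x = [3.0, 5.0]
p = project_hyperplane(x, theta, theta0)

# The projection lies on the hyperplane ...
assert abs(p[0] + 2.0 * p[1] - 4.0) < 1e-12
# ... and x - p is parallel to theta (orthogonal to the hyperplane).
diff = [a - b for a, b in zip(x, p)]
assert abs(diff[0] * theta[1] - diff[1] * theta[0]) < 1e-12
```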

8.14. Derive the formula for the projection onto a closed ball, $B[0, \rho]$.

Solution: The closed ball is defined as
$$B[0, \rho] = \{y : \|y\| \leq \rho\}.$$
We have already seen (Problem 8.2) that it is convex. Let us show the closedness. Let
$$y_n \in B[0, \rho] \to y_*.$$
We have to show that $y_* \in B[0, \rho]$. Indeed, by the triangle inequality,
$$\|y_*\| \leq \|y_n\| + \|y_* - y_n\| \leq \rho + \|y_* - y_n\|,$$
and taking the limit as $n \to \infty$,
$$\|y_*\| \leq \rho,$$
which proves the claim.

To derive the projection, we follow similar steps as in Problem 8.13, by replacing the constraint by
$$\|z\|^2 = \rho^2,$$
since the projection is on the boundary. Taking the gradient of the Lagrangian we get
$$2(z - x) + 2\lambda z = 0,$$
or
$$z = \frac{1}{1 + \lambda}\,x.$$
Plugging into the constraint, we get
$$\frac{1}{|1 + \lambda|}\,\|x\| = \rho.$$
When $1 + \lambda = \frac{1}{\rho}\|x\|$, we get
$$z = \frac{\rho}{\|x\|}\,x,$$
and when $1 + \lambda = -\frac{1}{\rho}\|x\|$,
$$z = -\frac{\rho}{\|x\|}\,x.$$
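The resulting case distinction can be sketched as a small function (pure Python; the test values are arbitrary):

```python
import math

def project_ball(x, rho):
    # Projection onto B[0, rho]: scale down iff ||x|| > rho.
    nx = math.sqrt(sum(v * v for v in x))
    if nx > rho:
        return [rho / nx * v for v in x]
    return list(x)

p = project_ball([3.0, 4.0], 1.0)       # ||x|| = 5 > 1: scaled onto the sphere
assert math.isclose(math.sqrt(p[0] ** 2 + p[1] ** 2), 1.0)
assert project_ball([0.3, 0.4], 1.0) == [0.3, 0.4]   # already inside: unchanged
```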

From the two possible vectors, we have to keep the one that has the smaller distance from $x$. However,
$$\Big\|x - \frac{\rho}{\|x\|}x\Big\| < \Big\|x + \frac{\rho}{\|x\|}x\Big\|,$$
since $\frac{\rho}{\|x\|} < 1$. Thus $1 + \lambda = \frac{1}{\rho}\|x\|$, and the projection is equal to
$$P_{B[0,\rho]}(x) = \begin{cases}\dfrac{\rho}{\|x\|}\,x, & \|x\| > \rho,\\[1mm] x, & \text{otherwise.}\end{cases}$$

8.15. Find an example of a point whose projection on the $\ell_1$ ball is not unique.

Solution: The $\ell_1$ ball of radius $\rho$ in $\mathbb{R}^l$ is defined as
$$S_1[0, \rho] = \Big\{y : \sum_{i=1}^l |y_i| \leq \rho\Big\}.$$
Let the point
$$x = [1, 1]^T \in \mathbb{R}^2,$$
which obviously does not lie inside the $\ell_1$ ball of radius $\rho = 1$, since $\|x\|_1 = 2 > 1$. For any point $y \in S_1[0, 1]$ we have
$$\|x - y\|_1 = |1 - y_1| + |1 - y_2| \geq 1 - |y_1| + 1 - |y_2| \geq 2 - 1 = 1.$$
That is, the $\ell_1$ distance of $x$ from any point in the set $S_1[0, 1]$ is bounded below by 1. Consider the two points
$$y_1 = [1, 0]^T \quad \text{and} \quad y_2 = [0, 1]^T.$$
For both of them the lower bound is achieved, i.e.,
$$\|x - y_1\|_1 = \|x - y_2\|_1 = 1.$$
Moreover, one can easily check that all points on the line segment
$$\{y : y_1 + y_2 = 1,\; y_1, y_2 \geq 0\}$$
can be projection points of $x$.
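The tie of Problem 8.15 can be confirmed numerically ($\ell_1$ distances from $x = [1, 1]^T$ to several points of the segment):

```python
def l1_dist(a, b):
    return sum(abs(u - v) for u, v in zip(a, b))

x = [1.0, 1.0]
# Points on the segment {y : y1 + y2 = 1, y1, y2 >= 0} of the unit l1 ball.
candidates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.25, 0.75]]
dists = [l1_dist(x, y) for y in candidates]
assert all(abs(d - 1.0) < 1e-12 for d in dists)   # all attain the lower bound 1
```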

8.16. Show that if $C \subset H$ is a closed convex set in a Hilbert space, then $\forall x \in H$ and $\forall y \in C$, the projection $P_C(x)$ satisfies the following properties:
$$\mathrm{Re}\{\langle x - P_C(x),\, y - P_C(x)\rangle\} \leq 0,$$
$$\|P_C(x) - P_C(y)\|^2 \leq \mathrm{Re}\{\langle x - y,\, P_C(x) - P_C(y)\rangle\}.$$

Solution: We know that $P_C(x) \in C$. Hence, due to the convexity of $C$, $\forall\lambda \in [0, 1]$,
$$\lambda y + (1-\lambda)P_C(x) \in C.$$
Hence,
$$\|x - P_C(x)\|^2 \leq \big\|x - \big(\lambda y + (1-\lambda)P_C(x)\big)\big\|^2 = \big\|(x - P_C(x)) - \lambda(y - P_C(x))\big\|^2,$$
which, by the definition of the norm, gives
$$\|x - P_C(x)\|^2 \leq \|x - P_C(x)\|^2 + \lambda^2\|y - P_C(x)\|^2 - 2\lambda\,\mathrm{Re}\{\langle x - P_C(x), y - P_C(x)\rangle\},$$
or
$$\mathrm{Re}\{\langle x - P_C(x), y - P_C(x)\rangle\} \leq \frac{\lambda}{2}\|y - P_C(x)\|^2.$$
Taking the limit $\lambda \to 0$, we prove the first property.

To prove the second property, since $P_C(y) \in C$, we apply the previous property with $P_C(y)$ in place of $y$, i.e.,
$$\mathrm{Re}\{\langle x - P_C(x),\, P_C(y) - P_C(x)\rangle\} \leq 0.$$
Similarly,
$$\mathrm{Re}\{\langle y - P_C(y),\, P_C(x) - P_C(y)\rangle\} \leq 0.$$
After adding the above inequalities together and rearranging the terms, we obtain the second property.

8.17. Prove that if $S$ is a closed subspace $S \subset H$ in a Hilbert space $H$, then $\forall x, y \in H$,
$$\langle x, P_S(y)\rangle = \langle P_S(x), y\rangle = \langle P_S(x), P_S(y)\rangle,$$
and
$$P_S(ax + by) = aP_S(x) + bP_S(y).$$
Hint: Use the result of Problem 8.18.

Solution: We have that
$$\langle x, P_S(y)\rangle = \langle P_S(x) + (x - P_S(x)),\, P_S(y)\rangle = \langle P_S(x), P_S(y)\rangle,$$
since $x - P_S(x) \perp P_S(y)$,

from Problem 8.18. Similarly, we can show that
$$\langle P_S(x), y\rangle = \langle P_S(x), P_S(y)\rangle.$$
Hence
$$\langle x, P_S(y)\rangle = \langle P_S(x), y\rangle.$$
For the linearity, we have
$$x = P_S(x) + (x - P_S(x)), \quad y = P_S(y) + (y - P_S(y)),$$
where $P_S(x), P_S(y) \in S$ and $x - P_S(x) \in S^\perp$, $y - P_S(y) \in S^\perp$. Hence
$$ax + by = \big(aP_S(x) + bP_S(y)\big) + \big(a(x - P_S(x)) + b(y - P_S(y))\big),$$
and since the term in the second parenthesis on the right hand side lies in $S^\perp$, we readily obtain that
$$P_S(ax + by) = aP_S(x) + bP_S(y).$$

8.18. Let $S$ be a closed subspace in a Hilbert space $H$, $S \subset H$. Let $S^\perp$ be the set of all elements $x \in H$ which are orthogonal to $S$. Then show that a) $S^\perp$ is also a closed subspace, b) $S \cap S^\perp = \{0\}$, c) $H = S \oplus S^\perp$; that is, $\forall x \in H$, $\exists x_1 \in S$ and $x_2 \in S^\perp$: $x = x_1 + x_2$, where $x_1, x_2$ are unique.

Solution: a) We will first prove that $S^\perp$ is a subspace. Indeed, if $x_1 \in S^\perp$ and $x_2 \in S^\perp$, then
$$\langle x_1, y\rangle = \langle x_2, y\rangle = 0, \quad \forall y \in S,$$
or
$$\langle ax_1 + bx_2, y\rangle = 0 \;\Rightarrow\; ax_1 + bx_2 \in S^\perp.$$
Also, $0 \in S^\perp$, since $\langle x, 0\rangle = 0$. Hence $S^\perp$ is a subspace. We will prove that $S^\perp$ is also closed. Let $\{x_n\} \in S^\perp$ and
$$\lim_{n\to\infty} x_n = x_*.$$
We will show that $x_* \in S^\perp$. By the definition,
$$\langle x_n, y\rangle = 0, \quad \forall y \in S.$$
Moreover,
$$|\langle x_*, y\rangle| = |\langle x_n, y\rangle - \langle x_*, y\rangle| = |\langle x_n - x_*, y\rangle| \leq \|x_n - x_*\|\,\|y\| \to 0,$$

where the Cauchy-Schwarz inequality has been used. The last inequality leads to
$$\langle x_*, y\rangle = 0 \;\Rightarrow\; x_* \in S^\perp.$$

b) Let $x \in S \cap S^\perp$. By definition, since it belongs to both subspaces,
$$\langle x, x\rangle = 0 \;\Rightarrow\; x = 0.$$

c) Let $x \in H$. We have that
$$x = P_S(x) + (x - P_S(x)).$$
We will first show that $x - P_S(x) \in S^\perp$. Then we will show that this decomposition is unique. We already know that
$$\mathrm{Re}\{\langle x - P_S(x), y - P_S(x)\rangle\} \leq 0, \quad \forall y \in S.$$
Also, since $S$ is a subspace, $ay \in S$, $\forall a \in \mathbb{R}$; hence
$$\mathrm{Re}\{\langle x - P_S(x), ay - P_S(x)\rangle\} \leq 0,$$
or
$$a\,\mathrm{Re}\{\langle x - P_S(x), y\rangle\} \leq \mathrm{Re}\{\langle x - P_S(x), P_S(x)\rangle\},$$
which, since $a$ is an arbitrary real number of either sign and any magnitude, can only be true if
$$\mathrm{Re}\{\langle x - P_S(x), y\rangle\} = 0.$$
We apply the same for $jy$. Recall that if $c \in \mathbb{C}$,
$$\mathrm{Imag}\{c\} = \mathrm{Real}\{-jc\}.$$
Hence,
$$\mathrm{Real}\{\langle x - P_S(x), jy\rangle\} = \mathrm{Real}\{-j\langle x - P_S(x), y\rangle\} = 0 = \mathrm{Imag}\{\langle x - P_S(x), y\rangle\}.$$
Thus,
$$\langle x - P_S(x), y\rangle = 0, \quad \forall y \in S,$$
and $x - P_S(x) \in S^\perp$.

Thus $x = x_1 + x_2$, with $x_1 = P_S(x) \in S$, $x_2 = x - P_S(x) \in S^\perp$. Let us now assume that there is another decomposition,
$$x = x_3 + x_4, \quad x_3 \in S,\; x_4 \in S^\perp.$$

Then
$$x_1 + x_2 = x_3 + x_4,$$
or
$$S \ni x_1 - x_3 = x_4 - x_2 \in S^\perp,$$
which necessarily implies that they are equal to the single point comprising $S \cap S^\perp$, i.e.,
$$x_1 - x_3 = 0 = x_4 - x_2,$$
hence the decomposition is unique and we have proved the claim.

Let us elaborate a bit more. We will show that
$$P_{S^\perp}(x) = x - P_S(x).$$
Indeed, $P_{S^\perp}(x)$ is unique. Also,
$$x = P_S(x) + (x - P_S(x)),$$
so, by the linearity of the projection onto a subspace,
$$P_{S^\perp}(x) = P_{S^\perp}(P_S(x)) + P_{S^\perp}(x - P_S(x)) = 0 + (x - P_S(x)) = x - P_S(x),$$
since $x - P_S(x) \in S^\perp$. Note that we used the fact that if $y \in S$ then
$$P_{S^\perp}(y) = 0.$$
Indeed,
$$\|y - 0\|^2 = \|y\|^2 < \|y - a\|^2, \quad \forall a \neq 0 \in S^\perp,$$
since
$$\|y - a\|^2 = \|y\|^2 + \|a\|^2 - 2\,\mathrm{Real}\langle y, a\rangle = \|y\|^2 + \|a\|^2.$$

8.19. Show that the relaxed projection operator is a non-expansive mapping.

Solution: By the respective definitions we have
$$T_C(x) - T_C(y) = x + \mu(P_C(x) - x) - y - \mu(P_C(y) - y) = (1 - \mu)(x - y) + \mu\big(P_C(x) - P_C(y)\big).$$

a) $\mu \in (0, 1]$. Recalling the triangle property of a norm (Appendix 8.15), we get
$$\|T_C(x) - T_C(y)\| \leq |1 - \mu|\,\|x - y\| + \mu\|P_C(x) - P_C(y)\| \leq |1 - \mu|\,\|x - y\| + \mu\|x - y\| = \|x - y\|.$$

b) $\mu \in (1, 2)$. In this case
$$\|T_C(x) - T_C(y)\|^2 = (1-\mu)^2\|x - y\|^2 + \mu^2\|P_C(x) - P_C(y)\|^2 + 2\mu(1-\mu)\,\mathrm{Re}\{\langle x - y, P_C(x) - P_C(y)\rangle\}$$
$$\leq (1-\mu)^2\|x - y\|^2 + \mu^2\|P_C(x) - P_C(y)\|^2 + 2\mu(1-\mu)\|P_C(x) - P_C(y)\|^2$$
$$= (1-\mu)^2\|x - y\|^2 + \mu(2-\mu)\|P_C(x) - P_C(y)\|^2$$
$$\leq (1-\mu)^2\|x - y\|^2 + \mu(2-\mu)\|x - y\|^2 = \|x - y\|^2.$$
To derive the bounds we used that $1 - \mu < 0$ and $2 - \mu > 0$, for $\mu \in (1, 2)$, together with the second property of Problem 8.16.

8.20. Show that the relaxed projection operator is a strongly attractive mapping.

Solution: By the respective definition we have, for any $y \in C$,
$$\|T_C(x) - y\|^2 = \|x + \mu(P_C(x) - x) - y\|^2$$
$$= \|x - y\|^2 + \mu^2\|P_C(x) - x\|^2 + 2\mu\,\mathrm{Re}\{\langle x - y, P_C(x) - x\rangle\}$$
$$= \|x - y\|^2 + \mu^2\|P_C(x) - x\|^2 + 2\mu\,\mathrm{Re}\{\langle x - P_C(x) + P_C(x) - y,\, P_C(x) - x\rangle\}$$
$$= \|x - y\|^2 + \mu^2\|P_C(x) - x\|^2 - 2\mu\|P_C(x) - x\|^2 + 2\mu\,\mathrm{Re}\{\langle P_C(x) - y,\, P_C(x) - x\rangle\}$$
$$\leq \|x - y\|^2 - \mu(2-\mu)\|P_C(x) - x\|^2,$$
where (8.15) has been used. Thus
$$\|T_C(x) - y\|^2 \leq \|x - y\|^2 - \frac{\mu(2-\mu)}{\mu^2}\|T_C(x) - x\|^2 = \|x - y\|^2 - \frac{2-\mu}{\mu}\|T_C(x) - x\|^2,$$
where we used that $P_C(x) - x = \frac{1}{\mu}(T_C(x) - x)$.
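A numerical sketch of the non-expansiveness of Problem 8.19, using the relaxed projection onto a closed unit ball (the ball, the test points, and the sampled values of $\mu \in (0, 2)$ are arbitrary choices):

```python
import math

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def project_ball(x, rho=1.0):
    n = norm(x)
    return [rho / n * a for a in x] if n > rho else list(x)

def relaxed(x, mu):
    # T_C(x) = x + mu * (P_C(x) - x)
    p = project_ball(x)
    return [a + mu * (b - a) for a, b in zip(x, p)]

x, y = [3.0, 1.0], [-2.0, 2.5]
for mu in (0.5, 1.0, 1.5, 1.9):
    tx, ty = relaxed(x, mu), relaxed(y, mu)
    d = [a - b for a, b in zip(tx, ty)]
    assert norm(d) <= norm([a - b for a, b in zip(x, y)]) + 1e-12
```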

8.21. Give an example of a sequence in a Hilbert space $H$ which converges weakly but not strongly.

Solution: Define the sequence of points $x_n \in \ell_2$,
$$x_n = \{0, 0, \ldots, 1, 0, 0, \ldots\} := \{\delta_{ni}\}, \quad i = 0, 1, 2, \ldots$$

That is, each point $x_n$ is itself a sequence, with zeros everywhere except at time index $n$, where it is 1. For every point (sequence) $y \in \ell_2$, we have that
$$\sum_{n=1}^\infty |y_n|^2 = \sum_{n=1}^\infty |\langle x_n, y\rangle|^2 < \infty,$$
by the definition of the $\ell_2$ space (Appendix 8.15). The previous inequality implies that
$$\langle x_n, y\rangle \xrightarrow[n\to\infty]{} 0 = \langle 0, y\rangle;$$
that is, $x_n$ converges weakly to $0$. On the other hand, the convergence is not strong, since
$$\|x_n - 0\| = \|x_n\| = 1.$$

8.22. Prove that if $C_1, \ldots, C_K$ are closed convex sets in a Hilbert space $H$, then the operator
$$T = T_{C_K} T_{C_{K-1}} \cdots T_{C_1} := T_K \cdots T_1$$
is a regular one; that is,
$$\|T^{n-1}(x) - T^n(x)\| \longrightarrow 0, \quad n \longrightarrow \infty,$$
where $T^n := TT\cdots T$ is the application of $T$ $n$ successive times.

Solution: Fact 1:
$$T = T_{C_K} T_{C_{K-1}} \cdots T_{C_1} := T_K \cdots T_1$$
is a non-expansive mapping. Indeed, $\forall x, y \in H$,
$$\|T(x) - T(y)\| = \|T_K(T_{K-1}\cdots T_1)(x) - T_K(T_{K-1}\cdots T_1)(y)\| \leq \|T_{K-1}\cdots T_1(x) - T_{K-1}\cdots T_1(y)\| \leq \cdots \leq \|T_1(x) - T_1(y)\| \leq \|x - y\|.$$

Fact 2:
$$\mathrm{Fix}(T) = \bigcap_{k=1}^K C_k := C.$$
Indeed, if $x \in C$, then
$$T_K T_{K-1}\cdots T_1(x) = T_K T_{K-1}\cdots T_2(x) = \cdots = T_K(x) = x.$$

Moreover, let us assume that $x \in H$ is such that $T(x) = x$. Then $\forall y \in C$ we have
$$\|x - y\| = \|T(x) - T(y)\| \leq \|T_1(x) - T_1(y)\| = \|T_1(x) - y\| \leq \|x - y\|,$$
as shown before. Thus,
$$\|T_1(x) - y\| = \|x - y\|, \quad \forall y \in C,$$
which, by the strong attractiveness of $T_1$ (Problem 8.20), can only be true if $x \in C_1$ and hence $T_1(x) = x$. Repeating the argument for $T_2, \ldots, T_K$, we conclude that $x \in C$. Note that the previous two facts are valid for general Hilbert spaces.

Fact 3: If the $C_k$ are closed subspaces, then $T_k$, $k = 1, \ldots, K$, and $T = T_K T_{K-1}\cdots T_1$ are linear operators. The proof is trivial from the respective linearity of the projection operators, $P_k$, $k = 1, \ldots, K$. This is also true for general Hilbert spaces.

Fact 4: The operator $T$ is a regular one, i.e.,
$$\|T^n(x) - T^{n-1}(x)\| \xrightarrow[n\to\infty]{} 0.$$
Recall from Problem 8.20 that $\forall x \in H$,
$$\|T_1(x) - y\|^2 \leq \|x - y\|^2 - \mu_1(2 - \mu_1)\|x - P_1(x)\|^2,$$
or
$$\|x - P_1(x)\|^2 \leq \frac{1}{\mu_1(2 - \mu_1)}\big(\|x - y\|^2 - \|T_1(x) - y\|^2\big),$$
and by the definition
$$T_1(x) = x + \mu_1(P_1(x) - x),$$
we get
$$\|x - T_1(x)\|^2 = \mu_1^2\|x - P_1(x)\|^2 \leq \frac{\mu_1}{2 - \mu_1}\big(\|x - y\|^2 - \|T_1(x) - y\|^2\big).$$
Moreover,
$$\|x - T_2T_1(x)\|^2 = \|x - T_1(x) + T_1(x) - T_2T_1(x)\|^2 \leq \big(\|x - T_1(x)\| + \|T_2T_1(x) - T_1(x)\|\big)^2 \leq 2\big(\|x - T_1(x)\|^2 + \|T_2T_1(x) - T_1(x)\|^2\big).$$
Combining the above,
$$\|x - T_2T_1(x)\|^2 \leq \frac{2\mu_1}{2 - \mu_1}\big(\|x - y\|^2 - \|T_1(x) - y\|^2\big) + \frac{2\mu_2}{2 - \mu_2}\big(\|T_1(x) - y\|^2 - \|T_2T_1(x) - y\|^2\big).$$
Let now
$$b_2 = \max\Big\{\frac{\mu_1}{2 - \mu_1}, \frac{\mu_2}{2 - \mu_2}\Big\}.$$
Then, telescoping,
$$\|x - T_2T_1(x)\|^2 \leq 2b_2\big(\|x - y\|^2 - \|T_{12}(x) - y\|^2\big),$$
where
$$T_{12}(x) = T_2T_1(x).$$
Following a similar rationale and by induction we can show that
$$\|x - T(x)\|^2 \leq b_K 2^{K-1}\big(\|x - y\|^2 - \|T(x) - y\|^2\big), \quad (17)$$
where
$$T = T_K T_{K-1}\cdots T_1,$$
and
$$b_K = \max_{1\leq k\leq K}\frac{\mu_k}{2 - \mu_k}.$$
Now, by induction,
$$\|T(x) - T^2(x)\|^2 \leq b_K 2^{K-1}\big(\|T(x) - y\|^2 - \|T^2(x) - y\|^2\big), \quad (18)$$
$$\vdots$$
$$\|T^{n-1}(x) - T^n(x)\|^2 \leq b_K 2^{K-1}\big(\|T^{n-1}(x) - y\|^2 - \|T^n(x) - y\|^2\big). \quad (19)$$
Summing by parts (17)-(19), we obtain
$$\sum_{n=1}^\infty \|T^{n-1}(x) - T^n(x)\|^2 \leq b_K 2^{K-1}\|x - y\|^2 < +\infty.$$
Hence,
$$\lim_{n\to\infty}\|T^{n-1}(x) - T^n(x)\| = 0.$$
Note that till now, everything is valid for general Hilbert spaces.

8.23. Show the fundamental POCS theorem for the case of closed subspaces in a Hilbert space, $H$.

Solution: Fact 1: The relaxed projection operator is self-adjoint, i.e.,
$$\langle x, T_{C_i}(y)\rangle = \langle T_{C_i}(x), y\rangle, \quad \forall x, y \in H.$$

This is a direct consequence of the self-adjoint property of the projection, when $C_k$ is a closed subspace, i.e.,
$$\langle x, P_{C_k}(y)\rangle = \langle P_{C_k}(x), y\rangle = \langle P_{C_k}(x), P_{C_k}(y)\rangle.$$

Fact 2: For a closed subspace, $C_k$, the respective relaxed projection operator is linear, i.e.,
$$T_{C_k}(ax + by) = aT_{C_k}(x) + bT_{C_k}(y), \quad \forall x, y \in H.$$
This is also a direct consequence of the linearity of the projection operator onto subspaces. This property is easily checked to transfer readily to $T := T_{C_K}\cdots T_{C_1}$.

Fact 3:
$$\langle x, T(y)\rangle = \langle x, T_{C_K}\cdots T_{C_1}(y)\rangle = \langle T_{C_K}(x), T_{C_{K-1}}\cdots T_{C_1}(y)\rangle = \ldots = \langle T_{C_1}\cdots T_{C_K}(x), y\rangle = \langle T^*(x), y\rangle,$$
where
$$T^* = T_{C_1}T_{C_2}\cdots T_{C_K}.$$

Fact 4: Let the operator $T = T_{C_K}\cdots T_{C_1}$, with $\mathrm{Fix}(T) = C$, where $C$ is a closed subspace. Then the set
$$S := \{y : y = (I - T)(z),\; z \in H\}$$
is also a (closed) subspace and it is the orthogonal complement of $C$, i.e., $S = C^\perp$.

The proof that $S$ is a subspace is trivial, from the linearity of $T$. Also, let $x \in S^\perp$. Then, by the respective definition,
$$0 = \langle x, (I - T)(z)\rangle = \langle x, z\rangle - \langle x, T(z)\rangle = \langle x, z\rangle - \langle T^*(x), z\rangle = \langle (I - T^*)(x), z\rangle, \quad \forall z \in H.$$
Hence,
$$(I - T^*)(x) = x - T^*(x) = 0,$$
or $T^*(x) = x$, and since $T^*$ and $T$ have the same fixed point set (the proof is trivial), $S^\perp \subseteq C$.

Let now $x \in C$. Then
$$\langle x, (I - T)z\rangle = \langle x, z\rangle - \langle x, T(z)\rangle = \langle x, z\rangle - \langle T^*(x), z\rangle = \langle x - T^*(x), z\rangle = 0,$$
since
$$T^*(x) = x,$$
which proves that $S^\perp = C$.

Note that what we have said so far is a generalization of Problem 8.18.

We are now ready to establish strong convergence. The repeated application of $T$ on any $x \in H$ leads to $T^n(x)$. We know that $\forall x \in H$ there is a unique decomposition into two orthogonal complement (closed) subspaces, i.e.,
$$x = y + z, \quad y \in C \text{ and } z \in C^\perp, \quad \forall x \in H,$$
and that
$$y = P_C(x).$$
Hence, due to the linearity of $T^n$ ($C$ a subspace in $H$),
$$T^n(x) = T^n(y) + T^n(z) = y + T^n(z),$$
since $C = \mathrm{Fix}(T^n)$. However,
$$T^n(z) = T^n(I - T)(w), \text{ for some } w \in H, \;= T^n(w) - T^{n+1}(w),$$
and we know that
$$\|T^n(z)\| = \|T^n(w) - T^{n+1}(w)\| \longrightarrow 0.$$
Thus,
$$\|T^n(x) - P_C(x)\| \longrightarrow 0,$$
which proves the claim.
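A quick numerical sketch of POCS with plain (unrelaxed) projections onto two subspaces of $\mathbb{R}^2$: two lines through the origin (the choice of lines and starting point is arbitrary). The iterates converge to the intersection $\{0\}$:

```python
import math

def project_line(x, d):
    # Projection onto the subspace spanned by the unit vector d.
    c = x[0] * d[0] + x[1] * d[1]
    return [c * d[0], c * d[1]]

d1 = [1.0, 0.0]                      # the x-axis
s = 1.0 / math.sqrt(2.0)
d2 = [s, s]                          # the line y = x
x = [5.0, -3.0]
for _ in range(200):
    x = project_line(project_line(x, d1), d2)

# The intersection of the two subspaces is {0}: the iterates converge there.
assert math.hypot(x[0], x[1]) < 1e-12
```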

8.24. Derive the subdifferential of the metric distance function $d_C(x)$, where $C$ is a closed convex set, $C \subseteq \mathbb{R}^l$, and $x \in \mathbb{R}^l$.

Solution: By definition we have
$$\partial d_C(x) = \{g : g^T(y - x) + d_C(x) \leq d_C(y),\; \forall y \in \mathbb{R}^l\}.$$
Thus let $g$ be a subgradient; then
$$g^T(y - x) \leq d_C(y) - d_C(x) \leq \|y - x\|.$$

The above is easily shown by the respective definition,
$$d_C(y) = \min_{z\in C}\|y - z\| \leq \min_{z\in C}\big(\|y - x\| + \|x - z\|\big) = \|y - x\| + \min_{z\in C}\|x - z\|,$$
or
$$d_C(y) - d_C(x) \leq \|y - x\|.$$
Hence,
$$g^T(y - x) \leq \|y - x\|.$$
Since this is true $\forall y$, let $y : y - x = g$. Then
$$\|g\|^2 \leq \|g\| \;\Rightarrow\; \|g\| \leq 1,$$
or $g \in B[0, 1]$.

a) Let $x \notin C$ and $g$ any subgradient. For any $y \in \mathbb{R}^l$,
$$g^T(y + P_C(x) - x) \leq d_C(y + P_C(x)) - d_C(x).$$
However,
$$d_C(y + P_C(x)) = \min_{z\in C}\|y + P_C(x) - z\| \leq \|y\| + \min_{z\in C}\|P_C(x) - z\|,$$
and letting $z = P_C(x)$,
$$d_C(y + P_C(x)) \leq \|y\|.$$
Hence, we can write that
$$g^T(y + P_C(x) - x) \leq \|y\| - d_C(x), \quad \forall y \in \mathbb{R}^l.$$
Set $y = 0$. Then,
$$g^T(P_C(x) - x) \leq -\|x - P_C(x)\|,$$
or
$$g^T(x - P_C(x)) \geq \|x - P_C(x)\|.$$
However, $\|g\| \leq 1$, and recalling the Cauchy-Schwarz inequality, we obtain
$$\|x - P_C(x)\| \geq \|x - P_C(x)\|\,\|g\| \geq g^T(x - P_C(x)) \geq \|x - P_C(x)\| \;\Rightarrow\; \|g\| = 1, \text{ and } g^T(x - P_C(x)) = \|x - P_C(x)\|,$$

which implies (recall the condition for equality in the Cauchy-Schwarz theorem)
$$g = \frac{x - P_C(x)}{\|x - P_C(x)\|},$$
which proves the claim.

b) Let $x \in C$. Then by definition, we have
$$g^T(y - x) \leq d_C(y) - d_C(x),$$
and for any $y \in C$,
$$g^T(y - x) \leq 0, \quad \|g\| \leq 1. \quad (20)$$
If in addition $x$ is an interior point, there will be $\varepsilon > 0$ such that, $\forall z \in \mathbb{R}^l$, $x + \varepsilon(z - x) \in C$; hence, using (20),
$$g^T\big(x + \varepsilon(z - x) - x\big) = \varepsilon\, g^T(z - x) \leq 0.$$
Thus,
$$g^T(z - x) \leq 0, \quad \forall z \in \mathbb{R}^l.$$
Set $z - x = g$, which leads to $g = 0$. This completes the proof.

8.25. Derive the bound in (8.55).

Solution: Subtracting $\theta_*$ from both sides of the recursion, squaring, and taking into account the definition of the subgradient, it is readily shown that
$$\|\theta^{(i)} - \theta_*\|^2 \leq \|\theta^{(i-1)} - \theta_*\|^2 - 2\mu_i\big(J(\theta^{(i-1)}) - J(\theta_*)\big) + \mu_i^2\|J'(\theta^{(i-1)})\|^2.$$
Applying the previous recursively, we obtain
$$\|\theta^{(i)} - \theta_*\|^2 \leq \|\theta^{(0)} - \theta_*\|^2 - 2\sum_{k=1}^i \mu_k\big(J(\theta^{(k-1)}) - J(\theta_*)\big) + \sum_{k=1}^i \mu_k^2\|J'(\theta^{(k-1)})\|^2.$$
Taking into account the bound of the subgradient and the fact that the left hand side of the inequality is a non-negative number, we obtain
$$2\sum_{k=1}^i \mu_k\big(J(\theta^{(k-1)}) - J(\theta_*)\big) \leq \|\theta^{(0)} - \theta_*\|^2 + \sum_{k=1}^i \mu_k^2 G^2. \quad (21)$$

However, by the respective definition we get
$$J(\theta^{(k-1)}) - J(\theta_*) \geq J_*^{(i)} - J(\theta_*), \quad k = 1, \ldots, i.$$
Employing the previous bound in (21), the claim is readily obtained, i.e.,
$$J_*^{(i)} - J(\theta_*) \leq \frac{\|\theta^{(0)} - \theta_*\|^2}{2\sum_{k=1}^i \mu_k} + \frac{\sum_{k=1}^i \mu_k^2}{2\sum_{k=1}^i \mu_k}\,G^2.$$

8.26. Show that if a function is $\gamma$-Lipschitz, then any of its subgradients is bounded.

Solution: By the definition of the subgradient we have that, $\forall u, v$,
$$f(u) - f(v) \geq \langle f'(v), u - v\rangle.$$
Since this is true for all $u, v$, we can always select $u$ so that $u - v$ is parallel to $f'(v)$. Then
$$\langle f'(v), u - v\rangle = |\langle f'(v), u - v\rangle| = \|f'(v)\|\,\|u - v\|.$$
Then, if we plug in the Lipschitz condition,
$$\gamma\|u - v\| \geq f(u) - f(v) \geq \|f'(v)\|\,\|u - v\|,$$
we show that $\|f'(v)\| \leq \gamma$ is bounded.

8.27. Show the convergence of the generic projected subgradient algorithm in (8.61).

Solution: Let us break the iteration into two steps,
$$z^{(i)} = \theta^{(i-1)} - \mu_i J'(\theta^{(i-1)}), \quad (22)$$
$$\theta^{(i)} = P_C(z^{(i)}). \quad (23)$$
Then, following the same arguments as the ones adopted in Problem 8.25, we get
$$\|z^{(i)} - \theta_*\|^2 \leq \|\theta^{(0)} - \theta_*\|^2 - 2\sum_{k=1}^i \mu_k\big(J(\theta^{(k-1)}) - J(\theta_*)\big) + \sum_{k=1}^i \mu_k^2\|J'(\theta^{(k-1)})\|^2. \quad (24)$$
However, from the non-expansive property of the projection operator, and taking into account that $\theta_* \in C$, since it is a solution,
$$\|\theta^{(i)} - \theta_*\|^2 = \|P_C(z^{(i)}) - P_C(\theta_*)\|^2 \leq \|z^{(i)} - \theta_*\|^2. \quad (25)$$
Combining the last two formulas, the proof proceeds as in Problem 8.25.
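A minimal runnable sketch of the projected subgradient iteration (22)-(23): minimize $J(\theta) = |\theta - 3|$ over $C = [-1, 1]$, whose solution is $\theta_* = 1$. The objective, the set, and the diminishing step size $\mu_i = 1/i$ are arbitrary illustrative choices:

```python
def subgrad(theta):
    # A subgradient of J(theta) = |theta - 3|.
    return 1.0 if theta > 3.0 else -1.0

def project(theta):
    # Projection onto C = [-1, 1].
    return max(-1.0, min(1.0, theta))

theta = -1.0
for i in range(1, 2000):
    z = theta - (1.0 / i) * subgrad(theta)      # subgradient step (22)
    theta = project(z)                          # projection step (23)

assert abs(theta - 1.0) < 1e-6                  # converges to the solution
```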

8.28. Derive equation (8.99).

Solution: By the definition,
$$J_n(\theta) = \frac{1}{n}\sum_{k=1}^n L(y_k, x_k, \theta).$$
Hence,
$$J_n(\theta) = \frac{1}{n}\Big(\sum_{k=1}^{n-1} L(y_k, x_k, \theta) + L(y_n, x_n, \theta)\Big) = \frac{n-1}{n}J_{n-1}(\theta) + \frac{1}{n}L(y_n, x_n, \theta),$$
or
$$\nabla J_n(\theta) = \frac{n-1}{n}\nabla J_{n-1}(\theta) + \frac{1}{n}\nabla L(y_n, x_n, \theta).$$
Hence,
$$\nabla J_n(\theta_*(n-1)) = 0 + \frac{1}{n}\nabla L(y_n, x_n, \theta_*(n-1)).$$
Expanding the left hand side to a first order Taylor approximation we get
$$\nabla J_n(\theta_*(n)) = 0 = \nabla J_n(\theta_*(n-1)) + \nabla^2 J_n(\theta_*(n-1))\big(\theta_*(n) - \theta_*(n-1)\big),$$
or
$$\nabla J_n(\theta_*(n-1)) = -\nabla^2 J_n(\theta_*(n-1))\big(\theta_*(n) - \theta_*(n-1)\big),$$
which finally proves the claim.

8.29. Consider the online version of PDMb in (8.64), i.e.,
$$\theta_n = \begin{cases} P_C\Big(\theta_{n-1} - \mu_n\,\dfrac{J(\theta_{n-1})}{\|J'(\theta_{n-1})\|^2}\,J'(\theta_{n-1})\Big), & \text{if } J'(\theta_{n-1}) \neq 0,\\[2mm] P_C(\theta_{n-1}), & \text{if } J'(\theta_{n-1}) = 0, \end{cases} \quad (26)$$
where we have assumed that $J_* = 0$. If this is not the case, a shift can accommodate for the difference. Thus we assume that we know the minimum. For example, this is the case for a number of tasks, such as the hinge loss function, assuming linearly separable classes, or the linear $\epsilon$-insensitive loss function, for bounded noise. Assume that the loss function is a weighted average of the distances from the sets $C_k$. Then derive the APSM algorithm of (8.39).

Solution: Let the loss function be
$$L_n(\theta) = \frac{\sum_{k=n-q+1}^n \omega_k\, d_{C_k}(\theta_{n-1})\, d_{C_k}(\theta)}{\sum_{k=n-q+1}^n \omega_k\, d_{C_k}(\theta_{n-1})} = \sum_{k=n-q+1}^n \beta_k\, d_{C_k}(\theta),$$

where
$$\beta_k = \frac{\omega_k\, d_{C_k}(\theta_{n-1})}{\sum_{j=n-q+1}^n \omega_j\, d_{C_j}(\theta_{n-1})}, \quad \sum_{k=n-q+1}^n \beta_k = 1,$$
and the weights $\omega_k$ sum to one. Then, for the recursion (26), we need to compute the subgradient of $L_n(\theta)$, which by Example 8.5 (and for $\theta \notin C_k$) becomes
$$L_n'(\theta_{n-1}) = \sum_{k=n-q+1}^n \beta_k\, d'_{C_k}(\theta)\Big|_{\theta=\theta_{n-1}} = \sum_{k=n-q+1}^n \beta_k\,\frac{\theta_{n-1} - P_{C_k}(\theta_{n-1})}{d_{C_k}(\theta_{n-1})},$$
or
$$L_n'(\theta_{n-1}) = \frac{1}{\bar{L}}\sum_{k=n-q+1}^n \omega_k\big(\theta_{n-1} - P_{C_k}(\theta_{n-1})\big), \quad \text{with } \bar{L} = \sum_{k=n-q+1}^n \omega_k\, d_{C_k}(\theta_{n-1}).$$
Hence, (26) now becomes (using $\mu'_n$ instead, for reasons to become apparent soon)
$$\theta_n = \theta_{n-1} - \mu'_n\,\frac{\frac{1}{\bar{L}}\sum_{k=n-q+1}^n \omega_k\, d^2_{C_k}(\theta_{n-1})}{\frac{1}{\bar{L}^2}\big\|\sum_{k=n-q+1}^n \omega_k\big(\theta_{n-1} - P_{C_k}(\theta_{n-1})\big)\big\|^2}\cdot\frac{1}{\bar{L}}\sum_{k=n-q+1}^n \omega_k\big(\theta_{n-1} - P_{C_k}(\theta_{n-1})\big)$$
$$= \theta_{n-1} + \mu'_n M\Big(\sum_{k=n-q+1}^n \omega_k\, P_{C_k}(\theta_{n-1}) - \theta_{n-1}\Big),$$
where
$$M := \frac{\sum_{k=n-q+1}^n \omega_k\, d^2_{C_k}(\theta_{n-1})}{\big\|\sum_{k=n-q+1}^n \omega_k\, P_{C_k}(\theta_{n-1}) - \theta_{n-1}\big\|^2}.$$
Setting
$$\mu_n = \mu'_n M \in (0, 2M),$$
the APSM algorithm results.

8.30. Derive the regret bound for the subgradient algorithm in (8.82).

Solution: From the text, we have that
$$L_n(\theta_{n-1}) - L_n(h) \leq g_n^T(\theta_{n-1} - h) \leq \frac{1}{2\mu_n}\big(\|\theta_{n-1} - h\|^2 - \|\theta_n - h\|^2\big) + \frac{\mu_n}{2}G^2. \quad (27)$$
Summing up both sides results in
$$\sum_{n=1}^N L_n(\theta_{n-1}) - \sum_{n=1}^N L_n(h) \leq \sum_{n=1}^N \frac{1}{2\mu_n}\big(\|\theta_{n-1} - h\|^2 - \|\theta_n - h\|^2\big) + \frac{G^2}{2}\sum_{n=1}^N \mu_n. \quad (28)$$
Carrying out the summations on the right hand side, call it $A$, we get
$$A = \frac{1}{2\mu_1}\|\theta_0 - h\|^2 - \frac{1}{2\mu_1}\|\theta_1 - h\|^2 + \frac{1}{2\mu_2}\|\theta_1 - h\|^2 - \frac{1}{2\mu_2}\|\theta_2 - h\|^2 + \cdots + \frac{1}{2\mu_N}\|\theta_{N-1} - h\|^2 - \frac{1}{2\mu_N}\|\theta_N - h\|^2 + \frac{G^2}{2}\sum_{n=1}^N \mu_n,$$
or, dropping the last negative term and regrouping,
$$A \leq \frac{1}{2\mu_1}\|\theta_0 - h\|^2 + \sum_{n=2}^N \Big(\frac{1}{2\mu_n} - \frac{1}{2\mu_{n-1}}\Big)\|\theta_{n-1} - h\|^2 + \frac{G^2}{2}\sum_{n=1}^N \mu_n.$$
Taking into account the bound $\|\theta_n - h\|^2 \leq F^2$, and selecting the step-size to be a decreasing sequence, we readily get
$$A \leq F^2\Big(\frac{1}{2\mu_1} + \frac{1}{2}\sum_{n=2}^N\Big(\frac{1}{\mu_n} - \frac{1}{\mu_{n-1}}\Big)\Big) + \frac{G^2}{2}\sum_{n=1}^N \mu_n, \quad (29)$$
which then easily leads to
$$A \leq \frac{1}{2\mu_N}F^2 + \frac{G^2}{2}\sum_{n=1}^N \mu_n. \quad (30)$$
Combining the above with (28), the claim is proved.

8.31. Show that a function $f(x)$ is $\sigma$-strongly convex if and only if the function $f(x) - \frac{\sigma}{2}\|x\|^2$ is convex.

Solution: a) Assume that
$$f(x) - \frac{\sigma}{2}\|x\|^2$$
is convex. Then, by the definition of the subgradient at $x$, we have
$$f(y) - \frac{\sigma}{2}\|y\|^2 - f(x) + \frac{\sigma}{2}\|x\|^2 \geq g^T(y - x) - \sigma x^T(y - x), \quad (31)$$
which readily implies that
$$f(y) - f(x) \geq g^T(y - x) + \frac{\sigma}{2}\|y - x\|^2, \quad (32)$$
from which the strong convexity of $f(x)$ is deduced.

b) Assume that $f(x)$ is strongly convex. Then by its definition we have
$$f(y) - f(x) \geq g^T(y - x) + \frac{\sigma}{2}\|y\|^2 + \frac{\sigma}{2}\|x\|^2 - \sigma x^Ty + \sigma\|x\|^2 - \sigma\|x\|^2,$$
from which we obtain
$$f(y) - f(x) \geq g^T(y - x) + \frac{\sigma}{2}\|y\|^2 - \frac{\sigma}{2}\|x\|^2 - \sigma x^Ty + \sigma\|x\|^2, \quad (33)$$
or
$$f(y) - \frac{\sigma}{2}\|y\|^2 - f(x) + \frac{\sigma}{2}\|x\|^2 \geq g^T(y - x) - \sigma x^T(y - x), \quad (34)$$
which proves the claim that $f(x) - \frac{\sigma}{2}\|x\|^2$ is convex.

8.32. Show that if the loss function is $\sigma$-strongly convex, then if $\mu_n = \frac{1}{\sigma n}$, the regret bound for the subgradient algorithm becomes
$$\frac{1}{N}\sum_{n=1}^N L_n(\theta_{n-1}) \leq \frac{1}{N}\sum_{n=1}^N L_n(\theta_*) + \frac{G^2(1 + \ln N)}{2\sigma N}. \quad (35)$$

Solution: Taking into account the strong convexity we have that
$$L_n(\theta_{n-1}) - L_n(\theta_*) \leq g_n^T(\theta_{n-1} - \theta_*) - \frac{\sigma}{2}\|\theta_{n-1} - \theta_*\|^2, \quad (36)$$
and following similar arguments as for Problem 8.30, we get
$$L_n(\theta_{n-1}) - L_n(\theta_*) \leq \frac{1}{2\mu_n}\big(\|\theta_{n-1} - \theta_*\|^2 - \|\theta_n - \theta_*\|^2\big) - \frac{\sigma}{2}\|\theta_{n-1} - \theta_*\|^2 + \frac{\mu_n}{2}G^2. \quad (37)$$

Using $\mu_n = \frac{1}{\sigma n}$ results in
$$2\big(L_n(\theta_{n-1}) - L_n(\theta_*)\big) \leq \sigma n\big(\|\theta_{n-1} - \theta_*\|^2 - \|\theta_n - \theta_*\|^2\big) - \sigma\|\theta_{n-1} - \theta_*\|^2 + \frac{1}{\sigma n}G^2. \quad (38)$$
Summing up both sides we obtain
$$2\sum_{n=1}^N\big(L_n(\theta_{n-1}) - L_n(\theta_*)\big) \leq \sigma\big(\|\theta_0 - \theta_*\|^2 - \|\theta_1 - \theta_*\|^2\big) - \sigma\|\theta_0 - \theta_*\|^2$$
$$\quad + 2\sigma\big(\|\theta_1 - \theta_*\|^2 - \|\theta_2 - \theta_*\|^2\big) - \sigma\|\theta_1 - \theta_*\|^2 + \cdots$$
$$\quad + N\sigma\big(\|\theta_{N-1} - \theta_*\|^2 - \|\theta_N - \theta_*\|^2\big) - \sigma\|\theta_{N-1} - \theta_*\|^2 + G^2\sum_{n=1}^N\frac{1}{\sigma n}$$
$$\leq G^2\sum_{n=1}^N\frac{1}{\sigma n},$$
since the coefficient of each $\|\theta_{n-1} - \theta_*\|^2$ telescopes to zero and the remaining term $-N\sigma\|\theta_N - \theta_*\|^2$ is non-positive. Using now the bound
$$\sum_{n=1}^N\frac{1}{n} \leq 1 + \int_1^N\frac{1}{t}\,dt = 1 + \ln N,$$
the claim is proved.

8.33. Consider a batch algorithm that computes the minimum of the empirical loss function, $\theta_*(N)$, having a quadratic convergence rate, i.e.,
$$\ln\ln\frac{1}{\|\theta^{(i)} - \theta_*(N)\|^2} \sim i.$$
Show that an online algorithm, running for $n$ time instants so as to spend the same computational processing resources as the batch one, achieves for large values of $N$ better performance than the batch algorithm, i.e. ([Bott03]),
$$\|\theta_n - \theta_*\|^2 \sim \frac{1}{N\ln\ln N} \ll \frac{1}{N} \sim \|\theta_*(N) - \theta_*\|^2.$$
Hint: Use the fact that
$$\|\theta_n - \theta_*\|^2 \sim \frac{1}{n}, \quad \text{and} \quad \|\theta_*(N) - \theta_*\|^2 \sim \frac{1}{N}.$$

Solution: Let $K$ be the number of operations per iteration for the online algorithm. This amounts to a total of $Kn$ operations. The batch algorithm, in order to make sense, should perform $O(\ln\ln N)$ iterations, so as to get close to $\|\theta^{(i)} - \theta_*(N)\|^2 \sim 1/N$. Assuming that at each iteration it performs, approximately, $K_1 N$ operations, this amounts to a total of $K_1 N\ln\ln N$ operations. To keep the same load for both algorithms, it should be
$$Kn = K_1 N\ln\ln N.$$
This leads to the following approximate accuracies,
$$\|\theta_n - \theta_*\|^2 \sim \frac{1}{N\ln\ln N} \ll \frac{1}{N} \sim \|\theta_*(N) - \theta_*\|^2,$$
which proves the claim. Note that in practice the values of $K$ and $K_1$ play an important role as well.

8.34. Show property (8.110) for the proximal operator.

Solution: Assume first that $p = \mathrm{Prox}_{\lambda f}(x)$. By definition,
$$f(p) + \frac{1}{2\lambda}\|x - p\|^2 \leq f(v) + \frac{1}{2\lambda}\|x - v\|^2, \quad \forall v \in \mathbb{R}^l.$$
Since the previous inequality holds true for any $v \in \mathbb{R}^l$, it also holds true for $\alpha v + (1-\alpha)p$, where $v$ is any vector in $\mathbb{R}^l$ and $\alpha$ any real number

within(0, 1).Hence,

Afterre-arrangingtermsinthepreviousrelation,

Applicationoflimα→0 onbothsidesofthepreviousinequalityresultsin thedesired v

Conversely,assumethat

Thepreviousinequalityclearlysuggeststhat p =Proxλf (x). 8.35.Showproperty(8.111)fortheproximaloperator.

Solution:Forcompactnotations,define pj :=Proxλf (xj ), j =1, 2.Then, p

Addingthepreviousinequalitiesresultsinto

whichinturnleadstothedesired

31
λf (p) ≤ λf (αv +(1 α)p)+ 1 2 x αv (1 α)p 2 1 2 x p 2 ≤ λαf (v)+ λ(1 α)f (p)+ 1 2 x p 2 + 1 2 α 2 v p 2 α x p, v p − 1 2 x p 2 = λαf (v)+ λ(1 α)f (p)+ 1 2 α 2 v p 2 α x p, v p
λf (p) ≤ λf (v)+ 1 2 α v p 2 − x p, v p , ∀α ∈ (0, 1)
x p ≤ λ f (v) f (p)
p,
p, x p /λ ≤ f (v) f (p
f (p)+ 1 2λ x p 2 ≤ f (v)+ 1 2λ x p 2 1 λ v p, x p = f (v)+ 1 2λ (x v)+(v p) 2 1 λ v p, x p = f (v)+ 1 2λ x v 2 + 1 2λ v p 2 + 1 λ v p, x v − 1 λ v p, x p = f
v
1 2λ x v 2 + 1 2λ v p 2 1 λ v p 2 = f
1 2λ x v 2 1 2λ v p 2 ≤ f
v
1 2λ x v 2 , ∀v ∈ Rl .
v
).Then,
(
)+
(v)+
(
)+
1 ≤ λ
2 p1, x1 p
f (p2) f (p1) , p1 p2, x2 p2 ≤ λ f (p1) f (p2)
p1 p2, (p1 p2) (x1 x2) ≤ 0,
p1 p2 2 ≤ p1 p2, x1 x2 .
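Both properties can be checked numerically for a case where the proximal operator has a closed form. For $f=\|\cdot\|_1$, $\operatorname{Prox}_{\lambda f}$ is the soft-thresholding operator; the sketch below (our choice of $f$ and $\lambda$, not from the text) tests the variational inequality (8.110) and the firm non-expansiveness (8.111) on random points:

```python
import random

LAM = 0.5  # lambda; an illustrative value

def soft_threshold(x, lam=LAM):
    """Closed-form proximal operator of f = ||.||_1 (soft-thresholding)."""
    return [max(abs(xi) - lam, 0.0) * (1 if xi >= 0 else -1) for xi in x]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def l1(x):
    return sum(abs(xi) for xi in x)

rng = random.Random(0)
for _ in range(100):
    x1 = [rng.uniform(-2, 2) for _ in range(5)]
    x2 = [rng.uniform(-2, 2) for _ in range(5)]
    p1, p2 = soft_threshold(x1), soft_threshold(x2)
    d_p = [a - b for a, b in zip(p1, p2)]
    d_x = [a - b for a, b in zip(x1, x2)]
    # Property (8.111): ||p1 - p2||^2 <= <p1 - p2, x1 - x2>.
    assert dot(d_p, d_p) <= dot(d_p, d_x) + 1e-12
    # Property (8.110): <v - p, x - p> <= lam*(f(v) - f(p)) for any v.
    v = [rng.uniform(-2, 2) for _ in range(5)]
    lhs = dot([vi - pi for vi, pi in zip(v, p1)],
              [xi - pi for xi, pi in zip(x1, p1)])
    assert lhs <= LAM * (l1(v) - l1(p1)) + 1e-12
```

Any other prox with a closed form (e.g., that of a quadratic) could be substituted for `soft_threshold` in the same harness.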

8.36. Prove that the reflected proximal operator is non-expansive, and then that the recursion in (8.117) converges to a minimizer of $f$.

Solution: Define the mapping $R:=2\operatorname{Prox}_{\lambda f}-I$. Then, (8.117) takes the following form:

\[
x_{k+1}=x_k+\frac{\mu_k}{2}\bigl(R(x_k)-x_k\bigr)=\Bigl(1-\frac{\mu_k}{2}\Bigr)x_k+\frac{\mu_k}{2}R(x_k).
\]

Notice that $R$ is non-expansive: for any $x_1, x_2\in\mathbb{R}^l$,

\[
\|R(x_1)-R(x_2)\|^2 = \bigl\|2\bigl(\operatorname{Prox}_{\lambda f}(x_1)-\operatorname{Prox}_{\lambda f}(x_2)\bigr)-(x_1-x_2)\bigr\|^2
\]
\[
= 4\|\operatorname{Prox}_{\lambda f}(x_1)-\operatorname{Prox}_{\lambda f}(x_2)\|^2+\|x_1-x_2\|^2-4\bigl\langle \operatorname{Prox}_{\lambda f}(x_1)-\operatorname{Prox}_{\lambda f}(x_2),\, x_1-x_2\bigr\rangle
\]
\[
\le 4\|\operatorname{Prox}_{\lambda f}(x_1)-\operatorname{Prox}_{\lambda f}(x_2)\|^2+\|x_1-x_2\|^2-4\|\operatorname{Prox}_{\lambda f}(x_1)-\operatorname{Prox}_{\lambda f}(x_2)\|^2 = \|x_1-x_2\|^2,
\]

where the inequality follows from property (8.111). In turn, let $z$ be a fixed point of $\operatorname{Prox}_{\lambda f}$, so that $z=R(z)$. Then,

\[
\|x_{k+1}-z\|^2 = \Bigl\|\Bigl(1-\frac{\mu_k}{2}\Bigr)(x_k-z)+\frac{\mu_k}{2}\bigl(R(x_k)-z\bigr)\Bigr\|^2
\]
\[
= \Bigl(1-\frac{\mu_k}{2}\Bigr)\|x_k-z\|^2+\frac{\mu_k}{2}\|R(x_k)-z\|^2-\frac{\mu_k}{2}\Bigl(1-\frac{\mu_k}{2}\Bigr)\|R(x_k)-x_k\|^2
\]
\[
\le \|x_k-z\|^2-\mu_k(2-\mu_k)\|\operatorname{Prox}_{\lambda f}(x_k)-x_k\|^2,
\]

where the equality uses the identity $\|\alpha a+(1-\alpha)b\|^2=\alpha\|a\|^2+(1-\alpha)\|b\|^2-\alpha(1-\alpha)\|a-b\|^2$, and the inequality uses the non-expansiveness of $R$ together with $R(x_k)-x_k=2\bigl(\operatorname{Prox}_{\lambda f}(x_k)-x_k\bigr)$. Hence, $\forall k$,

\[
\mu_k(2-\mu_k)\|\operatorname{Prox}_{\lambda f}(x_k)-x_k\|^2 \le \|x_k-z\|^2-\|x_{k+1}-z\|^2.
\]

Given any non-negative integer $k_0$, the previous telescoping inequality is utilized for all $k\in\{0,\ldots,k_0\}$ to produce

\[
\sum_{k=0}^{k_0}\mu_k(2-\mu_k)\|\operatorname{Prox}_{\lambda f}(x_k)-x_k\|^2 \le \|x_0-z\|^2-\|x_{k_0+1}-z\|^2 \le \|x_0-z\|^2.
\]

Since the previous relation holds for any $k_0$, applying $\lim_{k_0\to\infty}$ on both sides of the inequality results into

\[
\sum_{k=0}^{+\infty}\mu_k(2-\mu_k)\|\operatorname{Prox}_{\lambda f}(x_k)-x_k\|^2 < +\infty. \tag{39}
\]

Moreover, notice that

\[
\|\operatorname{Prox}_{\lambda f}(x_{k+1})-x_{k+1}\| = \frac{1}{2}\|R(x_{k+1})-x_{k+1}\|
= \frac{1}{2}\Bigl\|R(x_{k+1})-R(x_k)+\Bigl(1-\frac{\mu_k}{2}\Bigr)\bigl(R(x_k)-x_k\bigr)\Bigr\|
\]
\[
\le \frac{1}{2}\|R(x_{k+1})-R(x_k)\|+\frac{1}{2}\Bigl(1-\frac{\mu_k}{2}\Bigr)\|R(x_k)-x_k\|
\le \frac{1}{2}\|x_{k+1}-x_k\|+\Bigl(1-\frac{\mu_k}{2}\Bigr)\|\operatorname{Prox}_{\lambda f}(x_k)-x_k\|
\]
\[
= \frac{\mu_k}{2}\|\operatorname{Prox}_{\lambda f}(x_k)-x_k\|+\Bigl(1-\frac{\mu_k}{2}\Bigr)\|\operatorname{Prox}_{\lambda f}(x_k)-x_k\| = \|\operatorname{Prox}_{\lambda f}(x_k)-x_k\|.
\]

Since $\bigl(\|\operatorname{Prox}_{\lambda f}(x_k)-x_k\|\bigr)_{k\in\mathbb{N}}$ is monotonically non-increasing, and bounded from below, it converges. Necessarily, $\lim_{k\to\infty}\|\operatorname{Prox}_{\lambda f}(x_k)-x_k\|^2=0$. Otherwise, there exists an $\epsilon>0$ and a subsequence $(k_m)_{m\in\mathbb{N}}$ such that

\[
\|\operatorname{Prox}_{\lambda f}(x_{k_m})-x_{k_m}\|^2 \ge \epsilon,\quad\forall m\in\mathbb{N}.
\]

This, together with the fact that $\lim_{m\to\infty}\sum_{i=0}^{k_m}\mu_i(2-\mu_i)=+\infty$, and (39), imply that

\[
+\infty > \sum_{k=0}^{+\infty}\mu_k(2-\mu_k)\|\operatorname{Prox}_{\lambda f}(x_k)-x_k\|^2 \ge \sum_{m=0}^{+\infty}\mu_{k_m}(2-\mu_{k_m})\|\operatorname{Prox}_{\lambda f}(x_{k_m})-x_{k_m}\|^2
\ge \epsilon\sum_{m=0}^{+\infty}\mu_{k_m}(2-\mu_{k_m}) = +\infty,
\]

which is clearly absurd.

Let $x_*$ be an arbitrary cluster point, and let $(x_{k_m})_{m\in\mathbb{N}}$ be a subsequence converging to $x_*$. Notice that

\[
\|x_*-\operatorname{Prox}_{\lambda f}(x_*)\|^2 = \|x_*-x_{k_m}\|^2 + \|x_{k_m}-\operatorname{Prox}_{\lambda f}(x_*)\|^2 + 2\bigl\langle x_*-x_{k_m},\, x_{k_m}-\operatorname{Prox}_{\lambda f}(x_*)\bigr\rangle.
\]

Expanding $x_{k_m}-\operatorname{Prox}_{\lambda f}(x_*) = \bigl(x_{k_m}-\operatorname{Prox}_{\lambda f}(x_{k_m})\bigr)+\bigl(\operatorname{Prox}_{\lambda f}(x_{k_m})-\operatorname{Prox}_{\lambda f}(x_*)\bigr)$ in the squared norm, writing $x_{k_m}-\operatorname{Prox}_{\lambda f}(x_*) = (x_{k_m}-x_*)+\bigl(x_*-\operatorname{Prox}_{\lambda f}(x_*)\bigr)$ in the inner-product term, and using the non-expansiveness of $\operatorname{Prox}_{\lambda f}$ together with the Cauchy-Schwarz inequality, one obtains

\[
\|x_*-\operatorname{Prox}_{\lambda f}(x_*)\|^2 \le \|x_{k_m}-\operatorname{Prox}_{\lambda f}(x_{k_m})\|^2 + 2\|x_{k_m}-\operatorname{Prox}_{\lambda f}(x_{k_m})\|\,\|x_{k_m}-x_*\| + 2\|x_*-x_{k_m}\|\,\|x_*-\operatorname{Prox}_{\lambda f}(x_*)\|.
\]

Applying $\lim_{m\to\infty}$ on both sides of the previous inequality results into $x_*=\operatorname{Prox}_{\lambda f}(x_*)\Leftrightarrow x_*\in\operatorname{Fix}(\operatorname{Prox}_{\lambda f})$. Since $x_*$ was chosen arbitrarily within the set of all cluster points of $(x_k)_{k\in\mathbb{N}}$, it can be readily seen that all cluster points belong to $\operatorname{Fix}(\operatorname{Prox}_{\lambda f})$.

We have already seen that the sequence $(\|x_n-x\|^2)_{n\in\mathbb{N}}$ converges for any $x\in\operatorname{Fix}(\operatorname{Prox}_{\lambda f})$. Moreover, any cluster point of $(x_k)_{k\in\mathbb{N}}$ belongs to $\operatorname{Fix}(\operatorname{Prox}_{\lambda f})$. Let us show now that $(x_k)_{k\in\mathbb{N}}$ possesses only one cluster point. To this end, assume two cluster points $x, y$ of $(x_k)_{k\in\mathbb{N}}$. This means that there exist subsequences $(x_{k_m})_{m\in\mathbb{N}}$ and $(x_{l_m})_{m\in\mathbb{N}}$ which converge to $x$ and $y$, respectively. Moreover, notice that

\[
\langle x_k, x-y\rangle = \frac{1}{2}\bigl(\|x_k-y\|^2-\|x_k-x\|^2+\|x\|^2-\|y\|^2\bigr).
\]

Since both $(\|x_k-x\|^2)_{k\in\mathbb{N}}$ and $(\|x_k-y\|^2)_{k\in\mathbb{N}}$ converge, so does also the sequence $(\langle x_k, x-y\rangle)_{k\in\mathbb{N}}$. Hence,

\[
\langle x, x-y\rangle = \lim_{m\to\infty}\langle x_{k_m}, x-y\rangle = \lim_{k\to\infty}\langle x_k, x-y\rangle = \lim_{m\to\infty}\langle x_{l_m}, x-y\rangle = \langle y, x-y\rangle,
\]

and in turn, $\|x-y\|^2=0\Rightarrow x=y$. To conclude, $(x_k)_{k\in\mathbb{N}}$ converges to a point in $\operatorname{Fix}(\operatorname{Prox}_{\lambda f})=\operatorname{argmin}_{v\in\mathbb{R}^l}f(v)$.
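The convergence just established can be observed numerically. The sketch below runs the recursion $x_{k+1}=x_k+\frac{\mu_k}{2}\bigl(R(x_k)-x_k\bigr)$ with $R=2\operatorname{Prox}_{\lambda f}-I$, taking $f=\|\cdot\|_1$ (whose prox is soft-thresholding and whose minimizer is the origin); the constant step size $\mu_k=1$ and the value $\lambda=0.5$ are illustrative choices of ours, not from the text:

```python
# Numeric sketch of the recursion x_{k+1} = x_k + (mu_k/2)(R(x_k) - x_k),
# with R = 2 Prox_{lam f} - I and f = ||.||_1 (minimizer: the zero vector).
LAM = 0.5

def prox_l1(x, lam=LAM):
    """Soft-thresholding: the proximal operator of f = ||.||_1."""
    return [max(abs(xi) - lam, 0.0) * (1 if xi >= 0 else -1) for xi in x]

def step(x, mu=1.0, lam=LAM):
    r = [2 * pi - xi for pi, xi in zip(prox_l1(x, lam), x)]  # R(x) = 2 Prox(x) - x
    return [xi + (mu / 2) * (ri - xi) for xi, ri in zip(x, r)]

x = [3.0, -2.0, 0.7]
for _ in range(50):
    x = step(x)

# The iterates converge to a minimizer of f, here the origin.
assert max(abs(xi) for xi in x) < 1e-9
```

With $\mu_k=1$ the recursion reduces to repeated application of $\operatorname{Prox}_{\lambda f}$; other $\mu_k\in(0,2)$ with $\sum_k\mu_k(2-\mu_k)=+\infty$ also converge, per the proof above.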

8.37. Derive (8.121) from (8.120).

Solution: Use the matrix inversion lemma,

\[
(A+BD^{-1}C)^{-1} = A^{-1}-A^{-1}B(D+CA^{-1}B)^{-1}CA^{-1},
\]

with $B=C=I$, $D=A^{-1}$, and the lemma's $A$ set to $\epsilon I$, which gives

\[
(A+\epsilon I)^{-1} = \frac{1}{\epsilon}I-\frac{1}{\epsilon^2}\Bigl(\frac{1}{\epsilon}I+A^{-1}\Bigr)^{-1}
= \frac{1}{\epsilon}I-\frac{1}{\epsilon^2}\Bigl(\frac{1}{\epsilon}A+I\Bigr)^{-1}A
= \frac{1}{\epsilon}I-\frac{1}{\epsilon}(A+\epsilon I)^{-1}A,
\]

which finally leads to the result.
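The derived identity $(A+\epsilon I)^{-1}=\frac{1}{\epsilon}\bigl(I-(A+\epsilon I)^{-1}A\bigr)$ can be spot-checked numerically; the $2\times 2$ matrix and the value of $\epsilon$ below are hypothetical examples of ours:

```python
# Spot-check of (A + eps*I)^{-1} = (1/eps)(I - (A + eps*I)^{-1} A) on a 2x2 case.

def mat_inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]] via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_mul2(m, n):
    """Product of two 2x2 matrices."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2.0, 1.0], [1.0, 3.0]]
eps = 0.1
A_eps = [[A[i][j] + (eps if i == j else 0.0) for j in range(2)] for i in range(2)]

lhs = mat_inv2(A_eps)                      # (A + eps*I)^{-1}
inv_times_A = mat_mul2(mat_inv2(A_eps), A)
rhs = [[(1.0 / eps) * ((1.0 if i == j else 0.0) - inv_times_A[i][j])
        for j in range(2)] for i in range(2)]

assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-9 for i in range(2) for j in range(2))
```

The identity holds for any invertible $A$ and $\epsilon>0$; the check above merely confirms the algebra on one instance.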
