Algorithmic Learning Theory: 25th International Conference, ALT 2014, Bled, Slovenia, October 8–10, 2014, Proceedings. 1st Edition. Peter Auer.


Peter Auer

Alexander Clark

Thomas Zeugmann

Sandra Zilles (Eds.)

Algorithmic Learning Theory

25th International Conference, ALT 2014, Bled, Slovenia, October 8–10, 2014, Proceedings

LNAI 8776

Subseries of Lecture Notes in Computer Science

LNAI Series Editors

Randy Goebel, University of Alberta, Edmonton, Canada

Yuzuru Tanaka, Hokkaido University, Sapporo, Japan

Wolfgang Wahlster, DFKI and Saarland University, Saarbrücken, Germany

LNAI Founding Series Editor

Joerg Siekmann, DFKI and Saarland University, Saarbrücken, Germany

Lecture Notes in Artificial Intelligence 8776

Peter Auer, Alexander Clark, Thomas Zeugmann, Sandra Zilles (Eds.)

Algorithmic Learning Theory

25th International Conference, ALT 2014, Bled, Slovenia, October 8–10, 2014

Proceedings

Volume Editors

Peter Auer
Montanuniversität Leoben, Austria
E-mail: auer@unileoben.ac.at

Alexander Clark
King's College London, Department of Philosophy, UK
E-mail: alexander.clark@kcl.ac.uk

Thomas Zeugmann
Hokkaido University, Division of Computer Science, Sapporo, Japan
E-mail: thomas@ist.hokudai.ac.jp

Sandra Zilles
University of Regina, Department of Computer Science, Regina, SK, Canada
E-mail: zilles@cs.uregina.ca

ISSN 0302-9743; e-ISSN 1611-3349
ISBN 978-3-319-11661-7; e-ISBN 978-3-319-11662-4
DOI 10.1007/978-3-319-11662-4
Springer Cham Heidelberg New York Dordrecht London

Library of Congress Control Number: 2014948640

LNCS Sublibrary: SL 7 – Artificial Intelligence

© Springer International Publishing Switzerland 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

This volume contains the papers presented at the 25th International Conference on Algorithmic Learning Theory (ALT 2014), which was held in Bled, Slovenia, during October 8–10, 2014. ALT 2014 was co-located with the 17th International Conference on Discovery Science (DS 2014). The technical program of ALT 2014 had 4 invited talks (presented jointly to both ALT 2014 and DS 2014) and 21 papers selected from 50 submissions by the ALT Program Committee.

ALT 2014 took place in the Hotel Golf, in a beautiful park full of old trees in the very heart of Bled. It provided a stimulating interdisciplinary forum to discuss the theoretical foundations of machine learning as well as their relevance to practical applications.

ALT is dedicated to the theoretical foundations of machine learning and provides a forum for high-quality talks and scientific interaction in areas such as reinforcement learning, inductive inference and grammatical inference, learning from queries, active learning, probably approximately correct learning, online learning, bandit theory, statistical learning theory, Bayesian and stochastic learning, unsupervised or semi-supervised learning, clustering, universal prediction, stochastic optimization, high-dimensional and non-parametric inference, information-based methods, decision tree methods, kernel-based methods, graph methods and/or manifold-based methods, sample complexity, complexity of learning, privacy-preserving learning, learning based on Kolmogorov complexity, new learning models, and applications of algorithmic learning theory.

The present volume of LNAI contains the text of the 21 papers presented at ALT 2014, as well as the texts/abstracts of the invited talks:

– Zoubin Ghahramani (University of Cambridge, Cambridge, UK), "Building an Automated Statistician" (joint invited speaker for ALT 2014 and DS 2014),

– Luc Devroye (McGill University, Montreal, Canada), "Cellular Tree Classifiers" (invited speaker for ALT 2014),

– Eyke Hüllermeier (Universität Paderborn, Germany), "A Survey of Preference-Based Online Learning with Bandit Algorithms" (tutorial speaker for ALT 2014),

– Anuška Ferligoj (University of Ljubljana, Slovenia), "Social Network Analysis" (tutorial speaker for DS 2014).

Since 1999, ALT has been awarding the E.M. Gold Award for the most outstanding student contribution. This year, the award was given to Hasan Abasi and Ali Z. Abdi for their paper "Learning Boolean Halfspaces with Small Weights from Membership Queries", co-authored by Nader H. Bshouty.

ALT 2014 was the 25th meeting in the ALT conference series, established in Japan in 1990. The ALT series is supervised by its Steering Committee: Peter Auer (University of Leoben, Austria), Shai Ben-David (University of Waterloo, Canada), Nader H. Bshouty (Technion - Israel Institute of Technology, Israel), Alexander Clark (King's College London, UK), Marcus Hutter (Australian National University, Canberra, Australia), Jyrki Kivinen (University of Helsinki, Finland), Frank Stephan (National University of Singapore, Republic of Singapore), Gilles Stoltz (École normale supérieure, Paris, France), Csaba Szepesvári (University of Alberta, Edmonton, Canada), Eiji Takimoto (Kyushu University, Fukuoka, Japan), György Turán (University of Illinois at Chicago, USA, and University of Szeged, Hungary), Akihiro Yamamoto (Kyoto University, Japan), Thomas Zeugmann (Chair, Hokkaido University, Sapporo, Japan), and Sandra Zilles (Co-chair, University of Regina, Saskatchewan, Canada).

We thank the various people and institutions who contributed to the success of the conference. Most importantly, we would like to thank the authors for contributing and presenting their work at the conference. Without their contribution this conference would not have been possible. We would like to thank the Office of Naval Research Global for the generous financial support for the conference ALT 2014, provided under ONRG grant N62909-14-1-C195.

ALT 2014 and DS 2014 were organized by the Jožef Stefan Institute (JSI) and the University of Ljubljana. We are very grateful to the Department of Knowledge Technologies (and the project MAESTRA) at JSI for sponsoring the conferences and providing administrative support. In particular, we thank the local arrangements chair, Mili Bauer, and her team, Tina Anžič, Nikola Simidjievski, and Jurica Levatić from JSI, for their efforts in organizing the two conferences.

We are grateful for the collaboration with the conference series Discovery Science. In particular we would like to thank the general chair of DS 2014 and ALT 2014, Ljupčo Todorovski, and the DS 2014 Program Committee chairs Sašo Džeroski, Dragi Kocev, and Panče Panov.

We are also grateful to EasyChair, the excellent conference management system, which was used for putting together the program for ALT 2014. EasyChair was developed mainly by Andrei Voronkov and is hosted at the University of Manchester. The system is cost-free.

We are grateful to the members of the Program Committee for ALT 2014 and the subreferees for their hard work in selecting a good program for ALT 2014. Last but not least, we thank Springer for their support in preparing and publishing this volume in the Lecture Notes in Artificial Intelligence series.

August 2014

Peter Auer
Alexander Clark
Thomas Zeugmann
Sandra Zilles

Organization

General Chair for ALT 2014 and DS 2014

Ljupčo Todorovski, University of Ljubljana, Slovenia

Program Committee

Nir Ailon, Technion, Israel
András Antos, SZIT, Hungary
Peter Auer (Chair), Montanuniversität Leoben, Austria
Shai Ben-David, University of Waterloo, Canada
Sébastien Bubeck, Princeton University, USA
Alexander Clark (Chair), King's College London, UK
Corinna Cortes, Google, USA
Vitaly Feldman, IBM Research, USA
Claudio Gentile, Università degli Studi dell'Insubria, Italy
Steve Hanneke, Carnegie Mellon University, USA
Kohei Hatano, Kyushu University, Japan
Sanjay Jain, National University of Singapore, Singapore
Timo Kötzing, Friedrich-Schiller-Universität, Germany
Eric Martin, University of New South Wales, Australia
Mehryar Mohri, Courant Institute of Mathematical Sciences, USA
Rémi Munos, Inria, France
Ronald Ortner, Montanuniversität Leoben, Austria
Lev Reyzin, University of Illinois at Chicago, USA
Daniil Ryabko, Inria, France
Sivan Sabato, Microsoft Research New England, USA
Masashi Sugiyama, Tokyo Institute of Technology, Japan
Csaba Szepesvári, University of Alberta, Canada
John Shawe-Taylor, University College London, UK
Vladimir Vovk, Royal Holloway, University of London, UK
Sandra Zilles, University of Regina, Canada

Local Arrangements Chair

Mili Bauer, Jožef Stefan Institute, Ljubljana

Subreferees

Yasin Abbasi-Yadkori, Cyril Allauzen, Kareem Amin, Bernardo Ávila Pires, Nicolò Cesa-Bianchi, Alexey Chernov, Rong Ge, Nick Gravin, Hirokazu Kameoka, Varun Kanade, Takafumi Kanamori, Tomáš Kocák, Vitaly Kuznetsov, Alessandro Lazaric, Guy Lever, Ben London, Phil Long, Yao Ma, Odalric-Ambrym Maillard, Irini-Eleftheria Mens, Tetsuro Morimura, Andres Munoz, Shinichi Nakajima, Gergely Neu, Cecilia Procopiuc, Daniel Russo, Jun Sakuma, Pavel Semukhin, Ohad Shamir, Aleksandrs Slivkins, Adam Smith, Umar Syed, Eiji Takimoto, Matus Telgarsky, Zheng Wen, Makoto Yamada, Grigory Yaroslavtsev, Nikolai Zolotykh

Sponsoring Institutions

Office of Naval Research Global, ONRG grant N62909-14-1-C195
Jožef Stefan Institute, Ljubljana
University of Ljubljana

Invited Abstracts

A Survey of Preference-Based Online Learning with Bandit Algorithms

Róbert Busa-Fekete and Eyke Hüllermeier

Department of Computer Science, University of Paderborn, Germany
{busarobi,eyke}@upb.de

Abstract. In machine learning, the notion of multi-armed bandits refers to a class of online learning problems in which an agent is supposed to simultaneously explore and exploit a given set of choice alternatives in the course of a sequential decision process. In the standard setting, the agent learns from stochastic feedback in the form of real-valued rewards. In many applications, however, numerical reward signals are not readily available; instead, only weaker information is provided, in particular relative preferences in the form of qualitative comparisons between pairs of alternatives. This observation has motivated the study of variants of the multi-armed bandit problem in which more general representations are used both for the type of feedback to learn from and the target of prediction. The aim of this paper is to provide a survey of the state of the art in this field, which we refer to as preference-based multi-armed bandits. To this end, we provide an overview of problems that have been considered in the literature as well as methods for tackling them. Our systematization is mainly based on the assumptions made by these methods about the data-generating process and, related to this, the properties of the preference-based feedback.

Keywords: Multi-armed bandits, online learning, preference learning, ranking, top-k selection, exploration/exploitation, cumulative regret, sample complexity, PAC learning.

Cellular Tree Classifiers

Gérard Biau¹,² and Luc Devroye³

¹ Sorbonne Universités, UPMC Univ Paris 06, France
² Institut universitaire de France
³ McGill University, Canada

Abstract. Suppose that binary classification is done by a tree method in which the leaves of a tree correspond to a partition of d-space. Within a partition, a majority vote is used. Suppose furthermore that this tree must be constructed recursively by implementing just two functions, so that the construction can be carried out in parallel by using "cells": first of all, given input data, a cell must decide whether it will become a leaf or an internal node in the tree. Secondly, if it decides on an internal node, it must decide how to partition the space linearly. Data are then split into two parts and sent downstream to two new independent cells. We discuss the design and properties of such classifiers.

Social Network Analysis

Anuška Ferligoj

Faculty of Social Sciences, University of Ljubljana
anuska.ferligoj@fdv.uni-lj.si

Abstract. Social network analysis has attracted considerable interest from the social and behavioral science community in recent decades. Much of this interest can be attributed to the focus of social network analysis on relationships among units, and on the patterns of these relationships. Social network analysis is a rapidly expanding and changing field with a broad range of approaches, methods, models, and substantive applications. In the talk special attention will be given to:

1. General introduction to social network analysis:

– What are social networks?

– Data collection issues.

– Basic network concepts: network representation; types of networks; size and density.

– Walks and paths in networks: length and value of a path; the shortest path; k-neighbours; acyclic networks.

– Connectivity: weakly, strongly, and bi-connected components; contraction; extraction.

2. Overview of tasks and corresponding methods:

– Network/node properties: centrality (degree, closeness, betweenness); hubs and authorities.

– Cohesion: triads, cliques, cores, islands.

– Partitioning: blockmodeling (direct and indirect approaches; structural and regular equivalence; generalised blockmodeling); clustering.

– Statistical models.

3. Software for social network analysis (UCINET, PAJEK, ...).

Building an Automated Statistician

Zoubin Ghahramani

Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, UK
zoubin@eng.cam.ac.uk

Abstract. We live in an era of abundant data, and there is an increasing need for methods to automate data analysis and statistics. I will describe the "Automated Statistician", a project which aims to automate the exploratory analysis and modelling of data. Our approach starts by defining a large space of related probabilistic models via a grammar over models, and then uses Bayesian marginal likelihood computations to search over this space for one or a few good models of the data. The aim is to find models which have both good predictive performance and are somewhat interpretable. Our initial work has focused on the learning of unknown nonparametric regression functions, and on learning models of time series data, both using Gaussian processes. Once a good model has been found, the Automated Statistician generates a natural language summary of the analysis, producing a 10–15 page report with plots and tables describing the analysis. I will discuss challenges such as: how to trade off predictive performance and interpretability, how to translate complex statistical concepts into natural language text that is understandable by a numerate non-statistician, and how to integrate model checking. This is joint work with James Lloyd and David Duvenaud (Cambridge) and Roger Grosse and Josh Tenenbaum (MIT).

Table of Contents

Editors' Introduction .......................................... 1
Peter Auer, Alexander Clark, Thomas Zeugmann, and Sandra Zilles

Full Invited Papers

Cellular Tree Classifiers ...................................... 8
Gérard Biau and Luc Devroye

A Survey of Preference-Based Online Learning with Bandit Algorithms ... 18
Róbert Busa-Fekete and Eyke Hüllermeier

Regular Contributions

Inductive Inference

A Map of Update Constraints in Inductive Inference ............. 40
Timo Kötzing and Raphaela Palenta

On the Role of Update Constraints and Text-Types in Iterative Learning ... 55
Sanjay Jain, Timo Kötzing, Junqi Ma, and Frank Stephan

Parallel Learning of Automatic Classes of Languages ............ 70
Sanjay Jain and Efim Kinber

Algorithmic Identification of Probabilities Is Hard ............ 85
Laurent Bienvenu, Benoît Monin, and Alexander Shen

Exact Learning from Queries

Learning Boolean Halfspaces with Small Weights from Membership Queries ... 96
Hasan Abasi, Ali Z. Abdi, and Nader H. Bshouty

On Exact Learning Monotone DNF from Membership Queries ......... 111
Hasan Abasi, Nader H. Bshouty, and Hanna Mazzawi

Learning Regular Omega Languages ............................... 125
Dana Angluin and Dana Fisman

Reinforcement Learning

Selecting Near-Optimal Approximate State Representations in Reinforcement Learning ... 140
Ronald Ortner, Odalric-Ambrym Maillard, and Daniil Ryabko

Policy Gradients for CVaR-Constrained MDPs ..................... 155
L.A. Prashanth

Bayesian Reinforcement Learning with Exploration ............... 170
Tor Lattimore and Marcus Hutter

Extreme State Aggregation beyond MDPs .......................... 185
Marcus Hutter

Online Learning and Learning with Bandit Information

On Learning the Optimal Waiting Time ........................... 200
Tor Lattimore, András György, and Csaba Szepesvári

Bandit Online Optimization over the Permutahedron .............. 215
Nir Ailon, Kohei Hatano, and Eiji Takimoto

Offline to Online Conversion ................................... 230
Marcus Hutter

Statistical Learning Theory

A Chain Rule for the Expected Suprema of Gaussian Processes .... 245
Andreas Maurer

Generalization Bounds for Time Series Prediction with Non-stationary Processes ... 260
Vitaly Kuznetsov and Mehryar Mohri

Generalizing Labeled and Unlabeled Sample Compression to Multi-label Concept Classes ... 275
Rahim Samei, Boting Yang, and Sandra Zilles

Privacy, Clustering, MDL, and Kolmogorov Complexity

Robust and Private Bayesian Inference .......................... 291
Christos Dimitrakakis, Blaine Nelson, Aikaterini Mitrokotsa, and Benjamin I.P. Rubinstein

Clustering, Hamming Embedding, Generalized LSH and the Max Norm ... 306
Behnam Neyshabur, Yury Makarychev, and Nathan Srebro

Indefinitely Oscillating Martingales ........................... 321
Jan Leike and Marcus Hutter

A Safe Approximation for Kolmogorov Complexity ................. 336
Peter Bloem, Francisco Mota, Steven de Rooij, Luís Antunes, and Pieter Adriaans

Author Index ................................................... 351

Editors’Introduction

PeterAuer,AlexanderClark,ThomasZeugmann,andSandraZilles

TheaimoftheseriesofconferencesonAlgorithmicLearningTheory(ALT) istolookatlearningfromanalgorithmicandmathematicalperspective.Over timeseveralmodelsoflearninghavebeendevelopedwhichstudydifferentaspectsoflearning.Inthefollowingwedescribeinbrieftheinvitedtalksandthe contributedpapersforALT2014heldinBled,Slovenia..

InvitedTalks. Followingthetraditionoftheco-locatedconferencesALTand DSallinvitedlecturesaresharedbythe twoconferences.Theinvitedspeakers areeminentresearchersintheirfieldsandpresenteithertheirspecificresearch areaorlectureaboutatopicofbroaderinterest.

Thisyear’sjointinvitedspeakerforALT2014andDS2014isZoubinGhahramani,whoisProfessorofInformationEngineeringattheUniversityofCambridge,UK,whereheleadsagroupofabout30researchers.Hestudiedcomputer scienceandcognitivescienceattheUniversityofPennsylvania,obtainedhisPhD fromMITin1995underthesupervisionofMichaelJordan,andwasapostdoctoralfellowattheUniversityofTorontowithGeoffreyHinton.Hisacademic careerincludesconcurrentappointmentsasoneofthefoundingmembersofthe GatsbyComputationalNeuroscienceUnitinLondon,andasafacultymemberof CMU’sMachineLearningDepartmentforover10years.HiscurrentresearchfocusesonnonparametricBayesianmodelingandstatisticalmachinelearning.He hasalsoworkedonapplicationstobioinformatics,econometrics,andavarietyof large-scaledatamodelingproblems.He haspublishedover200papers,receiving 25,000citations(anh-indexof68).HisworkhasbeenfundedbygrantsanddonationsfromEPSRC,DARPA,Microsoft,Google,Infosys,Facebook,Amazon, FXConceptsandanumberofotherindustrialpartners.In2013,hereceived a$750,000GoogleAwardforresearchonbuildingtheAutomaticStatistician. Inhisinvitedtalk BuildinganAutomatedStatistician (jointworkwithJames Lloyd,DavidDuvenaud,RogerGrosse,andJoshTenenbaum)ZoubinGhahramaniaddressestheproblemofabundantdataandtheincreasingneedformethodstoautomatedataanalysisandstatistics.TheAutomatedStatisticianproject aimstoautomatetheexploratoryanalysisandmodelingofdata.Theapproach usesBayesianmarginallikelihoodcomputationstosearchoveralargespaceof relatedprobabilisticmodels.Onceagoodmodelhasbeenfound,theAutomated Statisticiangeneratesanaturallanguagesummaryoftheanalysis,producinga 10-15pagereportwithplotsandtablesdescribingtheanalysis.ZoubinGhahramanidiscusseschallengessuchas:howtotradeoffpredictiveperformanceand interpretability,howtotranslatecomplexstatisticalconceptsintonaturallanguagetextthatisunderstandablebyanumeratenon-statistician,andhowto integratemodelchecking.

TheinvitedspeakerforALT2014isLucDevroye,whoisaJamesMcGillProfessorintheSchoolofComputerScienceofMcGillUniversityinMontreal.He

P.Aueretal.(Eds.):ALT2014,LNAI8776,pp.1–7,2014. c SpringerInternationalPublishingSwitzerland2014

He studied at Katholieke Universiteit Leuven and subsequently at Osaka University, and in 1976 received his PhD from the University of Texas at Austin under the supervision of Terry Wagner. Luc Devroye specializes in the probabilistic analysis of algorithms and random number generation, and enjoys typography. Since joining the McGill faculty in 1977 he has won numerous awards, including an E.W.R. Steacie Memorial Fellowship (1987), a Humboldt Research Award (2004), the Killam Prize (2005), and the Statistical Society of Canada gold medal (2008). He received an honorary doctorate from the Université catholique de Louvain in 2002, and an honorary doctorate from Universiteit Antwerpen in 2012. The invited paper Cellular Tree Classifiers (joint work with Gérard Biau) deals with classification by decision trees, where the decision trees are constructed recursively by using only two local rules: (1) given the input data to a node, it must decide whether it will become a leaf or not, and (2) a non-leaf node needs to decide how to split the data for sending them downstream. The important point is that each node can make these decisions based only on its local data, so that the decision tree construction can be carried out in parallel. Somewhat surprisingly, there are such local rules that guarantee convergence of the decision tree error to the Bayes optimal error. Luc Devroye discusses the design and properties of such classifiers.
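The two local rules can be sketched as a short recursive procedure. The sketch below is only an illustrative caricature (an axis-aligned median split and a size-based stopping rule), not the specific rules analyzed in the paper; all function names are made up for the example.

```python
# Illustrative cellular tree: each "cell" sees only its local data and
# implements two functions -- decide leaf vs. internal node, and choose
# a linear (here: axis-aligned median) split. Not the paper's rules.
def build_cell(points, labels, min_size=4):
    # Rule 1: become a leaf on small or pure samples; predict by majority vote.
    if len(points) <= min_size or len(set(labels)) == 1:
        return ("leaf", max(set(labels), key=list(labels).count))
    # Rule 2: split at the median of the first coordinate; each half is
    # sent downstream to a new, independent cell.
    xs = sorted(p[0] for p in points)
    cut = xs[len(xs) // 2]
    left = [(p, y) for p, y in zip(points, labels) if p[0] < cut]
    right = [(p, y) for p, y in zip(points, labels) if p[0] >= cut]
    if not left or not right:  # degenerate split: stop
        return ("leaf", max(set(labels), key=list(labels).count))
    return ("node", cut,
            build_cell(*zip(*left)), build_cell(*zip(*right)))

def classify(tree, p):
    while tree[0] == "node":
        _, cut, left, right = tree
        tree = left if p[0] < cut else right
    return tree[1]

pts = [(0.1,), (0.2,), (0.8,), (0.9,)]
ys = [0, 0, 1, 1]
tree = build_cell(pts, ys, min_size=1)
```

Note how each recursive call receives only the data routed to it, which is exactly what makes a parallel, cell-by-cell construction possible.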

The ALT 2014 tutorial speaker is Eyke Hüllermeier, who is professor and head of the Intelligent Systems Group at the Department of Computer Science of the University of Paderborn. He received his PhD in Computer Science from the University of Paderborn in 1997, and he also holds an MSc degree in business informatics. He was a researcher in artificial intelligence, knowledge-based systems, and statistics at the University of Paderborn and the University of Dortmund, and a Marie Curie fellow at the Institut de Recherche en Informatique de Toulouse. He had already held a full professorship in the Department of Mathematics and Computer Science at Marburg University before rejoining the University of Paderborn. In his tutorial A Survey of Preference-based Online Learning with Bandit Algorithms (joint work with Róbert Busa-Fekete), Eyke Hüllermeier reports on learning with bandit feedback that is weaker than the usual real-valued reward. When learning with bandit feedback, the learning algorithm receives feedback only from the decisions it makes, but no information from other alternatives. Thus the learning algorithm needs to simultaneously explore and exploit a given set of alternatives in the course of a sequential decision process. In many applications the feedback is not a numerical reward signal but some weaker information, in particular relative preferences in the form of qualitative comparisons between pairs of alternatives. This observation has motivated the study of variants of the multi-armed bandit problem, in which more general representations are used both for the type of feedback to learn from and the target of prediction. The aim of the tutorial is to provide a survey of the state of the art in this area, which is referred to as preference-based multi-armed bandits. To this end, Eyke Hüllermeier provides an overview of problems that have been considered in the literature as well as methods for tackling them. His systematization is mainly based on the assumptions made by these methods about the data-generating process and, related to this, the properties of the preference-based feedback.

The DS 2014 tutorial speaker is Anuška Ferligoj, who is professor of Multivariate Statistical Methods at the University of Ljubljana. She is a Slovenian mathematician who earned international recognition by her research work on network analysis. Her interests include multivariate analysis (constrained and multicriteria clustering), social networks (measurement quality and blockmodeling), and survey methodology (reliability and validity of measurement). She is a fellow of the European Academy of Sociology. She has also been an editor of the journal Advances in Methodology and Statistics (Metodološki zvezki) since 2004 and is a member of the editorial boards of the Journal of Mathematical Sociology, Journal of Classification, Social Networks, Statistics in Transition, Methodology, and Structure and Dynamics: eJournal of Anthropology and Related Sciences. She was a Fulbright scholar in 1990 and a visiting professor at the University of Pittsburgh. She was awarded the title of Ambassador of Science of the Republic of Slovenia in 1997. Social network analysis has attracted considerable interest from the social and behavioral science community in recent decades. Much of this interest can be attributed to the focus of social network analysis on relationships among units, and on the patterns of these relationships. Social network analysis is a rapidly expanding and changing field with a broad range of approaches, methods, models, and substantive applications. In her tutorial Social Network Analysis, Anuška Ferligoj gives a general introduction to social network analysis and an overview of tasks and corresponding methods, accompanied by pointers to software for social network analysis.

Inductive Inference. There are a number of papers in the field of inductive inference, the most classical branch of algorithmic learning theory. First, A Map of Update Constraints in Inductive Inference by Timo Kötzing and Raphaela Palenta provides a systematic overview of various constraints on learners in inductive inference problems. They focus on the question of which constraints and combinations of constraints reduce the learning power, meaning the class of languages that are learnable with respect to certain criteria.

On a related theme, the paper On the Role of Update Constraints and Text-Types in Iterative Learning by Sanjay Jain, Timo Kötzing, Junqi Ma, and Frank Stephan looks more specifically at the case where the learner has no memory beyond the current hypothesis. In this situation the paper is able to completely characterize the relations between the various constraints.

The paper Parallel Learning of Automatic Classes of Languages by Sanjay Jain and Efim Kinber continues the line of research on learning automatic classes of languages initiated by Jain, Luo, and Stephan in 2012, in this case by considering the problem of learning multiple distinct languages at the same time.

Laurent Bienvenu, Benoît Monin, and Alexander Shen present a negative result in their paper Algorithmic Identification of Probabilities Is Hard. They show that it is impossible to identify in the limit the exact parameter (in the sense of the Turing code for a computable real number) of a Bernoulli distribution, though it is of course easy to approximate it.

Editors’Introduction3

Exact Learning from Queries. In cases where the instance space is discrete, it is reasonable to aim at exact learning algorithms, where the learner is required to produce a hypothesis that is exactly correct.

The paper winning the E.M. Gold Award, Learning Boolean Halfspaces with Small Weights from Membership Queries, by the student authors Hasan Abasi and Ali Z. Abdi, co-authored by Nader H. Bshouty, presents a significantly improved algorithm for learning Boolean halfspaces in {0,1}^n with integer weights in {0,...,t} from membership queries only. It is shown that this algorithm needs only n^O(t) membership queries, which improves over previous algorithms with n^O(t^5) queries and closes the gap to the known lower bound n^t.
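The membership-query model itself is easy to make concrete. The sketch below is a brute-force illustration of exact learning of such a halfspace for tiny n and t; it queries every point and enumerates hypotheses, so it says nothing about the n^O(t) query complexity of the awarded paper. All names are invented for the example.

```python
from itertools import product

# Target concept: a Boolean halfspace f(x) = 1 iff w.x >= theta, with
# integer weights in {0,...,t}. Hidden from the learner; accessible
# only through membership queries (i.e., evaluations f(x)).
def make_oracle(w, theta):
    return lambda x: int(sum(wi * xi for wi, xi in zip(w, x)) >= theta)

def learn_halfspace(oracle, n, t):
    """Brute-force exact learning: query every point of {0,1}^n, then
    return any (weights, threshold) pair consistent with all answers."""
    cube = list(product([0, 1], repeat=n))
    labels = {x: oracle(x) for x in cube}          # membership queries
    for w in product(range(t + 1), repeat=n):      # candidate weights
        for theta in range(n * t + 2):             # candidate thresholds
            if all(int(sum(wi * xi for wi, xi in zip(w, x)) >= theta)
                   == labels[x] for x in cube):
                return w, theta
    return None

oracle = make_oracle(w=(2, 0, 1), theta=2)
w_hat, theta_hat = learn_halfspace(oracle, n=3, t=2)
# The returned hypothesis agrees with the target on all of {0,1}^3,
# which is exactly the "exactly correct" requirement of this model.
```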

The paper by Hasan Abasi, Nader H. Bshouty, and Hanna Mazzawi, On Exact Learning Monotone DNF from Membership Queries, presents results on the learnability by membership queries of monotone DNF (disjunctive normal form) formulas with a bounded number of terms and a bounded number of variables per term.

Dana Angluin and Dana Fisman look at exact learning using membership queries and equivalence queries in their paper Learning Regular Omega Languages. Here the class concerned is that of regular languages over infinite words; the authors consider three different representations which vary in their succinctness. This problem has applications in the verification and synthesis of reactive systems.

Reinforcement Learning. Reinforcement learning continues to be a centrally important area of learning theory, and this conference contains a number of contributions in this field. Ronald Ortner, Odalric-Ambrym Maillard, and Daniil Ryabko present the paper Selecting Near-Optimal Approximate State Representations in Reinforcement Learning, which looks at the problem where the learner does not have direct information about the states in the underlying Markov Decision Process (MDP); in contrast to partially observable MDPs, here the information is via various models that map the histories to states.

L.A. Prashanth considers risk-constrained reinforcement learning in his paper Policy Gradients for CVaR-Constrained MDPs, focusing on the stochastic shortest path problem. For a risk-constrained problem, not only is the expected sum of per-step costs E[Σ_m g(s_m, a_m)] to be minimized, but the sum of an additional cost measure, C = Σ_m c(s_m, a_m), also needs to be bounded from above. Usually the Value at Risk, VaR_α = inf{ξ | P(C ≤ ξ) ≥ α}, is constrained, but such constrained problems are hard to optimize. Instead, the paper proposes to constrain the Conditional Value at Risk, CVaR_α = E[C | C ≥ VaR_α], which allows standard optimization techniques to be applied. Two algorithms are presented that converge to a locally risk-optimal policy using stochastic approximation, mini-batches, policy gradients, and importance sampling.
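The two risk measures are straightforward to estimate from a sample of trajectory costs. A minimal empirical sketch (this only illustrates the definitions; the paper's contribution is constraining CVaR inside a policy-gradient optimization, which is not shown here):

```python
import random

def var_cvar(costs, alpha):
    """Empirical VaR and CVaR of a sample of trajectory costs:
    VaR_alpha is the smallest cost xi with P(C <= xi) >= alpha;
    CVaR_alpha is the mean cost over the tail {C >= VaR_alpha}."""
    srt = sorted(costs)
    k = next(i for i in range(len(srt)) if (i + 1) / len(srt) >= alpha)
    var = srt[k]
    tail = [c for c in srt if c >= var]
    return var, sum(tail) / len(tail)

random.seed(0)
# Simulated per-trajectory cumulative costs C = sum_m c(s_m, a_m).
costs = [sum(random.random() for _ in range(10)) for _ in range(10000)]
var95, cvar95 = var_cvar(costs, alpha=0.95)
# CVaR is never below VaR, since it averages only the tail costs;
# this is one reason it is the more conservative risk measure.
```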

In contrast to the usual MDP setting for reinforcement learning, the two following papers consider more general reinforcement learning. Bayesian Reinforcement Learning with Exploration by Tor Lattimore and Marcus Hutter improves some of their earlier work on general reinforcement learning. Here the true environment does not need to be Markovian, but it is known to be drawn at random from a finite class of possible environments. An algorithm is presented that alternates between periods of playing the Bayes optimal policy and periods of forced experimentation. Upper bounds on the sample complexity are established, and it is shown that for some classes of environments this bound cannot be improved by more than a logarithmic factor.

Marcus Hutter's paper Extreme State Aggregation beyond MDPs considers how an arbitrary (non-Markov) decision process with a finite number of actions can be approximated by a finite-state MDP. For a given feature function φ: H → S mapping histories h of the general process to some finite state space S, the transition probabilities of the MDP can be defined appropriately. It is shown that the MDP approximates the general process well if the optimal policy for the general process is consistent with the feature function, π*(h_1) = π*(h_2) for φ(h_1) = φ(h_2), or if the optimal Q-value function is consistent with the feature function, |Q*(h_1, a) − Q*(h_2, a)| < ε for φ(h_1) = φ(h_2) and all a. It is also shown that such a feature function always exists.

Online Learning and Learning with Bandit Information.

The paper On Learning the Optimal Waiting Time by Tor Lattimore, András György, and Csaba Szepesvári addresses the problem of how long to wait for an event with independent and identically distributed (i.i.d.) arrival times from an unknown distribution. If the event occurs during the waiting time, then the cost is the time until arrival. If the event occurs after the waiting time, then the cost is the waiting time plus a fixed and known amount. Algorithms for the full information setting and for bandit information are presented that sequentially choose waiting times over several rounds in order to minimize the regret with respect to an optimal waiting time. For bandit information the arrival time is only revealed if it is smaller than the waiting time, while in the full information setting it is always revealed. The performance of the algorithms nearly matches the minimax lower bound on the regret.
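As a toy illustration of the cost structure (the arrival times and penalty below are hypothetical, and this brute-force search is not the paper's regret-minimizing algorithm), the empirically optimal waiting time can be found as follows:

```python
def expected_cost(tau, arrivals, penalty):
    """Average cost of waiting until time tau: pay the arrival time if
    the event occurs by tau, otherwise pay tau plus the fixed penalty."""
    return sum(a if a <= tau else tau + penalty for a in arrivals) / len(arrivals)

def best_waiting_time(arrivals, penalty):
    """Brute force over zero and the observed arrival times; between
    consecutive arrivals the empirical cost only grows with tau, so
    these candidates contain an optimum of the empirical objective."""
    candidates = [0.0] + sorted(arrivals)
    return min(candidates, key=lambda tau: expected_cost(tau, arrivals, penalty))

arrivals = [1.0, 2.0, 10.0]   # hypothetical i.i.d. arrival times
tau_star = best_waiting_time(arrivals, penalty=3.0)   # waits until 2.0
```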

In many application areas, e.g. recommendation systems, the learning algorithm should return a ranking: a permutation of some finite set of elements. This problem is studied in the paper by Nir Ailon, Kohei Hatano, and Eiji Takimoto titled Bandit Online Optimization Over the Permutahedron, where the cost of a ranking is calculated as Σ_{i=1}^n π(i)s(i), with π(i) the rank of item i and s(i) its cost. In the bandit setting, in each iteration an unknown cost vector s_t is chosen, and the goal of the algorithm is to minimize the regret with respect to the best fixed ranking of the items.
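A small sketch of this loss, with hypothetical costs; the exhaustive search over all n! rankings is only to check intuition, not something the bandit algorithm would do:

```python
from itertools import permutations

def ranking_loss(perm, s):
    """Loss of a ranking: sum_i perm[i] * s[i], where perm[i] is the
    rank (1..n) assigned to item i and s[i] is the item's cost."""
    return sum(rank * cost for rank, cost in zip(perm, s))

def best_fixed_ranking(s):
    """Exhaustive comparator over all n! rankings (illustration only)."""
    return min(permutations(range(1, len(s) + 1)),
               key=lambda perm: ranking_loss(perm, s))

s = [3.0, 1.0, 2.0]            # hypothetical per-item costs
best = best_fixed_ranking(s)   # low ranks go to the costly items
```

By the rearrangement inequality, the minimizer assigns the small ranks to the large costs, so here the best ranking is (1, 3, 2) with loss 10.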

Marcus Hutter's paper Offline to Online Conversion introduces the problem of turning a sequence of distributions q_n on strings in X^n, n = 1, 2, ..., into a stochastic online predictor for the next symbol, q̃(x_n | x_1,...,x_{n−1}), such that the induced probabilities q̃(x_1,...,x_n) are close to q_n(x_1,...,x_n) for all sequences x_1, x_2, ... The paper considers four strategies for doing such a conversion, showing that naïve approaches might not be satisfactory but that a good predictor can always be constructed, at the cost of possible computational inefficiency. One example of such a conversion gives a simple combinatorial derivation of the Good-Turing estimator.
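One natural conversion, sketched here under the assumption that each q_n is given explicitly as a table, simply conditions the horizon-n distribution on the observed prefix; the paper's point is that such simple strategies can fail in general:

```python
def naive_predictor(q_n, prefix, alphabet=(0, 1)):
    """Predict the next symbol by conditioning the horizon-n offline
    distribution q_n (a dict from length-n tuples to probabilities)
    on the observed prefix."""
    k = len(prefix) + 1
    weights = {x: sum(p for s, p in q_n.items() if s[:k] == prefix + (x,))
               for x in alphabet}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

# Hypothetical offline distribution over binary strings of length 2
q2 = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3}
pred = naive_predictor(q2, prefix=(0,))   # conditional law of x2 given x1 = 0
```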

Editors' Introduction

Statistical Learning Theory.

Andreas Maurer's paper A Chain Rule for the Expected Suprema of Gaussian Processes investigates the problem of assessing the generalization of a learner who is adapting a feature space while also learning the target function. The approach taken is to consider extensions of bounds on Gaussian averages to the case where there is a class of functions that create features and a class of mappings from those features to outputs. In the applications considered in the paper, this corresponds to a two-layer kernel machine, to multitask learning, and, through iterated application of the bound, to multilayer networks and deep learners.

A standard assumption in statistical learning theory is that the data are generated independently and identically distributed from some fixed distribution; in practice, this assumption is often violated, and a more realistic assumption is that the data are generated by a process which is only sufficiently fast mixing, and may even be non-stationary. Vitaly Kuznetsov and Mehryar Mohri, in their paper Generalization Bounds for Time Series Prediction with Non-stationary Processes, consider this case and are able to prove new generalization bounds that depend on the mixing coefficients and the shift of the distribution.

Rahim Samei, Boting Yang, and Sandra Zilles, in their paper Generalizing Labeled and Unlabeled Sample Compression to Multi-label Concept Classes, consider generalizations of the binary VC-dimension to multi-label classification, such that maximum classes of dimension d allow a tight compression scheme of size d. Sufficient conditions for notions of dimension with this property are derived, and it is shown that some multi-label generalizations of the VC-dimension allow tight compression schemes, while other generalizations do not.

Privacy, Clustering, MDL, and Kolmogorov Complexity. Christos Dimitrakakis, Blaine Nelson, Aikaterini Mitrokotsa, and Benjamin I.P. Rubinstein present the paper Robust and Private Bayesian Inference. This paper looks at the problem of privacy in machine learning, where an agent, a statistician for example, might want to reveal information derived from a data set, but without revealing information about the particular data points in the set, which might contain confidential information. The authors show that it is possible to do Bayesian inference in this setting, satisfying differential privacy, provided that the likelihoods and conjugate priors satisfy some properties.

Behnam Neyshabur, Yury Makarychev, and Nathan Srebro, in their paper Clustering, Hamming Embedding, Generalized LSH and the Max Norm, look at asymmetric locality sensitive hashing (LSH), which is useful in many types of machine learning applications. Locality sensitive hashing, which is closely related to the problem of clustering, is a method of probabilistically reducing the dimension of high-dimensional data sets by assigning each data point a hash such that similar data points are mapped to the same hash. The paper shows that by shifting to co-clustering and asymmetric LSH the problem admits a tractable relaxation.
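For background only, here is a minimal symmetric LSH sketch (random-hyperplane hashing for angular similarity), not the asymmetric scheme of the paper; all parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_signature(v, hyperplanes):
    """Sign pattern of v against random hyperplanes; vectors at a small
    angle tend to land on the same side of every hyperplane."""
    return tuple(int(np.dot(h, v) >= 0) for h in hyperplanes)

d, n_planes = 5, 16
planes = rng.standard_normal((n_planes, d))   # one hash bit per hyperplane
x = rng.standard_normal(d)
x_close = x + 0.01 * rng.standard_normal(d)   # a near-duplicate of x

sig_x = lsh_signature(x, planes)
hamming = sum(a != b for a, b in zip(sig_x, lsh_signature(x_close, planes)))
```

The near-duplicate disagrees with x on few (typically zero) hash bits, while the antipode −x disagrees on every bit.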

Jan Leike and Marcus Hutter look at martingale theory in Indefinitely Oscillating Martingales; as a consequence of their analysis they show a negative result in the theory of Minimum Description Length (MDL) learning, namely that the MDL estimator is in general inductively inconsistent: it will not necessarily converge. The MDL estimator gives the regularized code length, MDL(u) = min_Q {Q(u) + K(Q)}, where Q is a coding function, K(Q) its complexity, and Q(u) the code length for the string u. It is shown that the family of coding functions Q can be constructed such that lim_{n→∞} MDL(u_{1:n}) does not converge for most infinite words u.

As is well known, the Kolmogorov complexity is not computable. Peter Bloem, Francisco Mota, Steven de Rooij, Luís Antunes, and Pieter Adriaans, in their paper A Safe Approximation for Kolmogorov Complexity, study the problem of approximating this quantity using a restriction to a particular class of models, and a probabilistic bound on the approximation error.


Cellular Tree Classifiers

G. Biau and L. Devroye

1 Sorbonne Universités, UPMC Univ Paris 06, France
2 Institut universitaire de France
3 McGill University, Canada

Abstract. Suppose that binary classification is done by a tree method in which the leaves of a tree correspond to a partition of d-space. Within a partition, a majority vote is used. Suppose furthermore that this tree must be constructed recursively by implementing just two functions, so that the construction can be carried out in parallel by using "cells": first of all, given input data, a cell must decide whether it will become a leaf or an internal node in the tree. Secondly, if it decides on an internal node, it must decide how to partition the space linearly. Data are then split into two parts and sent downstream to two new independent cells. We discuss the design and properties of such classifiers.

1 Introduction

We explore in this note a new way of dealing with the supervised classification problem, inspired by greedy approaches and the divide-and-conquer philosophy. Our point of view is novel, but has a wide reach in a world in which parallel and distributed computation are important. In the short term, parallelism will take hold in massive data sets and complex systems and, as such, poses one of the exciting questions that will be asked of the statistics and machine learning fields.

The general context is that of classification trees, which make decisions by recursively partitioning R^d into regions, sometimes called cells. In the model we promote, a basic computational unit in classification, a cell, takes as input training data, and makes a decision whether a majority rule should be locally applied. If not, the data should be split and each part of the partition should be transmitted to another cell. What is original in our approach is that all cells must use exactly the same protocol to make their decision: their function is not altered by external inputs or global parameters. In other words, the decision to split depends only upon the data presented to the cell, independently of the overall edifice. Classifiers designed according to this autonomous principle will be called cellular tree classifiers, or simply cellular classifiers.

Decision tree learning is a method commonly used in data mining (see, e.g., [27]). For example, in CART (Classification and Regression Trees, [5]), splits are made perpendicular to the axes based on the notion of Gini impurity. Splits are performed until all data are isolated. In a second phase, nodes are recombined from the bottom up in a process called pruning. It is this second process that makes the CART trees non-cellular, as global information is shared to manage the recombination process. Quinlan's C4.5 [26] also prunes. Others split until all nodes or cells are homogeneous (i.e., have the same class); the prime example is Quinlan's ID3 [25]. This strategy, while compliant with the cellular framework, leads to non-consistent rules, as we point out in the present paper. In fact, the choice of a good stopping rule for decision trees is very hard; we were not able to find any in the literature that guarantees convergence to the Bayes error.

P. Auer et al. (Eds.): ALT 2014, LNAI 8776, pp. 8–17, 2014.
© Springer International Publishing Switzerland 2014

2 Tree Classifiers

In the design of classifiers, we have an unknown distribution of a random prototype pair (X, Y), where X takes values in R^d and Y takes only finitely many values, say 0 or 1 for simplicity. Classical pattern recognition deals with predicting the unknown nature Y of the observation X via a measurable classifier g: R^d → {0, 1}. We make a mistake if g(X) differs from Y, and the probability of error for a particular decision rule g is L(g) = P{g(X) ≠ Y}. The Bayes classifier

g*(x) = 1 if P{Y = 1 | X = x} > P{Y = 0 | X = x}, and g*(x) = 0 otherwise,

has the smallest probability of error, that is,

L* = L(g*) = inf_{g: R^d → {0,1}} P{g(X) ≠ Y}

(see, for instance, Theorem 2.1 in [7]). However, most of the time, the distribution of (X, Y) is unknown, so that the optimal decision g* is unknown too. We do not consult an expert to try to reconstruct g*, but have access to a database D_n = (X_1, Y_1), ..., (X_n, Y_n) of i.i.d. copies of (X, Y), observed in the past. We assume that D_n and (X, Y) are independent. In this context, a classification rule g_n(x; D_n) is a Borel measurable function of x and D_n, and it attempts to estimate Y from x and D_n. For simplicity, we suppress D_n in the notation and write g_n(x) instead of g_n(x; D_n).

The probability of error of a given classifier g_n is the random variable

L(g_n) = P{g_n(X) ≠ Y | D_n},

and the rule is consistent if

lim_{n→∞} E L(g_n) = L*.

It is universally consistent if it is consistent for all possible distributions of (X, Y). Many popular classifiers are universally consistent. These include several brands of histogram rules, k-nearest neighbor rules, kernel rules, neural networks, and tree classifiers. There are too many references to be cited here, but the monographs by [7] and [15] will provide the reader with a comprehensive introduction to the domain and a literature review.
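The definitions above can be checked on a toy discrete distribution; the regression function eta below is hypothetical, and the identity L* = E[min(eta(X), 1 − eta(X))] is the standard form of the Bayes error for binary Y:

```python
def bayes_classifier(eta):
    """g*(x) = 1 iff P(Y=1 | X=x) > P(Y=0 | X=x), i.e. iff eta(x) > 1/2."""
    return lambda x: 1 if eta(x) > 0.5 else 0

def bayes_error(eta, xs, px):
    """L* = E[min(eta(X), 1 - eta(X))] for a discrete X supported on the
    points xs with probabilities px."""
    return sum(p * min(eta(x), 1.0 - eta(x)) for x, p in zip(xs, px))

eta = lambda x: 0.9 if x < 0 else 0.2   # hypothetical regression function
g_star = bayes_classifier(eta)
L_star = bayes_error(eta, xs=[-1.0, 1.0], px=[0.5, 0.5])   # 0.15 up to rounding
```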

Fig. 1. A binary tree (left) and the corresponding partition (right)

Trees have been suggested as tools for classification for more than thirty years. We mention in particular the early work of Fu [36, 1, 21, 18, 24]. Other references from the 1970s include [20, 3, 23, 30, 34, 12, 8]. Most influential in the classification tree literature was the CART proposal by [5]. While CART proposes partitions by hyperrectangles, linear hyperplanes in general position have also gained in popularity; the early work on that topic is by [19] and [22]. Additional references on tree classification include [14, 2, 16, 17, 35, 33, 31, 6, 9, 10, 32, 13].

3 Cellular Trees

In general, classification trees partition R^d into regions, often hyperrectangles parallel to the axes (an example is depicted in Figure 1). Of interest in this article are binary trees, where each node has exactly 0 or 2 children. If a node u represents the set A and its children u_1, u_2 represent A_1, A_2, then it is required that A = A_1 ∪ A_2 and A_1 ∩ A_2 = ∅. The root of the tree represents R^d, and the terminal nodes (or leaves), taken together, form a partition of R^d. If a leaf represents region A, then the tree classifier takes the simple form

g_n(x) = 1 if Σ_{i=1}^n 1_[X_i ∈ A, Y_i = 1] > Σ_{i=1}^n 1_[X_i ∈ A, Y_i = 0], for x ∈ A, and g_n(x) = 0 otherwise.

That is, in every leaf region, a majority vote is taken over all (X_i, Y_i)'s with X_i's in the same region. Ties are broken, by convention, in favor of class 0.
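A minimal sketch of this leaf rule in one dimension (toy data; ties go to class 0 as stated):

```python
def leaf_label(data, region):
    """Majority vote over the (X_i, Y_i) with X_i in the leaf region;
    ties are broken in favor of class 0, as in the text."""
    ones = sum(1 for x, y in data if region(x) and y == 1)
    zeros = sum(1 for x, y in data if region(x) and y == 0)
    return 1 if ones > zeros else 0

data = [(0.1, 1), (0.2, 1), (0.3, 0), (0.9, 0)]   # toy one-dimensional sample
in_left = lambda x: x < 0.5                        # leaf region A = [0, 0.5)
in_right = lambda x: x >= 0.5
```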

The tree structure is usually data-dependent, and indeed, it is in the construction itself where different trees differ. Thus, there are virtually infinitely many possible strategies to build classification trees. Nevertheless, despite this great diversity, all tree species end up with two fundamental questions at each node:


① Should the node be split?

② In the affirmative, what are its children?

These two questions are typically answered using global information on the tree, such as, for example, a function of the data D_n, the level of the node within the tree, the size of the data set and, more generally, any parameter connected with the structure of the tree. This parameter could be, for example, the total number k of cells in a k-partition tree, or the penalty term in the pruning of the CART algorithm (e.g., [5] and [11]).

Our cellular trees proceed from a different philosophy. In short, a cellular tree should, at each node, be able to answer questions ① and ② using local information only, without any help from the other nodes. In other words, each cell can perform as many operations as it wishes, provided it uses only the data that are transmitted to it, regardless of the general structure of the tree. Just imagine that the calculations to be carried out at the nodes are sent to different computers, possibly asynchronously, and that the system architecture is so complex that computers do not communicate. Thus, once a computer receives its data, it has to make its own decisions on ① and ② based on this data subset only, independently of the others and without knowing anything of the overall edifice. Once a data set is split, it can be given to another computer for further splitting, since the remaining data points have no influence.

Formally, a cellular binary classification tree is a machine that partitions the space recursively in the following manner. With each node we associate a subset of R^d, starting with R^d for the root node. We consider binary tree classifiers based on a class C of possible Borel subsets of R^d that can be used for splits. A typical example of such a class is the family of all hyperplanes, or the class of all hyperplanes that are perpendicular to one of the axes. Higher-order polynomial splitting surfaces can be imagined as well. The class is parametrized by a vector σ ∈ R^p. There is a splitting function f(x, σ), x ∈ R^d, σ ∈ R^p, such that R^d is partitioned into A = {x ∈ R^d : f(x, σ) ≥ 0} and B = {x ∈ R^d : f(x, σ) < 0}. Formally, a cellular split can be viewed as a family of measurable mappings (σ_m)_m from (R^d × {0, 1})^m to R^p. In this model, m is the size of the data set transmitted to the cell. Thus, for each possible input size m, we have a map. In addition, there is a family of measurable mappings (θ_m)_m from (R^d × {0, 1})^m to {0, 1} that indicate decisions: θ_m = 1 indicates that a split should be applied, while θ_m = 0 corresponds to a decision not to split. In that case, the cell acts as a leaf node in the tree. We note that (θ_m)_m and (σ_m)_m correspond to the decisions given in ① and ②. Let the data set be D_n. If θ(D_n) = 0, the root cell is final, and the space is not split. Otherwise, R^d is split into

A = {x ∈ R^d : f(x, σ(D_n)) ≥ 0} and B = {x ∈ R^d : f(x, σ(D_n)) < 0}.

The data D_n are partitioned into two groups: the first group contains all (X_i, Y_i), i = 1, ..., n, for which X_i ∈ A, and the second group all others. The groups are sent to child cells, and the process is repeated. When x ∈ R^d needs to be classified, we first determine the unique leaf set A(x) to which x belongs, and then take votes among the {Y_i : X_i ∈ A(x), i = 1, ..., n}. Classification proceeds by a majority vote, with the majority deciding the estimate g_n(x). In case of a tie, we set g_n(x) = 0.
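The recursion can be sketched in one dimension as follows. The stopping rule theta and split selector sigma below see only the data handed to their cell, as the cellular principle requires; the particular size-threshold theta is a naive choice of ours for mechanics only, and not the consistent randomized rule constructed in Section 4:

```python
def theta(cell_data):
    """Stopping rule: uses only the data handed to this cell. A plain
    size threshold is illustrative only; the paper's consistent rule
    randomizes this decision instead (Section 4)."""
    return 1 if len(cell_data) >= 4 else 0

def sigma(cell_data):
    """Split selector: the median of the (one-dimensional) X values,
    again computed from the local data only."""
    xs = sorted(x for x, _ in cell_data)
    return xs[len(xs) // 2]

def build_cell(cell_data):
    """Each cell decides to stop or split, then hands each part to a
    new independent cell.  (Assumes distinct X values, so splits shrink.)"""
    if theta(cell_data) == 0:
        ones = sum(y for _, y in cell_data)
        return ('leaf', 1 if ones > len(cell_data) - ones else 0)
    t = sigma(cell_data)
    left = [(x, y) for x, y in cell_data if x < t]
    right = [(x, y) for x, y in cell_data if x >= t]
    return ('node', t, build_cell(left), build_cell(right))

def classify(tree, x):
    """Route x to its leaf region A(x) and return the majority label."""
    while tree[0] == 'node':
        _, t, left, right = tree
        tree = left if x < t else right
    return tree[1]

data = [(0.1, 1), (0.2, 1), (0.6, 0), (0.9, 0)]   # toy one-dimensional sample
tree = build_cell(data)
```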

A cellular binary tree classifier is said to be randomized if each node in the tree has an independent copy of a uniform [0, 1] random variable associated with it, and θ and σ are mappings that have one extra real-valued component in the input. For example, we could flip an unbiased coin at each node to decide whether θ_m = 0 or θ_m = 1.

4 A Consistent Cellular Tree Classifier

At first sight, it appears that there are no universally consistent cellular tree classifiers. Consider for example complete binary trees with k full levels, i.e., there are 2^k leaf regions. We can have consistency when k is allowed to depend upon n. An example is the median tree (see Section 20.3 in [7]). When d = 1, split by finding the median element among the X_i's, so that the child sets have cardinality given by ⌊(n−1)/2⌋ and ⌈(n−1)/2⌉, where ⌊·⌋ and ⌈·⌉ are the floor and ceiling functions. The median itself does stay behind and is not sent down to the subtrees, with an appropriate convention for breaking cell boundaries as well as empty cells. Keep doing this for k rounds; in d dimensions, one can either rotate through the coordinates for median splitting, or randomize by selecting uniformly at random a coordinate to split orthogonally.
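The child cardinalities can be checked directly; a one-line sketch:

```python
import math

def median_split_sizes(n):
    """Median split of n points: the median stays behind, and the two
    children receive floor((n-1)/2) and ceil((n-1)/2) points."""
    return (n - 1) // 2, math.ceil((n - 1) / 2)
```

For example, `median_split_sizes(10)` gives (4, 5): the two children together hold n − 1 points, the median being kept back.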

This rule is known to be consistent as soon as the marginal distributions of X are nonatomic, provided k → ∞ and k 2^k / n → 0. However, this is not a cellular tree classifier. While we can indeed specify σ_m, it is impossible to define θ_m because θ_m cannot be a function of the global value of n. In other words, if we were to apply median splitting and decide to split for a fixed k, then the leaf nodes would all correspond to a fixed proportion of the data points. It is clear that the decisions in the leaves are off with a fair probability if we have, for example, Y independent of X and P{Y = 1} = 1/2. Thus, we cannot create a cellular tree classifier in this manner.

In view of the preceding discussion, it seems paradoxical that there indeed exist universally consistent cellular tree classifiers. (We note here that we abuse the word "universal": we will assume throughout, to keep the discussion at a manageable level, that the marginal distributions of X are nonatomic. But no other conditions on the joint distribution of (X, Y) are imposed.) Our construction follows the median tree principle and uses randomization. The original work on the solution appears in [4].

From now on, to keep things simple, it is assumed that the marginal distributions of X are nonatomic. The cellular splitting method σ_m described in this section mimics the median tree classifier discussed above. We first choose a dimension to cut, uniformly at random from the d dimensions, as rotating through the dimensions by level number would violate the cellular condition. The selected dimension is then split at the data median, just as in the classical median tree. Repeating this for k levels of nodes leads to 2^k leaf regions. On any


path of length k to one of the 2^k leaves, we have a deterministic sequence of cardinalities n_0 = n (the root), n_1, n_2, ..., n_k. We always have n_i/2 − 1 ≤ n_{i+1} ≤ n_i/2. Thus, by induction, one easily shows that, for all i,

n/2^i − 2 ≤ n_i ≤ n/2^i.

In particular, each leaf has at least max(n/2^k − 2, 0) points and at most n/2^k. The novelty is in the choice of the decision function. This function ignores the data altogether and uses a randomized decision that is based on the size of the input. More precisely, consider a nonincreasing function ϕ: N → (0, 1] with ϕ(0) = ϕ(1) = 1. Cells correspond in a natural way to sets of R^d. So, we can and will speak of a cell A, where A ⊂ R^d. The number of data points in A is denoted by N(A):

N(A) = Σ_{i=1}^n 1_[X_i ∈ A].

Then, if U is the uniform [0, 1] random variable associated with the cell A and the input to the cell is N(A), the stopping rule ① takes the form:

① Put θ = 0 if U ≤ ϕ(N(A)).

In this manner, we obtain a possibly infinite randomized binary tree classifier. Splitting occurs with probability 1 − ϕ(m) on inputs of size m. Note that no attempt is made to split empty sets or singleton sets. For consistency, we need to look at the random leaf region to which X belongs. This is roughly equivalent to studying the distance from that cell to the root of the tree.

In the sequel, the notation u_n = o(v_n) (respectively, u_n = ω(v_n) and u_n = O(v_n)) means that u_n/v_n → 0 (respectively, v_n/u_n → 0 and u_n ≤ C v_n for some constant C) as n → ∞. Many choices ϕ(m) = o(1), but not all, will do for us. The next lemma makes things more precise.

Lemma 1. Let β ∈ (0, 1). Define

ϕ(m) = 1 if m < 3, and ϕ(m) = 1/log^β m if m ≥ 3.

Let K(X) denote the random path distance between the cell of X and the root of the tree. Then

lim_{n→∞} P{K(X) ≥ k_n} = 0 if k_n = ω(log^β n), and
lim_{n→∞} P{K(X) ≥ k_n} = 1 if k_n = o(log^β n).

Proof. Let us recall that, at level k, each cell of the underlying median tree contains at least max(n/2^k − 2, 0) points and at most n/2^k. Since the function ϕ(·) is nonincreasing, the first result follows from this:
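A direct transcription of Lemma 1's stopping function and rule ①, as we read them; the seed and the value of β are arbitrary:

```python
import math
import random

def phi(m, beta):
    """Lemma 1's choice: phi(m) = 1 for m < 3 and 1/(log m)^beta for
    m >= 3; nonincreasing, with phi(0) = phi(1) = 1 as required."""
    return 1.0 if m < 3 else 1.0 / math.log(m) ** beta

def stopping_decision(m, beta, rng):
    """Rule ①: draw the cell's uniform U and put theta = 0 (leaf) if
    U <= phi(m); a cell of size m thus splits with probability 1 - phi(m)."""
    return 0 if rng.random() <= phi(m, beta) else 1

rng = random.Random(0)   # arbitrary seed
decision = stopping_decision(500, 0.5, rng)
```

Since phi(m) tends to 0, large cells split with probability approaching 1, while empty and singleton cells (phi = 1) always become leaves.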
