Algorithmic Learning Theory: 25th International Conference, ALT 2014, Bled, Slovenia, October 8–10, 2014, Proceedings. 1st Edition. Peter Auer.


Peter Auer

Alexander Clark

Thomas Zeugmann

Sandra Zilles (Eds.)

Algorithmic Learning Theory

25th International Conference, ALT 2014, Bled, Slovenia, October 8–10, 2014, Proceedings

LNAI 8776

Subseries of Lecture Notes in Computer Science

LNAI Series Editors

Randy Goebel, University of Alberta, Edmonton, Canada

Yuzuru Tanaka, Hokkaido University, Sapporo, Japan

Wolfgang Wahlster, DFKI and Saarland University, Saarbrücken, Germany

LNAI Founding Series Editor

Joerg Siekmann, DFKI and Saarland University, Saarbrücken, Germany

Lecture Notes in Artificial Intelligence 8776

Peter Auer, Alexander Clark, Thomas Zeugmann, Sandra Zilles (Eds.)

Algorithmic Learning Theory

25th International Conference, ALT 2014, Bled, Slovenia, October 8–10, 2014

Proceedings

Volume Editors

Peter Auer
Montanuniversität Leoben, Austria
E-mail: auer@unileoben.ac.at

Alexander Clark
King's College London, Department of Philosophy, UK
E-mail: alexander.clark@kcl.ac.uk

Thomas Zeugmann
Hokkaido University, Division of Computer Science, Sapporo, Japan
E-mail: thomas@ist.hokudai.ac.jp

Sandra Zilles
University of Regina, Department of Computer Science, Regina, SK, Canada
E-mail: zilles@cs.uregina.ca

ISSN 0302-9743; e-ISSN 1611-3349
ISBN 978-3-319-11661-7; e-ISBN 978-3-319-11662-4
DOI 10.1007/978-3-319-11662-4
Springer Cham Heidelberg New York Dordrecht London

Library of Congress Control Number: 2014948640

LNCS Sublibrary: SL 7 – Artificial Intelligence

© Springer International Publishing Switzerland 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

This volume contains the papers presented at the 25th International Conference on Algorithmic Learning Theory (ALT 2014), which was held in Bled, Slovenia, during October 8–10, 2014. ALT 2014 was co-located with the 17th International Conference on Discovery Science (DS 2014). The technical program of ALT 2014 had 4 invited talks (presented jointly to both ALT 2014 and DS 2014) and 21 papers selected from 50 submissions by the ALT Program Committee.

ALT 2014 took place in the Hotel Golf, in a beautiful park full of old trees in the very heart of Bled. It provided a stimulating interdisciplinary forum to discuss the theoretical foundations of machine learning as well as their relevance to practical applications.

ALT is dedicated to the theoretical foundations of machine learning and provides a forum for high-quality talks and scientific interaction in areas such as reinforcement learning, inductive inference and grammatical inference, learning from queries, active learning, probably approximately correct learning, online learning, bandit theory, statistical learning theory, Bayesian and stochastic learning, unsupervised or semi-supervised learning, clustering, universal prediction, stochastic optimization, high-dimensional and non-parametric inference, information-based methods, decision tree methods, kernel-based methods, graph methods and/or manifold-based methods, sample complexity, complexity of learning, privacy-preserving learning, learning based on Kolmogorov complexity, new learning models, and applications of algorithmic learning theory.

The present volume of LNAI contains the text of the 21 papers presented at ALT 2014, as well as the texts/abstracts of the invited talks:

– Zoubin Ghahramani (University of Cambridge, Cambridge, UK), "Building an Automated Statistician" (joint invited speaker for ALT 2014 and DS 2014),

– Luc Devroye (McGill University, Montreal, Canada), "Cellular Tree Classifiers" (invited speaker for ALT 2014),

– Eyke Hüllermeier (Universität Paderborn, Germany), "A Survey of Preference-Based Online Learning with Bandit Algorithms" (tutorial speaker for ALT 2014),

– Anuška Ferligoj (University of Ljubljana, Slovenia), "Social Network Analysis" (tutorial speaker for DS 2014).

Since 1999, ALT has been awarding the E.M. Gold Award for the most outstanding student contribution. This year, the award was given to Hasan Abasi and Ali Z. Abdi for their paper "Learning Boolean Halfspaces with Small Weights from Membership Queries", co-authored by Nader H. Bshouty.

ALT 2014 was the 25th meeting in the ALT conference series, established in Japan in 1990. The ALT series is supervised by its Steering Committee: Peter Auer (University of Leoben, Austria), Shai Ben-David (University of Waterloo, Canada), Nader H. Bshouty (Technion - Israel Institute of Technology, Israel), Alexander Clark (King's College London, UK), Marcus Hutter (Australian National University, Canberra, Australia), Jyrki Kivinen (University of Helsinki, Finland), Frank Stephan (National University of Singapore, Republic of Singapore), Gilles Stoltz (École normale supérieure, Paris, France), Csaba Szepesvári (University of Alberta, Edmonton, Canada), Eiji Takimoto (Kyushu University, Fukuoka, Japan), György Turán (University of Illinois at Chicago, USA, and University of Szeged, Hungary), Akihiro Yamamoto (Kyoto University, Japan), Thomas Zeugmann (Chair, Hokkaido University, Sapporo, Japan), and Sandra Zilles (Co-chair, University of Regina, Saskatchewan, Canada).

We thank the various people and institutions who contributed to the success of the conference. Most importantly, we would like to thank the authors for contributing and presenting their work at the conference. Without their contribution this conference would not have been possible. We would like to thank the Office of Naval Research Global for the generous financial support for the conference ALT 2014, provided under ONRG grant N62909-14-1-C195.

ALT 2014 and DS 2014 were organized by the Jožef Stefan Institute (JSI) and the University of Ljubljana. We are very grateful to the Department of Knowledge Technologies (and the project MAESTRA) at JSI for sponsoring the conferences and providing administrative support. In particular, we thank the local arrangements chair, Mili Bauer, and her team, Tina Anžič, Nikola Simidjievski, and Jurica Levatić from JSI, for their efforts in organizing the two conferences.

We are grateful for the collaboration with the conference series Discovery Science. In particular we would like to thank the general chair of DS 2014 and ALT 2014, Ljupčo Todorovski, and the DS 2014 Program Committee chairs Sašo Džeroski, Dragi Kocev, and Panče Panov.

We are also grateful to EasyChair, the excellent conference management system, which was used for putting together the program for ALT 2014. EasyChair was developed mainly by Andrei Voronkov and is hosted at the University of Manchester. The system is cost-free.

We are grateful to the members of the Program Committee for ALT 2014 and the subreferees for their hard work in selecting a good program for ALT 2014. Last but not least, we thank Springer for their support in preparing and publishing this volume in the Lecture Notes in Artificial Intelligence series.

August 2014

Peter Auer
Alexander Clark
Thomas Zeugmann
Sandra Zilles

Organization

General Chair for ALT 2014 and DS 2014

Ljupčo Todorovski, University of Ljubljana, Slovenia

Program Committee

Nir Ailon, Technion, Israel
András Antos, SZIT, Hungary
Peter Auer (Chair), Montanuniversität Leoben, Austria
Shai Ben-David, University of Waterloo, Canada
Sébastien Bubeck, Princeton University, USA
Alexander Clark (Chair), King's College London, UK
Corinna Cortes, Google, USA
Vitaly Feldman, IBM Research, USA
Claudio Gentile, Università degli Studi dell'Insubria, Italy
Steve Hanneke, Carnegie Mellon University, USA
Kohei Hatano, Kyushu University, Japan
Sanjay Jain, National University of Singapore, Singapore
Timo Kötzing, Friedrich-Schiller-Universität, Germany
Eric Martin, University of New South Wales, Australia
Mehryar Mohri, Courant Institute of Mathematical Sciences, USA
Rémi Munos, Inria, France
Ronald Ortner, Montanuniversität Leoben, Austria
Lev Reyzin, University of Illinois at Chicago, USA
Daniil Ryabko, Inria, France
Sivan Sabato, Microsoft Research New England, USA
Masashi Sugiyama, Tokyo Institute of Technology, Japan
Csaba Szepesvári, University of Alberta, Canada
John Shawe-Taylor, University College London, UK
Vladimir Vovk, Royal Holloway, University of London, UK
Sandra Zilles, University of Regina, Canada

Local Arrangements Chair

Mili Bauer, Jožef Stefan Institute, Ljubljana

Subreferees

Yasin Abbasi-Yadkori, Cyril Allauzen, Kareem Amin, Bernardo Ávila Pires, Nicolò Cesa-Bianchi, Alexey Chernov, Rong Ge, Nick Gravin, Hirokazu Kameoka, Varun Kanade, Takafumi Kanamori, Tomáš Kocák, Vitaly Kuznetsov, Alessandro Lazaric, Guy Lever, Ben London, Phil Long, Yao Ma, Odalric-Ambrym Maillard, Irini-Eleftheria Mens, Tetsuro Morimura, Andres Munoz, Shinichi Nakajima, Gergely Neu, Cecilia Procopiuc, Daniel Russo, Jun Sakuma, Pavel Semukhin, Ohad Shamir, Aleksandrs Slivkins, Adam Smith, Umar Syed, Eiji Takimoto, Matus Telgarsky, Zheng Wen, Makoto Yamada, Grigory Yaroslavtsev, Nikolai Zolotykh

Sponsoring Institutions

Office of Naval Research Global, ONRG grant N62909-14-1-C195
Jožef Stefan Institute, Ljubljana
University of Ljubljana

Invited Abstracts

A Survey of Preference-Based Online Learning with Bandit Algorithms

Róbert Busa-Fekete and Eyke Hüllermeier

Department of Computer Science, University of Paderborn, Germany
{busarobi,eyke}@upb.de

Abstract. In machine learning, the notion of multi-armed bandits refers to a class of online learning problems in which an agent is supposed to simultaneously explore and exploit a given set of choice alternatives in the course of a sequential decision process. In the standard setting, the agent learns from stochastic feedback in the form of real-valued rewards. In many applications, however, numerical reward signals are not readily available; instead, only weaker information is provided, in particular relative preferences in the form of qualitative comparisons between pairs of alternatives. This observation has motivated the study of variants of the multi-armed bandit problem in which more general representations are used both for the type of feedback to learn from and the target of prediction. The aim of this paper is to provide a survey of the state of the art in this field, which we refer to as preference-based multi-armed bandits. To this end, we provide an overview of problems that have been considered in the literature as well as methods for tackling them. Our systematization is mainly based on the assumptions made by these methods about the data-generating process and, related to this, the properties of the preference-based feedback.

Keywords: Multi-armed bandits, online learning, preference learning, ranking, top-k selection, exploration/exploitation, cumulative regret, sample complexity, PAC learning.

Cellular Tree Classifiers

Gérard Biau¹,² and Luc Devroye³

¹ Sorbonne Universités, UPMC Univ Paris 06, France
² Institut universitaire de France
³ McGill University, Canada

Abstract. Suppose that binary classification is done by a tree method in which the leaves of a tree correspond to a partition of d-space. Within a partition, a majority vote is used. Suppose furthermore that this tree must be constructed recursively by implementing just two functions, so that the construction can be carried out in parallel by using "cells": first of all, given input data, a cell must decide whether it will become a leaf or an internal node in the tree. Secondly, if it decides on an internal node, it must decide how to partition the space linearly. Data are then split into two parts and sent downstream to two new independent cells. We discuss the design and properties of such classifiers.

Social Network Analysis

Anuška Ferligoj

Faculty of Social Sciences, University of Ljubljana
anuska.ferligoj@fdv.uni-lj.si

Abstract. Social network analysis has attracted considerable interest from the social and behavioral science community in recent decades. Much of this interest can be attributed to the focus of social network analysis on relationships among units, and on the patterns of these relationships. Social network analysis is a rapidly expanding and changing field with a broad range of approaches, methods, models, and substantive applications. In the talk special attention will be given to:

1. General introduction to social network analysis:

– What are social networks?

– Data collection issues.

– Basic network concepts: network representation; types of networks; size and density.

– Walks and paths in networks: length and value of a path; the shortest path; k-neighbours; acyclic networks.

– Connectivity: weakly, strongly, and bi-connected components; contraction; extraction.

2. Overview of tasks and corresponding methods:

– Network/node properties: centrality (degree, closeness, betweenness); hubs and authorities.

– Cohesion: triads, cliques, cores, islands.

– Partitioning: blockmodeling (direct and indirect approaches; structural and regular equivalence; generalised blockmodeling); clustering.

– Statistical models.

3. Software for social network analysis (UCINET, PAJEK, ...).

Building an Automated Statistician

Zoubin Ghahramani

Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, UK
zoubin@eng.cam.ac.uk

Abstract. We live in an era of abundant data, and there is an increasing need for methods to automate data analysis and statistics. I will describe the "Automated Statistician", a project which aims to automate the exploratory analysis and modelling of data. Our approach starts by defining a large space of related probabilistic models via a grammar over models, and then uses Bayesian marginal likelihood computations to search over this space for one or a few good models of the data. The aim is to find models which have both good predictive performance and are somewhat interpretable. Our initial work has focused on the learning of unknown nonparametric regression functions, and on learning models of time series data, both using Gaussian processes. Once a good model has been found, the Automated Statistician generates a natural language summary of the analysis, producing a 10–15 page report with plots and tables describing the analysis. I will discuss challenges such as: how to trade off predictive performance and interpretability, how to translate complex statistical concepts into natural language text that is understandable by a numerate non-statistician, and how to integrate model checking. This is joint work with James Lloyd and David Duvenaud (Cambridge) and Roger Grosse and Josh Tenenbaum (MIT).

Table of Contents

Editors' Introduction .......................................... 1
Peter Auer, Alexander Clark, Thomas Zeugmann, and Sandra Zilles

Full Invited Papers

Cellular Tree Classifiers ...................................... 8
Gérard Biau and Luc Devroye

A Survey of Preference-Based Online Learning with Bandit Algorithms ... 18
Róbert Busa-Fekete and Eyke Hüllermeier

Regular Contributions

Inductive Inference

A Map of Update Constraints in Inductive Inference ............. 40
Timo Kötzing and Raphaela Palenta

On the Role of Update Constraints and Text-Types in Iterative Learning ... 55
Sanjay Jain, Timo Kötzing, Junqi Ma, and Frank Stephan

Parallel Learning of Automatic Classes of Languages ............ 70
Sanjay Jain and Efim Kinber

Algorithmic Identification of Probabilities Is Hard ............ 85
Laurent Bienvenu, Benoît Monin, and Alexander Shen

Exact Learning from Queries

Learning Boolean Halfspaces with Small Weights from Membership Queries ... 96
Hasan Abasi, Ali Z. Abdi, and Nader H. Bshouty

On Exact Learning Monotone DNF from Membership Queries ......... 111
Hasan Abasi, Nader H. Bshouty, and Hanna Mazzawi

Learning Regular Omega Languages ............................... 125
Dana Angluin and Dana Fisman

Reinforcement Learning

Selecting Near-Optimal Approximate State Representations in Reinforcement Learning ... 140
Ronald Ortner, Odalric-Ambrym Maillard, and Daniil Ryabko

Policy Gradients for CVaR-Constrained MDPs ..................... 155
L.A. Prashanth

Bayesian Reinforcement Learning with Exploration ............... 170
Tor Lattimore and Marcus Hutter

Extreme State Aggregation beyond MDPs .......................... 185
Marcus Hutter

Online Learning and Learning with Bandit Information

On Learning the Optimal Waiting Time ........................... 200
Tor Lattimore, András György, and Csaba Szepesvári

Bandit Online Optimization over the Permutahedron .............. 215
Nir Ailon, Kohei Hatano, and Eiji Takimoto

Offline to Online Conversion ................................... 230
Marcus Hutter

Statistical Learning Theory

A Chain Rule for the Expected Suprema of Gaussian Processes .... 245
Andreas Maurer

Generalization Bounds for Time Series Prediction with Non-stationary Processes ... 260
Vitaly Kuznetsov and Mehryar Mohri

Generalizing Labeled and Unlabeled Sample Compression to Multi-label Concept Classes ... 275
Rahim Samei, Boting Yang, and Sandra Zilles

Privacy, Clustering, MDL, and Kolmogorov Complexity

Robust and Private Bayesian Inference .......................... 291
Christos Dimitrakakis, Blaine Nelson, Aikaterini Mitrokotsa, and Benjamin I.P. Rubinstein

Clustering, Hamming Embedding, Generalized LSH and the Max Norm ... 306
Behnam Neyshabur, Yury Makarychev, and Nathan Srebro

Indefinitely Oscillating Martingales ........................... 321
Jan Leike and Marcus Hutter

A Safe Approximation for Kolmogorov Complexity ................. 336
Peter Bloem, Francisco Mota, Steven de Rooij, Luís Antunes, and Pieter Adriaans

Author Index ................................................... 351

Editors’Introduction

PeterAuer,AlexanderClark,ThomasZeugmann,andSandraZilles

TheaimoftheseriesofconferencesonAlgorithmicLearningTheory(ALT) istolookatlearningfromanalgorithmicandmathematicalperspective.Over timeseveralmodelsoflearninghavebeendevelopedwhichstudydifferentaspectsoflearning.Inthefollowingwedescribeinbrieftheinvitedtalksandthe contributedpapersforALT2014heldinBled,Slovenia..

InvitedTalks. Followingthetraditionoftheco-locatedconferencesALTand DSallinvitedlecturesaresharedbythe twoconferences.Theinvitedspeakers areeminentresearchersintheirfieldsandpresenteithertheirspecificresearch areaorlectureaboutatopicofbroaderinterest.

Thisyear’sjointinvitedspeakerforALT2014andDS2014isZoubinGhahramani,whoisProfessorofInformationEngineeringattheUniversityofCambridge,UK,whereheleadsagroupofabout30researchers.Hestudiedcomputer scienceandcognitivescienceattheUniversityofPennsylvania,obtainedhisPhD fromMITin1995underthesupervisionofMichaelJordan,andwasapostdoctoralfellowattheUniversityofTorontowithGeoffreyHinton.Hisacademic careerincludesconcurrentappointmentsasoneofthefoundingmembersofthe GatsbyComputationalNeuroscienceUnitinLondon,andasafacultymemberof CMU’sMachineLearningDepartmentforover10years.HiscurrentresearchfocusesonnonparametricBayesianmodelingandstatisticalmachinelearning.He hasalsoworkedonapplicationstobioinformatics,econometrics,andavarietyof large-scaledatamodelingproblems.He haspublishedover200papers,receiving 25,000citations(anh-indexof68).HisworkhasbeenfundedbygrantsanddonationsfromEPSRC,DARPA,Microsoft,Google,Infosys,Facebook,Amazon, FXConceptsandanumberofotherindustrialpartners.In2013,hereceived a$750,000GoogleAwardforresearchonbuildingtheAutomaticStatistician. Inhisinvitedtalk BuildinganAutomatedStatistician (jointworkwithJames Lloyd,DavidDuvenaud,RogerGrosse,andJoshTenenbaum)ZoubinGhahramaniaddressestheproblemofabundantdataandtheincreasingneedformethodstoautomatedataanalysisandstatistics.TheAutomatedStatisticianproject aimstoautomatetheexploratoryanalysisandmodelingofdata.Theapproach usesBayesianmarginallikelihoodcomputationstosearchoveralargespaceof relatedprobabilisticmodels.Onceagoodmodelhasbeenfound,theAutomated Statisticiangeneratesanaturallanguagesummaryoftheanalysis,producinga 10-15pagereportwithplotsandtablesdescribingtheanalysis.ZoubinGhahramanidiscusseschallengessuchas:howtotradeoffpredictiveperformanceand interpretability,howtotranslatecomplexstatisticalconceptsintonaturallanguagetextthatisunderstandablebyanumeratenon-statistician,andhowto integratemodelchecking.

TheinvitedspeakerforALT2014isLucDevroye,whoisaJamesMcGillProfessorintheSchoolofComputerScienceofMcGillUniversityinMontreal.He

P.Aueretal.(Eds.):ALT2014,LNAI8776,pp.1–7,2014. c SpringerInternationalPublishingSwitzerland2014

He studied at Katholieke Universiteit Leuven and subsequently at Osaka University, and in 1976 received his PhD from the University of Texas at Austin under the supervision of Terry Wagner. Luc Devroye specializes in the probabilistic analysis of algorithms and random number generation, and enjoys typography. Since joining the McGill faculty in 1977 he has won numerous awards, including an E.W.R. Steacie Memorial Fellowship (1987), a Humboldt Research Award (2004), the Killam Prize (2005), and the Statistical Society of Canada gold medal (2008). He received an honorary doctorate from the Université catholique de Louvain in 2002, and an honorary doctorate from Universiteit Antwerpen in 2012. The invited paper Cellular Tree Classifiers (joint work with Gérard Biau) deals with classification by decision trees, where the decision trees are constructed recursively by using only two local rules: (1) given the input data to a node, it must decide whether it will become a leaf or not, and (2) a non-leaf node needs to decide how to split the data for sending them downstream. The important point is that each node can make these decisions based only on its local data, so that the decision tree construction can be carried out in parallel. Somewhat surprisingly, there are such local rules that guarantee convergence of the decision tree error to the Bayes optimal error. Luc Devroye discusses the design and properties of such classifiers.
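The two local rules can be sketched as a short recursive procedure. The sketch below is only an illustrative caricature (an axis-aligned median split and a size-based stopping rule), not the specific rules analyzed in the paper; all function names are made up for the example.

```python
# Illustrative cellular tree: each "cell" sees only its local data and
# implements two functions -- decide leaf vs. internal node, and choose
# a linear (here: axis-aligned median) split. Not the paper's rules.
def build_cell(points, labels, min_size=4):
    # Rule 1: become a leaf on small or pure samples; predict by majority vote.
    if len(points) <= min_size or len(set(labels)) == 1:
        return ("leaf", max(set(labels), key=list(labels).count))
    # Rule 2: split at the median of the first coordinate; each half is
    # sent downstream to a new, independent cell.
    xs = sorted(p[0] for p in points)
    cut = xs[len(xs) // 2]
    left = [(p, y) for p, y in zip(points, labels) if p[0] < cut]
    right = [(p, y) for p, y in zip(points, labels) if p[0] >= cut]
    if not left or not right:  # degenerate split: stop
        return ("leaf", max(set(labels), key=list(labels).count))
    return ("node", cut,
            build_cell(*zip(*left)), build_cell(*zip(*right)))

def classify(tree, p):
    while tree[0] == "node":
        _, cut, left, right = tree
        tree = left if p[0] < cut else right
    return tree[1]

pts = [(0.1,), (0.2,), (0.8,), (0.9,)]
ys = [0, 0, 1, 1]
tree = build_cell(pts, ys, min_size=1)
```

Note how each recursive call receives only the data routed to it, which is exactly what makes a parallel, cell-by-cell construction possible.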

The ALT 2014 tutorial speaker is Eyke Hüllermeier, who is professor and head of the Intelligent Systems Group at the Department of Computer Science of the University of Paderborn. He received his PhD in Computer Science from the University of Paderborn in 1997, and he also holds an MSc degree in business informatics. He was a researcher in artificial intelligence, knowledge-based systems, and statistics at the University of Paderborn and the University of Dortmund, and a Marie Curie fellow at the Institut de Recherche en Informatique de Toulouse. He had already held a full professorship in the Department of Mathematics and Computer Science at Marburg University before rejoining the University of Paderborn. In his tutorial A Survey of Preference-based Online Learning with Bandit Algorithms (joint work with Róbert Busa-Fekete), Eyke Hüllermeier reports on learning with bandit feedback that is weaker than the usual real-valued reward. When learning with bandit feedback, the learning algorithm receives feedback only from the decisions it makes, but no information from other alternatives. Thus the learning algorithm needs to simultaneously explore and exploit a given set of alternatives in the course of a sequential decision process. In many applications the feedback is not a numerical reward signal but some weaker information, in particular relative preferences in the form of qualitative comparisons between pairs of alternatives. This observation has motivated the study of variants of the multi-armed bandit problem, in which more general representations are used both for the type of feedback to learn from and the target of prediction. The aim of the tutorial is to provide a survey of the state of the art in this area, which is referred to as preference-based multi-armed bandits. To this end, Eyke Hüllermeier provides an overview of problems that have been considered in the literature as well as methods for tackling them. His systematization is mainly based on the assumptions made by these methods about the data-generating process and, related to this, the properties of the preference-based feedback.

The DS 2014 tutorial speaker is Anuška Ferligoj, who is professor of Multivariate Statistical Methods at the University of Ljubljana. She is a Slovenian mathematician who earned international recognition by her research work on network analysis. Her interests include multivariate analysis (constrained and multicriteria clustering), social networks (measurement quality and blockmodeling), and survey methodology (reliability and validity of measurement). She is a fellow of the European Academy of Sociology. She has also been an editor of the journal Advances in Methodology and Statistics (Metodološki zvezki) since 2004 and is a member of the editorial boards of the Journal of Mathematical Sociology, Journal of Classification, Social Networks, Statistics in Transition, Methodology, and Structure and Dynamics: eJournal of Anthropology and Related Sciences. She was a Fulbright scholar in 1990 and a visiting professor at the University of Pittsburgh. She was awarded the title of Ambassador of Science of the Republic of Slovenia in 1997. Social network analysis has attracted considerable interest from the social and behavioral science community in recent decades. Much of this interest can be attributed to the focus of social network analysis on relationships among units, and on the patterns of these relationships. Social network analysis is a rapidly expanding and changing field with a broad range of approaches, methods, models, and substantive applications. In her tutorial Social Network Analysis, Anuška Ferligoj gives a general introduction to social network analysis and an overview of tasks and corresponding methods, accompanied by pointers to software for social network analysis.

Inductive Inference. There are a number of papers in the field of inductive inference, the most classical branch of algorithmic learning theory. First, A Map of Update Constraints in Inductive Inference by Timo Kötzing and Raphaela Palenta provides a systematic overview of various constraints on learners in inductive inference problems. They focus on the question of which constraints and combinations of constraints reduce the learning power, meaning the class of languages that are learnable with respect to certain criteria.

On a related theme, the paper On the Role of Update Constraints and Text-Types in Iterative Learning by Sanjay Jain, Timo Kötzing, Junqi Ma, and Frank Stephan looks more specifically at the case where the learner has no memory beyond the current hypothesis. In this situation the paper is able to completely characterize the relations between the various constraints.

The paper Parallel Learning of Automatic Classes of Languages by Sanjay Jain and Efim Kinber continues the line of research on learning automatic classes of languages initiated by Jain, Luo, and Stephan in 2012, in this case by considering the problem of learning multiple distinct languages at the same time.

Laurent Bienvenu, Benoît Monin, and Alexander Shen present a negative result in their paper Algorithmic Identification of Probabilities Is Hard. They show that it is impossible to identify in the limit the exact parameter (in the sense of the Turing code for a computable real number) of a Bernoulli distribution, though it is of course easy to approximate it.

Editors’Introduction3

Exact Learning from Queries. In cases where the instance space is discrete, it is reasonable to aim at exact learning algorithms, where the learner is required to produce a hypothesis that is exactly correct.

The paper winning the E.M. Gold Award, Learning Boolean Halfspaces with Small Weights from Membership Queries, by the student authors Hasan Abasi and Ali Z. Abdi, co-authored by Nader H. Bshouty, presents a significantly improved algorithm for learning Boolean halfspaces in {0,1}^n with integer weights in {0,...,t} from membership queries only. It is shown that this algorithm needs only n^O(t) membership queries, which improves over previous algorithms with n^O(t^5) queries and closes the gap to the known lower bound n^t.
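The membership-query model itself is easy to make concrete. The sketch below is a brute-force illustration of exact learning of such a halfspace for tiny n and t; it queries every point and enumerates hypotheses, so it says nothing about the n^O(t) query complexity of the awarded paper. All names are invented for the example.

```python
from itertools import product

# Target concept: a Boolean halfspace f(x) = 1 iff w.x >= theta, with
# integer weights in {0,...,t}. Hidden from the learner; accessible
# only through membership queries (i.e., evaluations f(x)).
def make_oracle(w, theta):
    return lambda x: int(sum(wi * xi for wi, xi in zip(w, x)) >= theta)

def learn_halfspace(oracle, n, t):
    """Brute-force exact learning: query every point of {0,1}^n, then
    return any (weights, threshold) pair consistent with all answers."""
    cube = list(product([0, 1], repeat=n))
    labels = {x: oracle(x) for x in cube}          # membership queries
    for w in product(range(t + 1), repeat=n):      # candidate weights
        for theta in range(n * t + 2):             # candidate thresholds
            if all(int(sum(wi * xi for wi, xi in zip(w, x)) >= theta)
                   == labels[x] for x in cube):
                return w, theta
    return None

oracle = make_oracle(w=(2, 0, 1), theta=2)
w_hat, theta_hat = learn_halfspace(oracle, n=3, t=2)
# The returned hypothesis agrees with the target on all of {0,1}^3,
# which is exactly the "exactly correct" requirement of this model.
```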

The paper by Hasan Abasi, Nader H. Bshouty, and Hanna Mazzawi, On Exact Learning Monotone DNF from Membership Queries, presents results on the learnability by membership queries of monotone DNF (disjunctive normal form) formulas with a bounded number of terms and a bounded number of variables per term.

Dana Angluin and Dana Fisman look at exact learning using membership queries and equivalence queries in their paper Learning Regular Omega Languages. Here the class concerned is that of regular languages over infinite words; the authors consider three different representations which vary in their succinctness. This problem has applications in the verification and synthesis of reactive systems.

Reinforcement Learning. Reinforcement learning continues to be a centrally important area of learning theory, and this conference contains a number of contributions in this field. Ronald Ortner, Odalric-Ambrym Maillard, and Daniil Ryabko present the paper Selecting Near-Optimal Approximate State Representations in Reinforcement Learning, which looks at the problem where the learner does not have direct information about the states in the underlying Markov Decision Process (MDP); in contrast to partially observable MDPs, here the information is via various models that map the histories to states.

L.A. Prashanth considers risk-constrained reinforcement learning in his paper Policy Gradients for CVaR-Constrained MDPs, focusing on the stochastic shortest path problem. For a risk-constrained problem, not only is the expected sum of per-step costs E[Σ_m g(s_m, a_m)] to be minimized, but the sum of an additional cost measure, C = Σ_m c(s_m, a_m), also needs to be bounded from above. Usually the Value at Risk, VaR_α = inf{ξ | P(C ≤ ξ) ≥ α}, is constrained, but such constrained problems are hard to optimize. Instead, the paper proposes to constrain the Conditional Value at Risk, CVaR_α = E[C | C ≥ VaR_α], which allows standard optimization techniques to be applied. Two algorithms are presented that converge to a locally risk-optimal policy using stochastic approximation, mini-batches, policy gradients, and importance sampling.
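The two risk measures are straightforward to estimate from a sample of trajectory costs. A minimal empirical sketch (this only illustrates the definitions; the paper's contribution is constraining CVaR inside a policy-gradient optimization, which is not shown here):

```python
import random

def var_cvar(costs, alpha):
    """Empirical VaR and CVaR of a sample of trajectory costs:
    VaR_alpha is the smallest cost xi with P(C <= xi) >= alpha;
    CVaR_alpha is the mean cost over the tail {C >= VaR_alpha}."""
    srt = sorted(costs)
    k = next(i for i in range(len(srt)) if (i + 1) / len(srt) >= alpha)
    var = srt[k]
    tail = [c for c in srt if c >= var]
    return var, sum(tail) / len(tail)

random.seed(0)
# Simulated per-trajectory cumulative costs C = sum_m c(s_m, a_m).
costs = [sum(random.random() for _ in range(10)) for _ in range(10000)]
var95, cvar95 = var_cvar(costs, alpha=0.95)
# CVaR is never below VaR, since it averages only the tail costs;
# this is one reason it is the more conservative risk measure.
```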

In contrast to the usual MDP setting for reinforcement learning, the two following papers consider more general reinforcement learning. Bayesian Reinforcement Learning with Exploration by Tor Lattimore and Marcus Hutter improves some of their earlier work on general reinforcement learning. Here the true environment does not need to be Markovian, but it is known to be drawn at random from a finite class of possible environments. An algorithm is presented that alternates between periods of playing the Bayes optimal policy and periods of forced experimentation. Upper bounds on the sample complexity are established, and it is shown that for some classes of environments this bound cannot be improved by more than a logarithmic factor.

Marcus Hutter's paper Extreme State Aggregation beyond MDPs considers how an arbitrary (non-Markov) decision process with a finite number of actions can be approximated by a finite-state MDP. For a given feature function φ: H → S mapping histories h of the general process to some finite state space S, the transition probabilities of the MDP can be defined appropriately. It is shown that the MDP approximates the general process well if the optimal policy for the general process is consistent with the feature function, π*(h_1) = π*(h_2) for φ(h_1) = φ(h_2), or if the optimal Q-value function is consistent with the feature function, |Q*(h_1, a) − Q*(h_2, a)| < ε for φ(h_1) = φ(h_2) and all a. It is also shown that such a feature function always exists.

Online Learning and Learning with Bandit Information.

The paper On Learning the Optimal Waiting Time by Tor Lattimore, András György, and Csaba Szepesvári addresses the problem of how long to wait for an event with independent and identically distributed (i.i.d.) arrival times from an unknown distribution. If the event occurs during the waiting time, then the cost is the time until arrival. If the event occurs after the waiting time, then the cost is the waiting time plus a fixed and known amount. Algorithms for the full information setting and for bandit information are presented that sequentially choose waiting times over several rounds in order to minimize the regret with respect to an optimal waiting time. For bandit information the arrival time is only revealed if it is smaller than the waiting time, while in the full information setting it is always revealed. The performance of the algorithms nearly matches the minimax lower bound on the regret.
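As a toy illustration of the cost structure (the arrival times and penalty below are hypothetical, and this brute-force search is not the paper's regret-minimizing algorithm), the empirically optimal waiting time can be found as follows:

```python
def expected_cost(tau, arrivals, penalty):
    """Average cost of waiting until time tau: pay the arrival time if
    the event occurs by tau, otherwise pay tau plus the fixed penalty."""
    return sum(a if a <= tau else tau + penalty for a in arrivals) / len(arrivals)

def best_waiting_time(arrivals, penalty):
    """Brute force over zero and the observed arrival times; between
    consecutive arrivals the empirical cost only grows with tau, so
    these candidates contain an optimum of the empirical objective."""
    candidates = [0.0] + sorted(arrivals)
    return min(candidates, key=lambda tau: expected_cost(tau, arrivals, penalty))

arrivals = [1.0, 2.0, 10.0]   # hypothetical i.i.d. arrival times
tau_star = best_waiting_time(arrivals, penalty=3.0)   # waits until 2.0
```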

In many application areas, e.g. recommendation systems, the learning algorithm should return a ranking: a permutation of some finite set of elements. This problem is studied in the paper by Nir Ailon, Kohei Hatano, and Eiji Takimoto titled Bandit Online Optimization Over the Permutahedron, where the cost of a ranking is calculated as Σ_{i=1}^n π(i)s(i), with π(i) the rank of item i and s(i) its cost. In the bandit setting, in each iteration an unknown cost vector s_t is chosen, and the goal of the algorithm is to minimize the regret with respect to the best fixed ranking of the items.
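A small sketch of this loss, with hypothetical costs; the exhaustive search over all n! rankings is only to check intuition, not something the bandit algorithm would do:

```python
from itertools import permutations

def ranking_loss(perm, s):
    """Loss of a ranking: sum_i perm[i] * s[i], where perm[i] is the
    rank (1..n) assigned to item i and s[i] is the item's cost."""
    return sum(rank * cost for rank, cost in zip(perm, s))

def best_fixed_ranking(s):
    """Exhaustive comparator over all n! rankings (illustration only)."""
    return min(permutations(range(1, len(s) + 1)),
               key=lambda perm: ranking_loss(perm, s))

s = [3.0, 1.0, 2.0]            # hypothetical per-item costs
best = best_fixed_ranking(s)   # low ranks go to the costly items
```

By the rearrangement inequality, the minimizer assigns the small ranks to the large costs, so here the best ranking is (1, 3, 2) with loss 10.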

Marcus Hutter's paper Offline to Online Conversion introduces the problem of turning a sequence of distributions q_n on strings in X^n, n = 1, 2, ..., into a stochastic online predictor for the next symbol, q̃(x_n | x_1,...,x_{n−1}), such that the induced probabilities q̃(x_1,...,x_n) are close to q_n(x_1,...,x_n) for all sequences x_1, x_2, ... The paper considers four strategies for doing such a conversion, showing that naïve approaches might not be satisfactory but that a good predictor can always be constructed, at the cost of possible computational inefficiency. One example of such a conversion gives a simple combinatorial derivation of the Good-Turing estimator.
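One natural conversion, sketched here under the assumption that each q_n is given explicitly as a table, simply conditions the horizon-n distribution on the observed prefix; the paper's point is that such simple strategies can fail in general:

```python
def naive_predictor(q_n, prefix, alphabet=(0, 1)):
    """Predict the next symbol by conditioning the horizon-n offline
    distribution q_n (a dict from length-n tuples to probabilities)
    on the observed prefix."""
    k = len(prefix) + 1
    weights = {x: sum(p for s, p in q_n.items() if s[:k] == prefix + (x,))
               for x in alphabet}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

# Hypothetical offline distribution over binary strings of length 2
q2 = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3}
pred = naive_predictor(q2, prefix=(0,))   # conditional law of x2 given x1 = 0
```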

Editors' Introduction

Statistical Learning Theory.

Andreas Maurer's paper A Chain Rule for the Expected Suprema of Gaussian Processes investigates the problem of assessing the generalization of a learner who is adapting a feature space while also learning the target function. The approach taken is to consider extensions of bounds on Gaussian averages to the case where there is a class of functions that create features and a class of mappings from those features to outputs. In the applications considered in the paper, this corresponds to a two-layer kernel machine, to multitask learning, and, through iterated application of the bound, to multilayer networks and deep learners.

A standard assumption in statistical learning theory is that the data are generated independently and identically distributed from some fixed distribution; in practice, this assumption is often violated, and a more realistic assumption is that the data are generated by a process which is only sufficiently fast mixing, and may even be non-stationary. Vitaly Kuznetsov and Mehryar Mohri, in their paper Generalization Bounds for Time Series Prediction with Non-stationary Processes, consider this case and are able to prove new generalization bounds that depend on the mixing coefficients and the shift of the distribution.

Rahim Samei, Boting Yang, and Sandra Zilles, in their paper Generalizing Labeled and Unlabeled Sample Compression to Multi-label Concept Classes, consider generalizations of the binary VC-dimension to multi-label classification, such that maximum classes of dimension d allow a tight compression scheme of size d. Sufficient conditions for notions of dimension with this property are derived, and it is shown that some multi-label generalizations of the VC-dimension allow tight compression schemes, while other generalizations do not.

Privacy, Clustering, MDL, and Kolmogorov Complexity. Christos Dimitrakakis, Blaine Nelson, Aikaterini Mitrokotsa, and Benjamin I.P. Rubinstein present the paper Robust and Private Bayesian Inference. This paper looks at the problem of privacy in machine learning, where an agent, a statistician for example, might want to reveal information derived from a data set, but without revealing information about the particular data points in the set, which might contain confidential information. The authors show that it is possible to do Bayesian inference in this setting, satisfying differential privacy, provided that the likelihoods and conjugate priors satisfy some properties.

Behnam Neyshabur, Yury Makarychev, and Nathan Srebro, in their paper Clustering, Hamming Embedding, Generalized LSH and the Max Norm, look at asymmetric locality sensitive hashing (LSH), which is useful in many types of machine learning applications. Locality sensitive hashing, which is closely related to the problem of clustering, is a method of probabilistically reducing the dimension of high-dimensional data sets by assigning each data point a hash such that similar data points are mapped to the same hash. The paper shows that by shifting to co-clustering and asymmetric LSH the problem admits a tractable relaxation.
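For background only, here is a minimal symmetric LSH sketch (random-hyperplane hashing for angular similarity), not the asymmetric scheme of the paper; all parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_signature(v, hyperplanes):
    """Sign pattern of v against random hyperplanes; vectors at a small
    angle tend to land on the same side of every hyperplane."""
    return tuple(int(np.dot(h, v) >= 0) for h in hyperplanes)

d, n_planes = 5, 16
planes = rng.standard_normal((n_planes, d))   # one hash bit per hyperplane
x = rng.standard_normal(d)
x_close = x + 0.01 * rng.standard_normal(d)   # a near-duplicate of x

sig_x = lsh_signature(x, planes)
hamming = sum(a != b for a, b in zip(sig_x, lsh_signature(x_close, planes)))
```

The near-duplicate disagrees with x on few (typically zero) hash bits, while the antipode −x disagrees on every bit.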

Jan Leike and Marcus Hutter look at martingale theory in Indefinitely Oscillating Martingales; as a consequence of their analysis they show a negative result in the theory of Minimum Description Length (MDL) learning, namely that the MDL estimator is in general inductively inconsistent: it will not necessarily converge. The MDL estimator gives the regularized code length, MDL(u) = min_Q {Q(u) + K(Q)}, where Q is a coding function, K(Q) its complexity, and Q(u) the code length for the string u. It is shown that the family of coding functions Q can be constructed such that lim_{n→∞} MDL(u_{1:n}) does not converge for most infinite words u.

As is well known, the Kolmogorov complexity is not computable. Peter Bloem, Francisco Mota, Steven de Rooij, Luís Antunes, and Pieter Adriaans, in their paper A Safe Approximation for Kolmogorov Complexity, study the problem of approximating this quantity using a restriction to a particular class of models, and a probabilistic bound on the approximation error.


Cellular Tree Classifiers

G. Biau and L. Devroye

1 Sorbonne Universités, UPMC Univ Paris 06, France
2 Institut universitaire de France
3 McGill University, Canada

Abstract. Suppose that binary classification is done by a tree method in which the leaves of a tree correspond to a partition of d-space. Within a partition, a majority vote is used. Suppose furthermore that this tree must be constructed recursively by implementing just two functions, so that the construction can be carried out in parallel by using "cells": first of all, given input data, a cell must decide whether it will become a leaf or an internal node in the tree. Secondly, if it decides on an internal node, it must decide how to partition the space linearly. Data are then split into two parts and sent downstream to two new independent cells. We discuss the design and properties of such classifiers.

1 Introduction

We explore in this note a new way of dealing with the supervised classification problem, inspired by greedy approaches and the divide-and-conquer philosophy. Our point of view is novel, but has a wide reach in a world in which parallel and distributed computation are important. In the short term, parallelism will take hold in massive data sets and complex systems and, as such, poses one of the exciting questions that will be asked of the statistics and machine learning fields.

The general context is that of classification trees, which make decisions by recursively partitioning R^d into regions, sometimes called cells. In the model we promote, a basic computational unit in classification, a cell, takes as input training data, and makes a decision whether a majority rule should be locally applied. If not, the data should be split and each part of the partition should be transmitted to another cell. What is original in our approach is that all cells must use exactly the same protocol to make their decision: their function is not altered by external inputs or global parameters. In other words, the decision to split depends only upon the data presented to the cell, independently of the overall edifice. Classifiers designed according to this autonomous principle will be called cellular tree classifiers, or simply cellular classifiers.

Decision tree learning is a method commonly used in data mining (see, e.g., [27]). For example, in CART (Classification and Regression Trees, [5]), splits are made perpendicular to the axes based on the notion of Gini impurity. Splits are performed until all data are isolated. In a second phase, nodes are recombined from the bottom up in a process called pruning. It is this second process that makes the CART trees non-cellular, as global information is shared to manage the recombination process. Quinlan's C4.5 [26] also prunes. Others split until all nodes or cells are homogeneous (i.e., have the same class); the prime example is Quinlan's ID3 [25]. This strategy, while compliant with the cellular framework, leads to non-consistent rules, as we point out in the present paper. In fact, the choice of a good stopping rule for decision trees is very hard; we were not able to find any in the literature that guarantees convergence to the Bayes error.

P. Auer et al. (Eds.): ALT 2014, LNAI 8776, pp. 8–17, 2014.
© Springer International Publishing Switzerland 2014

2 Tree Classifiers

In the design of classifiers, we have an unknown distribution of a random prototype pair (X, Y), where X takes values in R^d and Y takes only finitely many values, say 0 or 1 for simplicity. Classical pattern recognition deals with predicting the unknown nature Y of the observation X via a measurable classifier g: R^d → {0, 1}. We make a mistake if g(X) differs from Y, and the probability of error for a particular decision rule g is L(g) = P{g(X) ≠ Y}. The Bayes classifier

g*(x) = 1 if P{Y = 1 | X = x} > P{Y = 0 | X = x}, and g*(x) = 0 otherwise,

has the smallest probability of error, that is,

L* = L(g*) = inf_{g: R^d → {0,1}} P{g(X) ≠ Y}

(see, for instance, Theorem 2.1 in [7]). However, most of the time, the distribution of (X, Y) is unknown, so that the optimal decision g* is unknown too. We do not consult an expert to try to reconstruct g*, but have access to a database D_n = (X_1, Y_1), ..., (X_n, Y_n) of i.i.d. copies of (X, Y), observed in the past. We assume that D_n and (X, Y) are independent. In this context, a classification rule g_n(x; D_n) is a Borel measurable function of x and D_n, and it attempts to estimate Y from x and D_n. For simplicity, we suppress D_n in the notation and write g_n(x) instead of g_n(x; D_n).

The probability of error of a given classifier g_n is the random variable

L(g_n) = P{g_n(X) ≠ Y | D_n},

and the rule is consistent if

lim_{n→∞} E L(g_n) = L*.

It is universally consistent if it is consistent for all possible distributions of (X, Y). Many popular classifiers are universally consistent. These include several brands of histogram rules, k-nearest neighbor rules, kernel rules, neural networks, and tree classifiers. There are too many references to be cited here, but the monographs by [7] and [15] will provide the reader with a comprehensive introduction to the domain and a literature review.
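The definitions above can be checked on a toy discrete distribution; the regression function eta below is hypothetical, and the identity L* = E[min(eta(X), 1 − eta(X))] is the standard form of the Bayes error for binary Y:

```python
def bayes_classifier(eta):
    """g*(x) = 1 iff P(Y=1 | X=x) > P(Y=0 | X=x), i.e. iff eta(x) > 1/2."""
    return lambda x: 1 if eta(x) > 0.5 else 0

def bayes_error(eta, xs, px):
    """L* = E[min(eta(X), 1 - eta(X))] for a discrete X supported on the
    points xs with probabilities px."""
    return sum(p * min(eta(x), 1.0 - eta(x)) for x, p in zip(xs, px))

eta = lambda x: 0.9 if x < 0 else 0.2   # hypothetical regression function
g_star = bayes_classifier(eta)
L_star = bayes_error(eta, xs=[-1.0, 1.0], px=[0.5, 0.5])   # 0.15 up to rounding
```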

Fig. 1. A binary tree (left) and the corresponding partition (right)

Trees have been suggested as tools for classification for more than thirty years. We mention in particular the early work of Fu [36, 1, 21, 18, 24]. Other references from the 1970s include [20, 3, 23, 30, 34, 12, 8]. Most influential in the classification tree literature was the CART proposal by [5]. While CART proposes partitions by hyperrectangles, linear hyperplanes in general position have also gained in popularity; the early work on that topic is by [19] and [22]. Additional references on tree classification include [14, 2, 16, 17, 35, 33, 31, 6, 9, 10, 32, 13].

3 Cellular Trees

In general, classification trees partition R^d into regions, often hyperrectangles parallel to the axes (an example is depicted in Figure 1). Of interest in this article are binary trees, where each node has exactly 0 or 2 children. If a node u represents the set A and its children u_1, u_2 represent A_1, A_2, then it is required that A = A_1 ∪ A_2 and A_1 ∩ A_2 = ∅. The root of the tree represents R^d, and the terminal nodes (or leaves), taken together, form a partition of R^d. If a leaf represents region A, then the tree classifier takes the simple form

g_n(x) = 1 if Σ_{i=1}^n 1_[X_i ∈ A, Y_i = 1] > Σ_{i=1}^n 1_[X_i ∈ A, Y_i = 0], for x ∈ A, and g_n(x) = 0 otherwise.

That is, in every leaf region, a majority vote is taken over all (X_i, Y_i)'s with X_i's in the same region. Ties are broken, by convention, in favor of class 0.
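A minimal sketch of this leaf rule in one dimension (toy data; ties go to class 0 as stated):

```python
def leaf_label(data, region):
    """Majority vote over the (X_i, Y_i) with X_i in the leaf region;
    ties are broken in favor of class 0, as in the text."""
    ones = sum(1 for x, y in data if region(x) and y == 1)
    zeros = sum(1 for x, y in data if region(x) and y == 0)
    return 1 if ones > zeros else 0

data = [(0.1, 1), (0.2, 1), (0.3, 0), (0.9, 0)]   # toy one-dimensional sample
in_left = lambda x: x < 0.5                        # leaf region A = [0, 0.5)
in_right = lambda x: x >= 0.5
```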

The tree structure is usually data-dependent, and indeed, it is in the construction itself where different trees differ. Thus, there are virtually infinitely many possible strategies to build classification trees. Nevertheless, despite this great diversity, all tree species end up with two fundamental questions at each node:


① Should the node be split?

② In the affirmative, what are its children?

These two questions are typically answered using global information on the tree, such as, for example, a function of the data D_n, the level of the node within the tree, the size of the data set and, more generally, any parameter connected with the structure of the tree. This parameter could be, for example, the total number k of cells in a k-partition tree, or the penalty term in the pruning of the CART algorithm (e.g., [5] and [11]).

Our cellular trees proceed from a different philosophy. In short, a cellular tree should, at each node, be able to answer questions ① and ② using local information only, without any help from the other nodes. In other words, each cell can perform as many operations as it wishes, provided it uses only the data that are transmitted to it, regardless of the general structure of the tree. Just imagine that the calculations to be carried out at the nodes are sent to different computers, possibly asynchronously, and that the system architecture is so complex that computers do not communicate. Thus, once a computer receives its data, it has to make its own decisions on ① and ② based on this data subset only, independently of the others and without knowing anything of the overall edifice. Once a data set is split, it can be given to another computer for further splitting, since the remaining data points have no influence.

Formally, a cellular binary classification tree is a machine that partitions the space recursively in the following manner. With each node we associate a subset of R^d, starting with R^d for the root node. We consider binary tree classifiers based on a class C of possible Borel subsets of R^d that can be used for splits. A typical example of such a class is the family of all hyperplanes, or the class of all hyperplanes that are perpendicular to one of the axes. Higher-order polynomial splitting surfaces can be imagined as well. The class is parametrized by a vector σ ∈ R^p. There is a splitting function f(x, σ), x ∈ R^d, σ ∈ R^p, such that R^d is partitioned into A = {x ∈ R^d : f(x, σ) ≥ 0} and B = {x ∈ R^d : f(x, σ) < 0}. Formally, a cellular split can be viewed as a family of measurable mappings (σ_m)_m from (R^d × {0, 1})^m to R^p. In this model, m is the size of the data set transmitted to the cell. Thus, for each possible input size m, we have a map. In addition, there is a family of measurable mappings (θ_m)_m from (R^d × {0, 1})^m to {0, 1} that indicate decisions: θ_m = 1 indicates that a split should be applied, while θ_m = 0 corresponds to a decision not to split. In that case, the cell acts as a leaf node in the tree. We note that (θ_m)_m and (σ_m)_m correspond to the decisions given in ① and ②. Let the data set be D_n. If θ(D_n) = 0, the root cell is final, and the space is not split. Otherwise, R^d is split into

A = {x ∈ R^d : f(x, σ(D_n)) ≥ 0} and B = {x ∈ R^d : f(x, σ(D_n)) < 0}.

The data D_n are partitioned into two groups: the first group contains all (X_i, Y_i), i = 1, ..., n, for which X_i ∈ A, and the second group all others. The groups are sent to child cells, and the process is repeated. When x ∈ R^d needs to be classified, we first determine the unique leaf set A(x) to which x belongs, and then take votes among the {Y_i : X_i ∈ A(x), i = 1, ..., n}. Classification proceeds by a majority vote, with the majority deciding the estimate g_n(x). In case of a tie, we set g_n(x) = 0.
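The recursion can be sketched in one dimension as follows. The stopping rule theta and split selector sigma below see only the data handed to their cell, as the cellular principle requires; the particular size-threshold theta is a naive choice of ours for mechanics only, and not the consistent randomized rule constructed in Section 4:

```python
def theta(cell_data):
    """Stopping rule: uses only the data handed to this cell. A plain
    size threshold is illustrative only; the paper's consistent rule
    randomizes this decision instead (Section 4)."""
    return 1 if len(cell_data) >= 4 else 0

def sigma(cell_data):
    """Split selector: the median of the (one-dimensional) X values,
    again computed from the local data only."""
    xs = sorted(x for x, _ in cell_data)
    return xs[len(xs) // 2]

def build_cell(cell_data):
    """Each cell decides to stop or split, then hands each part to a
    new independent cell.  (Assumes distinct X values, so splits shrink.)"""
    if theta(cell_data) == 0:
        ones = sum(y for _, y in cell_data)
        return ('leaf', 1 if ones > len(cell_data) - ones else 0)
    t = sigma(cell_data)
    left = [(x, y) for x, y in cell_data if x < t]
    right = [(x, y) for x, y in cell_data if x >= t]
    return ('node', t, build_cell(left), build_cell(right))

def classify(tree, x):
    """Route x to its leaf region A(x) and return the majority label."""
    while tree[0] == 'node':
        _, t, left, right = tree
        tree = left if x < t else right
    return tree[1]

data = [(0.1, 1), (0.2, 1), (0.6, 0), (0.9, 0)]   # toy one-dimensional sample
tree = build_cell(data)
```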

A cellular binary tree classifier is said to be randomized if each node in the tree has an independent copy of a uniform [0, 1] random variable associated with it, and θ and σ are mappings that have one extra real-valued component in the input. For example, we could flip an unbiased coin at each node to decide whether θ_m = 0 or θ_m = 1.

4 A Consistent Cellular Tree Classifier

At first sight, it appears that there are no universally consistent cellular tree classifiers. Consider for example complete binary trees with k full levels, i.e., there are 2^k leaf regions. We can have consistency when k is allowed to depend upon n. An example is the median tree (see Section 20.3 in [7]). When d = 1, split by finding the median element among the X_i's, so that the child sets have cardinality given by ⌊(n−1)/2⌋ and ⌈(n−1)/2⌉, where ⌊·⌋ and ⌈·⌉ are the floor and ceiling functions. The median itself does stay behind and is not sent down to the subtrees, with an appropriate convention for breaking cell boundaries as well as empty cells. Keep doing this for k rounds; in d dimensions, one can either rotate through the coordinates for median splitting, or randomize by selecting uniformly at random a coordinate to split orthogonally.
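The child cardinalities can be checked directly; a one-line sketch:

```python
import math

def median_split_sizes(n):
    """Median split of n points: the median stays behind, and the two
    children receive floor((n-1)/2) and ceil((n-1)/2) points."""
    return (n - 1) // 2, math.ceil((n - 1) / 2)
```

For example, `median_split_sizes(10)` gives (4, 5): the two children together hold n − 1 points, the median being kept back.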

This rule is known to be consistent as soon as the marginal distributions of X are nonatomic, provided k → ∞ and k 2^k / n → 0. However, this is not a cellular tree classifier. While we can indeed specify σ_m, it is impossible to define θ_m because θ_m cannot be a function of the global value of n. In other words, if we were to apply median splitting and decide to split for a fixed k, then the leaf nodes would all correspond to a fixed proportion of the data points. It is clear that the decisions in the leaves are off with a fair probability if we have, for example, Y independent of X and P{Y = 1} = 1/2. Thus, we cannot create a cellular tree classifier in this manner.

In view of the preceding discussion, it seems paradoxical that there indeed exist universally consistent cellular tree classifiers. (We note here that we abuse the word "universal": we will assume throughout, to keep the discussion at a manageable level, that the marginal distributions of X are nonatomic. But no other conditions on the joint distribution of (X, Y) are imposed.) Our construction follows the median tree principle and uses randomization. The original work on the solution appears in [4].

From now on, to keep things simple, it is assumed that the marginal distributions of X are nonatomic. The cellular splitting method σ_m described in this section mimics the median tree classifier discussed above. We first choose a dimension to cut, uniformly at random from the d dimensions, as rotating through the dimensions by level number would violate the cellular condition. The selected dimension is then split at the data median, just as in the classical median tree. Repeating this for k levels of nodes leads to 2^k leaf regions. On any


path of length k to one of the 2^k leaves, we have a deterministic sequence of cardinalities n_0 = n (the root), n_1, n_2, ..., n_k. We always have n_i/2 − 1 ≤ n_{i+1} ≤ n_i/2. Thus, by induction, one easily shows that, for all i,

n/2^i − 2 ≤ n_i ≤ n/2^i.

In particular, each leaf has at least max(n/2^k − 2, 0) points and at most n/2^k. The novelty is in the choice of the decision function. This function ignores the data altogether and uses a randomized decision that is based on the size of the input. More precisely, consider a nonincreasing function ϕ: N → (0, 1] with ϕ(0) = ϕ(1) = 1. Cells correspond in a natural way to sets of R^d. So, we can and will speak of a cell A, where A ⊂ R^d. The number of data points in A is denoted by N(A):

N(A) = Σ_{i=1}^n 1_[X_i ∈ A].

Then, if U is the uniform [0, 1] random variable associated with the cell A and the input to the cell is N(A), the stopping rule ① takes the form:

① Put θ = 0 if U ≤ ϕ(N(A)).

In this manner, we obtain a possibly infinite randomized binary tree classifier. Splitting occurs with probability 1 − ϕ(m) on inputs of size m. Note that no attempt is made to split empty sets or singleton sets. For consistency, we need to look at the random leaf region to which X belongs. This is roughly equivalent to studying the distance from that cell to the root of the tree.

In the sequel, the notation u_n = o(v_n) (respectively, u_n = ω(v_n) and u_n = O(v_n)) means that u_n/v_n → 0 (respectively, v_n/u_n → 0 and u_n ≤ C v_n for some constant C) as n → ∞. Many choices ϕ(m) = o(1), but not all, will do for us. The next lemma makes things more precise.

Lemma 1. Let β ∈ (0, 1). Define

ϕ(m) = 1 if m < 3, and ϕ(m) = 1/log^β m if m ≥ 3.

Let K(X) denote the random path distance between the cell of X and the root of the tree. Then

lim_{n→∞} P{K(X) ≥ k_n} = 0 if k_n = ω(log^β n), and
lim_{n→∞} P{K(X) ≥ k_n} = 1 if k_n = o(log^β n).

Proof. Let us recall that, at level k, each cell of the underlying median tree contains at least max(n/2^k − 2, 0) points and at most n/2^k. Since the function ϕ(·) is nonincreasing, the first result follows from this:
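A direct transcription of Lemma 1's stopping function and rule ①, as we read them; the seed and the value of β are arbitrary:

```python
import math
import random

def phi(m, beta):
    """Lemma 1's choice: phi(m) = 1 for m < 3 and 1/(log m)^beta for
    m >= 3; nonincreasing, with phi(0) = phi(1) = 1 as required."""
    return 1.0 if m < 3 else 1.0 / math.log(m) ** beta

def stopping_decision(m, beta, rng):
    """Rule ①: draw the cell's uniform U and put theta = 0 (leaf) if
    U <= phi(m); a cell of size m thus splits with probability 1 - phi(m)."""
    return 0 if rng.random() <= phi(m, beta) else 1

rng = random.Random(0)   # arbitrary seed
decision = stopping_decision(500, 0.5, rng)
```

Since phi(m) tends to 0, large cells split with probability approaching 1, while empty and singleton cells (phi = 1) always become leaves.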
