Algorithmic Learning Theory
25th International Conference, ALT 2014, Bled, Slovenia, October 8-10, 2014
Proceedings. Peter Auer et al. (Eds.)
Peter Auer
Alexander Clark
Thomas Zeugmann
Sandra Zilles (Eds.)
Algorithmic Learning Theory
25th International Conference, ALT 2014
Bled, Slovenia, October 8-10, 2014
Proceedings

LNAI 8776
Subseries of Lecture Notes in Computer Science

LNAI Series Editors

Randy Goebel
University of Alberta, Edmonton, Canada

Yuzuru Tanaka
Hokkaido University, Sapporo, Japan

Wolfgang Wahlster
DFKI and Saarland University, Saarbrücken, Germany

LNAI Founding Series Editor

Joerg Siekmann
DFKI and Saarland University, Saarbrücken, Germany

Lecture Notes in Artificial Intelligence 8776
Peter Auer, Alexander Clark,
Thomas Zeugmann, Sandra Zilles (Eds.)

Algorithmic Learning Theory

25th International Conference, ALT 2014
Bled, Slovenia, October 8-10, 2014
Proceedings
Volume Editors

Peter Auer
Montanuniversität Leoben, Austria
E-mail: auer@unileoben.ac.at

Alexander Clark
King's College London, UK
Department of Philosophy
E-mail: alexander.clark@kcl.ac.uk

Thomas Zeugmann
Hokkaido University
Division of Computer Science
Sapporo, Japan
E-mail: thomas@ist.hokudai.ac.jp

Sandra Zilles
University of Regina
Department of Computer Science
Regina, SK, Canada
E-mail: zilles@cs.uregina.ca
ISSN 0302-9743
e-ISSN 1611-3349
ISBN 978-3-319-11661-7
e-ISBN 978-3-319-11662-4
DOI 10.1007/978-3-319-11662-4
Springer Cham Heidelberg New York Dordrecht London

Library of Congress Control Number: 2014948640

LNCS Sublibrary: SL 7 - Artificial Intelligence

© Springer International Publishing Switzerland 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)
Preface
This volume contains the papers presented at the 25th International Conference on Algorithmic Learning Theory (ALT 2014), which was held in Bled, Slovenia, during October 8-10, 2014. ALT 2014 was co-located with the 17th International Conference on Discovery Science (DS 2014). The technical program of ALT 2014 had 4 invited talks (presented jointly to both ALT 2014 and DS 2014) and 21 papers selected from 50 submissions by the ALT Program Committee.

ALT 2014 took place in the hotel Golf in a beautiful park full of old trees in the very heart of Bled. It provided a stimulating interdisciplinary forum to discuss the theoretical foundations of machine learning as well as their relevance to practical applications.

ALT is dedicated to the theoretical foundations of machine learning and provides a forum for high-quality talks and scientific interaction in areas such as reinforcement learning, inductive inference and grammatical inference, learning from queries, active learning, probably approximately correct learning, online learning, bandit theory, statistical learning theory, Bayesian and stochastic learning, unsupervised or semi-supervised learning, clustering, universal prediction, stochastic optimization, high dimensional and non-parametric inference, information-based methods, decision tree methods, kernel-based methods, graph methods and/or manifold-based methods, sample complexity, complexity of learning, privacy preserving learning, learning based on Kolmogorov complexity, new learning models, and applications of algorithmic learning theory.
The present volume of LNAI contains the text of the 21 papers presented at ALT 2014, as well as the texts/abstracts of the invited talks:

- Zoubin Ghahramani (University of Cambridge, Cambridge, UK), "Building an Automated Statistician" (joint invited speaker for ALT 2014 and DS 2014),
- Luc Devroye (McGill University, Montreal, Canada), "Cellular Tree Classifiers" (invited speaker for ALT 2014),
- Eyke Hüllermeier (Universität Paderborn, Germany), "A Survey of Preference-Based Online Learning with Bandit Algorithms" (tutorial speaker for ALT 2014),
- Anuška Ferligoj (University of Ljubljana, Slovenia), "Social Network Analysis" (tutorial speaker for DS 2014).

Since 1999, ALT has been awarding the E.M. Gold Award for the most outstanding student contribution. This year, the award was given to Hasan Abasi and Ali Z. Abdi for their paper "Learning Boolean Halfspaces with Small Weights from Membership Queries," co-authored by Nader H. Bshouty.
ALT 2014 was the 25th meeting in the ALT conference series, established in Japan in 1990. The ALT series is supervised by its Steering Committee: Peter Auer (University of Leoben, Austria), Shai Ben-David (University of Waterloo, Canada), Nader H. Bshouty (Technion - Israel Institute of Technology, Israel), Alexander Clark (King's College London, UK), Marcus Hutter (Australian National University, Canberra, Australia), Jyrki Kivinen (University of Helsinki, Finland), Frank Stephan (National University of Singapore, Republic of Singapore), Gilles Stoltz (École normale supérieure, Paris, France), Csaba Szepesvári (University of Alberta, Edmonton, Canada), Eiji Takimoto (Kyushu University, Fukuoka, Japan), György Turán (University of Illinois at Chicago, USA, and University of Szeged, Hungary), Akihiro Yamamoto (Kyoto University, Japan), Thomas Zeugmann (Chair, Hokkaido University, Sapporo, Japan), and Sandra Zilles (Co-chair, University of Regina, Saskatchewan, Canada).

We thank the various people and institutions who contributed to the success of the conference. Most importantly, we would like to thank the authors for contributing and presenting their work at the conference. Without their contribution this conference would not have been possible. We would like to thank the Office of Naval Research Global for the generous financial support for the conference ALT 2014 provided under ONRG GRANT N62909-14-1-C195.

ALT 2014 and DS 2014 were organized by the Jožef Stefan Institute (JSI) and the University of Ljubljana. We are very grateful to the Department of Knowledge Technologies (and the project MAESTRA) at JSI for sponsoring the conferences and providing administrative support. In particular, we thank the local arrangements chair, Mili Bauer, and her team, Tina Anžič, Nikola Simidjievski, and Jurica Levatić from JSI, for their efforts in organizing the two conferences.

We are grateful for the collaboration with the conference series Discovery Science. In particular we would like to thank the general chair of DS 2014 and ALT 2014, Ljupčo Todorovski, and the DS 2014 Program Committee chairs Sašo Džeroski, Dragi Kocev, and Panče Panov.

We are also grateful to EasyChair, the excellent conference management system, which was used for putting together the program for ALT 2014. EasyChair was developed mainly by Andrei Voronkov and is hosted at the University of Manchester. The system is cost-free.

We are grateful to the members of the Program Committee for ALT 2014 and the subreferees for their hard work in selecting a good program for ALT 2014. Last but not least, we thank Springer for their support in preparing and publishing this volume in the Lecture Notes in Artificial Intelligence series.
August 2014

Peter Auer
Alexander Clark
Thomas Zeugmann
Sandra Zilles
Organization

General Chair for ALT 2014 and DS 2014

Ljupčo Todorovski, University of Ljubljana, Slovenia

Program Committee

Nir Ailon, Technion, Israel
András Antos, SZIT, Hungary
Peter Auer (Chair), Montanuniversität Leoben, Austria
Shai Ben-David, University of Waterloo, Canada
Sébastien Bubeck, Princeton University, USA
Alexander Clark (Chair), King's College London, UK
Corinna Cortes, Google, USA
Vitaly Feldman, IBM Research, USA
Claudio Gentile, Università degli Studi dell'Insubria, Italy
Steve Hanneke, Carnegie Mellon University, USA
Kohei Hatano, Kyushu University, Japan
Sanjay Jain, National University of Singapore, Singapore
Timo Kötzing, Friedrich-Schiller-Universität, Germany
Eric Martin, University of New South Wales, Australia
Mehryar Mohri, Courant Institute of Mathematical Sciences, USA
Rémi Munos, Inria, France
Ronald Ortner, Montanuniversität Leoben, Austria
Lev Reyzin, University of Illinois at Chicago, USA
Daniil Ryabko, Inria, France
Sivan Sabato, Microsoft Research New England, USA
Masashi Sugiyama, Tokyo Institute of Technology, Japan
Csaba Szepesvári, University of Alberta, Canada
John Shawe-Taylor, University College London, UK
Vladimir Vovk, Royal Holloway, University of London, UK
Sandra Zilles, University of Regina, Canada

Local Arrangements Chair

Mili Bauer, Jožef Stefan Institute, Ljubljana
Subreferees

Abbasi-Yadkori, Yasin; Allauzen, Cyril; Amin, Kareem; Ávila Pires, Bernardo; Cesa-Bianchi, Nicolò; Chernov, Alexey; Ge, Rong; Gravin, Nick; Kameoka, Hirokazu; Kanade, Varun; Kanamori, Takafumi; Kocák, Tomáš; Kuznetsov, Vitaly; Lazaric, Alessandro; Lever, Guy; London, Ben; Long, Phil; Ma, Yao; Maillard, Odalric-Ambrym; Mens, Irini-Eleftheria; Morimura, Tetsuro; Munoz, Andres; Nakajima, Shinichi; Neu, Gergely; Procopiuc, Cecilia; Russo, Daniel; Sakuma, Jun; Semukhin, Pavel; Shamir, Ohad; Slivkins, Aleksandrs; Smith, Adam; Syed, Umar; Takimoto, Eiji; Telgarsky, Matus; Wen, Zheng; Yamada, Makoto; Yaroslavtsev, Grigory; Zolotykh, Nikolai

Sponsoring Institutions

Office of Naval Research Global, ONRG GRANT N62909-14-1-C195
Jožef Stefan Institute, Ljubljana
University of Ljubljana
Invited Abstracts

A Survey of Preference-Based Online Learning with Bandit Algorithms

Róbert Busa-Fekete and Eyke Hüllermeier

Department of Computer Science
University of Paderborn, Germany
{busarobi,eyke}@upb.de

Abstract. In machine learning, the notion of multi-armed bandits refers to a class of online learning problems in which an agent is supposed to simultaneously explore and exploit a given set of choice alternatives in the course of a sequential decision process. In the standard setting, the agent learns from stochastic feedback in the form of real-valued rewards. In many applications, however, numerical reward signals are not readily available; instead, only weaker information is provided, in particular relative preferences in the form of qualitative comparisons between pairs of alternatives. This observation has motivated the study of variants of the multi-armed bandit problem in which more general representations are used both for the type of feedback to learn from and the target of prediction. The aim of this paper is to provide a survey of the state of the art in this field, which we refer to as preference-based multi-armed bandits. To this end, we provide an overview of problems that have been considered in the literature as well as methods for tackling them. Our systematization is mainly based on the assumptions made by these methods about the data-generating process and, related to this, the properties of the preference-based feedback.

Keywords: Multi-armed bandits, online learning, preference learning, ranking, top-k selection, exploration/exploitation, cumulative regret, sample complexity, PAC learning.
Cellular Tree Classifiers

Gérard Biau (1,2) and Luc Devroye (3)

1 Sorbonne Universités, UPMC Univ Paris 06, France
2 Institut universitaire de France
3 McGill University, Canada

Abstract. Suppose that binary classification is done by a tree method in which the leaves of a tree correspond to a partition of d-space. Within a partition, a majority vote is used. Suppose furthermore that this tree must be constructed recursively by implementing just two functions, so that the construction can be carried out in parallel by using "cells": first of all, given input data, a cell must decide whether it will become a leaf or an internal node in the tree. Secondly, if it decides on an internal node, it must decide how to partition the space linearly. Data are then split into two parts and sent downstream to two new independent cells. We discuss the design and properties of such classifiers.
Social Network Analysis

Anuška Ferligoj

Faculty of Social Sciences, University of Ljubljana
anuska.ferligoj@fdv.uni-lj.si

Abstract. Social network analysis has attracted considerable interest from the social and behavioral science community in recent decades. Much of this interest can be attributed to the focus of social network analysis on the relationships among units, and on the patterns of these relationships. Social network analysis is a rapidly expanding and changing field with a broad range of approaches, methods, models and substantive applications. In the talk special attention will be given to:

1. General introduction to social network analysis:
   - What are social networks?
   - Data collection issues.
   - Basic network concepts: network representation; types of networks; size and density.
   - Walks and paths in networks: length and value of a path; the shortest path; k-neighbours; acyclic networks.
   - Connectivity: weakly, strongly and bi-connected components; contraction; extraction.
2. Overview of tasks and corresponding methods:
   - Network/node properties: centrality (degree, closeness, betweenness); hubs and authorities.
   - Cohesion: triads, cliques, cores, islands.
   - Partitioning: blockmodeling (direct and indirect approaches; structural and regular equivalence; generalised blockmodeling); clustering.
   - Statistical models.
3. Software for social network analysis (UCINET, PAJEK, ...)
Building an Automated Statistician

Zoubin Ghahramani

Department of Engineering, University of Cambridge,
Trumpington Street, Cambridge CB2 1PZ, UK
zoubin@eng.cam.ac.uk

Abstract. We live in an era of abundant data and there is an increasing need for methods to automate data analysis and statistics. I will describe the "Automated Statistician," a project which aims to automate the exploratory analysis and modelling of data. Our approach starts by defining a large space of related probabilistic models via a grammar over models, and then uses Bayesian marginal likelihood computations to search over this space for one or a few good models of the data. The aim is to find models which have both good predictive performance and are somewhat interpretable. Our initial work has focused on the learning of unknown nonparametric regression functions, and on learning models of time series data, both using Gaussian processes. Once a good model has been found, the Automated Statistician generates a natural language summary of the analysis, producing a 10-15 page report with plots and tables describing the analysis. I will discuss challenges such as: how to trade off predictive performance and interpretability, how to translate complex statistical concepts into natural language text that is understandable by a numerate non-statistician, and how to integrate model checking. This is joint work with James Lloyd and David Duvenaud (Cambridge) and Roger Grosse and Josh Tenenbaum (MIT).
Table of Contents

Editors' Introduction .......................................... 1
Peter Auer, Alexander Clark, Thomas Zeugmann, and Sandra Zilles

Full Invited Papers

Cellular Tree Classifiers ...................................... 8
Gérard Biau and Luc Devroye

A Survey of Preference-Based Online Learning with Bandit
Algorithms ..................................................... 18
Róbert Busa-Fekete and Eyke Hüllermeier

Regular Contributions

Inductive Inference

A Map of Update Constraints in Inductive Inference ............. 40
Timo Kötzing and Raphaela Palenta

On the Role of Update Constraints and Text-Types in Iterative
Learning ....................................................... 55
Sanjay Jain, Timo Kötzing, Junqi Ma, and Frank Stephan

Parallel Learning of Automatic Classes of Languages ............ 70
Sanjay Jain and Efim Kinber

Algorithmic Identification of Probabilities Is Hard ............ 85
Laurent Bienvenu, Benoît Monin, and Alexander Shen

Exact Learning from Queries

Learning Boolean Halfspaces with Small Weights from Membership
Queries ........................................................ 96
Hasan Abasi, Ali Z. Abdi, and Nader H. Bshouty

On Exact Learning Monotone DNF from Membership Queries ......... 111
Hasan Abasi, Nader H. Bshouty, and Hanna Mazzawi

Learning Regular Omega Languages ............................... 125
Dana Angluin and Dana Fisman

Reinforcement Learning

Selecting Near-Optimal Approximate State Representations in
Reinforcement Learning ......................................... 140
Ronald Ortner, Odalric-Ambrym Maillard, and Daniil Ryabko

Policy Gradients for CVaR-Constrained MDPs ..................... 155
L.A. Prashanth

Bayesian Reinforcement Learning with Exploration ............... 170
Tor Lattimore and Marcus Hutter

Extreme State Aggregation beyond MDPs .......................... 185
Marcus Hutter

Online Learning and Learning with Bandit Information

On Learning the Optimal Waiting Time ........................... 200
Tor Lattimore, András György, and Csaba Szepesvári

Bandit Online Optimization over the Permutahedron .............. 215
Nir Ailon, Kohei Hatano, and Eiji Takimoto

Offline to Online Conversion ................................... 230
Marcus Hutter

Statistical Learning Theory

A Chain Rule for the Expected Suprema of Gaussian Processes .... 245
Andreas Maurer

Generalization Bounds for Time Series Prediction with Non-stationary
Processes ...................................................... 260
Vitaly Kuznetsov and Mehryar Mohri

Generalizing Labeled and Unlabeled Sample Compression to Multi-label
Concept Classes ................................................ 275
Rahim Samei, Boting Yang, and Sandra Zilles

Privacy, Clustering, MDL, and Kolmogorov Complexity

Robust and Private Bayesian Inference .......................... 291
Christos Dimitrakakis, Blaine Nelson, Aikaterini Mitrokotsa, and
Benjamin I.P. Rubinstein

Clustering, Hamming Embedding, Generalized LSH and the Max
Norm ........................................................... 306
Behnam Neyshabur, Yury Makarychev, and Nathan Srebro

Indefinitely Oscillating Martingales ........................... 321
Jan Leike and Marcus Hutter

A Safe Approximation for Kolmogorov Complexity ................. 336
Peter Bloem, Francisco Mota, Steven de Rooij, Luís Antunes, and
Pieter Adriaans

Author Index ................................................... 351
Editors’Introduction
PeterAuer,AlexanderClark,ThomasZeugmann,andSandraZilles
TheaimoftheseriesofconferencesonAlgorithmicLearningTheory(ALT) istolookatlearningfromanalgorithmicandmathematicalperspective.Over timeseveralmodelsoflearninghavebeendevelopedwhichstudydifferentaspectsoflearning.Inthefollowingwedescribeinbrieftheinvitedtalksandthe contributedpapersforALT2014heldinBled,Slovenia..
Invited Talks. Following the tradition of the co-located conferences ALT and DS, all invited lectures are shared by the two conferences. The invited speakers are eminent researchers in their fields and present either their specific research area or lecture about a topic of broader interest.

This year's joint invited speaker for ALT 2014 and DS 2014 is Zoubin Ghahramani, who is Professor of Information Engineering at the University of Cambridge, UK, where he leads a group of about 30 researchers. He studied computer science and cognitive science at the University of Pennsylvania, obtained his PhD from MIT in 1995 under the supervision of Michael Jordan, and was a postdoctoral fellow at the University of Toronto with Geoffrey Hinton. His academic career includes concurrent appointments as one of the founding members of the Gatsby Computational Neuroscience Unit in London, and as a faculty member of CMU's Machine Learning Department for over 10 years. His current research focuses on nonparametric Bayesian modeling and statistical machine learning. He has also worked on applications to bioinformatics, econometrics, and a variety of large-scale data modeling problems. He has published over 200 papers, receiving 25,000 citations (an h-index of 68). His work has been funded by grants and donations from EPSRC, DARPA, Microsoft, Google, Infosys, Facebook, Amazon, FX Concepts and a number of other industrial partners. In 2013, he received a $750,000 Google Award for research on building the Automatic Statistician. In his invited talk Building an Automated Statistician (joint work with James Lloyd, David Duvenaud, Roger Grosse, and Josh Tenenbaum) Zoubin Ghahramani addresses the problem of abundant data and the increasing need for methods to automate data analysis and statistics. The Automated Statistician project aims to automate the exploratory analysis and modeling of data. The approach uses Bayesian marginal likelihood computations to search over a large space of related probabilistic models. Once a good model has been found, the Automated Statistician generates a natural language summary of the analysis, producing a 10-15 page report with plots and tables describing the analysis. Zoubin Ghahramani discusses challenges such as: how to trade off predictive performance and interpretability, how to translate complex statistical concepts into natural language text that is understandable by a numerate non-statistician, and how to integrate model checking.
The invited speaker for ALT 2014 is Luc Devroye, who is a James McGill Professor in the School of Computer Science of McGill University in Montreal. He studied at Katholieke Universiteit Leuven and subsequently at Osaka University, and in 1976 received his PhD from the University of Texas at Austin under the supervision of Terry Wagner. Luc Devroye specializes in the probabilistic analysis of algorithms and random number generation, and enjoys typography. Since joining the McGill faculty in 1977 he has won numerous awards, including an E.W.R. Steacie Memorial Fellowship (1987), a Humboldt Research Award (2004), the Killam Prize (2005) and the Statistical Society of Canada gold medal (2008). He received an honorary doctorate from the Université catholique de Louvain in 2002, and an honorary doctorate from Universiteit Antwerpen in 2012. The invited paper Cellular Tree Classifiers (joint work with Gérard Biau) deals with classification by decision trees, where the decision trees are constructed recursively by using only two local rules: (1) given the input data to a node, it must decide whether it will become a leaf or not, and (2) a non-leaf node needs to decide how to split the data for sending them downstream. The important point is that each node can make these decisions based only on its local data, such that the decision tree construction can be carried out in parallel. Somewhat surprisingly there are such local rules that guarantee convergence of the decision tree error to the Bayes optimal error. Luc Devroye discusses the design and properties of such classifiers.
The ALT 2014 tutorial speaker is Eyke Hüllermeier, who is professor and head of the Intelligent Systems Group at the Department of Computer Science of the University of Paderborn. He received his PhD in Computer Science from the University of Paderborn in 1997 and he also holds an MSc degree in business informatics. He was a researcher in artificial intelligence, knowledge-based systems, and statistics at the University of Paderborn and the University of Dortmund, and a Marie Curie fellow at the Institut de Recherche en Informatique de Toulouse. He already held a full professorship in the Department of Mathematics and Computer Science at Marburg University before rejoining the University of Paderborn. In his tutorial A Survey of Preference-based Online Learning with Bandit Algorithms (joint work with Róbert Busa-Fekete), Eyke Hüllermeier reports on learning with bandit feedback that is weaker than the usual real-valued reward. When learning with bandit feedback the learning algorithm receives feedback only from the decisions it makes, but no information from other alternatives. Thus the learning algorithm needs to simultaneously explore and exploit a given set of alternatives in the course of a sequential decision process. In many applications the feedback is not a numerical reward signal but some weaker information, in particular relative preferences in the form of qualitative comparisons between pairs of alternatives. This observation has motivated the study of variants of the multi-armed bandit problem, in which more general representations are used both for the type of feedback to learn from and the target of prediction. The aim of the tutorial is to provide a survey of the state of the art in this area, which is referred to as preference-based multi-armed bandits. To this end, Eyke Hüllermeier provides an overview of problems that have been considered in the literature as well as methods for tackling them. His systematization is mainly based on the assumptions made by these methods about the data-generating process and, related to this, the properties of the preference-based feedback.
The DS 2014 tutorial speaker is Anuška Ferligoj, who is professor of Multivariate Statistical Methods at the University of Ljubljana. She is a Slovenian mathematician who earned international recognition for her research work on network analysis. Her interests include multivariate analysis (constrained and multicriteria clustering), social networks (measurement quality and blockmodeling), and survey methodology (reliability and validity of measurement). She is a fellow of the European Academy of Sociology. She has also been an editor of the journal Advances in Methodology and Statistics (Metodološki zvezki) since 2004 and is a member of the editorial boards of the Journal of Mathematical Sociology, Journal of Classification, Social Networks, Statistics in Transition, Methodology, and Structure and Dynamics: eJournal of Anthropology and Related Sciences. She was a Fulbright scholar in 1990 and visiting professor at the University of Pittsburgh. She was awarded the title of Ambassador of Science of the Republic of Slovenia in 1997. Social network analysis has attracted considerable interest from the social and behavioral science community in recent decades. Much of this interest can be attributed to the focus of social network analysis on the relationships among units, and on the patterns of these relationships. Social network analysis is a rapidly expanding and changing field with a broad range of approaches, methods, models and substantive applications. In her tutorial Social Network Analysis, Anuška Ferligoj gives a general introduction to social network analysis and an overview of tasks and corresponding methods, accompanied by pointers to software for social network analysis.
Inductive Inference. There are a number of papers in the field of inductive inference, the most classical branch of algorithmic learning theory. First, A Map of Update Constraints in Inductive Inference by Timo Kötzing and Raphaela Palenta provides a systematic overview of various constraints on learners in inductive inference problems. They focus on the question of which constraints and combinations of constraints reduce the learning power, meaning the class of languages that are learnable with respect to certain criteria.

On a related theme, the paper On the Role of Update Constraints and Text-Types in Iterative Learning by Sanjay Jain, Timo Kötzing, Junqi Ma, and Frank Stephan looks more specifically at the case where the learner has no memory beyond the current hypothesis. In this situation the paper is able to completely characterize the relations between the various constraints.

The paper Parallel Learning of Automatic Classes of Languages by Sanjay Jain and Efim Kinber continues the line of research on learning automatic classes of languages initiated by Jain, Luo and Stephan in 2012, in this case by considering the problem of learning multiple distinct languages at the same time.

Laurent Bienvenu, Benoît Monin and Alexander Shen present a negative result in their paper Algorithmic Identification of Probabilities is Hard. They show that it is impossible to identify in the limit the exact parameter, in the sense of the Turing code for a computable real number, of a Bernoulli distribution, though it is of course easy to approximate it.
Editors’Introduction3
Exact Learning from Queries. In cases where the instance space is discrete, it is reasonable to aim at exact learning algorithms where the learner is required to produce a hypothesis that is exactly correct.

The paper winning the E.M. Gold Award, Learning Boolean Halfspaces with Small Weights from Membership Queries by the student authors Hasan Abasi and Ali Z. Abdi, co-authored by Nader H. Bshouty, presents a significantly improved algorithm for learning Boolean halfspaces in {0,1}^n with integer weights in {0,...,t} from membership queries only. It is shown that this algorithm needs only n^{O(t)} membership queries, which improves over previous algorithms with n^{O(t^5)} queries and closes the gap to the known lower bound n^t.

The paper by Hasan Abasi, Nader H. Bshouty and Hanna Mazzawi, On Exact Learning Monotone DNF from Membership Queries, presents results on the learnability by membership queries of monotone DNF (disjunctive normal form) formulas with a bounded number of terms and a bounded number of variables per term.

Dana Angluin and Dana Fisman look at exact learning using membership queries and equivalence queries in their paper Learning Regular Omega Languages. Here the class concerned is that of regular languages over infinite words; the authors consider three different representations which vary in their succinctness. This problem has applications in verification and synthesis of reactive systems.
Reinforcement Learning. Reinforcement learning continues to be a centrally important area of learning theory, and this conference contains a number of contributions in this field. Ronald Ortner, Odalric-Ambrym Maillard and Daniil Ryabko present a paper Selecting Near-Optimal Approximate State Representations in Reinforcement Learning, which looks at the problem where the learner does not have direct information about the states in the underlying Markov Decision Process (MDP); in contrast to Partially Observable MDPs, here the information is via various models that map the histories to states.

L.A. Prashanth considers risk-constrained reinforcement learning in his paper Policy Gradients for CVaR-Constrained MDPs, focusing on the stochastic shortest path problem. For a risk-constrained problem not only the expected sum of costs per step E[Σ_m g(s_m, a_m)] is to be minimized, but also the sum of an additional cost measure C = Σ_m c(s_m, a_m) needs to be bounded from above. Usually the Value at Risk, VaR_α = inf{ξ | P(C ≤ ξ) ≥ α}, is constrained, but such constrained problems are hard to optimize. Instead, the paper proposes to constrain the Conditional Value at Risk, CVaR_α = E[C | C ≥ VaR_α], which allows standard optimization techniques to be applied. Two algorithms are presented that converge to a locally risk-optimal policy using stochastic approximation, mini-batches, policy gradients, and importance sampling.
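To make the two risk measures concrete, here is a minimal empirical sketch (our own illustrative helper, not the paper's algorithm): given sampled trajectory costs, VaR_α is the smallest ξ whose empirical CDF reaches α, and CVaR_α is the mean of the costs at or above that threshold.

```python
import math

def empirical_var_cvar(costs, alpha):
    # VaR_alpha = inf{xi | P(C <= xi) >= alpha}, estimated on a sample;
    # CVaR_alpha = E[C | C >= VaR_alpha] over the same sample.
    costs = sorted(costs)
    idx = max(math.ceil(alpha * len(costs)) - 1, 0)
    var = costs[idx]
    tail = [c for c in costs if c >= var]
    return var, sum(tail) / len(tail)

# hypothetical trajectory costs: VaR_0.75 is 3.0, CVaR_0.75 averages the tail {3, 10}
v, c = empirical_var_cvar([1.0, 2.0, 3.0, 10.0], 0.75)
```

The example shows why CVaR is the more conservative constraint: it penalizes the whole upper tail, not just the α-quantile itself.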
In contrast to the usual MDP setting for reinforcement learning, the two following papers consider more general reinforcement learning. Bayesian Reinforcement Learning with Exploration by Tor Lattimore and Marcus Hutter improves some of their earlier work on general reinforcement learning. Here the true environment does not need to be Markovian, but it is known to be drawn at random from a finite class of possible environments. An algorithm is presented that alternates between periods of playing the Bayes optimal policy and periods of forced experimentation. Upper bounds on the sample complexity are established, and it is shown that for some classes of environments this bound cannot be improved by more than a logarithmic factor.
Marcus Hutter's paper Extreme State Aggregation beyond MDPs considers how an arbitrary (non-Markov) decision process with a finite number of actions can be approximated by a finite-state MDP. For a given feature function φ : H → S mapping histories h of the general process to some finite state space S, the transition probabilities of the MDP can be defined appropriately. It is shown that the MDP approximates the general process well if the optimal policy for the general process is consistent with the feature function, π*(h₁) = π*(h₂) for φ(h₁) = φ(h₂), or if the optimal Q-value function is consistent with the feature function, |Q*(h₁, a) − Q*(h₂, a)| < ε for φ(h₁) = φ(h₂) and all a. It is also shown that such a feature function always exists.
Online Learning and Learning with Bandit Information.
The paper On Learning the Optimal Waiting Time by Tor Lattimore, András György, and Csaba Szepesvári addresses the problem of how long to wait for an event with independent and identically distributed (i.i.d.) arrival times from an unknown distribution. If the event occurs during the waiting time, then the cost is the time until arrival. If the event occurs after the waiting time, then the cost is the waiting time plus a fixed and known amount. Algorithms for the full information setting and for bandit information are presented that sequentially choose waiting times over several rounds in order to minimize the regret with respect to an optimal waiting time. With bandit information the arrival time is only revealed if it is smaller than the waiting time, while in the full information setting it is always revealed. The performance of the algorithms nearly matches the minimax lower bound on the regret.
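The cost structure can be written down directly: waiting w and observing arrival t costs t when t ≤ w, and w plus the fixed penalty otherwise. A hypothetical Monte Carlo sketch over a known sample (a grid search, not the sequential regret-minimizing algorithms of the paper):

```python
import random

def expected_cost(wait, arrivals, penalty):
    """Average cost of a fixed waiting time over i.i.d. arrival times:
    pay the arrival time if it beats the deadline, else wait + penalty."""
    total = 0.0
    for t in arrivals:
        total += t if t <= wait else wait + penalty
    return total / len(arrivals)

random.seed(0)
sample = [random.expovariate(1.0) for _ in range(10_000)]
grid = [0.5 * i for i in range(1, 21)]
best_wait = min(grid, key=lambda w: expected_cost(w, sample, penalty=3.0))
```

For memoryless (exponential) arrivals, longer waits are always weakly better, so the empirical minimizer tends toward the right end of the grid; heavier-tailed arrival distributions make a finite waiting time strictly optimal.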
In many application areas, e.g. recommendation systems, the learning algorithm should return a ranking: a permutation of some finite set of elements. This problem is studied in the paper by Nir Ailon, Kohei Hatano, and Eiji Takimoto titled Bandit Online Optimization Over the Permutahedron, where the cost of a ranking under cost vector s is calculated as Σ_{i=1}^n π(i)s(i), with π(i) the rank of item i and s(i) its cost. In the bandit setting an unknown cost vector s_t is chosen in each iteration, and the goal of the algorithm is to minimize the regret with respect to the best fixed ranking of the items.
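The loss of a fixed ranking is just an inner product between the rank vector and the cost vector, and by the rearrangement inequality the best fixed ranking gives the smallest rank to the costliest item. A small brute-force illustration (the paper's algorithm, of course, never enumerates all n! permutations):

```python
from itertools import permutations

def ranking_loss(pi, s):
    """Loss sum_i pi(i) * s(i): pi[i] is the rank assigned to item i,
    s[i] the (adversarially chosen) cost of item i."""
    return sum(rank * cost for rank, cost in zip(pi, s))

s = [3.0, 1.0, 2.0]
# best fixed ranking: smallest rank goes to the largest cost
best = min(permutations(range(1, 4)), key=lambda pi: ranking_loss(pi, s))
```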
Marcus Hutter's paper Offline to Online Conversion introduces the problem of turning a sequence of distributions q_n on strings in X^n, n = 1, 2, ..., into a stochastic online predictor for the next symbol, q̃(x_n | x_1, ..., x_{n−1}), such that the induced probabilities q̃(x_1, ..., x_n) are close to q_n(x_1, ..., x_n) for all sequences x_1, x_2, ... The paper considers four strategies for doing such a conversion, showing that naïve approaches might not be satisfactory but that a good predictor can always be constructed, at the cost of possible computational inefficiency. One example of such a conversion gives a simple combinatorial derivation of the Good-Turing estimator.
Statistical Learning Theory.
Andreas Maurer's paper A Chain Rule for the Expected Suprema of Gaussian Processes investigates the problem of assessing the generalization of a learner who is adapting a feature space while also learning the target function. The approach taken is to consider extensions of bounds on Gaussian averages to the case where there is a class of functions that create features and a class of mappings from those features to outputs. In the applications considered in the paper this corresponds to a two-layer kernel machine and multi-task learning, and, through iterated application of the bound, to multilayer networks and deep learners.
A standard assumption in statistical learning theory is that the data are generated independently and identically distributed from some fixed distribution; in practice, this assumption is often violated, and a more realistic assumption is that the data are generated by a process which is only sufficiently fast mixing, and may be even non-stationary. Vitaly Kuznetsov and Mehryar Mohri in their paper Generalization Bounds for Time Series Prediction with Non-stationary Processes consider this case and are able to prove new generalization bounds that depend on the mixing coefficients and the shift of the distribution.
Rahim Samei, Boting Yang, and Sandra Zilles in their paper Generalizing Labeled and Unlabeled Sample Compression to Multi-label Concept Classes consider generalizations of the binary VC-dimension to multi-label classification, such that maximum classes of dimension d allow a tight compression scheme of size d. Sufficient conditions for notions of dimension with this property are derived, and it is shown that some multi-label generalizations of the VC-dimension allow tight compression schemes, while other generalizations do not.
Privacy, Clustering, MDL, and Kolmogorov Complexity. Christos Dimitrakakis, Blaine Nelson, Aikaterini Mitrokotsa, and Benjamin I.P. Rubinstein present the paper Robust and Private Bayesian Inference. This paper looks at the problem of privacy in machine learning, where an agent, a statistician for example, might want to reveal information derived from a data set, but without revealing information about the particular data points in the set, which might contain confidential information. The authors show that it is possible to do Bayesian inference in this setting, satisfying differential privacy, provided that the likelihoods and conjugate priors satisfy some properties.
Behnam Neyshabur, Yury Makarychev, and Nathan Srebro in their paper Clustering, Hamming Embedding, Generalized LSH and the Max Norm look at asymmetric locality sensitive hashing (LSH), which is useful in many types of machine learning applications. Locality sensitive hashing, which is closely related to the problem of clustering, is a method of probabilistically reducing the dimension of high-dimensional data sets by assigning each data point a hash such that similar data points will be mapped to the same hash. The paper shows that by shifting to co-clustering and asymmetric LSH the problem admits a tractable relaxation.
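For intuition, the classic symmetric LSH for angular similarity hashes a vector to its sign pattern against random hyperplanes, so that nearby vectors collide with high probability. This standard construction is shown only for context; the paper's point concerns the asymmetric variant, where queries and database points use different hash functions:

```python
import random

def sign_signature(x, hyperplanes):
    """Random-hyperplane hash: one bit per hyperplane, the sign of <h, x>.
    Vectors at a small angle disagree on few bits."""
    return tuple(
        1 if sum(h_i * x_i for h_i, x_i in zip(h, x)) >= 0 else 0
        for h in hyperplanes
    )

random.seed(1)
planes = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(16)]
a = [1.0, 0.9, 1.1, 1.0, 0.95]
b = [1.02, 0.88, 1.09, 1.01, 0.97]  # nearly parallel to a
agreement = sum(p == q for p, q in zip(sign_signature(a, planes),
                                       sign_signature(b, planes)))
```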
Jan Leike and Marcus Hutter look at martingale theory in Indefinitely Oscillating Martingales; as a consequence of their analysis they show a negative result in the theory of Minimum Description Length (MDL) learning, namely that the MDL estimator is in general inductively inconsistent: it will not necessarily converge. The MDL estimator gives the regularized code length, MDL(u) = min_Q {Q(u) + K(Q)}, where Q is a coding function, K(Q) its complexity, and Q(u) the code length for the string u. It is shown that the family of coding functions Q can be constructed such that lim_{n→∞} MDL(u_{1:n}) does not converge for most infinite words u.
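The regularized code length is simply a minimum over candidate coders of fit plus complexity. A toy sketch with two hand-picked coding functions and arbitrary complexity charges K(Q) (illustrative numbers, not from the paper):

```python
import math

def mdl(u, coders):
    """MDL(u) = min over Q of Q(u) + K(Q), with Q(u) an idealized
    code length in bits and K(Q) the complexity charged to the coder."""
    return min(codelen(u) + k for codelen, k in coders)

def uniform_codelen(u):
    return float(len(u))  # one bit per binary symbol

def biased_codelen(u):
    # idealized Shannon code for a Bernoulli(3/4) source favoring '1'
    p = 0.75
    return sum(-math.log2(p if ch == "1" else 1.0 - p) for ch in u)

coders = [(uniform_codelen, 1.0), (biased_codelen, 3.0)]
```

The inconsistency result says that a suitably devilish family of coding functions makes this minimizer oscillate forever as the string grows.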
As is well known, Kolmogorov complexity is not computable. Peter Bloem, Francisco Mota, Steven de Rooij, Luís Antunes, and Pieter Adriaans in their paper A Safe Approximation for Kolmogorov Complexity study the problem of approximating this quantity using a restriction to a particular class of models, together with a probabilistic bound on the approximation error.
Cellular Tree Classifiers
Gérard Biau 1,2 and Luc Devroye 3
1 Sorbonne Universités, UPMC Univ Paris 06, France
2 Institut universitaire de France
3 McGill University, Canada
Abstract. Suppose that binary classification is done by a tree method in which the leaves of a tree correspond to a partition of d-space. Within a partition, a majority vote is used. Suppose furthermore that this tree must be constructed recursively by implementing just two functions, so that the construction can be carried out in parallel by using “cells”: first of all, given input data, a cell must decide whether it will become a leaf or an internal node in the tree. Secondly, if it decides on an internal node, it must decide how to partition the space linearly. Data are then split into two parts and sent downstream to two new independent cells. We discuss the design and properties of such classifiers.
1 Introduction
We explore in this note a new way of dealing with the supervised classification problem, inspired by greedy approaches and the divide-and-conquer philosophy. Our point of view is novel, but has a wide reach in a world in which parallel and distributed computation are important. In the short term, parallelism will take hold in massive data sets and complex systems and, as such, is one of the exciting questions that will be asked of the statistics and machine learning fields.
The general context is that of classification trees, which make decisions by recursively partitioning R^d into regions, sometimes called cells. In the model we promote, a basic computational unit in classification, a cell, takes as input training data, and makes a decision whether a majority rule should be locally applied. In the negative, the data should be split and each part of the partition should be transmitted to another cell. What is original in our approach is that all cells must use exactly the same protocol to make their decision; their function is not altered by external inputs or global parameters. In other words, the decision to split depends only upon the data presented to the cell, independently of the overall edifice. Classifiers designed according to this autonomous principle will be called cellular tree classifiers, or simply cellular classifiers.
Decision tree learning is a method commonly used in data mining (see, e.g., [27]). For example, in CART (Classification and Regression Trees, [5]), splits are made perpendicular to the axes based on the notion of Gini impurity. Splits are performed until all data are isolated. In a second phase, nodes are recombined from the bottom up in a process called pruning. It is this second process that makes the CART trees non-cellular, as global information is shared to manage the recombination process. Quinlan's C4.5 [26] also prunes. Others split until all nodes or cells are homogeneous (i.e., have the same class); the prime example is Quinlan's ID3 [25]. This strategy, while compliant with the cellular framework, leads to non-consistent rules, as we point out in the present paper. In fact, the choice of a good stopping rule for decision trees is very hard; we were not able to find any in the literature that guarantees convergence to the Bayes error.

P. Auer et al. (Eds.): ALT 2014, LNAI 8776, pp. 8–17, 2014.
© Springer International Publishing Switzerland 2014
2 Tree Classifiers
In the design of classifiers, we have an unknown distribution of a random prototype pair (X, Y), where X takes values in R^d and Y takes only finitely many values, say 0 or 1 for simplicity. Classical pattern recognition deals with predicting the unknown nature Y of the observation X via a measurable classifier g : R^d → {0, 1}. We make a mistake if g(X) differs from Y, and the probability of error for a particular decision rule g is L(g) = P{g(X) ≠ Y}. The Bayes classifier

g*(x) = 1 if P{Y = 1 | X = x} > P{Y = 0 | X = x}, and g*(x) = 0 otherwise,

has the smallest probability of error, that is,

L* = L(g*) = inf_{g : R^d → {0,1}} P{g(X) ≠ Y}

(see, for instance, Theorem 2.1 in [7]). However, most of the time, the distribution of (X, Y) is unknown, so that the optimal decision g* is unknown too. We do not consult an expert to try to reconstruct g*, but have access to a database D_n = (X₁, Y₁), ..., (X_n, Y_n) of i.i.d. copies of (X, Y), observed in the past. We assume that D_n and (X, Y) are independent. In this context, a classification rule g_n(x; D_n) is a Borel measurable function of x and D_n, and it attempts to estimate Y from x and D_n. For simplicity, we suppress D_n in the notation and write g_n(x) instead of g_n(x; D_n).

The probability of error of a given classifier g_n is the random variable

L(g_n) = P{g_n(X) ≠ Y | D_n},

and the rule is consistent if

lim_{n→∞} E L(g_n) = L*.

It is universally consistent if it is consistent for all possible distributions of (X, Y). Many popular classifiers are universally consistent. These include several brands of histogram rules, k-nearest neighbor rules, kernel rules, neural networks, and tree classifiers. There are too many references to be cited here, but the monographs by [7] and [15] will provide the reader with a comprehensive introduction to the domain and a literature review.
Fig. 1. A binary tree (left) and the corresponding partition (right)
Trees have been suggested as tools for classification for more than thirty years. We mention in particular the early work of Fu [36, 1, 21, 18, 24]. Other references from the 1970s include [20, 3, 23, 30, 34, 12, 8]. Most influential in the classification tree literature was the CART proposal by [5]. While CART proposes partitions by hyperrectangles, linear hyperplanes in general position have also gained in popularity; the early work on that topic is by [19] and [22]. Additional references on tree classification include [14, 2, 16, 17, 35, 33, 31, 6, 9, 10, 32, 13].
3 Cellular Trees
In general, classification trees partition R^d into regions, often hyperrectangles parallel to the axes (an example is depicted in Figure 1). Of interest in this article are binary trees, where each node has exactly 0 or 2 children. If a node u represents the set A and its children u₁, u₂ represent A₁, A₂, then it is required that A = A₁ ∪ A₂ and A₁ ∩ A₂ = ∅. The root of the tree represents R^d, and the terminal nodes (or leaves), taken together, form a partition of R^d. If a leaf represents region A, then the tree classifier takes the simple form

g_n(x) = 1 if Σ_{i=1}^n 1[X_i ∈ A, Y_i = 1] > Σ_{i=1}^n 1[X_i ∈ A, Y_i = 0], x ∈ A, and g_n(x) = 0 otherwise.
That is, in every leaf region, a majority vote is taken over all (X_i, Y_i)'s with X_i's in the same region. Ties are broken, by convention, in favor of class 0.
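The leaf rule just described fits in a few lines. A minimal sketch, representing leaves as hypothetical (membership test, labels) pairs purely for illustration:

```python
def leaf_label(labels):
    """Majority vote over the Y_i in a leaf; a tie (or an empty leaf)
    goes to class 0 by convention."""
    ones = sum(labels)
    return 1 if ones > len(labels) - ones else 0

def classify(x, leaves):
    """leaves: list of (contains, labels) pairs forming a partition;
    `contains` tests membership of x in the leaf's region."""
    for contains, labels in leaves:
        if contains(x):
            return leaf_label(labels)
    raise ValueError("the leaves must cover the space")

# a one-dimensional tree with a single split at 0
leaves = [(lambda x: x < 0, [0, 0, 1]), (lambda x: x >= 0, [1, 1])]
```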
The tree structure is usually data-dependent, and indeed, it is in the construction itself where different trees differ. Thus, there are virtually infinitely many possible strategies to build classification trees. Nevertheless, despite this great diversity, all tree species end up with two fundamental questions at each node:
① Should the node be split?
② In the affirmative, what are its children?
These two questions are typically answered using global information on the tree, such as, for example, a function of the data D_n, the level of the node within the tree, the size of the data set and, more generally, any parameter connected with the structure of the tree. This parameter could be, for example, the total number k of cells in a k-partition tree or the penalty term in the pruning of the CART algorithm (e.g., [5] and [11]).
Our cellular trees proceed from a different philosophy. In short, a cellular tree should, at each node, be able to answer questions ① and ② using local information only, without any help from the other nodes. In other words, each cell can perform as many operations as it wishes, provided it uses only the data that are transmitted to it, regardless of the general structure of the tree. Just imagine that the calculations to be carried out at the nodes are sent to different computers, possibly asynchronously, and that the system architecture is so complex that computers do not communicate. Thus, once a computer receives its data, it has to make its own decisions on ① and ② based on this data subset only, independently of the others and without knowing anything of the overall edifice. Once a data set is split, it can be given to another computer for further splitting, since the remaining data points have no influence.
Formally, a cellular binary classification tree is a machine that partitions the space recursively in the following manner. With each node we associate a subset of R^d, starting with R^d for the root node. We consider binary tree classifiers based on a class C of possible Borel subsets of R^d that can be used for splits. A typical example of such a class is the family of all hyperplanes, or the class of all hyperplanes that are perpendicular to one of the axes. Higher-order polynomial splitting surfaces can be imagined as well. The class is parametrized by a vector σ ∈ R^p. There is a splitting function f(x, σ), x ∈ R^d, σ ∈ R^p, such that R^d is partitioned into A = {x ∈ R^d : f(x, σ) ≥ 0} and B = {x ∈ R^d : f(x, σ) < 0}. Formally, a cellular split can be viewed as a family of measurable mappings (σ_m)_m from (R^d × {0, 1})^m to R^p. In this model, m is the size of the data set transmitted to the cell. Thus, for each possible input size m, we have a map. In addition, there is a family of measurable mappings (θ_m)_m from (R^d × {0, 1})^m to {0, 1} that indicate decisions: θ_m = 1 indicates that a split should be applied, while θ_m = 0 corresponds to a decision not to split. In that case, the cell acts as a leaf node in the tree. We note that (θ_m)_m and (σ_m)_m correspond to the decisions given in ① and ②.

Let the data set be D_n. If θ(D_n) = 0, the root cell is final, and the space is not split. Otherwise, R^d is split into
A = {x ∈ R^d : f(x, σ(D_n)) ≥ 0} and B = {x ∈ R^d : f(x, σ(D_n)) < 0}.
The data D_n are partitioned into two groups: the first group contains all (X_i, Y_i), i = 1, ..., n, for which X_i ∈ A, and the second group all others. The groups are sent to child cells, and the process is repeated. When x ∈ R^d needs to be classified, we first determine the unique leaf set A(x) to which x belongs, and then take votes among the {Y_i : X_i ∈ A(x), i = 1, ..., n}. Classification proceeds by a majority vote, with the majority deciding the estimate g_n(x). In case of a tie, we set g_n(x) = 0.
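A single cell's splitting step can be sketched as follows, taking C to be the class of affine hyperplanes: σ packs a normal vector and an offset, and the sign of f(x, σ) routes each point to child A or B. The parametrization and names here are illustrative assumptions; only f and σ come from the text:

```python
def linear_split(data, sigma):
    """Split labeled data by the sign of f(x, sigma) = <w, x> + b,
    where sigma = (w, b); points with f >= 0 go to child A."""
    w, b = sigma
    def f(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    A = [(x, y) for x, y in data if f(x) >= 0]
    B = [(x, y) for x, y in data if f(x) < 0]
    return A, B

data = [((1.0, 0.0), 1), ((-2.0, 0.5), 0), ((0.5, 1.0), 1)]
A, B = linear_split(data, sigma=((1.0, 0.0), 0.0))  # split on first coordinate
```

The cellular constraint is that σ must be computed from `data` alone; it is passed in here only to keep the sketch short.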
A cellular binary tree classifier is said to be randomized if each node in the tree has an independent copy of a uniform [0, 1] random variable associated with it, and θ and σ are mappings that have one extra real-valued component in the input. For example, we could flip an unbiased coin at each node to decide whether θ_m = 0 or θ_m = 1.
4 A Consistent Cellular Tree Classifier
At first sight, it appears that there are no universally consistent cellular tree classifiers. Consider for example complete binary trees with k full levels, i.e., there are 2^k leaf regions. We can have consistency when k is allowed to depend upon n. An example is the median tree (see Section 20.3 in [7]). When d = 1, split by finding the median element among the X_i's, so that the child sets have cardinality given by ⌊(n−1)/2⌋ and ⌈(n−1)/2⌉, where ⌊·⌋ and ⌈·⌉ are the floor and ceiling functions. The median itself stays behind and is not sent down to the subtrees, with an appropriate convention for breaking cell boundaries as well as empty cells. Keep doing this for k rounds; in d dimensions, one can either rotate through the coordinates for median splitting, or randomize by selecting uniformly at random a coordinate to split orthogonally.
This rule is known to be consistent as soon as the marginal distributions of X are nonatomic, provided k → ∞ and k2^k/n → 0. However, this is not a cellular tree classifier. While we can indeed specify σ_m, it is impossible to define θ_m because θ_m cannot be a function of the global value of n. In other words, if we were to apply median splitting and decide to split for a fixed k, then the leaf nodes would all correspond to a fixed proportion of the data points. It is clear that the decisions in the leaves are off with a fair probability if we have, for example, Y independent of X and P{Y = 1} = 1/2. Thus, we cannot create a cellular tree classifier in this manner.
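The d = 1 median split is easy to state in code. A sketch assuming distinct values (the conventions for ties and cell boundaries are glossed over):

```python
def median_split(points):
    """Split a 1-D sample at its median element; the median itself stays
    behind, so the children receive floor((n-1)/2) and ceil((n-1)/2)
    points, exactly as in the median tree."""
    pts = sorted(points)
    mid = (len(pts) - 1) // 2
    return pts[:mid], pts[mid], pts[mid + 1:]
```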
In view of the preceding discussion, it seems paradoxical that there indeed exist universally consistent cellular tree classifiers. (We note here that we abuse the word “universal”: we will assume throughout, to keep the discussion at a manageable level, that the marginal distributions of X are nonatomic. But no other conditions on the joint distribution of (X, Y) are imposed.) Our construction follows the median tree principle and uses randomization. The original work on the solution appears in [4].
From now on, to keep things simple, it is assumed that the marginal distributions of X are nonatomic. The cellular splitting method σ_m described in this section mimics the median tree classifier discussed above. We first choose a dimension to cut, uniformly at random from the d dimensions, as rotating through the dimensions by level number would violate the cellular condition. The selected dimension is then split at the data median, just as in the classical median tree. Repeating this for k levels of nodes leads to 2^k leaf regions. On any path of length k to one of the 2^k leaves, we have a deterministic sequence of cardinalities n₀ = n (root), n₁, n₂, ..., n_k. We always have n_i/2 − 1 ≤ n_{i+1} ≤ n_i/2. Thus, by induction, one easily shows that, for all i,

n/2^i − 2 ≤ n_i ≤ n/2^i.

In particular, each leaf has at least max(n/2^k − 2, 0) points and at most n/2^k.

The novelty is in the choice of the decision function. This function ignores the data altogether and uses a randomized decision that is based on the size of the input. More precisely, consider a nonincreasing function φ : N → (0, 1] with φ(0) = φ(1) = 1. Cells correspond in a natural way to sets of R^d. So, we can and will speak of a cell A, where A ⊂ R^d. The number of data points in A is denoted by N(A):

N(A) = Σ_{i=1}^n 1[X_i ∈ A].

Then, if U is the uniform [0, 1] random variable associated with the cell A and the input to the cell is N(A), the stopping rule ① takes the form:

① Put θ = 0 if U ≤ φ(N(A)).

In this manner, we obtain a possibly infinite randomized binary tree classifier. Splitting occurs with probability 1 − φ(m) on inputs of size m. Note that no attempt is made to split empty sets or singleton sets. For consistency, we need to look at the random leaf region to which X belongs. This is roughly equivalent to studying the distance from that cell to the root of the tree.

In the sequel, the notation u_n = o(v_n) (respectively, u_n = ω(v_n) and u_n = O(v_n)) means that u_n/v_n → 0 (respectively, v_n/u_n → 0 and u_n ≤ Cv_n for some constant C) as n → ∞. Many choices φ(m) = o(1), but not all, will do for us. The next lemma makes things more precise.

Lemma 1. Let β ∈ (0, 1). Define

φ(m) = 1 if m < 3, and φ(m) = 1/log^β m if m ≥ 3.

Let K(X) denote the random path distance between the cell of X and the root of the tree. Then

lim_{n→∞} P{K(X) ≥ k_n} = 0 if k_n = ω(log^β n), and = 1 if k_n = o(log^β n).

Proof. Let us recall that, at level k, each cell of the underlying median tree contains at least max(n/2^k − 2, 0) points and at most n/2^k. Since the function φ(·) is nonincreasing, the first result follows from this: