Lecture Notes in Mathematics 2033
Editors:
J.-M. Morel, Cachan
B. Teissier, Paris
Subseries:
École d'Été de Probabilités de Saint-Flour
For further volumes:
http://www.springer.com/series/304
Saint-Flour Probability Summer School
The Saint-Flour volumes are reflections of the courses given at the
Saint-Flour Probability Summer School. Founded in 1971, this school is
organised every year by the Laboratoire de Mathématiques (CNRS and
Université Blaise Pascal, Clermont-Ferrand, France). It is intended for PhD
students, teachers and researchers who are interested in probability theory,
statistics, and in their applications.
The duration of each school is 13 days (it was 17 days up to 2005), and
up to 70 participants can attend it. The aim is to provide, in three
high-level courses, a comprehensive study of some fields in probability theory
or statistics. The lecturers are chosen by an international scientific board.
The participants themselves also have the opportunity to give short lectures
about their research work.
Participants are lodged and work in the same building, a former seminary
built in the 18th century in the city of Saint-Flour, at an altitude of 900 m.
The pleasant surroundings facilitate scientific discussion and exchange.
The Saint-Flour Probability Summer School is supported by:
– Université Blaise Pascal
– Centre National de la Recherche Scientifique (C.N.R.S.)
– Ministère délégué à l'Enseignement supérieur et à la Recherche
For more information, see back pages of the book and
http://math.univ-bpclermont.fr/stflour/
Jean Picard
Summer School Chairman
Laboratoire de Mathématiques
Université Blaise Pascal
63177 Aubière Cedex
France
Vladimir Koltchinskii
Oracle Inequalities
in Empirical Risk
Minimization and Sparse
Recovery Problems
École d'Été de Probabilités
de Saint-Flour XXXVIII-2008
Prof. Vladimir Koltchinskii
Georgia Institute of Technology
School of Mathematics
Atlanta
USA
[email protected]
ISBN 978-3-642-22146-0 e-ISBN 978-3-642-22147-7
DOI 10.1007/978-3-642-22147-7
Springer Heidelberg Dordrecht London New York
Lecture Notes in Mathematics ISSN print edition: 0075-8434
ISSN electronic edition: 1617-9692
Library of Congress Control Number: 2011934366
Mathematics Subject Classification (2010): 62J99, 62H12, 60B20, 60G99
© Springer-Verlag Berlin Heidelberg 2011
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from Springer. Violations
are liable to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not
imply, even in the absence of a specific statement, that such names are exempt from the relevant protective
laws and regulations and therefore free for general use.
Cover design: deblik, Berlin
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
The purpose of these lecture notes is to provide an introduction to the general
theory of empirical risk minimization with an emphasis on excess risk bounds
and oracle inequalities in penalized problems. In recent years, there have been
new developments in this area motivated by the study of new classes of methods
in machine learning such as large margin classification methods (boosting, kernel
machines). The main probabilistic tools involved in the analysis of these problems
are concentration and deviation inequalities by Talagrand along with other methods
of empirical processes theory (symmetrization inequalities, contraction inequality
for Rademacher sums, entropy and generic chaining bounds). Sparse recovery based
on ℓ1-type penalization and low rank matrix recovery based on the nuclear norm
penalization are other active areas of research, where the main problems can be
stated in the framework of penalized empirical risk minimization, and concentration
inequalities and empirical processes tools proved to be very useful.
My interest in empirical processes started in the late 1970s and early 1980s.
It was largely influenced by the work of Vapnik and Chervonenkis on the
Glivenko–Cantelli problem and on empirical risk minimization in pattern
recognition, and, especially, by the results of Dudley on uniform central limit
theorems. Talagrand's concentration inequality proved in the 1990s was a major
result with deep consequences in the theory of empirical processes and related
areas of statistics, and it inspired many new approaches in the analysis of
empirical risk minimization problems.
Over the last years, the work of many people has had a profound impact on
my own research and on my view of the subject of these notes. I was lucky to
work together with several of them and to have numerous conversations and email
exchanges with many others. I am especially thankful to Peter Bartlett, Lucien
Birgé, Gilles Blanchard, Stephane Boucheron, Olivier Bousquet, Richard Dudley,
Sara van de Geer, Evarist Giné, Gabor Lugosi, Pascal Massart, David Mason,
Shahar Mendelson, Dmitry Panchenko, Alexandre Tsybakov, Aad van der Vaart,
Jon Wellner and Joel Zinn.
I am thankful to the School of Mathematics, Georgia Institute of Technology and
to the Department of Mathematics and Statistics, University of New Mexico where
most of my work for the past years has taken place.
The research described in these notes has been supported in part by NSF grants
DMS-0304861, MSPA-MPS-0624841, DMS-0906880 and CCF-0808863.
I was working on the initial draft while visiting the Isaac Newton Institute for
Mathematical Sciences in Cambridge in 2008. I am thankful to the Institute for its
hospitality.
I am also very thankful to Jean-Yves Audibert, Kai Ni and Jon Wellner who
provided me with lists of typos and inconsistencies in the first draft.
I am very grateful to all the students and colleagues who attended the lectures in
2008 and whose questions motivated many of the changes I have made since then.
I am especially thankful to Jean Picard for all his efforts that make the
Saint-Flour School truly unique.
Atlanta Vladimir Koltchinskii
February 2011
Contents
1 Introduction
1.1 Abstract Empirical Risk Minimization
1.2 Excess Risk: Distribution Dependent Bounds
1.3 Rademacher Processes and Data Dependent Bounds on Excess Risk
1.4 Penalized Empirical Risk Minimization and Oracle Inequalities
1.5 Concrete Empirical Risk Minimization Problems
1.6 Sparse Recovery Problems
1.7 Recovering Low Rank Matrices
2 Empirical and Rademacher Processes
2.1 Symmetrization Inequalities
2.2 Comparison Inequalities for Rademacher Sums
2.3 Concentration Inequalities
2.4 Exponential Bounds for Sums of Independent Random Matrices
2.5 Further Comments
3 Bounding Expected Sup-Norms of Empirical and Rademacher Processes
3.1 Gaussian and Subgaussian Processes, Metric Entropies and Generic Chaining Complexities
3.2 Finite Classes of Functions
3.3 Shattering Numbers and VC-classes of Sets
3.4 Upper Entropy Bounds
3.5 Lower Entropy Bounds
3.6 Generic Chaining Complexities and Bounding Empirical Processes Indexed by F^2
3.7 Function Classes in Hilbert Spaces
3.8 Further Comments
4 Excess Risk Bounds
4.1 Distribution Dependent Bounds and Ratio Bounds for Excess Risk
4.2 Rademacher Complexities and Data Dependent Bounds on Excess Risk
4.3 Further Comments
5 Examples of Excess Risk Bounds in Prediction Problems
5.1 Regression with Quadratic Loss
5.2 Empirical Risk Minimization with Convex Loss
5.3 Binary Classification Problems
5.4 Further Comments
6 Penalized Empirical Risk Minimization and Model Selection Problems
6.1 Penalization in Monotone Families F_k
6.2 Penalization by Empirical Risk Minima
6.3 Linking Excess Risk and Variance in Penalization
6.4 Further Comments
7 Linear Programming in Sparse Recovery
7.1 Sparse Recovery and Neighborliness of Convex Polytopes
7.2 Geometric Properties of the Dictionary
7.2.1 Cones of Dominant Coordinates
7.2.2 Restricted Isometry Constants and Related Characteristics
7.2.3 Alignment Coefficients
7.3 Sparse Recovery in Noiseless Problems
7.4 The Dantzig Selector
7.5 Further Comments
8 Convex Penalization in Sparse Recovery
8.1 General Aspects of Convex Penalization
8.2 ℓ1-Penalization and Oracle Inequalities
8.3 Entropy Penalization and Sparse Recovery in Convex Hulls: Random Error Bounds
8.4 Approximation Error Bounds, Alignment and Oracle Inequalities
8.5 Further Comments
9 Low Rank Matrix Recovery: Nuclear Norm Penalization
9.1 Geometric Parameters of Low Rank Recovery and Other Preliminaries
9.2 Matrix Regression with Fixed Design
9.3 Matrix Regression with Subgaussian Design
9.4 Other Types of Design in Matrix Regression
9.5 Further Comments
A Auxiliary Material
A.1 Orlicz Norms
A.2 Classical Exponential Inequalities
A.3 Properties of ♯- and ♭-Transforms
A.4 Some Notations and Facts in Linear Algebra
References
Index
Programme of the school
List of participants