Towards a More Principled Compiler:
Register Allocation and Instruction Selection
Revisited
David Ryan Koes
CMU-CS-09-157
October 2009
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
Thesis Committee:
Seth Copen Goldstein, Chair
Peter Lee
Anupam Gupta
Michael D. Smith, Harvard University
Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy.
Copyright © 2009 David Ryan Koes
This research was sponsored by the National Science Foundation under grant numbers CCF-0702640, CCR-0205523,
EIA-0220214, and IIS-0117658; and Hewlett Packard under grant number 1010162.
The views and conclusions contained in this document are those of the author and should not be interpreted as
representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or
any other entity.
Keywords: Compilers, Register Allocation, Instruction Selection, Backend Optimization
For Mary, Andrew, and Alex
But especially for Mary
Abstract
Backend optimizations are a critical part of an optimizing compiler. This thesis
develops a principled approach for understanding, evaluating, and solving backend
optimization problems. Our principled approach is to develop a comprehensive and
expressive model of the backend optimization problem, and design solution
techniques for this model that achieve or approach optimality. We apply our principled
approach to the classical backend optimizations of register allocation and instruction
selection.
We develop an expressive model of register allocation based on multi-commodity
network flow. This model exactly represents the complexities of the target
architecture. We design progressive solution techniques for our model. Progressive solution
techniques quickly find an initial solution and then improve upon the solution as
more time is allotted for compilation. Our progressive allocator allows the
programmer to explicitly manage the trade-off between compile-time and code quality. As
more time is allowed for compilation, the resulting allocation approaches optimal,
and substantial improvements in code quality are obtained.
We describe an expressive directed acyclic graph representation of the instruction
selection problem and develop a near-optimal, linear-time algorithm that solves the
instruction selection problem using this expressive model. Our principled approach
to instruction selection results in significant improvements in code quality compared
to traditional algorithms.
We evaluate our principled approaches to register allocation and instruction
selection on a range of architectures and benchmarks. We achieve significant
reductions in code size and increases in performance relative to previous approaches. Our
results confirm that our principled approach is a major advance in the state of the art
of backend optimization.
Contents
1 Introduction 1
1.1 Problem Description . . . 4
1.1.1 Register Allocation . . . 4
1.1.2 Instruction Selection . . . 6
1.2 Contribution . . . 7
2 Related Work 9
2.1 Register Allocation . . . 9
2.1.1 Graph Coloring Register Allocation . . . 10
2.1.2 SSA Register Allocators . . . 14
2.1.3 Linear Scan Allocators . . . 15
2.1.4 Alternative Heuristic Allocators . . . 20
2.1.5 Optimal Register Allocation . . . 21
2.1.6 Limitations . . . 22
2.1.7 Summary . . . 24
2.2 Instruction Selection . . . 25
3 Global MCNF Register Allocation Model 29
3.1 Multi-commodity Network Flow . . . 29
3.2 Local Register Allocation Model . . . 33
3.2.1 Source Nodes . . . 35
3.2.2 Sink Nodes . . . 36
3.2.3 Allocation Class Nodes . . . 37
3.2.4 Crossbar Groups . . . 38
3.2.5 Instruction Groups . . . 41
3.2.6 Full Model . . . 42
3.3 Global Register Allocation Model . . . 44
3.4 Persistent Memory . . . 50
3.5 Modeling Costs . . . 52
3.6 Limitations . . . 58
3.7 Hardness of Single Global Flow . . . 61
3.8 Simplifications . . . 64
3.9 Summary . . . 66
4 Evaluation Methodology 67
4.1 Benchmarks . . . 67
4.2 Code Quality Metrics . . . 68
4.2.1 Code Size . . . 68
4.2.2 Code Performance . . . 69
4.3 Instruction Set Architectures . . . 71
4.3.1 x86-32 . . . 71
4.3.2 x86-64 . . . 71
4.3.3 ARM . . . 72
4.3.4 Thumb . . . 72
4.4 Microarchitectures . . . 73
5 Heuristic Register Allocation 75
5.1 Iterative Heuristic Allocator . . . 75
5.1.1 Algorithm . . . 76
5.1.2 Improvements . . . 81
5.1.3 Asymptotic Analysis . . . 87
5.2 Simultaneous Heuristic Allocator . . . 88
5.2.1 Algorithm . . . 88
5.2.2 Improvements . . . 96
5.2.3 Asymptotic Analysis . . . 103
5.3 Boundary Constraints . . . 105
5.3.1 Asymptotic Analysis . . . 108
5.4 Hybrid Allocator . . . 108
5.5 Compile Time . . . 113
5.6 Summary . . . 114
6 Progressive Register Allocation 115
6.1 Relaxation Techniques . . . 115
6.1.1 Linear Programming Relaxation . . . 116
6.1.2 Lagrangian Relaxation . . . 119
6.2 Subgradient Optimization . . . 121
6.2.1 Flow Calculation . . . 123
6.2.2 Step Update . . . 125
6.2.3 Price Update . . . 129
6.2.4 Price Initialization . . . 132
6.2.5 Summary . . . 136
6.3 Progressive Register Allocation . . . 136
6.3.1 Code Quality: Size . . . 138
6.3.2 Code Quality: Performance . . . 141
6.3.3 Optimality . . . 147
6.3.4 Compile Time . . . 148
6.4 Summary . . . 150
7 Near-Optimal Linear-Time Instruction Selection 151
7.1 Problem Description and Hardness . . . 151
7.2 NOLTIS . . . 155
7.3 0-1 Programming Solution . . . 161
7.4 Implementation . . . 162
7.5 Results . . . 163
7.5.1 Optimality . . . 163
7.5.2 Comparison of Algorithms . . . 166
7.5.3 Impact on Code Size . . . 167
7.5.4 Compile Time Performance . . . 167
7.6 Limitations and Future Work . . . 168
7.7 Interaction with Register Allocation . . . 169
7.8 Summary . . . 170
8 Conclusion 171
Bibliography 173
List of Figures
1.1 The structure of a typical compiler. . . . 2
1.2 Simple register allocation example. . . . 4
1.3 An example of instruction selection on a tree-based IR. . . . 6
2.1 The flow of a traditional graph coloring algorithm. . . . 10
2.2 Live ranges and the corresponding interference graph. . . . 10
2.3 An example of the simplify and select phases of a graph coloring allocator. . . . 11
2.4 The linear ordering of basic blocks, live intervals, and lifetime holes. . . . 16
2.5 Result of simple linear scan and second-chance binpacking linear scan. . . . 17
2.6 Percent of functions which do not spill. . . . 23
2.7 Decrease in code quality resulting from spill code and assignment heuristics. . . . 24
2.8 The effect of various components of register allocation. . . . 25
3.1 A simple example of a multi-commodity network flow problem. . . . 30
3.2 A simple example of local register allocation. . . . 34
3.3 Source nodes of a MCNF model of register allocation. . . . 35
3.4 Sink nodes of a MCNF model of register allocation. . . . 37
3.5 Crossbar groups for the local register allocation problem of Figure 3.2. . . . 38
3.6 Two possible crossbar group network structures. . . . 39
3.7 Instruction groups for the local register allocation problem of Figure 3.2. . . . 41
3.8 The full MCNF model of the local register allocation problem of Figure 3.2. . . . 43
3.9 A simple control flow graph. . . . 44
3.10 The three types of flow nodes in the global MCNF model of register allocation. . . . 44
3.11 Entry and exit groups of a global MCNF model of register allocation. . . . 45
3.12 A crossbar group with nodes for anti-variables. . . . 48
3.13 A network that demonstrates value modification, load remat. and anti-variables. . . . 49
3.14 The accuracy of the code size global MCNF cost model. . . . 53
3.15 Impact of single-execution costs on dynamic memory operations. . . . 54
3.16 Impact on performance of varying single-execution costs. . . . 55
3.17 Decrease in code quality when coalescing is separated from an optimal allocator . . . 58
3.18 An example of a reduction from global MCNF to minimum graph labeling. . . . 62
3.19 Decrease in code quality when move insertion is restricted in an optimal allocator. . . . 65
5.1 An example of the behavior of the iterative heuristic allocator. . . . 78
5.2 A simple example of global variable usage. . . . 81
5.3 The importance of block ordering in the iterative allocator. . . . 84
5.4 The importance of tie breaking strategies in the iterative allocator. . . . 85
5.5 Running time of iterative allocator for all benchmarked functions. . . . 87
5.6 Example execution of the simultaneous heuristic allocator. . . . 90
5.7 Example eviction decisions in the simultaneous heuristic allocator. . . . 94
5.8 Effect of tie breaking heuristics on code quality in the simultaneous allocator . . . 97
5.9 An example control flow graph decomposed into traces. . . . 98
5.10 Effect of trace decompositions on code quality in the simultaneous allocator. . . . 99
5.11 Effect of trace update policy on code quality in the simultaneous allocator . . . 102
5.12 Running time of the simultaneous allocator for all benchmarked functions. . . . 104
5.13 A CFG that illustrates the subtleties of setting boundary constraints. . . . 105
5.14 Code size improvement of heuristic allocators. . . . 109
5.15 Code size improvement of heuristic allocators. . . . 110
5.16 Memory operation reduction of heuristic allocators. . . . 111
5.17 Average code quality improvement of heuristic allocators . . . 112
5.18 Slowdown of various allocators relative to extended linear scan. . . . 113
6.1 The percentage of functions that demonstrate an integrality gap. . . . 117
6.2 Linear programming solution times of the global MCNF problem. . . . 118
6.3 Convergence behavior of the basic subgradient optimization algorithm. . . . 122
6.4 Convergence of subgradient optimization with different flow calculations. . . . 124
6.5 Graphical depiction of five ratio step update rules. . . . 125
6.6 Convergence of subgradient optimization with different step update rules. . . . 126
6.7 Convergence of the subgradient optimization with Newton’s method step update. . . . 128
6.8 Example price behavior using different price update strategies. . . . 129
6.9 Convergence of subgradient optimization with different price update strategies. . . . 131
6.10 Effect of price initialization on the initial lower bound. . . . 134
6.11 Convergence of subgradient optimization with different price initializations. . . . 134
6.12 Convergence of heuristic price initialization with different initial allocations. . . . 135
6.13 The behavior of three heuristic allocators within a progressive allocator. . . . 137
6.14 Average code size improvement of the progressive allocator. . . . 138
6.15 Code size improvement of the progressive allocator. . . . 139
6.16 Code size improvement of the progressive allocator. . . . 140
6.17 Average memory operation reduction of the progressive allocator. . . . 142
6.18 Average performance improvement of the progressive allocator. . . . 142
6.19 Memory operation reduction of the progressive allocator. . . . 143
6.20 Code performance improvement of the progressive allocator for x86-32. . . . 144
6.21 Code performance improvement of the progressive allocator for x86-64. . . . 145
6.22 Effect of block frequency estimator on code quality. . . . 146
6.23 Code size optimality bounds of progressive allocator. . . . 148
6.24 Code performance optimality bounds of progressive allocator. . . . 149
6.25 Register allocation time breakdown of progressive allocator. . . . 149
7.1 An example of instruction selection as a tiling problem. . . . 152
7.2 Expressing Boolean satisfiability as an instruction selection problem. . . . 154