Designing Reliable and
Efficient Networks on Chips
Lecture Notes in Electrical Engineering
Volume 34
For other titles published in this series, go to
www.springer.com/series/7818
Srinivasan Murali
Dr. Srinivasan Murali
INF 331, Station 14, EPFL
1015 Lausanne
Switzerland
srinivasan.murali@epfl.ch
ISBN 978-1-4020-9756-0 e-ISBN 978-1-4020-9757-7
DOI 10.1007/978-1-4020-9757-7
Library of Congress Control Number: 2008944292
© 2009 Springer Science+Business Media B.V.
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by
any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written
permission from the Publisher, with the exception of any material supplied specifically for the purpose
of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
Printed on acid-free paper
springer.com
Preface
The complexity of Multiprocessor Systems on Chips (MPSoCs) is growing rapidly
with the advances in semiconductor technology. The number of processors,
hardware cores, and memories on a single chip is increasing, and a highly
scalable communication infrastructure is required to connect them. To
effectively tackle the interconnect complexity of current and future MPSoCs, a
communication-centric design approach, Networks on Chips (NoCs), has recently
emerged. NoCs bring the networking principles for data transfer, such as those
used in large area networks (e.g., the Internet), to the on-chip domain.
Developing NoC-based systems that are tailored to a particular application
domain and satisfy the application performance constraints with minimum
power-area overhead is a major challenge. With technology scaling, as the
geometries of on-chip devices reach the physical limits of operation, another
important design challenge for NoCs will be to provide dynamic (run-time)
support against permanent and intermittent faults that can occur in the system.
The purpose of this book is to provide state-of-the-art methods to solve some of
the most important and time-intensive problems encountered during NoC design.
We present methods for topology synthesis, mapping of cores onto NoC topologies,
crossbar sizing, route generation, resource reservation, achieving fault
tolerance, and RTL code and layout generation. We show how the different design
methods can be integrated to make a complete tool flow for designing reliable
and efficient NoCs for application-specific MPSoCs and chip multiprocessors. To
reduce design respins and shorten time-to-market, we show how the architectural
synthesis models can be integrated with back-end physical design tools and
models, thereby bridging a big design gap in on-chip interconnect synthesis.
Key features of the book:
• Presents in depth the state-of-the-art algorithms and optimization models for
performing system-level design of NoCs
• Presents an integrated flow to design interconnect architectures that can lead to
faster time-to-market and design closure
• Shows the evolution of design methods from complex crossbar-based buses to NoCs
• Presents static and run-time methods for achieving reliable operation of the NoC
and the entire system
This book should be of interest to:
• System-level architects and designers: The methods show how to improve design
productivity and achieve design closure of SoCs.
• Communication architecture/interconnect designers: The methods show trade-off
analysis and explorations of NoCs.
• Design automation engineers: The high-level synthesis methods and mathematical
models presented in this book can be applied to solve several communication
architecture issues. They are also of general interest to designers working in
related fields, such as sensor, body-area, and automotive networks.
This book is based on my Ph.D. research work done at Stanford University. I am
greatly indebted to my adviser Prof. Giovanni De Micheli and co-adviser Prof. Luca
Benini (University of Bologna), as they were instrumental in shaping the ideas
presented here. The work is a result of collaboration with many researchers. I thank all
my collaborators: Dr. Federico Angiolini and Antonio Pullini of iNoCs, Prof. David
Atienza (EPFL), Dr. Kees Goossens and his team (Dr. Andrei Radulescu, Martijn
Coenen, Andreas Hansson) at NXP Research, Prof. Davide Bertozzi (University
of Ferrara), Rutuparna Tamhankar (Marvell Technology), Prof. N. VijayKrishnan,
Prof. Mary Jane Irvin and Dr. Theocharis Theocharides at Pennsylvania State
University, Prof. Salvatore Carta, Paolo Meloni and Prof. Luigi Raffo of University of
Cagliari for their contributions to this work.
EPFL, Lausanne, Switzerland
Srinivasan Murali
Contents
Preface
1 Introduction
1.1 Networks on Chips: Scalable Interconnects for SoCs
1.2 NoC Design Challenges
1.3 Book Overview
1.3.1 NoC Design Methods
1.3.2 NoC Reliability Mechanisms
1.4 Related Work
1.4.1 NoC Architectures and Design Methods
1.4.2 Reliability Support for NoCs
Part I NoC Design Methods
2 Designing Crossbar Based Systems
2.1 Problem Motivation and Application Traffic Analysis
2.1.1 Problem Motivation
2.1.2 Application Traffic Analysis
2.2 Design Methodology
2.3 Exact Approach to Crossbar Synthesis
2.3.1 Problem Formulation
2.3.2 Exact Crossbar Synthesis Algorithm
2.4 Heuristic Approach to Crossbar Synthesis
2.5 Experiments and Case Studies
2.5.1 Experimental Platform and Power Models
2.5.2 Application Benchmark Analysis
2.5.3 Comparisons of Heuristic Engine with the Exact Engine
2.5.4 Window Sizing
2.5.5 Real-Time Streams & Effect of Binding
2.5.6 Overlap Threshold Setting
2.6 Summary
3 Netchip Tool Flow for NoC Design
3.1 Front-End Design Phase
3.2 Architectural Design Phase: The ×pipes NoC Library
3.3 Summary
4 Designing Standard Topologies
4.1 On-Chip Traffic Modeling
4.2 Problem Formulation
4.3 Mapping and Physical Planning Algorithm
4.4 Physical Planning
4.5 Experiments and Case Studies
4.5.1 Effect of Physical Planning
4.5.2 Design for QoS Guarantees
4.5.3 VOPD Design
4.5.4 Buffer Sizing and Network Optimization
4.6 Summary
5 Designing Custom Topologies
5.1 Objectives
5.1.1 Background on NoC Topology Synthesis
5.1.2 Background on Deadlock-Free NoC Design
5.2 Input Models
5.2.1 Area, Power Models
5.2.2 Traffic Models
5.3 Design Algorithms
5.4 Experiments and Case Studies
5.4.1 Experiments on MPSoC Benchmarks
5.4.2 Layout-Level Comparisons
5.4.3 Impact of Frequency Constraints
5.4.4 Handling Dynamic Effects
5.5 Summary
6 Supporting Multiple Applications
6.1 The Æthereal NoC Architecture
6.1.1 Switch/NI Architecture
6.1.2 Dynamic NoC Reconfiguration
6.2 Design Methodology
6.3 Use-Case Preprocessing
6.4 Unified Mapping–NoC Configuration
6.5 Simulation Results
6.5.1 Experimental Benchmarks
6.5.2 Effect of Mapping for SoC Benchmarks
6.5.3 Frequency-Area Trade-offs
6.5.4 Dynamic Configuration
6.5.5 Parallel Use-Cases
6.6 Summary
7 Supporting Dynamic Application Patterns
7.1 NoC Design Challenges for CMPs
7.2 Basics of the Synthesis Approach
7.3 Design Flow
7.4 Problem Formulation
7.5 Synthesis Algorithm
7.5.1 NoC Link Sizing
7.5.2 Timing Feasibility Check
7.5.3 Algorithm Run-Time
7.6 Experimental Results
7.6.1 Experiments on a Mesh Topology
7.6.2 Effect of Core Injection Rates
7.6.3 Effect of Different NoC Sizes
7.6.4 Effect of Link Length
7.6.5 Application to Torus Topology
7.6.6 Validating Design Flow Predictability
7.7 Summary
Part II NoC Reliability Mechanisms
8 Timing-Error Tolerant NoC Design
8.1 The Double Sampling Technique
8.2 Using Links as a Storage Medium
8.3 T-error Link Designs
8.3.1 Scheme 1: Low overhead T-error Links
8.3.2 Scheme 2: High-Performance T-error Links
8.4 Aggressive Switch/NI Design
8.4.1 Output Buffer Changes
8.4.2 Input Buffer Changes
8.5 Dynamic Configuration of the NoC
8.6 Experimental Results
8.6.1 Simulation Platform
8.6.2 Experiments on a Multi-Media Benchmark
8.6.3 Effect of Application-Level Power Management
8.6.4 Experiments on Other Benchmarks
8.6.5 Effect of NoC Configuration
8.6.6 Choice of Link Design Schemes
8.6.7 Synthesis Results
8.7 Summary
9 Analysis of NoC Error Recovery Schemes
9.1 Switch Architecture Design
9.1.1 End-to-End Error Detection
9.1.2 Switch-to-Switch Error Detection
9.1.3 Hybrid Single Error Correcting, Multiple Error Detecting Scheme
9.2 Energy Estimation and Models
9.2.1 Energy Estimation
9.2.2 Error Models
9.3 Experiments and Simulation Results
9.3.1 Power Consumption of Schemes for Fixed Residual Error Rates
9.3.2 Performance Comparison of Reliability Schemes
9.3.3 Power Consumption Overhead of Reliability Schemes
9.3.4 Effect of Buffering Requirements, Traffic Patterns and Packet Size
9.4 Summary
10 Fault-Tolerant Route Generation
10.1 Multi-Path Routing with In-Order Delivery
10.2 Path Selection Algorithm
10.3 Multipath Traffic Splitting
10.4 Fault-Tolerance Support with Multipath Routing
10.4.1 Resilience Against Transient Errors
10.4.2 Resilience Against Permanent Errors
10.5 Simulation Results
10.5.1 Area, Power and Timing Overhead
10.5.2 Case Study: MPEG Decoder
10.5.3 Comparisons with Single-Path Routing
10.5.4 Effect of Fault-Tolerance Support
10.6 Summary
11 NoC Support for Reliable On-Chip Memories
11.1 Analysis of Multimedia Software
11.2 Baseline SoC Architecture and Extensions
11.2.1 SoC Template Architecture
11.2.2 Proposed Hardware Extensions
11.3 Run-Time Fault Tolerant Schemes
11.3.1 Permanent Error Recovery Support
11.3.2 Intermittent Error Recovery Support
11.4 Experimental Results
11.4.1 Performance Studies
11.4.2 Architectural Exploration of NoC Features
11.4.3 Effects of Varying Percentages of Critical Data
11.4.4 Synthesis Results
11.5 Summary
12 Conclusions and Future Directions
12.1 Putting It All Together
Bibliography