Demand-Driven Pointer Analysis with Strong Updates via Value-Flow
Refinement
Yulei Sui, School of Computer Science and Engineering, UNSW Australia
Jingling Xue, School of Computer Science and Engineering, UNSW Australia
We present a new demand-driven flow- and context-sensitive pointer analysis with strong updates for C
programs, called SUPA, that enables computing points-to information via value-flow refinement, in
environments with small time and memory budgets such as IDEs. We formulate SUPA by solving a
graph-reachability problem on an inter-procedural value-flow graph representing a program's def-use
chains, which are pre-computed efficiently but over-approximately. To answer a client query (a request
for a variable's points-to set), SUPA reasons about the flow of values along the pre-computed def-use
chains sparsely (rather than across all program points), by performing only the work necessary for the
query (rather than analyzing the whole program). In particular, strong updates are performed to filter
out spurious def-use chains through value-flow refinement as long as the total budget is not exhausted.
SUPA facilitates efficiency and precision tradeoffs by applying different pointer analyses in a hybrid
multi-stage analysis framework.
We have implemented SUPA in LLVM (3.5.0) and evaluate it by choosing uninitialized pointer detection
as a major client on 18 open-source C programs. As the analysis budget increases, SUPA achieves improved
precision, with its single-stage flow-sensitive analysis reaching 97.4% of that achieved by whole-program
flow-sensitive analysis by consuming about 0.18 seconds and 65KB of memory per query, on average (with a
budget of at most 10000 value-flow edges per query). With context-sensitivity also considered, SUPA's
two-stage analysis becomes more precise for some programs but also incurs more analysis time. SUPA is
also amenable to parallelization. A parallel implementation of its single-stage flow-sensitive analysis
achieves a speedup of up to 6.9x, with an average of 3.05x, on an 8-core machine with respect to its
sequential version.
CCS Concepts: • Software and its engineering → Software verification and validation; Software defect
analysis; • Theory of computation → Program analysis;
Additional Key Words and Phrases: strong updates, value flow, pointer analysis, flow sensitivity
1. INTRODUCTION
Pointer analysis is one of the most fundamental static program analyses, on which virtually all others
are built. The goal of pointer analysis is to compute an approximation of the set of abstract objects
that a pointer can refer to. A pointer analysis is (1) flow-sensitive if it respects control flow and
flow-insensitive otherwise and (2) context-sensitive if it distinguishes different calling contexts and
context-insensitive otherwise.
Strong updates, where stores overwrite, i.e., kill the previous contents of their abstract destination
objects with new values, are an important factor in the precision of pointer analysis [Hardekopf and
Lin 2009; Lhoták and Chung 2011]. In the case of weak updates, these objects are assumed conservatively
to also retain their old contents. Strong updates are possible only if flow-sensitivity is maintained.
In addition, a flow-sensitive analysis can strongly update an abstract object written at a store if
and only if that object has exactly one concrete memory address, known as a singleton. By applying
strong updates where needed, a pointer analysis can improve precision, thereby providing significant
benefits to many clients, such as change impact analysis [Acharya and Robinson 2011], bug detection
[Yan et al. 2016; Ye et al. 2014a], security analysis [Arzt et al. 2014], type state verification
[Fink et al. 2008], compiler optimization [Sui et al. 2016b, 2013, 2014b], and symbolic execution
[Blackshear et al. 2013].
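The difference between a strong and a weak update can be sketched as a transfer function for a store `*p = q`. This is a minimal illustration, not the paper's implementation: the function name `store`, the dictionary encoding of points-to sets, and the `singletons` parameter are all hypothetical, and the sketch assumes the usual condition that a strong update needs the pointer to have exactly one target and that target to be a singleton.

```python
def store(pts, targets, new_values, singletons):
    """Apply the store `*p = q`, where `targets` is pts(p) and `new_values` is pts(q).

    `pts` maps abstract objects to their points-to sets; `singletons` is the set
    of abstract objects known to stand for exactly one concrete memory address.
    A new state is returned and the input state is left untouched.
    """
    out = {obj: set(vals) for obj, vals in pts.items()}
    if len(targets) == 1 and next(iter(targets)) in singletons:
        # Strong update: the only possible destination is overwritten (old contents killed).
        (obj,) = targets
        out[obj] = set(new_values)
    else:
        # Weak update: every possible destination also retains its old contents.
        for obj in targets:
            out.setdefault(obj, set()).update(new_values)
    return out


state = {"a": {"b"}, "c": {"d"}}
# p points only to the singleton a: a's old target b is killed.
strong = store(state, {"a"}, {"d"}, singletons={"a", "c"})
# p may point to a or c: both keep their old targets and may gain d.
weak = store(state, {"a", "c"}, {"d"}, singletons={"a", "c"})
```

In the strong case `a` ends up pointing only to `d`; in the weak case `a` points to both `b` and `d`, which is the source of the imprecision discussed above.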
In this paper, we introduce a demand-driven pointer analysis for C by investigating how to perform
strong updates effectively in a flow- and context-sensitive framework. For C programs, flow-sensitivity
is important in achieving the precision required by the aforementioned client applications due to the
strong updates performed. If context-sensitivity is also considered, some more strong updates are
possible for some programs at the expense of more analysis time. For object-oriented languages like
Java, context-sensitivity (without strong updates) is widely used in achieving useful precision
[Lhoták and Hendren 2003; Li et al. 2014; Milanova et al. 2002, 2005; Smaragdakis et al. 2011; Sun et
al. 2011; Xiao and Zhang 2011].
Ideally, strong updates at stores should be performed by analyzing all paths independently by solving
a meet-over-all-paths (MOP) problem. However, even with branch conditions being ignored, this problem
is intractable due to the potentially unbounded number of paths that must be analyzed [Landi 1992;
Ramalingam 1994].
Instead, traditional flow-sensitive pointer analysis (FS) for C [Hind and Pioli 1998; Kam and Ullman
1977] computes the maximal-fixed-point solution (MFP) as an over-approximation of MOP by solving an
iterative data-flow problem. Thus, the data-flow facts that reach a confluence point along different
paths are merged. Improving on this, sparse flow-sensitive pointer analysis (SFS) [Hardekopf and Lin
2011; Li et al. 2011; Oh et al. 2012; Ye et al. 2014b; Yu et al. 2010] boosts the performance of FS in
analyzing large C programs while maintaining the same strong updates done by FS. The basic idea is to
first conduct a pre-analysis on the program to over-approximate its def-use chains and then perform FS
by propagating the data-flow facts, i.e., points-to information, sparsely along only the pre-computed
def-use chains (aka value-flows) instead of all program points in the program's control-flow graph (CFG).
Recently, an approach [Lhoták and Chung 2011] for performing strong updates in C programs was
introduced. It sacrifices the precision of FS to gain efficiency by applying strong updates at stores
where flow-sensitive singleton points-to sets are available but falls back to the flow-insensitive
points-to information otherwise.
By nature, the challenge of pointer analysis is to make judicious tradeoffs between efficiency and
precision. Virtually all of the prior analyses for C that consider some degree of flow-sensitivity are
whole-program analyses. Precise ones are unscalable since they must typically consider both flow- and
context-sensitivity (FSCS) in order to maximize the number of strong updates performed. In contrast,
faster ones like [Lhoták and Chung 2011] are less precise, due to both missing strong updates and
propagating the points-to information flow-insensitively across the weakly-updated locations.
In practice, a client application of a pointer analysis may require only parts of the program to be
analyzed. In addition, some points-to queries may demand precise answers while others can be answered
as precisely as possible with small time and memory budgets. In all these cases, performing strong
updates blindly across the entire program is cost-ineffective in achieving precision.
For C programs, how do we develop precise and efficient pointer analyses that are focused and partial,
paying closer attention to the parts of the programs relevant to on-demand queries? Demand-driven
analyses for C [Heintze and Tardieu 2001; Zhang et al. 2014a; Zheng and Rugina 2008] and Java [Lu et
al. 2013; Shang et al. 2012; Sridharan and Bodík 2006; Su et al. 2016; Yan et al. 2011] are
flow-insensitive and thus cannot perform strong updates to produce the precision needed by some
clients. BOOMERANG [Späth et al. 2016] represents a recent flow- and context-sensitive demand-driven
pointer analysis for Java. However, its access-path-based approach performs strong updates at a store
a.f = ... only partially, by updating a.f strongly and the aliases of a.f.* weakly. Elsewhere, advances
in whole-program flow-sensitive analysis for C have exploited some form of sparsity to improve
performance [Hardekopf and Lin 2011; Li et al. 2011; Oh et al. 2012; Ye et al. 2014b; Yu et al. 2010].
However, how to replicate this success for demand-driven flow-sensitive analysis for C is unclear.
Finally, it remains open as to whether sparse strong update analysis can be performed both flow- and
context-sensitively on-demand to avoid under- or over-analyzing.
In this paper, we introduce SUPA, the first demand-driven pointer analysis with strong updates for C,
designed to support flexible yet effective tradeoffs between efficiency and precision in answering
client queries, in environments with small time and memory budgets such as IDEs. As shown in Figure 1,
the novelty behind SUPA lies in performing Strong UPdate Analysis precisely by refining imprecisely
pre-computed value-flows away in a hybrid multi-stage analysis framework. Given a points-to query,
strong updates are performed by solving a graph-reachability problem on an inter-procedural value-flow
graph that captures the def-use chains of the program obtained conservatively by a pre-analysis. Such
over-approximated value-flows can be obtained by applying Andersen's analysis [Andersen 1994] (flow-
and context-insensitively). SUPA conducts its reachability analysis on-demand sparsely along only the
pre-computed value-flows rather than control-flows. In addition, SUPA filters out imprecise value-flows
by performing strong updates flow- and context-sensitively where needed with no loss of precision as
long as the total analysis budget is sufficient. The precision of SUPA depends on the degree of
value-flow refinement performed under a budget. The more spurious value-flows SUPA removes, the more
precise the points-to facts are.

Fig. 1: Overview of SUPA (diagram components: Program, Pre-analysis, Value-Flows, On-Demand Queries,
Budgets, Reachability Solver running Stage[i], Stages Stage[0] ... Stage[N-1] ordered by efficiency
and precision, Out-Of-Budget[i] check with i++, Refine).

SUPA handles large C programs by staging analyses in increasing efficiency but decreasing precision in
a hybrid manner. Currently, SUPA proceeds in two stages by applying FSCS and FS in that order with a
configurable budget for each analysis. When failing to answer a query in a stage within its allotted
budget, SUPA downgrades itself to a more scalable but less precise analysis in the next stage and will
eventually fall back to the pre-computed flow-insensitive information. At each stage, SUPA will
re-answer the query by reusing the points-to information found from processing the current and earlier
queries. By increasing the budgets used in the earlier stages (e.g., FSCS), SUPA will obtain improved
precision via improved value-flow refinement.
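The staged fallback described above can be sketched as a simple driver loop. This is only an illustration under stated assumptions: the names `OutOfBudget`, `answer_query` and `make_stage` are hypothetical, each stage is modeled as a toy function with a fixed cost, and the reuse of points-to information across queries is not modeled.

```python
class OutOfBudget(Exception):
    """Raised by a stage that cannot answer a query within its budget."""


def answer_query(query, stages, fallback):
    """Try each stage in order (most precise first); fall back to the pre-analysis."""
    for name, analyse, budget in stages:
        try:
            return name, analyse(query, budget)
        except OutOfBudget:
            continue  # downgrade to the next, more scalable stage
    return "pre-analysis", fallback(query)


def make_stage(cost, result):
    """Build a toy stage that needs `cost` value-flow edges to produce `result`."""
    def analyse(query, budget):
        if cost > budget:
            raise OutOfBudget(query)
        return result
    return analyse


# Hypothetical costs: FSCS exceeds its budget for this query, FS does not.
stages = [
    ("FSCS", make_stage(cost=50_000, result={"i"}), 10_000),
    ("FS", make_stage(cost=4_000, result={"i"}), 10_000),
]
flow_insensitive = lambda query: {"i", "u"}  # stand-in for the pre-computed result

answer = answer_query("z", stages, flow_insensitive)
# With every stage out of budget, the flow-insensitive result is returned.
exhausted = answer_query("z", [("FS", make_stage(20_000, {"i"}), 10_000)], flow_insensitive)
```

Here `answer` comes from the FS stage, while `exhausted` shows the final fallback to the less precise pre-computed information.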
In summary, this paper makes the following contributions:
— We present the first demand-driven flow- and context-sensitive pointer analysis with strong updates
for C that enables computing precise points-to information by refining away imprecisely precomputed
value-flows, subject to analysis budgets.
— We introduce a hybrid multi-stage analysis framework that facilitates efficiency and precision
tradeoffs by staging different analyses in answering client queries.
— We have produced an implementation of SUPA in LLVM (3.5.0) [SUPA 2016]. We evaluate SUPA with
uninitialized pointer detection as a practical client by using a total of 18 open-source C programs.
As the analysis budget increases, SUPA achieves improved precision, with its single-stage
flow-sensitive analysis reaching 97.4% of that achieved by whole-program flow-sensitive analysis, by
consuming about 0.18 seconds and 65KB of memory per query, on average (with a per-query budget of at
most 10000 value-flow edges traversed). With context-sensitivity also being considered, more strong
updates are also possible. SUPA's two-stage analysis then becomes more precise for some programs at
the expense of more analysis time.
— We present four case studies to demonstrate that SUPA is effective in checking whether variables
are initialized or not in real-world applications.
— We show that SUPA is amenable to parallelization. To demonstrate this, we have developed a parallel
implementation of SUPA's single-stage flow-sensitive analysis based on Intel Threading Building Blocks
(TBB), achieving a speedup of up to 6.9x, with an average of 3.05x, on an 8-core machine over its
sequential version.
The rest of this paper is organized as follows. Section 2 provides the background information.
Section 3 presents a motivating example. Section 4 introduces our formalism for SUPA. Section 5
discusses and analyzes our experimental results. Section 6 contains four case studies. Section 7
describes a parallel implementation of SUPA. Section 8 describes the related work. Finally, Section 9
concludes the paper.
2. BACKGROUND
We describe how to represent a C program by an interprocedural sparse value-flow graph to enable
demand-driven pointer analysis via value-flow refinement. Section 2.1 introduces the part of LLVM-IR
relevant to pointer analysis. Section 2.2 describes how to put top-level variables in SSA form.
Section 2.3 describes how to put address-taken variables in SSA form. Section 2.4 constructs a sparse
value-flow graph that represents the def-use chains for both top-level and address-taken variables in
the program.
2.1. LLVM-IR
We perform pointer analysis in the LLVM-IR of a program, as in [Balatsouras and Smaragdakis 2016;
Hardekopf and Lin 2011; Lhoták and Chung 2011; Li et al. 2011; Sui et al. 2012; Ye et al. 2014b]. The
domains and the LLVM instructions relevant to pointer analysis are given in Table I. The set of all
variables $V$ is separated into two subsets, $O$ that contains all possible abstract objects, i.e.,
address-taken variables of a pointer, and $P$ that contains all top-level variables.
In LLVM-IR, top-level variables in $P = S \cup G$, including stack virtual registers (symbols starting
with "%") and global variables (symbols starting with "@"), are explicit, i.e., directly accessed.
Address-taken variables in $O$ are implicit, i.e., accessed indirectly at LLVM's load or store
instructions via top-level variables.
Only the subset of the complete LLVM instruction set that is relevant to pointer analysis is modeled.
As in Table I, every function $f$ of a program contains nine types of instructions (statements),
including seven types of instructions used in the function body of $f$, one FUNENTRY instruction
$f(r_1, \dots, r_n)$ with the declarations of the parameters of $f$, and one FUNEXIT instruction
$ret_f\ p$ as the unique return of $f$. Note that the LLVM pass UnifyFunctionExitNodes is executed
before pointer analysis in order to ensure that every function has only one FUNEXIT instruction.
Let us go through the seven types of instructions used inside a function. For an ADDROF instruction
$p = \&o$, known as an allocation site, $o$ is one of the following objects: (1) a stack object,
$o_\ell$, where $\ell$ is its allocation site (via an LLVM alloca instruction), (2) a global object,
i.e., either a global object $o_\ell$, where $\ell$ is its allocation site, or a program function
$o_f$, where $f$ is its name, and (3) a dynamically created heap object $o^h_\ell$, where $\ell$ is its
heap allocation site (e.g., via a malloc() call). For each object $o$ (except for a function), we write
$o_{fld}$ to represent the sub-object that corresponds to its field $fld$. For flow-sensitive pointer
analysis, the initializations for global objects take place at the entry of main().
Table I: Domains and LLVM instructions used by pointer analysis.

Analysis Domains
  $\ell \in L$                        instruction labels
  $fld \in C$                         constants (field accesses)
  $s \in S$                           stack virtual registers
  $g \in G$                           global variables
  $f \in F \subseteq G$               program functions
  $p, q, r, x, y \in P = S \cup G$    top-level variables
  $o, a, b, c, d \in O$               address-taken variables
  $v \in V = P \cup O$                program variables

LLVM Instruction Set
  ADDROF      $p = \&o$
  COPY        $p = q$
  PHI         $p = \phi(q, r)$
  FIELD       $p = \&q \rightarrow fld$
  LOAD        $p = *q$
  STORE       $*p = q$
  CALL        $p = q(r_1, \dots, r_n)$
  FUNENTRY    $f(r_1, \dots, r_n)$
  FUNEXIT     $ret_f\ p$

COPY denotes a casting instruction (e.g., bitcast) in LLVM. PHI is a standard SSA instruction
introduced at a confluence point in the CFG to select the value of a variable from different
control-flow branches. LOAD (STORE) is a memory accessing instruction that reads (writes) a value from
(into) an address-taken object.
Our handling of field-sensitivity is ANSI-compliant. Given a pointer to an aggregate (e.g., a struct
or an array), pointer arithmetic used for accessing anything other than the aggregate itself has
undefined behavior [ISO90 1990; Pearce et al. 2007] and is thus not modeled. To model the field
accesses of a struct object, FIELD represents a getelementptr instruction with its field offset $fld$
as a constant value. A getelementptr instruction that operates on a non-constant field of a struct is
modeled conservatively as COPY instructions, one for every field of the struct. Arrays are treated
monolithically.
CALL, $p = q(r_1, \dots, r_n)$, denotes a call instruction, where $q$ can be either a global variable
(for a direct call) or a stack virtual register (for an indirect call).
2.2. SSA Form for Top-Level Variables
LLVM-IR is known as a partial SSA form since only top-level variables are in SSA form. In LLVM-IR,
top-level variables are explicit, i.e., directly accessed, and can thus be put in SSA form by using a
standard SSA construction algorithm [Cytron et al. 1991] (with PHI instructions inserted at confluence
points).

  (a) C code          (b) Partial SSA
  p = &a;             p = &a;
  q = &c;             q = &c;
  a = &b;             x = &b;
  c = &d;             y = &d;
  t1 = *p;            *p = x;
  *p = *q;            *q = y;
  *q = t1;            t1 = *p;
                      t2 = *q;
                      *p = t2;
                      *q = t1;

  In both listings the statements from t1 = *p onwards form the swap.
  Points-to relations for p and q observed at runtime:
  (c) Before swap: p → a → b and q → c → d.
  (d) After swap:  p → a → d and q → c → b.

Fig. 2: A swap example and its partial SSA form.

Let us illustrate LLVM's partial SSA form by using an example in Figure 2. Figure 2(a) shows a swap
program in C and Figure 2(b) gives its corresponding partial SSA form. Figures 2(c) and (d) depict some
(runtime) points-to relations before and after the swap operation. In this example, we have
$p, q, x, y, t1, t2 \in P$ and $a, b, c, d \in O$. Note that x, y, t1 and t2 are new temporary
registers introduced in order to put the program given in Figure 2(a) into the partial SSA form given
in Figure 2(b). In particular, $*p = *q$ is decomposed into $t2 = *q$ and $*p = t2$, where t2 is a
top-level pointer.
In LLVM-IR, all top-level variables are in SSA form. In this example, all top-level variables are
trivially in SSA form, as each has exactly one definition. As a result, the def-use chains for
top-level variables are readily available.
However, address-taken variables are accessed indirectly at loads and stores via top-level variables
and are thus not in SSA form. For example, the address-taken variable a is defined implicitly twice,
once at $*p = x$ and once at $*p = t2$, and the address-taken variable c is also defined implicitly
twice, once at $*q = y$ and once at $*q = t1$. As a result, the def-use chains for address-taken
variables are not immediately available.
2.3. SSA Form for Address-Taken Variables
Starting with LLVM's partial SSA form, we first perform a pre-analysis by using Andersen's algorithm
flow- and context-insensitively [Andersen 1994], implemented in SVF [Sui and Xue 2016]. We then put
address-taken variables in memory SSA form, by using the SSA construction algorithm [Cytron et al.
1991]. Imprecise points-to information computed this way will be refined by our demand-driven pointer
analysis.
Given a variable $v$, $AnderPts(v)$ represents its points-to set computed by Andersen's algorithm.
There are two steps [Sui et al. 2014a], illustrated in Figures 3(a) and (b) intraprocedurally and in
Figures 4(a) and (b) interprocedurally.
Step 1: Computing Modification and Reference Side-Effects. As shown in Figure 3(a), every load, e.g.,
$t1 = *q$, is annotated with a $\mu(a)$ operator for each object $a$ pointed to by $q$, i.e.,
$a \in AnderPts(q)$, to represent a potential use of $a$ at the load. Similarly, every store, e.g.,
$*p = x$, is annotated with an $a = \chi(a)$ operator for each object $a \in AnderPts(p)$ to represent
a potential def and use of $a$ at the store. If $a$ can be strongly updated, then $a$ receives whatever
$x$ points to and the old contents in $a$ are killed. Otherwise, $a$ must also incorporate its old
contents, resulting in a weak update to $a$.
We compute the side-effects of a function call by applying a lightweight interprocedural mod-ref
analysis [Sui et al. 2014a, §4.2.1]. A given callsite $\ell$ is annotated with $\mu(a)$
($a = \chi(a)$) if $a$ may be read (modified) inside the callees of $\ell$ (discovered by Andersen's
pointer analysis). In addition, appropriate $\chi$ and $\mu$ operators are also added for the FUNENTRY
and FUNEXIT instructions of these callees in order to mimic passing parameters and returning results
for address-taken variables.
Figure 4(a) gives an example modified from Figure 3(a) by moving the four swap instructions into a
function, swap. For read side-effects, $\mu(a)$ and $\mu(c)$ are added before callsite $\ell_7$ to
represent the potential uses of a and c in swap. Correspondingly, swap's FUNENTRY instruction $\ell_8$
is annotated with $a = \chi(a)$ and $c = \chi(c)$ to receive the values of a and c passed from
$\ell_7$. For modification side-effects, $a = \chi(a)$ and $c = \chi(c)$ are added after $\ell_7$ to
receive the potentially modified values of a and c returned from swap's FUNEXIT instruction
$\ell_{13}$, which is annotated with $\mu(a)$ and $\mu(c)$.
Step 2: Memory SSA Renaming. All the address-taken variables are converted into SSA form as suggested
in [Chow et al. 1996]. Every $\mu(a)$ is treated as a use of $a$. Every $a = \chi(a)$ is treated as
both a def and use of $a$, as $a$ may admit only a weak update. Then the SSA form for address-taken
variables is obtained by applying a standard SSA construction algorithm [Cytron et al. 1991].
For the program annotated with $\mu$'s and $\chi$'s in Figure 3(a), Figure 3(b) gives its memory SSA
form. Similarly, Figure 4(b) gives the memory SSA form for Figure 4(a).
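The two steps can be sketched for the straight-line loads and stores of the swap example. This is a minimal illustration, assuming code without branches, so that versions can be assigned in one forward pass and no PHI placement is needed (the full construction relies on the standard SSA algorithm cited above). The tuple encoding and the names `annotate` and `rename` are hypothetical.

```python
def annotate(program, ander_pts):
    """Step 1: attach mu (use) and chi (def and use) operators to loads and stores."""
    annotated = []
    for label, kind, lhs, rhs in program:
        # A load `lhs = *rhs` may read every object rhs points to.
        mus = sorted(ander_pts[rhs]) if kind == "load" else []
        # A store `*lhs = rhs` may write every object lhs points to.
        chis = sorted(ander_pts[lhs]) if kind == "store" else []
        annotated.append((label, mus, chis))
    return annotated


def rename(annotated):
    """Step 2: give each address-taken object SSA versions (straight-line code only)."""
    version = {}
    renamed = []
    for label, mus, chis in annotated:
        uses = [(obj, version.get(obj, 0)) for obj in mus]      # mu(obj_i)
        defs = []
        for obj in chis:
            old = version.get(obj, 0)
            version[obj] = old + 1
            defs.append((obj, old + 1, old))                    # obj_{i+1} = chi(obj_i)
        renamed.append((label, uses, defs))
    return renamed


# Loads and stores of the swap example, encoded as (label, kind, lhs, rhs).
program = [
    (5, "store", "p", "x"), (6, "store", "q", "y"),
    (7, "load", "t1", "p"), (8, "load", "t2", "q"),
    (9, "store", "p", "t2"), (10, "store", "q", "t1"),
]
ander_pts = {"p": {"a"}, "q": {"c"}}   # pre-computed by Andersen's analysis

renamed = rename(annotate(program, ander_pts))
```

For this input, statement 5 defines `a1` from `a0`, statement 7 uses `a1`, and statement 9 defines `a2` from `a1`, with `c` versioned analogously.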

  Label  Instruction   (a) Step 1: adding $\mu$'s and $\chi$'s   (b) Step 2: renaming
  ℓ1     p = &a;
  ℓ2     q = &c;
  ℓ3     x = &b;
  ℓ4     y = &d;
  ℓ5     *p = x;       $a = \chi(a)$                              $a_1 = \chi(a_0)$
  ℓ6     *q = y;       $c = \chi(c)$                              $c_1 = \chi(c_0)$
  ℓ7     t1 = *p;      $\mu(a)$                                   $\mu(a_1)$
  ℓ8     t2 = *q;      $\mu(c)$                                   $\mu(c_1)$
  ℓ9     *p = t2;      $a = \chi(a)$                              $a_2 = \chi(a_1)$
  ℓ10    *q = t1;      $c = \chi(c)$                              $c_2 = \chi(c_1)$

  Statements ℓ7 to ℓ10 form the swap.
  (c) Sparse value-flows of a and c (edges labeled [a] and [c]).

Fig. 3: Memory SSA form and sparse value-flows constructed intraprocedurally for Figure 2, obtained
with Andersen's analysis: $AnderPts(p) = \{a\}$ and $AnderPts(q) = \{c\}$.


  Function  Label  Instruction    (a) Step 1: adding $\mu$'s and $\chi$'s          (b) Step 2: renaming
  foo       ℓ1     p = &a;
  foo       ℓ2     q = &c;
  foo       ℓ3     x = &b;
  foo       ℓ4     y = &d;
  foo       ℓ5     *p = x;        $a = \chi(a)$                                    $a_1 = \chi(a_0)$
  foo       ℓ6     *q = y;        $c = \chi(c)$                                    $c_1 = \chi(c_0)$
  foo       ℓ7     swap(p,q);     before: $\mu(a)$, $\mu(c)$                       before: $\mu(a_1)$, $\mu(c_1)$
                                  after: $a = \chi(a)$, $c = \chi(c)$              after: $a_2 = \chi(a_1)$, $c_2 = \chi(c_1)$
  swap      ℓ8     swap(p,q){     $a = \chi(a)$, $c = \chi(c)$                     $a_1 = \chi(a_0)$, $c_1 = \chi(c_0)$
  swap      ℓ9     t1 = *p;       $\mu(a)$                                         $\mu(a_1)$
  swap      ℓ10    t2 = *q;       $\mu(c)$                                         $\mu(c_1)$
  swap      ℓ11    *p = t2;       $a = \chi(a)$                                    $a_2 = \chi(a_1)$
  swap      ℓ12    *q = t1;       $c = \chi(c)$                                    $c_2 = \chi(c_1)$
  swap      ℓ13    }              $\mu(a)$, $\mu(c)$                               $\mu(a_2)$, $\mu(c_2)$

  (c) Sparse value-flows of a and c (edges labeled [a] and [c]).

Fig. 4: Memory SSA form and sparse value-flows constructed interprocedurally for an example modified
from Figure 2 with its four swap instructions moved into a separate function, called swap. $\ell_8$
and $\ell_{13}$ correspond to the FUNENTRY and FUNEXIT of swap.

2.4. Sparse Value-Flow Graph
Once both top-level and address-taken variables are in SSA form, their def-use chains are immediately
available, as shown in Table II. We discussed top-level variables earlier. For the two address-taken
variables a and c in Figure 2, Figure 3(c) depicts their def-use chains, i.e., sparse value-flows for
the memory SSA form in Figure 3(b). Similarly, Figure 4(c) gives their sparse value-flows for the
memory SSA form in Figure 4(b).
Given a program, a sparse value-flow graph (SVFG), $G_{vfg} = (N, E)$, is a multi-edged directed graph
that captures its def-use chains for both top-level and address-taken variables. $N$ is the set of
nodes representing all instructions and $E$ is the set of edges representing all potential def-use
chains. In particular, an edge $\ell_1 \xrightarrow{v} \ell_2$, where $v \in V$, from statement
$\ell_1$ to statement $\ell_2$ signifies a potential def-use chain for $v$ with its def at $\ell_1$ and
use at $\ell_2$. We refer to $\ell_1 \xrightarrow{v} \ell_2$ as a direct value-flow if $v \in P$ and an
indirect value-flow if $v \in O$. This representation is sparse since the intermediate program points
between $\ell_1$ and $\ell_2$ are omitted, thereby enabling the underlying points-to information to be
gradually refined by applying a sparse demand-driven pointer analysis.

Table II: Def-use information of both top-level and address-taken variables. $Def_v$ ($Use_v$) denotes
the set of definition (use) instructions for a variable $v \in V$.

  Instruction $\ell$                                             Defs and Uses of Variables in Memory SSA Form
  $p = \&o$                                                      $\{\ell\} = Def_p$
  $p = q$                                                        $\{\ell\} = Def_p$, $\ell \in Use_q$
  $p = \phi(q, r)$                                               $\{\ell\} = Def_p$, $\ell \in Use_q$, $\ell \in Use_r$
  $p = \&q \rightarrow fld$                                      $\{\ell\} = Def_p$, $\ell \in Use_q$
  $p = *q$ with $\mu(a_i)$                                       $\{\ell\} = Def_p$, $\ell \in Use_q$, $\ell \in Use_{a_i}$
  $*p = q$ with $a_{i+1} = \chi(a_i)$                            $\ell \in Use_p$, $\ell \in Use_q$, $\ell \in Def_{a_{i+1}}$, $\ell \in Use_{a_i}$
  $p = q(r_1, \dots, r_n)$ with $\mu(a_i)$, $a_{j+1} = \chi(a_j)$   $\{\ell\} = Def_p$, $\ell \in Use_q$, $\forall i \in 1, \dots, n: \ell \in Use_{r_i}$, $\ell \in Use_{a_i}$, $\ell \in Def_{a_{j+1}}$, $\ell \in Use_{a_j}$
  $f(r_1, \dots, r_n)$ with $a_{i+1} = \chi(a_i)$                $\forall i \in 1, \dots, n: \ell \in Def_{r_i}$, $\ell \in Def_{a_{i+1}}$, $\ell \in Use_{a_i}$
  $ret_f\ p$ with $\mu(a_i)$                                     $\ell \in Use_p$, $\ell \in Use_{a_i}$

  [INTRA-TOP]        $\dfrac{\ell \in Def_p \quad \ell' \in Use_p}{\ell \xrightarrow{p} \ell'}$
  [INTRA-ADDR]       $\dfrac{\ell \in Def_{a_i} \quad \ell' \in Use_{a_i}}{\ell \xrightarrow{a} \ell'}$
  [INTER-CALL-TOP]   $\dfrac{\ell: p = q(r_1, \dots, r_n) \quad o_f \in AnderPts(q) \quad \ell': f(r'_1, \dots, r'_n)}{\forall i \in 1, \dots, n: \ell \xrightarrow{r_i} \ell'}$
  [INTER-RET-TOP]    $\dfrac{\ell: p = q(\dots) \quad o_f \in AnderPts(q) \quad \ell': ret_f\ p'}{\ell' \xrightarrow{p} \ell}$
  [INTER-CALL-ADDR]  $\dfrac{\ell: p = q(\dots)\ \mu(a_i) \quad o_f \in AnderPts(q) \quad \ell': f(\dots)\ a_{j+1} = \chi(a_j)}{\ell \xrightarrow{a} \ell'}$
  [INTER-RET-ADDR]   $\dfrac{\ell: \_ = q(\dots)\ a_{j+1} = \chi(a_j) \quad o_f \in AnderPts(q) \quad \ell': ret_f\ \ \mu(a_i)}{\ell' \xrightarrow{a} \ell}$

Fig. 5: Value-flow construction in Memory SSA form.

Figure 5 gives the rules for connecting value-flows between two instructions based on the defs and
uses computed in Table II. For intraprocedural value-flows, [INTRA-TOP] and [INTRA-ADDR] handle
top-level and address-taken variables, respectively. In SSA form, every use of a variable has a unique
definition. For a use of $a$ identified as $a_i$ (with its $i$-th version) at $\ell'$ annotated with
$\mu(a_i)$, its unique definition in SSA form is $a_i$ at an $\ell$ annotated with
$a_i = \chi(a_{i-1})$. Then, $\ell \xrightarrow{a} \ell'$ is generated to represent the potential
value-flow of $a$ from $\ell$ to $\ell'$. Thus, the PHI functions introduced for address-taken
variables will be ignored, as the value $a$ in $\ell \xrightarrow{a} \ell'$ is not versioned.
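The [INTRA-ADDR] rule can be sketched as a lookup from each versioned use to its unique definition. This is an illustrative sketch only: the input encoding (a list of `(label, mu_uses, chi_defs)` tuples transcribed from the memory SSA form of Figure 3(b)) and the function name `intra_addr_edges` are hypothetical. Version 0 has no defining statement in this fragment, so no edge is produced for it.

```python
from collections import defaultdict


def intra_addr_edges(renamed):
    """Connect the statement defining a_i to every statement using a_i.

    Each input entry is (label, [(obj, version)], [(obj, new_version, old_version)]).
    Returned edges are (def_label, obj, use_label); the edge label is the object
    name without its version, as in the rule.
    """
    def_site = {}
    use_sites = defaultdict(list)
    for label, mus, chis in renamed:
        for obj, ver in mus:
            use_sites[(obj, ver)].append(label)          # mu(a_i) uses a_i
        for obj, new, old in chis:
            def_site[(obj, new)] = label                 # a_new = chi(a_old) defines a_new
            use_sites[(obj, old)].append(label)          # ... and also uses a_old
    edges = set()
    for key, sites in use_sites.items():
        if key in def_site:
            for site in sites:
                edges.add((def_site[key], key[0], site))
    return edges


# Memory SSA form of the loads and stores in Figure 3(b).
renamed = [
    (5, [], [("a", 1, 0)]),
    (6, [], [("c", 1, 0)]),
    (7, [("a", 1)], []),
    (8, [("c", 1)], []),
    (9, [], [("a", 2, 1)]),
    (10, [], [("c", 2, 1)]),
]
edges = intra_addr_edges(renamed)
```

The result contains four indirect value-flows, two for `a` (from statement 5 to statements 7 and 9) and two for `c` (from statement 6 to statements 8 and 10). The edge from 5 to 9 arises only because a chi is treated as a use of the old version.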
Let us consider interprocedural value-flows. The def-use information in Table II is only
intraprocedural. According to Figure 5, interprocedural value-flows are constructed to represent
parameter passing for top-level variables ([INTER-CALL-TOP] and [INTER-RET-TOP]), and the
$\mu$/$\chi$ operators annotated at FUNENTRY, FUNEXIT and CALL for address-taken variables
([INTER-CALL-ADDR] and [INTER-RET-ADDR]). [INTER-CALL-TOP] connects the value-flow from an actual
argument $r_i$ at a call instruction $\ell$ to its corresponding formal parameter $r'_i$ at the
FUNENTRY $\ell'$ of every callee $f$ invoked at the call. Conversely, [INTER-RET-TOP] models the
value-flow from the FUNEXIT instruction of $f$ to every callsite where $f$ is invoked. Just like for
top-level variables, [INTER-CALL-ADDR] and [INTER-RET-ADDR] build the value-flows of address-taken
variables across the functions according to the annotated $\mu$'s and $\chi$'s. Note that the versions
$i$ and $j$ of an SSA variable $a$ in different functions may be different. For example, Figure 4(c)
illustrates the four inter-procedural value-flows $\ell_7 \xrightarrow{a} \ell_8$,
$\ell_7 \xrightarrow{c} \ell_8$, $\ell_{13} \xrightarrow{a} \ell_7$ and
$\ell_{13} \xrightarrow{c} \ell_7$ obtained by applying the two rules to Figure 4(b).
The SVFG obtained this way may contain spurious def-use chains, such as
$\ell_5 \xrightarrow{a} \ell_9$ in Figure 3, as Andersen's flow- and context-insensitive pointer
analysis is fast but imprecise. However, this representation allows imprecise points-to information to
be refined by performing sparse whole-program flow-sensitive pointer analysis as in prior work
[Hardekopf and Lin 2011; Nagaraj and Govindarajan 2013; Sui et al. 2016a; Ye et al. 2014b]. In this
paper, we introduce a demand-driven flow- and context-sensitive pointer analysis with strong updates
that can answer points-to queries efficiently and precisely on-demand, by removing spurious def-use
chains in the SVFG iteratively.
3. A MOTIVATING EXAMPLE
Our demand-driven pointer analysis, SUPA, operates on the SVFG of a program. It computes points-to
queries flow- and context-sensitively on-demand by performing strong updates, whenever possible, to
refine away imprecise value-flows in the SVFG.
Our example program, shown in Figure 6(a), is simple (with only 16 lines). The program consists of a
straight-line sequence of code, with $\ell_1$–$\ell_{10}$ taken directly from Figure 2(b) and the six
new statements $\ell_{11}$–$\ell_{16}$ added to enable us to highlight some key properties of SUPA. We
assume that u at $\ell_{11}$ is uninitialized but i at $\ell_{12}$ is initialized. The SVFG embedded in
Figure 6(a) will be referred to shortly below. We describe how SUPA can be used to prove that z at
$\ell_{16}$ points only to the initialized object i, by computing flow-sensitively on-demand the
points-to query $pt(\langle \ell_{16}, z \rangle)$, i.e., the points-to set of z at the program point
after $\ell_{16}$, which is defined in (1) in Section 4.
Figure 6(b) depicts the points-to relations for the six address-taken variables and some top-level ones
found at the end of the code sequence by a whole-program flow-sensitive analysis (with strong updates)
like SFS [Hardekopf and Lin 2011]. Due to flow-sensitivity, multiple solutions for a pointer are
maintained. In this example, these are the true relations observed at the end of program execution.
Note that SFS gives rise to Figure 2(c) by analyzing $\ell_1$–$\ell_6$, Figure 2(d) by analyzing also
$\ell_7$–$\ell_{10}$, and finally, Figure 6(b) by analyzing $\ell_{11}$–$\ell_{16}$ further. As z
points to i but not u, no warning is issued for z, implying that z is regarded as being properly
initialized.
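The flow-sensitive result for z can be reproduced with a small interpreter over the straight-line example. This is a sketch under stated assumptions, not SFS itself: the program is transcribed from the listing in Figure 6(a), every abstract object is assumed to be a singleton, there is no control flow, and the encoding and the name `flow_sensitive` are hypothetical.

```python
def flow_sensitive(program, singletons):
    """Analyze straight-line code in order, applying strong updates where allowed."""
    pts = {}
    for kind, lhs, rhs in program:
        if kind == "addr":          # lhs = &rhs
            pts[lhs] = {rhs}
        elif kind == "load":        # lhs = *rhs
            pts[lhs] = set().union(*(pts.get(obj, set()) for obj in pts.get(rhs, set())))
        elif kind == "store":       # *lhs = rhs
            targets = pts.get(lhs, set())
            values = set(pts.get(rhs, set()))
            if len(targets) == 1 and next(iter(targets)) in singletons:
                (obj,) = targets
                pts[obj] = values                          # strong update kills old contents
            else:
                for obj in targets:
                    pts.setdefault(obj, set()).update(values)  # weak update
    return pts


# Statements 1 to 16 of the example program.
PROGRAM = [
    ("addr", "p", "a"), ("addr", "q", "c"), ("addr", "x", "b"), ("addr", "y", "d"),
    ("store", "p", "x"), ("store", "q", "y"),
    ("load", "t1", "p"), ("load", "t2", "q"),
    ("store", "p", "t2"), ("store", "q", "t1"),
    ("addr", "w", "u"), ("addr", "v", "i"),
    ("load", "t3", "p"),
    ("store", "t3", "w"), ("store", "t3", "v"),
    ("load", "z", "t3"),
]
fs = flow_sensitive(PROGRAM, singletons={"a", "b", "c", "d", "u", "i"})
```

After the swap, a points only to d, so t3 points only to d, and the second store through t3 strongly updates d. As a result z points only to i, matching the discussion above.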
Figure 6(c) shows how the points-to relations in Figure 6(b) are over-approximated flow-insensitively
by applying Andersen's analysis [Andersen 1994]. In this case, a single solution is computed
conservatively for the entire program. Due to the lack of strong updates in analyzing the two stores
performed by swap, the points-to relations in Figures 2(c) and 2(d) are merged, causing $*a$ and $*c$
to become spurious aliases. When $\ell_{11}$–$\ell_{16}$ are analyzed, the seven spurious points-to
relations (shown in dashed arrows in Figure 6(c)) are introduced. Since z points to i (correctly) and
u (spuriously), a false alarm for z will be issued. Failing to consider flow-sensitivity, Andersen's
analysis is not precise for this uninitialized pointer detection client.
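The flow-insensitive result can be reproduced with a naive inclusion-based fixpoint over the same statements. This is a minimal sketch of Andersen-style constraint solving, assuming the program transcribed from Figure 6(a); the encoding and the name `andersen` are hypothetical, and no worklist or cycle-elimination optimizations are attempted.

```python
def andersen(program):
    """Iterate inclusion constraints to a fixpoint, ignoring statement order."""
    pts = {}
    changed = True

    def add(var, values):
        nonlocal changed
        current = pts.setdefault(var, set())
        if not values.issubset(current):
            current |= values
            changed = True

    while changed:
        changed = False
        for kind, lhs, rhs in program:
            if kind == "addr":              # lhs = &rhs: rhs is in pts(lhs)
                add(lhs, {rhs})
            elif kind == "load":            # lhs = *rhs: pts(o) flows into pts(lhs)
                for obj in list(pts.get(rhs, ())):
                    add(lhs, set(pts.get(obj, ())))
            elif kind == "store":           # *lhs = rhs: pts(rhs) flows into pts(o)
                for obj in list(pts.get(lhs, ())):
                    add(obj, set(pts.get(rhs, ())))
    return pts


PROGRAM = [
    ("addr", "p", "a"), ("addr", "q", "c"), ("addr", "x", "b"), ("addr", "y", "d"),
    ("store", "p", "x"), ("store", "q", "y"),
    ("load", "t1", "p"), ("load", "t2", "q"),
    ("store", "p", "t2"), ("store", "q", "t1"),
    ("addr", "w", "u"), ("addr", "v", "i"),
    ("load", "t3", "p"),
    ("store", "t3", "w"), ("store", "t3", "v"),
    ("load", "z", "t3"),
]
fi = andersen(PROGRAM)
```

Because stores never kill old contents here, a and c both point to b and d, t3 inherits both targets, and z ends up pointing to u as well as i, which is the false alarm described above.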

  ℓ1:  p = &a;
  ℓ2:  q = &c;
  ℓ3:  x = &b;
  ℓ4:  y = &d;
  ℓ5:  *p = x;
  ℓ6:  *q = y;
  ℓ7:  t1 = *p;
  ℓ8:  t2 = *q;
  ℓ9:  *p = t2;
  ℓ10: *q = t1;
  ℓ11: w = &u;
  ℓ12: v = &i;
  ℓ13: t3 = *p;
  ℓ14: *t3 = w;
  ℓ15: *t3 = v;
  ℓ16: z = *t3;

  Statements ℓ7 to ℓ10 form the swap.
  (a) A program and its SVFG (with only indirect value-flows shown).
  (b) Flow-sensitive points-to relations found to hold at the end of the program (with some for
      top-level pointers omitted).
  (c) Flow-insensitive points-to relations (with some for top-level pointers omitted).
  (d) The SUPA analysis for resolving $pt(\langle \ell_{16}, z \rangle) = \{i\}$ by traversing from
      $\langle \ell_{16}, z \rangle$ backwards against the value-flows, with strong updates (SU) for
      c, a and d.

Fig. 6: A motivating example for illustrating SUPA (SU stands for "Strong Update").

Let us now explain how SUPA, shown in Figure 1, works. SUPA will first perform a pre-analysis on the
example program to build the SVFG given in Figure 6(a), as discussed in Section 2. For its top-level
variables, their direct value-flows, i.e., def-use chains, are explicit and thus omitted to avoid
clutter. For example, q has three def-use chains $\ell_2 \xrightarrow{q} \ell_6$,
$\ell_2 \xrightarrow{q} \ell_8$ and $\ell_2 \xrightarrow{q} \ell_{10}$. For its address-taken
variables, there are nine indirect value-flows, i.e., def-use chains, depicted in Figure 6(a). Let us
see how the two def-use chains for b are created. As t3 points to b, $\ell_{14}$, $\ell_{15}$ and
$\ell_{16}$ will be annotated with $b = \chi(b)$, $b = \chi(b)$ and $\mu(b)$, respectively. By putting
b in SSA form, these three functions become $b_2 = \chi(b_1)$, $b_3 = \chi(b_2)$ and $\mu(b_3)$. Hence,
we have $\ell_{14} \xrightarrow{b} \ell_{15}$ and $\ell_{15} \xrightarrow{b} \ell_{16}$, indicating
that b at $\ell_{16}$ has two potential definitions, with the one at $\ell_{15}$ overwriting the one at
$\ell_{14}$. The def-use chains for d and a are built similarly.