Assignment
MarkHarman 1
Nicolas Gold 2
Rob Hierons 1
DaveBinkley 3
1
Brunel University
2
UMIST
3
LoyolaCollege
Uxbridge, Middlesex Manchester Baltimore
UB8 3PH, UK. M60 1QD,UK. MD21210-2699, USA.
Keywords: slicing,concept assignment,source code extraction
Abstract
One approach to reverse engineering is to par-
tially automate subc omponent extraction, improve-
mentandsubsequent re c ombination. Twopreviously
proposed automated te chniques for supp ortingthis
activity are slicing and c oncept assignment. How-
ever,neitherisdire ctlyapplicableinisolation;slicing
criteria(setsofprogramvariables)aresimplytoolow
level in many cases, while c oncept assignment typi-
c allyfails toproduceexecutablesubc omponents.
This paper introducesaunication of slicing and
conceptassignmentwhichexploitstheircombinedad-
vantages, while overc omingtheir individual weak-
nesses. Our `c onceptslices' are extracted using
high level criteria, while producingexecutable sub-
programs. The paper introducesthre eways of com-
biningslicingandconceptassignmentandalgorithms
for e ach. The applicationof the conceptslicing al-
gorithmsisillustratedwith acasestudyfromalarge
nancialorganisation.
1 Introduction
For program comprehension and reverse engineer-
ing it is important to haveautomated techniques
for extracting executable subcomponents according
tohigh levelextraction criteria. These components
need to be semantically related to the original (so
that they can be executed in isolation), while the
criteriaforselection may need to identify disparate
sectionsof diverse codewhich will haveto be mar-
riedtogether. Therefore,theproblemistobeableto
automaticallyproduceprogramswhichanswerques-
tionsof the form: Given an original program, con-
struct the simplest program that, for exampleper-
formsthesamemasterleupdateoperationorwhich
closesdownthereactorunderthesameconditions.
Programslicingandconceptassignmentareauto-
matedsourcecodeextractiontechniquesthattakea
criterionandprogramsourcecodeasinputandyield
partsoftheprogram'ssourcecodeasoutput. There-
fore, they suggest themselves as natural candidate
solutionstothisproblem. Slicinghastheadvantage
thattheextractedcodeitproducescanbeexecuted
asaprograminits ownright,butthedisadvantage
thatthecriterionmustbeexpressedatthelowlevel
of program variables. Concept assignment has the
advantagethat the extractioncriterion isexpressed
at just the right level(in termsof concepts such as
`master le', `errorreco very'and`log update'), but
thedisadvantagethatthecodefragmentsitextracts
cannotbecompiled andexecutedasaseparatepro-
gram. Thus,eachtechniqueovercomesthediÆculty
associatedwiththeother.
Thispapershowshowslicingandconceptassign-
mentcanbecombinedtoproducebetterresultsthan
either is capable of individually. The contributions
ofthispapercanbesummarisedasfollows.
AframeworkforcombiningSlicingandConcept
Assignmentisintroduced
Algorithmsareintroduced for
{ ExecutableConceptSlicing
{ KeyStatementAnalysis
{ ConceptDependencyAnalysis
Theapplicationoftheconceptslicingapproach
to reverseengineering isillustratedwith acase
study
Therestofthepaperisorganisedasfollows. Sec-
tion2brieyreviewsslicingandconceptassignment
to makethe paper self-contained.It can safely be
skipped bya reader familiar with both techniques.
Section 3 presents a framework for unifying slic-
ing and concept assignment, suggesting three new
techniques which combine slicing with concept as-
signment. Algorithms for these three techniques:
Analysis(KSA) and Concept Dependency Analysis
(CDA)areintroducedin sections4,5and 6respec-
tively. Section 7presents a case study involvinga
nancialpayment system, which illustrates the use
ofthe concept slicingalgorithms introduced in sec-
tions4,5and6. Section 8concludes andSection 9
givesdirectionsforfuturework.
2 Background
This section providessome background, denitions
and notation for slicing and concept assignment
whichareusedin theremainderofthepaper.
2.1 Slicing
Programslicing[31]isdenedwithrespecttoa`slic-
ing criterion'. Slicing uses dependence analysis to
isolatethose partsof aprogramthat potentiallyaf-
fecttheslicingcriterion.
Traditionally`partsoftheprogramtobeisolated'
havebeen restricted to statements and predicates
and the slicing criterion has been dened in terms
ofaset of variables anda pointat which theirval-
uesare of interest. Morerecentwork hasextended
traditionalslicingby consideringnovelslicing crite-
riainvolvingconditionsandtestadequacyproperties
[5, 15]. The techniquesfor isolation of statements
havealso broadenedfrom statement deletion to al-
lowformoregeneraltransformation[4,12,30].
This paper will be concernedsolely with syntax-
preservingstaticslicing,which willbeused bothto
rene and to extend the results of concept assign-
ment. Inallcases,sliceswillbeconstructedforaset
ofnodesofaprogram'sControlFlowGraph(CFG).
Thismeans that theslicingcriterion will simplybe
asetofnstatementsfs
1
;:::;s
n g.
Denition1(Slice)
A slice of a program p for the slicing criterion
fs
1
;:::;s
n
g is an executable subprogram, s, con-
structed from p bystatement deletion, suchthat
s behavesiden ticallyto p with respect to the se-
quenceofvaluescomputedateachofthestatements
infs
1
;:::;s
n
g. Thesliceofaprogrampw.r.taset
ofstatementsS willbedenoted Slice(p;S).
This denition of a slice is essentiallythe exe-
cutableversionofthedenitionadoptedbytheSys-
tem Dependence Graph approach of Horwitz et al.
[16]. Typically, workon the System Dependence
Graph(SDG)denesittocontainasetof`naluse'
verticesforeachvariable. TheSDGisso-constructed
toguaranteetheexistenceof such avertexforeach
variable. This allows slices to be constructed for a
variableintermsofitsnalusevertex.
FinalUse(p;v)isthenalusevertexofvariablevin
programp.
The dependence graph itself can be useful in
analysingthedistancebetweenaslicenodeandsome
othernodein theslice.
Denition3(SDG Distance)
Givenstatements sand s 0
ofa program p,the dis-
tance,Dist(p;s;s 0
)isthelengthoftheshortestpath
betweensands 0
intheSDGofp. Ifthereisnopath
from sand s 0
in theSDG ofp, thenDist(p;s;s 0
)is
undened.
2.2 Concept Assignment
The concept assignment 1
problem is dened as \a
process of recognising concepts within a computer
program and building up an`understanding' of the
programbyrelatingrecognisedconceptsto portions
oftheprogram,itsoperationalcontextandtoonean-
other[3]." Itcanbeundertakenbyintelligen tagents
(tools),withthreedistinctapproachesbeingadopted
[3]:
1. Highly domain specic, model driven, rule-
based questionanswering systemsthat depend
on a manually populated database describing
the software system.This approach is typied
bytheLassiesystem[7].
2. Plandriven,algorithmicprogramunderstanders
or recognisers. Two examples of this type
are the Programmer's Apprentice [28], and
GRASPR[32].
3. Modeldriven,plausible reasoningsystems. Ex-
amplesofthistypeincludeDM-TAO[3],IRENE
[17],andHB-CA[9,10].
Biggersta et al. claim that systems using ap-
proaches 1 and 2 are good at completely deriv-
ingconceptswithinsmall-scaleprogramsbutcannot
dealwithlarge-scaleprogramsdue tooverwhelming
computationalgrowth. Approach3systemscaneas-
ily handle large-scale programs since their compu-
tational growthappears to be linear in the length
of the programunder analysis but they suer from
approximate andimpreciseresults[3].
Weareconcernedwithplausiblereasoningsystems
(category 3above)and allreferencesto conceptas-
signmentin this papershould betak en asreferring
to this kindof system. Plausible reasoningsystems
are of particular interest because they are scalable
1
Note:conceptassignmentisawhollydierenttechnology
from formalconcept analysis(FCA)(sometimes just called
`conceptanalysis').
levelconcepts than some of the other approaches.
Theassignmentisbasedontheevidenceavailablein
thecodebeinganalysedfrom whicha`bestguess'is
taken; reasoningisthus basedonplausibilityrather
thandeduction. Inadditiontothecommonapplica-
tionofconceptassignmentinhelpingmaintainersto
comprehendprograms, Cimitile et al. [6]havesug-
gested it as a wayof validatingthe adequacy of a
candidate criterion when iden tifyingsuitable mod-
ulesforreuse.
Hypothesis-Based Concept Assignment (HB-CA)
[9, 11] is one of the most recentexamples of a
plausible-reasoningconceptassignmentapproach. It
dealswiththepartoftheconceptassignmentprob-
lemthatinvolvesrelatingrecognisedconceptstopor-
tionsofaprogram. HB-CAusesasimpleknowledge
base to encode the relationships betweenconcepts
andpotentialevidencefortheminsourcecode. Itis
thisapproachthatweproposeto combinewithpro-
gramslicing. Thefollowingdenitionintroducesthe
notationwewilluseto denoteconceptassignment.
Denition4(Concept)
Aconceptc,namedn,ofaprogrampisconstructed
withrespecttoadomainmodelD. Theconceptcon-
sistsofataggedcontiguoussequenceofcodefromp,
forwhichthereisevidence(accordingtoD)thatthe
sequence implements the concept named n. Fora
conceptc,Tag(c)refersto thenameof theconcept
c,whileStatements(c)refersto itsstatements. For
aprogram pand domain model D, Concepts(p;D)
referstothesetofallconceptsassignedtopaccord-
ingto D.
Figure 1 shows a fragment of a domain model
(whichwillbeusedinthecasestudyinSection7). In
adomain model, conceptsare classiedinto actions
and objects and may be composed or specialised.
Eachconcepthasanumberofindicatorswhichrep-
resentthepotentialsourcecodeevidenceforthecon-
cept. Foreverypieceofsourcecodeevidencethat is
found,ahypothesisisgeneratedfortheappropriate
concept. Segments (contiguous groupsof hypothe-
sesdening acontiguousregionof source code) are
formed from the resulting list of hypothesesusing
conceptualdensityandprogramsyntaxtodenethe
boundaries. Thedominantconcept(i.e. theone for
whichthereismostevidence)ineachsegmentisas-
signedtotheappropriateregionofsourcecode.
3 A Framework for Unifying Slic-
ing and Concept Assignment
This section presents a framework of notation and
requirementsfor developingacombinedslicing and
usedtorefertotheresultofanycombinationofslic-
ing and concept assignment. Figure 2 depicts, in
overview,thethreetypesofconceptsliceconsidered
in thispaper.
3.1 Executable Concept Slicing(ECS)
ExecutableConceptSlicing(ECS)isthebasicstart-
ingpointfortheapproachweadvocate. An ECSis
formed using slicing to augment theresults of con-
cept assignmenttomaketheconcept anexecutable
sub-program.
Moreformally,analgorithmforECSisafunction
which takesaprogram, p, andadomain model, D,
and produces a set of executable sub-components,
one per concept in the program p accordingto D.
Eachreturnedconceptmustbeexecutableand,when
executed,thecomputationcapturesthecomputation
ontheassociatedconceptofpw.r.t. D. Informing
anECS, theset of statements taggedwith thecon-
cept namemaynolongerbecontiguous.
The end of the segment identied in concept as-
signment will be treatedasan end of programver-
tex. Forexample,inthecaseofCOBOL,thiscanbe
achievedbyinserting aSTOP RUN statement at the
endofthesegmentofcodeassigned totheconcept.
The ECS will befurther rened using keystate-
mentanalysis,asdescribedinthenextsection.
3.2 Key Statement Analysis(KSA)
Givenaconcept(and/orconceptslice), somestate-
mentswillbemoreimportantthanothers;theywill
contribute more to the computation embodied by
the concept. The more important statements are
regardedasthe`key'statementsoftheconcept. Key
StatementAnalysis(KSA)isananalysisstepwhich
aims to determinethe keystatementsin aconcept.
The approach canbeapplied to both conceptsand
toconceptslices.
Moreformally,analgorithmforKSAisafunction
whichtakesaprogramandaconceptassignedwithin
it, and returnsafunction which describesthe rela-
tiveweight of each statement in the concept. The
weightisrepresentedasafunction,formstatements,
Statement, to real numbers, IR. If the function
returnedisf thenf(s),denotestheweightofstate-
ments
Asimpleapproachidentiesasubsetofthestate-
mentsasbeingkey.A moreelaborateapproach,as-
signs weightsto each statement, indicating relative
keyness. These weightings will be real numbers in
therange 0to 1and so thesimple caseis merelya
special case in which the only twooutcomes are 0
and1.
“Mortgage”
Mortgage Interest
Interest Calculate
Calculate Interest
“Subtract”
“Interest”
“Outstanding” .
“Calculate” “Divide”
Key
Indicates Specialisation Composition Concept Indicator
Composite Concept
Figure1: AFragmentofaDomain Model
3.3 Concept Dependency Analysis (CDA)
Toperform concept assignment, an initial domain
model is created bythe softwareengineer (based
on their experience). This model ought to be im-
provedasthe methodis used.Unfortunately, using
traditionalconceptassignment,thereisnoguidance
toindicatewhichconceptsoccurtogetherfrequently
nortoelucidatetheinter-conceptrelationshipswhich
evolveas the analysis process iterates. Therefore,
themodel is improvedonly byserendipityand in a
poorlydenedad-hocmanner. Aclearlydenedand
tool-assisted feedback approach is required to sup-
port thedisciplinedandsystematicevolutionofthe
model,facilitatingprocessimprovementoversucces-
siveanalyses.
More formally, an algorithm for Concept Depen-
dencyAnalysis(CDA)takesaprogramandadomain
model and produces a concept dependence graph.
A concept dependence graph is a directed graph,
in which the nodes are concepts and the edges are
weighted.Thus, formally, the concept dependency
graphis aset of triples,such thattriple (c;c 0
;w) is
inthegraphithereisanedgefromconceptctoc 0
withweightw. Theconceptgraphisthusaweighted
relationonthesetofconcepts.
3.4 Principal Variables
Inordertoform conceptslices forKSAandCDA it
willbenecessarytodeterminetheprincipalvariables
ofanarbitrarysetofstatements. Theprincipalvari-
ablesarethosewhichmightbeconsideredtobethe
resultofthesetofstatements.
As Bieman and Ott point out in their workon
slice-based cohesionmeasurement[18, 25],the deci-
sionastowhatvariablesare`principal'issomewhat
arbitrary;changingitcan,ofcourse,altertheresults
ofthealgorithmsuponwhichitisbased. Therefore,
thedenitionofwhatconstitutesaprincipalvariable
shouldbetreatedasaparameteroftheconceptslic-
ing approach advocatedhere. The denition below
willbeusedasaworkingdenitionforthecasestudy
in Section7,and isderivedfromBiemanandOtt.
Denition5(Principal Variable)
A variablev in a setof statements S is aprincipal
variableiitiseither
globalandassignedin S
call-by-referenceandassignedinS
theparametertoanoutputstatementofS
Givenaset ofstatemen tsS, PV(S)will beused
todenote theset ofprincipalvariablesofS.
4 ECS Algorithm
The ECS algorithm is presented in Figure 3. The
algorithm isstraightforward. Thestatementsofthe
concept form the set of statements for the slicing
criterion. Slicing on these statements adds to the
concept, all statements of the original program re-
quired to ensure that theconcept statementsfaith-
fullymimic(in theconceptslice)theirbehaviourin
theoriginalprogram.
ExecutableConceptSlic-
ing(ECS)
Toform an executable
sub-component
Inter-Concept Analysis Reuseandre-engineering
Key Statement Analysis
(KSA)
Toreneaconcept Intra-ConceptAnalysis Comprehension and re-
verseengineering
Concept Dependency
Analysis(CDA)
Toidentifyinter-concept
relationships
Inter-Concept Analysis Domain model
improvement
Figure2: Overviewof theConceptSlicingFramework
function ECS(Program p,DomainModelD)
returns: setofProgram
letfc
1
;:::;c
n
g=Concepts(p;D)
foreachc
i 2fc
1
;:::;c
n g
letECS
i
=Slice(p;Statements(c
i ))
letTag(ECS
i
)=Tag(c
i )
endfor
returnfECS
1
;:::;ECS
n g
Figure3: TheExecutableConceptSlicingAlgorithm
5 KSA Algorithm
A simple KSA algorithm is presented in Figure 4.
The function KSA
BO
takesa program and a con-
ceptassignedwithin itandreturnsafunctionwhich
describestherelativeweightofeachstatementinthe
concept. Inthiscase,thereturnedresultiseither 0
or1, with 1 signifying that the statement is a key
statementand0signifying thatitisnot.
Theideaistousetheset ofprincipalvariablesin
theconcepttoformasetofslices. Theintersectionof
theseslicescontainsthestatementswhichcontribute
to the computation of everyprincipal variable; in
otherwords,thekeystatementsoftheconcept.
We callthisthe `BiemanOtt-style'algorithm, be-
cause it is inspired byBieman and Ott's work on
measuringcohesion using slicing[2, 18, 25,26,27].
Specically, the intersection of slices on principal
variablesisthe set usedto computethe`Tightness'
metricintroducedbyOttandThuss[26]. Tightness
waslater developed into a theory of cohesion mea-
surementbasedonslicing[2,25].
AnalternativeKSAalgorithmispresentedinFig-
ure5. In this approach, the returned valueassoci-
atedwithastatementisarealnumber,ratherthan
simply a valuein f0;1g. The valueassigned to a
statement represents the directness of dependence
between it and the principal variablesof the con-
cept. Theweightforstatementsiscomputedasthe
lengthoftheshortestpathfromstoanalusever-
tex ofa principal variable,normalized withrespect
functionKSA
BO
(Program p,ConceptSlicec)
returns: functionfrom Statementtof0;1g
foreachvariablev
i
in PV(c)
lets
i
=Slice(p;fFinalUse(Statements(c);v
i )g)
endfor
letTight= T
i s
i
letKS=Statements(c)\Tight
returnx:if x2KS then1else0
Figure4: KeyStatementAnalysis`BiemanOtt'Style
to thelengthofthelongest acyclicpathin thepro-
gram'sSDG 2
,suchthatstatementswithKSAvalues
closer to 1are more` key' and those with KSA val-
uescloserto 0are `lesskey'. This givesareal value
weightingbetween0and1foreachstatementin the
concept. ThealgorithmbuildsupthefunctionF to
bereturned,adding amapletfor eac h statements
i
whichmapss
i
toitsweight.
Observethat,becauseDist(s;s 0
;p)isundenedif
there is no path from s to s 0
in the SDG of p, the
weightofastatementisalsoundenedwhenthereis
nopathfromittothenalusevertexofanyprincipal
variable. For any such an`unconnected'statemen t,
the undenedness of the weight will alert the engi-
neeringto a possibleanomaly; whyis such astate-
mentinaconceptifithasnoeectonanyprincipal
variable?Wecallthisalgorithmthe`BallEick'algo-
rithmbecauseitisinspiredbyBallandEick'swork
ontheSeeSliceproject[1].
6 CDA Algorithm
AnalgorithmforproducingaweightedConceptDe-
pendencyGraphispresentedinFigure6.Weightings
willbeallocatedaccordingtotheamountofcompu-
2
Inthealgorithm,theSDGisused,buttheremaybeanaly-
sesfordata-intensiveprograms,forwhichitwouldbeedifying
toconsiderthereplacementoftheSDGwiththeDataDepen-
dence Graph (DDG) and (for controlsensitiveconcepts) to
usetheControlDependenceGraph(CGD).