• Aucun résultat trouvé

Mortgage Interest

N/A
N/A
Protected

Academic year: 2021

Partager "Mortgage Interest"

Copied!
11
0
0

Texte intégral

(1)

Assignment

MarkHarman 1

Nicolas Gold 2

Rob Hierons 1

DaveBinkley 3

1

Brunel University

2

UMIST

3

LoyolaCollege

Uxbridge, Middlesex Manchester Baltimore

UB8 3PH, UK. M60 1QD,UK. MD21210-2699, USA.

Keywords: slicing,concept assignment,source code extraction

Abstract

One approach to reverse engineering is to par-

tially automate subc omponent extraction, improve-

mentandsubsequent re c ombination. Twopreviously

proposed automated te chniques for supp ortingthis

activity are slicing and c oncept assignment. How-

ever,neitherisdire ctlyapplicableinisolation;slicing

criteria(setsofprogramvariables)aresimplytoolow

level in many cases, while c oncept assignment typi-

c allyfails toproduceexecutablesubc omponents.

This paper introducesaunication of slicing and

conceptassignmentwhichexploitstheircombinedad-

vantages, while overc omingtheir individual weak-

nesses. Our `c onceptslices' are extracted using

high level criteria, while producingexecutable sub-

programs. The paper introducesthre eways of com-

biningslicingandconceptassignmentandalgorithms

for e ach. The applicationof the conceptslicing al-

gorithmsisillustratedwith acasestudyfromalarge

nancialorganisation.

1 Introduction

For program comprehension and reverse engineer-

ing it is important to haveautomated techniques

for extracting executable subcomponents according

tohigh levelextraction criteria. These components

need to be semantically related to the original (so

that they can be executed in isolation), while the

criteriaforselection may need to identify disparate

sectionsof diverse codewhich will haveto be mar-

riedtogether. Therefore,theproblemistobeableto

automaticallyproduceprogramswhichanswerques-

tionsof the form: Given an original program, con-

struct the simplest program that, for exampleper-

formsthesamemasterleupdateoperationorwhich

closesdownthereactorunderthesameconditions.

Programslicingandconceptassignmentareauto-

matedsourcecodeextractiontechniquesthattakea

criterionandprogramsourcecodeasinputandyield

partsoftheprogram'ssourcecodeasoutput. There-

fore, they suggest themselves as natural candidate

solutionstothisproblem. Slicinghastheadvantage

thattheextractedcodeitproducescanbeexecuted

asaprograminits ownright,butthedisadvantage

thatthecriterionmustbeexpressedatthelowlevel

of program variables. Concept assignment has the

advantagethat the extractioncriterion isexpressed

at just the right level(in termsof concepts such as

`master le', `errorreco very'and`log update'), but

thedisadvantagethatthecodefragmentsitextracts

cannotbecompiled andexecutedasaseparatepro-

gram. Thus,eachtechniqueovercomesthediÆculty

associatedwiththeother.

Thispapershowshowslicingandconceptassign-

mentcanbecombinedtoproducebetterresultsthan

either is capable of individually. The contributions

ofthispapercanbesummarisedasfollows.

AframeworkforcombiningSlicingandConcept

Assignmentisintroduced

Algorithmsareintroduced for

{ ExecutableConceptSlicing

{ KeyStatementAnalysis

{ ConceptDependencyAnalysis

Theapplicationoftheconceptslicingapproach

to reverseengineering isillustratedwith acase

study

Therestofthepaperisorganisedasfollows. Sec-

tion2brieyreviewsslicingandconceptassignment

to makethe paper self-contained.It can safely be

skipped bya reader familiar with both techniques.

Section 3 presents a framework for unifying slic-

ing and concept assignment, suggesting three new

techniques which combine slicing with concept as-

signment. Algorithms for these three techniques:

(2)

Analysis(KSA) and Concept Dependency Analysis

(CDA)areintroducedin sections4,5and 6respec-

tively. Section 7presents a case study involvinga

nancialpayment system, which illustrates the use

ofthe concept slicingalgorithms introduced in sec-

tions4,5and6. Section 8concludes andSection 9

givesdirectionsforfuturework.

2 Background

This section providessome background, denitions

and notation for slicing and concept assignment

whichareusedin theremainderofthepaper.

2.1 Slicing

Programslicing[31]isdenedwithrespecttoa`slic-

ing criterion'. Slicing uses dependence analysis to

isolatethose partsof aprogramthat potentiallyaf-

fecttheslicingcriterion.

Traditionally`partsoftheprogramtobeisolated'

havebeen restricted to statements and predicates

and the slicing criterion has been dened in terms

ofaset of variables anda pointat which theirval-

uesare of interest. Morerecentwork hasextended

traditionalslicingby consideringnovelslicing crite-

riainvolvingconditionsandtestadequacyproperties

[5, 15]. The techniquesfor isolation of statements

havealso broadenedfrom statement deletion to al-

lowformoregeneraltransformation[4,12,30].

This paper will be concernedsolely with syntax-

preservingstaticslicing,which willbeused bothto

rene and to extend the results of concept assign-

ment. Inallcases,sliceswillbeconstructedforaset

ofnodesofaprogram'sControlFlowGraph(CFG).

Thismeans that theslicingcriterion will simplybe

asetofnstatementsfs

1

;:::;s

n g.

Denition1(Slice)

A slice of a program p for the slicing criterion

fs

1

;:::;s

n

g is an executable subprogram, s, con-

structed from p bystatement deletion, suchthat

s behavesiden ticallyto p with respect to the se-

quenceofvaluescomputedateachofthestatements

infs

1

;:::;s

n

g. Thesliceofaprogrampw.r.taset

ofstatementsS willbedenoted Slice(p;S).

This denition of a slice is essentiallythe exe-

cutableversionofthedenitionadoptedbytheSys-

tem Dependence Graph approach of Horwitz et al.

[16]. Typically, workon the System Dependence

Graph(SDG)denesittocontainasetof`naluse'

verticesforeachvariable. TheSDGisso-constructed

toguaranteetheexistenceof such avertexforeach

variable. This allows slices to be constructed for a

variableintermsofitsnalusevertex.

FinalUse(p;v)isthenalusevertexofvariablevin

programp.

The dependence graph itself can be useful in

analysingthedistancebetweenaslicenodeandsome

othernodein theslice.

Denition3(SDG Distance)

Givenstatements sand s 0

ofa program p,the dis-

tance,Dist(p;s;s 0

)isthelengthoftheshortestpath

betweensands 0

intheSDGofp. Ifthereisnopath

from sand s 0

in theSDG ofp, thenDist(p;s;s 0

)is

undened.

2.2 Concept Assignment

The concept assignment 1

problem is dened as \a

process of recognising concepts within a computer

program and building up an`understanding' of the

programbyrelatingrecognisedconceptsto portions

oftheprogram,itsoperationalcontextandtoonean-

other[3]." Itcanbeundertakenbyintelligen tagents

(tools),withthreedistinctapproachesbeingadopted

[3]:

1. Highly domain specic, model driven, rule-

based questionanswering systemsthat depend

on a manually populated database describing

the software system.This approach is typied

bytheLassiesystem[7].

2. Plandriven,algorithmicprogramunderstanders

or recognisers. Two examples of this type

are the Programmer's Apprentice [28], and

GRASPR[32].

3. Modeldriven,plausible reasoningsystems. Ex-

amplesofthistypeincludeDM-TAO[3],IRENE

[17],andHB-CA[9,10].

Biggersta et al. claim that systems using ap-

proaches 1 and 2 are good at completely deriv-

ingconceptswithinsmall-scaleprogramsbutcannot

dealwithlarge-scaleprogramsdue tooverwhelming

computationalgrowth. Approach3systemscaneas-

ily handle large-scale programs since their compu-

tational growthappears to be linear in the length

of the programunder analysis but they suer from

approximate andimpreciseresults[3].

Weareconcernedwithplausiblereasoningsystems

(category 3above)and allreferencesto conceptas-

signmentin this papershould betak en asreferring

to this kindof system. Plausible reasoningsystems

are of particular interest because they are scalable

1

Note:conceptassignmentisawhollydierenttechnology

from formalconcept analysis(FCA)(sometimes just called

`conceptanalysis').

(3)

levelconcepts than some of the other approaches.

Theassignmentisbasedontheevidenceavailablein

thecodebeinganalysedfrom whicha`bestguess'is

taken; reasoningisthus basedonplausibilityrather

thandeduction. Inadditiontothecommonapplica-

tionofconceptassignmentinhelpingmaintainersto

comprehendprograms, Cimitile et al. [6]havesug-

gested it as a wayof validatingthe adequacy of a

candidate criterion when iden tifyingsuitable mod-

ulesforreuse.

Hypothesis-Based Concept Assignment (HB-CA)

[9, 11] is one of the most recentexamples of a

plausible-reasoningconceptassignmentapproach. It

dealswiththepartoftheconceptassignmentprob-

lemthatinvolvesrelatingrecognisedconceptstopor-

tionsofaprogram. HB-CAusesasimpleknowledge

base to encode the relationships betweenconcepts

andpotentialevidencefortheminsourcecode. Itis

thisapproachthatweproposeto combinewithpro-

gramslicing. Thefollowingdenitionintroducesthe

notationwewilluseto denoteconceptassignment.

Denition4(Concept)

Aconceptc,namedn,ofaprogrampisconstructed

withrespecttoadomainmodelD. Theconceptcon-

sistsofataggedcontiguoussequenceofcodefromp,

forwhichthereisevidence(accordingtoD)thatthe

sequence implements the concept named n. Fora

conceptc,Tag(c)refersto thenameof theconcept

c,whileStatements(c)refersto itsstatements. For

aprogram pand domain model D, Concepts(p;D)

referstothesetofallconceptsassignedtopaccord-

ingto D.

Figure 1 shows a fragment of a domain model

(whichwillbeusedinthecasestudyinSection7). In

adomain model, conceptsare classiedinto actions

and objects and may be composed or specialised.

Eachconcepthasanumberofindicatorswhichrep-

resentthepotentialsourcecodeevidenceforthecon-

cept. Foreverypieceofsourcecodeevidencethat is

found,ahypothesisisgeneratedfortheappropriate

concept. Segments (contiguous groupsof hypothe-

sesdening acontiguousregionof source code) are

formed from the resulting list of hypothesesusing

conceptualdensityandprogramsyntaxtodenethe

boundaries. Thedominantconcept(i.e. theone for

whichthereismostevidence)ineachsegmentisas-

signedtotheappropriateregionofsourcecode.

3 A Framework for Unifying Slic-

ing and Concept Assignment

This section presents a framework of notation and

requirementsfor developingacombinedslicing and

usedtorefertotheresultofanycombinationofslic-

ing and concept assignment. Figure 2 depicts, in

overview,thethreetypesofconceptsliceconsidered

in thispaper.

3.1 Executable Concept Slicing(ECS)

ExecutableConceptSlicing(ECS)isthebasicstart-

ingpointfortheapproachweadvocate. An ECSis

formed using slicing to augment theresults of con-

cept assignmenttomaketheconcept anexecutable

sub-program.

Moreformally,analgorithmforECSisafunction

which takesaprogram, p, andadomain model, D,

and produces a set of executable sub-components,

one per concept in the program p accordingto D.

Eachreturnedconceptmustbeexecutableand,when

executed,thecomputationcapturesthecomputation

ontheassociatedconceptofpw.r.t. D. Informing

anECS, theset of statements taggedwith thecon-

cept namemaynolongerbecontiguous.

The end of the segment identied in concept as-

signment will be treatedasan end of programver-

tex. Forexample,inthecaseofCOBOL,thiscanbe

achievedbyinserting aSTOP RUN statement at the

endofthesegmentofcodeassigned totheconcept.

The ECS will befurther rened using keystate-

mentanalysis,asdescribedinthenextsection.

3.2 Key Statement Analysis(KSA)

Givenaconcept(and/orconceptslice), somestate-

mentswillbemoreimportantthanothers;theywill

contribute more to the computation embodied by

the concept. The more important statements are

regardedasthe`key'statementsoftheconcept. Key

StatementAnalysis(KSA)isananalysisstepwhich

aims to determinethe keystatementsin aconcept.

The approach canbeapplied to both conceptsand

toconceptslices.

Moreformally,analgorithmforKSAisafunction

whichtakesaprogramandaconceptassignedwithin

it, and returnsafunction which describesthe rela-

tiveweight of each statement in the concept. The

weightisrepresentedasafunction,formstatements,

Statement, to real numbers, IR. If the function

returnedisf thenf(s),denotestheweightofstate-

ments

Asimpleapproachidentiesasubsetofthestate-

mentsasbeingkey.A moreelaborateapproach,as-

signs weightsto each statement, indicating relative

keyness. These weightings will be real numbers in

therange 0to 1and so thesimple caseis merelya

special case in which the only twooutcomes are 0

and1.

(4)

“Mortgage”

Mortgage Interest

Interest Calculate

Calculate Interest

“Subtract”

“Interest”

“Outstanding” .

“Calculate” “Divide”

Key

Indicates Specialisation Composition Concept Indicator

Composite Concept

Figure1: AFragmentofaDomain Model

3.3 Concept Dependency Analysis (CDA)

Toperform concept assignment, an initial domain

model is created bythe softwareengineer (based

on their experience). This model ought to be im-

provedasthe methodis used.Unfortunately, using

traditionalconceptassignment,thereisnoguidance

toindicatewhichconceptsoccurtogetherfrequently

nortoelucidatetheinter-conceptrelationshipswhich

evolveas the analysis process iterates. Therefore,

themodel is improvedonly byserendipityand in a

poorlydenedad-hocmanner. Aclearlydenedand

tool-assisted feedback approach is required to sup-

port thedisciplinedandsystematicevolutionofthe

model,facilitatingprocessimprovementoversucces-

siveanalyses.

More formally, an algorithm for Concept Depen-

dencyAnalysis(CDA)takesaprogramandadomain

model and produces a concept dependence graph.

A concept dependence graph is a directed graph,

in which the nodes are concepts and the edges are

weighted.Thus, formally, the concept dependency

graphis aset of triples,such thattriple (c;c 0

;w) is

inthegraphithereisanedgefromconceptctoc 0

withweightw. Theconceptgraphisthusaweighted

relationonthesetofconcepts.

3.4 Principal Variables

Inordertoform conceptslices forKSAandCDA it

willbenecessarytodeterminetheprincipalvariables

ofanarbitrarysetofstatements. Theprincipalvari-

ablesarethosewhichmightbeconsideredtobethe

resultofthesetofstatements.

As Bieman and Ott point out in their workon

slice-based cohesionmeasurement[18, 25],the deci-

sionastowhatvariablesare`principal'issomewhat

arbitrary;changingitcan,ofcourse,altertheresults

ofthealgorithmsuponwhichitisbased. Therefore,

thedenitionofwhatconstitutesaprincipalvariable

shouldbetreatedasaparameteroftheconceptslic-

ing approach advocatedhere. The denition below

willbeusedasaworkingdenitionforthecasestudy

in Section7,and isderivedfromBiemanandOtt.

Denition5(Principal Variable)

A variablev in a setof statements S is aprincipal

variableiitiseither

globalandassignedin S

call-by-referenceandassignedinS

theparametertoanoutputstatementofS

Givenaset ofstatemen tsS, PV(S)will beused

todenote theset ofprincipalvariablesofS.

4 ECS Algorithm

The ECS algorithm is presented in Figure 3. The

algorithm isstraightforward. Thestatementsofthe

concept form the set of statements for the slicing

criterion. Slicing on these statements adds to the

concept, all statements of the original program re-

quired to ensure that theconcept statementsfaith-

fullymimic(in theconceptslice)theirbehaviourin

theoriginalprogram.

(5)

ExecutableConceptSlic-

ing(ECS)

Toform an executable

sub-component

Inter-Concept Analysis Reuseandre-engineering

Key Statement Analysis

(KSA)

Toreneaconcept Intra-ConceptAnalysis Comprehension and re-

verseengineering

Concept Dependency

Analysis(CDA)

Toidentifyinter-concept

relationships

Inter-Concept Analysis Domain model

improvement

Figure2: Overviewof theConceptSlicingFramework

function ECS(Program p,DomainModelD)

returns: setofProgram

letfc

1

;:::;c

n

g=Concepts(p;D)

foreachc

i 2fc

1

;:::;c

n g

letECS

i

=Slice(p;Statements(c

i ))

letTag(ECS

i

)=Tag(c

i )

endfor

returnfECS

1

;:::;ECS

n g

Figure3: TheExecutableConceptSlicingAlgorithm

5 KSA Algorithm

A simple KSA algorithm is presented in Figure 4.

The function KSA

BO

takesa program and a con-

ceptassignedwithin itandreturnsafunctionwhich

describestherelativeweightofeachstatementinthe

concept. Inthiscase,thereturnedresultiseither 0

or1, with 1 signifying that the statement is a key

statementand0signifying thatitisnot.

Theideaistousetheset ofprincipalvariablesin

theconcepttoformasetofslices. Theintersectionof

theseslicescontainsthestatementswhichcontribute

to the computation of everyprincipal variable; in

otherwords,thekeystatementsoftheconcept.

We callthisthe `BiemanOtt-style'algorithm, be-

cause it is inspired byBieman and Ott's work on

measuringcohesion using slicing[2, 18, 25,26,27].

Specically, the intersection of slices on principal

variablesisthe set usedto computethe`Tightness'

metricintroducedbyOttandThuss[26]. Tightness

waslater developed into a theory of cohesion mea-

surementbasedonslicing[2,25].

AnalternativeKSAalgorithmispresentedinFig-

ure5. In this approach, the returned valueassoci-

atedwithastatementisarealnumber,ratherthan

simply a valuein f0;1g. The valueassigned to a

statement represents the directness of dependence

between it and the principal variablesof the con-

cept. Theweightforstatementsiscomputedasthe

lengthoftheshortestpathfromstoanalusever-

tex ofa principal variable,normalized withrespect

functionKSA

BO

(Program p,ConceptSlicec)

returns: functionfrom Statementtof0;1g

foreachvariablev

i

in PV(c)

lets

i

=Slice(p;fFinalUse(Statements(c);v

i )g)

endfor

letTight= T

i s

i

letKS=Statements(c)\Tight

returnx:if x2KS then1else0

Figure4: KeyStatementAnalysis`BiemanOtt'Style

to thelengthofthelongest acyclicpathin thepro-

gram'sSDG 2

,suchthatstatementswithKSAvalues

closer to 1are more` key' and those with KSA val-

uescloserto 0are `lesskey'. This givesareal value

weightingbetween0and1foreachstatementin the

concept. ThealgorithmbuildsupthefunctionF to

bereturned,adding amapletfor eac h statements

i

whichmapss

i

toitsweight.

Observethat,becauseDist(s;s 0

;p)isundenedif

there is no path from s to s 0

in the SDG of p, the

weightofastatementisalsoundenedwhenthereis

nopathfromittothenalusevertexofanyprincipal

variable. For any such an`unconnected'statemen t,

the undenedness of the weight will alert the engi-

neeringto a possibleanomaly; whyis such astate-

mentinaconceptifithasnoeectonanyprincipal

variable?Wecallthisalgorithmthe`BallEick'algo-

rithmbecauseitisinspiredbyBallandEick'swork

ontheSeeSliceproject[1].

6 CDA Algorithm

AnalgorithmforproducingaweightedConceptDe-

pendencyGraphispresentedinFigure6.Weightings

willbeallocatedaccordingtotheamountofcompu-

2

Inthealgorithm,theSDGisused,buttheremaybeanaly-

sesfordata-intensiveprograms,forwhichitwouldbeedifying

toconsiderthereplacementoftheSDGwiththeDataDepen-

dence Graph (DDG) and (for controlsensitiveconcepts) to

usetheControlDependenceGraph(CGD).

Références

Documents relatifs

The system developed by the LAST for the PAN@FIRE 2020 Task on Authorship Identification of SOurce COde (AI-SOCO) was aimed at obtaining the best possible performance by combining a

This form of domain knowledge is modeled in a bottom-up manner, using the references to legal texts in the literature as instances in the bottom levels of the concept hierarchy..

In this paper we present a novel method for identification and classification of name entities based on the features. Though classifying named entities from twitter data is

This in- cludes the following: Declarations from imported mod- ules are added, predicates and their goals and subgoals are annotated with inferred determinisms and modes, variables

The goals of this evaluation were to test the keyword extraction services with the examples of existing educational content, to compare the keywords extracted

Dans cet article, nous avons montr´e comment la combinaison d’un algorithme g´en´e- tique (AG) et d’un compilateur source ` a source permettait d’explorer avantageuse- ment

When constructing the dependency graph, we have selected only those dependencies whose composing nodes contains at least one class, which means that the

In this paper, we identify examples of transformation patterns in real world software systems and study their properties: (i) they are specific to a system; (ii) they were