Optimizing Java Bytecode using the Soot Framework: Is it Feasible?

Raja Vallée-Rai, Etienne Gagnon, Laurie Hendren, Patrick Lam, Patrice Pominville and Vijay Sundaresan
{rvalleerai,gagnon,hendren,plam,patrice}@sable.mcgill.ca
vijaysun@ca.ibm.com
Sable Research Group, School of Computer Science, McGill University
Abstract. This paper presents Soot, a framework for optimizing Java™ bytecode. The framework is implemented in Java and supports three intermediate representations for representing Java bytecode: Baf, a streamlined representation of Java's stack-based bytecode; Jimple, a typed three-address intermediate representation suitable for optimization; and Grimp, an aggregated version of Jimple.
Our approach to class file optimization is to first convert the stack-based bytecode into Jimple, a three-address form more amenable to traditional program optimization, and then convert the optimized Jimple back to bytecode.
In order to demonstrate that our approach is feasible, we present experimental results showing the effects of processing class files through our framework. In particular, we study the techniques necessary to effectively translate Jimple back to bytecode, without losing performance. Finally, we demonstrate that class file optimization can be quite effective by showing the results of some basic optimizations using our framework. Our experiments were done on ten benchmarks, including seven SPECjvm98 benchmarks, and were executed on five different Java virtual machine implementations.
1 Introduction
Java provides many attractive features such as platform independence, execution safety, garbage collection and object orientation. These features facilitate application development but are expensive to support; applications written in Java are often much slower than their counterparts written in C or C++. To use these features without having to pay a great performance penalty, sophisticated optimizations and runtime systems are required. For example, Just-In-Time compilers[1], adaptive compilers such as Hotspot™, and Way-Ahead-Of-Time Java compilers[18,17] are three approaches used to improve performance.
Our approach is to statically optimize Java bytecode. There are several reasons for optimizing at the bytecode level. Firstly, the optimized bytecode can be executed by any Java virtual machine, so the overall improvement is due to both our static bytecode optimization and the optimizations and sophisticated runtime systems used in the virtual machines executing the optimized bytecode. Secondly, many different compilers for a variety of languages (Ada, Scheme, Fortran, Eiffel, etc.) now produce Java bytecode as their target code. Thus, our optimization techniques can be applied as a backend to all of these compilers.
The goal of our work is to develop tools that simplify the task of optimizing Java bytecode, and to demonstrate that significant optimization can be achieved using these tools. Thus, we have developed the Soot[20] framework, which provides a set of intermediate representations and a set of Java APIs for optimizing Java bytecode directly. Since our framework is written in Java, and provides a set of clean APIs, it should be portable and easy to build upon.
Early in our work we found that optimizing stack-based bytecode directly was, in general, too difficult. Thus, our framework consists of three intermediate representations, two of which are stackless representations. Baf is a streamlined representation of the stack-based bytecode, whereas Jimple and Grimp are more standard, stackless intermediate representations. Jimple is a three-address representation, where each instruction is simple, and each variable has a type. It is ideal for implementing standard compiler analyses and transformations. Grimp is similar to Jimple, but has aggregated expressions. Grimp is useful for decompilation, and as a means to generate efficient bytecode from Jimple.
In order to optimize bytecode we first convert bytecode to Jimple (the three-address representation), analyze and optimize the Jimple code, and then convert the optimized Jimple back to bytecode. In this paper we focus on the techniques used to translate from Jimple to efficient bytecode, and we give two alternative approaches to this translation.
Our framework is designed so that many different analyses could be implemented, above and beyond those carried out by javac or a JIT. A typical transformation might be removing redundant field accesses, or inlining. For these sorts of optimizations, improvements at the Jimple level correspond directly to improvements in the final bytecode produced.
We have performed substantial experimental studies to validate our approach. Our first results show that we can indeed go through the cycle, bytecode to Jimple and back to bytecode, without losing any performance. This means that any optimizations made in Jimple will likely also result in optimizations in the final bytecode. Our second set of results shows the effect of optimizing class files using method inlining and a set of simple intraprocedural optimizations. These results show that we can achieve performance gains over a variety of Java Virtual Machines.
In summary, our contributions in this paper are: (1) three intermediate representations which provide a general-purpose framework for bytecode optimization; (2) techniques and experimental results for translating Jimple back to efficient bytecode; and (3) experimental results showing that class file optimization can be effective. The rest of the paper is organized as follows. Section 2 gives an overview of the framework and the intermediate representations. Section 3 describes two alternative approaches to translating Jimple back to bytecode. Section 4 presents and discusses our experimental results. Section 5 discusses related work and Section 6 covers the conclusions and future work.
2 Framework Overview
The Soot framework has been designed to simplify the process of developing new optimizations for Java bytecode. Figure 1 shows the overall structure of the framework. As indicated at the top of the figure, many different compilers can be used to generate the class files. The large box labeled SOOT demonstrates that the framework takes the original class files as input, and produces optimized class files as output. Finally, as shown at the bottom of the figure, these optimized class files can then be used as input to Java interpreters, Just-In-Time (JIT) compilers, adaptive execution engines like Hotspot™, and Ahead-of-Time compilers such as the High Performance Compiler for Java (HPCJ)™ or TowerJ™.
The internal structure of Soot is indicated inside the large box in Figure 1. Shaded components correspond to modules that we have designed/implemented, and each component is discussed in the following subsections.
2.1 Jimplify
The first phase is to jimplify the input class files, that is, to convert class files to the Jimple three-address representation. We convert to a three-address representation because optimizing stack code directly is awkward for several reasons. First, the stack implicitly participates in every computation; there are effectively two types of variables, the implicit stack variables and explicit local variables. Second, the expressions are not explicit, and must be located on the stack[25]. For example, a simple instruction such as add can have its operands separated by an arbitrary number of stack instructions, and even by basic block boundaries. Another difficulty is the untyped nature of the stack and of the local variables in the bytecode, as this does not allow analyses to make use of the declared type of variables. A fourth problem is the jsr bytecode. The jsr bytecode is difficult to handle because it is essentially an interprocedural feature which is inserted into a traditionally intraprocedural context.
We produce Jimple from bytecode by first processing the bytecode from the input class files and representing it in an internal format. During this translation, jsr instructions are eliminated by inline expansion of code. After producing the internal form of bytecode, the translation to Jimple code proceeds as follows:
1. Produce naive 3-address code: Map every stack variable to a local variable, by determining the stack height at every instruction. Then map each instruction which acts implicitly on the stack variables to a 3-address code statement
[Figure 1 shows the overall data flow: Java, SML, Scheme, and Eiffel source is compiled by javac, MLJ, KAWA, and SmallEiffel into class files. Inside SOOT, the class files are jimplified to Jimple and optimized; Option I generates naive Baf stack code and then optimizes the stack code, while Option II aggregates Jimple into Grimp and generates stack code from it. The resulting Baf is written out as Jasmin files and assembled by JASMIN into optimized class files, which can then be executed by an interpreter, a JIT, an adaptive engine, or an ahead-of-time compiler.]

Fig. 1. Soot Framework Structure
which refers to the local variables explicitly. For example, if the current stack height is 3, then the instruction iadd would be translated to the Jimple statement $i2 = $i2 + $i3. This is a standard technique and is also used in other systems[18,17].
2. Type the local variables: The resulting Jimple code may be untypable because a local variable may be used with two different types in different contexts. The local variables are split so that each variable can be given a primitive, class, or interface type. To do this, we invoke an algorithm described in [9]. The complete solution to this typing problem is NP-complete, but in practice simple polynomial algorithms suffice. Although splitting the local variables in this step produces many local variables, the resulting Jimple code tends to be easier to analyze because it inherits some of the disambiguation benefits of SSA form[5].
3. Clean up the code: Jimple code must now be compacted because step 1 produced extremely verbose code[18,17]. We have found simple aggregation (collapsing single def/use pairs) followed by copy propagation/elimination of stack variables to be sufficient to eliminate almost all redundant stack variables.
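As an illustration of step 1, the naive mapping can be sketched by tracking the stack height while walking the instruction stream. This is a hypothetical toy version handling only a small push/iadd/store subset, not Soot's actual API; the instruction names and the $i numbering convention simply follow the iadd example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of step 1: the stack slot at height k becomes local
// $i<k>, and each stack instruction is rewritten as a 3-address statement.
public class NaiveJimplifier {
    public static List<String> jimplify(List<String> bytecode) {
        List<String> out = new ArrayList<>();
        int height = 0; // stack height before the current instruction
        for (String ins : bytecode) {
            if (ins.startsWith("push ")) {
                height++;   // pushes one value, landing in slot $i<height>
                out.add("$i" + height + " = " + ins.substring(5) + ";");
            } else if (ins.equals("iadd")) {
                // pops two values and pushes their sum: at height h this
                // becomes $i<h-1> = $i<h-1> + $i<h>
                // (e.g. $i2 = $i2 + $i3 at height 3, as in the text)
                out.add("$i" + (height - 1) + " = $i" + (height - 1)
                        + " + $i" + height + ";");
                height--;
            } else if (ins.startsWith("store ")) {
                out.add(ins.substring(6) + " = $i" + height + ";");
                height--;
            }
        }
        return out;
    }
}
```

For the sequence push 1; push 2; iadd; store x, this produces the verbose code $i1 = 1; $i2 = 2; $i1 = $i1 + $i2; x = $i1; which step 3 later compacts.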
Figure 2(a) shows an input Java program and Figure 2(b) shows the bytecode generated by javac as we would represent it in our Baf representation. Figure 2(c) shows the Jimple code that would result from jimplifying the bytecode. Note that all local variables have been given types. Variables beginning with $ correspond to variables that were inserted to stand for stack locations, whereas variables that do not begin with $ correspond to local variables from the bytecode.
2.2 Optimize Jimple
Most of the analyses and transformations developed by our group, as well as other research groups, are implemented using the Jimple representation. We have currently implemented many intraprocedural and interprocedural optimizations. In this paper we focus on studying the effect of several simple intraprocedural techniques in conjunction with method inlining.
2.3 Convert Jimple back to Stack Code

Producing bytecode naively from Jimple code produces highly inefficient code. Even the best JIT that we tested cannot make up for this inefficiency. It is therefore very important that we do not introduce inefficiencies at this point that would negate the optimizations performed on Jimple.
We have investigated two alternatives for producing stack code, labeled Option I and Option II in Figure 1. These two options are discussed in more detail in Section 3, and they are experimentally validated in Section 4.
2.4 Generate Jasmin Assembler

After converting Jimple to Baf stack code, the final phase is to generate Jasmin[11] assembler files from the internal Baf representation. Since Baf is a relatively direct encoding of the Java bytecode, this phase is quite simple.
int x;
public void f(int i, int c)
{ g(x *= 10);
  while(i * 2 < 10)
  { a[i++] = new Object();
  }
}

(a) Original Java Source

.method public f(II)V
    aload_0
    aload_0
    dup
    getfield A/x I
    bipush 10
    imul
    dup_x1
    putfield A/x I
    invokevirtual A/g(I)V
    goto Label1
Label0:
    aload_0
    getfield A/a [Ljava/lang/Object;
    iload_1
    iinc 1 1
    new java/lang/Object
    dup
    invokenonvirtual java/lang/Object/<init>()V
    aastore
Label1:
    iload_1
    iconst_2
    imul
    bipush 10
    if_icmplt Label0
    return

(b) Bytecode

public void f(int, int)
{
    Example this;
    int i, c, $i0, $i1, $i2, $i3;
    java.lang.Object[] $r1;
    java.lang.Object $r2;
    this := @this;
    i := @parameter0;
    c := @parameter1;
X:  $i0 = this.x;
X:  $i1 = $i0 * 10;
    this.x = $i1;
    this.g($i1);
    goto label1;
label0:
Y:  $r1 = this.a;
    $i2 = i;
    i = i + 1;
    $r2 = new java.lang.Object;
    specialinvoke $r2.<init>();
Y:  $r1[$i2] = $r2;
label1:
Z:  $i3 = i * 2;
Z:  if $i3 < 10 goto label0;
    return;
}

(c) Jimple

Fig. 2. Translating bytecode to Jimple
3 Transformations

After optimizing Jimple code, it is necessary to translate back to efficient Java bytecode. Figure 3(a) illustrates the bytecode produced by a naive translation from Jimple.³ Note that this naive code has many redundant store and load instructions, reflecting the fact that Jimple computation results are stored to local variables, while stack-based bytecode can use the stack to store many intermediate computations.

³ Note that our Baf representation closely mirrors Java bytecode.
3.1 Option I: Optimize Stack Code

The main idea behind the Baf optimizations is to identify and eliminate redundant store/load computations. Figure 3(b) shows the results of applying the Baf optimizations on the naive code given in Figure 3(a).
The optimizations currently in use are all performed on discrete basic blocks within a method. Inter-block optimizations that cross block boundaries have also been implemented, but to date these have not yielded any appreciable speedups and are not described here. In the following discussion it is assumed that all instructions belong to the same basic block.
In practice, the majority of the redundant store and load instructions present in naive Baf code belong to one of the following code patterns:

store/load (sl pair): a store instruction followed by a load instruction referring to the same local variable with no other uses. Both the store and load instructions can be eliminated, and the value will simply remain on the stack.

store/load/load (sll triple): a store instruction followed by 2 load instructions, all referring to the same local variable with no other uses. The 3 instructions can be eliminated and a dup instruction introduced. The dup instruction replaces the second load by duplicating the value left on the stack after eliminating the store and the first load.
We can observe these patterns occurring in Figure 3(a), where labels A, B, C and E each identify distinct sl pairs, and where labels D and G identify sll triples.
To optimize a Baf method our algorithm performs a fixed point iteration over its basic blocks, identifying and reducing these patterns whenever possible. By reducing, we mean eliminating an sl pair or replacing an sll triple with a dup instruction.
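A minimal sketch of this fixed point iteration over one basic block, assuming instructions are plain strings and omitting the "no other uses" check that the real algorithm performs (the class and method names here are hypothetical, not Soot's API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the trivial reductions: an adjacent store/load pair
// on the same local is deleted (the value stays on the stack), and a store
// followed immediately by two loads of the same local becomes a dup.
public class BafReducer {
    public static List<String> reduce(List<String> code) {
        List<String> out = new ArrayList<>(code);
        boolean changed = true;
        while (changed) {           // fixed point iteration over the block
            changed = false;
            for (int i = 0; i + 1 < out.size(); i++) {
                String s = out.get(i);
                if (!s.startsWith("store ")) continue;
                String local = s.substring(6);
                boolean l1 = out.get(i + 1).equals("load " + local);
                boolean l2 = i + 2 < out.size()
                        && out.get(i + 2).equals("load " + local);
                if (l1 && l2) {                 // sll triple -> dup
                    out.set(i, "dup");
                    out.remove(i + 2);
                    out.remove(i + 1);
                    changed = true;
                    break;
                } else if (l1) {                // sl pair -> eliminated
                    out.remove(i + 1);
                    out.remove(i);
                    changed = true;
                    break;
                }
            }
        }
        return out;
    }
}
```

Reducing push 1; store a; load a; store b; load b; load b; iadd yields push 1; dup; iadd: the sl pair on a disappears first, which makes the sll triple on b adjacent and reducible to a dup.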
Reducing a pattern is trivial when all its instructions directly follow each other in the instruction stream. For example, the sl pairs at labels A and E in Figure 3(a) are trivial ones. However, often the store and load instructions are far apart (as at labels B and C). To identify pairs of statements to reduce, we must determine the effect on the stack of the intervening bytecode. These bytecodes are called the interleaving sequences.
We compute two pieces of information as a means to analyse interleaving sequences and their effects/dependencies on the stack. The net effect on the stack height after executing a sequence of bytecode is referred to as the net stack height variation, or nshv. The minimum stack height variation attained while executing a sequence of bytecode is referred to as the minimum stack height variation, or mshv. A sequence of instructions having both nshv = 0 and mshv = 0 is referred to as a level sequence.
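Both metrics can be computed in a single pass over the per-instruction stack effects. A small sketch, where the deltas array is an assumed encoding of each bytecode's net push/pop behaviour (the class name is hypothetical):

```java
// Hypothetical sketch of the two stack metrics: nshv is the net change in
// stack height over a sequence, and mshv is the lowest height (relative to
// the start) reached at any point. A sequence is "level" when both are 0.
public class StackMetrics {
    public final int nshv, mshv;

    public StackMetrics(int[] deltas) { // deltas: stack effect per bytecode
        int h = 0, min = 0;
        for (int d : deltas) {
            h += d;
            if (h < min) min = h;   // track the deepest point reached
        }
        this.nshv = h;
        this.mshv = min;
    }

    public boolean isLevel() {
        return nshv == 0 && mshv == 0;
    }
}
```

For example, the deltas {1, 1, -2} describe a level sequence, while {-1, 2, -1} has nshv = 0 but mshv = -1, so it dips below the starting height and is not level.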
If interleaving sequences are level sequences, then the target patterns can be reduced directly. This is because one can just leave the value on the stack, execute the interleaving instructions, and then use the value that was left on the stack. This is the case for the sl pair labeled B, and later the sl pair labeled C, once B has been reduced. Finally, the sll triple D can be reduced, once both B and C have been reduced. In general, however, many such interleaving sequences are not level.
public void f(int, int)
{   word this, i, c, $i2, $r2;
    this := @this: Example;
    i := @parameter0: int;
    c := @parameter1: int;
    load.r this;
    fieldget <Example: int x>;
A:  store.i c;
A:  load.i c;
    push 10;
    mul.i;
G:  store.i c;
F:  load.r this;
G:  load.i c;
    fieldput <Example: int x>;
F:  load.r this;
G:  load.i c;
    virtualinvoke <Example: void g(int)>;
    goto label1;
label0:
    load.r this;
    fieldget <Example: java.lang.Object[] a>;
B:  store.r c;
    load.i i;
C:  store.i $i2;
    inc.i i 1;
    new java.lang.Object;
D:  store.r $r2;
D:  load.r $r2;
    specialinvoke <java.lang.Object: void <init>()>;
B:  load.r c;
C:  load.i $i2;
D:  load.r $r2;
    arraywrite.r;
label1:
    load.i i;
    push 2;
    mul.i;
E:  store.i $r2;
E:  load.i $r2;
    push 10;
    ifcmplt.i label0;
    return;
}
public void f(int, int)
{   word this, i, c;
    this := @this: Example;
    i := @parameter0: int;
    c := @parameter1: int;
    load.r this;
F:  load.r this;
F:  load.r this;
    fieldget <Example: int x>;
    push 10;
    mul.i;
G:  store.i c;
G:  load.i c;
    fieldput <Example: int x>;
G:  load.i c;
    virtualinvoke <Example: void g(int)>;
    goto label1;
label0:
    load.r this;
    fieldget <Example: java.lang.Object[] a>;
    load.i i;
    inc.i i 1;
    new java.lang.Object;
D:  dup1.r;
    specialinvoke <java.lang.Object: void <init>()>;
    arraywrite.r;
label1:
    load.i i;
    push 2;
    mul.i;
    push 10;
    ifcmplt.i label0;
    return;
}

(a) naive Baf generated from Jimple    (b) optimized Baf
When nshv > 0 for an sl pair, our algorithm will try to lower the nshv value by relocating a bytecode having a positive nshv to an earlier location in the block. This is illustrated by the movement of the instructions labeled F in Figures 3(a) and 3(b), in an attempt to reduce the pattern identified by G. Another strategy, used when nshv < 0, is to move the level subsequence ending with the pattern's store instruction past the interleaving sequence. Of course, this can only be done if no data dependencies are violated.
Applying these heuristics produces optimized Baf code which becomes Java bytecode that is extremely similar to the original bytecode. We observe that, except for two minor differences, the optimized Baf code in Figure 3(b) is the same as the original bytecode found in Figure 2(b). The differences are: (1) the second load labeled F is not converted to a dup; and (2) the pattern identified by G is not reduced to a dup_x1 instruction. We have actually implemented these patterns, but our experimental results did not justify enabling these extra transformations. In fact, introducing bytecodes such as dup_x1 often yields non-standard, albeit legal, bytecode sequences that increase execution time and cause many JIT compilers to fail.
3.2 Option II: Build Grimp and Traverse

In this section, we describe the second route for translating Jimple into bytecode. The compiler javac is able to produce efficient bytecode because it has the structured tree representation of the original program, and the stack based nature of the bytecode is particularly well suited for code generation from trees[2]. Essentially, this phase attempts to recover the original structured tree representation, by building Grimp, an aggregated form of Jimple, and then producing stack code by standard tree traversal techniques.
Grimp is essentially Jimple, but the expressions are trees of arbitrary depth. Figure 4(a) shows the Grimp version of the Jimple program in Figure 2(c).
Aggregation of bytecode The basic algorithm for aggregation is as follows. We consider pairs (def, use) in extended basic blocks, where def is an assignment statement whose only use is at use, and use has the unique definition def. We inspect the path between def and use to determine if def can be safely moved into use. This means checking for dependencies and not moving across exception boundaries. We perform this algorithm iteratively, and the pairs are considered in reverse pseudo-topological order to cascade the optimizations as efficiently as possible. Examples of these aggregation opportunities are shown in Figure 2(c) at the labels X, Y and Z. X and Z are trivial cases because their aggregation pairs are adjacent, but the pair at Y is a few statements apart. Thus, before producing the aggregated code in Figure 4(a), the interleaving statements must be checked for writes to this.a.
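The safety check can be sketched as follows, modeling each intervening statement simply by the set of names it writes. This is a coarse stand-in for Soot's real dependency and exception-boundary tests, and the class and method names are hypothetical:

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a definition "local = expr" may be folded into its
// sole use only if no statement on the path between them writes a name
// that expr reads (e.g. the check for writes to this.a at label Y).
public class Aggregator {
    public static boolean canMove(Set<String> exprReads,
                                  List<Set<String>> betweenWrites) {
        for (Set<String> writes : betweenWrites) {
            for (String w : writes) {
                if (exprReads.contains(w)) return false; // dependency violated
            }
        }
        return true;
    }
}
```

For the Y pair above, exprReads would contain this.a; the intervening statements write only i and $i2, so the move is safe, whereas any intervening write to this.a would block it.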
Peephole optimizations In some cases, Grimp cannot concisely express Java constructs.

public void f(int, int)
{
    Example this;
    int i, c, $i1, $i2;
    this := @this;
    i := @parameter0;
    c := @parameter1;
X:  $i1 = this.x * 10;
    this.x = $i1;
    this.g($i1);
    goto label1;
label0:
    $i2 = i;
    i = i + 1;
Y:  this.a[$i2] = new java.lang.Object();
label1:
Z:  if i * 2 < 10 goto label0;
    return;
}
{   word this, i, c, $i2;
    this := @this: Example;
    i := @parameter0: int;
    c := @parameter1: int;
    load.r this;
    fieldget <Example: int x>;
    push 10;
    mul.i;
    store.i c;
    load.r this;
    load.i c;
    fieldput <Example: int x>;
    load.r this;
    load.i c;
    virtualinvoke <Example: void g(int)>;
    goto label1;
label0:
    load.r this;
    fieldget <Example: java.lang.Object[] a>;
    load.i i;
    inc.i i 1;
    new java.lang.Object;
    dup1.r;
    specialinvoke <java.lang.Object: void <init>()>;
    arraywrite.r;
label1:
    load.i i;
    push 2;
    mul.i;
    push 10;
    ifcmplt.i label0;
    return;
}

(a) Grimp    (b) Baf generated from Grimp

Fig. 4. Generating Baf from Grimp
For instance, the embedded increment in a[i++] must be expressed with the incremented value on the left-hand side of an assignment statement, not as a side effect. To remedy this problem, we use some peephole optimizations in the code generator for Grimp.
For example, for the increment case, we search for Grimp patterns of the form:

s1: local = <lvalue>;
s2: <lvalue> = local/<lvalue> + 1;
s3: use(local)

and we ensure that the local defined in s1 has exactly two uses, and that the uses in s2 and s3 have exactly one definition. Given this situation, we emit code for only s3. However, during the generation of code for s3, when local is to be emitted, the code for the increment is emitted in its place. An example of this pattern occurs just after label0 in Figure 4(a).
This approach produces reasonably efficient bytecode. In some situations the peephole patterns fail and the complete original structure is not recovered. In these cases, the Baf approach usually performs better. See Section 4 for more details.
4 Experimental Results

Here we present the results of two experiments. The first experiment, discussed in Section 4.3, validates that we can pass class files through the framework, without optimizing the Jimple code, and produce class files that have the same performance as the original ones. In particular, this shows that our methods of converting from Jimple to stack-based bytecode are acceptable. The second experiment, discussed in Section 4.4, shows the effect of applying method inlining on Jimple code and demonstrates that optimizing Java bytecode is feasible and desirable.
4.1 Methodology

All experiments were performed on dual 400MHz Pentium II™ machines. Two operating systems were used, Debian GNU/Linux (kernel 2.2.8) and Windows NT 4.0 (service pack 5). Under GNU/Linux we ran experiments using three different configurations of the Blackdown Linux JDK 1.2, pre-release version 2.⁵ The configurations were: interpreter, Sun JIT, and a public beta version of Borland's JIT.⁶ Under Windows NT, two different configurations of Sun's JDK 1.2.2 were used: the JIT, and HotSpot (version 1.0.1).
Execution times were measured by running the benchmarks ten times, discarding the best and worst runs, and averaging the remaining eight. All executions were verified for correctness by comparing the output to the expected output.
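The timing methodology above amounts to a trimmed mean; a small sketch (the class and method names are ours, not part of the benchmark harness):

```java
import java.util.Arrays;

// Run ten times, drop the best and worst runs, average the remaining eight.
public class Timing {
    public static double trimmedMean(double[] runs) {
        if (runs.length != 10)
            throw new IllegalArgumentException("expected ten runs");
        double[] sorted = runs.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (int i = 1; i < 9; i++)  // skip sorted[0] and sorted[9]
            sum += sorted[i];
        return sum / 8;
    }
}
```

Discarding the extremes makes the reported time robust against one-off outliers such as a cold start or a background process stealing the CPU.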
4.2 Benchmarks and Baseline Times

The benchmarks used consist of seven of the eight standard benchmarks from the SPECjvm98 suite, plus three additional applications from our collection. See Figure 5. We discarded the mtrt benchmark from our set because it is essentially the same benchmark as raytrace. The soot-c benchmark is based on an older version of Soot, and is interesting because it is heavily object oriented; schroeder-s is an audio editing program which manipulates sound files; and jpat-p is a protein analysis tool.
Figure 5 also gives basic characteristics such as size, and running times on the five platforms. All of these benchmarks are real world applications of reasonable size. Running times are given in seconds for the Linux interpreter as the base time, and all the fractional execution times are with respect to this base.

⁵ http://www.blackdown.org
⁶ http://www.borland.com
A dash given for a running time indicates that the benchmark failed validity checks. In all these cases, the virtual machine is to blame, as the programs run correctly with the interpreter with the verifier explicitly turned on. Arithmetic averages and standard deviations are also given, and these automatically exclude those running times which are not valid.
For this set of benchmarks, we can draw the following observations. The Linux JIT is about twice as fast as the interpreter, but this varies widely depending on the benchmark. For example, with compress it is more than six times faster, but for a benchmark like schroeder-s it is only 56% faster. The NT virtual machines also tend to be twice as fast as the Linux JIT. Furthermore, the performance of the HotSpot performance engine seems to be, on average, not that different from the standard Sun JIT. Perhaps this is because the benchmarks are not long running server side applications (i.e. not the kinds of applications for which HotSpot was designed).
4.3 Straight through Soot

Figure 6 compares the effect of processing applications with Soot with Baf and Grimp, without performing any optimizations. Fractional execution times are given, and these are with respect to the original execution time of the benchmark for a given platform. The ideal result is 1.00, which means that the same performance is obtained as with the original application. For javac the ratio is .98, which indicates that javac's execution time has been reduced by 2%. raytrace has a ratio of 1.02, which indicates that it was made slightly slower; its execution time has been increased by 2%. The ideal arithmetic average for these tables is 1.00, because we are trying simply to reproduce the program as is. The ideal standard deviation is 0, which would indicate that the transformation has a consistent effect, and the results do not deviate from 1.00.
On average, using Baf tends to reproduce the original execution time. Its average is lower than Grimp's, and the standard deviation is lower as well. For the faster virtual machines (the ones on NT), this difference disappears. The main disadvantage of Grimp is that it can produce a noticeable slowdown for benchmarks like compress which have tight loops on Java statements containing side effects, which it does not always catch.
Both techniques have similar running times, but implementing Grimp and its aggregation is conceptually simpler. In terms of code generation for Java virtual machines, we believe that if one is interested in generating code for slow VMs, then the Baf-like approach is best. For fast VMs, or if one desires a simpler compiler implementation, then Grimp is more suitable.
We have also measured the size of the bytecode before and after processing with Soot. The sizes are very similar, with the code after processing sometimes slightly larger and sometimes slightly smaller.

4.4 Inlining with Class Hierarchy Analysis

We have investigated the feasibility of optimizing Java bytecode with Soot by implementing method inlining. Although inlining is a whole program optimization, Soot is also suitable for optimizations applicable to partial programs. Our approach to inlining is simple. We build an invoke graph using class hierarchy analysis[8] and inline method calls whenever they resolve to one method. Our inliner is a bottom-up inliner, and attempts to inline all call sites subject to the following restrictions: (1) the method to be inlined must contain fewer than 20 Jimple statements, (2) no method may contain more than 5000 Jimple statements, and (3) no method may have its size increased by more than a factor of 3.
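The three restrictions can be sketched as a single predicate. The thresholds are the ones stated above, while the class name, parameters, and the bookkeeping of the caller's original size are our assumptions about how such an inliner might be structured:

```java
// Hypothetical sketch of the inlining restrictions, sizes counted in
// Jimple statements.
public class InlineHeuristic {
    static final int MAX_INLINEE_SIZE = 20;     // restriction (1)
    static final int MAX_METHOD_SIZE = 5000;    // restriction (2)
    static final int MAX_EXPANSION_FACTOR = 3;  // restriction (3)

    // calleeSize: size of the method to inline; callerSize: the caller's
    // current size; originalCallerSize: its size before any inlining
    // (assumed to be tracked by the inliner across repeated inlinings).
    public static boolean mayInline(int calleeSize, int callerSize,
                                    int originalCallerSize) {
        return calleeSize < MAX_INLINEE_SIZE
                && callerSize + calleeSize <= MAX_METHOD_SIZE
                && callerSize + calleeSize
                        <= originalCallerSize * MAX_EXPANSION_FACTOR;
    }
}
```

For example, a 10-statement callee may be inlined into a 100-statement caller, but a 25-statement callee is rejected by restriction (1), and repeated inlining stops once the caller reaches three times its original size.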
After inlining, the following traditional intraprocedural optimizations are performed to maximize the benefit from inlining: copy propagation, constant propagation and folding, conditional and unconditional branch folding, dead assignment elimination and unreachable code elimination. These are described in [2].
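As a sketch of one of these clean-up passes, constant propagation and folding over straight-line code can be modeled with an environment of known constants. This is a hypothetical toy handling only integer constants and additions, not Soot's implementation:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: statements are either {target, constant} or
// {target, lhs, rhs} (meaning target = lhs + rhs). Known constants are
// propagated and additions over two known constants are folded.
public class ConstProp {
    public static Map<String, Integer> run(List<String[]> stmts) {
        Map<String, Integer> env = new HashMap<>();
        for (String[] s : stmts) {
            if (s.length == 2) {
                env.put(s[0], Integer.parseInt(s[1])); // target = constant
            } else {
                Integer a = env.get(s[1]), b = env.get(s[2]);
                if (a != null && b != null) env.put(s[0], a + b); // fold
                else env.remove(s[0]);   // target no longer a known constant
            }
        }
        return env;
    }
}
```

After inlining exposes constants across former call boundaries, a pass like this (plus branch folding and dead code elimination) removes the residue the inliner leaves behind.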
Figure 7 gives the result of performing this optimization. The numbers presented are fractional execution times with respect to the original execution time of the benchmark for a given platform. For the Linux virtual machines, we obtain a significant improvement in speed. In particular, for the Linux Sun JIT, the average ratio is .92, which indicates that the average running time is reduced by 8%. For raytrace, the results are quite significant, as we obtain a ratio of .62, a reduction of 38%.
For the virtual machines under NT, the average is 1.00 or 1.01, but a number of benchmarks experience a significant improvement. One benchmark, javac under the Sun JIT, experiences significant degradation. However, under the same JIT, raytrace yields a ratio of .89, and under HotSpot, javac, jack and mpegaudio yield some improvements. Given that HotSpot itself performs dynamic inlining, this indicates that our static inlining heuristics sometimes capture opportunities that HotSpot does not. Our heuristics for inlining were tuned for the Linux VMs, and future experimentation could produce values which are better suited for the NT virtual machines.
These results are highly encouraging, as they strongly suggest that a significant amount of improvement can be achieved by performing aggressive optimizations which are not performed by the virtual machines.
5 Related Work

Related work falls into five different categories:

Java bytecode optimizers: There are only two Java tools of which we are aware that perform significant optimizations on bytecode and produce new class files: Cream[3] and Jax[23]. Cream performs optimizations such as loop invariant removal and common sub-expression elimination using a simple side effect analysis. Only extremely small speed-ups (1% to 3%) are reported.
             Stmts  Sun Int.   Sun   Bor.   Sun   Sun
                     (secs)    JIT   JIT    JIT   Hot.
compress      7322   440.30    .15   .14    .06   .07
db            7293   259.09    .56   .58    .26   .14
jack         16792   151.39    .43   .32    .15   .16
javac        31054   137.78    .52   .42    .24   .33
jess         17488   109.75    .45   .32    .21   .12
jpat-p        1622    47.94   1.01   .96    .90   .80
mpegaudio    19585   368.10    .15     -    .07   .10
raytrace     10037   121.99    .45   .23    .16   .12
schroeder-s   9713    48.51    .64   .62    .19   .12
soot-c       42107    85.69    .58   .45    .29   .53
average                        .49   .45    .25   .25
std.dev.                       .23   .23    .23   .23

Fig. 5. Benchmarks and their characteristics.
                          Baf                         Grimp
                  Linux          NT           Linux          NT
             Sun  Sun  Bor.  Sun  Sun    Sun  Sun  Bor.  Sun  Sun
             Int. JIT  JIT   JIT  Hot.   Int. JIT  JIT   JIT  Hot.
compress    1.01 1.00  .99   .99 1.00   1.07 1.02 1.04  1.00 1.01
db           .99 1.01 1.00  1.00 1.00   1.01 1.05 1.01  1.01 1.02
jack        1.00 1.00 1.00     - 1.00   1.01  .99 1.00     - 1.00
javac       1.00  .98 1.00  1.00  .97    .99 1.03 1.00  1.00  .95
jess        1.02 1.01 1.04   .99 1.01   1.01 1.02 1.04   .97 1.00
jpat-p      1.00  .99 1.00  1.00 1.00    .99 1.01 1.01  1.00 1.00
mpegaudio   1.05 1.00    -     - 1.00   1.03 1.00    -     - 1.01
raytrace    1.00 1.02 1.00   .99 1.00   1.01 1.00  .99   .99 1.00
schroeder-s  .97 1.01    -  1.03 1.01    .98  .99    -  1.03 1.00
soot-c       .99 1.00 1.02   .99 1.03   1.00 1.01 1.00  1.01 1.01
average     1.00 1.00 1.01  1.00 1.00   1.01 1.01 1.01  1.00 1.00
std.dev.     .02  .01  .01   .01  .01    .02  .02  .02   .02  .02

Fig. 6. The effect of processing class files with Soot using Baf or Grimp, without optimization.
The main goal of Jax is application extraction: for example, unused methods and fields are removed and the class hierarchy is compressed. They are also interested in speed optimizations, but at this time their published speedup results are extremely limited.
Bytecode manipulation tools: There are a number of Java tools which provide frameworks for manipulating bytecode: JTrek[13], Joie[4], Bit[14] and JavaClass[12]. These tools are constrained to manipulating Java bytecode in its original form, however. They do not provide convenient intermediate representations such as Baf, Jimple or Grimp for performing analyses or transformations.
             Sun   Sun  Bor.   Sun   Sun
             Int.  JIT  JIT    JIT   Hot.
compress    1.01   .78 1.00   1.01   .99
db           .99  1.01 1.00   1.00  1.00
jack        1.00   .98  .99      -   .97
javac        .97   .96  .97   1.11   .93
jess         .93   .93 1.01    .99  1.00
jpat-p       .99   .99 1.00   1.00  1.00
mpegaudio   1.04   .96    -      -   .97
raytrace     .76   .62  .74    .89  1.01
schroeder-s  .97  1.00  .97   1.02  1.06
soot-c       .94   .94  .96   1.03  1.05
average      .96   .92  .96   1.01  1.00
std.dev.     .07   .12  .08    .06   .04

Fig. 7. The effect of inlining with class hierarchy analysis.
Java application packagers: Tools such as DashOPro[6] and SourceGuard[21] re-package Java applications; typically the packaging consists of code compression and/or code obfuscation. Although we have not yet applied Soot to this application area, we have plans to implement this functionality as well.
Java native compilers: The tools in this category take Java applications and compile them to native executables. These are related because they are all forced to build 3-address code intermediate representations, and some perform significant optimizations. The simplest of these is Toba[18], which produces unoptimized C code and relies on GCC to produce the native code. Slightly more sophisticated, Harissa[17] also produces C code but performs some method devirtualization and inlining first. The most sophisticated systems are Vortex[7] and Marmot[10]. Vortex is a native compiler for Cecil, C++ and Java, and contains a complete set of optimizations. Marmot is also a complete Java optimizing compiler and is SSA based. There are also numerous commercial Java native compilers, such as the IBM® High Performance Compiler for Java, Tower Technology's TowerJ[24], and SuperCede[22], but they have very little published information.
Stack code optimization: Some research has previously been done in the field of optimizing stack code. The work presented in [15] is related, but optimizes the stack code based on the assumption that stack operations are cheaper than manipulating locals. This is clearly not the case for the Java Virtual Machines we are interested in. On the other hand, some closely related work has been done on optimizing naive Java bytecode at the University of Maryland[19]. Their technique is similar, but they present results for only toy benchmarks.
6 Conclusions and Future Work

We have presented Soot, a framework for optimizing Java bytecode. Soot consists of three intermediate representations (Baf, Jimple & Grimp) and transformations between these representations. In this paper we focused on the mechanisms for translating Java stack-based bytecode to our typed three-address representation Jimple, and on the translation of Jimple back to bytecode. Jimple was designed to make the implementation of compiler analyses and transformations simple. Our experimental results show that we can perform the conversions without losing performance, and so we can effectively optimize at the Jimple level. We demonstrated the effectiveness of a set of intraprocedural transformations and inlining on five different Java Virtual Machine implementations.
We are encouraged by our results so far, and we have found that the Soot APIs have been effective for a variety of tasks including the optimizations presented in this paper.
We, and other research groups, are actively engaged in further work on Soot on many fronts. Our group is currently focusing on new techniques for virtual method resolution, pointer analyses, side-effect analyses, and various transformations that can take advantage of accurate side-effect analysis.
Acknowledgements

This research was supported by IBM's Centre for Advanced Studies (CAS), NSERC and FCAR.
References

1. Ali-Reza Adl-Tabatabai, Michal Cierniak, Guei-Yuan Lueh, Vishesh M. Parikh, and James M. Stichnoth. Fast and effective code generation in a just-in-time Java compiler. ACM SIGPLAN Notices, 33(5):280-290, May 1998.
2. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques and Tools. Addison-Wesley, 1986.
3. Lars R. Clausen. A Java bytecode optimizer using side-effect analysis. Concurrency: Practice & Experience, 9(11):1031-1045, November 1997.
4. Geoff A. Cohen, Jeffrey S. Chase, and David L. Kaminsky. Automatic program transformation with JOIE. In Proceedings of the USENIX 1998 Annual Technical Conference, pages 167-178, Berkeley, USA, June 15-19 1998. USENIX Association.
5. Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark K. Wegman, and F. Kenneth Zadeck. An efficient method of computing static single assignment form. In 16th Annual ACM Symposium on Principles of Programming Languages, pages 25-35, 1989.
6. DashOPro. http://www.preemptive.com/products.html.
7. Jeffrey Dean, Greg DeFouw, David Grove, Vassily Litvinov, and Craig Chambers. VORTEX: An optimizing compiler for object-oriented languages. In Proceedings OOPSLA '96, Conference on Object-Oriented Programming Systems, Languages, and Applications, volume 31 of ACM SIGPLAN Notices, pages 83-100. ACM, October 1996.
8. Jeffrey Dean, David Grove, and Craig Chambers. Optimization of object-oriented programs using static class hierarchy analysis. In Walter G. Olthoff, editor, ECOOP'95 - Object-Oriented Programming, 9th European Conference, volume 952 of Lecture Notes in Computer Science. Springer, 1995.
9. Etienne Gagnon and Laurie Hendren. Intra-procedural Inference of Static Types for Java Bytecode. Sable Technical Report 1999-1, Sable Research Group, McGill University, March 1999.
10. Robert Fitzgerald, Todd B. Knoblock, Erik Ruf, Bjarne Steensgaard, and David Tarditi. Marmot: an Optimizing Compiler for Java. Microsoft technical report, Microsoft Research, October 1998.
11. Jasmin: A Java Assembler Interface. http://www.cat.nyu.edu/meyer/jasmin/.
12. JavaClass. http://www.inf.fu-berlin.de/~dahm/JavaClass/.
13. Compaq JTrek. http://www.digital.com/java/download/jtrek.
14. Han Bok Lee and Benjamin G. Zorn. A Tool for Instrumenting Java Bytecodes. In The USENIX Symposium on Internet Technologies and Systems, pages 73-82, 1997.
15. Martin Maierhofer and M. Anton Ertl. Local stack allocation. In Kai Koskimies, editor, Compiler Construction (CC'98), pages 189-203, Lisbon, 1998. Springer LNCS 1383.
16. Steven S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann, 1997.
17. Gilles Muller, Bárbara Moura, Fabrice Bellard, and Charles Consel. Harissa: A flexible and efficient Java environment mixing bytecode and compiled code. In Proceedings of the 3rd Conference on Object-Oriented Technologies and Systems, pages 1-20, Berkeley, June 16-20 1997. Usenix Association.
18. Todd A. Proebsting, Gregg Townsend, Patrick Bridges, John H. Hartman, Tim Newsham, and Scott A. Watterson. Toba: Java for applications: A way ahead of time (WAT) compiler. In Proceedings of the 3rd Conference on Object-Oriented Technologies and Systems, pages 41-54, Berkeley, June 16-20 1997. Usenix Association.
19. Tatiana Shpeisman and Mustafa Tikir. Generating Efficient Stack Code for Java. Technical report, University of Maryland, 1999.
20. Soot - a Java Optimization Framework. http://www.sable.mcgill.ca/soot/.
21. 4thpass SourceGuard. http://www.4thpass.com/sourceguard/.
22. SuperCede, Inc. SuperCede for Java. http://www.supercede.com/.
23. Frank Tip, Chris Laffra, Peter F. Sweeney, and David Streeter. Practical Experience with an Application Extractor for Java. IBM Research Report RC21451, IBM Research, 1999.
24. Tower Technology. TowerJ. http://www.twr.com/.
25. Raja Vallée-Rai and Laurie J. Hendren. Jimple: Simplifying Java Bytecode for Analyses and Transformations. Sable Technical Report 1998-4, Sable Research Group, McGill University, 1998.