l"i!
!=;
·�
�
�
Fi
F"'
p-
<:!:l
t"'
v
Digital Technical Journal
lillD 0
- -
��.!!. P1 � 77
- 1/_'\._
.._;s�r64_Q R.20 v S1r64 L Rl.2v
I
FX!32 EMULATION AND TRANSLATION VISUAL FORTRAN
MEMORY CHANNEL 2 INTERCONNECT OBJECTBROKER SECURITY
STRONGARM MICROPROCESSOR
Volume 9 Number I 1997
Editorial
Jane C. Blake, Managing Editor Kathleen M. Stetson, Editor Helen L. Patterson, Editor Circulation
Catherine M. Phillips, Administrator Production
Christa W. Jessica, Production Editor Elizabeth McGrail, Typographer Peter R. Woodbury, Illustrator Advisory Board
Samuel H. Fuller, Chairman Donald Z. Harbert Richard J. Hollingsworth William A. Laing Richard F. Lary Alan G. Nemeth Robert M. Supnik
Cover Design
The display of program code in the fore
ground and the background of our cover represents one of the unique aspects of the DIGITAL FX'32 software, the opening topic in this issue. By emulating an appli
cation in the foreground and later translat
ing the execution profile into native Alpha code in the background, FX!32 enables 32-bit applications that run on lntel-based machines to also run on Alpha-based machines. The combination of emulation and binary translation provides Alpha users with additional applications and good per
formance with transparent operation.
The cover design is by Lucinda O'Neill of the DIGITAL Industrial and Graphic Design Group.
The Digilal Tecbn.ical.foumalis a refereed journal published quarterly by Digital Equipment Corporation, 50 Nagog Park, AK02-3/B3, Acton, MA 01720-9843.
Hard-copy subscriptions can be ordered by sending a check in U.S. funds (made payable to Digital Equipment Corporation) to the published-by address. General subscription rates arc $40.00 (non-U.S. $60) for tour issues and $75.00 (non-U.S. $!15) for eight issues.
University and college professors and Ph.D.
srudents in the electrical engineering and com
puter science fields receive complimentary sub
scriptions upon request. DIGITAL customers may qualifY tor gifT subscriptions and arc encour
aged to contact their account representatives.
Electronic subscriptions arc available at no charge by accessing URL
http://www.digital.com/infojsubscription.
This service will send an electronic mail notification when a new issue is available on the Internet.
Single copies and back issues c,m be ordered by sending the requested issue's volume and number and a check for $16.00 (non-U.S.
$18) each to the published-by address. Recent issues arc also available on the lmernct at http://www.digital.com/i nfo/dtj.
DIGITAL employees may order subscrip
tions through Readers Choice at U RL http://wcb7-c.das.dcc.com or by entering VfX PROFILE at the Open VMS system prompt.
Inquiries, address changes, and compli
mentary subscription orders can be sent to the Digital 'f'echnicaljourna!at the published-by address or the electronic mail address, drj@digital.com. Inquiries can also be made by calling the.fourua!
office at 508-264-7549.
Comments on the content of any paper and requests to contact authors Jrc welcomed and may be sent to the managing editor at the published-by or electronic mail address.
Copyright© 1997 Digital Equipment Corporation. Copying without fcc is per
mitted provided that such copies arc made for usc in educational institutions by fcKulry members and arc not distributed for com
mercial advantage. Abstre1cting with credir of Digital Equipment Corporation's author
ship is permitted.
The information in theJourualis subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation or by the compan
ies herein represented. Digital Equipment Corporation assumes no responsibility tor anv errors that may appear in the .fou rua!.
ISSN 0898-901 X
Documentation Number EC-N7963-18 Book production was done by Quantic Communications, Inc.
The following arc trademarks of Digital Equipment Corporation: AlphaServcr, DEC, DIGITAL, the DIGITAL logo, DIGITAL UNIX, Open VMS, and TruCiuster.
ARM and StrongAJ{J\11 arc registered trademarks of Advanced RISC Machines Ltd.
BEA ObjectBroker is a registered trademark of B EA Systems, Inc.
Bull is a registered trademark of Bull Worldwide Information Systems.
Cray is a registered trademark of Cray Research, Inc.
Encore is a registered trademark and MEMORY CHANNEL is a trademark of Encore Computer Corporation.
Gradient is a registered trademark of Gradient Technologies, Inc.
HAL is a registered trademark of HAL Com purer Systems, lnc.
Hitachi is a registered trademark of Hitachi, Ltd.
H P is a registered trademark of Hewlett-Packard Company.
IBM and SP2 arc registered trademarks and Power PC and Power PC 603 arc trademarks of I ntcrnational Business Machines Corporation.
Intel and Pentium arc registered trademarks of Intel Corporation.
Kcrbcros is a trademark of Massachusetts Institute of Technology.
Lucellt Technologies is a trademark of Lucent Technologies.
Microsoft, Visual Basic, Visual C++, Win32, Windows, and Windows NT arc registered trademarks and Active X and Visual J++ are trademarks of Microsoft Corporation.
MOTIVE is a registered trademark of Quad Design Technologies, Inc.
NCR is a registered trademark of NCR Corporation.
NEC is a registered trademark of NEC Corporation.
Object Managemem and OMG arc registered trademarks and COREA is a trademark ofrhe Object Management Group.
Olivetti is a registered trademark of In g. C.
Olivetti.
Oracle Parallel Server is a trademark of Oracle Corporation.
PAL is a regisrercd trademark of Advanced Micro Devices, 1 nc.
Photoshop is a trademark of Adobe Systems, Incorporated.
POSIX is a registered trademark of the lnstirute of Electrical and Electronics Engineers.
SCO is a registered trademark of The Santa Cruz Operation, Inc.
Siemens Pyramid is a registered trademark of Siemens P):ramid Information Sysrcms, Inc.
Stratus is a registered trademark of Stratus Computer, Inc.
Synopsys is a registered trademark of Synopsys, Inc.
Tandem is a registered trademark of Tandem Computers Incorporated.
The Open Group is a trademark of rhc Open Software Foundation, Inc. and X/Opcn Company Ltd.
TPC-C is a registered trademark of the Transaction Processing Performance Council.
Transarc is a registered trademark of Transarc Corporation.
UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Companv Lrd.
Veri log is a registered trademark of Cadence Design Systems, Inc.
Viewlogic is a registered trademark and VCS is a trademark of Viewlogic Systems, Inc.
Contents
DIGITAL FX!32: Combining Emulation and
ltwm
ondJ.
Hookw;ll' �nd JVI<lrk A. Her,kg Binary TranslationDevelopment of the Fortran Module Wizard Leo P. Trcggi,lri within DIGITAL Visual Fortran
Architecture and Implementation of MEMORY CHANNEL 2 Marco fillo and Richard R. Cillcrr
Integrating ObjectBroker and DCE Security John H. p,,rodi ;md fred W. BurgiKt·
A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor
);uncs
Montan�ro, RichardT. Wirck,Krishna Anne, Andrc11 J. Bbck, r.liz,lbcth M. Cooper, Daniel W. Dobbeqmhl, !'�til M. Don.1hue, Jim Fno, Crcgory \V. Hoeppner, Da1·id Kruckcnw<:r, Thonus H. L<:c, Peter C. M. Lin, Li;H111Vladden,
Daniel Murray, Mark H. Pearce, Srib;llan S:mth;lll;1111, Kathryn J. Snvder, R�1· Stephanv, and
Stephen C. Thicrauf
13
27
42
49
Digital Technical )ound Vol. 9 No. l 1997
2
Editor's
Introduction
No matter holl' powcrflli the under
lving lurdw�1re, most important to users is how that power translates to greater application perr(xmance �111d availability. Among the di1·erse topics in this issue of'thcjourna/ an: inno
vative ways engineers have devised to meet application pert(mnance and
�wailability requiremcms, and new
rools
f(Jr applications developers.DIGITAL FX132 is a unique soft
ware product tiMt makes available huudreds ofa
p
plic1tio
ns written t()r Intel machi
nes to users ofAlplu 111<Khines. Described bv Ray Hookll'<ll' and tVLlrk Hcrdeg, FX132 combines soft11·:�re crnuhtion and �1dvanccd bin<lrl' tmnslation techniques to emble 32-bit applications that run on !melbased nuchines ll'ith Windows
NT
ro also run on 64-bit IUSC Alpha
based machines with VVindows NT The design provides both the pertor
m:�nce benefits and the r.mnsp:�rency ofoperJtion that
the project
engineering team s
o
ught t(Jr users./\lso designed for the Windoii'S environment is DIGITAL Visu�11 Fortran,�� tool f(x
F
onran dei'\:Jopcrs that combines technologies ti·om DIGITAl, and Microsotr Corporation. Leo Treggiari reviell's the tool's components, which include the Component Object Model (COM), Fortran 90, and Microsoft Developer Studio. 1:-le addresses the question of whv developers need help accessing dvnamic link libraries and servers based on COM, and then f(xuses on the newlv cre<lted tool that providt:s this f11nctionalitv, the Fortran Module Wizard.
Dig;ir�l Tcchnic1l Journal
DIGITAL's shared-memorv cluster interW1111ect, MEt'vlORY CHANNEL 2, delivers the high levels oh:ompu
utionaJ
pertonnance nccess�1rv ro su1)porr the l<�rgest rechnic1l and con1men:ial :�pplications. Marco Fillo and Rick Cillcrt :�ssess experiences ll'ith the first implemcnt�Hion of MEMORY CHAJ\:1\:EI, tlw led to such enhanccmcms as the cross-bar design in this latest implememation.They conclude ll'ith ped(>n11�11Ke cbra that denmnstLite u
n
p�1ralleled perf(>nlHncc in terms ofl,ltency and b�111d11·idth comp�H'Cll ll'ith traditional inrm:onnccts. MEMORYCHA.'\JNEI,
2 provides latency of less than 2.2 microseconds and banclll'idrh of 1 ,000 megabytes per second in an 8-node duster.
n�,t�l sccurirv has long been impor
tant to svstem managers but nor easily achieved in distributed heter
o
geneous svsten1s. - l) I G ITA L �md B EA Svstems 'luve imegrated Object Broker middle
ll'are ll'ith the Distributed Computing Environment's Genuic Sccuritv Service Appl ic1tion Programming lnterbce (CSS-APl), �1s described here bv John Parodi and Fred Burgher. The authors cx�1111ine rhe choice ofGSS-API f()r OhjecrBrokcr �md future directions in authentication sofiw�1re.
Design decisions made in rhe de,·el
opment of DIGITAL's StrongAR.i\1 microprocessor ll'ere driven by the son1et.imcs
o
pposing requirements of high pcrf(mnancc and l011' poll'er consumption. T1rgeted f(Jr usc in h<llld held appliances usldlv powered by wnvention�11 batteries, Strongf\RJ\11 otters signifiuntlv higher perf(mnanceVol.') :-:o. I llJlJ7
rhan compar�1ble
microprocessors:
It operates ar 160 Ml:-lz, dissipating less than 450 millill'atrs. James Montanaro,Rich Witek et al. step through the decisions designers made to imple
ment the ARM V4 instruction set ri·om Advanced !USC i\lbchines Ltd.
Upcoming in the next i
s
sue of the Journal :11-c technical p3pers about nell' Alta Vist:� software and a new VVindOII'S NT personal workstJtion basecl on an Alpha 64-bit
!USC processor. To view the results ob recent survcv sent
rojournal
Web subscribers, see http:/
jwww.
digital .c
o
mj
into/dtj.Jane C. Blake
Ma J/(/ging hail
orDIGITAL FX!32:
Combining Emulation and Binary Translation
The DIGITAL FX!32 software product uniquely combines emulation and binary translation to enable any 32-bit application that executes on an Intel x86 microprocessor running the Windows NT 4.0 operating system to be installed and to execute on an Alpha m icroprocessor run
ning Windows NT 4.0. Benchmark tests indicate that after translation, x86 applications run as fast on a 500-MHz A l pha system with DIGITAL FX!32 software installed as on a 200-MHz Pentium Pro system. The emulator and its associated run
time software provide transparent execution of applications written for x86-based platforms.
The emulator produces profile data that is used by the translator and takes advantage of trans
lation results as they become available. The translator provides native Alpha code for the portions of an x86 application that have previ
ously been executed. A server manages the translation process for the user, making the process completely transparent.
I
Raymond J. Hook:way Mat·k
A.Herdeg
Three
factors contribute to the success of a microprocessor: price, performance, and software
availability.
Th
e
DIGITAL FX132 product add resses the third factor,
softwareavailability,
by making hundreds of new applications available on Alpha-based platforms running the Windows NT operating system. DIGITAL FX132 software combines emulation and binary trans
lation to pro
v
ide
fast, transparent execution of Intelx86 applications
on Alpha systems.Since its introd uction in 1992, the Alpha micro
processor has been the f1stcst m icroprocessor available. A l arge number of native applications are available on Alpha systems, particularly those applica
tions that require
a
high-pertormance processor.With
the introduction of DIGITAL t:X!32 software, 32-bit programs that can
be
installed and executed onx86
systems running the Windows NT 4.0
operating
system
can also be installed
and executed on Alpha systems running Window NT 4.0. Except
tor
having to speci�' that a program is anx86
application, i nstalling and ru
nni
ngan
app
lic
ati
on is the same on an Alpha system as on anx86
system . The performance of anx86
app
li
cati
on runn i ng on a h igh-end Alpha system is similar to the performanceof
the same application ru
nni
ng on a high-endx86
system.A
n
umber
of systemshave successfully
used emulators to run appl ications on platfC.>rms tor which tbe applications were not initially targeted.U The
major
drawback has been poor
pertormance.2
Several emu lators have used dynamic translation, translating small segments of a program as it is executed, to achieve better pertormancc than that
obtained
by an inteqxcter alone.'-' Dynamic translation involves a basic trade-off between the amount of time spent translating and the resulting benefit of thetranslation.
I fan
emulator spends too m uch time on the translationand
related processing, the executing programwill
be u nresponsive. This limits the optimizations that can be performed by the emulator using dynamic translation .
FX! 32 overcomes the performance problem
b
y not doing any translation while the application is executing. Rather, FX132 captures an execution profile that is later used by a binary translator5 to translate into native Alpha code those parts of the application
that
havebeen
executed . Since the translator runs in the back- Digital Technical Journal Vol. 9 No. I 1997 34
ground, it can use computationally intensive algo
rithms to improve the quality of the generated code.
To our knowledge, fX!32 is the first system to explore this combination of emulation and binary translation.
In this paper, we describe how FXI 32 works. We begin with an overview <H1d discuss each ofthe major compo
nents in more detail. \Ve then present some benchmark test
r
esults andb1idly
describe several limitations of the current version ofDlGITAL FX132 somvare.Overview
On Alpha systems, the Windows NT operating system uses an cmul:1tor to run 16-bit x86 applications. These applications
can
be installed and ru11 in the same way as they are installed and run on x86 systems, but the execution is slower. The emulator built into FX!32 pro
vides a similar capability for 32-bit x86 applications.
Unlike the emulation software in the 16-bit envi
ronment, t-:X132 provides a binary translator that translates 32-bit x86 applications into native AJph:1 code. The translation is done in the b:1ckground and requires no user interaction. Using background trans
lation allows the translator to
pert(mn
optimizations that, in terms of computational resources, would be too expensive to accomplish while an application is running. An application translated by means oft-:X132 runs up to 10 times faster than the same application running under the emulator.DJ(;JTAL FX!32 software consists of the t()llowing seven major components:
1. The transparency agent, which provides t(lr trans
parent launching of 32-bit x86 applications.
2. The runtime, 1vhich loads x86 images and sets up the run-time environment to execute them. As part
of loading an image, the runtime component jack
ets imported application programming interrace
(API)
routines. Jackets are small code fragments that allow the x86 code to call Alpha Windows NT API routines.3. The emulator, which runs an x86 application mak
ing usc of translated code when it is available.
4. The translator, which produces a translated image using profile inb·mation received ti.·om
the
emulator.5.
The lbtabase, which stores execution profiles produced by the emulator and used by the translator.
Translated images arc ::�lso stored in the database, along with configuration infi:m11ation.
6. The server, which maintains the database and runs the transiJtor JS appropriate.
7. The manager, which allows the user to control resources used by the DI G ITAL FX132 somvare.
figure
l
shows the relationships between these major components, each of which is discussed in more detail in the sections that t()IIOiv.The Transparency Agent
The transparency agent provides t(>r transparent launching
of
32-bit x86 applications. LaunciJing an :�pplication on the VVindows NToperating
system always results in a call to the Create Process API routine.By hooking calls to this routine, the transparency agent can examine every image as it is about to be executed.
If a call to Create Process specifies that an x86 image is to be executed, the transparency agent invokes the run
time component to execute the image.
FX!32 inserts the transparency agent into the address space of each process. A process that comains the trans-
TRANSPARENCY AGENT
RUNTIME AND EMULATOR
TRANSLATED IMAGES
EXECUTION PROFILES
DATABASE
<REGISTRY>
SERVER
BINARY TRANSLATOR
Figure 1
DIGITAL FX'32
Svstcm
ComponentsDigital Tcdmical journc1l Vol. 9 No. l 1Y97
pan:ncy agent is said to be enabled. Once a process is enabled, any attempt to execute an x86 image causes the rumime to be invoked to execute the process. The agent is propagated through the system because each attempt to create a process to run an Alpha image results in that created process being en�1bled.
By the rime a user is logged on, fX132 has enabled all the top-level processes, and any attempt to execute
a 32-bit x86 application invokes the runtime compo
nent. The initial processes that are enabled are the Windows shell ( explorcr.exc ), the service control man
ager (serviccs.exe), and the remote procedure call server ( rpcss.exe ). When fX! 32 is installed, the fx32strt.exe file is registered as the Windows shell.
When a user logs on, fx32srn.exe runs and enables the real Windows shell, explorer.exe. The FX132 server enables the service control manager when it starts, usually when the system is booted. Currently, any ser
vice process that is started by the service control man
ager bdc)re the server is started is not enabled. (The only exception is rpcss.cxe, which is explicitly enabled by the server). \Ne hope to alleviate this limitation in a future version of the DIGITAL FX!32 software.
Processes arc enabled using a technique described by Jdhcy Richter in Chapter 16 of his book
Aduanced
\Vinduu•s NT" to inject a copy of the transparency :�gent into the process' address space.
The Runtime
The transparency agent invokes the runtime whenever an attempt is m:�de to execute an x86 image. The runtime loads the image into memory, sets up the run
time environment required by the emulator, and then calls the enur!Jtor ro execute the image.
The runtime replaces the Windows NT loader, which can only load Alpha images; the Windows NT loader returns :�n error reporting an image of the wrong architecture if it is invoked to load an x86 image. The runtjmc duplicates the hmctionality of rhe Windows NT IO<lder, which includes rcloc:Jting images that arc not loaded at their preferred base address, set
ting up shared sections, and processing static thread local storage sections.
The runtime registers each image it processes with the Windows NT operating system by inserting point
ers to that inugc into v:�rious lists that are used inter
nally by the system. Maintaining these lists allows the n:�tive \Vindmvs NT code to correctly implement API routines, such as Load Resource and Gen\1odulcHandlc, which require access ro images that have been loaded.
The registration also ensures that the DllJ\ilain tunc
tions of the loaded dynamic link libraries
(
DLLs) are called as �1ppropriarc. (The entry points of x86 DLLs�1rc jacketed b)' the runtime.)
Fortunately, the inuge lists that fX132 must modit)' arc in the user's address space, and no modification of
the vVindows NT operating system was required to register images with the system. Unr()rtunatcly, the structure of these lists is not part of the dowmcnrcd Win32 interface, and using them creates a dependency on the Windows NT version that is being run. fX!32 has dependencies on a number of undocumented ka
turcs of the Windows NT operating system. Although the DIGITAL FX!32 product is more dependent on a particular version of t he opcr:�ting system than a typi
cal layered application is, it is rcmarbble rhar the implementation of FX132 did not require :�ny changes to the Windows NT opcrJting system.
The runtime also registers the image in the FX132 database. This database maintains inbrmarion about x86 images that have been loaded, including the appli
cation that loaded the image, profile data that was pro
duced by the interpreter, and any translation of the image. The runtime accesses the database with a
u nique image identifier (ID), which the runtime obtains by
h
ash
ing the image's header. Thercti:Jrc, the image I D is determined by the content of the image, nor by its location in the file system, and the inhmnarion that FX132 associates with the image can be accessed independently of rbc image's location on the disk. For example, if an application is installed in one directory and some of the images lo:�ded bv the appli
cation arc subsequently translated by FX!32, the trans
lated images will be located by FX132 even if the application is later installed in a difkrenr directory.
When the runtime finds a rransi<Hcd image in the dat:�base, it loads this image along with the corre
sponding x86 image. Translated images arc normal DLLs, loaded by the n:�tivc Load LibraryAPl
routine.Translated images contain additional sections that store information required by the runtime to map x86 routines to the corresponding Alpha code.
The runr.ime duplicates the Windows NT loader function of binding an image's imports, using sym
bolic inti:m11arion in the image to locate the address of the imported routine or data. The runtime treats imports that refer to entries in Alpha images speci:�Jiy, however, by redirecting the imports to rekr to the correct jacket enrrv in the fX! 32 D LL, jacket.dll.
The jacket routines in jackcr.dll enable an x86 user program to call the native Alpha implementation of the Win
3
2 API. These jacket routines JIT extremely important because they allow x86 applications to use high-pcrt(xmance code that has been tu
ned to the Alpha platti:mn. Some x86 applications run faster on the Alpha plartcm11 than on the x86 platform, even without being translated, because of the large amount of time the applications spend in native DLLs.Each jacket contains an illegal x86 instruction that serves as a signal to the interpreter that <l change is to be ma
d
e to the Alpha environment. The interpreter calls an Alpha jacket routine at a rixcd offset ti·orn the illegal x86 instn1crion. The basic operation of mostDigir·.1l kclmic.1l )ourn;ll Vol. 9 No.1 1997 5
6
jacket routines is
to move
arguments tt·om the x86 stackto
the
appropriate Alpha registers, as dictated by the Alph a calling standard . Some j acket routi nes provide special semantics t()l" the native routine
be
in g called, as required by F X ! 3 2 . For example, the jacket tor the GetSystcmDirectory routine returns the path to the FX! 32 directory rather than the path to the true system directory so that x86applications
do notover
write native Alpha DLLs.
For an x86 application to run under F X 1 3 2 , every image it loads must be either an x86 image or an Alpha i mage t(Jr which jackets exisr. Therct(xe, f X 1 3 2 pro
vides jack
e
ts tor all the DLLs that implement the Wi n32 imertiKe and tor many redistributableD LLs.
F X I 32 currently provides jackets tor more than 50 native Alpha DLLs, which has enabled the FX! 3 2 devel
opment t
e
amto run
almost all the commercial applica
tions tested. Each new release of DIG ITAL FX132 sofrware prov
i
des additional jackets, and the developers i ntend to jacket new i nterbces as they are released.The Emulator
The fu ndamental job of the emulator is
to
run x86 applications betore they arctranslated . The
tirst time an x86 image executes u nder FXI 32, the image is executed by the em
u
lator.The emulator also
serves as a backup f
or translated code. Because it is not possibleto
statically d etermine all the code that can ever be executed by an app
lication (especial ly for applications that generate code on-rhetly) , the emulator is always present to execute such untranslated x86 a
pp
lication
code . Previous binary translators built by DIGITAL also depended on the presence of an emulator in this role.' Emu lator perf
or
mance is more of an issue for F X ! 32 becHISC, u n li ke those earlier binary translators, all application code is interpreted when the x86 application is first run.
The emulator is an Alpha assembly la
n
guage program that interprets the subset ofx86 instructions that can be executed by aW
in
32 application. Whilean
x86 application is run ning, the x86 processor state is kept partially in Alpha registers and partially in a per-thread data structure called the CONTEXT The x86 integer regis
ters arc permanently mapped to Alpha registers, and Alpha registers store the
stare
of the x86 conditioncodes. Whi.Je the
emulator is running,a
dedicated Alpha register pointsto
the CONTEXT The CONTEXT stores the x86 per-thread processor context and any part of the x86 processor state that must be maintain ed acrosscal ls
to otherparts
of the system,f(n
example, calls to Alpha API routines.Pipelined Dispatch
The structure of
the
emulatoris a
classic fetch-andeva! uat<::
loop . The
emulator dispatches on the tirs
ttwo
byresof
e�\Ch instruc6on , performing the lookup DigirJI Tcchnic1l )ourn<1l Vol. 9 N o . 1 1997in a table of
64
K entries.Each
entry contains rl1eaddress of
the routine to execute to i n terpret an i nstruction and the lengthof
the i nstruction .The structure
of
the d ispatch loop has been caretll
l
ly crafted to make eftlcient use of
64-bitAlpha
registers and to efficiently schedule the execution of code in the loop. Software pipe l i ni ng is used to overlap the retch and dis
p
Jtch
table lookup forthe next
i nstruction with the execution of the c urrent i nstruction.
At the top of the
l
oo
p, at least eigh
t bytes, starting at the add ressof
the current i nstruction, are inAlpha
registers. Length information from the dispatch table determines the tirst two
bytes of
the next i nstruction, allowing the dispatch table lookup to be overlappedwith
the execution of the current instruction . A tCtch of additional bytes from the i nstruction streJm is also initiated . Fin
al
ly, the loop d ispatches to the routine whoseaddress
was obtained ti·om thetable
on the previous iteration
of
theloop.
The individual rou tines have been factored by using su broutines and coroutines
to
pertorm op
era
tions like ope
rand fetching, making them as smal
l as possible.As
a result, the emulator cod
e
re
q uire
dto
execute themost frequently
executed x86 i nstructi ons fits in the tirst-level cache.Condition Code Evaluation
Condition codes are generated by the execution of many o f the x86 i nstructions. \tVe have observed that cond i tion codes arc fi-cq uently set and relatively i n frequently exami ned . The e m u lator takes advan
t<lge
of
this by evaluating the cond i ti o n codes only when they arc used, that is, by using a "lazy eva l u ation"
t
echnique. The executionof
atypical
i nstruction
saves only enough state to allow the eva l uation of conditioncodes, if
req u
ired, a t a l ater time . This takes much less effort than initial ly eval
uating the condition codes. The additional advantage in deferring the eva luation is that only the cond i tion codes that are used need
ro
be generate d .For
example, the overflow condition code may never be computed if
only the zero tlag is used .
Floating-point Instruction Emulation
The 80-bit x86 tloating-point registers arc modeled by a stack
of 64-bit
memor y l ocations that contain floati ng-point values. The decision to use 64-bit intermediate values, rather than to fai thfully replicate the 80-bit model, was based on the need
to
achieve good performance whene
xecuting x86tl
oar
ing-poin
tcod e on the
Alpha processor.T
his d ecision was supported by the bet that the vVindows NT operating system also uses a 64-bit floating-point model. Alrl1ough this is ::tn approximation, our experience to dare has shown that this wasa good
compromise. Very few applications relyon
the full precision provided by the x86 floatingpoin t u ni t's ( 1-'PU's) 80-bit registers.
The em ulator also implements a somewhat simpl i fied model of the x 8 6 FPU's register fi le . Most instruc
tions use the x86 FPU register file as a traditional operand stack; however, several i nstructions can create a register file state that is not strictly a stack by freeing registers in the middle of the stack, by moving the stack pointer without pushing or popping, or by ini
tializi ng the register tile i n a way that breaks the stack model . Modeling the ful l complexity of the x86 FPU register file would be extremely expensive, and experi
ence l1as shown that almost all programs use the regis
ter tile strictly as a stack. The current version of the emulator takes advantage of t his. We are investigating ways to model the floatjng-point registers in a way that maintains good performance but docs not depend on their being treated as a stack.
Generation of Profiles
While it is interpreting an x86 program, the emulator generates profile data tor use by the translator. The profile data includes the following information:
• Addresses that are the targets of cal l instructions
•
(Source
address,target address)
pairs tor indirect control transfers• Addresses of instructions that make u naligned ref
erences to memory
The translator uses this i n formation to generate routines, that is, units of translation that approximate a source code routine. The emu lator generates profile data by inserting values in a hash table whenever a rel
evant instruction is interpreted . For example, as part of inte rpreting the call instruction, the emulator makes an entry i n a hash table that records the target of the call . When an image is u nloaded (either as a result of a call on the FreeLibrary routine or when the applica
tion exits ) , the ru ntime processes the hash table to produce a p rofile file for tbat image. This profile is processed by the server and can result in the server invokjng the transl ator to create a new translation of the image .
To detect available translated code, the emu lator uses the same hash table that i t employs to gather the profi l e data. The x 8 6 add resses for which there are translated routines and the address of the correspond
ing translated code are entered i nto the hash table by the runtime when i t loads an x86 image that has been translated . When a call instruction is interpreted , the emulator l ooks up the target address. If a correspond
ing translated add ress ex ists, the emulator transfe rs control to that address.
The Translator
The server invokes the translator to translate x86 images tor which a profile exists in the database. The translator uses t he profile to prod uce a translated
i mage. On su bseq uent executions of the image, the translated code is used, substantially speed i ng up the application.
Structure and Order of Operations
The translator has eight major components (or phases ) : the regionizer, build , the register mangler, the comli
tion code m angler, improve, the code selector, the scheduler, and the assembler. (An additional phase that performs various peephole optimizations is dis
abled in the DIG ITAL
FX 1 32 V l . O
translator. ) The major components function as fol lows:l . The Regionizer-The regionizer uses data i n the profile to divide the source image code i nto rou
tines, which are described i n the section Generation of Profiles. Each cal l target in the profile is used to generate an entry to a routine. The regionizer rep
resents routi nes as a collection of regions. Each region is a range of contiguous add resses, which contains i nstructions that can be reached from the entry add ress of the routi ne. U n li ke basic blocks, regions can have multiple entry poi nts. The small
est col lection of regions that contain a l l the i nstruc
tions that can be reached from the routine entry is used to represent the routine. Many routines have a single region. This represen tation was chosen to efficiently describe tl1e division of the source image into units of translation.
The regionizer builds routines by followi ng the control flow of the source image. When an indirect jump instruction is encountered while f()llowing the control flow, the possible targets of the i nstruc
tion are obtained ti·om the profil e . Without this profile information, it would be very difficu lt to reliably identifY these targets, and indirect jumps would have to be treated as returns from the rou
tine. The profile i n formation makes it possible to reliably generate a more complete representation of routines with correct control flow.
After the regionizer runs, each of the other major components is run in sequence for each routine.
2. Build-Build reparscs the x86 i nstructions in the routi n e to create an i nternal representation ( l R) of the routine tor use by the subsequent components.
The I R is a graph of basic blocks and is similar to the IR used by many optimizing compilers.
3 .
The Register Mangl er-The initial I R is a straightforward representation of the source x86 cod e . This representation ignores the overlap of the x86 registers; the I R treats each occurrence of EAX,
AX , AH ,
and AL as a separate register. The register mangler adds i nsert and extract operations as necessary to represent the actual semantics of the x86 registers.
Digital Technical joumal Vol . 9 No . .I 1 997 7
8
4.
The Condition Code !Ylangler-The eftect of x86 instructions on condition codes is represented
implicitly
in
theinitial
JR. The condition code man
gler adds instructions
roexplicitly generate condi
tion codes. Since the cond ition code mangler understands the control flow of the emire routine, it knows when condition codes are live and only adds code to generate condition codes when they are used later in the routine .
5 .
Improve-Improve performs several transforma
tions that produce code more s uited to the Alpha architecture. In the initial
IR,each push and pop instr uction is explicitly represented as a decrement/
increment of the x86 stack pointer, accompanied by a store/load. Improve collects all the manipulation of the x86 stack pointer into a single decrement at the beginning of a basic block and a single incre
ment at the end of tha
t block.Improve also uses simple value numbering and analysis of memory references to try to eliminate loads and stores to both the x86 stack and the t-loating-point stack and to perform constant f()lding. Although Improve performs only relatively simple optimizations on a single basic block, we have fo und it
tobe quite effective in improving the q uality of the code that is generated .
6. The Code Selector-The code selector transf(mm the
I Rfrom a representation that contains mostl y x86 instructions to one that contains onl y Alpha instructions . This transformation is done instruction by instr uction, with each x86 instruction being replaced by a sequence of Alpha instructions that produce the same effect. The implementation of the code selector is based on
the TWIGrode generator.' Although the code selector is capable of dealing with much more complicated patterns of instruc
tions, this capability is nor currently used .
7.
The Scheduler-After the code selector is r u n , all the instructions in the
IRare Alpha instructions.
The sched uler reorders the instructions within a basic block to minimize the cycle co unt for the tar
get processor.
8 . The Assembler-The assembler builds the o utp ut translated image.
Use of Profile Data
The regionizer is the only component of the current translator that uses the control flow
int(>rmationin the protik . The
regionizeruses the profile to d etermine which parts of the source image arc translated. Futu re versions of the rransbter will usc the profi l e to perform path -directed optimizations and ro place code so as to red uce cache misses. Those changes wil l improve the pert(mnance of translated code .
Digircll Technic.1l journal Vol . 9 No. I 1997
Rctranslation of an image is triggered by growth in the size of the profile. Because profile data is generated onl y when the emulator executes previously untrans
l ated parts of the source image, an increase in the size of the pro file indicates that new parts of the program have been executed. Retranslating with the new pro
ti le will cause these additional parts of the image to be translated.
Alignment Issues
On an Alpha system, references to memory locations that are not naturally aligned res ult in exceptions that arc hand led
bythe Windows NT kernel.
Alignmentexceptions can be avoided
byusing unaligned code sequences that usc
the LDQ_Uand
STQ _Uinstruc
tions . Unaligned code sequences are slower than aligned seq uences for accessing locations that are nat
urally aligned b ut m uch t:1ster for accessing locations tlut are not natural ly aligned . Native Alpha compilers ahvays try to generate unaligned code sequences when referencing unaligned data ro avoid the expense of dealing with alignment exceptions .
When generati ng the code for an instruction that references memory, the code selector must determine whether to usc an aligned sequence or an unaligned sequence. To make the determination, the code selec
tor needs to know the aJ ignment of the address being referenced . In general, this cannot be d etermined by static analysis of the x86 code . To solve the problem, the code selector uses information in the profile abour the alignment of memory addresses . The profi le con
rains
the
addressof every instruction that made an unaligned reference to memory. The code selector generates unaligned sequences for those instructions and aligned sequences for al l other memory references.
Although this code generation process is eftcctive most of the time,
someprograms exhibit different memory rckrcncc behavior on s uccessive runs . For those pro
grams, alignment exceptions can sti l l occur.
Shadow Stack
Translating return instructions presented particular problems t(Jr the translator. The translation of a cal l instr uction saves the x86 return address on the x86 stack <ll1d then calls the translated code f(lr the routine.
After the translated cal l, the x86 return address is on the x86 stack
and thecorresponding native ret urn
;1(klress is in an Alpha register. This maintains the x86
stack
in the expected x86 state . One way to translate a retu rn instruction would be to use the x86 return add ress to look up a corresponding Alpha add1·ess;
however, it is desirable to avoid the expense of a hash
t:-�ble look up on every return . In the usual case , the
return address is not changed by the ro utine and the
translated code can pop the x86 stack and per form a
native return by using the native ret u rn address . Two
problems m ust be solved, though. First, some mecha
nism is needed to determine if rhe x86 return address has been modified . Second, a location is needed to save the native return address. Both problems are solved by using the shadow stack.
The shadow stack resides at the top of the native Alpha stack and is maintained by the translated code ( with help fi·om the emulator). A shadow stack fi·ame is created for each call of a translated routine. When one b·anslated routine calls another, the calling routine saves the x86 return address and the current x86 stack pointer in its shadow stack frame. The called routine then saves the native return address i n the ca.l ling routine's shadow stack frame. On return, the called routine expects to find the x86 return address and the current x86 stack pointer in the calling routine's shadow stack frame. In this case, the cal led routine is returning to the environ
ment that the calling routine expected and performs a native return. I f the val ue of either the return address or the stack pointer has changed from the value expected by the calling routine, the called routine returns to the emulator.
In a similar manner, the emulator uses the infimna
tion in the shadow stack to determine when it can return to translated code . A num ber of conditions can cause translated code to reenter the emulator. For example, the emulator is entered if the target of a translated indirect jump instruction is not known at translation time. Having the emulator return to trans
lated code on a n.:turn instruction minimizes the amount of time that is spent in the emulator; however, the emulator can only return to the translated code i fit knows that i t has a valid return address. The shadow stack provides a mechanism to perform that validation.
The Database
The database consists of rwo parts. As described tor the runtime, the first part of the database is a d irectory tree th:tt cont:tins profile files, translator log files, and translated i mages. The second part of the database is kept in the registry and consists of
information
about x86 applications and images that the D I G ITAL FX!32 software has run on the system, together with configuration information . The configuration i n formation i ncludes the maximum amount of disk space that can be used by F X ! 3 2 , the maximum number of i m ages that can be stored in the database, the default transla
tion options, the work list that the server uses to schedule translations, and the DatabaseDirectoryList.
The DatabascDirectoryList is a list of paths to add i tional databases that arc t o b e searched tor image pro
tiles and translation results when the image is first exec uted . Directories on this list can be used to access information about the image from other machines on a network, making available to a user translations per
f(xmcd on another, perhaps more powerfu l , machine.
The Server
The server is a \Vi ndows NT service that normally starts whenever the system is rebooted. The server automatically runs the translator when appropriate, thus m aking the translation process completely trans
parent to the user. The server also maintains the data
base to control D I GITAL FX ! 32 resource usage.
The Manager
Usually the operation of DIG ITAL fX!32 software is completely transparent to the user. Like any other pro
gram, though, FX1 32 consumes system resources and a user must be able to control that resource usage. One ofthe roles of the manager is to provide a user interface to the configuration information kept in the database.
Figure 2 shows the manager window. The upper pane contains information about t he various appl ica
tions that have been run on the system : the total amount of disk space being used for profiles and trans
lations of images loaded by the applicatjon, the num
ber of times the application has been run, the date when it was l ast run, and the optimizer ( translator) status. The lower pane contains i n formation about the i mages that have been loaded by the h ighlighted application in the upper pane: the total amount of disk space used to
store
the profile and translation of the image, the number of times the image has been loaded, the date on which it was last loaded, and the status of the last translation of the image.By interacting with the manager, the user can con
trol various aspects of f X ! 3 2 operation, such as the maximum amount of disk space to use, which informa
tion to retain in
the
database, and when the translator should run.Results
The D I G ITAL fX ! 32 development team had two pri mary goals for the sofuvare: ( l ) to achieve transparent execution of 32-bit x86 applications and ( 2 ) to yield approximately the same performance as a high-end x86 platform when running applications on a high
performance Alpha system . The DIG ITAL FX1 32 product meets both goals.
Transparency is provided by the transparency agent and a run-time environment that can load and execute a n x86 application without a translation step. Appli
cations can be launched and executed on an Alpha system that is running FX! 32 just as they can on an x86 system . We h ave performed extensive testing of more than 7 5 applications that run using f X ! 3 2 , including major commercial applications s u c h as Microsoft Office 9 5 , Visual Basic 4.0, Photoshop 4.0, and Corel D RAW 6.0.
Digital Technical Journ3l Vol. 9 No. I 1997 9
FX!32
M anage1l!lliJEJ
file �dit Y:iew .Qptions !::!elp A lication N ame
M icrosoft® S chedule+ for Windows 95[TM . . M icrosoft® Word for Windows® 95 7 0
l�llil!mi B•I
Paradox for Windows 7 00
�
awt3230.dll jit3230 dll jrt3230 dll M FC40. DLL msvcirt.dll msvcrt. dll msvcrt40. dll netscape. exe pr3230 dll uni3200. dll For H elp, press F1
Figure 2
The D I G ITAL F X ' 3 2 J'vLmagcr
D I G ITAL
F X 1 32
software also met its pcrtixmancc goa l . Figure 3 shows the relative performance on fl }7t· JI!Jap,azine's BYTEmark bench mark of a 2 0 0 - megahertz ( M H z ) Pentium Pro system and a 5 0 0 - JV! H z AJ pha system r u n n i n gF X ! 3 2 .
F o r this benchmark, the Alpha system provides about the same perf()rmance as the 2 00 - Jv! H z Penti u m Pro syste m . Figure3
also shows that the Alpha native8
6
Last R un
1 2/1 7/96 08: 42:29 AM 1 2/1 6/96 1 0:54 1 4 AM 1 2/1 6/96 03 1 7: 02 PM 1 2/1 6/96 05 35: 1 5 PM
Last R un
1 1 /1 8/96 09:31 22 AM 1 1 /1 8/96 09: 31 : 22 AM 1 2/1 6/96 03 1 7 02 PM 1 2/1 6/96 03 1 7: 01 PM 1 2/1 6/96 03 1 7 01 PM 1 2/1 6/96 03: 1 7: 02 PM 1 2/1 6/96 03 1 7 02 PM 1 2/1 6/96 03: 1 7 02 PM 1 2/1 6/96 03: 1 7 02 PM 1 2/1 6/96 03: 1 7:02 PM
S uccess S uccess S uccess S 1JCcess S uccess S uccess S uccess
40% [06: 1 6 remaining) S uccess
S uccess
version of the benchmark runs t
w
ice as bst as the Pent i u m Pro version.Of course, no singlt: benchmark characterizes the performance of a system . Even so, when running translated x86 applications, we have consistently mea
sured performance on a 5 0 0 - M Hz AJpha system to be in the range between that of a 200- M H z Penti u m sys
tem and that of a 2 0 0 - M H z Penti u m Pro svstem . For
- ,---
4
,.----
,--
,.---- ,---
2
0
200-MHZ P ENTIUM P RO 500-MHZ ALP HA 2 1 1 64A 500-MHZ ALPHA 2 1 1 64A
KEY
D
IN TEGERD
FLOATING P OINTFigure 3
RUNNING D I GI TAL FX'32 (NATIVE ONLY)
DIG ITAL fX 1 32 Pcrti>rmancc on
the
B YTE Benchmark)10 Digit,ll Tcclmiol Journal Vol . '! No. 1 1 997
some applie<nions, pertormance can exceed that of <1 Pentium Pro svsrem .
The initial version o f the DIGITAL FX132 software has some l imitations.
FX1 32
executes only application code; it does not execute d rivers. Consequently, native drivers arc required for any peripheral that is installed on an Al pha system . Also, as described in the Transparency Agent section,FX! 32
does not provide complete support tor x86 services. Further,FX1 3 2
does not support tile Windows N T Debug AP1 . Supporting that i nterface would req uire the capability to rematerial ize the x86 state ati:er every x86 i nstruction, thus severely l imiting optimizations that the translator cou ld pert(mn . Optimizing compi lers make a similar trade-off by restricting optimization when debugging information is required. Since
FX! 32
does nor support the Debug intertace, applications that require it do not run underFX 1 3 2 .
Those applications arc mosrly x86 development environments, and it probably makes more sense to run them on an x86 system . The limitations described are not serious, and most x86 applications that execute on an x86 processor that is running the 'vVindows NT operating system
�1lso execute on an Alpha system run ning Windows NT and DIGITA L
FX132
soti:ware.Summary
DI GITAL FX! 32
soti:ware provides bst, transparen t execution of 32- bir x86 applications on Alpha systems running the Windows NT operating system . This is accompl ished usi ng a unique combination of emulation and bi nary translatio n . The emulator runs an appl ication, int e rprets the code, and generates proti l e i n formation. For su bsequent executions, t h e translator uses the protile data to produce translated images that contain optimized native Alpha code. An application translated by means of D I G ITAL
FX!32
software runs up to 10 ti mes taster than the same application running under the emu lator alone. Moreover, the transla
tion takes place in the background and is therdore rr:111sparenr ro rhe user.
Acknowledgments
Building the DIGITAL
rX!32
prod uct required some extremely talented people to pertc>rm a lot ofdifticult work. The mem bers of tbe DIGITAL FX1 32 d evelopment team include Jim C<lm pbcll, Anton Chernoff, George Darcy, Tom Evans, Jim Givler, Charlie Greenm:1n, Pippa Jollie, Mark Herdeg, Ray Hookway, Maurice Marks, Sriniv:1san J'vlurari, Brian Nelson , Scott Robinson, Norm Rubin, Sherry Scskavich, Joyce Spencer, Tony Tye, and John Yates. Many of these individuals contribu ted the ideas descri bed
in
this paper.References
I . B . C1se , " Rehosting Binary Code tor Software Porta
bility," Microprocessor Report (Sebastopol, Calif. :
Micro Design Resources, Janua
r
y 1 989 ).2 . T.
HaHhill,
"Emulation: !U SC's Secret Weapon,"8}T/:.' Magazine (April 1 994 ) .
3. R. Red i..: hek, "Some Efti..:ient Ar..:hirecture Simul:nion Techniques," /!SEN/X (Winter 1 99 0 ) .
4. L. Deutsch and A. Schitlin<ln, "Efficicnr Implementa tion of the Smal lulk-80 System,'' Record uf the t'/euenth A nnual A CH Symposiu m on Principles uj' Pro,15rwnming Languaiw'· ( 1 98 3 ) .
5 . R. Sites, A . Che
rn
off, M. Kirk, lvl . Jvbrks, a n d S . Robinson, "Binary Translation," D(v,ital Tedmical journal.
vol. 4 , no. 4 ( J\!Llynard , Mass . : Digital Equipment Corporation, 1 992 ).
6. J. Richter, Aduanced \'Kiirzdows NT chap. 1 6 ( Red mond, Wash . : M icrosoft Press, 1 994 ) .
7 . A. Aho, M . Ganapathi, a n d S . Tjiang, "Code Generation Using Tree Matching and Dvnamic Progra m m i ng,"
A CJ\11 Transactions on Procr.:ramminct.: Languages and
.5)•stems,
vol. I I , no. 4 (October 1 989 ) . BiographiesRaymond
J.
HookwayRay Hookway led the DTCTTAI. FX 1 32 development ream and \vas a key conrri buror to the bina
r
y translation component of tbc DlGJTA I . F X 1 3 2 software prod uct. He has been a mem bn of the AMT group ofDJClTAL Semiconductor since 1 993. Ray joined DIG ITAL in 1989 and has worked in the CAD and AD groups of DIGITAL Semicond uctor, where he conrri bu ted to the first Alpha PC project. Prior ro joining DIGITAL , he was Directo
r
of Engineeri ng ri:>r Endot, I nc . , where he developed one of rhe first V H D L . simulation environments. H e was also a n Assistant Professor ar Case Western
Reserve University, where he did research on programv
erific<Jrion, and he was <l Visiting Professor <lt the University of Ups
alla, Sweden. Ray received i'vi .S. and Ph.D. degrees in compu ter science ti·om Case vVestern Reserve Univers
ity
and a B.S. in engineeri ng fi·om C<lSe Ins
titute ofTechnologv. He has app
lied for several parents rehned to his DIGITAL F X ! 3 2 work.Digital Tcchnic.d joumal Vol . 'J No. I I 'J'J7 I I
1 2
Mark A . Herdeg
Mark Herdeg has been with DIG ITAL since 1 98 5 . He is cu rrently a principal software engineer in the A.tv!T group ofDIGITAL Semiconductor. Previouslv, he worked on con
sole software tor the Nautilus ( VAX 8500) and Argonaut projects. The Alpha simulator developed tor the Argonaut project, M ANNEQUIN, became the tirst Alpha system on which the OpenVJv!S operating system successfu l l y booted . Mark contributed ro a rel ated project that used the Alpha simulator and a dual-architecture-aware debugger ro allow developmeiJt and execution of applications with a m ix of VAX and Alpha code. A founding member ofrhc Alpha Migration Tools group, Mark worked on its tirst prod uct, VEST, the Open VMS V�\-to--Alpha binar·y translator. He then helped design and develop the DIGITAL FX 1 32 soft
ware product, with particu lar focus on the runtime compo
nent . Cu rrently, he is the project leader tor the next release ofDJGJTAL fX 1 32 software. J\i!Jrk has submitted several patent applications tor work on the rnu l tiple-architecture execution environment and for the DIG ITAL f X 1 32 design .
Dig;iral Technical journal Vol . 9 No. l 1 997
Development of the Fortran Module Wizard within DIGITAL Visual
Fortran
The Fortran Module Wizard i s one of the tools in DIGITAl Visual Fortran, a DIGITAl product for the Fortran development environment. Visual
Fortran consists of the DIGITAl Fortran
90
compiler and run-time libraries and the Microsoft Developer Studio. Together, these technologies provide a rich set of tools for the Fortran developer who is using the Windows NT and Windows95
systems. The Fortran Module Wizard generates complete Fortran source code, allowing Fortran applications to invoke routines in a dynam ic link library, methods of an Automation object, and member functions of a Component Object Model (COM) object.
I Leo P.
TreggiariDIGITAL Visual Fortran is an integrated development environment tor Fortran applications.' It is supported on the Windows NT version 4 . 0 operating system on both AJpha and I nteJ hardware and on the Windows 95 sys
tem. DIGITAL Visual Fortran is a combination of tech
nologies ti·om D I GITAL and Microsott Corporatjon.
The DIG ITAL-supplied compiler and run-time l ibraries support the DIGITA L Fortran 90 Janguagc.' DIGITAL Fortran 90 conforms to American National Standard Fortran 90
(
ANSI X3 . 1 98- l 99 2 ) and provides many extensions to the Fortran 90 standard . The Microsoftsupplied i ntegrated development environment is the Microsott Developer Studio, which is also used by
!Yl.icrosoft Visual C++, Microsoft VisuaJ J++ ( tor Java), other Microsoft tools, and other companies' develop
ment tooJs. DeveJoper Studio i ncludes a text editor, resomcc editors, project build facilities, an incremental linker, a source code browser, an integrated debugger, and a profiler. The operation of all these tools is con·
trolled ftom a single application. Figure
l
shows an example of Microsoft Developer Studio fi-om which two Fortran source files are being edited . DIGITAL adds a number of Fortran-specific tools to the environment, one of which is the Fortran Module Wizard .Design of the Fortran Module Wizard
DIGITAL designed the Fortran Moduk Wizard to help Fortran developers working in the application-rich Windows environment. The Fortran Module Wizard supports access to dynamic link libraries ( D LLs) and servers based upon Microsoft's Component Object Model (COM ). This support allows Fortran developers to usc the popu lar mechanisms that make functionality (services) available to other software (clients) .
TraditionaJ ly, Microsoft and others have provided system i nterfaces and reusable libraries of code as D LLs. A D LL is a file
containing
fun
ct
ions that can be called by programs and other D LLs. The role of D LLs on a Windows system is very simiJar to that of shareable images on the Open VMS operating system and shared Jibraries on the UNIX system. Today, DLLs are still the primary mechanism t(Jr accessing system inter
faces on Windows.
Digital Tcclmiol JournJI Vol. 9 No. I 1 997 13