• Aucun résultat trouvé

TM system architecture

Dans le document An empirical study on the impact of (Page 108-112)

Translation automation and TM systems

4.4 Translation memory systems

4.4.4 TM system architecture

The main components that are available to any classical translator’s worksta-tion today may comprise of:

ˆ A word processor or a translation editing environment, i.e. the interface where the translator actually edits and translate the source text.

ˆ Some online/offline electronic reference works. These may include monolingual and bilingual dictionaries, thesauri, encyclopaedias and other general sources of information.

ˆ A terminology management tool to store domain-specific terminology.

Term banks facilities may be available on the individual workstation or centrally on a server which are normally fed from scratch by the user.

ˆ A concordancer / KWIC (Key-Word-In-Context) tool for displaying a given word with adjoining context.

ˆ An online/offline translation memory tool to access to access previously translated segments in the form of an aligned parallel bilingual corpus.

ˆ A plug-in to a MT engine on the Internet.

TM systems are no longer just databases. As the name ‘system’ suggests, they may consist of several components linked together within a multi-layered software architecture. A typical TM system has a search-oriented architecture.

The architectural layer is usually occupied by a database management system (DBMS), which is supplemented by a search engine. Queries for information are usually performed using Structured Query Language (SQL). The database management system comprises of all the algorithms that govern internal operations related to the creation, manipulation and maintenance of the database, such as text segmentation, alignment of TUs and storage. Typically, two databases are created: a TM database and a terminology database, but the creation of additional databases containing other resources, such as a lexicon, is also possible in some systems (e.g. D´ej`a Vu X).

TM systems may have a multi-tier architecture, in which information is maintained in a Data-Tier where it can be stored and retrieved from a database or file system. This Data-Tier may be replaced or placed behind another tier which contains a search engine and search engine index which is in-place of the database management system. The search engine itself crawls the database management system in addition to other data sources such as file systems and consolidates the results when queried. Apart from the two core components and their modules, TM systems also commonly include a text-editing module, file filters, and workflow mechanism.

Recent developments in TM technology which caused the emergence of hybrid systems combining TM with MT technology (e.g. D´ej`a Vu X) have led to the expansion of the search engine’s operations, typically limited to match search and retrieval, to include an additional operation (previously performed only by MT systems): match assembly (or construction). This operation poses several special requirements to the TM system architecture, as special modules need to be added, such as a match construction algorithm, part-of-speech taggers, parsers and various auxiliary lexical resources.

The same happens when a TM system combines technologies from other systems, such as localisation tools or term extraction tools. Then, the TM system acquires modules from these systems, and its architecture becomes more complex.

Regarding the architecture model, TM systems can be either centralised (‘desktop systems’, i.e. all system components reside on a single centralised computer and all the processes run there) or distributed systems. In the latter case, we can come across two types of TM system architecture:

ˆ client/server: The server hosts the DBMS and the search engine, with all their modules, as well as any filters and workflow mechanisms. The created databases, which constitute the TM repository, are also hosted on the server, and the system enables their sharing, as well as their remote management and processing. The server may be located on the TM developer’s computers, in which case it is accessed by the users via the Internet (‘Web-based TM’) (e.g. Logoport), or it may be located on a company’s server network, e.g. LAN (e.g. across Language Server).

The client side is an interface (a web browser, in the case of web-based TM systems) usually hosted in browser-based forms or Java applets. It provides the text-processing components, as well as the necessary code to communicate with the shared resources on the server (Savourel 2001:

329).

ˆ service-oriented: Over the last few years, as computer power and broadband connectivity have vastly increased, language data has largely moved from the desktop to the enterprise server, and is now moving to the so-called ‘cloud’, i.e. computational resources (both data and software) accessible via the Internet rather than from a local computer which frees up PC memory on users’ side. It’s no surprise in this age of unlimited connectivity and cloud computing that TMs also find their way into the cloud. When personal computers are becoming little more than a browser terminal with constant Internet connection, some TM technologies also offer the possibility to work on a SaaS model.

Major LSPs are working with cloud-based TMs where translators now log via their browsers. In 2006, Lingotek pioneered a service-oriented architecture for their commercial TM system, the result being a TM solution offered as a web service rather than as a product. Unlike traditional client/server models, web services do not provide the user with a GUI; instead, they share business logic, data and processes through a programme interface (e.g. a web browser) across the Internet.

Developers can then add the web service to a GUI (such as a web page or an executable program) to offer specific functionality to users.15 For

15Definition according to Webopedia (Available at:

http://www.webopedia.com/DidYouKnow/Computer Science/2005/web services.asp) [Last accessed on 12/09/2011]

a TM user this practically means that he has access to a TM system through the Internet, instead of owning one. The main advantages of this solution are that the TM system can be accessed from any platform (e.g. Windows, Mac, Linux) and that the software can be improved continually without the users worrying about software updates (Kreckwitz 2007).

Cloud computing is the next computing paradigm shift and refers to Internet-based development and use of computer technology (Haynie 2008).

The cloud is a metaphor for the Internet in which users access technology-enabled services. Cloud computing incorporates “software as a service” (SaaS), Web 2.0, virtualization, multitenancy and other recent technology trends in which the common theme is reliance on the Internet for satisfying the computing needs of the users.

In the field of translation automation, this new technology brings obvious benefits for cross-leveraging language data in extensive and collaborative localisation projects. However, at the same time, it raises lots of questions, especially about ownership and legal compliance. Mining the web, aligning translations, and sharing translation memories are common practice nowadays, but “no empirical studies have been done yet on how this emerging web-interactive mode suits the translators” (Garcia 2009: 203).

Thinking on the negative side of cloud-based TM, it seems that the major drawback will be that it will make more difficult for translators to build up their own linguistic assets and an obligation to work with the software suite provided by the LSP in a SaaS model. The positive side, however, is that TMs in the cloud open the way for ubiquitous access to language data independently of the operating system used by the translator and real time collaboration among the different agents working in the same project thanks to cross-leveraging.

Apart from these arguments in favour of and against the use of cloud technology in the field of translation automation, it is also worth taking into consideration the opportunities offered by this new web-interactive technology both for researchers and software developers. Web-based translation tools enable easy logging of user activity, as is the case with traditional web search engines and web activity as a whole. TMs in the cloud can prove very useful for implementing new tools and services based on productivity derived data, use of concordancers during the translation process and work patterns detected while using this web-interactive mode.

Dans le document An empirical study on the impact of (Page 108-112)