Bitesize 2024 – Summaries

ETUG Chairperson and Secretary General Julia Traub-Teubl welcomed participants to the first ETUG Bitesize webinar – a new format for the association in 2024. It is hoped that shorter, more regular Bitesize events will be easier for ETUG members to attend; this is the first of three planned for 2024.

The first speaker, Vesna Lušicky (University of Vienna), presented on LLM-based term extraction using ChatGPT and Gemini, together with a workflow for moving the output into MultiTerm. LLM usage has gained considerable prominence in higher education – among academics and students alike – following the rapid emergence in recent years of massive corpora, typically scraped from the web, alongside carefully curated data. Models trained on such corpora have quickly proved to lend themselves well to a range of natural language processing (NLP) tasks.

Frequently used NLP tasks include question-and-answer pairs, sentence-by-sentence translation, and document summarisation. There is abundant scholarly literature on the use of LLMs in machine translation and CAT tools, and now a growing number of publications on the use of LLMs for terminology-related tasks. These use cases are now part of Translation Studies and Terminology Science courses.

Higher education institutions are fully aware that students use LLMs to write coursework and essays, but nevertheless consider AI beneficial rather than a risk or threat. Consequently, students are frequently exposed to LLM technology across many courses, as well as in private and professional settings. Venkatesh’s Unified Theory of Acceptance and Use of Technology may explain students’ positive motivation towards LLMs. The extent of this acceptance is clear: many students now prefer LLM chats to traditional search engines.

Attention then shifted to using LLMs for terminology extraction, with Drewer’s 9-stage model of terminology work taken as a starting point. Students used freely available technology they were comfortable with, together with MultiTerm and Trados; Google’s Gemini 1.5 and OpenAI’s GPT-4o were chosen for this purpose. The presenter was quick to point out some academic limitations – namely the use of only publicly available information – and to remind the audience that copyright issues and the use of personal and sensitive information might need consideration in an enterprise setting.

Some of the tasks performed as exercises handled term extraction in domains the students were not well versed in (e.g. medicine / migraine diagnosis). This was deliberate, since writing definitions in an unfamiliar domain would otherwise be difficult. Prompting was also used to enrich term entries with linguistic information, in a bid to speed up and automate the terminology process, and attempts were made to convert the output using the TBX exchange format.

Freely available tools clearly have limitations regarding input methods. For example, ChatGPT cannot process a file directly from a link; the file has to be uploaded first – a limitation Gemini does not share.

Prompting plays an important role, particularly in defining what is to be extracted, e.g. monolingual technical terms from technical documents. For this reason, students attended sessions on prompt engineering before using prompts for term extraction. While they started out with standardised prompts, they then began to experiment with their own. LLM chatbots have distinct limitations, and the need for specialised knowledge in the curation process is particularly apparent.
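By way of illustration, here is a minimal sketch of what such an extraction prompt might look like, using the OpenAI Python SDK; the model name, prompt wording and one-term-per-line output format are assumptions for the example, not the students’ actual prompts:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_terms(document_text: str) -> str:
    """Ask the model for monolingual technical terms, one per line."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You are a terminologist extracting candidate terms."},
            {"role": "user",
             "content": (
                 "Extract the monolingual technical terms (nouns, noun "
                 "phrases and verbs) from the following German document. "
                 "Return one term per line, in its canonical form, with "
                 "no explanations:\n\n" + document_text)},
        ],
        temperature=0,  # keep the output as deterministic as possible
    )
    return response.choices[0].message.content
```

Constraining the output format in the prompt ("one term per line, no explanations") is one of the simplest ways to make the results machine-processable downstream.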

Some other serendipitous LLM features were also observed – for example Gemini’s ability to automatically preface the terms found with a brief summary, which interpreters might find useful. Other output was more arbitrary: for example, when LLMs were asked to produce a list of terms from a German document and provide English translations for them, the output was frequently in an arbitrary order, with little information about the frequency of the extracted terms. LLMs’ tendency to overcorrect was also highlighted – for example in the form of multiple variant designations appearing in output lists.

Further prompts were used to obtain definitions in German and English, in some cases giving rise to some very strange hallucinations. Subsequently, a series of prompts was used to add linguistic information, although results for grammatical information, variants, acronyms and synonyms tended to be disappointing, and iterative prompting brought little improvement. Cited issues included unexplained entries (e.g. acronyms without full forms). LLMs also struggled to extract precise contexts from documents, with substantial hallucinations in many cases.
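The enrichment step can be pictured along the following lines – a hedged sketch using the OpenAI SDK’s JSON output mode, where the metadata keys are illustrative and, as the presentation noted, the returned values would still need validation against reliable sources:

```python
import json
from openai import OpenAI

client = OpenAI()

def enrich_term(term: str) -> dict:
    """Request grammatical metadata for one extracted term as JSON."""
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},  # force parseable output
        messages=[{
            "role": "user",
            "content": (
                f'For the German term "{term}", return a JSON object with '
                'the keys "pos", "gender", "plural", "synonyms" and '
                '"acronym". Use null where no value is attested.'),
        }],
        temperature=0,
    )
    # The model may still hallucinate values; this only guarantees structure.
    return json.loads(response.choices[0].message.content)
```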

The final task was to get the LLMs to create valid TBX files. However, entries were frequently missing or contained divergent content, and substantial quality-control steps were needed before the generated content could be used further.
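A simple structural check of the kind such quality control might involve is sketched below, assuming a TBX-Basic (v2) file without a default namespace; the tag names would need adjusting for other TBX dialects:

```python
import xml.etree.ElementTree as ET

def check_tbx(path: str) -> list[str]:
    """Flag concept entries that lack a language section or a term."""
    problems = []
    root = ET.parse(path).getroot()
    # TBX-Basic (v2) uses termEntry/langSet/tig/term; adjust the tag
    # names for TBX 2019 (conceptEntry/langSec/termSec/term).
    for entry in root.iter("termEntry"):
        entry_id = entry.get("id", "<no id>")
        lang_sets = entry.findall("langSet")
        if not lang_sets:
            problems.append(f"{entry_id}: no langSet")
        for lang_set in lang_sets:
            if lang_set.find(".//term") is None:
                lang = lang_set.get(
                    "{http://www.w3.org/XML/1998/namespace}lang")
                problems.append(f"{entry_id} ({lang}): no term")
    return problems
```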

Having tried this approach, the most workable workflow to emerge, with usable output, was as follows: an LLM was used to generate a sufficient number of entries; curation was then done in spreadsheets, for example making use of Gemini’s easy export to spreadsheets; and the spreadsheet data was then converted into a MultiTerm termbase using MultiTerm Convert.
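The curation step might look something like the following sketch, which deduplicates and tidies hypothetical German/English/Definition columns into a clean CSV that MultiTerm Convert can then map onto termbase fields (the column names are assumptions, not the project’s actual field set):

```python
import csv

def curate(in_path: str, out_path: str) -> None:
    """Deduplicate and tidy LLM output before MultiTerm Convert."""
    seen = set()
    with open(in_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(
            dst, fieldnames=["German", "English", "Definition"])
        writer.writeheader()
        for row in reader:
            term = row["German"].strip()
            if not term or term.lower() in seen:
                continue  # drop empty rows and duplicate German terms
            seen.add(term.lower())
            writer.writerow({
                "German": term,
                "English": row.get("English", "").strip(),
                "Definition": row.get("Definition", "").strip(),
            })
```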

Notwithstanding the issues in obtaining usable terminology, the LLM-based approach to term extraction helped make terminology a “fun” exercise, in that it produces a large number of results quickly. The usefulness of the results depends heavily on the quality of the prompts, with iterative prompting one method for fine-tuning the output. Force-prompting approaches might help, for example, to extract verbs, which LLMs frequently fail to extract.

Future development of this kind of term extraction exercise could explore the use of Custom GPTs and copilots integrated into CAT tools and terminology management systems (e.g. as included from Trados Studio 2022 SR2 onwards). LLM capabilities have quickly been found to be “overpromising”, and there are still a lot of real-world challenges. To date the focus has been on monolingual extraction from monolingual documents rather than bilingual extraction. The latter might be a possibility in the future, although it might also come with greater levels of hallucination.

In the second presentation, entitled “How Porsche tackles the tangle of words”, Ira Rotzler (Porsche) and Sophia Ackermann (berns language consulting) presented Porsche’s project to establish a single terminology database for the entire Porsche Group and to develop terminology by grooming existing language data. The intention is to top up the existing German terminology and add to it from translations. A project of this size requires personnel resources, an IT budget and senior-management backing; the message to senior management needs to emphasise the essential role of terminology in future projects, e.g. those using LLMs (for example, prompting for definition writing).

The presentation’s main focus was the various stages of the project, starting with pre-processing, merging and mapping German terminology from various sources into a single terminology database, including metadata. The terminology database is in Kalcium QuickTerm. As the data was hosted by Volkswagen’s IT, it was necessary to adapt and clean VW’s data model to create the Porsche data model. Porsche content was then gathered, extracted, imported and groomed – predominantly in German, with a little English – and system-based terminology approval workflows were set up for the German terms.

Moving forward, the intention is to find target-language equivalents for all the German terminology and add them to the terminology database. This will require preprocessing the multilingual data (from multiple glossaries with up to 50 columns), aligning it using an exchange format, merging the multilingual termbases and eliminating duplicates, before adding the multilingual data to the Porsche termbase. Language terms and metadata are then added, and terms are compared against preferred terms. The end result is that all target-language terms are mapped to a single German term.
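The merging-and-deduplication step could be sketched as follows, assuming hypothetical CSV glossaries keyed on a “de” column; wherever a language accumulates more than one value for the same German term, a conflict has been found:

```python
import csv
from collections import defaultdict

def merge_glossaries(paths: list[str]) -> dict:
    """Merge rows from several glossaries, keyed on the German term."""
    merged = defaultdict(lambda: defaultdict(set))
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                key = row["de"].strip().lower()  # one entry per German term
                for lang, value in row.items():
                    if lang != "de" and value and value.strip():
                        merged[key][lang].add(value.strip())
    # Any language with more than one value for a key is a conflict
    # that needs resolving before import into the termbase.
    return merged
```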

There are many challenges along the way, such as different terminology formats, metadata fields and language variants. Multilingual data frequently contains many duplicates, which in turn produce conflicting entries. Concatenation of metadata fields or prioritisation of sources can resolve conflicts automatically, but some conflicts still require human intervention; pre-defined workflows can speed up conflict resolution, yet human knowledge finds conflicts that statistical analysis cannot. The consolidated termbase of around 2,800 entries contains several hundred conflicts to resolve, and the number of entries will increase by the time the production stage begins, with knock-on effects for the number of conflicts in other languages.
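A toy illustration of such automatic resolution rules is given below; the source names and priority order are invented for the example, and anything the rules cannot settle cleanly is flagged for human review, reflecting the presenters’ point that automation cannot resolve everything:

```python
# Hypothetical ranking of data sources, most trusted first.
SOURCE_PRIORITY = ["corporate_termbase", "engineering_glossary", "legacy_tm"]

def resolve(values_by_source: dict[str, str]) -> tuple[str, bool]:
    """Resolve a metadata conflict automatically where a rule applies.

    Returns (value, needs_human_review).
    """
    distinct = set(values_by_source.values())
    if len(distinct) == 1:
        return distinct.pop(), False           # no real conflict
    for source in SOURCE_PRIORITY:             # rule 1: trust the best source
        if source in values_by_source:
            return values_by_source[source], False
    # rule 2: concatenate free-text fields rather than losing content,
    # but still flag the entry for a terminologist to check.
    return " | ".join(sorted(distinct)), True
```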

Future work will also include definitions of terms. Currently the project is still in the test stage, and planning is required before moving to production. Planning is also underway to use LLMs (e.g. prompting for definition writing). The process is cyclical, with the circle restarting each time a new dataset is added. Human validation is frequently required between steps that can be automated, and many conflicts are still resolved at monthly terminologists’ meetings rather than in QuickTerm workflows.

RWS had demonstrated its first steps with AI at ETUG 2023, and AI features were first integrated into Trados Studio 2022 SR2. Luis Lopes (RWS) explained that the advent of AI had seen Trados and Language Weaver merged into a single business unit to help navigate the integration of AI and LLMs. RWS also operates a range of products and platforms, and connecting to other systems and extending the landscape with other apps has become very important. Regarding the rationale for doing something with AI, he acknowledged the ubiquitous buzz surrounding AI and how everyone wants to be seen to be doing something with it. He noted that LLMs are showing applicability with great benefits for terminology, while the importance of terminology itself is increasing.

Trados uses AI in a number of ways. The Trados Copilot is RWS’s way of delivering features that help users make decisions or assist in taking action. Generative features use AI to generate more personalised content and to allow generation via batch tasks. Trados previously did not focus strongly on multimedia, but use cases have emerged to help with processing video and images. To illustrate RWS’s move towards AI, Luis highlighted how it has built on Trados’ Traditional Translation Engine, which focuses on using translation memories, terminology databases and NMT, and which dates back to the launch of Trados’ cloud offering.

Now there is a new Generative Translation Engine which taps into additional resources, harnessing AI and LLMs. The Traditional Translation Engine’s components are now used to supply context to the LLM in the Generative Translation Engine, and in the future it will also be possible to supply context from existing bilingual documents. A drawback of MT to date has been the amount of training it needs, whereas LLMs permit tailoring the output to the required audience: custom prompting can be used to generate translations in a particular style, tone or formality, making translations more appropriate for the target audience.
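As a rough illustration of custom prompting for style and tone (not RWS’s actual Generative Translation Engine, whose internals were not shown), one might write:

```python
from openai import OpenAI

client = OpenAI()

def translate(segment: str, tone: str = "formal",
              audience: str = "workshop technicians") -> str:
    """Translate with explicit style instructions in the prompt."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             # The tone and audience parameters steer register and phrasing.
             "content": (f"Translate from German into English in a {tone} "
                         f"tone, phrased for {audience}. Return only the "
                         "translation.")},
            {"role": "user", "content": segment},
        ],
        temperature=0.2,
    )
    return response.choices[0].message.content
```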

Trados Copilot, Trados Studio’s AI Assistant, is bundled with the release of Trados Studio 2024. ETUG 2023 had discussed issues with inconsistent terminology when pre-translation was performed by NMT: pre-translations frequently failed to render terms correctly, often producing inconsistent translations with odd, unusable terms. The AI Assistant can pick up and apply correct terms from a termbase, and such “terminology awareness” can be activated easily. In addition, AI can handle inflections and singular/plural forms better than traditional MT engines.
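One plausible way to picture such terminology awareness – purely a sketch, not RWS’s implementation – is to inject matching termbase entries into the prompt and let the LLM handle inflection:

```python
from openai import OpenAI

client = OpenAI()

TERMBASE = {"Drehmoment": "torque", "Zündkerze": "spark plug"}  # toy termbase

def translate_with_terms(segment: str) -> str:
    """Inject matching termbase entries into the translation prompt."""
    hits = {de: en for de, en in TERMBASE.items() if de in segment}
    glossary = "\n".join(f"{de} -> {en}" for de, en in hits.items())
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": ("Translate the following German segment into "
                        "English. Use exactly these term translations, "
                        "inflected as the sentence requires:\n"
                        f"{glossary}\n\nSegment: {segment}"),
        }],
        temperature=0,
    )
    return response.choices[0].message.content
```

Asking for inflected forms rather than verbatim insertion is what distinguishes this from simple find-and-replace, and matches the point about inflections and plurals above.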

Although some errors may remain (e.g. capitalisation), less post-editing of terms is required. Prompts in the Copilot can be used to obtain alternative examples and pick a translation, which many translators find useful. Nevertheless, such an approach is not expected to fully replace customised/trained MT engines, as valid use cases remain for the latter, particularly given LLMs’ tendency to be more creative in their translations than MT engines. Moreover, LLMs are still quite a bit slower than NMT engines.

A cloud workflow was shown that allows automated pre-translation tasks to be embedded, for translation in either Studio or the Online Editor. Testing has highlighted the essential nature of a good terminology database – poor terminology impacts results considerably.

The Trados Copilot also has content analysis features – e.g. detecting the domain of a document submitted for translation, or extracting entities such as languages, locations or names. This allows project managers to establish what a file is about without opening it; currently this uses Language Weaver. Trados Copilot’s Smart Help feature is like a chatbot trained on RWS’s documentation, although it cannot be used like a search engine to look up arbitrary facts. Another Copilot feature is Smart Review, in which the Copilot assumes the persona of a linguistic reviewer in Studio or the Online Editor, scoring each segment’s translation and justifying its scoring; users can then choose whether or not to review segments.

MT Quality Estimation (MTQE) is another AI use case. It categorises results as good, adequate or poor, to help identify where post-editing should focus. It is effectively a self-rating by the MT engine, shown in Studio’s status column, and workflows can use it to lock segments during preprocessing.
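The locking step can be pictured as a simple filter over QE categories, as in this sketch (the labels follow the good/adequate/poor bands described above; the data structure is invented for the example):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    source: str
    mt_target: str
    qe: str          # "good" | "adequate" | "poor", per the MTQE bands
    locked: bool = False

def lock_good_segments(segments: list[Segment]) -> None:
    """Lock segments rated 'good' so post-editing can focus elsewhere."""
    for seg in segments:
        if seg.qe == "good":
            seg.locked = True
```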

In the future, the Copilot will be able to talk to the data of a reporting database. AI could then interact and take necessary actions – e.g. checking which translator has called in sick before reallocating their jobs automatically, or reallocating work away from translators who are snowed under. Concerns from the translation profession that such a tool could be misused by the industry are assuaged by the fact that data can be anonymised (e.g. to allow project managers to see what the problem is, but not who is causing it). The feature could then be used to gauge whether there is a need for additional training or resources as a whole, rather than tracking individual translators.
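The anonymisation idea could be as simple as replacing names with stable pseudonyms before the data reaches the report, as in this sketch (the field name is hypothetical, and real anonymisation would need more than hashing a single field):

```python
import hashlib

def pseudonymise(report_rows: list[dict]) -> list[dict]:
    """Replace translator names with stable pseudonyms before reporting."""
    out = []
    for row in report_rows:
        # Same name always maps to the same alias, so trends remain visible
        # without identifying the individual.
        alias = "T-" + hashlib.sha256(
            row["translator"].encode()).hexdigest()[:8]
        out.append({**row, "translator": alias})
    return out
```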

A potential future AI use case in multimedia would be translating videos. From a source video, speech-to-text processing could extract subtitles; timestamps and subtitle breaks would then be removed to create a transcript, which the AI can in turn summarise, with the summary providing context and terminology for translating the subtitles. AI can also be used to handle noises (e.g. hesitations, false starts or background noise). The final step would be to reconstruct the target video. This might be particularly useful where subtitles are important but translation quality is less critical.
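The timestamp-stripping step, at least, is straightforward to picture; the sketch below flattens a standard SRT subtitle file into a plain transcript that an LLM could then summarise:

```python
import re

# Matches SRT timecode lines like "00:01:02,500 --> 00:01:05,000".
TIMESTAMP = re.compile(
    r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")

def srt_to_transcript(srt_text: str) -> str:
    """Strip cue numbers and timestamps from an SRT file, keeping the text."""
    lines = []
    for line in srt_text.splitlines():
        line = line.strip()
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue  # drop blank lines, cue indices and timecodes
        lines.append(line)
    return " ".join(lines)  # a flat transcript the LLM can summarise
```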

Finally, a couple of new features coming in Trados Studio 2024 were announced. There is good news on accessibility: Trados Studio 2024 now works with screen readers, making it usable by blind translators. And an InDesign Preview for the Online Editor has been added to the Cloud AppStore.
