The Emerging Revolution in Automated Translation


Digital tools and AI, including essential LLMs, are generally optimized for English.

This results in lower performance for billions of non-English speakers, broadly unequal access to relevant information, and a substantially lower-quality experience. The causes are largely a lack of diverse training data for other languages and an explosion of digital content that overwhelms human translation capacity.

Logically, automated translation here becomes a crucial and indispensable solution. It serves as a vital bridge for global users, enabling them to access content and benefit from LLMs in their preferred language, thus fueling the substantial and growing demand for this technology.

Why AI Translation Matters

The internet serves as a critical global hub for information, commerce, and community, yet its automated systems, like AI chatbots, search engines, and content moderation platforms, often fall short for most of the world. These essential tools are predominantly designed and optimized for English, resulting in reduced performance and usefulness for speakers of the world’s approximately 7,000 other languages.

This disparity stems largely from a lack of diverse, high-quality training data for non-English languages, particularly those considered “low-resource”. AI models trained mainly on English data struggle to adapt effectively to different linguistic structures and cultural nuances. Consequently, the performance of multilingual AI varies significantly, often lagging behind English capabilities. For instance, GPT-4 scored 85% on a question-answering test in English but only 62% in Telugu, a language with nearly 100 million speakers.

The Consequences in Content Moderation

The consequences are particularly evident in content moderation. Tech companies invest disproportionately in moderating English content; Facebook reportedly dedicated 87% of its anti-misinformation budget to English, though only 9% of its users are English speakers. This leads to inconsistent and often ineffective moderation in other languages, allowing harmful content like hate speech and misinformation to spread, especially in regions like the Global South. Efforts to build multilingual models sometimes rely on poorly translated or low-quality data for less-resourced languages, further hindering effectiveness.

Figure: Top 20 most spoken languages worldwide. Source: The Need for Multilingual AI in Developing Countries

Today’s digital world generates vast amounts of diverse content. However, access to high-quality, relevant information is not equally shared. Languages like English dominate with more content that is richer and better organized. This creates an uneven global information landscape. Addressing this requires significant investment in creating comprehensive datasets for a wider range of languages and developing better methods to evaluate multilingual AI performance across diverse linguistic contexts. Building truly multilingual AI is crucial for bridging the language gap and ensuring equitable digital access globally.

Users who don’t speak English face disadvantages in search, content quality, and content availability, and have access to fewer usable or useful generative AI models. Most current AI models (LLMs), as they are built today, tend to worsen this inequity rather than reduce it.

The Emergence of Large Language Models (LLMs)

The emergence of Large Language Models (LLMs) has only exacerbated this problem. ChatGPT’s performance is generally better for English than other languages, especially for higher-level tasks that require more complex reasoning abilities (e.g., named entity recognition, question answering, common sense reasoning, and summarization). The performance differences can be substantial for complex tasks and lower-resource languages. Most of the world’s non-English speakers have a substantially lower quality experience with LLMs if they are not somehow able to interact in English.

Figure: GenAI Training Data Reflects a Strong English and European Bias. Source: CSA Research 2023

Efforts to mitigate the inherent English language bias within current LLMs predominantly focus on enriching their core training data with substantial volumes of non-English data. However, a significant obstacle arises from the scarcity and difficulty in obtaining this multilingual data, especially when aiming for the scale, volume, and diversity that characterize existing English language datasets.

Overcoming this data deficit necessitates considerable investment in non-English data creation and acquisition. Consequently, automated translation emerges as a crucial and indispensable solution to significantly enhance the user experience for the majority of non-English speaking users interacting with LLMs and other online platforms. By leveraging effective machine translation, we can bridge the language barrier and ensure equitable access to information and functionalities for a global audience.

The Growth of Online Machine Translation 

English is the closest thing there is to a global lingua franca. It is the dominant language in science, popular culture, higher education, international politics, and global capitalism; it has the most total speakers and the third-most first-language speakers.

The increasing need for accessible information in users’ preferred languages has fueled a significant demand for automated language translation.

For example, it is estimated that Google Translate alone has approximately 500 million daily users, and machine translation portals have been translating trillions of words daily for several years, with volumes projected to continue growing.

Figure: How many translators would it take to translate 0.01% of the world's daily content into the 100 most economically significant languages

CSA estimated that in 2020, daily digital content creation exceeded 3 quintillion bytes, yet less than 1% is professionally translated due to human resource constraints. Translating just 0.01% of this volume into one language would require 1,000 translators working for 61,375 years. A more recent calculation shows that only a tiny fraction of the total amount of content created every day is ever translated.

The Drivers for the Demand

We see several factors that indicate a world where the value, usage, and impact of robust, reliable, automated translation will only grow. The drivers for the ever-increasing demand include the following:

Figure: Putting Petabytes in its place

1. Content Explosion

The explosion of digital content has far outpaced the capacity of human translation, making machine translation (MT) an indispensable solution. As the quality of MT improves, its adoption is accelerating across both enterprise and personal use cases. MT is no longer just a convenience: it’s a necessity. Even a small increase, such as 0.01% more translated content, could result in 100 to 1000 times the volume currently being translated. To grasp the scale: 11.36 exabytes of text are produced every day, equivalent to around 250 million DVDs. Yet only 440GB of that content is translated, and over 99% of it by machines. That means just 0.0000038% of the world’s digital content is currently translated, highlighting the enormous untapped opportunity.
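The share quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes decimal (SI) units, since the article does not say which convention it uses, and the function name `translated_share_percent` is made up for this example.

```python
# Assumption: decimal (SI) units, i.e. 1 GB = 10**9 bytes and 1 EB = 10**18 bytes.
GB = 10**9
EB = 10**18


def translated_share_percent(translated_bytes: float, produced_bytes: float) -> float:
    """Return the translated volume as a percentage of the produced volume."""
    return translated_bytes / produced_bytes * 100


daily_text = 11.36 * EB       # text produced per day, as quoted in the article
daily_translated = 440 * GB   # text translated per day, as quoted in the article

share = translated_share_percent(daily_translated, daily_text)
print(f"{share:.7f}%")
```

Under these assumptions the share comes out at roughly 0.0000039%, in line with the 0.0000038% figure quoted above, which appears to be truncated rather than rounded.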

2. Better Global LLM Experience

The most effective and beneficial use of LLM technology is typically experienced by users who interact with models in English, so billions of global users currently get the best outcomes only when engaging in English. Machine translation acts as a crucial bridge for non-English speakers, translating content into their preferred language and allowing them to get higher-quality, more relevant results from large language models (LLMs).

3. MT Technological Advancements 

The use of online machine translation portals has grown steadily, driven by the increasing number of supported languages and the consistent improvement in the quality of “raw” machine translation. Advances in artificial intelligence, particularly recent developments with LLM-based MT, continue to enhance machine translation quality, leading to significantly better contextual understanding and ongoing improvements in fluency. Billions of users now translate trillions of words daily, and volumes are clearly on the rise.

4. Cost Efficiency

MT costs a fraction of human translation (estimated to be 0.05% or less of human translation costs), and the demand for rapid translation continues to grow.

5. Enterprise Adoption Growth

Businesses are increasingly integrating language translation into their global operations, leading to greater MT integration and workflow automation with content management systems, customer service platforms, and eCommerce systems to enhance the global customer experience.

While we already see significant growth in the use of automated language translation, we have only begun to scratch the surface of what is possible and yet to come.

This usage will expand as the technology improves and adapts more effectively to unique customer needs. As the quality of AI translation improves, more people across global enterprises will utilize the technology for ad-hoc needs to share, communicate, understand, and disseminate relevant content as business or personal requirements dictate.

The Machine Translation User Experience

Despite the significant increase in online machine translation usage, the growing range of translatable content, and the expanding number of use cases, the fundamental automated translation user experience has barely evolved in two decades. Typically, using online machine translation portals still involves cutting and pasting text for translation. Even with improvements to handle images, websites, and documents, the underlying process remains essentially unchanged over the last decade.

MT: The Old Way

Figure: MT, the old way

Online translation tools can now process many types of content. However, few users can refine translations with expert input. Only specialists can truly enhance machine translations by adding context and nuance. Most users, even professionals, don’t know the target language well. This makes it hard to judge translation accuracy. Errors often go unnoticed, leading to confusion or miscommunication. This is especially risky in sensitive business contexts. Machine translation is useful for understanding the gist of a text. But it’s not reliable for professional use without expert review.

We are seeing ongoing technological advancements as machine translation (MT) begins to utilize the additional capabilities offered by LLMs. The recent WMT24 conference, a prominent academic and industry gathering for MT research, highlighted significant improvements in translation quality, with LLM-based MT proving increasingly competitive with, and often superior to, traditional MT methods. An LLM-based MT system built on Claude 3.5 Sonnet achieved the top results in the WMT24 evaluations, winning in nine language pairs.

MT: The New Way

Lara compares favorably with Claude 3.5 Sonnet in translation quality, representing a significant advancement in AI-driven translation, and builds upon ModernMT’s established reputation and experience for superior MT quality and adaptability. By leveraging a human-optimized approach that integrates specialized data and actively captures corrective feedback, it achieves superior fluency and naturalness, often approaching the quality and nuance of human translation.

Lara is a next-generation automated translation technology that is a major step forward from the static, minimal-control MT experience of the past. It enables straightforward and rapid incorporation of context and style.

Figure: MT, the new way

Lara seamlessly offers greater flexibility in generating multiple translation variation options. It incorporates contextual information promptly and provides a quality assessment of the automated translation (sometimes with linguistically relevant commentary). This results in more trusted output at higher quality levels being produced immediately.

This simple example shows how translation variations offer multiple options with minimal effort. Users can trust the output quality, even without knowing the target language (for more detail, see the Translation Styles in Lara).

This is the Faithful version:

Figure: Faithful version

Here is the Fluid version:

Figure: Fluid version

Here is the Creative version, where the model takes more risks in changing the phrasing but rates itself more strictly for taking these liberties:

Figure: Creative version

And here is what happens when the user asks the model to write in an encyclopaedic style using the context window:

Figure: Context applied

Concluding Summary 

Substantial and Growing Demand

  • The demand for automated translation is growing substantially and is projected to continue rising for years to come. 
  • The global machine translation market is expected to grow at a compound annual growth rate (CAGR) of approximately 13.5% to 15.9% through 2030, with the market size expanding by over $1.2 billion between 2024 and 2028.
  • This demand arises from the growing need for content localization, multilingual AI adoption, increasing internet penetration, cost efficiencies, and advancements in translation technologies.
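To put the quoted CAGR range in perspective, the sketch below compounds it over a fixed horizon. The six-year window (2024 to 2030) is an assumption, since the text does not state the base year, and `growth_multiple` is an illustrative helper rather than part of any cited forecast.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` once a year for `years` years."""
    return (1 + cagr) ** years


YEARS = 6  # assumption: 2024 -> 2030; the article does not give the base year

low = growth_multiple(0.135, YEARS)   # lower end of the quoted CAGR range
high = growth_multiple(0.159, YEARS)  # upper end of the quoted CAGR range
print(f"{low:.2f}x to {high:.2f}x")
```

Under that assumption, the market would a little more than double over the period, growing by a factor of about 2.1 to 2.4.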

Enhanced Demand with Improved Quality and Flexibility 

  • As automated translation technology improves in quality and flexibility, particularly through advances in NMT and large language models (LLMs), demand increases.
  • LLMs demonstrate exceptional proficiency in comprehending and producing text that closely resembles human language, yielding more refined and precise translations.
  • Newer systems adapt to unique customer needs by using specific context and style. This builds user confidence and increases demand. Customization and personalization are growing trends in machine translation and AI. They will be essential to future success.
  • Organizations are seeking more customizable, context-aware, and culturally sensitive translations that can adapt to unique use-case requirements, and AI-driven solutions are delivering on these needs. 
  • Advances in LLMs and hybrid systems have significantly enhanced translation accuracy and adaptability. For instance, LLM-based systems like Claude 3.5 Sonnet and Gemini-1.5 Pro outperformed traditional NMT systems in the WMT24 benchmark, particularly in low-resource languages.

Demand Amplified by Content Explosion

  • The ongoing explosion of digital content is a major growth driver for automated translation. 
  • The sheer volume of digital content creation has overwhelmed traditional human translation capacities, making machine translation inevitable. 
  • People create over 11 exabytes of text every day. Professionals translate less than 1% of it. Automated translation is the only scalable way to manage this growing volume.
  • Even a marginal increase in the share of content translated (e.g., 0.01%) could multiply volumes by 100 to 1,000 times, producing a massive surge in demand for automated translation.

Increased Demand Driven by Global LLM Usage

  • As more of the world leverages LLMs for research, writing, and specialized knowledge tasks, the need for high-quality automated translation grows. 
  • Over 40% of companies experimented with LLMs in 2024, and 75% plan to adopt them by 2025. 
  • Non-English speakers rely on automated translation to fully benefit from LLMs, which are often optimized for English, thus bridging the gap and enabling broader access to AI-powered resources.
  • Machine translation serves as a bridge, enabling billions of global users to receive more useful results from LLMs by translating content into their preferred language.

Growing Enterprise Multilingualism

  • Global enterprises increasingly seek to operate in multiple languages at scale, integrating machine translation into their content management, customer service, and e-commerce platforms. This enables them to reach diverse markets, provide multilingual customer support, and enhance the global customer experience, fueling demand for robust automated translation solutions.
  • Major companies are expanding their linguistic diversity: Netflix increased from 17 to 26 languages in two years, Uber added seven more languages to its apps and websites, Ford’s website is available in around 42 languages, and Jack Daniel’s is available in nearly 23 languages.
  • Reaching new markets unlocks major growth opportunities for companies. It’s a practical and scalable way to increase revenue.

Sustained Demand Due to Evolving AI Translation Quality and Flexibility

  • AI-powered translation keeps improving in quality and flexibility. LLM-based systems now outperform traditional methods in many language pairs. They offer real-time translation, dynamic adaptation, and integration with other AI tools. These advanced features set a new standard.
  • AI translation is shifting away from static, limited-control systems. Newer models allow easy integration of context and personalized style.
  • For the first time, companies of all sizes can efficiently and affordably scale localization for dozens of different cultures, regions, and countries.
  • Future AI will better handle emotion, cultural context, and idioms. Some platforms may reach high accuracy in these areas by late 2025.
  • These advancements make automated translation more reliable and attractive for enterprise and individual users, accelerating adoption. 
  • Technologies like Lara give users more control, context, and style options. This marks a shift toward higher-quality, flexible AI translation. As trust and usability grow, demand continues to rise.

This analysis indicates that the demand for AI translation will continue to grow substantially as technology improves, content volumes increase, and global businesses seek to communicate effectively across language barriers.

FAQ

Why are AI systems mostly optimized for English?

Most AI models, including LLMs, are trained on massive datasets that are primarily in English. This creates a bias, leading to reduced performance for speakers of less-represented languages.

How does machine translation help non-English speakers?

Machine translation acts as a bridge, allowing users to access English-based content and get more accurate, useful results from AI tools.

Why can’t all digital content be translated?

Over 11 exabytes of text are created every day, but less than 1% is professionally translated. Human resources can’t keep up, so automated translation is the only scalable solution.

Is machine translation reliable?

It’s useful for understanding general meaning, but for professional or sensitive use, expert review is recommended to ensure accuracy and clarity.

What are LLMs and why do they perform better in English?

LLMs (Large Language Models) like ChatGPT are advanced AI systems trained on huge text corpora. Their performance is strongest in English due to the volume and richness of English training data.

How is Lara different from traditional machine translation tools?

Lara enables quick integration of context and style, offering high-quality, natural translations. It also provides multiple translation options and quality indicators, features that traditional MT lacks.

What’s driving the growing demand for machine translation?

Key drivers include the explosion of digital content, improved translation quality, the rise of global LLM usage, enterprise localization needs, and the low cost of MT compared to human translation.

Can machine translation fully replace human translators?

Not entirely. While MT handles scale and speed, human translators remain essential for nuanced, creative, or culturally sensitive content. The future lies in hybrid systems.


This article is about:

  • AI Language Bias: Most AI systems and LLMs prioritize English, disadvantaging speakers of other languages due to limited training data.
  • Growing MT Demand: There is significant and increasing global demand for automated translation to access vast digital information.
  • LLM Access: Machine translation serves as a vital bridge for non-English speakers to get useful results from English-optimized LLMs.
  • MT Cost Efficiency: Automated translation is a highly cost-effective solution, costing a fraction of human translation.
  • Enterprise MT Adoption: Enterprises increasingly adopt machine translation into global operations to enhance customer experience.