Adaptive Translation API for Images and Audio: OCR-Based and Asynchronous


Some content doesn’t arrive as text. Images contain text that needs translating. Audio files need to be understood across languages. Standard translation APIs don’t handle either well — and most don’t handle audio at all.

Lara Translate’s Adaptive Translation API covers both. The image translation endpoint uses OCR to extract and translate text from images, returning either a translated image or structured text data. The audio translation endpoint runs an asynchronous pipeline: upload an audio file, receive a translated audio file back.

This article explains how both endpoints work, what they support, and how to integrate them.

TL;DR

  • What: Two API endpoints: image translation (OCR-based, returns translated image or structured text) and audio translation (async file pipeline, returns translated audio).
  • Why: Visual and audio content are often left untranslated because generic MT APIs don’t support them. The Adaptive API handles both with the same customization layer as text and document translation.
  • How: Image: upload image, receive translated image (translate) or extracted translated text (translate_text). Audio: upload audio file, poll for completion, download translated audio.
  • Watch for: OCR accuracy depends on image quality and text clarity. Audio translation is asynchronous — build polling logic into your integration.
  • Tooling: Lara Translate Adaptive API (REST, Python SDK). 206 languages. Glossaries and context instructions supported on both endpoints.

Why it matters

Image and audio translation closes the gap between what gets created and what gets localized. Marketing teams produce image-based content at scale. Training and media teams produce audio at scale. Both end up untranslated because there’s no clean API path to handle them. Lara Translate’s endpoints for both modalities apply the same context control, style selection, and glossary enforcement that the text and document endpoints use — so visual and audio content can enter the same multilingual workflow as everything else.

Quick answer

Lara Translate’s image translation endpoint uses OCR to extract text from images and returns either a translated image or structured translated text. The audio translation endpoint is asynchronous: upload an audio file, poll for status, and download the translated audio. Both support context instructions and custom glossaries. Full documentation at developers.laratranslate.com.


Image translation: how the OCR-based pipeline works

The image translation endpoint takes an image file, uses OCR to extract the text, translates it, and returns the output in one of two forms, depending on which method you call.


translate method: Returns the translated image — the text in the image is replaced with the translated text, rendered in the same visual context. The output is an image file in the same format as the input. This is the method to use when you need the translated content to remain visually embedded in the image.

translate_text method: Returns the extracted and translated text as structured data, rather than a new image. Use this when you need the translated strings for downstream processing — to feed them into a localization pipeline, store them in a database, or render them programmatically in your own UI.

Both methods support context instructions and custom glossaries, so the translated output follows your terminology and domain conventions.

Supported image formats

The image translation endpoint supports: AVIF, BMP, GIF, HEIC, HEIF, PNG, JPG, JPEG, TIF, TIFF, WEBP. For the full reference, see Supported document formats in Lara document translation.
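A simple client-side guard can reject unsupported files before an API call is made. This is a hypothetical helper, not part of the Lara SDK; the allowlist mirrors the formats listed above.

```python
from pathlib import Path

# Formats supported by the image translation endpoint, per the list above.
SUPPORTED_IMAGE_EXTENSIONS = {
    ".avif", ".bmp", ".gif", ".heic", ".heif", ".png",
    ".jpg", ".jpeg", ".tif", ".tiff", ".webp",
}

def is_supported_image(path: str) -> bool:
    """Return True if the file extension is in the supported set."""
    return Path(path).suffix.lower() in SUPPORTED_IMAGE_EXTENSIONS
```

Checking locally avoids a round trip to the API for files that would be rejected anyway.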

For guidance on using image translation in Lara Translate, see: Translating images with Lara image-to-image.

OCR accuracy and image quality

OCR performance depends on the quality of the source image. Text that is clearly legible, high-contrast, and rendered in a standard font will extract accurately. Low-resolution images, stylized or hand-written text, text on complex backgrounds, and text at steep angles are all likely to reduce OCR accuracy.

For best results:

  • Use images with a minimum resolution of 300 DPI for printed content
  • Ensure text has high contrast against the background
  • Avoid heavily styled or decorative fonts for content that needs to be machine-readable
  • For image-based PDFs, note that OCR must be explicitly enabled during upload — it is not applied automatically. See: Translating images inside PDF documents.
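The 300 DPI guideline can be checked with simple arithmetic before uploading a scan: effective DPI along one axis is the pixel dimension divided by the physical size in inches. This sketch only does the arithmetic — it assumes you already know the pixel and print dimensions and does not read image files.

```python
def effective_dpi(pixels: int, inches: float) -> float:
    """Effective resolution of a scan along one axis."""
    return pixels / inches

def meets_print_guideline(width_px: int, height_px: int,
                          width_in: float, height_in: float,
                          min_dpi: int = 300) -> bool:
    """True if both axes meet the minimum DPI recommended above."""
    return (effective_dpi(width_px, width_in) >= min_dpi
            and effective_dpi(height_px, height_in) >= min_dpi)
```

For example, an 8.5 × 11 inch page scanned at 2550 × 3300 pixels is exactly 300 DPI; the same page at 1275 × 1650 pixels is only 150 DPI and likely to hurt OCR accuracy.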

Image translation: Python SDK example

from laratranslate import Lara

lara = Lara(
    access_key_id="YOUR_ACCESS_KEY_ID",
    access_key_secret="YOUR_ACCESS_KEY_SECRET"
)

# Translate image — returns translated image file
result = lara.images.translate(
    file="./banner-en.png",
    source="en",
    target="de"
)

with open("./banner-de.png", "wb") as f:
    f.write(result.content)

# Translate image — returns extracted translated text as structured data
text_result = lara.images.translate_text(
    file="./banner-en.png",
    source="en",
    target="de"
)

print(text_result.translations)

Translate images and audio via API

Context-aware, glossary-enforced, across 206 languages. OCR for images, async pipeline for audio.

Try the Adaptive Translation API

Audio translation: how the async file pipeline works

The audio translation endpoint is a three-step asynchronous pipeline. It is designed for file-based audio translation — uploading a recorded audio file and receiving a translated audio file back.

Step 1: Upload. Send the audio file to the API. The call returns a job ID immediately.

Step 2: Poll. Use the job ID to check translation status. The job moves through processing states until it completes.

Step 3: Download. Once the status indicates completion, retrieve the translated audio file.

This async pattern is standard for audio processing, where translation time depends on file length and server load. Build polling logic into your integration rather than expecting a synchronous response. For short files, polling a few times with a short delay is enough. For long recordings, longer polling intervals reduce unnecessary API calls.
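The polling guidance above can be sketched as a capped exponential backoff: short jobs get quick early polls, long jobs settle at a maximum interval. The generator below is an illustrative helper, not part of the Lara SDK; in a real integration each yielded delay would be passed to time.sleep between status checks.

```python
import itertools

def backoff_delays(initial: float = 2.0, factor: float = 1.5,
                   cap: float = 30.0):
    """Yield polling delays that grow geometrically up to a cap."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay *= factor

# First few delays: 2.0, 3.0, 4.5, 6.75, 10.125 seconds
schedule = list(itertools.islice(backoff_delays(), 5))
```

The cap matters for long recordings: once the delay reaches it, status checks happen at a steady, low rate instead of hammering the API.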

Audio translation: Python SDK example

import time
from laratranslate import Lara

lara = Lara(
    access_key_id="YOUR_ACCESS_KEY_ID",
    access_key_secret="YOUR_ACCESS_KEY_SECRET"
)

# Step 1: Upload audio file
job = lara.audio.translate(
    file="./briefing-en.mp3",
    source="en",
    target="es",
    instructions="Internal product briefing for sales team. Formal tone."
)

# Step 2: Poll until the job completes (or fails)
while job.status not in ("completed", "failed"):
    time.sleep(5)
    job = lara.audio.get_job(job.id)

if job.status == "failed":
    raise RuntimeError(f"Audio translation failed for job {job.id}")

# Step 3: Download translated audio
result = lara.audio.download(job.id)

with open("./briefing-es.mp3", "wb") as f:
    f.write(result.content)

Audio translation options

Table 1. Configuration options for the audio translation endpoint
Parameter | Description
source | Source language code (e.g., "en")
target | Target language code (e.g., "es")
instructions | Context instructions: audience, tone, domain, preferred terms
glossaries | Custom glossary IDs to enforce terminology in the translation

Comparing image and audio translation capabilities

Table 2. Image vs. audio translation in the Lara Translate Adaptive API
Capability | Image translation | Audio translation
Pipeline type | Synchronous | Asynchronous (upload / poll / download)
Output options | Translated image (translate) or structured text data (translate_text) | Translated audio file
Context instructions | Yes | Yes
Custom glossaries | Yes | Yes
Languages | 206 languages | 206 languages
Primary use case | Marketing visuals, product screenshots, signage, image-based documents | Recorded briefings, training content, voice memos, media files


FAQ

How does image translation work in Lara Translate’s Adaptive API?

The image translation endpoint uses OCR to extract text from an image, translates it, and returns either a translated image (with the text replaced in the visual) or the extracted translated text as structured data. Use the translate method to get a translated image file back, or translate_text to get the translated strings as structured output for downstream processing.

What image formats does the translation API support?

Supported image formats include AVIF, BMP, GIF, HEIC, HEIF, PNG, JPG, JPEG, TIF, TIFF, and WEBP. For the complete list, see Supported document formats in Lara document translation.

Does the API translate text inside PDF images?

Text embedded in images within a PDF — scanned pages, screenshots, or image-based pages — requires OCR, which must be explicitly enabled during upload. It is not applied automatically. Without enabling image translation, text in images passes through untranslated. See: Translating images inside PDF documents.

How does audio translation work in Lara Translate’s Adaptive API?

The audio translation endpoint is an asynchronous file pipeline. You upload an audio file, receive a job ID immediately, poll for translation status, and download the translated audio file once the job is complete. It supports context instructions and custom glossaries. The pipeline is designed for file-based audio translation — recorded audio that you upload and receive back translated.

How long does audio translation take?

Processing time depends on file length and server load. Build polling logic into your integration with appropriate delays between status checks. For short files (under a minute), a few seconds between polls is sufficient. For longer recordings, use longer polling intervals to avoid unnecessary API calls.

Can I apply glossaries to image and audio translation?

Yes. Both the image and audio translation endpoints support custom glossaries. Upload your terminology once in the Lara Translate platform and reference it in your API calls to enforce consistent terminology in translated output. See: How glossaries work in Lara Translate.


Translate visual and audio content via API

OCR-based image translation and async audio file translation. Context-aware, glossary-enforced, 206 languages.

Get started with the Adaptive Translation API


This article is about:

Product: Lara Translate Adaptive Translation API — image and audio translation capabilities

Topic: Image translation API, audio translation API, OCR translation, machine translation for non-text content

Concepts covered: OCR-based image translation, translate method (image out), translate_text method (structured data out), async audio file pipeline, upload/poll/download, context instructions, custom glossaries, 206 languages, image quality for OCR, PDF image translation opt-in

Content types: Image translation, audio file translation

Related topics: visual content localization, image localization API, audio localization, multimedia translation, OCR translation, scanned document translation

Niccolò Fransoni
Content Strategy Manager @ Lara Translate. Niccolò Fransoni has 15 years of experience in content marketing & communication. He’s passionate about AI in all its forms and believes in the power of language.