Some content doesn’t arrive as text. Images contain text that needs translating. Audio files need to be understood across languages. Standard translation APIs don’t handle either well — and most don’t handle audio at all.
Lara Translate’s Adaptive Translation API covers both. The image translation endpoint uses OCR to extract and translate text from images, returning either a translated image or structured text data. The audio translation endpoint runs an asynchronous pipeline: upload an audio file, receive a translated audio file back.
This article explains how both endpoints work, what they support, and how to integrate them.
TL;DR

Why it matters: Image and audio translation closes the gap between what gets created and what gets localized. Marketing teams produce image-based content at scale; training and media teams produce audio at scale. Both end up untranslated because there is no clean API path to handle them. Lara Translate's endpoints for both modalities apply the same context control, style selection, and glossary enforcement that the text and document endpoints use, so visual and audio content can enter the same multilingual workflow as everything else.
Quick answer
Lara Translate’s image translation endpoint uses OCR to extract text from images and returns either a translated image or structured translated text. The audio translation endpoint is asynchronous: upload an audio file, poll for status, and download the translated audio. Both support context instructions and custom glossaries. Full documentation at developers.laratranslate.com.
Image translation: how the OCR-based pipeline works
The image translation endpoint takes an image file, uses OCR to extract the text, translates it, and returns the output in one of two forms, depending on which method you call.

translate method: Returns the translated image — the text in the image is replaced with the translated text, rendered in the same visual context. The output is an image file in the same format as the input. This is the method to use when you need the translated content to remain visually embedded in the image.
translate_text method: Returns the extracted and translated text as structured data, rather than a new image. Use this when you need the translated strings for downstream processing — to feed them into a localization pipeline, store them in a database, or render them programmatically in your own UI.
Both methods support context instructions and custom glossaries, so the translated output follows your terminology and domain conventions.
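When you use translate_text, the structured output usually feeds a downstream localization pipeline. A minimal sketch of that handoff, under the assumption that each entry exposes the source string and its translation (the actual field names in the SDK response may differ):

```python
import json

def translations_to_catalog(entries):
    """Map each extracted source string to its translation, producing a
    key-value catalog ready to store in a database or render in your UI."""
    return {e["source"]: e["translation"] for e in entries}

# Illustrative data shaped like a translate_text response
catalog = translations_to_catalog([
    {"source": "Buy now", "translation": "Jetzt kaufen"},
    {"source": "Free shipping", "translation": "Kostenloser Versand"},
])
print(json.dumps(catalog, ensure_ascii=False, indent=2))
```

The same catalog structure works whether you persist the strings, diff them against a previous run, or hand them to a template renderer.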
Supported image formats
The image translation endpoint supports: AVIF, BMP, GIF, HEIC, HEIF, PNG, JPG, JPEG, TIF, TIFF, WEBP. For the full reference, see Supported document formats in Lara document translation.
For guidance on using image translation in Lara Translate, see: Translating images with Lara image-to-image.
OCR accuracy and image quality
OCR performance depends on the quality of the source image. Text that is clearly legible, high-contrast, and rendered in a standard font will extract accurately. Low-resolution images, stylized or handwritten text, text on complex backgrounds, and text at steep angles are all likely to reduce OCR accuracy.
For best results:
- Use images with a minimum resolution of 300 DPI for printed content
- Ensure text has high contrast against the background
- Avoid heavily styled or decorative fonts for content that needs to be machine-readable
- For image-based PDFs, note that OCR must be explicitly enabled during upload — it is not applied automatically. See: Translating images inside PDF documents.
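The checks above can be partially automated before upload. A minimal pre-flight sketch; the thresholds mirror the guidelines in this list and are illustrative, not Lara API requirements, and the function name is hypothetical:

```python
def ocr_quality_warnings(width, height, dpi=None, min_dpi=300, min_pixels=500):
    """Flag image properties likely to reduce OCR accuracy before upload.

    width, height: image dimensions in pixels
    dpi: resolution, if the file's metadata carries it (many formats omit it)
    Returns a list of human-readable warnings; empty means no obvious issues.
    """
    warnings = []
    if dpi is not None and dpi < min_dpi:
        warnings.append(
            f"{dpi} DPI is below the recommended {min_dpi} for printed content"
        )
    if min(width, height) < min_pixels:
        warnings.append(
            f"{width}x{height} px may be too small for reliable text extraction"
        )
    return warnings
```

Running this before the API call lets you reject or upscale poor scans early instead of paying for a translation pass that OCR will garble.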
Image translation: Python SDK example
```python
from laratranslate import Lara

lara = Lara(
    access_key_id="YOUR_ACCESS_KEY_ID",
    access_key_secret="YOUR_ACCESS_KEY_SECRET"
)

# Translate image: returns the translated image file
result = lara.images.translate(
    file="./banner-en.png",
    source="en",
    target="de"
)
with open("./banner-de.png", "wb") as f:
    f.write(result.content)

# Translate image: returns extracted, translated text as structured data
text_result = lara.images.translate_text(
    file="./banner-en.png",
    source="en",
    target="de"
)
print(text_result.translations)
```
Audio translation: how the async file pipeline works
The audio translation endpoint is a three-step asynchronous pipeline. It is designed for file-based audio translation — uploading a recorded audio file and receiving a translated audio file back.
Step 1: Upload. Send the audio file to the API. The call returns a job ID immediately.
Step 2: Poll. Use the job ID to check translation status. The job moves through processing states until it completes.
Step 3: Download. Once the status indicates completion, retrieve the translated audio file.
This async pattern is standard for audio processing, where translation time depends on file length and server load. Build polling logic into your integration rather than expecting a synchronous response. For short files, polling a few times with a short delay is enough. For long recordings, longer polling intervals reduce unnecessary API calls.
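The polling advice above can be made concrete with exponential backoff: short delays at first, growing toward a cap so long jobs don't generate needless API calls. A minimal sketch, assuming a `fetch_status` callable that returns the job's current state string (the real SDK method and status names may differ):

```python
import time

def poll_until_done(fetch_status, initial_delay=2.0, max_delay=30.0, timeout=600.0):
    """Poll fetch_status() until it reports a terminal state, doubling the
    delay between checks up to max_delay. Raises TimeoutError past timeout."""
    delay = initial_delay
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in ("completed", "failed"):  # assumed terminal states
            return status
        time.sleep(delay)
        delay = min(delay * 2, max_delay)
    raise TimeoutError("audio translation job did not finish in time")
```

Treating "failed" as terminal matters in production: a loop that only exits on "completed" will spin forever if the job errors out.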
Audio translation: Python SDK example
```python
import time
from laratranslate import Lara

lara = Lara(
    access_key_id="YOUR_ACCESS_KEY_ID",
    access_key_secret="YOUR_ACCESS_KEY_SECRET"
)

# Step 1: Upload the audio file; returns a job handle immediately
job = lara.audio.translate(
    file="./briefing-en.mp3",
    source="en",
    target="es",
    instructions="Internal product briefing for sales team. Formal tone."
)

# Step 2: Poll until the job completes
while job.status != "completed":
    time.sleep(5)
    job = lara.audio.get_job(job.id)

# Step 3: Download the translated audio
result = lara.audio.download(job.id)
with open("./briefing-es.mp3", "wb") as f:
    f.write(result.content)
```
Audio translation options
| Parameter | Description |
|---|---|
| source | Source language code (e.g., "en") |
| target | Target language code (e.g., "es") |
| instructions | Context instructions: audience, tone, domain, preferred terms |
| glossaries | Custom glossary IDs to enforce terminology in the translation |
Comparing image and audio translation capabilities
| Capability | Image translation | Audio translation |
|---|---|---|
| Pipeline type | Synchronous | Asynchronous (upload / poll / download) |
| Output options | Translated image (translate) or structured text data (translate_text) | Translated audio file |
| Context instructions | Yes | Yes |
| Custom glossaries | Yes | Yes |
| Languages | 206 languages | 206 languages |
| Primary use case | Marketing visuals, product screenshots, signage, image-based documents | Recorded briefings, training content, voice memos, media files |
Related articles in this series
- Lara Translate Adaptive Translation API: full overview — all supported content types, capabilities, and connection options
- Adaptive API for Text — context, styles, glossaries, Lara Think, and the text translation workflow
- Adaptive API for Documents — 70 file formats, layout preservation, localization formats, and bulk translation
FAQ
How does image translation work in Lara Translate’s Adaptive API?
The image translation endpoint uses OCR to extract text from an image, translates it, and returns either a translated image (with the text replaced in the visual) or the extracted translated text as structured data. Use the translate method to get a translated image file back, or translate_text to get the translated strings as structured output for downstream processing.
What image formats does the translation API support?
Supported image formats include AVIF, BMP, GIF, HEIC, HEIF, PNG, JPG, JPEG, TIF, TIFF, and WEBP. For the complete list, see Supported document formats in Lara document translation.
Does the API translate text inside PDF images?
Text embedded in images within a PDF — scanned pages, screenshots, or image-based pages — requires OCR, which must be explicitly enabled during upload. It is not applied automatically. Without enabling image translation, text in images passes through untranslated. See: Translating images inside PDF documents.
How does audio translation work in Lara Translate’s Adaptive API?
The audio translation endpoint is an asynchronous file pipeline. You upload an audio file, receive a job ID immediately, poll for translation status, and download the translated audio file once the job is complete. It supports context instructions and custom glossaries. The pipeline is designed for file-based audio translation — recorded audio that you upload and receive back translated.
How long does audio translation take?
Processing time depends on file length and server load. Build polling logic into your integration with appropriate delays between status checks. For short files (under a minute), a few seconds between polls is sufficient. For longer recordings, use longer polling intervals to avoid unnecessary API calls.
Can I apply glossaries to image and audio translation?
Yes. Both the image and audio translation endpoints support custom glossaries. Upload your terminology once in the Lara Translate platform and reference it in your API calls to enforce consistent terminology in translated output. See: How glossaries work in Lara Translate.
This article is about:
Product: Lara Translate Adaptive Translation API — image and audio translation capabilities
Topic: Image translation API, audio translation API, OCR translation, machine translation for non-text content
Concepts covered: OCR-based image translation, translate method (image out), translate_text method (structured data out), async audio file pipeline, upload/poll/download, context instructions, custom glossaries, 206 languages, image quality for OCR, PDF image translation opt-in
Content types: Image translation, audio file translation
Related topics: visual content localization, image localization API, audio localization, multimedia translation, OCR translation, scanned document translation




