Adaptive Document Translation API: Translate 70+ File Formats with Layout Preserved


Document translation is harder than text translation. Formatting breaks. Tables collapse. Columns shift. Localization formats like XLIFF need their structure left untouched while only the translatable strings change.

Lara Translate’s Adaptive Document Translation API addresses each of these problems. It preserves layout, supports 70+ file formats across office, publishing, localization, and image types, and applies the same context, style, glossary, and translation memory controls available in the text API, all in a single call.

This article covers how the document translation endpoint works, what formats it supports, how to configure it, and what to watch for when integrating it into a production pipeline.

TL;DR

  • What: Lara Translate’s document translation API translates 70 file formats while preserving the original layout, across 206 languages.
  • Why: Manual document translation breaks formatting, creates version control problems, and doesn’t scale. The API handles structure and content together.
  • How: Upload the document, specify language pair and options (style, context, glossary, TM), receive the translated file. Bulk and multilingual in a single call.
  • Watch for: PDF files return DOCX by default — pass outputFormat="pdf" to get a PDF back. Image content in PDFs requires OCR to be explicitly enabled during upload. Minimum billing charge of 20,000 characters per document on Pro and Team plans.
  • Tooling: Lara Translate Adaptive API, Lara CLI, Lara MCP Server, Trados/MemoQ/MateCat plugins. See the full format list.

Why it matters

Teams handling multilingual documentation at scale need more than translated text — they need translated documents that look and behave like the originals. The document translation API handles layout preservation, localization format integrity, and terminology consistency in a single call, removing the manual reconciliation step that typically eats post-translation time. For localization engineers, the support for formats like XLIFF, PO, and TXML also means the API slots directly into existing localization toolchains without conversion overhead.

Quick answer

Lara Translate’s document translation API accepts 70 file formats, translates while preserving layout and structure, and returns the translated file in the original format (with one exception: PDF source files return DOCX by default). It applies context, style, glossary, and translation memory settings per call, and supports multilingual output in a single request. Access via REST API or the Python SDK. Full documentation at developers.laratranslate.com.


What does the document translation API do?

The document translation endpoint takes a file, translates it into one or more target languages, and returns the translated file with the original layout intact. Tables stay as tables. Columns stay as columns. Slide layouts stay as slide layouts.


Beyond layout preservation, every document translation call can include:

  • A translation style (Fluid, Faithful, or Creative)
  • Context instructions (audience, tone, domain, preferred terms)
  • Custom glossaries for consistent terminology
  • Translation memories for reusing approved translations
  • Multiple target languages in a single call

The same customization layer that applies to text translation applies here. The difference is that the API handles the file extraction, translation, and reconstruction automatically — you don’t need to strip the content, translate it, and reassemble the document yourself.

Which file formats does the document translation API support?

Lara Translate supports 70 file formats across nine categories. For the authoritative and up-to-date list, see Supported document formats in Lara document translation.

Table 1. File format categories supported by the Lara Translate document translation API
Category | Formats | Notes
Office documents | DOC, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF, XLS, XLSX, XLSM, XLTX, XLTM, ODS, OTS, ODP, OTP, PPT, PPTX, PPTM, PPSX, PPSM, POTX, POTM | Word, Excel, PowerPoint, and OpenDocument equivalents
PDF | PDF | Returns DOCX by default. Pass outputFormat="pdf" to receive a PDF. Image content requires OCR to be enabled during upload.
Apple iWork | PAGES, NUMBERS, KEYS | Pages, Numbers, and Keynote files
Desktop publishing | MIF (FrameMaker), IDML (InDesign), ICML (InCopy), DITA, TEX | Professional publishing and structured document formats
Localization | XLF, XLIFF, PO, TTX, SRT, VTT, SBV, TXML, XINI | Structure preserved; only translatable strings are modified. See also: XLIFF translation guide.
Interchange | CSV, TSV, XML, DTD, JSON, YAML | Data and configuration files. See: CSV translation guide.
Programming | HTM, HTML, XHTML, TS, RESX, WIX, STRINGS, MD, PHP, PROPERTIES | App and software localization files
Images | AVIF, BMP, GIF, HEIC, HEIF, PNG, JPG, JPEG, TIF, TIFF, WEBP | Returned in the same format. Text in images requires OCR.
Plain text | TXT |
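
A pipeline can check file extensions against Table 1 before uploading anything. The following is a minimal sketch, assuming that matching on the file extension is a good enough pre-check. The names FORMAT_CATEGORIES, EXTENSION_TO_CATEGORY, and category_for are illustrative and not part of the Lara SDK; the extension lists are copied from the table.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from Table 1. This mapping is illustrative, not an SDK object.
FORMAT_CATEGORIES = {
    "Office documents": (
        "DOC DOCX DOCM DOTX DOTM ODT OTT RTF XLS XLSX XLSM XLTX XLTM ODS OTS "
        "ODP OTP PPT PPTX PPTM PPSX PPSM POTX POTM"
    ),
    "PDF": "PDF",
    "Apple iWork": "PAGES NUMBERS KEYS",
    "Desktop publishing": "MIF IDML ICML DITA TEX",
    "Localization": "XLF XLIFF PO TTX SRT VTT SBV TXML XINI",
    "Interchange": "CSV TSV XML DTD JSON YAML",
    "Programming": "HTM HTML XHTML TS RESX WIX STRINGS MD PHP PROPERTIES",
    "Images": "AVIF BMP GIF HEIC HEIF PNG JPG JPEG TIF TIFF WEBP",
    "Plain text": "TXT",
}

# Flatten to a lowercase extension -> category lookup.
EXTENSION_TO_CATEGORY = {
    ext.lower(): category
    for category, extensions in FORMAT_CATEGORIES.items()
    for ext in extensions.split()
}


def category_for(path: str) -> Optional[str]:
    """Return the Table 1 category for a file, or None if its extension is not listed."""
    ext = Path(path).suffix.lstrip(".").lower()
    return EXTENSION_TO_CATEGORY.get(ext)
```

Files for which category_for returns None can be skipped or flagged before any upload is attempted. Since the authoritative list lives in the Lara documentation, a mapping like this should be refreshed from there rather than treated as definitive.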

PDF translation: what you need to know

PDF translation works, but it has two behaviors worth knowing before you build around it.

Default output format is DOCX. When you upload a PDF, the API returns a DOCX file by default. If you need the output back as a PDF, pass outputFormat="pdf" in your call. This is not a bug — it reflects the fact that reconstructing a pixel-perfect PDF from translated content is harder than producing a well-formatted Word document, and DOCX gives you more editability.

Image content requires OCR to be explicitly enabled. Text embedded in images inside a PDF (scanned documents, screenshots, image-based pages) is not translated automatically. You need to enable image translation during upload for OCR to run on that content. If you upload a scanned PDF without enabling image translation, the text in images will pass through untranslated. See: OCR in document translation and translating images inside PDF documents.
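
Because a PDF source can come back as DOCX, code that saves the result should not assume the output extension matches the input. Here is a small sketch of that rule, based only on the behavior described above. The helper names expected_output_suffix and translated_filename are hypothetical, and the output_format argument simply mirrors the outputFormat option mentioned in this article rather than a confirmed SDK parameter name.

```python
from pathlib import Path
from typing import Optional


def expected_output_suffix(source_path: str, output_format: Optional[str] = None) -> str:
    """Predict the extension of the translated file from the documented behavior."""
    suffix = Path(source_path).suffix.lower()
    if suffix == ".pdf":
        # PDF sources come back as DOCX unless a PDF is explicitly requested.
        return ".pdf" if output_format == "pdf" else ".docx"
    # Every other format is returned in its original format.
    return suffix


def translated_filename(
    source_path: str, target_lang: str, output_format: Optional[str] = None
) -> str:
    """Build a local file name such as 'report-fr.docx' for the translated output."""
    stem = Path(source_path).stem
    return f"{stem}-{target_lang}{expected_output_suffix(source_path, output_format)}"
```

With this in place, translated_filename("report.pdf", "fr") gives a .docx name, while passing output_format="pdf" keeps the .pdf extension.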

How to translate a document via the API: Python SDK example

The recommended way to call the document translation endpoint is via the Lara Translate Python SDK. Full documentation and additional examples are at developers.laratranslate.com.

from laratranslate import Lara

# Authenticate with your credentials
lara = Lara(
    access_key_id="YOUR_ACCESS_KEY_ID",
    access_key_secret="YOUR_ACCESS_KEY_SECRET"
)

# Translate a document
result = lara.documents.translate(
    file="./product-manual.docx",
    source="en",
    target="fr",
    instructions="Technical product documentation for enterprise IT teams. Faithful style required."
)

# Save the translated document
with open("./product-manual-fr.docx", "wb") as f:
    f.write(result.content)

To get your access key ID and secret, see: API key setup for Lara Translate. To track usage, see: how to track API usage.

Translating into multiple languages in a single call

The document translation API supports multiple target languages per call. Pass an array of language codes in the target parameter and the API returns translated files for each language pair in the same response.

This reduces API overhead and simplifies pipeline logic for teams managing multilingual releases across several markets simultaneously. Instead of batching individual calls per language and managing multiple async jobs, you get all outputs from one request.
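
When the target list is assembled from configuration or user input, it can help to clean it up before building the request. The sketch below is a client-side precaution only: the article does not say how the API treats duplicate or invalid codes, and normalize_targets is a hypothetical helper, not an SDK function.

```python
from typing import Iterable, List


def normalize_targets(source: str, targets: Iterable[str]) -> List[str]:
    """De-duplicate target codes (order preserved) and reject the source language."""
    seen = set()
    result = []
    for code in targets:
        code = code.strip()
        key = code.lower()
        if not code or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        if key == source.lower():
            raise ValueError(f"target {code!r} is the same as the source language")
        seen.add(key)
        result.append(code)
    if not result:
        raise ValueError("at least one target language is required")
    return result
```

The cleaned list can then be passed as the target array in a single multilingual request.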

Bulk document translation

For high-volume workflows, the API supports translating multiple documents at once. The multiple document translation guide covers the bulk translation flow and how to manage job queuing.

For teams running translation as part of a CI/CD pipeline or content publish workflow, the Lara CLI handles bulk document jobs without custom scripting — useful for localizing documentation, release notes, or content exported from a CMS on every deploy.

Customization options for document translation

Every document translation call can include the same customization parameters available in the text translation API.

Translation style. Fluid for general-purpose readability, Faithful for precision-critical technical or legal content, Creative for marketing and brand material. Applied end-to-end across the document. See: Translation styles in Lara Translate.

Context instructions. Pass audience, tone, domain, and preferred term guidance per call. The model uses these to shape translation decisions throughout the document. See: Context feature: common use cases and practical examples.

Custom glossaries. Upload terminology once, apply it to every document call. Product names, legal terms, and brand vocabulary stay consistent regardless of document type or volume. See: How glossaries work in Lara Translate.

Translation memories. Approved translations are stored and reused at the segment level. Repeated content in large document sets is handled consistently, without re-translating content that has already been approved. See: What is a translation management system (TMS)?
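
One way to keep these four settings together in pipeline code is a small options object that is validated before any request is sent. This is a sketch under stated assumptions: DocumentJobOptions and its field names are hypothetical and do not correspond to confirmed SDK parameters; only the three style names come from this article.

```python
from dataclasses import dataclass, field
from typing import List

STYLES = ("fluid", "faithful", "creative")  # the three styles named in the article


@dataclass
class DocumentJobOptions:
    """Hypothetical container for per-call settings; field names are not SDK parameters."""

    style: str
    instructions: List[str] = field(default_factory=list)  # context instructions
    glossary_ids: List[str] = field(default_factory=list)  # custom glossaries
    memory_ids: List[str] = field(default_factory=list)    # translation memories

    def __post_init__(self) -> None:
        # Accept "Faithful" or "faithful" and fail early on anything else.
        self.style = self.style.lower()
        if self.style not in STYLES:
            raise ValueError(f"unknown style {self.style!r}; expected one of {STYLES}")
```

Validating locally means a typo in a style name fails before a document is uploaded. How these values map onto actual request fields should be taken from the API reference.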

Billing note: minimum character charge

On Pro and Team plans, each document translation call is billed at a minimum of 20,000 characters, regardless of the actual document length. Short documents, such as a one-page brief or a short release note, are still charged the full minimum. Factor this into your cost estimation when designing workflows with a high volume of short documents. For those cases, batching content into fewer, larger calls is more cost-efficient than submitting many small files individually.
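
The effect of the minimum is easy to estimate up front. The following sketch assumes that a document above the minimum is billed at its actual character count, which the article implies but does not state outright. MIN_BILLED_CHARS and billed_characters are illustrative names.

```python
from typing import Iterable

MIN_BILLED_CHARS = 20_000  # per-document minimum on Pro and Team plans, per this article


def billed_characters(document_lengths: Iterable[int]) -> int:
    """Sum billed characters when each document is charged at least the minimum."""
    return sum(max(length, MIN_BILLED_CHARS) for length in document_lengths)
```

Under this model, ten 1,500-character release notes sent separately would be billed as 200,000 characters, while the same text combined into one 15,000-character file would be billed as 20,000.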

Localization format support: XLIFF, PO, SRT, and more

Localization-specific formats receive special handling: the API modifies only the translatable string content while leaving the file structure, attributes, IDs, and metadata untouched. This is critical for formats like XLIFF and PO, which are consumed directly by localization tools and CAT platforms — any structural change would break downstream toolchain compatibility.

Supported localization formats: XLF, XLIFF, PO, TTX, SRT, VTT, SBV, TXML, XINI.
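
The structure guarantee for localization files can be spot-checked after translation. Below is a minimal sketch for XLIFF 1.2 using only the Python standard library; it compares trans-unit IDs and source strings between the original and the translated file. It assumes the XLIFF 1.2 namespace (XLIFF 2.0 uses a different namespace and unit elements), and unit_skeleton and structure_preserved are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Namespace for XLIFF 1.2 documents; XLIFF 2.0 files would need a different check.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def unit_skeleton(xliff_text: str) -> List[Tuple[Optional[str], str]]:
    """Return (id, source text) pairs for every trans-unit, in document order."""
    root = ET.fromstring(xliff_text)
    return [
        (unit.get("id"), unit.findtext("x:source", default="", namespaces=XLIFF_NS))
        for unit in root.iterfind(".//x:trans-unit", XLIFF_NS)
    ]


def structure_preserved(original: str, translated: str) -> bool:
    """True when both files have the same trans-unit IDs and source strings, in order."""
    return unit_skeleton(original) == unit_skeleton(translated)
```

A check like this covers only IDs and source text. Attributes, notes, and inline tags would need their own comparisons if your toolchain depends on them.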

For teams already using Trados Studio, MemoQ, or MateCat, Lara Translate has native plugin integrations that connect the Adaptive API directly to your CAT tool workflow without needing to export and re-import files manually. See: Lara Translate for MemoQ.

FAQ

What file formats does Lara Translate’s document translation API support?

Lara Translate’s document translation API supports 70 file formats including Office documents (DOCX, XLSX, PPTX and their variants), PDF, Apple iWork files (PAGES, NUMBERS, KEYS), desktop publishing formats (IDML, ICML, MIF, DITA, TEX), localization formats (XLIFF, PO, SRT, VTT, TXML, XINI), interchange formats (CSV, JSON, XML, YAML), programming files (HTML, RESX, STRINGS, PROPERTIES), images (PNG, JPG, TIFF, WEBP and others), and plain text. See the full format list for the complete reference.

Does the document translation API preserve formatting and layout?

Yes. Layout preservation is a core feature of the document translation endpoint. Tables, columns, slide layouts, embedded formatting, and document structure are maintained in the translated output. For localization formats like XLIFF and PO, structure, attributes, IDs, and metadata are left untouched — only translatable strings are modified.

Why does PDF translation return a DOCX file by default?

PDF source files return a DOCX by default because DOCX is more editable and gives you greater control over the translated output before finalizing it. If you need the translated file back as a PDF, pass outputFormat="pdf" in your call. This behavior is documented in the API reference at developers.laratranslate.com.

Does the API translate text inside images in PDFs?

Not by default. Text embedded in images within a PDF — scanned pages, screenshots, image-based content — requires OCR, which must be explicitly enabled during upload. Without enabling image translation, that content passes through untranslated. See: OCR in document translation.

Can I translate a document into multiple languages in one API call?

Yes. Pass an array of target language codes in the target parameter and the API returns translated files for each language in the same response. This is more efficient than sending one call per language and managing separate async jobs for each.

How does billing work for document translation?

On Pro and Team plans, each document translation call is billed at a minimum of 20,000 characters. If the document is shorter than 20,000 characters, you are still charged the minimum. For workflows with many short documents, batching content into fewer, larger calls is more cost-efficient.

What is the best way to translate XLIFF and localization files via the API?

Upload the XLIFF or PO file directly to the document translation endpoint. Lara Translate handles localization formats with structure-preserving logic: only the translatable string content is modified, and all structural attributes, IDs, and metadata are left untouched. This ensures the output is immediately compatible with your CAT tools and localization pipeline. See: XLIFF translation in Lara Translate.


Translate your documents the way your pipeline actually needs

70 formats, layout preserved, glossary-enforced, multilingual in one call. Professional-grade document translation via API.

Get started with the Adaptive Translation API


This article is about:

Product: Lara Translate Adaptive Translation API — document translation capabilities

Topic: Document translation API, file translation API, machine translation API for documents

Concepts covered: layout preservation, file format support, PDF translation, OCR, multilingual translation, bulk translation, localization formats (XLIFF, PO, SRT), Apple iWork, desktop publishing (IDML, MIF, DITA), glossaries, translation memories, translation styles, 20,000 character billing minimum

Content type: Document translation (files)

Related topics: document localization, XLIFF translation API, PDF translation API, CAT tool integration, Trados, MemoQ, MateCat, localization pipeline, multilingual documentation

Niccolo Fransoni
Content Strategy Manager @ Lara Translate. Niccolò Fransoni has 15 years of experience in content marketing & communication. He’s passionate about AI in all its forms and believes in the power of language.