How do I convert a PDF to JSON?

Last updated:

June 20, 2025

5 minutes

With the Koncile API and its Swagger interface, you can transform any PDF file into a structured JSON file, ready to be used in accounting software, CRM, or ERP. Here is a step-by-step guide to doing this conversion simply, without having to code.

Discover how to turn your PDF documents into structured JSON so that your business tools (accounting, CRM, ERP...) can use them automatically. With the Koncile API, you can convert your PDFs into ready-to-use data without coding. This step-by-step guide shows you how to automate the process, whether or not you are a developer.

4 steps to follow to transform a PDF into JSON via API

Koncile offers a turnkey solution to transform your PDFs into structured JSON, even for complex or handwritten documents, thanks to its next-generation OCR software.

1. Generate your API access key from your Koncile account

To establish a secure connection with the Koncile API, start by going to your account settings (administrators only).

Go to the API tab, then click “Generate an API key” to get your personal access key.

2. Connect to the API via Swagger

Go to Koncile's Swagger documentation at: https://api.koncile.ai/documentation

Then select the endpoint: POST /v1/upload_file/ (Upload file)

This is where you can send a document once the connection is established.

Activate authentication

  1. Click the padlock icon 🔒 to enable authentication.
  2. In the window that opens, paste your API key into the “Value” field.
  3. Click “Authorize”, then “Close” to confirm.

3. Configure the document upload

Once authenticated, you can prepare to send your file to be converted:

Fill in the optional parameters

You can add in the fields provided:

  • A document ID
  • A folder ID
  • Or a predefined template (if you want to apply a specific processing model)

Add your file to the Request Body

Scroll down to the “Request Body” section and upload the PDF file you want to convert to JSON.

Send the request

Then click “Execute” to send the request.

The “Responses” section then shows a task_id: you will use it in the next step to retrieve the file converted to JSON.
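
For developers who prefer a script to the Swagger page, the same upload can be sketched in Python using only the standard library. This is a minimal sketch, not Koncile's official client: the Bearer authorization header, the multipart field name "files", and the query parameter names (template_id, folder_id, doc_id) are assumptions to check against the Swagger documentation.

```python
import json
import os
import urllib.request
import uuid
from urllib.parse import urlencode

API_BASE = "https://api.koncile.ai"  # host taken from the Swagger URL in this guide


def build_upload_request(api_key, filename, pdf_bytes, params=None):
    """Build (but do not send) a multipart POST to /v1/upload_file/."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # ASSUMPTION: the form field holding the file is named "files".
        f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    # Keep only the optional parameters that were actually provided.
    query = {k: v for k, v in (params or {}).items() if v is not None}
    url = f"{API_BASE}/v1/upload_file/"
    if query:
        url += "?" + urlencode(query)
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # ASSUMPTION: the API key is sent as a Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def upload_pdf(api_key, path, **params):
    """Send the PDF and return the decoded JSON response."""
    with open(path, "rb") as fh:
        request = build_upload_request(api_key, os.path.basename(path), fh.read(), params)
    with urllib.request.urlopen(request) as response:
        return json.load(response)  # expected to include the task_id
```

Calling upload_pdf("YOUR_API_KEY", "invoice.pdf", template_id="123") would send the file. Splitting the request building from the sending keeps the first part easy to inspect without touching the network.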

4. Get data from your PDF in JSON format

Once the document is sent, you can retrieve the extracted data in JSON:

1. Go to the next endpoint, located just below in the Swagger documentation: GET /v1/fetch_tasks_results/ (Fetch Tasks Result)

2. In the field provided, paste the task_id obtained in the previous step.

3. Click “Execute” to send the request.

The API then returns structured JSON containing all the information automatically extracted from the document, which you can copy or download directly.
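
In a script, this retrieval step usually becomes a small polling loop, since the extraction may not be finished the moment the upload returns. The sketch below rests on several assumptions that are not confirmed by this guide: the task_id is passed as a query parameter named task_id, the key is sent as a Bearer token, and the response carries a "status" field whose final value is "DONE".

```python
import json
import time
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.koncile.ai"  # host taken from the Swagger URL in this guide


def build_results_request(api_key, task_id):
    """Build the GET request for /v1/fetch_tasks_results/."""
    query = urlencode({"task_id": task_id})  # ASSUMPTION: parameter name
    return urllib.request.Request(
        f"{API_BASE}/v1/fetch_tasks_results/?{query}",
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},  # ASSUMPTION: auth scheme
    )


def fetch_result(api_key, task_id):
    """Call the endpoint once and decode the JSON body."""
    with urllib.request.urlopen(build_results_request(api_key, task_id)) as response:
        return json.load(response)


def wait_for_result(fetch, task_id, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch(task_id) until the task reports completion or attempts run out."""
    for _ in range(max_attempts):
        result = fetch(task_id)
        # ASSUMPTION: a "status" field equal to "DONE" marks a finished task.
        if result.get("status") == "DONE":
            return result
        sleep(interval)
    raise TimeoutError(f"task {task_id} not finished after {max_attempts} attempts")
```

A real call might look like wait_for_result(lambda t: fetch_result("YOUR_API_KEY", t), task_id). Passing the fetch function in as an argument makes the loop easy to try out with canned responses before pointing it at the API.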

Focus on Koncile: complete API and intelligent OCR

Koncile offers a turnkey solution to transform your PDFs into structured JSON, even when the documents are complex or of average quality (scans, photos, handwritten pages).

What Koncile allows

  • Performing precise OCR analysis, even on complex documents (scans, handwritten pages...)
  • Reading difficult documents: dark images, low resolution, hand-annotated documents... all of these can be interpreted accurately.
  • Handling complex tables: the solution automatically detects rows, columns, and totals, and converts them into a hierarchical JSON structure.
  • Extracting key data (amounts, dates, suppliers, line items)
  • Returning results in JSON format
  • Connecting to your internal tools (CRM, ERP, business software...) to automate the sending of data

This data can be used to:

  • Automatically fill out a supplier form
  • Add rows to an Excel table
  • Trigger validation or payment
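
As an illustration of the "add rows to an Excel table" case, the sketch below flattens the line items of an extracted document into CSV rows that a spreadsheet can open. The JSON shape used here (supplier, date, lines with description, quantity, amount) is a hypothetical example, not the actual structure returned by Koncile.

```python
import csv
import io
import json

# Hypothetical extraction result; real field names depend on your template.
SAMPLE = """
{
  "supplier": "ACME SAS",
  "date": "2025-06-20",
  "lines": [
    {"description": "Paper", "quantity": 2, "amount": 10.5},
    {"description": "Ink", "quantity": 1, "amount": 30.0}
  ]
}
"""


def lines_to_csv(doc):
    """Return one CSV row per line item, repeating the document-level fields."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["supplier", "date", "description", "quantity", "amount"])
    for line in doc.get("lines", []):
        writer.writerow([
            doc["supplier"],
            doc["date"],
            line["description"],
            line["quantity"],
            line["amount"],
        ])
    return buffer.getvalue()


if __name__ == "__main__":
    print(lines_to_csv(json.loads(SAMPLE)))
```

Repeating the supplier and date on every row keeps the table flat, which is what most spreadsheet imports expect.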

For developers on your team

Koncile provides:

  • A comprehensive technical guide
  • A ready-to-use API
  • A Python library to easily connect your internal tools
  • An automated alert or notification system (webhook)

Don't know how to code? No-code solutions exist

Good news: you don't have to be a developer to harness the power of JSON. Thanks to the emergence of no-code and low-code tools, many businesses can now automate their document processing without writing a line of code.

Connect your everyday tools directly, such as Slack, Google Drive, or Dropbox.

Here are examples of simple no-code integrations with popular platforms:

Platform | Type | Use case
Zapier / Make | Visual no-code connectors | Automate tasks: send PDF to Koncile, retrieve JSON, and forward the data to various tools (spreadsheets, CRM, emails, etc.).
Bubble | No-code app builder | Integrate the Koncile plugin to analyze documents and display results in a custom interface.
n8n | Open-source automated workflows | Create a complete scenario: trigger from Google Drive or Dropbox, call the Koncile API, then send the data to Slack, Notion, CRM, databases, etc.

What you can do without coding:

  • Trigger a document analysis as soon as a file arrives in a folder.
  • Automatically extract key information and integrate it into a monitoring table.
  • Send the extracted data to your business tool via a visual interface.

The result is higher productivity, without depending on the technical team for every task.

For example, Make lets you create an automated scenario: as soon as a PDF is added to Google Drive, it is sent to the Koncile API for analysis. Once the file is converted to JSON, the data can be automatically retrieved and stored or used in another tool, without writing code.
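
For teams that do want to script it, the same scenario can be approximated locally. The sketch below scans a folder for PDFs that have no JSON file next to them yet and hands each one to a convert function. That function is a hypothetical placeholder for whatever performs the upload and retrieval; nothing here is specific to Make or Google Drive.

```python
import json
from pathlib import Path


def process_new_pdfs(folder, convert):
    """Convert every PDF in `folder` that has no matching .json file yet.

    `convert` is a hypothetical callable taking a Path and returning a dict
    (for example, a wrapper around the upload and fetch steps).
    Returns the list of JSON files written during this pass.
    """
    written = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        target = pdf.with_suffix(".json")
        if target.exists():
            continue  # already processed on an earlier pass
        data = convert(pdf)
        target.write_text(json.dumps(data, indent=2), encoding="utf-8")
        written.append(target)
    return written
```

Running this function on a schedule gives a basic "new file arrives, JSON appears" workflow. Using the presence of the .json file as the processed marker means a second pass does not resend files that are already done.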

Why converting a PDF to JSON has become essential to automate your documents

PDF is a universal format: it is used all over the world to transmit commercial, legal, accounting, and administrative documents. But while it is ideal for human reading, it is much less suited to automated processing. In contrast, JSON is a structured format, designed so that machines can easily understand and reuse data.

Transforming a PDF into JSON therefore means making its content automatically usable by your business software via an API. This is a key step in increasing efficiency, reducing human errors, and automating your internal processes.

The limits of PDF in an automation workflow

Even when generated digitally, a PDF file remains difficult to exploit automatically. Some invoices or scanned documents have variations that complicate extraction:

  • different fonts and text sizes,
  • misaligned or nested tables,
  • checkboxes, signatures, stamps,
  • handwritten or partially blurred data.

Result: these documents, often semi-structured or even unstructured, still require human intervention to be understood... unless you use OCR processing with export to JSON.

JSON: the key to making your documents readable by software

For your business tools to understand a document, you need to extract its data in a structured format. That's where JSON comes into play.

This lightweight and universal format makes it possible to represent data in the form of key/value pairs. Concretely, this amounts to transforming a static PDF into an “intelligent” file: readable by a machine, usable by an API, integrable into your business software.
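
To make the key/value idea concrete, here is a small Python sketch that parses an invoice represented as JSON and reads fields by name. The field names and values are purely illustrative and do not reflect a specific Koncile output.

```python
import json

# Illustrative invoice; the keys are hypothetical.
raw = """
{
  "supplier": "ACME SAS",
  "invoice_date": "2025-06-20",
  "total_amount": 40.5,
  "lines": [
    {"description": "Paper", "amount": 10.5},
    {"description": "Ink", "amount": 30.0}
  ]
}
"""

invoice = json.loads(raw)

# Each value is reachable by its key, with no layout parsing involved.
supplier = invoice["supplier"]
lines_total = sum(line["amount"] for line in invoice["lines"])

# A simple consistency check that would be hard to run on a static PDF.
totals_match = lines_total == invoice["total_amount"]
print(supplier, lines_total, totals_match)
```

Once the data is in this form, checks such as comparing the line items against the stated total take a line or two of code.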

What tools should I use to convert a PDF to JSON?

Today, there are several solutions to transform a PDF into JSON. The choice depends on the nature of the document (text or image) and the desired level of automation.

Text PDF vs image PDF: a key difference

  • A text PDF contains digital characters that can be used directly (e.g. a quote exported from Word or an ERP). These documents can be analyzed without OCR, using text extraction tools like PyMuPDF or PDF.js.
  • An image PDF is a visual capture (scan, photo, digitized printout). To extract information from it, you need optical character recognition (OCR) technology such as Tesseract, Adobe PDF Services, or an advanced solution like Koncile.
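
One way to tell the two cases apart in practice is to check whether the pages carry a usable text layer. The sketch below is a simple heuristic, not a method described by Koncile: it reads the text of each page with PyMuPDF (imported as fitz, a third-party package) and treats the file as an image PDF when most pages are nearly empty. The threshold of 20 characters per page is an arbitrary assumption to tune on your own documents.

```python
def extract_page_texts(path):
    """Return the text layer of each page using PyMuPDF."""
    import fitz  # PyMuPDF; imported here so needs_ocr works without it installed

    with fitz.open(path) as doc:
        return [page.get_text() for page in doc]


def needs_ocr(page_texts, min_chars_per_page=20):
    """Heuristic: True when more than half of the pages have almost no text.

    `min_chars_per_page` is an assumed threshold, not a standard value.
    """
    if not page_texts:
        return True  # nothing extractable at all
    sparse = sum(1 for text in page_texts if len(text.strip()) < min_chars_per_page)
    return sparse > len(page_texts) / 2
```

A call such as needs_ocr(extract_page_texts("quote.pdf")) can then route text PDFs to direct extraction and send the rest to OCR.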

PDF to JSON: key takeaways

Step | What it involves | Key benefits
1. Extraction | PDF ➝ JSON via OCR or text parsing | Structures the data (amounts, dates, line items) for easy reuse.
2. Structuring | Lightweight, machine-readable JSON format | Faster analysis, nested formats, native interoperability with modern databases.
3. Automation | Integration via API or workflows (Zapier, Make…) | Reduces manual tasks, automates document workflows.
4. Reliability | Intelligent extraction (AI/OCR) | Higher accuracy, even with challenging scans.
5. Storage & processing | JSON stored in a database or used via API | Simple queries, data visualizations, business analysis.
6. Software integration | Sent to ERP, CRM, accounting, reporting tools | Time savings, fewer errors, smooth operations.
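
The storage and processing step in the table above can be sketched with SQLite from Python's standard library. The document fields (supplier, total_amount) are hypothetical, and a production setup would likely use your existing database rather than SQLite.

```python
import json
import sqlite3


def store_documents(conn, docs):
    """Store each extracted document with two queryable columns plus the raw JSON."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents (supplier TEXT, total REAL, payload TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?, ?)",
        [(d["supplier"], d["total_amount"], json.dumps(d)) for d in docs],
    )
    conn.commit()


def totals_by_supplier(conn):
    """Return {supplier: sum of totals}, a typical 'simple query' on the data."""
    rows = conn.execute(
        "SELECT supplier, SUM(total) FROM documents GROUP BY supplier ORDER BY supplier"
    )
    return dict(rows.fetchall())
```

Keeping the full JSON in a payload column alongside a few extracted columns preserves every field while keeping common queries simple.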

Converting PDF to JSON is the groundwork for turning static documents into dynamic data. By structuring your data and automating via API, you gain reliability, speed, and performance in document management.

Feed this structured data directly into your business systems for fast, reliable processing that fits into your accounting, analytical, or operational processes.

Author and Co-Founder at Koncile
Jules Ratier

Co-founder at Koncile - Transform any document into structured data with LLM - jules@koncile.ai

Jules leads product development at Koncile, focusing on how to turn unstructured documents into business value.
