
Use Make, Zapier, or n8n, no-code automation tools to connect the Koncile OCR application and automate document control and verification workflows
Practical guide
Last updated:
July 4, 2025
5 minutes
When a PDF file contains several documents, it is often necessary to separate them in order to process them individually. Without automation, this operation is tedious: scanning documents one at a time, manually inserting separator pages, or visually identifying where each document starts are all time-consuming and error-prone. Fortunately, there are now several approaches to automate this separation effectively.
How can I easily separate multiple documents in the same PDF? This article introduces the main methods for increasing efficiency based on file structure and content.
When the same PDF file contains several documents, whether invoices, contracts, attachments or statements, it is often necessary to isolate them in order to classify, archive or use them individually.
This separation step can be tedious if it is carried out manually, especially on large volumes.
Fortunately, several approaches simplify this separation, with varying levels of complexity and precision. The choice of method depends on the type of documents, their structure, and the degree of control desired.
There are generally three main approaches to achieving this separation:
Fixed-interval splitting is the easiest method. The PDF is cut at fixed intervals, for example every N pages. This method is ideal when a batch of invoices or standardized documents is exported as a single file, with regular pagination known in advance (for example, 10 contracts of 2 pages each in a 20-page PDF). Numerous solutions can automatically split a PDF into multiple files according to a defined number of pages.
However, in case of variation in length between documents, this method quickly becomes unsuitable. A 3-page invoice may be truncated, or two short documents may be merged incorrectly. It is therefore not recommended when the documents are heterogeneous or unpredictable.
Examples of solutions: PDFsam, iLovePDF or Sejda.
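The fixed-interval split described above comes down to simple page arithmetic. The sketch below only computes the page ranges; the function name `fixed_interval_ranges` is hypothetical, and writing the actual output files would need a PDF library of your choice.

```python
def fixed_interval_ranges(total_pages: int, pages_per_doc: int) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first_page, last_page) ranges, one per output file."""
    if total_pages < 0 or pages_per_doc < 1:
        raise ValueError("total_pages must be >= 0 and pages_per_doc >= 1")
    return [
        # The last range is clipped when the total is not a multiple of the interval.
        (start, min(start + pages_per_doc - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_doc)
    ]


# The example from the text: 10 contracts of 2 pages each in a 20-page PDF.
ranges = fixed_interval_ranges(20, 2)
print(len(ranges), ranges[0], ranges[-1])  # 10 (1, 2) (19, 20)
```

The weakness mentioned above is visible here: the function never looks at the content, so a single 3-page invoice in the batch shifts every following cut by one page.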
Rule-based separation defines triggers that detect the start of a new document. For example, the presence of a specific logo or keyword at the top of the page (such as “Invoice No.” or “Contract”) may indicate a new section. Technically, this can be done with regular expressions (text search) or other filters. Some platforms let you configure a custom rule (regex) that adds a separator as soon as a pattern is detected.
This makes it possible, for example, to separate pages automatically as soon as a new invoice number or contract title appears. This method is more flexible than fixed separation, because it adapts to the content of the document, as long as an identifiable recurring element appears at the beginning of each document.
Examples of solutions: ABBYY FineReader, Kofax Power PDF, Adobe Acrobat Pro
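As a minimal sketch of such a rule, assume the text of each page has already been extracted (by OCR or a PDF library) into a list of strings. The pattern, the `header_lines` parameter and the function name are illustrative assumptions, not the configuration syntax of any of the tools listed above.

```python
import re

# Hypothetical trigger: a page starts a new document when one of its first
# lines begins with "Invoice No." or "Contract".
START_PATTERN = re.compile(r"^\s*(Invoice No\.|Contract\b)", re.IGNORECASE | re.MULTILINE)


def rule_based_ranges(page_texts, pattern=START_PATTERN, header_lines=3):
    """Split at every page whose first `header_lines` lines match `pattern`.

    Returns 1-based inclusive (first_page, last_page) ranges.
    """
    starts = []
    for number, text in enumerate(page_texts, start=1):
        header = "\n".join(text.splitlines()[:header_lines])
        # Page 1 always opens a document, even if the trigger is missing.
        if number == 1 or pattern.search(header):
            starts.append(number)
    # Each document ends on the page before the next start.
    ends = [start - 1 for start in starts[1:]] + [len(page_texts)]
    return list(zip(starts, ends))


pages = [
    "Invoice No. 1001\nACME Corp",
    "Page 2 of invoice 1001\nTotal: 120.00",
    "Invoice No. 1002\nACME Corp",
    "Contract\nBetween A and B",
    "Signatures",
]
print(rule_based_ranges(pages))  # [(1, 2), (3, 3), (4, 5)]
```

Restricting the search to the first few lines of each page reduces false splits when the keyword also appears in the body text.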
AI-based separation is the most advanced method. An artificial intelligence algorithm, trained on documents, analyzes each page to determine whether it belongs to the same entity as the previous page or marks the start of a new document. Concretely, the AI “reads” the content and can identify where each document in the PDF begins and ends. This approach can combine multiple clues (layout, titles, numbering, style, etc.) to decide the cut-off point, without the need for predefined rules for each case. AI separation is ideal for heterogeneous batches of documents or when the demarcations do not follow a fixed pattern. It may also learn from corrections (feedback) to improve its accuracy over time.
Examples of solutions: Koncile, Planet AI, NovaCore.
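The page-by-page decision can be pictured as a loop that asks, for each page, whether it continues the previous one. In the sketch below, `toy_decision` is a deliberately naive stand-in for a trained model (it only compares the first line of each page), and every name is hypothetical. It illustrates the control flow only, not how any of the products above work internally.

```python
from typing import Callable, Sequence


def split_on_boundaries(
    pages: Sequence[str],
    starts_new_document: Callable[[str, str], bool],
) -> list[list[int]]:
    """Group consecutive 1-based page numbers into documents.

    `starts_new_document(previous, current)` is the per-page decision; in a
    real system it would be a trained classifier or a language model.
    """
    documents: list[list[int]] = []
    for index, page in enumerate(pages):
        if index == 0 or starts_new_document(pages[index - 1], page):
            documents.append([index + 1])
        else:
            documents[-1].append(index + 1)
    return documents


def first_line_words(text: str) -> set[str]:
    lines = text.strip().splitlines()
    return set(lines[0].lower().split()) if lines else set()


def toy_decision(previous: str, current: str) -> bool:
    """Naive stand-in for a model: a new document starts when the first
    lines of two consecutive pages share no word at all."""
    return first_line_words(previous).isdisjoint(first_line_words(current))


pages = [
    "ACME Invoice 1001\nitems",
    "ACME Invoice 1001\ntotal",
    "Lease agreement\nclause 1",
    "Lease agreement\nclause 2",
]
print(split_on_boundaries(pages, toy_decision))  # [[1, 2], [3, 4]]
```

Keeping the decision function separate from the grouping loop means the heuristic can be swapped for a model without touching the rest of the pipeline.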
These separation techniques apply to numerous concrete cases:
Often, suppliers or services scan several invoices at once, which produces a single PDF file containing, for example, 5 separate invoices. Smart separation will make it possible to identify each new invoice and create 5 separate files (or 5 sections) corresponding to each one, without having to manually cut the PDF.
It is not uncommon for a signed contract to be followed by its annexes (general conditions, forms, etc.) in a single PDF. If you want to archive or process the contract independently of its annexes, you must be able to split the document in the right place. For example, a separation rule can detect an “Appendix” title or simply apply an AI separation that will recognize that the appendix has a different layout from the main contract.
In some processes, a PDF invoice also includes supporting documents such as an order form, delivery note, customs form, or calculation details. For accounting, only the invoice itself needs to be processed in a system, while attachments can be stored elsewhere. Smart separation will identify the end of the invoice and automatically separate attachments into a separate document. For example, if each attachment starts with a specific title (such as “Purchase Order”), a rule based on that text can be used as a separator. Otherwise, the AI can learn to distinguish an invoice from an appendix thanks to the structure of the document.
In many sectors (banking, insurance, HR, real estate...), documents relating to the same customer or employee are often scanned in bulk: identity document, proof of address, contract, amendment, signed mandate, etc. However, each document must be isolated and classified individually in the electronic document management (EDM) system. Intelligent separation automates this division by detecting the nature of each document and preparing it for indexing. This avoids long and error-prone manual processing, while guaranteeing better traceability of documents.
At Koncile, intelligent document separation is offered as an advanced feature, available on request, directly integrated into our OCR engine.
It is based on a parallel pre-processing phase that analyzes all the pages of a PDF to extract the discriminating information: unique invoice number, recurring header, specific structure, etc.
The aim is not simply to look for page numbers or keywords, but to understand the content thanks to large language models (LLMs), capable of interpreting the logic of links between pages.
The system then derives continuous ranges corresponding to each document and performs the separation automatically, even in heterogeneous or non-standardized files.
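To illustrate what deriving continuous ranges can look like, the sketch below assumes an upstream step has already produced one identifier per page (for example an invoice number), with `None` where nothing was found. This is a hypothetical illustration with made-up names and data, not Koncile's actual implementation.

```python
from itertools import groupby


def contiguous_ranges(page_keys):
    """Collapse per-page identifiers into (key, first_page, last_page) ranges.

    Pages are 1-based. A page whose key is None inherits the key of the
    previous page, on the assumption that it continues the same document.
    """
    filled = []
    last_seen = None
    for key in page_keys:
        last_seen = key if key is not None else last_seen
        filled.append(last_seen)

    ranges = []
    page = 1
    for key, group in groupby(filled):
        count = len(list(group))
        ranges.append((key, page, page + count - 1))
        page += count
    return ranges


keys = ["INV-001", "INV-001", None, "INV-002", "PO-77", "PO-77"]
print(contiguous_ranges(keys))
# [('INV-001', 1, 3), ('INV-002', 4, 4), ('PO-77', 5, 6)]
```

Because the ranges come from the content of each page rather than from printed page numbers, a missing or misnumbered page does not shift the cuts.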
Unlike some solutions that rely on pagination alone (unreliable in the event of a missing page or error), Koncile treats each case in a contextual and dynamic way. The processing is fast, because it is distributed in parallel, and allows fine-grained separation, even in large volumes.
This approach is particularly useful for processing batches of invoices, contracts with appendices, or logistics documents, without manual intervention. Once the documents are properly separated, they can be automatically extracted, categorized or integrated into your business tools via the other modules of the platform.