Detect duplicate documents
Why detect duplicates?
A duplicate is not always a byte-for-byte identical file. Two different files can represent the same document: for example, two photos taken at different times, or a scan and a photo of the same paper.
The same documents can circulate multiple times without being noticed. This can happen when:
• A file is sent by several people
• A supplier sends the same invoice multiple times
• A user uploads a new version without deleting the old one
• A document is scanned twice
• An automated workflow creates accidental copies
These duplicates cause errors, unnecessary costs, document clutter, and wasted time during manual checks.
Automatic duplicate detection keeps your document flow clean and reliable.

How duplicate detection helps you
Koncile’s OCR software analyzes each document as soon as it is imported. If the document closely resembles a file already processed, it is flagged as a duplicate.
This allows you to:
• Avoid extracting or processing the same document multiple times
• Quickly detect duplicate uploads from suppliers
• Reduce errors in financial or administrative workflows
• Maintain a clean and consistent document base
• Alert your teams in case of anomalies
How duplicate detection works
The detection relies on several combined techniques:
- Content analysis: text, numbers, page structure, extracted fields
- Layout analysis: formatting, blocks, images, tables
- Similarity scoring: measuring resemblance with existing documents
- Context checking: dates, suppliers, identifiers, key zones
- Automatic alert: if similarity exceeds a set threshold, the document appears in the duplicates list
You can review the confidence level, ignore the alert, or confirm the detection.
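To make the scoring and threshold steps concrete, here is a minimal Python sketch. It is not Koncile's actual engine: the function names, the field names, the 50/50 weighting, and the 0.85 threshold are all assumptions chosen for illustration, and `difflib.SequenceMatcher` simply stands in for whatever similarity measure the product really uses.

```python
from difflib import SequenceMatcher

# Hypothetical values: the real weights and threshold are not documented here.
KEY_FIELDS = ("supplier", "invoice_number", "date", "total")
THRESHOLD = 0.85


def text_similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] comparing two OCR texts after case/whitespace normalization."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def field_agreement(a: dict, b: dict) -> float:
    """Fraction of key fields present in both documents with equal values."""
    same = sum(1 for k in KEY_FIELDS if a.get(k) is not None and a.get(k) == b.get(k))
    return same / len(KEY_FIELDS)


def similarity_score(doc_a: dict, doc_b: dict, text_weight: float = 0.5) -> float:
    """Blend content similarity with context (extracted-field) agreement."""
    text = text_similarity(doc_a["text"], doc_b["text"])
    fields = field_agreement(doc_a["fields"], doc_b["fields"])
    return text_weight * text + (1 - text_weight) * fields


def find_duplicate(new_doc: dict, existing: list, threshold: float = THRESHOLD):
    """Return (best_match, score) if the best score reaches the threshold, else None."""
    best, best_score = None, 0.0
    for doc in existing:
        score = similarity_score(new_doc, doc)
        if score > best_score:
            best, best_score = doc, score
    if best is not None and best_score >= threshold:
        return best, best_score
    return None
```

Returning the score alongside the match mirrors the idea of a reviewable confidence level: the caller can show it to a person who then confirms or ignores the alert.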
Concrete examples of use
Frequent situations where duplicate detection simplifies your daily work:
• A supplier sends the same invoice three times in one day
• An accounting team scans the same delivery note twice
• A generic email inbox receives multiple copies of the same document
• A user shares several versions of the same contract
• An internal system generates duplicated reports
• A document is renamed but contains no real change
A Koncile client in the construction industry discovered more than €100,000 in duplicate invoices in 2025 and recovered the amount from the supplier.
Type: Automation
User: Business
Complexity: Low
The tool that automates your documents

All your questions about duplicate document detection
Koncile is a document extraction and automation tool. It transforms invoices, quotes, reports, purchase orders, identity documents, prescriptions, contracts, and more into structured data that can be exported or integrated into your tools.
Duplicate detection is automatically enabled as soon as you import a document into Koncile, with no configuration required. Here’s how it works in practice:
Automatic 3-step process:
- Document import: You upload your file (PDF, image, scan) via the web interface, the API, or by email.
- Instant analysis: Koncile analyzes the content, structure, and metadata in 2–3 seconds.
- Duplicate alert: If a similar document already exists, a notification appears immediately.
Where to see detected duplicates:
• In the “Status” column of your Koncile dashboard
• By clicking the document to view and compare its associated duplicate(s)
Available actions:
• Confirm: Mark the file as a duplicate (the document will not be processed)
• Ignore: Validate that it is not a duplicate (normal processing continues)
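The two review actions amount to a small state transition. The sketch below illustrates that logic only; the status names and the `review` function are hypothetical and do not reflect Koncile's internal model or API.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical status names, chosen for this illustration.
    FLAGGED = "flagged_duplicate"      # alert raised, awaiting review
    CONFIRMED = "confirmed_duplicate"  # reviewer confirmed: document is not processed
    PROCESSING = "processing"          # reviewer ignored the alert: normal processing


def review(status: Status, action: str) -> Status:
    """Apply a reviewer's decision to a document flagged as a duplicate."""
    if status is not Status.FLAGGED:
        raise ValueError("only flagged documents can be reviewed")
    if action == "confirm":
        return Status.CONFIRMED
    if action == "ignore":
        return Status.PROCESSING
    raise ValueError(f"unknown action: {action!r}")
```

Rejecting unknown actions and non-flagged documents keeps a typo or a repeated click from silently pushing a document into processing.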
Duplicate detection works even when you process hundreds of documents in bulk:
Each file is analyzed individually and compared against your entire document database within seconds.
If a single PDF contains several documents, you can also enable the smart document separation feature.
Duplicate detection is completely free and included by default in all Koncile plans, with no hidden fees or extra charges.
You only pay for OCR (page digitization), and duplicate detection runs automatically in the background without consuming any additional credits.
How the pricing works:
• You purchase page credits (example: 1,000 pages/month)
• Each scanned page uses 1 credit (Koncile Pro) or 0.5 credits (Koncile Lite) for OCR + extraction
• Duplicate detection runs simultaneously without deducting extra credits
• Even if 30% of your documents are duplicates, you only pay for the initial OCR
For more details, you can refer to our pricing.
Concrete example if you process 1,000 invoices per month:
• 150 detected as duplicates
• Total cost: 1,000 credits (not 1,150)
• Cost of duplicate detection: €0, versus roughly €75/month with competing tools
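The credit arithmetic above can be checked with a few lines of Python. This is a sketch under stated assumptions: each invoice is a single page, the per-page rates are the ones quoted for Koncile Pro and Koncile Lite, and the function and constant names are invented for the example.

```python
from decimal import Decimal

# Credits per page as quoted in the pricing list above.
CREDITS_PER_PAGE = {"pro": Decimal("1"), "lite": Decimal("0.5")}
# Duplicate detection is described as free, so each check costs zero credits.
DUPLICATE_CHECK_CREDITS = Decimal("0")


def monthly_credits(pages: int, plan: str, duplicates: int = 0) -> Decimal:
    """Credits used for OCR + extraction, plus the (zero-cost) duplicate checks."""
    ocr = pages * CREDITS_PER_PAGE[plan]
    checks = duplicates * DUPLICATE_CHECK_CREDITS
    return ocr + checks
```

With 1,000 single-page invoices on the Pro plan, `monthly_credits(1000, "pro", duplicates=150)` comes to 1,000 credits, the same as with no duplicates at all.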
Market comparison:
Most competing solutions charge for duplicate detection as a premium add-on (between €50 and €200 per month depending on volume), or deduct extra credits for each check.
With Koncile, the service is built directly into our analysis engine, with zero financial impact.
Yes, and this is precisely what makes Koncile far more powerful than classic “file deduplication” tools.
Koncile analyzes the actual document content, not just filename, size, or metadata.
Examples of cases Koncile detects:
• Same invoice photographed twice (different angles, lighting)
• A scanned version and a photographed version
• A renamed file with identical content
• Different image qualities (300 DPI scan vs smartphone photo)
• Different formats (PDF, JPG, PNG, TIFF)
• Different orientations (portrait vs landscape, 90° rotation)
• Slight cropping differences
• Compression differences
Technical approach:
- Advanced OCR
- Layout recognition
- Pattern detection (invoice number, dates, amounts)
- Similarity algorithms
- Machine learning feedback
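As an illustration of the pattern-detection step, the sketch below pulls an invoice number, a date, and a total out of OCR text and normalizes them, so that a scan and a photo of the same invoice yield the same key fields. The regular expressions, the ISO-only date format, the comma-as-thousands-separator rule, and the sample texts are assumptions made for this example, not Koncile's actual rules.

```python
import re
from decimal import Decimal

# Illustrative patterns only; real invoices need far more variants.
INVOICE_RE = re.compile(r"invoice\s*(?:number|no\.?|#)\s*:?\s*([A-Za-z0-9\-]+)", re.IGNORECASE)
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")  # ISO dates only
AMOUNT_RE = re.compile(r"total\s*:?\s*(?:€|EUR)?\s*([\d.,]+)", re.IGNORECASE)


def key_fields(ocr_text: str) -> tuple:
    """Return (invoice_number, date, total), with None for any field not found."""
    inv = INVOICE_RE.search(ocr_text)
    date = DATE_RE.search(ocr_text)
    amount = AMOUNT_RE.search(ocr_text)
    total = None
    if amount:
        # Assumes "," is a thousands separator and "." the decimal point.
        raw = amount.group(1).rstrip(".,").replace(",", "")
        total = Decimal(raw)
    return (
        inv.group(1).upper() if inv else None,
        date.group(1) if date else None,
        total,
    )


scan_text = "INVOICE No: FA-2025-0042\nDate: 2025-03-14\nTotal: € 1,250.00"
photo_text = "Invoice   no  fa-2025-0042  date 2025-03-14   TOTAL 1250.00 EUR"
same_document = key_fields(scan_text) == key_fields(photo_text)
```

Comparing normalized fields rather than raw bytes is what lets two files of different format, resolution, or casing be recognized as the same document.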
Koncile achieves more than 98% accuracy on standard professional documents (invoices, quotes, contracts), with a false-positive rate below 2%.
Accuracy levels:
Very high (98–99%):
• Structured documents
• Standard formats (native PDF, quality scans)
• Printed text
High (93–97%):
• Semi-structured documents
• Medium-quality photos
• Legible handwritten text
Moderate (85–92%):
• Degraded scans
• Difficult handwriting
• Heavily annotated documents
Koncile improves continuously through machine learning and monthly updates.
Koncile initiates an intelligent validation workflow:
1. Instant alert:
• Badge “Duplicate detected”
• Optional email alert
• Dashboard counter
• Webhook event for your system
2. Automatic processing pause:
The duplicate is paused and not processed until you validate it.
• No extraction
• No export to ERP
• No accounting entry
• Linked with the original document
This prevents double payments or duplicate entries.
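On the receiving side, a system that consumes the webhook event could keep flagged documents out of its ERP export until someone has reviewed them. The sketch below is purely illustrative: the event type `duplicate.detected` and the `document_id` field are invented for the example, so check Koncile's API documentation for the real payload.

```python
import json


def handle_event(raw_body: str, held: set) -> str:
    """Record a flagged document so downstream exports can skip it."""
    event = json.loads(raw_body)
    # Hypothetical event name and field; the real webhook schema may differ.
    if event.get("type") == "duplicate.detected":
        held.add(event["document_id"])
        return "held"
    return "ignored"


def can_export(document_id: str, held: set) -> bool:
    """A document is exportable only while it is not held for review."""
    return document_id not in held


def release(document_id: str, held: set) -> None:
    """Call once a reviewer has ignored the alert (not a duplicate)."""
    held.discard(document_id)
```

Keeping the hold list on the consumer side means that even if an export job runs before anyone opens the dashboard, the suspected duplicate does not reach accounting.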
Yes, Koncile can detect duplicates even in handwritten documents, thanks to our handwritten OCR engine (ICR – Intelligent Character Recognition) combined with structural analysis. However, accuracy varies depending on handwriting quality and document type.
Concrete examples:
Handwritten medical prescription
• Pre-printed form + doctor’s handwriting
• Detection accuracy: 97%
• Koncile identifies: same doctor, same patient, same date, same prescriptions
• Use case: avoid processing the same prescription twice (scanned then photographed)
Handwritten purchase order
• Standard company form
• Detection accuracy: 92%
• Koncile identifies: same PO number, same supplier, same quantities
• Use case: avoid duplicates between a manager’s scan and the accounting department’s copy
Free-form handwritten note
• Cursive writing on a blank page
• Detection accuracy: 85%
• Koncile relies mainly on the extracted text content
• Use case: detect meeting notes that were written twice
A duplicate is the same document received more than once.
A duplicata, by contrast, is a new version issued intentionally (e.g., a corrected invoice).
Koncile detects duplicates automatically but leaves the final classification to you.


