
Detect duplicate documents

Easily identify duplicate documents in your workflow, receive alerts, and prevent anomalies.

Why detect duplicates?

A duplicate is not always an identical file. Two different files can represent the same document: for example, two photos taken at different times, or a scan and a photo of the same paper.

The same documents can circulate multiple times without being noticed. This can happen when:

• A file is sent by several people
• A supplier sends the same invoice multiple times
• A user uploads a new version without deleting the old one
• A document is scanned twice
• An automated workflow creates accidental copies

These duplicates cause errors, unnecessary costs, document clutter, and wasted time during manual checks.
Automatic duplicate detection keeps your document flow clean and reliable.

How duplicate detection helps you

Koncile’s OCR software analyzes each document as soon as it is imported. If the document closely resembles a file already processed, it is flagged as a duplicate.

This allows you to:

• Avoid extracting or processing the same document multiple times
• Quickly detect duplicate uploads from suppliers
• Reduce errors in financial or administrative workflows
• Maintain a clean and consistent document base
• Alert your teams in case of anomalies

How duplicate detection works

The detection relies on several combined techniques:

  1. Content analysis: text, numbers, page structure, extracted fields
  2. Layout analysis: formatting, blocks, images, tables
  3. Similarity scoring: measuring resemblance with existing documents
  4. Context checking: dates, suppliers, identifiers, key zones
  5. Automatic alert: if similarity exceeds a set threshold, the document appears in the duplicates list

You can review the confidence level, ignore the alert, or confirm the detection.
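
The combination of content comparison, context checks, and a threshold described above can be sketched in a few lines of Python. This is only an illustration, not Koncile's engine: the `Document` class, the 60/40 weighting, and the 0.9 threshold are assumptions chosen for the example, and `difflib.SequenceMatcher` stands in for whatever similarity measure is actually used.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Document:
    text: str     # OCR text (hypothetical field name)
    fields: dict  # extracted fields such as supplier, number, total


def similarity(a: Document, b: Document, text_weight: float = 0.6) -> float:
    """Blend text similarity with the share of matching extracted fields."""
    text_score = SequenceMatcher(None, a.text.lower(), b.text.lower()).ratio()
    keys = set(a.fields) | set(b.fields)
    matches = sum(1 for k in keys if a.fields.get(k) == b.fields.get(k))
    field_score = matches / len(keys) if keys else 0.0
    return text_weight * text_score + (1 - text_weight) * field_score


def is_duplicate(new: Document, existing: list[Document], threshold: float = 0.9) -> bool:
    """Flag the new document if any stored document reaches the threshold."""
    return any(similarity(new, doc) >= threshold for doc in existing)


# Illustrative data: a scan, a photo of the same invoice with one OCR error,
# and an unrelated delivery note.
invoice_fields = {"number": "INV-001", "supplier": "ACME", "total": "1200.00"}
scan = Document("Invoice INV-001 ACME Corp Total 1,200.00 EUR", invoice_fields)
photo = Document("Invoice INV-001 ACME Corp Tota1 1,200.00 EUR", dict(invoice_fields))
other = Document("Delivery note DN-778 Globex 40 pallets",
                 {"number": "DN-778", "supplier": "Globex"})
```

With this data, `is_duplicate(photo, [scan])` flags the photo despite the OCR error, while `is_duplicate(other, [scan])` does not flag the delivery note. A lower threshold catches more near-duplicates at the cost of more false alerts to review.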

Concrete examples of use

Frequent situations where duplicate detection simplifies your daily work:

• A supplier sends the same invoice three times in one day
• An accounting team scans the same delivery note twice
• A generic email inbox receives multiple copies of the same document
• A user shares several versions of the same contract
• An internal system generates duplicate reports
• A document is renamed but contains no real change

A Koncile client in the construction industry discovered more than €100,000 in duplicate invoices in 2025 and recovered the amount from the supplier.

Type: Automation

User: Business

Complexity: Low

The tool that automates your documents

Discover how Koncile can turn your documents into added value.

All your questions about duplicate document detection

Everything you need to know about duplicate detection.

What is Koncile?

Koncile is a document extraction and automation tool. It transforms invoices, quotes, reports, purchase orders, identity documents, prescriptions, contracts, and more into structured data that can be exported or integrated into your tools.

How can I detect duplicates with Koncile?

Duplicate detection is automatically enabled as soon as you import a document into Koncile, with no configuration required. Here’s how it works in practice:

Automatic 3-step process:

  1. Document import: You upload your file (PDF, image, scan) via the web interface, the API, or by email.
  2. Instant analysis: Koncile analyzes the content, structure, and metadata in 2–3 seconds.
  3. Duplicate alert: If a similar document already exists, a notification appears immediately.

Where to see detected duplicates:

• In the “Status” column of your Koncile dashboard
• By clicking the document to view and compare its associated duplicate(s)

Available actions:

• Confirm: mark the file as a duplicate (the document will not be processed)
• Ignore: validate that it is not a duplicate (normal processing continues)

Duplicate detection also works when you process hundreds of documents in bulk: each file is analyzed individually and compared against your entire document database within seconds.

If a single PDF contains multiple documents, you can also enable the smart document separation feature.

How much does duplicate detection cost?

Duplicate detection is completely free and included by default in all Koncile plans, with no hidden fees or extra charges.
You only pay for OCR (page digitization), and duplicate detection runs automatically in the background without consuming any additional credits.

How the pricing works:
• You purchase page credits (example: 1,000 pages/month)
• Each scanned page uses 1 credit (Koncile Pro) or 0.5 credits (Koncile Lite) for OCR + extraction
• Duplicate detection runs simultaneously without deducting extra credits
• Even if 30% of your documents are duplicates, you only pay for the initial OCR

For more details, you can refer to our pricing.

Concrete example if you process 1,000 invoices per month:
• 150 detected as duplicates
• Total cost: 1,000 credits (not 1,150)
• Cost of detection: €0, versus roughly €75/month with competitors
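
The arithmetic behind this example can be written out as a short Python sketch. It assumes one page per invoice and the Koncile Pro rate. The function and variable names are hypothetical, and the "surcharge" line only models the 1,150-credit scenario that the example rules out.

```python
# Credits per page for OCR + extraction, as stated in the pricing description.
CREDITS_PER_PAGE = {"pro": 1.0, "lite": 0.5}


def credits_used(pages: int, plan: str = "pro") -> float:
    """Credits consumed for a batch; duplicate detection adds nothing."""
    return pages * CREDITS_PER_PAGE[plan]


pages = 1000       # 1,000 single-page invoices (assumption: one page each)
duplicates = 150   # invoices flagged as duplicates

total = credits_used(pages, "pro")
# Hypothetical alternative where each detected duplicate cost an extra credit.
with_surcharge = total + duplicates * CREDITS_PER_PAGE["pro"]

print(total)           # 1000.0
print(with_surcharge)  # 1150.0
```

The number of duplicates never enters `credits_used`, which is the point of the example: the bill depends only on the pages scanned.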

Market comparison:
Most competing solutions charge for duplicate detection as a premium add-on (between €50 and €200 per month depending on volume), or deduct extra credits for each check.
With Koncile, the service is built directly into our analysis engine, with zero financial impact.

Can Koncile detect duplicates even if files are not identical?

Yes, and this is precisely what makes Koncile far more powerful than classic “file deduplication” tools.
Koncile analyzes the actual document content, not just the filename, size, or metadata.

Examples of cases Koncile detects:
• Same invoice photographed twice (different angles, lighting)
• A scanned version and a photographed version
• A renamed file with identical content
• Different image qualities (300 DPI scan vs smartphone photo)
• Different formats (PDF, JPG, PNG, TIFF)
• Different orientations (portrait vs landscape, 90° rotation)
• Slight cropping differences
• Compression differences

Technical approach:

  1. Advanced OCR
  2. Layout recognition
  3. Pattern detection (invoice number, dates, amounts)
  4. Similarity algorithms
  5. Machine learning feedback
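
To see why content analysis catches what file-level deduplication misses, the sketch below compares a byte hash with a hash of normalized text. It is a simplified illustration under stated assumptions, not Koncile's method: the byte strings are stand-ins for real files, the OCR text is invented, and an exact fingerprint only works when OCR output is clean. Noisy text calls for a similarity score instead.

```python
import hashlib
import re


def file_hash(data: bytes) -> str:
    """Classic deduplication: files match only if their bytes are identical."""
    return hashlib.sha256(data).hexdigest()


def content_fingerprint(ocr_text: str) -> str:
    """Hash the normalized text so case, spacing and punctuation are ignored."""
    tokens = re.findall(r"[a-z0-9]+", ocr_text.lower())
    return hashlib.sha256(" ".join(tokens).encode("utf-8")).hexdigest()


# Stand-ins for a scanned PDF and a JPEG photo of the same invoice.
scan_bytes = b"%PDF-1.7 scanned invoice"
photo_bytes = b"\xff\xd8\xff photo of invoice"

# Invented OCR output for each file: same content, different layout and case.
scan_text = "Invoice  INV-001\nTotal: 1,200.00 EUR"
photo_text = "INVOICE INV-001 total 1,200.00 eur"

same_file = file_hash(scan_bytes) == file_hash(photo_bytes)
same_content = content_fingerprint(scan_text) == content_fingerprint(photo_text)
```

Here `same_file` is false because the bytes differ, while `same_content` is true because both texts normalize to the same token sequence.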

What is the detection accuracy?

Koncile achieves more than 98% accuracy on standard professional documents (invoices, quotes, contracts), with less than 2% false positives.

Accuracy levels:
Very high (98–99%):
• Structured documents
• Standard formats (native PDF, quality scans)
• Printed text

High (93–97%):
• Semi-structured documents
• Medium-quality photos
• Legible handwritten text

Moderate (85–92%):
• Degraded scans
• Difficult handwriting
• Heavily annotated documents

Koncile improves continuously through machine learning and monthly updates.

What happens when a duplicate is detected?

Koncile initiates an intelligent validation workflow:

1. Instant alert:
• “Duplicate detected” badge
• Optional email alert
• Dashboard counter
• Webhook event for your system

2. Automatic processing pause:
The duplicate is paused and not processed until you validate it.

• No extraction
• No export to ERP
• No accounting entry
• Linked with the original document

This prevents double payments or duplicate entries.
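
The pause-then-decide behavior can be modeled as a small state machine. The sketch below is an assumption-laden illustration: the status names and the `review` function are hypothetical, and only the two reviewer actions (Confirm and Ignore) come from the description on this page.

```python
from enum import Enum


class Status(Enum):
    PAUSED = "paused"          # flagged as a possible duplicate, awaiting review
    SKIPPED = "skipped"        # confirmed duplicate: never processed
    PROCESSING = "processing"  # alert ignored: normal processing continues


def review(status: Status, action: str) -> Status:
    """Apply a reviewer decision ("confirm" or "ignore") to a paused document."""
    if status is not Status.PAUSED:
        raise ValueError("only paused documents can be reviewed")
    if action == "confirm":
        return Status.SKIPPED
    if action == "ignore":
        return Status.PROCESSING
    raise ValueError(f"unknown action: {action!r}")
```

Because extraction, export, and accounting entries would only start from `PROCESSING` in this model, a flagged document cannot reach the ERP until someone explicitly ignores the alert.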

Does duplicate detection work on handwritten documents?

Yes, Koncile can detect duplicates even in handwritten documents, thanks to our handwritten OCR engine (ICR – Intelligent Character Recognition) combined with structural analysis. However, accuracy varies depending on handwriting quality and document type.

Concrete examples:

Handwritten medical prescription
• Pre-printed form + doctor’s handwriting
• Detection accuracy: 97%
• Koncile identifies: same doctor, same patient, same date, same prescriptions
• Use case: avoid processing the same prescription twice (scanned then photographed)

Handwritten purchase order
• Standard company form
• Detection accuracy: 92%
• Koncile identifies: same PO number, same supplier, same quantities
• Use case: avoid duplicates between a manager’s scan and the accounting department’s copy

Free-form handwritten note
• Cursive writing on a blank page
• Detection accuracy: 85%
• Koncile relies mainly on the extracted text content
• Use case: detect meeting notes that were written twice

What is the difference between a duplicate and a duplicate version?

A duplicate is the same document received multiple times.
A duplicate version (duplicata) is a new, intentional version (e.g., corrected invoice).

Koncile detects duplicates automatically but lets you decide the final classification.

Discover Koncile's features

Turn all your documents into data with a complete document management tool. Try the features to automate your manual tasks.

• Document categorization
• Handwriting detection
• Email import
• Document splitting
• Document renaming