
Document fraud detection with OpenCV in Python: real tests and limitations.
Comparatives
Dernière mise à jour :
February 4, 2026
5 minutes
Document fraud is growing faster than manual teams can keep up. In this article, we compare three concrete approaches to document fraud detection software, from simple Python tools to AI-powered platforms like Koncile.
Three complementary approaches to document fraud detection software, from image forensics to AI-powered consistency checks on financial documents.
Document fraud consists of altering official documents to obtain an advantage that should not be granted: a loan, a rental, a social benefit, a contract, a job, and so on. Detecting document fraud means verifying that the files provided are both authentic and internally consistent before triggering any business decision.
In practice, this mainly concerns financial and identity documents such as bank statements, pay stubs, tax returns, tax notices, identity documents, proof of address, invoices, or other administrative records. If any of these documents is manipulated, the entire decision chain becomes unreliable: credit scoring, tenant screening, KYC, onboarding, anti-fraud procedures.
Historically, document fraud detection relied on manual review. An analyst would compare documents one by one, check amounts, dates, logos, signatures or stamps, and use experience to spot inconsistencies. This remains useful, but quickly becomes unmanageable when volumes increase.
Today, three main approaches are emerging: visual analysis based on the document image, metadata inspection based on the file’s technical footprint, and AI-powered consistency checks that verify whether the information in the documents makes sense together. The goal is not to choose a single “best” method, but to understand what each one covers, its limitations, and how to combine them into an industrial-grade workflow.
Document forgery is the most common form of document fraud. It involves altering, recomposing, or fabricating documents so they appear authentic while telling a false story. This can range from crude edits to highly sophisticated manipulations that are invisible to the naked eye.
To address this effectively, there is no single silver-bullet technique. Instead, multiple complementary approaches are required, each targeting a different type of forgery.
The first method focuses purely on the visual layer: the image of the document. The idea is simple: before you even look at the content (names, amounts, addresses), you ask a question:
“Has this image been tampered with?”
This is where Python image-forensics libraries come into play. They analyse pixels, compression, patterns and textures to detect signs of editing such as cloned regions, pasted stamps, altered fields or inconsistent noise.
This approach is helpful to catch obvious forgeries or quick Photoshop jobs, but it is fragile with low-quality scans, mobile photos, or carefully prepared fakes based on genuine templates.
In this method, a document (ID card, invoice, tax notice, pay stub) is treated as a plain digital image. Algorithms typically:
Common red flags include:
Again, this doesn’t tell you if the income or identity information is realistic. It simply flags documents that visually look “edited” and should be escalated.

Python packs for visual forensics are mostly used for R&D, proof of concepts and internal tools. They’re powerful bricks but not turnkey “document fraud detection software” for business teams.
They are great when you want to:

DocAuth is a Python project for basic document authentication and image forgery detection.
Typical features:
DocAuth is quite limited and not heavily maintained, but it’s a good starting point to explore the topic.

PhotosHolmes is a Python library focused on digital image forgery detection. It is not dedicated to identity documents, but its techniques can be reused for document images.
Typical features:
For document fraud, you can adapt PhotosHolmes to scan IDs, invoices or tax notices for copy–paste patterns and local edits.

pyIFD is a Python toolkit that bundles several image forgery detection techniques in one place.
Typical features:
It’s useful in exploratory phases, but you need to check maintenance and compatibility before using it in production.

Forensically is a well-known web tool for image forensics. It’s not a Python pack, but it’s often used as a reference or manually by fraud teams.
Typical features:
Forensically is great to assist analysts or validate suspicions on a handful of documents. For high-volume automation, it needs custom integration work.
Visual Python packs help catch obvious manipulations on scanned or photographed documents. They are a good layer for experimentation and training, but they don’t replace a full document fraud detection software stack.
The second method focuses on the technical history of a document. Every digital file carries metadata, including creation and modification timestamps, the software used to generate or edit it, and sometimes device-related information.
Metadata analysis helps answer simple but important questions: was the document generated by an official system? Was it edited manually before submission? Do multiple documents share the same unusual technical signature?
It does not assess whether the content itself is fraudulent, but it highlights abnormal behavior around the file.
The idea is to extract metadata from files using open-source tools and Python libraries, then compare this information with what would normally be expected in a given process.
In practice, analysts examine creation and last modification timestamps, the software and version used to create or edit the document, the type of device involved (such as a smartphone camera, scanner, or desktop application), and, where relevant, EXIF or XMP fields embedded in image-based documents.
Certain abnormal situations tend to stand out quickly. A tax notice allegedly downloaded from a government portal may turn out to have been modified in a generic PDF editor shortly before submission. A pay stub that claims to come from payroll software may actually have been saved using a photo-editing application. In other cases, multiple documents within the same file or application may share the exact same unusual metadata pattern, suggesting mass-produced or reused documents.
Again, metadata analysis does not prove document fraud on its own, but it provides valuable signals that can be fed into document fraud detection systems and risk scoring mechanisms.
Metadata-based analysis is relatively fast to deploy and scales well across large document volumes. It remains effective even when documents appear visually clean and consistent, and it helps surface technical inconsistencies that are difficult to detect through manual review alone.
Metadata can be missing, incomplete, or intentionally altered. Legitimate operations such as OCR processing, electronic signatures, or PDF merging frequently modify metadata without indicating any fraudulent intent. For this reason, no single metadata field should be treated as a standalone proof of fraud.
When used in isolation, metadata analysis mainly produces weak signals. Its real value emerges when these signals are combined with other approaches, such as content analysis, cross-document consistency checks, and business rules within a broader risk scoring framework.
This method relies heavily on open source tools and libraries that you can embed into scripts or processing pipelines.

ExifTool is the go-to open source tool for extracting technical metadata from images and some PDFs (EXIF, XMP, IPTC and others).
Typical features:
For document fraud detection, ExifTool is especially useful to check if the technical story of the file matches the claimed origin.

Exiv2 is a C++ library with a command line interface used to read and write image metadata.
Typical features:
In a fraud context, Exiv2 is a strong alternative or complement to ExifTool when your stack is C++ or heavily image-focused.

hachoir-metadata is part of the Hachoir ecosystem, a Python library for parsing binary files and extracting metadata.
Typical features:
For document fraud detection, it’s particularly interesting when you want a pure Python pipeline analysing large volumes of uploaded files.
Metadata tools add an extra layer to your document fraud detection strategy. They don’t evaluate business content but help you understand how the file was produced and modified.
The third method is the one that comes closest to how seasoned fraud analysts actually work. Instead of focusing solely on the appearance of the file or on its technical metadata, it examines the story the documents are telling. The central question becomes: do all these documents make sense together for this person, this profile, and this application?
AI-powered solutions such as Koncile, Inscribe or Resistant AI follow this logic. They combine OCR specialised in financial and identity documents, data structuring and normalisation, business-rule engines, and anomaly-detection models to assess whether a dossier is coherent.
This approach is especially relevant for financial and administrative documents like bank statements, payslips, tax returns, tax notices, invoices, utility bills, and identity documents used in financial decisions. Instead of asking whether a document has been visually altered, it asks whether the information across the entire file forms a credible, consistent narrative.
Typical steps:
The goal is no longer just “is this file edited?”, but “is this entire dossier believable?”.
Several players use this approach; here are three representative examples.

Koncile focuses on automating the reading and checking of financial and administrative documents in credit, tenant screening and onboarding workflows.
Key capabilities:

Inscribe targets banks and fintechs who need to detect forged or manipulated documents during onboarding and underwriting.
Typical capabilities:
It’s a good example of a solution that treats document fraud as a consistency problem across documents, not just an image manipulation problem.

Resistant AI offers a multi-layer “document forensics” approach for financial institutions processing high volumes of documents.
Typical capabilities:
AI-powered consistency checks sit at the top of your document fraud detection stack. They bring together visual, metadata and content signals into a single decision layer.
No single method will eradicate document fraud. But combined intelligently, these three layers make fraud much harder, more expensive and more risky for attackers:
Modern document fraud detection software like Koncile brings these layers together so your teams can focus on the cases that truly matter instead of manually reviewing every single document.
Move to document automation
With Koncile, automate your extractions, reduce errors and optimize your productivity in a few clicks thanks to AI OCR.
Resources

Document fraud detection with OpenCV in Python: real tests and limitations.
Comparatives

Comparative test of Open Source OCR API tools in 2026: integration difficulty, JSON output quality, self-hosting and real-world benchmark.
Comparatives

Yann LeCun’s vision for the future of AI, beyond LLMs and AGI.
Comparatives