Document fraud detection software: 3 ways to catch fake documents

Document fraud is growing faster than manual review teams can keep up with. In this article, we compare three concrete approaches to document fraud detection software, from simple Python tools to AI-powered platforms like Koncile.

Three complementary approaches to document fraud detection software, from image forensics to AI-powered consistency checks on financial documents.

What is document fraud detection?

Document fraud consists of altering official documents to obtain an advantage that should not be granted: a loan, a rental, a social benefit, a contract, a job, and so on. Detecting document fraud means verifying that the files provided are both authentic and internally consistent before triggering any business decision.

In practice, this mainly concerns financial and identity documents such as bank statements, pay stubs, tax returns, tax notices, identity documents, proof of address, invoices, or other administrative records. If any of these documents is manipulated, the entire decision chain becomes unreliable: credit scoring, tenant screening, KYC, onboarding, anti-fraud procedures.

Historically, document fraud detection relied on manual review. An analyst would compare documents one by one, check amounts, dates, logos, signatures or stamps, and use experience to spot inconsistencies. This remains useful, but quickly becomes unmanageable when volumes increase.

Today, three main approaches are emerging: visual analysis based on the document image, metadata inspection based on the file’s technical footprint, and AI-powered consistency checks that verify whether the information in the documents makes sense together. The goal is not to choose a single “best” method, but to understand what each one covers, its limitations, and how to combine them into an industrial-grade workflow.

Document forgery: the core of document fraud detection

Document forgery is the most common form of document fraud. It involves altering, recomposing, or fabricating documents so they appear authentic while telling a false story. This can range from crude edits to highly sophisticated manipulations that are invisible to the naked eye.

To address this effectively, there is no single silver-bullet technique. Instead, multiple complementary approaches are required, each targeting a different type of forgery.

💡 Good to know

Don’t try to catch every case of document fraud with a single tool. Combine visual checks, metadata analysis and AI-based consistency tests to reduce fraud without blocking legitimate applications.

Method 1: Python packs for visual document fraud detection

The first method focuses purely on the visual layer: the image of the document. The idea is simple: before you even look at the content (names, amounts, addresses), you ask a question:

“Has this image been tampered with?”

This is where Python image-forensics libraries come into play. They analyse pixels, compression, patterns and textures to detect signs of editing such as cloned regions, pasted stamps, altered fields or inconsistent noise.

This approach helps catch obvious forgeries and quick Photoshop jobs, but it is fragile with low-quality scans, mobile photos, or carefully prepared fakes built from genuine templates.

How the method works

In this method, a document (ID card, invoice, tax notice, pay stub) is treated as a plain digital image. Algorithms typically:

split the image into blocks and search for repeated patterns (clone detection)
analyse compression artefacts and noise to highlight retouched regions
look for visual inconsistencies: strange contours, different grain, edges that don’t match
generate heatmaps or scores that indicate suspicious areas
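
To make the clone-detection idea concrete, here is a minimal pure-Python sketch. It splits a grayscale image into overlapping blocks and reports identical blocks found far apart from each other. This is an exact-match simplification. Real forensic tools compare robust block features (for example DCT coefficients or keypoints) so the match survives recompression. The function name and the `block`, `min_shift` and `flat_tol` parameters are illustrative assumptions.

```python
from collections import defaultdict


def find_duplicate_blocks(pixels, block=8, min_shift=16, flat_tol=2):
    """Return pairs of (y, x) block origins whose pixel content is identical.

    pixels: 2-D list of grayscale values (rows of ints 0-255).
    block: side length of the square blocks compared.
    min_shift: ignore matches closer than this (Manhattan distance, pixels).
    flat_tol: skip near-uniform blocks (e.g. blank paper), which would
        otherwise match each other everywhere.
    """
    h, w = len(pixels), len(pixels[0])
    seen = defaultdict(list)  # block content -> list of origins
    for y in range(h - block + 1):
        for x in range(w - block + 1):
            patch = tuple(v for row in pixels[y:y + block] for v in row[x:x + block])
            if max(patch) - min(patch) <= flat_tol:
                continue  # flat area: carries no evidence of copying
            seen[patch].append((y, x))

    matches = []
    for origins in seen.values():
        for i, (y1, x1) in enumerate(origins):
            for y2, x2 in origins[i + 1:]:
                if abs(y1 - y2) + abs(x1 - x2) >= min_shift:
                    matches.append(((y1, x1), (y2, x2)))
    return matches
```

In practice the pixel grid would come from a decoded scan converted to grayscale. Clusters of matches that share the same offset point to a region that was copied and pasted, such as a duplicated stamp or a repeated digit.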

Common red flags include:

a profile picture on an ID that doesn’t blend naturally with the background
a stamp or signature pasted on top of an existing scan
a logo copied from another document, with a different resolution or sharpening

Again, this doesn’t tell you if the income or identity information is realistic. It simply flags documents that visually look “edited” and should be escalated.

Python packs and tools for visual forensics

Python packs for visual forensics are mostly used for R&D, proofs of concept and internal tools. They are powerful building blocks, not turnkey “document fraud detection software” for business teams.

They are great when you want to:

explore visual forgery detection on a specific document type
train fraud teams to recognise common visual signals
build a first filter before more advanced checks

DocAuth (Python)

DocAuth is a Python project for basic document authentication and image forgery detection.

Typical features:

run image analysis on ID cards, certificates and official documents
detect modified regions in sensitive zones (photo, signature, stamp)
produce suspicion scores or heatmaps per region
integrate into simple Python pipelines for experiments or demos

DocAuth is quite limited and not heavily maintained, but it’s a good starting point to explore the topic.

PhotoHolmes (Python)

PhotoHolmes is a Python library focused on digital image forgery detection. It is not dedicated to identity documents, but its techniques can be reused for document images.

Typical features:

clone detection: highlight regions copied and pasted within the same image
localisation of retouching operations on specific areas (headers, stamps, amounts)
visual maps to guide a human analyst towards suspicious regions
combination of multiple forensic algorithms in one workflow

For document fraud, you can adapt PhotoHolmes to scan IDs, invoices or tax notices for copy–paste patterns and local edits.

pyIFD (Python Image Forgery Detection toolkit)

pyIFD is a Python toolkit that bundles several image forgery detection techniques in one place.

Typical features:

multiple algorithms for detecting manipulations in JPG/PNG/TIFF images
example scripts to test different techniques on the same dataset
flexible architecture for extending or combining methods
quick experimentation on scanned documents and attachments

It’s useful in exploratory phases, but you need to check maintenance and compatibility before using it in production.

Forensically (web-based)

Forensically is a well-known web tool for image forensics. It’s not a Python pack, but it’s often used as a reference or manually by fraud teams.

Typical features:

clone detection to spot duplicated regions
error level analysis (ELA) to highlight retouched zones
additional forensic tools (noise analysis, metadata view, etc.)
visual UI that’s very convenient for investigating sensitive cases

Forensically is well suited to assisting analysts or validating suspicions on a handful of documents. Because it is an interactive web tool, high-volume automation would require custom integration work.

Comparing visual detection packs

Visual Python packs help catch obvious manipulations on scanned or photographed documents. They are a good layer for experimentation and training, but they don’t replace a full document fraud detection software stack.

Tool	Type	Use Case	Strengths	Limitations
DocAuth	Python package (document authentication)	Testing ID cards, certificates or official documents for POCs and demonstrations.	Easy to use, good starting point to explore visual fraud detection.	Project not actively maintained, limited scope, not production-ready as is.
PhotoHolmes	Python library (image forensics)	Analysis of internal copy–move edits and local retouching on document images.	Strong clone detection algorithms, useful complement in a visual pipeline.	Not specialised for documents, requires tuning for specific business cases.
pyIFD	Python toolkit (image forgery detection)	Quick experiments for detecting alterations on scanned images.	Multiple techniques grouped into one toolkit, convenient for comparisons.	Older tool; check maintenance status and file-format compatibility before use.
Forensically	Web tool (image forensics)	Manual or semi-automated checks on IDs, invoices and tax notices.	Very educational interface, ideal for analysing sensitive cases.	Not designed for large-scale automated integration without heavy adaptation work.

Pros of visual detection

good at catching cheap, quickly edited fakes
provides visual evidence (heatmaps, highlighted areas) to support decisions
useful for R&D, internal POCs and analyst training

Cons of visual detection

heavily dependent on image quality (compression, angle, resolution, blur)
blind to the business meaning of fields (income, dates, names, addresses)
hard to scale and maintain as the only line of defence

Method 2: metadata analysis with open source tools and Python packs

The second method focuses on the technical history of a document. Every digital file carries metadata, including creation and modification timestamps, the software used to generate or edit it, and sometimes device-related information.

Metadata analysis helps answer simple but important questions: was the document generated by an official system? Was it edited manually before submission? Do multiple documents share the same unusual technical signature?

It does not assess whether the content itself is fraudulent, but it highlights abnormal behavior around the file.

How the method works

The idea is to extract metadata from files using open-source tools and Python libraries, then compare this information with what would normally be expected in a given process.

In practice, analysts examine creation and last modification timestamps, the software and version used to create or edit the document, the type of device involved (such as a smartphone camera, scanner, or desktop application), and, where relevant, EXIF or XMP fields embedded in image-based documents.

Certain abnormal situations tend to stand out quickly. A tax notice allegedly downloaded from a government portal may turn out to have been modified in a generic PDF editor shortly before submission. A pay stub that claims to come from payroll software may actually have been saved using a photo-editing application. In other cases, multiple documents within the same file or application may share the exact same unusual metadata pattern, suggesting mass-produced or reused documents.
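
A minimal sketch of these checks follows. It assumes each file's metadata has already been extracted into a dict keyed like ExifTool's JSON output (`Producer`, `Creator`, `CreateDate`, `ModifyDate`, with dates as `YYYY:MM:DD HH:MM:SS`). The editor lists, the 24-hour window, `expected_producer` and all function names are hypothetical placeholders to tune per document type. Each rule returns a weak signal, not a verdict.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical tool lists: adjust to the issuers and editors you actually see.
PHOTO_EDITORS = ("photoshop", "gimp", "canva")
GENERIC_PDF_EDITORS = ("pdf-xchange", "ilovepdf", "smallpdf", "sejda")


def parse_exif_date(value: str) -> datetime:
    """Parse 'YYYY:MM:DD HH:MM:SS', ignoring any timezone suffix."""
    return datetime.strptime(value[:19], "%Y:%m:%d %H:%M:%S")


def metadata_signals(meta: dict, submitted_at: datetime,
                     expected_producer: str) -> list[str]:
    """Return weak signals for one document; an empty list means nothing odd."""
    signals = []
    tool = f"{meta.get('Producer', '')} {meta.get('Creator', '')}".lower()
    if not tool.strip():
        signals.append("no producer/creator recorded")
    elif expected_producer.lower() not in tool:
        signals.append("producer differs from claimed issuer")
    if any(name in tool for name in PHOTO_EDITORS):
        signals.append("saved with a photo editor")
    elif any(name in tool for name in GENERIC_PDF_EDITORS):
        signals.append("touched by a generic PDF editor")
    if "CreateDate" in meta and "ModifyDate" in meta:
        created = parse_exif_date(meta["CreateDate"])
        modified = parse_exif_date(meta["ModifyDate"])
        age = submitted_at - modified
        # Edited after creation, and shortly before it was uploaded.
        if modified > created and timedelta(0) <= age < timedelta(days=1):
            signals.append("modified within 24h of submission")
    return signals


def shared_fingerprints(metas: list[dict], min_count: int = 3,
                        known_good: frozenset = frozenset()) -> dict:
    """Count (Producer, Creator) pairs reused across many applications."""
    counts = Counter((m.get("Producer"), m.get("Creator")) for m in metas)
    return {fp: n for fp, n in counts.items()
            if n >= min_count and fp not in known_good}
```

The `known_good` set lets you exclude fingerprints of legitimate issuers, so that only unusual shared signatures surface.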

Again, metadata analysis does not prove document fraud on its own, but it provides valuable signals that can be fed into document fraud detection systems and risk scoring mechanisms.

Advantages of metadata analysis

Metadata-based analysis is relatively fast to deploy and scales well across large document volumes. It remains effective even when documents appear visually clean and consistent, and it helps surface technical inconsistencies that are difficult to detect through manual review alone.

Limitations of metadata analysis

Metadata can be missing, incomplete, or intentionally altered. Legitimate operations such as OCR processing, electronic signatures, or PDF merging frequently modify metadata without indicating any fraudulent intent. For this reason, no single metadata field should be treated as a standalone proof of fraud.

Why this method remains complementary

When used in isolation, metadata analysis mainly produces weak signals. Its real value emerges when these signals are combined with other approaches, such as content analysis, cross-document consistency checks, and business rules within a broader risk scoring framework.

Tools (open source & Python) for metadata analysis

This method relies heavily on open source tools and libraries that you can embed into scripts or processing pipelines.

ExifTool (open source)

ExifTool is the go-to open source tool for extracting technical metadata from images and some PDFs (EXIF, XMP, IPTC and others).

Typical features:

read detailed metadata for images and certain document formats
extract creation and modification dates
reveal software, device and other technical information
run in batch via command line or scripts

For document fraud detection, ExifTool is especially useful to check if the technical story of the file matches the claimed origin.
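
ExifTool's `-json` option prints a JSON array with one object per file, and each object includes a `SourceFile` key. This makes batch extraction from Python straightforward. The sketch below assumes the `exiftool` executable is on the PATH. The `FIELDS` selection and the function names are illustrative choices, and which tags are present varies by file type.

```python
import json
import subprocess

# Illustrative selection of tags compared against a document's claimed origin.
FIELDS = ("Producer", "Creator", "CreateDate", "ModifyDate", "Software", "Model")


def run_exiftool(paths: list[str]) -> str:
    """Run exiftool once over a batch of files and return its raw JSON output."""
    # check=False: exiftool may exit non-zero when some files fail;
    # we keep whatever JSON it printed for the others.
    result = subprocess.run(
        ["exiftool", "-json", *paths],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def summarize(json_text: str) -> dict:
    """Map each file to the subset of FIELDS that exiftool reported."""
    return {
        item["SourceFile"]: {k: item[k] for k in FIELDS if k in item}
        for item in json.loads(json_text or "[]")
    }
```

Usage, assuming ExifTool is installed: `summarize(run_exiftool(["payslip.pdf", "tax_notice.pdf"]))`. Running one process per batch rather than per file keeps the overhead low on large volumes.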

Exiv2 (open source)

Exiv2 is a C++ library with a command line interface used to read and write image metadata.

Typical features:

handle EXIF, IPTC, XMP for most image formats (JPEG, PNG, TIFF, etc.)
integrate directly into C++ applications and image-processing tools
support for both reading and updating metadata fields

In a fraud context, Exiv2 is a strong alternative or complement to ExifTool when your stack is C++ or heavily image-focused.

hachoir-metadata (Python pack)

hachoir-metadata is part of the Hachoir ecosystem, a Python library for parsing binary files and extracting metadata.

Typical features:

extract metadata from many file types (images, videos, archives, etc.) via binary parsing
integrate naturally into Python scripts and services
combine metadata analysis with other checks (OCR results, visual forensics, rules)

For document fraud detection, it’s particularly interesting when you want a pure Python pipeline analysing large volumes of uploaded files.
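
hachoir's documented usage pattern is `createParser()`, then `extractMetadata()`, then `exportPlaintext()`. The sketch below wraps that pattern and turns the plaintext lines into a dict, so the output can feed the same rules as other metadata sources. The helper names are hypothetical, and the conversion assumes the `- Label: value` line format that `exportPlaintext()` produces.

```python
def hachoir_metadata(path: str) -> dict:
    """Extract metadata with hachoir and return it as {label: value}."""
    # Imported lazily so plaintext_to_dict() works without hachoir installed.
    from hachoir.metadata import extractMetadata
    from hachoir.parser import createParser

    parser = createParser(path)
    if not parser:
        return {}  # unsupported or unrecognised file format
    with parser:
        try:
            metadata = extractMetadata(parser)
        except Exception:
            metadata = None  # corrupted or truncated file
    if not metadata:
        return {}
    return plaintext_to_dict(metadata.exportPlaintext())


def plaintext_to_dict(lines: list[str]) -> dict:
    """Turn '- Label: value' lines into a dict (first value per label wins)."""
    out = {}
    for line in lines:
        if line.startswith("- ") and ": " in line:
            label, value = line[2:].split(": ", 1)
            out.setdefault(label.strip(), value.strip())
    return out
```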

Comparing metadata tools

Metadata tools add an extra layer to your document fraud detection strategy. They don’t evaluate business content but help you understand how the file was produced and modified.

Tool	Integration	What it inspects	Strengths	Limitations
ExifTool	CLI + scripts	EXIF, XMP, IPTC and other technical metadata on images and some PDFs.	Industry reference, very feature-rich, great for batch processing.	Mostly CLI-based, less accessible for non-technical users.
Exiv2	C++ library + CLI	EXIF, IPTC, XMP metadata on most common image formats.	Well integrated in the open source ecosystem, strong fit for C++ apps.	Image-focused, less versatile than ExifTool, requires C++ skills.
hachoir-metadata	Python library	Low-level technical metadata from many file formats via binary parsing.	Easy to plug into Python pipelines, suited for high-volume inspection.	Requires custom fraud rules to turn raw metadata into risk signals.

Pros of metadata analysis

reveals technical inconsistencies that are invisible to the human eye
easy to automate in batch or near real time
helps identify recurring fraud patterns (same editor, same device, same pipeline)

Cons of metadata analysis

doesn’t analyse the actual content: amounts, names, addresses can still be fake
some fields may be missing, cleaned or manually edited by advanced fraudsters
must be combined with visual and business-level checks to be truly effective

Method 3: consistency checks with AI-powered document fraud detection software

The third method is the one that comes closest to how seasoned fraud analysts actually work. Instead of focusing solely on the appearance of the file or on its technical metadata, it examines the story the documents are telling. The central question becomes: do all these documents make sense together for this person, this profile, and this application?

AI-powered solutions such as Koncile, Inscribe or Resistant AI follow this logic. They combine OCR specialised in financial and identity documents, data structuring and normalisation, business-rule engines, and anomaly-detection models to assess whether a dossier is coherent.

This approach is especially relevant for financial and administrative documents like bank statements, payslips, tax returns, tax notices, invoices, utility bills, and identity documents used in financial decisions. Instead of asking whether a document has been visually altered, it asks whether the information across the entire file forms a credible, consistent narrative.

How the method works

Typical steps:

OCR API and extraction
Extract key fields from documents: amounts, dates, IBAN/account numbers, employer, address, balances, periods, IDs, etc.
Structuring and normalisation
Convert these fields into consistent schemas: standard date formats, currencies, unified employer names, transaction categories, income vs expenses, etc.
Business rules and consistency checks
Apply domain-specific logic, for example:
- do gross pay, contributions and net pay reconcile on a pay stub?
- are the declared incomes consistent with cash flows on bank statements over several months?
- do tax numbers, addresses and income levels line up across tax returns and pay stubs?
- are employment periods continuous and plausible?
AI and anomaly detection
Use AI models and statistical patterns to highlight:
- income profiles that don’t fit the job or industry
- combinations of documents rarely seen in genuine applications
- unusual transaction patterns or suspicious document layouts
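
The normalisation and business-rule steps can be sketched as a few plain functions. This is a simplified illustration, not how any specific vendor implements them. It assumes net pay equals gross pay minus employee contributions, whereas real payslips may include other lines such as tax withholding or benefits. The tolerances, accepted date formats, 31-day gap limit and function names are hypothetical defaults.

```python
from datetime import date, datetime
from decimal import Decimal


def to_amount(text: str) -> Decimal:
    """Normalise '1 234,56 €', '1.234,56' or '1,234.56' to a Decimal.

    Heuristic: whichever of ',' or '.' appears last is the decimal separator.
    """
    s = text.replace("€", "")
    for space in (" ", "\u00a0", "\u202f"):  # plain, no-break, narrow no-break
        s = s.replace(space, "")
    if s.rfind(",") > s.rfind("."):
        s = s.replace(".", "").replace(",", ".")  # European style
    else:
        s = s.replace(",", "")  # ',' used as a thousands separator
    return Decimal(s)


def to_date(text: str) -> date:
    """Accept DD/MM/YYYY or ISO YYYY-MM-DD."""
    for fmt in ("%d/%m/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            pass
    raise ValueError(f"unrecognised date: {text!r}")


def check_payslip(gross: Decimal, contributions: Decimal, net: Decimal,
                  tol: Decimal = Decimal("1.00")) -> bool:
    """Simplified rule: gross minus employee contributions equals net pay."""
    return abs(gross - contributions - net) <= tol


def income_matches_inflows(declared_net: Decimal, monthly_inflows: list[Decimal],
                           rel_tol: Decimal = Decimal("0.15")) -> bool:
    """Compare declared monthly net pay with average salary credits."""
    avg = sum(monthly_inflows) / len(monthly_inflows)
    return abs(avg - declared_net) <= declared_net * rel_tol


def employment_gaps(periods: list[tuple[date, date]],
                    max_gap_days: int = 31) -> list[tuple[date, date]]:
    """Return (end, next_start) gaps longer than max_gap_days."""
    periods = sorted(periods)
    return [(end, start)
            for (_, end), (start, _) in zip(periods, periods[1:])
            if (start - end).days > max_gap_days]
```

Keeping each rule small and explainable makes it easy to show an analyst exactly which check failed on which document.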

The goal is no longer just “is this file edited?”, but “is this entire dossier believable?”.
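
Production platforms rely on trained models for the anomaly-detection step. A much simpler statistical stand-in is a robust z-score based on the median and the median absolute deviation (MAD). It can flag a declared income that sits far outside a peer group, such as applicants with the same job title. The peer sample, the 3.5 cut-off and the function names are assumptions, not calibrated values.

```python
from statistics import median


def robust_z(value: float, peers: list[float]) -> float:
    """Modified z-score of value against peers, using median and MAD."""
    med = median(peers)
    mad = median([abs(p - med) for p in peers])
    if mad == 0:
        # More than half the peers share the median: any deviation stands out.
        return 0.0 if value == med else float("inf")
    # 1.4826 * MAD approximates the standard deviation for normal data.
    return (value - med) / (1.4826 * mad)


def is_income_outlier(declared: float, peers: list[float],
                      cutoff: float = 3.5) -> bool:
    """Flag a declared income far from comparable applicants."""
    return abs(robust_z(declared, peers)) > cutoff
```

The median and MAD are used instead of the mean and standard deviation so that a few fraudulent entries in the peer group do not shift the baseline.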

AI-powered solutions for consistency-based document fraud detection

Several players use this approach; here are three representative examples.

Koncile (AI-powered)

Koncile focuses on automating the reading and checking of financial and administrative documents in credit, tenant screening and onboarding workflows.

Key capabilities:

OCR for invoices, bank statements, pay stubs, tax notices, tax returns and other financial documents
structuring of extracted data into schemas tailored to your use cases (credit application, rental file, KYC)
business rules to check consistency between fields and between documents (income, expenses, balances, dates, identities)
AI-based anomaly detection to surface dossiers that look unusual or high-risk
API-first approach to plug document fraud detection directly into your existing systems

Inscribe (AI-powered)

Inscribe targets banks and fintechs that need to detect forged or manipulated documents during onboarding and underwriting.

Typical capabilities:

analyse bank statements, pay stubs, tax documents and IDs
combine forensics, metadata and data checks to highlight hidden inconsistencies
generate risk scores and fraud alerts for teams reviewing applications

It’s a good example of a solution that treats document fraud as a consistency problem across documents, not just an image manipulation problem.

Resistant AI (AI-powered)

Resistant AI offers a multi-layer “document forensics” approach for financial institutions processing high volumes of documents.

Typical capabilities:

support for bank statements, pay slips, utility bills, tax forms and more
analysis of file structure, content patterns and manipulation clues
risk profiling at both document and case level
integration into KYC, lending and monitoring workflows

Comparing AI-powered solutions

AI-powered consistency checks sit at the top of your document fraud detection stack. They bring together visual, metadata and content signals into a single decision layer.

Solution	Positioning	Document types	Analysis type	Best suited for
Koncile	OCR + AI for financial and administrative documents.	Bank statements, payslips, tax notices, invoices, tax returns.	Structured extraction, business rules and consistency checks across fields and documents.	Lenders, landlords, KYC/Onboarding teams that want automation plus fraud detection.
Inscribe	Document fraud detection for banks and fintechs.	Bank statements, pay stubs, tax documents, IDs.	Forensics, metadata and data checks to surface hidden inconsistencies.	Lenders and fintechs aiming to cut fraud without adding friction for good customers.
Resistant AI	Multi-layer document forensics platform.	Bank statements, payslips, utility bills, tax forms, and more.	Risk profiling based on file structure, content and manipulation patterns.	Banks, insurers, BNPL and lending platforms processing very high document volumes.

Pros of consistency-based AI checks

closest to how humans really validate dossiers, but at scale
able to catch fraud that looks visually clean and technically “normal”
integrates naturally into decision workflows and case management tools

Cons of consistency-based AI checks

requires upfront work to define schemas, business rules and use cases
needs a minimum data and process maturity to deploy effectively
doesn’t completely remove the need for human review on high-risk cases, but sharply reduces volume

In summary

No single method will eradicate document fraud. But combined intelligently, these three layers make fraud much harder, more expensive and riskier for attackers:

visual detection for cheap, obvious forgeries
metadata analysis for suspicious file histories
AI-powered consistency checks for deep, dossier-level fraud detection
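
One way to picture this combination is a weighted triage score. The weights, thresholds, decision labels and class names below are hypothetical; in practice they would be calibrated on reviewed cases. High scores are routed to priority human review rather than auto-rejected. This matches the point that consistency checks reduce manual review but do not remove it.

```python
from dataclasses import dataclass, field


@dataclass
class LayerResult:
    name: str                 # "visual", "metadata" or "consistency"
    score: float              # 0.0 (clean) to 1.0 (very suspicious)
    reasons: list[str] = field(default_factory=list)


# Hypothetical weights: metadata alone only yields weak signals,
# so it counts least; dossier-level consistency counts most.
WEIGHTS = {"visual": 0.3, "metadata": 0.2, "consistency": 0.5}


def triage(results: list[LayerResult], review_at: float = 0.3,
           priority_at: float = 0.7) -> tuple[str, float, list[str]]:
    """Combine layer scores into a routing decision plus human-readable reasons."""
    combined = sum(WEIGHTS[r.name] * r.score for r in results)
    reasons = [f"{r.name}: {why}" for r in results for why in r.reasons]
    if combined >= priority_at:
        decision = "priority_review"
    elif combined >= review_at:
        decision = "manual_review"
    else:
        decision = "fast_track"
    return decision, round(combined, 3), reasons
```

Returning the reasons alongside the score gives reviewers the evidence behind each decision.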

Modern document fraud detection software like Koncile brings these layers together so your teams can focus on the cases that truly matter instead of manually reviewing every single document.

Q&A

What is document fraud exactly?

Document fraud means creating, altering or reusing documents (bank statements, payslips, tax returns, IDs, etc.) to obtain an unfair advantage, such as a loan, a rental contract or a benefit.

Which documents are most frequently falsified?

The most targeted documents are bank statements, payslips, tax documents, IDs, proof of address and invoices, because they are used to assess identity, income, stability and risk.

Are Python visual detection packs enough to fight document fraud?

No. They are useful to flag obvious image tampering, but they do not assess the business logic or consistency of the data. They should be combined with metadata analysis and AI-powered consistency checks.

Why is metadata analysis useful for fraud detection?

Metadata analysis reveals the technical “story” of a file: when it was created or edited, which software or device was used, and whether this is consistent with how the document is supposed to be issued.

How is Koncile different from a basic OCR tool?

Koncile does more than reading text. It structures the data from documents, applies business rules and uses AI to detect inconsistencies across fields and documents in the same file, in order to surface suspected cases of document fraud.

Jules Ratier

Co-founder at Koncile - Transform any document into structured data with LLMs - jules@koncile.ai

Jules leads product development at Koncile, focusing on how to turn unstructured documents into business value.
