Document fraud is growing faster than manual review teams can keep up with. In this article, we compare three concrete approaches to document fraud detection software, from simple Python tools to AI-powered platforms like Koncile.
Three complementary approaches to document fraud detection software, from image forensics to AI-powered consistency checks on financial documents.
What is document fraud detection?
Document fraud consists of creating, altering or reusing documents to obtain an advantage that should not be granted: a loan, a rental, a social benefit, a contract, a job, and so on. Detecting document fraud means verifying that the files provided are both authentic and internally consistent before triggering any business decision.
In practice, this mainly concerns financial and identity documents such as bank statements, pay stubs, tax returns, tax notices, identity documents, proof of address, invoices, or other administrative records. If any of these documents is manipulated, the entire decision chain becomes unreliable: credit scoring, tenant screening, KYC, onboarding, anti-fraud procedures.
Historically, document fraud detection relied on manual review. An analyst would compare documents one by one, check amounts, dates, logos, signatures or stamps, and use experience to spot inconsistencies. This remains useful, but quickly becomes unmanageable when volumes increase.
Today, three main approaches are emerging: visual analysis based on the document image, metadata inspection based on the file’s technical footprint, and AI-powered consistency checks that verify whether the information in the documents makes sense together. The goal is not to choose a single “best” method, but to understand what each one covers, its limitations, and how to combine them into an industrial-grade workflow.
💡Good to know
Don’t try to catch every case of document fraud with a single tool. Combine visual checks, metadata analysis and AI-based consistency tests to reduce fraud without blocking all applications.
Method 1: Python packs for visual document fraud detection
The first method focuses purely on the visual layer: the image of the document. The idea is simple: before you even look at the content (names, amounts, addresses), you ask a question:
“Has this image been tampered with?”
This is where Python image-forensics libraries come into play. They analyse pixels, compression, patterns and textures to detect signs of editing such as cloned regions, pasted stamps, altered fields or inconsistent noise.
This approach is helpful to catch obvious forgeries or quick Photoshop jobs, but it is fragile with low-quality scans, mobile photos, or carefully prepared fakes based on genuine templates.
How the method works
In this method, a document (ID card, invoice, tax notice, pay stub) is treated as a plain digital image. Algorithms typically:
split the image into blocks and search for repeated patterns (clone detection)
analyse compression artefacts and noise to highlight retouched regions
look for visual inconsistencies: strange contours, different grain, edges that don’t match
generate heatmaps or scores that indicate suspicious areas
Common red flags include:
a profile picture on an ID that doesn’t blend naturally with the background
a stamp or signature pasted on top of an existing scan
a logo copied from another document, with a different resolution or sharpening
Again, this doesn’t tell you if the income or identity information is realistic. It simply flags documents that visually look “edited” and should be escalated.
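To make the clone-detection step more concrete, here is a minimal sketch in plain Python. It assumes the document has already been converted to a grayscale grid of integers, and it only finds blocks that are pixel-for-pixel identical. The function name `find_cloned_blocks` and the block size are illustrative choices, not taken from any of the libraries discussed in this article. Real forensic tools typically rely on more robust features so that matches survive compression and rescaling.

```python
import random
from collections import defaultdict


def find_cloned_blocks(pixels, block=4):
    """Return groups of top-left (row, col) positions whose blocks are identical.

    `pixels` is a grayscale image given as a list of equal-length rows of ints.
    """
    height, width = len(pixels), len(pixels[0])
    seen = defaultdict(list)
    for r in range(height - block + 1):
        for c in range(width - block + 1):
            patch = tuple(tuple(pixels[r + i][c:c + block]) for i in range(block))
            # Skip flat patches (blank paper), which would match everywhere.
            if len({value for row in patch for value in row}) > 1:
                seen[patch].append((r, c))
    return [positions for positions in seen.values() if len(positions) > 1]


# Demo: a 12x12 random "scan" where the 4x4 patch at (0, 0) is pasted at (6, 6).
rng = random.Random(0)
image = [[rng.randrange(256) for _ in range(12)] for _ in range(12)]
for i in range(4):
    for j in range(4):
        image[6 + i][6 + j] = image[i][j]

groups = find_cloned_blocks(image)
print(groups)  # each group lists positions that share the same block
```

Exact matching like this is only a teaching aid: a re-saved JPEG changes pixel values slightly, which is one reason the visual layer is fragile on low-quality scans.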
Example Python packs for visual document fraud detection
Python packs for visual forensics are mostly used for R&D, proofs of concept and internal tools. They’re powerful building blocks but not turnkey “document fraud detection software” for business teams.
They are great when you want to:
explore visual forgery detection on a specific document type
train fraud teams to recognise common visual signals
build a first filter before more advanced checks
DocAuth (Python)
DocAuth is a Python project for basic document authentication and image forgery detection.
Typical features:
run image analysis on ID cards, certificates and official documents
detect modified regions in sensitive zones (photo, signature, stamp)
produce suspicion scores or heatmaps per region
integrate into simple Python pipelines for experiments or demos
DocAuth is quite limited and not heavily maintained, but it’s a good starting point to explore the topic.
PhotoHolmes (Python)
PhotoHolmes is a Python library focused on digital image forgery detection. It is not dedicated to identity documents, but its techniques can be reused for document images.
Typical features:
clone detection: highlight regions copied and pasted within the same image
localisation of retouching operations on specific areas (headers, stamps, amounts)
visual maps to guide a human analyst towards suspicious regions
combination of multiple forensic algorithms in one workflow
For document fraud, you can adapt PhotoHolmes to scan IDs, invoices or tax notices for copy–paste patterns and local edits.
pyIFD (Python Image Forgery Detection toolkit)
pyIFD is a Python toolkit that bundles several image forgery detection techniques in one place.
Typical features:
multiple algorithms for detecting manipulations in JPG/PNG/TIFF images
example scripts to test different techniques on the same dataset
flexible architecture for extending or combining methods
quick experimentation on scanned documents and attachments
It’s useful in exploratory phases, but you need to check maintenance and compatibility before using it in production.
Forensically (web-based)
Forensically is a well-known web tool for image forensics. It’s not a Python pack, but fraud teams often use it manually or as a reference point.
Typical features:
clone detection to spot duplicated regions
error level analysis (ELA) to highlight retouched zones
visual UI that’s very convenient for investigating sensitive cases
Forensically is great to assist analysts or validate suspicions on a handful of documents. For high-volume automation, it needs custom integration work.
Comparing visual detection packs
Visual Python packs help catch obvious manipulations on scanned or photographed documents. They are a good layer for experimentation and training, but they don’t replace a full document fraud detection software stack.
| Tool | Type | Use case | Strengths | Limitations |
| --- | --- | --- | --- | --- |
| DocAuth | Python package (document authentication) | Testing ID cards, certificates or official documents for POCs and demonstrations. | Easy to use, good starting point to explore visual fraud detection. | Project not actively maintained, limited scope, not production-ready as is. |
| PhotoHolmes | Python library (image forensics) | Analysis of internal copy–move edits and local retouching on document images. | Strong clone detection algorithms, useful complement in a visual pipeline. | Not specialized for “documents”, requires tuning for specific business cases. |
| pyIFD | Python toolkit (image forgery detection) | Quick experiments for detecting alterations on scanned images. | Multiple techniques grouped into one toolkit, convenient for comparisons. | Older tool; check maintenance status and file-format compatibility before use. |
| Forensically | Web tool (image forensics) | Manual or semi-automated checks on IDs, invoices, and tax notices. | Very educational interface, ideal for analyzing sensitive cases. | Not designed for large-scale automated integration without heavy adaptation work. |
Pros of visual detection
good at catching cheap, quickly edited fakes
provides visual evidence (heatmaps, highlighted areas) to support decisions
useful for R&D, internal POCs and analyst training
Cons of visual detection
heavily dependent on image quality (compression, angle, resolution, blur)
blind to the business meaning of fields (income, dates, names, addresses)
hard to scale and maintain as the only line of defence
Method 2: metadata analysis with open source tools and Python packs
The second method ignores the visual appearance and focuses on the “technical story” of the file. Every digital document carries metadata: creation time, modification time, software used, sometimes device information or extra technical fields.
Metadata analysis helps you answer questions like:
“Was this document really generated by the official portal or payroll system?”
“Was it edited in a PDF or image editor just before submission?”
“Do all documents in this application share the same suspicious signature?”
It doesn’t tell you if the numbers are fake, but it does reveal strange behaviour around the file itself.
How the method works
The idea is to extract metadata from files and compare it with what should normally happen in your process. You typically look at:
creation and last modification timestamps
software and version used to create or edit the document
device type (smartphone camera, scanner, desktop app)
EXIF or XMP fields for images where relevant
Typical abnormal situations:
a “tax notice” allegedly downloaded from a government portal but last modified in a generic PDF editor right before submission
a pay stub supposedly generated by payroll software, but actually saved by a photo-editing app
multiple documents in one application that all share the exact same unusual metadata pattern, hinting at mass-produced fakes
Again, metadata doesn’t prove fraud by itself, but it gives powerful signals to feed into your document fraud detection software and risk scoring.
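As an illustration, the checks above can be written as a few simple rules over a metadata dictionary. This is a sketch under assumptions: the keys (`Producer`, `Creator`, `Software`, `CreateDate`, `ModifyDate`) mirror tag names commonly reported by metadata extractors, the dates are assumed to be already parsed into `datetime` objects, and the editor list and 24-hour window are hypothetical values you would tune to your own process.

```python
from datetime import datetime, timedelta

# Hypothetical list of editor names; adapt it to what you see in your own files.
SUSPICIOUS_EDITORS = ("photoshop", "gimp", "canva", "pdf editor")


def metadata_flags(meta, submitted_at, recent=timedelta(hours=24)):
    """Return human-readable warnings for one file's metadata dictionary."""
    flags = []
    software = " ".join(
        str(meta.get(key, "")) for key in ("Producer", "Creator", "Software")
    ).lower()
    for name in SUSPICIOUS_EDITORS:
        if name in software:
            flags.append(f"produced or edited with '{name}'")
    created, modified = meta.get("CreateDate"), meta.get("ModifyDate")
    if created and modified and modified > created:
        flags.append("modified after creation")
    if modified and timedelta(0) <= submitted_at - modified <= recent:
        flags.append("modified shortly before submission")
    if not created and not modified:
        flags.append("timestamps missing or stripped")
    return flags


# Demo: a "tax notice" re-saved in an image editor 30 minutes before upload.
notice = {
    "Producer": "Adobe Photoshop",
    "CreateDate": datetime(2024, 1, 10, 9, 0),
    "ModifyDate": datetime(2024, 3, 4, 18, 30),
}
print(metadata_flags(notice, submitted_at=datetime(2024, 3, 4, 19, 0)))
```

Each rule produces a warning rather than a verdict, which matches the idea that metadata feeds a risk score instead of proving fraud on its own.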
Tools (open source & Python) for metadata analysis
This method relies heavily on open source tools and libraries that you can embed into scripts or processing pipelines.
ExifTool (open source)
ExifTool is the go-to open source tool for extracting technical metadata from images and some PDFs (EXIF, XMP, IPTC and others).
Typical features:
read detailed metadata for images and certain document formats
extract creation and modification dates
reveal software, device and other technical information
run in batch via command line or scripts
For document fraud detection, ExifTool is especially useful to check if the technical story of the file matches the claimed origin.
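A minimal way to call ExifTool from a Python script is to run the command-line tool with its `-json` option and parse the result. The sketch below assumes the `exiftool` binary is installed and available on the PATH; the helper names `exiftool_metadata` and `parse_exiftool_json` are our own and not part of ExifTool.

```python
import json
import subprocess


def parse_exiftool_json(text):
    """Parse ExifTool's -json output: a list with one object per input file."""
    records = json.loads(text)
    return records[0] if records else {}


def exiftool_metadata(path):
    """Run ExifTool on one file and return its tags as a dictionary."""
    result = subprocess.run(
        ["exiftool", "-json", path],
        capture_output=True,
        text=True,
        check=True,  # raise if ExifTool reports an error
    )
    return parse_exiftool_json(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it easy to test the logic on saved ExifTool output without having the binary installed.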
Exiv2 (open source)
Exiv2 is a C++ library with a command line interface used to read and write image metadata.
Typical features:
handle EXIF, IPTC, XMP for most image formats (JPEG, PNG, TIFF, etc.)
integrate directly into C++ applications and image-processing tools
support for both reading and updating metadata fields
In a fraud context, Exiv2 is a strong alternative or complement to ExifTool when your stack is C++ or heavily image-focused.
hachoir-metadata (Python pack)
hachoir-metadata is part of the Hachoir ecosystem, a Python library for parsing binary files and extracting metadata.
Typical features:
extract metadata from many file types (images, videos, archives, etc.) via binary parsing
integrate naturally into Python scripts and services
combine metadata analysis with other checks (OCR results, visual forensics, rules)
For document fraud detection, it’s particularly interesting when you want a pure Python pipeline analysing large volumes of uploaded files.
Comparing metadata tools
Metadata tools add an extra layer to your document fraud detection strategy. They don’t evaluate business content but help you understand how the file was produced and modified.
| Tool | Integration | What it inspects | Strengths | Limitations |
| --- | --- | --- | --- | --- |
| ExifTool | CLI + scripts | EXIF, XMP, IPTC and other technical metadata on images and some PDFs. | Industry reference, very feature-rich, great for batch processing. | Mostly CLI-based, less accessible for non-technical users. |
| Exiv2 | C++ library + CLI | EXIF, IPTC, XMP metadata on most common image formats. | Well integrated in the open source ecosystem, strong fit for C++ apps. | Image-focused, less versatile than ExifTool, requires C++ skills. |
| hachoir-metadata | Python library | Low-level technical metadata from many file formats via binary parsing. | Easy to plug into Python pipelines, suited for high-volume inspection. | Requires custom fraud rules to turn raw metadata into risk signals. |
Pros of metadata analysis
reveals technical inconsistencies that are invisible to the human eye
easy to automate in batch or near real time
helps identify recurring fraud patterns (same editor, same device, same pipeline)
Cons of metadata analysis
doesn’t analyse the actual content: amounts, names, addresses can still be fake
some fields may be missing, cleaned or manually edited by advanced fraudsters
must be combined with visual and business-level checks to be truly effective
Method 3: consistency checks with AI-powered document fraud detection software
The third method is the one that comes closest to how seasoned fraud analysts actually work. Instead of focusing solely on the appearance of the file or on its technical metadata, it examines the story the documents are telling. The central question becomes: do all these documents make sense together for this person, this profile, and this application?
AI-powered solutions such as Koncile, Inscribe or Resistant AI follow this logic. They combine OCR specialised in financial and identity documents, data structuring and normalisation, business-rule engines, and anomaly-detection models to assess whether a dossier is coherent.
This approach is especially relevant for financial and administrative documents like bank statements, payslips, tax returns, tax notices, invoices, utility bills, and identity documents used in financial decisions. Instead of asking whether a document has been visually altered, it asks whether the information across the entire file forms a credible, consistent narrative.
How the method works
Typical steps:
OCR and extraction Extract key fields from documents: amounts, dates, IBAN/account numbers, employer, address, balances, periods, IDs, etc.
Structuring and normalisation Convert these fields into consistent schemas: standard date formats, currencies, unified employer names, transaction categories, income vs expenses, etc.
Business rules and consistency checks Apply domain-specific logic, for example:
do gross, net and contributions match on a pay stub?
are the declared incomes consistent with cash flows on bank statements over several months?
do tax numbers, addresses and income levels line up across tax returns and pay stubs?
are employment periods continuous and plausible?
AI and anomaly detection Use AI models and statistical patterns to highlight:
income profiles that don’t fit the job or industry
combinations of documents rarely seen in genuine applications
unusual transaction patterns or suspicious document layouts
The goal is no longer just “is this file edited?”, but “is this entire dossier believable?”.
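Two of the business rules above can be sketched in a few lines of Python. This is a simplified illustration, not how any vendor mentioned here implements its checks: it assumes amounts have already been extracted and normalised to decimal strings, that net pay is simply gross minus contributions (real pay stubs often carry additional items), and that a 5% gap between declared income and salary deposits is worth flagging. The function names and thresholds are hypothetical.

```python
from decimal import Decimal


def check_payslip(gross, contributions, net, tolerance=Decimal("0.01")):
    """Return True if gross - contributions equals net within `tolerance`."""
    gap = Decimal(gross) - Decimal(contributions) - Decimal(net)
    return abs(gap) <= tolerance


def check_income_vs_deposits(declared_net, salary_deposits, max_gap=Decimal("0.05")):
    """Return the months whose salary deposit deviates from the declared net income.

    `salary_deposits` maps a month label to the salary amount seen on the
    bank statement for that month.
    """
    declared = Decimal(declared_net)
    return [
        month
        for month, amount in salary_deposits.items()
        if abs(Decimal(amount) - declared) > max_gap * declared
    ]


# Demo: a pay stub that adds up, and a bank statement with one odd month.
print(check_payslip("3200.00", "704.00", "2496.00"))
deposits = {"2024-01": "2496.00", "2024-02": "2496.00", "2024-03": "1800.00"}
print(check_income_vs_deposits("2496.00", deposits))
```

Using `Decimal` rather than floats avoids rounding surprises when comparing monetary amounts to the cent.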
AI-powered solutions for consistency-based document fraud detection
Several players use this approach; here are three representative examples.
Koncile (AI-powered)
Koncile focuses on automating the reading and checking of financial and administrative documents in credit, tenant screening and onboarding workflows.
Key capabilities:
specialised OCR for bank statements, pay stubs, tax notices, invoices, tax returns and other financial docs
structuring of extracted data into schemas tailored to your use cases (credit application, rental file, KYC)
business rules to check consistency between fields and between documents (income, expenses, balances, dates, identities)
AI-based anomaly detection to surface dossiers that look unusual or high-risk
API-first approach to plug document fraud detection directly into your existing systems
Inscribe (AI-powered)
Inscribe targets banks and fintechs who need to detect forged or manipulated documents during onboarding and underwriting.
Typical capabilities:
analyse bank statements, pay stubs, tax documents and IDs
combine forensics, metadata and data checks to highlight hidden inconsistencies
generate risk scores and fraud alerts for teams reviewing applications
It’s a good example of a solution that treats document fraud as a consistency problem across documents, not just an image manipulation problem.
Resistant AI (AI-powered)
Resistant AI offers a multi-layer “document forensics” approach for financial institutions processing high volumes of documents.
Typical capabilities:
support for bank statements, pay slips, utility bills, tax forms and more
analysis of file structure, content patterns and manipulation clues
risk profiling at both document and case level
integration into KYC, lending and monitoring workflows
Comparing AI-powered solutions
AI-powered consistency checks sit at the top of your document fraud detection stack. They bring together visual, metadata and content signals into a single decision layer.
| Solution | Positioning | Document types | Analysis type | Best suited for |
| --- | --- | --- | --- | --- |
| Koncile | OCR + AI for financial and administrative documents. | Bank statements, payslips, tax notices, invoices, tax returns. | Structured extraction, business rules and consistency checks across fields and documents. | Lenders, landlords, KYC/Onboarding teams that want automation plus fraud detection. |
| Inscribe | Document fraud detection for banks and fintechs. | Bank statements, pay stubs, tax documents, IDs. | Forensics, metadata and data checks to surface hidden inconsistencies. | Lenders and fintechs aiming to cut fraud without adding friction for good customers. |
| Resistant AI | Multi-layer document forensics platform. | Bank statements, payslips, utility bills, tax forms, and more. | Risk profiling based on file structure, content and manipulation patterns. | Banks, insurers, BNPL and lending platforms processing very high document volumes. |
Pros of consistency-based AI checks
closest to how humans really validate dossiers, but at scale
able to catch fraud that looks visually clean and technically “normal”
integrates naturally into decision workflows and case management tools
Cons of consistency-based AI checks
requires upfront work to define schemas, business rules and use cases
needs a minimum data and process maturity to deploy effectively
doesn’t completely remove the need for human review on high-risk cases, but sharply reduces volume
In summary
No single method will eradicate document fraud. But combined intelligently, these three layers make fraud much harder, more expensive and riskier for attackers:
visual detection for cheap, obvious forgeries
metadata analysis for suspicious file histories
AI-powered consistency checks for deep, dossier-level fraud detection
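One simple way to picture the combination is a triage step that counts how many layers raised a warning. The policy below is purely hypothetical (the labels and the "two layers means escalate" rule are our own assumptions); in practice the routing would depend on your risk appetite and review capacity.

```python
def triage(visual_flags, metadata_flags, consistency_flags):
    """Route a dossier based on how many detection layers raised warnings.

    Each argument is a list of warning strings produced by one layer.
    """
    layers_hit = sum(
        bool(flags) for flags in (visual_flags, metadata_flags, consistency_flags)
    )
    if layers_hit == 0:
        return "pass"
    if layers_hit == 1:
        return "review"
    return "escalate"


print(triage([], ["modified shortly before submission"], ["net pay mismatch"]))
```

The point of a graded outcome is to avoid blocking every application that triggers a single weak signal.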
Modern document fraud detection software like Koncile brings these layers together so your teams can focus on the cases that truly matter instead of manually reviewing every single document.
Q&A
What is document fraud exactly?
Document fraud means creating, altering or reusing documents (bank statements, payslips, tax returns, IDs, etc.) to obtain an unfair advantage, such as a loan, a rental contract or a benefit.
Which documents are most frequently falsified?
The most targeted documents are bank statements, payslips, tax documents, IDs, proof of address and invoices, because they are used to assess identity, income, stability and risk.
Are Python visual detection packs enough to fight document fraud?
No. They are useful to flag obvious image tampering, but they do not assess the business logic or consistency of the data. They should be combined with metadata analysis and AI-powered consistency checks.
Why is metadata analysis useful for fraud detection?
Metadata analysis reveals the technical “story” of a file: when it was created or edited, which software or device was used, and whether this is consistent with how the document is supposed to be issued.
How is Koncile different from a basic OCR tool?
Koncile does more than read text. It structures the data from documents, applies business rules and uses AI to detect inconsistencies across fields and documents in the same case file, in order to surface suspected cases of document fraud.