Capturing key data in procurement agreements using AI OCR

Last update:

May 7, 2025

5 minutes

A Fortune report shows that poor supplier contract management can cost up to 9% of annual revenue. Instead of static PDFs, turn contracts into a searchable, filterable database. With OCR and AI—especially LLMs—that’s now possible. Let’s see how.

Using LLMs and OCR, extract key data from your supplier contracts to better anticipate renewals, manage risks, and identify cost-saving opportunities.

Jules Ratier

Procurement team’s dream: turning any supplier agreement into usable data

Why do procurement and legal teams struggle to manage supplier agreements?

  • Agreements are scanned or image-based. A reliable OCR tool is essential to make them searchable and usable.
  • Key clauses are hidden in amendments and annexes. Important updates are often buried in separate documents or emails, leading to fragmentation and lack of visibility.
  • Documents are stored across multiple platforms. Contracts are scattered across shared drives, procurement systems, and inboxes, making retrieval and tracking difficult.
  • Retrieving the right information takes too long. Legal language and inconsistent formatting make it hard to extract the key data—like figures, deadlines, or obligations—that teams actually need.

Here’s what happens when you don’t extract the right data

  • Renewals slip through the cracks: Without visibility on notice periods or automatic renewals, you risk staying locked into unfavorable terms.
  • You incur unnecessary penalties: Late payments or missed deadlines can trigger financial penalties that could have been avoided.
  • You miss out on savings opportunities: Rebates, discounts, or kickbacks often go unclaimed simply because no one is tracking them.
  • You overlook critical risks: Key clauses like liability caps or indemnities remain hidden, leaving you exposed without realizing it.

Is there such a thing as a Procurement Agreement OCR?

Good news: yes. But it is more than just OCR (Optical Character Recognition).

It involves two main steps:

  1. Converting PDFs into text and structured tables: This is where OCR (Optical Character Recognition) comes in—it transforms scanned documents into machine-readable text.
  2. Extracting key information from the raw text using AI: Large language models (LLMs) analyze the content to identify and interpret important clauses, dates, values, and obligations.

It spans from recognizing characters to understanding the meaning of legal clauses. In that sense, contract OCR isn’t just a scanning tool—it’s a document data management system.

It doesn’t just extract information; it also deduces, calculates, and infers the key data points your team actually needs.
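To make the two steps concrete, here is a minimal sketch of how they chain together. The `fake_ocr` and `fake_llm` functions are hypothetical stand-ins for a real OCR engine and a real LLM call; they do not reflect any specific product's API, and the field names are assumptions.

```python
import json
from typing import Callable

# Hypothetical stand-ins: a real setup would wrap an OCR engine and an LLM API.
def fake_ocr(pdf_bytes: bytes) -> str:
    """Step 1: pretend to turn a scanned PDF into plain text."""
    return pdf_bytes.decode("utf-8")

def fake_llm(prompt: str) -> str:
    """Step 2: pretend to be an LLM that answers with a JSON object."""
    return json.dumps({
        "supplier": "ProServ Digital Services LLC",
        "termination_date": "2025-10-23",
    })

def extract_contract_data(
    pdf_bytes: bytes,
    ocr: Callable[[bytes], str],
    llm: Callable[[str], str],
    fields: list[str],
) -> dict:
    """Run OCR, ask the model for the requested fields, return a flat dict."""
    text = ocr(pdf_bytes)
    prompt = (
        "Extract the following fields from the contract and answer in JSON: "
        + ", ".join(fields)
        + "\n\n"
        + text
    )
    data = json.loads(llm(prompt))
    # Keep only the requested fields; anything the model did not return is None.
    return {field: data.get(field) for field in fields}

result = extract_contract_data(
    b"This Agreement shall remain in effect until October 23, 2025",
    fake_ocr,
    fake_llm,
    ["supplier", "termination_date", "liability_cap"],
)
print(result)
```

Passing the OCR and LLM steps as plain callables keeps the sketch independent of any vendor and makes each step easy to swap or test.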

AI makes it possible to build the contract analytics you’re looking for

Analytics means having a complete, centralized list of all your agreements—each enriched with the key information you need to take action.

Here are a few examples of the insights you can extract:

  • How many active contracts do I have in Europe?
  • What is the total contract value per country or region?
  • Which contracts can I terminate now without penalties?
  • Which agreements should I renegotiate this month to avoid automatic renewal?

With structured data, you can filter, sort, and build the aggregate key figures you need.
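As an illustration, once each agreement is a structured record, the questions above become a few lines of filtering and aggregation. The records below are made up, and the field names (`region`, `active`, `annual_value`, `exit_fee`) are assumptions for this sketch.

```python
from collections import defaultdict

# Illustrative records only: suppliers, regions and amounts are made up.
contracts = [
    {"supplier": "ProServ Digital Services", "region": "Europe",
     "active": True, "annual_value": 120_000, "exit_fee": 0},
    {"supplier": "NovaLex Solutions", "region": "Europe",
     "active": True, "annual_value": 45_000, "exit_fee": 11_250},
    {"supplier": "Hypothetical Telecom Inc.", "region": "United States",
     "active": True, "annual_value": 80_000, "exit_fee": 0},
    {"supplier": "Old Vendor GmbH", "region": "Europe",
     "active": False, "annual_value": 30_000, "exit_fee": 0},
]

# How many active contracts do I have in Europe?
active_in_europe = [
    c for c in contracts if c["active"] and c["region"] == "Europe"
]

# What is the total (active) contract value per region?
value_by_region = defaultdict(int)
for c in contracts:
    if c["active"]:
        value_by_region[c["region"]] += c["annual_value"]

# Which active contracts can I terminate now without penalties?
exit_without_penalty = [
    c["supplier"] for c in contracts if c["active"] and c["exit_fee"] == 0
]

print(len(active_in_europe))   # 2
print(dict(value_by_region))   # {'Europe': 165000, 'United States': 80000}
print(exit_without_penalty)
```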

Ideas to structure your data capture template

Let’s focus on building a business-oriented list of key information.

The goal is to avoid getting overwhelmed by overly legal or irrelevant details that don’t serve your procurement workflows.

In most cases, you don’t need to dig into every legal nuance—what matters is capturing the actionable data that helps you manage suppliers, control costs, and anticipate risks.

You can test this exact approach using your free Koncile account, which includes 50 credits to get started.

1. Supplier details

This may seem simple at first, but legal naming conventions can quickly complicate things.

If contracts aren’t accurately linked to the correct supplier entity, the risk of errors increases significantly.

For example:

  • ProServ Digital Services LLC → ProServ Digital Services | US Entity
  • PROSERV Digital Services S.C.A → ProServ Digital Services | Italian Entity

To avoid these issues, your system must support reliable automatic matching with your supplier database.

And because accuracy is non-negotiable here, the tool should include a human-in-the-loop process to flag and correct mismatches or uncertain cases.
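One possible way to combine automatic matching with a review step is sketched below. It assumes a hypothetical supplier master list and uses Python's standard `difflib` module: an exact match on the normalized legal name (legal form included) is accepted automatically, while anything else is sent to a human with a short list of candidates.

```python
import re
from difflib import get_close_matches

# Hypothetical supplier master data; IDs and labels are illustrative.
SUPPLIERS = [
    {"id": "SUP-001", "label": "ProServ Digital Services | US Entity",
     "legal_name": "ProServ Digital Services LLC"},
    {"id": "SUP-002", "label": "ProServ Digital Services | Italian Entity",
     "legal_name": "ProServ Digital Services S.C.A"},
]

def normalize(name: str) -> str:
    """Lowercase, drop punctuation, collapse spaces; the legal form is kept."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())

def match_supplier(extracted_name: str, suppliers: list[dict]) -> dict:
    index = {normalize(s["legal_name"]): s for s in suppliers}
    key = normalize(extracted_name)
    if key in index:
        # Same name and same legal form: safe to link automatically.
        return {"supplier": index[key], "needs_review": False, "candidates": []}
    # Otherwise propose the closest entries and let a person decide.
    close = get_close_matches(key, list(index), n=3, cutoff=0.6)
    return {
        "supplier": None,
        "needs_review": True,
        "candidates": [index[c] for c in close],
    }

exact = match_supplier("PROSERV Digital Services S.C.A", SUPPLIERS)
unsure = match_supplier("Proserv Digital Service", SUPPLIERS)
print(exact["supplier"]["label"])  # ProServ Digital Services | Italian Entity
print(unsure["needs_review"])      # True
```

Keeping the legal form in the normalized key is a deliberate choice here: dropping it would merge the US and Italian entities into one name, which is exactly the error described above.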

2. Contract category

Thanks to the combination of OCR and large language models (LLMs), you can now automatically assign the right category to each contract—based on the full context, not just keywords.

But categorization isn’t always obvious.

Take this example list of categories:

  • Consulting Intellectual Services
  • Legal
  • Audit
  • Business Process Outsourcing
  • Contractor Intellectual Services
  • Insurance
  • Membership Fees
  • Facility Management
  • etc.

Now imagine you’re dealing with a contract from NovaLex Solutions, which provides analysis and design of cloud infrastructure.

At first glance, this could fall under either Business Process Outsourcing or Contractor Intellectual Services.

In this case, it should be assigned to Contractor Intellectual Services, because it involves expert-based, knowledge-driven services delivered by independent professionals.

So how can AI make the right classification?

  • Provide supplier examples for each category: This helps the model recognize patterns and apply similarity logic.
  • Define each category clearly: Include a short, business-oriented description of what belongs—and what doesn’t—in each one. The more context you give, the better the AI can align with your expectations.
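The two tips above can be turned into a prompt builder along these lines. This is only a sketch: the category definitions and the example suppliers are hypothetical placeholders to be replaced with your own taxonomy, and the prompt wording is an assumption rather than a recommended template.

```python
# Hypothetical definitions and example suppliers, for illustration only.
CATEGORIES = {
    "Business Process Outsourcing": {
        "definition": "A whole business process is operated by a third party "
                      "on an ongoing basis (e.g. payroll processing).",
        "examples": ["Example Payroll Services Ltd"],
    },
    "Contractor Intellectual Services": {
        "definition": "Expert-based, knowledge-driven services delivered by "
                      "independent professionals.",
        "examples": ["Example Cloud Architects SAS"],
    },
}

def build_classification_prompt(contract_text: str, categories: dict) -> str:
    """Assemble a prompt listing each category with its definition and examples."""
    lines = ["Assign the contract below to exactly one of these categories.", ""]
    for name, info in categories.items():
        lines.append(f"Category: {name}")
        lines.append(f"Definition: {info['definition']}")
        lines.append("Example suppliers: " + ", ".join(info["examples"]))
        lines.append("")
    lines.append("Answer with the category name only.")
    lines.append("")
    lines.append("Contract:")
    lines.append(contract_text)
    return "\n".join(lines)

prompt = build_classification_prompt(
    "NovaLex Solutions will provide analysis and design of cloud infrastructure.",
    CATEGORIES,
)
print(prompt)
```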

We’ve built a very detailed list of procurement contract categories. You can take a look.

3. Renewals

Some contracts include tacit (automatic) renewal clauses, which can catch you off guard.

If you miss the termination notice deadline, it’s often too late—you’re locked in until the next renewal cycle.

That’s why early detection and proactive alerts are critical.

Here’s how AI can help anticipate renewals:

First, it spots the right termination clause and the exact date:

  • This Agreement shall commence on the Effective Date and shall remain in effect until October 23, 2025 (the “Termination Date”)

Second, in your prompt or extraction logic, you can include a buffer period to trigger early warnings. For instance, if the agreement requires a 90-day notice, the notice deadline falls on July 25, 2025, so the system should raise an alert ahead of that date, giving your team enough time to act.
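The date arithmetic for this example can be checked with a short sketch. The 30-day buffer is an assumption chosen for illustration; pick whatever lead time suits your team.

```python
from datetime import date, timedelta

def notice_deadline(termination_date: date, notice_days: int) -> date:
    """Last day on which notice can still be given."""
    return termination_date - timedelta(days=notice_days)

def alert_date(termination_date: date, notice_days: int, buffer_days: int = 30) -> date:
    """Day the alert should fire: the notice deadline minus a safety buffer."""
    return notice_deadline(termination_date, notice_days) - timedelta(days=buffer_days)

termination = date(2025, 10, 23)
print(notice_deadline(termination, 90))             # 2025-07-25
print(alert_date(termination, 90, buffer_days=30))  # 2025-06-25
```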

4. Contract value

Following the Pareto principle, identifying your most valuable contracts helps you prioritize the ones that matter most.

Let’s focus on the 20% of agreements that drive 80% of your spend.

As a rule of thumb, your data capture model should include two key fields:

  • A brief summary of the pricing structure and scope: Is the contract tied to a single entity or multiple subsidiaries? Is it flat-rate or usage-based?
  • An estimated annual contract value: A projected value over a 12-month period based on the available information.

For many contracts—like SaaS subscriptions—this is relatively straightforward.

But for others, such as telecom agreements, it can be more complex. The value might depend on variables like the number of phone lines or the scope of services across your entities.

That’s where AI inference becomes valuable.

To improve the accuracy of AI-estimated values, include contextual information such as your company size, number of users, geography, and legal entities in your prompt or extraction settings.

This helps the model make better assumptions when values are not explicitly stated.
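For usage-based contracts, the estimate often boils down to simple arithmetic once the missing variables are supplied from context. The sketch below assumes a per-unit monthly price plus an optional flat yearly fee; the telecom figures are invented for the example.

```python
def estimate_annual_value(
    unit_price_per_month: float,
    units: int,
    flat_fee_per_year: float = 0.0,
) -> float:
    """Projected 12-month value: usage-based part plus any flat yearly fee."""
    return unit_price_per_month * units * 12 + flat_fee_per_year

# Made-up telecom example: the per-line price comes from the contract,
# the number of lines comes from the company context given to the model.
lines_in_use = 400
estimate = estimate_annual_value(25.0, lines_in_use, flat_fee_per_year=5_000.0)
print(estimate)  # 125000.0
```

Storing the inputs next to the result (price, number of units, source of each) makes it clear which part of the value was read from the contract and which part was inferred.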

5. SLA & penalties

Service Level Agreements (SLAs) and penalty clauses are often underutilized by clients.

They’re negotiated up front—but rarely enforced or even revisited.

Yet these clauses can be critical for performance monitoring, leverage in renegotiations, and financial recovery in case of service failures.

So how do you capture them efficiently?

  • Summarize the key terms: Focus on obligations, thresholds, and trigger conditions (e.g., “99.9% uptime required; €1,000 penalty per hour of downtime”).
  • Include a penalty range or cap: Add a field to indicate the maximum penalty that can be claimed. This keeps your focus on high-impact clauses and avoids noise from trivial ones.

This approach makes SLA and penalty data usable—not just stored.
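As a sketch of what usable looks like, the example clause above can be stored as three numbers (threshold, hourly penalty, cap) and evaluated against measured downtime. How downtime counts toward the penalty varies by contract; this sketch assumes that once uptime falls below the threshold, every hour of downtime is charged, up to the cap. The cap value in the usage lines is made up.

```python
from typing import Optional

def sla_penalty(
    period_hours: float,
    downtime_hours: float,
    required_uptime_pct: float = 99.9,
    penalty_per_hour: float = 1000.0,
    cap: Optional[float] = None,
) -> float:
    """Penalty owed for a period, or 0.0 if the uptime target was met."""
    uptime_pct = 100.0 * (period_hours - downtime_hours) / period_hours
    if uptime_pct >= required_uptime_pct:
        return 0.0
    penalty = downtime_hours * penalty_per_hour
    # Apply the contractual cap when one was captured.
    return min(penalty, cap) if cap is not None else penalty

# A 30-day month has 720 hours.
print(sla_penalty(720, 5))              # 5000.0
print(sla_penalty(720, 5, cap=3000.0))  # 3000.0
print(sla_penalty(720, 0.5))            # 0.0
```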

6. Termination for convenience

Not all contracts can be exited immediately—and knowing which ones you can terminate now versus those you need to wait out is critical for proactive contract management.

Many procurement teams would love a simple, actionable list of agreements they can walk away from today, without penalties.

But beware: termination for convenience often comes with conditions or financial consequences.

For example:

If termination is initiated by the Licensee for convenience, an exit fee equivalent to 25% of the remaining license term’s value shall be payable to the Licensor within 30 days of termination.

To capture this effectively, your data model should include:

  • Whether termination for convenience is allowed
  • Any associated exit fees or conditions
  • The exact amount or formula used to calculate those fees

This gives your team the ability to make informed decisions—well ahead of renewal or negotiation deadlines.
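The three fields above map naturally onto a small record plus a fee calculation. The sketch below uses the 25% clause quoted earlier; it assumes the remaining term's value is the annual value prorated by remaining months, which is an interpretation and not something the clause itself specifies.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class TerminationForConvenience:
    allowed: bool
    exit_fee_rate: Optional[float]  # e.g. 0.25 for 25% of the remaining value
    conditions: str
    payment_days: int = 30          # fee payable within N days of termination

def exit_fee(clause: TerminationForConvenience,
             annual_value: float, remaining_months: int) -> float:
    """Exit fee, assuming the remaining value is prorated by month."""
    if not clause.allowed:
        raise ValueError("Termination for convenience is not allowed.")
    if clause.exit_fee_rate is None:
        return 0.0
    remaining_value = annual_value * remaining_months / 12
    return clause.exit_fee_rate * remaining_value

clause = TerminationForConvenience(
    allowed=True,
    exit_fee_rate=0.25,
    conditions="Exit fee payable to the Licensor within 30 days of termination.",
)
fee = exit_fee(clause, annual_value=120_000, remaining_months=6)
fee_due = date(2025, 3, 1) + timedelta(days=clause.payment_days)
print(fee)      # 15000.0
print(fee_due)  # 2025-03-31
```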

7. Payment terms

By capturing and centralizing the payment policy of each supplier, you can better manage your working capital and reduce exposure to late payment penalties.

For example, Koncile is based in France, where the law sets a default payment term of 30 days unless the contract provides otherwise. Knowing when exceptions apply is key to staying compliant and negotiating favorable terms.

And this isn’t just theoretical—payment delays can be costly. 55% of all invoices issued in the U.S. are paid late, and small businesses typically receive payments 8 days after the deadline.

Capturing this data lets you stay ahead of due dates, flag risky terms, and avoid preventable losses.
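Once the payment term of each supplier is captured as a number of days, due dates and exceptions are easy to compute. In this sketch the supplier terms are invented, and the 30-day reference is simply the default mentioned in the France example.

```python
from datetime import date, timedelta

def due_date(invoice_date: date, payment_term_days: int) -> date:
    """Date by which the invoice must be paid."""
    return invoice_date + timedelta(days=payment_term_days)

def days_late(due: date, paid_on: date) -> int:
    """Number of days past the due date, never negative."""
    return max(0, (paid_on - due).days)

# Made-up payment terms captured from two contracts.
payment_terms = {"ProServ Digital Services": 30, "NovaLex Solutions": 60}
REFERENCE_TERM_DAYS = 30  # assumption: the default term used as a baseline

# Contracts whose term differs from the baseline deserve a closer look.
exceptions = [s for s, days in payment_terms.items() if days != REFERENCE_TERM_DAYS]

due = due_date(date(2025, 5, 7), payment_terms["ProServ Digital Services"])
print(due)                                # 2025-06-06
print(days_late(due, date(2025, 6, 14)))  # 8
print(exceptions)                         # ['NovaLex Solutions']
```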

8. Geographic coverage

The goal is to be able to filter and segment your agreements by geographic area—so you can focus efforts where they matter most.

To do this effectively, you can include simple fields in your data capture model such as:

  • Europe: Yes/No
  • United States: Yes/No
  • Other regions or countries: Custom fields as needed (e.g., APAC, LATAM)

9. Kickbacks and rebates

Kickbacks and rebates are typically annual refunds or discounts based on total spend—and they’re often overlooked. If not tracked, they can quietly expire, leaving valuable savings on the table.

To capture them effectively, your data model should include:

  • The formula or mechanism: Is it a percentage of annual spend? Based on volume thresholds?
  • An estimated or maximum amount: Include a range to help determine whether the rebate is financially significant or not.
  • The conditions: Are the eligibility criteria realistic and achievable based on your current spending patterns?

Provide as much context as possible—like your company’s spend volumes, entities, and historical contract values. This helps the AI better assess whether the rebate is relevant and worth pursuing.
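A volume-threshold rebate can be captured as a list of tiers and evaluated against your actual spend. The tiers below are hypothetical, and the sketch assumes the rate of the highest tier reached applies to the whole annual spend; real clauses may instead apply each rate only to the spend within its tier.

```python
def rebate_summary(annual_spend: float, tiers: list[tuple[float, float]]) -> dict:
    """Rate reached, rebate amount, and spend still missing for the next tier."""
    ordered = sorted(tiers)
    reached = [(threshold, rate) for threshold, rate in ordered
               if annual_spend >= threshold]
    rate = reached[-1][1] if reached else 0.0
    higher = [threshold for threshold, _ in ordered if threshold > annual_spend]
    return {
        "rate": rate,
        "rebate": round(annual_spend * rate, 2),
        "gap_to_next_tier": (higher[0] - annual_spend) if higher else 0,
    }

# Hypothetical tiers: 1% from 100,000 of annual spend, 2% from 250,000.
TIERS = [(100_000, 0.01), (250_000, 0.02)]

print(rebate_summary(300_000, TIERS))
# {'rate': 0.02, 'rebate': 6000.0, 'gap_to_next_tier': 0}
print(rebate_summary(80_000, TIERS))
# {'rate': 0.0, 'rebate': 0.0, 'gap_to_next_tier': 20000}
```

The `gap_to_next_tier` value speaks directly to the third bullet: it shows how far current spending is from the next threshold, which helps judge whether the condition is realistic.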

10. Liability Cap

It may sound like pure legalese, but liability caps are essential for good risk management.

These clauses limit the amount a supplier can be held responsible for in case of damages—and directly affect your ability to recover losses in the event of a dispute.

While interpretation and enforcement may depend on local laws, it’s crucial to flag them early.

In most cases, your legal team should be looped in, as these clauses are often subject to strict legal frameworks.

11. Exclusivity & Competition Constraints

While relatively rare in procurement contracts, exclusivity clauses can limit your ability to engage with alternative suppliers or competitors.

Even if they’re unusual, it’s important to detect and monitor them—they can affect your sourcing flexibility and negotiating power, especially in critical categories.

12. Change of Control

This clause is more common in M&A or strategic partnerships, but it’s worth watching for—even in supplier agreements.

A change of control clause may impose restrictions or termination rights if the supplier is acquired, particularly by a competitor.

While uncommon in operational procurement, it’s good practice to track these clauses in case corporate ownership shifts unexpectedly.

Jules Ratier

Co-founder at Koncile - Transform any document into structured data with LLMs - jules@koncile.ai

Jules leads product development at Koncile, focusing on how to turn unstructured documents into business value.