OCR (Optical Character Recognition) is a technology that allows different types of documents, such as scanned images, PDF files, or photos of text, to be converted into editable and searchable text data. In other words, OCR transforms an image containing text into a text file that you can edit.
This technology works by analyzing an image of text, identifying the individual characters and their layout, and then converting them into editable text. Modern OCR software typically uses artificial intelligence and machine learning algorithms to improve recognition accuracy.
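The recognition steps described above (split the image into characters, then identify each one) can be illustrated with a deliberately tiny sketch. It assumes a made-up 3x5 "font" and uses nearest-template matching in place of the trained models that real OCR engines rely on; the names `TEMPLATES`, `segment`, `classify`, and `recognize` are hypothetical and do not come from any OCR library.

```python
# Toy 3x5 glyph templates; '#' marks ink, '.' marks background.
# Illustrative only: real OCR engines learn character shapes from data.
TEMPLATES = {
    "O": ["###", "#.#", "#.#", "#.#", "###"],
    "C": ["###", "#..", "#..", "#..", "###"],
    "R": ["##.", "#.#", "##.", "#.#", "#.#"],
}


def segment(image):
    """Split the image into glyphs wherever a column is entirely blank."""
    width = len(image[0])
    glyphs, start = [], None
    for x in range(width + 1):
        blank = x == width or all(row[x] == "." for row in image)
        if not blank and start is None:
            start = x  # first inked column of a new glyph
        elif blank and start is not None:
            glyphs.append([row[start:x] for row in image])
            start = None
    return glyphs


def classify(glyph):
    """Return the template character with the fewest mismatched pixels."""
    def distance(template):
        return sum(
            a != b
            for glyph_row, template_row in zip(glyph, template)
            for a, b in zip(glyph_row, template_row)
        )
    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch]))


def recognize(image):
    """Segment the image, classify each glyph, and join the results."""
    return "".join(classify(glyph) for glyph in segment(image))


# A 5-row "scan" containing three glyphs separated by blank columns.
IMAGE = [
    "###.###.##.",
    "#.#.#...#.#",
    "#.#.#...##.",
    "#.#.#...#.#",
    "###.###.#.#",
]

print(recognize(IMAGE))  # OCR
```

Because `classify` picks the closest template rather than requiring an exact match, a glyph with a stray or missing pixel can still be recognized, which is a very rough analogue of how learned models tolerate noise in scans.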
"Traditional" OCR software simply transcribes raw text. Advanced solutions like Koncile OCR go beyond transcription: rather than converting all the text in a document into data, they use AI, particularly LLMs (large language models), to identify and extract the specific data the user is looking for.
For example, in an invoice, Koncile can automatically find and extract the total amount, supplier name, date, line item details (products, quantities, unit prices), VAT numbers, and much more. Koncile understands the document and extracts relevant information in a structured way, ready to be used in other systems (accounting, ERP, etc.). This is known as intelligent data extraction.
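To make the difference between raw transcription and structured extraction concrete, here is a minimal sketch of what a structured invoice record might look like. It is not Koncile's API or method: the `Invoice` and `LineItem` classes, the `extract_invoice` function, and the sample invoice text are all invented for illustration, and simple regular expressions stand in for the LLM-based extraction described above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class LineItem:
    product: str
    quantity: int
    unit_price: float


@dataclass
class Invoice:
    supplier: str
    date: str
    vat_number: str
    total: float
    items: list[LineItem] = field(default_factory=list)


# Hypothetical raw OCR output for a made-up invoice (all values invented).
RAW_TEXT = """\
Supplier: Acme Supplies
Date: 2024-03-15
VAT: FR12345678901
Widget x 2 @ 10.00
Gadget x 1 @ 25.50
Total: 45.50
"""


def find(pattern, text):
    """Return the first captured group for pattern, or raise if absent."""
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        raise ValueError(f"field not found: {pattern}")
    return match.group(1).strip()


def extract_invoice(text):
    """Turn raw transcribed text into a structured Invoice record."""
    items = [
        LineItem(product, int(quantity), float(unit_price))
        for product, quantity, unit_price in re.findall(
            r"^(\w+) x (\d+) @ ([\d.]+)$", text, re.MULTILINE
        )
    ]
    return Invoice(
        supplier=find(r"^Supplier:\s*(.+)$", text),
        date=find(r"^Date:\s*(\d{4}-\d{2}-\d{2})$", text),
        vat_number=find(r"^VAT:\s*(\S+)$", text),
        total=float(find(r"^Total:\s*([\d.]+)$", text)),
        items=items,
    )


invoice = extract_invoice(RAW_TEXT)
print(invoice.supplier, invoice.total, len(invoice.items))
```

Fixed patterns like these only work when every invoice follows the same layout. An extraction tool built on language models is meant to find the same fields across varying layouts, while still producing a record of this general shape that accounting or ERP systems can consume.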