How to use Azure AI Document Intelligence for AI-based Text Recognition

Microsoft's Azure AI Document Intelligence provides a comprehensive solution for automatically scanning and analyzing documents in common file formats such as PDFs and images. We offer a hands-on introduction to this cost-effective AI technology.

Key facts:

  • What it’s all about: Azure AI Document Intelligence is an AI-based text recognition solution in the Microsoft Azure Cloud. The solution can extract information from PDFs, photos, graphics or handwriting and prepare it in a structured way.
  • Advantages: The solution simplifies processes, automates tedious tasks and saves costs. Examples of applications include customer service (applications, complaints), accounting (scanning receipts), healthcare (prescriptions), laboratory technology (laboratory reports), archiving (document archives) and many more.
  • Usage: Integration via API
  • Costs: approx. $1.50 per 1,000 pages

What Azure AI Document Intelligence does

Azure AI Document Intelligence (see Microsoft product page) is a component of the Azure AI platform that uses machine learning to scan, recognize, and classify documents. It can handle a wide range of document types, such as invoices, receipts, forms, and even handwritten notes, as well as custom document formats.

Unlike standalone solutions such as OmniPage or Adobe Acrobat Pro DC, the Azure-based solution integrates with your existing applications via API. This direct link lets you build an application tailored to your needs and processes. API integration requires some IT work, but the solution is relatively simple to configure and use. Microsoft also provides code samples, which saves developers considerable time.

Advantages of Azure AI Document Intelligence

  • Recognize text and tables in documents of all formats such as PDF, graphics, handwriting (OCR)
  • Introduce fast document processing in the company (efficiency)
  • Reduce costs (automation instead of manual work)
  • Minimize errors (e.g. avoid typos from manual data entry)

Costs

As with other Azure services, costs are usage-based. The advantage of the cloud option is that you do not have to buy a solution with hefty license fees; instead, you pay flexibly for the number of pages or documents processed ("pay as you go" model).

  • 0-500 pages/month: free of charge
  • Per 1,000 pages: approx. $1.50 (cheaper above 1 million pages/month)
  • see cost calculator on the Microsoft Azure website
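
To get a feel for the numbers, the pay-as-you-go arithmetic can be sketched in a few lines of Python. This is a rough estimate only: it assumes the flat rate of approx. $1.50 per 1,000 pages quoted above and ignores both the free plan and the volume discount; the function name is our own and not part of any Azure tool.

```python
def estimate_monthly_cost(pages: int, price_per_1000: float = 1.50) -> float:
    """Estimate the pay-as-you-go cost in USD for a monthly page volume.

    Assumption: a flat rate per 1,000 pages, with no free allowance
    and no volume discount.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return round(pages / 1000 * price_per_1000, 2)


if __name__ == "__main__":
    for volume in (1_000, 25_000, 400_000):
        cost = estimate_monthly_cost(volume)
        print(f"{volume:>7} pages/month: approx. ${cost:,.2f}")
```

For binding figures, the cost calculator on the Microsoft Azure website remains the reference.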

Text recognition: Typical use cases and business benefits

Azure AI Document Intelligence helps many sectors save money and increase efficiency. In short, it is useful wherever large numbers of documents are processed on a regular basis. Here are some examples.

Invoice processing: Azure AI Document Intelligence can help you automatically scan incoming invoices, extract key data like the amount, date, and invoice number, and send it to your accounting system. This reduces manual, time-consuming tasks and accelerates the overall accounting process.
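
As a minimal sketch of this hand-over to an accounting system, the following Python function picks the invoice number, date and total out of an analysis result. It assumes the JSON layout returned by the prebuilt invoice model (a `documents` list whose entries carry `fields` such as `InvoiceId`, `InvoiceDate` and `InvoiceTotal`, each with `content` and `confidence`). The sample data, the function name and the confidence threshold of 0.8 are illustrative assumptions.

```python
from typing import Optional

WANTED_FIELDS = ("InvoiceId", "InvoiceDate", "InvoiceTotal")


def extract_invoice_fields(
    analyze_result: dict,
    wanted: tuple = WANTED_FIELDS,
    min_confidence: float = 0.8,
) -> dict:
    """Return {field name: text or None} for the first document in the result.

    A field is set to None when it is missing or its confidence is below
    the threshold, so that it can be routed to manual review.
    """
    documents = analyze_result.get("documents", [])
    fields = documents[0].get("fields", {}) if documents else {}
    extracted: dict = {}
    for name in wanted:
        field = fields.get(name)
        value: Optional[str] = None
        if field and field.get("confidence", 0.0) >= min_confidence:
            value = field.get("content")
        extracted[name] = value
    return extracted


# Hypothetical result, shortened to the parts used above.
SAMPLE_RESULT = {
    "documents": [
        {
            "fields": {
                "InvoiceId": {"content": "INV-100", "confidence": 0.97},
                "InvoiceDate": {"content": "March 1, 2024", "confidence": 0.95},
                "InvoiceTotal": {"content": "$110.00", "confidence": 0.40},
            }
        }
    ]
}

if __name__ == "__main__":
    print(extract_invoice_fields(SAMPLE_RESULT))
```

Returning `None` for low-confidence fields is one possible design choice: it keeps uncertain values out of the accounting system and makes it easy to flag those invoices for a human check.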

Customer service inquiries can be handled more quickly by automatically assessing and categorizing incoming documents such as application forms or letters of complaint. This results in speedier assignment of requests to the appropriate workers, which improves customer service.

In healthcare, Azure AI Document Intelligence speeds up the processing of patient files by automatically extracting key information such as diagnoses and treatment plans. This helps to improve patient care and streamline administrative operations.

In logistics, automated processing of delivery notes and bills of lading can speed up supply chain processes by extracting and processing key information such as delivery addresses or product lists right away.

In the realm of digital humanities, Azure AI Document Intelligence helps to create digital archives by digitizing and analysing historical texts and manuscripts. Project Gutenberg, which makes thousands of digitized public domain books available for free, and the Internet Archive, which collects digital information ranging from websites to books and music, are two prominent instances of such document libraries. These applications provide widespread access to cultural and historical resources, encourage study and education, and make it easier to create interactive learning materials.

 

Tutorial: Azure AI Document Intelligence in 5 steps

In no time at all, you can set up a new instance of the solution in Azure, try it out interactively in the Studio and then integrate it into your own processes via API.

About this short tutorial:

  • Goal: Use Document Intelligence in Azure and learn how to integrate it via API
  • Suitable for: Azure beginners and professionals, developers, data analysts
  • Time required: 15 minutes
  • Cost: free to very low

The steps at a glance:

  • Step 1: Set up an Azure account
  • Step 2: Create a Document Intelligence resource
  • Step 3: Call up Document Intelligence Studio
  • Step 4: Use Document Intelligence Studio
  • Step 5: Integration via API

Step 1: Set up an Azure account

If you do not yet have an Azure account, you can test Azure for free for 30 days and receive a $200 starting credit, which is more than enough for a large amount of data and experiments.

  • Set up an Azure account
  • Then log in to Azure

Step 2: Create a Document Intelligence resource

Now let's create a free cloud instance of Document Intelligence (formerly known as "Form Recognizer"). To do this, navigate to the "Azure AI Services" service in Azure and click the "Create" button to set up a new Document Intelligence resource. On this overview page, you will find "Document Intelligence" in the menu on the left, beneath "Azure AI Services".

Settings:

  • Subscription: Select your Azure subscription
  • Resource group: Create a new resource group (this bundles several Azure services together and you can easily find and delete them later)
  • Name: DocumentIntelligence-RS1 (suggestion: product name, your initials, and the number of your test, here 1)
  • Server region: Germany West Central (or other location in Europe)
  • Cost plan: Free F0 (up to 500 pages free of charge)
  • Click on “Create” and wait 1-2 minutes until your instance has been created

 

 

Step 3: Call up Document Intelligence Studio

In the next step, we navigate to the Document Intelligence Studio in Azure.

  • Click on “Go to Resource”
  • Click on Document Intelligence Studio > Try it
  • or directly via URL

 

 

Step 4: Use Document Intelligence Studio

For many Azure tools, the Azure Cloud offers an interactive “Studio” application in which you can easily try out the tool.

As a test, we want to read a table from an annual report twice: once as a PDF and once as a scanned image. The Studio already contains sample documents for this.

Settings:

  • Application: Click on “Layout” (we want to extract tables, i.e. documents that have a “layout”)
  • Select document type: The choices are Invoice, Receipt, Identity, Health Insurance card, Business card, Contract and Tax Forms. However, the strength of the solution is that you can create and train your own document types. Choosing a suitable document type ensures that the extracted data ends up in the correct structure.
  • Select document: Upload your own documents or select the annual report from the templates on the left.
  • Click on “Analyze options”: Here you can make a few more settings, such as the page to be scanned if you want to scan a multi-page PDF.
  • Click on “Run analysis”: This starts the text analysis. The tool will now highlight all recognized texts in color.

 

Result:

When you click on an extracted area, Document Intelligence displays the extracted data to the right of it, for example a complete table from an image of an annual report, with all cells and headers accurately identified. The data is now available in a structured format, making it easier for machines to process.
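
To illustrate what "structured format" means in practice, here is a small Python sketch that turns one extracted table into rows and writes them as CSV. It assumes the JSON shape of a layout result, in which each table carries `rowCount`, `columnCount` and a list of `cells` with `rowIndex`, `columnIndex` and `content`. The sample table and the function names are made up for illustration.

```python
import csv
import io


def table_to_rows(table: dict) -> list[list[str]]:
    """Arrange the flat cell list of one table into a row-by-row grid."""
    grid = [[""] * table["columnCount"] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell["content"]
    return grid


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialize the grid as CSV text, e.g. for import into a spreadsheet."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical table, shortened to the keys used above.
SAMPLE_TABLE = {
    "rowCount": 2,
    "columnCount": 2,
    "cells": [
        {"rowIndex": 0, "columnIndex": 0, "content": "Year"},
        {"rowIndex": 0, "columnIndex": 1, "content": "Revenue"},
        {"rowIndex": 1, "columnIndex": 0, "content": "2023"},
        {"rowIndex": 1, "columnIndex": 1, "content": "1,200"},
    ],
}

if __name__ == "__main__":
    print(rows_to_csv(table_to_rows(SAMPLE_TABLE)))
```

Merged cells (row or column spans) are not handled in this sketch; each cell is simply placed at its top-left position.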

 

Step 5: Integration via API

Azure AI Document Intelligence can be easily integrated into existing applications. Microsoft provides SDKs for C#, Java, Python and JavaScript, and the service can also be called directly via REST API.
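
As a minimal sketch of the REST route, the following Python code uses only the standard library to submit a document URL to the layout model and poll for the result. Several details are assumptions to verify against the current REST reference for your resource: the `formrecognizer` URL path and the API version `2023-07-31`, the model ID `prebuilt-layout`, and all function names, which are our own. The optional `pages` value corresponds to the page selection described in step 4.

```python
import json
import time
import urllib.parse
import urllib.request

API_VERSION = "2023-07-31"  # assumption: check which versions your resource supports


def build_analyze_request(endpoint, key, document_url,
                          model_id="prebuilt-layout", pages=None):
    """Build (but do not send) the POST request that starts an analysis."""
    query = {"api-version": API_VERSION}
    if pages:
        query["pages"] = pages  # e.g. "1-2" to analyze only the first two pages
    url = (f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
           f"{model_id}:analyze?{urllib.parse.urlencode(query)}")
    body = json.dumps({"urlSource": document_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def analyze(endpoint, key, document_url, poll_seconds=2.0, **options):
    """Start the analysis, then poll the operation URL until it finishes."""
    request = build_analyze_request(endpoint, key, document_url, **options)
    with urllib.request.urlopen(request) as response:
        # The service answers asynchronously and returns a URL to poll.
        operation_url = response.headers["Operation-Location"]
    poll = urllib.request.Request(
        operation_url, headers={"Ocp-Apim-Subscription-Key": key}
    )
    while True:
        with urllib.request.urlopen(poll) as response:
            payload = json.load(response)
        if payload["status"] == "succeeded":
            return payload["analyzeResult"]
        if payload["status"] == "failed":
            raise RuntimeError(f"Analysis failed: {payload.get('error')}")
        time.sleep(poll_seconds)
```

A call could look like `analyze(endpoint, key, "https://example.com/report.pdf", pages="1")`, where the endpoint and key are the values shown for your Document Intelligence resource in the Azure portal. For production use, the official SDKs take care of polling and error handling for you.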

Alternatives: Other text recognition solutions

There are various other software solutions on the market that offer functionality comparable to Azure AI Document Intelligence, particularly in the areas of document analysis and processing with artificial intelligence and machine learning. These include:

Standalone (“on-premises”) solutions:

  • Adobe Acrobat Pro DC: Provides advanced PDF editing capabilities, including text recognition and conversion, document comparison and easy integration with other services.
  • OmniPage by Kofax: A powerful OCR tool used for document conversion and digitization that provides high accuracy in text recognition.
  • ABBYY FineReader: An OCR and PDF software solution that enables scanned documents and PDFs to be converted into editable and searchable formats.
  • Readiris: OCR software that enables text recognition in scanned documents, PDFs and images and saves the converted files in various formats.
  • ScanSoft PaperPort: Provides document management and digitization capabilities and allows digital documents to be organized and shared.

Cloud solutions:

  • Google Cloud Vision API: This solution from Google provides advanced image analysis capabilities and can recognize and extract text in documents, similar to Azure AI Document Intelligence.
  • Amazon Textract: A service from Amazon Web Services that makes it possible to automatically extract, process and analyze text and data from documents.
  • IBM Watson Discovery: This tool from IBM uses AI to understand and analyze complex data and gain valuable insights from it. It can also be used to process documents.
  • ABBYY FlexiCapture: An advanced data capture and document processing solution that uses machine learning to analyze documents and extract information. (Cloud and standalone available)
  • Kofax Capture: Provides automated capture, processing and integration of documents and data into business processes and systems. (Cloud and standalone possible)
  • Ephesoft Transact: A platform for intelligent document processing that uses machine learning and AI to extract and classify data from documents. (Cloud and standalone possible)