How to Use OCR in UiPath

Vashisht Devasani
May 3, 2021

OCR activities are among the most commonly used activities in UiPath for extracting content from websites, images, scanned PDFs, handwritten text, and similar sources.

Extracting information or data from images, scanned documents, or PDFs is a very tedious job. Standard text-extraction activities are not recommended for these inputs, because the text exists only as pixels rather than as readable characters. OCR takes a different approach and recognizes the characters from the image itself.

What is OCR?

OCR, also known as Optical Character Recognition, is a technology that converts various types of documents, such as scanned paper documents, PDF files, and images captured by a digital camera, into editable data. OCR software singles out the letters in an image, puts them into words, and then forms sentences from those words. This makes the content of the original document easy to access and edit. Combined with full-text search and document management capabilities, OCR for scanned documents is also a building block of enterprise content management solutions.
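The "letters into words" step described above can be illustrated with a small sketch. It is not how any particular OCR engine is implemented; the `CharBox` type, the `group_into_words` function, and the `max_gap` threshold are all hypothetical names chosen for this example. It assumes the characters have already been recognized and only shows how horizontal gaps could be used to join them into words.

```python
from dataclasses import dataclass


@dataclass
class CharBox:
    char: str   # the recognized character
    x: int      # left edge in pixels
    width: int  # width in pixels
    line: int   # index of the text line the character sits on


def group_into_words(boxes, max_gap=4):
    """Join recognized characters into words, one string per text line."""
    lines = {}
    for box in sorted(boxes, key=lambda b: (b.line, b.x)):
        words = lines.setdefault(box.line, [])
        last = words[-1] if words else None
        # Start a new word for the first character on a line, or when the
        # gap to the previous character is wider than max_gap pixels.
        if last is None or box.x - last["right"] > max_gap:
            words.append({"text": box.char, "right": box.x + box.width})
        else:
            last["text"] += box.char
            last["right"] = box.x + box.width
    return [" ".join(w["text"] for w in lines[i]) for i in sorted(lines)]


boxes = [
    CharBox("H", 0, 8, 0), CharBox("i", 9, 4, 0),
    CharBox("a", 25, 8, 0), CharBox("l", 34, 4, 0), CharBox("l", 39, 4, 0),
]
print(group_into_words(boxes))
```

Real engines use far more sophisticated layout analysis, but the idea of turning positioned characters into words and lines is the same.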

TYPES OF OCR:

Two main OCR engines are available in UiPath Studio:

1. Microsoft OCR

2. Google OCR

These OCR engines are available as individual activities and are also used internally by the screen scraping tool. You can select whichever engine suits your purpose. Each is discussed in detail later in this post.

Microsoft’s OCR is known as MODI, and Google’s OCR is called Tesseract. UiPath is not limited to these two engines: you are free to use others, and several more are available, including third-party activities.

Fig. OCR engines in UiPath

MICROSOFT OCR:

PROPERTIES:

Input:

This property accepts only Image variables: the image on which an OCR activity such as Get OCR Text is performed.

Options:

Extract Words: If this check box is selected, the on-screen position of each detected word is extracted.

Language: Specifies the language of the text in the image, which improves extraction. Give the full name of the language, for example “English”.

Profile: The pre-processing profile to apply, depending on what kind of image is being read. There are four options:

  • None: Does not apply a pre-processing profile.
  • Screen: Pre-processing suitable for remote desktop applications.
  • Scan: Pre-processing suitable for scanned files.
  • Legacy: Uses the engine’s default settings for pre-processing images. This is the default option.

Scale: The scaling factor of the selected UI element or image. The higher the number, the more the image is enlarged. A larger scale can improve OCR accuracy and is recommended for small images.
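To make the effect of a scaling factor concrete, here is a minimal sketch of enlarging an image before recognition. It is an illustration only: the `scale_image` function is hypothetical, it assumes a grayscale image stored as a list of rows, and it handles only integer factors with nearest-neighbor sampling, which is not necessarily what the OCR engines do internally.

```python
def scale_image(pixels, factor):
    """Enlarge a grayscale image (a list of rows) by an integer factor.

    Each pixel is repeated `factor` times horizontally and each row is
    repeated `factor` times vertically (nearest-neighbor sampling).
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    scaled = []
    for row in pixels:
        wide_row = [value for value in row for _ in range(factor)]
        # Copy the widened row so that the output rows are independent lists.
        scaled.extend(list(wide_row) for _ in range(factor))
    return scaled


tiny = [[0, 255]]
print(scale_image(tiny, 2))
```

A larger image gives the engine more pixels per character, which is why a higher scale tends to help with small text.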

Output:

Text: The extracted string. This field supports only String variables.

Result: The extracted words along with their on-screen position. This field supports only KeyValuePair<Rectangle, String> variables.
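The shape of the Result output, words paired with their on-screen rectangles, can be sketched as follows. This is a Python illustration of the idea rather than UiPath code: the `Rectangle` type and the `find_word` and `center` helpers are hypothetical names, and the sample data is made up for the example.

```python
from typing import NamedTuple


class Rectangle(NamedTuple):
    x: int       # left edge on screen
    y: int       # top edge on screen
    width: int
    height: int


def find_word(result, target):
    """Return the rectangle of the first word matching target, or None."""
    for rect, word in result:
        if word.lower() == target.lower():
            return rect
    return None


def center(rect):
    """Return the center point of a rectangle, e.g. as a click target."""
    return (rect.x + rect.width // 2, rect.y + rect.height // 2)


# Each entry pairs a rectangle with the word found inside it.
result = [
    (Rectangle(10, 20, 40, 12), "Invoice"),
    (Rectangle(60, 20, 30, 12), "Total"),
]
total_box = find_word(result, "total")
print(total_box, center(total_box))
```

Having the position of each word is what makes activities such as Click OCR Text possible: the word is located first, and the click is then sent to its rectangle.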

Tips:

  • Multiple languages are supported by default.
  • It is suitable for extracting text from a large area and works well when the scale is increased.

Google OCR:

Google’s OCR is called Tesseract.

The properties of the Tesseract OCR engine are the same as those of the Microsoft OCR engine, with a few additional options.

Options:

Allowed Characters: The OCR engine restricts the extracted string to the characters specified here.

Denied Characters: The OCR engine leaves the characters specified here out of the extracted string.

Invert: If this check box is selected, the colors of the UI elements are inverted before scraping. This is useful when the background is darker than the text color.

These options are available for the Tesseract OCR engine but not for the Microsoft OCR engine.
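The effect of these three options can be approximated with a short sketch. It is a simplification under stated assumptions: a real engine applies character restrictions during recognition, while this example filters the text afterwards, and the `filter_characters` and `invert` functions are hypothetical names. The image is assumed to be grayscale with values from 0 to 255.

```python
def filter_characters(text, allowed=None, denied=""):
    """Keep only allowed characters (if given) and drop denied ones."""
    kept = []
    for ch in text:
        if ch in denied:
            continue
        if allowed is not None and ch not in allowed:
            continue
        kept.append(ch)
    return "".join(kept)


def invert(pixels):
    """Invert a grayscale image so light text on a dark background
    becomes dark text on a light background."""
    return [[255 - value for value in row] for row in pixels]


print(filter_characters("INV-2021/05", allowed="0123456789"))
print(filter_characters("a|b|c", denied="|"))
print(invert([[0, 255], [100, 155]]))
```

Restricting the character set is useful for fields with a known format, such as amounts or ID numbers, where letters in the output can only be misreads.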

Tips:

  • Multiple language support can be added in Google OCR.
  • It is suitable for extracting the text from a small area.
  • It has full support for color inversion.
  • It can restrict the output to allowed characters only.

Microsoft Azure Computer Vision OCR:

This activity uses the Microsoft Azure Computer Vision OCR engine to extract a string and its information from an image.

The engine can extract text even from non-classified images, that is, images that contain handwritten text, graphs, pictures, and so on.

Logon:

API Key: The API key that gives you access to the Microsoft Azure Computer Vision OCR. You need an Azure account to use the Computer Vision features.

Endpoint: The endpoint associated with your Microsoft Azure Computer Vision OCR API key. This field supports only strings and String variables.
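To show how the API key and endpoint fit together, here is a hedged sketch of preparing a direct REST request to the Computer Vision Read operation. It is not what the UiPath activity does internally. The `build_read_request` function is a hypothetical helper, the `v3.2` path is an assumption about the service version in use, and the endpoint and key in the example are placeholders. The sketch only builds the URL and headers and does not send anything.

```python
def build_read_request(endpoint, api_key):
    """Return the URL and headers for a Computer Vision Read call.

    Assumes version 3.2 of the REST API; adjust the path if your
    resource uses a different version.
    """
    url = endpoint.rstrip("/") + "/vision/v3.2/read/analyze"
    headers = {
        # The subscription key authenticates the request.
        "Ocp-Apim-Subscription-Key": api_key,
        # The image itself would be sent as raw bytes in the body.
        "Content-Type": "application/octet-stream",
    }
    return url, headers


# Placeholder values for illustration only.
url, headers = build_read_request(
    "https://example.cognitiveservices.azure.com/", "YOUR_API_KEY"
)
print(url)
```

In UiPath you only supply the two values in the Logon properties, and the activity takes care of the request.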

Options:

Handwriting Recognition: This is a Boolean check box. If this is checked, then the OCR engine will extract the handwritten text in the image. If unchecked, it will ignore the handwritten text.

Tips:

  • It works perfectly for the classified images without any issues.
  • It even works decently if the image is non-classified.
  • I used it to extract scanned handwritten text, and the results were accurate.
  • The Computer Vision features require an Azure account. Once you have one, the API key and endpoint are easy to get.

Microsoft Project Oxford Online OCR:

It extracts a string and its information from an indicated UI element or image using the MODI Microsoft Cloud OCR engine. It can be used with other OCR activities, such as Click OCR Text, Hover OCR Text, and Get OCR Text.

Logon:

API Key: The API key used to provide you access to the Microsoft Cloud OCR.

This engine connects to the Microsoft Cloud to perform the extraction. It helps to extract both the text and its position more precisely.

Google Cloud Vision OCR:

It extracts a string and its information from an indicated UI element or image using the Google Cloud OCR engine.

It is a cloud-based engine and gives faster and more precise results than the Tesseract OCR engine.

Options:

ResizeToMaxLimitIfNecessary: When selected, the engine attempts to downsize the target image so that it does not exceed the size limit of the Google Cloud Vision engine. By default, this check box is cleared.
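The downsizing idea behind this option can be sketched as a simple calculation. The example assumes the limit is expressed as a maximum width and height in pixels, which may not match how the actual limit is defined; the `fit_within` function and the limit values are hypothetical.

```python
def fit_within(width, height, max_width, max_height):
    """Return new dimensions that fit inside the limits.

    The aspect ratio is preserved, and images that already fit are
    returned unchanged (the ratio is capped at 1.0, so nothing is enlarged).
    """
    ratio = min(max_width / width, max_height / height, 1.0)
    return max(1, int(width * ratio)), max(1, int(height * ratio))


print(fit_within(4000, 2000, 2000, 2000))
print(fit_within(800, 600, 2000, 2000))
```

Keeping the aspect ratio matters for OCR, because stretching an image in one direction distorts the characters.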

It works in the same way as the Microsoft Cloud OCR, performs better on smaller images, and is comparatively faster than the Microsoft OCR.

ABBYY OCR:

This is a third-party OCR engine, well known for extracting text more accurately and faster than the other available engines. It also offers many options for handling different kinds of documents.

Options:

Correct Orientation: If selected, the page orientation is detected by the engine, and if needed, is corrected automatically. By default, this check box is selected.

Correct Skew: Detects whether the page is skewed and automatically corrects it. The drop-down contains three options:

  • Auto: Deskews only images that are detected as being skewed.
  • Yes: Forces deskew on all pages.
  • No: Does not automatically deskew pages.

By default, this property is set to Auto.
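The three Correct Skew settings can be read as a small decision rule, sketched below. The `should_deskew` function, the `detected_angle` input, and the `tolerance` threshold are hypothetical. How the engine actually detects skew is not covered here, so the sketch simply assumes an angle in degrees has already been measured.

```python
def should_deskew(mode, detected_angle, tolerance=0.5):
    """Decide whether a page should be deskewed.

    mode is "Auto", "Yes", or "No"; detected_angle is the measured skew
    in degrees; tolerance is the angle below which a page is treated as
    straight (an assumed value).
    """
    if mode == "Yes":
        return True
    if mode == "No":
        return False
    if mode == "Auto":
        return abs(detected_angle) > tolerance
    raise ValueError(f"unknown mode: {mode!r}")


for mode, angle in [("Auto", 0.2), ("Auto", -3.0), ("Yes", 0.0), ("No", 10.0)]:
    print(mode, angle, should_deskew(mode, angle))
```

Auto is a sensible default because it avoids reprocessing pages that are already straight.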

Custom Recognition Profile Path: The full path to a custom-built Recognition Profile. This field supports only strings and String variables.

FineReader Version: Specifies which version of the FineReader Engine is to be used. The options are FineReader Engine 11 and FineReader Engine 12. By default, this property is set to FineReader Engine 12.

Predefined Recognition Profile: Specifies the Predefined Recognition Profile that is to be used when processing an image. This field supports only strings and String variables. The Predefined Recognition Profiles available in ABBYY are listed at this link.

Output:

Confidence: The resulting confidence score. This field supports only Int32 variables.
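One common use of a confidence score is to decide whether a result can be trusted or should be checked by a person. The sketch below shows that idea only. It assumes the score is an integer from 0 to 100, which the text above does not state, and the `route_by_confidence` function and the threshold of 80 are hypothetical.

```python
def route_by_confidence(confidence, threshold=80):
    """Return "accept" for confident results and "review" otherwise.

    Assumes confidence is an integer percentage from 0 to 100.
    """
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    return "accept" if confidence >= threshold else "review"


for score in (95, 80, 42):
    print(score, route_by_confidence(score))
```

In a workflow, the "review" branch would typically send the document to a human for validation instead of passing the extracted text on automatically.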

The remaining properties are similar to those of the other OCR engines available in UiPath.

Advantages:

  • This OCR helps in giving accurate and fast results.
  • It can convert TIFF and JPEG files into searchable PDF and PDF/A, and extract data or text from photos and screenshots.
  • It can support multiple languages effectively and accurately.

NOTE:

  • ABBYY FineReader Engine SDK is required.
  • The engine works only with a license distributed by the UiPath sales department.

ABBYY Cloud OCR:

This OCR engine is available only with a subscription to the ABBYY Cloud, which gives access to the features of the ABBYY Cloud platform.

Logon:

ApplicationID: The application ID provided when subscribing to the ABBYY Cloud OCR service.

Password: The password provided when subscribing to the ABBYY Cloud OCR service.

ServerUrl: The Server URL provided when subscribing to the ABBYY Cloud OCR service.

This OCR engine gives better results and offers many options and features for processing different types of documents.

CONCLUSION:

  • Among all the OCR engines, the cloud OCR engines produce the most accurate results.
  • These OCR engines can also be used with other OCR activities (Click OCR Text, Hover OCR Text, Double Click OCR Text, Get OCR Text, Find OCR Text Position).
  • OCR is also used in the recording wizards, such as Screen Scraping and Citrix.
  • Overall, the best OCR engines, with many options and fast, accurate results, are the ABBYY OCR engine and the Microsoft Azure Computer Vision OCR engine.

Originally posted at https://sedintechnologies.com/ocr-conversion-in-uipath/
