Extract Document Text Block

⚠ Experimental Feature

This feature is Experimental and may change based on user feedback and testing. Share your thoughts via our chatbot to help us improve it.

The Extract Document Text Block in Leapwork enables users to extract text from images and scanned PDFs using Optical Character Recognition (OCR) technology. This functionality is essential for automating validation processes and ensuring that text-based information is correctly processed within workflows.

Example: A user uploads a scanned contract in PDF format and extracts the text automatically for validation in an approval workflow.

Note: This feature is available starting from Release 2025.1.XXX.

When fully expanded, the Extract Document Text block displays the following properties:

The Block Header ("Extract Document Text")

The green input connector in the header is used to trigger the block to start executing.

The green output connector in the header triggers when the text has been successfully extracted from the file.

The title of the block “Extract Document Text” can be changed by double-clicking on it and typing in a new title.

Users must select a supported file type when importing a document.

Source Type

Once you drag the file into the block, the block automatically recognizes the file type. You can choose either of the following options as the source type:

Data File: The uploaded file is saved inside Leapwork.
Local Path: The file is referenced from a specified path.

Select the file to extract the text

This field allows users to upload a file. Selecting "Import New File" opens a window for uploading the document.

Extracted Text

This parameter returns the text recognized within the image or document. If the PDF contains multiple pages, the text is extracted from all pages and combined in order.

Example:
A scanned PDF containing the text "Contract Agreement 2024" will return:

Extracted Text: Contract Agreement 2024
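
OCR output often contains stray line breaks or irregular spacing, so a validation step tends to be more reliable when it compares normalized text. The Python sketch below is not part of Leapwork; it only illustrates one way a downstream check might compare the Extracted Text value against an expected phrase. The function names normalize and contains_phrase, and the sample string, are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace and ignore case so OCR spacing quirks do not matter."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def contains_phrase(extracted_text: str, expected: str) -> bool:
    """Return True if the expected phrase occurs in the extracted text after normalization."""
    return normalize(expected) in normalize(extracted_text)


# Assumed sample: the same title with extra spaces and a line break, as OCR might return it.
sample = "Contract   Agreement\n2024"
print(contains_phrase(sample, "Contract Agreement 2024"))  # prints True
```

A substring match after normalization is deliberately lenient about layout; it does not correct misread characters, so a stricter or fuzzier comparison may be needed for low-quality scans.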

Failed

If the Extract Document Text Block fails to extract text, the Failed connector is triggered. Possible reasons for failure:

Unreadable or low-quality scanned text.
Unsupported file format.
Password-protected or encrypted PDF.
A corrupted PDF file.

When triggered, this connector can be used to handle failure scenarios within the automation flow.
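
Some of the failure reasons listed above can be spotted before a PDF is handed to the block. The Python sketch below is an illustration only and is not a Leapwork feature: the function name preflight_pdf is hypothetical, the checks apply to PDF files only, and the search for an /Encrypt entry is a rough heuristic that can report false positives. It does not assess scan quality.

```python
from pathlib import Path


def preflight_pdf(path: str) -> list[str]:
    """Return a list of likely problems; an empty list means none were spotted."""
    p = Path(path)
    if not p.is_file():
        return ["file not found"]

    problems = []
    data = p.read_bytes()

    # A PDF file normally begins with the "%PDF-" marker.
    if not data.startswith(b"%PDF-"):
        problems.append("missing PDF header (corrupted or not a PDF)")

    # Encrypted PDFs reference an /Encrypt dictionary; this byte search is only a heuristic.
    if b"/Encrypt" in data:
        problems.append("possibly encrypted or password-protected")

    return problems
```

A flow could run a check like this first and route files with reported problems to manual handling, leaving the Failed connector for cases the check cannot detect.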

Default Timeout

If the Default Timeout checkbox is not selected, the timeout value is 10 seconds. If the checkbox is selected, the Default Timeout value configured in the flow settings applies.
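
The rule above can be summarized as a small decision function. This Python sketch only restates the rule as described in this article; it is not Leapwork code, and the names effective_timeout, use_default, and flow_default_seconds are hypothetical.

```python
# Value stated in this article for a block whose Default Timeout checkbox is not selected.
BLOCK_TIMEOUT_SECONDS = 10.0


def effective_timeout(use_default: bool, flow_default_seconds: float) -> float:
    """Return the timeout (in seconds) that applies to the block."""
    if use_default:
        # Checkbox selected: the flow-level Default Timeout applies.
        return flow_default_seconds
    # Checkbox not selected: the block's own 10-second value applies.
    return BLOCK_TIMEOUT_SECONDS


print(effective_timeout(use_default=False, flow_default_seconds=30.0))  # prints 10.0
print(effective_timeout(use_default=True, flow_default_seconds=30.0))   # prints 30.0
```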

Timeout

The maximum time spent extracting text from the file before giving up and triggering the Failed connector.

Note: All cases have a global timeout that can be configured in the Settings panel. This is unrelated to the timeout of a single building block. However, a running case will automatically be cancelled if it runs for longer than the global timeout.

Created 2025.02.12
