Get Document Information Block

⚠ Experimental Feature

This feature is Experimental and may change based on user feedback and testing. Share your thoughts via our chatbot to help us improve it.

The Get Document Information block in Leapwork extracts key metadata from a document: author, page count, creation date, modification date, and title. These values can be used to automate validation, for example to check that a document meets specific criteria before it moves further in a flow.

Example: A PDF file containing a contract can be analyzed to extract the author, page count, and creation date for validation in an approval workflow.
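The kind of check such an approval workflow might apply can be sketched in Python. This is only an illustration: the `DocumentInfo` class, its field names, and the rules in `validate_contract` are hypothetical and are not part of Leapwork. In a flow, the values would come from the block's output properties.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class DocumentInfo:
    """Hypothetical container mirroring some of the block's output properties."""
    author: Optional[str]
    page_count: int
    date_created: str  # kept as text here; not checked in this sketch


def validate_contract(info: DocumentInfo, allowed_authors: Set[str], max_pages: int) -> List[str]:
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    if not info.author:
        problems.append("author is missing")
    elif info.author not in allowed_authors:
        problems.append(f"unexpected author: {info.author}")
    if info.page_count > max_pages:
        problems.append(f"too many pages: {info.page_count} > {max_pages}")
    return problems


info = DocumentInfo(author="John Doe", page_count=5, date_created="2024-01-15T10:30:00Z")
print(validate_contract(info, allowed_authors={"John Doe"}, max_pages=10))
```

Returning a list of problems rather than a single pass/fail flag makes it easier to report every failed criterion at once.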

Note: This feature is available starting from Release 2025.1.XXX.

When fully expanded, the Get Document Information block displays the following properties:

The Block Header ("Get Document Information")

The green input connector in the header is used to trigger the block to start executing.

The green output connector in the header triggers when the document information has been successfully retrieved.

The title of the block “Get Document Information” can be changed by double-clicking on it and typing in a new title.

File Type

The Get Document Information block supports the following file types:

  • PDF
  • Word Document

Users must select a supported file type when importing a document.

Source Type

Once you drag a file into the block, the block automatically recognizes the file type. You can choose either of the following options as the source type:

  • Data File: The file is uploaded and stored inside Leapwork.
  • Local Path: The file is referenced from a specified path.

Select pdf file to upload

This field is used to upload a file. Selecting "Import New File" opens a window for uploading the document.

Author

This parameter returns the author of the document, if available. The author is typically stored in the document metadata and can be used to validate document ownership or source.

Example: A PDF file created by "John Doe" will return:

Author: John Doe

Page Count

The page count indicates the total number of pages in the document. This is useful for validation when a document must meet specific length requirements.

Example: A Word document with 5 pages will return:

Page Count: 5

Date Created

This parameter retrieves the date and time when the document was originally created. It can be used to verify document versioning or compliance with time-sensitive requirements.

Example: A document created on January 15, 2024 will return:

Date Created: 2024-01-15T10:30:00Z

Date Modified

The last modified date reflects the most recent time the document was edited. This is useful for tracking document changes over time.

Example: If a document was last modified on February 10, 2024, it will return:

Date Modified: 2024-02-10T14:45:00Z
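If the date values are processed outside the block, they can be parsed and compared as shown in the following Python sketch. It assumes the values arrive as text in the format shown in the examples above; `parse_timestamp` and the 30-day rule are hypothetical and only serve as an illustration.

```python
from datetime import datetime, timedelta


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2024-01-15T10:30:00Z"."""
    # Older Python versions do not accept a trailing "Z", so map it to "+00:00".
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


created = parse_timestamp("2024-01-15T10:30:00Z")
modified = parse_timestamp("2024-02-10T14:45:00Z")

# Time elapsed between creation and the last edit.
elapsed = modified - created
print(elapsed)

# Hypothetical rule: the last edit must fall within 30 days of creation.
print(elapsed <= timedelta(days=30))
```

Parsing the text into timezone-aware `datetime` objects avoids comparing timestamps as plain strings, which breaks as soon as two values use different offsets.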

Document Title

If a document has a title set in its metadata, this parameter will return it. The title is often used in structured documents, such as reports or legal files.

Example: A PDF document with the title Annual Report 2024 will return:

Document Title: Annual Report 2024
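For orientation, the properties above correspond to standard metadata fields stored in the document itself. The sketch below reads them from a Word .docx file using only the Python standard library. It is an independent illustration, not Leapwork's implementation, and `read_docx_metadata` is a hypothetical helper name. It assumes the file follows the Office Open XML packaging layout, where `docProps/core.xml` holds the author, title, and dates, and `docProps/app.xml` holds the page count last saved by the editing application.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional

# XML namespaces used by the Office Open XML property parts.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}


def _child_text(xml_bytes: Optional[bytes], path: str) -> Optional[str]:
    """Return the text of the first matching child element, or None."""
    if xml_bytes is None:
        return None
    node = ET.fromstring(xml_bytes).find(path, NS)
    return node.text if node is not None else None


def read_docx_metadata(source) -> dict:
    """Read basic metadata from a .docx file (a path or a binary file object)."""
    with zipfile.ZipFile(source) as archive:
        names = set(archive.namelist())
        core = archive.read("docProps/core.xml") if "docProps/core.xml" in names else None
        app = archive.read("docProps/app.xml") if "docProps/app.xml" in names else None

    pages = _child_text(app, "ep:Pages")
    return {
        "author": _child_text(core, "dc:creator"),
        "title": _child_text(core, "dc:title"),
        "date_created": _child_text(core, "dcterms:created"),
        "date_modified": _child_text(core, "dcterms:modified"),
        "page_count": int(pages) if pages else None,
    }
```

Any field that is not present in the file comes back as `None`, which matches the "if available" behavior described for properties such as Author and Document Title.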

Failed

This connector is triggered if the Get Document Information block fails to retrieve metadata from the document. A failure can occur for reasons such as:

  • The document is inaccessible or corrupted.
  • A network issue prevents communication with the document processing service.
  • The processing time exceeds the allowed timeout.

When triggered, this connector can be used to handle failure scenarios within the automation flow.

Default Timeout

If the Default Timeout checkbox is not selected, the timeout value is 10 seconds. If the checkbox is selected, the Default Timeout value set in the flow settings applies instead.
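The rule above can be summarized in a short Python sketch. The function name, parameter names, and the 30-second flow default used in the usage lines are hypothetical; only the 10-second value comes from the description above.

```python
# Value the description gives when the Default Timeout checkbox is not selected.
BLOCK_TIMEOUT_SECONDS = 10.0


def effective_timeout(use_default_timeout: bool, flow_default_seconds: float) -> float:
    """Return the timeout, in seconds, that applies to the block."""
    if use_default_timeout:
        # Checkbox selected: the flow-level Default Timeout applies.
        return flow_default_seconds
    # Checkbox not selected: the block's own 10-second value applies.
    return BLOCK_TIMEOUT_SECONDS


print(effective_timeout(False, flow_default_seconds=30.0))
print(effective_timeout(True, flow_default_seconds=30.0))
```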

Timeout

The maximum time spent retrieving the document information before giving up and triggering the Failed connector.

Note: Every case has a global timeout that can be configured in the Settings panel. It is independent of the timeout of an individual building block, but a running case is automatically cancelled if it runs longer than the global timeout.

Created 2025.02.12