Efficient data extraction with AI

Challenge
Our client needed to extract accurate, detailed data from gas pipeline PDF statements that arrive in a variety of formats. The key data items, such as statement dates, meter IDs, and daily readings, had to be pulled from tabular data. The existing manual process worked but was time-consuming and error-prone, captured only monthly totals, and did not give the client granular or timely access to the data.

OUTCOMES

The value of the project extended beyond a proof of concept: the refined workflow was incorporated into the client’s production pipeline. Completed within a 3-week engagement, this initiative highlighted Dialexa’s proficiency in delivering rapid, impactful results with AI technology.

100% Accuracy: Data Extraction

This level of accuracy demonstrated the solution’s reliability on complex data extraction tasks. The results allowed us to move the proof of concept (POC) into production and deliver immediate impact for our client.

3 Weeks: Time to Value

Dialexa delivered a complete solution, from proof of concept to production deployment, in just three weeks. Turning around a sophisticated AI-based tool this quickly delivered impactful results to our client and exceeded the original plan for the engagement.

90% Reduction: File Processing Time

This solution reduced file processing time by 90% compared with manual extraction and removed the errors inherent in that process. The process is now efficient enough that the client can extract daily readings along with the monthly reading, which gives them insight into day-to-day operations and anomalies.

WORK

Manual process turned efficient and accurate with AI

To solve this, Dialexa built a solution that combines Optical Character Recognition (OCR) technology with a Large Language Model (LLM). A customized prompt instructs the LLM to identify and extract the relevant data and return it as structured JSON. This approach proved highly effective, delivering 100% accuracy across five of the six tested document formats. To cover all formats, preprocessing scripts toggle between the OCR technology and a third-party parsing solution as required before the data is sent to the LLM, which enables precise processing of every existing format.
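
The workflow described above can be sketched roughly as follows. This is a minimal illustration, not Dialexa's actual implementation: the format name `format_f`, the prompt wording, the JSON key names, and the `ocr`, `parser`, and `llm` callables are all hypothetical placeholders standing in for the OCR technology, the third-party parsing solution, and the LLM call.

```python
import json
from typing import Callable

# Assumption: the statement formats that are routed to the third-party
# parser instead of OCR. The name "format_f" is purely illustrative.
PARSER_FORMATS = {"format_f"}

# Assumption: illustrative prompt wording and key names, not the real prompt.
PROMPT_TEMPLATE = (
    "Extract the statement date, meter ID, and daily readings from the "
    "gas pipeline statement text below. Respond with JSON only, using the "
    "keys statement_date, meter_id, and daily_readings (a list of objects "
    "with the keys date and volume).\n\n{text}"
)

REQUIRED_KEYS = {"statement_date", "meter_id", "daily_readings"}


def choose_extractor(format_name, ocr, parser):
    """Preprocessing toggle: pick the text extractor for a statement format."""
    return parser if format_name in PARSER_FORMATS else ocr


def extract_statement(
    pdf_bytes: bytes,
    format_name: str,
    ocr: Callable[[bytes], str],
    parser: Callable[[bytes], str],
    llm: Callable[[str], str],
) -> dict:
    """Convert a PDF to text, prompt the LLM, and return the parsed JSON record."""
    to_text = choose_extractor(format_name, ocr, parser)
    text = to_text(pdf_bytes)

    # The LLM is expected to answer with a JSON document only.
    raw_response = llm(PROMPT_TEMPLATE.format(text=text))
    record = json.loads(raw_response)

    # Reject responses that are missing any of the expected fields.
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"LLM response is missing keys: {sorted(missing)}")
    return record
```

Passing the extractors and the LLM in as plain callables keeps the sketch independent of any particular OCR, parsing, or model provider, and makes the routing and validation steps easy to exercise with stand-in functions.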

Client

Climate-solutions-as-a-service firm

Roles
  • Engagement Manager
  • Engineering Lead
  • Product Owner

THINKING
  • Collaborative Quality
  • Product Engineering
  • Generative AI