
Will Generative AI Kill Data Transformation?

Generative AI
26-Aug-24

Enterprises invest millions of dollars in multi-year data transformations with the best intentions: to centralize and organize their data so disparate systems can communicate, eliminate duplication, comply with regulations, and importantly, uncover insights. Yet standardizing and integrating this data is a challenging, time-consuming task, compounded by the significant investment in reporting tools like Power BI, which involve multiple steps just to make the data usable in the first place. Suffice it to say, data transformations are An Ordeal.

As with many other processes, however, the advent of vector databases and generative AI is revolutionizing data transformation. These technologies provide a shortcut from data to insights, bypassing the need for extensive transformations. No Ordeal.

It’s time to replace your OCR with AI.

Consider a typical scenario faced by professionals in the government, tax, healthcare, and energy sectors: compliance regulations require your agency or company to digitize all records. This often means scanning millions of documents. While these scans technically digitize the records, the data within them remains unusable. Scanned documents are essentially image files, necessitating a text extraction process to retrieve the embedded data. 

The challenge lies in the inconsistency of these images. Tax bills, license plates, and driver’s licenses all vary by location and classification. (In the US and its territories alone, there are 720 different driver’s license formats.) To address this, humans create multiple templates, manually drawing rectangles on parts of documents to indicate where specific information can be found. Even after extraction, the data must undergo post-processing to convert it into a format suitable for databases or spreadsheets.

How LLMs provide a shortcut for data extraction

A Large Language Model (LLM) can eliminate the entire cumbersome process between input (scanned document) and output (spreadsheet line item). Provide an LLM with the same scan and request an SQL statement to insert the data into a table. Despite the 720 different styles of driver’s licenses, the LLM can accurately identify information such as names, heights, and eye colors, and export the data correctly.
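
Here’s what that shortcut can look like in practice: a minimal sketch using the OpenAI Python SDK. The model name, prompt wording, and the `licenses` table schema are our assumptions for illustration, not a fixed recipe.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def license_scan_to_sql(image_path: str) -> str:
    """Ask a vision-capable model to turn a license scan into an INSERT statement."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model works; this choice is an assumption
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": (
                    "Extract the name, height, and eye color from this scan. "
                    "Respond with only a SQL statement of the form: "
                    "INSERT INTO licenses (name, height, eye_color) VALUES (...);"
                )},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(license_scan_to_sql("scans/license_0042.png"))  # hypothetical file path
```

In production you’d validate the generated SQL (or ask for structured JSON and build a parameterized query yourself) before executing anything a model returns.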


Spend almost no time refactoring your database.

As enterprises scale and new projects crop up, they wind up with a “data junk drawer”: an unorganized and unconnected collection of data with unknown value.

To make the data usable, enterprises then sign multi-year contracts with data warehousing companies. Ten to thirty engineers work to refactor the whole database without interrupting their systems, and almost everyone is required to stop what they’re doing and assist with the migration. Engineers translate the data to plain text to make it searchable by keyword, and Marie Kondo it as they go. (“Is it compliant? Duplicative?”)

Even the most well-run businesses in the world have enormous technical debt and data lying around that they shouldn’t have, and they build organizations to deal with the risk and fallout, knowing it will happen. Again, what an Ordeal!

LLMs don’t mind a disorganized “data junk drawer”.

AI flips this issue on its ear. Large language models don’t care how your data junk drawer is formatted. Instead of keywords, they rely on embeddings (vectorized data), which capture not merely a word but that word’s relationship to all the others around it. Vector databases allow us to do likeness searches. Fed your SaaS tools, widgets, and various databases in different formats, an LLM can help you find the correct data and identify what needs to be purged.
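
To make “likeness search” concrete, here is a minimal sketch assuming OpenAI’s embeddings endpoint; the junk-drawer records and the query are invented for illustration.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Vectorize text; the embedding model choice is an assumption."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# A tiny "data junk drawer": rows pulled from different systems, in different formats.
records = [
    "cust_id=884 | churn_risk: HIGH | last_order 2023-11-02",
    "Invoice #2291, apparent duplicate of #2287, flagged by finance",
    "Employee wellness survey results, Q2, anonymized",
]

doc_vecs = embed(records)
query_vec = embed(["records that look duplicative or non-compliant"])[0]

# Cosine similarity ranks by likeness, not keyword overlap.
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
for score, rec in sorted(zip(scores, records), reverse=True):
    print(f"{score:.3f}  {rec}")
```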


Real-world client example

When national fast food franchise owners experienced technical issues (e.g., an error screen on the back-office computer, an oven acting up), they called corporate for support. Corporate call center employees then conducted a keyword search within their support documents to find a solution. The trouble was that a keyword search might return 100 different documents, most of them irrelevant.

We created an LLM-based bot for call center employees and fed it their call support documents. We vectorized the data so that when a user typed in an L3 or L4 error issue, the LLM would surface the relevant documentation and pose follow-up questions to help diagnose the issue and find the appropriate solution. This significantly reduced the time it took employees to fix issues.
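
The core of that bot is a retrieve-then-answer loop. A minimal sketch follows, again assuming OpenAI’s APIs; the support documents and error codes shown are invented stand-ins for the client’s real corpus.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# Stand-ins for the vectorized call support documents.
support_docs = [
    "L3-ERR-07: oven temperature sensor fault. Power-cycle, then recalibrate.",
    "L4-ERR-12: back-office terminal stuck on the POS sync screen. Restart the sync agent.",
    "Fryer maintenance schedule and filter replacement steps.",
]
doc_vecs = embed(support_docs)

def answer(ticket: str, top_k: int = 2) -> str:
    """Retrieve the most relevant docs, then have the model answer and probe further."""
    q = embed([ticket])[0]
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    context = "\n".join(support_docs[i] for i in np.argsort(sims)[::-1][:top_k])
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # an assumption
        messages=[
            {"role": "system", "content":
                "Answer using only the support docs below, then ask one "
                "follow-up question to narrow the diagnosis.\n\n" + context},
            {"role": "user", "content": ticket},
        ],
    )
    return resp.choices[0].message.content

print(answer("Franchisee reports an L3 error on the oven panel."))
```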


With AI, you can discover more value in your data as it sits today, without all the overhead.

We set up big systems, such as CRMs, in the hope of making actionable decisions on the data: sending emails, texts, and push notifications to the right people about an upcoming sale, or fixing an issue like cart abandonment. Traditionally, it’s a massive effort to get the data to a state where you can make broad assessments about it. Enterprises must think through an entire workflow, mapping out every entity and every step of the process needed to achieve the goal.

I’m not implying you should ditch your CRM altogether. Rather, treat it like a data store. Your CRM can send out your emails, but don’t give up ownership of your delivery services. Hook your LLM up to your CRM’s KPIs instead.
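
What might that hookup look like? A minimal sketch, assuming an OpenAI-style chat API; `fetch_crm_kpis` and the numbers inside it are hypothetical stand-ins for your CRM’s reporting endpoint.

```python
from openai import OpenAI

client = OpenAI()

def fetch_crm_kpis() -> dict:
    """Hypothetical stub; in practice this would call your CRM's reporting API."""
    return {
        "cart_abandonment_rate": 0.31,
        "email_open_rate": 0.18,
        "repeat_purchase_rate": 0.22,
    }

kpis = fetch_crm_kpis()
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # an assumption
    messages=[{
        "role": "user",
        "content": ("Given these CRM KPIs, flag the biggest problem and draft "
                    f"one concrete follow-up campaign: {kpis}"),
    }],
)
print(resp.choices[0].message.content)
```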

AI identifies opportunities and threats instantly.

AI can provide proactive notifications about opportunities and problems, and their implications, in real time. Enterprises can know near-instantly the potential cost or revenue of a specific action. If an opportunity arises with a customer at the right time and in the right place, the AI can craft a hyper-personalized message in accordance with the rules and data systems we specify, and act on it immediately. A few examples (the first is sketched in code after the list):

  • When a user adds over $500 of merchandise to their cart, they’re automatically offered a 10% discount to encourage the sale
  • When a user makes a GDPR request for all their data, the AI can gather it within 30 seconds. (Enterprises would usually spend years building a system to be compliant with that!)
  • If the AI detects a suspicious pattern of behavior, it can adjust the settings for that specific user, rather than dumping an alert into a queue
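
Here is a minimal sketch of the first example above: an event-driven trigger with the offer message drafted by the model. The threshold, helper names, and delivery function are all hypothetical.

```python
from openai import OpenAI

client = OpenAI()

def send_push_notification(user_id: str, message: str) -> None:
    """Hypothetical stand-in for your own delivery service."""
    print(f"[push to {user_id}] {message}")

def on_cart_update(user_id: str, cart_total: float, cart_items: list[str]) -> None:
    # Rule from the first bullet: carts over $500 trigger a 10% offer,
    # with the message written by the model in real time.
    if cart_total <= 500:
        return
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # an assumption
        messages=[{
            "role": "user",
            "content": ("Write a friendly two-sentence offer of a 10% discount "
                        f"to a shopper whose cart holds: {', '.join(cart_items)}."),
        }],
    )
    send_push_notification(user_id, resp.choices[0].message.content)

on_cart_update("u-1827", 612.40, ["espresso machine", "burr grinder"])
```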

The actions that come out of these systems are simpler, more effective, and higher quality than those of laborious, heavily human-dependent traditional methods.

But wait: AI isn’t perfect.

You can’t have a conversation about the incredible opportunities with AI these days without acknowledging the risks. Large language models can hallucinate. They’re capable of error. Luckily, there are good defenses against these risks, chief among them a governance model: essentially another LLM checking the first LLM’s work for accuracy and compliance. Think of it this way: if you run a complex problem through an LLM and get the correct answer 80% of the time, running it through three, four, or five times might put you at a correct answer 98% of the time. Lastly, and this bears repeating: AI needs human oversight. A human being needs to review outputs before they’re implemented in “the wild”.
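
As a concrete illustration, here is a minimal sketch of such a governance loop, assuming the OpenAI chat API; the reviewer prompt, model name, and retry cap are our assumptions, and the fallback keeps a human in the loop.

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # an assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def governed_answer(task: str, max_attempts: int = 3) -> str:
    """Generate an answer, then have a second LLM audit it before it ships."""
    for _ in range(max_attempts):
        draft = ask(task)
        verdict = ask(
            "You are a compliance reviewer. Reply PASS if the answer below is "
            f"accurate and policy-compliant, otherwise FAIL.\n\nTask: {task}\n"
            f"Answer: {draft}"
        )
        if verdict.strip().upper().startswith("PASS"):
            return draft
    return "ESCALATE: route to a human reviewer"  # humans stay in the loop

print(governed_answer("Summarize the refund policy for orders over $500."))
```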

Your best move right now: eschew transformation, embrace AI.

Large language models have amazing promise in delivering the right data, at the right time, in the right place. This is not a novel goal: it’s what we’ve been asking of data organization all along, only we previously had to move mountains (and pay through the nose) to achieve it! Finally, enterprises can find value in their data as it sits today, with no extraneous overhead or lengthy investment contract needed. AI should be the next evolution of your data transformation journey.

Is it perfect? No. Will there be another big announcement from one of the big players in generative AI upending all we know on the topic once again? You bet, but don’t wait for it, because there will always be another (and another) innovation down the line. Don’t let your data, or your enterprise for that matter, sit on the sidelines during this momentous renaissance.

What can generative AI do for your enterprise? Let’s talk. 


ABOUT THE AUTHOR

Joel Dykstra, Digital Architect, has more than 15 years of hands-on experience leading the architecture, design, build, and management of digital experience platforms. He sees and communicates the big picture in an inspiring way, and he’s skilled at generating new and creative approaches to problems when all solutions seem exhausted.

Connect with Joel on LinkedIn
