Many of us have gotten good at using ChatGPT to answer questions, generate basic content, and fine-tune the style or tone of our writing. These capabilities are amazing and helpful. Even though the science behind generative AI is not new, it took significant innovations in data architecture, software, and infrastructure to pave the way for the groundbreaking, simple, and accessible generative AI tools we use today.
These innovations unlocked the massive data processing scale needed to map out the structure and flow of human languages. These language maps drive Large Language Models (LLMs), the foundational technology behind text generators like ChatGPT.
One noteworthy limitation of LLMs is that they are restricted to the information used to train them. Unlike traditional mathematical models, which aim to derive general formulas, LLMs map the actual word sequences found in their training data.
For example, if an LLM were trained only on the works of Shakespeare and we asked it to tell us about the International Space Station, it would have no relevant information about space exploration and would likely give us a nonsensical answer. As we add more diverse training data, the model gains broader context and becomes more accurate in responding to our questions. So, if we feed our Shakespearean model recent journal articles about space exploration, the model will be able to return meaningful information about it, perhaps even in verse. Unfortunately, adding training data to LLMs is a long and expensive process.
In the next 12-18 months, more companies will transition from experimentation to delivering material returns on their investments in AI. Many AI experts agree that the most successful generative AI implementations will unlock narrow, knowledge-intensive use cases specific to individual domains, industries, companies, and teams. High-quality, up-to-date enterprise data, company knowledge bases, procedures, and policies will become key AI value drivers for organizations.
To unlock such narrow use cases, generative AI solutions will need both the power of large-scale, generic LLM capabilities and small-scale, private, company-specific context. There is a significant gap between the slowly changing generic language capabilities offered by the big LLM vendors and many companies' need to operate their AI solutions in the fast-changing context of their own businesses. Organizations need AI solutions that speak their language, understand their business, and keep their data private and up to date.
An architectural approach called Retrieval-Augmented Generation (RAG) was developed to address this gap. RAG augments large language models with an authoritative knowledge base that sits outside the LLMs' massive historical training data. This knowledge base provides the company-specific context, while the LLMs deliver the power of language comprehension and composition.
RAG is relatively simple and widely supported by LLM vendors and the open-source community. With RAG, the application first searches company data, retrieves chunks of relevant content, and passes them to the LLM along with the original question. The LLM is instructed to keep the company data private and to answer using only the context provided in the request. The diagram below shows the high-level architecture of a RAG-based AI chat application.
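To make that flow concrete, here is a minimal Python sketch of the request cycle. Everything in it is illustrative: the policy snippets, the keyword-overlap retriever, and the llm_complete() placeholder stand in for the document store, vector search, and vendor LLM API a real application would use.

```python
# Minimal sketch of the RAG request flow: retrieve relevant chunks,
# assemble a grounded prompt, and send it to an LLM.
# The data, retriever, and llm_complete() below are stand-ins, not a real API.

POLICY_CHUNKS = [
    "Employees must book flights through the approved travel portal.",
    "Hotel stays over $250/night require manager pre-approval.",
    "Meals are reimbursed up to $60 per day with itemized receipts.",
]

def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank chunks by naive keyword overlap with the question.
    Production systems use embeddings and a vector index instead."""
    q_words = set(question.lower().split())
    scored = sorted(chunks, key=lambda c: -len(q_words & set(c.lower().split())))
    return scored[:top_k]

def build_prompt(question: str, context: list[str]) -> str:
    """Combine retrieved company context with the user's question,
    instructing the model to answer from the context alone."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using ONLY the context below. If the answer is not in "
        "the context, say you don't know.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"
    )

def llm_complete(prompt: str) -> str:
    """Placeholder for a call to a hosted LLM; swap in your vendor's SDK."""
    return f"[LLM response to a {len(prompt)}-character prompt]"

question = "What is the nightly hotel limit before I need approval?"
context = retrieve(question, POLICY_CHUNKS)
print(llm_complete(build_prompt(question, context)))
```

The key design choice is the instruction to answer only from the supplied context; that is what keeps the model grounded in current company data rather than its historical training data.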
There are many tools and options for building and deploying RAG applications. Big software vendors like Amazon, Microsoft, Google, and Databricks are developing services and infrastructure that enable developers to deploy RAG applications with minimal upfront cost. Several promising no-code and low-code alternatives from smaller vendors are also available. This is an emerging, fast-changing space, with many pre-release and version-1.0 AI tools launching almost daily. Microsoft Fabric and Amazon Bedrock are the current leaders, offering end-to-end services for building and running RAG applications.
In parts two and three of this series, I will share my experience building a RAG chat application using Microsoft Fabric and Amazon Bedrock. My application will enable intelligent chat with an AI corporate travel policy advisor.
Stay tuned.
#AI #BusinessIntelligence #RAGArchitecture #AIinnovation #EnterpriseAI #DataDriven #DigitalTransformation #GenerativeAI #CDAO #CDO