My Twitter “For You” feed is full of users proclaiming that X AI is going to disrupt Y industry, yet they haven’t even tried the product. There is so much hype right now and very little understanding of how this might apply to real business use cases.
Alex and I are focused on the real-world application of new tech, and the AI du jour is Large Language Models (LLMs), with ChatGPT being the fastest-growing product of all time.
Given the pace of AI development it would be hubris to prescribe what will be disrupted; instead, I’m going to break down the current limitations and how they apply to use cases.
LLMs generate their answer by picking the next word that is most statistically likely given the previous messages. There are no truths or facts involved, just whatever is most statistically likely to come next. Since LLMs have seen thousands of examples of 2 + 2 = 4, when prompted with “2 + 2” they will answer “4”, but only because that is the most common pattern. When you ask something nonsensical, they will still play along statistically, and this is the hallucination problem.
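For example, here’s a toy sketch of the mechanic, with a simple bigram counter standing in for the model (nothing like a real transformer, but the principle is the same):

```python
from collections import Counter, defaultdict

# A tiny "training corpus". The model learns patterns, not facts.
corpus = [
    "2 + 2 = 4", "2 + 2 = 4", "2 + 2 = 4",
    "the moon is made of cheese",
]

# Count which token follows each token.
counts = defaultdict(Counter)
for line in corpus:
    tokens = line.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1

def next_token(prev):
    # Return the statistically most common continuation — no notion of truth.
    return counts[prev].most_common(1)[0][0]

print(next_token("="))   # "4", because it's the most common pattern
print(next_token("of"))  # "cheese" — it happily plays along with nonsense
```

A real LLM conditions on the whole context with billions of parameters, but the output is still the likeliest continuation, true or not.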
To understand what makes a good use case, you first need to answer two questions:
For example, it is very quick to verify whether generative AI art is good or not. The same goes for generated code: it’s very easy to run it and see whether it works or throws an error. Conversely, historical analysis would take time to verify.
Generative content use cases, such as writing marketing or ad copy, are the obvious fit, and Jasper is leading the way here. However, it doesn’t have to stop there. If it’s very easy to correct the output and verification is quicker than human generation, even use cases that demand extremely high accuracy become viable. For example, generating internal company announcements that the user can quickly verify and edit before releasing. This is sometimes referred to as “human in the loop”.
So bear in mind:
There is a maximum token limit per request, depending on which model OpenAI has enabled for you. Most people have a 4,000-token limit on GPT-4, but there is a restricted 32,000-token model. 32,000 tokens is equivalent to 22,000-odd words, or around 10 to 20 pages of text. It’s not enough to upload all your company data.
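If you want to check how much of the context window your data would consume, OpenAI’s open-source tiktoken library counts tokens the same way the models do. A minimal sketch (the file name is hypothetical):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

with open("company_handbook.txt") as f:  # hypothetical file
    text = f.read()

n_tokens = len(enc.encode(text))
print(f"{n_tokens} tokens")
# Anything beyond the model's limit simply won't fit in a single request.
```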
Now you’re probably wondering how Intercom’s Fin and Slite’s Ask are able to build ChatGPT products on top of incredibly large data sets. The answer is a hybrid approach: embed the company’s data and store it in a vector database, search that database with the user’s question, and pass only the most relevant snippets to the LLM as context.
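A minimal sketch of that flow, assuming the pre-1.0 openai Python SDK and a brute-force in-memory list in place of a real vector database (the data and question are made up):

```python
import numpy as np
import openai  # pre-1.0 SDK interface

def embed(text):
    # Turn text into a vector using OpenAI's embedding model.
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=[text])
    return np.array(resp["data"][0]["embedding"])

# 1. Indexing time: embed every chunk of company data.
chunks = [
    "Refunds are processed within 5 business days.",  # hypothetical data
    "Tier 1 customers are those above $100k ARR.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Question time: embed the question and find the nearest chunks.
# ada-002 vectors are unit length, so a dot product is cosine similarity.
question = "How long do refunds take?"
q = embed(question)
top = sorted(index, key=lambda item: -float(np.dot(item[1], q)))[:2]

# 3. Stuff only those chunks into the prompt, staying under the token limit.
context = "\n".join(chunk for chunk, _ in top)
answer = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(answer["choices"][0]["message"]["content"])
```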
This architecture works well when the answer only requires a few references, e.g. “What does X mean?” or “When did Y happen?” When you start to ask questions that aggregate many references, e.g. “What was last month’s revenue from tier 1 customers?”, the search against the vector database will return more than 4,000 tokens of matches. Your LLM is then answering on partial information and will return incorrect results.
Alex and I believe this is one of the biggest problems to solve. There are a number of interesting strategies for it using LangChain, and Data Independent on YouTube has a good video on the topic. There are other options too: instead of a vector DB, you can turn the LLM query into a SQL or API query (sketched below). ChatGPT’s Code Interpreter is doing the rounds. GPT-3 had fine-tuning, and it’s possible OpenAI has something in the works for the latest models.
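A rough sketch of the SQL route, again assuming the pre-1.0 openai SDK; the schema, database, and question are made up. The point is that the aggregation happens in the database, so the token limit never comes into play:

```python
import sqlite3
import openai  # pre-1.0 SDK interface

schema = "CREATE TABLE invoices (customer TEXT, tier INTEGER, amount REAL, month TEXT);"
question = "What was last month's revenue from tier 1 customers?"

# 1. Ask the model to translate the question into SQL against the schema.
sql = openai.ChatCompletion.create(
    model="gpt-4",
    temperature=0,
    messages=[
        {"role": "system",
         "content": f"Given this schema:\n{schema}\nReply with a single SQL query and nothing else."},
        {"role": "user", "content": question},
    ],
)["choices"][0]["message"]["content"]

# 2. Run it — the database does the aggregation, not the prompt.
conn = sqlite3.connect("company.db")  # hypothetical database
print(conn.execute(sql).fetchone())
```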
So bear in mind:
Deeply embedded in the model there is an element of randomness, which means you can’t guarantee that the same input will always produce the same output.
To mitigate the impact you can set the temperature to 0. This is highly effective when the prompt and expected answer are short (make sure to use stop sequences). However, when the use case requires generating longer text, the probability of divergence is higher.
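For example, a short classification prompt with the pre-1.0 openai SDK (the ticket text is made up):

```python
import openai  # pre-1.0 SDK interface

resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    temperature=0,  # always pick the most likely token
    stop=["\n"],    # stop sequence: cut the answer off after one line
    messages=[{
        "role": "user",
        "content": "Reply with exactly BUG or FEATURE: 'App crashes on login'",
    }],
)
print(resp["choices"][0]["message"]["content"])  # "BUG", near-deterministically
```

Even at temperature 0 the output isn’t strictly guaranteed to be identical on every run, which is why the short-answer-plus-stop-sequence pattern matters.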
I asked a friend who works in PR if and how he uses ChatGPT. He said he did, and it was great. The first thing he does is ask ChatGPT to write a press release. This, he says, is the “default answer”, and he uses it as a base of what NOT to write.
This is just one example of data drift, where LLMs become outdated. The reason is that LLMs have to be trained on a huge corpus of data, an expensive, multi-month process, meaning the core LLM is always out of date. Plus, for safety reasons they are not trained beyond a certain date, GPT-4’s cut-off being September 2021. This also means that LLMs (at least in the current architecture) don’t learn in real time.
This means I can’t ask ChatGPT to help me debug LangChain (it was started a year after the training cut-off date). Sure, you could upload the LangChain docs in the prompt and then ask questions, but if the problem space is evolving quickly and requires expertise, much of the heavy lifting has to happen in the prompt context, which, as we’ve seen, is already constrained.
So bear in mind:
Constraints inspire creativity, and necessity is the mother of invention. Traditionally, the B2C apps from the last platform wave (mobile) were free and ad-driven. God forbid paying 99¢ a year for instant messaging and phone calls in one app.
The volumes of consumer markets will probably not permit many businesses to charge. Remember, YouTube and podcast apps will interrupt right then and there with an advert. Do not be surprised if halfway through your AI answer an advertisement gets printed out.
I think the more interesting evolution will be in B2B applications. SaaS per-seat pricing could be threatened. If you have a highly capable bot, the costs generated by usage will not be evenly distributed: you might have a handful of users running complex queries with multiple GPT-4 round trips (expensive), and dozens of users just retrieving info via single GPT-3 requests (cheap). Further, the UX of a bot, particularly in Slack, enables the whole company to use it. We wouldn’t be surprised to see the pendulum swing towards the transaction-based pricing widely used by developer platforms. Adopting elements of this means that price can scale with computation (a rough sketch below).
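To make the skew concrete, a back-of-the-envelope sketch; the per-1K-token prices are illustrative (check OpenAI’s pricing page, they change), and the usage numbers are made up:

```python
# Illustrative per-1K-token USD prices (input, output) — check the pricing page.
PRICE = {
    "gpt-4": (0.03, 0.06),
    "gpt-3.5-turbo": (0.0015, 0.002),
}

def cost(model, tokens_in, tokens_out):
    p_in, p_out = PRICE[model]
    return tokens_in / 1000 * p_in + tokens_out / 1000 * p_out

# A power user chaining five GPT-4 round trips vs. a casual single lookup:
print(f"${5 * cost('gpt-4', 3000, 800):.2f}")     # ≈ $0.69 per complex task
print(f"${cost('gpt-3.5-turbo', 500, 150):.4f}")  # ≈ $0.0011 per simple lookup
```

A hundred power users doing that daily is a very different cost base from a hundred casual ones, which flat per-seat pricing papers over.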
Costs are going down. Some use cases, e.g. classification or deduplication of content, can use cheaper models. Open source is becoming increasingly competitive.
So when settling on a business model and use case, bear in mind: