How Avenue Engineering built an Ops Copilot with OpenAI
You’ll be surprised at how easy it is to create an effective AI assistant. With the release of large language models like GPT-3.5, building an AI assistant that can understand and respond to natural language queries has become much more accessible.
In this engineering blog, we'll explore how Avenue engineers built Copilot, an AI assistant that helps users create and manage monitoring tools and ask questions about our tool using GPT-3.5. I'll dive into the technical details of how we designed Copilot, discuss the challenges we faced, and share some tips and tricks that can help you create your own AI assistant.
What is a Large Language Model (LLM)
Large Language Models (LLMs) are artificial intelligence models that are trained on massive amounts of natural language data, allowing them to generate human-like language and understand natural language queries. GPT-3.5 is an example of an LLM. These models have recently gained popularity due to their ability to perform a variety of language-related tasks, such as translation, summarization, and question-answering. In fact, this blog post was written by one of these models!
LLMs can be used to create AI assistants like Copilot, which can understand and respond to natural language queries. AI chatbots make it easier for users to interact with complex systems and save users time onboarding to your product. However, LLMs are not perfect and have some limitations. They can sometimes generate responses that are nonsensical, which is usually referred to as hallucination. As with any AI system, it is important to verify the information provided by LLMs, especially when dealing with business-critical tasks.
Another important limitation of current LLMs is that they can only handle a relatively small amount of tokens. For example, GPT-3.5 can only handle 4,096 tokens per request and response. Although it may seem like a lot, later on you will see that it actually imposes some significant constraints on our prompt.
What is a token anyway? A token is a part of a word that OpenAI’s models use to predict the next word. A helpful rule of thumb is that a token is about 4 characters, which is what I usually use to estimate the number of tokens my prompts have. Check out OpenAI’s blog post “What are tokens and how to count them” for a more in depth look into tokens.
Since we have such a small number of tokens the amount of contextual knowledge we can give our assistant is fairly limited. However there are some ways we can deal with this, which we will explore later in this blog.
For those who don’t know, a prompt is our primary way of interacting with an assistant and improving its performance. A prompt is a block of text that we provide an LLM to instruct it to perform a task. Here’s an example of a summarization prompt in OpenAI’s playground:Prompts can be very simple. For relatively lightweight tasks, such as summarization, this may be all that you need. However, as your needs grow, the complexity of your prompt will also increase.
Lets look at an example of a basic AI assistant prompt for use with GPT-3. The following is a conversation with an AI assistant. The assistant is helpful, creative, friendly.
Human: Hello, who are you?
AI: I am an AI created by OpenAI. How can I help you today?
You’ll notice that this prompt starts with defining the role the model plays as well as some initial messages separated by “Human:” and “AI:”, which helps the model understand the flow of the conversation. This is a decent prompt for a very simple chat assistant that can answer basic questions. For example:
Human: Who won the gold medal for the 100 meter freestyle in the 2016 Olympic games?
AI: The gold medal in the 100 meter freestyle at the 2016 Olympic Games was won by Kyle Chalmers from Australia.
This prompt will give you a very general assistant that is similar to ChatGPT. For your use case you will definitely want something more specific.
Structure of a Prompt
To make it easier to construct your prompt, and to humanize the algorithm, I find it helpful to make analogies to human thought processes. My analogy starts with breaking the prompt down into a set of discrete parts that I call Identity, Knowledge, Directive, Imagination and Short Term Memory.
Identity is simply the name, personality, and role that the AI assistant takes on. The start of the example prompt given above is a good example of an identity that could be given to an assistant and may not need to be much more complicated.
Knowledge is institutional knowledge about your product inserted into the prompt. Knowledge can be added to a model with training or refining your own language model. But this requires a large training set to be done effectively and may not always be feasible, especially on your first implementation. As well, if your product changes and your docs need to be updated, then you have to build a new training dataset and retrain your model all over again. However, by inserting knowledge as part of your prompt you can dynamically add relevant knowledge, at run time, by utilizing classification. Effectively giving you an infinite and flexible knowledge base to work with.
A Directive is an extension of the assistants identity and can help steer the conversation. In the prompt for Copilot, our directive includes more information such as “Copilot may be asked by the user to help come up with ideas for new monitors or collaborate on a monitor idea (’Ideation’)”. I find it useful to separate the directive from the identity so that it can be changed at run time to better fit a user message using classification. This can help your assistant be more flexible in the type of questions you wan’t to service, while still giving it some structure to work with.
Imagination is an extension of the directive. It is a set of example conversations between a user and the assistant based on its current directive. It can help give some more structure to the conversation and allows you to insert specific strings that can be parsed out later to add more direct functionality to your assistant. For example, in Copilot’s imagination, we instruct Copilot to add the string “RECOMMENDATION:” proceeding every monitor recommendation it makes. This allows us to parse the title and description from a monitor recommendation and provide a link for the user to click on which then creates a monitor draft. Similar to knowledge, this can be accomplished with a custom model or a refined model if you have the data available.
Short Term Memory is simply the log of a single conversation between a user and the assistant. LLM’s are stateless, so continuity during a conversation is maintained by adding each user message and model response to a conversation list. It should be obvious that after a while this will become a problem due to the limit on tokens I mentioned earlier. Luckily there are some tricks for dealing with this.
Building your prompt
Let’s imagine a SaaS company called “Pequod,” which enables users to build interactive maps and dashboards for GPS location data. Pequod wants to create an AI assistant named “Ahab” that will help users brainstorm ideas for an interactive map dashboard and for general documentation related questions. Let’s start with the identity of the assistant.
You are an AI assistant. You are helpful, creative, clever, inquisitive, and very friendly. The AI assistant is named "Ahab" and was created to assist customers at the company "Pequod".
Next, the static portion of our institutional knowledge may look something like this. Right now, Pequod currently only has one piece of documentation, which we can attach to our knowledge section.
Pequod is a SaaS company created to help users create interactive maps using the Google Maps API and dashboards with their own GPS data. Pequod's target customers are primarily international shipping companies, who feed their data into our platform for visibility into shipping routes. Pequod's flagship product is "Essex" which is an interactive map API.
Data supplied to Essex must be made with a http post request in the format.The metadata field is an unstructured json blob that can be used to supply your maps with more specific information about your data set with our frontend editor.
Since we want our assistant Ahab to offer ideas for interactive maps our directive may look something like this:
Ahab may be asked by the user to help come up with ideas for new interactive maps made with Essex ("Ideation") or with general questions on working with Essex ("Documentation"). Ahab should ask the user for some information about the organization that they work for and what their role within the organization. In "Ideation" mode, if Ahab feels that it has enough information to help the user, Ahab will generate at least 3 possible map ideas for the user to create. When Ahab gives a list of interactive map recommendations it always starts his recommendations with the phrase "**RECOMMENDATION**:".
You should note that at this point the assistant has no context on the user or their organization, so we instruct it to gather this information before it makes any recommendations. When giving your assistant instructions, you should encourage it to ask clarifying questions to minimize the amount of unhelpful hallucinations it may otherwise return.
To complement our directive, we include an example conversation between a user and Ahab for the assistants imagination. Here is an example "Ideation" conversation between Ahab and a customer where "H" denotes the human participant, and "A" denotes the assistant, Ahab:Then, we can jumpstart the conversation with an initial user message which is simply “Hello, who are you?”
If we combine all of our sections into a single prompt, we can test it out in OpenAI’s playground and see how well we did!
The playground is a great way iterate quickly and improve your assistants capabilities. Try experimenting with changing your prompt and your parameters to see what works best for you and your organization.
The above use case was made for OpenAI’s text-davinci-003 API for simplicity. The GPT-3.5-turbo chat API (which Avenue uses for Copilot) is different and you may need to modify the prompt slightly. The main difference is that instead of accepting one large block of text as the prompt, the GPT-3.5-turbo API accepts a list of messages that can have roles associated with them. The three possible roles are “user”, “assistant”, and “system”. The prompt sections we’ve created above can be added as system messages proceeding the user and assistants conversation. See OpenAI’s documentation on instructing a chat model for more information on sending requests to GPT-3.5.
Strategies for dealing with the token limit
Keeping in mind the GPT-3.5 token limit of 4,096 tokens, our Pequod example’s prompt is about 1,000 tokens. This leaves us with over 3,000 tokens for the conversation between the user and the assistant. While this may seem like a large number, it is very easy to fill this space with documentation sections.
If we wanted to add more documentation into our knowledge section to give the assistant more context on the Pequod platform, we would very quickly run out of space. The way we dealt with this problem in our implementation of Copilot was by using OpenAI’s “Embeddings” API.
The Embeddings API allows you to convert text into a numeric vector, which is a token representation of the text. With embeddings you can easily calculate a similarity score by taking the dot product of two vectors.
We store vector representations of short documentation snippets (<400 tokens) and at run time we calculate similarity scores for the latest user message and the entire conversation for all of our documentation snippets.
We then pick the two documentation sections that have the highest similarity scores and dynamically insert them into the knowledge section of the prompt. This allows you to have a theoretically infinite knowledge base for your assistant, while keeping the total number of tokens small. And because we do this for each user message, our knowledge section changes throughout the lifetime of a conversation based on the users needs. See this tutorial in OpenAI’s documentation for more information on using their embeddings API.
If a user is having a long conversation with the assistant you may still run out of space however. Solving this is much simpler. You can use the same GPT-3.5 API to begin summarizing the oldest messages into a single system message. This is a lossy transformation and the assistant will start losing some context about older messages, but this should be sufficient for the majority of use cases and will stop the conversation from failing abruptly.
Keep in mind the limitations and potential pitfalls of current LLMs, and always verify the information provided by your assistant for business-critical tasks. With these considerations in mind, you can create an AI assistant that provides real value to your users and makes their lives easier.
Interested in learning how Avenue can help you brainstorm ways to solve ops issues, write sql, and answer questions using OpenAI's ChatGPT? Checkout our blog highlighting, Copilot, your all-in-one ops, analytics, and customer service teammate here!