Unlocking structured chatbot responses
Artificial Intelligence (or Machine Learning as some prefer) has finally made its very public entrance and now, millions of people are tinkering with the tech - the modern equivalent of taking apart the radio to see how it works. But as any tinkerer knows, dissection is only the beginning. True understanding and advancement only come from a more hands-on approach.
So let us begin with a prompt, human to human, as it were. How do we leverage this new tech in the language of our own applications? Can we channel this liquid intelligence in a structured way?
Join us on our journey to answer just that.
Groundwork
"I have made this letter longer than usual because I have not had time to make it shorter." - Blaise Pascal
One of the most significant areas of exploration has been in the space of prompts. The importance of prompts first came into focus with image-generation models like MidJourney where it was simple to change the output image in predictable but powerful ways by adding style and resolution cues like "in the style of Salvador Dali", or "ultra photorealistic 4K", or "volumetric light".
The prompt continues its starring role in the most recent pure language models like GPT-3 and GPT-4. In fact, a little digging reveals that the only difference between the raw GPT-3/4 API and the consumer-facing ChatGPT app is that ChatGPT simply includes previous questions/responses when sending the current prompt to the API. It's actually that simple.
With this in mind, let's look again at what we have with fresh eyes. We have a tool that can seemingly "understand" what we say even when we say it in an unstructured way. But what does it mean to "understand"? That's a tough nut to crack. Let's fall short and assert that in practical terms "understanding" is present when a naturally expressed request to transform one thing into another is carried out satisfactorily.
A self-contained example in the context of ChatGPT can be made using the domain-specific language Mermaid, a tiny programming-adjacent language that allows the author to concisely express visual diagrams in text. ChatGPT "understands" how to transform prompts into diagrams.
Prompt: Create a simple mermaid diagram showing a developer asking you to create a mermaid diagram, you responding with that mermaid diagram, the developer asking you to modify the mermaid diagram, and you doing so. Create a circular flow diagram.
ChatGPT: Here's a simple mermaid diagram showing a circular flow of a developer asking me to create a mermaid diagram, me responding with the diagram, the developer asking me to modify it, and me making the changes:
```mermaid
graph TD;
    A[Developer] --> B[ChatGPT];
    B --> C[Mermaid Diagram];
    C --> D[Developer];
    D --> E[Modification Request];
    E --> B;
```
In this diagram, the flow starts with the developer asking me, ChatGPT, to create a mermaid diagram (labeled "Mermaid Diagram" in the diagram). I create the diagram and present it to the developer. The developer then asks me to make a modification to the diagram (labeled "Modification Request" in the diagram). I make the changes and present the updated diagram to the developer again. The flow then loops back to the developer who can continue to request modifications until the diagram meets their needs.
We can visualize this diagram ChatGPT wrote by using a Mermaid renderer, which are readily available as plugins for VS Code, Google Docs, and many other tools, as well as online.
This is interesting. In truth, nothing different is happening here than when you ask ChatGPT to generate some Python code for you. However, Mermaid is a lot simpler than a Turing-complete language like Python.
When I imagine how the underlying models were "trained to code in Python", I imagine a training set that involves a substantial portion of openly available code repositories, stackoverflow questions and answers, and blogs. But for a little domain specific language (DSL) like Mermaid? I can easily imagine a concise expression of the syntax and available options with some extra bits that leverage all the vast training already performed generally.
Let鈥檚 combine what we鈥檝e discussed so far:
ChatGPT is a simple layer that embeds the current prompt with previous dialog before sending to GPT.
If GPT knows a domain specific language (DSL) then you can translate unstructured text to it.
Keep this in mind as we move forward.
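To make the first point concrete, here is a minimal sketch of how a chat layer can fold previous dialog into each new request. The `build_prompt` helper is hypothetical, not any real API; a production client would send structured messages to a completion endpoint rather than a single concatenated string.

```python
# A minimal sketch of the "ChatGPT is GPT plus history" idea above.
# build_prompt is a hypothetical helper, not part of any real SDK.

def build_prompt(history, user_prompt):
    """Embed prior question/answer pairs ahead of the current prompt, oldest first."""
    lines = []
    for question, answer in history:
        lines.append(f"User: {question}")
        lines.append(f"Assistant: {answer}")
    lines.append(f"User: {user_prompt}")
    return "\n".join(lines)

history = [("What is Mermaid?", "A text-based diagramming language.")]
prompt = build_prompt(history, "Show me an example diagram.")
```

The resulting string carries the whole conversation, which is all the "memory" the chat layer really has.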
Embedded Prompting
Plenty of digital ink has already been spilled over how to write effective prompts and at the risk of repeating both them and myself, a prompt can make or break the response. At its most general, the prompt provides context - it works to communicate your expectations about the response. Often we think about using the prompt to condition the content of the response, but it's just as effective in conditioning the structure in which that content is embedded.
Consider the following exchange with ChatGPT:
Prompt: "position the car three units to the left of the boat"
ChatGPT: To position the car three units to the left of the boat, you would first need to establish a reference point. Let's assume that the boat is the reference point.
Next, you would need to determine what units you are measuring in. For example, if you are measuring in meters, you would need to know how long one unit is. Let's assume that one unit is equal to one meter.
Finally, you would move the car to a point that is three units to the left of the boat. This means that you would move the car to a point that is three meters to the left of the boat, as measured from the boat's position.
This prompt is lacking in many respects but mostly in the context that would serve to make the response useful in any way. Consider instead:
Prompt: Given the following taxonomy and examples, transform the following prompt:
"position the car three units to the left of the boat"
Taxonomy:
```json
{
  "verbs": [
    { "primary": "move", "synonyms": ["translate", "shift"] },
    { "primary": "scale", "synonyms": ["size up", "size down", "shrink", "blow up", "increase"] }
  ],
  "nouns": [
    { "Name": "car object", "Id": "23211221", "position": [1, 0, 1] },
    { "Name": "boat object", "Id": "23943429", "position": [-1, 0, -1] }
  ]
}
```
Example Prompt1: "increase scale of the boat object by 15%"
Example Response1: { "verb": "scale", "noun": ["23943429"], "property": {"numberValue": 1.15 } }
Example Prompt2: "position the car one unit to the left of the boat"
Example Response2: { "verb": "move", "noun": ["23211221"], "property": {"vectorValue": [-2, 0, -1] } }
ChatGPT: Transformed Prompt: "Move the car object three units to the left of the boat object"
Response: { "verb": "move", "noun": ["23211221"], "property": {"vectorValue": [-4, 0, -1] } }
Explanation: Using the given taxonomy, we can identify the primary verb "move" and the two relevant nouns "car object" and "boat object". The prompt specifies the direction and distance, so we can use a vector value of [-3, 0, 0] to move the car object three units to the left of the boat object. However, since the initial position of the car object is [1, 0, 1] and the initial position of the boat object is [-1, 0, -1], we need to subtract 2 from the x-coordinate of the car object's position vector to get the final position vector of [-3, 0, -1]. Therefore, the response would be { "verb": "move", "noun": ["23211221"], "property": {"vectorValue": [-4, 0, -1] } }.
This is a relatively simple example, but it illustrates how we can skip the raw ML training phase by putting in a little work up front to condense our domain into a concise description. If we add to the concise description some canonical examples of the kinds of transformations on the input prompt we want performed we'll find the responses to be significantly more accurate and applicable.
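Once a structured response like the one above comes back, application code can act on it directly. Here is an illustrative sketch, assuming a simple in-memory scene keyed by object Id; the scene shape and function names are my own assumptions, not anything the model dictates.

```python
# Illustrative only: apply a structured verb/noun response to an
# in-memory scene keyed by object Id. The scene layout is an assumption.

scene = {
    "23211221": {"name": "car object", "position": [1, 0, 1], "scale": 1.0},
    "23943429": {"name": "boat object", "position": [-1, 0, -1], "scale": 1.0},
}

def apply_response(scene, response):
    """Dispatch on the taxonomy's primary verbs and mutate the target objects."""
    for object_id in response["noun"]:
        obj = scene[object_id]
        if response["verb"] == "move":
            obj["position"] = response["property"]["vectorValue"]
        elif response["verb"] == "scale":
            obj["scale"] *= response["property"]["numberValue"]

apply_response(scene, {"verb": "move", "noun": ["23211221"],
                       "property": {"vectorValue": [-4, 0, -1]}})
```

The taxonomy constrains the model's output to exactly the verbs this dispatcher knows how to handle.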
One thing to notice here is that we got ChatGPT to extract structured text in the form of a JSON object from our prompt, but it's still embedding that JSON in the middle of free-form text. Experimenting with various prompts with the goal of getting ChatGPT to respond only with the JSON led to some interesting results. While it was possible to make GPT respond with only the JSON content, we found that the accuracy of the responses plunged. It would seem that when ChatGPT shows its work, ChatGPT produces better answers.
Maybe our math teachers were on to something.
Putting It Together
We've got all the basic ingredients we need to start plugging GPT into our own products. There are many ways, from the perspective of UX, that we could do so, but we're not too concerned with that aspect of the effort here. For simplicity's sake, let's assume we've got a text box where the user can type in what they want to have happen, i.e. their prompt.
What we want is for the user's intent to be translated into the behavior of our application without further user involvement. Let's put together everything we've learned so far.
First we need some basic functionality as a target for users' intent. For demo purposes, let's use a calculator app. It's full-featured, so we can 'add', 'subtract', 'multiply', and 'divide'. Here are our function signatures:
```python
from functools import reduce

def add(arr):
    return sum(arr)

def subtract(arr):
    return arr[0] - sum(arr[1:])

def multiply(arr):
    return reduce(lambda x, y: x * y, arr)

def divide(arr):
    return reduce(lambda x, y: x / y, arr)
```
As above, we want a prompt template in which to embed the users' prompts. There are many ways to attempt this and some evolution of the prompt is expected from trial and error. Here's an example that works well enough for our purposes:
Given the following python API and example transformations, transform the prompt and provide only the output array of operations and a brief explanation:
"{{USER_PROMPT}}"
Python API:
def add(arr):
def subtract(arr):
def multiply(arr):
def divide(arr):
Example 1
Prompt: "add 2 and 2"
Output: [{ "operation":"add", "values": [2, 2] }]
Example 2
Prompt: "add 3 to the product of 3 and 6"
Output: [{ "operation":"multiply", "values": [3, 6], "name": "step0" }, {"operation":"add", "values":["step0", 3]}]
An example response from passing an embedded prompt of "Subtract 3 from the product of 2 and 5"
ChatGPT: Output:
[{ "operation":"multiply", "values": [2, 5], "name": "step0" }, {"operation":"subtract", "values":["step0", 3]}]
Explanation:
We need to perform the operation "Subtract 3 from the product of 2 and 5".
First, we need to find the product of 2 and 5, which we can do using the `multiply` function. We pass in the values 2 and 5 to the `multiply` function, and assign the result to a variable called "step0".
Next, we need to subtract 3 from the result of the previous step. We can do this using the `subtract` function. We pass in the variable "step0" and the value 3 to the `subtract` function, which gives us the final result.
Therefore, the output array of operations is:
[{ "operation":"multiply", "values": [2, 5], "name": "step0" }, {"operation":"subtract", "values":["step0", 3]}]
Another example response from passing an embedded prompt of "square 5"
ChatGPT: Output:
[{ "operation":"multiply", "values": [5, 5] }]
Explanation: The prompt "square 5" is asking for the square of 5, which is equivalent to multiplying 5 by itself. Therefore, the output array of operations would simply be a single operation that multiplies 5 by 5 using the 'multiply' function from the provided Python API.
Now we just need to write some code that steps over the extracted operations and supplies the values. Some care needs to be taken in this example to substitute the values at a particular depth for the associated placeholder string in subsequent operation descriptions such as step0, step1, etc.
While code like that can be fun to write, it's typically less exciting to read and since it's not critical to understanding my point, I will spare you the inconvenience. The essential ideas here are:
Provide an API description or DSL summary in a prompt embedding.
Provide an output transformation schema that you know how to walk.
Provide example prompt transformations into outputs.
Provide the actual prompt.
Use regex to extract the structured JSON embedded in the response.
Validate and iterate over that JSON to orchestrate and supply inputs to application code.
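For the curious, a minimal sketch of the walker from the last step; the operation table and placeholder substitution are my assumptions matched to the example schema, not the full production-grade code.

```python
from functools import reduce

# Map operation names from the model's output to the calculator functions.
OPS = {
    "add": lambda arr: sum(arr),
    "subtract": lambda arr: arr[0] - sum(arr[1:]),
    "multiply": lambda arr: reduce(lambda x, y: x * y, arr),
    "divide": lambda arr: reduce(lambda x, y: x / y, arr),
}

def run_operations(operations):
    """Evaluate an extracted operation list in order, substituting named
    step results (e.g. "step0") for their placeholder strings."""
    results = {}
    value = None
    for op in operations:
        values = [results.get(v, v) if isinstance(v, str) else v
                  for v in op["values"]]
        value = OPS[op["operation"]]([float(v) for v in values])
        if "name" in op:
            results[op["name"]] = value
    return value

result = run_operations([
    {"operation": "multiply", "values": [2, 5], "name": "step0"},
    {"operation": "subtract", "values": ["step0", 3]},
])
# multiply(2, 5) is stored as step0, then subtract([10.0, 3.0]) gives 7.0
```

Validation (unknown operations, malformed values) is where most of the real-world effort goes, since the model's output is ultimately untrusted input.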
In actual practice, one has to refine and home in on an embedded prompt that works for their context, including identifying the best set of example transformations to include within the prompt. Practically speaking, some guidance to users to try to keep their language simple, in these early days, would go a long way, as well.
As a final piece of advice from my personal refinement journey, consider only using these types of flows when the desired behavior can be corrected/undone or has low impact. Allowing GPT to identify operations that can't be undone seems problematic at this early stage in the technology. Also, don't hook it up to your missile launcher.
Next Steps
We now know how to embed a first-order prompt inside a higher-order prompt to make the response structured according to our own idiosyncratic context. We have the broadest outline of the basic flow. There are ways to improve and embellish this. Here are some examples to get you started:
You can mix in voice-to-text solutions to take the keyboard out of the equation.
You can update your embedding to allow for ChatGPT to ask for clarifications, and when it does, follow different control flow paths in your own code.
You can challenge ChatGPT responses in patterned ways to elicit corrections.
You can have ChatGPT itself summarize the salient bits within a longer exchange so as to provide greater context to future prompts.
You can make multiple requests to put a request through a series of transformations or categorizations as if you鈥檙e using GPT to transition through a state-machine.
You can allow the user to see and potentially cancel a complex request flow.
Enjoy, fellow tinkerers!