As we near the end of another year, some of us are giddy with anticipation over the magic of the holidays and the promise the new year brings. Many of us working in the tech industry are also anticipating the release of the next generation of ChatGPT: GPT-5, confirmed by OpenAI CEO Sam Altman.
The release of GPT-4 by OpenAI brought another leap in performance and capabilities, building on the impressive text generation abilities of its predecessors, GPT-3 and ChatGPT. We are witnessing a new era of text generation, where machines can complete our sentences, write entire essays, screenplays, and code snippets, and even determine our work schedules. But with great power comes great responsibility. As impressive as these tools may be, there are limitations and considerations that must be taken into account before using them.
If you are a developer looking to build new products using ChatGPT / GPT-4, there are a few tips that you can keep in mind to make sure that you are getting the most out of the AI tool.
Firstly, make sure that you are testing multiple models and experimenting with different parameters. OpenAI offers a user interface (UI) for testing ChatGPT responses. GPT-4 access remains limited, but there are four pre-trained GPT-3 models available, and with OpenAI’s public API you can integrate one of them directly into your software. OpenAI provides excellent developer documentation and examples at beta.openai.com. Overall, the API is very easy to use and provides access to the different GPT models via Python or Node.js libraries. To use it, simply send a query to the GPT model along with some configuration parameters and get a response back, all processed in the cloud. Pricing depends on which model you are using. OpenAI offers a free tier, but fees kick in with each query once you exceed it.
Does the Output Justify the Cost?
To get started, you need to determine which GPT models are suitable candidates for testing with your application. Sure, more complex models will produce more robust output and handle more difficult text generation tasks, but, like anything else, that comes with caveats. The lower-performance models have their rightful place: if you’re looking to build a tool for simple text and code completion or text translation, a model like Ada can provide the AI engine you need at lower cost and with faster output.
More advanced models should be reserved for more creative and larger text generation tasks, like generating original stories, songs, English lit papers, or B2B marketing content. These tasks call for a more advanced model, the most advanced being Davinci, which in turn takes longer to return output and comes with a higher price tag. But these are jumping-off points. Given that the API is easy to use, load the different models and see what kind of output you’re getting from each before making your decision. I highly recommend testing a couple of versions, getting a feel for the output of each, and then determining whether a more robust solution is worth the extra time and cost over a simpler model.
This is important because the output of the model can vary significantly depending on the input parameters, and you want to make sure that you are using the right model for your specific use case. You may find that a simpler model is sufficient for your needs, or you may need to use a more complex model to achieve the desired results. It is also important to test the output of the model under different conditions to ensure that it is robust enough to handle a variety of inputs.
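One way to run that side-by-side comparison is a small harness that sends the same prompts to each candidate model and records the output and elapsed time. The sketch below makes several assumptions: `generate` is a hypothetical callable that you would supply (for example, a thin wrapper around your API client), `fake_generate` is a stand-in so the example runs offline, and the model labels are placeholders rather than exact API model names.

```python
import time
from typing import Callable, Dict, List


def compare_models(
    models: List[str],
    prompts: List[str],
    generate: Callable[[str, str], str],
) -> List[Dict[str, object]]:
    """Run every prompt against every model, recording output and latency."""
    results: List[Dict[str, object]] = []
    for model in models:
        for prompt in prompts:
            start = time.perf_counter()
            output = generate(model, prompt)
            elapsed = time.perf_counter() - start
            results.append(
                {"model": model, "prompt": prompt, "output": output, "seconds": elapsed}
            )
    return results


def fake_generate(model: str, prompt: str) -> str:
    """Offline stand-in for a real API call (hypothetical, for illustration)."""
    return f"[{model}] reply to: {prompt}"


if __name__ == "__main__":
    rows = compare_models(
        ["ada", "davinci"],  # placeholder labels for a simple and an advanced model
        ["Suggest a theme for trivia night."],
        fake_generate,
    )
    for row in rows:
        print(f"{row['model']:>8}  {row['seconds']:.6f}s  {row['output']}")
```

Swapping `fake_generate` for a real wrapper keeps the harness unchanged, so the same prompts can be replayed whenever a new model becomes available.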
Remember, Garbage In Equals Garbage Out
Take time to get into the nitty-gritty of testing each model’s output. You can significantly improve the output of any of the GPT models without much effort, and doing so may vastly change your opinion of how complex a model you need to make your tool work. First, modify the model parameters. Each model exposes a set of parameters that you can tune to change its behaviour. The most notable for text generation are the ‘temperature’ and ‘top_p’ parameters, which control how much randomness goes into choosing each next word. Modify these values to change the “risk factor” of the output. Think of this as how “outside the box”, or how creative, the output is allowed to be. For example, if you want to generate speeches for wedding toasts, a low-risk approach would involve recycling well-known quotes about love and commitment. But if you want to create a tool that offers less clichéd, more original text, you’re at higher risk of output that might make the happy couple eject you from the wedding venue. For simple applications like code completion, you likely want to set these values very conservatively, but if you want an obscure theme for trivia night, go big.
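To make the “risk factor” concrete, here is a minimal sketch that maps named risk levels to `temperature` and `top_p` values and builds a request payload. The field names (`model`, `prompt`, `max_tokens`, `temperature`, `top_p`) mirror those of OpenAI’s completion requests, but the preset names and numbers are our own illustrative assumptions, and `build_request` is a hypothetical helper, not part of any library. The presets vary only `temperature` and leave `top_p` at 1.0, since OpenAI’s API reference suggests adjusting one of the two rather than both.

```python
from typing import Dict

# Illustrative presets only: the numbers are assumptions to tune per use case.
RISK_PRESETS: Dict[str, Dict[str, float]] = {
    "conservative": {"temperature": 0.0, "top_p": 1.0},  # e.g. code completion
    "balanced": {"temperature": 0.7, "top_p": 1.0},
    "adventurous": {"temperature": 1.2, "top_p": 1.0},  # e.g. obscure trivia themes
}


def build_request(
    model: str,
    prompt: str,
    risk: str = "balanced",
    max_tokens: int = 256,
) -> Dict[str, object]:
    """Build a completion-style request payload for the chosen risk level."""
    if risk not in RISK_PRESETS:
        raise ValueError(f"Unknown risk level: {risk!r}")
    payload: Dict[str, object] = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    }
    payload.update(RISK_PRESETS[risk])
    return payload


if __name__ == "__main__":
    # "davinci" is a placeholder label, not necessarily an exact API model name.
    print(build_request("davinci", "Write a wedding toast.", risk="conservative"))
    print(build_request("davinci", "Invent a trivia night theme.", risk="adventurous"))
```

Keeping the presets in one table makes it easy to rerun the same prompt at each risk level and compare the results.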
Another simple way to improve output quality is to add context to the model’s input. This is the new field of ‘Prompt Engineering’, where learning how best to prompt the AI improves the output. One example we’ve tried was generating NPC (non-player character) replies in an RPG (role-playing game), specifically Minecraft, with villagers living on a giant rock in the middle of the ocean. We tested the depth of the characters’ responses and backstories using the player’s chat script and the Davinci model. The responses we got fit the character’s context remarkably well. Questions like “how do you decorate” received responses like “with plants in a flowerpot”, and asking about the character’s hobbies got the answer “fishing”. We also integrated these GPT models into the RPG mechanics, such as the narrative that gives context to a game’s quest. These were also fairly convincing and relevant to the characters. For example, a quest to collect 30 wooden logs came with the AI-generated narrative that a storm from the ocean had toppled their house and the wood would help build a new home. Using the same questions and scenarios, we could test a couple of different models to see whether the price tag matched the quality of output for our needs.
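Adding context to the input can be as simple as assembling the prompt from a few known facts about the character before appending the player’s line. The sketch below is an illustration rather than the code we used: `build_npc_prompt`, the character name, and the instruction wording are all hypothetical.

```python
from typing import List, Tuple


def build_npc_prompt(
    character: str,
    setting: str,
    chat_history: List[Tuple[str, str]],
    player_line: str,
) -> str:
    """Assemble a prompt that gives the model the NPC's context before the question."""
    lines = [
        f"You are {character}, a villager in a role-playing game.",
        f"Setting: {setting}",
        "Stay in character and answer in one or two short sentences.",
        "",
    ]
    # Replay the earlier chat so the model sees the conversation so far.
    for speaker, text in chat_history:
        lines.append(f"{speaker}: {text}")
    lines.append(f"Player: {player_line}")
    # End with the character's name so the model continues as that character.
    lines.append(f"{character}:")
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_npc_prompt(
        character="Mira",  # hypothetical villager name
        setting="A village on a giant rock in the middle of the ocean.",
        chat_history=[("Player", "Hello!"), ("Mira", "Welcome to our rock.")],
        player_line="How do you decorate?",
    )
    print(prompt)
```

Because the setting and chat history travel with every request, the same function can be reused to compare how different models handle identical context.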
Illustrating the Above with a Chicken Eating a Burrito
We work on quite a few AI projects in our day-to-day here at Brash, and members of our team will leverage AI tools as jumping off points to help with brainstorming. We typically find ourselves sharing the "garbage out" amongst the team for a good laugh — including below — and using AI image generation is a perfect way to visualize the points we just made.
Using the free version of the AI image generation tool Leonardo.Ai, we entered the obscure prompt "chicken eating a burrito in a Christmas scene" with the "risk factor" set at the recommended standard and wound up with the following:
Fairly normal output, seems pretty reasonable that a chicken would eat a burrito with either of these approaches. After adjusting the prompt to be more specific, "a chicken eating a burrito with its wings in a Christmas scene" and adjusting the risk factor to rely more heavily on my prompt for output, things got weird.
Garbage in equals garbage out. I asked the tool to rely heavily on my prompt, which meant it interpreted the prompt much more literally and created an image in which the chicken's wings appear to be feasting on the burrito. The fault doesn't lie with the tool or the fact that it's a free version. The fault lies with the user's (that is, my) garbage prompt and with adjusting the output to rely more heavily on that garbage.
The Caveats of AI Output AKA Expect Failure
As you’re in the throes of testing, expect failure. It’s going to get weird. The more you test, the more you should expect weird responses, or chicken images, that really do not make sense.
Seriously, just ask Bing.
In the context of ChatGPT, “weird” usually takes the form of the model contradicting itself or outputting a completely wrong answer, but with flawless grammar. A “wrong answer” may occur in only 1 out of 100 or 1,000 interactions, but expect it to happen. While developing an AI tool for a client, we asked the bot “What services does X company provide?” and received the response “they provide services.” It’s not inaccurate, but it’s also not useful.
OpenAI does have a guide on how to phrase prompts and how to get the most out of your data sets, but that will never eliminate the chance of weird output. The lesson is to use GPT where a bad response won’t have significant impact or can be quickly corrected, like decorating ideas in Minecraft. You can always ignore what the model suggests and just write what you think is best. For a customer support bot, however, the stakes are higher, and you don’t want to provide a poor customer experience with nonsensical responses. You may want to keep an option for a human to take over in cases of nonsense.
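A human handoff can start with a very simple check on the reply before it reaches the customer. The sketch below uses one crude heuristic of our own choosing: if the reply adds almost no words beyond those already in the question, as with “they provide services”, it is treated as unhelpful and routed to a person. The function names, stopword list, threshold, and handoff message are all hypothetical, and a production bot would need stronger checks.

```python
import re
from typing import Set

# Small, illustrative stopword list; extend it for real use.
STOPWORDS: Set[str] = {"a", "an", "and", "the", "they", "it", "we", "is", "are", "of", "to"}


def _words(text: str) -> Set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def needs_human(question: str, reply: str, min_new_words: int = 3) -> bool:
    """Flag replies that add fewer than `min_new_words` meaningful new words."""
    new_words = _words(reply) - _words(question) - STOPWORDS
    return len(new_words) < min_new_words


def answer_or_escalate(
    question: str,
    reply: str,
    handoff_message: str = "Let me connect you with a member of our team.",
) -> str:
    """Return the model's reply, or a handoff message if the reply looks unhelpful."""
    return handoff_message if needs_human(question, reply) else reply


if __name__ == "__main__":
    question = "What services does X company provide?"
    print(answer_or_escalate(question, "they provide services."))
    print(answer_or_escalate(question, "They provide web development and cloud hosting."))
```

The threshold is deliberately easy to adjust: raising `min_new_words` sends more conversations to a human, which may be the right trade-off when the stakes are high.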
Lastly, Keep This in Mind for Code Completion
We are generally familiar with GPT generating output for people, but you can also use GPT to generate output for other parts of your code. You can ask it to generate lists and format them as JSON, for example, and then easily parse that output for use in a different part of your program. In our RPG example from earlier, our GPT model was creating a quest for the player. The program then needed to register which item the player had to retrieve and the quest’s reward, both as generated by GPT.
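As a rough sketch of that hand-off, the code below parses a JSON quest description and rejects anything malformed so the game can retry or fall back. The field names (`item`, `quantity`, `reward`, `narrative`), the `Quest` class, and the sample reward are assumptions made for illustration; they are not the schema from our project.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quest:
    item: str
    quantity: int
    reward: str
    narrative: str


def parse_quest(model_output: str) -> Optional[Quest]:
    """Return a Quest if the output is valid JSON with the expected fields, else None."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    try:
        return Quest(
            item=str(data["item"]),
            quantity=int(data["quantity"]),
            reward=str(data["reward"]),
            narrative=str(data["narrative"]),
        )
    except (KeyError, TypeError, ValueError):
        # Missing field or a quantity that is not a number: treat as unusable.
        return None


if __name__ == "__main__":
    # Sample output in the assumed schema; the reward is made up for this example.
    sample = (
        '{"item": "wooden log", "quantity": 30, "reward": "emerald", '
        '"narrative": "A storm from the ocean toppled our house."}'
    )
    print(parse_quest(sample))
    print(parse_quest("Sure! Here is your quest..."))  # not JSON, so None
```

Returning `None` instead of raising keeps the calling code simple: the game can ask the model again or use a pre-written quest when parsing fails.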
In conclusion, GPT has shown remarkable abilities and potential for automation. However, the most intriguing applications are those where GPT serves as an assistant rather than a complete replacement, allowing humans to contribute their unique skills and creativity, including helping to write the conclusion of this article. Generating functional code with ChatGPT is an excellent example of this approach. With GPT-4’s recent release we have seen another jump in performance, and the level of performance and potential for automation will only increase with successive versions.
If you have any questions about ChatGPT or need some guidance getting your project off the ground, get in touch with Brash. A great partner goes a long way in helping you navigate the development landscape, and you can reach out to our team at letsgo@brashinc.com at any point in your product development journey.
It’s not just a product, it’s our passion.
Be Brash.