OpenAI DevDay's Implications for LLM Apps in Prod

OpenAI DevDay's Implications for LLM Apps in Prod

We hosted a watch party for OpenAI's DevDay on our Discord channel and had a lot of fun discussing everything new and improved that was launched.

If you're just catching up, read about all the updates here on the OpenAI website.

Since we're all about LLM Apps in Production, let's dive into the changes that affect production apps. And no, we're not talking about all the companies that OpenAI did or did not obliterate.

So let's talk about the updates that will matter to you in production.

Here's the tl;dr
β–ͺ️ GPT-4 Turbo is the new cheap, fast, large context window model.
β–ͺ Function calling has been improved and models talk JSON on demand.
β–ͺ Generate reproducible outputs with the 'seed' param.
β–ͺ Fine-tuning is cheaper.
β–ͺ Assistants bring chat memory, RAG, and code interpreter to the API.
β–ͺ Lots of assurance around security, privacy & legal guarantees.

1. Cheaper, faster and better models

The new GPT-4 Turboβ€”is a speed demon (think GPT-3.5-turbo fast), has a knowledge cutoff of April 2023, supports 128k context length, AND is ~2.75x cheaper. We see it as the new winner for all those complex use cases.

The updated GPT-3.5 Turbo now features a 16k context window by default and has been improved on instruction following. This is also ~2.75x cheaper!

GPT-4 Turbo is severely rate limited at the moment - you can only make 20 requests per minute, and 100 request per day right now.

We expect this to change in a few weeks when OpenAI does a stable release for the model.

2. Improved function-calling & JSON mode

Accuracy of function-calling has been significantly improved for both GPT-4 Turbo and GPT-3.5 Turbo.

Parallel function calls may just be the unsung hero here that will invite so many newer use cases that just weren't possible easily before. You can call the same function multiple times or different functions depending on the user input.

eg: "Tell me the weather and book me a cab to the airport" would generate 2 function calls that we can execute in parallel.

Instruction following has been improved on the newer models according to OpenAI, so expect lesser prompt-tuning to make things work.

JSON Mode has to be one the best improvements on the models. You can now send a response_format parameter in your completion calls as json_object and the model will guarantee to generate a valid JSON object. It could still be inaccurate, but we're taking this as a big win.

3. Generate reproducible outputs

LLMs are non-deterministic by nature and even when you keep every variable intact, there is still a chance that the output will not be similar across different requests.

OpenAI has introduced the new seed param for Chat Completion requests, where you can set similar seed across different requests, and it can help ensure that the outputs are consistent!

Top notch feature that impacts app developers from day 1! Read more about it here.

4. Cheaper fine-tunes

If we've been waiting for a sign to just double down on fine-tuning, this is it.

GPT-3.5 fine-tuning is now 4X and 2.7X cheaper on the input and output tokens. It also supports both the 4k and 16k context windows which is just fantastic.

GPT-4 fine-tuning is in early preview to a select few organisations, but we expect this to roll out to more folks pretty soon. Don't expect this to be cheap though!

We're proud to announce the beta of autonomous continual fine-tuning on Portkey.

If you're interested in fine-tuning for your application, drop us a note on [email protected]!

5. Build complex apps fast with assistants APIs

Assistants has been the huge drop of the event. It's a complete new way to experience LLMs with assistants, threads, messages, tools & runs.

Here's what matters for your app.

  • OpenAI can now store chat histories in threads, removing the need to store & retreieve it continuously.
  • Assistants can retrieve information at a 98% accuracy from files. Think - highly efficient, zero-effort RAG applications.
  • Assistants also support the code interpreter, so building data analysis agents becomes super easy.

The playground is super slick, and people are excitedly exploring assistants!

6. The enterprise-grade guarantees

Copyright Shield promises to stand guard against copyright infringement lawsuits directed towards users & developers. That's a bold claim to make and will certainly give the Fortune 5000 CISOs a lot of relief.

A detailed data usage policy has been published which again reinforces the fact that NONE of the API data will be trained on by OpenAI unless explicitly opted-in.

There are also detailed guides on safety and production bext practices.

While we've been fanboying over all the new shiny stuff that's come out, there's also some stuff we didn't really like.

  • The pricing structure, is a bit of a head-scratcher. There are a lot of components to it, especially with assistants which makes it hard to estimate how much it's going to cost in production.

    We're working hard to make this available quickly on our dashboards so individual calls can be accessed easily.
  • Polling for run statuses is just tacky. Since assistants will be used mostly on chat interfaces, it's going to be hard to keep polling to check if new messages were added.

    Top that with no streaming support, and our apps will feel sluggish. Hoping this is something that'll get sorted in the beta.
  • Backward compatibility seems to have taken a back seat. Most APIs for the new stuff are completely built from the ground up which means completely new pipelines for our apps.

    Someone on X mentioned that "all our code is now tech debt". We share this sentiment, unfortunately.

So, that's a wrap. While there's a lot more stuff we urge you to check out on the OpenAI updates website, this was what we felt affected our users the most.

We're constantly updating Portkey to support the latest & greatest across LLMs and frameworks. Do drop by our Discord and say hi!