Yesterday, during the opening keynote of OpenAI DevDay, OpenAI dropped their biggest announcements yet, annexing new territory into their ever-growing ecosystem. But as the field accelerates at breakneck speed, it has become all too common to feel disoriented and question our own place as tech specialists.
Personally, I have only one way to deal with such feelings. I take a step back, study the road taken, pinpoint the current context, and chart the most likely trajectory. So that’s what I’ll be doing in this post.
What’s up, I’m ChatGPT
In November 2022, OpenAI released ChatGPT, its first B2C product, based on GPT-3.5. By fine-tuning GPT-3 to work in a chat context and releasing it to the public in a user-friendly platform, they finally sparked interest outside of academia.
Although it wasn't the first chat model, it was certainly one of the largest ever trained. But the real game changer was about to come with the rollout of GPT-4 in March 2023. By significantly improving ChatGPT's results, this update sent ripples through the tech world. Within just a couple of months, the spotlight finally shifted away from the toxic, saturated crypto narrative to the potential of AI in everyday use.[1]
Meanwhile, as users delved into ChatGPT's capabilities, the initial excitement quickly faded, revealing its limitations. Finite memory, outdated data, lack of context, hallucinations, the cumbersome need for manual input, and the generic default "persona" all made it highly impractical for complex scenarios.
Because, of course, many eyed AI as a workforce substitute. But since most work involves executing multiple tasks, thought processes, and planning, expecting a single chat LLM to do it all through Q&A alone wasn't going to cut it.
Dealing with agents
As so often in software engineering, this isn't a recent problem, and the solution isn't new either. Marvin Minsky wrote about this exact topic in "The Society of Mind," first published in 1986. He had already tackled part of this issue, pointing to what we now commonly refer to as "agents." Once again, the answer didn't change; it was only rebranded.
And sure enough, in October 2022, a small library that would soon become its own ecosystem was released under the name LangChain, with a heavy focus on creating pipelines for agents. With an entry point managing the conversation and able to call on agents for their expertise, the results became far more satisfying.
Using LangChain basically gives LLMs their own phonebook: "Want to search Google? Ask this guy." "Need a copywriter for your article? Give it to this one."
To build an LLM agent, you need two things (sketched below):
- A system prompt to tailor the response format to the requirements, which gave rise to the wonderful world of "prompt engineering" ~~hustlers~~ experts, born of the very unreliable consistency of the responses
- Optionally, tools to provide up-to-date and context-aware answers by querying data, or asking another agent
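Here's a minimal sketch of those two ingredients, hand-rolled on top of the openai Python client (v1+) rather than LangChain. The `search_web` helper is a hypothetical stand-in for a real tool, and the prompts are illustrative:

```python
# A minimal LLM agent: a system prompt plus an optional tool.
# `search_web` is a hypothetical placeholder; plug in any search API.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a research assistant. Answer concisely. "
    "If you lack up-to-date information, call the search_web tool."
)

TOOLS = [{
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the web and return the top results as text.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def search_web(query: str) -> str:
    # Hypothetical tool implementation.
    return f"(top results for {query!r})"

def ask(question: str) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
    response = client.chat.completions.create(
        model="gpt-4-1106-preview", messages=messages, tools=TOOLS
    )
    message = response.choices[0].message
    if message.tool_calls:  # the model consulted its "phonebook"
        call = message.tool_calls[0]
        result = search_web(**json.loads(call.function.arguments))
        messages += [
            message,
            {"role": "tool", "tool_call_id": call.id, "content": result},
        ]
        message = client.chat.completions.create(
            model="gpt-4-1106-preview", messages=messages
        ).choices[0].message
    return message.content
```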
The problem was that, even when chained, agents were still basic executants.
Artificial General Intelligence
After all, most solution-oriented thinking doesn't typically emerge when you're knee-deep in execution. This concept is best explored in Daniel Kahneman's Thinking, Fast and Slow (2011), which introduces us to the dynamic between System 1 and System 2 thinking. And agents were still operating in the realm of System 1. Fast. Intuitive.[2]
So the response came swiftly under the guise of the badly named "AGI",[3] with BabyAGI and AutoGPT being the most renowned projects at the time. By leveraging chain-of-thought agents that plan, split, and dispatch complex tasks into smaller units, they effectively mimic System 2's logical processing. These planner agents not only delegate tasks but also evaluate the outcomes of other executing or planning agents, essentially simulating a team of experts.
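To make that concrete, here is a deliberately naive planner/executor loop in the spirit of those projects (not their actual code), reusing the hypothetical `ask` helper from the earlier sketch:

```python
# A naive planner/executor loop in the spirit of BabyAGI/AutoGPT.
# Prompts and parsing are illustrative; real frameworks are far more robust.

def plan(objective: str) -> list[str]:
    # System-2 step: break an abstract objective into concrete tasks.
    raw = ask(f"Split this objective into a short numbered task list:\n{objective}")
    # Fragile parsing of "1. ..." lines, good enough for a sketch.
    return [line.split(".", 1)[1].strip() for line in raw.splitlines() if "." in line]

def execute(task: str) -> str:
    # System-1 step: a plain executant agent handles a single task.
    return ask(f"Complete this task and report the result:\n{task}")

def run(objective: str) -> str:
    results = []
    for task in plan(objective):
        outcome = execute(task)
        # The planner also evaluates outcomes, simulating a team lead.
        verdict = ask(f"Task: {task}\nResult: {outcome}\nSatisfactory? Answer yes or no.")
        if verdict.strip().lower().startswith("no"):
            outcome = execute(
                f"Retry this task, improving on the previous attempt.\n"
                f"Task: {task}\nPrevious: {outcome}"
            )
        results.append(outcome)
    return ask("Combine these results into one answer:\n" + "\n".join(results))
```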
Such systems are still gaining traction, as attested by the release of frameworks such as Microsoft AutoGen, the $12 million raised by AutoGPT, and the countless projects aimed at emulating teams.
So while these "autonomous" systems currently yield mixed results, they operate on increasingly abstract objectives, foreshadowing near-future capabilities.
Chat ecosystem 101
With this reminder of recent history and the current context, we can extrapolate and map out the playbook for building a complete LLM ecosystem, a task that all of the previously mentioned projects have been building toward.
The LLM ecosystem playbook
- An entry point for every interaction
- Leverage a list of executive agents
- Leverage a list of persona agents
- Provide instant context intake from sound, images, text, and files
- Access to personal data such as emails, chats, and documents
- Auto-select the appropriate agent (see the routing sketch after this list)
- Create agents on the fly
- Agents with code execution
- Instant personalities for formulating responses based on the question
- Create chain-of-thought agents on the fly to deal with multi-step actions
- Fine-tune user-specific models
- Allow payments for third-party services through the platform
- Change the default interface[4]
- Automate the creation of recurring tasks
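Most of these items compose naturally. As a taste, here is a minimal version of the "entry point + auto-selected agent" combo, again reusing the hypothetical `ask` helper from the earlier sketch; the personas are placeholders:

```python
# A minimal routing sketch: one entry point that auto-selects a persona
# agent. Personas and prompts are hypothetical.

AGENTS = {
    "copywriter": "You are a copywriter. Polish and rewrite text.",
    "analyst": "You are a data analyst. Reason step by step over numbers.",
    "researcher": "You are a researcher. Gather and summarize facts.",
}

def route(user_message: str) -> str:
    # The entry point picks the persona best suited to the request.
    choice = ask(
        "Pick exactly one agent for this request; answer with its name only.\n"
        f"Agents: {', '.join(AGENTS)}\nRequest: {user_message}"
    ).strip().lower()
    persona = AGENTS.get(choice, "You are a helpful generalist assistant.")
    # Hand the conversation off to the selected persona. For brevity we
    # inline the persona text instead of swapping the system prompt.
    return ask(f"{persona}\n\nUser request: {user_message}")
```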
What’s the moat?
On that matter, OpenAI's chat.openai.com is certainly the most advanced product of all. An incredible feat, considering it operates as a SaaS, enterprise, and consumer product at once. With their latest announcements, they are blasting through the few items remaining on my list:
- They have a site + app as entry points
- They are releasing the V2 of their "app store" with GPTs
- GPTs introduce heavy customization
- They are introducing revenue sharing
- ChatGPT can hear audio, see images, and read documents
- Output isn't limited to text; it can now generate images, audio, and other files
- ChatGPT can create on-the-fly executing agents with "Advanced Data Analysis"
- It can fetch up-to-date information with online search
- Agents are now auto-selected
- They demoed a GPT working with Zapier, meaning countless integrations with your own data
- They still have the most performant model on the market
- They are working on improving speed and introducing more chain-of-thought reasoning
- All of that is becoming available as a SaaS product through their API (sketched below)
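To give an idea of what consuming this looks like, here is a sketch assuming the Assistants API beta announced at DevDay (openai Python client v1.2+); the names and prompts are placeholders:

```python
# Sketch: an "executing agent" with built-in code execution, via the
# Assistants API beta. Naive polling is fine for a demo.
import time
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    name="Data helper",
    instructions="Analyze what the user sends and answer with numbers.",
    tools=[{"type": "code_interpreter"}],  # on-the-fly code execution
    model="gpt-4-1106-preview",
)

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="What is the mean of 3, 5, and 10?"
)

run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
while run.status not in ("completed", "failed"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

for message in client.beta.threads.messages.list(thread_id=thread.id):
    print(message.role, message.content[0].text.value)
```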
Simply put, they are spearheading the paradigm shift from the traditional mobile/web application to a centralized experience within ChatGPT. And, of course, they are crowdsourcing every advancement through it, from training data to the creation of personalized assistants. We are willingly giving them the tools to remain in this leadership position.
Moreover, by abstracting context fetching, reasoning, and agent selection, they are rendering a lot of startups working as simple wrappers around GPT-4 obsolete. Also, by streamlining the integration of these tools through their API, they are eliminating complexities previously managed by the now-redundant open-source projects.
But, to me, the defining shift compared to the smartphone revolution is that anyone can create GPTs. Developer skills are a bonus, not a requirement.
Sharing the crumbs
So you may be like me, trying to position yourself in the current context and work with or around these “AI” offerings. If so, you’re probably wondering “What’s left for me?”
If you can't beat the ecosystem, you might as well join it. And in this regard, there are a few ways you can still come out ahead.
- Join the gold rush to create the most popular GPTs
- Leverage any data you already have to make them
- Create micro-APIs to fill in the gaps not covered by the Advanced Data Analysis tool or existing agents (see the sketch after this list)
- Become an expert in creating new agents/GPTs as a service
- Learn to integrate them into products that aren't chats, and streamline existing user flows
- Leverage it to build faster yourself
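For instance, here's what such a micro-API could look like with FastAPI; the endpoint and data are hypothetical placeholders:

```python
# A minimal "micro-API" a custom GPT could call. The SKU lookup is a
# hypothetical placeholder for your own private data source.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Inventory micro-API")

class StockAnswer(BaseModel):
    sku: str
    in_stock: int

@app.get("/stock/{sku}", response_model=StockAnswer)
def stock(sku: str) -> StockAnswer:
    # In reality this would hit your own database, the one data source
    # a generic GPT cannot see on its own.
    return StockAnswer(sku=sku, in_stock=42)

# Run with: uvicorn app:app --reload
# FastAPI serves an OpenAPI schema at /openapi.json, which is the format
# GPT actions/plugins consume to learn how to call your endpoint.
```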
The custom wrapper you've built, or are building, around OpenAI's GPT API won't cut it anymore. The barrier to entry is being blasted to pieces by OpenAI, and you might as well join the assault.
Where integration once reigned, quickly seizing opportunities is the new key to success in a landscape where anyone can create new user experiences.
Also, I’d love to get your opinion on the matter in the comments 😉
[1] Unfortunately, the web3 hustlers, seeing the wind shift, quickly flocked to the "AI revolution," dooming our social media timelines for a couple of months.
[2] Funnily enough, prompting an LLM to respond like Daniel Kahneman and lay out step-by-step reasoning before the final answer improved the response quality and boosted my learning through exposure to the reasoning. I'm currently using a simplified version of this prompt with GPT-4. It feels more like collaborating with an expert than managing a task-doer.
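For the curious, here's a paraphrased sketch of that kind of prompt (not my exact wording), through the openai Python client (v1+):

```python
# Illustrative System-2-style prompt; the wording is a placeholder.
from openai import OpenAI

client = OpenAI()

SYSTEM_2_PROMPT = (
    "Answer as a careful expert in the spirit of Daniel Kahneman. "
    "Before the final answer, lay out your step-by-step reasoning and "
    "the assumptions you are making, then give the answer last."
)

reply = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": SYSTEM_2_PROMPT},
        {"role": "user", "content": "Should I rewrite my app around GPTs?"},
    ],
)
print(reply.choices[0].message.content)
```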
[3] AGI stands for Artificial General Intelligence. Unlike narrow AI, which is designed for specific tasks, AGI has the cognitive abilities to understand, learn, and apply knowledge across different domains, reason through problems, and adapt to new situations. Essentially, the full range of human intelligence, which none of these projects exhibited.
[4] Is a screen truly the best way to chat?