Overview
Recent developments in AI make many things easier and faster, and even make some previously infeasible things trivially possible. The current thrust is to use AI models like ChatGPT to solve problems for individuals, businesses, and other entities by leveraging these new capabilities. The market today is dominated by OpenAI and its ChatGPT ecosystem, but several other developments offer more options that push the limits and capabilities further. This article discusses the immediate view, then steps back to look at the medium- and long-term horizons.
Success in AI today is about:
Using conversational generative chat apps to become more productive
Building custom AIs and using them to automate repetitive tasks
Building apps that use AI to make their users more productive
Today, the best ecosystem to do all this is OpenAI's ecosystem:
Use ChatGPT Plus with GPT-4 Turbo
Use GPT Builder to build custom GPTs
Use the OpenAI APIs (GPT, DALL·E 3, Whisper, TTS, Assistants API)
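As a concrete illustration, here is a minimal sketch of calling the OpenAI API from Python. The model name and prompt are illustrative choices, not prescriptions; this assumes the `openai` v1.x package and an OPENAI_API_KEY in the environment.

```python
# Minimal sketch: one chat completion call via the OpenAI Python package (v1.x).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-turbo-preview",  # the GPT-4 Turbo model name at the time of writing
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the benefits of custom GPTs in 3 bullets."},
    ],
)
print(response.choices[0].message.content)
```

The same client object also exposes the other APIs mentioned above (images, audio, assistants), so one setup covers the whole ecosystem.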
Beyond OpenAI and ChatGPT
OpenAI's tools and APIs are unlikely to be all you need, even today. Ask ChatGPT Plus to find other tools for your needs: Runway ML, Pika, Stable Diffusion, ElevenLabs, Midjourney, Zapier, and others will often emerge as worthy candidates depending on your use case.
OpenAI models still have issues: they can be flaky, their performance varies over time, and some are expensive. So there is a huge push for open models to step in and fill the gap. OpenAI's GPT-4 Turbo remains the best model for most tasks it supports, but new alternatives are emerging that are better in different ways:
Smaller models that can be deployed on a Mac with Apple Silicon or a Windows machine with a reasonably powerful NVIDIA GPU. Most of these are still at best around gpt-3.5-turbo in performance, but the gap is closing fast; Mistral's models, especially Mixtral, and the many small (7B, 13B, 30B) variants of Llama 2 are examples. Concepts and tricks used to achieve effective smaller models include distillation, student-teacher training, quantization, LoRA, llama.cpp, llama2.c, GGUF (which supersedes GGML), Mixture of Experts, PPO, and DPO (see the sketch after this list).
Models customized for specific tasks. So-called small language models, like Microsoft's phi-2, are very small but trained for a specific narrow scope, as opposed to general-purpose large language models like GPT-4. There are also ways to fine-tune large language models (both hosted and local) for better performance on specific domains and tasks.
Smaller models that handle longer context lengths well using various tricks.
Natively multimodal, multitask, multilingual models. There are ways to build all of these on top of OpenAI models, but more and more new models build these capabilities directly into their architectures and training datasets.
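To make the local option concrete, here is a minimal sketch of running a small quantized model with llama-cpp-python, a Python binding for llama.cpp. The model file path is a placeholder for whichever GGUF weights you download (a quantized Mistral 7B variant, for instance).

```python
# Minimal sketch: local inference on a quantized GGUF model via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./mistral-7b-instruct.Q4_K_M.gguf",  # placeholder path to downloaded weights
    n_ctx=4096,       # context window size
    n_gpu_layers=-1,  # offload all layers to GPU/Metal if available
)

out = llm("Q: What does 4-bit quantization trade away?\nA:", max_tokens=128)
print(out["choices"][0]["text"])
```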
Practical considerations
It is currently expensive to train large models that match the general capabilities of GPT-4 Turbo. Smaller models are easy to train or fine-tune and open to experimentation, but they do not perform or generalize as well as GPT-4 across a variety of tasks. This space is evolving rapidly: alternatives are catching up in performance, and large models are getting more robust and more consistent and predictable in their behavior. We should keep watch for new models that give us the best of all worlds.
However, independent of the models themselves, the cost and complexity of hosting your own models in production is generally much higher than just calling an API. So far, that has been a huge reason for the success of OpenAI's models (along with their outperforming most others), despite their shortcomings. Several frameworks and tools, notably Hugging Face's, are trying to make self-hosting as easy as calling a hosted model, but the cost-benefit analysis still tips in favor of hosted models for most businesses.
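As an illustration of that "as easy as an API call" goal, here is a rough sketch using Hugging Face's transformers pipeline. The model name is just one plausible choice, and a 7B model still needs a capable GPU (or patience on CPU), which is exactly the cost-benefit trade-off described above.

```python
# Minimal sketch: run an open model locally with one pipeline call.
# The first run downloads the weights from the Hugging Face Hub.
from transformers import pipeline

generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")
result = generator("Explain in one sentence what GGUF is.", max_new_tokens=60)
print(result[0]["generated_text"])
```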
How to pick a model
If you want to find a custom model for an AI task, there is a fairly simple first-cut process: find the closest corresponding task on paperswithcode.com/sota or Hugging Face; find the top models by benchmark results (Papers with Code) or by popularity (Hugging Face); follow the links each model provides to evaluate and use it (GitHub repo, downloadable weights, sample code); then pick the best model that matches your needs.
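If you prefer to do that first cut programmatically, here is a hedged sketch using the huggingface_hub client. The task name is illustrative, and exact parameter names vary slightly across library versions.

```python
# Minimal sketch: list the most-downloaded models for a task on the Hub.
from huggingface_hub import list_models

# Parameter names may differ across huggingface_hub versions; this reflects recent ones.
for m in list_models(task="summarization", sort="downloads", direction=-1, limit=5):
    print(m.id)  # each model page links to weights, code, and usage examples
```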
Chat does not necessarily mean a chat end user interface
ChatGPT lets us converse in natural language to ask for what we want, and get responses in natural language or in structured formats (plots, code, JSON, SQL, and so on). We can use this directly in ChatGPT. It also lets us build apps that leverage this conversational flexibility instead of having to specify everything precisely in code. For example, we can ask the model to extract all the relevant details of a candidate from a resume, and it figures out what that means. This is a subtle but perhaps the most powerful property of these chat models. When we build applications, it does not follow that we must expose a chat interface to our users: the conversational prompts can come from our code, from our users, or some combination of both. The true power of what we build lies in finding the right combination.
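As a sketch of what this looks like in practice, the resume example might be coded roughly like this: the user never sees a chat window, but our code talks to the model conversationally and gets structured JSON back. The field names here are illustrative assumptions, not anything prescribed.

```python
# Minimal sketch: conversational prompting from code, no chat UI involved.
import json
from openai import OpenAI

client = OpenAI()

def extract_candidate(resume_text: str) -> dict:
    """Ask the model, in plain language, for structured candidate details."""
    response = client.chat.completions.create(
        model="gpt-4-turbo-preview",
        response_format={"type": "json_object"},  # request strict JSON output
        messages=[
            {"role": "system",
             "content": "Extract candidate details from the resume as JSON with "
                        "keys: name, email, skills (a list), years_experience."},
            {"role": "user", "content": resume_text},
        ],
    )
    return json.loads(response.choices[0].message.content)
```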
Beyond Applied Generative AI
Currently, most of the industry and the investment community is interested in finding ways to apply AI to solve problems. For people coming into AI anew or after a long gap, that is where to focus, and the areas above are the ones that matter for succeeding in that space.
Beyond that, there is much more fascinating research and discussion around AI: academic research with applications in the far future, work at the few big tech firms building and training these large language models, and startups trying to find optimizations that provide better alternatives. All of these are worthy pursuits, and for someone entering the field in the near future, there are several exciting opportunities there. However, most funding and business is likely to focus on the applied aspects above.
There are many areas of research beyond just applied AI, and they warrant several other articles to expand on them.
Artificial General Intelligence
One area of research and debate that comes up constantly is around the notions of AGI (Artificial General Intelligence) and superintelligence. My thoughts on that:
Most debates fail to define what AGI means, so each person takes their own interpretation (often serving some underlying agenda) and runs with it.
There are a few broad viewpoints around this debate:
OpenAI has defined AGI as the point where an AI can perform at the level of a median human on every task. Some see this as a threat: to them, it means most jobs can be replaced by AI. Others see it as an opportunity: if AI can do most tasks humans do today, then humans can build on that to take on more complex tasks than they could have imagined before.
OpenAI subscribes to the idea of building larger and larger models because of something they found called predictable scaling, which allows models to achieve each unsolved task purely by scaling (the model, data, and compute) without really changing the architecture. In theory, every task has a threshold on some metric at which it is considered solved, and predictable scaling lets OpenAI predict at exactly what scale of model that task will be solved. Solving a problem then reduces to getting and deploying the resources and funds needed to scale the model, data, and compute to that required size. Other players dislike this approach for various reasons. For one, many of them were directly challenged or disrupted by ChatGPT's appearance and cannot really regain lost ground by subscribing to this model. The other arguments appear in the remaining viewpoints below.
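For concreteness, the published scaling-law literature that underpins this idea (e.g. Kaplan et al., 2020) fits power laws of roughly the following form. The symbols come from that literature, not from anything specific to OpenAI's internal methods.

```latex
% Test loss L falls off as a power law in parameter count N, dataset size D,
% and training compute C, with fitted constants (N_c, D_c, C_c) and exponents:
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
```

Fitting the constants and exponents on small training runs lets one extrapolate loss at much larger scale, which is the sense in which scaling is "predictable".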
The success of ChatGPT lies in emergence, where the model started doing well on tasks it was not trained for, and no one fully understands why or how. The predictable-scaling approach above relies on emergence continuing predictably at higher scales, which is somewhat shaky. Many fear that as models get larger, they may increasingly do things that are not aligned with the best interests of humankind. There are several well-understood ways to improve models that have been in the works for years; these can be taken forward, even if they take much longer, in a safe and predictable way to achieve true generalization.
Reinforcement Learning is a form of AI that learns by trial and error.
It can do this in several ways:
By trial and error guided by the behavior of human experts.
By trial and error without human examples, so-called self-play.
By trial and error in a world where little information is available beyond taking an action from some state and observing its effect on the immediate or eventual outcome.
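To ground the idea, here is a minimal, self-contained sketch of trial-and-error learning (tabular Q-learning) on a toy corridor problem. Everything in it is illustrative; it is not taken from any of the systems discussed below.

```python
# Minimal sketch: tabular Q-learning on a 1D corridor of 5 cells.
# The agent learns, purely by trial and error, to walk right to the reward.
import random

N_STATES = 5                          # states 0..4, reward at the right end
ACTIONS = [0, 1]                      # 0 = move left, 1 = move right
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, eps = 0.5, 0.9, 0.1     # learning rate, discount, exploration rate

def step(state, action):
    """Toy environment: moving right from the last cell earns reward 1 and ends."""
    if action == 1:
        if state == N_STATES - 1:
            return state, 1.0, True
        return state + 1, 0.0, False
    return max(state - 1, 0), 0.0, False

for _ in range(200):                  # episodes of trial and error
    s, done = 0, False
    while not done:
        # Explore occasionally; otherwise act greedily on current estimates.
        a = random.choice(ACTIONS) if random.random() < eps else max(ACTIONS, key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Learn from the observed outcome: nudge Q[s][a] toward the reward
        # plus the discounted value of the best action in the next state.
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2

print(Q)  # after training, action 1 (right) has the higher value in every state
```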
These approaches have led to several notable successes:
Google DeepMind's AlphaGo beat the world champion at the game of Go, trained by learning from human Go experts.
Google DeepMind's AlphaGo Zero then learned from only the rules of Go, with no human experts involved, playing against itself to find the best strategies. This reached an even higher level of play, far surpassing the best humans. When it played against humans, its moves were often not understood by human experts, but it would eventually win.
They then built AlphaZero, which learned through self-play across many games, including chess, using just the rules, and exhibited similarly superior performance using techniques unfamiliar to human experts.
Similar techniques were used to predict protein structures (AlphaFold), which revolutionized drug discovery and medicine, and to beat the best human-devised algorithms for efficient matrix multiplication (AlphaTensor).
DeepMind recently announced FunSearch, a technique that pairs a large language model with a systematic evaluator that guides it to tackle unsolved problems. It beat the previous best known solution to a combinatorics problem called the cap set problem, and it also showed promise on algorithmic problems. It also makes large language model behavior more explainable, which has so far been a downside of these models (we don't know why a model gave the response it did).
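Here is a heavily simplified sketch of the FunSearch loop as described in DeepMind's announcement: a generator proposes candidate programs, an evaluator scores them, and high scorers seed the next round. In this toy version a random mutator stands in for the LLM and "programs" are just bit strings scored by a toy evaluator, so the skeleton runs end to end; a real system would call a code-generating model and execute real programs.

```python
# Toy sketch of a propose-evaluate-select loop in the style of FunSearch.
import random

def evaluate(candidate: str) -> int:
    """Toy evaluator: score = number of '1' bits. A real evaluator would run
    the candidate program on the problem and measure solution quality."""
    return candidate.count("1")

def propose(parent: str) -> str:
    """Stand-in for the LLM: mutate one position of a high-scoring parent.
    A real system would prompt a code model with the best programs so far."""
    i = random.randrange(len(parent))
    return parent[:i] + random.choice("01") + parent[i + 1:]

pool = ["0" * 16]                                         # seed "program"
for _ in range(500):
    parent = max(pool, key=evaluate)                      # best candidate so far
    pool.append(propose(parent))                          # generate a new candidate
    pool = sorted(pool, key=evaluate, reverse=True)[:10]  # keep the top scorers

best = max(pool, key=evaluate)
print(best, evaluate(best))
```

The key property this loop shares with the real system is that the evaluator, not the language model, decides what survives, which is what keeps the search systematic.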
Another view of AGI starts from the observation that the human brain, at such a small size and low energy consumption, has the general intelligence of a human, so it should be possible to build an AI that is comparable. Yann LeCun, one of the top researchers in AI, subscribes to this view. The solution, as currently envisioned, combines a world model (a knowledge graph that encapsulates common-sense rules and can be readily retrieved when needed) with trial-and-error reinforcement learning, built on neural networks (the common core shared with large language models). The current intent is to start by aiming for "cat-level AI", an AI that uses these principles to perform the tasks a cat can do, and then progressively refine it toward human-level AI.
Conclusion
The near future is mostly about applying the newfound capabilities of hosted large language models to the problems they can solve. Coming soon are smaller models that can do the same things with far fewer resources, along with infrastructure that makes hosting and running them easy enough to compete with hosted large models. Reinforcement learning is also making progress in various fields and is likely to be used to solve previously unsolved problems. Longer-term approaches that make models more efficient are in active research and are likely to yield completely different ways to achieve powerful AI at a fraction of the resources and cost.
What an exciting time to be alive!