State of AI End of 2024: A Brain Dump (WIP)
- Update - Jan 2, 2025
- AI coding assistants are net-zero by default, but with practice can become net positive
- new section on NVIDIA GPUs vs. Macs for running local models (see the new subsection titled Processors and the new subsection titled NVIDIA under AI Companies)
It is December 30th; 2024 is nearly over. AI will impact more lives in more ways in 2025. This post is a brain dump of my (often opinionated) take on what matters today, over the next year, and in the near future. I very loosely divide this into sections meant for everyone, for programmers, and for AI researchers - though, being a brain dump, the lines may be a bit blurred.
Warning - please read this first
This is an extremely rough draft post capturing a lot of things I have read/believe to be true about AI as of December 30, 2024.
There are bound to be typos, incomplete thoughts, etc.
Things are changing very fast on almost everything in this space - at a pace orders of magnitude too fast to keep up with. And the value is buried under a lot of hype, noise, and biased, agenda-driven content, which takes a lot of effort to filter through.
So, please check facts before using. I may update this page to correct for/expand on things as I learn more. Or add more posts to expand on some of these.
Also, none of this is to be taken as expert advice, such as financial, legal or medical advice. For advice, contact experts in the respective domains.
Please let me know in the comments if you see any issues; I will definitely read all feedback and try to fix them.
Key takeaways
For everyone
Agentic AI apps powered by AI models will automate several tasks, changing the role of humans and traditional software in many ways.
Currently, hosted AI models (too large to run locally, and usually proprietary) are the leaders, but smaller open models with permissive licenses will overtake them in 2025 or 2026.
To adapt to our changing roles, we should get familiar with using AI in everything we do: both the things we currently rely on other humans and software for, and the things we do ourselves in our various roles that could be replaced by AI.
For Programmers
Programmers should look at how AI can automate what humans and other software do today. Many such things are done, or can be done, better and/or faster and/or cheaper using AI. First check whether a good solution already exists that does this well, and if so, use it. If there is no good solution for a task, try to solve it in the following order:
directly call the best free open models locally with permissive licenses.
directly call hosted open models with permissive licenses.
directly call pipelines or workflows of models.
create a single agent with the right tools/functions for the task at hand, and call it.
create a composite agent combining multiple single agents.
Keep a list of tasks you are interested in and care about, look for existing solutions, and if none exist, retry this order periodically for each task. As models, tools, and agentic patterns get better, more and more tasks will become easier to automate. A minimal sketch of the first step is shown below.
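To make that first step concrete, here is a minimal sketch of calling a local open model directly through the ollama Python library (an assumption on my part that you have ollama installed and the model pulled; swap in any model from the tables below):

```python
# Minimal sketch: step 1 above - directly call a local open model.
# Assumes the ollama server is running and the model has been pulled,
# e.g. `ollama pull llama3.3:70b`.
import ollama

def ask_local(prompt: str, model: str = "llama3.3:70b") -> str:
    """Send a single chat turn to a locally hosted open model."""
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]

if __name__ == "__main__":
    print(ask_local("Summarize the trade-offs of running models locally."))
```

If this works well enough for the task, stop there; only escalate to hosted models, pipelines, or agents when the local model falls short.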
Programmers should be automating the mundane/boilerplate parts of their development using AI in 2025. Use AI coding assistants.
AI coding assistants are net-zero in efficiency for automating coding tasks by default. They can automate 99% of your code, but you end up spending 99% of your time on the 1% where they fail - and the problem is that no one knows in advance which 99% AI can automate and which 1% will fail. But we can become net positive with practice, by getting better at knowing what we can hand off to AI and what we should do ourselves.
Hosted models
Proprietary text/chat models still remain better at most tasks, but local open models are almost there. The best models are listed first, with alternates (if any) after.
chat: claude-3.5-sonnet, gpt-4o
chat with vision: claude-3.5-sonnet, gpt-4o
chat with web: gpt-4o
chat with documents: notebooklm
chat with web research: gemini pro 1.5 with deep research
chat with lot of thinking: o1
o3 will be released around Feb 2025
there is also o1 pro at $200 a month, omitted here because that price is not justified except for the most advanced research/customization use cases.
Local models
Best hardware for local models: Apple Mac Studio with M2 Ultra and 192GB unified RAM (costs $7500+ USD in the US)
Best local models that fit on this hardware are shown below
Best model first, alternates later
All models are accessed via the ollama framework; Name is the model identifier as referenced by ollama.
Throughput is based on the total time taken across a custom dataset of 160 varied questions, from simple to extremely complex and unsolved problems (a sketch of how it is computed follows below).
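For reference, here is roughly how the throughput numbers can be computed from ollama's per-response metadata (a sketch; eval_count and eval_duration are the fields ollama reports for generated tokens and generation time, and `questions` stands in for my 160-question set):

```python
# Rough throughput: total generated tokens / total generation time, summed
# over the whole question set. eval_count is tokens generated and
# eval_duration is generation time in nanoseconds in ollama's responses.
import ollama

def measure_throughput(model: str, questions: list[str]) -> float:
    total_tokens, total_seconds = 0, 0.0
    for q in questions:
        r = ollama.chat(model=model, messages=[{"role": "user", "content": q}])
        total_tokens += r["eval_count"]
        total_seconds += r["eval_duration"] / 1e9  # ns -> s
    return total_tokens / total_seconds  # tokens per second

# questions = [...]  # your own benchmark questions
# print(measure_throughput("qwq:32b-preview-fp16", questions))
```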
Local Best Models
Name | Provider | RAM (GB) | Context (K) | Throughput (t/s) |
llama3.3:70b-instruct-fp16 | meta | 141 | 128 | 4 |
tulu3:70b-fp16 | ai2 | 141 | 128 | 8 |
qwq:32b-preview-fp16 | alibaba | 66 | 128 | 9 |
marco-o1:7b-fp16 | alibaba | 15 | 32 | 37 |
llama3.2-vision:90b | meta | 55 | 128 | 12 |
Local Fast Models
Name | Provider | RAM (GB) | Context (K) | Throughput (t/s) |
marco-o1:7b-fp16 | alibaba | 15 | 32 | 37 |
llama3.3:70b | meta | 43 | 128 | 12 |
tulu3:70b | ai2 | 43 | 128 | 12 |
llama3.2-vision:90b | meta | 55 | 128 | 12 |
qwq:32b-preview-fp16 | alibaba | 66 | 128 | 9 |
Vision Models
Name | Provider | RAM (GB) | Context (K) | Throughput (t/s) |
llama3.2-vision:90b-instruct-q8_0 | meta | 95 | 128 | 8 |
llama3.2-vision:90b | meta | 55 | 128 | 12 |
Thinking Models
Name | Provider | RAM (GB) | Context (K) | Throughput (t/s) |
qwq:32b-preview-fp16 | alibaba | 66 | 128 | 9 |
marco-o1:7b-fp16 | alibaba | 15 | 32 | 37 |
Image Generation Models
These are not available through ollama
Use stablediffusion-web-ui for GUI and API access
- alternate for API access: huggingface diffusers (a minimal diffusers sketch follows the table below)
Name | Provider |
flux.1[schnell] | Black Forest Labs |
sdxl-1.0 | Stability AI |
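As a sketch of the huggingface diffusers route (the model id, device, and settings below are assumptions on my part; check the model card for exact usage):

```python
# Minimal sketch: local image generation with huggingface diffusers.
# Assumes `pip install diffusers transformers accelerate torch` and enough
# memory; the model id is FLUX.1 [schnell] on the Hugging Face Hub.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe = pipe.to("mps")  # use "cuda" on NVIDIA GPUs, "mps" on Apple Silicon

image = pipe(
    "an isometric illustration of a tiny robot reading a book",
    num_inference_steps=4,  # schnell is distilled for very few steps
    guidance_scale=0.0,
).images[0]
image.save("robot.png")
```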
Coding Assistance
I recommend Cursor with claude-3.5-sonnet as the primary code editor for coding with AI assistance. Consider VS Code with GitHub Copilot or Replit AI as alternates. For more fully automating coding tasks, look at the open source OpenHands from All Hands AI.
Some tips for using coding assistance:
For most projects: Use AI to autogenerate boilerplate code, then shut it off when you are thinking about design.
Check in or stash changes often, so you can roll back if the AI goofed up somewhere.
For complex projects, use "AI TDD" (an example test is sketched after this list):
create tests using AI
review the tests carefully to ensure that the tests cover all acceptance criteria
once tests are good, have AI generate the code to make the tests pass. If a test fails or you get errors, just paste the error/failure log into the AI chat. Repeat until all tests pass.
If AI keeps failing to generate correct code, simplify it for AI:
Ask AI to make tests pass one at a time, avoiding regressions.
If that doesn't work, manually break down the steps/tests needed even further and walk through the tests one at a time.
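To illustrate the "AI TDD" loop, here is the shape of an acceptance test you would review before letting the AI write any implementation (the `slugify` function and its spec are hypothetical, purely for illustration):

```python
# Hypothetical acceptance tests, written and reviewed before implementation.
# The AI's only job afterwards is to make these pass without regressions.
import pytest

from slug import slugify  # module the assistant will be asked to generate

def test_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

def test_strips_punctuation():
    assert slugify("AI, in 2025!") == "ai-in-2025"

def test_rejects_blank_input():
    with pytest.raises(ValueError):
        slugify("   ")
```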
How to access all these models?
GUI to access hosted models:
Anthropic Claude : claude-3.5-sonnet
OpenAI ChatGPT: GPT-4o, o1
Google Gemini: Gemini Pro 1.5 with Deep Research
AI2 playground: Tulu 3 and olmo
Best GUI to chat with local models: LMStudio
from terminal/CLI: ollama
alternate: huggingface spaces
Best framework to access local models programmatically: ollama
- Alternate: huggingface (models via the transformers and diffusers libraries); a minimal sketch of this route appears after this list
Local models can also be accessed through API providers at a cost:
Cerebras is the fastest but costliest, followed by Groq with speculative decoding
All others are significantly slower
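For the huggingface alternate, a minimal sketch (the model id is only an example and larger models need correspondingly more RAM; device_map="auto" needs the accelerate package):

```python
# Minimal sketch: the huggingface transformers route for local models.
# Assumes `pip install transformers accelerate torch`; pick a model whose
# weights fit in your RAM (this one is a small example, gated on the Hub).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-3B-Instruct",
    device_map="auto",
)

out = generator("Explain quantization in two sentences.", max_new_tokens=128)
print(out[0]["generated_text"])
```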
Where can I find models for various tasks?
Read this first: the scores and rankings on these sites are not to be blindly trusted, as many models intentionally or unintentionally end up gaming the ratings and scores. The models do perform as shown on the benchmarks on these sites, but may not generalize as well to your use cases. Use these sites to find candidate models, then test the top few candidates on your own use cases to decide which one works best.
Artificial Analysis is currently the best resource for comparing AI foundation models - hosted and local, across modalities. A great alternate for comparing text-only models is LMSys Arena. For an even broader set of ML models across a wide variety of tasks, see paperswithcode/sota. For the largest set of local and open deep learning models across tasks, see huggingface.
Where to find open models?
mot.isitopen.ai lists and ranks models based on various openness criteria.
AI Research
LLM Scale plateau
LLM scale has plateaued, driven by technical, socio-economic, and political factors. This has led to several approaches to find alternatives, all promising:
smaller models
distillation
- student-teacher distillation, where larger models are used to create smaller models that perform almost as well as the larger model.
quantization
rough conversion from parameters to RAM, for a model with P parameters (in billions):
16 bit floating point: fp16: ~ 2P GB of RAM
8-bit integer: q8: ~P GB of RAM
4-bit integer: q4: ~P/2 GB of RAM
with ollama, q4_k_m is the highest degree of quantization that still balances quality against latency and RAM size; for a given model, if you can add more RAM and trade off some latency for better quality, fp16 is best, but q8_0 is also ok (a tiny RAM estimator is sketched below).
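A tiny sketch of that rule of thumb in code (the overhead factor for context/KV cache is a rough personal assumption, not a published figure):

```python
# Rule-of-thumb RAM estimate: bytes per parameter depend on quantization.
# fp16 ~ 2 bytes/param, q8 ~ 1, q4 ~ 0.5; overhead is a rough guess for
# context/KV cache, not a measured number.
BYTES_PER_PARAM = {"fp16": 2.0, "q8_0": 1.0, "q4_k_m": 0.5}

def estimate_ram_gb(params_billions: float, quant: str, overhead: float = 1.1) -> float:
    return params_billions * BYTES_PER_PARAM[quant] * overhead

for quant in BYTES_PER_PARAM:
    print(f"70B at {quant}: ~{estimate_ram_gb(70, quant):.0f} GB")
# Compare with the RAM column in the local model tables above.
```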
PEFT
'adapter' approaches to augment models with additional knowledge and capabilities.
example: LoRA is a much smaller adapter
example: ControlNet (used with image generators) is a large augmented model that adds new capabilities.
smaller data:
- smaller models can be trained on curated smaller data, even in narrower domains for specialized smaller models. SLMs like the Phi family of models from Microsoft are an example.
alternatives to transformers as scalable blocks:
- SSMs - SAMBA, RWKV, etc. Some of these (like RWKV) are highly open, which can accelerate research in these directions.
more inference time compute:
thinking harder:
RL trained on chain of thought
examples: o1, o3, marco-o1, qwq
agentic AI:
give AI tools/function calling abilities to do better
create several AI agents with specialized roles to perform tasks
Andrej Karpathy suggested: "assign a task to a cohort of all top models and ask them to get back only when they have a consensus" (a toy sketch of this idea is shown after this list)
ensemble ideas from ML - boosting, random forests, etc. can be applied to cohorts of LLMs
create great eval agents - it is critical to create near perfect eval agents to minimize error accrual when automating complex workflows with several steps.
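A toy sketch of the "cohort of models" idea with local models via ollama (the cohort members and the naive majority-vote consensus are illustrative assumptions, not a real eval agent):

```python
# Toy sketch: ask a small cohort of local models the same question and accept
# an answer only when a majority agree. The cohort and the naive string-match
# consensus rule are illustrative assumptions.
from collections import Counter
import ollama

COHORT = ["llama3.3:70b", "qwq:32b-preview-fp16", "tulu3:70b"]

def cohort_answer(question: str) -> str | None:
    answers = []
    for model in COHORT:
        r = ollama.chat(model=model, messages=[
            {"role": "user", "content": question + "\nAnswer with a single word."}
        ])
        answers.append(r["message"]["content"].strip().lower())
    best, count = Counter(answers).most_common(1)[0]
    return best if count > len(COHORT) // 2 else None  # None = no consensus

# print(cohort_answer("What is the capital of Australia?"))
```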
chat with documents
Chatting with documents enables scaling to large or many documents
RAG was the solution for this in 2023, and continues to be relevant.
Google's notebooklm has nailed the experience around this.
A similar experience is extended to web search with Google's Gemini Pro with Deep Research, where web documents are retrieved using Google's search index, then RAG is applied to them and the results are synthesized (a minimal RAG sketch is shown below).
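A minimal RAG sketch over local documents using ollama (the embedding model name, chunking, and cosine retrieval are simplifying assumptions; real systems add chunk overlap, reranking, and citations):

```python
# Minimal RAG sketch: embed document chunks, retrieve the closest ones for a
# question, and answer grounded only on them. The embedding model name and
# naive retrieval are simplifying assumptions.
import ollama

EMBED_MODEL = "nomic-embed-text"  # assumed pulled via `ollama pull nomic-embed-text`
CHAT_MODEL = "llama3.3:70b"

def embed(text: str) -> list[float]:
    return ollama.embeddings(model=EMBED_MODEL, prompt=text)["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

def answer(question: str, chunks: list[str], top_k: int = 3) -> str:
    q_emb = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), q_emb), reverse=True)
    context = "\n\n".join(ranked[:top_k])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    r = ollama.chat(model=CHAT_MODEL, messages=[{"role": "user", "content": prompt}])
    return r["message"]["content"]
```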
open models:
more and more open models will lead to faster convergence of research to solve some of the top problems.
Here are different models, from most open to most proprietary:
The pioneer and leader in highly open models is AI2 (Allen Institute for Artificial Intelligence) -
their olmo (and molmo) models are the most open models out there,
along with their (more focused) tulu models.
Llama models from Meta (Facebook) are open with their weights, and publish a lot about their architecture and methodology, but do not divulge their training data.
Alibaba's Qwen family of models is also fairly open, somewhat similar to Llama, though I am not sure exactly where they are less transparent. Some of their models have non-commercial licenses, while many are fully permissive.
Google Gemma models are open-weight models with more details shared (their Gemini models are proprietary)
Google Gemini Thinking models are transparent about their inference time thinking process, but otherwise proprietary.
Anthropic Claude is quite proprietary, though perhaps a bit more open about its details than some others.
OpenAI publishes some key insights about their models, and some steps of its thinking models, but is otherwise extremely closed.
AGI
AGI stands for Artificial General Intelligence - loosely, this means AI that can generalize to tasks it was not explicitly trained for.
Several people have defined this term in different ways. The definition of AGI has become so highly valuable that some companies have chosen to redefine it in monetary terms. However, AGI remains a technical concept, and the best definition I have found so far is inspired by a definition by Sam Altman in the early days of the ChatGPT craze:
AGI is achieved by an AI when it can perform all human digital or digitally simulated tasks at or above human median level performance on an appropriate metric.
That said, all other definitions of AGI, though somewhat biased by individual agendas, are playing a crucial role in pushing research forward. Here are a few:
AGI is achieved when a machine can do everything humans do with a similar level of resources as the human brain (energy, size, number of neurons, etc.)
Example: Yann LeCun
augmenting neural networks with common-sense world models
creating direct human sensory models (vision, sound, video, etc.) as opposed to current text-based models, since text is lossy/noisy/ambiguous
there is a lot of promising research happening here from this camp, though a fully feasible model may be a few years out
AGI is achieved when it mimics the human brain's internals
Example: Jeff Hawkins and Numenta's Thousand Brains project
Applying insights about the neocortex to artificial intelligence
The neocortex, a part of the brain, uses self similar modules (columns) for diverse tasks. If we can crack this on a machine, we can generalize to diverse tasks using the same architecture.
LLM scale does this with transformer blocks. But transformers are inherently limited to the data they are trained on; even the emergent capabilities of transformer-based models depend on the data they are trained on.
My take: True generalization and emergence would likely use a much simpler block (like the neocortex columns) and generalize by considering what-if scenarios (analogous to, but more powerful than synthetic data generated by AI to train larger AI).
There is a recent doc detailing their current thinking that I haven't yet reviewed; it may have more (or different) details.
AGI is achieved when it solves a lot of useful problems
- In that sense, AGI has already been achieved by the best models, as they have replaced or significantly augmented humans and/or software with better/faster/simpler solutions.
AGI is achieved when it reduces the search space of candidate approaches for hard problems. The shortlisted candidates are then combined with expert humans and software to make solutions more viable
Example: Terence Tao
Terence Tao (renowned mathematician) is a proponent of using foundation models to shortlist candidates in various areas of science research - solving math problems, building climate forecasting models, etc.
The key idea here is to use a good balance of human experts, (deterministic, non-AI) software, and AI. If AI continues to scale and solve more and more problems, more and more humans are likely to land in this area of work: pushing the envelope in their own areas of creative endeavor far beyond the current state of the art, using AI to identify candidates.
AGI is achieved when machines achieve consciousness
I actually made up this definition, just to plug an interesting line of research around measuring and detecting consciousness in machines
Example: Yoshua Bengio's research on measuring consciousness in machines
Necessary but not sufficient conditions for AGI
Example: Francois Chollet's ARC-AGI
- o3 was recently claimed to beat it, but there is some controversy, since there might have been some data leakage involved - still very impressive results for o3.
Example: Turing test
- This has been beaten many times over, depending on how it is stated, and not beaten in some forms (for example, the total Turing test, which also expects indistinguishability in the physical aspects of human behavior).
AI Companies
This is my highly speculative and personal opinion take on AI companies to bet on going into 2025.
Considerations
Open Models -
Open models will become dominant in empowering convergence to AGI
The most open models are olmo family from AI2
Meta Llama has several open weight models and a high focus on openness, but does not publish their training data (maybe, and I am guessing here, because of the legal hassles other companies got into when their training data became known).
Alibaba Qwen has some very good open models, though I don't know how open they are. The US AI community is a bit hesitant to embrace these, at least in part due to the uncertain political relations between the US and China going into 2025.
Permissive licenses - bets on Agentic AI will require models with permissive licenses for low budget startups to invest in different ways to solve problems.
Fully permissive licenses include Apache 2.0 and MIT license.
Permissive licenses to foster innovation for low-budget startups - custom licenses that are permissive for companies below a certain size/revenue/etc. threshold, but require permission otherwise.
Non-commercial/research-only licenses are useful for pure research applications (but I avoid spending too much time with them, because most things I research are interesting to me only if I can apply them commercially).
Redistribution-restriction licenses - some licenses, like some variants of GPL, require that we open source our code and/or weights when we redistribute modifications to existing models. These are ok, as long as we do not intend to modify the models.
Processors
- NVIDIA had a great year with its GPUs. NVIDIA provides the best option for GPUs for training and inference in the cloud, though others (Intel, AMD) are working on alternatives (don't know much about these others in mainstream usage, though they are beginning to appear in AI benchmarks)
- NVIDIA has a limit of 24GB of RAM available for AI on consumer-grade GPUs (as of the RTX 4090 - I think it remains the same with the 50xx series), which is very low compared to the two other options:
- A Mac Studio with Apple Silicon (M2 Ultra is the best right now, from 2023) can give up to 192GB of unified RAM, of which ~75% can be used for AI workloads - so, effectively 144GB. This is way more than the NVIDIA consumer local machine setup. There are consumer hardware options with multiple NVIDIA GPUs, but they are way more expensive (in a web comparison, I found them to be tens of thousands of USD, compared to $7,500-10,500 for the Mac Studio mentioned).
- Cloud NVIDIA GPU options can be used with higher RAM, but at a much higher cost (compute cost is at least hundreds of US dollars per month, usually thousands). This makes them accessible mainly to large enterprises.
- As models get smaller and run in less RAM (the best open models are 70B params, ~35GB in Q4_K_M, like Llama 3.3 70B; the best open vision models are 90B params, ~45GB in Q4_K_M, like llama3.2 vision 90B), we will see a period where something bigger than 24GB of RAM is still needed to run locally. So for startups and indie developers (like me), a better option is the Mac Studio with M2 Ultra, or the newer late-2024 MacBook Pro M4 Max (up to 128GB RAM, of which ~75%, about 96GB, is available for AI). As of today, I think that on Apple Silicon the processor architecture/generation has less effect on AI performance than RAM, so I prefer the Mac Studio M2 Ultra.
- Unfortunately, Mac is not a great option in the cloud (though AWS and others do provide Macs in the cloud, I haven't looked at it deeply because I assume they will be very expensive as well). So, for very large models that go way beyond a single machine and need way more than 192GB of RAM for training, fine-tuning, and inference, cloud NVIDIA GPUs remain the practical option.
Companies
AI2 is likely to play a key role in the coming year(s) with its focus on extremely open models.
olmo models are built from scratch to be transparent
tulu models focus on being transparent around post training techniques (tulu v3 uses llama).
Meta - Meta is the dominant force in open model innovation.
Llama open weight models are leading most innovation across the board.
Yann LeCun leads their AI science team, and is a pioneer on important research around general intelligence.
Meta is also heavily invested in AR/VR and there is an opportunity to cross pollinate with their innovations in AI.
Meta AI in their social properties is becoming more and more useful as Llama models get better - I am seeing increasing use of Meta AI in WhatsApp groups and other communities.
Alibaba's Qwen family of models is also open and quite good. However, there is some hesitancy in the US AI community to embrace these as fully as others, due to uncertainties around the political equation with China. That said, it seems like Qwen models are hosted on Alibaba Cloud, and may see significant usage and adoption in China (and other countries).
OpenAI
OpenAI will struggle with its attempt to become fully profitable
There is a lawsuit by Elon Musk to prevent the transition away from its non-profit agenda; Musk will also be part of the incoming administration in the US.
A key sticking point here is the definition of AGI in their non-profit clause, and Microsoft and OpenAI have recently redefined the term.
OpenAI (along with other companies, like Stability AI) is being sued for using copyrighted content to train its models
An OpenAI whistleblower had analyzed this in detail (and sadly, was found dead, ruled suicide).
one key factor here is the "fair use" of data, which is more accepted for non-commercial/non-profit entities - this likely conflicts with OpenAI's goal of becoming profitable in order to invite more investment from Microsoft and others. They need that investment to get more processors and compute and continue innovating.
OpenAI ChatGPT Search is pretty good, better than Bing search. However, I am not sure how it stacks up against Perplexity search or Google Gemini Pro with Deep Research for longer inference-time solutions.
Microsoft
Microsoft is a major investor in OpenAI, but its investment relies on future profitability of OpenAI.
Microsoft shifted away from its OpenAI exclusivity and focus over the last year, hedging its risks with other models being made available in Azure and other AI properties.
AI integration into its Copilot brand was met with mixed results.
AI summarization in Microsoft Teams (I have heard) is fantastic, reducing the need for attending meetings
Microsoft's focus is on Azure, and (as best I can tell) Azure's focus is on enterprises. Enterprises have much bigger budgets than smaller startups and consumers. Enterprises that are sticky with Microsoft over other cloud providers will use AI from Azure, increasing Microsoft's revenue from enterprises.
Microsoft will face competition from AWS Bedrock for AI
AWS Bedrock provides a single high SLA cloud provider of most top foundation models (except OpenAI and Alibaba Qwen models)
NVIDIA
- NVIDIA will enjoy the demand for their GPUs from all top AI companies whose models are too large to load on single machines.
- Enterprises who run things in the cloud will use NVIDIA GPUs for a while to come, further increasing demand for GPU.
- As models get smaller, indie developers and startups will initially use local mac options as described earlier, to access more RAM for models like the 70-90B param models that are getting close to state of the art among open models.
- When smaller models, or agentic apps made with smaller models, provide performance comparable to the too-large-to-fit-on-one-machine proprietary models, several use cases that do not require a cloud endpoint can be run directly on local machines (for example, content creation workflows: I have a workflow that creates educational visual guides on topics, which I run on my Mac; I don't need a cloud endpoint). For these, a Mac remains a better option.
- As even smaller models or agentic apps with them become viable (~7B param models) for some use cases, they can run directly on edge devices (mobile phones, tablets).