What could reduce current AI demand for GPUs?

As of now, the best-performing large models require GPUs both for training and for running them (inference, generation, etc.). Training generally takes far more resources than running.

The data modality (text, image, audio, video, etc.), whether as input or output, also affects the processing power needed. Heavier modalities like video require far more compute than lighter ones like text.
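
To get a feel for the gap, here is a rough back-of-envelope comparison of raw data sizes per modality. The numbers are illustrative assumptions, not measurements from any particular model:

```python
# Back-of-envelope raw input sizes per modality (illustrative assumptions only).
text_bytes = 1_000 * 4               # ~1,000 tokens at ~4 bytes each
image_bytes = 512 * 512 * 3          # one 512x512 RGB image, 1 byte per channel
video_bytes = image_bytes * 24 * 10  # 10 seconds of such frames at 24 fps

print(f"text:  {text_bytes / 1e6:.3f} MB")   # ~0.004 MB
print(f"image: {image_bytes / 1e6:.2f} MB")  # ~0.79 MB
print(f"video: {video_bytes / 1e6:.0f} MB")  # ~189 MB
```

Actual compute depends on model architecture rather than raw bytes, but the ordering (video ≫ image ≫ text) carries through.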

There are several efforts to make models smaller so they need less processing power to train and run. Quantization, LoRA, model distillation, small language models (SLMs), Apple silicon's Neural Engine architecture, and some adversarial techniques are examples of these classes of effort. Most approaches trade some quality or scope for lower compute (and memory) requirements.
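
As one illustration of the quantization class, here is a minimal sketch using PyTorch's post-training dynamic quantization. The toy model and its layer sizes are hypothetical, chosen only to show the size reduction:

```python
import io
import torch
import torch.nn as nn

# Toy stand-in for a larger network (hypothetical layer sizes).
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))

# Post-training dynamic quantization: Linear weights are stored as int8,
# and activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

def serialized_mb(m: nn.Module) -> float:
    """Serialized state_dict size in MB (captures packed int8 weights too)."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

print(f"float32 model: {serialized_mb(model):.1f} MB")     # ~134 MB
print(f"int8 model:    {serialized_mb(quantized):.1f} MB") # roughly 4x smaller
```

The roughly 4x smaller weights are exactly the compromise mentioned above: output quality typically dips a little in exchange.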

As smaller options emerge, they need smaller GPUs, or fewer of them, to perform the same task. Some can run, or even be trained, on less expensive CPUs.
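
For example, a quantized gguf model can run on a plain CPU via llama.cpp's Python bindings. A minimal sketch follows, assuming llama-cpp-python is installed; the model file path is hypothetical:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical path to a quantized gguf model file; no GPU required.
llm = Llama(model_path="models/small-model-q4.gguf", n_ctx=2048)

out = llm(
    "Explain in one sentence why smaller models need less hardware:",
    max_tokens=64,
)
print(out["choices"][0]["text"])
```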

If any of these efficiency efforts takes off sufficiently, demand for GPUs would shrink, and in some cases vanish entirely.

A concrete recent example from personal experience: for my haixu project, which generates text+image content, I currently use a locally running image generation model rather than a hosted one on cloud-class (more expensive, e.g. A100) GPUs. It runs entirely on an M1 Mac, and about 3 times faster on an NVIDIA laptop GPU (mine is a GeForce RTX 3080). Running locally is possible thanks to optimizations that allow models to be saved and loaded with fewer resources (see gguf for details), and it lets me generate images in reasonable time effectively for free, i.e., for just the cost of my laptop and power.

If I want more speed, I could move this to the cloud. There is a low-cost option: a Google Colab Pro account (~$10 a month) gives access to A100 GPUs, and a bit more money buys more A100 time. For my use case this should work better: initial experiments are ~5-6 times faster than local on the M1 Mac, and roughly twice as fast as the RTX 3080 laptop, even without any advanced optimizations. For larger volumes of generations this can be invaluable; the fixed cost and A100 access via Colab are very attractive. At higher scales still, a full cloud solution may be more effective, at a higher cost.
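
To make the local-vs-cloud setup concrete, here is a sketch of how such an image generation pipeline might pick its device with Hugging Face diffusers. The model id is just a common example, not necessarily what haixu uses; the same script would resolve to mps on an M1 Mac, cuda on an RTX 3080 laptop, or cuda on a Colab A100:

```python
import torch
from diffusers import StableDiffusionPipeline  # pip install diffusers

# Pick the best available device: cuda (RTX 3080, A100), mps (M1 Mac), or cpu.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

# Example model id; swap in whichever image model you actually use.
pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe = pipe.to(device)

image = pipe("a watercolor illustration of a mountain village").images[0]
image.save("out.png")
```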

But every aspect of this is improving, and the optimal solution today may not be the best one tomorrow.

*About me: I am R C Anand, an AI scientist/engineer/product guy, currently working on haixu: human-AI collaboration to create new and better automated multimodal experiences. I have released ~45 AI-created educational visual guides on various topics; check them out at [rcanand.gumroad.com](rcanand.gumroad.com).*