
ChatGPT and Local Inference: Running Neural Engines on CPUs and GPUs

Official link to the ChatGPT app in the App Store, since there are many clones.

I wonder if there will be a point where they also use the Neural Engines in iPhones and iPads. At 175 billion parameters (175B), GPT-3.5 is still far too big to run locally, but things are moving fast with the smaller 7B, 13B and 30B models. For example, there is llama.cpp, which runs LLMs locally on CPUs and GPUs; inference on Metal GPUs has been made to work, and further improvements are being worked on hard.
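A minimal sketch of what local inference with one of those smaller models can look like, here via the llama-cpp-python bindings for llama.cpp (assumption: the bindings are installed and a 4-bit quantized GGML model file is available locally; the path and parameter values below are placeholders):

from llama_cpp import Llama

# Load a hypothetical 4-bit quantized 7B model; n_gpu_layers offloads
# layers to the Metal GPU when llama.cpp was built with Metal support.
llm = Llama(
    model_path="./models/7B/ggml-model-q4_0.bin",  # placeholder path
    n_ctx=2048,       # context window
    n_gpu_layers=32,  # 0 = CPU only
)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(out["choices"][0]["text"])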

The biggest limitation will be working memory. A 13B model needs at least 8 GB, and a 30B model 16 GB (leaving little room for anything else). For the time being this will therefore be reserved for iPads, since even the iPhone Pros only have 6 GB of RAM.
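A back-of-the-envelope estimate (my own, not from the post) of why those numbers come out that way, assuming roughly half a byte per parameter for 4-bit quantization plus some overhead for the KV cache and runtime:

# Rough RAM estimate for 4-bit quantized models; all numbers are
# approximations, not measurements.
def estimate_ram_gb(params_billion, bits_per_weight=4, overhead_gb=1.5):
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb  # overhead: KV cache, activations, runtime

for size in (7, 13, 30, 65):
    print(f"{size}B ~ {estimate_ram_gb(size):.1f} GB")

# Prints roughly: 7B ~ 5.0 GB, 13B ~ 8.0 GB, 30B ~ 16.5 GB, 65B ~ 34.0 GB,
# in line with the 8 GB and 16 GB figures above.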

The new Macs with 96, 128 or even 192 GB of unified memory can already run much larger models, such as LLaMA 65B and all its variants, Falcon 40B and the new 104B InternLM.

The r/LocalLLaMA subreddit also has a lot on this.

[Comment edited by Balance on 8 June 2023 18:05]

