Should you buy the RTX 4070 Ti Super for local AI in 2026? I've gotten tons of comments about this on a lot of my previous GPU videos, and the answer today is a little different, since we now have a lot of incredible tooling that specifically targets GPUs with less VRAM. Welcome to AI Flux. Let's get into it.

So this GPU is kind of interesting. Nvidia has re-released the 4070
in a number of different versions. It's a much more capable option than something like the 4060 or even the 16 GB 4060 Ti. And if you're looking at this card at all, especially on eBay or Amazon, make sure you're looking at a 16 GB version, because there are actually quite a few different variants of this card. Surprisingly enough, this is a very capable GPU for certain kinds of local AI, but the pricing may or may not change your mind, so watch until the end and we'll get into that.

So the specs on this GPU
are actually not bad. The 4070 Ti Super is distinctly better than the 4070 Ti, not only because it has 16 GB of VRAM compared to 12 GB, but because the real upgrade is the memory interface width. We're going from a 192-bit to a 256-bit bus, which means much higher bandwidth between the GPU's VRAM and the GPU itself. Now, for models that need offloading, this win is partly negated, because at that point more of your bottleneck is the bandwidth between your host machine and the GPU, not just between the GPU and its VRAM. But that really only matters if you're using models that you can't fit in VRAM.

So the specs are quite good. Is this better than something like a 4090? No. Is it really better than
even a 3090? We'll have to wait and see, but generally speaking, my answer is still no.

So one of the more interesting models to use with this GPU, given its speed, is the Kimi K2.5 GGUF quant from Unsloth, especially with the latest Kimi release, which I believe we'll see quants of very soon that will push it onto this 16 GB GPU. I fully expect Unsloth to have a quant out within 24 to 48 hours that is equivalent to this K2.5 quant, which you can actually run on your 4070 Ti Super 16 GB. This model is one of the more capable general-purpose models. It's a thinking and reasoning model, so its benchmarks can get a little weird, specifically because it thinks before it executes. It's maybe not the best coding model, but as a just-get-things-done model, I believe it's one of the best in existence right now.

This GPU will also work great with Gemma 4, along with some of the newer releases from GLM. And another exceptional model choice here is Qwen 3.6, specifically for coding.
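To put rough numbers on the bus-width and offloading points above, and on whether a given quant fits in 16 GB at all, here's a minimal back-of-envelope sketch. It assumes 21 Gbps GDDR6X on both the 4070 Ti and the 4070 Ti Super (so only the bus width differs), a hypothetical 14B dense model at about 4.5 bits per weight, and an assumed ~32 GB/s effective link for offloaded weights. The decode figure is a memory-bound ceiling that assumes every weight is read once per token; real throughput will be lower, and MoE models like Kimi only read their active experts per token, so this overstates their per-token traffic. None of these numbers are benchmarks.

```python
"""Back-of-envelope VRAM and bandwidth math for a 16 GB card.

All model shapes, data rates, and link speeds below are illustrative
assumptions, not measured values.
"""


def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: per-pin data rate x bus width / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB (ignores quant metadata overhead)."""
    return params_billions * bits_per_weight / 8


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_elem: int = 2) -> float:
    """KV cache size: 2 (K and V) x layers x kv_heads x head_dim x tokens x bytes."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_elem / 1e9


def decode_tokens_per_s_upper_bound(gb_in_vram: float, vram_bw: float,
                                    gb_in_host: float = 0.0,
                                    host_bw: float = 32.0) -> float:
    """Memory-bound ceiling on decode speed for a dense model.

    Assumes every weight is read once per generated token. Weights that were
    offloaded are read over a much slower path (host_bw, GB/s, an assumed
    effective figure), which is why offloading erodes the wider VRAM bus.
    """
    seconds_per_token = gb_in_vram / vram_bw + gb_in_host / host_bw
    return 1.0 / seconds_per_token


if __name__ == "__main__":
    # Assumed 21 Gbps GDDR6X on both cards; only the bus width differs.
    bw_4070ti = bandwidth_gb_s(192, 21.0)        # 504 GB/s
    bw_4070ti_super = bandwidth_gb_s(256, 21.0)  # 672 GB/s
    print(f"4070 Ti: {bw_4070ti:.0f} GB/s, 4070 Ti Super: {bw_4070ti_super:.0f} GB/s")

    # Hypothetical dense model: 14B params at ~4.5 bits/weight (roughly a 4-bit GGUF quant),
    # 40 layers, 8 KV heads, head_dim 128, 16k context with an FP16 KV cache.
    w = weights_gb(14, 4.5)
    kv = kv_cache_gb(layers=40, kv_heads=8, head_dim=128, context_tokens=16_384)
    total = w + kv
    print(f"weights {w:.1f} GB + KV {kv:.1f} GB = {total:.1f} GB; fits in 16 GB: {total < 16}")

    # Ceiling with everything in VRAM vs. with 2 GB of weights offloaded.
    all_vram = decode_tokens_per_s_upper_bound(w, bw_4070ti_super)
    offloaded = decode_tokens_per_s_upper_bound(w - 2.0, bw_4070ti_super, gb_in_host=2.0)
    print(f"all in VRAM: <= {all_vram:.0f} tok/s, 2 GB offloaded: <= {offloaded:.0f} tok/s")
```

Under these assumptions the script reports roughly 7.9 GB of weights plus 2.7 GB of KV cache (about 10.6 GB, so it fits), and a ceiling of about 85 tok/s fully in VRAM versus about 14 tok/s once just 2 GB spills over. That gap is the reason fitting the last few gigabytes in VRAM matters so much more than the bus width once you start offloading.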

We've already seen this model be very impressive on other 16 GB GPUs. What's interesting is that with just a few more days of quantization and optimization work, we've seen versions of Qwen 3.6 fit entirely in that VRAM, without needing to offload the last few gigabytes. However, this is one more area where this GPU falls a little short and where the 3090 is probably a
better option.

This next site was recommended to me by a viewer of the channel. It's called local ai.computer. They have some work to do on the benchmarks, but I like that they have basically all of the benchmarks in one place, with at least a rough sense of compatibility. What's pretty cool here is that we can see some older versions of DeepSeek, but if we get to Kimi K2 Thinking, this is very usable performance for one of these tiny models. These are basically the smallest Q4 quants you can get, so you can expect them to be quick, but maybe not the most useful. If we get down here to DeepSeek R2, it would be nice if these were actually ranked properly. So if
we look at some of these Qwen 3 quants from DeepSeek, or Qwen 3 Coder, and specifically some of these later Liquid AI LFM2 benchmarks, there's a lot here, and what it shows is that even with realistic INT4 quants, this is actually a pretty capable GPU. And if you're using it with vLLM, the newer GPUs perform a little better in terms of efficiently shuffling data between GPUs. With the older, less capable 3000-series GPUs, there can sometimes be
issues with that.

Now, one thing I like to do is look at what other people recommend doing with these GPUs. I can personally attest that Qwen 3.5 9B is an incredible model for any GPU with less than 24 GB of VRAM, ideally right in that 16 GB sweet spot. This person nails it, saying the rough throughput will probably be around 85 tokens per second, which is very usable if you're running an agentic coding workflow or using OpenHermes or another agentic AI framework. These models also handle concurrent requests well, and that throughput scales well across multiple GPUs, so I'd also recommend it for those reasons.

Now, what do these GPUs actually cost?

This is probably the biggest downside of this GPU. First, you won't find them at MSRP; at this point prices are all inflated, so they'll easily be in the $700 range if you can find them at all, and you'll mostly find the 12 GB variants. Ironically, it's actually much easier to find these on eBay. Occasionally you'll do really well and find something like a two-GPU bundle for $720. Unfortunately, most of these are more realistically priced, and these are sold listings, covering both auctions and Buy It Now. So if you find one for under $700, that could be a good deal. The issue is that even though 3090s
are now at inflated prices well above $1,000, there are still lots of deals to be had. For instance, if I search for "Nvidia 3090" on eBay, sure, this is still more than I'd want to spend on a GPU that's 6 years old, but there are plenty that sold in just the last few days, or overnight, right in the $1,000 to $1,200 range, and even some with aftermarket coolers selling for well under $1,000, even if they're in Germany. This really has a big impact on how I think about this GPU, as capable as it is.

And even if you already have one of these, there's the question of its age. It's not the current generation; this is the 4000 series, so should you really be buying one new, at MSRP, in today's dollars? And if
you go on Amazon, what's a bit more disappointing is that for not much more money you can get a 5070 Ti 16 GB or a 5060 Ti 16 GB, which is, unfortunately, just a better GPU. The 4070 Ti Super really isn't that much cheaper: most of the 16 GB variants are going for around $1,000, if you can even find them.

Another issue: let's say you buy one with the idea that you'd eventually buy a second. You're effectively spending more money on a less capable GPU than an RTX 3090. You have less VRAM, and you don't have NVLink support, even though NVLink bridges for 3090s now cost more than the 3090s themselves, which I sort of expected would
happen, but I didn't think it would happen this soon.

So unfortunately, this is a great GPU, but I don't think I can in good conscience recommend that you buy it for local AI. If you already have one, I hope this video has helped you pick some really great models. The other feedback I've gotten is requests for more videos about which models to use based on how much VRAM you have. So, unfortunately, I would not buy this GPU for local AI in 2026. It's a great Nvidia GPU, but the pricing relative to the performance just doesn't quite make sense. If you disagree with me, let me know in the comments below. If you skipped this card and have a better GPU that you really like using for local AI, also let me know in the comments. As always, I hope you learned something, and I'll see you in the next one.