after all. Okay. Nope. All right. Well, it started off nice.
Yeah, that's 100% a security camera. I've never seen that level of detail in this result. Today, we're going to be looking at Xiaomi
MiMo V2.5 Pro, which is the newest open-source model. Now, you may be thinking, Van, this model's been out for like a few days now.
Yes. However, they did just recently release the weights for this on Hugging Face. So now if we click this Hugging Face hyperlink, which we don't need
to do cuz I do just have the tab open already, we will see that this model in its entirety is now hosted open source with an MIT license for anyone
interested in actually running this themselves. Now the caveat on that is most people don't have the hardware to actually do so. But although we're
not going to be testing it in this video, they did also release the V2.5 version. So the difference here is this is V2.5 Pro. This is V2.5, which is
a little more than a third smaller. And additionally to that, this is a native omnimodal model. So, it is very likely that I will do a dedicated
video on V2.5 as well. Though, for today, we're going to be solely focused on V2.5 Pro, as it does have the potential, based off of some of the
things I'm seeing, to be the most performant
open-source model currently existing, which is quite exciting.
So, before we get into it, please do feel free to subscribe as I do want that 100K plaque and then I'll stop asking. I probably won't, and I'll
probably ask for like whatever the next plaque is that comes after that. But let's begin by just taking a quick peek at the announcement blog post
here where of course we start with a benchmark, not a benchmark JPEG because this does actually have tooltips showing the specific model that is being
benchmarked against here, but this is apparently stacking up very favorably against closed-source state-of-the-art models. Now, we should note that
Opus 4.6 and GPT 5.4 are no longer the most up-to-date versions of those specific models. However, at the time these benchmarks were done, they likely
were. And we see right here, this does stack up pretty well. Now, I always like to just test the things and see how I feel based on that
versus just judging based on benchmarks. Though, I will say I did test V2 Pro when it came out and it had one of the more impressive Ship Combat
Simulator results I had seen where the water effects it drew and the ship models and just overall the quality of the results that we received in that
test were extremely impressive.
So, I am really excited to test this new version of this model. In terms of some more tech spec stuff, we see right here that not only was the model
itself released, but the base model for this was also released, which is pretty exciting because one could perhaps use this to make a baby Mythos
style model. Now, I'm, well, I'm not kidding cuz you could, but it's just, you know, it's cool to see base models there as well. We have some
additional benchmarks down here, but for now, let's swap over to the Hugging Face model card for this specific Pro variant, and we'll take a peek at
it. So we see this is a total size of just over 1 trillion parameters.
It is a mixture-of-experts model with 42 billion active. It has a hybrid attention architecture and MTP. So hypothetically it will be able to run a bit
faster, since MTP speeds up token generation, to put it simply, compared to a model that did not have it.
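To put the mixture-of-experts numbers in perspective, here is a minimal sketch of the share of parameters that is active per token. The 42 billion figure is from the model card as read out above; the total is rounded to exactly 1 trillion as an assumption, since the card only says "just over" that, and the function name is made up for illustration.

```python
# Assumption: the total is rounded to exactly 1 trillion ("just over 1 trillion" in the video).
TOTAL_PARAMS = 1.0e12
# Active parameters per token, as quoted in the video.
ACTIVE_PARAMS = 42e9


def active_fraction(active_params: float, total_params: float) -> float:
    """Return the share of parameters used for each token in a mixture-of-experts model."""
    return active_params / total_params


print(f"{active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%} of parameters active per token")
```

So only a few percent of the weights do work on any given token, although all of them still have to be held in memory, which is why most people don't have the hardware to run it.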
Additionally to that we see some more technical specifications right here. If we scroll on down, we see the context length is 1 million, which is
quite a healthy context length.
Now, again, I don't specifically have tests to really stretch the context length and push it very far. That is something I'd like to work on, but it's
just interesting to see models with that context length being more common than they used to be. So, in terms of pricing, if we assume the pricing at
the maximum context length, the input is $2 per million tokens and output is $6 per million tokens. So, not a bad price. And especially if
this does really stack up as shown in some of the benchmark graphics that we see right here, this could potentially be a fairly cheaper, in parentheses,
option compared to an expensive closed-source model.
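As a rough back-of-the-envelope sketch of what those prices mean for a single request, assuming the $2 input and $6 output rates quoted above apply as flat per-million-token rates. The function name and the example token counts are made up for illustration, not taken from any pricing page.

```python
# Prices as quoted in the video, in US dollars per one million tokens.
INPUT_USD_PER_MILLION = 2.0
OUTPUT_USD_PER_MILLION = 6.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming flat per-million-token rates."""
    total = input_tokens * INPUT_USD_PER_MILLION + output_tokens * OUTPUT_USD_PER_MILLION
    return total / 1_000_000


# Hypothetical request: a 50K-token prompt and a full 128K-token response.
print(f"${estimate_cost_usd(50_000, 128_000):.3f}")
```

Even a request that uses the full 128K output length comes in under a dollar at these rates.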
And we can see right here, I had to search for this for a little, but I did want to be able to find it. The maximum output length is 128K. So a
healthy length. Now our next step really is to hop into MiMo Studio right here. I do have this hooked up in OpenCode just through OpenRouter.
So, we will likely be doing some of the tests in that. But for a model like this, I always do mostly enjoy testing it just in the web chat interface
because it provides the quickest access to like a variety of different tests to get a general feel for the model's capability across a diverse set of
tasks. So, with that, we shall begin with our tried and true browser OS test v2.5. This is the version of the browser OS test where it has to
create five apps, with two of the five being functional 3D games, one of them being a GTA clone and the other up to it,
plus the ability to change wallpaper and the special feature that it decides on and implements. I might title this "This model is insane", but like the
dictionary definition of the word, just based off of this thinking chain. Oh wow. Okay, so after 663.4 seconds of thinking, it has begun to write the
script.
That's 10 minutes, right? 60 seconds is a minute. 10 of those would be 600 seconds. Yeah, that's 11 minutes and 3 seconds. All right.
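The conversion being worked out there can be sketched in a couple of lines. The 663.4 seconds is the thinking time shown on screen; the helper name is made up for illustration.

```python
def split_minutes_seconds(total_seconds: float) -> tuple[int, float]:
    """Split a duration in seconds into whole minutes and leftover seconds."""
    minutes, seconds = divmod(total_seconds, 60)
    return int(minutes), seconds


minutes, seconds = split_minutes_seconds(663.4)
print(f"{minutes} min {seconds:.1f} s")
```

That works out to 11 whole minutes with about 3.4 seconds left over, which matches the 11 minutes and 3 seconds above.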
Well, this better be a good result. All right. So, here is our MiMo V2.5 Pro browser OS after 11 minutes of thinking. Okay. I can't say that I've ever
seen anything quite like this because I have absolutely no idea what this is supposed to be.
I will say that the flickering effect of it could perhaps be a potential lawsuit were this to be an actual piece of software that was delivered to
folks just from like a medical standpoint, but it is definitely creative and I've never seen something like this. So to begin, let's of course check
if there is a right click. Okay, there is. Ah, toggle Nebula. Good.
I think maybe just for now we're going to leave that off for the rest of this video. But we can see that there is a clock in the bottom right which is
showing the correct time in my local time, a little after 4:00 a.m. Now, okay, let's just begin by going through, uh, first our start menu. Very good.
Everything looks good. Everything looks nice and clean. I have no complaints here. Some models put in like a search feature here where you can search
for an app, but that's not specifically listed as something it needs to have. So, it's just something to notice.
Now I'm curious, though. Okay, well, this is inevitably going to be the special feature, I would imagine. Next wallpaper. Can we change wallpaper with the nebula on? Okay. I don't think we can.
Okay. But we can change it from the right click there. This actually it's not bad. All right. So I normally just start from like the first app and then move sequentially.
But I'm always curious about the GTA results. So, okay. So, our GTA app is not working. We're getting a ton of errors. Let me just toggle Nebula off and we'll at least try the terminal first.
I am going to get the GTA result to be fixed. So, no worries about that because I want to see it. Let's just do calc. Okay. 4 + 4.
Hey, very good. We do have a working calculator. Let's do neofetch. Let's check out our system stats. Okay, that looks good.
Omnios. Okay, on MS, we have 9.5 megabytes of memory. So perhaps a more resource-constrained system, which could be the reason that the Neon City app
is not working. But we can do, um, this is a good-looking terminal.
I will say though, I do like the yellow and green color theme. And whenever they put in some sort of like stylistic thing like this, I do like