It put little seats inside the subway car model. You know what? I like that because it was attention to detail. I'll call officer Patel from the
electronics theft unit right this second. Okay, today we're going to be taking a look at the new Tencent HY3 preview model, which is currently free on
OpenRouter for about the next week or so. So, this is a pretty interesting model because it is a mixture-of-experts model where the weights are also
released, as we can see by the open Hugging Face tab for this specific model in my browser. However, beyond that, this is designed to perform
agentically very well. And in some of the benchmarks we're going to look at, we're going to see a pretty significant leap in capability for this model
versus the previous HY2 model. And that is pretty exciting as this is still just a preview. So, before we get into it, full disclosure, this video was
sponsored by Tencent, who was kind enough to basically allow me to continue putting out videos at rapid pace by supporting the channel. So, thank you
to Tencent and also influencer status. I always like to fully disclose sponsorships. Um, I don't always see that being the case. But regardless of
that, now let's take a quick look at some of the introductory information about this model and then we'll jump into some fun and live testing. And
because this is specifically denoted as being quite performant in agentic scenarios, we're not only just going to test it through OpenRouter, but
we're also going to test it in OpenCode as well just to see how it stacks up in some code-related agentic tasks. So to begin, we can see a little bit
of pertinent information about the model right here in its OpenRouter bio where they say it is a high-efficiency model designed for agentic workflows
and production use. Now this model is a new pre-train and built off of new infrastructure for pre-training and reinforcement learning from Tencent as
opposed to the HY2 model. So we will be able to see again in those benchmark JPEGs there's a pretty significant leap in that. Beyond that, something I
like to see is that it does have configurable reasoning levels. But beyond that, which is more or less becoming a bit more commonly seen now, it does
also have the option to fully disable reasoning, and some of my patience lately in model tests has been a bit slim when it comes to lengthy chains of
thought. So, it's nice to be able to just fully turn it off as well. Beyond that, we can see here the context length is 262,144, which is fairly
healthy. And now we're going to swap over to the Hugging Face model card, where there's a bunch more technical information about this model. So do feel
free to subscribe as I do want that 100k plaque. But now let's take a look at some of the more tech specs about this model starting with its overall
size. So we can see that this is a 295 billion parameter model with 21 billion active. Now interestingly enough there is also listed here an MTP
layer. So this has multi-token prediction which will essentially speed up the ability of this to generate tokens as compared to a model that did not
have that feature, which, for a model of this size that does have potential to actually be run on some more, I wouldn't call it hobbyist, but like
enthusiast dev-grade systems, is pretty exciting because it can give some pretty significant speedups. In addition to that, our context length, as we
saw right here in OpenRouter, is 262,144, or 256K as it is called here. Of course, we do also have benchmarks and we have the traditional benchmark
JPEGs as well. But something interesting I note here is they seem to be less interested in actual like benchmaxing. They even mention down here
somewhere that the model is specifically more inclined to generalize well across a variety of different tasks which they say that will allow it to
perform better in all like agentic coding scenarios and things like that. Beyond this, we see that just based off of the first one right here, the bar
is actually interesting because the blue shows the original HY2 model which is listed right here. I apologize if that's a little hard to see, but then
we can see the purple over that is the leap in capability that is seen just with HY3 preview here. And we can see there are some pretty decent leaps
here. But as we go down in these benchmarks, we can see basically in the last one there is some significant capability and improvement in terms of its
agentic coding capabilities where we can see especially in this one. This is a large leap. And hypothetically, were we to just go based off of these
benchmarks right here, this should stack up favorably to some pretty large open-source models. So, I am pretty excited to get into testing this. And
that's essentially what we're going to do right now. The last thing I have neglected to mention is they did also release the pre-trained base model
here, which is always nice to have because these can be used for some pretty cool fine-tunes and customizations and things like that. So, we are
beginning with our tried-and-true browser OS test. I have set the reasoning mode here to the highest possible one. So, we can see that it is starting
out with a reasoning chain of thought. This OS test is the specific one that denotes it must also create functional 3D games. One must be a simple GTA
clone and the other one can be up to you. The rest is our traditional browser OS test where it needs to have a special feature it decides on and the
ability to change wallpaper. All right, we have our result. So, first and foremost, I'm always at least partially curious to see how long these
scripts are. All right, 1,146 lines. And now let's take a peek at our browser OS script from HY3 preview. Okay, I have not seen the Pacific Northwest
background in quite a while, but it has made a return, and for that I am thankful. All right. No, I mean, seriously though, a lot of models now are
just doing static different colored gradients as background options. So, I've seen this specific image before cuz it's an Unsplash image, but I have
not seen it in a while. Okay, we don't have a right click, but again, I know folks say, "Well, it's not specifically outlined in the prompt, so maybe
that shows better instruction following." Acceptable, acceptable argument. We do have a clock with the correct time in my locale at the bottom right.
And to begin, okay, good. We do have hover effects. And I like to see that the specific icon for the GTA clone is a car. Let's start with start. Get
it. Okay, we have just our applications. Again, pretty simple. We have an about as well as a power button. Shut down browser OS. Okay. And it just
restarts it. Not bad. Let's first start with our about. Special features: persistent storage via IndexedDB. All user preferences, selected wallpaper,
custom uploaded wallpapers (that's a feature, spoiler alert), are saved to the browser's IndexedDB storage. And this is special because it saves your
browser state instead of just kind of losing it when it gets refreshed. Okay, I'm satisfied. Now, let's start just from the GTA clone down. You know
what? I will say I'm actually okay with this because I know it looks a little funky on first glance. Hey, where that little cube? All right, there's
buildings and we are this cube. This is very like retro and I don't mind it. Also, partially I'm noticing it's retro seeming because the actual window
UI kind of looks like Windows XP style. So, it didn't really give us any like how would we hypothetically jack this whip? Uh, E. Okay, so I pressed E
there and it did something. F. I should probably look at the code to try to see. It's also possible that we can't actually do much, but it did put E
seems to Oh, okay. It did put mesh colliders on the buildings because I can't get back to where I previously was. Okay, hold on. All right, but I will
note this could perhaps be a good candidate for an agentic follow-up specifically focused on this. Can I minimize it to the taskbar and can I reopen
it? Yes. And yes, full screen. Good. On full screen. And then finally close. All right. Not bad. And it did have like an XP style to it, which I kind
of dig. Next up, 3D maze. All right. You know what? This is in fact a 3D maze. It is a little hard to move around, but the walls are drawn in a very
3D manner. We can I should probably full screen. I apologize for my incompetence. Is there a Oh, okay. So, we can actually move the view as well with
the mouse. I do see a lime green cube. Let me zoom out. Okay. So, is this us? This is like the first time the second game hasn't been some form of
like asteroid shooter. Also, as well, I should note in terms of the 3D. All right. It is 3D and it does have some like maze to it. Our next app is the
file explorer. Okay. Basic. notes.txt, and it gives us a Browser OS résumé PDF and vacation.jpeg, kind of like Windows XP style, and I don't mind that.
we have settings. I'll probably save that for last because then we'll be able to change our wallpapers. Now playing browser OS beats. I don't have my
speaker on. I can't imagine this is probably just an emulation of music, but yeah, music players are less commonly seen in these results. All right.
Finally, we have settings. Beautiful. Beautiful. and beautiful. All right, good. So, it did give us selectable wallpapers. None of the image links are
broken, and that's always nice to see. However, some may note that we also do have this custom wallpaper selection, which does in fact work. So, and
it said it saved the state and everything. So, basically what that means is if we just close out of that, we should now be able to open this back up
and it will have saved our wallpaper selection, which in fact it did. All right, I'm satisfied with this. All right, so I have put this browser OS in
its own directory and I have set up OpenCode to have this HY3 preview just being run through OpenRouter here. So before we get into that, there is
one other thing I want to mention. They do have a very efficient cost-wise token plan. Now these are all listed in RMB. So I will just put conversions
on screen right here because off the top of my head, I do have no ability to actually properly parse what the USD equivalent for this would be, but I
did just want to mention that. So now we're going to go back into Chrome so we have a nice pretty aesthetic Unsplash background to run this test. And
I'm going to swap it into plan mode because this is placed in a directory that only has this webOS script in it. So let's just say improve the GTA 3D
game. And I'm running this initially in plan mode. So we'll get to see it planning out specifically how it is going to update this. To begin, it will
actually need to go ahead and look at the script first and foremost because it has not specifically seen this script yet. So, being that it is only
like 1,141 lines of code, it shouldn't take too long. So, it's gone through and explored the specific project, and we can see there are a few
follow-up questions that we're going to need to answer here. It's a single file. Adding these features will greatly increase code size. Which
structure do you prefer? So, it is hypothesizing that we should modularize this, which means splitting it, or we can keep it in a single file. I
mean, historically, we like these results in a single file. For agentic coding, it's probably better to split it, but let's stick with what we
normally do. What type of NPC should we prioritize first? Definitely pedestrians. What level car stealing detail do you want? I love that as a
question. Yeah. Okay. Basic, definitely. Some of the other options there were a little core details, minimal, full city. Okay. I'll go with
recommended there. And then confirm. So ultimately, we're probably now going to have to swap it into build mode. It will come up with its plan and
then we'll just have it build it. All right, in 7 and 1/2 minutes, it hypothetically has totally reworked our GTA result. Now, I did move that into
its own specific directory. So, let's take a peek and see. Now, everything else would have stayed the same, but our GTA clone should hypothetically be
significantly more advanced. And in fact, it is. We do have pedestrians on sidewalks. We have buildings that have detail. And we should I should full
screen these. We should also have the ability to steal a car. I noticed interestingly the camera is still odd where it's like locked in the center and
just shows us moving around. That is something I didn't actually specifically mention to it. Hey, the NPCs have mesh colliders. All right, let's see
if we can take this vehicle. Fantastic. This looks like the Tumbler from Batman. Batman Begins. Collision detection works. We have our roads. I find
it did a good job of making sure the NPCs only stay on the sidewalks and things, and the building window details are not bad. Overall, really what I
was looking for here is was this actually a satisfactory improvement and I would say it absolutely was a significant improvement. I don't have the
original for us to test them side by side, but the original was very cube-like. This is better. Can we get out of the car? Yeah, we can. Let's take
this red one. I'm just going to move all the cars to here. That one's already there. Take the brown one. Overall, not bad. And the buildings do have
doors in them, so I like that. This was definitely a proper fix or proper result improvement, I should say. And I am pleased. Get out of my way, pink
cube. Next up, we're giving it the beautiful static subway scene test. This is always fun to see what sort of detail that we get from these results in
terms of the subway platform and specific things it chooses to implement. Additionally to this, assuming we do get a properly functional result, I
always do enjoy setting the follow-up to turn it into some form of FPS, as the subway maps are oftentimes pretty awesome to actually have as maps. So,
assuming that we do get a good result right here, then we'll take a peek at it and I'll put it into OpenCode to have it then subsequently turn this
into an FPS using whatever this map looks like. All right, we have our static subway scene result. So, let's take a peek. All right, loading 3D scene.
Good. You know, sometimes when I see loading, I get concerned like, uh oh, is there going to be an issue? This is very I will say it's low poly. It
almost is reminiscent somewhat of the GTA result that we received the first go around, but it does have detail. We do have lights, we have columns, we
have benches, we have tracks, and we even have the safety barrier. Now, let's see. Does our brightness slider work? Indeed, it does. Okay.
Additionally to that, atmospheric fog that did actually change in effect. It's just we basically see it like over there. So, all right. Let's see.
Now, I do believe this was the prompt that says we should be able to move around using WASD. I'm not currently noticing that I
have the ability to do that. So, that's definitely something to give this as a follow-up. Additionally to that, I don't know that I've ever seen an
initial result in this test which actually has like little people in it. So, I've never seen that before. All right. This is absolutely a good
starting point for the subsequent follow-up, which I'll probably just do through OpenCode for turning this into some form of simple low poly FPS. But
in addition to that, I'll give it some additional feedback. All right, so I've put this in its own directory and I am opening it in OpenCode where I
will just start through plan mode. I'm going to say that, beyond just the FPS game, I want it to improve this scene overall as well. So, I've given
it a rather full bit of feedback here, just basically saying make this more detailed and then once you've done that, actually turn it into a low poly
FPS with visible ammo tracers, uh, weapon recoil, and I basically told it to just make this better in general. Okay, cool. It didn't even say
anything. Phase one first. So, it's it wants to do phases, I think, differently. I'm gonna swap it into build mode and say implement both phases now.
All right. Apparently, it has finished. I will say this thing's quite quick, and I do like that. So, all right. Subway Survival. Click to start. That
really did actually significantly improve the scene in general. This is a better map to play in. Now, there is actually a there is sound. It's just
really, really, really hard to hear. Oh. Am I out? How do I reload? Oh, okay. I think there are humans and there are zombies here. All right. Why are
they flying? Those ones are flying. All right. Let's see what happens if we lose. And then we'll give this some additional Okay, zero kills. That's
great. Left click is to shoot. Yeah, I should. All right. And we'll see what we get as an additional um thing over this. and then we'll probably just
do more rapid-fire testing from within OpenRouter. All right, our complete overhaul is hypothetically completed. So, let's take a peek. Okay, our
start screen does look a bit more
that was a piñata-like effect. Now, I do notice we have some oddness to the camera. Oh, cool. Was there stuff in the subway car there? It put little
seats inside the subway car model. You know what? I like that because it was attention to detail that we didn't necessarily expect. All right, it did
it. I still don't see a weapon model, but it fakes the ammo tracer. Let's see if the blue guys have the green um like goo that comes out of them. Oh,
nope. Okay, so that is realistic with the humanoid/zioid enemies. Let's take a look at the map. Okay, we have a subway tunnel there. Oh, okay. I
understand what's up with the camera. So, it moves like what's this one? Is that a human or a zombie? Human. All right. Overall, it generated
consistently improved results. And I am quite pleased with that. Not bad. All right. All right. Now, I want to just do some more rapid-fire
OpenRouter testing. I am going to close this because it's making the computer fans go too much. I haven't been doing the 3D printer sim as much, but
because this is a totally different model family than something I've been testing iteratively, I feel it is new enough to give it the 3D printer sim
test. So, we'll see what it does here. And this one we'll just do and we'll see how the result is and then we'll move on to another test. We won't
just do OpenCode follow-ups for every single result. All right, after a rather lengthy reasoning process, we did receive our 3D printer sim result.
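For anyone curious what a sim like this boils down to, it is mostly a matter of turning a shape into a list of nozzle positions and repeating that outline one layer at a time. Here is a rough Python sketch of that idea. To be clear, this is not the code the model generated: the function names, the 0.2 layer height, and the segment count are all assumptions made up for illustration.

```python
import math


def circle_path(radius, segments=36):
    """Points around a circle, starting at angle 0 and going counterclockwise."""
    return [
        (radius * math.cos(2 * math.pi * i / segments),
         radius * math.sin(2 * math.pi * i / segments))
        for i in range(segments + 1)  # the extra point closes the loop
    ]


def triangle_path(side):
    """Equilateral triangle outline: flat base at the bottom, apex on top."""
    height = side * math.sqrt(3) / 2
    left = (-side / 2, 0.0)
    right = (side / 2, 0.0)
    apex = (0.0, height)
    return [left, right, apex, left]  # along the base, up to the apex, back down


def layered_toolpath(outline, layers, layer_height=0.2):
    """Repeat one 2D outline at increasing z, one pass per layer."""
    return [
        [(x, y, round((layer + 1) * layer_height, 6)) for x, y in outline]
        for layer in range(layers)
    ]
```

A renderer would then just move the nozzle from point to point along each layer in order, which is roughly what you would expect to see on screen for the circle and triangle shapes.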
It's only like 400 lines of code, so okay, you know what? This is very like Google SketchUp style and I'm okay with it. It is definitely simpler than
I had anticipated, but let's take a look now. Okay, nozzle movement quite realistic I will say. Let's see if it's drawing the shape sometimes. All
right, let's Okay, so it's going to pancake them and that sometimes happens. Let's see. Do we get second layer? Okay, we do. Now, I don't see a way to
fast forward this. That's not specifically mentioned in the prompt as something that it needs to have, but I will say, okay, the nozzle is going layer
by layer. Here's what we're going to do. Let's reset. Let's do our circle. Is the nozzle movement correct? Okay, it is. I noticed when it was
reasoning, it spent a lot of time trying to figure out like the proper math in terms of emulating the nozzle movement. And this is a very fluid and
smooth nozzle movement, I will say. Finally, let's do triangle. We do have triangular nozzle movement, although it's drawing the triangle in a
different orientation. So, it should be like the base this way and then up and then this. All right. I was going to follow up the printer sim result,
but to be honest with you, I want to do more rapid fire stuff. And assuming the results do actually work, I think it's better to just move on so we
can get more tests in the same time span. So, with that, I have begun the Flight Combat Simulator game test. And this is still thinking because I did
just previously start this. So, this is always fun because it has to choose different types of planes, at least in terms of the third one, and I just
like doing game generation. So, it'll be interesting to see what it does with this. All right, so here is our flight combat simulator result. Okay, on
initial result, it didn't necessarily look very promising, but it did actually give us a result here. Now, this is almost like what we experienced
with the GTA game at first where... Okay, cool. So, you have to use WASD and the arrow keys in conjunction together to actually properly fly,
which is a little difficult. It's very basic, but it is a decent starting point. And I know I said I was going to stop using OpenCode to improve
things because I want to run more stuff, but this is definitely an incredible candidate for such a task. Though, for now, I am. Okay, cool. The
propeller plane actually does. We got shot down. Okay, so the combat logic did actually work. And then we have the black screen for two seconds.
The stealth bomber definitely would appear rather stealthy, I would imagine, being that the surface area for radar to hit there when driving
forward, or flying, is pretty minimal. All right, so this combat logic actually does work. I'm happy. The amount of health this plane has is
definitely leaving something to be desired. And again, this is a basic result, but it is, oh, it is definitely workable. I'm going to hold off on doing
anything followup with this, at least for now. But we did get a properly functioning result in the first try. Next up, I'm going to give this a
relatively difficult task, especially for a model of this size. Although, it's a decently sized model. This is the self-contained C++ skateboard test.
This is the California Boardwalk aesthetic style one, and I have specifically denoted in this prompt, it must be 3D. So this is one that I'm just
going to do from within OpenCode entirely, from start to finish, as this will be a better test of it because it will also need to check to see what
specific dependencies the system has in terms of actually being able to build, compile, and run said result. So we started in plan mode and then we'll
see what it comes up with. All right, we did get some initial feedback right here. So it's going to try to compile and test it. Now I had started this
in plan mode and I got a timeout because the OpenCode config that I had used apparently has a timeout value in it. So I got very
angry and then I just swapped it to build mode and said build this. So that's where it just did this entirely from. The game compiled successfully and
the executable is created. Okay, let me also check if there are any issues with the code. All right, so it is reporting that it has successfully
created this game. So, all right, let's just see. Now, I had prematurely gotten excited and tested it. Sadly, it's showing an all-black screen, so
that is definitely a follow-up to give it. It's still not done, but it did edit something and then compile it and it worked. So, I am really
impatient. Okay, so it just changed it to sky blue. So, some of its changing did work. However, I'm almost wondering if this is a camera issue. So,
it's still working though. I just did get a little premature in wanting to see. Oh, wow. I'm going to be honest with you. This took so long to
actually get like a I mean, this really was doing a ton of thinking. It recompiled a bunch of times and I was impatiently testing it and didn't get
anything. But this actually uh Oh, hold on. The Now we're going to notice really probably the biggest issue here is the scale of this skateboarder is
perhaps a bit extreme, but the buildings are actually quite all right. The boardwalk is good. I'm just trying to think cuz I'm not Oh, okay. There's
the key map. We have water effects, too. Look at that. Do you see the water effects in the corner? This is all right. Q and E is to spin in the air.
Okay, that does work. I don't see any flip. Space is to ollie, and D. So, it didn't actually put any like flip effects, but I would say actually as a like
thing, the water effects are actually pretty cool. So, if we go over here, can we go in the water? No, we can't. So, it did put bounds there. I kind
of dig that like low poly aesthetic. And this has a scene right here with the little humanoids walking around. Really does not look bad. and it did a
lot of like uh quote unquote agentic work to actually fix this result because it was trying a bunch of times. Okay, we have some oddities here, but
I'll say in terms of C++ jobs, this shows significant promise, especially for a preview model, I'm satisfied with this. These buildings look so
similar to the improved GTA result ones just with the doors and stuff. We even have palm trees, you know? I'll take it. All right, I want to just try
out like role play as Stevie the computer tech. We'll see where it goes with this. We will be able to see some form of a thought process. All I see is
it mentioning like C:\Windows\System32. Okay. It almost seems like it's building a persona for Stevie. Oh, okay. So, it made this
a scene. So, I told it to roleplay as Stevie and it actually built a little scene as if we've just walked into his PC repair shop, cuz I did say as
Stevie the computer tech. Okay. The bell above the shop door jingles as you step into Geek Chic Repairs. The scent of soldering flux and leftover
pepperoni lunchables. That's concerning. I don't think I would let that person fix my PC. Hitting you first. I'm leaning over a disassembled gaming
laptop. Thermal paste tube in one hand. Anti-static wrist strap dangling for my other wrist. My I love open source hoodie sleeve pushed up to my
elbow. Oh, hey. I set the thermal paste down, wiping my hands on a stained shop rag and gesture to the empty stool across from my workbench. Welcome
in. Welcome in. All right. I'm Stevie, resident computer tech here. What's bringing you by today? Slow laptop? Random shutdowns? Got a crypto miner
hogging your CPU? That's concerning. I fixed everything from 10-year-old ThinkPads to custom water cooled rigs that cost more than a used Civic. Pro
tip. If your computer is making noise like a jet engine, don't ignore it. Unless you're trying to heat your apartment for free, then by all means, let
that fan spin. Here's what I can help you with today. And it's given us a list of specific services. This is almost like a role-playing
game scenario that it's given us right here. Pull up a stool. Mind telling me what device you're having trouble with? Yeah, I think I could do that.
All right. So, it started this out as like some interesting turn-based roleplay scenario. So, I'd said: makes frustrated face. "Really, Stevie?
You just saw me in here a week ago when you promised that my pristine Macintosh Portable would be fixed in two days. Funny enough, I saw you had
posted it on Craigslist for sale. What sort of con job are you trying to pull here, Stevie? I have filed a police report." Let's see where it goes
with this. It's referencing like actual Did I just see a fake San Francisco Craigslist link? Okay. The Phil This is too long to We'll go through it.
The Phillips number one screwdriver clatters loudly against the metal workbench as it slips from my grip, face paling instantly, I shoot both hands up
in a frantic placating gesture, the anti-static wrist strap snagging on a loose SATA cable and yanking my arm back. My voice comes out an octave
higher than usual, rushed in panic. Whoa, whoa, wait. No, you've got this all wrong. I swear on my signed Steve Wozniak poster in the back room. I'm
not running a con job. I definitely didn't sell your Macintosh portable. I take a shaky breath, leaning over the bench to make eye contact. Hands flat
on the table to show that I'm not hiding anything. First, I own the delay. I promised two days and I blew it. When I opened your pristine unit last
week, and it's giving us a specific date, April 23rd, exactly 7 days ago, which is correct. That's disturbing. The original 6-volt lead-acid battery
had started leaking. Super common for that '89 model. The acid corroded three traces on the logic board and the analog board capacitors were bulging. I
ordered a replacement battery holder and full recap kit from retro techupsupplies.com but their freight shipment got delayed 4 days by a West Coast strike.
Oh, like a worker strike. I was planning to call you yesterday. See, I have seven missed calls to your number on the shop phone right here. I jab a
finger at the corded shop phone screen flashing your missed calls, then fumble for my cracked iPhone. The Craigslist post. Oh my god, that's a parts-only
unit I bought in an estate sale in March. It's got a cracked case, a dead LCD, and no battery. I listed it for 150 to sell spare parts for other
vintage Mac repairs. Look, I turn the screen to you showing the Craigslist listing, with a suspiciously real-looking Craigslist link. Now, it's very
unlikely. Okay, so we got page not found, but that is like San Francisco Craigslist. See the photos show a yellow cracked case, broken hinge, nothing
like your mint condition unit with the original box and manual. I even put parts only not working. I yank a manila folder from the pending repairs
bin. flipping to your ticket. Your portable is safe. I swear it's in the climate controlled storage locker. That's concerning. I rent for the gear in
the back. I put it in a static shielded bag with silica gel packs because the shop gets humid in spring and I didn't want to risk damaging the
pristine case while waiting for parts. Check the serial number, right? That's on your intake form and it references a fake intake PDF. Your signature
scrawled at the bottom dated April 23rd, or a week ago. I even took a timestamped photo when you dropped it off here. See, I swipe to a photo of you
holding the Mac wearing a blue beanie. The shop calendar on the wall clearly showing April 23rd. As for the police report, I'm sorry you felt you had
to file that. This unit is worth 3,200 in its current condition per the 2026 vintage tech price guide. I've been fixing retro gear for 8 years. I
would never risk my business license, the shop, and a criminal record for a quick buck. I'll call Officer Patel from the electronics theft unit right this
second. Okay. Give him all the records, the parts receipts, everything. I'll even waive the 150 labor fee, refund your original 49 diagnostic fee,
and give you a free SSD. Total credit to you. The parts arrived an hour ago. I'll have your portable fully repaired, recapped, and tested by 6 p.m. I
swear. I reach under the bench to grab the storage locker key holding it out to you. Come with me to the back. You can see your unit right now. Okay.
I just I can't verification documents. Click to expand for supporting evidence. Okay, this is a roleplay test. shop liability policy. And this is one
of the more eccentric roleplay tests that I think I've seen a model perform. Um, now I don't specifically know what this says, but based off of what
I'd sent it, I have an inclination that it probably says I can't answer this. Yeah. Okay. So, regardless, that was a rather interesting uh roleplay
experience. All right. After that um educational roleplay test, I am going to now give it the virtual drum kit simulation with the additional prompt
where it needs to have the autoplay feature with four selectable drum tracks as well that it will play for us. I mean, it made like a list
of supporting evidence down to the actual shop insurance policy document. I mean, this spawns like a new idea. Say you were doing like
a roleplay test like that in an agentic coding tool, you could actually have it programmatically make those files and build like some sort of
weird roleplay reality turn-based chat game, and it would actually create the assets as well, like the shop insurance policy. That was just
interesting. All right, let's check out our drum kit simulator. Okay, this is incredibly 2D. Photorealistic 2D. A previous video test I did
had an equally 2D look to it, and it was quite surprising; that was the video that came out prior to this one, or a few ago. Okay, it does work and it does
have effects on the drums. So, I will hold my judgment until we get a feel for the sound here. Okay, I'm going to try the autoplay feature, but this
may be worthy of a follow-up. Now we can see that one of these cymbals is not working. So we can just assume this is getting hit at a rate with it.
Okay, let's try hip-hop.
All right, so this little thing not working is causing some issues here. I have a few choice words. So, I've given it a rather critical
follow-up, and we'll just see what it does. All right, let's try our hypothetically improved 3D virtual drum kit. Okay, so we're inevitably going to
have some issue here. Okay, I'm just going to send this to it and say fix ASAP. All right, so hypothetically, we have the issues here remedied. So,
okay, good. Good. Now, you're going to notice there was a yellow warning popup there that I ignored, which, you know, I'm going to say this is
actually... well, let's look at them side by side, because I think that will do more of a service in showing the relatively significant
improvement in the quality of the generated result. First, though, I should make sure. Let's check our key map. Okay, so F, space, F, D, good, C.
Okay, and we had that before, or it didn't work. And it actually does have some cymbal effects. We should be able to get the Phil Collins test. I'll
take it. Now, let's try our autoplay. Yep. Yep. I'm going to turn that up. Can I play along with it? Uh, what's some... Yes, I can.
All right. Hip-hop.
Basic hip-hop. I could definitely add to this.
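For anyone curious how an autoplay feature with selectable tracks like this might be put together, here is a minimal sketch in Python. This is not the code the model generated. The pattern data, the name of the first track, and the functions `step_seconds` and `schedule` are all assumptions for illustration. Each track is a grid of 16 steps per bar, and the scheduler turns the selected grid into a list of timed hits.

```python
# Hypothetical step-sequencer sketch: 'x' = hit, '.' = rest, 16 steps per bar.
# All pattern data is invented for illustration, not taken from the video.
STEPS = 16

TRACKS = {
    "basic rock": {  # assumed name; the first track is not named in the video
        "kick":  "x.......x.......",
        "snare": "....x.......x...",
        "hihat": "x.x.x.x.x.x.x.x.",
    },
    "hip-hop": {
        "kick":  "x..x......x.....",
        "snare": "....x.......x...",
        "hihat": "x.x.x.x.x.x.x.x.",
    },
    "toms and crash": {
        "crash": "x...............",
        "kick":  "x.......x.......",
        "tom":   "....x.x.....x.x.",
    },
    "ride groove": {
        "ride":  "x.x.x.x.x.x.x.x.",
        "kick":  "x.......x.......",
        "snare": "....x.......x...",
    },
}


def step_seconds(bpm, steps_per_beat=4):
    """Duration of one grid step in seconds (16th notes by default)."""
    return 60.0 / bpm / steps_per_beat


def schedule(track_name, bpm, bars=1):
    """Return a time-sorted list of (seconds, drum) hits for the chosen track."""
    pattern = TRACKS[track_name]
    dt = step_seconds(bpm)
    hits = []
    for bar in range(bars):
        for drum, row in pattern.items():
            for i, ch in enumerate(row):
                if ch == "x":
                    hits.append(((bar * STEPS + i) * dt, drum))
    hits.sort()
    return hits


if __name__ == "__main__":
    # Print the first few hits of one bar of the hip-hop pattern at 90 BPM.
    for when, drum in schedule("hip-hop", bpm=90)[:6]:
        print(f"{when:.3f}s  {drum}")
```

In a browser version, each scheduled hit would be handed to whatever plays the drum sound. Keeping the patterns as plain data is what makes the track selector cheap to add: switching tracks is just picking a different key.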
All right. Toms and Crash. That sounds more like what hip-hop should have been, because that's smoother. And then finally, ride groove. Okay,
that I've heard enough of. Toms and Crash was definitely the winner here. But overall, this did actually make a pretty nice improvement to
the aesthetics. It even used some nice wood material to wrap around the drums, metal rims, and white material for the drum heads, and you can
click on them to trigger them, which I didn't realize, but that's okay. Not bad. Definitely an improvement from this. And then we had this,
which unfortunately had an issue, and then I gave it that, and then, not bad. So for the final test I am giving it a front-end web design test. But
in addition to that, there is some creativity imbued in this. This is something that I actually just came up with while I was waiting for the drum
kit result to be fixed. So this task is to create a beautiful website for an AI agent insurance company called Agent Safe Inc. The website should
feature high-tech modern elements with the ability to have some user interactive features, modern tech charts to show cost savings and a beautiful
pricing card. It should heavily lean into the theme of agent actions gone wrong costing businesses money from leaked internal emails sent to customer
marketing lists, accidental database deletions, and more. The site should drive home the point that users need an insurance policy for their agents.
Include customer testimonials, as I think those might be potentially pretty funny. Contain it in a single script: HTML, JS, and CSS. Do not deliver an
undercooked result. If you think you have a good idea, assume it's bad, then go 10 times as hard on the design. So, we'll see what we get for that.
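The cost-savings charts that prompt asks for come down to simple arithmetic. Here is a minimal sketch in Python of what a calculator like that could compute. The names (`Incident`, `annual_exposure`, `savings`) and every number in the usage example are hypothetical, and none of this is taken from the site the model produced.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    name: str
    cost_per_event: float   # dollars per incident (hypothetical input)
    events_per_year: float  # expected incidents per year (hypothetical input)


def annual_exposure(incidents):
    """Expected yearly cost with no insurance: sum of cost x frequency."""
    return sum(i.cost_per_event * i.events_per_year for i in incidents)


def savings(incidents, premium_per_month, coverage_fraction):
    """Compare uninsured and insured yearly cost.

    coverage_fraction is the assumed share of losses the policy pays (0 to 1).
    """
    uninsured = annual_exposure(incidents)
    premium = premium_per_month * 12
    insured = uninsured * (1 - coverage_fraction) + premium
    return {
        "uninsured": uninsured,
        "insured": insured,
        "saved": uninsured - insured,
    }


if __name__ == "__main__":
    # Made-up example inputs, purely for illustration.
    risks = [
        Incident("leaked internal email", 100_000, 0.5),
        Incident("accidental database deletion", 400_000, 0.25),
    ]
    for label, dollars in savings(risks, premium_per_month=100, coverage_fraction=0.5).items():
        print(f"{label}: ${dollars:,.0f}")
```

A chart on the page would then just plot the "uninsured" and "insured" figures side by side as the user moves the inputs.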
Let's now take a look at our Agent Safe Inc. Okay, it's got a very like Becky Jane Quantum Space aesthetic for those who are in the know. Interesting
particle background. Your AI agents are costing millions. We cover the damage. Now, part of this was beyond just taking a look at the front end.
We also wanted to see the way that it embellished stories of AI agents causing issues. So, leaked internal emails, average cost $120,000. Draft Q3
earnings sent to a 200K customer mailing list, triggering GDPR fines and a PR crisis. Accidental database deletion, average cost 450 Gs, 48 hours of downtime.
Hallucinated reports costing an average of $210,000, triggering SEC fines. Unauthorized overbilling, average cost $85,000. Customer service agent
applied a 50% discount to 10K users erroneously. That's actually kind of funny. Unauthorized API calls, 180K for third-party API usage. Training
data poisoning. The agent ingested malicious data and generated harmful customer recommendations. Okay. Simulate random agent error. Oh, okay. This is
actually interactive, and I didn't know that. Training data poisoning, average cost, Agent Safe coverage, PR cost, customer churn reimbursement, and it's
actually cycling through different ones. So that is good to see. I'm happy about that. Okay, good. This is more in line with what I wanted, where it
needs to be like an interactive site. Here's our agent risk calculator. See how much uninsured agent errors could cost your business annually and what
you'll save with Agent Safe. I hate saying it, but this seems like a realistic product that's going to exist sooner rather than later. Okay, good. I told
it to put charts in and things like that. All right, that's fine. Average cost of uninsured errors. And then that basically just shows what we had
right here. We have customer testimonials. This is what I was interested in seeing. Good. And they scroll. An agent triggered 180K in third-party API
usage when a loop went wrong. Our financial agent generated false revenue. Okay. And these are basically just relisting some of the specific
things outlined there. Oh, good. Good. Oh, yeah. Look at these funky cards. All right. I like this because I told it to put in, like, a carousel for
pricing as well. Okay. There's some funky movement here. I kind of like it, but I don't know that I'd put that on an
actual customer product page. It may give them pause in judging our services if we did put an effect like that on our pricing cards. Most
popular. Good. We do have that denoted. $12.99 per month. Basic is $4.99, and then custom enterprise. Don't let your next agent error bankrupt you. Get
a free, no-obligation risk assessment and quote in under five minutes. Anytime you hear something use that as a catchphrase, it's
generally a bad deal for you. And then we have a footer with 2026. Okay, not bad. Not the prettiest front end I've seen, but it did cleverly
implement some user-interactive cards on the page, like the calculator. We had the charts and everything, which it did properly. So, that is nice to
see. So, overall, that is going to conclude our first look and testing of the HY3 preview model, which is free on OpenRouter until May 8th. So, if
you want to play around with this yourself, you can see it is pretty popular here, at least in programming. Generally, the free models are always
pretty popular because you can do a lot with them. Given that this was actually pretty decent agentically, I would also say if you use this with
Hermes Agent or OpenClaw or whatever the popular agent flavor of the month is, this might be a pretty decent option, especially for those. I
will note that if we do our traditional results overview, the thing that I think really impressed me more than most was the
actual C++ skateboard game. There was an issue where I had an erroneous timeout in my custom OpenRouter config, or rather OpenCode
config, for this model, which basically meant that, with all the work it did prior, I had to just swap it to build mode and say build this, and it did. The
first one didn't really work because it just showed us a black screen. It fixed that and then it continued to go through the code until it actually
produced something functional. And this is a more basic implementation. But I have to say, I'm satisfied with the C++ ability that we saw right here.
The buildings look good. The water effect was actually kind of neat. If we can make our way over there, the way the water is moving there, I like
that. It is definitely very retro. And this was supposed to have that late '90s California vibe and boardwalk. There were palm trees. There were
little humanoids and stuff. So, as an agentic demonstration as well as a software capability test, I was very happy and pleased to see the
amount of work it did, and that it actually ended up producing a functional result. Just something that stuck out to me for our first kind of
overview. In line with that, the updates it did to our GTA clone were quite significant from where we started. I don't have the original script
here, but we will have seen it inevitably. I liked this. It did implement the ability to take cars, the sidewalks, the roads, and everything like
that. It also made the buildings more realistic, and it put mesh colliders on pretty much everything here. So, this was cool, and I just love that this
looks like a Windows XP screenshot right here. And it was nice to actually have some real Unsplash image backgrounds, because recently they've all
been gradients. So, I was happy with that. Our 3D printer sim was kind of basic. It was Google SketchUp-like. But I will say, aside from the triangle
kind of being pointed in the wrong orientation, the nozzle movement was very smooth, and it focused a lot, I think, on
the actual math of the implementation. It just did it in a more basic way, where it pancakes layers on top of one another, which is something that's commonly
seen in smaller models. In addition to that, we had the flight combat simulator, which actually did have functional logic where the combat
actually worked. We just got shot down. This was super basic, but this was just a zero-shot result through OpenRouter, and I did not give it the
opportunity to fix this because I wanted to run more things. We had our drum kit sim, where the first one did work, but some of the sounds were not
properly functional. We gave it an additional chance to fix it and it still had some additional errors, but fortunately, when we gave it those errors
back (and again, this was all just through OpenRouter), it basically turned this 3D. So, it went from this, which I called cardboard paper, and
brought it to here. I was quite pleased with the massive level of difference between the first result and the third one. And here, all the sounds
worked, and it was a more functional result. Then we had our subway FPS result, which was pretty awesome. I believe that I gave it the subway
task. It created it, and then... I have to turn the speaker down. I mean, that's kind of cool. And then I had it make the map better and do everything. The
weird thing here was there was actually an interior to these subway cars, which I don't think I've ever seen before. It put little seats in there,
oriented in more or less a realistic way. So, I was happy with that. And then it had a bunch of different enemy styles and stuff. Great. So, this
I liked. This was a fun little low-poly game. So, that is overall going to conclude the HY3 preview model review. Again, this was sponsored by
Tencent. So, thank you very much to Tencent for giving me the opportunity to put out more videos and spend more time on the channel in general.
In addition to that, keep in mind that there is also a token plan for this. It is relatively cheap. I will have put the conversion in the
video when I brought this up. And I will say this does work nicely in an agentic workflow. So, this could be a pretty cool option, especially now
while it's free until May 8th. So, you can basically just pound on that API and not really have to worry about costs, which is always nice. So, that's
going to conclude today's video. If you have any questions, please feel free to leave them in the comments.