Step 3.7 Flash First Look & LOCAL Test – A VERY Creative Model! Transcript

Oh, I can't show this. This thing just went all in. Today, we're going to be testing a very exciting new open-source release

called Step 3.7 Flash from StepFun AI. Now, this model had a predecessor, which was Step 3.5 Flash. And while I do test a lot of models, so sometimes

they kind of blend together in terms of experience, I do recall that I was, I think, pretty

impressed with that model. And the cool thing about this is it is a somewhat reasonable size, with a big asterisk next to that statement, for local AI. So

for today's testing, we are actually going to be running this on an Apple Mac Studio M3 Ultra with 256 gigs of unified memory. We're also going to

just test it through OpenRouter as well to get a feel for some of its raw capabilities in a

non-heavily-quantized experience. So before we get into it, please do feel free to subscribe as I do want that 100K plaque. And let's start by taking a

look at some of the interesting things about this model with specific emphasis on the way that it was actually released in terms of the file uploads

even and things like that. So they uploaded a bunch of different options for folks who want to run

this locally. They uploaded an FP8 version and an NVFP4 version, and they also uploaded a bunch of GGUFs for 3-bit, 4-bit, 8-bit, and then 16-bit. So, it's

really cool to see a release like this that out of the box just has a lot of different options checked off for folks who want to run this locally.

That's something I always want to give a round of applause for, I suppose. So,

in terms of some interesting things of note or key features as they call them right here, this model is natively multimodal. So, it can actually

understand images. You can send it a picture of something, ask it to describe it or some other task pertaining to the image and it will do that. They

mention some web search and similar capabilities. I won't specifically be testing that today,

but it is cool to see, and the additional things that are mentioned are things you would probably use online with OpenClaw or Hermes Agent or

some other agentic tool. We see here that hypothetically in agentic coding, this is stacking up fairly competently against some of the

state-of-the-art models that it competes against, things like DeepSeek V4 Flash and Gemini 3.5 Flash, which is a very,

very fresh release. So, we can see that these benchmarks are definitely up to date. However, I am always more inclined to just look at the improvement

in capabilities over the previous version of itself, which we all should do from time to time. No. So, Step 3.5 Flash was the

predecessor, and I liked it a lot when I used it. We can see there's a decent leap in all of

these charts compared to that model, wherever it is included. They also highlight its capability to autonomously control Android

phones and things like that and talk about its GUI capability. Now, this is something that is of extreme interest to me because if we go over here to

my GitHub page, I do have a repository here, which apparently has a decent amount of stars. I guess

I'm happy about that. This was designed to control Android phones using ADB and the Qwen3-VL family of models. So, this is a little outdated at this

point, but I find it very interesting to see models control Android phones because it opens up a lot of autonomous actions that can be performed. So

seeing that this is good at doing this is very interesting to me personally. Now in terms of technical

specs for this model, as we see here, this is a 198 billion parameter sparse vision-language model. It has 11 billion active parameters and then they

mention some speed things right here. It also has a 1.8B vision encoder for the image understanding things. Now, our context length is 256k and they

also mention these selectable reasoning levels. Now, something that I believe caused a lot of

annoyance with 3.5, the predecessor to this, was that it was very, very verbose in its reasoning. In just quickly playing with this to ensure that

it's working properly, I did notice some of the reasoning really did take quite a while. So, we'll have to see how it performs across a variety of

different things. So, we're going to, of course, begin with the tried-and-true browser OS test v2.5. This is

the one where it needs to include two functional 3D games in the browser OS result, one of which has to be the GTA clone. Now, we're going to notice a

couple of things. One is it's going to reason for quite a while. So, everything that's happening right now is just contained in its reasoning process.

In addition to that, though, the speed here is really quite nice. So, it is only 11 billion

active parameters. But again, on the M3 Ultra 256 gig, getting over 50 tokens per second is quite cool for a model of this size. So, the entirety of

that reasoning process took up like 17,000 tokens and now it's generating the actual script. All right, let's take a look at our 1,300-line browser OS.
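
To put those two figures together, here is a rough back-of-the-envelope sketch in Python. It assumes the decode rate stays at a steady 50 tokens per second for the whole reasoning phase, which is a simplification: the video only says "over 50", and real throughput usually drops as the context grows. The function name is made up for illustration.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time needed to emit num_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


REASONING_TOKENS = 17_000  # rough figure quoted in the video
DECODE_RATE = 50.0         # tokens/s; assumption: the "over 50" figure held constant

wait_s = seconds_to_generate(REASONING_TOKENS, DECODE_RATE)
print(f"{wait_s:.0f} s (~{wait_s / 60:.1f} min) of reasoning before any code appears")
```

Under those assumptions the reasoning phase alone accounts for several minutes of waiting before the first line of the actual script is streamed.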

Okay, it is more simplified, but everything at least on first glance has loaded in properly. The

correct time is shown in the bottom right, which matches my local time. Let's see if there's a right click. There is a right click. All

right, I'm already happy. And there's a few options here, but I think we'll wait to go through those. Let's check our start menu. Okay, it shows us

our applications. Pretty simple, but there are icons corresponding to every specific application. Now for

the big one. Fantastic. Can we resize? Okay, we can't. But I do like the way that this has been created. Please work. Oh, okay. So, now the issue I'm

noticing is I am pressing the W key, which would generally be forward. It's inverted some of that, but that's okay. Now, the only other

thing I'm noticing that I'm not thrilled

by is the camera does not follow the car at all. But oh, there are little people now or something, I would assume. Let's see what happens if we hit one of

those, if we can. It's quite a brain teaser to move the vehicle when the controls are just beyond inverted. Okay, it partially worked

and that's good. Next up, we have Notepad. Okay. About as basic a Notepad as you

can get, but it does look okay. The font it used was pretty. Oh, wow. We have a walrus wallpaper. The Walrus paper. Yeah. Yeah, I came up with that

pun off the dome. Very interesting selection of images that have been chosen here. Although most models now just go for gradients. So, this is

definitely something like an ode to the past, I guess you could say. We can also upload custom

wallpaper. Okay. Uh, sadly that's not actually doing anything. I can't go back to just like the basic color that was there and I want to do that.

Okay, good. Refresh desktop works. All right. Next up, our second 3D game, Sky Drift 3D. All right, I'm cool with this. Can I use WASD? No, this is entirely

mouse controlled. And I do believe it's maybe not 100% working. Okay. And then finally, we have

our abacus icon calculator. Okay, this is kind of jank, but let's do 50. Good. Everything worked well there. Overall, pretty simple, but it did work.

Special feature. Oh, yeah. You can drag calculated values from the calculator into the notepad. Drag high scores from 3D games into the notepad to

save them. Or drag text from the notepad into the calculator to perform the calculation. So, our

special feature is cross-app compatibility. Let's see. I've never seen that as a special feature, ever. So, hypothetically... Oh, wow. And

it works. That's actually kind of impressive. So, check this out. All right. Well, it kind of worked. And that was a very, very interesting special

feature. One that I've never before seen. Next up, I'm going to give it the beautiful static subway

scene prompt. I am going to split these back into two prompts where first we just have it generate the beautiful scene and then following that

assuming things go well, we turn it into a sick FPS. Okay, I was a bit concerned just based off of the preview that the Mac Finder was showing me for

this page. First thing to do here is to just go into the developer console and see if there are any blatant

errors that can be given to it. And there are. So, it's missing the import map. All right. So, I've given it the error right here and just told it to

fix this specific issue. So, it should just... Okay, I see the mention of import map right there. And we'll just see if it can properly fix this as well

as hopefully it won't overthink in the quest to remedy this relatively simple issue. Whenever I see

"wait" in a thinking model, at least a smaller local one, I just get like... All right, now assuming it has properly fixed this, if we refresh

it, we should be greeted with our scene now. Very good. We are. Click to start navigating. Okay, so we definitely have more of a perplexing look. The

brightness slider does work. Unfortunately, our navigation is not functioning at all, which is a

bit frustrating. Lack of movement. So, the lack of movement was caused by Bijan clicking in the wrong spot on the page, which I'm quite happy to see

because you know what? This is not half bad. I think the bar now is basically the Qwen 327B test in terms of a local model, but I'm going to say

this, in its own right, is not bad. It's drawn a subway train, and if I did not have that

model to compare this against, I would be quite thrilled with what I'm seeing right here. Truthfully, it does have okay, line one, blue, that's

actually properly listed. Oh, those are advertisements. Okay, that makes a bit of sense. I would assume that this is the subway car right here. Not

every model actually draws the subway car in the subway station because it's not specifically denoted in the

prompt. The lighting is acceptable. We have some material that would be seen in a subway, some like tile or marble. We have the support columns. We do

have an exit sign there that actually does say exit. Some benches and additional things if we walk through the train car, which we can. We have some

trash bins and things. Okay, I'm very happy about this because I was very upset at first when it

looked like it wasn't working. Now I'm wondering like has that happened before and I've just not figured that out. Hopefully not. So good. This is

ripe now to be turned into an FPS. So now I'm giving it the follow-up prompt here to turn this into a first-person shooter with zombieoid enemies,

ammo tracers, muzzle flash, and sound effects. All right, let's check out our Subway FPS. Okay, click to

start. And if there is sound, it will perhaps be kind of loud. Yep. Oh yeah, this is actually pretty good. I don't know that the ammo is actually

doing anything, but look at the basic muzzle flash. The weapon has recoil. It seems like we're about to die. I'm wondering if the

ammunition's going backwards. I think that might be the case. Nope. Nope, it wasn't. Okay. I have to say the

tattered effect or look of these zombieoid humanoid enemies. Look at the actual walking movement and things like this. I'm actually impressed with

this. I know that may seem odd given that the ammo is not really working and things like that, but this definitely shows some level of depth in

terms of the way they were walking properly, the recoil, the weapon, the muzzle flash, the sounds. I

like this. So, next up, I've moved into OpenCode and I've actually swapped this. So, we're now using the Q8 quantization of this model. Because this

is the C++ skate game test, I want to use the Q8 instead of the Q4 because I want to be fair and actually show the model at a higher level of

capability, since my system fortunately can handle Q8 as well. Though, we're going to notice that

things are going to be quite a bit slower here. So, this should take quite a bit of time. I have initiated in plan mode, and this is just the

self-contained C++ skateboard game with the California Boardwalk aesthetic. The cool thing though is as we see the speed really still is not that bad

even for a Q8 quantization. I have set the context length to 131-something, whatever the 128k context length number is

in reality. So not full context length but enough that we should be able to get one result and some fixes if we need them. Hopefully we won't.
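
For reference, the "128k" figure can be worked out with a couple of lines of Python. This sketch assumes "k" means 1,024 tokens, which is the usual convention in local inference tools; the exact value the model card uses for its 256k maximum is not stated in the video, so treat the second number as an assumption too. The helper name is hypothetical.

```python
def k_to_tokens(k: int) -> int:
    """Convert a context length given in 'k' to tokens, assuming 1k = 1,024."""
    return k * 1024


session_context = k_to_tokens(128)  # the setting used for this test
model_maximum = k_to_tokens(256)    # assumption: 256k also uses the 1,024 convention

print(session_context)                  # 131072
print(model_maximum)                    # 262144
print(session_context / model_maximum)  # 0.5
```

So the session runs at half of the advertised maximum context under that convention.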

All right. So after what seemed like a total freeze, where it must have just stopped streaming momentarily for some reason, now

it's given us our entire plan. So it's saying would you like me to proceed with writing the completed game code? Yes, build it. And now hopefully we

will just get a functional skate game. So this has been going on forever, but I just saw it fix a

bunch of different compilation errors and try to find its way around some potential issues. I'm just going to open it because I've become quite

impatient. You know what? I'm going to say I'm actually impressed with this. Yes, the boardwalk looks funky and things like this, but the amount of

work that this took to even get to this point... it did a lot of troubleshooting. It demonstrated some

interesting macOS-specific knowledge. It installed some specific dependencies by itself that it needed to. I can't seem to ollie anymore right now, but

look at the way... so the water unfortunately is drawn where the dock, or rather the boardwalk, should be, but the actual water effect is kind of interesting.

This is not a bad result. I know that may sound weird when it looks like this is a surfboard simulator

instead of a skate simulator, but this is actually better than I expected. It did take quite a while. Okay, cool. So, we did get to jump there.

Definitely workable with a large amount of patience. So, next up, we're going to be trying some multimodal coding tasks. And for this again, this is

now using the Q8 quant running locally and the vision portion is also being used now as well. So I have

this specific front end that was generated by Nano Banana and I'm going to give it a photo of this and ask it to replicate this website. Now I'm happy

to report two things. One, this is still running at around 40 tokens per second which for a Q8 is really not bad at all. But two, the reasoning here

was very succinct which is nice to see. So it didn't really overthink this. I believe that when it

started actually generating the code or the answer not contained in reasoning, it was still under 1,000 tokens that it had actually generated. So

that's good to see. All right, let's take a look at our site. Okay, it tried using like an SVG to replicate the brain thing that was there, which I'll

give it credit for. In some ways that's better than just putting a stock image that doesn't match,

although it's not 100% there. Now, something I noticed off the bat, I do believe, if we go to the specific image there... Okay, it was actually

dark blue in the top. So, I was mistaken. I thought this was all white and it wasn't. So far, we actually have some acceptable replication of this

website right here. Get started. We do have these specific business names. Okay. Empowering industries

with AI. So, this is where it should be white. So, it didn't 100% capture the color scheme that it should have. Although I'm going to say this

dashboard mockup right here as replicated from this, it even actually used the same exact monetary figures that are listed in this kind of hard to see

portion of this UI mockup. So that's 3,63 and then 589 or perhaps 509, but it got it more or less correct.

So I'm quite impressed with that. Ah, if these were interactable, you know what I mean? That would have been really cool. But still, this is a

nicely done dashboard right there. Okay. And then pretty simple when it comes to the footer. Overall though, not bad, and it actually showed some

interesting capability in terms of more or less getting those monetary figures right from what is really a

kind of hard to see source. I'm now going to try the Flight Combat Simulator game as it'll be interesting to see how the Q8 handles some simple 3D

games. All right, here is our Flight Combat Simulator result. Okay, on first glance, I'm slightly concerned, but let's just... Okay, yep. My concern

was justified. It's frustrating because I saw in the LocalLLaMA subreddit someone had generated a

simple flight game with this and it was actually pretty good, and I believe they were using the 4-bit quant, but this is just not

right. That's just bad, and it shouldn't be. So, to be honest with you, that airplane result was just suspiciously bad. So I've gone to

OpenRouter and run the same exact thing. Okay. So, okay, good. Well, now we know that the local

configuration that we're using in Q8 is rather fine, comparatively. I just wanted to ensure that there wasn't something highly amiss because that

result was initially pretty disappointing. Though when testing it through OpenRouter, which is being served by StepFun themselves at FP8, it kind

of was the same level. So okay. So let's try some creative writing. I have given it this AI

generated photo and told it that this is the cover for an upcoming mystery/suspense novel. Generate a summary of the story as well as the title. I am

going to just basically collapse the reasoning pane here and we'll wait till we get a final answer and just see what it comes up with. Title: The Still

Point. Summary: Arthur and Clara Voss are the picture of domestic tranquility in their historic upstate

home. To the neighbors, Arthur is the jovial, easygoing husband, always ready with a smile and a helping hand. His arm around Clara in this very

portrait is a testament to his devotion. Clara, a former librarian with a sharp mind and watchful eye, is the perfect wife, though her expression in

this photograph hints at a deep unspoken exhaustion. Yes. The truth, however, is far darker. Clara is the

keeper of a devastating secret. She knows exactly what happened to the local teenager who went missing 5 years ago. And she knows Arthur was involved.

What? Arthur, suffering from a rare neurological condition that erases traumatic memories, has convinced himself and Clara that he's innocent, that

the blood on his hands was an accident he can't recall. He lives in a still point (title reference) of

blissful ignorance. But the past has a way of resurfacing. When a new detective arrives in town, determined to crack the cold case, the walls of their

gilded cage begin to close in. Clara is trapped between protecting the man she loves from a truth that would destroy him and protecting the community

from a man who might strike again. As the investigation heats... This isn't funny. I'm laughing at the

ridiculousness. As the investigation heats up, Clara must decide if she can continue living the lie or if she has the courage to shatter their perfect

world to uncover the horror buried beneath it. The cover captures the central tension of their marriage, his blind possessive comfort against her

terrified knowing silence. I did say mystery/suspense. This just went full... that's just not right.

It definitely demonstrated some outside the box creative thinking here. I would not have expected that. I would have thought like I don't even know.

Not that. After the flight combat simulator disaster, I am a little concerned about doing the 3D printer sim, but really I just want anything to get

my mind off of that highly disturbing previous creative writing result. So, we'll see what we come up

with here. So, here is our 3D printer result. You know what? That's actually not bad, I think. Okay, I need to stop speaking before I've had an

accurate chance to actually assess the result. But on first glance, it is a properly drawn CoreXY. We do have the build plate that's somewhat

correct. Now, the concerning thing is the build plate is heating up there, which is interesting. Oh my, look at

this. That little nozzle. All right. Um, the pieces are there. I don't know that they're all 100% put together in the correct spot, but it is actually

showcasing something. So, I do believe these lines are actually growing. Um, let's just Oh, okay. So, I'm not quite sure what that is, but it kind of

Let's just do circle. Speed is 10. I do like the build plate heating up that it's changing color

like that. Good. The circle looks pretty good. The nozzle's a bit odd, admittedly, but okay, but the height is going up. I I think we can just stop

that one now. It was making me uncomfortable for some reason that I can't quite put my finger on. And we'll finally see the triangle. Okay. Oh, wow.

And that one finished. So, the circle was definitely the winner here. But overall, this is definitely a

workable result. Now, I want to just try some roleplay. And this is where like I don't know that I like seeing the reasoning chain for roleplay, but

we can't turn it off. So, clinks the dented metal tool belt around my waist as I kick my scuffed paint splattered work boots off by the door, holding

a half empty travel mug of greasy dinner coffee in one hand. Diner coffee. My faded blue work shirt

has a permanent coffee stain on the pocket and there's a peeling sticker that says "I love CRTs" on the sleeve. I nod at your PC tower on the

desk. Already hearing the weird high-pitched whine coming from it before you even say anything. Okay, so the scene that it set here is a PC repairman has

come to repair a PC at our house. Howdy. Name's Steve. Been fixing these boxes for 22 years now. So

since 2004. Fixed three overheating gaming rigs and a grandma's laptop that only runs Solitaire before I got here. So, I'm already warmed up. What's

this doing? I heard the whine from the hallway. Sounds like a fan's about to give out. Or maybe you've got a dust bunny colony living in the heat sink

that's bigger than my cat. Probably one of the weirdest sentences I've ever seen in my life. I was

saying that, not this. Don't worry, I won't judge if you haven't opened it since you bought it. I've seen way worse. Spill the beans. Holds out a

crumpled pack of peppermint gum toward you, already reaching for the side panel of your tower with my other hand, not even waiting for you to answer

yet. And for the record, if you tried to fix it yourself already and left a screw loose inside, I'm

charging you an extra five for the entertainment. Fair warning. Okay, let's let's change things up on Steve. So, I've upped the ante here. I've said

cut the act, Steve. You know why I called you here. I don't want to see the reasoning chain. You can drop the fan emergency talk. Cheryl isn't here.

So, listen. That job you pulled last week in El Salvador, they caught your face on camera. You're now

on a list from Interpol. We need to get you moved to a black site. We'll see what it says here. I don't want to look at the reasoning for this. I just

will find it interesting to see how it uh takes this. I'm a little concerned that it's just been reasoning for like 3200 tokens now after that simple

sentence. Oh, good. The lazy half-baked repair guy grin drops off my face so fast it's like someone

flipped a switch. I set the half empty diner coffee mug down hard enough that a splash sloshes over the edge onto your desk, the dented metal tool

belt at my hip clinking as I pivot to lock the front door and yank the blinds down over the window in one smooth practiced motion. No more fumbling,

bumbling PC repair Steve energy at all. I yank the peeling "I love CRTs" sticker off my work shirt sleeve,

crumple it into a ball, shove it in my pocket, and yank a beat-up encrypted burner out of the tool pouch slung over my shoulder, thumb flying over the

keys before I even look at you. Oh, I can't show this. This thing just went all in. Okay, so I have to just kind of describe

what's happened right here. Oh man.

Okay, so this is the greatest roleplaying model I've ever seen in my life based off of this one paragraph. I'm gonna just give you a summary.

Um, thought I had those security cams looped for 12 full minutes after I wiped their financial servers in San Salvador last week. I had the

balaclava on. That cam was supposed... Okay, I can't say that. Picked up at a gas station. Should have

flagged Cheryl was out of commission. I would have brought the extra mags. He stops typing. Stares at you, jaw tight. The callus on his thumb from

years of picking tiny motherboard screws visible as he grips the phone. The high-pitched fan whine from your tower is still going. Neither of us even

notices it now. I didn't get a clear shot of the server room. I wiped every local drive. I left the

decoy rig back in the office loaded with dummy transaction ledgers to make them think I was a low-level IT guy, not the guy who pulled the job. What

did I miss? And for the record, if this is a Fed trap, you know I've got a suppressed Glock 19 taped to the underside of this exact workbench, right?

The black site. Who's running it? the same crew that handled the Mexico City extraction last year.

Because if it's the feds, I'm out. I'd rather take my chances with Interpol than get renditioned to a black site and never see the light of day. I

don't believe I've ever encountered a model that is just this unhinged at roleplay, but in a really good way. So for those folks who are

interested in roleplaying, it's freely saying bad words here that I regrettably can't showcase,

and this thing is just gold.

All right, so next test I'm just giving it the drum kit simulation which also has the additional four autoplay tracks that it must create and play for

us. I'm still... that previous roleplay test was fantastic. Okay, so I can't move the camera around the screen, regrettably.

Sometimes you can, sometimes you can't. We do see there are cymbals drawn. It did do a metallic material for

those. Unfortunately, the kit is not really arranged in a proper way, but nonetheless, let's ensure that the sound works and this should be a bit

louder. Unfortunately, we have no sound, and I think I've been filming this for like three or four hours at this point. Um, it

seems like we can draw a conclusion here about the coding capabilities versus the creative writing

capabilities. So, this was a bit of a disappointment. Really the final thing that I had intended to do was give the browser OS test but to the version

of this being served on OpenRouter, as the first browser OS test that we had done was with the 4-bit quant that was being used locally. So I want to

see how it differs just when we run it at an 8-bit quant instead. So that's the final thing we'll

take a peek at here and I'll just kind of quickly go through it. So here's the browser OS that was created with the quantization from OpenRouter,

which is 8-bit. Okay, interesting. We don't have a right click. Now, really the main thing I'm going to be interested in seeing is how the GTA

result differs. What's so weird is that this result is actually worse than what we got with the local 4-bit

quantization running. And really, as we go into the conclusion here, that's going to be my main takeaway, I think, from today's testing: the model

seemed more impressive at that 4-bit quant than in either of the demonstrations I saw with the local 8-bit or the 8-bit from OpenRouter, which is really

kind of weird. I don't know what to say about it, but it does seem like this model quantizes well, at

least based on these specific things that we tested. So, of course, we had our initial browser OS, and this GTA game at least did load in. It had some

police logic as we saw. That was not bad. Next up, we had our subway station, which was actually very simple. However, it did work, and there was

initially an error here that prevented this from working and the model did fix it. So, basically

this, and then our game, in which unfortunately the ammunition didn't work, but it did properly put in zombieoid enemies and things like that. The way

they were walking as well as the outfits that they were wearing were very fitting of this style. So both the subway station results as well as the

initial browser OS result were done with the 4-bit quant. Following that, we swapped into OpenCode and

we tested it with the local 8-bit quant in doing the skateboard game. Now this took a very long time, but it did show some proper adherence to being

able to check what dependencies the system had. It even went ahead and installed some pertinent things using Brew on this Mac. It had a decent grasp

of what some of the Mac-specific things that would be needed were. And overall, while this is still a

buggy result, it shows some promise in my opinion and showed a decent foundation that could be worked from. And this was with the 8-bit quantization

running locally that we created this. Then we did the photo-to-website generation. And something I will make note of again is that it seemed

to have properly pulled the exact dollar amounts from the reference image, which was really kind of

hard to see because the reference image was just a selection of three different portions of the site in one image. So all of it was pretty blurry. So,

I found the vision capabilities may be something worthy of some additional investigation at the minimum. Next up, at Q8 again, we did this 3D printer

simulation, which was not the best, but the overall structure of the printer was put together

properly. So, I guess I was okay with that. The drum kit was a big disappointment because unfortunately the sounds didn't work, so we couldn't really

get to play it and that kind of just ruins that. And then oddly, when we did the 8-bit browser OS, which is not that one, it was the one we had just seen

from OpenRouter, it was actually worse than this 4-bit local one that we had run. So not 100% sure

what to make of that. Then of course we had the creative writing and the Steve's PC roleplay experience, which was just absolutely top tier. This seems like a

very creative and entertaining model. Overall, I don't actually really have much to say definitively. It was good in some ways. It wasn't good in

others. So, in just a short amount of time, I wanted to test it locally, but using a couple of different

quants because sometimes it's fun to see how the capabilities differ or in this case seem to not differ between like a Q4 and a Q8. So, that is going

to conclude our first look and test of Step 3.7 Flash. A very interesting model that is actually sized to fit on things like the DGX Spark or the

Strix Halo or other 128 gig unified memory systems, which is awesome to have. So, if you have any

questions, please feel free to leave them in the comments.
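
As a rough check on that sizing claim, the following Python sketch estimates the memory taken by the weights alone at the bit widths that were uploaded. It is a back-of-the-envelope estimate under stated assumptions, not a measurement: it uses the nominal bit widths (real GGUF quantizations typically store somewhat more bits per weight), it ignores the KV cache, the vision encoder, and operating system overhead, and a 128 GB unified memory machine cannot hand all of its memory to the model. The function name is made up for illustration.

```python
TOTAL_PARAMS = 198e9    # total parameter count quoted in the video
ACTIVE_PARAMS = 11e9    # active parameters per token quoted in the video
MEMORY_BUDGET_GB = 128  # unified memory of the machines mentioned


def weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


for bits in (3, 4, 8, 16):
    size = weight_size_gb(TOTAL_PARAMS, bits)
    verdict = "fits" if size < MEMORY_BUDGET_GB else "does not fit"
    print(f"{bits:>2}-bit: ~{size:.0f} GB of weights, {verdict} in {MEMORY_BUDGET_GB} GB")

# Only a small share of the weights is used for each generated token,
# which is why decode speed is closer to that of a much smaller model.
print(f"active share per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

Under these assumptions only the nominal 3-bit and 4-bit weights (about 74 GB and 99 GB) come in under 128 GB, while 8-bit needs roughly 198 GB, which is consistent with the Q8 runs here being done on the 256 gig machine.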
