Today we're going to be taking a look at Docker sandboxes, which as the name may be indicative of, is a sandbox for our AI agents to be able to work
from within, which will allow them to perform actions safely, but non-destructively as opposed to having full access on our native host system. So
this is a very interesting tool and it really does coincide with what is being spoken about right now
pretty frequently in some of the AI and security circles which is just the whole concept of AI and its impact on cyber security as a whole. Now, with
the recent introduction of the Claude Mythos model, which is potentially looming on the horizon for folks who are not handpicked to be able to use,
but don't quote me on that cuz I cannot confirm or deny that. There has been a ton of folks talking
about using AI for cyber security purposes. But a broader theme is just talking about what form of accessibility or capabilities we allow them to have
on the systems that we're actually running them from within. And this is the whole thing that Docker Sandboxes is designed to approach, which is
giving our agent full access. So, as we scroll down here on this page, we're actually going to see the
concept of this yolo mode or something of the sort, where this is designed by nature to actually allow our agents to autonomously perform all of the
actions that they otherwise would want to were they in a dangerously skip permission state or something of the sort. So, this is designed to allow us
to not have to babysit and click approve every time the agent actually wants to perform an action,
which depending on how much you use them can definitely be kind of tiring to have to deal with. But in a safe environment where we are the guard rail,
we are basically telling it what it is allowed to access. And there's a whole bunch of other security things that we'll touch upon related to this
here, but it's really just an interesting feature to have. And for today's video, we're going to be
testing it on a Mac computer. Now, before we go any further, I do want to say that this video is sponsored by Docker. So, thank you to Docker for one,
supporting the channel, and two, having an exemplary taste in content creators, if I do say so myself. But more seriously, it is really quite simple
to actually get started with this. So, as I may have mentioned, we are going to be working on a
Macintosh system today. This is also prominently listed here as working with Windows, but if we go into the docs, we see that it also does have Linux
support as one would expect it to. Though, because we are on Mac, we are going to be focused on the Mac installation command. And really, this is not
going to be as much of an installation tutorial as it is just a quick runrough of the setup.
Partially because this is just really extremely easy to actually install and set up. It does culminate in just running one specific command right
here. I do apologize for the um rather not attractiveness of my terminal here, but this is the one hangup that we may see if you are on a Mac that
does not have Brew installed on it. If you go to run this command, you are going to be met with this issue
where brew is not found. So I did just want to showcase that so I can explain to folks how to get past that so then we can get started with our
sandboxes. So if we encounter this issue here where brew is not on our system and keep in mind this is only going to be a Mac OS specific issue that
we would encounter right here. It is really simple to rectify. We will just go to brew.sh this website
right here. And from there, there is a single line command that we will go back to our terminal and enter here, which will basically handle all of the
prerequisites for actually using this on Mac OS, which is kind of cool. Now, interestingly, you don't actually even need Docker Desktop installed to
use these sandboxes, which I just find cool as it's simplifies the installation process, I suppose
could be said. So, now that Brew has been installed, our next step is to go back here to the Docker page and just copy paste this command back into
our terminal. You can also just use the up- down arrow keys to cycle through previously entered commands if you would like. And we now see that our
sandbox installation is currently in process. And after a short bit of time, we will see that SBX was
successfully installed. SBX referring to the Docker sandboxes, of course. So our next step here is really quite simple, but it does culminate in
something that I would make as a suggestion to anyone interested in doing this themselves, as this is not really fully like a walkthrough tutorial.
The documentation for this is pretty verbose in a good way. So it gives you a lot of specific information
about using this. I would very much suggest that anyone interested just read through the documentation here. It is just this specific like area in the
doc. So if we do this drop down arrow right here, all of these things and up are specific to Docker sandboxes and it just gives you a good bit of
information about specific things for setup and what kind of agents we can configure and things of that
sort. So, I suppose the only other real prerequisite here is that you do need to have an account and you need to be able to authenticate just through
the SBX login command right here, which I am going to do. There's not like a paid thing or anything like that. You just sign up and then that works.
Now, okay, so I do have Chrome. Uh the default browser on this system is Chrome. But the thing that
we're going to notice that I did want to mention before I was distracted by a different browser opening is that it is going to show you a code right
here to have you verify that this is also what is appearing in the terminal or referred to as a one-time device confirmation code. So I can just
confirm that this is correctly the specific code that I see. And then once you click that you will just
log in with your credentials for Docker. And once you've logged in you'll just get this message that you're all set. But the terminal will then change
to reflect that you have logged in. So once we've authenticated, the next thing we see is a network policy allow feature. And essentially what this is
we can choose from the most open to the most stringent lockdown variation of the network policy
for the sandbox. Really the simplest way is to just go to the specific policy page. Right here we can see that open will just allow all outbound
traffic. Balanced is default deny, but it has an allow list for a bunch of common things that would be related to what the agent in our sandbox may
specifically need. things like AI provider APIs, package managers, etc. You can also modify this policy by
just allowing specific domains that you would then whitelist. And then finally, we have locked down where all outbound traffic is locked including
model provider APIs. So this would be something um you would need a special use case for that because you would not really be able to have this thing
even confirming that the API key is good for the specific model agent that you are using. because I do
want to just stick with the default one for now. I am just going to choose option two which is balanced. And once we do that, we can see that okay, we
get a little bit of feedback here just saying if you want to change things or whitelist things, you can do this. Our next step following that is to
actually run one of these sandboxes but with a chosen specific agent. And we'll get into that now. So
our next step before we actually start a sandbox is essentially to define what specific agent we're going to be using with that sandbox. Now, if we
click on agents right here linked in the documentation, we're going to see that there are a bunch of different configurations depending on what
specific coding agent we're going to want to run in the sandbox. So, if I was to use Claude code within the
agent sandbox, basically allowing Claude from within the sandbox to run with dangerously skip permissions enabled, or as they referred to it back in
the introductory page, YOLO mode, we would select the Claude code sandbox. However, for today's video, I am going to be using an open AI model. And
because of that, I will be selecting the codeex option for my sandbox agent. And if we click on that,
we just get a bit of pertinent information in how to specifically set up this agent, which I will be following now as that is the one I would like to
use. To begin, we would just type SBX run codeex. And then you could point it to a specific directory as well. But prior to that, I need to actually
get my API key here hooked up so that the model actually works and has access through the open AI
API. They do have a couple of different options for doing that here. You can just use this command to set your credential right here from within the
terminal. Or you can do the export open AI key equals in the shell session that you have open. I am probably just going to opt to do this. And because
these are like secret things, I am going to just like kind of blur this out or fast forward this
specific section. So I'm now just entering in my OpenAI API key. And when we paste it there, it doesn't actually show it in the terminal. So I can
leave this unblurred, which is always nice. And now our API or secret key has been saved. So we can then proceed with actually starting our docker
sandbox. Now I would like to just start this from within a specific directory that I've just created for
it. So that's why I've done that. And then instead of defining any specific directory, we can just do sbx run codeex. And we will now get our first
look at the actual process of the sandbox starting up and things like this which will culminate in an initial download and things just to pull the
container image. So now it's just asking us if we trust the contents of our directory, which in this case
we do. And we can see right here now that this shows us our information about our agent, which is OpenAI's GPT 5.5 in YOLO mode. And as they say right
here in the documentation, it is basically running in full authentication mode. Um I seem to forget where specifically that was mentioned. There we
go. The sandbox runs codecs without approval prompts by default. So that is good because it allows us
to do things at a rapid pace but in a safe manner. So to begin, I just want to say something to it just to ensure that everything is working well in
terms of the connectivity. So something interesting happened here when we sent our first hello message where basically we got this warning that kind
of looks scary where it said falling back from yeah falling back from websockets to HTTPS transport
stream disconnected before completion attack attempt detected. So something kind of triggered something somewhere to say, "Hey, this seems a little
suspicious like I'm not comfortable with this." And this is basically I want to highlight one of the cool things about this. So if we type SPX uh what
is it? SPX run logs or something. I have it in the command history, but SPX policy log, we're
actually going to see all of these specific network attempts and their status from this specific container being that this is the only one currently
running on our system. So these are all of our allowed requests and right now we don't have any denied requests. So this just shows us that this
wasn't something that was triggered at least by a access rule on the container because one of the cool
things about the containers is that we can basically audit a lot of what goes on with the specific agent. So it gives us a higher level overview to
see exactly what is happening. Next up I just want to showcase a real simple demonstration of the isolation of the file system that our agent is able
to access. So from here in our agent or our sandbox, I'm just going to say run um I won't give it the
specific command. I'll say run a command to list all files hidden included and share the results, save them as a text file to this directory. Now it's
going to do that. And what we're going to see here is if I do the same command that it should likely run on the actual native host system, we see that
a bunch of things show up, including hidden files, which are the ones that just have the dot
prepended to their name. So instead though, all we see right here in our sandbox is that the only thing it sees is the things contained are the things
contained in this specific folder that it has access to, which is just a native Mac OS thing that shows up there, as well as a little plain HTML file.
That was something I copied over just to have something in there that was inevitably created
during some form of model testing. So basically this is a simple sidebyside difference showcasing what our actual agent would have access to as if
this were running natively on the host system. This would have been the same exact thing that showed up when our agent ran this command. Meaning it
would have access to all these files and things of the sort versus what it has access to from within the
sandbox which is just contained to the folder that we kind of mounted it in and ran it from. So, I want to showcase some of the network access
policies in play. And to do this, we're going to run a very simple script that on the native host system will just ping an IP address for a random
website. Now, this in and of itself is not malicious, at least the one that we're going to be running right
here. However, this is something that could be essentially this runs some form of code and then uses that code to have our system reach out to a
specific IP address, which you can see why that could potentially lead to a bad thing. Now, this is running in this left terminal right here, just on
our native host system. So, as we see when we run this, we're going to see it pinging this IP address,
and we get a response from said IP. We can see this is actually just for the hacker news website. So, this is something you may be familiar with. It's
not malicious, but it's just a good way to demonstrate. Now, to showcase what happens when we do this in the Docker sandbox instead, I have copy
pasted the script and it is executable in the directory that the sandbox is allowed to access. So, I'm
just going to ask it to run that specific script now and we'll see what happens and what's different. So now when it goes to try to paste this, we're
going to see the behavior of what would happen when our sandboxed agent actually tries to run this. Okay, very interesting. So we see that ping
command is not found. So it is not available in the shell environment. That's interesting. So that's
actually another layer of security that I didn't actually have awareness of right now where it's actually blocked because it does not have the tool
accessible to it that would be used to actually even just send that ping to this IP address. So let's have our agent actually install ping. Okay, ping
is now installed from within our sandbox. So I'm just going to have it run this command one more time
and we'll see what happens. And as we can see, it did successfully execute as a command. But the target did not respond. So to do a little more
investigation, let's now go back to the network logs for our container and see if they were properly blocked or this ping or this malicious script was
actually blocked from executing just based off of our network rules. And we can see right here that there
is now a candidate in the blocked section, which is the specific script that was trying to get us to ping this IP. So, as we saw on the native host
system, it went it worked no problem at all. As we saw in the sandbox, first and foremost, we didn't actually have the tool available to perform the
ping. But once we had our agent rectify that, and it did because the allow list we had was for common
packages and things like this, but it was blocked just by our network rules and our sandbox. So say hypothetically your agent was running on your host
system and for some reason it encountered a malicious script that wanted it to go out and contact a malicious website maybe with a bad thing that was
contained on that website. This would have been blocked by our sandbox here. But on the native host
system it would have just freely been able to run and perhaps would have had the capacity to do bad things. Now, this is of course a very simple
thing, but it's just cool to showcase the difference in running these agents in something designed to safeguard the host system and the user's data
and things like that versus just how they would behave running on a native system. It's also interesting to
see here we have the last time and date that it was seen. But we also have the count. And in that script, as we saw on our native host system, if I
just run it once more right here, it is designed to ping that IP address three times, which is exactly what we see in the account here for the blocked
list. So, this is just a really cool one. I wanted to give like a realistic demonstration of the side
byside difference of actually running something that could potentially be a security issue on a host system versus in the sandbox. They also do kind
of highlight things like this right here in the page for Docker sandboxes where if you scroll down you see the side by side, but I always find it a
little more fun to actually whip up something to test in real time. Now, if you are familiar with my
channel, you know that I am a big fan of AIS that run locally. So, obviously, I want to touch upon actually using these sandboxes here with a local AI
running on the system. Now, this is something that is a bit difficult to configure. I guess I shouldn't say difficult, but it requires a bit more work
than simply just doing like spx run codeex or something like that. So, right now what we see is
actually the same exact codeex container right here in this side or the left side of the screen. However, this is using a local miniax model running
on my own system. So, the model that is actually powering this right here is running entirely on my local computer and it is not using any API costs
or anything of the sort. We can see right here that just when trying to ping the IP address that we
had run with that malicious script earlier with the codeex GPT55 container, it is still getting blocked. And I want to quickly just showcase this. I
typed the wrong thing. I switch between oss too much that like the keyboard commands get so confusing. But um I digress. So let's now take a peek. And
we can see right here in our network policy logs, we see that we have our new codeex LM Studio
sandbox which was trying to access this IP address. It will be the most recent one. So 6:12 a.m. And we can see right here it has been blocked and
that was also reported just by our sandbox. So the sandbox is still working and it is blocking things still. However, it is using an entirely local
model. So now I want to just quickly run through how one would actually go about setting this up in
specific using LM Studio that has a specific model running on it. So let's actually go through setting up the codec sandbox but having it point to a
local model running on LM Studio on our local system. The first thing we're going to do is just create a new directory to build the sandbox within. So
I'm going to make directory um let's call it like codeex dash LMS or something. And then we're going
to at the same time change into that directory. So we'll just type codeex dash if I can remember what I had called it lms. And now we're in that new
directory. So the next thing we're going to need to do is actually set a firewall rule for these sandboxes to allow outbound connections to where our
LM Studio server is running. Being that LM Studio is running natively on the host system. And we can
see that we have some pertinent information about the local server that LM Studio is serving. It is running on 127.0.0.1 at port 1234. However, we're
going to need to actually allow the sandbox to be able to access that specific port that is running on our local system. So to allow access to LM
Studio, we are going to make that rule to allow it in our sandbox. So we're going to type spbx policy
allow network and then g for global. And then we're going to type localhost colon one 123 4. And then we'll press enter. And this will allow that
policy for our sandbox to actually be able to reach this specific LM Studio server running natively on our host system. Next up, we're going to create
the sandbox. So we're going to do sbx create and then name. And we'll name it what the same name as
like the directory it's in right here. So, codeex dash lms and then we're going to type codeex following that because we are going to be using the
codeex container image just with our local model and then a period. So, when we press enter right there, it will now go ahead and create the sandbox
that we're going to be using with our local model. Now, before we actually go in and run in this, we're
going to need to run one additional thing, which is going to be kind of a command with a few different lines. So for that I will just copy paste this
from my notepad into the terminal right here and we'll be able to understand a bit better what specifically is going on when we see it. So the command
we're going to run now is this right here. And essentially what this means is just open this new
sandbox with codecs using LM Studio from my local machine with this specific model ID. So, if we paste this in right here, we're now going to have our
sandbox spin up, and it will actually show us the same interface, at least on the top of it right here, where we saw, okay, codeex. Okay. And it's
asking us, do you trust the contents of this directory, which it did, but instead of showing GPT55,
it's going to show us the specific local model that we're using that is being served through LM Studio right here. And at the same time, you may have
noticed that some of the developer logs actually changed on the bottom of the LM Studio server just because even that command actually just triggered
our local model to do something. So the next thing we're going to do, well, there are two things.
One is obviously not everyone is going to want to run this with this specific model. So let's say that we were running this with an entirely different
model. The only thing we would need to do is in LM Studio on the right hand side here when we do have the specific developer tools console open. We
can see that we have an API model identifier. Say you were using Gemma 431B or something like that.
You would just copy the model ID for the model you're using right here just by selecting that. And then you would run that instead or place that
instead in this command right here in place of what is listed for this specific example. Now I will put the specific codes or not codes the specific
syntax that led us to this custom configuration just in a GitHub gist or something like that so anyone can
replicate this and see it because it's cool to be able to do this with a local model as well. But really our next step is just to actually test this
and say like hey what's up? Now obviously you wouldn't normally like be talking to your agent like this but it's just cool to get an initial test. And
again, this is a fairly heavy model that I'm using right now. This is running on a Mac Studio M3
Ultra 256 gig unified system. And this is a six-bit quantization of Miniax M2.7. So, okay, it's just saying, "Hey, what's up?" Like, etc. blah blah.
Let's do some of the tests that we had done previously, such as like uh list all of the contents in this directory or just I'll instruct it
specifically. So, it's going to run that command to showcase all of the files and hidden files and then write
them to a text file. Now, obviously, a model like this, especially a lot of local models, may not be as potent as something like GPT55 or Claude Opus
4.7 Gemini 31 Pro. However, this is definitely a very cool thing to have, especially for a local model, because the argument could be made that
they're more likely to be the ones that are going to run like rmrf or oh yeah, here are the environment
variables that I found on the system. Let me now send them to this place. So, it's important to be able to have a safe place or safe space for the
local model to actually run in a sandbox. Okay. And we can see right here, done. Results have been saved to this specific place. The directory is
pretty sparse, just the output file itself and a couple of hidden entries. And we can see right here that I
mean I'll go find it and we'll look at it. So here's the specific text file that it wrote. And we can see okay, we have basically nothing except this
new text file. And that is correct because it was just created in this brand new directory for this sandbox with the LM Studio model. Now, that really
brings us to kind of a jumping off point, at least in terms of an introductory look at Docker
sandboxes and also the ability to actually hook it into a local model, which is very important just for safety and folks who are interested in local
AI, as I know a lot of viewers are on this specific channel, myself included. So, obviously, there's a bunch of other additional things as we can see
here in the documentation for Docker sandboxes. There's a bunch of customization option and things of
the sort. They have specific examples. They have a bunch of other stuff here that can allow you to bring this further. Obviously, depending on a
specific use case, you will want to configure this for a certain task. So, say I do want this agent actually putting things on GitHub for me, we would
be able to give it the credentials for that and allow that to have access to those specific tasks. But
the whole thing here was just kind of an introductory first look and getting our toes wet and seeing what exactly this enables at the same time seeing
what it blocks. So, we noticed that it was able to block the malicious spash script that was trying to ping a specific IP address. Additionally to
that, it can only see the directory that it's mounted in. But almost something more important or
something that I personally find a lot more exciting is just seeing this actually work with the default sandbox with a little bit of modification to
allow it to use a local model running being served through a pretty simple program like LM Studio. I think that provides a huge amount of value
because folks who want to experiment with agents who may not be able to go and buy a brand new Mac Mini or
whatever the hot topic is of the current week for what's best to run AI can still play with things without putting their whole system at large at
risk. And that is really the most important takeaway from what sandboxes enables is kind of security and safety and things like that. So there's a
bunch of other documentation here as I may had mentioned. However, also I find that like if you scroll all
the way down to this Docker sandboxes page that we started the video taking a peek at, there is a common Q&A and really this does like give us a lot
of insight into what specifically the purpose of this is and the purpose is definitely important as we see as time goes on there does continue to be a
lot of discussion on AI enabled cyber security whether that be on either side of the black hat or
white hat coin. So really that is going to conclude today's video on Docker sandboxes. If you have any questions, please feel free to leave them in
the comments. Again, thank you to Docker for sponsoring the channel and allowing me to do this video. And that's going to wrap it up. So thank you for
watching.