NimbleBox Jobs
Always remember: a Job has a defined start and a defined end.
Last Update: 17th September, 2022. Platform images are intentionally not added.
A batch process is the most fundamental way to think about any computational process: it has a defined start and a defined end. In this document we go over what happens when you run a job. First, a quick start.
Quick Start
Here's a quick example of how to use an Operator:
- Start by making a new directory with
mkdir moonshot && cd moonshot
- Create a new file and put the following code in it
touch baz.py
from nbox import operator

@operator()
def foo(i: float = 4):
    return i * i

if __name__ == "__main__":
    print(foo())
- Run the code
python3 baz.py
Adding the __name__ == "__main__" guard is critical to ensure there is only one parent process running. You can wrap any function or class by adding the @operator() decorator and call it just like you would call the unwrapped foo(). Using an Operator makes no difference to the output. Instead, it adds powerful extensions to your functions, such as:
- When you call foo_remote = foo.deploy(), it deploys the function as a remote callable on NimbleBox. By default, functions are deployed as batch processes and classes are deployed as API endpoints; think state modification vs. purely functional.
- You can pass values to a deployed function just as you would locally, for example print(foo_remote(i = 4)). Behind the scenes it stores the arguments, creates a new Job-Run, passes the stored values to it, executes it and returns the results.
- When you want more performance, you can parallelise your workloads with results = foo_remote.map([1, 2, 3, 4, 5]), which creates five different runs and returns the results.
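The round trip described for a remote call (store the arguments, start a run, execute, return the results) can be pictured with a purely local sketch. This is not how nbox implements it: run_once is a hypothetical helper, the temporary directory stands in for remote storage, and pickle is an assumed serialisation format.

```python
import pickle
import tempfile
from pathlib import Path


def foo(i: float = 4):
    return i * i


def run_once(fn, workdir: Path, **kwargs):
    """Hypothetical stand-in for one Job-Run: store inputs, execute, store outputs."""
    args_file = workdir / "args.pkl"
    result_file = workdir / "result.pkl"

    # 1. Caller side: store the arguments.
    args_file.write_bytes(pickle.dumps(kwargs))

    # 2. "Run" side: load the stored arguments and execute the function.
    stored = pickle.loads(args_file.read_bytes())
    result_file.write_bytes(pickle.dumps(fn(**stored)))

    # 3. Caller side: read the result back.
    return pickle.loads(result_file.read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    remote_style = run_once(foo, Path(tmp), i=4)

print(remote_style == foo(i=4))
```

The point of the sketch is that the caller only ever sees arguments going in and a result coming out, which is why a deployed function can be called like a local one.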
By now you know enough about what Jobs does and why it is a building block of your MLOps pipelines.
Packaging
But what exactly is it doing? Here's a quick overview of what happens in the backend:
- Every time you call the .deploy() method, nbox finds the function, here foo(), and packs the folder containing the file into a zip file. In the above example that is the moonshot/ folder.
- nbox tells the webserver that it wants to upload new code, and the webserver sends back a link to the S3 location where the zip gets uploaded.
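The packaging step above can be sketched with the standard library alone. This is a minimal illustration under assumptions, not nbox's actual packaging code: zip_folder_of is a hypothetical helper, skipping __pycache__ is a choice made here, and the upload to S3 is left out entirely.

```python
import importlib.util
import inspect
import tempfile
import zipfile
from pathlib import Path


def zip_folder_of(fn, out_zip: Path) -> list:
    """Zip the folder that contains the source file of `fn`; return archived names."""
    folder = Path(inspect.getsourcefile(fn)).resolve().parent
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            # Skip bytecode caches (a choice made for this sketch).
            if path.is_file() and "__pycache__" not in path.parts:
                # Store paths relative to the folder so the archive unpacks cleanly.
                zf.write(path, path.relative_to(folder).as_posix())
        return zf.namelist()


with tempfile.TemporaryDirectory() as tmp:
    # Recreate the quick-start layout: moonshot/baz.py (without the nbox import).
    project = Path(tmp) / "moonshot"
    project.mkdir()
    (project / "baz.py").write_text("def foo(i: float = 4):\n    return i * i\n")

    # Load baz.py as a module so that foo has a real source file on disk.
    spec = importlib.util.spec_from_file_location("baz", project / "baz.py")
    baz = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(baz)

    archived = zip_folder_of(baz.foo, Path(tmp) / "moonshot.zip")

print(archived)
```

Because the whole folder is archived, anything sitting next to the file travels with it, so it is worth keeping the project directory small.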
To run a job you trigger it, and there are three simple ways to do so:
- nbox.Jobs(jid).trigger(), which tells NimbleBox Jobs to rerun the latest resources it has
- Operator.from_job(jid), which provides the RPC usage
- A curl HTTP/REST request to https://app.nimblebox.ai/api/v1/workspace/{wid}/job/{jid}/trigger
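For the HTTP route, the request can be assembled in Python as well as with curl. The sketch below only builds the request and does not send it. The URL template is the one given above; the POST verb, the bearer-token header and the placeholder IDs are assumptions, since this page does not document them.

```python
import urllib.request

TRIGGER_URL = "https://app.nimblebox.ai/api/v1/workspace/{wid}/job/{jid}/trigger"


def build_trigger_request(wid: str, jid: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a trigger request for the given workspace and job."""
    url = TRIGGER_URL.format(wid=wid, jid=jid)
    return urllib.request.Request(
        url,
        method="POST",  # assumption: the HTTP verb is not stated in this document
        headers={"Authorization": f"Bearer {token}"},  # assumption: auth scheme is not documented here
    )


# Placeholder values; substitute your own workspace ID, job ID and token.
req = build_trigger_request("my-workspace", "my-job", "<token>")
print(req.get_method(), req.full_url)

# Sending it would be: urllib.request.urlopen(req)
```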
Put simply, we take the relevant files and call only the function that you have written, which is the reason behind the __main__ guard.
Where does it run?
It runs on top of the NimbleBox Kubernetes cluster, which honours your request. When deploying, you can pass an nbox.Resource object that tells us what kind of compute you want. It supports everything that Kubernetes supports, so you can get "100m" of CPU with "512Mi" of RAM and "10Gi" of disk. And GPUs too! Here's how to do it:
from nbox.nbxlib import astea as A
tea = A.Astea(fname = A.__file__)  # Astea is reached through the module alias A
find_fn = tea.find("Astea")[0]
print(find_fn.index)
# [EXPRESSION <ast.Constant object at 0x10aab1100> 10ca55, ...]
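The quantity strings mentioned in this section ("100m", "512Mi", "10Gi") follow Kubernetes notation: m means thousandths, while Ki, Mi and Gi are powers of 1024. The sketch below converts such strings to plain numbers to show what is being requested. parse_quantity is a hypothetical helper written for this page, not part of nbox or of any Kubernetes client, and it covers only the common suffixes.

```python
from decimal import Decimal

# Common Kubernetes quantity suffixes (binary and decimal); not exhaustive.
_SUFFIXES = {
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40,
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity such as '100m' or '512Mi' to a number of base units."""
    # Try two-letter suffixes first so that 'Mi' is not mistaken for a bare number.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(quantity)


cpu_cores = parse_quantity("100m")    # one tenth of a CPU core
ram_bytes = parse_quantity("512Mi")   # 512 * 1024 * 1024 bytes
disk_bytes = parse_quantity("10Gi")   # 10 * 1024**3 bytes
print(cpu_cores, ram_bytes, disk_bytes)
```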
Each trigger creates a new Job-Run, so multiple triggers create multiple parallel runs. Be careful.
Can I go bigger?
As long as your requested Resource exists, it will execute the Job-Run.
Just like any software, there are two ways to scale your Jobs:
- Vertical scaling, where you run things on a much larger machine. This does not require any change in the logic of the code.
- Horizontal scaling, which requires thinking about partitioning the data and calling things in parallel using the .map() method.
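Horizontal scaling comes down to three steps: partition the data, run the same function on every partition in parallel, and combine the partial results. The sketch below shows the pattern locally. ThreadPoolExecutor.map is only a stand-in for the .map() method of a deployed Operator, and chunked and sum_of_squares are hypothetical names used for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive partitions of at most `size` items."""
    return [items[k:k + size] for k in range(0, len(items), size)]


def sum_of_squares(chunk):
    """The per-partition work; each call is independent of the others."""
    return sum(i * i for i in chunk)


data = list(range(1, 11))
parts = chunked(data, 4)  # three partitions: 4 + 4 + 2 items

# Local stand-in for the parallel fan-out: one worker per partition.
with ThreadPoolExecutor(max_workers=len(parts)) as pool:
    partials = list(pool.map(sum_of_squares, parts))

# Combine the partial results into the final answer.
total = sum(partials)
print(parts, partials, total)
```

The per-partition function must not depend on shared state, otherwise the runs cannot be executed independently of each other.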