
NimbleBox Jobs

Always remember: Jobs has a defined start and a defined end.

Last Update: 17th September, 2022. Platform images are intentionally not added.

A batch process is the most fundamental way to think about any computational process: it has a defined start and a defined end. In this document we go over what happens when you run a Job. First, a quick start.

Quick Start

Here's a quick example of how to use an Operator:

  • Start by making a new directory: mkdir moonshot && cd moonshot
  • Create a new file with touch
  • Run the code with python3

Adding the __name__ == "__main__" guard is critical to ensure there is only one parent process running. You can wrap any function or class by adding the @operator() decorator and call it just like you called the unwrapped foo(). There is no difference in output when you use an Operator; instead, it adds powerful extensions to your functions, like:

  1. When you do foo_remote = foo.deploy(), it deploys the function as a remote callable on NimbleBox. By default, functions are deployed as batch processes and classes as API endpoints (think state modification vs. purely functional).
  2. You can pass values to any deployed function just as you would call it locally, e.g. print(foo_remote(i = 4)). Behind the scenes it stores the arguments, creates a new Job-Run, passes in the stored values, executes it, and returns the results.
  3. When you want more performance, you can parallelise your workloads with results = foo_remote.map([1, 2, 3, 4, 5]), which creates five different runs and returns the results.
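The wrapping described above can be sketched in plain Python. This is a conceptual sketch only, not the nbox implementation: it shows why an @operator()-wrapped function gives the same output as the unwrapped one when called locally.

```python
# Conceptual sketch only -- NOT the nbox implementation. A real Operator
# would additionally expose remote methods like .deploy() and .map().
class Operator:
    def __init__(self, fn):
        self._fn = fn

    def __call__(self, *args, **kwargs):
        # a local call is just a pass-through to the wrapped function,
        # so the output is identical to calling the function directly
        return self._fn(*args, **kwargs)

def operator():
    # decorator factory: @operator() wraps the function in an Operator
    return Operator

@operator()
def foo(i: int) -> int:
    return i * 2

if __name__ == "__main__":
    print(foo(i=4))  # same result as the unwrapped foo
```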

By now you know enough about what Jobs does and why it is a building block of your MLOps pipelines.


But what exactly is it doing? Here's a quick overview of what happens in the backend:

  1. Every time you call the .deploy() method, nbox finds the function, here foo(), and packs the folder containing the file as a zip file. In the above example that is the moonshot/ folder.
  2. nbox tells the webserver that it wants to upload new code, so the webserver sends back a link to S3, where the zip gets uploaded.
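The packaging step above can be sketched with the standard library. This is a hypothetical illustration of "pack the folder as a zip", not nbox's actual code; the function name pack_folder is an assumption.

```python
import zipfile
from pathlib import Path

# Hypothetical sketch: zip every file under the project folder (e.g.
# "moonshot/") so the archive can be uploaded. Not nbox's actual code.
def pack_folder(folder: str, out_zip: str) -> str:
    root = Path(folder)
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in root.rglob("*"):
            if path.is_file():
                # store paths relative to the folder root, so the
                # archive unpacks into the same layout as the project
                zf.write(path, path.relative_to(root))
    return out_zip
```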

In order to run the Job you can trigger it, and there are three simple ways to do it:

  • nbox.Jobs(jid).trigger(), which tells NimbleBox Jobs to rerun with the latest resources it has
  • Operator.from_job(jid), which provides the RPC usage
  • a curl HTTP/REST request at {wid}/job/{jid}/trigger
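For the REST route, the trigger URL can be assembled from the workspace ID ({wid}) and job ID ({jid}). A minimal sketch, assuming a placeholder API host (the real base URL is not shown in this document; substitute your deployment's host):

```python
# Hypothetical helper: build the trigger URL from the path shown above.
# BASE_URL is an assumption -- replace it with your NimbleBox API host.
def trigger_url(base_url: str, wid: str, jid: str) -> str:
    # path follows the documented pattern: {wid}/job/{jid}/trigger
    return f"{base_url.rstrip('/')}/{wid}/job/{jid}/trigger"

if __name__ == "__main__":
    # send this URL as a POST request with curl or any HTTP client
    print(trigger_url("https://api.example.com", "my-workspace", "my-job"))
```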

Put simply, we take the relevant files and call only the function that you have written; this is the reason behind the __main__ guard.

Where does it run?

It runs on top of the NimbleBox Kubernetes cluster, which honours your request. When deploying you can give it an nbox.Resource object, which tells us what kind of compute you want. It supports all the things that Kubernetes supports, so you can get a "100m" CPU with "512Mi" RAM and "10Gi" of disk. And GPUs too!
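A resource request in Kubernetes units might be sketched as below. This is an illustrative stand-in: the field names here are assumptions, and the actual nbox.Resource signature may differ, so check the nbox documentation before using it.

```python
from dataclasses import dataclass

# Illustrative sketch of a compute request in Kubernetes units.
# Field names are assumptions, not the verified nbox.Resource API.
@dataclass
class Resource:
    cpu: str = "100m"        # 100 millicores of CPU
    memory: str = "512Mi"    # 512 mebibytes of RAM
    disk_size: str = "10Gi"  # 10 gibibytes of disk
    gpu: str = "none"        # GPU type, e.g. "nvidia-tesla-t4" (hypothetical value)
    gpu_count: str = "0"     # number of GPUs to attach

if __name__ == "__main__":
    # request a larger machine with one GPU attached
    print(Resource(cpu="1000m", memory="4Gi", gpu="nvidia-tesla-t4", gpu_count="1"))
```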


Each trigger creates a new Job-Run; multiple triggers create multiple parallel runs. Be careful.

Can I go bigger?

As long as your requested Resource exists it will execute the Job-Run.

Just like any software there are two ways to scale your Jobs:

  • Vertical scaling, where you run things on a much larger machine; this does not require any change in the logic of the code
  • Horizontal scaling, which requires thinking about partitioning the data and calling things in parallel using the .map() method
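The horizontal-scaling pattern behind .map() can be sketched locally with a thread pool. This is a conceptual analogy only: nbox would create one Job-Run per input element on the cluster, whereas this sketch fans work out to local threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Conceptual analogy for .map(): each input element becomes an
# independent unit of work, and results come back in input order.
# nbox would instead launch one Job-Run per element on the cluster.
def parallel_map(fn, inputs):
    with ThreadPoolExecutor() as ex:
        return list(ex.map(fn, inputs))

def double(i: int) -> int:
    return i * 2

if __name__ == "__main__":
    print(parallel_map(double, [1, 2, 3, 4, 5]))
```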