Run a Scikit Learn Job
Here's the code for this specimen on GitHub.
A job is an execution of code with a fixed start and end, unlike a serving, which is an endless process. With that clear, we will run a simple job using sklearn. The example is taken from the sklearn documentation, and we extend it to connect to Relics, which is used to store all the artifacts generated by the job.
Let's start by creating a file main.py and adding the following code to it:
    from nbox import operator

    def bench_k_means(kmeans, name, data, labels):
        """Benchmark to evaluate the KMeans initialization methods.

        Parameters
        ----------
        kmeans : KMeans instance
            A :class:`~sklearn.cluster.KMeans` instance with the initialization
            already set.
        name : str
            Name given to the strategy. It will be used to show the results in a
            table.
        data : ndarray of shape (n_samples, n_features)
            The data to cluster.
        labels : ndarray of shape (n_samples,)
            The labels used to compute the clustering metrics which requires some
            supervision.
        """
        ...

    @operator()
    def benchmark(n_init=5, save_to_relic: bool = False):
        """Benchmark to evaluate the KMeans initialization methods.

        Parameters
        ----------
        n_init : int, default=5
            Number of time the k-means algorithm will be run with different
            centroid seeds. The final results will be the best output of n_init
            consecutive runs in terms of inertia.
        save_to_relic : bool, default=False
            Whether to save the benchmark results to Relic.
        """
        ...
Note that we want to run the benchmark function as a job. It takes a couple of arguments, such as n_init, which determines the number of times the k-means algorithm is run with different centroid seeds. The final result is the best output of n_init consecutive runs in terms of inertia.
It also takes another argument, save_to_relic, which, if passed as True, saves the matplotlib plot to Relics. We will see how to use Relics in the next section.
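To make the role of n_init concrete, here is a minimal, dependency-free sketch of the "best of n_init runs in terms of inertia" idea. It is not sklearn's implementation and not the code in main.py; the helper names (squared_distance, run_kmeans_once, best_of_n_init) and the toy data are assumptions made purely for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def run_kmeans_once(points, k, rng, n_iter=20):
    """One k-means run from a random initialisation.

    Returns (inertia, centroids), where inertia is the sum of squared
    distances from each point to its nearest centroid.
    """
    # Hypothetical initialisation: pick k distinct data points as seeds.
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster,
        # keeping the old centroid if the cluster ended up empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    inertia = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return inertia, centroids


def best_of_n_init(points, k, n_init=5, seed=0):
    """Run k-means n_init times with different seeds and keep the lowest inertia."""
    rng = random.Random(seed)
    runs = [run_kmeans_once(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[0])


if __name__ == "__main__":
    # Toy data: two small, well-separated groups of 2-D points.
    toy = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
    inertia, centroids = best_of_n_init(toy, k=2, n_init=5)
    print(f"best inertia: {inertia:.4f}")
```

A larger n_init costs proportionally more compute but can never return a worse inertia than the first run alone, which is why it is a useful knob to expose as a job argument.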
To deploy this job, all you need to do is run the following single command in a shell terminal. Alternatively, you can use the "Compute Fabric" strategy.
    nbx jobs upload \
      main:benchmark \
      --id "<my_id>" \
      --trigger
We have also ensured that you can pass arguments to the function through the CLI. So if you want to save the resulting image to Relics, simply append --save_to_relic True to the above command.
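As a rough mental model of how trailing flags such as --save_to_relic True can end up as keyword arguments, consider the sketch below. It is an assumption for illustration only and is not how the nbx CLI is actually implemented: parse_cli_kwargs is a hypothetical helper, and benchmark here is a stand-in that only shares the signature of the function in main.py.

```python
import ast


def parse_cli_kwargs(argv):
    """Turn ['--name', 'value', ...] into {'name': value}.

    Hypothetical helper: values that look like Python literals
    ('True', '10', '0.5') are converted; anything else stays a string.
    """
    if len(argv) % 2 != 0:
        raise ValueError("expected '--flag value' pairs")
    kwargs = {}
    for flag, raw in zip(argv[0::2], argv[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected a --flag, got {flag!r}")
        try:
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # not a literal, keep the raw string
        kwargs[flag[2:]] = value
    return kwargs


def benchmark(n_init=5, save_to_relic=False):
    """Stand-in with the tutorial's signature; it just echoes its arguments."""
    return {"n_init": n_init, "save_to_relic": save_to_relic}


if __name__ == "__main__":
    extra_args = ["--save_to_relic", "True"]
    print(benchmark(**parse_cli_kwargs(extra_args)))
```

Whatever the real mechanism is, the practical consequence is the same: the flag names must match the parameter names of the function being run.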
On the dashboard you will see logs that look like this:
    [2023-01-24T07:10:15+0000] [INFO] [auth.py:136] Current workspace id: None (None)
    [2023-01-24T07:10:16+0000] [INFO] [dist.py:49] Workspace Id: wnja9glc
    [2023-01-24T07:10:16+0000] [INFO] [dist.py:61] Tag:
    [2023-01-24T07:10:16+0000] [INFO] [dist.py:87] Looking for init.pkl at hxo6nga7/args_kwargs
    # digits: 10; # samples: 1797; # features 64
    __________________________________________________________________________________
    init        time    inertia  homo   compl  v-meas  ARI    AMI    silhouette
    k-means++   0.637s  69662    0.680  0.719  0.699   0.570  0.695  0.175
    random      0.145s  69707    0.675  0.716  0.694   0.560  0.691  0.167
    PCA-based   0.107s  72686    0.636  0.658  0.647   0.521  0.643  0.147
    [2023-01-24T07:10:20+0000] [INFO] [dist.py:104] Saving output to hxo6nga7/return
    [2023-01-24T07:10:20+0000] [INFO] [dist.py:113] Job hxo6nga7 completed with status 7
In this demo we saw how you can run a job on the NimbleBox platform.