Best practices for build
1. Save money by using inexpensive hardware to upload your models, write code, and debug.
You can choose a CPU with fewer cores for uploading models, writing code, and debugging. These tasks take time, but how long they take depends little on the hardware, so a lower-specification CPU keeps your costs down without slowing you down. You will, however, need to switch to a more capable hardware configuration when you train your models. Learn more about choosing appropriate hardware configuration here.
2. Set RAM and the auto-shutdown timer just above the required limits to avoid interruption.
For example, if you estimate that your model will train for 22 hours, set the auto-shutdown timer to 24 hours. The timer saves you from unnecessary costs if you forget to turn off your project, and the small margin keeps the instance from shutting down in the middle of training. Likewise, choose RAM slightly above what your workload needs.
3. Choose storage that meets your current requirements and increase it when needed.
Storage space, once added to your instance, cannot be decreased, and storage comes with associated costs. Provisioning only enough storage for the files you need now, and increasing it as your needs grow, helps you avoid running out of space while saving money. Be sure to include the storage required by your libraries when planning your storage needs.
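To size storage before creating an instance, it can help to measure what you already have. The following is a minimal Python sketch, not a NimbleBox feature: the function names and the 20% headroom are assumptions chosen for illustration.

```python
from pathlib import Path


def directory_size_bytes(path):
    """Total size of all regular files under `path`, in bytes."""
    return sum(p.stat().st_size for p in Path(path).rglob("*") if p.is_file())


def estimate_storage_gb(paths, headroom=0.2):
    """Sum the sizes of the given directories and add a safety margin.

    `headroom` is a hypothetical margin (20% by default), not a platform rule.
    """
    total_bytes = sum(directory_size_bytes(p) for p in paths)
    return total_bytes * (1 + headroom) / 1e9
```

Passing both a dataset folder and the folder that holds your installed libraries (for example a virtual environment) gives an estimate that includes the library storage.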
4. Select the pre-emptive option for short training runs to minimize costs.
Pre-emptive instances work well for less-critical workloads where it is acceptable to restart the instance and resume working on your project. For example, if your model needs only short training runs, or while you are coding, a pre-emptive instance might be a good option because it costs about one quarter as much as a dedicated instance.
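The trade-off can be made concrete with a little arithmetic. This Python sketch assumes the four-to-one price ratio described in this tip; the hourly rate and the rework hours are hypothetical numbers, not NimbleBox prices.

```python
def dedicated_cost(hourly_rate, hours):
    """Cost of running on a dedicated instance."""
    return hourly_rate * hours


def preemptive_cost(hourly_rate, hours, rework_hours=0.0, price_ratio=4.0):
    """Cost on a pre-emptive instance.

    `rework_hours` is the extra time spent repeating work lost to interruptions.
    `price_ratio` is the assumed dedicated-to-pre-emptive price ratio.
    """
    return (hourly_rate / price_ratio) * (hours + rework_hours)


# Hypothetical example: an 8-hour job at 1.0 per hour on dedicated hardware,
# with 2 hours of repeated work after interruptions on pre-emptive hardware.
print(dedicated_cost(1.0, 8))                   # 8.0
print(preemptive_cost(1.0, 8, rework_hours=2))  # 2.5
```

Under these assumptions the pre-emptive run stays cheaper unless interruptions force you to repeat several times the original amount of work.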
5. Back up your data on a regular basis to prevent accidental data deletion.
Regular backups let you recover your data and avoid losing your work if something is deleted by accident.
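One simple way to back up a project folder is to write a timestamped archive of it. The sketch below uses only the Python standard library; `backup_directory` and the folder names are hypothetical, and the backup location should sit outside the folder being archived.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_directory(source_dir, backup_root):
    """Create a timestamped zip archive of `source_dir` inside `backup_root`.

    `backup_root` must not be inside `source_dir`, otherwise the archive
    would try to include itself.
    """
    source = Path(source_dir)
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)

    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive_base = backup_root / f"{source.name}-{stamp}"

    # make_archive appends the ".zip" extension and returns the final path.
    archive_path = shutil.make_archive(str(archive_base), "zip", root_dir=source)
    return Path(archive_path)
```

Calling this function on a schedule, and copying the resulting archive off the instance, gives you a point in time to restore from.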
6. For large datasets, upload the dataset in batches.
- Use curl or wget to download data directly from an online source.
- Use SCP to transfer files over SSH.
- Upload a zipped file, and then unzip it on NimbleBox.
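Splitting a dataset into batches before uploading can be sketched as follows. This is an illustrative Python example that uses only the standard library; the function names, the archive naming scheme, and the size limit are assumptions, and the output folder should sit outside the dataset folder.

```python
import zipfile
from pathlib import Path


def plan_batches(files, max_batch_bytes):
    """Group files into batches whose total size stays under the limit.

    A single file larger than the limit is placed in a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for path in files:
        size = path.stat().st_size
        if current and current_size + size > max_batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


def zip_in_batches(data_dir, out_dir, max_batch_bytes):
    """Write one zip archive per batch and return the archive paths."""
    data_dir, out_dir = Path(data_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in data_dir.rglob("*") if p.is_file())

    archives = []
    for index, batch in enumerate(plan_batches(files, max_batch_bytes), start=1):
        archive = out_dir / f"batch_{index:03d}.zip"
        with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for path in batch:
                # Store paths relative to the dataset root so the folder
                # layout is restored when the archive is extracted.
                zf.write(path, arcname=path.relative_to(data_dir).as_posix())
        archives.append(archive)
    return archives
```

Each archive can then be uploaded separately and extracted on the instance, for example with `zipfile.ZipFile(archive).extractall(target_dir)`.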
7. Avoid retraining of models by saving your models periodically.
Training models can take considerable time. If you save your model periodically, you can reload the most recent saved version after an interruption and resume training from there instead of starting over.
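The idea of periodic saving can be sketched in a framework-agnostic way. The example below is a minimal illustration using `pickle`; the `train` loop, the state dictionary, and the `save_every` interval are hypothetical stand-ins for a real training job, and most ML frameworks provide their own checkpoint utilities that you would use instead.

```python
import os
import pickle
from pathlib import Path


def save_checkpoint(state, path):
    """Write the state to a temporary file, then rename it into place.

    The rename avoids leaving a half-written checkpoint if the process
    is interrupted during the save.
    """
    path = Path(path)
    tmp_path = path.with_suffix(path.suffix + ".tmp")
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    path = Path(path)
    if not path.exists():
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def train(total_steps, checkpoint_path, save_every=100):
    """Toy training loop that resumes from the last checkpoint if present."""
    state = load_checkpoint(checkpoint_path, {"step": 0, "weights": 0.0})
    while state["step"] < total_steps:
        state["weights"] += 0.1  # placeholder for a real training step
        state["step"] += 1
        if state["step"] % save_every == 0:
            save_checkpoint(state, checkpoint_path)
    save_checkpoint(state, checkpoint_path)  # always save the final state
    return state
```

If the instance is interrupted, calling `train` again with the same checkpoint path continues from the last saved step, so at most `save_every` steps are repeated.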
8. Use tmux to keep model training running if your terminal disconnects.
When you train models over SSH, tmux keeps the training process running on the instance even if your connection drops, so a disconnection does not end the run. This article will help you to get started with tmux.
9. Clear cache after installation of any custom library.
Installing custom libraries uses your instance's storage twice: once for the libraries themselves and once for the installer's cache. Use the following command in the terminal to remove this cache and free up some space:
10. Version your models to avoid rewriting the code.
Templates enable you to create multiple versions of a model, build upon an existing version without rewriting code, and manage environments more efficiently. It is good practice to version your models and share them with your team for collaboration.