Q-learning Blackjack
The code for this example is available on GitHub.
NimbleBox Projects provide a simple way to manage the complexities of training and deploying your models. In this example we will see how to use NimbleBox Projects to train a Q-learning agent to play Blackjack. Here are the parts we are going to use:
The code for this blog is taken from Farama-Foundation's Gymnasium.
Key Concepts
Here are some things you should know about this project:
🍇 Step 1: Create a new Project
Go to the NimbleBox dashboard and click the "New Project" button to create your project. Once it is created, open it and copy the project ID.
🦾 Step 2: Upload the code
You can get the full code from here. A quick overview of the code:
BlackJackAgent: this class is the single learning agent that contains the q_values table
create_grid and create_plots: helper functions to create the plots
main: contains the code to create an environment and train the agent
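To make the role of the q_values table concrete, here is a minimal sketch of a tabular Q-learning agent. It is not the original BlackJackAgent: the class name BlackjackAgentSketch, its method names, and the default hyperparameters are assumptions chosen for illustration, and it uses only the standard library so it can run without an environment installed.

```python
import random
from collections import defaultdict


class BlackjackAgentSketch:
    """Tabular Q-learning agent (hypothetical stand-in, not the original class)."""

    def __init__(self, n_actions=2, lr=0.01, start_epsilon=1.0,
                 epsilon_decay=0.001, final_epsilon=0.1, discount_factor=0.95):
        # q_values maps an observation (any hashable) to one value per action.
        self.q_values = defaultdict(lambda: [0.0] * n_actions)
        self.n_actions = n_actions
        self.lr = lr
        self.epsilon = start_epsilon
        self.epsilon_decay = epsilon_decay
        self.final_epsilon = final_epsilon
        self.discount_factor = discount_factor
        self.training_error = []  # one temporal-difference error per update

    def get_action(self, obs):
        # Explore with probability epsilon, otherwise act greedily.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        values = self.q_values[obs]
        return values.index(max(values))

    def update(self, obs, action, reward, terminated, next_obs):
        # Terminal states have no future value to bootstrap from.
        future_q = 0.0 if terminated else max(self.q_values[next_obs])
        td_error = reward + self.discount_factor * future_q - self.q_values[obs][action]
        self.q_values[obs][action] += self.lr * td_error
        self.training_error.append(td_error)

    def decay_epsilon(self):
        # Linear decay, floored at final_epsilon.
        self.epsilon = max(self.final_epsilon, self.epsilon - self.epsilon_decay)


# Usage: one update after a winning terminal step from the state
# (player sum 20, dealer shows 10, no usable ace), taking action 0.
agent = BlackjackAgentSketch(lr=0.5, start_epsilon=0.0)
agent.update(obs=(20, 10, False), action=0, reward=1.0, terminated=True, next_obs=None)
print(agent.q_values[(20, 10, False)])  # [0.5, 0.0]
```

The update moves the stored value a fraction lr of the way toward the observed target, which is why a single reward of 1.0 with lr=0.5 yields 0.5.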
Here's all the code we have added:
from nbox import Project

def main():
    ...
    # initialise the project; no need to pass the ID when running on an NBX-Job
    p = Project()
    tracker = p.get_exp_tracker()
    ...
    # log any metrics you want to track
    if ep and (ep+1) % log_every == 0:
        tracker.log({
            'episode': ep+1,
            'reward': float(np.array(env.return_queue).flatten()[-log_every:].mean()),
            'length': float(np.array(env.length_queue).flatten()[-log_every:].mean()),
            'td_error': float(np.mean(agent.training_error[-log_every:])),
            'epsilon': agent.epsilon,
            'lr': agent.lr,
            'discount_factor': agent.discount_factor,
        })
    ...
    # save any plots you want to track; they will be available on the Artifacts page of the Project
    tracker.save_file('./usable_ace.png')
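The logging condition in the snippet above fires once every log_every episodes and reports an average over the most recent window. The sketch below reproduces only that cadence. ListTracker is a hypothetical stand-in that stores records in a list; it is not the nbox tracker, and the returns are made-up values for illustration.

```python
from collections import deque
from statistics import mean


class ListTracker:
    """Hypothetical stand-in for the experiment tracker; it only stores what it is given."""

    def __init__(self):
        self.records = []

    def log(self, metrics):
        self.records.append(metrics)


def log_rolling_metrics(returns, log_every, tracker):
    """Log the mean of the last `log_every` returns at every `log_every`-th episode."""
    history = deque(maxlen=log_every)  # keeps only the most recent window
    for ep, episode_return in enumerate(returns):
        history.append(episode_return)
        # Same cadence as the snippet above: fires at episodes log_every, 2*log_every, ...
        if ep and (ep + 1) % log_every == 0:
            tracker.log({"episode": ep + 1, "reward": float(mean(history))})


tracker = ListTracker()
log_rolling_metrics([1, -1, 1, 1, -1, -1], log_every=2, tracker=tracker)
print(tracker.records)
```

With log_every=2 and six episodes, three records are stored, for episodes 2, 4, and 6, each averaging the two most recent returns.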
🥨 Step 3: Train the agent
To train the agent, we will run the trainer with different parameters. All we need to do is run the following commands:
# say we want to try 4 different combinations of learning rate and epsilon, so we run 4 commands
# you can also pass hardware requirements
nbx projects --id '_pid_' - run trainer:train_model --start_epsilon 1.0 --learning_rate 0.1
nbx projects --id '_pid_' - run trainer:train_model --start_epsilon 1.0 --learning_rate 0.01
nbx projects --id '_pid_' - run trainer:train_model --start_epsilon 0.9 --learning_rate 0.005
nbx projects --id '_pid_' - run trainer:train_model --final_epsilon 0.01 --learning_rate 0.001
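If the list of runs grows, typing each command by hand becomes error-prone. As a rough sketch, the same four command lines can be generated from a list of parameter dictionaries. The helper build_command is hypothetical, '_pid_' remains a placeholder for your project ID, and the script only prints the commands; it does not run them or call NimbleBox.

```python
import shlex

# Hyperparameter combinations taken from the four commands above.
SWEEP = [
    {"start_epsilon": 1.0, "learning_rate": 0.1},
    {"start_epsilon": 1.0, "learning_rate": 0.01},
    {"start_epsilon": 0.9, "learning_rate": 0.005},
    {"final_epsilon": 0.01, "learning_rate": 0.001},
]


def build_command(project_id, params):
    """Return one `nbx projects ... run` command line as a string."""
    parts = ["nbx", "projects", "--id", project_id, "-", "run", "trainer:train_model"]
    for name, value in params.items():
        parts += [f"--{name}", str(value)]
    return shlex.join(parts)  # quotes any argument that needs it


commands = [build_command("_pid_", params) for params in SWEEP]
for command in commands:
    print(command)
```

Keeping the sweep in one data structure makes it easy to see which combinations were tried and to add new ones later.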
Once training is done, this is what the plots look like:
🃏 Bottom line
You can see how easy it is to build, train, and store your agents on NimbleBox.