Decision Transformer not learning properly
Hi,
I would be grateful if I could get some help on getting a decision transformer to work for offline learning.
I am trying to model the multiperiod blending problem, for which I have created a custom environment. I have a dataset of 60k state/action pairs which I obtained from a linear solver. I am trying to train the DT on the data but training is extremely slow and the loss decreases only very slightly.
I don't think my environment is particularly hard, and I have obtained some good results with PPO on a simple environment.
For more context, here is my repo: https://github.com/adamelyoumi/BlendingRL; I am using a modified version of experiment.py in the DT repository.
Thank you