Dota 2 Reinforcement Learning
2.3K views -
7 years ago
-
The challenge of creating a bot for Dota 2 lies in the vast amount of information available at every frame and the continuous space of possible actions.
This video shows one model I am testing: I discretize the actions into 8 movement directions plus 1 hold, and use only the (x, y) offsets of the creeps relative to the hero as state input. The objective is to block the creeps as much as possible. Every 5 episodes, I use a hardcoded bot to "bootstrap" the training, in an effort to get the model to learn faster.
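The action discretization and state encoding described above can be sketched as follows. This is an illustrative reconstruction, not code from the linked repo; the function names, the step size, and the direction ordering are all assumptions.

```python
import math

NUM_ACTIONS = 9  # 8 movement directions + 1 hold

def action_to_offset(action: int, step: float = 1.0):
    """Map a discrete action index to a (dx, dy) movement offset.

    Actions 0-7 are 8 evenly spaced compass directions; action 8 is hold.
    """
    if action == 8:  # hold position
        return (0.0, 0.0)
    angle = action * (2 * math.pi / 8)
    return (step * math.cos(angle), step * math.sin(angle))

def make_state(hero_xy, creep_xys):
    """State = (x, y) offsets of each creep relative to the hero, flattened."""
    hx, hy = hero_xy
    state = []
    for cx, cy in creep_xys:
        state.extend([cx - hx, cy - hy])
    return state
```

Keeping the state relative to the hero makes the policy translation-invariant: the model sees the same input wherever on the map the blocking happens.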
I trained this model using actor-critic, with a value network and a policy network. Both networks take the state data as input. The value network outputs a single value estimating the score of a given state, and the policy network outputs 9 values estimating the probability of each action in that state.
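A minimal one-step advantage actor-critic update matching that description might look like the sketch below. Linear function approximators stand in for the actual networks, and every hyperparameter and name here is illustrative, not taken from the repo.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM = 8        # e.g. (x, y) offsets of 4 creeps
NUM_ACTIONS = 9      # 8 directions + 1 hold

W_policy = rng.normal(scale=0.1, size=(NUM_ACTIONS, STATE_DIM))
w_value = np.zeros(STATE_DIM)

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def policy(state):
    return softmax(W_policy @ state)   # 9 action probabilities

def value(state):
    return float(w_value @ state)      # scalar state-value estimate

def update(state, action, reward, next_state, done,
           gamma=0.99, lr_v=1e-2, lr_p=1e-3):
    """One-step advantage actor-critic update (illustrative)."""
    global W_policy, w_value
    target = reward + (0.0 if done else gamma * value(next_state))
    advantage = target - value(state)
    # Critic: move the value estimate toward the TD target.
    w_value += lr_v * advantage * state
    # Actor: policy-gradient step on the log-probability of the taken
    # action, weighted by the advantage.
    probs = policy(state)
    grad_logits = -probs
    grad_logits[action] += 1.0         # d log pi(a|s) / d logits
    W_policy += lr_p * advantage * np.outer(grad_logits, state)
```

The advantage (TD target minus current value estimate) is what couples the two networks: the critic's error signal scales the actor's gradient step.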
Link to Thread: http://dev.dota2.com/showthread.php?t...
Link to Code: https://github.com/BeyondGodlikeBot/C...
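The "every 5 episodes" bootstrapping schedule mentioned above could be as simple as alternating which controller drives the hero; whether the scripted bot's trajectories feed the same update rule is my assumption, and the names below are made up for illustration.

```python
def choose_controller(episode: int) -> str:
    """Every 5th episode, let the hardcoded blocking bot act so its
    trajectories can seed training; otherwise use the learned policy."""
    return "hardcoded_bot" if episode % 5 == 0 else "learned_policy"
```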
Published 1396/05/30 (August 21, 2017)
2,302 views