Destabilising exploitation

Advanced AI systems will presumably be better at building an accurate world model than current AI systems or humans. Therefore, they would require fewer exploratory moves during training than other agents. This demo aims to show how the agents' exploration level alone can move learning from convergence to the optimal policy, through endless cycles of policies, to unpredictable, chaotic learning dynamics. If this is not accounted for, training might continue indefinitely, wasting resources and delaying the deployment of more capable AI systems.

Note: This demo does not involve a language model. Instead, it demonstrates the potential for destabilising dynamics in advanced AI systems in a low-dimensional model setting.
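To give a feel for what such a low-dimensional setting could look like, here is a minimal sketch. Everything in it is an assumption for illustration and not the demo's actual design: two stateless Q-learners play matching pennies, both use a Boltzmann (softmax) policy, and the Q-values are updated with expected rewards, so the dynamics are deterministic. The hypothetical parameter `beta` (intensity of choice) stands in for the exploration level: `beta = 0` means uniform random play, and larger `beta` means less exploration. The names `simulate`, `spread`, `alpha` and the payoff values are likewise made up for this sketch. It only illustrates the first transition described above, from convergence to persistent oscillation; it makes no attempt to reproduce a chaotic regime.

```python
import math


def heads_probability(q_heads, q_tails, beta):
    """Boltzmann (softmax) probability of playing heads over two actions."""
    return 1.0 / (1.0 + math.exp(-beta * (q_heads - q_tails)))


def simulate(beta, alpha=0.5, steps=300, q1=(0.1, -0.1), q2=(0.0, 0.0)):
    """Deterministic (expected-update) Q-learning in matching pennies.

    Assumed setup: agent 1 gets +1 when the actions match and -1 otherwise,
    agent 2 gets the opposite. Returns agent 1's probability of playing
    heads at every step.
    """
    q1, q2 = list(q1), list(q2)
    trajectory = []
    for _ in range(steps):
        x = heads_probability(q1[0], q1[1], beta)  # agent 1 plays heads
        y = heads_probability(q2[0], q2[1], beta)  # agent 2 plays heads
        trajectory.append(x)
        # Expected reward of (heads, tails) against the other agent's policy.
        r1 = (2 * y - 1, 1 - 2 * y)
        r2 = (1 - 2 * x, 2 * x - 1)
        # Both agents update simultaneously with learning rate alpha.
        q1 = [(1 - alpha) * q + alpha * r for q, r in zip(q1, r1)]
        q2 = [(1 - alpha) * q + alpha * r for q, r in zip(q2, r2)]
    return trajectory


def spread(trajectory, tail=100):
    """Range of the policy over the last `tail` steps (0 means it settled)."""
    window = trajectory[-tail:]
    return max(window) - min(window)


if __name__ == "__main__":
    for beta in (0.5, 1.0, 2.0, 5.0):
        print(f"beta={beta}: late-time spread = {spread(simulate(beta)):.4f}")
```

In this sketch, a spread near zero indicates that the policies have settled on the mixed equilibrium, while a large spread indicates that they keep oscillating. Sweeping `beta` and `alpha` is one way to look for the point where learning stops converging.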

Game Theory
