A paper describing MineDojo, Nvidia’s generalist AI agent that can perform actions from written prompts in Minecraft, won an Outstanding Datasets and Benchmarks Paper Award at the 2022 NeurIPS (Neural Information Processing Systems) conference, Nvidia revealed on Monday.
To train the MineDojo framework to play Minecraft, researchers fed it 730,000 Minecraft YouTube videos (with more than 2.2 billion words transcribed), 7,000 scraped webpages from the Minecraft wiki, and 340,000 Reddit posts and 6.6 million Reddit comments describing Minecraft gameplay.
From this data, the researchers created a custom transformer model called MineCLIP that associates video clips with specific in-game Minecraft activities. As a result, someone can tell a MineDojo agent what to do in the game using high-level natural language, such as “find a desert pyramid” or “build a nether portal and enter it,” and MineDojo will execute the series of steps necessary to make it happen in the game.
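To make the idea of associating video clips with text prompts more concrete, here is a minimal sketch of similarity-based matching. It is not MineCLIP's actual code or API: the function names and the three-dimensional toy embeddings below are hypothetical, and a real model would produce much larger embeddings from learned video and text encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_matching_prompt(clip_embedding, prompt_embeddings):
    """Return the prompt whose embedding is most similar to the clip's."""
    return max(
        prompt_embeddings,
        key=lambda prompt: cosine_similarity(clip_embedding, prompt_embeddings[prompt]),
    )


# Hypothetical toy embeddings; real ones would come from trained encoders.
PROMPTS = {
    "find a desert pyramid": [0.9, 0.1, 0.0],
    "build a nether portal and enter it": [0.1, 0.8, 0.3],
}
CLIP = [0.2, 0.7, 0.4]  # stand-in for an encoded gameplay clip

if __name__ == "__main__":
    print(best_matching_prompt(CLIP, PROMPTS))
```

In this toy setup, the clip vector points in nearly the same direction as the second prompt's vector, so that prompt is selected. A score like this could, in principle, tell an agent how closely its recent gameplay matches the instruction it was given.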
MineDojo aims to create a flexible agent that can generalize learned actions and apply them to different behaviors in the game. As Nvidia writes, “While researchers have long trained autonomous AI agents in video-game environments such as StarCraft, Dota, and Go, these agents are usually specialists in only a few tasks. So Nvidia researchers turned to Minecraft, the world’s most popular game, to develop a scalable training framework for a generalist agent, one that can successfully execute a wide variety of open-ended tasks.”
The award-winning paper, “MINEDOJO: Building Open-Ended Embodied Agents with Internet-Scale Knowledge,” debuted in June. Its authors include Linxi Fan of Nvidia and Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu, and Anima Anandkumar of various academic institutions.
You can see examples of MineDojo in action on its official website, and the code for MineDojo and MineCLIP is available on GitHub.