We find that, just as a large transformer model trained on language can generate coherent text, exactly the same model trained on pixel sequences can generate coherent image completions and samples.
We’re introducing Jukebox, a neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles.
Solving Rubik’s Cube with a Robot Hand
We’ve trained a pair of neural networks to solve the Rubik’s Cube with a human-like robot hand.
We’re releasing environments and tools for measuring progress towards reinforcement learning agents that respect safety constraints while training.
Emergent Tool Use from Multi-Agent Interaction
We’ve observed agents discovering progressively more complex tool use while playing a simple game of hide-and-seek.
We provide stipends and mentorship to individuals from underrepresented groups to study deep learning full-time for 3 months and open-source a project.
We’ve created, in collaboration with Google researchers, a new technique for visualizing what interactions between neurons can represent.
Better Language Models and Their Implications
We’ve trained a large-scale unsupervised language model which generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question answering, and summarization.
We offer a compensated 6-month apprenticeship for people who want to become AI researchers but do not have a formal background in the field.
We’ve trained a human-like robot hand to manipulate physical objects with unprecedented dexterity.
Improving Language Understanding with Unsupervised Learning
We’ve obtained state-of-the-art results on a suite of diverse language tasks with a scalable, task-agnostic system, which we’re also releasing.
We’re launching a transfer learning contest that measures a reinforcement learning algorithm’s ability to generalize from previous experience.
We’re releasing the full version of Gym Retro, a platform for reinforcement learning research on games.
Ingredients for Robotics Research
We’re releasing eight simulated robotics environments and a Baselines implementation of Hindsight Experience Replay, all developed for our research over the past year.
Reptile: A Scalable Meta-Learning Algorithm
We’ve developed a simple meta-learning algorithm called Reptile which works by repeatedly sampling a task, performing stochastic gradient descent on it, and updating the initial parameters towards the final parameters learned on that task.
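The loop described above — sample a task, run SGD on it, then move the initialization towards the adapted weights — can be sketched in a few lines. This is a minimal illustration on a toy sine-regression task; the task family, model, and hyperparameters here are this sketch's own assumptions, not details from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    # Illustrative task family: regress onto a sine wave with a
    # randomly drawn amplitude and phase.
    amplitude = rng.uniform(0.1, 5.0)
    phase = rng.uniform(0.0, np.pi)
    return amplitude, phase

def sgd_on_task(params, task, inner_steps=32, inner_lr=0.02):
    """A few steps of SGD on one task; returns the task-adapted weights."""
    amplitude, phase = task
    w = params.copy()
    for _ in range(inner_steps):
        x = rng.uniform(-5.0, 5.0, size=10)
        y = amplitude * np.sin(x + phase)
        # Toy linear model on sine/cosine features (stand-in for a network).
        feats = np.stack([np.sin(x), np.cos(x)], axis=1)
        grad = feats.T @ (feats @ w - y) / len(x)
        w -= inner_lr * grad
    return w

# Reptile outer loop: sample a task, adapt to it with SGD, then nudge
# the initialization towards the adapted weights.
params = np.zeros(2)
meta_lr = 0.1
for _ in range(200):
    task = sample_task()
    adapted = sgd_on_task(params, task)
    params += meta_lr * (adapted - params)
```

Note that the outer update never computes a meta-gradient; it only interpolates between the current initialization and each task's final weights, which is what makes the algorithm so simple.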
OpenAI Baselines: ACKTR & A2C
We’re releasing two new OpenAI Baselines implementations: ACKTR and A2C. A2C is a synchronous, deterministic variant of Asynchronous Advantage Actor Critic (A3C) which we’ve found gives performance equal to A3C’s.
Proximal Policy Optimization
We’re releasing a new class of reinforcement learning algorithms, Proximal Policy Optimization (PPO), which perform comparably or better than state-of-the-art approaches while being much simpler to implement and tune.
Robots that Learn
We’ve created a robotics system, trained entirely in simulation and deployed on a physical robot, which can learn a new task after seeing it done once.
We are releasing Roboschool: open-source software for robot simulation, integrated with OpenAI Gym.
Unsupervised Sentiment Neuron
We’ve developed an unsupervised system which learns an excellent representation of sentiment, despite being trained only to predict the next character in the text of Amazon reviews.
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
We’ve discovered that evolution strategies (ES), an optimization technique that’s been known for decades, rivals the performance of standard reinforcement learning (RL) techniques on modern RL benchmarks, while overcoming many of RL’s inconveniences.
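The core ES loop is short: perturb the parameters with Gaussian noise, score each perturbed copy, and step along the reward-weighted noise, which estimates the gradient of the Gaussian-smoothed objective. A minimal sketch on a toy black-box objective — the objective and all hyperparameters below are illustrative assumptions, not the benchmark setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(theta):
    # Toy black-box objective (illustrative): negative squared distance
    # to a fixed target, so the maximum is at the target itself.
    target = np.array([3.0, -2.0])
    return -np.sum((theta - target) ** 2)

theta = np.zeros(2)              # parameters being optimized
sigma, lr, pop = 0.1, 0.02, 50   # noise scale, step size, population size

for _ in range(300):
    noise = rng.standard_normal((pop, theta.size))
    rewards = np.array([fitness(theta + sigma * n) for n in noise])
    rewards -= rewards.mean()    # baseline subtraction reduces variance
    # Reward-weighted sum of perturbations approximates the gradient
    # of the smoothed fitness.
    theta += lr / (pop * sigma) * noise.T @ rewards
```

Because the loop needs only forward evaluations of `fitness` — no backpropagation and no value function — each worker can run independently, which is what makes ES easy to scale across many machines.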
We’re releasing Universe, a software platform for measuring and training an AI’s general intelligence across the world’s supply of games, websites and other applications.