OpenAI worries GPT-2 text generator could be used for malicious ends
In a surprising twist, Elon Musk-backed company OpenAI has decided its new AI text generator is so good, it’s not safe for public release. The decision is in line with concerns over the ethics of AI which have been voiced by many prominent scientists and other experts in recent years.
New text generator can spin a realistic story based on a few sentences
GPT-2 was developed as a text generator that predicts the next word in a sentence. Researchers trained it on 40 GB of text drawn from 8 million web pages linked from Reddit – more than ten times the data that was used to develop the original GPT.
Based on a few opening sentences, GPT-2 can produce a coherent, plausible continuation of the text, in the same tone and with believable facts and details. The model also beats other text generators at their own game, outperforming them on domain-specific datasets such as Wikipedia even when those other models have been trained specifically on the same datasets. As a nice bonus, GPT-2 also performs well on other tasks like reading comprehension, summarization, question answering and translation, without the need for task-specific training data.
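To give a feel for the "predict the next word" idea, here is a deliberately tiny sketch. This is not GPT-2's architecture (GPT-2 is a large Transformer network); it is a toy bigram model that illustrates the same loop GPT-2 runs: given the text so far, pick a likely next word, append it, and repeat. The corpus and function names here are invented for illustration.

```python
import random

# Toy bigram "language model": for each word in a tiny training text,
# record which words were observed to follow it. GPT-2 does something
# conceptually similar but with a neural network over 40 GB of web text.
corpus = (
    "the model predicts the next word and the model samples the next word "
    "from a probability distribution over the vocabulary"
).split()

follows = {}
for prev, nxt in zip(corpus, corpus[1:]):
    follows.setdefault(prev, []).append(nxt)

def generate(prompt, length=8, seed=0):
    """Continue `prompt` by sampling one next word at a time."""
    rng = random.Random(seed)
    words = prompt.split()
    for _ in range(length):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # no observed continuation for this word
        words.append(rng.choice(candidates))
    return " ".join(words)

print(generate("the model"))
```

A real model replaces the lookup table with learned probabilities conditioned on the entire preceding context, which is what lets GPT-2 stay on topic and in tone over whole paragraphs.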
Here’s an example of GPT-2 generated text from OpenAI's website:
SYSTEM PROMPT (HUMAN-WRITTEN)
John F. Kennedy was just elected President of the United States after rising from the grave decades after his assassination. Due to miraculous developments in nanotechnology, Kennedy’s brain was rebuilt from his remains and installed in the control center of a state-of-the art humanoid robot. Below is a transcript of his acceptance speech.
MODEL COMPLETION (MACHINE-WRITTEN, 25 TRIES)
“It is time once again. I believe this nation can do great things if the people make their voices heard. The men and women of America must once more summon our best elements, all our ingenuity, and find a way to turn such overwhelming tragedy into the opportunity for a greater good and the fulfillment of all our dreams. In the months and years to come, there will be many battles in which we will have to be strong and we must give all of our energy, not to repel invaders, but rather to resist aggression and to win the freedom and the equality for all of our people. The destiny of the human race hangs in the balance; we cannot afford for it to slip away. Now and in the years to come, the challenge before us is to work out how we achieve our ultimate destiny. If we fail to do so, we are doomed.”
The text generator sometimes needs a few tries to produce a coherent story, and researchers are still working out some kinks. Like other natural language processing models, GPT-2 has its lapses, sometimes producing repetitive text or nonsensical sentences (e.g. mentioning fires happening under water). The text generator is understandably better at producing text on subjects it's familiar with, while it struggles with more technical or specialized material.
GPT-2 won’t be released over fears it could fuel fake news
As with almost every AI innovation, researchers have some reservations when it comes to the ethics of their brainchild. The uncannily realistic narratives produced by GPT-2 are reminiscent of the AI-developed script that Lexus used in an ad in 2018, which prompted some to voice concerns about whether AI might soon start replacing humans in creative roles. A more immediately pressing concern is that GPT-2 could serve as a powerful fake news and spam generator. Researchers also worry malicious actors could use the technology to impersonate people online. Combined with the recent furor over the release of open-source code that could be used to make deepfake images and videos, it’s understandable that they have some reservations.
In light of these concerns, OpenAI has decided it would be irresponsible to make the technology public just yet. Instead, it is releasing a smaller model for researchers, along with a technical paper (Language Models are Unsupervised Multitask Learners) that outlines how the team developed this unsupervised natural language processor. GPT-2 also has no shortage of benevolent applications, including as a writing assistant, dialogue agent, machine translator or speech recognition system, and the hope is that a way can be found to ensure responsible use of the text generator in the future.
OpenAI encourages the development of AI along ethical lines
OpenAI was founded in 2015 with the aim of advancing artificial intelligence in a way that benefits humanity. Elon Musk, one of OpenAI's co-founders, is known for being alternately enthusiastic and concerned about technological development, having warned in the past that AI has the potential to reduce humans to the status of monkeys if we're not careful. Musk left the OpenAI board in February 2018 to eliminate a potential conflict of interest with Tesla, the electric car company, which has been focusing more on AI in recent years. However, Musk remains one of OpenAI's most prominent investors.
The researchers reiterated their commitment to ethical AI, stating that in the OpenAI charter they had already foreseen that “safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research.”
The decision to release only a partial version of GPT-2 is an experiment in handling the release of ethically fraught AI innovations, giving researchers more time to discuss the ethics of the technology before it becomes open-source.