Wikipedia, the online hub of information, is huge: the site receives an average of 600 new articles daily. The English-language version alone has over 6 million articles, and across all 309 language editions combined, Wikipedia holds over 52 million articles totaling more than 28 billion words.
The flip side of this scale is that, with so many articles, the site's editors have to do a great deal of manual work to keep them updated with the latest information.
In an effort to create a more accurate internet, researchers at MIT have developed an advanced AI system capable of automatically identifying and correcting inaccurate or outdated information in Wikipedia articles.
The artificial intelligence system built by MIT researchers can automatically rewrite outdated sentences in Wikipedia articles while maintaining a human tone, so its edits won't look out of place in a carefully crafted paragraph.
The text-generating AI pinpoints errors and updates articles as needed, using the latest information from around the web to produce revised sentences. By replacing information in sentences accurately, the automated text-generating system saves time otherwise spent by real-life editors.
The team at MIT says their system will be able to do these editors’ jobs in a faster, more efficient manner, all while maintaining language largely similar to how a human would write or edit.
The machine learning-based system is trained to recognize the differences between a Wikipedia article sentence and a claim sentence containing updated facts. To train the AI system, the researchers used two different databases containing structured claim sentences and relevant Wikipedia sentences.
According to MIT, the researchers trained the artificial intelligence-based system on a data set of sentence pairs, in which one sentence is a claim and the other is a relevant Wikipedia sentence.
Each pair is labeled in one of three ways: "agree," meaning the sentences contain matching factual information; "disagree," meaning they contain contradictory information; or "neutral," meaning there is not enough information for either label.
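The training data described above can be pictured as a simple labeled structure. This is a hypothetical sketch: the field names and example sentences are illustrative, not taken from the study's data sets.

```python
from dataclasses import dataclass

@dataclass
class SentencePair:
    claim: str           # sentence carrying the (possibly newer) fact
    wiki_sentence: str   # existing Wikipedia sentence
    label: str           # "agree", "disagree", or "neutral"

# Illustrative examples of the three labels described in the article.
training_pairs = [
    SentencePair(
        claim="The bridge opened to traffic in 2019.",
        wiki_sentence="The bridge opened to traffic in 2019.",
        label="agree",     # matching factual information
    ),
    SentencePair(
        claim="The bridge opened to traffic in 2021.",
        wiki_sentence="The bridge opened to traffic in 2019.",
        label="disagree",  # contradictory information
    ),
    SentencePair(
        claim="The bridge spans a tidal estuary.",
        wiki_sentence="The bridge opened to traffic in 2019.",
        label="neutral",   # not enough information for either label
    ),
]

for pair in training_pairs:
    print(pair.label)
```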
If the AI system finds any contradictions between the two sentences, it uses a “neutrality masker” to pinpoint both the contradictory words that need deleting and the ones it absolutely has to keep.
After that, an encoder-decoder framework determines how to rewrite the Wikipedia sentence: the model learns compressed representations of both the claim and the outdated sentence, and uses them to generate the final output sentence after masking. Working in conjunction, the two encoder-decoders fuse the differing words from the claim into the slots left vacant by the deleted words.
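The mask-then-fuse flow above can be sketched as a toy pipeline. The real system uses learned neural components (the "neutrality masker" and the encoder-decoders); here the mask is approximated with a simple token diff, purely to illustrate the data flow, not the actual model.

```python
import difflib

def mask_and_fuse(wiki_sentence: str, claim: str) -> str:
    """Toy illustration: keep neutral words, drop contradictory ones,
    and fuse the claim's differing words into the vacated slots."""
    wiki_tokens = wiki_sentence.split()
    claim_tokens = claim.split()
    matcher = difflib.SequenceMatcher(a=wiki_tokens, b=claim_tokens)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(wiki_tokens[i1:i2])   # neutral words are kept
        elif op in ("replace", "insert"):
            out.extend(claim_tokens[j1:j2])  # fused in from the claim
        # "delete": contradictory words with no replacement are dropped
    return " ".join(out)

print(mask_and_fuse(
    "The company employs 500 people.",
    "The company employs 650 people.",
))
# -> The company employs 650 people.
```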
“There are so many updates constantly needed to Wikipedia articles. It would be beneficial to automatically modify exact portions of the articles, with little to no human intervention,” lead study author Darsh Shah, a Ph.D. student in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), said in a statement.
“Instead of hundreds of people working on modifying each Wikipedia article, then you’ll only need a few because the model is doing it automatically,” he said. “That offers dramatic improvements in efficiency.”
The system isn’t limited to Wikipedia, either: the researchers also used it to reduce bias in a popular fact-checking database. In fact, the study states that the system is useful in identifying fake news; more specifically, by reducing bias it can help train other AI systems designed to find and eliminate fake news.
In a test, the team used the deletion and fusion techniques from the Wikipedia task to balance the pairs in a data set and help mitigate bias. For some pairs, a claim's false information was used to regenerate fake evidence supporting that claim. Some key phrases then existed in both the "agree" and "disagree" sentences, which forced the models to analyze more features.
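The balancing idea can be sketched as a small augmentation step: for a claim whose evidence supports it ("agree"), regenerate fake evidence embedding a false variant of the fact, so the same key phrase now also appears under a "disagree" label. The function and field names below are hypothetical, chosen only to illustrate the idea.

```python
def augment(claim: str, evidence: str, fact: str, false_fact: str):
    """Return the original supporting pair plus a regenerated
    contradictory pair built by swapping in a false fact."""
    fake_evidence = evidence.replace(fact, false_fact)
    return [
        {"claim": claim, "evidence": evidence, "label": "agree"},
        {"claim": claim, "evidence": fake_evidence, "label": "disagree"},
    ]

pairs = augment(
    claim="The reactor came online in 1986.",
    evidence="Records show the reactor came online in 1986.",
    fact="1986",
    false_fact="1992",
)
for p in pairs:
    print(p["label"], "->", p["evidence"])
```

Because the surrounding phrase ("came online in …") now appears under both labels, a classifier can no longer lean on that phrase alone and must attend to the actual fact.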
The researchers report that their augmented data set reduced the error rate of a popular fake news detector by 13%. They also say that in the Wikipedia experiment, the system was more accurate in making factual updates and its output more closely resembled human writing.