A system developed by MIT researchers could be used to automatically update factual inconsistencies in Wikipedia articles, reducing the time and effort spent by human editors who now do the task manually.

Wikipedia has millions of articles that need constant editing to reflect new information. This can involve article expansions, major rewrites, or more routine changes such as updating numbers, dates, names, and locations. Currently, people around the world volunteer their time to make these edits.

In a paper presented at the AAAI Conference on Artificial Intelligence, the researchers describe a text-generating system that pinpoints and replaces specific information in relevant Wikipedia sentences, while keeping the language similar to how people write and edit.

The idea is that people would type an unstructured sentence with updated information into an interface, without having to worry about style or grammar. The system would then search Wikipedia, locate the appropriate page and the outdated sentence, and rewrite it in a human-like way. In the future, the researchers say, there is potential to build a fully automated system that identifies and uses the latest information from around the web to produce rewritten sentences in the corresponding Wikipedia articles.

“There are so many updates to Wikipedia articles that are constantly required. It would be beneficial to automatically change exact parts of the articles without human intervention,” says Darsh Shah, a PhD student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and one of the lead authors. “Instead of working with hundreds of people to change every Wikipedia article, you only need a few because the model helps or does it automatically. This offers dramatic efficiency improvements.”

Many other bots already make automatic Wikipedia edits. As a rule, they work to mitigate vandalism or to copy narrowly defined information into predefined templates, Shah says. The researchers’ model, he says, solves a harder artificial intelligence problem: given a new piece of unstructured information, the model automatically modifies the sentence in a human-like manner. “The other (bot) tasks are more based on rules, while this is a task that involves thinking about contradictory parts in two sentences and generating a coherent piece of text,” he says.

The system can also be used for other text-generating applications, says co-lead author and CSAIL PhD student Tal Schuster. In their paper, the researchers also used it to automatically synthesize sentences in a popular fact-checking dataset, which helped reduce bias without manually collecting additional data. “This improves performance for automatic fact-checking models that train on the dataset to detect, for example, fake news,” says Schuster.

Shah and Schuster worked on the paper with their academic advisor Regina Barzilay, the Delta Electronics Professor of Electrical Engineering and Computer Science and a professor in CSAIL.

Neutrality masking and merging

Behind the system is a fair amount of text-generating ingenuity in identifying conflicting information between two separate sentences and then merging them. It takes as input an “outdated” sentence from a Wikipedia article, plus a separate “claim” sentence that contains the updated and contradictory information. The system must automatically delete and retain specific words in the outdated sentence, based on the information in the claim, to update facts while maintaining style and grammar. This is an easy task for humans, but a new one in machine learning.

Suppose this sentence needs to be updated: “Fund A considers 28 of its 42 minority holdings in operating companies to be particularly important for the group.” The claim sentence with updated information might read: “Fund A considers 23 out of 43 minority holdings to be significant.” The system would find the relevant Wikipedia text for “Fund A” based on the claim. It would then automatically strip out the outdated numbers (28 and 42) and replace them with the new numbers (23 and 43), while keeping the rest of the sentence exactly the same and grammatically correct. (In their work, the researchers ran the system on a dataset of specific Wikipedia sentences, not on all Wikipedia pages.)

The system was trained on a popular dataset that contains sentence pairs, in which one sentence is a claim and the other is a relevant Wikipedia sentence. Each pair is labeled in one of three ways: “agree,” meaning the sentences contain matching factual information; “disagree,” meaning they contain contradictory information; or “neutral,” meaning there is not enough information for either label. The system must make all disagreeing pairs agree by modifying the outdated sentence to match the claim. Achieving the desired output requires two separate models.

The first model is a fact-checking classifier trained to label each sentence pair as “agree,” “disagree,” or “neutral.” It focuses on the disagreeing pairs. Running in conjunction with the classifier is a custom “neutrality masker” module that identifies which words in the outdated sentence contradict the claim. The module removes the minimum number of words required to “maximize neutrality,” meaning the pair can then be labeled as neutral. That is the starting point: although the sentences do not agree, they no longer contain obviously contradictory information. The module creates a binary “mask” over the outdated sentence, placing a 0 over words that most likely need to be deleted and a 1 over the words to keep.
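The binary mask can be illustrated with a deliberately simplified sketch. The actual module is learned together with the fact-checking classifier; the rule used here (mask any number in the outdated sentence that the claim does not mention) and the name `toy_neutrality_mask` are assumptions made purely for illustration.

```python
import string


def _norm(token: str) -> str:
    """Lowercase a token and strip surrounding punctuation."""
    return token.strip(string.punctuation).lower()


def toy_neutrality_mask(outdated: str, claim: str) -> list[int]:
    """Return one flag per whitespace-separated word of `outdated`.

    0 = likely needs deleting, 1 = keep.
    ASSUMPTION: a toy rule stands in for the learned masker. A word is
    masked only if it is a number that does not appear in the claim.
    """
    claim_words = {_norm(t) for t in claim.split()}
    mask = []
    for token in outdated.split():
        word = _norm(token)
        contradicts = word.isdigit() and word not in claim_words
        mask.append(0 if contradicts else 1)
    return mask


outdated = ("Fund A considers 28 of its 42 minority holdings in operating "
            "companies to be particularly important for the group.")
claim = "Fund A considers 23 out of 43 minority holdings to be significant."

mask = toy_neutrality_mask(outdated, claim)
print(mask)  # zeros sit over "28" and "42", ones everywhere else
```

With the outdated numbers masked, the pair no longer states anything that contradicts the claim, which is the "neutral" starting point described above.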

After masking, a novel two-encoder-decoder framework generates the final output sentence. This model learns compressed representations of the claim and the outdated sentence. Working together, the two encoder-decoders fuse the differing words from the claim by sliding them into the slots left vacant by the deleted words (those covered with 0s) in the outdated sentence.
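The slot-filling idea can be sketched in a few lines. This is not the researchers' two-encoder-decoder model, which is a learned neural network; it is a toy stand-in that assumes the only facts being updated are numbers and that they appear in the same order in both sentences. The function name `toy_fuse` is hypothetical.

```python
def toy_fuse(outdated: str, mask: list[int], claim: str) -> str:
    """Fill each masked slot (0) with the next new number from the claim.

    ASSUMPTION: in-order replacement of numbers stands in for the learned
    fusion step; real updates may involve names, dates, or reordering.
    """
    words = outdated.split()
    existing = set(words)
    # Numbers that the claim introduces and the outdated sentence lacks.
    new_numbers = iter(
        w for w in claim.split() if w.isdigit() and w not in existing
    )
    fused = []
    for word, keep in zip(words, mask):
        # Keep the word if flagged 1; otherwise take the next new number,
        # falling back to the original word if the claim has none left.
        fused.append(word if keep else next(new_numbers, word))
    return " ".join(fused)


outdated = ("Fund A considers 28 of its 42 minority holdings in operating "
            "companies to be particularly important for the group.")
claim = "Fund A considers 23 out of 43 minority holdings to be significant."
# Mask with 0 over "28" and "42" (positions 3 and 6), 1 elsewhere.
mask = [1, 1, 1, 0, 1, 1, 0] + [1] * 12

updated = toy_fuse(outdated, mask, claim)
print(updated)
```

Only the masked positions change, so the style and grammar of the original sentence carry over untouched.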

In one test, the model scored higher than all conventional methods on a metric called “SARI,” which measures how well machines delete, add, and keep words compared with the way humans modify sentences. The researchers used a dataset of manually edited Wikipedia sentences that the model had never seen before. Compared with several traditional text-generating methods, the new model was more accurate in making factual updates, and its output more closely resembled human writing. In another test, crowdsourced human raters scored the model (on a scale of 1 to 5) on how well its output sentences contained factual updates and matched human grammar. The model received an average score of 4 on factual updates and 3.85 on matching grammar.
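To give a feel for what an add/keep/delete comparison measures, here is a heavily simplified, SARI-like score. It is not the published metric: it assumes a single human reference, treats each sentence as a set of single words, and averages three F1 scores, whereas the real SARI works over longer word sequences and is defined somewhat differently. The function names are hypothetical.

```python
def _f1(system: set, gold: set) -> float:
    """F1 overlap between two sets; two empty sets count as a perfect match."""
    if not system and not gold:
        return 1.0
    hits = len(system & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(system)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def toy_sari(source: str, output: str, reference: str) -> float:
    """Average F1 over added, kept, and deleted words (simplified sketch)."""
    src, out, ref = (set(s.split()) for s in (source, output, reference))
    add_score = _f1(out - src, ref - src)    # words the edit introduced
    keep_score = _f1(out & src, ref & src)   # words the edit preserved
    del_score = _f1(src - out, src - ref)    # words the edit removed
    return (add_score + keep_score + del_score) / 3


source = "Fund A considers 28 of its 42 holdings"
reference = "Fund A considers 23 of its 43 holdings"

perfect = toy_sari(source, reference, reference)
unchanged = toy_sari(source, source, reference)
print(perfect, unchanged)
```

An output identical to the human edit scores highest, while an output that leaves the outdated sentence untouched is penalized for the words it failed to add and delete.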

Removing bias

The study also showed that the system can be used to augment datasets to eliminate bias when training detectors of “fake news,” a form of propaganda containing disinformation created to mislead readers in order to generate website views or steer public opinion. Some of these detectors train on datasets of sentence pairs that agree or disagree to “learn” to verify a claim by matching it with given evidence.

In these pairs, the claim either matches certain information in a supporting “evidence” sentence from Wikipedia (agree), or it has been modified by people to include information that contradicts the evidence sentence (disagree). The models are trained to flag claims with refuting evidence as “false,” which can then be used to help identify fake news.

Unfortunately, such datasets currently come with unintended biases, Shah says: “During training, models use some of the language in the human-written claims as ‘give-away’ phrases to mark them as false, without relying heavily on the evidence. This reduces the accuracy of the model when evaluating real-world examples because no fact-checking is carried out.”

The researchers used the same deletion and fusion techniques from their Wikipedia project to balance the agree and disagree pairs in the dataset and help reduce the bias. For some “disagree” pairs, they used the false information in the modified claim to regenerate a fake supporting “evidence” sentence. Some of the give-away phrases then appear in both the “agree” and “disagree” pairs, which forces models to analyze more features. Using their augmented dataset, the researchers reduced the error rate of a popular fake-news detector by 13 percent.
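The augmentation step can be sketched as follows. This is an illustration under stated assumptions, not the researchers' pipeline: the `Pair` class, the `augment` function, and the `rewrite` callback (standing in for the masking-and-fusion system) are hypothetical, the sketch rewrites every “disagree” pair rather than only some, and it assumes each regenerated pair is labeled “agree.”

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Pair:
    claim: str
    evidence: str
    label: str  # "agree", "disagree", or "neutral"


def augment(pairs: list[Pair],
            rewrite: Callable[[str, str], str]) -> list[Pair]:
    """Add an 'agree' twin for every 'disagree' pair.

    `rewrite(evidence, claim)` is a HYPOTHETICAL stand-in for the
    masking-and-fusion system: it returns the evidence sentence edited
    so that it supports the (false) claim.
    """
    augmented = list(pairs)
    for pair in pairs:
        if pair.label == "disagree":
            fake_evidence = rewrite(pair.evidence, pair.claim)
            augmented.append(Pair(pair.claim, fake_evidence, "agree"))
    return augmented


# Stub rewriter for the demo: swaps one number by hand.
def stub_rewrite(evidence: str, claim: str) -> str:
    return evidence.replace("28", "23")


data = [Pair("Fund A holds 23 stakes.", "Fund A holds 28 stakes.", "disagree")]
balanced = augment(data, stub_rewrite)
print(balanced)
```

Because the same claim now appears once with a “disagree” label and once with an “agree” label, its wording alone no longer predicts the label, so a detector has to read the evidence.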

“If your dataset is biased and you tempt your model into looking at only one sentence in a disagreeing pair to make predictions, your model will not survive the real world,” says Shah. “We have models look at both sentences in all pairs that agree or disagree.”

