
Artificial intelligence (AI) is already reconfiguring the world in conspicuous ways. Data drives our global digital ecosystem, and AI technologies reveal patterns in that data. Smartphones, smart homes and smart cities influence how we live and communicate, and AI systems are increasingly involved in recruitment decisions, medical diagnoses and court decisions. Whether this scenario is utopian or dystopian depends on your perspective.


The potential risks of AI are listed repeatedly. Killer robots and mass unemployment are common concerns, while some people even fear human extinction. More optimistic predictions claim that AI will add US$15 trillion to the global economy by 2030 and eventually lead us to a kind of social nirvana.

We certainly need to consider the impact that such technologies are having on our societies. One important concern is that AI systems reinforce existing social biases, with damaging effect. Several notorious examples of this phenomenon have received widespread attention: state-of-the-art automated machine translation systems that produce sexist output, and image recognition systems that classify black people as gorillas.

These problems arise because such systems use mathematical models (such as neural networks) to identify patterns in large sets of training data. If that data is badly skewed in various ways, its inherent biases will inevitably be learned and reproduced by the trained systems. Biased autonomous technologies are problematic because they can marginalize groups such as women, ethnic minorities or the elderly, thereby widening existing social imbalances.

For example, if an AI system were trained on police arrest data, any conscious or unconscious biases manifest in the existing arrest patterns would be replicated by a "predictive policing" system trained on that data. Several authoritative organizations recognize the serious implications of this and have recently advised that all AI systems should be trained on unbiased data. Ethical guidelines published by the European Commission earlier in 2019 offered the following recommendation:

When data is collected, it may contain socially constructed prejudices, inaccuracies, errors and mistakes. This must be addressed prior to training with a given data set.

Dealing with biased data

This all sounds sensible enough. Unfortunately, it is sometimes simply impossible to ensure that a given data set is unbiased before training. A concrete example should clarify this.

All advanced machine translation systems (such as Google Translate) are trained on sentence pairs. An English-French system uses data that associates English sentences ("she is tall") with equivalent French sentences ("elle est grande"). There may be 500 million such pairs in a given set of training data, and therefore one billion individual sentences in total. All gender-related biases would have to be removed from such a data set if we wanted to prevent the resulting system from producing sexist output such as the following:

  • Input: The women started the meeting. They worked efficiently.
  • Output: Les femmes ont commencé la réunion. Ils ont travaillé efficacement.

The French translation was generated with Google Translate on October 11, 2019, and it is incorrect: "Ils" is the masculine plural subject pronoun in French, and it appears here even though the context clearly indicates that women are being referred to. This is a classic example of the automated system preferring the masculine default because of bias in the training data.

In general, 70% of the gendered pronouns in translation data sets are masculine, while 30% are feminine. This is because the texts used for such purposes refer to men more often than to women. To prevent translation systems from replicating these existing biases, specific sentence pairs would have to be removed from the data, so that masculine and feminine pronouns are split 50%/50% in both English and French. This would prevent the system from assigning higher probabilities to masculine pronouns.

Nouns and adjectives would of course also need to be balanced 50%/50%, since they can indicate gender in both languages ("actor", "actress", "neuf", "neuve"), and so on. But this drastic down-sampling would necessarily reduce the available training data considerably, thereby reducing the quality of the translations produced.
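The cost of balancing by removal can be illustrated with a minimal sketch. The toy corpus, the "m"/"f" labels and the function name `downsample_balanced` are hypothetical and chosen only for illustration; real translation data would need gender annotation for pronouns, nouns and adjectives on both language sides.

```python
import random

def downsample_balanced(examples, seed=0):
    """Balance a labelled corpus by discarding examples from the larger class.

    `examples` is a list of (sentence, gender) pairs with gender "m" or "f".
    This is an illustrative sketch, not how any production system is built.
    """
    rng = random.Random(seed)
    by_gender = {"m": [], "f": []}
    for sentence, gender in examples:
        by_gender[gender].append((sentence, gender))
    # Every class is cut down to the size of the smallest one.
    keep = min(len(group) for group in by_gender.values())
    balanced = []
    for group in by_gender.values():
        balanced.extend(rng.sample(group, keep))
    rng.shuffle(balanced)
    return balanced

# Toy corpus with the 70%/30% split described in the text.
corpus = [(f"he sentence {i}", "m") for i in range(700)]
corpus += [(f"she sentence {i}", "f") for i in range(300)]

balanced = downsample_balanced(corpus)
kept_fraction = len(balanced) / len(corpus)
print(f"kept {len(balanced)} of {len(corpus)} examples ({kept_fraction:.0%})")
```

With a 70%/30% split, balancing on this single attribute already discards 40% of the data, and each further attribute that has to be balanced can only shrink the set more.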

And even if the resulting data subset were fully gender-balanced, it would still be skewed in a variety of other ways (such as ethnicity or age). In reality, it would be difficult to remove all these biases completely. If one person spent just five seconds reading each of the one billion sentences in the training data, it would take 159 years to check them all, and that assumes a willingness to work all day and night, without lunch breaks.
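The 159-year figure can be checked with a few lines of arithmetic. The only assumption added here is a 365-day year, which the article does not state.

```python
import math

SENTENCES = 1_000_000_000             # one billion sentences, as in the text
SECONDS_PER_SENTENCE = 5              # five seconds of reading per sentence
SECONDS_PER_YEAR = 365 * 24 * 60 * 60 # assumes a 365-day year, no breaks

years = SENTENCES * SECONDS_PER_SENTENCE / SECONDS_PER_YEAR
print(f"{years:.1f} years")           # 158.5 years
print(math.ceil(years))               # 159 when rounded up to whole years
```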

An alternative?

It is therefore unrealistic to require that all training data sets be unbiased before AI systems are built. Such high-level requirements usually assume that "AI" denotes a homogeneous cluster of mathematical models and algorithmic approaches.

In reality, different AI tasks require very different types of systems. Downplaying the full extent of this diversity disguises the real problems posed by (say) deeply skewed training data. This is regrettable, because it means that other solutions to the problem of data bias are being neglected.

For example, the biases in a trained machine translation system can be considerably reduced if the system is adapted after it has been trained on the larger, inevitably biased, data set. This can be done using a much smaller, less skewed data set. The majority of the data can therefore be strongly biased, but the system trained on it need not be. Unfortunately, these techniques are rarely discussed by those in charge of developing guidelines and legislative frameworks for AI research.
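The idea of adapting a system after its main training run can be sketched with a deliberately tiny model. Everything here is a hypothetical toy: a single parameter stands in for a full translation network, and the label 1 means "masculine pronoun". The sketch shows only the general pattern (train on the large skewed set, then continue training briefly on a small balanced set), not how any real translation system is adapted.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def train(bias: float, labels: list, steps: int, lr: float) -> float:
    """Full-batch gradient descent on log loss for a one-parameter model.

    The model predicts P(masculine) = sigmoid(bias); the gradient of the
    mean log loss with respect to `bias` is sigmoid(bias) - mean(labels).
    """
    target = sum(labels) / len(labels)
    for _ in range(steps):
        bias -= lr * (sigmoid(bias) - target)
    return bias

# Large, skewed training set: 70% masculine (1), 30% feminine (0).
big_biased = [1] * 7000 + [0] * 3000
# Much smaller, balanced set used only for the adaptation step.
small_balanced = [1] * 50 + [0] * 50

pretrained = train(0.0, big_biased, steps=500, lr=0.5)
finetuned = train(pretrained, small_balanced, steps=20, lr=0.5)

p_pretrained = sigmoid(pretrained)
p_finetuned = sigmoid(finetuned)
print(f"P(masculine) after training on biased data: {p_pretrained:.2f}")
print(f"P(masculine) after adapting on balanced data: {p_finetuned:.2f}")
```

In this toy, the first stage reproduces the 70% skew of its data, while a short second stage on only 100 balanced examples moves the prediction most of the way back towards 50%.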

If AI systems simply reinforce existing social imbalances, they hinder rather than help positive social change. But if the AI technologies that we increasingly use every day were far less biased than we are, they could help us to recognize and confront our own lurking prejudices.

This is what we should be working towards. AI developers need to think much more carefully about the social impact of the systems they build, while those who write about AI need to understand in more detail how AI systems are actually designed and built. Because if we are indeed approaching either a technological idyll or an apocalypse, the former would be preferable.

AI can be a force for positive social change – but we are currently moving towards a dark future


This article has been republished from The Conversation under a Creative Commons license. Read the original article.

AI can be a force for positive social change – but we are currently moving towards a dark future (2019, December 2)
retrieved December 2, 2019
