Humans are naturally prone to bias. External factors, opinions, and feelings all influence the decisions we make. Machine learning forms the core of modern AI systems, with deep learning algorithms being particularly popular. These algorithms are very data-hungry: what makes them effective is a large quantity of good training data relevant to the objective you're trying to achieve. However, a machine learning system is only as good as the data you feed it. What if there are implicit or explicit biases in that training data? The AI system will inherit those same biases. Some of these can be easy to spot, but sometimes the bias is more subtle than anyone initially realizes, which can have significant unintended consequences.
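To make the "garbage in, garbage out" point concrete, here is a minimal sketch of a toy classifier that simply learns label frequencies per group from made-up historical hiring data. The group names and numbers are purely illustrative assumptions, not real data; the point is that the model faithfully reproduces whatever skew its training data contains.

```python
from collections import defaultdict

# Hypothetical, deliberately skewed training data: (group, hired) pairs.
# The skew stands in for implicit bias in real historical records.
training_data = (
    [("group_a", 1)] * 80 + [("group_a", 0)] * 20 +
    [("group_b", 1)] * 20 + [("group_b", 0)] * 80
)

def train(data):
    counts = defaultdict(lambda: [0, 0])  # group -> [negatives, positives]
    for group, label in data:
        counts[group][label] += 1
    # Predict the majority label observed for each group.
    return {g: int(pos > neg) for g, (neg, pos) in counts.items()}

model = train(training_data)
print(model)  # the historical skew is reproduced verbatim
```

Nothing in the model "decided" to discriminate; it simply memorized the pattern in the data it was fed.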
Diverse teams equal more diverse data
The AI Today podcast interviewed Tess Posner, CEO at AI4All, a non-profit organization focused on increasing diversity in AI. Unfortunately, the tech industry today is very homogeneous; there are not many women or people of color in the field. AI4All aims to bring more diverse backgrounds to the AI world. Research has consistently shown that working in a group with little to no diversity leads to lower productivity and less imaginative thinking. AI4All's goal is to bring new people into the AI industry who might not otherwise have entered it.
In AI, limited diversity has an even bigger impact. Groups made up primarily of similar people naturally exhibit bias. Developers with similar backgrounds tend to think in similar ways, and that bias can get introduced into datasets. When an AI system learns from that data, the biases surface and are sometimes amplified by the way algorithms interpret the data.
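The amplification effect can be sketched in a few lines. Suppose, as a hypothetical, that 65% of "cooking" images in a dataset happen to show women. A model that always predicts the most likely label for that context turns a 65% statistical association into a 100% one at prediction time:

```python
# Illustrative only: a mild skew in the data becomes an absolute rule
# once a deterministic majority-label decision is applied.
data_rate = 0.65              # assumed fraction of "cooking" images labelled "woman"
prediction = "woman" if data_rate > 0.5 else "man"

# The rule now outputs "woman" for *every* cooking image it sees.
predicted_rate = 1.0 if prediction == "woman" else 0.0
print(f"dataset skew: {data_rate:.0%}, model output skew: {predicted_rate:.0%}")
```

The dataset was only mildly imbalanced, but the model's behavior is completely one-sided, which is one way a bias ends up stronger in the system than it was in the data.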
As a result, the datasets that power AI are not as diverse as we'd like them to be. An example of the impact of biased training data is the Correctional Offender Management Profiling for Alternative Sanctions software, or COMPAS for short. COMPAS is used by U.S. courts to assess the likelihood of a defendant becoming a recidivist and helps inform sentencing decisions. Using AI, the system helps courts determine how likely someone is to commit a crime in the future. While this augmented intelligence tool sounds great in theory, the software has been found to be racially biased.
Another big area in which biased data impacts AI is facial recognition. On numerous platforms, AI-based facial recognition systems have trouble recognizing women or people of color as accurately as they identify white men. Much of this problem is due to the lack of diverse training data the models were trained on before release. The problems with Apple's face unlock and Google Photos' face and object identification have been well documented in the media. One could argue that if AI teams had more diversity, they would produce more diverse training datasets that are more representative of society at large.
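These accuracy gaps only become visible when results are broken out by group rather than averaged overall. A minimal audit sketch, using made-up placeholder records (the group labels and numbers below are illustrative assumptions, not measurements from any real system):

```python
# Hedged sketch of a per-group accuracy audit for a recognition system.
def accuracy_by_group(records):
    """records: list of (group, correct) pairs; returns group -> accuracy."""
    totals, hits = {}, {}
    for group, correct in records:
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + int(correct)
    return {g: hits[g] / totals[g] for g in totals}

# Illustrative audit records: a large gap hides inside a decent average.
records = (
    [("lighter-skinned men", True)] * 99 + [("lighter-skinned men", False)] * 1 +
    [("darker-skinned women", True)] * 65 + [("darker-skinned women", False)] * 35
)
print(accuracy_by_group(records))
```

In this toy example the overall accuracy looks respectable (82%), yet one group sees more than thirty times the error rate of the other, which is exactly the kind of disparity that aggregate benchmarks miss.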