Introduction to Partiality in Data Analytics

July 12, 2021

The risk of bias in data analysis is high, and it can enter anywhere from how a question is framed and explored to how the data is sampled and organized. Bias can be introduced at any stage, from defining and capturing the data set to running the analytics, AI, or ML system. “Avoiding bias starts by recognizing that data bias exists, both in the data itself and in the people analyzing or using it,” Hariharan Kolam, CEO and founder of Findem, a people intelligence company, said in an interview. It is all but impossible to be completely unbiased, because bias is an inherent part of human nature.

The Human Catalyst

Bias in data analysis can come from human sources: people use unrepresentative data sets, ask leading questions in surveys, and report or measure results in biased ways. Often bias goes unnoticed until a decision is made based on the data, such as building a predictive model that turns out to be wrong. Although data scientists can never completely eliminate bias in data analysis, they can take countermeasures to look for it and mitigate its effects in practice.

The Social Catalyst

Bias is also a moving target as societal definitions of fairness evolve. Reuters reported one instance in which the International Baccalaureate program had to cancel its annual exams for high school students in May 2020 due to COVID-19. Instead of using exams to grade students, the IB program used an algorithm to assign grades, and those grades were substantially lower than many students and their teachers expected, which raised questions about whether the algorithm was fair.

Bias from Existing Data

Amazon’s earlier recruiting tool showed a preference for men, who made up more of the company’s existing staff. The algorithms didn’t explicitly know or look at the gender of applicants, but they ended up being biased by other features that were indirectly linked to gender, such as sports, social activities, and the adjectives used to describe accomplishments. In essence, the AI was picking up on these subtle differences and trying to find recruits who matched what the company had internally identified as successful.
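One way to look for this kind of indirect link is to check how often a feature appears in each group. The following is a minimal sketch in Python, not a description of how Amazon's system worked: the records, the `mentions_football` feature, and the `feature_rate_by_group` helper are all hypothetical and exist only to illustrate the idea.

```python
from collections import defaultdict


def feature_rate_by_group(records, feature, group_key):
    """Return, for each group, the share of records where `feature` is truthy."""
    counts = defaultdict(int)
    hits = defaultdict(int)
    for row in records:
        group = row[group_key]
        counts[group] += 1
        if row[feature]:
            hits[group] += 1
    return {group: hits[group] / counts[group] for group in counts}


# Hypothetical toy data: the model never sees "gender", only the feature.
resumes = [
    {"gender": "M", "mentions_football": True},
    {"gender": "M", "mentions_football": True},
    {"gender": "M", "mentions_football": False},
    {"gender": "F", "mentions_football": False},
    {"gender": "F", "mentions_football": False},
    {"gender": "F", "mentions_football": True},
    {"gender": "F", "mentions_football": False},
]

rates = feature_rate_by_group(resumes, "mentions_football", "gender")
# A large gap suggests the feature may act as a proxy for the group.
gap = max(rates.values()) - min(rates.values())
print(rates, round(gap, 2))
```

A large gap between groups does not prove that a model is biased, but it flags a feature that can carry group information even when the protected attribute itself is excluded.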

Under-representing populations

Another big source of bias in data analysis arises when certain populations are under-represented in the data. This kind of bias has had a tragic impact in medicine by failing to highlight important differences in heart disease symptoms between men and women, said Carlos Melendez, COO and co-founder of Wovenware, a Puerto Rico-based nearshore services provider. Bias shows up in the form of differences in gender, race, or economic status. It appears when the data that trains algorithms does not account for the many factors that go into decision-making.
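A simple first check for under-representation is to compare each group's share of the sample with its share of a reference population. The sketch below assumes that such reference shares are available. The group labels, the 80/20 split, and the `representation_gaps` helper are hypothetical illustrations, not figures from the medical example above.

```python
from collections import Counter


def representation_gaps(sample_groups, reference_shares):
    """Return sample share minus reference share for each reference group.

    Negative values mean the group is under-represented in the sample.
    """
    if not sample_groups:
        raise ValueError("sample_groups must not be empty")
    counts = Counter(sample_groups)
    total = len(sample_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in reference_shares.items()
    }


# Hypothetical study sample: 80 men and 20 women.
sample = ["men"] * 80 + ["women"] * 20
# Assumed shares in the population the analysis is meant to describe.
reference = {"men": 0.5, "women": 0.5}

gaps = representation_gaps(sample, reference)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

How large a gap is acceptable depends on the analysis, so this check works best as a prompt for closer review and not as a pass/fail rule.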

Cognitive biases

Cognitive bias leads to statistical bias, such as sampling or selection bias. Often analysis is conducted on whatever data is available, or on found data that is stitched together, instead of on carefully constructed data sets. Both the original collection of the data and an analyst’s choice of what data to include or exclude create sample bias. Selection bias occurs when the sample data that is gathered isn’t representative of the true future population of cases that the model will see. In cases like this, it’s useful to move from static facts to event-based data sources that allow data to update over time and more accurately reflect the world we live in. This can include moving to dynamic dashboards and machine learning models that can be monitored and measured over time.
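One way to monitor a model over time is to compare the distribution of an input at training time with the distribution seen later. The article does not name a specific metric. The sketch below uses the population stability index (PSI) as one common choice, and the bucket shares are hypothetical.

```python
import math


def population_stability_index(expected_shares, actual_shares, eps=1e-6):
    """Compute PSI = sum((actual - expected) * ln(actual / expected)).

    Both arguments are per-bucket shares over the same buckets, each summing to 1.
    `eps` is an assumed floor that avoids taking the log of zero.
    """
    if len(expected_shares) != len(actual_shares):
        raise ValueError("both distributions must use the same buckets")
    psi = 0.0
    for expected, actual in zip(expected_shares, actual_shares):
        expected = max(expected, eps)
        actual = max(actual, eps)
        psi += (actual - expected) * math.log(actual / expected)
    return psi


# Hypothetical shares of cases in three buckets of one input feature.
training_shares = [0.25, 0.50, 0.25]
live_shares = [0.10, 0.40, 0.50]

print(round(population_stability_index(training_shares, training_shares), 3))
print(round(population_stability_index(training_shares, live_shares), 3))
```

Identical distributions give a PSI of zero, and the value grows as the live data drifts away from the training sample. The threshold at which drift should trigger retraining or review is a judgment call that depends on the application.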

This article has been published from the source link without modifications to the text. Only the headline has been changed.
