Workshop // Exploring Gender Bias in Word Embeddings

Towards an intuitive technical understanding of bias in machine learning systems


This work is licensed under a Creative Commons Attribution 4.0 International License.

In a Nutshell (see abstract below)


Want me to give the workshop at your organization or event?

Or perhaps you want to deliver it yourself?

Drop me an email!

shlomi <AT> bu <DOT> edu

Abstract (plain text version)

As we deploy more and more AI systems, their impact on our lives grows. Shaping this impact is, in many ways, a question of human values. But how do we embed these values into the mathematical models that power AI and address the inherent gap between these two domains? How can we audit the ethics of AI? What are the limits of our efforts, and what can we aim for?

In this 90-minute workshop, we will explore bias in word embeddings, a widespread building block of many machine learning models that work with natural language. Word embeddings have an easy-to-explain representation, which allows an intuitive understanding of this building block and its potential biases without a technical background. We will use an open-source toolkit called Responsibly to explore, visualize, measure, and finally mitigate bias in word embeddings, gender bias in particular.
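To give a flavor of what measuring and mitigating bias can look like, here is a minimal sketch in plain NumPy. It is not the Responsibly API, and the three-dimensional vectors are invented for illustration; real embeddings have hundreds of dimensions learned from text. The helper names (`gender_direction`, `projection`, `neutralize`) are hypothetical. The sketch assumes one simple approach: take the difference between "he" and "she" as a gender direction, project other words onto it, and remove that component.

```python
import numpy as np

# Toy vectors, made up for this example only (an assumption, not real data).
toy_embedding = {
    "he": np.array([1.0, 0.0, 0.2]),
    "she": np.array([-1.0, 0.0, 0.2]),
    "nurse": np.array([-0.6, 0.7, 0.1]),
    "engineer": np.array([0.5, 0.8, 0.1]),
}


def unit(v):
    """Scale a vector to length 1."""
    return v / np.linalg.norm(v)


def gender_direction(embedding):
    """Approximate a gender direction as the normalized he - she difference."""
    return unit(embedding["he"] - embedding["she"])


def projection(word, embedding, direction):
    """Cosine of the angle between a word vector and the direction.

    Positive values lean toward "he", negative values toward "she".
    """
    return float(np.dot(unit(embedding[word]), direction))


def neutralize(word, embedding, direction):
    """Remove the component of a word vector that lies along the direction."""
    v = embedding[word]
    return v - np.dot(v, direction) * direction


direction = gender_direction(toy_embedding)

for word in ("nurse", "engineer"):
    print(word, round(projection(word, toy_embedding, direction), 3))

# After neutralizing, the word has no component along the gender direction.
neutral_nurse = neutralize("nurse", toy_embedding, direction)
print("nurse after neutralizing:", float(np.dot(neutral_nurse, direction)))
```

In this toy setup, "nurse" projects toward "she" and "engineer" toward "he"; neutralizing sets that projection to zero. Whether such a projection is a faithful measure of bias, and whether removing it really mitigates the bias, are among the questions the workshop discusses.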

Word embeddings will serve as a case study for the general issue of bias in machine learning. The exploration process will also naturally raise practical, methodological, and philosophical questions about the ethics of AI and the limitations of technical measurement and mitigation approaches. Moreover, deploying the same model in different contexts might change our ethical judgment of it. All of this will be discussed in the workshop.

The workshop is hands-on and interactive. Participants will be able to run pre-written code alongside the instructor, using the same tools that data scientists use. Nevertheless, the workshop is designed to adapt to a diverse audience, from people without any background in machine learning or programming to data science practitioners. Participants should bring their own laptops, but neither setup nor installation is required.