How to: Avoid Gender Data Bias at Each Stage of the Data Lifecycle

Requirements Gathering is the first stage at which gender data bias may be introduced


  • Usefulness and relevance: data must be directly linked to the expected future analysis and outcomes
  • Pragmatism: data can be collected easily within a reasonable timeframe, and existing data is used where possible
  • Consistency: data must align with a standard definition of gender
  • Flexibility: the data collection process is flexible enough that studies can adapt definitions to fit their variables
  • Balance: disaggregated data is balanced in data type
  • Aspiration: the approach serves as a model for future studies, which should aspire to gather data consistently with past studies
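
One way to make the balance criterion checkable is to count how many records fall into each gender category before analysis begins. The sketch below is illustrative only. It reads "balance" as representation across categories, which is one possible interpretation, and the field name "gender", the function gender_balance, and the 30% minimum share are assumptions, not values prescribed by this article.

```python
from collections import Counter


def gender_balance(records, field="gender", min_share=0.3):
    """Return each category's share of the records and the categories
    whose share falls below min_share (a hypothetical cut-off)."""
    # Records with no usable value are counted under "missing" so that
    # gaps in the gender variable are visible rather than silently dropped.
    counts = Counter(r.get(field) or "missing" for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}, []
    shares = {category: n / total for category, n in counts.items()}
    underrepresented = sorted(c for c, s in shares.items() if s < min_share)
    return shares, underrepresented


# Made-up sample records, for illustration only.
sample = [
    {"id": 1, "gender": "female"},
    {"id": 2, "gender": "male"},
    {"id": 3, "gender": "male"},
    {"id": 4, "gender": "male"},
    {"id": 5, "gender": None},
]

shares, flagged = gender_balance(sample)
print(shares)
print(flagged)
```

A check like this could run when requirements are signed off and again after collection, so that an imbalance is caught while there is still time to gather more data.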

Bias Levels

  • Variable Inclusion: Data includes a (typically binary) distinction between male and female. Without this distinction, women are effectively invisible in analysis. Including a gender variable allows women to be identified and results to be analyzed for gender-specific impacts.
  • Study Design: Data is collected with gender-equitable considerations in the design, implementation, and analysis of the study. It is important to consider where and how gender bias can arise in the study design and how it could affect results. Key questions to consider include: Whose (m/f) needs did they consider? Who (m/f) did they coordinate with? Who (m/f) did they gather feedback from? Who (m/f) was involved in the decision making? Whose (m/f) satisfaction was collected and analyzed?
  • Primary Outcome: Gender equality and/or empowerment is a primary outcome of the study. The study directly enables monitoring and targeting of gender equality-focused outcomes.
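
The variable inclusion level can be illustrated with a small check: confirm that every record carries a gender variable, then report an outcome separately for each group. This is a minimal sketch under stated assumptions. The function disaggregate, the field names "gender" and "income", and the sample values are hypothetical and not drawn from any real study.

```python
from collections import defaultdict
from statistics import mean


def disaggregate(records, outcome, gender_field="gender"):
    """Return the mean of `outcome` for each value of the gender variable.

    Raises ValueError if any record lacks the gender variable, because
    gender-specific impacts cannot be analyzed without it.
    """
    if not all(gender_field in r for r in records):
        raise ValueError(f"every record needs a '{gender_field}' variable")
    groups = defaultdict(list)
    for r in records:
        groups[r[gender_field]].append(r[outcome])
    return {group: mean(values) for group, values in groups.items()}


# Made-up survey rows, for illustration only.
survey = [
    {"gender": "female", "income": 10},
    {"gender": "female", "income": 20},
    {"gender": "male", "income": 30},
    {"gender": "male", "income": 50},
]

means = disaggregate(survey, "income")
print(means)
```

An overall mean across the four rows would hide the gap between the two groups; the disaggregated result makes it visible.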

Team Diversity

Bias Impact Statement

  • Who is the audience for the algorithm and who will be most affected by it?
  • Do we have training data to make the correct predictions about the decision?
  • Is the training data sufficiently diverse and reliable? What is the data lifecycle of the algorithm?
  • Which groups are we worried about when it comes to training data errors, disparate treatment, and impact?
  • How and when will the algorithm be tested? Who will be the targets for testing?
  • What will be the threshold for measuring and correcting for bias in the algorithm, especially as it relates to protected groups?
  • What will we gain in the development of the algorithm?
  • What are the potential bad outcomes and how will we know?
  • How open (e.g., in code or intent) will we make the design process of the algorithm to internal partners, clients, and customers?
  • What intervention will be taken if we predict that there might be bad outcomes associated with the development or deployment of the algorithm?
  • What’s the feedback loop for the algorithm for developers, internal partners, and customers?
  • Is there a role for civil society organizations in the design of the algorithm?
  • Will the algorithm have implications for cultural groups and play out differently in cultural contexts?
  • Is the design team representative enough to capture these nuances and predict the application of the algorithm within different cultural contexts? If not, what steps are being taken to make these scenarios more salient and understandable to designers?
  • Given the algorithm’s purpose, is the training data sufficiently diverse?
  • Are there statutory guardrails that companies should be reviewing to ensure that the algorithm is both legal and ethical?
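
The question about a threshold for measuring bias can be made concrete with a simple ratio of positive-outcome rates between groups. The sketch below is one possible approach, not a method this article prescribes. It assumes the four-fifths (0.8) rule of thumb from US employment selection guidelines as the threshold, and the function names, group labels, and decisions are hypothetical.

```python
THRESHOLD = 0.8  # assumed cut-off (four-fifths rule of thumb); choose per context


def selection_rates(outcomes):
    """outcomes: iterable of (group, selected) pairs, where selected is a bool.
    Returns the share of positive outcomes for each group."""
    totals, positives = {}, {}
    for group, selected in outcomes:
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + int(selected)
    return {g: positives[g] / totals[g] for g in totals}


def disparate_impact_ratio(rates):
    """Lowest group rate divided by the highest group rate."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group received a positive outcome, so no gap to measure
    return min(rates.values()) / highest


# Made-up algorithm decisions, for illustration only.
decisions = (
    [("female", True)] * 2 + [("female", False)] * 3
    + [("male", True)] * 4 + [("male", False)] * 1
)

rates = selection_rates(decisions)
ratio = disparate_impact_ratio(rates)
flagged = ratio < THRESHOLD
print(rates, ratio, flagged)
```

A single ratio is a coarse signal. A team would likely agree on the metric, the threshold, and the corrective action in the bias impact statement before the algorithm is tested.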




Consultant / Data & Analytics

Abby McCulloch
