How to: Avoid Gender Data Bias at Each Stage of the Data Lifecycle

Requirements Gathering is the first stage at which gender data bias may be introduced


Before any data-related initiative begins, principles are identified to ensure that the collection, cleansing, and/or analysis of data serves a clear purpose. Usefulness is one such principle; others include:

  • Pragmatism: data can easily be collected within a reasonable timeframe and existing data is used when possible
  • Flexibility: data collection process is flexible so that studies can adapt definitions to fit variables
  • Balance: disaggregated data is balanced in data type
  • Aspiration: the study serves as a model that future studies can aspire to, gathering data in accordance with past practice

Bias Levels

Other requirements gathering considerations include thoughtful variable inclusion, study design, and primary outcome. Variable inclusion and study design take form during the second stage (collection), whereas primary outcome should be considered at every stage, particularly the cleansing and analysis stages.

  • Study Design: Data is collected with gender-equitable considerations in the design, implementation, and analysis of the study. It is important to consider where and how gender bias can arise in the study design and how it could affect results. Key questions to consider include: Whose (m/f) needs did they consider? Who (m/f) did they coordinate with? Who (m/f) did they gather feedback from? Who (m/f) was involved in the decision making? Whose (m/f) satisfaction was collected and analyzed?
  • Primary Outcome: Gender equality and/or empowerment is a primary outcome of the study. The study directly enables monitoring and targeting of gender equality-focused outcomes.

Team Diversity

A common long-term goal of data collection is to use the data for advanced analytics. Once principles and data intentions are defined, data intended for this use deserves especially close scrutiny: models trained and tested on gender-biased data create a self-reinforcing feedback loop, carrying that bias into the real-world questions the model seeks to answer.
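
One way to start that scrutiny is to check how each gender is represented in a dataset before any model is trained on it. The sketch below is illustrative only: the `gender` field name, the sample records, and the helper `gender_shares` are assumptions made for the example, not part of any particular study or library.

```python
from collections import Counter


def gender_shares(records, field="gender"):
    """Return the share of records per gender value.

    Records with a missing or empty value are counted under "missing"
    so that gaps in the data stay visible instead of being dropped.
    """
    counts = Counter(r.get(field) or "missing" for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}


# Hypothetical sample records, for illustration only
sample = [
    {"id": 1, "gender": "f"},
    {"id": 2, "gender": "m"},
    {"id": 3, "gender": "m"},
    {"id": 4, "gender": None},
]

shares = gender_shares(sample)
```

A large gap between groups, or a large "missing" share, does not prove bias on its own, but it is a signal that the data may need further review before it is used for training.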

Bias Impact Statement

Once the data collection principles are understood and the intentions and implications of the data-related action have been considered, a bias impact statement should be assembled before any data-related action is taken. Questions to address include:

  • Do we have training data to make the correct predictions about the decision?
  • Is the training data sufficiently diverse and reliable? What is the data lifecycle of the algorithm?
  • Which groups are we worried about when it comes to training data errors, disparate treatment, and impact?
  • What will be the threshold for measuring and correcting for bias in the algorithm, especially as it relates to protected groups?
  • What are the potential bad outcomes and how will we know?
  • How open (e.g., in code or intent) will we make the design process of the algorithm to internal partners, clients, and customers?
  • What intervention will be taken if we predict that there might be bad outcomes associated with the development or deployment of the algorithm?
  • Is there a role for civil society organizations in the design of the algorithm?
  • Is the design team representative enough to capture these nuances and predict the application of the algorithm within different cultural contexts? If not, what steps are being taken to make these scenarios more salient and understandable to designers?
  • Given the algorithm’s purpose, is the training data sufficiently diverse?
  • Are there statutory guardrails that companies should be reviewing to ensure that the algorithm is both legal and ethical?
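
The question about a threshold for measuring bias can be made concrete with a simple metric. The sketch below compares the rate of positive outcomes across groups and flags the result when the ratio falls below a chosen threshold. Everything here is an assumption for illustration: the function names, the sample predictions, and the 0.8 threshold, which borrows the "four-fifths" rule of thumb from US employment guidance and may not suit every context.

```python
def selection_rates(outcomes):
    """Return the positive-outcome rate per group.

    `outcomes` is an iterable of (group, selected) pairs, where
    `selected` is True when the model gave a positive outcome.
    """
    totals, positives = {}, {}
    for group, selected in outcomes:
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if selected else 0)
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means parity."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group received a positive outcome
    return min(rates.values()) / highest


# Assumed threshold (four-fifths rule of thumb); set per project
THRESHOLD = 0.8

# Hypothetical model outputs, for illustration only
predictions = [
    ("f", True), ("f", False), ("f", False), ("f", False),
    ("m", True), ("m", True), ("m", False), ("m", False),
]

rates = selection_rates(predictions)
ratio = disparate_impact_ratio(rates)
needs_review = ratio < THRESHOLD
```

A single ratio cannot answer every question in the statement, but agreeing on a metric and a threshold in advance makes it clearer when an intervention is due.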


