How AI Spawned "Second-Gen Bias" in Content Moderation Services

Rodrigo Cardenete, Founder at BUNCH
Trust & Safety
Last Update: September 27, 2024
A collage illustration of 4 diverse social media creators on a cloud landscape. A banner says CONTENT to signify the importance of content moderation

Over the years, we’ve moderated millions of pieces of content through our content moderation outsourcing services, spanning countless industries. We’ve watched content moderation outsourcing transform into the unseen behemoth behind the apps and media the world consumes daily.

User-generated content (UGC) now fiercely competes with film, streaming, and gaming as the centerpiece of global entertainment. And there’s no sign this trend is slowing down.

As UGC continues its exponential growth, the effort required to keep media safe and compliant has become staggering. Automated moderation has been in the mix from the start, with early AI models taking a role, but the rise of computer vision is giving AI a much bigger seat at the table.

We’ve seen these shifts firsthand. Outsourcing content moderation has become an increasingly complex dance between humans and AI, where both systems feed into and rely on each other in ways we couldn’t have predicted.

Through this, we’ve pinpointed three major types of bias that affect moderation services every day:

A chart titled Biases Induced by AI + Humans in Content Moderation Systems. It shows a pie chart with three sections: the largest is Labeling Bias (“Second-Generation Bias”), followed by Human Bias and AI Hallucinations.

First Group: Human Bias

This is the oldest bias, and we’re probably never getting rid of it. It’s no secret that bias is inherent in humans. Community guidelines are useful tools that work well in most cases, but when dealing with massive volumes of content, the biases moderators bring into the system can't be ignored. Simply put, moderators are human, and so are their judgments.

It’s an ongoing debate, but the underlying reality doesn’t change: as long as moderation decisions are made by people, their inherent biases will always seep into the process.

A banner that says "Human bias is not a bug. It's a feature."

Second Group: AI Hallucinations

Also known as errors. AI models tend to overfit to patterns or noise, producing uncontrollable errors—sometimes detecting objects or features that don’t even exist. This becomes especially concerning when AI confidently approves content that ends up in front of the wrong audience.

However, these hallucinations differ from common errors because they often involve fabricating entirely false information that seems plausible, following unpredictable patterns rather than simply misclassifying or overlooking data.

As with all AI-generated data, it’s critical to audit these outputs before they’re sent back to the platform or published. The challenge is that this is the hardest type of output to QA, because hallucinations usually come with high confidence levels, leaving few clues for human moderators to catch during audits.
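To make that concrete, here’s a minimal sketch (in Python, with invented field names and thresholds) of one way an audit queue can be built: every low-confidence decision goes to a human, plus a random sample of high-confidence approvals, precisely because confident hallucinations are the ones a plain threshold would miss. It’s an illustration of the idea, not our production pipeline.

```python
import random

def select_for_human_audit(decisions, low_conf_threshold=0.7,
                           high_conf_sample_rate=0.05, seed=None):
    """Build a human audit queue from AI moderation decisions.

    Low-confidence decisions are always reviewed; a random slice of
    high-confidence approvals is sampled too, because hallucinations
    often arrive with high confidence and would slip past a threshold alone.
    """
    rng = random.Random(seed)
    audit_queue = []
    for item in decisions:  # each item: {"id": ..., "label": ..., "confidence": ...}
        if item["confidence"] < low_conf_threshold:
            audit_queue.append(item)
        elif item["label"] == "approved" and rng.random() < high_conf_sample_rate:
            audit_queue.append(item)
    return audit_queue

# Example with made-up decisions:
# queue = select_for_human_audit(
#     [{"id": "a1", "label": "approved", "confidence": 0.97},
#      {"id": "a2", "label": "flagged", "confidence": 0.55}],
#     seed=42,
# )
```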

Third Group: Training Biases, “Second-Generation Bias”

As we all know, AI models are trained on massive datasets. Generally, the larger and more diverse the dataset, the more accurate the model becomes. This process involves vast amounts of raw data, and in most cases, humans need to label this data, transforming it into usable training data. Human labelers follow guidelines crafted by data scientists, which help them make judgment calls about the elements within the data—whether it’s images, video, text, audio, or any other media.
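As a rough illustration of what that pipeline produces, here’s a hypothetical sketch of a labeling record in Python; the field names are made up for this example rather than any real schema. The point is that every training example is the product of a human judgment made under a specific version of the guidelines, which is exactly where the biases described below get baked in.

```python
from dataclasses import dataclass

@dataclass
class LabelingTask:
    """One raw item handed to a human labeler. Field names are illustrative."""
    item_id: str
    media_type: str          # "image", "video", "text", "audio", ...
    guideline_version: str   # the data scientists' instructions in force
    labeler_id: str
    label: str               # the labeler's judgment call, e.g. "safe" / "adult"

def to_training_example(task: LabelingTask) -> dict:
    # The label becomes ground truth; keeping the guideline version and labeler id
    # attached makes it possible to trace a biased decision back to its origin.
    return {
        "id": task.item_id,
        "target": task.label,
        "provenance": {"guideline": task.guideline_version, "labeler": task.labeler_id},
    }
```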

AI bias has only gotten worse with AI content moderation, and the result is what we call “second-generation bias.”

While most labeling decisions are straightforward, edge cases—those ambiguous or unclear situations—are critical. These are the gray areas where the model learns the most, as they determine how it will handle future edge cases during content moderation.

As you might expect, this process becomes a breeding ground for biases.

The bad news? Most of these biases get hardwired into the model permanently. They compound over time and will influence every moderation decision the model makes going forward.

This bias can manifest in several ways:

A chart titled Types of Labeling Bias, “Second-Generation Bias”. The Labeling Bias section has four subsections: A. Engineer Bias, B. Labeling Data Bias, C. Data Curation Bias, D. Guardrails Bias.

A. Engineer Bias

Data scientists, consciously or not, are influenced by the ideology of their organizations and the company’s official culture. They create the labeling guidelines that labelers follow, which means that ideological bias can easily be introduced into the practical instructions that guide labelers when making judgments.

Some classic human biases include the overrepresentation of certain demographic groups, their behaviors, or cultural norms.

For example, if an organization’s guidelines are shaped by a conservative stance on gender and sexual identity, they might push labelers to over-flag LGBT+ content as “adult content” during the training phase, even when it’s entirely appropriate.

B. Labeling Data Bias

Training datasets are labeled by humans, and humans come with biases. Even with seemingly neutral guidelines, personal interpretations sneak in. Just like moderators make subjective calls, labelers filter decisions through their own lens, which means the training data AI models are fed is never bias-free. And when engineers and labelers are culturally misaligned, that bias gap only gets wider.

At BUNCH we label training data for different computer vision and LLM models (some used for content moderation), and while most of our clients are in the US and Europe, our labeling team is based in the Philippines. Bridging the cultural and contextual gaps between these two sides is crucial.

Take nudity, for example. If labelers from a culture with stricter views on what’s considered revealing clothing get vague guidelines from engineers who assume everyone thinks alike, you’ve got a recipe for bias. What’s considered “nudity” in one culture might be totally acceptable elsewhere, but the data the AI ends up being trained on is skewed. It’s up to the labeling agency to ensure there’s maximum alignment between both sides.
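One practical way to catch that misalignment early is to have labelers from both contexts annotate the same small calibration batch and measure how much they agree beyond chance. Below is a minimal Cohen’s kappa sketch in plain Python (the example labels are invented); a low score on, say, a nudity-vs-safe batch is a warning that the guidelines are being read differently before the bias ever reaches the training data.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Inter-annotator agreement between two labelers on the same calibration set."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: how often the two labelers gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each labeler's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n)
                   for c in set(labels_a) | set(labels_b))
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)

# Example: two labelers disagreeing on what counts as "nudity"
# print(cohens_kappa(["safe", "nudity", "safe"], ["nudity", "nudity", "safe"]))  # 0.4
```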

C. Data Curation Bias

The very act of selecting images and videos from raw media introduces bias. It’s rarely a random process; a human is making the choice, and with that comes subjectivity.

For example, if a dataset heavily favors news articles from liberal media outlets, it could skew an AI model’s content moderation or sentiment analysis toward liberal viewpoints. The model might start leaning in that direction, over-representing some perspectives while under-representing others, leading to biased outcomes.
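A simple guard against this is to audit the source mix of a curated dataset before training. The sketch below assumes each record carries a “source” field, which is our assumption for the example; it just reports each source’s share, and a heavily lopsided distribution is the red flag described above.

```python
from collections import Counter

def source_distribution(dataset):
    """Rough curation audit: how skewed is the dataset toward particular sources?

    `dataset` is an iterable of records with a "source" field
    (e.g. the outlet or platform an article or image was pulled from).
    """
    counts = Counter(record["source"] for record in dataset)
    total = sum(counts.values())
    return {source: round(count / total, 3) for source, count in counts.most_common()}

# Example: a 70/30 split toward one group of outlets is worth a second look.
# print(source_distribution([{"source": "outlet_a"}, {"source": "outlet_a"},
#                            {"source": "outlet_b"}]))
```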

D. Guardrails Bias

Guardrails are post-training mechanisms added to AI models to prevent harmful or unethical outputs. While they make sense in certain contexts, like generative AI, applying them in content moderation often leads to another layer of bias.

This became a major controversy earlier in 2024, when Google’s Gemini AI generated images with obvious biases, depicting ahistorical figures because of guardrails designed to promote racial diversity. Beyond becoming a great source of memes, the episode surfaced how company ideology (woke?) can negatively influence model outputs.

This issue is especially critical in content moderation. For instance, images containing violent content could be under-scored when they feature people of certain skin tones, because positive-discrimination biases are baked into the model’s guardrails. As a result, violent images might be labeled as safe and end up being shown to minors, simply because the model is trying too hard to avoid over-policing certain racial groups.

What's the Solution to This?

Bias has been the Achilles’ heel of trust and safety for as long as these systems have existed. And as long as humans—or AI models trained by humans—are involved, there’s no easy fix in sight. Bias creeps into every stage, whether we’re curating datasets or moderating content.

So what’s the compromise? A few steps can help manage this.

First, acknowledge the biases. No one is immune. Accepting that these flaws are part of the system is the first step toward making anyone pause and think critically before passing judgment.

Next, break down the funnel. Bias can enter at any stage, from data collection to training. Independent oversight at every step can shine a light on where things go wrong.

Finally, keep expert humans in the loop. AI can handle vast amounts of data at superhuman speed, but it’s not perfect. We need human auditors to validate and course-correct AI outputs. The human factor is more relevant than ever.
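In practice, “human in the loop” can be as simple as letting an auditor’s verdict override the model and logging every disagreement as a correction signal for retraining or guideline updates. Here’s a small sketch with illustrative field names, not a real platform schema.

```python
def resolve_with_human(ai_decision, human_verdict=None):
    """Human-in-the-loop resolution for a single moderation decision.

    If a human auditor weighed in, their verdict wins, and any disagreement
    is recorded so it can feed future retraining or guideline updates.
    """
    if human_verdict is None:
        return {"final_label": ai_decision["label"], "resolved_by": "model"}
    return {
        "final_label": human_verdict,
        "resolved_by": "human",
        "model_label": ai_decision["label"],
        "disagreement": human_verdict != ai_decision["label"],  # correction signal
    }

# Example: the auditor overturns a confident "approved" call.
# print(resolve_with_human({"id": "a1", "label": "approved"}, human_verdict="flagged"))
```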

A banner that says: The key to solving bias is keeping humans in the loop to QA AI model outputs and resolve edge cases on the spot

The beauty here is the power of combining AI and human oversight. If managed well, we could be moving toward a future where content moderation finds a balance between free expression and safety. It’s not perfect, but it’s better than ignoring the problem altogether.

About the Author

Rodrigo Cardenete
Rodrigo is co-founder of BUNCH. With a background in design, operations, and development, he has taken on different roles, including COO and CMO.
