Popular AI Less Likely to Flag ‘Hateful Content’ That Targets Whites, Republicans, Men, Research Finds

OpenAI, the company behind the headline-grabbing artificial intelligence chatbot ChatGPT, has an automated content moderation system designed to flag hateful speech, but the software treats speech differently depending on which demographic groups are insulted, according to a study conducted by research scientist David Rozado.

The content moderation system used in ChatGPT and other OpenAI products is designed to detect and block hate, threats, self-harm and sexual comments about minors, according to Rozado. The researcher fed ChatGPT prompts that ascribed negative adjectives to demographic groups defined by race, gender, religion and other markers, and found that the software favors some groups over others.
