Advanced content filtering that identifies and prevents harmful, unsafe, or inappropriate content across all AI interactions.
Note: The following steps walk you through configuring the Harmful Content Detection Guardrail in the guardrail workflow.
Category | Description |
---|---|
Hate | Detects hateful language and content that promotes hatred, discrimination, or bias against individuals or groups based on characteristics such as race, ethnicity, religion, gender, sexual orientation, or other protected attributes. |
Insults | Identifies personal attacks and insults that demean, belittle, or cause emotional distress to individuals, helping maintain respectful and professional communication standards. |
Sexual | Monitors for sexual content and harassment, including inappropriate sexual language, explicit material, or unwanted sexual advances that are unsuitable for professional environments. |
Violence | Detects content that promotes, glorifies, or threatens violence, including physical harm, terrorism, or other dangerous activities. |
Misconduct | Identifies inappropriate professional behaviour, including unprofessional conduct, workplace harassment, or behaviour that violates organisational standards and professional ethics. |
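As a rough illustration, a policy covering the five categories above could be represented as a simple list of filter entries. The `HARMFUL_CONTENT_FILTERS` name, the `threshold` field, and the helper function are assumptions made for this sketch, not part of any documented guardrail API.

```python
# Hypothetical policy object for the harmful-content guardrail.
# The structure and field names (e.g. "threshold") are illustrative
# assumptions, not a real configuration schema.
HARMFUL_CONTENT_FILTERS = [
    {"category": "Hate", "threshold": "HIGH"},
    {"category": "Insults", "threshold": "HIGH"},
    {"category": "Sexual", "threshold": "HIGH"},
    {"category": "Violence", "threshold": "HIGH"},
    {"category": "Misconduct", "threshold": "HIGH"},
]

def enabled_categories(filters):
    """Return the category names enabled by this policy."""
    return [entry["category"] for entry in filters]

print(enabled_categories(HARMFUL_CONTENT_FILTERS))
# → ['Hate', 'Insults', 'Sexual', 'Violence', 'Misconduct']
```

In a real deployment, each entry would typically carry a per-category sensitivity so that, for example, Violence could be filtered more aggressively than Insults without changing the rest of the policy.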