The Toxicity Detection Policy ensures that AI interactions remain professional and respectful, preventing the spread of harmful or inappropriate content.
Note: The following steps walk you through configuring the Toxicity Detection Policy in the policy workflow.
Sensitivity Threshold: Lower thresholds (e.g., 0.2) catch broader cases, reducing false negatives. Higher thresholds (e.g., 0.9) enforce stricter toxicity detection.

Category | Description |
---|---|
General Toxicity | Detects overall harmful or negative language patterns, including severe forms of toxicity that could impact workplace culture and professional communication. |
Insults & Identity-Based Attacks | Identifies language that demeans individuals or groups based on personal characteristics, helping maintain respectful and inclusive communication standards. |
Sexual Content | Monitors for explicit sexual language or references that are inappropriate for professional environments. |
Obscenity | Detects vulgar or offensive language that could create a hostile work environment or damage professional relationships. |
Threats | Identifies language that suggests harm, intimidation, or coercion, helping prevent potential workplace harassment or security concerns. |
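To illustrate how the sensitivity threshold interacts with the categories above, here is a minimal sketch of threshold-based filtering. The category names mirror the table; the `evaluate_message` helper and its score inputs are hypothetical and do not represent the product's actual API.

```python
# Hypothetical sketch: flag any category whose toxicity score
# meets or exceeds the configured sensitivity threshold.

CATEGORIES = [
    "general_toxicity",
    "insults_identity_attacks",
    "sexual_content",
    "obscenity",
    "threats",
]

def evaluate_message(scores: dict, threshold: float) -> list:
    """Return the categories whose score meets or exceeds the threshold."""
    return [c for c in CATEGORIES if scores.get(c, 0.0) >= threshold]

# A lower threshold (0.2) flags more borderline content than a higher one (0.9).
scores = {"general_toxicity": 0.35, "threats": 0.05}
print(evaluate_message(scores, 0.2))  # flags general_toxicity only
print(evaluate_message(scores, 0.9))  # flags nothing
```

A lenient threshold surfaces more borderline content for review, while a strict threshold only blocks high-confidence violations, which is the trade-off the setting above controls.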