Overview

The Profanity Filter uses advanced AI to detect and filter profanity and inappropriate language in both user inputs and AI responses, ensuring all interactions remain professional and respectful. Unlike basic word lists, this guardrail leverages context-aware machine learning to accurately identify explicit, disguised, or context-dependent profanity across multiple languages, with high accuracy and minimal false positives. Designed for enterprise use, the Profanity Filter operates in real time with low latency, helping organisations maintain communication standards and protect their brand reputation.

What the Guardrail Does

Purpose

The primary goal of the Profanity Filter is to maintain professional communication standards by preventing the use of profanity and inappropriate language during AI interactions while maintaining high accuracy and minimal impact on legitimate business communications. By enabling this guardrail, organisations can ensure content appropriateness, maintain professional standards, protect brand reputation, and uphold responsible AI usage across all interactions.

Scope

Comprehensive Profanity Detection

The Profanity Filter applies advanced content analysis to:
  • Input – Applies the selected behaviour to what users send to the model.
  • Output – Applies the selected behaviour to what the model returns as a response.
  • Both – Applies the selected behaviour to both user inputs and model outputs for full bidirectional coverage.

Operational Modes

  • Monitor – Lets you review input or output content without taking any action—used for observation and diagnostics.
  • Block – Automatically stops content from being processed if it violates the selected guardrail rules.
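
The two modes can be sketched as a simple dispatch. This is an illustrative assumption, not the product's actual API: `apply_guardrail`, `GuardrailResult`, and the toy detector below are hypothetical names used only to show Monitor versus Block semantics.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class GuardrailResult:
    flagged: bool
    action: str                # "passed", "logged", or "blocked"
    content: Optional[str]     # None when the content was blocked

def apply_guardrail(text: str, mode: str,
                    detector: Callable[[str], bool]) -> GuardrailResult:
    """Apply Monitor or Block semantics to a piece of input/output text."""
    if not detector(text):
        return GuardrailResult(False, "passed", text)
    if mode == "monitor":
        # Monitor: flag the content for review but let it through unchanged.
        return GuardrailResult(True, "logged", text)
    # Block: stop the content; the caller serves a professional fallback.
    return GuardrailResult(True, "blocked", None)

# Toy detector for illustration only; the real guardrail is model-based.
naive_detector = lambda t: "darn" in t.lower()

print(apply_guardrail("Darn it!", "monitor", naive_detector).action)  # logged
print(apply_guardrail("Darn it!", "block", naive_detector).action)    # blocked
```

Note that Monitor never alters the conversation, which makes it useful for estimating a guardrail's hit rate before switching to Block.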

Detection Capabilities

The guardrail can identify various forms of inappropriate language:
  • Explicit Profanity: Direct use of offensive or vulgar language
  • Contextual Profanity: Language that becomes inappropriate based on context
  • Disguised Profanity: Attempts to circumvent detection through spelling variations
  • Inappropriate Language: Content unsuitable for professional or general audiences
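
One technique behind disguised-profanity detection can be sketched as text normalisation before matching: mapping common character substitutions (leetspeak, symbol swaps) and collapsing repeated letters. This is a minimal illustrative sketch; the guardrail's actual context-aware model goes well beyond simple normalisation.

```python
import re

# Common character substitutions used to disguise words.
SUBSTITUTIONS = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s", "!": "i",
})

def normalise(text: str) -> str:
    """Lowercase, map common substitutions, and collapse repeated letters."""
    text = text.lower().translate(SUBSTITUTIONS)
    return re.sub(r"(.)\1+", r"\1", text)  # "heeello" -> "helo"

# A disguised term and its plain form normalise to the same string:
print(normalise("d@rn") == normalise("darn"))  # True
```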

Key Features

Comprehensive Language Detection

Identifies profanity across multiple languages and contexts with advanced pattern recognition.

Context-Aware Analysis

Advanced understanding of conversation context and language usage for accurate detection.

Configurable Sensitivity

Adjustable detection thresholds for different use cases with Low, Medium, and High options.
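
Conceptually, the sensitivity levels can be thought of as thresholds on a detection score. The numeric values below are assumptions for illustration only; the thresholds actually used by the Profanity Filter are internal.

```python
# Hypothetical mapping from sensitivity level to score threshold.
THRESHOLDS = {"low": 0.9, "medium": 0.7, "high": 0.5}

def is_flagged(score: float, sensitivity: str) -> bool:
    """Flag content when the detection score meets the chosen threshold."""
    return score >= THRESHOLDS[sensitivity]

# A borderline score (0.6) is flagged only at High sensitivity:
print(is_flagged(0.6, "low"), is_flagged(0.6, "medium"), is_flagged(0.6, "high"))
# False False True
```

Lower thresholds (High sensitivity) catch more borderline content at the cost of flagging more items for review.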

Low Latency

High-performance detection that doesn’t impact response times or user experience.

Enterprise-Grade Accuracy

Minimises false positives while maintaining high detection rates across supported languages.

Multi-Language Support

Detects profanity across various languages and dialects for global applications.

Why Use This Guardrail?

Benefits

  • Professional Standards: Maintains appropriate language standards in all interactions
  • Brand Protection: Protects organisational reputation and maintains professional image
  • Audience Appropriateness: Ensures content is suitable for intended audiences
  • Compliance: Helps meet workplace and industry communication standards
  • Risk Mitigation: Reduces potential reputational damage from inappropriate language

Use Case: Customer Service AI Assistant

Scenario

A global retail company deploys an AI assistant to handle customer inquiries and support requests. The assistant must provide helpful service while maintaining professional language standards and avoiding any profanity or inappropriate content that could damage the brand or offend customers.

Challenge

The organisation must ensure that:
  • The AI never uses profanity or inappropriate language in responses
  • User inputs containing profanity are properly handled
  • All interactions remain professional and brand-appropriate
  • Detection works accurately across various languages and contexts

Solution: Implementing Profanity Filter

  1. Comprehensive Language Filtering
    • Enabled for both inputs and outputs
  2. Appropriate Enforcement
    • Set to Block to actively prevent profanity
    • Provides clear, professional fallback responses
  3. Optimised Sensitivity
    • Set to High for maximum detection coverage while keeping false positives minimal
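
The configuration above can be summarised as a hypothetical settings object. The field names and fallback text are illustrative assumptions, not the product's actual configuration schema.

```python
# Hypothetical representation of the customer-service configuration:
# scope = Both, behaviour = Block, sensitivity = High.
profanity_filter_config = {
    "enabled": True,
    "apply_to": "both",      # filter user inputs and model outputs
    "behaviour": "block",    # stop violating content, serve a fallback
    "sensitivity": "high",   # broadest detection coverage
    "fallback_message": (
        "I'm sorry, I can't respond to that phrasing. "
        "Could you rephrase your request?"
    ),
}

print(profanity_filter_config["apply_to"], profanity_filter_config["behaviour"])
```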

How to Use the Guardrail

Note: The steps below guide you through configuring the Profanity Filter in the Guardrail workflow.

Step 1: Navigate to the Guardrail Setup

  1. From the Home Page, select View next to your AI system in the AI System Table to open the AI System Dashboard.
  2. In the guardrails section of the AI System Overview, click Edit Guardrails to launch the guardrail configuration workflow.

Step 2: Select and Enable the Profanity Filter

  1. In the Configure Guardrails page, click on Profanity Filter from the list.
  2. The configuration panel appears on the right.
  3. Toggle Enable Guardrail to ON to begin editing.

Step 3: Set Application Scope

  1. Under the Apply Guardrail To section, select where you want the guardrail enforced:
    • Input – Applies the selected behaviour to what users send to the model.
    • Output – Applies the selected behaviour to what the model returns as a response.
    • Both – Applies the selected behaviour to both user inputs and model outputs for full bidirectional coverage.

Step 4: Configure Enforcement Behaviour

  1. Under Select Guardrail Behaviour, choose how the guardrail should respond to detected profanity:
    • Monitor – Lets you review input or output content without taking any action—used for observation and diagnostics.
    • Block – Automatically stops content from being processed if it violates the selected guardrail rules.

Step 5: Save, Test, and Apply the Guardrail

  1. Click Save & Continue to store your configuration.
  2. Go to the Test Guardrails step to evaluate how the guardrail behaves in real time with a chatbot.
  3. After saving, you can proceed to the Summary section to review your configuration, save all changes, and view your AI System overview.

Profanity Filter Capabilities

  • Explicit Profanity – Detects direct use of offensive, vulgar, or inappropriate language that violates professional communication standards.
  • Contextual Profanity – Identifies language that becomes inappropriate based on context, even when individual words may be acceptable in other situations.
  • Disguised Profanity – Recognises attempts to circumvent detection through spelling variations, character substitutions, or other evasion techniques.
  • Inappropriate Language – Monitors for content that, while not explicitly profane, may be unsuitable for professional or general audiences.