Overview
The Custom Data Protection Filter provides flexible, pattern-based detection of sensitive information using customisable regex patterns. Unlike the standard Data Protection Filter, which uses machine learning models, this guardrail uses deterministic regex matching to identify specific data patterns, making it well suited to organisations with unique data formats or specific compliance requirements.
What the Guardrail Does
Purpose
The primary goal of the Custom Data Protection Filter is to provide precise, pattern-based detection of sensitive information using custom regex patterns. This deterministic approach offers predictable results and allows organisations to define exactly what constitutes sensitive data in their specific context, complementing the ML-based Data Protection Filter for comprehensive coverage.
Scope
Regex-Based Pattern Detection
The Custom Data Protection Filter applies deterministic pattern matching to:
- Input – Applies the selected behaviour to what users send to the model.
- Output – Applies the selected behaviour to what the model returns as a response.
- Both – Applies the selected behaviour to input and output for full bidirectional coverage.
Operational Modes
- Monitor – Lets you review input or output content without taking any action—used for observation and diagnostics.
- Block – Automatically stops content from being processed if it violates the selected guardrail rules.
- Mask – Replaces detected sensitive information with anonymised placeholders while allowing content to proceed.
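To make the three behaviours concrete, here is a minimal Python sketch of how Monitor, Block, and Mask could act on regex matches. It is not the product's implementation: `apply_guardrail`, `GuardrailResult`, the `[ACCOUNT_NUMBER]` placeholder, and the lowercase mode names are all assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern set; the account-number format mirrors the use case in this guide.
PATTERNS = {
    "ACCOUNT_NUMBER": re.compile(r"\bACC-\d{4}-\d{4}-\d{4}\b"),
}


@dataclass
class GuardrailResult:
    allowed: bool          # False when Block mode stops the content
    text: Optional[str]    # content passed on (masked in Mask mode), None if blocked
    matches: list          # (pattern_name, matched_text) pairs, useful for Monitor


def apply_guardrail(text: str, mode: str) -> GuardrailResult:
    """Illustrative Monitor / Block / Mask semantics over regex matches."""
    matches = [(name, m.group(0))
               for name, rx in PATTERNS.items()
               for m in rx.finditer(text)]
    if mode == "monitor":
        # Record matches but pass the content through unchanged.
        return GuardrailResult(True, text, matches)
    if mode == "block":
        # Stop the content entirely if anything matched.
        return GuardrailResult(not matches, None if matches else text, matches)
    if mode == "mask":
        # Replace every match with a placeholder and let the content proceed.
        masked = text
        for name, rx in PATTERNS.items():
            masked = rx.sub(f"[{name}]", masked)
        return GuardrailResult(True, masked, matches)
    raise ValueError(f"unknown mode: {mode}")


print(apply_guardrail("Pay ACC-1234-5678-9012 today", "mask").text)
```

Depending on the selected scope, the same check would run on user input, on model output, or on both.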
Detection Approach
The guardrail uses regex patterns to identify sensitive data:
- Deterministic Matching: Exact pattern matching based on defined regex rules
- Custom Patterns: User-defined regex patterns for specific data formats
- Predictable Results: Consistent detection based on pattern matching rules
- High Precision: Low false positive rates when patterns are well-defined
Key Features
Custom Regex Patterns
Define specific regex patterns to match your organisation’s unique data formats and requirements.
Deterministic Detection
Predictable, rule-based detection that provides consistent results across all interactions.
High Precision Control
Exact pattern matching with minimal false positives when patterns are properly configured.
Flexible Configuration
Create multiple custom patterns for different types of sensitive data or compliance requirements.
Performance Optimised
Fast regex-based processing with minimal computational overhead compared to ML models.
Compliance Ready
Supports specific regulatory requirements with custom pattern definitions for unique identifiers.
Why Use This Guardrail?
Benefits
- Precise Control: Define exactly what patterns should be detected using regex
- Predictable Results: Deterministic matching ensures consistent behaviour
- Custom Compliance: Support for organisation-specific or industry-specific data formats
- Performance: Fast processing with minimal computational overhead
- Complementary Protection: Works alongside ML-based filters for comprehensive coverage
When to Use Custom Data Protection vs. Standard Data Protection
Use the Custom Data Protection Filter when:
- You have specific data formats that require exact pattern matching
- You need predictable, deterministic detection results
- You have unique identifiers or codes specific to your organisation
- You want to complement ML-based detection with rule-based patterns
- You need to meet specific compliance requirements with custom patterns
Use the standard Data Protection Filter when:
- You need intelligent detection of varied PII formats
- You want context-aware detection that understands data patterns
- You need to detect PII across multiple languages and formats
- You want ML-powered detection that adapts to new patterns
Use Case: Financial Services with Custom Account Numbers
Scenario
A financial services company uses custom account number formats that are specific to their internal systems. These account numbers follow a unique pattern (e.g., “ACC-XXXX-YYYY-ZZZZ” where X, Y, Z are specific digit patterns) that standard PII detection cannot identify.
Challenge
The organisation must ensure that:
- Custom account number formats are detected and protected
- Internal reference codes are not exposed in AI responses
- Specific compliance patterns are matched exactly
- Detection is predictable and consistent across all interactions
Solution: Implementing Custom Data Protection Filter
1. Custom Pattern Definition
- Created regex pattern ACC-\d{4}-\d{4}-\d{4} for account numbers
- Added pattern for internal reference codes: REF-[A-Z]{2}\d{6}
- Applied to both Input and Output for comprehensive protection
2. Deterministic Enforcement
- Set to Mask behaviour to anonymise sensitive data while maintaining workflow continuity
- Replaces detected patterns with appropriate placeholders
3. Complementary Setup
- Used alongside the standard Data Protection Filter for comprehensive coverage
- Custom patterns handle organisation-specific formats
- ML-based filter handles standard PII types
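The masking part of this solution can be sketched in Python. This is a minimal illustration, assuming angle-bracket placeholders such as <ACCOUNT_NUMBER> and <INTERNAL_REF> (hypothetical labels, not necessarily the product's format). The two regexes are the ones from the use case, wrapped in word boundaries (\b) so they are not matched inside longer tokens; `mask` and `COMBINED` are illustrative names.

```python
import re

# Patterns from the use case; the placeholder labels are assumptions for illustration.
CUSTOM_PATTERNS = {
    "ACCOUNT_NUMBER": r"ACC-\d{4}-\d{4}-\d{4}",
    "INTERNAL_REF": r"REF-[A-Z]{2}\d{6}",
}

# Combine into one alternation of named groups so the text is scanned once.
COMBINED = re.compile(
    "|".join(rf"(?P<{name}>\b{rx}\b)" for name, rx in CUSTOM_PATTERNS.items())
)


def mask(text: str) -> str:
    # m.lastgroup is the name of the alternative that matched.
    return COMBINED.sub(lambda m: f"<{m.lastgroup}>", text)


print(mask("Account ACC-1234-5678-9012, ref REF-AB123456."))
```

Using one combined pattern with named groups avoids a separate pass per pattern, and the replacement function picks the placeholder from whichever group fired.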
How to Use the Guardrail
Note: The steps below guide you through configuring the Custom Data Protection Filter using the Guardrail Setup.
Step 1: Navigate to the Guardrail Setup
- From the Home Page, open the AI System Dashboard by selecting View for your AI system in the AI System Table.
- In the guardrails section of the AI System Overview, click Edit Guardrails to launch the guardrail configuration workflow.
Step 2: Select and Enable the Custom Data Protection Filter
- In the Configure Guardrails page, a list of available guardrails will be displayed.
- Click on Custom Data Protection to open its configuration options on the right-hand side of the screen.
- Toggle the Enable Policy switch to ON to begin configuration.
Step 3: Create Custom Regex Patterns
- In the Custom Patterns section, click Add Pattern to create a new regex pattern.
- Enter a Pattern Name (e.g., “Internal Account Numbers”, “Employee IDs”).
- Enter the Regex Pattern using standard regex syntax (e.g., ACC-\d{4}-\d{4}-\d{4}).
- Add an optional Description to explain what this pattern detects.
- Click Save Pattern to add it to your configuration.
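Before saving a pattern in Step 3, it can help to check locally that the regex compiles. The sketch below models the three fields as a hypothetical `CustomPattern` dataclass (not part of the product) and uses Python's `re.compile` to catch syntax errors early. Regex dialects differ between engines, so a pattern that compiles in Python is not guaranteed to be accepted by the guardrail.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CustomPattern:
    """Hypothetical local model of the Pattern Name, Regex Pattern, and Description fields."""
    name: str
    regex: str
    description: str = ""
    compiled: re.Pattern = field(init=False, repr=False)

    def __post_init__(self):
        if not self.name.strip():
            raise ValueError("Pattern Name is required")
        try:
            self.compiled = re.compile(self.regex)
        except re.error as exc:
            # Surface regex syntax errors before the pattern is saved.
            raise ValueError(f"invalid regex for {self.name!r}: {exc}") from exc


accounts = CustomPattern(
    "Internal Account Numbers",
    r"ACC-\d{4}-\d{4}-\d{4}",
    "Internal account numbers in ACC-XXXX-YYYY-ZZZZ format",
)
print(bool(accounts.compiled.fullmatch("ACC-1234-5678-9012")))
```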
Step 4: Set Application Scope
- Under the Apply Guardrail To section, select where you want the guardrail enforced:
- Input – Applies the selected behaviour to what users send to the model.
- Output – Applies the selected behaviour to what the model returns as a response.
- Both – Applies the selected behaviour to input and output for full bidirectional coverage.
Step 5: Configure Enforcement Behaviour
- Under Select Guardrail Behaviour, choose how the system should respond to detected patterns:
- Monitor – Lets you review input or output content without taking any action—used for observation and diagnostics.
- Block – Automatically stops content from being processed if it violates the selected guardrail rules.
- Mask – Replaces detected sensitive information with anonymised placeholders while allowing content to proceed.
Step 6: Save, Test, and Apply the Guardrail
- Click Save & Continue to store your custom patterns and configuration.
- Go to the Test Guardrails step to evaluate how the guardrail behaves with your custom patterns.
- After saving, you can proceed to the Summary section to review your configuration, save all changes, and view your AI System overview.
Regex Pattern Examples
Common Pattern Types
Account Numbers
Employee IDs
Internal Reference Codes
Custom Serial Numbers
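As an illustration of these four pattern types, the Python sketch below scans text and reports each match in order. The account-number and reference-code formats come from the use case earlier in this guide; the EMP- and SN- formats, and the `find_matches` helper, are hypothetical stand-ins for your own formats.

```python
import re

# ACC- and REF- formats come from the use case; EMP- and SN- formats are hypothetical.
EXAMPLE_PATTERNS = {
    "Account Numbers": r"\bACC-\d{4}-\d{4}-\d{4}\b",
    "Employee IDs": r"\bEMP-\d{6}\b",
    "Internal Reference Codes": r"\bREF-[A-Z]{2}\d{6}\b",
    "Custom Serial Numbers": r"\bSN-[A-Z0-9]{4}-[A-Z0-9]{4}\b",
}


def find_matches(text):
    """Return (pattern_name, matched_text, start) for every hit, in text order."""
    hits = []
    for name, rx in EXAMPLE_PATTERNS.items():
        for m in re.finditer(rx, text):
            hits.append((name, m.group(0), m.start()))
    return sorted(hits, key=lambda h: h[2])


for name, value, _ in find_matches("EMP-004211 shipped SN-AB12-9XYZ to ACC-1111-2222-3333"):
    print(f"{name}: {value}")
```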
Best Practices for Regex Patterns
Pattern Design Guidelines
When creating custom regex patterns, follow these best practices for optimal performance and accuracy:
1. Be Specific and Precise
- Use exact patterns that match only your intended data formats
- Avoid overly broad patterns that might create false positives
- Test patterns thoroughly before deployment
2. Consider Edge Cases
- Account for variations in formatting (spaces, dashes, case sensitivity)
- Test with real data samples to ensure accuracy
- Consider international formats if applicable
3. Performance Optimisation
- Use efficient regex patterns to minimise processing time
- Avoid overly complex patterns that might impact performance
- Use anchors (^ and $) only when a pattern must match an entire value; to find identifiers inside longer text, use word boundaries (\b) instead
4. Documentation and Maintenance
- Provide clear descriptions for each pattern
- Document the expected format and use cases
- Regularly review and update patterns as requirements change
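One way to apply guidelines 1 and 2 is to keep lists of strings a pattern should and should not match, and check them before deployment. In this Python sketch, `check_pattern` is a hypothetical helper, and the tolerant variant's choice of separators and case-insensitivity is an assumption about which formatting variations matter to you.

```python
import re


def check_pattern(regex, should_match, should_not_match, flags=0):
    """Return (missed, false_positives); two empty lists mean the pattern passes."""
    rx = re.compile(regex, flags)
    missed = [s for s in should_match if not rx.search(s)]
    false_positives = [s for s in should_not_match if rx.search(s)]
    return missed, false_positives


# Strict version: only the canonical dashed, upper-case format.
STRICT = r"\bACC-\d{4}-\d{4}-\d{4}\b"
# Tolerant version (assumed variants): dash or single-space separators, any case.
TOLERANT = r"\bACC[- ]\d{4}[- ]\d{4}[- ]\d{4}\b"

positives = ["ACC-1234-5678-9012", "acc 1234 5678 9012", "ref: ACC-0000-1111-2222."]
negatives = ["ACC-1234-5678-901", "XACC-1234-5678-9012", "ACC-1234-5678-90123"]

print(check_pattern(STRICT, positives, negatives))
print(check_pattern(TOLERANT, positives, negatives, re.IGNORECASE))
```

Here the strict pattern misses the lower-case, space-separated sample, while the tolerant one catches it without matching any of the near-miss negatives.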
Example Pattern Configurations
Financial Services
Healthcare
Government
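The sketch below shows one way sector configurations might be organised, combining patterns with the scope and behaviour settings from Steps 4 and 5. The sector keys, the lowercase setting values, the pattern formats (apart from the ACC- and REF- examples from the use case), and the `active_patterns` helper are all hypothetical.

```python
import re

# Illustrative sector configurations; every name and format here is an assumption
# except the ACC- and REF- patterns taken from the use case.
SECTOR_CONFIGS = {
    "financial_services": {
        "scope": "both",
        "behaviour": "mask",
        "patterns": {
            "Internal Account Numbers": r"\bACC-\d{4}-\d{4}-\d{4}\b",
            "Internal Reference Codes": r"\bREF-[A-Z]{2}\d{6}\b",
        },
    },
    "healthcare": {
        "scope": "both",
        "behaviour": "block",
        "patterns": {"Medical Record Numbers": r"\bMRN-\d{8}\b"},  # hypothetical format
    },
    "government": {
        "scope": "output",
        "behaviour": "monitor",
        "patterns": {"Case File IDs": r"\bCASE-\d{4}-[A-Z]{3}\d{3}\b"},  # hypothetical format
    },
}


def active_patterns(sector: str, direction: str) -> dict:
    """Compiled patterns that apply to 'input' or 'output' traffic for a sector."""
    config = SECTOR_CONFIGS[sector]
    if config["scope"] not in (direction, "both"):
        return {}  # the guardrail is not enforced in this direction
    return {name: re.compile(rx) for name, rx in config["patterns"].items()}


print(sorted(active_patterns("government", "output")))
```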
Performance Considerations
Regex vs. ML-Based Detection
Custom Data Protection Filter (Regex-based):
- Speed: Very fast pattern matching
- Predictability: Deterministic results
- Resource Usage: Low computational overhead
- Accuracy: High precision for well-defined patterns
- Flexibility: Limited to predefined patterns
Standard Data Protection Filter (ML-based):
- Speed: Moderate processing time
- Predictability: Context-dependent results
- Resource Usage: Higher computational requirements
- Accuracy: High recall across varied formats
- Flexibility: Adapts to new patterns and contexts
Optimisation Tips
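- Use Anchors: Start patterns with ^ and end with $ only when the whole string must match; use word boundaries (\b) to find identifiers within longer text
- Avoid Greedy Quantifiers: Use *? instead of * when appropriate, and prefer bounded quantifiers such as \d{4} to open-ended ones
- Group Efficiently: Use non-capturing groups (?:...) when you don’t need to capture
- Test Performance: Validate pattern performance with large datasets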
- Monitor Usage: Track pattern match rates and adjust as needed
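The anchor and grouping tips are easy to check in Python: ^ and $ stop a pattern from finding an identifier embedded in a sentence, and a capturing group changes what `re.findall` returns. The sample text and patterns are illustrative, and the `timeit` call is one simple way to measure scan time on a larger input.

```python
import re
import timeit

text = "Customer ACC-1234-5678-9012 asked about REF-AB123456."

anchored = re.compile(r"^ACC-\d{4}-\d{4}-\d{4}$")
bounded = re.compile(r"\bACC-\d{4}-\d{4}-\d{4}\b")

# ^...$ only match when the identifier is the whole string.
print(anchored.search(text))          # None
print(bounded.search(text).group(0))  # ACC-1234-5678-9012

# A capturing group makes findall return only the group's text.
capturing = re.compile(r"\b(ACC|REF)-[A-Z0-9-]+\b")
non_capturing = re.compile(r"\b(?:ACC|REF)-[A-Z0-9-]+\b")
print(capturing.findall(text))        # ['ACC', 'REF']
print(non_capturing.findall(text))    # ['ACC-1234-5678-9012', 'REF-AB123456']

# Rough performance check on a larger input (timings vary by machine).
large = text * 1000
elapsed = timeit.timeit(lambda: bounded.findall(large), number=20)
print(f"{elapsed:.3f}s for 20 scans")
```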