SaaS Platform  |  PII Detection  |  Image Redaction  |  48 Languages  |  MCP Server  |  Microsoft Presidio

Platform Overview

cloak.business is an enterprise-grade PII detection and anonymization platform built on Microsoft Presidio. It takes a regex-first approach: 317 deterministic pattern recognizers handle structured data, and NLP engines handle names and locations. The platform also includes image redaction with Tesseract OCR, native MCP Server integration for AI tools, and Zero-Knowledge authentication.

390+ Entity Types
48 Languages
5 Anonymization Methods
317 Regex Recognizers
99.9% Uptime

What Makes cloak.business Unique

How Detection Works

cloak.business uses a 10-step pipeline that prioritizes deterministic regex matching before engaging NLP engines. Built on Microsoft Presidio.

  1. Input Reception — Text received via web app, API, MCP Server, or batch upload
  2. Language Detection — Automatic identification of text language for engine selection
  3. Regex Scanning — 317 pattern recognizers scan for structured PII (emails, IBANs, SSNs, credit cards, phone numbers)
  4. Checksum Validation — Detected patterns validated using checksums (Luhn, IBAN, SSN format rules)
  5. Context Enhancement — Surrounding text analyzed to boost or reduce confidence scores
  6. NLP Processing — spaCy, Stanza, or XLM-RoBERTa processes text for names, organizations, and locations
  7. Result Merging — Regex and NLP results merged with conflict resolution (regex wins for structured data)
  8. Confidence Scoring — Each detection receives a confidence score (0.0 to 1.0)
  9. Anonymization — Selected method applied: replace, redact, hash, encrypt, or mask
  10. Output Delivery — Anonymized text returned with detection report and audit trail
// Before cloak.business
"Invoice for John Doe, IBAN DE89 3704 0044 0532 0130 00, email john@example.com..."
// After cloak.business (regex-first)
"Invoice for PII_PERSON_001, IBAN PII_IBAN_001, email PII_EMAIL_001..."

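Steps 3, 4, and 9 of the pipeline can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation: the two patterns are simplified stand-ins for the 317 production recognizers, the function names are hypothetical, and NLP-based name detection (step 6) is left out, so the person name stays in place.

```python
import re

# Simplified patterns for illustration; real recognizers are stricter.
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def iban_checksum_ok(candidate: str) -> bool:
    """Mod-97 check: move the first four characters to the end, map
    letters to numbers (A=10 ... Z=35), and require remainder 1."""
    iban = candidate.replace(" ", "").upper()
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def anonymize(text: str) -> str:
    """Scan with regexes, drop IBAN candidates that fail the checksum,
    and replace the remaining hits with numbered placeholders."""
    hits = []
    for m in IBAN_RE.finditer(text):
        if iban_checksum_ok(m.group()):
            hits.append((m.start(), m.end(), "IBAN"))
    for m in EMAIL_RE.finditer(text):
        hits.append((m.start(), m.end(), "EMAIL"))

    out, cursor, counters = [], 0, {}
    for start, end, label in sorted(hits):
        if start < cursor:  # skip a hit that overlaps one already replaced
            continue
        counters[label] = counters.get(label, 0) + 1
        out.append(text[cursor:start])
        out.append(f"PII_{label}_{counters[label]:03d}")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)
```

Applied to the invoice line above, this sketch returns "Invoice for John Doe, IBAN PII_IBAN_001, email PII_EMAIL_001...". An IBAN with a wrong check digit is left untouched, which is the point of validating before anonymizing.
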
Multi-Engine Language Processing

cloak.business uses three NLP engines, each optimized for different language families, with lazy-loaded models. Regex handles structured data; NLP handles names and organizations. Built on Microsoft Presidio.

spaCy Engine

25 Languages

Fast industrial-strength NLP for European languages and major world languages.

Stanza Engine

7 Languages

Stanford NLP engine for specialized language processing and academic accuracy.

XLM-RoBERTa Transformer

16 Languages

Cross-lingual transformer for low-resource languages and multilingual documents.
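
Per-language engine routing with lazy loading could look roughly like the sketch below. The language-to-engine table is a made-up example (this page does not list which of the 48 languages each engine serves), and `load_engine` is a hypothetical stand-in for an expensive model load.

```python
from functools import lru_cache

# Hypothetical routing table, for illustration only.
ENGINE_FOR_LANGUAGE = {
    "en": "spacy",
    "de": "spacy",
    "ko": "stanza",
    "sw": "xlm-roberta",
}

LOAD_LOG = []  # records each real load so the caching is observable


@lru_cache(maxsize=None)
def load_engine(engine: str, language: str) -> str:
    """Stand-in for loading a model; runs once per (engine, language)."""
    LOAD_LOG.append((engine, language))
    return f"{engine}-model-{language}"


def engine_for(language: str) -> str:
    """Pick the engine for a language and load its model on first use."""
    try:
        engine = ENGINE_FOR_LANGUAGE[language]
    except KeyError:
        raise ValueError(f"unsupported language: {language}") from None
    return load_engine(engine, language)
```

Calling `engine_for("de")` twice triggers a single load, so models for languages that never appear in a workload cost no memory.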

PII Detection & Anonymization

390+ Entity Types

Names, emails, phone numbers, credit cards, SSNs, IBANs, IP addresses, medical records, and more. 317 regex recognizers for structured data, NLP-based NER for names and organizations. Confidence scoring for all detections.
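
As an illustration of how a structured-data recognizer can combine a pattern, a checksum, and context into a confidence score, here is a small credit card sketch. The base score of 0.6, the context boost of 0.3, and the context word list are assumptions chosen for the example, not the platform's actual values.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
CONTEXT_WORDS = ("card", "visa", "mastercard", "credit")  # assumed list


def luhn_ok(number: str) -> bool:
    """Luhn check: double every second digit from the right, subtract 9
    from results above 9, and require the total to be divisible by 10."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def score_cards(text: str, window: int = 30):
    """Return (match, confidence) pairs for Luhn-valid card numbers.
    Context words shortly before the match raise the confidence."""
    results = []
    for m in CARD_RE.finditer(text):
        if not luhn_ok(m.group()):
            continue  # pattern matched, checksum failed: not reported
        score = 0.6  # assumed base score for a checksum-valid match
        nearby = text[max(0, m.start() - window):m.start()].lower()
        if any(word in nearby for word in CONTEXT_WORDS):
            score += 0.3  # assumed context boost
        results.append((m.group(), round(min(score, 1.0), 2)))
    return results
```

A caller can then keep only detections above a chosen threshold, which is where a 0.0 to 1.0 score becomes useful in practice.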

5 Anonymization Methods

  • Replace — Substitute with fake data
  • Redact — Complete removal
  • Hash — SHA-256 hashing
  • Encrypt — AES-256-GCM encryption
  • Mask — Partial obscuring
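
Four of the five methods can be sketched with the Python standard library alone. The function names and the default of four visible characters are assumptions for illustration. Encryption is omitted because the standard library has no AES-GCM implementation; it would need a third-party cryptography package.

```python
import hashlib


def replace(value: str, fake: str) -> str:
    """Replace: substitute the detected value with caller-supplied fake data."""
    return fake


def redact(value: str) -> str:
    """Redact: remove the detected value completely."""
    return ""


def hash_value(value: str) -> str:
    """Hash: SHA-256 hex digest; the same input always gives the same output."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask(value: str, visible: int = 4, char: str = "*") -> str:
    """Mask: obscure everything except the last `visible` characters."""
    if visible <= 0:
        return char * len(value)
    return char * max(0, len(value) - visible) + value[-visible:]
```

For example, `mask("4111111111111111")` keeps only the last four digits, while `hash_value` lets two documents be joined on the same person without revealing who that person is.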

75+ Countries

Country-specific recognizers for national IDs, tax numbers, social security numbers, and regional data formats across 75+ countries.

Deterministic Results

The 317 regex recognizers give 100% reproducible results for structured data: the same input always yields the same detections, with no model drift. NLP provides high consistency for names. Results are fully auditable for compliance.

OCR-Powered Image Anonymization

Extract text from images, detect PII, and redact directly on the image. Powered by Tesseract OCR.

38 OCR Languages

Tesseract OCR extracts text from images in 38 languages, enabling PII detection on scanned documents, screenshots, and photos.

Supported Formats

JPEG, PNG, BMP, TIFF, and WebP. Upload images and receive redacted versions with PII masked or removed.

Visual Redaction

Detected PII is redacted directly on the image with configurable redaction boxes. Original text positions preserved for accurate coverage.
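
The mapping from detected text back to image coordinates can be sketched as follows. The sketch assumes the OCR step yields one bounding box per word (left, top, width, height), as Tesseract's word-level output does, and that PII spans are character offsets into the space-joined OCR text. The `Word` type, the function name, and the padding default are hypothetical; drawing the boxes onto the image is left to an imaging library.

```python
from typing import List, NamedTuple, Tuple


class Word(NamedTuple):
    text: str
    left: int
    top: int
    width: int
    height: int


def redaction_boxes(
    words: List[Word],
    pii_spans: List[Tuple[int, int]],
    padding: int = 2,
) -> List[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) rectangles for every OCR word
    that overlaps a PII span in the space-joined text."""
    boxes = []
    offset = 0
    for word in words:
        start, end = offset, offset + len(word.text)
        offset = end + 1  # one space separates consecutive words
        if any(start < s_end and s_start < end for s_start, s_end in pii_spans):
            boxes.append((
                max(0, word.left - padding),
                max(0, word.top - padding),
                word.left + word.width + padding,
                word.top + word.height + padding,
            ))
    return boxes
```

Keeping the original word positions is what allows the box to cover exactly the pixels where the PII was printed, with a small padding for OCR inaccuracy.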

Platform Components

Web Application

Cloud-Based Processing

Full-featured web interface with Zero-Knowledge authentication. No software installation required.

Desktop App

Windows 10+

Documents stay on your device while using cloud-powered entity detection. Only extracted text is sent for analysis. AES-256-GCM encryption with Argon2id key derivation. Supports PDF, DOCX, XLSX, TXT, CSV, JSON, XML.

Office Add-in

Word, Excel & PowerPoint

Real-time PII detection directly in Microsoft Office. Anonymize without leaving your document.

MCP Server

Claude Desktop, Cursor, VS Code (Continue & Cline)

Native stdio integration for Claude Desktop. HTTP endpoints for Cursor and VS Code via Continue and Cline extensions. 6 operators with entity groups and presets.

REST API

JWT Authentication

RESTful endpoints for workflow automation and CI/CD pipeline integration.

Batch Processing

Multi-Document Upload

Multi-document upload with parallel processing. Plan limits apply per tier.

Image Redaction

Tesseract OCR — 38 Languages

Upload images, extract text via OCR, detect PII, and receive redacted images. JPEG, PNG, BMP, TIFF, WebP supported.

Industries & Applications

Enterprise

GDPR compliance at scale. Centralized PII detection across departments with role-based access control and batch processing.

Developers

REST API and MCP Server for CI/CD pipelines. Safe test dataset generation without production PII exposure.

Legal

Contract anonymization & e-discovery. Redact sensitive information from court filings and discovery documents with audit trails.

Healthcare

Patient data protection & HIPAA support. Medical records anonymization for research and administrative documents.

Financial

PCI-DSS compliance & fraud prevention. Transaction and customer data protection with regulatory reporting.

Research

Anonymize research datasets for publication. Remove personal identifiers while preserving data utility for analysis.

Government

Public records & FOIA compliance. Automated redaction for FOIA requests and inter-agency data sharing.

Zero-Knowledge Architecture

Password Never Leaves Device

True Zero-Knowledge authentication. Your master password is used locally to derive encryption keys. Server never sees your password.

Argon2id + XChaCha20-Poly1305

Memory-hard key derivation with modern authenticated encryption. Industry-leading cryptographic primitives.
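
The general pattern behind client-side key derivation can be sketched as below. This is an illustration under stated assumptions, not the platform's protocol: scrypt from the Python standard library stands in for Argon2id (both are memory-hard), the cost parameters are example values, and the split into an authentication key and an encryption key via HMAC labels is a hypothetical design.

```python
import hashlib
import hmac
import os


def derive_keys(password: str, salt: bytes):
    """Derive two independent 32-byte keys on the client.

    scrypt is a stand-in for Argon2id here; parameters are examples.
    """
    master = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32
    )
    # Separate keys per purpose, so the value shown to the server
    # reveals nothing about the key that protects the data.
    auth_key = hmac.new(master, b"auth", hashlib.sha256).digest()
    enc_key = hmac.new(master, b"encryption", hashlib.sha256).digest()
    return auth_key, enc_key


if __name__ == "__main__":
    salt = os.urandom(16)  # stored with the account; it is not secret
    auth_key, enc_key = derive_keys("correct horse battery staple", salt)
    print(auth_key.hex())  # only a value like this would go to the server
```

In such a scheme the password and the encryption key never leave the device; the server can verify the authentication key without being able to decrypt anything.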

24-Word Recovery Phrase

BIP-39 compatible recovery phrase for account recovery. No password reset emails — you control your keys.
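
The phrase has 24 words because BIP-39 appends an 8-bit SHA-256 checksum to 256 bits of entropy and splits the resulting 264 bits into 24 groups of 11 bits, each indexing a 2048-word list. The sketch below computes those indices; the word list lookup is omitted, and the function name is hypothetical.

```python
import hashlib
import secrets


def mnemonic_indices(entropy: bytes):
    """Return the 24 word-list indices for 256 bits of entropy (BIP-39)."""
    if len(entropy) != 32:
        raise ValueError("this sketch handles 256-bit entropy only")
    checksum_bits = len(entropy) * 8 // 32  # 8 bits for 256-bit entropy
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - checksum_bits)
    combined = (int.from_bytes(entropy, "big") << checksum_bits) | checksum
    total_bits = len(entropy) * 8 + checksum_bits  # 264
    return [
        (combined >> (total_bits - 11 * (i + 1))) & 0x7FF
        for i in range(total_bits // 11)
    ]


if __name__ == "__main__":
    indices = mnemonic_indices(secrets.token_bytes(32))
    print(len(indices))  # 24 indices, each in the range 0..2047
```

The checksum lets a client reject most mistyped phrases before attempting recovery.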

ISO 27001:2022 Certified

Hosted in Hetzner data centers in Germany. Full GDPR compliance with EU data residency. AES-256-GCM encryption. TLS 1.2+.

Transparent Token System

Each operation costs tokens based on text length, entities detected, and operations applied.

Free

€0/month

200 tokens per cycle

  • Online account
  • Analyzer & Anonymizer
  • 48 languages
  • 317 regex recognizers
  • Image redaction
  • No credit card required

Pro (Best Value)

€15/month

4,000 tokens per cycle

  • All Free features
  • MCP Server access
  • All file types supported
  • Unlimited uploads
  • Token top-ups available

Business (Enterprise)

€29/month

10,000 tokens per cycle

  • All Pro features
  • Priority support
  • Custom integrations
  • Extended history
  • Token top-ups available

cloak.business vs anonym.legal vs anonymize.today

Three platforms, different strengths. All built for GDPR compliance on German servers.

Feature | cloak.business | anonym.legal | anonymize.today
Focus | Enterprise & Developers | Legal & Privacy-First | Simple & Transparent
Entity Types | 390+ | 260+ | 256
Languages | 48 | 48 + RTL | 27
Regex Recognizers | 317 | n/a | n/a
Image Redaction | Yes (38 OCR langs) | No | No
MCP Server | Yes | Yes | No
Zero-Knowledge Auth | Yes | Yes | No
Desktop App | Windows | Win/Mac/Linux | Win/Mac/Linux
Office Add-in | Word/Excel/PPT | Word/Excel/PPT | Word/Excel/PPT
Free Tokens | 200/cycle | 200/cycle | 300/month
Technology | Microsoft Presidio | Presidio-based | Regex-based

Try cloak.business

Start with 200 free tokens per cycle. No credit card required. Zero-Knowledge from day one.

Related Platforms: anonym.legal — Zero-Knowledge PII anonymization with MCP Server  |  anonymize.today — Simple, transparent PII detection

Need enterprise-grade PII detection?

Let's discuss how cloak.business can support your compliance, image redaction, and API integration requirements.