cloak.business — Enterprise-Grade PII Detection & Anonymization
Regex-first PII protection with 317 deterministic pattern matchers, plus NLP for names and locations. 320+ entity types, 48 languages, image redaction with OCR. Built on Microsoft Presidio. Hosted on ISO 27001-aligned German servers (Hetzner, Falkenstein).
Platform Overview
cloak.business is an enterprise-grade PII detection and anonymization platform built on Microsoft Presidio. It takes a regex-first approach, with 317 deterministic pattern matchers for structured data complemented by NLP engines for names and locations. It also includes image redaction with Tesseract OCR, native MCP Server integration for AI tools, and Zero-Knowledge authentication.
What Makes cloak.business Unique
Regex-First Detection
317 deterministic pattern matchers process structured data (emails, IBANs, credit cards, SSNs) before NLP engines handle names and locations. Predictable, audit-ready results with no model drift.
Image Redaction
Extract text from images using Tesseract OCR in 38 languages, detect PII, and redact directly on the image. Supports JPEG, PNG, BMP, TIFF, and WebP formats.
MCP Server for AI Tools
Native integration with Claude Desktop (stdio), Cursor, and VS Code via Continue and Cline extensions (HTTP). 6 operators: encrypt, hash, mask, redact, replace, keep. Entity groups and presets.
Zero-Knowledge Authentication
Your password never leaves your device. Built with Argon2id + XChaCha20-Poly1305 encryption and AES-256-GCM vault encryption. 24-word recovery phrase for account recovery.
How Detection Works
cloak.business uses a 10-step pipeline that prioritizes deterministic regex matching before engaging NLP engines. Built on Microsoft Presidio.
- Input Reception: Text received via web app, API, MCP Server, or batch upload.
- Language Detection: Automatic identification of the text language for engine selection.
- Regex Scanning: 317 pattern matchers scan for structured PII (emails, IBANs, SSNs, credit cards, phone numbers).
- Checksum Validation: Detected patterns validated using checksums and format rules (Luhn, IBAN check digits, SSN format rules).
- Context Enhancement: Surrounding text analyzed to boost or reduce confidence scores.
- NLP Processing: spaCy, Stanza, or XLM-RoBERTa processes text for names, organizations, and locations.
- Result Merging: Regex and NLP results merged with conflict resolution (regex wins for structured data).
- Confidence Scoring: Each detection gets a confidence score (0.0 to 1.0).
- Anonymization: The selected method is applied (replace, redact, hash, encrypt, or mask).
- Output Delivery: Anonymized text returned with detection report and audit trail.
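The regex, checksum, context, merge, and anonymization steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform's implementation: the two patterns, the context words, the score values, and the names `Finding`, `regex_findings`, `merge`, and `anonymize` are all hypothetical, and the NLP hits are hard-coded stand-ins for real NER output.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    entity: str
    start: int
    end: int
    score: float
    source: str  # "regex" or "nlp"


# Two illustrative patterns only; they are not the platform's actual matchers.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "CREDIT_CARD": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}
# Hypothetical context words that raise confidence when found just before a match.
CONTEXT_WORDS = {"EMAIL": ("mail",), "CREDIT_CARD": ("card", "visa")}


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        total += d
    return total % 10 == 0


def regex_findings(text: str) -> list[Finding]:
    findings = []
    for entity, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            if entity == "CREDIT_CARD" and not luhn_valid(m.group()):
                continue  # checksum validation drops false positives
            score = 0.8  # assumed base score for a regex hit
            window = text[max(0, m.start() - 30):m.start()].lower()
            if any(word in window for word in CONTEXT_WORDS[entity]):
                score = 0.95  # context enhancement (assumed boost)
            findings.append(Finding(entity, m.start(), m.end(), score, "regex"))
    return findings


def merge(regex_hits, nlp_hits):
    """Keep every regex hit; keep an NLP hit only if it overlaps no regex hit."""
    kept = [n for n in nlp_hits
            if not any(n.start < r.end and r.start < n.end for r in regex_hits)]
    return sorted(list(regex_hits) + kept, key=lambda f: f.start)


def anonymize(text, findings):
    """Replace each finding with an <ENTITY> tag, right to left so offsets stay valid."""
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        text = text[:f.start] + f"<{f.entity}>" + text[f.end:]
    return text


text = "Card 4111 1111 1111 1111, mail jane@example.com, signed Jane Doe"
# Stand-in for NER output; a real engine would produce these spans.
nlp_hits = [
    Finding("PERSON", text.index("jane@"), text.index("jane@") + 4, 0.6, "nlp"),
    Finding("PERSON", text.index("Jane Doe"), len(text), 0.85, "nlp"),
]
findings = merge(regex_findings(text), nlp_hits)
print(anonymize(text, findings))
```

The first stand-in NLP hit lands inside the email address, so the merge step discards it in favor of the regex match; the second does not overlap any regex hit and survives.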
Multi-Engine Language Processing
cloak.business uses three NLP engines optimized for different language families, with lazy-loaded models. Regex handles structured data; NLP handles names and organizations. Built on Microsoft Presidio.
spaCy Engine
25 Languages
Fast industrial-strength NLP for European languages and major world languages.
Stanza Engine
7 Languages
Stanford NLP engine for specialized language processing and academic accuracy.
XLM-RoBERTa Transformer
16 Languages
Cross-lingual transformer for low-resource languages and multilingual documents.
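Lazy loading across several engines can be pictured as a routing table plus a cached loader. The sketch below is an assumption-laden illustration: the page states how many languages each engine covers but not which ones, so the `ENGINE_FOR_LANGUAGE` entries, the fallback choice, and the `load_engine` placeholder are hypothetical.

```python
from functools import lru_cache

# Hypothetical routing table; the real per-language assignment is not published here.
ENGINE_FOR_LANGUAGE = {
    "en": "spacy",
    "de": "spacy",
    "ar": "stanza",
    "sw": "xlm-roberta",
}
DEFAULT_ENGINE = "xlm-roberta"  # assumed fallback for unlisted languages

LOAD_LOG = []  # records each real load so the caching is observable


@lru_cache(maxsize=None)
def load_engine(name: str) -> dict:
    """Stand-in for an expensive model load; the cache makes it run once per engine."""
    LOAD_LOG.append(name)
    return {"engine": name}  # placeholder for a real model object


def engine_for(language: str) -> dict:
    """Pick the engine for a language code and load it only on first use."""
    return load_engine(ENGINE_FOR_LANGUAGE.get(language, DEFAULT_ENGINE))
```

Calling `engine_for("en")` and then `engine_for("de")` returns the same cached object and adds only one `"spacy"` entry to `LOAD_LOG`; engines that are never requested are never loaded.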
PII Detection & Anonymization
320+ Entity Types
Names, emails, phone numbers, credit cards, SSNs, IBANs, IP addresses, medical records, and more. 317 regex matchers for structured data, NLP-based NER for names and organizations. Confidence scoring for all detections.
5 Anonymization Methods
- Replace: Substitute with fake data.
- Redact: Full removal.
- Hash: SHA-256 hashing.
- Encrypt: AES-256-GCM encryption.
- Mask: Partial obscuring.
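Four of the methods above can be sketched with the Python standard library alone. The function names, default values, and `apply_operator` helper are hypothetical and only show the general shape of each operation; Encrypt is left out because AES-256-GCM needs a third-party cryptography library.

```python
import hashlib


def replace(value: str, fake: str = "<REPLACED>") -> str:
    """Replace: substitute the detected value with fake data."""
    return fake


def redact(value: str) -> str:
    """Redact: remove the value entirely."""
    return ""


def hash_value(value: str) -> str:
    """Hash: SHA-256 hex digest of the UTF-8 encoded value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask(value: str, visible: int = 4, char: str = "*") -> str:
    """Mask: hide everything except the last `visible` characters."""
    if visible <= 0:
        return char * len(value)
    return char * max(0, len(value) - visible) + value[-visible:]


def apply_operator(text: str, spans, operator) -> str:
    """Apply one operator to each (start, end) span, right to left so offsets stay valid."""
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + operator(text[start:end]) + text[end:]
    return text


sample = "IBAN DE89370400440532013000 on file"
start = sample.index("DE89")
span = (start, start + 22)  # the example IBAN is 22 characters long
print(apply_operator(sample, [span], mask))
```

An unsalted hash of a low-entropy value such as a phone number can be reversed by brute force, so a production design would likely add a secret salt or key; that detail is not described on this page.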
70+ Countries
Country-specific matchers for national IDs, tax numbers, social security numbers, and regional data formats across 70+ countries.
Deterministic Results
317 regex matchers give deterministic, reproducible results for structured data (same input + same ruleset version). NLP provides high consistency for names. Fully audit-ready for compliance. No model drift.
OCR-Powered Image Anonymization
Extract text from images, detect PII, and redact directly on the image. Powered by Tesseract OCR.
38 OCR Languages
Tesseract OCR extracts text from images in 38 languages, enabling PII detection on scanned documents, screenshots, and photos.
Supported Formats
JPEG, PNG, BMP, TIFF, and WebP. Upload images and receive redacted versions with PII masked or removed.
Visual Redaction
Detected PII is redacted directly on the image with configurable redaction boxes. Original text positions are preserved for accurate coverage.
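Redacting on the image depends on mapping detected character spans back to the pixel boxes of the OCR words. The sketch below shows one way that mapping could work. `WordBox`, `boxes_to_text`, and `redaction_rects` are hypothetical names, and the sample boxes are made-up values; the `left`, `top`, `width`, and `height` fields mirror the columns Tesseract emits in its TSV output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBox:
    text: str
    left: int
    top: int
    width: int
    height: int


def boxes_to_text(words):
    """Join OCR words with single spaces and record each word's character span."""
    parts, spans, pos = [], [], 0
    for w in words:
        spans.append((pos, pos + len(w.text)))
        parts.append(w.text)
        pos += len(w.text) + 1  # +1 for the joining space
    return " ".join(parts), spans


def redaction_rects(words, spans, pii_spans, padding=2):
    """Return (x0, y0, x1, y1) rectangles for every word overlapping a PII span."""
    rects = []
    for w, (ws, we) in zip(words, spans):
        if any(ws < pe and ps < we for ps, pe in pii_spans):
            rects.append((
                max(0, w.left - padding),
                max(0, w.top - padding),
                w.left + w.width + padding,
                w.top + w.height + padding,
            ))
    return rects


# Made-up OCR output for one line of a scanned page.
words = [
    WordBox("Contact", 10, 10, 70, 18),
    WordBox("jane@example.com", 90, 10, 160, 18),
    WordBox("today", 260, 10, 50, 18),
]
page_text, word_spans = boxes_to_text(words)
pii = [(8, 24)]  # character span of the email address in page_text
print(redaction_rects(words, word_spans, pii))
```

The resulting rectangles can then be filled with a drawing library of choice; the `padding` parameter stands in for the configurable box size mentioned above.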
Platform Components
Web Application
Cloud-Based Processing
Full-featured web interface with Zero-Knowledge authentication. No software installation needed.
Desktop App
Windows 10+ · macOS
Documents stay on your device while using cloud-powered entity detection. Only extracted text is sent for analysis. AES-256-GCM encryption with Argon2id key derivation. Supports PDF, DOCX, XLSX, TXT, CSV, JSON, XML.
Office Add-in
Word, Excel & PowerPoint.
Real-time PII detection directly in Microsoft Office. Anonymize without leaving your document.
MCP Server
Claude Desktop, Cursor, VS Code (Continue & Cline)
Native stdio integration for Claude Desktop. HTTP endpoints for Cursor and VS Code via Continue and Cline extensions. 6 operators with entity groups and presets.
REST API
JWT Authentication
RESTful endpoints for workflow automation and CI/CD pipeline integration.
Batch Processing
Multi-Document Upload
Multi-document upload with parallel processing. Plan limits apply per tier.
Image Redaction
Tesseract OCR — 38 Languages.
Upload images, extract text via OCR, detect PII, and receive redacted images. JPEG, PNG, BMP, TIFF, WebP supported.
Industries & Applications
Enterprise
GDPR compliance at scale. Centralized PII detection across departments with role-based access control and batch processing.
Developers
REST API and MCP Server for CI/CD pipelines. Safe test dataset generation without exposing real PII.
Legal
Contract anonymization & e-discovery. Redact sensitive information from court filings and discovery documents with audit trails.
Healthcare
Patient data workflows aligned with HIPAA PHI handling. Medical records anonymization for research and administrative documents.
Financial
PCI-DSS-aligned PAN/PII redaction for fraud-prevention workflows. Transaction and customer data protection with compliance reporting.
Research
Anonymize research datasets for publication. Remove personal identifiers while preserving data utility for analysis.
Government
Public records & FOIA compliance. Automated redaction for FOIA requests and inter-agency data sharing.
Zero-Knowledge Architecture
Password Never Leaves Device
True Zero-Knowledge authentication. Your master password is used locally to derive encryption keys. The server never sees your password.
Argon2id + XChaCha20-Poly1305
Argon2id (64 MB / 3 iterations) for memory-hard key derivation; XChaCha20-Poly1305 for authenticated encryption.
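The idea of deriving keys locally from a password can be illustrated with the Python standard library. Two substitutions are made loudly here: `hashlib.scrypt` stands in for Argon2id (which is not in the standard library, so the 64 MB / 3 iteration parameters do not carry over), and the split into an authentication token and a vault key via HMAC is one common zero-knowledge pattern assumed for illustration, not a description of how cloak.business does it. The function names and labels are hypothetical.

```python
import hashlib
import hmac
import os


def derive_master_key(password: str, salt: bytes) -> bytes:
    """Memory-hard derivation on the client. scrypt is a stand-in for Argon2id."""
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32
    )


def split_keys(master_key: bytes):
    """Derive two independent 32-byte keys from the master key (assumed design)."""
    auth_token = hmac.new(master_key, b"auth", hashlib.sha256).digest()
    vault_key = hmac.new(master_key, b"vault", hashlib.sha256).digest()
    return auth_token, vault_key


salt = os.urandom(16)  # random per-account salt; it is not secret
master = derive_master_key("correct horse battery staple", salt)
auth_token, vault_key = split_keys(master)
# In this sketch only auth_token would be sent to a server; the password,
# the master key, and vault_key would stay on the device.
```

Because the authentication token and the vault key come from different HMAC labels, knowing one does not reveal the other, which is what lets a server verify a login without being able to decrypt stored data in a scheme of this kind.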
24-Word Recovery Phrase
BIP-39 compatible recovery phrase for account recovery. No password reset emails — you control your keys.
ISO 27001-Aligned
Hosted in Hetzner data centers in Germany. Designed for GDPR with EU data residency. AES-256-GCM encryption. TLS 1.2+.
Transparent Token System
Each operation costs tokens based on text length, entities found, and operators applied.
Free
€0/month
200 tokens per cycle.
- Online account.
- Analyzer & Anonymizer.
- 48 languages
- 317 regex matchers.
- Image redaction.
- No credit card needed.
Basic Most Popular
€3/month
1,000 tokens per cycle.
- All Free features.
- API access.
- Batch processing.
- PDF/DOCX/TXT/CSV support.
- Token top-ups available.
Pro Best Value
€15/month
4,000 tokens per cycle.
- All Basic features.
- MCP Server access.
- All file types supported.
- Unlimited uploads.
- Token top-ups available.
Business Enterprise
€29/month
10,000 tokens per cycle.
- All Pro features.
- Priority support.
- Custom integrations.
- Extended history.
- Token top-ups available.
cloak.business vs anonym.legal vs anonymize.today
Three platforms, different strengths. All built for GDPR compliance on German servers.
| Feature | cloak.business | anonym.legal | anonymize.today |
|---|---|---|---|
| Focus | Enterprise & Developers | Legal & Privacy-First | Simple & Transparent |
| Entity Types | 320+ | 285+ | 258 |
| Languages | 48 | 48 + RTL | 27 |
| Regex Matchers | 317 | n/a | n/a |
| Image Redaction | Yes (38 OCR languages) | No | No |
| MCP Server | Yes | Yes | No |
| Zero-Knowledge Auth | Yes | Yes | No |
| Desktop App | Windows | Win/Mac/Linux | Win/Mac/Linux |
| Office Add-in | Word/Excel/PPT | Word/Excel/PPT | Word/Excel/PPT |
| Free Tokens | 200/cycle | 200/cycle | 300/month |
| Technology | Microsoft Presidio | Presidio-based | Regex-based |
Try cloak.business
Start with 200 free tokens per cycle. No credit card needed. Zero-Knowledge from day one.
Related Platforms: anonym.legal (Zero-Knowledge PII anonymization with MCP Server) | anonymize.today (simple, transparent PII detection)
Best fit and known limitations
Best for
Ops and SecOps teams that need image redaction with OCR alongside text PII detection, plus an MCP server for AI tooling, hosted on ISO 27001-aligned German servers (Hetzner, Falkenstein).
Not the right fit
Browser-only or developer-IDE flows (use anonymize.live or anonymize.dev); pure consumer use without compliance overhead (anonymize.today fits better).
Known limitations
OCR accuracy depends on image quality and language; the free tier is capped at 200 tokens per cycle; the 320+ entity catalogue is enterprise-tier and may exceed simple use-case needs.
Need enterprise-grade PII detection?
Let's discuss how cloak.business can support your compliance, image redaction, and API integration needs.