SaaS Platform · PII Detection · Image Redaction · 48 Languages · MCP Server · Microsoft Presidio

Platform Overview

cloak.business is an enterprise-grade PII detection and anonymization platform built on Microsoft Presidio. It takes a regex-first approach, with 317 deterministic pattern matchers for structured data, complemented by NLP engines for names and locations. It also includes image redaction with Tesseract OCR, native MCP Server integration for AI tools, and Zero-Knowledge authentication.

320+
Entity Types
48
Languages
5
Anonymization Methods
317
Regex Matchers

What Makes cloak.business Unique

How Detection Works

cloak.business uses a 10-step pipeline that runs deterministic regex matching before engaging NLP engines. It is built on Microsoft Presidio.

  1. Input Reception: Text is received via the web app, API, MCP Server, or batch upload.
  2. Language Detection: The text language is identified automatically to select the right engine.
  3. Regex Scanning: 317 pattern matchers scan for structured PII (emails, IBANs, SSNs, credit cards, phone numbers).
  4. Checksum Validation: Matched patterns are validated with checksums and format rules (Luhn, IBAN, SSN format rules).
  5. Context Enhancement: Surrounding text is analyzed to raise or lower confidence scores.
  6. NLP Processing: spaCy, Stanza, or XLM-RoBERTa processes the text for names, organizations, and locations.
  7. Result Merging: Regex and NLP results are merged with conflict resolution (regex wins for structured data).
  8. Confidence Scoring: Each finding receives a confidence score from 0.0 to 1.0.
  9. Anonymization: The selected method is applied: replace, redact, hash, encrypt, or mask.
  10. Output Generation: Anonymized text is returned with a detection report and audit trail.
// Before cloak.business
"Invoice for John Doe, IBAN DE89 3704 0044 0532 0130 00, email john@example.com..."
// After cloak.business (regex-first)
"Invoice for PII_PERSON_001, IBAN PII_IBAN_001, email PII_EMAIL_001..."

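The regex-first half of this pipeline (steps 3, 4 and 9) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform's implementation: the two patterns, the `PII_<TYPE>_NNN` numbering and the function names are hypothetical, and only email and IBAN are covered. The IBAN check is the standard mod-97 rule. Names such as "John Doe" are deliberately left alone, because the pipeline assigns them to the NLP step.

```python
import re

# Hypothetical, deliberately simplified patterns; not the product's 317 matchers.
PATTERNS = {
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){3,7}(?: ?[A-Z0-9]{1,3})?\b"),
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def iban_valid(candidate: str) -> bool:
    """IBAN mod-97 check: move the first 4 chars to the end, map A-Z to 10-35, expect remainder 1."""
    s = candidate.replace(" ", "").upper()
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


# Step 4: only entity types that carry a checksum get a validator.
VALIDATORS = {"IBAN": iban_valid}


def anonymize_structured(text: str) -> str:
    # Step 3: regex scan, keeping only matches that pass their checksum (step 4).
    findings = []
    for entity, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            check = VALIDATORS.get(entity)
            if check is None or check(m.group()):
                findings.append((m.start(), m.end(), entity))
    findings.sort()

    # Step 9: replace each finding with a numbered placeholder, left to right.
    counters: dict[str, int] = {}
    parts, pos = [], 0
    for start, end, entity in findings:
        if start < pos:  # overlaps an earlier finding; keep the earlier one
            continue
        counters[entity] = counters.get(entity, 0) + 1
        parts.append(text[pos:start])
        parts.append(f"PII_{entity}_{counters[entity]:03d}")
        pos = end
    parts.append(text[pos:])
    return "".join(parts)
```

Applied to the "Before" string above, this sketch yields `Invoice for John Doe, IBAN PII_IBAN_001, email PII_EMAIL_001...`; replacing the name is left to the NLP engines. An IBAN with wrong check digits (for example `DE88 ...`) fails validation and is left untouched.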
Multi-Engine Language Processing

cloak.business uses three NLP engines optimized for different language families, with lazy-loaded models. Regex handles structured data; NLP handles names and organizations. Built on Microsoft Presidio.
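Lazy loading means a model is loaded the first time a language that needs it arrives, then reused for later requests. A minimal Python sketch, assuming a hypothetical language-to-engine table (the page gives per-engine language counts but not the assignments), an assumed XLM-RoBERTa fallback, and a stand-in loader in place of real spaCy, Stanza or transformer calls:

```python
from functools import lru_cache

# Hypothetical routing table; the real 25/7/16 split across engines is not listed on this page.
ENGINE_FOR_LANGUAGE = {"en": "spacy", "de": "spacy", "fr": "spacy", "ar": "stanza"}
FALLBACK_ENGINE = "xlm-roberta"  # assumed default for languages not in the table

load_calls: list[tuple[str, str]] = []  # records which (engine, language) models were loaded


@lru_cache(maxsize=None)
def load_model(engine: str, language: str) -> dict:
    """Stand-in for an expensive model load; lru_cache makes it run once per (engine, language)."""
    load_calls.append((engine, language))
    return {"engine": engine, "language": language}


def model_for(language: str) -> dict:
    engine = ENGINE_FOR_LANGUAGE.get(language, FALLBACK_ENGINE)
    return load_model(engine, language)
```

Caching per (engine, language) pair rather than per engine reflects that pipelines such as spaCy typically ship one model per language, so only languages actually seen ever cost memory.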

spaCy Engine

25 Languages

Fast industrial-strength NLP for European languages and major world languages.

Stanza Engine

7 Languages

Stanford NLP engine for specialized language processing with academic-grade accuracy.

XLM-RoBERTa Transformer

16 Languages

Cross-lingual transformer for low-resource languages and multilingual documents.

PII Detection & Anonymization

320+ Entity Types

Names, emails, phone numbers, credit cards, SSNs, IBANs, IP addresses, medical records, and more. 317 regex matchers cover structured data, and NLP-based NER covers names and organizations. Every detection receives a confidence score.
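Context enhancement, result merging and confidence scoring (steps 5, 7 and 8 of the pipeline) can be illustrated with a small Python sketch. It assumes a hypothetical `Finding` record, an illustrative score boost of 0.35 when a context word appears shortly before a finding, and a hypothetical set of structured entity types. On overlap, a regex finding for a structured type beats an NLP finding regardless of score, mirroring the "regex wins for structured data" rule. It is not the platform's actual scoring code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Finding:
    start: int
    end: int
    entity: str
    score: float  # confidence in [0.0, 1.0]
    source: str   # "regex" or "nlp"


# Hypothetical set of entity types treated as structured data.
STRUCTURED = {"EMAIL", "IBAN", "CREDIT_CARD", "PHONE_NUMBER", "US_SSN"}


def boost_with_context(f: Finding, text: str, context_words: set[str],
                       boost: float = 0.35, window: int = 30) -> Finding:
    """Raise the score (capped at 1.0) if a context word appears within `window` chars before f."""
    preceding = text[max(0, f.start - window):f.start].lower()
    if any(word in preceding for word in context_words):
        return replace(f, score=min(1.0, f.score + boost))
    return f


def overlaps(a: Finding, b: Finding) -> bool:
    return a.start < b.end and b.start < a.end


def merge(findings: list[Finding]) -> list[Finding]:
    """Keep non-overlapping findings: regex-on-structured first, then higher score."""
    def priority(f: Finding) -> tuple[bool, float]:
        return (f.source == "regex" and f.entity in STRUCTURED, f.score)

    kept: list[Finding] = []
    for f in sorted(findings, key=priority, reverse=True):
        if not any(overlaps(f, k) for k in kept):
            kept.append(f)
    return sorted(kept, key=lambda f: f.start)
```

For example, in `Pay to IBAN DE89 ...` an IBAN finding at 0.6 is lifted to 0.95 by the preceding word "IBAN", and an overlapping NLP `ORGANIZATION` guess at 0.85 is discarded during merging.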

5 Anonymization Methods

  • Replace: Substitute with fake data.
  • Redact: Remove entirely.
  • Hash: SHA-256 hashing.
  • Encrypt: AES-256-GCM encryption.
  • Mask: Partially obscure the value.
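Four of the five methods can be sketched with the Python standard library alone. The function names, parameters and the default `<ENTITY>` replacement tag are assumptions for illustration and may differ from the platform's own operators. Encrypt is omitted because AES-256-GCM needs a third-party library (for example the `cryptography` package's `AESGCM` class).

```python
import hashlib
from typing import Optional


def replace_value(value: str, entity: str, new_value: Optional[str] = None) -> str:
    """Replace: substitute a supplied value, or an entity tag such as <EMAIL> by default."""
    return new_value if new_value is not None else f"<{entity}>"


def redact_value(value: str) -> str:
    """Redact: remove the value entirely."""
    return ""


def hash_value(value: str) -> str:
    """Hash: SHA-256 hex digest; identical inputs map to identical outputs."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask_value(value: str, chars_to_mask: int, masking_char: str = "*", from_end: bool = True) -> str:
    """Mask: hide `chars_to_mask` characters at the end (or the start) of the value."""
    n = max(0, min(chars_to_mask, len(value)))
    if from_end:
        return value[:len(value) - n] + masking_char * n
    return masking_char * n + value[n:]
```

For instance, `mask_value("4111111111111111", 12, from_end=False)` keeps only the last four card digits. Because identical inputs hash identically, hashing preserves linkability across records; for short, guessable values such as phone numbers, an unsalted SHA-256 can be brute-forced, so a keyed hash is the safer variant.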

70+ Countries

Country-specific matchers for national IDs, tax numbers, social security numbers, and regional data formats across 70+ countries.
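As one example of what a country-specific matcher can verify beyond the pattern itself, the Dutch citizen service number (BSN) carries an "eleven test" checksum. The Python sketch below is illustrative only: the page does not list which national-ID rules cloak.business implements, and `bsn_valid` is a hypothetical name.

```python
import re

# Weights for the BSN "elfproef" (eleven test): the last digit counts negatively.
BSN_WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2, -1)


def bsn_valid(candidate: str) -> bool:
    digits = re.sub(r"\D", "", candidate)  # tolerate separators such as dots or spaces
    if len(digits) != 9 or digits == "0" * 9:
        return False
    total = sum(int(d) * w for d, w in zip(digits, BSN_WEIGHTS))
    return total % 11 == 0
```

A nine-digit number that merely looks like a BSN (for example `123456789`) fails the check, which is how a checksum keeps false positives down for ID formats that would otherwise match any digit run.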

Deterministic Results

317 regex matchers give deterministic, reproducible results for structured data (the same input and the same ruleset version always produce the same output). NLP provides high consistency for names. Fully auditable for compliance. No model drift.

OCR-Powered Image Anonymization

Extract text from images, detect PII, and redact directly on the image. Powered by Tesseract OCR.

38 OCR Languages

Tesseract OCR extracts text from images in 38 languages, enabling PII detection on scanned documents, screenshots, and photos.

Supported Formats

JPEG, PNG, BMP, TIFF, and WebP. Upload images and receive redacted versions with PII masked or removed.

Visual Redaction

Detected PII is redacted directly on the image with configurable redaction boxes. Original text positions are preserved for accurate coverage.
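Redacting on the image comes down to mapping OCR word positions back to rectangles. A minimal Python sketch, assuming word-level OCR output shaped like the dictionary that `pytesseract.image_to_data(..., output_type=Output.DICT)` returns (parallel `text`, `left`, `top`, `width` and `height` lists) and a hypothetical `is_pii` predicate. PII spanning several words, such as full names, would need the words grouped first.

```python
from typing import Callable


def redaction_boxes(ocr: dict, is_pii: Callable[[str], bool],
                    padding: int = 2) -> list[tuple[int, int, int, int]]:
    """Return padded (x0, y0, x1, y1) rectangles for every OCR word flagged as PII."""
    boxes = []
    for i, word in enumerate(ocr["text"]):
        if not word.strip() or not is_pii(word):
            continue  # skip empty OCR entries and non-PII words
        x, y = ocr["left"][i], ocr["top"][i]
        w, h = ocr["width"][i], ocr["height"][i]
        boxes.append((max(0, x - padding), max(0, y - padding), x + w + padding, y + h + padding))
    return boxes
```

Each rectangle could then be filled on the image, for instance with Pillow's `ImageDraw.Draw(image).rectangle(box, fill="black")`. The small padding covers anti-aliased glyph edges that tight OCR boxes can miss.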

Platform Components

Web Application

Cloud-Based Processing

Full-featured web interface with Zero-Knowledge authentication. No software installation needed.

Desktop App

Windows 10+ · macOS

Documents stay on your device while cloud-powered entity detection runs. Only extracted text is sent for analysis. AES-256-GCM encryption with Argon2id key derivation. Supports PDF, DOCX, XLSX, TXT, CSV, JSON, XML.

Office Add-in

Word, Excel & PowerPoint.

Real-time PII detection directly in Microsoft Office. Anonymize without leaving your document.

MCP Server

Claude Desktop, Cursor, VS Code (Continue & Cline)

Native stdio integration for Claude Desktop. HTTP endpoints for Cursor and VS Code via the Continue and Cline extensions. 6 operators with entity groups and presets.
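Over the stdio transport, an MCP client and server exchange JSON-RPC 2.0 messages, one JSON object per line. A minimal Python sketch of the message a client would send to invoke a tool. The tool name `anonymize_text` and its argument keys are hypothetical, since the page does not list the server's tool names, and a real session starts with an `initialize` handshake that is not shown here.

```python
import json


def tools_call_message(request_id: int, tool: str, arguments: dict) -> str:
    """Build a newline-delimited JSON-RPC 2.0 'tools/call' request for an MCP stdio transport."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message) + "\n"  # json.dumps emits a single line by default


# Hypothetical tool name and arguments, for illustration only.
line = tools_call_message(1, "anonymize_text", {"text": "Email john@example.com", "language": "en"})
```

The client writes `line` to the server process's stdin and reads the matching response (same `id`) from its stdout.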

REST API

JWT Authentication

RESTful endpoints for workflow automation and CI/CD pipeline integration.
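A call to a JWT-protected REST endpoint typically sends the token as a Bearer credential. The Python sketch below only builds the request with the standard library; the base URL, the `/analyze` path and the JSON fields are placeholders, not documented cloak.business API details.

```python
import json
import urllib.request


def build_analyze_request(base_url: str, token: str, text: str,
                          language: str = "en") -> urllib.request.Request:
    """Prepare (but do not send) a POST with a JSON body and a Bearer JWT."""
    body = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/analyze",  # hypothetical path
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. In a CI/CD job, the token would normally come from the pipeline's secret store rather than from source code.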

Batch Processing

Multi-Document Upload

Multi-document upload with parallel processing. Plan limits apply per tier.
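Parallel batch processing under a per-plan cap can be sketched with a thread pool whose size stands in for the tier limit. `process_batch`, the mapping of plan limits to `max_workers`, and the per-document function are assumptions for illustration; the page does not state how limits are enforced.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


def process_batch(documents: dict[str, str], process_one: Callable[[str], T],
                  max_workers: int = 4) -> dict[str, T]:
    """Run process_one over all documents in parallel; results keep the input order."""
    names = list(documents)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(process_one, (documents[n] for n in names)))
    return dict(zip(names, results))
```

Threads suit this when the per-document work is I/O-bound, such as HTTP calls to the API; CPU-heavy local NLP would usually call for processes instead.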

Image Redaction

Tesseract OCR — 38 Languages.

Upload images, extract text via OCR, detect PII, and receive redacted images. JPEG, PNG, BMP, TIFF, WebP supported.

Industries & Applications

Enterprise

GDPR compliance at scale. Centralized PII detection across departments with role-based access control and batch processing.

Developers

REST API and MCP Server for CI/CD pipelines. Safe test dataset generation without live PII exposure.

Legal

Contract anonymization and e-discovery. Redact sensitive information from court filings and discovery documents with audit trails.

Healthcare

Patient data workflows aligned with HIPAA PHI handling. Medical record anonymization for research and administrative documents.

Financial

PCI-DSS-aligned PAN/PII redaction for fraud-prevention workflows. Transaction and client data protection with compliance reporting.

Research

Anonymize research datasets for publication. Remove personal identifiers while preserving data utility for analysis.

Government

Public records and FOIA compliance. Automated redaction for FOIA requests and inter-agency data sharing.

Zero-Knowledge Architecture

Password Never Leaves Device

True Zero-Knowledge authentication. Your master password is used locally to derive encryption keys. The server never sees your password.

Argon2id + XChaCha20-Poly1305

Argon2id (64 MB memory, 3 iterations) for memory-hard key derivation; XChaCha20-Poly1305 for authenticated encryption.
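The zero-knowledge idea is that the password is stretched into a key on the device, and only values derived from that key ever leave it. Python's standard library has no Argon2id, so the sketch below swaps in `hashlib.scrypt`, a different memory-hard KDF, as a stand-in (about 16 MiB with these parameters, versus the 64 MB Argon2id setting above). Splitting the master key into an encryption key and a login verifier with HMAC is an assumed design for illustration, not the platform's documented protocol; the XChaCha20-Poly1305 encryption step is not shown.

```python
import hashlib
import hmac
import os


def new_salt() -> bytes:
    """Random per-account salt; it is not secret and can be stored server-side."""
    return os.urandom(16)


def derive_master_key(password: str, salt: bytes) -> bytes:
    # scrypt used as a stand-in for Argon2id: n=2**14, r=8, p=1 needs roughly 16 MiB of memory.
    return hashlib.scrypt(password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32)


def split_keys(master_key: bytes) -> tuple[bytes, bytes]:
    """Derive independent subkeys: the encryption key stays on the device,
    the login verifier may be sent to the server (which should hash it again before storing)."""
    encryption_key = hmac.new(master_key, b"encryption", hashlib.sha256).digest()
    login_verifier = hmac.new(master_key, b"authentication", hashlib.sha256).digest()
    return encryption_key, login_verifier
```

Because the two subkeys come from separate HMAC labels, knowing the login verifier does not reveal the encryption key, which is what lets the server authenticate a user without being able to decrypt their data.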

24-Word Recovery Phrase

BIP-39 compatible recovery phrase for account recovery. No password reset emails — you control your keys.

ISO 27001-Aligned

Hosted in Hetzner data centers in Germany. Designed for GDPR with EU data residency. AES-256-GCM encryption. TLS 1.2+.

Transparent Token System

Each operation costs tokens based on text length, entities detected, and operations applied.

Free

€0/month

200 tokens per cycle.

  • Online account.
  • Analyzer & Anonymizer.
  • 48 languages.
  • 317 regex matchers.
  • Image redaction.
  • No credit card needed.

Pro Best Value

€15/month

4,000 tokens per cycle.

  • All Basic features.
  • MCP Server access.
  • All file types supported.
  • Unlimited uploads.
  • Token top-ups available.

Business Enterprise

€29/month

10,000 tokens per cycle.

  • All Pro features.
  • Priority support.
  • Custom integrations.
  • Extended history.
  • Token top-ups available.

Three platforms, different strengths. All built for GDPR compliance on German servers.

| Feature | cloak.business | anonym.legal | anonymize.today |
|---|---|---|---|
| Focus | Enterprise & Developers | Legal & Privacy-First | Simple & Transparent |
| Entity Types | 320+ | 285+ | 258 |
| Languages | 48 | 48 + RTL | 27 |
| Regex Matchers | 317 | n/a | n/a |
| Image Redaction | Yes (38 OCR languages) | No | No |
| MCP Server | Yes | Yes | No |
| Zero-Knowledge Auth | Yes | Yes | No |
| Desktop App | Windows | Win/Mac/Linux | Win/Mac/Linux |
| Office Add-in | Word/Excel/PPT | Word/Excel/PPT | Word/Excel/PPT |
| Free Tokens | 200/cycle | 200/cycle | 300/month |
| Technology | Microsoft Presidio | Presidio-based | Regex-based |

Try cloak.business

Start with 200 free tokens per cycle. No credit card needed. Zero-Knowledge from day one.

Related Platforms: anonym.legal (Zero-Knowledge PII anonymization with MCP Server)  |  anonymize.today (simple, transparent PII detection)

Best fit and known limitations

Best for

Ops and SecOps teams that need image redaction with OCR alongside text PII detection, plus an MCP server for AI tooling, hosted on ISO 27001-aligned German servers (Hetzner, Falkenstein).

Not the right fit

Browser-only or developer-IDE workflows (use anonymize.live or anonymize.dev); purely consumer use without compliance overhead (anonym.today fits better).

Known limitations

OCR accuracy depends on image quality and language; the free tier is capped at 200 tokens per cycle; the 320+ entity catalogue is enterprise-tier and may exceed the needs of simple use cases.

Need enterprise-grade PII detection?

Let's discuss how cloak.business can support your compliance, image redaction, and API integration needs.