Can I train ML models on patient data without breaking GDPR Article 9?

Yes — GDPR Article 9(2)(j) permits processing of health data for scientific research where data is pseudonymised and the controller applies appropriate safeguards; pair anonym.life or anonymize.solutions for the pseudonymisation layer with a documented DPIA and the technical-organisational measures listed in your processing record.


Healthcare PII Anonymization — HIPAA-Aligned, GDPR-Compliant

Q: What HIPAA PHI identifiers does anonymize.solutions detect?

anonymize.solutions ships a HIPAA-aligned preset covering the 18 PHI identifiers defined under HIPAA Safe Harbor (45 CFR 164.514): names, geographic subdivisions below state, all dates except year, phone numbers, fax numbers, email addresses, SSN, medical record numbers, health plan beneficiary numbers, account numbers, certificate/license numbers, vehicle identifiers, device identifiers, web URLs, IP addresses, biometric identifiers, full-face photographs, and any other unique identifying number.

Q: How does anonym.life keep clinical data flowing between systems without exposing patients?

anonym.life sits between data providers (hospitals, labs, payers) and downstream service providers (analytics, billing, e-prescription, telemedicine) as a privacy middleware; sensitive identifiers are replaced with reversible tokens at the boundary, downstream systems only see pseudonyms, and selective disclosure releases the original values only to authorised endpoints holding the matching capability.

Q: What deployment models does a German Krankenhaus or Praxis need?

A German Krankenhaus typically needs Self-Managed (on-premises, no cloud egress); a Praxis or MVZ can use Managed Private (single-tenant SaaS in EU); both need EU data residency, ISO 27001-aligned hosting (Hetzner), and either anonym.life's BYOK + MPC threshold custody (Krankenhaus scale) or anonymize.solutions' Enterprise tier with Zero-Knowledge auth (Praxis scale).

Q: How does curta.solutions support healthcare engagements?

curta.solutions runs healthcare engagements as scoped Implementation engagements: discovery of in-scope systems, DPIA support, anonymisation layer architecture and deployment, integration with HIS/EHR/PVS, audit-ready handover documentation; the localBrain engagement is available where forensic-grade evidence chains are needed (e.g. complaints handling, fraud investigations).

By George Curta · Founder, curta.solutions · Updated May 24, 2026

Healthcare data falls under the special categories that GDPR Article 9 protects most strictly. It is also the entire scope of HIPAA in the US. The curta.solutions product family covers three distinct healthcare patterns: privacy middleware that sits between clinical systems, enterprise PII anonymisation that ships with healthcare presets, and ML on protected data using reversible pseudonymisation. This page lays out which tool fits which pattern.

PHI Detection

What HIPAA PHI identifiers does anonymize.solutions detect?

anonymize.solutions ships a HIPAA-aligned preset covering the 18 PHI identifiers defined under HIPAA Safe Harbor (45 CFR 164.514). The list includes names, geographic subdivisions below state, all dates except year, phone numbers, fax numbers, email addresses, and SSN. It also covers medical record numbers, health plan beneficiary numbers, account numbers, certificate/license numbers, vehicle identifiers, device identifiers, web URLs, and IP addresses. The final categories are biometric identifiers, full-face photographs, and any other unique identifying number.

The Safe Harbor list is the deterministic floor. If all 18 identifier categories are removed or transformed and the covered entity has no actual knowledge of residual re-identification risk, the data is considered de-identified for HIPAA purposes. anonymize.solutions implements each category as a named recognizer that can be enabled, disabled, or tuned per workflow.
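To make the idea of named, switchable recognizers concrete, here is a minimal Python sketch. It is not the anonymize.solutions API: the `Recognizer` class, the `RECOGNIZERS` registry, and the `redact` function are hypothetical names, and the three regular expressions are deliberately simplistic stand-ins for three of the 18 categories.

```python
import re
from dataclasses import dataclass


@dataclass
class Recognizer:
    name: str            # illustrative label for a Safe Harbor category
    pattern: re.Pattern  # simplistic regex, for demonstration only
    enabled: bool = True


# Hypothetical registry; a real preset would cover all 18 categories.
RECOGNIZERS = {
    "ssn": Recognizer("ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    "email": Recognizer("email", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    "ip_address": Recognizer("ip_address", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
}


def redact(text: str, recognizers=RECOGNIZERS) -> str:
    """Replace every match of each enabled recognizer with a category tag."""
    for rec in recognizers.values():
        if rec.enabled:
            text = rec.pattern.sub(f"[{rec.name.upper()}]", text)
    return text


sample = "Contact jane.doe@example.org, SSN 123-45-6789, from 10.0.0.7."
print(redact(sample))
# Contact [EMAIL], SSN [SSN], from [IP_ADDRESS].
```

Setting `RECOGNIZERS["ssn"].enabled = False` leaves SSN-shaped strings untouched, which is the per-workflow tuning described above in its simplest form.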

The Expert Determination pathway (45 CFR 164.514(b)(1)) is the alternative HIPAA route. A qualified statistician certifies that the risk of re-identification is very small. anonymize.solutions outputs a per-document risk metric (k-anonymity bucket size, distinct-value counts on quasi-identifiers). This feeds directly into Expert Determination evidence packs.
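The two risk metrics named above can be illustrated with a short Python sketch. The function names, the field names (`birth_year`, `zip3`, `sex`), and the sample records are assumptions made for the example; the actual output format of anonymize.solutions is not described here.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return (k, class_sizes), where k is the smallest equivalence-class size."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    sizes = Counter(keys)
    return min(sizes.values()), sizes


def distinct_counts(records, quasi_identifiers):
    """Count the distinct values observed for each quasi-identifier."""
    return {q: len({r[q] for r in records}) for q in quasi_identifiers}


# Made-up records with three quasi-identifiers.
records = [
    {"birth_year": 1980, "zip3": "941", "sex": "F"},
    {"birth_year": 1980, "zip3": "941", "sex": "F"},
    {"birth_year": 1975, "zip3": "100", "sex": "M"},
    {"birth_year": 1975, "zip3": "100", "sex": "M"},
    {"birth_year": 1975, "zip3": "100", "sex": "M"},
]
quasi = ["birth_year", "zip3", "sex"]

k, sizes = k_anonymity(records, quasi)
print(k)                                # 2
print(distinct_counts(records, quasi))  # {'birth_year': 2, 'zip3': 2, 'sex': 2}
```

Here every combination of quasi-identifier values is shared by at least two records, so the dataset is 2-anonymous. Whether that is "very small" risk remains the statistician's judgement.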

For GDPR Article 9 special category data, the same recognizers run with EU-specific extensions. These include the German Krankenversichertennummer, the French numéro de sécurité sociale, and the Italian codice fiscale. ICD-10 and ATC pharmaceutical codes also act as indirect health identifiers under Recital 35.

The tool supports HIPAA workflows; the covered entity remains the accountable party for the determination, the documentation, and the residual-knowledge attestation.

Middleware Pattern

How does anonym.life keep clinical data flowing between systems without exposing patients?

anonym.life sits between data providers (hospitals, labs, payers) and downstream service providers (analytics, billing, e-prescription, telemedicine) as a privacy middleware. Sensitive identifiers are replaced with reversible tokens at the boundary. Downstream systems only see pseudonyms. Selective disclosure releases the original values only to authorised endpoints holding the matching capability.

The technical pattern is straightforward. An HL7 FHIR or HL7 v2 message enters the middleware. The middleware tokenises identifier fields (patient.name, patient.identifier, address, telecom) with format-preserving encoding and forwards the message to the downstream system. On the return path, it de-tokenises any pseudonyms in the response, but only if the requesting role holds the matching capability.
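The flow above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not anonym.life's implementation: `TokenVault`, `pseudonymise_patient`, and the `"reidentify"` capability name are hypothetical, the tokens are HMAC-derived lookups rather than format-preserving, and the `address` field is left out for brevity.

```python
import hashlib
import hmac


class TokenVault:
    """Hypothetical in-memory vault mapping tokens back to original values."""

    def __init__(self, secret: bytes):
        self._secret = secret
        self._reverse = {}  # token -> original value

    def tokenise(self, value: str) -> str:
        digest = hmac.new(self._secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
        token = "tok_" + digest[:16]
        self._reverse[token] = value
        return token

    def detokenise(self, token: str, capabilities: set) -> str:
        # Callers without the capability only ever see the pseudonym.
        if "reidentify" not in capabilities:
            return token
        return self._reverse.get(token, token)


def pseudonymise_patient(patient: dict, vault: TokenVault) -> dict:
    """Return a copy of a FHIR-style Patient dict with identifier fields tokenised."""
    out = dict(patient)
    out["name"] = [
        {**n, "family": vault.tokenise(n["family"]),
         "given": [vault.tokenise(g) for g in n.get("given", [])]}
        for n in patient.get("name", [])
    ]
    for field in ("identifier", "telecom"):
        out[field] = [{**item, "value": vault.tokenise(item["value"])}
                      for item in patient.get(field, [])]
    return out


vault = TokenVault(secret=b"demo-only-secret")
patient = {
    "resourceType": "Patient",
    "name": [{"family": "Muster", "given": ["Erika"]}],
    "identifier": [{"system": "urn:example:mrn", "value": "MRN-0042"}],
    "telecom": [{"system": "phone", "value": "+49 30 1234567"}],
}
safe = pseudonymise_patient(patient, vault)
token = safe["identifier"][0]["value"]
print(vault.detokenise(token, {"billing"}))     # prints the token unchanged
print(vault.detokenise(token, {"reidentify"}))  # prints MRN-0042
```

Deterministic tokens keep joins across messages working downstream, at the cost of revealing that two records refer to the same value.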

Key custody is the load-bearing design choice. anonym.life uses Bring-Your-Own-Key with MPC threshold custody. The re-identification key is split across multiple trustees (e.g. clinical director, DPO, hosting provider). Re-identification needs a quorum of trustees to assemble the key. No single party, including curta.solutions, can unilaterally reverse the pseudonymisation.
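The quorum idea can be illustrated with Shamir secret sharing, used here as a simple stand-in. The actual MPC protocol behind anonym.life is not described in this page, so treat the scheme, the function names, and the 2-of-3 split below as assumptions for the sake of the example.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the field must be larger than the secret


def split_secret(secret: int, threshold: int, shares: int):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, modulo PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = 123456789  # placeholder for a re-identification key
director, dpo, host = split_secret(key, threshold=2, shares=3)

print(recover_secret([director, dpo]) == key)  # True
print(recover_secret([dpo, host]) == key)      # True
```

With a threshold of two, any single share on its own gives no information about the key, which mirrors the "no single party" property described above.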

The audit trail captures every tokenisation and re-identification event. Each record includes the requesting role, the legal basis cited, and the timestamp. The log is append-only and signed, suitable as evidence in DPIA reviews and supervisory authority audits.
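One common way to get an append-only, signed log is a hash chain in which each entry's signature covers the previous entry's signature. The Python sketch below shows that general technique only; the `AuditLog` class, its field names, and the use of HMAC (a symmetric stand-in for a real digital signature) are assumptions, not the product's design.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


class AuditLog:
    """Hypothetical hash-chained log; each entry signs its predecessor's signature."""

    def __init__(self, signing_key: bytes):
        self._key = signing_key
        self._entries = []

    def _sign(self, entry: dict) -> str:
        body = {k: v for k, v in entry.items() if k != "signature"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        return hmac.new(self._key, payload, hashlib.sha256).hexdigest()

    def append(self, event: str, role: str, legal_basis: str) -> dict:
        entry = {
            "event": event,
            "role": role,
            "legal_basis": legal_basis,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prev": self._entries[-1]["signature"] if self._entries else "genesis",
        }
        entry["signature"] = self._sign(entry)
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Check every signature and every link back to the previous entry."""
        prev = "genesis"
        for entry in self._entries:
            if entry["prev"] != prev:
                return False
            if not hmac.compare_digest(entry["signature"], self._sign(entry)):
                return False
            prev = entry["signature"]
        return True


log = AuditLog(signing_key=b"demo-only-key")
log.append("tokenise", role="lab-gateway", legal_basis="GDPR Art. 9(2)(h)")
log.append("reidentify", role="treating-physician", legal_basis="GDPR Art. 9(2)(h)")
print(log.verify())  # True
```

Editing or removing any earlier entry breaks either its own signature or the `prev` link of the entry after it, so `verify()` returns `False`.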

ML on Protected Data

Can I train ML models on patient data without breaking GDPR Article 9?

Yes. GDPR Article 9(2)(j) permits processing of health data for scientific research where the data is pseudonymised and the controller applies appropriate safeguards. Pair anonym.life or anonymize.solutions for the pseudonymisation layer with a documented DPIA and the technical-organisational measures listed in your processing record.

The Article 9(2)(j) gate is narrow. It requires Union or Member State law as the legal basis, proportionality to the aim pursued, respect for the essence of the right to data protection, and suitable and specific measures to safeguard fundamental rights. In Germany, the relevant national law is BDSG §27, which expands the basis for scientific research.

The practical training pipeline runs in stages. The pseudonymisation layer transforms clinical data at the source-system boundary. The pseudonymised dataset moves to the training environment. Model training happens on pseudonyms only. The team evaluates model outputs for memorisation (extraction attacks against the final weights). The model ships without the re-identification key ever reaching the inference environment.

Differential privacy is the additional layer for cases where the model itself could leak training-set membership. For deep models on small cohorts (fewer than roughly 10,000 patients) the recommendation is DP-SGD with a documented epsilon budget. For classical ML on larger cohorts, k-anonymity buckets on quasi-identifiers are usually sufficient.
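For readers unfamiliar with DP-SGD, the core of one update step is per-example gradient clipping followed by Gaussian noise. The pure-Python sketch below shows only that step; the function name and parameters are illustrative, and it does not compute the epsilon actually spent, which in practice needs a privacy accountant from a maintained library.

```python
import math
import random


def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng=random):
    """Clip each example's gradient to `clip_norm`, sum, add Gaussian noise, average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(x * x for x in grad))
        # Scale down only gradients whose L2 norm exceeds the clipping bound.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, x in enumerate(grad):
            total[i] += x * scale
    sigma = noise_multiplier * clip_norm  # noise is calibrated to the clipping bound
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


# Made-up gradients for two patients and a two-parameter model.
grads = [[3.0, 4.0], [0.3, 0.4]]

# With the noise switched off, only the clipping is visible.
print(dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=0.0))
# approximately [0.45, 0.6]

# A noisy step, seeded here only to make the demo repeatable.
print(dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.1, rng=random.Random(0)))
```

Clipping bounds how much any single patient can move the model, and the noise hides what remains. The clipping norm, noise multiplier, sampling rate, and number of steps together determine the epsilon that goes into the documented budget.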

The DPIA documents the data flow, the pseudonymisation mechanism, the key custody arrangement, the re-identification rules, and the residual risk assessment. It is the controller's responsibility. It is also the supervisory authority's reference document during an audit.

Deployment Fit

What deployment models does a German Krankenhaus or Praxis need?

A German Krankenhaus typically needs Self-Managed (on-premises, no cloud egress). A Praxis or MVZ can use Managed Private (single-tenant SaaS in EU). Both need EU data residency and ISO 27001-aligned hosting (Hetzner). The choice is either anonym.life's BYOK + MPC threshold custody (Krankenhaus scale) or anonymize.solutions' Enterprise tier with Zero-Knowledge auth (Praxis scale).

The Krankenhaus rationale has three parts. Hospital information systems (HIS) are typically segmented from the public internet by policy. The §75c SGB V requirement for state-of-the-art IT security in hospitals, together with the KRITIS obligations that apply above 30,000 inpatient cases per year, drives explicit on-premises deployment. The integration surface (KIS, RIS, PACS, LIS) is large enough to justify dedicated infrastructure.

The Praxis or MVZ rationale is different. The operational footprint is smaller. Most practices already run the PVS (Praxisverwaltungssystem) in the cloud. The cost of running on-premises infrastructure is disproportionate to throughput. The gematik TI (Telematikinfrastruktur) connector handles the regulated transport layer.

Hosting choice matters for both. Hetzner data centres in Germany are the typical answer for EU data residency with ISO 27001:2022 certification. AWS Frankfurt and Azure Germany West Central are acceptable where the client accepts US CLOUD Act exposure with appropriate SCC and TIA documentation.

Engagement

How does curta.solutions support healthcare engagements?

curta.solutions runs healthcare engagements as scoped Implementation engagements. The scope covers discovery of in-scope systems, DPIA support, anonymisation layer architecture and deployment, integration with HIS/EHR/PVS, and audit-ready handover documentation. The localBrain engagement is available where forensic-grade evidence chains are needed (e.g. complaints handling, fraud investigations).

A typical Krankenhaus engagement runs 8 to 16 weeks. The first phase is 2 weeks of discovery (system inventory, data-flow mapping, DPIA scoping). The second phase is 4 to 8 weeks of architecture and deployment (anonymisation middleware, key custody, integration with KIS/LIS/RIS). The third phase is 2 to 4 weeks of handover (runbooks, on-call procedures, audit packs).

A Praxis or MVZ engagement is shorter (4 to 8 weeks). Tasks include tenant provisioning on Managed Private and recognizer tuning for the practice's specific PVS. The engagement also covers anonymisation policy setup, staff training on AI literacy under Article 4 of the EU AI Act where AI tools are in use, and the DPIA documentation pack.

The localBrain pattern is for cases where every re-identification event needs to be evidence-grade. Outputs include a forensic chain-of-custody log, signed timestamps, and a clear separation between investigators (who request re-identification) and trustees (who approve and execute it). This is typical for complaints handling, fraud investigations, and litigation discovery.

curta.solutions does not provide medical, legal, or compliance opinions. Those remain with the covered entity, its DPO, and its legal counsel. The outputs support the covered entity's HIPAA and GDPR obligations. They do not transfer those obligations.

Trade-offs

Best fit and known limitations

Best fit

Krankenhäuser, MVZs, Arztpraxen, payers, and clinical research organisations that need to move structured PHI between systems with reversible pseudonymisation and audit-ready evidence.

Less suitable

Pure unstructured free text without any structured identifier fields. The toolkit works best where HL7, FHIR, DICOM, or structured CSV is the transport.

Known limitations

Designed for HIPAA workflows; does not certify covered entities as HIPAA-compliant. Statutory interpretation of HIPAA, GDPR Article 9, BDSG §27, and §75c SGB V remains with the covered entity and its counsel.

Scoping a healthcare engagement?

Book a call to walk through your in-scope systems, PHI inventory, deployment fit, and a fixed-price discovery sprint.
