Data handling + security

SkillOx is a security tool. We hold ourselves to the same standard we publicly grade other people's skills against — minimal collection, explicit retention, no quiet harvesting. This page is the canonical answer to "what happens to the SKILL.md content I send you?"

tl;dr

Anonymous scans auto-delete after 30 days. Until then, the scanned content lives in Postgres so the Report Card URL is still viewable.
Creator-claimed scans persist indefinitely so the public catalog stays browseable. Creators can remove their own listings any time from the dashboard.
We don't store IPs. We store an HMAC-SHA-256 hash of each IP, keyed with a server secret: enough for rate limiting, not enough to identify a person.
We never look at private code. The scanner is local-first; the CLI runs entirely offline by default. The hosted scanner only sees URLs you explicitly POST + content you explicitly paste.
The scanner engine is Apache-2.0 (git.skillox.io/skillox/skillox). You can self-host the whole stack and never talk to api.skillox.io.

What we store

scans

Every scan goes into a Postgres scans table. Columns:

id — cuid2-prefixed scan ID
url — the canonical URL submitted (or file://path for bulk-file submissions)
source_repo, skill_name, skill_version — parsed from the frontmatter
status — pending | running | completed | failed
grade, score, findings (jsonb) — the scanner output
ip_hash — HMAC-SHA-256 of (client IP, server salt). Stored as 64 hex chars; not reversible to the IP
user_agent_hash — same construction over the UA string
created_at, completed_at — timestamps

We do not store: raw IP, raw User-Agent, geolocation, browser fingerprint, referrer (beyond OG meta on the Report Card), cookies, session tokens (for unauthenticated callers).
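As a rough illustration of the ip_hash and user_agent_hash columns, here is a minimal sketch of an HMAC-SHA-256 over a client identifier using Python's standard library. The function name, the example secret, and the choice to use the server secret as the HMAC key are assumptions made for illustration; the actual construction in the SkillOx codebase may differ.

```python
import hashlib
import hmac


def hash_identifier(value: str, server_secret: bytes) -> str:
    """Return the hex HMAC-SHA-256 of a client identifier (IP or User-Agent)."""
    digest = hmac.new(server_secret, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()  # 32-byte digest rendered as 64 hex characters


# Hypothetical secret for illustration only; a real deployment would load it
# from server-side configuration and never commit it to source control.
SECRET = b"example-server-secret"

ip_hash = hash_identifier("203.0.113.7", SECRET)
ua_hash = hash_identifier("Mozilla/5.0 (X11; Linux x86_64)", SECRET)
```

The same input and secret always produce the same hash, which is what makes the value usable as a rate-limit key, while anyone without the secret cannot recompute the hash from a guessed IP.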

skills

The canonical catalog table. One row per unique SKILL.md URL we've scanned, with the latest grade + repo metadata (stars, license, archived, description, topics, owner type). Populated by the crawler + by every completed scan via the worker's upsert.

This table has no IP / UA / user data — it's about the skill, not the requester. Anonymous removal is a soft-delete flag (removed = true) so the crawler skips it on re-discovery.
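The soft-delete behaviour can be sketched as follows. This is an in-memory model, not the real worker: SkillRow, upsert_from_crawler, and remove_listing are hypothetical names, and the only thing taken from the description above is that a row with removed = true is skipped when the crawler re-discovers its URL.

```python
from dataclasses import dataclass


@dataclass
class SkillRow:
    url: str
    grade: str
    removed: bool = False  # soft-delete flag; the row itself is kept


def upsert_from_crawler(catalog: dict, url: str, grade: str) -> bool:
    """Insert or refresh a catalog row; return False if the URL was skipped."""
    existing = catalog.get(url)
    if existing is not None and existing.removed:
        return False  # soft-deleted: the crawler leaves the row untouched
    catalog[url] = SkillRow(url=url, grade=grade)
    return True


def remove_listing(catalog: dict, url: str) -> None:
    """Mark a listing as removed without deleting the row."""
    row = catalog.get(url)
    if row is not None:
        row.removed = True
```

Keeping the row rather than deleting it is what lets the crawler recognise the URL later; a hard delete would make the skill look new on the next crawl.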

users + creators

For signed-in users (GitHub OAuth): users stores email, display name, image URL, and admin flags. creators stores the creator profile linked to that user (slug, display name, bio, verification level, subscription tier).

OAuth tokens (accounts.access_token) are encrypted at rest by the disk encryption underneath Postgres and used only to fetch the GitHub username for creator linking. They're never used to read your private repos.

Rate-limit state (Redis)

Redis keys of the form rl:{ip_hash}:{bucket} with a 24-hour TTL. Pure counters, no PII.
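A minimal sketch of how a counter keyed as rl:{ip_hash}:{bucket} with a 24-hour TTL could work. To keep the example runnable without a Redis server, FakeRedis is a tiny in-memory stand-in that mimics the increment-then-expire pattern; the class, the allow_request helper, the meaning of bucket, and the limit value are all assumptions, not the production implementation.

```python
TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL, as described above


class FakeRedis:
    """In-memory stand-in for the two operations used here (incr, expire)."""

    def __init__(self):
        self._data = {}  # key -> (count, expires_at or None)

    def incr(self, key, now):
        count, expires_at = self._data.get(key, (0, None))
        if expires_at is not None and now >= expires_at:
            count, expires_at = 0, None  # key has expired; start over
        self._data[key] = (count + 1, expires_at)
        return count + 1

    def expire(self, key, ttl_seconds, now):
        count, _ = self._data[key]
        self._data[key] = (count, now + ttl_seconds)


def allow_request(store, ip_hash, bucket, limit, now):
    """Count one request and report whether it is within the limit."""
    key = f"rl:{ip_hash}:{bucket}"
    count = store.incr(key, now)
    if count == 1:
        # First hit for this key: start the 24-hour expiry clock.
        store.expire(key, TTL_SECONDS, now)
    return count <= limit
```

The stored value is only an integer under a hashed key, which matches the "pure counters, no PII" description: nothing in the key or value contains the raw IP.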

Retention

Anonymous scans — A nightly job deletes scans where ip_hash != 'crawler' AND created_at < now() - interval '30 days' AND there's no creator submission pointing at them. 30-day window aligns with typical security-incident review timelines.
Crawler-discovered scans — kept indefinitely. They're the catalog backbone; removing them removes the public Report Card.
Creator-claimed scans — kept as long as the creator wants. Removing the listing soft-deletes the skills row but keeps the scan history (audit trail for the creator).
Audit logs — append-only, retained 7 years (compliance retention floor).
Rate-limit counters — 24-hour Redis TTL, then automatically reaped.
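The anonymous-scan rule above can be sketched as a single DELETE statement. The example below uses an in-memory SQLite database so it runs anywhere; production uses Postgres, so the real query syntax will differ. The creator_submissions table, its scan_id column, and the reduced scans schema are hypothetical stand-ins for "a creator submission pointing at" a scan.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)

# Reduced schema for illustration; created_at holds ISO-8601 UTC strings,
# which compare chronologically when they share one format.
SCHEMA = """
CREATE TABLE scans (id TEXT PRIMARY KEY, ip_hash TEXT, created_at TEXT);
CREATE TABLE creator_submissions (scan_id TEXT);  -- hypothetical table
"""


def purge_anonymous_scans(conn: sqlite3.Connection, now: datetime) -> int:
    """Delete anonymous scans older than 30 days; return the rows removed."""
    cutoff = (now - RETENTION).isoformat(timespec="seconds")
    cur = conn.execute(
        """
        DELETE FROM scans
        WHERE ip_hash != 'crawler'
          AND created_at < ?
          AND NOT EXISTS (
              SELECT 1 FROM creator_submissions cs
              WHERE cs.scan_id = scans.id
          )
        """,
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount
```

Crawler-discovered scans and scans referenced by a creator submission both fall outside the WHERE clause, so they survive the nightly run regardless of age.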

Where it lives

Postgres 17 on a Hetzner box in the EU (Helsinki, Finland). Disk encrypted at rest. Daily backups to Backblaze B2 EU; backups are encrypted client-side before upload.
Redis 7 on the same box for rate limits + the scan queue. Not persisted to disk for sensitive paths; ephemeral by design.
Cloudflare in front as TLS edge + Turnstile abuse protection + CDN. No request bodies are logged on the edge; only standard Cloudflare access logs (timestamp, country, user-agent class) with their default 7-day retention.

Nothing lives in the US. Nothing lives on AWS. No third-party analytics, no Google fonts (we self-host Inter + JetBrains Mono), no Segment / Mixpanel / Amplitude / FullStory / Sentry-with-replay.

What we never collect

Browser cookies (other than the Auth.js session cookie for signed-in users)
Local storage data beyond a theme preference
Mouse-move / click heatmap / session replay
Device fingerprints (canvas, WebGL, AudioContext, etc.)
Third-party tracking pixels of any kind
Your private repos (the GitHub OAuth scope is read:user + user:email only)

Self-hosting

The full stack is open-source under git.skillox.io/skillox/skillox (Apache-2.0). If you don't want to send SKILL.md content to api.skillox.io, run your own instance behind your own VPC. The CLI (npm i -g skillox) works entirely offline by default; passing --api-base points it at your own scanner.

We're an EU-based company (Atomira Technologies S.L., Barcelona). Personal data handling falls under GDPR; the catalog of skills + scan results is non-personal data (public artifacts about public code).

Data Subject Access Request: privacy@skillox.io
Right to erasure: account deletion from /account removes user + creator + linked accounts + sessions
Data Processing Agreement available on request for Team + Enterprise customers
EU AI Act compliance roadmap: /docs/concepts/aibom

Reporting a vulnerability

Coordinated disclosure: see /docs/disclose. TL;DR — email security@skillox.io with reproducer; we respond within 24 h, fix critical issues within 7 days, credit in the changelog.

Something here surprised you, or you want a specific data-handling commitment we don't make? Open an issue at git.skillox.io or email privacy@skillox.io. The line between "reasonable defaults" and "privacy maximalism" is a conversation, not a fixed point.