Data handling + security
SkillOx is a security tool. We hold ourselves to the same standard we publicly grade other people's skills against — minimal collection, explicit retention, no quiet harvesting. This page is the canonical answer to "what happens to the SKILL.md content I send you?"
tl;dr
- Anonymous scans auto-delete after 30 days. Until then, the scanned content lives in Postgres so the Report Card URL is still viewable.
- Creator-claimed scans persist indefinitely so the public catalog stays browseable. Creators can remove their own listings any time from the dashboard.
- We don't store IPs. We store HMAC-SHA-256 hashes of the IP, keyed with a server secret: enough for rate limiting, not enough to identify a person.
- We never look at private code. The scanner is local-first; the CLI runs entirely offline. The hosted scanner only sees URLs you explicitly POST + content you explicitly paste.
- The scanner engine is Apache-2.0 (git.skillox.io/skillox/skillox). You can self-host the whole stack and never talk to api.skillox.io.
What we store
scans
Every scan goes into a Postgres scans table. Columns:
- id: cuid2-prefixed scan ID
- url: the canonical URL submitted (or file://path for bulk-file submissions)
- source_repo, skill_name, skill_version: parsed from the frontmatter
- status: pending | running | completed | failed
- grade, score, findings (jsonb): the scanner output
- ip_hash: HMAC-SHA-256 of (client IP, server salt). Stored as 64 hex chars; not reversible to the IP
- user_agent_hash: the same construction over the UA string
- created_at, completed_at: timestamps
We do not store: raw IP, raw User-Agent, geolocation, browser fingerprint, referrer (beyond OG meta on the Report Card), cookies, session tokens (for unauthenticated callers).
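Here is a minimal sketch of the ip_hash / user_agent_hash construction. It assumes Node's built-in crypto module and a secret that lives only on the server. The helper name hashIdentifier and the secret value are hypothetical. The keyed HMAC is what makes the hash non-reversible in practice. The IPv4 space is small enough that an unkeyed SHA-256 of an address could be brute-forced, but without the secret nobody can recompute the HMAC. The same input and secret always produce the same digest, which is what lets rate limiting group requests from one address.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical helper name; how SkillOx loads its server secret is not documented here.
function hashIdentifier(value: string, serverSecret: string): string {
  // HMAC-SHA-256 keyed with the server secret; the hex digest is always 64 characters.
  return createHmac("sha256", serverSecret).update(value, "utf8").digest("hex");
}

// Placeholder secret for the example only; a real deployment would load it from configuration.
const SERVER_SECRET = "example-secret";

const ipHash = hashIdentifier("203.0.113.7", SERVER_SECRET);
const userAgentHash = hashIdentifier("Mozilla/5.0 (X11; Linux x86_64)", SERVER_SECRET);

console.log(ipHash.length, userAgentHash.length); // 64 64
```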
skills
The canonical catalog table. One row per unique SKILL.md URL we've scanned, with the latest grade + repo metadata (stars, license, archived, description, topics, owner type). Populated by the crawler + by every completed scan via the worker's upsert.
This table has no IP / UA / user data — it's about the skill, not the requester. Anonymous removal is a soft-delete flag (removed = true) so the crawler skips it on re-discovery.
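The interaction between the worker's upsert, soft deletion, and the crawler can be sketched as an in-memory model. Only url, grade, and the removed flag are modeled. The class and method names are hypothetical. One behavior is an assumption this page does not state: an existing removed flag survives a later upsert, so a re-scan does not bring a removed listing back.

```typescript
// Hypothetical in-memory model of the skills table. Only url, grade and the
// removed flag are modeled; the real table also carries repo metadata.
interface SkillRow {
  url: string;
  grade: string;
  removed: boolean;
}

class SkillCatalog {
  private rows = new Map<string, SkillRow>();

  // Worker path: a completed scan upserts the latest grade for its URL.
  // Assumption: an existing removed flag is preserved, so a re-scan does not
  // resurrect a soft-deleted listing.
  upsertFromScan(url: string, grade: string): void {
    const existing = this.rows.get(url);
    this.rows.set(url, { url, grade, removed: existing?.removed ?? false });
  }

  // Removal is a soft delete: the row stays so later re-discovery can see it.
  remove(url: string): void {
    const row = this.rows.get(url);
    if (row) row.removed = true;
  }

  // Crawler path: skip any URL whose row carries removed = true.
  shouldCrawl(url: string): boolean {
    return this.rows.get(url)?.removed !== true;
  }

  get(url: string): SkillRow | undefined {
    return this.rows.get(url);
  }
}

// Usage: scan, remove, then check what the crawler would do.
const catalog = new SkillCatalog();
const skillUrl = "https://github.com/example/skill/blob/main/SKILL.md";
catalog.upsertFromScan(skillUrl, "B");
catalog.remove(skillUrl);
console.log(catalog.shouldCrawl(skillUrl)); // false
```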
users + creators
For signed-in users (GitHub OAuth): users stores email, display name, image URL, and admin flags. creators stores the creator profile linked to that user (slug, display name, bio, verification level, subscription tier).
OAuth tokens (accounts.access_token) are encrypted at rest by the disk encryption underlying the Postgres volume and are used only to fetch the GitHub username for creator linking. They're never used to read your private repos.
Rate-limit state (Redis)
Redis keys of the form rl:{ip_hash}:{bucket} with a 24-hour TTL. Pure counters, no PII.
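This is a sketch of how counters keyed as rl:{ip_hash}:{bucket} can enforce a limit with a 24-hour TTL. To stay self-contained, it uses an in-memory Map as a stand-in for Redis. Against real Redis, the common pattern is INCR on the key, followed by EXPIRE the first time the counter reaches 1. The class name, the per-bucket limit, and the bucket name "scan" are hypothetical, because this page does not say what the buckets are. The key holds only the hash, so nothing in the counter store identifies a person.

```typescript
// In-memory stand-in for the Redis counters; a real deployment would use
// INCR + EXPIRE against Redis instead of a Map.
const DAY_MS = 24 * 60 * 60 * 1000;

interface Counter {
  count: number;
  expiresAt: number; // epoch ms at which the key is reaped
}

class FixedWindowRateLimiter {
  private readonly store = new Map<string, Counter>();
  private readonly limit: number;
  private readonly ttlMs: number;

  constructor(limit: number, ttlMs: number = DAY_MS) {
    this.limit = limit;
    this.ttlMs = ttlMs;
  }

  // Returns true when the request is within the limit for this bucket.
  hit(ipHash: string, bucket: string, now: number = Date.now()): boolean {
    const key = `rl:${ipHash}:${bucket}`;
    let entry = this.store.get(key);
    if (entry === undefined || entry.expiresAt <= now) {
      // Missing or expired key: start a new counter, mirroring INCR on an
      // absent key followed by EXPIRE with a 24-hour TTL.
      entry = { count: 0, expiresAt: now + this.ttlMs };
      this.store.set(key, entry);
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}
```

With a hypothetical limit of 2, a third hit in the same bucket within 24 hours is rejected. Once the key expires, the counter starts over.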
Retention
- Anonymous scans: a nightly job deletes scans where ip_hash != 'crawler' AND created_at < now() - interval '30 days' AND no creator submission points at them. The 30-day window aligns with typical security-incident review timelines.
- Crawler-discovered scans: kept indefinitely. They're the catalog backbone; removing them would remove the public Report Card.
- Creator-claimed scans: kept as long as the creator wants. Removing the listing soft-deletes the skills row but keeps the scan history (an audit trail for the creator).
- Audit logs: append-only, retained for 7 years (compliance retention floor).
- Rate-limit counters: 24-hour Redis TTL, then reaped automatically.
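The nightly job presumably runs as a SQL DELETE. The sketch below restates its three conditions as a TypeScript predicate so each one is explicit. This page does not document how the link between a scan and a creator submission is stored, so here it is modeled as a hypothetical set of claimed scan IDs. The field and function names are also illustrative.

```typescript
// Hypothetical shape of the columns the retention job reads.
interface ScanRow {
  id: string;
  ipHash: string; // the literal 'crawler' marks crawler-discovered scans
  createdAt: Date;
}

const RETENTION_DAYS = 30;
const DAY_MS = 24 * 60 * 60 * 1000;

// Mirrors the nightly job's three conditions. Approximates Postgres's
// interval '30 days' as 30 fixed 24-hour days.
function isExpiredAnonymousScan(
  scan: ScanRow,
  now: Date,
  claimedScanIds: ReadonlySet<string>, // scans referenced by a creator submission
): boolean {
  const cutoff = now.getTime() - RETENTION_DAYS * DAY_MS;
  return (
    scan.ipHash !== "crawler" &&
    scan.createdAt.getTime() < cutoff &&
    !claimedScanIds.has(scan.id)
  );
}
```

A scan is deleted only when all three conditions hold. Crawler scans and claimed scans never match, whatever their age.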
Where it lives
- Postgres 17 on a Hetzner box in the EU (Helsinki, Finland). Disk encrypted at rest. Daily backups to Backblaze B2 EU; backups are encrypted client-side before upload.
- Redis 7 on the same box for rate limits + the scan queue. Not persisted to disk for sensitive paths; ephemeral by design.
- Cloudflare in front as TLS edge + Turnstile abuse protection + CDN. No request bodies are logged on the edge; only standard Cloudflare access logs (timestamp, country, user-agent class) with their default 7-day retention.
Nothing lives in the US. Nothing lives on AWS. No third-party analytics, no Google fonts (we self-host Inter + JetBrains Mono), no Segment / Mixpanel / Amplitude / FullStory / Sentry-with-replay.
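"Encrypted client-side before upload" means Backblaze only ever receives ciphertext, and the key stays on the box. This page does not name the backup tool or cipher. The sketch below assumes AES-256-GCM via Node's crypto module, and the blob layout (nonce, then tag, then ciphertext) is hypothetical. It works in memory for brevity, whereas a real backup of a Postgres dump would be streamed.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_BYTES = 12; // 96-bit nonce, the standard size for GCM
const TAG_BYTES = 16; // GCM authentication tag

// Encrypts a backup blob before it leaves the server. AES-256-GCM is an assumed choice.
function encryptBackup(plaintext: Buffer, key: Buffer): Buffer {
  const iv = randomBytes(IV_BYTES); // fresh nonce for every backup
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  const tag = cipher.getAuthTag();
  // Hypothetical layout: iv | tag | ciphertext
  return Buffer.concat([iv, tag, ciphertext]);
}

// Decrypts and authenticates; throws if the blob was altered in storage.
function decryptBackup(blob: Buffer, key: Buffer): Buffer {
  const iv = blob.subarray(0, IV_BYTES);
  const tag = blob.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const ciphertext = blob.subarray(IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}
```

GCM also authenticates the data. A backup corrupted or tampered with in the bucket fails to decrypt instead of restoring silently.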
What we never collect
- Browser cookies (other than the Auth.js session cookie for signed-in users)
- Local storage data beyond a theme preference
- Mouse-move / click heatmap / session replay
- Device fingerprints (canvas, WebGL, AudioContext, etc.)
- Third-party tracking pixels of any kind
- Your private repos (the GitHub OAuth scope is read:user + user:email only)
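On GitHub's side, that scope restriction is simply the scope parameter of the OAuth authorize URL, which GitHub documents as a space-delimited list. In practice Auth.js builds this URL itself. The hand-rolled version below is only for illustration, and the function name and example values are hypothetical. Neither scope grants repository access: that would require scopes such as repo, which are not requested.

```typescript
// The two scopes listed above; nothing that grants repository access.
const REQUESTED_SCOPES = ["read:user", "user:email"] as const;

// Hypothetical helper; Auth.js normally constructs this URL for you.
function buildGitHubAuthorizeUrl(clientId: string, redirectUri: string, state: string): string {
  const url = new URL("https://github.com/login/oauth/authorize");
  url.searchParams.set("client_id", clientId);
  url.searchParams.set("redirect_uri", redirectUri);
  url.searchParams.set("scope", REQUESTED_SCOPES.join(" ")); // space-delimited list
  url.searchParams.set("state", state); // random value checked on callback (CSRF protection)
  return url.toString();
}
```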
Self-hosting
The full stack is open-source at git.skillox.io/skillox/skillox (Apache-2.0). If you don't want to send SKILL.md content to api.skillox.io, run your own instance inside your own VPC. The CLI (npm i -g skillox) works entirely offline by default; passing --api-base points it at your own scanner.
GDPR + EU AI Act
We're an EU-based company (Atomira Technologies S.L., Barcelona). Personal data handling falls under GDPR; the catalog of skills + scan results is non-personal data (public artifacts about public code).
- Data Subject Access Request: privacy@skillox.io
- Right to erasure: account deletion from /account removes user + creator + linked accounts + sessions
- Data Processing Agreement available on request for Team + Enterprise customers
- EU AI Act compliance roadmap: /docs/concepts/aibom
Reporting a vulnerability
Coordinated disclosure: see /docs/disclose. TL;DR — email security@skillox.io with reproducer; we respond within 24 h, fix critical issues within 7 days, credit in the changelog.
Disagree with one of these defaults? Email privacy@skillox.io. The line between "reasonable defaults" and "privacy maximalism" is a conversation, not a fixed point.