False positives
v0 is regex-based. That makes it high-precision (when we flag something, it's usually real), but it also means a few rules will misfire on safe code that happens to match the pattern. Here are the known failure modes and what we do about them.
Common misfires
- env-var-harvesting on docstrings: a skill that says “set $ANTHROPIC_API_KEY in your shell” gets flagged the same as one that exfiltrates it. We currently flag any reference to a known secret name. Upcoming semantic probes will distinguish discussion from exfiltration intent.
- dangerous-shell in code examples: a tutorial skill that shows “# don't do this: rm -rf /” as a counter-example will be flagged. We don't yet track which fenced code block a finding came from.
- obfuscation on legitimate base64: a skill that bundles a 200-char base64-encoded icon or a small PEM cert will be flagged. The threshold is ≥100 chars, which catches malware but also legitimate inline assets.
- network-egress-undeclared on documentation links: when a skill mentions https://example.com in prose without declaring it in capabilities.network.egress, we flag it. The rule currently can't distinguish a documentation link from an operational one.
- repo-popularity-low on internal repos: corporate org-internal skills with low star counts get flagged. The rule is calibrated for the public ecosystem; a per-org policy override is coming soon.
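To make two of these misfires concrete, here is a minimal sketch of how a regex-only check behaves. The patterns and function names are hypothetical stand-ins, not the actual v0 rules; only the secret name and the ≥100-character threshold are taken from the descriptions above.

```python
import base64
import re

# Hypothetical stand-ins for two v0 rules; the real patterns are not shown here.
SECRET_NAME = re.compile(r"ANTHROPIC_API_KEY")
# A run of 100 or more base64-alphabet characters (threshold taken from the text).
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{100,}={0,2}")


def flags_env_var_harvesting(text: str) -> bool:
    """Flag any reference to the secret name, regardless of intent."""
    return SECRET_NAME.search(text) is not None


def flags_obfuscation(text: str) -> bool:
    """Flag any long base64-looking run, regardless of what it encodes."""
    return BASE64_RUN.search(text) is not None


# A harmless docstring and an exfiltration call both mention the secret name.
docstring = "Set $ANTHROPIC_API_KEY in your shell before running this skill."
exfiltration = 'requests.post(url, data=os.environ["ANTHROPIC_API_KEY"])'

# 150 raw bytes encode to exactly 200 base64 characters, like a small inline icon.
icon = base64.b64encode(bytes(150)).decode("ascii")
# 30 raw bytes encode to 40 characters, which stays under the threshold.
short = base64.b64encode(bytes(30)).decode("ascii")

print(flags_env_var_harvesting(docstring), flags_env_var_harvesting(exfiltration))
print(flags_obfuscation(icon), flags_obfuscation(short))
```

Both strings match the secret-name pattern because a regex sees only the text, not how the value is used. Likewise, the 200-character icon crosses the length threshold even though nothing about it is obfuscated.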
How to report a misgrade
Email hello@skillox.io with:
- The scan ID (looks like sk_pprsxxv0typ34oq5x2ufzd4q)
- The rule ID that misfired
- Why you believe it's a false positive
We triage every report and use them to tighten the rules. False-positive reports also feed the upcoming LLM-probe training set.
What we won't do: change a published grade retroactively without re-scanning. If a rule changes, the next scan reflects the new rule; the old result page keeps its historical grade, together with the rule version it was scored under (the rule version will be shown on the result page soon).
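One way to picture this policy is as an append-only history of scan results, each stamped with the rule version that produced it. The sketch below is illustrative only: ScanResult, rescan, the field names, and the grade and version values are all hypothetical and do not describe the actual storage model.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)  # frozen: a published result cannot be mutated in place
class ScanResult:
    scan_id: str       # hypothetical identifier
    grade: str         # hypothetical grade label
    rule_version: str  # version of the rules this result was scored under


def rescan(history: list[ScanResult], scan_id: str, grade: str,
           rule_version: str) -> list[ScanResult]:
    """Return a new history with the fresh result appended; old entries are untouched."""
    return history + [ScanResult(scan_id, grade, rule_version)]


history = [ScanResult("scan-1", "C", "rules-v1")]
# A rule change is picked up only by a new scan, which adds a new entry.
history = rescan(history, "scan-2", "B", "rules-v2")

try:
    history[0].grade = "B"  # attempting a retroactive change fails
except FrozenInstanceError:
    print("historical grade is immutable")
```

The old entry keeps both its grade and the rule version it was scored under, while the newest entry reflects the updated rule.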
Found a real vulnerability instead?
If your finding is the opposite — a malicious skill we missed — that's a security disclosure, not a false positive. See Reporting a CVE for the responsible-disclosure process.