All 12 rules
Every v0 rule is a pure function over the parsed SKILL.md. Pattern-based rules scan the text; provenance-based rules call the GitHub API for repository metadata. Each emits findings with severity, a line range, and (for line-based rules) ±2 lines of context.
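As a rough illustration of the shape described above, a line-based rule can be sketched as a pure function from text to findings. The names `Finding`, `context_window`, and `run_line_rule` are hypothetical and chosen for this example only; the scanner's real types may differ.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical record; field names are assumptions, not the scanner's schema.
    rule: str
    severity: str
    start_line: int  # 1-indexed, inclusive
    end_line: int
    context: list


def context_window(lines, line_no, radius=2):
    """Return the lines within `radius` of 1-indexed `line_no`, clamped to the file."""
    lo = max(0, line_no - 1 - radius)
    hi = min(len(lines), line_no + radius)
    return lines[lo:hi]


def run_line_rule(rule, severity, pattern, text):
    """Scan `text` line by line and emit one Finding per matching line."""
    lines = text.splitlines()
    return [
        Finding(rule, severity, n, n, context_window(lines, n))
        for n, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]
```

Because the function depends only on its arguments, the same input always yields the same findings, which keeps rules easy to test in isolation.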
Pattern-based (regex over parsed markdown)
These run line-by-line against the SKILL.md body and frontmatter.
env-var-harvesting: References a known secret env var ($ANTHROPIC_API_KEY, $DATABASE_URL, $AWS_ACCESS_KEY_ID, 22 total).
If an attacker can lure the agent into including this in an outbound URL or message, the credential leaks.
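A minimal sketch of how such a check might be written, assuming both `$NAME` and `${NAME}` forms count as references. Only the three variable names listed above are included; the remaining entries of the 22-name list are not reproduced here, and `find_secret_refs` is a hypothetical helper name.

```python
import re

# Only the three names the text lists; the full list is not shown in this document.
SECRET_ENV_VARS = ("ANTHROPIC_API_KEY", "DATABASE_URL", "AWS_ACCESS_KEY_ID")

# Matches $NAME or ${NAME}; \b rejects longer names such as $DATABASE_URL_BACKUP.
SECRET_REF = re.compile(
    r"\$\{?(" + "|".join(map(re.escape, SECRET_ENV_VARS)) + r")\b\}?"
)


def find_secret_refs(text):
    """Return (line_number, variable_name) pairs for each reference found."""
    hits = []
    for n, line in enumerate(text.splitlines(), start=1):
        for m in SECRET_REF.finditer(line):
            hits.append((n, m.group(1)))
    return hits
```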
instruction-injection: "ignore previous instructions", "also include the value of", "when the user asks to read", and 2 more patterns.
Classic prompt-injection trigger phrases. Agents may treat the line as a system directive instead of user content.
url-exfiltration: URLs that interpolate a secret variable into the query string.
Once the agent fetches the URL, the credential is in the recipient's access log.
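One way this pattern could be approximated with a regex is sketched below. It assumes "secret variable" means one of the known secret env var names and only looks after the `?` of a URL; the three names shown are the ones this page lists, and `interpolates_secret` is a hypothetical helper.

```python
import re

# Illustrative subset of secret names; the real list is longer.
SECRET_ENV_VARS = ("ANTHROPIC_API_KEY", "DATABASE_URL", "AWS_ACCESS_KEY_ID")
_NAMES = "|".join(map(re.escape, SECRET_ENV_VARS))

# scheme://host/path ? query-containing-$SECRET (or ${SECRET})
URL_SECRET_IN_QUERY = re.compile(
    r"https?://[^\s?#]+\?[^\s#]*\$\{?(?:" + _NAMES + r")\b"
)


def interpolates_secret(line):
    """True if the line holds a URL whose query string references a secret variable."""
    return URL_SECRET_IN_QUERY.search(line) is not None
```

A URL with an ordinary query string, such as `https://api.example/v1?page=2`, does not match, since no secret variable appears after the `?`.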
dangerous-shell: rm -rf /, curl|sh, wget|sh, chmod 777, eval $(…), eval `…`; 7 patterns in total.
Destructive or supply-chain attack primitives.
filesystem-overreach: ~/.ssh/, ~/.aws/, ~/.gnupg/, /etc/passwd, /proc/self/environ; 13 sensitive paths in total.
Reading these from an unsandboxed skill is a credential-exfiltration vector.
network-egress-undeclared: URLs to hosts not in the manifest's capabilities.network.egress allowlist (only fires when a manifest is present).
A skill that declares api.acme.io and then talks to analytics.acme.io is lying about its capabilities.
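The allowlist comparison can be sketched as follows, assuming hosts are matched exactly (so `api.acme.io` does not cover `analytics.acme.io`) and that a missing manifest is represented by `None`. The function name `undeclared_hosts` and the URL regex are illustrative assumptions.

```python
import re
from urllib.parse import urlparse

# Rough URL matcher for prose and markdown; an assumption, not the scanner's own pattern.
URL_RE = re.compile(r"https?://[^\s<>()]+")


def undeclared_hosts(text, egress_allowlist):
    """Return hosts referenced in `text` that are absent from the allowlist.

    `egress_allowlist` is None when no manifest is present, in which case
    the rule does not fire and an empty list is returned.
    """
    if egress_allowlist is None:
        return []
    allowed = {h.lower() for h in egress_allowlist}
    found = []
    for m in URL_RE.finditer(text):
        # Drop trailing sentence punctuation before parsing.
        host = urlparse(m.group(0).rstrip(".,;")).hostname
        if host and host not in allowed and host not in found:
            found.append(host)
    return found
```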
subprocess-execution: child_process, spawn(, exec(, subprocess.Popen, os.system(; 6 patterns in total.
Subprocesses break out of any capability declaration. They should require an explicit process.exec capability in the manifest.
obfuscation: Base64 blobs ≥100 chars, escaped-hex runs ≥8, unicode-escape runs ≥5.
Legitimate skills rarely include long base64/hex/unicode runs. Such runs often hide a payload.
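The three thresholds can be expressed as regexes along these lines. This sketch assumes a "run" means consecutive `\xNN` or `\uNNNN` escapes with nothing between them; the pattern names and `obfuscation_kinds` are hypothetical.

```python
import re

OBFUSCATION_PATTERNS = {
    # 100 or more base64 alphabet characters, with optional padding
    "base64": re.compile(r"[A-Za-z0-9+/]{100,}={0,2}"),
    # 8 or more consecutive \xNN escapes
    "hex-escape": re.compile(r"(?:\\x[0-9A-Fa-f]{2}){8,}"),
    # 5 or more consecutive \uNNNN escapes
    "unicode-escape": re.compile(r"(?:\\u[0-9A-Fa-f]{4}){5,}"),
}


def obfuscation_kinds(line):
    """Return the names of all obfuscation patterns that match the line."""
    return [name for name, pat in OBFUSCATION_PATTERNS.items() if pat.search(line)]
```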
Provenance-based (GitHub API)
These fire on repository metadata rather than on the SKILL.md content, and apply only when the scan URL points to a GitHub repo. On API errors they fall back to safe defaults. The exception is no-manifest, which checks the SKILL.md frontmatter.
repo-age-young: GitHub repo created < 14 days ago.
Most supply-chain attacks use freshly-created throwaway repos. Established projects rarely match.
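A minimal sketch of the age check, assuming the input is the `created_at` timestamp that the GitHub repository endpoint returns in ISO 8601 form. The helper name `is_repo_young` is hypothetical, and `now` is passed in explicitly so the function stays deterministic.

```python
from datetime import datetime, timedelta, timezone


def is_repo_young(created_at, now, threshold_days=14):
    """True if the repo was created strictly less than `threshold_days` before `now`."""
    # Replace a trailing "Z" so fromisoformat also works on Python versions before 3.11.
    created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    return now - created < timedelta(days=threshold_days)


# Example call with a fixed reference time:
reference = datetime(2024, 1, 20, tzinfo=timezone.utc)
young = is_repo_young("2024-01-10T00:00:00Z", reference)
```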
repo-popularity-low: < 10 stars AND single contributor (contributors fetched live from /contributors).
No community vetting + lone author = elevated risk profile.
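The combined condition might look like the sketch below. It assumes the star count comes from the repository's `stargazers_count` field and that the contributor list was fetched with a page size of 2, so its length can only distinguish one contributor from two or more. How a repo with zero listed contributors is treated is not stated on this page; this example does not flag it.

```python
def is_low_popularity(stargazers_count, contributors_page, max_stars=10):
    """True when the repo has fewer than `max_stars` stars AND exactly one contributor.

    `contributors_page` is the decoded JSON list from a /contributors request
    limited to two entries, so len() is 0, 1, or 2 (meaning "2 or more").
    """
    single_contributor = len(contributors_page) == 1
    return stargazers_count < max_stars and single_contributor
```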
force-pushes-recent: Forced PushEvent to default branch in the last 30 days, via the GitHub /events feed.
A recent force-push is a common pre-attack pattern (rewriting history to hide a malicious commit).
no-manifest: No `capabilities` block in the SKILL.md frontmatter.
Without a manifest, the runtime cannot enforce what this skill is permitted to do. A manifest is required for sandboxing (planned).
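A rough sketch of the presence check, assuming the frontmatter is a YAML block delimited by `---` lines at the top of the file and that `capabilities` is a top-level key. It deliberately avoids a YAML parser to stay dependency-free; `has_capabilities_block` is a hypothetical name.

```python
import re


def has_capabilities_block(skill_md):
    """True if the leading ----delimited frontmatter has a top-level `capabilities:` key."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return False  # no frontmatter at all
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; the body is not inspected
        # Top-level keys start at column 0, so indented keys are ignored.
        if re.match(r"capabilities\s*:", line):
            return True
    return False
```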
v0 caveats
- instruction-injection is regex-based. Alongside it, the worker now also runs an LLM-based semantic probe suite (gated on ANTHROPIC_API_KEY) that catches behavioral exfil patterns the regex misses; see Semantic prompt injection. An initial probe set today, expanding over time.
- repo-popularity-low: contributor count uses /contributors?per_page=2&anon=true, distinguishing “1” from “2+” only. Exact counts ship later.
- force-pushes-recent uses the unauthenticated /events feed, which is rate-limited to 60 req/hr per IP. Set GITHUB_TOKEN on the worker to raise to 5000/hr.
- HIGH-severity findings count toward the grade; see Grading explained.
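For the rate-limit caveat, a worker could attach the token only when it is configured, along the lines of this sketch. The helper name `github_headers` is hypothetical; the `Authorization: Bearer` header is the standard way to authenticate GitHub REST API requests.

```python
import os


def github_headers(env=None):
    """Build request headers, adding Authorization only if GITHUB_TOKEN is set."""
    env = os.environ if env is None else env
    headers = {"Accept": "application/vnd.github+json"}
    token = env.get("GITHUB_TOKEN")
    if token:
        # Authenticated requests get the higher hourly limit.
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Passing `env` explicitly keeps the function easy to test without touching the process environment.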