Fossier

GitHub spam prevention for open source repositories.

Open source repos face a growing volume of spam PRs: AI slop, Hacktoberfest spam, SEO link injection. Fossier lets trusted and known contributors pass through automatically and evaluates unknown contributors with a multi-signal scoring algorithm that estimates spam probability. Legitimate first-time contributors get through, and likely spam is blocked automatically.

Primary interface: A GitHub Action that automatically evaluates PRs when opened. Contributors don't need to install or run anything.

Secondary interface: A CLI for maintainers to debug evaluations, manage trust lists, and inspect the database.

How It Works

When a PR is opened, Fossier classifies the author into a trust tier:

| Tier | Source | Outcome |
|------|--------|---------|
| Blocked | VOUCHED.td denouncements, config blocked_users | Auto-close PR |
| Trusted | CODEOWNERS, GitHub collaborators, VOUCHED.td vouches, config trusted_users | Auto-allow |
| Known | Previous contributors in the local DB | Auto-allow |
| Unknown | None of the above | Run scoring algorithm |
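The cascade in the table can be sketched as a first-match lookup. This is a minimal illustration, not Fossier's actual code: it assumes the tiers are checked in the order the table lists them, and the `classify` function and its set arguments are hypothetical names standing in for the real sources (VOUCHED.td, CODEOWNERS, the local DB, and so on).

```python
from enum import Enum


class Tier(Enum):
    BLOCKED = "blocked"
    TRUSTED = "trusted"
    KNOWN = "known"
    UNKNOWN = "unknown"


def classify(username, blocked, trusted, known):
    """Return the first tier whose set contains the user (hypothetical helper).

    Each argument after `username` is a set of lowercase logins. Checking
    `blocked` first is an assumption based on the table order.
    """
    name = username.lower()  # GitHub logins are case-insensitive
    if name in blocked:
        return Tier.BLOCKED
    if name in trusted:
        return Tier.TRUSTED
    if name in known:
        return Tier.KNOWN
    return Tier.UNKNOWN
```

Only a result of `Tier.UNKNOWN` would go on to the scoring algorithm; the other three tiers resolve immediately.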

Scoring Algorithm

Unknown contributors are scored across 13 signals, each normalized to 0.0–1.0:

| Signal | What it measures | Default Weight |
|--------|-----------------|----------------|
| account_age | Days since GitHub account creation | 0.11 |
| public_repos | Number of public repositories | 0.07 |
| contribution_history | Public repos + gists as activity proxy | 0.07 |
| open_prs_elsewhere | Open PRs across GitHub (spam signal if high) | 0.11 |
| prior_interaction | Has issues/comments on this repo | 0.11 |
| pr_content | Files changed analysis (docs-only, code, tests) | 0.11 |
| follower_ratio | followers / following ratio | 0.07 |
| bot_signals | Username patterns, API type field | 0.07 |
| commit_email | Public email set, disposable domain detection | 0.05 |
| pr_description | PR title/body quality (empty, keyword-stuffed, links, em-dashes, emojis) | 0.05 |
| repo_stars | Target repo popularity (high-star repos attract more spam) | 0.05 |
| org_membership | Public GitHub organization memberships | 0.05 |
| commit_verification | GPG/SSH signed commits | 0.08 |

The composite score (0–100) maps to an outcome:

| Score | Outcome | Default Action |
|-------|---------|----------------|
| >= 70 | ALLOW | Record contributor as known in DB |
| 40–69 | REVIEW | Add label + post score breakdown comment |
| < 40 | DENY | Post explanatory comment + close PR |

If too many signals fail to evaluate and confidence drops below min_confidence (default 0.5), the outcome is forced to REVIEW regardless of score.
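A rough sketch of how a weighted composite and the threshold mapping could fit together is shown below. The thresholds (70, 40, 0.5) come from the tables above; everything else is an assumption for illustration. In particular, the README does not say how confidence is computed, so this sketch treats it as the share of total weight carried by signals that produced a value, and `composite_score` and `decide` are hypothetical names.

```python
def composite_score(signals, weights):
    """Return (score 0-100, confidence 0-1) from per-signal values.

    `signals` maps signal name -> float in [0.0, 1.0], or None when the
    signal could not be evaluated. Confidence as "fraction of weight that
    produced a value" is an assumption, not Fossier's documented formula.
    """
    usable = {name: value for name, value in signals.items()
              if value is not None and name in weights}
    total_weight = sum(weights.values())
    usable_weight = sum(weights[name] for name in usable)
    if total_weight <= 0 or usable_weight <= 0:
        return 0.0, 0.0
    weighted = sum(weights[name] * value for name, value in usable.items())
    score = 100.0 * weighted / usable_weight
    confidence = usable_weight / total_weight
    return score, confidence


def decide(score, confidence, allow_score=70.0, deny_score=40.0,
           min_confidence=0.5):
    """Map a score to ALLOW / REVIEW / DENY using the documented thresholds."""
    if confidence < min_confidence:
        return "REVIEW"  # too few signals succeeded to trust the score
    if score >= allow_score:
        return "ALLOW"
    if score < deny_score:
        return "DENY"
    return "REVIEW"
```

For example, `decide(95.0, 0.4)` returns `"REVIEW"` even though the score is high, because the confidence check comes first.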

Flood Detection

Fossier detects when an unknown contributor mass-opens PRs or issues in a short time window, a common pattern with automated spam tools. If an unknown user reaches the threshold within the window, all their PRs are automatically denied.

[trust]
flood_threshold = 3      # 3+ PRs/issues from the same unknown user within the window = spam
flood_window_hours = 1   # time window to check (default: 1 hour)

Trusted and known contributors are exempt from flood detection. This check runs before scoring, so mass-opened PRs are caught immediately without consuming API quota.
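The window check itself amounts to counting recent timestamps. The sketch below assumes the caller already has the open times of the user's PRs and issues; `is_flooding` is a hypothetical name, and the `>=` comparison follows the "3+" wording in the config comment above.

```python
from datetime import datetime, timedelta, timezone


def is_flooding(opened_at, now, threshold=3, window_hours=1):
    """Return True if `threshold` or more items were opened in the window.

    `opened_at` is an iterable of timezone-aware datetimes, one per PR or
    issue opened by the same unknown user (hypothetical input shape).
    """
    cutoff = now - timedelta(hours=window_hours)
    recent = sum(1 for opened in opened_at if cutoff <= opened <= now)
    return recent >= threshold


# Usage: three PRs in the last hour trips the default threshold.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
burst = [now - timedelta(minutes=m) for m in (5, 20, 50)]
print(is_flooding(burst, now))
```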

AI-Authored Commit Rejection

Fossier can automatically reject PRs that contain commits co-authored by AI agents. When reject_ai_authored is enabled, commit messages are scanned for Co-Authored-By lines matching known AI tools (Claude, Copilot, GPT, Cursor, Codeium, Windsurf, Devin, Gemini, and others). If any match is found, the PR is immediately denied regardless of trust tier or score.

NOTE: If global registry access is enabled and you reject LLM co-authored commits, Fossier does not report the user to the registry as spam. Please do not manually file spam reports against a user solely for submitting commits co-authored by LLMs. The registry is meant to collect incidents of genuine PR spam and low-effort slop. Use your best judgement when submitting incidents manually, and keep in mind that not every project wants to reject all AI usage in its repository.

[trust]
reject_ai_authored = true  # default: false

This check runs before the trust tier cascade and scoring algorithm, so even trusted contributors will have AI-co-authored PRs rejected when this is enabled.
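As an illustration of the kind of scan described above, the following sketch looks for Co-Authored-By trailers in a commit message and matches them against a small name list. The list contains only the tools named in this section, the substring match is deliberately naive (a short token like "gpt" could produce false positives), and `ai_coauthors` is a hypothetical helper; Fossier's real matching rules may differ.

```python
import re

# Tool names taken from the section above; the real list is longer.
AI_NAMES = ("claude", "copilot", "gpt", "cursor", "codeium",
            "windsurf", "devin", "gemini")

# Git trailers are one per line; match the key case-insensitively.
TRAILER = re.compile(r"^co-authored-by:\s*(?P<who>.+)$",
                     re.IGNORECASE | re.MULTILINE)


def ai_coauthors(message):
    """Return the co-author trailer values that mention a known AI tool."""
    hits = []
    for match in TRAILER.finditer(message):
        who = match.group("who").strip()
        if any(name in who.lower() for name in AI_NAMES):
            hits.append(who)
    return hits
```

Under this sketch, a PR would be denied when `ai_coauthors` returns a non-empty list for any of its commits.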

Quick Start

GitHub Action (Recommended)

Add to .github/workflows/fossier.yml:

name: Fossier PR Check
on:
  pull_request_target:
    types: [opened, synchronize]

permissions:
  pull-requests: write
  issues: write

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: PThorpe92/fossier@main
        id: fossier
        with:
          contact-url: "https://discord.gg/your-server"
          github-token: ${{ secrets.GITHUB_TOKEN }}
          # Optional: connect to the global spam registry
          # registry-api-key: ${{ secrets.FOSSIER_REGISTRY_API_KEY }}

      - name: Handle result
        if: steps.fossier.outputs.outcome == 'deny'
        run: echo "PR denied with score ${{ steps.fossier.outputs.score }}"

Note: Use pull_request_target (not pull_request) so the action has write permissions and reads config from the base branch — preventing PR authors from modifying their own trust settings.

CLI

# Install from a local clone
uv tool install .

# Or install from git
uv tool install git+https://github.com/pthorpe92/fossier.git

Once installed, the fossier command is available directly:

# Evaluate a contributor (full pipeline)
fossier check octocat --repo owner/repo --pr 42

# Score only (debug)
fossier score octocat --repo owner/repo --pr 42

# Check trust tier
fossier tier octocat --repo owner/repo

# View decision history
fossier history octocat --repo owner/repo

# Manage trust lists
fossier vouch octocat
fossier denounce spammer --reason "SEO link spam"

# Reject a contributor (denounce locally + report to global registry)
fossier reject spammer --reason "SEO link spam" --pr 42

# Vouch for all existing repo contributors (bootstrap)
fossier vouch-all --repo owner/repo
fossier vouch-all --dry-run  # preview without writing

# Initialize config files and workflows
fossier init

# Bulk-evaluate all open PRs
fossier scan --repo owner/repo

# Scan and take action (close spam, label reviews)
fossier scan --execute

# Database operations
fossier db migrate
fossier db stats --repo owner/repo
fossier db prune

Bulk Scan (workflow_dispatch)

fossier init generates a fossier-scan.yml workflow you can trigger from the GitHub Actions tab. It evaluates all open PRs at once - closing spam, labeling borderline PRs for review, and passing trusted contributors through. Run with the "dry run" option to preview without taking action.

Locally, the same command works if you have the gh CLI authenticated:

# Preview (no actions taken)
fossier scan --format table --dry-run

# Execute actions (close/label/comment)
fossier scan --execute

Exit codes: 0 = allow, 1 = deny, 2 = review, 3 = error.

Global flags: --verbose, --format json|text|table, --dry-run, --repo owner/repo, --db-path PATH.
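For scripting around the CLI, the exit codes listed above can be mapped back to outcome names. The sketch below is an assumption-laden example: it presumes the exit codes apply to `fossier check`, that `fossier` is on PATH, and it treats any unlisted code as an error. Both function names are hypothetical.

```python
import subprocess

# Exit codes as documented above.
EXIT_OUTCOMES = {0: "allow", 1: "deny", 2: "review", 3: "error"}


def outcome_from_exit(code):
    """Translate a fossier exit code; unknown codes are treated as errors."""
    return EXIT_OUTCOMES.get(code, "error")


def check_contributor(username, repo, pr_number):
    """Run `fossier check` and return the outcome name (requires fossier)."""
    result = subprocess.run(
        ["fossier", "check", username, "--repo", repo, "--pr", str(pr_number)],
        check=False,  # non-zero exit codes are expected outcomes, not failures
    )
    return outcome_from_exit(result.returncode)
```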

Configuration

Create fossier.toml (or .github/fossier.toml) in your repo root:

[thresholds]
allow_score = 70.0    # Score >= this -> auto-allow
deny_score = 40.0     # Score < this -> auto-deny
min_confidence = 0.5  # Below this -> force REVIEW regardless of score

[trust]
trusted_users = ["dependabot", "renovate"]
blocked_users = []
flood_threshold = 3        # PRs/issues from same unknown user in window = flood
flood_window_hours = 1     # time window for flood detection
# bot_policy = "score"  # "score" (default), "allow", or "block"
# reject_ai_authored = false  # auto-deny PRs with AI co-authored commits

[registry]
# Global fossier spam registry: share and receive spam intelligence across repositories
# Register at https://fossier.io to get an API key
url = "https://registry.fossier.io"
report_denials = false          # Automatically report score-based denials to the registry
check_before_scoring = false    # Block users with 3+ registry reports before scoring

[weights]
# Signal weights (auto-normalized to sum to 1.0)
account_age = 0.11
public_repos = 0.07
contribution_history = 0.07
open_prs_elsewhere = 0.11
prior_interaction = 0.11
pr_content = 0.11
follower_ratio = 0.07
bot_signals = 0.07
commit_email = 0.05
pr_description = 0.05
repo_stars = 0.05
org_membership = 0.05
commit_verification = 0.08

[actions.deny]
close_pr = true                # Set to false to only comment/label without closing
comment = true
label = "fossier:spam-likely"

[actions.review]
comment = true
label = "fossier:needs-review"

[cache_ttl]
user_profile_hours = 24
search_hours = 1
collaborators_hours = 6

See fossier.toml.example for the full reference.

Environment variables GITHUB_TOKEN (or GH_TOKEN) and GITHUB_REPOSITORY are read automatically in CI.

GitHub CLI Integration

If the GitHub CLI (gh) is installed and authenticated, fossier will use it automatically:

Token fallback — if no GITHUB_TOKEN is set, fossier uses gh auth token
Search fallback — when the search API fails (private repos, insufficient scopes), falls back to gh search prs/gh search issues
Collaborators fallback — uses gh api when the REST API can't list collaborators

This means running fossier check locally "just works" if you have gh set up; no token configuration is needed.
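The token fallback can be pictured as a short lookup chain. The sketch below is an illustration only: it assumes GITHUB_TOKEN is preferred over GH_TOKEN when both are set (the README does not state the order), and `resolve_token` is a hypothetical name. `gh auth token` is the GitHub CLI command that prints the stored token.

```python
import os
import shutil
import subprocess


def resolve_token(env=None):
    """Return a GitHub token from the environment, else from `gh`, else None."""
    env = os.environ if env is None else env
    for var in ("GITHUB_TOKEN", "GH_TOKEN"):  # precedence is an assumption
        if env.get(var):
            return env[var]
    if shutil.which("gh") is None:
        return None  # gh CLI not installed
    result = subprocess.run(["gh", "auth", "token"],
                            capture_output=True, text=True, check=False)
    token = result.stdout.strip()
    return token if result.returncode == 0 and token else None
```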

PR Slash Commands

When Fossier labels a PR fossier:needs-review, maintainers can interact with Fossier directly from PR comments using slash commands:

| Command | What it does | |---------|-----------
