33 repositories found · Page 1 of 2
tigerlab-ai / Tiger: Open Source LLM toolkit to build trustworthy LLM applications. TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning)
thu-coai / AISafetyLab: A comprehensive framework covering safety attacks, defenses, evaluation, and a paper list.
PKU-Alignment / Aligner: [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct
trendmicro / Ais: Toolkit for research purposes in AIS. See the website for the paper.
DIG-Beihang / AISafety: No description available
metadriverse / Cat: [CoRL'23] Adversarial Training for Safe End-to-End Driving
dobriban / Principles Of AI LLMs: Materials for the course Principles of AI: LLMs at UPenn (Stat 9911, Spring 2025). LLM architectures, training paradigms (pre- and post-training, alignment), test-time computation, reasoning, safety and robustness (jailbreaking, oversight, uncertainty), representations, interpretability (circuits), etc.
xiaoweih / AISafetyLectureNotes: Machine Learning Safety
ChristianInterno / ReStraV: AI-Generated Video Detection via Perceptual Straightening (NeurIPS 2025)
kaustpradalab / Fraud R1: [ACL 2025 Findings] Fraud-R1: A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements
ocasc / Doc: Official documentation of the AI Safety Open Community
riceissa / Aiwatch: Website to track people, organizations, and products (tools, websites, etc.) in AI safety
pillowsofwind / LLM CBRN Risks: [ACL 2025 Findings] The official GitHub repo for the paper "Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents"
ai-safety-graph / AISafetyGraph: AI Safety Graph
HOLYKEYZ / IntellectSafe: AI defense infrastructure against manipulation, misuse, hallucinations, and synthetic deception.
ZiyueWang25 / Llm Security Challenge: Can large language models solve security challenges? We test LLMs' ability to interact with and break out of shell environments using the OverTheWire wargames, showing the models' surprising ability to perform action-oriented cyber-exploits in shell environments.
AnaBelenBarbero / Detect Prompt Injection: The go-to API for detecting and preventing prompt injection attacks.
WindVChen / Solution For AISafety CVPR2022: A simple and effective solution for the AISafety CVPR2022 challenge, ranked 5th
MartinLeitgab / AISafetyIntervention LiteratureExtraction: This repository contains all outputs of the 2025 Scientific Literature Knowledge Extraction Tool project hosted on the EleutherAI Discord.
mirseo / String Formatter: A high-performance string formatter written in Rust. This project detects and blocks LLM prompt injection and jailbreak attacks. It also features a customizable rule-based system and defends against obfuscated prompt attacks.