33 repositories found · Page 1 of 2
tigerlab-ai / Tiger: Open Source LLM toolkit to build trustworthy LLM applications. TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning)
thu-coai / AISafetyLab: A comprehensive framework covering safety attacks, defenses, evaluation, and a paper list.
PKU-Alignment / Aligner: [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct
trendmicro / Ais: Toolkit for research purposes in AIS. See the website for the paper.
DIG-Beihang / AISafety: No description available
metadriverse / Cat: [CoRL'23] Adversarial Training for Safe End-to-End Driving
dobriban / Principles Of AI LLMs: Materials for the course Principles of AI: LLMs at UPenn (Stat 9911, Spring 2025). LLM architectures, training paradigms (pre- and post-training, alignment), test-time computation, reasoning, safety and robustness (jailbreaking, oversight, uncertainty), representations, interpretability (circuits), etc.
xiaoweih / AISafetyLectureNotes: Machine Learning Safety
ChristianInterno / ReStraV: AI-Generated Video Detection via Perceptual Straightening (NeurIPS 2025)
kaustpradalab / Fraud R1: [ACL 2025 Findings] Fraud-R1: A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements
ocasc / Doc: Official documentation of the AI Safety Open Community
riceissa / Aiwatch: Website to track people, organizations, and products (tools, websites, etc.) in AI safety
pillowsofwind / LLM CBRN Risks: [ACL 2025 Findings] The official GitHub repo for the paper "Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents"
ai-safety-graph / AISafetyGraph: AI Safety Graph
HOLYKEYZ / IntellectSafe: AI defense infrastructure against manipulation, misuse, hallucinations, and synthetic deception.
ZiyueWang25 / Llm Security Challenge: Can large language models solve security challenges? We test LLMs' ability to interact with and break out of shell environments using the OverTheWire wargames, showing the models' surprising ability to perform action-oriented cyber-exploits in shell environments.
AnaBelenBarbero / Detect Prompt Injection: The go-to API for detecting and preventing prompt injection attacks.
WindVChen / Solution For AISafety CVPR2022: A simple and effective solution for the AISafety CVPR2022 challenge, ranked 5th
MartinLeitgab / AISafetyIntervention LiteratureExtraction: This repository contains all outputs of the 2025 Scientific Literature Knowledge Extraction Tool project hosted on the EleutherAI Discord.
mirseo / String Formatter: A high-performance string formatter written in Rust. This project detects and blocks LLM prompt injection and jailbreak attacks. It also features a customizable rule-based system and defends against obfuscated prompt attacks.