CodeLLMEval
Evaluation based on programming scenarios
👋 Join our WeChat
🔥 New
DeepSeek R1 Coding Ability Review
Stack Overflow 2024 survey: the real sentiment behind the surge in AI popularity
https://survey.stackoverflow.co/2024/ai#sentiment-and-usage
Scoring mode
High-frequency defects (continuously updated)

| Defect scenario | Severe consequence | Cases |
| --- | --- | --- |
| Dead loop (infinite loop) | In severe cases, CPU usage hits 100% and the service crashes | 2 |
| Memory leak, memory overflow | In severe cases, OOM occurs and the service crashes | 2 |
| Thread deadlock | Concurrent threads deadlock while competing for resources; in severe cases, CPU usage hits 100% or OOM occurs, and the service becomes unavailable or fails | 2 |
| Inconsistent concurrent data | Improper operations under multithreading lead to inconsistent or dirty data | 1 |
| Long context/token capability | Tests the accuracy and maximum capacity of long-text processing | 1 |
| Context learning capability | Tests the accuracy of context understanding and reasoning | 1 |
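To make the "dead loop" scenario concrete, here is a minimal, hypothetical sketch of the kind of defect such a case might contain. It is not taken from this project's test cases: the function names and the `max_steps` safety cap are assumptions added so the example terminates instead of pinning a CPU core.

```python
def buggy_binary_search(items, target, max_steps=1000):
    """Binary search with a dead-loop defect.

    `max_steps` is a hypothetical safety cap, added only so this
    sketch stops instead of spinning forever.
    """
    lo, hi = 0, len(items) - 1
    steps = 0
    while lo < hi:
        steps += 1
        if steps > max_steps:
            raise RuntimeError("dead loop: search range stopped shrinking")
        mid = (lo + hi) // 2
        if items[mid] < target:
            lo = mid  # BUG: when hi - lo == 1, mid == lo, so lo never advances
        else:
            hi = mid
    return lo if items and items[lo] == target else -1


def fixed_binary_search(items, target):
    """Same search with the defect repaired: the range shrinks every step."""
    lo, hi = 0, len(items) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if items[mid] < target:
            lo = mid + 1  # fix: skip past mid so the loop always progresses
        else:
            hi = mid
    return lo if items and items[lo] == target else -1
```

The buggy version works for some inputs (for example, searching for the first element) and loops forever for others, which is why this class of defect often survives casual testing.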
- Dead loop (infinite loop) detection
Compare and evaluate how effectively the defect is found

- Detecting multi-threaded deadlocks
- Memory leaks
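As an illustration of the multi-threaded deadlock scenario, the following is a minimal sketch of the classic opposite-lock-order defect. It is an assumption about what such a case could look like, not code from this project; the function name `run_opposite_order`, the barrier, and the acquire timeout are all hypothetical scaffolding so the demonstration reports the deadlock and exits instead of hanging.

```python
import threading


def run_opposite_order(timeout=0.2):
    """Two threads take the same two locks in opposite order.

    Returns a dict mapping each thread name to whether it managed to
    acquire its second lock. A timeout (hypothetical scaffolding) stands
    in for what would be an indefinite wait in real code.
    """
    lock_1, lock_2 = threading.Lock(), threading.Lock()
    barrier = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            barrier.wait()  # both threads now hold their first lock
            acquired = second.acquire(timeout=timeout)
            if acquired:
                second.release()
            got_second[name] = acquired
            barrier.wait()  # keep holding `first` until both attempts end

    threads = [
        threading.Thread(target=worker, args=("A", lock_1, lock_2)),
        threading.Thread(target=worker, args=("B", lock_2, lock_1)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second
```

Neither thread can obtain its second lock while the other still holds it, so both attempts time out. Without the timeout, both threads would block forever. A common fix is to have every thread acquire the locks in one agreed order.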
