CodeLLMEval

Evaluation based on programming scenarios

Install / Use

/learn @laziobird/CodeLLMEval

👋 Join our WeChat

🔥 New

DeepSeek R1 Coding Ability Review

Stack Overflow 2024 Developer Survey: the real sentiment behind the surge in AI popularity

https://survey.stackoverflow.co/2024/ai#sentiment-and-usage

Scoring mode

High-frequency defects (continuously updated)

| Defect scenario | Serious consequence | Cases |
| --- | --- | --- |
| Infinite loop (dead loop) | CPU at 100%, service crash | 2 |
| Memory leak, memory overflow | Out-of-memory (OOM) errors, service crash | 2 |
| Thread deadlock | Concurrent threads deadlock while competing for resources; in severe cases this drives CPU to 100% or causes OOM, leaving the service unavailable or failed | 2 |
| Inconsistent concurrent data | Improper operations in multi-threaded code lead to inconsistent and dirty data | 1 |
| Long context/token capability | Tests the accuracy and maximum capacity of long-text processing | 1 |
| Context learning capability | Tests the accuracy of context understanding and reasoning | 1 |
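
To make the "inconsistent concurrent data" row concrete, here is a minimal sketch of a lost update. It is an illustration only and is not taken from the project's test cases; the function name `run_increments` and the use of a barrier to force the bad interleaving are assumptions made for this example.

```python
import threading


def run_increments(use_lock):
    """Two threads each add 1 to a shared counter; returns the final value."""
    state = {"counter": 0}
    lock = threading.Lock()
    barrier = threading.Barrier(2)

    def unsafe_increment():
        current = state["counter"]       # read
        barrier.wait()                   # force both threads to read before either writes
        state["counter"] = current + 1   # write back a stale value

    def safe_increment():
        with lock:                       # read-modify-write happens as one unit
            state["counter"] += 1

    target = safe_increment if use_lock else unsafe_increment
    threads = [threading.Thread(target=target) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]


print(run_increments(use_lock=False))  # 1: one update is lost
print(run_increments(use_lock=True))   # 2
```

The barrier only makes the race deterministic for demonstration. In real code the same interleaving happens by chance, which is what makes this class of defect hard to spot in review.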

Infinite loop detection: comparing and evaluating effectiveness
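
As a rough idea of what an infinite-loop case can look like, the sketch below shows a binary search whose lower bound never advances. It is a hypothetical example, not one of the project's cases; the function names and the `max_steps` guard (added so the demo stops instead of spinning) are assumptions.

```python
def buggy_binary_search(items, target, max_steps=1000):
    """Binary search with a classic infinite-loop defect."""
    lo, hi = 0, len(items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        if steps > max_steps:
            # Guard added for the demo only; without it the loop never exits.
            raise RuntimeError("loop did not terminate")
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid        # BUG: lo can stay equal to mid forever
        else:
            hi = mid - 1
    return -1


def fixed_binary_search(items, target):
    """Same search with the lower bound advanced past mid."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1    # always shrinks the search range
        else:
            hi = mid - 1
    return -1
```

Calling `buggy_binary_search([1, 3, 5, 7], 7)` hits the guard, while `fixed_binary_search([1, 3, 5, 7], 7)` returns index 3. A reviewer, human or model, has to notice that the search range stops shrinking once `lo == mid`.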
Looking for multi-threaded deadlocks
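
The sketch below illustrates the lock-ordering pattern behind many multi-threaded deadlocks. It is an assumption-laden demo rather than one of the project's cases: the function names are hypothetical, and the second lock is acquired with a timeout so the example reports the deadlock instead of hanging.

```python
import threading


def _run(threads):
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def opposite_order_attempts(timeout=0.2):
    """Two threads take two locks in opposite order.

    Returns, per thread, whether it managed to get its second lock.
    """
    lock_a, lock_b = threading.Lock(), threading.Lock()
    barrier = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            barrier.wait()  # both threads now hold their first lock
            # A plain second.acquire() would block forever at this point.
            acquired = second.acquire(timeout=timeout)
            got_second[name] = acquired
            if acquired:
                second.release()
            barrier.wait()  # keep the first lock until both attempts are done

    _run([
        threading.Thread(target=worker, args=("t1", lock_a, lock_b)),
        threading.Thread(target=worker, args=("t2", lock_b, lock_a)),
    ])
    return got_second


def same_order_attempts():
    """Common fix: every thread takes the locks in the same global order."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    finished = {}

    def worker(name):
        with lock_a:
            with lock_b:
                finished[name] = True

    _run([threading.Thread(target=worker, args=(n,)) for n in ("t1", "t2")])
    return finished
```

With opposite ordering, neither thread obtains its second lock, so both entries in the result are `False`. With a single global order, both threads finish.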
Memory leaks
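
One common shape of a memory leak is a cache that never evicts anything. The following is a minimal sketch under that assumption; `LeakyCache` and `BoundedCache` are hypothetical names chosen for illustration and do not come from the project.

```python
from collections import OrderedDict


class LeakyCache:
    """Keeps every value forever, so memory grows with the number of distinct keys."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        if key not in self._data:
            self._data[key] = bytearray(1024)  # stand-in for an expensive result
        return self._data[key]

    def __len__(self):
        return len(self._data)


class BoundedCache:
    """Evicts the least recently used entry once max_entries is exceeded."""

    def __init__(self, max_entries=100):
        self._data = OrderedDict()
        self._max_entries = max_entries

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)        # mark as most recently used
            return self._data[key]
        value = bytearray(1024)
        self._data[key] = value
        if len(self._data) > self._max_entries:
            self._data.popitem(last=False)     # drop the oldest entry
        return value

    def __len__(self):
        return len(self._data)
```

After 1,000 lookups with distinct keys, the leaky cache holds 1,000 entries while the bounded one stays at its configured limit. In a long-running service the first pattern eventually ends in an OOM crash.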
