
CodeLLMEval

Evaluation based on programming scenarios




🔥 New

DeepSeek R1 Coding Ability Review

Stack Overflow 2024 Developer Survey: the real sentiment behind the surge in AI popularity

https://survey.stackoverflow.co/2024/ai#sentiment-and-usage


Scoring mode

High-frequency defects (continuously updated)

| Defect scenario | Serious result | Cases |
| --- | --- | --- |
| Infinite loop (dead loop) | In severe cases, 100% CPU usage and a service crash | 2 |
| Memory leak, memory overflow | In severe cases, OOM and a service crash | 2 |
| Thread deadlock | Concurrent threads competing for resources deadlock; in severe cases this causes 100% CPU usage or OOM, leaving the service unavailable or failed | 2 |
| Inconsistent concurrent data | Improper operations in multi-threaded code lead to inconsistent, dirty data | 1 |
| Long context/token capability | Tests the accuracy and maximum capacity of long-text processing | 1 |
| Context learning capability | Tests the accuracy of context understanding and reasoning | 1 |
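To make the "inconsistent concurrent data" row concrete, here is a minimal Java sketch of a lost-update race. It is not taken from the repository's test cases; the class name `CounterRaceDemo` and its methods are hypothetical and chosen only for illustration. Several threads increment a plain `int` field and an `AtomicInteger` side by side: the atomic counter always reaches the expected total, while the plain counter may fall short because `++` is a non-atomic read-modify-write.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration class, not part of CodeLLMEval.
class CounterRaceDemo {
    private int plainCounter; // unsynchronized: increments can be lost
    private final AtomicInteger atomicCounter = new AtomicInteger();

    /** Starts `threads` workers; each increments both counters `perThread` times. */
    void run(int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    plainCounter++;                  // racy read-modify-write
                    atomicCounter.incrementAndGet(); // atomic increment
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join makes the workers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    int plain() { return plainCounter; }

    int atomic() { return atomicCounter.get(); }

    public static void main(String[] args) {
        CounterRaceDemo demo = new CounterRaceDemo();
        demo.run(8, 100_000);
        System.out.println("expected = " + (8 * 100_000));
        System.out.println("atomic   = " + demo.atomic());
        System.out.println("plain    = " + demo.plain() + " (may be lower)");
    }
}
```

The size of the shortfall in the plain counter varies from run to run and can occasionally be zero, which is part of what makes this defect class hard to catch in review.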

  • Infinite loop (dead loop) detection: compare and evaluate effectiveness
  • Looking for multi-threaded deadlocks
  • Memory leaks
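As an illustration of the deadlock scenario listed above, the following Java sketch builds a classic lock-ordering deadlock and then asks the JVM to report it through the standard `ThreadMXBean.findDeadlockedThreads()` call. This is an assumption about what such a case might look like, not the repository's actual test code; the class `DeadlockProbe` and its method names are hypothetical. The workers are daemon threads so that the JVM can still exit even though they never finish.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration class, not part of CodeLLMEval.
class DeadlockProbe {

    /** Starts two daemon threads that take the same two locks in opposite order. */
    static void startDeadlockedPair() {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread t1 = new Thread(() -> lockInOrder(lockA, lockB, bothHoldFirstLock), "worker-1");
        Thread t2 = new Thread(() -> lockInOrder(lockB, lockA, bothHoldFirstLock), "worker-2");
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Not reached: the other thread owns `second` and waits for `first`.
            }
        }
    }

    /** Polls the JVM for deadlocked threads; returns how many were found, or 0 on timeout. */
    static int countDeadlockedThreads(long timeoutMillis) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = bean.findDeadlockedThreads(); // null when no deadlock exists
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        startDeadlockedPair();
        System.out.println("Deadlocked threads found: " + countDeadlockedThreads(2000));
    }
}
```

Acquiring the two locks in one consistent order in both threads would remove the cycle; the sketch deliberately reverses the order to show the failure.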

GitHub Stars: 24
Category: Development
Updated: 1mo ago
Forks: 6

Languages

Java

Security Score

75/100

Audited on Feb 9, 2026

No findings