
<p align="center"> <img alt="Midscene.js" width="260" src="https://github.com/user-attachments/assets/f60de3c1-dd6f-4213-97a1-85bf7c6e79e4"> </p> <h1 align="center">Midscene.js</h1> <div align="center">

English | 简体中文

<strong>Official Website</strong>: <a href="https://midscenejs.com/">https://midscenejs.com/</a>

<a href="https://trendshift.io/repositories/12524" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12524" alt="web-infra-dev%2Fmidscene | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>

</div> <p align="center"> AI-powered, vision-driven UI automation for every platform. </p> <p align="center"> <a href="https://www.npmjs.com/package/@midscene/web"><img src="https://img.shields.io/npm/v/@midscene/web?style=flat-square&color=00a8f0" alt="npm version" /></a> <a href="https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B"><img src="https://img.shields.io/badge/UI%20TARS%20Models-yellow" alt="hugging face model" /></a> <a href="https://npm-compare.com/@midscene/web/#timeRange=THREE_YEARS"><img src="https://img.shields.io/npm/dm/@midscene/web.svg?style=flat-square&color=00a8f0" alt="downloads" /></a> <a href="https://github.com/web-infra-dev/midscene/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square&color=00a8f0" alt="License" /></a> <a href="https://discord.gg/2JyBHxszE4"><img src="https://img.shields.io/discord/1328277792730779648?style=flat-square&color=7289DA&label=Discord&logo=discord&logoColor=white" alt="discord" /></a> <a href="https://x.com/midscene_ai"><img src="https://img.shields.io/twitter/follow/midscene_ai?style=flat-square" alt="twitter" /></a> <a href="https://deepwiki.com/web-infra-dev/midscene"> <img alt="Ask DeepWiki.com" src="https://devin.ai/assets/deepwiki-badge.png" style="height: 18px; vertical-align: middle;" /> </a> </p>

📣 Midscene Skills is here!

Use Midscene Skills to control any platform with OpenClaw

Showcases

💡 Features

Write Automation with Natural Language

  • Describe your goals and steps, and Midscene will plan and operate the user interface for you.
  • Use the JavaScript SDK or YAML to write your automation scripts.
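
As a rough illustration of what scripting in natural language can look like, the TypeScript sketch below drives a stand-in agent. The `UiAgent` interface, its `act`/`query`/`assert` methods, and the `RecordingAgent` class are hypothetical names invented for this example. They are not the Midscene.js SDK, whose actual API is described in the official documentation.

```typescript
// Hypothetical agent interface (an assumption for this sketch, not the
// Midscene.js API): every call takes a plain-language description.
interface UiAgent {
  act(instruction: string): Promise<void>;
  query<T>(question: string): Promise<T>;
  assert(condition: string): Promise<void>;
}

// A fake agent that records each request instead of driving a real UI or
// calling a model. Answers to queries come from a canned lookup table.
class RecordingAgent implements UiAgent {
  readonly log: string[] = [];
  private readonly cannedAnswers: Record<string, unknown>;

  constructor(cannedAnswers: Record<string, unknown> = {}) {
    this.cannedAnswers = cannedAnswers;
  }

  async act(instruction: string): Promise<void> {
    this.log.push(`act: ${instruction}`);
  }

  async query<T>(question: string): Promise<T> {
    this.log.push(`query: ${question}`);
    return this.cannedAnswers[question] as T;
  }

  async assert(condition: string): Promise<void> {
    this.log.push(`assert: ${condition}`);
  }
}

// The automation script itself: goals, checks, and data requests are all
// written as natural-language strings rather than selectors.
async function searchHeadphones(agent: UiAgent): Promise<string[]> {
  await agent.act('type "Headphones" in the search box and press Enter');
  await agent.assert('a list of products is visible');
  return agent.query<string[]>('titles of the first three products');
}
```

Because the script depends only on the interface, the same `searchHeadphones` function could in principle be pointed at a real agent implementation or at the recording fake for a dry run.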

Web & Mobile App & Any Interface

  • Web Automation: Integrate with Puppeteer or Playwright, or use Bridge Mode to control your desktop browser.
  • Android Automation: Use the JavaScript SDK with adb to control your local Android device.
  • iOS Automation: Use the JavaScript SDK with WebDriverAgent to control your local iOS devices and simulators.
  • Any Interface Automation: Use the JavaScript SDK to control your own interface.

For Developers

  • Three kinds of APIs:
  • MCP: Midscene provides MCP services that expose atomic Midscene Agent actions as MCP tools, so upper-layer agents can inspect and operate UIs with natural language.
  • Caching for Efficiency: Replay your script with cache and get the result faster.
  • Debugging Experience: Midscene.js offers a visual replay report, a built-in playground, and a Chrome extension to simplify debugging. These are the tools most developers truly need.
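
The caching idea above can be sketched in a few lines of TypeScript. This is a conceptual illustration only: `LocateCache`, `Locator`, and the serialized format are assumptions made for this example and do not describe how Midscene.js actually stores or validates its cache.

```typescript
type Point = { x: number; y: number };

// Hypothetical locator: stands in for an expensive model call that finds
// an element from a natural-language prompt.
type Locator = (prompt: string) => Point;

class LocateCache {
  private readonly entries = new Map<string, Point>();
  hits = 0;
  misses = 0;

  // Return the cached point for a prompt, or fall back to the slow locator
  // and remember its answer for the next run.
  locate(prompt: string, slowLocate: Locator): Point {
    const cached = this.entries.get(prompt);
    if (cached !== undefined) {
      this.hits += 1;
      return cached;
    }
    this.misses += 1;
    const point = slowLocate(prompt);
    this.entries.set(prompt, point);
    return point;
  }

  // Persist the cache as JSON so a later run can replay without the model.
  serialize(): string {
    return JSON.stringify(Array.from(this.entries.entries()));
  }

  static deserialize(json: string): LocateCache {
    const cache = new LocateCache();
    const pairs = JSON.parse(json) as Array<[string, Point]>;
    for (const [prompt, point] of pairs) {
      cache.entries.set(prompt, point);
    }
    return cache;
  }
}
```

A practical cache would also need some way to detect that the UI has changed since the entry was recorded. That invalidation step is deliberately left out of this sketch.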

👉 Zero-code Quick Experience

✨ Driven by Visual Language Model

Midscene.js is all-in on the pure-vision route for UI actions: element localization and interactions are based on screenshots only. It supports visual-language models like Qwen3-VL, Doubao-1.6-vision, gemini-3-pro, and UI-TARS. For data extraction and page understanding, you can still opt in to include DOM when needed.

  • Pure-vision localization for UI actions; the DOM extraction mode is removed.
  • Works across web, mobile, desktop, and even <canvas> surfaces.
  • Far fewer tokens by skipping DOM for actions, which cuts cost and speeds up runs.
  • DOM can still be included for data extraction and page understanding when needed.
  • Strong open-source options for self-hosting.
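
To make screenshot-based localization concrete, here is a minimal TypeScript sketch of the last step: turning a located region into a point to click. It assumes the model returns a pixel bounding box in screenshot coordinates, which is an assumption for illustration, since output formats differ between models. `BoundingBox` and `clickPointFor` are hypothetical names, not part of Midscene.js.

```typescript
// Region of the screenshot that the model identified, in screenshot pixels.
type BoundingBox = { left: number; top: number; width: number; height: number };
type Point = { x: number; y: number };

// Convert a bounding box into the point to click: take the centre of the
// box, then divide by the device pixel ratio to map screenshot pixels back
// to the logical coordinates that input events typically use.
function clickPointFor(box: BoundingBox, devicePixelRatio: number = 1): Point {
  if (devicePixelRatio <= 0) {
    throw new Error('devicePixelRatio must be positive');
  }
  const centreX = box.left + box.width / 2;
  const centreY = box.top + box.height / 2;
  return {
    x: Math.round(centreX / devicePixelRatio),
    y: Math.round(centreY / devicePixelRatio),
  };
}
```

Nothing in this step depends on a DOM, which is why the same approach can apply to web pages, mobile screens, desktop apps, and canvas content alike.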

Read more about Model Strategy

📄 Resources

🤝 Community

🌟 Awesome Midscene

Community projects that extend Midscene.js capabilities:

📝 Credits

We would like to thank the following projects:

  • Rsbuild and Rslib for the build tool.
  • UI-TARS for the open-source agent model UI-TARS.
  • Qwen-VL for the open-source VL model Qwen-VL.
  • scrcpy and yume-chan for letting us control Android devices from the browser.
  • appium-adb for the JavaScript bridge to adb.
  • appium-webdriveragent for operating XCTest from JavaScript.
  • YADB for the yadb tool which improves the performance of text input.
  • libnut-core for the cross-platform native keyboard and mouse control.
  • Puppeteer for browser automation and control.
  • Playwright for browser automation, control, and testing.

📖 Citation

If you use Midscene.js in your research or project, please cite:

@software{Midscene.js,
  author = {Xiao Zhou and Tao Yu and YiBing Lin},
  title = {Midscene.js: Your AI Operator for Web, Android, iOS, Automation \& Testing.},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/web-infra-dev/midscene}
}

📝 License

Midscene.js is MIT licensed.


<div align="center"> If this project helps you or inspires you, please give us a star </div>