# MacosUseSDK
Library and command-line tools to traverse the macOS accessibility tree and simulate user input actions. Allows interaction with UI elements of other applications.
https://github.com/user-attachments/assets/d8dc75ba-5b15-492c-bb40-d2bc5b65483e
- Highlight whatever is happening on the computer: text elements, clicks, typing
- Listen for changes in the UI: elements added or removed, text changed
## Building the Tools
To build the command-line tools provided by this package, navigate to the root directory (MacosUseSDK) in your terminal and run:
```bash
swift build
```
This compiles the tools and places the executables in the `.build/debug/` directory (or `.build/release/` if you use `swift build -c release`). You can run them directly from there (e.g., `.build/debug/TraversalTool`) or use `swift run <ToolName>`.
## Available Tools
All tools output informational logs and timing data to `stderr`. Primary output (like PIDs or JSON data) is sent to `stdout`.
### AppOpenerTool
- Purpose: Opens or activates a macOS application by its name, bundle ID, or full path. Outputs the application's PID on success.
- Usage: `AppOpenerTool <Application Name | Bundle ID | Path>`
- Examples:

```bash
# Open by name
swift run AppOpenerTool Calculator

# Open by bundle ID
swift run AppOpenerTool com.apple.Terminal

# Open by path
swift run AppOpenerTool /System/Applications/Utilities/Terminal.app

# Example output (stdout)
# 54321
```
### TraversalTool
- Purpose: Traverses the accessibility tree of a running application (specified by PID) and outputs a JSON representation of the UI elements to `stdout`.
- Usage: `TraversalTool [--visible-only] <PID>`
- Options:
  - `--visible-only`: Only include elements that have a position and size (are geometrically visible).
- Examples:

```bash
# Get only visible elements for the Messages app
swift run TraversalTool --visible-only $(swift run AppOpenerTool Messages)
```
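The JSON on `stdout` can be piped into standard tools for post-processing. A minimal sketch, assuming the output is an object with a top-level `elements` array whose entries carry fields such as `role` and `text` (these field names are assumptions for illustration; inspect your own traversal output first):

```shell
# Sample payload shaped like an assumed TraversalTool output (field names are illustrative)
json='{"elements":[{"role":"AXButton","text":"1"},{"role":"AXButton","text":"2"},{"role":"AXStaticText","text":"0"}]}'

# Count elements and list button titles with python3 (jq works just as well)
echo "$json" | python3 -c '
import json, sys
data = json.load(sys.stdin)
print(len(data["elements"]))        # total element count
for e in data["elements"]:
    if e["role"] == "AXButton":     # keep only buttons
        print(e["text"])
'
```

In practice you would replace the sample `json` with the real output, e.g. `json=$(swift run TraversalTool --visible-only <PID>)`.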
### HighlightTraversalTool
- Purpose: Traverses the accessibility tree of a running application (specified by PID) and draws temporary red boxes around all visible UI elements. Also outputs traversal data (JSON) to `stdout`. Useful for debugging accessibility structures.
- Usage: `HighlightTraversalTool <PID> [--duration <seconds>]`
- Options:
  - `--duration <seconds>`: Specifies how long the highlights remain visible (default: 3.0 seconds).
- Examples:

```bash
# Combine with AppOpenerTool to open Messages and highlight it
swift run HighlightTraversalTool $(swift run AppOpenerTool Messages) --duration 5
```

Note: This tool needs to keep running for the specified duration to manage the highlights.
### InputControllerTool
- Purpose: Simulates keyboard and mouse input events without visual feedback.
- Usage: See `swift run InputControllerTool --help` (or just run without arguments) for actions.
- Examples:

```bash
# Press the Enter key
swift run InputControllerTool keypress enter

# Simulate Cmd+C (Copy)
swift run InputControllerTool keypress cmd+c

# Simulate Shift+Tab
swift run InputControllerTool keypress shift+tab

# Left click at screen coordinates (100, 250)
swift run InputControllerTool click 100 250

# Double click at screen coordinates (150, 300)
swift run InputControllerTool doubleclick 150 300

# Right click at screen coordinates (200, 350)
swift run InputControllerTool rightclick 200 350

# Move mouse cursor to (500, 500)
swift run InputControllerTool mousemove 500 500

# Type the text "Hello World!"
swift run InputControllerTool writetext "Hello World!"
```
### VisualInputTool
- Purpose: Simulates keyboard and mouse input events with visual feedback (currently a pulsing green circle for mouse actions).
- Usage: Similar to `InputControllerTool`, but adds a `--duration` option for the visual effect. See `swift run VisualInputTool --help`.
- Options:
  - `--duration <seconds>`: How long the visual feedback effect lasts (default: 0.5 seconds).
- Examples:

```bash
# Left click at (100, 250) with default 0.5s feedback
swift run VisualInputTool click 100 250

# Right click at (800, 400) with 2 second feedback
swift run VisualInputTool rightclick 800 400 --duration 2.0

# Move mouse to (500, 500) with 1 second feedback
swift run VisualInputTool mousemove 500 500 --duration 1.0

# Keypress and writetext (currently NO visualization implemented)
swift run VisualInputTool keypress cmd+c
swift run VisualInputTool writetext "Testing"
```

Note: This tool needs to keep running for the specified duration to display the visual feedback.
## Running Tests
To run only specific tests or test classes, use the `--filter` option. To run a single test method, provide its full identifier, `TestClassName/testMethodName`:
```bash
# Run all tests
swift test

# Example: Run only the multiply test in CombinedActionsDiffTests
swift test --filter CombinedActionsDiffTests/testCalculatorMultiplyWithActionAndTraversalHighlight

# Example: Run all tests in CombinedActionsFocusVisualizationTests
swift test --filter CombinedActionsFocusVisualizationTests
```
## Using the Library
You can also use MacosUseSDK as a dependency in your own Swift projects. Add it to your `Package.swift` dependencies:
```swift
dependencies: [
    .package(url: "/* path or URL to your MacosUseSDK repo */", from: "1.0.0"),
]
```
And add MacosUseSDK to your target's dependencies:
```swift
.target(
    name: "YourApp",
    dependencies: ["MacosUseSDK"]
),
```
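Putting the two snippets together, a complete minimal `Package.swift` might look like the sketch below. The target name, platform version, and package URL placeholder are assumptions, not confirmed values:

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "YourApp",
    // The SDK drives macOS accessibility APIs; the exact minimum version is an assumption.
    platforms: [.macOS(.v13)],
    dependencies: [
        // Replace with the actual path or URL to your MacosUseSDK repo
        .package(url: "/* path or URL to your MacosUseSDK repo */", from: "1.0.0"),
    ],
    targets: [
        .executableTarget(
            name: "YourApp",
            dependencies: ["MacosUseSDK"]
        ),
    ]
)
```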
Then import and use the public functions:
```swift
import MacosUseSDK
import Foundation // For Dispatch etc.

// Example: Get elements from the Calculator app
Task {
    do {
        // Find the Calculator PID (replace with actual logic or use AppOpenerTool output)
        // let calcPID: Int32 = ...
        // let response = try MacosUseSDK.traverseAccessibilityTree(pid: calcPID, onlyVisibleElements: true)
        // print("Found \(response.elements.count) visible elements.")

        // Example: Click at a point
        let point = CGPoint(x: 100, y: 200)
        try MacosUseSDK.clickMouse(at: point)

        // Example: Click with visual feedback (needs main thread for UI)
        DispatchQueue.main.async {
            do {
                try MacosUseSDK.clickMouseAndVisualize(at: point, duration: 1.0)
            } catch {
                print("Visualization error: \(error)")
            }
        }
    } catch {
        print("MacosUseSDK Error: \(error)")
    }
}

// Remember to keep the run loop active if using async UI functions like highlightVisibleElements or *AndVisualize
// RunLoop.main.run() // Or use within an @main Application structure
```
## License
This project is licensed under the MIT License - see the LICENSE file for details.