Spark Cyclone
Spark Cyclone is an Apache Spark plugin that accelerates the performance of Spark by using the SX-Aurora TSUBASA "Vector Engine" (VE). The plugin enables Spark users to accelerate their existing jobs by generating optimized C++ code and executing it on the VE, with minimal or no effort.
Spark Cyclone currently offers three pathways to accelerate Spark on the VE:
- Spark SQL: The plugin leverages Spark SQL's extensibility to rewrite SQL queries on the fly and executes dynamically-generated C++ code on the VE with no user code changes necessary.
- RDD: For more direct control, the plugin's VERDD API provides Scala macros that transpile normal Scala code into C++, so that common RDD operations such as `map()` execute on the VE.
- MLlib: CycloneML is a fork of MLlib that uses Spark Cyclone to accelerate many of its ML algorithms on either the VE or the CPU.
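To illustrate the RDD pathway, the snippet below shows the kind of ordinary Scala closure that the macros are designed to transpile. Plain Scala collections are used here so the example runs without a Spark cluster or a VE; the VERDD-specific calls themselves are not shown, since their exact API is covered by the project's own guides.

```scala
// An ordinary Scala map over numeric data. Under the RDD pathway, a
// closure like `x => x * 2.0 + 1.0` is the unit that Spark Cyclone's
// macros transpile to C++ for execution on the VE.
// (Plain collections are used here so the snippet runs anywhere.)
object RddStyleMap {
  def main(args: Array[String]): Unit = {
    val data = Seq(1.0, 2.0, 3.0, 4.0)
    val scaled = data.map(x => x * 2.0 + 1.0)
    println(scaled.mkString(","))  // 3.0,5.0,7.0,9.0
  }
}
```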
Plugin Usage
Integrating the Spark Cyclone plugin into an existing Spark job is straightforward. The following is the minimum set of flags that must be added to an existing Spark job configuration:
```sh
$ $SPARK_HOME/bin/spark-submit \
    --name YourSparkJobName \
    --master yarn \
    --deploy-mode cluster \
    --num-executors=8 --executor-cores=1 --executor-memory=8G \
    --jars /path/to/spark-cyclone-sql-plugin.jar \
    --conf spark.executor.extraClassPath=/path/to/spark-cyclone-sql-plugin.jar \
    --conf spark.plugins=io.sparkcyclone.plugin.AuroraSqlPlugin \
    --conf spark.executor.resource.ve.amount=1 \
    --conf spark.resources.discoveryPlugin=io.sparkcyclone.plugin.DiscoverVectorEnginesPlugin \
    --conf spark.cyclone.kernel.directory=/path/to/kernel/directory \
    YourSparkJob.py
```

What the flags do:

- `--num-executors=8 --executor-cores=1 --executor-memory=8G`: allocate one executor per VE core.
- `--jars` and `spark.executor.extraClassPath`: add the Spark Cyclone plugin JAR to the job and to the executor classpath.
- `spark.plugins`: the plugin's main class.
- `spark.executor.resource.ve.amount`: the number of VEs to use.
- `spark.resources.discoveryPlugin`: the class used to discover VE resources.
- `spark.cyclone.kernel.directory`: a directory where the plugin builds and caches its C++ kernels.
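If the plugin is used for every job on a cluster, the same settings can instead be placed in `$SPARK_HOME/conf/spark-defaults.conf` so they do not have to be repeated on each `spark-submit`. This is standard Spark configuration behavior rather than a plugin-specific mechanism; the paths below remain placeholders, as above:

```
spark.jars                          /path/to/spark-cyclone-sql-plugin.jar
spark.executor.extraClassPath       /path/to/spark-cyclone-sql-plugin.jar
spark.plugins                       io.sparkcyclone.plugin.AuroraSqlPlugin
spark.executor.resource.ve.amount   1
spark.resources.discoveryPlugin     io.sparkcyclone.plugin.DiscoverVectorEnginesPlugin
spark.cyclone.kernel.directory      /path/to/kernel/directory
```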
Configuration
Please refer to the Plugin Configuration guide for an overview of the configuration options available to Spark Cyclone.
Plugin Development
System Setup
While parts of the codebase can be developed on a standard x86 machine running Linux or macOS, building and testing the plugin requires a system with VEs properly installed and set up; please refer to the VE Documentation for more information.
The setup and installation guides in the project documentation cover all the necessary steps. In particular, the system should have the following software ready after setup:
- VEOS, the set of daemons and commands providing operating system functionality to VE programs
- AVEO, the offloading framework for running code on the VE
- NCC, NEC's C compiler for building code that targets the VE
Development Guide
The following pages cover all aspects of Spark Cyclone development:
- Usage:
- Development:
- External Dependencies:
License
Spark Cyclone is licensed under the Apache License, Version 2.0.
For additional information, please see the LICENSE and NOTICE files.