Arrowsqlbcpy

Python library that uses Apache Arrow and .Net SqlBulkCopy to load data into SQL Server

Generate Convert Improve

Install / Use

/learn @RusselWebber/Arrowsqlbcpy

About this skill

Quality Score

0/100

README

GitHub Latest Tag GitHub Actions Workflow Status PyPI Python Versions

arrowsqlbcpy

A tiny library that uses .Net SqlBulkCopy to enable fast data loading to Microsoft SQL Server. Apache Arrow is used to serialise data between Python and the native DLL. .Net Native Library AOT compilation is used to generate the native DLL.

This library is much faster than any other Python solution, including bcpandas, pyodbc and pymssql. See the benchmark results below.

Performance plot

Installation

Binary wheels are available from PyPi and can be installed using your preferred package manager:

pip install arrowsqlbcpy

uv add arrowsqlbcpy

Usage

Connection strings for .Net are documented here


import pandas as pd
from arrowsqlbcpy import bulkcopy_from_pandas

# Create a connection string
cn = r"Server=myServerAddress;Database=myDataBase;Trusted_Connection=True;"
# The table to load into must exist and have the same column names and types as the pandas df
tablename = "test"

df = pd.DataFrame({"a":[1]*10000, "b":[2]*10000, "c":[3]*10000})

bulkcopy_from_pandas(df, cn, tablename)

When testing it can be useful to have pandas create the table for you, see tests/test_load.py for an example.

Requirements

Wheels are available for the latest versions of Windows 64 bit, MacOS ARM 64bit and Ubuntu 64 bit.

Wheels are available for Python 3.9-3.13.

Linux support

The Ubuntu wheels may work on other Linux distros. Building C# native libaries and then packaging appropriately for multiple Linux distros is not straightforward. The simplest solution for most Linux distros is to simply pull the source from Github and build locally. These are the high-level steps:

Install .net https://learn.microsoft.com/en-us/dotnet/core/install/linux
Clone the source

git clone https://github.com/RusselWebber/arrowsqlbcpy
Install uv https://docs.astral.sh/uv/getting-started/installation/
Build the wheel locally

uv build --wheel
Install the wheel

pip install dist/wheel_file.whl

Benchmarks

The benchmarks were run using the richbench package. Tests were run repeatedly to get stable benchmarks.

richbench ./benchmarks

The benchmarks load a 3m row parquet file of New York taxi data. Times are recorded for loading 1000 rows, 10 000 rows, 100 000 rows, 1 000 000 rows and finally all 3 000 000 rows.

The benchmarks have a baseline of using pandas to_sql() and SQLAlchemy with pyodbc and pymssql. This is a common solution for loading pandas dataframes into SQL Server. A batch size of 10 000 rows was used in the benchmarks.

The benchmarks show the time taken to load using various alternative strategies:

| Label | Description | | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | fast_executemany=True | Use pandas to_sql(), SQLAlchemy, pyodbc, pymssql with the fast_executemany=True option as discussed here | | bcpandas | Use the bcpandas package to load the dataframes. The package writes temp files and spawns bcp processes to load them | | arrowsqlbcpy | This package using .Net SqlBulkCopy |

The richbench tables show the min, max and mean time in seconds for the baseline in the left three columns; then the min, max, mean time in seconds for the alternative strategy.

For example this row:

| Benchmark | Min | Max | Mean | Min (+) | Max (+) | Mean (+) | | ---------------------------------- | --- | --- | ---- | -------- | -------- | -------- | | 1 000 rows - fast_executemany=True | 1.0 | 1.0 | 1.0 | 0.5 (2x) | 0.5 (2x) | 0.5 (2x) |

should be interpreted as: the strategy of setting fast_executemany=True resulted in a 2x speedup over the baseline when loading 1000 rows, so fast_executemany=True reduced the average time in seconds to load 1000 rows from 1.0 to 0.5, a 2x speedup.

Windows 11 (local db)

Summary results

| | 1000 | 10000 | 10000 | 1000000 | 3000000 | | --------------------- | ---------------- | ---------------- | ---------------- | ---------------- | ----------------- | | df.to_sql() | 0.055 | 0.495 | 4.601 | 46.648 | 198.57 | | arrowsqlbcpy | 0.106 (-1.9x) | 0.101 (4.9x) | 0.933 (4.9x) | 8.864 (5.3x) | 26.048 (7.6x) | | bcpandas | 0.156 (-3.0x) | 0.336 (1.5x) | 2.567 (1.8x) | 24.627 (1.9x) | 72.353 (2.7x) | | fast_executemany=True | 0.035 (2.4x) | 0.235 (2.3x) | 2.246 (2.3x) | 22.044 (2.1x) | 65.344 (3.0x) |

Detailed richbench results

| Benchmark | Min | Max | Mean | Min (+) | Max (+) | Mean (+) | | -------------------------------------- | ------- | ------- | ------- | ------------- | ------------- | ------------- | | 1 000 - arrowsqlbcp | 0.053 | 0.056 | 0.055 | 0.015 (3.6x) | 0.198 (-3.5x) | 0.106 (-1.9x) | | 10 000 rows - arrowsqlbcp | 0.489 | 0.502 | 0.495 | 0.099 (4.9x) | 0.103 (4.9x) | 0.101 (4.9x) | | 100 000 rows - arrowsqlbcp | 4.587 | 4.616 | 4.601 | 0.922 (5.0x) | 0.944 (4.9x) | 0.933 (4.9x) | | 1 000 000 rows - arrowsqlbcp | 46.558 | 46.738 | 46.648 | 8.842 (5.3x) | 8.886 (5.3x) | 8.864 (5.3x) | | 3 000 000 rows - arrowsqlbcp | 198.464 | 198.676 | 198.570 | 26.016 (7.6x) | 26.079 (7.6x) | 26.048 (7.6x) | | 1 000 - bcpandas | 0.051 | 0.052 | 0.052 | 0.121 (-2.4x) | 0.190 (-3.6x) | 0.156 (-3.0x) | | 10 000 rows - bcpandas | 0.499 | 0.500 | 0.500 | 0.333 (1.5x) | 0.339 (1.5x) | 0.336 (1.5x) | | 100 000 rows - bcpandas | 4.543 | 4.547 | 4.545 | 2.565 (1.8x) | 2.570 (1.8x) | 2.567 (1.8x) | | 1 000 000 rows - bcpandas | 45.298 | 46.443 | 45.871 | 24.581 (1.8x) | 24.674 (1.9x) | 24.627 (1.9x) | | 3 000 000 rows - bcpandas | 197.292 | 197.699 | 197.496 | 72.301 (2.7x) | 72.405 (2.7x) | 72.353 (2.7x) | | 1 000 - fast_executemany=True | 0.052 | 0.116 | 0.084 | 0.030 (1.7x) | 0.041 (2.9x) | 0.035 (2.4x) | | 10 000 rows - fast_executemany=True | 0.513 | 0.550 | 0.531 | 0.233 (2.2x) | 0.237 (2.3x) | 0.235 (2.3x) | | 100 000 rows - fast_executemany=True | 5.018 | 5.374 | 5.196 | 2.239 (2.2x) | 2.253 (2.4x) | 2.246 (2.3x) | | 1 000 000 rows - fast_executemany=True | 45.470 | 45.582 | 45.526 | 22.036 (2.1x) | 22.051 (2.1x) | 22.044 (2.1x) | | 3 000 000 rows - fast_executemany=True | 194.152 | 194.523 | 194.337 | 65.153 (3.0x) | 65.534 (3.0x) | 65.344 (3.0x) |

Ubuntu (WSL2) (local db in docker container)

Summary results

| | 1000 | 10000 | 10000 | 1000000 | 3000000 | | --------------------- | ---------------- | ---------------- | ---------------- | ----------------- | ----------------- | | df.to_sql() | 0.070 | 0.506 | 5.074 | 50.089 | 208.811 | | arrowsqlbcpy | 0.154 (-2.2x) | 0.120 (4.2x) | 1.070 (4.7x) | 10.572 (4.7x) | 30.673 (6.8x) | | bcpandas | 0.158 (-2.4x) | 0.438 (1.2x) | 3.383 (1.5x) | 32.774 (1.5x) | 95.200 (2.2x) | | fast_executemany=True | 0.059 (1.6x) | 0.323 (1.7x) | 3.039 (1.6x) | 29.810 (1.7x) | 87.419 (2.4x) |

Detailed richbench results

Related Skills

feishu-drive

339.3k

things-mac

339.3k

Manage Things 3 via the `things` CLI on macOS (add/update projects+todos via URL scheme; read/search/list from the local Things database)

clawhub

339.3k

Use the ClawHub CLI to search, install, update, and publish agent skills from clawhub.com

yu-ai-agent

2.0k

编程导航 2025 年 AI 开发实战新项目，基于 Spring Boot 3 + Java 21 + Spring AI 构建 AI 恋爱大师应用和 ReAct 模式自主规划智能体YuManus，覆盖 AI 大模型接入、Spring AI 核心特性、Prompt 工程和优化、RAG 检索增强、向量数据库、Tool Calling 工具调用、MCP 模型上下文协议、AI Agent 开发（Manas Java 实现）、Cursor AI 工具等核心知识。用一套教程将程序员必知必会的 AI 技术一网打尽，帮你成为 AI 时代企业的香饽饽，给你的简历和求职大幅增加竞争力。

RusselWebber

View profile

View on GitHub

GitHub Stars6

CategoryData

Updated1mo ago

Forks0

RusselWebber/arrowsqlbcpy

Languages

Python

Security Score

85/100

Audited on Feb 4, 2026

No findings