SqlBulkHelpers & MaterializedData

A library for efficient and high performance bulk processing of data with SQL Server in .NET. It greatly simplifies the ordinarily very complex process of bulk loading data into SQL Server from C# for high performance inserts/updates, and for implementing materialized data patterns with SQL Server from .NET.

This package includes both SqlBulkHelpers and SqlBulkHelpers.MaterializedData and can be used in conjunction with other popular SQL Server ORMs such as Dapper, Linq2Sql, RepoDB, etc.

  • SqlBulkHelpers allows the loading of thousands (or tens of thousands) of records in seconds.
  • The Materialized Data pattern enables easy loading of offline staging tables with data and then switching them out to replace/publish to Live tables extremely efficiently (milliseconds) so that the Live tables are not blocked during the background data loading process.

Give Star 🌟

If you like this project and/or use it, then please give it a Star 🌟 (c'mon, it's free, and it'll help others find the project)!

Buy me a Coffee ☕

I'm happy to share with the community, but if you find this useful (e.g. for professional use), and are so inclined, then I do love-me-some-coffee!

<a href="https://www.buymeacoffee.com/cajuncoding" target="_blank"> <img src="https://cdn.buymeacoffee.com/buttons/default-orange.png" alt="Buy Me A Coffee" height="41" width="174"> </a>

SqlBulkHelpers

The SqlBulkHelpers component of the library provides the capability to insert and update data with fantastic performance, operating on thousands or tens of thousands of records per second. This library leverages the power of the C# SqlBulkCopy classes, while augmenting and automating the orchestration, model mapping, etc. with the following key benefits:

  • Provides a greatly simplified facade for interacting with the SqlBulkCopy API, now exposed as simple extension methods of the SqlConnection/SqlTransaction classes.
    • It's obviously easy to bulk query data from the DB into model/objects via optimized queries (e.g. Dapper or RepoDB), but inserting and updating is a whole different story.
    • The Performance improvements are DRAMATIC!
  • Provides enhanced support for ORM (e.g. Dapper, LINQ, etc.) based inserts/updates with the SqlBulkCopy API by automatically mapping to/from your data model.
    • Includes support for annotation/attribute mappings from Dapper, LINQ, RepoDB, or the SqlBulkColumn & SqlBulkTable attributes provided by the library.
  • Provides support for Database Tables that utilize an Identity Id column (often used as the Primary Key).
    • SQL Server Identity column values are populated on the server and retrieving the values has always been complex, but is now trivialized.
    • The library dynamically retrieves the new Identity column value and populates them back into the data Models/Objects provided so your insert will result in your models automagically having the Identity value from the SQL Server insert populated!
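The annotation-based mapping described above can be sketched roughly as follows. Note this is an illustrative sketch, not the library's actual implementation: the SqlBulkTable and SqlBulkColumn attribute names come from this README, but their signatures are assumed here, and stand-in attribute definitions are included so the snippet compiles on its own (in real use the attributes ship with the SqlBulkHelpers package).

```csharp
using System;
using System.Reflection;

// Stand-in attribute definitions (assumed signatures) so this sketch is
// self-contained; the real attributes come from the SqlBulkHelpers package.
[AttributeUsage(AttributeTargets.Class)]
public class SqlBulkTableAttribute : Attribute
{
    public string TableName { get; }
    public SqlBulkTableAttribute(string tableName) => TableName = tableName;
}

[AttributeUsage(AttributeTargets.Property)]
public class SqlBulkColumnAttribute : Attribute
{
    public string ColumnName { get; }
    public SqlBulkColumnAttribute(string columnName) => ColumnName = columnName;
}

// A hypothetical model mapped to a table whose column names differ from the
// C# property names; the Identity column (Id) is populated by SQL Server.
[SqlBulkTable("TestData")]
public class TestDataModel
{
    public int Id { get; set; }

    [SqlBulkColumn("item_key")]
    public string Key { get; set; } = string.Empty;

    [SqlBulkColumn("item_value")]
    public string Value { get; set; } = string.Empty;
}

public static class MappingDemo
{
    // Resolves the target table name from the class attribute,
    // falling back to the class name when no attribute is present.
    public static string ResolveTableName<T>() =>
        typeof(T).GetCustomAttribute<SqlBulkTableAttribute>()?.TableName
            ?? typeof(T).Name;

    // Resolves a column name from the property attribute,
    // falling back to the property name when no attribute is present.
    public static string ResolveColumnName(PropertyInfo prop) =>
        prop.GetCustomAttribute<SqlBulkColumnAttribute>()?.ColumnName
            ?? prop.Name;
}
```

A mapper resolving names this way would map the Key property to the item_key column while leaving unannotated properties like Id mapped by name.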

The SqlBulkCopy API, provided by Microsoft, offers fantastic performance benefits, but retrieving the Identity values that are auto-generated from the server is not a default capability. And as it turns out, this is not a trivial task despite the significant benefits it provides for developers.

However, a critical reason for developing this library was to provide support for tables with Identity column values. There is a lot of good information on Stack Overflow and other web resources that provide various levels of help for this kind of functionality, but there are few (if any) fully developed solutions that really help others do this efficiently end-to-end. To my knowledge RepoDB is the only lightweight/low-level ORM that provides this, so if you are using any other ORM, such as Dapper, then this library can be used in combination!

Example Usage for Bulk Insert or Update:

public class TestDataService
{
    private readonly ISqlBulkHelpersConnectionProvider _sqlConnectionProvider;

    public TestDataService(ISqlBulkHelpersConnectionProvider sqlConnectionProvider)
    {
        _sqlConnectionProvider = sqlConnectionProvider 
            ?? throw new ArgumentNullException(nameof(sqlConnectionProvider));
    }

    public async Task<IList<TestDataModel>> BulkInsertOrUpdateAsync(IEnumerable<TestDataModel> testData)
    {
        await using (var sqlConn = await _sqlConnectionProvider.NewConnectionAsync())
        await using (var sqlTransaction = (SqlTransaction)await sqlConn.BeginTransactionAsync())
        {
            var bulkResults = await sqlTransaction.BulkInsertOrUpdateAsync(testData);
            await sqlTransaction.CommitAsync();

            //Return the data that will have updated Identity values (if used).
            return bulkResults;
        }
    }
}

SqlBulkHelpers.MaterializedData

The Materialized View (or Materialized Data) pattern, within the context of relational databases such as SQL Server, is often implemented as a read-only replica of remote data in local tables. This can enable massive performance improvements via highly efficient SQL queries, while at the same time making your application more resilient via Eventual Consistency.

This is not a data sync per se, because it's only a one-way replication and conceptually the data should be treated as read-only. Locally the data may be augmented and extended; however, any new data is owned by your system, while the original materialized data is conceptually owned by the external system.

The primary benefit of this pattern is to improve performance and resiliency. For example, if the data is not local then the power of SQL queries is dramatically hindered because the database is unable to join across the data. This means that all data correlation/joining/filtering/etc. must occur within the application layer as in-memory processing of data from disparate sources. Therefore, regardless of whether the data is retrieved via an API (as it should be) or from multiple direct database connections (stop doing this, please!), the responsibility to correlate/join/filter/etc. lies in the application layer, which is by definition less efficient and usually carries significant negative implications, as follows.

For most business applications this introduces several major problems:

  • Significant performance impacts due to retrieval of (potentially large) sets of data into application memory and crunching the results via code.
    • Server load is higher, server resource utilization is higher, etc.
    • Code complexity, and ongoing maintenance, is usually much higher than efficient SQL queries would be.
  • There is additional developer impact as much more effort is required to implement data processing in the application layer than can be done with SQL queries.
  • There is a runtime dependency on the external data source.
    • Whether it's an API or a separate database connection, the data must be retrieved to be processed, and if there are any issues with the external data source(s) (e.g. connectivity, errors, etc.) then your application will fail.

To be fair, this isn't a new problem; however, the Materialized Data/View pattern is a robust & reliable solution because it provides a local replica of the data that can be used in optimized, highly efficient SQL queries. And that data is refreshed periodically, so if there are errors in the refresh process your live data remains unaffected, making your application significantly more resilient. This process of periodic (or event-based) updating/refreshing of the materialized data creates an Eventually Consistent architectural model for this data and helps you attain the benefits therein.

The simple & naive approach...

A simplistic approach would be to have a set of tables in your database and a .NET application that runs in the background on a schedule or event-based trigger and refreshes the data in those tables. You would then have the local data in your database to join against, filter, etc., and your performance would be greatly improved! But as your data grows, there are very notable drawbacks to this simple approach of updating the live tables directly.

For small data sets of a couple dozen to a couple hundred records this will likely work without much issue. But if the data set to be materialized grows larger, into the thousands, tens of thousands, or millions of records, then you will quickly encounter several problems:

  • Clearing the tables via DELETE FROM [TableName] is slow as all records are processed individually and the transaction log will grow commensurately -- this is SLOW!!!
    • The alternative is to use TRUNCATE TABLE [TableName] but this will not work if you have data integrity constraints such as Foreign Keys on the table.
  • At the point of initially clearing the table in your Transaction (you absolutely should be using a Transaction!) the table is Locked and any/all queries that need this data will be blocked -- this makes them SLOW!!!
  • You're not done yet; you still need to re-populate the table.
  • Even if you do a Bulk Update -- instead of a delete/truncate and re-load -- the updates will still lock the table, blocking other queries, bogging down the transaction log, and resulting in a process that is very SLOW!!!
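The naive refresh described above boils down to SQL along these lines; here it is composed as a string in C# purely for illustration (the table and staging names are hypothetical placeholders, and nothing here touches a database), with comments mapping back to the problems listed:

```csharp
using System;

// Illustrative only: composes the SQL a naive "clear and re-load" refresh
// would execute. Table and staging names are hypothetical placeholders.
static string BuildNaiveRefreshSql(string tableName) => $@"
BEGIN TRANSACTION;
-- Clearing via DELETE logs every row individually, so the transaction
-- log grows with the table size; TRUNCATE TABLE would be faster but
-- fails when Foreign Key constraints reference the table.
DELETE FROM [{tableName}];
-- From this point until COMMIT the live table is locked, so every
-- query that needs this data is blocked while the re-load runs.
INSERT INTO [{tableName}] (/* column list */)
    SELECT /* columns */ FROM [Staging_{tableName}];
COMMIT TRANSACTION;";

Console.WriteLine(BuildNaiveRefreshSql("MaterializedProducts"));
```

The longer the DELETE and re-load take, the longer the live table stays locked, which is exactly the blocking problem the offline staging approach below is designed to avoid.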

The robust, performant, & resilient approach...

Soo, what's the solution?

  1. Don't load directly into your Live table(s) at all.
  2. Instead load into an offline staging table(s)...
    • Now it doesn't matter if the staging tables are blocked at all as they are offline.
