Nedmalloc

An EXTREMELY FAST, portable, thread-caching malloc implementation written in C and based on dlmalloc, designed for multiple threads without lock contention. Optimised for x86 and x64. Compatible with C++. Can patch itself into existing binaries on Windows.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta content="text/html; charset=utf-8" http-equiv="Content-Type" /> <title>nedalloc Readme</title> <style type="text/css"> <!-- body { text-align: justify; } h1, h2, h3, h4, h5, h6 { margin-bottom: -0.5em; } h1 { text-align: center; } h2 { text-decoration: underline; margin-bottom: -0.25em; } p { margin-top: 0.5em; margin-bottom: 0.5em; } ul li, ol li { margin-top: 0.2em; margin-bottom: 0.2em; } dl { margin-left: 2em; } dl dt { font-weight: bold; } dt + dd { margin-bottom: 1em; } .gitcommit { font-family: "Courier New", Courier, monospace; font-size: smaller; } --> </style> </head> <body> <div style="text-align: center"> <h1 style="text-decoration: underline">nedalloc v1.10 beta 4 (?)</h1> <h2 style="text-decoration: none;">by Niall Douglas</h2> <p>Web site: <a href="http://www.nedprod.com/programs/portable/nedmalloc/">http://www.nedprod.com/programs/portable/nedmalloc/</a></p> <p>Trunk build status: <a href="https://travis-ci.org/ned14/nedmalloc"><img style="vertical-align:middle;border:none" src="https://travis-ci.org/ned14/nedmalloc.png?branch=master"/></a></p> <hr /></div> <p>Enclosed is nedalloc, an alternative malloc implementation for multiple threads without lock contention based on <a href="http://g.oswego.edu/" target="_blank"> dlmalloc</a> v2.8.4 and a specialised user mode page allocator (Windows Vista or later only). 
It has the following features:</p> <ol> <li>A per-thread small block cache for maximum CPU scalability.</li> <li>A per-thread arena to minimise lock contention.</li> <li>The ability to patch Windows binaries to replace the C memory allocation API (malloc(), realloc(), free() et al.), so that simply inserting nedmalloc.dll into a process yields performance improvements without recompilation.</li> <li>On POSIX, it knows how to talk to valgrind so you can track memory corruption and/or memory leaks.</li> <li>A unique user mode page allocator implementation which delivers O(1) scaling for blocks of any size, including a very fast O(1) realloc(). It improves medium-sized block (~1MB) allocation speeds by about 25 times on current hardware. It requires Windows Vista or later, Administrator privileges, and either UAC disabled or a UAC prompt at the start of each program run.</li> <li>A malloc v2 API which enables considerable improvements in efficiency by allowing client code to better inform the allocator about what (not) to do.</li> <li>An enhanced C++ STL allocator implementation to enable super-fast std::vector&lt;&gt; <strong>[unfinished]</strong></li> </ol> <p>It is licensed under the <a href="http://www.boost.org/LICENSE_1_0.txt" target="_blank">Boost Software License</a>, which basically means you can do anything you like with it. This does not apply to the malloc.c.h file, which remains the copyright of others. Commercial support is available from <a href="http://www.nedproductions.biz/" target="_blank">ned Productions Limited</a>.</p> <p>It has been tested on win32 (x86), win64 (x64), Linux (x64), FreeBSD (x64) and Apple Mac OS X (x86). It works very well on all of these and is very significantly faster than the system allocator on Windows XP and on FreeBSD before v7. 
If you are using Apple Mac OS X 10.6 or later, or Windows 7 or later, then you probably won&#39;t see much improvement without modifying your source to use the v2 malloc API (and kudos to Apple and Microsoft for adopting excellent allocators).</p> <p>The user mode page allocator returns jaw-dropping real-world performance improvements but requires running the process as the superuser. Without superuser privileges, nedalloc still offers sizeable gains on all older operating systems and, through the v2 malloc API, modest gains on very recent operating systems, especially in these situations:</p> <ol> <li>If you are repeatedly extending large vector arrays, you will see a LARGE improvement if you use the address space reservation features.</li> <li>If you do a lot of work with 16-byte aligned vectors, e.g. SSE or AVX vector arrays, you will find the v2 malloc API a godsend.</li> </ol> <p style="text-decoration: underline"><strong>Table of Contents: </strong></p> <ol style="list-style-type: upper-alpha; position: relative; margin-top: -0.5em;"> <li><a href="#touse">How to use</a><ul style="list-style-type: none; margin-left: 0; padding-left: 0"> <li>A1. <a href="#CPPAPI">The C++ API</a></li> <li>A2. <a href="#v2mallocAPI">The v2 malloc C API</a></li> </ul> </li> <li><a href="#notes">Notes</a><ul style="list-style-type: none; margin-left: 0; padding-left: 0"> <li>B1. <a href="#memorybloat">Memory Bloating</a></li> <li>B2. <a href="#memoryleaks">Memory Leakage</a></li> <li>B3. <a href="#threadcache">The Threadcache</a></li> <li>B4. <a href="#largepages">Large Page support</a></li> <li>B5. <a href="#logger">Memory operation logging</a></li> <li>B6. <a href="#windowsonly">Windows-only features</a></li> </ul> </li> <li><a href="#speedcomparisons">Speed Comparisons</a></li> <li><a href="#troubleshooting">Troubleshooting</a></li> <li><a href="#changelog">Changelog</a></li> </ol> <h2><a name="touse">A. 
To use:</a></h2> <p>The quickest way is to drop nedmalloc.h, nedmalloc.c and malloc.c.h into your project. Call nedmalloc(), nedcalloc(), nedrealloc() and nedfree() instead of your normal allocator, or nedpmalloc(), nedpcalloc(), nedprealloc() and nedpfree() if you want to segment your memory usage into pools. On thread exit, make sure that you call neddisablethreadcache() for every pool the thread used, and don&#39;t forget neddisablethreadcache(0) for the system pool if necessary. Run and enjoy!</p> <p>To test, compile <a href="test.c">test.c</a> (C) and <a href="test.cpp">test.cpp</a> (C++). Both run a comparison between your system allocator and nedalloc and tell you how much faster nedalloc is. They also serve as usage examples.</p> <p>If you&#39;d like nedalloc as a Windows DLL or POSIX ELF shared object, the easiest thing to do is to use <a href="http://www.scons.org/" target="_blank">scons</a>, which comes with a myriad of build options (run scons -h to list them). <b>If you want to build some MSVC project files for use with Microsoft Visual Studio</b>, then (i) install <a href="http://www.python.org/" target="_blank">python</a> (ii) install <a href="http://www.scons.org/" target="_blank">scons</a> (iii) open a Visual Studio Command Prompt for the Visual Studio you wish to use via Start Menu =&gt; Programs =&gt; Microsoft Visual Studio XXXX =&gt; Visual Studio Tools =&gt; Visual Studio XXXX Command Prompt (iv) change directory to the nedmalloc directory (e.g. by dragging in its folder) (v) type &quot;!MakeMSVCProjs&quot; and hit Return. Note that Visual Studio 2008 and later support requires scons v2.1 or later.</p> <p>nedalloc comes with two new memory allocator APIs: one for C++ and one for C. <strong>Full documentation</strong> for all of nedalloc&#39;s APIs and features is provided in the enclosed <a href="nedalloc.chm">nedalloc.chm</a>, which is in Microsoft HTML Help format (Linux and Apple Mac OS X will happily read this format too). 
If you don&#39;t want to use the CHM documentation, <a href="nedmalloc.h">nedmalloc.h</a> is extensively commented with <a href="http://www.doxygen.org/" target="_blank"> doxygen markup</a>.</p> <h3><a name="CPPAPI">A1: The C++ API:</a></h3> <p>For the v1.10 release, which was generously sponsored by <a href="http://www.ara.com/" target="_blank">Applied Research Associates (USA)</a>, a C++ metaprogrammed STL allocator was designed which makes use of advanced nedalloc features to remedy many of the long-standing problems and inefficiencies caused by C++&#39;s traditional over-fondness for copying things. While its implementation is complex, usage is extremely easy: simply supply nedallocator&lt;&gt; as the custom allocator to STL container classes.</p> <p>As nedmalloc can do even better for vector extension, nedmalloc.h also contains a nedvector&lt;&gt; implementation, which is the standard STL vector&lt;&gt; implementation except that it makes use of the non-relocating facilities of realloc2() (see below). This means nedvector&lt;&gt; need not overallocate memory (most STL vector&lt;&gt; implementations overallocate by 50% or more), which saves a lot of memory and also <strong>completely avoids the array copy construction</strong> that makes std::vector&lt;&gt;::resize() so very, very slow.</p> <p>Even without nedalloc&#39;s major speed improvements as a simple C-style allocator, the improvements to the C++ memory infrastructure alone can generate huge performance gains.</p> <h3><a name="v2mallocAPI">A2: The v2 malloc C API:</a></h3> <p><strong>[Note: This API will be completely replaced in v1.2]</strong></p> <p>For the v1.10 release, which was generously sponsored by <a href="http://www.ara.com/" target="_blank">Applied Research Associates (USA)</a>, a new general-purpose allocator API was designed which is intended to remedy many of the long-standing problems and inefficiencies introduced by the ISO C allocator API. 
Internally nedalloc&#39;s implementations of nedmalloc(), nedcalloc(), nedmemalign() and nedrealloc() all call into this API:</p> <ul> <li><code>void* malloc2(size_t bytes, size_t alignment, unsigned flags)</code></li> <li><code>void* realloc2(void* mem, size_t bytes, size_t alignment, unsigned flags)</code></li> </ul> <p>If nedmalloc.h is being included by C++ code, the alignment and flags parameters default to zero which makes the new API identical to the old API (roll on the introduction of defaul
