EMS
Extended Memory Semantics - Persistent shared object memory and parallelism for Node.js and Python
Install / Use
OSX | Linux | Node.js 4.1-14.x | Python 2/3
API Documentation | EMS Website
Extended Memory Semantics (EMS)
EMS makes persistent shared-memory parallelism possible between Node.js, Python, and C/C++.
Extended Memory Semantics (EMS) unifies synchronization and storage primitives to address several challenges of parallel programming:
- Allows any number or kind of processes to share objects
- Manages synchronization and object coherency
- Implements persistence to non-volatile memory and secondary storage
- Provides dynamic load-balancing between processes
- May substitute or complement other forms of parallelism
Examples: Parallel web servers, word counting
Table of Contents
- Parallel Execution Models Supported Fork Join, Bulk Synchronous Parallel, User defined
- Atomic Operations Atomic Read-Modify-Write operations
- Examples Parallel web servers, word counting
- Benchmarks Bandwidth, Transaction processing
- Synchronization as a Property of the Data, Not a Duty for Tasks Full/Empty tags
- Installation Downloading from Git or NPM
- Roadmap The Future™! It's all already happened
EMS is targeted at tasks too large for one core or one process but too small for a scalable cluster
A modern multi-core server has 16-32 cores and nearly 1TB of memory, equivalent to an entire rack of systems from a few years ago. As a consequence, jobs formerly requiring a Map-Reduce cluster can now be performed entirely in shared memory on a single server without using distributed programming.
Sharing Persistent Objects Between Python and Javascript
<img src="Docs/ems_js_py.gif" />
Inter-language example in interlanguage.{js,py}. The animated GIF demonstrates the following steps:
- Start Node.js REPL, create an EMS memory
- Store "Hello"
- Open a second session, begin the Python REPL
- Connect Python to the EMS shared memory
- Show the object created by JS is present in Python
- Modify the object, and show the modification can be seen in JS
- Exit both REPLs so no programs are running to "own" the EMS memory
- Restart Python, show the memory is still present
- Initialize a counter from Python
- Demonstrate atomic Fetch and Add in JS
- Start a loop in Python incrementing the counter
- Simultaneously print and modify the value from JS
- Try to read "empty" data from Python, the process blocks
- Write the empty memory, marking it full, Python resumes execution
Types of Concurrency
<table> <tr> <td width="50%"> EMS extends application capabilities to include transactional memory and other fine-grained synchronization capabilities. <br><br> EMS implements several different parallel execution models: <ul> <li> <B>Fork-Join Multiprocess</B>: execution begins with a single process that creates new processes when needed; those processes then wait for each other to complete. <li> <B>Bulk Synchronous Parallel</B>: execution begins with each process starting the program at the <code>main</code> entry point and executing all the statements. <li> <B>User Defined</B>: parallelism may include ad-hoc processes and mixed-language applications. </ul> </td> <td width="50%"> <center> <img height="350px" style="margin: 10px;" src="Docs/typesOfParallelism.svg" type="image/svg+xml" /> </center> </td> </tr> <tr> <td width="50%"> <center> <img height="350px" style="margin: 10px;" src="Docs/ParallelContextsBSP.svg" type="image/svg+xml" /> </center> </td> <td> <center> <img height="350px" style="margin: 10px;" src="Docs/ParallelContextsFJ.svg" type="image/svg+xml" /> </center> </td> </tr> </table>
Built-in Atomic Operations
EMS operations may be performed on any JSON data type, and read-modify-write operations may combine different JSON data types, just like operations on ordinary data.
Atomic read-modify-write operations are available in all concurrency modes; collective operations, however, are not available in user-defined mode.
- **Atomic Operations**: Read; write; readers-writer lock; read when full, atomically marking empty; write when empty, atomically marking full
- **Primitives**: Stacks, queues, transactions
- **Read-Modify-Write**: Fetch-and-Add, Compare-and-Swap
- **Collective Operations**: All basic OpenMP collective operations are implemented in EMS: dynamic, block, guided, and static loop scheduling, as well as barriers, and master and single execution regions
Examples and Benchmarks
API Documentation | EMS Website
Word Counting Using Atomic Operations
Map-Reduce is often demonstrated using word counting because each document can be processed in parallel, and the results of each document's dictionary reduced into a single dictionary. This EMS implementation also iterates over documents in parallel, but it maintains a single shared dictionary across processes, atomically incrementing the count of each word found. The final word counts are sorted and the most frequently appearing words are printed with their counts.
<img height="300px" src="Docs/wordcount.svg" />
The performance of this program was measured using an Amazon EC2 instance:<br>
<code>c4.8xlarge (132 ECUs, 36 vCPUs, 2.9 GHz, Intel Xeon E5-2666v3, 60 GiB memory)</code>
Scaling levels off around 16 cores despite the presence of ample work, which may be related to the use of non-dedicated hardware: half of the 36 vCPUs are presumably HyperThreads or otherwise shared resources, and AWS instances are also bandwidth-limited to EBS storage, where our Gutenberg corpus is stored.
Bandwidth Benchmarking
A benchmark similar to STREAMS gives the maximum rate at which EMS double-precision floating point operations can be performed on a <code>c4.8xlarge (132 ECUs, 36 vCPUs, 2.9 GHz, Intel Xeon E5-2666v3, 60 GiB memory)</code>.
Benchmarking of Transactions and Work Queues
Transactions and Work Queues Example
Transactional performance is measured alone, and again with a separate process appending new work to the queue as items are removed. The experiments were run using an Amazon EC2 instance:<br> <code>c4.8xlarge (132 ECUs, 36 vCPUs, 2.9 GHz, Intel Xeon E5-2666v3, 60 GiB memory)</code>
Experiment Design
Six EMS arrays are created, each holding 1,000,000 numbers. During the benchmark, 1,000,000 transactions are performed, each transaction involves 1-5 randomly selected elements of randomly selected EMS arrays. The transaction reads all the elements and performs a read-modify-write operation involving at least 80% of the elements. After all the transactions are complete, the array elements are checked to confirm all the operations have occurred.
The parallel process scheduling model used is block dynamic (the default),
where each process is responsible for successively smaller blocks
of iterations. The execution model is bulk synchronous parallel: each
process enters the program at the same main entry point
and executes all the statements in the program.
forEach loops have their normal semantics of performing all iterations in every process;
parForEach loops are distributed across processes, each process executing
only a portion of the total iteration space.
Synchronization as a Property of the Data, Not a Duty for Tasks
