Main Page: Difference between revisions

(38 intermediate revisions by the same user not shown)
= MemCP – A Modern In-Memory Columnar Database =


=== What is memcp? ===
'''MemCP is a high-performance, in-memory, column-oriented database designed for modern workloads.''' 
MemCP is an open-source, high-performance, in-memory columnar database that handles both OLAP and OLTP workloads. It offers an alternative to proprietary analytical databases and aims to bring the benefits of columnar storage to the open-source world.
It provides a lightweight, developer-friendly alternative to traditional relational databases such as MySQL, with a focus on speed, compression, and direct API integration.


MemCP is written in Go and is designed to be portable and extensible, so developers can embed the database into their applications. Its focus on scalability and performance also makes it a suitable choice for distributed applications.
----


=== Features ===
== Key Features ==


* '''fast:''' MemCP is built for parallel execution, with a parallelization pattern designed to keep overhead minimal.
* '''High Performance''': NUMA-aware, parallelized query execution optimized for multicore CPUs, large caches, and NVMe SSDs. Handles both OLTP and OLAP workloads efficiently. 
* '''efficient:''' On average, data takes about one fifth of the memory it needs in MySQL/MariaDB (a 5:1 compression ratio, or 80% memory savings).
* '''Columnar Storage''': Data is stored by column for improved compression, reduced memory footprint, and faster analytical queries.
* '''modern:''' MemCP is built for modern hardware: CPU caches, NUMA memory, multicore CPUs, and NVMe SSDs.
* '''In-Memory Operation''': Designed to keep data in memory, with configurable persistence backends for durability.
* '''versatile:''' Use it on big mainframes to gain analytical performance, or in embedded systems to conserve flash lifetime.
* '''Built-in APIs''': Exposes RESTful endpoints directly from the database, reducing middleware overhead.
* Columnar storage: Stores data column-wise instead of row-wise, which allows for better compression, faster query execution, and more efficient use of memory.
* '''Compression''': Multiple strategies (bit-packing, dictionary encoding, sequence compression) reduce storage by up to 80% compared to MySQL/MariaDB. 
* In-memory database: Stores all data in memory, which allows for extremely fast query execution.
* '''Simple Deployment''': Start with a single <code>docker run</code> or <code>pm2 start</code> command. Lightweight footprint (~10MB). 
* In-database REST APIs: Build REST APIs directly in the database; they are faster because no network connection or SQL layer sits between the API and the data.
* '''Extensible''': Written in Go, with pluggable storage backends and custom frontend support (SQL, RDF, REST). 
* OLAP and OLTP support: Can handle both online analytical processing (OLAP) and online transaction processing (OLTP) workloads.
* Compression: Supports several compression formats, such as bit-packing and dictionary encoding.
* Scalability: Designed to scale up on a single node with large NUMA memory.
* Adjustable persistency: Decide for each table whether to persist it, not persist it, or keep only snapshots of a period of time.
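
The feature list names sequence compression as one of the compression strategies. The following is a minimal sketch of the general idea, assuming it means storing arithmetic runs as (start, stride, count) triples; the <code>Run</code> type and the <code>CompressSeq</code> and <code>Get</code> functions are hypothetical names for illustration and are not taken from the MemCP source.

```go
package main

import "fmt"

// Run is a hypothetical descriptor for Count values:
// Start, Start+Stride, Start+2*Stride, ...
type Run struct {
	Start, Stride int64
	Count         int
}

// CompressSeq greedily splits values into arithmetic runs.
func CompressSeq(values []int64) []Run {
	var runs []Run
	for i := 0; i < len(values); {
		run := Run{Start: values[i], Count: 1}
		if i+1 < len(values) {
			// The first two values of a run fix its stride.
			run.Stride = values[i+1] - values[i]
			run.Count = 2
			for i+run.Count < len(values) &&
				values[i+run.Count]-values[i+run.Count-1] == run.Stride {
				run.Count++
			}
		}
		runs = append(runs, run)
		i += run.Count
	}
	return runs
}

// Get returns the value at row i by walking the runs.
func Get(runs []Run, i int) int64 {
	for _, r := range runs {
		if i < r.Count {
			return r.Start + int64(i)*r.Stride
		}
		i -= r.Count
	}
	panic("row index out of range")
}

func main() {
	// An auto-increment ID column followed by a few irregular values.
	runs := CompressSeq([]int64{1, 2, 3, 4, 5, 10, 20, 30})
	fmt.Println(len(runs), Get(runs, 6))
}
```

In this sketch an auto-increment column of any length collapses into a single run, which is why sequence-like columns compress so well. A real implementation would likely add an index over the runs so that lookups do not have to walk them linearly.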


<youtube>g29FR4Jwius</youtube>
----


https://www.youtube.com/watch?v=g29FR4Jwius
== Why MemCP? ==


Traditional relational databases were designed decades ago, optimized for spinning disks and single-core CPUs. 
MemCP rethinks the core design for today’s hardware and workloads:


* Real-time dashboards and analytics 
* Data-heavy SaaS platforms 
* Embedded systems with limited resources 
* High-throughput OLTP/OLAP hybrids 


----


== Quick Start ==


Clone and build MemCP from source:
* [[Replace MySQL with MemCP]]


<pre>
git clone https://github.com/launix-de/memcp
cd memcp
go get
make
pm2 start ./memcp ./data/
</pre>


Connect with MySQL tooling:


<pre>
mysql -h 127.0.0.1 -P 3307 -u root -p
Enter password: admin
</pre>


----


== MemCP vs. MySQL ==


{| class="wikitable"
! Feature
! MySQL
! MemCP
|-
| Storage Model
| Row-based
| Column-based (compressed)
|-
| Performance
| Good
| NUMA-optimized, in-memory
|-
| In-Memory Capable
| Limited
| Yes (default)
|-
| REST API Integration
| External
| Built-in
|-
| Installation Footprint
| ~150MB+
| ~10MB
|-
| Open Source
| ✅
| ✅
|}


----
 
== Architecture Overview ==
 
* '''Tables, Schemas, Columns''': Familiar SQL-style structures with a columnar physical layout. 
* '''Transaction Model''': Supports both OLTP and OLAP semantics with delta + main storage. 
* '''Persistence''': Configurable storage backends (filesystem, S3, Ceph). 
* '''Frontends''': Multiple query interfaces:
** SQL frontend (MySQL wire protocol + SQL over REST)
** RDF/graph query engine
** Custom APIs via in-database web apps
 
----
 
== Documentation ==
 
* [[What is OLTP and OLAP]] 
* [[History of the MemCP project]] 
* [[Hardware Requirements]] 
* [[Persistency and Performance Guarantees]] 
* [[Comparison: MemCP vs. MySQL]] 
* [[Install MemCP with Docker|Install with Docker]] 
* [[Compile MemCP from Source|Build from Source]] 
* [[Contributing]] 
* [[SQL over REST]] 
* [[In-Database WebApps|REST & Microservices]] 
 
----
 
 
===Navigation===
 
====Introduction====
*[[What is OLTP and OLAP]]
*[[History of the MemCP project]]
*[[Hardware Requirements]]
*[[Persistency and Performance Guarantees]]
*[[Current Status and Open Issues]]
*[[Comparison: MemCP vs. MySQL]]
 
====Getting Started====
*[[Install MemCP with Docker|With Docker]]
*[[With Singularity]]
*[[Compile MemCP from Source|Build from Source]]
*[[Contributing]]
*[[Introduction to Scheme]]
*[[Full SCM API documentation]]
 
====Administration====
 
* [[Deployment]]
* [[Migration from MySQL and PostgreSQL]]
* [[Settings]]
*[[Process Hibernation]]
*[[Performance Measurement]]
*[[MemCP Console]]
 
====Frontends====
 
=====SQL Frontend=====
*[[Supported SQL]]
*[[Advanced SQL Tutorial]]
*[[SQL over REST]]
*[[Database Tools compatibility with MemCP|Supported Tooling]]
*[[How SQL Operators are implemented on MemCP]]
*[[Add custom SQL operators to MemCP]]
 
=====RDF Frontend=====
*[[Introduction to RDF]]
*[[Advanced Graph Querying]]
*[[RDF templating and model driven development]]
 
=====Custom Frontends=====
 
*[[In-Database WebApps|In-Database WebApps and REST Services]]
*[[MemCP for Microservices]]
*[[Websockets in MemCP]]
 
==== Persistency Backends (= Storage) ====
 
* [[File System]]
* [[S3 Buckets]]
* [[Ceph/Rados]]
* [[Cluster Monitor]]
 
====Internals====
 
=====How things work in MemCP=====  


*[[Databases, Tables and Columns]]
*[[Shards, RecordIDs, Main Storage, Delta Storage]]
*[[Columnar Storage]]
*[[Transactions]]  
*[[Full SCM API documentation]]
 
===== SCM Documentation =====


* [[SCM Builtins]]
* [[Arithmetic / Logic]]
* [[Strings]]
* [[Streams]]
* [[Lists]]
* [[Associative Lists / Dictionaries]]
* [[Date]]
* [[Vectors]]
* [[Parsers]]
* [[Sync]]
* [[IO]]
* [[Storage]]
 
=====Optimizations=====
*[[In-Memory Compression, Columnar Compression Techniques]]
*[[Temporary Columns]]
*[[Data Auto Sharding and Auto Indexing]]
* [[Parallel Computing]]




[[File:Screenshot from htop.png|center|frameless|2490x2490px]]
----
 
== Further Reading ==


* [https://github.com/launix-de/memcp MemCP on GitHub]
* [https://www.vldb.org/pvldb/vol13/p2649-boncz.pdf VLDB Research Paper] 
* [https://cs.emis.de/LNI/Proceedings/Proceedings241/383.pdf LNI Proceedings Paper] 
* [https://www.dcs.bbk.ac.uk/~dell/teaching/cc/paper/sigmod10/p135-malewicz.pdf Large Graph Algorithms] 


Additional blog posts on design decisions, compression techniques, and performance optimization are available on the [https://launix.de/launix/ Launix blog].


==== Scientific ====
----


== Community ==
* [https://wwwdb.inf.tu-dresden.de/wp-content/uploads/T_2014_Master_Patrick_Damme.pdf TU Dresden Research Paper]
* [https://wwwdb.inf.tu-dresden.de/research-projects/eris/ ERIS Research Project (TU Dresden)]


==== How MemCP was built ====
MemCP is an open-source project maintained by developers for developers. 
Contributions are welcome — whether in the form of bug reports, feature requests, or pull requests. 


* [https://launix.de/launix/how-to-balance-a-database-between-olap-and-oltp-workflows/ Balancing OLAP and OLTP Workflows]
See: [[Contributing]]
* [https://launix.de/launix/designing-a-programming-language-for-distributed-systems-and-highly-parallel-algorithms/ Designing Programming Languages for Distributed Systems]
* [https://launix.de/launix/on-designing-an-interface-for-columnar-in-memory-storage-in-golang/ Columnar Storage Interface in Golang]
* [https://launix.de/launix/how-in-memory-compression-affects-performance/ Impact of In-Memory Compression on Performance]
* [https://launix.de/launix/memory-efficient-indices-for-in-memory-storages/ Memory-Efficient Indices for In-Memory Storages]
* [https://launix.de/launix/on-compressing-null-values-in-bit-compressed-integer-storages/ Compressing Null Values in Bit-Compressed Integer Storages]
* [https://launix.de/launix/when-the-benchmark-is-too-slow-golang-http-server-performance/ Improving Golang HTTP Server Performance]
* [https://launix.de/launix/how-to-benchmark-a-sql-database/ Benchmarking SQL Databases]
* [https://launix.de/launix/writing-a-sql-parser-in-scheme/ Writing a SQL Parser in Scheme]
* [https://launix.de/launix/accessing-memcp-via-scheme/ Accessing memcp via Scheme]
* [https://launix.de/launix/memcp-first-sql-query-is-correctly-executed/ First SQL Query in memcp]
* [https://launix.de/launix/sequence-compression-in-in-memory-database-yields-99-memory-savings-and-a-total-of-13/ Sequence Compression in In-Memory Database]
* [https://launix.de/launix/storing-a-bit-smaller-than-in-one-bit/ Storing Data Smaller Than One Bit]
* [https://www.youtube.com/watch?v=DWg4nx4KVLo memcp Announcement Video]

Latest revision as of 12:52, 22 September 2025

MemCP – A Modern In-Memory Columnar Database

MemCP is a high-performance, in-memory, column-oriented database designed for modern workloads. It provides a lightweight, developer-friendly alternative to traditional relational databases such as MySQL, with a focus on speed, compression, and direct API integration.


Key Features

  • High Performance: NUMA-aware, parallelized query execution optimized for multicore CPUs, large caches, and NVMe SSDs. Handles both OLTP and OLAP workloads efficiently.
  • Columnar Storage: Data is stored by column for improved compression, reduced memory footprint, and faster analytical queries.
  • In-Memory Operation: Designed to keep data in memory, with configurable persistence backends for durability.
  • Built-in APIs: Exposes RESTful endpoints directly from the database, reducing middleware overhead.
  • Compression: Multiple strategies (bit-packing, dictionary encoding, sequence compression) reduce storage by up to 80% compared to MySQL/MariaDB.
  • Simple Deployment: Start with a single docker run or pm2 start command. Lightweight footprint (~10MB).
  • Extensible: Written in Go, with pluggable storage backends and custom frontend support (SQL, RDF, REST).

Why MemCP?

Traditional relational databases were designed decades ago, optimized for spinning disks and single-core CPUs. MemCP rethinks the core design for today’s hardware and workloads:

  • Real-time dashboards and analytics
  • Data-heavy SaaS platforms
  • Embedded systems with limited resources
  • High-throughput OLTP/OLAP hybrids

Quick Start

Clone and build MemCP from source:

git clone https://github.com/launix-de/memcp
cd memcp
go get
make
pm2 start ./memcp ./data/

Connect with MySQL tooling:

mysql -h 127.0.0.1 -P 3307 -u root -p
Enter password: admin

MemCP vs. MySQL

Feature                | MySQL     | MemCP
Storage Model          | Row-based | Column-based (compressed)
Performance            | Good      | NUMA-optimized, in-memory
In-Memory Capable      | Limited   | Yes (default)
REST API Integration   | External  | Built-in
Installation Footprint | ~150MB+   | ~10MB
Open Source            | ✅        | ✅

Architecture Overview

  • Tables, Schemas, Columns: Familiar SQL-style structures with a columnar physical layout.
  • Transaction Model: Supports both OLTP and OLAP semantics with delta + main storage.
  • Persistence: Configurable storage backends (filesystem, S3, Ceph).
  • Frontends: Multiple query interfaces:
      ◦ SQL frontend (MySQL wire protocol + SQL over REST)
      ◦ RDF/graph query engine
      ◦ Custom APIs via in-database web apps
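
The transaction model bullet above mentions delta + main storage. The sketch below shows the general pattern under simplifying assumptions: a single integer column, no concurrency control, and a deletion set instead of a bitmap. The Shard type and all of its methods are hypothetical and are not MemCP's actual API.

```go
package main

import "fmt"

// Shard is a hypothetical single-column shard with a read-optimized main
// storage and a write-optimized delta storage.
type Shard struct {
	main    []int64      // immutable between rebuilds (could be compressed)
	delta   []int64      // appended to by inserts
	deleted map[int]bool // record IDs marked as deleted
}

// NewShard copies the initial values into main storage.
func NewShard(initial []int64) *Shard {
	return &Shard{
		main:    append([]int64(nil), initial...),
		deleted: map[int]bool{},
	}
}

// Insert appends to delta storage and returns the new record ID.
func (s *Shard) Insert(v int64) int {
	s.delta = append(s.delta, v)
	return len(s.main) + len(s.delta) - 1
}

// Delete only marks the record; main storage is not rewritten.
func (s *Shard) Delete(id int) { s.deleted[id] = true }

// Scan visits all live records: main storage first, then delta storage.
func (s *Shard) Scan(visit func(id int, v int64)) {
	for i, v := range s.main {
		if !s.deleted[i] {
			visit(i, v)
		}
	}
	for i, v := range s.delta {
		id := len(s.main) + i
		if !s.deleted[id] {
			visit(id, v)
		}
	}
}

// Sum is an example analytical query over the live records.
func (s *Shard) Sum() int64 {
	var total int64
	s.Scan(func(_ int, v int64) { total += v })
	return total
}

// Rebuild merges live records into a fresh main storage and clears delta.
// Record IDs are reassigned by the merge.
func (s *Shard) Rebuild() {
	merged := make([]int64, 0, len(s.main)+len(s.delta))
	s.Scan(func(_ int, v int64) { merged = append(merged, v) })
	s.main, s.delta, s.deleted = merged, nil, map[int]bool{}
}

func main() {
	shard := NewShard([]int64{10, 20, 30})
	shard.Insert(40) // OLTP-style write lands in delta storage
	shard.Delete(1)  // the value 20 is only marked as deleted
	fmt.Println(shard.Sum())
	shard.Rebuild() // merge delta into main storage
	fmt.Println(shard.Sum(), len(shard.main), len(shard.delta))
}
```

The design choice this illustrates: writes stay cheap because they never touch the compact main storage, while analytical scans read mostly from main and pay only a small cost for the delta until the next rebuild.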

Documentation

  • What is OLTP and OLAP
  • History of the MemCP project
  • Hardware Requirements
  • Persistency and Performance Guarantees
  • Comparison: MemCP vs. MySQL
  • Install with Docker
  • Build from Source
  • Contributing
  • SQL over REST
  • REST & Microservices



Navigation

Introduction

Getting Started

Administration

Frontends

SQL Frontend
RDF Frontend
Custom Frontends

Persistency Backends (= Storage)

Internals

How things work in MemCP
SCM Documentation
Optimizations



Further Reading

Additional blog posts on design decisions, compression techniques, and performance optimization are available on the Launix blog.


Community

MemCP is an open-source project maintained by developers for developers. Contributions are welcome — whether in the form of bug reports, feature requests, or pull requests.

See: Contributing