<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.memcp.org/index.php?action=history&amp;feed=atom&amp;title=Columnar_Storage</id>
	<title>Columnar Storage - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://www.memcp.org/index.php?action=history&amp;feed=atom&amp;title=Columnar_Storage"/>
	<link rel="alternate" type="text/html" href="https://www.memcp.org/index.php?title=Columnar_Storage&amp;action=history"/>
	<updated>2026-04-12T23:49:41Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.39.1</generator>
	<entry>
		<id>https://www.memcp.org/index.php?title=Columnar_Storage&amp;diff=39&amp;oldid=prev</id>
		<title>Carli: Created page with &quot;The advantages for columnar storages over row based storages are the ability for good in-memory compression &#039;&#039;&#039;(low memory usage)&#039;&#039;&#039; and cache locality when accessing only few columns &#039;&#039;&#039;(performance)&#039;&#039;&#039;.  == How we designed the Interface == When designing an interface for a storage engine for an in-memory database, a lot of considerations have to be made. Here are the design goals:  * The interface must be simple so that using it does not require implementing all kinds...&quot;</title>
		<link rel="alternate" type="text/html" href="https://www.memcp.org/index.php?title=Columnar_Storage&amp;diff=39&amp;oldid=prev"/>
		<updated>2024-05-17T12:57:56Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;The advantages for columnar storages over row based storages are the ability for good in-memory compression &amp;#039;&amp;#039;&amp;#039;(low memory usage)&amp;#039;&amp;#039;&amp;#039; and cache locality when accessing only few columns &amp;#039;&amp;#039;&amp;#039;(performance)&amp;#039;&amp;#039;&amp;#039;.  == How we designed the Interface == When designing an interface for a storage engine for an in-memory database, a lot of considerations have to be made. Here are the design goals:  * The interface must be simple so that using it does not require implementing all kinds...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;The advantages for columnar storages over row based storages are the ability for good in-memory compression &amp;#039;&amp;#039;&amp;#039;(low memory usage)&amp;#039;&amp;#039;&amp;#039; and cache locality when accessing only few columns &amp;#039;&amp;#039;&amp;#039;(performance)&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== How we designed the Interface ==&lt;br /&gt;
When designing an interface for a storage engine for an in-memory database, a lot of considerations have to be made. Here are the design goals:&lt;br /&gt;
&lt;br /&gt;
* The interface must be simple so that using it does not require implementing all kinds of corner cases (especially with datatypes)&lt;br /&gt;
* The interface must allow fast value fetches for random data access&lt;br /&gt;
* The interface must allow analyzing the data and finding the perfect storage and compression format&lt;br /&gt;
&lt;br /&gt;
Here’s what we came up with:&lt;br /&gt;
 type ColumnStorage interface {&lt;br /&gt;
         GetValue(uint) any // read function&lt;br /&gt;
         // buildup functions 1) prepare 2) scan, 3) proposeCompression(), if != nil repeat at 1, 4) init, 5) build; all values are passed through twice&lt;br /&gt;
         // analyze&lt;br /&gt;
         prepare()&lt;br /&gt;
         scan(uint, any)&lt;br /&gt;
         proposeCompression() ColumnStorage&lt;br /&gt;
 &lt;br /&gt;
         // store&lt;br /&gt;
         init(uint)&lt;br /&gt;
         build(uint, any)&lt;br /&gt;
         finish()&lt;br /&gt;
 &lt;br /&gt;
         // serialization&lt;br /&gt;
         Serialize(io.Writer)&lt;br /&gt;
         Deserialize(io.Reader)&lt;br /&gt;
 }&lt;br /&gt;
The read interface is pretty simple: &amp;lt;code&amp;gt;GetValue(i)&amp;lt;/code&amp;gt; will read the column value at recordId &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt; and return the &amp;lt;code&amp;gt;scmer&amp;lt;/code&amp;gt; value. &amp;lt;code&amp;gt;scmer&amp;lt;/code&amp;gt; is a kind of „any“ datatype that is used in our scheme interpreter to further process the data.&lt;br /&gt;
&lt;br /&gt;
== Data Analysis on Columnar Storages ==&lt;br /&gt;
The write interface is a bit more complicated and basically works in two passes:&lt;br /&gt;
&lt;br /&gt;
# analyze phase:&lt;br /&gt;
#* &amp;lt;code&amp;gt;prepare()&amp;lt;/code&amp;gt; is called&lt;br /&gt;
#* every future value that shall be stored will be passed to &amp;lt;code&amp;gt;scan(recordId, value)&amp;lt;/code&amp;gt;&lt;br /&gt;
#* the columnar storage will analyze the data. If it knows a better storage format than its own class, it will return an other column container when calling &amp;lt;code&amp;gt;proposeCompression()&amp;lt;/code&amp;gt;&lt;br /&gt;
#* when &amp;lt;code&amp;gt;proposeCompression()&amp;lt;/code&amp;gt; returns another container, repeat step 1 analyze phase with the new container&lt;br /&gt;
#* otherwise proceed with the store phase&lt;br /&gt;
# store phase&lt;br /&gt;
#* &amp;lt;code&amp;gt;init(size)&amp;lt;/code&amp;gt; is called where the storage can allocate the memory for its columns&lt;br /&gt;
#* for each value, &amp;lt;code&amp;gt;build(recordId, value)&amp;lt;/code&amp;gt; is called to write the data to memory&lt;br /&gt;
#* for cleanup of possible reverse dictionaries that are only needed in the store phase, &amp;lt;code&amp;gt;finish()&amp;lt;/code&amp;gt; is called&lt;br /&gt;
&lt;br /&gt;
During analyze phase, statistics about the data can be collected. This allows a storage engine to check how wide the integers are or if there are lots of string duplicates.&lt;br /&gt;
&lt;br /&gt;
The classes implementing that interface are then structured as follows:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;StorageSCMER&amp;lt;/code&amp;gt; is a universal value storage; fast but needs lot of RAM&lt;br /&gt;
* &amp;lt;code&amp;gt;StorageSCMER&amp;lt;/code&amp;gt; analyzes a column whether it better fits the StorageString or StorageInt scheme; otherwise (i.e. for float64 values, stay at StorageSCMER)&lt;br /&gt;
* &amp;lt;code&amp;gt;StorageInt&amp;lt;/code&amp;gt; is a bit compressed integer storage. When analyzed values do not exceed i.e. 15, the storage will only eat up 4 bits per stored integer. The numbers are stored in an array of 64 bit integers which are shifted when reading and writing&lt;br /&gt;
* &amp;lt;code&amp;gt;StorageString&amp;lt;/code&amp;gt; is a dictionary compressed string storage using golang slices&lt;br /&gt;
&lt;br /&gt;
With this interface we have built a powerful interface to implement any columnar storage compression format. We can support bit compressed integers as well as dictionaries.&lt;br /&gt;
&lt;br /&gt;
== Evaluation ==&lt;br /&gt;
We loaded a bunch of 55MiB of .jsonl files into our storage. This is our memory usage using the storage interface:&lt;br /&gt;
&lt;br /&gt;
As you see, RAM compressed columnar storage needs only about half the RAM compared to disk based InnoDB storage or .json files.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|&amp;#039;&amp;#039;&amp;#039;Storage&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
|&amp;#039;&amp;#039;&amp;#039;Memory Usage&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
|-&lt;br /&gt;
|MySQL InnoDB original storage&lt;br /&gt;
|55 MiB Disk Space&lt;br /&gt;
|-&lt;br /&gt;
|Export from MySQL in .jsonl format&lt;br /&gt;
|55 MiB Disk Space&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Imported .jsonl as []map[string]any into delta storage&lt;br /&gt;
|233 MiB RAM&lt;br /&gt;
|-&lt;br /&gt;
|Moved to main storage only using &amp;lt;code&amp;gt;StorageSCMER&amp;lt;/code&amp;gt;&lt;br /&gt;
|81 MiB RAM&lt;br /&gt;
|-&lt;br /&gt;
|Using &amp;lt;code&amp;gt;StorageInt&amp;lt;/code&amp;gt; for integer columns&lt;br /&gt;
|46 MiB RAM&lt;br /&gt;
|-&lt;br /&gt;
|Using &amp;lt;code&amp;gt;StorageString&amp;lt;/code&amp;gt; for string columns&lt;br /&gt;
|58 MiB RAM&lt;br /&gt;
|-&lt;br /&gt;
|Using &amp;lt;code&amp;gt;StorageInt&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;StorageString&amp;lt;/code&amp;gt;&lt;br /&gt;
|23 MiB RAM&lt;br /&gt;
|}&lt;br /&gt;
For more info about compression, read the article about [[In-Memory Compression, Columnar Compression Techniques|Compression]]&lt;/div&gt;</summary>
		<author><name>Carli</name></author>
	</entry>
</feed>