Auto Sharding

== Parallel Sharding ==
MemCP automatically splits table data into shards so that queries can execute in parallel across multiple CPU cores.


=== Bulk Insert Optimization ===
When inserting large amounts of data, MemCP automatically:
# '''Splits bulk inserts into chunks''' of 60,000 rows (configurable via ShardSize)
# '''Creates new shards on-the-fly''' when the current shard fills up
# '''Rebuilds full shards in parallel''' using background goroutines
This enables insertion of millions of rows while maintaining parallel query execution.
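
The splitting step can be pictured with a small, self-contained Go sketch. This is not MemCP's actual source; <code>table</code>, <code>shard</code> and the placeholder <code>rebuild()</code> are stand-ins for the mechanism described above, and only the ShardSize default is taken from this page:

<syntaxhighlight lang="go">
package main

import (
	"fmt"
	"sync"
)

// ShardSize mirrors the default described above; illustrative only.
const ShardSize = 60000

// shard is a stand-in for a MemCP shard: a bounded batch of rows.
type shard struct {
	rows [][]any
}

// table holds shards and grows them as bulk data arrives.
type table struct {
	shards []*shard
	wg     sync.WaitGroup
}

// bulkInsert splits the incoming rows into ShardSize chunks, creates new
// shards on the fly, and rebuilds each filled shard in a background goroutine.
func (t *table) bulkInsert(rows [][]any) {
	for len(rows) > 0 {
		n := len(rows)
		if n > ShardSize {
			n = ShardSize
		}
		s := &shard{rows: rows[:n]}
		t.shards = append(t.shards, s)
		rows = rows[n:]

		// Rebuild the freshly filled shard in the background so the
		// remaining chunks can keep inserting in the meantime.
		t.wg.Add(1)
		go func(s *shard) {
			defer t.wg.Done()
			rebuild(s)
		}(s)
	}
	t.wg.Wait()
}

// rebuild is a placeholder for the per-shard rebuild step (compression,
// index construction, etc.).
func rebuild(s *shard) {}

func main() {
	// 150,000 dummy rows -> 3 shards of at most 60,000 rows each.
	rows := make([][]any, 150000)
	t := &table{}
	t.bulkInsert(rows)
	fmt.Println("shards created:", len(t.shards))
}
</syntaxhighlight>
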
=== Automatic Repartitioning ===
During <code>rebuild()</code>, if no partitioning hints exist but data exceeds ShardSize:
* MemCP calculates the optimal number of shards based on data size
* Creates at least <code>2 × NumCPU</code> shards for good parallelism
* Uses the first column for round-robin distribution
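
The shard-count calculation can be sketched in Go as follows. This is an illustration under the assumptions above, not MemCP's implementation; in particular, the hypothetical <code>firstColumnShard</code> uses a simple hash as a stand-in for the first-column distribution described here:

<syntaxhighlight lang="go">
package main

import (
	"fmt"
	"runtime"
)

const ShardSize = 60000 // default rows per shard, as described above

// targetShardCount picks how many shards to repartition into: enough that
// no shard exceeds ShardSize, but at least 2 x NumCPU for good parallelism.
func targetShardCount(totalRows int) int {
	n := (totalRows + ShardSize - 1) / ShardSize // ceil(totalRows / ShardSize)
	if floor := 2 * runtime.NumCPU(); n < floor {
		n = floor
	}
	return n
}

// firstColumnShard maps a row to a shard by hashing its first column.
// MemCP's real distribution logic may differ; this only illustrates the idea.
func firstColumnShard(firstColumn string, shardCount int) int {
	h := 0
	for _, c := range firstColumn {
		h = h*31 + int(c)
	}
	if h < 0 {
		h = -h
	}
	return h % shardCount
}

func main() {
	shards := targetShardCount(1000000)
	fmt.Println("shards for 1,000,000 rows:", shards)
	fmt.Println("shard for key 'alice':", firstColumnShard("alice", shards))
}
</syntaxhighlight>
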
=== Performance Characteristics ===
With parallel sharding enabled:
* Query CPU utilization: 1500–1900% (15–19 cores on a 24-core machine)
* Speedup: 3–6× compared to single-threaded execution
* Throughput: ~0.04 µs/row for COUNT/SUM operations
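
As a rough back-of-the-envelope check (assuming the throughput figure scales linearly): at ~0.04 µs/row, scanning a full 60,000-row shard costs about 2.4 ms, and a 10-million-row table costs about 400 ms single-threaded; spread across 15–19 cores this drops to roughly 20–30 ms per aggregate query.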
=== Configuration ===
{| class="wikitable"
!Setting
!Default
!Description
|-
|ShardSize
|60,000
|Rows per shard before splitting
|-
|PartitionMaxDimensions
|10
|Maximum partitioning dimensions
|}
