Auto Sharding

== Parallel Sharding ==
MemCP automatically splits table data into shards so that queries can execute in parallel across multiple CPU cores.


=== Bulk Insert Optimization ===
When inserting large amounts of data, MemCP automatically:
# '''Splits bulk inserts into chunks''' of 60,000 rows (configurable via ShardSize)
# '''Creates new shards on-the-fly''' when the current shard fills up
# '''Rebuilds full shards in parallel''' using background goroutines
This enables insertion of millions of rows while maintaining parallel query execution.
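
The splitting step can be pictured with a small, self-contained Go sketch. This is not MemCP's actual source; <code>table</code>, <code>shard</code> and the placeholder <code>rebuild()</code> are stand-ins for the mechanism described above, and only the ShardSize default is taken from this page:

<syntaxhighlight lang="go">
package main

import (
	"fmt"
	"sync"
)

// ShardSize mirrors the default described above; illustrative only.
const ShardSize = 60000

// shard is a stand-in for a MemCP shard: a bounded batch of rows.
type shard struct {
	rows [][]any
}

// table holds shards and grows them as bulk data arrives.
type table struct {
	shards []*shard
	wg     sync.WaitGroup
}

// bulkInsert splits the incoming rows into ShardSize chunks, creates new
// shards on the fly, and rebuilds each filled shard in a background goroutine.
func (t *table) bulkInsert(rows [][]any) {
	for len(rows) > 0 {
		n := len(rows)
		if n > ShardSize {
			n = ShardSize
		}
		s := &shard{rows: rows[:n]}
		t.shards = append(t.shards, s)
		rows = rows[n:]

		// Rebuild the freshly filled shard in the background so the
		// remaining chunks can keep inserting in the meantime.
		t.wg.Add(1)
		go func(s *shard) {
			defer t.wg.Done()
			rebuild(s)
		}(s)
	}
	t.wg.Wait()
}

// rebuild is a placeholder for the per-shard rebuild step (compression,
// index construction, etc.).
func rebuild(s *shard) {}

func main() {
	// 150,000 dummy rows -> 3 shards of at most 60,000 rows each.
	rows := make([][]any, 150000)
	t := &table{}
	t.bulkInsert(rows)
	fmt.Println("shards created:", len(t.shards))
}
</syntaxhighlight>
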
=== Automatic Repartitioning ===
During <code>rebuild()</code>, if no partitioning hints exist but data exceeds ShardSize:
* MemCP calculates the optimal number of shards based on data size
* Creates at least <code>2 × NumCPU</code> shards for good parallelism
* Uses the first column for round-robin distribution
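
The shard-count calculation can be sketched in Go as follows. This is an illustration under the assumptions above, not MemCP's implementation; in particular, the hypothetical <code>firstColumnShard</code> uses a simple hash as a stand-in for the first-column distribution described here:

<syntaxhighlight lang="go">
package main

import (
	"fmt"
	"runtime"
)

const ShardSize = 60000 // default rows per shard, as described above

// targetShardCount picks how many shards to repartition into: enough that
// no shard exceeds ShardSize, but at least 2 x NumCPU for good parallelism.
func targetShardCount(totalRows int) int {
	n := (totalRows + ShardSize - 1) / ShardSize // ceil(totalRows / ShardSize)
	if floor := 2 * runtime.NumCPU(); n < floor {
		n = floor
	}
	return n
}

// firstColumnShard maps a row to a shard by hashing its first column.
// MemCP's real distribution logic may differ; this only illustrates the idea.
func firstColumnShard(firstColumn string, shardCount int) int {
	h := 0
	for _, c := range firstColumn {
		h = h*31 + int(c)
	}
	if h < 0 {
		h = -h
	}
	return h % shardCount
}

func main() {
	shards := targetShardCount(1000000)
	fmt.Println("shards for 1,000,000 rows:", shards)
	fmt.Println("shard for key 'alice':", firstColumnShard("alice", shards))
}
</syntaxhighlight>
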
=== Performance Characteristics ===
With parallel sharding enabled:
* Query CPU utilization: 1500–1900% (15–19 cores on a 24-core machine)
* Speedup: 3–6× compared to single-threaded execution
* Throughput: ~0.04 µs/row for COUNT/SUM operations
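
As a rough back-of-the-envelope check (assuming the throughput figure scales linearly): at ~0.04 µs/row, scanning a full 60,000-row shard costs about 2.4 ms, and a 10-million-row table costs about 400 ms single-threaded; spread across 15–19 cores this drops to roughly 20–30 ms per aggregate query.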
=== Configuration ===
{| class="wikitable"
!Setting
!Default
!Description
|-
|ShardSize
|60,000
|Rows per shard before splitting
|-
|PartitionMaxDimensions
|10
|Maximum partitioning dimensions
|}
