# Auto Sharding

## Parallel Sharding


MemCP automatically creates parallel shards so that queries can be executed concurrently across multiple CPU cores.
### Bulk Insert Optimization
When inserting large amounts of data, MemCP automatically:
1. **Splits bulk inserts into chunks** of 60,000 rows (configurable via ShardSize)
2. **Creates new shards on-the-fly** when the current shard fills up
3. **Rebuilds full shards in parallel** using background goroutines
This enables insertion of millions of rows while maintaining parallel query execution.
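The sketch below illustrates this flow under simplified assumptions: a `shard` that seals at `ShardSize` rows, a new shard started on the fly, and a background goroutine that rebuilds each sealed shard. The names (`table`, `bulkInsert`, `rebuild`) are illustrative, not MemCP's actual API, and the "rebuild" here is a placeholder for the real recompression/indexing work.

```go
package main

import (
	"fmt"
	"sync"
)

// ShardSize mirrors the default described above; in MemCP it is configurable.
const ShardSize = 60000

// shard is a simplified stand-in for a MemCP shard: a bounded slice of rows.
type shard struct {
	rows [][]any
}

func (s *shard) full() bool { return len(s.rows) >= ShardSize }

// rebuild stands in for MemCP's shard rebuild; in MemCP this recompresses
// and reindexes the shard's columns, here it is a no-op.
func (s *shard) rebuild() {}

// table holds the currently filling shard plus all sealed shards.
type table struct {
	mu     sync.Mutex
	active *shard
	sealed []*shard
	wg     sync.WaitGroup
}

// bulkInsert splits the incoming rows across shards, creating new shards
// on-the-fly and rebuilding full ones in background goroutines.
func (t *table) bulkInsert(rows [][]any) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.active == nil {
		t.active = &shard{}
	}
	for _, r := range rows {
		if t.active.full() {
			sealed := t.active
			t.sealed = append(t.sealed, sealed)
			t.active = &shard{} // start a new shard on-the-fly
			t.wg.Add(1)
			go func() { // rebuild the full shard in the background
				defer t.wg.Done()
				sealed.rebuild()
			}()
		}
		t.active.rows = append(t.active.rows, r)
	}
}

func main() {
	t := &table{}
	// 150,000 rows -> two sealed shards of 60,000 rows plus one active shard.
	batch := make([][]any, 150000)
	for i := range batch {
		batch[i] = []any{i}
	}
	t.bulkInsert(batch)
	t.wg.Wait()
	fmt.Printf("sealed shards: %d, active rows: %d\n", len(t.sealed), len(t.active.rows))
}
```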
### Automatic Repartitioning
During `rebuild()`, if no partitioning hints exist but data exceeds ShardSize:
- MemCP calculates the optimal number of shards based on data size
- Creates at least `2 × NumCPU` shards for good parallelism
- Uses the first column for round-robin distribution
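As a rough illustration of the shard-count calculation and the round-robin assignment, here is a self-contained sketch. `targetShardCount` and `shardFor` are hypothetical helper names; the real logic inside `rebuild()` operates on MemCP's column storage rather than plain integer keys.

```go
package main

import (
	"fmt"
	"runtime"
)

const ShardSize = 60000

// targetShardCount computes how many shards a repartition would aim for:
// enough that each shard stays under ShardSize, but at least 2 x NumCPU
// so every core has work during a parallel scan.
func targetShardCount(totalRows int) int {
	byData := (totalRows + ShardSize - 1) / ShardSize // ceil(rows / ShardSize)
	byCPU := 2 * runtime.NumCPU()
	if byData > byCPU {
		return byData
	}
	return byCPU
}

// shardFor assigns a row to a shard round-robin style from its first column.
// The first column is assumed here to be an integer key for simplicity.
func shardFor(firstCol int, numShards int) int {
	return ((firstCol % numShards) + numShards) % numShards // non-negative modulo
}

func main() {
	rows := 1_000_000
	n := targetShardCount(rows)
	fmt.Printf("repartitioning %d rows into %d shards\n", rows, n)
	for _, key := range []int{0, 1, 2, 42, 123456} {
		fmt.Printf("first column %d -> shard %d\n", key, shardFor(key, n))
	}
}
```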
### Performance Characteristics
With parallel sharding enabled:
- Query CPU utilization: 1500-1900% (15-19 cores on a 24-core machine)
- Speedup: 3-6x compared to single-threaded execution
- Per-row cost: ~0.04 µs/row for COUNT/SUM operations

At ~0.04 µs/row, a query processes roughly 25 million rows per second, so a COUNT or SUM over 10 million rows completes in about 0.4 s.
### Configuration
| Setting | Default | Description |
|---------|---------|-------------|
| ShardSize | 60,000 | Rows per shard before splitting |
| PartitionMaxDimensions | 10 | Maximum partitioning dimensions |
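For illustration, a minimal sketch of these defaults as a Go struct; the field names mirror the table above, but the actual MemCP settings type and how it is loaded are not shown here and may differ.

```go
package main

import "fmt"

// shardingSettings is a hypothetical stand-in for MemCP's sharding-related
// configuration; field names follow the table above, not a confirmed API.
type shardingSettings struct {
	ShardSize              int // rows per shard before a new shard is started
	PartitionMaxDimensions int // maximum number of partitioning dimensions
}

// defaultShardingSettings returns the documented default values.
func defaultShardingSettings() shardingSettings {
	return shardingSettings{
		ShardSize:              60000,
		PartitionMaxDimensions: 10,
	}
}

func main() {
	s := defaultShardingSettings()
	fmt.Printf("ShardSize=%d PartitionMaxDimensions=%d\n", s.ShardSize, s.PartitionMaxDimensions)
}
```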
