Auto Sharding
- Parallel Sharding
MemCP automatically creates parallel shards for optimal query performance across multiple CPU cores.
- Bulk Insert Optimization
When inserting large amounts of data, MemCP automatically:
1. **Splits bulk inserts into chunks** of 60,000 rows (configurable via ShardSize) 2. **Creates new shards on-the-fly** when the current shard fills up 3. **Rebuilds full shards in parallel** using background goroutines
This enables insertion of millions of rows while maintaining parallel query execution.
- Automatic Repartitioning
During `rebuild()`, if no partitioning hints exist but data exceeds ShardSize:
- MemCP calculates the optimal number of shards based on data size - Creates at least `2 × NumCPU` shards for good parallelism - Uses the first column for round-robin distribution
- Performance Characteristics
With parallel sharding enabled: - Query CPU utilization: 1500-1900% (15-19 cores on a 24-core machine) - Speedup: 3-6x compared to single-threaded execution - Throughput: ~0.04 µs/row for COUNT/SUM operations
- Configuration
| Setting | Default | Description | |---------|---------|-------------| | ShardSize | 60,000 | Rows per shard before splitting | | PartitionMaxDimensions | 10 | Maximum partitioning dimensions |