# Auto Sharding

## Parallel Sharding


MemCP automatically creates parallel shards so that queries can be executed concurrently across multiple CPU cores.
### Bulk Insert Optimization
When inserting large amounts of data, MemCP automatically:
1. **Splits bulk inserts into chunks** of 60,000 rows (configurable via ShardSize)
2. **Creates new shards on-the-fly** when the current shard fills up
3. **Rebuilds full shards in parallel** using background goroutines
This enables insertion of millions of rows while maintaining parallel query execution.
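The sketch below illustrates this flow under simplified assumptions: a `shard` that seals at `ShardSize` rows, a new shard started on the fly, and a background goroutine that rebuilds each sealed shard. The names (`table`, `bulkInsert`, `rebuild`) are illustrative, not MemCP's actual API, and the "rebuild" here is a placeholder for the real recompression/indexing work.

```go
package main

import (
	"fmt"
	"sync"
)

// ShardSize mirrors the default described above; in MemCP it is configurable.
const ShardSize = 60000

// shard is a simplified stand-in for a MemCP shard: a bounded slice of rows.
type shard struct {
	rows [][]any
}

func (s *shard) full() bool { return len(s.rows) >= ShardSize }

// rebuild stands in for MemCP's shard rebuild; in MemCP this recompresses
// and reindexes the shard's columns, here it is a no-op.
func (s *shard) rebuild() {}

// table holds the currently filling shard plus all sealed shards.
type table struct {
	mu     sync.Mutex
	active *shard
	sealed []*shard
	wg     sync.WaitGroup
}

// bulkInsert splits the incoming rows across shards, creating new shards
// on-the-fly and rebuilding full ones in background goroutines.
func (t *table) bulkInsert(rows [][]any) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.active == nil {
		t.active = &shard{}
	}
	for _, r := range rows {
		if t.active.full() {
			sealed := t.active
			t.sealed = append(t.sealed, sealed)
			t.active = &shard{} // start a new shard on-the-fly
			t.wg.Add(1)
			go func() { // rebuild the full shard in the background
				defer t.wg.Done()
				sealed.rebuild()
			}()
		}
		t.active.rows = append(t.active.rows, r)
	}
}

func main() {
	t := &table{}
	// 150,000 rows -> two sealed shards of 60,000 rows plus one active shard.
	batch := make([][]any, 150000)
	for i := range batch {
		batch[i] = []any{i}
	}
	t.bulkInsert(batch)
	t.wg.Wait()
	fmt.Printf("sealed shards: %d, active rows: %d\n", len(t.sealed), len(t.active.rows))
}
```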
### Automatic Repartitioning
During `rebuild()`, if no partitioning hints exist but data exceeds ShardSize:
- MemCP calculates the optimal number of shards based on data size
- Creates at least `2 × NumCPU` shards for good parallelism
- Uses the first column for round-robin distribution
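As a rough illustration of the shard-count calculation and the round-robin assignment, here is a self-contained sketch. `targetShardCount` and `shardFor` are hypothetical helper names; the real logic inside `rebuild()` operates on MemCP's column storage rather than plain integer keys.

```go
package main

import (
	"fmt"
	"runtime"
)

const ShardSize = 60000

// targetShardCount computes how many shards a repartition would aim for:
// enough that each shard stays under ShardSize, but at least 2 x NumCPU
// so every core has work during a parallel scan.
func targetShardCount(totalRows int) int {
	byData := (totalRows + ShardSize - 1) / ShardSize // ceil(rows / ShardSize)
	byCPU := 2 * runtime.NumCPU()
	if byData > byCPU {
		return byData
	}
	return byCPU
}

// shardFor assigns a row to a shard round-robin style from its first column.
// The first column is assumed here to be an integer key for simplicity.
func shardFor(firstCol int, numShards int) int {
	return ((firstCol % numShards) + numShards) % numShards // non-negative modulo
}

func main() {
	rows := 1_000_000
	n := targetShardCount(rows)
	fmt.Printf("repartitioning %d rows into %d shards\n", rows, n)
	for _, key := range []int{0, 1, 2, 42, 123456} {
		fmt.Printf("first column %d -> shard %d\n", key, shardFor(key, n))
	}
}
```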
### Performance Characteristics
With parallel sharding enabled:
- Query CPU utilization: 1500-1900% (15-19 cores on a 24-core machine)
- Speedup: 3-6x compared to single-threaded execution
- Per-row cost: ~0.04 µs/row for COUNT/SUM operations

At ~0.04 µs/row, a query processes roughly 25 million rows per second, so a COUNT or SUM over 10 million rows completes in about 0.4 s.
### Configuration
| Setting | Default | Description |
|---------|---------|-------------|
| ShardSize | 60,000 | Rows per shard before splitting |
| PartitionMaxDimensions | 10 | Maximum partitioning dimensions |
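For illustration, a minimal sketch of these defaults as a Go struct; the field names mirror the table above, but the actual MemCP settings type and how it is loaded are not shown here and may differ.

```go
package main

import "fmt"

// shardingSettings is a hypothetical stand-in for MemCP's sharding-related
// configuration; field names follow the table above, not a confirmed API.
type shardingSettings struct {
	ShardSize              int // rows per shard before a new shard is started
	PartitionMaxDimensions int // maximum number of partitioning dimensions
}

// defaultShardingSettings returns the documented default values.
func defaultShardingSettings() shardingSettings {
	return shardingSettings{
		ShardSize:              60000,
		PartitionMaxDimensions: 10,
	}
}

func main() {
	s := defaultShardingSettings()
	fmt.Printf("ShardSize=%d PartitionMaxDimensions=%d\n", s.ShardSize, s.PartitionMaxDimensions)
}
```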
