Auto Sharding: Difference between revisions
Jump to navigation
Jump to search
(Created page with "## Parallel Sharding MemCP automatically creates parallel shards for optimal query performance across multiple CPU cores. ### Bulk Insert Optimization When inserting large amounts of data, MemCP automatically: 1. **Splits bulk inserts into chunks** of 60,000 rows (configurable via ShardSize) 2. **Creates new shards on-the-fly** when the current shard fills up 3. **Rebuilds full shards in parallel** using background goroutines This enables insertion of millions of ro...") |
No edit summary |
||
| Line 1: | Line 1: | ||
== Parallel Sharding == | |||
MemCP automatically creates parallel shards for optimal query performance across multiple CPU cores. | MemCP automatically creates parallel shards for optimal query performance across multiple CPU cores. | ||
=== Bulk Insert Optimization === | |||
When inserting large amounts of data, MemCP automatically: | When inserting large amounts of data, MemCP automatically: | ||
# '''Splits bulk inserts into chunks''' of 60,000 rows (configurable via ShardSize) | |||
# '''Creates new shards on-the-fly''' when the current shard fills up | |||
# '''Rebuilds full shards in parallel''' using background goroutines | |||
This enables insertion of millions of rows while maintaining parallel query execution. | This enables insertion of millions of rows while maintaining parallel query execution. | ||
=== Automatic Repartitioning === | |||
During <code>rebuild()</code>, if no partitioning hints exist but data exceeds ShardSize: | |||
During | |||
* MemCP calculates the optimal number of shards based on data size | |||
* Creates at least <code>2 × NumCPU</code> shards for good parallelism | |||
* Uses the first column for round-robin distribution | |||
- | |||
=== Performance Characteristics === | |||
With parallel sharding enabled: - Query CPU utilization: 1500-1900% (15-19 cores on a 24-core machine) - Speedup: 3-6x compared to single-threaded execution - Throughput: ~0.04 µs/row for COUNT/SUM operations | |||
| Setting | === Configuration === | ||
|- | {| class="wikitable" | ||
| ShardSize | 60,000 | Rows per shard before splitting | | !Setting | ||
| PartitionMaxDimensions | 10 | Maximum partitioning dimensions | | !Default | ||
!Description | |||
|- | |||
|ShardSize | |||
|60,000 | |||
|Rows per shard before splitting | |||
|- | |||
|PartitionMaxDimensions | |||
|10 | |||
|Maximum partitioning dimensions | |||
|} | |||
Revision as of 00:58, 26 January 2026
Parallel Sharding
MemCP automatically creates parallel shards for optimal query performance across multiple CPU cores.
Bulk Insert Optimization
When inserting large amounts of data, MemCP automatically:
- Splits bulk inserts into chunks of 60,000 rows (configurable via ShardSize)
- Creates new shards on-the-fly when the current shard fills up
- Rebuilds full shards in parallel using background goroutines
This enables insertion of millions of rows while maintaining parallel query execution.
Automatic Repartitioning
During rebuild(), if no partitioning hints exist but data exceeds ShardSize:
- MemCP calculates the optimal number of shards based on data size
- Creates at least
2 × NumCPUshards for good parallelism - Uses the first column for round-robin distribution
Performance Characteristics
With parallel sharding enabled: - Query CPU utilization: 1500-1900% (15-19 cores on a 24-core machine) - Speedup: 3-6x compared to single-threaded execution - Throughput: ~0.04 µs/row for COUNT/SUM operations
Configuration
| Setting | Default | Description |
|---|---|---|
| ShardSize | 60,000 | Rows per shard before splitting |
| PartitionMaxDimensions | 10 | Maximum partitioning dimensions |