Auto Sharding: Difference between revisions

Revision as of 01:58, 26 January 2026

Parallel Sharding

MemCP automatically creates parallel shards for optimal query performance across multiple CPU cores.

Bulk Insert Optimization

When inserting large amounts of data, MemCP automatically:

Splits bulk inserts into chunks of 60,000 rows (configurable via ShardSize)
Creates new shards on-the-fly when the current shard fills up
Rebuilds full shards in parallel using background goroutines

This enables insertion of millions of rows while maintaining parallel query execution.

Automatic Repartitioning

During rebuild(), if no partitioning hints exist but data exceeds ShardSize:

MemCP calculates the optimal number of shards based on data size
Creates at least 2 × NumCPU shards for good parallelism
Uses the first column for round-robin distribution

Performance Characteristics

With parallel sharding enabled: - Query CPU utilization: 1500-1900% (15-19 cores on a 24-core machine) - Speedup: 3-6x compared to single-threaded execution - Throughput: ~0.04 µs/row for COUNT/SUM operations

Configuration

Setting	Default	Description
ShardSize	60,000	Rows per shard before splitting
PartitionMaxDimensions	10	Maximum partitioning dimensions

@@ Line 1: / Line 1: @@
-## Parallel Sharding
+== Parallel Sharding ==
 MemCP automatically creates parallel shards for optimal query performance across multiple CPU cores.
-### Bulk Insert Optimization
+=== Bulk Insert Optimization ===
 When inserting large amounts of data, MemCP automatically:
-. **Splits bulk inserts into chunks** of 60,000 rows (configurable via ShardSize)
+# '''Splits bulk inserts into chunks''' of 60,000 rows (configurable via ShardSize)
-. **Creates new shards on-the-fly** when the current shard fills up
+# '''Creates new shards on-the-fly''' when the current shard fills up
-. **Rebuilds full shards in parallel** using background goroutines
+# '''Rebuilds full shards in parallel''' using background goroutines
 This enables insertion of millions of rows while maintaining parallel query execution.
-### Automatic Repartitioning
+=== Automatic Repartitioning ===
+During <code>rebuild()</code>, if no partitioning hints exist but data exceeds ShardSize:
-During `rebuild()`, if no partitioning hints exist but data exceeds ShardSize:
-- MemCP calculates the optimal number of shards based on data size
-- Creates at least `2 × NumCPU` shards for good parallelism
-- Uses the first column for round-robin distribution
-### Performance Characteristics
-With parallel sharding enabled:
+* MemCP calculates the optimal number of shards based on data size
-- Query CPU utilization: 1500-1900% (15-19 cores on a 24-core machine)
+* Creates at least <code>2 × NumCPU</code> shards for good parallelism
-- Speedup: 3-6x compared to single-threaded execution
+* Uses the first column for round-robin distribution
-- Throughput: ~0.04 µs/row for COUNT/SUM operations
-### Configuration
+=== Performance Characteristics ===
+With parallel sharding enabled: - Query CPU utilization: 1500-1900% (15-19 cores on a 24-core machine) - Speedup: 3-6x compared to single-threaded execution - Throughput: ~0.04 µs/row for COUNT/SUM operations
-| Setting | Default | Description |
+=== Configuration ===
-|---------|---------|-------------|
+{| class="wikitable"
-| ShardSize | 60,000 | Rows per shard before splitting |
+!Setting
-| PartitionMaxDimensions | 10 | Maximum partitioning dimensions |
+!Default
+!Description
+|-
+|ShardSize
+|60,000
+|Rows per shard before splitting
+|-
+|PartitionMaxDimensions
+|10
+|Maximum partitioning dimensions
+|}

Auto Sharding: Difference between revisions

Revision as of 01:58, 26 January 2026

Contents

Parallel Sharding

Bulk Insert Optimization

Automatic Repartitioning

Performance Characteristics

Configuration

Navigation menu

Auto Sharding: Difference between revisions

Revision as of 01:58, 26 January 2026

Parallel Sharding

Bulk Insert Optimization

Automatic Repartitioning

Performance Characteristics

Configuration

Navigation menu

Search