xMapReduce Core Concepts

This guide explains the architecture and design principles of the xMapReduce framework, how it leverages XCENA’s PXL (Parallel Execution Library) for efficient parallel data processing.

What is MapReduce?

MapReduce is a programming model designed for processing large datasets across distributed systems. It was originally developed by Google and has become a standard approach for parallel computing. The model takes its name from two main operations:

Map: Process each piece of input data independently
Reduce: Combine intermediate results into final outputs

This approach makes it possible to handle datasets that are too large for a single computer to process efficiently.

xMapReduce Architecture

xMapReduce implements the MapReduce model using a three-phase process:

Map Phase: Input data is divided into chunks and processed by multiple mappers in parallel
Local Reduce Phase: Mapped data is processed by reducers that perform local aggregation
Global Reduce Phase: Results from local reducers are combined to produce the final output

xMapReduce Architecture Diagram

Key Components

MapReduce Class

The central component that orchestrates the entire pipeline:

Manages parallel execution of map and reduce operations
Handles data distribution and collection
Controls memory allocation through KVStoreManager
Supports execution on both host CPU and XCENA accelerator

// Example of creating and using a MapReduce instance
// Method 1: Direct constructor
auto mapReduce = std::make_unique<xmapreduce::MapReduce<MyMapper, MyReducer>>(numMapTasks, numReduceTasks, resultCapacity);  
mapReduce->initialize();  
mapReduce->execute(inputData, dataSize);  

// Method 2: Builder pattern (newer implementation)
auto mapReduce = xmapreduce::MapReduce<MyMapper, MyReducer>::builder()  
                     .numMaps(8)
                     .numReduces(4)
                     .numTasks(2)
                     .resultCapacity(1024)
                     .build();
mapReduce->execute(inputData, dataSize);

Mapper Interface

Base interface for implementing custom mapping logic:

class MyMapper : public xmapreduce::Mapper<InputType, xmapreduce::KeyValuePair<KeyType, ValueType>>
{
public:
    void mapImpl(InputType* data, int count, xkvstore::KVStore<KeyType, ValueType>* kvStore) override
    {
        // Map implementation
        for (int i = 0; i < count; i++) {
            // Process data[i] and emit key-value pairs
            kvStore->insert(key, value);
        }
    }
};

// Register mapper for dynamic creation at runtime
REGISTER_MAPPER(MyMapper);

Reducer Interface

Base interface for implementing custom reduction logic:

class MyReducer : public xmapreduce::Reducer<xmapreduce::KeyValuePair<KeyType, ValueType>, xmapreduce::KeyValuePair<KeyType, ValueType>>
{
public:
    void reduceImpl(xkvstore::KVStore<KeyType, ValueType>* input, xkvstore::KVStore<KeyType, ValueType>* output) override
    {
        // Reduce implementation
        // Process all values for each key and emit results
        std::unordered_map<KeyType, ValueType> aggregate;
        for (const auto& kv : *input)
        {
            aggregate[kv.getKey()] += kv.getValue();
        }
        for (const auto& [key, value] : aggregate)
        {
            output->insert(key, value);
        }
    }
};

// Register reducer for dynamic creation at runtime
REGISTER_REDUCER(MyReducer);

KVStore

Container for key-value pairs with efficient parallel access:

Supports concurrent insertion from multiple tasks
Contains multiple blocks for dynamic memory expansion
Each block divided into regions for task-specific access
Provides iterators for traversing stored data
Manages memory allocations through the KVStoreManager

Block and Region Memory Structure

For efficient parallel memory management, xMapReduce uses a block-based approach:

Block: Basic memory allocation unit containing multiple regions
Region: Task-specific memory area within a block

This structure prevents memory conflicts during parallel execution and enables efficient memory expansion when needed.

Execution Flow

1. Initialization

When a MapReduce job is initialized:

The MapReduce object is created with mapper and reducer types
PXL runtime is initialized for parallel execution
KVStoreManager creates KVStores for map and reduce phases
Memory blocks and regions are allocated
MapReduce registers the mapper and reducer with the registry system

2. Map Execution

During the map phase:

Input data is divided among map tasks
Each mapper processes its assigned data chunk in parallel
Mappers emit key-value pairs into the map KVStore
If a region becomes full, a new block is automatically allocated

3. Local Reduce Execution

During the local reduce phase:

Map results are grouped by key
Each reducer processes values for its assigned keys
Reducers emit aggregated results to the local reduce KVStore

4. Global Reduce

After local reduction:

Results from all local reducers are combined using KVStoreManager
Final aggregation is performed across all intermediate results
Results are made available through the result KVStore and can be retrieved with getResultKVStore()

Memory Management

xMapReduce employs a block and region-based memory allocation system:

Block Allocation

When a KVStore needs more memory:

Initial insert attempt may fail if the region is full
KVStoreManager is notified to allocate a new block
New block is created with multiple regions
Block is added to the KVStore’s block list
The failed insert operation is retried

This dynamic block allocation enables efficient handling of varying data sizes without manual memory management.

Common MapReduce Patterns

Several common patterns emerge in MapReduce applications:

Filtering: Mappers emit only items meeting certain criteria
Counting: Count occurrences of items (like in word count)
Summation: Aggregate numerical values
Averaging: Calculate averages for grouped data
Indexing: Create indexes or inverted indexes for data retrieval

When to Use xMapReduce

xMapReduce works well for tasks that:

Can be broken into independent sub-problems
Involve processing each record in a similar way
Require aggregating or combining intermediate results
Benefit from parallel processing

For more examples, see the Examples page.