xMapReduce

xMapReduce is a MapReduce framework built on XCENA’s PXL (Parallel Execution Library) for processing large datasets in parallel. It breaks a job into map and reduce operations and handles parallelization automatically.

How It Works

xMapReduce follows a three-phase process:

Map Phase - Transforms input data into intermediate key-value pairs
Reduce Phase - Processes grouped data with local reduce operations
Global Reduce Phase - Aggregates local reduce results to produce final output

This approach lets you focus on writing the processing logic for your specific use case rather than managing the complexities of parallel execution.
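
The three phases can be illustrated in plain C++ without any xMapReduce types. The following is a minimal, sequential sketch that assumes a word-count job. The names `mapPhase`, `localReduce`, `globalReduce`, and `wordCount` are hypothetical and not part of the xMapReduce API, and the real framework runs these phases in parallel rather than in a loop.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, int>;
using Counts = std::map<std::string, int>;

// Map phase: turn each input word into an intermediate (word, 1) pair.
std::vector<KeyValue> mapPhase(const std::vector<std::string>& words)
{
    std::vector<KeyValue> pairs;
    pairs.reserve(words.size());
    for (const auto& word : words)
    {
        pairs.emplace_back(word, 1);
    }
    return pairs;
}

// Local reduce phase: sum the values per key within one partition.
Counts localReduce(const std::vector<KeyValue>& pairs)
{
    Counts counts;
    for (const auto& pair : pairs)
    {
        counts[pair.first] += pair.second;
    }
    return counts;
}

// Global reduce phase: merge the per-partition results into the final output.
Counts globalReduce(const std::vector<Counts>& partials)
{
    Counts total;
    for (const auto& partial : partials)
    {
        for (const auto& entry : partial)
        {
            total[entry.first] += entry.second;
        }
    }
    return total;
}

// Runs the three phases one after another over partitions that are already split.
Counts wordCount(const std::vector<std::vector<std::string>>& partitions)
{
    std::vector<Counts> partials;
    for (const auto& partition : partitions)
    {
        partials.push_back(localReduce(mapPhase(partition)));
    }
    return globalReduce(partials);
}
```

For two partitions holding the words a, b, a and b, c, `wordCount` returns the counts a = 2, b = 2, c = 1.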

Features

Parallel Processing - Execute operations in parallel across multiple processing units
Key-Value Store - Built-in efficient key-value storage for intermediate results
Flexible Execution - Run on host CPU or XCENA accelerator devices
Memory Management - Automatic memory allocation and efficient block-based storage

Using the Template Project

xMapReduce includes a template project to help you get started quickly:

# Navigate to the template directory
cd xmapreduce/template

# Build your project based on the template
# (Customize the template files according to your needs)

Getting Started

To use xMapReduce, you need to:

Create a Mapper class that processes input data and register it with REGISTER_MAPPER
Create a Reducer class that aggregates intermediate results and register it with REGISTER_REDUCER
Execute the MapReduce job with your data

Example

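Here’s a basic example of a sort application. The mapper inserts every input value under a single key, the reducer sorts the pairs by value, and the main application runs the job and prints the result: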

// Mapper (sort_mapper.hpp)
class SortMapper : public xmapreduce::Mapper<int, xkvstore::KeyValuePair<xmapreduce::string, int>>
{
public:
    void mapImpl(int* data, int elementCount, xkvstore::KVStore<xmapreduce::string, int>* kvStore) override
    {
        // Insert every input element into the store under one shared key.
        for (int i = 0; i < elementCount; ++i)
        {
            kvStore->insert(xmapreduce::string("key"), data[i]);
        }
    }
};

REGISTER_MAPPER(SortMapper);

// Reducer (sort_reducer.hpp)
#include <algorithm>
#include <vector>

class SortReducer : public xmapreduce::Reducer<xkvstore::KeyValuePair<xmapreduce::string, int>, xkvstore::KeyValuePair<xmapreduce::string, int>>
{
public:
    void reduceImpl(xkvstore::KVStore<xmapreduce::string, int>* inputKVStore, xkvstore::KVStore<xmapreduce::string, int>* outputKVStore) override
    {
        // Copy the pairs out of the input store so they can be sorted.
        std::vector<xkvstore::KeyValuePair<xmapreduce::string, int>> data;
        for (const auto& kv : *inputKVStore)
        {
            data.push_back(kv);
        }

        // Order the pairs by value, ascending.
        std::sort(data.begin(), data.end(), [](const xkvstore::KeyValuePair<xmapreduce::string, int>& a, const xkvstore::KeyValuePair<xmapreduce::string, int>& b)
                  {
                      return a.getValue() < b.getValue();
                  });

        // Write the sorted pairs to the output store.
        for (const auto& kv : data)
        {
            outputKVStore->insert(kv.getKey(), kv.getValue());
        }
    }
};

REGISTER_REDUCER(SortReducer);

// Main application
#include <stdio.h>

#include <chrono>
#include <vector>

#include "sort_mapper.hpp"
#include "sort_reducer.hpp"
#include "xmapreduce/xmapreduce.hpp"

using timescale_t = std::chrono::microseconds;

int main()
{
    std::vector<int> dataList = {16, 18, 17, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1};

    auto mapReduce = xmapreduce::MapReduce<SortMapper, SortReducer>::builder()
                         .numMaps(8)
                         .numReduces(4)
                         .numTasks(4)
                         .resultCapacity(1024)
                         .threshold(2)
                         .build();
    mapReduce->execute(dataList);

    auto result = mapReduce->getResultKVStore();

    for (const auto& kv : *result)
    {
        printf("[device] key: %s, value: %d\n", kv.getKey().c_str(), kv.getValue());
    }

    return 0;
}
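
To make the idea of sorting through local and global reduce steps concrete, here is a standalone sketch in standard C++ that sorts chunks independently and then merges them. It is an illustration only: `sortChunks` and `mergeChunks` are hypothetical helpers, and how xMapReduce itself partitions and combines the data is not assumed here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Local step: split the input into at most numChunks contiguous chunks
// and sort each chunk on its own.
std::vector<std::vector<int>> sortChunks(const std::vector<int>& input, std::size_t numChunks)
{
    assert(numChunks > 0);
    std::vector<std::vector<int>> chunks;
    // Round up so that every element lands in some chunk.
    const std::size_t chunkSize = (input.size() + numChunks - 1) / numChunks;
    for (std::size_t begin = 0; begin < input.size(); begin += chunkSize)
    {
        const std::size_t end = std::min(begin + chunkSize, input.size());
        std::vector<int> chunk(input.begin() + static_cast<std::ptrdiff_t>(begin),
                               input.begin() + static_cast<std::ptrdiff_t>(end));
        std::sort(chunk.begin(), chunk.end());
        chunks.push_back(std::move(chunk));
    }
    return chunks;
}

// Global step: merge the sorted chunks one by one into a single sorted sequence.
std::vector<int> mergeChunks(const std::vector<std::vector<int>>& chunks)
{
    std::vector<int> merged;
    for (const auto& chunk : chunks)
    {
        std::vector<int> next;
        next.reserve(merged.size() + chunk.size());
        std::merge(merged.begin(), merged.end(), chunk.begin(), chunk.end(),
                   std::back_inserter(next));
        merged.swap(next);
    }
    return merged;
}
```

Applied to the 18-element input from the example with four chunks, `sortChunks` produces chunks of 5, 5, 5, and 3 elements, and `mergeChunks` returns the values 1 through 18 in ascending order.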

Architecture Overview

xMapReduce consists of several key components:

MapReduce Class: Manages the entire pipeline, handling map and reduce operations
Mapper Interface: Transforms input data into key-value pairs
Reducer Interface: Aggregates values by key to produce results
KVStoreManager: Creates and manages key-value stores with memory allocation
KVStore: Stores intermediate and final key-value data
Block & Region: Basic memory units for efficient parallel data access

For detailed architecture information, see Core Concepts.
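
As a rough illustration of block-based key-value storage, the sketch below keeps pairs in fixed-capacity blocks and allocates a new block only when the current one is full. `BlockKVStore` is a hypothetical class written for this example; it does not reflect the actual layout or interface of xMapReduce's KVStore, KVStoreManager, Block, or Region.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical append-only store that groups pairs into fixed-capacity blocks.
class BlockKVStore
{
public:
    using Pair = std::pair<std::string, int>;

    // A capacity of zero is bumped to one so every block can hold a pair.
    explicit BlockKVStore(std::size_t blockCapacity)
        : blockCapacity_(blockCapacity == 0 ? 1 : blockCapacity)
    {
    }

    // Append a pair, opening a new block when the last one is full.
    void insert(const std::string& key, int value)
    {
        if (blocks_.empty() || blocks_.back().size() == blockCapacity_)
        {
            blocks_.emplace_back();
            blocks_.back().reserve(blockCapacity_);
        }
        blocks_.back().emplace_back(key, value);
        ++size_;
    }

    std::size_t size() const { return size_; }
    std::size_t blockCount() const { return blocks_.size(); }

    // Visit every pair block by block, in insertion order.
    template <typename Visitor>
    void forEach(Visitor visit) const
    {
        for (const auto& block : blocks_)
        {
            for (const auto& pair : block)
            {
                visit(pair);
            }
        }
    }

private:
    std::size_t blockCapacity_;
    std::size_t size_ = 0;
    std::vector<std::vector<Pair>> blocks_;
};
```

With a block capacity of 4, inserting 10 pairs fills three blocks. Because each block is a separate allocation, independent workers could in principle process different blocks without touching each other's memory, which is the motivation usually given for this kind of layout.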

Learn More

For a deeper understanding:

Core Concepts - Architecture and design principles
Examples - More practical examples