xMapReduce
xMapReduce is a MapReduce framework built on XCENA’s PXL (Parallel Execution Library) for processing large datasets in parallel. It simplifies distributed computing tasks by breaking them down into map and reduce operations and handling parallelization automatically.
How It Works
xMapReduce follows a three-phase process:
- Map Phase - Transforms input data into intermediate key-value pairs
- Reduce Phase - Applies local reduce operations to the grouped intermediate data
- Global Reduce Phase - Aggregates local reduce results to produce final output
This approach lets you focus on writing the processing logic for your specific use case rather than managing the complexities of parallel execution.
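The three phases can be sketched on the host with only the C++ standard library. This is a minimal illustration of the data flow, not xMapReduce's implementation: `Pair`, `mapChunk`, `localReduce`, `globalReduce`, and `runPipeline` are hypothetical names, the phases run sequentially rather than in parallel, and splitting the input into fixed-size chunks is an assumption made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an intermediate key-value pair (not an xMapReduce type).
using Pair = std::pair<std::string, int>;

// Orders pairs by value; shared by the local and global reduce steps.
inline bool byValue(const Pair& a, const Pair& b)
{
    return a.second < b.second;
}

// Map phase: turn one chunk of raw input into intermediate key-value pairs.
std::vector<Pair> mapChunk(const std::vector<int>& chunk)
{
    std::vector<Pair> out;
    out.reserve(chunk.size());
    for (int v : chunk)
    {
        out.emplace_back("key", v);
    }
    return out;
}

// Reduce phase: reduce one partition locally (here: sort it by value).
std::vector<Pair> localReduce(std::vector<Pair> pairs)
{
    std::sort(pairs.begin(), pairs.end(), byValue);
    return pairs;
}

// Global reduce phase: merge the locally sorted partitions into one result.
std::vector<Pair> globalReduce(const std::vector<std::vector<Pair>>& partials)
{
    std::vector<Pair> merged;
    for (const auto& part : partials)
    {
        std::vector<Pair> next;
        next.reserve(merged.size() + part.size());
        std::merge(merged.begin(), merged.end(), part.begin(), part.end(),
                   std::back_inserter(next), byValue);
        merged = std::move(next);
    }
    return merged;
}

// Runs the three phases one after another over fixed-size chunks of the input.
std::vector<Pair> runPipeline(const std::vector<int>& input, std::size_t chunkSize)
{
    if (chunkSize == 0)
    {
        chunkSize = 1;  // avoid an infinite loop on a zero chunk size
    }
    std::vector<std::vector<Pair>> partials;
    for (std::size_t begin = 0; begin < input.size(); begin += chunkSize)
    {
        const std::size_t end = std::min(begin + chunkSize, input.size());
        const std::vector<int> chunk(input.begin() + static_cast<std::ptrdiff_t>(begin),
                                     input.begin() + static_cast<std::ptrdiff_t>(end));
        partials.push_back(localReduce(mapChunk(chunk)));
    }
    return globalReduce(partials);
}
```

Calling `runPipeline({5, 3, 9, 1, 7, 2}, 2)` maps three chunks, sorts each one locally, and merges them into pairs whose values are 1, 2, 3, 5, 7, 9. In a real framework the map and local reduce calls for different chunks are independent, which is what makes them candidates for parallel execution.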
Features
- Parallel Processing - Execute operations in parallel across multiple processing units
- Key-Value Store - Built-in efficient key-value storage for intermediate results
- Flexible Execution - Run on host CPU or XCENA accelerator devices
- Memory Management - Automatic memory allocation and efficient block-based storage
Using the Template Project
xMapReduce includes a template project to help you get started quickly:
# Navigate to the template directory
cd xmapreduce/template
# Build your project based on the template
# (Customize the template files according to your needs)
Getting Started
To use xMapReduce, you need to:
- Create a Mapper class that processes input data
- Create a Reducer class that aggregates intermediate results
- Execute the MapReduce job with your data
Example
Here’s a basic example of a sort application:
// Mapper
// Emits every input element under a single shared key so that the
// reducer receives all values together.
class SortMapper : public xmapreduce::Mapper<int, xmapreduce::KeyValuePair<xmapreduce::string, int>>
{
public:
    void mapImpl(int* data, int elementCount, xkvstore::KVStore<xmapreduce::string, int>* kvStore) override
    {
        for (int i = 0; i < elementCount; ++i)
        {
            kvStore->insert(xmapreduce::string("key"), data[i]);
        }
    }
};
REGISTER_MAPPER(SortMapper);
// Reducer
#include <algorithm>  // std::sort
#include <vector>     // std::vector

class SortReducer : public xmapreduce::Reducer<xkvstore::KeyValuePair<xmapreduce::string, int>, xkvstore::KeyValuePair<xmapreduce::string, int>>
{
public:
    void reduceImpl(xkvstore::KVStore<xmapreduce::string, int>* inputKVStore, xkvstore::KVStore<xmapreduce::string, int>* outputKVStore) override
    {
        using Pair = xkvstore::KeyValuePair<xmapreduce::string, int>;

        // Copy the input pairs into a vector so they can be sorted.
        std::vector<Pair> data;
        for (const auto& kv : *inputKVStore)
        {
            data.push_back(kv);
        }

        // Sort in ascending order of value.
        std::sort(data.begin(), data.end(), [](const Pair& a, const Pair& b)
        {
            return a.getValue() < b.getValue();
        });

        // Write the sorted pairs to the output store.
        for (const auto& kv : data)
        {
            outputKVStore->insert(kv.getKey(), kv.getValue());
        }
    }
};
REGISTER_REDUCER(SortReducer);
// Main application
#include <chrono>
#include <cstdio>
#include <vector>

#include "sort_mapper.hpp"
#include "sort_reducer.hpp"
#include "xmapreduce/xmapreduce.hpp"

using timescale_t = std::chrono::microseconds;

int main()
{
    std::vector<int> dataList = {16, 18, 17, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1};

    // Configure the job through the builder.
    auto mapReduce = xmapreduce::MapReduce<SortMapper, SortReducer>::builder()
                         .numMaps(8)
                         .numReduces(4)
                         .numTasks(4)
                         .resultCapacity(1024)
                         .threshold(2)
                         .build();

    // Run the map, reduce, and global reduce phases on the input.
    mapReduce->execute(dataList);

    // Print every key-value pair in the final result.
    auto result = mapReduce->getResultKVStore();
    for (const auto& kv : *result)
    {
        std::printf("[device] key: %s, value: %d\n", kv.getKey().c_str(), kv.getValue());
    }
    return 0;
}
Architecture Overview
xMapReduce consists of several key components:
- MapReduce Class: Manages the entire pipeline, handling map and reduce operations
- Mapper Interface: Transforms input data into key-value pairs
- Reducer Interface: Aggregates values by key to produce results
- KVStoreManager: Creates key-value stores and manages their memory allocation
- KVStore: Stores intermediate and final key-value data
- Block & Region: Basic memory units for efficient parallel data access
For detailed architecture information, see Core Concepts.
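To make the idea of block-based storage concrete, here is a toy append-only key-value store that keeps its entries in fixed-capacity blocks. It is a sketch under stated assumptions, not the design of KVStore, Block, or Region: `ToyBlockStore`, its `BlockCapacity` parameter, and all member names are hypothetical, and the real components may organize memory quite differently.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration only; not part of xMapReduce or xkvstore.
// K and V must be default-constructible because each block preallocates its slots.
template <typename K, typename V, std::size_t BlockCapacity = 4>
class ToyBlockStore
{
    static_assert(BlockCapacity > 0, "a block must hold at least one entry");

public:
    using Entry = std::pair<K, V>;

    // Appends an entry, allocating a new fixed-size block when the last one is full.
    void insert(K key, V value)
    {
        if (blocks_.empty() || blocks_.back()->used == BlockCapacity)
        {
            blocks_.push_back(std::make_unique<Block>());
        }
        Block& block = *blocks_.back();
        block.entries[block.used++] = Entry(std::move(key), std::move(value));
    }

    // Total number of stored entries: all earlier blocks are full.
    std::size_t size() const
    {
        if (blocks_.empty())
        {
            return 0;
        }
        return (blocks_.size() - 1) * BlockCapacity + blocks_.back()->used;
    }

    std::size_t blockCount() const { return blocks_.size(); }

    // Entries are addressed by (block index, offset within block).
    // The caller must pass an index smaller than size().
    const Entry& at(std::size_t index) const
    {
        return blocks_[index / BlockCapacity]->entries[index % BlockCapacity];
    }

private:
    struct Block
    {
        std::array<Entry, BlockCapacity> entries{};
        std::size_t used = 0;
    };

    std::vector<std::unique_ptr<Block>> blocks_;
};
```

Inserting ten entries into a `ToyBlockStore<std::string, int, 4>` allocates three blocks. Because existing blocks never move or resize, separate blocks could be handed to different workers without their memory overlapping, which is one plausible reason to prefer fixed-size units for parallel access.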
Learn More
For a deeper understanding:
- Core Concepts - Architecture and design principles
- Examples - More practical examples