Let’s offload a simple sorting algorithm.
Before diving into the example, we recommend reviewing the SDK workflow and Get Started pages.
Hello Sort Example
The ‘Hello Sort’ example application shows how to offload a simple sorting algorithm with the SDK by sorting arrays of integers in ascending order. It walks through setting up, building, and running the application so that you can see the basic structure and workflow of an SDK-based application.
1. Build Your Compute Kernel
- Prepare source code to offload
The mu_sort.cpp file is implemented using the std::sort algorithm. You can customize the algorithm in this file as needed.
#include <algorithm>
#include "mu/mu.hpp"

void sort_with_tensor(int* arr, int size) {
  std::sort(arr, arr + size);
}

void sort_with_ptr(int* arr, int size) {
  auto taskIdx = mu::getTaskIdx();
  auto curArray = &arr[taskIdx * size];
  std::sort(curArray, curArray + size);
}

MU_KERNEL_ADD(sort_with_tensor)
MU_KERNEL_ADD(sort_with_ptr)
Note: MU_KERNEL_INIT must be included to initialize the kernel properly.
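To make the per-task slicing in sort_with_ptr easier to follow, here is a minimal host-only sketch in standard C++. It does not use the SDK: the taskIdx parameter stands in for mu::getTaskIdx(), and sortSlice and sortAllSlices are hypothetical names introduced only for illustration. The loop runs the tasks one after another, whereas the device is expected to run them in parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only stand-in for the body of sort_with_ptr (assumption: not SDK code).
// taskIdx replaces mu::getTaskIdx(), which is only available in a kernel.
void sortSlice(int* arr, int size, std::size_t taskIdx) {
    assert(arr != nullptr && size >= 0);
    // Task t owns the elements [t * size, (t + 1) * size).
    int* curArray = arr + taskIdx * static_cast<std::size_t>(size);
    std::sort(curArray, curArray + size);
}

// Hypothetical helper: run every task sequentially over one flat buffer.
void sortAllSlices(std::vector<int>& data, int size, std::size_t taskCount) {
    assert(data.size() == taskCount * static_cast<std::size_t>(size));
    for (std::size_t t = 0; t < taskCount; ++t) {
        sortSlice(data.data(), size, t);
    }
}
```

Each slice is sorted independently, so the buffer as a whole is not globally sorted afterwards; only the elements inside each task's slice are in ascending order.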
2. Create Your Host Application
- Prepare source code of host application
- Include API header.
#include "pxl/pxl.hpp"
- Configure the kernel path and arguments.
sortSize is the number of elements to process per task. testCount is the number of parallel tasks.
const int sortSize = 64;
const int testCount = 2048;
const char *filename = "mu_kernel/mu_kernel.mubin";
- Set up PXL runtime instances.
uint32_t deviceId = 0;
auto ctx = pxl::runtime::createContext(deviceId);
auto job = ctx->createJob();
auto module = pxl::createModule(filename);
job->load(module);
- Prepare map execution.
auto muFunc = module->createFunction("sort_with_tensor");
auto map = job->buildMap(muFunc, testCount);
- Allocate device memory. (CXL MEMORY)
int* data = reinterpret_cast<int*>(ctx->memAlloc(testCount * sortSize * sizeof(int)));
// initialize input data as needed
- Flush host cache (For CXL memory w/ CXL2.0 Host)
pxl::flushHostCache(data, testCount * sortSize * sizeof(int));
- Execute the kernel.
auto ret = map->execute(dataTensor, sizeTensor);
if (!ret) {
  fprintf(stderr, "Map Execute Failed\n");
  return;
}
if (!map->synchronize()) {
  fprintf(stderr, "Map Synchronize Failed\n");
  return;
}
You can find the full example source code in examples/sort/sort_with_tensor.cpp or examples/sort/sort_with_ptr.cpp.
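The allocation step sizes the buffer as testCount * sortSize * sizeof(int) and leaves input initialization to the reader. The sketch below shows one way both could look in standard C++. bufferBytes and fillRandom are hypothetical helpers, not SDK functions, and the seed and value range are arbitrary choices. With sortSize = 64, testCount = 2048, and a 4-byte int, the buffer is 131072 elements, or 524288 bytes (512 KiB).

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical helper: byte size of a buffer holding testCount slices of
// sortSize ints. Widening to std::size_t before multiplying avoids int
// overflow when the counts are large.
std::size_t bufferBytes(int sortSize, int testCount) {
    assert(sortSize >= 0 && testCount >= 0);
    return static_cast<std::size_t>(sortSize) *
           static_cast<std::size_t>(testCount) * sizeof(int);
}

// Hypothetical helper: fill count ints with reproducible pseudo-random
// values in [0, 999999]. A fixed seed makes runs comparable.
void fillRandom(int* data, std::size_t count, unsigned seed = 42) {
    assert(data != nullptr || count == 0);
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 999999);
    for (std::size_t i = 0; i < count; ++i) {
        data[i] = dist(gen);
    }
}
```

In the host application, fillRandom could be applied to the pointer returned by memAlloc before the host cache is flushed, assuming that memory is directly writable from the host as the steps above suggest.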
3. Build Your Application
You can build the application using one of the following methods:
- Using the build.sh script (Recommended)
The simplest way to build everything at once:
./build.sh
This script automatically builds both the MU kernel and host application in one step.
- Using CMake
For more control over the build process:
mkdir -p build
cd build
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo
ninja
ninja install
cd -
4. Run and Check
- Run executable
./sort_with_tensor
- Verify results
Check the output data to confirm that the offloading process works correctly. If any element is not sorted correctly, an error message will be displayed.
test done : 4698.65 ms
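The check described above can be sketched on the host in standard C++. This is an illustration, not the code from the shipped examples: firstUnsortedSlice is a hypothetical helper, and it assumes the sort_with_ptr layout in which task t owns the elements [t * sortSize, (t + 1) * sortSize).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns the index of the first slice that is not in
// ascending order, or taskCount when every slice is sorted.
std::size_t firstUnsortedSlice(const int* data, int sortSize,
                               std::size_t taskCount) {
    assert(data != nullptr && sortSize >= 0);
    for (std::size_t t = 0; t < taskCount; ++t) {
        const int* begin = data + t * static_cast<std::size_t>(sortSize);
        if (!std::is_sorted(begin, begin + sortSize)) {
            return t;  // slice t contains an out-of-order element
        }
    }
    return taskCount;
}
```

A host application could call this after synchronize() returns and print an error naming the offending slice whenever the result is smaller than testCount.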