Let’s offload a simple sorting algorithm.
Before diving into the example, it is recommended to review the SDK workflow and Get Started.
Hello Sort Example
The ‘Hello Sort’ example application demonstrates how to implement a simple sorting algorithm with the SDK by sorting arrays of integers in ascending order. It walks through setting up, building, and running the application so that you can see the basic structure and workflow of an SDK-based application.
1. Build Your Compute Kernel
- Prepare source code to offload
The mu_sort.cpp file is implemented using the std::sort algorithm. You can customize the algorithm in this file as needed.

    #include <algorithm>
    #include "mu/mu.hpp"

    void sort_with_map(int* arr, int size) {
        std::sort(arr, arr + size);
    }

    void sort_with_parallel(int* arr, int size) {
        auto taskIdx = mu::getTaskIdx();
        auto curArray = &arr[taskIdx * size];
        std::sort(curArray, curArray + size);
    }

    MU_KERNEL_ADD(sort_with_parallel)
    MU_KERNEL_ADD(sort_with_map)

Note: MU_KERNEL_INIT must be included to initialize the kernel properly.
- Build kernel
Use the Makefile in the mu_kernel directory.

    cd mu_kernel
    make

The mu_kernel.mubin file will be generated.
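
To see how the two kernels address their data without a device, the indexing can be emulated on the host. The sketch below is plain C++ with no SDK calls. The `taskIdx` parameter is a hypothetical stand-in for `mu::getTaskIdx()`, and the `_emulated` function names are illustrative. It also assumes that the map variant receives a pointer already positioned at its own slice, which the kernel source suggests but does not state.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Host-only stand-in for sort_with_map: sorts exactly the slice it is given.
void sort_with_map_emulated(int* arr, int size) {
    std::sort(arr, arr + size);
}

// Host-only stand-in for sort_with_parallel. taskIdx is a plain parameter
// here (hypothetical), replacing mu::getTaskIdx() on the device.
void sort_with_parallel_emulated(int* arr, int size, int taskIdx) {
    assert(size > 0 && taskIdx >= 0);
    int* curArray = &arr[taskIdx * size];  // start of this task's slice
    std::sort(curArray, curArray + size);
}

// Runs every task one after another; on the device they would run in parallel.
void run_all_tasks_emulated(std::vector<int>& data, int size) {
    const int taskCount = static_cast<int>(data.size()) / size;
    for (int t = 0; t < taskCount; ++t) {
        sort_with_parallel_emulated(data.data(), size, t);
    }
}
```

Each slice is sorted independently, so the buffer as a whole is not globally ordered: with a slice size of 3, the input {5, 4, 6, 1, 3, 2} becomes {4, 5, 6, 1, 2, 3}.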
2. Create Your Host Application
- Prepare source code of host application
- Include API header.
#include "pxl/pxl.hpp"
- Configure kernel path, and arguments.
sortSize is the number of elements to process per task. testCount is the number of parallel tasks.

    const int sortSize = 64;
    const int testCount = 2048;
    const char *filename = "mu_kernel/mu_kernel.mubin";

- Setup PXL runtime instances.
    uint32_t deviceId = 0;
    auto ctx = pxl::runtime::createContext(deviceId);
    auto job = ctx->createJob();
    auto module = pxl::createModule(filename);
    job->load(module);

- Prepare map execution.
    auto muFunc = module->createFunction("sort_with_map");
    auto map = job->buildMap(muFunc, testCount);

- Allocate device memory. (CXL MEMORY)
    int* data = reinterpret_cast<int*>(ctx->memAlloc(testCount * sortSize * sizeof(int)));
    // initialize input data as needed

- Flush host cache (For CXL memory w/ CXL2.0 Host)
pxl::flushHostCache(data, testCount * sortSize * sizeof(int));
- Execute the kernel.
    auto ret = map->execute(dataTensor, sizeTensor);
    if (ret) {
        map->synchronize();
    }

You can find the full example source code in examples/sort/map_sort.cpp or examples/sort/parallel_sort.cpp.
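
The "initialize input data as needed" step is left open in the walkthrough. Here is a minimal sketch of one way to fill the buffer and compute its size, using only the standard library. The helper name `fillRandom`, the seed, and the value range are illustrative choices and not part of the SDK. In the test it is exercised on an ordinary host buffer rather than memory returned by `memAlloc`.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

constexpr int sortSize = 64;     // elements per task (value from the example)
constexpr int testCount = 2048;  // number of tasks (value from the example)

// Total buffer size in bytes: testCount * sortSize * sizeof(int).
constexpr std::size_t bufferBytes =
    static_cast<std::size_t>(testCount) * sortSize * sizeof(int);

// Hypothetical helper: fills `count` ints with pseudo-random values.
// A fixed default seed keeps runs reproducible.
void fillRandom(int* data, std::size_t count, unsigned seed = 42) {
    assert(data != nullptr);
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 99999);
    for (std::size_t i = 0; i < count; ++i) {
        data[i] = dist(gen);
    }
}
```

With the device buffer from the allocation step, this would be called as `fillRandom(data, testCount * sortSize)` before flushing the host cache. On platforms where `int` is 4 bytes, `bufferBytes` is 524288 (512 KiB).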
- Build using CMake
a. Configure CMakeLists.txt: Configure project name, source files, and header files as needed.
b. Build:

    mkdir -p build
    cd build
    cmake .. -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo
    ninja
    ninja install
    cd -

- Build using Makefile
Alternatively, you can build without CMake using the Makefile:

    make

You can build both the kernel and the application using the build.sh script:

    bash ./build.sh

3. Run and Check
- Run executable
./run_sort
- Verify results
Check the output data to confirm that the offloading process works correctly. If any element is not sorted correctly, an error message will be displayed.
test done : 4698.65 ms
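
The verification step can be sketched as a per-slice check on the host. This is not the example's actual checking code, which is not shown here. The helper names `firstUnsortedTask` and `reportResult` and the error message text are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>

// Returns the index of the first task whose slice is not in ascending
// order, or -1 if every slice is sorted. Hypothetical helper, not SDK API.
int firstUnsortedTask(const int* data, int taskCount, int size) {
    assert(data != nullptr && taskCount >= 0 && size > 0);
    for (int t = 0; t < taskCount; ++t) {
        const int* begin = data + static_cast<std::size_t>(t) * size;
        if (!std::is_sorted(begin, begin + size)) {
            return t;
        }
    }
    return -1;
}

// Prints an error for the first failing slice; returns true on success.
bool reportResult(const int* data, int taskCount, int size) {
    const int bad = firstUnsortedTask(data, taskCount, size);
    if (bad >= 0) {
        std::fprintf(stderr, "error: slice %d is not sorted\n", bad);
        return false;
    }
    return true;
}
```

After `map->synchronize()` returns, a call such as `reportResult(data, testCount, sortSize)` would check all 2048 slices of 64 elements each.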