Let’s offload a simple sorting algorithm.
Before diving into the example, we recommend reviewing the SDK workflow and Get Started.

Hello Sort Example

The ‘Hello Sort’ example application demonstrates how to implement a simple sorting algorithm with the SDK by sorting arrays of integers in ascending order. It walks you through setting up, building, and running the application so that you understand the basic structure and workflow of an SDK-based application.

1. Build Your Compute Kernel

  1. Prepare source code to offload
    The mu_sort.cpp file implements the kernels using std::sort. You can customize the algorithm in this file as needed.
     #include <algorithm>
     #include "mu/mu.hpp"
        
     void sort_with_tensor(int* arr, int size)
     {
         std::sort(arr, arr + size);
     }
        
     void sort_with_ptr(int* arr, int size)
     {
         auto taskIdx = mu::getTaskIdx();
         auto curArray = &arr[taskIdx * size];
         std::sort(curArray, curArray + size);
     }
        
     MU_KERNEL_ADD(sort_with_tensor)
     MU_KERNEL_ADD(sort_with_ptr)
    

    Note: MU_KERNEL_INIT must be included to initialize the kernel properly.
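The per-task slicing in sort_with_ptr can be tried on the host without the SDK. The following is a minimal sketch, not the SDK's implementation: sortSegment and sortAllSegments are hypothetical names, the explicit taskIdx parameter stands in for the value the kernel reads from mu::getTaskIdx(), and a plain loop stands in for the parallel tasks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only stand-in for sort_with_ptr (hypothetical helper).
// taskIdx is passed explicitly instead of being read from mu::getTaskIdx().
void sortSegment(int* arr, int size, std::size_t taskIdx)
{
    // Each task owns the slice [taskIdx * size, taskIdx * size + size).
    int* curArray = &arr[taskIdx * static_cast<std::size_t>(size)];
    std::sort(curArray, curArray + size);
}

// Runs every task one after another (hypothetical helper).
// Assumption: data.size() is a multiple of size.
void sortAllSegments(std::vector<int>& data, int size)
{
    assert(size > 0);
    const std::size_t taskCount = data.size() / static_cast<std::size_t>(size);
    for (std::size_t taskIdx = 0; taskIdx < taskCount; ++taskIdx)
    {
        sortSegment(data.data(), size, taskIdx);
    }
}
```

Calling sortAllSegments(v, 3) on a vector holding {3, 1, 2, 9, 8, 7} sorts the two three-element slices independently. Each slice ends up in ascending order, but the buffer as a whole is not merged into one sorted sequence.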

2. Create Your Host Application

  1. Prepare source code of host application
    1. Include API header.
       #include "pxl/pxl.hpp"
      
    2. Configure the kernel path and arguments. sortSize is the number of elements to process per task, and testCount is the number of parallel tasks.
       const int sortSize = 64;
       const int testCount = 2048;
       const char *filename = "mu_kernel/mu_kernel.mubin";
      
    3. Set up the PXL runtime instances.
       uint32_t deviceId = 0;
       auto ctx = pxl::runtime::createContext(deviceId);
       auto job = ctx->createJob();
       auto module = pxl::createModule(filename);
       job->load(module);
      
    4. Prepare map execution.
       auto muFunc = module->createFunction("sort_with_tensor");
       auto map = job->buildMap(muFunc, testCount);
      
    5. Allocate device memory (CXL memory).
       int* data = reinterpret_cast<int*>(ctx->memAlloc(testCount * sortSize * sizeof(int)));
      

      Initialize the input data as needed.

    6. Flush the host cache (for CXL memory with a CXL 2.0 host).
       pxl::flushHostCache(data, testCount * sortSize * sizeof(int));
      
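    7. Execute the kernel. The construction of the dataTensor and sizeTensor arguments is not shown in this walkthrough; see the full example source.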
       auto ret = map->execute(dataTensor, sizeTensor);
       if (!ret)
       {
           fprintf(stderr, "Map Execute Failed\n");
           return;
       }
       if (!map->synchronize())
       {
           fprintf(stderr, "Map Synchronize Failed\n");
           return;
       }
      

      You can find the full example source code in examples/sort/sort_with_tensor.cpp or examples/sort/sort_with_ptr.cpp.
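The buffer sizing and the "initialize input data" step above can be sketched in plain C++. This is an illustration under assumptions, not code from the example sources: bufferBytes and fillDescending are hypothetical helpers, and any int buffer (including one returned by the device allocator) can be passed in. With sortSize = 64, testCount = 2048, and a 4-byte int, the buffer is 2048 * 64 * 4 = 524288 bytes (512 KiB).

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for taskCount segments of segmentSize ints (hypothetical helper).
constexpr std::size_t bufferBytes(std::size_t taskCount, std::size_t segmentSize)
{
    return taskCount * segmentSize * sizeof(int);
}

// Fills every segment with segmentSize, segmentSize - 1, ..., 1 so that each
// segment starts in reverse order and the effect of sorting is easy to see.
void fillDescending(int* data, std::size_t taskCount, std::size_t segmentSize)
{
    assert(data != nullptr);
    for (std::size_t task = 0; task < taskCount; ++task)
    {
        for (std::size_t i = 0; i < segmentSize; ++i)
        {
            data[task * segmentSize + i] = static_cast<int>(segmentSize - i);
        }
    }
}
```

In the host application, a call such as fillDescending(data, testCount, sortSize) would go between the allocation and the cache flush, so that the flush covers the freshly written input.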

3. Build Your Application

You can build the application using one of the following methods:

  1. Using the build.sh script (Recommended)
    The simplest way to build everything at once:
    ./build.sh
    

    This script automatically builds both the MU kernel and host application in one step.

  2. Using CMake
    For more control over the build process:
    mkdir -p build
    cd build
    cmake .. -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo
    ninja
    ninja install
    cd -
    

4. Run and Check

  1. Run executable
     ./sort_with_tensor
    
  2. Verify results
    Check the output data to confirm that the offloading process works correctly. If any element is not sorted correctly, an error message is displayed. A successful run prints output similar to:
     test done : 4698.65 ms
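One way to check the result on the host is to test every segment separately, because each task sorts only its own slice. The sketch below assumes this layout; firstUnsortedSegment is a hypothetical helper and is not taken from the example sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Returns the index of the first segment that is not in ascending order,
// or taskCount if every segment is sorted (hypothetical helper).
std::size_t firstUnsortedSegment(const int* data, std::size_t taskCount,
                                 std::size_t segmentSize)
{
    assert(data != nullptr);
    for (std::size_t task = 0; task < taskCount; ++task)
    {
        const int* begin = data + task * segmentSize;
        if (!std::is_sorted(begin, begin + segmentSize))
        {
            return task;
        }
    }
    return taskCount;
}
```

A return value equal to taskCount means all segments passed. Any smaller value identifies the first failing task, which is a useful starting point when debugging a customized kernel.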