Let’s offload a simple sorting algorithm.
Before diving into the example, review SDK workflow and Get Started first.

Hello Sort Example

The ‘Hello Sort’ example application demonstrates how to implement a simple sorting algorithm with the SDK by sorting arrays of integers in ascending order. It guides you through setting up, building, and running the application so that you understand the basic structure and workflow of an SDK-based application.

1. Build Your Compute Kernel

  1. Prepare source code to offload
    The mu_sort.cpp file implements the sort with std::sort. You can customize the algorithm in this file as needed.
     #include <algorithm>
     #include "mu/mu.hpp"
        
     void sort_with_map(int* arr, int size)
     {
         std::sort(arr, arr + size);
     }
        
     void sort_with_parallel(int* arr, int size)
     {
         auto taskIdx = mu::getTaskIdx();
         auto curArray = &arr[taskIdx * size];
         std::sort(curArray, curArray + size);
     }
        
     MU_KERNEL_ADD(sort_with_parallel)
     MU_KERNEL_ADD(sort_with_map)
    

    Note: MU_KERNEL_INIT must be included to initialize the kernel properly.

  2. Build kernel
    Use the Makefile in the mu_kernel directory.
     cd mu_kernel
     make
    

    The mu_kernel.mubin file will be generated.
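
The indexing in sort_with_parallel can be checked on the host before building the kernel. The following is a minimal sketch, not part of the SDK: sort_segment and emulate_parallel_sort are hypothetical helper names, and the task index is passed as a plain parameter here, whereas the kernel obtains it from mu::getTaskIdx(). It assumes the tasks are independent and that each task owns one contiguous segment of size elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side stand-in for sort_with_parallel (hypothetical helper).
// taskIdx is an explicit parameter here; the kernel reads it from
// mu::getTaskIdx() instead.
void sort_segment(int* arr, int size, int taskIdx)
{
    int* curArray = &arr[taskIdx * size];
    std::sort(curArray, curArray + size);
}

// Runs every task one after another on the host (hypothetical helper).
// Assumption: on the device the same tasks are independent of each other.
void emulate_parallel_sort(std::vector<int>& data, int size, int taskCount)
{
    // The buffer must hold exactly taskCount segments of size elements.
    assert(data.size() == static_cast<std::size_t>(size) * taskCount);
    for (int taskIdx = 0; taskIdx < taskCount; ++taskIdx)
    {
        sort_segment(data.data(), size, taskIdx);
    }
}
```

Each segment is sorted on its own, so the buffer as a whole is not globally sorted: with size 3 and two tasks, the input {9, 8, 7, 3, 2, 1} becomes {7, 8, 9, 1, 2, 3}.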

2. Create Your Host Application

  1. Prepare source code of host application
    1. Include API header.
       #include "pxl/pxl.hpp"
      
    2. Configure the kernel path and arguments. sortSize is the number of elements each task processes; testCount is the number of parallel tasks.
       const int sortSize = 64;
       const int testCount = 2048;
       const char *filename = "mu_kernel/mu_kernel.mubin";
      
    3. Set up PXL runtime instances.
       uint32_t deviceId = 0;
       auto ctx = pxl::runtime::createContext(deviceId);
       auto job = ctx->createJob();
       auto module = pxl::createModule(filename);
       job->load(module);
      
    4. Prepare map execution.
       auto muFunc = module->createFunction("sort_with_map");
       auto map = job->buildMap(muFunc, testCount);
      
    5. Allocate device memory (CXL memory).
       int* data = reinterpret_cast<int*>(ctx->memAlloc(testCount * sortSize * sizeof(int)));
      

      Initialize the input data as needed.

    6. Flush the host cache (for CXL memory with a CXL 2.0 host).
       pxl::flushHostCache(data, testCount * sortSize * sizeof(int));
      
    7. Execute the kernel.
       auto ret = map->execute(dataTensor, sizeTensor);
       if (ret)
       {
           map->synchronize();
       }
      

      You can find the full example source code in examples/sort/map_sort.cpp or examples/sort/parallel_sort.cpp.
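
The "initialize the input data as needed" step between allocation and the cache flush is left open above. One possible approach, shown as a sketch only, is to fill the buffer with reproducible pseudo-random values. fill_random is a hypothetical helper and not part of the PXL API; it works on any int buffer, so it can be tried with an ordinary array before pointing it at device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Fills count ints with pseudo-random values in [0, maxValue]
// (hypothetical helper, not part of the SDK).
// A fixed seed keeps runs reproducible, which makes results easier to compare.
void fill_random(int* data, std::size_t count, int maxValue, unsigned seed)
{
    assert(data != nullptr && maxValue >= 0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, maxValue);
    for (std::size_t i = 0; i < count; ++i)
    {
        data[i] = dist(rng);
    }
}
```

In the host application, the pointer returned by memAlloc and a count of testCount * sortSize would be passed in. With sortSize = 64 and testCount = 2048 that is 131072 elements, or 512 KiB assuming a 4-byte int. Filling the buffer before the flush in step 6 matches the order of the steps above.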

  2. Build using CMake
    a. Configure CMakeLists.txt:
    Configure project name, source files, and header files as needed.

    b. Build:

     mkdir -p build
     cd build
     cmake .. -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo
     ninja
     ninja install
     cd -
    
  3. Build using Makefile
    Alternatively, you can build without CMake using the Makefile:
     make
    

You can build both the kernel and the application using the build.sh script:
     bash ./build.sh

3. Run and Check

  1. Run executable
     ./run_sort
    
  2. Verify results
    Check the output data to confirm that the offloading process works correctly. If any element is not sorted correctly, an error message will be displayed.
     test done : 4698.65 ms
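
The check described above can be sketched as a per-segment scan. This is an illustration rather than the code shipped in the example sources: first_unsorted_segment is a hypothetical helper, and it assumes the result buffer is laid out as taskCount contiguous segments of size elements each, as in sort_with_parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Returns the index of the first segment that is not in ascending order,
// or -1 if every segment is sorted (hypothetical helper, not part of the SDK).
int first_unsorted_segment(const int* data, int size, int taskCount)
{
    assert(data != nullptr && size >= 0 && taskCount >= 0);
    for (int task = 0; task < taskCount; ++task)
    {
        // Each task owns one contiguous segment of size elements.
        const int* begin = data + static_cast<std::size_t>(task) * size;
        if (!std::is_sorted(begin, begin + size))
        {
            return task;
        }
    }
    return -1;
}
```

A caller could print an error naming the returned segment index when the result is not -1. Segments are checked independently because the buffer as a whole is not expected to be globally sorted.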