Parallel Xceleration Library

The Parallel Xceleration Library (PXL) is a key component of the XCENA SDK that provides optimized communication between the host system and the device. By leveraging PXL, developers can unlock the full potential of in-memory computing technology through a robust and efficient API.

Key Capabilities of PXL

  1. Manage XCENA device resources.
    1. Allocate/Free device memory resources.
    2. Allocate/Free device compute resources.
    3. Check the device’s current remaining resources.
  2. Offload user applications to the XCENA devices.
    1. Load a binary file into the device.
    2. Read/Write data to the device using DMA.
    3. Launch applications on the device (Map and Parallel).

PXL Object Overview

Please refer to the PXL C++ API docs for more details.

Context

The Context object manages device memory resources and holds overall information about the device.

  1. Allocate / Free device memory resources.
    auto ptr = context->memAlloc(size);
    context->memFree(ptr);
    
  2. Check device resources.
    printf("Remain memory in device %d = %d\n", deviceId, context->remainMemorySize());
    printf("Remain sub in device %d = %d\n", deviceId, context->remainSub());
    
  3. Allocate / Free device compute resources by creating a Job object.
    auto numSub = 4;
    auto job = context->createJob(numSub);
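
Putting the calls above together, a minimal Context workflow might look like the sketch below. The factory pxl::createContext(deviceId) is an assumption made for illustration; the actual way to obtain a Context may differ, so please check the PXL C++ API docs.

auto deviceId = 0;
auto context = pxl::createContext(deviceId); // hypothetical factory

// Check remaining device resources before allocating.
printf("Remain memory in device %d = %d\n", deviceId, context->remainMemorySize());
printf("Remain sub in device %d = %d\n", deviceId, context->remainSub());

// Allocate device memory and a group of Subs (compute units).
auto ptr = context->memAlloc(1024 * sizeof(int));
auto job = context->createJob(4);

// ... offload work through the Job object (see the next section) ...

context->memFree(ptr);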
    

Job

The Job object manages device compute resources and offloads user applications to the device. A Job is a group of Subs, each of which is a compute unit that executes the user application.

  1. Load a binary file into the device.
     const char* filename = "mu_kernel.mubin";
     auto muModule = pxl::createModule(filename);
     job->load(muModule);
    
  2. Allocate / Free device compute resources.
     // Allocates additional Subs to the current Job.
     auto numSub = 2;
     auto ret = job->subAlloc(numSub); // Returns false if there are no remaining Subs to assign to the current Job.
    
  3. Create Map and Parallel object.
     auto testCount = 1024 * 1024;
     auto muFunc = muModule->createFunction("mu_main"); // Name of the MU kernel's main function.
     auto map = job->buildMap(muFunc, testCount);
     auto parallel = job->buildParallel(muFunc, testCount);
    

Map

The Map object is responsible for executing a map operation on a given job. It provides methods to set input and output arguments, set the batch size, and synchronize the execution. The map operation can be executed by calling the execute method. map->execute(...) takes up to 10 arguments, each of which must be a C++ fundamental type or a Tensor object. The Tensor object defines the type and shape of an array so that Map can distribute the array elements appropriately to each MU. The Map operation divides each array argument into equal chunks and distributes the sliced chunks to the MU threads.
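
For illustration, a map launch might look like the sketch below. buildMap and execute come from the sections above; the Tensor construction and the sync() call are assumptions, so please check the PXL C++ API docs for the exact signatures.

auto testCount = 1024 * 1024;
auto in  = context->memAlloc(testCount * sizeof(int));
auto out = context->memAlloc(testCount * sizeof(int));

// Hypothetical Tensor construction: describes the element type and shape
// so that Map can slice the arrays evenly across MU threads.
pxl::Tensor inTensor(in, pxl::Type::Int32, {testCount});
pxl::Tensor outTensor(out, pxl::Type::Int32, {testCount});

auto map = job->buildMap(muFunc, testCount);
map->execute(inTensor, outTensor); // each MU thread receives its own chunk of both arrays
map->sync();                       // assumed synchronization method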

Parallel

The Parallel object is responsible for executing a parallel operation on a given job. It provides methods to set the batch size and synchronize the execution. The parallel operation can be executed by calling the execute method. parallel->execute(...) takes up to 10 arguments, each of which must be a C++ fundamental type or a device memory pointer allocated from the Context object. The Parallel operation distributes identical arguments to all MU threads belonging to the assigned Subs. In the MU kernel code, the user needs to reference the appropriate memory address using the task index obtained by calling the mu::getTaskIdx() function.
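
For illustration, a parallel launch and the matching MU kernel indexing might look like the sketch below. Only buildParallel, execute, and mu::getTaskIdx() appear in this document; the argument list, the sync() call, and the mu_main signature are assumptions.

// Host side: every MU thread in the assigned Subs receives the same arguments.
auto testCount = 1024 * 1024;
auto data = context->memAlloc(testCount * sizeof(int));

auto parallel = job->buildParallel(muFunc, testCount);
parallel->execute(data, testCount); // identical pointer and count for all MU threads
parallel->sync();                   // assumed synchronization method

On the MU kernel side, each thread selects its own element with the task index, assuming the task index ranges over the count passed to buildParallel:

// MU kernel (sketch): index into the shared buffer by task index.
void mu_main(int* data, int count) {
    int idx = mu::getTaskIdx(); // per-thread task index
    if (idx < count) {
        data[idx] += 1;         // each thread updates its own element
    }
}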

Module and Function

The Module object represents an MU kernel binary compiled with the provided MU compiler. The Function object identifies the entry function that is invoked when the user launches an offloaded job on the device. The Module provides an interface for accessing the functions within a module, as well as information about the module’s binary path and the number of functions it contains.


PXL Example Code

const char* filename = "mu_kernel.mubin";
auto muModule = pxl::createModule(filename);
auto sortFunc = muModule->createFunction("bubbleSort");
auto searchFunc = muModule->createFunction("binarySearch");
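
Building on the snippets above, a fuller end-to-end sketch might look like the following. The context factory, Tensor construction, and sync() call are assumptions, and the host-device data transfer (the DMA read/write capability listed earlier) is only indicated by comments.

auto deviceId = 0;
auto context = pxl::createContext(deviceId); // hypothetical factory; see the PXL C++ API docs

auto numSub = 4;
auto job = context->createJob(numSub);
job->load(muModule);

auto testCount = 1024 * 1024;
auto data = context->memAlloc(testCount * sizeof(int));
// ... write input data to the device via the DMA read/write API ...

pxl::Tensor dataTensor(data, pxl::Type::Int32, {testCount}); // hypothetical Tensor construction
auto map = job->buildMap(sortFunc, testCount);
map->execute(dataTensor);
map->sync(); // assumed synchronization method

// ... read the results back via the DMA read/write API, then release resources ...
context->memFree(data);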