K-Nearest Neighbors (KNN) Implementation Tutorial with PXL

This tutorial demonstrates how to classify data with the K-Nearest Neighbors (KNN) algorithm using the PXL framework. We cover a CPU-based host implementation as well as parallelized implementations that use PXL’s Map and Parallel execution modes.

Prerequisites:
Before starting, it is recommended to review the following tutorials:

  • SDK Workflow: Provides an overview of the SDK’s architecture and workflow.
  • Get Started: Guides you through setting up your development environment.
  • Hello Sort: Introduces basic concepts using a simple sorting example.
  • Emulator on Docker: Guides you through setting up the emulator on Docker.

Overview

The KNN algorithm works by:

  1. Finding the K nearest neighbors to a target point.
  2. Taking a majority vote of their labels to classify the target.

Our implementation includes three main approaches:

  • Host Implementation: Uses the CPU for computation.
  • Map Implementation: Leverages PXL’s Map execution mode for parallel processing.
  • Parallel Implementation: Distributes computation tasks across multiple parallel resources.
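
For reference, the two steps above can be implemented directly on the CPU in a few lines. The sketch below is illustrative only: the tutorial’s actual host path goes through the getTopK helper shown later, and host_classify is a hypothetical name.

   #include <algorithm>
   #include <map>
   #include <utility>
   #include <vector>

   // Hypothetical CPU reference: rank all points by squared Euclidean
   // distance to the target, then majority-vote the labels of the k nearest.
   int host_classify(const std::vector<std::vector<float>>& data,
                     const std::vector<int>& label,
                     const std::vector<float>& target, int k)
   {
       std::vector<std::pair<float, int>> distances(data.size());
       for (size_t i = 0; i < data.size(); i++)
       {
           float d = 0;
           for (size_t j = 0; j < target.size(); j++)
               d += (data[i][j] - target[j]) * (data[i][j] - target[j]);
           distances[i] = {d, label[i]}; // pair each distance with its label
       }
       std::sort(distances.begin(), distances.end());

       std::map<int, int> votes; // label -> count among the k nearest
       for (int i = 0; i < k && i < static_cast<int>(distances.size()); i++)
           votes[distances[i].second]++;

       int best = -1, bestCount = 0;
       for (const auto& [lbl, count] : votes)
           if (count > bestCount) { best = lbl; bestCount = count; }
       return best;
   }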

Important Notes

  • Ensure that QEMU is started with the --no-kvm option when running the application in a QEMU environment to avoid issues.

     cd emulator
     ./run.sh --no-kvm

Detailed Implementation

1. Build Your Compute Kernel

  1. Write Distance Calculation Functions

    The kernel code, implemented in mu_knn.cpp, calculates distances using two metrics:

    • Euclidean Distance
    • Dot Product
     #include <algorithm>
     #include <utility>
     #include <vector>
    
     #include "mu/mu.hpp"
     #include "mu/vector.hpp" //for vector engine
     #include "mu_knn_config.hpp"
    
     // Calculate Euclidean Distance
     void calculate_euclidean_distance(float* point0, float* point1, int pointSize, float* result)
     {
         float distance = 0;
         for (int i = 0; i < pointSize; i++)
         {
             distance += (point0[i] - point1[i]) * (point0[i] - point1[i]);
         }
         *result = distance;
     }
    
     // Calculate Dot Product
     void calculate_dot_product(float* point0, float* point1, int pointSize, float* result)
     {
         float distance = 0;
         for (int i = 0; i < pointSize; i++)
         {
             distance += point0[i] * point1[i];
         }
         *result = distance;
     }
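
    Note that calculate_euclidean_distance returns the squared distance (the square root is omitted), which is sufficient for ranking neighbors because sqrt is monotonic. A quick sanity check, assuming the two functions above:

     float a[2] = {0.0f, 0.0f};
     float b[2] = {3.0f, 4.0f};
     float d = 0.0f;
     calculate_euclidean_distance(a, b, 2, &d);
     // d == 25.0f, the squared distance between (0,0) and (3,4);
     // taking the square root would give 5.0f but is unnecessary for ranking.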
    
  2. Implement KNN Functions

    Note: Using user-defined types (e.g., struct or class) as a parameter type is currently not supported for map. It is, however, supported for the parallel implementation.

    1. KNN with Map

       void knn_with_map(float* data, int* label, float* target, int k, int dim, int numPoints, float* result, int* labelResult, bool useDotProduct)
       {
           std::vector<std::pair<float, int>> distances(numPoints);
           for (int i = 0; i < numPoints; i++)
           {
               float distance = 0;
               if (useDotProduct)
               {
                   calculate_dot_product(data + i * dim, target, dim, &distance);
               }
               else
               {
                   calculate_euclidean_distance(data + i * dim, target, dim, &distance);
               }
            distances[i] = std::make_pair(distance, i * dim); // store the flat offset of point i; divided by dim later to recover the label index
           }
      
           std::sort(distances.begin(), distances.end());
      
           for (int i = 0; i < k; i++)
           {
               for (int j = 0; j < dim; j++)
               {
                   result[i * dim + j] = data[distances[i].second + j];
               }
      
               labelResult[i] = label[distances[i].second / dim];
           }
       }
      
    2. KNN with Parallel

       void knn_with_parallel(float* data, int* label, float* target, const knnConfig& config, int numTasks, float* result, int* labelResult)
       {
           auto taskIdx = mu::getTaskIdx();
           int chunkSize = config.dataCount / numTasks;
           int start = taskIdx * chunkSize;
           int end = (taskIdx == numTasks - 1) ? config.dataCount : start + chunkSize;
      
           std::vector<std::pair<float, int>> distances(end - start);
      
           for (int i = start; i < end; i++)
           {
               float distance = 0;
               if (config.useDotProduct)
               {
                   calculate_dot_product(data + i * config.dimension, target, config.dimension, &distance);
               }
               else
               {
                   calculate_euclidean_distance(data + i * config.dimension, target, config.dimension, &distance);
               }
      
               distances[i - start] = std::make_pair(distance, i * config.dimension);
           }
      
           std::sort(distances.begin(), distances.end());
      
           for (int i = 0; i < config.k; i++)
           {
               for (int j = 0; j < config.dimension; j++)
               {
                   int dataIndex = distances[i].second + j;
                   result[taskIdx * config.k * config.dimension + i * config.dimension + j] = data[dataIndex];
               }
               labelResult[taskIdx * config.k + i] = label[distances[i].second / config.dimension];
           }
       }
      
        // Register the kernel entry points with the runtime
        MU_KERNEL_ADD(knn_with_map)
        MU_KERNEL_ADD(knn_with_parallel)
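
    The chunk arithmetic at the top of knn_with_parallel assigns any remainder points to the last task. For example, with config.dataCount = 10 and numTasks = 4, chunkSize = 2 and the tasks process points [0, 2), [2, 4), [4, 6), and [6, 10). A standalone check of that arithmetic (the values here are illustrative only):

      #include <cstdio>

      int main()
      {
          const int dataCount = 10, numTasks = 4;     // example values
          const int chunkSize = dataCount / numTasks; // 2
          for (int taskIdx = 0; taskIdx < numTasks; taskIdx++)
          {
              int start = taskIdx * chunkSize;
              int end = (taskIdx == numTasks - 1) ? dataCount : start + chunkSize;
              std::printf("task %d: points [%d, %d)\n", taskIdx, start, end);
          }
          return 0;
      }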
      

2. Create Your Host Application

  1. Setup PXL Runtime Instances

    You can initialize PXL runtime instances as shown below:

     uint32_t deviceId = 0;
     auto ctx = pxl::runtime::createContext(deviceId);
     auto job = ctx->createJob();
     auto module = pxl::createModule(filename);
     job->load(module);
    

    Alternatively, use a custom PxlResources struct to manage the lifecycle of PXL resources. The PxlResources struct is implemented in pxl_helper.hpp.

      struct PxlResources {
          pxl::runtime::Context* context = nullptr;
          pxl::runtime::Job* job = nullptr;
          pxl::runtime::Map* map = nullptr;
          pxl::runtime::Parallel* parallel = nullptr;
    
         ~PxlResources() {
             if (context) {
                 if (job) {
                     context->destroyJob(job);
                 }
                 pxl::runtime::destroyContext(context);
             }
         }
     };
    
     int main(int argc, char* argv[])
     {
         const uint32_t numTasks = 4;
         auto pxlResources = initializePxl(numTasks, MAP); // or PARALLEL
    
         ... 
     }
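
     The initializePxl helper used above is not shown in this tutorial. A minimal sketch of what it might look like follows; the ExecutionMode enum and the createMap/createParallel calls are assumptions for illustration, since only createContext, createJob, createModule, and job->load appear in the snippets above.

      enum ExecutionMode { MAP, PARALLEL };

      // Hypothetical sketch of initializePxl; the real helper may differ.
      // createMap/createParallel are assumed names, not confirmed PXL API.
      std::unique_ptr<PxlResources> initializePxl(uint32_t numTasks, ExecutionMode mode)
      {
          auto resources = std::make_unique<PxlResources>();
          resources->context = pxl::runtime::createContext(/*deviceId=*/0);
          resources->job = resources->context->createJob();

          const std::string filename = "mu_knn_module"; // hypothetical path to the compiled kernel module
          auto module = pxl::createModule(filename);
          resources->job->load(module);

          if (mode == MAP)
              resources->map = resources->job->createMap("knn_with_map", numTasks);                // assumed API
          else
              resources->parallel = resources->job->createParallel("knn_with_parallel", numTasks); // assumed API

          return resources;
      }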
    
  2. Setup Data

     1. Reading Data and Configuring KNN Parameters

       First, read the data from the input files using helper functions, which are implemented in helper.hpp.

       std::string dataFilename = "knn_data/dim_5/data.txt";
       std::string targetFilename = "knn_data/dim_5/targets.txt";
      
       auto [data, label] = readData<float>(dataFilename);
       auto target = readTarget<float>(targetFilename);
      
      
       const uint32_t k = 3;
       const bool useDotProduct = false;
       const uint32_t dimension = data[0].size();
       const uint32_t dataCount = data.size();
       knnConfig config = {k, dimension, dataCount, useDotProduct};
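
       The knnConfig struct itself is defined in mu_knn_config.hpp and is not reproduced in this tutorial. From the aggregate initializer above and the fields the kernels access (config.k, config.dimension, config.dataCount, config.useDotProduct), it presumably looks roughly like the following sketch:

        #include <cstdint>

        // Inferred sketch of knnConfig; the actual definition in
        // mu_knn_config.hpp may differ.
        struct knnConfig
        {
            uint32_t k;          // number of nearest neighbors to return
            uint32_t dimension;  // coordinates per point
            uint32_t dataCount;  // total number of data points
            bool useDotProduct;  // true: dot product, false: squared Euclidean distance
        };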
      
    2. Allocating Device Memory

      You can allocate device memory in a manner similar to that demonstrated in the Hello Sort tutorial. Here we use the helper function allocatePxlMemory together with a custom deleter, PxlMemoryDeleter, so that memory allocated by PXL is automatically freed when it is no longer needed. allocatePxlMemory is a template that returns a PxlUniquePtr, a std::unique_ptr whose custom deleter releases the memory through the PXL runtime’s memFree function.

       template <typename T>
       struct PxlMemoryDeleter
       {
           pxl::runtime::Context* context;
           PxlMemoryDeleter(pxl::runtime::Context* ctx)
               : context(ctx)
           {
               if (!ctx)
                   throw std::invalid_argument("Null context provided to PxlMemoryDeleter");
           }
           void operator()(T* ptr)
           {
               if (ptr && context)
                   context->memFree(ptr);
           }
       };
      
       template <typename T>
       using PxlUniquePtr = std::unique_ptr<T, PxlMemoryDeleter<T>>;
      
       template <typename T>
       PxlUniquePtr<T> allocatePxlMemory(pxl::runtime::Context* context, size_t count)
       {
           T* ptr = reinterpret_cast<T*>(context->memAlloc(count * sizeof(T)));
           if (!ptr)
           {
               throw std::runtime_error("Failed to allocate PXL memory");
           }
           return PxlUniquePtr<T>(ptr, PxlMemoryDeleter<T>(context));
       }
      

      Allocate memory for the data and labels:

       int main(int argc, char* argv[])
       {
           ...
      
           auto muData = allocatePxlMemory<float>(pxlResources->context, dataCount * dimension);
           auto dataTensor = pxl::runtime::Tensor<float>(
               muData.get(), 
               {static_cast<size_t>(numTasks), (dataCount * dimension) / static_cast<size_t>(numTasks)}
            ); // tensor shape for Map mode: one chunk (row) per task
      
           auto muLabel = allocatePxlMemory<int>(pxlResources->context, dataCount);
           auto labelTensor = pxl::runtime::Tensor<int>(
               muLabel.get(), 
               {static_cast<size_t>(numTasks), (dataCount) / static_cast<size_t>(numTasks)}
            ); // tensor shape for Map mode: one chunk (row) per task
      
           ...
              
      
     3. Writing Data to Device Memory

            // Copy each point contiguously: point i occupies
            // muData[i * dimension .. i * dimension + dimension),
            // matching the data + i * dim indexing in the kernels.
            for (size_t i = 0; i < dataCount; i++)
           {
               std::memcpy(muData.get() + i * dimension, data[i].data(), dimension * sizeof(float));
           }
      
           std::memcpy(muLabel.get(), label.data(), dataCount * sizeof(int));
      
           ...
       }
      
  3. Run KNN

     // Iterate over each target
     for (size_t i = 0; i < target.size(); i++)
     {
    
         std::vector<std::vector<float>> mapResult(numTasks * config.k, std::vector<float>(config.dimension));
         std::vector<int> labelResult(numTasks * config.k);
    
          knnMap(pxlResources->context, pxlResources->map, dataTensor, labelTensor, target[i], config, numTasks, mapResult, labelResult); // Map mode
          knnParallel(pxlResources->context, pxlResources->parallel, muData.get(), muLabel.get(), target[i], config, numTasks, mapResult, labelResult); // Parallel mode (a given build uses one or the other)
    
         // Validate and process results
         ...
     }
    
     1. Map

       The knnMap function executes the KNN algorithm using the PXL framework’s Map mode. It:

       • Prepares the necessary tensors (target tensor, result tensor, and label tensor).
       • Executes the Map operation via the map->execute function.
       • Converts the results into a format that can be returned to the host.
       void knnMap(pxl::runtime::Context* context, pxl::runtime::Map* map, pxl::runtime::Tensor<float> dataTensor,
                   pxl::runtime::Tensor<int> labelTensor, const std::vector<float>& target,
                   const knnConfig& config, int numTasks,
                   std::vector<std::vector<float>>& result, std::vector<int>& labelResult)
       {
            // Allocate and prepare the target tensor. Target memory is allocated once per task because every task needs the same target
           auto muTarget = allocatePxlMemory<float>(context, numTasks * config.dimension);
           auto targetTensor = pxl::runtime::Tensor<float>(muTarget.get(), {static_cast<size_t>(numTasks), config.dimension});
           for (int j = 0; j < numTasks; j++)
           {
               std::memcpy(muTarget.get() + j * config.dimension, target.data(), config.dimension * sizeof(float));
           }
      
           // Allocate result tensor
           auto muResult = allocatePxlMemory<float>(context, numTasks * config.dimension * config.k);
           auto resultTensor = pxl::runtime::Tensor<float>(muResult.get(),
                                                           {static_cast<size_t>(numTasks), config.dimension * static_cast<size_t>(config.k)});
      
           // Allocate label result tensor
           auto muLabelResult = allocatePxlMemory<int>(context, numTasks * config.k);
           auto labelResultTensor = pxl::runtime::Tensor<int>(muLabelResult.get(),
                                                           {static_cast<size_t>(numTasks), static_cast<size_t>(config.k)});
      
           // Execute the map operation
           if (!map->execute(dataTensor, labelTensor, targetTensor, config.k, config.dimension, config.dataCount / numTasks, resultTensor, labelResultTensor, config.useDotProduct))
           {
               throw std::runtime_error("Map Execute Failed");
           }
      
           map->synchronize();
      
           // Convert result to return format
           for (int i = 0; i < numTasks * config.k; i++)
           {
               for (int j = 0; j < config.dimension; j++)
               {
                   result[i][j] = muResult.get()[i * config.dimension + j];
               }
               labelResult[i] = muLabelResult.get()[i];
           }
       }
      

       Note: Map automatically divides its input tensors into chunks along the first dimension and hands one chunk to each task, so the target tensor is divided into chunks as well. This is why the loop above writes a copy of the target into every row of targetTensor: each task receives one row as its target. For example, with numTasks = 4, dataCount = 1000, and dimension = 5, dataTensor has shape {4, 1250} (250 points per task) and targetTensor has shape {4, 5}, one copy of the target per task.

     2. Parallel

       The knnParallel function executes the KNN algorithm using the PXL framework’s Parallel mode, dividing the workload among multiple tasks to achieve efficient parallel computation. Unlike Map mode, which chunks tensors automatically, the Parallel kernel receives raw pointers and computes its own chunk boundaries from mu::getTaskIdx(). The function:

       • Allocates and prepares the necessary memory for the target.
       • Executes the Parallel operation via the parallel->execute function.
       • Converts the results into a format that can be returned to the host.
       void knnParallel(pxl::runtime::Context* context, pxl::runtime::Parallel* parallel, float* data,
                       int* label, const std::vector<float>& target,
                       const knnConfig& config, int numTasks,
                       std::vector<std::vector<float>>& result, std::vector<int>& labelResult)
       {
            // Allocate target memory and copy the target point into it once; unlike Map mode, a single copy is shared by all tasks
           auto muTarget = allocatePxlMemory<float>(context, config.dimension);
           std::memcpy(muTarget.get(), target.data(), config.dimension * sizeof(float));
      
           // Allocate result memory
           auto muResult = allocatePxlMemory<float>(context, numTasks * config.dimension * config.k);
      
           // Allocate label result memory
           auto muLabelResult = allocatePxlMemory<int>(context, numTasks * config.k);
      
           // Execute the parallel operation
           if (!parallel->execute(data, label, muTarget.get(), config, numTasks, muResult.get(), muLabelResult.get()))
           {
               throw std::runtime_error("Parallel Execute Failed");
           }
      
           parallel->synchronize();
      
           // Convert result to return format
           for (int i = 0; i < numTasks * config.k; i++)
           {
               for (int j = 0; j < config.dimension; j++)
               {
                   result[i][j] = muResult.get()[i * config.dimension + j];
               }
               labelResult[i] = muLabelResult.get()[i];
           }
       }
      
  4. Validate and Process Results

    Once the KNN algorithm has been executed in Map mode, the results need to be:

    1. Validated against the CPU-based host results.
    2. Processed for errors.
    3. Displayed or logged as the final classified label.

     The following code handles these tasks:

     • Validation: Compare the mapResult and labelResult from Map mode against the hostTopKResult and hostMajorityLabel computed on the host. Re-ranking the numTasks * config.k per-task candidates with getTopK recovers the global top k, because each task’s local top k contains every global top-k point that falls within its chunk.
    • Error Handling: Log discrepancies using printErrorDetails.
    • Output: Print the final classified label to the console.
     // Get top k result from whole data
     auto [hostTopKResult, hostMajorityLabel] = getTopK(data, label, target[i], config.k, config.dimension, dataCount, config.useDotProduct);
    
     // Get top k from kernel result
     auto [mapTopKResult, mapMajorityLabel] = getTopK(mapResult, labelResult, target[i], config.k, config.dimension, numTasks * config.k, config.useDotProduct);
    
     if (mapTopKResult != hostTopKResult || mapMajorityLabel != hostMajorityLabel)
     {
         printErrorDetails<float>(directoryName, target[i], hostTopKResult, mapTopKResult, hostMajorityLabel, mapMajorityLabel, k);
     }
     else
     {
         std::cout << "Classified label : " << mapMajorityLabel[0] << std::endl;
     }
    

3. Build and Run

  1. Build

     Build the application using the build.sh script.

     ./build.sh
    
  2. Run the Executables and Check the Results

     ./map_knn
     ./parallel_knn
    
     Classified label : 5