Workloads #
As of now, BenchPilot only supports the following containerized workloads:
Name | Description | Specific Configuration Parameters |
---|---|---|
marketing-campaign | A streaming distributed workload that features a data processing pipeline with multiple, diverse steps emulating insight extraction from marketing campaigns. The workload utilizes technologies such as Kafka and Redis. | |
mlperf | An inference workload that includes tasks such as image classification and object detection. | |
db | A NoSQL database workload that continuously executes CRUD operations (Create, Read, Update, Delete). | |
simple | A set of simple stressors, each targeting a specific resource. Underneath, it uses the Linux stress command or iPerf3 (a brief sketch follows below). | |
It’s important to note that BenchPilot can be easily extended to add new workloads.
To learn how to extend BenchPilot, check out this section.
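For context, the simple workload listed above essentially wraps resource stressors; a minimal sketch of the kind of commands involved is shown below, assuming the Linux stress tool and a reachable iPerf3 server. The flags, target host, and durations are placeholders, not BenchPilot's actual defaults.

```python
# Minimal sketch of the commands the "simple" workload wraps underneath.
# Flags, durations, and the iPerf3 server host are placeholders, not BenchPilot defaults.
import subprocess

# CPU stressor: spin 4 workers for 60 seconds with the Linux `stress` tool
subprocess.run(["stress", "--cpu", "4", "--timeout", "60s"], check=True)

# Network stressor: drive traffic towards an iPerf3 server for 60 seconds
subprocess.run(["iperf3", "-c", "iperf-server.local", "-t", "60"], check=True)
```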
Detailed Workload Information #
Streaming Analytic Workload / Marketing-Campaign #
For this workload, we have employed the widely known Yahoo Streaming Benchmark, which is designed to simulate a data processing pipeline for extracting insights from marketing campaigns. The pipeline executed on the edge device includes steps such as receiving advertising traffic data, filtering the data, removing any unnecessary values, combining the data with existing information from a key-value store, and storing the final results. All data produced by a data generator is pushed and extracted through a message queue (Apache Kafka), while intermediate data and final results are stored in an in-memory database (Redis). This workload can be executed using any of the following distributed streaming processing engines: Apache Storm, Flink, or Spark.
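To make the pipeline stages more concrete, the minimal Python sketch below walks through the same steps (receive, filter, project, join against the key-value store, store windowed counts). It is only an illustration under assumed names and client libraries (kafka-python, redis-py, an "ad-events" topic, an ad-to-campaign key layout), not the benchmark's or BenchPilot's actual implementation.

```python
# Illustrative sketch of the marketing-campaign pipeline stages (assumed names and libraries).
import json

import redis
from kafka import KafkaConsumer

consumer = KafkaConsumer("ad-events", bootstrap_servers="kafka:9092")  # hypothetical topic
store = redis.Redis(host="redis", port=6379)

for message in consumer:
    event = json.loads(message.value)                # 1. receive advertising traffic data
    if event.get("event_type") != "view":            # 2. filter out irrelevant events
        continue
    ad_id, ts = event["ad_id"], event["event_time"]  # 3. drop unnecessary fields
    campaign_id = store.get(f"ad:{ad_id}")           # 4. join with campaign info from the key-value store
    if campaign_id:
        window = int(ts) // 10_000                   # fixed-size time windows (10 s here)
        store.hincrby(f"campaign:{campaign_id.decode()}", window, 1)  # 5. store windowed counts
```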
Collected Performance Metrics #
For evaluating the performance of this application, we extract the following measurements from the benchmarking log files:
Metric | Description |
---|---|
# of Tuples | The total number of tuples processed during execution |
Latency | The total application latency, measured in ms, based on the statistics provided by the selected underlying processing engine for each deployed task |
Distributed Processing Engine Parameters #
As mentioned, this workload can be executed using any of the following distributed streaming processing engines: Apache Storm, Flink, or Spark.
For each of these engines, the user can alter/define the following attributes:
Engine | Storm | Flink | Spark |
---|---|---|---|
Parameters | | | |
Machine Learning Inference Workload / MLPerf #
We use MLPerf, a benchmark for machine learning training and inference, to assess the performance of our inference system. Currently, our focus is on two MLPerf tasks:
- Image Classification: This task uses the ImageNet 2012 dataset (resized to 224x224) and measures Top-1 accuracy. MLPerf provides two model options: ResNet-50 v1.5, which excels in image classification, and RetinaNet, which is effective in object detection and bounding box prediction.
- Object Detection: This task identifies and classifies objects within images, locating them with bounding boxes. MLPerf uses two model configurations: a smaller, 300x300 model for low-resolution tasks (e.g., mobile devices) and a larger, high-resolution model (1.44 MP). Performance is measured by mean average precision (mAP). The SSD model with a ResNet-34 backbone is the default for this task.
Additionally, we have extended MLPerf by adding network-serving capabilities to measure the impact of network overhead on inference. Our setup includes:
- A lightweight server that loads models and provides a RESTful API.
- A workload generator that streams images to the server one-by-one (“streaming mode”), contrasting with MLPerf’s standard local loading (“default mode”).
For this workload, users can easily configure the dataset, latency, batch size, workload duration, thread count, and inference framework (ONNX, NCNN, TensorFlow, or PyTorch).
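As an illustration of the streaming mode described above, the sketch below streams images one-by-one to a RESTful inference endpoint and records per-request latency. The endpoint URL, payload format, and the use of the requests library are assumptions made for the example, not the actual server interface.

```python
# Illustrative streaming-mode client (hypothetical endpoint and payload format).
import time
from pathlib import Path

import requests  # assumed HTTP client, not necessarily what BenchPilot uses

SERVER_URL = "http://edge-device:5000/predict"  # placeholder RESTful inference endpoint

latencies_ms = []
for image_path in sorted(Path("imagenet_subset").glob("*.jpg")):
    payload = image_path.read_bytes()
    start = time.perf_counter()
    response = requests.post(SERVER_URL, data=payload,
                             headers={"Content-Type": "application/octet-stream"})
    response.raise_for_status()
    latencies_ms.append((time.perf_counter() - start) * 1000)

if latencies_ms:
    print(f"mean latency: {sum(latencies_ms) / len(latencies_ms):.1f} ms "
          f"over {len(latencies_ms)} requests")
```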
Collected Performance Metrics #
For evaluating the performance of this application, we extract the following measurements from the benchmarking log files:
Metric | Description |
---|---|
Accuracy % | The model's accuracy that was measured during the benchmarking period |
Average and/or Total Queries per Second | The number of queries that were executed during the experiment; each query represents the processing of a batch of images |
Mean Latency | The application's mean latency, measured in ms |
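For reference, these metrics can be reproduced from raw per-query timings; the short sketch below does so for a hypothetical list of per-query latencies (in milliseconds) and assumes the queries ran back-to-back.

```python
# Illustrative computation of mean latency and average queries per second
# from hypothetical per-query latencies; each query processes one batch of images.
per_query_latency_ms = [41.8, 39.2, 44.5, 40.1]

total_queries = len(per_query_latency_ms)
mean_latency_ms = sum(per_query_latency_ms) / total_queries
total_time_s = sum(per_query_latency_ms) / 1000   # assumes back-to-back execution
average_qps = total_queries / total_time_s

print(f"mean latency: {mean_latency_ms:.1f} ms, average QPS: {average_qps:.2f}")
```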
NoSQL Database Workload #
Through the Yahoo! Cloud Serving Benchmark (YCSB) workload, one can evaluate NoSQL databases such as MongoDB, Redis, Cassandra, and Elasticsearch under heavy load. YCSB tests basic operations (read, update, and insert) on each database using defined operation rates across an experiment's duration. Currently, BenchPilot only supports MongoDB as the underlying database; however, it can easily be extended to the other databases by containerizing them.
Additionally, YCSB supports three workload distributions:
- Zipfian: Prioritizes frequently accessed items.
- Latest: Similar to Zipfian but focuses on recently inserted records.
- Uniform: Accesses items randomly.
For this benchmark, users can adjust various parameters, including the number of records, total operations, load distribution, operation rate, and experiment duration. It also supports multiple threads for increased database load through asynchronous operations.
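As a rough illustration of how these parameters map onto YCSB, the sketch below assembles a ycsb run command for MongoDB. The concrete values, the MongoDB connection URL, and the choice of base workload file are placeholders; BenchPilot's containerized invocation may differ.

```python
# Illustrative YCSB invocation for MongoDB (values are placeholders,
# not BenchPilot's internal invocation).
import subprocess

params = {
    "recordcount": 100_000,            # number of records to pre-load
    "operationcount": 1_000_000,       # total operations to execute
    "requestdistribution": "zipfian",  # zipfian | latest | uniform
    "mongodb.url": "mongodb://mongo:27017/ycsb",
}

cmd = ["./bin/ycsb", "run", "mongodb",
       "-P", "workloads/workloada",    # base workload definition (50% reads / 50% updates)
       "-threads", "8",                # client threads driving asynchronous load
       "-target", "5000"]              # target operation rate (ops/sec)
for key, value in params.items():
    cmd += ["-p", f"{key}={value}"]

subprocess.run(cmd, check=True)
```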
Collected Performance Metrics #
For evaluating the performance of this workload, we extract the following measurements from the benchmarking log files:
Metric | Description |
---|---|
Count | Total number of operations per second for each minute of the experiment |
Min | Min number of operations per second for each minute of the experiment |
Max | Max number of operations per second for each minute of the experiment |
Average | Average number of operations per second for each minute of the experiment |
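These per-minute statistics are simple aggregations over the throughput observed while YCSB runs; the sketch below reproduces them for hypothetical per-second operation counts and is purely illustrative.

```python
# Illustrative per-minute aggregation of per-second operation counts
# (hypothetical data; the real values come from the YCSB log files).
from collections import defaultdict

# ops_per_second[i] = operations completed during second i of the experiment
ops_per_second = [4800, 5100, 4950, 5200, 4700, 5050] * 20  # two minutes of fake samples

per_minute = defaultdict(list)
for second, ops in enumerate(ops_per_second):
    per_minute[second // 60].append(ops)

for minute, values in sorted(per_minute.items()):
    print(f"minute {minute}: count={sum(values)}, min={min(values)}, "
          f"max={max(values)}, avg={sum(values) / len(values):.0f}")
```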