6G-SANDBOX-PowerStorm #
The PowerStorm Framework, built on top of Apache Storm, aims to optimize the performance and energy efficiency of underlying infrastructures by strategically assigning streaming analytic job operators to worker nodes.
For this project, we purchased seven UE (User Equipment) devices to showcase the usability of our scheduler, namely, three Raspberry Pi 5 boards, two Jetson Nanos, one Jetson Xavier NX, and one Jetson AGX Orin. For each UE device, we also used the appropriate 5G toolkit.
For the extraction of monitoring measurements, we utilized the following technologies: Netdata, cAdvisor, Prometheus, and Consul, to automatically capture and store both utilization and power consumption metrics.
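Since Prometheus exposes stored measurements through its HTTP API, the captured metrics can be retrieved programmatically. The sketch below builds an instant-query URL and parses the response shape Prometheus returns; the metric name `netdata_device_power_watts` is a placeholder, as the actual names depend on how the Netdata/cAdvisor exporters are configured in a given deployment:

```python
import json
from urllib.parse import urlencode

# Hypothetical metric name; actual names depend on the exporter configuration.
POWER_METRIC = "netdata_device_power_watts"

def build_query_url(base_url: str, query: str) -> str:
    """Build a Prometheus instant-query URL (HTTP API v1)."""
    return f"{base_url}/api/v1/query?" + urlencode({"query": query})

def parse_instant_result(payload: str) -> dict:
    """Map each instance label to its sampled value from a query response."""
    data = json.loads(payload)
    return {
        r["metric"].get("instance", "unknown"): float(r["value"][1])
        for r in data["data"]["result"]
    }

# Example payload in the shape Prometheus returns for an instant query.
sample = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"instance": "rpi5-1:19999"}, "value": [1700000000, "4.2"]},
    ]},
})
print(build_query_url("http://prometheus:9090", POWER_METRIC))
print(parse_instant_result(sample))
```

The same parsing applies to range queries (`/api/v1/query_range`), which is how per-experiment time series would be collected in practice.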
Results #
To conduct our experiments, we leveraged Berlin’s testbed and introduced our 5G-enabled compute nodes, creating three distinct configurations. Below is a detailed description of each setup:
| Setup | Description |
|---|---|
| Local Network via Ethernet | Edge compute nodes connected to a local router via Ethernet; these were our initial experiments |
| Edge Cluster via 5G Connection | Three virtual machines (VMs) provided by the Pilot, reached over a 5G connection |
| 5G Connectivity of UE devices | All 5G-enabled UE devices connected to Berlin's RAN (Radio Access Network). Trials were repeated over the 5G network to evaluate performance under these conditions. |
Figure 1: Latency & Tuples Comparison for Local Ethernet, 5G Edge Cluster, and UE Devices.
In terms of energy consumption within the local network (Ethernet), both PowerStorm and Resource-aware execution demonstrated a reduction in energy use, with decreases of approximately 15% and 5.9% respectively. These results align with our initial expectations based on preliminary trials.

Next, we analyzed energy consumption during the Edge Cluster execution, involving 5G data generation. In this scenario, PowerStorm resulted in a 13.2% increase in energy consumption, while the Resource-aware scheduler exhibited a more substantial rise of 135.9%. Despite the increased energy demands, both approaches provided superior performance in latency reduction and overall tuples processed, illustrating the trade-off between energy efficiency and performance gains in this setting.

Finally, we compared the energy consumption across three configurations, namely, the default execution of Apache Storm, the PowerStorm-enabled deployment, and the resource-aware deployment, in the 5G RAN network. In this comparison, PowerStorm showed a minimal energy increase of just 0.43%, whereas the Resource-aware scheduler nearly doubled the system’s energy consumption, with an increase of 93.7%.
When comparing the results across different deployments (Ethernet, 5G-Edge, and 5G-RAN), we observed several noteworthy findings. The Edge cluster proved to be the most power-intensive, primarily due to the use of conventional server hardware. In terms of energy consumption, there was little difference between devices connected via Ethernet and those connected via 5G. However, executing our workload over the 5G network significantly enhanced performance.
To this end, our experiments and analyses demonstrate that modern big data streaming engines are well-suited for operation on 5G networks. Running these engines on 5G networks offers performance benefits comparable to those of Ethernet connectivity. Furthermore, deploying 5G-enabled User Equipment (UE) devices with processing capabilities enhances both performance and energy efficiency, as evidenced by our experimental results.
Datasets #
Our datasets can be found under the `/datasets` folder. Inside, one can find the `summary` and `raw` folders. As their names imply, `raw` contains the collected utilization and application measurements, while `summary` contains their summarized information.
Our dataset naming convention is straightforward. The first word identifies the scheduler used in the configuration, while the second word specifies the primary infrastructure or communication network employed. For example, “default_5G” refers to the dataset generated from experiments conducted using the default Apache Storm scheduler on UE devices connected via the 5G network.
The following table outlines the mapping of dataset keyword names:
| Keyword | Scheduler | Experiment Setup | Description |
|---|---|---|---|
| default | ✅ | | Utilizes the Apache Storm default scheduler |
| resource | ✅ | | Utilizes the Apache Storm Resource-Aware scheduler |
| energy | ✅ | | Utilizes our PowerStorm scheduler |
| 5G | | ✅ | Uses the "5G Connectivity of UE devices" setup |
| edge | | ✅ | Uses the "Edge Cluster via 5G Connection" setup |
| ethernet | | ✅ | Uses the "Local Network via Ethernet" setup |
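Under this convention, a dataset name can be mapped back to its configuration programmatically. A minimal sketch, using the keyword mappings from the table above:

```python
# Keyword mappings taken from the dataset naming table.
SCHEDULERS = {
    "default": "Apache Storm default scheduler",
    "resource": "Apache Storm Resource-Aware scheduler",
    "energy": "PowerStorm scheduler",
}
SETUPS = {
    "5G": "5G Connectivity of UE devices",
    "edge": "Edge Cluster via 5G Connection",
    "ethernet": "Local Network via Ethernet",
}

def parse_dataset_name(name: str) -> tuple:
    """Split '<scheduler>_<setup>' into human-readable descriptions."""
    scheduler_key, setup_key = name.split("_", 1)
    return SCHEDULERS[scheduler_key], SETUPS[setup_key]

print(parse_dataset_name("default_5G"))
```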
How to use PowerStorm Scheduler #
Under `/storm` we include the Apache Storm 2.2.0 binaries and code, including our PowerStorm scheduler.
Step 1: Add Scheduler Code to the Nimbus Node #
To use the PowerStorm Scheduler, you must first overwrite your Apache Storm folder with the contents of the zip file located at `/code/storm-2.2.0/storm-dist/binary/final-package/target/apache-storm-2.2.0.zip`. This step is required only for the node that will act as the Nimbus, as it needs access to the scheduler’s code.
If you wish to use a different version of Apache Storm, you will need to rebuild its binaries. For detailed instructions on how to do this, refer to the Apache Storm developer guide.
Step 2: Specify Worker Configurations Individually #
To ensure our scheduler functions correctly, you need to configure certain settings on each node. This involves adding new parameters to each node's `storm.yaml` configuration file. Specifically, the parameters to be included are as follows:

- `supervisor.cores`: the total number of CPU cores of the device
- `supervisor.clock`: the CPU clock speed, measured in GHz
- `supervisor.power.idle`: the power the node consumes when idle, in Watts
- `supervisor.power.max`: the maximum power it can consume, measured in Watts
Here’s a `storm.yaml` example showing how to define those parameters:

```yaml
nimbus.seeds: ["nimbus"]
storm.zookeeper.servers:
  - "zookeeper"
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
supervisor.cores: 4
supervisor.clock: 1.8
supervisor.power.idle: 3
supervisor.power.max: 8
```
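The idle and maximum power parameters suggest a linear power model, where a node's draw scales with utilization between the two bounds. The sketch below is our reading of why both values are needed, not code taken from the scheduler itself:

```python
def estimated_power(util: float, idle_watts: float, max_watts: float) -> float:
    """Linear power model: idle draw plus utilization-scaled dynamic draw."""
    if not 0.0 <= util <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    return idle_watts + util * (max_watts - idle_watts)

# Values from the storm.yaml example above (supervisor.power.idle/max).
print(estimated_power(0.5, 3, 8))  # 5.5 W at 50% utilization
```

Under this model, two nodes with the same idle power can still differ sharply at full load, which is exactly the information `supervisor.power.max` contributes.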
Step 3: Configure Scheduler Selection on the Nimbus Node #
The last step is to specify to Nimbus that it needs to use the PowerStorm strategy. To do that, add the following lines to the `storm.yaml` on your Nimbus node:
```yaml
storm.scheduler: "org.apache.storm.scheduler.resource.ResourceAwareScheduler"
topology.scheduler.strategy: "org.apache.storm.scheduler.resource.strategies.scheduling.EnergyAwareStrategy"
topology.scheduler.energy_awareness: "1.0"
```
The PowerStorm scheduler builds on the Resource Aware Scheduler code underneath, hence the first configuration line. The second line specifies the PowerStorm strategy, named “EnergyAwareStrategy”. Lastly, one can define the “energy awareness”. Our scheduler takes into consideration both the energy a node can consume and its processing capability. Setting `energy_awareness` to 1.0 makes the scheduler prioritize only the nodes that consume the least energy; setting it to 0.0 makes it consider only resource capabilities, as the Resource Aware Scheduler does. Any value between 0 and 1 can also be used to tune how “energy aware” versus “resource aware” the scheduling is.
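The blending described above can be sketched as a weighted score. The nodes and normalized scores below are hypothetical illustrations, not the scheduler's actual internals:

```python
def node_score(energy_score: float, resource_score: float, energy_awareness: float) -> float:
    """Blend normalized scores; energy_awareness=1.0 ranks purely by energy efficiency."""
    if not 0.0 <= energy_awareness <= 1.0:
        raise ValueError("energy_awareness must be in [0, 1]")
    return energy_awareness * energy_score + (1 - energy_awareness) * resource_score

# Hypothetical normalized scores (higher is better) for two candidate nodes.
nodes = {
    "rpi5":   {"energy": 0.9, "resource": 0.3},  # frugal but slow
    "server": {"energy": 0.2, "resource": 0.9},  # fast but power-hungry
}

for alpha in (1.0, 0.0, 0.5):
    best = max(nodes, key=lambda n: node_score(nodes[n]["energy"], nodes[n]["resource"], alpha))
    print(alpha, best)
```

With `energy_awareness` at 1.0 the frugal node wins; at 0.0 the capable node wins; intermediate values trade one off against the other.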
Workload & Configurations #
For these experiments, we have employed the widely known Yahoo Streaming Benchmark, which is designed to simulate a data processing pipeline for extracting insights from marketing campaigns using Apache Storm. The pipeline executed on the worker device includes steps such as receiving advertising traffic data, filtering the data, removing any unnecessary values, combining the data with existing information from a key-value store, and storing the final results. All data produced by a data generator is pushed and extracted through a message queue (Apache Kafka), while intermediate data and final results are stored in an in-memory database (Redis).
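Stripped of Storm, Kafka, and Redis, the per-tuple logic of that pipeline can be sketched with in-memory stand-ins. The campaign mapping and events below are made up for illustration; the real benchmark reads events from Kafka and joins against Redis:

```python
import json
from collections import Counter

# In-memory stand-ins for the key-value store and the Kafka event stream.
campaign_of = {"ad-1": "camp-A", "ad-2": "camp-B"}  # hypothetical ad -> campaign map
events = [
    json.dumps({"ad_id": "ad-1", "event_type": "view", "event_time": 1}),
    json.dumps({"ad_id": "ad-1", "event_type": "click", "event_time": 2}),
    json.dumps({"ad_id": "ad-2", "event_type": "view", "event_time": 3}),
]

def process(raw_events):
    """Mirror the pipeline steps: receive, filter, project, join, aggregate."""
    counts = Counter()
    for raw in raw_events:                          # receive advertising traffic
        event = json.loads(raw)                     # deserialize
        if event["event_type"] != "view":           # filter out unwanted events
            continue
        projected = {"ad_id": event["ad_id"]}       # remove unnecessary values
        campaign = campaign_of[projected["ad_id"]]  # join with key-value store
        counts[campaign] += 1                       # aggregate final results
    return counts

print(process(events))
```

In the actual benchmark each of these steps runs as a separate Storm bolt, so the scheduler's operator placement decides which physical node executes each one.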
For evaluating the performance of this application, we extract the following measurements from the benchmarking log files:
| Metric | Description |
|---|---|
| # of Tuples | The total number of tuples processed during execution |
| Latency | The total application latency, measured in ms, based on the statistics provided by Apache Storm for each deployed task |
Regarding the application parameters, we used the following:
Application Parameters |
---|
|