Experiments #

Before starting the benchmarking process, you need to define (i) the cluster under test, and (ii) the experiments you want to perform.

Defining your Cluster #

By defining your cluster, BenchPilot removes the effort of manually installing Docker, Docker Compose, and every Docker image your benchmarks may need on your worker devices. All you need to do is define your cluster inside /BenchPilotSDK/conf/bench-cluster.yaml.

Inside “bench-cluster.yaml” we define our cluster nodes. As shown below, the cluster is declared with the keyword “cluster”, and each worker is then defined with the keyword “node”.

For each worker, you need to define its IP, hostname, and authentication credentials. The credentials consist of a username plus either a password or the path to an SSH key.
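
If you opt for key-based authentication, the sketch below shows one common way to prepare it with standard OpenSSH tooling; this is not part of BenchPilot itself, and the filenames, username, and address are placeholders taken from the example further down:

# Generate a key pair (standard OpenSSH tooling, not BenchPilot-specific);
# the private key path can then be used as ssh_key_path in bench-cluster.yaml.
ssh-keygen -t ed25519 -f ./ssh_key.pem -N ""

# Install the public key on a worker so the client can log in without a password.
ssh-copy-id -i ./ssh_key.pem.pub your_username@xx.xx.xx.xx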

Each node also offers the option to skip the orchestration setup by setting orchestrator_setup to “false”, as shown in the example below. For instance, if an experiment uses Docker Swarm as its orchestrator and a node is already registered as a worker in the swarm, you can set this flag so that BenchPilot skips the registration process for that node.

cluster:
  - node:
      ip: "xx.xx.xx.xx"
      hostname: "raspberrypi0"
      username: "your_username"
      password: "your_password"
      orchestrator_setup: "false"
  - node:
      ip: "xx.xx.xx.xx"
      hostname: "server0"
      username: "your_username"
      ssh_key_path: "/BenchPilotSDK/conf/ssh_key.pem" # this is an example of ssh key path
  ..

** Please keep in mind that every hostname and IP you declare needs to be reachable from the node on which you will run the BenchPilot client.
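
Before kicking off an experiment, a quick sanity check of that reachability can save time. The snippet below is a generic sketch using standard tools (it is not part of BenchPilot), run from the client node with the hostnames from the example above:

# Check that each declared worker answers on the SSH port (22).
for host in raspberrypi0 server0; do
  if nc -z -w 5 "$host" 22; then
    echo "$host: reachable"
  else
    echo "$host: NOT reachable - check its IP/hostname or your firewall"
  fi
done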

Defining your Experiments #

After defining our cluster, the final step before starting the benchmarking process is to define our experiments! All you need to do is describe them inside the /BenchPilotSDK/conf/bench-experiments.yaml file using the BenchPilot model.

The BenchPilot model is composed of experiments, each of which describes the workloads to run.

For each workload, you have to define the following:

  • name, which must be selected from the supported workload list, e.g. “marketing-campaign”,
  • record name, so that you can later retrieve monitoring metrics based on it,
  • number of repetitions,
  • workload duration,
  • specific workload parameters,
  • cluster under test, including the control-plane node’s public IP and the list of the worker nodes’ hostnames,
  • engine configurations, in the case of distributed streaming workloads

An example of the bench-experiments.yaml file can be found below:

experiments:
  - workload:
      name: "marketing-campaign"
      record_name: "storm_x1"
      repetition: 1
      duration: "10m"
      parameters:
        campaigns: 100
        tuples_per_second: 1000
        capacity_per_window: 10
        time_divisor: 10000
      cluster:
        manager: "xx.xx.xx.xx" # control-plane node public ip
        nodes: [ "raspberrypi0", "server0" ] # workers' hostnames
        # define http(s)_proxy only if your BenchPilot client is placed behind a proxy
        http_proxy: "${http_proxy}"
        https_proxy: "${https_proxy}"
      engine:
        name: "storm"
        parameters:
          partitions: 5
          ackers: 2
  - workload:
      name: "marketing-campaign"
      record_name: "flink_x1"
      repetition: 5
      duration: "30m"
      ..

** BenchPilot will run the experiments in the order you describe them.