Profiling

The profiling/ directory hosts utilities for measuring and profiling ORQ’s performance.

Contents:

  • stopwatch.h – Lightweight wall-clock timer.

  • thread_profiling.h – Thread-local CPU usage and timing utilities.

  • utils.h – Miscellaneous helpers shared across benchmarks.

Defines

LABEL_WIDTH
STOPWATCH_PREC
namespace orq
namespace benchmarking
namespace stopwatch

Typedefs

using sec = std::chrono::duration<float, std::chrono::seconds::period>

Functions

void timepoint(std::string label)

Mark a timepoint with the given label and output the elapsed time on Party 0. On the first call to timepoint, only print the label and start the clock. On later calls, output the time since the last timepoint.

Parameters:

label

float get_elapsed()

Get the elapsed time without printing. Still registers intervals in the manner of timepoint. This means that interleaved calls to timepoint and get_elapsed will return the time elapsed since either was last called.

Returns:

float

void done()

Output the time elapsed since the first call to timepoint or get_elapsed. This is useful to call at the end of a program to see how long the entire execution takes.

done can be called multiple times; it will always output the elapsed time since the same initial timepoint.
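
The interplay of timepoint, get_elapsed, and done can be illustrated with a minimal, self-contained sketch. This is not ORQ's implementation: the class SketchStopwatch and its members lap and total are hypothetical names, and the Party 0 check and label formatting are omitted. It only shows that both interval calls share one "last" marker, while the total is always measured from the first call.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>
#include <optional>
#include <string>

// Hypothetical stand-in for the stopwatch functions; not ORQ's code.
class SketchStopwatch {
    using clock = std::chrono::steady_clock;
    using sec = std::chrono::duration<float, std::chrono::seconds::period>;

    std::optional<clock::time_point> first_;  // set by the first interval call
    clock::time_point last_{};                // moved by every interval call

    // Seconds since the previous interval call, or 0 on the very first call.
    float lap() {
        const auto now = clock::now();
        if (!first_) {
            first_ = now;
            last_ = now;
            return 0.0f;
        }
        const float dt = sec(now - last_).count();
        assert(dt >= 0.0f);  // steady_clock never goes backwards
        last_ = now;
        return dt;
    }

public:
    // First call: print only the label. Later calls: also print the interval.
    void timepoint(const std::string& label) {
        const bool started = first_.has_value();
        const float dt = lap();
        std::cout << label;
        if (started) std::cout << " " << dt << " sec";
        std::cout << "\n";
    }

    // Same bookkeeping as timepoint, but silent.
    float get_elapsed() { return lap(); }

    // The quantity done reports: time since the first call.
    // Does not move the interval marker, so it can be queried repeatedly.
    float total() const {
        return first_ ? sec(clock::now() - *first_).count() : 0.0f;
    }
};
```

In this sketch, calling get_elapsed between two timepoint calls shortens the interval the second timepoint prints, which matches the interleaving behavior described above.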

void profile_init()

Initialize the profiler.

ORQ provides a primitive profiling utility based on the stopwatch which simply aggregates elapsed times registered under each label. The profiler should not be used when high-accuracy measurements are needed, but it is sufficient for simple tests and benchmarks.

void profile_timepoint(std::string label)

Register a profile timepoint under label. Semantics are similar to timepoint, but nothing is printed.

Parameters:

label

void profile_preprocessing(std::optional<std::string> label = {})

Register a preprocessing timepoint with an optional label. If no label is provided, update the last timepoint but do not measure the elapsed time. If a label is provided, measure the elapsed time for both this label and the special PREPROCESSING symbol.

Parameters:

label

void profile_comm(std::string label, double t)

Register a given interval for the given communication category. Unlike the other profile functions, this one takes the interval t from the caller, because compute and communication are measured differently. TODO: this function should probably be updated.

Parameters:
  • label

  • t – time in seconds

void profile_done()

Complete profiling and output a profiling report. Prints each category of aggregated times and separates out preprocessing. Also prints out a breakdown of offline versus online time.
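
The label-based aggregation described above can be sketched as follows. This is an illustration under stated assumptions, not ORQ's code: SketchProfiler, init, timepoint, add_interval, and report are hypothetical names, and preprocessing and the offline/online breakdown are left out. The sketch only shows the two ways time reaches a category: measured since the previous mark (as with profile_timepoint) or supplied by the caller (as with profile_comm).

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <ostream>
#include <string>

// Hypothetical sketch of label-based aggregation; not ORQ's implementation.
struct SketchProfiler {
    using clock = std::chrono::steady_clock;

    std::map<std::string, double> times;    // label -> accumulated seconds
    clock::time_point last = clock::now();  // previous mark

    // Reset all categories and restart the interval marker.
    void init() {
        times.clear();
        last = clock::now();
    }

    // Charge the time since the previous mark to `label`; prints nothing.
    void timepoint(const std::string& label) {
        const auto now = clock::now();
        times[label] += std::chrono::duration<double>(now - last).count();
        last = now;
    }

    // Add an interval measured elsewhere (in seconds) to `label`.
    void add_interval(const std::string& label, double seconds) {
        assert(seconds >= 0.0);
        times[label] += seconds;
    }

    // Print one line per category, in label order.
    void report(std::ostream& os) const {
        for (const auto& [label, t] : times) {
            os << label << ": " << t << " s\n";
        }
    }
};
```

Because every call to timepoint moves the shared mark, time spent between two marks is charged entirely to the label of the second call.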

Variables

int partyID = 0
std::chrono::steady_clock::time_point _tp_first
static std::map<std::string, double> profile_times
static std::map<std::string, double> preproc_times
static std::map<std::string, double> comm_times
static std::chrono::steady_clock::time_point profile_last
static std::chrono::steady_clock::time_point preproc_last

Defines

HOST_NAME_MAX

Functions

std::string exec(const char *cmd)

Execute the command cmd and return its output.

Parameters:

cmd – command to run

Returns:

std::string stdout from the command
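
A function with this contract is commonly built on POSIX popen. The sketch below is one such construction, not necessarily how ORQ implements exec; the name exec_sketch, the buffer size, and the choice to throw on failure are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical POSIX sketch; ORQ's exec may handle errors differently.
std::string exec_sketch(const char* cmd) {
    assert(cmd != nullptr);

    // popen runs `cmd` through the shell and exposes its stdout as a FILE*.
    // The unique_ptr closes the pipe with pclose on every exit path.
    std::unique_ptr<FILE, int (*)(FILE*)> pipe(popen(cmd, "r"), pclose);
    if (!pipe) {
        throw std::runtime_error("popen failed");
    }

    std::string out;
    std::array<char, 256> buf{};
    // fgets reads at most one line or one buffer's worth per call.
    while (fgets(buf.data(), static_cast<int>(buf.size()), pipe.get()) != nullptr) {
        out += buf.data();
    }
    return out;
}
```

Only stdout is captured here; anything the command writes to stderr still goes to the caller's terminal.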

std::string prependHash(const std::string &str)

Prepend a hash (#) to each line of the input str and return the new string. Does not modify its input.

Parameters:

str

Returns:

std::string
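
As a rough illustration of the line-by-line behavior, the following sketch splits the input on newlines and prefixes each line. The name prepend_hash_sketch is hypothetical, and two details are assumptions rather than documented behavior: the prefix is "# " (with a space), and every output line ends with a newline.

```cpp
#include <sstream>
#include <string>

// Hypothetical sketch; prefix format and trailing newline are assumptions.
std::string prepend_hash_sketch(const std::string& str) {
    std::istringstream in(str);
    std::string line;
    std::string out;
    // getline strips the newline, so it is added back after each line.
    while (std::getline(in, line)) {
        out += "# " + line + "\n";
    }
    return out;
}
```

The input is taken by const reference and copied into a stream, so the caller's string is left untouched.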

std::string hostname()

Get the hostname of this machine. Returns only the first HOST_NAME_MAX characters of the host name. HOST_NAME_MAX defaults to 256 but can be increased with a compile-time define.

Returns:

std::string
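
One plausible way to get this behavior is POSIX gethostname with a fixed-size buffer. The sketch below assumes that approach; hostname_sketch is a hypothetical name, and the fallback define mirrors the default of 256 described above on platforms where the system headers do not already provide HOST_NAME_MAX.

```cpp
#include <climits>
#include <string>
#include <unistd.h>

// Fallback when the platform headers do not define HOST_NAME_MAX (assumption).
#ifndef HOST_NAME_MAX
#define HOST_NAME_MAX 256
#endif

// Hypothetical sketch; returns an empty string if the lookup fails.
std::string hostname_sketch() {
    // One spare byte guarantees termination even if the name is truncated.
    char buf[HOST_NAME_MAX + 1] = {0};
    if (gethostname(buf, HOST_NAME_MAX) != 0) {
        return "";
    }
    return std::string(buf);
}
```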

namespace orq
namespace instrumentation
namespace thread_stopwatch

Functions

uint64_t get_now_ns()

Get the current time of steady_clock in nanoseconds.

Returns:

uint64_t

double get_aggregate_comm(int pid = 0)

Return the sum of all measured timing events. TODO: this function is incorrectly named; it does not apply only to communication.

Parameters:

pid

Returns:

double

void init_map(std::thread::id tid)

Initialize the timing map for thread id tid.

Parameters:

tid

Variables

std::map<int64_t, std::atomic_uint64_t> timing

Map from thread ID to timing information. Switching to atomic u64 for communication timing purposes only.

TODO: revert this to fix write

class InstrumentBlock
#include <thread_profiling.h>

A utility class to measure the time taken in a given C++ code block. Instantiate a named instance of this class at, or near, the start of a block. The constructor records the current time, and the destructor (which fires when the block completes) computes the elapsed time and stores it in the timing map. Because InstrumentBlock measures only the span between its construction and the end of the block, it need not time an entire block.

The string meta passed in the constructor labels a block. This information is saved to the output file and can be used by later analysis scripts.

It may be possible to use InstrumentBlock in other, non-block, contexts, by taking advantage of C++ scoping rules. This behavior has not been tested.

WARNING: without an object name, the expression creates a temporary that is destroyed immediately, and no timing information will be available. That is,

InstrumentBlock _ib{"wait"};
works, but
InstrumentBlock{"wait"};
will not.
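
The difference between the named and unnamed forms follows from C++ object lifetime rules and can be demonstrated with a small RAII timer. The sketch below is not ORQ's InstrumentBlock: SketchBlock, sketch_now_ns, the two global counters, and the run_named and run_unnamed helpers are hypothetical, and a single atomic total stands in for the per-thread timing map.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>
#include <utility>

// Current steady_clock time in nanoseconds.
inline uint64_t sketch_now_ns() {
    const auto since_epoch = std::chrono::steady_clock::now().time_since_epoch();
    return static_cast<uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(since_epoch).count());
}

inline std::atomic<uint64_t> sketch_total_ns{0};  // stand-in for the timing map
inline std::atomic<int> sketch_blocks_closed{0};  // number of destructors run

// Hypothetical RAII block timer; not ORQ's InstrumentBlock.
class SketchBlock {
    const uint64_t start_;
    const std::string meta_;  // label kept for later reporting

public:
    explicit SketchBlock(std::string meta)
        : start_(sketch_now_ns()), meta_(std::move(meta)) {}

    ~SketchBlock() {
        const uint64_t end = sketch_now_ns();
        assert(end >= start_);
        sketch_total_ns += end - start_;
        ++sketch_blocks_closed;
    }

    SketchBlock(const SketchBlock&) = delete;
    SketchBlock& operator=(const SketchBlock&) = delete;
};

// Returns how many timers had already closed while the "work" ran.
inline int run_named() {
    SketchBlock ib{"wait"};              // lives until the closing brace
    return sketch_blocks_closed.load();  // timer is still open here
}

inline int run_unnamed() {
    SketchBlock{"wait"};                 // temporary: destroyed at this semicolon
    return sketch_blocks_closed.load();  // timer has already closed
}
```

In run_named the destructor fires after the return value is computed, so the work is covered by the measurement. In run_unnamed the destructor has already run before the next statement, so the recorded interval is essentially empty.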

Public Functions

template<typename ...T>
inline InstrumentBlock(T... args)

Default constructor when INSTRUMENT_THREADS is not defined. This constructor is provided so that ORQ can be compiled without thread profiling, without having to remove the InstrumentBlock instances sprinkled throughout the codebase. When profiling is turned off, the constructor does nothing and the default destructor is used.

Template Parameters:

T

Parameters:

args
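
The compile-time switch can be pictured with the following sketch. It assumes a hypothetical macro SKETCH_INSTRUMENT_THREADS in place of INSTRUMENT_THREADS and a hypothetical class SketchGuard; the profiling branch is only stubbed out. The point is that the variadic constructor accepts and ignores any arguments, so call sites compile unchanged when profiling is off.

```cpp
#include <string>
#include <type_traits>
#include <utility>

#ifdef SKETCH_INSTRUMENT_THREADS  // hypothetical stand-in for INSTRUMENT_THREADS
class SketchGuard {
    std::string meta_;

public:
    explicit SketchGuard(std::string meta) : meta_(std::move(meta)) {
        // a real implementation would record the start time here
    }
    ~SketchGuard() {
        // ... and store the elapsed time here
    }
};
#else
class SketchGuard {
public:
    // Accepts any argument list and does nothing with it.
    template <typename... T>
    SketchGuard(T...) {}
};

// With profiling off, the guard carries no data members.
static_assert(std::is_empty_v<SketchGuard>, "no-op guard should be empty");
#endif
```

With the macro undefined, a declaration such as SketchGuard g{"wait"}; still compiles and has no runtime effect beyond constructing an empty object.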

Private Members

const int64_t tid
const uint64_t start
uint64_t end
const std::string meta

Defines

PROFILE(EXPR, NAME)

Typedefs

using sec = std::chrono::duration<float, std::chrono::seconds::period>
namespace orq
namespace benchmarking
namespace utils

Functions

static void print_bin(const int &num1, const int &num2, bool add_line)
template<typename ...T>
static void timeTest(void (func)(T...), std::string name, int batch_nums, int size, T... args)
template<typename T>
static int duration_to_ms(std::chrono::time_point<T> a, std::chrono::time_point<T> b)