Nway Discovery API
From OSR
The user-level application manager(s) require some way to interrogate the physical hardware resources available on a compute node. This is typically done once at startup, and the information is then cached at user level. Every compute node consists of one or more memories, one or more CPU cores, and one or more network interfaces. A compute node may also contain other devices such as GPUs, FPGAs, or accelerators. The following sections describe an API for discovering these resources at runtime.
Memory Discovery
Each compute node consists of one or more physical memories. Each memory is a region of contiguous physical memory that generally has the same properties (e.g., the memory attached to a memory controller). Memories and cores are arranged in a fixed topology such that certain memories will be closer to a particular core than others. A high performance application will attempt to allocate memory that is near the core or cores that will be using it.
An application can determine the number of memories available by calling get_num_mems(). The memories are logically numbered from 0 to the number returned minus 1.
int get_num_mems()
An application can get information about a memory by calling get_mem_info(). The structure returned contains static information that will not change.
int
get_mem_info(
int mem_id, /* logical id of the memory */
struct mem_info * info /* output: the memory's info */
)
struct mem_info {
int mem_id; /* logical id of the memory */
mem_type_t mem_type; /* type of memory (normal, etc.) */
char name[16]; /* string identifier */
void * base_addr; /* base physical address */
size_t extent; /* extent in bytes */
uint64_t rel_speed; /* relative speed of memory */
};
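As a minimal sketch of the "query once at startup, then cache" pattern described above, the following walks every logical memory id and stores the results. The stub bodies of get_num_mems() and get_mem_info(), the MEM_TYPE_NORMAL constant, the helper names cache_mem_info() and total_mem_bytes(), and the convention that 0 means success and a negative value means failure are all assumptions made for illustration; the API description does not specify them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* ASSUMPTION: mem_type_t is an enum; the API text does not define it. */
typedef enum { MEM_TYPE_NORMAL = 1 } mem_type_t;

struct mem_info {
    int        mem_id;     /* logical id of the memory */
    mem_type_t mem_type;   /* type of memory */
    char       name[16];   /* string identifier */
    void      *base_addr;  /* base physical address */
    size_t     extent;     /* extent in bytes */
    uint64_t   rel_speed;  /* relative speed of memory */
};

/* HYPOTHETICAL stubs standing in for the real API: a node with two
 * 1 GiB memories. A real system would supply these functions. */
int get_num_mems(void) { return 2; }

int get_mem_info(int mem_id, struct mem_info *info)
{
    if (mem_id < 0 || mem_id >= get_num_mems() || info == NULL)
        return -1;  /* ASSUMPTION: negative return means error */
    memset(info, 0, sizeof(*info));
    info->mem_id    = mem_id;
    info->mem_type  = MEM_TYPE_NORMAL;
    snprintf(info->name, sizeof(info->name), "mem%d", mem_id);
    info->base_addr = (void *)((uintptr_t)mem_id << 30);
    info->extent    = (size_t)1 << 30;
    info->rel_speed = 100;
    return 0;
}

/* Query every memory once and return a heap-allocated cache indexed by
 * logical memory id. The caller frees the result. Returns NULL on error. */
struct mem_info *cache_mem_info(int *num_out)
{
    assert(num_out != NULL);
    int n = get_num_mems();
    if (n <= 0)
        return NULL;
    struct mem_info *cache = calloc((size_t)n, sizeof(*cache));
    if (cache == NULL)
        return NULL;
    for (int id = 0; id < n; id++) {
        if (get_mem_info(id, &cache[id]) != 0) {
            free(cache);
            return NULL;
        }
    }
    *num_out = n;
    return cache;
}

/* Sum the extents of all cached memories, in bytes. */
uint64_t total_mem_bytes(const struct mem_info *cache, int n)
{
    uint64_t total = 0;
    for (int i = 0; i < n; i++)
        total += cache[i].extent;
    return total;
}
```

Because the returned information is static, the cache never needs to be refreshed after startup.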
Core Discovery
Each compute node consists of one or more CPU cores, each capable of running one or more threads of execution.
The number of cores available can be determined by get_num_cores(). The cores are logically numbered from 0 to the value returned minus 1.
int get_num_cores()
An application can get information about a core by calling get_core_info(). The structure returned contains static information that will not change.
int
get_core_info(
int core_id, /* logical id of the core */
struct core_info * info /* output: the core's info */
)
struct core_info {
int core_id; /* logical id of the core */
core_type_t core_type; /* type of core (arch, func, etc.) */
char name[16]; /* string identifier */
int num_threads; /* number of hw execution contexts */
uint64_t rel_speed; /* relative speed of core */
uint64_t page_sizes; /* supported page sizes (2^bit) */
core_set_t shares_l1_with[2]; /* cores sharing same L1/2/3 cache */
core_set_t shares_l2_with[2]; /* index 0 = instruction cache */
core_set_t shares_l3_with[2]; /* index 1 = data cache */
};
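The page_sizes field packs the supported page sizes into a bitmask. Reading the "(2^bit)" comment as "bit k set means pages of 2^k bytes are supported" (an interpretation, not something the text states outright), a sketch of decoding the mask might look like the following. The helper names decode_page_sizes() and largest_page_fitting() are hypothetical and not part of the API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand a page_sizes bitmask into an ascending list of sizes in bytes.
 * ASSUMPTION: bit k set means a page size of 2^k bytes is supported.
 * Writes at most max_sizes entries and returns the number written. */
int decode_page_sizes(uint64_t page_sizes, uint64_t *sizes, int max_sizes)
{
    assert(sizes != NULL || max_sizes == 0);
    int count = 0;
    for (int bit = 0; bit < 64 && count < max_sizes; bit++) {
        if (page_sizes & (UINT64_C(1) << bit))
            sizes[count++] = UINT64_C(1) << bit;
    }
    return count;
}

/* Return the largest supported page size that does not exceed len,
 * or 0 if no supported page size fits. Under the assumption above,
 * the mask bit for a size has the same value as the size itself. */
uint64_t largest_page_fitting(uint64_t page_sizes, uint64_t len)
{
    for (int bit = 63; bit >= 0; bit--) {
        uint64_t size = UINT64_C(1) << bit;
        if ((page_sizes & size) && size <= len)
            return size;
    }
    return 0;
}
```

For example, a core advertising 4 KiB, 2 MiB, and 1 GiB pages would report a mask with bits 12, 21, and 30 set.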
Device Discovery
Each compute node will contain some number of network interfaces (NICs) and possibly other devices (GPUs, FPGAs, accelerators, etc.). Each of these devices will support one or more simultaneous users (contexts). For example, a NIC may be able to support four host contexts simultaneously. The user-level node manager is responsible for assigning device contexts to host processes. The device's kernel-level driver takes care of making the device accessible to the process.
The number of devices attached to the compute node is determined by get_num_devs(). The devices are logically numbered from 0 to the value returned minus 1.
int get_num_devs()
An application can get (rather generic) information about a device by calling get_dev_info(). The structure returned contains static information that will not change.
int
get_dev_info(
int dev_id, /* logical id of the device */
struct dev_info * info /* output: the device's info */
)
struct dev_info {
int dev_id; /* logical id of the device */
dev_type_t dev_type; /* type of device (net, vid, etc.) */
char name[16]; /* string identifier */
int num_contexts; /* # simultaneous contexts */
uint64_t rel_speed; /* relative speed of the dev */
};
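Since the node manager hands out device contexts to host processes, one of the first things it is likely to need is the total number of contexts available for a given kind of device. The sketch below counts them by walking all devices. The fake device table, the DEV_TYPE_NET and DEV_TYPE_VID constants (suggested only by the "net, vid" comment), the helper name count_contexts(), and the 0-on-success return convention are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: dev_type_t is an enum with these members; only
 * DEV_TYPE_ANY is named in the API text. */
typedef enum { DEV_TYPE_ANY = 0, DEV_TYPE_NET, DEV_TYPE_VID } dev_type_t;

struct dev_info {
    int        dev_id;        /* logical id of the device */
    dev_type_t dev_type;      /* type of device */
    char       name[16];      /* string identifier */
    int        num_contexts;  /* # simultaneous contexts */
    uint64_t   rel_speed;     /* relative speed of the dev */
};

/* HYPOTHETICAL node: two NICs with four contexts each, one GPU with one. */
static const struct dev_info fake_devs[] = {
    { 0, DEV_TYPE_NET, "nic0", 4, 100 },
    { 1, DEV_TYPE_VID, "gpu0", 1, 100 },
    { 2, DEV_TYPE_NET, "nic1", 4, 100 },
};

/* HYPOTHETICAL stubs standing in for the real API. */
int get_num_devs(void)
{
    return (int)(sizeof(fake_devs) / sizeof(fake_devs[0]));
}

int get_dev_info(int dev_id, struct dev_info *info)
{
    if (dev_id < 0 || dev_id >= get_num_devs() || info == NULL)
        return -1;  /* ASSUMPTION: negative return means error */
    *info = fake_devs[dev_id];
    return 0;
}

/* Total number of contexts offered by all devices of the given type.
 * DEV_TYPE_ANY counts every device. Returns -1 if a query fails. */
int count_contexts(dev_type_t type)
{
    int total = 0;
    int n = get_num_devs();
    for (int id = 0; id < n; id++) {
        struct dev_info info;
        if (get_dev_info(id, &info) != 0)
            return -1;
        assert(info.num_contexts >= 0);
        if (type == DEV_TYPE_ANY || info.dev_type == type)
            total += info.num_contexts;
    }
    return total;
}
```

With the fake table above, the two NICs together provide eight network contexts that the node manager could assign.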
Topology Discovery
Memories that are physically near a given core are determined with get_nearby_mems(). This function allows a memory type and a distance range to be specified. Specifying MEM_TYPE_ANY for mem_type will cause all memories to be considered. Specifying 0 and -1 for the min_dist and max_dist arguments, respectively, will return a mask of all memories of the specified type that are accessible by the core.
int
get_nearby_mems(
int core_id, /* logical id of the core */
mem_type_t mem_type, /* the type of memory requested */
int min_dist, /* minimum distance away */
int max_dist, /* maximum distance away */
mem_set_t * nearby_mems /* mems falling within range */
)
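A high performance application wants the memory closest to the core it runs on. One way to find it, sketched below, is to probe one distance at a time (min_dist equal to max_dist), starting from 0, and stop at the first non-empty result. Several things here are assumptions: that mem_set_t is a 64-bit mask with bit i standing for memory i, that distances are small non-negative integers, that 0 means success, and the fake distance table itself. The helper name find_closest_mem() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTIONS: enum members other than MEM_TYPE_ANY, and mem_set_t as a
 * 64-bit mask where bit i represents logical memory i. */
typedef enum { MEM_TYPE_ANY = 0, MEM_TYPE_NORMAL } mem_type_t;
typedef uint64_t mem_set_t;

#define NUM_CORES 2
#define NUM_MEMS  3

/* HYPOTHETICAL topology: fake_dist[core][mem] is the distance between them. */
static const int fake_dist[NUM_CORES][NUM_MEMS] = {
    { 0, 1, 2 },
    { 1, 0, 2 },
};

/* HYPOTHETICAL stub for the real API. Every fake memory is
 * MEM_TYPE_NORMAL; a max_dist of -1 means "no upper bound". */
int get_nearby_mems(int core_id, mem_type_t mem_type,
                    int min_dist, int max_dist, mem_set_t *nearby_mems)
{
    if (core_id < 0 || core_id >= NUM_CORES || nearby_mems == NULL)
        return -1;
    *nearby_mems = 0;
    if (mem_type != MEM_TYPE_ANY && mem_type != MEM_TYPE_NORMAL)
        return 0;
    for (int m = 0; m < NUM_MEMS; m++) {
        int d = fake_dist[core_id][m];
        if (d >= min_dist && (max_dist < 0 || d <= max_dist))
            *nearby_mems |= (mem_set_t)1 << m;
    }
    return 0;
}

/* Return the logical id of the closest memory of the given type, probing
 * distances 0..dist_limit in order. Ties go to the lowest id.
 * Returns -1 if nothing is found or a query fails. */
int find_closest_mem(int core_id, mem_type_t mem_type, int dist_limit)
{
    assert(dist_limit >= 0);
    for (int d = 0; d <= dist_limit; d++) {
        mem_set_t set = 0;
        if (get_nearby_mems(core_id, mem_type, d, d, &set) != 0)
            return -1;
        for (int m = 0; m < 64; m++) {
            if (set & ((mem_set_t)1 << m))
                return m;
        }
    }
    return -1;
}
```

Probing one distance at a time costs a call per step, but it avoids needing a separate function that reports the distance to an individual memory, which the API as described does not provide.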
Cores that are physically near a given core (e.g., in the same package) are determined with get_nearby_cores(). It functions similarly to get_nearby_mems() above. Specifying CORE_TYPE_ANY for core_type will cause all cores to be considered.
int
get_nearby_cores(
int core_id, /* logical id of the core */
core_type_t core_type, /* the type of core requested */
int min_dist, /* minimum distance away */
int max_dist, /* maximum distance away */
core_set_t * nearby_cores /* cores falling within range */
)
Devices that are physically near a given core are determined with get_nearby_devs(). It functions similarly to get_nearby_mems() above. Specifying DEV_TYPE_ANY for dev_type will cause all devices to be considered.
int
get_nearby_devs(
int core_id, /* logical id of the core */
dev_type_t dev_type, /* the type of device requested */
int min_dist, /* minimum distance away */
int max_dist, /* maximum distance away */
dev_mask_t * nearby_devs /* devs falling within range */
)
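All three topology functions are keyed by core, so answering the reverse question ("which cores are close to this device?") takes a loop over cores. The sketch below shows one possible approach for devices. It assumes dev_mask_t is a 64-bit mask with bit i standing for device i, that a max_dist of -1 means no upper bound as with get_nearby_mems(), and that 0 means success. The fake topology, the DEV_TYPE_NET constant, and the helper name cores_near_dev() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTIONS: enum members other than DEV_TYPE_ANY, and dev_mask_t as a
 * 64-bit mask where bit i represents logical device i. */
typedef enum { DEV_TYPE_ANY = 0, DEV_TYPE_NET } dev_type_t;
typedef uint64_t dev_mask_t;

#define NUM_CORES 4
#define NUM_DEVS  2

/* HYPOTHETICAL topology: cores 0-1 sit next to device 0,
 * cores 2-3 sit next to device 1. */
static const int fake_dist[NUM_CORES][NUM_DEVS] = {
    { 0, 2 }, { 0, 2 }, { 2, 0 }, { 2, 0 },
};

/* HYPOTHETICAL stubs standing in for the real API. */
int get_num_cores(void) { return NUM_CORES; }

int get_nearby_devs(int core_id, dev_type_t dev_type,
                    int min_dist, int max_dist, dev_mask_t *nearby_devs)
{
    (void)dev_type;  /* the fake devices are not filtered by type */
    if (core_id < 0 || core_id >= NUM_CORES || nearby_devs == NULL)
        return -1;
    *nearby_devs = 0;
    for (int dev = 0; dev < NUM_DEVS; dev++) {
        int d = fake_dist[core_id][dev];
        if (d >= min_dist && (max_dist < 0 || d <= max_dist))
            *nearby_devs |= (dev_mask_t)1 << dev;
    }
    return 0;
}

/* Return a mask of cores (bit i = core i) that report dev_id within
 * max_dist. Cores whose query fails are skipped. */
uint64_t cores_near_dev(int dev_id, int max_dist)
{
    assert(dev_id >= 0 && dev_id < 64);
    uint64_t cores = 0;
    int n = get_num_cores();
    for (int c = 0; c < n && c < 64; c++) {
        dev_mask_t devs = 0;
        if (get_nearby_devs(c, DEV_TYPE_ANY, 0, max_dist, &devs) != 0)
            continue;
        if (devs & ((dev_mask_t)1 << dev_id))
            cores |= UINT64_C(1) << c;
    }
    return cores;
}
```

A node manager could use such a mask to place a process that owns a NIC context on one of the cores nearest that NIC.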