API Reference

Array

class _shortfin_default.lib.array.DType
class _shortfin_default.lib.array.storage
allocate_device = <nanobind.nb_func object>
allocate_host = <nanobind.nb_func object>
copy_from

Copy contents from a source storage to this storage.

This operation executes asynchronously and the effect will only be visible once the execution fiber has been synced to the point of mutation.

fill

Fill a storage with a value.

Accepts any value that exposes the Python buffer protocol and is 1, 2, or 4 bytes in size. The storage is filled uniformly with that pattern.

This operation executes asynchronously and the effect will only be visible once the execution fiber has been synced to the point of mutation.
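The pattern rule above can be illustrated on the host with the standard library alone. This is a minimal sketch, not shortfin code: fill_uniform is a hypothetical helper that tiles a 1, 2, or 4 byte buffer-protocol value across a bytearray, and the requirement that the target size be a multiple of the pattern size is an assumption of the sketch.

```python
import struct


def fill_uniform(target: bytearray, pattern) -> None:
    """Host-side illustration only: tile `pattern` across `target` in place."""
    raw = bytes(memoryview(pattern))  # accepts any buffer-protocol object
    if len(raw) not in (1, 2, 4):
        raise ValueError(f"pattern must be 1, 2, or 4 bytes, got {len(raw)}")
    # Assumption of this sketch: the target holds a whole number of patterns.
    if len(target) % len(raw) != 0:
        raise ValueError("target size must be a multiple of the pattern size")
    target[:] = raw * (len(target) // len(raw))


buf = bytearray(16)
fill_uniform(buf, struct.pack("<f", 1.5))  # 4-byte little-endian float32 pattern
floats = struct.unpack("<4f", buf)         # read the buffer back as four float32s
```

A bytes object produced by struct.pack, a one-element array.array, or a NumPy scalar would each qualify as a pattern here, since all of them expose the buffer protocol.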

map

Create a mapping of the buffer contents in host memory.

Supported kwargs:

read: Enables read access to the mapped memory.
write: Enables write access to the mapped memory and will flush upon close (for non-unified memory systems).
discard: Indicates that the entire memory map should be treated as if it will be overwritten. Initial contents will be undefined. Implies write=True.

Mapping memory for access from the host requires a compatible buffer that has been created with host visibility (which includes host buffers).

The returned mapping object is a context manager that will close/flush on exit. Alternatively, the close() method can be invoked explicitly.

See also device_array.map() which functions similarly but allows some additional dtype specific accessors.

class _shortfin_default.lib.array.base_array
class _shortfin_default.lib.array.device_array(*args, **kwargs)
copy_from

Copy contents from a source array to this array.

Equivalent to dest_array.storage.copy_from(source_array.storage).

copy_to

Copy contents of this array to a destination array.

Equivalent to dest_array.storage.copy_from(source_array.storage).

property device

(self) -> _shortfin_default.lib.local.ScopedDevice

fill

Fill an array with a value.

Note that fill is asynchronous and its effect may not be visible immediately. For immediate manipulation of host visible arrays, assign to the items property or use map(discard=True) to get a mapping object which can be used to directly update the contents.

Equivalent to array.storage.fill(pattern).

for_device = <nanobind.nb_func object>
for_host = <nanobind.nb_func object>
for_transfer
property items

Convenience shorthand for map(…).items

map

Create a typed mapping of the buffer contents in host memory.

Supported kwargs:

read: Enables read access to the mapped memory.
write: Enables write access to the mapped memory and will flush upon close (for non-unified memory systems).
discard: Indicates that the entire memory map should be treated as if it will be overwritten. Initial contents will be undefined. Implies write=True.

Mapping memory for access from the host requires a compatible buffer that has been created with host visibility (which includes host buffers).

The returned mapping object is a context manager that will close/flush on exit. Alternatively, the close() method can be invoked explicitly.

See also storage.map() which functions similarly but does not allow access to dtype specific functionality.
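The mapping semantics described above (discard implies write, flush on close, use as a context manager) can be modeled with a small stand-in. This is a sketch under stated assumptions: FakeArray and FakeMapping are hypothetical classes written for illustration and are not the shortfin implementation; only the map() keyword names, the items attribute, and close() are taken from the text above.

```python
class FakeMapping:
    """Hypothetical stand-in for the mapping object; not the shortfin class."""

    def __init__(self, backing, read, write):
        self._backing = backing
        self._write = write
        # Without read access the initial contents are undefined (None here).
        self.items = list(backing) if read else [None] * len(backing)
        self.closed = False

    def close(self):
        if not self.closed and self._write:
            self._backing[:] = self.items  # "flush" staged writes on close
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


class FakeArray:
    """Hypothetical stand-in for a host-visible array."""

    def __init__(self, n):
        self.data = [0] * n

    def map(self, *, read=False, write=False, discard=False):
        write = write or discard  # discard implies write=True
        return FakeMapping(self.data, read=read and not discard, write=write)


def overwrite(array, values):
    # Works with any object exposing map(discard=True) as described above.
    with array.map(discard=True) as m:
        m.items = values


arr = FakeArray(3)
overwrite(arr, [1, 2, 3])
with arr.map(read=True) as m:
    snapshot = list(m.items)
```

The with form guarantees the flush happens even if the body raises; calling close() explicitly is the alternative when the mapping must outlive a single block.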

property storage

(self) -> _shortfin_default.lib.array.storage

view

Create a view of an array.

Either integer indices or slices can be passed to the view() method to create an aliased device_array that shares a subset of the storage. Only view() organizations that result in a row-major, dense array are currently supported.
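To make the "row-major, dense" condition concrete, the sketch below computes the strides a view would have and checks them against the strides of a freshly allocated row-major array of the same shape. It is plain Python and does not call shortfin; is_dense_view is a hypothetical helper, and whether view() accepts exactly the cases this check accepts is an assumption.

```python
def is_dense_view(shape, index):
    """Return True if indexing a row-major array yields a dense row-major view."""
    if len(index) > len(shape):
        raise IndexError("too many indices")
    # Row-major strides (in elements) of the parent array.
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    # Missing trailing indices behave like full slices.
    index = tuple(index) + (slice(None),) * (len(shape) - len(index))

    out_shape, out_strides = [], []
    for dim, stride, idx in zip(shape, strides, index):
        if isinstance(idx, int):
            if not -dim <= idx < dim:
                raise IndexError("index out of range")
            continue  # an integer index drops the dimension
        start, stop, step = idx.indices(dim)
        out_shape.append(len(range(start, stop, step)))
        out_strides.append(stride * step)

    if 0 in out_shape:
        return True  # an empty view is trivially dense
    expected = 1
    for length, stride in zip(reversed(out_shape), reversed(out_strides)):
        if length != 1 and stride != expected:
            return False
        expected *= length
    return True


rows_ok = is_dense_view((4, 8), (slice(1, 3),))                 # rows 1..2, all columns
cols_bad = is_dense_view((4, 8), (slice(None), slice(0, 4)))    # left half of every row
one_row = is_dense_view((4, 8), (2, slice(0, 4)))               # part of a single row
strided = is_dense_view((4, 8), (slice(None), slice(None, None, 2)))
```

Under this check, slicing whole rows or part of a single row stays dense, while taking a column block across several rows or using a step other than 1 does not.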

class _shortfin_default.lib.array.RandomGenerator(*args, **kwargs)
_shortfin_default.lib.array.fill_randn(out: _shortfin_default.lib.array.device_array, generator: _shortfin_default.lib.array.RandomGenerator | None = None) None

Fills an array with numbers sampled from the standard normal distribution.

Values are sampled with a mean of 0 and a standard deviation of 1.

This operates like torch.randn but only supports in place fills to an existing array, deriving shape and dtype from the output array.

Parameters:
  • out – Output array to fill.

  • generator – Uses an explicit generator. If not specified, uses a global default.
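The in-place, generator-optional calling pattern can be mirrored with the standard library. This is an analogy rather than shortfin code: fill_randn_list is a hypothetical helper that fills a Python list, and random.Random stands in for RandomGenerator.

```python
import random
import statistics

_default_rng = random.Random()  # stands in for the "global default" generator


def fill_randn_list(out, generator=None):
    """Fill `out` in place with standard normal samples (mean 0, std dev 1)."""
    rng = generator if generator is not None else _default_rng
    for i in range(len(out)):
        out[i] = rng.gauss(0.0, 1.0)


a = [0.0] * 10000
b = [0.0] * 10000
fill_randn_list(a, random.Random(1234))  # explicit generator: reproducible
fill_randn_list(b, random.Random(1234))
sample_mean = statistics.fmean(a)
sample_std = statistics.pstdev(a)
```

Passing an explicitly seeded generator makes the fill reproducible, which is the usual reason to prefer it over the global default in tests.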

_shortfin_default.lib.array.argmax(input: _shortfin_default.lib.array.device_array, axis: int = -1, out: _shortfin_default.lib.array.device_array | None = None, *, keepdims: bool = False, device_visible: bool = False) _shortfin_default.lib.array.device_array

Returns the indices of the maximum values along an axis.

Implemented for dtypes: float16, float32.

Parameters:
  • input – An input array.

  • axis – Axis along which to find the maximum. Defaults to the last axis (note that the numpy default is into the flattened array, which we do not support).

  • keepdims – Whether to preserve the reduced axis. If true, it will become a unit dim. If false, it will be removed.

  • out – Array to write into. If specified, it must have the expected shape and int64 dtype.

  • device_visible – Whether to make the result array visible to devices. Defaults to False.

Returns:

A device_array of dtype=int64, allocated on the host. By default (device_visible=False) it is not visible to the device.
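The effect of axis=-1 and keepdims on the result shape can be shown with nested lists. This is a pure-Python sketch for a 2-D input only; argmax_last_axis is a hypothetical helper, and its tie-breaking (first maximum wins) is an assumption, since the text above does not specify it.

```python
def argmax_last_axis(rows, keepdims=False):
    """Index of the maximum of each row of a 2-D nested list."""
    result = []
    for row in rows:
        # max() returns the first maximal index when values tie.
        best = max(range(len(row)), key=row.__getitem__)
        result.append([best] if keepdims else best)
    return result


scores = [[0.1, 0.9, 0.3],
          [2.0, 1.0, 2.0]]
reduced = argmax_last_axis(scores)               # shape (2,)
kept = argmax_last_axis(scores, keepdims=True)   # shape (2, 1)
```

With keepdims the reduced axis survives as a unit dimension, so the result still has the same rank as the input.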

Local

class _shortfin_default.lib.local.SystemBuilder(*args, **kwargs)
class _shortfin_default.lib.local.System(*args, **kwargs)
class _shortfin_default.lib.local.Node
class _shortfin_default.lib.local.Device
class _shortfin_default.lib.local.DeviceAffinity(*args, **kwargs)
class _shortfin_default.lib.local.Program(*args, **kwargs)
class _shortfin_default.lib.local.ProgramFunction
property calling_convention

(self) -> str

invocation

Creates an invocation object targeting the function.

This is a low-level interface for performing an invocation, and it should be used when precise, non-default control is needed.

property isolation

(self) -> _shortfin_default.lib.local.ProgramIsolation

property name

(self) -> str

class _shortfin_default.lib.local.ProgramModule
class _shortfin_default.lib.local.ProgramInvocation
class _shortfin_default.lib.local.Fiber
class _shortfin_default.lib.local.ScopedDevice
class _shortfin_default.lib.local.Worker
class _shortfin_default.lib.local.Process(*args, **kwargs)
class _shortfin_default.lib.local.CompletionEvent(*args, **kwargs)
class _shortfin_default.lib.local.Message(*args, **kwargs)
class _shortfin_default.lib.local.Queue
class _shortfin_default.lib.local.QueueWriter
class _shortfin_default.lib.local.QueueReader
class _shortfin_default.lib.local.Future
class _shortfin_default.lib.local.VoidFuture(*args, **kwargs)
class _shortfin_default.lib.local.MessageFuture

AMD GPU

AMDGPU system config

class _shortfin_default.lib.local.amdgpu.SystemBuilder(*args, **kwargs)
property amdgpu_allocator_specs

Allocator specs to apply to AMDGPU devices configured by this builder.

This uses syntax like:

some_allocator
some_allocator:key=value
some_allocator:key=value,key=value
some_allocator:key=value,key=value;other_allocator:key=value

Typical values for some_allocator include caching and debug.

This can be set via the keyword amdgpu_allocators, which applies only to AMDGPU devices, or allocators, which applies to all contained devices. Similarly, it is available as a SHORTFIN_-prefixed environment variable if environment lookup is not disabled.
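The spec syntax shown above can be read as semicolon-separated allocators, each with optional comma-separated key=value pairs after a colon. The parser below is a sketch of that reading, not the library's own parser; parse_allocator_specs is a hypothetical helper and the key names in the example are placeholders.

```python
def parse_allocator_specs(text):
    """Parse 'name:key=value,key=value;other:key=value' into (name, options) pairs."""
    specs = []
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        name, sep, params = part.partition(":")
        options = {}
        if sep:
            for pair in params.split(","):
                key, eq, value = pair.partition("=")
                if not eq:
                    raise ValueError(f"expected key=value, got {pair!r}")
                options[key.strip()] = value.strip()
        specs.append((name.strip(), options))
    return specs


simple = parse_allocator_specs("caching")
# 'key' and 'key2' are placeholder option names, not documented options.
chained = parse_allocator_specs("caching:key=value,key2=value2;debug:key=value")
```

Returning a list of pairs rather than a dict keeps the order of the allocators as written.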

property async_allocations

Whether to use async allocations if supported (default true).

property available_devices

List of available device ids on the system.

Accessing this property triggers enumeration, so configuration needed to load libraries and perform basic system setup must be set first.

property cpu_devices_enabled

Whether to create a heterogeneous system with hostcpu and amdgpu devices.

Defaults to false. If enabled, the resulting system will contain both device types and it is up to application code to differentiate between them. All options for the hostcpu system builder are applicable in this case.

This option can be set as an option keyword with the name “amdgpu_cpu_devices_enabled” or the environment variable “SHORTFIN_AMDGPU_CPU_DEVICES_ENABLED=true” (if env_prefix was not changed at construction).

property hip_lib_search_paths

List of directories to search for libamdhip64.so (or amdhip64.dll).

If empty, then dlopen will be used without a path, meaning that the library must be on the default search path or already loaded in the process (e.g. when running within a larger framework).

Each entry should be a directory, but a full path to a file can be given by prefixing with “file:”.

This option can be set as an option keyword with the name “amdgpu_hip_lib_search_path” or the environment variable “SHORTFIN_AMDGPU_HIP_LIB_SEARCH_PATH” (if env_prefix was not changed at construction). For compatibility with IREE tools, the “IREE_HIP_DYLIB_PATH” environment variable is searched as a fallback in all cases. Multiple paths can be separated by semicolons on all platforms.
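A value for this option mixes directory entries with "file:"-prefixed full paths, separated by semicolons. The sketch below shows one way to split such a value; split_search_paths is a hypothetical helper, the example paths are placeholders, and the library's own handling may differ in details such as whitespace.

```python
def split_search_paths(value):
    """Split a semicolon-separated value into ('dir' | 'file', path) pairs."""
    entries = []
    for item in value.split(";"):
        if not item:
            continue  # tolerate empty segments such as a trailing ';'
        if item.startswith("file:"):
            entries.append(("file", item[len("file:"):]))
        else:
            entries.append(("dir", item))
    return entries


# Placeholder paths for illustration only.
parsed = split_search_paths("/opt/rocm/lib;file:/custom/libamdhip64.so")
```

Semicolons are used on all platforms, so the same value works unchanged on Linux and Windows.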

property logical_devices_per_physical_device

Number of logical devices to open per physical, visible device.

This option can be set as an option keyword with the name “amdgpu_logical_devices_per_physical_device” or the environment variable “SHORTFIN_AMDGPU_LOGICAL_DEVICES_PER_PHYSICAL_DEVICE” (if env_prefix was not changed at construction).
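The keyword and environment variable names documented in this section follow a visible pattern: the environment variable is the option keyword in upper case behind the prefix. The helper below encodes that observation; env_var_for is hypothetical, and the default prefix of "SHORTFIN_" is inferred from the names in this section rather than taken from the library.

```python
def env_var_for(option_keyword, env_prefix="SHORTFIN_"):
    """Derive the environment variable name for an option keyword."""
    return env_prefix + option_keyword.upper()


documented_pairs = {
    "amdgpu_cpu_devices_enabled": "SHORTFIN_AMDGPU_CPU_DEVICES_ENABLED",
    "amdgpu_hip_lib_search_path": "SHORTFIN_AMDGPU_HIP_LIB_SEARCH_PATH",
    "amdgpu_logical_devices_per_physical_device":
        "SHORTFIN_AMDGPU_LOGICAL_DEVICES_PER_PHYSICAL_DEVICE",
    "amdgpu_visible_devices": "SHORTFIN_AMDGPU_VISIBLE_DEVICES",
    "amdgpu_tracing_level": "SHORTFIN_AMDGPU_TRACING_LEVEL",
}
matches = all(env_var_for(k) == v for k, v in documented_pairs.items())
```

If env_prefix was changed at construction, the same rule would presumably apply with the new prefix, though that is not stated in this section.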

property tracing_level

Tracing level for AMDGPU device behavior.

Controls the verbosity of tracing when Tracy instrumentation is enabled. The impact on benchmark timing becomes more severe as the verbosity increases, so higher levels should only be enabled when needed.

This is the equivalent of the --hip_tracing IREE tools flag. Permissible values are:

  • 0 : stream tracing disabled.

  • 1 : coarse command buffer level tracing enabled.

  • 2 : (default) fine-grained kernel level tracing enabled.

The setting only has an effect if using a tracing enabled runtime (e.g. by running with SHORTFIN_PY_RUNTIME=tracy or equivalent).

This setting can be supplied as a keyword, as in amdgpu.SystemBuilder(amdgpu_tracing_level=2), or (by default) through the environment variable SHORTFIN_AMDGPU_TRACING_LEVEL.

property visible_devices

Get or set the list of visible device ids.

If not set or None, then all available devices will be opened and added to the system. See the property available_devices to access this list of ids.

If set, then each device with the given device id will be opened and added to the system in the order listed. Note that in certain partitioned cases, multiple devices may be available with the same device id. In this case, each duplicate in the visible devices list will instantiate a partition of the device in enumeration order (so there can be as many duplicates as physical partitions). This is an uncommon scenario and most users should not specify duplicate device ids. Since there are several ways that partitioned devices can be consumed, additional options will be available in the future for controlling this behavior.

This property can be set as an option keyword with the name “amdgpu_visible_devices” or the environment variable “SHORTFIN_AMDGPU_VISIBLE_DEVICES” (if env_prefix was not changed at construction). Multiple ids can be separated by semicolons.
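The duplicate-id rule can be pictured as assigning each repeated id the next partition in enumeration order. The sketch below models only that bookkeeping; assign_partitions is a hypothetical helper and the ids "gpu0" and "gpu1" are placeholders, not real device ids.

```python
from collections import Counter


def assign_partitions(visible_devices):
    """Pair each listed id with the partition index its position would select."""
    seen = Counter()
    assignments = []
    for device_id in visible_devices:
        assignments.append((device_id, seen[device_id]))
        seen[device_id] += 1
    return assignments


# A semicolon-separated value, as it might appear in the environment variable.
ids = "gpu0;gpu0;gpu1".split(";")
assignments = assign_partitions(ids)
```

The order of the input list is preserved, matching the statement that devices are added to the system in the order listed.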

class _shortfin_default.lib.local.amdgpu.AMDGPUDevice

Host

Host device management

class _shortfin_default.lib.local.host.CPUSystemBuilder(*args, **kwargs)
class _shortfin_default.lib.local.host.HostCPUDevice