API Reference
Array
- class _shortfin_default.lib.array.DType
- class _shortfin_default.lib.array.storage
- allocate_device
- allocate_host
- copy_from
Copy contents from a source storage to this storage.
This operation executes asynchronously and the effect will only be visible once the execution fiber has been synced to the point of mutation.
- fill
Fill a storage with a value.
Accepts any value that can be interpreted through the Python buffer protocol as a buffer of 1, 2, or 4 bytes. The storage is filled uniformly by repeating this pattern.
This operation executes asynchronously and the effect will only be visible once the execution fiber has been synced to the point of mutation.
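What "filled uniformly with the pattern" means at the byte level can be modeled in pure Python. This is a sketch that does not call shortfin. The helper name `fill_pattern` is hypothetical, and the check that the storage size is a multiple of the pattern size is an assumption, not documented behavior.

```python
import struct


def fill_pattern(size_bytes: int, pattern) -> bytes:
    """Hypothetical model of a uniform fill: return `size_bytes` bytes of `pattern`.

    `pattern` may be any object supporting the buffer protocol (bytes,
    bytearray, array.array, ...), mirroring the documented argument.
    """
    # Flatten the pattern to raw bytes via the buffer protocol.
    raw = memoryview(pattern).cast("B").tobytes()
    if len(raw) not in (1, 2, 4):
        raise ValueError(f"pattern must be 1, 2, or 4 bytes, got {len(raw)}")
    # Assumption: the storage size must be an exact multiple of the pattern size.
    if size_bytes % len(raw):
        raise ValueError("storage size is not a multiple of the pattern size")
    return raw * (size_bytes // len(raw))


# Example: a 4-byte float32 pattern repeated across 16 bytes.
filled = fill_pattern(16, struct.pack("<f", 1.0))
print(struct.unpack("<4f", filled))  # (1.0, 1.0, 1.0, 1.0)
```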
- map
Create a mapping of the buffer contents in host memory.
Supports the following keyword arguments:
read: Enables read access to the mapped memory.
write: Enables write access to the mapped memory. On non-unified memory systems, written contents are flushed when the mapping is closed.
discard: Indicates that the entire memory map will be overwritten, so its initial contents are undefined. Implies write=True.
Mapping memory for host access requires a compatible buffer that was created with host visibility (host buffers included).
The returned mapping object is a context manager that will close/flush on exit. Alternatively, the close() method can be invoked explicitly.
See also device_array.map(), which works similarly but also provides dtype-specific accessors.
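The flag rules and the context-manager behavior can be illustrated with a small pure-Python stand-in. `HypotheticalMapping` is not a shortfin class: it copies a `bytearray` that plays the role of device memory, models the undefined initial contents of a `discard=True` mapping as zeros, and writes back ("flushes") on close only when the mapping is writable. Rejecting writes to a non-writable mapping is a modeling choice; the real runtime may behave differently.

```python
class HypotheticalMapping:
    """Pure-Python stand-in for a host mapping (not the shortfin API)."""

    def __init__(self, backing: bytearray, *, read=False, write=False, discard=False):
        if discard:
            write = True  # Documented: discard implies write=True.
        self.read, self.write, self.discard = read, write, discard
        self._backing = backing
        # discard: initial contents are undefined; this model uses zeros.
        self._host = bytearray(len(backing)) if discard else bytearray(backing)
        self.closed = False

    def __getitem__(self, index):
        return self._host[index]

    def __setitem__(self, index, value):
        if not self.write:
            raise PermissionError("mapping was not opened for write")
        self._host[index] = value

    def close(self):
        # Flush host-side writes back to the backing memory exactly once.
        if not self.closed and self.write:
            self._backing[:] = self._host
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # Do not suppress exceptions.


device_memory = bytearray(b"\x01\x02\x03\x04")
with HypotheticalMapping(device_memory, discard=True) as m:
    for i in range(len(device_memory)):
        m[i] = 0x41
print(device_memory)  # bytearray(b'AAAA')
```

Using the `with` form guarantees the flush happens even if the body raises; calling `close()` explicitly is the alternative when a scope is inconvenient.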
- class _shortfin_default.lib.array.base_array
- class _shortfin_default.lib.array.device_array(*args, **kwargs)
- copy_from
Copy contents from a source array to this array.
Equivalent to dest_array.storage.copy_from(source_array.storage).
- copy_to
Copy the contents of this array to a destination array.
Equivalent to dest_array.storage.copy_from(source_array.storage).
- property device
(self) -> _shortfin_default.lib.local.ScopedDevice
- fill
Fill an array with a value.
Note that fill is asynchronous and its effect may not be visible immediately. For immediate manipulation of host-visible arrays, assign to the items property or use map(discard=True) to obtain a mapping object that can update the contents directly.
Equivalent to array.storage.fill(pattern).
- for_device
- for_host
- for_transfer
- property items
Convenience shorthand for map(…).items
- map
Create a typed mapping of the buffer contents in host memory.
Supports the following keyword arguments:
read: Enables read access to the mapped memory.
write: Enables write access to the mapped memory. On non-unified memory systems, written contents are flushed when the mapping is closed.
discard: Indicates that the entire memory map will be overwritten, so its initial contents are undefined. Implies write=True.
Mapping memory for host access requires a compatible buffer that was created with host visibility (host buffers included).
The returned mapping object is a context manager that will close/flush on exit. Alternatively, the close() method can be invoked explicitly.
See also storage.map(), which works similarly but does not provide dtype-specific functionality.
- property storage
(self) -> _shortfin_default.lib.array.storage
- view
Create a view of an array.
Integer indices or slices can be passed to view() to create an aliased device_array that shares a subset of the storage. Only views that result in a row-major, dense array are currently supported.
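The density restriction can be made concrete with a hypothetical checker over row-major shapes. `dense_view_extent` is not part of shortfin. It assumes that an integer index selects an extent of 1 along its axis (whether shortfin keeps or drops that axis in the resulting shape is not modeled), that slices must have unit step, and that unmentioned trailing axes are taken whole. Under those assumptions a view is dense when every axis before the first non-unit extent has extent 1 and every axis after it is taken whole.

```python
from typing import List, Sequence, Tuple


def dense_view_extent(shape: Sequence[int], *indices) -> Tuple[int, List[int]]:
    """Return (flat_element_offset, per_axis_extents) or raise if not dense."""
    if len(indices) > len(shape):
        raise IndexError("too many indices for shape")
    starts: List[int] = []
    sizes: List[int] = []
    for dim, idx in zip(shape, indices):
        if isinstance(idx, int):
            if not -dim <= idx < dim:
                raise IndexError(f"index {idx} out of range for dim {dim}")
            starts.append(idx % dim)
            sizes.append(1)  # Assumption: an integer selects an extent of 1.
        elif isinstance(idx, slice):
            start, stop, step = idx.indices(dim)
            if step != 1:
                raise ValueError("only unit-stride slices can be dense")
            starts.append(start)
            sizes.append(max(stop - start, 0))
        else:
            raise TypeError("indices must be int or slice")
    for dim in shape[len(indices):]:  # Unmentioned trailing axes: taken whole.
        starts.append(0)
        sizes.append(dim)

    # First axis whose extent is not 1; every later axis must be full.
    k = next((i for i, s in enumerate(sizes) if s != 1), len(sizes))
    for i in range(k + 1, len(shape)):
        if sizes[i] != shape[i]:
            raise ValueError("view is not row-major dense")

    # Row-major element offset of the view's first element.
    offset, stride = 0, 1
    for dim, start in zip(reversed(shape), reversed(starts)):
        offset += start * stride
        stride *= dim
    return offset, sizes


print(dense_view_extent((4, 5), 1, slice(1, 3)))  # (6, [1, 2])
```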
- class _shortfin_default.lib.array.RandomGenerator(*args, **kwargs)
- _shortfin_default.lib.array.fill_randn(out: _shortfin_default.lib.array.device_array, generator: _shortfin_default.lib.array.RandomGenerator | None = None) -> None
Fills an array with numbers sampled from the standard normal distribution.
Values are sampled with a mean of 0 and a standard deviation of 1.
This operates like torch.randn but only supports in-place fills of an existing array, deriving shape and dtype from the output array.
- Parameters:
out – Output array to fill.
generator – Uses an explicit generator. If not specified, uses a global default.
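The sampling contract can be illustrated without shortfin using Python's standard `random` module. This is an analogy, not the shortfin implementation: a flat list stands in for the output array (its length playing the role of the shape), a `random.Random` instance stands in for RandomGenerator, and the module-level default generator is hypothetical.

```python
import random
from typing import List, Optional

# Hypothetical stand-in for the global default generator.
_default_generator = random.Random()


def fill_randn_like(out: List[float], generator: Optional[random.Random] = None) -> None:
    """Fill `out` in place with samples from N(0, 1); the size comes from `out`."""
    rng = generator if generator is not None else _default_generator
    for i in range(len(out)):
        out[i] = rng.gauss(0.0, 1.0)


values = [0.0] * 10_000
fill_randn_like(values, random.Random(42))  # Explicit, seeded generator.
```

Passing an explicit, seeded generator makes the fill reproducible, which is the usual reason to supply one rather than relying on the global default.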
- _shortfin_default.lib.array.argmax(input: _shortfin_default.lib.array.device_array, axis: int = -1, out: _shortfin_default.lib.array.device_array | None = None, *, keepdims: bool = False, device_visible: bool = False) -> _shortfin_default.lib.array.device_array
Returns the indices of the maximum values along an axis.
Implemented for dtypes: float16, float32.
- Parameters:
input – An input array.
axis – Axis along which to find the maximum values. Defaults to the last axis. Note that numpy's default (axis=None) operates on the flattened array, which is not supported here.
keepdims – Whether to keep the reduced axis. If true, it is retained as a unit dim. If false, it is removed.
out – Array to write into. If specified, it must have the expected result shape and int64 dtype.
device_visible – Whether to make the result array visible to devices. Defaults to False.
- Returns:
A device_array of dtype=int64, allocated on the host. It is not visible to the device unless device_visible=True.
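The axis and keepdims semantics can be checked against a pure-Python reference over a flat, row-major buffer. `argmax_reference` is a hypothetical helper, not shortfin code; returning the first index on ties follows numpy's convention and is only assumed to match shortfin.

```python
from math import prod
from typing import List, Sequence, Tuple


def argmax_reference(
    data: Sequence[float], shape: Sequence[int], axis: int = -1, keepdims: bool = False
) -> Tuple[List[int], List[int]]:
    """Hypothetical reference argmax over a flat row-major buffer.

    Returns (indices, result_shape). Ties resolve to the first index
    (numpy's convention; assumed, not confirmed, for shortfin).
    """
    ndim = len(shape)
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis {axis} out of range for {ndim}-d input")
    axis %= ndim  # Normalize negative axes; -1 is the last axis.
    n = shape[axis]
    if n == 0:
        raise ValueError("cannot take argmax over an empty axis")
    outer = prod(shape[:axis])      # Product of dims before the axis.
    inner = prod(shape[axis + 1:])  # Product of dims after the axis (its stride).
    indices = []
    for o in range(outer):
        for i in range(inner):
            base = o * n * inner + i
            best = 0
            for k in range(1, n):
                if data[base + k * inner] > data[base + best * inner]:
                    best = k  # Strict '>' keeps the first maximum on ties.
            indices.append(best)
    result_shape = list(shape)
    if keepdims:
        result_shape[axis] = 1  # Reduced axis kept as a unit dim.
    else:
        del result_shape[axis]
    return indices, result_shape


print(argmax_reference([1, 5, 2, 7, 0, 7], (2, 3)))  # ([1, 0], [2])
```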
Local
- class _shortfin_default.lib.local.SystemBuilder(*args, **kwargs)
- class _shortfin_default.lib.local.System(*args, **kwargs)
- class _shortfin_default.lib.local.Node
- class _shortfin_default.lib.local.Device
- class _shortfin_default.lib.local.DeviceAffinity(*args, **kwargs)
- class _shortfin_default.lib.local.Program(*args, **kwargs)
- class _shortfin_default.lib.local.ProgramFunction
- property calling_convention
(self) -> str
- invocation
Creates an invocation object targeting the function.
This is a low-level interface for performing an invocation, and it should be used when precise, non-default control is needed.
- property isolation
(self) -> _shortfin_default.lib.local.ProgramIsolation
- property name
(self) -> str
- class _shortfin_default.lib.local.ProgramModule
- class _shortfin_default.lib.local.ProgramInvocation
- class _shortfin_default.lib.local.Fiber
- class _shortfin_default.lib.local.ScopedDevice
- class _shortfin_default.lib.local.Worker
- class _shortfin_default.lib.local.Process(*args, **kwargs)
- class _shortfin_default.lib.local.CompletionEvent(*args, **kwargs)
- class _shortfin_default.lib.local.Message(*args, **kwargs)
- class _shortfin_default.lib.local.Queue
- class _shortfin_default.lib.local.QueueWriter
- class _shortfin_default.lib.local.QueueReader
- class _shortfin_default.lib.local.Future
- class _shortfin_default.lib.local.VoidFuture(*args, **kwargs)
- class _shortfin_default.lib.local.MessageFuture
AMD GPU
AMDGPU system config
- class _shortfin_default.lib.local.amdgpu.SystemBuilder(*args, **kwargs)
- property amdgpu_allocator_specs
Allocator specs to apply to AMDGPU devices configured by this builder.
This uses syntax like:
some_allocator
some_allocator:key=value
some_allocator:key=value,key=value
some_allocator:key=value,key=value;other_allocator:key=value
Typical values for some_allocator include caching and debug.
This can be set via the keyword amdgpu_allocators, which applies only to AMDGPU devices, or via allocators, which applies to all contained devices. Similarly, it can be set through the corresponding SHORTFIN_-prefixed environment variable unless environment lookup is disabled.
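The spec syntax uses three separators: `;` between allocators, `:` between an allocator name and its options, and `,` between `key=value` pairs. A hypothetical parser makes that structure explicit. It is a sketch for illustration, not shortfin's parser, and the option keys in the example are placeholders.

```python
from typing import Dict, List, Tuple


def parse_allocator_specs(spec: str) -> List[Tuple[str, Dict[str, str]]]:
    """Hypothetical parser for 'name[:key=value[,key=value...]][;name...]'."""
    allocators = []
    for part in spec.split(";"):  # ';' separates allocators.
        part = part.strip()
        if not part:
            continue
        name, _, options_text = part.partition(":")  # ':' starts the options.
        options: Dict[str, str] = {}
        for pair in filter(None, options_text.split(",")):  # ',' separates options.
            key, sep, value = pair.partition("=")
            if not sep:
                raise ValueError(f"expected key=value, got {pair!r}")
            options[key.strip()] = value.strip()
        allocators.append((name.strip(), options))
    return allocators


print(parse_allocator_specs("caching:key=value,other_key=2;debug"))
# [('caching', {'key': 'value', 'other_key': '2'}), ('debug', {})]
```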
- property async_allocations
Whether to use async allocations if supported (default true).
- property available_devices
List of available device ids on the system.
Accessing this property triggers enumeration, so configuration needed to load libraries and perform basic system setup must be set first.
- property cpu_devices_enabled
Whether to create a heterogeneous system with hostcpu and amdgpu devices.
Defaults to false. If enabled, the resulting system will contain both device types and it is up to application code to differentiate between them. All options for the hostcpu system builder are applicable in this case.
This option can be set as an option keyword with the name “amdgpu_cpu_devices_enabled” or the environment variable “SHORTFIN_AMDGPU_CPU_DEVICES_ENABLED=true” (if env_prefix was not changed at construction).
- property hip_lib_search_paths
List of directories to search for libamdhip64.so (or amdhip64.dll).
If empty, dlopen is called without a path, so the library must be on the default search path or already loaded in the process (e.g. when running within a larger framework).
Each entry should be a directory, but a full path to a file can be given by prefixing with “file:”.
This option can be set as an option keyword with the name “amdgpu_hip_lib_search_path” or the environment variable “SHORTFIN_AMDGPU_HIP_LIB_SEARCH_PATH” (if env_prefix was not changed at construction). For compatibility with IREE tools, the “IREE_HIP_DYLIB_PATH” environment variable is searched as a fallback in all cases. Multiple paths can be separated by semicolons on all platforms.
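The search behavior can be sketched as a function that turns configured entries into candidate library paths. `hip_library_candidates` is hypothetical, and the example directories are placeholders. Appending `IREE_HIP_DYLIB_PATH` entries after the configured ones, and splitting that variable on semicolons, are assumptions about how the fallback is applied.

```python
import ntpath
import os
import posixpath
import sys
from typing import List, Mapping, Optional


def hip_library_candidates(
    search_paths: List[str],
    environ: Optional[Mapping[str, str]] = None,
    platform: Optional[str] = None,
) -> List[str]:
    """Hypothetical sketch: map configured search entries to library paths to try."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    windows = platform.startswith("win")
    lib_name = "amdhip64.dll" if windows else "libamdhip64.so"
    join = ntpath.join if windows else posixpath.join

    entries = list(search_paths)
    # Assumption: IREE_HIP_DYLIB_PATH entries are appended after configured ones.
    fallback = environ.get("IREE_HIP_DYLIB_PATH", "")
    entries.extend(p for p in fallback.split(";") if p)

    candidates = []
    for entry in entries:
        if entry.startswith("file:"):
            candidates.append(entry[len("file:"):])  # Explicit file path.
        else:
            candidates.append(join(entry, lib_name))  # Directory to search.
    # No entries at all: a bare name, resolved by dlopen's default search.
    return candidates or [lib_name]


print(hip_library_candidates(
    ["/opt/rocm/lib", "file:/custom/libamdhip64.so.6"], environ={}, platform="linux"))
# ['/opt/rocm/lib/libamdhip64.so', '/custom/libamdhip64.so.6']
```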
- property logical_devices_per_physical_device
Number of logical devices to open per physical, visible device.
This option can be set as an option keyword with the name “amdgpu_logical_devices_per_physical_device” or the environment variable “SHORTFIN_AMDGPU_LOGICAL_DEVICES_PER_PHYSICAL_DEVICE” (if env_prefix was not changed at construction).
- property tracing_level
Tracing level for AMDGPU device behavior.
Controls the verbosity of tracing when Tracy instrumentation is enabled. The impact on benchmark timing grows more severe as verbosity increases, so tracing should be enabled only when needed.
This is the equivalent of the --hip_tracing IREE tools flag. Permissible values are:
0 : stream tracing disabled.
1 : coarse command buffer level tracing enabled.
2 : (default) fine-grained kernel level tracing enabled.
The setting only has an effect when using a tracing-enabled runtime (e.g. by running with SHORTFIN_PY_RUNTIME=tracy or equivalent).
The value can be supplied as a keyword, as in amdgpu.SystemBuilder(amdgpu_tracing_level=2), or (by default) read from the environment variable SHORTFIN_AMDGPU_TRACING_LEVEL.
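Resolving the tracing level from its possible sources can be sketched as follows. `resolve_tracing_level` is hypothetical, and the precedence it encodes (explicit keyword over environment variable over the default of 2) is an assumption rather than documented behavior. The environment variable name follows the default SHORTFIN_ prefix.

```python
import os
from typing import Mapping, Optional

# Documented permissible values.
TRACING_LEVELS = {
    0: "stream tracing disabled",
    1: "coarse command buffer level tracing",
    2: "fine-grained kernel level tracing (default)",
}


def resolve_tracing_level(
    keyword: Optional[int] = None,
    environ: Optional[Mapping[str, str]] = None,
    env_prefix: str = "SHORTFIN_",
) -> int:
    """Hypothetical resolution order: keyword, then environment, then default 2."""
    environ = os.environ if environ is None else environ
    if keyword is not None:
        level = keyword
    else:
        raw = environ.get(f"{env_prefix}AMDGPU_TRACING_LEVEL")
        level = int(raw) if raw is not None else 2
    if level not in TRACING_LEVELS:
        raise ValueError(f"tracing level must be one of {sorted(TRACING_LEVELS)}, got {level}")
    return level


print(resolve_tracing_level(environ={"SHORTFIN_AMDGPU_TRACING_LEVEL": "1"}))  # 1
```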
- property visible_devices
Get or set the list of visible device ids.
If not set or None, then all available devices will be opened and added to the system. See the property available_devices to access this list of ids.
If set, then each device with the given device id will be opened and added to the system in the order listed. In certain partitioned cases, multiple devices may be available with the same device id. Duplicate ids in the visible devices list then instantiate successive partitions of that device in enumeration order (so an id can be repeated as many times as there are physical partitions). This is an uncommon scenario and most users should not specify duplicate device ids. Since partitioned devices can be consumed in several ways, additional options for controlling this behavior will be available in the future.
This property can be set as an option keyword with the name “amdgpu_visible_devices” or the environment variable “SHORTFIN_AMDGPU_VISIBLE_DEVICES” (if env_prefix was not changed at construction). Multiples can be separated by a semicolon.
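The duplicate-id rule for partitioned devices can be modeled as assigning each repeated id the next partition ordinal in enumeration order. `select_devices` and `parse_visible_devices` are hypothetical helpers, and the device ids are placeholder strings chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple


def parse_visible_devices(raw: str) -> List[str]:
    """Split a semicolon-separated option or env value into device ids."""
    return [d.strip() for d in raw.split(";") if d.strip()]


def select_devices(
    available: List[str], visible: Optional[List[str]] = None
) -> List[Tuple[str, int]]:
    """Return (device_id, partition_ordinal) pairs to open, in order.

    `available` lists device ids in enumeration order; a partitioned device
    appears once per partition with the same id.
    """
    if visible is None:
        visible = list(available)  # Not set: open everything enumerated.
    next_ordinal: Dict[str, int] = {}
    selected = []
    for dev in visible:
        ordinal = next_ordinal.get(dev, 0)
        if ordinal >= available.count(dev):
            raise ValueError(f"device id {dev!r} listed more times than it is available")
        selected.append((dev, ordinal))  # Each repeat takes the next partition.
        next_ordinal[dev] = ordinal + 1
    return selected


available_ids = ["gpu-a", "gpu-b", "gpu-b"]  # Hypothetical: gpu-b has two partitions.
print(select_devices(available_ids, parse_visible_devices("gpu-b;gpu-b")))
# [('gpu-b', 0), ('gpu-b', 1)]
```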
- class _shortfin_default.lib.local.amdgpu.AMDGPUDevice
Host
Host device management
- class _shortfin_default.lib.local.host.CPUSystemBuilder(*args, **kwargs)
- class _shortfin_default.lib.local.host.HostCPUDevice