Reference

Seamless: a framework for reproducible computing and interactive workflows

Author: Sjoerd de Vries. Copyright 2016-2023, INSERM, CNRS and project contributors.

Seamless high-level API.

Has a two-fold function:

1. Maintain a workflow graph containing nodes (cells, transformers etc.), checksums, and connections. This workflow graph is pure data that can be serialized any time to JSON (.seamless file).

2. Maintain a translation of the workflow graph to a low-level representation that is constantly being evaluated. Interrogate the low-level representation (asking for its status, checksums, etc.).

seamless.highlevel.load_graph(graph, *, zip=None, cache_ctx=None, static=False, mounts=True, shares=True)[source]

Load a Context from graph.

“graph” can be a file name or a JSON dict. Normally, it has been generated with Context.save_graph / Context.get_graph.

“zip” can be a file name, zip-compressed bytes or a Python ZipFile object. Normally, it has been generated with Context.save_zip / Context.get_zip

“cache_ctx”: re-use a previous context for caching (e.g. checksum-to-buffer caching)

“static”: create a StaticContext instead

“mounts”: mount cells and pins to the file system, as specified in the graph.

“shares”: share cells over HTTP, as specified in the graph

Context class

class seamless.highlevel.Context(manager: Manager | None = None)[source]

Context class. Organizes cells and workers hierarchically.

Wrapper around a workflow graph, which can be serialized as JSON to a .seamless file. Changing the workflow topology (by adding/removing children or connections, or by changing celltypes) marks the context as “untranslated”. Untranslated graphs can be translated explicitly, or implicitly (with the .compute method). Upon translation, the Context wraps a low-level context object (seamless.core.context). This low-level context does all the work and holds all the data. Most of the methods and properties of the Seamless high-level classes (Cell, Transformer, etc.) are wrappers that interact with their low-level counterparts. Seamless low-level contexts accept value changes but not modifications in topology.

Typical usage:

```python
from seamless.highlevel import Context

ctx = Context()
ctx.a = 32  # equivalent to: ctx.a = Cell().set(32)

def func(a, b):
    return a + b

ctx.func = func  # equivalent to: ctx.func = Transformer(); ctx.func.set(func)
ctx.func.a = ctx.a
ctx.func.b = 16
ctx.result = ctx.func.result
ctx.compute()
assert ctx.result.value == 48
```

See http://sjdv1982.github.io/seamless/sphinx/html/context.html for documentation

add_zip(zip, incref: bool = False)[source]

Add entries from “zip” to the checksum-to-buffer cache.

“zip” can be a file name, zip-compressed bytes or a Python ZipFile object. Normally, it has been generated with Context.save_zip / Context.get_zip

Note that caching is temporary, and entries will be removed after some time if no element (cell, expression, or high-level library) holds their checksum. This can be overridden with “incref=True” (not recommended for long-living contexts).

property children

Return a wrapper for the direct children of the context. This includes subcontexts and libinstances

async computation(timeout: float | None = None, report: float = 2)[source]

Block until no more computation is required.

This means that all cells and transformers have either been computed, or they have an error, or they have unsatisfied upstream dependencies.

The graph is first (re-)translated, if necessary.

compute(timeout: float | None = None, report: float = 2)[source]

Block until no more computation is required.

This means that all cells and transformers have either been computed, or they have an error, or they have unsatisfied upstream dependencies.

The graph is first (re-)translated, if necessary.

This function can only be invoked if no event loop is running, i.e. under python or ipython, but not in a Jupyter kernel.
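The restriction above can be checked before choosing between the blocking compute() and the awaitable computation(). The following is a minimal sketch using only the standard library; the helper name `event_loop_is_running` is hypothetical and not part of Seamless, and the Seamless calls appear only in comments.

```python
import asyncio


def event_loop_is_running() -> bool:
    """Return True if called from inside a running asyncio event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # asyncio raises RuntimeError when no loop is running in this thread
        return False
    return True


async def _probe() -> bool:
    # Inside a coroutine, an event loop is running by definition
    return event_loop_is_running()


# Plain script: no loop is running, so the blocking ctx.compute() may be used.
outside = event_loop_is_running()

# Inside a coroutine (the situation in a Jupyter kernel):
# "await ctx.computation()" would be the call to use instead.
inside = asyncio.run(_probe())
```

In a Jupyter kernel, the check would return True at the top level of a notebook cell, which is the case where the awaitable variant is needed.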

property environment

Return the global execution environment of the context

classmethod from_graph(graph: str | dict, manager: Manager | None, *, mounts: bool = True, shares: bool = True, share_namespace: str | None = None, zip: str | bytes | ZipFile | None = None)[source]

Construct a Context from a graph

“graph” can be a file name or a JSON dict. Normally, it has been generated with Context.save_graph / Context.get_graph.

“zip” can be a file name, zip-compressed bytes or a Python ZipFile object. Normally, it has been generated with Context.save_zip / Context.get_zip

“manager”: re-use the manager of a previous context. The manager controls caching and execution.

“mounts”: mount cells and pins to the file system, as specified in the graph.

“shares”: share cells over HTTP, as specified in the graph

“share_namespace”: The namespace to use for HTTP sharing (“ctx” by default)

get_children(type: str | None = None) list[str][source]

Select all children that are directly ours. A sorted list of strings (attribute names) is returned.

It is possible to define a type of children, which can be one of:

cell, transformer, context, macro, module, foldercell, deepcell, or deepfoldercell

If type is None, all children and descendants are returned:

  • SubContexts are not returned, but their children and descendants are (with full path info).

  • For LibInstance, the children and descendants of the generated SynthContext are returned.

get_graph(runtime: bool = False) dict[str, Any][source]

Return the graph in JSON format.

“runtime”: The graph is returned after Library/LibInstance/Macro transformations of the graph.

Get all Link (bidirectional cell-cell) connections.

get_zip(with_libraries: bool = True) bytes[source]

Obtain the checksum-to-buffer cache for the current graph.

The cache is returned as zipped bytes.

async get_zip_async(with_libraries: bool = True) bytes[source]

Obtain the checksum-to-buffer cache for the current graph

The cache is returned as zipped bytes.

include(lib: Library, only_zip: bool = False, full_path: bool = False)[source]

Include a library in the graph.

A library (seamless.highlevel.Library) must be included before library instances (seamless.highlevel.LibInstance) can be constructed using ctx.lib

property lib

Returns the libraries that were included in the graph

Create a bidirectional link between the first and second cell.

Both cells must be authoritative (independent).

property live_share_namespace

The actual namespace for sharing cells by the HTTP server

Cells are shared under:

http://<shareserver URL>/<live_share_namespace>/<cell path>

The live share namespace is in principle equal to the share namespace, but if it is already taken, a number will be added to it (ctx1, ctx2, etc.)

Default: “ctx”
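The fallback naming described above can be sketched as follows. This is an illustration only: the function name `allocate_namespace` and the rule of trying 1, 2, 3, ... in order are assumptions inferred from the “ctx1, ctx2, etc.” example, not the actual Seamless implementation.

```python
def allocate_namespace(preferred: str, taken: set[str]) -> str:
    """Return the preferred namespace, or the first numbered variant that is free."""
    if preferred not in taken:
        return preferred
    n = 1
    # Assumption: numbers are tried in increasing order, starting at 1
    while f"{preferred}{n}" in taken:
        n += 1
    return f"{preferred}{n}"


first = allocate_namespace("ctx", set())
second = allocate_namespace("ctx", {"ctx"})
third = allocate_namespace("ctx", {"ctx", "ctx1"})
```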

load_vault(vault_directory: str, incref: bool = False)[source]

Load the contents of a vault directory in the checksum-to-buffer cache.

Normally, the vault has been generated with Context.save_vault.

Note that caching is temporary, and entries will be removed after some time if no element (cell, expression, or high-level library) holds their checksum. This can be overridden with “incref=True” (not recommended for long-living contexts).

observe(path, callback, polling_interval, observe_none=False, params=None)[source]

Observe attributes of the context, analogous to Cell.observe.

remove_connections(path, *, runtime=False, endpoint='both', match='sub')[source]

Remove all connections/links with source or target matching path.

“endpoint” can be “source”, “target”, “both”, “link” or “all”.

With endpoint “source”, only remove connections where the source matches path. Don’t remove links.

With endpoint “target”, only remove connections where the target matches path. Don’t remove links.

With endpoint “both”, only remove connections where source or target matches path. Don’t remove links.

With endpoint “link”, remove links where “first” or “second” matches path. Don’t remove connections.

“match” can be “super”, “exact”, “sub” or “all”.

If “super”, only paths P that are shorter than or equal in length to “path” are matched. P must be identical to the start of “path”.

If “exact”, only paths P that are equal to “path” are matched.

If “sub”, only paths P that are longer than or equal in length to “path” are matched. The start of P must be identical to “path”.

If “all”, all longer and shorter paths are matched.
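The four match modes amount to prefix tests between a candidate path P and “path”. A minimal sketch, assuming paths are represented as tuples of attribute names (an assumption; the function name `path_matches` is hypothetical and not part of Seamless):

```python
def path_matches(p: tuple, path: tuple, match: str = "sub") -> bool:
    """Decide whether candidate path p matches the query path under a match mode."""
    is_super = p == path[:len(p)]    # p is a prefix of path (p shorter or equal)
    is_sub = path == p[:len(path)]   # path is a prefix of p (p longer or equal)
    if match == "exact":
        return p == path
    if match == "super":
        return is_super
    if match == "sub":
        return is_sub
    if match == "all":
        return is_super or is_sub
    raise ValueError(f"unknown match mode: {match!r}")


query = ("a", "b")
sub_hit = path_matches(("a", "b", "c"), query, "sub")   # longer path under query
super_hit = path_matches(("a",), query, "super")        # shorter path above query
unrelated = path_matches(("a", "x"), query, "all")      # diverges from query
```

With the default match “sub”, removing connections for a path would therefore also affect everything nested below that path.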

resolve(checksum: Checksum, celltype=None)[source]

Returns the data buffer that corresponds to the checksum. If celltype is provided, a value is returned instead.

The checksum must be a SHA3-256 hash, as hex string or as bytes
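To illustrate the two accepted forms, the sketch below computes a SHA3-256 digest with the standard library and converts between raw bytes and hex string. The example buffer and the helper name `to_hex` are hypothetical; how Seamless serializes a value into a buffer is not shown here.

```python
import hashlib

buffer = b"42\n"  # arbitrary example buffer, not a statement about Seamless serialization
digest = hashlib.sha3_256(buffer)
checksum_bytes = digest.digest()    # 32 raw bytes
checksum_hex = digest.hexdigest()   # 64 lowercase hexadecimal characters


def to_hex(checksum) -> str:
    """Normalize a checksum given as bytes or as hex string to a lowercase hex string."""
    if isinstance(checksum, bytes):
        checksum = checksum.hex()
    if len(checksum) != 64:
        raise ValueError("a 256-bit checksum has 64 hex characters")
    bytes.fromhex(checksum)  # raises ValueError if the string is not valid hex
    return checksum.lower()
```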

save_graph(filename: str)[source]

Save the graph in JSON format.

save_vault(dirname: str, with_libraries: bool = True)[source]

Save the checksum-to-buffer cache for the current graph in a vault directory

save_zip(filename: str)[source]

Save the checksum-to-buffer cache for the current graph.

The cache is saved to “filename”, which should be a .zip file.

async save_zip_async(filename: str)[source]

Save the checksum-to-buffer cache for the current graph.

The cache is saved to “filename”, which should be a .zip file.

property self

Return a wrapper where the children are not directly accessible.

By default, a Cell called “compute” will cause “ctx.compute” to return the Cell. This is problematic if you want to access the method compute(). This can be done using ctx.self.compute()

NOTE: experimental, requires more testing
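The shadowing problem and the role of a “self” wrapper can be demonstrated with a toy class. This is a sketch of the general idea in plain Python; `MiniContext` and `_SelfWrapper` are hypothetical stand-ins and do not reflect how Seamless implements Context.

```python
class _SelfWrapper:
    """Gives access to the real attributes of a MiniContext, bypassing children."""

    def __init__(self, ctx):
        self._ctx = ctx

    def __getattr__(self, name):
        # Use the default lookup, which ignores the children dict
        return object.__getattribute__(self._ctx, name)


class MiniContext:
    """Toy stand-in (not the Seamless Context) where children shadow methods."""

    def __init__(self):
        object.__setattr__(self, "_children", {})

    def __setattr__(self, name, value):
        # Every assignment creates a child
        object.__getattribute__(self, "_children")[name] = value

    def __getattribute__(self, name):
        children = object.__getattribute__(self, "_children")
        if name in children:
            return children[name]  # a child wins over a method of the same name
        return object.__getattribute__(self, name)

    def compute(self):
        return "compute() method called"

    @property
    def self(self):
        return _SelfWrapper(self)


ctx = MiniContext()
ctx.compute = 123              # a child named "compute"
shadowed = ctx.compute         # returns the child, not the method
result = ctx.self.compute()    # the wrapper reaches the method
```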

set_graph(graph, *, mounts: bool = True, shares: bool = True)[source]

Set the graph of the Context

“graph” can be a file name or a JSON dict. Normally, it has been generated with Context.save_graph / Context.get_graph.

“mounts”: mount cells and pins to the file system, as specified in the graph.

“shares”: share cells over HTTP, as specified in the graph

property share_namespace

The preferred namespace for sharing cells by the HTTP server

Cells are shared under:

http://<shareserver URL>/<live_share_namespace>/<cell path>

The live share namespace is in principle equal to the share namespace, but if it is already taken, a number will be added to it (ctx1, ctx2, etc.)

Default: “ctx”

property status: dict | str

The computation status of the context. Returns a dictionary containing the status of all direct children that are not OK. If all children are OK, returns “Status: OK”.
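The return convention (a dict of problem children, or a single string when everything is fine) can be sketched as below. The function name `summarize_status` is hypothetical, and representing each child status as a string equal to "Status: OK" when fine is an assumption made for the example.

```python
def summarize_status(children: dict[str, str]):
    """Return the non-OK children as a dict, or "Status: OK" if there are none."""
    not_ok = {
        name: status
        for name, status in children.items()
        if status != "Status: OK"
    }
    return not_ok if not_ok else "Status: OK"


all_fine = summarize_status({"a": "Status: OK", "func": "Status: OK"})
one_bad = summarize_status({"a": "Status: OK", "func": "Status: error"})
```

Callers that consume such a value need to handle both types, for example with an isinstance(status, dict) check.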

translate(force: bool = False)[source]

(Re-)translate the graph. The graph is translated to a low-level, computable form (seamless.core). After translation, return immediately, although computation will start automatically.

If force=True, translation will happen even though no change in topology or celltype was detected.

This function can only be invoked if no event loop is running, i.e. under python or ipython, but not in a Jupyter kernel.

async translation(force: bool = False)[source]

(Re-)translate the graph. The graph is translated to a low-level, computable form (seamless.core). After translation, return immediately, although computation will start automatically.

If force=True, translation will happen even though no change in topology or celltype was detected.

Remove a bidirectional link between the first and second cell. (If it exists). Returns True if a link was removed

unobserve(path=())[source]

Analogous to Cell.unobserve

Cell class

class seamless.highlevel.Cell(celltype: str | None = None, *, parent=None, path=None)[source]

Cell class. Contains the checksum of a value.

See http://sjdv1982.github.io/seamless/sphinx/html/cell.html for documentation

Typical usage:

```python
# Explicit
ctx.a = Cell("int").set(42)

# Implicit
ctx.a = 42
ctx.a.celltype = "int"
```

add_validator(validator: Callable, name: str) None[source]

Adds a validator function (in Python) to the schema.

The validator must take a single argument, the (buffered) value of the cell. It is expected to raise an exception (e.g. an AssertionError) if the value is invalid.

If a previous validator with the same name exists, that validator is overwritten.
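A validator in the sense described above is an ordinary function that raises on invalid input. The sketch below shows one such function; the name `validate_positive` and the cell `ctx.a` in the comment are hypothetical, and how Seamless wraps the value before passing it in is not covered here.

```python
def validate_positive(value):
    """Raise AssertionError unless the value is greater than zero."""
    assert value > 0, "value must be positive"


# Hypothetical registration on a cell of an existing context "ctx":
# ctx.a.add_validator(validate_positive, name="positive")

# The function can be exercised on its own, outside of any workflow:
validate_positive(5)  # passes silently
try:
    validate_positive(-1)
    rejected = False
except AssertionError:
    rejected = True
```

Registering a second function under the same name would replace this one, as stated above.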

property buffered

For a structured cell, return the buffered value.

The buffered value is the value before schema validation

property celltype: str

The type of the cell.

The type of the cell is “structured” by default, unless it is a help cell, which is “text” by default.

Non-structured celltypes are:

  • “plain”: contains any JSON-serializable data

  • “binary”: contains binary data, wrapped in a Numpy object

  • “mixed”: an arbitrary mixture of “plain” and “binary” data

  • “code”: source code in any language

  • “text”, “cson”, “yaml”

  • “str”, “bytes”, “int”, “float”, “bool”

property checksum: Checksum

Contains the checksum of the cell, as SHA3-256 hash.

The checksum defines the value of the cell. If the cell is defined, the checksum is available, even if the value may not be.

connect_from(other: Cell | Transformer) None[source]

Connect from another cell or transformer to this cell.

property datatype

The datatype of a structured cell. This makes it possible to indicate that a structured cell conforms to another Seamless celltype and can be trivially converted to it. Use cases:

  • “plain” or “binary” cells with subcell access and a schema

  • “str” or “bytes” cell with a validator schema that parses the content

  • sharing a structured cell over HTTP using the Seamless web interface generator

property example: Silk

For a structured cell, return a dummy Silk handle.

The handle does not store any values, but has type inference, i.e. schema properties are inferred from what is assigned to it.

property exception

Returns the exception associated with the cell.

For non-structured cells, this exception was raised during parsing. For structured cells, it may also have been raised during validation

property fallback: Fallback

Get a Fallback object. With this, you can set and activate a fallback Cell, which will provide an alternative value once the fallback is activated.

async fingertip() None[source]

Puts the buffer of this cell’s checksum ‘at your fingertips’:

  • Verify that the buffer is locally or remotely available; if remotely, download it.

  • If not available, try to re-compute it using its provenance, i.e. re-evaluating any transformation or expression that produced it.

  • Such recomputation is done in “fingertip” mode, i.e. disallowing cache hits from expression-to-checksum or transformation-to-checksum caches.

property fingertip_no_recompute: bool

If True, recomputation is disabled for fingertipping.

This means recomputation via a transformer, which can be intensive. Recomputation via conversion or subcell expression (which are quick) is always enabled.

property fingertip_no_remote: bool

If True, remote calls are disabled for fingertipping.

Remote calls can be for a database or a buffer server.

Get all Link (bidirectional cell-cell) connections involving this cell.

property handle

Return a Silk handle to a structured cell.

This is a Silk wrapper around the authoritative (independent) part of a structured cell.

property hash_pattern

The hash pattern of the cell.

This is an advanced feature, not used in day-to-day programming. Possible values:

  • {"*": "#"}. The cell will behave as a deep cell.

  • {"*": "##"}. The cell will behave as a deep folder.

  • {"!": "#"}. The cell will behave as a deep list (a list of mixed checksums).

Note that all usual safety guards provided by DeepCell and DeepFolder are absent. You can invoke Cell.value, or do similar things that may consume all of your memory.

property independent: bool

True if the cell has no dependencies

property language

The programming language for code cells.

Default: Python

property mimetype

The mimetype of the cell.

Can be set directly according to the MIME specification, or as a file extension.

If not set, the default value depends on the celltype:

  • For structured cells, it is derived from the datatype attribute

  • For mixed cells, it is “seamless/mixed”

  • For code cells, it is derived from the language attribute

  • For plain cells and int/float/bool cells, it is “application/json”

  • For text cells and str cells, it is “text/plain”

  • For other cells, it is derived from their default file extension.
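The fixed defaults in the list above can be written as a small lookup. This sketch covers only the cases where the text names a concrete MIME type; the function name `default_mimetype` is hypothetical, and the derived cases (datatype, language, file extension) are deliberately not modelled.

```python
def default_mimetype(celltype: str):
    """Return the documented fixed default mimetype, or None for derived cases."""
    if celltype == "mixed":
        return "seamless/mixed"
    if celltype in ("plain", "int", "float", "bool"):
        return "application/json"
    if celltype in ("text", "str"):
        return "text/plain"
    # structured, code and other celltypes: derived from the datatype,
    # the language or the default file extension (not modelled here)
    return None


plain_default = default_mimetype("plain")
text_default = default_mimetype("text")
code_default = default_mimetype("code")
```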

mount(path: str, mode: str = 'rw', authority: str = 'file', *, persistent: bool = True)[source]

Mounts the cell to the file system. Mounting is only supported for non-structured cells.

To delete an existing mount, do del cell.mount

Parameters:
  • path – The file path on disk.

  • mode – “r” (read), “w” (write) or “rw”. If the mode contains “r”, the cell is updated when the file changes on disk. If the mode contains “w”, the file is updated when the cell value changes. The mode can only contain “r” if the cell is independent. Default: “rw”.

  • authority – In case of conflict between cell and file, which one takes precedence. Default: “file”.

  • persistent – If False, the file is deleted from disk when the Cell is destroyed. Default: True.

observe(attr: str | tuple[str, ...], callback: Callable, polling_interval: float, observe_none: bool = False)[source]

Adds an observer that monitors getattr(Cell, attr).

This value is polled every polling_interval seconds, and if changed, callback(value) is invoked.

If observe_none, None is considered as a separate value. (Default: False)

This method is not recommended for observing cell values; that is better done with traitlets.

Instead, it is recommended to use this to observe changes in status and exception.
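The polling behaviour described above can be sketched without any timing code. In this illustration, `PollingObserver` is a hypothetical class, `poll()` stands in for one tick of the polling interval, and the reading that None is skipped unless observe_none is set is an assumption about the semantics.

```python
_UNSET = object()  # sentinel: no value has been reported yet


class PollingObserver:
    """Invoke callback(value) whenever the polled value differs from the last one."""

    def __init__(self, getter, callback, observe_none=False):
        self._getter = getter
        self._callback = callback
        self._observe_none = observe_none
        self._last = _UNSET

    def poll(self):
        value = self._getter()
        if value is None and not self._observe_none:
            return  # None is ignored unless it counts as a separate value
        if self._last is _UNSET or value != self._last:
            self._last = value
            self._callback(value)


state = {"status": None}
seen = []
observer = PollingObserver(lambda: state["status"], seen.append)

observer.poll()                      # None: ignored
state["status"] = "Status: pending"
observer.poll()                      # changed: callback fires
observer.poll()                      # unchanged: no callback
state["status"] = "Status: OK"
observer.poll()                      # changed: callback fires
```

A real observer would call the equivalent of poll() once every polling_interval seconds.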

output(layout: dict | None = None) OutputWidget[source]

Returns an output widget that tracks the cell value.

The widget is a wrapper around an ipywidgets.Output and is to be used in Jupyter.

“layout” is a dict that is passed on directly to ipywidgets.Output

property scratch: bool

Whether the cell is a scratch cell.

Scratch cells are fully dependent cells that are big and/or easy to recompute. TODO: enforce that scratch cells must be fully dependent.

Scratch cell buffers are:

  • Not added to saved zip archives and vaults.

  • TODO: Annotated as “scratch” in databases.

  • TODO: Cleared automatically from databases a short while after computation.

property self

Returns a wrapper where the subcells are not directly accessible. Only relevant for structured cells.

By default, a structured cell with value {“status”: 123} will cause “cell.status” to return “123”, and not the actual cell status.

To be sure to get the cell status, you can invoke cell.self.status.

NOTE: experimental, requires more testing

set(value)[source]

Set the value of the cell

set_checksum(checksum: Checksum | str)[source]

Set the cell’s checksum from a SHA256 checksum

share(path=None, readonly=True, *, toplevel=False)[source]

Share a cell over HTTP.

Typically, the cell is available under http://localhost:5813/ctx/<path>.

If path is None (default), Cell.path is used, with dots replaced by slashes.

If toplevel is True, the cell is instead available under http://localhost:5813/<path>.

If readonly is True, only GET requests are supported. Otherwise, the cell can also be modified with PUT requests, using the Seamless JS client (js/seamless-client.js).

Cells with mimetype ‘application/json’ (the default for plain cells) also support subcell GET requests, e.g. http://.../ctx/a/x/0 for a cell ctx.a with value {'x': [1,2,3] }

To remove a share, do del cell.share
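The URL rules above can be summarized in a short sketch. The helper names `share_url` and `get_subcell` are hypothetical, the server address and namespace are the typical values quoted above, and treating Cell.path as a dotted string is an assumption.

```python
def share_url(cell_path: str, path=None, *, toplevel=False,
              namespace="ctx", server="http://localhost:5813") -> str:
    """Build the URL under which a shared cell would typically be available."""
    if path is None:
        # Default: the cell path, with dots replaced by slashes
        path = cell_path.replace(".", "/")
    if toplevel:
        return f"{server}/{path}"
    return f"{server}/{namespace}/{path}"


def get_subcell(value, subpath: str):
    """Follow a slash-separated subcell path through nested dicts and lists."""
    for key in subpath.split("/"):
        value = value[int(key)] if isinstance(value, list) else value[key]
    return value


nested_url = share_url("sub.a")
top_url = share_url("a", toplevel=True)
element = get_subcell({"x": [1, 2, 3]}, "x/0")  # mirrors GET http://.../ctx/a/x/0
```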

property status

Return the status of the cell.

The status may be undefined, error, upstream or OK. If it is error, Cell.exception will be non-empty.

traitlet() SeamlessTraitlet[source]

Creates a traitlet object with its value linked to the cell.

A traitlet is derived from traitlets.HasTraits, and can be linked to other traitlet objects, such as ipywidgets.

unobserve(attr: str | tuple[str, ...])[source]

Stop observing getattr(Cell, attr)

property value

Returns the value of the cell, if translated

If the cell is not independent, the value is None if an upstream dependency is undefined or has an error.

For structured cells, the value is also None if the schema is violated.

Transformer class

class seamless.highlevel.Transformer(code=None, *, pins=None, hash_pattern={'*': '#'})[source]

Transforms input values to a result value

See http://sjdv1982.github.io/seamless/sphinx/html/transformer.html for documentation

property INPUT: str

The name of the input attribute. Default is “inp”.

This is the attribute under which the input object is available (i.e. Transformer.inp by default). The input object is similar to a (structured) Cell.

NOTE: changing this attribute is currently not implemented

property RESULT: str

The name of the result variable. Default is “result”.

This is also the attribute under which the result object is available (i.e. Transformer.result by default). The result object is similar to a (structured) Cell.

NOTE: changing this attribute is currently not implemented

add_validator(validator, name)[source]

Adds a validator to the input, analogous to Cell.add_validator

cancel() None[source]

Hard-cancels the transformer.

This will send a cancel signal that will kill the transformation if it is running.

The transformation is killed with a HardCancelError exception. Clearing the exception using Transformer.clear_exception will restart the transformation.

This affects both local and remote execution.

clear_exception() None[source]

Clear any exception associated with this transformer.

This re-executes the associated transformation. Both local and remote (via assistant) execution are affected.

If this transformer has no transformation (missing or pending inputs), this will set a flag, causing clear_exception to take effect as soon as a transformation is present. Re-translation will clear this flag.

copy()[source]

If not bound to a context, return a copy of the Transformer.

If bound to a workflow, return a copy wrapper. This wrapper can be assigned to a new Context attribute, creating a copy of the current Transformer, where input parameters and connections to input pins are all copied.

property debug: bool

Whether debug mode is enabled.

property docker_image: str

Defines the Docker image in which a transformer should run. Getting this property is syntactic sugar for:

Transformer.environment.get_docker()["name"]

Setting this property is more-or-less syntactic sugar for:

Transformer.environment.set_docker({"name": …})

property environment: Environment | None

Computing environment to execute transformations in

property example

The example handle of the transformer input object.

See Cell.example for more details

property exception

Returns the exception associated with the transformer.

The exception may be raised during one of four stages:

  1. The construction of the input object (Transformer.inp). The input object is cell-like, see Cell.exception for more details.

  2. The construction of the individual input values that are inserted into the transformer namespace before execution.

  3. The execution of the transformer. For Python/IPython cells, this is the exception directly raised in code. For Bash/Docker cells, exceptions are raised upon non-zero exit codes. For compiled transformers, this stage is subdivided into generating the C header, compiling the code module, and executing the compiled code.

  4. The construction of the result object (Transformer.result). The result object is cell-like, see Cell.exception for more details.

property fingertip_no_recompute: bool

If True, recomputation is disabled for fingertipping.

This means recomputation via transformation, which can be intensive. Recomputation via conversion or subcell expression (which are quick) is always enabled.

property fingertip_no_remote: bool

If True, remote calls are disabled for fingertipping.

Remote calls can be for a database or a buffer server.

classmethod from_canonical_interface(tool, command=None)[source]

TODO: document

get_transformation_checksum() str | None[source]

Return the checksum of the transformation dict.

The transformation dict contains the checksums of all input pins, including the code, as well as the following special keys:

  • __output__: the name (usually “result”) and (sub)celltype of the output pin. If it has a hash pattern, this is appended as the fourth element.

  • __as__ (optional): a dictionary of pin-to-variable renames (pins.pinname.as_ attribute)

  • __format__ (optional): a dictionary that contains deepcell and filesystem attributes

The transformation checksum is the checksum of this dict.

Note that in addition, a transformation dict may contain extra information that is not reflected in this checksum:

  • __env__: the checksum of the environment description

  • __meta__: meta information (Transformer.meta).

  • __compilers__: context-wide compiler definitions.

  • __languages__: context-wide language definition.

Because of the double underscores, this extra information is called “dunder”.

ctx.resolve(checksum, “plain”) will return the transformation dict, minus the dunder information. The checksum is treated like any other buffer, i.e. including database, assistant etc.

With Transformer.get_transformation_dict(), you will obtain the full transformation dict, including the dunder.
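The consequence of excluding the dunder keys is that two transformation dicts differing only in, say, __meta__ share one checksum. The sketch below demonstrates that property. Everything specific in it is an assumption made for illustration: the JSON serialization, the placeholder pin values, the "ncores" meta-parameter and the function names are hypothetical and do not reproduce the Seamless buffer format.

```python
import hashlib
import json

# The keys listed above as not reflected in the transformation checksum
DUNDER_KEYS = ("__env__", "__meta__", "__compilers__", "__languages__")


def strip_dunder(tf_dict: dict) -> dict:
    """Return a copy of the dict without the dunder information."""
    return {k: v for k, v in tf_dict.items() if k not in DUNDER_KEYS}


def sketch_checksum(tf_dict: dict) -> str:
    """Hash the dunder-free dict (serialization format is an assumption)."""
    buffer = json.dumps(strip_dunder(tf_dict), sort_keys=True).encode()
    return hashlib.sha3_256(buffer).hexdigest()


base = {
    "__output__": ["result", "mixed", None],  # placeholder layout
    "a": "checksum-of-a",                     # placeholder pin entries
    "code": "checksum-of-code",
    "__meta__": {"ncores": 1},                # hypothetical meta-parameter
}
other_meta = dict(base, __meta__={"ncores": 4})
other_input = dict(base, a="checksum-of-another-a")

same = sketch_checksum(base) == sketch_checksum(other_meta)
different = sketch_checksum(base) != sketch_checksum(other_input)
```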

get_transformation_dict()[source]

Return the full transformation dict. The transformation dict contains the checksums of all input pins, including the code.

In addition, it may contain the following special keys:

  • __output__: the name (usually “result”) and (sub)celltype of the output pin. If it has a hash pattern, this is appended as the fourth element.

  • __env__: the checksum of the environment description

  • __as__: a dictionary of pin-to-variable renames (pins.pinname.as_ attribute)

  • __format__: a dictionary that contains deepcell and filesystem attributes

Finally, it may contain additional information that is not reflected in its checksum:

  • __meta__: meta information (Transformer.meta).

  • __compilers__: context-wide compiler definitions.

  • __languages__: context-wide language definition.

property header: str | None

For a compiled transformer, the generated C header

property language: str

Defines the programming language of the transformer’s source code.

Allowed values are: python, ipython, bash, or any compiled language.

See seamless.compiler.languages and seamless.compiler.compilers for a list.

Linker options for compiled modules. They are a list of strings, for example: ["-lm", "-lgfortran", "-lcudart"]

property local: bool | None

Local execution. If True, transformations are executed in the local Seamless instance. If False, they are delegated to an assistant. If None (default), an assistant is tried first and local execution is a fallback.
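The three settings can be read as an ordered list of execution targets to try. A minimal sketch of that reading; the function name `execution_order` and the string labels are hypothetical.

```python
def execution_order(local):
    """Map the tri-state 'local' setting to the execution targets, in order."""
    if local is True:
        return ["local"]
    if local is False:
        return ["assistant"]
    if local is None:
        # Default: try an assistant first, fall back to local execution
        return ["assistant", "local"]
    raise TypeError("local must be True, False or None")


default_order = execution_order(None)
forced_local = execution_order(True)
delegated = execution_order(False)
```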

property logs

Returns the stdout/stderr logs of the transformer, if any

property meta: dict[str, Any] | None

Dictionary of meta-parameters. These don’t affect the computation result, but may affect job managers. Examples of meta-parameters: expected computation time, service name.

You can set this dictionary directly, or you may assign .meta to a cell

observe(attr, callback, polling_interval, observe_none=False)[source]

Observes attributes of the result, analogous to Cell.observe

property schema

The schema of the transformer input object

See Cell.schema for more details

property scratch: bool

Whether this transformer’s result attribute is a scratch cell.

Scratch cells are fully dependent cells that are big and/or easy to recompute.

Scratch cell buffers are:

  • Not added to saved zip archives and vaults.

  • TODO: Annotated as “scratch” in databases.

  • TODO: Cleared automatically from databases a short while after computation.

property self

Returns a wrapper where the pins are not directly accessible.

By default, a pin called “status” will cause “transformer.status” to return the pin, and not the actual transformer status.

To be sure to get the transformer status, you can invoke transformer.self.status.

NOTE: experimental, requires more testing

property status

The status of the transformer, analogous to Cell.status.

See Transformer.exception about the different stages. The first stage with a non-OK status is reported.

undo() str | None[source]

Attempt to undo a finished transformer.

This may be useful in the case of non-reproducible transformers.

While the correct solution is to make them deterministic, this method will allow repeated execution under various conditions, in order to investigate the issue.

If the transformer has no associated transformation (e.g. undefined inputs) or the transformation result is not known, an exception is raised.

Otherwise, the database is contacted in order to contest the result. If the database returns an error message, that is returned as string.

unobserve(attr)[source]

Analogous to Cell.unobserve