Reference
Seamless: a framework for reproducible computing and interactive workflows
Author: Sjoerd de Vries. Copyright 2016-2023, INSERM, CNRS and project contributors.
Seamless high-level API.
It has two functions:
1. Maintain a workflow graph containing nodes (cells, transformers etc.), checksums, and connections. This workflow graph is pure data that can be serialized any time to JSON (.seamless file).
2. Maintain a translation of the workflow graph to a low-level representation that is constantly being evaluated. Interrogate the low-level representation (asking for its status, checksums, etc.).
- seamless.highlevel.load_graph(graph, *, zip=None, cache_ctx=None, static=False, mounts=True, shares=True)[source]
Load a Context from graph.
“graph” can be a file name or a JSON dict. Normally, it has been generated with Context.save_graph / Context.get_graph.
“zip” can be a file name, zip-compressed bytes or a Python ZipFile object. Normally, it has been generated with Context.save_zip / Context.get_zip
“cache_ctx”: re-use a previous context for caching (e.g. checksum-to-buffer caching)
“static”: create a StaticContext instead
“mounts”: mount cells and pins to the file system, as specified in the graph.
“shares”: share cells over HTTP, as specified in the graph
Context class
- class seamless.highlevel.Context(manager: Manager | None = None)[source]
Context class. Organizes cells and workers hierarchically.
Wrapper around a workflow graph, which can be serialized as JSON to a .seamless file. Changing the workflow topology (by adding/removing children or connections, or by changing celltypes) marks the context as “untranslated”. Untranslated graphs can be translated explicitly, or implicitly (with the .compute method). Upon translation, it wraps a low-level context object (seamless.core.context). This context does all the work and holds all the data. Most of the methods and properties of the Seamless high-level classes (Cell, Transformer, etc.) are wrappers that interact with their low-level counterparts. Seamless low-level contexts accept value changes but not modifications in topology.
Typical usage:
```python
from seamless.highlevel import Context

ctx = Context()
ctx.a = 32  # equivalent to: ctx.a = Cell().set(32)

def func(a, b):
    return a + b

ctx.func = func  # equivalent to: ctx.func = Transformer(); ctx.func.set(func)
ctx.func.a = ctx.a
ctx.func.b = 16
ctx.result = ctx.func.result
ctx.compute()
assert ctx.result.value == 48
```
See http://sjdv1982.github.io/seamless/sphinx/html/context.html for documentation
- add_zip(zip, incref: bool = False)[source]
Add entries from “zip” to the checksum-to-buffer cache.
“zip” can be a file name, zip-compressed bytes or a Python ZipFile object. Normally, it has been generated with Context.save_zip / Context.get_zip
Note that caching is temporary, and entries will be removed after some time if no element (cell, expression, or high-level library) holds their checksum. This can be overridden with “incref=True” (not recommended for long-living contexts).
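The “zip” argument accepts three forms. The standard zipfile module can reduce them to a single form. The sketch below uses a hypothetical helper, open_zip_input, which is not part of Seamless. It does not reproduce how Seamless names or stores the entries inside the archive.

```python
import io
import zipfile
from typing import Union

ZipInput = Union[str, bytes, zipfile.ZipFile]

def open_zip_input(zip_input: ZipInput) -> zipfile.ZipFile:
    """Hypothetical helper: normalize the accepted forms of "zip" to a ZipFile."""
    if isinstance(zip_input, zipfile.ZipFile):
        return zip_input  # already an open archive
    if isinstance(zip_input, bytes):
        return zipfile.ZipFile(io.BytesIO(zip_input))  # zip-compressed bytes
    if isinstance(zip_input, str):
        return zipfile.ZipFile(zip_input)  # file name on disk
    raise TypeError(f"unsupported zip input: {type(zip_input)!r}")

# Build a small archive in memory and pass it as bytes
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("example-entry", b"hello")
archive = open_zip_input(buf.getvalue())
print(archive.namelist())  # ['example-entry']
```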
- property children
Return a wrapper for the direct children of the context. This includes subcontexts and libinstances
- async computation(timeout: float | None = None, report: float = 2)[source]
Block until no more computation is required.
This means that all cells and transformers have either been computed, or they have an error, or they have unsatisfied upstream dependencies.
The graph is first (re-)translated, if necessary.
- compute(timeout: float | None = None, report: float = 2)[source]
Block until no more computation is required.
This means that all cells and transformers have either been computed, or they have an error, or they have unsatisfied upstream dependencies.
The graph is first (re-)translated, if necessary.
This function can only be invoked if no event loop is running, i.e. under python or ipython, but not in a Jupyter kernel.
- property environment
Return the global execution environment of the context
- classmethod from_graph(graph: str | dict, manager: Manager | None, *, mounts: bool = True, shares: bool = True, share_namespace: str | None = None, zip: str | bytes | ZipFile | None = None)[source]
Construct a Context from a graph
“graph” can be a file name or a JSON dict. Normally, it has been generated with Context.save_graph / Context.get_graph.
“zip” can be a file name, zip-compressed bytes or a Python ZipFile object. Normally, it has been generated with Context.save_zip / Context.get_zip
“manager”: re-use the manager of a previous context. The manager controls caching and execution.
“mounts”: mount cells and pins to the file system, as specified in the graph.
“shares”: share cells over HTTP, as specified in the graph
“share_namespace”: The namespace to use for HTTP sharing (“ctx” by default)
- get_children(type: str | None = None) list[str] [source]
Select all children that are directly ours. A sorted list of strings (attribute names) is returned.
It is possible to define a type of children, which can be one of: cell, transformer, context, macro, module, foldercell, deepcell, or deepfoldercell.
If type is None, all children and descendants are returned:
- SubContexts are not returned, but their children and descendants are (with full path info).
- For a LibInstance, the children and descendants of the generated SynthContext are returned.
- get_graph(runtime: bool = False) dict[str, Any] [source]
Return the graph in JSON format.
“runtime”: The graph is returned after Library/LibInstance/Macro transformations of the graph.
- get_zip(with_libraries: bool = True) bytes [source]
Obtain the checksum-to-buffer cache for the current graph.
The cache is returned as zipped bytes.
- async get_zip_async(with_libraries: bool = True) bytes [source]
Obtain the checksum-to-buffer cache for the current graph
The cache is returned as zipped bytes.
- include(lib: Library, only_zip: bool = False, full_path: bool = False)[source]
Include a library in the graph.
A library (seamless.highlevel.Library) must be included before library instances (seamless.highlevel.LibInstance) can be constructed using ctx.lib
- property lib
Returns the libraries that were included in the graph
- link(first, second)[source]
Create a bidirectional link between the first and second cell.
Both cells must be authoritative (independent).
- property live_share_namespace
The actual namespace for sharing cells by the HTTP server.
Cells are shared under:
http://<shareserver URL>/<live_share_namespace>/<cell path>
The live share namespace is in principle equal to the share namespace, but if it is already taken, a number will be added to it (ctx1, ctx2, etc.)
Default: “ctx”
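The fallback from the preferred namespace to a numbered one can be sketched as follows. The function name and the rule "start at 1 and count up to the first free name" are assumptions. They are based only on the ctx1, ctx2 example above, not on the Seamless source.

```python
from typing import Set

def pick_live_namespace(preferred: str, taken: Set[str]) -> str:
    """Hypothetical: return preferred, or preferred + 1, 2, ... if it is taken."""
    if preferred not in taken:
        return preferred
    n = 1
    while f"{preferred}{n}" in taken:
        n += 1
    return f"{preferred}{n}"

print(pick_live_namespace("ctx", set()))            # ctx
print(pick_live_namespace("ctx", {"ctx"}))          # ctx1
print(pick_live_namespace("ctx", {"ctx", "ctx1"}))  # ctx2
```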
- load_vault(vault_directory: str, incref: bool = False)[source]
Load the contents of a vault directory in the checksum-to-buffer cache.
Normally, the vault has been generated with Context.save_vault.
Note that caching is temporary, and entries will be removed after some time if no element (cell, expression, or high-level library) holds their checksum. This can be overridden with “incref=True” (not recommended for long-living contexts).
- observe(path, callback, polling_interval, observe_none=False, params=None)[source]
Observe attributes of the context, analogous to Cell.observe.
- remove_connections(path, *, runtime=False, endpoint='both', match='sub')[source]
Remove all connections/links with source or target matching path.
“endpoint” can be “source”, “target”, “connection”, “link” or “all”.
With endpoint “source”, only remove connections where the source matches path. Don’t remove links.
With endpoint “target”, only remove connections where the target matches path. Don’t remove links.
With endpoint “both”, only remove connections where source or target matches path. Don’t remove links.
With endpoint “link”, remove links where “first” or “second” matches path. Don’t remove connections.
“match” can be “super”, “exact”, “sub” or “all”.
If “super”, only paths P that are shorter than or equal to “path” are matched. The start of “path” must be identical to P.
If “exact”, only paths P that are equal to “path” are matched.
If “sub”, only paths P that are longer than or equal to “path” are matched. The start of P must be identical to “path”.
If “all”, all longer and shorter paths are matched.
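The four “match” modes can be illustrated with paths represented as tuples of attribute names, so that ctx.a.x becomes ("a", "x"). This is a hypothetical helper, not the Seamless implementation. The tuple representation is an assumption.

```python
from typing import Tuple

Path = Tuple[str, ...]

def path_matches(p: Path, path: Path, match: str = "sub") -> bool:
    """Hypothetical illustration of the "match" modes of remove_connections."""
    if match == "exact":
        return p == path
    if match == "sub":    # P is longer or equal, and starts with path
        return p[:len(path)] == path
    if match == "super":  # P is shorter or equal, and path starts with P
        return path[:len(p)] == p
    if match == "all":    # either direction
        return p[:len(path)] == path or path[:len(p)] == p
    raise ValueError(f"unknown match mode: {match}")

print(path_matches(("a", "x"), ("a",), "sub"))    # True
print(path_matches(("a",), ("a", "x"), "sub"))    # False
print(path_matches(("a",), ("a", "x"), "super"))  # True
```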
- resolve(checksum: Checksum, celltype=None)[source]
Returns the data buffer that corresponds to the checksum. If celltype is provided, a value is returned instead
The checksum must be a SHA3-256 hash, as hex string or as bytes
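A SHA3-256 checksum is 32 bytes long, or 64 hexadecimal characters. The hypothetical helper below accepts either form and normalizes it to a hex string. The example buffer is arbitrary, and Seamless's own serialization of a value into a buffer is not reproduced here.

```python
import hashlib
from typing import Union

def normalize_checksum(checksum: Union[str, bytes]) -> str:
    """Hypothetical helper: accept a SHA3-256 checksum as hex string or bytes."""
    if isinstance(checksum, bytes):
        if len(checksum) != 32:  # SHA3-256 digests are 32 bytes
            raise ValueError("expected 32 bytes")
        return checksum.hex()
    if len(checksum) != 64:
        raise ValueError("expected 64 hex characters")
    bytes.fromhex(checksum)  # raises ValueError if not valid hex
    return checksum.lower()

buffer = b"42\n"  # arbitrary example buffer
digest = hashlib.sha3_256(buffer)
print(normalize_checksum(digest.digest()) == normalize_checksum(digest.hexdigest()))  # True
```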
- save_vault(dirname: str, with_libraries: bool = True)[source]
Save the checksum-to-buffer cache for the current graph in a vault directory
- save_zip(filename: str)[source]
Save the checksum-to-buffer cache for the current graph.
The cache is saved to “filename”, which should be a .zip file.
- async save_zip_async(filename: str)[source]
Save the checksum-to-buffer cache for the current graph.
The cache is saved to “filename”, which should be a .zip file.
- property self
Return a wrapper where the children are not directly accessible.
By default, a Cell called “compute” will cause “ctx.compute” to return the Cell. This is problematic if you want to access the method compute(). This can be done using ctx.self.compute()
NOTE: experimental, requires more testing
- set_graph(graph, *, mounts: bool = True, shares: bool = True)[source]
Set the graph of the Context
“graph” can be a file name or a JSON dict. Normally, it has been generated with Context.save_graph / Context.get_graph.
“mounts”: mount cells and pins to the file system, as specified in the graph.
“shares”: share cells over HTTP, as specified in the graph
- property share_namespace
The preferred namespace for sharing cells by the HTTP server.
Cells are shared under:
http://<shareserver URL>/<live_share_namespace>/<cell path>
The live share namespace is in principle equal to the share namespace, but if it is already taken, a number will be added to it (ctx1, ctx2, etc.)
Default: “ctx”
- property status: dict | str
The computation status of the context. Returns a dictionary containing the status of all direct children that are not OK. If all children are OK, returns “Status: OK”.
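The aggregation rule above can be sketched in plain Python. The function name and the per-child status strings ("OK", "error", ...) are hypothetical stand-ins. Seamless's real status strings may be formatted differently.

```python
from typing import Dict, Union

def summarize_status(children: Dict[str, str]) -> Union[Dict[str, str], str]:
    """Hypothetical: keep only children whose status is not OK."""
    not_ok = {name: status for name, status in children.items() if status != "OK"}
    return not_ok if not_ok else "Status: OK"

print(summarize_status({"a": "OK", "b": "OK"}))        # Status: OK
print(summarize_status({"a": "OK", "func": "error"}))  # {'func': 'error'}
```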
- translate(force: bool = False)[source]
(Re-)translate the graph. The graph is translated to a low-level, computable form (seamless.core). After translation, return immediately, although computation will start automatically.
If force=True, translation will happen even though no change in topology or celltype was detected.
This function can only be invoked if no event loop is running, i.e. under python or ipython, but not in a Jupyter kernel.
- async translation(force: bool = False)[source]
(Re-)translate the graph. The graph is translated to a low-level, computable form (seamless.core). After translation, return immediately, although computation will start automatically.
If force=True, translation will happen even though no change in topology or celltype was detected.
Cell class
- class seamless.highlevel.Cell(celltype: str | None = None, *, parent=None, path=None)[source]
Cell class. Contains the checksum of a value.
See http://sjdv1982.github.io/seamless/sphinx/html/cell.html for documentation
Typical usage:
```python
# Explicit
ctx.a = Cell("int").set(42)

# Implicit
ctx.a = 42
ctx.a.celltype = "int"
```
- add_validator(validator: Callable, name: str) None [source]
Adds a validator function (in Python) to the schema.
The validator must take a single argument, the (buffered) value of the cell. It is expected to raise an exception (e.g. an AssertionError) if the value is invalid.
If a previous validator with the same name exists, that validator is overwritten.
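A validator is an ordinary Python function. It receives the buffered value and raises an exception when the value is invalid. The function below is a hypothetical example and can be tested without Seamless.

```python
def validate_positive(value) -> None:
    """Example validator: receives the buffered value, raises if invalid."""
    assert isinstance(value, (int, float)), "value must be a number"
    assert value > 0, "value must be positive"

validate_positive(32)  # passes silently
try:
    validate_positive(-1)
except AssertionError as exc:
    print("invalid:", exc)  # invalid: value must be positive
```

Following the signature documented above, the validator would then be registered on a cell with something like `ctx.a.add_validator(validate_positive, name="positive")`.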
- property buffered
For a structured cell, return the buffered value.
The buffered value is the value before schema validation
- property celltype: str
The type of the cell.
The type of the cell is “structured” by default, unless it is a help cell, which is “text” by default.
Non-structured celltypes are:
“plain”: contains any JSON-serializable data
“binary”: contains binary data, wrapped in a Numpy object
“mixed”: an arbitrary mixture of “plain” and “binary” data
“code”: source code in any language
“text”, “cson”, “yaml”
“str”, “bytes”, “int”, “float”, “bool”
- property checksum: Checksum
Contains the checksum of the cell, as SHA3-256 hash.
The checksum defines the value of the cell. If the cell is defined, the checksum is available, even if the value may not be.
- connect_from(other: Cell | Transformer) None [source]
Connect from another cell or transformer to this cell.
- property datatype
The datatype of a structured cell. This makes it possible to indicate that a structured cell conforms to another Seamless celltype and can be trivially converted to it. Use cases:
- “plain” or “binary” cells with subcell access and a schema
- “str” or “bytes” cells with a validator schema that parses the content
- sharing a structured cell over HTTP using the Seamless web interface generator
- property example: Silk
For a structured cell, return a dummy Silk handle.
The handle does not store any values, but has type inference, i.e. schema properties are inferred from what is assigned to it.
Examples
- See basic-example.ipynb in https://github.com/sjdv1982/seamless/tree/master/examples
- property exception
Returns the exception associated with the cell.
For non-structured cells, this exception was raised during parsing. For structured cells, it may also have been raised during validation
- property fallback: Fallback
Get a Fallback object. With this, you can set and activate a fallback Cell, which will provide an alternative value once the fallback is activated.
- async fingertip() None [source]
Puts the buffer of this cell’s checksum ‘at your fingertips’:
- Verify that the buffer is locally or remotely available; if remotely, download it.
- If not available, try to re-compute it using its provenance, i.e. re-evaluating any transformation or expression that produced it.
- Such recomputation is done in “fingertip” mode, i.e. disallowing cache hits from expression-to-checksum or transformation-to-checksum caches.
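The lookup order above is local first, then remote, then recomputation. It can be sketched with plain callables standing in for the remote buffer server and the provenance-based recomputation. All names here are hypothetical. The sketch also simplifies fingertip_no_recompute. In Seamless that flag only disables recomputation via transformers, while conversions and subcell expressions remain enabled.

```python
from typing import Callable, Dict, Optional

def fingertip_buffer(checksum: str,
                     local_cache: Dict[str, bytes],
                     fetch_remote: Callable[[str], Optional[bytes]],
                     recompute: Callable[[str], Optional[bytes]],
                     no_remote: bool = False,
                     no_recompute: bool = False) -> bytes:
    """Hypothetical sketch of the fingertip order: local, remote, recompute."""
    if checksum in local_cache:
        return local_cache[checksum]
    if not no_remote:
        buffer = fetch_remote(checksum)
        if buffer is not None:
            local_cache[checksum] = buffer  # "if remotely, download it"
            return buffer
    if not no_recompute:
        buffer = recompute(checksum)  # re-evaluate provenance, bypassing result caches
        if buffer is not None:
            local_cache[checksum] = buffer
            return buffer
    raise RuntimeError(f"cannot fingertip {checksum}")

cache: Dict[str, bytes] = {}
buf = fingertip_buffer("abc", cache, fetch_remote=lambda c: None, recompute=lambda c: b"42\n")
print(buf)  # b'42\n'
```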
- property fingertip_no_recompute: bool
If True, recomputation is disabled for fingertipping.
This means recomputation via a transformer, which can be intensive. Recomputation via conversion or subcell expression (which are quick) is always enabled.
- property fingertip_no_remote: bool
If True, remote calls are disabled for fingertipping.
Remote calls can be for a database or a buffer server.
- property handle
Return a Silk handle to a structured cell.
This is a Silk wrapper around the authoritative (independent) part of a structured cell.
- property hash_pattern
The hash pattern of the cell.
This is an advanced feature, not used in day-to-day programming. Possible values:
- {“*”: “#”}: the cell will behave as a deep cell.
- {“*”: “##”}: the cell will behave as a deep folder.
- {“!”: “#”}: the cell will behave as a deep list (a list of mixed checksums).
Note that all the usual safety guards provided by DeepCell and DeepFolder are absent. You can invoke Cell.value, or do similar things that may consume all of your memory.
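With hash pattern {“*”: “#”}, a deep cell holds checksums rather than values for its top-level entries. The sketch below illustrates that idea. It assumes, for illustration only, that each item is serialized as JSON with sorted keys before hashing. Seamless's actual buffer format may differ, so these checksums will not match real Seamless checksums. The function name is hypothetical.

```python
import hashlib
import json

def to_deep(value: dict) -> dict:
    """Hypothetical illustration of hash pattern {"*": "#"}:
    every top-level item is replaced by the SHA3-256 checksum of its buffer."""
    deep = {}
    for key, item in value.items():
        # Assumption: JSON with sorted keys stands in for Seamless's serialization
        buffer = json.dumps(item, sort_keys=True).encode()
        deep[key] = hashlib.sha3_256(buffer).hexdigest()
    return deep

deep = to_deep({"x": [1, 2, 3], "y": "hello"})
print(sorted(deep))  # ['x', 'y']
```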
- property independent: bool
True if the cell has no dependencies
- property language
The programming language for code cells.
Default: Python
- property mimetype
The mimetype of the cell.
Can be set directly according to the MIME specification, or as a file extension.
If not set, the default value depends on the celltype:
For structured cells, it is derived from the datatype attribute
For mixed cells, it is “seamless/mixed”
For code cells, it is derived from the language attribute
For plain cells and int/float/bool cells, it is “application/json”
For text cells and str cells, it is “text/plain”
For other cells, it is derived from their default file extension.
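The fallback rules above can be written as a small lookup. The sketch below is hypothetical and omits structured and code cells, which defer to the datatype and language attributes. For “other cells” it uses the standard mimetypes module on an extension that the caller passes in. The actual default extension per celltype is not reproduced here.

```python
import mimetypes
from typing import Optional

def default_mimetype(celltype: str, extension: Optional[str] = None) -> Optional[str]:
    """Hypothetical sketch of the default-mimetype rules listed above."""
    if celltype == "mixed":
        return "seamless/mixed"
    if celltype in ("plain", "int", "float", "bool"):
        return "application/json"
    if celltype in ("text", "str"):
        return "text/plain"
    if extension is not None:
        # "Other cells": derive the mimetype from a file extension
        return mimetypes.guess_type("cell." + extension)[0]
    return None

print(default_mimetype("plain"))  # application/json
print(default_mimetype("mixed"))  # seamless/mixed
```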
- mount(path: str, mode: str = 'rw', authority: str = 'file', *, persistent: bool = True)[source]
Mounts the cell to the file system. Mounting is only supported for non-structured cells.
To delete an existing mount, do del cell.mount
Parameters:
- path – The file path on disk.
- mode – “r” (read), “w” (write) or “rw”. If the mode contains “r”, the cell is updated when the file changes on disk. If the mode contains “w”, the file is updated when the cell value changes. The mode can only contain “r” if the cell is independent. Default: “rw”.
- authority – In case of conflict between cell and file, which takes precedence. Default: “file”.
- persistent – If False, the file is deleted from disk when the Cell is destroyed. Default: True.
- observe(attr: str | tuple[str, ...], callback: Callable, polling_interval: float, observe_none: bool = False)[source]
Adds an observer that monitors getattr(Cell, attr). This value is polled every polling_interval seconds, and if it has changed, callback(value) is invoked. If observe_none is True, None is considered as a separate value (default: False).
This method is not recommended for observing cell values; this is better done with traitlets.
Instead, it is recommended to use this to observe changes in status and exception.
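The polling behaviour can be sketched as a blocking loop over a fixed number of polls. Several parts of this sketch are assumptions: the function and the FakeCell stand-in are hypothetical, and it reads “observe_none=False” as “ignore None values”. Seamless's real observer runs asynchronously.

```python
import time
from typing import Any, Callable

_UNSET = object()

def poll_attribute(obj: Any, attr: str, callback: Callable[[Any], None],
                   polling_interval: float, n_polls: int,
                   observe_none: bool = False) -> None:
    """Hypothetical blocking sketch: call callback(value) whenever getattr changes."""
    last = _UNSET
    for i in range(n_polls):
        value = getattr(obj, attr)
        if value is None and not observe_none:
            pass  # assumption: None is skipped unless observe_none is True
        elif last is _UNSET or value != last:
            callback(value)
            last = value
        if i < n_polls - 1:
            time.sleep(polling_interval)

class FakeCell:
    """Stand-in whose status changes on every access."""
    def __init__(self, statuses):
        self._statuses = iter(statuses)

    @property
    def status(self):
        return next(self._statuses)

seen = []
poll_attribute(FakeCell(["pending", None, "pending", "OK"]), "status", seen.append, 0.0, 4)
print(seen)  # ['pending', 'OK']
```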
- output(layout: dict | None = None) OutputWidget [source]
Returns an output widget that tracks the cell value.
The widget is a wrapper around an ipywidgets.Output and is to be used in Jupyter. “layout” is a dict that is passed on directly to ipywidgets.Output.
Examples
- See basic-example.ipynb in https://github.com/sjdv1982/seamless/tree/master/examples
- See traitlets.ipynb in https://github.com/sjdv1982/seamless/tree/master/tests/highlevel
- property scratch: bool
Is the cell a scratch cell.
Scratch cells are fully dependent cells that are big and/or easy to recompute. TODO: enforce that scratch cells must be fully dependent.
Scratch cell buffers are:
- Not added to saved zip archives and vaults.
- TODO: Annotated as “scratch” in databases.
- TODO: Cleared automatically from databases a short while after computation.
- property self
Returns a wrapper where the subcells are not directly accessible. Only relevant for structured cells.
By default, a structured cell with value {“status”: 123} will cause “cell.status” to return “123”, and not the actual cell status.
To be sure to get the cell status, you can invoke cell.self.status.
NOTE: experimental, requires more testing
Share a cell over HTTP.
Typically, the cell is available under http://localhost:5813/ctx/<path>.
If path is None (default), Cell.path is used, with dots replaced by slashes.
If toplevel is True, the cell is instead available under http://localhost:5813/<path>.
If readonly is True, only GET requests are supported. Otherwise, the cell can also be modified with PUT requests via the Seamless JS client (js/seamless-client.js).
Cells with mimetype ‘application/json’ (the default for plain cells) also support subcell GET requests, e.g. http://.../ctx/a/x/0 for a cell ctx.a with value {'x': [1,2,3]}.
To remove a share, do del cell.share
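The URL rules and the subcell lookup described above can be mirrored in two small hypothetical helpers. Neither is part of Seamless. The base URL and the “ctx” namespace are the defaults quoted above. The helpers do not perform any HTTP requests.

```python
from typing import Any, List, Optional

def share_url(cell_path: str, path: Optional[str] = None, toplevel: bool = False,
              base: str = "http://localhost:5813", namespace: str = "ctx") -> str:
    """Hypothetical: build the URL under which a shared cell is served."""
    if path is None:
        path = cell_path.replace(".", "/")  # dots replaced by slashes
    if toplevel:
        return f"{base}/{path}"
    return f"{base}/{namespace}/{path}"

def get_subcell(value: Any, segments: List[str]) -> Any:
    """Hypothetical: resolve URL path segments the way a subcell GET would."""
    for seg in segments:
        value = value[int(seg)] if isinstance(value, list) else value[seg]
    return value

print(share_url("a"))                              # http://localhost:5813/ctx/a
print(get_subcell({"x": [1, 2, 3]}, ["x", "0"]))   # 1
```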
- property status
Return the status of the cell.
The status may be undefined, error, upstream or OK. If it is error, Cell.exception will be non-empty.
- traitlet() SeamlessTraitlet [source]
Creates a traitlet object with its value linked to the cell.
A traitlet is derived from traitlets.HasTraits, and can be linked to other traitlet objects, such as ipywidgets.
Examples
- See basic-example.ipynb and datatables.ipynb in https://github.com/sjdv1982/seamless/tree/master/examples
- See traitlets.ipynb, traitlet.py and traitlet2.py in https://github.com/sjdv1982/seamless/tree/master/tests/highlevel
- property value
Returns the value of the cell, if translated.
If the cell is not independent, the value is None if an upstream dependency is undefined or has an error.
For structured cells, the value is also None if the schema is violated.
Transformer class
- class seamless.highlevel.Transformer(code=None, *, pins=None, hash_pattern={'*': '#'})[source]
Transforms input values to a result value
See http://sjdv1982.github.io/seamless/sphinx/html/transformer.html for documentation
- property INPUT: str
The name of the input attribute. Default is “inp”.
This is the attribute under which the input object is available (i.e. Transformer.inp by default). The input object is similar to a (structured) Cell.
NOTE: changing this attribute is currently not implemented
- property RESULT: str
The name of the result variable. Default is “result”.
This is also the attribute under which the result object is available (i.e. Transformer.result by default). The result object is similar to a (structured) Cell.
NOTE: changing this attribute is currently not implemented
- add_validator(validator, name)[source]
Adds a validator to the input, analogous to Cell.add_validator
- cancel() None [source]
Hard-cancels the transformer.
This will send a cancel signal that will kill the transformation if it is running.
The transformation is killed with a HardCancelError exception. Clearing the exception using Transformer.clear_exception will restart the transformation.
This affects both local and remote execution.
- clear_exception() None [source]
Clear any exception associated with this transformer.
This re-executes the associated transformation. Both local and remote (via assistant) execution are affected.
If this transformer has no transformation (missing or pending inputs), this will set a flag, causing clear_exception to take effect as soon as a transformation is present. Re-translation will clear this flag.
- copy()[source]
If not bound to a context, return a copy of the Transformer.
If bound to a workflow, return a copy wrapper. This wrapper can be assigned to a new Context attribute, creating a copy of the current Transformer, where input parameters and connections to input pins are all copied.
- property debug: bool
If debug mode is enabled.
- property docker_image: str
Defines the Docker image in which a transformer should run. Getting this property is syntactic sugar for:
Transformer.environment.get_docker()[“name”]
Setting this property is more-or-less syntactic sugar for:
Transformer.environment.set_docker({“name”: …})
- property environment: Environment | None
Computing environment to execute transformations in
- property example
The example handle of the transformer input object.
See Cell.example for more details
- property exception
Returns the exception associated with the transformer.
The exception may be raised during one of four stages:
The construction of the input object (Transformer.inp). The input object is cell-like, see Cell.exception for more details.
The construction of the individual input values that are inserted into the transformer namespace before execution.
The execution of the transformer. For Python/IPython cells, this is the exception directly raised in code. For Bash/Docker cells, exceptions are raised upon non-zero exit codes. For compiled transformers, this stage is subdivided into generating the C header, compiling the code module, and executing the compiled code.
The construction of the result object (Transformer.result). The result object is cell-like, see Cell.exception for more details.
- property fingertip_no_recompute: bool
If True, recomputation is disabled for fingertipping.
This means recomputation via transformation, which can be intensive. Recomputation via conversion or subcell expression (which are quick) is always enabled.
- property fingertip_no_remote: bool
If True, remote calls are disabled for fingertipping.
Remote calls can be for a database or a buffer server.
- get_transformation_checksum() str | None [source]
Return the checksum of the transformation dict.
The transformation dict contains the checksums of all input pins, including the code, as well as the following special keys:
__output__: the name (usually “result”) and (sub)celltype of the output pin. If it has a hash pattern, this is appended as the fourth element.
__as__ (optional): a dictionary of pin-to-variable renames (pins.pinname.as_ attribute)
__format__ (optional): a dictionary that contains deepcell and filesystem attributes
The transformation checksum is the checksum of this dict.
Note that in addition, a transformation dict may contain extra information that is not reflected in this checksum:
__env__: the checksum of the environment description
__meta__: meta information (Transformer.meta).
__compilers__: context-wide compiler definitions.
__languages__: context-wide language definition.
Because of the double underscores, this extra information is called “dunder”.
ctx.resolve(checksum, “plain”) will return the transformation dict, minus the dunder information. The checksum is treated like any other buffer, i.e. including database, assistant etc.
With Transformer.get_transformation_dict(), you will obtain the full transformation dict, including the dunder.
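The split between keys that count toward the transformation checksum and “dunder” keys that do not can be illustrated as follows. The sketch is hypothetical in two ways. First, JSON with sorted keys stands in for Seamless's own buffer serialization, so the resulting hex string will not equal a real Seamless transformation checksum. Second, the pin values are placeholder strings rather than real checksums.

```python
import hashlib
import json

# Extra information that, per the text above, is not reflected in the checksum
DUNDER_NOT_IN_CHECKSUM = ("__env__", "__meta__", "__compilers__", "__languages__")

def transformation_checksum(transformation: dict) -> str:
    """Hypothetical sketch: drop the extra dunder keys, then hash the rest."""
    core = {k: v for k, v in transformation.items() if k not in DUNDER_NOT_IN_CHECKSUM}
    buffer = json.dumps(core, sort_keys=True).encode()  # assumed serialization
    return hashlib.sha3_256(buffer).hexdigest()

tf = {
    "a": "<checksum of input a>",   # placeholder, not a real checksum
    "code": "<checksum of code>",   # placeholder, not a real checksum
    "__output__": ["result", "int", None],
}
with_meta = dict(tf, __meta__={"duration": "long"})
print(transformation_checksum(tf) == transformation_checksum(with_meta))  # True
```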
- get_transformation_dict()[source]
Return the full transformation dict. The transformation dict contains the checksums of all input pins, including the code.
In addition, it may contain the following special keys:
__output__: the name (usually “result”) and (sub)celltype of the output pin. If it has a hash pattern, this is appended as the fourth element.
__env__: the checksum of the environment description
__as__: a dictionary of pin-to-variable renames (pins.pinname.as_ attribute)
__format__: a dictionary that contains deepcell and filesystem attributes
Finally, it may contain additional information that is not reflected in its checksum:
__meta__: meta information (Transformer.meta).
__compilers__: context-wide compiler definitions.
__languages__: context-wide language definition.
- property header: str | None
For a compiled transformer, the generated C header
- property language: str
Defines the programming language of the transformer’s source code.
Allowed values are: python, ipython, bash, or any compiled language.
See seamless.compiler.languages and seamless.compiler.compilers for a list.
- property link_options: list[str]
Linker options for compiled modules. They are a list of strings, for example: [“-lm”, “-lgfortran”, “-lcudart”]
- property local: bool | None
Local execution. If True, transformations are executed in the local Seamless instance. If False, they are delegated to an assistant. If None (default), an assistant is tried first and local execution is a fallback.
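The three values of Transformer.local correspond to a simple dispatch rule. In the sketch below, the function and its two callables are hypothetical stand-ins for the assistant and the local Seamless instance. Which failures trigger the fallback when local is None is an assumption: here, any exception raised by the assistant call.

```python
from typing import Callable, Optional

def run_transformation(local: Optional[bool],
                       run_on_assistant: Callable[[], object],
                       run_locally: Callable[[], object]) -> object:
    """Hypothetical dispatch mirroring the three values of Transformer.local."""
    if local is True:
        return run_locally()
    if local is False:
        return run_on_assistant()
    # local is None: try the assistant first, fall back to local execution
    try:
        return run_on_assistant()
    except Exception:
        return run_locally()

def no_assistant():
    raise ConnectionError("no assistant configured")

print(run_transformation(None, no_assistant, lambda: "local result"))  # local result
```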
- property logs
Returns the stdout/stderr logs of the transformer, if any
- property meta: dict[str, Any] | None
Dictionary of meta-parameters. These don’t affect the computation result, but may affect job managers. Examples of meta-parameters: expected computation time, service name.
You can set this dictionary directly, or you may assign .meta to a cell
- observe(attr, callback, polling_interval, observe_none=False)[source]
Observes attributes of the result, analogous to Cell.observe
- property schema
The schema of the transformer input object
See Cell.schema for more details
- property scratch: bool
Is this transformer’s result attribute a scratch cell.
Scratch cells are fully dependent cells that are big and/or easy to recompute.
Scratch cell buffers are:
- Not added to saved zip archives and vaults.
- TODO: Annotated as “scratch” in databases.
- TODO: Cleared automatically from databases a short while after computation.
- property self
Returns a wrapper where the pins are not directly accessible.
By default, a pin called “status” will cause “transformer.status” to return the pin, and not the actual transformer status.
To be sure to get the transformer status, you can invoke transformer.self.status.
NOTE: experimental, requires more testing
- property status
The status of the transformer, analogous to Cell.status.
See Transformer.exception about the different stages. The first stage with a non-OK status is reported.
- undo() str | None [source]
Attempt to undo a finished transformer.
This may be useful in the case of non-reproducible transformers.
While the correct solution is to make them deterministic, this method will allow repeated execution under various conditions, in order to investigate the issue.
If the transformer has no associated transformation (e.g. undefined inputs) or the transformation result is not known, an exception is raised.
Otherwise, the database is contacted in order to contest the result. If the database returns an error message, that is returned as string.