Remote execution

What delegation means

Setting execution: remote tells Seamless to delegate transformation execution to a remote backend — either a jobserver or a daskserver — instead of running in the local Python process. The step code (your Python functions, bash commands) does not change. Only the backend configuration changes.

From the user's perspective, tf.run() and seamless-run work the same way regardless of whether execution is local or remote.

The --local flag on seamless-run (or .local = True on a transformer) forces in-process execution instead of delegating to a remote server:

seamless-run --local mycommand input.txt   # run in-process even if execution: remote
@delayed
def compute(x): ...

compute.local = True    # always run in-process, skip jobserver/daskserver
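The precedence rule can be sketched as a tiny resolver (a hypothetical helper for illustration, not part of the Seamless API): the local flag always wins over the configured execution mode.

```python
def runs_remotely(execution, local_flag=False):
    """Illustrative sketch of the precedence rule:
    --local (or tf.local = True) always forces in-process
    execution, regardless of the execution: setting."""
    if local_flag:
        return False
    return execution == "remote"

assert runs_remotely("remote") is True
assert runs_remotely("remote", local_flag=True) is False
```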

jobserver

The jobserver is a lightweight HTTP service that accepts transformation jobs and dispatches them to a local worker pool. It is the simpler of the two remote backends: single-node, low-overhead, and straightforward to set up. It is best suited for development and testing of remote execution, or for simple single-machine deployments where you want worker-process isolation without Dask infrastructure.

To add a jobserver to a cluster definition in ~/.seamless/clusters.yaml:

local:
  ...
  frontends:
    - hostname: localhost
      hashserver:
        ...
      database:
        ...
      jobserver:
        network_interface: 127.0.0.1
        port_start: 55300
        port_end: 55399

With a jobserver configured, set execution: remote (or rely on the default) in seamless.yaml:

- project: my-project
- execution: remote

seamless.config.init() will start the jobserver automatically if it is not already running.
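The port_start/port_end pair defines a range from which the service picks a free port. A minimal standard-library sketch of such a scan (the function is illustrative, not Seamless code):

```python
import socket

def claim_free_port(interface, port_start, port_end):
    """Return the first free TCP port in [port_start, port_end].

    Illustrative only: sketches how a service configured with
    network_interface/port_start/port_end might pick its port.
    """
    for port in range(port_start, port_end + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((interface, port))
        except OSError:
            continue  # port in use, try the next one
        else:
            sock.close()
            return port
    raise RuntimeError(f"no free port in {port_start}-{port_end}")

port = claim_free_port("127.0.0.1", 55300, 55399)
```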


daskserver

The daskserver is the general-purpose remote backend. It uses Dask as its execution and scheduling substrate, making it suitable for multi-node HPC clusters, adaptive scaling, and high-throughput workloads. It also works with dask.distributed.LocalCluster, which makes it the preferred backend for anything beyond single-machine parallelism with spawn. A LocalCluster can be launched either on the client itself or on a remote frontend reached over SSH, depending on whether the cluster frontend has a hostname.

Where jobserver is "one machine with a worker pool", daskserver is "a managed Dask cluster that can scale dynamically". The Seamless worker plugin runs inside each Dask worker and maintains its own local worker process pool, so multiple levels of parallelism are available.
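The two levels of parallelism can be pictured with standard-library executors: an outer pool standing in for Dask workers, each managing its own inner pool. Threads stand in for processes here; this is purely illustrative, not how the worker plugin is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

def worker_node(tasks):
    # Each "Dask worker" maintains its own local pool and fans the
    # transformations it received out to that pool.
    with ThreadPoolExecutor(max_workers=2) as inner_pool:
        return list(inner_pool.map(lambda x: x * x, tasks))

# The "scheduler" splits the workload across two worker nodes.
with ThreadPoolExecutor(max_workers=2) as scheduler:
    chunks = [[1, 2, 3], [4, 5, 6]]
    results = list(scheduler.map(worker_node, chunks))

# results == [[1, 4, 9], [16, 25, 36]]
```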

Note: although Dask is a Python framework, bash commands submitted with seamless-run are handled by the daskserver just as well.

To add a daskserver to a cluster definition:

local:
  ...
  frontends:
    - hostname: localhost
      hashserver:
        ...
      database:
        ...
      daskserver:
        network_interface: 0.0.0.0
        port_start: 60300
        port_end: 60399

And in your project config:

- cluster: mycluster
- execution: remote
- remote: daskserver

When both a jobserver and a daskserver are present on the same cluster, you must explicitly select one with remote: jobserver or remote: daskserver.
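The selection rule can be expressed as a small resolver (hypothetical, for illustration only):

```python
def select_backend(cluster_backends, remote=None):
    """Pick the remote backend for execution: remote.

    Illustrative sketch of the selection rule, not Seamless internals.
    `cluster_backends` is the set of backends defined on the cluster.
    """
    if remote is not None:
        if remote not in cluster_backends:
            raise ValueError(f"cluster has no {remote}")
        return remote
    if len(cluster_backends) == 1:
        # Only one backend defined: no explicit choice needed.
        return next(iter(cluster_backends))
    raise ValueError(
        "both jobserver and daskserver present: "
        "set remote: jobserver or remote: daskserver"
    )

assert select_backend({"jobserver"}) == "jobserver"
assert select_backend({"jobserver", "daskserver"}, remote="daskserver") == "daskserver"
```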


Multi-stage workflows with set_stage()

Many workflows need to switch storage or execution context during a run — for example, running pre-processing locally on a laptop and GPU inference remotely on a cluster. Seamless supports this with named stages.

In seamless.yaml:

- project: my-project
- execution: process

- stage gpu:
    - cluster: gpu-cluster
    - execution: remote
    - remote: daskserver

In Python:

import seamless.config

seamless.config.init()         # default stage: local, execution: process

# ... pre-processing transformations run locally ...

seamless.config.set_stage("gpu")   # switch to gpu-cluster daskserver

# ... GPU transformations run remotely on the cluster ...

seamless.config.set_stage(None)    # return to default stage

set_stage() reconfigures the active backends (hashserver, database, jobserver/daskserver) without restarting services that are already running. Stage blocks in the YAML are activated only when the current stage name matches.

From the CLI, seamless-run --stage gpu mycommand input.txt selects the gpu stage for a single invocation.
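Stage activation can be pictured as merging the default entries with the entries of the matching stage block. The sketch below operates on an already-parsed config and uses hypothetical names; it illustrates the matching rule only, not Seamless code.

```python
def effective_config(entries, stage=None):
    """Merge default settings with those of the active stage.

    `entries` mimics a parsed seamless.yaml: top-level key/value
    pairs plus 'stage <name>' blocks holding their own pairs.
    """
    config = {}
    for entry in entries:
        for key, value in entry.items():
            if key.startswith("stage "):
                # Stage blocks apply only when the name matches.
                if stage is not None and key == f"stage {stage}":
                    for sub in value:
                        config.update(sub)
            else:
                config[key] = value
    return config

entries = [
    {"project": "my-project"},
    {"execution": "process"},
    {"stage gpu": [
        {"cluster": "gpu-cluster"},
        {"execution": "remote"},
        {"remote": "daskserver"},
    ]},
]

assert effective_config(entries)["execution"] == "process"
assert effective_config(entries, "gpu")["execution"] == "remote"
```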