VisTrails Packages

Packages

Packages are the plugins in VisTrails that provide modules. Each package usually wraps a library or provides related functionality. They are loaded by the package manager and are wrapped by vistrails.core.modules.package.Package.

A VisTrails Package is a Python module or package in a location where the package manager will find it (either vistrails/packages or .vistrails/userpackages). It is either a single module or a directory with the following structure:

my_codepath
|-- __init__.py
|-- init.py
+-- ...

my_codepath is referred to as “codepath” in the code; concatenated with the “prefix”, it gives the argument passed to import to load the package (set as _module).

my_codepath.__init__ has to exist for the directory to be importable. If my_codepath.init exists, it will be loaded as _module instead of my_codepath when the package is enabled. This allows the bulk of the code (the part that usually has Python dependencies) to live apart from the package root, which holds the functions and constants that VisTrails uses before the package is enabled.
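The loading rule described above can be sketched as follows. This is an illustration only, not the package manager's actual code: the helper name load_package_module is hypothetical, and importlib stands in for whatever import mechanism VisTrails uses internally.

```python
import importlib


def load_package_module(codepath, prefix=""):
    """Hypothetical helper: return the module that would become _module.

    prefix + codepath is the dotted name handed to the import machinery.
    If a separate <codepath>.init module exists it is preferred;
    otherwise the package root itself is used.
    """
    root_name = prefix + codepath
    root = importlib.import_module(root_name)  # needs __init__.py to work
    try:
        return importlib.import_module(root_name + ".init")
    except ImportError:
        # No init.py (or it failed to import): fall back to the root.
        return root
```

Note that catching ImportError this broadly also hides import failures raised inside init.py itself; a real implementation would likely want to tell the two cases apart.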

The package should contain (in __init__.py) the following:

  • name: a human-readable name for the package, displayed in dialogs
  • identifier: a unique identifier for the package, used to refer to it everywhere (for dependency links in other packages, and in serialized workflows)
  • version: a version number (see Pipeline upgrades)

It can also optionally have the following:

  • configuration: A ConfigurationObject holding the configuration of this package, which will be persisted to .vistrails. It is currently injected into init, so you can use it there without importing it, but it is good practice to import it from somewhere anyway (at the very least, this keeps IDEs from reporting it as an undefined reference).
  • package_dependencies: A simple function that returns a list of package identifiers this package depends on. It will not be possible to enable this package if they are not available. The package manager will also make sure they are enabled first, so cyclic dependencies are not possible here.
  • package_requirements: A function that checks that the other requirements are met. For example, a VisTrails package wrapping a library might want to check that the library is importable there, to give out a clean error message to the user if it isn’t (and optionally, grab it automatically with the bundle system; use vistrails.core.requirements.require_python_module() in there for example).
  • can_handle_identifier DOCTODO
  • can_handle_vt_file DOCTODO
vistrails.core.requirements.require_python_module(module_name, dep_dict=None)

Fail if the given Python module isn’t importable and can’t be installed.

This raises MissingRequirement and is thus suitable for use in a package’s package_requirements() function. If dep_dict is provided, it will try to install the requirement before failing, using install().

Raises: MissingRequirement on error

vistrails.core.requirements.require_executable(filename)

Fail if the given executable file name is not in PATH.

This raises MissingRequirement and is thus suitable for use in a package’s package_requirements() function.

Raises MissingRequirement:
 on error
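As an illustration of the attributes listed above, a package root might look like the following sketch. The package name, both identifiers, the version string, and the choice of numpy as the wrapped library are all made up for the example.

```python
# my_codepath/__init__.py  (sketch; all names and values are illustrative)

name = "My Package"                     # human-readable, shown in dialogs
identifier = "org.example.my_package"   # hypothetical unique identifier
version = "0.1.0"


def package_dependencies():
    # Identifiers of packages that must be enabled before this one.
    # 'org.example.other_package' is a made-up identifier.
    return ["org.example.other_package"]


def package_requirements():
    # Imported lazily so that this file stays importable on its own,
    # before the package is enabled.
    from vistrails.core.requirements import require_python_module

    # Raises MissingRequirement if 'numpy' cannot be imported.
    require_python_module("numpy")
```

Keeping the VisTrails import inside package_requirements() means the root module only contains plain constants and small functions, which matches the role the package root plays before the package is enabled.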

The my_codepath.init (if separate, else my_codepath) module can contain the following:

  • initialize: The “entry point” of the package, called when initializing. It typically registers modules with the ModuleRegistry (after generating them dynamically, for example; otherwise _modules might be more convenient).
  • _modules: A list of Module subclasses to register with the module registry automatically. It can also be a dict mapping a namespace to a list of modules. A module can also be replaced with a tuple (ModuleClass, options) where options is a dict with the module’s settings.
  • contextMenuName: The single entry of the context menu shown when right-clicking a module in the module palette. The module name (or package name) is the only argument. callContextMenu is called with the same argument if the user clicks the menu entry.
  • callContextMenu: Callback for the context menu. The module name (or package name) is the only argument.
  • handle_module_upgrade_request DOCTODO
  • handle_all_errors DOCTODO
  • handle_missing_module DOCTODO
  • loadVistrailFileHook DOCTODO
  • saveVistrailFileHook DOCTODO

Todo

Right now, contextMenuName() only allows a package creator to display a one-element context menu.

#1115

class vistrails.core.modules.package.Package(*args, **kwargs)
can_handle_identifier(identifier)

Asks the package whether it can handle this identifier.

can_handle_vt_file(name)

Asks package if it can handle a file inside a zipped vt file

handle_missing_module(*args, **kwargs)

report_missing_module(name, namespace):

Calls the package’s module handle_missing_module function, if present, to allow the package to dynamically add a missing module.

load(prefix=None)

load(module=None). Loads package’s module.

If package is already loaded, this is a NOP.

reset_configuration()

Reset package configuration to original package settings.

class vistrails.core.packagemanager.PackageManager(registry, startup)
add_dependencies(package) → None

Registers all dependencies a package contains by calling the appropriate callback.

Does not add multiple dependencies - if a dependency is already there, add_dependencies ignores it.

add_menu_items(pkg: Package) → None

If the package implemented the function menu_items(), the package manager will emit a signal with the menu items to be added to the builder window

add_package(codepath, add_to_package_list=True, prefix=None)

Adds a new package to the manager. This does not initialize it. To do so, call initialize_packages()

available_package_names_list() → list

Returns a list with the code-paths of all available packages, by looking at the appropriate directories.

The distinction between package names, identifiers and code-paths is described in doc/package_system.txt

can_be_disabled(identifier)

Returns whether the package has no reverse dependencies (other packages that depend on it).

dependency_graph() → Graph

Returns a graph with package dependencies, where u -> v if u depends on v. Vertices are strings representing package names.
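To make the edge direction concrete, the sketch below derives an order in which packages could be enabled so that every dependency comes before the packages that need it. It is not the PackageManager's own algorithm: a plain dict stands in for the Graph object, and the function name and package names are hypothetical.

```python
def enable_order(depends_on):
    """Return package names so that dependencies precede dependents.

    depends_on maps a package name u to the names v it depends on
    (edges u -> v). Raises ValueError on a dependency cycle.
    """
    order = []
    state = {}  # name -> "visiting" or "done"

    def visit(u):
        if state.get(u) == "done":
            return
        if state.get(u) == "visiting":
            raise ValueError("cyclic dependency involving %r" % u)
        state[u] = "visiting"
        for v in depends_on.get(u, ()):
            visit(v)  # dependencies first
        state[u] = "done"
        order.append(u)

    for u in sorted(depends_on):
        visit(u)
    return order


graph = {"plots": ["tables", "basic"], "tables": ["basic"], "basic": []}
print(enable_order(graph))  # ['basic', 'tables', 'plots']
```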

enabled_package_list()

Returns a list of all enabled packages.

finalize_packages()

Finalizes all installed packages. Call this only prior to exiting VisTrails.

get_package_by_codepath(codepath: string) → Package.

Returns a package with given codepath if it is enabled, otherwise throws exception

get_package_by_identifier(identifier: string) → Package.

Deprecated, use get_package() instead.

get_package_configuration(codepath: string) → ConfigurationObject or None

Returns the configuration object for the package if it exists, or None. Throws MissingPackage if the package doesn’t exist.

has_package(identifier: string) → Boolean.

Returns true if given package identifier is present.

identifier_is_available(identifier: str) → Pkg

Returns true if there exists a package with the given identifier in the list of available (i.e., disabled) packages.

If true, returns the successfully loaded, uninitialized package.

import_packages_module()

Imports the packages module using path trickery to find it in the right place.

import_user_packages_module()

Imports the user packages module using path trickery to find it in the right place.

initialize_packages(prefix_dictionary={}, report_missing_dependencies=True)

initialize_packages(prefix_dictionary={}): None

Initializes all installed packages. If prefix_dictionary is not {}, then it should be a dictionary from package names to the prefix such that prefix + package_name is a valid python import.

late_disable_package(codepath)

late_disable_package disables a package ‘late’, that is, after VisTrails initialization. All reverse dependencies need to be already disabled.

late_enable_package(codepath, prefix_dictionary={}, needs_add=True)

late_enable_package enables a package ‘late’, that is, after VisTrails initialization. All dependencies need to be already enabled.

look_at_available_package(codepath: string) → Package

Returns a Package object for an uninstalled package. This does NOT install a package.

remove_menu_items(pkg: Package) → None

Send a signal with the pkg identifier. The builder window should catch this signal and remove the package menu items

remove_package(codepath)

remove_package(name): Removes a package from the system.

show_error_message(pkg: Package, msg: str) → None

Prints a message to standard error and emits a signal to the builder so that, if possible, a message box is also shown.

Modules

Packages register modules with VisTrails. These are the boxes that can be assembled in pipelines and run code when executing.

A module is simply a subclass of Module. It represents both a datatype (e.g., the type of a port or a connection) and a computation unit (e.g. a box in the pipeline view, with ports from and to which you draw connections). Modules can use single-inheritance, which will inherit the ports from the parent unless they are overridden. It is possible to connect one port to a port of a parent type. The special type Variant can connect to and from any other type.

Note that the type of a port, which is a Module, is different from the actual type of the Python objects that are passed on the connection. In fact, Module instances are no longer passed on connections, since they cause problems (they hold references to the pipeline, the interpreter, and so on, which makes them very unsafe to serialize). For example, an SQLAlchemy Connection object is passed on DBConnection ports, the figure number is passed unwrapped on MplFigure ports, and either numpy arrays or Python lists are passed on List ports. The association between a Module and the actual type passed on connections is only a convention, although VisTrails will check it in specific cases (such as when one end of the connection is a Variant port) by calling validate() on the value.

class vistrails.core.modules.vistrails_module.Module

Module is the base module from which all module functionality in VisTrails is derived. It defines a set of basic interfaces to deal with data input/output (through ports, as will be explained later), as well as a basic mechanism for dataflow-based updates.

Execution Model

VisTrails assumes fundamentally that a pipeline is a dataflow. This means that pipeline cycles are disallowed, and that modules are supposed to be free of side-effects. This is obviously not possible in general, particularly for modules whose sole purpose is to interact with operating system resources. In these cases, designing a module is harder – the side effects should ideally not be exposed to the module interface. VisTrails provides some support for making this easier, as will be discussed later.

VisTrails caches intermediate results to increase efficiency in exploration. It does so by reusing pieces of pipelines in later executions.

Terminology

Module Interface: The module interface is the set of input and output ports a module exposes.

Designing New Modules

Designing new modules is essentially a matter of subclassing this module class and overriding the compute() method. There is a fully documented example of this in the default package ‘pythonCalc’, available in the ‘packages/pythonCalc’ directory.
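A minimal sketch of such a subclass follows. It is not taken from pythonCalc: the Add module and its port names are hypothetical, the tuple form of the port declarations and the 'basic:Float' signature strings are assumptions, and the fallback Module class is a bare stand-in (not the real class) so that the example can be run without VisTrails.

```python
try:
    from vistrails.core.modules.vistrails_module import Module
except ImportError:
    # Minimal stand-in (not the real class) so the sketch runs on its own.
    class Module(object):
        def __init__(self):
            self.inputs, self.outputs = {}, {}

        def get_input(self, port_name):
            return self.inputs[port_name]

        def set_output(self, port_name, value):
            self.outputs[port_name] = value


class Add(Module):
    """Hypothetical module that adds two floats."""

    # Port declarations; this tuple form and the 'basic:Float' signature
    # strings are assumptions.
    _input_ports = [("a", "basic:Float"), ("b", "basic:Float")]
    _output_ports = [("sum", "basic:Float")]

    def compute(self):
        # Read the inputs, do the work, publish the result.
        a = self.get_input("a")
        b = self.get_input("b")
        self.set_output("sum", a + b)
```

compute() only reads from input ports and writes to output ports, which keeps the module free of side effects as the execution model expects.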

Caching

Caching affects the design of a new module. Most importantly, users have to account for compute() being called more than once. Even though compute() is only called once per individual execution, new connections might mean that previously uncomputed output must be made available.

Also, operating system side-effects must be carefully accounted for. Some operations are fundamentally side-effectful (creating OS output like uploading a file on the WWW or writing a file to a local hard drive). These modules should probably not be cached at all. VisTrails provides an easy way for modules to report that they should not be cached: simply subclass from the NotCacheable mixin provided in this python module. (NB: for the mixin to work properly, NotCacheable must appear BEFORE any other base class in the class declaration.) These modules (and anything that depends on their results) will then never be reused.
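The ordering requirement follows from Python's method resolution order, as the sketch below illustrates. The two base classes here are simplified stand-ins that mimic only is_cacheable(); in a real package they would be imported from vistrails.core.modules.vistrails_module. The module names are hypothetical.

```python
# Stand-ins that mimic only the behaviour discussed here. In a real
# package: from vistrails.core.modules.vistrails_module import (
#     Module, NotCacheable)
class Module(object):
    def is_cacheable(self):
        return True


class NotCacheable(object):
    def is_cacheable(self):
        return False


class UploadFile(NotCacheable, Module):
    """Hypothetical side-effecting module: the mixin comes first."""


class UploadFileWrongOrder(Module, NotCacheable):
    """Mixin listed last: Module.is_cacheable() is found first."""


print(UploadFile().is_cacheable())            # False
print(UploadFileWrongOrder().is_cacheable())  # True
```

With the mixin listed last, the lookup reaches Module.is_cacheable() first, so the module would still be treated as cacheable.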

Intermediate Files

Many modules communicate through intermediate files. VisTrails provides automatic filename and handle management to alleviate the burden of determining tricky things (e.g. longevity) of these files. Modules can request temporary file names through the file pool, currently accessible through self.interpreter.filePool.

The FilePool class is available in core/modules/module_utils.py - consult its documentation for usage. Notably, using the file pool will make temporary files work correctly with caching, and will make sure the temporaries are correctly removed.

addJobCache()

Add outputs from job cache

DOCTODO

annotate(d)

Manually add provenance information to the module’s execution trace. For example, a module that generates random numbers might add the seed that was used to initialize the generator.

Parameters: d (dict) – a dictionary where both the keys and values are strings

build_stream()

Determines and builds correct generator type.

check_input(port_name) → None.

Raises an exception if the input port named port_name is not set.

Parameters: port_name (str) – the name of the input port being checked
Raises: ModuleError if there is no value on the port

clear()

Removes all references, prepares for deletion.

compare(port_spec, v_module, port)

Function used to compare two port specs.

compute()

This method should be overridden in order to perform the module’s computation.

compute_accumulate()

This method creates a generator object that converts all streaming inputs to list inputs for modules that do not explicitly support streaming.

compute_after_streaming()

This method creates a generator object that computes when the streaming is finished.

compute_all()

This method executes the module once for each input.

Similarly to controlflow’s fold, it calls update() in a loop to handle lists of inputs.

compute_streaming()

This method creates a generator object and sets the outputs as generators.

compute_while()

This method executes the module once for each input.

Similarly to controlflow’s fold, it calls update() in a loop to handle lists of inputs.

createSignature(v_module)

Function used to create a signature, given v_module, for a port spec.

enable_output_port(port_name)

Set an output port to be active to store result of computation

force_get_input(port_name, default_value=None)

Like get_input() except that if no value exists, it returns a user-specified default_value or None.

Parameters:
  • port_name (str) – the name of the input port being queried
  • default_value – the default value to be used if there is no value on the input port
Returns:

the value being passed in on the input port or the default

force_get_input_list(port_name)

Like get_input_list() except that if no values exist, it returns an empty list

Parameters: port_name (str) – the name of the input port being queried
Returns: a list of all the values being passed in on the input port

get_input(port_name, allow_default=True)

Returns the value coming in on the input port named port_name.

Parameters:
  • port_name (str) – the name of the input port being queried
  • allow_default (bool) – whether to return the default value if it exists
Returns:

the value being passed in on the input port

Raises:

ModuleError if there is no value on the port (and no default value if allow_default is True)

get_input_list(port_name)

Returns the value(s) coming in on the input port named port_name. When a port can accept more than one input, this method obtains all the values being passed in.

Parameters: port_name (str) – the name of the input port being queried
Returns: a list of all the values being passed in on the input port
Raises: ModuleError if there is no value on the port

has_input(port_name)

Returns a boolean indicating whether there is a value coming in on the input port named port_name.

Parameters:port_name (str) – the name of the input port being queried
Return type:bool
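
The input accessors documented above might be combined in a compute() method as in the sketch below. The ScaledSum module and its port names are hypothetical. The fallback Module class is a stand-in that mimics only the documented behaviour of these accessors (it raises KeyError where the real class raises ModuleError) so that the example can be run without VisTrails.

```python
try:
    from vistrails.core.modules.vistrails_module import Module
except ImportError:
    # Stand-in mimicking only the documented behaviour of the accessors.
    class Module(object):
        def __init__(self):
            self.inputs = {}   # port name -> list of incoming values
            self.outputs = {}

        def has_input(self, port_name):
            return bool(self.inputs.get(port_name))

        def get_input(self, port_name):
            if not self.has_input(port_name):
                raise KeyError(port_name)  # real class: ModuleError
            return self.inputs[port_name][0]

        def force_get_input(self, port_name, default_value=None):
            if self.has_input(port_name):
                return self.inputs[port_name][0]
            return default_value

        def force_get_input_list(self, port_name):
            return list(self.inputs.get(port_name, []))

        def set_output(self, port_name, value):
            self.outputs[port_name] = value


class ScaledSum(Module):
    """Hypothetical module: sums all 'value' inputs, times 'scale'."""

    def compute(self):
        values = self.force_get_input_list("value")  # [] if unconnected
        scale = self.force_get_input("scale", 1.0)   # optional port
        self.set_output("result", scale * sum(values))
        if self.has_input("label"):                  # pass through if set
            self.set_output("label", self.get_input("label"))
```

Using the force_ variants for optional ports avoids an exception when nothing is connected, while get_input() remains appropriate for ports the module cannot work without.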
is_cacheable()

Returns whether this Module can be reused between executions.

It is safe for a Module to return different values in different occasions. In other words, it is possible for modules to be cacheable depending on their execution context.

job_monitor()

Returns the JobMonitor for the associated controller if it exists.

remove_input_connector(port_name, connector)

Remove a connector from the connection list of an input port

setInputValues(module, inputPorts, elementList, iteration)

Function used to set a value inside ‘module’, given the input ports.

setJobCache()

Checks if this is a job cache and it exists

DOCTODO

set_iterated_ports()

Calculates which inputs need to be iterated over

set_output(port_name, value)

This method is used to set a value on an output port.

Parameters:
  • port_name (str) – the name of the output port to be set
  • value – the value to be assigned to the port
set_streamed_ports()

Calculates which inputs will be streamed

set_streaming(UserGenerator)

Creates a generator object that computes when the next input is received.

set_streaming_output(port, generator, size=0)

This method is used to set a streaming output port.

Parameters:
  • port (str) – the name of the output port to be set
  • generator – An iterator object supporting .next()
  • size (int) – The number of values if known (default=0)
typeChecking(module, inputPorts, inputList)

Checks if elements of inputList match the inputPort types.

update()

Check module status, update upstream and run compute.

This is the execution logic for the module. It handles all the different possible states (cached, suspended, already failed), runs the upstream modules and the compute() method, and reports everything to the logger.

update_upstream()

Recursively update the modules upstream of this one.

update_upstream_port(port_name)

Updates upstream of a single port instead of all ports.

useJobCache()

Checks if this is a job cache

DOCTODO