Sending data between modules
Data can be sent from one module to another by creating a link between the two modules. This guide covers the common use cases for links.
Using data in modules
A link connects two module endpoints: an output of one module and an input of another. This section explains how to use these endpoints inside module logic.
Defining IO
Both inputs and outputs are defined in the module definition: the implementation of the create_module method inside the module source code. There, use the following two methods to define inputs and outputs respectively:
M.add_input(input_name: str, type: str)
M.add_output(output_name: str, type: str)
These methods make sure that the module exposes an input and an output endpoint, each of which can be combined with an endpoint of another module to create a link. As the method signatures indicate, an endpoint is defined by a pair of strings: the name of the endpoint and the type of data it expects. The type property is a hint to module users about what type to expect when connecting an input to an output and vice versa.
Example:
def create_module() -> EcidaModule:
    M = EcidaModule("data-loader", "v1")
    M.add_input("data-in", "string")
    M.add_output("data-out", "string")
    return M
Naming restrictions
There are some restrictions on naming inputs and outputs. Please refer to the reference for a complete overview.
Using outputs
Inside the module implementation logic (main), call M.push(output_name: str, data: any) to push data to a defined output endpoint. Data can be of any type.
Example:
def main():
    M.push("data-out", "This string will be sent.")
Using inputs
To use inputs defined in create_module, call M.pull(input_name: str) inside the main method of the module. This method returns the most recent data sent to the module. M.pull will wait until data is available.
Module data is generally received as strings, even if the outputting module does not explicitly specify this. This is because the output is serialized as a string so that it can be sent through a link. For other serialization methods, refer to Custom serialization below.
This means that if specific types are expected based on the module definitions, such as integers, the data first needs to be cast to int:
def main():
    number = int(M.pull("[input name]"))
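Because the payload arrives as a string, a malformed message makes int(...) raise ValueError. A minimal sketch of a defensive cast follows. The helper parse_int and its default parameter are hypothetical names introduced for illustration and are not part of the ECiDA API.

```python
from typing import Optional


def parse_int(raw: str, default: Optional[int] = None) -> Optional[int]:
    """Cast a pulled string to int; return `default` if it is not a valid integer."""
    try:
        # int() tolerates surrounding whitespace such as a trailing newline.
        return int(raw)
    except (TypeError, ValueError):
        return default


# In a module this would be: number = parse_int(M.pull("[input name]"))
number = parse_int("42\n")
```

Returning a default instead of raising keeps the module running on bad input. Whether that is preferable to failing loudly depends on the pipeline.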
Similarly, any other datatype can be sent, as long as a serialization and deserialization (serde) method exists for it. For example, CSV data can be sent as a string and read back with pandas and StringIO:
from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO(M.pull("[input name]")))
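For structured values such as dictionaries or lists, JSON from the Python standard library is one possible string serde. The sketch below simulates the link with a plain string variable. The helper names encode_record and decode_record are hypothetical; in a real module the encoded string would be passed to M.push, and the string returned by M.pull would be decoded.

```python
import json
from typing import Any


def encode_record(record: Any) -> str:
    """Serialize a JSON-compatible value to a string before pushing it."""
    return json.dumps(record)


def decode_record(payload: str) -> Any:
    """Deserialize a pulled string back into Python objects."""
    return json.loads(payload)


# Sending side: would be M.push("data-out", encode_record(reading))
reading = {"sensor": "s1", "values": [1.5, 2.0, 3.25]}
payload = encode_record(reading)

# Receiving side: would be decode_record(M.pull("data-in"))
restored = decode_record(payload)
```

Keep in mind that JSON converts tuples to lists and integer dictionary keys to strings, so the restored value can differ from the original in those cases.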
Casting is only needed for data returned by M.pull. No casting to string is needed for M.push.
Sending large data through links
Call enable_large_messages() in create_module to make sure large data can be sent through links.
Example:
def create_module() -> EcidaModule:
    M = EcidaModule("data-loader", "v1")
    M.add_output("data-out", "string")
    M.enable_large_messages()
    return M
Custom serialization
If it becomes inefficient to send data as strings, data can also be serialized in binary. This can be useful for transferring library-specific types, such as NumPy arrays through numpy.ndarray.tobytes:
import numpy as np
data = np.array([1, 2, 3], dtype=np.int64)  # explicit dtype so the receiver knows how to decode it
M.push("array-out", data.tobytes())
Make sure to document this detail in the module definition by adjusting the output type, so that other users know which custom serialization is used:
M.add_output("array-out", "ndarray-binary")
To use this data inside the receiving module, the proper inverse operation (frombuffer in this example) needs to be used instead of casting:
# dtype must match the dtype of the array that was sent (np.int64 is assumed here)
data = np.frombuffer(M.pull("input"), dtype=np.int64)
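Note that tobytes transfers only the raw buffer, so the receiver must already know the dtype and shape of the array. One possible alternative, sketched here, is to write NumPy's .npy format to an in-memory buffer, which stores both alongside the data. The helper names array_to_bytes and bytes_to_array are hypothetical; in a module, the resulting bytes would go through M.push and M.pull, assuming the link delivers them unchanged.

```python
from io import BytesIO

import numpy as np


def array_to_bytes(arr: np.ndarray) -> bytes:
    """Serialize an array, including its dtype and shape, to .npy bytes."""
    buffer = BytesIO()
    np.save(buffer, arr, allow_pickle=False)
    return buffer.getvalue()


def bytes_to_array(payload: bytes) -> np.ndarray:
    """Restore an array from .npy bytes."""
    return np.load(BytesIO(payload), allow_pickle=False)


# Sending side: would be M.push("array-out", array_to_bytes(original))
original = np.arange(6, dtype=np.float32).reshape(2, 3)
payload = array_to_bytes(original)

# Receiving side: would be bytes_to_array(M.pull("input"))
restored = bytes_to_array(payload)
```

The .npy header adds a small fixed overhead per message, which is usually negligible for large arrays.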
Caveat: inspecting links
As the following sections describe, the Links panel inside the app can only display data whose content is visible as a string. Therefore, while sending data as binary technically works, it prevents you from inspecting this data in the Links panel. Data transfers will still be visible in the result list, but marked as [blob].
To still be able to inspect module data, add logging to the respective modules. The data will then be visible in the Logs panel.
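A minimal sketch of such logging, using Python's standard logging module: rather than logging the raw bytes, it logs a short summary of the payload. The logger name, the describe_payload helper, and the preview length are assumptions made for illustration; how ECiDA collects log output is not covered here.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("array-module")  # hypothetical logger name


def describe_payload(payload: bytes, preview: int = 8) -> str:
    """Summarize a binary payload as its size plus a hex preview of the first bytes."""
    return f"{len(payload)} bytes, starts with {payload[:preview].hex()}"


# Example payload; in a module this would be the bytes passed to M.push.
payload = bytes([1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0])
summary = describe_payload(payload)
logger.info("pushing to array-out: %s", summary)
```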
Creating links
Viewing module IO in the editor
Input and output endpoints can be inspected at several locations in the editor.
Inside the module library
Hovering over any module in the “Modules” panel shows a quick summary of that module, containing the name, description and “handles”, which include the inputs and outputs defined:
The inputs and outputs are shown in the toggle list. This list can be filtered by clicking any of the four badges above: inputs, outputs, configs and signals, to show only the selected type. For example: clicking inputs and outputs will only show entries of these two types:
Viewing IO of pipeline modules
This list is also available for modules added to the pipeline. Selecting a module will fill the “Inspect” panel with properties of this module, including IO under “summary”:
Additionally, the endpoints can be viewed inside the “Canvas” panel inside the module node:
- On the left side, inputs are shown
- On the right side, outputs are shown
Creating links between modules
Links are defined as part of the pipeline definition in which the modules are used. Inside a module node, hovering over the circle of an endpoint changes the cursor to a “plus”, indicating drag-and-drop functionality. Create a link by dragging from this endpoint to its counterpart on another module: from an output to an input, or from an input to an output:
This will create a finished link:
Removing links
Links can be removed by clicking the circled “cross” button located at the centre of a link:
Inspecting running links
Once a pipeline has been deployed, data flowing through links can be inspected for further analysis or debugging purposes. The app provides different features to achieve a complete view of what is happening between modules in your pipeline.
Running pipelines can be inspected via the pipeline viewer, which can be opened from the pipelines page in two ways.
Inside the pipelines table, clicking a badge inside the “Environments” column will directly open the viewer with respect to that pipeline and environment:
Alternatively, clicking the “View” button of any single pipeline row will open a dropdown with all environments this pipeline is located in:
Inspecting links in the viewer
Like the editor, the pipeline viewer lets you inspect which endpoints are available and how they are connected. The canvas shows the pipeline graphically, with links drawn as connections between modules.
In the viewer, the Inspect panel holds several tabs that provide different inspection methods. Clicking a module in the canvas fills the Inspect panel with the properties of the selected module, which are available under the “Properties” tab:
Viewing link data through the Logs panel
If your modules log data via a logger inside the module logic, these logs can be shown in the “Logs” panel. This data will refresh every 5 seconds.
Viewing link data through the Links panel
Another tab under the Inspect panel is “Links”. This sub panel allows you to inspect what data is being sent between modules without having to configure logging inside your module logic. Similarly to the logs section, this data is refreshed every 5 seconds.
Once the links panel is shown, the canvas will highlight all module outputs. Link data origins are identified via these outputs, and these are shown in the Links panel as badges next to actual data that this output exposes.
The “Link messages” subsection contains a complete list of recent data that has been sent between modules in the pipeline.
Each list entry contains the link badge as described above, with next to it the actual data. This data is truncated to enhance readability.
On the right side of a list entry, a timestamp shows how much time has passed since the data was sent (e.g. “5s ago”). Hovering over the “Clock” icon on the right shows the full timestamp of the entry, in the timezone set on the server where ECiDA is running:
Filtering links
The “Link messages” subsection can be filtered to show data based on only a selected group of links.
From the Links panel
The field under “Links filter” shows all links that are currently shown. Clicking this field will show a dropdown list in which any combination of links can be selected.
From the Canvas panel
Any selected link is highlighted in the canvas via blue outlines on module output handles. Clicking any of these handles filters the Links panel to show only that link:
Inspecting link data details
Clicking any of the link data entries fills the “Details” subsection at the bottom of the Links panel, which shows a complete overview of that specific data point: the link it came from, the full timestamp as described above, and the complete, untruncated data: