Quickstart
A quickstart guide to creating a data science pipeline with ECiDA.
This quickstart guide walks you through the basic functionality of ECiDA. You’ll learn how to connect to an existing ECiDA cluster and build a simple data science pipeline with the ECiDA concepts. Through a simple example, you will learn how to transform a data science plan into basic reusable modules, combine them through links in a pipeline, and execute this pipeline in an environment. By the end of this guide, you will have a working example that generates numbers, transforms them, and prints the results.
For more advanced workflows and customizations, please have a look at the various how-to guides.
For this quickstart guide, you need to have the following ready:
- Basic Python knowledge
- Basic Git knowledge
- An ECiDA installation ready to be used
Creating an example pipeline
Step 1. Creating modules
Our task is to apply a doubling function to a stream of incoming random numbers between 1 and 10 and print the result:
y = 2x
One of ECiDA’s strengths is modularity. Modules are self-contained units of code that each perform a specific task. A module can be very simple or very complex; by combining modules in a pipeline, you can build more complex algorithms while keeping code and sub-algorithms reusable and making it easy to swap individual modules in an experimental setting.
The example above can be divided into three sub-tasks:
- Generate random numbers
- Apply function y = 2x on the numbers
- Print the result
Each of these steps will be implemented as a separate module. This modular approach not only simplifies the design but also makes the components reusable, interchangeable, and easy to test in isolation.
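Before bringing in the ECiDA SDK, the decomposition can be sketched in plain Python. The function names below are illustrative and not part of ECiDA; each function stands in for one future module, and ordinary function calls stand in for the links of a pipeline.

```python
import random
from typing import Iterator


def generate_numbers(count: int, seed: int = 0) -> Iterator[int]:
    """Sub-task 1: yield `count` random integers between 1 and 10 (inclusive)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same numbers
    for _ in range(count):
        yield rng.randint(1, 10)


def double(x: int) -> int:
    """Sub-task 2: apply y = 2x."""
    return 2 * x


def print_result(y: int) -> None:
    """Sub-task 3: print the result."""
    print(y)


if __name__ == "__main__":
    # Chain the three steps by hand; in ECiDA, links take over this role.
    for number in generate_numbers(count=5):
        print_result(double(number))
```

Because each step is its own function, any one of them can be replaced or tested without touching the others, which is the same property the modules are meant to have.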
1.1. Setting up the sources project and local environment
The source code for ECiDA modules is stored in a project under a sources group on GitLab. GitLab is a web-based Git hosting platform, which ECiDA uses to manage your project files.
Let’s create such a project for the modules of this quickstart guide. ECiDA should already have been installed and set up by your IT department, and you should have been given the link to your company’s sources group.
Inside the sources group, create a new project by clicking the blue “New project” button at the top right of the page, and then “Create from template”. This shows all templates available as a basis for your project. ECiDA provides a template to help you get started with creating modules. Just like the sources group, you should have been given access to this template. Click the “Group” tab to see the ECiDA-specific templates.
Click “Use template” for the “ecida-project” repository in your respective group. Give your repository a unique name, such as “ECiDA quickstart [your name]”.
After the project has been made, you will find yourself at the GitLab repository page. Git links for cloning can be found under “Code”. For more information on Git, please refer to the Git tutorial.
Locally, clone the repository in your directory of choice:
git clone [git link]
After the project has been cloned, you will find several files in your local directory. Once your Python environment is set up, make sure all dependencies are installed:
pip install -r requirements.txt
One of the dependencies is the ECiDA Python SDK. Using the SDK, you can program your data science logic just like a regular Python program with very little overhead.
For now, the important file is module.py, which contains the template code for ECiDA modules to work with the SDK. We will modify this file to fit our needs for the example.
The file module.py contains a couple of sections:
- Module Definition (create_module()):
  - Defines the module name, version, and description
  - Declares input and output ports with their data types
  - Returns the configured module object
- Main Processing Logic (main(M)):
  - Takes the initialized module as input
  - Uses M.pull() to get input data from another module
  - Processes the data just like a regular Python program
  - Uses M.push() to send output data to another module
- Entry Point:
  - Creates the module
  - Initializes it
  - Passes it to the main function
1.2. Implementing modules using the ECiDA SDK
Make sure that the strings inside create_module do not contain any dots. For more guidelines, refer to the Naming restrictions in the SDK specification.
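A small check before committing can catch dotted names early. The helper below is a hypothetical sketch, not part of the ECiDA SDK, and it covers only the no-dots rule mentioned above; the SDK specification may list further restrictions.

```python
from typing import Iterable, List


def find_dotted_names(names: Iterable[str]) -> List[str]:
    """Return every name that contains a dot, in the order given."""
    return [name for name in names if "." in name]


if __name__ == "__main__":
    # Strings you plan to pass in create_module(): module name, version, port names.
    candidates = ["doubler", "v1", "in", "out", "v1.2"]
    print(find_dotted_names(candidates))  # prints ['v1.2']
```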
Let’s start with the central module: the function y = 2x. Rename module.py to doubler.py. For create_module, insert the following definition. Change “v1” to something that distinguishes your version from quickstart modules created by other users:
def create_module() -> EcidaModule:
    # Change the second parameter to something of your liking
    M = EcidaModule("doubler", "v1")
    M.add_description("Executes the function y = 2x.")
    M.add_input("in", "int")
    M.add_output("out", "int")
    return M
Next, we can use the definition inside main to apply our main function logic:
def main(M: EcidaModule):
    logger.info(f"START MODULE {M.name}:{M.version}")
    while True:
        sample = int(M.pull("in"))
        logger.info(f"Got number: {sample}")
        result = sample * 2
        M.push("out", result)
Let’s do the same for the remaining two modules: a number generator and a number printer:
number-generator.py
from Ecida import EcidaModule
import logging
import time
import random

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def create_module() -> EcidaModule:
    # Change the second parameter to something of your liking
    M = EcidaModule("number-generator", "v1")
    M.add_description("Generates a random number between 1 and (including) 10.")
    M.add_output("random-number", "int")
    return M


def main(M: EcidaModule):
    logger.info(f"START MODULE {M.name}:{M.version}")
    while True:
        number = random.randint(1, 10)
        logger.info(f"Sending number {number}")
        M.push("random-number", number)
        # Wait a bit before continuing to the next number
        time.sleep(5)


if __name__ == "__main__":
    M = create_module()
    M.initialize()
    main(M)
number-printer.py
from Ecida import EcidaModule
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def create_module() -> EcidaModule:
    # Change the second parameter to something of your liking
    M = EcidaModule("number-printer", "v1")
    M.add_description("Simple number printer")
    M.add_input("number", "int")
    return M


def main(M: EcidaModule):
    logger.info(f"START MODULE {M.name}:{M.version}")
    while True:
        number = int(M.pull("number"))
        logger.info(f"{number}")


if __name__ == "__main__":
    M = create_module()
    M.initialize()
    main(M)
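To try the doubling logic without a cluster, you can run a single loop iteration against an in-memory stand-in. FakeModule and double_once below are hypothetical helpers written for this sketch; they only mimic the pull and push calls used in this guide and are not part of the ECiDA SDK, whose real behaviour may differ.

```python
from collections import deque
from typing import Any, Deque, Dict


class FakeModule:
    """In-memory stand-in that mimics only pull() and push() as used in this guide."""

    def __init__(self) -> None:
        self.inputs: Dict[str, Deque[Any]] = {}
        self.outputs: Dict[str, Deque[Any]] = {}

    def feed(self, port: str, value: Any) -> None:
        """Queue a value on an input port, as an upstream module would."""
        self.inputs.setdefault(port, deque()).append(value)

    def pull(self, port: str) -> Any:
        """Return the oldest queued value on the given input port."""
        return self.inputs[port].popleft()

    def push(self, port: str, value: Any) -> None:
        """Record a value sent on the given output port."""
        self.outputs.setdefault(port, deque()).append(value)


def double_once(M: FakeModule) -> None:
    """One iteration of the doubler loop: pull, double, push."""
    sample = int(M.pull("in"))
    M.push("out", sample * 2)


if __name__ == "__main__":
    M = FakeModule()
    for value in (3, 7):
        M.feed("in", value)
        double_once(M)
    print(list(M.outputs["out"]))  # prints [6, 14]
```

Splitting the loop body into a function that handles one value keeps the infinite while True loop out of the test, so the check finishes on its own.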
1.3. Saving and automatically compiling your modules
Commit your files and push the changes to the GitLab repository:
git add .
git commit -m "Quickstart modules"
git push
Once this is done, ECiDA will automatically build your code into modules that can be used inside the system.
Refreshing the repository page will show your latest commit at the top of the page, with an icon next to it indicating the progress of preparing your modules for ECiDA. Once the icon is a green checkmark circle, your modules have been compiled and ECiDA moves them to the cluster installation.
Step 2. Building pipelines in the ECiDA App
2.1. Navigating to the pipeline editor
With our modules ready to be used, let’s head over to the ECiDA App to combine them in a pipeline! The ECiDA App is your central place for creating, deploying and monitoring data pipelines through an intuitive graphical user interface.
To access the app, use the link specific to your installation. This link has been set up for you by your IT department, or by yourself if you installed ECiDA manually using the ECiDA Setup.
Example: app.your-company.ecida.io
When you open this link, you are first greeted by the landing page, which lists the pipelines that have already been created (or shows a message if none are available). Click “New” to go to the pipeline editor and start creating your pipeline.
2.2. Creating a new pipeline
The pipeline editor is your graphical way of combining modules into a complex ML pipeline. Let’s start by giving our pipeline a name of your liking in the Inspect panel (e.g. “ECiDA Quickstart [your name]”).
2.3. Adding and connecting modules
Next, let’s add our modules via the Modules panel. This panel shows all modules available for you to use in your pipelines. You can search for them by name, or find them manually by scrolling the result list. Let’s search for our created number-generator, doubler, and number-printer modules.
Once you have found your modules, drag and drop all three onto the Canvas panel. While you drag, both the module name and the info window move along. Alternatively, add them all at once by enabling Multi-select, selecting the modules, and clicking “Add”.
Then connect them by dragging an input to an output, so that data from one module is passed to the next.
As seen in the earlier section:
The example above can be divided into three sub-tasks:
- Generate random numbers
- Apply function y = 2x on the numbers
- Print the result
Which means for our links:
- Connect number-generator.random-number to doubler.in
- Connect doubler.out to number-printer.number
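The wiring can also be written down as data and checked for mismatches. This is an illustrative sketch only: the OUTPUTS, INPUTS, and LINKS structures and the check_links helper are hypothetical and do not reflect how ECiDA stores pipelines; the port names and types are taken from the module definitions in Step 1.

```python
from typing import Dict, List, Tuple

Port = Tuple[str, str]  # (module name, port name)

# Declared data types, copied from the create_module() definitions.
OUTPUTS: Dict[Port, str] = {
    ("number-generator", "random-number"): "int",
    ("doubler", "out"): "int",
}
INPUTS: Dict[Port, str] = {
    ("doubler", "in"): "int",
    ("number-printer", "number"): "int",
}

# Each link goes from an output port to an input port.
LINKS: List[Tuple[Port, Port]] = [
    (("number-generator", "random-number"), ("doubler", "in")),
    (("doubler", "out"), ("number-printer", "number")),
]


def check_links(
    links: List[Tuple[Port, Port]],
    outputs: Dict[Port, str],
    inputs: Dict[Port, str],
) -> List[str]:
    """Return a list of problems; an empty list means every link is consistent."""
    problems = []
    for source, target in links:
        if source not in outputs:
            problems.append(f"unknown output {source[0]}.{source[1]}")
        elif target not in inputs:
            problems.append(f"unknown input {target[0]}.{target[1]}")
        elif outputs[source] != inputs[target]:
            problems.append(
                f"type mismatch on {source[0]}.{source[1]} -> {target[0]}.{target[1]}"
            )
    return problems


if __name__ == "__main__":
    print(check_links(LINKS, OUTPUTS, INPUTS))  # prints []
```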
2.4. Saving and deploying the pipeline
Your pipeline is now ready to be saved. Click “Save” at the top-left of your screen to start the saving process.
Saving takes a short moment. Once it is done, a message pops up.
Let’s head back to the landing page by clicking the ECiDA icon at the start of the top bar. If you haven’t closed the previous save message yet, you can also click the “projects” link.
ECiDA has pushed your pipeline to its Git deployments repository, and it takes a short moment before it is shown in the pipeline table. It will appear under the name you gave it in the editor. By default, the table sorts pipelines by modification date, so your new pipeline should show up near the top.
Saving a pipeline does not yet mean it is running. To run, or “deploy”, the pipeline, click the “Manage” button for your pipeline. The Manage panel that opens shows all environments in which you can deploy your pipeline. Environments are separate contexts in which a pipeline can run; they may correspond to, for example, a production versus a development setting.
Let’s click the rocket “Deploy” button for the “development” environment. You will see a badge pop up next to it: “to be deployed”. Click “Apply (1)” to send your changes to the system.
Once your pipeline is running, you will see the “Deployed” badge under “Status”.
That’s all for deploying a pipeline. The Manage panel can be closed.
2.5. Monitoring pipeline execution
As with saving, it takes a moment before ECiDA displays your deployed pipeline on the pipelines page. Once it is available, the same “development” badge from the Manage panel appears under “Environments” for your pipeline. Click the badge to open the pipeline viewer.
The pipeline viewer is the central place for inspecting and monitoring running pipelines. We see the same graphical representation of our pipeline, but now with an extended Inspect panel showing various tabs for inspection and monitoring.
For now, the Logs panel is selected by default. Select the number-printer module in the canvas to view the logs. You should be seeing the numbers flowing in!
It may take a bit before the module has fully started. In the meantime, you may see a ‘Failed to load logs’ message. If it takes too much time, something in your pipeline or modules may have gone wrong. Please refer to Troubleshooting for more information.
That’s it! What’s next?
You have completed the ECiDA quickstart guide. You now know the basics of creating modules and pipelines in ECiDA, and how to manage them. To go deeper, you can use the advanced tutorial to explore features such as configurable modules and pipeline-level metrics. Or you can turn to the how-to guides for a complete overview of specific features in a more general, use-case-based setting.
Need help along the way? You can access this documentation anytime through the in-app Help button. For direct support, check out our Slack channel or take the guided tour in the app.
We hope ECiDA becomes a helpful and enjoyable part of your data science journey!