This quickstart guide walks you through the basic functionality of ECiDA. You’ll learn how to connect to an existing ECiDA cluster and build a simple data science pipeline with the ECiDA concepts. Through a simple example, you will learn how to transform a data science plan into basic reusable modules, combine them through links in a pipeline, and execute this pipeline in an environment. By the end of this guide, you will have a working example that generates numbers, transforms them, and prints the results.

For more advanced workflows and customizations, please have a look at the various how-to guides.

For this quickstart guide, you need to have the following ready:

  • Basic Python knowledge
  • Basic Git knowledge
  • An ECiDA installation ready to be used

Creating an example pipeline

Step 1. Creating modules

Our task is to apply a doubling function to a stream of incoming random numbers between 1 and 10 and print the result:

  y = 2x
  

One of ECiDA’s strengths is modularity. Modules are self-contained units of code that perform a specific task. A module can be very simple or very complex. By combining modules in a pipeline, you can build more complex algorithms while reusing code or sub-algorithms, and you can swap individual modules in and out in an experimental setting.

The example above can be divided into three sub-tasks:

  • Generate random numbers
  • Apply function y = 2x on the numbers
  • Print the result

We will implement each of these steps as a separate module. This modular approach not only simplifies the design but also makes the components reusable, interchangeable, and easy to test in isolation.
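
Before involving the SDK, the decomposition can be sketched in plain Python. The function names below (`generate_number`, `double`, `show`) are illustrative only and not part of ECiDA; each one stands for the logic that a module will later wrap.

```python
import random


def generate_number(rng: random.Random) -> int:
    """Sub-task 1: produce a random integer from 1 to 10 (both inclusive)."""
    return rng.randint(1, 10)


def double(x: int) -> int:
    """Sub-task 2: apply y = 2x."""
    return 2 * x


def show(y: int) -> str:
    """Sub-task 3: print the result and return the printed text."""
    text = str(y)
    print(text)
    return text


# A seeded generator keeps this demonstration reproducible.
rng = random.Random(0)
samples = [generate_number(rng) for _ in range(5)]
results = [double(x) for x in samples]
printed = [show(y) for y in results]
```

Because each function depends only on its arguments, any one of them can be replaced or tested without touching the other two, which is the property the modules are meant to preserve.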

1.1. Setting up the sources project and local environment

The source code for ECiDA modules is stored in a project under a sources group on GitLab. GitLab is a web-based hosting platform for Git repositories, which ECiDA uses to manage your project files.

Let’s create such a project for the modules of this quickstart guide. ECiDA should already have been installed and set up by your IT department, and you should have been given the link to your company’s sources group.

Inside the sources group, create a new project by clicking the blue “New project” button on the top right of the page, and then “Create from template”. This will show all templates available for you to use as a basis for your project. ECiDA provides a template to help you get started with creating modules. Just like the sources group, you should have been given access to this template. Click the “Group” tab to get ECiDA specific templates.

Click “Use template” for the “ecida-project” repository in your respective group. Give your repository a unique name, such as “ECiDA quickstart [your name]”.

After the project has been created, you will land on its GitLab repository page. The Git links for cloning can be found under “Code”. For more information on Git, please refer to the Git tutorial.

Locally, clone the repository in your directory of choice:

  git clone [git link]
  

After cloning, you will find a handful of files in the new local directory. Once your Python environment is set up, install all dependencies:

  
pip install -r requirements.txt
  

One of the dependencies is the ECiDA Python SDK. Using the SDK, you can program your data science logic just like a regular Python program with very little overhead.
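
A quick way to confirm that the installation worked is to ask Python whether the SDK can be found. The sketch below uses the standard library's `importlib.util.find_spec`; it assumes the import name is `Ecida`, as used by the module code in this guide, and the helper name `is_importable` is made up for illustration.

```python
import importlib.util


def is_importable(package: str) -> bool:
    """Return True if the import system can locate `package`."""
    return importlib.util.find_spec(package) is not None


# "Ecida" is the import name used by the module code in this guide.
if is_importable("Ecida"):
    print("ECiDA SDK found")
else:
    print("ECiDA SDK not found; re-run: pip install -r requirements.txt")
```

The check only locates the package and does not import it, so it is safe to run even if the SDK expects a cluster connection at import or initialization time.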

For now, the important file is module.py which contains the template code for ECiDA modules to work with the SDK. We will modify this file to fit our needs for the example.

The file module.py contains a couple of sections:

  1. Module Definition (create_module()):

    • Defines the module name, version, and description
    • Declares input and output ports with their data types
    • Returns the configured module object
  2. Main Processing Logic (main(M)):

    • Takes the initialized module as input
    • Uses M.pull() to get input data from another module
    • Processes the data just like a regular Python program
    • Uses M.push() to send output data to another module
  3. Entry Point:

    • Creates the module
    • Initializes it
    • Passes it to the main function

1.2. Implementing modules using the ECiDA SDK

Let’s start with the central module: the function y = 2x. Rename module.py to doubler.py. For create_module, insert the following definition. Change “v1” to something that distinguishes your version from quickstart modules created by other users:

  
def create_module() -> EcidaModule:
    # Change the second parameter to something of your liking
    M = EcidaModule("doubler", "v1")
    M.add_description("Executes the function y = 2x.")

    M.add_input("in", "int")

    M.add_output("out", "int")

    return M
  

Next, implement the processing logic inside main, using the ports declared above:

  
def main(M: EcidaModule):
    logger.info(f"START MODULE {M.name}:{M.version}")
    
    while True:
        sample = int(M.pull("in"))

        logger.info(f"Got number: {sample}")
        result = sample * 2

        M.push("out", result)
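
The loop body can be tried out locally without a cluster. The sketch below swaps the SDK object for a small stand-in, `StubModule`, which is hypothetical and not part of the ECiDA SDK: it only imitates the `pull` and `push` calls used above, serving inputs from a list and collecting outputs in a dictionary. `InputExhausted` is likewise made up here, to stop the otherwise endless loop.

```python
from collections import defaultdict, deque


class InputExhausted(Exception):
    """Raised by the stand-in when no test input is left (hypothetical)."""


class StubModule:
    """In-memory stand-in for the pull/push calls; NOT the real EcidaModule."""

    def __init__(self, inputs: dict):
        self._inputs = {port: deque(values) for port, values in inputs.items()}
        self.outputs = defaultdict(list)

    def pull(self, port: str):
        if not self._inputs[port]:
            raise InputExhausted(port)
        return self._inputs[port].popleft()

    def push(self, port: str, value) -> None:
        self.outputs[port].append(value)


def run_doubler(M) -> None:
    """Same loop as main() above, minus the logging calls."""
    while True:
        sample = int(M.pull("in"))
        result = sample * 2
        M.push("out", result)


stub = StubModule({"in": [1, 7, 10]})
try:
    run_doubler(stub)
except InputExhausted:
    pass  # all test inputs have been consumed

print(stub.outputs["out"])  # [2, 14, 20]
```

Keeping the processing loop in a function that receives the module object as a parameter is what makes this kind of substitution possible.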
  

Let’s do the same for the remaining two modules: a number generator and a number printer:

number-generator.py

  
from Ecida import EcidaModule
import logging
import time
import random

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def create_module() -> EcidaModule:
    # Change the second parameter to something of your liking
    M = EcidaModule("number-generator", "v1")
    M.add_description("Generates a random number between 1 and (including) 10.")
    
    M.add_output("random-number", "int")

    return M

def main(M: EcidaModule):
    logger.info(f"START MODULE {M.name}:{M.version}")

    while True:
        number = random.randint(1, 10)
        logger.info(f"Sending number {number}")

        M.push("random-number", number)

        # Wait a bit before continuing to the next number
        time.sleep(5)
      


if __name__ == "__main__":
    M = create_module()
    M.initialize()
    main(M)
  

number-printer.py

  
from Ecida import EcidaModule
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def create_module() -> EcidaModule:
    # Change the second parameter to something of your liking
    M = EcidaModule("number-printer", "v1")
    M.add_description("Simple number printer")

    M.add_input("number", "int")

    return M

def main(M: EcidaModule):
    logger.info(f"START MODULE {M.name}:{M.version}")

    while True:
        number = int(M.pull("number"))

        logger.info(f"{number}")


if __name__ == "__main__":
    M = create_module()
    M.initialize()
    main(M)
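
To get a feel for how the three modules cooperate once they are linked, the sketch below imitates the links with `queue.Queue` objects and runs each stage in its own thread. This is only a local analogy: it assumes that `pull` waits until a value arrives, which the `while True` loops above suggest but this guide does not state, and the `STOP` sentinel is an addition that lets the demonstration terminate.

```python
import queue
import random
import threading

STOP = object()  # sentinel to end the demo; the real modules keep running


def generator(out_q: queue.Queue, count: int, rng: random.Random) -> None:
    for _ in range(count):
        out_q.put(rng.randint(1, 10))
    out_q.put(STOP)


def doubler(in_q: queue.Queue, out_q: queue.Queue) -> None:
    while True:
        sample = in_q.get()  # blocks until the upstream stage sends a value
        if sample is STOP:
            out_q.put(STOP)
            return
        out_q.put(sample * 2)


def printer(in_q: queue.Queue, seen: list) -> None:
    while True:
        number = in_q.get()
        if number is STOP:
            return
        print(number)
        seen.append(number)


link_a = queue.Queue()  # number-generator "random-number" -> doubler "in"
link_b = queue.Queue()  # doubler "out" -> number-printer "number"
seen = []

stages = [
    threading.Thread(target=generator, args=(link_a, 5, random.Random(0))),
    threading.Thread(target=doubler, args=(link_a, link_b)),
    threading.Thread(target=printer, args=(link_b, seen)),
]
for t in stages:
    t.start()
for t in stages:
    t.join()
```

Each stage knows only its own queues, not the other stages, which mirrors how a module refers to its ports by name and leaves the wiring to the pipeline.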
  

1.3. Saving and automatically compiling your modules

Commit your files and push the changes to the GitLab repository:

  git add .
  git commit -m "Quickstart modules"
  git push
  

Once this is done, ECiDA automatically builds your code into modules that can be used inside the system.

Refreshing the repository page shows your latest commit at the top of the page, with an icon next to it indicating the progress of preparing your modules for ECiDA. Once the icon turns into a green checkmark circle, your modules have been compiled and ECiDA moves them to the cluster installation.

Step 2. Building pipelines in the ECiDA App

2.1. Navigating to the pipeline editor

With our modules ready to be used, let’s head over to the ECiDA App to combine them in a pipeline! The ECiDA App is your central place for creating, deploying and monitoring data pipelines through an intuitive graphical user interface.

To access the app, use the link specific to your installation. This link has been set up for you by your IT department, or by yourself if you installed ECiDA manually using the ECiDA Setup.

Example:

  app.your-company.ecida.io
  

When you open this link, you land on a page listing the pipelines that have already been created (or a message if none are available). Click “New” to open the pipeline editor and start creating your pipeline.

2.2. Creating a new pipeline

The pipeline editor is your graphical way of combining modules into a complex ML pipeline. Let’s start by giving our pipeline a name of your liking in the Inspect panel (e.g. “ECiDA Quickstart [your name]”).

2.3. Adding and connecting modules

Next, let’s add our modules via the Modules panel. This panel shows all modules available for you to use in your pipelines. You can search for them by name, or find them manually by scrolling the result list. Let’s search for our created number-generator, doubler, and number-printer modules.

Once you have found your modules, drag and drop all three onto the Canvas panel. While you drag, both the module name and its info window move along. Alternatively, add them all at once by enabling Multi-select, selecting the modules, and clicking “Add”.

Then, connect the modules by dragging from an input to an output, so that data produced by one module is passed to the next.

As seen in Step 1, the example can be divided into three sub-tasks:

  • Generate random numbers
  • Apply function y = 2x on the numbers
  • Print the result

This means we need the following links:

  • Connect number-generator.random-number to doubler.in
  • Connect doubler.out to number-printer.number
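
The two links can also be written down as data and checked against the port declarations from Step 1. The structures below (`PORTS`, `LINKS`, `check_links`) are illustrative and do not reflect how ECiDA stores pipelines internally. Whether the editor itself enforces matching types is not stated in this guide; the sketch only verifies that each link goes from a declared output to a declared input of the same type.

```python
# Ports as declared in create_module() of each module: name -> type.
PORTS = {
    "number-generator": {"inputs": {}, "outputs": {"random-number": "int"}},
    "doubler": {"inputs": {"in": "int"}, "outputs": {"out": "int"}},
    "number-printer": {"inputs": {"number": "int"}, "outputs": {}},
}

# Each link: (source module, output port, target module, input port).
LINKS = [
    ("number-generator", "random-number", "doubler", "in"),
    ("doubler", "out", "number-printer", "number"),
]


def check_links(ports: dict, links: list) -> list:
    """Return a list of problems; an empty list means every link is valid."""
    problems = []
    for src, out_port, dst, in_port in links:
        out_type = ports.get(src, {}).get("outputs", {}).get(out_port)
        in_type = ports.get(dst, {}).get("inputs", {}).get(in_port)
        if out_type is None:
            problems.append(f"{src}.{out_port} is not a declared output")
        elif in_type is None:
            problems.append(f"{dst}.{in_port} is not a declared input")
        elif out_type != in_type:
            problems.append(
                f"{src}.{out_port} ({out_type}) does not match "
                f"{dst}.{in_port} ({in_type})"
            )
    return problems


print(check_links(PORTS, LINKS))  # []
```

A link that starts at an input, such as `doubler.in`, would be reported as a problem, which is a useful reminder that data always flows from an output port to an input port.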

2.4. Saving and deploying the pipeline

Your pipeline is now ready to be saved. Click “Save” at the top-left of your screen to start the saving process.

Saving takes a short moment. Once it is done, a confirmation message pops up.

Let’s head back to the landing page by clicking the ECiDA icon at the start of the top bar. If you haven’t closed the previous save message yet, you can also click the “projects” link.

ECiDA has pushed your pipeline to its Git deployments repository, and it takes a short moment before the pipeline is shown in the pipeline table. It will appear under the name you gave it in the editor. By default, the table sorts pipelines by modification date, so your new pipeline should show up near the top.

Saving a pipeline does not yet mean it is running. To run, or “deploy”, the pipeline, click the “Manage” button for your pipeline. The Manage panel opens and lists all environments in which you can deploy your pipeline. Environments are separate settings for running a pipeline; they may correspond to, for example, production versus development.

Let’s click the rocket “Deploy” button for the “development” environment. You will see a badge pop up next to it: “to be deployed”. Click “Apply (1)” to send your changes to the system.

The badge now changes to a blinking “Deployment sent” badge. This means that ECiDA has received your request and is deploying your pipeline.

Once your pipeline is running, you will see the “Deployed” badge under “Status”.

That’s all for deploying a pipeline. The Manage panel can be closed.

2.5. Monitoring pipeline execution

As with saving, it takes a moment before ECiDA shows your deployed pipeline on the pipelines page. Once it is available, the same “development” badge from the Manage panel appears under “Environments” for your pipeline. Let’s click the badge to open the pipeline viewer.

The pipeline viewer is the central place for inspecting and monitoring running pipelines. We see the same graphical representation of our pipeline, but now with an extended Inspect panel showing various tabs for inspection and monitoring.

For now, the Logs panel is selected by default. Select the number-printer module in the canvas to view the logs. You should be seeing the numbers flowing in!
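
As a rough idea of what a log line from number-printer looks like: `logging.basicConfig(level=logging.INFO)` without a `format` argument uses Python's default layout, `LEVEL:logger name:message`. The sketch below reproduces that layout locally by writing to an in-memory stream. The logger name `number-printer-demo` and the three numbers are made up, and the Logs panel may add timestamps or other decoration that this guide does not describe.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
# logging.BASIC_FORMAT is the layout basicConfig() falls back to.
handler.setFormatter(logging.Formatter(logging.BASIC_FORMAT))

logger = logging.getLogger("number-printer-demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger
logger.addHandler(handler)

# Stand-ins for doubled values arriving at the printer.
for number in (4, 18, 12):
    logger.info(f"{number}")

print(stream.getvalue(), end="")
```

In the quickstart modules the logger is created with `logging.getLogger(__name__)`, so when a module file is run as a script the name part of each line is `__main__` rather than the demo name used here.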

That’s it! What’s next?

You have completed the ECiDA quickstart guide. You now know the basics of creating modules and pipelines in ECiDA, and how to manage them. To go deeper, you can use the advanced tutorial to explore features such as configurable modules and pipeline-level metrics. Alternatively, the how-to guides give a complete overview of specific features, organized around more general use cases.

Need help along the way? You can access this documentation anytime through the in-app Help button. For direct support, check out our Slack channel or take the guided tour in the app.

We hope ECiDA becomes a helpful and enjoyable part of your data science journey!