Built-in modules in ECiDA are reusable components designed to handle common data ingestion and processing tasks. They provide ready-to-use solutions that simplify pipeline development, especially when working with external data sources or services. This guide introduces each built-in module, outlining its purpose, configuration, inputs and outputs, and links to the implementation.


MinIO Data Loader

The MinIO Data Loader is a built-in module that simplifies loading data from a MinIO bucket. Instead of uploading files directly through Git, this module allows users to provide a reference path, enabling more flexible and scalable data ingestion.

Use Cases

  • Loading large datasets or files without committing them to Git.
  • Streamlining pipeline development by referencing external data stores.
  • Improving integration with MinIO or similar S3-compatible services.

Endpoints

Inputs:

  • None: This module does not consume any inputs.

Outputs:

  • minio-url: An ecida-s3://... string that represents the file location in MinIO. This reference can be used by downstream modules to pull the actual data.

Configurations:

  • local-minio-path: The full path to the file in MinIO, starting with the bucket name (for example, my-bucket/data.csv).

The specified path must exist and be accessible from the MinIO instance configured in the ECiDA environment. Files can be uploaded to a MinIO bucket via the MinIO Console, which lets you create buckets and upload files to existing buckets. Open the console by clicking “Files” in the top bar of the app, or go directly to its URL, which can be provided to you. This URL has the same structure as the app URL: for example, when the app is at app.dev.ecida.io, files can be explored via files.dev.ecida.io.

You need credentials to access the MinIO Console. These are issued to you and should give you permission to upload files. If you have not yet received your credentials, or are unable to upload files, please contact us at [email protected].

After logging in, you are greeted by the Object Browser, which lists all available buckets. Clicking a bucket opens it and shows the files it contains. To upload files, click “Upload” in the top right of the page.

If you have permission to create buckets, you can do so from the Buckets page in the sidebar. This page also lists all buckets, and you can create a new one by clicking “Create Bucket” in the top right of the page.

Source Code

You can explore the full implementation of this module on GitLab: View the MinIO Data Loader source code

Example Configuration

A typical configuration for the MinIO Data Loader would look like:

    local-minio-path: my-bucket/data.csv
  

This configuration will generate a corresponding ecida-s3:// reference URL that downstream modules can use to access the file.
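A downstream module will typically need to split the reference back into a bucket and an object path. The sketch below assumes the reference has the form ecida-s3://<bucket>/<object path>, mirroring the local-minio-path value. This guide only shows the ecida-s3:// prefix, so both that layout and the helper name parse_ecida_reference are hypothetical and not part of the ECiDA API.

```python
SCHEME = "ecida-s3://"  # prefix documented for the minio-url output


def parse_ecida_reference(reference: str) -> tuple[str, str]:
    """Split a reference into (bucket, object_path).

    Assumption: the part after the scheme is "<bucket>/<object path>".
    """
    if not reference.startswith(SCHEME):
        raise ValueError(f"not an ecida-s3 reference: {reference!r}")
    # Everything up to the first "/" is treated as the bucket name.
    bucket, _, object_path = reference[len(SCHEME):].partition("/")
    if not bucket or not object_path:
        raise ValueError(f"expected <bucket>/<object path> in {reference!r}")
    return bucket, object_path


bucket, object_path = parse_ecida_reference("ecida-s3://my-bucket/data.csv")
print(bucket, object_path)
```

Rejecting anything that does not start with the expected prefix keeps a misconfigured upstream module from silently passing an ordinary URL or local path downstream.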


API Caller Module

The API Caller Module is a built-in component that periodically fetches data from public APIs using configurable settings. It enables dynamic data ingestion by integrating external APIs into ECiDA pipelines. The module supports both authenticated and unauthenticated API access and includes basic error handling and logging.

Use Cases

  • Fetching real-time data from public APIs (e.g., weather, finance, or public datasets).
  • Integrating dynamic external data sources into analytics pipelines.
  • Periodically refreshing datasets from various APIs without manual intervention.

Endpoints

Inputs:

  • None: The module triggers itself periodically based on the configured interval.

Outputs:

  • api-response: The fetched API response (typically in JSON format).

Configurations:

The following configuration fields define how the API call should be made:

  • base_url: The base URL of the target API.
  • endpoint: The specific endpoint to be appended to the base URL.
  • params: Optional query parameters.
  • headers: Optional headers (e.g., for authentication).
  • method: HTTP method to use (e.g., GET).
  • timeout: Request timeout duration (in seconds).
  • interval: Interval between API calls (in milliseconds).
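How these fields combine into a single request is not spelled out in this guide. A plausible reading is that endpoint is appended to base_url and params become the query string. The sketch below illustrates that reading with the Python standard library only. The helper names build_url and build_request are hypothetical, and the actual module may assemble its requests differently.

```python
import urllib.request
from urllib.parse import urlencode


def build_url(config: dict) -> str:
    """Join base_url and endpoint, then append params as a query string."""
    url = config["base_url"].rstrip("/") + "/" + config["endpoint"].lstrip("/")
    params = config.get("params") or {}
    # Render booleans as "true"/"false" (JSON style) instead of Python's "True".
    normalized = {
        key: (str(value).lower() if isinstance(value, bool) else value)
        for key, value in params.items()
    }
    return f"{url}?{urlencode(normalized)}" if normalized else url


def build_request(config: dict) -> urllib.request.Request:
    """Create a request object carrying the configured headers and method."""
    return urllib.request.Request(
        build_url(config),
        headers=config.get("headers") or {},
        method=config.get("method", "GET"),
    )


config = {
    "base_url": "https://api.open-meteo.com",
    "endpoint": "/v1/forecast",
    "params": {"latitude": 52.098, "longitude": 5.128, "current_weather": True},
    "headers": {},
    "method": "GET",
    "timeout": 10,      # seconds
    "interval": 60000,  # milliseconds
}
print(build_url(config))
```

Note that timeout and interval use different units. In this sketch, timeout would be passed unchanged to urllib.request.urlopen(request, timeout=config["timeout"]), while interval would be divided by 1000 before being used as a sleep duration.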

Source Code

You can explore the full implementation of this module on GitLab: View the API Caller source code

Example Configuration

A configuration for fetching weather data would look like:

    base_url: https://api.open-meteo.com
    endpoint: /v1/forecast
    params: {
      "latitude": 52.098,
      "longitude": 5.128,
      "current_weather": true,
      "hourly": "temperature_2m",
      "timezone": "Europe/Amsterdam"
    }
    headers: {}
    method: GET
    timeout: 10
    interval: 60000
  

This configuration fetches current weather data from the Open-Meteo API every 60 seconds.
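The periodic behaviour, together with the basic error handling and logging mentioned earlier, can be pictured as a simple loop. The following is a minimal sketch, not the module's actual implementation: the function name poll, the max_calls bound, and the injectable sleep parameter are all assumptions introduced to keep the example short and testable, whereas the real module presumably runs until it is stopped.

```python
import logging
import time
from typing import Any, Callable, List

logger = logging.getLogger("api_caller_sketch")


def poll(fetch: Callable[[], Any], interval_ms: int, max_calls: int,
         sleep: Callable[[float], None] = time.sleep) -> List[Any]:
    """Call fetch() max_calls times, pausing interval_ms between calls."""
    results = []
    for call_number in range(1, max_calls + 1):
        try:
            results.append(fetch())
        except Exception as error:
            # A failed call is logged and skipped so the loop keeps running.
            logger.warning("call %d failed: %s", call_number, error)
        if call_number < max_calls:
            sleep(interval_ms / 1000.0)  # interval is configured in milliseconds
    return results


# Usage with a stand-in fetch function; a real one would perform the HTTP call.
responses = poll(lambda: {"status": "ok"}, interval_ms=10, max_calls=2)
print(responses)
```

Catching the exception inside the loop means one failed request does not stop later calls, which matches the idea of refreshing data without manual intervention.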