This project is not a traditional data engineering pipeline for moving data between systems. Instead, it is a pipeline designed specifically for deep learning workflows: it handles loading, preparing, transforming, and augmenting data immediately before it is fed to a model.
Key Features
- Deferred Execution: A pipeline is defined first and only executed when it is called, which makes it straightforward to build up complex transformation and processing steps.
- Flexible Input/Output Nodes: Users can define multiple input and output nodes to handle various data formats and workflows (a multi-input sketch follows the example below).
- Modular Design: The pipeline consists of nodes that can be combined in various ways to form complex processing graphs.
- Validator Integration: Built-in validators check at runtime that outputs meet the criteria you specify (see the hedged sketch after this list).
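Validators are not shown in the example below, so here is a minimal sketch of how one might be attached. The add_validator method and the callable-style check used here are assumptions for illustration only, not confirmed parts of the library's API; consult the project documentation for the actual validator classes and how they bind to nodes.
# Illustrative only: assumes a node exposes an add_validator() hook and that a
# failed check raises an error when the pipeline runs.
import dl_data_pipeline as dp
from dl_data_pipeline.process_functions import process_2d
input_node = dp.InputNode(name="1")
x = process_2d.open_rgb_image(input_node)
out = process_2d.padding_2d(x, (256, 256), fill_value=0.0)
# Hypothetical API: the real library may expose validator classes instead of plain callables.
out.add_validator(lambda image: image.shape[:2] == (256, 256))
pipe = dp.Pipeline(inputs=[input_node], outputs=[out])
img = pipe("path/to/image.png")  # a failed validator check would raise here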
Example Use Case
Here’s a simple example of how to use the pipeline:
# Import the necessary module
import dl_data_pipeline as dp
from dl_data_pipeline.process_functions import process_2d
# Define the inputs for the pipeline
input_node1 = dp.InputNode(name="1")
# Pass the input through functions to create the graph
x = process_2d.open_rgb_image(input_node1)
out1 = process_2d.padding_2d(x, (256, 256), fill_value=0.0)
# Create the pipeline by specifying the inputs and outputs
pipe = dp.Pipeline(inputs=[input_node1], outputs=[out1])
# Call the pipeline with the required inputs and get the outputs
img = pipe("path/to/image.png")
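A pipeline is not restricted to a single input or output. The sketch below reuses only the functions shown above; the one assumption is that a pipeline with several inputs is called with one argument per InputNode and returns its outputs in declaration order, which may differ from the actual calling convention.
# Two inputs, two outputs: open and pad both images in one pipeline.
import dl_data_pipeline as dp
from dl_data_pipeline.process_functions import process_2d
input_a = dp.InputNode(name="a")
input_b = dp.InputNode(name="b")
x_a = process_2d.open_rgb_image(input_a)
x_b = process_2d.open_rgb_image(input_b)
out_a = process_2d.padding_2d(x_a, (256, 256), fill_value=0.0)
out_b = process_2d.padding_2d(x_b, (256, 256), fill_value=0.0)
pipe = dp.Pipeline(inputs=[input_a, input_b], outputs=[out_a, out_b])
# Assumed convention: one argument per input, outputs returned in order.
img_a, img_b = pipe("path/to/a.png", "path/to/b.png")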
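Because the graph is built once and only evaluated when the pipeline is called, the same pipe object can be reused across an entire dataset, for example when feeding a training loop. The snippet below is plain Python around the pipeline defined in the example; the file paths are placeholders.
# Reuse the pipeline defined above for every file in a dataset.
image_paths = ["path/to/image_001.png", "path/to/image_002.png"]  # placeholder paths
batch = [pipe(path) for path in image_paths]  # each call runs the deferred graph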