This project is not a traditional data engineering pipeline for moving data between systems. Instead, it is a pipeline designed specifically for deep learning workflows: it handles loading, preparing, transforming, and augmenting data immediately before it is fed to a model.
Key Features
- Deferred Execution: A pipeline is defined first and only executed when it is called, which makes it straightforward to build up complex transformation and processing steps.
- Flexible Input/Output Nodes: Users can define multiple input and output nodes to handle various data formats and workflows (a multi-input sketch follows the example below).
- Modular Design: The pipeline consists of nodes that can be combined in various ways to form complex processing graphs.
- Validator Integration: Built-in validators check at runtime that outputs meet the criteria you specify (see the hedged sketch after this list).
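Validators are not shown in the example below, so here is a minimal sketch of how one might be attached. The add_validator method and the callable-style check used here are assumptions for illustration only, not confirmed parts of the library's API; consult the project documentation for the actual validator classes and how they bind to nodes.
# Illustrative only: assumes a node exposes an add_validator() hook and that a
# failed check raises an error when the pipeline runs.
import dl_data_pipeline as dp
from dl_data_pipeline.process_functions import process_2d
input_node = dp.InputNode(name="1")
x = process_2d.open_rgb_image(input_node)
out = process_2d.padding_2d(x, (256, 256), fill_value=0.0)
# Hypothetical API: the real library may expose validator classes instead of plain callables.
out.add_validator(lambda image: image.shape[:2] == (256, 256))
pipe = dp.Pipeline(inputs=[input_node], outputs=[out])
img = pipe("path/to/image.png")  # a failed validator check would raise here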
Example Use Case
Here’s a simple example of how to use the pipeline:
# Import the necessary module
import dl_data_pipeline as dp
from dl_data_pipeline.process_functions import process_2d
# Define the inputs for the pipeline
input_node1 = dp.InputNode(name="1")
# Pass the input through functions to create the graph
x = process_2d.open_rgb_image(input_node1)
out1 = process_2d.padding_2d(x, (256, 256), fill_value=0.0)
# Create the pipeline by specifying the inputs and outputs
pipe = dp.Pipeline(inputs=[input_node1], outputs=[out1])
# Call the pipeline with the required inputs and get the outputs
img = pipe("path/to/image.png")
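A pipeline is not restricted to a single input or output. The sketch below reuses only the functions shown above; the one assumption is that a pipeline with several inputs is called with one argument per InputNode and returns its outputs in declaration order, which may differ from the actual calling convention.
# Two inputs, two outputs: open and pad both images in one pipeline.
import dl_data_pipeline as dp
from dl_data_pipeline.process_functions import process_2d
input_a = dp.InputNode(name="a")
input_b = dp.InputNode(name="b")
x_a = process_2d.open_rgb_image(input_a)
x_b = process_2d.open_rgb_image(input_b)
out_a = process_2d.padding_2d(x_a, (256, 256), fill_value=0.0)
out_b = process_2d.padding_2d(x_b, (256, 256), fill_value=0.0)
pipe = dp.Pipeline(inputs=[input_a, input_b], outputs=[out_a, out_b])
# Assumed convention: one argument per input, outputs returned in order.
img_a, img_b = pipe("path/to/a.png", "path/to/b.png")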
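Because the graph is built once and only evaluated when the pipeline is called, the same pipe object can be reused across an entire dataset, for example when feeding a training loop. The snippet below is plain Python around the pipeline defined in the example; the file paths are placeholders.
# Reuse the pipeline defined above for every file in a dataset.
image_paths = ["path/to/image_001.png", "path/to/image_002.png"]  # placeholder paths
batch = [pipe(path) for path in image_paths]  # each call runs the deferred graph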