Open-source ETL pipeline for LLM data processing with a block-based interface. Supports multi-source ingestion, Spark-based distributed processing, and privacy-aware filtering. Accepted to the NAACL 2025 demo track.

Paper

Venue: NAACL 2025

Library

data, open-source, research