Dataverse
library
Open-source ETL pipeline for LLM data processing with a block-based interface. Supports multi-source ingestion, Spark-based distributed processing, and privacy-aware filtering. Accepted to NAACL 2025 Demo.
Paper
arXiv: 2403.19340
Venue: NAACL 2025