Dataverse
library
Open-source ETL pipeline for LLM data processing with a block-based interface. Supports multi-source ingestion, Spark-based distributed processing, and privacy-aware filtering. Accepted to NAACL 2025 Demo.
Paper
Library
Stars 564