Reference

NeMo Curator on Kubernetes

Demonstration of how to run NeMo Curator on a Dask cluster deployed on Kubernetes

NeMo Curator and Apache Spark

Demonstration of how to read and write datasets when using Apache Spark and NeMo Curator

Best Practices

A collection of suggestions on how to best use NeMo Curator to curate your dataset

Next Steps

Now that you’ve curated your data, let’s discuss where to go next in the NeMo Framework to put it to good use.

Tutorials

To get started, explore the NeMo Curator GitHub repository and follow the available tutorials and notebooks. These resources cover various aspects of data curation, including curating data for training from scratch and for Parameter-Efficient Fine-Tuning (PEFT).

API Docs

API documentation for all modules in NeMo Curator