
Snorkel


Snorkel is a system for programmatically building and managing training datasets.

With Snorkel, users can develop training datasets in hours or days rather than hand-labeling them over weeks or months.

Snorkel currently exposes three key programmatic operations: labeling data, for example using heuristic rules or distant supervision techniques; transforming data, for example rotating or stretching images to perform data augmentation; and slicing data into different critical subsets. Snorkel then automatically models, cleans, and integrates the resulting training data using novel, theoretically grounded techniques.
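As a rough illustration of the three operations, the sketch below uses plain Python only and is not Snorkel's API. Every name in it (`lf_contains_link`, `tf_swap_greeting`, `sf_has_link`, `build_training_set`, and the spam/ham task itself) is hypothetical, and a simple majority vote stands in for Snorkel's modeling step, which is more sophisticated than this.

```python
from collections import Counter

# Hypothetical label values for a toy spam task (an assumption for illustration).
ABSTAIN, HAM, SPAM = -1, 0, 1


# Labeling: heuristic rules that each vote for a label or abstain.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_short_message(text):
    return HAM if len(text.split()) < 5 else ABSTAIN


def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_short_message, lf_mentions_prize]


def majority_label(text):
    """Combine votes by simple majority; ties and all-abstain return ABSTAIN.

    This is a stand-in for a learned model of rule accuracies, not Snorkel's method.
    """
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


# Transforming: a label-preserving rewrite used for data augmentation.
def tf_swap_greeting(text):
    return text.replace("Hello", "Hi")


# Slicing: flag a subset of the data that deserves separate attention.
def sf_has_link(text):
    return "http" in text.lower()


def build_training_set(texts):
    """Label each text, drop abstentions, add augmented copies, and tag the slice."""
    rows = []
    for text in texts:
        label = majority_label(text)
        if label == ABSTAIN:
            continue
        candidates = [text]
        augmented = tf_swap_greeting(text)
        if augmented != text:
            candidates.append(augmented)
        for candidate in candidates:
            rows.append(
                {
                    "text": candidate,
                    "label": label,
                    "in_link_slice": sf_has_link(candidate),
                }
            )
    return rows
```

Calling `build_training_set` on a list of raw strings returns labeled rows, including augmented copies, with unlabeled inputs dropped. Keeping each rule as a small function that may abstain lets the rules be written and revised independently of how their votes are combined.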

Snorkel has been deployed in industry, medicine, science, and government to build new ML applications in a fraction of the time that hand-labeling would take. For more, see the tutorials and other resources.