Bringing machine learning (ML) models into production is often hindered by fragmented MLOps processes that are difficult to scale with the underlying data. Many enterprises stitch together a complex mix of MLOps tools to build an end-to-end ML pipeline. Setting up and managing separate environments for features and models creates operational complexity that is costly to maintain and difficult to work with.
Snowflake ML is the integrated set of capabilities that enable developers to securely build, deploy and manage ML data, features and models at scale on a single platform. At Snowflake Summit 2024, customers, including Ecolab, Classy and Purpose Financial, shared how Snowflake ML is already enabling data scientists and ML engineers to expedite time to insights.
To further enhance this experience, we launched a suite of powerful new integrated MLOps capabilities:
Collaborative Model Management: Snowflake Model Registry is generally available for scalable model management and inference.
Continuously Updated ML Features: Snowflake Feature Store is in public preview for integrated management and serving of consistent, fresh ML features for use in downstream ML pipelines across training and inference.
Governed ML Traceability: ML Lineage, in private preview, offers visibility into the usage of objects and artifacts across development and operations.
With these new features in Snowflake ML, developers can leverage centralized, governed MLOps for faster management of features and models. Teams can interact with and manage these objects through Snowflake’s unified UI, or from any notebook or IDE using intuitive Python APIs.
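As a concrete starting point, the sketch below shows how a Snowpark session might be created from a local IDE before working with any of these objects. The connection parameters, role, warehouse, database and schema names are placeholders for illustration, not part of the announcement.

```python
# Minimal sketch: connecting to Snowflake from any IDE with the
# snowflake-ml-python package installed (pip install snowflake-ml-python).
from snowflake.snowpark import Session

connection_parameters = {
    "account": "<account_identifier>",  # placeholder
    "user": "<user>",                   # placeholder
    "password": "<password>",           # placeholder
    "role": "ML_ENGINEER",              # hypothetical role
    "warehouse": "ML_WH",               # hypothetical warehouse
    "database": "ML_DB",                # hypothetical database
    "schema": "PUBLIC",
}

# Create a Snowpark session that the ML APIs below can reuse.
session = Session.builder.configs(connection_parameters).create()
```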
Collaborative ML model and custom LLM management with Snowflake Model Registry
In ML pipelines, a model object is the central artifact and point of handoff between ML development and production. Many customers are already using the Snowflake Model Registry to operationalize their ML models, including Lessmore, a mobile games company that uses the Snowflake Model Registry with Snowflake Streams to streamline its end-to-end ML workflows in Snowflake:
“At Lessmore, leveraging the Snowflake Model Registry has transformed our model development and experimentation process for our customer lifetime value forecasts. This shift has not only accelerated our innovation cycle but also reduced our costs by a factor of 10, while enhancing efficiency.” —Moritz Schöne, Head of Data Science, Lessmore
Reliability of models in production is predicated on disciplined, well-governed model management. The Snowflake Model Registry, now generally available, provides a centralized repository to manage all models and their related artifacts and metadata. The Model Registry allows customers to securely and flexibly manage and run predictions over models deployed on Snowflake warehouses, with support for models deployed into Snowpark Container Services coming soon. The Model Registry supports models trained or available on a variety of platforms, including Snowflake with Snowpark ML Modeling, ML platforms from cloud providers, other external platforms like Dataiku, and open source repositories such as HuggingFace. To provide a unified experience for all custom models, support for LLMs fine-tuned using Snowflake Cortex Fine-Tuning is coming soon. A Snowsight UI for the Model Registry is now available in public preview.
Models are first-class, schema-level Snowflake objects that provide fine-grained role-based access control (RBAC). The Model Registry lets users define their own versioning scheme, model lifecycle stages (using aliases) and custom model types. This flexibility makes the Model Registry a foundational component for anyone building solutions, applications and services on top of Snowflake.
The Model Registry can be accessed through APIs in Python and SQL for inference with CPUs or GPUs. The Snowflake Model Registry can be used from the UI in Snowsight, directly using SQL, or through the Python API in the Snowpark ML library, which comes preinstalled in Snowflake Notebooks (public preview) or can be installed in any IDE of choice.
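For illustration, the sketch below logs a scikit-learn model to the Model Registry and runs warehouse-based inference through the Python API, reusing the session from the earlier sketch. The database, schema, model and version names are hypothetical, and exact keyword arguments may differ across snowflake-ml-python releases.

```python
# Sketch: logging a model to the Snowflake Model Registry and running
# inference. Object names are illustrative placeholders.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from snowflake.ml.registry import Registry

# Train a simple model on synthetic data for the sake of the example.
X, y = make_classification(n_samples=1000, n_features=5, random_state=42)
X_train = pd.DataFrame(X, columns=[f"FEATURE_{i}" for i in range(5)])
model = RandomForestClassifier().fit(X_train, y)

# Open the registry in a chosen database and schema (hypothetical names).
reg = Registry(session=session, database_name="ML_DB", schema_name="MODELS")

# Log the model as a new version; sample_input_data lets the registry
# infer the model signature.
mv = reg.log_model(
    model,
    model_name="churn_classifier",
    version_name="v1",
    sample_input_data=X_train.head(10),
    comment="Baseline random forest",
)

# Run inference on the warehouse; a pandas or Snowpark DataFrame works here.
predictions = mv.run(X_train.head(100), function_name="predict")
print(predictions.head())
```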
Figure 1. Snowsight UI Models tab shows all available models across databases and schemas with access from the user’s role
Figure 2. Model metadata details and versions in the Snowsight UI
Single source of truth for all ML features with Snowflake Feature Store
Feature engineering at scale is often hampered by redundancy and inconsistencies between training and serving pipelines. The Snowflake Feature Store (public preview) is an integrated solution used to define, manage, store and discover ML features. It unifies feature pipelines, whether features are created in Snowflake or in external tools such as dbt, into a single, up-to-date source of truth for model training and inference. Teams can visually interact with Feature Store objects and their metadata from a new UI (private preview) in Snowsight. For a code-oriented approach, the Python APIs exposed from Snowpark ML can be used from Snowflake Notebooks or any IDE of choice to create, manage and retrieve features.
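As a rough sketch of that workflow, the example below registers an entity and a feature view backed by a Snowpark DataFrame, again reusing the session from the first sketch. The feature store name, warehouse, source table and feature query are illustrative assumptions, and the API surface may vary by snowflake-ml-python version.

```python
# Sketch: defining and registering features in the Snowflake Feature Store.
from snowflake.ml.feature_store import (
    CreationMode,
    Entity,
    FeatureStore,
    FeatureView,
)

# Open (or create) a feature store schema in a hypothetical database.
fs = FeatureStore(
    session=session,
    database="ML_DB",
    name="CUSTOMER_FEATURES",
    default_warehouse="ML_WH",
    creation_mode=CreationMode.CREATE_IF_NOT_EXIST,
)

# An entity defines the join keys used to look up features.
customer = Entity(name="CUSTOMER", join_keys=["CUSTOMER_ID"])
fs.register_entity(customer)

# A feature view wraps a query that computes features, here a daily
# aggregation over a hypothetical ORDERS table.
feature_df = session.sql("""
    SELECT customer_id,
           DATE_TRUNC('day', order_ts) AS ts,
           COUNT(*)                    AS daily_orders,
           SUM(order_total)            AS daily_spend
    FROM orders
    GROUP BY customer_id, DATE_TRUNC('day', order_ts)
""")

orders_fv = FeatureView(
    name="CUSTOMER_ORDER_FEATURES",
    entities=[customer],
    feature_df=feature_df,
    timestamp_col="TS",       # as-of timestamp for point-in-time lookups
    refresh_freq="1 hour",    # enables automated, incremental refresh
)
fs.register_feature_view(feature_view=orders_fv, version="1")
```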
The Feature Store supports automated, incremental refresh from batch and streaming data sources, so feature pipelines need to be defined only once and are continuously updated as new data arrives. It also supports backfill and point-in-time-correct lookups, using the new performant and scalable ASOF JOIN capability in Snowflake. Data scientists can create training data sets for use in Snowpark ML or retrieve features in batches for external training. All of this is secured and governed with full RBAC in Snowflake.
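Continuing the sketch above, training data can then be retrieved with point-in-time-correct semantics by passing a labeled "spine" DataFrame and its timestamp column. The spine query and column names are placeholders, and the exact retrieval methods may differ by library version.

```python
# Sketch: retrieving point-in-time-correct feature values for training,
# continuing from the feature store `fs` and session defined earlier.
# The TRAINING_LABELS table and its columns are hypothetical.
spine_df = session.sql("""
    SELECT customer_id, label_ts, churned
    FROM training_labels
""")

# Look up the previously registered feature view.
orders_fv = fs.get_feature_view(name="CUSTOMER_ORDER_FEATURES", version="1")

# Join each label row with the feature values as of that row's timestamp.
training_df = fs.retrieve_feature_values(
    spine_df=spine_df,
    features=[orders_fv],
    spine_timestamp_col="LABEL_TS",
)
training_df.show()
```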
Figure 3. Snowsight UI Features tab includes views for entities and features.
Govern and trace lineage of ML models and features with Snowflake Horizon
Because the ML development process is iterative and production pipelines can be complex, with a large set of dependencies, it is critical to track the usage of objects and artifacts through the entire ML lifecycle. Snowflake’s ML Lineage, now in private preview, helps teams trace the end-to-end lineage of features, data sets and models, from data to insight, extending Snowflake Horizon’s governance capabilities to ML artifacts. This end-to-end visibility supports reproducibility and simplified observability, helping teams develop better ML solutions, quickly debug problems with models and manage traceability of ML workflows for audit and compliance needs. To simplify access, a graphical UI for ML Lineage will be available soon in Snowsight.
Figure 4. ML Lineage traces objects and artifacts throughout the ML workflow
Learn more
Snowflake ML continues to make it simpler to operationalize scalable ML workflows on a single platform. Our MLOps announcements are complemented by advancements in Snowflake Notebooks, which can be used with Container Runtime, a new code execution environment in private preview. Snowflake Notebooks with Container Runtime provides optimizations for data loading from Snowflake, out-of-the-box distributed training support, automatic lineage capture and Model Registry integration. Check out the following resources to learn more about end-to-end ML capabilities in Snowflake: