Data-driven organizations choose Snowflake because it offers data storage, processing, and analytics that are some of the fastest, most flexible, and easiest to use on the market. It lets users collaborate locally and globally and store and access structured, semi-structured, and unstructured data in one place, and its cloud-focused architecture provides nearly unlimited resources.
If you are reading this, you probably know the importance of ensuring data quality on Snowflake.
Today, we are announcing the release of a new capability in our data quality stack: native pushdown on Snowflake. This will allow our customers to set realistic expectations and make smart decisions with their data, informing them if it's in good shape for its intended purpose. Let's learn more about this exciting new feature.
What is pushdown for Snowflake, and how does it work?
As a Snowflake partner, Ataccama has invested in native integration for the most efficient and cost-effective data processing on Snowflake. This means your data is processed on Snowflake and won't be moved anywhere.
How does data quality assurance work with other tools, and why is pushdown such a game changer?
Without it, you would need to do everything in a separate data processing engine. Besides investing in a data quality solution with a rule library and deployment components, you'll need to:
- Set up dedicated infrastructure like a dedicated server, Spark cluster, or other data processing technology.
- Connect that cluster to your Snowflake instance and allow data transfers between the two.
- Copy data off Snowflake into the data processing cluster every time you need to perform a regular monitoring check (to measure data quality metrics or do a data transformation job), calculate the results or transform the data, and then send the transformed data or the results of the data quality checks back to Snowflake.
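The round trip described in the steps above can be sketched in a few lines. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for Snowflake, and the `customers` table, the `dq_results` table, and the `is_valid_email` rule are hypothetical names invented for the example, not part of any Ataccama or Snowflake API.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a Snowflake table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
warehouse.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "ann@example.com"), (2, None), (3, "bob@example"), (4, "eve@example.com")],
)
warehouse.execute("CREATE TABLE dq_results (rule TEXT, failed INTEGER, total INTEGER)")


def is_valid_email(value):
    """Hypothetical rule: non-empty value with '@' followed by a dotted domain."""
    if not value or "@" not in value:
        return False
    return "." in value.split("@", 1)[1]


def run_check_externally(conn):
    # Step 1: copy every row out of the warehouse into the external engine.
    rows = conn.execute("SELECT id, email FROM customers").fetchall()
    # Step 2: evaluate the rule inside the external engine.
    failed = sum(1 for _id, email in rows if not is_valid_email(email))
    # Step 3: send the results of the check back to the warehouse.
    conn.execute(
        "INSERT INTO dq_results VALUES (?, ?, ?)",
        ("valid_email", failed, len(rows)),
    )
    return {"rows_transferred": len(rows), "failed": failed}


summary = run_check_externally(warehouse)
print(summary)
```

Note that `rows_transferred` grows with the size of the table: every record has to leave the warehouse before a single rule can be evaluated.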
Dedicated processing engines are typically well-tuned for specific data processing jobs, offering unmatched performance in a given area. However, users should be mindful of the disadvantages and risks that transferring data between the source and the engine can bring, including the time needed for the transfer (which affects total performance) and data security.
With pushdown, all data quality checks happen natively on Snowflake, allowing you to assess data quality without having to transfer data between Snowflake and a data processing engine. Ataccama data processing jobs are translated into Snowflake jobs, utilizing basic SQL, special user-defined functions (UDFs), and the Snowpark library for more complex logic.
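To make the idea of translating rules into database jobs more concrete, here is a minimal sketch of the general technique. It is not Ataccama's actual translator: the `RULES` dictionary, the `compile_rules` helper, and the `customers` table are hypothetical, and an in-memory SQLite database again stands in for Snowflake. Each rule is a SQL predicate, and all rules are compiled into a single aggregate query that the database evaluates in place.

```python
import sqlite3

# Hypothetical rule definitions. Real Ataccama rules are authored in its UI,
# and their internal format is not shown in this article.
RULES = {
    "email_not_null": "email IS NOT NULL",
    "age_in_range": "age BETWEEN 0 AND 120",
}


def compile_rules(table, rules):
    """Translate rule predicates into one aggregate SQL query.

    The table name and predicates are trusted inputs in this sketch.
    """
    columns = ["COUNT(*) AS total"]
    for name, predicate in rules.items():
        # Count rows for which the predicate is false or NULL.
        columns.append(
            f"SUM(CASE WHEN {predicate} THEN 0 ELSE 1 END) AS {name}_failed"
        )
    return f"SELECT {', '.join(columns)} FROM {table}"


# Assumption: SQLite stands in for Snowflake so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE customers (email TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        ("ann@example.com", 34),
        (None, 51),
        ("bob@example.com", 140),
        ("eve@example.com", None),
    ],
)

query = compile_rules("customers", RULES)
result = dict(conn.execute(query).fetchone())
print(query)
print(result)
```

For the four sample rows, the query reports one failure for `email_not_null` and two for `age_in_range`. Only that single summary row comes back to the client, however large the table is, which is the essence of pushing the check down to where the data lives.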
The best part – all the data quality rules are written in the Ataccama tool and translated to Snowflake automatically, allowing you to use Ataccama's user-friendly interface to set up data quality protocols. Otherwise, you would have to code them into Snowflake manually. Once you set up your DQ rules in Ataccama ONE, your data will never have to leave Snowflake to assess its quality.
Features and benefits of pushdown for Snowflake
Pushdown is the best choice for managing DQ on Snowflake because Ataccama is already highly advanced at managing data quality, and you can now run it natively (using pushdown mode). We offer advantages in speed, security, infrastructure, and ease of use.
Speed
By utilizing the distributed processing power of Snowflake, pushdown offers processing times that are significantly faster than processing with a single engine: you can evaluate 150 million records in 15 seconds, or about 10 million records per second.
Security
With pushdown, you do not need to transfer large datasets outside of Snowflake, which can be risky in terms of data security. With an external processing engine, you are moving data somewhere else, so to ensure security and compliance you need to, for example:
- Verify compliance with your internal policies, such as PII data handling and storage, GDPR-related requirements, and geo-restrictions (data not leaving a particular region).
- Set appropriate permissions for the engine (consider both system accounts and user accounts).
- Ensure the desired level of encryption during the transfer.
- Ensure the means of data transfer (e.g., ports) are secured.
All of this can be secured, of course, but it adds a lot more work compared to keeping everything inside Snowflake, where security is already guaranteed. Learn more about Snowflake's security here.
Infrastructure
By employing Snowflake's existing scalable infrastructure, pushdown allows users to evaluate their data and gain insights without needing to set up and manage additional large data processing servers.
Ease of Use
The necessary data quality configurations are created in Ataccama's user-friendly environment. Knowledge of Snowflake and how to work with it is optional. All of Ataccama's data quality-related features (such as DQ rules) can be used for Snowflake pushdown without needing to create anything new or put in any additional effort. You can also reuse the rules you created in Ataccama ONE for other data sources.
How to get started with pushdown for Snowflake
Once your Snowflake data source is connected to the Ataccama platform, setting up pushdown is as simple as one click. Go into your data source's configuration and select the "pushdown" option, and it will immediately allow you to run data quality checks natively on Snowflake.
Making high-quality data decisions on Snowflake has never been easier. Using pushdown, you can understand your stored data faster and more securely in a scalable and easy-to-use way. You'll minimize risks and costs while utilizing the flexibility of your Snowflake infrastructure with our in-place processing. The best part: your data never leaves Snowflake! Learn more about our partnership with Snowflake here.