{"id":29069,"date":"2020-03-31T12:54:33","date_gmt":"2020-03-31T16:54:33","guid":{"rendered":"https:\/\/centricconsulting.com\/?p=29069"},"modified":"2023-08-31T10:32:25","modified_gmt":"2023-08-31T14:32:25","slug":"what-should-i-know-about-snowflake","status":"publish","type":"post","link":"https:\/\/centricconsulting.com\/blog\/what-should-i-know-about-snowflake\/","title":{"rendered":"What Should I Know about Snowflake?"},"content":{"rendered":"
Cloud data platform Snowflake is growing quickly and getting a lot of press. So what is it, exactly, and is there more to it than a quirky name? How does it compare with data products from Microsoft, Amazon and Google? Should you use it, and for what?<\/p>\n Here are a few things you should know.<\/p>\n What Is It?<\/h2>\n At heart, Snowflake is a database platform \u2013 think SQL Server or Oracle \u2013 purpose-built from scratch for the cloud. Its developers kept familiar concepts (tables, views, SQL queries) but threw out the traditional assumptions about how databases work and embraced everything cloud computing offers.<\/p>\n Being cloud-native opens all kinds of interesting doors:<\/strong><\/p>\n<ul>\n<li>Storage that scales independently of compute<\/li>\n<li>Massively parallel query processing with standard SQL<\/li>\n<li>Immutable files that enable features such as \u201ctime travel\u201d<\/li>\n<li>A pay-for-what-you-use pricing model<\/li>\n<\/ul>\n Let\u2019s look behind each of these doors.<\/p>\n Storage<\/h2>\n Snowflake breaks data into small, immutable files stored as multiple copies in cheap, plentiful object storage (AWS S3, Azure Blob Storage, Google Cloud Storage buckets) designed to reliably hold and distribute enormous amounts of data (think Netflix). The largest single database<\/em> run by a Snowflake customer is four petabytes and growing.<\/p>\n Snowflake keeps storage completely separate from the compute power used to query it \u2013 unlike a server or a virtual machine, you don\u2019t need to pay for more storage space to get a more powerful engine, or vice versa. Snowflake calls its compute engines \u201cwarehouses\u201d to reinforce the idea that each department (or each user) can have \u201ctheir own data warehouse.\u201d<\/strong><\/p>\n When Snowflake refers to a warehouse, however, think only about the workers and forklifts \u2013 not the stuff. Any warehouse (compute engine) can process any data, and multiple<\/em> warehouses can access the same data at the same time without interfering with each other at all.<\/p>\n Massively Parallel Processing<\/h2>\n Under the covers, Snowflake uses a massively parallel processing (MPP) architecture, similar to Hadoop. Picture the fiction section of a library, where all the books are arranged by author. Now imagine you want to find all the books with \u201cdog\u201d in the title. 
There might be a few dozen, but they\u2019d be scattered all over the shelves, and you\u2019d have to look at each title in turn to find them all. With MPP, when you say, \u201cSELECT * FROM fiction WHERE title LIKE \u2018%dog%\u2019,\u201d a whole bunch of librarians fan out, each checking one shelf and bringing the results back to you. Unlike Hadoop, however, you don\u2019t need to learn a new way of asking \u2013 Snowflake is fully ANSI-SQL compliant, so you can write the standard SQL queries you\u2019re used to.<\/p>\n Performance and Trade-Offs<\/h2>\n Query performance is, in a word, impressive. For example, one of our clients had a query that took three-plus hours in Oracle (when it ran at all); in Snowflake, it runs in about three minutes. There\u2019s a catch, though: Snowflake optimized its platform for storing and querying large amounts of data, not for performing one-off transactions. Because it distributes the data in immutable files, updating individual records is relatively slow<\/strong>. You would never use it as the database for an interactive application. Instead, Snowflake is your database for analysis, dashboards, data science, and more.<\/p>\n Other Benefits<\/h2>\n While they may take a little longer to update, immutable files give you a couple of other interesting benefits. Because the original file never changes during an update:<\/strong><\/p>\n<ul>\n<li>Old versions of the data can be retained and queried later<\/li>\n<li>A table you delete by accident can be recovered<\/li>\n<\/ul>\n If you enable \u201ctime travel,\u201d Snowflake will hang onto old versions of the files for a while, so you can query the data precisely as it was an hour, a day, or a week ago. This feature also means that you can \u201cun-drop\u201d a table after accidentally deleting it.<\/p>\n Pricing Model<\/h2>\n With compute separate from storage, you need to make some careful comparisons based on how you\u2019ll use it. The storage is cheap: Snowflake passes through the cost of the underlying provider (currently $23 per TB per month for Amazon S3). The compute appears relatively expensive on a per-hour basis, but it doesn\u2019t run 24 hours a day like a traditional database server.<\/strong> Think of it more like an automatic faucet. 
It starts when someone wants to use it and shuts down when they\u2019re finished.<\/p>\n Cloud Considerations<\/h2>\n In the cloud data warehousing space, Snowflake most directly compares to AWS Redshift, Azure Synapse Analytics and Google BigQuery. If you already have a successful Redshift, Synapse or BigQuery environment, you may be happy to stick with it. However, in our experience, Snowflake is easier to set up and maintain than any of them, as there\u2019s effectively no maintenance other than managing usage and security. Snowflake\u2019s built-in logic takes care of indexing and query optimization, so there isn\u2019t any traditional performance-tuning work.<\/strong><\/p>\n
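The idea that multiple warehouses can work on the same data at once can be sketched in Snowflake SQL. This is an illustrative sketch \u2013 the warehouse names (etl_wh, bi_wh) and table name (sales) are hypothetical:<\/p>\n<pre>-- Two independent compute engines, sized separately\nCREATE WAREHOUSE etl_wh WAREHOUSE_SIZE = 'LARGE';\nCREATE WAREHOUSE bi_wh  WAREHOUSE_SIZE = 'SMALL';\n\n-- Both can read the same table at the same time\nUSE WAREHOUSE bi_wh;\nSELECT COUNT(*) FROM sales;  -- unaffected by work running on etl_wh<\/pre>\n Because the warehouses share storage but not compute, a heavy ETL job on one never slows down a dashboard query on the other.<\/p>\n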
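With time travel enabled, querying the past and recovering a dropped table both take one statement. A sketch, assuming a hypothetical orders table within the retention window:<\/p>\n<pre>-- Query the table as it was one hour (3,600 seconds) ago\nSELECT * FROM orders AT(OFFSET => -3600);\n\n-- Recover a table dropped by mistake\nUNDROP TABLE orders;<\/pre>\n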
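The automatic-faucet behavior is configured per warehouse. A minimal sketch \u2013 the name reporting_wh and the 60-second idle window are illustrative, not recommendations:<\/p>\n<pre>CREATE WAREHOUSE reporting_wh\n  WAREHOUSE_SIZE = 'XSMALL'\n  AUTO_SUSPEND = 60    -- pause after 60 idle seconds, which stops compute billing\n  AUTO_RESUME = TRUE;  -- wake automatically when the next query arrives<\/pre>\n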