Snowflake is a cloud-based, managed data warehouse solution. Snowflake and other data warehousing solutions are moving towards decoupled compute and storage. Your data is stored in centralized object storage (such as an S3 bucket) in a distributed way. Whenever you want to perform an operation, you request compute, perform the desired operation or transformation, and release the compute. This way, you only pay for what you use. You don’t need dedicated compute reserved at all times.

In BigQuery, compute is also abstracted away, but in Snowflake the decoupling between compute (a.k.a. the virtual warehouse) and storage is much more apparent.
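To make the "request compute, use it, release it" idea concrete, here is a minimal Python sketch. It is a toy model, not Snowflake's API: the `Storage` and `Compute` classes, the `adhoc_wh` name, and the per-second meter are all made up for illustration. The point is only that storage lives on while compute is billed just for the time it runs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Storage:
    """Hypothetical stand-in for the central object store; it outlives any compute."""
    tables: Dict[str, List[int]] = field(default_factory=dict)


@dataclass
class Compute:
    """Hypothetical stand-in for a virtual warehouse, metered only while running."""
    name: str
    running: bool = False
    billed_seconds: int = 0

    def resume(self) -> None:
        self.running = True

    def suspend(self) -> None:
        self.running = False

    def run(self, storage: Storage, table: str, seconds: int) -> int:
        """Sum a table's values; `seconds` is a made-up duration fed to the toy meter."""
        if not self.running:
            raise RuntimeError(f"{self.name} is suspended")
        self.billed_seconds += seconds
        return sum(storage.tables[table])


storage = Storage(tables={"sales": [10, 20, 30]})
wh = Compute(name="adhoc_wh")

wh.resume()                                   # request compute
total = wh.run(storage, "sales", seconds=45)  # do the work
wh.suspend()                                  # release compute, the meter stops

print(total, wh.billed_seconds, storage.tables["sales"])
```

After `suspend()`, the data in `storage` is untouched and nothing more is added to `billed_seconds` until the warehouse is resumed again.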

Some alternatives to Snowflake are Amazon Redshift, Google BigQuery, Azure Synapse, and the Databricks Data Intelligence Platform. Even though there are many competing products, Snowflake has its own USPs, the biggest being simplicity. You can read about all the supported features.

<aside> ⚠️ Udemy course: Snowflake decode: Master the fundamental concepts

</aside>

<aside> 💬 ChatGPT : https://chat.openai.com/share/ec597b42-0cef-472c-ad95-191e89747b0e

</aside>

<aside> 🗒️ Note: Data warehouses primarily support structured and semi-structured data (like JSON), and they mostly provide SQL (plus some extensions) for interaction.

</aside>

Let's study Snowflake in detail...

Architecture



Snowflake has three major components:

  1. Storage Layer: Cost-effective object storage provided by a cloud service provider (CSP) such as AWS, Azure, or GCP.
  2. Compute Layer: A.k.a. virtual warehouses, used for analytics, loading/unloading data, and other development workloads.
  3. Cloud Services Layer: Includes the web UI, security, connectivity (SnowSQL, drivers, and other clients), data sharing, metadata management, ACID transaction management, etc.

Snowflake has a multi-cluster, shared-data architecture. In simple terms, you can have multiple compute clusters sharing the same data.
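As a rough illustration of "multiple compute clusters sharing the same data", here is a small Python sketch. Everything in it is hypothetical (the `SharedStorage` and `Warehouse` classes, the `etl_wh` and `bi_wh` names); it only shows the shape of the idea: warehouses hold no data themselves, so data loaded through one is immediately visible to another without copying.

```python
class SharedStorage:
    """Toy stand-in for the storage layer: a single copy of every table."""

    def __init__(self):
        self._tables = {}

    def write(self, table, rows):
        self._tables[table] = list(rows)

    def read(self, table):
        return list(self._tables[table])


class Warehouse:
    """Toy stand-in for a compute cluster; it keeps a reference to storage, not data."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage

    def load(self, table, rows):
        self.storage.write(table, rows)

    def query(self, table, fn):
        return fn(self.storage.read(table))


storage = SharedStorage()
etl_wh = Warehouse("etl_wh", storage)  # used for loading
bi_wh = Warehouse("bi_wh", storage)    # used for analytics

etl_wh.load("orders", [120, 80, 200])
total = bi_wh.query("orders", sum)
largest = bi_wh.query("orders", max)
print(total, largest)
```

Because each workload gets its own compute, a heavy load job on `etl_wh` would not have to compete with dashboard queries on `bi_wh` in this model.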

<aside> ⚠️ Each Snowflake account is hosted in a single region. If you wish to use Snowflake across multiple regions, you must maintain a Snowflake account in each of the desired regions.

</aside>

Micro-partitioning


IMO, this is one of the best features of Snowflake. It is so fundamental to Snowflake that many of its unique features, like Time Travel, Fail-safe, and Zero-Copy Cloning, are only possible because of micro-partitioning.
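Why would micro-partitioning make cloning and Time Travel cheap? A conceptual Python sketch, assuming only that partitions are immutable and that a table is just a list of partition references. This is not Snowflake's internal implementation; the `partitions` store, `write_partition`, and the `Table` class are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical partition store: partition id -> rows. Tuples are immutable,
# so an existing partition is never modified, only superseded by a new one.
partitions: Dict[int, Tuple[int, ...]] = {}


def write_partition(rows) -> int:
    """Store rows as a new immutable partition and return its id."""
    pid = len(partitions)
    partitions[pid] = tuple(rows)
    return pid


class Table:
    """A table is only metadata: a history of lists of partition ids."""

    def __init__(self, partition_ids: List[int]):
        self.versions = [list(partition_ids)]

    @property
    def current(self) -> List[int]:
        return self.versions[-1]

    def rows(self, version: int = -1) -> List[int]:
        return [r for pid in self.versions[version] for r in partitions[pid]]

    def clone(self) -> "Table":
        # Zero-copy: only the list of ids is copied, never the rows.
        return Table(self.current)

    def replace_partition(self, index: int, new_rows) -> None:
        # An update writes a new partition and records a new metadata version.
        new_ids = list(self.current)
        new_ids[index] = write_partition(new_rows)
        self.versions.append(new_ids)


orders = Table([write_partition([1, 2]), write_partition([3, 4])])
backup = orders.clone()
orders.replace_partition(0, [1, 20])

print(orders.rows())           # current data
print(backup.rows())           # the clone still sees the old partitions
print(orders.rows(version=0))  # "time travel" to the first version
print(len(partitions))         # only one extra partition was written
```

In this model the clone costs nothing in storage until one side changes, and older versions stay readable for as long as the superseded partitions are kept around.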