Is Data Too Expensive? Part II
Learn how the Modern Data Stack is costing you way more than it needs to and how Artemis is changing that.
When Future announced its top 50 data startups for 2022, the first 10 companies by value were all considered unicorns (private companies with a valuation of $1 billion or more). Snowflake had one of the largest tech IPOs of its time back in 2020 and ushered in what is now called the “data gold rush”.
With this gold rush came a huge amount of innovation, which in today’s data landscape is called the Modern Data Stack. This is a collection of data tools spanning ELT, data warehousing, storage, transformation, scheduling, and business intelligence (there are more, but this is the core). The best-in-class tools for each part of the data stack fill the top 10 of the top 50 data startups. The typical data stack looks similar to the image below.
This shift away from older tools and processes is great for data teams and organizations around the world, letting them process data faster and more easily than ever before. But every innovation comes with its own wave of problems.
Leaking products, more problems.
First, the modern data stack began with companies each attacking one specific problem, or said another way, building point solutions. Fivetran handles ELT, dbt simplifies data transformation, and so on. As the industry evolves, however, we see a shift as these startups with massive valuations look to grow their user base and capture more revenue and market share.
Each point solution in this stack is fighting to keep users on their platform. This leads to products leaking into one another, making an already complicated industry even more opaque.
Because terminology is lacking and feature sets overlap, users are constantly confused about what each tool actually solves, and this is only going to continue.
Pricing power? Or pricing weakness?
Second, the modern data stack suffers from a price illusion problem. As Benn Stancil has explained, data tools push their true costs onto compute engines (Snowflake, BigQuery, Databricks, etc.). How do they do this?
Let’s look at a standard data stack. A team uses Fivetran for ELT, Snowflake for data warehousing and compute, dbt for transformations, and Tableau for BI (seen in the image above☝️).
Looked at individually, each piece seems cheap. What is not shown is the price of running these platforms on top of your compute engine.
- Fivetran syncs data on an hourly basis. Every sync needs a warehouse to spin up to ingest the data, so each hour you pay for the sync plus the warehouse cost. Syncing every hour keeps your Snowflake warehouse running 24/7 and raises your bill.
- Snowflake is “pay-as-you-go”, but as you connect more tools it turns from a pay-as-you-go platform into an always-on platform, meaning the price analysis you did when you signed up is no longer accurate.
- dbt costs $50/seat and sits on top of your Snowflake warehouse. Once again, every time you run a model, Snowflake has to spin up a warehouse, meaning that $50/month turns into hundreds of dollars in Snowflake bills as you run more and more models, hoping for better insights.
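To make the "pay-as-you-go becomes always-on" effect concrete, here is a rough sketch in Python. It assumes a simplified billing model in which each job keeps the warehouse up for its run time plus an idle window before it suspends, with jobs spread evenly through the day. The function name and every number are hypothetical, chosen only for illustration; check your own warehouse settings and billing rules before relying on anything like this.

```python
def billed_hours_per_day(runs_per_day: int, run_minutes: float, idle_minutes: float) -> float:
    """Estimate hours per day a warehouse is billed for (simplified, hypothetical model)."""
    # Each run keeps the warehouse up for its own duration plus the idle
    # window before it suspends.
    minutes_up = runs_per_day * (run_minutes + idle_minutes)
    # A single warehouse cannot be billed for more than 24 hours in a day.
    return min(minutes_up, 24 * 60) / 60


# Hourly syncs that finish quickly and suspend quickly stay "pay-as-you-go".
light = billed_hours_per_day(runs_per_day=24, run_minutes=5, idle_minutes=10)

# Longer jobs plus a longer idle window fill the whole hour: always on.
heavy = billed_hours_per_day(runs_per_day=24, run_minutes=20, idle_minutes=40)

print(f"light workload: {light} billed hours/day")
print(f"heavy workload: {heavy} billed hours/day")
```

In this toy model the same hourly schedule bills anywhere from a few hours to the full 24, depending on how long each job runs and how long the warehouse idles afterwards. Every extra tool that wakes the warehouse pushes you toward the always-on end.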
The last piece of this price issue is people bloat. We’ve seen it before: company X gets a ton of new funding, starts hiring analysts and engineers, and expands its tooling. This might sound good, but here is the problem. With more people on staff and more tools, more people are querying data, more tools are running in the background, more warehouses and compute credits are being spun up, more versions of the truth are being developed, and more datasets are being edited.
This leads to a steeply rising cloud bill, higher levels of confusion, and fewer results.
Unfortunately, in today's world, this horrible cycle continues as organizations believe the solution to this problem is to add more analysts to “simplify” the process.
This is a very simple example, but it shows how compute bills can easily creep up, doubling or tripling without organizations noticing. Simply put, the true cost of these tools is the price of the tool plus the compute needed for the tool to work. That is a very different price once you factor in every piece of your data stack, especially when you add the cost of the higher headcount.
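The "tool + compute" arithmetic can be sketched in a few lines of Python. Only the $50 transformation seat comes from the example above. The other subscription prices, the compute hours, and the hourly warehouse rate are made-up placeholders, not real vendor pricing.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    subscription: float   # monthly sticker price in USD (hypothetical)
    compute_hours: float  # warehouse hours the tool triggers per month (hypothetical)


def true_monthly_cost(tools: list[Tool], rate_per_hour: float) -> tuple[float, float, float]:
    """Return (sticker price, hidden compute cost, true total) for one month."""
    sticker = sum(t.subscription for t in tools)
    compute = sum(t.compute_hours for t in tools) * rate_per_hour
    return sticker, compute, sticker + compute


# Placeholder stack: generic roles, illustrative numbers only.
stack = [
    Tool("ELT", subscription=100.0, compute_hours=180.0),
    Tool("transformation", subscription=50.0, compute_hours=60.0),
    Tool("BI", subscription=70.0, compute_hours=40.0),
]

sticker, compute, total = true_monthly_cost(stack, rate_per_hour=4.0)
print(f"sticker price: ${sticker:.2f}")
print(f"hidden compute: ${compute:.2f}")
print(f"true cost: ${total:.2f}")
```

With these placeholder numbers the compute line is several times larger than the subscriptions combined, which is the price illusion in miniature. Swap in your own invoices to see where your stack lands.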
The New Age of Data
At Artemis, we are a part of a new age of data tools. One where we prioritize simplicity and effectiveness for our users and ensure they can achieve their data goals and operations without needing thousands of dollars or being backed by a huge VC.
This is why we ensure that our product is simple, aggressively low cost, and provides immense value to our users. We do this by:
- Integrating your data stack for you. No need for multiple tools or large vendor contracts.
- Optimizing workflows so you no longer need to run your warehouse 24/7 and are in complete control.
- Simplifying our platform so you only need 1-3 people managing your entire data operations.
- Keeping costs aggressively low (5x cheaper) so our users can solve their data problems without breaking the bank. (Even cheaper if you use our lakehouse.)
- Providing the true cost of our platform. Our platform works together to solve problems, not add costs. We do not hide our costs in your compute engine.
Get Started Today!
Head over to artemisco.ca and let us know what you think.
Tweet @artemis_data to say hi. 👋
Follow us on LinkedIn.