The Innovator’s Dilemma in the Cloud
Being a founder in the data space has been incredible, but one thing is evident: the data industry is filled with an excessive number of tools and products.
Don’t believe me? Check out this link, which walks through the industry and the chaos of tools on offer.
You’d think all data problems would be solved with so many tools. Not so fast; an Accenture study found that 74% of employees are overwhelmed when working with data, and 44% of a data engineer’s day is spent integrating and managing infrastructure. Safe to say data still has problems.
So what are the main issues? When we speak with users about their number one pain point, one common problem surfaces regardless of technical ability, company size, or priority: Data is siloed and takes a ton of time to work with. To fully understand why this is the case today despite the wealth of data tools, let’s rewind and travel back in time.
In March 2006, Amazon launched AWS, creating a tidal wave of innovation and imitation. AWS and its business model also allowed almost every data startup that followed to reap massive profits from its users.
Let me explain.
To be more flexible than the incumbent, Oracle (funny how AWS’s first data warehouse is named Redshift!), AWS launched as one of the first data operations platforms that could be paid for by credit card. Developers and companies could pay as little as a dollar if they chose. This experience was far more user-friendly than Oracle’s, which relied on an army of sales reps, large annual contracts, and all the traditional enterprise sales tactics in its playbook.
In principle, the concept was incredible — stop overpaying for what you are not using and only pay for what you need. At first, the innovation was insanely impactful; startups like Stripe, Coinbase, and Instagram could start with just a few hundred dollars in capital and low investment in their back-end infrastructure. Many startups today wouldn’t have survived the cost of building without this business model in the cloud to keep expenses low early on.
In 2008, Jeff Bezos spoke at Y Combinator and shared a famous analogy:
Focus on what makes your beer taste better.
What Jeff is saying is: outsource your company’s infrastructure. Except when you do, you give up the one thing you could actually control: cost.
The Data Industry
The biggest issue I have uncovered in the past year doesn’t necessarily pertain to how teams work with data — it’s that data products and tools are not built for ROI; they are made to be slot machines.
For companies like Fivetran, Snowflake, dbt, Databricks, and other leaders of the Modern Data Stack, this means one simple thing: running queries and models = money printing machine.
Snowflake and Databricks make money every time a user queries their warehouse. Fivetran makes money from every row synced, and it doesn’t stop there. They charge more as you increase the number of tables and rows synced.
These products stay inefficient for users because the companies behind them would have to sacrifice their own profits and growth to create the next generation of tools that 10x the user experience again. It is the same reason Oracle never let people access its database with a credit card and self-service.
If Fivetran halved the number of tables it syncs by aggregating them more intelligently, it would crush its own bottom line. For example, you must sync 800 tables to load in Salesforce data! If Fivetran consolidated these tables, the number of rows synced could drop by over 60%.
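A back-of-the-envelope sketch of that incentive (all numbers and table names below are hypothetical, not Fivetran’s actual pricing or schema): under per-row billing, consolidating scattered tables directly cuts billable volume.

```python
# Hypothetical illustration of row-based pricing. The table counts, row
# volumes, and names are made up, not Fivetran's actual schema or rates.
def monthly_rows(tables: dict) -> int:
    """Total billable rows synced per month across all tables."""
    return sum(tables.values())

# A connector that scatters one object across many narrow tables...
scattered = {f"contact_property_{i}": 10_000 for i in range(20)}
# ...versus consolidating those properties into a single wide table.
consolidated = {"contact": 20_000}

before = monthly_rows(scattered)
after = monthly_rows(consolidated)
reduction = 1 - after / before
print(f"rows billed drop by {reduction:.0%}")  # rows billed drop by 90%
```

The vendor has no reason to ship the consolidated version: the scattered one is the revenue.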
Snowflake and Databricks’ product innovation is hamstrung because they can’t improve query efficiency or performance too much, regardless of what they say at their annual data summits. Why? Because a 25% increase in query performance would result in a huge revenue decrease. The more queries, the better.
Now that we understand their business model incentives, we find ourselves faced with the innovator’s dilemma in the cloud.
Andreessen Horowitz has written about this topic, concluding that as companies scale to the enterprise level, they should bring their cloud workloads back on-prem and in-house.
David Heinemeier Hansson published a blog post about why 37signals (creators of Basecamp and Hey.com) is moving from the cloud to its own data centers.
While the cloud has dominated and Databricks and Snowflake are on the rise, I think now is a perfect time to make a product that changes the game by playing by a different set of rules.
The industry currently has three significant issues (or big opportunities!):
- Product and User alignment is fundamentally off.
- Recession = Audits + Financial Reasoning.
- Users still have siloed data.
#1 - Product and User are Misaligned
Data is structured and managed to maximize company profits, not to serve customers.
Cloud companies offer incentives that encourage teams to spend more money. For example, AWS, Azure, and GCP offer up to $100k in free credits. The goal is to get you to scale your servers up to a high monthly cost, so that scaling back down becomes challenging.
Here are more quick examples of how the current data industry is misaligned:
- Integration tools scatter data across numerous tables (More rows = $$$)
- Force teams to query multiple times to join tables back together = $$$
- Increase storage costs = $$$
- Schedule queries for cleaner tables using dbt = $$$
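A toy sketch of the second bullet (the table and field names are invented, not any vendor’s real schema): when one object is scattered across several synced tables, every downstream use pays the cost of joining them back together first.

```python
# Invented example: one "contact" scattered across three synced tables.
# Every downstream use pays the cost of reassembling it.
contacts = {1: {"email": "ada@example.com"}}
contact_properties = {1: {"lifecycle_stage": "customer"}}
contact_companies = {1: {"company": "Acme"}}

def assemble_contact(contact_id: int) -> dict:
    """Join the scattered tables back into one usable record."""
    record = {}
    for table in (contacts, contact_properties, contact_companies):
        record.update(table.get(contact_id, {}))
    return record

print(assemble_contact(1))
# {'email': 'ada@example.com', 'lifecycle_stage': 'customer', 'company': 'Acme'}
```

In a warehouse, each of those reassembly joins is a query, and each query is billed.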
Packy McCormick writes about how Google is built to drive search, not efficiency, since search is how it makes money. Cloud platforms are likewise designed for teams to run queries, not to be efficient. Think of how many tabs you have open in your browser: to get anywhere, you open a new tab and search. It is an inefficient workflow, but a multi-billion-dollar one. This is where the opportunity starts.
#2 - Recession = Audits + Financial Reasoning
Over the past ten years, companies have focused on growing as quickly as possible. As a result, many companies now use anywhere from 60 to 300 SaaS tools, and industries like Fintech, Data, and AI have fragmented into single-purpose products. Artemis was created to solve this problem. We started as a data platform that bundled best-of-breed tools, but even with the best tools, our users kept returning to the main issue: siloed data.
Recently, due to the financial shockwave, many teams have been forced to cut back on spending. This means that many single-purpose tools will no longer be used. Teams will only pay for tools that solve their pain points and provide tangible value. Unfortunately, despite so many data tools available, teams still struggle to get basic answers to their questions.
#3 - Increase in tools = Increase in Nightmare
While tools are being audited and reined in, startups and enterprises are still entrenched in various tools and will continue using them. This forces teams to centralize data for easier analysis and better insights. Still, it means that data overlaps and needs to be cleaned and ironed out to be truly valuable.
While the increase in tools contributes to the siloed-data problem, it is not where the problem lies. Today, centralizing data in a warehouse is 100x easier than it was a decade ago, thanks to a flurry of integration startups such as Fivetran, Airbyte, Estuary, Portable, and others.
Siloed data is no longer trapped in SaaS tools; it lives in organizations’ databases.
Making it easier to centralize data in a database only shifted the problem of data silos further downstream. The chaos now starts in your database, with teams asking: what do we do with the thousands of tables in our database?
How HubSpot labels and builds tables is fundamentally different from how Stripe does. Every table has a different data model depending on where it came from: a contact in a HubSpot table might be called property_contact, while Stripe labels the same concept under an entirely different name and schema.
The difference in underlying models means teams must wrangle the data every time they want to work across platforms, and the problem only grows as teams centralize more data. This single issue costs teams hundreds of hours and a great deal of money.
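To make the wrangling concrete, here is a minimal sketch; the field names below are invented for illustration and are not the actual HubSpot or Stripe schemas. Each new source means hand-writing, and then maintaining, another mapping into a shared model:

```python
# Rows as they might land from two different connectors. Field names are
# invented for illustration, not real HubSpot or Stripe schemas.
hubspot_row = {"property_contact": "Ada Lovelace", "property_email": "ada@example.com"}
stripe_row = {"name": "Grace Hopper", "email": "grace@example.com"}

def normalize(row: dict, mapping: dict) -> dict:
    """Rename vendor-specific fields into one canonical model."""
    return {canonical: row[source] for canonical, source in mapping.items()}

# One hand-written mapping per source, each of which must be updated
# whenever the upstream schema changes.
MAPPINGS = {
    "hubspot": {"name": "property_contact", "email": "property_email"},
    "stripe": {"name": "name", "email": "email"},
}

contacts = [
    normalize(hubspot_row, MAPPINGS["hubspot"]),
    normalize(stripe_row, MAPPINGS["stripe"]),
]
```

Multiply that mapping layer by every platform a team centralizes, and the hundreds of hours add up quickly.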
The New Data Paradigm
What does this mean for teams?
Nearly every business depends on data, but the current options need reinvention. Vertical SaaS tools offer powerful data models but are rigid, limiting teams to specific workflows. Customizing them can take hundreds of hours and tens of thousands of dollars in consultants. While they can be incredibly powerful, the user experience is often lacking (e.g. Salesforce).
On the other end of the spectrum, horizontal data tools (e.g. Databricks, Snowflake, dbt) offer more flexibility and a cleaner user experience, but little built-in structure: they push the data modelling onto the end user and take more of a one-size-fits-all approach.
What is Artemis doing?
That's why we at Artemis are working hard to tackle this head-on with a powerful AI-generated data model that works with any data across your database. In other words, we are creating the world's first self-organizing data workspace.
Artemis helps operations teams say goodbye to struggling with single-table analysis and constant table joins. Our AI seamlessly combines data from hundreds of related tables across your database, eliminating data silos and dramatically decreasing time to insight.
Our goal is not to build a better way to build more beautiful charts. Our goal is to empower users to work with data in ways they never thought were possible. We're solving the deeper problems, not just pushing the challenging data tasks onto the user.
The about section on Attio’s website says it best: there’s a revolution happening in business software right now. Notion changed how we organize, Figma changed how we design, and Slack changed how we communicate.
If we were to add anything, it would be that Artemis is changing how we analyze data.
Connect with us on LinkedIn
Follow us on Twitter @artemis_data
Learn more at www.artemisdata.io