Xcelyst Partners

In today’s enterprise technology landscape, the pressure to rationalize spend has never been higher. Whether it’s in the wake of economic uncertainty, shifting business priorities, or simply the demand for greater ROI from IT investments, companies are taking a hard look at what they pay for and what value they actually derive. Yet, paradoxically, even as these businesses attempt to streamline and optimize, many find themselves more entangled than ever in complex relationships with data platform vendors. This paradox is driven largely by how data intelligence platforms are designed: not merely to serve, but to retain. Lock-in, stickiness, and expansive monetization strategies are no longer accidental; they are engineered.

From data warehouses to data lakes and analytics tools, the modern vendor toolkit is loaded with ways to build customer dependency. Platforms such as Snowflake, Databricks, Redshift, and BigQuery begin by addressing critical data storage and processing needs, but quickly expand into governance, AI/ML tooling, security, and real-time data streaming. These services may seem complementary at first, but over time, they create an ecosystem so convenient and tightly coupled that switching to another provider becomes painful, if not impossible.

Subscription-based models in data platforms are often tiered and usage-based. While they promise elasticity and pay-as-you-go benefits, they also introduce billing complexity. Enterprises often find it difficult to forecast costs, especially when usage spikes due to analytics workloads, model training, or real-time data ingestion. Vendors encourage enterprises to upgrade to higher tiers by offering features that seem incremental yet quickly become critical—higher compute concurrency, faster query performance, or broader AI model integration. Once these features become embedded into business workflows, moving away from the platform becomes operationally risky.

APIs, once symbols of openness, are also strategically used to lock customers in. Many modern data platforms offer robust APIs for ingestion and processing but restrict high-performance export or transformation capabilities. For example, a platform like Snowflake might allow for seamless data onboarding, but exporting large volumes of historical data or moving transformation logic to another environment can become technically and financially taxing. Over time, enterprises find themselves architecturally bound to these platforms, even if alternatives become more attractive.

Cloud-native data platforms amplify this challenge. Solutions like Amazon Redshift or Google BigQuery come deeply integrated with their respective cloud ecosystems. While this tight integration improves performance and manageability, it also makes cross-cloud or cloud-neutral strategies harder to execute. Data gravity becomes a real issue—once petabytes of data are housed in a particular cloud’s managed platform, the costs and risks associated with moving it elsewhere are enormous. Moreover, associated services such as governance policies, role-based access control, and ML workloads often rely on proprietary features that don’t translate cleanly across clouds.

Vendors also build lock-in through vertical integration. Confluent, for example, has built a fully managed streaming data platform around Apache Kafka, bundling security, monitoring, governance, and schema registry into its cloud product. While this reduces operational overhead for data teams, it also means that migrating to open-source Kafka or another stream processor would require re-engineering not just code but operational practices.

AI and ML add another layer. Databricks, with its Unified Analytics Platform, integrates deeply across the ML lifecycle—from data prep to model deployment. While this accelerates experimentation and collaboration, it often means that model metadata, training pipelines, and performance metrics reside within the same vendor’s control plane. Extracting these components to run independently or in another environment may involve non-trivial refactoring and performance tuning.
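
Because MLflow itself is open source, one mitigation is to keep experiment tracking outside any single vendor's control plane. The sketch below points the MLflow client at a self-hosted tracking server; the server URI, experiment name, and logged values are placeholders, not a prescription.

```python
import mlflow

# Point the tracking client at a self-hosted MLflow server instead of a
# vendor-managed control plane (the URI below is a placeholder).
mlflow.set_tracking_uri("http://mlflow.internal.example.com:5000")
mlflow.set_experiment("churn-baseline")

with mlflow.start_run():
    # Parameters and metrics here are illustrative values, not real results.
    mlflow.log_param("max_depth", 6)
    mlflow.log_metric("auc", 0.87)
    # Run metadata and artifacts land in storage configured behind the server
    # (for example S3 or MinIO), so they stay portable across environments.
```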

Meanwhile, vendors use up-sell and cross-sell strategies to deepen entrenchment. A business might start with data storage but soon find itself using the same platform for transformation (dbt integrations), machine learning (MLflow), and even BI. Over time, the organization’s entire data lifecycle—from raw ingestion to actionable insight—becomes dependent on one vendor’s tools and semantics. The illusion of a seamless data experience hides the underlying fact that alternatives have become prohibitively expensive to explore.

Some platforms, however, are more open than others. Databricks, while offering tight integration, supports open formats like Delta Lake and open-source engines like Apache Spark, making it somewhat easier to adopt a hybrid or multi-platform strategy. Snowflake, on the other hand, has a more closed architecture where most compute and transformation must happen within its ecosystem. Redshift sits somewhere in between, with SQL-based compatibility and some extensibility but close integration with AWS. Confluent, although based on Apache Kafka, layers in many proprietary enhancements that can make migrations tricky. In contrast, platforms like Presto, Trino, and open-source lakehouse solutions like Apache Iceberg or Hudi offer much more modularity and interoperability, giving enterprises the flexibility to switch components as needed without wholesale re-platforming.

To counter this, enterprises must be deliberate about architecture and strategy. Rather than choosing platforms solely based on initial convenience, they should evaluate long-term control, data sovereignty, and portability. This starts with adopting open standards. Data formats such as Parquet and Avro, query engines like Presto or Trino, and orchestration frameworks like Apache Airflow provide vendor-neutral building blocks that reduce reliance on proprietary ecosystems.
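
As a minimal sketch of what those vendor-neutral building blocks look like in practice, the snippet below writes and reads a Parquet file with the open-source pyarrow library; the file name and columns are illustrative.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Build a small table in memory and persist it as Parquet, an open columnar format.
orders = pa.table({
    "order_id": [1001, 1002, 1003],
    "amount": [49.99, 15.00, 120.50],
})
pq.write_table(orders, "orders.parquet")

# Any Parquet-aware engine (Spark, Trino, DuckDB, Redshift Spectrum, BigQuery)
# can read the same file without going through a proprietary export path.
restored = pq.read_table("orders.parquet")
print(restored.num_rows)  # 3
```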

Data infrastructure should be designed to support modularity. For instance, using an open data lakehouse pattern allows organizations to separate storage from compute. Tools like Delta Lake and Apache Iceberg enable transactional semantics on object storage, allowing companies to run Spark, Presto, or even Redshift Spectrum on the same underlying data. This flexibility ensures that switching compute engines or adding new analytical capabilities doesn’t mean rehydrating or reformatting the entire data corpus.
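
A simplified sketch of that pattern with open-source Spark and Delta Lake follows; the package coordinates, bucket path, and schema are assumptions rather than a reference implementation.

```python
from pyspark.sql import SparkSession

# Configure Spark with the open-source Delta Lake extensions
# (package coordinates depend on your Spark and Delta versions).
spark = (
    SparkSession.builder
    .appName("lakehouse-sketch")
    .config("spark.jars.packages", "io.delta:delta-spark_2.12:3.1.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Write a table as Delta files on object storage (the path is illustrative).
events = spark.createDataFrame([(1, "signup"), (2, "purchase")], ["user_id", "event"])
events.write.format("delta").mode("overwrite").save("s3a://analytics-bucket/events")

# Because the data sits in an open table format on storage you own, any
# Delta-capable engine (Spark, Trino, Presto, etc.) can query the same files.
spark.read.format("delta").load("s3a://analytics-bucket/events").show()
```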

Contractually, enterprises should demand clarity on data ownership and portability. Every engagement with a data platform should include provisions for full data export in usable formats, documentation for pipeline migration, and guarantees around access to metadata, logs, and configurations. Enterprises should also routinely test their ability to extract and redeploy data and workloads elsewhere, just as they might perform DR drills.
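
One way to make such a drill concrete is a small, repeatable export-and-verify job. The sketch below uses SQLAlchemy and pandas against a hypothetical customer_orders table, with the connection string as a placeholder for whichever warehouse dialect is in use.

```python
import os

import pandas as pd
import sqlalchemy as sa

# Placeholder connection string; substitute the SQLAlchemy dialect for your warehouse.
engine = sa.create_engine("snowflake://<user>:<password>@<account>/<db>/<schema>")
os.makedirs("export", exist_ok=True)

# Pull a representative table out of the platform in manageable chunks
# and land it as Parquet in storage the enterprise controls.
rows_exported = 0
chunks = pd.read_sql("SELECT * FROM customer_orders", engine, chunksize=100_000)
for i, chunk in enumerate(chunks):
    chunk.to_parquet(f"export/customer_orders_{i:05d}.parquet", index=False)
    rows_exported += len(chunk)

# Basic verification: the drill fails loudly if counts diverge.
with engine.connect() as conn:
    source_rows = conn.execute(sa.text("SELECT COUNT(*) FROM customer_orders")).scalar()
assert rows_exported == source_rows, "portability drill failed: row counts diverge"
```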

Building internal capability is equally essential. Enterprises should avoid letting vendors dictate tooling or architecture. Instead, they should cultivate in-house skills in open-source alternatives like Apache Kafka, Spark, and DuckDB. By investing in platform-agnostic skills and tooling, organizations retain flexibility and reduce dependence on any single vendor’s roadmap or pricing decisions.
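
As a small example of that kind of platform-agnostic tooling, open-source DuckDB can query the Parquet files produced by the export drill above directly, with no warehouse in the loop; the column names here are hypothetical.

```python
import duckdb

# DuckDB reads Parquet in place, so exported data stays queryable
# without re-loading it into any vendor's warehouse.
monthly_revenue = duckdb.sql("""
    SELECT date_trunc('month', order_date) AS month,
           SUM(amount)                     AS revenue
    FROM 'export/customer_orders_*.parquet'
    GROUP BY 1
    ORDER BY 1
""").df()

print(monthly_revenue.head())
```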

Lastly, the narrative of innovation must shift. Many vendors pitch themselves as accelerators of digital transformation, but true innovation comes from how enterprises leverage data to create unique value. This requires resisting the temptation to adopt features simply because they’re new or bundled. Enterprises should evaluate each addition to their stack against strategic goals—whether it improves customer experience, operational efficiency, or competitive differentiation.

In summary, the data intelligence platform market is rife with opportunities—and risks. Vendors are evolving rapidly to offer end-to-end capabilities, but often at the cost of lock-in and lost flexibility. Enterprises that design for portability, invest in internal skills, and insist on transparency will be best positioned to evolve on their terms. In a world where data is a strategic asset, retaining control over its flow, processing, and application is no longer optional—it’s a core business imperative.