
BUILD LESS, DELIVER MORE IN TRADITIONAL SECTORS: THE SMARTER PATH TO DATA & AI ROI FOR MID-SIZED COMPANIES
TARUSH AGGARWAL is one of the leading experts on empowering businesses to make better data-driven decisions. A Carnegie Mellon computer engineering major, Tarush was the first data engineer at Salesforce.com in 2011, creating the first framework (LMF) to allow data scientists to extract product metrics at scale.
Prior to 5X, Tarush was the global head of data platform at WeWork. Through his work with the International Institute of Analytics (IIA), he’s helped more than two dozen Fortune 1000 companies with data strategy and analytics.
He’s now working on 5X with a mission to help traditional companies organise their data and build game-changing AI.
You’ve focused heavily on traditional, non-digital sectors. Was this a conscious decision? How did you end up working mainly with real-world-first businesses?
The landscape has undergone significant changes over the last decade. Digital-first companies could adopt data and AI early because they had the resources and technical expertise. But traditional businesses are different.
These companies are naturally technology-averse. They employ mostly blue-collar and grey-collar workers who view technology as a risk. Their buying decisions prioritise risk minimisation – they buy what everyone else buys. Many have only recently invested in digitalisation through platforms like SAP, Salesforce, and Oracle.
The result? Massive data silos and fragmentation. Different teams can’t access each other’s datasets. Integrating new vendors with SAP and building custom APIs is expensive and complex. These businesses are now suffering from the very fragmentation they created.
However, there’s a tailwind: every company is considering AI. The reality is you can’t have an AI strategy without a data strategy. If you don’t understand your data, AI won’t help. Over the next five years, these companies will invest heavily in data platforms, cleanup, and AI products.
What are the main problems you see when traditional companies start from scratch with data and AI?
Let me break this down from a decision-maker’s perspective. Everyone thinks they’re sitting on a gold mine of data and can activate it overnight. The reality is very different.
Companies focus on the ‘last mile’ – AI applications that create value when embedded directly into business operations. Think supply chain optimisation, inventory management, customer churn prediction, or demand forecasting. This is where the real value lies.
But here’s the mistake: they skip the foundational work. Before deploying AI applications, you need clean, centralised data in an automated warehouse with structured models that give you a clear view of your business.
Instead, most companies buy Power BI – Microsoft’s popular enterprise tool – and connect it directly to SAP or Salesforce to build basic dashboards. They think this makes them data-ready, but it’s just lightweight reporting on top of existing systems.
That won’t work. Just like posting on Instagram doesn’t make you a marketer, a few dashboards don’t make you data-driven. Companies want the end result without investing in the foundation.
If a CEO or CTO at a medium-sized company calls you and says, ‘We’ve bought data tools like Power BI or Snowflake, even hired a data analyst, but we’re not getting any value. Nothing’s working as expected’ – what’s your response?
A data warehouse is an excellent tool – it’s the foundation for storing all your data. A data analyst’s job is to analyse that data and generate insights. But here’s the problem: if you’re a traditional business, you likely have manual processes, data entry issues, and gaps in your data because some processes still run on Google Sheets.
With all these fragmented pieces, analysts can’t deliver insights because they need clean data as input. This isn’t a data insights problem – it’s a data foundation problem.
We’ve launched a data and AI readiness test: 5x.co. It asks 15 straightforward questions across four or five categories to show you exactly where you are in your journey. Are you at the infrastructure layer? How reactive versus proactive are you?
The key is understanding your current position. This is where fractional help – like a fractional chief data officer – becomes valuable. They can assess what the business wants to achieve, where you currently stand, and create a comprehensive roadmap.
You might have all the infrastructure – Snowflake, great BI tools – but without a clear direction, roadmap, and company-wide adoption, teams won’t use what you’ve built. Success requires infrastructure, top-down buy-in, execution, governance, and effective implementation.
Two years ago, I would have said, ‘Start building core data models’. Now I believe in understanding your holistic position first, then mapping where you want to go.
Without that strategy, there’s a big risk of executing in the wrong direction.
What are the main differences in data and AI strategy between a 500-person company and a 10,000- or 50,000-person company?
As companies grow, they pay an indirect ‘communication tax’ – it becomes harder to keep everyone aligned. Large companies typically break into sub-organisations or subsidiaries to manage this complexity.
Large companies inevitably have data silos. Marketing, finance, and product teams work in different tools, lack access to shared datasets, and use different metrics. Even ‘revenue’ isn’t standardised – you have financial revenue (money in), sales revenue (contracts signed), and pipeline revenue.
At 500 people, you’re still centralised with a unified data team, standardised tooling, and everyone reporting up similar chains. Large enterprises use everything – AWS, Azure, Google Cloud, Databricks, Snowflake – just because of their scale.
These structural differences create different requirements for data teams. A 500-person company might have centralised data teams with some decentralised support. Large enterprises are mostly decentralised with multiple independent data teams.
Do you think traditional businesses need different strategies compared to digital-first companies?
Traditional businesses have nothing in common with tech-first companies. The differences go far beyond products and go-to-market strategies – how you build solutions is fundamentally different.
Traditional businesses face massive integration challenges. They use non-standardised software, custom systems, and platforms like SAP, Salesforce, and Oracle that are incentivised to lock in your data. Salesforce just changed Slack’s APIs so you can’t extract your own messaging data. Digital-native companies typically use AWS and tech-first services where data extraction is straightforward.
Traditional companies want managed products, not managed infrastructure. While platforms like Databricks, Snowflake, and Fabric offer great APIs and containers for building applications, traditional businesses won’t build apps. For conversational AI, a tech company will build knowledge graphs and custom chatbots. A traditional company wants it out-of-the-box: ‘Here’s my data, I want to speak to it’.
Digital companies embrace SaaS offerings as standard. Traditional businesses hesitate about full cloud adoption. When you’re buying five data tools, you’re introducing five different clouds. They prefer private cloud or on-premise deployments.
Traditional businesses want integrated services – hands-on support, custom use case development, and implementation help. Digital-native companies are comfortable building their own data teams.
These core differences show that building products for traditional businesses requires completely different approaches from building for tech-first companies.
Data Infrastructure For Traditional Businesses
What’s the minimum viable data infrastructure that a small or medium-sized enterprise needs to start seeing value, and how do you prioritise what gets built first?
Until recently, the core stack was about reporting on data. You needed a data warehouse to store everything, plus an ingestion tool because today’s average SMB has 10-15 data sources – Postgres, financial tools, Facebook ads, Google ads, Google Sheets, Salesforce, helpdesk software like Zendesk, and enterprise tools like SAP, Anaplan, or Lighthouse.
The traditional minimum was four tools: warehouse, ingestion, modelling with orchestration, and BI. But we’re entering an AI-first world where this is changing.
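To make those four pieces concrete, here is a minimal sketch that compresses them into one script, using SQLite as a stand-in warehouse and pandas as a stand-in ingestion tool. The source systems, table names, and figures are purely illustrative, not a recommendation of specific tools.

```python
# Minimal sketch of the classic four-piece stack (ingestion -> warehouse ->
# modelling -> BI query) compressed into one script. SQLite stands in for the
# warehouse and pandas for the ingestion tool; sources and numbers are made up.
import sqlite3

import pandas as pd

warehouse = sqlite3.connect("warehouse.db")

# 1. Ingestion: land raw data from two of the typical 10-15 sources.
crm_orders = pd.DataFrame(
    {"order_id": [1, 2, 3], "amount": [1200.0, 450.0, 980.0], "status": ["closed", "open", "closed"]}
)
sheet_costs = pd.DataFrame({"order_id": [1, 2, 3], "unit_cost": [700.0, 300.0, 500.0]})
crm_orders.to_sql("raw_crm_orders", warehouse, if_exists="replace", index=False)
sheet_costs.to_sql("raw_sheet_costs", warehouse, if_exists="replace", index=False)

# 2. Modelling (normally run on a schedule by an orchestrator): a cleaned, joined model.
warehouse.executescript(
    """
    DROP VIEW IF EXISTS orders_model;
    CREATE VIEW orders_model AS
    SELECT o.order_id,
           o.amount,
           o.amount - c.unit_cost AS margin
    FROM raw_crm_orders o
    JOIN raw_sheet_costs c USING (order_id)
    WHERE o.status = 'closed';
    """
)

# 3. BI: the question a dashboard (or, later, a chatbot) would ask.
print(pd.read_sql("SELECT SUM(amount) AS closed_revenue, SUM(margin) AS total_margin FROM orders_model", warehouse))
```

The point of the sketch is simply that reporting only works once the raw sources have been landed in one place and modelled; in a real deployment each step is a separate managed component.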
Conversational AI is becoming the primary way to interact with data. I’m less excited about checking dashboards every morning – I want to ask my data questions on Slack. ‘What happened in sales yesterday? How many meetings did we book this week?’ I want accurate responses and the ability to tag people directly in Slack, not wait for the data team to build dashboards.
We’re also seeing AI outputs become standalone applications. Instead of putting churn prediction results into a BI tool, we’re building churn prediction models with their own UI that live as separate applications in your data platform.
You can’t do AI without proper metadata and semantics. This is where you define business metrics. When I ask, ‘What was revenue last month?’ the system needs to know which of the three revenue types I’m referring to.
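As an illustration, here is what a hypothetical semantic-layer entry for those three revenue variants might look like. The field names and SQL are invented for the example, and real platforms typically express this in YAML or JSON rather than Python.

```python
# Hypothetical semantic-layer entries for the three revenue variants. Each
# metric carries a human-written meaning and an exact definition, so an AI
# layer never has to guess what "revenue" means. Field names and SQL are
# invented for this example.
REVENUE_METRICS = {
    "financial_revenue": {
        "meaning": "Cash actually received in the period (money in).",
        "sql": "SELECT SUM(amount) FROM payments WHERE paid_at BETWEEN :start AND :end",
    },
    "sales_revenue": {
        "meaning": "Value of contracts signed in the period.",
        "sql": "SELECT SUM(contract_value) FROM contracts WHERE signed_at BETWEEN :start AND :end",
    },
    "pipeline_revenue": {
        "meaning": "Open opportunities weighted by close probability.",
        "sql": "SELECT SUM(value * probability) FROM opportunities WHERE stage NOT IN ('won', 'lost')",
    },
}

# With definitions like these, "What was revenue last month?" becomes a
# clarifying question ("which revenue?") rather than a silently wrong answer.
for name, metric in REVENUE_METRICS.items():
    print(f"{name}: {metric['meaning']}")
```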
In the AI world, we now have eight core categories: ingestion, warehouse, modelling, BI, semantic layer, conversational AI, data and AI apps, and data catalogue. You need all of them quickly, and you need them without spending time building and integrating tools yourself.
How long does the traditional approach of building data infrastructure typically take for companies with 300-500 people?
Building infrastructure has no business value. The right metric is ‘time to first use case’ – how long until you deploy your first production use case, which is your first sign of data ROI.
For a traditional 500-person company, this is typically a 6-12 month process. You start evaluating tools; Microsoft likely surfaces through an existing relationship, offers free credits, and introduces you to a systems integrator who builds a stack on Microsoft. You might evaluate Databricks, Snowflake, Amazon, or GCP, but it’s fundamentally the same process – building multiple different components.
Six to twelve months is standard for the time to first use case today.
What’s the alternative, the new way of doing this?
We want to see your first use cases running in production within the first month. The issue isn’t that data teams are slacking during those 6-12 months – they’re managing integrations, connecting tools, and building pipelines. But none of that ties into meaningful work you actually need to do.
AI allows us to focus on what we need to accomplish, not worry about the infrastructure. When companies come to 5X, we talk about seeing first ROI – whether it’s a data app, AI application, dashboarding, or migration – within three to four weeks with production-grade deliverables.
It sounds almost mind-blowing that something taking 6-12 months can be collapsed into a 6-week sprint. How are you generating efficiencies of this kind?
Let’s look at a typical mid-market manufacturing company. They have factories, run SAP, manage inventory, sell to customers, have various SAP integrations, and use Google Sheets.
With traditional platforms like Fabric or Databricks, they spend months pulling data from SAP using Azure Data Factory, building OData APIs, cleaning data, setting up Azure warehouses, configuring Power BI, orchestrating workflows, structuring data, and defining security permissions. They either need external consultants or spend additional months hiring and training a team on their chosen platform.
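To give a feel for one of those foundational pieces, here is a sketch of paging raw records out of an OData endpoint, the protocol SAP systems commonly expose. The host, entity set, credentials, and response shape are assumptions for illustration, not details of any real deployment.

```python
# Sketch of one foundational extraction task: paging raw records out of an
# OData endpoint (the protocol SAP systems commonly expose). The host, entity
# set, and credentials below are placeholders, and the code assumes an OData
# v4-style response with a top-level "value" array; older SAP v2 services
# wrap results differently.
import requests

BASE_URL = "https://erp.example.com/odata/SalesOrders"  # hypothetical endpoint
AUTH = ("extract_user", "extract_password")             # placeholder credentials
PAGE_SIZE = 500


def fetch_all_orders() -> list[dict]:
    """Pull every record one page at a time using OData $top/$skip paging."""
    records, skip = [], 0
    while True:
        response = requests.get(
            BASE_URL,
            params={"$top": PAGE_SIZE, "$skip": skip, "$format": "json"},
            auth=AUTH,
            timeout=30,
        )
        response.raise_for_status()
        page = response.json().get("value", [])
        if not page:
            break
        records.extend(page)
        skip += PAGE_SIZE
    return records


if __name__ == "__main__":
    orders = fetch_all_orders()
    print(f"Extracted {len(orders)} raw sales order records")
```

And this is only the extraction step for a single source; cleaning, warehousing, orchestration, and permissions still follow.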
Your time is fundamentally spent on foundational pieces rather than business value.
We’ve built the 5X Platform on nine battle-tested open source layers with some proprietary components, ensuring no vendor lock-in and strong community support. We’re a managed product, not infrastructure. We provide 600 connectors out-of-the-box for SAP, Business Central, Oracle, Salesforce, and hundreds of others. If you need one we don’t have, we’ll build it.
Everything works together automatically – warehouse, modelling, and orchestration. No tool integration needed. Whatever you define in the semantic layer, you can speak to immediately. Ask questions about undefined metrics, and we’ll prompt the data team to add them.
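A toy sketch of that behaviour, building on the kind of metric definitions shown earlier: questions that match a defined metric are answered from its agreed definition, ambiguous ones trigger a clarifying question, and undefined ones are routed to the data team rather than guessed at. This is purely illustrative, not 5X’s implementation.

```python
# Toy sketch of the conversational flow described above. Defined metrics are
# answered from their agreed definition, ambiguous questions trigger a
# clarification, and undefined metrics are flagged for the data team instead
# of being guessed at. Purely illustrative.
METRICS = {
    "financial revenue": "SELECT SUM(amount) FROM payments ...",
    "sales revenue": "SELECT SUM(contract_value) FROM contracts ...",
    "pipeline revenue": "SELECT SUM(value * probability) FROM opportunities ...",
}


def answer(question: str) -> str:
    q = question.lower()
    matches = [name for name in METRICS if name in q]
    if len(matches) == 1:
        return f"Using the agreed definition of {matches[0]}: {METRICS[matches[0]]}"
    if "revenue" in q:
        # Several defined metrics could apply, so ask instead of guessing.
        return "Which revenue do you mean: " + ", ".join(METRICS) + "?"
    return "That metric isn't defined in the semantic layer yet - flagging it for the data team."


print(answer("What was sales revenue last month?"))
print(answer("What was revenue last month?"))   # ambiguous -> clarifying question
print(answer("How many customers churned?"))    # undefined -> routed to the data team
```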
We’re optimised for traditional company needs: data integration solutions, managed products over infrastructure, built-in services, and flexible deployment – cloud, private cloud, or on-premise.
The result? Projects quoted as two-year implementations are delivered in six months on the 5X platform.
What led you to develop an all-in-one data platform?
The data space is one of the most fragmented industries. We have hundreds of vendors across 10-15 core categories – ingestion, warehouse, modelling, BI, governance – and now AI is adding even more categories.
Our analogy: data vendors today are selling car parts. Imagine walking into Honda and instead of selling you a Civic, they sell you an engine and expect you to build your own car.
When we pivoted 5X to developing our platform, we had a thesis: companies won’t keep buying five different vendors and stitching them together. We’re entering the era of all-in-one data platforms. We launched our new platform in August – a true all-in-one solution across nine categories, packaged in a single offering.
And another key benefit is that you don’t have to deal with multiple vendors?
Exactly. We’re built on nine different open source technologies under the hood. Think of Confluent – the streaming company. If you’re doing streaming, you’re probably using Confluent. Under the hood, they use two open-source projects: Kafka for pipelines and Apache Flink for stream processing.
But Confluent built enterprise-grade capabilities around Flink and Kafka, delivering a single product with everything you need for streaming. You don’t think about the underlying open source components.
Similarly, 5X is the enterprise-managed version of nine different open source technologies. You get a complete managed offering from day one without dealing with multiple vendors.
We’re fundamentally built on open source technologies. If we disappear tomorrow, you could take your GitHub repository with all your scripts, spin up the open source projects yourself, and import everything. No vendor lock-in.
More importantly, the space changes rapidly. When great new tools emerge, we can add them quickly. Companies building proprietary stacks take much longer, and it’s impossible to build 7-10 different categories proprietarily and remain competitive over a 10-year window.
We don’t build categories – we study the market, find the best vendors, and deploy those solutions inside our platform. This is completely abstracted from you, but we’re delivering the current best-in-breed platform that improves daily.
Data & AI Teams
How does an organisation’s choice of platform impact how it should build a data & AI team?
It’s slightly different when you think about it from a data team perspective. You start with zero people, then one person, eventually maybe 5-10 people. But here’s what people miss: just because you’ve invested time building something doesn’t mean you should keep doing it.
The bigger issue companies don’t discuss enough is vendor lock-in. It’s very real. If you’re building on Fabric or Databricks today, these are proprietary platforms that could change pricing overnight. You might sign a 2-3 year contract, but it will change after that. Having your data and AI locked in is dangerous.
The math is telling. Data teams spend about 20% of their time managing their data platform. At WeWork, we had a 100-person data team with 20 people dedicated to building and managing our platform. At $200k average US salary, that’s $4 million annually spent on managing infrastructure that’s not differentiated from what any other company builds.
This ignores the initial 6-12 month investment to build a basic platform. Even after that investment, maintenance consumes 20% of your resources – whether that’s one person or 10 people, it scales proportionally. Why continue this when you could use something like 5X and be ready out of the box?
With a fully managed platform approach, what size and structure should modern data teams have? How does this scale from a 100-person company to 500 people?
I don’t think in terms of data engineers, analysts, and scientists anymore – they’re essentially the same role. We’re entering an era where I look for data generalists. What is data engineering? It’s data modelling – cleaning, structuring, and formatting data so you can answer questions. This is the last remaining job AI won’t take over because AI can never have business context about what ‘revenue’ means. That requires human business definitions.
Analysis should be a skill anyone in data can handle. Building data science models is becoming commoditised with plug-and-play solutions. GenAI capabilities are increasingly available through tools.
I’d start with two roles: a fractional chief data officer to sit with the business, understand goals, manage stakeholders, and drive adoption, plus a data generalist who can handle modelling and implementation. Use 5X for infrastructure, and this combination serves a 100-200 person company. At 500 people, scale to 2-3 data generalists with either the fractional leader or a full-time hire.
One caveat: I’d be cautious about hiring a first-time manager as head of data. I want someone who’s managed similar teams and can manage up, getting buy-in that we’re not just building dashboards, but driving adoption and strategic decisions.

