TrueState is a technology company focused on empowering organisations to solve their most important problems with machine learning and artificial intelligence.
Our developer platform provides teams with access to cutting-edge algorithms, high-leverage APIs and an easy-to-use interface, while our implementation solutions help organisations take their first AI steps with confidence and peace of mind.
In late 2019 our team was huddled around a whiteboard in a conference room at one of our Fortune 50 clients. The company had engaged QuantumBlack to help optimise its sales process, which spanned multiple regions and product lines.
We built a sophisticated machine learning pipeline that could predict the likelihood that a lead was going to convert in the next 14 days and recommend targeted strategies to maximise the likelihood of conversion. The model worked well and everyone was happy with the initial results.
In that room we faced a critical challenge: the initial scaling from one product to two had taken 16 weeks. With dozens of products across EMEA, North America, and APAC awaiting similar solutions, we needed a fundamentally different approach to scaling.
Our ROI was directly tied to how many new geographies and products we could serve, but our technology stack was holding us back. Rather painfully, we created a scalable solution built around a high degree of software reuse and developer productivity. There were maintenance challenges and teething pains along the way, but we ultimately reached the scale we needed and delivered enormous value.
In something of a black swan occurrence for the data science industry, five years on our solution is still in production, delivering value across all of those products and geographies.
This scenario, which played out during my time at McKinsey's AI division, highlighted a fundamental truth about data science in enterprise settings: from a pure ROI perspective, the ability to scale your impact is often more important than improving the performance of individual models.
In this post we will explore the role of speed and scalability in driving impact for data science teams, as well as how you can set your data science team up to maximise its ROI.
The Economics of Data Science ROI
In accounting, one of the primary metrics for the “value” of a team or division is its return on investment (ROI). In other words, given how much money is required to keep the team running, how much value does it generate for the organisation? Let’s get specific about the economics of ROI for data science teams.
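Written out, with ΔPᵢ as the performance uplift from initiative i, Cᵢ as its variable cost, and C_fixed as the team’s fixed running costs (that last label is my own shorthand), the formula looks like this:

$$\mathrm{ROI} = \frac{\sum_{i=1}^{n} \left(\Delta P_i - C_i\right)}{C_{\text{fixed}}}$$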
This equation tells us that the ROI for a data science team is the sum (Σ) of the net benefits (ΔPi − Ci) of each of the n data science initiatives, divided by the fixed costs of running the team. This is generally calculated on a yearly basis.
Let’s break down the different parts of this equation.
ROI
ROI is the return on investment of your data science function - in other words, how much value your data science function generates versus how much it costs to run. ROI should be the “North Star” for all data science teams and is expressed as a percentage.
Initiative Performance
This is the incremental impact a single data science initiative (i) has on the organisation. It could be the result of increased conversions, prevented churn, or cost savings from automations. These should always be calculated in dollar equivalents.
This performance improvement is made up of two factors:
How good the models are - derived from the data, methods and general know-how of the team.
How well the models are integrated into operations - derived from solution design and change management.
Most data science teams are focused on maximising model performance and neglect the other parts of the ROI equation.
Variable Costs
These are the costs of running each initiative (i). Variable costs commonly take the form of:
Processing costs (e.g., compute, data storage).
Variable licence costs (e.g., licences for third-party data).
Fixed Costs
These are the fixed costs of your data science team. Fixed costs are primarily broken down into two factors:
Staff.
Any fixed licence costs (e.g., software).
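To make the arithmetic concrete, here is a deliberately simplified, hypothetical example (the figures are illustrative only, not drawn from any engagement): suppose a team costs $1.5M a year to run and delivers four initiatives, each producing $600K of incremental benefit against $100K of variable cost.

$$\mathrm{ROI} = \frac{\sum_{i=1}^{4}\left(\Delta P_i - C_i\right)}{C_{\text{fixed}}} = \frac{4 \times (600\mathrm{K} - 100\mathrm{K})}{1{,}500\mathrm{K}} \approx 133\%$$

Note what moves the needle: holding fixed costs steady, doubling the number of initiatives the team can build and maintain roughly doubles ROI, while squeezing a few extra points of model performance out of a single initiative barely shifts it.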
Prioritising data science speed and scalability as a driver of ROI
If we reflect on the ROI formula for data science, a theme emerges: given decent model performance, ROI depends on the number of initiatives you can build and maintain.
But what does this mean? It means your impact is correlated with your scale.
Is this approach of balancing scale and individual initiative performance reflected in the way most data science teams operate? Hardly.
As an industry we’re overwhelmingly focused on model improvements in individual use-cases, to our detriment. We want our models to be great. We want our stakeholders to be amazed by their predictive power. We want to knock it out of the park.
The reality is that perfect is the enemy of good in data science. In other words, if the model is good enough and you’re able to effectively communicate its benefits, your stakeholders will probably be amazed anyway. It’s much better to make things stable and move on to another use-case.
In our example at the start of this article, we were building a sales optimisation solution, classifying leads in a pipeline as likely or unlikely to convert in the next 14 days and recommending actions to convert those leads. Our core model reached the mid-80% mark for precision and recall, which, given there was no existing ML solution supporting this function, blew the past performance out of the water.
Management was thrilled and wanted to see these results across the whole business. Would it have been nice to get the performance to the low-90% range? Absolutely. Was it a better use of our time than scaling the impact to the rest of the business? Absolutely not.
For data science teams looking to boost their ROI, the first step is a mindset shift: one that values scalability as well as model performance.
Then there’s a review of ways of working.
Unfortunately many teams are unable to go beyond the one or two use-cases in their current backlog because their setup is holding them back. They’re stuck in a very special kind of hell, integrating an alphabet soup of technologies while also trying to build a great solution and communicate its effectiveness to an often apathetic group of stakeholders.
This affects the build of new use-cases and the maintenance of existing ones. It also affects morale. Fighting day-in, day-out to make complex data science solutions performant and stable is crippling for morale. It burns data teams out.
So how do we improve the productivity and scalability of our data science teams? How do we make these teams kick goals while also remaining sane?
We’ll dive into this in more detail in a future post, but TL;DR, here are a few ways:
Deliberately consider your data science technology stack:
Are you building notebooks from scratch every time? Are you falling over an alphabet soup of niche tools in pursuit of a feature-complete framework? If this is you, I’d recommend checking out TrueState, our fully-integrated data science platform focused on making data scientists as productive as possible.
In any case, make sure you’re not fighting against your tech stack.
Practice concise, accessible communication with stakeholders:
Meetings and communications are some of the biggest time-sinks for data scientists.
Prepare communications relevant to your target audience and keep the conversation focused on the impact on their jobs, not the technical nuances of your solution. Hard truth - they don’t care, and you’ll only lose them trying to explain it.
Focus on reusability:
It takes precious time to write code. If you’re re-writing the same modelling code every time, you’re shooting yourself in the foot. Create reusable assets instead, and build mechanisms to share them easily between projects, as in the sketch below.
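There is no single right way to do this, but as a minimal sketch (assuming a scikit-learn stack and a tabular lead-scoring problem like the one described earlier, with hypothetical feature names), a shared pipeline factory driven by a small per-product config goes a long way:

```python
from dataclasses import dataclass, field

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


@dataclass
class LeadScoringConfig:
    """Everything that varies between products/regions lives here, not in the code."""
    numeric_features: list[str]
    categorical_features: list[str]
    model_params: dict = field(default_factory=dict)


def build_lead_scoring_pipeline(config: LeadScoringConfig) -> Pipeline:
    """One shared implementation, parameterised per product rather than re-written."""
    preprocess = ColumnTransformer([
        ("numeric", StandardScaler(), config.numeric_features),
        ("categorical", OneHotEncoder(handle_unknown="ignore"), config.categorical_features),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", GradientBoostingClassifier(**config.model_params)),
    ])


# Rolling the solution out to a new product becomes a config change, not a new codebase.
emea_widgets = LeadScoringConfig(
    numeric_features=["days_in_pipeline", "deal_size"],
    categorical_features=["industry", "lead_source"],
    model_params={"n_estimators": 200},
)
pipeline = build_lead_scoring_pipeline(emea_widgets)
# pipeline.fit(X_train, y_train); pipeline.predict_proba(X_new)
```

Package helpers like this in an internal library, or lean on a platform that manages them for you, and the marginal cost of initiative n+1 drops sharply, which is exactly the lever the ROI equation above rewards.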
Wrap up
In this post we’ve explored the importance of scaling your impact as a data science team. It’s not easy, but if you can shift your mindset from focusing solely on individual model performance to also prioritising the scale of your impact, you’re off to a great start.
If you can follow that up with selecting the right technology stack, practicing effective and efficient stakeholder communications and prioritising reusability, you’ll unlock new levels of impact for your data science team and your organisation.