The science behind managing Data Science Products

12 min read · Sep 30, 2020


There are numerous articles written by experienced and aspiring data scientists on Machine Learning topics, be it model theory and metrics, statistics and its applications, time-series forecasting, feature engineering, data visualization, deep learning, and so on. However, as a product manager, I was missing the big picture.

I want to build a data-driven product with all the data science and ML know-how. Several questions arose in my mind. How does ML fit into a product life cycle? How do I equip myself with data science know-how to solve the business problem at hand? How do I determine if the problem can be solved better with ML? How do product success metrics differ? Who are the additional stakeholders I have to deal with? How do I take the solution to production? Do I need consent from stakeholders since we are dealing with data?

This is my attempt to understand the process and the tools needed for a Product Manager to solve a Machine Learning business problem.

5-step pre-model checklist

Before you get your hands dirty solving a business problem using machine learning models, follow this ML prerequisite checklist.

Manual / Automation?

Can we continue with a manual solution or does this problem need automation?

For example, a marketing company wants to run a targeted campaign to increase the rate of subscription. This would involve a team of marketing agents following up with all trial customers without any context or a definite outcome. This may still work if the customer base is small and the ROI does not justify an automated solution. For companies with a large customer base, this problem qualifies for an automated ML solution, as we can build a model that predicts the subscription outcome based on past data.

Rules / ML ?

Now that we know that the problem has to be automated, how do we solve it? Let us say a bank wants to predict whether a customer will default on a loan. Will a rule-based engine suffice to solve the problem, assuming that we are not dealing with unstructured data such as images, text, or email?

For starters, a rule is simple logic-based code such as "if A then B, else if X then Y". In this case, a bank could come up with 100 rules and build a rule-based system on a deterministic approach that answers the question based on the given inputs.
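Such a deterministic rule engine can be sketched in a few lines. The field names and thresholds below are hypothetical, invented purely for illustration:

```python
def loan_default_risk(applicant: dict) -> str:
    """Deterministic rule-based check. Every rule is hand-written
    and must be updated whenever the business process changes."""
    # Rule 1: very low credit score -> high risk
    if applicant["credit_score"] < 550:
        return "high_risk"
    # Rule 2: debt-to-income ratio above 40% -> high risk
    if applicant["debt"] / applicant["income"] > 0.40:
        return "high_risk"
    # Rule 3: prior default on record -> medium risk
    if applicant["prior_default"]:
        return "medium_risk"
    # No rule fired: approve
    return "low_risk"

print(loan_default_risk({"credit_score": 700, "debt": 10_000,
                         "income": 80_000, "prior_default": False}))  # low_risk
```

A real bank would have many more rules; the point is that each one is explicit code that the dev team must maintain by hand.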

What are the challenges with the rule-based system?

  1. Every time there is a change in the default process, the rules have to be updated.
  2. Dependency on the dev team to update the rules each time there is a change in the rules and reiterate the dev cycle.
  3. The codebase may become unwieldy, leading to errors in the outcome.

As the data grows, it is time to look beyond the deterministic approach and adopt a probabilistic one that relies on machine learning to learn from the data and predict the outcome. Machine learning takes a probabilistic approach using historical data and outcomes: it considers not only the given input but several other factors, understands patterns and trends in historic data, and gives you the probability of different outcomes, such as how much of the loan the person is likely to repay, or the probability of the person committing fraud and hanging you out to dry.

Is the problem definition clear?

  • Unlike traditional software product development, the ML product life cycle [define the problem > prepare data > build a set of models > test and iterate] involves a lot more experimentation, uncertainty, and variability. Instead of detailing all requirements in your PRD, focus on defining objective functions and key performance criteria, allowing the team to explore and experiment with simple prototypes rather than building a comprehensive end-to-end solution.
  • According to Google's developer platform, start simple: state your problem as a binary classification or a unidimensional regression problem, then iteratively refine the problem statement and build the model. Binary classification example: does a patient have heart disease or not? Regression example: how many days to close a deal?

Is Data available?

You want to build an automated, ML-model-based solution, but first you need to ask whether you have the data to solve the business problem.

Data is the heart of any ML solution. Without a sufficient training data set, supervised models perform poorly, leading to failure of the project.

With data security and privacy issues, data collection is a complicated process, especially in large companies, and it can be frustrating. It is advisable to build a proof of concept (POC) to establish the problem statement and a plausible solution before spending a lot of time on data collection, preparation, and transformation activities.

Do we have the right features?

Data is represented as a table of columns and rows. A column is an attribute, also called a predictor variable. An attribute is a feature if it represents the structure of the problem and influences the outcome; not all attributes are features. The art of transforming raw data into meaningful features that represent the problem is feature engineering. Feature engineering plays an important role in the machine learning process: if the right features are selected, the model built is more accurate, with less time spent on model tuning.

Often, features are fetched from various sources: identity data from a CRM, social data from social platforms, and user web page activity from a web database. If there are no meaningful features, you have to look into various data sources to represent the underlying problem. Once you are ready, the multiple feature sets are joined with the labeled data (the outcome) into one flat table, ready to be used to train a model.
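As a minimal sketch of that joining step, the per-source feature sets below (CRM segment, web page views, and the subscription label) are hypothetical, keyed by a made-up `user_id`:

```python
# Hypothetical feature sets keyed by user_id, each from a different source
crm_features = {1: {"segment": "smb"}, 2: {"segment": "enterprise"}}
web_features = {1: {"page_views": 34}, 2: {"page_views": 120}}
labels = {1: {"subscribed": 0}, 2: {"subscribed": 1}}

def flatten(*sources):
    """Join per-user feature dicts into one flat row per user."""
    rows = []
    for user_id in sorted(sources[0]):
        row = {"user_id": user_id}
        for src in sources:
            row.update(src[user_id])
        rows.append(row)
    return rows

flat = flatten(crm_features, web_features, labels)
print(flat[0])  # {'user_id': 1, 'segment': 'smb', 'page_views': 34, 'subscribed': 0}
```

In practice this join is usually done with a data frame library or SQL, but the idea is the same: one row per entity, all feature sets plus the label side by side.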

Are the success metrics measurable?

Metrics are needed at different levels to measure:

  1. Business objectives
  2. Product Goals and outcomes
  3. Model Performance

The OKR (Objectives and Key Results) framework works nicely, as it ties product goals and outcomes back to the overall business objectives. While OKRs are good for measuring progress on high-level goals, we also need to define operational KPIs to quantify the success of the product, and model metrics to assess the performance of the model.

If your product is heavy on AI, then the AI features directly influence operational KPIs such as the number of sessions, the number of new users, number of returning users, number of conversions, etc.

If your product is only using AI/ML to enhance a feature, then these features do not have much influence on operational KPIs.

To define model metrics, you need to know the kind of problem you are solving. Is it a classification problem or a regression problem?

For classification problems, precision and recall are two extremely important model evaluation metrics. Precision is the percentage of your positive predictions that are actually relevant; recall is the percentage of all relevant cases that your algorithm correctly identifies.

Example: classify a lead as Hot or Cold.

You have the following possibilities:

True Positives — A lead correctly classified as a hot lead by the model

False Positives — A lead incorrectly classified as a hot lead by the model

True Negatives — A lead correctly classified as a cold lead by the model

False Negatives — A lead incorrectly classified as a cold lead by the model

Which metric to choose?

It depends on the domain and the problem you are trying to solve, with a trade-off between the two. Recall is relevant for healthcare problems such as identifying whether a patient has cancer, where a false negative can cost a person's life. In the lead example above, recall is more relevant, since a hot lead wrongly classified as cold may cost the company revenue. If there is a large pre-sales team to follow up on every lead, then precision could be the better metric.
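Both metrics fall straight out of the true/false positive counts above. A small sketch, using made-up "hot"/"cold" label lists for the lead example:

```python
def precision_recall(actual, predicted, positive="hot"):
    """Compute precision and recall for one class from parallel label lists."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    precision = tp / (tp + fp)  # of the leads we called hot, how many really were?
    recall = tp / (tp + fn)     # of the truly hot leads, how many did we catch?
    return precision, recall

# Hypothetical ground truth vs. model output for five leads
actual = ["hot", "hot", "cold", "cold", "hot"]
predicted = ["hot", "cold", "hot", "cold", "hot"]
p, r = precision_recall(actual, predicted)
print(round(p, 2), round(r, 2))  # 0.67 0.67
```

Here the model caught two of the three hot leads (recall 2/3) and one of its three "hot" calls was wrong (precision 2/3); your domain decides which miss hurts more.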

For Regression problems such as predicting the deal amount to be closed in a quarter or predicting the number of hires in a company or the number of new student enrollments in a university, we need metrics to track the prediction accuracy.

Mean Absolute Error (MAE)

  • The objective is to minimize the error between the predicted and the actual value. This metric is the average of the absolute differences between the predicted and the actual values: the lower the MAE, the better the prediction. If you are predicting the deal amount closed in a quarter, the prediction error is measured in the deal currency.

Root Mean Squared Error (RMSE)

  • Here, all errors are squared before they are averaged (and the square root is then taken), so RMSE gives more weight to larger errors. If the magnitude of the errors matters, RMSE is suitable for gauging average model prediction error.
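The contrast between the two is easy to see on a toy example. The quarterly deal amounts below are invented; note how one large miss barely moves MAE but dominates RMSE:

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: square, average, then take the root."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical quarterly deal amounts (in thousands), with one large miss
actual = [100, 120, 90, 110]
predicted = [105, 118, 95, 150]
print(mae(actual, predicted))            # 13.0
print(round(rmse(actual, predicted), 1)) # 20.3 -- the 40k miss dominates
```

Three predictions are off by 5 or less, yet the single 40k error pushes RMSE well above MAE, which is exactly the weighting behavior described above.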

Now, let us include all the above metrics in an OKR example:

Objective: Develop an NLP-based AI chatbot to improve the customer satisfaction rate by 5%

Key Result 1: Improve the first response rate by 10%

  • Lead measures include classification of user query based on a user profile, related product or service, new product query, technical issues, etc.

Key Result 2: Improve the resolution rate for user queries by 20%

  • Lead measures include building an NLP based knowledge base product to recommend the right solution to the user query.

Operational KPIs: Increase the number of new product sales by 20%; increase support renewals from existing product users by 90%

Model metrics: Accuracy, Precision, and Recall can be used to monitor the model performance for each of the models used to achieve the key results.

AI / ML Product life cycle

You have formulated the problem, collected the data, secured stakeholder buy-in on the need for an ML/AI solution, and defined your success metrics. What next?

Label Preparation

Data labeling is a time-consuming and significant step in the machine learning product life cycle: according to the analyst firm Cognilytica, almost 80% of AI project time is spent gathering, organizing, and labeling data.

What is data labeling?

If you have identified the output class for a sample of data, then that data is marked up, also called labeled, annotated, or classified. Structured and labeled data is a prerequisite to properly train and deploy models. Accurately labeled data provides the ground truth for testing and iterating your models.

“Ground truth” means checking the results of ML algorithms for accuracy against the real world.

You can get labeling done by external parties or by your users.

Factors to be considered are:

  1. Does the labeling work need domain experts? Medical data labeling needs domain experts, whereas dog-vs-cat image classification does not.
  2. Can users label data as a built-in feature of the product, similar to Facebook tagging? This needs additional feature development and monitoring but pays off in the long run.
  3. Can I outsource the data to external vendors who provide labeling as a service? The cost could be high and may not be justified if the data volume is low and the modeling is still in the early stages. Customer consent is also needed before data is shared with external parties.
  4. Can I use my partners and customers to do this? Data consent is needed if customer data has to be shared with others; otherwise, this works well if the customer community is ready to contribute and become a key stakeholder in the solution phase. You can create a crowdsourcing platform for data labeling if data security is not an issue.

Feature Engineering

What is a Feature and what is Feature Engineering?

As noted earlier, data is represented as a table of columns and rows, where a column is an attribute. An attribute is a feature only if it represents the structure of the problem and influences the outcome; not all attributes are features.

A feature is an individual measurable property OR a characteristic of a phenomenon being observed.

Identity features, Behavioral features, Social Features are individual-related features.

Attributes that provide the structure for the occurrence of an event or a phenomenon are the event-related features.

Let us say you want to predict the closure of a deal. The deal start date is an attribute, whereas the deal duration, derived from the start date, is a feature that impacts the deal-closure prediction problem. In a natural language processing problem, the email body is text, but the number of spam words in it is a feature.

Feature engineering, in layman's terms, is simply transforming data into features relevant to solving the business problem, based on one's domain knowledge, so that the model can predict with better accuracy on unseen data.
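The deal example above can be sketched in a few lines. The record structure and field names here are hypothetical, chosen only to show an attribute being turned into a feature:

```python
from datetime import date

# Hypothetical raw CRM record: start_date is an attribute, not yet a feature
deal = {"deal_id": "D-42", "start_date": date(2020, 6, 1), "amount": 50_000}

def add_deal_duration(deal, as_of):
    """Derive deal_duration_days (a feature) from the raw start_date attribute."""
    enriched = dict(deal)  # leave the raw record untouched
    enriched["deal_duration_days"] = (as_of - deal["start_date"]).days
    return enriched

print(add_deal_duration(deal, as_of=date(2020, 9, 30))["deal_duration_days"])  # 121
```

The raw date alone tells the model little; the derived duration captures how long the deal has been open, which is what actually influences the closure prediction.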


Model learning

By defining the model performance metrics, you have cut out the job for the data scientist. So, what else should a PM consider?

Since the model building is an iterative process, a Product Manager as a domain expert or a subject matter expert can help in feature selection to improve the model accuracy.

It is best to start with a simple model with an optimal feature set on a sample training set and incrementally improve the complexity of the model until you reach the target measures.

At the end of each cycle, record the time taken to train and run, and the model performance metrics. This forms the baseline and can be compared as the model complexity and data increases.
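A lightweight way to keep that baseline record is an experiment log. The sketch below is a minimal illustration, with a trivial stand-in "model" and a made-up accuracy figure in place of a real training and evaluation run:

```python
import time

# One entry per modeling iteration: name, training time, and metrics
experiment_log = []

def record_run(model_name, train_fn, eval_fn):
    """Train a model, time the run, and log its metrics as a comparable baseline."""
    start = time.perf_counter()
    model = train_fn()
    elapsed = time.perf_counter() - start
    experiment_log.append({
        "model": model_name,
        "train_seconds": round(elapsed, 3),
        "metrics": eval_fn(model),
    })

# A trivial majority-class baseline stands in for a real model here
record_run("majority_baseline",
           train_fn=lambda: "always_predict_cold",
           eval_fn=lambda m: {"accuracy": 0.72})
print(experiment_log[0]["model"])  # majority_baseline
```

Each later, more complex model gets its own `record_run` entry, so regressions in accuracy or training time are visible against the simple baseline.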

One of the key responsibilities of a product manager is to fix model bias, as identifying and fixing it at an early stage is far less costly than fixing it later.

What are examples of bias? If seasonality is not considered, the model will not perform well on unseen data; for example, a surge in holiday sales has to be represented in the training dataset.

Your data set could also have a strong gender bias, and a model built on it will make skewed predictions. This can be prevented by sampling an unbiased data set; resampling is another option if you don't have enough data for every gender.
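One simple resampling approach is to draw the same number of rows from each group. This is only a sketch on an invented, imbalanced toy dataset, not a full treatment of bias mitigation:

```python
import random

random.seed(0)  # reproducible sketch

# Hypothetical dataset with a strong gender imbalance (20 vs. 80 rows)
rows = [{"gender": "f"}] * 20 + [{"gender": "m"}] * 80

def balanced_sample(rows, key, n_per_group):
    """Sample the same number of rows from each group to reduce sampling bias."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    sample = []
    for group_rows in groups.values():
        # sampling with replacement lets small groups reach n_per_group
        sample.extend(random.choices(group_rows, k=n_per_group))
    return sample

sample = balanced_sample(rows, key="gender", n_per_group=30)
counts = {g: sum(r["gender"] == g for r in sample) for g in ("f", "m")}
print(counts)  # {'f': 30, 'm': 30}
```

Oversampling the minority group this way balances the training set, though it duplicates rows; collecting more real data for the underrepresented group is always the better fix when feasible.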

Another important decision that a PM can facilitate is to mitigate or fix the bias by opening this up to the users. For example, Facebook allows users to tag/correct the tagging recommendation. The feedback loop has to be brainstormed and built into the training and development process before it is deployed to all users.

Model Deployment

Before we get to model deployment, the model output has to go through a regular development cycle to determine how the outcomes are rendered within the product. For example, an outcome from a cancer prediction model has to be incorporated into the health app during the model development process.

Deployment is the last step in the ML process life cycle, and the PM must provide scaling inputs to the ML engineering team:

  1. How many users will access this ML feature? Remember this varies based on what you are building; the needs change depending on whether it is a feature within a product or a completely AI-driven product.
  2. Should the prediction be real-time, or available at a predetermined cadence such as hourly, daily, or weekly? A deal-closure prediction in a sales CRM need not be real-time, whereas recommended stocks in a trading portal have to be.
  3. What are the latency requirements? Does the prediction result need to come back in less than a second?

Model management

Unlike other features, ML features need continuous monitoring, training, and feedback to keep the models beating their performance benchmarks as the data set changes.

As mentioned earlier, a feedback loop from real users helps correct the predictions and leads to a better recommendation model. Start with descriptive analytics and progress into predictive and prescriptive analytics as the model matures.


As a final note, while managing an ML product life cycle, the product manager has to interact with new stakeholders such as data scientists, machine learning engineers, and BI experts, in addition to the regular stakeholders.

“Data are just summaries of thousands of stories — tell a few of those stories to help make the data meaningful.” — Chip & Dan Heath