Author | Dr. Shou-de Lin, Chief Machine Learning Scientist, Appier
Businesses today are dealing with huge amounts of data, and the volume is growing faster than ever. At the same time, the competitive landscape is changing rapidly, and it’s critical for commercial organizations to make decisions fast. Business success comes from making quick, accurate decisions using the best possible information, and machine learning as a service is one way to achieve that.
Machine learning (ML) is a vital technology for companies seeking a competitive advantage because it can process large volumes of data quickly. That speed can help businesses make more effective recommendations to customers, hone manufacturing processes, or anticipate changes to a market, for example.
In a business context, Machine Learning as a Service (MLaaS) means companies designing and implementing ML models that provide a continuous and consistent service to customers. This is critical in areas where customer needs and behaviors change rapidly. For example, this year people have changed how they shop, work, and socialize as a direct result of the COVID-19 pandemic, and businesses have had to shift how they serve their customers to meet those needs.
This means that the technology they are using to gather and process data also needs to be flexible and adaptable to new data inputs, allowing businesses to move fast and make the best decisions.
One challenge of turning ML models into machine learning as a service has to do with how we currently build ML models and how we teach future ML talent to do it. Most research and development of ML models focuses on building individual models that use a set of training data (with pre-assigned features and labels) to deliver the best performance in predicting the labels of another set of data (normally called testing data). However, if we’re looking at real-world businesses trying to meet the ever-evolving needs of real-life customers, the boundary between training and testing data becomes less clear. Today's testing or prediction data can be used as training data to create a better model in the future.
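To make the blurred boundary concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any production system: the `RunningRateModel` class, the `serve_and_learn` function, and the data layout are all hypothetical. Each day's batch is first treated as testing data (the model predicts and is scored), and the same batch is then folded in as training data once its labels are known.

```python
class RunningRateModel:
    """Hypothetical toy model: predicts the historical positive rate per key."""

    def __init__(self, prior=0.5):
        self.prior = prior   # prediction used before any data is seen for a key
        self.positives = {}  # key -> number of positive labels seen
        self.totals = {}     # key -> number of examples seen

    def predict(self, key):
        total = self.totals.get(key, 0)
        if total == 0:
            return self.prior
        return self.positives.get(key, 0) / total

    def update(self, examples):
        for key, label in examples:
            self.positives[key] = self.positives.get(key, 0) + label
            self.totals[key] = self.totals.get(key, 0) + 1


def serve_and_learn(model, daily_batches):
    """Score each day's batch before training on it.

    Each batch is a list of (key, label) pairs. Returns the mean absolute
    error per day, measured before that day's data is used for training.
    """
    daily_error = []
    for batch in daily_batches:
        # Step 1: today's data plays the role of testing data.
        errors = [abs(model.predict(key) - label) for key, label in batch]
        daily_error.append(sum(errors) / len(errors))
        # Step 2: the same data becomes training data for tomorrow's model.
        model.update(batch)
    return daily_error
```

In this sketch the model is never evaluated on data it has already trained on, yet every example eventually contributes to training, which is the pattern the paragraph above describes.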
Consequently, the data used for training a model will no doubt be imperfect, for several reasons. Real-world data sources can be incomplete or unstructured (such as open-answer customer questionnaires), and they can also come from a biased collection process. For instance, the data used to train a recommendation model are normally collected from the feedback on another recommender system that is currently serving online. The data collected are therefore biased by the online serving model.
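One common way to reason about this kind of logging bias is inverse propensity scoring (IPS). The article does not name this technique, so treat the sketch below as an assumption brought in for illustration. The logs, item names, and propensities are made up: the serving model shows item A 90% of the time and item B 10% of the time, so raw click counts favor A even though B has the higher click rate when shown.

```python
def example_logs():
    """Hypothetical logs: (shown_item, propensity_of_showing_it, clicked)."""
    logs = []
    logs += [("A", 0.9, 1)] * 18 + [("A", 0.9, 0)] * 72  # A shown 90 times, 18 clicks
    logs += [("B", 0.1, 1)] * 5 + [("B", 0.1, 0)] * 5    # B shown 10 times, 5 clicks
    return logs


def raw_click_counts(logs):
    """Count clicks per item, ignoring how often each item was shown."""
    counts = {}
    for shown_item, _propensity, clicked in logs:
        counts[shown_item] = counts.get(shown_item, 0) + clicked
    return counts


def ips_click_rate(logs, target_item):
    """Estimate the click rate if target_item were always shown.

    Each logged click on target_item is up-weighted by 1 / propensity, which
    compensates for how rarely the serving model chose to show that item.
    """
    weighted_clicks = 0.0
    for shown_item, propensity, clicked in logs:
        if shown_item == target_item:
            weighted_clicks += clicked / propensity
    return weighted_clicks / len(logs)
```

With these made-up logs, the raw counts rank A above B, while the IPS estimates rank B above A. The correction relies on knowing the propensities, which in practice means the serving system has to log them.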
Additionally, the true outcome we really care about is usually the hardest to evaluate. Let’s take digital marketing for eCommerce as an example. The most straightforward customer journey would be ‘click item, view item, add item to cart, purchase item’. However, the process is rarely this simple: people might look at an item several times on different devices, and they may remove it from the cart before putting it back in, or abandon the purchase altogether. Usually, data on actions in the deeper funnel (e.g. purchases) are much harder to obtain than data on actions in the upper funnel. If an MLaaS model relies only on the simplest metrics (e.g. clicks and views), its suggestions (e.g. when to send out marketing messages) may not align with the ultimate business goal.
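A small sketch shows how an upper-funnel metric and a deeper-funnel metric can point in opposite directions. The `FunnelCounts` type, the helper functions, and the idea of comparing send times are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class FunnelCounts:
    """Hypothetical per-option event counts along the customer journey."""
    impressions: int
    clicks: int
    add_to_cart: int
    purchases: int


def click_rate(counts):
    """Upper-funnel metric: clicks per impression."""
    return counts.clicks / counts.impressions if counts.impressions else 0.0


def purchase_rate(counts):
    """Deeper-funnel metric: purchases per impression."""
    return counts.purchases / counts.impressions if counts.impressions else 0.0


def best_option(options, metric):
    """Return the option name that maximizes the given metric."""
    return max(options, key=lambda name: metric(options[name]))
```

For example, with `options = {"morning": FunnelCounts(1000, 200, 20, 2), "evening": FunnelCounts(1000, 80, 30, 12)}`, calling `best_option(options, click_rate)` picks the morning send time, while `best_option(options, purchase_rate)` picks the evening one. A model optimized only for clicks would choose the option that sells less.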
Finally, a B2B AI company that provides machine learning as a service normally needs to serve thousands of customers, or even more, from different domains. This means there will consistently be at least several thousand models serving online. Furthermore, for those models to keep performing against ongoing and constantly shifting business goals, they need to be retrained or updated every day to keep up with evolving real-world scenarios. To achieve this, one needs not only to design an automated training pipeline but also to guarantee that models have close to zero probability of converging to a bad local optimum.
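As a rough illustration of one safeguard such a pipeline might include, the sketch below combines two simple ideas: training from several starting points and keeping the best result, and deploying a retrained model only if it is not clearly worse than the one already serving. Both are assumptions chosen for the example, not the approach described in this article, and they reduce the risk of a bad local optimum rather than eliminate it. The one-dimensional loss function is a toy with two local minima, and all function names are hypothetical.

```python
def loss(x):
    """Toy non-convex loss with a global minimum near x = -1.30
    and a worse local minimum near x = 1.13."""
    return x**4 - 3 * x**2 + x


def grad(x):
    """Derivative of the toy loss."""
    return 4 * x**3 - 6 * x + 1


def train_once(start, lr=0.01, steps=500):
    """Plain gradient descent from one starting point; returns (x, loss)."""
    x = start
    for _ in range(steps):
        x -= lr * grad(x)
    return x, loss(x)


def train_with_restarts(starts):
    """Train from every starting point and keep the lowest-loss result."""
    return min((train_once(start) for start in starts), key=lambda result: result[1])


def safe_to_deploy(candidate_loss, serving_loss, tolerance=0.05):
    """Gate: replace the serving model only if the candidate is not
    worse than it by more than the tolerance."""
    return candidate_loss <= serving_loss + tolerance
```

Starting from `1.5` alone, gradient descent settles in the worse local minimum, while `train_with_restarts([1.5, -1.5])` finds the better one. In a daily retraining loop across thousands of models, the `safe_to_deploy` check would keep yesterday's model in place whenever a retrain goes wrong.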
Ensuring the overall stability and robustness of MLaaS models is critical. It is no doubt challenging and requires significant ongoing investment, research, and experimentation, but the rewards for businesses can be huge, allowing them to quickly adapt and pivot to changing business environments and allowing them to stay ahead of the game.
I look forward to addressing this topic further at ODSC APAC on December 9, 2020, during my talk, “Machine Learning as a Service: Challenges and Opportunities”.
* This article was originally published on Open Data Science.