
Floarea Serban
IT-Architect
Automated Machine Learning (AutoML) has gained considerable popularity. All major cloud providers have done research in this area and offer AutoML services.
The various AutoML services automatically build models from previously provided data, evaluating numerous ML algorithms in the process. The generated models can be deployed on the public cloud or in a container and later integrated into applications via an API.
AutoML is a process that automates the repetitive and reusable tasks of the data science process. It allows data engineers, data scientists, analysts and developers to build models faster, more efficiently and at greater scale while still achieving good model quality.
According to Gartner, AutoML has attracted great interest in recent years for use cases such as "sales lead scoring, risk assessment and next-best-action recommendation". In addition, one of Gartner's strategic planning assumptions is that by 2022 the share of applications using AutoML will rise from 1% to 25%.
The traditional development of ML models requires considerable resources and a range of different roles. On the one hand, data engineers must obtain and provide large amounts of data from different sources. On the other hand, it is up to data scientists and business analysts to understand, aggregate and transform the data. At the same time, ML researchers are constantly developing and optimizing new algorithms and model structures. Software developers integrate these into reusable libraries, which finally end up in the applications. Only then is the circle closed, and the optimized models can (at best) generate business value.
AutoML aims to automate the entire data science process, from data cleansing to parameter optimization. This process comprises the following steps:
Until now, most AutoML tools have focused on model selection and parameter optimization.
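To make "model selection and parameter optimization" more concrete, here is a minimal sketch of random search over a combined space of algorithms and their hyperparameters (often called the CASH problem: combined algorithm selection and hyperparameter optimization). The algorithm names, hyperparameter ranges and the `toy_evaluate` function are hypothetical; a real AutoML tool would train and cross-validate each configuration instead, and usually relies on smarter strategies such as Bayesian optimization.

```python
import random
from typing import Any, Callable, Dict, List, Optional, Tuple

Config = Dict[str, Any]

# Hypothetical search space: each algorithm has its own hyperparameters.
SEARCH_SPACE: Dict[str, Dict[str, List[Any]]] = {
    "knn": {"n_neighbors": [1, 3, 5, 7, 9]},
    "decision_tree": {"max_depth": [2, 4, 8, 16], "min_samples_leaf": [1, 5, 10]},
    "logistic_regression": {"C": [0.01, 0.1, 1.0, 10.0]},
}


def sample_config(space: Dict[str, Dict[str, List[Any]]], rng: random.Random) -> Config:
    """Choose an algorithm first, then sample only that algorithm's hyperparameters."""
    algorithm = rng.choice(sorted(space))
    params = {name: rng.choice(values) for name, values in space[algorithm].items()}
    return {"algorithm": algorithm, **params}


def random_search(
    space: Dict[str, Dict[str, List[Any]]],
    evaluate: Callable[[Config], float],
    n_trials: int,
    seed: int = 0,
) -> Tuple[Optional[Config], float]:
    """Evaluate n_trials random configurations; return the best one and its score (higher is better)."""
    rng = random.Random(seed)
    best_config: Optional[Config] = None
    best_score = float("-inf")
    for _ in range(n_trials):
        config = sample_config(space, rng)
        score = evaluate(config)  # in a real tool: train and cross-validate this configuration
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config: Config) -> float:
    """Hypothetical stand-in for training and validation: pretends a depth-8 tree works best."""
    if config["algorithm"] == "decision_tree":
        return 0.9 - abs(config["max_depth"] - 8) * 0.01
    return 0.5


if __name__ == "__main__":
    best, score = random_search(SEARCH_SPACE, toy_evaluate, n_trials=50)
    print(best, score)
```

Sampling the algorithm first and then only its own hyperparameters keeps the search space conditional, which is what distinguishes CASH from tuning a single fixed model.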
AutoML is nothing new: researchers have been working in this area for years. It started with the development of hyperparameter optimization methods for individual models and now extends to methods such as automated stacking, neural architecture search, pipeline optimization and feature engineering. The field has become significantly more popular in recent years as use of the cloud has grown. This raises the question: what are the latest developments from the cloud providers, and what solutions do they offer?
Besides the large cloud providers, there are several other AutoML platforms and tools, such as Databricks, DataRobot, IBM, RapidMiner, H2O.ai and the open-source library TPOT.
The three largest cloud providers in Gartner's Magic Quadrant for Cloud AI Developer Services (see Figure 2) are analyzed below. Each of them provides its own platform.
The analysis is based on the suppliers' product catalogues as well as on experience from various sources. A detailed comparison of the remaining products can be found here.
Amazon, Google and Microsoft (in alphabetical order, no ranking) launched their AutoML services at about the same time: since 2018, AWS SageMaker Automatic Model Tuning, Google Cloud AutoML and Microsoft Azure AutoML have been available on their respective cloud platforms.
SageMaker Autopilot is part of Amazon SageMaker, the AWS machine learning service, which provides a platform and tools to support the entire life cycle of an ML project. You can read more about it in our blog «Why Machine Learning in the Cloud - the example of Amazon SageMaker».
In summary, SageMaker Autopilot works as follows:
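As an illustration of how an Autopilot job can be started programmatically, the sketch below assembles the request for the `create_auto_ml_job` operation of the boto3 SageMaker client. The job name, S3 URIs, target column and IAM role ARN are placeholders. The request is deliberately minimal: `ProblemType` is left out, so Autopilot infers it. Actually submitting the job requires boto3, AWS credentials and the corresponding permissions.

```python
from typing import Any, Dict


def build_autopilot_request(
    job_name: str,
    train_s3_uri: str,
    target_column: str,
    output_s3_uri: str,
    role_arn: str,
    max_candidates: int = 10,
) -> Dict[str, Any]:
    """Assemble keyword arguments for the SageMaker CreateAutoMLJob operation."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                # Tabular training data (e.g. CSV files) under an S3 prefix
                "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": train_s3_uri}},
                "TargetAttributeName": target_column,  # column Autopilot should predict
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},  # where job artifacts are written
        "RoleArn": role_arn,  # IAM role that SageMaker assumes
        # Limit how many candidate pipelines Autopilot trains
        "AutoMLJobConfig": {"CompletionCriteria": {"MaxCandidates": max_candidates}},
        # ProblemType is omitted, so Autopilot infers it from the target column
    }


def submit_autopilot_job(request: Dict[str, Any]) -> Dict[str, Any]:
    """Start the job; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so building the request works without AWS

    client = boto3.client("sagemaker")
    return client.create_auto_ml_job(**request)


if __name__ == "__main__":
    # All values below are hypothetical placeholders
    request = build_autopilot_request(
        job_name="churn-autopilot-demo",
        train_s3_uri="s3://my-bucket/churn/train/",
        target_column="churned",
        output_s3_uri="s3://my-bucket/churn/output/",
        role_arn="arn:aws:iam::123456789012:role/MySageMakerRole",
    )
    print(request)  # pass it to submit_autopilot_job(request) to actually launch the job
```

Separating request construction from submission keeps the configuration testable without touching AWS.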
Building on the research results of Google Research, Google Cloud AutoML offers specialized solutions for areas such as Natural Language Processing (NLP), Computer Vision and tabular data (Tables). The products are based on transfer learning and neural architecture search. The tools for model development and model selection are proprietary to Google, and how they work is not disclosed. The NLP product can train custom models for four different tasks:
Training a model can take several hours, depending on the size of the data. Once training has finished successfully, various model metrics can be inspected, for example how accurate the model is and how well it performs.
AutoML Vision simplifies the entire ML process for the user: all that is required is to provide the images with the appropriate labels. Once the model is fully trained, AutoML Vision presents an overview of its performance using several measures (precision, recall, confusion matrix, etc.), shown as diagrams.
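The measures mentioned for AutoML Vision are standard classification metrics. The following minimal sketch (plain Python, not the AutoML Vision API) shows how a confusion matrix and per-class precision and recall are derived from true and predicted labels; the example labels are made up.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def confusion_matrix(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], labels: Sequence[Hashable]
) -> List[List[int]]:
    """matrix[i][j] = number of samples with true class labels[i] predicted as labels[j]."""
    counts = Counter(zip(y_true, y_pred))  # missing pairs count as 0
    return [[counts[(t, p)] for p in labels] for t in labels]


def precision_recall(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], labels: Sequence[Hashable]
) -> Dict[Hashable, Tuple[float, float]]:
    """Per-class (precision, recall), read off the confusion matrix."""
    matrix = confusion_matrix(y_true, y_pred, labels)
    result = {}
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        predicted_as_label = sum(row[i] for row in matrix)  # column sum
        actually_label = sum(matrix[i])  # row sum
        precision = true_positives / predicted_as_label if predicted_as_label else 0.0
        recall = true_positives / actually_label if actually_label else 0.0
        result[label] = (precision, recall)
    return result


if __name__ == "__main__":
    y_true = ["cat", "cat", "dog", "dog", "dog"]
    y_pred = ["cat", "dog", "dog", "dog", "cat"]
    print(confusion_matrix(y_true, y_pred, ["cat", "dog"]))
    print(precision_recall(y_true, y_pred, ["cat", "dog"]))
```

For the example labels, the confusion matrix is `[[1, 1], [1, 2]]`, so "cat" has precision and recall of 0.5 and "dog" has precision and recall of 2/3.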
AutoML Tables can be used for structured (tabular) data. Before training, AutoML Tables performs the following feature engineering tasks:
In AutoML Tables, different model architectures are trained in parallel, which makes it possible to find a suitable architecture within a short time. The following model architectures are supported:
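The idea of training several architectures at the same time and keeping the best one can be sketched with Python's `concurrent.futures`. This is not how AutoML Tables is implemented; the architecture names and the dummy training functions (which simply return fixed scores) are hypothetical stand-ins for real training and validation runs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple


def train_in_parallel(
    candidates: Dict[str, Callable[[], float]], max_workers: int = 4
) -> Tuple[str, Dict[str, float]]:
    """Run each candidate's training function concurrently; return the best name and all scores."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(train) for name, train in candidates.items()}
        scores = {name: future.result() for name, future in futures.items()}
    best = max(scores, key=scores.__getitem__)  # highest validation score wins
    return best, scores


if __name__ == "__main__":
    # Hypothetical architectures with dummy "training" that just returns a fixed score
    dummy_candidates = {
        "linear": lambda: 0.71,
        "feedforward_nn": lambda: 0.78,
        "gradient_boosted_trees": lambda: 0.82,
    }
    print(train_in_parallel(dummy_candidates))
```

Threads keep the example short; CPU-bound training written in Python would normally run in separate processes or, as in the cloud services, on separate machines.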
Microsoft's AutoML services are likewise the result of Microsoft's research in recent years. Microsoft uses probabilistic methods to make automated decisions, and meta-learning to reduce the complexity of high-dimensional optimization problems and to transfer knowledge across datasets and problems. Microsoft recommends using AutoML for three problem types: classification, regression and time-series forecasting.
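Meta-learning can be illustrated with a simple warm start: describe each dataset by a few meta-features, look up the previously seen datasets most similar to the new one, and begin the search with the configurations that worked best there. The sketch below does this with a nearest-neighbour lookup. It is a simplified stand-in, not Microsoft's method, and the chosen meta-features and history entries are hypothetical.

```python
import math
from typing import Any, Dict, List, Sequence


def meta_features(n_rows: int, n_features: int, minority_class_ratio: float) -> List[float]:
    """Describe a dataset by a few (hypothetical) meta-features."""
    # Log scale so that a 10x size difference counts the same at any magnitude
    return [math.log10(n_rows), math.log10(n_features), minority_class_ratio]


def warm_start(
    new_meta: Sequence[float], history: List[Dict[str, Any]], k: int = 2
) -> List[Dict[str, Any]]:
    """Return the best-known configurations of the k most similar previous datasets."""
    ranked = sorted(history, key=lambda entry: math.dist(new_meta, entry["meta"]))
    return [entry["best_config"] for entry in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical results from earlier experiments
    history = [
        {"meta": meta_features(1_000, 10, 0.5), "best_config": {"algorithm": "logistic_regression"}},
        {"meta": meta_features(1_000_000, 200, 0.1), "best_config": {"algorithm": "gradient_boosting"}},
        {"meta": meta_features(5_000, 20, 0.4), "best_config": {"algorithm": "random_forest"}},
    ]
    print(warm_start(meta_features(2_000, 15, 0.45), history, k=2))
```

The returned configurations would then be evaluated first, so the optimizer starts from promising regions instead of from scratch.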
Microsoft Azure AutoML trains models towards a user-defined target metric. It focuses on the following steps of the ML process:
The service iterates through the ML algorithms, combining them with feature selection. Each iteration produces a model with an associated training score; the higher the score, the better the model. During training, Azure ML creates many parallel pipelines to test different algorithms and parameters. The experiment is considered complete once the score it achieves meets the target criterion.
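The loop described above (one scored model per iteration, stopping once the target score is reached) can be sketched as follows. For readability the candidates run sequentially rather than in parallel pipelines; the function and parameter names are our own, not Azure's API, and each training function is a hypothetical placeholder that returns a score where higher is better.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def run_experiment(
    candidates: Iterable[Tuple[str, Callable[[], float]]],
    exit_score: float,
    max_iterations: int,
) -> Tuple[Optional[str], float, List[Tuple[str, float]]]:
    """Train candidates one after another; stop when the target score or iteration limit is reached."""
    leaderboard: List[Tuple[str, float]] = []
    best_name: Optional[str] = None
    best_score = float("-inf")
    for iteration, (name, train) in enumerate(candidates):
        if iteration >= max_iterations:
            break  # iteration budget exhausted
        score = train()  # each iteration yields one model and its score
        leaderboard.append((name, score))
        if score > best_score:
            best_name, best_score = name, score
        if score >= exit_score:
            break  # target criterion met: the experiment is complete
    return best_name, best_score, leaderboard


if __name__ == "__main__":
    # Hypothetical candidates with dummy scores
    candidates = [("logistic_regression", lambda: 0.74), ("lightgbm", lambda: 0.86), ("random_forest", lambda: 0.88)]
    print(run_experiment(candidates, exit_score=0.85, max_iterations=10))
```

The leaderboard mirrors what the service reports: how many models were tried and which score each achieved.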
Azure AutoML shows how many models were tested, what score each achieved and how long training took. The best model can be deployed directly as a web service. AutoML is also integrated into other Microsoft services and products such as ML.NET, HDInsight, Power BI and SQL Server.
Despite its advantages, AutoML also has some shortcomings that must not be ignored. The most important ones are:
In summary, AutoML makes it possible to develop production-ready models much faster. Data scientists and data analysts as well as software developers in various industries can use AutoML to: