AI Across Industries Archives | DataRobot AI Platform
https://www.datarobot.com/blog/category/ai-across-industries/

How to Improve Anti-Money Laundering Programs with AutoML
https://www.datarobot.com/blog/how-to-improve-anti-money-laundering-programs-with-automl/ (June 29, 2023)

How big a problem is money laundering? Worldwide, an estimated $2 trillion is laundered every year, and it is directly tied to an array of criminal activities. For financial organizations, anti-money laundering (AML) can present a relentless hurdle. Among millions of transactions, AML teams must find the small fraction that are genuinely problematic. And that takes plenty of time and resources. 

The good news is that AI is a perfect antidote to money laundering. Even better, we're not starting from scratch: most financial institutions already have an AML process in place that AI can plug right into to enhance efficiency.

Traditionally, transactions are run through a rules-based system that flags the ones that look suspicious. Each flagged transaction generates an alert that goes through a manual review process to decide whether a suspicious activity report (SAR) should be filed. This approach is inefficient: it produces a large pile of alerts that are generally unranked, and many of them turn out to be false positives. 

By inserting AI into the existing process, we can rank the alerts by risk, determine which ones are actually worth investigating as a priority, and make the whole process more efficient, allowing the experts to focus their attention on the highest-risk alerts first. 
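
To make the ranking idea concrete, here is a minimal sketch (plain Python with pandas and scikit-learn, not DataRobot code) of scoring a day's alerts with a trained classifier and sorting them by predicted SAR risk; the file names and columns are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical historical alerts: engineered numeric features plus a binary SAR label.
history = pd.read_csv("historical_alerts.csv")
X, y = history.drop(columns=["SAR"]), history["SAR"]

# Any probabilistic classifier works for the illustration.
model = GradientBoostingClassifier().fit(X, y)

# Score today's unreviewed alerts and rank them by predicted SAR risk.
todays_alerts = pd.read_csv("todays_alerts.csv")
todays_alerts["sar_probability"] = model.predict_proba(todays_alerts[X.columns])[:, 1]
work_queue = todays_alerts.sort_values("sar_probability", ascending=False)
print(work_queue.head(20))  # the highest-risk alerts for analysts to review first
```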

What Does the Model Building Process Look Like? 

Speed. Quality. Transparency. These are the three criteria essential to any successful anti-money laundering program. Finding suspicious activity is like trying to hit a moving target. Data science teams need to move fast, and they need to find high-priority suspicious activity without chasing false positives. And because financial services is such a highly regulated industry, the models need to be fully transparent—easy to explain to regulators and stakeholders. 

Customer Success Story
Valley Bank Reduces Anti-Money Laundering False Positive Alerts by 22%

Enter DataRobot to speed up the process dramatically, reduce false positives, and automatically create compliance reports, saving data scientists hours of manual work. In our webinar, How to Improve Anti-Money Laundering Programs with Automated Machine Learning, I take a deep dive into how financial organizations can use DataRobot to win against money launderers. 

Building Inside the DataRobot AI Platform

Start by selecting a data source. Once you go into the AI Catalog, you can see all the tables you’re already connected to. Here we are using Google BigQuery.

DataRobot + Google BigQuery

First, though, let’s look at the data. In this sample dataset, we see the historical data we used to train our models. We can see that alerts were generated some time ago, each of which may or may not have had a suspicious activity report (SAR) filed. There’s also a lot of other contextual data here–customer risk score, the date, total spend, and even the call center notes (text data).

AML Sample Dataset DataRobot

Next we create the modeling project. 

Remember that my goals are threefold: 

  1. Accelerate the process of identifying problematic transactions. (Speed)
  2. Be more accurate in identifying suspicious activity. (Quality)
  3. Explain and document each step. (Transparency)

Once you bring in the data, DataRobot asks what you want to predict. We select SAR as the target, and DataRobot first shows a quick distribution of SAR in the data, so you can see what your target looks like.
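
For teams who prefer code to the UI, the same project setup can be driven from the DataRobot Python client. The sketch below is illustrative only—the endpoint, token, file name, and autopilot mode are placeholders, and exact method names can vary across client versions.

```python
import datarobot as dr

# Placeholders: endpoint and token come from your DataRobot account.
dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

# Create a project from the primary AML dataset and pick SAR as the target.
project = dr.Project.create(sourcedata="aml_alerts.csv", project_name="AML SAR Prediction")
project.set_target(target="SAR", mode=dr.AUTOPILOT_MODE.QUICK)

# Wait for Autopilot, then inspect the leaderboard.
project.wait_for_autopilot()
for model in project.get_models()[:5]:
    print(model.model_type, model.metrics[project.metric]["validation"])
```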

Secondary AML datasets DataRobot AI Platform

Secondary datasets. In addition to the primary dataset, DataRobot can automatically connect to additional datasets that enrich the training data. It joins all input datasets and generates new features that can improve model accuracy, as sketched below. 
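
DataRobot Feature Discovery performs these joins and aggregations automatically. Purely to illustrate the idea, here is a hand-rolled pandas version with hypothetical table and column names.

```python
import pandas as pd

alerts = pd.read_csv("alerts.csv")              # primary dataset: one row per alert
transactions = pd.read_csv("transactions.csv")  # secondary dataset: many rows per customer

# Aggregate the secondary table up to the customer level...
txn_features = transactions.groupby("customer_id").agg(
    txn_count_90d=("amount", "size"),
    txn_total_90d=("amount", "sum"),
    txn_max_90d=("amount", "max"),
).reset_index()

# ...and join the new features back onto the primary training data.
training_data = alerts.merge(txn_features, on="customer_id", how="left")
```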

DataRobot will also automatically identify any data quality issues—inliers, outliers, too many zeros, and other potential problems—so that you stay on track with quality as you speed through the modeling process. 

Once you click the Start button, DataRobot kicks off a rapid experimentation process—experimenting with feature engineering and data enrichment steps. It starts training hundreds of models, searching for the champion model that gives the best chance of success. At this stage, you are presented with new insights, including how important each input feature is to the target, ranked in order of importance.

You’ll also see new features that were not there in the original primary dataset. This means that DataRobot did find value in the secondary dataset and automatically generated new features across all our input data. 

DataRobot found value in the secondary dataset and automatically generated new features

To be fully transparent in this tightly regulated industry, you can click in and look at feature lineage. It will take you all the way back to where each feature was pulled from and what transformations were done. For any new feature, you can look at the lineage and explain how this feature was generated. 

Feature lineage DataRobot AI Platform

Speed

We’ve gotten the champion model quickly, but we need to check the quality and the transparency of the model. By drilling down into it, we can see what algorithms and techniques were used. It also shows all the steps that were taken along the way. You can further fine-tune the parameters you want and compare it with the original model. 

Model leaderboard DataRobot

Evaluate the quality

How good or bad is this model at actually predicting an outcome? Click on Evaluate to look at the ROC curve or the lift chart. This is the point where you decide on the threshold for suspicious activity. Don't think of it only from the data science point of view: remember what the model will be used for within the context of the business, and keep in mind the cost and benefit of each outcome. As you interactively test different thresholds, the numbers in the confusion matrix change in real time, and you can ask the business what cost they assign to a false positive to help determine the optimal threshold. 
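
The platform lets you explore this trade-off interactively. Conceptually, the same logic can be written out as a threshold sweep over business-assigned costs; the cost figures below are made up for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def best_threshold(y_true, y_prob, cost_fp=50.0, cost_fn=5000.0):
    """Pick the alert threshold that minimizes expected business cost."""
    best, best_cost = None, np.inf
    for threshold in np.linspace(0.01, 0.99, 99):
        y_pred = (y_prob >= threshold).astype(int)
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
        cost = fp * cost_fp + fn * cost_fn  # wasted reviews + missed SARs
        if cost < best_cost:
            best, best_cost = threshold, cost
    return best, best_cost
```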

ROC Curve DataRobot

Transparency

As noted, in a highly regulated industry, transparency is of paramount importance. Click the Understand button. Feature Impact tells you which features have the greatest impact on the model's accuracy and what is really driving its behavior. Maybe you use this information to understand customer behavior and improve your Know Your Customer (KYC) scoring. Maybe you use it for process improvement, such as asking customers the right questions when they open an account. 
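
Feature Impact in DataRobot is typically computed with permutation importance (or SHAP). For intuition, a scikit-learn equivalent on a held-out set might look like this, reusing the model and validation data from the earlier sketch.

```python
from sklearn.inspection import permutation_importance

# model, X_valid, y_valid: the classifier and a held-out validation set.
result = permutation_importance(model, X_valid, y_valid, n_repeats=10, random_state=0)
ranked = sorted(zip(X_valid.columns, result.importances_mean),
                key=lambda item: item[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.4f}")
```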

Feature impact DataRobot AI Platform

You can also explore how changes to a model's inputs change its output. Go to Feature Effects to check how the model's output changes when one particular parameter is varied. This lets you look for a model's blind spots. 
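
Feature Effects is closely related to partial dependence. A rough scikit-learn analogue, assuming a hypothetical total_spend feature, is:

```python
from sklearn.inspection import PartialDependenceDisplay

# How does the predicted SAR probability move as one feature (here, a hypothetical
# total_spend column) changes, with everything else held at observed values?
PartialDependenceDisplay.from_estimator(model, X_valid, features=["total_spend"])
```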

Explainability. So far, you can see the effect of one feature at a time, but in real life your model is driven by multiple features at once. If you want to understand why a particular prediction was made, you can see all the variables that affected that prediction in combination: how much did each of them contribute to the outcome? 
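
Outside DataRobot, per-prediction explanations of this kind are often produced with SHAP. A minimal sketch for a single alert (assuming a tree-based model; the exact output shape varies by model type) could be:

```python
import shap

# Explain a single alert: how much did each feature push this prediction up or down?
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_valid.iloc[[0]])
for name, value in zip(X_valid.columns, shap_values[0]):
    print(f"{name}: {value:+.3f}")
```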

Prediction Explanations DataRobot AI Platform

Because this is a use case for a regulated industry, you need to document all of this for your compliance team. Under the Compliance tab, with the click of a button, DataRobot automatically generates a 60-page compliance report that captures all of the assumptions, the feature engineering steps, the secondary tables, and everything that was done to arrive at the final model. 

It’s a simple Word document that saves you hours and hours of compliance work if you are a data scientist in a regulated industry.

compliance report DataRobot

Predict tab. There are a lot of options to deploy the model. With one click, I can deploy it to a predictions server and then it will be added to the MLOps dashboard, which you can see under the Deployments tab. 

No matter how good your model was when you trained it, it’s going to degrade over time. Data and external factors are going to change. Businesses change. You will want to monitor your model over time. At the top, I can see how all my deployed models are doing in terms of data drift, accuracy and even service health. Have risk factors changed? How are my models holding up in the long run?
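
DataRobot MLOps tracks drift for you; to see what a drift signal means, here is a rough, hand-rolled Population Stability Index check for a single feature (a common rule of thumb, not DataRobot's exact method):

```python
import numpy as np

def population_stability_index(training_values, production_values, bins=10):
    """Rough drift score for one feature: compare training vs. production distributions."""
    edges = np.unique(np.quantile(training_values, np.linspace(0, 1, bins + 1)))
    edges[0], edges[-1] = -np.inf, np.inf
    expected = np.histogram(training_values, bins=edges)[0] / len(training_values) + 1e-6
    actual = np.histogram(production_values, bins=edges)[0] / len(production_values) + 1e-6
    return float(np.sum((actual - expected) * np.log(actual / expected)))

# Common rule of thumb: a PSI above ~0.2 suggests the feature has drifted and the
# model may need retraining.
```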

Deployments tab DataRobot

I can also see where these models were deployed. Models can be built and hosted elsewhere, but they can still be managed and tracked in this dashboard. DataRobot is a central location to govern and manage any and all models, not just models created in DataRobot. 

DataRobot Brings You Speed, Quality, and Transparency Automatically

To stay ahead of money laundering, financial institutions need the features that DataRobot brings to the table: 

  • Automated Feature Engineering takes care of tedious, manual processes. 
  • Rapid Experimentation allows you to fine tune models and make additional enhancements. 
  • The user-friendly interface allows you to solve problems quickly and find blind spots. 
  • Data Quality Assessment helps you understand how healthy your data is, a key metric in highly regulated industries. 
  • The Interactive Model Threshold allows you to set the right thresholds for your business. It checks for false positives and negatives and shows what the effect on the business is, thereby ensuring the quality of the model. 
  • Automated monitoring and retraining allows you to maintain the quality of your model. 
  • Feature lineage, explainability, and automated compliance documentation are mandatory for transparency in the financial services industry, and DataRobot provides them automatically. 
Webinar
How to Improve Anti-Money Laundering Programs with AutoML
Watch on-demand

A Data Scientist Explains: When Does Machine Learning Work Well in Financial Markets?
https://www.datarobot.com/blog/a-data-scientist-explains-when-does-machine-learning-work-well-in-financial-markets/ (January 17, 2023)

As a data scientist, one of the best things about working with DataRobot customers is the sheer variety of highly interesting questions that come up. Recently, a prospective customer asked me how I reconcile the fact that DataRobot has multiple very successful investment banks using DataRobot to enhance the P&L of their trading businesses with my comments that machine learning models aren’t always great at predicting financial asset prices. Peek into our conversation to learn when machine learning does—and doesn’t—work well in financial markets use cases.

Why is machine learning able to perform well in high-frequency trading applications but so bad at predicting asset prices over longer horizons? 

While there have been some successes in the industry using machine learning for price prediction, they have been few and far between. As a rule of thumb, the shorter the prediction time horizon, the better the odds of success.

ML applications in trading

Generally speaking, market making use cases that DataRobot (and other machine learning approaches) excel at share one or more of the following characteristics:

  • For forward price prediction: a very short prediction horizon (typically within the next one to 10 seconds), the availability of good order book data, and an acknowledgment that even a model that is 55%–60% accurate is useful—it's ultimately a percentage game (a quick expected-value sketch follows this list).
  • For price discovery (e.g., establishing an appropriate price for illiquid securities, predicting where liquidity will be located, and determining appropriate hedge ratios) as well as more generally: the existence of good historical trade data on the assets to be priced (e.g., TRACE, Asian bond market reporting, ECNs’ trade history) as well as a clear set of more liquid assets which can be used as predictors (e.g., more liquid credits, bond futures, swaps markets, etc.).
  • For counterparty behavior prediction: some form of structured data which contains not only won trades but also unsuccessful requests/responses.
  • Across applications: an information edge, for instance from commanding a large share of the flow in that asset class, or from having customer behavior data that can be used.
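
The "percentage game" point from the first bullet is easy to see with a back-of-the-envelope expected-value calculation (hypothetical symmetric payoffs, no costs):

```python
# Why a 55% hit rate can still pay: expected value per trade, assuming
# (hypothetically) symmetric gains and losses and ignoring transaction costs.
hit_rate = 0.55
gain_if_right, loss_if_wrong = 1.0, 1.0   # arbitrary units
edge_per_trade = hit_rate * gain_if_right - (1 - hit_rate) * loss_if_wrong
print(round(edge_per_trade, 2))  # 0.1 units per trade; over thousands of trades this
                                 # compounds, while costs and slippage erode it
```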

Areas where any form of machine learning will struggle are typically characterized by one or more of these aspects:

  • Rapidly changing regimes, behaviors and drivers: a key reason why longer-term predictions are so hard. We very often find that the key model drivers change very regularly in most financial markets, with a variable that’s a useful indicator for one week or month having little information content in the next. Even in successful applications, models are re-trained and re-deployed very regularly (typically at least weekly).
  • Infrequent data: a classic example here is monthly or less frequent data. In such cases, the behavior being modeled typically changes so often that by the time that enough training data for machine learning has accrued (24 months or above), the market is in a different regime. For what it’s worth, a few of our customers have indeed had some success at, for instance, stock selection using predictions on a one-month horizon, but they’re (understandably) not telling us how they’re doing it. 
  • Sparse data: where there’s insufficient data available to get a good picture of the market in aggregate, such as certain OTC markets where there aren’t any good ECNs.
  • An absence of predictors: in general, data on past behavior of the variable being predicted (e.g., prices) isn’t enough. You also need data describing the drivers of that variable (e.g., order books, flows, expectations, positioning). Past performance is not indicative of future results… . 
  • Limited history of similar regimes: because machine learning models are all about recognising patterns in historical data, new markets or assets can be very difficult for ML models. This is known in academia as the “cold start problem.” There are various strategies to deal with it, but none of them are perfect.
  • Not actually being a machine learning problem: Value-at-Risk modeling is the classic example here—VaR isn’t a prediction of anything, it’s a statistical summation of simulation results. That said, predicting the outcome of a simulation is an ML problem, and there are some good ML applications in pricing complex, path-dependent derivatives.

Finally, and aside from the above, a critical success factor in any machine learning use case which shouldn’t be underestimated is the involvement of capable and motivated people (typically quants and sometimes data scientists) who understand the data (and how to manipulate it), business processes, and value levers. Success is usually driven by such people carrying out many iterative experiments on the problem at hand, which is ultimately where our platform comes in. As discussed, we massively accelerate that process of experimentation. There’s a lot that can be automated in machine learning, but domain knowledge can’t be.  

To summarize: it’s fair to say that the probability of success in trading use cases is positively correlated with the frequency of the trading (or at least negatively with the holding period/horizon) with a few exceptions to prove the rule. It’s also worth bearing in mind that machine learning is often better at second-order use cases such as predicting the drivers of markets, for instance, event risk and, to some extent, volumes, rather than first-order price predictions—subject to the above caveats.


Showcasing the Power of AI in Investment Management: a Real Estate Case Study
https://www.datarobot.com/blog/showcasing-the-power-of-ai-in-investment-management-a-real-estate-case-study/ (December 20, 2022)

The use of artificial intelligence (AI) in the investment sector is proving to be a significant disruptor, catalyzing connections between the different players and delivering a more vivid picture of future risks and opportunities across all market segments. Real estate investments are no exception. In this article, we’ll showcase how AI can improve the assessment of a potential investment’s future performance, with a specific example from the real estate segment.

The lack of transparency, efficiency, and sustainability in real estate today is more a rule than an exception. One might imagine that the increase in available data would lead to greater transparency and more efficient markets, but the opposite seems to be the case as increased access to massive amounts of data has made assessing real estate assets much more complex. 

In this context, an augmented intelligence approach to the data will be increasingly critical for asset managers, investors, and real estate developers to better understand real estate assets and make better decisions aimed at optimizing both the Net Asset Value and the Net Operating Income. Yet, in the digital transformation era, the pricing and assessment of real estate assets is more difficult than brokers’ presentations, valuation reports, and traditional analytical approaches like hedonic models suggest.

Previously, we demonstrated how DataRobot AI Platform allows investors, asset managers, and real estate developers to successfully overcome most of the existing challenges regarding the real estate investment business. 

In this article, we’ll first take a closer look at the concept of Real Estate Data Intelligence and the potential of AI to become a game changer in this niche. We’ll then empirically test this assumption based on an example of real estate asset assessment. For this purpose, we will showcase an end-to-end, data-driven approach to price predictions of real estate assets through the DataRobot AI Platform.

Real Estate Data Intelligence

Today, the most critical ‘raw material’ driving the real estate market is data. Many real estate players have long made decisions based on traditional data to answer questions about the quality of an asset and of an investment’s location within a city. This usually involved gathering market and property information, socio-economic data about a city at the zip code level, and information about access to amenities (e.g., parks and restaurants) and transportation networks. The traditional assessment approach also considered factors such as market intuition and experience.

Although the amount of data has been growing exponentially—hosting new variables that may make it possible to get a better picture of a location’s future risks and opportunities—the intelligence needed to process all this data and use it to benefit real estate decisions is still relatively nascent.

Let’s assume that investors, asset managers, and real estate developers want to evaluate an asset’s performance. While the impact of proximity might be intuitive, home prices and rents are not just driven by having nearby amenities like top-tier restaurants and educational facilities. Instead, they are driven by the access to the appropriate quantity, mix and quality of neighborhood features. More is not always better. Nonlinear relationships between prices and amenities seem to be the rule rather than the exception across cities worldwide. 

Also, the right combination of proximity and density of amenities varies among neighborhoods and cities. This sweet spot has been obscured by a growing mass of newly available multimodal data (geospatial, time series, text, and image data) that is increasingly difficult to tame, such as building energy consumption spatially related to other assets in the same zip code, the number of permits issued in the last 3 months to build swimming pools, Google reviews for nearby businesses, and an asset’s exterior images captured by Google.

What would happen if an automated intelligence machine approach could process and understand all this increasingly massive multimodal data through the lens of a real estate player and use it to obtain quick actionable insights?

For example, the business of asset managers generally depends on these four fundamentals (among others):

  • Accurately estimating the current asset’s price and rent
  • Estimating the growth potential of a city and neighborhood
  • Automating and optimizing their investment strategy
  • Selling asset portfolios at a price that maximizes returns while minimizing time to market

However, they are also dealing with several challenges that may prevent them from obtaining valuable and actionable business insights. As discussed in the previous article, these challenges may include: 

  1. Automating the data preprocessing workflow of complex and fragmented data 
  2. Monitoring models in production and continuously learning in an automated way, so that they are prepared for real estate market shifts or unexpected events. 

Yet, when assessing a property’s value and the quality of an investment’s location, other key challenges arise, including: 

  1. Handling multimodal data such as images, geospatial data, and text
  2. Building analytical approaches to assess an asset’s price and rent that comply with regulations
  3. Treating customers fairly and avoiding bias in the analytical approach used to estimate a property’s value

From this viewpoint, one may argue that if an automated intelligence approach can successfully handle all these challenges while matching real estate players’ business expectations, it would become a real game changer for the industry, shedding light on the key dimensions of real estate data intelligence: efficiency, transparency, location knowledge, and actionable insights.

Predicting the Real Estate Asset’s Price Using DataRobot

Processing Multimodal Datasets

DataRobot enables users to easily combine multiple datasets into a single training dataset for AI modeling. DataRobot also processes nearly every type of data: satellite and street imagery of real estate properties using DataRobot Visual AI; the latitude and longitude of properties and nearby points of interest using DataRobot Location AI; and tweets and reviews with geotagged locations using DataRobot Text AI. Recent historical trends in neighborhoods can also be captured with DataRobot Feature Discovery, along with a variety of other details such as solar orientation, construction year, and energy performance. 

DataRobot combines these datasets and data types into one training dataset used to build machine learning models. In this illustrative example, the aim is to predict home prices at the property level in the city of Madrid, and the training dataset contains 5 different data types (numerical, categorical, text, location, and images) and more than 90 variables related to these 5 groups: 

  1. Market performance
  2. Property performance 
  3. Property features
  4. Neighborhood attributes 
  5. City’s pulse (quality and density of the points of interest)

The great thing about DataRobot Explainable AI is that it spans the entire platform. You can understand the data and the model’s behavior at any time. Once you load a training dataset and the exploratory data analysis completes, DataRobot flags any data quality issues and, if significant issues are found, automatically handles them in the modeling stage.

DataRobot Explainable AI

Rapid Modeling with DataRobot AutoML

DataRobot AutoML rapidly builds and benchmarks hundreds of modeling approaches using customized model blueprints. Using built-in automation workflows, either through the no-code Graphical User Interface (GUI) or the code-centric DataRobot for data scientists, both data scientists and non-data scientists—such as asset managers and investment analysts—can build, evaluate, understand, explain, and deploy their own models.

Enabling image augmentation generated the best results for predicting house prices across the city of Madrid. DataRobot automatically determines the best configuration for the dataset, but we can customize it further. As the figure below shows, you can customize the image augmentation—flipping, rotating, and scaling images—to increase the number of observations for each object in the training dataset and create high-performing computer vision models.
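
DataRobot configures augmentation automatically; for intuition, an equivalent hand-rolled pipeline using torchvision (an assumption for illustration—DataRobot does not expose its internals this way) might look like:

```python
from torchvision import transforms

# Flip, rotate, and rescale property photos to multiply the effective number of
# training observations seen by the image featurizer.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])
# augmented_image = augment(pil_image)  # apply to a PIL image of a property
```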

DataRobot Image augmentation

DataRobot starts modeling after we enable some additional settings, such as advanced ensembling and blueprints, a search for interactions to leverage relationships across multiple variables (potentially yielding better model accuracy), and feature constraints to integrate real estate market expertise and knowledge.

In less than an hour, DataRobot produced a house-price multimodal model that correctly predicted house prices across space and performed especially well at predicting which 10% of properties had the highest home prices. Using this model, all accuracy metrics would also comply with national valuation regulations as defined by the Bank of Spain. For example, the model produced a cross-validated RMSLE (Root Mean Squared Logarithmic Error) of 0.0825 and a cross-validated MAPE (Mean Absolute Percentage Error) of 6.215%. In terms of cross-validated MAE (Mean Absolute Error), this corresponds to a price difference of roughly +/-€24,520 on average compared to the true price.
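
For reference, the metrics quoted above are straightforward to compute by hand; a small sketch of RMSLE, MAPE, and MAE on arrays of predicted versus actual prices:

```python
import numpy as np

def rmsle(y_true, y_pred):
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))

def mape(y_true, y_pred):
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

# e.g., an MAE of ~24,520 means predictions are off by about €24,520 on average.
```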

Understand & Explain Models with DataRobot Trusted AI

DataRobot AI Platform tries to bridge the gap between model development and business decisions while maximizing transparency at every step of the ML & AI lifecycle. As discussed earlier, this is highly critical for all real estate players, including asset managers, as they need to build analytics approaches to assess asset sale and rent prices without any black-box patterns in the decision-making, delivering transparency in how predictions are generated.

So, let’s look under the hood at some of DataRobot Explainable AI functionality that can be more relevant for real estate players, allowing them to understand the behavior of models, inspire confidence in their results, and easily translate these modeling results into actionable business insights and great outcomes.

Accuracy over Space

With Location AI and, in particular, the Accuracy Over Space explainability tool, we can better understand how the house-price multimodal model developed in DataRobot behaves at the local level. Model accuracy can vary greatly across geographic locations, but thanks to this explainability tool, asset managers and investment analysts can quickly pinpoint where the model is accurate and where it is not. 

In the figure below, we see a good spatial fit of our machine learning model: the average residual is low in most locations, and there are very few locations where the model is either over-predicting (light blue bars) or under-predicting (light red bars), e.g., properties located near Pozuelo de Alarcón.

DataRobot Location AI

Global Explainability

One of the first things that real estate players usually want to understand better is the behavior of the model as a whole across all data. This is where the interpretability capabilities of DataRobot, like Feature Impact, Feature Effects, and Activation Maps—among others—come into play.

Feature Impact shows the most important features behind the model’s predictions. DataRobot can use either Permutation-Based Importance or SHAP Importance to compute importance. It is worth mentioning that when spatial structure is present in the training dataset, DataRobot Location AI expands the traditional automated feature engineering to accommodate new geospatial variables that improve model performance.

In the next figure we see that among the top-25 most important features in the most accurate house-price multimodal model, the city’s amenities and location-based variables are the most representative. For example, there is a significant impact from the average price (GEO_KNN_K10_LAG1_buy_price) and the kernel density average price (GEO_KNL_K10_LAG1_buy_price) of the first ten nearest neighbors, as well as amenities variables like proximity to both educational and health facilities.
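
DataRobot Location AI generates these spatial lag features automatically. Conceptually, a feature like GEO_KNN_K10_LAG1_buy_price is the average price of the ten nearest neighboring properties, which could be approximated by hand roughly as follows (column names hypothetical; Euclidean distance on latitude/longitude is a simplification):

```python
from sklearn.neighbors import NearestNeighbors

# properties: a DataFrame with latitude, longitude, and buy_price columns.
coords = properties[["latitude", "longitude"]].to_numpy()
nn = NearestNeighbors(n_neighbors=11).fit(coords)   # the point itself + 10 neighbors
_, neighbor_idx = nn.kneighbors(coords)

# Average sale price of the 10 nearest neighboring properties (excluding the property itself).
prices = properties["buy_price"].to_numpy()
properties["knn10_lag_buy_price"] = prices[neighbor_idx[:, 1:]].mean(axis=1)
```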

Top-25 most important features in the most accurate house-price multimodal model - DataRobot

Once we know which features are most influential to the model’s decision making, real estate players may also want to understand how exactly those features affect the model. This is exactly what Feature Effects addresses: it allows DataRobot users to see how different values of a variable affect the model’s predictions. The calculation is based on Partial Dependence.

Looking at the Feature Effects of our top model, we can see, for example, that greater energy performance and being located closer to Santiago Bernabéu Stadium (Real Madrid CF Stadium) lead to higher average predicted house prices. These two insights match a quick gut-check: e.g., Santiago Bernabéu Stadium exerts a home price distance-decay effect over its neighboring areas because it acts, coupled with Azca, as a major sub-center of economic, retail, and leisure activity in Madrid.

Feature Effects - DataRobot
Feature Effects - DataRobot

Because our training dataset is multimodal and contains imagery data of residential properties in Madrid, DataRobot used machine learning models that contain deep learning based image featurizers. Activation Maps allow DataRobot users to see which parts of an image the machine learning model is using to make predictions. This can help real estate professionals determine whether the model is learning the right information for the use case, does not contain undesired bias, and is not overfitting on spurious details.

Looking at the Activation Maps of our top model, we can observe that the model is generally focused on the exterior image of properties. Of course, DataRobot users can easily customize the image featurizer if necessary.

Deep learning based image featurizers - DataRobot AI platform

Local Explainability

After describing the overall model’s behavior, real estate players—and, in particular, asset managers and real estate appraisers—will probably want to know why a model made an individual prediction. This is extremely valuable when you need to justify the decision an analytical model has made, or when you need to optimize the real estate product to develop in a specific location or the choice of an investment’s location within a city. 

Let’s assume that, as a real estate developer, you would like to optimize a property’s price for a given location in a city while minimizing time on market. Local Explainability helps you identify the main contributors to the property’s value at training time and subsequently run both what-if scenarios and mathematical optimization at scoring time by changing actionable features, e.g., home size, number of rooms and bathrooms, and swimming pool construction.

Local Explainability in DataRobot AI Platform is available through Prediction Explanations. This will tell real estate professionals which features and values contributed to an individual prediction—and their impact and how much they contributed. DataRobot can use either its own XEMP explanations or SHAP explanations. Both types of prediction explanations can be produced at training or scoring time.

Let’s have a closer look at both prediction explanations types. In the first figure below, using our most accurate house-price multimodal model, we are looking at the XEMP prediction explanation for row 7,621, which had a prediction of roughly €1,891,000 for home sales price. The specific spatial location of this property, including all related geospatial variables (e.g., the average number of educational facilities within 500 meters of the second ten nearest neighbors), and having 244 square meters, three bathrooms, and five rooms were the strongest contributors to this prediction. If we were to use SHAP explanations (see second figure below) that would produce actual numbers for each feature value, which add up to the total predicted property’s sale price.

Local Explainability in DataRobot AI Platform is available through Prediction Explanations
Prediction Explanations - DataRobot AI Platform

Compliant-Ready AI

With regulations across various industries—and the real estate sector is not an exception—the pressure on real estate teams to deliver compliant-ready AI is greater than ever. This may be the case, for example, when asset managers or real estate servicers want to assess the value of Non-Performing Loan (NPL) portfolios, or when appraisers carry out property valuations that must comply with national regulations.

DataRobot Automated Compliance Documentation allows users to create automated, customizable reports covering each step of the machine learning model lifecycle with just a few clicks, thereby dramatically decreasing time to deployment while ensuring transparency and effective model risk management.

DataRobot Automated Compliance Documentation

Consume Results with DataRobot AI Applications

By bringing the recommended house-price multimodal model to DataRobot No Code AI Apps, real estate investors, asset managers, and developers can easily get intelligent AI Applications that automate the decision-making process of their business.

Within the AI App, real estate players can generate predictions for a real estate portfolio with thousands of assets and dig deeper into the reasons driving each prediction with a few clicks. They can also assess new locations for either investment or real estate development, as well as build their own reporting dashboards. Because their core business depends on the quality of asset assessments and investment locations, these AI Application examples are especially valuable for asset managers, real estate servicers, valuation advisory firms, and real estate developers.

DataRobot No Code AI Apps

Interestingly, real estate players can also create their own scenarios based on their intuition and knowledge of the market to benchmark model outputs, or build optimization models that either maximize or minimize their business outcomes. This also helps them automate their investment and development strategy.

What if and optimizer - DataRobot

For example, asset managers will be able to sell asset portfolios at a price that maximizes returns while minimizing time to market. Likewise, real estate developers will be able to add new property price scenarios in different city locations by changing those actionable variables of their interest (e.g., home size, number of rooms) or building optimization models to maximize specific outcomes given certain business and market constraints (e.g., finding the best real estate product configuration to go to market with, given certain market price conditions). DataRobot will rapidly generate new insights aimed at helping real estate players to have full flexibility in testing different potential situations, scenarios, and optimal business outcomes as we can see below.

Last but not least, advanced analytics teams could also take advantage of the code-centric DataRobot functionality to build their own code-based applications; a sketch of one such application follows the list below. With the DataRobot API, advanced analytics teams in the real estate sector will be able to easily build AI applications in days that could do the following:

  • Accurately predict a property’s price for a single asset or portfolio and a new location, while digging deeper into the reasons driving each prediction
  • Estimate future real estate market changes (e.g., prices and rents over the next year) and the growth potential of neighborhoods, districts, and cities
  • Search and benchmark potential investment locations against real estate comparables
  • Either maximize or minimize business outcomes through optimization models
  • Automate their business strategy and decision-making process
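
As a sketch of such a code-based application, the snippet below sends a small portfolio to a deployed model's prediction endpoint. The URL, headers, and feature names are placeholders—in practice they come from the deployment's integration snippet in DataRobot—so treat this as an assumption-laden illustration rather than the exact API.

```python
import requests

# Placeholders: the real URL, deployment ID, and auth headers come from the
# deployment's integration snippet in DataRobot and will differ per installation.
PREDICTION_URL = "https://<prediction-host>/predApi/v1.0/deployments/<DEPLOYMENT_ID>/predictions"
HEADERS = {"Authorization": "Bearer <API_TOKEN>", "Content-Type": "application/json"}

def predict_prices(assets):
    """Send a list of asset feature dicts to the deployed model and return its response."""
    response = requests.post(PREDICTION_URL, headers=HEADERS, json=assets)
    response.raise_for_status()
    return response.json()

# Hypothetical portfolio row; feature names must match the deployed model's training data.
portfolio = [{"sq_meters": 244, "rooms": 5, "bathrooms": 3, "district": "Chamartín"}]
print(predict_prices(portfolio))
```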

Conclusions

We have just shown how AI can foster and scale augmented intelligence in investment and real estate: DataRobot quickly produced a scalable and transparent end-to-end analytics approach to price predictions of real estate assets, while ensuring transparency and effective model risk management at every step of the ML & AI lifecycle.

DataRobot AI Platform is able to analyze a wide variety of patterns and make predictions based on the data being analyzed. This is critical, as the real estate sector also has major business challenges that may require other ML & AI approaches, like unsupervised learning (multimodal clustering and time series anomaly detection), to successfully address them. AI can also be applied to numerous other valuable use cases in the real estate sector, beyond the living (residential) segment. Examples include the office and retail market segments, as well as use cases related to investors, property managers, and commercial tenants—for instance, optimizing leasing portfolio management by predicting which tenants will renew and which will leave when their lease expires, thereby helping to maintain a higher occupancy rate and foster a greater Net Operating Income (NOI).


Mindshare Integrates Predictive Analytics to Deliver Performance Marketing at Scale
https://www.datarobot.com/blog/mindshare-integrates-predictive-analytics-deliver-performance-marketing-at-scale/ (November 8, 2022)

How can you build a performance-driven organization where driving outcomes is ingrained in your culture and the ownership of the process is shared across agency and client stakeholders? Learn more from guest blogger Ikechi Okoronkwo, Executive Director, Business Intelligence & Advanced Analytics at Mindshare. 

Organizations like Mindshare, a global media agency network, drive client value by using data to understand consumers and how media influences them on a deeper level than any organization on the planet. In a world that is increasingly outcome-focused and platform-based, we have integrated strategy and predictive analytics to move at the speed of our clients’ decisions and established a scalable framework for uncovering and acting on insights in an organized, simple, and transparent operating model.

As a global media agency network that delivers value in different ways (media investment management, planning and buying, content, creative, strategy, analytics, etc.), most of what Mindshare does is designed expressly to inform and enhance our client’s decision-making. To do that, we have expanded beyond intermediary signals, like reach and other delivery metrics, and applied more focus on measurement that is linked to business outcomes—because it allows us to have better and more nuanced conversations.

Ebook
AI in Customer Analytics: Tapping Your Data for Success

Combining the Right Leading Indicators is Critical for Accurate Decision-Making

In an increasingly digital world, speed is a critical competitive advantage in making decisions. However, speed is often at odds with confidence as it relates to decision-making. To solve this, we use data science tools to identify the right leading indicators across the different levers that we can pull to support faster decisions—using methods that establish causation to the larger business objectives of their clients. This is core to the Mindshare approach to full funnel measurement and planning, which is all about balancing investments across the brand to demand spectrum. At Mindshare, performance is not a siloed comms objective limited to certain channels or tactics. Performance is the HOW to explain the WHY and WHAT Mindshare does for clients across the globe.

An example of this in action is one of our global clients, where we manage sales risk across their product portfolio. Our predictions are so accurate that we work directly with the client’s finance organization to set targets and track how they deliver against those targets throughout the year. There are multiple levers outside of media that can be pulled, and our analysis takes the entire ecosystem into consideration. In terms of impact, we have helped this client with driving incremental revenue and media driven ROI, despite considerable headwinds in their market environment.

Establishing a Reliable Data Layer and Breaking Down Silos is the First Step

To understand and influence performance, you need to connect disparate activities by building a mathematical simulation of your marketing environment and use that to run different scenarios to make data-driven decisions. Everyone knows the popular adage, “What gets measured gets done.” Performance marketing does not happen without breaking down silos across the organization and using data to stitch them back together to form a holistic view.

Establishing a reliable data layer is the first step in becoming performance-oriented, something that Mindshare has invested considerable resources to get right. We’ve built scalable architecture to centralize and automate the ingestion of data across Paid, Earned, Shared and Owned data sources. We can ingest custom data sources, including event-level data with tools for faster speed to insights with bespoke visualization and dashboarding capabilities. In summary, the first step to being performance-driven is data automation to ensure adherence to standards across our people, our processes, and our products.

We make sure that all this data is being used to power scenario planning that allows us to make quick decisions that can be validated through predictions. Some examples include:

  • Improving efficiency by reducing, re-allocating, or removing media spend that does not drive business performance
  • Driving effectiveness by recommending how to improve incremental return in existing channels and partners through optimization
  • Prioritizing expansion and incremental return by shifting spend into under-utilized channels or partners to engage with valuable audiences

To do this, we built out a global, unified analytics platform—Synapse—which is our proprietary platform for delivering attribution, budget optimization, scenario planning, forecasting, and performance simulations across multiple outcomes, all in one ecosystem. We’ve made strategic investments in R&D and technology partnerships to acquire industry-leading machine learning and AI capabilities that unlock the full potential for modeling and data-driven decision-making based on analysis outputs. A big driver for the success of the Mindshare Synapse platform is the use of human-friendly visuals through a customizable UI that is accessible to the Mindshare organization and our clients. Synapse can be used by analysts, strategists, and clients because of its ability to democratize complex analysis in a simple way across multiple stakeholders at scale.

Machine Learning and AI Fuel Media Governance, Performance Success, and Analytics

As mentioned above, understanding performance should be ingrained in all parts of the marketing value chain. At Mindshare, machine learning, AI, predictive analytics, and scenario planning are used to fuel data-driven decisioning in three key areas. 

The first is media governance, which is about ensuring that we can communicate to clients, with empirical evidence, that we’re managing their investments and how they show up in the marketplace in a responsible, transparent, and measurable way. 

The second is connecting media to outcomes in a way that matches how our clients define performance success. Mindshare uses this to evolve client thinking from only relying on backward-looking analysis to leveraging forward-looking analysis around informing what decisions we need to make and predicting the outcome of those decisions. We don’t focus on reach and vanity metrics—we have conversations with clients and gain a better understanding of what is relevant for extracting insights and performing optimizations. 

The third area is macro + media analytics, which expands to position Mindshare as an extension of our clients as a business partner. We have business planning discussions with clients to talk about how media influences outcomes alongside other factors that are controllable or non-controllable. As we talk to clients about performance across the brand to demand spectrum, we’re able to consider other things outside media as well (promotions, variable marketing expense, supply chain management, CRM, competition, economic impacts, weather, etc.).

The Mindshare Synapse Platform Drives Value by Answering Client Questions with Data Science and Delivering More Insights

Our ability to identify a question or challenge and translate that into a solution has incredibly huge upside for our clients and our business by:

  • Expanding how data science is used to answer more client questions and empower the Mindshare workforce with better insights, delivered at the speed of decisions within our clients’ business environments. This ranges from simple, quick analysis to large-scale macroeconomic + media use cases.
  • Delivering more insights, more efficiently, and accelerating time to value.
  • Supporting a product toolkit that can be customized to unique client needs. We productize certain things for efficiency, but our approach is less about selling products or tools and more about driving value to answer our clients’ learning agenda questions.
  • Establishing a simplified measurement and analytics narrative that solves for multiple complex needs in an accessible ecosystem that is easy to use and explain.
  • Delivering performance insights—using best-in-class technology that is scalable—future-proofs Mindshare as a trusted advisor and resource for clients as it relates to predictive analytics and data science. Our mission is to achieve good growth, so it is important for us to leverage tools that use cloud infrastructure and modeling technology that is privacy compliant and takes bias mitigation and AI governance very seriously.

The DataRobot AI Platform Empowers Mindshare to Utilize the Most Statistically Robust Approach for Any Data Type

The DataRobot AI Platform is a major asset in our toolkit. Using DataRobot software has significantly improved our technology architecture, and provided many benefits, including but not limited to:

  • An intuitive and easy-to-use toolkit that lowers the barrier to entry for advanced analytics use cases, which expands the types of solutions we can build
  • An extensive pool of machine learning algorithms allows us to utilize the most powerful and statistically robust approach for any data type or business purposes
  • An enhanced ability to uncover insights and optimization opportunities quicker in an automated fashion
  • A foundational tool to help with creating a more effective workforce because they can perform existing and new tasks using a powerful platform that allows them to focus on more strategic areas

At Mindshare, we’ve embraced the idea that a truly performance-driven organization must intentionally adopt technology that enhances the symbiosis between human and machine decision-making. To win in the platform age, we will continue to use tools that complement our workforce to drive creativity and client value through data science as a service. This helps the agency to do better work, with more impactful outputs that drive good growth in a truly data-driven way.


Scoring More Goals in Football with AI: Predicting the Likelihood of a Goal Based on On-the-Field Events
https://www.datarobot.com/blog/scoring-more-goals-in-football-with-ai-predicting-the-likelihood-of-a-goal-based-on-on-the-field-events/ (November 1, 2022)

Can artificial intelligence predict outcomes of a football (soccer) game? In a special project created to celebrate the world’s biggest football tournament, the DataRobot team set out to determine the likelihood of a team scoring a goal based on various on-the-field events.

My Dad is a big football (soccer) fan. When I was growing up, he would take his three daughters to the home games of Maccabi Haifa, the leading football team in the Israeli league. His enthusiasm rubbed off on me, and I continue to be a big football fan to this day (I even learned how to whistle!). I recently went to a Tottenham vs. Leicester City game in London as part of the Premier League, and I’m very much looking forward to the 2022 World Cup.

Football is the most popular sport in the world by a vast margin, with the possible exception of the U.S., where American football dominates. Played between teams of 11 players on the field, the game gives every team one objective—to score as many goals as possible and win. However, beyond a player’s skill and teamwork, every detail of the game, such as the shot place, body part used, location side, and more, can make or break the outcome. 

I love the combination of data science and sports and have been lucky to work on multiple data science projects for DataRobot, including March Mania and McLaren F1 Racing, and to advise customers in the sports industry. This time, I am excited to apply data science to the football field.

In my project, I try to predict the likelihood of a goal in every event among 10,000 past games (and 900,000 in-game events) and to get insights into what drives goals. I used the DataRobot AI platform to develop and deploy a machine learning project to make the predictions.

Football Goal Predictions with DataRobot AI Platform

Using the DataRobot platform, I asked several critical questions.

Which features matter most? On the macro level, which features drive model decisions? 

Feature Impact – By recognizing which factors are most important to model outcomes, we can understand what drives a higher probability of a team scoring a goal based on various on-the-field events.

Here is the relative impact:

Relative feature impact - DataRobot MLOps

THE WHAT AND HOW: On a micro level, what is the feature’s effect, and how is this model using this feature? 

Feature effects – The effect of changes in the value of each feature on the model’s predictions, while keeping all other features as they were.

From this football model, we can learn interesting insights to help make decisions, or in this case, decisions about what will contribute to scoring a goal. 

1. Events from the corner are highly likely to result in scoring a goal, regardless of which corner.

Shot place – Ranked in first place.

Feature value (shot place)

Situation – Ranked in third place. Besides corners, a set piece also raises the likelihood of a goal: it occurs any time play restarts after a foul or the ball going out of play, which provides a better starting position for the event to result in a goal.

Feature value situation

2. Events with the foot have a higher chance of resulting in a goal than events from the head. Although most people are right-footed, it looks like football players use both feet pretty equally.

Body part – Ranked in second place.

Feature value bodypart

3. Events from the box—center, left, and right side—and from close range have an almost equally high likelihood of resulting in a goal.

Location – Ranked in 4th place.

Feature value (location)

Time – In the first 10 minutes of the game, the intensity builds up, and it keeps its momentum between 20 minutes into the game and halftime. After halftime, we see another increase, potentially from changes in the team. At the 75-minute mark, we see a drop, which suggests that the players are tired. This leads to more mistakes and more time spent on defense in an effort to keep the competitive edge.

Feature value (time)

The insights from unstructured data

DataRobot supports multimodal modeling, so I can use structured or unstructured data (e.g., text, images). In the football demo, I got high value from the text features and used some of the in-house tools to understand the text.

From the text prediction explanations, this example shows an event that occurred during the game and involved two players. The words “box” and “corner” have a positive impact, which is not surprising given the insights we discovered earlier.

Text prediction explanation

From the word cloud, we can see the top 200 words and how each relates to the target feature. Larger words, such as kick, foul, shot, and attempt, appear more frequently than words in smaller text. Red indicates a positive effect on the target feature, and blue indicates a negative effect.

Word cloud - DataRobot

The lifecycle of the model is not over at this step. I deployed the model and wanted to see predictions for different scenarios. With a click from a deployed model, I created a predictor app as a kind of gamification—fans can create different scenarios and see the likelihood of a goal for each one. For example, I created an event scenario with an attempt from the corner using the left foot, along with some additional variables, and got a 95.8% chance of a goal.
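
Behind the app, each what-if scenario is just one row scored by the deployed model. Conceptually (not the app's actual code, and with hypothetical feature names), scoring a single scenario looks like this, assuming goal_model is a fitted pipeline that encodes the categorical inputs:

```python
import pandas as pd

# One hypothetical what-if event, encoded the same way as the training data.
scenario = pd.DataFrame([{
    "shot_place": "corner",
    "bodypart": "left foot",
    "situation": "set piece",
    "location": "box - left side",
    "time": 78,
}])

# goal_model: a fitted pipeline (categorical encoder + classifier) trained on past events.
goal_probability = goal_model.predict_proba(scenario)[0, 1]
print(f"Likelihood of a goal: {goal_probability:.1%}")
```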

Goal predictor app - DataRobot

Predict Football Goals with AI App by DataRobot

Over 95% is pretty high. Can you do better than that? Play and see.

DataRobot launched this project at Global AI Summit 2022 in Riyadh, aligning with the lead up to the World Cup 2022 in Qatar. At the event, we partnered with SCAI | سكاي. to showcase the application and to let attendees make their own predictions.

Watch the video to see the DataRobot platform in action and to learn how this project was developed on the platform. Or try to develop it by yourself using the data and use case located in DataRobot Pathfinder. Feel free to contact me with any questions!

Demo
See DataRobot AI Platform in Action
Request a demo

The post Scoring More Goals in Football with AI: Predicting the Likelihood of a Goal Based on On-the-Field Events appeared first on DataRobot AI Platform.

]]>
Can You Estimate How Long It Will Take McLaren Formula 1 Team to Complete a Race Through ML and Human Intelligence? https://www.datarobot.com/blog/can-you-estimate-how-long-it-will-take-mclaren-formula-1-team-to-complete-a-race-through-ml-and-human-intelligence/ Tue, 11 Oct 2022 13:00:00 +0000 https://www.datarobot.com/?post_type=blog&p=39701 AI/ML can help transform millions of data points that are being collected over time from cars, events, and other sources into actionable insights.

The post Can You Estimate How Long It Will Take McLaren Formula 1 Team to Complete a Race Through ML and Human Intelligence? appeared first on DataRobot AI Platform.

]]>
Are you new to Formula 1? Want to learn how AI/ML can be so effective in this space? 3 . . . 2 . . . 1 . . . Let's begin! F1 is one of the most popular sports in the world and the highest class of international racing for open-wheeled, single-seater formula racing cars. Made up of 20 cars from 10 teams, the sport has only become more popular after all the recent documentaries on drivers, team dynamics, and car innovations, and the celebrity-level status that most races and drivers receive around the world. Additionally, F1 has a long tradition of pushing the limits of racing and continuous innovation and is one of the most competitive sports on the planet, which is why I like it even more! 

So how can AI/ML help McLaren Formula 1 Team, one of the sport's oldest and most successful teams, in this space? And what are the stakes? Each race, there is a myriad of critical decisions that impact performance— for example, with McLaren, how many pit stops should Lando Norris or Daniel Ricciardo take, when should they take them, and what tyre type should they select. AI/ML can help transform millions of data points that are collected over time from cars, events, and other sources into actionable insights that can significantly help optimize operations, strategy, and performance! (Learn more about how McLaren is using data and AI to gain a competitive advantage here.)

As an avid F1 viewer, data enthusiast, and curious person, I thought: as a first hypothesis, what if we could leverage machine learning to predict how long a race will take to finish?

  • Based on some strategic decisions, can I reliably and accurately estimate how long it will take Lando Norris or Daniel Ricciardo to complete a race in Miami? 
  • Can machine learning really help generate some insightful patterns?
  • Can it help me make reliable estimates and race time decisions? 
  • What else can I do if I did this?

What I am going to share with you is how I went from publicly available data, to building and testing various cutting-edge machine learning techniques, to gaining critical insights around reliably predicting race completion time, all in less than a week! Yes, less than a week!


The How – Data, Modeling, and Predictions!

Racing Data Summary

I started with some simple race-level data that I pulled through the FastF1 API. A quick overview of the data: it includes race times, results, and the tyre setting for each lap taken per driver, plus whether any yellow or red flags occurred during the race (i.e., uncertain situations like crashes or obstacles on course). From there, I also added weather data to see how the model learns from external conditions and whether it helps me make a better race time estimate. Lastly, for modeling purposes, I used about 1,140 driver-race records across 2019-2021. 
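
To give a flavour of what that data pull can look like, here is a minimal sketch using the fastf1 package. The session, aggregation, and column names are illustrative (and reflect recent fastf1 releases) rather than the exact extract used for this project.

```python
import fastf1

# Optional: cache API responses locally so repeated pulls are fast.
# fastf1.Cache.enable_cache("f1_cache")

# Load one race session (2021 Italian Grand Prix at Monza).
session = fastf1.get_session(2021, "Monza", "R")
session.load()

laps = session.laps             # per-lap times, tyre compounds, stints, track status
weather = session.weather_data  # air/track temperature, rainfall, wind
results = session.results       # grid position, finishing position, classified status

# One simple race-level feature per driver: total time spent on track.
total_time = laps.groupby("Driver")["LapTime"].sum()
print(total_time.sort_values().head())
```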

Visualizing the distribution of completion time across different circuits, it seems the Emilia Romagna GP takes the longest, while the Belgian GP is typically shorter in race time (despite being the longest track on the calendar).

Race Time Estimation Modeling

Key Questions – What algorithms do I start with? A lot of data is not easily available; for example, if there was a disqualification, crash, or telemetry issue, the data is sometimes not captured. How do I convert the raw data into a format that the learning algorithms I am familiar with can easily consume? Will this work in the real world? These are some of the key questions I started thinking about before approaching what comes next. The first question, though, is: what is machine learning doing here? Machine learning learns patterns from historical data (which tyre settings led to faster completion times for a given race, how drivers performed across different seasons, how variations in pit stop strategy led to different outcomes, and more) to predict how long a future race will take to complete.

Process – Typically, this process can take weeks of coding and iteration: processing data, imputing missing values, training and testing various algorithms, and evaluating results. Sometimes, even after coming up with a good model, I only realize later that the data was never a good fit for the predictions or contained target leakage. Target leakage happens when you train your algorithm on a dataset that includes information that would not be available at prediction time, when you apply the model to future data. For example, suppose I want to predict whether someone will buy a pair of jeans online, and my model flags them as likely buyers only because they are already going through the checkout process; that is too late, because they are already buying the jeans. That is leakage.

My approach – To save time, I can leverage automation, guardrails, and Trusted AI tools to quickly iterate on the entire process and the tasks listed above, and get reliable, generalizable race time estimates. 

Start – With a click of the start button, I train and test hundreds of different automated data processing, feature engineering, and algorithmic tasks on the racing data. DataRobot also alerts me to issues with the data, in this case missing values. For today, though, we will go ahead and rely on the built-in expertise for handling such variations and data issues.

Insights – Of the hundreds of experiments automatically tested, let's review at a high level the key racing factors that have the most impact on predicting total race time. I am not a McLaren Formula 1 Team driver (yet), but I can see that having a red flag or safety car period does impact overall performance and completion time.

More Insights – On a micro level, we can now see how each factor individually affects the total race time. For example, the longer I wait to make my first pit stop (X axis), the better the outcome (shorter total race time). Typically, a lot of drivers take their first pit stop around the 20-25 mark.

Evaluation – Is this accurate? Will it work in the real world? In this case, we can quickly leverage the automated testing results that have been generated. Testing is done by selecting 90 races that the model did not see during the learning phase and comparing actual completion time against predicted completion time. While I always think results can be better, I am pretty happy that the recommended approach is off by only about 20 seconds on average. In racing, 20 seconds sounds like a lot (it can be the difference between P3 and P9), but the scope here is to provide a reasonable estimate of total time with an error measured in seconds rather than minutes. For example, imagine I had to guess how long Lando Norris or Daniel Ricciardo would take to complete a race in Miami without much prior context or F1 knowledge. I might say 1 hour 10 minutes, or maybe 1 hour 30 minutes; using data and learned patterns, we can augment decision-making and enable more F1 enthusiasts to make critical race time and strategy decisions.
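
As a sketch of what that holdout comparison looks like in code, assuming two arrays of actual and predicted total race times in seconds for the held-out races (the values below are made-up placeholders):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Placeholder holdout results: actual vs. predicted total race time, in seconds.
actual_seconds = np.array([5415.0, 5892.3, 6120.7])
predicted_seconds = np.array([5437.2, 5871.9, 6142.1])

mae = mean_absolute_error(actual_seconds, predicted_seconds)
print(f"Average error: {mae:.1f} seconds")
```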

Can't wait to use AI models to make intelligent race day decisions? Check out the DataRobot x McLaren app here! For more details on the use case and data, see this post.

DataRobot X McLaren App
Use AI Models to Make Intelligent Race Day Decisions

What’s Next

For now, I've built my model for 2019-2021 races. But the project is really motivating me to revisit more data sources and strategy features within F1. I recently started watching the Netflix series Drive to Survive, and I can't wait to incorporate this year's data and retrain my race time simulation models. I'll continue to share my F1 and modeling passion. If you have feedback or questions about the data, the process, or my favorite F1 team, feel free to reach out at arjun.arora@datarobot.com.

Imagine how easily this can expand to over 100 AI models — what would you do?

Customer Success Story
McLaren Accelerates Formula 1 Performance – On and Off the Track
Learn More

The post Can You Estimate How Long It Will Take McLaren Formula 1 Team to Complete a Race Through ML and Human Intelligence? appeared first on DataRobot AI Platform.

]]>
AI in Supply Chain — A Trillion Dollar Opportunity https://www.datarobot.com/blog/ai-in-supply-chain-a-trillion-dollar-opportunity/ Thu, 18 Aug 2022 16:13:33 +0000 https://www.datarobot.com/?post_type=blog&p=39752 Enabling AI in the supply chain empowers organizations to make decisions with confidence, adjust business practices quickly, and outpace the competition. Read more.

The post AI in Supply Chain — A Trillion Dollar Opportunity appeared first on DataRobot AI Platform.

]]>
Supply chain and logistics industries worldwide lose over $1 trillion a year due to out-of-stock or overstocked items1. Shifting demands and shipping difficulties make the situation worse.

Challenges in inventory management, demand forecasting, price optimization, and more can result in missed opportunities and lost revenue.

The retail marketplace has become increasingly complex and competitive. Keeping pace with the connected consumer, embracing emerging trends in shopping, or staying ahead of the competition—these challenges bear down on retailers and manufacturers more than ever before.

AI in Supply Chain Management

According to McKinsey & Company, organizations that implement AI improve logistics costs by 15%, inventory levels by 35%, and service levels by 65%2. AI can reduce costs and minimize supply chain challenges by driving more informed choices across all aspects of supply chain management.

Retailers and manufacturers that incorporate AI in supply chain management greatly enhance their ability to forecast demand, manage inventory, and optimize pricing. Those who become AI-driven will become market leaders and will be better positioned to capture new markets and maximize profits.

Enabling AI in the supply chain empowers organizations to make decisions with confidence, adjust business practices quickly, and outpace the competition.

Benefits of AI in Supply Chain

AI enables manufacturers and retailers to innovate across their operations and maximize business impact. AI-enabled supply chain management empowers organizations to become multifaceted, connected, agile, competitive—and above all—responsive to the ever-changing demands of the empowered consumer.

Manufacturing and retail organizations that employ AI in their supply chain realize benefits that include:

  • Improve demand forecasts for increased accuracy and granularity
  • Apply nowcasting to bridge the gap on lagged data
  • Refine forecast error margins to reduce buffer stock inefficiencies
  • Optimize price and flag cost anomalies along the supply chain
  • Detect defective products coming off of a manufacturing line
  • Identify bottlenecks to improve warehouse throughput 
  • Improve coordination of shipment logistics and reduce scheduling inefficiencies
  • Identify and mitigate accident risks that carry financial liability
  • Reduce company driver turnover
  • Understand the impacts of macroeconomic conditions on product demand
  • And more

AI in the supply chain provides data-driven insights that help supply chain and logistics organizations solve their hardest problems, drive success, and deliver real ROI.

Application of AI in Supply Chain

Gains from implementing AI in your supply chain can be spectacular. One global retailer was able to achieve $400 million in annual savings and a 9.5% improvement in forecasting accuracy3. 

Despite these potential returns, 96% of retailers find it difficult to build effective AI models, and 90% report trouble moving models into production4. Organizations need a center of excellence for deploying AI/ML models. Collaboration across data science, business, and IT teams throughout the AI lifecycle also greatly impacts AI success.

Increasing supply chain volatility exacerbates the urgency for organizations to enable AI within their supply chain and drive business impact.

AI has been called the Fourth Industrial Revolution for good reason. Many manufacturers and retailers apply AI to their supply chain, addressing three major challenges: market demand, product and supply management, and operational efficiencies.

Real-World Examples: AI Use Cases in Supply Chain

OYAK Cement Boosts Alternative Fuel Usage from 4% to 30% — for Savings of Around $39M

OYAK Cement, a leading Turkish cement maker, needed to reduce costs by increasing operational efficiency. The organization also needed to reduce CO2 emissions and lessen the risk of costly penalties from exceeding government emissions limits.

OYAK turned to AI to optimize and automate its processes in addition to lowering its energy consumption.

The result: OYAK Cement optimized grinding processes, used materials more efficiently, predicted maintenance needs, and better sustained material quality. OYAK Cement also improved alternative fuel usage from 4% to 30%.

The manufacturer experienced operational efficiencies and cost savings by deploying AI:

  • Reduced costs by approximately $39 million
  • Reduced the time to predict mechanical failures by 75%
  • Increased alternative fuel usage by seven times
With DataRobot, we can now see on a cost basis, efficiency basis, and most importantly, an environmental basis, where we will see an advantage and proactively make changes.
Berkan Fidan

Performance and Process Director, OYAK Cement

Read Now: OYAK Customer Success Story
Learn how AI-enabled supply chain management empowered OYAK Cement

CVS Health Saves Lives with AI-Driven Vaccine Rollout

When the COVID-19 vaccine first hit the market, there were thousands of people dying every day. The urgency to distribute vaccines was immediate. CVS Health needed to optimize COVID-19 vaccine distribution given the very limited supply and extremely high demand.

CVS Health turned to DataRobot to deliver testing and vaccines as efficiently and effectively as possible.

The result: CVS Health administered more than 60 million vaccines nationwide. The organization saved lives with AI-driven vaccine rollout:

  • 60 million vaccines were administered nationwide
  • 20% of nationwide vaccines were administered by CVS Health
  • 90% of vaccinated individuals returned for the second dose
One of the benefits of DataRobot is that it’s transparent. Checking and making sure that one of your colleagues built a model you can confidently share with leadership and trust entirely is quite an endeavor.
Francois Fressin

Sr. Director, Data Science and Machine Learning, CVS Health

Lenovo Computes Supply Chain and Retail Success with DataRobot

Lenovo Brazil needed to balance the supply of and demand for laptops and computers among the Brazilian retailers that received thousands of Lenovo products each week. The team was also resource-constrained: it needed to either invest in more data scientists or find a platform that could automate modeling and forecasting steps.

Lenovo Brazil turned to DataRobot to build machine learning models at a faster rate, while improving prediction accuracy.

The result: Lenovo Brazil more accurately predicted sell-out volume, propelling it to become the leader in volume share for notebook sales in the B2C segment in Brazil. In parallel, it looked to expand to use cases including scoring sales leads, predicting payment delays, and predicting default risks.

Lenovo Brazil saw efficiency gains and dramatic accuracy improvements:

  • Reduced model creation time from four weeks to three days
  • Reduced model deployment time from two days to five minutes
  • Improved prediction accuracy from less than 80% to over 90%
The biggest impact DataRobot has had on Lenovo is that decisions are now made in a more proactive and precise way. We have discussions about what actions to take based on variables, and we can compare predictions with what really happened to keep refining our machine learning process and overall business knowledge.
Rodrigo Bertin

Senior Business Development Manager, Latin America, Lenovo Brazil

Improving Supply Chain Management with DataRobot

Manufacturers and retailers face enormous challenges and require best-in-class solutions. Through AI-enabled supply chain management, manufacturers and retailers gain an automated means to forecast demand, manage inventory, and optimize pricing.

See how the DataRobot AI Platform for Retail can be used to solve challenges such as demand forecasting and out-of-stock issues, and accelerate the delivery of AI to drive strategic business outcomes.

The post AI in Supply Chain — A Trillion Dollar Opportunity appeared first on DataRobot AI Platform.

]]>
Healthcare: Why Integrated Care Systems Need to Focus on AI and not BI https://www.datarobot.com/blog/healthcare-why-integrated-care-systems-need-to-focus-on-ai-and-not-bi/ Wed, 10 Aug 2022 16:06:18 +0000 https://www.datarobot.com/?post_type=blog&p=39029 Snowflake and DataRobot partner to offer a highly advanced, fully cloud-native healthcare transformation platform, designed to tackle some of the most complex challenges ever faced by health and care systems.

The post Healthcare: Why Integrated Care Systems Need to Focus on AI and not BI appeared first on DataRobot AI Platform.

]]>
Change is happening fast across the NHS, with the focus squarely on harnessing the huge amount of data the NHS generates to drive forward the transformation programmes needed to address the elective care backlog and growing demand for services.

As Integrated Care Systems (ICSs) in England officially launch, we look at the key opportunities for ICS regions to harness modern, integrated analytical frameworks to accelerate operating efficiencies, modernise care pathways, and improve patient outcomes.  

Transforming the Workforce

Staff are both the NHS's greatest asset and its greatest vulnerability. This is being particularly felt by trusts as the high volume of nursing staff vacancies impacts operational delivery and patient care. As ICSs develop plans to deliver around 30% more elective activity by 2024-2025 than before the pandemic, the need to retain clinical staff is paramount. NHS organisations are used to using workforce KPIs to manage staffing levels, but the real opportunity is being able to identify staff who are at risk of leaving their post and to implement strategies to retain their much-needed talent.

Snowflake provides a state-of-the-art data platform for collating and analysing workforce data, and with the addition of DataRobot Solution Accelerator models, trusts can have predictive models running with little experimentation, further accelerated by the wide range of supporting datasets available through the Snowflake Marketplace.

Responding to COVID-19 as it mutates and continues to impact society

The pandemic has affected all of our lives and those of our families and communities. The rapid creation and subsequent evolution of regional dataflows and analysis was a cornerstone of the UK's COVID-19 response and action plan, and the recently published Data Saves Lives policy paper sets out the UK Government's plan for data-driven healthcare reform.

DataRobot and Snowflake have been at the heart of the pandemic response across the globe, including helping NHS trusts and ICSs build predictive solutions, building and sharing COVID-19 datasets, partnering with US states to respond to and prepare for future disease outbreaks, and driving the distribution of 20% of the US's vaccine rollout.

Tackling the elective backlog

Ensuring that patients waiting for elective operations are prioritised and treated is the top concern for the NHS, and research predicts that the number of people waiting for treatment will reach 7 million by 2025.

Through the integration of Snowflake and DataRobot, ICSs can rapidly build solutions that not only risk-assess all patients waiting for treatment but also harness geospatial predictive capabilities to model which citizens are likely to need intervention in the future, enabling pre-admission action. This exact approach is being taken by the Greater Manchester Health and Social Care Partnership, which has built a Snowflake ICS data platform and is also building and deploying DataRobot models to identify risk and suggest a prioritisation order for patients waiting for treatment across the region.

Resetting urgent care performance and delivery

The way the NHS measures urgent care performance is evolving, and the change is welcome: the 4-hour standard is a crude measure, and patients are waiting increasingly long times (during March 2022, 27% of all patients in England requiring emergency admission waited over 4 hours from decision to admission). Accurately forecasting non-elective demand is a necessity for ICSs and acute trusts, but the task is complicated by the pandemic and the data disruption that ensued.

DataRobot's Automated Time Series forecasting capability gives ICSs the ability to generate highly accurate hour-by-hour forecasts and to augment historical acute activity data with environmental datasets from the Snowflake Marketplace that are proven to have predictive value, including weather forecasts, public holidays, and more. 
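
As a rough, platform-agnostic sketch of what that augmentation step can look like, the snippet below builds an hourly frame with a weather covariate, calendar features, and a weekly lag. The simulated values, column names, and holiday dates are illustrative only; in practice the weather and holiday data would be joined in from sources such as the Snowflake Marketplace.

```python
import numpy as np
import pandas as pd

# Illustrative hourly A&E attendance series with a simple temperature covariate.
idx = pd.date_range("2022-01-01", periods=24 * 90, freq="h")
df = pd.DataFrame({
    "timestamp": idx,
    "attendances": np.random.default_rng(0).poisson(12, len(idx)),
    "air_temp_c": 8 + 6 * np.sin(np.arange(len(idx)) * 2 * np.pi / 24),
})

# Calendar features that typically carry predictive value for urgent care demand.
df["hour"] = df["timestamp"].dt.hour
df["day_of_week"] = df["timestamp"].dt.dayofweek
bank_holidays = pd.to_datetime(["2022-01-03", "2022-04-15"])  # illustrative dates
df["is_bank_holiday"] = df["timestamp"].dt.normalize().isin(bank_holidays)

# Lag feature: attendances at the same hour one week earlier.
df["attendances_lag_168h"] = df["attendances"].shift(168)
```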

Enabling population health management and reducing health inequalities

Population health management is regarded as the essential approach to sustainable healthcare delivery and is a core strategic aim for ICSs.

People are living longer but with an increased burden of disease and mental health disorders; however, much of this could be preventable if health systems were able to shift from reactive to proactive. Social determinants of health (SDOH) are proven to impact a citizen's quality of life and life expectancy, and ICSs have a unique opportunity to build or ingest (from the Snowflake Marketplace) and share datasets that add predictive value, including data on citizens' housing, employment, and education.

Health systems around the globe are already doing exactly this, sharing datasets through Snowflake and deploying DataRobot models that accurately predict citizen and community disease propensity. The next step for ICSs is both to understand the health and care needs of their populations and to act preemptively on them, and there is a growing body of evidence that this is eminently achievable with the right data-driven approach.

Improving patient outcomes through a data-first approach

Whether it is harnessing the power of automated machine learning to better identify patients at risk of readmission, predicting hospital-acquired conditions, or improving patient outcomes through operating theatre data, the DataRobot and Snowflake integration gives trusts revolutionary power to derive deep insight into patient condition, deterioration, and outcomes.

Through the Snowflake Data Cloud and the DataRobot AI Platform, and by adopting a partnership approach, ICSs and NHS organisations can leverage our experience of the types of data that give the best predictive output and then harness them to deliver accurate, decision-ready predictions.

Action to Take

  • Learn more about the Snowflake and DataRobot partnership.
  • Register for the HETT Show on 27-28 September in London where DataRobot and Snowflake will have a joint stand. Book an appointment to talk to the team and see a live demonstration of both platforms.
  • Watch for more healthcare blogs to stay up to date on how DataRobot and Snowflake enable rapid, secure, scalable, and integrated health and care transformation.
On-demand webinar
The Expert Take: Unlocking the Potential of Trusted AI in Healthcare
Watch Now

The post Healthcare: Why Integrated Care Systems Need to Focus on AI and not BI appeared first on DataRobot AI Platform.

]]>
Minding Your Models https://www.datarobot.com/blog/minding-your-models/ Fri, 22 Jul 2022 14:22:40 +0000 https://www.datarobot.com/?post_type=blog&p=39081 You need to know where your deployed models are, what they do, the data they use, the results they produce, and who relies upon their results. That requires a good model governance framework.

The post Minding Your Models appeared first on DataRobot AI Platform.

]]>
Using AI-based models increases your organization’s revenue, improves operational efficiency, and enhances client relationships. 

But there’s a catch. 

You need to know where your deployed models are, what they do, the data they use, the results they produce, and who relies upon their results. That requires a good model governance framework. 

At many organizations, the current framework focuses on the validation and testing of new models, but risk managers and regulators are coming to realize that what happens after model deployment is at least as important. 

Legacy Models

No predictive model — no matter how well-conceived and built — will work forever. It may degrade slowly over time or fail suddenly. So, older models need to be monitored closely or rebuilt entirely from scratch. 

Even organizations with good current controls may have significant technical debt from these models. Models built in the past may be embedded in reports, application systems, and business processes. They may not have been documented, tested, or actively monitored and maintained. If the developers are no longer with the company, reverse engineering will be necessary to understand what they did and why.

Future Models

Automated machine learning (AutoML) tools make building hundreds of models almost as easy as building one. Aimed at citizen data scientists, these tools are expected to dramatically increase the number of models that organizations put into production and need to continuously monitor.

Reduce Risk with Systematic Model Controls

Every organization needs a model governance framework that scales as its use of models grows. You need to know whether your models are at risk of failure and whether they are using the right data. With growing financial regulation of model governance and model risk practices, such as SR 11-7, you must also verify that your models meet applicable external standards.

This framework should cover such subjects as roles and responsibilities, access control, change and audit logs, troubleshooting and follow-up records, production testing, validation activities, a model history library, and traceable model results.

Using DataRobot MLOps

Our machine learning operations (MLOps) tool allows different stakeholders in an organization to control all production models from a single location, regardless of the environments or languages in which the models were developed or where they are deployed. 

For Model Management

The DataRobot “any model, anywhere” approach gives its MLOps tool the ability to deploy AI models to virtually any production environment — the cloud, on-premises, or hybrid. 

It creates a model lifecycle management system that automates key processes, such as troubleshooting and triage, model approvals, and secure workflow. It can also handle model versioning and rollback, model testing, model retraining, and model failover and failback. 

For Model Monitoring 

This advanced tool from DataRobot provides instant visibility into the performance of hundreds of models, regardless of deployment location. It refreshes production models on a schedule over their full lifecycle or automatically when a specific event occurs. To support trusted AI, it even offers configurable bias monitoring.
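
To make the monitoring idea concrete, here is a minimal, tool-agnostic sketch of one common drift statistic, the population stability index (PSI), comparing a feature's training distribution with recent production data. It illustrates the kind of check a monitoring system automates; it is not a description of how DataRobot computes drift internally.

```python
import numpy as np

def population_stability_index(train_values, recent_values, bins=10):
    """Compare a feature's training distribution against recent production data.

    A PSI above roughly 0.2 is often treated as a sign of drift worth reviewing.
    """
    edges = np.quantile(train_values, np.linspace(0, 1, bins + 1))
    # Clip production values into the training range so outliers land in the edge bins.
    train_frac = np.histogram(np.clip(train_values, edges[0], edges[-1]), edges)[0] / len(train_values)
    recent_frac = np.histogram(np.clip(recent_values, edges[0], edges[-1]), edges)[0] / len(recent_values)
    train_frac = np.clip(train_frac, 1e-6, None)    # avoid log(0) and division by zero
    recent_frac = np.clip(recent_frac, 1e-6, None)
    return float(np.sum((recent_frac - train_frac) * np.log(recent_frac / train_frac)))

# Example: a shifted production distribution produces a clearly elevated PSI.
rng = np.random.default_rng(0)
training_sample = rng.normal(0, 1, 10_000)
production_sample = rng.normal(0.5, 1, 2_000)
print(f"PSI: {population_stability_index(training_sample, production_sample):.3f}")
```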

Find Out More

Regulators and auditors are increasingly aware of the risks of poorly managed AI, and more stringent model risk management practices will soon be required. 

Now is the time to address the gaps in your organization’s model management by adopting a robust new system. As a first step, download the latest DataRobot white paper, “What Risk Managers Need to Know about AI Governance,” to learn about our dynamic model management and monitoring solutions.

White Paper
What Risk Managers Need to Know About AI Governance
Download Now

The post Minding Your Models appeared first on DataRobot AI Platform.

]]>
Location AI: The Next Generation of Geospatial Analysis https://www.datarobot.com/blog/location-ai-the-next-generation-of-geospatial-analysis/ Tue, 05 Jul 2022 14:02:59 +0000 https://www.datarobot.com/?post_type=blog&p=38580 At the confluence of cloud computing, geospatial data analytics, and machine learning we are able to unlock new patterns and meaning within geospatial data structures.

The post Location AI: The Next Generation of Geospatial Analysis appeared first on DataRobot AI Platform.

]]>
Real-world problems are multidimensional and multifaceted. Location data is a key dimension whose volume and availability have grown exponentially in the last decade. At the confluence of cloud computing, geospatial data analytics, and machine learning, we are able to unlock new patterns and meaning within geospatial data structures that help improve business decision-making, performance, and operational efficiency. 

The power of this convergence is demonstrated by the following example. Cleaned and enriched geospatial data, combined with geostatistical feature engineering, provides a substantial positive impact on a housing price prediction model's accuracy. The question we'll be looking at is: What is the predicted sale price for a home sale listing? Keep in mind, however, that this workflow can be used for a broad range of geospatial use cases.

Utah Spatial Modeling Process

A Light Gradient Boosted Trees Regressor with Early Stopping model was trained without any geospatial data on 5,657 residential home listings to provide a baseline for comparison. This produced a cross-validation RMSLE of 0.3530. For example, on one listing this model predicted a price roughly $21,000 above its true price. 

To isolate the impact of the geospatial features, we compare modeling results using the same blueprint as the baseline model but with the data's available location identifiers enabled. Enabling spatial data in the modeling workflow resulted in a 7.14% cross-validation RMSLE improvement over the baseline, and the example listing's predicted price was roughly $12,000 above the true price, about $9,000 closer than the baseline model. 

In practice, spatial data scientists attempt to encode human spatial reasoning in a form machines can learn from. Five hypothesized key factors that contribute to housing prices were used to enrich the listing data via spatial joins (a sketch of this kind of join follows the list):

  1. select demographic variables from the U.S. Census Bureau,
  2. walkability scores from the Environmental Protection Agency,
  3. highway distance,
  4. school district scores, and
  5. distance to recreation, namely, ski resorts.
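
Here is a minimal sketch of this kind of spatial-join enrichment using GeoPandas. The file names, layers, and columns are hypothetical stand-ins for the actual Utah sources, and the enrichment in the study was done with the platform's tooling rather than this code.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical inputs: home listings with lat/lon, census tracts with demographic
# columns, and ski resort point locations.
listings_df = pd.read_csv("utah_listings.csv")
listings = gpd.GeoDataFrame(
    listings_df,
    geometry=gpd.points_from_xy(listings_df["longitude"], listings_df["latitude"]),
    crs="EPSG:4326",
)
tracts = gpd.read_file("utah_census_tracts.geojson")
resorts = gpd.read_file("utah_ski_resorts.geojson")

# Point-in-polygon join: attach each listing's tract-level demographic variables.
enriched = gpd.sjoin(listings, tracts.to_crs(listings.crs), how="left", predicate="within")

# Distance-based enrichment: meters to the nearest ski resort, computed in a
# projected CRS suitable for Utah (UTM zone 12N).
enriched = gpd.sjoin_nearest(
    enriched.drop(columns="index_right").to_crs("EPSG:26912"),
    resorts.to_crs("EPSG:26912"),
    distance_col="ski_resort_distance_m",
)
```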

Geospatial enrichment, in combination with Location AI's Spatial Neighborhood Featurizer, reveals local spatial dependence structures such as the spatial autocorrelation between the number of bedrooms, the square footage of the listings, and the enriched walkability score feature. Spatial data enrichment resulted in an 8.73% cross-validation RMSLE improvement over the baseline, and the example listing's predicted price was only about $1,300 above the true price, roughly $11,000 closer than the location-enabled model and about $20,000 closer than the baseline model. 
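
For intuition, a neighborhood-style feature of the kind such a featurizer produces can be approximated with a k-nearest-neighbors spatial lag: the sketch below computes the mean sale price of each listing's five nearest neighbors. The column names are hypothetical, raw latitude/longitude distance is only a rough proxy over a small area, and this is a simplification rather than the platform's actual featurizer.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Assumes the enriched frame from the previous sketch, with hypothetical
# "latitude", "longitude", and "sale_price" columns.
coords = enriched[["latitude", "longitude"]].to_numpy()
prices = enriched["sale_price"].to_numpy()

# k + 1 neighbors because each point's nearest neighbor is itself.
nn = NearestNeighbors(n_neighbors=6).fit(coords)
_, neighbor_idx = nn.kneighbors(coords)

# Spatial lag feature: mean price of the 5 nearest other listings (column 0 is the point itself).
enriched["neighbor_mean_price"] = prices[neighbor_idx[:, 1:]].mean(axis=1)
```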

Geospatial Data Enrichment Example
Price Prediction Example

Spatial predictive modeling is applicable to a wide range of industries because of the general availability of spatial data. Analyzing and understanding the applicability of spatial data enrichment to any particular machine learning scenario does not have to be a complex undertaking. To learn more about the best practices used to develop this location-aware model, read the full white paper here.

White Paper
Leveraging Geospatial Data and Analysis with AI

Part 1. Real Estate

Download Now

The post Location AI: The Next Generation of Geospatial Analysis appeared first on DataRobot AI Platform.

]]>