The Power of Machine Learning in Insurance

Categories: Financial Services

Machine learning is in the news.

  • Google says it is “rethinking everything” around machine learning.
  • In Slate, tech writer David Auerbach argues that machine learning reshapes how we live.
  • In Harvard Business Review, Mike Yeomans writes that “every manager” should know about machine learning.

If you are an insurance executive, these claims may seem like hype. After all, insurers pioneered some of the most advanced statistical techniques more than a hundred years ago.

But consider how these insurers use machine learning today.

  • Progressive Insurance uses machine learning to predict claims from telematics and geospatial data.
  • Zurich Insurance uses machine learning to support marketing, fraud detection, and claims management.
  • Transamerica uses machine learning to recommend products to customers.

Gary Reader, KPMG’s Global Head of Insurance, writes:

For the insurance sector, we see machine learning as a fundamental game-changer since most insurance companies today focus on three main objectives: improving compliance, improving cost structures and improving competitiveness. Machine learning can form at least part of the answer to all three. 

Here are three practical ways that the insurance industry uses machine learning today:

Submission Prioritization. Insurers use machine learning to predict the premium, likelihood of conversion, and expected losses for each policy that a broker submits, using only the data available on the day of submission. This practice helps underwriters focus on the most valuable business. Detecting good risks early in the process makes better use of underwriters’ time and can deliver a significant competitive advantage.
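
The article does not say how the three predictions are combined into a single priority. One plausible sketch, assuming hypothetical model outputs for each submission and an expected-margin scoring rule of our own choosing (probability of binding times premium minus expected loss), sorts the underwriting queue like this:

```python
from dataclasses import dataclass


@dataclass
class Submission:
    submission_id: str
    predicted_premium: float  # hypothetical model output: expected premium
    p_conversion: float       # hypothetical model output: probability the quote binds
    predicted_loss: float     # hypothetical model output: expected losses if bound


def expected_value(s: Submission) -> float:
    # Assumed scoring rule: expected margin if bound, weighted by the chance it binds.
    return s.p_conversion * (s.predicted_premium - s.predicted_loss)


def prioritize(submissions: list[Submission]) -> list[Submission]:
    # Highest expected value first, so underwriters see the most valuable business early.
    return sorted(submissions, key=expected_value, reverse=True)


queue = prioritize([
    Submission("A", 10_000, 0.2, 6_000),   # 0.2 * 4,000  =   800
    Submission("B", 5_000, 0.8, 3_000),    # 0.8 * 2,000  = 1,600
    Submission("C", 20_000, 0.5, 21_000),  # 0.5 * -1,000 =  -500
])
print([s.submission_id for s in queue])  # ['B', 'A', 'C']
```

A large premium alone does not guarantee priority here: submission C has the biggest premium but a negative expected margin, so it lands at the bottom of the queue.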

Fraud Detection. Insurance fraud is a growing problem. In the US, insurance companies lost more than $50 billion to insurance fraud in 2016. Insurers use machine learning to sift through claims and flag those that warrant deeper investigation.
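
"Sifting" usually comes down to ranking claims by a model score and referring only as many as investigators can handle. A minimal sketch, assuming a hypothetical fraud probability per claim, an assumed investigator capacity, and an illustrative minimum score:

```python
def refer_for_investigation(scores: dict[str, float], capacity: int,
                            min_score: float = 0.5) -> list[str]:
    """Return the claim IDs to refer for deeper investigation.

    scores:    claim ID -> model-estimated fraud probability (hypothetical model output)
    capacity:  number of claims investigators can review this period (assumed constraint)
    min_score: claims below this score are never referred, even if capacity remains
    """
    candidates = [claim_id for claim_id, s in scores.items() if s >= min_score]
    # Most suspicious claims first.
    candidates.sort(key=lambda claim_id: scores[claim_id], reverse=True)
    return candidates[:capacity]


scores = {"CLM-001": 0.92, "CLM-002": 0.15, "CLM-003": 0.67,
          "CLM-004": 0.81, "CLM-005": 0.40}
print(refer_for_investigation(scores, capacity=2))  # ['CLM-001', 'CLM-004']
```

The capacity limit reflects the practical constraint that investigators' time, not the model, usually decides how many claims get a closer look.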

Claims Triage. Insurers can save millions of dollars in claim costs through proactive management: fast settlement, targeted investigations, and case management. Machine learning helps insurers detect complex claims early in the lifecycle, so they can triage cases accordingly.
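
The article names three proactive actions but not how claims are routed among them. Assuming two hypothetical model scores per claim (the probability that it becomes complex and the probability that it is fraudulent) and illustrative cutoffs that we picked, a triage rule might look like this:

```python
from enum import Enum


class Route(Enum):
    FAST_SETTLEMENT = "fast settlement"
    TARGETED_INVESTIGATION = "targeted investigation"
    CASE_MANAGEMENT = "case management"


def triage(p_complex: float, p_fraud: float,
           complex_cutoff: float = 0.6, fraud_cutoff: float = 0.7) -> Route:
    # Hypothetical policy with illustrative cutoffs: suspected fraud is checked first,
    # then predicted complexity; everything else is settled quickly.
    if p_fraud >= fraud_cutoff:
        return Route.TARGETED_INVESTIGATION
    if p_complex >= complex_cutoff:
        return Route.CASE_MANAGEMENT
    return Route.FAST_SETTLEMENT


print(triage(0.2, 0.1).value)  # fast settlement
print(triage(0.8, 0.1).value)  # case management
print(triage(0.8, 0.9).value)  # targeted investigation
```

Checking fraud before complexity is a design choice in this sketch; an insurer could order the rules differently or add routes of its own.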

Of course, insurers use machine learning across many different business functions, including:

  • Optimal pricing
  • Direct marketing
  • Conversion
  • Targeting inspections and audits
  • Predicting litigation
  • Claims forecasting
  • Customer retention

Here’s why machine learning drives so much value:

  • Machine learning often delivers more accurate predictions than traditional analysis or human judgment.
  • Modern techniques can make these predictions easy to understand and transparent.
  • With better predictions, managers make smarter decisions.
  • Smarter decisions produce more revenue, lower costs, and a better bottom line.

What can machine learning do for your company?

Four Myths About Machine Learning

“But wait!” you may think. “Machine learning is complicated. That’s fine for big companies like AIG and Zurich, but there’s no way we’ll ever be able to afford that.”

Let’s talk about that.

Myth #1: “We will never be able to hire enough experts to use machine learning.”

Machine learning used to be the exclusive domain of data scientists, who are hard to find, hire and retain. The Wall Street Journal, The Chicago Tribune, and many others all note the shortage. Data scientists are so rare that Harvard Business Review suggests that you stop looking or lower your standards.

The good news is that modern machine learning tools are much easier to use today than they were just a few years ago. Your organization already has domain experts: actuaries, claims managers, product and underwriting managers, marketing managers, and underwriters. With the right tools and training, these domain experts can contribute to machine learning projects.

Myth #2: “Machine learning projects take forever.”

Managers recently surveyed by Gartner said that time to value is one of their biggest problems with machine learning. Those managers reported that it takes an average of 52 business days for their team to build a predictive model. (Some said it took months.) When your business needs the benefits of machine learning, weeks or months can seem like forever.

Fortunately, modern machine learning software is much faster than it used to be. It runs on distributed platforms, so you can harness more computing power. With deployable code and prediction APIs, getting models into production takes far less time.

Myth #3: “Machine learning is opaque.”

Machine learning models can seem hard to interpret because they are more complex than traditional statistical models. That same complexity is why they often deliver more accurate predictions: they fit our complex world better than simpler statistical models do.

While it is harder to understand a machine learning model by inspection, it is still possible to know how it behaves and how it produces predictions. If we focus on how well a machine learning model predicts on data it did not see during training, rather than on what it looks like, we can estimate how well it will perform in a production environment.
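
Judging a model by how well it predicts means scoring it on held-out data and treating the model itself as a black box. A minimal sketch, using the area under the ROC curve (AUC) as one common choice of metric for yes/no outcomes such as "claim was fraudulent"; the labels and scores below are made up for illustration:

```python
def auc(y_true: list[int], y_score: list[float]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen positive
    example is scored above a randomly chosen negative one (ties count as half).

    This pairwise version is O(positives * negatives): fine for a sketch,
    too slow for millions of claims.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative holdout set: actual outcomes and the model's scores for five claims.
holdout_labels = [1, 0, 1, 0, 0]
holdout_scores = [0.9, 0.3, 0.6, 0.7, 0.1]
print(round(auc(holdout_labels, holdout_scores), 3))  # 0.833
```

Nothing in the calculation looks inside the model; the score only depends on how its predictions rank real outcomes it has not seen before. An AUC of 0.5 is no better than chance, and 1.0 is a perfect ranking.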

Myth #4: “Hadoop and data science are not for business users.”

Machine learning libraries designed to work in Hadoop, such as Apache Spark MLlib or H2O, require a great deal of expertise to use effectively. Data scientists with the necessary Java, Scala, or Python skills are hard to find, recruit, and retain. Moving data out of Hadoop onto a separate server takes extra time, and it may be impossible with very large datasets.

Fortunately, modern software packages make machine learning in Hadoop accessible to everyone. They are powerful enough to appeal to hardcore data scientists, yet easy and intuitive enough to be useful to business users. These platforms push the workload into the Hadoop cluster for distributed execution and produce insight without moving data. The end result is a far better product, built collaboratively with all stakeholders involved: not just data scientists and data engineers, but also the operational leaders and end users who make business decisions based on data analytics and machine learning.

Automated Machine Learning powered by an Enterprise Data Hub

Automation is the key to success in machine learning. Just as factory automation makes manufacturing more efficient, automated machine learning makes data science more efficient. It speeds up the process, broadens the pool of people who can contribute, and helps ensure high-quality results without sacrificing compliance.

There are three main benefits of Automated Machine Learning:

  • Rapid Development of accurate predictive models. Automated machine learning is fast. Conventional data science is like a craft: experts working individually to develop a piece of analysis. Automated machine learning, combined with modern distributed computing and a vast ecosystem of open source algorithms, is like having an army of highly trained data scientists with different specialties. In the time it takes a conventional data scientist to complete one operation, an automated platform can perform hundreds.
  • Fast Productionalization of the selected models. Data scientists today use a wide range of open source tools to build cutting-edge models. Even when those models produce accurate results, executives struggle to put such a diverse set of models into production while maintaining overall IT compliance. A good automated machine learning design brings these ad hoc processes together under one umbrella and adds IT compliance, while distributed computing enables high-volume, high-frequency scoring in production (in place or through an API).
  • Frictionless Operationalization of the predictions. With automated machine learning, actuaries, underwriters, claims adjusters, fraud investigators, and other stakeholders can collaborate on projects. A lack of programming skills or in-depth knowledge of machine learning methods is no longer a barrier. Automated machine learning complements actuaries’ business acumen and mathematical background and helps them leverage their domain knowledge. It moves machine learning projects out of the “skunk works” and into the front office. With a clear definition of the business problem and direct feedback from stakeholders, machine learning projects deliver better results and integrate easily across the organization.
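
The article does not describe how an automated platform chooses among models, but the core loop behind "performing hundreds of operations" can be sketched in a few lines. This minimal version, assuming a toy one-feature dataset and three hypothetical candidate families (a mean baseline, ordinary least squares, and k-nearest neighbours), fits every candidate on a training split, scores each on a validation split by mean absolute error, and ranks them:

```python
from statistics import mean
from typing import Callable

Model = Callable[[float], float]
Fitter = Callable[[list[float], list[float]], Model]


def fit_mean(xs: list[float], ys: list[float]) -> Model:
    # Baseline: always predict the training average.
    m = mean(ys)
    return lambda x: m


def fit_linear(xs: list[float], ys: list[float]) -> Model:
    # Ordinary least squares with one feature: y = a + b * x.
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def make_knn(k: int) -> Fitter:
    # k-nearest neighbours: average the targets of the k closest training points.
    def fit(xs: list[float], ys: list[float]) -> Model:
        points = list(zip(xs, ys))
        def predict(x: float) -> float:
            nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
            return mean(y for _, y in nearest)
        return predict
    return fit


def mae(model: Model, xs: list[float], ys: list[float]) -> float:
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))


def auto_select(candidates: dict[str, Fitter], train, valid) -> list[tuple[float, str]]:
    """Fit every candidate on `train`, score it on `valid`, best (lowest error) first."""
    leaderboard = []
    for name, fitter in candidates.items():
        model = fitter(*train)
        leaderboard.append((mae(model, *valid), name))
    return sorted(leaderboard)


# Toy data (illustrative only): the target follows y = 2x + 1.
train = ([0, 1, 2, 3, 4, 5], [1, 3, 5, 7, 9, 11])
valid = ([6, 7], [13, 15])
candidates = {"mean_baseline": fit_mean, "linear": fit_linear,
              "knn_k1": make_knn(1), "knn_k3": make_knn(3)}
leaderboard = auto_select(candidates, train, valid)
print(leaderboard[0][1])  # linear
```

A production platform would search a much larger space of algorithms, preprocessing steps, and settings, and would typically use cross-validation rather than a single split; the leaderboard idea stays the same.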

As you evaluate machine learning and data science platforms, keep the importance of automation in mind. Seek out platforms that automate the entire workflow, not just pieces of it. Look for built-in best practices that ensure that business users can produce valid and high-quality insights that your organization can put into production. Seek transparency; when your team delivers a predictive model, it should be clear and interpretable to executives, stakeholders, and regulators.

Above all, look for a machine learning platform that runs in Hadoop under YARN. For simple implementation and management, your machine learning platform should use the built-in resources of your preferred Hadoop distribution. For organizations that choose Cloudera, that means using Cloudera Manager to distribute runtime libraries, using Custom Service Descriptors to manage and monitor the application, and taking advantage of native security, auditing, lineage, and encryption.

Call to Action

As 21st-century insurance companies seek to harness the power of machine learning and AI, their executives need to think about two strategic priorities:

  1. How they are enabling their data scientists and business users with machine learning capabilities
  2. How they are building a robust compute infrastructure that can support and productionalize machine learning

DataRobot and Cloudera have built a platform that addresses these two crucial strategic priorities for insurance companies. How do we do it? See for yourself! Check out the Cloudera and DataRobot webinar recording.