Insurance claims data is growing exponentially across the globe. It contains many types of information: continuous or discrete numbers, short texts, long paragraphs, and, of course, images.
Let’s consider a scenario where a claimant gets into an accident. An adjuster visits the accident spot to collect the First Notice of Loss (FNOL) data, which includes basic information about the accident, such as the speed of the vehicle, the vehicle type, and the claim type, in a structured format. The adjuster then documents the accident spot with photographs and makes notes about the accident using shorthand and notations. This information is submitted to the insurance company, where external data such as the price of parts, the current price of the vehicle, and the driving history is added, and the claim payout is analyzed. If done manually, the whole process may take a few days to a few weeks, depending on adjuster allocation, the speed of filing reports, and the complexity of the claim.
Most insurance companies have started using machine learning algorithms to get instant insight from the FNOL data as soon as the adjuster collects it. Still, nearly 95% of them do not include the image data, processing only the tabular and text data.
To gain accurate insights, it makes sense to include the entire range of data in the analysis. In practice, however, insurance companies tend to focus on the numbers first when developing business logic or a machine learning model. To give their models an edge, they need to include unstructured data and images along with the numbers.
The question arises: why exclude the images at all? Wouldn’t they contain all the relevant information about the accident? Of course they do. It’s just that these images have historically been difficult to analyze due to various technical challenges. Today, thanks to several advancements in computer vision, it is possible to extract data from images far more accurately.
What is computer vision?
Computer vision is a field that works on enabling computers to see, identify, and process images and videos the way a human being does. It is like imparting human intelligence and instincts to a computer. That may sound promising; in reality, however, it is quite difficult to enable computers to recognize images of different objects. Computer vision is not a new field, but it has gained more attention and become more precise only after the advent of deep learning in the last decade.
One of the main reasons for the rapid evolution of computer vision is the speed at which deep learning methods have evolved, which in turn can be attributed to the use of GPUs (a shout out to gamers all over the world!). Deep neural networks for computer vision have surpassed human-level accuracy in several tasks, such as image classification and object detection.
How image data analysis impacts the insurance industry
Before getting to the use cases where computer vision algorithms impact insurance companies and insurance claims, let us first look at the types of image data that can be used.
Machine learning models are useless unless you train them with valid data, so the first step before building any analytical algorithm is to check the available dataset.
Previously, the world was reliant on image data from digital cameras or smartphones, but now there are other media that offer a different perspective:
- Drone images: Aerial view images captured by camera drones like DJI Mavic or Parrot Anafi.
- CCTV cameras: Feed recorded from surveillance cameras.
- Satellite images: Images from several satellites, available through both government and third-party platforms like DigitalGlobe, NASA, or Planet.
The range of problems that we can target in the insurance industry
Dataset collection for insurance claims processing mainly depends on the use case or problem we are trying to solve. Some possible use cases in the insurance industry that rely on image data are:
- Estimating home premium based on
- interior photos of the house
- exterior photos using drones or satellite images
- Detecting fraud in property insurance using satellite surveillance images (an unauthorized swimming pool, construction of a separate structure, incorrect calculation of the parcel area)
- Estimating vehicle damage using accident images
- Assessing possible risks for properties, like finding trees near the roof or a damaged roof, using satellite images
- Converting the paper-based application documents to digital using OCR
- Medical image analysis for health insurance
- Assessing crop fields using satellite or drone images for crop insurance
- Validating the claim description against the accident images
- Constructing a 3D model of a property from images for detailed analysis
The list above covers some of the most common use cases. Depending on the requirements and the dataset, computer vision can be applied to several other problems as well.
Once a particular use case or set of use cases is fixed, and the necessary dataset has been collected, the focus should shift to selecting the appropriate computer vision techniques for those use cases.
For example, if you have satellite images of multiple parcels and want to classify whether a parcel contains a bungalow or a farmhouse, you should select a classification architecture like DenseNet or ResNet.
If, instead, you want to locate structures like trees, swimming pools, or sheds in the parcel image, you may want to go for object detection techniques like Faster R-CNN, SSD, or YOLO.
Some useful techniques are:
- Image recognition/classification
- Object localization
- Object detection using a bounding box
- Object detection using pixel-by-pixel segmentation
- Image generation
- Image reconstruction
- 3D model generation
- Caption generation
Irrespective of the technique you use, there are certain common fundamental procedures involved in preprocessing the images and model building:
Image normalization: Rescaling pixel intensities so that their distribution has low variance, for example by scaling values to [0, 1] or standardizing them to zero mean and unit variance.
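A minimal sketch of this step using NumPy, assuming an 8-bit grayscale input (the sample patch and function name are illustrative):

```python
import numpy as np

def normalize_image(img):
    """Standardize pixel intensities to zero mean and unit variance."""
    img = img.astype(np.float32)
    return (img - img.mean()) / (img.std() + 1e-8)

# A hypothetical 2x2 grayscale patch with raw values in [0, 255]
patch = np.array([[0, 64], [128, 255]], dtype=np.uint8)
normed = normalize_image(patch)
# normed now has (approximately) zero mean and unit variance
```

The small epsilon in the denominator guards against division by zero for flat (constant-intensity) patches.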
Channel selection: This is not needed for a normal image with RGB channels. However, images taken from satellites, and at times by drones, may have additional channels such as alpha or depth intensities.
For example, the MIR (mid-infrared) band from the Landsat satellite can be used to identify features of the ground soil.
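Selecting bands from a multispectral image amounts to slicing the channel axis. A sketch with NumPy, where the band ordering and the `BAND_INDEX` mapping are assumptions for illustration, not a real satellite product's layout:

```python
import numpy as np

# Hypothetical 6-band satellite tile, shape (height, width, bands)
tile = np.random.default_rng(0).random((64, 64, 6)).astype(np.float32)

# Assumed band ordering for this example
BAND_INDEX = {"blue": 0, "green": 1, "red": 2, "nir": 3, "mir": 4, "alpha": 5}

def select_bands(img, bands):
    """Keep only the named bands, in the requested order."""
    idx = [BAND_INDEX[b] for b in bands]
    return img[..., idx]

rgb = select_bands(tile, ["red", "green", "blue"])  # natural-color composite
soil = select_bands(tile, ["mir"])                  # MIR band for soil features
```

Real satellite rasters are usually read with a geospatial library, but the per-band slicing idea is the same.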
Image augmentation: Most of the time, the dataset collected by the adjuster may not be large enough for the model to be accurate. In those cases, it is best to augment the images by transforming them in several ways, such as rotating, flipping, or varying scale, texture, and other aspects.
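Libraries such as torchvision provide rich augmentation pipelines; the core idea can be sketched with plain NumPy geometric transforms (function and variable names are illustrative):

```python
import numpy as np

def augment(img):
    """Generate simple geometric variants: rotations and mirror flips."""
    variants = [np.rot90(img, k) for k in range(4)]  # 0/90/180/270 degrees
    variants += [np.fliplr(img), np.flipud(img)]     # horizontal/vertical flips
    return variants

image = np.arange(16).reshape(4, 4)
augmented = augment(image)
# six variants per input image: the original plus five transformed copies
```

Each label-preserving transform multiplies the effective dataset size, which is why augmentation helps most when the adjuster-collected set is small.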
Tagging: Tagging is the process of labeling a particular object in an image with the class we want the model to find. This process differs for each technique. For a classification problem, images have to be separated based on their classes, whereas for a localization problem, objects in the image have to be annotated.
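The output of tagging is typically a structured annotation record per image. A simplified, COCO-style sketch for one accident photo (the field names and class labels are illustrative, not an exact schema from any tool):

```python
# One annotation record for a hypothetical claim photo
annotation = {
    "image": "claim_1042_front.jpg",
    "labels": [
        # Classification-style tag: one class for the whole image
        {"task": "classification", "class": "rear_end_collision"},
        # Detection-style tag: class plus a bounding box [x, y, width, height]
        {"task": "detection", "class": "dented_bumper", "bbox": [120, 340, 210, 95]},
    ],
}

def classes_in(record):
    """Collect every class tagged in an annotation record."""
    return {label["class"] for label in record["labels"]}
```

For classification, only the class field matters; detection and segmentation additionally require geometry (boxes or masks) per object.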
There are also several other preprocessing steps depending on the technique used.
After all this preprocessing, the data is fed into the machine learning model. Developing a model has become easier thanks to the many frameworks and open-source codebases such as torchvision, the TensorFlow Object Detection API, and fast.ai, among others.
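Those frameworks hide most of the training mechanics. To make the core loop concrete, here is a deliberately tiny from-scratch stand-in: logistic regression trained by gradient descent on flattened "images", with synthetic data where one class is simply brighter (everything here is illustrative, not a production model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 8x8 images flattened to 64 pixels, two classes,
# where class-1 images are brighter on average (an easy signal)
X0 = rng.normal(0.2, 0.1, size=(50, 64))
X1 = rng.normal(0.8, 0.1, size=(50, 64))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

# Logistic regression on raw pixels -- the simplest possible stand-in
# for a deep classifier such as ResNet
w = np.zeros(64)
b = 0.0
for _ in range(200):                         # plain gradient descent
    p = 1 / (1 + np.exp(-(X @ w + b)))       # predicted probabilities
    w -= 0.5 * (X.T @ (p - y)) / len(y)      # gradient of the log loss
    b -= 0.5 * float(np.mean(p - y))

accuracy = float(np.mean((p > 0.5) == y))
```

A real pipeline swaps the linear model for a convolutional network and the toy arrays for the preprocessed claim images, but the train-evaluate loop has this same shape.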
As discussed above, there are many new applications of computer vision algorithms in the insurance industry. To explore those applications, one must have a handle on the possibilities that deep learning offers, understand its limitations, and, most importantly, understand one’s own requirements.
A survey by Accenture states that only 10 to 15 percent of the data available in an insurance company, mostly in structured format, is currently being used for insurance claims processing. The unused unstructured data has great potential to change how the processing flow works. It would also add automation to the processing pipeline, which could even surpass human-level interpretation of a particular problem.
Most people think that the technical part, involving model building and training, is the most complicated. In reality, there are several open-source toolkits and frameworks that can help you simplify these complexities.