Insurers are overflowing with data, but volume is only part of the challenge in detecting fraud effectively. The harder problem is narrowing that flood of data down to the right information sources.
A recent report by the Coalition Against Insurance Fraud indicates that more insurers are using predictive analytics and other technologies to detect and deter fraud. The survey found that 95 percent of respondents use anti-fraud technology, up from 88 percent in 2012. However, respondents identified difficulties with data integration and poor data quality, such as spelling and transposition errors, as major stumbling blocks to implementing these new technologies successfully.
According to a recent article by James Ruotolo in Information Week: Insurance and Technology, consolidating data can be tricky, but addressing data quality and integration issues up front is imperative to a successful fraud analytics model and will pay significant dividends in improved detection rates.
He recommends four key steps in preparing insurance data for fraud analytics:
- Integrate data silos. Core processing systems each serve a specific purpose, often with no connection to one another or to aggregated data analysis. For fraud analytics, claims, policy, application, billing, and medical data originating in different places must be consolidated. Be sure to include legacy systems and other less formal “systems” like spreadsheets, watch lists, case-management applications, and shared file systems. Document the integration efforts and ensure that they are repeatable and auditable; this is critical when fraud analytics scoring moves into production (see the consolidation sketch after this list).
- Manage missing and erroneous data. If your systems contain Social Security numbers like 999-99-9999 or claim files with missing telephone numbers, for example, that information needs to be fixed; ignoring such errors degrades fraud analytic results. Leading data quality tools can help identify, repair, and replace missing or erroneous data. In some cases, missing data can be found in another system or inferred from a combination of other sources. Standardizing formats for common fields like addresses will also pay off later (see the data quality sketch after this list).
- Resolve entities. Once data is aggregated from multiple systems, identify whether the same individuals, companies, or other entities appear in more than one place; for example, one system may capture name and Social Security number while another captures name and date of birth. Entity resolution techniques can link those two records and identify them as the same individual. The best results come from more advanced analytical techniques that score the likelihood of a match, especially if social network analytics or link analysis will be part of the fraud analytics solution (see the matching sketch after this list). The ability to link one individual, who could appear as an insured, claimant, witness, driver, vendor, or employee across multiple claims, is a powerful tool for detecting suspicious activity.
- Process unstructured text. It is estimated that up to 80 percent of insurer data is kept in text format, with some of the best information captured in the loss description or claim notes fields. Managing this data is complicated because abbreviations, acronyms, industry jargon, and misspellings are common. However, a text analytics solution containing a library of terminology specially designed for insurance data can address this issue. During this analysis, additional model variables can also be created, expanding the scope of fraud analytics without having to include external data sources. Machine learning and natural-language processing techniques should be used to find and create useful variables for fraud analytic modeling (see the text processing sketch after this list).
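To make the consolidation step concrete, here is a minimal sketch in Python with pandas, assuming hypothetical systems, column names, and sample records. The idea is to map each source to a common schema and tag every row with its origin so the integration stays repeatable and auditable.

```python
# Minimal sketch of consolidating records from separate "systems" into one
# auditable dataset. All column names and sample records are hypothetical.
import pandas as pd

# A core claims system with its own schema.
claims = pd.DataFrame({
    "claim_id": ["C100", "C101"],
    "insured_name": ["Jane Roe", "John Doe"],
    "loss_amount": [4200.00, 1850.50],
})

# A less formal "system": a watch list maintained in a spreadsheet.
watch_list = pd.DataFrame({
    "name": ["John Doe"],
    "reason": ["prior SIU referral"],
})

# Normalize column names to a common schema, then tag every row with its
# origin so the integration step stays repeatable and auditable.
claims = claims.rename(columns={"insured_name": "name"}).assign(source="claims_core")
watch_list = watch_list.assign(source="siu_watch_list")

combined = pd.concat([claims, watch_list], ignore_index=True, sort=False)
print(combined)
```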
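For the data quality step, a sketch along these lines flags placeholder values, marks records that still need repair, and standardizes addresses. The placeholder list and abbreviation map are illustrative assumptions, not any particular vendor's rules.

```python
# Minimal sketch of flagging placeholder or missing values and standardizing
# a common field. Placeholder patterns and columns are illustrative only.
import pandas as pd

df = pd.DataFrame({
    "claim_id": ["C100", "C101", "C102"],
    "ssn": ["123-45-6789", "999-99-9999", None],
    "phone": ["555-0101", None, "555-0103"],
    "address": ["123 Main Street", "123 main st.", "45 OAK AVE"],
})

# Treat known placeholder SSNs as missing so they don't pollute matching.
placeholders = {"999-99-9999", "000-00-0000", "111-11-1111"}
df.loc[df["ssn"].isin(placeholders), "ssn"] = None

# Flag records that need repair from another system or by inference.
df["needs_repair"] = df[["ssn", "phone"]].isna().any(axis=1)

# Crude address standardization: lowercase and expand common abbreviations.
abbrev = {" st.": " street", " ave": " avenue"}

def standardize(addr: str) -> str:
    addr = addr.lower().strip()
    for short, full in abbrev.items():
        addr = addr.replace(short, full)
    return addr

df["address"] = df["address"].map(standardize)
print(df)
```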
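For entity resolution, the following sketch links records from two systems that capture different identifiers, using a simple name-similarity score from Python's standard library. The threshold and field names are assumptions; as the article suggests, production solutions rely on more advanced techniques that properly estimate the likelihood of a match.

```python
# Minimal sketch of rule-based entity resolution across two systems that
# capture different identifiers (name + SSN vs. name + date of birth).
# Threshold and fields are illustrative assumptions, not a real method.
from difflib import SequenceMatcher

system_a = [  # captures name and Social Security number
    {"id": "A1", "name": "Jonathan Q. Doe", "ssn": "123-45-6789"},
]
system_b = [  # captures name and date of birth
    {"id": "B7", "name": "Jon Doe", "dob": "1980-02-14"},
]

def name_similarity(a: str, b: str) -> float:
    """Rough string similarity on lowercased names (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

THRESHOLD = 0.6  # illustrative; tune against labeled match data
for rec_a in system_a:
    for rec_b in system_b:
        score = name_similarity(rec_a["name"], rec_b["name"])
        if score >= THRESHOLD:
            print(f"Likely same entity: {rec_a['id']} <-> {rec_b['id']} "
                  f"(name similarity {score:.2f})")
```

Note that pairwise comparison like this is quadratic in the number of records, so real systems typically block candidate pairs first (for example, by date of birth or ZIP code) before scoring.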
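Finally, for unstructured text, this sketch expands abbreviations in a claim note and derives binary model variables from indicator terms. The abbreviation map and indicator terms are invented for illustration; a real solution would draw on an insurance-specific terminology library and the NLP techniques mentioned above.

```python
# Minimal sketch of turning a free-text claim note into model variables.
# The abbreviation map and indicator terms are illustrative assumptions,
# not an actual insurance terminology library.
import re

abbreviations = {"veh": "vehicle", "dmg": "damage", "clmt": "claimant"}
indicator_terms = {"prior damage", "unwitnessed", "cash settlement"}

def normalize(note: str) -> str:
    """Lowercase the note and expand known abbreviations."""
    tokens = re.findall(r"[a-z']+", note.lower())
    return " ".join(abbreviations.get(t, t) for t in tokens)

def extract_variables(note: str) -> dict:
    """Create simple binary model variables from one claim note."""
    text = normalize(note)
    return {term.replace(" ", "_"): int(term in text) for term in indicator_terms}

note = "Clmt reports veh dmg; unwitnessed loss, requests cash settlement"
print(extract_variables(note))
```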