Data Analysis: Insurance Fraud Data Set
Two entity types were used:
- A person
- A house (identified by its address)
They are linked through a vehicle.
The card for the Person entity holds the SSN (social security number), and the card for the House entity holds its pincode. The attribute for the link (vehicle) was created with the dropdown method: the Registration State column was selected from the dropdown in the Attributes pane for the link.
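The mapping above can be sketched outside Analyst's Notebook as a small Python model. This is only an illustration under stated assumptions: the column names (`Name`, `SSN`, `Address`, `Pincode`, `Vehicle`, `Registration State`) are hypothetical, the person's identity is assumed to be the name (since the SSN sits on the card rather than being the identity), and the sample rows are made up. Rows that share an identity merge into one chart item, which is why the identity choice matters.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    kind: str                                  # "Person" or "House"
    identity: str                              # key that merges rows into one chart item (assumed)
    cards: dict = field(default_factory=dict)  # extra details stored on the item's card


@dataclass
class Link:
    person: str                                     # identity of the Person end
    house: str                                      # identity of the House end
    vehicle: str                                    # label of the link
    attributes: dict = field(default_factory=dict)  # e.g. the Registration State attribute


def build_chart(rows):
    """Map flat import rows to de-duplicated entities and Person-House links.

    Column names are hypothetical; if a person appears with different SSNs,
    the last value seen overwrites the card in this simple sketch.
    """
    entities = {}
    links = []
    for row in rows:
        # Person: identified by name (assumption), SSN stored on the card
        person = entities.setdefault(("Person", row["Name"]), Entity("Person", row["Name"]))
        person.cards["SSN"] = row["SSN"]
        # House: identified by address, pincode stored on the card
        house = entities.setdefault(("House", row["Address"]), Entity("House", row["Address"]))
        house.cards["Pincode"] = row["Pincode"]
        # Link through the vehicle, carrying the registration state as an attribute
        links.append(Link(person.identity, house.identity, row["Vehicle"],
                          {"Registration State": row["Registration State"]}))
    return entities, links


# Hypothetical sample rows: two people linked to the same house by different vehicles
sample = [
    {"Name": "Person A", "SSN": "000-00-0001", "Address": "1 Example St",
     "Pincode": "00001", "Vehicle": "Car 1", "Registration State": "TX"},
    {"Name": "Person B", "SSN": "000-00-0002", "Address": "1 Example St",
     "Pincode": "00001", "Vehicle": "Car 2", "Registration State": "CA"},
]
entities, links = build_chart(sample)
print(len(entities), len(links))
```

Because both sample rows share the address, they collapse onto a single House item, giving three entities and two links.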
Data quality issues: The Vehicle Year column caused a problem when a card was created from it. Its values did not fit the date-time format, which had been set to DD-MM-YYYY to capture the DOB column, and as a result the data could not be imported correctly.
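One way to see the mismatch is to test the values against the DD-MM-YYYY pattern before importing. The sketch below assumes Vehicle Year holds a bare four-digit year such as "2009"; the helper names and the plausibility range for years are hypothetical. It shows that such a value cannot be parsed with the DOB format, and treats it as a plain integer instead.

```python
from datetime import datetime

DOB_FORMAT = "%d-%m-%Y"  # DD-MM-YYYY, the format set for the DOB column


def fits_dob_format(value):
    """Return True if the value parses with the DD-MM-YYYY format."""
    try:
        datetime.strptime(value, DOB_FORMAT)
        return True
    except ValueError:
        return False


def parse_dob(value):
    """Parse a DOB value into a date using the DD-MM-YYYY format."""
    return datetime.strptime(value, DOB_FORMAT).date()


def parse_vehicle_year(value):
    """Keep Vehicle Year as an integer instead of forcing it through the
    DD-MM-YYYY date-time format (the 1900-2100 range is an assumption)."""
    year = int(value)
    if not 1900 <= year <= 2100:
        raise ValueError(f"implausible vehicle year: {value!r}")
    return year


print(fits_dob_format("15-03-1980"))  # a DOB-style value fits the format
print(fits_dob_format("2009"))        # a bare vehicle year does not
print(parse_vehicle_year("2009"))
```

Keeping the year as a number (or using a separate year-only format for that column) avoids pushing two differently shaped columns through one date-time setting.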
Also, for bonus points:
When the identity of an entity has been set from some columns and we then move directly to the Attributes pane and find it empty (i.e., it has no specification), this indicates that some of the remaining columns should be used as a card. In other words, the associated entity carries a card.
A detailed analysis in Analyst's Notebook found that John Smith and Henry Casteel share the same social security number (SSN): 222-85-9632. This may indicate fraud. A detailed enquiry should be carried out into these two names and the shared SSN to prevent or uncover any associated fraud.
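The same check can be run directly on the raw rows before charting, as a cross-check on what the chart shows. This is a minimal sketch that assumes hypothetical `Name` and `SSN` columns; the first two rows mirror the finding above, and the third row is a made-up non-duplicate for contrast. It groups names by SSN and reports every SSN used by more than one distinct name.

```python
from collections import defaultdict


def shared_ssns(rows):
    """Return {ssn: sorted names} for every SSN used by two or more distinct names."""
    names_by_ssn = defaultdict(set)
    for row in rows:
        # Strip whitespace so formatting differences do not hide a match
        names_by_ssn[row["SSN"].strip()].add(row["Name"].strip())
    return {ssn: sorted(names) for ssn, names in names_by_ssn.items() if len(names) > 1}


rows = [
    {"Name": "John Smith", "SSN": "222-85-9632"},
    {"Name": "Henry Casteel", "SSN": "222-85-9632"},
    {"Name": "Jane Doe", "SSN": "111-22-3333"},  # hypothetical non-duplicate row
]
print(shared_ssns(rows))  # {'222-85-9632': ['Henry Casteel', 'John Smith']}
```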