Create beautiful visualization of data in TrelliscopeJS package in R (insurance data)

A good learning experience for those who want to give a lively touch to their data

Harish Nagpal
Analytics Vidhya

--

Photo by Ulises Baga on Unsplash

TrelliscopeJS package was created by Ryan Hafen. As per Hafen “trelliscopejs is an R package that brings faceted visualizations to life while plugging in to common analytical workflows like ggplot2 or the tidyverse.”

You might have faceted your data with R ggplot2’s facet_wrap function. ‘face_wrap’ gives out beautiful data visualization but we can’t interact with that as powerfully as we can do with trelliscope package. That’s how I decided to give it a try on an Indian Insurance Industry data. After reading this article you will find that how in less than 20 lines of code you can present your data in a wonder full way which even your colleagues can understand easily.

I will be giving telliscopejs demo on Indian Life Insurance companies’ Persistency data.

What is Persistency Ratio?

Persistency ratio is an important ratio for every life insurance company worldwide. It shows the loyalty of customer towards an insurance companies. Life insurance is a long term contract between an insurance company and a customer (called policy holder), where the customer promises to pay a regular sum called premium to get the risk cover as mentioned in the contract till the full term of the policy and insurance company promises to pay the Sum assured (risk cover) towards an untoward happening with the customer or after the maturity of the policy. There are different kinds of life insurance policies and I won’t get into details of such policies here.

If the customer does not pay his premium then his policy might lapse and he losses the valuable risk cover. This is a loss to both insurance company and the customer. To customer, because he losses valuable risk cover as company stops covering the risk cover if a policy lapses for non payment of due premium and to life insurance company because it does not get the promised renewal premium thus loss of premium income for the company.

Persistency ratio looks at how many customer are keeping their policy inforce by paying the regular premium and how many customers are lapsing their policy. It also takes into consideration how much premium income is expected from the customers who are present in the books of insurance companies and how much premium income is lost for non payment of due premiums by the customers.

Persistency is important for profitability, share holder value, premium income and overall growth of the insurance company. It is generally measured in terms of 13th month, 25th month, 37th month, 49th month and 61st month.

The data being used for this demo has been taken from here.

Lets start exploring the data. The R packages have been mentioned in below code. If you do not have these packages in your R library, then you can download them before using.

# the purpose of below code is to present Indian Insurance Companies data
# through trelliscopejs package
# load packages
library(tidyr)
library(ggplot2)
library(dplyr)
library(trelliscopejs)

Let’s set the working directory and get the data

setwd(“D:/Harish/Lapsation model”)
getwd()
MasterData = read.csv(“Indian_Insurance_Persistency_201718_1.csv”)

Have a look at the data.

Inside Persistency Data

From data we find that data is in cross tabular form. We have insurance company name and financial year on left hand side. And each month’s persistency figures have been given in different columns.

Let us check NA values in the data

We find that there are total 840 values out of which 9 have none values. In my analysis I will ignore the NA data because there is a valid reason for data being not present as its a 5–6 years long data. So chances are that for some companies 5th year data for 13th month may not be present as those companies started their operations late.

Let’s filter data for one insurance company and make a ggplot out of this data. ggplot is a wonderful package from R and you should learn it for visualization.

#filtering for one insurer
ABSL <- filter(MasterData, Insurer == “Aditya Birla Sunlife”)
head(ABSL)
Data for Aditya Birla Sun Life

Let’s make a ggplot out of this data

h1 <- ggplot(data=ABSL, aes(x = Fin_Year, y = M13, group=1)) + 
geom_line(linetype=”dashed”, color=”blue”, size=.0) +
geom_point()
h1
Year wise Persistency data for Aditya Birla Sunlife

From the data we can conclude that the insurer put a good effort in increasing the persistency of 13th month as it increased consistently over the years.

Let us filter the data for one financial year and see what happens.

#filtering for one financial year
YearWise <- filter(MasterData, Fin_Year == “2013–14”)
head(YearWise)

Let us make a ggplot out of this

h2 <- ggplot(data=YearWise, aes(x = Insurer, y = M13, group=1)) + 
geom_point(aes(col=YearWise$Insurer, size=M13)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
h2
13th Persistency Data for Financial year 2013–14

From the plot we can conclude that Sahara Life has highest persistency ratio while DHFL Pramerica has lowest persistency ratio in 13th month.

Now let us use trelliscopejs package for visualization. I am showing only 13th month persistency data to draw the inference. Look at the code below. Please note that I am using “as_plotly=TRUE” argument here. This argument helps in showing the data as a tool tip text.

#showing 13th month data — trelliscope
h3 <- ggplot(data=MasterData, aes(x = Insurer, y = M13, group=1)) +
geom_point() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
geom_point(aes(col=Insurer, size=M13)) +
facet_trelliscope(~ Fin_Year, as_plotly = TRUE,
name = “Persistency of Indian Insurance Companies”,
desc = “Persistency of Insurers year wise”) +
theme(legend.position = “none”)
h3
Data with trelliscopejs package

Look at above visualization. When we use the trelliscopejs package, the output comes in a browser like container. In this container we can also filter the data. Look at the tool tip text. This is displayed using “ as_plotly = TRUE” argument. You can see next financial year data by clicking on ‘Next’ link given on top right hand side.

There is one challenge with above visualization. It does not display the data for other years like 25th month, 37th month etc. So what to do now?

Let us look at our master data again as given below:

Master Data

We can see that the data for 13th, 25th, 37th, 49th and 61st month are present in different columns. So how do we bring months on x-axis?

There is one wonderful solution in R’s tidyr package which helps in reshaping of data. We will use gather() function of tidyr package. gather() helps in making ‘wide’ data longer. And spread() function makes ‘long’ data wider. So let’s gather() our data and see what happens.

#make wider data long
All_Gather <- MasterData %>% gather(Per_month, Value, M13, M25,
M37, M49, M61) %>% group_by(Insurer)
head(All_Gather)
Data after columns converted into Rows

We can see that months have come into a new column “Per_month” and their value have come into a new column “value”. So now we are good to use this data for our final visualization. As Fin_Year is a factor data, I will use factor() argument for Per_month column as given in code below:

h5 <- ggplot(data = All_Gather, aes(x = factor(Per_month), y = Value,colour = Fin_Year, group = Fin_Year )) +
geom_line(linetype=”dotted”) +
geom_point() +
xlab(“Month”) + ylab(“Persistency”) +
facet_trelliscope(~ Insurer, as_plotly = TRUE,
name = “Persistency Report”,
desc = “Year wise persistency of Indian Insurance Companies”) +
theme_classic()
h5

Look at the results below:

We can see that months are now on x axis and persistency is on y axis. The lines show the data for each financial year. Below x axis you can see the name of insurer.

You can also present the data in 2*2 panel as given below by using ‘Grid’ option given on LHS.

You can also filter the data on name of insurer by using ‘Filter’ option given on left hand side. See below image

We can use various kinds of analysis on our data. By looking at the data we can conclude that for all insurers, as the policy life cycle passes, persistency drastically reduces as we reach 61st month. This trend is visible for all insurers. Also we can see that as years’ pass by 61st month’s persistency is improving for most of the insurers.

But one thing is certain. For almost all insurers 61st month’s persistency is less than 50%. It means that 50% of policies sourced 5 years ago by each insurer are in lapse status thus impacting the profitability of insurers.

You can also include minimum value, mean value etc from LHS menu. See below chart for LIC of India:

With this I come to end of the article. Do share your feed back. You are free to use this code for your analysis.

--

--

Harish Nagpal
Analytics Vidhya

An IT professional, passionate about art in all forms — data, nature, paintings and visual art. Linkedln — https://www.linkedin.com/in/harish-nagpal-8696529/