Healthcare Insurance Analysis

Healthcare costs have been increasing rapidly in recent years, posing significant challenges to individuals, families, and the healthcare system as a whole. Analyzing the factors that affect healthcare costs helps policymakers, insurance providers, and healthcare providers understand what drives those costs and develop strategies to ensure access to quality healthcare services at affordable prices.

About the project

Purpose

This study aims to analyze the factors that affect healthcare costs and provide insights into their impact on insurance charges. The study will explore the relationships between insurance charges and various demographic and health-related factors such as gender, age, BMI, smoking status, geographic location, and the number of children.

Questions to be answered

  1. Are there differences in insurance charges between males and females?

  2. How do age and BMI affect insurance charges?

  3. What is the difference in insurance charges between smokers and non-smokers?

  4. Are there any differences in insurance charges between geographic locations?

  5. How does the number of children affect insurance charges?

  6. What is the most important factor affecting insurance charges?

Objectives

  1. Collect and prepare healthcare insurance data of 1,339 people.

  2. Combine, clean, and process datasets to get ready for analysis.

  3. Analyze the dataset using the Excel Analysis ToolPak and PivotTables.

  4. Share the conclusions of this analysis with policymakers, insurance providers, and healthcare providers to help them understand the drivers of healthcare costs and develop strategies that ensure access to quality healthcare services at affordable prices.

Regression Analysis

The R-squared value is over 75% and the Significance F value is below 0.05, which indicates that the input variables, taken together, have a statistically significant relationship with the output variable, insurance charges. The p-value of each individual variable gives more detail about its significance. The variables “age”, “BMI”, “children”, and “smoker” are statistically significant predictors of charges. “Region” is also significant, though slightly less so. The variable “gender” does not appear to be a significant predictor of charges, as its p-value is much higher than 0.05.
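The study ran its regression in Excel. For readers who want to see what the R-squared figure measures, the following is a minimal Python sketch of an ordinary least squares fit solved through the normal equations. The function names (`solve_linear_system`, `fit_ols`) and every data value are invented for illustration; none of them come from the study's dataset, and the sketch does not compute p-values or Significance F.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] as a copy, so the inputs stay untouched.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(features, target):
    """Return (coefficients, r_squared); coefficients[0] is the intercept."""
    x = [[1.0] + [float(v) for v in row] for row in features]
    k = len(x[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, target)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    predicted = [sum(c * v for c, v in zip(coef, r)) for r in x]
    mean_y = sum(target) / len(target)
    ss_res = sum((t - p) ** 2 for t, p in zip(target, predicted))
    ss_tot = sum((t - mean_y) ** 2 for t in target)
    # R-squared: share of the variance in the target explained by the model.
    return coef, 1.0 - ss_res / ss_tot


# Invented toy rows: (age, BMI, children, smoker as 0/1). NOT the study's data.
features = [
    (19, 24.0, 0, 0), (25, 31.5, 1, 1), (33, 22.8, 2, 0), (41, 36.2, 0, 1),
    (47, 28.4, 3, 0), (52, 33.1, 1, 0), (58, 26.7, 2, 1), (63, 39.0, 0, 0),
]
# Invented charges for the same eight rows.
charges = [13400.0, 36500.0, 17600.0, 42900.0, 22100.0, 24900.0, 43800.0, 28900.0]

coefficients, r_squared = fit_ols(features, charges)
print("intercept, age, bmi, children, smoker:", [round(c, 1) for c in coefficients])
print("R-squared:", round(r_squared, 4))
```

Categorical inputs such as smoking status are encoded here as 0/1 so they can enter the regression as numbers. A multi-level variable such as region would need one such indicator column per level except a baseline.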

Gender is not a significant factor in insurance charges, while age, BMI, and smoking status are the most important factors affecting insurance costs. Non-smokers pay much less than smokers. A plausible explanation is that older people, people with a high BMI, and smokers carry higher health risks, which leads to higher insurance costs. Region also influences insurance costs: people in the Southeast and Northeast pay more than those in other regions. The number of children has only a weak relationship with insurance costs.
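The group comparisons above are the kind of result a PivotTable produces: average charges per category. As a rough illustration, the sketch below does the same grouping in plain Python. The helper `mean_charges_by` and all records are hypothetical and invented for this example; they are not the study's data.

```python
from collections import defaultdict

# Invented records for illustration only; NOT taken from the study's dataset.
records = [
    {"gender": "female", "smoker": "yes", "region": "southeast", "charges": 30000.0},
    {"gender": "male", "smoker": "yes", "region": "northeast", "charges": 34000.0},
    {"gender": "female", "smoker": "no", "region": "other", "charges": 8000.0},
    {"gender": "male", "smoker": "no", "region": "other", "charges": 9000.0},
    {"gender": "female", "smoker": "no", "region": "southeast", "charges": 12000.0},
    {"gender": "male", "smoker": "no", "region": "northeast", "charges": 11000.0},
]


def mean_charges_by(rows, key):
    """Average the 'charges' field for each distinct value of `key`."""
    totals = defaultdict(lambda: [0.0, 0])  # value -> [sum of charges, row count]
    for row in rows:
        bucket = totals[row[key]]
        bucket[0] += row["charges"]
        bucket[1] += 1
    return {value: total / count for value, (total, count) in totals.items()}


by_smoker = mean_charges_by(records, "smoker")
smoker_ratio = by_smoker["yes"] / by_smoker["no"]
print("mean charges by smoking status:", by_smoker)
print("smoker / non-smoker ratio:", round(smoker_ratio, 2))
print("mean charges by region:", mean_charges_by(records, "region"))
```

Passing a different key, such as "gender" or "region", reproduces the other comparisons. Note that a simple group average does not control for the other variables the way the regression does.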

Visualization

Key Findings 

  • R-squared is about 75%: the input variables explain about 75% of the variation in insurance charges.

  • Age, BMI, smoking status, and the number of children are the most critical factors affecting insurance costs. Region also influences insurance costs, but it is not a major factor.

  • Gender is not a significant factor in insurance charges.

  • People with obesity, especially obesity level 2, tend to pay higher insurance costs.

  • Older people pay higher insurance costs than others.

  • Smokers pay roughly 3 times the insurance costs of non-smokers, regardless of gender.

  • People living in the Southeast and the Northeast pay more than people in other regions.
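The obesity finding depends on how BMI values are grouped into categories. The source does not state its cut-offs, so the sketch below assumes the standard WHO adult BMI classes and assumes that "obesity level 2" corresponds to obesity class 2 (BMI from 35 to under 40). The function name `bmi_category` is hypothetical.

```python
def bmi_category(bmi):
    """Map a BMI value to a WHO adult category (assumed grouping, see above)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal weight"
    if bmi < 30:
        return "overweight"
    if bmi < 35:
        return "obesity class 1"
    if bmi < 40:
        return "obesity class 2"
    return "obesity class 3"


# Example BMI values chosen only to show one result per category.
for value in (17.0, 22.0, 27.5, 32.0, 37.0, 42.0):
    print(value, "->", bmi_category(value))
```

Adding such a category column to each record makes it possible to average charges per BMI group in the same way as for smoking status or region.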