A Data-Driven Analysis of Gun Deaths and Policy Effectiveness
1 Problem & Objectives
Firearm mortality remains a pressing public health issue in the United States, with gun-related deaths accounting for a significant portion of homicides, suicides, and accidental fatalities. The complexity of this issue extends beyond crime statistics, encompassing social, economic, and policy-driven factors that influence gun violence trends. This project aims to analyze firearm mortality rates across U.S. states and examine the impact of gun control policies, domestic violence-related firearm deaths, and access to mental health resources. By identifying key contributing factors, this study seeks to provide insights into the effectiveness of existing regulations and potential strategies to mitigate firearm-related deaths. This project will contribute to the broader discourse on public safety and policy effectiveness in reducing gun violence.
Code
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
from matplotlib.colors import LinearSegmentedColormap

# Load dataset
gun_deaths = pd.read_csv("data/merged_data.csv")

# Ensure state names are lowercase for merging
gun_deaths["State"] = gun_deaths["State"].str.lower()

# Define mortality bins and labels
mortality_bins = [0, 10, 15, 20, 25, float("inf")]
mortality_labels = [1, 2, 3, 4, 5]

# Assign states into mortality categories (left-closed bins: [0, 10), [10, 15), ...)
gun_deaths["mortality_score"] = pd.cut(
    gun_deaths["firearm_mortality_by_state_2022"],
    bins=mortality_bins,
    labels=mortality_labels,
    right=False,
).astype(float)

states_map = gpd.read_file("data/cb_2018_us_state_20m/cb_2018_us_state_20m.shp")

# Ensure lowercase for merging
states_map["region"] = states_map["NAME"].str.lower()

# Remove Alaska to prevent distortion of the map
states_map = states_map[states_map["region"] != "alaska"]

# Merge state map with gun deaths data
map_data = states_map.merge(gun_deaths, left_on="region", right_on="State", how="left")

# Define color map (green = low mortality, red = high mortality)
colors = ["#68bb59", "#acdf87", "#fab733", "#ff6242", "#c61a09"]
cmap = LinearSegmentedColormap.from_list("custom_cmap", colors)

# Plot gun deaths by state on a compact figure with a narrow colorbar axis
fig, ax = plt.subplots(figsize=(4.5, 3.5))
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="2%", pad=0.05)

# Plot states
map_data.plot(
    column="mortality_score",
    cmap=cmap,
    linewidth=0.1,
    edgecolor="black",
    ax=ax,
    legend=True,
    cax=cax,
)

# Customize plot
ax.set_title("Firearm Mortality by State (2022)", fontsize=6)
ax.set_xticks([])
ax.set_yticks([])
ax.set_frame_on(False)

# Add custom legend labels
cbar = plt.gcf().axes[-1]  # Get the colorbar axis
tick_labels = ["Very Low", "Low", "Moderate", "High", "Very High"]
cbar.set_yticks([1.4, 2.2, 3.0, 3.8, 4.6])  # Position the ticks
cbar.set_yticklabels(tick_labels, fontsize=4)
cbar.tick_params(labelsize=4)

plt.tight_layout()
plt.show()
As we can see, firearm mortality rates tend to be higher in parts of the South and West, while the Northeast and the far West Coast generally experience much lower rates.
Code
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load data
gun_deaths = pd.read_csv("data/merged_data.csv")

# Define bins and labels
bins = [0, 10, 15, 20, 25, float('inf')]
labels = ["Very Low", "Low", "Moderate", "High", "Very High"]

# Assign bins, ensuring no overlap (excluding the right endpoint)
gun_deaths['mortality_bin'] = pd.cut(
    gun_deaths['firearm_mortality_by_state_2022'],
    bins=bins,
    labels=labels,
    right=False,
)

# Count states in each bin
gun_death_distribution = (
    gun_deaths['mortality_bin'].value_counts().sort_index().reset_index()
)
gun_death_distribution.columns = ['mortality_bin', 'State_Count']

# Define colors
colors = {
    "Very Low": "#68bb59",
    "Low": "#acdf87",
    "Moderate": "#fab733",
    "High": "#ff6242",
    "Very High": "#c61a09",
}

# Plot
title = "Firearm Mortality by State (2022)"
subtitle = "Number of States in Each Mortality Range"

plt.figure(figsize=(3.5, 2.5))  # Compact figure
ax = sns.barplot(
    y='mortality_bin',
    x='State_Count',
    data=gun_death_distribution,
    order=labels,
    palette=colors,
    edgecolor='black',
    linewidth=0.2,  # Thin bar border
)

# Add count labels just to the right of each bar
for index, value in enumerate(gun_death_distribution['State_Count']):
    ax.text(value + 0.2, index, str(value), va='center', ha='left', fontsize=5, color='black')

plt.title(f"{title}\n{subtitle}", fontsize=6, weight='normal')
plt.xlabel("")
plt.ylabel("")
plt.xticks(fontsize=4)
plt.yticks(fontsize=4)
plt.grid(axis='x', linestyle="--", alpha=0.5, linewidth=0.5)  # Thin grid lines
plt.xlim(0, gun_death_distribution['State_Count'].max() + 3)  # Room for the count labels

plt.tight_layout()
plt.show()
A majority of states (30 out of 51, or nearly 60%) fall into the Low or Moderate categories, suggesting that many parts of the country experience mid-range firearm mortality levels.
Only a small number of states (4) face very high firearm mortality, highlighting a concentrated burden in specific regions.
The Very Low mortality group is also small, with just 8 states, mostly concentrated in the Northeast and along the Pacific Coast.
These findings suggest that regional factors, possibly including policy strength, socioeconomic conditions, or cultural differences, may play a role in shaping firearm mortality outcomes.
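The category counts above depend on how boundary values are binned. The following is a minimal sketch of that binning step, assuming the same bin edges and labels as the chart code; the rates in it are made-up placeholders chosen to sit on or near the bin edges, not values from merged_data.csv.

```python
import pandas as pd

# Hypothetical rates (deaths per 100,000), chosen to sit on or near bin edges.
# These are NOT values from merged_data.csv.
rates = pd.Series([4.9, 10.0, 14.9, 15.0, 19.9, 25.0, 29.6], name="rate")

bins = [0, 10, 15, 20, 25, float("inf")]
labels = ["Very Low", "Low", "Moderate", "High", "Very High"]

# right=False makes every interval closed on the left and open on the right:
# [0, 10), [10, 15), [15, 20), [20, 25), [25, inf)
categories = pd.cut(rates, bins=bins, labels=labels, right=False)

# Count values per category (empty categories are kept) and convert to shares
counts = categories.value_counts().sort_index()
shares = (counts / counts.sum() * 100).round(1)

summary = pd.DataFrame({"states": counts, "share_pct": shares})
print(summary)
```

With left-closed bins, a rate of exactly 10.0 lands in Low rather than Very Low, and a rate of exactly 25.0 lands in Very High. Applying the same share calculation to the real counts is how a figure such as "30 out of 51" translates into a percentage.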
Code
# Create map for gun policy strength scores
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
from matplotlib.colors import LinearSegmentedColormap

# Load dataset (same as before)
gun_deaths = pd.read_csv("data/merged_data.csv")
gun_deaths["State"] = gun_deaths["State"].str.lower()

# Load U.S. state shapefile (without Alaska)
states_map = gpd.read_file("data/cb_2018_us_state_20m/cb_2018_us_state_20m.shp")
states_map["region"] = states_map["NAME"].str.lower()
states_map = states_map[states_map["region"] != "alaska"]  # Remove Alaska

# Merge state map with gun deaths data
map_data = states_map.merge(gun_deaths, left_on="region", right_on="State", how="left")

# Define policy strength bins
policy_bins = [0, 20, 40, 60, 80, 100]
policy_labels = [1, 2, 3, 4, 5]

# Assign states into policy categories.
# include_lowest=True keeps a score of exactly 0 in the first bin;
# with right=True alone, the first interval (0, 20] would exclude it.
map_data["policy_score"] = pd.cut(
    map_data["gun_policy_strength"],
    bins=policy_bins,
    labels=policy_labels,
    right=True,
    include_lowest=True,
).astype(float)

# Define color map (reversed from mortality map to show stronger policies as better)
policy_colors = ["#c61a09", "#ff6242", "#fab733", "#acdf87", "#68bb59"]
policy_cmap = LinearSegmentedColormap.from_list("policy_cmap", policy_colors)

# Plot gun policy strength by state on a compact figure with a narrow colorbar axis
fig, ax = plt.subplots(figsize=(4.5, 3.5))
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="2%", pad=0.05)

# Plot states; regions without data are drawn in light gray
map_data.plot(
    column="policy_score",
    cmap=policy_cmap,
    linewidth=0.1,
    edgecolor="black",
    ax=ax,
    legend=True,
    cax=cax,
    missing_kwds={"color": "lightgray"},
)

# Customize plot
ax.set_title("Gun Policy Strength by State (0-100 Scale)", fontsize=6)
ax.set_xticks([])
ax.set_yticks([])
ax.set_frame_on(False)

# Add custom legend labels
cbar = plt.gcf().axes[-1]  # Get the colorbar axis
policy_tick_labels = ["Very Weak", "Weak", "Moderate", "Strong", "Very Strong"]
cbar.set_yticks([1.4, 2.2, 3.0, 3.8, 4.6])  # Position the ticks
cbar.set_yticklabels(policy_tick_labels, fontsize=4)
cbar.tick_params(labelsize=4)

plt.tight_layout()
plt.show()
Interestingly, a large number of states have very weak gun policy strength scores, which is somewhat surprising given the public health risks associated with firearm mortality. This suggests a potential disconnect between the severity of firearm-related outcomes and the strength of preventive legislation in many regions.
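One way to probe this apparent disconnect is to summarize mortality within each policy tier and compute a rank correlation between the two measures. The sketch below assumes the column names used in the code above (gun_policy_strength and firearm_mortality_by_state_2022); the six rows are invented purely to make the example runnable and say nothing about the real data.

```python
import pandas as pd

# Hypothetical values for illustration only; NOT taken from merged_data.csv.
df = pd.DataFrame({
    "State": ["a", "b", "c", "d", "e", "f"],
    "gun_policy_strength": [5.0, 12.0, 18.0, 45.0, 70.0, 88.0],
    "firearm_mortality_by_state_2022": [28.0, 22.0, 24.0, 14.0, 9.0, 5.0],
})

policy_bins = [0, 20, 40, 60, 80, 100]
policy_labels = ["Very Weak", "Weak", "Moderate", "Strong", "Very Strong"]

# Same tiers as the policy map; include_lowest keeps a score of 0 in the first tier
df["policy_tier"] = pd.cut(
    df["gun_policy_strength"],
    bins=policy_bins,
    labels=policy_labels,
    include_lowest=True,
)

# Number of states and mean mortality rate within each policy tier
tier_summary = (
    df.groupby("policy_tier", observed=False)["firearm_mortality_by_state_2022"]
    .agg(["count", "mean"])
)

# Spearman rank correlation, computed as the Pearson correlation of the ranks
rank_corr = df["gun_policy_strength"].rank().corr(
    df["firearm_mortality_by_state_2022"].rank()
)

print(tier_summary)
print(f"Rank correlation: {rank_corr:.3f}")
```

A strongly negative rank correlation on the real data would be consistent with stronger policies accompanying lower mortality, although a state-level correlation alone cannot establish that the policies cause the difference.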
2 Motivation and Case Studies
Gun violence remains a critical public health issue in the U.S., with over 48,000 deaths in 2022, a rate of about one every 11 minutes. The majority of these deaths are from suicide (55%) and homicide (41%), with additional cases stemming from unintentional injuries and police shootings (Johns Hopkins Center for Gun Violence Solutions, 2023). Over 200 people are treated daily for nonfatal firearm injuries.
Geographic Disparities: Gun deaths are concentrated in the South and Mountain West, where gun laws are weaker, and are lowest in the Northeast, where regulations are stricter (Johns Hopkins Center for Gun Violence Solutions, 2023).
States with weaker laws face significantly higher firearm mortality, and guns trafficked out of these lenient states also undermine safety in states with stricter laws, a phenomenon seen in Illinois and Maryland (Everytown for Gun Safety Support Fund, 2025).
The “iron pipeline” continues to funnel guns from states without background checks into more regulated regions, undermining local safety efforts (Everytown for Gun Safety Support Fund, 2025).
Conclusion
Gun policy strength is a critical factor in reducing firearm violence. States with stronger legislation, especially around storage, background checks, and public carry, see lower mortality rates, while states with weaker laws face higher rates and produce spillover effects in neighboring regions.
3 Research Questions
Data Science Question: How do social, economic, and policy-related factors influence firearm mortality rates across U.S. states?
Subquestions:
Continuous Predictors (Linear Regression):
To what extent do continuous variables, such as incarceration rate, teen birth rate, and educational attainment, predict variation in state-level firearm mortality?
Are certain socioeconomic indicators (e.g., poverty, unemployment, alcohol-related death rates) more strongly associated with higher firearm mortality?
Binary Predictors (Logistic Regression):
How does the presence or absence of specific gun control policies (e.g., assault weapon bans, background checks, domestic violence restrictions) relate to the likelihood of a state falling into a high vs. low mortality category?
Which policies appear most protective when controlling for other state-level characteristics?
Modeling Approach:
How do different linear regression models compare with each other in terms of performance?
How do different logistic regression models compare with each other in terms of performance?
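As an illustration of how the linear model comparison could be set up, the sketch below scores nested linear regression models by k-fold cross-validated RMSE using only NumPy. Everything in it is an assumption made for illustration: the data are synthetic, the three predictor columns are unnamed placeholders, and cv_rmse is a hypothetical helper, not the project's actual modeling code.

```python
import numpy as np

# Synthetic stand-in data: 50 "states", 3 placeholder predictors.
# Only the first predictor truly affects the outcome in this toy setup.
rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 3))
y = 12.0 + 2.0 * X[:, 0] + rng.normal(scale=1.0, size=n)


def cv_rmse(X, y, k=5, seed=0):
    """Hypothetical helper: k-fold cross-validated RMSE of an OLS fit with intercept."""
    order = np.random.default_rng(seed).permutation(len(y))  # shuffle rows
    folds = np.array_split(order, k)
    sq_errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(order, test_idx)
        # Prepend a column of ones so the model includes an intercept
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        sq_errors.append((y[test_idx] - A_test @ coef) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(sq_errors))))


rmse_baseline = cv_rmse(X[:, :0], y)  # intercept-only model
rmse_one = cv_rmse(X[:, :1], y)       # first predictor only
rmse_all = cv_rmse(X, y)              # all three predictors

print(f"baseline: {rmse_baseline:.2f}, one predictor: {rmse_one:.2f}, all: {rmse_all:.2f}")
```

Scoring on held-out folds rather than on the training data matters here because, with roughly 50 observations, adding predictors always improves in-sample fit even when they carry no signal. The same fold structure could be reused for the logistic models with a classification metric such as accuracy or log loss in place of RMSE.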