Building a Social Determinants of Health (SDOH) dataset using Fidap

Ashish Singal
May 5, 2022

The health of any given population lies at the intersection of various factors. Can we triangulate some of these factors and coalesce them into a proxy measure?

In some of our previous articles, we talked about how you can harness Fidap and its datasets to power your analytics projects. Today, we extend that analysis to community health.

What are Social Determinants of Health

Healthcare providers and researchers have known for a while that the health outcomes of any given population are affected by a variety of socio-economic and environmental factors.

The WHO defines social determinants of health (SDOH) as “the non-medical factors that influence health outcomes. They are the conditions in which people are born, grow, work, live, and age, and the wider set of forces and systems shaping the conditions of daily life.” There is a lot of research from the US Department of Health and Human Services and the CDC that dives deeper into this subject.

Factors impacting social determinants of health

Here are some of the factors that fall under SDOH that we will take a closer look at:

  1. Demographic information — family structure
  2. Education
  3. Food deserts
  4. Crime
  5. Infrastructure
  6. Climate and the environment

With this in mind, our challenge today is to pull together a set of indicators using data hosted on Fidap that can approximate these determinants of health at the zip code level for the city of Chicago, IL.

Zip Codes

First, we need to identify all of Chicago’s zip codes. We are lucky in that they all start with 606. While we are at it, we also calculate the area each one covers.

In order to do this, we’ll use the Geo Boundaries dataset, made available by Google BigQuery Public Datasets and indexed by Fidap.

Specifically, we are interested in the zip code table.

The zip code boundaries table on Fidap
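
The query used for this step is not reproduced here. As a rough sketch, the snippet below builds a BigQuery SQL string that selects the zip codes with the 606 prefix and computes the area of each polygon. The table `bigquery-public-data.geo_us_boundaries.zip_codes` and its `zip_code` and `zip_code_geom` columns are assumptions based on the BigQuery public dataset rather than something stated in this article, and `build_zip_query` is a hypothetical helper.

```python
def build_zip_query(prefix: str = "606") -> str:
    """Return a BigQuery SQL string listing zip codes that start with `prefix`.

    Assumed (not from the article): the table name and the `zip_code` and
    `zip_code_geom` columns of the BigQuery public geo boundaries dataset.
    """
    if not prefix.isdigit():
        raise ValueError("prefix must contain digits only")
    return f"""
SELECT
  zip_code,
  zip_code_geom,
  ST_AREA(zip_code_geom) / 1e6 AS area_sq_km  -- ST_AREA returns square meters
FROM `bigquery-public-data.geo_us_boundaries.zip_codes`
WHERE zip_code LIKE '{prefix}%'
ORDER BY zip_code
""".strip()


query = build_zip_query()
print(query)
```

The resulting string can be pasted into a query editor or passed to whichever SQL client you use. Keeping the geometry column in the output is a deliberate choice, since the later steps need the boundaries to place points inside zip codes.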

Demography — Family

No socio-economic analysis is complete without some mention of underlying demographic variables such as population count by age, race, gender, and other household indicators. We will run three different queries, centered on age, race, and household family structure in each zip code.

To do this, we can grab the ACS dataset from the Census Bureau, indexed on Fidap (from BigQuery Public Datasets).

We are most interested in the 2018 data by zip code (ZCTA).
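
The original queries are not included in this text, so the following is only a sketch of what one of them might look like. It assumes the BigQuery public table `bigquery-public-data.census_bureau_acs.zcta5_2018_5yr`, that its `geo_id` column holds the five-digit ZCTA, and that population columns such as `total_pop` and `white_pop` exist. Check all of these against the table schema before relying on them. `build_acs_query` is a hypothetical helper.

```python
# Assumed table name for the 2018 five-year ACS estimates by ZCTA.
ACS_TABLE = "bigquery-public-data.census_bureau_acs.zcta5_2018_5yr"


def build_acs_query(columns, prefix="606"):
    """Build a SQL string selecting `columns` for ZCTAs starting with `prefix`."""
    for name in columns:
        # Guard against anything that is not a plain column identifier.
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unexpected column name: {name!r}")
    select_list = ",\n  ".join(["geo_id AS zip_code", *columns])
    return (
        f"SELECT\n  {select_list}\n"
        f"FROM `{ACS_TABLE}`\n"
        f"WHERE geo_id LIKE '{prefix}%'\n"
        "ORDER BY geo_id"
    )


# Hypothetical column picks for the race breakdown; verify them in the schema.
race_query = build_acs_query(
    ["total_pop", "white_pop", "black_pop", "asian_pop", "hispanic_pop"]
)
print(race_query)
```

Calling the helper with a different column list gives the age and household queries without repeating the boilerplate.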

A breakdown of the population by age group helps us identify areas with a high concentration of economic dependents. Race can reveal entrenched issues such as housing segregation. Finally, family and household structure tells us more about the stability of households and, through food stamp eligibility, serves as a proxy for the incidence of poverty.
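
To make the idea of "economic dependents" concrete, here is a minimal sketch that turns age-group counts into a share per zip code. Treating residents under 18 or aged 65 and over as dependents is our assumption for illustration; the article does not define the cutoffs. The function name and the numbers are made up.

```python
def dependent_share(under_18: int, age_65_plus: int, total_pop: int) -> float:
    """Share of residents who are under 18 or aged 65 and over.

    The age cutoffs are an illustrative assumption, not the article's definition.
    """
    if total_pop <= 0:
        raise ValueError("total_pop must be positive")
    return (under_18 + age_65_plus) / total_pop


# Made-up counts for a single zip code.
share = dependent_share(under_18=2_000, age_65_plus=1_000, total_pop=10_000)
print(f"{share:.0%}")
```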


Education

Education is an important determinant of public health. All else being equal, a more educated population is likely to be healthier. With this in mind, we will look at the educational attainment of the adult population in each zip code.

Again, we can use the same ACS data as above, this time grabbing the columns relevant to educational attainment.

Food Deserts

Food deserts are a real problem in America’s cities. In such areas, communities lack access to sources of fresh food. This forces them to rely on less desirable sources of nutrition, which can adversely affect their health outcomes. Unlike the previous categories, where we mainly relied on Census data, this is not a data point the US Census Bureau records. However, we can still count the grocery stores and supermarkets within each zip code and scale that count to the zip code’s population.

In order to do this, we will look at the OpenStreetMap dataset indexed by Fidap.

Fidap’s OpenStreetMap dataset

We also use some of BigQuery’s specialized GIS functions.

Of course, we have to do a bit more work with that table. With a bit of Python and the demographic tables from previous queries, we are able to count the number of supermarkets per 10,000 residents in each zip code.
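
The scaling step can be sketched in a few lines of plain Python. The function name, the dictionary inputs, and the numbers below are illustrative assumptions, not the notebook's actual code or data.

```python
def stores_per_10k(store_counts, populations):
    """Return supermarkets per 10,000 residents for every zip code in `populations`.

    Zip codes with no entry in `store_counts` are treated as having zero stores.
    Zip codes with no residents get None, because the rate is undefined there.
    """
    rates = {}
    for zip_code, population in populations.items():
        if population <= 0:
            rates[zip_code] = None
            continue
        rates[zip_code] = store_counts.get(zip_code, 0) * 10_000 / population
    return rates


# Made-up inputs for illustration only.
store_counts = {"60601": 3}
populations = {"60601": 15_000, "60602": 1_200, "60603": 0}

rates = stores_per_10k(store_counts, populations)
print(rates)
```

Looping over the population table rather than the store counts keeps zip codes with no supermarkets in the result, and those are exactly the ones a food desert analysis cares about.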


Crime

We are also interested in the prevalence of crime in each zip code. We can pull data from the Chicago Police Department’s open data source, which is connected to Fidap, overlay the zip code boundaries, and count the number of crimes logged within each zip code.

Fidap’s Chicago Crime dataset
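
One way to express the overlay is a spatial join in BigQuery, sketched below as a SQL string built in Python. The table names `bigquery-public-data.chicago_crime.crime` and `bigquery-public-data.geo_us_boundaries.zip_codes`, along with the `latitude`, `longitude`, `year`, `zip_code`, and `zip_code_geom` columns, are assumptions based on the BigQuery public datasets. `crime_by_zip_query` is a hypothetical helper, and the year in the example is arbitrary.

```python
from typing import Optional


def crime_by_zip_query(year: Optional[int] = None) -> str:
    """Build a SQL string that counts crimes inside each Chicago zip code polygon.

    Table and column names are assumptions, not taken from the article.
    """
    year_filter = f"\n  AND c.year = {int(year)}" if year is not None else ""
    return f"""
SELECT
  z.zip_code,
  COUNT(*) AS crime_count
FROM `bigquery-public-data.chicago_crime.crime` AS c
JOIN `bigquery-public-data.geo_us_boundaries.zip_codes` AS z
  -- ST_GEOGPOINT takes longitude first, then latitude
  ON ST_CONTAINS(z.zip_code_geom, ST_GEOGPOINT(c.longitude, c.latitude))
WHERE z.zip_code LIKE '606%'{year_filter}
GROUP BY z.zip_code
ORDER BY crime_count DESC
""".strip()


print(crime_by_zip_query(2021))
```

Rows without coordinates drop out of the join on their own, because a NULL point never satisfies `ST_CONTAINS`.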


Infrastructure

We aren’t done yet!

Here we look at how accessible each zip code is in terms of transportation infrastructure, which we define as the number of public transit stops within each zip code.

We are also interested in the quality of the housing stock, approximated by the median age of structures. Historical appeal aside, we believe that older buildings tend to be more hazardous because of the prevalence of lead pipes and asbestos. Remember Flint, MI and its lead-contaminated water? To find out how we pulled this from the Census dataset, check out our notebook on Fidap.

Climate and the Environment

Environmental factors clearly play a part in determining the health of residents. Here, we look at Chicago’s air quality as measured by PM2.5 levels. PM2.5 refers to particulate matter in the air that is 2.5 microns or smaller in diameter. These particles are small enough to travel deep into the lungs and enter the bloodstream, so prolonged exposure is a health risk. The EPA has two measuring stations in Chicago that provide such readings. Each zip code takes the PM2.5 reading of its closest measuring station. Again, you can check this out in our notebook.
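
The nearest-station assignment can be sketched with a great-circle (haversine) distance. This is an illustration, not the notebook's code: the station coordinates and readings are made up, and using one representative point per zip code (such as its centroid) is our assumption.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_station_reading(point, stations):
    """Return the PM2.5 reading of the station closest to `point` (lat, lon)."""
    lat, lon = point
    closest = min(stations, key=lambda s: haversine_km(lat, lon, s["lat"], s["lon"]))
    return closest["pm25"]


# Made-up stations and readings for illustration only.
stations = [
    {"name": "north", "lat": 41.95, "lon": -87.65, "pm25": 9.1},
    {"name": "south", "lat": 41.75, "lon": -87.60, "pm25": 10.4},
]

reading = nearest_station_reading((41.88, -87.63), stations)
print(reading)
```

With only two stations, many zip codes share the same reading, so this indicator is coarse by construction.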

We can also count the number of severe storms within Chicago: obtain the centroid of each incident, group the centroids by zip code, and then count the incidents in each group.

Final Assembly

By now, we have a notebook (and dataset) describing the socio-economic and environmental conditions of each zip code in Chicago that we believe affect health outcomes. It is by no means exhaustive, but we believe it is a start that can help healthcare providers, medical practitioners, and policymakers understand the complex interplay between socio-economic conditions and community health.
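
As a sketch of the assembly step, the per-indicator results can be merged into one row per zip code. The function name, the indicator names, and the values below are hypothetical. The article's notebook may well use a dataframe library instead of plain dictionaries.

```python
def assemble(indicators):
    """Merge {indicator_name: {zip_code: value}} into one row per zip code.

    A zip code missing from an indicator gets None for that column, so gaps
    stay visible instead of silently dropping the row.
    """
    zip_codes = sorted({z for table in indicators.values() for z in table})
    return [
        {"zip_code": z, **{name: table.get(z) for name, table in indicators.items()}}
        for z in zip_codes
    ]


# Made-up indicator values for illustration only.
indicators = {
    "stores_per_10k": {"60601": 2.0, "60602": 0.0},
    "crime_count": {"60601": 120},
}

rows = assemble(indicators)
for row in rows:
    print(row)
```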

Our challenge at Fidap is for you to do the same for the city you live in using data hosted on our platform!

We did it for Chicago, IL. Can you do it for another city?

Ashish Singal
Ash is the founder / CEO of Fidap. Previously, he was at Google and Bloomberg. He loves chocolate, puppies, and clean data.
