censusapi is a wrapper for the United States Census Bureau’s APIs. As of 2017, over 200 Census API endpoints are available, including Decennial Census, American Community Survey, Poverty Statistics, and Population Estimates APIs. This package is designed to let you get data from all of those APIs using the same main function, getCensus, and the same syntax for each dataset.
censusapi generally uses the APIs’ original parameter names so that users can easily transition between Census’s documentation and examples and this package. It also includes metadata functions to return data frames of available APIs, variables, and geographies.
To use the Census APIs, sign up for an API key. Then, if you’re on a non-shared computer, add your Census API key to your .Renviron profile and call it CENSUS_KEY. censusapi will use it by default without any extra work on your part. Within R, run:
# Set the key for the current session (replace YOURKEYHERE with your own key)
Sys.setenv(CENSUS_KEY = "YOURKEYHERE")
# Reload .Renviron
readRenviron("~/.Renviron")
# Check to see that the expected key is output in your R console
Sys.getenv("CENSUS_KEY")
In some instances you might not want to put your key in your .Renviron, for example if you’re on a shared school computer. You can always choose to specify your key within getCensus instead.
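As a minimal sketch of that second option: getCensus accepts a key argument, so the key can be passed explicitly per call. The helper census_key below is a hypothetical name, not part of censusapi; it only reads the environment variable and fails early with a readable message when no key is set.

```r
# Hypothetical helper (not part of censusapi): return the key stored in an
# environment variable, or stop with a clear message if it is missing.
census_key <- function(envvar = "CENSUS_KEY") {
  key <- Sys.getenv(envvar, unset = "")
  if (!nzchar(key)) {
    stop("No Census API key found in ", envvar, call. = FALSE)
  }
  key
}

# Usage (needs network access and a valid key, so it is left commented out):
# getCensus(name = "timeseries/healthins/sahie",
#           vars = c("NAME", "PCTUI_PT"),
#           region = "us:*",
#           time = 2016,
#           key = census_key())
```

On a shared machine you could instead pass the key as a literal string to key and avoid storing it anywhere on disk.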
To get started, load the censusapi library.
library(censusapi)
The Census APIs have over 200 endpoints, covering dozens of different datasets.
To see a current table of every available endpoint, run listCensusApis:
apis <- listCensusApis()
View(apis)
This returns useful information about each endpoint, including name, which you’ll need to make your API call.
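With over 200 rows, it is often easier to search the table than to scroll through it. The sketch below filters on a keyword. It uses a small stand-in data frame so that it runs offline; the rows are illustrative, and the column names title, name, and vintage are assumptions that you should confirm with colnames(apis) on the real result.

```r
# Stand-in for the result of listCensusApis(). Rows are illustrative and the
# column names are assumptions; check colnames(apis) on the real data frame.
apis <- data.frame(
  title = c("Small Area Health Insurance Estimates",
            "American Community Survey 5-Year Data",
            "Decennial Census Summary File 1"),
  name = c("timeseries/healthins/sahie", "acs/acs5", "dec/sf1"),
  vintage = c(NA, 2016, 2010),
  stringsAsFactors = FALSE
)

# Case-insensitive keyword search over the endpoint titles
health_apis <- apis[grepl("health insurance", apis$title, ignore.case = TRUE), ]

# The name column holds the value to pass to getCensus(name = ...)
health_apis$name
```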
getCensus
The main function in censusapi is getCensus, which makes an API call to a given Census API and returns a data frame of results. Each API has slightly different parameters, but there are always a few required arguments:
- name: the name of the API as defined by the Census, like “acs5” or “timeseries/bds/firms”
- vintage: the dataset year, generally required for non-timeseries APIs
- vars: the list of variable names to get
- region: the geography level to return, like state or county

Some APIs have additional required or optional arguments, like time, monthly, or period. Check the specific documentation for your API to see what options are allowed.
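To make the relationship between these arguments and the Census documentation more concrete, here is a rough sketch of how they correspond to the query parameters of a Census API request (get, for, in, time). The function build_census_url is a hypothetical illustration, not censusapi’s internal code; it leaves out the key parameter and does no URL encoding.

```r
# Illustrative only: map getCensus()-style arguments onto a Census API URL.
# build_census_url is a hypothetical name, not a censusapi function.
build_census_url <- function(name, vars, region, vintage = NULL,
                             regionin = NULL, time = NULL) {
  base <- "https://api.census.gov/data"
  # Non-timeseries endpoints place the vintage before the endpoint name
  path <- if (is.null(vintage)) name else paste(vintage, name, sep = "/")
  # NULL entries are dropped by c(), so unused parameters disappear
  query <- c(
    get = paste(vars, collapse = ","),  # vars     -> get
    `for` = region,                     # region   -> for
    `in` = regionin,                    # regionin -> in
    time = time
  )
  paste0(base, "/", path, "?",
         paste0(names(query), "=", query, collapse = "&"))
}

build_census_url(name = "timeseries/healthins/sahie",
                 vars = c("NAME", "PCTUI_PT"),
                 region = "us:*",
                 time = 2016)
```

Comparing a URL like this with the examples in the Census documentation is a quick way to see which getCensus argument a documented parameter belongs to.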
Let’s walk through an example getting uninsured rates by income group using the Small Area Health Insurance Estimates API, which provides detailed annual state-level and county-level estimates of health insurance rates.
censusapi includes a metadata function called listCensusMetadata to get information about an API’s variable options and geography options. Let’s see what variables are available in the SAHIE API:
sahie_vars <- listCensusMetadata(name = "timeseries/healthins/sahie",
    type = "variables")
head(sahie_vars)
name | label | concept | predicateType | group | limit | required |
---|---|---|---|---|---|---|
AGE_DESC | Age Category Description | Demographic ID | int | N/A | 0 | NA |
NUI_LB90 | Number Uninsured, Lower Bound for 90% Confidence Interval | Uncertainty Measure | int | N/A | 0 | NA |
STATE | State FIPS Code | Geographic ID | int | N/A | 0 | NA |
NIC_MOE | Number Insured, Margin of Error | Uncertainty Measure | int | N/A | 0 | NA |
NIPR_PT | Number in Demographic Group for Selected Income Range, Estimate | Estimate | int | N/A | 0 | NA |
RACECAT | Race Category | Demographic ID | int | N/A | 0 | default displayed |
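Because head only shows the first few rows, a keyword search over the label column is a convenient way to find relevant variables. The sketch below rebuilds a few rows from the table above so that it runs without an API call; with real data you would filter sahie_vars directly.

```r
# A few rows copied from the listCensusMetadata() output shown above
sahie_vars <- data.frame(
  name = c("AGE_DESC", "NUI_LB90", "NIC_MOE", "NIPR_PT"),
  label = c("Age Category Description",
            "Number Uninsured, Lower Bound for 90% Confidence Interval",
            "Number Insured, Margin of Error",
            "Number in Demographic Group for Selected Income Range, Estimate"),
  stringsAsFactors = FALSE
)

# Keep only variables whose label mentions "uninsured" (case-insensitive)
uninsured_vars <- sahie_vars[grepl("uninsured", sahie_vars$label,
                                   ignore.case = TRUE),
                             c("name", "label")]
uninsured_vars$name
```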
We’ll use a few of these variables to get uninsured rates by income group:
- IPRCAT: Income Poverty Ratio Category
- IPR_DESC: Income Poverty Ratio Category Description
- PCTUI_PT: Percent Uninsured in Demographic Group for Selected Income Range, Estimate
- NAME: Name of the geography returned (e.g. state or county name)

We can also use listCensusMetadata to see which geographic levels we can get data for using the SAHIE API.
listCensusMetadata(name = "timeseries/healthins/sahie",
    type = "geography")
name | geoLevelId | referenceDate | requires | wildcard | optionalWithWCFor |
---|---|---|---|---|---|
us | 010 | 2015-01-01 | NULL | NULL | NA |
county | 050 | 2015-01-01 | state | state | state |
state | 040 | 2015-01-01 | NULL | NULL | NA |
This API has three geographic levels: us, county within states, and state.
First, using getCensus, let’s get the uninsured rate by income group at the national level for 2016.
getCensus(name = "timeseries/healthins/sahie",
    vars = c("NAME", "IPRCAT", "IPR_DESC", "PCTUI_PT"),
    region = "us:*",
    time = 2016)
time | us | NAME | IPRCAT | IPR_DESC | PCTUI_PT |
---|---|---|---|---|---|
2016 | 1 | United States | 0 | All Incomes | 10.0 |
2016 | 1 | United States | 1 | <= 200% of Poverty | 17.0 |
2016 | 1 | United States | 2 | <= 250% of Poverty | 16.3 |
2016 | 1 | United States | 3 | <= 138% of Poverty | 17.4 |
2016 | 1 | United States | 4 | <= 400% of Poverty | 14.0 |
2016 | 1 | United States | 5 | 138% to 400% of Poverty | 12.1 |
We can also get this data at the state level for every state by changing region to "state:*":
sahie_states <- getCensus(name = "timeseries/healthins/sahie",
    vars = c("NAME", "IPRCAT", "IPR_DESC", "PCTUI_PT"),
    region = "state:*",
    time = 2016)
head(sahie_states)
time | state | NAME | IPRCAT | IPR_DESC | PCTUI_PT |
---|---|---|---|---|---|
2016 | 01 | Alabama | 0 | All Incomes | 10.8 |
2016 | 01 | Alabama | 1 | <= 200% of Poverty | 18.1 |
2016 | 01 | Alabama | 2 | <= 250% of Poverty | 17.0 |
2016 | 01 | Alabama | 3 | <= 138% of Poverty | 19.2 |
2016 | 01 | Alabama | 4 | <= 400% of Poverty | 14.2 |
2016 | 01 | Alabama | 5 | 138% to 400% of Poverty | 11.0 |
Finally, we can get county-level data. The geography metadata showed that we can choose to get county-level data within states. We’ll use region to specify county-level results and regionin to request data for Alabama and Alaska.
sahie_counties <- getCensus(name = "timeseries/healthins/sahie",
    vars = c("NAME", "IPRCAT", "IPR_DESC", "PCTUI_PT"),
    region = "county:*",
    regionin = "state:1,2",
    time = 2016)
head(sahie_counties, n=12L)
time | state | county | NAME | IPRCAT | IPR_DESC | PCTUI_PT |
---|---|---|---|---|---|---|
2016 | 01 | 001 | Autauga County, AL | 0 | All Incomes | 8.5 |
2016 | 01 | 001 | Autauga County, AL | 1 | <= 200% of Poverty | 15.9 |
2016 | 01 | 001 | Autauga County, AL | 2 | <= 250% of Poverty | 14.7 |
2016 | 01 | 001 | Autauga County, AL | 3 | <= 138% of Poverty | 17.2 |
2016 | 01 | 001 | Autauga County, AL | 4 | <= 400% of Poverty | 11.5 |
2016 | 01 | 001 | Autauga County, AL | 5 | 138% to 400% of Poverty | 9.0 |
2016 | 01 | 003 | Baldwin County, AL | 0 | All Incomes | 10.7 |
2016 | 01 | 003 | Baldwin County, AL | 1 | <= 200% of Poverty | 20.0 |
2016 | 01 | 003 | Baldwin County, AL | 2 | <= 250% of Poverty | 18.4 |
2016 | 01 | 003 | Baldwin County, AL | 3 | <= 138% of Poverty | 21.2 |
2016 | 01 | 003 | Baldwin County, AL | 4 | <= 400% of Poverty | 14.9 |
2016 | 01 | 003 | Baldwin County, AL | 5 | 138% to 400% of Poverty | 11.8 |
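County results come back with separate state and county code columns. If you plan to join them to other county data, a combined five-digit FIPS code is often useful. The sketch below assumes both columns are character strings with leading zeros, as in the table above, and uses a small stand-in data frame rather than a live API call.

```r
# Stand-in rows based on the county-level output shown above
sahie_counties <- data.frame(
  state = c("01", "01"),
  county = c("001", "003"),
  NAME = c("Autauga County, AL", "Baldwin County, AL"),
  stringsAsFactors = FALSE
)

# Concatenate the 2-digit state code and 3-digit county code
sahie_counties$fips <- paste0(sahie_counties$state, sahie_counties$county)
sahie_counties$fips
```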
Because the SAHIE API is a timeseries (as indicated in its name), we can get multiple years of data at once using the time argument.
sahie_years <- getCensus(name = "timeseries/healthins/sahie",
    vars = c("NAME", "PCTUI_PT"),
    region = "state:1",
    time = "from 2006 to 2016")
head(sahie_years)
time | state | NAME | PCTUI_PT |
---|---|---|---|
2006 | 01 | Alabama | 15.7 |
2007 | 01 | Alabama | 14.6 |
2008 | 01 | Alabama | 15.3 |
2009 | 01 | Alabama | 15.8 |
2010 | 01 | Alabama | 16.9 |
2011 | 01 | Alabama | 16.6 |
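With several years in one data frame, year-over-year changes are straightforward to compute. This sketch uses the six Alabama values shown above as stand-in data and converts PCTUI_PT with as.numeric in case the column is returned as text; whether that conversion is needed depends on your censusapi version, so treat it as a precaution.

```r
# Stand-in data: the Alabama rows from the output above
sahie_years <- data.frame(
  time = 2006:2011,
  PCTUI_PT = c(15.7, 14.6, 15.3, 15.8, 16.9, 16.6)
)

# Sort by year, then take the difference from the previous year
sahie_years <- sahie_years[order(sahie_years$time), ]
sahie_years$change <- c(NA, diff(as.numeric(sahie_years$PCTUI_PT)))
sahie_years
```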
The American Community Survey (ACS) APIs include estimates (variable names ending in “E”), annotations, margins of error, and statistical significance, depending on the data set. Read more on ACS variable types and annotation symbol meanings on the Census website.
You can retrieve these annotation variables manually, by specifying a list of variables. We’ll get the estimate, margin of error and annotations for median household income in the past 12 months for Census tracts in Alaska.
acs_income <- getCensus(name = "acs/acs5",
    vintage = 2016,
    vars = c("NAME", "B19013_001E", "B19013_001EA", "B19013_001M", "B19013_001MA"),
    region = "tract:*",
    regionin = "state:02")
head(acs_income)
state | county | tract | NAME | B19013_001E | B19013_001EA | B19013_001M | B19013_001MA |
---|---|---|---|---|---|---|---|
02 | 013 | 000100 | Census Tract 1, Aleutians East Borough, Alaska | 65926 | NA | 2430 | NA |
02 | 016 | 000100 | Census Tract 1, Aleutians West Census Area, Alaska | 59167 | NA | 4680 | NA |
02 | 016 | 000200 | Census Tract 2, Aleutians West Census Area, Alaska | 92083 | NA | 4791 | NA |
02 | 020 | 000101 | Census Tract 1.01, Anchorage Municipality, Alaska | 101420 | NA | 15802 | NA |
02 | 020 | 000102 | Census Tract 1.02, Anchorage Municipality, Alaska | 76690 | NA | 14441 | NA |
02 | 020 | 000201 | Census Tract 2.01, Anchorage Municipality, Alaska | 93636 | NA | 17769 | NA |
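The estimate and margin of error columns can be combined into a confidence interval. ACS margins of error are published at the 90 percent confidence level, so the interval is the estimate plus or minus the margin, and dividing the margin by 1.645 gives an approximate standard error. The sketch below uses two rows from the table above as stand-in data and assumes both columns are numeric.

```r
# Stand-in rows copied from the output above
acs_income <- data.frame(
  NAME = c("Census Tract 1, Aleutians East Borough, Alaska",
           "Census Tract 1, Aleutians West Census Area, Alaska"),
  B19013_001E = c(65926, 59167),
  B19013_001M = c(2430, 4680),
  stringsAsFactors = FALSE
)

# 90 percent confidence interval: estimate +/- margin of error
acs_income$lower90 <- acs_income$B19013_001E - acs_income$B19013_001M
acs_income$upper90 <- acs_income$B19013_001E + acs_income$B19013_001M

# Approximate standard error (z = 1.645 for a 90 percent level)
acs_income$se <- acs_income$B19013_001M / 1.645

acs_income[, c("NAME", "lower90", "upper90")]
```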
You can also retrieve estimates and annotations for a group of variables in one command. Here’s the group call for that same table, B19013.
acs_income_group <- getCensus(name = "acs/acs5",
    vintage = 2016,
    vars = c("NAME", "group(B19013)"),
    region = "tract:*",
    regionin = "state:02")
head(acs_income_group)
state | county | tract | NAME | B19013_001E | B19013_001M | B19013_001M_1 | B19013_001EA | B19013_001MA |
---|---|---|---|---|---|---|---|---|
02 | 013 | 000100 | Census Tract 1, Aleutians East Borough, Alaska | 65926 | 2430 | 2430 | NA | NA |
02 | 016 | 000100 | Census Tract 1, Aleutians West Census Area, Alaska | 59167 | 4680 | 4680 | NA | NA |
02 | 016 | 000200 | Census Tract 2, Aleutians West Census Area, Alaska | 92083 | 4791 | 4791 | NA | NA |
02 | 020 | 000101 | Census Tract 1.01, Anchorage Municipality, Alaska | 101420 | 15802 | 15802 | NA | NA |
02 | 020 | 000102 | Census Tract 1.02, Anchorage Municipality, Alaska | 76690 | 14441 | 14441 | NA | NA |
02 | 020 | 000201 | Census Tract 2.01, Anchorage Municipality, Alaska | 93636 | 17769 | 17769 | NA | NA |
Some variable groups contain many related variables and their associated annotations. As an example, we’ll get table B17020, poverty status by age.
acs_poverty_group <- getCensus(name = "acs/acs5",
    vintage = 2016,
    vars = c("NAME", "group(B17020)"),
    region = "tract:*",
    regionin = "state:02")
# List column names
colnames(acs_poverty_group)
#> [1] "state" "county" "tract" "NAME"
#> [5] "B17020_001E" "B17020_001M" "B17020_002E" "B17020_002M"
#> [9] "B17020_003E" "B17020_003M" "B17020_004E" "B17020_004M"
#> [13] "B17020_005E" "B17020_005M" "B17020_006E" "B17020_006M"
#> [17] "B17020_007E" "B17020_007M" "B17020_008E" "B17020_008M"
#> [21] "B17020_009E" "B17020_009M" "B17020_010E" "B17020_010M"
#> [25] "B17020_011E" "B17020_011M" "B17020_012E" "B17020_012M"
#> [29] "B17020_013E" "B17020_013M" "B17020_014E" "B17020_014M"
#> [33] "B17020_015E" "B17020_015M" "B17020_016E" "B17020_016M"
#> [37] "B17020_017E" "B17020_017M" "B17020_001M_1" "B17020_001EA"
#> [41] "B17020_001MA" "B17020_002M_1" "B17020_002EA" "B17020_002MA"
#> [45] "B17020_003M_1" "B17020_003EA" "B17020_003MA" "B17020_004M_1"
#> [49] "B17020_004EA" "B17020_004MA" "B17020_005M_1" "B17020_005EA"
#> [53] "B17020_005MA" "B17020_006M_1" "B17020_006EA" "B17020_006MA"
#> [57] "B17020_007M_1" "B17020_007EA" "B17020_007MA" "B17020_008M_1"
#> [61] "B17020_008EA" "B17020_008MA" "B17020_009M_1" "B17020_009EA"
#> [65] "B17020_009MA" "B17020_010M_1" "B17020_010EA" "B17020_010MA"
#> [69] "B17020_011M_1" "B17020_011EA" "B17020_011MA" "B17020_012M_1"
#> [73] "B17020_012EA" "B17020_012MA" "B17020_013M_1" "B17020_013EA"
#> [77] "B17020_013MA" "B17020_014M_1" "B17020_014EA" "B17020_014MA"
#> [81] "B17020_015M_1" "B17020_015EA" "B17020_015MA" "B17020_016M_1"
#> [85] "B17020_016EA" "B17020_016MA" "B17020_017M_1" "B17020_017EA"
#> [89] "B17020_017MA"
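A group call returns many columns, and you will often want only the estimates or only the margins of error. One possible approach is to select columns by name pattern. The sketch below runs on a short subset of the column names listed above, so it needs no API call.

```r
# A subset of the column names printed above
cols <- c("state", "county", "tract", "NAME",
          "B17020_001E", "B17020_001M", "B17020_002E", "B17020_002M",
          "B17020_001M_1", "B17020_001EA", "B17020_001MA")

# Estimates end in E, margins of error end in M; anchoring the pattern
# excludes annotation columns (EA, MA) and duplicates such as _001M_1
estimate_cols <- grep("^B17020_[0-9]{3}E$", cols, value = TRUE)
moe_cols <- grep("^B17020_[0-9]{3}M$", cols, value = TRUE)

estimate_cols
moe_cols
```

With the real result, acs_poverty_group[, c("NAME", estimate_cols)] would then keep just the tract names and estimates, provided estimate_cols is built from colnames(acs_poverty_group).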
Some geographies, particularly Census tracts and blocks, need to be specified within larger geographies like states and counties. This varies by API endpoint, so make sure to read the documentation for your specific API and run listCensusMetadata to see the available geographies.
You may want to get data for many geographies that require a parent geography. For example, tract-level data from the 1990 Decennial Census can only be requested from one state at a time.
In this example, we use the built-in fips list of state FIPS codes to request tract-level data from each state and join the results into a single data frame.
fips
#> [1] 1 2 4 5 6 8 9 10 11 12 13 15 16 17 18 19 20 21 22 23 24 25 26
#> [24] 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 44 45 46 47 48 49 50
#> [47] 51 53 54 55 56
tracts <- NULL
for (f in fips) {
  stateget <- paste("state:", f, sep="")
  temp <- getCensus(name = "sf3",
      vintage = 1990,
      vars = c("P0070001", "P0070002", "P114A001"),
      region = "tract:*",
      regionin = stateget)
  tracts <- rbind(tracts, temp)
}
head(tracts)
state | county | tract | P0070001 | P0070002 | P114A001 |
---|---|---|---|---|---|
01 | 001 | 020100 | 944 | 917 | 11663 |
01 | 001 | 020200 | 917 | 1060 | 8555 |
01 | 001 | 020300 | 1451 | 1518 | 11782 |
01 | 001 | 020400 | 2166 | 2223 | 15323 |
01 | 001 | 020500 | 1604 | 1582 | 14522 |
01 | 001 | 020600 | 1784 | 1661 | 10630 |
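Growing a data frame with rbind inside a loop works, but it copies the accumulated data on every iteration. One alternative is to collect each state’s result in a list and bind once at the end. In the sketch below, fetch_state is a hypothetical stand-in that returns a dummy row so the code runs offline; in practice its body would be the getCensus call from the loop above.

```r
# Hypothetical stand-in for a per-state getCensus() request.
# Replace the body with the real call when running against the API.
fetch_state <- function(f) {
  data.frame(state = sprintf("%02d", f),
             value = f * 10,
             stringsAsFactors = FALSE)
}

# A short list of state codes for illustration (the package provides fips)
state_codes <- c(1, 2, 4)

# Request each state, then bind all results in a single step
pieces <- lapply(state_codes, fetch_state)
tracts <- do.call(rbind, pieces)
tracts
```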
The regionin argument of getCensus can also be used with a string of nested geographies, as shown below.
The 2010 Decennial Census summary file 1 requires you to specify a state and county to retrieve block-level data. Use region to request block-level data, and regionin to specify the desired state and county.
data2010 <- getCensus(name = "dec/sf1",
    vintage = 2010,
    vars = "P001001",
    region = "block:*",
    regionin = "state:36+county:027")
head(data2010)
state | county | tract | block | P001001 |
---|---|---|---|---|
36 | 027 | 010000 | 1000 | 31 |
36 | 027 | 010000 | 1011 | 17 |
36 | 027 | 010000 | 1028 | 41 |
36 | 027 | 010000 | 1001 | 0 |
36 | 027 | 010000 | 1031 | 0 |
36 | 027 | 010000 | 1002 | 4 |
For the 2000 Decennial Census summary file 1, tract is also required to retrieve block-level data. This example requests data for all blocks within Census tract 010000 in county 027 of state 36.
data2000 <- getCensus(name = "sf1",
    vintage = 2000,
    vars = "P001001",
    region = "block:*",
    regionin = "state:36+county:027+tract:010000")
head(data2000)
state | county | tract | block | P001001 |
---|---|---|---|---|
36 | 027 | 010000 | 1000 | 18 |
36 | 027 | 010000 | 1001 | 26 |
36 | 027 | 010000 | 1002 | 59 |
36 | 027 | 010000 | 1003 | 67 |
36 | 027 | 010000 | 1004 | 52 |
36 | 027 | 010000 | 1005 | 116 |
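When the parent geographies come from variables rather than being typed by hand, the nested regionin string can be assembled programmatically. The helper build_regionin below is a hypothetical convenience function, not part of censusapi. It joins named values in the order given, so pass them from the largest geography to the smallest.

```r
# Hypothetical helper: turn named geography codes into a nested
# regionin string such as "state:36+county:027"
build_regionin <- function(...) {
  parts <- c(...)
  paste0(names(parts), ":", parts, collapse = "+")
}

build_regionin(state = "36", county = "027")
build_regionin(state = "36", county = "027", tract = "010000")
```

Passing the codes as character strings keeps their leading zeros intact.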
This product uses the Census Bureau Data API but is not endorsed or certified by the Census Bureau.