Bioactivity API

Center for Computational Toxicology and Exposure

Introduction

In this vignette, the CTX Bioactivity API will be explored.

NOTE: Please see the introductory vignette for an overview of the ctxR package and initial setup instructions, including API key storage.

Data provided by the API’s Bioactivity endpoints are sourced from ToxCast’s invitrodb.

US EPA’s Toxicity Forecaster (ToxCast) program makes in vitro medium- and high-throughput screening assay data publicly available for prioritization and hazard characterization of thousands of chemicals.

The ToxCast pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores ToxCast data to populate its linked MySQL database, invitrodb. ToxCast assays comprise Tier 2-3 of the new Computational Toxicology Blueprint and employ automated chemical screening technologies to evaluate the effects of chemical exposure on living cells and biological macromolecules, such as proteins (Thomas et al., 2019). More information on the ToxCast program can be found at https://www.epa.gov/comptox-tools/toxicity-forecasting-toxcast.

The tcpl analysis pipeline is flexible and capable of efficiently processing and storing large volumes of data. The diverse data, received in heterogeneous formats from numerous vendors, are transformed to a standard computable format and loaded into the invitrodb database by vendor-specific R scripts. Once the data are loaded into the database, ToxCast uses the generalized processing functions provided in tcpl to process, normalize, model, qualify, and visualize the data.

Figure 1: Conceptual overview of the ToxCast Pipeline functionality

The Bioactivity API endpoints are organized into two different resources, “Assay” and “Data”. “Assay” resource endpoints provide assay metadata for specific or all invitrodb ‘aeids’ (assay endpoint ids). These include annotations from invitrodb’s assay, assay_component, assay_component_endpoint, assay_list, assay_source, and gene tables, all returned in a by-aeid format.

“Data” resource endpoints are split into summary data (by ‘aeid’) and bioactivity data by ‘m4id’ (i.e., a combination of ‘aeid’ and ‘spid’). The summary endpoint returns the number of active hits and the total number of chemicals tested in multiple- and single-concentration screening for specific ‘aeids’. The other endpoints return chemical information, level 3 concentration-response values, level 4 fit parameters, level 5 hit parameters, and level 6 flags for individual chemicals tested for given ‘aeids’, ‘m4ids’, ‘spids’, or ‘DTXSIDs’.

Regular ToxCast/invitrodb users may find it easier to use tcpl, which has integrated ctxR’s bioactivity functions to make the API data retrievable in a familiar format. See the tcpl vignette regarding data retrieval via API for more information.

Functions

Several ctxR functions are used to access the CTX Bioactivity API data.

Bioactivity Assay Resource

Annotations can be retrieved for specific assays, or for all available assays that have data, using the functions below.

Get annotation by aeid

get_annotation_by_aeid() retrieves annotation for a specific assay endpoint id (aeid).

assay <- get_annotation_by_aeid(AEID = "891")

get_annotation_by_aeid_batch() retrieves annotation for a list (or vector) of assay endpoint ids (aeids).

assays <- get_annotation_by_aeid_batch(AEID = c(759, 700, 891))
# The return value is a list organized by aeid; bind it into one table for output
assays <- data.table::rbindlist(assays)
# printFormattedTable() is a custom helper for formatted table output
printFormattedTable(assays, c(4, 18, 19, 33, 51))
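
When a batch result is bound into one table, it can help to keep track of which request each row came from. The following is a minimal sketch of that idea using mock data rather than a live API call. The field names (aeid, endpoint_name) and the column name requested_aeid are hypothetical placeholders, and the sketch assumes the batch list is named by aeid.

```r
library(data.table)

# Mock stand-in for a batch result: a list named by aeid, one record per element.
# Field names are hypothetical and do not reflect the API's actual schema.
mock_assays <- list(
  "759" = list(aeid = 759L, endpoint_name = "endpoint_A"),
  "700" = list(aeid = 700L, endpoint_name = "endpoint_B"),
  "891" = list(aeid = 891L, endpoint_name = "endpoint_C")
)

# idcol stores the list names in a new column, so each row stays
# traceable to the aeid that was requested.
assay_table <- rbindlist(mock_assays, idcol = "requested_aeid")
print(assay_table)
```

If the real batch list is unnamed, idcol records the position of each element in the list instead of a name.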

Get all assay annotations

get_all_assays() retrieves annotations for all available assays.

all_assays <- get_all_assays()

Bioactivity Data Resource

Several functions retrieve bioactivity data by a variety of identifier types (e.g., DTXSID, aeid).

Get summary data

get_bioactivity_summary() retrieves, by aeid, a summary of the number of active hits compared to the total number of chemicals tested in both multiple- and single-concentration screening.

summary <- get_bioactivity_summary(AEID = "891")

get_bioactivity_summary_batch() retrieves a summary for a list (or vector) of assay endpoint ids (aeids).

summary <- get_bioactivity_summary_batch(AEID = c(759, 700, 891))
# Bind the per-aeid list into a single table
summary <- data.table::rbindlist(summary)
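
Because the summary reports active hits alongside totals, a natural follow-up is a per-assay hit rate. The sketch below illustrates the arithmetic on a mock table. The column names (active_mc, total_mc) and the values are invented for illustration and are not the names or data returned by the API, so inspect names(summary) on a real result before adapting it.

```r
library(data.table)

# Mock summary table; column names and counts are hypothetical placeholders.
mock_summary <- data.table(
  aeid      = c(759L, 700L, 891L),
  active_mc = c(10L, 25L, 0L),   # chemicals scored active (multi-concentration)
  total_mc  = c(100L, 50L, 40L)  # chemicals tested (multi-concentration)
)

# Fraction of tested chemicals that were active, computed per aeid.
mock_summary[, hit_rate_mc := active_mc / total_mc]
print(mock_summary)
```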

Get data

get_bioactivity_details() retrieves all available multiple-concentration data by assay endpoint id (aeid), sample id (spid), Level 4 ID (m4id), or chemical DTXSID. It returns chemical information, level 3 concentration-response values, level 4 fit parameters, level 5 hit parameters, and level 6 flags for the individual chemicals tested. An example for each request parameter is provided below:

# By spid
spid_data <- get_bioactivity_details(SPID = 'TP0000904H05')
# By m4id
m4id_data <- get_bioactivity_details(m4id = 739695)
# By DTXSID
dtxsid_data <- get_bioactivity_details(DTXSID = "DTXSID30944145")
# By aeid
aeid_data <- get_bioactivity_details(AEID = 704)

Similar to the other _batch functions, get_bioactivity_details_batch() retrieves data for a list (or vector) of assay endpoint ids (aeid), sample ids (spid), Level 4 IDs (m4id), or chemical DTXSIDs.

aeid_data_batch <- get_bioactivity_details_batch(AEID = c(759, 700, 891))
# fill = TRUE pads columns missing from some list elements with NA
aeid_data_batch <- data.table::rbindlist(aeid_data_batch, fill = TRUE)
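
The fill = TRUE argument matters when the elements of a batch result do not all share the same columns. The sketch below shows the behavior on mock tables. The column names (hitc, flag) are used purely for illustration and are not a claim about the exact fields the API returns.

```r
library(data.table)

# Two mock per-aeid results with different column sets (hypothetical fields).
mock_details <- list(
  "759" = data.table(aeid = 759L, hitc = 1, flag = "example flag"),
  "700" = data.table(aeid = 700L, hitc = 0)
)

# Without fill = TRUE, rbindlist() stops with an error because the
# tables have different numbers of columns. With it, the missing
# 'flag' value for the second table is filled with NA.
combined <- rbindlist(mock_details, fill = TRUE)
print(combined)
```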

Conclusion

In this vignette, a variety of functions that access different types of data found in the Bioactivity endpoints of the CTX APIs were listed. We encourage the reader to explore the data accessible through these endpoints and work with it to get a better understanding of what data are available. Additionally, experienced ToxCast/invitrodb users may find it easier to continue to use tcpl, which has integrated ctxR’s bioactivity functions to make the API data retrievable in a familiar format.