Title: | Download and Extract Data from US EPA's ECOTOX Database |
---|---|
Description: | The US EPA ECOTOX database is a freely available database with a treasure of aquatic and terrestrial ecotoxicological data. As the online search interface doesn't come with an API, this package provides the means to easily access and search the database in R. To this end, all raw tables are downloaded from the EPA website and stored in a local SQLite database <doi:10.1016/j.chemosphere.2024.143078>. |
Authors: | Pepijn de Vries [aut, cre, dtc] (0000-0002-7961-6646) |
Maintainer: | Pepijn de Vries <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.1.1 |
Built: | 2024-11-01 05:40:03 UTC |
Source: | https://github.com/pepijn-devries/ECOTOXr |
This function is called automatically after download_ecotox_data(). The database files can also be downloaded manually from the EPA website, from which a local database can be built using this function.
build_ecotox_sqlite(source, destination = get_ecotox_path(), write_log = TRUE)
source |
A |
destination |
A |
write_log |
A |
Raw data downloaded from the EPA website is in itself not very efficient to work with in R. The files are large and would put a large strain on R when loading completely into the system's memory. Instead use this function to build an SQLite database from the tables. That way, the data can be queried without having to load it all into memory.
EPA provides the raw tables from the ECOTOX database as text files with pipe characters ('|') as column separators. Although not documented, the tables appear not to contain comment or quotation characters. Some records contain the reserved pipe character inside a field, which would confuse the table parser. In these records, the pipe character is replaced with a dash ('-').
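The parsing settings implied by this description can be sketched in base R. This is a minimal illustration and not the package's actual import code: the sample records and column names are made up, and `count.fields()` is used here only as one possible way to spot a record that contains a stray pipe character.

```r
## Made-up sample in the same pipe-separated layout (not real ECOTOX records)
raw_lines <- c(
  "test_id|chemical_name|endpoint",
  "1|Benzene|LC50",
  "2|Toluene|EC50",
  "3|Sodium|chloride|NOEC"   # stray pipe: four fields instead of three
)

## Count the fields per line with quoting and comment characters disabled
con <- textConnection(raw_lines)
n_fields <- count.fields(con, sep = "|", quote = "", comment.char = "")
close(con)

## Lines whose field count differs from the header line need attention
bad <- which(n_fields != n_fields[1])

## Parse only the well-formed lines
tab <- read.table(
  text = raw_lines[-bad], sep = "|", header = TRUE,
  quote = "", comment.char = "", stringsAsFactors = FALSE
)
```

Disabling `quote` and `comment.char` matters because the default settings of `read.table()` would treat apostrophes and hash signs inside chemical names as syntax.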
In addition, while reading the tables as text files, this package attempts to decode the text as UTF-8. Unfortunately, this process appears to be platform-dependent, and may therefore lead to different results on different platforms. This problem only seems to occur for characters that are listed as 'control characters' under UTF-8. This will have consequences for reproducibility, but only if you build search queries that look for such special characters. It is therefore advised to stick to common (non-accented) alpha-numerical characters in your searches, for the sake of reproducibility.
Use 'suppressMessages()' to suppress the progress report.
Returns NULL invisibly.
Pepijn de Vries
## Not run: 
## This example will only work properly if 'dir' points to an existing directory
## with the raw tables from the ECOTOX database. This function will be called
## automatically after a call to 'download_ecotox_data()'.
test <- check_ecotox_availability()
if (test) {
  files <- attributes(test)$files[1,]
  dir   <- gsub(".sqlite", "", files$database, fixed = TRUE)
  path  <- files$path
  if (dir.exists(file.path(path, dir))) {
    ## This will build the database in your temp directory:
    build_ecotox_sqlite(source = file.path(path, dir), destination = tempdir())
  }
}

## End(Not run)
Functions for handling chemical abstract service (CAS) registry numbers
cas(length = 0L)

is.cas(x)

as.cas(x)

## S3 method for class 'cas'
x[[i]]

## S3 method for class 'cas'
x[i]

## S3 replacement method for class 'cas'
x[[i]] <- value

## S3 replacement method for class 'cas'
x[i] <- value

## S3 method for class 'cas'
format(x, hyphenate = TRUE, ...)

## S3 method for class 'cas'
as.character(x, ...)

show.cas(x, ...)

## S3 method for class 'cas'
print(x, ...)

## S3 method for class 'cas'
as.list(x, ...)

## S3 method for class 'cas'
as.double(x, ...)

## S3 method for class 'cas'
as.integer(x, ...)

## S3 method for class 'cas'
c(...)

## S3 method for class 'cas'
as.data.frame(...)
length |
A non-negative |
x |
Object from which data needs to be extracted or replaced, or needs to be coerced into a specific
format. For nearly all of the functions documented here, this needs to be an object of the S3 class 'cas',
which can be created with |
i |
Index specifying element(s) to extract or replace. See also |
value |
A replacement value, can be anything that can be converted into an S3 cas-class object with |
hyphenate |
A |
... |
Arguments passed to other functions |
In the database, CAS registry numbers are stored as text (type character). As CAS numbers can consist of a maximum of 10 digits (plus two hyphens), this means that each CAS number can consume up to 12 bytes of memory or disk space. By storing the data numerically, only 5 bytes are required. These functions provide the means to handle CAS registry numbers and coerce from and to different formats and types.
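As a rough illustration of why the numeric representation needs some care, the sketch below uses base R only. The helper `is_valid_cas_string()` is hypothetical and not part of this package; it applies the general CAS check-digit rule (the last digit equals the sum of the preceding digits, each weighted by its position counted from the right, modulo 10). How the package validates and stores the numbers internally is not shown here.

```r
## Hypothetical helper: check the final (validation) digit of a CAS number
is_valid_cas_string <- function(x) {
  digits <- as.integer(strsplit(gsub("-", "", x, fixed = TRUE), "")[[1]])
  n      <- length(digits)
  body   <- rev(digits[-n])   # all digits except the check digit, right to left
  (sum(body * seq_along(body)) %% 10L) == digits[n]
}

is_valid_cas_string("71-43-2")  # benzene
is_valid_cas_string("71-43-3")  # same number with a wrong check digit

## A full 10-digit CAS number does not fit in R's 32 bit integer type ...
full <- as.numeric("9999999995")
full > .Machine$integer.max

## ... but the same number without its check digit does
without_check <- as.numeric("999999999")
without_check <= .Machine$integer.max
```

This is consistent with the remark in the examples for these functions that the validation digit is kept separately from the remaining digits.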
Functions cas, c and as.cas return S3 class 'cas' objects. Coercion functions (starting with 'as') return the object as specified by their respective function names (i.e., integer, double, character, list and data.frame). The show.cas and print functions also return formatted characters. The function is.cas will return a single logical value, indicating whether x is a valid S3 cas-class object. The square brackets return the selected index/indices, or the vector of cas objects where the selected elements are replaced by value.
Pepijn de Vries
## This will generate a vector of cas objects containing 10
## fictive (0-00-0), but valid registry numbers:
cas(10)

## This is a cas-object:
is.cas(cas(0L))

## This is not a cas-object:
is.cas(0L)

## Three different ways of creating a cas object from
## Benzene's CAS registry number (the result is the same)
as.cas("71-43-2")
as.cas("71432")
as.cas(71432L)

## This is one way of creating a vector with multiple CAS registry numbers:
cas_data <- as.cas(c("64175", "71432", "58082"))

## This is how you select a specific element(s) from the vector:
cas_data[2:3]
cas_data[[2]]

## You can also replace specific elements in the vector:
cas_data[1] <- "7440-23-5"
cas_data[[2]] <- "129-00-0"

## You can format CAS numbers with or without hyphens:
format(cas_data, TRUE)
format(cas_data, FALSE)

## The same can be achieved using as.character
as.character(cas_data, TRUE)
as.character(cas_data, FALSE)

## There are also show and print methods available:
show(cas_data)
print(cas_data)

## Numeric values can be obtained from CAS using as.numeric, as.double or as.integer
as.numeric(cas_data)

## Be careful, however. Some CAS numbers cannot be represented by R's 32 bit integers
## and will produce NA's. This will work OK:
huge_cas <- as.cas("9999999-99-5")

## Not run: 
## This will not:
as.integer(huge_cas)

## End(Not run)

## The trick applied by this package is that the final
## validation digit is stored separately as attribute:
unclass(huge_cas)

## This is how cas objects can be concatenated:
cas_data <- c(huge_cas, cas_data)

## This will create a data.frame
as.data.frame(cas_data)

## This will create a list:
as.list(cas_data)
Tests whether a local copy of the US EPA ECOTOX database exists in get_ecotox_path().
check_ecotox_availability(target = get_ecotox_path())
target |
A |
When arguments are omitted, this function will look in the default directory (get_ecotox_path()). However, it is possible to build a database file elsewhere if necessary.
Returns a logical value indicating whether a copy of the database exists. It also returns a files attribute that lists which copies of the database are found.
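To show how such a return value can be inspected, the sketch below builds a stand-in object with the shape suggested by the examples elsewhere in this manual: a logical value carrying a `files` attribute with `path` and `database` columns. The directory and file name are placeholders and not real output from the package.

```r
## Stand-in for the value returned by check_ecotox_availability();
## the path and file name below are placeholders
available <- structure(
  TRUE,
  files = data.frame(
    path     = file.path(tempdir(), "ECOTOXr"),
    database = "ecotox_ascii_placeholder.sqlite",
    stringsAsFactors = FALSE
  )
)

if (available) {
  ## The attribute lists the database copies that were found
  files   <- attr(available, "files")
  db_file <- file.path(files$path[1], files$database[1])
}
```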
Pepijn de Vries
check_ecotox_availability()
Performs some simple tests to check whether the locally built database is not corrupted.
check_ecotox_build(path = get_ecotox_path(), version, ...)
path |
A |
version |
A |
... |
Arguments that are passed to |
For now, this function tests if all expected tables are present in the locally built database. Note that in later releases of the database some tables were added. Therefore, for older builds this function might return FALSE whereas the build is actually just fine (just outdated).
Furthermore, this function tests if all tables contain one or more records. Obviously, this is no guarantee that the database is valid, but it is a start.
More tests may be added in future releases.
Returns a logical value indicating whether the database is not corrupted. TRUE indicates the database is most likely OK. FALSE indicates that something might be wrong. Additional messages (when FALSE) are included as attributes containing hints on the outcome of the tests. See also the 'details' section.
Pepijn de Vries
## Not run: 
check_ecotox_build()

## End(Not run)
Checks the version of the database available on-line from the EPA against the specified version (latest by default) of the database built locally. Returns TRUE when they are the same.
check_ecotox_version(path = get_ecotox_path(), version, verbose = TRUE)
path |
When you have a copy of the database somewhere other than the default
directory ( |
version |
A |
verbose |
A |
Returns a logical value invisibly, indicating whether the local build is up to date with the latest release by the EPA.
Pepijn de Vries
## Not run: 
check_ecotox_version()

## End(Not run)
Cite the downloaded copy of the ECOTOX database and this package (citation("ECOTOXr")) for reproducible results.
cite_ecotox(path = get_ecotox_path(), version)
path |
A |
version |
A |
When you download a copy of the EPA ECOTOX database using download_ecotox_data(), a BibTeX file is stored that registers the database release version and the access (= download) date. Use this function to obtain a citation to that specific download.
In order for others to reproduce your results, it is key to cite the data source as accurately as possible.
Returns a vector of bibentry()'s, containing a reference to the downloaded database and this package.
Pepijn de Vries
## Not run: 
## In order to cite downloaded database and this package:
cite_ecotox()

## End(Not run)
Wrappers for dbConnect() and dbDisconnect() methods.
dbConnectEcotox(path = get_ecotox_path(), version, ...)

dbDisconnectEcotox(conn, ...)
path |
A |
version |
A |
... |
Arguments that are passed to |
conn |
An open connection to the ECOTOX database that needs to be closed. |
Open or close a connection to the local ECOTOX database. These functions are only required when you want to send custom queries to the database. For most searches the search_ecotox() function will be adequate.
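When custom queries are needed, it can help to guarantee that the connection is closed again, even if the query fails. The sketch below shows one way to do that; `with_ecotox_connection()` is a hypothetical helper and not part of this package, and it assumes that the ECOTOXr and DBI packages are installed and a local database exists at the time it is called.

```r
## Hypothetical helper (not part of ECOTOXr): run a function with an open
## connection and make sure the connection is closed afterwards.
with_ecotox_connection <- function(fun, ...) {
  con <- ECOTOXr::dbConnectEcotox(...)
  ## on.exit() also runs when 'fun' throws an error
  on.exit(ECOTOXr::dbDisconnectEcotox(con), add = TRUE)
  fun(con)
}

## Usage (requires a local copy of the database, so it is not run here):
## with_ecotox_connection(function(con) DBI::dbListTables(con))
```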
A database connection in the form of a DBI::DBIConnection-class() object. The object is tagged with: a time stamp; the package version used; and the file path of the SQLite database used in the connection. These tags are added as attributes to the object.
Pepijn de Vries
## Not run: 
## This will only work when a copy of the database exists:
con <- dbConnectEcotox()

## check if the connection works by listing the tables in the database:
dbListTables(con)

## Let's be a good boy/girl and close the connection to the database when we're done:
dbDisconnectEcotox(con)

## End(Not run)
In order for this package to fully function, a local copy of the ECOTOX database needs to be built. This function will download the required data and build the database.
download_ecotox_data( target = get_ecotox_path(), write_log = TRUE, ask = TRUE, verify_ssl = getOption("ECOTOXr_verify_ssl"), ... )
target |
Target directory where the files will be downloaded and the database compiled. Default is
|
write_log |
A |
ask |
There are several steps in which files are (potentially) overwritten or deleted. In those cases
the user is asked on the command line what to do. Set this parameter to |
verify_ssl |
When set to |
... |
Arguments passed on to |
This function will attempt to find the latest download URL for the ECOTOX database from the EPA website (see get_ecotox_url()). When found, it will attempt to download the zipped archive containing all required data. This data is then extracted and a local copy of the database is built.
Use 'suppressMessages()' to suppress the progress report.
Returns NULL invisibly.
On some machines this function fails to connect to the database download URL from the EPA website due to missing SSL certificates. Unfortunately, there is no easy fix for this in this package. A workaround is to download and unzip the file manually using a different machine or browser that is less strict with SSL certificates; the download URL can be obtained with get_ecotox_url(). You can then call build_ecotox_sqlite() and point the source location to the manually extracted zip archive. Alternatively, one could try to call download_ecotox_data() with verify_ssl = FALSE; but only do so when you trust the download URL from get_ecotox_url().
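The manual route can be sketched as a small wrapper. `build_from_manual_download()` is a hypothetical helper, not part of this package; it assumes the archive was already downloaded and unzipped by hand, and it only calls build_ecotox_sqlite() with the `source` and `destination` arguments documented in this manual.

```r
## Hypothetical helper: build the local database from a manually
## downloaded and extracted archive.
build_from_manual_download <- function(extracted_dir, destination = tempdir()) {
  ## Fail early with a clear message when the directory is missing
  if (!dir.exists(extracted_dir)) {
    stop("Directory with extracted ECOTOX tables not found: ", extracted_dir)
  }
  ECOTOXr::build_ecotox_sqlite(source = extracted_dir, destination = destination)
}

## Usage (the directory name is a placeholder, so it is not run here):
## build_from_manual_download("path/to/extracted/archive")
```

The default `destination` in this sketch is the temporary directory, mirroring the examples in this manual; for regular use the package default (get_ecotox_path()) is probably the better choice.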
Pepijn de Vries
## Not run: 
## This will download and build the database in your temp dir:
download_ecotox_data(tempdir())

## End(Not run)
Get information on how and when the local ECOTOX database was built.
get_ecotox_info(path = get_ecotox_path(), version)
path |
A |
version |
A |
Get information on how and when the local ECOTOX database was built. This information is retrieved from the log-file that is (optionally) stored with the local database when calling download_ecotox_data() or build_ecotox_sqlite().
Returns a vector of characters, containing information on the selected local ECOTOX database.
Pepijn de Vries
## Not run: 
## Show info on the current database (only works when one is downloaded and built):
get_ecotox_info()

## End(Not run)
Obtain the local path to where the ECOTOX database is (or will be) placed.
get_ecotox_sqlite_file(path = get_ecotox_path(), version)

get_ecotox_path()
path |
When you have a copy of the database somewhere other than the default
directory ( |
version |
A |
It can be useful to know where the database is located on your disk. This function returns the location as provided by rappdirs::app_dir(), or as specified by you using options(ECOTOXr_path = "mypath").
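The option-based override can be illustrated with a small base R sketch. `resolve_ecotox_path()` is a hypothetical stand-in and not the package's implementation: it assumes the option is simply read with `getOption()`, and it uses a folder in the temporary directory as a placeholder for the location that rappdirs would normally provide.

```r
## Hypothetical resolver: prefer the user option, else fall back to a default.
## The default below is a placeholder for the rappdirs location.
resolve_ecotox_path <- function(default = file.path(tempdir(), "ECOTOXr")) {
  getOption("ECOTOXr_path", default = default)
}

old <- options(ECOTOXr_path = NULL)   # remember and clear the current setting
path_default <- resolve_ecotox_path() # placeholder default

options(ECOTOXr_path = "mypath")
path_custom <- resolve_ecotox_path()  # "mypath"

options(old)                          # restore the original setting
```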
Returns a character string of the path. get_ecotox_path will return the default directory of the database. get_ecotox_sqlite_file will return the path to the sqlite file when it exists.
Pepijn de Vries
get_ecotox_path()

## Not run: 
## This will only work if a local database exists:
get_ecotox_sqlite_file()

## End(Not run)
This function downloads the webpage at https://cfpub.epa.gov/ecotox/index.cfm. It then searches for the download link for the complete ECOTOX database and extracts its URL.
get_ecotox_url(verify_ssl = getOption("ECOTOXr_verify_ssl"), ...)
verify_ssl |
When set to |
... |
arguments passed on to |
This function is called by download_ecotox_data(), which tries to download the file from the resulting URL. On some machines this fails due to issues with the SSL certificate. The user can try to download the file by using this URL in a different browser (or on a different machine). Alternatively, the user could try to use download_ecotox_data(verify_ssl = FALSE) when the download URL is trusted.
Returns a character string containing the download URL of the latest version of the EPA ECOTOX database.
Pepijn de Vries
## Not run: 
get_ecotox_url()

## End(Not run)
List the field names (table headers) that are available from the ECOTOX database
list_ecotox_fields( which = c("default", "extended", "full", "all"), include_table = TRUE )
which |
A |
include_table |
A |
This can be useful when specifying a search with search_ecotox(), to identify which fields are available from the database for searching and output.
Note that when requesting 'all' fields, you will get all fields available from the latest EPA release of the ECOTOX database. This means that not necessarily all fields are available in your local build of the database.
Returns a vector of type character containing the field names from the ECOTOX database.
Pepijn de Vries
## Fields that are included in search results by default:
list_ecotox_fields("default")

## All fields that are available from the ECOTOX database:
list_ecotox_fields("all")

## All except fields from the tables 'chemical_carriers', 'media_characteristics',
## 'doses', 'dose_responses', 'dose_response_details', 'dose_response_links' and
## 'dose_stat_method_codes' that are available from the ECOTOX database:
list_ecotox_fields("full")
Create (and execute) an SQL search query based on basic search terms and options. This allows you to search the database, without having to understand SQL.
search_ecotox(
  search,
  output_fields = list_ecotox_fields("default"),
  group_by_results = TRUE,
  compute = FALSE,
  as_data_frame = TRUE,
  ...
)

search_ecotox_lazy(
  search,
  output_fields = list_ecotox_fields("default"),
  compute = FALSE,
  ...
)

search_query_ecotox(search, output_fields = list_ecotox_fields("default"), ...)
search |
A named
Each element in that list should contain another list with at least one element named 'terms'. This should contain a
Search terms for a specific field (table header) will be combined with 'or', meaning that any record that matches any of the terms is returned. For instance, when 'latin_name' 'Daphnia magna' and 'Skeletonema costatum' are searched, results for both species are returned. Search terms across fields (table headers) are combined with 'and', which will narrow the search. For instance, if 'chemical_name' 'benzene' is searched in combination with 'latin_name' 'Daphnia magna', only tests where Daphnia magna is exposed to benzene are returned.
When the search behaviour described above is not desirable, the user can either adjust the query manually, or use this function to perform several separate searches and combine the results afterwards.
Beware that some field names are ambiguous and occur in multiple tables (like |
output_fields |
A |
group_by_results |
Ecological test results are generally the most informative element in the ECOTOX database. Therefore, this search function returns a table with unique results in each row. However, some tables in the database (such as 'chemical_carriers' and 'dose_responses') have a one to many relationship with test results. This means that multiple chemical carriers can be linked to a single test result, similarly, multiple doses can also be linked to a single test result. By default the search results are grouped by test results. As a result not all doses or chemical carriers may
be displayed in the output. Set the |
compute |
The ECOTOXr package tries to construct database queries as lazily as possible, meaning that R
moves as much of the heavy lifting as possible to the database. When your search becomes complicated (e.g., when
including many output fields), you may run into trouble and hit the SQL parser limits. In those cases you can set
this parameter to |
as_data_frame |
|
... |
Arguments passed to |
The ECOTOX database is stored locally as an SQLite file, which can be queried with SQL. These functions allow you to automatically generate an SQL query and send it to the database, without having to understand SQL. The function search_query_ecotox generates and returns the SQL query (which can be edited by hand if desired). You can also directly call search_ecotox, which will first generate the query, send it to the database and retrieve the result.
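The way search terms are combined ('or' within a field, 'and' across fields) can be illustrated with a small base R sketch. The helper `sketch_where()` is hypothetical: it only mimics the combination logic, using the 'contains' and 'exact' methods that appear in the example for these functions. The SQL actually produced by search_query_ecotox will differ, and no escaping of special characters is attempted here.

```r
## Search specification in the format used by search_ecotox()
search <- list(
  latin_name    = list(terms = c("Skeletonema", "Daphnia"), method = "contains"),
  chemical_name = list(terms = "benzene", method = "exact")
)

## Hypothetical helper: turn the specification into a WHERE-like condition
sketch_where <- function(search) {
  per_field <- vapply(names(search), function(field) {
    spec <- search[[field]]
    cond <- if (identical(spec$method, "exact")) {
      sprintf("%s = '%s'", field, spec$terms)
    } else {
      sprintf("%s LIKE '%%%s%%'", field, spec$terms)
    }
    ## Terms within one field are alternatives
    paste0("(", paste(cond, collapse = " OR "), ")")
  }, character(1))
  ## Conditions on different fields must all hold
  paste(per_field, collapse = " AND ")
}

where <- sketch_where(search)
cat(where, "\n")
```

Adding a term to an existing field widens the search, whereas adding a new field narrows it.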
Although the generated query is not optimized for speed, it should be able to process most common searches within an acceptable time. The time required for retrieving data from a search query depends on the complexity of the query, the size of the query and the speed of your machine. Most queries should be completed within seconds (or several minutes at most) on modern machines. If your search requires optimisation for speed, you could try reordering the search fields. You can also edit the query generated with search_query_ecotox by hand and retrieve the results with DBI::dbGetQuery().
Note that this package is actively maintained and this function may be revised in future versions. In order to create reproducible results the user must: always work with an official release from CRAN and document the package and database version that are used to generate specific results (see also cite_ecotox()).
In case of search_query_ecotox, a character string containing an SQL query is returned. This query is built based on the provided search terms and options.
In case of search_ecotox, a data.frame is returned based on the search query built with search_query_ecotox. The data.frame is unmodified as returned by SQLite, meaning that all fields are returned as characters (even where the field types are 'date' or 'numeric').
The results are tagged with: a time stamp; the package version used; and the file path of the SQLite database used in the search (when applicable). These tags are added as attributes to the output table or query.
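Because all fields come back as character, numeric or date columns usually need to be converted before analysis. The sketch below shows this on a stand-in data.frame; the column names and values are made up for the illustration and are not actual ECOTOX output.

```r
## Stand-in for a search result: every column is of type character
result <- data.frame(
  test_id          = c("1", "2"),
  conc_value       = c("1.5", "20"),
  publication_date = c("2001-05-17", "1998-11-02"),
  stringsAsFactors = FALSE
)

## Convert the columns that are needed as numbers or dates
result$conc_value       <- as.numeric(result$conc_value)
result$publication_date <- as.Date(result$publication_date, format = "%Y-%m-%d")

str(result)
```

Values that cannot be parsed are turned into NA by `as.numeric()` (with a warning) and by `as.Date()` when a format is given, so it is worth checking for new NA values after conversion.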
Pepijn de Vries
Other search-functions: websearch_comptox(), websearch_ecotox()
## Not run: 
## let's find the ids of all ecotox tests on species
## where Latin names contain either of 2 specific genus names and
## where they were exposed to the chemical benzene
if (check_ecotox_availability()) {
  search <-
    list(
      latin_name    = list(
        terms  = c("Skeletonema", "Daphnia"),
        method = "contains"
      ),
      chemical_name = list(
        terms  = "benzene",
        method = "exact"
      )
    )
  ## rows in result each represent a unique test id from the database
  result <- search_ecotox(search)
  query  <- search_query_ecotox(search)
  cat(query)
} else {
  print("Sorry, you need to use 'download_ecotox_data()' first in order for this to work.")
}

## End(Not run)
Search https://comptox.epa.gov/dashboard for substances and their physico-chemical properties and meta-information.
websearch_comptox( searchItems, identifierTypes = c("chemical_name", "CASRN", "INCHIKEY", "dtxsid"), inputType = c("IDENTIFIER", "DTXCID", "INCHIKEY_SKELETON", "MSREADY_FORMULA", "EXACT_FORMULA", "MASS"), downloadItems = c("DTXCID", "CASRN", "INCHIKEY", "IUPAC_NAME", "SMILES", "INCHI_STRING", "MS_READY_SMILES", "QSAR_READY_SMILES", "MOLECULAR_FORMULA", "AVERAGE_MASS", "MONOISOTOPIC_MASS", "QC_LEVEL", "SAFETY_DATA", "EXPOCAST", "DATA_SOURCES", "TOXVAL_DATA", "NUMBER_OF_PUBMED_ARTICLES", "PUBCHEM_DATA_SOURCES", "CPDAT_COUNT", "IRIS_LINK", "PPRTV_LINK", "WIKIPEDIA_ARTICLE", "QC_NOTES", "ABSTRACT_SHIFTER", "TOXPRINT_FINGERPRINT", "ACTOR_REPORT", "SYNONYM_IDENTIFIER", "RELATED_RELATIONSHIP", "ASSOCIATED_TOXCAST_ASSAYS", "TOXVAL_DETAILS", "CHEMICAL_PROPERTIES_DETAILS", "BIOCONCENTRATION_FACTOR_TEST_PRED", "BOILING_POINT_DEGC_TEST_PRED", "48HR_DAPHNIA_LC50_MOL/L_TEST_PRED", "DENSITY_G/CM^3_TEST_PRED", "DEVTOX_TEST_PRED", "96HR_FATHEAD_MINNOW_MOL/L_TEST_PRED", "FLASH_POINT_DEGC_TEST_PRED", "MELTING_POINT_DEGC_TEST_PRED", "AMES_MUTAGENICITY_TEST_PRED", "ORAL_RAT_LD50_MOL/KG_TEST_PRED", "SURFACE_TENSION_DYN/CM_TEST_PRED", "THERMAL_CONDUCTIVITY_MW/(M*K)_TEST_PRED", "TETRAHYMENA_PYRIFORMIS_IGC50_MOL/L_TEST_PRED", "VISCOSITY_CP_CP_TEST_PRED", "VAPOR_PRESSURE_MMHG_TEST_PRED", "WATER_SOLUBILITY_MOL/L_TEST_PRED", "ATMOSPHERIC_HYDROXYLATION_RATE_(AOH)_CM3/MOLECULE*SEC_OPERA_PRED", "BIOCONCENTRATION_FACTOR_OPERA_PRED", "BIODEGRADATION_HALF_LIFE_DAYS_DAYS_OPERA_PRED", "BOILING_POINT_DEGC_OPERA_PRED", "HENRYS_LAW_ATM-M3/MOLE_OPERA_PRED", "OPERA_KM_DAYS_OPERA_PRED", "OCTANOL_AIR_PARTITION_COEFF_LOGKOA_OPERA_PRED", "SOIL_ADSORPTION_COEFFICIENT_KOC_L/KG_OPERA_PRED", "OCTANOL_WATER_PARTITION_LOGP_OPERA_PRED", "MELTING_POINT_DEGC_OPERA_PRED", "OPERA_PKAA_OPERA_PRED", "OPERA_PKAB_OPERA_PRED", "VAPOR_PRESSURE_MMHG_OPERA_PRED", "WATER_SOLUBILITY_MOL/L_OPERA_PRED", "EXPOCAST_MEDIAN_EXPOSURE_PREDICTION_MG/KG-BW/DAY", "NHANES", "TOXCAST_NUMBER_OF_ASSAYS/TOTAL", "TOXCAST_PERCENT_ACTIVE"), 
massError = 0, timeout = 300, verify_ssl = getOption("ECOTOXr_verify_ssl"), ... )
searchItems |
A |
identifierTypes |
Substance identifiers for searching CompTox. Only used when |
inputType |
Type of input used for searching CompTox. See usage section for valid entries. |
downloadItems |
Output fields of CompTox data for requested substances |
massError |
Error tolerance when searching for substances based on their monoisotopic mass. Only used for |
timeout |
Time in seconds (default is 300 secs), that the routine will wait for the download link to get ready.
It will throw an error if it takes longer than the specified |
verify_ssl |
When set to |
... |
Arguments passed on to |
The CompTox Chemicals Dashboard is a freely accessible on-line U.S. EPA database. It contains information on physico-chemical properties, environmental fate and transport, exposure, usage, in vivo toxicity, and in vitro bioassay of a wide range of substances.
The function described here to search and retrieve records from the on-line database is experimental. This is because this feature is not formally supported by the EPA, and it may break in future incarnations of the on-line database. The function forms an interface between R and the CompTox website and is therefore limited by the restrictions documented there.
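Because the web interface is experimental and the call throws an error when the download link is not ready in time, it can help to wrap searches in an error handler. The following is a minimal sketch, not part of the package: `try_search()` and `failing_search()` are hypothetical names, and the failing stand-in only exists so the example runs without network access.

```r
# Hypothetical helper (not part of ECOTOXr): run a search function and
# return NULL with a warning instead of stopping when the call fails.
try_search <- function(search_fun, ...) {
  tryCatch(
    search_fun(...),
    error = function(e) {
      warning("Search failed: ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

# Stand-in for a web search that always fails, so the sketch runs offline.
failing_search <- function(...) stop("download link was not ready in time")

# The warning is suppressed here only to keep the example output quiet.
result <- suppressWarnings(try_search(failing_search, "benzene"))
is.null(result)
```

In real use one would presumably pass `websearch_comptox` as `search_fun`, for instance `try_search(websearch_comptox, "benzene", timeout = 60)`, and check the result for `NULL` before continuing.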
Returns a named list of dplyr::tibbles containing the search results for the requested output tables and fields. Results are unpolished and returned ‘as is’ by EPA's web service.
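Since the result is a named list of tables, a quick overview of what came back can be useful before any further processing. The sketch below uses a mock list of plain data frames in place of a real result. The table names, column names and values are made up for illustration and do not reflect the actual CompTox output.

```r
# Stand-in for a returned result: a named list of tables. Table names,
# column names and values are invented for illustration only.
mock_results <- list(
  main = data.frame(
    INPUT = c("benzene", "108-88-3"),
    FOUND_BY = c("name", "CASRN")
  ),
  details = data.frame(INPUT = "benzene")
)

# One row per returned table, with its number of rows and columns.
overview <- data.frame(
  table = names(mock_results),
  rows = unname(vapply(mock_results, nrow, integer(1))),
  cols = unname(vapply(mock_results, ncol, integer(1)))
)
print(overview)
```

The same `vapply()` calls should work on a list of tibbles, as tibbles are data frames.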
Pepijn de Vries
Official US EPA CompTox website: https://comptox.epa.gov/dashboard/
Williams, A.J., Grulke, C.M., Edwards, J., McEachran, A.D., Mansouri, K., Baker, N.C., Patlewicz, G., Shah, I., Wambaugh, J.F., Judson, R.S. & Richard, A.M. (2017). The CompTox Chemistry Dashboard: a community data resource for environmental chemistry. J Cheminform, 9(61). doi:10.1186/s13321-017-0247-6
Other search-functions: search_ecotox(), websearch_ecotox()
## Not run:
## search for substance name 'benzene' and CAS registration number 108-88-3
## on https://comptox.epa.gov/dashboard:
comptox_results <- websearch_comptox(c("benzene", "108-88-3"))

## search for substances with monoisotopic mass of 100+/-5:
comptox_results2 <- websearch_comptox("100", inputType = "MASS", massError = 5)
## End(Not run)
Functions to search and retrieve records from the on-line database at https://cfpub.epa.gov/ecotox/search.cfm.
websearch_ecotox(fields = list_ecotox_web_fields(), habitat = c("aquire", "terrestrial"), verify_ssl = getOption("ECOTOXr_verify_ssl"), ...)
list_ecotox_web_fields(...)
fields |
A named |
habitat |
Use |
verify_ssl |
When set to |
... |
In case of |
The functions described here to search and retrieve records from the on-line database are experimental. This is because this feature is not formally supported by the EPA, and it may break in future iterations of the on-line database. The functions form an interface between R and the ECOTOX website and are therefore limited by its restrictions, as described in the package documentation: ECOTOXr. The functions should therefore be used with caution.
Returns a named list of dplyr::tibbles with search results. Results are unpolished and returned ‘as is’ by EPA's web service.
list_ecotox_web_fields() returns a named list with fields that can be used in a web search of EPA's ECOTOX database, using websearch_ecotox().
IMPORTANT: when you plan to perform multiple consecutive searches (for instance in a loop), please insert a call to Sys.sleep() between them. This avoids overloading the server and getting your IP address banned from it.
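The advice above can be sketched as a small loop that pauses between consecutive calls. This is only an illustration: `throttled_searches()` and `mock_search()` are hypothetical names that are not part of the package, and the default pause of 5 seconds is an assumption, as the documentation does not state how long to wait.

```r
# Hypothetical helper (not part of ECOTOXr): call `search_fun` once per
# query and pause `pause_secs` seconds between consecutive calls.
# The default of 5 seconds is an assumption, not a documented value.
throttled_searches <- function(queries, search_fun, pause_secs = 5) {
  results <- vector("list", length(queries))
  names(results) <- names(queries)
  for (i in seq_along(queries)) {
    if (i > 1) Sys.sleep(pause_secs)  # wait before every call except the first
    # Assign via list() so that a NULL result does not drop the element.
    results[i] <- list(search_fun(queries[[i]]))
  }
  results
}

# Offline stand-in for websearch_ecotox(), so the sketch runs without network.
mock_search <- function(fields) paste("searched:", fields)

out <- throttled_searches(
  list(a = "benzene", b = "toluene"),
  mock_search,
  pause_secs = 0.1
)
```

In real use, each element of `queries` would presumably be a field list created with `list_ecotox_web_fields()`, and `search_fun` would be `websearch_ecotox`.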
Pepijn de Vries
Other search-functions: search_ecotox(), websearch_comptox()
## Not run:
search_fields <-
  list_ecotox_web_fields(
    txAdvancedSpecEntries = "daphnia magna",
    RBSPECSEARCHTYPE = "EXACT",
    txAdvancedChemicalEntries = "benzene",
    RBCHEMSEARCHTYPE = "EXACT")
search_results <- websearch_ecotox(search_fields)
## End(Not run)