Title: | Client for Delphi's 'COVIDcast Epidata' API |
---|---|
Description: | Tools for Delphi's 'COVIDcast Epidata' API: data access, maps and time series plotting, and basic signal processing. The API includes a collection of numerous indicators relevant to the COVID-19 pandemic in the United States, including official reports, de-identified aggregated medical claims data, large-scale surveys of symptoms and public behavior, and mobility data, typically updated daily and at the county level. All data sources are documented at <https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html>. |
Authors: | Taylor Arnold [aut], Jacob Bien [aut], Logan Brooks [aut],
Sarah Colquhoun [aut], David Farrow [aut], Jed Grabman [ctb],
Pedrito Maynard-Zhang [ctb], Kathryn Mazaitis [aut], Alex
Reinhart [aut, cre] |
Maintainer: | Alex Reinhart <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.5.2 |
Built: | 2025-03-04 04:34:17 UTC |
Source: | https://github.com/cran/covidcast |
Look up FIPS codes by state abbreviations (including District of Columbia and
Puerto Rico); this function is based on grep(), and hence allows for regular
expressions.
abbr_to_fips( abbr, ignore.case = TRUE, perl = FALSE, fixed = FALSE, ties_method = c("first", "all") )
abbr |
Vector of state abbreviations to look up. |
ignore.case, perl, fixed |
Arguments to pass to grep(). |
ties_method |
If "first", then only the first match for each name is returned. If "all", then all matches for each name are returned. |
A vector of FIPS codes if ties_method equals "first", and a list of FIPS codes
otherwise. These FIPS codes have five digits (ending in "000").
abbr_to_fips("PA")
abbr_to_fips(c("PA", "PR", "DC"))
# Note that name_to_fips() works for state names too:
name_to_fips("^Pennsylvania$")
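All the lookup functions in this family are documented as wrappers around grep(), so each input is a regular expression and ties_method decides what happens when a pattern matches several rows. The sketch below is not the package's code: the five-row `states` table and the `lookup_fips()` helper are hypothetical, and only illustrate how the documented `ignore.case`/`perl`/`fixed` arguments and the "first"/"all" tie rule fit together.

```r
# Hypothetical mini lookup table; the package ships its own, complete data.
states <- data.frame(
  abbr = c("PA", "PR", "DC", "NJ", "NY"),
  fips = c("42000", "72000", "11000", "34000", "36000"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the documented arguments of abbr_to_fips().
lookup_fips <- function(abbr, ignore.case = TRUE, perl = FALSE,
                        fixed = FALSE, ties_method = c("first", "all")) {
  ties_method <- match.arg(ties_method)
  # One grep() call per pattern; each returns every matching FIPS code.
  matches <- lapply(abbr, function(p) {
    states$fips[grep(p, states$abbr, ignore.case = ignore.case,
                     perl = perl, fixed = fixed)]
  })
  names(matches) <- abbr
  if (ties_method == "all") {
    return(matches)
  }
  # "first": keep the first hit per pattern, NA when nothing matched.
  vapply(matches, function(m) if (length(m) > 0) m[1] else NA_character_,
         character(1))
}

lookup_fips(c("pa", "DC"))              # named vector: "42000", "11000"
lookup_fips("^P", ties_method = "all")  # list holding "42000" and "72000"
```

Because patterns are regular expressions, anchoring them (for example "^PA$") is the simplest way to avoid accidental partial matches.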
Look up state names by state abbreviations (including District of Columbia
and Puerto Rico); this function is based on grep(), and hence allows for
regular expressions.
abbr_to_name( abbr, ignore.case = FALSE, perl = FALSE, fixed = FALSE, ties_method = c("first", "all") )
abbr |
Vector of state abbreviations to look up. |
ignore.case, perl, fixed |
Arguments to pass to grep(). |
ties_method |
If "first", then only the first match for each name is returned. If "all", then all matches for each name are returned. |
A vector of state names if ties_method equals "first", and a list of state
names otherwise.
abbr_to_name("PA")
abbr_to_name(c("PA", "PR", "DC"))
Aggregate covidcast_signal objects into one data frame
Aggregates covidcast_signal objects into one data frame, in either "wide" or
"long" format. (In "wide" aggregation, only the latest issue from each data
frame is retained, and several columns, including data_source and signal, are
dropped; see details below.) See
vignette("multi-signals", package = "covidcast") for examples.
aggregate_signals(x, dt = NULL, format = c("wide", "long"))
x |
Single covidcast_signal data frame, or a list of covidcast_signal data frames. |
dt |
Vector of shifts to apply to the values in the data frame x, or (when x is a list) a list of such vectors, one per signal. |
format |
One of either "wide" or "long". The default is "wide". |
This function can be thought of as having three use cases. In all three
cases, the result will be a new data frame in either "wide" or "long" format,
depending on format.
The first use case is to apply time-shifts to the values in a given
covidcast_signal object. In this use case, x is a covidcast_signal data frame
and dt is a vector of shifts.
The second use case is to bind together, into one data frame, signals that
are returned by covidcast_signals(). In this use case, x is a list of
covidcast_signal data frames, and dt is NULL.
The third use case is a combination of the first two: to bind together
signals returned by covidcast_signals() and, simultaneously, apply
time-shifts to their values. In this use case, x is a list of
covidcast_signal data frames, and dt is either a vector of shifts, to apply
the same shifts to each signal in x, or a list of vectors of shifts, to apply
different shifts to each signal in x.
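This section does not spell out the sign convention for dt. The covidcast_cor() examples later in this manual describe dt_y = 14 as correlating with "values of y 14 days later", which suggests that a positive shift moves later values earlier (the new value on day t is the original value on day t + dt). The sketch below assumes that convention; `shift_signal()` is a hypothetical helper, not the package's implementation, and it produces long-format output with the extra dt column described under the return value.

```r
# Toy covidcast_signal-like data: one county, five days (values invented).
x <- data.frame(
  geo_value  = "42003",
  time_value = as.Date("2020-06-01") + 0:4,
  value      = c(10, 20, 30, 40, 50)
)

# ASSUMED convention: new value at day t = original value at day t + dt,
# so a row observed on day s is relabeled to day s - dt.
shift_signal <- function(df, dt) {
  pieces <- lapply(dt, function(d) {
    shifted <- df
    shifted$time_value <- shifted$time_value - d
    shifted$dt <- d  # record the shift, as in "long" output
    shifted
  })
  do.call(rbind, pieces)
}

long <- shift_signal(x, dt = c(0, 1))
long[long$time_value == as.Date("2020-06-02"), ]
```

On June 2nd the unshifted row keeps its own value (20), while the dt = 1 row carries the value originally observed on June 3rd (30).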
Data frame of aggregated signals in "wide" or "long" form, depending on
format. In "long" form, an extra column dt is appended to indicate the value
of the time-shift. In "wide" form, only the latest issue of data is retained;
the returned data frame is formed via full joins of the input data frames
(with geo_value and time_value as the join key), and the columns data_source,
signal, issue, lag, stderr, and sample_size are all dropped from the output.
Each unique signal, defined by a combination of data source name, signal
name, and time-shift, is given its own column, whose name indicates its
defining quantities. For example, the column name
"value+2:usa-facts_confirmed_incidence_num" corresponds to a signal defined by
data_source = "usa-facts", signal = "confirmed_incidence_num", and dt = 2.
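The "wide" layout described above amounts to renaming each signal's value column and full-joining on geo_value and time_value. The base-R sketch below reproduces that idea with merge(all = TRUE); the toy data frames and the `wide_name()` helper are hypothetical, and the naming only imitates the "value+2:usa-facts_confirmed_incidence_num" pattern shown above rather than calling the package.

```r
# Hypothetical long-format pieces for two signals (values invented).
a <- data.frame(geo_value = c("01001", "01003"),
                time_value = as.Date("2020-08-15"), value = c(5, 7))
b <- data.frame(geo_value = c("01001", "01005"),
                time_value = as.Date("2020-08-15"), value = c(1, 2))

# Build a column name in the "value<dt>:<source>_<signal>" pattern.
wide_name <- function(dt, data_source, signal) {
  sprintf("value%+d:%s_%s", as.integer(dt), data_source, signal)
}

names(a)[names(a) == "value"] <- wide_name(0, "usa-facts", "confirmed_incidence_num")
names(b)[names(b) == "value"] <- wide_name(2, "usa-facts", "deaths_incidence_num")

# Full join on the key columns; a location missing from one input gets NA.
wide <- merge(a, b, by = c("geo_value", "time_value"), all = TRUE)
wide
```

The full join is why "wide" output can contain NA cells: county 01003 appears only in the first signal and 01005 only in the second.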
covidcast_wider(), covidcast_longer()
Convert data from an external source into a form compatible with covidcast_signal
Several methods are provided to convert common objects (such as data frames)
into covidcast_signal objects, which can be used with the various
covidcast_signal methods (such as plot.covidcast_signal() or covidcast_cor()).
See vignette("external-data") for examples.
as.covidcast_signal(x, ...)

## S3 method for class 'covidcast_signal'
as.covidcast_signal(x, ...)

## S3 method for class 'data.frame'
as.covidcast_signal(
  x,
  signal = NULL,
  geo_type = c("county", "msa", "hrr", "dma", "state", "hhs", "nation"),
  time_type = c("day", "week"),
  data_source = "user",
  issue = NULL,
  metadata = list(),
  ...
)
x |
Object to be converted. See Methods section below for details on formatting of each input type. |
... |
Additional arguments passed to methods. |
signal |
The signal name to use for this data. |
geo_type |
The geography type stored in this object. |
time_type |
The time resolution stored in this object. If "day", the default, each observation covers one day. If "week", each time value is assumed to be the start date of the epiweek (MMWR week) that the data represents. |
data_source |
The name of the data source to use as a label for this data. |
issue |
Issue date to use for this data, if not present in x. |
metadata |
List of metadata to attach to the covidcast_signal object; see covidcast_signal() for documentation of fields and structure. |
as.covidcast_signal(covidcast_signal): Simply returns the covidcast_signal
object unchanged.
as.covidcast_signal(data.frame): The input data frame x must contain the
columns time_value, value, and geo_value. If an issue column is present in x,
it will be used as the issue date for each observation; if not, the issue
argument will be used. Other columns will be preserved as-is.
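With the package installed, the documented call would look like as.covidcast_signal(df, signal = "my_signal", geo_type = "state"). To make the data.frame rules above concrete without depending on the package, the sketch below re-implements just the column checks and the issue fallback in a hypothetical helper, `prepare_signal_df()`; it does not attach metadata or set the covidcast_signal class the way the real method does.

```r
# Hypothetical helper: mimics only the documented input rules.
prepare_signal_df <- function(x, issue = NULL) {
  required <- c("time_value", "value", "geo_value")
  missing <- setdiff(required, names(x))
  if (length(missing) > 0) {
    stop("missing required column(s): ", paste(missing, collapse = ", "))
  }
  if (!"issue" %in% names(x)) {
    # No issue column: fall back to the argument (NA if it is also absent).
    x$issue <- if (is.null(issue)) as.Date(NA) else as.Date(issue)
  }
  x  # all other columns are left untouched
}

df <- data.frame(geo_value = "pa", time_value = as.Date("2020-06-01"),
                 value = 1)
prepare_signal_df(df, issue = "2020-06-05")
```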
Data set on county populations, from the 2019 US Census.
county_census
A data frame with 3193 rows, one for each county (along with the 50 states and DC). Columns include:
Geographic summary level. Either 40 (state) or 50 (county).
Census Region code
Census Division code
State FIPS code.
County FIPS code.
Name of the state in which this county belongs.
County name, to help find counties by name.
Estimate of the county's resident population as of July 1, 2019.
Five-digit county FIPS codes. These are unique identifiers used, for example,
as the geo_values argument to covidcast_signal() to request data from a
specific county.
United States Census Bureau, at https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv
Census Bureau documentation of all columns and their meaning: https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.pdf, https://www.census.gov/data/tables/time-series/demo/popest/2010s-total-puerto-rico-municipios.html, and https://www.census.gov/data/tables/2010/dec/2010-island-areas.html
county_fips_to_name(), name_to_fips()
Compute correlations between two covidcast_signal data frames
Computes correlations between two covidcast_signal data frames, allowing for
slicing by geo location or by time. (Only the latest issue from each data
frame is used for correlations.) See the correlations vignette for examples:
vignette("correlation-utils", package = "covidcast").
covidcast_cor( x, y, dt_x = 0, dt_y = 0, by = c("geo_value", "time_value"), use = "na.or.complete", method = c("pearson", "kendall", "spearman") )
x, y |
The covidcast_signal data frames to correlate. |
dt_x, dt_y |
Time shifts (in days) to consider for x and y, respectively, before computing correlations. |
by |
If "geo_value", then correlations are computed for each geo location, over all time. Each correlation is measured between two time series at the same location. If "time_value", then correlations are computed for each time, over all geo locations. Each correlation is measured between all locations at one time. Default is "geo_value". |
use, method |
Arguments to pass to cor(). |
A data frame with first column geo_value or time_value (matching by), and
second column value, which gives the correlation.
## Not run:
# For all these examples, let x and y be two signals measured at the county
# level over several months.

## `by = "geo_value"`
# Correlate each county's time series together, returning one correlation per
# county:
covidcast_cor(x, y, by = "geo_value")

# Correlate x in each county with values of y 14 days later
covidcast_cor(x, y, dt_y = 14, by = "geo_value")

# Equivalently, x can be shifted -14 days:
covidcast_cor(x, y, dt_x = -14, by = "geo_value")

## `by = "time_value"`
# For each date, correlate x's values in every county against y's values in
# the same counties. Returns one correlation per date:
covidcast_cor(x, y, by = "time_value")

# Correlate x values across counties against y values 7 days later
covidcast_cor(x, y, dt_y = 7, by = "time_value")
## End(Not run)
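The by = "geo_value" mode boils down to joining x and y on location and date, then calling cor() once per location. The base-R sketch below shows that computation on simulated data; `cor_by_geo()` is hypothetical, and its shift convention (new value at day t = original value at day t + dt) is an assumption chosen to agree with the examples above, where dt_y = 14 and dt_x = -14 give the same result.

```r
set.seed(42)
days <- as.Date("2020-06-01") + 0:29
x <- data.frame(geo_value = rep(c("a", "b"), each = 30),
                time_value = rep(days, times = 2),
                value = rnorm(60))
# Simulated y: a linear transform of x, observed 14 days later.
y <- data.frame(geo_value = x$geo_value,
                time_value = x$time_value + 14,
                value = 3 * x$value + 1)

# Hypothetical per-location correlation with optional time shifts.
cor_by_geo <- function(x, y, dt_x = 0, dt_y = 0,
                       use = "na.or.complete", method = "pearson") {
  # ASSUMED convention: a row observed on day s is relabeled to day s - dt.
  x$time_value <- x$time_value - dt_x
  y$time_value <- y$time_value - dt_y
  joined <- merge(x, y, by = c("geo_value", "time_value"),
                  suffixes = c(".x", ".y"))
  pieces <- split(joined, joined$geo_value)
  data.frame(
    geo_value = names(pieces),
    value = vapply(pieces, function(d) {
      cor(d$value.x, d$value.y, use = use, method = method)
    }, numeric(1)),
    row.names = NULL
  )
}

cor_by_geo(x, y, dt_y = 14)   # approximately 1 in both locations
cor_by_geo(x, y, dt_x = -14)  # same result, shifting x instead of y
```

Splitting by time_value instead of geo_value gives the by = "time_value" variant: one correlation per date, computed across locations.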
These functions take signals returned from aggregate_signals() and convert
between formats. covidcast_longer() takes the output of
aggregate_signals(..., format = "wide") and converts it to "long" format,
while covidcast_wider() takes the output of
aggregate_signals(..., format = "long") and converts it to "wide" format.
covidcast_longer(x)
covidcast_wider(x)
x |
A data frame returned by aggregate_signals(). |
The object pivoted into the opposite form, i.e. as if aggregate_signals() had
been called in the first place with that format argument.
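Pivoting between the two layouts is the same operation base R's reshape() performs. The sketch below uses it only as an analogy: the toy data frame is hypothetical, base R names the wide columns "value.cases"/"value.deaths" rather than the "value+0:source_signal" pattern used by aggregate_signals(), and the reverse direction is written out by hand instead of calling covidcast_longer().

```r
# Hypothetical long-format data: two signals in two states (values invented).
long <- data.frame(
  geo_value  = c("pa", "pa", "nj", "nj"),
  time_value = as.Date("2020-08-15"),
  signal     = c("cases", "deaths", "cases", "deaths"),
  value      = c(10, 1, 20, 2)
)

# Long -> wide: one value column per signal, keyed by location and date.
wide <- reshape(long, direction = "wide",
                idvar = c("geo_value", "time_value"),
                timevar = "signal", v.names = "value")

# Wide -> long again: stack one block of rows per signal column.
back <- do.call(rbind, lapply(c("cases", "deaths"), function(s) {
  data.frame(wide[c("geo_value", "time_value")], signal = s,
             value = wide[[paste0("value.", s)]])
}))
```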
Obtains a data frame of metadata describing all publicly available data streams from the COVIDcast API.
covidcast_meta()
Data frame containing one row per signal, with the following columns:
data_source |
Data source name. |
signal |
Signal name. |
min_time |
First day for which this signal is available. |
max_time |
Most recent day for which this signal is available. |
geo_type |
Geographic level for which this signal is available, such as county, state, msa, or hrr. Most signals are available at multiple geographic levels and will hence be listed in multiple rows with their own metadata. |
time_type |
Temporal resolution at which this signal is reported. "day", for example, means the signal is reported daily. |
num_locations |
Number of distinct geographic locations available for
this signal. For example, if |
min_value |
Smallest value that has ever been reported. |
max_value |
Largest value that has ever been reported. |
mean_value |
Arithmetic mean of all reported values. |
stdev_value |
Sample standard deviation of all reported values. |
max_issue |
Most recent issue date for this signal. |
min_lag |
Smallest lag from observation to issue, in |
max_lag |
Largest lag from observation to issue, in |
COVIDcast API sources and signals documentation: https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html
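Because a signal gets one metadata row per geographic level, checking availability is a matter of filtering rows. The sketch below works on a hypothetical placeholder table that has only a few of the columns listed above; in practice the input would be the data frame returned by covidcast_meta(), and `geo_types_for()` is an illustrative helper, not part of the package.

```r
# Placeholder rows in the shape of covidcast_meta() output (illustrative only).
meta <- data.frame(
  data_source = c("fb-survey", "fb-survey", "usa-facts"),
  signal      = c("smoothed_cli", "smoothed_cli", "confirmed_incidence_num"),
  geo_type    = c("county", "state", "county"),
  time_type   = "day"
)

# Hypothetical helper: geographic levels available for one signal.
geo_types_for <- function(meta, data_source, signal) {
  rows <- meta$data_source == data_source & meta$signal == signal
  unique(meta$geo_type[rows])
}

geo_types_for(meta, "fb-survey", "smoothed_cli")  # "county" "state"
```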
Obtains data for selected date ranges for all geographic regions of the
United States. Available data sources and signals are documented in the
COVIDcast signal documentation.
Most (but not all) data sources are available at the county level, but the
API can also return data aggregated to metropolitan statistical areas,
hospital referral regions, or states, as desired, by using the geo_type
argument.
covidcast_signal( data_source, signal, start_day = NULL, end_day = NULL, geo_type = c("county", "hrr", "msa", "dma", "state", "hhs", "nation"), geo_values = "*", as_of = NULL, issues = NULL, lag = NULL, time_type = c("day", "week") )
data_source |
String identifying the data source to query. See https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html for a list of available data sources. |
signal |
String identifying the signal from that source to query. Again, see https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html for a list of available signals. |
start_day |
Query data beginning on this date. Date object, or string in
the form "YYYY-MM-DD". If |
end_day |
Query data up to this date, inclusive. Date object or string
in the form "YYYY-MM-DD". If |
geo_type |
The geography type for which to request this data, such as "county" or "state". Defaults to "county". See https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html for details on which types are available. |
geo_values |
Which geographies to return. The default, "*", fetches all geographies. To fetch specific geographies, specify their IDs as a vector or list of strings. See https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html for details on how to specify these IDs. |
as_of |
Fetch only data that was available on or before this date,
provided as a |
issues |
Fetch only data that was published or updated ("issued") on
these dates. Provided as either a single |
lag |
Integer. If, for example, |
time_type |
The temporal resolution to request this data. Most signals are available at the "day" resolution (the default); some are only available at the "week" resolution, representing an MMWR week ("epiweek"). |
For data on counties, metropolitan statistical areas, and states, this
package provides the county_census, msa_census, and state_census datasets.
These include each area's unique identifier, used in the geo_values argument
to select specific areas, and basic information on population and other
Census data.
Downloading large amounts of data may be slow, so this function prints
messages for each chunk of data it downloads. To suppress these, use
base::suppressMessages(), as in
suppressMessages(covidcast_signal("fb-survey", ...)).
A covidcast_signal object with matching data. The object is a data frame with
additional metadata attached. Each row is one observation of one signal on
one day in one geographic location. It contains the following columns:
data_source |
Data source from which this observation was obtained. |
signal |
Signal from which this observation was obtained. |
geo_value |
String identifying the location, such as a state name or county FIPS code. |
time_value |
Date object identifying the date of this observation. For
data with |
issue |
Date object identifying the date this estimate was issued.
For example, an estimate with a |
lag |
Integer giving the difference between |
value |
Signal value being requested. For example, in a query for the
"confirmed_cumulative_num" signal from the "usa-facts" source, this would
be the cumulative number of confirmed cases in the area, as of the given
|
stderr |
Associated standard error of the signal value, if available. |
sample_size |
Integer indicating the sample size available in that
geography on that day; sample size may not be available for all signals,
due to privacy or other constraints, in which case it will be |
Consult the signal documentation for more details on how values and standard errors are calculated for specific signals.
The returned data frame has a metadata attribute containing metadata about
the signal contained within; see "Metadata" below for details.
The returned object has a metadata attribute attached containing basic
information about the signal. Use attributes(x)$metadata to access this
metadata. The metadata is stored as a data frame of one row, and contains the
same information that covidcast_meta() would return for a given signal.
Note that not all covidcast_signal objects may have all fields of metadata
attached; for example, an object created with as.covidcast_signal() using
data from another source may only contain the geo_type variable, along with
data_source and signal. Before using the metadata of a covidcast_signal
object, always check for the presence of the attributes you need.
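The check recommended above can be a few lines of base R. In the sketch below the toy data frame and its one-row metadata are hypothetical, and `has_meta_fields()` is an illustrative helper; it reads the attribute with attributes(x)$metadata, as described above, and reports FALSE both when the attribute is absent and when a requested field is missing.

```r
x <- data.frame(geo_value = "pa", time_value = as.Date("2020-06-01"),
                value = 1)
# Attach a one-row metadata data frame (fields chosen for illustration).
attr(x, "metadata") <- data.frame(data_source = "user",
                                  signal = "my_signal",
                                  geo_type = "state")

# Hypothetical helper: are all requested metadata fields present?
has_meta_fields <- function(x, fields) {
  meta <- attributes(x)$metadata
  !is.null(meta) && all(fields %in% names(meta))
}

has_meta_fields(x, "geo_type")  # TRUE
has_meta_fields(x, "max_lag")   # FALSE: this object carries no lag fields
```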
The COVIDcast API tracks updates and changes to its underlying data, and
records the first date each observation became available. For example, a data
source may report its estimate for a specific state for June 3rd on June 5th,
once records become available. This data is considered "issued" on June 5th.
Later, the data source may update its estimate for June 3rd based on revised
data, creating a new issue on June 8th. By default, covidcast_signal()
returns the most recent issue available for every observation. The as_of,
issues, and lag parameters allow the user to select specific issues instead,
or to see all updates to observations. These options are mutually exclusive,
and you should only specify one; if you specify more than one, you may get an
error or confusing results.
Note that the API only tracks the initial value of an estimate and changes
to that value. If a value was first issued on June 5th and never updated,
asking for data issued on June 6th (using issues or lag) would not return
that value, though asking for data as_of June 6th would. See
vignette("covidcast") for examples.
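The difference between issues, lag, and as_of is easiest to see on a small revision history. The sketch below replays the June 3rd/5th/8th scenario above on a hypothetical local data frame with invented values; the three filter functions are illustrative stand-ins for what the API does server-side, not calls into the package.

```r
# Toy revision history: June 3rd estimate, issued June 5th, revised June 8th.
history <- data.frame(
  geo_value  = "pa",
  time_value = as.Date("2020-06-03"),
  issue      = as.Date(c("2020-06-05", "2020-06-08")),
  value      = c(100, 104)
)
# lag: days between the observation date and the issue date.
history$lag <- as.integer(history$issue - history$time_value)

# issues: only rows issued exactly on the requested dates.
filter_issues <- function(df, issues) df[df$issue %in% as.Date(issues), ]

# lag: only rows issued exactly `lag` days after their time_value.
filter_lag <- function(df, lag) df[df$lag == lag, ]

# as_of: for each observation, the newest issue on or before `date`.
filter_as_of <- function(df, date) {
  avail <- df[df$issue <= as.Date(date), ]
  avail <- avail[order(avail$geo_value, avail$time_value, avail$issue), ]
  avail[!duplicated(avail[, c("geo_value", "time_value")], fromLast = TRUE), ]
}

nrow(filter_issues(history, "2020-06-06"))  # 0: nothing was issued that day
filter_as_of(history, "2020-06-06")$value   # 100, the June 5th issue
filter_as_of(history, "2020-06-10")$value   # 104, the June 8th revision
filter_lag(history, 5)$value                # 104
```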
Note also that the API enforces a maximum result row limit; results beyond
the maximum limit are truncated. This limit is sufficient to fetch
observations in all counties in the United States on one day. This client
automatically splits queries for multiple days across multiple API calls.
However, if data for one day has been issued many times, using the issues
argument may return more results than the query limit. A warning will be
issued in this case. To see all results, split your query across multiple
calls with different issues arguments.
By default, covidcast_signal() submits queries to the API anonymously. All
the examples in the package documentation are compatible with anonymous use
of the API, but anonymous queries are subject to some limits, including a
rate limit. If you regularly query large amounts of data, please consider
registering for a free API key, which lifts these limits. Even if your usage
falls within the anonymous usage limits, registration helps us understand who
is using the Delphi Epidata API and how, which may in turn inform future
research, data partnerships, and funding.
If you have an API key, you can use it by setting the covidcast.auth option
once before calling covidcast_signal() or covidcast_signals():
options(covidcast.auth = "your_api_key")
cli <- covidcast_signal(data_source = "fb-survey", signal = "smoothed_cli",
                        start_day = "2020-05-01", end_day = "2020-05-07",
                        geo_type = "state")
COVIDcast API documentation: https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html
Documentation of all COVIDcast sources and signals: https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html
COVIDcast public dashboard: https://delphi.cmu.edu/covidcast/
plot.covidcast_signal(), covidcast_signals(), as.covidcast_signal(),
county_census, msa_census, state_census
## Not run:
## Fetch all counties from 2020-05-10 to the most recent available data
covidcast_signal("fb-survey", "smoothed_cli", start_day = "2020-05-10")
## Fetch all counties on just 2020-05-10 and no other days
covidcast_signal("fb-survey", "smoothed_cli", start_day = "2020-05-10",
                 end_day = "2020-05-10")
## Fetch all states on 2020-05-10, 2020-05-11, 2020-05-12
covidcast_signal("fb-survey", "smoothed_cli", start_day = "2020-05-10",
                 end_day = "2020-05-12", geo_type = "state")
## Fetch all available data for just Pennsylvania and New Jersey
covidcast_signal("fb-survey", "smoothed_cli", geo_type = "state",
                 geo_values = c("pa", "nj"))
## Fetch all available data in the Pittsburgh metropolitan area
covidcast_signal("fb-survey", "smoothed_cli", geo_type = "msa",
                 geo_values = name_to_cbsa("Pittsburgh"))
## End(Not run)
This convenience function uses covidcast_signal() to obtain multiple signals,
potentially from multiple data sources.
covidcast_signals( data_source, signal, start_day = NULL, end_day = NULL, geo_type = c("county", "hrr", "msa", "dma", "state", "hhs", "nation"), time_type = c("day", "week"), geo_values = "*", as_of = NULL, issues = NULL, lag = NULL )
data_source |
String identifying the data source to query. See https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html for a list of available data sources. |
signal |
String identifying the signal from that source to query. Again, see https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html for a list of available signals. |
start_day |
Query data beginning on this date. Date object, or string in
the form "YYYY-MM-DD". If |
end_day |
Query data up to this date, inclusive. Date object or string
in the form "YYYY-MM-DD". If |
geo_type |
The geography type for which to request this data, such as "county" or "state". Defaults to "county". See https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html for details on which types are available. |
time_type |
The temporal resolution to request this data. Most signals are available at the "day" resolution (the default); some are only available at the "week" resolution, representing an MMWR week ("epiweek"). |
geo_values |
Which geographies to return. The default, "*", fetches all geographies. To fetch specific geographies, specify their IDs as a vector or list of strings. See https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html for details on how to specify these IDs. |
as_of |
Fetch only data that was available on or before this date,
provided as a |
issues |
Fetch only data that was published or updated ("issued") on
these dates. Provided as either a single |
lag |
Integer. If, for example, |
The argument structure is just as in covidcast_signal(), except that the
first four arguments, data_source, signal, start_day, and end_day, are
permitted to be vectors. The first two arguments, data_source and signal, are
recycled appropriately in the calls to covidcast_signal(); see the example
below. The next two arguments, start_day and end_day, must each be either
length 1 or length N (unless NULL), where
N = max(length(data_source), length(signal)).
See vignette("multi-signals") for additional examples.
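The recycling rule above can be pictured as building one row per underlying covidcast_signal() call. The sketch below is a hypothetical `plan_calls()` helper that only plans the calls (it makes no requests); it assumes plain rep_len() recycling for data_source and signal, which the manual does not confirm, and uses NA instead of NULL for missing dates so that the plan fits in a data frame.

```r
# Hypothetical: one row per covidcast_signal() call that would be made.
plan_calls <- function(data_source, signal, start_day = NA, end_day = NA) {
  n <- max(length(data_source), length(signal))
  for (arg in list(start_day, end_day)) {
    if (!(length(arg) %in% c(1, n))) {
      stop("start_day and end_day must have length 1 or ", n)
    }
  }
  # ASSUMPTION: data_source and signal are recycled with rep_len().
  data.frame(data_source = rep_len(data_source, n),
             signal      = rep_len(signal, n),
             start_day   = rep_len(as.character(start_day), n),
             end_day     = rep_len(as.character(end_day), n))
}

plan_calls("usa-facts",
           c("confirmed_incidence_num", "deaths_incidence_num"),
           start_day = "2020-08-15", end_day = "2020-10-01")
```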
A list of covidcast_signal data frames, of length
N = max(length(data_source), length(signal)). This list can be aggregated
into a single data frame of either "wide" or "long" format using
aggregate_signals().
covidcast_signal(), aggregate_signals()
## Not run:
## Fetch USAFacts confirmed cases and deaths over the same time period
covidcast_signals("usa-facts",
                  signal = c("confirmed_incidence_num", "deaths_incidence_num"),
                  start_day = "2020-08-15", end_day = "2020-10-01")
## End(Not run)
Look up state abbreviations by FIPS codes (including District of Columbia and Puerto Rico). Will match the first two digits of the input codes, so should work for 5-digit county codes, or even longer tract and census block FIPS codes.
fips_to_abbr(code)
code |
Vector of FIPS codes to look up; will match the first two digits of the code. Note that these are treated as strings; the number 1 will not match "01". |
A vector of state abbreviations.
fips_to_abbr("42000")
fips_to_abbr(c("42", "72", "11"))
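Matching on the first two characters is what lets one function accept state, county, or longer FIPS codes, and it is also why numeric input fails: 1 becomes the string "1", not "01". The sketch below illustrates both points with a hypothetical four-entry table and helper; the package's own lookup covers every state plus DC and Puerto Rico.

```r
# Hypothetical four-entry lookup keyed by two-digit state FIPS prefix.
state_by_prefix <- c("01" = "AL", "11" = "DC", "42" = "PA", "72" = "PR")

fips_prefix_to_abbr <- function(code) {
  # Codes are compared as strings, so only the first two characters matter.
  unname(state_by_prefix[substr(as.character(code), 1, 2)])
}

fips_prefix_to_abbr(c("42000", "72", "11001"))  # "PA" "PR" "DC"
fips_prefix_to_abbr(1)                          # NA: "1" is not "01"
fips_prefix_to_abbr(sprintf("%02d", 1L))        # "AL" after zero-padding
```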
The data returned from covidcast_signal() or covidcast_signals() can, if
called with the issues argument, contain multiple issues for a single
observation in a single location. These functions filter the data frame to
contain only the earliest issue or only the latest issue.
latest_issue(df)
earliest_issue(df)
df |
A covidcast_signal data frame. |
A data frame in the same form, but with only the earliest or latest issue of every observation. Note that these functions sort the data frame as part of their filtering, so the output data frame rows may be in a different order.
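The sort-then-deduplicate idea behind such filtering can be shown in a few lines of base R, and it explains why the output row order may change. The sketch below uses a hypothetical `keep_issue()` helper and an invented revision table; it is an illustration, not the package's implementation of latest_issue() or earliest_issue().

```r
# Invented revision table: three issues for "pa", one for "nj".
revisions <- data.frame(
  geo_value  = c("pa", "pa", "pa", "nj"),
  time_value = as.Date("2020-06-03"),
  issue      = as.Date(c("2020-06-08", "2020-06-05", "2020-06-10",
                         "2020-06-04")),
  value      = c(104, 100, 105, 50)
)

keep_issue <- function(df, which = c("latest", "earliest")) {
  which <- match.arg(which)
  # Sort so each observation's issues are contiguous and in date order.
  df <- df[order(df$geo_value, df$time_value, df$issue), ]
  key <- df[, c("geo_value", "time_value")]
  # duplicated() marks all but the first row per key; fromLast flips that.
  df[!duplicated(key, fromLast = (which == "latest")), ]
}

keep_issue(revisions, "latest")$value    # 50 105 (nj sorts before pa)
keep_issue(revisions, "earliest")$value  # 50 100
```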
Data set on metropolitan area populations, from the 2019 US Census. This includes metropolitan and micropolitan statistical areas, although the COVIDcast API only supports fetching data from metropolitan statistical areas.
msa_census
A data frame with 2797 rows, each representing one core-based statistical area (including metropolitan and micropolitan statistical areas, county or county equivalents, and metropolitan divisions). There are many columns. The most crucial are:
Core Based Statistical Area code. These are unique identifiers used, for
example, as the geo_values argument to covidcast_signal() when requesting data
from specific metro areas (with geo_type = 'msa').
Metropolitan Division code
State and county code
Name or title of the area.
Legal/Statistical Area Description, identifying if this is a metropolitan or micropolitan area, a metropolitan division, or a county.
State FIPS code.
Estimate of the area's resident population as of July 1, 2019.
United States Census Bureau, at https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/metro/totals/cbsa-est2019-alldata.csv
Census Bureau documentation of all columns and their meaning: https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/metro/totals/cbsa-est2019-alldata.pdf
cbsa_to_name(), name_to_cbsa()
Look up state abbreviations by state names (including District of Columbia
and Puerto Rico); this function is based on grep(), and hence allows for
regular expressions.
name_to_abbr( name, ignore.case = FALSE, perl = FALSE, fixed = FALSE, ties_method = c("first", "all") )
name |
Vector of state names to look up. |
ignore.case, perl, fixed |
Arguments to pass to grep(). |
ties_method |
If "first", then only the first match for each name is returned. If "all", then all matches for each name are returned. |
A vector of state abbreviations if ties_method equals "first", and a list of
state abbreviations otherwise.
name_to_abbr("Penn")
name_to_abbr(c("Penn", "New"), ties_method = "all")
Look up FIPS or CBSA codes by county or metropolitan area names,
respectively; these functions are based on grep(), and hence allow for
regular expressions.
name_to_fips(
  name,
  ignore.case = FALSE,
  perl = FALSE,
  fixed = FALSE,
  ties_method = c("first", "all"),
  state = NULL
)

name_to_cbsa(
  name,
  ignore.case = FALSE,
  perl = FALSE,
  fixed = FALSE,
  ties_method = c("first", "all"),
  state = NULL
)
name |
Vector of county or metropolitan area names to look up. |
ignore.case, perl, fixed |
Arguments to pass to grep(). |
ties_method |
If "first", then only the first match for each name is returned. If "all", then all matches for each name are returned. |
state |
Two-letter state abbreviation (case insensitive) indicating a
parent state used to restrict the search. For example, when |
A vector of FIPS or CBSA codes if ties_method equals "first", and a list of
FIPS or CBSA codes otherwise.
state_fips_to_name(), cbsa_to_name()
name_to_fips("Allegheny")
name_to_cbsa("Pittsburgh")
name_to_fips("Miami")
name_to_fips("Miami", ties_method = "all")
name_to_fips(c("Allegheny", "Miami", "New "), ties_method = "all")
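The state argument narrows the candidate rows before the regular expression is applied, which is the easiest way to disambiguate a name like "Miami" that occurs in several states. The sketch below illustrates that ordering with a small hand-made table and a hypothetical `lookup_county()` helper; it is not the package's implementation and returns every match rather than applying ties_method.

```r
# Small hand-made table of county names, states, and FIPS codes.
counties <- data.frame(
  name  = c("Miami-Dade County", "Miami County", "Miami County",
            "Allegheny County"),
  state = c("FL", "IN", "OH", "PA"),
  fips  = c("12086", "18103", "39109", "42003")
)

# Hypothetical helper: optional state filter first, then a grep() search.
lookup_county <- function(pattern, state = NULL) {
  pool <- counties
  if (!is.null(state)) {
    # Case-insensitive state match, applied before the regex search.
    pool <- pool[pool$state == toupper(state), ]
  }
  pool$fips[grep(pattern, pool$name)]
}

lookup_county("Miami")                # all three Miami matches
lookup_county("Miami", state = "oh")  # "39109"
lookup_county("^Miami County$")       # the Indiana and Ohio counties
```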
Plot covidcast_signal object as choropleths, bubbles, or time series
Several plot types are provided, including choropleth plots (maps), bubble
plots, and time series plots showing the change of signals over time, for a
data frame returned by covidcast_signal(). (Only the latest issue from the
data frame is used for plotting.) See
vignette("plotting-signals", package = "covidcast") for examples.
## S3 method for class 'covidcast_signal'
plot(
  x,
  plot_type = c("choro", "bubble", "line"),
  time_value = NULL,
  include = c(),
  range = NULL,
  choro_col = c("#FFFFCC", "#FD893C", "#800026"),
  alpha = 0.5,
  bubble_col = "purple",
  num_bins = 8,
  title = NULL,
  choro_params = list(),
  bubble_params = list(),
  line_params = list(),
  ...
)
x |
The |
plot_type |
One of "choro", "bubble", "line" indicating whether to plot a choropleth map, bubble map, or line (time series) graph, respectively. The default is "choro". |
time_value |
Date object (or string in the form "YYYY-MM-DD") specifying
the day to map, for choropleth and bubble maps. If |
include |
Vector of state abbreviations (case insensitive, so "pa" and
"PA" both denote Pennsylvania) indicating which states to include in the
choropleth and bubble maps. Default is |
range |
Vector of two values: min and max, in this order, to use when
defining the color scale for choropleth maps and the size scale for bubble
maps, or the range of the y-axis for the time series plot. If |
choro_col |
Vector of colors, specified as hex codes, to use for the choropleth color scale. Can be of arbitrary length. Default is similar to that from https://delphi.cmu.edu/covidcast/. |
alpha |
Number between 0 and 1, indicating the transparency level to be used in the maps. For choropleth maps, this determines the transparency level for the mega counties. For bubble maps, this determines the transparency level for the bubbles. Default is 0.5. |
bubble_col |
Bubble color for the bubble map. Default is "purple". |
num_bins |
Number of bins for determining the bubble sizes for the
bubble map (here and throughout, to be precise, by bubble size we mean
bubble area). Default is 8. These bins are evenly spaced between the min
and max as specified through the |
title |
Title for the plot. If |
choro_params , bubble_params , line_params
|
Additional parameter lists for the different plot types, for further customization. See details below. |
... |
Additional arguments, for compatibility with |
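Because bubble size here means bubble area, a bubble whose size value is twice as large should have sqrt(2) times the radius, not twice the radius. The base-R sketch below shows that relationship, together with one plausible way to place num_bins evenly spaced bins between a min and a max. Both helpers are hypothetical, and using num_bins + 1 edges is an assumption, not a description of the package's internal binning.

```r
# Hypothetical helper: radius of a circle with the given area
area_to_radius <- function(area) sqrt(area / pi)

# Hypothetical helper (assumption: num_bins bins are bounded by num_bins + 1 edges)
bin_edges <- function(range, num_bins = 8) {
  seq(range[1], range[2], length.out = num_bins + 1)
}

edges <- bin_edges(c(0, 100), num_bins = 8)  # 0, 12.5, 25, ..., 100
# Assign each value to a bin 1..8; a value equal to the max goes in the top bin
bins <- findInterval(c(3, 50, 100), edges, rightmost.closed = TRUE)
# Doubling the area scales the radius by sqrt(2)
area_to_radius(2 * pi) / area_to_radius(pi)
```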
The following named arguments are supported through the lists choro_params, bubble_params, and line_params.
For both choropleth and bubble maps:
subtitle
Subtitle for the map.
missing_col
Color assigned to missing or NA geo locations.
border_col
Border color for geo locations.
border_size
Border size for geo locations.
legend_position
Position for legend; use "none" to hide legend.
legend_height, legend_width
Height and width of the legend.
breaks
Breaks for a custom (discrete) color or size scale. Note that breaks must be a vector of the same length as choro_col for choropleth maps. This works as follows: we assign the i-th color for choropleth maps, or the i-th size for bubble maps, if and only if the given value satisfies breaks[i] <= value < breaks[i+1], where by convention breaks[0] = -Inf and breaks[N+1] = Inf for N = length(breaks).
legend_digits
Number of decimal places to show for the legend labels.
For choropleth maps only:
legend_n
Number of values to label on the legend color bar. Ignored for discrete color scales (when breaks is set manually).
For bubble maps only:
remove_zero
Should zeros be excluded from the size scale (hence effectively drawn as bubbles of zero size)?
min_size, max_size
Minimum and maximum sizes for the size scale.
For line graphs:
xlab, ylab
Labels for the x-axis and y-axis.
stderr_bands
Should standard error bands be drawn?
stderr_alpha
Transparency level for the standard error bands.
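The breaks rule (index i is used when breaks[i] <= value < breaks[i+1], with breaks[0] = -Inf and breaks[N+1] = Inf) is the interval convention of base R's findInterval(), so it can be checked directly. This sketch only illustrates the rule; it is not taken from the package source, and the breaks and values are made up.

```r
# Made-up breaks and values for illustration
breaks <- c(0, 10, 20)
values <- c(-1, 0, 5, 10, 25)

# findInterval() returns i such that breaks[i] <= value < breaks[i+1],
# with breaks[0] = -Inf and breaks[N+1] = Inf, matching the rule above.
idx <- findInterval(values, breaks)
idx
# An index of 0 means the value lies below breaks[1]
```

Note that a value exactly equal to a break falls into the interval that starts at that break (10 maps to index 2 here), because each interval is closed on the left and open on the right.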
A ggplot object that can be customized and styled using standard ggplot2 functions.
Print covidcast_meta object
Prints a brief summary of the metadata, and then prints the underlying data frame, for an object returned by covidcast_meta().
## S3 method for class 'covidcast_meta'
print(x, ...)
x |
The |
... |
Additional arguments passed to |
The covidcast_meta object, unchanged.
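Returning the object unchanged follows the usual convention for S3 print methods in R: they return their argument invisibly, so print(x) can be used inside other code without the value being printed a second time. The sketch below shows that pattern for a hypothetical class my_meta; it is not the package's source code.

```r
# Hypothetical S3 class built on a data frame
meta <- structure(
  data.frame(data_source = c("a", "b"), signal = c("s1", "s2")),
  class = c("my_meta", "data.frame")
)

print.my_meta <- function(x, ...) {
  # Brief summary first, then the underlying data frame
  cat("Metadata for", nrow(x), "signals\n\n")
  NextMethod()   # dispatches to print.data.frame(x, ...)
  invisible(x)   # return the object unchanged, without auto-printing it
}

print(meta)
```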
Print covidcast_signal object
Prints a brief summary of the data source, signal, and geographic level, and then prints the underlying data frame, for an object returned by covidcast_signal().
## S3 method for class 'covidcast_signal'
print(x, ...)
x |
The |
... |
Additional arguments passed to |
The covidcast_signal object, unchanged.
Data set on state populations, from the US Census Bureau's 2019 population estimates.
state_census
Data frame with 53 rows (including one for the United States as a whole, plus the District of Columbia and the Commonwealth of Puerto Rico). Important columns:
Geographic summary level.
Census Region code
Census Division code
State FIPS code
Name of the state
Estimate of the state's resident population in 2019.
Estimate of the state's resident population in 2019 that is over 18 years old.
Estimate of the percent of a state's resident population in 2019 that is over 18.
Postal abbreviation of the state
United States Census Bureau, at https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.pdf, https://www.census.gov/data/tables/time-series/demo/popest/2010s-total-puerto-rico-municipios.html, and https://www.census.gov/data/tables/2010/dec/2010-island-areas.html
abbr_to_name(), name_to_abbr(), abbr_to_fips(), fips_to_abbr()
Look up state, county, or metropolitan area names by FIPS or CBSA codes. FIPS lookups use only the first 2 digits (state) or the first 5 digits (county), so these functions can also be called with longer FIPS codes.
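Because only the leading digits are used, a longer code can simply be truncated before the lookup. The base-R sketch below illustrates this prefix rule with a hypothetical two-entry state table and a one-entry county table; the real functions use the package's own data, and the helper names here are made up.

```r
# Hypothetical lookup tables for illustration only
state_names  <- c("42" = "Pennsylvania", "36" = "New York")
county_names <- c("42003" = "Allegheny County")

# Hypothetical helpers: keep the first 2 (state) or 5 (county) digits, then look up
lookup_state  <- function(code) unname(state_names[substr(code, 1, 2)])
lookup_county <- function(code) unname(county_names[substr(code, 1, 5)])

lookup_state(c("42", "42003", "36061"))  # county codes resolve to their state
lookup_county("42003")
```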
state_fips_to_name(code)
county_fips_to_name(code)
cbsa_to_name(code)
code |
Vector of FIPS or CBSA codes to look up. |
A vector of state, county or metro names.
name_to_fips(), name_to_cbsa()
state_fips_to_name("42")
state_fips_to_name("42003") # same as previous
county_fips_to_name("42003")
county_fips_to_name("42000") # the county "000" returns the state name
cbsa_to_name("38300")
Summarize covidcast_meta object
Prints a tabular summary of the object returned by covidcast_meta(), listing each source and signal along with the geographic levels at which it is available.
## S3 method for class 'covidcast_meta'
summary(object, ...)
object |
The |
... |
Additional arguments, for compatibility with |
A data frame with one row per unique signal in the metadata, with the following columns:
data_source |
Data source name |
signal |
Signal name |
county |
"*" if this signal is available at the county level, |
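msa |
As for county, but for the metropolitan statistical area (MSA) level. |
dma |
As for county, but for the designated market area (DMA) level. |
hrr |
As for county, but for the hospital referral region (HRR) level. |
state |
As for county, but for the state level. |
hhs |
As for county, but for the HHS region level. |
nation |
As for county, but for the national level. |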
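One way to build such an availability table from per-level metadata rows is sketched below in base R. It assumes, hypothetically, that the input has one row per data source, signal, and geographic level, with a geo_type column naming the level; the toy data and the helper summarize_levels() are made up, and this is not the package's implementation.

```r
# Toy metadata (made up): one row per (data_source, signal, geo_type)
meta <- data.frame(
  data_source = c("src_a", "src_a", "src_b"),
  signal      = c("sig_1", "sig_1", "sig_2"),
  geo_type    = c("county", "state", "state")
)

# Hypothetical helper: one row per unique signal, "*" where a level is available
summarize_levels <- function(meta, geo_levels = c("county", "msa", "dma", "hrr",
                                                  "state", "hhs", "nation")) {
  out <- unique(meta[, c("data_source", "signal")])
  key <- paste(out$data_source, out$signal)
  for (lvl in geo_levels) {
    available <- paste(meta$data_source, meta$signal)[meta$geo_type == lvl]
    out[[lvl]] <- ifelse(key %in% available, "*", "")
  }
  rownames(out) <- NULL
  out
}

summarize_levels(meta)
```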
Summarize covidcast_signal object
Prints a variety of summary statistics about the underlying data, such as median values, the date range included, sample sizes, and so on, for an object returned by covidcast_signal().
## S3 method for class 'covidcast_signal'
summary(object, ...)
object |
The |
... |
Additional arguments, for compatibility with |
No return value; called only to print summary statistics.
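For orientation, the sketch below prints a few statistics of the kind described above (row count, date range, median value, median sample size) for a toy data frame and, like the documented method, is called only for its printed output. The column names time_value, value, and sample_size are assumed to match covidcast_signal() output; the helper summarize_signal() and the data are hypothetical, and the package's own summary may print a different set of statistics.

```r
# Toy data frame; values are made up
df <- data.frame(
  time_value  = as.Date(c("2020-06-01", "2020-06-02", "2020-06-03")),
  value       = c(1, 2, 4),
  sample_size = c(100, NA, 300)
)

# Hypothetical helper: print a few summary statistics, return NULL invisibly
summarize_signal <- function(df) {
  cat("Number of rows:", nrow(df), "\n")
  cat("Date range:", format(min(df$time_value)), "to",
      format(max(df$time_value)), "\n")
  cat("Median value:", median(df$value, na.rm = TRUE), "\n")
  cat("Median sample size:", median(df$sample_size, na.rm = TRUE), "\n")
  invisible(NULL)
}

summarize_signal(df)
```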