Squashing bugs before the holidays.
r2dii.data 0.1.6, r2dii.match 0.0.7 and r2dii.analysis 0.1.3 are now on CRAN. These releases fix a number of bugs. All changes are listed in the changelog on the website of each package (r2dii.data, r2dii.match, r2dii.analysis); this post highlights the major bug fixes and feature additions.
You can update or install these packages from CRAN using:
install.packages("r2dii.data")
install.packages("r2dii.match")
install.packages("r2dii.analysis")
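To confirm that an update picked up the versions announced here, one option is to compare the installed versions against these minimums. The following is a minimal sketch using only base R; the helper is_up_to_date() is a hypothetical name introduced for illustration and is not part of any r2dii package.

```r
# Minimum versions announced in this post
required <- c(
  r2dii.data = "0.1.6",
  r2dii.match = "0.0.7",
  r2dii.analysis = "0.1.3"
)

# Hypothetical helper: TRUE if `pkg` is installed at version `minimum` or newer
is_up_to_date <- function(pkg, minimum) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= package_version(minimum)
}

# One logical value per package, named after the package
status <- vapply(
  names(required),
  function(pkg) is_up_to_date(pkg, required[[pkg]]),
  logical(1)
)
status
```

Any package reported as FALSE is either missing or older than the release described here, and can be reinstalled with install.packages().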
To use r2dii packages, load them into your active R session with library(). This example also uses a few other packages, which you may load now.
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)
library(r2dii.data)
library(r2dii.match)
library(r2dii.analysis)
Most changes to r2dii.data are updates to the internal region and classification files. In particular, the cnb_classification dataset was added to facilitate the sector classification of Nigerian banks:
cnb_classification %>%
filter(sector == "aviation") # show some illustrative results for aviation
#> # A tibble: 2 x 5
#> original_code code_level code sector borderline
#> <chr> <dbl> <dbl> <chr> <lgl>
#> 1 TRANSPORTATION AND STORAGE 1 1800 aviation TRUE
#> 2 AIR TRANSPORT 2 1804 aviation FALSE
The country code for Kosovo was also updated to the correct value of xk:
filter(iso_codes, country == "kosovo")
#> # A tibble: 1 x 2
#> country country_iso
#> <chr> <chr>
#> 1 kosovo xk
The only user-facing change to r2dii.match is that match_name() gained the argument ... to pass arguments to stringdist::stringsim(). Most users won’t need this feature:
matched <- match_name(loanbook_demo, ald_demo)
matched %>%
filter(score != 1) %>%
select(name, name_ald, score)
#> # A tibble: 30 x 3
#> name name_ald score
#> <chr> <chr> <dbl>
#> 1 Yuama Inc. ykk usa, inc. 0.843
#> 2 Yukon Energy Corp 1736 yukon development corp 0.813
#> 3 Yuanbsaoshan Power Generati… aba hydropower generation co ltd 0.809
#> 4 Yuanbsaoshan Power Generati… yiyang baoyuan power generation… 0.849
#> 5 Bhushan Energy Ltd. bhagwan energy ltd. 0.906
#> 6 York Research Corporation yorkshire windpower limited 0.810
#> 7 York Research Corporation yorkshire water services ltd 0.815
#> 8 York Research Corporation york cogeneration corporation 0.832
#> 9 Yolo Regional District yolo county 0.81
#> 10 Yit Corporation international finance corp 0.804
#> # … with 20 more rows
However, advanced users can now set many of the parameters of the methods available in stringdist::stringsim(). For example, the q parameter of the qgram method can now be tweaked directly:
matched_qgram <- match_name(
loanbook_demo,
ald_demo,
method = "qgram",
q = 1.5
)
matched_qgram %>%
filter(score != 1) %>%
select(name, name_ald, score)
#> # A tibble: 11 x 3
#> name name_ald score
#> <chr> <chr> <dbl>
#> 1 Yuamen Changyuan Hydropower Co.… yiyang baoyuan power genera… 0.825
#> 2 China Electric Power (Fujian) D… yingjiang mingyu electric p… 0.826
#> 3 China Electric Power (Fujian) D… nanchong electric power dev… 0.855
#> 4 Yuexi County AAAA Xingguang Ele… yingjiang mengyuan electric… 0.838
#> 5 Yuanbsaoshan Power Generation C… aba hydropower generation c… 0.830
#> 6 Yuanbsaoshan Power Generation C… yiyang baoyuan power genera… 0.865
#> 7 Bhushan Energy Ltd. bhagwan energy ltd. 0.818
#> 8 Cementos San Juan cimento nassau 0.852
#> 9 Qujing Cement quang tri cement 0.84
#> 10 Naphtha Israel Petroleum Corp L… japan petroleum exploration… 0.85
#> 11 Kinder Morgan Inc/De diamondback energy inc 0.838
For more details see ?stringdist::stringsim.
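Because the extra arguments are forwarded to stringdist::stringsim(), it can help to try a parameter on a single pair of names before running a full match. The sketch below calls stringdist::stringsim() directly on two names taken from the output above and lower-cased by hand. It assumes the stringdist package is installed, and it does not reproduce the name cleaning that match_name() applies, so the scores need not equal those in the tables.

```r
library(stringdist)

a <- "bhushan energy ltd."
b <- "bhagwan energy ltd."

# Jaro-Winkler similarity; `p` is the Winkler prefix factor
score_jw <- stringsim(a, b, method = "jw", p = 0.1)

# q-gram similarity with single characters (q = 1) and character pairs (q = 2)
score_q1 <- stringsim(a, b, method = "qgram", q = 1)
score_q2 <- stringsim(a, b, method = "qgram", q = 2)

scores <- c(jw = score_jw, qgram_q1 = score_q1, qgram_q2 = score_q2)
scores
```

Each score lies between 0 and 1, with 1 meaning the two strings are identical under the chosen method.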
There are a few major fixes in this release of r2dii.analysis, in particular to the functions target_market_share() and target_sda().
To explain those bug fixes, let’s first calculate some more demo data frames:
validated_matched <- matched %>%
prioritize()
# loan weighted portfolio-level results
portfolio_market_share_targets <- target_market_share(
validated_matched,
ald_demo,
scenario_demo_2020,
region_isos_demo
)
# unweighted company-level results
company_market_share_targets <- target_market_share(
validated_matched,
ald_demo,
scenario_demo_2020,
region_isos_demo,
by_company = TRUE,
weight_production = FALSE
)
target_market_share() outputs a data frame with the column technology_share. This value is expected to always sum to 1 when aggregated across appropriate groupings (e.g. by sector, region, metric). Before, this was true only for results at company level, but not at portfolio level. This is now fixed, and technology_share sums to 1 at both levels (at least to 10 significant digits; further digits may be off due to rounding errors):
portfolio_sum <- portfolio_market_share_targets %>%
group_by(sector, metric, region, year) %>%
summarize(share_sum = sum(technology_share), .groups = "drop")
# true for all, to 10 significant digits
all(round(portfolio_sum$share_sum, 10) == 1)
#> [1] TRUE
company_sum <- company_market_share_targets %>%
group_by(sector, metric, region, year, name_ald) %>%
summarize(share_sum = sum(technology_share), .groups = "drop")
# true for all, to 10 significant digits
all(round(company_sum$share_sum, 10) == 1)
#> [1] TRUE
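The rounding in the checks above is needed because a floating-point sum may not be exactly equal to 1. As an alternative sketch, the same kind of check can compare against an explicit tolerance. The data frame below is a toy example with made-up shares, not output of target_market_share(), and sums_to_one() is a hypothetical helper introduced only for this illustration.

```r
# Exact comparison of doubles can fail even when the arithmetic is right
0.1 + 0.2 == 0.3
#> [1] FALSE

# Toy shares (made-up values for illustration)
toy <- data.frame(
  sector = c("power", "power", "power", "automotive", "automotive"),
  technology_share = c(0.1, 0.2, 0.7, 0.25, 0.75)
)

# Sum the shares within each sector
share_sum <- tapply(toy$technology_share, toy$sector, sum)

# Hypothetical helper: TRUE where `x` is within `tolerance` of 1
sums_to_one <- function(x, tolerance = 1e-10) {
  abs(x - 1) < tolerance
}

all(sums_to_one(share_sum))
#> [1] TRUE
```

An explicit tolerance makes the accepted error visible in the code, whereas round() hides it in the number of digits.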
The function target_sda() calculates a target pathway for emission_factor by computing a convergence target to the year 2050, as detailed in this article. A bug in the function caused this target to be calculated to too early a year: the last year present in the input ald file.
The convergence targets are now correctly calculated to the final year present in the input scenario:
matched <- tibble::tribble(
~id_loan, ~loan_size_outstanding, ~loan_size_outstanding_currency, ~loan_size_credit_limit, ~loan_size_credit_limit_currency, ~id_2dii, ~level, ~score, ~sector, ~name_ald, ~sector_ald,
"L1", 1, "EUR", 2, "EUR", "UP1", "ultimate_parent", 1, "steel", "company a", "steel"
)
# an ALD file with values only between 2020 and 2030
ald <- tibble::tribble(
~name_company, ~sector, ~technology, ~year, ~production, ~emission_factor, ~plant_location, ~is_ultimate_owner,
"company a", "steel", "steel", 2020, 1, 1.5, "DE", TRUE,
"company a", "steel", "steel", 2025, 1, 1.5, "DE", TRUE,
"company a", "steel", "steel", 2030, 1, 1.5, "DE", TRUE,
"company b", "steel", "steel", 2020, 1, 2.5, "DE", TRUE,
"company b", "steel", "steel", 2025, 1, 2.5, "DE", TRUE,
"company b", "steel", "steel", 2030, 1, 2.5, "DE", TRUE
)
# a scenario file with targets at 2050
co2_scenario <- tibble::tribble(
~scenario_source, ~scenario, ~sector, ~region, ~year, ~emission_factor, ~emission_factor_unit,
"etp_2017", "b2ds", "steel", "global", 2020, 2, "tonnes of CO2 per tonne of steel",
"etp_2017", "b2ds", "steel", "global", 2025, 1.9, "tonnes of CO2 per tonne of steel",
"etp_2017", "b2ds", "steel", "global", 2030, 1.8, "tonnes of CO2 per tonne of steel",
"etp_2017", "b2ds", "steel", "global", 2050, 0.25, "tonnes of CO2 per tonne of steel",
)
# SDA portfolio-level results
sda_targets <- target_sda(
matched,
ald,
co2_scenario
)
ggplot(
data = sda_targets,
mapping = aes(
x = year,
y = emission_factor_value,
color = emission_factor_metric
)
) +
geom_line() +
facet_wrap(~sector)
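The fix can be summarized by comparing the last year of each input. This sketch only repeats the years from the ald and co2_scenario examples above as plain vectors, so it runs without the r2dii packages. It illustrates which horizon is used, not how target_sda() computes the pathway.

```r
# Years present in the example inputs above
ald_years <- c(2020, 2025, 2030)
scenario_years <- c(2020, 2025, 2030, 2050)

# Horizon used before the fix: last year in the ald data
max(ald_years)
#> [1] 2030

# Horizon used after the fix: last year in the scenario
max(scenario_years)
#> [1] 2050
```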
While this release includes commits from only a few of us (jdhoffa, maurolepore), it is thanks to feedback from our colleagues and users.
For attribution, please cite this work as
Hoffart (2020, Dec. 7). Data science at 2DII: r2dii.data 0.1.6, r2dii.match 0.0.7 and r2dii.analysis 0.1.3 are now on CRAN. Retrieved from https://2degreesinvesting.github.io/posts/2020-12-07-r2dii-analysis-0-1-3-is-now-on-cran/
BibTeX citation
@misc{hoffart2020r2dii.data, author = {Hoffart, Jackson}, title = {Data science at 2DII: r2dii.data 0.1.6, r2dii.match 0.0.7 and r2dii.analysis 0.1.3 are now on CRAN}, url = {https://2degreesinvesting.github.io/posts/2020-12-07-r2dii-analysis-0-1-3-is-now-on-cran/}, year = {2020} }