R packages to implement the core functionality of the Paris Agreement Capital Transition Assessment (PACTA)
Adapted from https://jdhoffa.github.io/blog/r2dii-suite-is-now-on-cran/.
The core suite of r2dii packages (r2dii.data, r2dii.match and r2dii.analysis) is now published on CRAN!
The r2dii suite was developed to implement the core functionality of the Paris Agreement Capital Transition Assessment (PACTA) in R. PACTA is a methodology that allows financial institutions to aggregate climate-related data associated with their portfolios and compare the results against climate scenarios.
You can easily install all three packages directly from CRAN using:
r2dii <- c("r2dii.data", "r2dii.match", "r2dii.analysis")
install.packages(r2dii)
To get the latest development version of any one package, you can install it from GitHub. For example, you can install r2dii.data from GitHub using:
devtools::install_github("2DegreesInvesting/r2dii.data")
Warning: Development versions are experimental and may be unstable. Proceed with caution.
You can then load each package with library():
library(r2dii.data)
library(r2dii.match)
library(r2dii.analysis)
This example also uses two optional but convenient packages to manipulate and plot data:
library(dplyr)
library(ggplot2)
r2dii.data provides a series of fake datasets. They are mainly meant as a tool for learning how the methodology works and as an easy sandbox for testing. The built-in data_dictionary provides a view of all datasets in the package, along with variable definitions:
data_dictionary %>%
  distinct(dataset)
#> # A tibble: 13 x 1
#> dataset
#> <chr>
#> 1 ald_demo
#> 2 co2_intensity_scenario_demo
#> 3 data_dictionary
#> 4 isic_classification
#> 5 iso_codes
#> 6 loanbook_demo
#> 7 nace_classification
#> 8 naics_classification
#> 9 overwrite_demo
#> 10 region_isos
#> 11 region_isos_demo
#> 12 scenario_demo_2020
#> 13 sector_classifications
Some of the most useful datasets are:
- loanbook_demo: a loanbook dataset that can be used as a template for formatting real loanbook/portfolio data.
- ald_demo: a so-called “asset-level” dataset, containing crucial climate data at the asset level (e.g. power-plant capacity, automotive production, steel company emission factors).
- scenario_demo_2020: a fake climate scenario, offering technology pathways that would likely limit warming to less than 2 degrees Celsius.
r2dii.match provides the tools necessary to match the counterparties in a loanbook to the climate data of the assets they own.
Because many lending portfolios are exposed to small and mid-size companies, which are often not publicly listed, matching data can be tricky. To make these links, we have written wrappers around some common fuzzy matching algorithms that work well with formatted loanbook and ald datasets.
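To give a feel for what a fuzzy match score means, here is a minimal base-R sketch. It is an illustration under stated assumptions, not r2dii.match's implementation: it standardizes names by lowercasing them and stripping punctuation, then scores each pair with a normalized Levenshtein similarity built on utils::adist(). r2dii.match uses its own name standardization and string-similarity metric, so its scores will differ. The helpers simplify_name() and name_similarity() are hypothetical.

```r
# Hypothetical helpers illustrating fuzzy name matching; not part of r2dii.match.

# Standardize a company name: lowercase, replace punctuation with spaces,
# collapse repeated whitespace and trim the ends.
simplify_name <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", " ", x)
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

# Similarity in [0, 1]: 1 minus the Levenshtein distance divided by the
# length of the longer standardized name (1 means identical after cleaning).
name_similarity <- function(a, b) {
  a <- simplify_name(a)
  b <- simplify_name(b)
  # adist() returns all pairwise distances; the diagonal pairs a[i] with b[i].
  d <- diag(utils::adist(a, b))
  1 - d / pmax(nchar(a), nchar(b))
}

loan_names  <- c("Aston Martin", "Dongfeng-Luxgen", "Bhushan Energy Ltd.")
asset_names <- c("aston martin", "dongfeng luxgen", "bhagwan energy ltd.")
round(name_similarity(loan_names, asset_names), 3)
```

Standardizing before scoring matters: without it, differences in case or a hyphen would lower the score for names that are clearly the same company, while genuinely different names (such as the Bhushan/Bhagwan pair) still score below 1 and are left for manual review.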
You can run the matching algorithm on the sample data provided by r2dii.data:
matched <- match_name(loanbook_demo, ald_demo)
matched %>%
  select(name, sector, name_ald, sector_ald, score)
#> # A tibble: 502 x 5
#> name sector name_ald sector_ald score
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 Aston Martin automot… aston martin automotive 1
#> 2 Avtozaz automot… avtozaz automotive 1
#> 3 Bogdan automot… bogdan automotive 1
#> 4 Ch Auto automot… ch auto automotive 1
#> 5 Chehejia automot… chehejia automotive 1
#> 6 Chtc Auto automot… chtc auto automotive 1
#> 7 Dongfeng Honda automot… dongfeng honda automotive 1
#> 8 Dongfeng-Luxgen automot… dongfeng-luxgen automotive 1
#> 9 Electric Mobility S… automot… electric mobility s… automotive 1
#> 10 Faraday Future automot… faraday future automotive 1
#> # … with 492 more rows
After the initial matching, you must manually verify which of the matches you would like to keep. To do so, save matched as a .csv file and open it in Excel or a similar program. To keep a match, set its score to 1; to reject it, leave the score as anything other than 1. These are the candidate matches that still need a decision (score other than 1), sorted from most to least similar:
matched %>%
  filter(score != 1) %>%
  arrange(desc(score)) %>%
  select(name, sector, name_ald, sector_ald, score)
#> # A tibble: 30 x 5
#> name sector name_ald sector_ald score
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 Bhushan Energy Ltd. power bhagwan energy ltd. power 0.906
#> 2 Cementos San Juan cement cemento sur cement 0.900
#> 3 Shandong Qiyin Cem… cement shangfeng cement group cement 0.883
#> 4 Handong Shipbuildi… shippi… han dong shipping 0.856
#> 5 Hanil Express Co L… shippi… han sung line co ltd shipping 0.853
#> 6 Nandi Roller Inc. power nandi roller flour mi… power 0.851
#> 7 Yuanbsaoshan Power… power yiyang baoyuan power … power 0.849
#> 8 Yuama Inc. power ykk usa, inc. power 0.843
#> 9 Hanil Express Co L… shippi… hanison shipping 0.838
#> 10 Hanil Express Co L… shippi… hana marine co ltd shipping 0.835
#> # … with 20 more rows
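The manual review can be sketched in base R as a write, edit, read round trip. This is a hypothetical illustration on a made-up three-row table standing in for matched (only a few of its columns), the spreadsheet edit a reviewer would make is simulated in code, and the file path comes from tempfile().

```r
# Toy stand-in for `matched`: made-up rows, only a few of the real columns.
candidates <- data.frame(
  name     = c("Aston Martin", "Bhushan Energy Ltd.", "Cementos San Juan"),
  name_ald = c("aston martin", "bhagwan energy ltd.", "cemento sur"),
  score    = c(1, 0.906, 0.900),
  stringsAsFactors = FALSE
)

# 1. Export the candidate matches for review.
path <- tempfile(fileext = ".csv")
write.csv(candidates, path, row.names = FALSE)

# 2. A reviewer opens the file in a spreadsheet and sets `score` to 1 for each
#    match they accept. Here we simulate accepting only the Bhushan match.
reviewed <- read.csv(path, stringsAsFactors = FALSE)
reviewed$score[reviewed$name == "Bhushan Energy Ltd."] <- 1
write.csv(reviewed, path, row.names = FALSE)

# 3. Read the validated file back in; only rows with score == 1 are accepted.
validated <- read.csv(path, stringsAsFactors = FALSE)
accepted <- validated[validated$score == 1, ]
accepted
```

The Cementos San Juan candidate keeps its original score of 0.9, so it drops out as a rejected match.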
The next step is to prioritize your matches. This ensures that the same loan isn’t accidentally matched to multiple ald companies.
Read the validated match file back into R (here we simply reuse matched) and run:
match_result <- matched %>%
  prioritize()
match_result %>%
  select(name_direct_loantaker, loan_size_outstanding, sector, name_ald)
#> # A tibble: 267 x 4
#> name_direct_loantaker loan_size_outstan… sector name_ald
#> <chr> <dbl> <chr> <chr>
#> 1 Shaanxi Auto 396377 automot… shaanxi auto
#> 2 Shandong Auto 319353 automot… shandong auto
#> 3 Shandong Kama 258105 automot… shandong kama
#> 4 Shandong Tangjun Ouli… 332345 automot… shandong tangju…
#> 5 Shanghai Automotive I… 203353 automot… shanghai automo…
#> 6 Shanxi Dayun 329561 automot… shanxi dayun
#> 7 Shenyang Polarsun 261817 automot… shenyang polars…
#> 8 Shuanghuan Auto 337913 automot… shuanghuan auto
#> 9 Sichuan Auto 227481 automot… sichuan auto
#> 10 Singulato 334201 automot… singulato
#> # … with 257 more rows
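Conceptually, prioritization keeps, for each loan, a single accepted match (score of 1) at the most preferred matching level. The base-R sketch below illustrates that idea on made-up data. The function prioritize_sketch(), the column names id_loan and level, and the two-level priority order (direct loantaker before ultimate parent) are assumptions for illustration; r2dii.match's prioritize() has its own defaults and options.

```r
# Hypothetical sketch of match prioritization; not r2dii.match's implementation.
prioritize_sketch <- function(matches,
                              priority = c("direct_loantaker", "ultimate_parent")) {
  # Keep only validated matches at a level we know how to rank.
  accepted <- matches[matches$score == 1 & matches$level %in% priority, ]
  # Rank each row by the position of its level in `priority` (1 = best).
  accepted$rank <- match(accepted$level, priority)
  # Sort by loan, then by rank, and keep the first (best) row for each loan.
  accepted <- accepted[order(accepted$id_loan, accepted$rank), ]
  best <- accepted[!duplicated(accepted$id_loan), ]
  best$rank <- NULL
  rownames(best) <- NULL
  best
}

# Made-up matches: loan L1 matched at two levels, L2's direct match was rejected.
matches <- data.frame(
  id_loan  = c("L1", "L1", "L2", "L2", "L3"),
  level    = c("ultimate_parent", "direct_loantaker",
               "direct_loantaker", "ultimate_parent", "direct_loantaker"),
  name_ald = c("parent a", "company a", "company b", "parent b", "company c"),
  score    = c(1, 1, 0.9, 1, 1),
  stringsAsFactors = FALSE
)
prioritize_sketch(matches)
```

Loan L1 keeps its direct-loantaker match, while L2 falls back to its ultimate parent because the direct match was not validated, so each loan ends up with exactly one row.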
The final step is to analyze your matched dataset and compare the results to a climate scenario.
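A common way to aggregate company-level production to the portfolio level is to weight each company's production by its share of the portfolio's loans to that sector. The sketch below shows that weighting on made-up numbers. It is a simplified assumption about what a loan-weighted production figure represents, not r2dii.analysis code, and the data frame and its columns are hypothetical.

```r
# Hypothetical sketch of loan-weighted production; not r2dii.analysis code.
loans <- data.frame(
  company    = c("a", "b", "c"),
  loan_size  = c(300, 100, 100),
  technology = c("coalcap", "renewablescap", "renewablescap"),
  production = c(1000, 400, 600),  # e.g. MW of capacity (made-up values)
  stringsAsFactors = FALSE
)

# Each company's weight is its share of the total loan amount in the sector.
loans$loan_weight <- loans$loan_size / sum(loans$loan_size)

# Weighted production per company, then summed per technology.
loans$weighted_production <- loans$production * loans$loan_weight
weighted <- aggregate(weighted_production ~ technology, data = loans, FUN = sum)
weighted
```

With weights of 0.6, 0.2 and 0.2, the portfolio's weighted coal capacity is 600 and its weighted renewable capacity is 200, even though the renewable companies hold more capacity in total: the larger loan to the coal company dominates.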
You can apply scenario targets to your loanbook data by calling:
portfolio_targets <- match_result %>%
  target_market_share(
    ald_demo,
    scenario_demo_2020,
    region_isos_demo
  )
portfolio_targets
#> # A tibble: 1,170 x 7
#> sector technology year region scenario_source weighted_produc…
#> <chr> <chr> <int> <chr> <chr> <chr>
#> 1 autom… electric 2020 global demo_2020 projected
#> 2 autom… electric 2020 global demo_2020 normalized_corp…
#> 3 autom… electric 2020 global demo_2020 target_cps
#> 4 autom… electric 2020 global demo_2020 target_sds
#> 5 autom… electric 2020 global demo_2020 target_sps
#> 6 autom… hybrid 2020 global demo_2020 projected
#> 7 autom… hybrid 2020 global demo_2020 normalized_corp…
#> 8 autom… hybrid 2020 global demo_2020 target_cps
#> 9 autom… hybrid 2020 global demo_2020 target_sds
#> 10 autom… hybrid 2020 global demo_2020 target_sps
#> # … with 1,160 more rows, and 1 more variable:
#> # weighted_production_value <dbl>
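The target_* rows are scenario-based targets. The PACTA market share approach is usually described with two target formulas: the technology market share ratio (TMSR), which scales the portfolio's starting production by the scenario's relative change for that technology and is typically applied to technologies being phased down, and the sector market share percentage (SMSP), which gives the portfolio a slice of the scenario's build-out in proportion to its starting sector production and is typically applied to growing technologies. The sketch below writes both out with made-up numbers, assuming those standard definitions; the function and argument names are hypothetical and not part of r2dii.analysis.

```r
# Hypothetical helpers; argument names are illustrative only.
# p0: portfolio production of the technology in the start year
# P0: portfolio production of the whole sector in the start year
# s0, st: scenario production of the technology in the start year and in year t
# S0: scenario production of the whole sector in the start year

# Technology market share ratio: scale starting production by the scenario's
# relative change for the technology (typically for declining technologies).
tmsr_target <- function(p0, s0, st) {
  p0 * st / s0
}

# Sector market share percentage: add the scenario's change in the technology,
# expressed as a share of the starting sector size, scaled to the portfolio's
# starting sector production (typically for growing technologies).
smsp_target <- function(p0, P0, s0, st, S0) {
  p0 + P0 * (st - s0) / S0
}

# Made-up example: coal capacity halves in the scenario ...
tmsr_target(p0 = 100, s0 = 50, st = 25)
# ... while renewables grow by 20 units in a 200-unit sector.
smsp_target(p0 = 10, P0 = 100, s0 = 20, st = 40, S0 = 200)
```

Under these assumptions, a portfolio holding 100 units of coal capacity would be expected to halve it to 50, and one with 100 units of sector production would be expected to add 10 units of renewables (from 10 to 20), matching its 50% share of the scenario's 20-unit build-out relative to the sector.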
And you can plot the distribution of technologies using ggplot2:
portfolio_targets %>%
  # different targets can be plotted by setting these filters
  filter(
    sector == "power",
    weighted_production_metric != "normalized_corporate_economy",
    year == max(year)
  ) %>%
  ggplot(aes(
    x = weighted_production_metric,
    y = weighted_production_value,
    fill = technology
  )) +
  geom_col(position = "fill") +
  labs(
    x = "Metric",
    y = "Weighted Capacity [%]"
  )
For attribution, please cite this work as
Hoffart (2020, June 30). Data science at 2DII: r2dii suite is now on CRAN. Retrieved from https://2degreesinvesting.github.io/posts/2020-06-30-r2dii-suite-is-now-on-cran/
BibTeX citation
@misc{hoffart2020r2dii,
  author = {Hoffart, Jackson},
  title = {Data science at 2DII: r2dii suite is now on CRAN},
  url = {https://2degreesinvesting.github.io/posts/2020-06-30-r2dii-suite-is-now-on-cran/},
  year = {2020}
}