Data science at 2DII: r2dii suite is now on CRAN

Jackson Hoffart

Adapted from https://jdhoffa.github.io/blog/r2dii-suite-is-now-on-cran/.

The core suite of r2dii packages (r2dii.data, r2dii.match and r2dii.analysis) is now published on CRAN!

The r2dii suite was developed to implement the core functionality of the Paris Agreement Capital Transition Assessment (PACTA) in R. PACTA is a methodology that allows financial institutions to aggregate climate-related data associated with their portfolios and compare the result against climate scenarios.

Installation

You can easily install all three packages directly from CRAN using:


r2dii <- c("r2dii.data", "r2dii.match", "r2dii.analysis")
install.packages(r2dii)
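
If some of these packages are already installed, you may prefer to install only the missing ones. The following is a minimal sketch in base R; missing_packages() is a hypothetical helper written for this example and is not part of any r2dii package:

```r
# Hypothetical helper (not part of r2dii): return the packages in `pkgs`
# that do not appear in `installed`.
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

r2dii <- c("r2dii.data", "r2dii.match", "r2dii.analysis")
to_install <- missing_packages(r2dii)

# Install only in an interactive session, and only when something is missing
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

The interactive() guard is a design choice for this sketch, so that sourcing the script in a batch job never triggers an installation.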

To get the latest development version of any one package, you can install it from GitHub. For example, you can install r2dii.data from GitHub using:


devtools::install_github("2DegreesInvesting/r2dii.data")

Warning: Development versions are experimental and may be unstable. Proceed with caution.

You can then use each package with library().


library(r2dii.data)
library(r2dii.match)
library(r2dii.analysis)

This example also uses two optional but convenient packages to manipulate and plot data:


library(dplyr)
library(ggplot2)

r2dii.data

r2dii.data provides a series of fake datasets. These are mainly meant to be used as a tool to learn how the methodology works, and also to provide an easy sandbox for testing. The built-in data_dictionary provides a view of all datasets in the package, along with variable definitions:


data_dictionary %>%
  distinct(dataset)
#> # A tibble: 13 x 1
#>    dataset                    
#>    <chr>                      
#>  1 ald_demo                   
#>  2 co2_intensity_scenario_demo
#>  3 data_dictionary            
#>  4 isic_classification        
#>  5 iso_codes                  
#>  6 loanbook_demo              
#>  7 nace_classification        
#>  8 naics_classification       
#>  9 overwrite_demo             
#> 10 region_isos                
#> 11 region_isos_demo           
#> 12 scenario_demo_2020         
#> 13 sector_classifications

Some of the most useful datasets are:

loanbook_demo: a loanbook dataset which can be used as a template to format real loanbook/portfolio data.
ald_demo: a so-called “asset-level” dataset, containing crucial climate data at the asset level (e.g. power-plant capacity data, automotive production, steel company emission factors).
scenario_demo_2020: a fake climate scenario, offering technology pathways that would likely limit warming to less than 2 degrees Celsius.
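
Before formatting your own data against one of these templates, it can help to look up its variable definitions in data_dictionary. The sketch below assumes that data_dictionary also has columns named column, typeof and definition. Only dataset is shown above, so treat the other names as assumptions and check names(data_dictionary) first:

```r
library(r2dii.data)
library(dplyr)

# List the columns that are actually available in the dictionary
names(data_dictionary)

# Assumption: the dictionary has `column`, `typeof` and `definition` columns
data_dictionary %>%
  filter(dataset == "loanbook_demo") %>%
  select(column, typeof, definition)
```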

r2dii.match

r2dii.match provides the tools necessary to match the counterparties in a loanbook to the climate data of the assets that they own.

Many lending portfolios are exposed to small and mid-sized companies, which are often not publicly listed, so matching data can be tricky. To make these links, we have written wrappers around some common fuzzy matching algorithms, which play nicely with formatted loanbook and ald datasets.
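
To give a feel for what a fuzzy matching score is, the sketch below turns an edit distance into a similarity between 0 and 1, using only base R. It illustrates the general idea and is not the algorithm that r2dii.match uses; the helper name_similarity() and its lowercase normalization are assumptions made for this example:

```r
# Hypothetical helper: similarity in [0, 1] based on Levenshtein distance.
# Works on one pair of names at a time; 1 means the names are identical
# after trimming whitespace and lowercasing.
name_similarity <- function(x, y) {
  x <- tolower(trimws(x))
  y <- tolower(trimws(y))
  # adist() returns a matrix of edit distances; take the single entry
  distance <- utils::adist(x, y)[1, 1]
  longest <- max(nchar(x), nchar(y))
  if (longest == 0) return(1)
  1 - distance / longest
}

# Identical after normalization, so the similarity is 1
name_similarity("Aston Martin", "aston martin")

# Similar but not identical, so the similarity falls below 1
name_similarity("Cementos San Juan", "cemento sur")
```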

You can run the matching algorithm on sample data provided by r2dii.data:


matched <- match_name(loanbook_demo, ald_demo)

matched %>%
  select(name, sector, name_ald, sector_ald, score)
#> # A tibble: 502 x 5
#>    name                 sector   name_ald             sector_ald score
#>    <chr>                <chr>    <chr>                <chr>      <dbl>
#>  1 Aston Martin         automot… aston martin         automotive     1
#>  2 Avtozaz              automot… avtozaz              automotive     1
#>  3 Bogdan               automot… bogdan               automotive     1
#>  4 Ch Auto              automot… ch auto              automotive     1
#>  5 Chehejia             automot… chehejia             automotive     1
#>  6 Chtc Auto            automot… chtc auto            automotive     1
#>  7 Dongfeng Honda       automot… dongfeng honda       automotive     1
#>  8 Dongfeng-Luxgen      automot… dongfeng-luxgen      automotive     1
#>  9 Electric Mobility S… automot… electric mobility s… automotive     1
#> 10 Faraday Future       automot… faraday future       automotive     1
#> # … with 492 more rows

After the initial matching, you must manually verify which matches you would like to keep. To do so, save matched as a .csv file and open it in Excel or a similar tool. To keep a match, set its score to 1. To reject a match, leave the score as anything but 1. The matches that are not yet scored 1, and so need a decision, can be listed with:


matched %>%
  filter(score != 1) %>%
  arrange(desc(score)) %>%
  select(name, sector, name_ald, sector_ald, score)
#> # A tibble: 30 x 5
#>    name                sector  name_ald               sector_ald score
#>    <chr>               <chr>   <chr>                  <chr>      <dbl>
#>  1 Bhushan Energy Ltd. power   bhagwan energy ltd.    power      0.906
#>  2 Cementos San Juan   cement  cemento sur            cement     0.900
#>  3 Shandong Qiyin Cem… cement  shangfeng cement group cement     0.883
#>  4 Handong Shipbuildi… shippi… han dong               shipping   0.856
#>  5 Hanil Express Co L… shippi… han sung line co ltd   shipping   0.853
#>  6 Nandi Roller Inc.   power   nandi roller flour mi… power      0.851
#>  7 Yuanbsaoshan Power… power   yiyang baoyuan power … power      0.849
#>  8 Yuama Inc.          power   ykk usa, inc.          power      0.843
#>  9 Hanil Express Co L… shippi… hanison                shipping   0.838
#> 10 Hanil Express Co L… shippi… hana marine co ltd     shipping   0.835
#> # … with 20 more rows
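
The save, edit and reload round trip for manual validation can be sketched with base R. The toy data frame and the file path are placeholders for this example; in practice you would write out the full matched data frame and edit its score column by hand:

```r
# Toy stand-in for `matched`, with one exact and two fuzzy matches
matched_toy <- data.frame(
  name = c("Aston Martin", "Cementos San Juan", "Yuama Inc."),
  name_ald = c("aston martin", "cemento sur", "ykk usa, inc."),
  score = c(1, 0.900, 0.843)
)

# Placeholder path; use a location of your choice
path <- file.path(tempdir(), "matched.csv")
write.csv(matched_toy, path, row.names = FALSE)

# ... open the file in a spreadsheet, set the score of each match you want
# to keep to 1, and save it ...

# Read the validated file back into R
validated <- read.csv(path, stringsAsFactors = FALSE)
validated
```

Column types can change on a round trip through CSV, so it is worth checking them before the next step.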

The final step is to prioritize your matches. This ensures that the same loan isn’t accidentally matched to multiple ald companies.

Read the validated match file back into R and run prioritize() on it. The demo below passes matched directly:


match_result <- matched %>%
  prioritize()

match_result %>%
  select(name_direct_loantaker, loan_size_outstanding, sector, name_ald)
#> # A tibble: 267 x 4
#>    name_direct_loantaker  loan_size_outstan… sector   name_ald        
#>    <chr>                               <dbl> <chr>    <chr>           
#>  1 Shaanxi Auto                       396377 automot… shaanxi auto    
#>  2 Shandong Auto                      319353 automot… shandong auto   
#>  3 Shandong Kama                      258105 automot… shandong kama   
#>  4 Shandong Tangjun Ouli…             332345 automot… shandong tangju…
#>  5 Shanghai Automotive I…             203353 automot… shanghai automo…
#>  6 Shanxi Dayun                       329561 automot… shanxi dayun    
#>  7 Shenyang Polarsun                  261817 automot… shenyang polars…
#>  8 Shuanghuan Auto                    337913 automot… shuanghuan auto 
#>  9 Sichuan Auto                       227481 automot… sichuan auto    
#> 10 Singulato                          334201 automot… singulato       
#> # … with 257 more rows
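
The idea of keeping a single match per loan can be illustrated on toy data with dplyr. This is a sketch of the concept only, not the implementation of prioritize(); the column names, the level labels and the priority order are all assumptions made for this example:

```r
library(dplyr)

# Toy validated matches: loan "L1" is matched at two levels, loan "L2" at one
validated_toy <- tibble(
  id_loan = c("L1", "L1", "L2"),
  level = c("ultimate_parent", "direct_loantaker", "ultimate_parent"),
  name_ald = c("parent co", "borrower co", "other parent co"),
  score = c(1, 1, 1)
)

# Assumed priority order for this example: earlier levels win
priority <- c("direct_loantaker", "ultimate_parent")

one_match_per_loan <- validated_toy %>%
  filter(score == 1) %>%                        # keep validated matches only
  mutate(rank = match(level, priority)) %>%     # lower rank = higher priority
  group_by(id_loan) %>%
  slice_min(rank, n = 1, with_ties = FALSE) %>% # best-ranked match per loan
  ungroup() %>%
  select(-rank)

one_match_per_loan
```

Here loan "L1" keeps only its direct_loantaker match, so each loan appears exactly once in the result.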

r2dii.analysis

The final step is to analyze your matched dataset, and compare the results to a climate scenario.

You can apply scenario targets to your loanbook data by calling:


portfolio_targets <- match_result %>%
  target_market_share(
    ald_demo,
    scenario_demo_2020,
    region_isos_demo
  )

portfolio_targets
#> # A tibble: 1,170 x 7
#>    sector technology  year region scenario_source weighted_produc…
#>    <chr>  <chr>      <int> <chr>  <chr>           <chr>           
#>  1 autom… electric    2020 global demo_2020       projected       
#>  2 autom… electric    2020 global demo_2020       normalized_corp…
#>  3 autom… electric    2020 global demo_2020       target_cps      
#>  4 autom… electric    2020 global demo_2020       target_sds      
#>  5 autom… electric    2020 global demo_2020       target_sps      
#>  6 autom… hybrid      2020 global demo_2020       projected       
#>  7 autom… hybrid      2020 global demo_2020       normalized_corp…
#>  8 autom… hybrid      2020 global demo_2020       target_cps      
#>  9 autom… hybrid      2020 global demo_2020       target_sds      
#> 10 autom… hybrid      2020 global demo_2020       target_sps      
#> # … with 1,160 more rows, and 1 more variable:
#> #   weighted_production_value <dbl>

And you can plot the distribution of technologies using ggplot2:


portfolio_targets %>%
  # different targets can be plotted by setting these filters
  filter(
    sector == "power",
    weighted_production_metric != "normalized_corporate_economy",
    year == max(year)
  ) %>%
  group_by(technology) %>%
  ggplot(aes(
    x = weighted_production_metric, 
    y = weighted_production_value, 
    fill = technology
  )) +
  geom_col(position = "fill") +
  labs(
    x = "Metric",
    y = "Weighted Capacity [%]"
  )
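
geom_col(position = "fill") rescales each bar so that its technologies stack to 1. If you want the underlying shares as numbers, they can be computed directly. Below is a minimal sketch on made-up values; the numbers and the reduced set of columns are assumptions for illustration, not output from the demo data:

```r
library(dplyr)

# Made-up values for two metrics and two technologies
toy_targets <- tibble(
  weighted_production_metric = c("projected", "projected", "target_sds", "target_sds"),
  technology = c("coalcap", "renewablescap", "coalcap", "renewablescap"),
  weighted_production_value = c(60, 40, 30, 70)
)

# Share of each technology within its metric, which is what the filled
# bars display
shares <- toy_targets %>%
  group_by(weighted_production_metric) %>%
  mutate(share = weighted_production_value / sum(weighted_production_value)) %>%
  ungroup()

shares
```

The shares are fractions between 0 and 1; multiply by 100 if you want percentages.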
