Data science at 2DII: r2dii.data 0.1.2 and r2dii.match 0.0.4 are now on CRAN

Mauro Lepore

r2dii.data 0.1.2 and r2dii.match 0.0.4 are now on CRAN. These packages provide datasets and tools to align financial markets to climate goals. Both releases also fix a number of bugs, which are described in each package's changelog; this post focuses on enhancements and new features.

You can install r2dii.data and r2dii.match from CRAN with:


install.packages("r2dii.data")
install.packages("r2dii.match")

Then use them with:


library(r2dii.data)
library(r2dii.match)

r2dii.data 0.1.2

r2dii.data 0.1.2 includes two new datasets: green_or_brown and sic_classification (thanks to Daisy Pacheco and George Harris).


green_or_brown
#> # A tibble: 16 x 3
#>    sector       technology    green_or_brown
#>    <chr>        <chr>         <chr>         
#>  1 automotive   electric      green         
#>  2 automotive   hybrid        green         
#>  3 automotive   ice           brown         
#>  4 automotive   fuelcell      green         
#>  5 power        hydrocap      green         
#>  6 power        renewablescap green         
#>  7 power        coalcap       brown         
#>  8 power        gascap        brown         
#>  9 power        oilcap        brown         
#> 10 power        nuclearcap    green         
#> 11 oil and gas  oil           brown         
#> 12 oil and gas  gas           brown         
#> 13 coal         coal          brown         
#> 14 fossil fuels oil           brown         
#> 15 fossil fuels gas           brown         
#> 16 fossil fuels coal          brown

sic_classification
#> # A tibble: 256 x 4
#>    code  description                              sector    borderline
#>    <chr> <chr>                                    <chr>     <lgl>     
#>  1 0     private households, exterritorial organ… not in s… FALSE     
#>  2 00000 private households, exterritorial organ… not in s… FALSE     
#>  3 11110 growing of cereals and other crops n.e.… not in s… FALSE     
#>  4 11130 growing of fruit, nuts, beverage and sp… not in s… FALSE     
#>  5 11210 farming  of cattle, sheep, goats, horse… not in s… FALSE     
#>  6 11300 growing of crops combined with farming … not in s… FALSE     
#>  7 12100 forestry and related services            not in s… FALSE     
#>  8 12200 logging and related services             not in s… FALSE     
#>  9 13100 ocean and coastal fishing                not in s… FALSE     
#> 10 21000 mining of coal and lignite               coal      FALSE     
#> # … with 246 more rows
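
As a rough illustration of how green_or_brown might be used, the sketch below labels some production data as green or brown and totals it per label. It copies a few rows from the output above into a toy data frame so that it runs without the package; production_toy and its production column are invented for this example and are not part of r2dii.data.

```r
# Toy copy of a few green_or_brown rows printed above (not the full dataset).
green_or_brown_toy <- data.frame(
  sector = c("automotive", "automotive", "power", "power"),
  technology = c("electric", "ice", "renewablescap", "coalcap"),
  green_or_brown = c("green", "brown", "green", "brown"),
  stringsAsFactors = FALSE
)

# Hypothetical production data; the `production` column and its values
# are assumptions made for illustration only.
production_toy <- data.frame(
  sector = c("automotive", "automotive", "power", "power"),
  technology = c("ice", "electric", "coalcap", "renewablescap"),
  production = c(100, 25, 60, 40),
  stringsAsFactors = FALSE
)

# Join on both sector and technology, then total production per label.
labelled <- merge(
  production_toy, green_or_brown_toy,
  by = c("sector", "technology")
)
totals <- tapply(labelled$production, labelled$green_or_brown, sum)
totals
```

Joining on both sector and technology, rather than on technology alone, matters because some technologies appear under more than one sector: in the table above, oil is listed under both "oil and gas" and "fossil fuels".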

Also, region_isos gained data from ETP 2017, and ald_demo dropped the column number_of_assets (thanks to Taylor Posey).


unique(region_isos$source)
#> [1] "weo_2019" "etp_2017"

any(grepl("number_of_assets", names(ald_demo)))
#> [1] FALSE
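
Now that region_isos holds more than one source, code that uses it may need to pick a source explicitly. The following is a minimal sketch on a toy data frame: only the source column and its two values come from the output above, while the region and isos columns and their values are assumptions made for illustration.

```r
# Toy stand-in for region_isos. Only `source` and its two values are taken
# from the post; `region`, `isos` and their values are assumed.
region_isos_toy <- data.frame(
  region = c("global", "europe", "global"),
  isos = c("de", "fr", "de"),
  source = c("weo_2019", "weo_2019", "etp_2017"),
  stringsAsFactors = FALSE
)

# Keep only the rows that come from one source.
etp_only <- region_isos_toy[region_isos_toy$source == "etp_2017", ]

# Count the rows available per source.
rows_per_source <- table(region_isos_toy$source)
rows_per_source
```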

r2dii.match 0.0.4

match_name() now outputs a new column: borderline. This column helps you measure how much of your loanbook matched some asset; see the new article Calculating matching coverage.


loanbook <- loanbook_demo
ald <- ald_demo

matched <- match_name(loanbook, ald)
tail(names(matched))
#> [1] "sector_ald" "name"       "name_ald"   "score"      "source"    
#> [6] "borderline"
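
As a small, hedged sketch of how the new column could be summarised, the code below counts borderline and non-borderline rows in a toy data frame. matched_toy is a hypothetical stand-in for the output of match_name() with invented values, and the summary is only a starting point, not the method described in Calculating matching coverage.

```r
# Hypothetical stand-in for the output of match_name(); all values are
# invented. `borderline` is logical, as in the output shown in this post.
matched_toy <- data.frame(
  id_direct_loantaker = c("C1", "C2", "C3", "C4"),
  score = c(1, 1, 0.95, 1),
  borderline = c(FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Number of matched rows per value of `borderline`.
borderline_counts <- table(matched_toy$borderline)

# Share of matched rows flagged as borderline (mean of a logical vector).
share_borderline <- mean(matched_toy$borderline)

borderline_counts
share_borderline
```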

Also, match_name() now runs faster and uses less memory. This responds to users' feedback, diligently managed by George Harris (thanks!). If you still run out of memory, read the articles "Using match_name() with large loanbooks: How to resolve memory issues" and "Improving r2dii.match: How to work with big data, and benchmarks of a more efficient version of match_name()". You may also want to reduce the size of your data: use the new function crucial_lbk() to select the minimum set of columns you need for match_name().


ncol(loanbook)
#> [1] 19

crucial_lbk()
#> [1] "id_ultimate_parent"                    
#> [2] "name_ultimate_parent"                  
#> [3] "id_direct_loantaker"                   
#> [4] "name_direct_loantaker"                 
#> [5] "sector_classification_system"          
#> [6] "sector_classification_direct_loantaker"

smaller_loanbook <- loanbook[crucial_lbk()]
ncol(smaller_loanbook)
#> [1] 6

match_name(smaller_loanbook, ald)
#> # A tibble: 497 x 15
#>    id_ultimate_par… name_ultimate_p… id_direct_loant… name_direct_loa…
#>    <chr>            <chr>            <chr>            <chr>           
#>  1 UP15             Alpine Knits In… C294             Yuamen Xinneng …
#>  2 UP288            University Of I… C292             Yuama Ethanol L…
#>  3 UP104            Garland Power &… C305             Yukon Energy Co…
#>  4 UP104            Garland Power &… C305             Yukon Energy Co…
#>  5 UP83             Earthpower Tech… C304             Yukon Developme…
#>  6 UP83             Earthpower Tech… C304             Yukon Developme…
#>  7 UP163            Kraftwerk Mehru… C303             Yueyang City Co…
#>  8 UP138            Jai Bharat Gum … C301             Yuedxiu Corp One
#>  9 UP32             Bhagwan Energy … C302             Yuexi County AA…
#> 10 UP81             Dynegy Midwest … C309             Yuxi ounty Liua…
#> # … with 487 more rows, and 11 more variables:
#> #   sector_classification_system <chr>,
#> #   sector_classification_direct_loantaker <dbl>, id_2dii <chr>,
#> #   level <chr>, sector <chr>, sector_ald <chr>, name <chr>,
#> #   name_ald <chr>, score <dbl>, source <chr>, borderline <lgl>
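
If your own loanbook might lack one of these columns, it can help to check before subsetting. The sketch below is one way to do that in base R: the six column names are copied by hand from the crucial_lbk() output above so that the code runs without r2dii.match, select_crucial() is a hypothetical helper rather than a package function, and loanbook_toy and all its values are invented.

```r
# The six names printed by crucial_lbk() above, copied by hand.
crucial_columns <- c(
  "id_ultimate_parent",
  "name_ultimate_parent",
  "id_direct_loantaker",
  "name_direct_loantaker",
  "sector_classification_system",
  "sector_classification_direct_loantaker"
)

# Hypothetical loanbook with one extra column; all values are invented.
loanbook_toy <- data.frame(
  id_ultimate_parent = c("UP1", "UP2"),
  name_ultimate_parent = c("Parent A", "Parent B"),
  id_direct_loantaker = c("C1", "C2"),
  name_direct_loantaker = c("Loantaker A", "Loantaker B"),
  sector_classification_system = c("NACE", "NACE"),
  sector_classification_direct_loantaker = c(3511, 3511),
  extra_column = c(1000, 2000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fail early with a clear message if any required
# column is missing, otherwise return only the required columns.
select_crucial <- function(data, columns) {
  missing_columns <- setdiff(columns, names(data))
  if (length(missing_columns) > 0) {
    stop("Missing columns: ", paste(missing_columns, collapse = ", "))
  }
  data[columns]
}

smaller_toy <- select_crucial(loanbook_toy, crucial_columns)
ncol(smaller_toy)
```

With the real packages loaded, the equivalent call would presumably be select_crucial(loanbook, crucial_lbk()).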

Acknowledgements

While these releases include commits from only a few of us (jdhoffa, maurolepore), they were made possible by feedback from our colleagues and users.
