The goal of tilt.company.match is to provide helpers for matching company names in the tilt project.
Installation
You can install the development version of tilt.company.match from r-universe with:
options(repos = c("https://2degreesinvesting.r-universe.dev", getOption("repos")))
install.packages("tilt.company.match")
Or you can install it from GitHub with:
# install.packages("devtools")
devtools::install_github("2DegreesInvesting/tilt.company.match")
Example
Here is a minimal example of what you can do with tilt.company.match. For a complete, gentle walk-through, see Get started.
library(vroom, warn.conflicts = FALSE)
library(tilt.company.match)
# TODO: Replace with the path/to/your/real/loanbook.csv
loanbook_csv <- example_file("demo_loanbook.csv")
loanbook_csv
#> [1] "/usr/local/lib/R/site-library/tilt.company.match/extdata/demo_loanbook.csv"
loanbook <- vroom(loanbook_csv, show_col_types = FALSE)
loanbook
#> # A tibble: 12 × 5
#>       id company_name           postcode country misc_info
#>    <dbl> <chr>                  <chr>    <chr>   <chr>
#>  1     1 Peasant Peter          01234    germany A
#>  2     2 Peasant Peter          01234    germany Z
#>  3     3 Peasant Peter          11234    germany Z
#>  4     4 Peasant Paul           01234    germany Z
#>  5     5 Bread Bakers Limited   23456    germany C
#>  6     6 Flower Power & Company 34567    germany Z
#>  7     7 Screwdriver Experts    45678    germany D
#>  8     8 Screwdriver Expert     45678    germany Z
#>  9     9 John Meier's Groceries 56789    germany E
#> 10    10 John Meier's Groceries 55555    germany Y
#> 11    11 John Meier's Groceries 55555    norway  Y
#> 12    12 Best Bakers            65656    france  F
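The postcode column should be read as text. A numeric postcode loses its leading zero (`01234` becomes `1234`) and would then likely no longer agree with the same postcode in the tilt data. vroom already guesses `<chr>` for this file, as the output above shows. If you read your data another way, here is a minimal base-R sketch, assuming a comma-separated file with a `postcode` column; the demo rows are copied from the loanbook above.

```r
# Sketch, assuming base R's read.csv() instead of vroom.
# Write two of the demo loanbook rows to a temporary CSV file.
csv <- tempfile(fileext = ".csv")
writeLines(c(
  "id,company_name,postcode,country,misc_info",
  "1,Peasant Peter,01234,germany,A",
  "5,Bread Bakers Limited,23456,germany,C"
), csv)

# Default type guessing converts postcode to integer and drops the leading zero
guessed <- read.csv(csv)
# Declaring the column type explicitly keeps postcode as character
typed <- read.csv(csv, colClasses = c(postcode = "character"))

guessed$postcode
typed$postcode
```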
# TODO: Replace with the path/to/your/real/tilt.csv
tilt_csv <- example_file("demo_tilt.csv")
tilt_csv
#> [1] "/usr/local/lib/R/site-library/tilt.company.match/extdata/demo_tilt.csv"
tilt <- vroom(tilt_csv, show_col_types = FALSE)
tilt
#> # A tibble: 11 × 5
#>       id company_name                 postcode country misc_info
#>    <dbl> <chr>                        <chr>    <chr>   <chr>
#>  1     1 Peasant Peter                01234    germany A
#>  2     2 Peasant Peter                01234    germany Z
#>  3     3 Peasant Peter                11234    germany Z
#>  4     4 Peasant Paul                 01234    germany B
#>  5     5 The Bread Bakers Ltd         23456    germany C
#>  6     6 Flower Power Friends and Co. 34567    germany D
#>  7     7 Flower Power and Co.         34567    germany F
#>  8     8 John and Jacques Groceries   56789    germany E
#>  9     9 John and Jacques Groceries   98765    germany E
#> 10    10 John and Jacques Groceries   98765    france  E
#> 11    11 Cranes and Friends           65656    france  F
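The two tables name the same companies differently, for example `Bread Bakers Limited` in the loanbook and `The Bread Bakers Ltd` in tilt, so names are usually normalized before they are compared. The `company_alias` columns in the `suggest_match()` output further down appear to hold such a normalized form; their visible prefixes look like lower-cased names with spaces and punctuation removed. The helper below, `make_alias()`, is a hypothetical approximation of that idea, not the package's own rule, which may differ (for example by also handling legal forms such as `Ltd`).

```r
# Hypothetical helper, NOT part of tilt.company.match:
# lower-case the name, then drop every character that is not a letter or digit.
make_alias <- function(x) {
  gsub("[^a-z0-9]", "", tolower(x))
}

make_alias(c("Peasant Peter", "The Bread Bakers Ltd", "John Meier's Groceries"))
# Normalization alone does not make these two names identical
make_alias("Bread Bakers Limited") == make_alias("The Bread Bakers Ltd")
```

Because normalization alone leaves `breadbakerslimited` and `thebreadbakersltd` different, a fuzzy similarity score is still needed on top of it.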
check_loanbook(loanbook)
#> Found duplicate(s) on columns company_name, postcode, country of the data set.
#> ✖ Found for the company Peasant Peter, postcode: 01234, country: germany
#> ℹ Please check if these duplicates are intended and have an unique id.
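`check_loanbook()` reports rows that share the same `company_name`, `postcode` and `country`. To list those rows yourself before deciding how to fix them, a minimal base-R sketch (an illustration, not the package's implementation) calls `duplicated()` in both directions so that every copy is flagged, not only the second one. The small data frame holds the first four loanbook rows shown above.

```r
# Sketch only: first four rows of the demo loanbook
loanbook_demo <- data.frame(
  id = c(1, 2, 3, 4),
  company_name = c("Peasant Peter", "Peasant Peter", "Peasant Peter", "Peasant Paul"),
  postcode = c("01234", "01234", "11234", "01234"),
  country = "germany"
)

key <- c("company_name", "postcode", "country")
# TRUE for every row whose key also appears in another row,
# checking from the top and from the bottom
is_dup <- duplicated(loanbook_demo[key]) |
  duplicated(loanbook_demo[key], fromLast = TRUE)

loanbook_demo[is_dup, ]
```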
suggest_match(loanbook, tilt)
#> Joining with `by = join_by(id, company_name)`
#> # A tibble: 18 × 15
#>       id compa…¹ postc…² country misc_…³ compa…⁴ id_tilt compa…⁵ misc_…⁶ compa…⁷
#>    <dbl> <chr>   <chr>   <chr>   <chr>   <chr>     <dbl> <chr>   <chr>   <chr>
#>  1     1 Peasan… 01234   germany A       peasan…       1 Peasan… A       peasan…
#>  2     1 Peasan… 01234   germany A       peasan…       2 Peasan… Z       peasan…
#>  3     1 Peasan… 01234   germany A       peasan…       4 Peasan… B       peasan…
#>  4     2 Peasan… 01234   germany Z       peasan…       1 Peasan… A       peasan…
#>  5     2 Peasan… 01234   germany Z       peasan…       2 Peasan… Z       peasan…
#>  6     2 Peasan… 01234   germany Z       peasan…       4 Peasan… B       peasan…
#>  7     3 Peasan… 11234   germany Z       peasan…       3 Peasan… Z       peasan…
#>  8     4 Peasan… 01234   germany Z       peasan…       4 Peasan… B       peasan…
#>  9     4 Peasan… 01234   germany Z       peasan…       1 Peasan… A       peasan…
#> 10     4 Peasan… 01234   germany Z       peasan…       2 Peasan… Z       peasan…
#> 11     5 Bread … 23456   germany C       breadb…       5 The Br… C       thebre…
#> 12     6 Flower… 34567   germany Z       flower…       7 Flower… F       flower…
#> 13     6 Flower… 34567   germany Z       flower…       6 Flower… D       flower…
#> 14     7 Screwd… 45678   germany D       screwd…      NA <NA>    <NA>    <NA>
#> 15     8 Screwd… 45678   germany Z       screwd…      NA <NA>    <NA>    <NA>
#> 16     9 John M… 56789   germany E       johnme…       8 John a… E       johnja…
#> 17    10 John M… 55555   germany Y       johnme…      NA <NA>    <NA>    <NA>
#> 18    11 John M… 55555   norway  Y       johnme…      NA <NA>    <NA>    <NA>
#> # … with 5 more variables: postcode_tilt <chr>, country_tilt <chr>,
#> # similarity <dbl>, suggest_match <lgl>, accept_match <lgl>, and abbreviated
#> # variable names ¹company_name, ²postcode, ³misc_info, ⁴company_alias,
#> # ⁵company_name_tilt, ⁶misc_info_tilt, ⁷company_alias_tilt
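In the output above, every suggested tilt company shares the loanbook row's `postcode` and `country`, and loanbook rows without such a candidate (for example ids 7 and 8) get `NA` in the tilt columns. The sketch below mimics that shape under stated assumptions: it blocks candidates with a left join on `postcode` and `country`, then scores names with a normalized Levenshtein similarity computed by base R's `utils::adist()`. The metric is a stand-in chosen because it needs no extra packages; tilt.company.match may use a different one, and `alias()` and `name_similarity()` are hypothetical helpers.

```r
# Hypothetical sketch, NOT the package's algorithm: block candidates on
# postcode and country, then score how similar the company names are.
loanbook_demo <- data.frame(
  id = c(5, 7),
  company_name = c("Bread Bakers Limited", "Screwdriver Experts"),
  postcode = c("23456", "45678"),
  country = "germany"
)
tilt_demo <- data.frame(
  id_tilt = c(5, 6),
  company_name_tilt = c("The Bread Bakers Ltd", "Flower Power Friends and Co."),
  postcode = c("23456", "34567"),
  country = "germany"
)

# Left join: loanbook rows without a candidate keep NA in the tilt columns
candidates <- merge(loanbook_demo, tilt_demo,
                    by = c("postcode", "country"), all.x = TRUE)

# Hypothetical normalization: lower case, letters and digits only
alias <- function(x) gsub("[^a-z0-9]", "", tolower(x))

# Normalized Levenshtein similarity in [0, 1]; a stand-in metric
name_similarity <- function(a, b) {
  a <- alias(a)
  b <- alias(b)
  d <- mapply(function(x, y) utils::adist(x, y)[1, 1], a, b,
              USE.NAMES = FALSE)
  1 - d / pmax(nchar(a), nchar(b))
}

# Score only the rows that actually have a tilt candidate
has_candidate <- !is.na(candidates$id_tilt)
candidates$similarity <- NA_real_
candidates$similarity[has_candidate] <- name_similarity(
  candidates$company_name[has_candidate],
  candidates$company_name_tilt[has_candidate]
)

# Highest similarity first; rows without a candidate sort last
candidates[order(-candidates$similarity), ]
```

A higher `similarity` only ranks candidates. Judging by the `suggest_match` and `accept_match` columns, the final decision appears to be left to a person reviewing the suggestions.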