score is 1 and level per loan is of highest priority
When multiple perfect matches are found per loan (e.g. a match at direct_loantaker level and ultimate_parent level), we must prioritize the desired match. By default, the highest priority is the most granular match (i.e. direct_loantaker).
prioritize(data, priority = NULL)
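Argument | Description |
---|---|
data | A data frame like the validated output of match_name(). See "How to validate data" below. |
priority | One of: NULL (use the default priority, as returned by prioritize_level()), a character vector giving a custom priority, or a function to apply to the default priority (e.g. rev). |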
A data frame with a single row per loan, where score is 1 and priority level is highest.
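The selection described above can be sketched in base R. This is only an illustration of the idea (keep perfect matches, then keep the best-ranked level per loan), not the package's implementation; the helper pick_highest_priority() and the toy data are hypothetical.

```r
# Hypothetical helper: a base-R sketch of the idea, not the package's code.
pick_highest_priority <- function(data, priority) {
  # Keep only perfect matches (score is exactly 1; missing scores are dropped).
  perfect <- data[!is.na(data$score) & data$score == 1, , drop = FALSE]
  # Rank each row by the position of its level in `priority` (1 = preferred).
  rank <- match(perfect$level, priority)
  # Sort so that the preferred level of each loan comes first.
  perfect <- perfect[order(rank), , drop = FALSE]
  # Keep the first (best-ranked) row per loan.
  out <- perfect[!duplicated(perfect$id_loan), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy data with two perfect matches per loan.
matched <- data.frame(
  id_loan = c("aa", "aa", "bb", "bb"),
  level = c(
    "ultimate_parent", "direct_loantaker",
    "intermediate_parent", "ultimate_parent"
  ),
  score = c(1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Assumed default order: most granular level first.
default_priority <- c("direct_loantaker", "intermediate_parent", "ultimate_parent")

picked <- pick_highest_priority(matched, default_priority)
print(picked)
```

Passing rev(default_priority) instead would prefer the least granular level, which mirrors what priority = rev does in the examples.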
How to validate data
Write the output of match_name() into a .csv file with:
    # Writing to current working directory
    matched %>% readr::write_csv("matched.csv")
Compare, edit, and save the data manually:
- Open matched.csv with any spreadsheet editor (Excel, Google Sheets, etc.).
- Compare the columns name and name_ald manually to determine if the match is valid. Other information can be used in conjunction with the names to ensure the two entities match (sector, internal information on the company structure, etc.).
- Edit the data:
    - If you are happy with the match, set the score value to 1.
    - Otherwise, set the score value to anything other than 1, or leave it at such a value.
- Save the edited file as, say, valid_matches.csv.
Re-read the edited file (validated) with:
    # Reading from current working directory
    valid_matches <- readr::read_csv("valid_matches.csv")
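The whole round trip can be sketched end to end. As an assumption for the sake of a self-contained example, this sketch uses base R's write.csv() and read.csv() in place of the readr functions, writes to a temporary directory, and replaces the manual spreadsheet step with a scripted edit; the company names and scores are made up.

```r
# Toy stand-in for the output of match_name(); all values are made up.
matched <- data.frame(
  id_loan = c("aa", "bb"),
  name = c("Acme Coal Ltd", "Beta Power"),
  name_ald = c("acme coal limited", "gamma power"),
  score = c(0.95, 0.80),
  stringsAsFactors = FALSE
)

# Step 1: write the matches to a .csv file (a temporary directory here).
path_in <- file.path(tempdir(), "matched.csv")
write.csv(matched, path_in, row.names = FALSE)

# Step 2: scripted stand-in for the manual edit in a spreadsheet editor.
edited <- read.csv(path_in, stringsAsFactors = FALSE)
# Loan "aa" is accepted after comparing name and name_ald, so its score becomes 1.
edited$score[edited$id_loan == "aa"] <- 1
# Loan "bb" is rejected, so its score is left at a value other than 1.
path_out <- file.path(tempdir(), "valid_matches.csv")
write.csv(edited, path_out, row.names = FALSE)

# Step 3: re-read the validated file.
valid_matches <- read.csv(path_out, stringsAsFactors = FALSE)
print(valid_matches)
```

Only rows whose score is exactly 1 count as validated, so the rejected match does not need to be deleted from the file.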
This function ignores but preserves existing groups.
See also: match_name(), prioritize_level().
Other main functions: match_name()
# Assumption: tribble() comes from tibble; prioritize() and
# prioritize_level() come from the r2dii.match package
library(tibble)
library(r2dii.match)

# styler: off
matched <- tribble(
  ~sector, ~sector_ald, ~score, ~id_loan,                ~level,
   "coal",      "coal",      1,     "aa",     "ultimate_parent",
   "coal",      "coal",      1,     "aa",    "direct_loantaker",
   "coal",      "coal",      1,     "bb", "intermediate_parent",
   "coal",      "coal",      1,     "bb",     "ultimate_parent",
)
# styler: on

prioritize_level(matched)
#> [1] "direct_loantaker"    "intermediate_parent" "ultimate_parent"

# Using default priority
prioritize(matched)
#> # A tibble: 2 x 5
#>   sector sector_ald score id_loan level
#>   <chr>  <chr>      <dbl> <chr>   <chr>
#> 1 coal   coal           1 aa      direct_loantaker
#> 2 coal   coal           1 bb      intermediate_parent

# Using the reverse of the default priority
prioritize(matched, priority = rev)
#> # A tibble: 2 x 5
#>   sector sector_ald score id_loan level
#>   <chr>  <chr>      <dbl> <chr>   <chr>
#> 1 coal   coal           1 aa      ultimate_parent
#> 2 coal   coal           1 bb      ultimate_parent

# Using a custom priority
bad_idea <- c("intermediate_parent", "ultimate_parent", "direct_loantaker")
prioritize(matched, priority = bad_idea)
#> # A tibble: 2 x 5
#>   sector sector_ald score id_loan level
#>   <chr>  <chr>      <dbl> <chr>   <chr>
#> 1 coal   coal           1 bb      intermediate_parent
#> 2 coal   coal           1 aa      ultimate_parent