Assign an additional name to an entity
to_alias.Rd
to_alias() takes any character vector and creates an alias by transforming the input (a) to lower case; (b) to latin-ascii characters; and (c) to standard abbreviations of ownership types. Commonly, the inputs are values from the columns name_direct_loantaker or name_ultimate_parent of a loanbook dataset, or from the column name_company of an asset-level dataset.
from_name_to_alias() outputs a table giving default strings used to convert from a name to its alias. You may amend this table and pass it to to_alias() via the from_to argument.
Source
r2dii.match version 0.1.3.
Arguments
- x
Character vector, commonly from the columns name_direct_loantaker or name_ultimate_parent of a loanbook dataset, or from the column name_company of an asset-level dataset.
- from_to
A data frame with replacement rules to apply. It contains the columns from (initial values) and to (resulting values).
- ownership
Vector of company ownership types to be distinguished for cut-off or separation.
- remove_ownership
Flag that defines whether the ownership type (like llc) should be cut off.
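To illustrate what cutting off an ownership type means, here is a minimal base R sketch. The helper strip_ownership() is hypothetical and is not part of this package; it assumes the ownership type appears as the last word of the name and that the ownership values contain no regex metacharacters. Unlike to_alias(), it keeps the remaining spaces.

```r
# Hypothetical helper for illustration only; not the package's implementation.
strip_ownership <- function(x, ownership) {
  # Build a pattern such as " (llc|inc)$" that matches a trailing ownership word.
  pattern <- paste0(" (", paste(ownership, collapse = "|"), ")$")
  # Lower-case first so the match ignores the case of the input.
  sub(pattern, "", tolower(x))
}

strip_ownership("Company Name Owner", ownership = "owner")
# "company name"
```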
Value
to_alias() returns a character vector.
from_name_to_alias() returns a tibble::tibble with columns from and to.
Assigning aliases
The transformation process used to compare names between loanbook and tilt datasets applies best practices commonly used in name matching algorithms:
- Remove special characters.
- Replace language specific characters.
- Abbreviate certain names to reduce their importance in the matching.
- Spell out numbers to increase their importance.
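The steps above can be sketched in base R as follows. This is only an illustration under stated assumptions: sketch_alias() and the small rules table are hypothetical, the rules are a hand-picked subset rather than the defaults returned by from_name_to_alias(), and the sketch keeps single spaces between words, which to_alias() does not.

```r
# Illustrative replacement rules only (assumption); the real defaults
# come from from_name_to_alias() and contain many more rows.
rules <- data.frame(
  from = c(" and ", "\u00fc", "3", " company"),
  to = c(" & ", "u", "three", " co"),
  stringsAsFactors = FALSE
)

# Hypothetical helper; not the package's implementation.
sketch_alias <- function(x, from_to = rules) {
  out <- tolower(x)
  # Apply each rule in order as a literal (non-regex) substitution.
  for (i in seq_len(nrow(from_to))) {
    out <- gsub(tolower(from_to$from[i]), from_to$to[i], out, fixed = TRUE)
  }
  # Remove special characters, keeping only letters, digits, and spaces.
  out <- gsub("[^a-z0-9 ]", "", out)
  # Collapse runs of spaces left behind by the removed characters.
  gsub(" +", " ", out)
}

sketch_alias(c("3M Company", "A. and B"))
# "threem co" "a b"
```

Applying the rules as an ordered table mirrors the from_to argument: adding, removing, or reordering rows changes the alias without touching the function itself.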
Author
person(given = "Evgeny", family = "Petrovsky", role = c("aut", "ctr"))
Adapted from: https://github.com/RMI-PACTA/r2dii.match/blob/main/R/to_alias.R
Examples
library(dplyr)
to_alias("A. and B")
#> [1] "ab"
to_alias("Acuity Brands Inc")
#> [1] "acuitybrands inc"
to_alias(c("3M Company", "Abbott Laboratories", "AbbVie Inc."))
#> [1] "threem co" "abbottlaboratories" "abbvie inc"
custom_replacement <- tibble(from = "AAAA", to = "B")
to_alias("Aa Aaaa", from_to = custom_replacement)
#> [1] "aab"
neutral_replacement <- tibble(from = character(0), to = character(0))
to_alias("Company Name Owner", from_to = neutral_replacement)
#> [1] "companynameowner"
to_alias(
  "Company Name Owner",
  from_to = neutral_replacement,
  ownership = "owner",
  remove_ownership = TRUE
)
#> [1] "companyname"
from_name_to_alias()
#> # A tibble: 96 × 2
#>    from     to
#>    <chr>    <chr>
#>  1 " and "  " & "
#>  2 " en "   " & "
#>  3 " och "  " & "
#>  4 " und "  " & "
#>  5 "(pjsc)" ""
#>  6 "(pte)"  ""
#>  7 "(pvt)"  ""
#>  8 "0"      "null"
#>  9 "1"      "one"
#> 10 "2"      "two"
#> # … with 86 more rows
append_replacements <- from_name_to_alias() %>%
  add_row(
    .before = 1,
    from = c("AA", "BB"), to = c("alpha", "beta")
  )
append_replacements
#> # A tibble: 98 × 2
#>    from     to
#>    <chr>    <chr>
#>  1 "AA"     "alpha"
#>  2 "BB"     "beta"
#>  3 " and "  " & "
#>  4 " en "   " & "
#>  5 " och "  " & "
#>  6 " und "  " & "
#>  7 "(pjsc)" ""
#>  8 "(pte)"  ""
#>  9 "(pvt)"  ""
#> 10 "0"      "null"
#> # … with 88 more rows
# And in combination with `to_alias()`
to_alias(c("AA", "BB", "1"), from_to = append_replacements)
#> [1] "alpha" "beta" "one"