Interface MultiMatcher<E>
- Type Parameters:
E
- the type of the elements being matched.
- All Known Implementing Classes:
MultiMatcher.Default
public interface MultiMatcher<E>
Exclusively means each element in both collections can be linked with at most one element from the other collection. Bidirectionally means the link between two elements always has two directions: if element A is linked to element B, element B is inherently linked to element A as well.
Equality and similarity are defined by Equalator and Similator functions that can be passed at creation time. All values controlling the matching algorithm can optionally be configured in the factory class if the default configuration is not desired. Additionally, a callback function for deciding on found matches with questionable similarity can be injected.
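The linking rules described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the class name `ExclusiveMatchSketch`, the `toySimilarity` function, and the greedy "best remaining pair first" strategy are assumptions made purely for illustration, and a plain `ToDoubleBiFunction` stands in for a Similator.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.ToDoubleBiFunction;

// Illustrative sketch only; names and strategy are hypothetical, not the library's code.
final class ExclusiveMatchSketch {

    // Toy similarity in [0.0, 1.0]: equal ignoring case = 1.0, containment = 0.8, else 0.0.
    static double toySimilarity(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        if (x.equals(y)) {
            return 1.0;
        }
        return x.contains(y) || y.contains(x) ? 0.8 : 0.0;
    }

    // Returns, per source index, the index of the linked target element, or -1 if unmatched.
    // Each target index appears at most once, so every link is exclusive in both directions.
    static <E> int[] match(List<E> source, List<E> target,
                           ToDoubleBiFunction<? super E, ? super E> sim, double threshold) {
        int[] link = new int[source.size()];
        Arrays.fill(link, -1);
        boolean[] used = new boolean[target.size()];
        while (true) {
            int bestS = -1;
            int bestT = -1;
            double best = Double.NEGATIVE_INFINITY;
            for (int s = 0; s < source.size(); s++) {
                if (link[s] >= 0) {
                    continue; // source element is already linked
                }
                for (int t = 0; t < target.size(); t++) {
                    if (used[t]) {
                        continue; // target element is already linked
                    }
                    double v = sim.applyAsDouble(source.get(s), target.get(t));
                    if (v >= threshold && v > best) {
                        best = v;
                        bestS = s;
                        bestT = t;
                    }
                }
            }
            if (bestS < 0) {
                return link; // no remaining pair reaches the threshold
            }
            link[bestS] = bestT;
            used[bestT] = true;
        }
    }

    public static void main(String[] args) {
        List<String> oldNames = List.of("Name", "Firstname", "Age");
        List<String> newNames = List.of("firstname", "lastname", "age");
        int[] link = match(oldNames, newNames, ExclusiveMatchSketch::toySimilarity, 0.5);
        System.out.println(Arrays.toString(link)); // prints [1, 0, 2]
    }
}
```

Raising the threshold in this sketch to 0.9 leaves "Name" unmatched, since its best remaining candidate only scores 0.8.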
This is a powerful general-purpose means of building associations between two sets of similar but not equal elements. A very simple use case is the formal recognition of a changed table column structure (for which this class was originally developed).
For example, given the following two hypothetical definitions (old and new) of column names:
Old:
- Name
- Firstname
- Age
- Address
- Freetext
- Email
- OtherAddress
New:
- firstname
- lastname
- age
- emailAddress
- postalAddress
- noteLink
- newColumn1
- someMiscAddress
With a Levenshtein-based Similator (see Levenshtein.substringSimilarity(java.lang.String, java.lang.String)), the algorithm produces the following associations:
firstname       -1.00- Firstname
lastname        -0.75- Name
age             -1.00- Age
emailAddress    -0.71- Email
postalAddress   -0.77- Address
noteLink        [new]
newColumn1      [new]
someMiscAddress -0.56- OtherAddress
                X      Freetext
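To make the notion of a similarity function concrete, here is a minimal sketch of a case-insensitive, length-normalized Levenshtein similarity. It is plain Levenshtein, not the library's Levenshtein.substringSimilarity, so it will generally not reproduce the scores listed above (it rates lastname/Name at 0.5, for instance). The class name `LevenshteinSketch` and its methods are hypothetical.

```java
import java.util.Locale;

// Illustrative sketch only; this is NOT Levenshtein.substringSimilarity.
final class LevenshteinSketch {

    // Classic edit distance (insert, delete, substitute), computed with two rolling rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Maps the distance to [0.0, 1.0], where 1.0 means equal ignoring case.
    static double similarity(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        int max = Math.max(x.length(), y.length());
        return max == 0 ? 1.0 : 1.0 - (double) distance(x, y) / max;
    }

    public static void main(String[] args) {
        System.out.println(similarity("firstname", "Firstname")); // prints 1.0
        System.out.println(similarity("lastname", "Name"));       // prints 0.5
    }
}
```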
-
Nested Class Summary
static class MultiMatcher.Default<E>
-
Method Details
-
similarityThreshold
double similarityThreshold()
-
singletonPrecedenceThreshold
double singletonPrecedenceThreshold()
This is a measure of how "eager" the algorithm is to find as many matches as possible. The lower this threshold is, the more often items with only a single potential match are preferred over pairs that actually match better, just so that they are not left unmatched. To deactivate this special-casing, set the threshold to 1.0, meaning that only items that match perfectly anyway take precedence over others.
- Returns:
- the singleton precedence threshold
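The effect described for this threshold can be sketched as follows. This is one plausible reading of the description, not the library's actual algorithm: the class `SingletonPrecedenceSketch`, the rule "add a bonus to a pair that is a source item's only candidate once it reaches the singleton threshold", and all numeric values are assumptions for illustration.

```java
import java.util.Arrays;

// Illustrative sketch only; the scoring rule and values are hypothetical.
final class SingletonPrecedenceSketch {

    // Counts the targets in a row whose similarity reaches the similarity threshold.
    static int candidateCount(double[] row, double similarityThreshold) {
        int n = 0;
        for (double v : row) {
            if (v >= similarityThreshold) {
                n++;
            }
        }
        return n;
    }

    // Boosts a pair if it is the source item's only candidate and reaches the singleton threshold.
    static double effectiveScore(double[] row, int t, double similarityThreshold,
                                 double singletonThreshold, double bonus) {
        boolean singleton = candidateCount(row, similarityThreshold) == 1;
        return singleton && row[t] >= singletonThreshold ? row[t] + bonus : row[t];
    }

    // Greedy exclusive matching on effective scores; returns the target index per source, or -1.
    static int[] match(double[][] sim, double similarityThreshold,
                       double singletonThreshold, double bonus) {
        int[] link = new int[sim.length];
        Arrays.fill(link, -1);
        boolean[] used = new boolean[sim.length == 0 ? 0 : sim[0].length];
        while (true) {
            int bestS = -1;
            int bestT = -1;
            double best = Double.NEGATIVE_INFINITY;
            for (int s = 0; s < sim.length; s++) {
                if (link[s] >= 0) {
                    continue;
                }
                for (int t = 0; t < used.length; t++) {
                    if (used[t] || sim[s][t] < similarityThreshold) {
                        continue;
                    }
                    double score = effectiveScore(sim[s], t, similarityThreshold, singletonThreshold, bonus);
                    if (score > best) {
                        best = score;
                        bestS = s;
                        bestT = t;
                    }
                }
            }
            if (bestS < 0) {
                return link;
            }
            link[bestS] = bestT;
            used[bestT] = true;
        }
    }

    public static void main(String[] args) {
        // Rows: source items A and B; columns: target items X and Y.
        double[][] sim = { {0.9, 0.6}, {0.7, 0.0} };
        System.out.println(Arrays.toString(match(sim, 0.5, 0.65, 0.25))); // prints [1, 0]
        System.out.println(Arrays.toString(match(sim, 0.5, 1.0, 0.25)));  // prints [0, -1]
    }
}
```

In the first call, B's only candidate X is boosted past the better-matching pair A/X, so both items end up matched. In the second call the threshold of 1.0 disables the boost, A takes X, and B stays unmatched.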
-
singletonPrecedenceBonus
double singletonPrecedenceBonus()
-
noiseFactor
double noiseFactor()
-
equalator
-
similator
-
validator
MatchValidator<? super E> validator()
-
setSimilarityThreshold
-
setSingletonPrecedenceThreshold
-
setSingletonPrecedenceBonus
-
setNoisefactor
-
setSimilator
-
setEqualator
-
setValidator
-
match
MultiMatch<E> match(XGettingCollection<? extends E> source, XGettingCollection<? extends E> target)
-
defaultSimilarityThreshold
static double defaultSimilarityThreshold()
-
defaultSingletonPrecedenceThreshold
static double defaultSingletonPrecedenceThreshold()
-
defaultSingletonPrecedenceBonus
static double defaultSingletonPrecedenceBonus()
-
defaultNoiseFactor
static double defaultNoiseFactor()
-
New
-