Public Member Functions
__init__ (self)
generate_signature (self, temp_rule, cur_tree_node, cur_rule_node)
extract_rules (self, tree, node, temp_rule)
build_graph (self, rf, feature_names, if_report=False, default_feature_names_dir="")
update_range2 (self, idx, upper_bound)
special_case_update_range2 (self, idx)
special_case_sort_ranges2 (self, idx, sim_str)
sort_ranges2 (self)
update_range_rule_node (self)
visualize_graph (self)
report_ranges (self, path)
graph_stat (self)
Public Attributes
int INF = 100000
int num_tree = 0
int num_rule = 0
int num_feature = 0
list tree_node = []
list rule_node = []
list feature_node = []
list rule_list = []
rule_signature = defaultdict(int)
rule_signature_trees = defaultdict(set)
list feature_list = []
list feature_range = []
feature_div = np.full((100000,), 0, dtype=int)
feature_appeared = np.full((100000,), -1, dtype=int)
trigraph = nx.Graph()
list bigraList1 = []
list bigraList2 = []
list num_tree = [[], [], []]
int feature_node = 0)
Tripartite graph of tree, rule, and feature nodes, used for feature selection.
simjoin_entitymatching.blocker.graph.TripartiteGraph.__init__ (self)
simjoin_entitymatching.blocker.graph.TripartiteGraph.build_graph (self, rf, feature_names, if_report=False, default_feature_names_dir="")
simjoin_entitymatching.blocker.graph.TripartiteGraph.extract_rules (self, tree, node, temp_rule)
Args: temp_rule: three lists, [[], [], []], holding the features, thresholds, and signs (0: <, 1: >) of the rule's conditions.
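The `temp_rule` layout above suggests a depth-first walk that pushes one condition per internal node and pops it again when backtracking. The sketch below illustrates that pattern under stated assumptions: the tree is stored in scikit-learn-style parallel arrays (`children_left`, `children_right`, `feature`, `threshold`, with `-1` marking a leaf), and the function name `extract_rules_sketch` is hypothetical, not this class's API. scikit-learn sends samples with `feature <= threshold` to the left child, which the sketch records as sign 0.

```python
from typing import List, Optional

LEAF = -1  # child index that marks a leaf (scikit-learn convention)


def extract_rules_sketch(children_left: List[int],
                         children_right: List[int],
                         feature: List[int],
                         threshold: List[float],
                         node: int = 0,
                         temp_rule: Optional[List[list]] = None,
                         rules: Optional[List[List[list]]] = None) -> List[List[list]]:
    """Hypothetical helper: collect one rule per root-to-leaf path.

    Each rule is [features, thresholds, signs]; sign 0 means the path took
    the left branch (feature below threshold), 1 the right branch (above).
    """
    if temp_rule is None:
        temp_rule = [[], [], []]
    if rules is None:
        rules = []
    if children_left[node] == LEAF:
        # Leaf reached: store a copy of the conditions accumulated so far.
        rules.append([list(part) for part in temp_rule])
        return rules
    for child, sign in ((children_left[node], 0), (children_right[node], 1)):
        temp_rule[0].append(feature[node])
        temp_rule[1].append(threshold[node])
        temp_rule[2].append(sign)
        extract_rules_sketch(children_left, children_right, feature,
                             threshold, child, temp_rule, rules)
        # Backtrack before exploring the sibling branch.
        for part in temp_rule:
            part.pop()
    return rules
```

For a toy tree with `children_left=[1, -1, 3, -1, -1]`, `children_right=[2, -1, 4, -1, -1]`, `feature=[0, -2, 2, -2, -2]` and `threshold=[0.5, -2.0, 0.8, -2.0, -2.0]`, the sketch returns `[[[0], [0.5], [0]], [[0, 2], [0.5, 0.8], [1, 0]], [[0, 2], [0.5, 0.8], [1, 1]]]`, one rule per leaf.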
simjoin_entitymatching.blocker.graph.TripartiteGraph.generate_signature (self, temp_rule, cur_tree_node, cur_rule_node)
Returns: two bools. The first indicates whether to add a tree node; the second indicates whether to add a rule node.
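One plausible reading of the two return flags, sketched under assumptions: the signature is an order-independent key built from a rule's (feature, threshold, sign) triples, a counter of type `defaultdict(int)` tracks how often each signature has been seen, and a `defaultdict(set)` records which trees produced it, mirroring the `rule_signature` and `rule_signature_trees` attributes listed above. The function name, the `tree_id` parameter, and the exact policy behind each flag are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

Signature = Tuple[Tuple[int, float, int], ...]


def generate_signature_sketch(temp_rule: List[list], tree_id: int,
                              rule_signature: DefaultDict[Signature, int],
                              rule_signature_trees: DefaultDict[Signature, Set[int]]
                              ) -> Tuple[bool, bool]:
    """Hypothetical helper returning (add_tree_node, add_rule_node).

    Assumed policy: a rule node is added only the first time a signature
    appears; a tree link is added only if this tree has not already
    produced the same signature.
    """
    features, thresholds, signs = temp_rule
    # Sorting the condition triples makes the key independent of their order.
    signature: Signature = tuple(sorted(zip(features, thresholds, signs)))
    add_rule_node = rule_signature[signature] == 0
    add_tree_node = tree_id not in rule_signature_trees[signature]
    rule_signature[signature] += 1
    rule_signature_trees[signature].add(tree_id)
    return add_tree_node, add_rule_node
```

Sorting the triples means that two trees testing the same conditions in a different order map to the same rule node, which is the usual reason for building a canonical signature.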
simjoin_entitymatching.blocker.graph.TripartiteGraph.graph_stat (self)
simjoin_entitymatching.blocker.graph.TripartiteGraph.report_ranges (self, path)
simjoin_entitymatching.blocker.graph.TripartiteGraph.sort_ranges2 (self)
Compared with sort_ranges, this method:
1. drops features such as 'jaro' and 'needleman';
2. sorts p(c) in a more reasonable way;
3. does not consider intervals for a feature range.
simjoin_entitymatching.blocker.graph.TripartiteGraph.special_case_sort_ranges2 (self, idx, sim_str)
Used when a similarity function outside the supported set is in use, since update_range2 does not support these functions:
1. monge_elkan: mel
2. needleman_wunsch: nmw
3. smith_waterman: sw
4. jaro: jar
5. jaro_winkler: jwn
6. rel_diff: rdf
7. lev_dist: lev_dist
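The abbreviations listed above fit naturally in a lookup table. This is a minimal sketch, assuming feature names end with the similarity-function abbreviation after an underscore (for example a hypothetical `title_title_jwn`); the table name, the helper `special_sim_func`, and that naming scheme are assumptions, not the module's API.

```python
from typing import Optional

# Similarity functions that update_range2 does not handle, keyed by abbreviation.
SPECIAL_SIM_FUNCS = {
    "mel": "monge_elkan",
    "nmw": "needleman_wunsch",
    "sw": "smith_waterman",
    "jar": "jaro",
    "jwn": "jaro_winkler",
    "rdf": "rel_diff",
    "lev_dist": "lev_dist",
}


def special_sim_func(feature_name: str) -> Optional[str]:
    """Return the full sim-func name if feature_name needs special handling."""
    for abbr, full_name in SPECIAL_SIM_FUNCS.items():
        # Match the bare abbreviation or an "_<abbr>" suffix.
        if feature_name == abbr or feature_name.endswith("_" + abbr):
            return full_name
    return None
```

Matching on an underscore-prefixed suffix keeps `lev_dist` intact as a single abbreviation even though it contains an underscore itself.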
simjoin_entitymatching.blocker.graph.TripartiteGraph.special_case_update_range2 (self, idx)
For 'lev' and 'rdf', the value should be as small as possible, because these measures are distances: smaller values mean more similar strings or numbers.
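Because Levenshtein distance and relative difference shrink as two values become more alike, the direction of a "tighter" bound flips for these features. A minimal sketch of that comparison follows; the function name `is_tighter` and the set of distance-like abbreviations are assumptions based on the names used in this reference.

```python
# Assumed abbreviations whose values are distances: smaller means more similar.
DISTANCE_LIKE = {"lev", "lev_dist", "rdf"}


def is_tighter(sim_str: str, bound_a: float, bound_b: float) -> bool:
    """Return True if bound_a is a stricter requirement than bound_b."""
    if sim_str in DISTANCE_LIKE:
        # For distances, a smaller upper bound is harder to satisfy.
        return bound_a < bound_b
    # For similarities, a larger lower bound is harder to satisfy.
    return bound_a > bound_b
```

Centralizing the direction in one predicate lets the same sorting code serve similarity and distance features alike.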
simjoin_entitymatching.blocker.graph.TripartiteGraph.update_range2 (self, idx, upper_bound)
Sorts and updates the range dict, taking the feature into account.
simjoin_entitymatching.blocker.graph.TripartiteGraph.update_range_rule_node (self)
After sorting, the rule nodes of each tighter range must also be added to the looser ranges. For example, given (1, 0.9): [rule_id1] and (1, 0.8): [rule_id2], if (1, 0.8) cannot be satisfied then rule_id1 cannot be satisfied either, so rule_id1 is added to (1, 0.8) as well. A set is used to avoid duplicates.
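The propagation described above can be sketched with interval containment. This assumes, as the example suggests, that each key is an (upper bound, lower bound) pair, so (1, 0.9) lies inside (1, 0.8); the names `contains` and `propagate_rules` are hypothetical.

```python
from typing import Dict, Set, Tuple

Range = Tuple[float, float]  # assumed layout: (upper bound, lower bound)


def contains(outer: Range, inner: Range) -> bool:
    """True if the interval `inner` lies within `outer`."""
    return inner[0] <= outer[0] and inner[1] >= outer[1]


def propagate_rules(ranges: Dict[Range, Set[str]]) -> Dict[Range, Set[str]]:
    """Copy rule ids from every tighter range into each range that contains it."""
    # Sets avoid duplicate rule ids when several tighter ranges share a rule.
    result = {r: set(ids) for r, ids in ranges.items()}
    for outer in ranges:
        for inner, ids in ranges.items():
            if inner != outer and contains(outer, inner):
                result[outer].update(ids)
    return result
```

With the example above, `propagate_rules({(1, 0.9): {"rule_id1"}, (1, 0.8): {"rule_id2"}})` returns `{(1, 0.9): {"rule_id1"}, (1, 0.8): {"rule_id1", "rule_id2"}}`. The pairwise containment test does not depend on sort order, at the cost of quadratic time; walking the sorted ranges from tightest to loosest would avoid the inner loop.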
simjoin_entitymatching.blocker.graph.TripartiteGraph.visualize_graph (self)
simjoin_entitymatching.blocker.graph.TripartiteGraph.bigraList1 = []
simjoin_entitymatching.blocker.graph.TripartiteGraph.bigraList2 = []
simjoin_entitymatching.blocker.graph.TripartiteGraph.feature_appeared = np.full((100000,), -1, dtype=int)
simjoin_entitymatching.blocker.graph.TripartiteGraph.feature_div = np.full((100000,), 0, dtype=int)
simjoin_entitymatching.blocker.graph.TripartiteGraph.feature_list = []
list simjoin_entitymatching.blocker.graph.TripartiteGraph.feature_node = []
int simjoin_entitymatching.blocker.graph.TripartiteGraph.feature_node = 0)
list simjoin_entitymatching.blocker.graph.TripartiteGraph.feature_range = []
int simjoin_entitymatching.blocker.graph.TripartiteGraph.INF = 100000
simjoin_entitymatching.blocker.graph.TripartiteGraph.num_feature = 0
int simjoin_entitymatching.blocker.graph.TripartiteGraph.num_rule = 0
int simjoin_entitymatching.blocker.graph.TripartiteGraph.num_tree = 0
list simjoin_entitymatching.blocker.graph.TripartiteGraph.num_tree = [[], [], []]
list simjoin_entitymatching.blocker.graph.TripartiteGraph.rule_list = []
list simjoin_entitymatching.blocker.graph.TripartiteGraph.rule_node = []
simjoin_entitymatching.blocker.graph.TripartiteGraph.rule_signature = defaultdict(int)
simjoin_entitymatching.blocker.graph.TripartiteGraph.rule_signature_trees = defaultdict(set)
list simjoin_entitymatching.blocker.graph.TripartiteGraph.tree_node = []
simjoin_entitymatching.blocker.graph.TripartiteGraph.trigraph = nx.Graph()