Entity Matching by Similarity Join
 
simjoin_entitymatching.blocker.graph.TripartiteGraph Class Reference

Public Member Functions

 __init__ (self)
 
 generate_signature (self, temp_rule, cur_tree_node, cur_rule_node)
 
 extract_rules (self, tree, node, temp_rule)
 
 build_graph (self, rf, feature_names, if_report=False, default_feature_names_dir="")
 
 update_range2 (self, idx, upper_bound)
 
 special_case_update_range2 (self, idx)
 
 special_case_sort_ranges2 (self, idx, sim_str)
 
 sort_ranges2 (self)
 
 update_range_rule_node (self)
 
 visualize_graph (self)
 
 report_ranges (self, path)
 
 graph_stat (self)
 

Public Attributes

int INF = 100000
 
int num_tree = 0
 
int num_rule = 0
 
int num_feature = 0
 
list tree_node = []
 
list rule_node = []
 
list feature_node = []
 
list rule_list = []
 
 rule_signature = defaultdict(int)
 
 rule_signature_trees = defaultdict(set)
 
list feature_list = []
 
list feature_range = []
 
 feature_div = np.full((100000,), 0, dtype=int)
 
 feature_appeared = np.full((100000,), -1, dtype=int)
 
 trigraph = nx.Graph()
 
list bigraList1 = []
 
list bigraList2 = []
 
list num_tree = [[], [], []]
 
int feature_node = 0
 

Detailed Description

    Tripartite graph of tree, rule, and feature nodes, used for feature selection.

Constructor & Destructor Documentation

◆ __init__()

simjoin_entitymatching.blocker.graph.TripartiteGraph.__init__ ( self)

Member Function Documentation

◆ build_graph()

simjoin_entitymatching.blocker.graph.TripartiteGraph.build_graph ( self,
rf,
feature_names,
if_report = False,
default_feature_names_dir = "" )
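The reference does not describe how build_graph wires the three node layers together. The sketch below is one plausible layout, not the method's actual behavior. It assumes each tree's rules have already been extracted as lists of (feature_name, threshold, sign) conditions. It then links tree nodes to rule nodes and rule nodes to feature nodes. Plain adjacency sets stand in for the nx.Graph() held in trigraph. The functions build_tripartite and node_counts, the ("tree", i) style node labels, and the feature names are all hypothetical.

```python
from collections import defaultdict


def build_tripartite(rules_per_tree):
    """Hypothetical sketch: rules_per_tree[t] lists the rules extracted from
    tree t; each rule is a list of (feature_name, threshold, sign) triples."""
    adj = defaultdict(set)  # undirected adjacency sets, a stand-in for nx.Graph()
    rule_id = 0
    for tree_id, rules in enumerate(rules_per_tree):
        tree_node = ("tree", tree_id)
        for rule in rules:
            rule_node = ("rule", rule_id)
            rule_id += 1
            # Tree-rule edge: the rule was extracted from this tree.
            adj[tree_node].add(rule_node)
            adj[rule_node].add(tree_node)
            for feature_name, _threshold, _sign in rule:
                feature_node = ("feature", feature_name)
                # Rule-feature edge: the rule tests this feature.
                adj[rule_node].add(feature_node)
                adj[feature_node].add(rule_node)
    return adj


def node_counts(adj):
    """Count nodes per layer, loosely in the spirit of graph_stat (assumed)."""
    counts = defaultdict(int)
    for kind, _ in adj:
        counts[kind] += 1
    return dict(counts)


rules_per_tree = [
    [[("title_title_jwn", 0.8, 1)],
     [("title_title_jwn", 0.8, 0), ("price_price_rdf", 0.1, 0)]],
    [[("price_price_rdf", 0.2, 0)]],
]
adj = build_tripartite(rules_per_tree)
print(node_counts(adj))  # {'tree': 2, 'rule': 3, 'feature': 2}
```

A feature node that many rules point to is shared across all of them. This is what makes a tripartite layout useful for deciding which features matter most.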

◆ extract_rules()

simjoin_entitymatching.blocker.graph.TripartiteGraph.extract_rules ( self,
tree,
node,
temp_rule )
        Args:
            temp_rule: [[], [], []], three parallel lists holding the feature, threshold, and sign (0: <, 1: >) of each condition
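The temp_rule layout can be filled by a depth-first walk that pushes a condition on the way down and pops it when backtracking. The minimal sketch below makes two assumptions. First, the tree is given as scikit-learn style arrays (children_left, children_right, feature, threshold), where a leaf has both children equal to -1. Second, every root-to-leaf path becomes a rule. The standalone extract_rules function here is hypothetical and is not the method's actual body. Sign 0 marks the left branch and 1 the right branch, following the docstring's convention. Note that scikit-learn itself routes x <= threshold to the left child.

```python
def extract_rules(children_left, children_right, feature, threshold,
                  node=0, temp_rule=None, rules=None):
    """Collect every root-to-leaf path as [features, thresholds, signs]."""
    if temp_rule is None:
        temp_rule = [[], [], []]  # feature, threshold, sign (0: left, 1: right)
    if rules is None:
        rules = []
    left, right = children_left[node], children_right[node]
    if left == -1 and right == -1:
        # Leaf: store a copy, since temp_rule is mutated while backtracking.
        rules.append([list(part) for part in temp_rule])
        return rules
    for child, sign in ((left, 0), (right, 1)):
        temp_rule[0].append(feature[node])
        temp_rule[1].append(threshold[node])
        temp_rule[2].append(sign)
        extract_rules(children_left, children_right, feature, threshold,
                      child, temp_rule, rules)
        for part in temp_rule:  # backtrack: drop the condition just added
            part.pop()
    return rules


# Root splits feature 0 at 0.5; its right child splits feature 1 at 0.3.
rules = extract_rules(
    children_left=[1, -1, 3, -1, -1],
    children_right=[2, -1, 4, -1, -1],
    feature=[0, -2, 1, -2, -2],
    threshold=[0.5, -2.0, 0.3, -2.0, -2.0],
)
for rule in rules:
    print(rule)
# [[0], [0.5], [0]]
# [[0, 1], [0.5, 0.3], [1, 0]]
# [[0, 1], [0.5, 0.3], [1, 1]]
```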

◆ generate_signature()

simjoin_entitymatching.blocker.graph.TripartiteGraph.generate_signature ( self,
temp_rule,
cur_tree_node,
cur_rule_node )
        Returns:
            two bools: the first indicates whether to add a tree node;
            the second indicates whether to add a rule node.
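The reference does not document how the two flags are decided. The sketch below is one hypothetical reading built on the rule_signature (defaultdict(int)) and rule_signature_trees (defaultdict(set)) attributes. A rule's signature is its conditions in canonical sorted order, so the same conditions listed in a different order map to the same key. In this sketch, add_rule is True the first time a signature is seen. add_tree is True when the current tree has not yet produced that signature. Both criteria, the SignatureIndex class, and the tree_id parameter are assumptions made for illustration.

```python
from collections import defaultdict


class SignatureIndex:
    """Hypothetical stand-in for the rule_signature / rule_signature_trees state."""

    def __init__(self):
        self.rule_signature = defaultdict(int)        # signature -> times seen
        self.rule_signature_trees = defaultdict(set)  # signature -> tree ids

    def generate_signature(self, temp_rule, tree_id):
        # Canonical, order-independent key: sorted (feature, threshold, sign) triples.
        signature = tuple(sorted(zip(*temp_rule)))
        add_rule = self.rule_signature[signature] == 0
        add_tree = tree_id not in self.rule_signature_trees[signature]
        self.rule_signature[signature] += 1
        self.rule_signature_trees[signature].add(tree_id)
        return add_tree, add_rule


index = SignatureIndex()
print(index.generate_signature([[0, 1], [0.5, 0.3], [1, 0]], tree_id=0))  # (True, True)
# Same conditions in another order, from a different tree:
print(index.generate_signature([[1, 0], [0.3, 0.5], [0, 1]], tree_id=1))  # (True, False)
```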

◆ graph_stat()

simjoin_entitymatching.blocker.graph.TripartiteGraph.graph_stat ( self)

◆ report_ranges()

simjoin_entitymatching.blocker.graph.TripartiteGraph.report_ranges ( self,
path )

◆ sort_ranges2()

simjoin_entitymatching.blocker.graph.TripartiteGraph.sort_ranges2 ( self)
        Compared with sort_ranges:
            1. Drops features such as 'jaro' and 'needleman'.
            2. Sorts p(c) in a more reasonable way.
            3. Does not consider intervals for a feature range.

◆ special_case_sort_ranges2()

simjoin_entitymatching.blocker.graph.TripartiteGraph.special_case_sort_ranges2 ( self,
idx,
sim_str )
        Handles similarity functions outside the supported set, which update_range2 does not support:
            1. monge_elkan: mel
            2. needleman_wunsch: nmw
            3. smith_waterman: sw
            4. jaro: jar
            5. jaro_winkler: jwn
            6. rel_diff: rdf
            7. lev_dist: lev_dist
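The abbreviation list above maps naturally onto a lookup table. The minimal sketch below flags features that need the special-case path. It assumes feature names end in "_<abbr>", for example name_name_jwn. That naming pattern is an assumption, and the names SPECIAL_SIM_FUNCS and special_sim_func are hypothetical. The check matches on "_" + abbr rather than the bare abbreviation, so that a name such as name_name_lev_sim is not mistaken for lev_dist.

```python
# Abbreviation -> similarity function, as listed in the docstring above.
SPECIAL_SIM_FUNCS = {
    "mel": "monge_elkan",
    "nmw": "needleman_wunsch",
    "sw": "smith_waterman",
    "jar": "jaro",
    "jwn": "jaro_winkler",
    "rdf": "rel_diff",
    "lev_dist": "lev_dist",
}


def special_sim_func(feature_name):
    """Return the full sim-func name if feature_name ends with a special
    abbreviation (assumed '<attr>_<attr>_<abbr>' naming), else None."""
    for abbr, full_name in SPECIAL_SIM_FUNCS.items():
        if feature_name.endswith("_" + abbr):
            return full_name
    return None


print(special_sim_func("name_name_jwn"))  # jaro_winkler
print(special_sim_func("name_name_lev_sim"))  # None
```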

◆ special_case_update_range2()

simjoin_entitymatching.blocker.graph.TripartiteGraph.special_case_update_range2 ( self,
idx )
        For 'lev' and 'rdf', which are distance measures, the value should be as small as possible.
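Because the comparison direction flips for distance features, "tighter" means a higher threshold for a similarity feature but a lower threshold for lev_dist or rdf. The minimal sketch below orders thresholds from tightest to loosest under that rule. It assumes that 'lev' here refers to lev_dist, as in the special-case list. It also assumes feature names end in "_<abbr>". The helper names smaller_is_better and sort_tightest_first are hypothetical.

```python
DISTANCE_ABBRS = ("lev_dist", "rdf")  # smaller value means more similar


def smaller_is_better(feature_name):
    """True when feature_name (assumed '<attr>_<attr>_<abbr>') is a distance."""
    return any(feature_name.endswith("_" + abbr) for abbr in DISTANCE_ABBRS)


def sort_tightest_first(feature_name, thresholds):
    """Order thresholds from the tightest to the loosest condition."""
    # Similarity: a higher threshold is tighter, so sort descending.
    # Distance: a lower threshold is tighter, so sort ascending.
    return sorted(thresholds, reverse=not smaller_is_better(feature_name))


print(sort_tightest_first("name_name_jwn", [0.8, 0.9, 0.7]))    # [0.9, 0.8, 0.7]
print(sort_tightest_first("price_price_rdf", [0.3, 0.1, 0.2]))  # [0.1, 0.2, 0.3]
```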

◆ update_range2()

simjoin_entitymatching.blocker.graph.TripartiteGraph.update_range2 ( self,
idx,
upper_bound )
        Sorts and updates the range dict, taking the feature into account.

◆ update_range_rule_node()

simjoin_entitymatching.blocker.graph.TripartiteGraph.update_range_rule_node ( self)
        After sorting, each rule node in a tighter range must also be added to the looser ranges.
            e.g., given (1, 0.9): [rule_id1] and (1, 0.8): [rule_id2],
            if (1, 0.8) cannot be satisfied, then rule_id1 cannot be satisfied either,
            so rule_id1 is also added to (1, 0.8).
            A set is used to avoid duplicates.
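The propagation described above amounts to a running union over ranges visited from tightest to loosest. The minimal sketch below shows the idea. It assumes that all ranges being compared share the same first key component (the 1 in (1, 0.9)), so it keys them by threshold alone. By default a larger threshold counts as tighter. The function name propagate_rules and the larger_is_tighter flag are hypothetical.

```python
def propagate_rules(ranges, larger_is_tighter=True):
    """ranges: {threshold: iterable of rule ids} for one feature.

    Returns {threshold: set of rule ids}, where every range also holds the
    rules attached to all tighter ranges.
    """
    result = {}
    accumulated = set()  # a set avoids duplicate rule ids
    for threshold in sorted(ranges, reverse=larger_is_tighter):
        accumulated |= set(ranges[threshold])
        result[threshold] = set(accumulated)  # snapshot for this range
    return result


# The docstring's example: (1, 0.9): [rule_id1] and (1, 0.8): [rule_id2].
print(propagate_rules({0.9: ["rule_id1"], 0.8: ["rule_id2"]}))
```

The looser range 0.8 ends up holding both rule_id1 and rule_id2, while 0.9 keeps only rule_id1. Taking a snapshot with set(accumulated) prevents later unions from leaking back into ranges that were already recorded.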

◆ visualize_graph()

simjoin_entitymatching.blocker.graph.TripartiteGraph.visualize_graph ( self)

Member Data Documentation

◆ bigraList1

simjoin_entitymatching.blocker.graph.TripartiteGraph.bigraList1 = []

◆ bigraList2

simjoin_entitymatching.blocker.graph.TripartiteGraph.bigraList2 = []

◆ feature_appeared

simjoin_entitymatching.blocker.graph.TripartiteGraph.feature_appeared = np.full((100000,), -1, dtype=int)

◆ feature_div

simjoin_entitymatching.blocker.graph.TripartiteGraph.feature_div = np.full((100000,), 0, dtype=int)

◆ feature_list

simjoin_entitymatching.blocker.graph.TripartiteGraph.feature_list = []

◆ feature_node [1/2]

list simjoin_entitymatching.blocker.graph.TripartiteGraph.feature_node = []

◆ feature_node [2/2]

int simjoin_entitymatching.blocker.graph.TripartiteGraph.feature_node = 0

◆ feature_range

list simjoin_entitymatching.blocker.graph.TripartiteGraph.feature_range = []

◆ INF

int simjoin_entitymatching.blocker.graph.TripartiteGraph.INF = 100000

◆ num_feature

simjoin_entitymatching.blocker.graph.TripartiteGraph.num_feature = 0

◆ num_rule

int simjoin_entitymatching.blocker.graph.TripartiteGraph.num_rule = 0

◆ num_tree [1/2]

int simjoin_entitymatching.blocker.graph.TripartiteGraph.num_tree = 0

◆ num_tree [2/2]

list simjoin_entitymatching.blocker.graph.TripartiteGraph.num_tree = [[], [], []]

◆ rule_list

list simjoin_entitymatching.blocker.graph.TripartiteGraph.rule_list = []

◆ rule_node

list simjoin_entitymatching.blocker.graph.TripartiteGraph.rule_node = []

◆ rule_signature

simjoin_entitymatching.blocker.graph.TripartiteGraph.rule_signature = defaultdict(int)

◆ rule_signature_trees

simjoin_entitymatching.blocker.graph.TripartiteGraph.rule_signature_trees = defaultdict(set)

◆ tree_node

list simjoin_entitymatching.blocker.graph.TripartiteGraph.tree_node = []

◆ trigraph

simjoin_entitymatching.blocker.graph.TripartiteGraph.trigraph = nx.Graph()

The documentation for this class was generated from the following file: