Entity Matching by Similarity Join
 
simjoin_entitymatching.utils.datasets.DataSettings Class Reference

Public Member Functions

 __init__ (self)
 
 print_data_root (cls)
 
 print_supported_sim_funcs (cls)
 
 print_supported_tokenizers (cls)
 

Static Public Attributes

str dataroot = '/yout/data/root'
 
list str_gt_10w = ['name', 'title', 'description']
 
list str_bt_5w_10w = []
 
list str_bt_1w_5w = []
 
list str_eq_1w = ['brand', 'category']
 
list numeric = ['price']
 
list supported_sim_funcs = ["jaccard", "cosine", "dice", "overlap", "lev_dist", "exm", "anm"]
 
list supported_tokenizers = ["dlm", "qgm", "alphanumeric", "wspace"]
 

Detailed Description

    Settings for data: the dataset root path, attribute groups, and the supported similarity functions and tokenizers.
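
A minimal sketch of how the class might look, reconstructed only from the members listed in this reference. The attribute names and values are copied from the documentation; the method bodies and the `@classmethod` decorators are assumptions inferred from the `cls` parameter, since the reference documents signatures only.

```python
class DataSettings:
    """Settings for data (sketch reconstructed from the documented members)."""

    # Static attributes with the default values shown in this reference.
    dataroot = '/yout/data/root'
    str_gt_10w = ['name', 'title', 'description']
    str_bt_5w_10w = []
    str_bt_1w_5w = []
    str_eq_1w = ['brand', 'category']
    numeric = ['price']
    supported_sim_funcs = ["jaccard", "cosine", "dice", "overlap", "lev_dist", "exm", "anm"]
    supported_tokenizers = ["dlm", "qgm", "alphanumeric", "wspace"]

    def __init__(self):
        # Assumption: the constructor takes only self and sets nothing up.
        pass

    @classmethod
    def print_data_root(cls):
        # Assumed body: print the class-level data root.
        print(cls.dataroot)

    @classmethod
    def print_supported_sim_funcs(cls):
        # Assumed body: print the list of similarity function names.
        print(cls.supported_sim_funcs)

    @classmethod
    def print_supported_tokenizers(cls):
        # Assumed body: print the list of tokenizer names.
        print(cls.supported_tokenizers)


# Class methods can be called without creating an instance.
DataSettings.print_data_root()
```

Because the attributes are defined on the class rather than on instances, assigning `DataSettings.dataroot = ...` in this sketch changes the value seen by every caller.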

Constructor & Destructor Documentation

◆ __init__()

simjoin_entitymatching.utils.datasets.DataSettings.__init__ ( self)

Member Function Documentation

◆ print_data_root()

simjoin_entitymatching.utils.datasets.DataSettings.print_data_root ( cls)

◆ print_supported_sim_funcs()

simjoin_entitymatching.utils.datasets.DataSettings.print_supported_sim_funcs ( cls)

◆ print_supported_tokenizers()

simjoin_entitymatching.utils.datasets.DataSettings.print_supported_tokenizers ( cls)

Member Data Documentation

◆ dataroot

str simjoin_entitymatching.utils.datasets.DataSettings.dataroot = '/yout/data/root'
static

◆ numeric

list simjoin_entitymatching.utils.datasets.DataSettings.numeric = ['price']
static

◆ str_bt_1w_5w

list simjoin_entitymatching.utils.datasets.DataSettings.str_bt_1w_5w = []
static

◆ str_bt_5w_10w

list simjoin_entitymatching.utils.datasets.DataSettings.str_bt_5w_10w = []
static

◆ str_eq_1w

list simjoin_entitymatching.utils.datasets.DataSettings.str_eq_1w = ['brand', 'category']
static

◆ str_gt_10w

list simjoin_entitymatching.utils.datasets.DataSettings.str_gt_10w = ['name', 'title', 'description']
static

◆ supported_sim_funcs

list simjoin_entitymatching.utils.datasets.DataSettings.supported_sim_funcs = ["jaccard", "cosine", "dice", "overlap", "lev_dist", "exm", "anm"]
static

◆ supported_tokenizers

list simjoin_entitymatching.utils.datasets.DataSettings.supported_tokenizers = ["dlm", "qgm", "alphanumeric", "wspace"]
static
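
Since `supported_sim_funcs` and `supported_tokenizers` are static, they can be read directly from the class to validate a user-supplied name. The sketch below uses a stand-in class that mirrors only those two documented lists; `check_choice` is a hypothetical helper and is not part of the documented API.

```python
class DataSettings:
    # Stand-in holding only two of the documented static attributes.
    supported_sim_funcs = ["jaccard", "cosine", "dice", "overlap", "lev_dist", "exm", "anm"]
    supported_tokenizers = ["dlm", "qgm", "alphanumeric", "wspace"]


def check_choice(name, supported):
    """Hypothetical helper: return name if it is supported, else raise ValueError."""
    if name not in supported:
        raise ValueError(f"unsupported value {name!r}; expected one of {supported}")
    return name


# Validate a similarity function and a tokenizer against the documented lists.
sim_func = check_choice("jaccard", DataSettings.supported_sim_funcs)
tokenizer = check_choice("wspace", DataSettings.supported_tokenizers)
```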

The documentation for this class was generated from the following file: