Entity Matching by Similarity Join
 
simjoin_entitymatching.utils.datasets.Dataset Class Reference

Public Member Functions

 __init__ (self)
 
 dump_files (self, path_table_A, path_table_B, path_gold)
 
 format_and_check (self)
 
 load_files (self, dir_path)
 
 read_csv (self, path_table_A, path_table_B, path_gold, num_data=2)
 
 load_golds (self, graph)
 
 read_dir (self, dirname, graph)
 

Public Attributes

 tableA = None
 
 tableB = None
 
 golds = None
 
int num_tables = 2
 
 

Detailed Description

    A dataset may be in CSV or Parquet format.
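As a rough illustration of supporting both formats, the sketch below picks a pandas reader from the file extension. The helper name read_table is hypothetical and not part of simjoin_entitymatching; how Dataset itself selects the format is not documented on this page.

```python
from pathlib import Path

import pandas as pd


def read_table(path):
    """Hypothetical helper: load one table from a CSV or Parquet file."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix == ".parquet":
        # Needs a Parquet engine such as pyarrow to be installed.
        return pd.read_parquet(path)
    raise ValueError(f"unsupported table format: {suffix!r}")
```

Dispatching on the extension keeps callers free of format flags, at the cost of trusting file names; an explicit format argument would be an equally reasonable design.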

Constructor & Destructor Documentation

◆ __init__()

simjoin_entitymatching.utils.datasets.Dataset.__init__ ( self)

Member Function Documentation

◆ dump_files()

simjoin_entitymatching.utils.datasets.Dataset.dump_files ( self,
path_table_A,
path_table_B,
path_gold )
        Flush the files to the buffer directory.

◆ format_and_check()

simjoin_entitymatching.utils.datasets.Dataset.format_and_check ( self)

◆ load_files()

simjoin_entitymatching.utils.datasets.Dataset.load_files ( self,
dir_path )
        Load the dataset files from dir_path.

◆ load_golds()

simjoin_entitymatching.utils.datasets.Dataset.load_golds ( self,
graph )
        Load the ground truth into the graph in the RandomForest class.

◆ read_csv()

simjoin_entitymatching.utils.datasets.Dataset.read_csv ( self,
path_table_A,
path_table_B,
path_gold,
num_data = 2 )

◆ read_dir()

simjoin_entitymatching.utils.datasets.Dataset.read_dir ( self,
dirname,
graph )
        Read the files from disk, then write them back to disk in the buffer directory.
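The read-then-flush pattern described for read_dir can be sketched as follows. This is only an assumption-laden illustration: the function copy_tables_to_buffer, the file names table_a.csv, table_b.csv and gold.csv, and the CSV-only handling are all hypothetical, and the graph argument of the real method is left out.

```python
from pathlib import Path

import pandas as pd


def copy_tables_to_buffer(src_dir, buffer_dir,
                          names=("table_a.csv", "table_b.csv", "gold.csv")):
    """Hypothetical: read each CSV in src_dir and rewrite it under buffer_dir."""
    src_dir, buffer_dir = Path(src_dir), Path(buffer_dir)
    buffer_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in names:
        frame = pd.read_csv(src_dir / name)   # read from disk
        out = buffer_dir / name
        frame.to_csv(out, index=False)        # flush back to the buffer directory
        written.append(out)
    return written
```

Round-tripping through a DataFrame, instead of copying the files byte for byte, leaves room for normalizing columns before the buffered copy is written.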

Member Data Documentation

◆ golds

simjoin_entitymatching.utils.datasets.Dataset.golds = None

◆ num_tables

int simjoin_entitymatching.utils.datasets.Dataset.num_tables = 2

◆ tableA

simjoin_entitymatching.utils.datasets.Dataset.tableA = None

◆ tableB

simjoin_entitymatching.utils.datasets.Dataset.tableB = None

The documentation for this class was generated from the following file: