Public Member Functions
__init__ (self)
dump_files (self, path_table_A, path_table_B, path_gold)
format_and_check (self)
load_files (self, dir_path)
read_csv (self, path_table_A, path_table_B, path_gold, num_data=2)
load_golds (self, graph)
read_dir (self, dirname, graph)
Public Attributes
tableA = None
tableB = None
golds = None
int num_tables = 2
A dataset may be in CSV or Parquet format.
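Since a dataset may arrive in either format, a caller typically has to decide which reader to use before loading. The following is a minimal sketch of one way to do that, assuming the format can be inferred from the file extension. `detect_table_format` and `SUPPORTED_FORMATS` are hypothetical names used for illustration only; they are not part of `simjoin_entitymatching`.

```python
from pathlib import Path

# Hypothetical mapping from file extension to format name (an assumption,
# not taken from the Dataset class).
SUPPORTED_FORMATS = {".csv": "csv", ".parquet": "parquet"}


def detect_table_format(path):
    """Return "csv" or "parquet" based on the file extension of `path`."""
    suffix = Path(path).suffix.lower()
    try:
        return SUPPORTED_FORMATS[suffix]
    except KeyError:
        # Fail loudly rather than guessing a reader for an unknown format.
        raise ValueError(f"unsupported table format: {path!r}") from None
```

The returned name could then select the matching reader, for example a CSV reader or a Parquet reader.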
simjoin_entitymatching.utils.datasets.Dataset.__init__(self)
simjoin_entitymatching.utils.datasets.Dataset.dump_files(self, path_table_A, path_table_B, path_gold)
Flush files to buffer directory
simjoin_entitymatching.utils.datasets.Dataset.format_and_check(self)
simjoin_entitymatching.utils.datasets.Dataset.load_files(self, dir_path)
Load files
simjoin_entitymatching.utils.datasets.Dataset.load_golds(self, graph)
Load the ground truth into the graph in the RandomForest class.
simjoin_entitymatching.utils.datasets.Dataset.read_csv(self, path_table_A, path_table_B, path_gold, num_data=2)
simjoin_entitymatching.utils.datasets.Dataset.read_dir(self, dirname, graph)
Read the files from disk, then flush them back to disk in the buffer directory.
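The read-then-flush behaviour described for `read_dir` can be illustrated with a small standard-library sketch. It is not the library's implementation: the function name `read_then_flush`, the file names `table_a.csv`, `table_b.csv` and `gold.csv`, and the choice of CSV are all assumptions made for the example.

```python
import csv
from pathlib import Path


def read_then_flush(src_dir, buffer_dir,
                    names=("table_a.csv", "table_b.csv", "gold.csv")):
    """Read each CSV in `src_dir` and write it back out under `buffer_dir`.

    Hypothetical helper: the file names are placeholders, not names used
    by the Dataset class.
    """
    src_dir, buffer_dir = Path(src_dir), Path(buffer_dir)
    buffer_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in names:
        # Read the whole table into memory.
        with open(src_dir / name, newline="") as f:
            rows = list(csv.reader(f))
        # Flush it back to disk, this time inside the buffer directory.
        out_path = buffer_dir / name
        with open(out_path, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        written.append(out_path)
    return written
```

Parsing and rewriting each table, rather than copying the files byte for byte, leaves room for a formatting or validation step (such as the one suggested by `format_and_check`) between the read and the flush.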
simjoin_entitymatching.utils.datasets.Dataset.golds = None
int simjoin_entitymatching.utils.datasets.Dataset.num_tables = 2
simjoin_entitymatching.utils.datasets.Dataset.tableA = None
simjoin_entitymatching.utils.datasets.Dataset.tableB = None