Utils
Datasets loaders
- torchkge.utils.datasets.load_fb13(data_home=None)[source]
Load FB13 dataset.
- Parameters
data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.
- Returns
kg_train (torchkge.data_structures.KnowledgeGraph)
kg_val (torchkge.data_structures.KnowledgeGraph)
kg_test (torchkge.data_structures.KnowledgeGraph)
- torchkge.utils.datasets.load_fb15k(data_home=None)[source]
Load FB15k dataset. The dataset was originally presented in the paper by Bordes et al.
- Parameters
data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.
- Returns
kg_train (torchkge.data_structures.KnowledgeGraph)
kg_val (torchkge.data_structures.KnowledgeGraph)
kg_test (torchkge.data_structures.KnowledgeGraph)
- torchkge.utils.datasets.load_fb15k237(data_home=None)[source]
Load FB15k237 dataset. The dataset was originally presented in the paper by Toutanova et al.
- Parameters
data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.
- Returns
kg_train (torchkge.data_structures.KnowledgeGraph)
kg_val (torchkge.data_structures.KnowledgeGraph)
kg_test (torchkge.data_structures.KnowledgeGraph)
- torchkge.utils.datasets.load_wn18(data_home=None)[source]
Load WN18 dataset.
- Parameters
data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.
- Returns
kg_train (torchkge.data_structures.KnowledgeGraph)
kg_val (torchkge.data_structures.KnowledgeGraph)
kg_test (torchkge.data_structures.KnowledgeGraph)
- torchkge.utils.datasets.load_wn18rr(data_home=None)[source]
Load WN18RR dataset. The dataset was originally presented in the paper by Dettmers et al.
- Parameters
data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.
- Returns
kg_train (torchkge.data_structures.KnowledgeGraph)
kg_val (torchkge.data_structures.KnowledgeGraph)
kg_test (torchkge.data_structures.KnowledgeGraph)
- torchkge.utils.datasets.load_yago3_10(data_home=None)[source]
Load YAGO3-10 dataset. The dataset was originally presented in the paper by Dettmers et al.
- Parameters
data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.
- Returns
kg_train (torchkge.data_structures.KnowledgeGraph)
kg_val (torchkge.data_structures.KnowledgeGraph)
kg_test (torchkge.data_structures.KnowledgeGraph)
- torchkge.utils.datasets.load_wikidatasets(which, limit_=0, data_home=None)[source]
Load WikiDataSets dataset. The dataset was originally presented in the paper by Boschin et al.
- Parameters
which (str) – String indicating which subset of Wikidata should be loaded. Available ones are humans, companies, animals, countries and films.
limit_ (int, optional (default=0)) – Lower limit on the number of neighbors an entity must have in the graph to be kept.
data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.
- Returns
kg
- Return type
torchkge.data_structures.KnowledgeGraph
- torchkge.utils.datasets.load_wikidata_vitals(level=5, data_home=None)[source]
Load knowledge graph extracted from Wikidata using the entities corresponding to Wikipedia pages contained in Wikivitals. See here for details on Wikivitals and Wikivitals+ datasets.
- Parameters
level (int (default=5)) – Either 4 or 5.
data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.
- Returns
kg (torchkge.data_structures.KnowledgeGraph)
kg_attr (torchkge.data_structures.KnowledgeGraph)
Pre-trained models
TransE model
Model | Dataset | Dimension | Test MRR | Filtered Test MRR |
---|---|---|---|---|
TransE | FB15k | 100 | 0.250 | 0.420 |
TransE | FB15k237 | 150 | 0.187 | 0.287 |
TransE | WDV5 | 150 | 0.258 | 0.305 |
TransE | WN18RR | 100 | 0.201 | 0.236 |
TransE | Yago3-10 | 200 | 0.143 | 0.261 |
- torchkge.utils.pretrained_models.load_pretrained_transe(dataset, emb_dim, data_home=None)[source]
Load a pretrained version of TransE model.
- Parameters
dataset (str) –
emb_dim (int) – Embedding dimension
data_home (str (opt, default None)) – Path to the torchkge_data directory (containing data folders). Useful for pre-trained model loading.
- Returns
model – Pretrained version of TransE model.
- Return type
torchkge.models.translation.TransEModel
ComplEx Model
Model | Dataset | Dimension | Test MRR | Filtered Test MRR |
---|---|---|---|---|
ComplEx | FB15k237 | 200 | 0.180 | 0.308 |
ComplEx | WN18RR | 200 | 0.290 | 0.455 |
ComplEx | WDV5 | 200 | 0.283 | 0.371 |
- torchkge.utils.pretrained_models.load_pretrained_complex(dataset, emb_dim, data_home=None)[source]
Load a pretrained version of ComplEx model.
- Parameters
dataset (str) –
emb_dim (int) – Embedding dimension
data_home (str (opt, default None)) – Path to the torchkge_data directory (containing data folders). Useful for pre-trained model loading.
- Returns
model – Pretrained version of ComplEx model.
- Return type
torchkge.models.bilinear.ComplExModel
Data redundancy
- torchkge.utils.data_redundancy.duplicates(kg_tr, kg_val, kg_te, theta1=0.8, theta2=0.8, verbose=False, counts=False, reverses=None)[source]
Return the duplicate and reverse duplicate relations as explained in the paper by Akrami et al.
References
Farahnaz Akrami, Mohammed Samiul Saeef, Qingheng Zhang. Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study. SIGMOD’20, June 14–19, 2020, Portland, OR, USA
- Parameters
kg_tr (torchkge.data_structures.KnowledgeGraph) – Train set
kg_val (torchkge.data_structures.KnowledgeGraph) – Validation set
kg_te (torchkge.data_structures.KnowledgeGraph) – Test set
theta1 (float) – First threshold (see paper).
theta2 (float) – Second threshold (see paper).
verbose (bool) –
counts (bool) – Should the triplets involving (reverse) duplicate relations be counted in all sets.
reverses (list) – List of known reverse relations.
- Returns
duplicates (list) – List of pairs giving duplicate relations.
rev_duplicates (list) – List of pairs giving reverse duplicate relations.
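The role of the two thresholds can be sketched in plain Python. This is only an illustration under an assumed reading of Akrami et al.: two relations r1 and r2 count as duplicates when the (head, tail) pairs they share make up more than theta1 of r1's pairs and more than theta2 of r2's pairs, and as reverse duplicates when the same holds against the reversed pairs of r2. The helper names `pairs_by_relation` and `find_duplicates` are hypothetical and work on raw triplets, not on KnowledgeGraph objects.

```python
from collections import defaultdict
from itertools import combinations


def pairs_by_relation(triplets):
    """Group (head, tail) pairs by relation from (head, relation, tail) triplets."""
    pairs = defaultdict(set)
    for h, r, t in triplets:
        pairs[r].add((h, t))
    return pairs


def find_duplicates(triplets, theta1=0.8, theta2=0.8):
    """Return (duplicates, rev_duplicates) as lists of relation pairs.

    Assumed criterion: overlap ratios must be strictly above the thresholds.
    """
    pairs = pairs_by_relation(triplets)
    duplicates, rev_duplicates = [], []
    for r1, r2 in combinations(sorted(pairs), 2):
        t1, t2 = pairs[r1], pairs[r2]
        reversed_t2 = {(t, h) for h, t in t2}
        common = len(t1 & t2)
        if common / len(t1) > theta1 and common / len(t2) > theta2:
            duplicates.append((r1, r2))
        common_rev = len(t1 & reversed_t2)
        if common_rev / len(t1) > theta1 and common_rev / len(t2) > theta2:
            rev_duplicates.append((r1, r2))
    return duplicates, rev_duplicates


# Toy graph: "born_in" and "birthplace" share all pairs,
# "birthplace_of" holds the same pairs reversed.
toy = [
    ("a", "born_in", "x"), ("b", "born_in", "y"),
    ("a", "birthplace", "x"), ("b", "birthplace", "y"),
    ("x", "birthplace_of", "a"), ("y", "birthplace_of", "b"),
]
dups, rev_dups = find_duplicates(toy)
```

On this toy graph, `dups` pairs "birthplace" with "born_in", while `rev_dups` pairs "birthplace_of" with each of the other two relations.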
- torchkge.utils.data_redundancy.count_triplets(kg1, kg2, duplicates, rev_duplicates)[source]
- Parameters
duplicates (list) – List returned by torchkge.utils.data_redundancy.duplicates.
rev_duplicates (list) – List returned by torchkge.utils.data_redundancy.duplicates.
- Returns
n_duplicates (int) – Number of triplets in kg2 that have their duplicate triplet in kg1.
n_rev_duplicates (int) – Number of triplets in kg2 that have their reverse duplicate triplet in kg1.
- torchkge.utils.data_redundancy.cartesian_product_relations(kg_tr, kg_val, kg_te, theta=0.8)[source]
Return the cartesian product relations as explained in the paper by Akrami et al.
References
Farahnaz Akrami, Mohammed Samiul Saeef, Qingheng Zhang. Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study. SIGMOD’20, June 14–19, 2020, Portland, OR, USA
- Parameters
kg_tr (torchkge.data_structures.KnowledgeGraph) – Train set
kg_val (torchkge.data_structures.KnowledgeGraph) – Validation set
kg_te (torchkge.data_structures.KnowledgeGraph) – Test set
theta (float) – Threshold used to compute the cartesian product relations.
- Returns
selected_relations – List of indices of the relations that are cartesian product relations (see paper for details).
- Return type
list
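A minimal sketch of what the threshold might measure, assuming the density criterion from Akrami et al.: a relation is treated as a cartesian product relation when its number of distinct (head, tail) pairs divided by (number of distinct heads × number of distinct tails) exceeds theta. The function name `find_cartesian_relations` is hypothetical and operates on raw triplets rather than KnowledgeGraph objects.

```python
from collections import defaultdict


def find_cartesian_relations(triplets, theta=0.8):
    """Return relations whose (head, tail) pairs nearly fill heads x tails."""
    pairs = defaultdict(set)
    for h, r, t in triplets:
        pairs[r].add((h, t))
    selected = []
    for r in sorted(pairs):
        heads = {h for h, _ in pairs[r]}
        tails = {t for _, t in pairs[r]}
        # Fraction of the full heads x tails grid that is actually observed.
        density = len(pairs[r]) / (len(heads) * len(tails))
        if density > theta:
            selected.append(r)
    return selected


toy = [
    # Every team is linked to every month: density 4 / (2 * 2) = 1.0
    ("team1", "plays_in_month", "jan"), ("team1", "plays_in_month", "feb"),
    ("team2", "plays_in_month", "jan"), ("team2", "plays_in_month", "feb"),
    # Sparse relation: density 2 / (2 * 2) = 0.5
    ("a", "likes", "b"), ("c", "likes", "d"),
]
selected = find_cartesian_relations(toy)
```

Note that under this criterion a relation with a single triplet trivially has density 1, so a real implementation may want to handle very small relations separately.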
Dissimilarities
- torchkge.utils.dissimilarities.l1_dissimilarity(a, b)[source]
Compute dissimilarity between rows of a and b as \(||a-b||_1\).
- torchkge.utils.dissimilarities.l2_dissimilarity(a, b)[source]
Compute dissimilarity between rows of a and b as \(||a-b||_2^2\).
- torchkge.utils.dissimilarities.l1_torus_dissimilarity(a, b)[source]
See paper by Ebisu et al. for details about the definition of this dissimilarity function.
- torchkge.utils.dissimilarities.l2_torus_dissimilarity(a, b)[source]
See paper by Ebisu et al. for details about the definition of this dissimilarity function.
- torchkge.utils.dissimilarities.el2_torus_dissimilarity(a, b)[source]
See paper by Ebisu et al. for details about the definition of this dissimilarity function.
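For the first two functions of this section, the row-wise computation can be illustrated without PyTorch. In this sketch plain lists of lists stand in for the tensors `a` and `b`, and the names `l1_rows` and `l2_squared_rows` are hypothetical. The torus variants are not covered here, as their definitions are given in the paper by Ebisu et al.

```python
def l1_rows(a, b):
    """Row-wise L1 norm of a - b."""
    return [sum(abs(x - y) for x, y in zip(row_a, row_b))
            for row_a, row_b in zip(a, b)]


def l2_squared_rows(a, b):
    """Row-wise squared L2 norm of a - b (no square root is taken)."""
    return [sum((x - y) ** 2 for x, y in zip(row_a, row_b))
            for row_a, row_b in zip(a, b)]


a = [[1.0, 2.0], [0.0, 0.0]]
b = [[0.0, 4.0], [3.0, 4.0]]
l1 = l1_rows(a, b)          # [3.0, 7.0]
l2 = l2_squared_rows(a, b)  # [5.0, 25.0]
```

Both functions return one value per row, so two inputs with `n` rows give `n` dissimilarities.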
Losses
- class torchkge.utils.losses.MarginLoss(margin)[source]
Margin loss as it was defined in the TransE paper by Bordes et al. in 2013. This class implements the torch.nn.Module interface.
- forward(positive_triplets, negative_triplets)[source]
- Parameters
positive_triplets (torch.Tensor, dtype: torch.float, shape: (b_size)) – Scores of the true triplets as returned by the forward methods of the models.
negative_triplets (torch.Tensor, dtype: torch.float, shape: (b_size)) – Scores of the negative triplets as returned by the forward methods of the models.
- Returns
loss – Loss of the form \(\max\{0, \gamma - f(h,r,t) + f(h',r',t')\}\) where \(\gamma\) is the margin (defined at initialization), \(f(h,r,t)\) is the score of a true fact and \(f(h',r',t')\) is the score of the associated negative fact.
- Return type
torch.Tensor, shape: (n_facts, dim), dtype: torch.float
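The margin formula above can be checked on a few numbers. This is a plain-Python sketch of the per-pair term only. How the library reduces the terms (sum or mean) is not stated here, so the values are left unreduced, and the name `margin_terms` is hypothetical.

```python
def margin_terms(positive_scores, negative_scores, margin):
    """Per-pair hinge term max(0, margin - f(positive) + f(negative))."""
    return [max(0.0, margin - p + n)
            for p, n in zip(positive_scores, negative_scores)]


# First pair: 1.0 - 2.0 + 0.5 = -0.5, clipped to 0.0 (margin already satisfied).
# Second pair: 1.0 - 0.5 + 1.0 = 1.5 (negative fact scores higher than the true one).
terms = margin_terms([2.0, 0.5], [0.5, 1.0], margin=1.0)
```

A pair contributes nothing once the true fact outscores its negative by at least the margin.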
- class torchkge.utils.losses.LogisticLoss[source]
Logistic loss as it was defined in the TransE paper by Bordes et al. in 2013. This class implements the torch.nn.Module interface.
- forward(positive_triplets, negative_triplets)[source]
- Parameters
positive_triplets (torch.Tensor, dtype: torch.float, shape: (b_size)) – Scores of the true triplets as returned by the forward methods of the models.
negative_triplets (torch.Tensor, dtype: torch.float, shape: (b_size)) – Scores of the negative triplets as returned by the forward methods of the models.
- Returns
loss – Loss of the form \(\log(1 + \exp(\eta \times f(h,r,t)))\) where \(f(h,r,t)\) is the score of the fact and \(\eta\) is either 1 or -1 if the fact is true or false.
- Return type
torch.Tensor, shape: (n_facts, dim), dtype: torch.float
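The logistic term \(\log(1 + \exp(\eta \times f(h,r,t)))\) can be evaluated in plain Python as a softplus. In this sketch \(\eta\) is passed explicitly, because which sign goes with true facts depends on the score convention. The function names are hypothetical, and the stable softplus rewrite is an implementation choice, not something taken from the library.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logistic_term(score, eta):
    """Per-fact term log(1 + exp(eta * score)), with eta equal to 1 or -1."""
    return softplus(eta * score)


at_zero = logistic_term(0.0, 1)       # log(2), regardless of eta
small = logistic_term(5.0, -1)        # close to 0
large = logistic_term(5.0, 1)         # close to 5
```

The term is small when `eta * score` is very negative and grows roughly linearly when it is large and positive.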
- class torchkge.utils.losses.BinaryCrossEntropyLoss[source]
This class implements the torch.nn.Module interface.
- forward(positive_triplets, negative_triplets)[source]
- Parameters
positive_triplets (torch.Tensor, dtype: torch.float, shape: (b_size)) – Scores of the true triplets as returned by the forward methods of the models.
negative_triplets (torch.Tensor, dtype: torch.float, shape: (b_size)) – Scores of the negative triplets as returned by the forward methods of the models.
- Returns
loss – Loss of the form \(-\eta \cdot \log(f(h,r,t)) - (1-\eta) \cdot \log(1 - f(h,r,t))\) where \(f(h,r,t)\) is the score of the fact and \(\eta\) is 1 if the fact is true and 0 if it is false.
- Return type
torch.Tensor, shape: (n_facts, dim), dtype: torch.float
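As a sketch, the per-fact term of the standard binary cross-entropy, \(-[\eta \log f + (1-\eta) \log(1-f)]\), can be written in plain Python. It assumes the scores already lie in (0, 1), for instance after a sigmoid. The clamp `eps` and the name `bce_term` are additions made for this illustration only.

```python
import math


def bce_term(score, eta, eps=1e-12):
    """Per-fact binary cross-entropy; eta is 1 for a true fact, 0 for a false one."""
    # Clamp to avoid log(0); this guard is an assumption of the sketch.
    f = min(max(score, eps), 1.0 - eps)
    return -(eta * math.log(f) + (1 - eta) * math.log(1.0 - f))


true_confident = bce_term(0.9, 1)   # small: a true fact scored high
true_doubtful = bce_term(0.1, 1)    # large: a true fact scored low
false_confident = bce_term(0.1, 0)  # small: a false fact scored low
```

The term is always non-negative and approaches zero as the score approaches the target \(\eta\).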
Training wrappers
- class torchkge.utils.training.TrainDataLoader(kg, batch_size, sampling_type, use_cuda=None)[source]
Dataloader providing the training process with batches of true and negatively sampled facts.
- Parameters
kg (torchkge.data_structures.KnowledgeGraph) – Dataset to be divided in batches.
batch_size (int) – Size of the batches.
sampling_type (str) – Either ‘unif’ (uniform negative sampling) or ‘bern’ (Bernoulli negative sampling).
use_cuda (str (opt, default = None)) – Can be either None (no use of cuda at all), ‘all’ to move all the dataset to cuda and then split in batches or ‘batch’ to simply move the batches to cuda before they are returned.
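The two sampling types can be illustrated in plain Python. The sketch assumes the usual definitions: uniform sampling corrupts the head or the tail with probability 0.5, while Bernoulli sampling (as introduced with TransH) corrupts the head with probability tph / (tph + hpt), where tph is the average number of tails per head and hpt the average number of heads per tail for the relation. The function names are hypothetical and the code does not reflect the library's internals.

```python
import random
from collections import defaultdict


def bernoulli_head_probs(triplets):
    """Per-relation probability of corrupting the head: tph / (tph + hpt)."""
    tails_of = defaultdict(set)  # (relation, head) -> set of tails
    heads_of = defaultdict(set)  # (relation, tail) -> set of heads
    for h, r, t in triplets:
        tails_of[(r, h)].add(t)
        heads_of[(r, t)].add(h)
    probs = {}
    for r in {r for _, r, _ in triplets}:
        tail_counts = [len(v) for (rel, _), v in tails_of.items() if rel == r]
        head_counts = [len(v) for (rel, _), v in heads_of.items() if rel == r]
        tph = sum(tail_counts) / len(tail_counts)  # average tails per head
        hpt = sum(head_counts) / len(head_counts)  # average heads per tail
        probs[r] = tph / (tph + hpt)
    return probs


def corrupt(triplet, entities, head_prob, rng):
    """Replace the head with probability head_prob, otherwise the tail.

    The sampled entity is not checked against known facts in this sketch.
    """
    h, r, t = triplet
    if rng.random() < head_prob:
        return (rng.choice(entities), r, t)
    return (h, r, rng.choice(entities))


triplets = [("fr", "has_city", "paris"), ("fr", "has_city", "lyon"),
            ("fr", "has_city", "nice"), ("it", "has_city", "rome")]
entities = sorted({e for h, _, t in triplets for e in (h, t)})
rng = random.Random(0)
probs = bernoulli_head_probs(triplets)  # has_city: tph = 2, hpt = 1
uniform_negative = corrupt(triplets[0], entities, 0.5, rng)
bernoulli_negative = corrupt(triplets[0], entities, probs["has_city"], rng)
```

For a one-to-many relation such as `has_city`, the Bernoulli scheme corrupts the head more often, which lowers the chance of accidentally producing a true fact.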
- class torchkge.utils.training.Trainer(model, criterion, kg_train, n_epochs, batch_size, optimizer, sampling_type='bern', use_cuda=None)[source]
This class wraps a simple training procedure.
- Parameters
model (torchkge.models.interfaces.Model) – Model to be trained.
criterion – Criterion used to differentiate positive and negative scores. Can be an element of torchkge.utils.losses.
kg_train (torchkge.data_structures.KnowledgeGraph) – KG used for training.
n_epochs (int) – Number of epochs in the training procedure.
batch_size (int) – Size of the batches.
sampling_type (str) – Either ‘unif’ (uniform negative sampling) or ‘bern’ (Bernoulli negative sampling).
use_cuda (str (opt, default = None)) – Can be either None (no use of cuda at all), ‘all’ to move all the dataset to cuda and then split in batches or ‘batch’ to simply move the batches to cuda before they are returned.