Utils

Datasets loaders

torchkge.utils.datasets.load_fb13(data_home=None)

Load FB13 dataset.

Parameters

data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.

Returns

kg_train (torchkge.data_structures.KnowledgeGraph)
kg_val (torchkge.data_structures.KnowledgeGraph)
kg_test (torchkge.data_structures.KnowledgeGraph)
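The loaders that return three splits all share the calling convention shown above. As a minimal sketch, assuming torchkge is installed, they can be wrapped as follows; `load_splits` is a hypothetical helper introduced here for illustration and is not part of the library.

```python
def load_splits(loader, data_home=None):
    """Call a three-split loader and label the graphs it returns.

    `loader` is any callable with the documented signature
    loader(data_home=None) -> (kg_train, kg_val, kg_test).
    This helper is hypothetical and not part of torchkge.
    """
    kg_train, kg_val, kg_test = loader(data_home=data_home)
    return {"train": kg_train, "val": kg_val, "test": kg_test}


# Intended use (needs torchkge and, on the first call, network access):
#   from torchkge.utils.datasets import load_fb13
#   splits = load_splits(load_fb13)
```

Passing the loader as an argument keeps the same helper usable with `load_fb15k`, `load_wn18rr` and the other three-split loaders.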

torchkge.utils.datasets.load_fb15k(data_home=None)

Load FB15k dataset. See the paper by Bordes et al., which originally presented the dataset.

Parameters

data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.

Returns

kg_train (torchkge.data_structures.KnowledgeGraph)
kg_val (torchkge.data_structures.KnowledgeGraph)
kg_test (torchkge.data_structures.KnowledgeGraph)

torchkge.utils.datasets.load_fb15k237(data_home=None)

Load FB15k237 dataset. See the paper by Toutanova et al., which originally presented the dataset.

Parameters

data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.

Returns

kg_train (torchkge.data_structures.KnowledgeGraph)
kg_val (torchkge.data_structures.KnowledgeGraph)
kg_test (torchkge.data_structures.KnowledgeGraph)

torchkge.utils.datasets.load_wn18(data_home=None)

Load WN18 dataset.

Parameters

data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.

Returns

kg_train (torchkge.data_structures.KnowledgeGraph)
kg_val (torchkge.data_structures.KnowledgeGraph)
kg_test (torchkge.data_structures.KnowledgeGraph)

torchkge.utils.datasets.load_wn18rr(data_home=None)

Load WN18RR dataset. See the paper by Dettmers et al., which originally presented the dataset.

Parameters

data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.

Returns

kg_train (torchkge.data_structures.KnowledgeGraph)
kg_val (torchkge.data_structures.KnowledgeGraph)
kg_test (torchkge.data_structures.KnowledgeGraph)

torchkge.utils.datasets.load_yago3_10(data_home=None)

Load YAGO3-10 dataset. See the paper by Dettmers et al., which originally presented the dataset.

Parameters

data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.

Returns

kg_train (torchkge.data_structures.KnowledgeGraph)
kg_val (torchkge.data_structures.KnowledgeGraph)
kg_test (torchkge.data_structures.KnowledgeGraph)

torchkge.utils.datasets.load_wikidatasets(which, limit_=0, data_home=None)

Load WikiDataSets dataset. See the paper by Boschin et al., which originally presented the dataset.

Parameters

which (str) – String indicating which subset of Wikidata should be loaded. Available ones are humans, companies, animals, countries and films.
limit_ (int, optional (default=0)) – Lower limit on the number of neighbors an entity must have in the graph to be kept.
data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.

Returns

kg

Return type

torchkge.data_structures.KnowledgeGraph
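The effect of a lower limit on the number of neighbors can be illustrated in plain Python. This is only a sketch of the idea: the function name, the way neighbors are counted (distinct adjacent entities, in either direction) and the single-pass filtering are assumptions, not the library's implementation.

```python
from collections import defaultdict


def filter_by_neighbors(triplets, limit):
    """Keep triplets whose head and tail both have at least `limit` neighbors.

    Single pass: neighbor counts are computed once, on the input graph.
    """
    neighbors = defaultdict(set)
    for h, _, t in triplets:
        neighbors[h].add(t)
        neighbors[t].add(h)
    return [
        (h, r, t)
        for h, r, t in triplets
        if len(neighbors[h]) >= limit and len(neighbors[t]) >= limit
    ]


facts = [("a", "r", "b"), ("a", "r", "c"), ("b", "r", "c"), ("c", "r", "d")]
kept = filter_by_neighbors(facts, limit=2)
print(kept)  # [('a', 'r', 'b'), ('a', 'r', 'c'), ('b', 'r', 'c')]
```

Entity `d` has a single neighbor, so the only fact involving it is dropped; with `limit=0` every fact is kept.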

torchkge.utils.datasets.load_wikidata_vitals(level=5, data_home=None)

Load the knowledge graph extracted from Wikidata using the entities corresponding to the Wikipedia pages contained in Wikivitals. See the documentation of the Wikivitals and Wikivitals+ datasets for details.

Parameters

level (int (default=5)) – Either 4 or 5.
data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.

Returns

kg (torchkge.data_structures.KnowledgeGraph)
kg_attr (torchkge.data_structures.KnowledgeGraph)

Pre-trained models

TransE model

Model	Dataset	Dimension	Test MRR	Filtered Test MRR
TransE	FB15k	100	0.250	0.420
TransE	FB15k237	150	0.187	0.287
TransE	WDV5	150	0.258	0.305
TransE	WN18RR	100	0.201	0.236
TransE	Yago3-10	200	0.143	0.261
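The tables in this section report a raw and a filtered test MRR (mean reciprocal rank). The sketch below assumes the usual link-prediction convention, in which the filtered rank ignores candidates that are themselves known true answers; the function names and the toy scores are illustrative and not taken from the library.

```python
def reciprocal_rank(scores, target, known_true=()):
    """Reciprocal rank of `target` among candidates (higher score = better).

    Candidates listed in `known_true` (other correct answers) are ignored,
    which gives the filtered setting; leave it empty for the raw setting.
    """
    ignored = set(known_true) - {target}
    better = sum(
        1 for cand, s in scores.items()
        if cand not in ignored and s > scores[target]
    )
    return 1.0 / (better + 1)


def mean_reciprocal_rank(queries, filtered=False):
    """Average reciprocal rank over (scores, target, known_true) queries."""
    ranks = [
        reciprocal_rank(s, t, k if filtered else ())
        for s, t, k in queries
    ]
    return sum(ranks) / len(ranks)


# One tail-prediction query: "paris" is the test answer and "lyon" is
# another true tail already seen in the training set.
query = ({"lyon": 0.9, "paris": 0.8, "rome": 0.1}, "paris", {"lyon"})
raw_mrr = mean_reciprocal_rank([query])                      # 0.5
filtered_mrr = mean_reciprocal_rank([query], filtered=True)  # 1.0
```

Because filtering can only remove competitors, the filtered MRR is never lower than the raw one, which matches the pattern in the tables.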

torchkge.utils.pretrained_models.load_pretrained_transe(dataset, emb_dim=None, data_home=None)

Load a pretrained version of TransE model.

Parameters

dataset (str) – Name of the dataset on which the model was pretrained (see the table above).
emb_dim (int (opt, default None)) – Embedding dimension
data_home (str (opt, default None)) – Path to the torchkge_data directory (containing data folders). Useful for pre-trained model loading.

Returns

model – Pretrained version of TransE model.

Return type

torchkge.models.translation.TransEModel

RESCAL Model

Model	Dataset	Dimension	Test MRR	Filtered Test MRR
RESCAL	FB15k237	200	0.180	0.307
RESCAL	WN18RR	150	0.273	0.424
RESCAL	Yago3-10	200	0.127	0.334

torchkge.utils.pretrained_models.load_pretrained_rescal(dataset, emb_dim=None, data_home=None)

Load a pretrained version of RESCAL model.

Parameters

dataset (str) – Name of the dataset on which the model was pretrained (see the table above).
emb_dim (int (opt, default None)) – Embedding dimension
data_home (str (opt, default None)) – Path to the torchkge_data directory (containing data folders). Useful for pre-trained model loading.

Returns

model – Pretrained version of RESCAL model.

Return type

torchkge.models.bilinear.RESCALModel

ComplEx Model

Model	Dataset	Dimension	Test MRR	Filtered Test MRR
ComplEx	FB15k237	200	0.180	0.308
ComplEx	WN18RR	200	0.290	0.455
ComplEx	WDV5	200	0.283	0.371
ComplEx	Yago3-10	200	0.164	0.421

torchkge.utils.pretrained_models.load_pretrained_complex(dataset, emb_dim=None, data_home=None)

Load a pretrained version of ComplEx model.

Parameters

dataset (str) – Name of the dataset on which the model was pretrained (see the table above).
emb_dim (int (opt, default None)) – Embedding dimension
data_home (str (opt, default None)) – Path to the torchkge_data directory (containing data folders). Useful for pre-trained model loading.

Returns

model – Pretrained version of ComplEx model.

Return type

torchkge.models.bilinear.ComplExModel

Data redundancy

torchkge.utils.data_redundancy.duplicates(kg_tr, kg_val, kg_te, theta1=0.8, theta2=0.8, verbose=False, counts=False, reverses=None)

Return the duplicate and reverse duplicate relations, as explained in the paper by Akrami et al.

References

Farahnaz Akrami, Mohammed Samiul Saeef, Qingheng Zhang, Wei Hu, Chengkai Li. Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study. SIGMOD’20, June 14–19, 2020, Portland, OR, USA

Parameters

kg_tr (torchkge.data_structures.KnowledgeGraph) – Train set
kg_val (torchkge.data_structures.KnowledgeGraph) – Validation set
kg_te (torchkge.data_structures.KnowledgeGraph) – Test set
theta1 (float) – First threshold (see paper).
theta2 (float) – Second threshold (see paper).
verbose (bool) –
counts (bool) – Whether the triplets involving (reverse) duplicate relations should be counted in all sets.
reverses (list) – List of known reverse relations.

Returns

duplicates (list) – List of pairs giving duplicate relations.
rev_duplicates (list) – List of pairs giving reverse duplicate relations.
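The role of the two thresholds can be sketched in plain Python. The sketch assumes the overlap criterion commonly attributed to Akrami et al.: two relations are duplicates when the share of (head, tail) pairs they have in common exceeds `theta1` relative to the first relation and `theta2` relative to the second, and reverse duplicates when the same holds after swapping head and tail in the second relation. The function name, the flat list of triplets and the strict inequalities are assumptions, not the library's code.

```python
from collections import defaultdict


def find_duplicates(triplets, theta1=0.8, theta2=0.8):
    """Return (duplicates, rev_duplicates) as lists of relation pairs."""
    pairs = defaultdict(set)
    for h, r, t in triplets:
        pairs[r].add((h, t))

    duplicates, rev_duplicates = [], []
    relations = sorted(pairs)
    for i, r1 in enumerate(relations):
        for r2 in relations[i + 1:]:
            p1, p2 = pairs[r1], pairs[r2]
            # Share of identical (head, tail) pairs.
            same = len(p1 & p2)
            if same / len(p1) > theta1 and same / len(p2) > theta2:
                duplicates.append((r1, r2))
            # Same test with the pairs of r2 reversed.
            flipped = len(p1 & {(t, h) for h, t in p2})
            if flipped / len(p1) > theta1 and flipped / len(p2) > theta2:
                rev_duplicates.append((r1, r2))
    return duplicates, rev_duplicates


facts = [
    (0, "born_in", 1), (2, "born_in", 3),
    (0, "birthplace", 1), (2, "birthplace", 3),
    (1, "birthplace_of", 0), (3, "birthplace_of", 2),
]
dups, rev_dups = find_duplicates(facts)
print(dups)      # [('birthplace', 'born_in')]
print(rev_dups)  # [('birthplace', 'birthplace_of'), ('birthplace_of', 'born_in')]
```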

torchkge.utils.data_redundancy.count_triplets(kg1, kg2, duplicates, rev_duplicates)

Parameters

kg1 (torchkge.data_structures.KnowledgeGraph) –
kg2 (torchkge.data_structures.KnowledgeGraph) –
duplicates (list) – List returned by torchkge.utils.data_redundancy.duplicates.
rev_duplicates (list) – List returned by torchkge.utils.data_redundancy.duplicates.

Returns

n_duplicates (int) – Number of triplets in kg2 that have their duplicate triplet in kg1.
n_rev_duplicates (int) – Number of triplets in kg2 that have their reverse duplicate triplet in kg1.
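The two counts can be illustrated on plain lists of triplets. This is a sketch under assumptions: a fact (h, r, t) is taken to have its duplicate in the first set when (h, r', t) is present for some relation r' paired with r, and its reverse duplicate when (t, r', h) is present. The function name and the data layout are illustrative only.

```python
def count_redundant(facts1, facts2, duplicates, rev_duplicates):
    """Count facts of `facts2` whose (reverse) duplicate is in `facts1`."""
    known = set(facts1)

    def partners(rel, pairs):
        # Relations paired with `rel`, whichever side of the pair it is on.
        return [b if a == rel else a for a, b in pairs if rel in (a, b)]

    n_duplicates = sum(
        any((h, r2, t) in known for r2 in partners(r, duplicates))
        for h, r, t in facts2
    )
    n_rev_duplicates = sum(
        any((t, r2, h) in known for r2 in partners(r, rev_duplicates))
        for h, r, t in facts2
    )
    return n_duplicates, n_rev_duplicates


train = [(0, "born_in", 1), (1, "birthplace_of", 0)]
test = [(0, "birthplace", 1), (4, "birthplace", 5)]
counts = count_redundant(
    train, test,
    duplicates=[("birthplace", "born_in")],
    rev_duplicates=[("birthplace", "birthplace_of")],
)
print(counts)  # (1, 1)
```

Only the first test fact is answered by a training fact through a duplicate or reverse duplicate relation, which is the kind of leakage these counts are meant to expose.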

torchkge.utils.data_redundancy.cartesian_product_relations(kg_tr, kg_val, kg_te, theta=0.8)

Return the cartesian product relations, as explained in the paper by Akrami et al.

References

Farahnaz Akrami, Mohammed Samiul Saeef, Qingheng Zhang, Wei Hu, Chengkai Li. Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study. SIGMOD’20, June 14–19, 2020, Portland, OR, USA

Parameters

kg_tr (torchkge.data_structures.KnowledgeGraph) – Train set
kg_val (torchkge.data_structures.KnowledgeGraph) – Validation set
kg_te (torchkge.data_structures.KnowledgeGraph) – Test set
theta (float) – Threshold used to compute the cartesian product relations.

Returns

selected_relations – List of the indices of the relations that are cartesian product relations (see paper for details).

Return type

list
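A plain-Python sketch of the threshold test follows. It assumes the density criterion commonly attributed to the paper: a relation is flagged when its number of distinct (head, tail) pairs, divided by the number of its distinct heads times the number of its distinct tails, exceeds `theta`. The function name and the flat list of triplets are assumptions for illustration.

```python
from collections import defaultdict


def cartesian_relations(triplets, theta=0.8):
    """Return relations r with |pairs_r| / (|heads_r| * |tails_r|) > theta."""
    heads, tails, pairs = defaultdict(set), defaultdict(set), defaultdict(set)
    for h, r, t in triplets:
        heads[r].add(h)
        tails[r].add(t)
        pairs[r].add((h, t))
    return [
        r for r in sorted(pairs)
        if len(pairs[r]) / (len(heads[r]) * len(tails[r])) > theta
    ]


facts = [
    # Relation 0 links every head to every tail: density 4 / (2 * 2) = 1.0
    ("a", 0, "x"), ("a", 0, "y"), ("b", 0, "x"), ("b", 0, "y"),
    # Relation 1 is sparse: density 2 / (2 * 2) = 0.5
    ("a", 1, "b"), ("c", 1, "d"),
]
selected = cartesian_relations(facts)
print(selected)  # [0]
```

With this bare ratio, a relation holding a single fact trivially has density 1, so a practical implementation may need extra care with very small relations.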

Dissimilarities

torchkge.utils.dissimilarities.l1_dissimilarity(a, b): Compute dissimilarity between rows of a and b as \(||a-b||_1\).

torchkge.utils.dissimilarities.l2_dissimilarity(a, b): Compute dissimilarity between rows of a and b as \(||a-b||_2^2\).

torchkge.utils.dissimilarities.l1_torus_dissimilarity(a, b): See the paper by Ebisu et al. for the definition of this dissimilarity function.

torchkge.utils.dissimilarities.l2_torus_dissimilarity(a, b): See the paper by Ebisu et al. for the definition of this dissimilarity function.

torchkge.utils.dissimilarities.el2_torus_dissimilarity(a, b): See the paper by Ebisu et al. for the definition of this dissimilarity function.
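The first two dissimilarities can be checked by hand on small inputs. The sketch below uses plain Python lists instead of tensors and illustrative function names; note that the second one is the squared L2 norm, so no square root is taken.

```python
def l1_rows(a, b):
    """Row-wise L1 distance between two equally shaped lists of rows."""
    return [sum(abs(x - y) for x, y in zip(ra, rb)) for ra, rb in zip(a, b)]


def squared_l2_rows(a, b):
    """Row-wise squared L2 distance (no square root is taken)."""
    return [sum((x - y) ** 2 for x, y in zip(ra, rb)) for ra, rb in zip(a, b)]


a = [[0.0, 0.0], [1.0, 2.0]]
b = [[3.0, 4.0], [1.0, 0.0]]
print(l1_rows(a, b))          # [7.0, 2.0]
print(squared_l2_rows(a, b))  # [25.0, 4.0]
```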

Losses

class torchkge.utils.losses.MarginLoss(margin)

Margin loss as defined in the TransE paper by Bordes et al. (2013). This class implements the torch.nn.Module interface.

forward(positive_triplets, negative_triplets)

Parameters

positive_triplets (torch.Tensor, dtype: torch.float, shape: (b_size)) – Scores of the true triplets as returned by the forward methods of the models.
negative_triplets (torch.Tensor, dtype: torch.float, shape: (b_size)) – Scores of the negative triplets as returned by the forward methods of the models.

Returns

loss – Loss of the form \(\max\{0, \gamma - f(h,r,t) + f(h',r',t')\}\) where \(\gamma\) is the margin (defined at initialization), \(f(h,r,t)\) is the score of a true fact and \(f(h',r',t')\) is the score of the associated negative fact.

Return type

torch.Tensor, shape: (n_facts, dim), dtype: torch.float
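The margin formula above can be evaluated term by term in plain Python. This sketch uses an illustrative function name and returns one hinge term per (positive, negative) pair; how the library reduces these terms into a single value is not covered here.

```python
def margin_loss(positive_scores, negative_scores, margin):
    """Per-pair hinge terms max(0, margin - f_pos + f_neg)."""
    return [
        max(0.0, margin - pos + neg)
        for pos, neg in zip(positive_scores, negative_scores)
    ]


# Scores follow the "higher is better" convention implied by the formula.
terms = margin_loss([2.0, 0.5], [0.0, 1.0], margin=1.0)
print(terms)  # [0.0, 1.5]
```

The first pair already separates the true fact from the negative one by more than the margin, so it contributes nothing; the second pair is ranked the wrong way round and is penalized.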

class torchkge.utils.losses.LogisticLoss

Logistic loss as defined in the TransE paper by Bordes et al. (2013). This class implements the torch.nn.Module interface.

forward(positive_triplets, negative_triplets)

Parameters

positive_triplets (torch.Tensor, dtype: torch.float, shape: (b_size)) – Scores of the true triplets as returned by the forward methods of the models.
negative_triplets (torch.Tensor, dtype: torch.float, shape: (b_size)) – Scores of the negative triplets as returned by the forward methods of the models.

Returns

loss – Loss of the form \(\log(1 + \exp(\eta \times f(h,r,t)))\) where \(f(h,r,t)\) is the score of the fact and \(\eta\) is either 1 or -1 if the fact is true or false.

Return type

torch.Tensor, shape: (n_facts, dim), dtype: torch.float
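The expression \(\log(1 + \exp(x))\) overflows for large \(x\) when computed naively. The sketch below evaluates the per-fact logistic terms with a numerically stable formulation; the function names are illustrative, and the sign \(\eta\) is simply passed in by the caller rather than derived from the fact's label.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logistic_terms(scores, etas):
    """Per-fact terms log(1 + exp(eta * f)) for scores f and signs eta."""
    return [softplus(eta * f) for f, eta in zip(scores, etas)]


# The same score contributes a large or a small term depending on eta.
terms = logistic_terms([2.0, 2.0], [1, -1])
```

The stable form relies on the identity log(1 + exp(x)) = max(x, 0) + log(1 + exp(-|x|)), which never exponentiates a positive number.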

class torchkge.utils.losses.BinaryCrossEntropyLoss

This class implements the torch.nn.Module interface.

forward(positive_triplets, negative_triplets)

Parameters

positive_triplets (torch.Tensor, dtype: torch.float, shape: (b_size)) – Scores of the true triplets as returned by the forward methods of the models.
negative_triplets (torch.Tensor, dtype: torch.float, shape: (b_size)) – Scores of the negative triplets as returned by the forward methods of the models.

Returns

loss – Loss of the form \(-\eta \cdot \log(f(h,r,t)) - (1-\eta) \cdot \log(1 - f(h,r,t))\) where \(f(h,r,t)\) is the score of the fact and \(\eta\) is either 1 or 0 if the fact is true or false.

Return type

torch.Tensor, shape: (n_facts, dim), dtype: torch.float
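As a sketch, the standard binary cross-entropy, \(-[\eta \log f + (1-\eta) \log(1-f)]\), can be evaluated per fact in plain Python. It assumes the scores are probabilities in (0, 1), for example the output of a sigmoid; the function name and the clipping constant `eps` are illustrative choices, not library parameters.

```python
import math


def bce_terms(probabilities, labels, eps=1e-12):
    """Per-fact terms -(eta * log(f) + (1 - eta) * log(1 - f))."""
    terms = []
    for f, eta in zip(probabilities, labels):
        f = min(max(f, eps), 1.0 - eps)  # keep log() away from 0
        terms.append(-(eta * math.log(f) + (1 - eta) * math.log(1.0 - f)))
    return terms


# A true fact (eta = 1) scored 0.9 costs less than one scored 0.1.
confident, hesitant = bce_terms([0.9, 0.1], [1, 1])
```

Both terms of the sum carry a negative sign, so every per-fact value is non-negative and shrinks as the predicted probability approaches the label.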

Training wrappers

class torchkge.utils.training.TrainDataLoader(kg, batch_size, sampling_type, use_cuda=None)

Dataloader providing the training process with batches of true and negatively sampled facts.

Parameters

kg (torchkge.data_structures.KnowledgeGraph) – Dataset to be divided in batches.
batch_size (int) – Size of the batches.
sampling_type (str) – Either ‘unif’ (uniform negative sampling) or ‘bern’ (Bernoulli negative sampling).
use_cuda (str (opt, default = None)) – Can be either None (no use of cuda at all), ‘all’ to move all the dataset to cuda and then split in batches or ‘batch’ to simply move the batches to cuda before they are returned.
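The difference between the two sampling types can be sketched in plain Python. Uniform sampling corrupts the head or the tail with equal probability, whereas Bernoulli sampling (Wang et al., 2014) corrupts the head with probability tph / (tph + hpt), where tph is the mean number of tails per head and hpt the mean number of heads per tail for the relation. The function names and data layout below are assumptions for illustration and do not reflect the library's samplers.

```python
import random
from collections import defaultdict


def head_replacement_probs(triplets):
    """Per relation, tph / (tph + hpt): the Bernoulli probability of
    corrupting the head rather than the tail."""
    tails_of = defaultdict(set)  # (relation, head) -> tails
    heads_of = defaultdict(set)  # (relation, tail) -> heads
    for h, r, t in triplets:
        tails_of[(r, h)].add(t)
        heads_of[(r, t)].add(h)

    probs = {}
    for r in {rel for _, rel, _ in triplets}:
        t_counts = [len(v) for (rel, _), v in tails_of.items() if rel == r]
        h_counts = [len(v) for (rel, _), v in heads_of.items() if rel == r]
        tph = sum(t_counts) / len(t_counts)  # mean tails per head
        hpt = sum(h_counts) / len(h_counts)  # mean heads per tail
        probs[r] = tph / (tph + hpt)
    return probs


def corrupt(triplet, entities, p_head=0.5, rng=random):
    """Replace the head (with probability p_head) or the tail by another entity."""
    h, r, t = triplet
    if rng.random() < p_head:
        return rng.choice([e for e in entities if e != h]), r, t
    return h, r, rng.choice([e for e in entities if e != t])


# One head with three tails: tph = 3, hpt = 1, so heads are replaced
# three times out of four on average.
facts = [(0, "r", 1), (0, "r", 2), (0, "r", 3)]
probs = head_replacement_probs(facts)
print(probs)  # {'r': 0.75}

rng = random.Random(0)
negative = corrupt(facts[0], entities=range(4), p_head=probs["r"], rng=rng)
```

Leaving `p_head` at 0.5 gives the uniform variant. Neither variant in this sketch checks whether the corrupted triplet happens to be a true fact.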

class torchkge.utils.training.Trainer(model, criterion, kg_train, n_epochs, batch_size, optimizer, sampling_type='bern', use_cuda=None)

This class wraps a simple training procedure.

Parameters

model (torchkge.models.interfaces.Model) – Model to be trained.
criterion – Criterion used to differentiate positive and negative scores. Can be one of the losses in torchkge.utils.losses.
kg_train (torchkge.data_structures.KnowledgeGraph) – KG used for training.
n_epochs (int) – Number of epochs in the training procedure.
batch_size (int) – Size of the batches.
optimizer (torch.optim.Optimizer) – Optimizer used to update the model parameters.
sampling_type (str) – Either ‘unif’ (uniform negative sampling) or ‘bern’ (Bernoulli negative sampling).
use_cuda (str (opt, default = None)) – Can be either None (no use of cuda at all), ‘all’ to move all the dataset to cuda and then split in batches or ‘batch’ to simply move the batches to cuda before they are returned.

get_counter_examples() → Optional[SmallKG]

Retrieve the counter-examples generated while training the model. If the model has not been trained yet, return None.

Returns

A simple knowledge graph containing the triplets that were used as counter-examples during the training phase, or None.