Utils

Datasets loaders

torchkge.utils.datasets.load_fb13(data_home=None)[source]

Load FB13 dataset.

Parameters

data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.

Returns

  • kg_train (torchkge.data_structures.KnowledgeGraph)

  • kg_val (torchkge.data_structures.KnowledgeGraph)

  • kg_test (torchkge.data_structures.KnowledgeGraph)

torchkge.utils.datasets.load_fb15k(data_home=None)[source]

Load FB15k dataset. See the paper by Bordes et al., which originally presented the dataset.

Parameters

data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.

Returns

  • kg_train (torchkge.data_structures.KnowledgeGraph)

  • kg_val (torchkge.data_structures.KnowledgeGraph)

  • kg_test (torchkge.data_structures.KnowledgeGraph)

torchkge.utils.datasets.load_fb15k237(data_home=None)[source]

Load FB15k237 dataset. See the paper by Toutanova et al., which originally presented the dataset.

Parameters

data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.

Returns

  • kg_train (torchkge.data_structures.KnowledgeGraph)

  • kg_val (torchkge.data_structures.KnowledgeGraph)

  • kg_test (torchkge.data_structures.KnowledgeGraph)

torchkge.utils.datasets.load_wn18(data_home=None)[source]

Load WN18 dataset.

Parameters

data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.

Returns

  • kg_train (torchkge.data_structures.KnowledgeGraph)

  • kg_val (torchkge.data_structures.KnowledgeGraph)

  • kg_test (torchkge.data_structures.KnowledgeGraph)

torchkge.utils.datasets.load_wn18rr(data_home=None)[source]

Load WN18RR dataset. See the paper by Dettmers et al., which originally presented the dataset.

Parameters

data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.

Returns

  • kg_train (torchkge.data_structures.KnowledgeGraph)

  • kg_val (torchkge.data_structures.KnowledgeGraph)

  • kg_test (torchkge.data_structures.KnowledgeGraph)

torchkge.utils.datasets.load_yago3_10(data_home=None)[source]

Load YAGO3-10 dataset. See the paper by Dettmers et al., which originally presented the dataset.

Parameters

data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.

Returns

  • kg_train (torchkge.data_structures.KnowledgeGraph)

  • kg_val (torchkge.data_structures.KnowledgeGraph)

  • kg_test (torchkge.data_structures.KnowledgeGraph)

torchkge.utils.datasets.load_wikidatasets(which, limit_=0, data_home=None)[source]

Load WikiDataSets dataset. See the paper by Boschin et al., which originally presented the dataset.

Parameters
  • which (str) – String indicating which subset of Wikidata should be loaded. Available ones are humans, companies, animals, countries and films.

  • limit_ (int, optional (default=0)) – Lower limit on the number of neighbors an entity must have in the graph to be kept.

  • data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.

Returns

kg

Return type

torchkge.data_structures.KnowledgeGraph

torchkge.utils.datasets.load_wikidata_vitals(level=5, data_home=None)[source]

Load knowledge graph extracted from Wikidata using the entities corresponding to Wikipedia pages contained in Wikivitals. See here for details on Wikivitals and Wikivitals+ datasets.

Parameters
  • level (int (default=5)) – Either 4 or 5.

  • data_home (str, optional) – Path to the torchkge_data directory (containing data folders). If files are not present on disk in this directory, they are downloaded and then placed in the right place.

Returns

  • kg (torchkge.data_structures.KnowledgeGraph)

  • kg_attr (torchkge.data_structures.KnowledgeGraph)

Pre-trained models

TransE Model

Model    Dataset    Dimension    Test MRR    Filtered Test MRR
TransE   FB15k      100          0.250       0.420
TransE   FB15k237   150          0.187       0.287
TransE   WDV5       150          0.258       0.305
TransE   WN18RR     100          0.201       0.236
TransE   Yago3-10   200          0.143       0.261
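The Test MRR and Filtered Test MRR columns reported for the pre-trained models are mean reciprocal ranks. As a reminder of how the two differ, here is a minimal plain-Python sketch. It is not torchkge's evaluation code: the helper names are hypothetical, and it assumes that a higher score means a more plausible candidate.

```python
def reciprocal_rank(scores, true_idx, known_true=()):
    """Return 1 / rank of the true candidate among all scored candidates.

    scores: one score per candidate entity (assumption: higher is better).
    known_true: indices of other candidates that are also true facts; they
    are skipped when ranking, which gives the "filtered" setting.
    """
    true_score = scores[true_idx]
    skipped = set(known_true) - {true_idx}
    n_better = sum(
        1 for i, s in enumerate(scores) if i not in skipped and s > true_score
    )
    return 1.0 / (n_better + 1)


def mean_reciprocal_rank(reciprocal_ranks):
    """Average the reciprocal ranks over all test facts."""
    return sum(reciprocal_ranks) / len(reciprocal_ranks)


scores = [0.9, 0.8, 0.1]
raw = reciprocal_rank(scores, true_idx=1)                       # rank 2
filtered = reciprocal_rank(scores, true_idx=1, known_true=[0])  # rank 1
mrr = mean_reciprocal_rank([raw, filtered])
```

Because known true facts are removed from the competition, the filtered value is never lower than the raw one, which matches the pattern in the table.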

torchkge.utils.pretrained_models.load_pretrained_transe(dataset, emb_dim=None, data_home=None)[source]

Load a pretrained version of TransE model.

Parameters
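  • dataset (str) – Name of the dataset on which the model was pre-trained (see the table above for the available datasets).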

  • emb_dim (int (opt, default None)) – Embedding dimension

  • data_home (str (opt, default None)) – Path to the torchkge_data directory (containing data folders). Useful for pre-trained model loading.

Returns

model – Pretrained version of TransE model.

Return type

torchkge.models.translation.TransEModel

RESCAL Model

Model    Dataset    Dimension    Test MRR    Filtered Test MRR
RESCAL   FB15k237   200          0.180       0.307
RESCAL   WN18RR     150          0.273       0.424
RESCAL   Yago3-10   200          0.127       0.334

torchkge.utils.pretrained_models.load_pretrained_rescal(dataset, emb_dim=None, data_home=None)[source]

Load a pretrained version of RESCAL model.

Parameters
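  • dataset (str) – Name of the dataset on which the model was pre-trained (see the table above for the available datasets).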

  • emb_dim (int (opt, default None)) – Embedding dimension

  • data_home (str (opt, default None)) – Path to the torchkge_data directory (containing data folders). Useful for pre-trained model loading.

Returns

model – Pretrained version of RESCAL model.

Return type

torchkge.models.bilinear.RESCALModel

ComplEx Model

Model     Dataset    Dimension    Test MRR    Filtered Test MRR
ComplEx   FB15k237   200          0.180       0.308
ComplEx   WN18RR     200          0.290       0.455
ComplEx   WDV5       200          0.283       0.371
ComplEx   Yago3-10   200          0.164       0.421

torchkge.utils.pretrained_models.load_pretrained_complex(dataset, emb_dim=None, data_home=None)[source]

Load a pretrained version of ComplEx model.

Parameters
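  • dataset (str) – Name of the dataset on which the model was pre-trained (see the table above for the available datasets).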

  • emb_dim (int (opt, default None)) – Embedding dimension

  • data_home (str (opt, default None)) – Path to the torchkge_data directory (containing data folders). Useful for pre-trained model loading.

Returns

model – Pretrained version of ComplEx model.

Return type

torchkge.models.bilinear.ComplExModel

Data redundancy

torchkge.utils.data_redundancy.duplicates(kg_tr, kg_val, kg_te, theta1=0.8, theta2=0.8, verbose=False, counts=False, reverses=None)[source]

Return the duplicate and reverse duplicate relations as explained in the paper by Akrami et al.

References

Parameters
Returns

  • duplicates (list) – List of pairs giving duplicate relations.

  • rev_duplicates (list) – List of pairs giving reverse duplicate relations.
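To make the role of the two thresholds concrete, here is a plain-Python sketch of one common reading of the duplicate test: two relations are duplicates when the (head, tail) pairs they share cover more than theta1 of the first relation's pairs and more than theta2 of the second's, and reverse duplicates when the same holds after swapping head and tail in the second relation. The helper find_duplicates is hypothetical, works on a single list of (head, relation, tail) tuples rather than on KnowledgeGraph objects, and may differ in detail from torchkge's implementation.

```python
from collections import defaultdict


def find_duplicates(triplets, theta1=0.8, theta2=0.8):
    """Return (duplicates, rev_duplicates) as lists of relation pairs."""
    # Collect the set of (head, tail) pairs observed for each relation.
    pairs = defaultdict(set)
    for head, rel, tail in triplets:
        pairs[rel].add((head, tail))

    relations = sorted(pairs)
    duplicates, rev_duplicates = [], []
    for i, r1 in enumerate(relations):
        for r2 in relations[i + 1:]:
            t1, t2 = pairs[r1], pairs[r2]
            # Pairs shared as-is between the two relations.
            same = len(t1 & t2)
            if same / len(t1) > theta1 and same / len(t2) > theta2:
                duplicates.append((r1, r2))
            # Pairs shared once r2's pairs are reversed.
            reversed_t2 = {(t, h) for h, t in t2}
            rev = len(t1 & reversed_t2)
            if rev / len(t1) > theta1 and rev / len(t2) > theta2:
                rev_duplicates.append((r1, r2))
    return duplicates, rev_duplicates


# Relations 0 and 1 link the same pairs; relation 2 links them in reverse.
toy = [(0, 0, 1), (2, 0, 3), (0, 1, 1), (2, 1, 3), (1, 2, 0), (3, 2, 2)]
dups, rev_dups = find_duplicates(toy)
```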

torchkge.utils.data_redundancy.count_triplets(kg1, kg2, duplicates, rev_duplicates)[source]
Parameters
Returns

  • n_duplicates (int) – Number of triplets in kg2 that have their duplicate triplet in kg1.

  • n_rev_duplicates (int) – Number of triplets in kg2 that have their reverse duplicate triplet in kg1.

torchkge.utils.data_redundancy.cartesian_product_relations(kg_tr, kg_val, kg_te, theta=0.8)[source]

Return the Cartesian product relations as explained in the paper by Akrami et al.

References

Parameters
Returns

selected_relations – List of indices of the relations that are Cartesian product relations (see paper for details).

Return type

list
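As an illustration of what the threshold theta controls, the sketch below flags a relation when its observed (head, tail) pairs cover more than theta of all possible combinations of its heads and tails. This is an assumed reading of the criterion; the helper find_cartesian_relations is hypothetical and takes a plain list of (head, relation, tail) tuples instead of KnowledgeGraph objects.

```python
from collections import defaultdict


def find_cartesian_relations(triplets, theta=0.8):
    """Return the sorted relation indices whose pair density exceeds theta."""
    heads, tails, pairs = defaultdict(set), defaultdict(set), defaultdict(set)
    for head, rel, tail in triplets:
        heads[rel].add(head)
        tails[rel].add(tail)
        pairs[rel].add((head, tail))

    selected = []
    for rel in sorted(pairs):
        # Fraction of the heads x tails grid that is actually observed.
        density = len(pairs[rel]) / (len(heads[rel]) * len(tails[rel]))
        if density > theta:
            selected.append(rel)
    return selected


toy = [
    # Relation 0 links every head {0, 1} to every tail {2, 3}.
    (0, 0, 2), (0, 0, 3), (1, 0, 2), (1, 0, 3),
    # Relation 1 links only half of the possible combinations.
    (0, 1, 2), (1, 1, 3),
]
selected = find_cartesian_relations(toy)
```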

Dissimilarities

torchkge.utils.dissimilarities.l1_dissimilarity(a, b)[source]

Compute dissimilarity between rows of a and b as \(||a-b||_1\).

torchkge.utils.dissimilarities.l2_dissimilarity(a, b)[source]

Compute dissimilarity between rows of a and b as \(||a-b||_2^2\).
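The two formulas above can be checked by hand with a small plain-Python sketch. The functions l1_rows and l2_squared_rows are hypothetical stand-ins that operate on lists of rows; they are not the torchkge functions. Note that the L2 variant is the squared norm, so no square root is taken.

```python
def l1_rows(a, b):
    """Row-wise L1 norm of a - b for two equally shaped lists of rows."""
    return [
        sum(abs(x - y) for x, y in zip(row_a, row_b))
        for row_a, row_b in zip(a, b)
    ]


def l2_squared_rows(a, b):
    """Row-wise squared L2 norm of a - b (no square root)."""
    return [
        sum((x - y) ** 2 for x, y in zip(row_a, row_b))
        for row_a, row_b in zip(a, b)
    ]


a = [[1.0, 2.0], [0.0, 0.0]]
b = [[4.0, 6.0], [1.0, -1.0]]
l1 = l1_rows(a, b)          # [7.0, 2.0]
l2 = l2_squared_rows(a, b)  # [25.0, 2.0]
```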

torchkge.utils.dissimilarities.l1_torus_dissimilarity(a, b)[source]

See paper by Ebisu et al. for details about the definition of this dissimilarity function.

torchkge.utils.dissimilarities.l2_torus_dissimilarity(a, b)[source]

See paper by Ebisu et al. for details about the definition of this dissimilarity function.

torchkge.utils.dissimilarities.el2_torus_dissimilarity(a, b)[source]

See paper by Ebisu et al. for details about the definition of this dissimilarity function.

Losses

class torchkge.utils.losses.MarginLoss(margin)[source]

Margin loss as defined in the TransE paper by Bordes et al. (2013). This class implements the torch.nn.Module interface.

forward(positive_triplets, negative_triplets)[source]
Parameters
  • positive_triplets (torch.Tensor, dtype: torch.float, shape: (b_size)) – Scores of the true triplets as returned by the forward methods of the models.

  • negative_triplets (torch.Tensor, dtype: torch.float, shape: (b_size)) – Scores of the negative triplets as returned by the forward methods of the models.

Returns

loss – Loss of the form \(\max\{0, \gamma - f(h,r,t) + f(h',r',t')\}\) where \(\gamma\) is the margin (defined at initialization), \(f(h,r,t)\) is the score of a true fact and \(f(h',r',t')\) is the score of the associated negative fact.

Return type

torch.Tensor, shape: (n_facts, dim), dtype: torch.float
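The margin formula can be illustrated per pair of scores with plain Python. This is a sketch of the formula only: margin_terms is a hypothetical helper, and how the library reduces the per-pair terms into a single value is not shown here.

```python
def margin_terms(positive_scores, negative_scores, margin):
    """Per-pair values of max(0, margin - f(true fact) + f(negative fact))."""
    return [
        max(0.0, margin - pos + neg)
        for pos, neg in zip(positive_scores, negative_scores)
    ]


positive = [2.0, 0.5]
negative = [0.5, 1.0]
terms = margin_terms(positive, negative, margin=1.0)
# First pair: the true fact beats the negative by more than the margin -> 0.
# Second pair: the negative scores higher than the true fact -> 1.5.
```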

class torchkge.utils.losses.LogisticLoss[source]

Logistic loss as defined in the TransE paper by Bordes et al. (2013). This class implements the torch.nn.Module interface.

forward(positive_triplets, negative_triplets)[source]
Parameters
  • positive_triplets (torch.Tensor, dtype: torch.float, shape: (b_size)) – Scores of the true triplets as returned by the forward methods of the models.

  • negative_triplets (torch.Tensor, dtype: torch.float, shape: (b_size)) – Scores of the negative triplets as returned by the forward methods of the models.

Returns

loss – Loss of the form \(\log(1+ \exp(\eta \times f(h,r,t)))\) where \(f(h,r,t)\) is the score of the fact and \(\eta\) is 1 if the fact is true and -1 if it is false.

Return type

torch.Tensor, shape: (n_facts, dim), dtype: torch.float
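A term of the form log(1 + exp(x)) overflows if evaluated naively for large x. The sketch below shows a numerically stable way to compute one such term in plain Python, with the sign eta passed in explicitly. The names softplus and logistic_term are hypothetical and this is not the library's code.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    # max(x, 0) carries the large part; the remainder is always in (0, log 2].
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logistic_term(score, eta):
    """log(1 + exp(eta * score)) for a single fact."""
    return softplus(eta * score)


at_zero = logistic_term(0.0, 1)      # log(2)
large = logistic_term(1000.0, 1)     # no overflow, equals 1000.0
small = logistic_term(1000.0, -1)    # underflows cleanly to 0.0
```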

class torchkge.utils.losses.BinaryCrossEntropyLoss[source]

This class implements the torch.nn.Module interface.

forward(positive_triplets, negative_triplets)[source]
Parameters
  • positive_triplets (torch.Tensor, dtype: torch.float, shape: (b_size)) – Scores of the true triplets as returned by the forward methods of the models.

  • negative_triplets (torch.Tensor, dtype: torch.float, shape: (b_size)) – Scores of the negative triplets as returned by the forward methods of the models.

Returns

loss – Loss of the form \(-\eta \cdot \log(f(h,r,t)) - (1-\eta) \cdot \log(1 - f(h,r,t))\) where \(f(h,r,t)\) is the score of the fact and \(\eta\) is 1 if the fact is true and 0 if it is false.

Return type

torch.Tensor, shape: (n_facts, dim), dtype: torch.float
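For reference, the standard binary cross-entropy for a single fact can be sketched in plain Python as follows. The helper bce_term is hypothetical, and the sketch assumes the score has already been mapped to a probability strictly between 0 and 1, since the logarithms are undefined otherwise.

```python
import math


def bce_term(prob, eta):
    """Standard binary cross-entropy: -eta*log(p) - (1 - eta)*log(1 - p).

    prob: predicted probability in (0, 1) that the fact is true (assumption).
    eta: 1 for a true fact, 0 for a negative fact.
    """
    return -(eta * math.log(prob) + (1 - eta) * math.log(1.0 - prob))


confident_true = bce_term(0.9, eta=1)   # small loss: true fact, high prob
wrong_true = bce_term(0.1, eta=1)       # large loss: true fact, low prob
confident_false = bce_term(0.1, eta=0)  # small loss: negative fact, low prob
```

Both terms are non-negative, so the loss is zero only when the prediction matches the label exactly.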

Training wrappers

class torchkge.utils.training.TrainDataLoader(kg, batch_size, sampling_type, use_cuda=None)[source]

Dataloader providing the training process with batches of true and negatively sampled facts.

Parameters
  • kg (torchkge.data_structures.KnowledgeGraph) – Dataset to be divided in batches.

  • batch_size (int) – Size of the batches.

  • sampling_type (str) – Either ‘unif’ (uniform negative sampling) or ‘bern’ (Bernoulli negative sampling).

  • use_cuda (str (opt, default = None)) – Can be either None (no use of cuda at all), ‘all’ to move all the dataset to cuda and then split in batches or ‘batch’ to simply move the batches to cuda before they are returned.
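To illustrate what "batches of true and negatively sampled facts" means, here is a plain-Python sketch of uniform negative sampling followed by batching. It is an illustration under stated assumptions, not the TrainDataLoader implementation: corrupt_uniform and batches are hypothetical helpers, facts are (head, relation, tail) tuples of integer indices, and head or tail is replaced with equal probability. Bernoulli sampling differs in that the choice between corrupting the head or the tail uses a relation-specific probability.

```python
import random


def corrupt_uniform(triplets, n_entities, seed=0):
    """Build one negative fact per true fact by replacing head or tail."""
    rng = random.Random(seed)
    negatives = []
    for head, rel, tail in triplets:
        new_entity = rng.randrange(n_entities)
        if rng.random() < 0.5:
            negatives.append((new_entity, rel, tail))  # corrupt the head
        else:
            negatives.append((head, rel, new_entity))  # corrupt the tail
    return negatives


def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


facts = [(0, 0, 1), (1, 0, 2), (2, 1, 3), (3, 1, 4), (4, 0, 0)]
negatives = corrupt_uniform(facts, n_entities=5)
paired = list(zip(batches(facts, 2), batches(negatives, 2)))
```

Each element of paired holds a batch of true facts and the aligned batch of negatives, which is the shape of input a loss such as the margin loss compares. A uniformly drawn replacement can occasionally reproduce the original entity or another true fact; the sketch does not filter those cases.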

class torchkge.utils.training.Trainer(model, criterion, kg_train, n_epochs, batch_size, optimizer, sampling_type='bern', use_cuda=None)[source]

This class wraps a simple training procedure.

Parameters
  • model (torchkge.models.interfaces.Model) – Model to be trained.

  • criterion – Criterion used to differentiate positive and negative scores. Can be an element of torchkge.utils.losses.

  • kg_train (torchkge.data_structures.KnowledgeGraph) – KG used for training.

  • n_epochs (int) – Number of epochs in the training procedure.

  • batch_size (int) – Size of the batches.

  • sampling_type (str) – Either ‘unif’ (uniform negative sampling) or ‘bern’ (Bernoulli negative sampling).

  • use_cuda (str (opt, default = None)) – Can be either None (no use of cuda at all), ‘all’ to move all the dataset to cuda and then split in batches or ‘batch’ to simply move the batches to cuda before they are returned.

get_counter_examples() → Optional[SmallKG][source]

Retrieve the counter-examples generated while training the model.

If the model has not been trained yet, return None.

Returns

A simple knowledge graph containing the triplets that were used as counter-examples during the training phase.