Data Structure

Knowledge Graph

class torchkge.data_structures.KnowledgeGraph(df=None, kg=None, ent2ix=None, rel2ix=None, dict_of_heads=None, dict_of_tails=None, dict_of_rels=None)[source]

Knowledge graph representation. At least one of the df and kg parameters should be passed.

Parameters
  • df (pandas.DataFrame, optional) – Data frame containing three columns [from, to, rel].

  • kg (dict, optional) – Dictionary with keys (‘heads’, ‘tails’, ‘relations’) and values the corresponding torch long tensors.

  • ent2ix (dict, optional) – Dictionary mapping entity labels to their integer key. This is computed if not passed as argument.

  • rel2ix (dict, optional) – Dictionary mapping relation labels to their integer key. This is computed if not passed as argument.

  • dict_of_heads (dict, optional) – Dictionary of possible heads \(h\) so that the triple \((h,r,t)\) gives a true fact. The keys are tuples (t, r). This is computed if not passed as argument.

  • dict_of_tails (dict, optional) – Dictionary of possible tails \(t\) so that the triple \((h,r,t)\) gives a true fact. The keys are tuples (h, r). This is computed if not passed as argument.

  • dict_of_rels (dict, optional) – Dictionary of possible relations \(r\) so that the triple \((h,r,t)\) gives a true fact. The keys are tuples (h, t). This is computed if not passed as argument.
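The label-to-integer mappings described above can be illustrated with a minimal pure-Python sketch. It does not reproduce torchkge's implementation: the helper name build_index, the toy facts and the sorted ordering of labels are assumptions made for illustration only.

```python
# Hypothetical helper, not part of torchkge: maps each distinct label to an integer key.
def build_index(labels):
    # Sorting is an assumption; it only makes the mapping deterministic.
    return {label: ix for ix, label in enumerate(sorted(set(labels)))}

# Toy facts as (from, to, rel) rows, mirroring the columns expected in df.
facts = [
    ("paris", "france", "capital_of"),
    ("berlin", "germany", "capital_of"),
    ("france", "germany", "borders"),
]

ent2ix = build_index([h for h, _, _ in facts] + [t for _, t, _ in facts])
rel2ix = build_index([r for _, _, r in facts])

# Integer keys of heads, tails and relations, one entry per fact.
head_idx = [ent2ix[h] for h, _, _ in facts]
tail_idx = [ent2ix[t] for _, t, _ in facts]
relations = [rel2ix[r] for _, _, r in facts]
```

In this sketch the index lists play the role of the head_idx, tail_idx and relations tensors, which the class stores as torch long tensors rather than Python lists.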

ent2ix

Dictionary mapping entity labels to their integer key.

Type

dict

rel2ix

Dictionary mapping relation labels to their integer key.

Type

dict

n_ent

Number of distinct entities in the data set.

Type

int

n_rel

Number of distinct relations in the data set.

Type

int

n_facts

Number of samples in the data set. A sample is a fact: a triplet (h, r, t).

Type

int

head_idx

Integer keys of the heads, one per fact.

Type

torch.Tensor, dtype = torch.long, shape: (n_facts)

tail_idx

Integer keys of the tails, one per fact.

Type

torch.Tensor, dtype = torch.long, shape: (n_facts)

relations

Integer keys of the relations, one per fact.

Type

torch.Tensor, dtype = torch.long, shape: (n_facts)

evaluate_dicts()[source]

Evaluates the dictionaries of possible alternatives to an entity in a fact that still give a true fact in the entire knowledge graph.
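As a rough illustration of what such dictionaries contain, the sketch below builds them from a toy list of facts in plain Python. The key layout follows the parameter descriptions of dict_of_heads, dict_of_tails and dict_of_rels; storing the values as sets, and the toy facts themselves, are assumptions rather than a description of the library's internals.

```python
from collections import defaultdict

# Toy facts as (head, relation, tail) integer keys; illustrative data only.
facts = [(0, 0, 1), (0, 0, 2), (3, 0, 1), (0, 1, 1)]

dict_of_heads = defaultdict(set)  # keyed by (t, r): heads h making (h, r, t) true
dict_of_tails = defaultdict(set)  # keyed by (h, r): tails t making (h, r, t) true
dict_of_rels = defaultdict(set)   # keyed by (h, t): relations r making (h, r, t) true

for h, r, t in facts:
    dict_of_heads[(t, r)].add(h)
    dict_of_tails[(h, r)].add(t)
    dict_of_rels[(h, t)].add(r)
```

Lookups such as dict_of_tails[(0, 0)] then list every tail that forms a true fact with head 0 and relation 0.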

get_df()[source]

Returns a Pandas DataFrame with columns [‘from’, ‘to’, ‘rel’].

get_mask(share, validation=False)[source]

Returns masks to split the knowledge graph into train, test and optionally validation sets. The mask is first created by dividing samples between subsets based on relation equilibrium. Then, if any entity is absent from the training subset, it is added by assigning to the training subset a share of the samples that involve the missing entity, either as head or as tail.

Parameters
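  • share (float) – Share of the samples to allocate to the train set.

  • validation (bool) – Indicate if a validation mask should be produced along with the train and test masks.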

Returns

  • mask (torch.Tensor, shape: (n), dtype: torch.bool)

  • mask_val (torch.Tensor, shape: (n), dtype: torch.bool (optional))

  • mask_te (torch.Tensor, shape: (n), dtype: torch.bool)
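The two-step procedure described above (a split per relation, then a fix-up for entities missing from the training subset) can be sketched in plain Python. This is a simplified illustration and not the library's code: the function name sketch_mask and the seed parameter are hypothetical, only a train mask is produced, and the fix-up moves the first fact involving a missing entity instead of a share of those facts.

```python
import random

def sketch_mask(facts, share, seed=0):
    """Hypothetical simplification: returns a list of bools, True = training fact."""
    rng = random.Random(seed)
    mask = [False] * len(facts)

    # Step 1: split the facts of each relation separately.
    by_rel = {}
    for i, (_, r, _) in enumerate(facts):
        by_rel.setdefault(r, []).append(i)
    for indices in by_rel.values():
        rng.shuffle(indices)
        n_train = max(1, int(len(indices) * share))
        for i in indices[:n_train]:
            mask[i] = True

    # Step 2: make sure every entity occurs in at least one training fact.
    seen = set()
    for i, (h, _, t) in enumerate(facts):
        if mask[i]:
            seen.update((h, t))
    for i, (h, _, t) in enumerate(facts):
        if h not in seen or t not in seen:
            mask[i] = True
            seen.update((h, t))
    return mask

# Toy facts as (head, relation, tail) integer keys.
facts = [(0, 0, 1), (1, 0, 2), (2, 0, 3), (3, 1, 0), (4, 1, 0), (0, 1, 4)]
mask = sketch_mask(facts, share=0.5)
train_entities = {e for keep, (h, _, t) in zip(mask, facts) if keep for e in (h, t)}
train_relations = {r for keep, (_, r, _) in zip(mask, facts) if keep}
```

Because of the fix-up step, the training subset can end up larger than the requested share when many entities appear in only a few facts.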

static get_sizes(count, share, validation=False)[source]

Given count samples, returns how many should go to the train and test sets (and to the validation set if validation is True).
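One plausible way to compute such sizes is sketched below. The function name sketch_sizes is hypothetical and the rounding rules are assumptions; the actual method may handle small counts and rounding differently.

```python
def sketch_sizes(count, share, validation=False):
    """Hypothetical helper; the rounding rules are assumptions, not torchkge's."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # Round the train share down, but keep at least one sample in train.
    n_train = max(1, int(count * share))
    rest = count - n_train
    if not validation:
        return n_train, rest
    # Assumption: held-out samples are divided evenly between validation and test.
    n_val = rest // 2
    return n_train, n_val, rest - n_val

two_way = sketch_sizes(10, 0.5)
three_way = sketch_sizes(10, 0.5, validation=True)
```

In both cases the returned sizes sum to count, so every sample is assigned to exactly one subset.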

split_kg(share=0.8, sizes=None, validation=False)[source]

Split the knowledge graph into train and test sets. If sizes is provided, it is used to split the samples as explained below. If only share is provided, the split is random, but it guarantees that the training subset keeps at least one fact involving each entity and each relation.

Parameters
  • share (float) – Fraction of the facts to allocate to the train set (the default 0.8 allocates 80%).

  • sizes (tuple) –

    Tuple of ints of length 2 or 3.

    • If len(sizes) == 2, then the first sizes[0] values of the knowledge graph will be used as training set and the rest as test set.

    • If len(sizes) == 3, then the first sizes[0] values of the knowledge graph will be used as training set, the following sizes[1] as validation set and the last sizes[2] as testing set.

  • validation (bool) – Indicate if a validation set should be produced along with train and test sets.

Returns

  • train_kg (torchkge.data_structures.KnowledgeGraph)

  • val_kg (torchkge.data_structures.KnowledgeGraph, optional)

  • test_kg (torchkge.data_structures.KnowledgeGraph)
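The sizes-based split described above amounts to slicing the facts in their stored order. The sketch below shows that logic on a plain Python list. The function name sketch_split is hypothetical, and requiring the sizes to sum to the number of facts is an assumption made to keep the example unambiguous.

```python
def sketch_split(facts, sizes):
    """Hypothetical helper: slices facts in order according to sizes."""
    if len(sizes) not in (2, 3) or sum(sizes) != len(facts):
        raise ValueError("sizes must have length 2 or 3 and sum to len(facts)")
    parts, start = [], 0
    for size in sizes:
        # Each subset takes the next `size` facts, preserving the original order.
        parts.append(facts[start:start + size])
        start += size
    return tuple(parts)

facts = list(range(10))  # stand-ins for ten facts
train, test = sketch_split(facts, (8, 2))
train3, val3, test3 = sketch_split(facts, (6, 2, 2))
```

Since this kind of split depends on the order of the facts, it only gives a random partition if the facts were shuffled beforehand.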

Small KG

class torchkge.data_structures.SmallKG(heads, tails, relations)[source]

Minimalist version of a knowledge graph. Built with tensors of heads, tails and relations.