Data Structure
Knowledge Graph
- class torchkge.data_structures.KnowledgeGraph(df=None, kg=None, ent2ix=None, rel2ix=None, dict_of_heads=None, dict_of_tails=None, dict_of_rels=None)
Knowledge graph representation. At least one of df and kg parameters should be passed.
- Parameters
df (pandas.DataFrame, optional) – Data frame containing three columns [from, to, rel].
kg (dict, optional) – Dictionary with keys (‘heads’, ‘tails’, ‘relations’) and values the corresponding torch long tensors.
ent2ix (dict, optional) – Dictionary mapping entity labels to their integer key. This is computed if not passed as argument.
rel2ix (dict, optional) – Dictionary mapping relation labels to their integer key. This is computed if not passed as argument.
dict_of_heads (dict, optional) – Dictionary of possible heads \(h\) so that the triple \((h,r,t)\) gives a true fact. The keys are tuples (t, r). This is computed if not passed as argument.
dict_of_tails (dict, optional) – Dictionary of possible tails \(t\) so that the triple \((h,r,t)\) gives a true fact. The keys are tuples (h, r). This is computed if not passed as argument.
dict_of_rels (dict, optional) – Dictionary of possible relations \(r\) so that the triple \((h,r,t)\) gives a true fact. The keys are tuples (h, t). This is computed if not passed as argument.
- ent2ix
Dictionary mapping entity labels to their integer key.
- Type
dict
- rel2ix
Dictionary mapping relation labels to their integer key.
- Type
dict
- n_ent
Number of distinct entities in the data set.
- Type
int
- n_rel
Number of distinct relations in the data set.
- Type
int
- n_facts
Number of samples in the data set. A sample is a fact: a triplet (h, r, t).
- Type
int
- head_idx
Integer key of the head entity of each fact.
- Type
torch.Tensor, dtype = torch.long, shape: (n_facts)
- tail_idx
Integer key of the tail entity of each fact.
- Type
torch.Tensor, dtype = torch.long, shape: (n_facts)
- relations
Integer key of the relation of each fact.
- Type
torch.Tensor, dtype = torch.long, shape: (n_facts)
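The relationship between the label mappings, the counts and the three index attributes can be illustrated with a small pure-Python sketch. The function name build_indices is hypothetical, plain lists stand in for the torch long tensors, and assigning keys in order of first appearance is an assumption of this sketch rather than the library's documented behaviour.

```python
def build_indices(triples):
    """Map labelled (from, to, rel) triples to integer keys.

    Keys are assigned in order of first appearance; this ordering is an
    assumption of the sketch, not necessarily what the library does.
    """
    ent2ix, rel2ix = {}, {}
    head_idx, tail_idx, relations = [], [], []
    for head, tail, rel in triples:
        # Register unseen entity labels with the next free integer key.
        for label in (head, tail):
            if label not in ent2ix:
                ent2ix[label] = len(ent2ix)
        # Register unseen relation labels the same way.
        if rel not in rel2ix:
            rel2ix[rel] = len(rel2ix)
        head_idx.append(ent2ix[head])
        tail_idx.append(ent2ix[tail])
        relations.append(rel2ix[rel])
    return ent2ix, rel2ix, head_idx, tail_idx, relations


triples = [("a", "b", "likes"), ("b", "c", "likes"), ("a", "c", "knows")]
ent2ix, rel2ix, head_idx, tail_idx, relations = build_indices(triples)
n_ent, n_rel, n_facts = len(ent2ix), len(rel2ix), len(head_idx)
```

In this toy example, fact i is recovered as (head_idx[i], relations[i], tail_idx[i]), and all three index lists have length n_facts.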
- evaluate_dicts()
Evaluates the dictionaries of possible alternatives to an entity in a fact that still give a true fact in the entire knowledge graph.
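As an illustration of the key conventions described for dict_of_heads, dict_of_tails and dict_of_rels, the following pure-Python sketch builds the three dictionaries from integer-indexed facts. The function name build_dicts is hypothetical, and storing the alternatives as sets is an assumption; the library's actual container type may differ.

```python
from collections import defaultdict


def build_dicts(head_idx, tail_idx, relations):
    """Collect, for each partial fact, every completion that is a true fact."""
    dict_of_heads = defaultdict(set)  # key (t, r) -> set of valid heads h
    dict_of_tails = defaultdict(set)  # key (h, r) -> set of valid tails t
    dict_of_rels = defaultdict(set)   # key (h, t) -> set of valid relations r
    for h, t, r in zip(head_idx, tail_idx, relations):
        dict_of_heads[(t, r)].add(h)
        dict_of_tails[(h, r)].add(t)
        dict_of_rels[(h, t)].add(r)
    return dict_of_heads, dict_of_tails, dict_of_rels


# Toy facts (h, r, t): (0, 0, 1), (2, 0, 1), (0, 1, 1), (0, 0, 3)
heads = [0, 2, 0, 0]
tails = [1, 1, 1, 3]
rels = [0, 0, 1, 0]
d_heads, d_tails, d_rels = build_dicts(heads, tails, rels)
```

Here d_heads[(1, 0)] holds both heads that form a true fact with tail 1 and relation 0, which is the kind of lookup needed to recognise true alternatives to a given fact.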
- get_mask(share, validation=False)
Returns masks to split the knowledge graph into train, test and optionally validation sets. The masks are first created by dividing the samples between subsets so that each relation stays balanced across them. Then, if any entity is missing from the training subset, it is added by moving to the training subset a share of the samples that involve the missing entity as head or tail.
- Parameters
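share (float) – Share of the samples to allocate to the train set.
validation (bool) – Indicate if a validation mask should be produced along with the train and test masks.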
- Returns
mask (torch.Tensor, shape: (n), dtype: torch.bool)
mask_val (torch.Tensor, shape: (n), dtype: torch.bool (optional))
mask_te (torch.Tensor, shape: (n), dtype: torch.bool)
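The two-step procedure described for get_mask can be sketched in pure Python as follows. This is a simplified illustration under stated assumptions, not the library's implementation: the function name sketch_train_mask and its seed parameter are hypothetical, lists of booleans stand in for torch.bool tensors, no validation mask is produced, and a missing entity is repaired by moving the first fact that involves it rather than a share of those facts.

```python
import random
from collections import defaultdict


def sketch_train_mask(heads, tails, relations, share, seed=0):
    """Return a list of booleans, True where a fact goes to the train set."""
    rng = random.Random(seed)
    n = len(relations)
    mask = [False] * n

    # Step 1: split per relation so each relation keeps roughly `share`
    # of its facts (and at least one) in the train set.
    by_rel = defaultdict(list)
    for i, r in enumerate(relations):
        by_rel[r].append(i)
    for idxs in by_rel.values():
        rng.shuffle(idxs)
        n_train = max(1, int(len(idxs) * share))
        for i in idxs[:n_train]:
            mask[i] = True

    # Step 2: make sure every entity appears in at least one train fact.
    seen = set()
    for i in range(n):
        if mask[i]:
            seen.update((heads[i], tails[i]))
    for i in range(n):
        if not mask[i] and (heads[i] not in seen or tails[i] not in seen):
            mask[i] = True  # move this fact to the train set
            seen.update((heads[i], tails[i]))
    return mask


heads = [0, 1, 2, 3, 4, 0]
tails = [1, 2, 3, 4, 5, 5]
relations = [0, 0, 0, 1, 1, 1]
mask = sketch_train_mask(heads, tails, relations, share=0.5)
mask_te = [not m for m in mask]  # the test mask is the complement
train_entities = {e for i, m in enumerate(mask) if m for e in (heads[i], tails[i])}
train_relations = {relations[i] for i, m in enumerate(mask) if m}
```

Because step 2 only ever moves facts into the train set, the realised train share can end up larger than the requested share on small or sparse graphs.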
- static get_sizes(count, share, validation=False)
Given count samples, returns how many should go to the train and test sets.
- split_kg(share=0.8, sizes=None, validation=False)
Splits the knowledge graph into train and test sets. If sizes is provided, it is used to split the samples as explained below. If only share is provided, the split is done at random while keeping at least one fact involving each entity and each relation in the training subset.
- Parameters
share (float) – Fraction of the facts to allocate to the train set (for example 0.8 for 80%).
sizes (tuple) –
Tuple of ints of length 2 or 3.
If len(sizes) == 2, then the first sizes[0] values of the knowledge graph will be used as training set and the rest as test set.
If len(sizes) == 3, then the first sizes[0] values of the knowledge graph will be used as training set, the following sizes[1] as validation set and the last sizes[2] as testing set.
validation (bool) – Indicate if a validation set should be produced along with train and test sets.
- Returns
train_kg (torchkge.data_structures.KnowledgeGraph)
val_kg (torchkge.data_structures.KnowledgeGraph, optional)
test_kg (torchkge.data_structures.KnowledgeGraph)
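The sizes-based split described above is purely positional, which the following pure-Python sketch illustrates on a list of facts. The function name split_by_sizes is hypothetical, and the check that the three sizes add up to the number of facts is an assumption of this sketch; the library's own validation may differ.

```python
def split_by_sizes(facts, sizes):
    """Split `facts` positionally into (train, val, test); val may be None."""
    if len(sizes) == 2:
        # First sizes[0] facts for training, the rest for testing.
        n_train = sizes[0]
        return facts[:n_train], None, facts[n_train:]
    if len(sizes) == 3:
        n_train, n_val, n_test = sizes
        # Assumption of this sketch: the three sizes cover every fact.
        if n_train + n_val + n_test != len(facts):
            raise ValueError("sizes must sum to the number of facts")
        train = facts[:n_train]
        val = facts[n_train:n_train + n_val]
        test = facts[n_train + n_val:]
        return train, val, test
    raise ValueError("sizes must have length 2 or 3")


facts = [(0, 0, 1), (1, 0, 2), (2, 1, 3), (3, 1, 4), (4, 0, 5)]
train2, val2, test2 = split_by_sizes(facts, (3, 2))
train3, val3, test3 = split_by_sizes(facts, (3, 1, 1))
```

Since this kind of split depends only on the order of the facts, it suits data sets that already ship with a fixed train, validation and test ordering.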