entity2vec

entity2vec computes vector representations of Knowledge Graph entities that preserve semantic similarities and are suitable for classification tasks. It generates a set of property-specific entity embeddings by running node2vec on property-specific subgraphs, i.e. K(p) = {(s, p, o)}, the set of triples whose predicate is a given property p. The repository includes:

A reimplementation of node2vec that makes it possible to skip the preprocessing of the transition probabilities, which reduces memory usage at the cost of slower computation.
entity2vec, which generates a set of entity embeddings from a Knowledge Graph, each corresponding to a different property. entity2vec can work with a set of pre-downloaded dumps or download them from a SPARQL endpoint.
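
As a rough illustration of what a property-specific subgraph K(p) is, the sketch below groups (s, p, o) triples by property using only the standard library. It is not the repository's code: the function name `split_by_property`, the toy triples, and the choice to treat edges as undirected are all assumptions made for the example.

```python
from collections import defaultdict


def split_by_property(triples):
    """Group (s, p, o) triples into one adjacency map per property p."""
    subgraphs = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        subgraphs[p][s].add(o)
        subgraphs[p][o].add(s)  # assumption: edges are treated as undirected
    return subgraphs


# Hypothetical toy triples, for illustration only.
triples = [
    ("film:Alien", "director", "person:Ridley_Scott"),
    ("film:Blade_Runner", "director", "person:Ridley_Scott"),
    ("film:Alien", "starring", "person:Sigourney_Weaver"),
]

subgraphs = split_by_property(triples)
for prop, adjacency in sorted(subgraphs.items()):
    # One graph per property; each would be embedded separately.
    print(prop, sorted(adjacency))
```

Each per-property adjacency map plays the role of one input graph for node2vec, so an entity ends up with one embedding per property.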

Requirements

Python 2.7 or above
numpy
gensim
networkx
pandas
SPARQLWrapper

If you are using pip:

    pip install -r requirements.txt

Property-specific entity embeddings

The set of properties can be defined in the configuration file config/properties.json. Otherwise, the software runs on every file located in datasets/your_dataset/graphs or, if a SPARQL endpoint is provided, downloads the graphs for all properties into datasets/your_dataset/graphs.

    python src/entity2vec.py --dataset dataset --config_file config_file --entities entities --sparql sparql --default_graph default_graph

Alternatively, entity2vec can be imported as a module and used as follows:

    from entity2vec.entity2vec import Entity2Vec

    e2v = Entity2Vec(False, False, False, 1, 1, 10, 5,
                 128, 10, 8, 5, 'path/to/properties.json', False,
                 'dataset_name', 'all', False, False,
                 'path/to/feedback.edgelist')

| option | default | description |
|--------|---------|-------------|
| dataset | null (required) | Name of the dataset. It is used to create folders and to retrieve properties from the config file. |
| config_file | config/properties.json | Path of the configuration file. |
| entities | all | List of entities for which the embeddings are computed. By default, all entities are used. |
| sparql | null | Endpoint from which property-specific graphs are obtained. If not provided, the graphs are assumed to be already stored in datasets/your_dataset/graphs. |
| default_graph | null | Default graph to use on the SPARQL endpoint, if any. |
| num_walks | 500 | Number of random walks per entity. |
| feedback_file | null | Path to a DAT file containing all user-item pairs. If not defined, datasets/<my_dataset>/graphs/feedback.edgelist is assumed. |

Entity classification

Generate a single vector representation for each entity, without considering the role of semantic properties, to use in classification tasks.

Create an empty directory called emb.

Run node2vec on the whole graph to create a single global embedding of each entity:

    python src/node2vec.py --input datasets/aifb/aifb.edgelist --output emb/aifb_p1_q4.emd --p 1 --q 4

Obtain scores, e.g.:

    cd ml

    python rdf_predict.py --dataset aifb --emb ../emb/aifb_p1_q4.emd --dimension 500
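
To give an intuition for the --p and --q values passed to node2vec above, here is a minimal sketch of a node2vec-style biased random walk on an unweighted graph. It is not the code in src/node2vec.py: the function `biased_walk`, its signature, and the toy graph are assumptions made for the example. The transition weights are computed at each step rather than precomputed for every edge, which mirrors the memory-for-speed trade-off mentioned for the reimplementation.

```python
import random


def biased_walk(adjacency, start, length, p=1.0, q=1.0, rng=None):
    """Second-order random walk in the style of node2vec (unweighted graph).

    Weights are derived on the fly from the previous and current node,
    so nothing is stored per edge, at the cost of extra work per step.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = sorted(adjacency.get(current, ()))
        if not neighbours:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node, so choose uniformly.
            walk.append(rng.choice(neighbours))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbours:
            if candidate == previous:
                weights.append(1.0 / p)  # step back to the previous node
            elif candidate in adjacency.get(previous, ()):
                weights.append(1.0)      # stay close to the previous node
            else:
                weights.append(1.0 / q)  # move further away
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Hypothetical toy graph, for illustration only.
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
walk = biased_walk(adjacency, "a", length=10, p=1, q=4, rng=random.Random(0))
print(walk)
```

With p = 1 and q = 4, steps that move away from the previous node are down-weighted by a factor of four, so walks tend to stay in the local neighbourhood of the starting entity.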
