
Entity2vec

Generates a set of property-specific entity embeddings from knowledge graphs using node2vec


entity2vec computes vector representations of Knowledge Graph entities that preserve semantic similarities and are suitable for classification tasks. It generates a set of property-specific entity embeddings by running node2vec on property-specific subgraphs K(p) = {(s, p, o) ∈ K}, i.e. the subgraph made of all triples of the knowledge graph K whose predicate is the property p. The repository includes:

  • A reimplementation of node2vec that can optionally skip precomputing the transition probabilities, which reduces memory usage at the cost of slower computation.

  • entity2vec, which generates one set of entity embeddings per Knowledge Graph property. It can work with a set of pre-downloaded dumps or download the graphs from a SPARQL endpoint.
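To make the definition of K(p) concrete, the sketch below groups (subject, property, object) triples by property, producing one edge list per property-specific subgraph. It assumes the triples are already in memory as Python tuples; `split_by_property`, `to_edgelist` and the "subject object" line format are illustrative choices, not the repository's code.

```python
from collections import defaultdict


def split_by_property(triples):
    """Group (s, p, o) triples into property-specific edge lists.

    Returns a dict mapping each property p to the (s, o) edges of its
    subgraph K(p) = {(s, p, o) in K}, in input order.
    """
    subgraphs = defaultdict(list)
    for s, p, o in triples:
        subgraphs[p].append((s, o))
    return dict(subgraphs)


def to_edgelist(edges):
    """Serialise edges as 'subject object' lines (illustrative format)."""
    return "\n".join("%s %s" % (s, o) for s, o in edges)


if __name__ == "__main__":
    kg = [
        ("dbr:Alien", "dbo:director", "dbr:Ridley_Scott"),
        ("dbr:Blade_Runner", "dbo:director", "dbr:Ridley_Scott"),
        ("dbr:Alien", "dct:subject", "dbc:Science_fiction_films"),
    ]
    # Print one edge list per property-specific subgraph.
    for prop, edges in split_by_property(kg).items():
        print(prop)
        print(to_edgelist(edges))
```

Each resulting edge list corresponds to one property-specific graph on which node2vec is run.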

Requirements

  • Python 2.7 or above
  • numpy
  • gensim
  • networkx
  • pandas
  • SPARQLWrapper

If you are using pip:

    pip install -r requirements.txt

Property-specific entity embeddings

The set of properties can be defined in the configuration file config/properties.json. Otherwise, the software runs on every file located in datasets/your_dataset/graphs; if a SPARQL endpoint is provided, it downloads the graphs for all properties into datasets/your_dataset/graphs.

    python src/entity2vec.py --dataset dataset --config_file config_file --entities entities --sparql sparql --default_graph default_graph
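The property-selection rules described above can be summarised in a short sketch. It assumes, without having checked the shipped file, that config/properties.json is a JSON object mapping each dataset name to a list of properties, and that each property graph in datasets/your_dataset/graphs is one file named after its property. `resolve_properties` is a hypothetical helper, and the SPARQL download path is omitted.

```python
import json
import os


def resolve_properties(dataset, config_file, graphs_dir):
    """Return the properties to embed for `dataset` (hypothetical helper).

    Assumption: the config file is a JSON object mapping dataset names to
    lists of properties. If the dataset is not listed there (or no config
    file exists), fall back to one property per file in `graphs_dir`,
    named after the file without its extension.
    """
    if config_file and os.path.isfile(config_file):
        with open(config_file) as f:
            config = json.load(f)
        if dataset in config:
            return list(config[dataset])
    # Fallback: every regular file in the graphs folder is a property graph.
    return sorted(
        os.path.splitext(name)[0]
        for name in os.listdir(graphs_dir)
        if os.path.isfile(os.path.join(graphs_dir, name))
    )
```

For example, `resolve_properties('aifb', 'config/properties.json', 'datasets/aifb/graphs')` would return the configured list for aifb if present, and the graph file names otherwise.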

Alternatively, entity2vec can be imported as a module and used as follows:

    from entity2vec.entity2vec import Entity2Vec

    e2v = Entity2Vec(False, False, False, 1, 1, 10, 5,
                 128, 10, 8, 5, 'path/to/properties.json', False,
                 'dataset_name', 'all', False, False,
                 'path/to/feedback.edgelist')

| option | default | description |
|---|---|---|
| dataset | null (required) | Name of the dataset. It is used to create folders and to retrieve properties from the config file. |
| config_file | config/properties.json | Path of the configuration file. |
| entities | all | List of entities for which embeddings are computed. By default, all entities are used. |
| sparql | null | Endpoint from which property-specific graphs are obtained. If not provided, the graphs are assumed to be already stored in datasets/your_dataset/graphs. |
| default_graph | null | Whether to use a default graph in the SPARQL endpoint. |
| num_walks | 500 | Number of random walks per entity. |
| feedback_file | null | Path to a DAT file containing all user-item pairs. If not defined, it defaults to datasets/<my_dataset>/graphs/feedback.edgelist. |
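The table does not specify the layout of the feedback file. As one possible reading, the sketch below assumes an edge-list file with one whitespace-separated `user item` pair per line, ignoring any further columns (such as a rating) and blank lines; `read_feedback` is a hypothetical helper, not part of the repository.

```python
from collections import defaultdict


def read_feedback(lines):
    """Parse user-item pairs into a dict mapping user -> set of items.

    Hypothetical format: one "user item" pair per line, whitespace-separated.
    Lines with fewer than two fields are skipped; extra fields are ignored.
    """
    feedback = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        user, item = parts[0], parts[1]
        feedback[user].add(item)
    return dict(feedback)


if __name__ == "__main__":
    sample = ["u1 i10", "u1 i11", "", "u2 i10 5"]
    print(read_feedback(sample))
```

An open file object can be passed directly, since iterating over it yields lines, e.g. `with open(path) as f: read_feedback(f)`.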

Entity classification

Generate a single vector representation for each entity, without considering the role of semantic properties, for use in classification tasks.

  1. Create an empty directory called emb

  2. Run node2vec on the whole graph to create a single global embedding for each entity:

     python src/node2vec.py --input datasets/aifb/aifb.edgelist --output emb/aifb_p1_q4.emd --p 1 --q 4
    
  3. Obtain scores, e.g.:

     cd ml
    
     python rdf_predict.py --dataset aifb --emb ../emb/aifb_p1_q4.emd --dimension 500
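The --p and --q flags in step 2 are node2vec's return and in-out parameters. The sketch below shows how they bias a single walk, computing the transition weights on the fly rather than precomputing them (the memory-saving trade-off mentioned for the reimplementation in the introduction). It is a simplified pure-Python illustration for unweighted, undirected graphs stored as dicts of neighbour sets; `biased_walk` and `weighted_choice` are illustrative names, not functions from src/node2vec.py.

```python
import random


def weighted_choice(items, weights, rng):
    """Pick one item with probability proportional to its weight."""
    r = rng.random() * sum(weights)
    acc = 0.0
    for item, w in zip(items, weights):
        acc += w
        if r < acc:
            return item
    return items[-1]  # guard against floating-point round-off


def biased_walk(graph, start, walk_length, p, q, rng=random):
    """One node2vec-style walk with transition weights computed on the fly.

    graph: dict mapping each node to a set of neighbours.
    From current node v (previous node t), a candidate x gets weight
    1/p if x == t, 1 if x is also a neighbour of t, and 1/q otherwise.
    """
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        neighbours = sorted(graph[cur])  # sorted for reproducibility
        if not neighbours:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node, so choose uniformly.
            walk.append(rng.choice(neighbours))
            continue
        prev = walk[-2]
        weights = []
        for x in neighbours:
            if x == prev:
                weights.append(1.0 / p)  # return to previous node
            elif x in graph[prev]:
                weights.append(1.0)      # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)  # move outward (distance 2)
        walk.append(weighted_choice(neighbours, weights, rng))
    return walk


if __name__ == "__main__":
    toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
    print(biased_walk(toy, "a", 10, p=1, q=4, rng=random.Random(42)))
```

With q = 4, as in the command above, outward moves are down-weighted, so walks tend to stay in the local neighbourhood. In node2vec, many such walks per node are then treated as sentences for a word2vec skip-gram model.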
    