Gsitk

gsitk is a framework to perform a wide variety of sentiment analysis tasks including dataset acquisition, text preprocessing, model design, and performance evaluation.

GSITK project

gsitk is a library built on top of scikit-learn that eases the development of NLP projects driven by machine learning. It relies on numpy, pandas and related libraries.

gsitk manages datasets, features, classifiers and evaluation techniques, so that writing an evaluation pipeline is fast and simple.

Full documentation can be found here.

Installation and use

Installation

gsitk can be installed via pip, which is the recommended way:

pip install gsitk

Alternatively, gsitk can be installed by cloning this repository.

Using gsitk

gsitk saves datasets and some other necessary resources to disk. By default, all this data is stored in /data. Set the environment variable $DATA_PATH to specify an alternative directory.
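As a minimal sketch, the variable can also be set from Python before gsitk is imported. This assumes gsitk reads DATA_PATH when it is imported, which the text above does not confirm; the directory name gsitk_data is a hypothetical choice.

```python
import os
import tempfile

# Hypothetical location: any writable directory works.
data_dir = os.path.join(tempfile.gettempdir(), "gsitk_data")
os.makedirs(data_dir, exist_ok=True)

# Set the variable before importing gsitk (assumption: gsitk reads
# DATA_PATH at import time, so setting it later may have no effect).
os.environ["DATA_PATH"] = data_dir

# import gsitk  # uncomment once gsitk is installed
```

Exporting DATA_PATH in the shell before launching Python achieves the same result and avoids depending on import order.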

Feature extraction examples

SIMON feature extractor

gsitk includes the implementation of the SIMON feature extractor, presented in this paper. To use it, two things are needed:

A sentiment lexicon.
A word embedding model that is gensim compatible.

For example, using only the lexicon from Bing Liu and an embedding model located in the current directory:

from gsitk.features import simon
from nltk.corpus import opinion_lexicon
from gensim.models.keyedvectors import KeyedVectors

lexicon = [list(opinion_lexicon.positive()), list(opinion_lexicon.negative())]

embedding_model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

simon_transformer = simon.Simon(lexicon=lexicon, n_lexicon_words=200, embedding=embedding_model)

# simon_transformer has the fit() and transform() methods, so it can be used in a Pipeline

To improve performance, it is recommended to use a more complete scikit-learn pipeline that adds normalization and feature selection to the SIMON feature extraction.

from gsitk.features import simon

simon_model = simon.Simon(lexicon=lexicon, n_lexicon_words=200, embedding=embedding_model)
model = simon.simon_pipeline(simon_transformer=simon_model, percentile=25)

# model also implements fit() and transform()
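To illustrate why SIMON needs both a lexicon and an embedding model, here is a toy sketch of lexicon-similarity features: each lexicon word yields one feature, taken as the highest cosine similarity between that word and any word of the document. This is a simplified illustration of the general idea, not gsitk's implementation; the function name lexicon_similarity_features and the 2-dimensional vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Toy 2-dimensional embeddings (hypothetical values, for illustration only).
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "happy": [0.8, 0.6],
    "sad": [-0.8, 0.6],
    "cat": [0.0, 1.0],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lexicon_similarity_features(
    tokens: Sequence[str],
    lexicon_words: Sequence[str],
    embeddings: Dict[str, List[float]],
) -> List[float]:
    """Return one feature per lexicon word: its best match in the document."""
    # Words without an embedding are ignored.
    known = [embeddings[t] for t in tokens if t in embeddings]
    features = []
    for word in lexicon_words:
        if word not in embeddings or not known:
            features.append(0.0)
            continue
        features.append(max(cosine(embeddings[word], vec) for vec in known))
    return features


features = lexicon_similarity_features(
    ["my", "cat", "is", "happy"], ["good", "bad"], TOY_EMBEDDINGS
)
print(features)
```

With these toy vectors, the feature for "good" is driven by "happy" and comes out close to 0.8, while the feature for "bad" stays close to 0. The feature vector has as many entries as there are lexicon words.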

Word2VecFeatures

This feature extractor implements the generic word vector model presented in this paper. An example of use is shown below:

from gsitk.features.word2vec import Word2VecFeatures


text = [
    ['my', 'cat', 'is', 'totally', 'happy'],
    ['my', 'dog', 'is', 'very', 'sad'],
]

# path must be set to the location of a Word2Vec model
# the convolution parameter encodes the pooling operation as [average, maximum, minimum]

w2v_extractor = Word2VecFeatures(w2v_model_path=path, w2v_format='google_txt', convolution=[1,0,0])
X = w2v_extractor.transform(text)
# X is an array containing the extracted features
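The pooling idea behind the convolution parameter can be sketched in plain Python. This is an illustration only, not gsitk's code: it assumes each entry of [average, maximum, minimum] acts as an on/off flag and that the selected poolings are concatenated. The helper pool_document and the 3-dimensional vectors are hypothetical.

```python
from typing import Dict, List, Sequence

# Toy 3-dimensional word vectors; real models use hundreds of dimensions.
TOY_VECTORS: Dict[str, List[float]] = {
    "my": [0.0, 1.0, 0.0],
    "cat": [1.0, 0.0, 2.0],
    "happy": [3.0, 2.0, 1.0],
}


def pool_document(
    tokens: Sequence[str],
    vectors: Dict[str, List[float]],
    convolution: Sequence[int],
) -> List[float]:
    """Pool the word vectors of one document into a single feature vector.

    convolution holds three flags: [average, maximum, minimum].
    Each enabled pooling contributes one block of features.
    """
    use_avg, use_max, use_min = convolution
    # Tokens missing from the vocabulary are skipped.
    found = [vectors[t] for t in tokens if t in vectors]
    if not found:
        return []
    # Transpose so each tuple holds one dimension across all words.
    columns = list(zip(*found))
    pooled: List[float] = []
    if use_avg:
        pooled.extend(sum(c) / len(c) for c in columns)
    if use_max:
        pooled.extend(max(c) for c in columns)
    if use_min:
        pooled.extend(min(c) for c in columns)
    return pooled


doc = ["my", "cat", "is", "totally", "happy"]
print(pool_document(doc, TOY_VECTORS, [1, 0, 0]))  # average pooling only
print(pool_document(doc, TOY_VECTORS, [0, 1, 1]))  # maximum, then minimum
```

Under this reading, enabling more poolings lengthens the feature vector: one block per enabled operation, each as wide as the embedding.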

Cite

If you use this module, please cite the following papers:

Enhancing Deep Learning Sentiment Analysis with Ensemble Techniques in Social Applications

@article{ARAQUE2017236,
title = "Enhancing deep learning sentiment analysis with ensemble techniques in social applications",
journal = "Expert Systems with Applications",
volume = "77",
pages = "236 - 246",
year = "2017",
issn = "0957-4174",
doi = "https://doi.org/10.1016/j.eswa.2017.02.002",
url = "http://www.sciencedirect.com/science/article/pii/S0957417417300751",
author = "Oscar Araque and Ignacio Corcuera-Platas and J. Fernando Sánchez-Rada and Carlos A. Iglesias",
keywords = "Ensemble, Deep learning, Sentiment analysis, Machine learning, Natural language processing"
}

A Semantic Similarity-based Perspective of Affect Lexicons for Sentiment Analysis

@article{ARAQUE2019346,
title = "A semantic similarity-based perspective of affect lexicons for sentiment analysis",
journal = "Knowledge-Based Systems",
volume = "165",
pages = "346 - 359",
year = "2019",
issn = "0950-7051",
doi = "https://doi.org/10.1016/j.knosys.2018.12.005",
url = "http://www.sciencedirect.com/science/article/pii/S0950705118305926",
author = "Oscar Araque and Ganggao Zhu and Carlos A. Iglesias",
keywords = "Sentiment analysis, Sentiment lexicon, Semantic similarity, Word embeddings",
}

Support

If you find bugs or want to make feature requests, please post an issue here. This project is under active development.

Acknowledgements

This research work is supported by the EC through the H2020 project MixedEmotions (Grant Agreement no: 141111), the Spanish Ministry of Economy under the R&D project Semola (TEC2015-68284-R) and the project EmoSpaces (RTC-2016-5053-7); by ITEA3 project SOMEDI (15011); and by MOSI-AGIL-CM (grant P2013/ICE-3019, co-funded by EU Structural Funds FSE and FEDER).

Contact

If you want to contact the developer, please send an email to o.araque@upm.
