TextGraphs.jl
Graph representations of text
TextGraphs
Introduction
TextGraphs.jl offers graph representations of text, along with natural language processing (NLP) functionality. See the white paper, which includes vignettes with examples.
This package is inspired by SpeechGraphs. New features in TextGraphs.jl include pre-processing (e.g. lemmas), graph properties (e.g. centrality) and latent space embeddings (adding latent semantic information to graphs).
Julia uses multiple dispatch, which favors modular functions and high-performance computing.

Quick introduction
Check the documentation and the white paper for further information.
See also the poster presented at JuliaCon 2022.
Install
Install with Pkg.
pkg> add TextGraphs
You also need R and the R package udpipe installed.
$ sudo apt install r-base
$ sudo Rscript -e 'install.packages("udpipe")'
Features
Graph types
You can build the following graphs from text (AbstractString):
Raw
- Naive (naive_graph): uses the original sequence of words.
- Phrases (phrases_graph): uses the original sequence of phrases.
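The idea behind the naive graph described above can be sketched with plain Graphs.jl. This is an illustration only, not the package's implementation: the helper `naive_word_graph` is hypothetical, and it assumes whitespace tokenization, one vertex per unique word and one directed edge per pair of consecutive words.

```julia
using Graphs

# Hypothetical helper, not part of TextGraphs.jl.
# Assumption: tokens are whitespace-separated and case-sensitive.
function naive_word_graph(text::AbstractString)
    words = split(text)
    vocab = unique(words)                          # one vertex per unique word
    index = Dict(w => i for (i, w) in enumerate(vocab))
    g = SimpleDiGraph(length(vocab))
    for (a, b) in zip(words, words[2:end])         # consecutive word pairs
        add_edge!(g, index[a], index[b])
    end
    return g, vocab
end

g, vocab = naive_word_graph("Sample for graph")
println(nv(g), " vertices, ", ne(g), " edges")
```

Repeated words map to the same vertex, so a text that returns to an earlier word produces a cycle in the graph.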
POS, Stems and Lemmas
- Stem (stem_graph): uses stemmed words.
- Lemma (lemma_graph): uses lemmatized words.
- Part of speech (POS, pos_graph): uses syntactic functions.
Latent space embeddings
- Latent space embedding (LSE, latent_space_graph) graphs.
- Latent space embeddings to target (latent_space_graph)
Properties
You can obtain several properties of the graphs:
Direct measures
graph_props returns the density, the number of self loops, the number of strongly connected components (SCCs), the size of the largest SCC, and the mean centrality (betweenness, closeness and eigenvector methods).
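Most of these measures map onto functions that Graphs.jl provides. The sketch below computes them directly on a three-vertex directed path, the same shape as the graph of "Sample for graph". It is an approximation of what graph_props reports, not its source code; the dictionary keys simply mirror the names shown in the Usage section, and eigenvector centrality is left out for brevity.

```julia
using Graphs
using Statistics: mean

# Directed path 1 → 2 → 3, standing in for a three-word sentence.
g = path_digraph(3)

sccs = strongly_connected_components(g)

# Assumption: key names copied from the graph_props output for readability.
props = Dict(
    "density"                 => density(g),
    "num_self_loops"          => num_self_loops(g),
    "num_strong_connect_comp" => length(sccs),
    "size_largest_scc"        => maximum(length, sccs),
    "mean_between_centr"      => mean(betweenness_centrality(g)),
    "mean_close_centr"        => mean(closeness_centrality(g)),
)

for (name, value) in props
    println(name, " => ", value)
end
```

A path has no cycles, so every vertex is its own strongly connected component; a text that repeats words would merge vertices into larger components.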
Erdős–Rényi ratios
rand_erdos_props returns values compared against random Erdős–Rényi graphs with the same number of vertices and edges, expressed either as a z-score or as a ratio to the random-graph average.
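The comparison can be sketched as follows, assuming a simple Monte Carlo approach: draw random graphs with matching vertex and edge counts, evaluate one measure on each, and relate the observed value to the sample mean and standard deviation. The names `compare_to_random`, `largest_scc` and `n_samples` are hypothetical and are not part of the TextGraphs.jl API.

```julia
using Graphs
using Statistics: mean, std

# Measure used for illustration: size of the largest strongly connected component.
largest_scc(g) = maximum(length, strongly_connected_components(g))

# Hypothetical helper: compare `measure(g)` with its distribution over
# Erdős–Rényi graphs that have the same number of vertices and edges.
function compare_to_random(g, measure; n_samples=200)
    observed = measure(g)
    samples = [measure(erdos_renyi(nv(g), ne(g); is_directed=is_directed(g)))
               for _ in 1:n_samples]
    mu, sigma = mean(samples), std(samples)
    # The z-score is not finite if every random sample gives the same value.
    return (observed=observed, zscore=(observed - mu) / sigma, ratio=observed / mu)
end

g = cycle_digraph(6)    # a single 6-vertex cycle: one large SCC
result = compare_to_random(g, largest_scc)
println(result)
```

Measures that depend only on the vertex and edge counts, such as density, are identical in every random sample, so the z-score is uninformative for them; the ratio is still defined.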
Usage
julia> using TextGraphs
julia> naive_graph("Sample for graph")
{3, 2} directed Int64 metagraph with Float64 weights defined by :weight (default weight 1.0)
julia> stem_graph("Sample for graph"; snowball_language="english") # Optional keyword argument
{3, 2} directed Int64 metagraph with Float64 weights defined by :weight (default weight 1.0)
julia> graph_props(naive_graph("Sample for graph"))
Dict{String, Real} with 7 entries:
  "mean_close_centr" => 0.388889
  "size_largest_scc" => 1
  "num_strong_connect_comp" => 3
  "density" => 0.333333
  "num_self_loops" => 0
  "mean_between_centr" => 0.166667
  "mean_eig_centr" => 0.333335
Plot
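using TextGraphs
using Graphs, MetaGraphs        # nv and get_prop, in case TextGraphs does not re-export them
using GraphMakie, GLMakie
using NetworkLayout: Spectral   # layout used for the 3D plot
naive_g = naive_graph("Colorless green ideas sleep furiously")
stem_g = stem_graph("No meio do caminho tinha uma pedra tinha uma pedra no meio do caminho")
# Label each vertex with the token stored in its :token property
g_labels = map(x -> get_prop(naive_g, x, :token), collect(1:nv(naive_g)))
stem_g_labels = map(x -> get_prop(stem_g, x, :token), collect(1:nv(stem_g)))
graphplot(naive_g, nlabels=g_labels)
graphplot(stem_g, nlabels=stem_g_labels)
# Same naive graph, drawn with a 3D spectral layout
spec3_layout = Spectral(dim=3)
graphplot(naive_g, node_size=30, nlabels=g_labels, layout=spec3_layout)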
Available options
Besides SpeechGraphs, there's a previous object-oriented Python implementation by github/facuzeta.


