Texar
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Texar is a toolkit that aims to support a broad set of machine learning tasks, especially natural language processing and text generation. It provides a library of easy-to-use ML modules and functionalities for composing arbitrary models and algorithms. The tool is designed for both researchers and practitioners, for fast prototyping and experimentation.
Texar was originally developed by Petuum and CMU, who continue to contribute to it actively in collaboration with other institutes. A mirror of this repository is maintained by Petuum Open Source.
Key Features
- Two Versions, (Mostly) Same Interfaces. Texar-TensorFlow (this repo) and Texar-PyTorch have mostly the same interfaces. Both further combine the best design of TF and PyTorch:
  - Interfaces and variable sharing in PyTorch convention.
  - Excellent factorization and rich functionalities in TF convention.
- Rich Pre-trained Models, Rich Usage with Uniform Interfaces. BERT, GPT2, XLNet, etc., for encoding, classification, generation, and composing complex models with other Texar components!
- Fully Customizable at multiple abstraction levels -- both novice-friendly and expert-friendly.
- Free to plug in whatever external modules, since Texar is fully compatible with the native TF/PyTorch APIs.
- Versatile to support broad tasks, models, algorithms, data processing, evaluation, etc.
  - encoder(s) to decoder(s), sequential- and self-attentions, memory, hierarchical models, classifiers, ...
  - maximum likelihood learning, reinforcement learning, adversarial learning, probabilistic modeling, ...
- Modularized for maximal re-use and clean APIs, based on principled decomposition of Learning-Inference-Model Architecture.
- Distributed model training with multiple GPUs.
- Clean, detailed documentation and rich examples.
Library API Example
Builds an encoder-decoder model, with maximum likelihood learning:
import tensorflow as tf
import texar.tf as tx

# The hparams_* dicts and num_samples are assumed to be defined elsewhere.

# Data
data = tx.data.PairedTextData(hparams=hparams_data)  # hparams: a dict of hyperparameters
iterator = tx.data.DataIterator(data)
batch = iterator.get_next()  # get a data mini-batch

# Model architecture
embedder = tx.modules.WordEmbedder(data.target_vocab.size, hparams=hparams_emb)
encoder = tx.modules.TransformerEncoder(hparams=hparams_enc)
outputs_enc = encoder(inputs=embedder(batch['source_text_ids']),  # call as a function
                      sequence_length=batch['source_length'])
decoder = tx.modules.TransformerDecoder(
    output_layer=tf.transpose(embedder.embedding),  # tie input embedding w/ output layer
    hparams=hparams_decoder)
outputs, _, _ = decoder(memory=outputs_enc,
                        memory_sequence_length=batch['source_length'],
                        inputs=embedder(batch['target_text_ids']),
                        sequence_length=batch['target_length']-1,
                        decoding_strategy='greedy_train')  # teacher-forcing decoding

# Loss for maximum likelihood learning
loss = tx.losses.sequence_sparse_softmax_cross_entropy(
    labels=batch['target_text_ids'][:, 1:],
    logits=outputs.logits,
    sequence_length=batch['target_length']-1)  # automatic sequence masks

# Beam search decoding
outputs_bs, _, _ = tx.modules.beam_search_decode(
    decoder,
    embedding=embedder,
    start_tokens=[data.target_vocab.bos_token_id]*num_samples,
    end_token=data.target_vocab.eos_token_id)
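The label shift (`[:, 1:]`) and the `target_length-1` sequence mask in the loss above can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not Texar's implementation: the function name `masked_sequence_cross_entropy`, the toy token ids, and the reduction (sum over time steps, mean over the batch) are assumptions made for illustration.

```python
import math

def masked_sequence_cross_entropy(labels, logits, sequence_length):
    """Cross-entropy summed over valid time steps, averaged over the batch.

    labels: [batch][time] integer token ids
    logits: [batch][time][vocab] unnormalized scores
    sequence_length: number of valid (non-padding) steps per example
    """
    total = 0.0
    for ids, step_logits, length in zip(labels, logits, sequence_length):
        for t in range(length):  # steps at index >= length are padding and are skipped
            row = step_logits[t]
            peak = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
            total += log_z - row[ids[t]]  # negative log-probability of the label
    return total / len(labels)

# Hypothetical toy batch: BOS=1, EOS=2, PAD=0, vocabulary of 8 tokens.
target_text_ids = [[1, 5, 6, 2, 0]]
target_length = [4]  # BOS, 5, 6, EOS

labels = [ids[1:] for ids in target_text_ids]     # drop BOS -> [[5, 6, 2, 0]]
lengths = [n - 1 for n in target_length]          # [3]
uniform_logits = [[[0.0] * 8 for _ in range(4)]]  # stand-in for decoder logits

loss = masked_sequence_cross_entropy(labels, uniform_logits, lengths)
```

With uniform logits each valid step contributes log(8), and the padded fourth step contributes nothing regardless of its logits.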
The same model, but with adversarial learning:
helper = tx.modules.GumbelSoftmaxEmbeddingHelper(  # Gumbel-softmax decoding
    start_tokens=[BOS]*batch_size, end_token=EOS, embedding=embedder)
outputs, _ = decoder(helper=helper)  # automatic re-use of the decoder variables
discriminator = tx.modules.BertClassifier(hparams=hparams_bert)  # pre-trained model
G_loss, D_loss = tx.losses.binary_adversarial_losses(
    real_data=batch['target_text_ids'][:, 1:],
    fake_data=outputs.sample_id,
    discriminator_fn=discriminator)
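Gumbel-softmax decoding replaces the discrete token choice with a differentiable, approximately one-hot sample, which is what lets the discriminator's gradient reach the generator. The sketch below shows the sampling step alone in pure Python; the function name `gumbel_softmax_sample` and its `tau` and `rng` parameters are illustrative assumptions, not Texar's API.

```python
import math
import random

def gumbel_softmax_sample(logits, tau=1.0, rng=random):
    """Draw one relaxed categorical sample from unnormalized logits.

    Lower tau pushes the result toward a one-hot vector; higher tau
    pushes it toward a uniform distribution.
    """
    noisy = []
    for x in logits:
        # Clamp away from 0 and 1 so both logarithms are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((x + gumbel) / tau)
    peak = max(noisy)  # stable softmax over the perturbed logits
    exps = [math.exp(v - peak) for v in noisy]
    norm = sum(exps)
    return [e / norm for e in exps]

sample = gumbel_softmax_sample([2.0, 0.5, -1.0], tau=0.5, rng=random.Random(0))
```

The result is a probability vector over the vocabulary, so it can be multiplied with an embedding matrix to obtain a "soft" input embedding for the next decoding step.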
The same model, but with RL policy gradient learning:
agent = tx.agents.SeqPGAgent(samples=outputs.sample_id,
                             logits=outputs.logits,
                             sequence_length=batch['target_length']-1,
                             hparams=config_model.agent)
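For readers unfamiliar with policy gradient, the quantity such an agent typically minimizes can be sketched in a few lines. This is a generic REINFORCE-style loss under stated assumptions (per-step rewards, a fixed discount factor, no baseline), not the internals of `SeqPGAgent`; the names `discounted_returns` and `pg_loss` are hypothetical.

```python
def discounted_returns(rewards, discount=0.95):
    """Return-to-go for each step: G_t = r_t + discount * G_{t+1}."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + discount * running
        returns.append(running)
    returns.reverse()
    return returns

def pg_loss(log_probs, rewards, discount=0.95):
    """REINFORCE loss for one sampled sequence: -sum_t log pi(a_t) * G_t."""
    returns = discounted_returns(rewards, discount)
    return -sum(lp * g for lp, g in zip(log_probs, returns))

# A sequence-level reward (for example, a BLEU score) given only at the last step.
returns = discounted_returns([0.0, 0.0, 1.0], discount=0.5)
loss = pg_loss([-1.0, -2.0, -4.0], [0.0, 0.0, 1.0], discount=0.5)
```

Minimizing this loss raises the log-probability of sampled tokens in proportion to the return that followed them.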
Many more examples are available in the examples directory of the repository.
Installation
(Note: Texar>0.2.3 requires Python 3.6 or 3.7. To use with older Python versions, please use Texar<=0.2.3)
Texar requires:
- tensorflow >= 1.10.0 (but < 2.0.0). Follow the tensorflow official instructions to install the appropriate version.
- tensorflow_probability >= 0.3.0 (but < 0.8.0). Follow the tensorflow_probability official instructions to install.
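The version bounds above can be checked programmatically before installing Texar. The helper below is a small sketch that compares dotted version strings; `parse_version` and `in_range` are hypothetical names, and the parser deliberately ignores suffixes such as `rc1` rather than implementing full PEP 440 ordering.

```python
def parse_version(text):
    """Turn '1.15.0rc1' into (1, 15, 0), keeping only leading digits of each part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component
        parts.append(int(digits))
    return tuple(parts)

def in_range(version, minimum, below):
    """True if minimum <= version < below, compared component-wise."""
    return parse_version(minimum) <= parse_version(version) < parse_version(below)

# Bounds taken from the requirements listed above.
tf_ok = in_range("1.15.0", "1.10.0", "2.0.0")
tfp_ok = in_range("0.7.0", "0.3.0", "0.8.0")
```

In practice the version strings would come from `tensorflow.__version__` and `tensorflow_probability.__version__` once those packages are installed.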
After tensorflow and tensorflow_probability are installed, install Texar from PyPI:
pip install texar
To use cutting-edge features or develop locally, install from source:
git clone https://github.com/asyml/texar.git
cd texar
pip install .
Getting Started
Reference
If you use Texar, please cite the tech report with the following BibTeX entry:
Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation
Zhiting Hu, Haoran Shi, Bowen Tan, Wentao Wang, Zichao Yang, Tiancheng Zhao, Junxian He, Lianhui Qin, Di Wang, Xuezhe Ma, Zhengzhong Liu, Xiaodan Liang, Wanrong Zhu, Devendra Sachan and Eric Xing
ACL 2019
@inproceedings{hu2019texar,
title={Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation},
author={Hu, Zhiting and Shi, Haoran and Tan, Bowen and Wang, Wentao and Yang, Zichao and Zhao, Tiancheng and He, Junxian and Qin, Lianhui and Wang, Di and others},
booktitle={ACL 2019, System Demonstrations},
year={2019}
}
License
Companies and Universities Supporting Texar
<p float="left"> <img src="https://github.com/asyml/texar/blob/master/docs/_static/img/Petuum.png" width="200" align="top"> <img src="https://asyml.io/assets/institutions/cmu.png" width="200" align="top"> </p>
