<p align="center"> <img alt="PyAlex - a Python wrapper for OpenAlex" src="https://github.com/J535D165/pyalex/raw/main/pyalex_repocard.svg"> </p>

PyAlex


PyAlex is a Python library for OpenAlex. OpenAlex is an index of hundreds of millions of interconnected scholarly papers, authors, institutions, and more. OpenAlex offers a robust, open, and free REST API to extract, aggregate, or search scholarly data. PyAlex is a lightweight and thin Python interface to this API. PyAlex tries to stay as close as possible to the design of the original service.

The following entities of OpenAlex are currently supported by PyAlex:

  • [x] Work
  • [x] Author
  • [x] Source
  • [x] Institution
  • [x] Concept
  • [x] Topic
  • [x] Publisher
  • [x] Funder

Including the following functionality:

  • [x] Get single entities
  • [x] Filter entities
  • [x] Search entities
  • [x] Semantic search (find similar works)
  • [x] Group entities
  • [x] Search filters
  • [x] Select fields
  • [x] Sample
  • [x] Pagination
  • [x] Autocomplete endpoint
  • [x] N-grams [Deprecated by OpenAlex]
  • [x] Authentication

We aim to cover the entire API and are looking for help. Pull requests are welcome.

Key features

  • Pipe operations - PyAlex can handle multiple operations in a sequence. This allows the developer to write understandable queries. For examples, see code snippets.
  • Plaintext abstracts - OpenAlex doesn't include plaintext abstracts due to legal constraints. PyAlex can convert the inverted abstracts into plaintext abstracts on the fly.
  • Find similar works - Use semantic search to find works similar to a given text via Works().similar(). See find similar works.
  • Fetch content in PDF and TEI format - Retrieve full-text content from OpenAlex in PDF or TEI XML formats. See fetching content.
  • Permissive license - OpenAlex data is CC0 licensed :raised_hands:. PyAlex is published under the MIT license.

Installation

PyAlex requires Python 3.8 or later.

pip install pyalex

Getting started

PyAlex offers support for all Entity Objects: Works, Authors, Sources, Institutions, Topics, Keywords, Publishers, Funders, Awards, and Concepts.

from pyalex import (
  Works,
  Authors,
  Sources,
  Institutions,
  Topics,
  Keywords,
  Publishers,
  Funders,
  Awards,
  Concepts,
)

Rate limits and authentication [Changed!]

⚠️ API Key Required: Starting February 13, 2026, an API key is required to use the OpenAlex API. API keys are free!

The OpenAlex API uses a credit-based rate limiting system. Different endpoint types consume different amounts of credits per request:

  • Without API key: 100 credits per day (testing/demos only)
  • With free API key: 100,000 credits per day
  • Singleton requests (e.g., /works/W123): Free (0 credits)
  • List requests (e.g., /works?filter=...): 1 credit each

All users are limited to a maximum of 100 requests per second.
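As a rough illustration of what these numbers mean in practice, the sketch below estimates how many list requests and records fit into a daily credit allowance. The function name `daily_list_budget` is hypothetical, and the default of 25 records per request is an assumption taken from the `per_page` value shown in the metadata example in this README.

```python
def daily_list_budget(credits_per_day, credits_per_request=1, per_page=25):
    """Estimate list requests and records that fit in a daily credit budget.

    Assumes every list request costs `credits_per_request` credits and
    returns a full page of `per_page` records (both are assumptions).
    """
    requests = credits_per_day // credits_per_request
    return requests, requests * per_page


# With a free API key (100,000 credits per day)
print(daily_list_budget(100_000))  # (100000, 2500000)

# Without an API key (100 credits per day)
print(daily_list_budget(100))  # (100, 2500)
```

Singleton requests are listed above as free, so they are not counted against this estimate.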

Get an API Key

  1. Create a free account at openalex.org
  2. Go to openalex.org/settings/api to get your API key
  3. Configure PyAlex with your key:
import pyalex

pyalex.config.api_key = "<YOUR_API_KEY>"

For more information, see the OpenAlex Rate limits and authentication documentation.
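To keep the key out of source code, one option is to read it from an environment variable. The helper below is a minimal sketch: the function `load_api_key` and the variable name `OPENALEX_API_KEY` are hypothetical choices, not something PyAlex defines.

```python
import os


def load_api_key(env_var="OPENALEX_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption; use whatever name fits your setup.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your OpenAlex API key."
        )
    return key


# Usage (requires pyalex to be installed and the variable to be set):
# import pyalex
# pyalex.config.api_key = load_api_key()
```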

Get single entity

Get a single Work, Author, Source, Institution, Concept, Topic, Publisher, Funder, or Award from OpenAlex by its OpenAlex ID, or by DOI or ROR.

Works()["W2741809807"]

# same as
Works()["https://doi.org/10.7717/peerj.4375"]

The result is a Work object, which is very similar to a dictionary. Find the available fields with .keys().

For example, get the open access status:

Works()["W2741809807"]["open_access"]
{'is_oa': True, 'oa_status': 'gold', 'oa_url': 'https://doi.org/10.7717/peerj.4375'}

The same approach also works for Authors, Sources, Institutions, Concepts, and Topics.

Authors()["A5027479191"]
Authors()["https://orcid.org/0000-0002-4297-0502"]  # same

Get random

Get a random Work, Author, Source, Institution, Concept, Topic, Publisher or Funder.

Works().random()
Authors().random()
Sources().random()
Institutions().random()
Topics().random()
Publishers().random()
Funders().random()

See also sample, which supports filters.

Get abstract

Only for Works. Request a work from the OpenAlex database:

w = Works()["W3128349626"]

All attributes are available as documented under Works, as well as abstract (only if abstract_inverted_index is not None). The human-readable abstract is created on the fly.

w["abstract"]
'Abstract To help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible, we designed a tool to accelerate the step of screening titles and abstracts. For many tasks—including but not limited to systematic reviews and meta-analyses—the scientific literature needs to be checked systematically. Scholars and practitioners currently screen thousands of studies by hand to determine which studies to include in their review or meta-analysis. This is error prone and inefficient because of extremely imbalanced data: only a fraction of the screened studies is relevant. The future of systematic reviewing will be an interaction with machine learning algorithms to deal with the enormous increase of available text. We therefore developed an open source machine learning-aided pipeline applying active learning: ASReview. We demonstrate by means of simulation studies that active learning can yield far more efficient reviewing than manual reviewing while providing high quality. Furthermore, we describe the options of the free and open source research software and present the results from user experience tests. We invite the community to contribute to open source projects such as our own that provide measurable and reproducible improvements over current practice.'

Please respect the legal constraints when using this feature.
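To illustrate the idea behind this conversion, the sketch below rebuilds plaintext from an inverted index, assuming the index maps each word to the list of positions where it occurs. The function name `invert_abstract` and the sample data are illustrative only; this is not necessarily how PyAlex implements it internally.

```python
def invert_abstract(inverted_index):
    """Rebuild plaintext from a word -> positions mapping.

    Returns None when no inverted index is available.
    """
    if inverted_index is None:
        return None
    # Collect (position, word) pairs and sort them by position.
    pairs = [
        (position, word)
        for word, positions in inverted_index.items()
        for position in positions
    ]
    return " ".join(word for _, word in sorted(pairs))


# Made-up sample index, not taken from a real work
sample = {"To": [0], "help": [1], "researchers": [2], "screen": [3, 6], "abstracts,": [4], "we": [5]}
print(invert_abstract(sample))  # To help researchers screen abstracts, we screen
```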

Fetch content in PDF and TEI format

OpenAlex reference: Get content

Only for Works. Retrieve the full-text content of a work in PDF or TEI (Text Encoding Initiative) XML format, if available.

from pyalex import Works

# Get a work
w = Works()["W4412002745"]

# Access the PDF content
pdf_content = w.pdf.get()

# Or access the TEI content
tei_content = w.tei.get()

You can also download the content directly to a file:

from pyalex import Works

w = Works()["W4412002745"]

# Download PDF to a file
w.pdf.download("document.pdf")

# Download TEI to a file
w.tei.download("document.xml")

You can also get the URL of the content without downloading it:

from pyalex import Works

w = Works()["W4412002745"]

# Get the URL of the PDF
pdf_url = w.pdf.url

# Get the URL of the TEI
tei_url = w.tei.url

Note: Content availability depends on the publisher's open access policies and licensing agreements.

Get lists of entities

results = Works().get()

For lists of entities, you can also count the number of records found instead of returning the results. This also works for search queries and filters.

Works().count()
# 10338153

For lists of entities, you can return the result as well as the metadata. By default, only the results are returned.

topics = Topics().get()
print(topics.meta)
{'count': 65073, 'db_response_time_ms': 16, 'page': 1, 'per_page': 25}
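As a small worked example of reading this metadata, the sketch below computes how many pages of results the count implies. It is plain arithmetic on the dictionary shown above; the helper name `page_count` is hypothetical and not part of PyAlex.

```python
import math


def page_count(meta):
    """Number of pages needed to cover meta['count'] records."""
    return math.ceil(meta["count"] / meta["per_page"])


meta = {"count": 65073, "db_response_time_ms": 16, "page": 1, "per_page": 25}
print(page_count(meta))  # 2603
```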

Filter records

Works().filter(publication_year=2020, is_oa=True).get()

which is identical to:

Works().filter(publication_year=2020).filter(is_oa=True).get()

Nested attribute filters

Some attribute filters are nested and separated with dots by OpenAlex. For example, filter on authorships.institutions.ror.

In case of nested attribute filters, use a dict to build the query.

(
  Works()
  .filter(authorships={"institutions": {"ror": "04pp8hn57"}})
  .get()
)
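The nested dict corresponds to a dotted attribute path. The sketch below shows one way such a dict could be flattened into filter expressions, assuming the OpenAlex `attribute:value` filter syntax with commas joining multiple conditions. The helper `flatten_filter` is hypothetical and only illustrates the mapping; PyAlex's own implementation may differ.

```python
def flatten_filter(params, prefix=""):
    """Flatten a nested dict into a list of 'dotted.path:value' strings."""
    parts = []
    for key, value in params.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested attributes, extending the dotted path.
            parts.extend(flatten_filter(value, path))
        else:
            if isinstance(value, bool):
                # Assumption: booleans are written in lowercase in the query.
                value = str(value).lower()
            parts.append(f"{path}:{value}")
    return parts


nested = {"authorships": {"institutions": {"ror": "04pp8hn57"}}}
print(",".join(flatten_filter(nested)))
# authorships.institutions.ror:04pp8hn57

flat = {"publication_year": 2020, "is_oa": True}
print(",".join(flatten_filter(flat)))
# publication_year:2020,is_oa:true
```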

Filter on a set of values

You can filter on a set of values, for example if you want all works from a list of DOIs:

(
  Works()
  .filter_or(doi=["10.1016/s0924-9338(99)80239-9", "10.1002/andp.19213690304"])
  .get()
)

You can use a maximum of 100 items in the set of values. Also note that OpenAlex allows a maximum URL length of 4096 characters, so a long list of identifiers can run into this limit. It can help to use the short form of the identifiers: W2001676859 instead of https://openalex.org/W2001676859, and 10.1002/andp.19213690304 instead of https://doi.org/10.1002/andp.19213690304.
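One way to stay within both limits is to shorten the identifiers and split the list into batches of at most 100 before filtering. The helpers `short_id` and `batches` below are hypothetical names for this sketch, not part of PyAlex.

```python
def short_id(identifier):
    """Strip the OpenAlex or DOI URL prefix, if present."""
    for prefix in ("https://openalex.org/", "https://doi.org/"):
        if identifier.startswith(prefix):
            return identifier[len(prefix):]
    return identifier


def batches(values, size=100):
    """Yield consecutive slices of `values` with at most `size` items each."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


print(short_id("https://openalex.org/W2001676859"))  # W2001676859
print(short_id("https://doi.org/10.1002/andp.19213690304"))  # 10.1002/andp.19213690304

ids = [f"W{n}" for n in range(250)]
print([len(batch) for batch in batches(ids)])  # [100, 100, 50]

# Usage with PyAlex (requires network access and an API key):
# for batch in batches([short_id(d) for d in dois]):
#     results = Works().filter_or(doi=batch).get()
```

Note that batching only caps the number of values per request; very long identifiers could still exceed the URL length limit, in which case a smaller batch size would be needed.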

Search entities

OpenAlex reference: [The search parameter](https://docs.openalex.org/api-entities/works/search-
