GdeltPyR

Python-based framework to retrieve Global Database of Events, Language, and Tone (GDELT) version 1.0 and version 2.0 data.


gdeltPyR

gdeltPyR is a Python-based framework to access and analyze Global Database of Events, Language, and Tone (GDELT) 1.0 or 2.0 data in a Python Pandas or R dataframe. A user can enter a single date, date range (list of two strings), or individual dates (more than two in a list) and return a tidy data set ready for scientific or data-driven exploration.

  • Python 2 is retiring. Because gdeltPyR depends on several libraries that will end Python 2 support, it's only prudent that we do the same. gdeltPyR functionality in Python 2 will become buggy over the coming months. Move to Python 3 for the best experience.

gdeltPyR retrieves GDELT data, version 1.0 or version 2.0, via parallel HTTP GET requests and will provide a method to access GDELT data directly via Google BigQuery. Because the downloads run in parallel, the more cores you have, the less time it takes to pull data, and the more RAM you have, the more data you can pull. For RAM-limited workflows, create a pipeline that pulls data, writes to disk, and flushes memory.
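The pull, write, flush pipeline mentioned above could look roughly like the sketch below. The helper name `pull_to_disk` and its `fetch` argument are hypothetical and not part of gdeltPyR; `fetch` stands in for any callable that returns an object with a pandas-style `to_csv` method.

```python
import gc
from pathlib import Path
from typing import Callable, Iterable, List


def pull_to_disk(dates: Iterable[str], fetch: Callable[[str], object], out_dir: str) -> List[Path]:
    """Fetch one date at a time, write it to CSV, then release the memory.

    `fetch` is a hypothetical stand-in for the real query call; it must
    return an object with a pandas-like `to_csv(path)` method.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for day in dates:
        frame = fetch(day)                               # pull
        path = target / (day.replace(" ", "-") + ".csv")
        frame.to_csv(path)                               # write to disk
        written.append(path)
        del frame                                        # drop the only reference
        gc.collect()                                     # flush before the next pull
    return written
```

With gdeltPyR, `fetch` might be something like `lambda d: gd.Search(d, table='events', coverage=True)`, so that only one day of data sits in RAM at any time.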

The GDELT Project advertises itself as the largest, most comprehensive, and highest resolution open database of human society ever created. It monitors print, broadcast, and web news media in over 100 languages from every country in the world to stay continually updated on breaking developments anywhere on the planet. Its historical archives stretch back to January 1, 1979, and it tracks the world’s breaking events and reactions in near-realtime, as both the GDELT Event and Global Knowledge Graph update every 15 minutes. Visit the GDELT website to learn more about the project.

GDELT Facts

  • GDELT 1.0 is a daily dataset
    • 1.0 only has 'events' and 'gkg' tables
    • 1.0 posts the previous day's data at 6 AM EST the next day (e.g. Monday's data will be available at 6 AM EST on Tuesday)
  • GDELT 2.0 is updated every 15 minutes
    • Some time intervals can have missing data; gdeltPyR provides a warning for missing data
    • 2.0 has 'events','gkg', and 'mentions' tables
    • 2.0 has a distinction between native English and translated-to-English news
    • 2.0 has more columns
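To make the 15-minute cadence of GDELT 2.0 concrete, the sketch below lists the quarter-hour marks in a single day. There are 96 of them, which is how many intervals a full-day 2.0 pull has to cover. The function name `quarter_hour_stamps` and the `YYYYMMDDHHMMSS` timestamp layout are assumptions made for illustration, not something gdeltPyR exposes.

```python
from datetime import datetime, timedelta
from typing import List


def quarter_hour_stamps(day: str) -> List[str]:
    """Return the 96 quarter-hour timestamps of one day as YYYYMMDDHHMMSS strings."""
    start = datetime.strptime(day, "%Y %m %d")   # midnight of the requested day
    return [
        (start + timedelta(minutes=15 * i)).strftime("%Y%m%d%H%M%S")
        for i in range(96)                       # 24 hours * 4 intervals per hour
    ]


stamps = quarter_hour_stamps("2016 11 01")
print(len(stamps), stamps[0], stamps[-1])
# prints: 96 20161101000000 20161101234500
```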

Project Concept and Evolution Plan

This project will evolve in two phases. If you want to contribute to the project, this section can help you prioritize where to put your effort.

Phase 1 focuses on providing consistent, stable, and reliable access to GDELT data.

gdeltPyR will help data scientists, researchers, data enthusiasts, and curious Python coders in this phase. Therefore, most issues in this phase will build out the main Search method of the gdelt class to return GDELT data, version 1.0 or version 2.0, or equally important, give a relevant error message when no data is returned. This also means the project will focus on building documentation, a unit testing framework (shooting for 90% coverage), and creating a helper class that provides helpful information on column names/table descriptions.

Phase 2 brings analytics to gdeltPyR, expanding the library beyond simple data retrieval.

This phase is what will make gdeltPyR useful to a wider audience. The major addition will be an Analysis method of the gdelt class which will analyze outputs of the Search method. For data-literate users (data scientists, researchers, students, data journalists, etc.), enhancements in this phase will save time by providing summary statistics and extraction methods for GDELT data, reducing the time a user would spend writing code to perform routine data cleanup and analysis. For the non-technical audience (students, journalists, business managers, etc.), enhancements in this phase will provide outputs that summarize GDELT data, which can in turn be used in reports, articles, etc. Areas of focus include descriptive statistics (mean, split-apply-combine stats, etc.), spatial analysis, and time series.
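As a rough illustration of the split-apply-combine statistics named above, the sketch below groups rows by a key and averages a numeric field using only the standard library. It is not the planned Analysis method; the helper `mean_by_group`, the field names `country` and `tone`, and the sample rows are all made up for the example.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Mapping


def mean_by_group(rows: Iterable[Mapping], key: str, value: str) -> Dict[str, float]:
    """Split rows by `key`, apply a mean to `value`, and combine into one dict."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for row in rows:                                   # split
        groups[row[key]].append(float(row[value]))
    return {k: mean(v) for k, v in groups.items()}     # apply and combine


# Made-up rows, for illustration only
rows = [
    {"country": "US", "tone": 2.0},
    {"country": "US", "tone": -1.0},
    {"country": "FR", "tone": 4.0},
]
print(mean_by_group(rows, "country", "tone"))
# prints: {'US': 0.5, 'FR': 4.0}
```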

# Basic use and new schema method
import gdelt

gd = gdelt.gdelt()

events = gd.Search(['2017 May 23'], table='events', output='gpd', normcols=True, coverage=False)

# new schema method
print(gd.schema('events'))

Coming Soon (in version 0.2, as of Oct 2023)

<p align="center"> <img src="https://twistedsifter.files.wordpress.com/2015/06/people-tweeting-about-sunrises-over-a-24-hour-period.gif?w=700&h=453"> </p>

Installation

gdeltPyR can be installed via pip

pip install gdelt

It can also be installed using conda

conda install gdelt

Basic Examples

GDELT 1.0 Queries

import gdelt

# Version 1 queries
gd1 = gdelt.gdelt(version=1)

# pull single day, gkg table
results = gd1.Search('2016 Nov 01', table='gkg')
print(len(results))

# pull events table for a date range (default output is a pandas dataframe)
results = gd1.Search(['2016 Oct 31', '2016 Nov 2'], coverage=True, table='events')
print(len(results))

GDELT 2.0 Queries

import gdelt

# Version 2 queries
gd2 = gdelt.gdelt(version=2)

# Single 15 minute interval pull, output to json format with mentions table
results = gd2.Search('2016 Nov 1', table='mentions', output='json')
print(len(results))

# Full day pull, output to pandas dataframe, events table
results = gd2.Search(['2016 11 01'], table='events', coverage=True)
print(len(results))
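The queries above pass dates as a single string or as a list. As described earlier, a list of two strings is read as a range, while a longer list is read as individual dates. The sketch below shows one way that interpretation could be written. It is an illustration under those assumptions, not gdeltPyR's internal code, and `parse_day` and `expand_dates` are hypothetical names.

```python
from datetime import date, datetime, timedelta
from typing import List, Sequence, Union

# The two date spellings used in this README: "2016 Oct 23" and "2016 10 23"
FORMATS = ("%Y %b %d", "%Y %m %d")


def parse_day(text: str) -> date:
    """Parse one date string, trying each accepted spelling in turn."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def expand_dates(value: Union[str, Sequence[str]]) -> List[date]:
    """Turn a single date, a two-item range, or a list of dates into explicit days."""
    if isinstance(value, str):
        return [parse_day(value)]                      # single date
    days = [parse_day(v) for v in value]
    if len(days) == 2:                                 # two items: inclusive range
        first, last = min(days), max(days)
        return [first + timedelta(days=i) for i in range((last - first).days + 1)]
    return days                                        # otherwise: individual dates


print(expand_dates(['2016 Oct 31', '2016 Nov 2']))
```

Under this reading, the range in the GDELT 1.0 example covers three days: October 31, November 1, and November 2.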


Output Options

gdeltPyR can output results directly into several formats which include:

  • pandas dataframe
  • csv
  • json
  • geopandas dataframe (as of version 0.1.10)
  • GeoJSON (coming soon version 0.1.11)
  • Shapefile (coming soon version 0.1.11)

Performance on 4 core, MacOS Sierra 10.12 with 16GB of RAM:

  • 900,000 by 61 (rows x columns) pandas dataframe returned in 36 seconds
    • data is a merged pandas dataframe of GDELT 2.0 events database data

gdeltPyR Parameters

gdeltPyR provides access to 1.0 and 2.0 data. Six parameters guide the query syntax:

| Name | Description | Input Possibilities/Examples |
|-------------|-------------|---------------------------------|
| version | (integer) - Selects the version of GDELT data to query; defaults to version 2. | 1 or 2 |
| date | (string or list of strings) - Dates to query | "2016 10 23" or "2016 Oct 23" |
| coverage | (bool) - For GDELT 2.0, pulls every 15 minute interval in the dates passed in the 'date' parameter. Default coverage is False or None, in which case gdeltPyR pulls the latest 15 minute interval for the current day or the last 15 minute interval for a historic day. | True or False or None |
| translation | (bool) - For GDELT 2.0, whether the English or translated-to-English dataset should be downloaded | True or False |
| table | (string) - The specific GDELT table to pull. The default table is the 'events' table. See the GDELT documentation page for more information | 'events' or 'mentions' or 'gkg' |
| output | (string) - The output type for the results | 'json' or 'csv' or 'gpd' |
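The allowed values in the parameter table can be checked before any download starts. The sketch below is a hypothetical helper, not part of gdeltPyR. The class name `QueryOptions`, the `translation=False` default, and the use of `None` to mean "default pandas dataframe output" are assumptions; the table field is spelled `table`, as in the query examples.

```python
from dataclasses import dataclass
from typing import Optional

TABLES = {"events", "mentions", "gkg"}
OUTPUTS = {None, "json", "csv", "gpd"}   # None: assumed to mean the default dataframe


@dataclass(frozen=True)
class QueryOptions:
    """Hypothetical container that validates query parameters up front."""
    version: int = 2
    table: str = "events"
    coverage: Optional[bool] = False
    translation: bool = False            # assumed default
    output: Optional[str] = None

    def __post_init__(self) -> None:
        if self.version not in (1, 2):
            raise ValueError("version must be 1 or 2")
        if self.table not in TABLES:
            raise ValueError(f"unknown table: {self.table!r}")
        if self.version == 1 and self.table == "mentions":
            # GDELT 1.0 only has the 'events' and 'gkg' tables
            raise ValueError("the 'mentions' table requires GDELT 2.0")
        if self.output not in OUTPUTS:
            raise ValueError(f"unknown output: {self.output!r}")


options = QueryOptions(version=2, table="mentions", output="json")
print(options)
```

Failing fast on a bad combination, such as `version=1` with `table='mentions'`, gives a clearer error than an empty result after a long download.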

These parameter value
