
#########
phenodata
#########

Phenology data acquisition for humans.

`Documentation <https://phenodata.readthedocs.io>`_ | `Issues <https://github.com/earthobservations/phenodata/issues>`_ | `Changelog <https://github.com/earthobservations/phenodata/blob/main/CHANGES.rst>`_ | `PyPI <https://pypi.org/project/phenodata/>`_ | `Source code <https://github.com/earthobservations/phenodata>`_

.. image:: https://github.com/earthobservations/phenodata/actions/workflows/tests.yml/badge.svg
   :target: https://github.com/earthobservations/phenodata/actions?workflow=Tests

.. image:: https://readthedocs.org/projects/phenodata/badge/
   :target: https://phenodata.readthedocs.io/

.. image:: https://codecov.io/gh/earthobservations/phenodata/branch/main/graph/badge.svg
   :target: https://codecov.io/gh/earthobservations/phenodata

.. image:: https://static.pepy.tech/badge/phenodata/month
   :target: https://pepy.tech/project/phenodata

.. image:: https://img.shields.io/pypi/pyversions/phenodata.svg
   :target: https://pypi.org/project/phenodata/

.. image:: https://img.shields.io/pypi/v/phenodata.svg
   :target: https://pypi.org/project/phenodata/

.. image:: https://img.shields.io/pypi/l/phenodata.svg
   :target: https://pypi.org/project/phenodata/


About
=====


Phenodata is an acquisition and processing toolkit for open access phenology data. It is based on pandas_, and can be used both as a standalone program and as a library.

Currently, it implements data wrappers for acquiring phenology observation data published on the DWD Climate Data Center (CDC) FTP server operated by »Deutscher Wetterdienst« (DWD). Adding adapters for other phenology databases and APIs is possible and welcome.

Acknowledgements
================

Thanks to the many observers of »Deutscher Wetterdienst« (DWD), the »Global Phenological Monitoring programme« (GPM), and all people working behind the scenes, for their commitment to recording observations and making these excellent datasets available to the community. You know who you are.

Notes
=====

Please note that phenodata is beta-quality software and a work in progress. Contributions of all kinds are welcome, in order to make it more solid.

Breaking changes should be expected until a 1.0 release, so version pinning is recommended, especially when you use phenodata as a library.


Synopsis
========


The easiest way to explore both phenodata and the dataset interactively is to use the command-line interface.

The two examples below acquire observation data from DWD's network, focus only on the "beginning of flowering" phase event, and present the results in tabular format, using values suitable for human consumption.

Acquire data from DWD's "immediate" dataset (Sofortmelder).

.. code-block:: bash

    phenodata observations \
        --source=dwd --dataset=immediate --partition=recent \
        --year=2023 --station=brandenburg \
        --species-preset=mellifera-de-primary \
        --phase="beginning of flowering" \
        --humanize --sort=Datum --format=rst

Acquire data from DWD's "annual" dataset (Jahresmelder).

.. code-block:: bash

    phenodata observations \
        --source=dwd --dataset=annual --partition=recent \
        --year="2022,2023" --station=berlin \
        --species-preset=mellifera-de-primary \
        --phase="beginning of flowering" \
        --humanize --sort=Datum --format=rst

.. tip::

    The authors recommend copying one of those snippets into a file and
    invoking it as a script, in order to make subsequent invocations easier
    while editing and exploring different option values. If you discover a
    bug, or want to make your program available to others because you think
    it is useful, feel free to `share it back with us`_.

Output example
--------------

Phenodata can produce output in different formats. This is a table in reStructuredText_ format.

========== ====================== ====================== =====================
Datum      Spezies                Phase                  Station
========== ====================== ====================== =====================
2018-02-17 common snowdrop        beginning of flowering Berlin-Dahlem, Berlin
2018-02-19 common hazel           beginning of flowering Berlin-Dahlem, Berlin
2018-03-30 goat willow            beginning of flowering Berlin-Dahlem, Berlin
2018-04-07 dandelion              beginning of flowering Berlin-Dahlem, Berlin
2018-04-15 cherry (late ripeness) beginning of flowering Berlin-Dahlem, Berlin
2018-04-21 winter oilseed rape    beginning of flowering Berlin-Dahlem, Berlin
2018-04-23 apple (early ripeness) beginning of flowering Berlin-Dahlem, Berlin
2018-05-03 apple (late ripeness)  beginning of flowering Berlin-Dahlem, Berlin
2018-05-24 black locust           beginning of flowering Berlin-Dahlem, Berlin
2018-08-20 common heather         beginning of flowering Berlin-Dahlem, Berlin
========== ====================== ====================== =====================

.. note::

    Using the example snippet provided above, the program rendered a table in
    `reStructuredText`_ format using ``--format=rst``. In order to render
    tables in `Markdown`_ format, use ``--format=md``. For more tabular output
    formats, use ``--format=tabular:foo``, and consult the documentation of the
    `tabulate`_ package for choices of ``foo``.
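For a rough idea of how such a table is laid out, the following self-contained sketch assembles a reStructuredText "simple table" from plain rows. This only mimics the layout; the actual program delegates rendering to the `tabulate`_ package, so function and variable names here are illustrative, not part of phenodata's API.

```python
def to_rst_table(headers, rows):
    """Render rows as a reStructuredText "simple table" (layout sketch only)."""
    # Normalize all cells to strings, so widths can be measured uniformly.
    rows = [[str(cell) for cell in row] for row in rows]
    table = [list(headers)] + rows

    # Each column is as wide as its widest cell.
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    rule = " ".join("=" * w for w in widths)

    def fmt(row):
        # Left-justify each cell to its column width.
        return " ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()

    lines = [rule, fmt(headers), rule] + [fmt(row) for row in rows] + [rule]
    return "\n".join(lines)


print(to_rst_table(
    ["Datum", "Spezies", "Phase"],
    [["2018-02-17", "common snowdrop", "beginning of flowering"]],
))
```

The ``=`` rules above and below the header row are what makes reStructuredText recognize the block as a simple table.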

Usage
=====


Introduction
------------

For most acquisition tasks, you will have to select one of DWD's two datasets, annual-reporters_ or immediate-reporters_. Further, a data partition has to be selected: either ``recent`` or ``historical``.

As of 2023, the historical datasets extend from the beginning of recording until 2021. All subsequent observations are stored within the ``recent`` dataset partition.

The DWD publishes data in files separated by species; this means each plant's data will be in a different file. By default, phenodata will acquire data for all species (plants), in order to be able to respond to all kinds of queries across the whole dataset.

If you are only interested in a limited set of species (plants), you can improve data acquisition performance by using the ``--filename`` option to select only specific files for retrieval.

For example, when using ``--filename=Hasel,Schneegloeckchen``, only file names containing ``Hasel`` or ``Schneegloeckchen`` will be retrieved, thus minimizing the effort needed to acquire the data.
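The substring semantics of the ``--filename`` option can be sketched in a few lines of Python. The file names below are hypothetical, and the actual matching happens inside phenodata's DWD adapter, so this is an illustration of the behavior only:

```python
# Hypothetical listing of species data files on the DWD CDC FTP server.
available_files = [
    "PH_Sofortmelder_Wildwachsende_Pflanze_Hasel_akt.txt",
    "PH_Sofortmelder_Wildwachsende_Pflanze_Schneegloeckchen_akt.txt",
    "PH_Sofortmelder_Landwirtschaft_Kulturpflanze_Mais_akt.txt",
]


def select_files(filenames, patterns):
    """Keep only file names containing any of the given substrings,
    mimicking ``--filename=Hasel,Schneegloeckchen``."""
    return [name for name in filenames
            if any(pattern in name for pattern in patterns)]


selected = select_files(available_files, ["Hasel", "Schneegloeckchen"])
print(selected)  # the Hasel and Schneegloeckchen files only
```

Because matching is a plain substring test, partial names work as well: ``Schneegl`` would match the snowdrop file just the same.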

Install
-------

To install the software from PyPI, invoke::

    pip install 'phenodata[sql]' --upgrade

.. note::

    Please refer to the `virtualenv`_ page for best-practice recommendations
    on installing the software separately from your system environment.

Library use
-----------

This snippet demonstrates how to use phenodata as a library within individual programs. For ready-to-run code examples, please have a look into the `examples directory`_.

.. hidden

.. code-block:: python

    >>> import os
    >>> import pytest
    >>> if "GITHUB_ACTION" in os.environ:
    ...     pytest.skip("pytest-doctest-ellipsis-markers does not work on CI/GHA. Works on macOS though.", allow_module_level=True)

.. code-block:: python

    >>> import pandas as pd
    >>> from phenodata.ftp import FTPSession
    >>> from phenodata.dwd.cdc import DwdCdcClient
    >>> from phenodata.dwd.pheno import DwdPhenoDataClient

    >>> cdc_client = DwdCdcClient(ftp=FTPSession())
    >>> client = DwdPhenoDataClient(cdc=cdc_client, dataset="immediate")
    >>> options = {
    ...     # Select data partition.
    ...     "partition": "recent",
    ...
    ...     # Filter by file names and years.
    ...     "filename": ["Hasel", "Raps", "Mais"],
    ...     "year": [2018, 2019, 2020],
    ...
    ...     # Filter by station identifier.
    ...     "station-id": [13346]
    ... }

    >>> observations: pd.DataFrame = client.get_observations(options)
    >>> observations.info()
    [...]
    >>> observations
    [...]
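Once acquired, the observations are plain pandas DataFrames, so standard pandas idioms apply for post-processing. The following sketch uses made-up data shaped loosely like phenodata's output; column names such as ``Jahr`` and ``Tag_im_Jahr`` are assumptions chosen for this example, not guaranteed phenodata column names:

```python
import pandas as pd

# Made-up observations, shaped loosely like phenodata's output.
observations = pd.DataFrame({
    "Datum": pd.to_datetime(["2018-02-17", "2019-02-25", "2020-02-10"]),
    "Spezies": ["common hazel"] * 3,
    "Phase": ["beginning of flowering"] * 3,
})

# Derive year and day-of-year columns, then compare flowering onset per year.
observations["Jahr"] = observations["Datum"].dt.year
observations["Tag_im_Jahr"] = observations["Datum"].dt.dayofyear
onset = observations.groupby("Jahr")["Tag_im_Jahr"].min()
print(onset)
```

This kind of aggregation (earliest flowering day per year) is a typical downstream analysis once the raw observations have been loaded through the client.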

Command-line use
----------------

This section gives you an idea of how to use the phenodata program on the command line.

::

    $ phenodata --help

    Usage:
      phenodata info
      phenodata list-species --source=dwd [--format=csv]
      phenodata list-phases --source=dwd [--format=csv]
      phenodata list-stations --source=dwd --dataset=immediate [--all] [--filter=berlin] [--sort=Stationsname] [--format=csv]
      phenodata nearest-station --source=dwd --dataset=immediate --latitude=52.520007 --longitude=13.404954 [--format=csv]
      phenodata nearest-stations --source=dwd --dataset=immediate --latitude=52.520007 --longitude=13.404954 [--all] [--limit=10] [--format=csv]
      phenodata list-quality-levels --source=dwd [--format=csv]
      phenodata list-quality-bytes --source=dwd [--format=csv]
      phenodata list-filenames --source=dwd --dataset=immediate --partition=recent [--filename=Hasel,Schneegloeckchen] [--year=2017]
      phenodata list-urls --source=dwd --dataset=immediate --partition=recent [--filename=Hasel,Schneegloeckchen] [--year=2017]
      phenodata (observations|forecast) --source=dwd --dataset=immediate --partition=recent [--filename=Hasel,Schneegloeckchen] [--station-id=164,717] [--species-id=113,127] [--phase-id=5] [--quality-level=10] [--quality-byte=1,2,3] [--station=berlin,brandenburg] [--species=hazel,snowdrop] [--species-preset=mellifera-de-primary] [--phase=flowering] [--quality=ROUTKLI] [--year=2017] [--forecast-year=2021] [--humanize] [--show-ids] [--language=german] [--long-station] [--sort=Datum] [--sql=sql] [--format=csv] [--verbose]
      phenodata drop-cache --source=dwd
      phenodata --version
      phenodata (-h | --help)

    Data acquisition options:
      --source=<source>         Data source. Currently, only "dwd" is a valid identifier.
      --dataset=<dataset>       Data set. Use "immediate" or "annual" for "--source=dwd".
      --partition=<datase
