#########
phenodata
#########
Phenology data acquisition for humans.
| `Documentation <https://phenodata.readthedocs.io>`_
| `Issues <https://github.com/earthobservations/phenodata/issues>`_
| `Changelog <https://github.com/earthobservations/phenodata/blob/main/CHANGES.rst>`_
| `PyPI <https://pypi.org/project/phenodata/>`_
| `Source code <https://github.com/earthobservations/phenodata>`_
.. image:: https://github.com/earthobservations/phenodata/actions/workflows/tests.yml/badge.svg
   :target: https://github.com/earthobservations/phenodata/actions?workflow=Tests

.. image:: https://readthedocs.org/projects/phenodata/badge/
   :target: https://phenodata.readthedocs.io/

.. image:: https://codecov.io/gh/earthobservations/phenodata/branch/main/graph/badge.svg
   :target: https://codecov.io/gh/earthobservations/phenodata

.. image:: https://static.pepy.tech/badge/phenodata/month
   :target: https://pepy.tech/project/phenodata

.. image:: https://img.shields.io/pypi/pyversions/phenodata.svg
   :target: https://pypi.org/project/phenodata/

.. image:: https://img.shields.io/pypi/v/phenodata.svg
   :target: https://pypi.org/project/phenodata/

.. image:: https://img.shields.io/pypi/l/phenodata.svg
   :target: https://pypi.org/project/phenodata/

About
=====

Phenodata is an acquisition and processing toolkit for open access phenology
data. It is based on pandas_, and can be used both as a standalone program
and as a library.
Currently, it implements data wrappers for acquiring phenology observation data published on the DWD Climate Data Center (CDC) FTP server operated by »Deutscher Wetterdienst« (DWD). Adding adapters for other phenology databases and APIs is possible and welcome.

Acknowledgements
================

Thanks to the many observers of »Deutscher Wetterdienst« (DWD), the »Global Phenological Monitoring programme« (GPM), and all people working behind the scenes for their commitment on recording observations and making the excellent datasets available to the community. You know who you are.

Notes
=====

Please note that phenodata is beta-quality software, and a work in progress. Contributions of all kinds are welcome, in order to make it more solid.
Breaking changes should be expected until a 1.0 release, so version pinning is recommended, especially when you use phenodata as a library.

Synopsis
========

The easiest way to explore both phenodata and the dataset interactively is to
use the command-line interface.

The following two examples acquire observation data from DWD's network, focus
only on the "beginning of flowering" phase event, and present the results in
tabular format, using values suitable for human consumption.
Acquire data from DWD's "immediate" dataset (Sofortmelder).
.. code-block:: bash

    phenodata observations \
        --source=dwd --dataset=immediate --partition=recent \
        --year=2023 --station=brandenburg \
        --species-preset=mellifera-de-primary \
        --phase="beginning of flowering" \
        --humanize --sort=Datum --format=rst
Acquire data from DWD's "annual" dataset (Jahresmelder).
.. code-block:: bash

    phenodata observations \
        --source=dwd --dataset=annual --partition=recent \
        --year="2022,2023" --station=berlin \
        --species-preset=mellifera-de-primary \
        --phase="beginning of flowering" \
        --humanize --sort=Datum --format=rst
.. tip::

    The authors recommend copying one of those snippets into a file and
    invoking it as a script, in order to make subsequent invocations easier
    while editing and exploring different option values. If you discover a
    bug, or want to make your program available to others because you think
    it is useful, feel free to `share it back with us`_.

Output example
==============

Phenodata can produce output in different formats. This is a table in
reStructuredText_ format.
========== ====================== ====================== =====================
Datum      Spezies                Phase                  Station
========== ====================== ====================== =====================
2018-02-17 common snowdrop        beginning of flowering Berlin-Dahlem, Berlin
2018-02-19 common hazel           beginning of flowering Berlin-Dahlem, Berlin
2018-03-30 goat willow            beginning of flowering Berlin-Dahlem, Berlin
2018-04-07 dandelion              beginning of flowering Berlin-Dahlem, Berlin
2018-04-15 cherry (late ripeness) beginning of flowering Berlin-Dahlem, Berlin
2018-04-21 winter oilseed rape    beginning of flowering Berlin-Dahlem, Berlin
2018-04-23 apple (early ripeness) beginning of flowering Berlin-Dahlem, Berlin
2018-05-03 apple (late ripeness)  beginning of flowering Berlin-Dahlem, Berlin
2018-05-24 black locust           beginning of flowering Berlin-Dahlem, Berlin
2018-08-20 common heather         beginning of flowering Berlin-Dahlem, Berlin
========== ====================== ====================== =====================
.. note::

    Using the example snippet provided above, the program rendered a table in
    `reStructuredText`_ format using ``--format=rst``. In order to render
    tables in `Markdown`_ format, use ``--format=md``. For more tabular output
    formats, use ``--format=tabular:foo``, and consult the documentation of
    the `tabulate`_ package for choices of ``foo``.

Usage
=====

Introduction
------------

For most acquisition tasks, you will have to select one of DWD's two
datasets, annual-reporters_ or immediate-reporters_. Further, the data
partition has to be selected: it is either ``recent`` or ``historical``.
Currently, as of 2023, the historical datasets extend from the past until
2021; all subsequent observations are stored within the ``recent`` dataset
partition.
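As a sketch of that rule, a small helper can derive which partitions a query
for given years needs to touch. This helper is hypothetical and not part of
the phenodata API; the cut-off year 2021 reflects the state described above,
as of 2023.

```python
def partitions_for_years(years, historical_until=2021):
    """Return the sorted set of DWD data partitions covering the given years.

    ``historical_until`` is the last year contained in the "historical"
    partition; later years live in the "recent" partition.
    """
    partitions = {
        "historical" if year <= historical_until else "recent"
        for year in years
    }
    return sorted(partitions)

# A query spanning the cut-off needs both partitions.
print(partitions_for_years([2019, 2023]))  # ['historical', 'recent']
```
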
DWD publishes data in files separated by species; each plant's data will be
in a different file. By default, phenodata acquires data for all species
(plants), in order to be able to respond to all kinds of queries across the
whole dataset.
If you are only interested in a limited set of species (plants), you can
improve data acquisition performance by using the ``--filename`` option to
select specific files for retrieval.

For example, when using ``--filename=Hasel,Schneegloeckchen``, only file
names containing ``Hasel`` or ``Schneegloeckchen`` will be retrieved, thus
minimizing the effort needed to acquire the data.
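The matching is a plain substring test against the file names. The following
standalone sketch illustrates that selection logic; the file names are made
up for illustration and merely resemble DWD's naming scheme.

```python
def select_filenames(available, patterns):
    """Select file names containing any of the given substring patterns,
    mirroring the behaviour of the ``--filename`` option."""
    return [
        name for name in available
        if any(pattern in name for pattern in patterns)
    ]

# Hypothetical file names, loosely resembling DWD's naming scheme.
available = [
    "PH_Sofortmelder_Wildwachsende_Pflanze_Hasel_akt.txt",
    "PH_Sofortmelder_Wildwachsende_Pflanze_Schneegloeckchen_akt.txt",
    "PH_Sofortmelder_Landwirtschaft_Kulturpflanze_Mais_akt.txt",
]

# Only the hazel and snowdrop files are selected.
print(select_filenames(available, ["Hasel", "Schneegloeckchen"]))
```
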

Install
-------

To install the software from PyPI, invoke::

    pip install 'phenodata[sql]' --upgrade
.. note::

    Please refer to the `virtualenv`_ page for best-practice recommendations
    about installing the software separately from your system environment.

Library use
-----------

This snippet demonstrates how to use phenodata as a library within individual
programs. For ready-to-run code examples, please have a look into the
examples_ directory.
.. hidden
.. code-block:: python

    >>> import os
    >>> import pytest
    >>> if "GITHUB_ACTION" in os.environ:
    ...     pytest.skip("pytest-doctest-ellipsis-markers does not work on CI/GHA. Works on macOS though.", allow_module_level=True)
.. code-block:: python

    >>> import pandas as pd
    >>> from phenodata.ftp import FTPSession
    >>> from phenodata.dwd.cdc import DwdCdcClient
    >>> from phenodata.dwd.pheno import DwdPhenoDataClient

    >>> cdc_client = DwdCdcClient(ftp=FTPSession())
    >>> client = DwdPhenoDataClient(cdc=cdc_client, dataset="immediate")

    >>> options = {
    ...     # Select data partition.
    ...     "partition": "recent",
    ...
    ...     # Filter by file names and years.
    ...     "filename": ["Hasel", "Raps", "Mais"],
    ...     "year": [2018, 2019, 2020],
    ...
    ...     # Filter by station identifier.
    ...     "station-id": [13346],
    ... }

    >>> observations: pd.DataFrame = client.get_observations(options)

    >>> observations.info()
    [...]

    >>> observations
    [...]

Command-line use
----------------

This section gives you an idea about how to use the phenodata program on
the command-line.
::

    $ phenodata --help

    Usage:
      phenodata info
      phenodata list-species --source=dwd [--format=csv]
      phenodata list-phases --source=dwd [--format=csv]
      phenodata list-stations --source=dwd --dataset=immediate [--all] [--filter=berlin] [--sort=Stationsname] [--format=csv]
      phenodata nearest-station --source=dwd --dataset=immediate --latitude=52.520007 --longitude=13.404954 [--format=csv]
      phenodata nearest-stations --source=dwd --dataset=immediate --latitude=52.520007 --longitude=13.404954 [--all] [--limit=10] [--format=csv]
      phenodata list-quality-levels --source=dwd [--format=csv]
      phenodata list-quality-bytes --source=dwd [--format=csv]
      phenodata list-filenames --source=dwd --dataset=immediate --partition=recent [--filename=Hasel,Schneegloeckchen] [--year=2017]
      phenodata list-urls --source=dwd --dataset=immediate --partition=recent [--filename=Hasel,Schneegloeckchen] [--year=2017]
      phenodata (observations|forecast) --source=dwd --dataset=immediate --partition=recent [--filename=Hasel,Schneegloeckchen] [--station-id=164,717] [--species-id=113,127] [--phase-id=5] [--quality-level=10] [--quality-byte=1,2,3] [--station=berlin,brandenburg] [--species=hazel,snowdrop] [--species-preset=mellifera-de-primary] [--phase=flowering] [--quality=ROUTKLI] [--year=2017] [--forecast-year=2021] [--humanize] [--show-ids] [--language=german] [--long-station] [--sort=Datum] [--sql=sql] [--format=csv] [--verbose]
      phenodata drop-cache --source=dwd
      phenodata --version
      phenodata (-h | --help)

    Data acquisition options:
      --source=<source>         Data source. Currently, only "dwd" is a valid identifier.
      --dataset=<dataset>       Data set. Use "immediate" or "annual" for "--source=dwd".
      --partition=<partition>   Data partition. Use "recent" or "historical".