#########
phenodata
#########
Phenology data acquisition for humans.
| `Documentation <https://phenodata.readthedocs.io>`_
| `Issues <https://github.com/earthobservations/phenodata/issues>`_
| `Changelog <https://github.com/earthobservations/phenodata/blob/main/CHANGES.rst>`_
| `PyPI <https://pypi.org/project/phenodata/>`_
| `Source code <https://github.com/earthobservations/phenodata>`_
.. image:: https://github.com/earthobservations/phenodata/actions/workflows/tests.yml/badge.svg
   :target: https://github.com/earthobservations/phenodata/actions?workflow=Tests

.. image:: https://readthedocs.org/projects/phenodata/badge/
   :target: https://phenodata.readthedocs.io/

.. image:: https://codecov.io/gh/earthobservations/phenodata/branch/main/graph/badge.svg
   :target: https://codecov.io/gh/earthobservations/phenodata

.. image:: https://static.pepy.tech/badge/phenodata/month
   :target: https://pepy.tech/project/phenodata

.. image:: https://img.shields.io/pypi/pyversions/phenodata.svg
   :target: https://pypi.org/project/phenodata/

.. image:: https://img.shields.io/pypi/v/phenodata.svg
   :target: https://pypi.org/project/phenodata/

.. image:: https://img.shields.io/pypi/l/phenodata.svg
   :target: https://pypi.org/project/phenodata/

About
=====

Phenodata is an acquisition and processing toolkit for open access phenology
data. It is based on pandas_, and can be used both as a standalone program
and as a library.
Currently, it implements data wrappers for acquiring phenology observation data published on the DWD Climate Data Center (CDC) FTP server operated by »Deutscher Wetterdienst« (DWD). Adding adapters for other phenology databases and APIs is possible and welcome.

Acknowledgements
================

Thanks to the many observers of »Deutscher Wetterdienst« (DWD), the »Global Phenological Monitoring programme« (GPM), and all people working behind the scenes for their commitment on recording observations and making the excellent datasets available to the community. You know who you are.

Notes
=====

Please note that phenodata is beta-quality software, and a work in progress. Contributions of all kinds are welcome, in order to make it more solid.
Breaking changes should be expected until a 1.0 release, so version pinning is recommended, especially when you use phenodata as a library.

Synopsis
========

The easiest way to explore both phenodata and the dataset interactively is to
use the command-line interface.

The following two examples acquire observation data from DWD's network, focus
only on the "beginning of flowering" phase event, and present the results in
tabular format, using values suitable for human consumption.
Acquire data from DWD's "immediate" dataset (Sofortmelder).
.. code-block:: bash

    phenodata observations \
        --source=dwd --dataset=immediate --partition=recent \
        --year=2023 --station=brandenburg \
        --species-preset=mellifera-de-primary \
        --phase="beginning of flowering" \
        --humanize --sort=Datum --format=rst
Acquire data from DWD's "annual" dataset (Jahresmelder).
.. code-block:: bash

    phenodata observations \
        --source=dwd --dataset=annual --partition=recent \
        --year="2022,2023" --station=berlin \
        --species-preset=mellifera-de-primary \
        --phase="beginning of flowering" \
        --humanize --sort=Datum --format=rst
.. tip::

    The authors recommend copying one of those snippets into a file and
    invoking it as a script, in order to make subsequent invocations easier
    while editing and exploring different option values. If you discover a
    bug, or want to make your program available to others because you think
    it is useful, feel free to `share it back with us`_.

Output example
==============

Phenodata can produce output in different formats. This is a table in
reStructuredText_ format.
========== ====================== ====================== =====================
Datum      Spezies                Phase                  Station
========== ====================== ====================== =====================
2018-02-17 common snowdrop        beginning of flowering Berlin-Dahlem, Berlin
2018-02-19 common hazel           beginning of flowering Berlin-Dahlem, Berlin
2018-03-30 goat willow            beginning of flowering Berlin-Dahlem, Berlin
2018-04-07 dandelion              beginning of flowering Berlin-Dahlem, Berlin
2018-04-15 cherry (late ripeness) beginning of flowering Berlin-Dahlem, Berlin
2018-04-21 winter oilseed rape    beginning of flowering Berlin-Dahlem, Berlin
2018-04-23 apple (early ripeness) beginning of flowering Berlin-Dahlem, Berlin
2018-05-03 apple (late ripeness)  beginning of flowering Berlin-Dahlem, Berlin
2018-05-24 black locust           beginning of flowering Berlin-Dahlem, Berlin
2018-08-20 common heather         beginning of flowering Berlin-Dahlem, Berlin
========== ====================== ====================== =====================
.. note::

    Using the example snippet provided above, the program rendered a table in
    `reStructuredText`_ format using ``--format=rst``. In order to render
    tables in `Markdown`_ format, use ``--format=md``. For more tabular output
    formats, use ``--format=tabular:foo``, and consult the documentation of
    the `tabulate`_ package for choices of ``foo``.

Usage
=====

Introduction
------------

For most acquisition tasks, you will have to select one of DWD's two
datasets, annual-reporters_ or immediate-reporters_. Further, the data
partition has to be selected: it is either ``recent`` or ``historical``.
Currently, as of 2023, the historical datasets extend from the past until
2021; all subsequent observations are stored within the ``recent`` dataset
partition.
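As a sketch of that rule, a small helper can derive which partitions a query
for given years needs to touch. This helper is hypothetical and not part of
the phenodata API; the cut-off year 2021 reflects the state described above,
as of 2023.

```python
def partitions_for_years(years, historical_until=2021):
    """Return the sorted set of DWD data partitions covering the given years.

    ``historical_until`` is the last year contained in the "historical"
    partition; later years live in the "recent" partition.
    """
    partitions = {
        "historical" if year <= historical_until else "recent"
        for year in years
    }
    return sorted(partitions)

# A query spanning the cut-off needs both partitions.
print(partitions_for_years([2019, 2023]))  # ['historical', 'recent']
```
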
DWD publishes data in files separated by species; each plant's data will be
in a different file. By default, phenodata acquires data for all species
(plants), in order to be able to respond to all kinds of queries across the
whole dataset.
If you are only interested in a limited set of species (plants), you can
improve data acquisition performance by using the ``--filename`` option to
select specific files for retrieval.

For example, when using ``--filename=Hasel,Schneegloeckchen``, only file
names containing ``Hasel`` or ``Schneegloeckchen`` will be retrieved, thus
minimizing the effort needed to acquire the data.
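The matching is a plain substring test against the file names. The following
standalone sketch illustrates that selection logic; the file names are made
up for illustration and merely resemble DWD's naming scheme.

```python
def select_filenames(available, patterns):
    """Select file names containing any of the given substring patterns,
    mirroring the behaviour of the ``--filename`` option."""
    return [
        name for name in available
        if any(pattern in name for pattern in patterns)
    ]

# Hypothetical file names, loosely resembling DWD's naming scheme.
available = [
    "PH_Sofortmelder_Wildwachsende_Pflanze_Hasel_akt.txt",
    "PH_Sofortmelder_Wildwachsende_Pflanze_Schneegloeckchen_akt.txt",
    "PH_Sofortmelder_Landwirtschaft_Kulturpflanze_Mais_akt.txt",
]

# Only the hazel and snowdrop files are selected.
print(select_filenames(available, ["Hasel", "Schneegloeckchen"]))
```
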

Install
-------

To install the software from PyPI, invoke::

    pip install 'phenodata[sql]' --upgrade
.. note::

    Please refer to the `virtualenv`_ page for best-practice recommendations
    about installing the software separately from your system environment.

Library use
-----------

This snippet demonstrates how to use phenodata as a library within individual
programs. For ready-to-run code examples, please have a look into the
examples_ directory.
.. hidden
.. code-block:: python

    >>> import os
    >>> import pytest
    >>> if "GITHUB_ACTION" in os.environ:
    ...     pytest.skip("pytest-doctest-ellipsis-markers does not work on CI/GHA. Works on macOS though.", allow_module_level=True)
.. code-block:: python

    >>> import pandas as pd
    >>> from phenodata.ftp import FTPSession
    >>> from phenodata.dwd.cdc import DwdCdcClient
    >>> from phenodata.dwd.pheno import DwdPhenoDataClient

    >>> cdc_client = DwdCdcClient(ftp=FTPSession())
    >>> client = DwdPhenoDataClient(cdc=cdc_client, dataset="immediate")

    >>> options = {
    ...     # Select data partition.
    ...     "partition": "recent",
    ...
    ...     # Filter by file names and years.
    ...     "filename": ["Hasel", "Raps", "Mais"],
    ...     "year": [2018, 2019, 2020],
    ...
    ...     # Filter by station identifier.
    ...     "station-id": [13346],
    ... }

    >>> observations: pd.DataFrame = client.get_observations(options)

    >>> observations.info()
    [...]

    >>> observations
    [...]

Command-line use
----------------

This section gives you an idea about how to use the phenodata program on
the command-line.
::

    $ phenodata --help

    Usage:
      phenodata info
      phenodata list-species --source=dwd [--format=csv]
      phenodata list-phases --source=dwd [--format=csv]
      phenodata list-stations --source=dwd --dataset=immediate [--all] [--filter=berlin] [--sort=Stationsname] [--format=csv]
      phenodata nearest-station --source=dwd --dataset=immediate --latitude=52.520007 --longitude=13.404954 [--format=csv]
      phenodata nearest-stations --source=dwd --dataset=immediate --latitude=52.520007 --longitude=13.404954 [--all] [--limit=10] [--format=csv]
      phenodata list-quality-levels --source=dwd [--format=csv]
      phenodata list-quality-bytes --source=dwd [--format=csv]
      phenodata list-filenames --source=dwd --dataset=immediate --partition=recent [--filename=Hasel,Schneegloeckchen] [--year=2017]
      phenodata list-urls --source=dwd --dataset=immediate --partition=recent [--filename=Hasel,Schneegloeckchen] [--year=2017]
      phenodata (observations|forecast) --source=dwd --dataset=immediate --partition=recent [--filename=Hasel,Schneegloeckchen] [--station-id=164,717] [--species-id=113,127] [--phase-id=5] [--quality-level=10] [--quality-byte=1,2,3] [--station=berlin,brandenburg] [--species=hazel,snowdrop] [--species-preset=mellifera-de-primary] [--phase=flowering] [--quality=ROUTKLI] [--year=2017] [--forecast-year=2021] [--humanize] [--show-ids] [--language=german] [--long-station] [--sort=Datum] [--sql=sql] [--format=csv] [--verbose]
      phenodata drop-cache --source=dwd
      phenodata --version
      phenodata (-h | --help)

    Data acquisition options:
      --source=<source>         Data source. Currently, only "dwd" is a valid identifier.
      --dataset=<dataset>       Data set. Use "immediate" or "annual" for "--source=dwd".
      --partition=<partition>   Data partition. Use "recent" or "historical".