GMSEUS

Code repository for creating and maintaining the Ground-Mounted Solar Energy in the United States (GM-SEUS) spatiotemporal dataset of solar arrays and panel-rows using existing datasets, machine learning, and object-based image analysis to enhance existing sources.

<p align="center"> <img width="1008" src = "https://github.com/stidjaco/GMSEUS/blob/main/images/GMSEUS_logo_v1.png"> </p>

A comprehensive ground-mounted solar energy dataset with sub-array design metadata in the United States

Code repository for creating and maintaining the Ground-Mounted Solar Energy in the United States (GM-SEUS) spatiotemporal dataset of solar arrays and panel-rows using existing datasets, machine learning, and object-based image analysis to enhance existing sources. A peer-reviewed article describing the purpose and methods behind GM-SEUS has been accepted in Scientific Data (link not yet live).

Current Version Notes

This is the initial release of GM-SEUS (version 1.0). All input datasets and solar panel-row delineation results are up-to-date through December 11th, 2024. The Zenodo repository for the data can be found here, and the repository for the initial code version can be found here.

Product Description

Overview

Solar energy generating systems are a critical component of net-zero infrastructure, yet comprehensive datasets characterizing systems remain incomplete or not publicly available, particularly at the sub-array level. Leveraging the best freely available existing solar datasets in the US with object-based image analysis and machine learning, we present the Ground-Mounted Solar Energy in the United States (GM-SEUS) dataset, a harmonized, open access, and regularly updated geospatial and temporal repository of solar energy arrays and panel-rows. GM-SEUS v1.0 includes nearly 15,000 commercial- and utility-scale ground-mounted solar photovoltaic and concentrating solar energy arrays (186 GWDC) covering 2,950 km<sup>2</sup> and includes 2.92 million unique solar panel-rows (466 km<sup>2</sup>) within those arrays. We use these newly compiled and delineated solar panel-rows to harmonize and independently estimate several value-added attributes to existing datasets, enhancing consistency across spatiotemporal attributes. Value-added attributes include installation year, azimuth, mount technology, panel-row area and dimensions, inter-row spacing, ground cover ratio, tilt, and installed capacity. By estimating and harmonizing these spatial and temporal attributes of the distributed US solar energy landscape, GM-SEUS supports diverse applications in renewable energy modeling, ecosystem service assessment, and infrastructural planning.


Approach

GM-SEUS is both a harmonization of existing solar energy array data in the US and a new product of solar panel-row spatiotemporal information, providing new insights on previously under-reported metadata attributes. We used a combination of machine learning and geographic object-based image analysis, often referred to as GEOBIA or OBIA. Importantly, this new dataset is publicly available, with code available here and the associated Zenodo Repository containing all final products of GM-SEUS v1.0 and locations for source datasets.

We defined a solar array spatial footprint as adjacent, existing, and connected solar panel-rows (PV or CSP) of the same installation year, together with the row-spacing between them. A panel-row is defined as a spatially unique collection of one or more panel-assemblies connected by proximity and often sharing one mount, but not necessarily electrically connected. Datasets with existing solar array boundaries in the United States are the USPVDB, TZ-SAM, OpenStreetMap, and two regional datasets in California’s Central Valley and the Chesapeake Bay area. Datasets containing value-added attributes and point-locations included the NREL Agrivoltaic Map from the InSPIRE initiative, the LBNL Utility-Scale Solar, 2024 Edition Report, the IEA and NREL SolarPACES initiative, Global Energy Monitor’s Global Solar Power Tracker, and the World Resources Institute's Global Power Plant Database.

We removed repeat geometries in order of spatial quality in relation to our definition of an array, and georectified existing point-location sources within 190 m of existing array shapes. For points without a georectified array boundary, we manually annotated new array boundaries or rectified existing boundaries outside 190 m. Finally, rooftop solar arrays were removed by intersection with the Global Google-Microsoft Open Buildings Dataset (2018). The conceptual hierarchy of system boundaries and logic behind mount classification are shown below.
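As a rough illustration of the 190 m matching step, the sketch below assigns each point-location record to the nearest array polygon within that distance. It is not the repository's implementation: the function names, the dictionary inputs, and the assumption that coordinates are already in a projected, metre-based system are all hypothetical.

```python
import math

MAX_MATCH_DISTANCE_M = 190.0  # threshold stated in the text above


def point_segment_distance(p, a, b):
    """Distance from point p to segment a-b; all are (x, y) tuples in metres."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment (repeated vertex)
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_in_polygon(p, ring):
    """Ray-casting test; ring is a list of (x, y) vertices."""
    px, py = p
    inside = False
    for (ax, ay), (bx, by) in zip(ring, ring[1:] + ring[:1]):
        if (ay > py) != (by > py):
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside


def distance_to_polygon(p, ring):
    """Zero if p lies inside the polygon, else distance to its boundary."""
    if point_in_polygon(p, ring):
        return 0.0
    edges = zip(ring, ring[1:] + ring[:1])
    return min(point_segment_distance(p, a, b) for a, b in edges)


def match_points_to_arrays(points, arrays, max_distance=MAX_MATCH_DISTANCE_M):
    """Map each point id to the nearest array id within max_distance, else None."""
    matches = {}
    for point_id, xy in points.items():
        best_id, best_dist = None, max_distance
        for array_id, ring in arrays.items():
            d = distance_to_polygon(xy, ring)
            if d <= best_dist:
                best_id, best_dist = array_id, d
        matches[point_id] = best_id
    return matches


# Hypothetical inputs: one 100 m x 100 m array and three point records.
arrays = {"A": [(0, 0), (100, 0), (100, 100), (0, 100)]}
points = {"p_inside": (50, 50), "p_near": (250, 50), "p_far": (400, 50)}
matches = match_points_to_arrays(points, arrays)
print(matches)
```

Points left unmatched (`None`) correspond to the records that, per the text, required manual annotation or rectification.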

[Figure: conceptual hierarchy of system boundaries and mount classification logic]

The above image shows the conceptual hierarchy of system boundaries for solar infrastructure and the logic behind solar panel-row metadata, both critical for understanding this dataset and approach. Green boundaries indicate the conceptual boundary for each term. This study reports the geospatial and temporal characteristics of panel-rows and arrays. A panel-row is a spatially unique collection of one or more panel-assemblies connected by proximity and often sharing one mount, but not necessarily electrically connected. An array is composed of one or more adjacent rows of the same installation year, and the row-spacing between them. The cell, panel, assembly, and project are not the system boundaries focused on in this study. The ratio of the long-edge to the short-edge is the L/W ratio. Azimuth is initially defined as the primary cardinal direction of the short-edge vector (face of the panel-row) in the minimum bounding rectangle, expressed in south-facing angles given that all solar arrays were in the northern hemisphere.
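The L/W ratio and initial azimuth described above can be sketched as follows. This is one possible reading of the definition, not the repository's code: it assumes the minimum bounding rectangle has already been computed and is given as four ordered corners in projected coordinates (x east, y north), and the function name `panel_row_orientation` is hypothetical. The short-edge bearing is folded into the south-facing half of the compass (90° to 270°).

```python
import math


def panel_row_orientation(rect):
    """Return (lw_ratio, azimuth_deg) for a rectangle given as four (x, y) corners.

    Corners must be listed in order around the rectangle, in projected
    coordinates with x pointing east and y pointing north.
    """
    (x0, y0), (x1, y1), (x2, y2) = rect[0], rect[1], rect[2]
    edge_a = (x1 - x0, y1 - y0)
    edge_b = (x2 - x1, y2 - y1)
    len_a, len_b = math.hypot(*edge_a), math.hypot(*edge_b)
    if len_a <= len_b:
        short, short_len, long_len = edge_a, len_a, len_b
    else:
        short, short_len, long_len = edge_b, len_b, len_a
    if short_len == 0.0:
        raise ValueError("degenerate rectangle: zero-length edge")
    lw_ratio = long_len / short_len
    # Compass bearing of the short edge: 0 = north, 90 = east, clockwise.
    bearing = math.degrees(math.atan2(short[0], short[1])) % 360.0
    # An edge has no inherent direction, so pick the south-facing one.
    if bearing < 90.0 or bearing > 270.0:
        bearing = (bearing + 180.0) % 360.0
    return lw_ratio, bearing


# Hypothetical panel-row elongated east-west: 10 m long, 2 m wide.
east_west_row = [(0, 0), (10, 0), (10, 2), (0, 2)]
lw_ratio, azimuth = panel_row_orientation(east_west_row)
print(lw_ratio, azimuth)  # 5.0 180.0
```

A row elongated north-south yields a short-edge bearing of 90° or 270°, which is ambiguous between east and west; the text calls the azimuth "initially defined", so that ambiguity is presumably resolved by later steps not sketched here.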

Existing solar panel-row datasets were compiled from OpenStreetMap and Stid et al. (2022). To acquire panel-rows within solar array boundaries without existing panel-row information, we used National Agriculture Imagery Program (NAIP) imagery and applied unsupervised object-based image segmentation and supervised machine learning approaches. We classified NAIP imagery using a Random Forest model and five spectral indices with demonstrated utility in classifying solar energy: normalized difference photovoltaic index (NDPVI), normalized blue deviation (NBD), brightness (Br), normalized difference vegetation index (NDVI), and normalized difference water index (NDWI). We trained the model using 2,000 panel-row samples from Stid et al. (2022) and 10,000 landcover validation points from Pengra et al. (2020).
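To make the index step concrete, here is a minimal sketch of two of the listed indices computed from the four NAIP bands (red, green, blue, near-infrared). It uses the common textbook formulas, NDVI = (NIR − Red) / (NIR + Red) and the green/NIR form of NDWI = (Green − NIR) / (Green + NIR); whether GM-SEUS uses exactly these forms is an assumption. NDPVI, NBD, and brightness are defined in the associated article and are deliberately not reproduced here. Function names are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """Normalized difference vegetation index (standard NIR/red form)."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """Normalized difference water index (green/NIR form, assumed here)."""
    return normalized_difference(green, nir)


def pixel_features(red, green, blue, nir):
    """Build a partial feature dictionary for one NAIP pixel.

    Only NDVI and NDWI are computed; blue is accepted so the signature
    mirrors the four NAIP bands, but it is unused in this sketch.
    """
    return {"ndvi": ndvi(nir, red), "ndwi": ndwi(green, nir)}


# Hypothetical digital numbers for a single vegetated pixel.
features = pixel_features(red=50, green=80, blue=60, nir=150)
print(features)
```

Features like these would then be stacked per pixel (or per pixel-cluster) and passed to the classifier.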

Spatial context was incorporated using object-based image analysis methods, including simple non-iterative clustering (SNIC) of each spectral index’s grey-level co-occurrence matrix (GLCM) sum average. We then clustered SNIC values using X-means clustering, and used the Random Forest model to classify pixel-clusters. We also removed low-quality panel-rows using several object-based metrics of geometrical similarity, including minimum (15 m<sup>2</sup>) and maximum (2000 m<sup>2</sup>) panel-row area, perimeter-area-ratio, area-bounding-box, long-edge to short-edge ratios, and compactness, all relative to metric values from existing solar panel-rows. The logic behind panel-row and new array boundary delineation is shown below.

[Figure: logic behind panel-row and new array boundary delineation]
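The geometric quality filter can be illustrated with a short sketch. Only the 15 m<sup>2</sup> and 2000 m<sup>2</sup> area bounds come from the text; the thresholds for the other metrics are not given in this README, so the sketch computes perimeter-area ratio and compactness but filters on area alone. The compactness formula used (4πA / P², sometimes called Polsby-Popper) is one common definition and an assumption here, as are all function names.

```python
import math

MIN_AREA_M2 = 15.0    # minimum panel-row area stated in the text
MAX_AREA_M2 = 2000.0  # maximum panel-row area stated in the text


def polygon_metrics(ring):
    """Compute simple shape metrics for a polygon given as (x, y) vertices in metres."""
    n = len(ring)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        (x0, y0), (x1, y1) = ring[i], ring[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0          # shoelace formula term
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = abs(twice_area) / 2.0
    if area == 0.0 or perimeter == 0.0:
        raise ValueError("degenerate polygon")
    return {
        "area": area,
        "perimeter": perimeter,
        "perimeter_area_ratio": perimeter / area,
        # Assumed definition: 1.0 for a circle, smaller for elongated shapes.
        "compactness": 4.0 * math.pi * area / perimeter ** 2,
    }


def passes_area_filter(ring, min_area=MIN_AREA_M2, max_area=MAX_AREA_M2):
    """Keep a candidate panel-row only if its area lies within the stated bounds."""
    return min_area <= polygon_metrics(ring)["area"] <= max_area


# Hypothetical candidate panel-rows (rectangles, coordinates in metres).
candidates = {
    "ok": [(0, 0), (40, 0), (40, 3), (0, 3)],        # 120 m^2
    "tiny": [(0, 0), (2, 0), (2, 2), (0, 2)],        # 4 m^2
    "huge": [(0, 0), (100, 0), (100, 30), (0, 30)],  # 3000 m^2
}
kept = [name for name, ring in candidates.items() if passes_area_filter(ring)]
print(kept)  # ['ok']
```

In practice the remaining metrics would be compared against the distribution of values observed in existing panel-row datasets, as the text describes.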

Source Datasets

Array Polygon-Level Data

  • United States Solar Photovoltaic Database (USPVDB): Downloaded from USPVDB Portal, Last Download: 10-11-2024 (Up-to-date as of 12-11-2024), Version 2.0
  • California's Central Valley Photovoltaic Dataset (CCVPV) Arrays and Panels: Downloaded from figshare, Last Download: 07-18-2024 (Up-to-date as of 12-11-2024), Version 1.0
  • Chesapeake Watershed Solar Data (CWSD) Arrays: Downloaded from OSFHOME, Last Download: 12-01-2024 (Up-to-date as of 12-11-2024). We downloaded derived polygons as well as manually annotated training polygons, and preferred training polygons over derived polygons for their completeness and quality. No Version details
  • OpenStreetMap Solar Panels and Arrays (OSM): Array and panel objects were downloaded using the osmnx package in script0_getOSMdata.ipynb, Last OSM scrape: 12-11-2024
    • Previously, we used data from Harmonized Global Wind and Solar Farm Locations (HGLOBS), Downloaded from figshare
  • TransitionZero Global Solar Asset Mapper (SAM): Downloaded from TZ-SAM Portal, Last Download: 12-11-2024, Other information: Website, Viewer, SciData Preprint, Version Q3-2024 (Version 2)

Array Point-Level Data

  • NREL Innovative Solar Practices Integrated with Rural Economies and Ecosystems (InSPIRE) Database: Downloaded from InSPIRE Portal, Last Download: 12-11-2024
  • LBNL Utility-Scale Solar (USS), 2024 Edition: Downloaded from LBNL Utility-Scale Solar Portal, Last Downloaded: 11-16-2024 (Up-to-date as of 12-11-2024). Large Excel report; project-level data were copied from the Individual_Project_Data tab of the original .xlsx report to a new .csv
  • NREL PV Data Acquisition (PV-DAQ) Database: Downloaded from PV-DAQ Portal - Available Systems Information, and PVDAQ Data Map, Last Downloaded: 07-23-2024 (Up-to-date as
