Ensemble Spatiotemporal Mixed Models for Exposure Estimation

sptemExp: an R package for spatiotemporal exposure estimation

Lianfa Li (Email: lspatial@gmail.com)

sptemExp is an ensemble spatiotemporal modeling tool with constrained optimization; it also provides grid mapping of air pollutant concentrations and multi-core parallel computing.

Specifically, it includes the following functionality:

  • Extraction of covariates from satellite images such as GeoTIFF and NC4 rasters (e.g., NDVI, AOD, and meteorological parameters);
  • Generation of temporal basis functions to simulate the seasonal trends in the study regions;
  • Generation of regional monthly or yearly means of air pollutant concentrations;
  • Generation of Thiessen polygons and spatial effect modeling;
  • Ensemble modeling for spatiotemporal mixed models, supporting multi-core parallel computing;
  • Integrated prediction, with or without weights based on the models' performance, supporting multi-core parallel computing;
  • Constrained optimization to interpolate missing values;
  • Generation of grid surfaces of air pollutant concentrations at high resolution;
  • Block kriging for regional mean estimation at multiple scales.

Extraction of the covariates from satellite images

The following code shows how the functions extractVTIF and extractVNC4 obtain the values for the subject locations, with their timeline, from GeoTIFF (left) and NC4 (right) image files.

data("gtifRst","samplepnt")
tvals=sptemExp::extractVTIF(samplepnt,gtifRst)
par(mfrow=c(1,2),mar=c(4,4,1,1))
raster::plot(gtifRst)
raster::plot(samplepnt,add=TRUE)

nc4File=file.path(system.file(package = "sptemExp"), "extdata", "ancdata.nc4")
ncin0=ncdf4::nc_open(nc4File)
extRes=sptemExp::extractVNC4(samplepnt,ncin0,"TLML")
raster::plot(extRes$img)

print("geotiff values:")
## [1] "geotiff values:"
print(tvals[c(1:5)])
## [1] 441 484 268 344 324
print("nc4 values:")
## [1] "nc4 values:"
print(extRes$val[c(1:5)])
## [1] 277.6917 277.4444 277.2130 277.4444 277.2130
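Under the hood, point extraction from a raster is a grid index lookup. A minimal base-R sketch of nearest-cell extraction (an illustration with a toy matrix grid, not the package's implementation; real rasters usually index rows from the top, which is simplified away here):

```r
# Nearest-cell extraction: map point coordinates to matrix indices.
# Toy grid and points; rows run upward from ymin for simplicity.
extract_nearest <- function(grid, xmin, ymin, cellsize, pts) {
  col <- floor((pts$x - xmin) / cellsize) + 1
  row <- floor((pts$y - ymin) / cellsize) + 1
  # Guard against points falling outside the grid extent
  ok <- row >= 1 & row <= nrow(grid) & col >= 1 & col <= ncol(grid)
  vals <- rep(NA_real_, nrow(pts))
  vals[ok] <- grid[cbind(row[ok], col[ok])]
  vals
}

grid <- matrix(1:12, nrow = 3, ncol = 4)              # 3 x 4 toy raster
pts  <- data.frame(x = c(0.5, 3.5), y = c(0.5, 2.5))
extract_nearest(grid, xmin = 0, ymin = 0, cellsize = 1, pts)  # 1 and 12
```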

Generation of the temporal basis function

You can use the function getTBasisFun to extract the temporal basis functions (usually the first two principal components).

data("shdSeries2014")
#Extract the temporal basis functions 
result=sptemExp::getTBasisFun(shdSeries2014,"siteid","date","obs",df=8,n.basis=2)
#Plot the results 
par(mfrow=c(1,2),mar=c(4,4,1,1))
plot(result$date,result$pv1,type="l",xlab="Date",ylab="1st Temporal Basis Function")
plot(result$date,result$pv2,type="l",xlab="Date",ylab="2nd Temporal Basis Function")
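The core idea behind such temporal basis functions can be sketched in base R: arrange the monitoring series as a day-by-site matrix and take the leading principal components over days. This is a simplified illustration with synthetic data, not the package's implementation (getTBasisFun additionally smooths with the given degrees of freedom):

```r
set.seed(1)
# Toy day-by-site matrix: a shared seasonal cycle plus site-level noise
days  <- 1:120
sites <- sapply(1:6, function(s) 10 * sin(2 * pi * days / 120) + rnorm(120))

# The first two principal-component scores over days capture the
# dominant temporal patterns, analogous to pv1/pv2 from getTBasisFun
pc    <- prcomp(sites, center = TRUE, scale. = FALSE)
basis <- pc$x[, 1:2]
dim(basis)   # 120 days x 2 basis functions
```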

Generation of the Thiessen polygons for spatial effect modeling

We can generate Thiessen polygons from the spatial points and a border map; the function tpolygonsByBorder is designed for this.

data("samplepnt","prnside")
x=samplepnt
sidepoly=prnside
par(mar=c(1,1,1,1))
tpoly=sptemExp::tpolygonsByBorder(x,sidepoly)$tpolys
raster::plot(tpoly)
raster::plot(samplepnt,add=TRUE)
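A Thiessen (Voronoi) polygon assigns every location to its nearest monitoring site, which is what makes the region id usable as a spatial random effect later. A minimal base-R sketch of that assignment rule, with hypothetical coordinates (not the package's sample data):

```r
# Assign each grid cell to its nearest site (the Thiessen/Voronoi rule).
sitexy <- cbind(x = c(1, 4, 2), y = c(1, 1, 4))       # three toy sites
cells  <- expand.grid(x = seq(0, 5, by = 1), y = seq(0, 5, by = 1))

nearest_site <- function(px, py, sites) {
  d2 <- (sites[, "x"] - px)^2 + (sites[, "y"] - py)^2  # squared distances
  which.min(d2)
}
rid <- mapply(nearest_site, cells$x, cells$y, MoreArgs = list(sites = sitexy))
table(rid)   # cell counts per Thiessen region
```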

Ensemble modeling for spatiotemporal mixed models, supporting multi-core parallel computing

The base spatiotemporal mixed models are trained on bootstrap datasets with support for multiple cores. The following example illustrates the use of the function parSpModel to train 10 models, which are saved to the specified model path. In practical applications, you can train more models (e.g., 80 or more) for better performance.

#Set the temporary path to save the models and their performance metrics: 
dPath=tempdir()
mPath=paste(dPath,"/models",sep="")
unlink(mPath,recursive = TRUE,force=TRUE)
dir.create(mPath)

#Load the dataset of Shandong PM2.5 
data("trainsample","bnd")

#The formula where sx(rid, bs = "mrf", map =bnd) is spatial random effect.  You can add unstructured item and other items.  See R2BayesX. 
formulaStrs=c(paste('logpm25 ~ sx(rid, bs = "mrf", map =bnd)+sx(monthAv,bs="rw2")+sx(ndvi,bs="rw2") + sx(aod,bs="rw2") ','+sx(wnd_avg,bs="rw2")',sep=""))

trainsample$tid=as.numeric(strftime(trainsample$date, format = "%j"))
trainsample$logpm25=log(trainsample$pm25)
tids=c(91) # the day of year for 2014 

#Train the models using 1 core and construct 10 models: 
sptemExp::parSpModel(trainsample,bnd,formulaStrs,tidF="tid",tids,c=1,nM=10,
           mPath,idF="siteid",dateF="date",obsF="pm25")
## [1] "t:  1"
## [1] "       model:1 of 10"
## [1] "       model:2 of 10"
## [1] "       model:3 of 10"
## [1] "       model:4 of 10"
## [1] "       model:5 of 10"
## [1] "       model:6 of 10"
## [1] "       model:7 of 10"
## [1] "       model:8 of 10"
## [1] "       model:9 of 10"
## [1] "       model:10 of 10"

##           used  (Mb) gc trigger  (Mb) max used  (Mb)
## Ncells 2476026 132.3    3886542 207.6  3886542 207.6
## Vcells 4876409  37.3   13681044 104.4 21376203 163.1
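The ensemble step follows the standard bootstrap-aggregation recipe: resample the training data, fit one base model per resample, and keep each model together with its out-of-bag performance. A generic base-R sketch of that recipe, using lm as a stand-in for the spatiotemporal mixed model (an illustration of the idea, not the package's code):

```r
set.seed(42)
n  <- 200
df <- data.frame(x = runif(n))
df$y <- 2 * df$x + rnorm(n, sd = 0.3)

nM   <- 10                                 # number of base models
fits <- vector("list", nM)
r2   <- numeric(nM)
for (i in seq_len(nM)) {
  idx <- sample(n, replace = TRUE)         # bootstrap resample
  oob <- setdiff(seq_len(n), idx)          # out-of-bag rows for validation
  fits[[i]] <- lm(y ~ x, data = df[idx, ])
  pred  <- predict(fits[[i]], df[oob, ])
  r2[i] <- 1 - sum((df$y[oob] - pred)^2) /
               sum((df$y[oob] - mean(df$y[oob]))^2)
}
round(r2, 3)   # per-model out-of-bag R-squared, as in the metrics CSV
```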

Check the performance (R-squared and RMSE) of every model, as well as the total performance.

#Check the performance of every model
mfile=paste(dPath,"/models/t_",tids[1],"_metrics.csv",sep="")
models_metrics=read.csv(mfile,row.names = NULL)
knitr::kable(models_metrics,caption="Performance of every model by bootstrap aggregation")

| imodel |         r2 |      rmse |
|-------:|-----------:|----------:|
|      1 |  0.6819686 |  72.36493 |
|      2 |  0.6061918 |  91.02627 |
|      3 | -1.3405669 | 104.66065 |
|      4 |  0.7514786 |  74.78705 |
|      5 |  0.5182634 |  78.67102 |
|      6 |  0.7625737 |  76.76758 |
|      7 |  0.6832513 |  76.58048 |
|      8 |  0.5355187 |  76.97414 |
|      9 |  0.6825552 |  74.83755 |
|     10 |  0.5372067 |  73.95257 |

#Check the total performance  
mtfile=paste(dPath,"/models/t_",tids[1],"_total_metric.csv",sep="")
t_metrics=read.csv(mtfile,row.names = NULL)
knitr::kable(t_metrics,caption="Total performance by bootstrap aggregation")

| tid |        r2 |     rmse |
|----:|----------:|---------:|
|   1 | 0.6673462 | 17.05729 |

Integrate the predictions with or without weights of the model's performance

Use the models trained in the previous step to make predictions for the new dataset covering Shandong Province. Here we use only the first 2,000 cells, but you can use all the cells.

#Set the output path of the predictions: 
prePath=paste(dPath,"/preds",sep="")
dir.create(prePath)

#Load the prediction dataset of covariates 
amodelPath=paste(dPath,"/models/t_",tids[1],"_models",sep="")
data("shd140401pcovs","bnd")
shd140401pcovs_part=shd140401pcovs[c(1:2000),]

#cols lists the field names of the covariates used in the models. Then call parATimePredict to run the predictions in parallel (the argument c is the number of cores)
cols=c("aod","ndvi","wnd_avg","monthAv")
sptemExp::parATimePredict(amodelPath,newPnts=shd140401pcovs_part,cols,bnd=bnd,c=2,prePath,idF="gid",ridF="rid")
##           used  (Mb) gc trigger  (Mb) max used  (Mb)
## Ncells 2484700 132.7    3886542 207.6  3886542 207.6
## Vcells 5222961  39.9   13681044 104.4 21376203 163.1

Get the performance-weighted averages with the function weiA2Ens; you can also get simple (unweighted) averages with the function noweiAvg.

| id   | date |      mean | StandardDev |  Variance |
|:-----|:-----|----------:|------------:|----------:|
| 1997 | 1997 | 101.59554 |    9.172452 |  84.13388 |
| 1996 | 1996 | 105.31300 |   10.413283 | 108.43646 |
| 1991 | 1991 |  98.93103 |    9.055528 |  82.00260 |
| 1988 | 1988 |  99.76707 |    8.554652 |  73.18207 |
| 1987 | 1987 | 108.80448 |   12.720603 | 161.81375 |
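The performance-weighted ensemble mean can be sketched as a weighted average with weights proportional to each model's validation R-squared; this is an assumption for illustration (the package's weiA2Ens may weight differently), with hypothetical predictions:

```r
# Combine per-model predictions for one location into a weighted mean.
preds <- c(101.2, 98.7, 104.5, 99.9)   # hypothetical predictions from 4 models
r2    <- c(0.68,  0.61, 0.75,  0.52)   # their validation R-squared
w     <- r2 / sum(r2)                  # normalize to weights summing to 1
wmean <- sum(w * preds)
wvar  <- sum(w * (preds - wmean)^2)    # weighted ensemble variance
c(mean = wmean, sd = sqrt(wvar))
```

With unit weights (w = 1/4 each) this reduces to the simple average produced by noweiAvg.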

Constrained optimization to interpolate the missing values

Use the constrained-optimization function inter2conOpt to interpolate the values for the missing items.

data("allPre500","shdSeries2014")

#Get the temporal basis functions to be used in constrained optimization 
season_trends=sptemExp::getTBasisFun(shdSeries2014,idStr="siteid",dateStr="date",
                           valStr="obs",df=10,n.basis=2,tbPath=NA)
season_trends$tid=as.numeric(strftime(season_trends$date, format = "%j")) 

#Constrained optimization 
allPre_part_filled=sptemExp::inter2conOpt(tarPDf=allPre500[c(1:6),],pol_season_trends=season_trends,ncore=2)
## [1] "1 of 3"
## [1] "2 of 3"
## [1] "3 of 3"
knitr::kable(allPre_part_filled[c(1:6),],caption="")

|   | d1 | d10 | d20 | d32 | d41 | d51 | d60 | d69 | d79 | d91 | d100 | d110 | d121 | d130 | d140 | d152 | d161 | d182 | d191 | d201 | d213 | d222 | d232 | d244 | d253 | d263 | d274 | d283 | d305 | d314 | d324 | d335 | d344 | d354 |
|---|---:|----:|----:|----:|----:|----:|----:|----:|----:|----:|-----:|-----:|-----:|-----:|-----:|-----:|-----:|-----:|-----:|-----:|-----:|-----:|-----:|-----:|-----:|-----:|-----:|-----:|-----:|-----:|-----:|-----:|-----:|-----:|
| 6 | 122.85424 | 89.42742 | 62.66390 | 45.48924 | 46.27284 | 59.29426 | 76.65560 | 92.13597 | 98.34205 | 86.34609 | 72.97928 | 61.56609 | 54.89888 | 52.80087 | 51.46472 | 48.36195 | 46.35885 | 43.89639 | 41.13629 | 36.09612 | 30.13036 | 28.17422 | 26.31104 | 22.87484 | 21.92161 | 23.46074 | 28.94409 | 35.81034 | 50.41806 | 56.83202 | 61.32039 | 64.26901 | 72.87738 | 94.02896 |
| 7 | 65.19155 | 51.40158 | 39.46674 | 31.30001 | 32.64243 | 41.49334 | 53.07626 | 63.62694 | 68.39269 | 61.54400 | 53.68353 | 47.44825 | 44.38368 | 43.37936 | 42.09418 | 39.16380 | 37.37985 | 36.09319 | 34.56814 | 31.14601 | 26.53710 | 24.88078 | 23.26449 | 20.39421 | … | … | … | … | … | … | … | … | … | … |
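Conceptually, the gaps in each prediction series are filled by fitting the observed days to the temporal basis functions and predicting the missing days from the fitted seasonal trend. A simplified, unconstrained base-R sketch with synthetic data (the package additionally imposes constraints, e.g. keeping concentrations non-negative):

```r
set.seed(7)
days <- 1:60
b1 <- sin(2 * pi * days / 60); b2 <- cos(2 * pi * days / 60)  # toy basis
truth <- 80 + 25 * b1 + 10 * b2
obs <- truth + rnorm(60, sd = 2)
obs[c(10, 25, 40)] <- NA                  # days with missing predictions

known <- !is.na(obs)
fit <- lm(obs ~ b1 + b2, subset = known)  # fit the series to the basis
filled <- obs
filled[!known] <- predict(fit, data.frame(b1 = b1, b2 = b2))[!known]
anyNA(filled)   # FALSE: gaps interpolated from the seasonal trend
```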
