StanSample.jl
WIP: Wrapper package for the sample method in Stan's cmdstan executable.
StanSample v7.10
Note
After many years I have decided to step away from my work with Stan and Julia. My plan is to be around until the end of 2024 for support if someone decides to step in and take over further development and maintenance work.
At the end of 2024 I'll archive the different packages and projects included in the Github organisations StanJulia, StatisticalRethingJulia and RegressionAndOtherStoriesJulia if no one is interested (and time-wise able!) to take on this work.
I have thoroughly enjoyed working on both Julia and Stan and seeing both projects mature during the last 15 or so years. And I will always be grateful to the many folks who have helped me on numerous occasions. Both the Julia and the Stan communities are awesome to work with! Thanks a lot!
Purpose
StanSample.jl wraps cmdstan's sample method to generate draws from a Stan Language Program. It is the primary workhorse in the StanJulia ecosystem.
StanSample.jl v7.8.0 supports the new save_metric and save_cmdstan_config command keywords.
StanSample.jl v7.6 supports recent enhancements to the Stan Language visible in the output files (.csv files). It supports arrays, tuples and complex values in output_format=:nesteddataframe.
StanSample.jl v7 supports InferenceObjects.jl as a package extension. Use inferencedata(model) to create an InferenceData object. See also note 1 below. An example Pluto notebook can be found here
Notes
- Use of both InferenceObjects.jl and the read_samples() output_format options :dimarray and :dimarrays (based on DimensionalData.jl) creates a conflict. Hence these output_format options are no longer included. See the example Pluto notebook test_dimarray.jl in StanExampleNotebooks.jl for an example of how to still use that option. At some point InferenceObjects.jl might provide an alternative way to create a stacked DataFrame and/or DimensionalData object.
- I've removed BridgeStan.jl from StanSample.jl. Two example Pluto notebooks, test_bridgestan.jl and bridgestan_stansample_example.jl in StanExampleNotebooks.jl, demonstrate how BridgeStan can be used.
Prerequisites
You need a working installation of Stan's cmdstan. Specify its path in either the CMDSTAN or the JULIA_CMDSTAN_HOME environment variable, e.g. by including a line like the following in your ~/.julia/config/startup.jl:
# CmdStan setup
ENV["CMDSTAN"] =
expanduser("~/.../cmdstan/") # replace with your path
Or you can define and export CMDSTAN in your .profile, .bashrc, .zshrc, etc.
For more details see this file.
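The environment-variable lookup described above can be sketched in Julia as follows. find_cmdstan_home is a hypothetical helper, not part of StanSample.jl's API, and it assumes CMDSTAN takes precedence over JULIA_CMDSTAN_HOME, which may not match the order StanSample.jl actually uses.

```julia
# Hypothetical helper (not part of StanSample.jl): resolve the cmdstan
# directory from CMDSTAN or JULIA_CMDSTAN_HOME.
# Assumption: CMDSTAN is checked first.
function find_cmdstan_home(env::AbstractDict = ENV)
    for var in ("CMDSTAN", "JULIA_CMDSTAN_HOME")
        path = get(env, var, "")
        # Expand a leading "~" the same way the startup.jl line does.
        isempty(path) || return expanduser(path)
    end
    error("Neither CMDSTAN nor JULIA_CMDSTAN_HOME is set")
end

# Example with an explicit Dict instead of the real ENV:
find_cmdstan_home(Dict("JULIA_CMDSTAN_HOME" => "/opt/cmdstan"))  # "/opt/cmdstan"
```

Passing the environment as an argument keeps the sketch easy to test; calling find_cmdstan_home() with no argument reads the real ENV.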
See example/bernoulli.jl for a basic example. Many more examples and test scripts are available in this package and also in Stan.jl.
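A condensed basic example along those lines is sketched below. It assumes a working cmdstan (see Prerequisites) and the SampleModel / stan_sample / read_samples workflow; the data values are illustrative only.

```julia
using StanSample

# Bernoulli model in current Stan syntax (array[N] declarations).
bernoulli_model = "
data {
  int<lower=1> N;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  real<lower=0, upper=1> theta;
}
model {
  theta ~ beta(1, 1);
  y ~ bernoulli(theta);
}
"

# Illustrative data: 10 Bernoulli trials.
data = Dict("N" => 10, "y" => [0, 1, 0, 1, 0, 0, 0, 0, 0, 1])

# Create the model; by default its files go into a temporary directory.
sm = SampleModel("bernoulli", bernoulli_model)

# Run the default 4 chains on the Julia level (use_cpp_chains=false).
rc = stan_sample(sm; data)

if success(rc)
    # Collect the draws of all chains in a single DataFrame.
    df = read_samples(sm, :dataframe)
    println(first(df, 5))
end
```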
Multi-threading and multi-chaining behavior
From StanSample.jl v6 onwards, two mechanisms for drawing chains in parallel are supported: on the C++ level (using threads) and on the Julia level (by spawning a Julia process for each chain).
The use_cpp_chains keyword argument in the call to stan_sample() determines whether chains are executed on the C++ level or on the Julia level. By default, use_cpp_chains = false.
From cmdstan-2.28.0 onwards it is possible to use C++ threads to run multiple chains by setting use_cpp_chains=true in the call to stan_sample():
rc = stan_sample(sm; use_cpp_chains=true, data=data)  # sm is your SampleModel; add init=..., etc. as needed
To enable multithreading in cmdstan, specify it before building cmdstan, i.e. before running make -j9 build. I typically create a path_to_my_cmdstan_directory/make/local file containing STAN_THREADS=true. You can see an example in the .github/CI.yml script.
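The make/local step can also be scripted from Julia. The sketch below uses a hypothetical helper, enable_stan_threads!, which appends STAN_THREADS=true to <cmdstan>/make/local if it is not already there and, optionally, runs make -j9 build in that directory. It assumes make and a C++ toolchain are available; it is not part of StanSample.jl.

```julia
# Hypothetical helper (not part of StanSample.jl): enable C++ threads in a
# cmdstan checkout by adding STAN_THREADS=true to <cmdstan>/make/local.
function enable_stan_threads!(cmdstan_dir::AbstractString; build::Bool = false, jobs::Int = 9)
    local_file = joinpath(cmdstan_dir, "make", "local")
    mkpath(dirname(local_file))
    existing = isfile(local_file) ? read(local_file, String) : ""
    # Only append the flag if an equivalent line is not already present.
    if !occursin(r"^STAN_THREADS\s*=\s*true"m, existing)
        open(local_file, "a") do io
            # Start on a fresh line if the file does not end with a newline.
            if !isempty(existing) && !endswith(existing, "\n")
                println(io)
            end
            println(io, "STAN_THREADS=true")
        end
    end
    if build
        # Same as running `make -j9 build` (for jobs = 9) inside cmdstan_dir;
        # the build must run after make/local has been written.
        run(Cmd(`make -j$jobs build`; dir = cmdstan_dir))
    end
    return local_file
end

# Example (the path is a placeholder):
# enable_stan_threads!(expanduser("~/cmdstan"); build = true)
```

Because the flag is read at build time, an already-built cmdstan needs to be rebuilt after this change.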
By default, in either case, num_chains=4. See ??stan_sample for all keyword arguments. Internally, num_chains is copied to either num_cpp_chains or num_julia_chains.
Currently I do not recommend using both C++ and Julia level chains. Based on the value of use_cpp_chains (true or false), the stan_sample() method sets either num_cpp_chains=num_chains; num_julia_chains=1 or num_julia_chains=num_chains; num_cpp_chains=1.
This default behavior can be disabled by setting the positional check_num_chains argument in the call to stan_sample() to false.
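The chain bookkeeping described above can be summarized in a small Julia sketch. split_chains is a hypothetical function that mirrors the rules in the text; it is not StanSample.jl's internal code. For readability check_num_chains is modeled as a keyword here (in stan_sample() it is positional), and keeping explicitly supplied counts when the check is disabled is an assumption.

```julia
# Hypothetical sketch of the chain bookkeeping described above; this is not
# StanSample.jl's internal code.
function split_chains(num_chains::Int = 4; use_cpp_chains::Bool = false,
                      check_num_chains::Bool = true,
                      num_cpp_chains::Int = 1, num_julia_chains::Int = 1)
    if check_num_chains
        # Put all chains on exactly one level, as stan_sample() does by default.
        if use_cpp_chains
            return (num_cpp_chains = num_chains, num_julia_chains = 1)
        else
            return (num_cpp_chains = 1, num_julia_chains = num_chains)
        end
    end
    # Assumption: with the check disabled, explicitly given counts are kept.
    return (num_cpp_chains = num_cpp_chains, num_julia_chains = num_julia_chains)
end

split_chains()                        # (num_cpp_chains = 1, num_julia_chains = 4)
split_chains(4; use_cpp_chains=true)  # (num_cpp_chains = 4, num_julia_chains = 1)
```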
Threads on C++ level can be used in multiple ways, e.g. to run separate chains and to speed up certain operations. By default StanSample.jl's SampleModel sets the C++ num_threads to 4.
See the (updated for cmdstan-2.29.0) RedCardsStudy example graphs in Stan.jl and here for more details, in particular with respect to just enabling threads and including TBB or not on Intel, and also some indications of the performance on Apple's M1/ARM processor running natively (not using Rosetta and without Intel's TBB).
In some cases I have seen performance advantages using both Julia threads and C++ threads but too many combined threads certainly doesn't help. Note that if you only want 1000 draws (using 1000 warmup samples for tuning), multiple chains (C++ or Julia) do not help.
Installation
This package is registered. It can be installed with:
pkg> add StanSample
Usage
Use this package like this:
using StanSample
See the docstrings (in particular ??StanSample) for more help.
Versions
Version 7.9-7.10
- Fix by zeyus for cmdstan options
Versions 7.5-7.8
- Switching to cmdstan v2.35.0
- Support for the new command keywords save_metric and save_cmdstan_config
- Support for Stan .csv file extensions in output format :nesteddataframe.
Version 7.1-7.4
- Switch to cmdstan-2.32.0 for testing
- Removed BridgeStan extension
Version 7.0.1
- Updated column types for sample_stats (NamedTuples and DataFrames)
Version 7.0.0
- InferenceObjects.jl support.
- Conditional support for BridgeStan.
- Reduced support for :dimarray and :dimarrays option in read_samples().
Version 6.13.8
- Support for InferenceObjects v0.3.
- Many tmp directories created during testing have been removed from the repo.
- Support for BridgeStan v1.0 has been dropped.
Version 6.13.7
- Moved InferenceObjects behind Requires
- Method inferencedata() currently uses inferencedata3()
Version 6.13.6
- Added inferencedata3()
- Added option to enable logging in the terminal (thanks to @FelixNoessler)
Version 6.13.0 - 6.13.5
- Many more (minor and a bit more) updates to inferencedata()
- Updates to BridgeStan (more to be expected soon)
- Fix for chain numbering when using CPP threads (thanks to @apinter)
- Switched to use cmdstan-2.32.0 for testing
- Updates to Examples_Notebooks (in particular now using both inferencedata() and inferencedata2())
- Dropped support for read_samples(m, :dimarray) as this conflicted with InferenceData
Version 6.12.0
- Added experimental version of inferencedata(). See example in ./test/test_inferencedata.jl
- Added InferenceObjects.jl as a dependency
- Dropped MonteCarloMeasurements.jl as a dependency (still supported using Requires)
- Dropped MCMCChains.jl as a dependency (still supported using Requires)
- Dropped AxisKeys.jl as a dependency
Version 6.11.5
- Add sig_figs field to SampleModel (thanks to Andrew Radcliffe).
This change enables the user to control the number of significant digits which are preserved in the output. sig_figs=6 is the default cmdstan option, which is what StanSample has been defaulting to.
Typically, a user should prefer to generate outputs with sig_figs=18 so that the f64's are uniquely identified. It might be wise to make such a recommendation in the documentation, but I suppose that casual users would complain about the correspondingly increased .csv sizes (and subsequent read times).
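Why 6 significant digits can lose information is easy to check in Julia with the Printf standard library. The sketch below formats a Float64 at 6 and at 17 significant digits, using C-style %g formatting as a stand-in for cmdstan's CSV writer. 17 digits is the minimum that always round-trips an IEEE 754 double, so the suggested sig_figs=18 is also safe. The sketch does not call cmdstan.

```julia
using Printf

x = 0.1 + 0.2                     # stored as 0.30000000000000004
six = @sprintf("%.6g", x)         # like sig_figs=6: "0.3"
seventeen = @sprintf("%.17g", x)  # 17 digits: "0.30000000000000004"

parse(Float64, six) == x          # false: the written value lost information
parse(Float64, seventeen) == x    # true: the value round-trips exactly
```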
Version 6.11.4
- Dropped conversion to Symbols in read_csv_files() if internals are requested (include_internals=true)
- Added InferenceObjects as a dependency.
This is part of the work with Seth Axen to enable working with InferenceData objects in a future release (probably v6.12).
Version 6.11.1
- Fix bridge_path in SampleModel.
Version 6.11.0
- Support for BridgeStan as a dependency of StanSample.jl (Thanks to Seth Axen)
Version 6.10.0
- Support for the updated version of BridgeStan.
Version 6.9.3
- A much better test has been added for multidimensional input arrays thanks to Andy Pohl (test/test_JSON).
Version 6.9.2
- More general handling of Array input data to cmdstan if the Array has more than 2 dimensions.
Version 6.9.2
- Experimental support for BridgeStan.
Version 6.9.0-1
- For chains read in as either a :dataframe or a :nesteddataframe, the function matrix(...) has been replaced by array(...). Depending on the eltype o