ncaahoopR <img src="figures/logo.png" align="right" />
ncaahoopR is an R package for working with NCAA Basketball Play-by-Play Data. It scrapes play-by-play data and returns it to the user in a tidy format, allowing the user to explore the data with assist networks, shot charts, and in-game win-probability charts.
For pre-scraped schedules, rosters, box scores, and play-by-play data, check out the ncaahoopR_data repository.
To see the latest changes in version 1.5, view the change log here.
Installation
You can install ncaahoopR from GitHub with:
# install.packages("devtools")
devtools::install_github("lbenz730/ncaahoopR")
If you encounter installation issues, the following tips have helped a few users successfully install the package:
- If given the option to compile any packages from source rather than installing existing binaries, choose 'No'.
- Windows users with trouble installing the package should try running the following command before reinstalling the package: Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = "true")
- Windows users with trouble installing devtools should try first installing the backports package via install.packages("backports").
Functions
Several functions use ESPN game_ids. You can find the game_id in the URL for the game summary,
as shown in the URL for the summary of the UMBC-Virginia game below.
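As a minimal sketch of pulling a game_id out of such a URL in base R: the URL below is hypothetical, the ID in it is made up, and the assumption that the ID follows "gameId=" or "gameId/" is ours rather than something this README states. The helper extract_game_id is illustrative and not part of ncaahoopR.

```r
# Hypothetical URL for illustration only; the ID is made up.
url <- "https://www.espn.com/mens-college-basketball/game?gameId=123456789"

# Illustrative helper (not part of ncaahoopR): return the run of digits that
# follows "gameId=" or "gameId/", or NA if the URL has no such component.
extract_game_id <- function(url) {
  hit <- regmatches(url, regexpr("(?<=gameId[=/])[0-9]+", url, perl = TRUE))
  if (length(hit) == 0) NA_character_ else hit
}

game_id <- extract_game_id(url)
print(game_id)
```

The result is a character string, which can then be passed to the functions that take a game_id.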

Scraping Data
- get_pbp(team, season): Get an entire season's worth of play-by-play data for a given team. season defaults to the current season, but can be specified in "2019-20" form.
- get_pbp_game(game_ids, extra_parse): Get play-by-play data for a specific vector of ESPN game_ids. extra_parse is a logical indicating whether to link shot variables and attempt possession parsing. Default = TRUE.
- get_roster(team, season): Get a particular team's roster. season defaults to the current season, but can be specified in "2019-20" form.
- get_schedule(team, season): Get a team's schedule. season defaults to the current season, but can be specified in "2019-20" form.
- get_game_ids(team, season): Get a vector of ESPN game_ids for all games involving the team specified. season defaults to the current season, but can be specified in "2019-20" form.
- get_master_schedule(date): Get the schedule of all games for a given date. Use YYYY-MM-DD date formatting.
- get_boxscore(game_id): Returns a list of 2 data frames, one with each team's box score for the game in question. Written by Jared Andrews.
- season_boxscore(team, season = current_season, aggregate = 'average'): Returns (aggregated) player stats over the course of a season for a given team. Contributed in collaboration with Kurt Wirth.
  - team: team to return player stats for.
  - season: of form YYYY-YY. Defaults to the current season.
  - aggregate: one of 'average' (per-game average statistics), 'total' (sums of season stats) or 'raw' (just return all box scores bound together). 'average' is the default.
The team parameter in the above functions must be a valid team name from the ids dataset built into the package. See the Datasets section below for more details.
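The season and date arguments above are plain strings, so they are easy to build programmatically. The following is a rough sketch in base R; season_string is a hypothetical helper (not part of ncaahoopR), and the commented-out calls assume the package is installed and that "Duke" is a valid name in the ids dataset.

```r
# Hypothetical helper: build a "YYYY-YY" season string from the calendar
# year in which the season starts.
season_string <- function(start_year) {
  start_year <- as.integer(start_year)
  sprintf("%d-%02d", start_year, (start_year + 1L) %% 100L)
}

season <- season_string(2019)

# A date in the YYYY-MM-DD form described for get_master_schedule()
game_date <- format(as.Date("2020-02-08"), "%Y-%m-%d")

# With ncaahoopR installed, these values could be passed along, e.g.:
# pbp      <- ncaahoopR::get_pbp("Duke", season)        # "Duke" is an assumed team name
# schedule <- ncaahoopR::get_master_schedule(game_date)
```

The modulo step keeps the two-digit suffix correct across a century boundary, so a season starting in 1999 is written with the suffix 00.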
Win-Probability and Game-Flow Charts
Win Probability Charts
The latest function for plotting win probability charts is wp_chart_new. Following the 2021-22 season, the other win probability chart functions will be deprecated and replaced by this function (it will be renamed to wp_chart, but I don't want to break any existing pipelines during the season). It no longer requires users to input colors. For best results, consider saving via ggsave(filename, height = 9/1.2, width = 16/1.2) (or some other dimensions with a 16/9 aspect ratio).
wp_chart_new(game_id, home_col = NULL, away_col = NULL, include_spread = T, show_legend = T)
- game_id: ESPN game_id for the desired contest.
- home_col: Chart color for home team (if NULL, will default to the ncaa_colors primary_color field).
- away_col: Chart color for away team (if NULL, will default to the ncaa_colors primary_color field).
- include_spread: Logical, whether to include pre-game spread in Win Probability calculations. Default = TRUE.
- show_legend: Logical, whether or not to show legend/text on chart. Default = TRUE.
A prior version of wp_chart used base R, while gg_wp_chart used the ggplot2 plotting library. As of the 2020-21 season, both functions use ggplot2, and gg_wp_chart now simply aliases wp_chart.
wp_chart(game_id, home_col, away_col, include_spread = T, show_legend = T)
- game_id: ESPN game_id for the desired contest.
- home_col: Chart color for home team.
- away_col: Chart color for away team.
- include_spread: Logical, whether to include pre-game spread in Win Probability calculations. Default = TRUE.
- show_legend: Logical, whether or not to show legend/text on chart. Default = TRUE.
gg_wp_chart(game_id, home_col, away_col, show_labels = T)
- game_id: ESPN game_id for the desired contest.
- home_col: Chart color for home team.
- away_col: Chart color for away team.
- include_spread: Logical, whether to include pre-game spread in Win Probability calculations. Default = TRUE.
- show_labels: Logical, whether Game Excitement Index and Minimum Win Probability metrics should be displayed on the plot. Default = TRUE.
Game Flow Charts
game_flow(game_id, home_col, away_col)
- game_id: ESPN game_id for the desired contest.
- home_col: Chart color for home team.
- away_col: Chart color for away team.
Game Excitement Index
game_excitement_index(game_id, include_spread = T)
include_spread: Logical, whether to include pre-game spread in Win Probability calculations. Default =TRUE.
Returns the GEI (Game Excitement Index) for a given ESPN game_id. For more information about how these win-probability charts are fit and how the Game Excitement Index is calculated, check out the links below.
Game Control Measures
average_win_prob(game_id, include_spread = T)
- game_id: ESPN game_id for which to compute time-based average win probability (from perspective of home team).
- include_spread: Logical, whether to include pre-game spread in Win Probability calculations. Default = TRUE.
average_score_diff(game_id)
- game_id: ESPN game_id for which to compute time-based average score differential (from perspective of home team).
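To illustrate what a time-based average means here, the sketch below weights each score differential by the number of seconds it stayed on the scoreboard. This is one plausible reading, not the package's confirmed implementation; the data frame, its column names, and the helper time_weighted_avg are all made up for illustration.

```r
# Toy play-by-play with hypothetical column names and made-up values.
# score_diff is home score minus away score after each event.
pbp <- data.frame(
  secs_remaining = c(2400, 1800, 1200, 600, 0),
  score_diff     = c(0, 4, -2, 6, 3)
)

# Illustrative helper: average of `value`, weighted by how long each value
# was in effect (from its own event until the next event).
time_weighted_avg <- function(secs_remaining, value) {
  n <- length(value)
  duration <- secs_remaining[-n] - secs_remaining[-1]
  # The last row marks the end of the game and has no duration of its own
  sum(value[-n] * duration) / sum(duration)
}

avg_diff <- time_weighted_avg(pbp$secs_remaining, pbp$score_diff)
print(avg_diff)
```

The same weighting would apply to a column of home-team win probabilities in place of score_diff. A positive result indicates that the home team led for most of the game, weighted by margin.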
Assist Networks
Traditional Assist Networks
assist_net(team, season, node_col, three_weights = T, threshold = 0, message = NA, return_stats = T)
- team: the ESPN team name, as listed in the ids data frame.
- season: Options include "2018-19" (for entire season), or a vector of ESPN game IDs.
- node_col: the node color for the graph.
- three_weights (default = TRUE): Logical. If TRUE, assisted three-point shots are given a weight of 1.5. If FALSE, assisted three-point shots are given a weight of 1. In both cases, assisted two-point shots are given a weight of 1.
- threshold (default = 0): Number between 0-1 indicating minimum percentage of team's assisted baskets a player needs to be involved in to be included in network graph.
- message (default = NA): Option for custom message to replace graph title when using a subset of the season (e.g. conference play).
- return_stats (default = TRUE): Return Assist Network-related statistics.
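The three_weights and threshold options can be illustrated on a toy table of assisted baskets. The sketch below is in base R with made-up data and hypothetical helper names (edge_weights, involvement). It follows the parameter descriptions above and is not the package's actual code.

```r
# Made-up assisted baskets: passer, shooter, and whether the shot was a three.
assists <- data.frame(
  passer   = c("A", "A", "B", "C", "A"),
  shooter  = c("B", "C", "A", "A", "B"),
  three_pt = c(TRUE, FALSE, FALSE, TRUE, FALSE)
)

# Illustrative helper: total weight on each passer -> shooter edge.
# Threes count 1.5 when three_weights is TRUE, otherwise every basket counts 1.
edge_weights <- function(assists, three_weights = TRUE) {
  w <- if (three_weights) ifelse(assists$three_pt, 1.5, 1) else rep(1, nrow(assists))
  aggregate(list(weight = w),
            by = list(passer = assists$passer, shooter = assists$shooter),
            FUN = sum)
}

weighted   <- edge_weights(assists, three_weights = TRUE)
unweighted <- edge_weights(assists, three_weights = FALSE)

# Illustrative helper: share of assisted baskets each player is involved in,
# either as passer or as shooter.
involvement <- function(assists) {
  players <- sort(unique(c(assists$passer, assists$shooter)))
  sapply(players, function(p) mean(assists$passer == p | assists$shooter == p))
}

share <- involvement(assists)
keep  <- names(share)[share >= 0.5]  # players kept with threshold = 0.5
print(weighted)
print(keep)
```

With threshold = 0 every player with at least one assisted basket stays in the graph. Raising it trims low-involvement players.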
Circle Assist Networks and Player Highlighting
circle_assist_net(team, season, highlight_player = NA, highlight_color = NA, three_weights = T, threshold = 0, message = NA, return_stats = T)
- team: the ESPN team name, as listed in the ids data frame.
- season: Options include "YYYY-YY" (for entire season), or a vector of ESPN game IDs.
- highlight_player (default = NA): Name of player to highlight in assist network. NA yields full-team assist network with no player highlighting.
- highlight_color (default = NA): Color of player links to be highlighted. NA if highlight_player is NA.
- three_weights (default = TRUE): Logical. If TRUE, assisted three-point shots are given a weight of 1.5. If FALSE, assisted three-point shots are given a weight of 1. In both cases, assisted two-point shots are given a weight of 1.
- threshold (default = 0): Number between 0-1 indicating minimum percentage of team's assisted baskets a player needs to be involved in to be included in network graph.
- message (default = NA): User-supplied plot title to overwrite default plot title, if desired.
- return_stats (default = TRUE): Return Assist Network-related statistics.
Shot Charts
There are currently three functions for scraping and plotting shot location data. These functions are written by Meyappan Subbaiah.
get_shot_locs(game_id): Returns data frame with shot location data when available. Note that if the extra_parse flag in get_pbp_game is set to TRUE, shot location data will already be included in the play-by-play data (if available).
- game_id: ESPN game_id from which shot locations should be scraped.
game_shot_chart(game_id, heatmap = F): Plots shots for a given game.
- game_id: ESPN game_id from which shot locations should be scraped.
- heatmap (default = FALSE): Logical, whether to use density-heat map or plot individual points.
- Shot-plotting colors are derived from the team's primary color listed in the ncaa_colors data frame.
team_shot_chart(game_ids, team, heatmap = F): Plots shots taken by team during a given set of g
