Baseball

Library to download, analyze, and visualize events in Major League Baseball games

This package fetches and parses event data for Major League Baseball games. Game objects generated via the _from_url methods pull data from MLB endpoints, where events are published within about 30 seconds of occurring. The XML/JSON source data zip file contains event data from MLB games played from 1974 through 2020.

Installing from PyPI

pip3 install baseball

Installing from source

git clone git@github.com:benjamincrom/baseball.git
cd baseball/
python3 setup.py install

Fetch individual MLB game

  • get_game_from_url(date_str, away_code, home_code, game_number)

Fetch an object which contains metadata and events for a single MLB game.

import baseball
game_id, game = baseball.get_game_from_url('2017-11-1', 'HOU', 'LAD', 1)
game_dict = game._asdict()
game_json_str = game.json()

Write scorecard as SVG image:

with open(game_id + '.svg', 'w') as fh:
    fh.write(game.get_svg_str())

Output file: 2017-11-01-HOU-LAD-1.svg
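
Because get_svg_str() and json() both return strings, the two exports can be written side by side. The following is a minimal sketch rather than part of the library: save_game_artifacts and StandInGame are hypothetical names, and the stand-in returns placeholder content only so that the snippet runs without network access. With a real game, pass the game_id and game returned by get_game_from_url instead.

```python
import json
import tempfile
from pathlib import Path


class StandInGame:
    """Hypothetical stand-in for a Game; returns placeholder content only."""

    def get_svg_str(self):
        return '<svg xmlns="http://www.w3.org/2000/svg"></svg>'

    def json(self):
        return json.dumps({"placeholder": True})


def save_game_artifacts(game_id, game, out_dir):
    """Write <game_id>.svg and <game_id>.json into out_dir; return both paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    svg_path = out_dir / (game_id + ".svg")
    json_path = out_dir / (game_id + ".json")
    svg_path.write_text(game.get_svg_str(), encoding="utf-8")
    json_path.write_text(game.json(), encoding="utf-8")
    return svg_path, json_path


# Demonstration with the stand-in object and a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_svg, demo_json = save_game_artifacts(
        "2017-11-01-HOU-LAD-1", StandInGame(), tmp_dir)
    print(demo_svg.name, demo_json.name)
```

Keeping both files under the same game_id stem makes it easy to pair a scorecard image with its underlying data later.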

Game Class Structure

Game

  • away_batter_box_score_dict
  • away_pitcher_box_score_dict
  • away_team (Team)
  • away_team_stats
  • start_datetime
  • expected_start_datetime
  • game_date_str
  • home_batter_box_score_dict
  • home_pitcher_box_score_dict
  • home_team (Team)
  • home_team_stats
  • inning_list (Inning list)
  • end_datetime
  • location
  • attendance
  • weather
  • temp
  • timezone_str
  • is_postponed
  • is_suspended
  • is_doubleheader
  • is_today
  • get_svg_str()
  • json()
  • _asdict()

Team

  • abbreviation
  • batting_order_list_list (list of nine PlayerAppearance lists)
  • name
  • pitcher_list (PlayerAppearance list)
  • player_id_dict
  • player_last_name_dict
  • player_name_dict
  • _asdict()

Inning

  • bottom_half_appearance_list (PlateAppearance list)
  • bottom_half_inning_stats
  • top_half_appearance_list (PlateAppearance list)
  • top_half_inning_stats
  • _asdict()

PlateAppearance

  • start_datetime
  • end_datetime
  • batter (Player)
  • batting_team (Team)
  • error_str
  • event_list (list of Pitch, Pickoff, RunnerAdvance, Substitution, Switch objects)
  • got_on_base
  • hit_location
  • inning_outs
  • out_runners_list (Player list)
  • pitcher (Player)
  • plate_appearance_description
  • plate_appearance_summary
  • runners_batted_in_list (Player list)
  • scorecard_summary
  • scoring_runners_list (Player list)
  • _asdict()

Player

  • era
  • first_name
  • last_name
  • mlb_id
  • number
  • obp
  • slg
  • _asdict()

PlayerAppearance

  • start_inning_batter_num
  • start_inning_half
  • start_inning_num
  • end_inning_batter_num
  • end_inning_half
  • end_inning_num
  • pitcher_credit_code
  • player_obj (Player)
  • position
  • _asdict()

Pitch

  • pitch_datetime
  • pitch_description
  • pitch_position
  • pitch_speed
  • pitch_type
  • _asdict()

Pickoff

  • pickoff_description
  • pickoff_base
  • pickoff_was_successful
  • _asdict()

RunnerAdvance

  • runner_advance_datetime
  • run_description
  • runner (Player)
  • start_base
  • end_base
  • runner_scored
  • run_earned
  • is_rbi
  • _asdict()

Substitution

  • substitution_datetime
  • incoming_player (Player)
  • outgoing_player (Player)
  • batting_order
  • position
  • _asdict()

Switch

  • switch_datetime
  • player (Player)
  • old_position_num
  • new_position_num
  • new_batting_order
  • _asdict()
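
The nesting above (Game, then Inning, then PlateAppearance, then events) can be walked with a small generator. The sketch below uses hypothetical namedtuple stand-ins that carry only a few of the attribute names listed above, so it runs without the package installed; with real data, the isinstance check would be against baseball.Pitch and the pitcher would be a Player object rather than a plain string. The `or []` guard assumes a bottom half can be empty or missing, as the season example in this README also does.

```python
from collections import Counter, namedtuple

# Hypothetical stand-ins: a subset of the attributes listed in the README.
Pitch = namedtuple("Pitch", "pitch_description pitch_speed pitch_type")
Pickoff = namedtuple(
    "Pickoff", "pickoff_description pickoff_base pickoff_was_successful")
PlateAppearance = namedtuple("PlateAppearance", "pitcher event_list")
Inning = namedtuple(
    "Inning", "top_half_appearance_list bottom_half_appearance_list")


def iter_pitches(inning_list):
    """Yield (pitcher, pitch) for every Pitch event in both half-innings."""
    for inning in inning_list:
        halves = (inning.top_half_appearance_list,
                  inning.bottom_half_appearance_list or [])
        for appearance_list in halves:
            for appearance in appearance_list:
                for event in appearance.event_list:
                    # Skip pickoffs, substitutions and other non-pitch events.
                    if isinstance(event, Pitch):
                        yield appearance.pitcher, event


# Placeholder data; the two pitches reuse values from the sample table.
innings = [
    Inning(
        top_half_appearance_list=[
            PlateAppearance(
                pitcher="21 Yu Darvish",
                event_list=[
                    Pitch("Ball", 96.0, "FF"),
                    Pickoff("placeholder pickoff", "placeholder base", False),
                    Pitch("Called Strike", 83.9, "FC"),
                ],
            )
        ],
        bottom_half_appearance_list=None,
    )
]

pitch_counts = Counter(pitcher for pitcher, _ in iter_pitches(innings))
print(pitch_counts)
```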

Analyze a game: 2017 World Series - Game 7

import matplotlib
import matplotlib.pyplot as plt
import pandas as pd

import baseball

# IPython/Jupyter magic; omit this line when running as a plain Python script
%matplotlib inline

game_id, game = baseball.get_game_from_url('11-1-2017', 'HOU', 'LAD', 1)

pitch_tuple_list = []
for inning in game.inning_list:
    for appearance in inning.top_half_appearance_list:
        for event in appearance.event_list:
            if isinstance(event, baseball.Pitch):
                pitch_tuple_list.append(
                    (str(appearance.pitcher), 
                     event.pitch_description,
                     event.pitch_position,
                     event.pitch_speed,
                     event.pitch_type)
                )

data = pd.DataFrame(data=pitch_tuple_list, columns=['Pitcher', 'Pitch Description', 'Pitch Coordinate', 'Pitch Speed', 'Pitch Type'])
data.head()
   Pitcher        Pitch Description  Pitch Coordinate   Pitch Speed  Pitch Type
0  21 Yu Darvish  Ball               (155.47, 160.83)   96.0         FF
1  21 Yu Darvish  Called Strike      (107.0, 171.09)    83.9         FC
2  21 Yu Darvish  In play, no out    (115.36, 183.1)    83.9         SL
3  21 Yu Darvish  In play, run(s)    (80.06, 168.03)    96.6         FF
4  21 Yu Darvish  Ball               (54.1, 216.52)     84.6         SL
data['Pitcher'].value_counts().plot.bar()

for pitcher in data['Pitcher'].unique():
    plt.ylim(0, 125)
    plt.xlim(0, 250)
    bx = [250 - x[2][0] for x in pitch_tuple_list if x[0] == pitcher if 'Ball' in x[1]]
    by = [250 - x[2][1] for x in pitch_tuple_list if x[0] == pitcher if 'Ball' in x[1]]
    cx = [250 - x[2][0] for x in pitch_tuple_list if x[0] == pitcher if 'Called Strike' in x[1]]
    cy = [250 - x[2][1] for x in pitch_tuple_list if x[0] == pitcher if 'Called Strike' in x[1]]
    ox = [250 - x[2][0] for x in pitch_tuple_list if x[0] == pitcher if ('Ball' not in x[1] and 'Called Strike' not in x[1])]
    oy = [250 - x[2][1] for x in pitch_tuple_list if x[0] == pitcher if ('Ball' not in x[1] and 'Called Strike' not in x[1])]
    b = plt.scatter(bx, by, c='b')
    c = plt.scatter(cx, cy, c='r')
    o = plt.scatter(ox, oy, c='g')

    plt.legend((b, c, o),
               ('Ball', 'Called Strike', 'Other'),
               scatterpoints=1,
               loc='upper right',
               ncol=1,
               fontsize=8)

    plt.title(pitcher)
    plt.show()

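The six list comprehensions in the plotting loop above repeat the same filter three times. One possible restructuring, sketched here with only the standard library, classifies each pitch once and collects the flipped coordinates per bucket. bucket_for and group_pitch_points are hypothetical helper names, not part of the baseball package; the tuples follow the same five-field layout as pitch_tuple_list, and the sample rows are taken from the table shown earlier.

```python
from collections import defaultdict


def bucket_for(description):
    """Apply the same substring tests as the plotting loop."""
    if 'Ball' in description:
        return 'Ball'
    if 'Called Strike' in description:
        return 'Called Strike'
    return 'Other'


def group_pitch_points(pitch_tuples, pitcher, flip=250):
    """Return {bucket: ([x, ...], [y, ...])} for one pitcher.

    Coordinates are mirrored as flip - value, matching the 250 - x
    expressions in the original comprehensions.
    """
    points = defaultdict(lambda: ([], []))
    for name, description, position, _speed, _pitch_type in pitch_tuples:
        if name != pitcher:
            continue
        xs, ys = points[bucket_for(description)]
        xs.append(flip - position[0])
        ys.append(flip - position[1])
    return dict(points)


sample_tuples = [
    ("21 Yu Darvish", "Ball", (155.47, 160.83), 96.0, "FF"),
    ("21 Yu Darvish", "Called Strike", (107.0, 171.09), 83.9, "FC"),
    ("21 Yu Darvish", "In play, no out", (115.36, 183.1), 83.9, "SL"),
]

groups = group_pitch_points(sample_tuples, "21 Yu Darvish")
for bucket, (xs, ys) in groups.items():
    print(bucket, len(xs), len(ys))
```

Each (xs, ys) pair can then be handed to plt.scatter in place of the bx/by, cx/cy, and ox/oy lists.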

plt.axis('equal')
data['Pitch Description'].value_counts().plot(kind='pie', radius=1.5, autopct='%1.0f%%', pctdistance=1.1, labeldistance=1.2)

data.plot.kde()

fig, ax = plt.subplots()
ax.set_xlim(50, 120)
for pitcher in data['Pitcher'].unique():
    s = data[data['Pitcher'] == pitcher]['Pitch Speed']
    s.plot.kde(ax=ax, label=pitcher)

ax.legend()

fig, ax = plt.subplots()
ax.set_xlim(50, 120)
for desc in data['Pitch Type'].unique():
    s = data[data['Pitch Type'] == desc]['Pitch Speed']
    s.plot.kde(ax=ax, label=desc)

ax.legend()

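Density curves show the shape of each speed distribution but are awkward to compare numerically. As a companion, the sketch below computes a count, minimum, mean, and maximum speed per pitch type using only the standard library. speed_summary is a hypothetical helper, and the sample rows are the (Pitch Type, Pitch Speed) pairs from the table shown earlier; with the DataFrame in hand, data.groupby('Pitch Type')['Pitch Speed'].describe() would give a similar view.

```python
from collections import defaultdict
from statistics import mean


def speed_summary(rows):
    """Map pitch type -> (count, min, mean, max) of pitch speed."""
    speeds = defaultdict(list)
    for pitch_type, speed in rows:
        speeds[pitch_type].append(speed)
    return {
        pitch_type: (len(values), min(values), mean(values), max(values))
        for pitch_type, values in speeds.items()
    }


# (Pitch Type, Pitch Speed) pairs from the five sample rows.
sample_rows = [
    ("FF", 96.0),
    ("FC", 83.9),
    ("SL", 83.9),
    ("FF", 96.6),
    ("SL", 84.6),
]

summary = speed_summary(sample_rows)
for pitch_type, stats in sorted(summary.items()):
    print(pitch_type, stats)
```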

fig, ax = plt.subplots(figsize=(15,7))
data.groupby(['Pitcher', 'Pitch Description']).size().unstack().plot.bar(ax=ax)

Analyze a player's season: R.A. Dickey - 2017

game_list_2017 = baseball.get_game_list_from_file_range('1-1-2017', '12-31-2017', '/Users/benjamincrom/repos/livebaseballscorecards-artifacts/baseball_files')

pitch_tuple_list_2 = []
for game_id, game in game_list_2017:
    if game.home_team.name == 'Atlanta Braves' or game.away_team.name == 'Atlanta Braves':
        for inning in game.inning_list:
            for appearance in (inning.top_half_appearance_list +
                               (inning.bottom_half_appearance_list or [])):
                if 'Dickey' in str(appearance.pitcher):
                    for event in appearance.event_list:
                        if isinstance(event, baseball.Pitch):
                            pitch_tuple_list_2.append(
                                (str(appearance.pitcher), 
                                 event.pitch_description,
                                 event.pitch_position,
                                 eve
