SkillAgentSearch skills...

Pyexcel

Single API for reading, manipulating and writing data in csv, ods, xls, xlsx and xlsm files

Install / Use

/learn @pyexcel/Pyexcel
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

================================================================================ pyexcel - Let you focus on data, instead of file formats

.. image:: https://raw.githubusercontent.com/pyexcel/pyexcel.github.io/master/images/patreon.png :target: https://www.patreon.com/chfw

.. image:: https://codecov.io/gh/pyexcel/pyexcel/branch/master/graph/badge.svg :target: https://codecov.io/gh/pyexcel/pyexcel

.. image:: https://badge.fury.io/py/pyexcel.svg :target: https://pypi.org/project/pyexcel

.. image:: https://anaconda.org/conda-forge/pyexcel/badges/version.svg :target: https://anaconda.org/conda-forge/pyexcel

.. image:: https://pepy.tech/badge/pyexcel/month :target: https://pepy.tech/project/pyexcel

.. image:: https://anaconda.org/conda-forge/pyexcel/badges/downloads.svg :target: https://anaconda.org/conda-forge/pyexcel

.. image:: https://img.shields.io/static/v1?label=continuous%20templating&message=%E6%A8%A1%E7%89%88%E6%9B%B4%E6%96%B0&color=blue&style=flat-square :target: https://moban.readthedocs.io/en/latest/#at-scale-continous-templating-for-open-source-projects

.. image:: https://img.shields.io/static/v1?label=coding%20style&message=black&color=black&style=flat-square :target: https://github.com/psf/black .. image:: https://readthedocs.org/projects/pyexcel/badge/?version=latest :target: http://pyexcel.readthedocs.org/en/latest/

Support the project

If your company uses pyexcel and its components in a revenue-generating product, please consider supporting the project on GitHub or Patreon <https://www.patreon.com/bePatron?u=5537627>_. Your financial support will enable me to dedicate more time to coding, improving documentation, and creating engaging content.

Known constraints

Fonts, colors and charts are not supported.

Nor to read password protected xls, xlsx and ods files.

Introduction

Feature Highlights

.. table:: A list of supported file formats

============ =======================================================
file format  definition
============ =======================================================
csv          comma separated values
tsv          tab separated values
csvz         a zip file that contains one or many csv files
tsvz         a zip file that contains one or many tsv files
xls          a spreadsheet file format created by
             MS-Excel 97-2003 
xlsx         MS-Excel Extensions to the Office Open XML
             SpreadsheetML File Format.
xlsm         an MS-Excel Macro-Enabled Workbook file
ods          open document spreadsheet
fods         flat open document spreadsheet
json         java script object notation
html         html table of the data structure
simple       simple presentation
rst          rStructured Text presentation of the data
mediawiki    media wiki table
============ =======================================================

.. image:: https://github.com/pyexcel/pyexcel/raw/dev/docs/source/_static/images/architecture.svg

  1. One application programming interface(API) to handle multiple data sources:

    • physical file
    • memory file
    • SQLAlchemy table
    • Django Model
    • Python data structures: dictionary, records and array
  2. One API to read and write data in various excel file formats.

  3. For large data sets, data streaming are supported. A genenerator can be returned to you. Checkout iget_records, iget_array, isave_as and isave_book_as.

Installation

You can install pyexcel via pip:

.. code-block:: bash

$ pip install pyexcel

or clone it and install it:

.. code-block:: bash

$ git clone https://github.com/pyexcel/pyexcel.git
$ cd pyexcel
$ python setup.py install

One liners

This section shows you how to get data from your excel files and how to export data to excel files in one line

Read from the excel files

Get a list of dictionaries


Suppose you want to process History of Classical Music <https://www.naxos.com/education/brief_history.asp>_:

History of Classical Music:

=============== ============= ==================================== Name Period Representative Composers Medieval c.1150-c.1400 Machaut, Landini Renaissance c.1400-c.1600 Gibbons, Frescobaldi Baroque c.1600-c.1750 JS Bach, Vivaldi Classical c.1750-c.1830 Joseph Haydn, Wolfgan Amadeus Mozart Early Romantic c.1830-c.1860 Chopin, Mendelssohn, Schumann, Liszt Late Romantic c.1860-c.1920 Wagner,Verdi Modernist 20th century Sergei Rachmaninoff,Calude Debussy =============== ============= ====================================

Let's get a list of dictionary out from the xls file:

.. code-block:: python

records = p.get_records(file_name="your_file.xls")

And let's check what do we have:

.. code-block:: python

for row in records: ... print(f"{row['Representative Composers']} are from {row['Name']} period ({row['Period']})") Machaut, Landini are from Medieval period (c.1150-c.1400) Gibbons, Frescobaldi are from Renaissance period (c.1400-c.1600) JS Bach, Vivaldi are from Baroque period (c.1600-c.1750) Joseph Haydn, Wolfgan Amadeus Mozart are from Classical period (c.1750-c.1830) Chopin, Mendelssohn, Schumann, Liszt are from Early Romantic period (c.1830-c.1860) Wagner,Verdi are from Late Romantic period (c.1860-c.1920) Sergei Rachmaninoff,Calude Debussy are from Modernist period (20th century)

Get two dimensional array


Instead, what if you have to use pyexcel.get_array to do the same:

.. code-block:: python

for row in p.get_array(file_name="your_file.xls", start_row=1): ... print(f"{row[2]} are from {row[0]} period ({row[1]})") Machaut, Landini are from Medieval period (c.1150-c.1400) Gibbons, Frescobaldi are from Renaissance period (c.1400-c.1600) JS Bach, Vivaldi are from Baroque period (c.1600-c.1750) Joseph Haydn, Wolfgan Amadeus Mozart are from Classical period (c.1750-c.1830) Chopin, Mendelssohn, Schumann, Liszt are from Early Romantic period (c.1830-c.1860) Wagner,Verdi are from Late Romantic period (c.1860-c.1920) Sergei Rachmaninoff,Calude Debussy are from Modernist period (20th century)

where start_row skips the header row.

Get a dictionary


You can get a dictionary too:

.. code-block:: python

my_dict = p.get_dict(file_name="your_file.xls", name_columns_by_row=0)

And let's have a look inside:

.. code-block:: python

from pyexcel._compact import OrderedDict isinstance(my_dict, OrderedDict) True for key, values in my_dict.items(): ... print(key + " : " + ','.join([str(item) for item in values])) Name : Medieval,Renaissance,Baroque,Classical,Early Romantic,Late Romantic,Modernist Period : c.1150-c.1400,c.1400-c.1600,c.1600-c.1750,c.1750-c.1830,c.1830-c.1860,c.1860-c.1920,20th century Representative Composers : Machaut, Landini,Gibbons, Frescobaldi,JS Bach, Vivaldi,Joseph Haydn, Wolfgan Amadeus Mozart,Chopin, Mendelssohn, Schumann, Liszt,Wagner,Verdi,Sergei Rachmaninoff,Calude Debussy

Please note that my_dict is an OrderedDict.

Get a dictionary of two dimensional array


Suppose you have a multiple sheet book as the following:

Top Violinist:

================= ========= ================ Name Period Nationality Antonio Vivaldi 1678-1741 Italian Niccolo Paganini 1782-1840 Italian Pablo de Sarasate 1852-1904 Spainish Eugene Ysaye 1858-1931 Belgian Fritz Kreisler 1875-1962 Astria-American Jascha Heifetz 1901-1987 Russian-American David Oistrakh 1908-1974 Russian Yehundi Menuhin 1916-1999 American Itzhak Perlman 1945- Israeli-American Hilary Hahn 1979- American ================= ========= ================

Noteable Violin Makers:

====================== ========= ================ Maker Period Country Antonio Stradivari 1644-1737 Cremona, Italy Giovanni Paolo Maggini 1580-1630 Botticino, Italy Amati Family 1500-1740 Cremona, Italy Guarneri Family 1626-1744 Cremona, Italy Rugeri Family 1628-1719 Cremona, Italy Carlo Bergonzi 1683-1747 Cremona, Italy Jacob Stainer 1617-1683 Austria ====================== ========= ================

Most Expensive Violins:

===================== =============== =================================== Name Estimated Value Location Messiah Stradivarious $ 20,000,000 Ashmolean Museum in Oxford, England Vieuxtemps Guarneri $ 16,000,000 On loan to Anne Akiko Meyers Lady Blunt $ 15,900,000 Anonymous bidder ===================== =============== ===================================

Here is the code to obtain those sheets as a single dictionary:

.. code-block:: python

book_dict = p.get_book_dict(file_name="book.xls")

And check:

.. code-block:: python

isinstance(book_dict, OrderedDict) True import json for key, item in book_dict.items(): ... print(json.dumps({key: item})) {"Most Expensive Violins": [["Nam

Related Skills

View on GitHub
GitHub Stars1.3k
CategoryContent
Updated1mo ago
Forks167

Languages

Python

Security Score

85/100

Audited on Feb 26, 2026

No findings