RubyExcel
Spreadsheet storage class in Ruby with Excel-style API
Install / Use
/learn @VirtuosoJoel/RubyExcelREADME
RubyExcel
Designed for Ruby on Windows with MS Excel
Note: This project is on indefinite hold as of 05/02/2014, as Joel Pearson is no longer actively involved. If anyone wishes to continue development, please feel free to fork the project.
Introduction
A Data-analysis tool for Ruby, with an Excel-style API.
You can find the gem here.
Main documentation is here
Details
This gem was made to simplify the steps between data extraction and the final output. You can drop the data into it, reorganise and edit it, and then output into Excel or your preferred file format. The methods provided for Excel interaction will return the relevant Excel object, allowing you to get as detailed as you like with the output.
Key design features taken from Excel:
- 1-based indexing.
- Referencing objects like Excel's API ( Workbook, Sheet, Row, Column, Cell, Range ).
- Useful data-handling functions ( e.g. Filter, Match, Sumif, Vlookup ).
Typical usage:
- Extract a HTML Table or CSV File into 2D Array ( normally with Nokogiri / Mechanize ).
- Organise and interpret data with RubyExcel.
- Output results into a file.
About
This gem is designed as a way to conveniently edit table data before outputting it to a variety of formats which Excel can interpret, including WIN32OLE Excel Workbooks. It attempts to take as much as possible from Excel's API while providing some of the best bits of Ruby ( e.g. Enumerators, Blocks, Regexp ). An important feature is allowing reference to Columns via their Headers for convenience and enhanced code readability. As this works directly on the data, processing is faster than using Excel itself.
This was written out of the frustration of editing tabular data using Ruby's multidimensional arrays, without affecting headers and while maintaining code readability. Its API is designed to simplify moving code across from VBA into Ruby format when processing spreadsheet data. The combination of Ruby, WIN32OLE Excel, and analysing table data is probably quite rare; but I thought I'd share what I came up with.
Examples
Expected Data Layout (2D Array)
data = [
[ 'Part', 'Ref1', 'Ref2', 'Qty', 'Cost' ],
[ 'Type1', 'QT1', '231', 1, 35.15 ],
[ 'Type2', 'QT3', '123', 1, 40 ],
[ 'Type3', 'XT1', '321', 3, 0.1 ],
[ 'Type1', 'XY2', '132', 1, 30.00 ],
[ 'Type4', 'XT3', '312', 2, 3 ],
[ 'Type2', 'QY2', '213', 1, 99.99 ],
[ 'Type1', 'QT4', '123', 2, 104 ]
]
The number of header rows defaults to 1
Loading the data into a Sheet
require 'rubyexcel'
wb = RubyExcel::Workbook.new
s = wb.add( 'Sheet1' )
s.load( data )
Or:
wb = RubyExcel::Workbook.new
s = wb.add( 'Sheet1' )
s.load( RubyExcel.sample_data )
Or:
wb = RubyExcel::Workbook.new
s = wb.load( RubyExcel.sample_data )
Or:
s = RubyExcel.sample_sheet
wb = s.parent
Using the Mechanize gem to get data
This example is for context, there are many potential data sources
s = RubyExcel::Workbook.new.load( CSV.parse( Mechanize.new.get('http://example.com/myfile.csv').content ) )
Reference a cell's value
s['A7']
s.A7
s.cell(7,1).value
s.range('A7').value
s.row(7)['A']
s.row(7)[1]
s.column('A')[7]
s.column('A')['7']
Reference a group of cells
s['A1:B3'] #=> Array
s.range( 'A1:B3' ) #=> Range
s.range( 'A:A' ) #=> Range (Column)
s.range( '1:2' ) #=> Range (Rows)
s.range( 'A1', 'B3' ) #=> Range
s.range( s.cell( 1, 1 ), s.cell( 3, 2 ) ) #=> Range
s.row( 1 ) #=> Row
s.row(1)[1] #=> Value
s.row(1)[1, 2] #=> Array
s.row(1)['A', 2] #=> Array
s.row(1)[1..2] #=> Array
s.row(1)['A'..'B'] #=> Array
s.column( 'A' ) #=> Column
s.column( 1 ) #=> Column
s.column(1)[1] #=> Value
s.column(1)[1, 2] #=> Array
s.column(1)[1..2] #=> Array
Using headers to reference the data
Here we're looking for the "Part" in row 7
s.row(7).value_by_header( 'Part' ) #=> "Type2"
s.row(7).val( 'Part' ) #=> "Type2"
s.column_by_header( 'Part' )[7] #=> "Type2"
s.ch( 'Part' )[7] #=> "Type2"
s.row(7).cell_by_header( 'Part' ) #=> Cell A7
s.row(7).cell_h( 'Part' ) #=> Cell A7
s.row(7).getref('Part') #=> "A"
Common Operations
#Some data to play with
s = RubyExcel.sample_sheet
#Have a look at the data
puts s
#Append a Column by adding a header
s << 'Number'
#Iterate through the rest of the rows while appending data
x = 1
s.rows(2) { |row| row << x; x+=1 }
#Filter to specific part numbers
s.filter!( 'Part', &/Type[1-3]/ )
#Sort by Part Number
s.sort_by!( 'Part' )
#Add the Number to the Cost in each row.
s.rows(2) { |row| row.cell_h('Cost').value += row.cell_h('Number').value }
#Split the data into multiple sheets by part number
wb = s.split( 'Part' )
#Output a sheet as a TSV file
File.write( 'Output.txt', wb.sheets(1).to_s )
#Output a sheet as an HTML page
File.write( 'Output.htm', wb.sheets(2).to_html )
#Open a sheet in an Excel Workbook
wb.sheets( 'Type3' ).to_excel
#If you feel the need to delete everything between a set of rows:
s.rows(2, 4).reverse_each &:delete
#Gather text files containing string versions of RubyExcel::Sheet into a Workbook
wb = RubyExcel::Workbook.new
Dir.glob( '*.txt' ) do |f|
data = File.read(f).split($/).map { |r| r.split("\t") }
s = wb.add( f.chomp('.txt') )
s.load data
end
Workbook
#Create a workbook
wb = RubyExcel::Workbook.new
wb = RubyExcel::Workbook.new( 'My Workbook' )
#Get and set the name
wb.name = 'My Workbook'
wb.name #=> "My Workbook"
#Add sheets to the workbook
sheet1, sheet2 = wb.add('Sheet1'), wb.add
#Delete all sheets from a workbook
wb.clear_all
#Delete a specific sheet
wb.delete( 1 )
wb.delete( 'Sheet1' )
wb.delete( sheet1 )
wb.delete( /sheet1/i )
wb.delete { |sht| sht.name == 'Sheet1' }
#Import a WIN32OLE Workbook or Sheet, either by passing the Object or a Filename
#Parameters: WIN32OLE Object or Filename, SheetName or nil for all Sheets, true to keep Excel Formulas or omit to import Values.
wb = RubyExcel.import( '/path/to/file.xlsx' )
wb = RubyExcel.import( '/path/to/file.xlsx', 'Sheet2' )
wb = RubyExcel.import( '/path/to/file.xlsx', 'Sheet2', true )
#Or:
require 'win32ole'
excel = WIN32OLE.new( 'excel.application' )
excel.visible = true #Optional
my_workbook = excel.workbooks.open( '/path/to/file.xlsx' )
wb.import( my_workbook )
#Or:
wb.import( my_workbook.sheets(1) )
#Shortcut to create a sheet with a default name and fill it with data
wb.load( data )
#Select a sheet
wb.sheets(1) #=> RubyExcel::Sheet
wb.sheets('Sheet1') #=> RubyExcel::Sheet
#Iterate through all sheets
wb.sheets #=> Enumerator
wb.each #=> Enumerator
#Sort the sheets
wb.sort! { |x,y| x.name <=> y.name }
wb.sort_by! &:name
#Output the workbook as a series of HTML tables
wb.to_html
Sheet
#Create a sheet
s = wb.add #Name defaults to 'Sheet' + total number of sheets
s = wb.add( 'Sheet1' )
#Access the sheet name
s.name #=> 'Sheet1'
s.name = 'Sheet1'
#Access the parent workbook
s.workbook
s.parent
#Access the headers
s.header_rows #=> 1
s.headers #=> 1
s.headers = 1
s.header_rows = 1
#Specify the number of header rows when loading data
s.load( data, 1 )
#Append data (at the bottom of the sheet)
s << data
s << s
s += data
s += s
#Remove identical rows in another data set (skipping any headers)
s -= data
s -= s
#Sheet#advanced_filter! is Deprecated. Sheet#filter! now accepts multiple arguments.
#Filter on multiple criteria
#You can add as many arguments as you like. The order is: Header, Method(Symbol), Argument
#Note: Returns a copy of the sheet when used without "!".
#Note: Sheet#filter is simpler to use, this is for the more in-depth stuff.
_
#Filter to Part 'Type1' and 'Type3' where Qty is greater than 1
s.advanced_filter!( 'Part', :=~, /Type[13]/, 'Qty', :>, 1 )
_
#Filter to Part 'Type1' where Ref1 includes 'X'
s.advanced_filter!( 'Part', :==, 'Type1', 'Ref1', :include?, 'X' )
#Average all elements in a column by criteria in another column (selected by header)
#Parameters: Header to pass to the block, Header to average, Block.
#Note: Accepts Column objects in place of headers.
s.averageif( 'Part', 'Cost' ) { |part| part == 'Type1' } #=> 56.38333333333333
s.averageif( 'Part', 'Cost', &/Type1/ ) #=> 56.38333333333333
#Select a column by its header
s.column_by_header( 'Part' )
s.ch( 'Part' )
#=> Column
#Iterate through rows or columns
s.rows { |r| puts r } #All rows
s.rows( 2 ) { |r| puts r } #From the 2nd to the last row
s.rows( 1, 3 ) { |r| puts r } #Rows 1 to 3
s.columns { |c| puts c } #All columns
s.columns( 'B' ) { |c| puts c } #From the 2nd to the last column
s.columns( 2 ) { |c| puts c } #From the 2nd to the last column
s.columns( 'B', 'D' ) { |c| puts c } #Columns 2 to 4
s.columns( 2, 4 ) { |c| puts c } #Columns 2 to 4
#Count is included via Enumerable and accepts a block which will yield Row or Column.
s.rows.count { |r| r.val('Part') == 'Type1' } #=> 3
s.rows.count { |r| r.val('Part') == 'Type1' && r.val('Qty') < 2 } #=> 2
#Remove all empty rows & columns
s.compact!
#Delete the current sheet from the workbook
s.delete
#Delete rows or columns "if( condition )" (iterates in reverse to preserve references during loop)
s.delete_rows_if { |r| r.empty? }
s.delete_columns_if { |c| c.empty? }
#Filter the data given a column and a block to test values against.
#Note: Returns a copy of the sheet when used without "!".
#Note: This gem carries a Regexp to_proc method for Regex shorthand (shown below).
s.filter!( 'Part' ) { |value| value =~ /Type[13]/ }
s.filter!( 'Part', 'Cost' ) { |part, cost| part =~ /Type[13]/ && cost < 2 }
s.filter!( 'Part', &/Type[13]/ )
#Filter the data to a specific set of columns by their headers.
#Note: Returns a copy of the sheet when used without "!".
s
Related Skills
node-connect
352.2kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
111.1kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
352.2kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
352.2kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
