ExcelTable
ExcelTable is an Oracle SQL interface to read Microsoft Excel files (.xlsx, .xlsm, .xlsb, .xls, .xml), ODF spreadsheet files (.ods) and flat files as external tables
Install / Use
/learn @mbleron/ExcelTableREADME
ExcelTable - An Oracle SQL Interface for MS Excel, ODF Spreadsheet and Flat Files
<p align="center"><img src="./resources/banner.png"/></p>ExcelTable is a pipelined table interface to read Excel files (.xlsx, .xlsm, .xlsb, .xls, .xml), ODF spreadsheet files (.ods), and flat files (delimited or positional) as if they were external tables.
It is primarily implemented in PL/SQL using an object type (for the ODCI routines) and a package supporting the core functionalities.
Content
- What's New in...
- Bug tracker
- Installation
- Quick Start
- ExcelTable Subprograms and Usage
- Copyright and license
What's New in...
Version 5.0 :
Support for strict OOXML documents
Streaming read method for ODF files
Raw cells listing
Version 4.0 : Support for delimited and positional flat files
Version 3.2 : ExcelTable can read XML spreadsheetML files (.xml)
Version 3.1 : New default value feature in DML API
Version 3.0 : Multi-sheet support
Version 2.3 : New API for DML operations
Version 2.2 : ExcelTable can read ODF (OpenDocument) spreadsheet files (.ods).
Support for TIMESTAMP data type
Version 2.1 : ExcelTable can read .xlsb files.
Version 2.0 : ExcelTable can read old 97-2003 Excel files (.xls).
Bug tracker
Found a bug, have a question, or an enhancement request?
Please create an issue here.
Installation
Getting source code
Clone this repository or download it as a zip archive.
ExcelCommons and MSUtilities dependencies are now provided as submodules.
-
If you choose to clone the repository, use the following command to fetch the submodule automatically :
git clone --recurse-submodules https://github.com/mbleron/ExcelTable.git -
If you go the download way, please also get ExcelCommons and MSUtilities zip archives and extract the content of their root folders into ExcelCommons and MSUtilities folders respectively.
Database requirement
ExcelTable requires Oracle Database 11.2.0.1 and onwards.
Note that the interface may work as well on versions 11.1.0.6 and 11.1.0.7, with limited support for CLOB projections, but that scenario has not been tested.
DBA preliminary tasks
On database versions prior to 11.2.0.4, a temporary XMLType table is used internally to read .xlsx files with the default DOM method. The owner requires the CREATE TABLE privilege in this case :
grant create table to <user>;
The schema into which ExcelTable is installed requires at least the privileges to create sessions, procedures and types:
grant
create session,
create procedure,
create type
to
<user>;
In order to read encrypted files, the interface requires access to the DBMS_CRYPTO API (see PL/SQL section below).
The owner must therefore be granted EXECUTE privilege on it :
grant execute on sys.dbms_crypto to <user>;
ExcelTable requires CURSOR_SHARING parameter set to EXACT, otherwise one may receive the following error when using ODCI-based function getRows() :
PLS-00307: too many declarations of 'ODCITABLEDESCRIBE' match this call
The current value can be checked using this query :
select value from v$parameter where name = 'cursor_sharing';
If the value is not 'EXACT' then it can be changed at system or session level using the corresponding ALTER SYSTEM/SESSION command, e.g.
alter session set cursor_sharing = exact;
If this change is not possible, the workaround is to override the parameter at query level via a hint :
select /*+ cursor_sharing_exact */ t.*
from table(
ExcelTable.getRows( ... )
) t
;
PL/SQL
Using SQL*Plus, connect to the target database schema and run script install.sql.
Java
ExcelTable requires additional Java classes for the following features :
- Streaming read method for .xlsx/.xlsm files
- Reading password-protected ODF spreadsheets encrypted using the Blowfish algorithm (ODF 1.0/1.1)
JAR files to deploy depend on the database version :
:warning: As of ExcelTable 5.0, Java-based features are desupported on database versions < 11.2.0.4
If necessary, you may create an issue and request a backport for your version.
Otherwise, please use the last compatible release ExcelTable 4.0.1.
- Versions < 11.2.0.4
Except for version 11.2.0.4 which supports JDK 6, Oracle 11g only supports JDK 5 (Java 1.5). The following jar files are required in order to use the streaming method :
Run shell script install_jdk5.bat on Windows, or install_jdk5.sh on a UNIX-like machine to load them in the database.
You will be prompted for connect information : database SID, user and password.
- Versions >= 11.2.0.4
The StAX API is included in JDK 6, as well as the Sun Java implementation (SJXSP), so for those versions one only needs the following jar file :
Run shell script install_jdk6.bat (or install_jdk6.sh) to load it in the database.
Both scripts use loadjava utility, available from a standard Oracle client or database installation ($ORACLE_HOME/bin).
It is recommended to use a client version at least equal to the target database version to avoid compatibility issues.
Quick Start
Reading an Excel file using default settings :
SELECT t.*
FROM Table(
ExcelTable.getRows(
ExcelTable.getFile('MY_DIR','my_file.xlsx')
, 'my_sheet'
, ' "COL1" number
, "COL2" varchar2(10)
, "COL3" number
, "COL4" date
, "COL5" number(3)'
)
) t
;
Reading a delimited flat file (e.g. csv) :
SELECT t.*
FROM Table(
ExcelTable.getRows(
p_file => ExcelTable.getTextFile('MY_DIR','my_file.csv')
, p_cols => q'{
"COL1" number
, "COL2" varchar2(50)
, "COL3" varchar2(50)
, "COL4" number
, "COL5" date format 'DD/MM/YYYY HH24:MI:SS'
, "COL6" timestamp(6) format 'DD-MON-YYYY HH.MI.SS.FF9 AM'
}'
, p_skip => 0
, p_line_term => chr(10)
, p_field_sep => ','
)
) t
;
Reading a positional flat file :
SELECT t.*
FROM Table(
ExcelTable.getRows(
p_file => ExcelTable.getTextFile('MY_DIR','my_file.dat')
, p_cols => q'{
"COL1" number(4) position(1:4)
, "COL2" varchar2(10) position(5:14)
, "COL3" varchar2(9) position(15:23)
, "COL4" number(4) position(24:27)
, "COL5" date format 'DD/MM/YYYY' position(28:37)
}'
, p_skip => 0
, p_line_term => chr(10)
)
) t
;
Getting cells list :
SELECT t.sheetIdx
, t.cellRow
, t.cellCol
, t.cellData.getTypeName() as typeName
, case when t.cellData.getTypeName() = 'SYS.VARCHAR2' then t.cellData.accessVarchar2() end as strval
, case when t.cellData.getTypeName() = 'SYS.NUMBER' then t.cellData.accessNumber() end as numval
, case when t.cellData.getTypeName() = 'SYS.TIMESTAMP' then t.cellData.accessTimestamp() end as tsval
, case when t.cellData.getTypeName() = 'SYS.CLOB' then t.cellData.accessClob() end as lobval
, t.cellNote
FROM Table(
ExcelTable.getRawCells(
p_file => ExcelTable.getFile('XL_DATA_DIR','my_file.xlsx')
, p_sheetFilter => anydata.ConvertVarchar2('my_sheet')
, p_cols => 'A-F'
)
) t
;
See the following sections for more examples and detailed description of ExcelTable features.
ExcelTable Subprograms and Usage
- getRows
- getRawCells
- getFile
- getTextFile
- setFetchSize
- useSheetPattern
- getCursor
- getSheets
- isReadMethodAvailable
- createDMLContext
- mapColumn
- mapColumnWithDefault
- loadData
getRows Function
This is the main function of ExcelTable. It returns a set of rows from the input file, based on the sheet(s), range and projection defined in the parameters.
The function is available as three overloads :
- Overloads #1 and #2 are dedicated to spreadsheet files, and accept a single sheet name (as a regex pattern), or a sheet list.
- Overload #3 is used to read delimited or positional flat files.
Overload 1
function getRows (
p_file in blob
, p_sheet in varchar2
, p_cols in varchar2
, p_range in varchar2 default null
, p_method in binary_integer default DOM_READ
, p_password in varchar2 default n
