How to read and write ODF/ODS files (OpenDocument Spreadsheets)
Here you can download the latest version of OdsReadWrite demo application, together
with C# source code:
Download OdsReadWrite.zip
(last updated on 2009-7-24)
After downloading, extract ZIP file to folder on your hard drive and open the solution
file in Visual Studio 2008. If you don't have Visual Studio 2008 you can run the
application directly by running the executable in the bin folder.
Introduction
The OpenDocument Format (ODF) is an XML-based file format for representing electronic
documents such as spreadsheets, charts, presentations and word processing documents.
The standard was developed by the OASIS (Organization for the Advancement of Structured
Information Standards), and it is free and open format.
The OpenDocument format is used in free software and in proprietary software. Originally,
format was implemented by the OpenOffice.org office suite and, with Office
2007 SP2, Microsoft also supports ODF subset.
This article will explain basics of ODF format, and specifically its implementation
in spreadsheet applications (OpenOffice.org Calc and Microsoft Office Excel
2007 SP2). Presented is a demo application which writes/reads tabular data
to/from .ods files. Application is written in C# using Visual Studio 2008. Created
.ods files can be opened using Excel 2007 SP2 and OpenOffice.org Calc.
ODF format
OpenDocument format supports document representation:
- As a single XML document.
- As a collection of several subdocuments within a package.
Office applications use second approach so we will explain it detail.
Every ODF file is a collection of several subdocuments within a package (ZIP file),
each of which stores part of the complete document. Each subdocument stores a particular
aspect of the document. For example, one subdocument contains the style information
and another subdocument contains the content of the document.
This approach has following benefits:
- You don’t need to process entire file in order to extract specific data.
- Images and multimedia are now encoded in native format, not as text streams.
- Files are smaller as a result of compression and native multimedia storage.
There are four subdocuments in the package that contains file’s data:
- content.xml - Document content and automatic styles used in the content.
- styles.xml - Styles used in the document content and automatic styles used
in the styles themselves.
- meta.xml - Document meta information, such as the author or the time of the
last save action.
- settings.xml - Application-specific settings, such as the window size or
printer information.
Besides them, in package can be many other subdocuments like document thumbnail,
images, etc.
In order to read the data from an ODF file you need to:
- Open package as a ZIP archive.
- Find parts that contain data you want to read.
- Read parts you are interested in.
On the other side, if you want to create a new ODF file, you need to:
- Create/get all necessary parts.
- Package everything into a ZIP file with appropriate extension.
Spreadsheet Documents
Spreadsheet document files are the subset of ODF files. Spreadsheet files have .ods
file extensions.
The content (sheets) is stored in content.xml subdocument.

Picture 1: content.xml subdocument.
As we can see on Picture 1, sheets are stored as xml elements. They contain column
and row definitions, rows contain cells and so on… On picture is data from one specific
document, but from this we can see basic structure of content.xml file (you
can also download full ODF specification).
Implementation
Our demo is Windows Forms application (Picture 2), written in C# using Visual Studio
2008.

Picture 2: Demo application.
Application can:
- Create new Spreadsheet document.
- Read existing Spreadsheet document.
- Write created Spreadsheet document.
Creating new document and underlying model of
application
Internal, Spreadsheet document is stored as DataSet. Each sheet is represented
with DataTable, sheet’s row with DataRow, and sheet’s column with
DataColumn. So, to create a new document we have to create new DataSet,
with DataTables. Each DataTable has number of rows and columns that
conforms to our needs.
To show data from our DataSet (and to allow editing that data) application
dynamically creates tabs with DataGridViews (that are connected to our DataTables).
Through interface, user can read, write, edit data and add new rows to Spreadsheet
document.
Zip component and XML parser
Although classes from System.IO.Packaging namespace (.NET 3.0) provides way
to read and write ZIP files, they require different format of ZIP file. Because
of that, our demo uses open source component called DotNetZip.
Using ZIP component we can extract files, get subdocument, replace (or add) subdocuments
that we want and save that file as .ods file (which is ZIP file).
For processing documents we have used XmlDocument because it offers easy
way to reach part that we want. Note that, if performance is crucial for you, you
should use XmlTextReader and XmlTextWriter. That solution need more
work (and code), but provides better performance.
Reading Spreadsheet Document
To read document we follow these steps:
- Extracting .ods file.
- Getting content.xml file (which contains sheets data).
- Creating XmlDocument object from content.xml file.
- Creating DataSet (that represent Spreadsheet file).
- With XmlDocument we select “table:table” elements, and then we create
adequate DataTables.
- We parse child’s of “table:table” element and fill DataTables with
those data.
- At the end, we return DataSet and show it in application’s interface.
Although ODF specification provides way to specify default row, column and cell
style, implementations have nasty practice (that specially applies for Excel) that
they rather write sheet as sheet with maximum number of columns and maximum number
of rows, and then they write all cells with their style. So you could see that your
sheet has more than 1000 columns (1024 in Calc and 16384 in Excel), and even more
rows (and each rows contains the number of cells that is equal to the number of
columns), although you only have write data to first few rows/columns.
ODF specification provides a way that you specify some element (like column/row/cell)
and then you specify the number of time it repeats. So above behavior doesn’t affect
the size of the file, but that complicates our implementation.
Because of that, we can’t just read the number of columns and add equal number of
DataColumns to DataTable (because of performance issues). In this
implementation, we rather read cells and, if they have data, we first create rows/columns
they belong to, and then we add those cells to the DataTable. So, at the
end, we allocate only space that we need to.
Writing Spreadsheet Document
To write document we follow these steps:
- Extracting template.ods file (.ods file that we use as template).
- Getting content.xml file.
- Creating XmlDocument object from content.xml file.
- Erasing all “table:table” elements from the content.xml file.
- Reading data from our DataSet and composing adequate “table:table”
elements.
- Adding “table:table” elements to content.xml file.
- Zipping that file as new .ods file.
In this application, as template, we have use empty document. But application can
be easy modified to use some other template (so, that you have preserved styles,
etc).
Alternative ways
As always in programming, there is more than one method to achieve the same thing.
ODF files are just a collection of xml files, packed in zip files so, any of the
vast number of tools for handling zip files and XML data can be used to handle OpenDocument.
As another option, you could use some third party component which has support for
ODF format. This will probably cost you some money but has advantage that usually
more than one format (for example: GemBox.Spreadsheet reads/writes XLS, XLSX, CSV, HTML and ODS) are supported
within the same API, so your application will be able to target different file formats
using the same code.
About the author
Josip Kremenic (Josip Kremenić) works as a developer at GemBox Software. He primarily works on GemBox.Spreadsheet
- an Excel .NET component for reading and writing XLS, XLSX,
CSV, HTML and ODS files. He got interested in ODF as a part of his work.
|