Performance metrics with large Excel files in C# and VB.NET

GemBox.Spreadsheet is an Excel component that follows .NET design guidelines and best practices. It represents Excel files in-memory through its rich content model that contains sheets, rows, cells, styles, etc. It has optimized memory consumption, allocation, while not jeopardizing the efficiency and speed of the execution.

You can create stress tests to measure the component's processing capabilities using BenchmarkDotNet.

The following example shows how you can track the performance of GemBox.Spreadsheet using the provided input Excel file with 25 sheets of various cell data. The file should cover any typical Excel requirements; it includes different kinds of elements (like images, charts, and pivot tables) and Excel features (like conditional formatting, data validation, and filtering).

Measuring performance of reading, writing, and iterating through Excel files in C# and VB.NET
Screenshot of GemBox.Spreadsheet performance measurements
using System;
using System.Collections.Generic;
using System.IO;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
using GemBox.Spreadsheet;

[SimpleJob(RuntimeMoniker.Net48)]
[SimpleJob(RuntimeMoniker.NetCoreApp31)]
public class Program
{
    private ExcelFile workbook;
    private readonly Consumer consumer = new Consumer();

    public static void Main()
    {
        BenchmarkRunner.Run<Program>();
    }

    [GlobalSetup]
    public void SetLicense()
    {
        // If using Professional version, put your serial key below.
        SpreadsheetInfo.SetLicense("FREE-LIMITED-KEY");

        // If using Free version and example exceeds its limitations, use Trial or Time Limited version:
        // https://www.gemboxsoftware.com/spreadsheet/examples/c-sharp-free-professional-excel-library/1001

        this.workbook = ExcelFile.Load("%#RandomSheets.xlsx%");
    }

    [Benchmark]
    public ExcelFile Reading()
    {
        return ExcelFile.Load("%#RandomSheets.xlsx%");
    }

    [Benchmark]
    public void Writing()
    {
        using (var stream = new MemoryStream())
            this.workbook.Save(stream, new XlsxSaveOptions());
    }

    [Benchmark]
    public void Iterating()
    {
        this.LoopThroughAllCells().Consume(this.consumer);
    }

    public IEnumerable<object> LoopThroughAllCells()
    {
        foreach (var worksheet in this.workbook.Worksheets)
            foreach (var row in worksheet.Rows)
                foreach (var cell in row.AllocatedCells)
                    yield return cell.Value;
    }
}
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports BenchmarkDotNet.Attributes
Imports BenchmarkDotNet.Engines
Imports BenchmarkDotNet.Jobs
Imports BenchmarkDotNet.Running
Imports GemBox.Spreadsheet

<SimpleJob(RuntimeMoniker.Net48)>
<SimpleJob(RuntimeMoniker.NetCoreApp31)>
Public Class Program

    Private workbook As ExcelFile
    Private ReadOnly consumer As Consumer = New Consumer()

    Public Shared Sub Main()
        BenchmarkRunner.Run(Of Program)()
    End Sub

    <GlobalSetup>
    Public Sub SetLicense()
        ' If using Professional version, put your serial key below.
        SpreadsheetInfo.SetLicense("FREE-LIMITED-KEY")

        ' If using Free version and example exceeds its limitations, use Trial or Time Limited version:
        ' https://www.gemboxsoftware.com/spreadsheet/examples/c-sharp-free-professional-excel-library/1001

        Me.workbook = ExcelFile.Load("%#RandomSheets.xlsx%")
    End Sub

    <Benchmark>
    Public Function Reading() As ExcelFile
        Return ExcelFile.Load("%#RandomSheets.xlsx%")
    End Function

    <Benchmark>
    Public Sub Writing()
        Using stream = New MemoryStream()
            Me.workbook.Save(stream, New XlsxSaveOptions())
        End Using
    End Sub

    <Benchmark>
    Public Sub Iterating()
        Me.LoopThroughAllCells().Consume(Me.consumer)
    End Sub

    Public Iterator Function LoopThroughAllCells() As IEnumerable(Of Object)
        For Each worksheet In Me.workbook.Worksheets
            For Each row In worksheet.Rows
                For Each cell In row.AllocatedCells
                    Yield cell.Value
                Next
            Next
        Next
    End Function

End Class

Benchmarks for 1,000,000 Excel rows

The more cells you have, the more memory you'll need. The amount of data you can handle depends on a few factors, like the machine's available memory, the application's architecture (32-bit or 64-bit), the targeted .NET platform (.NET Framework or .NET Core), etc.

The following benchmark charts provide the results of working with Excel files with up to 1 million rows. They show a steady and linear increase in both time and memory with an increased number of rows. For more information, see the resulting performance measurements in the 1_Million_Rows_Performance.xlsx file.

Benchmark chart of time that's required for reading and writing Excel files with up to 1 million rows
Benchmark chart of elapsed time for 1 million rows
Benchmark chart of memory that's required for creating Excel files with up to 1 million rows
Benchmark chart of allocated memory for 1 million rows

Tips for improving performance

The following are some recommendations for improving performance while developing with GemBox.Spreadsheet:

  • When reading large Excel files, the most efficient way to retrieve the data is by accessing it in sequence, from the first ExcelRow in a sheet to the last, and from the first ExcelCell in a row to the last. Avoid accessing cells randomly in workbooks with large data sets.
  • When writing large Excel files with a huge amount of numbers and dates, use one of the ExcelCell.SetValue overload methods rather than the ExcelCell.Value property to avoid boxing and unboxing value types.
  • Typically, sheets have the same number of columns in each row. In that case, to retrieve the columns count, use the Count value of the first ExcelRow.AllocatedCells rather than the ExcelWorksheet.CalculateMaxUsedColumns method.
  • When auto fitting columns, you can speed up the operation by reducing the number of cells measured with the ExcelColumn.AutoFit(Double, ExcelRow, ExcelRow) method. You can also use the first parameter, the scaling factor, to adjust the calculated column width.
  • When inserting or removing rows or columns, you can speed up the operation by disabling the ExcelFile.AutomaticFormulaUpdate option.

Want more?

Next example GitHub

Check the next example or select an example from the menu. You can also download our examples from the GitHub.


Like it?

Download Buy

If you want to try the GemBox.Spreadsheet yourself, you can download the free version. It delivers the same performance and set of features as the professional version, but with some operations limited. To remove the limitation, you need to purchase a license.