Core Java

Apache POI Workbook Evaluation

Apache POI is a popular library for reading, writing, and manipulating Microsoft Office documents, including Excel files. It provides different classes to work with Excel spreadsheets, each suited for specific use cases. Choosing the right class depends on factors like file format, memory usage, and performance. This article compares these classes and evaluations to help you make the best choice for your use case.

1. Overview of Classes

1.1 HSSFWorkbook

The HSSFWorkbook class offers the advantage of being lightweight and well-suited for smaller files while maintaining backward compatibility with older Excel formats. However, it is limited to the .xls format, which supports a maximum of 65,536 rows and 256 columns, and can lead to high memory consumption when handling large files.

  • File Type Supported: .xls (Excel 97-2003 format).
  • Memory Model: The entire workbook is loaded into memory.
  • Use Case: Suitable for small .xls files.

1.2 XSSFWorkbook

The XSSFWorkbook class supports the modern .xlsx format, allowing for up to 1,048,576 rows and 16,384 columns, and provides richer functionality and advanced styling options. However, it consumes significant memory when handling large files and is slower than HSSFWorkbook when processing small files.

  • File Type Supported: .xlsx (Excel 2007+ format).
  • Memory Model: The entire workbook is loaded into memory.
  • Use Case: Suitable for small to medium .xlsx files.

1.3 SXSSFWorkbook

The SXSSFWorkbook class is highly memory-efficient, making it ideal for handling large datasets and processing files with millions of rows without memory constraints. However, it offers limited support for reading existing sheets or advanced features like formula evaluation and rows written to disk cannot be modified.

  • File Type Supported: .xlsx (Excel 2007+ format).
  • Memory Model: Uses a streaming model, and writes rows to disk to minimize memory usage.
  • Use Case: Suitable for very large .xlsx files.

2. Code Examples

To use Apache POI in a project, the required dependencies must be added to the pom.xml file. The dependencies for Apache POI are listed below:

        <dependency> 
            <groupId>org.apache.poi</groupId>
            <artifactId>poi</artifactId> 
            <version>5.3.0</version> 
        </dependency> 
        <dependency> 
            <groupId>org.apache.poi</groupId> 
            <artifactId>poi-ooxml</artifactId> 
            <version>5.3.0</version> 
        </dependency>

This pom.xml configuration includes two key dependencies for working with Apache POI in a Java project. The first dependency POI supports handling Excel .xls files (Excel 97-2003 format) and is essential for basic POI functionality. The second dependency, poi-ooxml, extends this functionality to support .xlsx files (Excel 2007+ format), enabling the use of advanced features specific to modern Excel workbooks.

2.1 Using HSSFWorkbook for .xls Files

The HSSFWorkbook class in Apache POI is specifically designed for working with the older .xls format, which supports up to 65,536 rows and 256 columns. It is ideal for small files or projects that require compatibility with legacy Excel formats. Below is an example of creating and writing data to an .xls file using HSSFWorkbook.

public class HSSFWorkbookExample {

    private static final Logger LOGGER = Logger.getLogger(HSSFWorkbookExample.class.getName());

    public static void main(String[] args) {
        try (HSSFWorkbook workbook = new HSSFWorkbook(); 
            FileOutputStream out = new FileOutputStream("example.xls")) {
            Sheet sheet = workbook.createSheet("Data");
            Row row = sheet.createRow(0);
            Cell cell = row.createCell(0);
            cell.setCellValue("Hello, HSSFWorkbook!");
            workbook.write(out);
            System.out.println("example.xls created successfully!");
        } catch (Exception e) {
            LOGGER.log(Level.SEVERE, e.getMessage());
        }
    }
}

This code demonstrates how to create a simple .xls file using Apache POI’s HSSFWorkbook. It initializes a workbook and creates a sheet named “Data.” Within the sheet, it creates a row and a cell, setting the cell’s value to “Hello, HSSFWorkbook!” The workbook is then written to a file named example.xls using a FileOutputStream. The use of try-with-resources ensures that the workbook and stream are properly closed after execution.

2.2 Using XSSFWorkbook for .xlsx Files

The XSSFWorkbook class is used for creating and managing .xlsx files, the modern Excel format supporting larger data capacities and advanced features. Below is an example of creating and writing data to an .xlsx file using XSSFWorkbook.

public class XSSFWorkbookExample {
    
    private static final Logger LOGGER = Logger.getLogger(HSSFWorkbookExample.class.getName());
    
    public static void main(String[] args) {
        try (XSSFWorkbook workbook = new XSSFWorkbook(); 
            OutputStream out = new BufferedOutputStream(new FileOutputStream("example2.xlsx"))) {
            Sheet sheet = workbook.createSheet("Data");
            Row row = sheet.createRow(0);
            Cell cell = row.createCell(0);
            cell.setCellValue("Hello, XSSFWorkbook!");
            workbook.write(out);
            System.out.println("example2.xlsx created successfully!");
        } catch (Exception e) {
            LOGGER.log(Level.SEVERE, e.getMessage());
        }
    }
}

This code demonstrates how to create and write to an .xlsx file using Apache POI’s XSSFWorkbook class. It initializes a workbook, creates a sheet named “Data,” and adds a row and a cell with the value “Hello, XSSFWorkbook!” The workbook is written to a file named example2.xlsx using a buffered output stream for efficiency.

2.3 Using SXSSFWorkbook for Large .xlsx Files

The SXSSFWorkbook class is optimized for handling large .xlsx files by using a streaming approach that minimizes memory usage, making it ideal for datasets with millions of rows. Below is an example of creating and writing a large Excel file using SXSSFWorkbook.

public class SXSSFWorkbookExample {
    
    private static final Logger LOGGER = Logger.getLogger(HSSFWorkbookExample.class.getName());
    
        public static void main(String[] args) {
        try (SXSSFWorkbook workbook = new SXSSFWorkbook(); 
            OutputStream out = new BufferedOutputStream (new FileOutputStream("large_example.xlsx"))) {
            Sheet sheet = workbook.createSheet("Data");
            for (int i = 0; i < 1_000_000; i++) {
                Row row = sheet.createRow(i);
                Cell cell = row.createCell(0);
                cell.setCellValue("Row " + i);
            }
            workbook.write(out);
            System.out.println("large_example.xlsx created successfully!");
        } catch (Exception e) {
           LOGGER.log(Level.SEVERE, e.getMessage());
        }
    }
}

This code demonstrates how SXSSFWorkbook efficiently handles large .xlsx files by using a streaming approach to write data directly to disk, reducing memory consumption. Unlike XSSFWorkbook, which keeps all data in memory, SXSSFWorkbook flushes rows to disk as they are written, allowing it to handle millions of rows without running out of memory. This makes it useful for creating large Excel files in resource-constrained environments.

However, once rows are saved to disk, they cannot be changed, which is a trade-off for saving memory while working with large files.

3. Comparison Table

The following table summarizes the key differences between HSSFWorkbook, XSSFWorkbook, and SXSSFWorkbook in Apache POI.

FeatureHSSFWorkbookXSSFWorkbookSXSSFWorkbook
File Format.xls.xlsx.xlsx
Memory UsageHighHighLow
PerformanceFast for small filesModerateBest for large files
Max Rows65,5361,048,5761,048,576
Max Columns25616,38416,384
Streaming SupportNoNoYes
ModifiabilityFullFullLimited

3.1 Choosing the Best Option

  • Large Files in .xlsx Format: Use SXSSFWorkbook for memory efficiency. SXSSFWorkbook is the best option as it balances performance and memory usage.
  • Small Files in .xls Format: Use HSSFWorkbook for simplicity and compatibility. HSSFWorkbook is a good choice for compatibility with older Excel versions.
  • Small to Medium Files in .xlsx Format: Use XSSFWorkbook for modern features and better styling options. XSSFWorkbook is the most feature-rich option.

It is recommended to use HSSFWorkbook or XSSFWorkbook when full read/write capabilities and advanced features are required. On the other hand, SXSSFWorkbook is the ideal choice for handling very large datasets where memory efficiency is crucial.

4. Conclusion

In this article, the differences between HSSFWorkbook, XSSFWorkbook, and SXSSFWorkbook were discussed, each suited for different use cases. HSSFWorkbook is ideal for small .xls files, XSSFWorkbook supports modern .xlsx files with advanced features, and SXSSFWorkbook is best for handling large datasets with memory efficiency. The choice depends on the project’s file size, memory constraints, and feature needs.

5. Download the Source Code

This article provides an evaluation of workbook handling in Java using Apache POI.

Download
You can download the full source code of this example here: Java apache poi workbook evaluation

Omozegie Aziegbe

Omos Aziegbe is a technical writer and web/application developer with a BSc in Computer Science and Software Engineering from the University of Bedfordshire. Specializing in Java enterprise applications with the Jakarta EE framework, Omos also works with HTML5, CSS, and JavaScript for web development. As a freelance web developer, Omos combines technical expertise with research and writing on topics such as software engineering, programming, web application development, computer science, and technology.
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Back to top button