IVSparse

A C++ sparse matrix library optimized for compression of highly redundant sparse data.

IVSparse is a C++ sparse matrix library that introduces novel compression formats for sparse matrices whose non-zero values are highly redundant, a common property of single-cell omics data and of other datasets in machine learning, data science, and scientific computing.

The library provides three storage formats:

  • CSC: the standard Compressed Sparse Column format
  • VCSC (Value-Compressed Sparse Column): stores each unique value once per column, together with the row indices at which it occurs, achieving ~2.25× compression over CSC
  • IVCSC (Index- and Value-Compressed Sparse Column): further compresses the index storage via positive-delta encoding and byte-packing, achieving ~7.5× compression over CSC
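
The ideas behind the two compressed formats can be sketched in a few lines of C++. This is a minimal illustration under stated assumptions, not the IVSparse API or its actual memory layout: the names `ValueRun`, `group_by_value`, `delta_encode`, `byte_width`, `byte_pack`, and `unpack_rows` are hypothetical, the input column is assumed to be sorted by row index, and indices are assumed to fit in 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical VCSC-style run: one distinct value plus the rows where it occurs.
struct ValueRun {
    double value;
    std::vector<std::uint32_t> rows;  // non-decreasing row indices
};

// Group the (row, value) non-zeros of one column by value.
// Assumes the input is sorted by row, as it would be in CSC.
std::vector<ValueRun> group_by_value(
        const std::vector<std::pair<std::uint32_t, double>>& column) {
    std::map<double, std::vector<std::uint32_t>> groups;
    for (const auto& entry : column) groups[entry.second].push_back(entry.first);
    std::vector<ValueRun> runs;
    for (auto& group : groups) runs.push_back({group.first, std::move(group.second)});
    return runs;
}

// Positive-delta encoding: the first index is kept, later ones become gaps.
std::vector<std::uint32_t> delta_encode(const std::vector<std::uint32_t>& rows) {
    std::vector<std::uint32_t> deltas;
    deltas.reserve(rows.size());
    std::uint32_t previous = 0;
    for (std::uint32_t row : rows) {
        assert(row >= previous);  // gaps must be non-negative
        deltas.push_back(row - previous);
        previous = row;
    }
    return deltas;
}

// Smallest number of bytes (1 to 4) that can hold the largest delta.
std::size_t byte_width(const std::vector<std::uint32_t>& deltas) {
    std::uint32_t largest = 0;
    for (std::uint32_t d : deltas) if (d > largest) largest = d;
    std::size_t width = 1;
    while (width < 4 && (largest >> (8 * width)) != 0) ++width;
    return width;
}

// Byte-packing: write each delta in `width` little-endian bytes.
std::vector<std::uint8_t> byte_pack(const std::vector<std::uint32_t>& deltas,
                                    std::size_t width) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(deltas.size() * width);
    for (std::uint32_t d : deltas)
        for (std::size_t b = 0; b < width; ++b)
            bytes.push_back(static_cast<std::uint8_t>((d >> (8 * b)) & 0xFFu));
    return bytes;
}

// Inverse of byte_pack followed by a running sum to recover the row indices.
std::vector<std::uint32_t> unpack_rows(const std::vector<std::uint8_t>& bytes,
                                       std::size_t width) {
    std::vector<std::uint32_t> rows;
    std::uint32_t running = 0;
    for (std::size_t i = 0; i + width <= bytes.size(); i += width) {
        std::uint32_t d = 0;
        for (std::size_t b = 0; b < width; ++b)
            d |= static_cast<std::uint32_t>(bytes[i + b]) << (8 * b);
        running += d;
        rows.push_back(running);
    }
    return rows;
}
```

For example, a value that occurs at rows 3, 70005, and 70010 of a column yields the deltas 3, 70002, and 5. The largest delta fits in 3 bytes, so the three indices pack into 9 bytes instead of the 12 needed for raw 32-bit indices. The more redundant the values and the more closely spaced their rows, the larger the saving.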

The full paper was published at IEEE BigData 2024. The work was also accepted as a one-page summary and poster at the 2024 Data Compression Conference (DCC), where we presented alongside researchers from across the compression community.

(Wolfgang et al., 2024)

Links: GitHub, arXiv

Sizes in GB of a large random matrix across different compression formats.
Seth Wolfgang (right) and me (left) presenting IVSparse at the 2024 Data Compression Conference (DCC).

Talk

Presentation on IVSparse and the VCSC/IVCSC formats.

References

2024

  1. BigData
    Value-Compressed Sparse Column (VCSC): Sparse Matrix Storage for Single-cell Omics Data
    Seth Wolfgang, Skyler Ruiter, Marc Tunnell, and 3 more authors
    In 2024 IEEE International Conference on Big Data (BigData), Dec 2024