Improving Table Compression with Combinatorial Optimization

From Simple Sci Wiki
Revision as of 04:17, 24 December 2023 by SatoshiNakamoto (talk | contribs) (Created page with "Title: Improving Table Compression with Combinatorial Optimization Abstract: This research focuses on improving the compression of massive tables using a partition-training paradigm. The authors provide a new theory that unifies previous experimental observations and heuristic column permutations, all of which are used to improve compression rates. They develop the first on-line training algorithms for table compression, which can be applied to individual files and not...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Title: Improving Table Compression with Combinatorial Optimization

Abstract: This research focuses on improving the compression of massive tables using a partition-training paradigm. The authors provide a new theory that unifies previous experimental observations and heuristic column permutations, all of which are used to improve compression rates. They develop the first on-line training algorithms for table compression, which can be applied to individual files and not just continuously operating sources. They also create an off-line training algorithm based on an asymmetric traveling salesman problem link, which improves on prior work by rearranging columns before partitioning. Experimental results support these conclusions. The authors also show that a variation of the table compression problem is MAX-SNP hard.

Main Research Question: How can we improve the compression rates of massive tables using a partition-training paradigm?

Methodology: The authors study the problem of compressing massive tables within the partition-training paradigm introduced by Buchsbaum et al. They provide a new theory that unifies previous experimental observations and heuristic column permutations. They develop the first on-line training algorithms for table compression and an off-line training algorithm.

Results: The on-line algorithms provide 35-55% improvement over gzip with negligible slowdown. The off-line reordering provides up to 20% further improvement over partitioning alone. The authors also show that a variation of the table compression problem is MAX-SNP hard.

Implications: The research has implications for the compression of massive tables, particularly those generated continuously by operating sources. The new algorithms and theory provide a more effective way to compress such tables, reducing storage and network bandwidth requirements.

Link to Article: https://arxiv.org/abs/0203018v1 Authors: arXiv ID: 0203018v1