oups

oups stands for Ordered Updatable Parquet Store.

oups is a Python library for managing collections of ordered parquet datasets. It stores, indexes, and queries time-series data, and validates that each dataset stays ordered.

Key Features

  • Ordered Storage: Validates data ordering within datasets

  • Schema-based Indexing: Hierarchical organization using dataclass schemas

  • Incremental Updates: Efficiently merge new data with existing datasets

  • Row Group Management: Optimizes storage layout

  • Duplicate Handling: Configurable duplicate detection and removal

  • Lock-based Concurrency: Safe concurrent access to datasets

  • Cross-dataset Queries: Query multiple datasets simultaneously
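
Two of the ideas above, schema-based indexing and incremental updates with duplicate handling, can be sketched in plain Python. This is only an illustration of the concepts and does not use or reproduce the oups API: `DatasetKey`, `key_to_path`, `is_ordered`, and `merge_ordered` are hypothetical names, rows are modelled as tuples, and a "duplicate" is assumed to mean two rows sharing the same value in the ordering column, with the newer row winning.

```python
from dataclasses import dataclass, astuple
from heapq import merge
from pathlib import PurePosixPath


@dataclass(frozen=True)
class DatasetKey:
    """Hypothetical schema: one field per level of the hierarchy."""
    exchange: str
    symbol: str


def key_to_path(key):
    # Map each dataclass field to one directory level.
    return PurePosixPath(*astuple(key))


def is_ordered(rows, column=0):
    # True if values in the ordering column never decrease.
    return all(a[column] <= b[column] for a, b in zip(rows, rows[1:]))


def merge_ordered(existing, new, column=0):
    # Assumption: both inputs are already ordered on `column`.
    if not (is_ordered(existing, column) and is_ordered(new, column)):
        raise ValueError("inputs must be ordered on the ordering column")
    merged = {}
    # heapq.merge yields rows in sorted order; on ties, rows from
    # `existing` come first, so a row from `new` overwrites them.
    for row in merge(existing, new, key=lambda r: r[column]):
        merged[row[column]] = row
    return list(merged.values())


existing = [(1, "a"), (3, "b")]
new = [(2, "c"), (3, "d")]
result = merge_ordered(existing, new)
path = key_to_path(DatasetKey("nyse", "ibm"))
```

Here `result` keeps a single row per ordering value and remains ordered, and `path` is the hierarchical location derived from the key's fields. A real store would additionally decide how to split the merged rows into row groups on disk, which this sketch leaves out.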
