API Reference
This section provides detailed API documentation for the main store components.
Indexer Functions
Core Classes
OrderedParquetDataset
Store
Write Operations
Utility Functions
Type Definitions
The following are important type definitions used throughout the store module:
Index Types
Indexer classes are dataclasses decorated with @toplevel that define the schema for organizing datasets.
Ordered Column Types
The ordered_on parameter accepts:
str: Single column name
Tuple[str]: Multi-index column name (for hierarchical columns)
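To make the two accepted forms concrete, here is a minimal sketch in plain pandas (it does not call oups). It shows that a column is addressed by a string when the columns are flat and by a tuple when the columns are a MultiIndex. The column names and level names are illustrative assumptions.

```python
import pandas as pd

# Flat columns: a column is identified by a plain string.
flat = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-01", periods=3),
    "value": [1, 2, 3],
})
ordered_on_flat = "timestamp"

# Hierarchical (MultiIndex) columns: a column is identified by a tuple,
# with one entry per column level.
multi = pd.DataFrame(
    [[1, 10.0], [2, 20.0], [3, 30.0]],
    columns=pd.MultiIndex.from_tuples([("meta", "timestamp"), ("data", "value")]),
)
ordered_on_multi = ("meta", "timestamp")

# Both selectors return a single Series whose values are in ascending order.
flat_is_sorted = flat[ordered_on_flat].is_monotonic_increasing
multi_is_sorted = multi[ordered_on_multi].is_monotonic_increasing
```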
Row Group Target Size Types
The row_group_target_size parameter accepts:
int: Target number of rows per row group
str: Pandas frequency string (e.g., "1D", "1H") for time-based grouping
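The difference between the two forms can be illustrated with plain pandas. This is a sketch of the idea only, not of how oups builds row groups internally: an integer target splits the data by row count, while a frequency string splits it by time period. The sample data and the target of 4 rows are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-01 00:00", "2023-01-01 12:00",
        "2023-01-02 00:00", "2023-01-02 12:00",
        "2023-01-03 00:00", "2023-01-03 12:00",
    ]),
    "value": range(6),
})

# Integer target: a fixed number of rows per group (here 4).
rows_per_group = 4
int_groups = [df.iloc[i:i + rows_per_group] for i in range(0, len(df), rows_per_group)]

# Frequency-string target: one group per period (here one per calendar day).
period_start = df["timestamp"].dt.floor("1D")
freq_groups = [group for _, group in df.groupby(period_start)]

int_sizes = [len(g) for g in int_groups]    # [4, 2]
freq_sizes = [len(g) for g in freq_groups]  # [2, 2, 2]
```

With a time-based target, the number of rows per group follows the data density, so groups can differ in size from one period to the next.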
Key-Value Metadata
Custom metadata stored as Dict[str, str] alongside parquet files.
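Because both keys and values must be strings, non-string values need to be converted before they are passed as metadata. The helper below is a hypothetical sketch (to_str_metadata is not part of oups) that JSON-encodes non-string values and rejects non-string keys.

```python
import json
from typing import Any, Dict


def to_str_metadata(metadata: Dict[str, Any]) -> Dict[str, str]:
    """Return a copy with every value converted to a string (hypothetical helper)."""
    result: Dict[str, str] = {}
    for key, value in metadata.items():
        if not isinstance(key, str):
            raise TypeError(f"metadata key must be str, got {type(key).__name__}")
        # Strings pass through unchanged; other values are JSON-encoded.
        result[key] = value if isinstance(value, str) else json.dumps(value)
    return result


metadata = to_str_metadata(
    {"source": "bloomberg", "version": 1.0, "symbols": ["AAPL", "MSFT"]}
)
```

JSON encoding is one possible choice; it keeps the values recoverable with json.loads on read.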
Examples
Basic Usage
from oups.store import toplevel, Store, OrderedParquetDataset
import pandas as pd

# Define indexer schema
@toplevel
class MyIndex:
    category: str
    subcategory: str

# Create store
store = Store("/path/to/data", MyIndex)

# Create sample data
df = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-01", periods=1000),
    "value": range(1000)
})

# Access dataset and write data
key = MyIndex("stocks", "tech")
dataset = store[key]
dataset.write(df=df, ordered_on="timestamp")
Advanced Write Options
from oups.store import write

# Time-based row groups with duplicate handling
# df is assumed to contain both a "timestamp" and a "symbol" column
write(
    "/path/to/dataset",
    ordered_on="timestamp",
    df=df,
    row_group_target_size="1D",  # Daily row groups
    duplicates_on=["timestamp", "symbol"],  # Drop duplicates
    max_n_off_target_rgs=2,  # Coalesce small row groups
    key_value_metadata={"source": "bloomberg", "version": "1.0"}
)
Cross-Dataset Queries
# Query multiple datasets simultaneously
keys = [MyIndex("stocks", "tech"), MyIndex("stocks", "finance")]
for intersection in store.iter_intersections(
    keys,
    start=pd.Timestamp("2023-01-01"),
    end_excl=pd.Timestamp("2023-02-01")
):
    for key, df in intersection.items():
        print(f"Processing {key}: {len(df)} rows")
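The start and end_excl arguments suggest a half-open interval, with the start bound included and the end bound excluded. That reading is an assumption based on the parameter name, not on documented behavior. The sketch below shows the equivalent filter in plain pandas on illustrative data.

```python
import pandas as pd

# Four daily rows: Jan 30, Jan 31, Feb 1, Feb 2.
df = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-30", periods=4),
    "value": range(4),
})
start = pd.Timestamp("2023-01-01")
end_excl = pd.Timestamp("2023-02-01")

# Half-open interval [start, end_excl): start is included, end_excl is not.
in_range = df[(df["timestamp"] >= start) & (df["timestamp"] < end_excl)]
```

Under this convention, consecutive queries such as January and February never return the same row twice.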
Hierarchical Indexing
from oups.store import toplevel, sublevel

@sublevel
class DateInfo:
    year: str
    month: str

@toplevel
class HierarchicalIndex:
    symbol: str
    date_info: DateInfo

# This creates paths like: AAPL/2023-01/
key = HierarchicalIndex("AAPL", DateInfo("2023", "01"))
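The mapping from a nested key to a directory path can be mimicked with standard-library dataclasses. This is a hypothetical sketch, not the oups implementation: to_path and its separators ("-" between fields of one level, "/" between levels) are assumptions inferred from the example path in the comment above.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass(frozen=True)
class DateInfo:
    year: str
    month: str


@dataclass(frozen=True)
class HierarchicalIndex:
    symbol: str
    date_info: DateInfo


def to_path(key, field_sep="-", dir_sep="/"):
    """Join plain fields with field_sep; each nested dataclass starts a new directory."""
    plain, nested = [], []
    for f in fields(key):
        value = getattr(key, f.name)
        (nested if is_dataclass(value) else plain).append(value)
    parts = [field_sep.join(str(v) for v in plain)]
    parts.extend(to_path(v, field_sep, dir_sep) for v in nested)
    return dir_sep.join(parts)


path = to_path(HierarchicalIndex("AAPL", DateInfo("2023", "01")))
```

The sketch returns the path without a trailing separator; each nested dataclass contributes one additional directory level.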