Icebug format

Icebug format comes in two flavours: icebug-disk and icebug-memory. Both store graph data in CSR format. You can bring your own node and rel tables(with from and to columns) and generate icebug files with the icebug-format tool

icebug-disk

icebug-disk is a Ladybug-native graph-aware Parquet format designed for ingestion-free graph analytics. Unlike general-purpose Parquet files, Icebug preserves graph structure (node and relationship tables) and enables direct querying without preprocessing.

Generating Icebug files

Use the icebug-format tool to generate icebug-disk files from existing databases:

# From a DuckDB database
uvx icebug-format --source-db demo-db.duckdb --schema input_schema.cypher

# From a GraphAr archive
uvx icebug-format --graphar <path to archive>

This generates a directory of Parquet files (for nodes and relationships) plus a Cypher schema file.

output schema.cypher:

CREATE NODE TABLE city(id INT32, name STRING, population INT64, PRIMARY KEY(id)) WITH (storage = '<path-to-dir>', format = 'icebug-disk');
CREATE NODE TABLE user(id INT32, name STRING, age INT64, PRIMARY KEY(id)) WITH (storage = '<path-to-dir>', format = 'icebug-disk');
CREATE REL TABLE follows(FROM user TO user, since INT32) WITH (storage = '<path-to-dir>', format = 'icebug-disk');
CREATE REL TABLE livesin(FROM user TO city) WITH (storage = '<path-to-dir>', format = 'icebug-disk');

Using Icebug files

Start Ladybug with the generated schema file using the -i flag:

lbug -i csr_graph/schema.cypher

or Run the DDL queries yourself in a Ladybug instance.

Then query the graph directly:

MATCH (a:User)-[b:LivesIn]->(c:City)
RETURN a.*, b.*, c.*;

If the ladybug instance is created backed by a file graph.lbdb, you can move it around or export it, along with the data, and query it without needing to re-create the graph. You can also attach the same db to another instance of ladybug using

ATTACH 'graph.lbdb' AS mygraph (dbtype lbug);

For more details about attaching databases, see the attach documentation.

Remote storage

Icebug-disk supports Parquet files on remote storage. The storage path in your schema.cypher can be any URI supported by Ladybug’s file system extensions.

Storage type	Example URI	Extension required
Amazon S3	`s3://my-bucket/graphs/mygraph/`	`httpfs`
Google Cloud Storage	`gcs://my-bucket/graphs/mygraph/`	`httpfs`
Azure Blob Storage	`az://my-container/mygraph/`	`azure`
Huggingface Hub	`xet://huggingface.co/mygraph/`	`httpfs`
HTTPS	`https://host/path/mygraph/`	`httpfs`

Example: S3

First, install and configure the httpfs extension if not already done:

INSTALL httpfs;
LOAD httpfs;
CALL s3_credential(
  key_id='YOUR_KEY_ID',
  secret='YOUR_SECRET',
  region='us-east-1'
);

Then use S3 URIs directly in your schema:

CREATE NODE TABLE city(id INT32, name STRING, population INT64, PRIMARY KEY(id))
  WITH (storage = 's3://my-bucket/mygraph/', format = 'icebug-disk');
CREATE REL TABLE livesin(FROM user TO city)
  WITH (storage = 's3://my-bucket/mygraph/', format = 'icebug-disk');

icebug-memory

icebug-memory is a Ladybug-native graph-aware Arrow format designed for ingestion-free graph analytics. Unlike general-purpose Arrow tables, Icebug preserves graph structure (node and relationship tables) and enables direct querying without preprocessing.

Generating Icebug tables

Use the icebug-format tool to generate icebug-memory tables from existing arrow tables:

from icebug_format import IcebugMemGraph

graph: IcebugMemGraph = IcebugMemGraph.from_arrow_tables(
    from_node_arrow_table=users,   # pa.Table, first column is the primary key
    rel_arrow_table=livesin,       # pa.Table with 'source' / 'from' and 'target' / 'to' columns
    to_node_arrow_table=cities,    # pa.Table, first column is the primary key
)

Using Icebug tables

Ladybug Python, Node.js, Rust, and C++ bindings expose create APIs for node and rel tables. For example, in Python:

import ladybug as lb

# get icebug graph from earlier step

db = lb.Database()
conn = lb.Connection(db)

# Create node table
conn.create_arrow_table(
    table_name="users",         # node table name to be used in ladybug
    dataframe=graph.src         # node table as a pa.Table
)

# create rel table
conn.create_arrow_rel_table(
    table_name="livesin",       # rel table name to be used in ladybug
    src_table_name="users",     # src node table name from table creation earlier
    dst_table_name="cities",    # dst node table name from table creation earlier
    layout="CSR",
    dataframe=graph.indices,    # rel table with 'source' and 'target' columns
    dst_col_name="to",          # dst col name in the indices table
    indptr=graph.indptr,        # row pointers for indices table
)

conn.execute("MATCH (a:users)-[b:livesin]->(c:cities) RETURN a.*, b.*, c.*")