Skip to content
Blog

Icebug format

Icebug format comes in two flavours: icebug-disk and icebug-memory. Both store graph data in CSR format. You can bring your own node and rel tables(with from and to columns) and generate icebug files with the icebug-format tool

icebug-disk

icebug-disk is a Ladybug-native graph-aware Parquet format designed for ingestion-free graph analytics. Unlike general-purpose Parquet files, Icebug preserves graph structure (node and relationship tables) and enables direct querying without preprocessing.

Generating Icebug files

Use the icebug-format tool to generate icebug-disk files from existing databases:

Terminal window
# From a DuckDB database
uvx icebug-format --source-db demo-db.duckdb --schema input_schema.cypher
# From a GraphAr archive
uvx icebug-format --graphar <path to archive>

This generates a directory of Parquet files (for nodes and relationships) plus a Cypher schema file.

output schema.cypher:

CREATE NODE TABLE city(id INT32, name STRING, population INT64, PRIMARY KEY(id)) WITH (storage = '<path-to-dir>', format = 'icebug-disk');
CREATE NODE TABLE user(id INT32, name STRING, age INT64, PRIMARY KEY(id)) WITH (storage = '<path-to-dir>', format = 'icebug-disk');
CREATE REL TABLE follows(FROM user TO user, since INT32) WITH (storage = '<path-to-dir>', format = 'icebug-disk');
CREATE REL TABLE livesin(FROM user TO city) WITH (storage = '<path-to-dir>', format = 'icebug-disk');

Using Icebug files

Start Ladybug with the generated schema file using the -i flag:

Terminal window
lbug -i csr_graph/schema.cypher

or Run the DDL queries yourself in a Ladybug instance.

Then query the graph directly:

MATCH (a:User)-[b:LivesIn]->(c:City)
RETURN a.*, b.*, c.*;

If the ladybug instance is created backed by a file graph.lbdb, you can move it around or export it, along with the data, and query it without needing to re-create the graph. You can also attach the same db to another instance of ladybug using

ATTACH 'graph.lbdb' AS mygraph (dbtype lbug);

For more details about attaching databases, see the attach documentation.

Remote storage

Icebug-disk supports Parquet files on remote storage. The storage path in your schema.cypher can be any URI supported by Ladybug’s file system extensions.

Storage typeExample URIExtension required
Amazon S3s3://my-bucket/graphs/mygraph/httpfs
Google Cloud Storagegcs://my-bucket/graphs/mygraph/httpfs
Azure Blob Storageaz://my-container/mygraph/azure
Huggingface Hubxet://huggingface.co/mygraph/httpfs
HTTPShttps://host/path/mygraph/httpfs

Example: S3

First, install and configure the httpfs extension if not already done:

INSTALL httpfs;
LOAD httpfs;
CALL s3_credential(
key_id='YOUR_KEY_ID',
secret='YOUR_SECRET',
region='us-east-1'
);

Then use S3 URIs directly in your schema:

CREATE NODE TABLE city(id INT32, name STRING, population INT64, PRIMARY KEY(id))
WITH (storage = 's3://my-bucket/mygraph/', format = 'icebug-disk');
CREATE REL TABLE livesin(FROM user TO city)
WITH (storage = 's3://my-bucket/mygraph/', format = 'icebug-disk');

icebug-memory

icebug-memory is a Ladybug-native graph-aware Arrow format designed for ingestion-free graph analytics. Unlike general-purpose Arrow tables, Icebug preserves graph structure (node and relationship tables) and enables direct querying without preprocessing.

Generating Icebug tables

Use the icebug-format tool to generate icebug-memory tables from existing arrow tables:

from icebug_format import IcebugMemGraph
graph: IcebugMemGraph = IcebugMemGraph.from_arrow_tables(
from_node_arrow_table=users, # pa.Table, first column is the primary key
rel_arrow_table=livesin, # pa.Table with 'source' / 'from' and 'target' / 'to' columns
to_node_arrow_table=cities, # pa.Table, first column is the primary key
)

Using Icebug tables

Ladybug Python, Node.js, Rust, and C++ bindings expose create APIs for node and rel tables. For example, in Python:

import ladybug as lb
# get icebug graph from earlier step
db = lb.Database()
conn = lb.Connection(db)
# Create node table
conn.create_arrow_table(
table_name="users", # node table name to be used in ladybug
dataframe=graph.src # node table as a pa.Table
)
# create rel table
conn.create_arrow_rel_table(
table_name="livesin", # rel table name to be used in ladybug
src_table_name="users", # src node table name from table creation earlier
dst_table_name="cities", # dst node table name from table creation earlier
layout="CSR",
dataframe=graph.indices, # rel table with 'source' and 'target' columns
dst_col_name="to", # dst col name in the indices table
indptr=graph.indptr, # row pointers for indices table
)
conn.execute("MATCH (a:users)-[b:livesin]->(c:cities) RETURN a.*, b.*, c.*")