Arrow & Polars Guide

TTOON maintains two independent processing paths: the object path (general-purpose) and the Arrow path (high-performance tabular). This guide covers the Arrow path.

Why a Separate Arrow Path?

The Arrow path keeps tabular data in Arrow-native columnar form instead of language-native objects. Today, the strongest fast path is T-JSON → Arrow direct read; T-TOON tabular still interoperates through the compatibility Node route. For tabular data, this means:

  • No language-native row materialization on the Arrow side — data stays columnar instead of becoming dict / JS object rows
  • Lower conversion overhead where direct paths exist — especially for T-JSON → Arrow reads
  • Native type preservation: Decimal128, Date32, Timestamp, and FixedSizeBinary(16) (UUID) stay in their Arrow-native forms

Python: Polars & PyArrow

Serialize

import polars as pl
import pyarrow as pa
import ttoon

# Polars DataFrame
df = pl.DataFrame({"name": ["Alice", "Bob"], "score": [95, 87]})
text = ttoon.dumps(df)
# [2]{name,score}:
# "Alice", 95
# "Bob", 87

# PyArrow Table
table = pa.table({"name": ["Alice", "Bob"], "score": [95, 87]})
text = ttoon.dumps(table)

# Arrow → T-JSON
text = ttoon.stringify_arrow_tjson(df)
# [{"name": "Alice", "score": 95}, {"name": "Bob", "score": 87}]

dumps() auto-detects Polars DataFrame and PyArrow Table/RecordBatch inputs, routing them to the Arrow path. Polars DataFrames are converted to Arrow first (zero-copy in Polars).

Deserialize to Arrow

table = ttoon.read_arrow(text)  # returns pyarrow.Table

From the returned pyarrow.Table, you can convert to any downstream format:

df = pl.from_arrow(table)      # Polars DataFrame
pandas_df = table.to_pandas() # Pandas DataFrame

Delimiter Options

text = ttoon.dumps(df, delimiter="|")
# [2]{name,score}:
# "Alice"| 95
# "Bob"| 87

text = ttoon.dumps(df, delimiter="\t")

JavaScript: Apache Arrow

Requires the optional peer dependency apache-arrow.

Serialize

import { stringifyArrow, stringifyArrowTjson } from '@ttoon/shared';
import { tableFromArrays } from 'apache-arrow';

const table = tableFromArrays({
  name: ['Alice', 'Bob'],
  score: [95, 87],
});

// Arrow → T-TOON tabular
const ttoonText = await stringifyArrow(table);

// Arrow → T-JSON
const tjsonText = await stringifyArrowTjson(table);

Deserialize to Arrow

import { readArrow } from '@ttoon/shared';

const table = await readArrow(text);

Arrow APIs in JS are async because they dynamically import the apache-arrow module.

Rust

use ttoon_core::{read_arrow, arrow_to_ttoon, arrow_to_tjson};

let table = read_arrow(text)?;
let ttoon = arrow_to_ttoon(&table, None)?;
let tjson = arrow_to_tjson(&table, None)?;

Arrow Input Requirements

read_arrow() across all languages enforces these constraints:

  • Root must be a list: the Arrow bridge only handles tabular data
  • Each element must be an object: object keys become schema fields
  • Field types must be consistent: you cannot mix different scalar types in the same column
  • No structural fields: list/object values are not arrowable

Arrow Schema Mapping

  • int → Int64
  • float → Float64
  • decimal → Decimal128 or Decimal256 (by precision)
  • string → Utf8
  • bool → Boolean
  • date → Date32
  • time → Time64(Microsecond)
  • datetime → Timestamp(Microsecond[, tz])
  • uuid → FixedSizeBinary(16) + UUID metadata
  • hex/b64 → Binary
  • null → nullable column; all-null infers as Null

Arrow types are preserved at their native resolution — decimal is not downgraded to string, uuid uses FixedSizeBinary(16) with metadata.

Performance Notes

T-JSON Direct Path

The Rust core includes a two-pass direct path for T-JSON → Arrow (read_arrow_tjson_direct) that skips the Token/Node intermediate layer. This significantly reduces memory usage for large datasets and benefits all SDKs through the shared core.

Sparse Schema Support

T-JSON read_arrow() supports sparse rows — missing keys are treated as null. Schema field order is inferred from the first occurrence order within the batch.

T-TOON tabular uses the header field order and width as-is.

Datetime Timezone Consistency

The JS Arrow bridge does not allow mixing timezone-aware and naive datetimes within the same column. Mixing them causes a schema inference error.

Next Steps

  • Streaming Guide — Row-by-row Arrow streaming with ArrowStreamReader / ArrowStreamWriter
  • Type Mapping — Complete cross-language type table
  • Stream API — Streaming APIs and schema definitions