Apache Iceberg version
0.9.0 (latest release)
Please describe the bug 🐞
Hi, thanks for writing pyiceberg.
The bug is essentially what the title says: table.scan(row_filter="x IN (0, 1)") omits the rows where x = 0 when x is a DoubleType column that is also a partition column.
Here is a reproducer:
pip install "pyiceberg[sql-sqlite,pyarrow]"
from pathlib import Path
from tempfile import TemporaryDirectory
import pyarrow
from pyiceberg.catalog.sql import SqlCatalog
from pyiceberg.schema import Schema
from pyiceberg.transforms import IdentityTransform
from pyiceberg.types import DoubleType, NestedField
from pyiceberg.partitioning import PartitionSpec, PartitionField
schema = Schema(
    NestedField(field_id=1, name="x", field_type=DoubleType()),
    NestedField(field_id=2, name="y", field_type=DoubleType()),
)
# Identity-partition the table on the double column x.
partition_spec = PartitionSpec(PartitionField(source_id=1, field_id=1001, transform=IdentityTransform(), name="x"))
with TemporaryDirectory() as tmpdir:
    catalog = SqlCatalog(
        "local",
        uri=f"sqlite:///{tmpdir}/catalog.db",
        warehouse=f"file://{tmpdir}/warehouse",
    )
    catalog.create_namespace("test")
    table = catalog.create_table(
        "test.test", schema=schema, partition_spec=partition_spec
    )
    data = pyarrow.table(
        {
            "x": [0.0, 1.0, 2.0],
            "y": [0.0, 0.0, 0.0],
        }
    )
    table.overwrite(data)
    print("=== no filter ===")
    print(table.scan().to_arrow())
    print("=== x IN (0) ===")
    print(table.scan(row_filter="x IN (0)").to_arrow())
    print("=== x IN (0, 1, 2) ===")
    print(table.scan(row_filter="x IN (0, 1, 2)").to_arrow())
Output:
/tmp/tmp.l2MLQFjC7C-05duO9h5/lib/python3.13/site-packages/pyiceberg/table/__init__.py:686: UserWarning: Delete operation did not match any records
warnings.warn("Delete operation did not match any records")
=== no filter ===
pyarrow.Table
x: double
y: double
----
x: [[0],[1],[2]]
y: [[0],[0],[0]]
=== x IN (0) ===
pyarrow.Table
x: double
y: double
----
x: [[0]]
y: [[0]]
=== x IN (0, 1, 2) ===
pyarrow.Table
x: double
y: double
----
x: [[1],[2]]
y: [[0],[0]]
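I expect the output for x IN (0, 1, 2) to match the unfiltered scan, since every row has x in {0, 1, 2}. Instead, the row with x = 0 is missing, even though x IN (0) on its own does return it.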
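To make that expectation concrete, here is a minimal sketch of a reference check, assuming rows are plain dicts such as those returned by pyarrow's `Table.to_pylist()`. The helpers `reference_in_filter` and `rows_match` are hypothetical names introduced for illustration; they are not part of pyiceberg. The demo data is copied from the output above.

```python
from typing import Any, Iterable, Mapping


def reference_in_filter(
    rows: Iterable[Mapping[str, Any]], column: str, values: Iterable[Any]
) -> list:
    """Hypothetical helper: keep rows whose `column` equals one of `values`.

    Uses plain == comparison, so the integer literal 0 matches 0.0.
    NULL (None) values never match, as in SQL IN.
    """
    wanted = list(values)
    return [
        row
        for row in rows
        if row[column] is not None and any(row[column] == v for v in wanted)
    ]


def rows_match(actual: list, expected: list, key: str) -> bool:
    """Hypothetical helper: compare two row lists while ignoring row order."""
    return sorted(actual, key=lambda r: r[key]) == sorted(expected, key=lambda r: r[key])


# Rows from the "no filter" scan and from the "x IN (0, 1, 2)" scan shown above.
all_rows = [{"x": 0.0, "y": 0.0}, {"x": 1.0, "y": 0.0}, {"x": 2.0, "y": 0.0}]
observed = [{"x": 1.0, "y": 0.0}, {"x": 2.0, "y": 0.0}]

expected = reference_in_filter(all_rows, "x", [0, 1, 2])
print(len(expected))                              # 3
print(rows_match(observed, expected, key="x"))    # False: the x = 0.0 row is missing
```

Inside the reproducer, the same check could be run against live scans, e.g. by passing `table.scan().to_arrow().to_pylist()` as `all_rows` and the filtered scan's `to_pylist()` as `observed`; comparing without regard to order avoids depending on how rows are split across data files.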
Note that I could not reproduce this when x is a LongType instead of a DoubleType.
Willingness to contribute