Skip to content

Fix array-like pd.notna crash, normalize NumPy scalars, add update/delete input validation, export DataFrameOperations#145

Merged
saurabhrb merged 2 commits intousers/zhaodongwang/dataFrameExtensionClaudefrom
copilot/fix-review-comments-and-tests
Mar 17, 2026
Merged

Fix array-like pd.notna crash, normalize NumPy scalars, add update/delete input validation, export DataFrameOperations#145
saurabhrb merged 2 commits intousers/zhaodongwang/dataFrameExtensionClaudefrom
copilot/fix-review-comments-and-tests

Conversation

Copy link
Contributor

Copilot AI commented Mar 17, 2026

Addresses four unresolved review comments from PR #98 against the client.dataframe namespace: a crash on array-valued cells, silent NumPy serialization failures, missing ID validation in update() and delete(), and missing exports/tests.

utils/_pandas.py

  • Fix pd.notna() crash on array-like cells: Guard with pd.api.types.is_scalar(v) before calling pd.notna(); non-scalar values (lists, dicts, numpy arrays) pass through directly. Previously raised ValueError: The truth value of an array is ambiguous.
  • Normalize NumPy scalar types: New _normalize_scalar(v) helper converts np.integerint, np.floatingfloat, np.bool_bool, pd.Timestamp → ISO string. DataFrames with integer columns produce np.int64 by default, which json.dumps() cannot serialize.
# Before: would crash or produce non-serializable values
df = pd.DataFrame([{"tags": ["a", "b"]}, {"count": np.int64(5)}])
dataframe_to_records(df)  # ValueError / TypeError at serialization time

# After: safe
[{"tags": ["a", "b"]}, {"count": 5}]

operations/dataframe.py

  • update() — validate id_column values: After extracting IDs, raises ValueError listing offending row indices if any value is not a non-empty string (catches NaN, None, numeric IDs).
  • update() — validate non-empty change columns: Raises ValueError if the DataFrame contains only the id_column and no fields to update.
  • delete() — validate ids Series: Returns None immediately for an empty Series; raises ValueError listing offending indices for any non-string or blank value.

operations/__init__.py

  • Exports DataFrameOperations so consumers can use it for type annotations.

Tests

  • tests/unit/test_pandas_helpers.py — 11 isolated tests for dataframe_to_records() covering NaN handling, NumPy type normalization, Timestamp conversion, list/dict passthrough, and empty input.
  • tests/unit/test_dataframe_operations.py — 35 tests covering the full DataFrameOperations namespace, including all new validation paths.
Original prompt

Context

This PR addresses unresolved review comments from PR #98 ("add dataframe methods") and adds comprehensive test coverage for the DataFrame operations namespace (client.dataframe).

The base branch is users/zhaodongwang/dataFrameExtensionClaude which contains the current state of the DataFrame operations code from PR #98.

Files to modify

1. src/PowerPlatform/Dataverse/utils/_pandas.py

Current code at the HEAD of the PR branch (8838bb69533dd8830bac8724c44696771a6704e7):

# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""Internal pandas helpers"""

from __future__ import annotations

from typing import Any, Dict, List

import pandas as pd


def dataframe_to_records(df: pd.DataFrame, na_as_null: bool = False) -> List[Dict[str, Any]]:
    """Convert a DataFrame to a list of dicts, converting Timestamps to ISO strings.

    :param df: Input DataFrame.
    :param na_as_null: When False (default), missing values are omitted from each dict.
        When True, missing values are included as None (sends null to Dataverse, clearing the field).
    """
    records = []
    for row in df.to_dict(orient="records"):
        clean = {}
        for k, v in row.items():
            if pd.notna(v):
                clean[k] = v.isoformat() if isinstance(v, pd.Timestamp) else v
            elif na_as_null:
                clean[k] = None
        records.append(clean)
    return records

Required changes:

Fix A: pd.notna() crash on array-like values (unresolved comment #98 (comment))

pd.notna(v) raises ValueError: The truth value of an array is ambiguous when a cell contains a list, dict, numpy array, etc. Fix by guarding with pd.api.types.is_scalar(v):

for k, v in row.items():
    if pd.api.types.is_scalar(v):
        if pd.notna(v):
            clean[k] = _normalize_scalar(v)
        elif na_as_null:
            clean[k] = None
    else:
        clean[k] = v  # pass through lists, dicts, etc.

Fix B: NumPy scalar types not normalized (acknowledged but deferred by author in #98 (comment))

NumPy scalars (np.int64, np.float64, np.bool_) are NOT JSON-serializable by default json.dumps(). DataFrames with integer columns produce np.int64 values. Add a helper function _normalize_scalar(v) that:

  • Converts pd.Timestamp to .isoformat()
  • Converts numpy.integer types to Python int
  • Converts numpy.floating types to Python float
  • Converts numpy.bool_ to Python bool
  • Passes everything else through

Use import numpy as np and isinstance checks.

2. src/PowerPlatform/Dataverse/operations/dataframe.py

Current code at the HEAD of the PR branch:

# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""DataFrame CRUD operations namespace for the Dataverse SDK."""

from __future__ import annotations

from typing import List, Optional, TYPE_CHECKING

import pandas as pd

from ..utils._pandas import dataframe_to_records

if TYPE_CHECKING:
    from ..client import DataverseClient


__all__ = ["DataFrameOperations"]


class DataFrameOperations:
    """Namespace for pandas DataFrame CRUD operations.
    ...
    """

    def __init__(self, client: DataverseClient) -> None:
        self._client = client

    def get(self, table, record_id=None, select=None, filter=None, orderby=None, top=None, expand=None, page_size=None) -> pd.DataFrame:
        # ... (current code)
        pass

    def create(self, table, records) -> pd.Series:
        # ... (current code with empty DataFrame check and ID count validation)
        pass

    def update(self, table, changes, id_column, clear_nulls=False) -> None:
        if not isinstance(changes, pd.DataFrame):
            raise TypeError("changes must be a pandas DataFrame")
        if id_column not in changes.columns:
            raise ValueError(f"id_column '{id_column}' not found in DataFrame columns")

        ids = changes[id_column].tolist()
        change_columns = [column for column in changes.columns if column != id_column]
        change_list = dataframe_to_records(changes[change_columns], na_as_null=clear_nulls)

        if len(ids) == 1:
            self._client.records.update(table, ids[0], change_list[0])
        else:
            self._client.records.update(table, ids, change_list)

    def delete(self, table, ids, use_bulk_delete=True) -> Optional[str]:
        if not isinstance(ids, pd.Series):
            raise TypeError("ids must be a pandas Series")

        id_list = ids.tolist()
        if len(id_list) == 1:
            return self._client.records.delete(table, id_list[0])
        else:
            return self._client.records.delete(table, id_list, use_bulk_delete=use_bulk_delete)

Required changes:

Fix C: Validate id_column values in...

This pull request was created from Copilot chat.


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

…e/delete IDs, add comprehensive tests

Co-authored-by: saurabhrb <32964911+saurabhrb@users.noreply.github.com>
Copilot AI changed the title [WIP] [PR #98] Address review comments and add tests for dataframe Fix array-like pd.notna crash, normalize NumPy scalars, add update/delete input validation, export DataFrameOperations Mar 17, 2026
Copilot AI requested a review from saurabhrb March 17, 2026 00:24
@saurabhrb saurabhrb marked this pull request as ready for review March 17, 2026 02:12
@saurabhrb saurabhrb requested a review from a team as a code owner March 17, 2026 02:12
@saurabhrb saurabhrb merged commit 4809dca into users/zhaodongwang/dataFrameExtensionClaude Mar 17, 2026
1 check passed
@saurabhrb saurabhrb deleted the copilot/fix-review-comments-and-tests branch March 17, 2026 02:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants