
Add client.dataframe namespace for pandas DataFrame CRUD operations#98

Open
zhaodongwang-msft wants to merge 24 commits into `main` from `users/zhaodongwang/dataFrameExtensionClaude`
Conversation


zhaodongwang-msft (Collaborator) commented Feb 11, 2026

Summary

Adds a client.dataframe namespace with pandas DataFrame/Series wrappers for all CRUD operations. Users can now query, create, update, and delete Dataverse records using DataFrame-native inputs and outputs -- no manual dict conversion required.

Quick Example

```python
import pandas as pd
from azure.identity import InteractiveBrowserCredential
from PowerPlatform.Dataverse.client import DataverseClient

credential = InteractiveBrowserCredential()

with DataverseClient("https://yourorg.crm.dynamics.com", credential) as client:

    # Query records as a DataFrame
    df = client.dataframe.get("account", select=["name", "telephone1"], top=5)
    print(df)
    #              name   telephone1
    # 0    Contoso Ltd    555-0100
    # 1    Fabrikam Inc   555-0200
    # 2    Northwind Co   555-0300
    # 3    Adventure Wks  555-0400
    # 4    Alpine Ski     555-0500

    # Create records from a DataFrame
    new_records = pd.DataFrame([
        {"name": "Acme Corp", "telephone1": "555-9000"},
        {"name": "Globex Inc", "telephone1": "555-9001"},
    ])
    new_records["accountid"] = client.dataframe.create("account", new_records)
    print(new_records["accountid"])
    # 0    a1b2c3d4-...
    # 1    e5f6g7h8-...

    # Update records
    new_records["telephone1"] = ["555-1111", "555-2222"]
    client.dataframe.update("account", new_records[["accountid", "telephone1"]], id_column="accountid")

    # Delete records
    client.dataframe.delete("account", new_records["accountid"])
```

What's Included

New Files

| File | Description |
| --- | --- |
| `src/.../operations/dataframe.py` | `DataFrameOperations` class with `get()`, `create()`, `update()`, `delete()` |
| `src/.../utils/_pandas.py` | `dataframe_to_records()` helper -- normalizes NumPy scalars, handles NaN/None, converts Timestamps to ISO strings |
| `examples/advanced/dataframe_operations.py` | End-to-end walkthrough script |
| `tests/unit/test_dataframe_operations.py` | 46 unit tests for `DataFrameOperations` |
| `tests/unit/test_client_dataframe.py` | 26 unit tests for client integration |
| `tests/unit/test_pandas_helpers.py` | 18 unit tests for `dataframe_to_records()` and `_normalize_scalar()` |

Modified Files

| File | Change |
| --- | --- |
| `client.py` | Added `self.dataframe = DataFrameOperations(self)` namespace |
| `pyproject.toml` | Added `pandas>=2.0.0` as a required dependency |
| `README.md` | Added DataFrame CRUD usage examples |
| `operations/__init__.py` | Cleanup (no public exports from the operations package) |

API Design

All methods live under client.dataframe and delegate to the existing client.records.* methods:

| Method | Input | Output | Underlying API |
| --- | --- | --- | --- |
| `get(table, ...)` | OData query params | `pd.DataFrame` (all pages consolidated) | `records.get()` |
| `get(table, record_id=...)` | Single GUID | 1-row `pd.DataFrame` | `records.get()` |
| `create(table, df)` | `pd.DataFrame` of records | `pd.Series` of GUIDs | `records.create()` → `CreateMultiple` |
| `update(table, df, id_column)` | `pd.DataFrame` with ID column | `None` | `records.update()` → `UpdateMultiple` |
| `delete(table, ids)` | `pd.Series` of GUIDs | `Optional[str]` (job ID) | `records.delete()` → `BulkDelete` |

Key Design Decisions

- **`clear_nulls` parameter on `update()`**: By default (`False`), NaN/None values are skipped and the field is left unchanged on the server. Set to `True` to explicitly send `null` and clear fields.
- **NumPy scalar normalization**: `np.int64` → `int`, `np.float64` → `float`, `pd.Timestamp` → ISO string. Prevents JSON serialization failures.
- **pandas is a required dependency**: Discussed internally -- the DataFrame feature is a core part of the SDK's value proposition.
- **No client-side batch size limit**: The Dataverse server enforces its own limits on `CreateMultiple`/`UpdateMultiple`. Docstrings recommend splitting very large DataFrames into smaller batches.
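
To make the `clear_nulls` and normalization semantics above concrete, here is a standalone sketch -- `to_records` is a hypothetical stand-in, not the SDK's actual helper -- of how missing values and NumPy scalars could be treated:

```python
import numpy as np
import pandas as pd

def to_records(df: pd.DataFrame, na_as_null: bool = False) -> list:
    """Convert rows to JSON-safe dicts; skip missing values unless na_as_null."""
    records = []
    for row in df.to_dict(orient="records"):
        clean = {}
        for k, v in row.items():
            if not pd.api.types.is_scalar(v):
                clean[k] = v  # lists/dicts pass through unchanged
            elif pd.notna(v):
                # Normalize NumPy/pandas scalars to JSON-serializable natives
                if isinstance(v, pd.Timestamp):
                    clean[k] = v.isoformat()
                elif isinstance(v, np.integer):
                    clean[k] = int(v)
                elif isinstance(v, np.floating):
                    clean[k] = float(v)
                elif isinstance(v, np.bool_):
                    clean[k] = bool(v)
                else:
                    clean[k] = v
            elif na_as_null:
                clean[k] = None  # explicitly send null to clear the field
        records.append(clean)
    return records

df = pd.DataFrame([{"name": "Acme", "telephone1": None, "count": np.int64(5)}])
print(to_records(df))                   # missing value skipped: field left unchanged
print(to_records(df, na_as_null=True))  # None included: field cleared on server
```

The default (skip) maps to "leave the field alone", while `na_as_null=True` maps to `clear_nulls=True` on `update()`.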

Validation

- 90 new unit tests across 3 test files (all pass)
- Full test suite: 375 tests pass, 0 failures
- mypy: clean (no type errors)
- black: clean (no formatting issues)

@zhaodongwang-msft zhaodongwang-msft requested a review from a team as a code owner February 11, 2026 18:32
Copilot AI review requested due to automatic review settings February 11, 2026 18:32

Copilot AI left a comment


Pull request overview

Adds pandas DataFrame/Series wrappers to the Dataverse Python SDK so callers can perform CRUD operations using DataFrame-native inputs/outputs, plus accompanying docs, examples, and tests.

Changes:

- Added `DataverseClient` DataFrame CRUD wrapper methods: `get_dataframe`, `create_dataframe`, `update_dataframe`, `delete_dataframe`.
- Added unit tests and an end-to-end example demonstrating DataFrame CRUD workflows.
- Updated docs/README and added pandas as a dependency.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| `src/PowerPlatform/Dataverse/client.py` | Implements DataFrame CRUD wrapper methods on `DataverseClient`. |
| `tests/unit/test_client_dataframe.py` | Adds unit coverage for DataFrame CRUD wrappers. |
| `examples/advanced/dataframe_operations.py` | Adds a walkthrough script showing DataFrame CRUD usage. |
| `pyproject.toml` | Adds pandas to project dependencies. |
| `README.md` | Documents DataFrame CRUD usage examples. |
| `src/PowerPlatform/Dataverse/claude_skill/dataverse-sdk-use/SKILL.md` | Documents DataFrame CRUD usage in the packaged skill doc. |
| `.claude/skills/dataverse-sdk-use/SKILL.md` | Documents DataFrame CRUD usage in the repo-local skill doc. |
| `.gitignore` | Ignores additional Claude local markdown files. |



Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 6 comments.




Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 3 comments.




Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 8 comments.




Copilot AI commented Feb 12, 2026

@zhaodongwang-msft I've opened a new pull request, #99, to work on those changes. Once the pull request is ready, I'll request review from you.


Copilot AI commented Feb 12, 2026

@zhaodongwang-msft I've opened a new pull request, #100, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits February 11, 2026 18:52
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 5 comments.



@tpellissier-msft

We'll want to track two followups from this that depend on other refactor PRs, so we can't do them quite yet:

  1. Moving the new client methods to a `client.dataframe` namespace.
  2. Adding `QueryBuilder.to_dataframe()` to connect the DataFrame scenario to the fluent syntax / QueryBuilder changes.

…lete input validation, export DataFrameOperations (#145)

Addresses four unresolved review comments from PR #98 against the
`client.dataframe` namespace: a crash on array-valued cells, silent
NumPy serialization failures, missing ID validation in `update()` and
`delete()`, and missing exports/tests.

## `utils/_pandas.py`

- **Fix `pd.notna()` crash on array-like cells**: Guard with
`pd.api.types.is_scalar(v)` before calling `pd.notna()`; non-scalar
values (lists, dicts, numpy arrays) pass through directly. Previously
raised `ValueError: The truth value of an array is ambiguous`.
- **Normalize NumPy scalar types**: New `_normalize_scalar(v)` helper
converts `np.integer` → `int`, `np.floating` → `float`, `np.bool_` →
`bool`, `pd.Timestamp` → ISO string. DataFrames with integer columns
produce `np.int64` by default, which `json.dumps()` cannot serialize.

```python
# Before: would crash or produce non-serializable values
df = pd.DataFrame([{"tags": ["a", "b"]}, {"count": np.int64(5)}])
dataframe_to_records(df)  # ValueError / TypeError at serialization time

# After: returns JSON-serializable records
dataframe_to_records(df)
# → [{"tags": ["a", "b"]}, {"count": 5}]
```

## `operations/dataframe.py`

- **`update()` — validate `id_column` values**: After extracting IDs,
raises `ValueError` listing offending row indices if any value is not a
non-empty string (catches `NaN`, `None`, numeric IDs).
- **`update()` — validate non-empty change columns**: Raises
`ValueError` if the DataFrame contains only the `id_column` and no
fields to update.
- **`delete()` — validate `ids` Series**: Returns `None` immediately for
an empty Series; raises `ValueError` listing offending indices for any
non-string or blank value.

## `operations/__init__.py`

- Exports `DataFrameOperations` so consumers can use it for type
annotations.

## Tests

- `tests/unit/test_pandas_helpers.py` — 11 isolated tests for
`dataframe_to_records()` covering NaN handling, NumPy type
normalization, Timestamp conversion, list/dict passthrough, and empty
input.
- `tests/unit/test_dataframe_operations.py` — 35 tests covering the full
`DataFrameOperations` namespace, including all new validation paths.

<!-- START COPILOT ORIGINAL PROMPT -->



<details>

<summary>Original prompt</summary>


## Context

This PR addresses unresolved review comments from PR #98 ("add dataframe
methods") and adds comprehensive test coverage for the DataFrame
operations namespace (`client.dataframe`).

The base branch is `users/zhaodongwang/dataFrameExtensionClaude` which
contains the current state of the DataFrame operations code from PR #98.

## Files to modify

### 1. `src/PowerPlatform/Dataverse/utils/_pandas.py`

Current code at the HEAD of the PR branch
(`8838bb69533dd8830bac8724c44696771a6704e7`):

```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""Internal pandas helpers"""

from __future__ import annotations

from typing import Any, Dict, List

import pandas as pd


def dataframe_to_records(df: pd.DataFrame, na_as_null: bool = False) -> List[Dict[str, Any]]:
    """Convert a DataFrame to a list of dicts, converting Timestamps to ISO strings.

    :param df: Input DataFrame.
    :param na_as_null: When False (default), missing values are omitted from each dict.
        When True, missing values are included as None (sends null to Dataverse, clearing the field).
    """
    records = []
    for row in df.to_dict(orient="records"):
        clean = {}
        for k, v in row.items():
            if pd.notna(v):
                clean[k] = v.isoformat() if isinstance(v, pd.Timestamp) else v
            elif na_as_null:
                clean[k] = None
        records.append(clean)
    return records
```

**Required changes:**

#### Fix A: `pd.notna()` crash on array-like values (unresolved comment
#98 (comment))

`pd.notna(v)` raises `ValueError: The truth value of an array is
ambiguous` when a cell contains a list, dict, numpy array, etc. Fix by
guarding with `pd.api.types.is_scalar(v)`:

```python
for k, v in row.items():
    if pd.api.types.is_scalar(v):
        if pd.notna(v):
            clean[k] = _normalize_scalar(v)
        elif na_as_null:
            clean[k] = None
    else:
        clean[k] = v  # pass through lists, dicts, etc.
```

#### Fix B: NumPy scalar types not normalized (acknowledged but deferred
by author in
#98 (comment))

NumPy scalars (`np.int64`, `np.float64`, `np.bool_`) are NOT
JSON-serializable by default `json.dumps()`. DataFrames with integer
columns produce `np.int64` values. Add a helper function
`_normalize_scalar(v)` that:
- Converts `pd.Timestamp` to `.isoformat()`
- Converts `numpy.integer` types to Python `int`
- Converts `numpy.floating` types to Python `float`
- Converts `numpy.bool_` to Python `bool`
- Passes everything else through

Use `import numpy as np` and `isinstance` checks.

### 2. `src/PowerPlatform/Dataverse/operations/dataframe.py`

Current code at the HEAD of the PR branch:

```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""DataFrame CRUD operations namespace for the Dataverse SDK."""

from __future__ import annotations

from typing import List, Optional, TYPE_CHECKING

import pandas as pd

from ..utils._pandas import dataframe_to_records

if TYPE_CHECKING:
    from ..client import DataverseClient


__all__ = ["DataFrameOperations"]


class DataFrameOperations:
    """Namespace for pandas DataFrame CRUD operations.
    ...
    """

    def __init__(self, client: DataverseClient) -> None:
        self._client = client

    def get(self, table, record_id=None, select=None, filter=None, orderby=None, top=None, expand=None, page_size=None) -> pd.DataFrame:
        # ... (current code)
        pass

    def create(self, table, records) -> pd.Series:
        # ... (current code with empty DataFrame check and ID count validation)
        pass

    def update(self, table, changes, id_column, clear_nulls=False) -> None:
        if not isinstance(changes, pd.DataFrame):
            raise TypeError("changes must be a pandas DataFrame")
        if id_column not in changes.columns:
            raise ValueError(f"id_column '{id_column}' not found in DataFrame columns")

        ids = changes[id_column].tolist()
        change_columns = [column for column in changes.columns if column != id_column]
        change_list = dataframe_to_records(changes[change_columns], na_as_null=clear_nulls)

        if len(ids) == 1:
            self._client.records.update(table, ids[0], change_list[0])
        else:
            self._client.records.update(table, ids, change_list)

    def delete(self, table, ids, use_bulk_delete=True) -> Optional[str]:
        if not isinstance(ids, pd.Series):
            raise TypeError("ids must be a pandas Series")

        id_list = ids.tolist()
        if len(id_list) == 1:
            return self._client.records.delete(table, id_list[0])
        else:
            return self._client.records.delete(table, id_list, use_bulk_delete=use_bulk_delete)
```

**Required changes:**

#### Fix C: Validate `id_column` values in...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

*This pull request was created from Copilot chat.*


---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: saurabhrb <32964911+saurabhrb@users.noreply.github.com>
Copilot AI and others added 2 commits March 16, 2026 19:29
…_scalar` (#146)

Adds test coverage gaps identified in the PR #98 review: direct tests
for `_normalize_scalar()` and an end-to-end mocked CRUD flow for
`DataFrameOperations`.

## `tests/unit/test_pandas_helpers.py`
- New `TestNormalizeScalar` class (9 tests) directly exercising `_normalize_scalar()`:
  - NumPy types (`np.integer`, `np.floating`, `np.bool_`) → Python natives
  - `pd.Timestamp` → ISO 8601 string
  - Native Python types and `None` pass through unchanged

## `tests/unit/test_dataframe_operations.py`
- New `TestDataFrameEndToEnd` class (2 tests):
  - Full mocked CRUD cycle: `create → get → update → delete`
  - Verifies NumPy types are normalized to Python-native values before reaching the API layer

## Notes
- `filter` parameter kept as-is (consistent with `records.get()` API;
repo convention prohibits `# noqa` suppression)
- `DataFrameOperations` not re-exported from top-level `__init__.py`
(repo convention: package `__init__.py` files use `__all__ = []`)

<!-- START COPILOT ORIGINAL PROMPT -->



<details>

<summary>Original prompt</summary>

## Context

This PR addresses the remaining unresolved review comments from PR #98
(#98)
and adds comprehensive unit tests for the DataFrame operations.

The PR #98 adds DataFrame CRUD wrappers (`client.dataframe.get()`,
`client.dataframe.create()`, `client.dataframe.update()`,
`client.dataframe.delete()`) to the Dataverse Python SDK. The author has
addressed many review comments but several remain unresolved.

## Current State of the Code

The branch `users/zhaodongwang/dataFrameExtensionClaude` has the latest
code. Key files:

### `src/PowerPlatform/Dataverse/utils/_pandas.py` (current)
```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""Internal pandas helpers"""

from __future__ import annotations

from typing import Any, Dict, List

import numpy as np
import pandas as pd


def _normalize_scalar(v: Any) -> Any:
    """Convert numpy scalar types to their Python native equivalents."""
    if isinstance(v, pd.Timestamp):
        return v.isoformat()
    if isinstance(v, np.integer):
        return int(v)
    if isinstance(v, np.floating):
        return float(v)
    if isinstance(v, np.bool_):
        return bool(v)
    return v


def dataframe_to_records(df: pd.DataFrame, na_as_null: bool = False) -> List[Dict[str, Any]]:
    """Convert a DataFrame to a list of dicts, normalizing values for JSON serialization."""
    records = []
    for row in df.to_dict(orient="records"):
        clean = {}
        for k, v in row.items():
            if pd.api.types.is_scalar(v):
                if pd.notna(v):
                    clean[k] = _normalize_scalar(v)
                elif na_as_null:
                    clean[k] = None
            else:
                clean[k] = v
        records.append(clean)
    return records
```

### `src/PowerPlatform/Dataverse/operations/dataframe.py` (current - 305
lines)
The `DataFrameOperations` class provides get/create/update/delete
methods. Key points:
- `get()` returns a single consolidated DataFrame (iterates all pages
internally)
- `create()` validates non-empty, validates ID count matches
- `update()` validates id_column exists, validates IDs are non-empty
strings, validates at least one change column exists; has `clear_nulls`
parameter
- `delete()` validates ids is Series, validates IDs are non-empty
strings, special-cases single ID

### `src/PowerPlatform/Dataverse/operations/__init__.py` (current)
```python
from .dataframe import DataFrameOperations
__all__ = ["DataFrameOperations"]
```

### `src/PowerPlatform/Dataverse/__init__.py` (current)
```python
from importlib.metadata import version
__version__ = version("PowerPlatform-Dataverse-Client")
__all__ = ["__version__"]
```

### `src/PowerPlatform/Dataverse/client.py` (current)
Already imports and exposes `DataFrameOperations` as `self.dataframe`.

## Issues to Fix

### 1. `filter` parameter shadows Python built-in (item #8)
In `dataframe.py` `get()` method, the parameter `filter` shadows the
Python built-in `filter()`. Since this mirrors the existing
`records.get()` API which also uses `filter`, renaming is risky for API
consistency. The safe fix is to add a `# noqa: A002` comment on the
parameter and leave it as-is for API consistency (the base
`records.get()` already uses `filter`). Alternatively, rename to
`filter_expr` with an alias for backward compatibility. **Decision: keep
`filter` for API consistency with existing `records.get()`, but suppress
the lint warning.**

### 2. Missing `__init__.py` export for `DataFrameOperations` (item #9)
The `operations/__init__.py` already exports `DataFrameOperations`.
However, the top-level `src/PowerPlatform/Dataverse/__init__.py` does
NOT export it. Add the export there so users can do `from
PowerPlatform.Dataverse import DataFrameOperations` if needed.

### 3. Comprehensive unit tests (item #10)
The existing `tests/unit/test_client_dataframe.py` has 365 lines of
tests. We need to add MORE tests to ensure full coverage. Specifically
add tests for:

**Unit tests for `_pandas.py` helpers:**
- `_normalize_scalar` with np.int64, np.float64, np.bool_, pd.Timestamp,
regular Python types
- `dataframe_to_records` with NaN handling (na_as_null=True vs False)
- `dataframe_to_records` with Timestamp conversion
- `dataframe_to_records` with non-scalar values (lists, dicts in cells)
- `dataframe_to_records` with numpy scalar types in DataFrame
- `dataframe_to_records` with empty DataFrame
- `dataframe_to_records` with mixed types

**Unit tests for `DataFrameOperations`:**
- `get()` single record
- `get()` multi-page results concatenated
- `get()` empty results
- `get()` with all parameters passed through
- `create()` with valid DataFrame
- `create()` with empty DataFrame (should raise ValueError)
- `create()` with non-DataFrame input (should raise TypeError)
- `create()` ID count mismatch (should raise ValueError)
- `update()` with valid DataFrame
- `update()` single record path
- `...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

*This pull request was created from Copilot chat.*


---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: saurabhrb <32964911+saurabhrb@users.noreply.github.com>
saurabhrb changed the title from "add dataframe methods" to "Add client.dataframe namespace for pandas DataFrame CRUD operations" on Mar 17, 2026
@saurabhrb

@tpellissier-msft update on the two followups:

  1. Moving to the `client.dataframe` namespace -- Done (commit a22832e). All DataFrame methods now live in `DataFrameOperations`, accessed via `client.dataframe.get()`, `client.dataframe.create()`, `client.dataframe.update()`, and `client.dataframe.delete()`.

  2. `QueryBuilder.to_dataframe()` -- Still a followup. This depends on the QueryBuilder/fluent-syntax refactor. Filed as a tracking issue to pick up once that refactor lands.
