Seeking Schema/Sample JSONs for Copilot Usage Metrics Exports: Documentation Gap on Data Hierarchy #183055
-
Select Topic Area: Question
Copilot Feature Area: General

Hi everyone,

I am currently analyzing the GitHub Copilot Usage Metrics API (version 2022-11-28) to assess data quality and attribution depth. Because this API provides download links to JSON files rather than direct responses, there is little documentation on the actual data hierarchy within the exports. The documentation lists the exported keys, but the nesting logic and the specific data types (especially for breakdowns such as IDEs, features, or models) are not transparent.

My questions: if no mocks or samples are available, I'd like to understand the practical path to success for this analysis. I am trying to understand the granularity and internal structure of the data before moving into technical implementation. Any shared samples, community-maintained mocks, or technical insights would be greatly appreciated.

Best regards
Replies: 3 comments 1 reply
-
Hi Onur,

This is a common pain point. Because the Copilot Usage Metrics API follows an asynchronous pattern (requesting the endpoint returns a 200 with the raw data, or occasionally a download link, depending on the API version and data volume), the schema is often opaque until you actually have data flowing. Here is the breakdown of the JSON structure, the schema-validation options, and the subscription requirements you asked for.
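As a concrete starting point, here is a minimal sketch of constructing that request with Python's standard library. The organization name and token are placeholders to substitute with your own; the endpoint path and version header follow the documentation referenced in this thread.

```python
import urllib.request

# Placeholders -- substitute your own organization and a token with the
# appropriate scope (organization owners only, per the discussion below).
ORG = "my-org"
TOKEN = "ghp_example_token"

url = f"https://api.github.com/orgs/{ORG}/copilot/usage"
req = urllib.request.Request(url, headers={
    "Accept": "application/vnd.github+json",
    "Authorization": f"Bearer {TOKEN}",
    "X-GitHub-Api-Version": "2022-11-28",  # pin the REST API version
})

print(req.full_url)

# Actually sending the request requires a real token and org:
# with urllib.request.urlopen(req) as resp:
#     data = json.load(resp)  # an array of per-day metric objects
```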
The data is typically returned as an array of objects, where each object represents a single day of aggregated metrics for the organization or enterprise. Note the nesting of the breakdown field, which contains the specific language/editor combinations.

Key data nuances:
- Privacy aggregation: if the count for a specific breakdown (e.g., "Ruby in Vim") is too low, GitHub may exclude or group it to protect user privacy, though this usually happens at the user-attribution level rather than the org-wide aggregate level.
- API versioning: 2022-11-28 is the standard REST API version, but specific fields (like total_chat_turns) were added incrementally. Ensure your parser handles optional fields and nulls.
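To make that structure concrete, here is a dummy, illustrative sample of one day's org-level metrics object. The field names follow those mentioned in this thread and may not match a real export exactly; treat them as an assumption until you can compare against live data.

```python
import json

# Dummy sample (values invented; field names are assumptions from this thread).
sample = json.loads("""
[
  {
    "day": "2024-06-24",
    "total_suggestions_count": 1200,
    "total_acceptances_count": 480,
    "total_lines_suggested": 3100,
    "total_lines_accepted": 1150,
    "total_active_users": 14,
    "total_chat_turns": 95,
    "total_active_chat_users": 6,
    "breakdown": [
      {
        "language": "python",
        "editor": "vscode",
        "suggestions_count": 800,
        "acceptances_count": 330,
        "lines_suggested": 2000,
        "lines_accepted": 700,
        "active_users": 9
      }
    ]
  }
]
""")

# Each top-level element is one day; breakdown holds language/editor slices.
for day in sample:
    print(day["day"], len(day["breakdown"]))
```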
Since GitHub provides no sandbox or mock server for this specific payload:
- OpenAPI spec: GitHub publishes its OpenAPI descriptions in the github/rest-api-description repository. For the Copilot usage endpoint, however, the spec describes the response structure loosely; it does not strictly define every possible permutation of the breakdown object in a way that would generate a perfect mock automatically.
- Recommendation: use a representative JSON sample to generate a JSON Schema (with a tool like quicktype.io or an online JSON-to-Schema converter). You can then use that schema to validate your ingestion pipeline without needing a live API connection.
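In the same spirit, here is a minimal hand-rolled validation sketch that stands in for a generated JSON Schema: it checks the shape of a per-day object before it enters an ingestion pipeline. The required-field list is an assumption drawn from this thread, not an official contract.

```python
# Required fields and expected types for one per-day object (assumed, not
# an official contract); null values are tolerated per the nuances above.
REQUIRED_DAY_FIELDS = {
    "day": str,
    "total_suggestions_count": int,
    "total_acceptances_count": int,
    "breakdown": list,
}

def validate_day(obj: dict) -> list:
    """Return a list of problems; an empty list means the object passed."""
    problems = []
    for field, expected in REQUIRED_DAY_FIELDS.items():
        if field not in obj:
            problems.append(f"missing field: {field}")
        elif obj[field] is not None and not isinstance(obj[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")
    return problems

print(validate_day({"day": "2024-06-24", "total_suggestions_count": 10,
                    "total_acceptances_count": 4, "breakdown": []}))
```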
To answer your path-to-success questions:

Q: Is a full GitHub Enterprise account strictly required?
There are two levels of the API:
- Organization usage: GET /orgs/{org}/copilot/usage. Requires a GitHub Copilot Business (GCB) subscription; accessible to organization owners. This gives you the metrics for that specific organization.
- Enterprise usage: GET /enterprises/{enterprise}/copilot/usage. Requires GitHub Copilot Enterprise (or GCB rolled up to an enterprise account); accessible to enterprise owners.

Q: Are there developer programs/eval licenses?
The practical path to success (lowest cost):
1. Create a GitHub organization (the organization itself is on the free tier).
2. Enable Copilot Business for that organization, priced at $19/user/month.
3. Add one user (yourself) to the Copilot seat list.
4. Wait 24-48 hours. The usage-metrics pipeline is not real-time; it is a daily batch process, and you need to generate some activity (code in VS Code, use Chat) to populate the data.
5. Call the API: you can now query /orgs/{your-org}/copilot/usage to get the real JSON export.

This approach allows you to build and test your implementation for roughly $19 (one month of one seat), rather than requiring a full Enterprise contract.

Summary of keys for parsing. When building your parser, ensure you account for:
- total_suggestions_count vs. total_acceptances_count (the acceptance-rate calculation).
- breakdown is an array that mixes language and editor.
- Fields like total_chat_turns are newer and may appear as 0 or null if you request historical data from older date ranges.

Hope this helps you move forward with your implementation!
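The acceptance-rate calculation mentioned above needs to tolerate the nulls and zeros that older date ranges can return. A small sketch, assuming the field names discussed in this thread:

```python
def acceptance_rate(day: dict):
    """Acceptance rate for one day's aggregate, tolerating missing,
    null, or zero values (field names assumed from this thread)."""
    suggestions = day.get("total_suggestions_count") or 0
    acceptances = day.get("total_acceptances_count") or 0
    if suggestions == 0:
        return None  # no denominator: rate is undefined, not 0%
    return acceptances / suggestions

print(acceptance_rate({"total_suggestions_count": 200,
                       "total_acceptances_count": 50}))   # 0.25
print(acceptance_rate({"total_suggestions_count": None,
                       "total_acceptances_count": None}))  # None
```

Returning None rather than 0.0 for empty days keeps "no data" distinguishable from "nothing was accepted" in downstream reporting.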
-
Adding to this: the documented reference at https://docs.github.com/en/copilot/reference/copilot-usage-metrics/copilot-usage-metrics is missing large portions of the schema returned in the user-level usage data reports. In fact, I can find no dedicated documentation or description of all the returned fields anywhere. This has become a major issue as we attempt to integrate these APIs into our processes.
-
https://docs.github.com/en/enterprise-cloud@latest/copilot/reference/copilot-usage-metrics/example-schema