Conversation
matt-aitken
commented
Mar 5, 2026
- Added versions filtering on the Errors list and page
- Added errors stacked bars to the graph on the individual error page
|
WalkthroughAdds version-aware error grouping and alerting across frontend, backend, and DB. New UI: LogsVersionFilter and exported VersionsDropdown, status filters, chart legend styling, and list/status badges. Backend presenters (ErrorGroupPresenter, ErrorsListPresenter) now accept and return versions and state; introduces ErrorGroupState, ErrorGroupActions service, deliver/evaluate alert services (DeliverErrorGroupAlertService, ErrorAlertEvaluator), alerts worker tasks, and email template for error alerts. ClickHouse query builders for per-version aggregations and "since" queries were added. Prisma schema and SQL migrations create ErrorGroupState and add project alert config fields. Estimated code review effort🎯 5 (Critical) | ⏱️ ~150 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can use TruffleHog to scan for secrets in your code with verification capabilities.Add a TruffleHog config file (e.g. trufflehog-config.yml, trufflehog.yml) to your project to customize detectors and scanning behavior. The tool runs only when a config file is present. |
There was a problem hiding this comment.
🚩 Errors list sparklines don't reflect the version filter
When a user filters errors by version on the list page (errors._index/route.tsx), the occurrencesPromise at line 120-134 chains from the filtered listPromise but calls presenter.getOccurrences() without passing the versions filter. This means the sparkline activity graphs shown in each error row will display occurrences across ALL versions, even when the list itself is filtered to specific versions. This may be intentional (showing full activity context) but could confuse users who expect the sparklines to match the filtered view.
(Refers to lines 120-134)
Was this helpful? React with 👍 or 👎 to provide feedback.
| const data = buckets.map((epoch) => { | ||
| const point: Record<string, number | Date> = { date: new Date(epoch * 1000) }; | ||
| for (const version of sortedVersions) { | ||
| point[version] = byBucketVersion.get(`${epoch}:${version}`) ?? 0; | ||
| } | ||
| return point; | ||
| }); |
There was a problem hiding this comment.
🟡 Version string used as Record key can collide with reserved date key, breaking the chart
In getOccurrences, version strings are used as dynamic keys in the same Record<string, number | Date> that already has a date key (apps/webapp/app/presenters/v3/ErrorGroupPresenter.server.ts:214-217). If a task_version value is literally "date" (or empty-string mapped to "unknown" which is safe, but "date" is not guarded), the version count (a number) overwrites the date field (a Date). Downstream in ActivityChart (apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.errors.$fingerprint/route.tsx:428), d.date instanceof Date would be false and new Date(smallNumber).getTime() would produce a nonsensical timestamp (epoch 1970), completely breaking the chart's x-axis. The same collision can occur with "__timestamp" added at route.tsx:427.
Prompt for agents
In apps/webapp/app/presenters/v3/ErrorGroupPresenter.server.ts, the getOccurrences method (lines 213-219) uses version strings as keys in a Record that also has a "date" key. To prevent collisions, either:
1. Prefix version keys with a safe namespace (e.g., "v:" prefix) and update the chart consumer to strip the prefix when displaying labels, OR
2. Restructure the data to use a separate nested object for version counts, e.g.:
{ date: Date, versions: Record<string, number> }
and update the ChartRoot consumer in the fingerprint route (route.tsx lines 475-496) to match.
Also in apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.errors.$fingerprint/route.tsx line 427, the __timestamp key added via spread could similarly collide if a version were named "__timestamp". Guard against both reserved keys ("date" and "__timestamp").
Was this helpful? React with 👍 or 👎 to provide feedback.
cd9b5ea to
e609e8b
Compare
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.alerts/route.tsx (1)
560-574:⚠️ Potential issue | 🔴 CriticalMissing
ERROR_GROUPcase inalertTypeTitlewill cause runtime error.The
ProjectAlertTypeenum now includesERROR_GROUP(added in this PR's schema changes), butalertTypeTitledoesn't handle it. When an alert channel withERROR_GROUPtype is displayed in the table (line 237), this will throw an error.🐛 Proposed fix
export function alertTypeTitle(alertType: ProjectAlertType): string { switch (alertType) { case "TASK_RUN": return "Task run failure"; case "TASK_RUN_ATTEMPT": return "Task attempt failure"; case "DEPLOYMENT_FAILURE": return "Deployment failure"; case "DEPLOYMENT_SUCCESS": return "Deployment success"; + case "ERROR_GROUP": + return "Error group"; default: { throw new Error(`Unknown alertType: ${alertType}`); } } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/webapp/app/routes/_app.orgs`.$organizationSlug.projects.$projectParam.env.$envParam.alerts/route.tsx around lines 560 - 574, The switch in alertTypeTitle(ProjectAlertType) is missing a handler for the new "ERROR_GROUP" enum value and will throw for that case; update the alertTypeTitle function to add a case for "ERROR_GROUP" (e.g., case "ERROR_GROUP": return "Error group";) so all ProjectAlertType variants are handled and the default throw is avoided when ERROR_GROUP alerts are rendered (used elsewhere where alerts are displayed).apps/webapp/app/presenters/v3/ErrorGroupPresenter.server.ts (1)
64-85:⚠️ Potential issue | 🔴 Critical
getSummary()declaresErrorGroupSummaryas return type but returns an incomplete object withoutstate.The
getSummary()method at line 249 declares a return type ofPromise<ErrorGroupSummary | undefined>, whereErrorGroupSummary.stateis required. However, the method returns an object (lines 276–285) that omits thestatefield entirely. While thecall()method does assignstateafter retrieval (line 149), this violates the type contract ofgetSummary(). Either makestateoptional in the type definition, includestateingetSummary()'s return object, or updategetSummary()'s return type to reflect the incomplete object it actually returns.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/webapp/app/presenters/v3/ErrorGroupPresenter.server.ts` around lines 64 - 85, getSummary() claims to return ErrorGroupSummary but omits the required state field; update getSummary() to include a state property in its returned object (either populated from the same data source used by call() or a sensible default with fields like status, resolvedAt, resolvedInVersion, resolvedBy, ignoredUntil, ignoredReason, ignoredByUserId, ignoredUntilOccurrenceRate, ignoredUntilTotalOccurrences) so the returned object matches the ErrorGroupSummary type, and remove any subsequent reassignment in call() that assumes state is missing.
🧹 Nitpick comments (4)
internal-packages/clickhouse/src/index.ts (1)
248-263: Add@crumbstracing annotations for the new error-query paths.Please add crumb comments around the newly added
errorsquery builder exposure to align with repository tracing standards.As per coding guidelines,
**/*.{ts,tsx,js}: “Add crumbs as you write code using //@crumbscomments or //#region@crumbsblocks for agentcrumbs debug tracing”.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal-packages/clickhouse/src/index.ts` around lines 248 - 263, The new errors getter (get errors) that exposes multiple query-builder helpers (getGroups, getInstances, getHourlyOccurrences, affectedVersionsQueryBuilder, listQueryBuilder, occurrencesListQueryBuilder, createOccurrencesQueryBuilder, createOccurrencesByVersionQueryBuilder, occurrenceCountSinceQueryBuilder, activeErrorsSinceQueryBuilder, occurrenceCountsSinceQueryBuilder) needs repository-standard tracing crumbs; wrap the errors block with // `@crumbs` comments (or a // `#region` `@crumbs` ... // `#endregion` `@crumbs` block) immediately before and after the getter so agentcrumbs can trace calls to these functions, ensuring the comments surround the get errors { ... } block and reference the same function names for clarity.apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.errors.$fingerprint/route.tsx (1)
630-632: Consider a more reliable approach for closing the dialog after submission.Using
setTimeout(onClose, 100)is fragile—if the submission takes longer than 100ms to start, the dialog may close before the request begins. Consider using the fetcher's state to close the dialog when submission completes:♻️ Suggested alternative
// In CustomIgnoreForm, use useEffect to close on success: import { useEffect } from "react"; // ... useEffect(() => { if (fetcher.state === "idle" && fetcher.data?.ok) { onClose(); } }, [fetcher.state, fetcher.data, onClose]); // Remove setTimeout from onSubmitOr simply close immediately on submit and let the optimistic UI handle it if that's the intended behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/webapp/app/routes/_app.orgs`.$organizationSlug.projects.$projectParam.env.$envParam.errors.$fingerprint/route.tsx around lines 630 - 632, The current onSubmit handler uses setTimeout(onClose, 100) which is fragile; remove the setTimeout call in the onSubmit prop of CustomIgnoreForm and instead close the dialog when the fetcher completes by adding a useEffect in the CustomIgnoreForm component that watches fetcher.state and fetcher.data (e.g., useEffect(() => { if (fetcher.state === "idle" && fetcher.data?.ok) onClose(); }, [fetcher.state, fetcher.data, onClose])); alternatively, if you intend optimistic UI, call onClose() immediately in onSubmit and omit the timeout.apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts (1)
30-46: Prefer a type alias for the payload shape.This new model should use
typeinstead ofinterfaceto stay aligned with the repo's TypeScript rules.As per coding guidelines,
**/*.{ts,tsx}: Use types over interfaces for TypeScript.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts` around lines 30 - 46, Replace the interface declaration ErrorAlertPayload with a type alias using the same shape (e.g., type ErrorAlertPayload = { ... }); keep the property names and nested error object unchanged (including channelId, projectId, classification, and the error fields like fingerprint, environmentId, environmentName, taskIdentifier, errorType, errorMessage, sampleStackTrace, firstSeen, lastSeen, occurrenceCount) so all references to ErrorAlertPayload and ErrorAlertClassification continue to work; update any imports/exports if necessary to reflect the type alias.apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts (1)
16-26: Prefer type aliases for these internal models.
AlertableErrorandResolvedEnvironmentshould usetypealiases here to match the repo's TypeScript convention.As per coding guidelines,
**/*.{ts,tsx}: Use types over interfaces for TypeScript.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts` around lines 16 - 26, Replace the two internal interfaces with type aliases to match repo convention: convert interface AlertableError and interface ResolvedEnvironment into type AlertableError = { ... } and type ResolvedEnvironment = { ... }, keeping the exact same property names and types (classification, error, environmentName) and (id, type, displayName) respectively; update any local references if needed but do not change the shape or visibility.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/webapp/app/presenters/v3/ErrorGroupPresenter.server.ts`:
- Around line 130-145: getState is being called without the task identifier so
it can return the wrong ErrorGroupState when the same fingerprint exists under
multiple tasks; stop fetching summary and state in parallel and instead first
await this.getSummary(...) to read the taskIdentifier, then call
this.getState(environmentId, taskIdentifier, fingerprint) (or otherwise pass the
task identifier into getState) when you run the remaining Promise.all for
getAffectedVersions, getRunList and getState so state lookup is disambiguated by
taskIdentifier.
In `@apps/webapp/app/v3/services/alerts/createAlertChannel.server.ts`:
- Around line 104-112: The helper `#scheduleErrorAlertEvaluation` currently
enqueues a job using the same id pattern "evaluateErrorAlerts:${projectId}"
that's used by the evaluator's delayed self-chain, causing deduplication/no-op
if a future run is already queued; change the id for this immediate kick to be
unique (for example append a timestamp or a "now" suffix like
"evaluateErrorAlerts:${projectId}:now:${Date.now()}") so alertsWorker.enqueue
always creates a distinct job when triggered immediately after channel
creation/edit; keep the job name "v3.evaluateErrorAlerts" and payload identical,
only alter the id generation logic in `#scheduleErrorAlertEvaluation`.
In `@apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts`:
- Around line 239-243: The code logs and throws the raw webhook URL
(webhookProperties.data.url) from the channel object (type
ProjectAlertChannelType), which can leak embedded credentials; update
deliverErrorGroupAlert.server.ts to sanitize/redact webhookProperties.data.url
before any logging or including it in thrown errors (e.g., replace
host/path/query credentials with "[REDACTED]" or strip auth/query params) and
ensure any thrown Error messages or processLogger entries use the redacted
value; apply the same change for the other occurrences around the webhook
handling block (also referenced at the second occurrence near lines 310-316) so
no raw webhook URL is emitted.
In `@apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts`:
- Around line 178-197: The total-occurrence unignore check uses
context.occurrencesSince (window-only) so a threshold like
ignoredUntilTotalOccurrences won’t fire across multiple evaluation windows;
change the logic to compute the cumulative occurrences since the alert was
ignored and compare that to state.ignoredUntilTotalOccurrences. Specifically,
replace the context.occurrencesSince check with a call that gets counts since
state.ignoredAt (or otherwise compute occurrences since state.ignoredAt using
getOccurrenceCountsSince/your occurrence-count helper) and use that cumulative
count in the if that references state.ignoredUntilTotalOccurrences and
state.ignoredAt.
- Around line 102-125: The loop enqueues alerts to every envChannels entry on
every evaluator run, ignoring per-channel intervals; inside the for (const
channel of envChannels) loop (before calling alertsWorker.enqueue) add a guard
that respects the channel's configured interval by comparing it to the
evaluator's current run interval (e.g., currentRunIntervalMinutes or
evaluationIntervalMinutes): only enqueue when the current run aligns with the
channel interval (for example channel.intervalMinutes ===
currentRunIntervalMinutes or channel.intervalMinutes % currentRunIntervalMinutes
=== 0), otherwise continue; update any related loops mentioned (the other
occurrences around the same pattern at the referenced ranges) to use the same
interval-check guard.
- Around line 30-35: The constructor for ErrorAlertEvaluator has its _prisma and
_replica defaults reversed causing writes (e.g., updateErrorGroupStates) to go
to the read-only client; swap the defaults so _prisma defaults to prisma and
_replica defaults to $replica. Update the constructor parameter
order/assignments for ErrorAlertEvaluator accordingly so any write operations
use the primary Prisma client (_prisma = prisma) and reads use the replica
(_replica = $replica).
- Around line 372-387: Remove the stray comma-expression "await this, ..." so
the Prisma update is actually awaited: locate the call to
this._prisma.errorGroupState.update (the block that sets status: "UNRESOLVED"
and clears ignored/resolved fields) and change the statement to await the update
promise directly (i.e., await this._prisma.errorGroupState.update(...)) so the
database write completes before continuing (e.g., before calling selfChain()).
In `@internal-packages/emails/emails/alert-error-group.tsx`:
- Line 58: Change the default export of the component function Email to a named
export (use "export function Email(...)" instead of "export default function
Email(...)") and then update the corresponding import in the emails index module
to use a named import (e.g. import { Email } from '...') or rename on import if
needed; also update any usages that relied on the default import to reference
the named export.
---
Outside diff comments:
In `@apps/webapp/app/presenters/v3/ErrorGroupPresenter.server.ts`:
- Around line 64-85: getSummary() claims to return ErrorGroupSummary but omits
the required state field; update getSummary() to include a state property in its
returned object (either populated from the same data source used by call() or a
sensible default with fields like status, resolvedAt, resolvedInVersion,
resolvedBy, ignoredUntil, ignoredReason, ignoredByUserId,
ignoredUntilOccurrenceRate, ignoredUntilTotalOccurrences) so the returned object
matches the ErrorGroupSummary type, and remove any subsequent reassignment in
call() that assumes state is missing.
In
`@apps/webapp/app/routes/_app.orgs`.$organizationSlug.projects.$projectParam.env.$envParam.alerts/route.tsx:
- Around line 560-574: The switch in alertTypeTitle(ProjectAlertType) is missing
a handler for the new "ERROR_GROUP" enum value and will throw for that case;
update the alertTypeTitle function to add a case for "ERROR_GROUP" (e.g., case
"ERROR_GROUP": return "Error group";) so all ProjectAlertType variants are
handled and the default throw is avoided when ERROR_GROUP alerts are rendered
(used elsewhere where alerts are displayed).
---
Nitpick comments:
In
`@apps/webapp/app/routes/_app.orgs`.$organizationSlug.projects.$projectParam.env.$envParam.errors.$fingerprint/route.tsx:
- Around line 630-632: The current onSubmit handler uses setTimeout(onClose,
100) which is fragile; remove the setTimeout call in the onSubmit prop of
CustomIgnoreForm and instead close the dialog when the fetcher completes by
adding a useEffect in the CustomIgnoreForm component that watches fetcher.state
and fetcher.data (e.g., useEffect(() => { if (fetcher.state === "idle" &&
fetcher.data?.ok) onClose(); }, [fetcher.state, fetcher.data, onClose]));
alternatively, if you intend optimistic UI, call onClose() immediately in
onSubmit and omit the timeout.
In `@apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts`:
- Around line 30-46: Replace the interface declaration ErrorAlertPayload with a
type alias using the same shape (e.g., type ErrorAlertPayload = { ... }); keep
the property names and nested error object unchanged (including channelId,
projectId, classification, and the error fields like fingerprint, environmentId,
environmentName, taskIdentifier, errorType, errorMessage, sampleStackTrace,
firstSeen, lastSeen, occurrenceCount) so all references to ErrorAlertPayload and
ErrorAlertClassification continue to work; update any imports/exports if
necessary to reflect the type alias.
In `@apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts`:
- Around line 16-26: Replace the two internal interfaces with type aliases to
match repo convention: convert interface AlertableError and interface
ResolvedEnvironment into type AlertableError = { ... } and type
ResolvedEnvironment = { ... }, keeping the exact same property names and types
(classification, error, environmentName) and (id, type, displayName)
respectively; update any local references if needed but do not change the shape
or visibility.
In `@internal-packages/clickhouse/src/index.ts`:
- Around line 248-263: The new errors getter (get errors) that exposes multiple
query-builder helpers (getGroups, getInstances, getHourlyOccurrences,
affectedVersionsQueryBuilder, listQueryBuilder, occurrencesListQueryBuilder,
createOccurrencesQueryBuilder, createOccurrencesByVersionQueryBuilder,
occurrenceCountSinceQueryBuilder, activeErrorsSinceQueryBuilder,
occurrenceCountsSinceQueryBuilder) needs repository-standard tracing crumbs;
wrap the errors block with // `@crumbs` comments (or a // `#region` `@crumbs` ... //
`#endregion` `@crumbs` block) immediately before and after the getter so agentcrumbs
can trace calls to these functions, ensuring the comments surround the get
errors { ... } block and reference the same function names for clarity.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: e7e10a3c-d2f2-429c-85bb-c4d210930768
📒 Files selected for processing (21)
apps/webapp/app/components/logs/LogsVersionFilter.tsxapps/webapp/app/components/primitives/charts/ChartRoot.tsxapps/webapp/app/components/runs/v3/RunFilters.tsxapps/webapp/app/models/projectAlert.server.tsapps/webapp/app/presenters/v3/ErrorGroupPresenter.server.tsapps/webapp/app/presenters/v3/ErrorsListPresenter.server.tsapps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.alerts/route.tsxapps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.errors.$fingerprint/route.tsxapps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.errors._index/route.tsxapps/webapp/app/v3/alertsWorker.server.tsapps/webapp/app/v3/services/alerts/createAlertChannel.server.tsapps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.tsapps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.tsapps/webapp/app/v3/services/errorGroupActions.server.tsinternal-packages/clickhouse/src/errors.tsinternal-packages/clickhouse/src/index.tsinternal-packages/database/prisma/migrations/20260306102053_error_group_state/migration.sqlinternal-packages/database/prisma/migrations/20260308181657_add_error_alert_config_to_project_alert_channel/migration.sqlinternal-packages/database/prisma/schema.prismainternal-packages/emails/emails/alert-error-group.tsxinternal-packages/emails/src/index.tsx
🚧 Files skipped from review as they are similar to previous changes (4)
- apps/webapp/app/components/logs/LogsVersionFilter.tsx
- apps/webapp/app/components/runs/v3/RunFilters.tsx
- apps/webapp/app/components/primitives/charts/ChartRoot.tsx
- internal-packages/clickhouse/src/errors.ts
| const [summary, affectedVersions, runList, stateRow] = await Promise.all([ | ||
| this.getSummary(organizationId, projectId, environmentId, fingerprint), | ||
| this.getAffectedVersions(organizationId, projectId, environmentId, fingerprint), | ||
| this.getRunList(organizationId, environmentId, { | ||
| userId, | ||
| projectId, | ||
| fingerprint, | ||
| versions, | ||
| pageSize: runsPageSize, | ||
| from: time.from.getTime(), | ||
| to: time.to.getTime(), | ||
| cursor, | ||
| direction, | ||
| }), | ||
| this.getState(environmentId, fingerprint), | ||
| ]); |
There was a problem hiding this comment.
State lookup needs the task identifier.
ErrorGroupState is keyed by environment + taskIdentifier + fingerprint elsewhere in this PR, but this query only filters by environment and fingerprint. If the same fingerprint exists under multiple tasks, the page can show the wrong resolved/ignored state; fetching summary and stateRow in the same Promise.all also prevents you from disambiguating it.
Also applies to: 318-340
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/webapp/app/presenters/v3/ErrorGroupPresenter.server.ts` around lines 130
- 145, getState is being called without the task identifier so it can return the
wrong ErrorGroupState when the same fingerprint exists under multiple tasks;
stop fetching summary and state in parallel and instead first await
this.getSummary(...) to read the taskIdentifier, then call
this.getState(environmentId, taskIdentifier, fingerprint) (or otherwise pass the
task identifier into getState) when you run the remaining Promise.all for
getAffectedVersions, getRunList and getState so state lookup is disambiguated by
taskIdentifier.
| async #scheduleErrorAlertEvaluation(projectId: string): Promise<void> { | ||
| await alertsWorker.enqueue({ | ||
| id: `evaluateErrorAlerts:${projectId}`, | ||
| job: "v3.evaluateErrorAlerts", | ||
| payload: { | ||
| projectId, | ||
| scheduledAt: Date.now(), | ||
| }, | ||
| }); |
There was a problem hiding this comment.
Don't reuse the periodic job id for the immediate kick.
This helper uses the same evaluateErrorAlerts:${projectId} id that the evaluator's delayed self-chain uses. If a future run is already queued, this enqueue can become a no-op, so newly enabled or edited ERROR_GROUP channels may not be evaluated until the old availableAt.
Suggested fix
async `#scheduleErrorAlertEvaluation`(projectId: string): Promise<void> {
await alertsWorker.enqueue({
- id: `evaluateErrorAlerts:${projectId}`,
+ id: `evaluateErrorAlerts:manual:${projectId}:${Date.now()}`,
job: "v3.evaluateErrorAlerts",
payload: {
projectId,
scheduledAt: Date.now(),
},
});
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async #scheduleErrorAlertEvaluation(projectId: string): Promise<void> { | |
| await alertsWorker.enqueue({ | |
| id: `evaluateErrorAlerts:${projectId}`, | |
| job: "v3.evaluateErrorAlerts", | |
| payload: { | |
| projectId, | |
| scheduledAt: Date.now(), | |
| }, | |
| }); | |
| async `#scheduleErrorAlertEvaluation`(projectId: string): Promise<void> { | |
| await alertsWorker.enqueue({ | |
| id: `evaluateErrorAlerts:manual:${projectId}:${Date.now()}`, | |
| job: "v3.evaluateErrorAlerts", | |
| payload: { | |
| projectId, | |
| scheduledAt: Date.now(), | |
| }, | |
| }); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/webapp/app/v3/services/alerts/createAlertChannel.server.ts` around lines
104 - 112, The helper `#scheduleErrorAlertEvaluation` currently enqueues a job
using the same id pattern "evaluateErrorAlerts:${projectId}" that's used by the
evaluator's delayed self-chain, causing deduplication/no-op if a future run is
already queued; change the id for this immediate kick to be unique (for example
append a timestamp or a "now" suffix like
"evaluateErrorAlerts:${projectId}:now:${Date.now()}") so alertsWorker.enqueue
always creates a distinct job when triggered immediately after channel
creation/edit; keep the job name "v3.evaluateErrorAlerts" and payload identical,
only alter the id generation logic in `#scheduleErrorAlertEvaluation`.
| channel: { | ||
| type: ProjectAlertChannelType; | ||
| properties: unknown; | ||
| project: { id: string; externalRef: string; slug: string; name: string; organizationId: string; organization: { slug: string; title: string } }; | ||
| }, |
There was a problem hiding this comment.
Redact webhook URLs from logs and thrown errors.
Webhook URLs commonly embed credentials in the path or query string. Logging webhookProperties.data.url and including it in the thrown error will leak those secrets into worker logs and retry metadata.
Suggested fix
async `#sendWebhook`(
channel: {
+ id: string;
type: ProjectAlertChannelType;
properties: unknown;
project: { id: string; externalRef: string; slug: string; name: string; organizationId: string; organization: { slug: string; title: string } };
},
@@
if (!response.ok) {
logger.info("[DeliverErrorGroupAlert] Failed to send webhook", {
status: response.status,
statusText: response.statusText,
- url: webhookProperties.data.url,
+ channelId: channel.id,
});
- throw new Error(`Failed to send error group alert webhook to ${webhookProperties.data.url}`);
+ throw new Error("Failed to send error group alert webhook");
}Also applies to: 310-316
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts` around
lines 239 - 243, The code logs and throws the raw webhook URL
(webhookProperties.data.url) from the channel object (type
ProjectAlertChannelType), which can leak embedded credentials; update
deliverErrorGroupAlert.server.ts to sanitize/redact webhookProperties.data.url
before any logging or including it in thrown errors (e.g., replace
host/path/query credentials with "[REDACTED]" or strip auth/query params) and
ensure any thrown Error messages or processLogger entries use the redacted
value; apply the same change for the other occurrences around the webhook
handling block (also referenced at the second occurrence near lines 310-316) so
no raw webhook URL is emitted.
| export class ErrorAlertEvaluator { | ||
| constructor( | ||
| protected readonly _prisma: PrismaClientOrTransaction = $replica, | ||
| protected readonly _replica: PrismaClientOrTransaction = prisma, | ||
| protected readonly _clickhouse: ClickHouse = clickhouseClient | ||
| ) {} |
There was a problem hiding this comment.
Swap the primary and replica defaults here.
_prisma currently defaults to $replica and _replica defaults to prisma, so updateErrorGroupStates() writes through the read client while the reads hit the primary. On replica-backed deployments that will either fail outright or leave alert state changes unpersisted.
Suggested fix
constructor(
- protected readonly _prisma: PrismaClientOrTransaction = $replica,
- protected readonly _replica: PrismaClientOrTransaction = prisma,
+ protected readonly _prisma: PrismaClientOrTransaction = prisma,
+ protected readonly _replica: PrismaClientOrTransaction = $replica,
protected readonly _clickhouse: ClickHouse = clickhouseClient
) {}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts` around
lines 30 - 35, The constructor for ErrorAlertEvaluator has its _prisma and
_replica defaults reversed causing writes (e.g., updateErrorGroupStates) to go
to the read-only client; swap the defaults so _prisma defaults to prisma and
_replica defaults to $replica. Update the constructor parameter
order/assignments for ErrorAlertEvaluator accordingly so any write operations
use the primary Prisma client (_prisma = prisma) and reads use the replica
(_replica = $replica).
| for (const alertable of alertableErrors) { | ||
| const envChannels = channelsByEnvId.get(alertable.error.environment_id) ?? []; | ||
| for (const channel of envChannels) { | ||
| await alertsWorker.enqueue({ | ||
| job: "v3.deliverErrorGroupAlert", | ||
| payload: { | ||
| channelId: channel.id, | ||
| projectId, | ||
| classification: alertable.classification, | ||
| error: { | ||
| fingerprint: alertable.error.error_fingerprint, | ||
| environmentId: alertable.error.environment_id, | ||
| environmentName: alertable.environmentName, | ||
| taskIdentifier: alertable.error.task_identifier, | ||
| errorType: alertable.error.error_type, | ||
| errorMessage: alertable.error.error_message, | ||
| sampleStackTrace: alertable.error.sample_stack_trace, | ||
| firstSeen: alertable.error.first_seen, | ||
| lastSeen: alertable.error.last_seen, | ||
| occurrenceCount: alertable.error.occurrence_count, | ||
| }, | ||
| }, | ||
| }); | ||
| } |
There was a problem hiding this comment.
Longer channel intervals are effectively ignored.
The evaluator schedules the whole project at the shortest configured interval, but the delivery loop still enqueues alerts for every matching channel on every run. That means a channel configured for 60 minutes will still receive alerts every 5 minutes if any sibling channel uses a 5-minute interval.
Also applies to: 212-221, 391-404
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts` around
lines 102 - 125, The loop enqueues alerts to every envChannels entry on every
evaluator run, ignoring per-channel intervals; inside the for (const channel of
envChannels) loop (before calling alertsWorker.enqueue) add a guard that
respects the channel's configured interval by comparing it to the evaluator's
current run interval (e.g., currentRunIntervalMinutes or
evaluationIntervalMinutes): only enqueue when the current run aligns with the
channel interval (for example channel.intervalMinutes ===
currentRunIntervalMinutes or channel.intervalMinutes % currentRunIntervalMinutes
=== 0), otherwise continue; update any related loops mentioned (the other
occurrences around the same pattern at the referenced ranges) to use the same
interval-check guard.
| if ( | ||
| state.ignoredUntilOccurrenceRate !== null && | ||
| state.ignoredUntilOccurrenceRate !== undefined | ||
| ) { | ||
| const windowMinutes = Math.max(context.windowMs / 60_000, 1); | ||
| const rate = context.occurrencesSince / windowMinutes; | ||
| if (rate > state.ignoredUntilOccurrenceRate) { | ||
| return true; | ||
| } | ||
| } | ||
|
|
||
| if ( | ||
| state.ignoredUntilTotalOccurrences !== null && | ||
| state.ignoredUntilTotalOccurrences !== undefined && | ||
| state.ignoredAt | ||
| ) { | ||
| if (context.occurrencesSince >= state.ignoredUntilTotalOccurrences) { | ||
| return true; | ||
| } | ||
| } |
There was a problem hiding this comment.
ignoredUntilTotalOccurrences is only checking the latest window.
getOccurrenceCountsSince() always counts from the current scheduledAt, so a total-occurrence threshold only trips if the entire threshold is exceeded in one evaluation cycle. A rule like "unignore after 100 occurrences" will never fire if the error accrues 20 occurrences across five runs.
Also applies to: 317-343
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts` around
lines 178 - 197, The total-occurrence unignore check uses
context.occurrencesSince (window-only) so a threshold like
ignoredUntilTotalOccurrences won’t fire across multiple evaluation windows;
change the logic to compute the cumulative occurrences since the alert was
ignored and compare that to state.ignoredUntilTotalOccurrences. Specifically,
replace the context.occurrencesSince check with a call that gets counts since
state.ignoredAt (or otherwise compute occurrences since state.ignoredAt using
getOccurrenceCountsSince/your occurrence-count helper) and use that cumulative
count in the if that references state.ignoredUntilTotalOccurrences and
state.ignoredAt.
| await this, | ||
| this._prisma.errorGroupState.update({ | ||
| where: { id: state.id }, | ||
| data: { | ||
| status: "UNRESOLVED", | ||
| ignoredUntil: null, | ||
| ignoredUntilOccurrenceRate: null, | ||
| ignoredUntilTotalOccurrences: null, | ||
| ignoredAt: null, | ||
| ignoredReason: null, | ||
| ignoredByUserId: null, | ||
| resolvedAt: null, | ||
| resolvedInVersion: null, | ||
| resolvedBy: null, | ||
| }, | ||
| }); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts | sed -n '365,395p'Repository: triggerdotdev/trigger.dev
Length of output: 1221
🏁 Script executed:
cat -n apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts | sed -n '360,400p'Repository: triggerdotdev/trigger.dev
Length of output: 1543
Remove the errant await this, comma expression that prevents the state update from being awaited.
Line 372 has await this, which is a comma operator that discards the awaited this reference and returns the unawaited Prisma promise. The update will execute but won't be awaited, allowing the method to return and subsequent operations (like selfChain()) to run before the database write commits. This creates a race condition where state changes are invisible to the job's continuation.
Fix
- await this,
- this._prisma.errorGroupState.update({
+ await this._prisma.errorGroupState.update({
where: { id: state.id },
data: {
status: "UNRESOLVED",
ignoredUntil: null,
ignoredUntilOccurrenceRate: null,
ignoredUntilTotalOccurrences: null,
ignoredAt: null,
ignoredReason: null,
ignoredByUserId: null,
resolvedAt: null,
resolvedInVersion: null,
resolvedBy: null,
},
});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| await this, | |
| this._prisma.errorGroupState.update({ | |
| where: { id: state.id }, | |
| data: { | |
| status: "UNRESOLVED", | |
| ignoredUntil: null, | |
| ignoredUntilOccurrenceRate: null, | |
| ignoredUntilTotalOccurrences: null, | |
| ignoredAt: null, | |
| ignoredReason: null, | |
| ignoredByUserId: null, | |
| resolvedAt: null, | |
| resolvedInVersion: null, | |
| resolvedBy: null, | |
| }, | |
| }); | |
| await this._prisma.errorGroupState.update({ | |
| where: { id: state.id }, | |
| data: { | |
| status: "UNRESOLVED", | |
| ignoredUntil: null, | |
| ignoredUntilOccurrenceRate: null, | |
| ignoredUntilTotalOccurrences: null, | |
| ignoredAt: null, | |
| ignoredReason: null, | |
| ignoredByUserId: null, | |
| resolvedAt: null, | |
| resolvedInVersion: null, | |
| resolvedBy: null, | |
| }, | |
| }); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts` around
lines 372 - 387, Remove the stray comma-expression "await this, ..." so the
Prisma update is actually awaited: locate the call to
this._prisma.errorGroupState.update (the block that sets status: "UNRESOLVED"
and clears ignored/resolved fields) and change the statement to await the update
promise directly (i.e., await this._prisma.errorGroupState.update(...)) so the
database write completes before continuing (e.g., before calling selfChain()).
| project: "my-project", | ||
| }; | ||
|
|
||
| export default function Email(props: AlertErrorGroupEmailProps) { |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify default export removal in the new email file
rg -nP --type=tsx '^\s*export\s+default\b' internal-packages/emails/emails/alert-error-group.tsx
# Verify named import usage in EmailClient index
rg -nP --type=tsx 'alert-error-group' internal-packages/emails/src/index.tsx -A3 -B3Repository: triggerdotdev/trigger.dev
Length of output: 96
🏁 Script executed:
# Search for default export in alert-error-group file
rg -n 'export\s+default' internal-packages/emails/emails/alert-error-group.tsx
# Check the import statement in index.tsx
rg -n 'alert-error-group' internal-packages/emails/src/index.tsx -A5 -B2
# Get file line counts to know safe read ranges
wc -l internal-packages/emails/emails/alert-error-group.tsx internal-packages/emails/src/index.tsxRepository: triggerdotdev/trigger.dev
Length of output: 906
🏁 Script executed:
# Read the alert-error-group file around the export line
sed -n '55,65p' internal-packages/emails/emails/alert-error-group.tsx
# Check if this export is imported/used anywhere else
rg -n 'AlertErrorGroupEmail' internal-packages/emails/ --type=ts --type=tsx
# Check the export in index.tsx around the import
sed -n '1,20p' internal-packages/emails/src/index.tsxRepository: triggerdotdev/trigger.dev
Length of output: 1261
🏁 Script executed:
# Check all usages of AlertErrorGroupEmail across the codebase
rg -n 'AlertErrorGroupEmail' --max-count=20
# Also check if there are any other references to alert-error-group file
rg -n 'alert-error-group' internal-packages/ --max-count=20Repository: triggerdotdev/trigger.dev
Length of output: 1203
Replace the default export with a named function export.
Line 58 introduces a default export, which violates the repository export rule. Switch to a named export and update the import in internal-packages/emails/src/index.tsx accordingly.
Suggested patch
- export default function Email(props: AlertErrorGroupEmailProps) {
+ export function AlertErrorGroupEmail(props: AlertErrorGroupEmailProps) {
const {
classification,
taskIdentifier,
environment,
error,
occurrenceCount,
errorLink,-import AlertErrorGroupEmail, {
- AlertErrorGroupEmailSchema,
-} from "../emails/alert-error-group";
+import {
+ AlertErrorGroupEmail,
+ AlertErrorGroupEmailSchema,
+} from "../emails/alert-error-group";Per coding guidelines: **/*.{ts,tsx,js,jsx}: Use function declarations instead of default exports
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| export default function Email(props: AlertErrorGroupEmailProps) { | |
| export function AlertErrorGroupEmail(props: AlertErrorGroupEmailProps) { | |
| const { | |
| classification, | |
| taskIdentifier, | |
| environment, | |
| error, | |
| occurrenceCount, | |
| errorLink, |
| export default function Email(props: AlertErrorGroupEmailProps) { | |
| import { | |
| AlertErrorGroupEmail, | |
| AlertErrorGroupEmailSchema, | |
| } from "../emails/alert-error-group"; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal-packages/emails/emails/alert-error-group.tsx` at line 58, Change the
default export of the component function Email to a named export (use "export
function Email(...)" instead of "export default function Email(...)") and then
update the corresponding import in the emails index module to use a named import
(e.g. import { Email } from '...') or rename on import if needed; also update
any usages that relied on the default import to reference the named export.