feat(datasource/rpm): use primary_db metadata with configurable fallback #41866

Open
kpumuk wants to merge 6 commits into renovatebot:main from kpumuk:rpm-sqlite

kpumuk commented Mar 13, 2026

Changes

Some RPM repositories, including Amazon Linux 2023, expose valid repomd.xml metadata without an XML declaration and also publish primary_db metadata alongside primary.xml.gz.

The RPM datasource currently resolves package versions by reparsing the full primary.xml.gz file for every dependency. That works, but it performs very poorly with a large number of dependencies, because the same metadata is scanned repeatedly.

This PR makes the RPM datasource more tolerant and significantly faster on repositories that publish SQLite metadata.

  • Accept repomd.xml responses that start with either <?xml or <repomd>.
  • Parse repomd.xml metadata entries more completely so Renovate can discover both primary and primary_db.
  • Prefer primary_db / primary.sqlite.gz when available, and query package versions from SQLite instead of reparsing XML for every dependency.
  • Keep primary.xml.gz as the fallback path when primary_db is not present or the SQLite path is unusable.
  • Add a new rpmMetadataSource option so users can choose metadata selection behavior explicitly:
    • auto (default): prefer primary_db, fall back to primary
    • primary: use XML metadata only
    • primary_db: use SQLite metadata only
  • Update RPM datasource documentation and add regression coverage for the new metadata selection and fallback behavior.
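
To illustrate the selection rules listed above, here is a minimal sketch rather than the code in this PR. The names `looksLikeRepomd`, `selectMetadata`, `RepomdEntries`, and `MetadataChoice` are hypothetical, and matching on the `<repomd` prefix (so that attributes on the root element are tolerated) is an assumption. The sketch covers only the "entry is missing" fallback; falling back when the SQLite file turns out to be unusable would have to happen later, at download or query time.

```typescript
// Hypothetical names throughout; this is not Renovate's actual implementation.
type RpmMetadataSource = 'auto' | 'primary' | 'primary_db';

interface RepomdEntries {
  primary?: string; // location of primary.xml.gz, if listed in repomd.xml
  primary_db?: string; // location of primary.sqlite.gz, if listed in repomd.xml
}

interface MetadataChoice {
  kind: 'primary' | 'primary_db';
  location: string;
}

// Accept repomd.xml bodies with or without an XML declaration.
// Assumption: the root element may carry attributes, so match '<repomd' only.
function looksLikeRepomd(body: string): boolean {
  return body.startsWith('<?xml') || body.startsWith('<repomd');
}

// Pick which metadata file to use for the configured mode.
// Returns null when the requested metadata is not published.
function selectMetadata(
  entries: RepomdEntries,
  mode: RpmMetadataSource,
): MetadataChoice | null {
  const xml: MetadataChoice | null = entries.primary
    ? { kind: 'primary', location: entries.primary }
    : null;
  const db: MetadataChoice | null = entries.primary_db
    ? { kind: 'primary_db', location: entries.primary_db }
    : null;

  if (mode === 'primary') {
    return xml; // XML metadata only
  }
  if (mode === 'primary_db') {
    return db; // SQLite metadata only
  }
  return db ?? xml; // auto: prefer SQLite, fall back to XML
}
```

With both entries present, `auto` resolves to `primary_db`; with only `primary` present, `auto` resolves to `primary`, while an explicit `primary_db` yields `null`, which the caller would have to report.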

For repositories with many packages, using primary_db replaces a full XML parse per dependency with direct indexed queries, while preserving the existing XML path for compatibility and fallback.

Benchmark results for 50 AL2023 dependencies show a 38x improvement in pure Node runtime (from 34s down to 885ms). Wall-clock time improves by ~7x, from 38s down to 5.7s.

Context

Please select one of the following:

  • This closes an existing Issue, Closes: #
  • This doesn't close an Issue, but I accept the risk that this PR may be closed if maintainers disagree with its opening or implementation

AI assistance disclosure

Did you use AI tools to create any part of this pull request?

Please select one option and, if yes, briefly describe how AI was used (e.g., code, tests, docs) and which tool(s) you used.

  • No — I did not use AI for this contribution.
  • Yes — minimal assistance (e.g., IDE autocomplete, small code completions, grammar fixes).
  • Yes — substantive assistance (AI-generated non‑trivial portions of code, tests, or documentation).
  • Yes — other (please describe):

Used Codex to iterate on the design of the change.

Documentation (please check one with an [x])

  • I have updated the documentation, or
  • No documentation update is required

How I've tested my work (please select one)

I have verified these changes via:

  • Code inspection only, or
  • Newly added/modified unit tests, or
  • No unit tests, but ran on a real repository, or
  • Both unit tests + ran on a real repository

Used private repository with 60 dependencies from AL2023.

Renovate does not create these rollback PRs by default, so this functionality needs to be opted-into.
We recommend you do this selectively with `packageRules` and not globally.

## rpmMetadataSource
Member

I don't think this is needed at all. Why should we make it configurable?

Contributor Author

I first implemented it as a straight change, but then thought it might be considered a breaking change in behaviour. Another reason is that the SQLite files might be larger than the XML ones, and some users might want to switch back to XML.

So I added a config option that lets people switch back and report any issues, rather than breaking existing behaviour.


const rows = db
  .prepare('select version, release from packages where name = ?')
  .all(packageName) as RpmSqlitePackageRow[];
Member

use async iterator

Contributor Author

kpumuk, Mar 13, 2026

Unfortunately, better-sqlite3 does not expose an async iterator. Its signature is iterate(...): IterableIterator<Result>. Wrapping it in an async wrapper would not change anything; it would only pretend we have an async interface :-(

I will switch from .all() to .iterate() to avoid materializing all rows.
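
As a rough sketch of what consuming rows through `.iterate()` could look like, the block below works against a minimal structural type instead of importing better-sqlite3, so it runs on its own. `RowStatement`, `collectVersions`, and `fakeStatement` are hypothetical names; joining `version` and `release` with a hyphen and de-duplicating the results are assumptions made for the example, not necessarily what the datasource does.

```typescript
// Minimal structural type for the single method used here. A better-sqlite3
// prepared statement has an iterate() method of this shape; the rest of this
// sketch is hypothetical.
interface RowStatement<Row> {
  iterate(...params: unknown[]): IterableIterator<Row>;
}

interface RpmSqlitePackageRow {
  version: string;
  release: string;
}

// Walk rows one at a time instead of materializing the whole result set.
function collectVersions(
  stmt: RowStatement<RpmSqlitePackageRow>,
  packageName: string,
): string[] {
  const versions = new Set<string>();
  for (const row of stmt.iterate(packageName)) {
    // Assumption: a version string is "<version>-<release>".
    versions.add(`${row.version}-${row.release}`);
  }
  return [...versions];
}

// Stand-in for a prepared statement so the sketch runs without a database.
const fakeStatement: RowStatement<RpmSqlitePackageRow> = {
  *iterate(): IterableIterator<RpmSqlitePackageRow> {
    yield { version: '1.0', release: '1' };
    yield { version: '1.0', release: '1' }; // duplicate row, e.g. another arch
    yield { version: '1.1', release: '2' };
  },
};

console.log(collectVersions(fakeStatement, 'example-package'));
```

The loop stays synchronous, matching the point above that the iterator is not async; the gain is only that rows are not all held in memory at once.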

Member

do this refactor in a separate PR to be merged first

Contributor Author

Moved XML extraction to #41910

2 participants