feat(datasource/rpm): use primary_db metadata with configurable fallback #41866
kpumuk wants to merge 6 commits into renovatebot:main
Conversation
Renovate does not create these rollback PRs by default, so this functionality needs to be opted-into.
We recommend you do this selectively with `packageRules` and not globally.

## rpmMetadataSource
I don't think this is needed at all. Why should we make it configurable?
I first implemented it as a straight change, but then thought it might be considered a breaking change in behaviour. Another reason is that the SQLite files might be larger than the XML ones, and some users might want to switch back to XML.
As a result, I added a config option so people can switch back and report any issues, which avoids breaking existing behaviour.
const rows = db
  .prepare('select version, release from packages where name = ?')
  .all(packageName) as RpmSqlitePackageRow[];
Unfortunately, better-sqlite3 does not expose an async iterator. Its type is `iterate(...): IterableIterator<Result>`. Wrapping it in an async wrapper would not change anything; it would just pretend we have an async interface :-(
I will switch from `.all()` to `.iterate()` to avoid materializing all rows.
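For illustration only, here is a rough sketch of what consuming rows one at a time could look like. It does not import better-sqlite3: the generator `fakeIterate` is a hypothetical stand-in for `statement.iterate(packageName)`, and the `version-release` formatting and the `collectVersions` helper are assumptions, not code from this PR.

```typescript
// Row shape mirroring the `select version, release ...` query above.
interface RpmSqlitePackageRow {
  version: string;
  release: string | null;
}

// Hypothetical stand-in for `statement.iterate(packageName)`:
// a synchronous iterator that yields one row at a time.
function* fakeIterate(): IterableIterator<RpmSqlitePackageRow> {
  yield { version: '1.2.3', release: '1.amzn2023' };
  yield { version: '1.2.3', release: '1.amzn2023' }; // duplicate row
  yield { version: '1.3.0', release: null };
}

// Accepts any iterable, so it works with the array returned by `.all()`
// as well as the iterator returned by `.iterate()`. Only the set of
// distinct version strings is kept in memory, not every row.
function collectVersions(rows: Iterable<RpmSqlitePackageRow>): string[] {
  const seen = new Set<string>();
  for (const row of rows) {
    // Assumption: a release, when present, is appended as `version-release`.
    seen.add(row.release ? `${row.version}-${row.release}` : row.version);
  }
  return Array.from(seen);
}

const versions = collectVersions(fakeIterate());
console.log(versions);
```

Because the helper takes an `Iterable`, switching the call site from `.all()` to `.iterate()` would not require changing the consumer; the loop stays synchronous either way.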
Do this refactor in a separate PR, to be merged first.
Changes
Some RPM repositories, including Amazon Linux 2023, expose valid `repomd.xml` metadata without an XML declaration and also publish `primary_db` metadata alongside `primary.xml.gz`.

The RPM datasource resolves package versions by reparsing the full `primary.xml.gz` file for every dependency. That works, but it performs very poorly with a large number of dependencies, because the same metadata is scanned repeatedly.

This PR makes the RPM datasource more tolerant and significantly faster on repositories that publish SQLite metadata.

- Accept `repomd.xml` responses that start with either `<?xml` or `<repomd>`.
- Parse `repomd.xml` metadata entries more completely so Renovate can discover both `primary` and `primary_db`.
- Prefer `primary_db` / `primary.sqlite.gz` when available, and query package versions from SQLite instead of reparsing XML for every dependency.
- Keep `primary.xml.gz` as the fallback path when `primary_db` is not present or the SQLite path is unusable.
- Add an `rpmMetadataSource` option so users can choose metadata selection behavior explicitly:
  - `auto` (default): prefer `primary_db`, fall back to `primary`
  - `primary`: use XML metadata only
  - `primary_db`: use SQLite metadata only

For repositories with many packages, using `primary_db` avoids repeated full XML parsing and replaces the "once per dependency" full scan with direct indexed queries, while still preserving the existing XML path for compatibility and fallback.

Benchmark results for 50 AL2023 dependencies show a 38x improvement in pure Node runtime (from 34s down to 885ms). Wall-clock time improves ~7x, from 38s down to 5.7s.
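The selection rules described above could be sketched roughly as follows. This is not the code from the PR: the names `looksLikeRepomd`, `selectMetadata`, `RepomdEntries` and `Selection` are hypothetical, the example hrefs are made up, and the sketch only covers the static choice (it does not model falling back at runtime when the SQLite file turns out to be unusable).

```typescript
type RpmMetadataSource = 'auto' | 'primary' | 'primary_db';

// Hypothetical shape: locations discovered in repomd.xml, if listed.
interface RepomdEntries {
  primary?: string; // e.g. href of primary.xml.gz
  primary_db?: string; // e.g. href of primary.sqlite.gz
}

interface Selection {
  kind: 'xml' | 'sqlite';
  href: string;
}

// Accept bodies that start with an XML declaration or directly with the
// <repomd> root element. Assumption: leading whitespace is tolerated and
// the root element may carry attributes.
function looksLikeRepomd(body: string): boolean {
  return /^\s*(<\?xml|<repomd[\s>])/.test(body);
}

// Pick which metadata file to use for the given option value.
// Returns null when the requested source is not published.
function selectMetadata(
  mode: RpmMetadataSource,
  entries: RepomdEntries,
): Selection | null {
  const sqlite: Selection | null = entries.primary_db
    ? { kind: 'sqlite', href: entries.primary_db }
    : null;
  const xml: Selection | null = entries.primary
    ? { kind: 'xml', href: entries.primary }
    : null;

  switch (mode) {
    case 'primary':
      return xml; // XML metadata only
    case 'primary_db':
      return sqlite; // SQLite metadata only
    default:
      return sqlite || xml; // auto: prefer SQLite, fall back to XML
  }
}

const choice = selectMetadata('auto', {
  primary: 'repodata/primary.xml.gz', // illustrative href
  primary_db: 'repodata/primary.sqlite.gz', // illustrative href
});
console.log(choice);
```

Keeping the choice in one pure function makes the three option values easy to test in isolation from the download and parsing code.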
Context
Please select one of the following:
AI assistance disclosure
Did you use AI tools to create any part of this pull request?
Please select one option and, if yes, briefly describe how AI was used (e.g., code, tests, docs) and which tool(s) you used.
Used Codex to iterate on the design of the change.
Documentation (please check one with an [x])
How I've tested my work (please select one)
I have verified these changes via:
Used a private repository with 60 dependencies from AL2023.