That sounds super interesting, thanks for sharing. Let me reel in @krlvi as well. Regarding merge-drivers, these are already used when merging, so
-
Thanks for confirming that GitButler already picks up merge drivers through gitattributes. For anyone wanting to try weave with GitButler: That's it. GitButler's virtual branch merges will use weave automatically for configured file types (TypeScript, JavaScript, Python, Go, Rust, JSON, YAML, TOML, Markdown).
On the "showing diffs differently" angle, that's where it gets interesting. The entity extraction comes from sem-core, which is a Rust library (not just a CLI). It uses tree-sitter to parse code into semantic entities (functions, classes, interfaces, types, etc.) with identity matching across versions. For GitButler specifically, this could help with:
Since GitButler's backend is Rust and uses gix, sem-core could be added as a direct crate dependency rather than shelling out to a CLI. Happy to help explore what that integration would look like if @krlvi is interested.
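To make the entity idea a bit more concrete, here is a minimal sketch of how entity line ranges could be used to group changed lines by the function or type that contains them. The `Entity` struct and `group_changes_by_entity` function are hypothetical names invented for illustration; they are not sem-core's actual API, and a real integration would get the ranges from the tree-sitter parse rather than from hand-written values.

```rust
/// A semantic entity and the 1-based, inclusive line range it occupies.
/// Hypothetical shape for illustration only; not sem-core's real type.
#[derive(Debug, Clone, PartialEq)]
pub struct Entity {
    pub name: String,
    pub start_line: usize,
    pub end_line: usize,
}

/// Assign each changed line to the entity that contains it.
///
/// Groups are returned in order of first appearance. Lines that fall
/// outside every entity are collected under `None`.
pub fn group_changes_by_entity(
    entities: &[Entity],
    changed_lines: &[usize],
) -> Vec<(Option<String>, Vec<usize>)> {
    let mut groups: Vec<(Option<String>, Vec<usize>)> = Vec::new();
    for &line in changed_lines {
        // First entity whose range covers this line, if any.
        let owner = entities
            .iter()
            .find(|e| e.start_line <= line && line <= e.end_line)
            .map(|e| e.name.clone());
        // Append to the existing group for that owner, or start a new one.
        match groups.iter_mut().find(|g| g.0 == owner) {
            Some(group) => group.1.push(line),
            None => groups.push((owner, vec![line])),
        }
    }
    groups
}
```

With entities `foo` (lines 1-5) and `bar` (lines 7-12), changed lines 2, 3 and 8 end up in two groups keyed by entity name, and a changed line 6 lands in the `None` group because no entity covers it.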
-
This is absolutely a killer merge driver! I randomly picked a Rust enum and simulated a conflict scenario where two branches rename the same enum on the same line.
Original:
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub enum Source {
    // ...
}
Modification on main:
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub enum Source1 {
    // ...
}
Modification on branch-1:
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub enum BSource {
    // ...
}
Final merge result (
pub enum BSource {
    // ...
}
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub enum Source1 {
    // ...
}
So although Git did not report a conflict and completed the merge successfully, the merge state produced by weave appears to be incorrect: both renames of the same enum from the two branches were preserved.
Please forgive me for not having fully read the weave documentation yet. I would have expected this kind of case to fall back to the original line-by-line conflict resolution. Did I misunderstand or misuse something here? Or does weave treat the function / enum as a different entity once the signature changes, even if both function B and function C are just renames of the same original function A? Is tracking changes across such derived functions within weave's intended scope?
I feel this kind of "apparently successful but actually incorrect" merge could be more dangerous, as it may allow faulty code to slip into commits unintentionally. I haven't done any research in this area and am only raising this from a practical engineering perspective. If I misunderstood anything, please feel free to correct me!
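For readers following along, here is a small sketch of why a purely name-keyed entity merge keeps both renames, and how the case could be flagged instead. This is an assumption about the failure mode, not a description of weave's internals. The `Entity` struct, `rename_target` and `divergent_renames` are hypothetical names, and exact body equality stands in for whatever similarity measure a real tool would use.

```rust
/// A top-level entity reduced to its name and body text.
/// Hypothetical model for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Entity {
    pub name: String,
    pub body: String,
}

/// Find an entity on `side` that is new relative to `base` and has the same
/// body as `original`, i.e. a plausible rename of `original`.
fn rename_target<'a>(
    original: &Entity,
    base: &[Entity],
    side: &'a [Entity],
) -> Option<&'a Entity> {
    side.iter().find(|e| {
        e.body == original.body && !base.iter().any(|b| b.name == e.name)
    })
}

/// Report every base entity that both sides renamed to different names,
/// as (base name, ours name, theirs name).
///
/// A merge keyed only by name sees "deleted on both sides" plus two
/// unrelated additions, so it would keep both copies without a conflict.
pub fn divergent_renames(
    base: &[Entity],
    ours: &[Entity],
    theirs: &[Entity],
) -> Vec<(String, String, String)> {
    let mut conflicts = Vec::new();
    for original in base {
        let gone_in_ours = !ours.iter().any(|e| e.name == original.name);
        let gone_in_theirs = !theirs.iter().any(|e| e.name == original.name);
        if !(gone_in_ours && gone_in_theirs) {
            continue;
        }
        if let (Some(a), Some(b)) = (
            rename_target(original, base, ours),
            rename_target(original, base, theirs),
        ) {
            if a.name != b.name {
                conflicts.push((
                    original.name.clone(),
                    a.name.clone(),
                    b.name.clone(),
                ));
            }
        }
    }
    conflicts
}
```

Applied to the scenario above (`Source` in the base, `Source1` on main, `BSource` on branch-1, identical bodies), this reports one divergent rename, which a driver could surface as a conflict rather than emitting both enums.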
-
Hello @rs545837! I want to preface my post by saying that I think what you're doing is amazing, and I'd love to keep chatting about this to see if there's room for some collaboration on this topic. Line-based merge is so very limited, yet it persists simply because more involved merge algorithms take so much effort, and few have been willing to chip in. Also, your tool actually came up in a discussion in the GitButler team just before you posted this.
For some context on why I'm at least moderately qualified to comment on this topic: I worked on prototyping structured merge back in 2019-2021, initially with a focus on move-enabled merge (i.e. allowing an AST subtree to be moved in one revision and edited in another without causing a conflict), and then with a focus on not having the printing of the merged AST to source code absolutely butcher the original formatting of the code. If you're curious, the (now long abandoned) merge tool is over at https://github.com/ASSERT-KTH/spork, and the resulting whitepaper can be found here. I honestly do not remember half of my work there, but at some point in time I was fairly in-the-know on this topic.
Largely unnecessary flexing out of the way, let's get back to the topic at hand. There are several team members at GitButler who are intrigued by this. We have at least one avid user of Mergiraf, a tool we unfortunately cannot incorporate due to it being under a GPLv3 license. Speaking of license,
This is cool, but it really only shows the potential of the tool and does not tell us about its performance or reliability. It's just too small a sample set. Possibly the largest takeaway I had from my own work is that, while textual merge is very limited, it's also very simple. You really don't hear of bugs in textual merge, while a bug in weave has been uncovered in this very thread. AST-based merge is rather complex.
This surprises me a little bit. I added these exact optimizations myself, but even without them I had a significantly lower rate of merge conflicts than with a textual merge. For example, if one revision adds a parameter and another revision adds something else to the method signature (return value, modifier, etc.), a line-based merge conflicts while a fully structured merge does not. weave appears to perform poorly in such a scenario; see the example in the spoiler below.
weave vs line-based merge on single scenario
Produced with weave:
diff --git a/src/test/resources/clean/both_modified/add_parameters_and_thrown_types/Left.java b/src/test/resources/clean/both_modified/add_parameters_and_thrown_types/Left.java
index 963c829..5554564 100644
--- a/src/test/resources/clean/both_modified/add_parameters_and_thrown_types/Left.java
+++ b/src/test/resources/clean/both_modified/add_parameters_and_thrown_types/Left.java
@@ -1,5 +1,14 @@
+<<<<<<< ours — class `Main` (F, confidence: medium)
+// hint: Logic changed on both sides. Requires understanding intent of each change.
public class Main {
public int add(int a, int b) throws IllegalArgumentException {
return a + b;
}
}
+=======
+public class Main {
+ public int add(int a, int b, int c) {
+ return a + b + c;
+ }
+}
+>>>>>>> theirs — class `Main` (F, confidence: medium)
Produced with `git merge-file Left.java Base.java Right.java`:
public class Main {
+<<<<<<< Left.java
public int add(int a, int b) throws IllegalArgumentException {
return a + b;
+=======
+ public int add(int a, int b, int c) {
+ return a + b + c;
+>>>>>>> Right.java
}
}
I can easily engineer this scenario to be even more in favor of the line-based merge by separating the signature and the return statement with a blank line in all revisions (or, really, any line), and by increasing the size of the class with other methods, fields, etc. Then the line-based merge conflicts only on the method header, while weave still conflicts on the entire class.
This single scenario is of course of very little importance, but I think it illustrates the point that there's a need for a larger-scale evaluation of weave as a merge tool. I also think it doesn't really do what I would consider a fully structured merge; it is more akin to a semi-structured merge. I'd be very interested in learning more about the diff and merge algorithms that are employed. Perhaps we can hop on a call sometime in the near future and you can tell me more about weave?
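To illustrate what merging below the method level could look like for this scenario, here is a minimal sketch that applies the usual three-way rule to each component of a method separately. The `Method` struct, `merge3` and `merge_method` are hypothetical names, not how Spork or weave model things. Each component is treated as an atomic value, which is much coarser than a real AST merge, and the base revision is assumed to be `add(int a, int b)` with no `throws` clause, since Base.java is not shown above.

```rust
/// A method split into independently mergeable components.
/// Hypothetical model for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Method {
    pub params: Vec<String>,
    pub throws: Vec<String>,
    pub body: String,
}

/// Three-way rule for a single value: if only one side changed it, take
/// that side; if both changed it identically, take either; otherwise
/// report a conflict as `None`.
fn merge3<T: Clone + PartialEq>(base: &T, left: &T, right: &T) -> Option<T> {
    if left == right {
        Some(left.clone())
    } else if left == base {
        Some(right.clone())
    } else if right == base {
        Some(left.clone())
    } else {
        None
    }
}

/// Merge each component on its own, so edits to different components of
/// the same method do not conflict. Returns `None` if any component does.
pub fn merge_method(base: &Method, left: &Method, right: &Method) -> Option<Method> {
    Some(Method {
        params: merge3(&base.params, &left.params, &right.params)?,
        throws: merge3(&base.throws, &left.throws, &right.throws)?,
        body: merge3(&base.body, &left.body, &right.body)?,
    })
}
```

Under the assumed base, Left changes only `throws` and Right changes only `params` and `body`, so every component has at most one editing side and the merge is clean: three parameters, the `IllegalArgumentException` clause, and the `a + b + c` body. Whether that result is what the authors intended is a separate question that no merge algorithm can answer.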
-
Virtual branches are one of the most compelling workflows I've seen for parallel feature development, and I've been thinking about how entity-level merge could make them even better. When you're constantly applying and unapplying branches, Git's line-based 3-way merge hits false conflicts surprisingly often: two branches that each add a function to the same file, imports that both branches modify, class members added in different spots. These aren't real conflicts, but Git can't tell the difference.
I've been working on a structured merge driver called weave that uses tree-sitter to extract functions, classes, and methods as discrete entities, then merges at that level. On a benchmark suite of 31 merge scenarios, it resolves all 31 cleanly versus Git's 15/31 (48%). The gains come from three things: concurrent function additions to the same file merge without conflict, imports merge as unordered sets (so ordering differences don't cause conflicts), and class members merge independently. The underlying entity extraction library (sem-core) is a Rust crate that could also enable smarter hunk splitting, grouping changes by semantic entity rather than by diff proximity, which seems relevant to how GitButler assigns hunks to virtual branches.
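As a rough illustration of the "imports as unordered sets" idea, here is a minimal three-way set merge. It is a sketch of the general technique, not weave's actual code; `merge_imports` is a hypothetical name and imports are modelled as plain strings.

```rust
use std::collections::BTreeSet;

/// Three-way merge of imports treated as an unordered set.
///
/// An import survives if either side added it, or if it was in `base`
/// and neither side removed it. Ordering never causes a conflict because
/// the result is a set.
pub fn merge_imports(
    base: &BTreeSet<String>,
    ours: &BTreeSet<String>,
    theirs: &BTreeSet<String>,
) -> BTreeSet<String> {
    ours.union(theirs)
        .filter(|imp| {
            let in_base = base.contains(*imp);
            // New imports are always kept; base imports only if both
            // sides still have them.
            !in_base || (ours.contains(*imp) && theirs.contains(*imp))
        })
        .cloned()
        .collect()
}
```

For example, with base `{a, b}`, ours `{a, b, c}` and theirs `{b, d}`, the result is `{b, c, d}`: both additions are kept and the removal of `a` by one side is honored. A real driver would still need to decide where in the file to print the merged set.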
Has the team considered structured merge approaches for the virtual branch workflow? It seems like a case where the payoff would be especially high, since merges happen so frequently and users aren't expecting to resolve conflicts every time they switch context. Happy to discuss the technical details or trade-offs. The approach draws on ideas from the LastMerge, Mergiraf, ConGra, and Sesame papers, if any of those are familiar.