| 2006-08-30 |
| → agile joined | 00:00 |
| → GyrosGeier joined | 00:04 |
| → kanru joined | 00:16 |
| → krh joined | 00:26 |
| → krh joined | 00:35 |
| dwmw2_LHR → dwmw2_SFO | 00:38 |
| → xjjk joined | 00:58 |
| → kanru joined | 01:30 |
| → krh joined | 01:44 |
| → robfitz joined | 01:46 |
|
spearce
| Hmm. so compressed tree deltas apparently waste space in pack files... | 02:01 |
|
mugwump
| that sounds familiar :) | 02:04 |
|
spearce
| has it been discussed before on the list? | 02:06 |
|
mugwump
| 04:07 < spearce> i'm trying to cheat by not reloading the tree object from disk and instead reproduce it from what i have in memory about its entries/modes/sha1s. | 02:07 |
|
spearce
| oh, yea. that was because i would be holding two copies of its filename in memory. with many files that's a lot of data. but this compressed tree delta thing is totally unrelated. | 02:08 |
|
mugwump
| I guess it would make sense to customise the exact compression style per-object | 02:08 |
|
| er, it would make sense that such an endeavour might have wins, I mean | 02:09 |
|
| eg, you might want to partition delta compressing between different object types, and make the delta scanning algorithms apply tree-specific shortcuts to arrive at a better overall result | 02:10 |
|
spearce
| or just not compress tree deltas. :) but yes, and i think that's something i'm going to start working on... | 02:11 |
|
| Jon and I are starting to become convinced that we can likely get good searching in files for real cheap at the same time. | 02:12 |
|
mugwump
| or tune the delta compression for trees such that they rarely get compressed... | 02:12 |
|
| mugwump reads spearce's list post about mozilla .git | 02:14 |
| → somegeek joined | 02:14 |
|
mugwump
| oh, I'm confusing delta compression and zlib compression | 02:14 |
|
| I didn't think zlib ever actually increased the size of stuff... | 02:15 |
|
| I mean, once you've stripped the 20byte header | 02:15 |
|
spearce
| frell. I didn't account for the 20 byte header in my computation. Arrgh. | 02:17 |
|
mugwump
| still, that's only 20MB | 02:18 |
|
| not enough for the 29MB expansion you found | 02:20 |
|
spearce
| The other 9 MB could be caused by the object headers. | 02:20 |
|
| iirc the header can't exceed 5 bytes, but all odds are tree delta headers are around 3 bytes/each. | 02:21 |
|
| I just corrected myself; it's more like 8.6 MiB being wasted. | 02:25 |
|
| As the headers should be on average 2 bytes in length + 20 byte SHA1 base * 976712 objects. | 02:26 |
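The per-object arithmetic above can be checked with a few lines of Python. This is only a sketch using the numbers quoted in the conversation (976712 objects, a 20-byte base SHA-1, an estimated 2-byte average header); the constant and function names are illustrative, and it does not try to reproduce the 8.6 MiB figure, whose derivation is not spelled out in the log.

```python
# Figures quoted in the discussion above; the names are illustrative.
OBJECTS = 976_712        # object count from spearce's message
BASE_SHA1_BYTES = 20     # SHA-1 of the delta base stored with each delta
AVG_HEADER_BYTES = 2     # spearce's estimate of the average object header

MIB = 1024 * 1024


def overhead_mib(per_object_bytes: int, objects: int = OBJECTS) -> float:
    """Total fixed overhead in MiB for a given per-object byte cost."""
    return per_object_bytes * objects / MIB


base_only = overhead_mib(BASE_SHA1_BYTES)
with_header = overhead_mib(BASE_SHA1_BYTES + AVG_HEADER_BYTES)

print(f"base SHA-1 only:     {base_only:.2f} MiB")
print(f"base SHA-1 + header: {with_header:.2f} MiB")
```

The base SHA-1 alone accounts for roughly the "20MB" mugwump mentions; adding the estimated header pushes the fixed cost a little higher.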
|
mugwump
| hmm, yapc::eu starts in 6 hours, better get some rest :) | 02:28 |
|
spearce
| 'night. | 02:28 |
| → kanru joined | 02:40 |
| → Tv joined | 03:31 |
| → james_d joined | 03:56 |
| ← james_d left | 04:03 |
| → xjjk joined | 04:15 |
| → Gitzilla joined | 04:16 |
| → james_d joined | 04:28 |
| ← james_d left | 04:47 |
| → dst_ joined | 06:05 |
| → somegeek joined | 06:09 |
| → somegeek joined | 07:00 |
| → ruskie joined | 07:05 |
| → ferdy joined | 07:05 |
| → somegeek joined | 07:28 |
| → ruskie joined | 07:28 |
| → somegeek joined | 08:05 |
| → somegeek joined | 08:18 |
| → thi joined | 08:31 |
|
thi
| hello, in lines 1580 to 1592 of gitweb.perl there must be a bug: the if condition is always true | 08:33 |
| → devogon joined | 08:39 |
| → polyonymous joined | 09:18 |
| → mchehab joined | 10:00 |
| → mchehab joined | 10:14 |
| → somegeek joined | 10:37 |
| → coywolf joined | 10:53 |
| → ilogger2 joined | 11:07 |
| → mchehab joined | 12:01 |
| → ruskie joined | 12:28 |
| → timlarson_ joined | 12:37 |
| → ferdy joined | 13:07 |
| → agile joined | 13:58 |
| → spearce joined | 14:08 |
| → segher_ joined | 14:11 |
| → benlau joined | 14:27 |
| → ferdy joined | 14:28 |
| → cworth joined | 14:38 |
| → Oejet joined | 14:52 |
| → xjjk joined | 15:00 |
| → polyonymous joined | 15:07 |
| → krh joined | 15:47 |
| → pdmef joined | 15:47 |
| → yashi joined | 15:56 |
| → GyrosGeier joined | 16:04 |
|
philips
| I am trying to use git-apply on this http://ifup.org/~philips/kernbench.diff but it is complaining: fatal: corrupt patch at line 15 | 16:13 |
|
| I tried using git-applymbox earlier today and it complained also | 16:13 |
|
| git version 1.4.1.1 | 16:13 |
|
| patch -p0 < /home/philips/kernbench.diff works just fine... | 16:14 |
|
Oejet
| I guess it is because it is not a "git patch", so you should use normal patch, like that. | 16:15 |
| ← tnks left | 16:15 |
|
spearce
| Right. I don't think git apply likes a non-git patch. | 16:16 |
|
philips
| ah ok | 16:16 |
|
| sucky | 16:16 |
|
spearce
| line 15 as it happens is the blank line between the two hunks. | 16:16 |
|
| git patches never have blank lines between hunks, unless it's part of the context of the preceding hunk. | 16:17 |
|
philips
| On the mbox's I was trying to apply it was failing on the first newline | 16:17 |
|
| no matter if it was between hunks or not | 16:18 |
|
| Does anyone else have troubles with git-applymbox or git-apply? | 16:26 |
|
spearce
| they work fine for me with git generated patches; if it's not a git generated patch I use GNU patch. so no. :) | 16:26 |
|
philips
| It doesn't work for me in any case, and the guys in #git are trying to convince me that the tools only work with patches generated by git | 16:26 |
|
| err, haha, wrong channel | 16:26 |
|
| :-D | 16:27 |
|
spearce
| git apply is incredibly picky about what it gets and refuses to perform a lot of things that patch would normally do just fine. | 16:28 |
|
philips
| ohhh right, because it is using internal libpatch or something. | 16:28 |
|
spearce
| actually I think Linus hand-wrote his own patch apply routines in git-apply. | 16:29 |
| → alley_cat joined | 16:39 |
| → anholt_ joined | 16:51 |
| → robin joined | 17:15 |
| → Tv joined | 17:52 |
| → timlarson_ joined | 18:27 |
| → timlarson_ joined | 18:28 |
| → DrNick joined | 18:30 |
| beu → beu_ | 18:53 |
| → beu joined | 18:56 |
| → gittus joined | 18:59 |
|
gittus
| philips: http://ifup.org/~philips/kernbench.diff is indeed corrupt, and git-apply is correct in complaining about it. | 19:00 |
|
| That "empty line" on line 15 is very much corruption: according to the diff headers, it is still part of the patch, | 19:00 |
|
| but it is not a valid diff-line (it lacks the initial space) | 19:01 |
|
| So git is correct (as usual), and you have corrupted your patch either by editing it, copy-and-pasting it, or by having a totally broken "diff" binary. | 19:01 |
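The rule gittus describes (every line the hunk header still accounts for must start with a space, `+`, or `-`) can be sketched as a small checker. This is a minimal illustration of that strict rule as stated in the log, not the logic of any particular git-apply version; `first_corrupt_line` is a hypothetical helper name.

```python
import re

# "@@ -start[,count] +start[,count] @@"; a missing count means 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def first_corrupt_line(patch_text):
    """Return the 1-based number of the first invalid hunk line, or None."""
    old_left = new_left = 0  # lines the current hunk header still promises
    for number, line in enumerate(patch_text.split("\n"), start=1):
        if old_left > 0 or new_left > 0:
            tag = line[:1]
            if tag == " ":        # context line: counts on both sides
                old_left -= 1
                new_left -= 1
            elif tag == "-":      # removed line: old side only
                old_left -= 1
            elif tag == "+":      # added line: new side only
                new_left -= 1
            elif tag == "\\":     # "\ No newline at end of file" marker
                continue
            else:                 # e.g. a truly empty line inside a hunk
                return number
            if old_left < 0 or new_left < 0:
                return number     # more lines than the header announced
            continue
        match = HUNK_HEADER.match(line)
        if match:
            old_left = int(match.group(2) or 1)
            new_left = int(match.group(4) or 1)
    return None
```

An empty context line that lost its leading space, as in the patch discussed here, is reported at the line where it occurs, because the hunk header says the hunk has not ended yet.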
| → Beber` joined | 19:02 |
|
gittus
| The fact is, git is perfectly happy to apply patches generated by other systems, but git refuses to apply patches that have technical problems. | 19:02 |
|
| GNU patch (and probably some other patch applicator programs) applies any random crap it can. | 19:03 |
|
| Anyway, just thought I'd set the record straight. "git-apply" _is_ very anal, but that's very much on purpose. | 19:04 |
| → dwmw2_gone joined | 19:04 |
|
gittus
| Unlike apparently a lot of other programs, git cares /deeply/ about data integrity at all levels. | 19:04 |
| → beu joined | 19:05 |
|
gittus
| "Guessing" what the correct thing should be is simply not acceptable. | 19:05 |
| dwmw2_gone → dwmw2 | 19:08 |
| → Trigger7 joined | 19:12 |
|
spearce
| Yea but guessing is damn convenient sometimes. Like when the patch is corrupt. :) | 19:13 |
|
gittus
| spearce: sure. Guessing is convenient. But /not/ guessing means that you know that you can trust the end result. | 19:14 |
|
| I'll take "trust it" over "convenience" any day when it comes to git. | 19:14 |
|
| Besides, it has side effects too. In particular, GNU patch has allowed people to perpetuate corruptions, because things still "work" | 19:15 |
|
| So you can have bad email clients or cut-and-paste crap that removes spaces at the end of lines, and nobody necessarily even notices. | 19:15 |
|
| In contrast, when you're strict, and notice, maybe it's painful right then and there, but you can /fix/ the problem. | 19:15 |
|
spearce
| I don't disagree. But it is nice when you can still save the patch without editing it by applying it, cleaning up any rejects if any, and test the hell out of the result. | 19:15 |
|
gittus
| And that means that not only is git-apply then more trustworthy, so is your email client. | 19:16 |
|
spearce
| It's like if Git made it impossible to read a partially corrupt pack... you'd lose the entire pack if you couldn't read it because of a single bit error in one object. | 19:16 |
|
gittus
| but that's exactly what you _want_ | 19:17 |
|
| You may have a git-fsck-objects or something that tries to fix things up, or a "git-extract-as-much-as-possible", or a "--unsafe" flag or similar. | 19:17 |
|
| But you should _refuse_ to touch anything that you notice is corrupt. Which is exactly what git does. | 19:17 |
|
| In fact, the biggest problem with the recent bit corruption was not that git didn't refuse to touch it (it did, eventually), | 19:18 |
|
spearce
| I completely agree. But right now git-apply doesn't have a --unsafe flag. :) | 19:18 |
|
gittus
| but that it could have noticed much earlier. | 19:18 |
|
| Actually, git-apply has /several/ "--unsafe" flags, but for _other_ kinds of corruption. | 19:18 |
|
| See "-pNUM" and "--whitespace=strip" and friends. | 19:19 |
|
| Now, if somebody wants to add a "--accept-crap" flag, please send the patch to Junio, but make sure it's not on by default. | 19:19 |
|
| Personally, I want to _know_ about it when somebody sends me crap, so that I can try to make sure that it gets fixed. | 19:20 |
|
spearce
| Needing -pNUM isn't corruption, it's a user who didn't generate the patch correctly for how Git would prefer to apply it. --whitespace=* is looking for weird whitespace corruption but not always, e.g. if you are working with DOS formatted files that whitespace stuff doesn't like the CRs on the end of lines. | 19:20 |
|
gittus
| No, "-pNUM" _is_ about corruption. It means that git-apply will accept a patch even if it has "fuzz". | 19:21 |
|
| That's strictly speaking a corrupt patch that doesn't apply any more. | 19:21 |
|
spearce
| ah, i thought it was the leading directory stripping thing like patch's -pNUM flag. | 19:21 |
|
gittus
| Oh, damn, you're right. The fuzz thing is -C<num>. Sorry, my bad. | 19:22 |
|
spearce
| Fuzz is corruption yes, assuming you found a reasonable ancestor to apply the damn patch to. But iirc git-am doesn't search as hard as it could for a possible base to apply that patch to, especially if it's not a Git generated patch which lacks blob sha1's. Fuzz may be necessary to get the damn patch to apply to whatever git-am thinks is the right base. In which case it's git-am inducing the corruption, not the patch itself... which is still corrupt | 19:23 |
| → Eludias joined | 19:25 |
|
GyrosGeier
| hm | 19:25 |
|
| is there a tool these days that I can tell "apply this patch to the most recent thing you can find where it applies cleanly, and form a new branch from that?" (optionally specifying the branch it must be on) | 19:26 |
|
spearce
| git-checkout -b tmp; git-am patch ? | 19:27 |
|
| git-am is the best guesser we have but it doesn't automatically form the branch and it doesn't guess as well as it could (of course more guessing would take longer, but it's already pretty darn fast, so...) | 19:28 |
|
gittus
| spearce: I think GG was talking about finding that most recent point automatically (ie walking backwards until it applies cleanly) | 19:28 |
|
GyrosGeier
| exactly | 19:28 |
|
spearce
| If you read git-am I think it tries that, but it only tries the 5 most recent labels or something... | 19:29 |
|
GyrosGeier
| I have a bunch of old Linux trees that contain changes I once made for some piece of hardware | 19:30 |
|
spearce
| Hmm, that code was removed from git-am. Sorry. | 19:30 |
|
GyrosGeier
| I think I could turn most of them into patches, and there are repos of old trees | 19:31 |
|
spearce
| Probably because it didn't work as well as it could have. | 19:31 |
|
gittus
| GyrosGeier: it would be pretty expensive with old-fashioned diffs to do anything but a small number of tests, but it might not be _too_ bad if you had a real git diff with the SHA1 names listed explicitly, and then did a script that basically did "git-ls-tree" on the commits (with the appropriate files listed) to find a tree that matches in all files | 19:31 |
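A rough sketch of the SHA1-based idea, assuming a git-format patch whose `index <old>..<new>` lines carry the pre-image blob ids: collect those ids per path, then compare them with `git ls-tree -r` output for a candidate commit. The function names are hypothetical, unusual (quoted) path names are ignored, and this illustrates the suggestion only; it is not an existing git tool.

```python
import re
import subprocess

DIFF_GIT = re.compile(r"^diff --git a/(.+) b/(.+)$")
INDEX = re.compile(r"^index ([0-9a-f]+)\.\.([0-9a-f]+)")


def preimage_blobs(patch_text):
    """Map each path in a git-format patch to its (abbreviated) pre-image blob id."""
    blobs, path = {}, None
    for line in patch_text.splitlines():
        m = DIFF_GIT.match(line)
        if m:
            path = m.group(1)
            continue
        m = INDEX.match(line)
        if m and path is not None:
            blobs[path] = m.group(1)
            path = None
    return blobs


def parse_ls_tree(output):
    """Turn '<mode> <type> <sha>\\t<path>' lines into a {path: sha} dict."""
    entries = {}
    for line in output.splitlines():
        meta, _, path = line.partition("\t")
        fields = meta.split()
        if len(fields) == 3:
            entries[path] = fields[2]
    return entries


def commit_matches(entries, wanted):
    """True if every path has the blob id the patch expects as its pre-image."""
    for path, prefix in wanted.items():
        if set(prefix) == {"0"}:          # all-zero id: the patch creates the file
            if path in entries:
                return False
        elif not entries.get(path, "").startswith(prefix):
            return False
    return True


def ls_tree(commit, paths):
    """Ask git for the blob ids of the given paths in one commit."""
    out = subprocess.run(
        ["git", "ls-tree", "-r", commit, "--", *paths],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ls_tree(out)
```

A caller would loop over candidate commits and keep the first one for which `commit_matches(ls_tree(commit, list(wanted)), wanted)` holds.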
|
GyrosGeier
| gittus, well, I'd try one file, if it matched, the next one and so on. | 19:31 |
|
| gittus, if a test fails, we can rule out that SHA1 for this patch | 19:32 |
|
| *path | 19:32 |
|
gittus
| Sure, that would also work. | 19:32 |
|
| It would probably be pretty useful even _without_ actually applying the patch itself, ie I can imagine that you'd want to ask just "what version does this patch apply cleanly to" without even applying it. | 19:33 |
|
GyrosGeier
| yep | 19:33 |
|
| do we have a tool to branch and apply in one step, to complement that? | 19:33 |
|
gittus
| But it would be more useful with just regular patches too, and that's much more involved. | 19:33 |
|
| GG: nope. | 19:33 |
|
GyrosGeier
| gittus, I'm talking about regular patches | 19:33 |
|
gittus
| With regular patches, you want to be smarter, but it's still possible to be efficient. What you'd do is: | 19:34 |
|
| - try to apply it in the top-of-tree | 19:34 |
|
GyrosGeier
| gittus, a patch is a list of before/after chunks | 19:34 |
|
gittus
| - if it doesn't work, just do a "git-rev-list HEAD -- <list-of-all-files>" to get only the list of commits that actually _changed_ any of those files | 19:35 |
|
| - then try each of those in turn. | 19:35 |
|
| That should be reasonably efficient (ie you'd not have to do a _lot_ of unnecessary patch application, and you'd let git do the SHA1-based optimization for when files don't change) | 19:35 |
|
| If you teach "git-apply" to take the source from a "git revision + pathname", you'd be able to test whether a patch applies or not without actually ever having to check everything out, too. | 19:37 |
|
| A small matter of programming. "cat patch | git-apply --test <revision>" and then check the return value. | 19:37 |
|
| Something like that. | 19:37 |
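The steps gittus lists could be sketched roughly as follows. Since the `git-apply --test <revision>` mode he describes is hypothetical, this sketch substitutes something else: it loads each candidate into a throwaway index via `GIT_INDEX_FILE` and runs `git apply --cached --check`, which assumes a git version that supports those options. The names `first_applying`, `candidate_commits`, and `applies_to` are illustrative.

```python
import os
import subprocess
import tempfile


def first_applying(candidates, applies):
    """Return the first candidate for which applies(candidate) is true, else None."""
    for commit in candidates:
        if applies(commit):
            return commit
    return None


def candidate_commits(paths, start="HEAD"):
    """Top of tree first, then only the commits that changed any of the paths."""
    out = subprocess.run(
        ["git", "rev-list", start, "--", *paths],
        check=True, capture_output=True, text=True,
    ).stdout
    return [start] + out.split()


def applies_to(commit, patch_file):
    """Check the patch against one commit without touching the working tree."""
    with tempfile.TemporaryDirectory() as tmp:
        # A temporary index stands in for the hypothetical "--test <revision>".
        env = dict(os.environ, GIT_INDEX_FILE=os.path.join(tmp, "index"))
        subprocess.run(["git", "read-tree", commit], check=True, env=env)
        result = subprocess.run(
            ["git", "apply", "--cached", "--check", patch_file],
            env=env, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0
```

Usage would look like `first_applying(candidate_commits(files), lambda c: applies_to(c, "fix.patch"))`, where `files` and `fix.patch` are placeholders. Reading a whole tree into an index per candidate is the simple route here, not necessarily the cheapest one.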
|
spearce
| Doesn't git-apply actually work off the index right now, so you don't even need to have it checked out to the working directory, just loaded into a temporary index. | 19:37 |
|
GyrosGeier
| gittus, I'd use the "before" chunks to find candidate blobs for the first file, then look at the list of commits that contain one of these at this path, then repeat for the next file in the patch, starting on the latest commit where file 1 matched | 19:38 |
|
gittus
| Not really. "git-apply --index" _verifies_ and _updates_ the index, but it will still get the actual _data_ from the currently checked out file. | 19:38 |
|
| spearce: plus, you'd not actually want to even read things into the index. "git-read-tree" is fairly expensive, because it (by definition) needs to recursively read _everything_. If you'd just read the data directly only for the files you want, that would be a lot more efficient. | 19:39 |
|
GyrosGeier
| gittus, that requires patch splitting code, but I'd expect it to be roughly O(n*log n) in the number of files touched by the patch | 19:39 |
|
gittus
| GG: sure. Whatever works. | 19:40 |
|
| I'd suggest using git-apply to help you, though. It already does almost all of the work, the only thing you'd need to extend it with is that "try to apply to a specific tree" thing. | 19:41 |
|
| Once you have that, git-apply would just tell you exactly which files do _not_ match, and you can use any heuristic you want to find the next commit. | 19:41 |
|
| However, I really do believe that you'd be better off using _all_ pathnames (from _all_ current heads), because if you use one path at a time, and it _does_ apply in that path, what will you then use as the set of starting points for the other paths (that _don't_ apply)? | 19:42 |
|
| Using just the last "this worked for file<n-1>" commit would be wrong, because that might be in a branch off the mainline (or off the series that the patch _ever_ applies to). | 19:43 |
|
GyrosGeier
| gittus, of course | 19:43 |
|
gittus
| See? You need to follow the _full_ history here, for _all_ paths. | 19:43 |
| → robin joined | 19:44 |
|
gittus
| But git really does have fairly good support for this. "git-rev-list --all -- <set-of-filenames>" really should give you exactly what you want (except you do want to check the current head first). | 19:44 |
| dwmw2 → dwmw2_lunch | 19:44 |
|
GyrosGeier
| gittus, essentially I have a list of blobs where patch 1 applies (I don't need to realize the list in memory), for each I have a list of commits where it is used at this path | 19:45 |
|
| gittus, then I can do a depth search | 19:45 |
|
| gittus, optionally reordering the patch files to get smaller sets | 19:46 |
|
gittus
| GG: I'm sure you can make it work somehow. But you need to then use the _full_ list of commits, because some commits will change fileA but not fileB, and when you do the final "which commits support _all_ those conditions" decision, you'll otherwise be in trouble. | 19:48 |
|
| So I'm still claiming that you're better off using "git-rev-list -- <filelist>" to get the list of interesting commits _first_. And once you do that, you might as well just use that list to test (one commit at a time) | 19:48 |
|
| But hey, the proof is in the pudding. I'm not actually going to write this, and if you will, you can do it any which way you want | 19:49 |
|
| "He who writes the code gets the final say" | 19:49 |
|
| Anyway, I'm off. I only piped up because I happened to look through the logs again. | 19:49 |
|
| Have fun | 19:49 |
| → ruskie joined | 19:53 |
| → somegeek joined | 20:01 |
| → kanru joined | 20:20 |
| ← Gitzilla left | 20:21 |
| → Gitzilla joined | 20:21 |
| → anholt_ joined | 20:37 |
| dwmw2_lunch → dwmw2 | 21:03 |
| → somegeek joined | 21:55 |
| → mchehab joined | 22:12 |
| ← mchehab left | 22:14 |
| → Oejet joined | 22:21 |
| → krh joined | 22:34 |
| ← Oejet left | 22:57 |
| → agile joined | 22:58 |
| → boto joined | 23:15 |