12

... repost within 10 minutes, apparently the image filter that is supposed to block these isn't working.

Comments
  • 2
    I think you have to downvote
  • 2
    @chadd17 who says I didn't after I screenshotted? But that wasn't the feature I was saying was broken. A few months ago @dfox added a hash check on images that was supposed to stop reposts that close together. So either it got removed somehow, or there's a bug that can be exploited through the database's eventual consistency, seeing how quickly they were posted.
  • 5
    It’s also possible the images aren’t exactly the same and therefore have a different hash
  • 1
    @jckimble the hashes don't match, just checked.
  • 0
    @JoshBent hmm, don't know how, they look the same. Nvm then. Wonder if they would match if rebuilt with the PHP GD library.
  • 1
    @jckimble not all changes are visible, could be compression, manipulated by the publisher (changed 1 pixel, ..) etc.
  • 1
    It’s pretty easy to alter an MD5 hash. That’s why checking hashes exists in the first place. If the slightest thing is changed, the hash won’t match up and you know something is up. A better way of checking for duplicate images would be to actually analyze the image via a neural network, but that’s pretty resource intensive.
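
    To illustrate the point above, here is a minimal sketch showing that flipping a single bit changes an MD5 digest completely. The byte string is a made-up stand-in for an uploaded file, and MD5 is used only because this comment names it; nothing here is confirmed to be what devRant runs.

```python
import hashlib

# Hypothetical stand-in for the raw bytes of an uploaded image file.
original = bytes(range(256)) * 4

# Copy the data and flip one bit in one byte, like a one-pixel tweak.
tweaked = bytearray(original)
tweaked[100] ^= 0x01
tweaked = bytes(tweaked)

digest_original = hashlib.md5(original).hexdigest()
digest_tweaked = hashlib.md5(tweaked).hexdigest()

print(digest_original)
print(digest_tweaked)
print("match:", digest_original == digest_tweaked)  # match: False
```

    The two inputs differ by one bit out of 8192, yet an exact-hash comparison treats them as unrelated files.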
  • 2
    @dfox I've been seeing the same issue lately, this makes at least 3 cases so far
  • 5
    What @dev0urer @JoshBent said.

    It's not broken and it does exactly what it was advertised as. It compares image hashes. That also means if anything about the images is different, e.g. size, compression, or possibly even just how different devices save them (many conditions not visible to the eye), then the hashes won't collide. That's simply how the hashes work, and I tried to make clear we weren't doing anything besides a simple hash check.

    And it does work, but that doesn't mean a bunch of images won't trigger it. We have a stat on how often it gets shown and there's also been rants posted by people who have gotten the alert for their own uploads.

    As the announcement post also explains, I've looked into deploying other methods aside from strict hash collisions. The main one was phash (perceptual hash), but it was terrible. I also solicited feedback and ideas from users but honestly didn't get anything too compelling in terms of improving the detection method, and the reality is image detection is hard.

    devRant is growing and there will always be reposts. From everything I observe, our filtering and voting methods seem to do pretty well in limiting them. As I've always said, the recent feed is not meant to be heavily filtered and you should expect to see things you don't want to see there sometimes. That's just the nature of a community that doesn't try to actively censor content and relies entirely on user-generated posts.

    We're always open to feedback and ideas for limiting reposts, and we do have a number of things coming soon which we think will help with that (the main one being rant types and rant type filtering)
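
    For readers unfamiliar with the perceptual-hash idea mentioned above, here is a rough sketch of an "average hash", a simpler relative of pHash. It is not the phash implementation that was tried on devRant. It assumes the image has already been reduced to an 8x8 grayscale grid, and the sample grids are invented for illustration.

```python
def average_hash(pixels):
    """Return a 64-bit integer: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits

def hamming(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")

# Hypothetical 8x8 grayscale images (values 0-255).
original = [[10 * x + 20 * y + 5 for x in range(8)] for y in range(8)]
brighter = [[v + 10 for v in row] for row in original]   # same picture, uniformly brighter
inverted = [[255 - v for v in row] for row in original]  # a clearly different picture

print(hamming(average_hash(original), average_hash(brighter)))  # 0
print(hamming(average_hash(original), average_hash(inverted)))  # 64
```

    Unlike a strict hash, the result is compared by distance, so a threshold has to be picked, and picking it badly gives either false alerts or missed reposts.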
  • 2
    @jpichardo while we definitely want to get the feeds as perfect as possible and use our algorithm to eliminate bad content, 3 incidents of reposts, even in a decently short time span, really isn't a lot IMO for a community with hundreds of posts per day and 35,000+ registered users.
  • 2
    @dfox sounds good, I really was just making sure because of how close together the posts were. It's possible, but unlikely, that they came from different places. As for a way to check the images: redrawing uploads with PHP and making a hash from that would fix encoding and format differences, but I'm not sure what kind of extra load it would add to the servers.
  • 1
    @jckimble I'm still not sure though how that would be able to detect the same image. What you say sounds interesting and I might be missing something about it - do you have an example somewhere of the process?

    Resource-wise, rant posts with image uploads occur at a low enough frequency that we could pretty much run anything on them.
  • 2
    @dfox I can write a quick script for it and put it on a gist for you to look at, since the last time I did it was for a client to save space on image uploads. While it's not perfect, it cleans out everything besides the actual image, so nothing matters but the image content. So generally, unless somebody actively tries to get around it, it should filter repost images.
  • 2
    @dfox https://gist.github.com/jckimble/...

    I tested it by downloading a meme and converting it with ImageMagick, and it always showed as the same image unless I tried it with another image. As long as you don't think it will put too much load on the servers, it should prevent accidental reposts.
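
    The gist itself is not reproduced here. As a rough stand-in for the general idea of hashing decoded image content instead of file bytes, the sketch below stores the same made-up pixel data at two zlib compression levels (PNG files compress their pixel data with the same algorithm). It is an assumption-laden illustration, not the PHP GD approach from the gist.

```python
import hashlib
import zlib

# Hypothetical raw pixel data: 16x16 grayscale, one byte per pixel.
pixels = bytes((x * y) % 256 for y in range(16) for x in range(16))

# Two "files" holding identical pixels, saved with different compression settings.
file_a = zlib.compress(pixels, 1)
file_b = zlib.compress(pixels, 9)

def file_hash(data: bytes) -> str:
    """Hash the stored bytes as-is (the strict check)."""
    return hashlib.md5(data).hexdigest()

def content_hash(data: bytes) -> str:
    """Decode first, then hash only the pixel content."""
    return hashlib.md5(zlib.decompress(data)).hexdigest()

print("file hashes match:   ", file_hash(file_a) == file_hash(file_b))
print("content hashes match:", content_hash(file_a) == content_hash(file_b))
```

    This only helps with lossless re-encoding. A lossy re-save or a resize changes the pixels themselves, so the content hash would differ too.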
  • 1
    @dfox thanks for the response. My concern wasn't about 3 isolated cases, but that it was happening constantly, since those cases appeared in the last week. But never mind, I'm pretty sure that you are doing all you can to make this community great. THANKS
  • 1
    @jckimble awesome, thanks! I'll do some testing with that and see how it works.

    @jpichardo I see. Yeah, I mean a lot definitely slip by. I'd say it maybe catches like 25% now. Hopefully we can raise that number if @jckimble's method works well for it.
  • 1
    @dfox It's been over a year since I had to write something like this, but if you catch any problems let me know and I'll see if I can fix them.
  • 1
    What about scaling the image up and down again? It will have a lower resolution, which leads to a not-that-strict image comparer, because the hash will be the same even for slightly different images.

    ImageMagick shell command: convert $TMPBG -scale 10% -scale 1000% $TMPBG (stole from a cool lock screen, http://tinyurl.com/ocbjpom, percentage is a bit extreme, maybe just 100% to 1000% would be good), the ImageMagick PHP library can do that too.
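
    A pure-Python sketch of the scale-down-then-up idea, roughly mimicking block averaging followed by pixel replication. The grids, the factor of 4, and the function names are all made up for illustration, and the code assumes the image dimensions are divisible by the factor.

```python
import hashlib

def scale_down(pixels, factor):
    """Average each factor x factor block into one pixel (integer division)."""
    h, w = len(pixels), len(pixels[0])
    area = factor * factor
    return [
        [
            sum(pixels[y + dy][x + dx] for dy in range(factor) for dx in range(factor)) // area
            for x in range(0, w, factor)
        ]
        for y in range(0, h, factor)
    ]

def scale_up(pixels, factor):
    """Repeat every pixel factor times horizontally and vertically."""
    out = []
    for row in pixels:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def blurred_hash(pixels, factor=4):
    """Hash the image after it has been scaled down and back up."""
    coarse = scale_up(scale_down(pixels, factor), factor)
    return hashlib.md5(bytes(v for row in coarse for v in row)).hexdigest()

# Hypothetical 8x8 grayscale images (values 0-255).
original = [[10 * x + 20 * y + 5 for x in range(8)] for y in range(8)]
tweaked = [row[:] for row in original]
tweaked[2][3] += 3                                       # one slightly altered pixel
inverted = [[255 - v for v in row] for row in original]  # a different picture

print(blurred_hash(original) == blurred_hash(tweaked))   # True
print(blurred_hash(original) == blurred_hash(inverted))  # False
```

    The match is not guaranteed in general: a small change can still push a block average across an integer boundary, in which case the hashes differ again.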