/hydrus/ - Hydrus Network

Archive for bug reports, feature requests, and other discussion for the hydrus network.


Big things to work on next hydrus_dev 10/31/2018 (Wed) 21:34:04 Id: a13630 No. 10429
With the download engine and login manager coming to a close, I will need something new to be anxious about and near-overwhelmed by. I will put up a poll in a few weeks for everyone to vote on a big list of possible new features that are too large to fit into my normal weekly work. The poll will allow you to vote on multiple items. I hope to work on the most voted-on item for two to three months before starting the cycle again.

This thread is for discussion of the list, which at current looks like this:

- Just catch up on small work for a couple of months
- Improve tag siblings/parents and tag censorship
- Reduce crashes and ui jitter and hanging by improving ui-db async code
- Speed up tagging workflow and autocomplete results
- Add ways to display files in ways other than thumbnails (like 'details' view in file explorers)
- Add text and html support
- Add Ugoira support (including optional mp4/webm conversion)
- Add CBZ/CBR support (including framework for multi-page format)
- Add import any file support (giving it 'unknown' mime)
- Improve 'known urls' searching and management
- Explore a prototype for neural net auto-tagging
- Add support for playing audio for audio and video files
- Add OR file search logic
- Add an interface for waifu2x and other file converters/processors
- Write some ui to allow selecting thumbnails with a dragged bounding box
- Add popular/favourite tag cloud controls for better 'browsing' search
- Improve the client's local booru
- Improve duplicate db storage and filter workflow (need this first before alternate files support)
- Improve shortcut customisation, including mouse shortcuts
- Import/export ratings, and add 'rating import options' to auto-rate imports
- Add more commands to the undo system
- Improve display of very large/zoomed files in the media viewer
- Set thumbnail border colours on user-editable rating and namespace conditions
- Improve hydrus network encryption with client cert management and associated ui
- Add tag metadata (private sort order, presentation options, tag description/wiki support)
- Write a repository-client refresh/resync routine to clear out junk data and save space
- Prototype a client api for external scripts/programs to access
- Support streaming file search results (rather than loading them all at once once the whole query is done)
- Increase thumbnail size limit (currently 200x200)
- Add an optional system to record why files are being deleted
- Improve file lookup scripts and add mass auto-lookup
- Cleanup code and improve practises
- Add multiple local file services
- Add an incremental number tagging dialog for thumbnails

I am happy to work on any of these items. If you have questions, please ask, and if you have suggestions for new items, go ahead.
>>10541 I can add FLIF in easy short work as soon as PIL or OpenCV add support, which I don't think they have done yet. Or, if someone can point me to a good, non-meme pypi FLIF library that can do some version of GetResolutionAndOtherMetashit( path ) and numpy_array = GetRGBPixels( image ). As long as someone else does the decoding work, it is only about twenty lines of work on my end.
>>10544 Thank you. We'll see how this vote shakes out, but either way, if you are still keen to do work on something like this, I'd love to outsource the expert pain-in-the-ass part so I can focus on building a workflow in the hydrus ui. I'm still a sperg about collaborating, but any sort of library that made parts of this easy-peasy would be very welcome.

I guess we are probably talking two(?) components:

1) Given a model, what tags are suggested for this image?
2) Given tagged images and maybe some human interaction, how to make a model?

Although I presume we are also talking about some shared interface layer and whatever else is needed. Since we already have the i2v model, if you made a library that did the grunt work of 1, I could probably integrate that into a new column in the tag suggestions stuff in regular weekly work. 2 would need to be in 'big work' and would need more emails/posts back and forth to figure out what workflow and calls the library would need. I don't know much about this, so any thoughts you have on making this stuff real are welcome.
>>10548 >[…] just imported as blind zips right now. Can you please enable ugoira download for the pixiv downloader? That way we could already start hoarding properly. All the current ugoiras have the animation.json file, which starts with the key "ugokuIllustData", so that would be the 100% accurate way for recognition. Though some people might have older ugoiras that only have the 6-digit numbered jpgs or pngs in the zip file. I guess those might be confused with zip files containing comics, so it might be good to have a way to manually change the handling (animation/book) for those. Since old ugoiras don't have any frame duration information included in the zip, being able to set the frame rate manually would be good in that case.
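The recognition rule described in this post could be sketched roughly as below. This is only an illustration of the post's description, not anything hydrus does: the function name `classify_zip` and the three result labels are made up here, and it assumes the zip really does carry an animation.json whose top-level key is "ugokuIllustData".

```python
import json
import re
import zipfile

# Six-digit numbered frames such as 000001.jpg, as described in the post.
FRAME_NAME = re.compile(r"^\d{6}\.(jpg|jpeg|png)$", re.IGNORECASE)


def classify_zip(zip_source):
    """Return 'ugoira', 'maybe_ugoira', or 'unknown' for a zip path or file object.

    Hypothetical helper: the labels are illustrative, not hydrus terminology.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Ignore directory entries, keep only real files.
        names = [n for n in archive.namelist() if not n.endswith("/")]

        if "animation.json" in names:
            try:
                metadata = json.loads(archive.read("animation.json"))
            except ValueError:
                metadata = None
            if isinstance(metadata, dict) and "ugokuIllustData" in metadata:
                return "ugoira"

        # Older zips: nothing but numbered frames. A comic zip could match
        # this too, so the result should stay manually overridable.
        if names and all(FRAME_NAME.match(n) for n in names):
            return "maybe_ugoira"

    return "unknown"
```

The 'maybe_ugoira' case is where the manual animation/book switch and the manual frame rate asked for above would come in, since the zip alone cannot settle either.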
>>10552 This is a tough one. I have just looked at the problem again. The link we are currently using to get pixiv metadata is this:

https://www.pixiv.net/touch/ajax/illust/details?illust_id=71528360

(for page https://www.pixiv.net/member_illust.php?mode=medium&illust_id=71528360, which is a recent post)

It provides a JSON-less zip:

https://i.pximg.net/img-zip-ugoira/img/2018/11/06/05/48/45/71528360_ugoira600x600.zip

with the frame timings embedded in the API JSON. My new downloader isn't clever enough to synthesise new files from multiple sources of data, so grabbing the zip and inserting some frame timing json would require a more significant add-on, which I would expect to write as part of adding ugoira support. It isn't something I can do quickly.

As it happens, I was looking at how danbooru do ugoiras, and the couple of ugoira zips I downloaded from them didn't have frame timing JSON in the zip either. I wonder if they are just pulling the zip file and using some flat 25ms or something for their webm conversion?

Am I talking rubbish here? Do some pixiv zip links have the animation.json in them, and I just missed them? Do pixiv ugoira pages link to different zips anywhere, and the API is just using different stuff?
>>10560 hey goy. is there support for choosing UI font and fontsize? if not then will you add? ty
>>10551 Alright, I'll try to infodump what each of those would require.

i2v is a multilabel classifier: you can give it an image and it will give you the confidence for a bunch of tags (1539 of them, examples 'yuu-gi-ou zexal', 'tokyo ghoul', 'kin-iro mosaic', 'safe'). The other kind of model is a binary classifier, which only gives you one tag at a time. Either way, you feed it an image and get back a number from 0 to 1 for each tag, and you get to decide what the cutoff is.

The model itself is stored in a large-ish file for the weights. For example, the weight file for i2v is 180 MB and doesn't compress much. This isn't tiny, but it's on the small side compared to some more powerful models. Loading the model takes about 0.8s on my machine, and classifying one image takes about 0.33s.

The steps to build a model from scratch are:

>Decide on the architecture
This includes describing the various layers, and deciding how many tags you want to look for.

>Gather training data
The amount of data you need depends on how "simple" the tag you want to find is, and how similar images with and without the tag are. A few hundred images is probably enough to train some easy tags, and a few thousand should be able to handle harder ones.

>Run training
This involves letting your computer run full blast for a bit while it does a bunch of linear algebra on the images. GPUs make this much faster. It depends on the amount of data we use, but I'd expect most models worth training to take an hour of GPU, or maybe 10 hours of CPU (very rough estimate).

There are tricks you can do to let everyone help out training a single massive model, but that's a technical and logistical nightmare.

There's a trick you can do called "transfer learning" which lets you piggyback off a model you already have. It might be possible to use this to add tags to i2v that aren't in the basic list. This would produce a small model (that still requires the larger one to work) and would take less time to train, but it's limited to things that are similar to what i2v was trained on originally.
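To make the "you get to decide the cutoff" part concrete, here is a minimal sketch of turning per-tag scores into suggestions. The function name `suggest_tags`, the default cutoff of 0.5, and the example scores are all made up for illustration; only the tag names come from the post.

```python
def suggest_tags(scores, cutoff=0.5):
    """Return (tag, score) pairs at or above the cutoff, highest score first.

    `scores` is a dict of tag -> confidence in [0, 1], the kind of output a
    multilabel classifier gives for one image.
    """
    kept = [(tag, score) for tag, score in scores.items() if score >= cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Invented scores, purely for illustration.
example_scores = {"safe": 0.93, "tokyo ghoul": 0.61, "kin-iro mosaic": 0.04}
print(suggest_tags(example_scores, cutoff=0.5))
```

A lower cutoff gives more suggestions with more false positives, a higher one gives fewer but safer ones, so a per-user (or per-tag) setting is probably the sensible thing to expose.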
>>10551 For case 1, I've got a pretty basic file that runs i2v. Loading code, tag list, weight file are at

>https://github.com/antonpaquin/Hydrus-Autotagging/blob/master/illust2vec-flask/illust2vec.py
>https://github.com/antonpaquin/Hydrus-Autotagging/blob/master/illust2vec-flask/tag_list.json
>https://github.com/antonpaquin/Hydrus-Autotagging/releases/download/0.1/illust2vec.h5

This will take a PIL image in and give you a dict of {"tag": score} out. This is probably enough to power the first component, and you can probably reverse engineer enough to not have to use my code at all.

One possible way to handle case 2: I could build a thing that takes N images with a tag, and N images without the tag, and builds a classifier for that. There's a lot of potential for change here, but I think that's the simplest form.
>>10429 Maybe an overhaul of the tutorials in the help section on the hydrusnetwork site would be my only request. There's a lot to learn about the various features in hydrus that just isn't there at the moment. I don't know, maybe let other people contribute their own tutorials if you're too busy and all.
>>10560 That's a bummer. I used the Px Downloader add-on to download the ones that include the json: https://rndomhack.com/2016/01/15/px-downloader/ I wonder if it actually re-packs them? I made sure to disable ugoira conversion in its settings, which is why I was sure it wouldn't change the original file. I will dig around some more and see if I can find more info.
>- Add import any file support (giving it 'unknown' mime) Absence of this (and the ability to store original file name) is the main reason why I haven't considered moving to Hydrus just yet.
>>10569 you do know that you can add any arbitrary namespace right? so filename:<name> is possible. people are doing this.
Improve the client's local booru, at least tag search. Prototype a client api for external scripts/programs to access.
>>10457 Is there any way to keep tags out of the PTR if you're using it? Or is there any way to make sure you aren't committing to it?
>>10572 you have to set up the ptr if you want it, hydrus does not come with it preinstalled, so just don't install it. Also if you do set it up you have to approve tag uploads, so you can be sure your tags stay your own.
>>10541 I keep all my thumbs on an nvme ssd along with the database. I'd much rather have it this way than have the hdd getting hammered looking for a few hundred images to generate thumbs for. That said, if flif compresses the thumbs better than jpeg, that would be greatly appreciated.

--------

Ok hdev, I remembered something that was brought up a while ago that needs to be improved. The duplicate detector has to have either a mode or a setting that lets you see already known pairs. One problem we discovered a while back was that if you import a duplicate, run the dup detector, and then the file somehow gets re-imported, you will never be told the duplicate is back.

Then there is also something I asked about a while ago with duplicates: one being a 'contender mode' and one being 'prefer alternate'.

Contender mode is simple. You have an image, and you have determined it is the better of two images. This one and all its other potential dups get taken out of normal duplicate processing and pushed to a second one, because you now have a known better image. This way you could quick-filter all the contender images, getting rid of all the images that are lower resolution or file size, and only needing to go through the ones that are potentially better rather than all potential candidates. I'm not so much thinking this will be used in the first go-around with dup processing, but for every subsequent one, it's a good bet that you will use it to weed out the junk.

Now the reason a filter in and of itself is not good is simply that the resolution or file size of an unknown image is not a good way to determine if the file is good. Someone on 4chan hated a thread, so he bloated every image out and made them fuzzy and undesirable for weeks/months trying to kill the thread, and his work comes up time and time again in dups for images. Just looking at file size or resolution would save that junk and remove the good image. Contender mode would get rid of that problem because there is already a known good version of the image, and you are only looking at higher resolution versions or higher file size versions.

And finally 'prefer alternate'. It doesn't need to be a mapped choice, it just needs to be a choice. I have several artists I like who decided to make 20 10mb images that are all the same with just small changes, and of them I may want to keep 2 or 3 images, so a prefer alternate option would allow me to mark one for deletion while knowing I don't have it, but I do have a similar one I liked more. This is kind of a useless thing for most people but would be helpful for me along with the "Add an optional system to record why files are being deleted" system.
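As a rough sketch of the contender mode idea above: once a known-best file exists, only candidates that are larger in resolution or file size need a human look, and the rest can be dismissed in bulk. The names `FileInfo` and `split_contenders` are invented for this example and do not correspond to anything in the hydrus code or database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Minimal stand-in for a file record (hypothetical, not a hydrus type)."""
    name: str
    width: int
    height: int
    size_bytes: int


def split_contenders(known_best, candidates):
    """Split candidates into (needs_review, auto_dismissable).

    A candidate needs review only if it has more pixels or more bytes than
    the known-best file. Bigger is not automatically better (bloated
    re-saves exist), which is why those still go to a human.
    """
    best_pixels = known_best.width * known_best.height
    needs_review, auto_dismissable = [], []
    for candidate in candidates:
        pixels = candidate.width * candidate.height
        if pixels > best_pixels or candidate.size_bytes > known_best.size_bytes:
            needs_review.append(candidate)
        else:
            auto_dismissable.append(candidate)
    return needs_review, auto_dismissable
```

Note that a bloated re-save still lands in the review pile here. The point is only that everything smaller than the known-good file no longer needs looking at.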
>>10541 Didn't FLIF die or something? I think they stopped working on it.
>>10592 No, the faggots are at gitter talking about how to get it to the mainstream when BPG/HEIF beat them fair and square for compatibility.
That moment when you read the help about hydrus being able to control and manage files without importing, but still don't understand a word. How do you do this? I mean, how do you tell hydrus to manage those files WITHOUT importing? Also, how can you make hydrus create a subscription or follow a tag on image boards in general, so it automatically imports new images from the internet?
>>10596 >[…]BPG/HEIF beat them fair and square for compatibility. Maybe compatibility for hardware decoding, but not software compatibility, which is far more important. No big websites will want to support these formats, because they're containers for HEVC intra frames, which is a licensing/patent nightmare. Google developed their own codecs to avoid patent fees, so they will probably not want these formats supported in Chrome. Same for Mozilla, who are pushing for the AV1 codec, so they would probably add support for AVIF long before they cave in to support HEVC based image formats. FLIF meanwhile doesn't require licensing and doesn't appear to cause any patent conflicts so far. So I wouldn't call it dead yet.
>>10561 At the moment, it should pull whatever your OS defaults are, I think for both font and size. I don't set anything specifically atm, afaik. I am not a big fan of themes and making things pretty (as you can probably tell!), so I struggle to revisit ui to neaten it up once I get bare functionality going. I am not against the idea of adding font customisation, but I think I would have to do a bunch of code and ui cleanup first.
>>10601 > FLIF meanwhile doesn't require licensing and doesn't appear to cause any patent conflicts so far. This time, not being licensed will make it lose its competitive edge. Soon HEVC will be standard, and with Google forcing WebP and Firefox forcing APNG like they normally do… FLIF is toast.
Requesting coverage for most sites in https://theporndude.com/hentai-porn-sites (some sites require JS like Hitomi)
>>10601 >>10603 For reference remember #gitter_FLIF-hub=2FFLIF:matrix.org
>>10562 Thank you, this is great. I've copied it into my ML masterjob. >>10565 Yeah, if other people would like to write their own tutorials for anything in text or html, I am very happy to link to it or host it on the github.
>>10572 Not really. I'd like to add some tag filters to exclude bad tags at the server end ('banned artist' and 'url:' garbage) and allow for 'I only want 'creator:' tags' at the client end. And some repo recycling/cleaning to clear out some of the cluttered master records and reduce dbshit.
Could you make python3 a choice? just for the python features and the poor weird path people?
>>10575 Thank you, this is interesting. There's a lot I would like to do with the duplicate filter and system generally, especially ui to show/browse/review found duplicate relationships. First I will have to clean up the db side of things. I want to move from the current pair-hell to ordered groups that will allow for neater 'this is the best one' actions. This new structure will also work for siblings and parents, btw, which is not a dissimilar problem to deal with.
>>10598 You cannot manage files without importing. Sorry for my bad wording! My new downloader help is here: https://hydrusnetwork.github.io/hydrus/help/getting_started_downloading.html The subscription help is out of date, but I hope to improve it in the coming weeks. If you can't understand what I've written, let me know and I'll see if I can reword it. Feel free to email me or grab me on the discord if you want to work one on one.
>>10608 I hope to convert to python3 over this holiday. I will stop putting out releases starting on the 12th December and hope to have it done in four weeks. I will start working on the result of this thread's poll first thing in the new year.
I am adding "Add multiple local file services" to the list.
>>10566 >>10560 Okay, it seems very likely that PX Downloader re-packs the zip. It seems to be closed source, so I couldn't confirm, but I had a close look at what pixiv fetches when playing ugoira, and a file with an included animation.json doesn't exist there. The URL you get from that metadata doesn't get you the high quality version. I found that there is a second .json specifically for ugoira metadata.

Example ugoira: https://www.pixiv.net/member_illust.php?mode=medium&illust_id=48731415
Ugoira meta: https://www.pixiv.net/ajax/illust/48731415/ugoira_meta

This one lists 2 URLs, "src" and "originalSrc", to get the bigger version.

Though of course my findings don't make archiving these easier. What would be the best way to preserve the originals? I asked for preserving the original files because I assumed that pixiv fixed their format to include metadata in the zip, but that's not the case after all. If we re-pack the zip file to include metadata, we get the problem of changed hashes that I wanted to prevent by archiving originals.

One idea I had was to mux jpgs into an mkv file as mjpg. That way the frame timings can be saved and the images are not re-encoded.

ffmpeg -framerate 30 -i %06d.jpg -codec copy mjpg.mkv
mkvmerge -o ugoira.mkv -d 0 -A --timestamps "0:timestamps.txt" mjpg.mkv

timestamps.txt would contain time stamps for each frame as the absolute time elapsed BEFORE each frame, while ugoira uses a relative pause AFTER each frame. E.g. these ugoira timings:

>{"file":"000001.jpg","delay":30},
>{"file":"000002.jpg","delay":30},
>{"file":"000003.jpg","delay":30}

would become these mkv timestamps:

># timestamp format v2
>0
>30
>60

I made a proof of concept python 3 script that converts all frames in a folder to an mkv file with correct variable frame rate. All frames need to be unpacked and the "ugoira_meta.json" needs to be saved to the same folder, because the script generates the timestamp file from that.

It won't let me attach the script, so I put it here: https://pastebin.com/kdaH6CqE
It won't let me attach the sample mkv file either, so I uploaded it here: http://tstorage.info/1rqrc4o43gqu

For identifying ugoira files, an idea I found was to use ffmpeg to generate frame hashes for the individual jpgs:

ffmpeg -i %06d.jpg -f framemd5 -

These hashes actually stay the same even if the container format changes. The jpgs in the original zip file and the muxed mkv mjpg frames will have identical hashes.

What do you think about this solution? This way we could get proper video files without re-encoding anything, and get consistent hashes to identify files.
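The delay-to-timestamp conversion described in this post is just a running sum. Here is a small sketch of that one step. It is not the pastebin script, and the function names are made up here; it assumes each frame entry is a dict with a "delay" in milliseconds, as in the example above.

```python
def delays_to_timestamps(frames):
    """Convert per-frame delays (pause AFTER each frame, in ms) into
    absolute start times (time elapsed BEFORE each frame, in ms)."""
    timestamps = []
    elapsed = 0
    for frame in frames:
        timestamps.append(elapsed)
        elapsed += frame["delay"]
    return timestamps


def format_timestamps_v2(timestamps):
    """Render start times as the text of a 'timestamp format v2' file."""
    lines = ["# timestamp format v2"] + [str(t) for t in timestamps]
    return "\n".join(lines) + "\n"


frames = [
    {"file": "000001.jpg", "delay": 30},
    {"file": "000002.jpg", "delay": 30},
    {"file": "000003.jpg", "delay": 30},
]
print(format_timestamps_v2(delays_to_timestamps(frames)), end="")
```

One thing this makes visible: start times alone do not record how long the last frame stays on screen, so the final delay would need to be kept some other way if looping playback has to match the original exactly.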
I used to want to be able to add any generic file the most, but after using hydrus daily, what I want the most now is to be able to force-check a picture that failed to import properly against all the tags it should have. Automatic tagging is promising, but ultimately unnecessary for me since I just scrape files most of the time. If anything, I'd use autotagging as a backup system for a failed tag import.
The ability to tag images with consecutive numbers outside of the import files dialog. It would make tagging comics/doujinshi downloaded using the downloaders/watchers much, much, easier.
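What is being asked for here boils down to mapping an ordered selection to consecutive numbered tags. A minimal sketch, assuming a "page" namespace and plain string file ids (both are illustrative choices, not how hydrus stores anything):

```python
def number_tags(file_ids, namespace="page", start=1, step=1):
    """Map each file id, in the given order, to a tag such as 'page:1'.

    The order of `file_ids` is taken as the intended reading order.
    """
    return {
        file_id: f"{namespace}:{start + index * step}"
        for index, file_id in enumerate(file_ids)
    }


# Example: three thumbnails selected in reading order.
print(number_tags(["file_a", "file_b", "file_c"]))
```

The start and step parameters cover the case where a download arrives in several batches and numbering has to continue from where the previous batch left off.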
>>10611 Nice. >>10429 >Add an interface for waifu2x and other file converters/processors Would it be possible to work with offline versions as well? I installed waifu2x on my machine so that I wouldn't have to rely on an internet connection. t. linox
I am likely to make the poll today, with the release. I may unsticky and lock this thread to move convo over there, but I am not sure.

>>10622 That mkv jpg solution looks great! Thank you for figuring out the variable frame rate stuff and putting a script together. I have copied this to my ugoira notes for when I get to this.

>>10623 Let me know if I misunderstand here, but you can probably do this now by running the problem file through a program like HxD to figure out its sha256 hash and then searching in hydrus in the 'all known files'/'public tag repo' search domain using system:hash=abcd… . That said, if a file cannot import to hydrus, it likely doesn't have any tags in hydrus, or do you mean like 'what tags it has on the site I meant to get it from'? In either case, I'd be interested in examples of files that look fine but won't import. Please feel free to submit the files themselves or URLs to them!

>>10639 Thanks, I put this on my 'see if you can sneak this in' list a little while ago, and it just didn't happen. I am adding it to the list here as "Add an incremental number tagging dialog for thumbnails".

>>10648 I greatly prefer doing transformations like this with our own CPU/GPU cycles, so I would likely start such a system by talking to local executables and then extend it to work with http POST queries depending on demand.
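For the sha256 lookup suggested in the reply to >>10623, a hex editor is not strictly needed. A few lines of Python can produce the hash too. This is a generic sketch using the standard library; the helper names are made up, and the "system:hash=" text simply follows the wording in the post.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex sha256 digest of a file, read in chunks so large
    files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_search_text(path):
    """Build the search text to paste into the client, per the post above."""
    return "system:hash=" + sha256_of_file(path)
```

Calling hash_search_text on the problem file gives a string that can be pasted into a search page set to the domains mentioned above.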
>>10650 The poll is up! Please go to >>10654 to vote!

