/hydrus/ - Hydrus Network

Archive for bug reports, feature requests, and other discussion for the hydrus network.


(28.40 KB 480x360 6QynGUt-J8c.jpg)

Version 344 hydrus_dev 03/20/2019 (Wed) 21:56:31 Id: 4bd52a No. 11975
https://www.youtube.com/watch?v=6QynGUt-J8c

windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v344/Hydrus.Network.344.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v344/Hydrus.Network.344.-.Windows.-.Installer.exe
os x
app: https://github.com/hydrusnetwork/hydrus/releases/download/v344/Hydrus.Network.344.-.OS.X.-.App.dmg
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v344/Hydrus.Network.344.-.Linux.-.Executable.tar.gz
source
tar.gz: https://github.com/hydrusnetwork/hydrus/archive/v344.tar.gz

I had a mixed week, but I got some good hydrus work done. It is basically all misc this week.

highlights

The Client API v1.0 got its last polish this week. I fixed some bugs and added a new parameter to control page selection to the /add_urls/add_url command. Future Client API work will be entirely in regular weekly work. I'd like to add wildcard and namespace tag searching, more system predicates, autocomplete tag searches, optional https to encrypt communication outside a LAN, cookie.txt import, and I am sure many other things. The basic bones are decent now though. I now need only hang new things off it.

Animation scanbars (the draggable area just below a gif or webm in the media viewer) now have millisecond-precise x/y progress timestamps! This was fun to do, and it worked out pretty neat: it even works for variable framerate gifs.

I have added barebones .psd (photoshop document) file import support! This is a complicated format that supports image layers and so on, so I cannot provide any native rendering, but it will parse width and height. It presents as 'application/x-photoshop'. PSB files (which support huge 300k width/height) are also recognised, if you happen to have any. Hydrus will treat them as .psd as well. This has been asked for several times, so please let me know if you find useful workflows here. Note that hydrus assumes its file contents will never change, so do not try to load a hydrus-stored .psd in Photoshop and then save any changes back! This is best for 'finished' psds that you are effectively archiving.

There is a new 'open_known_url' shortcut under the 'media' shortcut set. It should work on a focused thumbnail or the media viewer: on activation, if the file has one 'recognised' URL (the ones listed in the top-right of the media viewer), it will launch in your browser, and if it has multiple, you will get a list of all known URLs you can quickly navigate with arrow keys and enter to choose which you want to see.

full list

- final v1.0 client api polish:
- added optional 'show_destination_page' arg to '/add_urls/add_url', defaulting to False, to control whether a URL-add will select (i.e. jump to) the destination page in the ui. this changes the default behaviour for this command
- simplified the routine that finds or creates a watcher or url import page and fixed a bug in the api that was not creating new pages when destination_page_name was specified
- some misc cleanup
- fixed fetching file_metadata by hashes
- fixed the client api help regarding file_metadata response example tags
- client api version is now 5
- .
- the rest:
- psd support added! because of this format's potential multi-layer complexity, it will not render natively, but width and height are parsed. it is treated as 'application/x-photoshop'. PSB is also recognised and treated as psd
- added an 'open_known_url' shortcut to the 'media' shortcut set that lets you quickly open URLs for files. if there is one recognised known url, it will be launched, and if there are multiple, a list with all known urls will appear to select which one you want
- animation scanbars now show an x/y current timestamp! it includes millisecond timings and even works for variable framerate gifs. whether to show a second/shadow caret for timestamp position on variable frame rate is a new discussion to have
- fixed an issue where animations would sometimes not resume animation for several seconds after a big scanbar drag
- when the thumbnail manager cannot produce a thumbnail due to a storage error (like a missing file), it now only puts up a single, more informative error popup on the first problem. subsequent errors are printed silently to the log. (these errors tend to come in en masse, so this cuts down on spam and error-related ui lag that was making loading a bad session difficult)
- improved error reporting when an upload pending command would fail due to service non-functionality; it should now give a popup with error info immediately, rather than obscured through the login system
- added temp_dir parameter to the client and server that will override which temporary directory the program will use
- cleaned up how no_daemons and no_wal mode are handled internally
- no_wal mode now has to be called from the command parameter, the no_wal file hack in the db directory no longer works
- missing ffmpeg errors now prompt the user to check if it is installed
- searching for numerical ratings should now work for files that were rated when the service had a different number of stars (ratings now searches in 'bands' rather than exact values)
- reduced the min height of the new import files frame's list
- doubled the decompression bomb test to permit files up to ~179 megapixel, we'll see how it goes
- misc cleanup
next week

I have some IRL stuff going on atm. I hope it will not affect my hydrus time, but it may steal a day or two from me. If things line up well, the situation should be resolved within a week or two.

With Client API done for real now, I will move completely to OR search. Otherwise, next week will be an 'ongoing work' week. I hope to pull the trigger on eliminating hydrus's double-thumbnail system (i.e. moving to just one thumb per file). The db update next week will thus likely be a big deal, maybe several minutes' of deleting for users with larger clients.
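A rough sketch of how a script might use the new 'show_destination_page' arg on '/add_urls/add_url'. The endpoint and the 'destination_page_name' and 'show_destination_page' args come from the release notes above. The local address and port, the access key header name, and the JSON body shape are assumptions here, so check them against the Client API help for your version. The request is only built, not sent.

```python
import json
import urllib.request

API_BASE = 'http://127.0.0.1:45869'  # assumed local Client API address
ACCESS_KEY = 'replace-with-your-access-key'  # placeholder, not a real key


def build_add_url_request(url, page_name=None, show_page=False):
    """Build (but do not send) a POST to /add_urls/add_url."""
    body = {'url': url, 'show_destination_page': show_page}
    if page_name is not None:
        body['destination_page_name'] = page_name
    return urllib.request.Request(
        API_BASE + '/add_urls/add_url',
        data=json.dumps(body).encode('utf-8'),
        headers={
            'Content-Type': 'application/json',
            # Assumed header name for the access key.
            'Hydrus-Client-API-Access-Key': ACCESS_KEY,
        },
        method='POST',
    )


request = build_add_url_request(
    'https://example.com/post/123', page_name='api imports', show_page=True
)
print(request.get_method(), request.full_url)
print(request.data.decode('utf-8'))
# To actually send it to a running client: urllib.request.urlopen(request)
```

Since the arg defaults to False, a script that relied on the old jump-to-page behaviour would need to pass it explicitly.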
>>11975
>I hope to pull the trigger on eliminating hydrus's double-thumbnail system (i.e. moving to just one thumb per file). The db update next week will thus likely be a big deal, maybe several minutes' of deleting for users with larger clients.
Will this take effect on my db backup the next time I update it?
>>11979 Yeah. I expect I will be deleting all the txx directories on your client_files directory and renaming all the rxx to txx. I am going to think about the rename though, as that is a big rigmarole in backup sync overhead just to have 'neater' names. Either way, at least half your thumbs will be going, assuming it goes well on my end. One smart thumb instead of two dumb ones. I will write a summary of what I am about to do in the release post and in a popup before the db updates.
(543.81 KB 869x1406 Untitled.png)

Just wondering, what are your plans for improving the PTR? It's useful but it's also kind of a mess. I just noticed some jackass has added a ton of unrelated series tags to a lot of creator:sabudenego's images… pic related. It's messing with my searching.
>>11987
>tfw "we need a social media network" for 2 years now
This still happens (possibly) because unrelated images have the same imagehash and people are just too lazy to deal with it. I don't think that people would troll this hard.
>>11987 I had this problem last year and had to fix by hand. Search for series you have little to none of, petition the PTR and replace with the tags in local/from a good booru. Almost seems like a shit script is snowballing these tags. What's under creator btw? The cancer spread there too.
>>11987 Probably a script error. Anyway, I just fixed it so it'll be gone in a day or so
>>11989 >>11987 >>11990 Are you sure someone is not trolling the PTR? Because dev has to manually approve everything
>>11987 >>11988 >>11989 >>11990 >>11996
Yeah, most of these sorts of spammy outcomes are due to a script misfiring. Either something generating .txt tag import files, or a website that exposes the wrong tags in the wrong place. Thank you for fixing them. I do not approve what tags go up (there are like 250,000-1,000,000 a day, and I can't see what files they apply to unless I have them on my irl client), but I do approve what gets deleted.

For the PTR, I am really pleased with how many users have contributed and written clever systems to populate it, and I am stunned I have been able to expand it to deal with so many tags, but there is also a bunch I am not happy about. The main problems from my perspective are:

1) Messy standards. Conflicting booru styles and individual opinions on what is 'good' make for conflicting search terms, which tag siblings cannot always fix, especially when what a good tag sibling is in the particular case is also uncertain.

2) It is too fucking big. About 450 million mappings atm, which makes for a gigantic db.

3) Too many tags people don't care about. 'title' tags are not as great as I once thought they would be, and there are a ton of shit unnamespaced tags parsed from filenames and tumblr-style-linked-tags-where-many-separate-tags-become-one-mess.

4) Bad tags/files hang around. Even when a tag is deleted, its master record for the tag and its file hash remains. This bloats up db size and will have to be addressed at some point.

Current plans to address these are:

1) Improve tag filtering and tag siblings over the long term. I'd like to update the current 'tag censorship' system to use my new flexible tag filter object and add a db-level cache to compute and search this stuff quickly. Tag siblings could also do with vast improvement to define different types of sibling and allow personal preferences (it'd be nice if you could choose whether you want the 'clothing:' namespace to display, for instance, while still recognising and agreeing as a community that 'clothing:black socks' and 'black socks' are synonyms). Unfortunately, I am just a dude with few social skills, so I will not ramp up and recruit some janitors to try and police and moderate what is on the PTR. It'd just kill me with stress trying to deal with all the conversations and dispute resolution. If others want to create tag repos that have stricter upload standards, please feel free; I just am not that guy.

2 and 3) I would like, this year, to split the PTR into multiple repos. One for series/creator/character/person 'big' namespaces that are extremely difficult to dispute, one for smaller namespaces like clothing and species, one for unnamespaced tags, and one for often-unique stuff like title and filename namespaces. The low-incidence, long-length tags like title tags bloat up client.master.db and lag out many autocomplete queries, and are not useful for searching. My belief now is that tags are for searching, not describing, so title information is not useful by default, only for enthusiasts. When I do this split, nothing will be lost from any client, but you'll be able to choose what tags you want to sync with. This will lighten the db size for everyone who just wants to search for 'character:samus aran'. I'll also look into integrating the new tag censorship's tag filter into the sync process to further lighten your basic db size and reduce processing time.

4) I don't have a firm plan for this, but I want to integrate some sort of recycling or resyncing process in a future network version whereby a repo can say 'all these hashes and tags are gone, all these update files from time x to y have been regenerated, please cull any orphans'. This would have both client and server interactions, perhaps even some kind of '2012-03 is nullified, please resubmit what you actually care about' and would aim to cull old and stupid shit.

These are all big jobs with significant changes to the client and server. I doubt I will be able to do them in normal weekly work. This stuff is on my mind, and if one of the problems becomes critical, I may have to take my executive privilege to override the next 'big thing to work on poll' and knuckle down and fix it.

Another idea I have long had is to do some tag metadata swapping between trusted clients. Now we have the Client API running, this is more and more of a possibility. It may make sense to ultimately transition away from reliance on a bigass central server and instead foster more of a distributed network.

The final objective of the hydrus network is to birth auto-tagging systems (neural network or whatever) so all tagging happens on your own CPU cycles and birth an imageboard-cultured egregore waifu, and the only tag information shared between users is 'how to tag' metadata. The current tag sharing we are doing is excellent prep work to train our auto-tagging systems in future. True work on this will be many years from now and will be greatly shaped by how this tech shakes out IRL.
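To make the proposed four-way split a bit more concrete, here is a minimal sketch of routing tags to a repo by namespace. The bucket names and the function names are invented for this example, and the namespace lists only cover the ones named in the post above; nothing here reflects how hydrus actually stores or syncs tags.

```python
# Namespaces taken from the post; the bucket names are made up for illustration.
BIG_NAMESPACES = {'series', 'creator', 'character', 'person'}
UNIQUE_NAMESPACES = {'title', 'filename'}


def split_tag(tag):
    """Split 'namespace:subtag' into (namespace, subtag); '' means unnamespaced."""
    namespace, sep, subtag = tag.partition(':')
    if sep == '':
        return ('', tag)
    return (namespace, subtag)


def choose_repo(tag):
    """Pick which of the four hypothetical repos a tag would live in."""
    namespace, _subtag = split_tag(tag)
    if namespace == '':
        return 'unnamespaced'
    if namespace in BIG_NAMESPACES:
        return 'big'
    if namespace in UNIQUE_NAMESPACES:
        return 'unique'
    return 'small'


for example in ('character:samus aran', 'clothing:black socks',
                'black socks', 'title:my comic title'):
    print(example, '->', choose_repo(example))
```

A client that only wants 'character:samus aran' style searches would then sync with the 'big' bucket alone.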
>>11997 Also if we move into more client-to-client interactions, I'll be able to distribute update files through users in a P2P fashion, reducing my bandwidth requirements significantly if I technically need to ever say to clients, "that 200MB of definition info is out of date, please update to these new masters".
>>11997 >>11998
The idea of socialized tagging becomes ever stronger…
>>11997
>The low-incidence, long-length tags like title tags bloat up client.master.db and lag out many autocomplete queries, and are not useful for searching.
When they are unique titles used on single images then I agree, but the title namespace is very useful for searching for multi-page comics for sorting by creator-series-volume-chapter-page.

On the subject of PTR bloat, what we need is for something like a dedicated syncing PTR per booru so that when bad tags get removed online they get updated in that booru's PTR. Scrape new content regularly and do a full scrape every few months.
>>11998 Isn't that one of the easiest tasks? At least if you start with a quick hack with an embedded BT client (on by default) that just gets version:magnet hash lists or something? One extra checkbox to disable it for where it doesn't work/where people don't want to share and I'm sure you've saved most of your BW regardless.
>>11997
>2) It is too fucking big. About 450 million mappings atm, which makes for a gigantic db.
Is this DB size generally problematic to manage on your machine / server? I don't mind splitting these up at all, seems like a great idea to me. But I'll also guess that there are a good bit more than 450m pieces of CG/drawn art with a known artist out there, never mind photographs and so on. And I'm sure the rest of the world getting internet access will only add more. Wouldn't it be wise to assume that even if you split the DB up, even some of the primary "objective" databases will hit something like 450m again in relatively short order as more tag sources are ingested?
>>12004 and make sure that images deleted from boorus cannot get their tags removed from PTR automatically
>>12008 Tags of deleted files are still available on boorus. The whole page is still there. eg: https://e621.net/post/show/6268 (NSFW, NSFL)
I've noticed you added basic .psd support on this release, is .svg support on the tables as well for a future version? Not asking for anything egregious like a full viewer, but basic import with a preview would be nice. (not at all because I stumbled upon a ftp of svg icons I'd like to preserve or something)
>>12004
Yeah, my hope is when I add cbr/cbz support with native inspection and multi-page navigation in the media viewer, we'll reduce a lot of the hassle here. One 70MB cbz with 'title:my comic title', 'volume:1' is so much less overhead for this than spamming that to a hundred individual jpg pages and dealing with page tags all over the place. But having all kinds of 'title:I am feeling good today, done three commissions' text search shit up any autocomplete entry is less helpful. I wonder perhaps if there is a place for the Tag Filter here as well, saying something like "if I didn't start my tag entry with 'title:', do not show any 'title' namespace results".

I am all for more tag repositories going up for different purposes, and for better exposing them to external scripts to automate what they have. If you or anyone would like to try this, I am happy to help and work on improving the admin side of things (which is ugly as hell, and has awful user management etc… because I have been the 98% primary user of this stuff so far).
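The 'do not show title results unless I typed title:' idea could look something like the sketch below. It is a toy prefix-matching autocomplete over a plain list, with a made-up function name and a made-up 'quiet_namespaces' parameter; hydrus's real autocomplete and Tag Filter objects are not modelled here.

```python
def autocomplete(entry, known_tags, quiet_namespaces=('title',)):
    """Return tags matching the typed entry by simple prefix.

    Tags in a 'quiet' namespace are only offered when the user typed that
    namespace explicitly. Hypothetical sketch, not the hydrus implementation.
    """
    entry = entry.strip().lower()
    results = []
    for tag in known_tags:
        namespace, sep, subtag = tag.partition(':')
        if not sep:
            namespace, subtag = '', tag
        if namespace in quiet_namespaces and not entry.startswith(namespace + ':'):
            continue  # user did not ask for this namespace explicitly
        if tag.startswith(entry) or subtag.startswith(entry):
            results.append(tag)
    return results


tags = [
    'title:i am feeling good today',
    'feet',
    'title:feeling fine',
    'character:felicia',
]
print(autocomplete('fe', tags))
print(autocomplete('title:fe', tags))
```

The first call skips both title tags even though one of them starts with 'fe'; the second call returns only the title tag whose text starts with 'fe'.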
>>12005
I have very little experience with p2p, and I am not sure what good python options I have available. I'd like to integrate any client-to-client talking into the client ui and have some sort of 'trust' network so users have more control over who they are talking to (whether that is 'I am happy to help share the bandwidth load, I am on 24/7 and will talk to anyone' vs 'I want to share my private collection of dbz kino with my two friends') rather than linking users together automatically. In any case, I haven't thought about it seriously yet. The bandwidth crush is not a big deal for me yet, but it will be if I trigger any sort of update regeneration.

>>12006
The problem I am concerned for is more clientside. A full PTR sync now takes ~14GB total in local hard disk space. Since most users only ever use a small fraction of those tags, I'd like to cut it down for technical reasons like processing speed and also convenience. The server db, which stores a bit more info like account_id and so on with each content row, is now 28GB. I don't mind that bloating up to 100-200GB in future as the collection grows, but I don't want to say 'hey new user, if you sync with the PTR, prepare your expensive SSD angus for 64GB of random shit tags you'll never see'.

For me, this is all ultimately a series of stopgaps for the next ten years or whatever until auto-tagging tech takes over the easy stuff. The 'every client gets everything the server has' idea of tag repos has worked overall really great, but it is not sustainable at the current pace. It could be sustainable as >>12004 suggests, making a certain tag repo devoted to a limited set of files and rules that will grow slower than new hard drives do, but my PTR's free-for-all is too unmanaged to last forever.

That said, it seems that training databases for neural networks etc… are pretty huge multi-GB things as well. Another thing that we'll have to see how it shakes out. Still, I'd be happier with a 200GB auto-tagging db in 2025 that grows at 2% a year than a 64GB one growing at 100% a year and accelerating.

My PTR was always supposed to be a short-term experiment, ha ha. When it hit a million tags, I was like, 'Hey, this is amazing. When growth slows, I guess about 10 million, I can devote more time to pruning bad stuff I guess.'
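The growth comparison above is easy to check with a few lines of arithmetic. This sketch assumes both rates stay constant and compound yearly (the post says the fast one is also accelerating, which would only make the gap worse). The function name is made up for the example.

```python
def years_until_overtaken(small_gb, small_growth, big_gb, big_growth, limit=100):
    """Count whole years until the fast-growing db is larger than the slow one.

    Growth rates are fractions per year (1.00 means doubling). Stops at
    `limit` years if the small db never catches up.
    """
    years = 0
    while small_gb <= big_gb and years < limit:
        small_gb *= 1 + small_growth
        big_gb *= 1 + big_growth
        years += 1
    return years


# 64GB doubling every year vs 200GB growing at 2% a year.
print(years_until_overtaken(64, 1.00, 200, 0.02))
```

After one year the sizes are 128GB vs 204GB, and after two they are 256GB vs about 208GB, so the smaller db is the bigger one within two years.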
>>12011 That might be slightly more tricky because I think an SVG is really an xml document, right? There isn't therefore a neat reliable file header, but I'll make a job to look into it. Adding 'document' handling to hydrus is a long-term objective, so extending initial file parsing to say 'ok, it wasn't a video, let's see if it loads as xml and if it has any good file metadata tags' will come one of these days in any case. A quick search suggests there are a couple python svg libraries around, but I don't know how good they are. If there's a great one out there, I can add support for this much quicker (and add native rendering) as I can offload the difficult stuff to that.
>>12014 Yeah, svgs are basically XMLs with a specially defined namespace at the beginning. If you ever get to document/xml handling, namespaces are probably the way to go to differentiate them from say, html files. The mozilla developer network has a surprisingly good primer on the stuff imo: https://developer.mozilla.org/en-US/docs/Web/SVG/Namespaces_Crash_Course I'm not versed enough in python to recommend a library, but considering how often python is used in math and how often svg is used to make charts, I'd wager there's a great one you can probably use.
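A minimal sketch of the namespace check described above, using only Python's standard library: parse the bytes as XML and see whether the root element is 'svg' in the SVG namespace. The function name is made up, and a real importer would probably want a hardened parser for untrusted files and some size limits, which this skips.

```python
import xml.etree.ElementTree as ET

# ElementTree reports namespaced tags as '{namespace-uri}localname'.
SVG_ROOT_TAG = '{http://www.w3.org/2000/svg}svg'


def looks_like_svg(data):
    """Return True if the bytes parse as XML with an SVG-namespaced root."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False
    return root.tag == SVG_ROOT_TAG


svg_bytes = (
    b'<?xml version="1.0"?>'
    b'<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"/>'
)
html_bytes = b'<html><body><p>hi</p></body></html>'

print(looks_like_svg(svg_bytes))
print(looks_like_svg(html_bytes))
print(looks_like_svg(b'GIF89a'))
```

Well-formed html parses as XML here too, but its root tag has no SVG namespace, so it is rejected; non-XML data fails at the parse step.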
>>12014 For initial parsing, have you considered libmagic from the "file" unix utility? It doesn't do much to extract metadata, but if it's just a matter of finding the proper MIME type it should do a good job. https://github.com/file/file/tree/master/python
>>12013
>have some sort of 'trust' network so users have more control over who they are talking to
Wouldn't it be easier to start with some "all of this is shared" type bittorrent or such integration? There are good libs for that that are easy to use. You can define what is shared, but you'd omit the "with who" part for now. Sharing in a web of trust or with friends requires a good bit of UI work to make any sense, and security flaws will also upset people a lot more.

