/hydrus/ - Hydrus Network

Archive for bug reports, feature requests, and other discussion for the hydrus network.


(18.16 KB 480x360 PXAlznKcJvA.jpg)

Version 362 hydrus_dev 07/31/2019 (Wed) 22:53:48 Id: bf06a9 No. 13387
https://www.youtube.com/watch?v=PXAlznKcJvA

windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v362/Hydrus.Network.362.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v362/Hydrus.Network.362.-.Windows.-.Installer.exe
os x
app: https://github.com/hydrusnetwork/hydrus/releases/download/v362/Hydrus.Network.362.-.OS.X.-.App.dmg
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v362/Hydrus.Network.362.-.Linux.-.Executable.tar.gz
source
tar.gz: https://github.com/hydrusnetwork/hydrus/archive/v362.tar.gz

I had a mixed week. The duplicates overhaul work is finished.

duplicates work finished

The duplicates storage overhaul is done! Everything is now on the new storage system, the duplicate filter has had some quality of life attention, and there is now some updated help:

https://hydrusnetwork.github.io/hydrus/help/duplicates.html

If you have used hydrus for a bit but haven't checked out the duplicate system yet, this is a good time.

I added some final new things this week: the duplicates filter now highlights common resolutions, like 720p and 1080p, and there is a new 'comparison statement' for common ratios like 4:3 or 16:9. Also, the thumbnail right-click file relationships menu now provides a choice to clear out potential relationships and, if in advanced mode, perform some en masse remove/reset actions on multiple thumbnails at once.

I am now free to move on to another large job. Audio support was very popular at the last vote, so I will spend a couple of weeks trying to get some simple 'has audio' metadata going, but then I am going to address some growing issues related to tag repositories, easier 'I want these tags' management, namespace siblings, and multiple local tag services.

deepdanbooru plugin

If you are an experienced user and are interested in testing out some neural net tagging, check this out: https://gitgud.io/koto/hydrus-dd/

This project by a hydrus user lets you generate tags using the DeepDanbooru model and get them into hydrus in a variety of ways. If you give it a go, let me know how it goes and what I can do to make it work better on the hydrus end.

full list

- duplicates work finished:
- updated the duplicates help text and screenshots to reflect the new system
- duplicate files search tree rebalancing is now done automatically on the normal idle maintenance routine, and its over-technical UI is removed from the duplicates page
- the duplicate filter's resolution comparison statement now specifies 480p, 720p, 1080p, and 4k resolutions and highlights resolutions with odd (i.e. non-even) numbers
- if the files are of different resolution, a new 'ratio' comparison statement will now show if either has a nice ratio; the current list is 1:1, 4:3, 5:4, 16:9, 21:9, 2.35:1 (see the sketch just after this post)
- added a 'stop filtering' button to the duplicate hover frame
- made the ill-fitting 'X' button on the top hover frame a stop button and cleaned up some misc related ui layout
- added a 'remove this file's potential pairs' command to the thumbnail file relationships menu
- if in advanced mode, multiple thumbnail selection right-click menus' file relationships submenus will now offer mass remove/reset commands for the whole selection. available commands are: 'reset search', 'remove potentials', 'dissolve dupe groups', 'dissolve alt groups', 'remove false positives'
- .
- the rest:
- added a link to https://gitgud.io/koto/hydrus-dd/ , a neat neural net tagging library that uses the DeepDanbooru model and has several ways of talking to hydrus, to the client api help
- cleaned up a little of the ipfs file download code, mostly improving error/cancel states
- rewrote some ancient file repository file download code, which ipfs was also using when commanded to download via a remote thumbnail middle-click. this code and its related popup is now cleaner, cancellable, and session-based rather than saving download records to the db (which caused a couple of edge-case annoyances for certain clients). I think it will need a bit more work, but it is much saner than it was previously
- if you do not have the manage tags dialog set to add parents when you add tags, the autocomplete input will no longer expand parents in its results list
- fixed an issue displaying the 'select a downloader' list when two GUGs have the same name
- hitting apply on the manage parsers or url classes dialogs will now automatically do a 'try to link' action as under manage url class links
- fixed (I think!) how the server services start, which was broken for some users in 361. furthermore, errors during initial service creation will now cancel the boot with a nice message, the 'running … ctrl+c' message will appear strictly after the services have started ok the first time, and services will shut down completely before the db is asked to stop
- improved how the program recognises shutdowns right after boot errors, which should speed up clean shutdowns after certain bad server starts
- the server will use an existing server.crt and server.key pair if they exist on db creation, and complain nicely if only one is present
- the 'ensure file out of the similar files system' file maintenance job will now automatically remove the file from/dissolve its duplicate group, if any, and clear out outstanding potential pairs
- a system language path translation error that was occurring in some unusual filesystems when checking for free disk space before big jobs is now handled better
- like repository processing, there is now a 1 hour hard limit on any individual import folder run
- fixed an issue where if a gallery url fetch produced faulty urls, it could sometimes invalidate the whole page with an error rather than just the bad file url items
- subscriptions will now stop a gallery-page-results-urls-add action early if that one page produces 100 previously-seen urls in a row. this _should_ fix the issue users were seeing with pixiv artist subs resyncing with much older urls that had previously been compacted out of the sub's cache
- until we can get better asynch ui feedback for admin-level repository commands (like fetching/setting account types), they now override bandwidth rules and only try the connection once for quicker responses
- misc code cleanup

next week

I was unable to get to the jobs I wanted to this week, so I think I'll go for a repeat: updating the system:hash and system:similar_to predicates to take multiple files and extending the Client API to do cookie import for easier login. And I'll play around with some audio stuff.
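To illustrate the 'nice ratio' comparison statement mentioned in the changelog above, here is a minimal sketch of how such a check could work. This is not hydrus's actual code; the function name and the exact-match rule are assumptions for illustration, and the real comparison may well allow a small tolerance.

from fractions import Fraction

# the ratio list named in the changelog: 1:1, 4:3, 5:4, 16:9, 21:9, 2.35:1
NICE_RATIOS = {
    '1:1': Fraction(1, 1),
    '4:3': Fraction(4, 3),
    '5:4': Fraction(5, 4),
    '16:9': Fraction(16, 9),
    '21:9': Fraction(21, 9),
    '2.35:1': Fraction(47, 20),
}

def nice_ratio_label(width, height):
    # return the label of a nice ratio this resolution matches exactly, or None
    ratio = Fraction(width, height)
    for label, nice in NICE_RATIOS.items():
        if ratio == nice:
            return label
    return None

print(nice_ratio_label(1920, 1080))  # 16:9
print(nice_ratio_label(1280, 960))   # 4:3
print(nice_ratio_label(1000, 707))   # None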
Haven't updated to the new version yet, just about done with some due shit and want to get that finished first, but I thought this was worth sharing. I said in the last thread that just defaulting to the jpeg when the pixels are the same could, in rare cases, see the jpeg being larger, and I found kind of a case for this. Visually I can't see the difference, and I'm sure there is one, but the png is 2MB while the jpeg is 3MB. It's entirely possible the jpeg had some reason to be 3MB, but I have no idea.
(2.56 MB 3113x782 test100.jpg)

>>13389 Thank you, that is a fascinating example. I checked it in hydrus, and they aren't quite pixel-for-pixel duplicates. I also could not see any obvious differences. In the end, I played around with some layers in GIMP and found that every pixel I tested was just slightly a different colour, like the orange behind the text is mostly RGB jpg: (239, 196, 128) vs png: (239, 195, 128). I haven't encountered this before.

The center image clearly has jpeg artifacts that the text doesn't, so my best guess is that the person who wrote the caption and wrapped it around an existing medium-quality jpeg exported the whole thing to png, and then someone else subsequently exported that to jpeg at 99% quality or something, which just slightly altered the image because it can't be completely lossless. Since such a high quality was chosen, and since it is modelling such complicated text data and fairly heavy previous jpeg artifacts, it ended up being less efficient than png for compression. The png, of course, would very happily eat up all that blank orange space relatively efficiently. It is all a bit moot because they are functionally the same quality to a human eye. The smaller png is probably 'better', and my assumption is that it is the original, anyway.

I did a test export of the png as a 100 quality jpeg from gimp, and while it too was larger, it didn't have the slight colour differences, so perhaps there is more to this story. (It has the same 196 colour as the other jpeg, I was mistaken.) Why would an entire file change its hue just slightly? It sounds like an integer rounding issue, so maybe different jpeg encoders work slightly differently? Maybe the image was created in 16-bit colourspace and exported twice down to 8-bit, both jpeg and png, and they round differently?

It is stuff like this that makes me shy away from automated decision systems unless things really are pixel-for-pixel or byte-for-byte. There are so many odd examples due to some artist or content host making a funny decision.
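For anyone wanting to repeat the comparison above without GIMP, a rough sketch along these lines would do it. The file names are placeholders, and Pillow and numpy are assumed to be installed; this is just an illustration, not anything built into hydrus.

from PIL import Image
import numpy as np

# load both versions as plain RGB and compare channel by channel
a = np.asarray(Image.open('caption.png').convert('RGB'), dtype=np.int16)
b = np.asarray(Image.open('caption.jpg').convert('RGB'), dtype=np.int16)

if a.shape != b.shape:
    print('different resolutions, not comparable pixel-for-pixel')
else:
    diff = np.abs(a - b)
    print('identical pixels:', bool((diff == 0).all()))
    print('max channel difference:', int(diff.max()))
    print('mean channel difference:', float(diff.mean()))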
>>13391 >>13389 Wait a second! This is thrown off even more because the jpeg I was delivered when I downloaded your example is the one attached here, which is 2.6MB, not 2.9MB, hash 24c64c309c3855d9d5dcce5cb505e7fee4b048dcfdebc25d3ae3e7b259a13f1f. The Eternal Cloudflare strikes again with its auto-optimisations. Who fucking knows what's going on at this point.
(1.75 KB 100x100 test colours.png)

(2.86 KB 100x100 test colours.jpg)

>>13392 Ok, I got the original by pulling the (u) link. The 6884… file is also not a pixel-for-pixel duplicate, and it has (239, 196, 128) orange. I got mixed up with my colours and thought my test100 above had the same orange as the png, but it doesn't:

png: (239, 195, 128)
all jpgs: (239, 196, 128)

I don't think high-quality jpegs are supposed to use subsampling, but maybe something inherent in the jpeg colourspace prohibits certain colours, or at certain quality levels? I made a quick test image with both colours and saved it as png and jpeg. The png preserves both colours accurately, but the jpeg collapses both sides to 196 on green. I had never heard of this before! This suggests your jpeg was indeed derived from the png original.
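A sketch of that two-colour test, for anyone who wants to try it themselves: build a small image with the two oranges, save it as png and as a high-quality jpeg, and read the colours back. Pillow is assumed here, and the exact result will depend on the jpeg encoder and its settings, so treat this as an illustration rather than a guaranteed reproduction.

from PIL import Image

# left half (239, 195, 128), right half (239, 196, 128)
img = Image.new('RGB', (100, 100))
for x in range(100):
    for y in range(100):
        img.putpixel((x, y), (239, 195, 128) if x < 50 else (239, 196, 128))

img.save('test_colours.png')
img.save('test_colours.jpg', quality=100)

png_back = Image.open('test_colours.png').convert('RGB')
jpg_back = Image.open('test_colours.jpg').convert('RGB')

# sample one pixel well inside each half
print('png:', png_back.getpixel((10, 10)), png_back.getpixel((90, 90)))
print('jpg:', jpg_back.getpixel((10, 10)), jpg_back.getpixel((90, 90)))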
>>13387 Got some weird errors after updating:
ConnectionException
Could not connect!
Traceback (most recent call last):
File "site-packages\urllib3\connection.py", line 159, in _new_conn
File "site-packages\urllib3\util\connection.py", line 80, in create_connection
File "site-packages\urllib3\util\connection.py", line 70, in create_connection
OSError: [WinError 10049] The requested address is not valid in its context

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "site-packages\urllib3\connectionpool.py", line 600, in urlopen
File "site-packages\urllib3\connectionpool.py", line 354, in _make_request
File "http\client.py", line 1239, in request
File "http\client.py", line 1285, in _send_request
File "http\client.py", line 1234, in endheaders
File "http\client.py", line 1026, in _send_output
File "http\client.py", line 964, in send
File "site-packages\urllib3\connection.py", line 181, in connect
File "site-packages\urllib3\connection.py", line 168, in _new_conn
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x0000000014EA1160>: Failed to establish a new connection: [WinError 10049] The requested address is not valid in its context

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "site-packages\requests\adapters.py", line 449, in send
File "site-packages\urllib3\connectionpool.py", line 638, in urlopen
File "site-packages\urllib3\util\retry.py", line 398, in increment
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='0.0.0.0', port=4443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000000014EA1160>: Failed to establish a new connection: [WinError 10049] The requested address is not valid in its context',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "include\ClientNetworkingJobs.py", line 947, in Start
response = self._SendRequestAndGetResponse()
File "include\ClientNetworkingJobs.py", line 255, in _SendRequestAndGetResponse
response = session.request( method, url, data = data, files = files, headers = headers, stream = True, timeout = ( connect_timeout, read_timeout ) )
File "site-packages\requests\sessions.py", line 533, in request
File "site-packages\requests\sessions.py", line 646, in send
File "site-packages\requests\adapters.py", line 516, in send
requests.exceptions.ConnectionError: HTTPConnectionPool(host='0.0.0.0', port=4443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000000014EA1160>: Failed to establish a new connection: [WinError 10049] The requested address is not valid in its context',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "include\ClientParsing.py", line 2673, in FetchParsingText
network_job.WaitUntilDone()
File "include\ClientNetworkingJobs.py", line 1151, in WaitUntilDone
raise self._error_exception
File "include\ClientNetworkingJobs.py", line 1050, in Start
raise HydrusExceptions.ConnectionException( 'Could not connect!' )
include.HydrusExceptions.ConnectionException: Could not connect!
Nothing seemed wrong, but I got multiples of this, though I think they were all the same.
>>13391 >>13393 >>13392 My best guess here is that png handles the flat single colors better than it handles the middle, but for the jpeg to get text as clear as it has, it has to boost things up quite a bit. Otherwise I can't think of a single reason why the png would be smaller than the jpeg. If you add in a pixel-for-pixel auto remove, I think it should favor file size rather than filetype, as I can't imagine the jpeg getting smaller if it were even higher quality. There will be cases where people screenshot something (god knows I'm guilty of it) and you later find the original jpeg in a smaller file size, but there will also be cases like this where the smaller file ends up being the png.
>>13397 Just to be clear, I'm perfectly ok with an auto decision on pixel-for-pixel duplicates, but the only way I see it as 100% automatable is if it favors smaller size over extension.
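A sketch of the rule being proposed here, purely as an illustration: if two files decode to exactly the same pixels, keep whichever one is smaller on disk, regardless of extension. This is not an existing hydrus feature, and the helper names, along with the use of Pillow and numpy, are assumptions for the sake of the example.

import os
import numpy as np
from PIL import Image

def pixels_identical(path_a, path_b):
    # decode both files and compare every pixel
    a = np.asarray(Image.open(path_a).convert('RGBA'))
    b = np.asarray(Image.open(path_b).convert('RGBA'))
    return a.shape == b.shape and bool((a == b).all())

def pick_keeper(path_a, path_b):
    # return (keep, delete) for a pixel-for-pixel pair, or None if they differ
    if not pixels_identical(path_a, path_b):
        return None
    # favor the smaller file over any particular filetype
    if os.path.getsize(path_a) <= os.path.getsize(path_b):
        return (path_a, path_b)
    return (path_b, path_a)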
>>13387 I just read the updated duplicate help and have two questions that are somewhat related. Suppose "A" becomes King of B, C, and D because it is better and the other three are deleted. If you import any of B, C, or D from a booru, does "A" get the tags? Does "A" get the tags of any of the three from the PTR based on its status? Thank you for the awesome program!
>>13387 Does this work on FreeBSD? I've got a 20TB server with tons of stuff that I can help keep alive and shit. Just let me know if there's a pkg I can install and use behind a VPN?
>>13387 Is there a way to add a new local tag service? I've been trying out DeepDanbooru but it would be nice to be able to separate its tags from mine, at least while I test to see how well it works.
>>13394 Thank you for this report. Can you say anything more about the context in which this error appeared? Was it part of a downloader, a subscription, or did it just come in a popup out of nowhere? It looks like that FetchParsingText call was trying to do some download stuff, but then it was pointed somewhere strange and could not connect. The error looks like a standard 'could not connect' error that you get if you hit a site that is down, but that 0.0.0.0:4443 address seems pretty weird. Maybe it is some initialisation state, but I am not sure. If it was a downloader, what was the URL it was trying to fetch that failed?
>>13397 >>13398 Thanks. I think I'd like more information here before I make a decision. I always like to make these sorts of auto-rule systems as customisable as can be, because Anons have all sorts of preferences, but my strong suspicion is that other than unusual edge cases like pure white squares or other geometric shapes, a png copy of a jpeg is 99.9% going to be the larger of the two files. It is fascinating to me that this jpeg copy of a png could never be, even at the highest quality settings, an exact pixel-for-pixel dupe. If it is the case that jpg copies of pngs will practically never be pixel for pixel dupes due to smooth colour gradients in the png not being replicable in the jpeg colourspace, then the auto-resolution of jpg copies of pngs is outside the domain of pixel-for-pixel dupes. This is a strange thing, but still a minor revelation to me. I'd never have a system like this default on, in any case. Like everything else, I'll try to massively over-explain it and force the user to turn it on if they want it.
>>13399 Not yet. I had a tentative job to explore post-duplicate metadata merge/permanent synchronisation in this iteration, but adding that system was bigger than I had time to do well. It'll be slightly complicated, and I don't want to mess it up with a quick hack. So just to be clear, for now the metadata merge only works on the pairs you see in the dupe filter. But since both representatives in the dupe filter are the Kings of their respective groups, and 'good' data travels up from groups to kings, you can generally assume that Kings will have fairly 'complete' metadata based on previous decisions for now. Since duplicate records are permanent, once I do add something like this, we'll be able to retroactively apply whatever merge settings you prefer, so non-Kings of groups will all sync too on an ongoing basis. I suspect when I do this I'll have different or better default duplicate merge options here. Some tags aren't appropriate to merge backwards, like 'high resolution' or 'webm', and it'd be nice to have examples of those as default, or easily selectable, rather than making users awkwardly figure it out on their own.
>>13408 I am not certain. I think I remember reading about one user getting it to run on FreeBSD some time ago, but I can't be sure. The program is devved on and works best for Windows, and I build the Linux release on Ubuntu, so some more esoteric flavours of Linux and their Window Managers can have trouble with my build. Your best bet is probably running from source, for which I have some help here: https://hydrusnetwork.github.io/hydrus/help/running_from_source.html There's an Arch package maintained by a user, but I don't know enough about FreeBSD to say what is compatible: https://aur.archlinux.org/packages/hydrus/ I assume that wouldn't work for you, right? If you have a Windows machine, I recommend you try hydrus on that just to see if you like it first, rather than going through the rigmarole of getting it going on your FreeBSD machine and finding it isn't what you like. If you do give it a go, let me know how you get on!
>>13417 Not yet, but as it happens I am on the verge of doing some heavy work on local tags and I hope to add it then. I'll do a couple of weeks on some audio metadata stuff and then start on all that.
>>13425 I had added the lookup script for Deep Danbooru but couldn't get it to work; it would get stuck in the uploading stage and quietly fail out. But the errors didn't pop up till maybe 20 minutes after my various attempts so I wasn't sure if they were related or not.
>>13426 Like I said, I think the reason the jpeg is larger in this one is the text, at least that's what I think. Getting clean text in a png is more or less par for the course and doesn't eat much space, but for a jpeg to keep it without any kind of flaw, at least visually, it would need a stupid amount of resources. Edge cases like this happen where a jpeg is made from something like this; my best guess is one of the joi threads on 4chan had someone blanket-convert with the highest quality settings, hit this image, and it tossed out a larger file than went in. I'm guessing the fact that 2/3 or more of the image is something jpeg does not handle well had a part in why this exists the way it does. As for an auto decision, I don't see much of a difference between 'the pixels are the same, choose jpeg' and 'the pixels are the same, choose lowest file size'; either way, 99%+ of the time it will go jpeg anyway, this would just introduce a safety net for the sub-1% of the time the jpeg isn't the smallest choice.

