/hydrus/ - Hydrus Network

Archive for bug reports, feature requests, and other discussion for the hydrus network.

(4.03 KB 480x360 3q7GYNfY_Dc.jpg)

Version 326 hydrus_dev 10/17/2018 (Wed) 21:46:27 Id: 2b5234 No. 10273
https://www.youtube.com/watch?v=3q7GYNfY_Dc

windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v326/Hydrus.Network.326.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v326/Hydrus.Network.326.-.Windows.-.Installer.exe
os x
app: https://github.com/hydrusnetwork/hydrus/releases/download/v326/Hydrus.Network.326.-.OS.X.-.App.dmg
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v326/Hydrus.Network.326.-.OS.X.-.Extract.only.tar.gz
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v326/Hydrus.Network.326.-.Linux.-.Executable.tar.gz
source
tar.gz: https://github.com/hydrusnetwork/hydrus/archive/v326.tar.gz

I had a great couple of weeks. I was able to get some good prototype login stuff out for advanced users to play with, and a bunch of other stuff besides.

Some of the login stuff behind the scenes changed this week. Please report any problems you have with hanging or failing Pixiv, Hentai Foundry, or hydrus service (i.e. the PTR) jobs.

login

Advanced users only for now!

On Tuesday the 9th, I had a bunch of login work 75% complete and not much else. I decided to make this a two-week cycle, and I am happy to have finished off the remaining 25% and some more.

Please check out the new dialog under network->downloader definitions->manage login scripts. As simple as I have attempted to make it, the new login manager is complicated, and the new dialog matches. Just as with the downloader, I expect to write some 'how to write a login script from scratch' help, but for now, please feel free to explore. Hit add defaults->add them all to see some first-draft replacements I have knocked up for the Hentai Foundry click-through and Pixiv login, which show what I am going for here. A script basically runs a number of GET/POST requests in order, drawing from user credentials like username/password and from temp variables parsed out of one page and passed on to the next. None of this code is functional yet.
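As a rough illustration only, here is a minimal Python sketch of the login-script idea described above: GET/POST steps run in order, temp variables parsed from one response and passed to the next request, and a cookie used to decide whether the login worked. `LoginStep`, `run_login`, `fake_fetch` and the example.com URLs are all hypothetical and are not hydrus's actual login objects; the fetch function is passed in so the sketch runs without a network.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# fetch(method, url, form) -> (response_body, cookies_set_by_response).
# Passed in as a function so the sketch runs without touching the network.
Fetch = Callable[[str, str, Dict[str, str]], Tuple[str, Dict[str, str]]]


@dataclass
class LoginStep:
    method: str  # "GET" or "POST"
    url: str
    # form fields; "{name}" placeholders are filled from credentials/temp vars
    form: Dict[str, str] = field(default_factory=dict)
    # temp variables to pull from this step's response: name -> regex with one group
    parse: Dict[str, str] = field(default_factory=dict)


def run_login(steps: List[LoginStep], credentials: Dict[str, str],
              success_cookie: str, fetch: Fetch) -> bool:
    variables = dict(credentials)
    cookies: Dict[str, str] = {}
    for step in steps:
        form = {key: value.format(**variables) for key, value in step.form.items()}
        body, new_cookies = fetch(step.method, step.url, form)
        cookies.update(new_cookies)
        for name, pattern in step.parse.items():
            match = re.search(pattern, body)
            if match is None:
                return False  # an expected temp variable was missing, so give up
            variables[name] = match.group(1)
    # success is judged by whether the expected session cookie was set
    return success_cookie in cookies


# A fake two-step site: GET the login page for a token, then POST it back.
def fake_fetch(method: str, url: str, form: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    if method == "GET":
        return '<input name="token" value="abc123">', {}
    if form.get("token") == "abc123" and form.get("pass") == "hunter2":
        return "welcome", {"session_id": "xyz"}
    return "denied", {}


steps = [
    LoginStep("GET", "https://example.com/login",
              parse={"token": r'name="token" value="([^"]+)"'}),
    LoginStep("POST", "https://example.com/login",
              form={"user": "{username}", "pass": "{password}", "token": "{token}"}),
]
```

Calling `run_login(steps, {"username": "anon", "password": "hunter2"}, "session_id", fake_fetch)` walks both steps and returns True; a wrong password leaves the cookie unset and returns False.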
The dialog will not save any changes, and there is no test panel, which I sorely missed as I put the example scripts together. I may add or tweak a thing here or there, but I hope to turn some of it on next week, maybe along with some regular-user-friendly ui to manage which scripts are active, store credentials, and so on. I hope to swap out the hardcoded HF and Pixiv logins with finalised scripts and write a couple more for the big sites myself.

subs and downloaders

Pixiv is still in flux, and it seems they broke our manga page parsing for a subset of users this week. Thankfully, user kourraxspam on the discord figured out a neat API fix to our problems and also wrote a pixiv tag search. I tweaked the new objects a bit and folded them into the update. Since we are now mostly pulling from APIs, I hope Pixiv will be a bit more stable. Please let me know if you have any more trouble.

The new Pixiv downloader no longer pursues mode=manga URLs. If you ran into big Pixiv problems this week and have a lot of failed 'mode=manga' pages hanging around in your subs, you can try 'try again'ing the neighbouring failed/skipped mode=medium URLs, or re-run the queries in a manual download page, or, if you are comfortable with the new download system, there is a more technical url-class-based fix in the discord, under #parser-creation.

Subscriptions get some QoL improvements this week. Firstly, the way they determine if they are 'caught up' to the last sync now permits occasional deeper searches that will find files that were tagged late. In my final IRL testing here, I found it worked well, but it sometimes hit the 'periodic file limit' and gave me annoying little popups about it. These popups are false positives, and you can dismiss them. I will improve the subscription logic next week to recognise this situation and not make the popups.

Also, subs can now publish to 'labels'.
In the edit subscription panel, you can set a specific name for a sub to publish its files under, whether as a popup button or a page (overriding the default, which is the subscription name). So, if you have several 'character' subscriptions for different sites, you can have them all publish their files to the same 'char subs' page!

The watcher and downloader pages also have better en-masse file handling in their lists' right-click menu, which now has a 'show all in new page' entry that lets you combine the results of multiple watchers or gallery imports into one page, letting you handle multiple finished queues all in one go!

layout improvements

After thinking about it for a long time, I have written a custom sizer that lays ui items out in a more conservative way. I have applied it to several locations, particularly the 'management' panels on the left of any page, and I hope it will make these panels size a bit more sanely on smaller screens. This first step is limited, and it isn't perfect by any means, but it should be an improvement, with fewer 'why is this thing so tall?' moments. If this sizer doesn't cause huge problems, I expect to apply it across the program and then start tweaking the minimum sizes and related sizer flags to take better advantage of it. If you are on a small screen, please let me know if you notice any big changes, for better or worse, this week and in the weeks to come.

There is plenty of other related cleanup to do here. I'll also be tackling the manage tag parents/siblings dialogs' layout-hellscape soon.

misc

If you right-click several tags, you can now open a separate new search page for each of them!

Autocomplete tag searches should now return faster! My tests showed a 33% reduction in search time! They also handle some unusual wildcard searches better!
The gallery log button can now be right-clicked to restart a recently failed search! Restarted searches should be more reliable!

Under options->downloading, you can now have the little 'x/y' import summaries say 'x/y - zN', where z is the number of x that were 'new' (as opposed to already in db)!

full list

- login:
- finished the new login objects. they can deal with multi-step login problems on a single second-level domain, can pass variables from step to step, and use cookies as success verification
- wrote a ton of ui for the new login objects, now under network->downloader definitions and network->logins. it is not 'active' yet, but advanced users are invited to check it out. there is no good test ui yet, which I think I'll have to figure out in the coming weeks
- wrote a first attempt at HF and pixiv replacement login scripts. please try importing from defaults on the manage login scripts dialog and look through them to see what I am going for. once the system is flipped on and we are happy these work, I'll remove the old hardcoded legacy login stuff
- when a network job that needs a login cannot log in, it now waits (rather than bombing out completely), presenting the related error, and checks again every 60 seconds
- if a network job thinks it can log in but fails to generate a login process, the network engine now catches the error safely and recovers. the job is put on hold as above
- .
- subs:
- the subscription 'have we caught up to where we were before' test is now more complicated. rather than just stopping after five 'already seen' urls are found, it now only stops if at least the _last_ five contiguous urls of the page are already seen. this will catch more late-tagged files that get inserted out of order
- fixed the 'get quality info' button on edit sub panel to only get the current selection, not all queries wew
- subscriptions can now optionally publish/present their files to a specific label! this is a great way to merge multiple subs to the same final landing page
- .
- layout:
- after a long time thinking about it, wrote a new custom boxsizer that handles resizing multiple expanding items of different reasonable min size by expanding them _beyond minimum size_ by their proportion, rather than forcing them all to have total proportional width/height. I expect to polish this and apply it in multiple locations around the program where tall things were being too tall because something else was forcing them to be (the management panel on the left of most pages was terrible at this, causing a giganto taglist just because the upper panel was tall as well)
- changed my custom boxsizer (the box with a bold header) to the new custom boxsizer, so it is used all over now. please report any bad layouts you see
- in an effort to improve layout, the manage tag parents and siblings panels' preview boxes have a shorter minimum height. they will get a bigger layout overhaul soon
- .
- bigger misc:
- thanks to the work of user kourraxspam on the discord, fixed the pixiv downloader to use a more stable api and added pixiv tag search
- watchers and gallery imports now have a list right-click menu entry to show all selected importers' files in a new page! use this to clear out a bunch of finished queues all at once!
- the tag right-click menu now offers 'open new search pages for each in selection' if multiple tags are selected. with three tags selected, this will open three search pages each with one tag, as opposed to the original entry, which would only open one page with all three
- the edit nested gug panel now uses a checklistbox rather than the menu to select gugs to add, which is more reliable and allows for multiple selections
- sped up autocomplete tag fetches' tag sibling integration. irl this may be a reduction in total a/c search time of approx 33%
- page parsers will now generate next gallery urls absent any file/post urls if the only type of url they can generate is gallery urls (so a meta-gallery-search like board->threads that only generates subsidiary gallery pages will now work, whereas before it never could because it was missing post urls)
- the gallery log now provides a shorthand way to restart and resume failed searches from its right-click menu (if the most recent log entry failed)
- 'try again (and allow search to continue)' reattempt jobs will now generate next page urls even if no new urls are found (which can happen if, for instance, a search stopped because the file limit exactly lined up with the number of files found, so a reattempt finds nothing new)
- gallery downloaders will now specify their 'delay work for a bit' error states in the ui. this usually means 'could not connect', which has a 4-hour timer (I'll prob add a scrub delays button here at some point)
- the watcher will now show its 'delay work for a bit' error state in more places in the ui
- added a 'media' shortcut 'export_files_quick_auto_export', which will open the export files frame and give you a quick yes/no to confirm you want to export as set. if yes, it will export. then it will close the frame
- added a 'show a "N" to short import summaries' option to options->downloading, which will extend the typical 'x/y' status string to 'x/y - zN' for z 'new files' (as opposed to already in db)
- improved how the video parser estimates frame rate. it _should_ fix some of those low-framerate, low-framecount slideshow-vids that currently render everything in a rush and then sit on the last frame for ten secs
- .
- smaller misc:
- network report mode now reports url_to_fetch and parser-to-parse-with info
- when the server fails to accept a file upload due to a file parsing issue, it now prints the hash of the file in the error
- if the client sees a possible file hash in a server error message from a file upload, ~it will try to show that file in a new page~
- fixed an issue where wildcard searches were not finding results if the search text included the normally discarded characters [](){}"'
- fixed some domain handling for localhost and other undotted network names
- content parsers will now only launch with permissible content types, which for the legacy 'lookup scripts' system means only tags and vetoes, and for the new login system means only temp variables and vetoes
- as compaction now happens automatically on sync, removed the 'compact' button from edit subs panel
- an unusual network error related to hydrus update files sometimes being cut off mid-stream is now glossed over silently, with the download reattempted after a delay
- the initial gui session load now occurs after a 0.25s delay. let's see if it cleans up some initial layout issues some users have had
- maybe fixed an odd dictionary-initialisation error related to tag siblings/parents dialog boot
- ruggedised against an unusual bandwidth load bug
- gave some of the index help a pass
- did most of a 'getting started with downloaders' page in the help. I'll finish it next week
- updated discord share link to https://discord.gg/3H8UTpb , which should not expire
- some listbox add/edit code cleanup
- some listctrl delete code cleanup
- misc help work
- misc cleanup

next week

More login manager. I'd love to plug my new system in and have it do some work, but we'll have to see how doable that is. I might need to add some more supporting ui first.

I have been thinking about my longer-term schedule. I had thought to cram the login manager into the next two weeks and then take four weeks (basically November) off to do a big python 3 update, but I am now leaning towards putting the rewrite off so it bridges the holiday. Christmas is the best time to not be putting out releases, for everyone's convenience, and since it is so close, I think it makes sense to just delay it those few more weeks. This will also give me a buffer to ensure the login system isn't exploding and to catch up on whatever small work still needs doing. I could even do some prep work on the next big thing, which I still expect to put up a poll for once the login system is done.
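The 'layout' entry in the release notes says the new boxsizer expands items beyond their minimum size by their proportion, instead of forcing their total sizes to be proportional. A minimal sketch of that difference along one axis, assuming each item is just a (min_size, proportion) pair; this is not wx's or hydrus's actual sizer code, and both function names are made up for illustration.

```python
from typing import List, Tuple

# Each item is (min_size, proportion) along the sizer's main axis.
Item = Tuple[int, int]


def layout_total_proportional(items: List[Item]) -> List[int]:
    """Naive rule: *total* sizes must follow the proportions, so the item
    with the largest min_size/proportion ratio sets the scale for all."""
    unit = max(min_size / prop for min_size, prop in items if prop > 0)
    return [max(min_size, round(unit * prop)) if prop > 0 else min_size
            for min_size, prop in items]


def layout_conservative(available: int, items: List[Item]) -> List[int]:
    """Give every item its minimum, then share only the *leftover* space
    out by proportion."""
    extra = max(0, available - sum(min_size for min_size, _ in items))
    total_prop = sum(prop for _, prop in items)
    if total_prop == 0:
        return [min_size for min_size, _ in items]
    shares = [extra * prop // total_prop for _, prop in items]
    # integer division can leave a few pixels over; hand them to the last expanding item
    leftover = extra - sum(shares)
    last = max(i for i, (_, prop) in enumerate(items) if prop > 0)
    shares[last] += leftover
    return [min_size + share for (min_size, _), share in zip(items, shares)]
```

With a tall upper panel (min 600) above a taglist (min 100) at equal proportion, the naive rule drags the taglist up to 600 as well, which is the 'giganto taglist' problem; the conservative rule in 1000 pixels gives 750 and 250, because only the 300 leftover pixels are split.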
any benefit to going to 3? or is it just because 3 is newer? On a side note, one of my subs that I have checking every 6 hours keeps having the 6 hours changed to 6 minutes. I'll probably have a bit more to say later when I get more time with it, but this is an immediate thing I noticed. it has happened before, but I assumed it was my own error that time.
(147.50 KB 1634x688 client_2018-10-18_11-50-34.png)

Ok, if I remember right, the '14 hours ago' entry was the initial search of the sub that the program did. It ended due to 5 previously seen urls. The next search saw 0 new on page 1 and stopped. At this point I realized it was checking too often, so I went in to set it to 6 hours and reset the urls. The next search, at 13 44, was the reset search. The search after that aborted at 5 urls, and the one after that stopped on a page with 0 new urls.

Now, just thinking of a way to get this to work: 6 hours is enough to shift most images on every page by at least 1, and in the rare but (with 15 images a page) statistically probable case, one of those sets is exactly the same. Would it be possible to have the program look for a page full of urls that it recognizes and then have a user-set number of pages for it to look back?

The way I understand it's working currently, which is overall better than before, is this:

000000000000000 <- those are the images
0000000000XXXXX <- these are the ones that matter to decide if it looks at the next page

So if 1001001100 00000 happens, with 1s being new images, it will still recognize it as a previously seen page.

However, if instead its operation was:

100100100110000
000100010000000 <- this one sees it as a potential catch up, so let's say a user defined 3 pages to confirm
000001001000000 <- this one would reset the confirm
000000000000000
000010000000000 <- this one would also reset the confirm due to a new url
000000000000000
000000000000000
000000000000000

and there is where it would assume that it's all caught up.

I would assume an opt-in checkbox would be needed for this, as I can only think of a few instances where it's potentially useful; my derpi search is probably the only current booru that would take advantage of it.

I do have concerns with the url check method even once the login system works for this search, since for me it would go from 15 images a page to 50. If a page has 20 new images but the last 5 are previously known, it would still consider that caught up. A few extra pages to confirm would likely solve this issue completely outside of extreme edge cases, and making the count user-definable would take care of the rest.
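The 'confirm pages' proposal above can be sketched in a few lines, using the same notation ('1' = new url, '0' = already seen). This is a hypothetical model of the suggestion, not anything hydrus does: it simply counts consecutive all-seen pages from the start rather than only after the current rule triggers, and `max_pages` is an assumed safety cap in the spirit of the runaway-request worries later in the thread.

```python
from typing import Iterable


def pages_until_caught_up(pages: Iterable[str], confirm_pages: int, max_pages: int = 50) -> int:
    """Each page is a string of flags, '1' = new url, '0' = already seen.
    Stop after `confirm_pages` consecutive pages with no new urls, or at
    `max_pages` as a safety cap. Returns the number of pages fetched."""
    fetched = 0
    all_seen_streak = 0
    for flags in pages:
        fetched += 1
        if "1" in flags:
            all_seen_streak = 0  # any new url resets the confirm count
        else:
            all_seen_streak += 1
            if all_seen_streak >= confirm_pages:
                break
        if fetched >= max_pages:
            break
    return fetched


# the example sequence from the post
example = [
    "100100100110000",
    "000100010000000",
    "000001001000000",
    "000000000000000",
    "000010000000000",
    "000000000000000",
    "000000000000000",
    "000000000000000",
]
```

With `confirm_pages=3`, the streak resets at the fifth page and the search ends after the eighth, matching the walk-through in the post.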
>>10273 Be a chad and make this your code of conduct https://sqlite.org/codeofconduct.html
I decided to finally try running from source with this version, but I seem to be missing something and I can't figure it out. I'm on Ubuntu 16.04 and carefully followed the instructions for running from source.

Traceback (most recent call last):
  File "client.pyw", line 13, in <module>
    from include import HydrusData
  File "[redacted]/hydrus/include/HydrusData.py", line 223
    def ConvertResolutionToPrettyString( ( width, height ) ):
                                         ^
SyntaxError: invalid syntax

This is not correctly shown due to the proportional font, but the caret indicates the second bracket. It seems weird that it complains about a tuple, so I'm clueless about what the problem could be. Any help would be appreciated.
>>10278 What century are the sqlite devs from? This one seems more reasonable as a modern code of conduct: http://www.un.org/en/universal-declaration-human-rights/index.html
>>10278
>Do not return ~evil~justice for evil.
>Do no wrong to anyone, and bear patiently wrongs done to yourself.
>Love your enemies.
>Do not curse those who curse you, but rather bless them.
Hahaha nope
>>10278 Why the fuck would he need a CoC(K) if he's the only one working on hydrus you nigger?
(55.13 KB 1542x372 client_2018-10-19_20-25-34.png)

Ok, so I wanted to download an artist, because I am not spending the points on them to get the gallery, and I got some weird messages:
https://www.pixiv.net/member_illust.php?illust_id=71259134&mode=medium
https://www.pixiv.net/member_illust.php?illust_id=70480393&mode=medium
https://www.pixiv.net/member_illust.php?illust_id=69674685&mode=medium
https://www.pixiv.net/member_illust.php?illust_id=69247541&mode=medium
https://www.pixiv.net/member_illust.php?illust_id=69154607&mode=medium
https://www.pixiv.net/member_illust.php?illust_id=68012052&mode=medium
https://www.pixiv.net/member_illust.php?illust_id=67055625&mode=medium
https://www.pixiv.net/member_illust.php?illust_id=66227638&mode=medium
https://www.pixiv.net/member_illust.php?illust_id=65617998&mode=medium
https://www.pixiv.net/member_illust.php?illust_id=64856912&mode=medium
https://www.pixiv.net/member_illust.php?illust_id=64566242&mode=medium
https://www.pixiv.net/member_illust.php?illust_id=64099765&mode=medium
https://www.pixiv.net/member_illust.php?illust_id=63869562&mode=medium
https://www.pixiv.net/member_illust.php?illust_id=63755868&mode=medium
https://www.pixiv.net/member_illust.php?illust_id=63071421&mode=medium
https://www.pixiv.net/member_illust.php?illust_id=62322273&mode=medium
https://www.pixiv.net/member_illust.php?illust_id=62008912&mode=medium
https://www.pixiv.net/member_illust.php?illust_id=61356485&mode=medium
https://www.pixiv.net/member_illust.php?illust_id=61149049&mode=medium
Those are the links.
I just had Hydrus run maintenance (looking for duplicates) while a gallery downloader was busy downloading stuff. Should this be happening? I thought maintenance was supposed to run while the client was idle.
>>10282 >>10278 >>10283 >>10284 Thank you for your interest here. This is obviously in the news right now. I am no fan of codes of conduct in general, as I usually end up on the wrong side of them. Rest assured that I am not suddenly about to take up the Contributor Covenant. I am most comfortable with imageboard culture, which I hope comes through in hydrus's design in various ways. I suspect the core of our culture is fundamentally incompatible with written social rules, and obviously with any attempt to reduce colon crucifixion. As it is, the whole thing is moot as >>10278 says, as I am the only hydrus 'team member', and I don't see that changing as I am a complete sperg who can't work with others.
>>10274 I think it is 'about time', overall. I have been thinking about it for a couple of years now. I believe python themselves have finally stopped working on 2, and the big libraries are starting to drop support. It has multiple nice features (particularly unicode handling, including at the OS level, and improved C++ interactions, so simple things like thread event waits are more CPU efficient, and I'd like to play with the async stuff), and I understand they have now ironed out the cons, which were mostly slower execution. I expect to jump right up to 3.7, but I'll obviously read up more and play around a bit first.
>>10292 Sorry, as >>10284 says.
>>10276 >>10274 Thanks for this. I will check the 6 hour/6 min issue. Can you take a screenshot of your checker options panel, just so I can see what death file velocity and so on you have set up?

I am hesitant to have the sub checker search multiple 'already seen' pages deep every check because A) it will make the existing 'one page' memory of the checker more complicated and B) it will waste bandwidth and CPU time for all involved in 99% of cases. From my perspective, there is a non-zero 'response is shit' rate due to server problems and Cloudflare changing hashes and resizes and stupid gif conversions and watermarks and bad uploads and all sorts, so I am not super interested in spending 70% more effort to finish the last 0.2% of results, particularly when most of us are downloading far faster than we process anyway.

In changing the check system this week, I realised that requiring five consecutive 0s would be a better check than five 0s total (and stopping there), and then I figured that since I had fetched a page's worth anyway, there was no point stopping as soon as I hit the first five. The most effective place to check for the five consecutive 0s is the final files in the page. I am overall happy with this change, as it improves the number of True Positive 'need to fetch more pages' results while only increasing False Positives in the rare case that the page border is the exact 'true' sync point, all without increasing bandwidth or CPU significantly.

My current plan is to verify this new logic is not hitting false positives too much, then hide the periodic check notifications, and then see how it works overall. I think I would be willing to add an option to change the x=5, maybe for advanced users, in which case you could boost your x up to 15 or whatever and get a deeper, more 'guaranteed' search that way. But I'd probably have to think about safeguards to stop runaway problems. I don't want, through my fault or user fault, to accidentally hit some site for 5,000 CPU-heavy pages every six minutes.
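The old and new 'caught up' checks described here can be modelled with the same '1' = new / '0' = already seen notation used in the post being answered. This is a simplified sketch of the rules as the thread describes them, not hydrus's actual subscription code; `x` stands in for the '5' that might become an advanced option.

```python
def caught_up_old(flags: str, x: int = 5) -> bool:
    """Old rule: stop once x already-seen urls ('0') turn up anywhere on the page."""
    return flags.count("0") >= x


def caught_up_new(flags: str, x: int = 5) -> bool:
    """New rule: stop only if the *last* x urls on the page are all already seen."""
    tail = flags[-x:]
    return len(tail) == x and "1" not in tail
```

A page like `000000000000001` is the late-tagged-file case: the old rule sees plenty of known urls and stops, while the new rule notices the final url is new and fetches the next page. Setting `x=15` on a 15-url page means the whole page must be already seen.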
>>10281 I believe tuple parameter unpacking (the cheeky (width, height) param) was removed in python 3, so my guess is you are running the code as 3? Hydrus is 2.7 for now (but not for long!), so if your machine is defaulting to 3, see if you can force it with python2 client.py or something? I thought my scripts' hashbang was supposed to force it, but I don't really know anything about that stuff. If you just type whatever python command you are using into a terminal, it'll give you the version it thinks it is just above its >>>.
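For reference, PEP 3113 is what removed tuple parameter unpacking from Python 3, which is why the `def` line in the traceback fails to parse before anything else runs. A minimal sketch of the form that parses on both 2.7 and 3; the function name and output format are illustrative only, not hydrus's real function.

```python
import sys

# Python 2 only; on Python 3 this is a SyntaxError at parse time:
#
#     def ConvertResolutionToPrettyString( ( width, height ) ):
#
# A form that parses on both 2.7 and 3: accept the tuple as a single
# parameter and unpack it inside the body. Hypothetical name and format.
def resolution_to_pretty_string(resolution):
    (width, height) = resolution
    return '{}x{}'.format(width, height)


def running_on_python_2():
    # True when the interpreter executing this file is a 2.x release
    return sys.version_info[0] == 2
```

`resolution_to_pretty_string((1920, 1080))` returns `'1920x1080'` on either interpreter.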
>>10285 Thank you for this report. Unfortunately, ugoira is not yet supported, although it is a long-term project (likely conversion to video/apng). You'll see the 'notes' column there says 'ugoira veto', which is the new downloader noticing and skipping those URLs. If you did not know, ugoira is an old Nip attempt to make Web 1.0 webm-like animation by tying blessed jpeg print-outs to a shrine tree dedicated to the Trickster God of Javascript. It may seemingly render correctly in your browser, but right-click->save as is disabled due to a curse placed on the West after the Nagasaki bombing.
>>10287 It depends. What are your idle settings under options->maintenance and processing? If the 'go idle if mouse stays still' setting is only set to something like 2 mins, or disabled completely, I could see it kicking in a bit too early. That said, non-user actions like download queues don't count as foreground work, so if you weren't otherwise using the computer to browse or move your mouse, it is normal for maintenance to run. If you'd prefer it to work a bit less, have a play with the settings on that options page. Only doing work on shutdown works for a lot of users, even if it is only one big job a week.
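A hypothetical sketch of the idle rule described here: only user input holds maintenance off, and background jobs such as download queues are simply not an input to the decision. Treating the disabled 'go idle if mouse stays still' setting as "mouse input never holds maintenance off" is an assumption drawn from the remark that disabling it could make maintenance kick in early; none of these names come from hydrus.

```python
import time
from typing import Optional


def maintenance_may_run(last_user_input: float,
                        idle_after_seconds: Optional[float],
                        now: Optional[float] = None) -> bool:
    """Hypothetical idle check. Only mouse/keyboard input holds maintenance
    off; download queues are deliberately not consulted, which is why
    maintenance can start while a downloader runs.
    idle_after_seconds=None models the mouse idle setting being disabled."""
    if idle_after_seconds is None:
        return True
    if now is None:
        now = time.time()
    return now - last_user_input >= idle_after_seconds
```

With a 2 minute setting, maintenance stays off 100 seconds after the last mouse movement and is allowed at 200 seconds, no matter how busy the downloaders are.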
>>10297 Ugoira is actually the brainchild of Hector Martin, better known as Marcan, one of the guys who made the Wii Homebrew Channel, who now works for Pixiv. That trivia aside, I would like to vote against any kind of format conversion happening in Hydrus, or at least for making it optional. Format conversion to webm/mp4 would be lossy, while format conversion to apng would be like those bloated png screenshots of jpgs. Therefore I think ugoira should by default be preserved in its original format.

The ugoira format is ridiculously easy to parse. The container is a zip file, which according to the specification must always have compression disabled, so you wouldn't even need a compression library to read the images inside. The images are a sequence of either jpg or png files, numbered sequentially. Lastly, it contains a json file called animation.json, which holds the timing information. Instead of a fixed frame rate, each frame has a delay in milliseconds, like gifs.

There are slides of an old presentation by Marcan here, explaining some of the philosophy: https://marcan.st/talks/2014_pixiv_ugoku_player/ Note that some of the information is outdated. E.g. timing was originally not stored inside the zip file, but embedded in the html code of the site.
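To make the structure described above concrete, here is a sketch of reading an ugoira zip with Python's standard `zipfile` and `json` modules. The exact shape of animation.json is an assumption for this sketch (a `frames` list of `file`/`delay` entries); check real files before relying on it. `make_test_ugoira` builds a tiny fake archive in memory so the reader can be exercised.

```python
import io
import json
import zipfile
from typing import List, Tuple


def read_ugoira(data: bytes) -> Tuple[List[Tuple[str, int]], bool]:
    """Return ([(frame_name, delay_ms), ...], all_members_stored).

    ASSUMPTION: animation.json is shaped like
    {"frames": [{"file": "000000.jpg", "delay": 100}, ...]}.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # the post says ugoira zips are uncompressed; verify rather than trust it
        all_stored = all(info.compress_type == zipfile.ZIP_STORED
                         for info in zf.infolist())
        meta = json.loads(zf.read("animation.json").decode("utf-8"))
        frames = [(f["file"], int(f["delay"])) for f in meta["frames"]]
        names = set(zf.namelist())
        missing = [name for name, _ in frames if name not in names]
        if missing:
            raise ValueError("animation.json lists missing frames: %s" % missing)
    return frames, all_stored


def make_test_ugoira() -> bytes:
    """Build a tiny fake ugoira in memory (frame bytes are placeholders)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("000000.jpg", b"not really a jpeg 0")
        zf.writestr("000001.jpg", b"not really a jpeg 1")
        meta = {"frames": [{"file": "000000.jpg", "delay": 100},
                           {"file": "000001.jpg", "delay": 250}]}
        zf.writestr("animation.json", json.dumps(meta))
    return buf.getvalue()
```

Because each frame carries its own delay, the total duration is just the sum of the delays (350 ms for the fake file), and no fixed frame rate is ever needed.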
(277.59 KB 734x800 buggy frame webm.webm)

>>10301 Thanks, this is interesting. I am not yet sure what I want to do with ugoira. For a while, I think hydrus actually downloaded the zip from danbooru ugoiras, but now I think it gets the webm that danbooru provide. So I assume some clients have ugoiras already in, they just don't know yet. I've considered supporting it natively, actually going into the zip and pulling images out, but while it may be appropriate for fixing browser compatibility problems, that seems an ugly way to render local animations/video. I think I'll probably have to write an inspector if I want to do a conversion, so I'll have the tools to support it natively anyway. But then, it does make sense to me to (optionally) convert fixed-framerate png ugoiras to apng and maybe the jpgs to webm, just to save space and increase compatibility. But only if I could figure out reliable byte-deterministic conversion so we aren't generating random hashes all over the place. Variable frame rate is more tricky. I was playing with videos that looked like ugoira conversions last week that seemed to do that, although I don't know if they were 'cheating' to achieve it.
>>10293 Huh. I thought you had upgraded python a bunch already. Was that 2.XX versions or were the wx updates something else entirely? >>10301 >>10302 What kind of "bloat" are we talking about with apngs?
>>10302 To me the main problems with conversion are quality loss and inconsistent hashes. Ugoira uses a single container file, so I think it's no uglier than any other container format. Of course, proper video formats are easier to handle. Webm is far superior in quality and file size compared to the ugoira zip files, but I think it's better to keep the original data than to make even more conversions, which would introduce more quality loss and more hashes to keep track of.

With common webm files, quality-wise there would be chroma loss, since they are usually only encoded with 4:2:0 chroma. Encoders usually use the Rec. 709 standard, which reduces the 8 bit integer range to 16-235. We would need to encode to 4:4:4 chroma with full integer range, which not all players support yet. Even ignoring the lossy video compression, we would already lose so much image quality. For the jpgs in ugoira files, 4:4:4 chroma with full integer range is a universally supported standard.

Then there is the problem of hashes. Ugoira uploaded to boorus were already converted to webm or gif. Now we would make our own conversions from the ugoira, which would have different hashes than the ones on boorus. Even different Hydrus users would have different hashes, because only one component like the encoder or muxer needs to be updated to a newer version to result in a different hash. The same encoder might even produce files with different hashes on different PCs. This would make the tag repository useless.

My one wish is to keep these files native. I think an optional encoder is a good idea for exporting/sharing files, but making it standard just sounds like it wouldn't be worth the little storage space it would save.

>>10303 With "bloat" I meant the bloated file sizes when converting jpgs to png. Even for still pictures it's already annoying to have png conversions of jpg files. It doesn't make sense for them to exist. My guess is that people take screenshots of jpg pictures or something.
Some people do that on their phones, because the screenshot shortcut is easier to use than opening a context menu to save the original file.
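To put a number on the limited-range point above: squeezing 256 full-range 8-bit luma levels into Rec. 709's 16-235 'limited' range leaves only 220 distinct levels, so at least 36 of the original values cannot survive a round trip. A small Python check of that arithmetic, assuming simple round-to-nearest scaling; real encoders and colour converters may scale and dither differently, and this ignores chroma subsampling and lossy compression entirely.

```python
def full_to_limited(v: int) -> int:
    """8-bit full-range luma (0-255) -> limited/'TV' range (16-235), rounded."""
    return int(16 + v * 219 / 255 + 0.5)


def limited_to_full(y: int) -> int:
    """Inverse mapping back to 0-255, clamped."""
    return min(255, max(0, int((y - 16) * 255 / 219 + 0.5)))


# only 220 distinct output levels exist for 256 inputs
distinct_levels = len({full_to_limited(v) for v in range(256)})
# values that come back different after full -> limited -> full
not_round_tripped = [v for v in range(256)
                     if limited_to_full(full_to_limited(v)) != v]
```

By the pigeonhole principle, 256 inputs squeezed into 220 outputs means `not_round_tripped` holds at least 36 values, before any video compression has even started.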
>>10302 It still considers the zip the downloadable URL according to a parser test. Maybe have an option to have ffmpeg convert the contents of the zip into a hydrus video player friendly format and cache this file in the db somewhere, so that when the hash for an ugoira is called to the image viewer, it instead grabs the cached video linked to it?
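One way the suggested ffmpeg conversion could keep ugoira's per-frame timing is ffmpeg's concat demuxer, whose script format has `file` and `duration` directives, so each frame can carry its own delay. A hedged sketch: `build_ffconcat` and `cached_video_path` are hypothetical helpers, encoder options are deliberately left out, and the cache layout (keyed on the original's sha256 so the original file and its hash stay untouched) is only an illustration of the idea in the post.

```python
import hashlib
import os
from typing import List, Tuple


def build_ffconcat(frames: List[Tuple[str, int]]) -> str:
    """Turn (frame_file, delay_ms) pairs into an ffmpeg concat-demuxer
    script, giving each frame its own duration so variable timing survives.

    The last file is listed twice because, per the FFmpeg wiki's slideshow
    notes, the final entry's duration is otherwise not applied (worth
    re-checking against your ffmpeg version). Names must not contain quotes.
    """
    if not frames:
        raise ValueError("no frames")
    lines = ["ffconcat version 1.0"]
    for name, delay_ms in frames:
        lines.append("file '%s'" % name)
        lines.append("duration %.3f" % (delay_ms / 1000))
    lines.append("file '%s'" % frames[-1][0])
    return "\n".join(lines) + "\n"


def cached_video_path(original_bytes: bytes, cache_dir: str) -> str:
    """Hypothetical cache layout: key the converted video on the sha256 of
    the original ugoira zip."""
    digest = hashlib.sha256(original_bytes).hexdigest()
    return os.path.join(cache_dir, digest + ".webm")
```

The script would be written to a text file next to the extracted frames and passed to something like `ffmpeg -f concat -i frames.txt ...`, with the viewer looking up `cached_video_path` for the ugoira's hash before falling back to the zip.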
(123.33 KB 1165x1871 client_2018-10-21_02-43-21.png)

(58.94 KB 643x465 client_2018-10-21_02-44-21.png)

>>10295 One of them is the normal 'let the program do it' checker; the other is my 'every 6 hours' checker, but that one for some reason gets set to 6 minutes every now and then. I'm not sure what triggers that.

Yeah, I can see where you are coming from with "5,000 CPU-heavy pages every six minutes." For my case, making X user-defined would easily work, and because it's an edge case in almost any sense of the definition, it would likely be one of the few, if not the only, searches that hydrus is currently compatible with where it would be applicable.

A safeguard for subscriptions could be implemented. I know with chrome, every new site I download something from forces a popup 'are you sure you want to download multiple things from this site' which I have to ok. You could have subscriptions stop a process and force you to ok something if it sees you may be doing something retarded with the settings, such as a sub set to every 6 minutes. I would limit it to subscriptions, or to once per checker rule set change, having the prior checker rule set saved for lookup. This would force people to confirm their stupidity, along with being able to tell you when something is happening. I remember a few versions ago I had something like 16k lookups for threads that don't exist because the program never stopped trying to look them up after a 404. Hell, thinking of it, a failsafe to warn you that you are doing stupid shit seems like a good thing to implement. It would make people think twice about it, or in the best case, be something people bring attention to when they are trying to do it.

----------

On that note of stupid shit, if you have anything on exhentai/gehentai in the to-do list, you may as well scrap it. I have found several scrapers for the site that all result in bans, even where the scraper took several minutes in between files. It's unlikely anything you do would get by them.
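The 'confirm once per rule set change' safeguard suggested here could look something like the sketch below: prompt only when a check period is suspiciously short and differs from the last period the user already confirmed. The threshold and both names are hypothetical; hydrus has no such constant or function.

```python
from typing import Optional

# Hypothetical threshold; hydrus has no such constant.
MIN_UNCONFIRMED_PERIOD_SECONDS = 60 * 60


def needs_confirmation(new_period: int, confirmed_period: Optional[int]) -> bool:
    """Ask the user to OK a check period only if it is suspiciously short
    AND differs from the last period they already confirmed, so the prompt
    shows once per rule change rather than on every run."""
    if new_period >= MIN_UNCONFIRMED_PERIOD_SECONDS:
        return False
    return new_period != confirmed_period
```

A sub silently flipping from 6 hours to 6 minutes would trip the prompt once; after the user confirms, the saved `confirmed_period` stops it from nagging on every check.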
>>10297 Oh, it's a file format that is pretty much Pixiv-specific… welp, now I know. I thought it was some other weird error.
>>10304
>making it standard just sounds like it wouldn't be worth the little storage space it would safe
Disagree, I definitely would want it. I don't care about elegant retro solutions or what is simpler from a conceptual perspective. I want something that works well with both clients and servers and will actually run in modern players and embeds if exported from hydrus, and I want to avoid stutter, timing inconsistencies between hardware, and memory spikes as much as possible. So someone like me would absolutely want a utility to convert any given memeformat into something more practical. I don't care if I can open it as a zip and extract individual frames, because I can just take a screenshot if I want a frame. So I would have no use for the import/handling unless it also included an option to convert to something more standard. In fact, while I'm not holding my breath, I would love to eventually be able to convert all of my files freely and losslessly between common filetypes within Hydrus, and ideally end up with mostly just FLIF/FLAC/whatever non-Google thing kills WebM/the PDF killer/etc.
(4.45 MB 3410x4936 006.jpg)

(3.64 MB 3410x4936 006 16.png)

(2.38 MB 3410x4936 006 8.png)

(2.47 MB 3410x4936 006 8 - 1 process.png)

(1.44 MB 3410x4936 006 4 - 1 proccess.png)

>>10302 On the topic of file conversions: I have no idea what your stance on this is, but here is mine. People who scan manga, and many people who edit it, are little better than drooling retards when it comes to compression/optimization. I'm not a genius by any means, but fucking come on, man. Let me give an example from an h-manga that's 1 fucking GB because the people who scanned it and edited it both failed horribly.
006: the unedited version.
006 16: simply limited to 16 colors.
006 8: simply limited to 8 colors, but the default process is stupid, so let's add some processing.
006 8 - 1 process: adds a better color process, so the background stays more or less white.
006 4 - 1 process: as good as you can make this without significant loss, almost entirely due to how it was edited and scanned.
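For anyone wondering what "limiting it to N colors" means mechanically on a greyscale page, here is the simplest version: uniform posterization of 8-bit grey values onto N evenly spaced levels. This is an assumed stand-in, not the process used for the images above (the "better color process" likely involves a smarter palette or dithering), and a real tool would work on image files through a library such as Pillow rather than on a plain list of pixel values.

```python
def posterize(pixels, levels):
    """Map 8-bit grey values (0-255) onto `levels` evenly spaced grey levels.

    levels=2 gives pure black/white; levels=16 mimics a 16-colour greyscale palette.
    """
    if not 2 <= levels <= 256:
        raise ValueError("levels must be between 2 and 256")
    step = 255 / (levels - 1)
    # Snap each pixel to the nearest palette index, then convert back to a 0-255 grey.
    return [round(round(p / step) * step) for p in pixels]
```

With `levels=16` every output value is a multiple of 17, so the page needs only 16 distinct greys, which a PNG can store at 4 bits per pixel instead of 8.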
(615.53 KB 3410x4936 006 2.png)

>>10310 Now, you will notice some loss on her cheek and the background won't be white, but that's just the cost of this. Going to 5 colors might have made it unnoticeable, but I'm not putting more effort into this. Do note that manga in general is binary in the way it's printed: it's either black or white, with nothing in between. But some people don't scan at a high enough resolution to get clean black-and-white images. Hell, I set up a DSLR a few years ago and 'scanned' some images with it; depending on how stupid you want to go, you could 'scan' fractions of a page and stitch them together, getting a VERY high resolution image at the cost of a large file size. But here is where the magic comes into play: all of the images are black or white. Here is the same page with 2 colors. Note that this was scanned and edited like crap, so 2 colors is very lossy in this case, but it still gives a good example.
Note the file sizes: 006 is 4.44 MB, 006 2 is 0.61 MB. That is roughly 1/7 the file size. Now, I have a 4K 55-inch screen, and with the images nearly full screen but fitted to the screen, they both look good; at 1080p, or 1/4 the size they are now, they look even better. If the person who scanned this hadn't saved it as a fucking JPEG, it would look better. If they had scanned it at a resolution high enough that there were clear white and black spots, it would look even better. And if the person who edited this… well, I can't say too much there given what they were working with, but if they had pushed it a bit in Photoshop to better delineate white and black, it would look even better.
There are a few proof-of-concept images I made a few years back, 8000+ pixels big but from fantastic scans, that I was able to edit down to black and white only. Zoomed in they were nothing to look at, but scaled down they shit on every other released version of the manga that existed, while coming out to a reasonable file size. Jesus, I went on quite a bit with this.
My point is, people are complete morons about compression almost all of the time, and it's very rare to find people who know what they are doing (hell, as much as I don't like Fakku, their 3200x releases are near perfect at keeping file sizes down while keeping quality at max). Because of this, if there is any way to optimize images that come into hydrus, or possibly to lossily compress black-and-white images, I'm 100% OK with it, so long as I'm able to see which is which and choose.
I thought about this quite a bit. If one of the new do-everything image formats like FLIF comes along, I'm 100% OK with a hydrus method to convert images to that format, so long as there is also a way to convert them back out if I choose to share them. The way I see it, I'm OK with minimal loss so long as what I gain is significantly bigger. In the case of this hentai, I could edit a 1.1 GB file down to 350 MB and not care, or, to play it safe, get it down to 500 MB with nearly zero change in quality. If I could take all my PNGs, convert them losslessly to another format, and get 40-60% more space, by all means do it; just let me know, and potentially set it up so it runs in batches.
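To put rough numbers on why the 2-colour page is so small, this computes how much raw pixel data a PNG encoder has to compress at different palette sizes. It's a back-of-the-envelope sketch under stated assumptions: a paletted PNG, counting only the filtered scanline data (one filter byte per row plus the packed pixels) and ignoring how much deflate then squeezes out of it.

```python
PNG_LOW_DEPTHS = (1, 2, 4, 8)  # bit depths PNG allows for palette images


def png_bit_depth(colors: int) -> int:
    """Smallest PNG palette bit depth that can index `colors` distinct colours."""
    for depth in PNG_LOW_DEPTHS:
        if colors <= 2 ** depth:
            return depth
    raise ValueError("more than 256 colours needs a truecolour PNG")


def png_raw_bytes(width: int, height: int, colors: int) -> int:
    """Bytes of filtered scanline data handed to deflate, before compression.

    Each row is one filter-type byte plus width * depth bits, rounded up to whole bytes.
    """
    depth = png_bit_depth(colors)
    row_bytes = 1 + (width * depth + 7) // 8
    return height * row_bytes
```

For the 3410x4936 page that comes to about 16.8 MB of raw data at 256 colours, 8.4 MB at 16 colours and 2.1 MB at 2 colours before compression, so a 0.61 MB 2-colour PNG is in line with what deflate can do on top of the 1-bit packing.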
(115.82 KB 214x221 chrome_2018-10-21_03-21-20.png)

>>10303 In the case of a JPEG file, it would be like getting a PNG out of a JPEG. See pic related and the JPEG version above.
>>10309
>memeformat
I see that the two of us use Hydrus to achieve very different goals. You're making a good point for people who use it as a meme archive, which is a legit use too. Conversion would definitely be better for people like you, for ease of sharing, but catastrophic for people like me, who want to use it for long-term archiving. So I guess it makes sense to have a conversion option for some people. However, I still think the original unchanged files should always be the default, especially since memes usually already come in easily shareable formats from image boards rather than Pixiv.
I would like to use an example to illustrate the problems of format conversions: In 2003, somebody ripped a TV series from a DVD. It used MPEG-2, a horribly inefficient lossy video codec that was already outdated back then. So, to not waste space, he converted the DVD's video to the latest state-of-the-art codec he could find: DivX. You should already be cringing at this point, because DivX is a shitty old codec now. In 2018, you would definitely try to find a better version of that TV series than that shitty old DivX rip. You would most likely find a batch of MKVs using H.264 that look much better. These better-looking MKV/H.264 files exist because somebody kept the DVD's original MPEG-2 video and re-converted it to a modern format. They will still need to keep the MPEG-2 videos archived, though, because H.264 is the shitty DivX of tomorrow, and they will be re-released in H.265, VP9, or one of their successors. The ugoira files with their shitty JPG compression are our equivalent of the DVDs in that story. Sure, they use outdated compression, but they're the best version we have. Any secondary conversion would degrade them, just like that DivX rip did to the MPEG-2 video.
Ideally, we would want artists to release in a lossless format, or a more modern format like WebM, but they didn't, so from an archivist's point of view the ugoira version is the one we need to keep. Conversion is great for sharing, but please let people archive the originals. Any conversion happening in Hydrus should be optional, and it should automatically add the necessary metadata to identify the original source, because people suck at indicating sources.
>>10310 >>10311
That's a good example that all scanlation groups should read. Ideally you would re-scan the original since, as you mentioned, the resolution should be higher for 1-bit color. The half-tone pattern shows some aliasing, though that might be an artifact of the JPG compression. 1-bit color depth with oversampling during scanning is the way to go for manga. But your conversion only turned out well because you understand the medium and made an informed decision in choosing the format. It would be catastrophic if it were done automatically, just because some algorithm detected that something might be manga.
>If you right-click several tags, you can now open separate new search pages for each of them separately!
Fucking nice, I've been waiting for this to sort my (1) tags. Thanks based dev.
>tfw if system:everything doesn't crash my hydrus opening the fuckton of (1) tags at once probably will
It's the kind of flammable you want to give a spark just to see what happens.
>>10313
>You're making a good point for people who use it as a meme archive.
Don't trip there, senpai. I'm saying that Ugoira is only used by a few websites and is not a standard universal format (like DVDs are). A long-term archival format should be something with wide compatibility, since it's pointless to archive something which can only ever be played on its host device once it has become broken and ancient.
>Any secondary conversion would degrade them
Any LOSSY secondary conversion. At very worst, they could still be converted to FFV1, and that would be a more standard option.
>Any conversion happening in Hydrus should be optional
Yeah, we agree.
>>10304 >>10312
No shit. I meant how much bloat? Do you really have so many ugoiras as to get an extra GB worth of bloat? Still, you have a point about keeping the ugoiras. If there were a way to read them natively inside Hydrus, then that would be best, with an option to convert to WebM or whatever for sharing. Still, that itself would add new hashes, and APNG might be the better solution for quality-minded individuals. Regardless, this all depends on what magic Hydrus Dev can do. While I'd love to have my ugoiras work in Hydrus, if it's going to be a time-consuming endeavor, then there are other things I'd rather be worked on.
>>10318 Regex.
>>10320 Wait, I fucked up.
>>10316
>it's pointless to archive something which can only ever be played on its host device once it has become broken and ancient.
But JPG and zip have extremely wide compatibility, much wider than WebM and APNG (which, despite increasing in popularity, still is not recognized as a valid standard by the PNG group, afaik). As long as we have JPG decoders, we can play those files. Even if no ugoira player existed anymore, you could just unzip an ugoira to a new folder, open the first frame in your OS's standard image viewer, and keep the right cursor key pressed to play the animation like a flip book. This example is just to illustrate how ridiculously low-tech this format is. It shouldn't take a programmer more than a few hours to make a simple player, even if he had never heard of ugoira before and needed to "reverse engineer" it. And my last claim was an insult to all the people who do real reverse engineering, because elementary school kids could figure this out.
>Any LOSSY secondary conversion.
Well, any lossless secondary conversion of JPGs would be a waste of storage space. It could be a good idea for PNG-based ugoira, but then we would still generate new hashes with conversions.
>>10318
It's hard to tell how much bloat it would be for the ugoira I have hoarded so far, since they are spread over multiple folders and mixed with regular Pixiv image posts. One of my old ugoira-only folders is 5 GB and only contains JPG-based ugoira. If we check the JPG-to-PNG conversion of that Amy pic (>>10284 >>10312), we see a 6.79x size increase, which means that folder might become over 30 GB if converted to a lossless format.
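To back up the "ridiculously low-tech" point: reading the frames out of an ugoira zip in playback order takes a few lines of standard-library Python. This sketch assumes the zip holds one image per frame with names that sort into playback order (Pixiv's zips appear to use zero-padded names like `000000.jpg`); frame delays are assumed to come from somewhere else, since they are not part of the image files themselves.

```python
import io
import zipfile

IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif")


def ugoira_frames(zip_bytes: bytes):
    """Return [(filename, image_bytes), ...] in playback order.

    Assumes one image per frame, named so that sorting the names gives playback
    order. Non-image entries (for example a JSON metadata file) are skipped.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        names = sorted(n for n in zf.namelist() if n.lower().endswith(IMAGE_EXTS))
        return [(n, zf.read(n)) for n in names]
```

A "player" is then just decoding each frame with any JPG decoder and showing it for its delay.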
>>10318 Well, this one relatively small Amy image has about 6.79 times the file size. Let's say an animation was 10 MB; that would bloat to 67.9 MB. It's not about 1 file, it's about potentially hundreds if not thousands.
>>10313 Honestly, every single manga can be put through a 16/10 color pass and come out the other end looking nearly the same if not the same. The reason is that all of the small dots that make up a gradient are more than capable of doing so when constrained this way, and because manga is binary black and white, there is no smooth digital shading where a 256+ color gradient would come into play, so the main reason to have the full color range is gone. And because every area is broken up into light and dark, it's nearly impossible to see the differences by eye. The long and short of it is that you could batch-convert every manga page to 16 colors, 10 colors, or, for the majority, even 8 colors, and save fuckloads of space. Hell, if I put a bit of time into editing this, bringing the dark colors up and pushing the light back, I could probably make it work with 4 colors or maybe even 3, but that would be pushing it, since the images were a JPEG at one point in time. More or less, there are ways to automate image compression like this that are lossless, and when converting a JPEG to another lossy format, while it may lose something, if I believe I gain more than I lose, I'm more than willing to take the hit. I would fully support a compression part of hydrus just because it could save so much space it wouldn't even be funny, and with PNG, so many files have wasted space that even lossless re-encodes can shave off good chunks. Now, for your DVD to MPEG-2 to DivX analogy, sure, that's bad, but I look at it more like TIFF to PNG, where both are lossless but one is far better, or CD to FLAC, again both lossless, one just losslessly compressed.
I have no real desire to share my images most of the time, so just a huge archive is good enough for me, and if a conversion from JPEG to FLIF or whatever looks near identical but is smaller by a good ratio, my image archive can take the hit. I mean, JPEG has been going for 26 years now; a new file format would likely go on for another 20-odd years, and by then there is a good chance I'll be dead, so it's not like video formats, where the compression vs quality ratio changed so drastically in my computer-using lifetime that you could have completely obliterated a video with format changes. I'm not even saying hydrus should do this by default, just that I would probably use such a feature if it were implemented, especially for some images I know could easily be compressed.
>>10316 If ugoira is what I think it is, it's more like having the film reel the movie was shot on, as opposed to having a DVD of the final cut.
Just had an idea because of someone's fucking tumblr that decided to have 30k images on it: is it possible to import galleries with files paused? 4chan or 8chan threads I want going almost immediately, but galleries don't have a shelf life (or at least not normally). That way, when fucking disasters like that tumblr happen, I don't have 3 or 4 galleries going that hang the program and make importing anything else a miserable experience. Personally, files is where I want the program to pause; I'd like the file search to stay unpaused, as it doesn't seem to affect performance much at all.
Hey dev, I'm running into this issue with this version:
```
WARNING:root:pafy: youtube-dl not found; falling back to internal backend. This is not as well maintained as the youtube-dl backend. To hide this message, set the environmental variable PAFY_BACKEND to "internal".
libdc1394 error: Failed to initialize libdc1394
2018/10/23 22:02:10: hydrus client started
2018/10/23 22:02:10: booting controller…
2018/10/23 22:02:10: booting db…
2018/10/23 22:02:10: preparing disk cache
2018/10/23 22:02:10: preparing db caches
2018/10/23 22:02:10: booting db…
2018/10/23 22:02:10: initialising managers
2018/10/23 22:02:10: If the db crashed, another error may be written just above ^.
2018/10/23 22:02:10: A serious error occurred while trying to start the program. The error will be shown next in a window. More information may have been written to client.log.
2018/10/23 22:02:10: Traceback (most recent call last):
  File "/opt/hydrus/include/ClientController.py", line 1256, in THREADBootEverything
    self.InitModel()
  File "/opt/hydrus/include/ClientController.py", line 639, in InitModel
    session_manager = self.Read( 'serialisable', HydrusSerialisable.SERIALISABLE_TYPE_NETWORK_SESSION_MANAGER )
  File "/opt/hydrus/include/HydrusController.py", line 540, in Read
    return self._Read( action, *args, **kwargs )
  File "/opt/hydrus/include/HydrusController.py", line 180, in _Read
    result = self.db.Read( action, HC.HIGH_PRIORITY, *args, **kwargs )
  File "/opt/hydrus/include/HydrusDB.py", line 873, in Read
    return job.GetResult()
  File "/opt/hydrus/include/HydrusData.py", line 1498, in GetResult
    raise e
DBException: ImportError: No module named ordered_dict
Database Traceback (most recent call last):
  File "/opt/hydrus/include/HydrusDB.py", line 532, in _ProcessJob
    result = self._Read( action, *args, **kwargs )
  File "/opt/hydrus/include/ClientDB.py", line 8723, in _Read
    elif action == 'serialisable': result = self._GetJSONDump( *args, **kwargs )
  File "/opt/hydrus/include/ClientDB.py", line 5545, in _GetJSONDump
    return HydrusSerialisable.CreateFromSerialisableTuple( ( dump_type, version, serialisable_info ) )
  File "/opt/hydrus/include/HydrusSerialisable.py", line 136, in CreateFromSerialisableTuple
    obj.InitialiseFromSerialisableInfo( version, serialisable_info )
  File "/opt/hydrus/include/HydrusSerialisable.py", line 213, in InitialiseFromSerialisableInfo
    self._InitialiseFromSerialisableInfo( serialisable_info )
  File "/opt/hydrus/include/ClientNetworkingSessions.py", line 76, in _InitialiseFromSerialisableInfo
    session = cPickle.loads( str( pickled_session ) )
ImportError: No module named ordered_dict
```
Something to note: my computer crashed shortly before this, so I'm unsure if it's related. However, it doesn't look like it; it looks like whatever pickled session object was created relied on ordered_dict, which seems to have become unavailable in my Python 2 distribution for some reason.
(399.45 KB 1851x470 ClipboardImage.png)

(77.15 KB 785x137 ClipboardImage.png)

Just bringing this to Hdev's attention. Like derpibooru skipping files without a login for the filter, the DeviantArt scraper will not pick up files labeled 'mature'. Thanks for your work.
>>10325
>Honestly, every single manga can be put through a 16/10 color pass and come out the other end looking nearly the same if not the same
I agree that for scanned manga it would most likely always work. Though I have some manga from a guy painting digitally, who shaded some areas in grayscale instead of using screentones. While the paper print would have had a half-tone pattern, the digital source I have has 8-bit grayscale in some places, so automatic conversion could result in loss of detail there. But yeah, I definitely support built-in conversion for Hydrus, as long as it's optional. Being able to define custom command lines that can then be called from some menu would be a nice idea, not just for converting, but for extending functionality in all kinds of ways. But I'm getting off topic.
>I look at it more like a tiff to png, where both are lossless but one is far better
That comparison is correct for PNG-based ugoira, but most of them are JPG-based, so they are already lossy. Recompressing them will always result in more quality loss or bigger file sizes. Even re-compressing to JPG would probably result in more loss, due to different quantization tables or rounding errors.
>I mean jpeg has been going on 26 years now, a new file format would let likely go on for another 20 odd years
Hm, it's really hard to say what's going to happen to web image formats. For lossless formats, I guess FLIF might become a good replacement for PNG at some point. For lossy formats it seems more complicated. Even though JPG is old, the world somehow decided that it prefers JPG's blocking artifacts over JPEG 2000's ringing artifacts, even though JPEG 2000 is the more advanced format for many use cases. And I don't see people converting their image collections to Google's shitty WebP any time soon. Its entire specification is a collection of bad ideas.
>it's more like having the film reel the movie was shot on
That wasn't addressed to me, but that's the idea.
Using old film as an analogy, the archivist's goal is to preserve the version that's as close to the camera negative as possible.
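The "custom command lines called from some menu" idea is easy to get dangerously wrong if user templates are passed to a shell. A minimal sketch, assuming a made-up `$path`/`$out` placeholder syntax built on Python's `string.Template`; nothing here is an existing hydrus feature. The template is split into arguments before substitution, so a filename containing spaces or shell metacharacters stays a single argument.

```python
import shlex
from string import Template


def build_command(template: str, **values: str) -> list:
    """Turn a user-defined command template into an argv list for subprocess.run().

    Splitting happens before substitution, so a substituted value can never add
    extra arguments or be interpreted by a shell. Missing placeholders raise KeyError.
    """
    return [Template(arg).substitute(values) for arg in shlex.split(template)]
```

The resulting list would be passed straight to `subprocess.run(argv)` without `shell=True`, which is the design choice that keeps odd filenames harmless.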
>>10333 I managed to fix this by adding some exception-handling code that simply skips initializing the failed pickled sessions. Not sure if I broke something, but it seems to have fixed the issue at least for now.
>>10335 Digital artists who actually use digital techniques, instead of applying traditional ones to digital, are fairly rare. As far as automatic conversion goes, I'm against it just because it's so easy to make an oops, though some form of pngcrush or similar could likely be applied to every PNG, as I believe the one I'm thinking of just gets rid of worthless data. The TIFF-to-PNG comparison was more about a full conversion from PNG to another format, if that format handled things better: a lower file size while still being a losslessly compressed image. As for JPEG 2000, I have to defer to this: http://www.stat.columbia.edu/~jakulin/jpeg/artifacts.htm Just looking at it compared to JPEG, it's not an all-round better format, while FLIF honestly seems to be a good contender for an all-round better format. I have issues with formats that have little to no support or demand too many resources when rendering, but it seems like FLIF can outright halve lossless file sizes, and even for lossy, because of the way it loads progressively, even a slow connection or slow server still gives you a reasonable result. In my opinion, if I could convert my files to it, then convert them back when sharing (to a PNG in the lossless case, or just a fuck-it JPEG), I could save somewhere around 500 GB-1 TB off my current image collection. The main reason I don't do a full conversion of at least every lossless image to a better lossless format is simply compatibility. However, since I moved 100% to hydrus for loose images (and if they make a manga version, I will use that exclusively for manga too), I'm more than willing to fully convert to formats like that.
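The "gets rid of worthless data" part of PNG optimizers can be sketched directly on the chunk structure: each chunk is a 4-byte length, a 4-byte type, the data and a CRC-32, and text/timestamp chunks can be dropped without touching a single pixel. This is only that one step; real tools like pngcrush or optipng also re-deflate and pick better filters. The choice of which chunks count as droppable is an assumption (some ancillary chunks, such as `tRNS` or `iCCP`, change how the image looks and are kept), and note that even this lossless cleanup gives the file a new hash.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Ancillary chunks that only carry text or timestamps; dropping them leaves pixels untouched.
DROPPABLE = {b"tEXt", b"zTXt", b"iTXt", b"tIME"}


def make_chunk(ctype: bytes, data: bytes) -> bytes:
    """Serialise one chunk: length, type, data, then CRC-32 over type + data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def strip_png_metadata(png: bytes) -> bytes:
    """Copy every chunk verbatim except the droppable metadata ones."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    out = [PNG_SIGNATURE]
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        end = pos + 12 + length  # 4 length + 4 type + data + 4 CRC
        if ctype not in DROPPABLE:
            out.append(png[pos:end])
        pos = end
        if ctype == b"IEND":
            break
    return b"".join(out)
```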
>>10333 Same here. I'm on Arch, and ever since I updated Python, hydrus will not work anymore.
Oh fuck my life, just did a dup scan, got 53 thousand new hits.
(24.23 KB 836x195 before.png)

(27.23 KB 789x272 after.png)

>>10340 Open /opt/hydrus/include/ClientNetworkingSessions.py and change the code around line 76 as shown in the attached images. This just skips invalid pickled network sessions, and Hydrus seems to recreate them correctly after that run. If you start hydrus from the command line (hydrus-client), it should print some errors on the first start and still continue; all subsequent starts should look normal. After that first run, feel free to revert the changes, although they'd be overwritten by the next update anyway.
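The same skip-and-recreate idea as a Python 3 sketch. This is not the actual patch (that is in the screenshots and isn't reproduced here); `load_pickled_session` and its `make_fresh` callback are hypothetical names. The catch covers the failure seen in the traceback: a pickle stores the module path of each object's class, so if a library update removes that module, loading fails with ImportError even though the stored bytes are intact.

```python
import logging
import pickle

log = logging.getLogger(__name__)


def load_pickled_session(blob: bytes, make_fresh):
    """Unpickle a stored session; if that fails, log it and return make_fresh() instead.

    A pickle records the module path and class name of every object inside it, so
    if a library update moves or deletes that module, loading raises ImportError.
    Only ever unpickle data you wrote yourself: unpickling can run arbitrary code.
    """
    try:
        return pickle.loads(blob)
    except (ImportError, AttributeError, EOFError, pickle.UnpicklingError) as e:
        log.warning("could not restore saved session (%s); starting a fresh one", e)
        return make_fresh()
```

The cost of this approach is the one mentioned in the thread: any login cookies in the skipped session are lost and have to be re-established.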
>>10342 Or maybe wait until the dev addresses it, but he seems quite busy.
>>10337 >>10333 >>10340 >>10342 >>10343 Thanks. Someone emailed me with this earlier in the week. I did the same skip-fix as you here. It looks like the new version of requests handles its sessions data with a different object. I never liked using cPickle here–I think I saw it working somewhere else. I will keep an eye on this, and likely will have to swallow resetting some sessions when I end up getting the new requests, or look for a better way of serialising this data.
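One possible "better way of serialising this data", sketched under assumptions: store only plain cookie fields as versioned JSON instead of pickling the whole session object, so a library update can't make the stored data unloadable. The field list and the `version` key are illustrative choices, not hydrus's format; a real client would pull the fields from its HTTP library's cookie jar (with requests, iterating `session.cookies` yields `http.cookiejar.Cookie` objects that have these attributes).

```python
import json

SCHEMA_VERSION = 1
COOKIE_FIELDS = ("name", "value", "domain", "path", "expires", "secure")


def dump_cookies(cookies) -> str:
    """Serialise an iterable of cookie dicts as JSON: only strings, numbers, bools and None."""
    payload = {
        "version": SCHEMA_VERSION,
        "cookies": [{k: c.get(k) for k in COOKIE_FIELDS} for c in cookies],
    }
    return json.dumps(payload, sort_keys=True)


def load_cookies(text: str) -> list:
    """Inverse of dump_cookies; refuses data written with an unknown schema version."""
    payload = json.loads(text)
    if payload.get("version") != SCHEMA_VERSION:
        # A real implementation would migrate older versions here instead of failing.
        raise ValueError("unknown cookie schema version: {!r}".format(payload.get("version")))
    return payload["cookies"]
```

The explicit version number is what lets a future format change be handled by an update step rather than by silently discarding sessions.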
>>10303 wxPython is the UI library I use. It is a Python 'version' of wxWidgets. It draws all the windows and buttons and so on. When I updated wx, I went from v3.x to v4.x, which was an important shift from one way of talking to the C++ layer to another. The API changed a bit, and bad code was punished more, so it needed a bunch of work. The Python update, by contrast, is to the language itself, which is what all of hydrus is written in. Python is doing a slow-motion titanic shift from some old ideas in 2 to some new ones in 3. I am on 2.7 now and will move to 3.6 or 3.7, depending on which works best for what we need. I don't know exactly how much work it will take, but it'll mostly be boring syntax changes, rebuilding libraries, and the PyInstaller frozen exe stuff. It is possible to have a program work in 2 and 3 at the same time, but I don't feel I have the maintenance time or technical competence to try for that.
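As one concrete example of the "boring syntax changes", take the `cPickle.loads( str( pickled_session ) )` line from the earlier traceback. In Python 2, `str` was the byte-string type, so `str()` was effectively a bytes conversion; in Python 3, `str(b"...")` returns the text `"b'...'"`, which is not a pickle at all. A minimal sketch of the Python 3 form (the helper name is hypothetical):

```python
import pickle


def load_session_blob(blob):
    """Python 3 counterpart of `cPickle.loads( str( pickled_session ) )`.

    bytes() accepts bytes, bytearray and memoryview; str() would produce repr text.
    The plain `pickle` module already uses the C accelerator cPickle used to provide.
    """
    return pickle.loads(bytes(blob))
```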
For ugoira, as with a lot of these file format issues, and much like mp3 vs flac, different people prioritise different things. I think both sides of 'just make it webm m8' and 'don't degrade any data' are reasonable, so I don't want to wade into this and force it either way. It'll be on the next 'big things to work on' poll, and I expect to add auto-convert options for those who want it. I can't talk too intelligently about it yet, as I still need to play with making variable frame rate videos with ffmpeg, to see whether you can give it a list of per-frame durations in milliseconds or whether you have to figgledy-digglety-doo it a raw animation with dozens of duplicate frames and let it do vfr on its own.
>>10310 >>10311 This is interesting. My general personal feeling is that the value of a work is in how it makes your brain feel. A 0.2% difference in pixel colour vibrancy is unnoticeable in 99% of cases, so while it is desirable, the return on investment is a sinkhole. I don't really care about exact byte-for-byte preservation of content except when it helps us on the technical side of things, like lining hashes up. And with Cloudflare et al increasingly fucking us anyway, trying to preserve 'True' originals may be a losing battle. I feel that the coming 20 years will bring us many sorts of waifu2x that will be able to do denoising and other kinds of reverse-entropy better than the best manga cleaner alive today. Or even tools that let you say 'take this image, and put an artist-x-style penis on it', which, if it works as a tech, opens up a whole can of worms of 'personal media', especially if it can be generated in anything close to real time. In any case, we'll want to update all our old 24bit 640x480 85% jpegs up to 48bit SuperK™ hyperimages, and all the bad decisions people are making now on jpg/png will be erased to a Good Enough standard by cleverer machines. I think this will become more significant as we push into 4k and start getting areas of media that are higher quality than our dusty screens and smudgy glasses and imperfect eyes can see. That feels like some rambly bullshit, but tl;dr: I think we'll fix all this shit later with better tools. Watch this with the sound off–I thought the tech was great: https://www.youtube.com/watch?v=Mtz3KHWn1W8
>>10326 I may be misunderstanding your problem here–do you mean to start a new gallery query paused? You can't do this right now, but I've thought of it, maybe with a cog button on the downloader panel. If you just want to pause your galleries from downloading files for a bit (but still let them do the gallery search, to build up their queue), select them in the downloader list right after you add them and click pause/play files. You can select and affect multiple queues at once by shift- and ctrl-clicking.
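The distinction between pausing the gallery search and pausing file downloads can be modelled as two independent flags on one query. This is a hypothetical sketch, not hydrus's downloader classes; `GalleryQuery`, `new_query` and the `start_files_paused` option are made-up names for the "start paused" idea.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class GalleryQuery:
    """Hypothetical model of one downloader query with two independent pause flags."""
    search_paused: bool = False
    files_paused: bool = False
    file_queue: deque = field(default_factory=deque)
    downloaded: list = field(default_factory=list)

    def on_gallery_page(self, urls):
        # The gallery search keeps running and only *queues* file URLs.
        if not self.search_paused:
            self.file_queue.extend(urls)

    def work_files(self, budget: int):
        # File downloading does nothing while files are paused.
        while not self.files_paused and budget > 0 and self.file_queue:
            self.downloaded.append(self.file_queue.popleft())
            budget -= 1


def new_query(start_files_paused: bool = False) -> GalleryQuery:
    # The 'start paused' option from the thread, applied at creation time.
    return GalleryQuery(files_paused=start_files_paused)
```

With files paused, a 30k-image tumblr would just build up a queue of URLs until you unpause it.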
>>10334 Thanks. Yeah, I used to wangle a low-resolution URL for the NSFW DA images by picking some tumblr share link apart, but that got shut down and now you have to be logged in (or age-gate click-through, I think?) to load any NSFW image from DA. I think I maybe skip those urls automatically in the parser? I expect to roll out some login scripts for DA and the other sites like this in the near future to let you access this content (and fix the parser, if need be). Please give them a go and let me know how it works for you.
>>10341 Don't worry about it–that number is inflated because I am currently tracking groups with pairs. You get the number of combinations (as in combinations and permutations), which will shrink correspondingly faster than the number of files you process. There's a bunch of n(n-1) in there, essentially, when the true number is n. At some point I expect to move the system to true group tracking, at which point many of the inflated numbers and unoptimised actions here will work a hell of a lot better.
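A quick illustration of the pairs-versus-groups point, assuming every file in a group of n near-duplicates is paired with every other file in it. That gives n choose 2 = n(n-1)/2 pairs (n(n-1) is the ordered count; either way it grows quadratically), while settling one file removes all n-1 pairs it was part of.

```python
from math import comb


def pair_count(group_sizes):
    """Potential duplicate pairs when each group of n similar files is stored as pairs."""
    return sum(comb(n, 2) for n in group_sizes)  # comb(n, 2) == n * (n - 1) // 2


def pairs_removed_by_resolving_one(n: int) -> int:
    """Settling one file in an n-file group removes every pair it was part of."""
    return n - 1
```

So a single 300-file group of near-identical images already accounts for 44,850 pairs, and a five-figure pair count could in principle come from only a few hundred files.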
>>10296 You were right about it defaulting to Python 3. What actually happened was, I checked the python version and it displayed 2.7, so my system defaults to 2.7. However, as soon as I start the venv, it defaults to 3.5. I have now installed the packages again using pip2 and tried running it with python2. Now it shows the splash screen and starts initializing stuff, but then fails with this:
"ImportError: No module named ordered_dict"
I randomly tried installing packages that sounded like they might be related to ordered dicts using pip2:
ordereddict
persistent-ordered-dict
odict
But none of them solved the error. Do you have any idea what I'm missing?
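A startup check like the one below wouldn't have fixed the ordered_dict error (that one was the pickled requests session), but it would have made the venv mix-up obvious instead of surfacing as confusing import failures. A minimal sketch; `check_interpreter` is a hypothetical helper, and the required major version reflects hydrus being on Python 2.7 at the time of this thread.

```python
import sys

REQUIRED_MAJOR = 2  # hydrus was on Python 2.7 when this thread was written


def check_interpreter(version_info=sys.version_info, required_major=REQUIRED_MAJOR):
    """Stop early with a readable message if the wrong Python major version is running.

    With a venv activated, `python` and `pip` refer to the venv's interpreter, which
    can be a different major version from the system default.
    """
    major, minor = version_info[0], version_info[1]
    if major != required_major:
        raise SystemExit(
            "needs Python {}, but this is Python {}.{} at {}".format(
                required_major, major, minor, sys.executable
            )
        )
```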
>>10377 The new version of requests can't unpickle a session object from the old version of requests. Try v327, it should be fixed (by clearing your old sessions, argh).

