/hydrus/ - Hydrus Network

Archive for bug reports, feature requests, and other discussion for the hydrus network.


(12.80 KB 480x360 iLnEHzpURXk.jpg)

Version 324 hydrus_dev 09/26/2018 (Wed) 19:46:58 Id: fd51ee No. 10089
https://www.youtube.com/watch?v=iLnEHzpURXk

windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v324/Hydrus.Network.324.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v324/Hydrus.Network.324.-.Windows.-.Installer.exe
os x
app: https://github.com/hydrusnetwork/hydrus/releases/download/v324/Hydrus.Network.324.-.OS.X.-.App.dmg
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v324/Hydrus.Network.324.-.OS.X.-.Extract.only.tar.gz
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v324/Hydrus.Network.324.-.Linux.-.Executable.tar.gz
source
tar.gz: https://github.com/hydrusnetwork/hydrus/archive/v324.tar.gz

I had a great week. The downloader overhaul is almost done.

pixiv

Just as Pixiv recently moved their art pages to a new phone-friendly, dynamically drawn format, they are now moving their regular artist gallery results to the same system. If your username isn't switched over yet, it likely will be in the coming week. The change breaks our old html parser, so I have written a new downloader and json api parser. The way their internal api works is unusual and over-complicated, so I had to write a couple of small new tools to get it to work. However, it does seem to work again.

All of your subscriptions and downloaders will try to switch over to the new downloader automatically, but some might not handle it quite right, in which case you will have to go into edit subscriptions and update their gallery manually. You'll get a popup on updating to remind you of this, and if any don't line up right automatically, the subs will notify you when they next run.

The api gives all content (illustrations, manga, ugoira, everything), so there unfortunately isn't a simple way to refine to just one content type as we previously could. But it does neatly deliver everything in just one request, so artist searching is now much faster.

Let me know if pixiv gives any more trouble.
Now that we can parse their json, we might be able to reintroduce the arbitrary tag search, which broke some time ago due to the same move to javascript galleries.

twitter

In a similar theme, given our fully developed parser and pipeline, I have now wangled a twitter username search! It should be added to your downloader list on update. It is a bit hacky and may be ultimately fragile if they change something on their end, but it otherwise works great. It discounts retweets and fetches 19/20 tweets per gallery 'page' fetch. You should be able to set up subscriptions and everything, although I generally recommend you go at it slowly until we know this new parser works well. BTW: I think twitter only 'browses' 3200 tweets into the past, anyway.

Note that tweets with no images will be 'ignored', so any typical twitter search will end up with a lot of 'Ig' results; this is normal. Also, if the account ever retweets more than 20 times in a row, the search will stop there, due to how the clientside pipeline works (it'll think that page is empty).

Again, let me know how this works for you. This is some fun new stuff for hydrus, and I am interested to see where it does well and badly.

misc

In order to be less annoying, the 'do you want to run idle jobs?' dialog on shutdown will now ask at most once per day! You can edit the time unit under options->maintenance and processing.

Under options->connection, you can now change max total network jobs globally and per domain. The defaults are 15 and 3. I don't recommend you increase them unless you know what you are doing, but if you want a slower/more cautious client, please do set them lower.

The new advanced downloader ui has a bunch of quality of life improvements, mostly related to the handling of example parseable data.

full list

- downloaders:
- after adding some small new parser tools, wrote a new pixiv downloader that should work with their new dynamic gallery's api. it fetches all an artist's work in one page. some existing pixiv download components will be renamed and detached from your existing subs and downloaders. your existing subs may switch over to the correct pixiv downloader automatically, or you may need to manually set them (you'll get a popup to remind you).
- wrote a twitter username lookup downloader. it should skip retweets. it is a bit hacky, so it may collapse if they change something small with their internal javascript api. it fetches 19-20 tweets per 'page', so if the account has 20 rts in a row, it'll likely stop searching there. also, afaik, twitter browsing only works back 3200 tweets or so. I recommend proceeding slowly.
- added a simple gelbooru 0.1.11 file page parser to the defaults. it won't link to anything by default, but it is there if you want to put together some booru.org stuff
- you can now set your default/favourite download source under options->downloading
- .
- misc:
- the 'do idle work on shutdown' system will now only ask/run once per x time units (including if you say no to the ask dialog). x is one day by default, but can be set in 'maintenance and processing'
- added 'max jobs' and 'max jobs per domain' to options->connection. defaults remain 15 and 3
- the colour selection buttons across the program now have a right-click menu to import/export #FF0000 hex codes from/to the clipboard
- tag namespace colours and namespace rendering options are moved from 'colours' and 'tags' options pages to 'tag summaries', which is renamed to 'tag presentation'
- the Lain import dropper now supports pngs with single gugs, url classes, or parsers, not just fully packaged downloaders
- fixed an issue where trying to remove a selection of files from the duplicate system (through the advanced duplicates menu) would only apply to the first pair of files
- improved some error reporting related to too-long filenames on import
- improved error handling for the folder-scanning stage in import folders: now, when it runs into an error, it will preserve its details better, notify the user better, and safely auto-pause the import folder
- png export auto-filenames will now be sanitized of \, /, :, *-type OS-path-invalid characters as appropriate as the dialog loads
- the 'loading subs' popup message should appear more reliably (after 1s delay) if the first subs are big and loading slow
- fixed the 'fullscreen switch' hover window button for the duplicate filter
- deleted some old hydrus session management code and db table
- some other things that I lost track of. I think it was mostly some little dialog fixes :/
- .
- advanced downloader stuff:
- the test panel on pageparser edit panels now has a 'post pre-parsing conversion' notebook page that shows the given example data after the pre-parsing conversion has occurred, including error information if it failed. it has a summary size/guessed type description and copy and refresh buttons.
- the 'raw data' copy/fetch/paste buttons and description are moved down to the raw data page
- the pageparser now passes up this post-conversion example data to sub-objects, so they now start with the correctly converted example data
- the subsidiarypageparser edit panel now also has a notebook page, also with brief description and copy/refresh buttons, that summarises the raw separated data
- the subsidiary page parser now passes up the first post to its sub-objects, so they now start with a single post's example data
- content parsers can now sort the strings their formulae get back. you can sort strict lexicographic or the new human-friendly sort that does numbers properly, and of course you can go ascending or descending. if you can get the ids of what you want but they are in the wrong order, you can now easily fix it!
- some json dict parsing code now iterates through dict keys lexicographically ascending by default. unfortunately, due to how the python json parser I use works, there isn't a way to process dict items in the original order
- the json parsing formula now uses a string match when searching for dictionary keys, so you can now match multiple keys here (as in the pixiv illusts|manga fix). existing dictionary key look-ups will be converted to 'fixed' string matches
- the json parsing formula can now get the content type 'dictionary keys', which will fetch all the text keys in the dictionary/Object, if the api designer happens to have put useful data in there, wew
- formulae now remove newlines from their parsed texts before they are sent to the StringMatch! so, if you are grabbing some multi-line html and want to test for 'Posted: ' somewhere in that mess, it is now easy.

next week

After slaughtering my downloader overhaul megajob of redundant and completed issues (bringing my total todo from 1568 down to 1471!), I only have 15 jobs left to go. It is mostly some quality of life stuff and refreshing some out of date help. I should be able to clear most of them out next week, and the last few can be folded into normal work.

So I am now planning the login manager. After talking with several users over the past few weeks, I think it will be fundamentally very simple, supporting any basic user/pass web form, and will relegate complicated situations to some kind of improved browser cookies.txt import workflow. I suspect it will take 3-4 weeks to hash out, and then I will be taking four weeks to update to python 3, and then I am a free agent again. So, absent any big problems, please expect the 'next big thing to work on poll' to go up around the end of October, and for me to get going on that next big thing at the end of November. I don't want to finalise what goes on the poll yet, but I'll open up a full discussion as the login manager finishes.
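Editor's illustration: the changelog above mentions that content parsers can now sort either strictly lexicographically or with a 'human-friendly sort that does numbers properly'. The sketch below shows one common way such a sort can be built in Python. It is not taken from the hydrus source; the function name human_sort_key is hypothetical, and the choice to place numeric runs before text runs is an assumption made only for this example.

```python
import re

def human_sort_key(text):
    """Build a sort key that compares runs of digits as integers.

    Hypothetical helper for illustration only, not hydrus code.
    """
    # With a capturing group, re.split puts the digit runs at odd indices.
    parts = re.split(r'(\d+)', text)
    key = []
    for index, part in enumerate(parts):
        if part == '':
            continue
        if index % 2 == 1:
            # Digit run: compare numerically. The leading 0 keeps ints and
            # strings from ever being compared with each other directly.
            key.append((0, int(part)))
        else:
            # Text run: compare case-insensitively.
            key.append((1, part.lower()))
    return key

names = ['page 10', 'page 9', 'page 1']
strict_order = sorted(names)                     # plain string comparison
human_order = sorted(names, key=human_sort_key)  # numbers compared as numbers
```

With plain string comparison 'page 10' lands before 'page 9' because '1' sorts before '9'; the key function avoids that by comparing 10 and 9 as integers.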
>>10091 I am afraid it all works on separate downloaders now. You might have some success changing the 'presentation' options under file import options, which you can set defaults for under options->importing. Tell it only to present 'new files', and your additional queries will not display those already imported. I have all my imports work like this and find it very helpful for advanced workflows.
How can I fix this?
>>10092 Thanks. Hydrus should be getting the :orig version. I don't know if this is truly untouched (the files are suspiciously small), but I appreciate they have anything. You can check you are getting the right :orig url by right-clicking the file in hydrus and looking at 'known urls'. It should have the tweet and the raw file url.
>>10116 Please try running client_debug.exe in the install_dir folder. It might be because you are on a 32-bit system, which is presently unfixable, or it could be a different library import error.
>>10118 client_debug.exe runs for a moment and then quits without resolving the issue. Does it print its text to a file somewhere? Also, I'm on 64-bit Windows 7.
>>10118 >>10119 Just found the "help my client will not boot" file
>>10099 Unfortunately not. Subs are designed to work on lists that expand statically, with new files coming first. They work best for artist feeds (i.e. like you are subscribing to the artist). They'll always try to 'catch up' with where they were at the last sync, so if the file results aren't the same feed or aren't coming in descending post order, it'll likely hit its periodic limit and/or complain. I think you should be able to get this to work though, if you don't mind a popup every day or whatever. Can you expand on how you think it is resuming from the wrong point? Could your derpi domain be bandwidth-blocked right now? Subs I think have a 256MB limit per day, typically, so if these are all big pngs, you might be getting choked there, i.e. your sub is fetching the 60 new URLs per day, but it still needs to pursue stuff from days ago, and can never catch up. Check your sub in network->data->review bandwidth. You might like to go into its entry there and give it more daily bandwidth, if that seems to be the problem. Then make sure its checker timings are set to check once a day, statically, with a periodic file limit that makes sense for the query you are hitting. Let me know if you figure it out. I didn't design it for this, but if a small tweak on my end can make it work better, I'd love to know.
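Editor's illustration: the 'catch up' behaviour described in the post above can be modelled roughly as below. This is a simplified sketch under stated assumptions, not hydrus's actual sync code: the function name, the return values, and the rule of stopping at the first already-seen URL are all hypothetical. It only shows why a feed that is not ordered newest-first can hide new files behind known ones.

```python
def sync_subscription(gallery_pages, seen_urls, periodic_file_limit):
    """Walk gallery pages in order, collecting unseen URLs until caught up.

    Simplified, hypothetical model for illustration only.
    """
    new_urls = []
    for page in gallery_pages:
        for url in page:
            if url in seen_urls:
                # A known URL is taken to mean we have reached the point of
                # the previous sync, so everything after it is assumed old.
                return new_urls, 'caught up'
            new_urls.append(url)
            if len(new_urls) >= periodic_file_limit:
                return new_urls, 'hit periodic limit'
    return new_urls, 'ran out of pages'

# Newest-first feed: the two new posts come before the known one.
ordered = sync_subscription([['post3', 'post2'], ['post1']], {'post1'}, 100)

# Score-sorted feed: a known post sits above a new one, so the new one is missed.
unordered = sync_subscription([['post2', 'post3'], ['post1']], {'post1', 'post2'}, 100)
```

In the second call 'post3' is never collected, which matches the kind of missed or out-of-order results discussed for score-sorted searches.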
>>10120 Thanks. Twisted is my network library. It runs the server and can run an optional booru off the client. It doesn't do anything by default for the client, but I think it has to establish a loopback interface as I import it so, afaik, it can talk to itself. It looks like that is what could be failing here–do you have a super paranoid firewall setting that stops programs from opening loopback ports?
>>10108 Yeah, I hope to brush this up sometime. It was the prototype for the modern parsing system, but it has since got left behind. I don't know when I will fit it in–can you say what you would most like me to do to it?
>>10110 Thanks, this is very interesting. It seems bonkers that subsequent regens would re-break it–that suggests the thumbnail generator is non-deterministic. I wonder if instead the thumbs are regenning right, but the thumbnail cache is getting mixed up trying to delete the old ones. Can you post that Ciri(?)+Monster webm here, or point me to the url, so I can try it my end? That appears to be the only one that actually reverted. Maybe the two lara ones were just delayed somehow in updating, although you can post them too if it is convenient.
>>10111 Thank you for this report. I will check this and fix it this week.
>>10114 The whole routine for "ignore retweets, only download images" can be entirely circumvented by going to https://twitter.com/user/media and that gets you only files they uploaded.
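Editor's illustration: the release post says the username search skips retweets and will stop if an account retweets 20 or more times in a row, because the pipeline then sees an empty page. The sketch below models that failure mode. It is a toy model, not the real downloader: the 'is_retweet' key, the page structure, and the function name are assumptions for this example only.

```python
def search_user(pages):
    """Collect non-retweet tweets page by page, stopping at the first page
    that yields nothing after filtering.

    Hypothetical model for illustration only.
    """
    found = []
    for page in pages:
        originals = [tweet for tweet in page if not tweet['is_retweet']]
        if not originals:
            # Nothing parseable on this page, so the search assumes the
            # gallery has ended, even if older original tweets exist.
            break
        found.extend(originals)
    return found

original_new = {'id': 3, 'is_retweet': False}
original_old = {'id': 1, 'is_retweet': False}
retweet = {'id': 2, 'is_retweet': True}

pages = [
    [original_new, retweet],  # mixed page: one original survives the filter
    [retweet] * 20,           # a full page of retweets looks empty
    [original_old],           # never reached
]
result = search_user(pages)
```

A listing that already contains only the account's own uploads, as suggested in the post above, would not have pages that filter down to nothing in this way.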
(12.71 KB 496x341 ClipboardImage.png)

>>10125 example of what i meant
>>10124 Here it is. I did the same thing with 40 webms with bad thumbnails and about 10 didn't regen a good thumbnail. I tried restarting the client to force it to reload the thumbnails to see if there was some error in displaying them, but that changed nothing. I did a regen on all the webms again and this time the ones that failed last time were fine and none went back to being broken. I did a regen on all of them 8 or so more times to see if some went back to being broken, and a few did, then got fixed again the next time while possibly some other ones broke again. I picked another 45 webms with good thumbnails that as far as I know never had a bad thumbnail, and did a regen on all of them. Eventually a few of them got bad thumbnails. I'm thinking it probably hasn't got anything to do with the webm files themselves; it seems like any one of them can get a bad thumbnail if you try to regenerate them enough times.
>>10127 Thanks, I hadn't thought of looking at that. It uses a slightly different api call, but as far as I can tell, the json is in the same format. I will see about rolling out a more efficient twitter searcher this week!
>>10139 Thanks. I am afraid I was unable to reproduce this problem, either on that webm or doing repeated regens on larger multiple selections. I have never seen the colour shift and 'quarter displacing' that you are seeing here before, so I am wondering if instead your machine has a combination of factors that could somehow randomly produce this. Do you ever see the 'quartered' frames when you just look at these videos in hydrus? If so, are they ever something special, like the first or last frames of the videos? Can you check help->about and tell me what your ffmpeg version is? Mine, the one that should come with any windows hydrus build, is 4.0. Under options->media, what's your 'video thumbnail %' at? I tried 25, 35, 50, had no problems–are you at 0 or 100, or anything else special? If you regen some still images, do you ever get this same problem? How about gifs? If you turn on help->debug->profile modes->pubsub profile mode (warning, don't leave this on for too long, it is spammy) and then regen a video's thumb, is there a critical pattern of [ modal_message, clear_thumbs, new_thumbs, new_file_info ] buried in the mess of set_status_bar_dirty?
(370.53 KB 1627x1775 client_2018-09-30_21-01-36.png)

>>10121 I have all my traffic refresh every few seconds with a several-GB threshold, so it's not that. As for what it's doing, I have no idea. It looks like it starts on page 1, then sometimes goes to page 2. But I do see something I didn't anticipate, which at least explains why I'm getting them in random order: it's searching based on creation date rather than by a sort order. If I have to take a guess as to what's happening, the thing sees already-known images in its search and stops, but sometimes a new image comes into the first page, so it goes on to the second. Also, I have this sub set to allow 1000 images per check. Given that this page never goes over 300 images, all of which are the 3-day best-of with a 150-like threshold, the odds of it ever being held back by a daily check limit of 60 are slim. I also have the checker set a bit aggressively because everything is timed: some images may only pass the 150-like threshold on their second day, and if I let it check at its own pace it may miss some, so I have it run every 6 hours. This way I shouldn't miss anything good. I don't know how useful it would be for most other boorus, but a 're-grab everything' option would probably fix this. I don't know how many others allow you to sort by file age and rating, so it would be an option with very limited scope.
>>10115 Thanks, I'll try playing around with that.
>>10146
>Do you ever see the 'quartered' frames when you just look at these videos in hydrus?
I tried viewing some of the ones with bad thumbnails and they have no problem playing in hydrus and show no visual errors.
>Can you check help->about and tell me what your ffmpeg version is?
It's 4.0
>Under options->media, what's your 'video thumbnail %' at?
35
>If you regen some still images, do you ever get this same problem? How about gifs?
Nope
>If you turn on help->debug->profile modes->pubsub profile mode…
Ok, I had a tab with 40 webms with about 5 of them having broken thumbnails. Turned on the profile mode, ran a regen on all of them, turned the profile mode off. One of them (the 24th) generated with a bad thumbnail, the rest were fine. https://pastebin.com/VFVbHHwr
If I regen one webm at a time it's very rare that it ends up being a bad thumbnail but it has happened a couple of times.
Lol, thought this was funny. So I'm going through all the threads from when the tornado took the power out and hydrus would shit itself on too many things to do at once. I had a good 870 threads to go through that weren't 404 but did 404 during the time the program wasn't working. So far, I have gone through 30830 images; of those, only 1355 were missed, and there are still another 42000 images to parse. Once this gets to the /gif/ images, that failure rate is going to go up dramatically.
>>10123 You could start with making sure it works with every booru because right now it doesn't with most of them
You should really allow checking threads faster than once per minute, even if you personally disagree with that. Currently a single check per minute is the fastest you can get (as far as I can tell). Even the official 4chan API documentation (github) states that updating a thread every 5 seconds is okay. Please at least allow this; there's no need to make it the default.
>>10160 Never mind, you don't even allow that, you _force_ it to at least 3 minutes for static checks. Could you please stop doing that? Let people configure the update interval how they want. This is especially important on fast-moving boards like /b/ where threads quickly 404.
A few releases ago, I asked about a way to lock the tabs in place so they don't jump around. The issue seems to be getting worse for me: I click on one tab, it jumps out of the page of pages, and it is generally a pain in the ass to put back inside. Just a menu option to lock them down would be nice. That way I can unlock when I want to reorganize, but other than that, they stay put.
>>10089 How do you fags deal with "continuous content" when scraping from booru sites, sometimes something like picrel happens where the images are out of order sorry if i'm asking in the wrong thread
>>10147 Thanks for this feedback. I will keep this in mind.
>>10151 Thanks. That pastebin looks all correct, so I am not sure what is going on here. Now the downloader overhaul is done, I expect to revisit the larger-scale en masse reparsing maintenance stuff, which will do this regen stuff in idle time. Please let me know how that works for you and if you discover anything new.
>>10155 There's some more here, although I didn't make them, so I don't know how well they work atm: https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts/tree/master/Scripts%20-%20Tags
>>10160 >>10161 I am sorry you have had trouble here. I don't want to set the minimum time period too low because the gains are limited and it wastes a lot of CPU and bandwidth both clientside and serverside. It can take a decent fraction of a second to parse and process a thread's json, so if you had perhaps 100 good-size threads all checking every, say, ten seconds, you could easily accidentally fall into a situation where your GUI hangs due to all the threads doing redundant json parsing over and over. At the moment, for dynamic checking, the minimum check time ('never check faster than') should be 30 seconds. If it is stuck on 1 minute for you, try just lowering it to 0, and the 30s should fill in automatically; let me know if it doesn't. I feel this is a good throttle to stop hydrus being overwhelmed. If you set the min time to 30s and 'intended new files per check' to 1, then if your /b/ thread is getting at least one new file per 30s, the client will automatically speed up its checks to check every 30s. And then, when the thread slows down, it will throttle back (up to your max check time, whether that is 10 mins or 24 hours) and save you a lot of waste. Are these /b/ threads on 4chan? I haven't been there in a long time. What's a typical thread age? Is the difference between 5 and 30 seconds very important, or is it ok? Or are the images posted in spikes, so the dynamic checking is missing the ends of threads because it thought they had slowed down? I am willing to reduce the min time further for advanced users, or alter how the file velocity is calculated, but only if it actually makes technical sense.
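Editor's illustration: the dynamic checking described above (a minimum check time, a maximum check time, and an 'intended new files per check' target) can be approximated by the sketch below. The formula, function name, and default values are assumptions chosen to match the numbers in the post (30 seconds minimum, 10 minutes as an example maximum); the real file velocity calculation in hydrus may differ.

```python
def next_check_interval(files_per_second,
                        intended_files_per_check=1,
                        min_interval=30,
                        max_interval=600):
    """Return the number of seconds to wait before the next thread check.

    Hypothetical model for illustration only.
    """
    if files_per_second <= 0:
        # A dead or silent thread falls back to the slowest allowed rate.
        return max_interval
    # Time the thread needs, at its current velocity, to produce the
    # intended number of new files.
    ideal = intended_files_per_check / files_per_second
    # Never check faster than min_interval or slower than max_interval.
    return max(min_interval, min(max_interval, ideal))

fast_thread = next_check_interval(0.1)    # one file every 10s, clamped up to the minimum
slow_thread = next_check_interval(0.001)  # one file every 1000s, clamped down to the maximum
dead_thread = next_check_interval(0)
```

Under this model a thread posting faster than one file per 30 seconds is still only checked every 30 seconds, while a thread that goes quiet is throttled back toward the maximum, as the post describes.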
>>10173 Thank you for this report. Do you have a busy client? I have noticed this happening during little gui hangs. I have a plan to fix it, and if that doesn't work, I'll add a way to turn off tab DnDs, or only enable with a shift+click or something.
>>10177 Just hang on to them for now and set them as 'alternate' duplicates when convenient. Delete anything you don't like. My duplicates system typically recognises these kinds of related images, but there is no workflow for arranging them into something more intelligent like parent-child or sibling or multi-page relationships. When I next reiterate over the duplicates system, I will add this and we'll be able to process all the 'alternates' into something better.
>>10177 >>10181 More info and examples on duplicate handling here: https://hydrusnetwork.github.io/hydrus/help/duplicates.html
Here are some small issues I've been wondering about. I'm running Hydrus on Ubuntu 16.04. It's running off a RAID-5 volume on a NAS, which is accessed over NFS. Performance is fine, however it sometimes crashes with the error message "client: Fatal IO error 11 (Resource temporarily unavailable) on X server :0." showing in the terminal that I used to run it. Is this happening due to the latency of the network connection? Can Hydrus be made more tolerant to latency? I would like to keep everything on the NAS, since this makes backups and integrity checking easier for me. The second issue might be related to the downloaders. I've been getting empty thumbnails for Pixiv posts that I deleted from Hydrus. (Pic related.) I wonder if the downloader added those entries back?
>>10185 Thanks. This 'resource unavailable x server' stuff is related to wx, my ui library, which still has some crash bugs for linux. It is related to some conflict between my package and your OS (although I build on Ubuntu 16.04, so I don't know what's precisely going on there). Running from source, if you are prepared to put a bit of time into figuring it out, seems to clear them up completely: https://hydrusnetwork.github.io/hydrus/help/running_from_source.html If you decide to try it, let me know if you run into any trouble. Some users on the discord do this and can help as well. Those missing thumbs are unusual. That's how the client displays non-local content (including deleted stuff), but it doesn't usually appear in downloaders, which typically filter out deleted content. Is this a regular download page, and the hydrus thumbs are appearing naturally as the other files stream in, or is it a more unusual operation like going into the file import status and right-clicking->open in new page?
>>10187 Thanks, I might try running from source. How does this impact the performance? Oh, it might not be such a big problem after all. It's the import page that automatically opened to show new downloaded images. I must have kept the tab open. When I open a regular search page for the artist it doesn't show the deleted entries anymore. I guess the import page has a fixed list of entries it displays?
>>10187 I've had this issue ever since the big UI update/rewrite. Running from source does not help, but it doesn't seem to hurt anything either. I'm running Linux Mint 18.2, which uses Ubuntu 16.04 for its package base. On an unrelated note, ever since I updated from v304 to v324, my console logs an error on startup. It says, "Could not import lz4." AFAICT, everything works, but I don't know what lz4 does so I don't know if it's just because lz4 code is never called during my normal use of Hydrus.
>>10189 I get that error too. But lz4 is just some obscure type of compression that's not used in any file formats I know, and it doesn't seem to cause problems, so I didn't bother reporting it.
>>10188 Should be about the same, as the frozen exe is just spinning up a python instance and running the same code, afaik. Boot should be a bit faster since it doesn't have to mess around with that bootstrap environment shit. That's interesting that that page kept the files through the trashed->full delete transition. They should be removed automatically. That suggests the underlying file context of those pages is 'everything' rather than 'local'. Or maybe they are loading from db subsequently in a non-filtered way. I'll check this, thank you for reporting it!
>>10189 >>10190 Thanks. If you figure out any new/unusual AssertionErrors getting put to your log (not the GTK-critical stuff, which doesn't give me enough info, but a proper python/wx error trace), please let me know. There are still some instabilities in the linux code, but I don't always get a good traceback in my environment. lz4 isn't a big deal. I've been unable to package it properly for a few months now due to a PyInstaller update. I burned some hours on it and couldn't figure it out, so I will revisit after the python3 update.
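Editor's illustration: a program can log 'Could not import lz4.' at startup and still work if the library is treated as optional. The sketch below shows that general pattern in Python. It is not the hydrus source: the LZ4_OK flag and the try_lz4_decompress function are hypothetical names, and only lz4.block.decompress is a real call from the python-lz4 package.

```python
try:
    # Optional third-party dependency (the python-lz4 package).
    import lz4.block
    LZ4_OK = True
except ImportError:
    LZ4_OK = False
    print('Could not import lz4.')

def try_lz4_decompress(data):
    """Decompress lz4 block data, failing clearly if lz4 is unavailable.

    Hypothetical helper for illustration only.
    """
    if not LZ4_OK:
        raise RuntimeError('lz4 is not available, so this data cannot be decompressed')
    return lz4.block.decompress(data)
```

With this layout the missing library only matters at the moment something actually needs it, which would be consistent with the reports above that everything else keeps working.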
>>10205 Okay, I will have a look at running the source then.

