/hydrus/ - Hydrus Network

Archive for bug reports, feature requests, and other discussion for the hydrus network.


Version 325 hydrus_dev 10/03/2018 (Wed) 21:47:37 Id: 923859 No. 10186
https://www.youtube.com/watch?v=OqEBF2F-4z8

windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v325/Hydrus.Network.325.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v325/Hydrus.Network.325.-.Windows.-.Installer.exe
os x
app: https://github.com/hydrusnetwork/hydrus/releases/download/v325/Hydrus.Network.325.-.OS.X.-.App.dmg
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v325/Hydrus.Network.325.-.OS.X.-.Extract.only.tar.gz
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v325/Hydrus.Network.325.-.Linux.-.Executable.tar.gz
source
tar.gz: https://github.com/hydrusnetwork/hydrus/archive/v325.tar.gz

I had a difficult week, but I got some great work done. Save for some final help revisions, the downloader overhaul is complete.

final downloader work

So, I managed to finish 13 of my 15 final jobs in the downloader overhaul. All that remains is a help pass for subscriptions and a better intro to gallery and watcher downloading, which I will fold into normal work over the coming weeks. This has been a longer journey than I expected, but I feel great to be done.

This final work is mostly unusual stuff that got put off. For instance, subscriptions can now run without a working popup (i.e. completely in the background)! It works just like import folders, as a per-subscription checkbox option, and still permits the final files being published to a popup button or a named page. I recommend only trying this after the initial sync has completed, just so you know the sub works ok (and isn't accidentally downloading 2,000 garbage files in the background!).

Also, subscription queries can now take an optional 'display name'. This display name will appear in lieu of the actual query text in most display contexts, like the edit sub panel or a popup message or a publishing destination. A query for pixiv_artist:93360 can be more neatly renamed to and managed as 'houtengeki', and 'xxxxxxxx' can be renamed 'family documents, DO NOT ENTER' and so on.

And subscription queries now have individual tag import options that only support 'additional tags'. So, if you want to give a particular query a blog-related creator tag or a personal processing tag, this is now simple.

If you 'try again' on a 'deleted' file import, the client will now ask if you want to erase that deletion record first (i.e. overriding it and importing anyway)! This is obviously much quicker and simpler than having to temporarily edit the file import options to not exclude previously deleted.

Gallery and watcher pages now have quick 'retry failed' buttons and list right-click menu entries.

advanced stuff

If you are in advanced mode, subscription edit panels now have a 'get quality info' button. If you select some queries and hit this (oh fug, I just tested it IRL and discovered it does it for all queries, not just selected, wew, I will fix this for next week), the client will do some hacky db work and present you with a summary of how many files currently in those queries are inbox/archive/deleted, and a percentage of archived/(archived+deleted)–basically "after processing, you kept 5% of this query". This should help you figure out which queries are actually 'good' for you and which are just giving you 98% trash. I can do more here, but this is just a quick prototype. Feedback would be appreciated.

The downloader easy-import pngs now support custom http headers and bandwidth rules!
This is a bit experimental, so please test it a bit before you roll it out for real. If you have custom headers or specific bandwidth rules for the domains in your exported downloaders' GUGs, they will be added automatically, and there's a button to add them separately as well. Exporters and importers will get detailed previews of what these new 'domain metadata' objects include.

If you are in advanced mode, file import options now have options to turn off url- and hash-based 'skip because already in db/previously deleted' checks. They are basically an "I don't care what you think the url is, just download it anyway and see if it is a new file m8". If you have a particular url conflict that was causing an incorrectly skipped download that I have previously discussed with you, please try these options and reattempt the problem file. Don't use them for regular downloads and subs, or you'll just be wasting bandwidth. Advanced file import options now also allow you to turn off source url association completely.

full list

- added a 'show a popup while working' checkbox to edit subscription panel–be careful with it, I think maybe only turn it off after you are happy everything is set up right and the sub has run once
- advanced mode users will see a new 'get quality info' button on the edit subscription panel. this will show some ugly+hacky inbox/archive/deleted info on the selected queries to help you figure out if you are only archiving, say, 2% of one query. this is a quickly made but cpu-expensive way of calculating this info. I can obviously expand it in future, so I would appreciate your thoughts
- subscription queries now have an optional display name, which has no bearing on their function but if set will appear instead of query text in various presentation contexts (this is useful, for instance, if the downloader query text deals in something unhelpful like an integer artist_id)
- subscription queries now each have simple tag import options! this only allows 'additional tags', in case you want to add some simple per-query tags
- selecting 'try again' on file imports that previously failed due to 'deleted' will now pop up a little yes/no asking if you would like to first erase these files' previous deletion record!
- the watcher and gallery import panels now have 'retry failed' buttons and right-click menu entries when appropriate
- the watcher and gallery import panels will now do some ui updates less frequently when they contain a lot of data
- fixed the new human-friendly tag sorting code for ungrouped lexicographic sort orders, where it was accidentally grouping by namespace
- downloader easy-import pngs can now hold custom header and bandwidth rules metadata! this info, if explicitly present for the appropriate domain, will be added automatically on the export side as you add gugs. it can also be bundled separately after manually typing a domain to add. on the import side, it is now listed as a new type. longer human-friendly descriptions of all bandwidth and header information being bundled will be displayed during the export and import processes, just as an additional check
- for advanced users, added 'do not skip downloading because of known urls/hashes' options to downloader file import options. these checkboxes work like the tag import options ones–ignoring known urls and hashes to force downloads. they are advanced and should not be used unless you have a particular problem to fix
- improved how the pre-import url/hash checking code is compared for the tag and file import options, particularly on the hash side
- for advanced users, added 'associate additional source urls' to downloader file import options, which governs whether a site's given 'source urls' should be added and trusted for downloaded files. turn this off if the site is giving bad source urls
- fixed an unusual problem where gallery searches with search terms that included the search separator (like '6+girls skirt', with a separator of '+') were being overzealously de/encoded (to '6+girls+skirt' rather than '6%2bgirls+skirt')
- improved how unicode quoted characters in URLs' query parameters, like %E5%B0%BB%E7%A5%9E%E6%A7%98, are auto-converted to something prettier when the user sees them
- the client now tests if 'already in db' results are actually backed by the file structure–now, if the actual file is missing despite the db record, the import will be force-attempted and the file structure hopefully healed
- gallery url jobs will no longer spawn new 'next page' urls if the job yielded 0 _new_ (rather than _total_) file urls (so we should have fixed loops fetching the same x 'already in file import cache' results due to the gallery just passing the same results for n+1 page fetches)
- in the edit parsing panels, if the example data currently looks like json, new content parsers will spawn with json formulae, otherwise they will get html formulae
- fixed an issue with the default twitter tweet parser pulling the wrong month for source time
- added a simple 'media load report mode' to the help debug menu to help figure out some PIL/OpenCV load order stuff
- the 'missing locations recovery' dialog that spawns on boot if file locations are missing now uses the new listctrl, so is thankfully sortable! it also works better behind the scenes
- this dialog now also has an 'add a possibly correct location' button, which will scan the given directory for the correct prefixes and automatically fill in the list for you
- fixed some of the new import folder error reporting
- misc code cleanup

next week

Now I will finish a simple login manager. Fingers crossed, I hope to spend a total of three to four weeks on it. I don't expect I'll have anything interesting ready for it for v326, but maybe I'll have some dummy ui for advanced users to play with. Thanks everyone!
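To make the 'get quality info' numbers concrete, here is a minimal sketch of the keep-rate calculation described above, assuming you already have the three counts from the db; the function and key names are illustrative, not hydrus's actual code:

def query_quality_summary(inbox_count, archive_count, deleted_count):
    # of the files that have been fully processed (archived or deleted),
    # what fraction did you keep?
    processed = archive_count + deleted_count
    keep_rate = archive_count / processed if processed else None
    return {
        'inbox': inbox_count,
        'archive': archive_count,
        'deleted': deleted_count,
        'keep_rate': keep_rate,  # e.g. 0.05 means "you kept 5% of this query"
    }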
>>10183 Regarding this, I must be blind: I can't find any mention of it. Still, thank you for trying to address it and not taking it the wrong way. I just felt a bit miffed after I changed all my watcher tabs and then noticed that a limit had been set that I didn't specify.
>>10180 Yeah, the client is busy; even without noticeable hangs it happens, though far less likely.

>>10177 I personally have quite a few of these, but because they were mostly acquired through artist grabs, they are mostly findable without hassle. I'm waiting on hdev to do something so a deleted image can be notated in a way that lets me easily see why something was removed. If I have to hotkey things like 'low quality', 'have better', 'prefer alternate' and label images, so fucking be it, but being able to see why I got rid of something needs to happen before I can move forward with culling duplicates and moving on to general file processing. As it stands, I have half my db in the dup finder (a fuck-up won't let the whole thing be in it at the moment), but that is still nearly 180k images, and there were at least 170k before; combined, those are well over 300k images, and even that assumes no new dups are found. But that's just me. If a set is worth keeping everything in, I'm not really going to put it in hydrus; I'd keep it separate.

______

On a side note, hdev, I just tried the 'forget hash and urls' option. It didn't work for grabbing all of first_seen_at.gt:3 days ago, upvotes.gte:150 from derpibooru, however cache resets seem to. I'll have to let it sit for a while so I can get a bigger pool of potential images.
>>10193 Yeah, they're back on their derpibooru again. The search I'm doing is also very weird: first_seen_at.gt:3 days ago (self-explanatory, only images uploaded within the last 3 days) and upvotes.gte:150 (only images with at least 150 upvotes). So I think I figured out what's happening; I'm going to open a search and then do a gallery grab to confirm. Ok, tell me if this sounds reasonable… ok, never mind, I'm fuckin retarded, the problem is right there on >>10147: "Found 5 previously seen urls, so assuming we caught up."

Now, on a normal search this would be the case, and it works well for the most part, as anything that hits a few already-known images has most likely caught up. However, this search is capable of dumping in images from days ago as well as brand new ones, so unless a fuckload of brand new images come in, it will stop the search. Currently the oldest image in this search JUST passed the 150 threshold by 1 (granted, due to its content it's an image I would rather not have, but that's a different issue altogether that will be solvable once a universal login system is in place).

Now, the 'ignore image hash and urls' option works, but the sub still stops once it sees 5 already-in-db images, and it will do this unless I reset the cache. Knowing what the issue is now, the way to fix it would be an 'ignore assumptions' option, as the program is quite literally telling me it's assuming it has caught up because of other images it sees, and the nature of this search means every page can have both new and old images. I honestly can't think of much use for this outside of derpi, and even then this specific search. There were a few places I used to go to that would just dump porn images, ftp servers or something like that, that would randomly get new images; if you could subscribe to a place like that through a simple downloader 'all images embedded' filter or something, it could be useful there too, but yeah.
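For what it's worth, here is a rough sketch of the 'assume we caught up' stop rule being described, plus the kind of 'ignore assumptions' override being asked for; the counter behaviour and every name here are assumptions for illustration, not hydrus's actual gallery code:

def gather_new_file_urls(gallery_pages, already_seen, stop_after_seen=5, ignore_assumptions=False):
    # walk the gallery pages in order, collecting urls we have not seen before;
    # after enough already-seen urls, assume we have caught up and stop early,
    # unless the override is set (useful for searches that mix old and new posts)
    new_urls = []
    seen_count = 0
    for page in gallery_pages:  # each page is a list of file urls
        for url in page:
            if url in already_seen:
                seen_count += 1
                if seen_count >= stop_after_seen and not ignore_assumptions:
                    # "Found 5 previously seen urls, so assuming we caught up"
                    return new_urls
            else:
                new_urls.append(url)
    return new_urls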

What's the deal with the new gallery downloader not showing results? I'm assuming there's an option I've overlooked somewhere
>>10198 You select one of your downloads and click "highlight", or double-click on any of your downloads in the top left under "gallery downloader". It's that way to reduce strain on hydrus: instead of having multiple tabs with thousands of images slowing it down, it just doesn't load everything until you highlight it. If that makes sense.
>>10202 It also resets the sort order if you accidentally change it or do something to the files, so clearing and re-highlighting will fix an accident.

————————–

That said, hdev, I had an idea. I'm downloading a gallery of an artist's stuff, and while I would like a button for 'just show new', would it be possible to have the count of new images be displayable under items? Possibly as a menu option, like 'deleted' was. Being able to quickly see how many items were found and also how many are new would be a greatly appreciated feature for quickly gauging how much new content I acquire.
>>10191 I found the api call did not reliably give a new min_position value in the json, which I need for the next 'gallery page' url. Maybe I was missing something, but my old parser didn't transfer 100% and I didn't have time to put into figuring it out, so I dropped it for now. If you can get it to work, please let me know and I'll fold the new parser or whatever into an update.
>>10192 End of >>10179 np, I'll repost:

I am sorry you have had trouble here. I don't want to set the minimum time period too low because the gains are limited and it wastes a lot of CPU and bandwidth both clientside and serverside. It can take a decent fraction of a second to parse and process a thread's json, so if you had perhaps 100 good-size threads all checking every, say, ten seconds, you could easily accidentally fall into a situation where your GUI hangs due to all the threads doing redundant json parsing over and over.

At the moment, for dynamic checking, the minimum check time ('never check faster than') should be 30 seconds. If it is stuck on 1 minute for you, try just lowering it to 0, and the 30s should fill in automatically–let me know if it doesn't. I feel this is a good throttle to stop hydrus being overwhelmed. If you set the min time to 30s and 'intended new files per check' to 1, then if your /b/ thread is getting at least one new file per 30s, the client will automatically speed up its checks to check every 30s. And then, when the thread slows down, it will throttle back (up to your max check time, whether that is 10 mins or 24 hours) and save you a lot of waste.

Are these /b/ threads on 4chan? I haven't been there in a long time–what's a typical thread age? Is the difference between 5 and 30 seconds very important, or is it ok? Or are the images posted in spikes, so the dynamic checking misses the ends of threads because it thought they had slowed down? I am willing to reduce the min time further for advanced users, or alter how the file velocity is calculated, but only if it actually makes technical sense.
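For anyone curious how the speed-up/throttle-back could work, here is a minimal sketch of velocity-based check timing under the settings described above ('never check faster than' and 'intended new files per check'); the names and the fixed one-day velocity window are my assumptions, not hydrus's actual code:

def next_check_period(file_timestamps, now, intended_files_per_check=1,
                      min_period=30, max_period=24 * 3600, velocity_window=24 * 3600):
    # estimate how fast the thread is producing files, then pick a check period
    # that would yield roughly 'intended_files_per_check' new files per check,
    # clamped between the minimum and maximum check times
    recent = [t for t in file_timestamps if now - t <= velocity_window]
    if not recent:
        return max_period  # thread looks dead, back off to the maximum
    velocity = len(recent) / velocity_window  # files per second
    period = intended_files_per_check / velocity
    return int(min(max(period, min_period), max_period))

# a /b/ thread getting a file every ~20s settles at the 30s minimum;
# one getting a file every ~10 minutes gets checked roughly every 10 minutes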
>>10204 That's an interesting idea. That UI is a bit too crushed atm, so I don't really want to add any new columns or data to them until I update the underlying ListCtrl to have some kind of right-click select columns or something. I could then figure out some kind of cached new_files count to optionally show. That said, maybe I could optionally split the current '193/202' into a successful/already in db thing, like 56S137A/202, although this may be too autistic to actually parse. Forgive me if you know this and your question is based around simply showing this in the number, but you can set it to only show new thumbnails under file import options. If you set it to only 'present' new files, then when you double-click, it will only show new, not anything that was 'already in db'. Your subscriptions should default to present/publish like this, and downloader kinosseurs set this for all their import contexts to improve workflow. Check options->importing to alter the default presentation options.
>>10206 Not sure what the twitter parser is doing differently, but the resulting images don't match ones I've downloaded directly. The urls match as :orig files but the image sizes don't.
>>10209 Correction: Newer stuff matches, older stuff that I've had sitting around for a while does not. I've also noticed this with Pixiv's from time to time.
>>10207 Hey, thanks for your answer! Yes, they are 4chan /b/ threads, and some of them quickly disappear. It's hard to determine the average thread age; I think it really depends on the content. Loli threads, for example, tend to disappear very quickly, especially if an overzealous mod is active, while more socially acceptable pornography can easily stay around for multiple hours. To be honest, the threads where I think the 10-second check interval makes the most sense are the ones being pruned by overzealous mods (against site-specific rules, not to bring too much drama into this) and the ones during very active periods on the board (for example during the recent birthday), when threads disappeared very quickly. However, it's no issue for me to patch this into Hydrus myself (in ClientGUITime.py), which I have done so far, and it works. A 30-second check interval should theoretically be decent for most things, but I personally prefer to make really sure I got the entire thread, hence checking at the maximum allowed rate (every 10 seconds) seemed appealing to me.
>>10208 Yeah, I know I can set it to only show new images, however the autistic part of my brain always runs back to 'well, what else was in the thread' and refuses to let me change it to new only. However, thinking about it, for a new-images indicator: the worst one I have right now reads 189 - 1D1F. You could have it as
189/60 - 1D1F
189 - 60N - 1D1F
189 - 60 - 1D1F
189 - 60N1D1F
Personally I like the first, second, and third options, as they separate the 'got it done' from the 'didn't get it done'. I also like 2 and 3 more because they show the total images and the new images in a less cluttered way, and I like 3 the most because if 'new' were its own box you wouldn't need the N. To top it off, you already kind of sort out new from old; the only problem is you don't say it until it's highlighted. How cluttered it is is an issue, but that's why I also suggested it could be an opt-in option in the settings. For most things just seeing how many new images there are is enough, and I can sort by date added and everything is good.
I've been wondering about some features for a while. Has it been considered before to add support for albums? I know that files can be opened in separate file pages by right-clicking on them, or can even be dragged into other file pages. The problem is that their order within the page can't be changed, and the page needs to be kept open. My idea would be a file page that allows files to be manually ordered. It should then be possible to save the content of the window as an album file, which gets its own hash and could have its own tags, possibly with an option to automatically add the tags of the files it contains.

The big reason I'm asking for this feature is that, even though it's awesome how Hydrus can automatically list all the files I want to see, sometimes I want to group and sort them in a way that can't be logically expressed in tags and ratings. E.g. I want to have some innocent pics in the list first, continue with some swimsuit pics, some softcore pics, gradually proceed to the hardcore stuff, and add the creampie pics last.

Another reason is that it's a bit of a hassle to use Hydrus for reading manga. First I search for it by the artist's name, then scroll down the tag list to find the "title:" tag, then right-click on the tag and open it in a new page, then change the sorting to x-x-x-volume-chapter-page. But even then it can happen that I have an edited page, e.g. an unofficial fan-colored page of some manga. I would probably filter to include or exclude some tag to exclude that page, but it's all a few steps too many to just read some manga. I know that collections can simplify this, but I don't like using them because I can't get a quick overview of all pages if they're automatically grouped. But yeah, the first example is the true reason.
>>10211 >>10209 Can you give me some example URLs for this, so I can try my end?

>>10214 Ok, thank you. Let me know how this works for you. I am still a little afraid of this going wrong if I enable it for lots of people, but I could see enabling it maybe with a warning for advanced mode users only if it works well.

>>10215 Thanks, this is a much better way of showing it. I'll try to fit in a checkbox to display this.
>>10220 Thank you for this feedback. Grouping files has been a challenge from the start. I am not very happy with how collections and page sorting work for several reasons, but I definitely would like good solutions here. There are many reasons to combine and order files in certain ways. I have two main ideas:

1) Reworking the thumbnail grid so it can handle metadata-based grouping. Say by creator tag or star rating–it could split the 600 thumbs in front of you into 20 groups of 30 with an actual band of blank pixels in between each group. The groups could have 'title' info in the blank area, and be quickly selectable as groups and maybe be collapsible (perhaps to a single thumbnail, a la current collections) and so on. I could see this replacing collections entirely.

2) Supporting cbr/cbz formats thoroughly. I don't like how page tags have worked out–they are just too finicky and fairly pointless in content and for searching. Knowing the page is '7' is not as important as knowing it comes after the one with '6' and before the one with '8', and trying to edit mistakes is hellish. cbr and cbz don't care about this–they just have filename order. Treating chapters/volumes/3-page-shorts as single files also makes more sense from processing and viewing workflows. You rarely want to read pages 220, 17, and 56 of a volume–you want to read them in order, and if you want to bookmark, that is better handled in a hydrus-level metadata that would work great on single file objects.

Anyway, that's my current thinking. Better handling of 'alternates' in the duplicate system is also strongly related to this. I'd love to have danbooru-like 'file hints' when a file has parents/siblings, and generally extend the media viewer to be ok with a file having multiple pages or views, maybe having a two-dimensional navigation system (like left/right going from thumb to thumb, and up/down otherwise going between known file alternates or cbr/cbz pages).
>>10222
>Can you give me some example URLs for this, so I can try my end?
I'm not sure it will matter. Like I said, the ones that aren't matching are images that have been around for a while. Here's a pixiv example: https://www.pixiv.net/member_illust.php?mode=medium&illust_id=40238805

Pics related match that for a source URL; the 247KB one I picked manually and imported over 2 years ago, and the 240KB one replaced it at some point and got picked up when I added the artist to subscriptions. Not sure of the exact date because I deleted it and kept the 247KB one. Visually, I'm not seeing any differences. For the record, there are another 4 images by the same artist that have this "issue." All are years old, none appear any different, but the filesize is just a few KBs less.

As for twitter, I kept the ones the downloader picked up over the ones I had laying around, so I don't really have any examples to hand you. The ones that mismatched were primarily, if not all, jpgs. I've got pngs that matched between the two, so maybe there was some sort of exif/metadata type thing going on with the jpegs? Same thing as pixiv though: no apparent differences, but the manually grabbed images were just a few KBs more.
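As an aside, a quick way to confirm that two visually identical copies really are byte-different (and therefore get different hashes, which is why hydrus treats them as separate files) is simply to hash them; this is a minimal sketch, and the filenames are placeholders for the two pixiv copies described above:

import hashlib

def sha256_of(path):
    # stream the file through sha256 so large images do not need to fit in memory
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()

print(sha256_of('pixiv_40238805_247KB.jpg'))
print(sha256_of('pixiv_40238805_240KB.jpg'))  # different digest despite looking the same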
>>10222 >>10224 Also, if you're still looking for webms/mp4s that don't parse frames properly to test out, I've got a couple that don't clean up with the "reparse" option. Both play their animation super fast and then just sit on the final frame for the rest of their duration. The mp4 shows as 181 frames, the webm as 547 but they really shouldn't be longer than like 50 tops.
>>10225 …And naturally they display properly here. Go figure.
>>10222 Thanks for your reply, I'll let you know. So far it hasn't been an issue, but I'll keep an eye on it.

Can I suggest something else, while I'm at it? I've got quite a few thread watcher tabs, so a quick filter would be super useful. I imagine this as me just starting to type in some tab, and it filters the displayed tabs at the top. So, for example, I could type "b/rule34" and it would show me all tabs that I renamed with that being part of the name. Currently, searching for the correct tab when I want to add a new thread URL to my watcher is kind of annoying; I have to scroll through a rather long list.
>>10223 Smarter grouping and support for the comic book formats would be amazing and would solve the manga problem. But what's your thought on the user-made albums that I suggested? Even with better sorting it's nice to be able to make custom image sets. Ideally it should be possible to add an image to as many albums as you want, so I don't think it's a good idea to use the image's tags or other metadata to achieve this, since that seems like it could get messy. I think the image's tags shouldn't "know" about the albums at all. Thinking of them as playlists is probably a good comparison.

Thinking further, albums could even be automatically generated, as a way to archive the context of any multi-image post that's downloaded. E.g. many artists mainly post on twitter, but every so often make a pixiv post with all the twitter images. The images are part of that post, but also part of the previous twitter posts. Currently that context is mostly lost. Then you might find a thread on an image board that you want to archive. All images posted are part of that thread, but also exist independently from it. You already have half of the images in Hydrus, downloaded e.g. from pixiv, so they don't get archived again. The remaining images get downloaded from the image board and have a time stamp and tags from a repository added to them. If you then remember that one awesome pic from that thread, you will need to guess its tags to find it again. With an album of that thread, it would be much easier to find again.
>>10222 Nice to know; sadly no release today, but I can't wait till the 17th for it.

>>10214 With 4chan, the two most heavily trafficked boards are /b/ and /v/, with one being 12-17 minutes before bump-off and the other 20-30 minutes. Typically /b/ at least doesn't have thread pruning without reason, granted other boards have fairly shit mods who will hit everything no matter what. I find hydrus's incremental checker good enough: with a 1 day max and a 3 minute minimum on /b/, unless a happening happens, a 3 minute static check will catch everything. In the event you think you missed something, you can always go to the /b/ archive; honestly, even during the birthday I don't think /b/ was moving that fast.

On this note, hdev, if you give us a 'fuck it' filter, could it be a checkbox/dialogue box? Something I can press, import, and it unchecks? Just a painless way to activate it would be good, as there are times I know a thread will get pruned but don't want to babysit it.
>>10224 Ah, I understand the problem better now. Yeah, over long time-frames, some sites' storage is unreliable on a byte-for-byte basis. I have seen several different CDNs give different versions of the 'same file' over time and even at the same time. Usually the larger the site (and hence the more sophisticated CDN), the more likely they are to do this. Cloudflare is a significant source of this, and is used by more and more sites–as far as I can tell, they dynamically and non-deterministically serve 'optimised' versions of static images depending on which cache server you happen to hit. My guess here is that pixiv ran an optimisation pass on their storage in the past couple of years, saving that 7KB. I downloaded the same file from pixiv in my browser just now and got the 8a05 version.

This problem is a big pain in the balls for us. I don't have a huge number of options here but to slowly improve the duplicate system in future. Ideally, we will eventually automate how dupes are merged, so you won't even notice this stuff, but we aren't there yet.

>>10225 Thanks, these are great. I get the same issue. I don't have time right now to get into it, but I will run these through ffmpeg manually and see what kind of crazy header data is coming back.
>>10236
>This problem is a big pain in the balls for us. I don't have a huge number of options here but to slowly improve the duplicate system in future. Ideally, we will eventually automate how dupes are merged, so you won't even notice this stuff, but we aren't there yet.
Personally, I'd rather it wasn't automated. Considering that even when booru images have their source listed, they might not be the best versions, I prefer to use whatever I myself can pick up & verify. I've seen many cases of twitter images not being the :orig files, tumblrs not using raw (when that still worked) or even _1280. Add in that images get revisioned all the time (or even have multiple revisions, which might not get listed) and unsourced images coming from who knows where, and it becomes questionable which hash/file is the "true" one. If everybody agreed to use the same image posted on the booru, then tagging via the PTR would be simple, but even the boorus don't agree on which hash is accurate. It's a bit of a pickle, since any exif change changes the hash, particularly for un-sauced chan images where I would have no idea if the hash matches the "original" or not. If there were a way to objectively figure out which hash is the "best" version, and the PTR could sync appropriate tags between versions, that would narrow things a lot. Unfortunately, I kind of doubt it's anywhere near that cut and dry.
>>10186 Hydrus doesn't work on x32/x86?
Got a question. Of the big image dump places where artists go, you've got boorus (and most of the big ones), you've got Hentai Foundry and Inkbunny, you've got login-needed ones like pixiv, but FurAffinity is missing. Is there any reason, or is it just down to you wanting to get a better login system working? I ask because (and I am making assumptions here) an artist I found and like apparently posts there exclusively, and apparently because they are barely furry, no one who regularly goes there posts their work anywhere else. They've got 700-800 images, but the most any booru can come up with is 100, and it's a bit annoying considering how absolutely garbage that site is for trying to get anything off of.
>>10236 Yeah, I wouldn't rush into any system like this, and I'd almost certainly start with it turned off by default. I intend to introduce really easy merge-decisions first, and only with very high confidence values. Things like relatively-pixel-perfect dupes with like 1024xXXX resolution vs 4300xYYYY, or thumbnail 150xXXX vs 2300xYYYY. Figuring out extremely similar jpgs vs pngs also seems possibly doable–I'd love to automatically clear out the bloated redundant Clipboardimage.png crap I get from imageboards for their reasonable jpg originals. Figuring out good decisions with watermark dupes and other more difficult scenarios would have to start as 'hints' to guide your human confirmation and potentially go further as we explore neural net tech. I do not have confidence that I can hardcode 'this image is better than this image' beyond very simple examples. My overall long-term feeling is that we are going to move away from quantitative hash-maps and towards qualitative knowledge about files. We'll stop sharing so much 'file x has tag y' and more 'this shape is a dog', but I cannot speak with any great amount of expertise yet.
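To make the 'easy decision' idea above concrete, here is a toy sketch of the kind of check described, assuming Pillow and numpy are available; the tolerance, resampling choice, and function name are illustrative assumptions, not a hydrus feature:

from PIL import Image
import numpy as np

def easy_dupe_decision(path_a, path_b, tolerance=4.0):
    # load both images, downscale the larger to the smaller's size, and see
    # whether they are (near) pixel-identical; if so, the higher-resolution
    # copy is an easy keep, otherwise defer to a human
    a = Image.open(path_a).convert('RGB')
    b = Image.open(path_b).convert('RGB')
    small, large = sorted((a, b), key=lambda im: im.width * im.height)
    shrunk = large.resize(small.size, Image.LANCZOS)
    diff = np.abs(np.asarray(shrunk, dtype=np.float32) - np.asarray(small, dtype=np.float32))
    if diff.mean() <= tolerance:
        return 'keep the higher-resolution copy'
    return 'needs human confirmation'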
>>10238 Not the releases I build and put out. I am now x64 on all platforms (and a hydrus client that is doing a lot of modern 1080p webm shit will spike through 2GB memory limit no prob). You can try to run from source, but I suspect it would be more trouble than it is worth. https://hydrusnetwork.github.io/hydrus/help/running_from_source.html If you try it, let me know how you get on!
>>10245 Yeah, the Hentai Foundry 'login' (which is really just a simulated click-through and 'allow all' filter set) and the Pixiv login were prototype hardcoded systems. They were enough of a hassle to maintain that I didn't go further. I am finally knuckling down and making the new system completely user-editable, like the new downloader stuff. I never wrote an FA parser because, afaik, you can't get anything without logging in, right? I understand some of the advanced users have been playing with making one and using the cookies.txt import system to log-in hydrus as their browsers, but I don't know much about it. I think Inkbunny and DeviantArt hide nsfw behind login, which are other situations we will fix in time. Aside: This new system will be roughly aware of what a login script gives access to, and will present that in the ui. I try to be completely content neutral in my hydrus work–furry isn't my taste, but I don't care if you like it. FA is a big site and has been requested plenty, so I am sure I will fold in a default parser and login script for it as the new login system becomes real. Let me know how it works for you.
>>10253 Personally, with how much monster girl shit gets made and put out, and how monsters get lumped in with furry more often than not, so long as the porn is good and not autistic (I'm sure we all know the difference there), I have no problem with it. The artist I'm thinking of specifically right now has no one off FurAffinity posting their content outside of it, because it's not furry for the most part, so every other place that would normally have their shit just has a small, old subsection of it, and everywhere else they posted themselves is a case of them getting sick of posting there. I'll be sure to try it out when it gets put in. But I'm curious, how will hydrus handle captchas? I know this is an issue with some places where they force a captcha regardless of how you log in.
>>10259 This iteration won't support captchas, but it isn't actually that difficult to do. I supported it several aeons ago when hydrus had a 4chan dumper. Basically, you just fetch a jpg and a challenge key (some text that identifies the problem, and times out after x seconds) from the captcha server, present it to the user, and send back the answer and the same key in the same login form you put the user/pass in. Some of the tech has moved on since then, though, so if it is anything like 'select all the road signs', I guess it is a more complicated proposition. For now, any complicated login will have to rely on the advanced cookies.txt import and just copy cookies from your active logged-in browser (I'll be doing some more work on this in the coming weeks, to make it more compatible).
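For what it's worth, here is a rough sketch of the legacy-style flow being described, assuming a simple user/pass form; the URLs and field names are hypothetical placeholders, not any real site's API:

import requests

# hypothetical endpoints and field names, for illustration only
session = requests.Session()

challenge = session.get('https://example.com/captcha/new').json()
# e.g. {'key': 'abc123', 'image_url': 'https://example.com/captcha/abc123.jpg', 'expires_in': 120}

image_bytes = session.get(challenge['image_url']).content
# ...display image_bytes to the user however you like, then read their answer...
answer = input('captcha answer: ')

response = session.post('https://example.com/login', data={
    'username': 'me',
    'password': 'hunter2',
    'captcha_key': challenge['key'],  # the same key the server issued
    'captcha_answer': answer,         # must arrive before the key times out
})
response.raise_for_status()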
Thank you based dev.
>>10225 Right, I just looked into this, and ffmpeg was giving some funny framerates:

mp4: 0.77 fps, 20 tbr
webm: 2.5 fps, 50 tbr

tbr is some kind of fps estimate ffmpeg is pulling from the codec or something. iirc, it stands for:

T: Standards
B: Engineers
R: Are fucked in the head m8, big time

I am not totally sure, but it looks like there is some variable frame rate going on when I look at them in MPC, so I guess it is even more complicated. In any case, I have rejiggered my parsing code to be less trustworthy of these values when they differ like this, and now the videos' frames are counted manually–I now get 7 and 13 respectively, with correct durations. So these should work better for you in tomorrow's release! As hopefully should the other various low-framerate, low-frame-count vids that seem to render all in a rush and then hang on the last frame for ten secs.

In v326, turn on help->advanced mode, then right-click these videos and hit reparse files and regen thumbs. They should fix up, at least a bit. Let me know if you run into any more trouble like this.
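If anyone wants to check a problem file themselves, here is a minimal sketch along the same lines: ffprobe's -count_frames option decodes the stream and reports nb_read_frames, which you can compare against the framerates the container claims. The file path is a placeholder, and this is just an illustration, not hydrus's actual parsing code:

import json
import subprocess

def frame_info(path):
    # decode the first video stream and report the actually-counted frames
    # alongside the container's claimed rates
    cmd = [
        'ffprobe', '-v', 'error', '-count_frames', '-select_streams', 'v:0',
        '-show_entries', 'stream=nb_read_frames,r_frame_rate,avg_frame_rate,duration',
        '-of', 'json', path,
    ]
    return json.loads(subprocess.check_output(cmd))['streams'][0]

print(frame_info('problem_video.webm'))
# e.g. {'r_frame_rate': '50/1', 'avg_frame_rate': '5/2', 'duration': '5.200000', 'nb_read_frames': '13'}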
This week I started getting this error with pixiv subscriptions in version 325:

Exception
The subscription Pixiv encountered several errors when downloading files, so it abandoned its sync.
Traceback (most recent call last):
  File "include/ClientImportSubscriptions.py", line 1125, in Sync
  File "include/ClientImportSubscriptions.py", line 479, in _WorkOnFiles
Exception: The subscription Pixiv encountered several errors when downloading files, so it abandoned its sync.

It downloads around 10 to 30 pics, then throws that exception. Is there any way to display the post IDs of failed downloads? I retried a few times with one artist, and it didn't download any new pics on retry, then threw the exception again. So I assume it fails on the same posts again.
>>10275 Pixiv are in the midst of changing some of their layout again. Please try again in v326, which will automatically update you to a hopefully more stable solution.

