/hydrus/ - Hydrus software optimization thread

/hydrus/ - Hydrus Network

Archive for bug reports, feature requests, and other discussion for the hydrus network.


Hydrus software optimization thread Anonymous 06/07/2018 (Thu) 02:07:57 Id: 1e8781 No. 9068

ITT: create proposals for making Hydrus more optimized.

Proposal: Why can't Hydrus switch to MariaDB? If it is faster, then it should be better. The only trouble is needing to rewrite the queries, which from an SQL standpoint should be a non-issue, right?

List of databases with an open source license and open source APIs:
SQLite - Currently used in Hydrus, has minimal features
MySQL - A more well-rounded SQL database with user management
PostgreSQL - An SQL with complex features with less performance
MariaDB - SQL/NoSQL database with heavy optimizations
ElasticSearch - A literal search engine instead of a normal database
Teradata - IDK

https://www.digitalocean.com/community/tutorials/sqlite-vs-mysql-vs-postgresql-a-comparison-of-relational-database-management-systems
https://www.infoworld.com/article/2611812/mysql/mysql-face-off--mysql-or-mariadb-.html

Anonymous 06/07/2018 (Thu) 02:08:57 Id: 1e8781 No. 9069

>>9068 More articles https://www.linuxjournal.com/content/mariadbmysql-postgresql-and-sqlite3-comparing-command-line-interfaces

Anonymous 06/07/2018 (Thu) 21:18:11 Id: f8f5ac No. 9077

You're aware MariaDB and the like are server software and would rely on starting a second process alongside the hydrus client, right? SQLite is the only one that supports being loaded from a file within an application, which makes it the best -might even say only- fit for desktop software like this.

Anonymous 06/09/2018 (Sat) 07:44:06 Id: 89d121 No. 9087

>>9077 Is there other SQL-like software that loads from a file while still outperforming SQLite?

Anonymous 06/09/2018 (Sat) 20:52:53 Id: f8f5ac No. 9092

>>9087 take your pick: https://en.wikipedia.org/wiki/Embedded_database mongo or levelDB are good, but they're noSQL and would require extensive query rewrites - I'm also pretty sure hydrus benefits more from the relational database model, which those don't provide.

Anonymous 06/10/2018 (Sun) 04:33:17 Id: 21844b No. 9094

Honestly, there is lower-hanging fruit to pick in order to optimize Hydrus before even touching its database. After the initial processing of mappings, the bulk of I/O access is spent on the files themselves, which AFAIK is single-threaded.

Anonymous 06/10/2018 (Sun) 09:06:25 Id: 89d121 No. 9096

>>9094 Multi-threaded Python won't end too well… Some say Go or Rust, but I know it is a meme to rewrite everything.

Anonymous 06/12/2018 (Tue) 18:45:40 Id: 5105b1 No. 9112

>>9077 KDE/Plasma also starts up a MySQL/MariaDB instance for everything PIM related, and users hate it because they never managed to write their software in a way that wouldn't crash the database. All in all, I think the startup time required for a MySQL-ish database is negligible on a modern system, but the amount of code required to make it act like an embedded database is astronomical and the exact opposite of what this project needs.

Anonymous 06/13/2018 (Wed) 03:38:55 Id: 9b4af4 No. 9117

What about using FreeNAS in conjunction with Hydrus for ZFS-like performance? Or is there a distro that is best suited for image and file hoarding with RAID-like redundancy?

Anonymous 06/13/2018 (Wed) 03:40:34 Id: 9b4af4 No. 9118

>>9112 Well, can we lay out the pros and cons of an embedded database vs an optimized database like MariaDB?

Anonymous 06/13/2018 (Wed) 10:12:58 Id: 833c67 No. 9120

Would it be possible to use an ORM library for SQL and let the user choose the SQL backend?

Anonymous 06/13/2018 (Wed) 10:56:45 Id: 49430e No. 9121

I would not mind running a MariaDB daemon for hydrus. In fact, I am running one right now, and it would be great if I could set hydrus up to just connect to an existing database.

Anonymous 06/14/2018 (Thu) 01:11:56 Id: 9b4af4 No. 9123

>>9068 What about file system parities? Would installing Hydrus on FreeNAS with ZFS be a good idea? What about Linux with BTRFS?

Anonymous 06/19/2018 (Tue) 07:37:42 Id: b241f3 No. 9182

>>9123 >>9117
You can't "partition" ZFS
https://forums.freenas.org/index.php?threads/partitioning-harddrive-partitions-into-small-portions.10470/
More info: https://plone.lucidsolutions.co.nz/storage/network/freenas/how-freenas-paritions-zfs-disks
More info: https://tentacles666.wordpress.com/2014/06/02/freenas-creating-zfs-zpools-using-partitions-on-mixed-sized-disks/

And RAM corruption is a small issue
https://blog.briancmoses.com/2014/03/why-i-chose-non-ecc-ram-for-my-freenas.html
More info: http://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-your-data/
More info: https://www.reddit.com/r/homelab/comments/3vrs7f/will_corruption_occur_using_non_ecc_memory_on/
More info: https://linustechtips.com/main/topic/577401-do-you-really-need-ecc-ram-for-a-home-nas/

Anonymous 06/19/2018 (Tue) 08:27:14 Id: b241f3 No. 9183

http://www.freenas.org/blog/a-complete-guide-to-freenas-hardware-design-part-i-purpose-and-best-practices/
http://www.freenas.org/blog/a-complete-guide-to-freenas-hardware-design-part-ii-hardware-specifics/
http://www.freenas.org/blog/a-complete-guide-to-freenas-hardware-design-part-iii-pools-performance-and-cache/
http://www.freenas.org/blog/a-complete-guide-to-freenas-hardware-design-part-iv-network-notes-conclusion/
http://www.freenas.org/blog/freenas-worst-practices/

Some of the points:
1. 8GB of RAM minimum, 12GB minimum if using plugins or jails, 1GB RAM per 1TB (conservative) or 3TB (liberal)
2. Don't use RAID controllers, just use Host Bus Adapters to connect the drives to the motherboard (software "RAID")
3. FreeNAS needs bare metal, NOT VMs (but putting plugins or jails into FreeNAS is a good idea)
4. Intel CPUs have more support than AMD, and LSI has the best Host Bus Adapters (Marvell and JMicron are okay)
5. 7200 RPM SAS or enterprise SATA will work as HDDs, do not use desktop drives for this to prevent IO errors
6. RAIDZ1 is like RAID 5, RAIDZ2 is like RAID 6, RAIDZ3 has triple parity, each vdev/group only has one-drive speeds
7. "ZFS intent log" should be on RAM (and on power-protected SSD if you wish), without it the whole vdev would fail

SQL code optimization Anonymous 06/21/2018 (Thu) 04:39:04 Id: 1e8781 No. 9217

https://ponyorm.com/ can actually simplify SQL queries into something more python-friendly.

De-duplication optimization Anonymous 06/27/2018 (Wed) 06:42:41 Id: 0869af No. 9281

Anonymous 07/04/2018 (Wed) 03:09:06 Id: fb9533 No. 9323

>>9068 >PostgreSQL - An SQL with complex features with less performance t. Uber

Anonymous 07/07/2018 (Sat) 10:09:11 Id: c4095c No. 9348

>>9068
How about rewriting this in C++/Qt? If you limit yourself to Qt syntax then it is surprisingly similar to Python with some C++ quirks. It's easier to do multithreading and start separate processes in this than in Python. Something like this could most easily be done this way:
>Make code modular and switch the GUI to PyQt/PySide while still using Python.
>Experiment with the GUI code, perhaps try using QML to facilitate the GUI proposals from that one anon that made all the cool mockups. See >>8185
>Debate if it is even required to switch to C++ anymore since many Qt goodies can be used via the above mentioned libraries (threading/process starting/native notifications, etc).
I haven't taken a look at the code, but if it is already written modularly then this shouldn't be too hard if the dev can stay motivated and people can live with a few months of only critical bug fixing.

Anonymous 07/14/2018 (Sat) 04:12:21 Id: 02d3aa No. 9391

>>9348 That is the issue, the dev is trying to migrate from wxPython to PyQt after the downloader overhaul, along with other key functions like parallel downloads, workflow management and mobile integration.

Anonymous 07/23/2018 (Mon) 08:41:35 Id: be1efa No. 9464

Bumping

Anonymous 08/01/2018 (Wed) 12:27:06 Id: d8220f No. 9530

>>9464 Yes but why?

Anonymous 08/12/2018 (Sun) 05:43:03 Id: cfc291 No. 9658

As mentioned by >>9094 the bottleneck is mostly how I/O and CPU are handled by hydrus. Imports are done sequentially when they could be sped up a lot by using multiprocessing. I'm sure other actions are still done sequentially too. A transition to a graph database like ArangoDB could be better in the long run, but that's never going to happen. Looking at the client.master.db database, I'm not sure why he added an index to the md5, sha1 and sha512 columns but not to the subtag or namespace columns. Doesn't make sense to me (and is the sha512 index really necessary?). Also it boggles my mind that foreign keys aren't being used at all.
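
For anyone who wants to check the index point rather than argue about it, here is a minimal sketch using Python's built-in sqlite3 module. The `subtags` table and the index name are assumptions made up for illustration, not the actual client.master.db schema. It also helps to check whether the real column is declared UNIQUE, because SQLite builds an implicit index for UNIQUE columns and a separate one would then be redundant.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for `sql` as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of every plan row.
    return " | ".join(row[-1] for row in rows)


def compare_plans():
    """Show the plan for a subtag lookup before and after adding an index."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table, NOT the real hydrus schema.
    conn.execute(
        "CREATE TABLE subtags (subtag_id INTEGER PRIMARY KEY, subtag TEXT)"
    )
    conn.executemany(
        "INSERT INTO subtags (subtag) VALUES (?)",
        [(f"tag{i}",) for i in range(1000)],
    )
    lookup = "SELECT subtag_id FROM subtags WHERE subtag = ?"
    before = query_plan(conn, lookup, ("tag500",))
    conn.execute("CREATE INDEX idx_subtags_subtag ON subtags (subtag)")
    after = query_plan(conn, lookup, ("tag500",))
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = compare_plans()
    print("without index:", before)
    print("with index:   ", after)
```

Without the index the plan is a full table scan; with it the plan names `idx_subtags_subtag`. The exact wording of the plan text differs between SQLite versions.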

Anonymous 08/12/2018 (Sun) 06:28:15 Id: bd599a No. 9659

>>9658 I am also expecting multi-threading could be a place where we can optimise the code (since most computers now run on 4/8 cores). Perhaps SQLite, MD5/SHA hashing and de-duplication are not made for multi-core and/or GPU computers.

Anonymous 08/12/2018 (Sun) 06:57:30 Id: cfc291 No. 9660

>>9659
>multi-threading
Python threads are all executed on the same core. That's why I said multiprocessing. (Strictly, CPython's GIL lets only one thread run Python bytecode at a time, whichever core it lands on, so CPU-bound threads gain nothing from extra cores.) Multiprocessing spreads the work across subprocesses that the OS can schedule on separate cores. Based on your post you don't know much about software, so think of a subprocess in Python like a normal thread.
>are not made for multi-core and/or GPU computers
Everything you've mentioned can be easily sped up with multiple cores. Using a GPU would be even faster but there's no point in using that here. I'm actually pretty surprised he hasn't implemented multiprocessing functions in bottleneck situations like importing. It's very easy to split up the work once you've scanned all the files. You just divide them up by the number of cores and have each subprocess do that portion of the work. If you have 4 cores you have each core do 1/4 of the files you want to import.
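
The "divide the files by the number of cores" idea can be sketched with the standard multiprocessing module. This is only an illustration under assumptions: the function names are made up, and SHA-256 hashing stands in for whatever per-file work an import really does. It is not how hydrus imports files.

```python
import hashlib
import os
from multiprocessing import Pool


def hash_file(path):
    """Return (path, SHA-256 hex digest), reading the file in 1 MiB blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return path, digest.hexdigest()


def split_evenly(items, parts):
    """Split `items` into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for index in range(parts):
        end = start + size + (1 if index < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def hash_chunk(paths):
    """Work done by one subprocess: hash every file in its chunk."""
    return [hash_file(path) for path in paths]


def hash_files_parallel(paths, processes=None):
    """Hash `paths` with one chunk per worker process."""
    processes = processes or os.cpu_count() or 1
    with Pool(processes) as pool:
        per_chunk = pool.map(hash_chunk, split_evenly(paths, processes))
    # Flatten the per-chunk lists back into one list of (path, digest) pairs.
    return [pair for chunk in per_chunk for pair in chunk]
```

Call `hash_files_parallel(list_of_paths)` from under an `if __name__ == "__main__":` guard, which Windows needs because it starts workers by re-importing the script. Whether this helps depends on where the time goes: if the disk is the bottleneck, more processes will not speed it up much.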

Anonymous 08/12/2018 (Sun) 08:31:23 Id: bd599a No. 9661

>>9660
>Python threads are all executed on the same core. That's why I said multiprocessing
Well, since people describe 4-core Intel CPUs with hyperthreading as having 8 virtual cores, I would say it is easy to get those things mixed up. If I have to use a proper term, parallel programming (as in concurrency) would be more fitting.
>Everything you've mentioned can be easily sped up with multiple cores
I meant that it has not been implemented yet by the dev (s/are not/has not been/), since
>I'm actually pretty surprised he hasn't implemented multiprocessing functions

Anonymous 08/13/2018 (Mon) 06:56:41 Id: 813085 No. 9670

>>9660 Also https://medium.com/@bfortuner/python-multithreading-vs-multiprocessing-73072ce5600b https://luckypants.weebly.com/subprocesses-and-multithreading.html

Hydrus decentralization and dapp Anonymous 09/04/2018 (Tue) 03:27:45 Id: 6bf834 No. 9881

Considering the recent happenings of Tumblr and booru.org purges, it is important to put focus on alternative decentralization libraries.

1. Free P2P software
a. BitTorrent - Most commonly used, but can't handle individual files
b. WebTorrent - WebRTC version of BitTorrent, but still has the same issue
c. eDonkey and GNUtella - both very obscure, not really useful or adaptive
d. IPFS - currently used in Hydrus, can handle singular files in a folder structure

2. Proxies and pseudo-VPNs
a. TOR - very common, maybe pozzed by CIA, has BitTorrent and IPFS compatibility (OpenBazaar)
b. I2P - less common, not pozzed, has BitTorrent compatibility, IPFS is in the works (go-i2p)
c. Freenet and Retroshare - both very uncommon, have file transferring and chats as primitives
d. Zeronet - pretty dead, works with Javascript, too many unknowns

3. Blockchain data solutions (https://en.wikipedia.org/wiki/Cooperative_storage_cloud)
a. Filecoin - based on IPFS, slowly developing, could be used in conjunction with Hydrus
b. Sia - top data blockchain contender, has smart contracts with regular renewal for storage (https://sia.tech/)
c. MaidSafe - possible competition, includes secure communication and storage (https://maidsafe.net/)
d. Storj - noted, already has average pricing, made to be used alongside a self-hosted cloud (https://storj.io/)
e. Ethereum Swarm - not really a good idea as the blockchain is congested by CryptoCats
f. Others include https://decent.ch/ https://www.creativechain.org/ https://contentbox.one/ https://noia.network/

Others: https://cryptoslate.com/category/cryptos/storage/

Anonymous 09/04/2018 (Tue) 03:57:31 Id: 6bf834 No. 9882

4. Social media blockchain
a. Steem - used in alt-media like bitchute, dtube and steemit (https://steem.io/)
b. Rocketchat - used by the furries to communicate (https://rocket.chat/)
c. SocialX - at a whitepaper stage, to replace facebook and twitter (https://socialx.network/)
d. Akasha - based on IPFS, meant to replace Tumblr (https://akasha.world/)
e. BAT Token - used by Brave Browser (https://basicattentiontoken.org/)

Others: https://foresting.io/ and https://sola.foundation/ and https://www.synereo.com/
https://www.stateofthedapps.com/dapps/tagged/social/tab/most-relevant

Anonymous 09/04/2018 (Tue) 13:19:59 Id: 9f26dd No. 9884

>>9881 >booru.org purges What do you mean?

Anonymous 09/05/2018 (Wed) 03:59:52 Id: 6bf834 No. 9886

>>9884 Gelbooru and *.booru.org are hosted in the Netherlands, and they are using "anti-loli laws as an excuse" to force a purge on the admins.

Anonymous 09/25/2018 (Tue) 13:32:11 Id: 833c67 No. 10077

Do you know how I can convert the hydrus db to PostgreSQL? The hydrus db consists of multiple SQLite files, how can I connect all of them?
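
On the "connect all of them" half of the question: SQLite can open one file and pull others into the same connection with ATTACH DATABASE, after which their tables can be joined by schema name. A minimal sketch with throwaway files follows; the file names, schema name, tables and columns are assumptions for illustration, not the real hydrus layout. It does not cover the PostgreSQL conversion itself.

```python
import os
import sqlite3
import tempfile


def open_combined(main_path, attachments):
    """Open `main_path` and attach each {schema_name: path} to the connection."""
    conn = sqlite3.connect(main_path)
    for schema_name, path in attachments.items():
        # Schema names cannot be bound as parameters, so validate them first.
        if not schema_name.isidentifier():
            raise ValueError(f"unsafe schema name: {schema_name!r}")
        conn.execute(f"ATTACH DATABASE ? AS {schema_name}", (path,))
    return conn


def demo():
    """Build two small database files and join across them."""
    with tempfile.TemporaryDirectory() as folder:
        main_path = os.path.join(folder, "main.db")
        master_path = os.path.join(folder, "master.db")

        # Hypothetical second file holding a hash lookup table.
        master = sqlite3.connect(master_path)
        master.execute(
            "CREATE TABLE hashes (hash_id INTEGER PRIMARY KEY, hash TEXT)"
        )
        master.execute("INSERT INTO hashes VALUES (1, 'abc123')")
        master.commit()
        master.close()

        conn = open_combined(main_path, {"master": master_path})
        conn.execute(
            "CREATE TABLE files (hash_id INTEGER PRIMARY KEY, size INTEGER)"
        )
        conn.execute("INSERT INTO files VALUES (1, 2048)")
        rows = conn.execute(
            "SELECT f.hash_id, h.hash, f.size "
            "FROM files AS f JOIN master.hashes AS h ON h.hash_id = f.hash_id"
        ).fetchall()
        conn.close()
        return rows


if __name__ == "__main__":
    print(demo())
```

Work on a copy of the database files, never the live ones, while the client is closed.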

Search improvements with fuzzy search Anonymous 10/10/2018 (Wed) 18:53:39 Id: 7d7b19 No. 10232

Icon selection Anonymous 10/13/2018 (Sat) 12:05:56 Id: 9c2ceb No. 10247

1. Having icon representation of major functions in the menu bar and buttons 2. Possibly expanding on famfamfam https://github.com/ionic-team/ionicons https://github.com/yusukekamiyamane/fugue-icons https://github.com/FortAwesome/Font-Awesome https://github.com/Templarian/MaterialDesign https://github.com/linea-io/Linea-Iconset https://twitter.github.io/twemoji/ https://xtoolkit.github.io/Micon/ https://github.com/google/material-design-icons https://github.com/legomushroom/iconmelon

Anonymous 10/17/2018 (Wed) 15:41:06 Id: 7d7b19 No. 10272

>>10232 https://searchcode.com/codesearch/raw/42426693/ and http://archive.fo/ilT6P has a JS version of the Metaphone3 algorithm

Audio fingerprints/hashes Anonymous 10/20/2018 (Sat) 10:44:51 Id: 6f19b2 No. 10290

Anonymous 10/25/2018 (Thu) 06:51:21 Id: ec1fb1 No. 10361

>>9281 https://vision.fe.uni-lj.si/cvww2016/proceedings/papers/04.pdf (Quantitative Comparison of Feature Matchers Implemented in OpenCV3) https://sci-hub.tw/10.1109/m2vip.2016.7827292 (Comparison of OpenCV’s Feature Detectors and Feature Matchers)

Anonymous 10/25/2018 (Thu) 06:54:13 Id: ec1fb1 No. 10362

>>10361 SIFT https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_sift_intro/py_sift_intro.html SURF https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_surf_intro/py_surf_intro.html FAST https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_fast/py_fast.html BRIEF https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_brief/py_brief.html ORB https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_orb/py_orb.html

Anonymous 11/10/2018 (Sat) 19:57:09 Id: 978c9b No. 10599

>>10361 Got some more comparative papers 4U https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8346440 (A Comparative Analysis of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK)

Tag correlation Anonymous 11/21/2018 (Wed) 04:58:03 Id: 1e8781 No. 10742

https://en.wikipedia.org/wiki/Pointwise_mutual_information
Pointwise mutual information between tag X and tag Y is
log( (num. of images with both tags) * (total image count) / ((num. of images with tag X) * (num. of images with tag Y)) )
PMI can be used to find possible tag siblings.

https://en.wikipedia.org/wiki/Conditional_entropy
The both-tags term of the conditional entropy of X given Y is
( (num. of images with both tags) / (total image count) ) * log( (num. of images with tag Y) / (num. of images with both tags) )
(the full conditional entropy sums the same kind of term over the other present/absent combinations of the two tags).
CE can be used to find possible tag parents and children.
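
A small sketch of both quantities computed from plain image counts. The function and parameter names are made up for illustration, and the natural logarithm is an arbitrary choice of base. For the conditional entropy term, the count inside the logarithm is that of the conditioning tag (tag Y for "X given Y"), following the standard definition p(x,y) * log(p(y) / p(x,y)).

```python
import math


def pmi(n_both, n_x, n_y, n_total):
    """Pointwise mutual information of two tags from image counts."""
    if n_both == 0:
        # The tags never co-occur, so the logarithm diverges to -infinity.
        return float("-inf")
    return math.log((n_both * n_total) / (n_x * n_y))


def conditional_entropy_term(n_both, n_given, n_total):
    """Both-tags-present term of H(X | Y), where n_given counts tag Y."""
    if n_both == 0:
        # By convention 0 * log(...) contributes nothing.
        return 0.0
    return (n_both / n_total) * math.log(n_given / n_both)


if __name__ == "__main__":
    # 1000 images, each tag on 100 of them, 50 images carry both tags.
    print(pmi(50, 100, 100, 1000))
    print(conditional_entropy_term(50, 100, 1000))
```

A PMI well above zero means the tags co-occur more often than chance, which is the sibling hint. A term near zero means nearly every image with tag Y also has tag X, which is the parent/child hint.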

Language and package-specific optimizations Anonymous 11/28/2018 (Wed) 17:39:24 Id: b990dc No. 10805

Nim is low-level Python, Crystal is low-level Ruby, both would be easy for the rest of us (and hopefully the dev) to pick up. Doing so would mean that Hydrus would be at least twice as fast in certain departments when compared to non-NumPy Python. (Also D is a C replacement, Go and Kotlin are Java replacements, but those are very different from the syntax of Python) Are there applications where low-level languages DON'T apply? Math calculations, in that case use SciPy/NumPy for less work. Some benchmarks: https://github.com/kostya/benchmarks https://github.com/drujensen/fib https://github.com/frol/completely-unscientific-benchmarks https://github.com/logicchains/LPATHBench

Anonymous 11/29/2018 (Thu) 14:25:59 Id: aa7425 No. 10819

>>10805 https://github.com/yglukhov/nimpy is for connecting Nim to Python https://nim-lang.org/docs/httpclient.html is the new URLLib https://nim-lang.org/docs/htmlparser.html is the new BeautifulSoup For more: https://nim-lang.org/docs/lib.html

Anonymous 12/15/2018 (Sat) 16:20:14 Id: b990dc No. 11022

>>10272 https://searchcode.com/codesearch/raw/2366000/ and http://archive.fo/4Phr9 has the Java version

Anonymous 12/15/2018 (Sat) 16:53:55 Id: b990dc No. 11023

>>10232 For Japanese fuzzy search you can use these to get the kana https://github.com/atilika/kuromoji (Java) https://github.com/takuyaa/kuromoji.js (JS) https://github.com/taku910/mecab (C++) https://github.com/ikawaha/kagome (Go) https://github.com/mocobeta/janome (Python) https://github.com/ku-nlp/jumanpp (C++)

Anonymous 12/19/2018 (Wed) 04:52:09 Id: a72330 No. 11053

>>10290 >https://github.com/acoustid/acoustid-index (C++) You're looking for https://github.com/acoustid/chromaprint (C++) To be honest though when Hydrus starts doing audio fingerprinting it should probably just use acoustid so it can grab tags from MusicBrainz ( https://musicbrainz.org/ )

Anonymous 12/19/2018 (Wed) 08:41:17 Id: 2f2eb0 No. 11058

>>11053 Or maybe others as well? What if we are getting music from torrents instead and don't want MusicBrainz to know that I got them? Bumping to spark conversation >>10232 http://www.scitepress.org/Papers/2016/59263/59263.pdf (Performance Evaluation of Phonetic Matching Algorithms on English Words and Street Names) More benchmarks for major phonetic algorithms

Anonymous 12/28/2018 (Fri) 19:09:14 Id: 5cfb09 No. 11133

>>9068 >PostgreSQL - An SQL with complex features with less performance 1998 wants its retard memes back.

Anonymous 01/07/2019 (Mon) 11:17:27 Id: e73dfb No. 11204

>>11023 >implying

Anonymous 01/08/2019 (Tue) 02:07:54 Id: 1e8781 No. 11206

>>11204 How so? Too many on'yomi and kun'yomi readings? Even if we are not using phonetic fuzzy search, string fuzzy search can still be used (see https://en.wikipedia.org/wiki/String_metric)
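
As one concrete string metric from that page, here is a minimal Levenshtein (edit distance) sketch for matching a typed query against existing tags. The helper names and the `max_distance` cutoff are assumptions for illustration. It compares Unicode code points, so it works on kana and kanji without any reading lookup.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def closest_tags(query, tags, max_distance=2):
    """Return tags within `max_distance` edits of `query`, nearest first."""
    scored = sorted((levenshtein(query, tag), tag) for tag in tags)
    return [tag for distance, tag in scored if distance <= max_distance]


if __name__ == "__main__":
    print(closest_tags("touhou", ["touhou", "tohou", "pokemon"]))
```

This is quadratic per comparison, so scanning a whole tag table this way would be slow. It is meant to show the metric, not a search design.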

Anonymous 01/19/2019 (Sat) 12:39:35 Id: ac7c72 No. 11380

>>11053 https://musicbrainz.org/doc/Other_Databases I find that https://www.discogs.com/ and http://www.freedb.org/ are still alive so what about those? https://github.com/discogs/discogs_client could be good for example.

Anonymous 02/11/2019 (Mon) 07:58:55 Id: d46cda No. 11586

>>9281 https://github.com/rachmadaniHaryono/transformationInvariantImageSearch Looks like our men are getting into this

are these articles any good? Anonymous 03/18/2019 (Mon) 11:33:02 Id: f06e36 No. 11927

https://hackernoon.com/why-is-python-so-slow-e5074b6fe55b https://medium.freecodecamp.org/if-you-have-slow-loops-in-python-you-can-fix-it-until-you-cant-3a39e03b6f35 https://metarabbit.wordpress.com/2018/02/05/pythons-weak-performance-matters/ https://blog.codinghorror.com/the-infinite-space-between-words/ https://www.prowesscorp.com/computer-latency-at-a-human-scale/

Anonymous 04/18/2019 (Thu) 17:31:27 Id: 0a99e5 No. 12295

Anonymous 04/19/2019 (Fri) 18:10:30 Id: 0b5902 No. 12302

>>12295 Why don't you actually develop something on your own instead of endlessly shitting out github links

Anonymous 04/20/2019 (Sat) 04:28:38 Id: 0a99e5 No. 12307

>>12302 Nah that is for >>12277
