/hydrus/ - Hydrus Network

Archive for bug reports, feature requests, and other discussion for the hydrus network.


(12.55 KB 340x175 mariadb-usa-inc.png)

Hydrus software optimization thread Anonymous 06/07/2018 (Thu) 02:07:57 Id: 1e8781 No. 9068
ITT: create proposals for making Hydrus more optimized.

Proposal: Why can't Hydrus switch to MariaDB? If it is faster, then it should be better. The only trouble is having the need to rewrite the queries, which from an SQL standpoint should be a non-issue, right?

List of Databases with Open Source License and Open Source APIs:
SQLite - Currently used in Hydrus, has minimal features
MySQL - A more well-rounded SQL Database with user management
PostgreSQL - An SQL with complex features with less performance
MariaDB - SQL/NoSQL database with heavy optimizations
ElasticSearch - A literal search engine instead of a normal Database
Teradata - IDK

https://www.digitalocean.com/community/tutorials/sqlite-vs-mysql-vs-postgresql-a-comparison-of-relational-database-management-systems
https://www.infoworld.com/article/2611812/mysql/mysql-face-off--mysql-or-mariadb-.html
>>9348 That is the issue, the dev is trying to migrate from wxPython to PyQt after the downloader overhaul, along with other key functions like parallel downloads, workflow management and mobile integration.
Bumping
>>9464 Yes but why?
As mentioned by >>9094, the bottleneck is mostly how I/O and CPU are handled by hydrus. Imports are done sequentially when they could be sped up a lot by using multiprocessing. I'm sure other actions are still done sequentially too. A transition to a graph database like ArangoDB could be better in the long run, but that's never going to happen. Looking at the client.master.db database, I'm not sure why he added an index to the md5, sha1 and sha512 columns but not to the subtag or namespace columns. Doesn't make sense to me (and is the sha512 index really necessary?). Also it boggles my mind that foreign keys aren't being used at all.
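To make the index point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `subtags` table, its columns, and the index name are made up for illustration and are not copied from client.master.db; whether the real schema lacks such an index is the claim of the post above. The sketch only shows how SQLite's query plan changes once a text column is indexed.

```python
import sqlite3

# Hypothetical stand-in schema; NOT Hydrus's real table layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subtags (subtag_id INTEGER PRIMARY KEY, subtag TEXT)")
conn.executemany(
    "INSERT INTO subtags (subtag) VALUES (?)",
    [(f"tag{i}",) for i in range(1000)],
)
conn.commit()


def query_plan(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


lookup = "SELECT subtag_id FROM subtags WHERE subtag = ?"

# Without an index SQLite has to scan the whole table.
plan_before = query_plan(lookup, ("tag500",))

# Hypothetical index name, chosen for this example only.
conn.execute("CREATE INDEX subtags_subtag_index ON subtags (subtag)")

# With the index SQLite can search instead of scanning.
plan_after = query_plan(lookup, ("tag500",))

print(plan_before)
print(plan_after)
```

The exact plan text varies between SQLite versions, so compare the two printed plans rather than relying on a fixed string.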
>>9658 I am also expecting multi-threading could be a place where we can optimise the code (since most computers now run on 4/8 cores). Perhaps SQLite, MD5/SHA hashing and de-duplication are not made for multi-core and/or GPU computers.
>>9659
>multi-threading
Python threads are all executed on the same core. That's why I said multiprocessing. It spreads out each subprocess across each core. Based on your post you don't know much about software, so think of a subprocess in python like a normal thread.
>are not made for multi-core and/or GPU computers
Everything you've mentioned can be easily sped up with multiple cores. Using a GPU would be even faster but there's no point in using that here. I'm actually pretty surprised he hasn't implemented multiprocessing functions in bottleneck situations like importing. It's very easy to split up the work once you've scanned all the files. You just divide them up by the number of cores and have each subprocess do that portion of the work. If you have 4 cores you have each core do 1/4 of the files you want to import.
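The "divide the files by the number of cores" idea can be sketched roughly like this with the standard multiprocessing module. This is only an illustration of the splitting scheme described above, not how Hydrus imports files; `split_evenly`, `hash_chunk` and `hash_files_parallel` are hypothetical helper names, and SHA-256 hashing stands in for whatever per-file work an import actually does.

```python
import hashlib
import os
import sys
import tempfile
from multiprocessing import Pool


def split_evenly(items, parts):
    """Split items into `parts` contiguous slices whose sizes differ by at most one."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def hash_chunk(paths):
    """Hash every file in one chunk; this runs inside a worker process."""
    results = []
    for path in paths:
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            # Read in 1 MiB blocks so large files need not fit in memory.
            for block in iter(lambda: f.read(1 << 20), b""):
                digest.update(block)
        results.append((path, digest.hexdigest()))
    return results


def hash_files_parallel(paths, workers=None):
    """Give each worker process one slice of the file list."""
    workers = workers or os.cpu_count() or 1
    chunks = [c for c in split_evenly(list(paths), workers) if c]
    if not chunks:
        return {}
    with Pool(processes=len(chunks)) as pool:
        per_chunk = pool.map(hash_chunk, chunks)
    return {path: digest for chunk in per_chunk for path, digest in chunk}


if __name__ == "__main__":
    # Usage: python this_script.py file1 file2 ...
    for path, digest in hash_files_parallel(sys.argv[1:]).items():
        print(digest, path)
```

Fixed slices are the simplest scheme; if file sizes vary a lot, handing out files one at a time (for example with `pool.imap_unordered`) keeps the workers more evenly busy.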
>>9660
>Python threads are all executed on the same core. That's why I said multiprocessing
Well, since people describe 4-core Intel CPUs with "hyperthreads" as having 8 virtual cores, I would say it is easy to get those things mixed up. If I have to use a proper term, parallel programming (as in concurrency) would be more fitting.
>Everything you've mentioned can be easily sped up with multiple cores
I meant that it has not been implemented yet by the dev (s/are not/has not been/), since
>I'm actually pretty surprised he hasn't implemented multiprocessing functions
Considering the recent happenings of Tumblr and booru.org purges, it is important to put focus on alternative decentralization libraries.

1. Free P2P software
a. BitTorrent - Most commonly used, but can't handle individual files
b. WebTorrent - WebRTC version of BitTorrent, but still has the same issue
c. eDonkey and GNUtella - both very obscure, not really useful or adaptive
d. IPFS - currently used in Hydrus, can handle singular files in a folder structure

2. Proxies and pseudo-VPNs
a. TOR - very common, maybe pozzed by CIA, has BitTorrent and IPFS compatibility (OpenBazaar)
b. I2P - less common, not pozzed, has BitTorrent compatibility, IPFS is in the works (go-i2p)
c. Freenet and Retroshare - both very uncommon, have file transferring and chats as a primitive
d. Zeronet - pretty dead, works with Javascript, too many unknowns

3. Blockchain data solutions (https://en.wikipedia.org/wiki/Cooperative_storage_cloud)
a. Filecoin - based on IPFS, slowly developing, could be used in conjunction with Hydrus
b. Sia - top data blockchain contender, has smart contracts with regular renewal for storage (https://sia.tech/)
c. MaidSafe - possible competition, includes secure communication and storage (https://maidsafe.net/)
d. Storj - noted, already has average pricing, made to be used alongside self-hosted cloud (https://storj.io/)
e. Ethereum Swarm - not really a good idea as the blockchain is congested by CryptoCats
f. Others include https://decent.ch/ https://www.creativechain.org/ https://contentbox.one/ https://noia.network/

Others: https://cryptoslate.com/category/cryptos/storage/
4. Social media blockchain
a. Steem - used in alt-media like bitchute, dtube and steemit (https://steem.io/)
b. Rocketchat - used by the furries to communicate (https://rocket.chat/)
c. SocialX - at a whitepaper stage, to replace facebook and twitter (https://socialx.network/)
d. Akasha - based on IPFS, meant to replace Tumblr (https://akasha.world/)
e. BAT Token - used by Brave Browser (https://basicattentiontoken.org/)

Others: https://foresting.io/ and https://sola.foundation/ and https://www.synereo.com/
https://www.stateofthedapps.com/dapps/tagged/social/tab/most-relevant
>>9881 >booru.org purges What do you mean?
>>9884 Gelbooru and *.booru.org are hosted in the Netherlands, and they are using "anti-loli laws as an excuse" to force a purge on the admins.
Do you know how I can convert the hydrus db to PostgreSQL? The hydrus db consists of multiple SQLite files, so how can I connect all of them?
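On the "how can I connect all of them" part: SQLite can open several database files on one connection with ATTACH DATABASE, after which tables from each file can be joined in a single query. Below is a minimal sketch with Python's sqlite3. The two files, the `file_tags` and `tags` tables, and the `master_db` alias are stand-ins invented for the example, not the actual Hydrus schema, and this only covers reading across files, not the PostgreSQL conversion itself.

```python
import os
import sqlite3
import tempfile

# Build two tiny stand-in database files (hypothetical names and tables).
workdir = tempfile.mkdtemp()
main_path = os.path.join(workdir, "example.db")
master_path = os.path.join(workdir, "example.master.db")

master = sqlite3.connect(master_path)
master.execute("CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT)")
master.execute("INSERT INTO tags VALUES (1, 'example tag')")
master.commit()
master.close()

conn = sqlite3.connect(main_path)
# Attach the second file first, before any transaction is open.
conn.execute("ATTACH DATABASE ? AS master_db", (master_path,))

conn.execute("CREATE TABLE file_tags (file_id INTEGER, tag_id INTEGER)")
conn.execute("INSERT INTO file_tags VALUES (42, 1)")
conn.commit()

# Tables are addressed as <schema>.<table>; "main" is the file opened by connect().
rows = conn.execute(
    """
    SELECT f.file_id, t.tag
    FROM main.file_tags AS f
    JOIN master_db.tags AS t ON t.tag_id = f.tag_id
    """
).fetchall()
print(rows)
conn.close()
```

With every file attached like this, a single connection can read all tables and dump them in whatever form the target database's import tooling expects.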
>>9281 https://vision.fe.uni-lj.si/cvww2016/proceedings/papers/04.pdf (Quantitative Comparison of Feature Matchers Implemented in OpenCV3) https://sci-hub.tw/10.1109/m2vip.2016.7827292 (Comparison of OpenCV’s Feature Detectors and Feature Matchers)
>>10361 Got some more comparative papers 4U https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8346440 (A Comparative Analysis of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK)
https://en.wikipedia.org/wiki/Pointwise_mutual_information
Pointwise mutual information between tag X and tag Y is the logarithm of
(num. of images with both tags) * (total image count) / ((num. of images with tag X) * (num. of images with tag Y))
PMI can be used to find possible tag siblings.

https://en.wikipedia.org/wiki/Conditional_entropy
The both-tags term of the conditional entropy of Y given X is
((num. of images with both tags) / (total image count)) * logarithm of ((num. of images with tag X) / (num. of images with both tags))
CE can be used to find possible tag parents and children.
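The two formulas translate directly into code. This is a minimal sketch that takes raw image counts as input; the function names and the example counts are made up, and the reading of the scores (high PMI hints at siblings, a near-zero entropy term hints that X implies Y) is only the interpretation suggested by the post. In the notation of the linked Wikipedia page, the second quantity is the both-tags-present term of H(Y|X), not the full sum.

```python
import math


def pmi(n_both, n_x, n_y, n_total):
    """Pointwise mutual information of tags X and Y from image counts."""
    if min(n_both, n_x, n_y, n_total) <= 0:
        raise ValueError("all counts must be positive")
    return math.log((n_both * n_total) / (n_x * n_y))


def cond_entropy_term(n_both, n_x, n_total):
    """Both-tags-present term of H(Y|X): p(x, y) * log(p(x) / p(x, y))."""
    if min(n_both, n_x, n_total) <= 0:
        raise ValueError("all counts must be positive")
    return (n_both / n_total) * math.log(n_x / n_both)


# Hypothetical counts: 1000 images, tag X on 100, tag Y on 50, both on 50.
# Every image with Y also has X, so the pair co-occurs far more than chance.
print(pmi(n_both=50, n_x=100, n_y=50, n_total=1000))

# Term is zero when every image with the conditioning tag also has the other
# tag, i.e. the conditioning tag implies the other (a parent/child candidate).
print(cond_entropy_term(n_both=50, n_x=50, n_total=1000))
```

Pairs with zero co-occurrence have to be skipped before calling these, since the logarithm is undefined there.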
Nim is low-level Python, Crystal is low-level Ruby, both would be easy for the rest of us (and hopefully the dev) to pick up. Doing so would mean that Hydrus would be at least twice as fast in certain departments when compared to non-NumPy Python. (Also D is a C replacement, Go and Kotlin are Java replacements, but those are very different from the syntax of Python) Are there applications where low-level languages DON'T apply? Math calculations, in that case use SciPy/NumPy for less work. Some benchmarks: https://github.com/kostya/benchmarks https://github.com/drujensen/fib https://github.com/frol/completely-unscientific-benchmarks https://github.com/logicchains/LPATHBench
>>10290 >https://github.com/acoustid/acoustid-index (C++) You're looking for https://github.com/acoustid/chromaprint (C++) To be honest though when Hydrus starts doing audio fingerprinting it should probably just use acoustid so it can grab tags from MusicBrainz ( https://musicbrainz.org/ )
>>11053 Or maybe others as well? What if we are getting music from torrents instead and don't want MusicBrainz to know that we got them? Bumping to spark conversation.
>>10232 http://www.scitepress.org/Papers/2016/59263/59263.pdf (Performance Evaluation of Phonetic Matching Algorithms on English Words and Street Names) More benchmarks for major phonetic algorithms.
>>9068
>PostgreSQL - An SQL with complex features with less performance
1998 wants its retard memes back.
(1.57 KB 300x300 下.png)

>>11023 >implying
>>11204 How so? Too many onyomi and kunyomi? Even then, if we are not using phonetic fuzzy search, string fuzzy search can still be used (see https://en.wikipedia.org/wiki/String_metric).
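As one example of a string metric from that page, here is a minimal sketch of Levenshtein (edit) distance in plain Python. It compares Unicode code points, so it works on kanji and kana without knowing any readings. `fuzzy_match` and the sample tag list are hypothetical, and nothing here is taken from Hydrus.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,               # delete ca
                    current[j - 1] + 1,            # insert cb
                    previous[j - 1] + (ca != cb),  # substitute, free if equal
                )
            )
        previous = current
    return previous[-1]


def fuzzy_match(query, candidates, max_distance=2):
    """Return candidates within max_distance edits, closest first."""
    scored = [(levenshtein(query, c), c) for c in candidates]
    return [c for d, c in sorted(scored) if d <= max_distance]


# Hypothetical tag list for illustration.
tags = ["touhou", "tohou", "pokemon"]
print(fuzzy_match("touhou", tags, max_distance=1))
```

This runs in time proportional to the product of the two string lengths, which is fine for short tags but would need an index or prefilter to search a large tag table.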
Here is a list of "expert system" video quality quantifiers:
https://github.com/Netflix/vmaf (C/C++/Python)
https://github.com/aizvorski/video-quality (Python)
https://github.com/bavc/qctools (C++)
https://github.com/Rolinh/VQMT (C++)
https://github.com/google/rtc-video-quality (Python)
https://github.com/kahkeng/vqats (C/C++)
https://github.com/slhck/ffmpeg-quality-metrics (Python)
https://github.com/honzabilek4/VideoCodecs (C++)
https://github.com/jsyzgaochao/iqat (C++)

Here is a list of "expert system" image quality quantifiers:
https://github.com/andrewekhalel/sewar (Python)
https://github.com/jeffh/CV-Image-Quality-Analysis (Python)
https://github.com/VIQET/VIQET-Desktop (C++/C#)
https://github.com/arcaduf/image_quality_assessment (Python)
https://github.com/bukalapak/pybrisque (Python)
https://github.com/pby5/BRISQUE (C++)
https://github.com/mchall/ImageQuality (C#)
https://github.com/mtobeiyf/CEIQ (C/MATLAB)
https://github.com/realwecan/BlindImageQualityAssessment (C++)
https://github.com/grevutiu-gabriel/iqa (C/MATLAB)
https://github.com/ruofeidu/ImageQualityCompare (C++)
https://github.com/henrikjohansson/Colorite (Java/C++)

And for NN-based image/video quality quantifiers… sigh
https://github.com/idealo/image-quality-assessment (348 stars)
https://github.com/jongyookim/IQA_BIECON_release (41 stars)
https://github.com/jongyookim/IQA_DeepQA_FR_release (32 stars)
https://github.com/lidq92/CNNIQA (29 stars)
https://github.com/lidq92/CNNIQAplusplus (28 stars)
https://github.com/HC-2016/weighted_DCNN_IQA (17 stars)
https://github.com/lidq92/WaDIQaM (17 stars)
https://github.com/zwx8981/DBCNN-PyTorch (10 stars)
https://github.com/VideoForage/VQA-Deep-Learning (10 stars)
https://github.com/synckey/deep_biq (9 stars)
https://github.com/zhl2007/pytorch-image-quality-param-ctrl (9 stars)
https://github.com/michaelneuder/image_quality_analysis (9 stars)
https://github.com/hervindphil/image_quality (8 stars)
https://github.com/SenJia/Saliency-CNN-Image-Quality-Assessment (8 stars)
https://github.com/pcpmartins/video-quality-assessment (8 stars)
https://github.com/kamballu/HDR-NRIQA-PCNN (5 stars)
https://github.com/JayMarx/VSBIQA (5 stars)
https://github.com/etosworld/etos-image-assessment (3 stars)
https://github.com/geosrs/transIQA (3 stars)
https://github.com/Bobholamovic/CNN-FRIQA (3 stars)
https://github.com/LeonLIU08/DeepQA-with-Pytorch (3 stars)
>>12295 Why don't you actually develop something on your own instead of endlessly shitting out github links
>>12302 Nah that is for >>12277

