/hdg/ - Anime Screenshot Pipeline for Building Datasets


Anime Screenshot Pipeline for Building Datasets Anonymous 01/29/2023 (Sun) 00:44:25 No. 1209 >>15601

This was mentioned a couple of times on the main general. It's pretty messy: some of the pieces work on their own, but nothing ties them into a straight automated pipeline, or even something turnkey and hassle-free to run, at the moment. I figured that since everyone here is more or less determined and committed to the craft, maybe we could get some of the best minds, and a little push from ChatGPT, to get this working and streamline the process of turning anime episodes into datasets.

https://github.com/cyber-meow/anime_screenshot_pipeline

Let me provide some of my notes and observations from what I have done so far.

Frame extraction: as stated in the github, you are taking a 24 minute episode of about 34k frames and condensing it to an average of 4k, 6k, or 9k non-frozen/dead frames, depending on the show, episode, studio, or era of the source. The work is done by ffmpeg's `mpdecimate` filter, whose purpose is to "drop frames that do not differ greatly from the previous frame in order to reduce frame rate." The ffmpeg frame extraction command provided in the github works fine. The issue is that the repo author's bulk script, `extract_frames.py`, doesn't play nice: it only creates the output folders, and the ffmpeg call fails to execute. Based on some earlier errors I ran into, I considered that the video file naming could be what breaks the script, but the same files are not an issue when running ffmpeg directly, so I sidestepped the bulk script. Since I had already built the datasets I'm currently working on by running the command manually, I haven't needed to go back and retry the script with any modifications. ChatGPT did offer some suggestions, but it needed a copy of the error output to review, which I no longer had and didn't have time to reproduce.
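For a rough sense of scale on the frame counts above, here is the arithmetic as a small sketch. The 23.976 fps figure is an assumption (a common rate for TV anime encodes, not something stated in the post), so swap in your file's real frame rate.

```python
# Rough frame-count arithmetic for one episode.
# ASSUMPTION: 24000/1001 (about 23.976) fps; not taken from the thread.
FPS = 24000 / 1001


def total_frames(minutes: float, fps: float = FPS) -> int:
    """Approximate number of frames in a video of the given length."""
    return round(minutes * 60 * fps)


def kept_fraction(kept: int, total: int) -> float:
    """Share of frames that survive mpdecimate."""
    return kept / total


if __name__ == "__main__":
    total = total_frames(24)
    print("total frames:", total)
    for kept in (4000, 6000, 9000):
        print(kept, "kept =", f"{kept_fraction(kept, total):.0%}")
```

Under that assumption, a 24 minute episode comes out at about 34.5k frames, so keeping 4k to 9k of them means mpdecimate retains somewhere around 12% to 26%.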
Similar image removal: the base application running the filter is `FiftyOne`, an open-source tool for building and curating computer vision datasets, with one recent use being to build clean visual databases for vehicle autopilot AI. Running `remove_similar.ipynb` in Jupyter Notebook gives you a second round of filtering that removes duplicate or very similar frames above a certain threshold across the entire dataset, instead of only the sequential frames that mpdecimate compares. This covers cases where the animation is stretched out, such as talking scenes where only the mouth moves, standing shots where the camera isn't panning, etc. The script uses a default threshold of `0.985` for what is considered a duplicate. I've noticed that even at this value some frames were flagged as duplicates and purged when they shouldn't have been, but that's what manual review is for if you need higher accuracy in a dataset.

The main issue I ran into was that with my dataset (could be a personal issue) the process was painfully slow, at 1 sample/s in the duplicate image detection notebook. That's one and a half hours sorting through a 24 minute episode's worth of already filtered frames. Through some trial and error and ChatGPT QA, I found that switching the model used in the script gave much faster results. If you want to test your luck, in Cell 2 replace `model = foz.load_zoo_model("mobilenet-v2-imagenet-torch")` with `model = foz.load_zoo_model("alexnet-imagenet-torch")`. I was getting 4.9~5.1 samples/s, or roughly 15 minutes per episode, after the adjustment: a 5x improvement in speed. Other models can be found at: https://docs.voxel51.com/user_guide/model_zoo/models.html The recommendation was to stick to "imagenet" models, but feel free to explore.

The github recommends 2 other alternatives for this task, but I have not checked them out myself:
https://github.com/ryanfwy/image-similarity
https://github.com/ChsHub/SSIM-PIL

I haven't proceeded further than this because I had a bit of an issue installing the face detector until just recently. The github links additional documentation on setting up the face detection, as well as other commands by kohya_ss (the same person who made SD-Scripts), that would just need to be DeepL'd for us English-onlys.

https://github.com/hysts/anime-face-detector
https://note.com/kohya_ss/n/nad3bce9a3622

The face detection also includes regularization instructions, including rotating the face images into the proper orientation for training. Tagging is done with wd-1-4-vit. Face detection can be trained on the subjects, which I assume is for an automated filtering process and for later dreambooth weight calculations. From there the rest is a bit of a blur.

I'm admittedly not sharp enough to go through this on my own, so I am kind of asking for help, but I felt that not providing some sort of primer with fixes before doing so would be rude. Hopefully this helps everyone who's trying to build up LoRA or even full model datasets.
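To make the threshold idea in the similar image removal step concrete, here is a minimal pure-Python sketch of threshold-based duplicate removal on embedding vectors. It is not the code from `remove_similar.ipynb` and does not use FiftyOne; the greedy keep-first strategy and the function names are assumptions for illustration, and only the `0.985` default comes from the post above.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_near_duplicates(
    embeddings: Sequence[Sequence[float]], threshold: float = 0.985
) -> List[int]:
    """Return indices to keep. A frame is dropped when its similarity to any
    already-kept frame reaches the threshold (hypothetical greedy strategy)."""
    kept: List[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine_similarity(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept
```

For example, `drop_near_duplicates([[1, 0], [0.999, 0.01], [0, 1]])` keeps indices 0 and 2, because the second vector is almost identical to the first. Raising the threshold keeps more frames and lowering it purges more, which is the knob to turn if good frames are being removed. This loop compares every frame against every kept frame, so it is only meant to show the idea, not to be fast on thousands of images.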


Anonymous 02/04/2023 (Sat) 01:16:07 No. 2864 >>3478

Just want to let you know this thing was a piece of shit to deal with, and I had to contact the git owner to get some answers, but I got this shit to work. It does have limitations, though. Also, for any of the packages you need to install, it's probably best to do it in a venv; I had already done everything on main before it occurred to me to do that. When the git references other projects, know that they are already included in the cloned repo, modified by the owner.

Oh, and obviously: `git clone https://github.com/cyber-meow/anime_screenshot_pipeline`

For the Face Detection and Cropping section, make sure to specify `mmcv-full==1.6.2`, as newer versions will not work and the face detection will vomit errors. The rest of the steps as described will work in python without issue until the next section.

When you reach the character classification step, edit the included `classifier_training\requirements.txt` as follows:
>change the version of `einops` to `einops==0.6.0`
>remove `opencv-python==4.4.0.46` (this version will not build the wheel, which interrupts the rest of the installation and causes issues going forward, and by default you should already have the latest version anyway)

Another place you can run into problems is the wandb package in the requirements file. It will ask you for an api_key to run the machine learning. Per the git owner, this is not needed; it is only carried over from the other git project this training code came from. Just run `wandb disabled` and continue on.

Lastly, right before you run `classifier_training\train.py`: it is not explained in the readme, but you need to make an extra `\data\` folder inside your `/classification_data_dir/`. Follow the steps described, and once you have created labels.csv and the other training files, create the data subfolder and drag and drop all your images into it for the vision model to train on. Leave the csv files alone.
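The two manual fixes above (editing the requirements file and making the `data` subfolder) could be scripted roughly like this. This is a sketch under assumptions: the function names are made up, and it assumes your images sit directly inside the classification data directory next to the csv files, which may not match your layout.

```python
import re
import shutil
from pathlib import Path

# ASSUMPTION: these are the image types in your dataset; adjust as needed.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def patch_requirements(text: str) -> str:
    """Pin einops to 0.6.0 and drop the opencv-python line, as described above."""
    out = []
    for line in text.splitlines():
        # Package name is everything before the first version/extras marker.
        name = re.split(r"[=<>!~\[; ]", line.strip(), maxsplit=1)[0].lower()
        if name == "opencv-python":
            continue
        if name == "einops":
            line = "einops==0.6.0"
        out.append(line)
    return "\n".join(out) + "\n"


def move_images_to_data(class_dir: Path) -> int:
    """Move images from class_dir into class_dir/data, leaving csv files alone.
    Returns the number of files moved."""
    data_dir = class_dir / "data"
    data_dir.mkdir(exist_ok=True)
    moved = 0
    for path in sorted(class_dir.iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
            shutil.move(str(path), str(data_dir / path.name))
            moved += 1
    return moved
```

Usage would be along the lines of reading `classifier_training/requirements.txt`, writing back `patch_requirements(...)` of its contents, then calling `move_images_to_data(Path("classification_data_dir"))` with your own directory in place of that placeholder. Back up the folder first if you are unsure about the layout.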
I have not used the suggested EveryDream/2 for training because I was planning to compare this with the other options we have available, but the pipeline should be usable now with the fixes.

Now for the flaws. The current tools only help with generating cropped face images; they lack full/upper body + face detection. If scripts and models for that exist, and can do aspect ratio bucketing instead of square cuts, we can overcome this issue. The face detection also does not detect faces with abnormal face structure, or faces where the eyes are obscured in an odd manner beyond Hentai/NTR protagonist levels, such as knight helms or hoods that cover more than half the face.

If anyone is interested in having a go and needs help, I will try my best to help if asked.


Anonymous 02/05/2023 (Sun) 10:06:40 No. 3478 >>3670

>>2864 I didn't even bother to try since EveryDream's VRAM requirements are far beyond me but thanks for pioneering as far as /hdg/ goes.


Anonymous 02/06/2023 (Mon) 00:28:11 No. 3670

>>3478 From the main thread: Kohya's finetuner only requires 12GB VRAM and was used by the vtai chubba model maker https://github.com/bmaltais/kohya_ss/blob/master/fine_tune_README.md so probably use that instead of EveryDream.

Anonymous 03/14/2023 (Tue) 01:15:57 No. 10863 >>10869

I'm trying to run this shit with wsl installed on windows 10 but all I'm getting with the command line to use the python file for frame extraction is >[ ]


Anonymous 03/14/2023 (Tue) 01:45:41 No. 10869 >>10874

>>10863 Which exact script are you trying to run? If you are trying to use the frame extraction script to extract all your episodes, don't bother, it's borked. Just run the ffmpeg command manually per episode in the command line (set `prefix` to whatever name you want on the output files first):

ffmpeg -hwaccel cuda -i "anime_name.mp4" -filter:v \
"mpdecimate=hi=64*200:lo=64*50:frac=0.33,setpts=N/FRAME_RATE/TB" \
-qscale:v 1 -qmin 1 -c:a copy "$prefix"_%d.png

If for some reason the command line gives you shit, use Git Bash instead to run the command. Also, make sure you actually have ffmpeg installed.
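If running that per episode by hand gets old, a loop like the following could stand in for the bulk script. This is a sketch only, not a fix for the repo's `extract_frames.py`: the function names, the one-folder-per-episode layout, and the list of video extensions are all assumptions. The ffmpeg arguments are copied from the command above.

```python
import subprocess
from pathlib import Path
from typing import List

# ASSUMPTION: which extensions count as episodes; adjust to your files.
VIDEO_EXTS = {".mp4", ".mkv"}
# Filter string taken verbatim from the manual command in this thread.
FILTER = "mpdecimate=hi=64*200:lo=64*50:frac=0.33,setpts=N/FRAME_RATE/TB"


def build_ffmpeg_command(video: Path, out_dir: Path, use_cuda: bool = True) -> List[str]:
    """Build the ffmpeg argument list for one episode (nothing is executed here)."""
    cmd = ["ffmpeg"]
    if use_cuda:
        cmd += ["-hwaccel", "cuda"]
    cmd += [
        "-i", str(video),
        "-filter:v", FILTER,
        "-qscale:v", "1", "-qmin", "1",
        "-c:a", "copy",
        str(out_dir / f"{video.stem}_%d.png"),  # episode name used as the prefix
    ]
    return cmd


def extract_all(src_dir: Path, dst_dir: Path, use_cuda: bool = True) -> None:
    """Run ffmpeg once per video in src_dir, writing frames to dst_dir/<episode>/."""
    for video in sorted(src_dir.iterdir()):
        if video.suffix.lower() not in VIDEO_EXTS:
            continue
        out_dir = dst_dir / video.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        # check=True raises if ffmpeg exits with an error instead of failing silently.
        subprocess.run(build_ffmpeg_command(video, out_dir, use_cuda), check=True)
```

Call it as `extract_all(Path("episodes"), Path("frames"))` with your own folders. Passing the arguments as a list rather than one shell string means spaces and brackets in episode filenames don't need quoting, which may sidestep the filename problems suspected in the OP. Pass `use_cuda=False` if you don't have an NVIDIA card.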


Anonymous 03/14/2023 (Tue) 02:08:38 No. 10874 >>10877

>>10869 Alright, got it, thanks. I thought that python script was working.


Anonymous 03/14/2023 (Tue) 02:19:31 No. 10877

>>10874 That's the only one to my knowledge that doesn't work as intended that didn't have a fix addressed in the OP or the second post.

Anonymous 04/23/2023 (Sun) 19:26:21 No. 15601

>>1209 (OP) By the way, catbox plz?
