>>10551
Alright, I'll try to infodump what each of those would require.
i2v is a multilabel classifier – you give it an image and it gives you a confidence for each of a bunch of tags (1539 of them; examples: 'yuu-gi-ou zexal', 'tokyo ghoul', 'kin-iro mosaic', 'safe').
The other kind of model is a binary classifier – it only gives you one tag at a time.
Either way, you feed it an image and get back a number from 0 to 1 for each tag, and you get to decide where the cutoff is.
The model itself is stored as a large-ish weights file. For example, the weight file for i2v is 180 MB and doesn't compress much. That isn't tiny, but it's on the small side compared to some more powerful models. Loading the model takes about 0.8s on my machine, and classifying one image takes about 0.33s.
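For concreteness, here's roughly what using i2v looks like in Python. I'm going off the illustration2vec README from memory, so treat the exact function and file names as approximate:

```python
import i2v
from PIL import Image

# Load the ~180 MB weights file once; this is the ~0.8s step.
illust2vec = i2v.make_i2v_with_chainer(
    "illust2vec_tag_ver200.caffemodel", "tag_list.json")

img = Image.open("some_image.jpg")

# Returns only the tags whose confidence clears the cutoff you picked.
print(illust2vec.estimate_plausible_tags([img], threshold=0.5))
```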
The steps to build a model from scratch are:
>Decide on the architecture
This includes describing the various layers, and deciding how many tags you want to look for.
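To give you an idea, here's a toy architecture in PyTorch (illustrative only; i2v is a much bigger network and isn't built with PyTorch):

```python
import torch.nn as nn

NUM_TAGS = 10  # how many tags you want to look for

# Two small convolution blocks, then one 0-to-1 output per tag.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, NUM_TAGS),
    nn.Sigmoid(),  # independent confidences, so one image can have many tags
)
```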
>Gather training data
The amount of data you need depends on how "simple" the tag you want to find is, and on how similar images with and without the tag look. A few hundred images is probably enough to train some easy tags; a few thousand should be able to handle harder ones.
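A dead simple way to organize the data for a binary tag – folder names here are made up, any consistent layout works:

```python
from pathlib import Path

def gather_examples(root="dataset"):
    """Return (image_path, label) pairs: 1.0 = has the tag, 0.0 = doesn't."""
    pairs = []
    for label, folder in [(1.0, "positive"), (0.0, "negative")]:
        for path in Path(root, folder).glob("*.jpg"):
            pairs.append((path, label))
    return pairs
```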
>Run training
This involves letting your computer run full blast for a while as it does a bunch of linear algebra on the images. GPUs make this much faster. It depends on the amount of data you use, but I'd expect most models worth training to take about an hour on a GPU, or maybe 10 hours on a CPU (very rough estimates).
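The training loop itself isn't much code. A bare-bones PyTorch sketch, reusing the toy model from above, with fake stand-in data so it actually runs:

```python
import torch

# Fake data: 8 random "images" with random tag labels. Real training would
# load the gathered dataset here instead.
images = torch.rand(8, 3, 64, 64)
labels = torch.randint(0, 2, (8, NUM_TAGS)).float()

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.BCELoss()  # standard loss for 0-to-1 multilabel outputs

for step in range(100):       # real training loops over many batches of data
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()           # the "bunch of linear algebra"
    optimizer.step()          # nudge the weights to reduce the loss
```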
There are tricks you can do to let everyone pitch in on training a single massive model, but that's a technical and logistical nightmare.
There's a trick you can do called "transfer learning" which lets you piggyback off a model you already have. It might be possible to use this to add tags to i2v that aren't in the basic list. This would produce a small model (that still requires the larger one to work) and would take less time to train, but it's limited to things similar to what i2v was trained on originally.
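Here's what that looks like in PyTorch with a generic pretrained backbone (torchvision's ImageNet ResNet, since I don't have an i2v-in-PyTorch handy; the idea is the same):

```python
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights="IMAGENET1K_V1")  # the model to piggyback off
for p in backbone.parameters():
    p.requires_grad = False          # freeze the big pretrained network

# Replace the final layer with a small new head for your extra tags.
backbone.fc = nn.Sequential(
    nn.Linear(backbone.fc.in_features, 5),  # 5 = number of new tags
    nn.Sigmoid(),
)
# Only the new head gets trained, which is why it's fast, and why the result
# still needs the big model underneath to work.
```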