There's been a recent flurry of AI covers on YouTube and Instagram. For example: https://youtube.com/@fakemusicbr?si=G-RzKcm2iuXX56kI
Some of these songs are _really_ good, and pretty far ahead of where I thought audio AI was. Does anyone know what models and tools are used? I imagine once you have stems you can maybe style / genre transfer and use a reference audio file perhaps? Like with image and video models.
Where you can input your lyrics, styles, etc. You can even create fully original songs by having AI generate the lyrics.
>Does anyone know what models and tools are used?
One of the best in my opinion: https://huggingface.co/tencent/SongGeneration
One of the most popular: https://huggingface.co/facebook/musicgen-medium
The "transformers" is just a Python library; you pass your prompts and options as keyword arguments when you call the model's processor:
    inputs = processor(
        text=["80s pop track with bassy drums and synth", "90s rock song with loud guitars and heavy drums"],
        padding=True,
        return_tensors="pt",
    )
This one can run on like 16GB of VRAM, and modern cards generate clips trivially.
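For anyone who wants the whole flow rather than just the processor call, here is a rough sketch based on the usage shown on the MusicGen model card. The function names (`tokens_for_seconds`, `generate_clips`, `main`), the output file names, and the 5-second default are my own placeholders, and the 50 Hz frame rate is an assumption about the checkpoint config (`model.config.audio_encoder.frame_rate`), so check it against the model you actually load.

```python
# Sketch of text-to-music with MusicGen via Hugging Face transformers.
# Requires: pip install transformers torch scipy


def tokens_for_seconds(seconds, frame_rate=50):
    """Number of audio tokens to request for a clip of the given length.

    Assumes a 50 Hz frame rate (model.config.audio_encoder.frame_rate).
    """
    return int(seconds * frame_rate)


def generate_clips(prompts, seconds=5, model_id="facebook/musicgen-medium"):
    """Generate one clip per prompt. Returns (waveforms, sampling_rate)."""
    # Imported lazily so the helper above works without the heavy dependencies.
    from transformers import AutoProcessor, MusicgenForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = MusicgenForConditionalGeneration.from_pretrained(model_id)

    # Same call as in the comment above: text prompts in, padded tensors out.
    inputs = processor(text=prompts, padding=True, return_tensors="pt")

    # Sampling with classifier-free guidance, as in the model card example.
    audio = model.generate(
        **inputs,
        do_sample=True,
        guidance_scale=3,
        max_new_tokens=tokens_for_seconds(seconds),
    )
    return audio, model.config.audio_encoder.sampling_rate


def main():
    import scipy.io.wavfile

    prompts = [
        "80s pop track with bassy drums and synth",
        "90s rock song with loud guitars and heavy drums",
    ]
    audio, rate = generate_clips(prompts)
    for i, clip in enumerate(audio):
        # Each clip has shape (channels, samples); write the first channel.
        scipy.io.wavfile.write(f"musicgen_{i}.wav", rate=rate, data=clip[0].numpy())


# main()  # uncomment to download the model and write the WAV files
```

Calling `main()` downloads the checkpoint on first use, so expect a wait and a few GB of disk before anything plays.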
>I imagine once you have stems you can maybe style / genre transfer and use a reference audio file perhaps?
You can do audio-to-audio, but it's not as easy as just generating the whole thing from text.
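To give an idea of what audio-to-audio looks like with the same model, here is a hedged sketch of MusicGen's audio-prompted generation. As far as I know this continues the reference clip under a text description rather than re-styling it, so it is not a true genre transfer. `trim_prompt` and `continue_from_audio` are hypothetical helper names, and the sketch assumes the reference is already a mono NumPy array at the model's sampling rate (32 kHz for these checkpoints).

```python
# Sketch of audio-prompted generation with MusicGen (continuation, not style transfer).
# Requires: pip install transformers torch


def trim_prompt(waveform, sampling_rate, seconds):
    """Keep only the first `seconds` of a mono waveform (any sliceable sequence)."""
    return waveform[: int(seconds * sampling_rate)]


def continue_from_audio(waveform, sampling_rate, description,
                        prompt_seconds=5, model_id="facebook/musicgen-medium"):
    """Generate audio conditioned on a reference clip plus a text description.

    Assumes `waveform` is a mono NumPy array already at the model's sampling rate.
    Returns (generated_waveform, sampling_rate).
    """
    # Imported lazily so trim_prompt works without the heavy dependencies.
    from transformers import AutoProcessor, MusicgenForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = MusicgenForConditionalGeneration.from_pretrained(model_id)

    # The processor accepts an audio prompt alongside the text prompt.
    inputs = processor(
        audio=trim_prompt(waveform, sampling_rate, prompt_seconds),
        sampling_rate=sampling_rate,
        text=[description],
        padding=True,
        return_tensors="pt",
    )
    audio = model.generate(**inputs, do_sample=True, guidance_scale=3, max_new_tokens=256)

    # Output shape is (batch, channels, samples); take the single mono clip.
    return audio[0, 0].numpy(), model.config.audio_encoder.sampling_rate
```

A short prompt (a few seconds) keeps memory use down; anything closer to real stem-based genre transfer would need a different model or pipeline than this.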