One quick comment: in general it is a bad idea to compute the inverse of a matrix (to solve a linear system). It's much better to compute the QR factorization or SVD instead (or simply call a least-squares solver).
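To make that concrete, here's a tiny NumPy sketch (my own example, not from the comment above): solve Ax = b through a factorization-based solver or a least-squares routine instead of forming inv(A).

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((500, 500))
    b = rng.standard_normal(500)

    x_inv = np.linalg.inv(A) @ b                   # explicit inverse: slower and less accurate
    x_slv = np.linalg.solve(A, b)                  # LU with partial pivoting, no inverse formed
    x_lsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # SVD-based least-squares solver

    # The residuals of the solve/lstsq routes are typically as good or better.
    for x in (x_inv, x_slv, x_lsq):
        print(np.linalg.norm(A @ x - b))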
GMRES is definitely my go-to these days. Though it is worth noting that direct methods do have the benefit of letting you quickly solve many successive linear systems involving the same matrix but different right-hand sides. Iterative methods, on the other hand, scale so well for large sparse problems that they are very often the only tool to consider.
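A rough illustration of both workflows (my own sketch, assuming SciPy's sparse module): factor once and reuse the factorization for many right-hand sides, versus a purely iterative GMRES solve that only needs matrix-vector products.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    n = 10_000
    rng = np.random.default_rng(0)
    # A simple sparse, diagonally dominant test matrix (tridiagonal).
    A = sp.diags([-1.0, 2.5, -1.0], [-1, 0, 1], shape=(n, n), format="csc")

    # Direct: pay for the LU factorization once, then each new rhs is a cheap triangular solve.
    lu = spla.splu(A)
    for _ in range(5):
        b = rng.standard_normal(n)
        x = lu.solve(b)

    # Iterative: no factorization at all, only matrix-vector products with A.
    b = rng.standard_normal(n)
    x_it, info = spla.gmres(A, b)
    print(info, np.linalg.norm(A @ x_it - b))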
Block Krylov methods are a thing, but I haven't experimented with them yet.
Has anyone implemented them in production? I read their paper a while ago, but thought that the dual methods they showed might have practical limitations (like insane cache misses, compared to more standard methods)
The problems you sketch can already be overcome by using approximate eigenspectrum information about the matrix, obtained as a by-product of Krylov methods. This has been invented many times and goes by names like 'thick restart', 'augmented Krylov subspace methods', and 'deflation'. In particular it was developed for scattering problems with many electromagnetic incident waves hitting an object, where only the right-hand side changes.
You can also use an approximate LU decomposition (incomplete LU) as a preconditioner for Krylov methods.
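One common way to do that (my sketch, not from the comment): SciPy's spilu gives an approximate LU factorization, which you can wrap as a LinearOperator and hand to GMRES as the preconditioner.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    n = 5_000
    rng = np.random.default_rng(1)
    A = sp.diags([-1.0, 2.1, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
    b = rng.standard_normal(n)

    ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)   # approximate (incomplete) LU of A
    M = spla.LinearOperator((n, n), matvec=ilu.solve)    # acts like an approximation of A^-1

    x, info = spla.gmres(A, b, M=M)                      # preconditioned GMRES
    print(info, np.linalg.norm(A @ x - b))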
Similarly, Tikhonov regularization (solving for a range of slightly perturbed matrices "A + lambda I", where lambda is a parameter) is easily tackled using Krylov subspace methods, by noting that the Krylov subspace is invariant under shifts like these. So an orthonormal basis for the Krylov subspace only has to be built once, and can then be reused for every lambda of interest.
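A bare-bones sketch of that trick (my own toy code, dense NumPy only): build one Arnoldi basis for the Krylov subspace of A and b, then solve the small projected least-squares problem for each shift lambda, with no additional matrix-vector products.

    import numpy as np

    def arnoldi(A, b, m):
        # Arnoldi iteration: V has orthonormal columns spanning K_{m+1}(A, b),
        # H is (m+1) x m with A @ V[:, :m] == V @ H (up to rounding).
        n = len(b)
        V = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        V[:, 0] = b / np.linalg.norm(b)
        for j in range(m):
            w = A @ V[:, j]
            for i in range(j + 1):          # modified Gram-Schmidt
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
        return V, H

    rng = np.random.default_rng(0)
    n, m = 200, 50
    A = rng.standard_normal((n, n)) / np.sqrt(n) + 2.0 * np.eye(n)
    b = rng.standard_normal(n)

    V, H = arnoldi(A, b, m)                 # built once, reused for every shift
    beta_e1 = np.zeros(m + 1)
    beta_e1[0] = np.linalg.norm(b)
    E = np.vstack([np.eye(m), np.zeros((1, m))])  # identity padded with a zero row

    for lam in (0.0, 0.1, 1.0):
        # (A + lam*I) @ V[:, :m] == V @ (H + lam*E), so each shifted GMRES solve
        # reduces to a tiny (m+1) x m least-squares problem.
        y, *_ = np.linalg.lstsq(H + lam * E, beta_e1, rcond=None)
        x = V[:, :m] @ y
        rel = np.linalg.norm(b - (A + lam * np.eye(n)) @ x) / np.linalg.norm(b)
        print(f"lambda = {lam:.1f}, relative residual = {rel:.1e}")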
Depends on the number of unknowns, sparsity of the matrix, whether you want 3 correct digits or 16, etc.
Direct methods such as Gaussian elimination with pivoting are proven to be stable. Iterative methods are not, but can be a lot cheaper in computational cost. They can also stop once a certain relative residual is reached, unlike direct methods.
If iterative methods like Krylov subspace methods were stable, they could actually be seen as direct methods themselves: the Krylov subspace has dimension at most N, where N is the number of unknowns, so after at most N iterations the solution could be extracted exactly from the search space. In practice this is not the case due to rounding errors.
Not very important, and for a learning project I find using cvxpy a better idea as it's more readable (like you did), but:
Solving the full quadratic optimization problem for SVMs is basically impossible at scale. You are forming an n x n kernel matrix, so I'm going to let you imagine what happens when n = 100 000 (that's 10^10 entries, roughly 80 GB in double precision).
Usually people use either approximation methods (Incomplete Cholesky, Nyström) or solve it exactly but iteratively (SMO, Pegasos...).
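For instance, the Nyström route can be sketched with scikit-learn (my own example; the thread only mentions cvxpy): a low-rank kernel approximation built from a few hundred landmark points is fed into a linear SVM, so the full n x n Gram matrix is never formed.

    from sklearn.datasets import make_classification
    from sklearn.kernel_approximation import Nystroem
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # 100 000 samples: the exact RBF kernel matrix would have 10^10 entries.
    X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)

    clf = make_pipeline(
        Nystroem(kernel="rbf", gamma=0.1, n_components=300, random_state=0),
        LinearSVC(),
    )
    clf.fit(X, y)
    print(clf.score(X, y))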
I'm implementing them for class right now so it's still fresh in my head haha
See for example: https://www.johndcook.com/blog/2010/01/19/dont-invert-that-m...