One quick comment: in general it is a bad idea to compute the inverse of a matrix (to solve a linear system). It's much better to compute a QR factorization or SVD instead (or simply call a least-squares solver).
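To make that concrete, here's a toy numpy comparison (my own sketch, nothing to do with the repo's code): the explicit inverse is slower and typically less accurate than a factorization-based solve.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))
b = rng.standard_normal(500)

x_inv = np.linalg.inv(A) @ b                      # explicit inverse: avoid this
x_solve = np.linalg.solve(A, b)                   # LU-based solve
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # SVD-based least-squares solve

# Compare residual norms ||Ax - b||; the solve/lstsq variants are usually smaller.
print(np.linalg.norm(A @ x_inv - b))
print(np.linalg.norm(A @ x_solve - b))
print(np.linalg.norm(A @ x_lstsq - b))
```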
GMRES is definitely my go-to these days. It is worth noting that direct methods do have the benefit of letting you quickly solve many successive linear systems involving the same matrix but different right-hand sides. But iterative methods scale so well for large sparse problems that they are very often the only tool to consider.
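A rough sketch of that trade-off with SciPy (toy matrix, my own example): a direct method factors once and reuses the factorization for many right-hand sides, while an iterative solve never stores a factorization at all.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu, gmres

n = 5_000
# Sparse, diagonally dominant tridiagonal test matrix
A = sp.diags([-1, 4, -1], [-1, 0, 1], shape=(n, n), format="csc")

# Direct: factor once, then each new right-hand side is a cheap triangular solve.
lu = splu(A)
for _ in range(100):
    b = np.random.rand(n)
    x = lu.solve(b)

# Iterative: no factorization kept around; scales to problems where an LU would not fit.
b = np.ones(n)
x, info = gmres(A, b)   # info == 0 means it converged
print(info, np.linalg.norm(A @ x - b))
```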
Block Krylov methods are a thing, but I haven't experimented with them yet.
Has anyone implemented them in production? I read their paper a while ago, but thought that the dual methods they showed might have practical limitations (like insane cache misses compared to more standard methods).
These problems you sketch can already be overcome by using the approximate eigenspectrum information of the matrix obtained as a by-product of Krylov methods. This has been invented many times and goes by names like 'thick restart', 'augmented Krylov subspace methods', and 'deflation'. In particular it was developed for scattering problems with many electromagnetic incident waves hitting an object, changing only the rhs.
You can also use an approximate LU decomposition as a preconditioner for Krylov methods.
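For instance, a minimal SciPy sketch (my own toy example): an incomplete LU of A used as the preconditioner M for GMRES.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import gmres, spilu, LinearOperator

n = 5_000
A = sp.diags([-1, 4, -1], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

ilu = spilu(A, drop_tol=1e-4)                  # approximate (incomplete) LU of A
M = LinearOperator((n, n), matvec=ilu.solve)   # M^{-1} acts roughly like A^{-1}
x, info = gmres(A, b, M=M)
print(info, np.linalg.norm(A @ x - b))
```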
Similarly, Tikhonov regularization (solving for a range of slightly perturbed matrices "A + lambda I", where lambda is a parameter) is easily tackled with Krylov subspace methods by noting that the Krylov subspace is invariant under shifts like these. So an orthonormal basis for the Krylov subspace only has to be found once, and can then be reused for every lambda of interest.
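A quick numerical illustration of that shift invariance (my own toy check): the monomial Krylov bases of A and A + lambda*I span the same subspace, so stacking them does not increase the rank.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, lam = 50, 5, 0.37
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

def krylov_basis(M, b, m):
    # Columns b, M b, M^2 b, ..., M^(m-1) b
    cols = [b]
    for _ in range(m - 1):
        cols.append(M @ cols[-1])
    return np.column_stack(cols)

K = krylov_basis(A, b, m)
K_shift = krylov_basis(A + lam * np.eye(n), b, m)

# Both ranks print as m: the two bases span the same m-dimensional subspace.
print(np.linalg.matrix_rank(K), np.linalg.matrix_rank(np.hstack([K, K_shift])))
```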
Depends on the number of unknowns, sparsity of the matrix, whether you want 3 correct digits or 16, etc.
Direct methods such as Gaussian elimination with pivoting are proven to be stable. Iterative methods are not, but can be a lot cheaper computationally. They can also stop as soon as a certain relative residual is reached, unlike direct methods.
If iterative methods like Krylov subspace methods were stable, they could actually be seen as direct methods themselves: the Krylov subspace has dimension at most N, where N is the number of unknowns, so after at most N iterations the solution could be extracted exactly from the search space. In practice this is not the case due to rounding errors.
Not very important, and for a learning project I find using cvxpy a better idea as it's more readable (like you did), but:
Solving the full quadratic optimization problem for SVMs is basically impossible. You are forming an n x n matrix, so I'll let you imagine what happens when n = 100 000.
Usually people use either approximation methods (Incomplete Cholesky, Nyström) or solve it exactly but iteratively (SMO, Pegasos...).
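For what it's worth, here is roughly what the dual QP looks like in cvxpy for a linear kernel (a toy sketch of my own, with made-up sizes); for a general kernel you would have to form the full n x n Gram matrix, which is exactly what blows up at n = 100 000.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d, C = 200, 2, 1.0
X = rng.standard_normal((n, d))
y = np.where(X[:, 0] + 0.1 * rng.standard_normal(n) > 0, 1.0, -1.0)

alpha = cp.Variable(n)
# For a linear kernel, alpha^T diag(y) K diag(y) alpha == ||X^T (alpha * y)||^2,
# which keeps the objective DCP-compliant without explicitly forming the n x n kernel matrix.
objective = cp.Maximize(cp.sum(alpha) - 0.5 * cp.sum_squares(X.T @ cp.multiply(alpha, y)))
constraints = [alpha >= 0, alpha <= C, y @ alpha == 0]
cp.Problem(objective, constraints).solve()

w = X.T @ (alpha.value * y)   # recover the primal weight vector from the duals
print(w)
```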
I'm implementing them for class right now so it's still fresh in my head haha
Great resource, but it could be a phenomenal resource if you documented each method and explained how and why it does what it does.
Don't get me wrong, having working code to play with is key, but when you don't fully grasp the concepts behind it, an explanation can become so valuable.
That being said, you've included names, so research can be done. Great work and I hope you're enjoying it!
Given the amount of publicity this repository has gained, I will make sure to put more work into documenting the code, and also add references. Thank you for your feedback!
This is a nice project. I think it would be great to add the references used for the implementations and some tests that demonstrate they return what is expected (or perhaps the same results as sklearn).
This is impressive, and kind of exactly what I am in the process of doing. It's certainly a better way to get familiar with the internal workings of these methods than just tuning parameters like an oblivious albeit theoretically informed monkey. How long did it take you to do them?
Pattern Recognition and Machine Learning by Bishop is one of the canonical textbooks. It helps to have a linear algebra background, though it includes a refresher.
This one is also interesting for seeing the "statistical" other side of the industry versus the machine-learning people. For example, I don't think gradient descent is used even once in that book.
This could become a fantastic resource for anybody who is teaching machine learning.
One vital suggestion that would make that path more attractive: use the Jupyter notebook format. It would make it easier to add more documentation and references.
Better yet, write your own simple versions. That's what we did in undergrad and grad classes in AI, ML, Neural Nets, etc. There is nothing like building one yourself, even a simple model like kNN!
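For example, a bare-bones kNN classifier is only a handful of lines of numpy (a sketch of my own, not from the repo):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_test, k=3):
    preds = []
    for x in X_test:
        # Euclidean distance from x to every training point
        dists = np.linalg.norm(X_train - x, axis=1)
        # Majority vote among the k nearest labels
        nearest = y_train[np.argsort(dists)[:k]]
        preds.append(Counter(nearest).most_common(1)[0][0])
    return np.array(preds)

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [0.9, 1.1]])
y_train = np.array([0, 1, 0, 1])
print(knn_predict(X_train, y_train, np.array([[0.2, 0.1]])))   # -> [0]
```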
In your RandomForest implementation, on the line in fit() where you're building the training subsets to give to each tree, it appears that your bagging approach doesn't use a 'sampling with replacement' strategy.
It would appear that the replace=False prevents the 'sampling with replacement' behavior usually implemented by bagging algorithms. Should the replace=False be changed to replace=True?
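A quick illustration of the difference (my own sketch, not the repo's code):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10

# With replacement (usual bagging): duplicates are likely and some rows are left out,
# so every tree sees a different bootstrap sample.
print(rng.choice(n_samples, size=n_samples, replace=True))

# Without replacement and size == n_samples: just a permutation of all the rows,
# so every tree ends up training on exactly the same data.
print(rng.choice(n_samples, size=n_samples, replace=False))
```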
Thank you for your feedback! I have read up on the feature bagging part of the algorithm and I believe that you are correct. This is fixed in the latest commit.
scikit-learn is excellent, but their implementations are a bit too complicated to learn from.
this is for people who don't just want to tune parameters but build the whole thing from scratch
I can buy a pie with all the fixings from a bakery, or I can buy the ingredients myself and make it exactly to my liking. It may not be professional quality.
A friend sent me a link to this - nice work, and I happen to be intermittently working on a very similar (and unfortunately similarly named) project - https://github.com/jarfa/ML_from_scratch/. Check my commit history if you suspect me of copying you ;)
I don't think I'll be implementing as many algorithms as you though, I should force myself to work on more projects outside my comfort zone.
I have taken some ML courses during my university studies and have also done some model implementations in other programming languages. So I didn't have to start from scratch. But I have been working on this project for about three weeks now.
Nice! I have started brushing up my maths and reading about machine learning in general. Next step is to get my feet wet in the implementation. I think looking at your project can give me a good idea as to how to implement some of the most basic algorithms.
Good luck!
Awesome! :) Doing the nitty-gritty has been a great way of learning the limitations and benefits of using certain models for a given task. Good luck to you as well!
Nice project! I'm doing something similar in Julia, with the added advantage that as I build it the numerical types are generic, so I can play around with numbers that aren't IEEE floats.
See for example: https://www.johndcook.com/blog/2010/01/19/dont-invert-that-m...