When data scientists train locally on small datasets, they have to deal with dependency installation, parameterization, and infrastructure provisioning. Once they want to train the model for production on the full dataset, the complexity increases and a new configuration must be considered.
Today we're launching a GUI feature in Ploomber to solve this issue: it lets our users drop notebooks and execute them in the cloud without spinning up clusters or worrying about any infrastructure.
The service is based on our open-source software and has a free tier that lets you train multiple models before depleting the quota.
Would love to hear thoughts and impressions of it!
You can also reach out to me directly at ido@ploomber.io
+1. I also think it's faster that way, both for environment setup and for ad hoc rapid experiments. In my experience, using the library in a team doesn't scale well; it becomes pretty slow.
It's forced upon many of them in finance, banking, insurance, ...
Mainly because those tend to run on Microsoft Azure, which has no decent analytics offering of its own, so Microsoft pushes Databricks extremely hard. The CTO or whatever just pushes Databricks. On paper it checks all the boxes: MLOps, notebooks, experiment management. It just does all of those things very badly, but the exec doesn't care. They only care about the Microsoft credits.
Just to avoid using Jupyter, so the compliance teams stay happy as well, because Microsoft salespeople scared them away from open source.
We pushed back on it very, very, very hard, and finally convinced "IT" to not turn off our big Linux server running JupyterHub. We actually ended up using Databricks (PySpark, Delta Lake, hosted MLFlow) quite a bit for various purposes, and were happy to have it available.
But the thought of forcing us into it as our only computing platform was a spine-chilling nightmare. Something that only a person who has no idea what data analysts and data scientists actually do all day would decide to do.
What would you go with instead for collaborative notebooks?
I ask because I normally tend pretty strongly towards "no, just let the DSes/analysts work how they want", which in this case would mean running Jupyter locally. However, Databricks' notebooks seem genuinely useful.
Is your issue "but I don't need Spark", or "I want to code in a Python project, not a notebook", or something else?
Imo if Databricks dropped its marriage to Spark and provided a Python-only notebook environment, they'd have a killer offering on their hands.
> What would you go with instead for collaborative notebooks?
Production workloads should be code. In source control. Like everybody else.
Notebooks inevitably degrade into confusing, messy blocks of “maybe applicable, maybe not” text, old results and plots embedded in the file because nobody stripped them before committing, and comments like “don’t run cells below here”.
They’re acceptable only as a prototyping and exploration tool. Unfortunately, a whole “generation” of data scientists and engineers has been trained to basically only use notebooks.
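The "old results embedded in the file" problem comes from notebooks being JSON with outputs stored inline. A minimal sketch of the kind of pre-commit cleanup that avoids it (stdlib only; dedicated tools exist, this just illustrates the idea):

```python
import json

def strip_outputs(nb_json: str) -> str:
    """Remove outputs and execution counts from a Jupyter notebook's JSON,
    so only source code reaches version control."""
    nb = json.loads(nb_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []          # drop embedded results/plots
            cell["execution_count"] = None  # drop run-order noise
    return json.dumps(nb, indent=1)
```

Running something like this over every `.ipynb` before committing keeps diffs reviewable, though it doesn't fix the "don't run cells below here" ordering problem.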
It's ubiquitous. I've consulted for a 100 person company that built a data product on top of some IoT data. Everything was in databricks, literally everything. (Not endorsing that, just an observation)
Talking to a 2000+ person org now that is standardizing data science across the org using... you guessed it
Pretty interesting. I think this is part of the trend of releasing half-baked products: some of the stuff in there is really cool, just enough to get you in, but it doesn't scale and is usually complex to deploy and use.
Hi, we’re Ido & Eduardo, the founders of Ploomber. We’re launching Ploomber Convert today, a web-based application that allows data scientists to convert notebooks to PDF, no setup required.
As data scientists, we have to share our work with non-technical colleagues to communicate results. To let them read our findings, we use nbconvert, which exports notebooks to PDF or HTML. Unfortunately, nbconvert requires Pandoc, TeX/XeLaTeX, Pyppeteer, Chromium, and other packages, which makes setup complicated. Ploomber Convert provides the nbconvert functionality without installing a single package.
Ploomber Convert is built on top of AWS and runs all the necessary packages in a Docker container. Since notebooks often contain sensitive information, we do not store any notebooks or PDF files.
Ploomber Convert is free to use. Go to https://convert.ploomber.io, drop your Jupyter Notebook to convert, hit ‘Convert to PDF’, and save it.
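If a conversion fails, the most common local culprit is a file that isn't well-formed notebook JSON. A quick stdlib-only sanity check you can run before uploading (the function name and the exact checks are our illustration, not part of the product):

```python
import json

def looks_like_notebook(path: str) -> bool:
    """Loosely check that a file parses as nbformat-4-style notebook JSON."""
    try:
        with open(path, encoding="utf-8") as f:
            nb = json.load(f)
    except (OSError, ValueError):
        return False  # unreadable, or not JSON at all
    # nbformat 4 notebooks have a list of cells and a format version.
    return isinstance(nb.get("cells"), list) and nb.get("nbformat", 0) >= 4
```

This is deliberately loose; the real nbformat library does full schema validation.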
We want to make Ploomber Convert the go-to tool for data scientists to turn their notebooks into shareable reports. We’re working on adding support for Quarto, custom CSS templates, export to HTML, and other features. Let us know what else you need!
We’re thrilled to share Ploomber Convert with you! So, if you had difficulties exporting your Jupyter Notebooks or if you have any feedback, please share your thoughts! We love discussing these problems since exchanging ideas sparks exciting discussions and brings our attention to use cases we haven’t considered before!
First, you can always research on your own; there are tons of resources online and in git.
If that doesn't work, I find a teammate or a friend with the right domain expertise. There are also some useful Slack communities, for instance on MLOps.