No, you cannot trust third party code without reading it first (unixsheikh.com)
38 points by lycopodiopsida on Aug 11, 2022 | 38 comments


This is probably not very good advice.

You do need a model for evaluating whether or not to trust a dependency, and for understanding if and when to extend that trust to a subsequent version/update.

I’m going to suggest that if your model is to fully understand the dependency, e.g., by reading every line of code, then you should probably stop having any dependencies and develop everything yourself, just to save time. (But let’s face it, you should probably just find a new line of work.)

It sounds glib, but I’m serious.

You can literally only run software you’ve written yourself, on hardware you’ve designed and manufactured (and personally delivered), or you are trusting someone else, somewhere.

What is your basis for this? Actually?

If you’re not sure, but have an intuition, that’s not too bad… you’re in good company, like practically everyone else.

IDK, spend a few minutes thinking about what your model of trust is, and why you even have one. Really, if the best you can come up with is to not trust anything you can’t personally verify, software development is not the field for you.


I do look over libraries that I’m adopting in a number of projects (and often review the differences between version upgrades). At the same time, I don’t look over the source code to PostgreSQL or libpq or even postgrex (the Elixir library for PostgreSQL).

But in terms of Getting Things Done, I also did `pnpm add date-fns` this afternoon and have never reviewed the code for `date-fns`, because it seems to do what it says on the tin and is generally well-regarded. There’s a balance to be struck, and you have to trust someone, because you’re not going to read the source code to clang or gcc.

So in general, I agree with you: the article here is horrible advice.


I was one of the authors of a widely used library, one that I expect thousands of people looked over, maybe tens of thousands. No one noticed the easter egg I added, as far as I could tell, and I didn't even try very hard to obfuscate it.

I'll add my voice to the chorus: The article is horrible advice.


It really comes down to tracking published vulnerabilities in your dependencies, and choosing trustworthy dependencies.

Nobody reviews third party source code for security issues (well, almost nobody). It would not be a productive use of time. There are almost certainly many other less expensive things you could do with those resources to improve security.
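
For what it's worth, a minimal sketch of what "tracking published vulnerabilities" can look like in practice, assuming you pin your versions: query the public OSV database (https://osv.dev) for advisories against an exact dependency version, instead of reading any source at all.

    import json
    import urllib.request

    def known_vulns(name, version, ecosystem="PyPI"):
        # ask OSV for advisories recorded against this exact pinned version
        payload = json.dumps({
            "package": {"name": name, "ecosystem": ecosystem},
            "version": version,
        }).encode()
        req = urllib.request.Request(
            "https://api.osv.dev/v1/query",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp).get("vulns", [])

    # e.g. an old Pillow release should surface several advisories
    print(len(known_vulns("pillow", "8.0.0")), "known advisories")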


It's (almost) the same in traditional engineering. Nobody reviews that fuel pump for reliability and functionality; they trust the manufacturer's specs and certifications, and if the pump turns out not to meet those specs, they sue for a recall & fix/replacement. The difference is that software licenses all disclaim liability, so we can't even get critical issues fixed.


Even in traditional engineering, how can you guarantee every eventuality that another creative mind comes up with after you've designed and built the thing? Say I design and build an air duct to facilitate air circulation in my building. If someone decides to fly a drone into my air duct vents and breach the building when everyone leaves for the day, then who's liable? Should I have known that years later someone would use the air vents to breach the building?


I'm not super happy about log4j as it affected us through a third party install (via PyPI even!).

But the issue I see is that larger software installations have larger attack surfaces. And typical Python, JS, or Golang (e.g.) projects are just exploding in dependencies.

I think the Python project I run at work now has over 100 deps on PyPI. My simplistic Golang side project is around 50 or so.
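
If you want the same number for your own environment, a quick way to count (standard library only, Python 3.8+):

    from importlib.metadata import distributions

    # every distribution installed in the active environment,
    # transitive dependencies included
    names = sorted({d.metadata["Name"] for d in distributions()})
    print(len(names), "installed packages")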


What PyPI dependency exposed you to log4j? That's too intriguing to pass on.


Appdynamics. https://pypi.org/project/appdynamics/

It's a front end for gathering metrics about webservers and the like.

And sure enough, sitting in site-packages were these Java jar files.
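
If you want to check your own environments for the same surprise, a small sketch that lists any jars bundled inside site-packages:

    import pathlib
    import sysconfig

    # locate this interpreter's site-packages and look for bundled Java jars
    site = pathlib.Path(sysconfig.get_paths()["purelib"])
    for jar in site.rglob("*.jar"):
        print(jar)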


The article gives up on its clickbait headline almost immediately:

> Of course you cannot do that with everything, you cannot read all the source code for the kernel of the operating system you're running, you cannot read all the code that makes up the compiler or interpreter you're using, but that is not the point at all, of course some level of trust is always required. The point is that you need to do it when you're dealing with code you are writing and importing!

OK, so we're drawing an arbitrary line of what we read and what we don't.

A basic web application at this point is going to import at least 10k lines of other people's code. A React app probably uses millions.
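
(If you want to sanity-check those numbers on your own project, a rough count, assuming a node_modules directory:)

    import pathlib

    total = 0
    for f in pathlib.Path("node_modules").rglob("*.js"):
        try:
            total += sum(1 for _ in f.open(errors="ignore"))
        except OSError:
            continue  # broken symlinks etc.
    print(total, "lines of other people's JavaScript")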

There's just no way around this. We all trust code we haven't read and have to continue doing it.

Reading code doesn't even guarantee finding security flaws. Most of us aren't security researchers, and none of us can understand an entire project's source code the first time we read it.


> The article gives up on its clickbait headline almost immediately

I find that often it’s not the content of the article that’s relevant; it generally isn’t, as there’s a lot of trash out there. But they do raise good discussion topics that lead to active conversation. I find that is the real value for me personally.


> OK, so we're drawing an arbitrary line of what we read and what we don't.

Imo the line is not arbitrary. You are including a library/framework for a purpose, and the code path for that purpose should be explored. Recently I allowed a friend of mine to DDoS my server. He used unvetted software. Now my server suffers thousands of wild guesses daily; previously, I had ten visitors a day. Therefore I conclude that the software leaked my server's address. And his lesson: don't trust any software, especially if the software is for malicious purposes in the first place.

> A basic web application at this point is going to import at least 10k lines of other people's code. A React app probably uses millions.

I think it is valid to place more trust in frameworks with a giant user base. React is from Facebook, isn't it? So I would read the relevant code path _and_ random internals to get a picture of its trustworthiness.

> There's just no way around this. We all trust code we haven't read and have to continue doing it.

It shouldn't be black-and-white thinking. When introducing a _new_ dependency I see myself as responsible for estimating its trustworthiness. But yes, with an increasing number of dependencies, the process of updating them needs more effort as well. But then, we can postpone such updates if the application under development is safety critical...

> Reading code doesn't even guarantee finding security flaws. Most of us aren't security researchers, and none of us can understand an entire project's source code the first time we read it.

Well, one should be able to judge the workings, imo. Otherwise maintainability can get painful in the long run. So better to grasp it before deciding to build upon it.

Another comment stated that we trust the cars we drive, etc. This is about the users running our code. A proper craftsman will inspect his material before using it in quality products. When building a throwaway tool for his own work, he doesn't spend too much time worrying, of course.


> React is from Facebook, isn't it?

Yes, but it uses a massive number of libraries that were not developed at Facebook.

> But yes, with an increasing amount of dependencies the process of updating such needs more effort as well. But then, we can postpone such updates, if the application under development is safety critical...

Postponing dependency updates is very, very bad for security. That is not a solution to supply-chain attacks.

> Well, one should be able to judge about the workings imo. Otherwise maintainability can get painful in the long run.

The whole point of APIs is that we do not need to understand the inner workings of code that we're calling. How many of us use bcrypt and couldn't tell you anything about the underlying algorithm?
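
(Case in point, using the PyPI bcrypt package: the entire surface most of us ever touch is two calls, and the Blowfish-derived internals stay opaque.)

    import bcrypt

    # hash with a freshly generated salt; verify later, without ever
    # needing to know how the underlying algorithm works
    hashed = bcrypt.hashpw(b"hunter2", bcrypt.gensalt())
    assert bcrypt.checkpw(b"hunter2", hashed)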


> Postponing dependency updates is very, very bad for security. That is not a solution to supply-chain attacks.

You are right; this underlines the thought and care some distributions put into their package management systems...

> The whole point of APIs is that we do not need to understand the inner workers of code that we're calling. How many of us use bcrypt and couldn't tell you anything about the underlying algorithm?

That is right, but code written by unknown developers can be a huge risk. Of course you are not expected to read up on every dependency, but code from external sources deserves it. A quick glimpse at the imports and dependencies goes a long way, I think. In the end your team is responsible for security issues, even if they appear in an external dependency. Companies with direct customer sales spend tons of money on mitigation strategies. Maybe a chunk of this money should be spent on validating things beforehand.


> There's just no way around this. We all trust code we haven't read and have to continue doing it.

Compared to other industries: we all drive cars we never fully inspected ourselves, fly in planes we haven't looked at ourselves, and eat food we never saw grown. How can we adapt the supply chain in software engineering so that we can have the same level of trust (or trusted parties)?


All of those industries are incredibly heavily regulated, especially flying on planes.

I agree: we should have this much regulation for software security. The stakes are extremely high.


>There's just no way around this.

Wrong. The way around it is to write small amounts of code. We should be searching for local minima instead of convenience. Tightly integrated, batteries-included programs like redbean show one way. Clojure, with its emphasis on terse, composable objects, shows another. Those who write web clients using only HTML, CSS, and JavaScript show yet another. We require a reusable set of small primitives that can be examined, mastered, and recombined to suit our purpose. We do not require... a great deal of what we now have.


Web clients require authentication on the backend. Authentication requires cryptography, and it's idiotic to write your own crypto libraries.

So at a minimum, you'd still need to read and understand many thousands of lines of some of the most complex code in common use.

It's not realistic and it's never going to be common practice.


You don't need to read the code yourself, but ideally it should be vetted or reviewed by sources you trust. Maybe that's Debian / Ubuntu / Red Hat, or maybe it's through a review system like Rust's cargo-crev: https://github.com/crev-dev/cargo-crev

Minimizing the number of dependencies helps a lot too.

But don't blindly npm or pip install something unless you trust the developers. npx/pipx are even worse. All it takes is one typo-squatter to steal your SSH keys and maybe even saved browser passwords or cookies.
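
(To make the typo-squatting risk concrete: a Python package's setup.py executes arbitrary code at install time, before you ever import anything. A hypothetical malicious package could look like the sketch below; the exfiltration is stubbed out, and the package name is invented.)

    import pathlib
    from setuptools import setup

    # this module runs during "pip install", not at import time
    key = pathlib.Path.home() / ".ssh" / "id_rsa"
    if key.exists():
        pass  # a real attacker would send key.read_bytes() somewhere here

    setup(name="requsts", version="1.0.0")  # note the typo-squatted name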


I just don't believe that I would have caught log4shell no matter how many times I read the code.

If some big org is using the same code I am, then if there's a bug, a patch will be written and released.


On top of all the other points raised here, I'm mystified by the author's reference to the Solarwinds hack. The whole issue with the Solarwinds attack was that Solarwinds themselves failed to notice changes to their own internal code, allowing attackers to push malicious updates using their infrastructure. No amount of code dependency auditing by any of Solarwinds' customers nor by Solarwinds themselves would have prevented that.


Right, was going to say the same thing.


For commercial line-of-business application code, vendors will offer vetted versions of popular open source libraries/tools with SLAs for patches.

This will be a good thing for large companies, create a two-tiered world for open source, and create another set of gatekeepers. And if I were the author of a popular open library or tool, I would be thinking about how to get a piece of that action to support future development.


Is your own code free of security vulnerabilities?

Will you be able to identify and avoid issues with 3rd party libraries by reading their code, and the code of all the other libraries that they depend on?

Do you know every vulnerability that exists in cyberspace?

I mean that a static code analysis tool can take you much further than reading 3rd party code manually... and that is still going to fall short, but that is as good as it gets.
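
(A toy illustration of the point: even a few lines of AST walking can flag risky constructs across an entire dependency tree faster than a human reading it. A real analyzer does far more; the target directory here is just a placeholder.)

    import ast
    import pathlib

    SUSPICIOUS = {"eval", "exec", "__import__"}

    def flag_calls(path):
        try:
            tree = ast.parse(path.read_text(errors="ignore"), filename=str(path))
        except SyntaxError:
            return  # skip files that don't parse
        for node in ast.walk(tree):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in SUSPICIOUS):
                print(f"{path}:{node.lineno}: call to {node.func.id}")

    # point this at a vendored dependency directory of your choosing
    for py in pathlib.Path("site-packages").rglob("*.py"):
        flag_calls(py)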


Do you read the source code of the static code analysis tool to make sure that it’s not hiding some obvious backdoors?


I look at the outcomes, and can certainly say that the tools I use find issues that I didn't know existed.


You also can't trust it after reading it.

Really, reading it is only loosely coupled to the trust model.


Plug: I've been building tooling to easily audit third-party open-source dependencies for supply chain attacks. Packj [1] analyzes Python/NPM/RubyGems packages for several risky code and metadata attributes, such as network/file permissions, expired email domains, etc. Auditing hundreds of direct/transitive dependencies manually is impractical, but Packj can quickly point out if a package accesses sensitive files (e.g., SSH keys), spawns a shell, exfiltrates data, is abandoned, lacks 2FA, etc. It can be customized to your threat model by commenting out alerts that don't apply. We found a bunch of malicious packages on PyPI using the tool, which have now been taken down; a few are listed here: https://packj.dev/malware

1. https://github.com/ossillate-inc/packj


This looks interesting, but something like this would be more useful if it could read `requirements.txt`, `Gemfile` (or `*.gemspec`), or `package.json` files to identify my specific dependencies.
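
(Something like this sketch for the `requirements.txt` case, assuming pinned dependencies, which a tool could then audit one by one:)

    import pathlib

    def pinned_deps(path="requirements.txt"):
        for line in pathlib.Path(path).read_text().splitlines():
            line = line.split("#")[0].strip()  # drop comments and whitespace
            if "==" in line:
                name, version = line.split("==", 1)
                yield name.strip(), version.strip()

    for name, version in pinned_deps():
        print(name, version)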


While I find myself agreeing with them occasionally, unixsheikh has an unfortunate habit of taking the extremum position in discussions like this, and not SUFFICIENTLY justifying it.

Reading code you use is a good habit. Reading it CAN improve its trustworthiness, if you're familiar with both the language and the specific application space the code operates in. Otherwise it's simply false that reading the code will help; in cases like that, it is better to use different heuristics for trust.

This 'overshooting the mark' behaviour is sadly very typical of a certain pocket of the Unix community, that have far too strong a conviction to their beliefs. I believe it severely damages the credibility of otherwise knowledgeable sources.


You can't trust any code; you can only evaluate risks and set your tolerance to those risks. There's plenty of third-party code you can't read, like the code running your vending machine. I can tolerate a huge amount of risk in the code base for $1, though.


Dependencies are generally terrible. One needs to consider whether one absolutely needs something or is just importing complexity and vulnerabilities. NPM is the poster child: a giant tumbleweed of garbage no one reads that is half broken most of the time.


The author didn't mention node-ipc, which I think makes his point much better than supply chain attacks do. In a supply chain attack, some security failure leads to sneaky malware injected into the build; one could argue that cryptographic signing could solve that problem. However, in the case of node-ipc, *the author* of the package inserted the malware deliberately. No package signing can help you with that. Nor can community reputation, as GitHub did nothing about the deliberate distribution of malware from their platform.

When that is the working environment, he's right. Trust no one.


You don’t want to be the one who read the code and certified it as good. You might become criminally liable for an attack.

What you want is to reduce your risk, so if you can present evidence that Google or some other high-profile company has done the due diligence, then that will be good enough to indemnify you.

Most regular developers don’t have time to read the code anyway. You need to use the slipstream of other companies to protect you.


I agree with the spirit of this post, but the idea of reading and understanding the code doesn't scale well. Perhaps a better way of expressing this is "trust, but verify". How trust is established varies depending on the size of the library and the reputation of the author(s). Verification obviously means rigorous testing.
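
(One concrete form of "verify": pin the behaviors you actually rely on with tests, so a broken or compromised upgrade fails loudly in CI. A pytest-style sketch; python-slugify here is just an example dependency, and the expected output is an assumption about its behavior.)

    from slugify import slugify  # third-party: python-slugify

    def test_slugify_contract():
        # the exact behavior our code depends on; if an upgrade changes
        # this, CI fails before the new version ships
        assert slugify("Hello, World!") == "hello-world"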


It would be nice if we could import a library and have a way to only import the dependencies that are actually used in the code path, hide the rest, and be notified whenever things change after an update.
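
(No packaging tool I know of does this out of the box, but a crude approximation is possible: record which modules a code path actually pulls in, persist the list, and diff it after every update.)

    import sys

    before = set(sys.modules)

    import json  # stand-in for the dependency/code path you exercise
    json.dumps({"a": 1})

    used = sorted(set(sys.modules) - before)
    print(used)  # persist this and alert when it grows after an upgrade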


Even if you do, you're still sort of screwed for reasons illustrated below:

http://www.underhanded-c.org/


Also the reason why I won't touch GitHub Copilot.



