No, you cannot trust third party code without reading it first (unixsheikh.com)
38 points by lycopodiopsida on Aug 11, 2022 | 38 comments


This is probably not very good advice.

You do need a model for evaluating whether or not to trust a dependency, and for understanding if and when to extend that trust to a subsequent version/update.

I’m going to suggest that if your model is to fully understand the dependency, e.g., by reading every line of code, then you should probably stop having any dependencies and develop everything yourself, just to save time. (But let’s face it, you should probably just find a new line of work.)

It sounds glib, but I’m serious.

You can literally only run software you’ve written yourself, on hardware you’ve designed and manufactured (and personally delivered), or you are trusting someone else, somewhere.

What is your basis for this? Actually?

If you’re not sure, but have an intuition, that’s not too bad… you’re in good company, like practically everyone else.

IDK, spend a few minutes thinking about what your model of trust is, and why you even have one. Really, if the best you can come up with is to not trust anything you can’t personally verify, software development is not the field for you.


I do look over libraries that I’m adopting in a number of projects (and often review the differences between version upgrades). At the same time, I don’t look over the source code to PostgreSQL or libpq or even postgrex (the Elixir library for PostgreSQL).

But in terms of Getting Things Done, I also did `pnpm add date-fns` this afternoon and have never reviewed the code for `date-fns`, because it seems to do what it says on the tin and is generally well-regarded. There’s a balance to be struck, and you have to trust someone, because you’re not going to read the source code to clang or gcc.

So in general, I agree with you: the article here is horrible advice.


I was one of the authors of a widely used library, one that I expect thousands of people looked over, maybe tens of thousands. No one noticed the easter egg I added, as far as I could tell, and I didn't even try very hard to obfuscate it.

I'll add my voice to the chorus: The article is horrible advice.


It really comes down to tracking published vulnerabilities in your dependencies, and choosing trustworthy dependencies.

Nobody reviews third party source code for security issues (well, almost nobody). It would not be a productive use of time. There are almost certainly many other less expensive things you could do with those resources to improve security.
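
For what it's worth, a minimal sketch of what "tracking published vulnerabilities" can look like in practice, assuming you pin your versions: query the public OSV database (https://osv.dev) for advisories against an exact dependency version, instead of reading any source at all.

    import json
    import urllib.request

    def known_vulns(name, version, ecosystem="PyPI"):
        # ask OSV for advisories recorded against this exact pinned version
        payload = json.dumps({
            "package": {"name": name, "ecosystem": ecosystem},
            "version": version,
        }).encode()
        req = urllib.request.Request(
            "https://api.osv.dev/v1/query",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp).get("vulns", [])

    # e.g. an old Pillow release should surface several advisories
    print(len(known_vulns("pillow", "8.0.0")), "known advisories")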


It's (almost) the same in traditional engineering. Nobody reviews that fuel pump for reliability and functionality; they trust the manufacturer's specs and certifications, and if the pump turns out not to meet those specs, they sue for a recall & fix/replacement. The difference is that software licenses all disclaim liability, so we can't even get critical issues fixed.


Even in traditional engineering, how can you guarantee every eventuality that another creative mind comes up with after you've designed and built the thing? Say I design and build an air duct to facilitate air circulation in my building. If someone decides to fly a drone into my air duct vents and breach the building when everyone leaves for the day, then who's liable? Should I have known that years later someone would use the air vents to breach the building?


I'm not super happy about log4j as it affected us through a third party install (via PyPI even!).

But the issue I see is that larger software installations have larger attack surfaces. And typical Python, JS, or Golang (e.g.) projects are just exploding in dependencies.

I think the Python project I run at work now has over 100 deps on PyPI. My simplistic Golang side project is around 50 or so.
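
If you want the same number for your own environment, a quick way to count (standard library only, Python 3.8+):

    from importlib.metadata import distributions

    # every distribution installed in the active environment,
    # transitive dependencies included
    names = sorted({d.metadata["Name"] for d in distributions()})
    print(len(names), "installed packages")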


What PyPI dependency exposed you to log4j? That's too intriguing to pass on.


Appdynamics. https://pypi.org/project/appdynamics/

It's a front end for gathering metrics about webservers and the like.

And sure enough, sitting in site-packages were these Java jar files.
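
If you want to check your own environments for the same surprise, a small sketch that lists any jars bundled inside site-packages:

    import pathlib
    import sysconfig

    # locate this interpreter's site-packages and look for bundled Java jars
    site = pathlib.Path(sysconfig.get_paths()["purelib"])
    for jar in site.rglob("*.jar"):
        print(jar)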


The article gives up on its clickbait headline almost immediately:

> Of course you cannot do that with everything, you cannot read all the source code for the kernel of the operating system you're running, you cannot read all the code that makes up the compiler or interpreter you're using, but that is not the point at all, of course some level of trust is always required. The point is that you need to do it when you're dealing with code you are writing and importing!

OK, so we're drawing an arbitrary line of what we read and what we don't.

A basic web application at this point is going to import at least 10k lines of other people's code. A React app probably uses millions.
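
(If you want to sanity-check those numbers on your own project, a rough count, assuming a node_modules directory:)

    import pathlib

    total = 0
    for f in pathlib.Path("node_modules").rglob("*.js"):
        try:
            total += sum(1 for _ in f.open(errors="ignore"))
        except OSError:
            continue  # broken symlinks etc.
    print(total, "lines of other people's JavaScript")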

There's just no way around this. We all trust code we haven't read and have to continue doing it.

Reading code doesn't even guarantee finding security flaws. Most of us aren't security researchers, and none of us can understand an entire project's source code the first time we read it.


> The article gives up on its clickbait headline almost immediately

I find that often it’s not the content of the article that’s relevant; it generally isn’t, as there’s a lot of trash out there. But they do raise good discussion topics that lead to active conversation. I find that is the real value for me personally.


> OK, so we're drawing an arbitrary line of what we read and what we don't.

Imo the line is not arbitrary. You are including a library/framework for a purpose, and the code path for that purpose should be explored. Recently I allowed a friend of mine to DDoS my server. He used unvetted software. Now my server suffers thousands of wild guesses daily; previously, I had ten visitors a day. Therefore I conclude that the software leaked my server's address. And his lesson: don't trust any software, especially if the software is for malicious purposes in the first place.

> A basic web application at this point is going to import at least 10k lines of other people's code. A React app probably uses millions.

I think it is valid to place more trust in frameworks with a giant user base. React is from Facebook, isn't it? So I would read the relevant code path _and_ random internals to get a picture of its trustworthiness.

> There's just no way around this. We all trust code we haven't read and have to continue doing it.

It shouldn't be black-and-white thinking. When introducing a _new_ dependency I see myself as responsible for estimating its trustworthiness. But yes, with an increasing number of dependencies, the process of updating them needs more effort as well. But then, we can postpone such updates if the application under development is safety critical...

> Reading code doesn't even guarantee finding security flaws. Most of us aren't security researchers, and none of us can understand an entire project's source code the first time we read it.

Well, one should be able to judge the workings, imo. Otherwise maintainability can get painful in the long run. So better to grasp it before deciding to build upon it.

Another comment stated that we trust the cars we drive, etc. This is about the users running our code. A proper craftsman will inspect his material before using it in quality products. When building a throwaway tool for his own work, he doesn't spend too much time worrying, of course.


> React is from Facebook, isn't it?

Yes, but it uses a massive number of libraries that were not developed at Facebook.

> But yes, with an increasing amount of dependencies the process of updating such needs more effort as well. But then, we can postpone such updates, if the application under development is safety critical...

Postponing dependency updates is very, very bad for security. That is not a solution to supply-chain attacks.

> Well, one should be able to judge about the workings imo. Otherwise maintainability can get painful in the long run.

The whole point of APIs is that we do not need to understand the inner workings of code that we're calling. How many of us use bcrypt and couldn't tell you anything about the underlying algorithm?
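
(Case in point, using the PyPI bcrypt package: the entire surface most of us ever touch is two calls, and the Blowfish-derived internals stay opaque.)

    import bcrypt

    # hash with a freshly generated salt; verify later, without ever
    # needing to know how the underlying algorithm works
    hashed = bcrypt.hashpw(b"hunter2", bcrypt.gensalt())
    assert bcrypt.checkpw(b"hunter2", hashed)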


> Postponing dependency updates is very, very bad for security. That is not a solution to supply-chain attacks.

You are right; this underlines the thought and care some distributions put into their package management systems...

> The whole point of APIs is that we do not need to understand the inner workers of code that we're calling. How many of us use bcrypt and couldn't tell you anything about the underlying algorithm?

That is right, but code written by unknown developers can be a huge risk. Of course you are not expected to read up on every dependency, but code from external sources deserves it. A quick glimpse at the imports and dependencies goes a long way, I think. In the end your team is responsible for security issues, even if they appear in an external dependency. Companies with direct customer sales spend tons of money on mitigation strategies. Maybe a chunk of this money should be spent on validating things beforehand.


> There's just no way around this. We all trust code we haven't read and have to continue doing it.

Compared to other industries: we all drive cars we never fully inspected ourselves, fly in planes we haven't looked at ourselves, and eat food we never saw grown. How can we adapt the supply chain in software engineering so that we can have the same level of trust (or trusted parties)?


All of those industries are incredibly heavily regulated, especially flying on planes.

I agree: we should have this much regulation for software security. The stakes are extremely high.


>There's just no way around this.

Wrong. The way around it is to write small amounts of code. We should be searching for local minima instead of convenience. Tightly integrated, batteries-included programs like redbean show one way. Clojure, with its emphasis on terse, composable objects, shows another. Those who write web clients using only HTML, CSS, and JavaScript show yet another. We require a reusable set of small primitives that can be examined, mastered, and recombined to suit our purpose. We do not require... a great deal of what we now have.


Web clients require authentication on the backend. Authentication requires cryptography, and it's idiotic to write your own crypto libraries.

So at a minimum, you'd still need to read and understand many thousands of lines of some of the most complex code in common use.

It's not realistic and it's never going to be common practice.


You don't need to read the code yourself, but ideally it should be vetted or reviewed by sources you trust. Maybe that's Debian / Ubuntu / Red Hat, or maybe it's through a review system like Rust's cargo-crev: https://github.com/crev-dev/cargo-crev

Minimizing the number of dependencies helps a lot too.

But don't blindly npm or pip install something unless you trust the developers. npx/pipx are even worse. All it takes is one typo-squatter to steal your SSH keys and maybe even saved browser passwords or cookies.
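
(To make the typo-squatting risk concrete: a Python package's setup.py executes arbitrary code at install time, before you ever import anything. A hypothetical malicious package could look like the sketch below; the exfiltration is stubbed out, and the package name is invented.)

    import pathlib
    from setuptools import setup

    # this module runs during "pip install", not at import time
    key = pathlib.Path.home() / ".ssh" / "id_rsa"
    if key.exists():
        pass  # a real attacker would send key.read_bytes() somewhere here

    setup(name="requsts", version="1.0.0")  # note the typo-squatted name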


I just don't believe that I would have caught log4shell no matter how many times I read the code.

If some big org is using the same code I am, then if there's a bug, a patch will be written and released.


On top of all the other points raised here, I'm mystified by the author's reference to the Solarwinds hack. The whole issue with the Solarwinds attack was that Solarwinds themselves failed to notice changes to their own internal code, allowing attackers to push malicious updates using their infrastructure. No amount of code dependency auditing by any of Solarwinds' customers nor by Solarwinds themselves would have prevented that.


Right, was going to say the same thing.


For commercial line-of-business application code, vendors will offer vetted versions of popular open source libraries/tools with SLAs for patches.

This will be a good thing for large companies, create a two-tiered world for open source, and create another set of gatekeepers. And if I were the author of a popular open library or tool, I would be thinking about how to get a piece of that action to support future development.


Is your own code free of security vulnerabilities?

Will you be able to identify and avoid issues with 3rd party libraries by reading their code, and the code of all the other libraries that they depend on?

Do you know every vulnerability that exists in cyberspace?

I mean that a static code analysis tool can take you much further than reading 3rd party code manually... and that is still going to fall short, but that is as good as it gets.
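
(A toy illustration of the point: even a few lines of AST walking can flag risky constructs across an entire dependency tree faster than a human reading it. A real analyzer does far more; the target directory here is just a placeholder.)

    import ast
    import pathlib

    SUSPICIOUS = {"eval", "exec", "__import__"}

    def flag_calls(path):
        try:
            tree = ast.parse(path.read_text(errors="ignore"), filename=str(path))
        except SyntaxError:
            return  # skip files that don't parse
        for node in ast.walk(tree):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in SUSPICIOUS):
                print(f"{path}:{node.lineno}: call to {node.func.id}")

    # point this at a vendored dependency directory of your choosing
    for py in pathlib.Path("site-packages").rglob("*.py"):
        flag_calls(py)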


Do you read the source code of the static code analysis tool to make sure that it’s not hiding some obvious backdoors?


I look at the outcomes, and can certainly say that the tools I use find issues that I didn't know existed.


You also can't trust it after reading it.

Really, reading it is only loosely coupled to the trust model.


Plug: I've been building tooling to easily audit third-party open-source dependencies for supply chain attacks. Packj [1] analyzes Python/NPM/RubyGems packages for several risky code and metadata attributes, such as network/file permissions, expired email domains, etc. Auditing hundreds of direct/transitive dependencies manually is impractical, but Packj can quickly point out if a package accesses sensitive files (e.g., SSH keys), spawns a shell, exfiltrates data, is abandoned, lacks 2FA, etc. It can be customized to your threat model by commenting out alerts that don't apply. We found a bunch of malicious packages on PyPI using the tool, which have now been taken down; a few are listed here: https://packj.dev/malware

1. https://github.com/ossillate-inc/packj


This looks interesting, but something like this would be more useful if it could read `requirements.txt`, `Gemfile` (or `*.gemspec`), or `package.json` files to identify my specific dependencies.
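
(Something like this sketch for the `requirements.txt` case, assuming pinned dependencies, which a tool could then audit one by one:)

    import pathlib

    def pinned_deps(path="requirements.txt"):
        for line in pathlib.Path(path).read_text().splitlines():
            line = line.split("#")[0].strip()  # drop comments and whitespace
            if "==" in line:
                name, version = line.split("==", 1)
                yield name.strip(), version.strip()

    for name, version in pinned_deps():
        print(name, version)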


While I find myself agreeing with them occasionally, unixsheikh has an unfortunate habit of taking the extremum position in discussions like this, and not SUFFICIENTLY justifying it.

Reading code you use is a good habit. Reading it CAN improve its trustworthiness, if you're familiar with both the language and the specific application space the code operates in. Otherwise it's simply false that reading the code will help; in cases like that, it is better to use different heuristics for trust.

This 'overshooting the mark' behaviour is sadly very typical of a certain pocket of the Unix community, that have far too strong a conviction to their beliefs. I believe it severely damages the credibility of otherwise knowledgeable sources.


You can't trust any code; you can only evaluate risks and set your tolerance to those risks. There's plenty of third-party code you can't read, like the code running your vending machine. I can tolerate a huge amount of risk in the code base for $1, though.


Dependencies are generally terrible. One needs to consider whether one absolutely needs something or is just importing complexity and vulnerabilities. NPM is the poster child: a giant tumbleweed of garbage no one reads that is half broken most of the time.


The author didn't mention node-ipc, which I think makes his point much better than supply chain attacks do. In a supply chain attack, some security failure leads to sneaky malware injected into the build; one could argue that cryptographic signing could solve that problem. However, in the case of node-ipc, *the author* of the package inserted the malware deliberately. No package signing can help you with that. Nor can community reputation, as GitHub did nothing about the deliberate distribution of malware from their platform.

When that is the working environment, he's right. Trust no one.


You don’t want to be the one who read the code and certified it as good. You might become criminally liable for an attack.

What you want is to reduce your risk, so if you can present evidence that Google or some other high-profile company has done the due diligence, then that will be good enough to indemnify you.

Most regular developers don’t have time to read the code anyway. You need to use the slipstream of other companies to protect you.


I agree with the spirit of this post, but the idea of reading and understanding the code doesn't scale well. Perhaps a better way of expressing this is "trust, but verify". How trust is established varies depending on the size of the library and the reputation of the author(s). Verification obviously means rigorous testing.
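
(One concrete form of "verify": pin the behaviors you actually rely on with tests, so a broken or compromised upgrade fails loudly in CI. A pytest-style sketch; python-slugify here is just an example dependency, and the expected output is an assumption about its behavior.)

    from slugify import slugify  # third-party: python-slugify

    def test_slugify_contract():
        # the exact behavior our code depends on; if an upgrade changes
        # this, CI fails before the new version ships
        assert slugify("Hello, World!") == "hello-world"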


It would be nice if we could import a library and have a way to only import the dependencies that are actually used in the code path, hide the rest, and be notified whenever things change after an update.
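
(No packaging tool I know of does this out of the box, but a crude approximation is possible: record which modules a code path actually pulls in, persist the list, and diff it after every update.)

    import sys

    before = set(sys.modules)

    import json  # stand-in for the dependency/code path you exercise
    json.dumps({"a": 1})

    used = sorted(set(sys.modules) - before)
    print(used)  # persist this and alert when it grows after an upgrade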


Even if you do, you're still sort of screwed for reasons illustrated below:

http://www.underhanded-c.org/


Also the reason why I won't touch GitHub Copilot.



