A common option for that purpose is Zig. WASI is just one of the many targets it can compile C code to, out of the box: `zig cc -target wasm32-wasi`. No need to install yet another LLVM copy.
Zig produces more optimized code, because it optimizes the libc like the rest of the application. So you can enable additional WebAssembly features according to the target runtime, for example `zig cc -target wasm32-wasi -mcpu=baseline+bulk_memory+simd128`.
That's cool and all, but how do I get it to use a linker that understands those options I give to `wasm-ld` (`--allow-undefined`, `--export-all`, `--no-entry`)? As it is, I'm getting “error: unsupported linker arg: --allow-undefined” when I pass that with `-Wl,`.
There is version 1, which is not allowed to continue for political reasons (and that’s the one that everyone is implementing, because it actually works),
and there is version 2, which redoes everything from scratch.
Also, there is WASIX, from an entirely different team, which builds on top of version 1.
You mean preview 1? It was never meant to be a finished spec that would remain supported long-term; it was a preview for early adopters and people working in the space to experiment with.
Genuinely curious: what are you working on that makes POSIX compatibility important, but also warrants using Wasm instead of just building native executables?
I ask because I'm personally much more excited about the future of WASI, because it brings so much to the table that it feels like it's actually worth the effort of building something new — I see it as a game changer. I.e. what you appear to view as an unnecessary diversion, I see as the whole point. So I'd love to understand more about your perspective and use case.
And there are runtimes and services implementing their own APIs, and there's emscripten which is how most people actually use WebAssembly today. But if you're not porting existing software and only care about the web, these APIs are not very relevant. Oh, and WebAssembly also looks very different if you are using it for smart contracts.
This is due to the fact that WebAssembly was not specified with an API to actually use it. But it's okay. CPUs don't ship with an operating system either. It doesn't make them useless, nor does it prevent multiple operating systems from existing.
I’m happy to see more excitement in the Wasm area.
For the cases where you may need full POSIX compatibility for sockets, threads, signals or even fork or longjmp/setjmp, I’d recommend using WASIX [1] (also referenced in other comments here) since it has implemented all those system calls missing from WASI.
Relevant but not mentioned: syrusakbary is the CEO of Wasmer, the company behind WASIX. I'm personally excited by WASIX (WASI has stagnated too long), but if you recommend something you have a part in, I think you should mention it.
The guy who tried to trademark the WebAssembly name, btw, and who is constantly trying to hijack the community with half-baked solutions instead of pushing the standard forward, because Wasmer had beef in the past with the Bytecode Alliance peeps.
WASI and WASM move slowly for a good reason. This is how proper science and engineering work. The "go fast and break things" mentality of Silicon Valley startups has mainly filled the industry with low-quality software.
I imagine Wasmer must be in a bit of a difficult place right now. My understanding is that they are VC-backed (YC?), which means they're essentially trying to figure out how to make a huge amount of money off an open standard or something adjacent to it. But then, they would have known this from the very start, so I'm not sure what they expected, or why a VC firm would even look at them as a promising prospect.
Maybe they hoped to be like Docker, where the value-add was cohesive higher-level tooling on top of all the lower-level bits. If that's the case, then WASI would be an existential threat in that it (and the tooling people are building around it) commoditizes what would otherwise be Wasmer's special sauce. I'm speculating wildly, of course, but it would at least be consistent with Wasmer's/Syrus's apparent vendetta against WASI and the people working on it.
I hope Wasmer eventually finds a way to add value that doesn't rely on constant gaslighting and white-anting the standards process.
My response here is focused on being as productive as possible, and intentionally doesn't address your other comments about Wasmer or my persona because, while they are incredibly inaccurate, I want to make sure we keep the conversation on the rails.
We recently presented Wasmer Edge [1], which developers, enterprises and VC firms are incredibly excited about, as it brings a more scalable approach to both Cloud and Edge computing. Please give it a try and let us know if you have any questions!
I wonder why the need to post hateful/flamewar comments in all the Hacker News posts where Wasmer is mentioned [1] [2] [3].
In my last comment on a previous thread [1], I asked a direct question to see if you are related at all to the BA, without success. I'd love it if you could shed some light on this, and I'd appreciate it if you could do so without trying to start a flamewar.
Hey, I am not affiliated with the BA at all. I just don't like you and what you represent: the filth of Silicon Valley startups that try to make money off of the hard work and love some people put into their projects. Open source peeps do not need your pathetic public apologies, which are obviously just another marketing strategy since you realized you f'ed up.
Runtimes didn't wait for WASIX to start adding missing system calls to WASI.
But WASIX brings a central place to document all this.
And I don't think WASIX is fundamentally incompatible with the future WASI. The different ABIs don't make this straightforward, but WASIX could be available as a WASI extension like others.
I've never seen a worse-described project than WASI.
"Component model interfaces always support link-time interposition."
Like WTF does this mean? The repo tells me nothing and I've still yet to see a clear write-up about what WASI is. I click on "docs" folder and there's one file. https://github.com/WebAssembly/WASI/blob/main/docs/WitInWasi.... WTF is wit? This should be in a CONTRIBUTORS.md not in the docs folder. I click on "legacy" and I see preview0 and preview1, which are basically unreadable proto-specs. Wikipedia tells me WASI is a POSIX-like interface but with POSIX I know exactly where to look up the functions. Where's a single well-written WASI spec?
I'll be honest - this whole project feels like candy for architecture astronauts and goes against the spirit of WebAssembly. Look at how well-written WebAssembly's goals are: https://webassembly.org/docs/high-level-goals/. Their spec is easy to find and easy to read. This is what I want from WASM. Whatever WASI is doing, I don't like it. And neither does the AssemblyScript team, apparently: https://www.assemblyscript.org/standards-objections.html.
I also agree with the AssemblyScript people. WASI is driven by people saying "I want to be able to compile existing Linux software to WASM and run it on a server!" and to do that they have pretty much just copied POSIX.
Great for running old software, but it seems very short-sighted to me to tie WebAssembly to 70s UNIX design.
It'll probably be popular because people apparently love never fixing things...
They mean that an environment can always provide a fake version of a function to a webassembly program. They’re defining an API but there’s no requirement to implement it.
It seems like a rather weak specification where nothing is required to work? But at least it defines an interface. You might compare it to Go or Java interface types.
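To make the interposition idea a bit more concrete, here is a minimal Python sketch of what "provide a different implementation of the same interface" could look like. It is only an analogy: `HostFs`, `ReadOnlyFs` and `guest_save_and_load` are made-up names for illustration and are not part of WASI or the component model, where the wiring would happen at link time between Wasm components rather than between Python objects.

```python
class HostFs:
    """Stand-in for the implementation a host would provide (in-memory here)."""

    def __init__(self):
        self.files = {}

    def read(self, path):
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


class ReadOnlyFs:
    """Interposed implementation: same two methods, but writes are refused."""

    def __init__(self, inner):
        self.inner = inner

    def read(self, path):
        return self.inner.read(path)

    def write(self, path, data):
        raise PermissionError(f"write to {path!r} denied by interposer")


def guest_save_and_load(fs):
    """Consumer code: unchanged regardless of which implementation it is given."""
    fs.write("/notes.txt", b"hello")
    return fs.read("/notes.txt")


if __name__ == "__main__":
    host = HostFs()
    print(guest_save_and_load(host))           # full-access implementation
    try:
        guest_save_and_load(ReadOnlyFs(host))  # attenuated implementation
    except PermissionError as err:
        print("guest saw:", err)
```

The consumer function never changes; only the object handed to it does, which is roughly the "adapt or attenuate without changing the code using it" property being discussed.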
You're referring to yet another part of WASI, which is indeed about defining domain-specific sets of APIs.
No specific set is required to be implemented. So an application can work somewhere, and not load elsewhere.
But the idea is that similar environments will hopefully implement similar APIs, and there's a mechanism to encourage that ("worlds").
Not only interfaces are defined. As an illustration, the WASI Crypto proposal includes a lot of details on how individual functions must behave, because in this context, it's critical to avoid inconsistencies between implementations.
Good to hear. I was basing what I wrote on the previous sentence:
> This can be used to adapt or attenuate the functionality of a WASI API without changing the code using it.
It sounds like someone (who?) can use middleware to override an API to do whatever they like, even though there are specifications elsewhere about what they must do.
In the context of virtualization, would this allow things like running one virtual machine inside another? Sandboxes all the way down?
> "Component model interfaces always support link-time interposition."
>
> Like WTF does this mean? The repo tells me nothing
Directly above the sentence you quoted:
"Interposition in the context of WASI interfaces is the ability for a Webassembly instance to implement a given WASI interface, and for a consumer WebAssembly instance to be able to use this implementation transparently. This can be used to adapt or attenuate the functionality of a WASI API without changing the code using it."
> and I've still yet to see a clear write-up about what WASI is.
In the same document: [0]
> WTF is wit?
The first link in that document ("Starting in Preview2, WASI APIs are defined using the Wit IDL.") is [1].
> I click on "legacy" and I see preview0 and preview1, which are basically unreadable proto-specs.
The README for the legacy directory [2] clearly explains what they are.
> Where's a single well-written WASI spec?
"Development of each API happens in its own repo, which you can access from the proposals list." [3]
> Whatever WASI is doing, I don't like it.
Clearly not - you've gone out of your way to ignore all of the documentation that answers your questions.
> And neither does AssemblyScript team apparently
The AssemblyScript team have a bone to pick with WASI based on their misunderstanding of what WASI is for (it is not intended for use on the web) and WASI's disinterest in supporting UTF-16 strings. You can see for yourself in [4].
None of this is excusable. Tell me what your project does in clear English.
"WASI is a standard for 50 functions you can call to do systems-level things from your WASM code. Here they are."
Done.
I don't care about wit/witx. I don't care the repo being in transition. I don't want to read about interposition or components or capabilities. I don't want to see your copy-pasta goals from WASM (which aren't clear for WASI). You're an API. Show me the API.
The WASI section documents WASI as it is implemented today.
But since then, WASI pivoted and has become an umbrella for multiple projects. This is not just an API any more, and at the moment, the documentation on the WASI site and repositories is for WASI developers, not for developers using WASI. So if you didn't follow the whole thing, it is indeed very hard to understand. The stack is complicated. But the ultimate goal of the project is to actually make it easier for developers to use wasm, without having to worry about all these details.
It's essentially about adding dynamic linking to wasm. The dynamic libraries embed the function prototypes, so that calling functions with the wrong type will cause a link error. That requires the definitions of every type of function, and WIT, WAI and WITX are domain-specific languages to do that.
Right now, WebAssembly is limited to static linking. It works very well, even across languages, but types aren't automatically checked. Actually, they are, but only using the primitive types available in WebAssembly. Here, the goal is to support something very close to the Rust type system.
The proposal also allows restricting every library to their own memory region. So, a buggy or malicious library can only mess with its own data, not with the rest of the application.
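As a rough illustration of the "wrong type causes a link error" part, here is a small Python sketch of a linker that resolves imports by name and rejects signature mismatches. `FuncType`, `LinkError` and `link` are hypothetical names for this sketch only; the real component model describes much richer types than these string tuples and does not work on Python dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuncType:
    """A function signature: parameter types and result types."""
    params: tuple
    results: tuple


class LinkError(Exception):
    """Raised when an import cannot be satisfied by the provided exports."""


def link(imports, exports):
    """Resolve each import by name, rejecting any signature mismatch.

    imports: name -> FuncType the consumer expects
    exports: name -> (FuncType, callable) the library provides
    """
    resolved = {}
    for name, wanted in imports.items():
        if name not in exports:
            raise LinkError(f"unresolved import: {name}")
        provided, func = exports[name]
        if provided != wanted:
            raise LinkError(f"type mismatch for {name}: wanted {wanted}, got {provided}")
        resolved[name] = func
    return resolved


if __name__ == "__main__":
    i32_binop = FuncType(("i32", "i32"), ("i32",))
    library = {"add": (i32_binop, lambda a, b: a + b)}
    print(link({"add": i32_binop}, library)["add"](2, 3))
    try:
        link({"add": FuncType(("f64",), ("f64",))}, library)
    except LinkError as err:
        print("link failed:", err)
```

The point is only that the check happens once, when the pieces are wired together, instead of at every call.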
> Tell me what your project does in clear English.
Okay.
"Define a set of portable, modular, runtime-independent, and WebAssembly-native APIs which can be used by WebAssembly code to interact with the outside world. These APIs preserve the essential sandboxed nature of WebAssembly through a Capability-based API design."
That's the first goal.
> I don't care about wit/witx. I don't care the repo being in transition. I don't want to read about interposition or components or capabilities.
Everything you've just mentioned is relevant to people implementing the spec, given its wide scope and nature. It is not reducible complexity.
It is not a drop-in replacement for POSIX (i.e. it is not "a standard for 50 functions"); it goes beyond that, and aims to provide a secure and modular way to interact with the system where capabilities can be delegated or reduced.
I implemented my first real-world project in WASM for OSS Summit, and everything you said about WASI echoes my real-world experience. It's not a set of functions to call. It's a terrific bundle of constraints (enabling constraints) on how to interact with the world outside of WASM; embracing them will force you to make your problem smaller until it's well-defined and testable, unit-ized even.
You're not building the old monolith the same old way in WASM, at least not if WASI is the main tool in the toolbox. And you're also probably not building the whole thing in WASM, (whatever it is that you had as original goal to build as an enterprise.) I think we'll have legacy code that needs to integrate with WASM for a time, and the time might be forever.
As a Rubyist, from this perspective I have been studying WASM and I have to admit I was really disappointed when I first began to understand the limitations we have in the current crop of tools – I now firmly believe the next generation of WASM tools will do it better.
Sweeping enhancements are going to change it all again. Dynamic loading will hopefully bring us the capability for Ruby gems with C extensions to join the vfs assembly. (But now, as an end user and not a deep systems implementor, this is the part of the conversation where I begin to lose the plot and go out of my depth...)
The Wasm security model is to isolate at the module level. Each module gets its own isolated linear memory, and is prevented from reading another module's memory. A module is not prevented from reading its own memory via a pointer, nor should it be.
> So it is cool if you allow an attacker to change the program inside?
The abstract machine behind WASM is a Harvard architecture, so self-modifying code (even accidentally through attacks) is not possible.
Memory safety is still a problem for languages that aren't memory safe, but since the program memory is not writeable from the program itself you can't take over control through memory safety bugs. Now whether the program interprets its data in a way that can be made exploitable is a different bug, but one that exists outside what WASM (or any virtual machine) can fix - it does mitigate the consequences however, since every module is isolated and the runtime uses capability-based security to prevent the code from accessing any system resources that it doesn't need.
I don’t think wasm supports polymorphic code. I’m pretty sure the code section of a wasm module is not modifiable. It might not even be readable by the module itself.
The point of wasm is to allow C code to run unmodified, safely, and with as little performance degradation as possible. Sandboxing will mitigate a huge percentage of security vulnerabilities in wasm modules but C is still C. If you want better memory safety, use wasm with a memory safe language.
Sorry to be blunt but this is a completely nonsense comment that severely misunderstands how applications, memory, Rust and operating systems work together.
It's actually the hardware that enforces such guarantees. The OS could have bugs, like the WASM program - even if the entire stack is written in Rust from top to bottom.
Really the only solution is memory safe hardware, which there is effort going into, e.g. ARM CHERI[0] and ARM MTE
This is one of the few comments here I actually agree with. Totally agree our current architecture is totally broken. It sounds like ARM MTE is coming soonish so excited to see that.
WASM uses a Harvard architecture, so the instruction memory is separate from data memory. You can't change the program from within the program because you can't write to that address space.
You don’t understand how WebAssembly works. You severely misunderstand how the VM secures itself. Listen to what all the other comments are trying to teach you. There are no security vulnerabilities in your code.
For simplicity, pretend any pointer dereference is a function call with the address as its argument. The VM makes sure it’s within the range of valid values every time. There is no executable code in that range either, only data. Further, function pointers are not real pointers, just an index into a table which again, is checked on any indirect function call.
In practice the memory is secured by allocating a 4 GiB aligned memory range and just masking off the pointer on every access so it’s within that memory. This way there’s very little to no overhead, depending on the architecture.
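Here is a toy Python model of the two ideas in this comment: every load and store goes through a bounds check, and the masking trick forces an address into range. `LinearMemory`, `Trap` and `masked` are names invented for the sketch. Real runtimes do this in generated machine code, and how they do it (explicit checks, guard pages, or masking) varies by runtime, so treat this as an illustration of the principle only.

```python
class Trap(Exception):
    """Raised where a real runtime would trap and stop the guest."""


class LinearMemory:
    """A flat byte array; every access is validated before it happens."""

    def __init__(self, size):
        self.data = bytearray(size)

    def _check(self, addr, width):
        if addr < 0 or addr + width > len(self.data):
            raise Trap(f"out-of-bounds access at {addr}")

    def load8(self, addr):
        self._check(addr, 1)
        return self.data[addr]

    def store8(self, addr, value):
        self._check(addr, 1)
        self.data[addr] = value & 0xFF


def masked(addr, size_pow2):
    """Masking variant: fold any address into [0, size) for a power-of-two size."""
    return addr & (size_pow2 - 1)


if __name__ == "__main__":
    mem = LinearMemory(65536)
    mem.store8(10, 0x2A)
    print(mem.load8(10))
    try:
        mem.load8(65536)  # one past the end
    except Trap as err:
        print("trap:", err)
    print(masked(65536 + 5, 65536))
```

Note the difference between the two: the checked access traps on a bad address, while the masked one silently wraps back into the guest's own memory. Either way the access never lands outside the sandbox.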
The code running in the WASM VM doesn't have any access to the operating system. That printf, when run in a WASM VM, doesn't do what I think you think it does.
If it is 'use rust' wasm actively downgrades the memory safety that rust gives you because many of the security issues that wasm has is a function of both the compiler and the operating system working together. The compiler merely tells the operating system what should be done - it can't really enforce it.
In this case the various wasm runtimes are acting as that piece and they aren't enforcing the expected behavior.
If the compiler says - hey stick this in .rodata and the wasm interpreter says I don't know what that is then that memory safety feature/contract gets broken.
Rust's (and other languages') memory safety doesn't come from hardware features. Implementing memory safety at the language level means that you don't care about whether the data sits in a page mapped as read-only, because the language already makes sure you're not writing to said data.
Besides, that's the safety of the code running inside the VM. Wasm is about the safety of the host.
> If it is 'use rust' wasm actively downgrades the memory safety that rust gives you because many of the security issues that wasm has is a function of both the compiler and the operating system working together.
Can you say more about this? I’m using rust through wasm and I hadn’t heard of any security disadvantages of doing so. What kind of security problems are mitigated in a native binary that are problems in wasm?
Again, it's not really 'native binary' vs wasm. The binary can only really tell the os what it expects it to do. It is more of 'operating system' vs wasm.
By no means an authoritative list and I've not brought up the sandbox at all - that's completely out of scope for this argument but:
* wasm has no aslr - making it incredibly easy to figure out where something lives - you can run the same wasm payload again and again and get the same result, for instance on linux:
* wasm can write directly to 0x0 - meaning I can just initialize variables to all sorts of random bad values (eg: overriding functions that might return an expected value or perhaps I have a user_id from a database or worse an 'admin' flag or something, these are just a few examples - it gets way worse in other languages that depend on this)
* wasm can easily overflow buffers (despite claims to the contrary) - that claim really should be taken down from the website - it is totally false
* doesn't have the concept of read-only memory
These are all issues that linux hasn't really had to deal with for 20? years now.
Many systems don't employ ASLR. There's a long history of very smart people questioning if ASLR is even a good deterrent. This dissent even extends to KASLR. ASLR is also not unbreakable, don't fool yourself about this.
Also, ASLR only works if your code is compiled as position-independent, which isn't always the case. There's also (sometimes) a performance hit with it.
Finally, I don't see how the ASLR argument here is anything but moving the goalposts away from your original argument.
> wasm can write directly to 0x0
Again, this is not a guarantee, either. Nothing is stopping an OS from mapping in something at virtual page 0 either. It's just a specification from C (perhaps before) that's been adopted by a lot of other languages, so much so that OS developers tend not to map anything there as a rule.
On many Microcontrollers for example, 0x0 is usually a perfectly valid memory address, and not allowing something there means you lose at least <required alignment> bytes of usable memory.
> wasm can easily overflow buffers
Not sure what you mean by this.
> that claim really should be taken down from the website
Can you quote the claim? Bonus points for linking to it so we have some context.
> doesn't have the concept of read-only memory
So? Again, as many others have pointed out, neither do OSes unless you can somehow guarantee they're mapping memory in with an MMU or MPU with flags lacking X/W. You're still trusting the OS to do it right, the linker to do it right, and the language to place symbols in the right sections.
> These are all issues that linux hasn't really had to deal with for 20? years now.
> So? Again, as many others have pointed out, neither do OSes unless you can somehow guarantee they're mapping memory in with an MMU or MPU with flags lacking X/W. You're still trusting the OS to do it right, the linker to do it right, and the language to place symbols in the right sections.
If you don't care/understand about the concept of read-only memory I'm not going to be able to convince you of anything. I'll just state that there's a few tens of thousands of people descending on Vegas next week and the longer the wasm community doesn't deal with these issues the louder those people are going to get.
If I understand correctly, the attack vector here is that if a buffer overrun lets you modify a function pointer, you could replace that function pointer with another pointer to have the program execute different code. As you say, this is hard in native linux programs because of ASLR. You need a pointer to some code that's loaded in memory and you need to know where it is.
In wasm, the "pointer" isn't a pointer at all. `call_indirect` takes an index into the jump table. Yes, this makes it easier to find other valid function pointers. But wasm also has some advantages here. Unlike in native code, you can't "call" arbitrary locations in memory. And `call_indirect` is also type-checked at runtime. So you can't call functions with an unexpected type signature. Also (I think) the jump table itself can't be edited by the running wasm module. So there's no way to inject code into the module and run it. You only have access to call the already-loaded functions that are in the jump table. And amongst those, you can only substitute one with a matching type signature.
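A small Python model of that indirect call check, in case it helps. The table contents, the type tuples and the function names are all invented for the sketch; only the shape of the check (index in range, then signature match, then call) is meant to mirror what is described above.

```python
class Trap(Exception):
    """Raised where a real runtime would trap."""


def double(x):
    return 2 * x


def negate(x):
    return -x


def shutdown():
    return "shutdown"


# Signatures as (params, results); the strings are just labels in this sketch.
I32_TO_I32 = (("i32",), ("i32",))
NO_ARGS = ((), ())

# The function table: guest code only ever holds indices into it.
TABLE = [
    (I32_TO_I32, double),
    (I32_TO_I32, negate),
    (NO_ARGS, shutdown),
]


def call_indirect(table, index, expected_type, *args):
    """Call table[index] only if the index is valid and the signature matches."""
    if not 0 <= index < len(table):
        raise Trap("undefined table element")
    actual_type, func = table[index]
    if actual_type != expected_type:
        raise Trap("indirect call type mismatch")
    return func(*args)


if __name__ == "__main__":
    print(call_indirect(TABLE, 0, I32_TO_I32, 21))
    # A corrupted index can still reach another function of the same type...
    print(call_indirect(TABLE, 1, I32_TO_I32, 21))
    # ...but not one with a different signature.
    try:
        call_indirect(TABLE, 2, I32_TO_I32, 21)
    except Trap as err:
        print("trap:", err)
```

So an attacker who controls the index is limited to functions already in the table that share the expected signature, which is the residual attack surface the comment is pointing at.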
I agree - ASLR could help here. But it still doesn't sound super easy to exploit. That said, it might be much easier to exploit this with wasi when the function table is filled with unix syscalls.
You're probably right about this. To be clear, it means if pointers are set to 0 then dereferenced, the program might continue before crashing. And the memory around 0 may be overwritten by an attacker. How bad this is in practice depends on the prevalence of use-after-free bugs (common in C / C++) and what ends up near 0 in memory. In rust, these sorts of software bugs seem incredibly rare. And I wouldn't be surprised if wasm compilers for C/C++ start making a memory deadzone here - if they aren't doing that already.
- wasm can easily overflow buffers
Sure, but so can native C code. And unlike native code, wasm can't overflow buffers outside of the data section. So you can't overwrite methods or modify the memory of any other loaded modules. So on net, wasm is still marginally safer than native code here. If you're worried about buffer overflows, use a safer language.
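To show both halves of that claim, here is a toy Python model: an unchecked copy inside linear memory happily clobbers a neighbouring variable, but the same copy cannot run past the end of the memory. The layout (`BUF_ADDR`, `ADMIN_FLAG_ADDR`, a 64-byte memory) is made up for the example and does not correspond to what any particular compiler emits.

```python
class Trap(Exception):
    """Raised where a real runtime would trap."""


MEMORY_SIZE = 64
BUF_ADDR, BUF_LEN = 0, 8   # an 8-byte buffer at the start of memory
ADMIN_FLAG_ADDR = 8        # a one-byte flag that happens to sit right after it


def unchecked_copy(memory, dest, payload):
    """Like a C memcpy with no length check: only the sandbox boundary stops it."""
    for offset, byte in enumerate(payload):
        addr = dest + offset
        if addr >= len(memory):
            raise Trap("out-of-bounds store")
        memory[addr] = byte


if __name__ == "__main__":
    memory = bytearray(MEMORY_SIZE)
    # Nine bytes into an eight-byte buffer: the flag next door is overwritten.
    unchecked_copy(memory, BUF_ADDR, b"A" * BUF_LEN + b"\x01")
    print("admin flag:", memory[ADMIN_FLAG_ADDR])
    # Writing past the end of linear memory is stopped by the runtime.
    try:
        unchecked_copy(memory, BUF_ADDR, b"B" * (MEMORY_SIZE + 1))
    except Trap as err:
        print("trap:", err)
```

The first write is the in-sandbox data corruption the other commenter is worried about; the second is the part the sandbox actually prevents.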
- wasm doesn't have the concept of read-only memory
Interesting! I can see this definitely being useful for system libraries like mmap. This would definitely be nice to have, and it looks like the wasm authors agree with you.
> If the compiler says - hey stick this in .rodata and the wasm interpreter says I don't know what that is then that memory safety feature/contract gets broken.
In WASM terms this is called "multiple memories" and it's up to compilers to generate code that works as you'd expect (eg have a "read only data" memory segment and compile the code that references it correctly).
> The compiler merely tells the operating system what should be done - it can't really enforce it.
I really don't know how to parse this statement. `.rodata` is not a guarantee, you can almost always change the memory protection of pages your program owns, on almost all OSes (including Windows).
Rust also doesn't make guarantees about this stuff, either. `.rodata` is more borne out of the need to put certain data in certain areas of flash memory depending on the platform. It's also a feature of ELF, not Rust. It's also not even required. The flags specify how the memory is used by the application, not how it's meant to be mapped (though usually the ELF loader will map it exactly how it's specified to be used).
I'm writing an OS. One of the stage 1 bootloader tasks is to load the main kernel as a module. It's in ELF format. I can choose how the ELF segments are mapped into memory, and as long as they're mapped 1) in the right places and 2) with at least the access flags they're specified to use, the program will work correctly.
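The "at least the access flags they're specified to use" condition is just a subset test on permission bits. A minimal Python sketch, where `SegmentFlags` and `mapping_is_sufficient` are names made up here (the numeric values are the usual ELF `PF_X`, `PF_W` and `PF_R` bits):

```python
from enum import IntFlag


class SegmentFlags(IntFlag):
    # Values match the ELF program header flags PF_X, PF_W and PF_R.
    X = 0x1
    W = 0x2
    R = 0x4


def mapping_is_sufficient(requested, mapped):
    """True when every permission the segment asks for is present in the mapping."""
    return (requested & mapped) == requested


if __name__ == "__main__":
    rodata = SegmentFlags.R
    text = SegmentFlags.R | SegmentFlags.X
    # Mapping read-only data as read-write still satisfies the segment...
    print(mapping_is_sufficient(rodata, SegmentFlags.R | SegmentFlags.W))
    # ...while mapping code without the execute bit does not.
    print(mapping_is_sufficient(text, SegmentFlags.R))
```

Nothing in that check stops a loader from granting more than was asked for, which is the point being made about `.rodata` not being a guarantee.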
Further, with Rust in particular, unless a safety contract has been violated within an `unsafe{}` block, I am guaranteed as a contract of the compiler (barring compiler bugs) that anything marked as `const` and thus most likely put into a RO segment will not be written to by the application. That is not a guarantee C can make, and thus not anything an OS can rely on, anyway.
Again, I don't really see your point. I think you're conflating why operating systems have memory protection flags. They're there to help, not to guarantee. You can only guarantee so much, and if you're writing code like what you have in your C example, then the guarantees fly out the window.
I don't see how it's WASM's job to check memory read/write correctness for you. The _whole point_ is to make it as fast as possible whilst being as much of a sandbox as possible - "sandbox" here meaning your application should operate exactly how the language (not WASM) specifies it to work, whereas the environment is locked down to allow for only certain operations and accesses to resources (which has nothing to do with language specifications).
Thus, unless I'm completely missing your point, I disagree with your suggestion that WASM is doing something inherently wrong here.
---
To answer your question directly, barring using something like Rust (which I would recommend people do anyway as it's just a good language regardless of its safety guarantees, but that's a personal and subjective opinion), don't incur undefined behavior in C, and don't rely on the underlying system, if any, to protect you against UB.
Nope, that's just UB, so you cannot know what the code will do. Case in point, running it on linux (compiled via clang) doesn't trigger a segfault for me:
To be fair, that's because it was optimized out. Mark it as volatile and try again. It will almost definitely segfault. The other points still stand, though - that's just because Linux is forced to respond to a page fault exception and chooses to signal that to the process.
What is the security issue here? In any case memory references don't work like that in web assembly: they are objects the runtime manages. This isn't assembly, despite the name. Your pointer UB will be inside the sandbox and nowhere else.
This isn't about escaping the vm. It is about changing how the program works inside. Wasm also does not have the concept of read only memory which is another massive security hole.
"even if" you can change the program (and like some other comments indicated, I'm fairly certain that's not how linear memory in WASM works), WASM still runs on a "capability model": It has access to only certain file handles, sockets, etc. that are provided declaratively.
I was once a huge detractor and naysayer of WASM, but now I'll readily admit that this is the way security "always should have been done".
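For anyone unfamiliar with the capability idea, here is a deliberately simplified Python sketch: the guest is handed a list of directories up front and every path request is checked against that list. `Preopens`, `CapabilityError` and `resolve` are hypothetical names, and the string-prefix check is an assumption made to keep the example short; actual WASI runtimes hand out directory handles and resolve paths relative to them rather than comparing strings.

```python
import posixpath


class CapabilityError(Exception):
    """Raised when the guest asks for a path it was never granted."""


class Preopens:
    """The set of directories the host chose to grant to one guest."""

    def __init__(self, granted_dirs):
        self.granted = [posixpath.normpath(d) for d in granted_dirs]

    def resolve(self, path):
        """Return the normalized path if a granted directory covers it."""
        resolved = posixpath.normpath(path)
        for root in self.granted:
            prefix = root.rstrip("/") + "/"
            if resolved == root or resolved.startswith(prefix):
                return resolved
        raise CapabilityError(f"no capability covers {path!r}")


if __name__ == "__main__":
    caps = Preopens(["/data"])
    print(caps.resolve("/data/report.txt"))
    for attempt in ("/etc/passwd", "/data/../etc/passwd"):
        try:
            caps.resolve(attempt)
        except CapabilityError as err:
            print("denied:", err)
```

Whatever happens inside the module, including a compromised one, the set of reachable resources stays limited to what was granted when it was started.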
In this case we aren't talking about changing some code in a .js file through an RFI attack. We are talking about changing the behavior of how the node.js interpreter would run the .js file, because at the end of the day the interpreter itself is what is getting run by the OS.
No, you would be changing how the wasm module operates in its own address space. You aren't changing how the VM runs. As far as I know wasm never made any promise against self-modifying code and I don't even understand the threat model you think exists since everything is isolated.
If you have specific security concerns, you should be showing actual attacks against wasm runtimes or somehow show that the security model of wasm as a whole will always reduce to an insecure configuration. What you have shown is something that is "exploitable" on pretty much every architecture I know of.
I really don't know how to parse this comment but I'll try to be charitable.
How do you propose WASM code is able to change the WASM interpreter behavior?
Could you be misunderstanding WASM's memory model? Are you aware that WASM code can only access explicitly instantiated userspace JS linear memories? Are you aware these memories are just ArrayBuffers in JS userspace?
Again, apologies for the bluntness, but the people who have linked those very links as responses to your ranting are pointing out that you are incorrect about a lot of things.
https://github.com/WebAssembly/wasi-sdk
It comes with a libc implemented in WASM.