This is documented as not working[1] in all but the most recent GCC versions. E.g. gcc-5.4 documents:
> "The aligned attribute can only increase the alignment; but you can decrease it by specifying packed as well. See below."
but gcc-6.2 documentation adds:
> "When used as part of a typedef, the aligned attribute can both increase and decrease alignment, and specifying the packed attribute generates a warning."
FWIW, Clang has supported reducing alignment in this fashion for a few years now.
[1] it's possible (likely, even) that it has been working for a while but the documentation was only recently brought up to date; I haven't investigated too carefully.
Must have been supported for a while because this stuff is even used in Linux:
#define __packed2__ __attribute__((packed, aligned(2)))
/*
* SystemV FS comes in two variants:
* sysv2: System V Release 2 (e.g. Microport), structure elements aligned(2).
* sysv4: System V Release 4 (e.g. Consensys), structure elements aligned(4).
*/
struct sysv2_super_block {
__fs16 s_isize; /* index of first data zone */
__fs32 s_fsize __packed2__; /* total number of zones of this fs */
Note the use of `packed`. `packed` causes the alignment to be set as small as possible (1 byte). The aligned(2) that follows then increases the alignment from 1 byte to 2 bytes.
If the strictest (largest) alignas on a declaration is weaker than the alignment it would have without any alignas specifiers (that is, weaker than its natural alignment or weaker than alignas on another declaration of the same object or type), the program is ill-formed.
Also, using alignas on pointer variable will probably specify alignment of the pointer itself, not the pointed object. I suppose you can create a wrapper class around int with alignas(32) and specify the pointer as pointing to that, but this is going to be nasty given that you can't derive from int and have to write all those operators by hand.
Not sure if this is what you are asking: Last time I tried alignment in Rust I worked around the lack of explicit alignment support by adding a zero length array of the correct size to the end of the struct. Not sure if alignment support from proper attributes has landed yet.
While this is true, it's a weird angle that's not itself super well defined. That is, the definition of repr(packed) implies that the alignment will always be there without it...
The final element of the struct is a zero length array of elements of size N bytes. So that element isn't padding, it has size 0! It's pure hinting. I'm not sure why or how this works under the hood I'm afraid. I used it successfully to call a library with pretty strict alignment requirements (Intel Embree).
The zero length array still must be aligned correctly. You can create a pointer to it (e.g. by taking &something._alignment[..] to create a slice) and pointers must have correct alignment, so it follows that a zero length array has the same alignment requirements as a longer array. So padding must be inserted in your struct so that the address of the zero-length array is correctly aligned.
Hmm, I don't understand (sorry). The pointer to the internal array must be aligned. But how does the element size play into the padding. If we take these two examples?
Types in Rust have alignment as in C, and bad stuff may happen if you use unsafe code to violate alignment.
It seems that at present, if you want to generate an unaligned SSE2 load/store, you have to rely on modern memcpy (spelled copy_nonoverlapping in Rust) optimization. See the first reply to
https://internals.rust-lang.org/t/unaligned-simd-sse2-in-par...
Nitpick: memcpy is string.h not stdlib.h, the type was uint32_t not uint64_t and you are making some unwarranted assumptions about sizeof(uint64_t), not to mention that the existence of this type is merely implementation defined ;)
Deal breaker: your memcpy invocation requires a sufficiently smart compiler to convert into normal unaligned load on x86 and seems to prevent GCC autovectorization. In this case OP actually didn't want vectorization, but in general it happens that such workarounds confuse compilers and produce worse code.
I'm not sure I understand your deal breaker. For the platform he was targeting it produces optimal code, for other platforms it's merely slower (but not specifically slower, since the compiler is likely not a great optimizer across the board).
Vectorization is in general not applicable here since it usually requires aligned memory... not all implementations do, but most. In any case, benchmarking is more appropriate than armchair optimizing.
You are writing convoluted code and hoping that your compiler will figure it out and convert it internally to the form I posted. Sometimes it does, sometimes it doesn't. In this case it generates reasonable code but doesn't vectorize it for some reason. WTF.
I prefer to just add alignment specification and move on, assuming I don't care about portability. If portability matters, reread my original post ;)
It's not convoluted. It's actually clear and well-defined making it easier to reason about.
I'd call compiler specific alignment attributes more arcane, convoluted, and susceptible to future bugs.
Vectorization isn't a panacea. You need to benchmark to be sure, lacking that I expect GCC to be better at optimizing code than you. If you disagree, please manually write a vectorized one that handles non-aligned addition and post your results :)
You also need the may_alias attribute to prevent other problems.
Data written as int may be read as char, but going the other way is usually a standards violation. (an exception being if you had used char to implement a memcpy-like function, but in that case you should expect compiler bugs to bite you)
Implementing `memcpy` can be a challenge of its own. Gcc has a tendency to replace memcpy implementations with a call to... memcpy. Which is quite smart, just not in that particular context...
Interesting, but is this a real world problem or just a standard nobody follows? I'm under impression that lots of networking and storage code running in the wild casts back and forth between char arrays and other types without giving it a thought.
And yes, as for other casts, I'm well aware that they cause problems.
Specifying alignment might make the crash go away. Like the "disable SSE" solution of the post, it does not make the undefined behavior go away and is not actually a solution.
If certain compilers provide functionality like __aligned__ or __packed__ it isn't undefined anymore on these particular compilers. I was specific about this being a GCC/clang thing (GCC to be exact, but somebody else confirmed clang).
__aligned__/__packed__ allow you to control alignment and packing, useful for e.g. avoiding false sharing in multithreaded code, and potentially useful if you're doing some of the type aliasing allowed by the standard - namely, type punning to/from char.
Unless __aligned__ or __packed__ is documented as also relaxing the strict aliasing rules - if such documentation exists, I haven't found it - there's still undefined behavior from the strict aliasing violation by type punning size_t <-> uint32_t.
Bad alignment is not the only possible "bad" optimization compilers can apply that relies on this being undefined behavior. For example, they may mistakenly assume a size_t value in memory can be cached in a register, and not reloaded from memory when it calls code that uses only uint32_t* pointers, because in a standards compliant program these cannot be used to modify size_t values.
Have you actually tried and verified that? Reading the docs I've had an impression that it says that the "aligned" is for the alignment inside of the structures, and you here declare the simple type?
I wouldn't have posted this without checking. And as far as documentation goes, they say this attribute applies to "types" and the example they give is
struct S { short f[3]; } __attribute__ ((aligned (8)));
where it actually is used outside of a struct so that this whole 6B thing is aligned to 8B instead of the default 2B.
movdqu, obviously. And the code is shorter now, I guess it drops this part responsible for processing the first 1~3 elements of the array to reach 16B alignment.
Which is a feature! If you use `char` instead of `uint8_t` your program would still compile on a system that doesn't have `uint8_t`, but it is likely to do something entirely unexpected. At least when you use `uint8_t` you are warned at compile-time that your program is broken.
`uint8_t` cannot exist on any platform with `CHAR_BIT > 8`. Such platforms are non-existent in the mainstream CPU world, but surprisingly common in the DSP world.
And yes, it's a compile-time failure, which is great. My comment should not be read as criticism at all (though I would likely use `uint16_t`, as the OP says the code is intended to work with [presumably aligned] 16-bit words).
And I'm not sure how exactly this code will fail on architectures which don't support unaligned uint32_t.