I’ve been writing software in C since 1991. While that was shortly after the ratification of the C89/90 standard, it took at least a few years for most compilers to catch up. So my formative years in C were spent developing code for compilers that weren’t yet standards conformant. Certain features were hit-and-miss. Since much of my development work involves C, I try to keep up with new developments. I freely admit that I don’t know it all, and every few years someone teaches me a new trick or hack that makes life just a little bit easier. Recently, I found a major gap in my understanding of one important aspect of the language.
In this article, I hope to explain some semantics in the C language that don’t seem to be widely understood. Surely, some of you will know this information, but in seeking an explanation for the issue I was seeing, I could not find a single explanation or example that would suggest these rules until I dug deep into the standard.
For quite some time now, I’ve been engaged in a project involving modifying a very large open-source code-base for a tool-suite targeting developers of high-assurance software. The code quality has been very impressive, and much of my work so far has been porting, reviewing, and documenting the code. A few days ago, I started on a new section of the suite and became very concerned about the defects I began to see. In fact, I was seeing the same two defects over and over. Oddly, this code seemed to work fine. At first I thought that the developers must have been relying on some kind of implementation-defined behavior; then I realized that this code had been successfully used on at least 3 different compilers, and had been proven on more than a half-dozen platforms. Clearly there was something to learn here.
The “issue” I had been seeing was that functions were directly returning typedef’d structs, and the return value was being used to initialize equivalent structures in the calling functions. This went against what I thought I knew about the legalities of working with structs in C. I scoured the web looking for examples or tutorials that would explain it, but I found none. Assembly language listings showed that correct code was being generated, though it was less efficient than I would have expected. Stymied, I dug into my trusty copy of ISO/IEC 9899:1999. The standard gave no comparable examples either, but lack of an example does not make it illegal, and with a bit more digging I was able to figure out what was going on, and why it was in fact legal and well-formed C source code.
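To make the pattern concrete, here is a minimal sketch of the kind of code I kept running into. The type `point_t` and the functions `make_point` and `sum_of_point` are hypothetical names of my own, not taken from the project; the point is only the shape: a function returns a typedef’d struct by value, and the caller initializes a local struct directly from that return value.

```c
/* Hypothetical typedef'd struct, for illustration only. */
typedef struct {
    int x;
    int y;
} point_t;

/* Returns a whole struct by value; the caller receives a copy. */
static point_t make_point(int x, int y)
{
    point_t p = { x, y };
    return p;
}

/* The caller initializes a local (automatic) struct directly from the
   function's return value. This is the construct that surprised me. */
static int sum_of_point(int x, int y)
{
    point_t q = make_point(x, y);
    return q.x + q.y;
}
```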
The information I sought was scattered among several sections. Rather than quoting these diverse bits and trying to build a case, I will simply provide my interpretation with relevant examples.
Simple Rules for structs
- Structs can always be initialized by a brace-enclosed initializer list:
struct a {int x, y, z; char a[3]; char b;} foo = { 6, 7, 8, "jk", 'L' };
- Structs may have their elements selectively or partially initialized using dot notation (C99 designated initializers); any members not named are initialized to zero:
struct a gee = { .y = 5, .a = "hi" };
- Within functions, structs can always be assigned to structs of the same (compatible) type:
void fnA( void ) { struct a bar; bar = foo; /* ... do more here ... */ }
- For auto-storage (stack-based) structs and unions, initialization may be performed by any expression of a compatible struct or union type (i.e. initialization by copy):
void fnB( void ) { struct a snaf = foo; /* ... do more here ... */ }
- Structs used as function parameters and returns are passed by value (i.e. copy semantics are used):
struct a myfunc( struct a arg1 ) { return arg1; }
int main( int argc, char** argv ) {
    struct a golly = myfunc( gee );
    /* ... do more here ... */
    return 0;
}
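The rules above can be collected into one small translation unit. This is a sketch assuming a C99 compiler; `check_struct_rules` is a hypothetical name of mine, and it simply reports whether each copy behaves the way the rules describe.

```c
#include <string.h>

/* Same layout as struct a in the examples above. */
struct a { int x, y, z; char a[3]; char b; };

/* File-scope objects have static storage duration, so their initializers
   must be constant expressions: a brace-enclosed list is fine. */
static struct a foo = { 6, 7, 8, "jk", 'L' };

/* Designated initializers: x, z and b are not named, so they are zero. */
static struct a gee = { .y = 5, .a = "hi" };

/* static struct a copy_of_foo = foo;
   Not allowed: foo is not a constant expression, so initialization by
   copy is only available to automatic (stack-based) objects. */

static struct a myfunc(struct a arg1)
{
    return arg1;                    /* returned by value (copied out) */
}

/* Returns 1 if every rule behaves as described, 0 otherwise. */
static int check_struct_rules(void)
{
    struct a bar;
    bar = foo;                      /* assignment copies every member */

    struct a snaf = foo;            /* automatic object initialized by copy */
    struct a golly = myfunc(gee);   /* initialized from a return value */

    snaf.x = 100;                   /* changing the copy leaves foo alone */

    return bar.x == 6 && bar.b == 'L' && strcmp(bar.a, "jk") == 0
        && foo.x == 6 && snaf.x == 100
        && golly.y == 5 && golly.x == 0 && golly.b == 0
        && strcmp(golly.a, "hi") == 0;
}
```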
It should be noted that the last example can be quite inefficient: disregarding any possible optimizations, gee will first be copied to the parameter, then to the result, and then to golly.
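Because each step is a genuine copy, the callee can never reach the caller’s object through a by-value parameter. A minimal sketch, using a hypothetical `bump` function that modifies its parameter and returns it; `caller_unchanged` is likewise a hypothetical name that just checks the result.

```c
/* Same layout as struct a in the article. */
struct a { int x, y, z; char a[3]; char b; };

/* Hypothetical helper: takes a struct by value and returns one by value. */
static struct a bump(struct a arg)
{
    arg.x += 1;   /* modifies only the callee's copy of the argument */
    return arg;   /* the modified copy is copied out as the result */
}

/* Returns 1 if the caller's object was left untouched by bump(). */
static int caller_unchanged(void)
{
    struct a orig = { 1, 2, 3, "ab", 'c' };
    struct a result = bump(orig);   /* orig -> arg -> return value -> result */
    return orig.x == 1 && result.x == 2;
}
```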
Though I am now sure that the code I was seeing is legal, given the inefficiencies of copying structures and the large number of structs being set, I believe the originators of the application I'm dealing with could likely have improved performance significantly by using a pass-by-reference (pointer) strategy. While I have worked with structures in countless programs, I tend not to do direct struct-to-struct assignments, preferring to use memcpy or, for small stable (unlikely to change) structures, direct member-to-member copies.
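For comparison, a pointer-based version of the same idea might look like the sketch below. The function name `init_from` is hypothetical; the caller supplies the destination, and the copy is done once with memcpy, which is the style I tend to prefer.

```c
#include <string.h>

/* Same layout as struct a in the article. */
struct a { int x, y, z; char a[3]; char b; };

/* Hypothetical pointer-based initializer: the caller owns the destination,
   so nothing is passed or returned by value. */
static void init_from(struct a *dst, const struct a *src)
{
    /* One copy; yields the same member values as *dst = *src. */
    memcpy(dst, src, sizeof *dst);
}
```

Whether this actually beats returning by value depends on the compiler and the ABI; many ABIs already return large structs through a hidden pointer supplied by the caller, so it is worth checking the assembly listings before rewriting working code.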
It's been more than 20 years since the first ANSI C standard made struct-to-struct copying a standard part of C. It seems that many of the above implications of this change have gone unnoticed by most C developers. Perhaps I'm unique, and the information in this article simply fills a bizarre gap in my education; but the lack of information in books and on the internet leads me to believe that I'm not alone.
