A note on strcpy

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham

Re: A note on strcpy

Post by hyatt » Wed Nov 27, 2013 1:28 am

User923005 wrote:
hyatt wrote: The behavior in Mavericks is NOT "undefined". You get a message that says "Abort" with no explanation as to why.
That is a perfectly good example of undefined behavior.
And the C language standard specifically says that the above exact behavior is completely acceptable and even lists it as one likely outcome.

Any time that code with undefined behavior works, it works by accident.
Now, it is possible to have a special implementation that gives implementation defined behavior for something that the standard says is undefined behavior.
An example of this is reading a hardware I/O port from a specific hardware address. Of course, this may work on a particular compiler and OS, but it is obviously not portable to other systems.

As a suggestion, go to the C language instructor at UAB and ask him what "undefined behavior" means in the C programming language. After his excellent description, I am quite sure that you will understand the situation better.

For the record, you are TALKING to "the C language instructor". I perfectly understand what "undefined" means. Sometimes it seems as though I am discussing this with a non-programmer. The point is, and always has been, Apple changed the behavior for absolutely NO good reason. You really do copy left-to-right with strcpy() because strings are defined left-to-right with a terminating null. This might catch one bug out of 1,000 that strcpy() misuse can produce. But at the same time, it breaks MANY programs, and in addition, slows down ALL programs that use strcpy() because the test slows things down. A pointless exercise. Seems a lot of Mac users agree that this was stupid.

User923005
Posts: 616
Joined: Thu May 19, 2011 1:35 am

Re: A note on strcpy

Post by User923005 » Wed Nov 27, 2013 1:29 am

hyatt wrote:
User923005 wrote: {snip}
Who is talking about memcpy? With memcpy you KNOW how many bytes you are to copy. Not so with strcpy(). Totally unrelated.
The memcpy and strcpy functions have exactly the same warning about overlap, for obvious reasons.
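A minimal sketch of why the two warnings match: for non-overlapping buffers, strcpy(dst, src) has the same effect as a memcpy() of strlen(src) + 1 bytes, so it carries the same overlap restriction. The name strcpy_via_memcpy is hypothetical and used only for illustration; real library implementations are usually more elaborate.

```c
#include <string.h>

/* Hypothetical helper: same effect as strcpy(dst, src) when the two
 * regions do NOT overlap. The copy length includes the terminating
 * '\0', which is why strlen(src) + 1 bytes are copied. */
char *strcpy_via_memcpy(char *dst, const char *src)
{
    size_t n = strlen(src) + 1;   /* bytes to copy, terminator included */
    memcpy(dst, src, n);          /* undefined if the two n-byte ranges overlap */
    return dst;                   /* strcpy() also returns dst */
}
```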

hyatt

Re: A note on strcpy

Post by hyatt » Wed Nov 27, 2013 1:30 am

User923005 wrote:Compile the following program on your favorite compiler:

#include <string.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
    char b[32];
    strcpy(b, "123456789012345");
    strcpy(b + 1, b);
    printf("[%s]\n", b);
    return 0;
}

For Microsoft Visual C++ in 64-bit mode, it exits with a dump and gives the following error message:
Unhandled exception at 0x000007F7AE6013C0 in bozo.exe: Stack cookie instrumentation code detected a stack-based buffer overrun.

In 32-bit mode, it exits with a core dump and gives the following error message:
Unhandled exception at 0x009A1035 in bozo.exe: 0xC0000005: Access violation writing location 0x00200000.


GCC gave me this:

dcorbit@dcorbit /q/cc
$ cat bozo.c
#include <string.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
    char b[32];
    strcpy(b, "123456789012345");
    strcpy(b + 1, b);
    printf("[%s]\n", b);
    return 0;
}

dcorbit@dcorbit /q/cc
$ gcc -Wall -ansi -pedantic bozo.c

dcorbit@dcorbit /q/cc
$ ./a
[1123456788012345]

Look at it carefully. Is it what you expected?
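For comparison, if the goal of strcpy(b + 1, b) were to shift the string right by one character, memmove() is the standard function defined for overlapping source and destination. A minimal sketch; the helper name shift_right_one is hypothetical, and it assumes the buffer has room for one more character.

```c
#include <string.h>

/* Hypothetical helper: shift the NUL-terminated string in buf right by
 * one position, duplicating its first character. memmove() is defined
 * for overlapping regions. Assumes buf has room for strlen(buf) + 2 bytes. */
void shift_right_one(char *buf)
{
    size_t n = strlen(buf) + 1;   /* string plus terminator */
    memmove(buf + 1, buf, n);     /* overlap is allowed here */
}
```

With the same starting string as the test program, this yields [1123456789012345] on a conforming implementation.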

Yes, but I am not doing that. I am doing this:

strcpy(str, str+3);

That will NOT produce an addressing exception or anything else. If you want to use an example, use one that is relevant. I've already pointed out that doing what you are testing is most definitely a bug due to overwriting. What I do does NOT produce that behavior, and cannot possibly do so, either.
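If the intent is the left-to-right copy described here, one option that does not depend on how a particular library implements strcpy() is to write that loop explicitly; an explicit byte-by-byte loop is well defined when the destination starts at or before the source in the same array. A minimal sketch, with the hypothetical name copy_down:

```c
/* Hypothetical helper: copy the string at src to dst one byte at a time,
 * left to right. Unlike strcpy(), this loop has defined behavior when
 * dst <= src within the same array, because each byte is read before
 * the loop can overwrite it. */
char *copy_down(char *dst, const char *src)
{
    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;                         /* the terminating '\0' is copied too */
    return dst;
}
```

copy_down(str, str + 3) then behaves the way strcpy(str, str + 3) is assumed to behave here; memmove(str, str + 3, strlen(str + 3) + 1) is the standard-library equivalent.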

hyatt

Re: A note on strcpy

Post by hyatt » Wed Nov 27, 2013 1:32 am

User923005 wrote:
hyatt wrote:
User923005 wrote: {snip}
Who is talking about memcpy? With memcpy you KNOW how many bytes you are to copy. Not so with strcpy(). Totally unrelated.
The memcpy and strcpy functions have exactly the same warning about overlap, for obvious reasons.

So? They don't do the same thing. They are not passed the same arguments. They are completely unrelated. memcpy() does not even use strings, period, just copies memory to memory.

User923005

Re: A note on strcpy

Post by User923005 » Wed Nov 27, 2013 1:39 am

hyatt wrote:
User923005 wrote:
hyatt wrote: For the record, you are TALKING to "the C language instructor". I perfectly understand what "undefined" means. Sometimes it seems as though I am discussing this with a non-programmer. The point is, and always has been, Apple changed the behavior for absolutely NO good reason. You really do copy left-to-right with strcpy() because strings are defined left-to-right with a terminating null. This might catch one bug out of 1,000 that strcpy() misuse can produce. But at the same time, it breaks MANY programs, and in addition, slows down ALL programs that use strcpy() because the test slows things down. A pointless exercise. Seems a lot of Mac users agree that this was stupid.
Apple made a change that helps people find their bugs.
You should send them a thank you note.
Everyone who does not like the change needs to ask themselves, "Why don't I like it when people try to help me?"

Your code is broken.
You clearly do not understand what undefined behavior means or you would not have written your code that way.

User923005

Re: A note on strcpy

Post by User923005 » Wed Nov 27, 2013 1:46 am

hyatt wrote:
User923005 wrote:
hyatt wrote:
User923005 wrote: {snip}
Who is talking about memcpy? With memcpy you KNOW how many bytes you are to copy. Not so with strcpy(). Totally unrelated.
The memcpy and strcpy functions have exactly the same warning about overlap, for obvious reasons.

So? They don't do the same thing. They are not passed the same arguments. They are completely unrelated. memcpy() does not even use strings, period, just copies memory to memory.
Read:
https://sourceware.org/bugzilla/show_bug.cgi?id=16004
Note that it says:
Same goes for strcpy. Sometimes people do silly things, like
strcpy (some_str, some_str + strlen ("pref"));
and get data corruption.
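The pattern from that bug report can be written with defined behavior by checking the prefix and shifting the remainder with memmove(). A minimal sketch; strip_prefix is a hypothetical helper, not a glibc function.

```c
#include <string.h>

/* Hypothetical helper: if str begins with prefix, remove it in place and
 * return 1; otherwise leave str unchanged and return 0. memmove() is used
 * because the source (str + len) and destination (str) overlap. */
int strip_prefix(char *str, const char *prefix)
{
    size_t len = strlen(prefix);
    if (strncmp(str, prefix, len) != 0)
        return 0;                                   /* prefix absent */
    memmove(str, str + len, strlen(str + len) + 1); /* shift tail and '\0' left */
    return 1;
}
```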

hyatt

Re: A note on strcpy

Post by hyatt » Wed Nov 27, 2013 6:21 pm

User923005 wrote:
hyatt wrote:
User923005 wrote:
hyatt wrote: For the record, you are TALKING to "the C language instructor". I perfectly understand what "undefined" means. Sometimes it seems as though I am discussing this with a non-programmer. The point is, and always has been, Apple changed the behavior for absolutely NO good reason. You really do copy left-to-right with strcpy() because strings are defined left-to-right with a terminating null. This might catch one bug out of 1,000 that strcpy() misuse can produce. But at the same time, it breaks MANY programs, and in addition, slows down ALL programs that use strcpy() because the test slows things down. A pointless exercise. Seems a lot of Mac users agree that this was stupid.
Apple made a change that helps people find their bugs.
You should send them a thank you note.
Everyone who does not like the change needs to ask themselves, "Why don't I like it when people try to help me?"

Your code is broken.
You clearly do not understand what undefined behavior means or you would not have written your code that way.

I clearly DO understand what "undefined behavior" means. I, unlike yourself, apparently ALSO understand WHY overlapping source and destination buffers are warned against. I, unlike yourself, am perfectly capable of avoiding that particular pitfall, which lets me use strcpy() in a way that will absolutely NOT fail.

As far as "Apple helping". Let me tell you a short story:

3-4 weeks ago, after releasing the most recent version of Crafty, I got an email stating, "Bob, I can't seem to create a book with Crafty. It crashes when I type 'book create pgnfile etc...'" I had made a few minor cosmetic changes here and there and thought, "OK, something I changed had an unexpected side-effect." I started looking at the diff output for 23.7 vs 23.6 and went through the changes line by line. I couldn't see a thing that would cause this. I decided to test on my office box and used the "enormous.pgn" file as a test, since it has the infamous 17-level-deep comments in one game that are a parsing nightmare. No problems. So I assumed the report was caused by some really oddball PGN game and asked him to send it to me. I tried it on my office box, and it worked perfectly. We started comparing notes on compilers. He was using clang on a MacBook. Aha, I thought: I know clang had some bugs when I got my MacBook last year, which caused me to install gcc 4.7.2. I looked around the department until I found someone who had not yet updated to Mavericks (I did not yet know this was a Mavericks issue). I tried his MacBook, compiled, parsed the PGN, and it worked perfectly. I asked for more details. He said, "I'm running Mavericks" (although the clang version was the same as the one I had tested).

Next I took my MacBook and tried to parse his PGN. Abort. I began to suspect a compiler bug. I had already installed gcc 4.7.3 on my MacBook after installing Mavericks, so I tried that. It failed also. Now it began to seem that something was going on with Mavericks. After several hours of testing and debugging, I isolated it to the strcpy() I originally posted about. Several days and quite a few hours wasted, just to debug a nonsensical change that Apple decided was a good idea, a change that NOBODY else has made or even considered making, since it not only breaks many existing programs but also costs performance.

So "I should thank Apple?" I think not. They caused me to waste a lot of time tracing something down that did not need to be broken in the first place.

You can harp on undefined behavior all you want. If you stop mechanically repeating the same thing and look at the REASON the term "undefined behavior" was applied in the first place, it was ONLY for the case strcpy(a+n, a), where n is positive. That is a problem for obvious reasons: it will almost always break something, because the copy never reaches a valid NUL string terminator; it keeps overwriting the terminator before reaching it. Clearly bad. For strcpy(a, a+n) with positive n, there is absolutely no problem of any kind.
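The failure mode described here for strcpy(a + n, a) can be modeled with a naive byte-at-a-time forward copy. This is an illustration only: naive_shift_right and its cap parameter are hypothetical, the cap keeps the simulation inside the array, and the caller must supply a buffer of at least cap + n bytes. Real strcpy() implementations may copy several bytes at a time, so their garbled output can differ from this model (compare the GCC output quoted earlier in the thread).

```c
#include <stddef.h>

/* Illustration only: naive left-to-right byte copy from buf to buf + n
 * (n > 0), capped at cap bytes so it cannot run past the array. Each
 * write lands n bytes ahead of the read position, so source bytes,
 * including the '\0', are overwritten before they are read. Returns the
 * index at which a '\0' was copied, or cap if none was reached. */
size_t naive_shift_right(char *buf, size_t n, size_t cap)
{
    size_t i;
    for (i = 0; i < cap; i++) {
        char c = buf[i];          /* read the (possibly overwritten) source byte */
        buf[i + n] = c;           /* write n bytes further on */
        if (c == '\0')
            return i;             /* terminator actually reached */
    }
    return cap;
}
```

Starting from "12345" with n = 1, every copied byte is a '1' and the '\0' is never reached before the cap.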

So, a lot of wasted time, for absolutely no gain in reliability, and for a definite loss in performance. I've programmed long enough that I prefer to be allowed to do whatever I want, and if I pay for it somewhere along the way, fine. I presume you are one of those who prefer Pascal or Java, since they don't have those damned pointers that get everyone into trouble if they do something wrong. Or do you like that Pascal doesn't let you recast something to a new type? Hmm... Even the Pascal guys realized that sometimes tricks are good, and they gave us the ability to do what in C is known as a union, something that is KNOWN to be dangerous unless you know what you are doing.

Let the inexperienced use Java or Pascal. Let those that know what they are doing use C/C++. The compiler should NOT "get in the way" which it definitely has here.

Continue to preach about undefined behavior all you want. Some of us are intelligent enough to use whatever tools we have, even if they are dangerous when used incorrectly. Shoot, I even write in assembly. Something I assume you would want to see outlawed, or do you want the Intel CPU to make sure that the esi/edi registers can't overlap on the string instructions as well???

hyatt

Re: A note on strcpy

Post by hyatt » Wed Nov 27, 2013 6:24 pm

User923005 wrote:
hyatt wrote:
User923005 wrote:
hyatt wrote:
User923005 wrote: {snip}
Who is talking about memcpy? With memcpy you KNOW how many bytes you are to copy. Not so with strcpy(). Totally unrelated.
The memcpy and strcpy functions have exactly the same warning about overlap, for obvious reasons.

So? They don't do the same thing. They are not passed the same arguments. They are completely unrelated. memcpy() does not even use strings, period, just copies memory to memory.
Read:
https://sourceware.org/bugzilla/show_bug.cgi?id=16004
Note that it says:
Same goes for strcpy. Sometimes people do silly things, like
strcpy (some_str, some_str + strlen ("pref"));
and get data corruption.

You can get data corruption without overlapping strings.

char buff[256];

strcpy(buff, parsed); can wreck the world if the string in parsed has more than 255 non-null characters. Nothing overlapping. Stupid people can do stupid things with anything; there is nothing you can do to stop it. The Pascal authors might applaud your stance. Experienced programmers would not.
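One way to guard against the overflow described here is to check the length before copying. A minimal sketch; copy_checked is a hypothetical name, and it assumes the source and destination do not overlap.

```c
#include <string.h>

/* Hypothetical helper: copy src into a buffer of dstsize bytes without
 * writing past its end. Returns 0 on success, or -1 (leaving dst as an
 * empty string when dstsize > 0) if src plus its terminator does not fit. */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    size_t n = strlen(src);
    if (dstsize == 0)
        return -1;
    if (n >= dstsize) {           /* n characters need n + 1 bytes */
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, n + 1);      /* assumes dst and src do not overlap */
    return 0;
}
```

snprintf(buff, sizeof buff, "%s", parsed) is another standard option; it truncates instead of reporting failure.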

User923005

Re: A note on strcpy

Post by User923005 » Wed Nov 27, 2013 9:02 pm

If I wrote code that relied on undefined behavior intentionally, I would be fired for it, and I would deserve it.
The manual says, "Don't do this."
You did it, it did something you did not like, and claimed that the tool set is stupid.
I don't know what else to say about it.

hyatt

Re: A note on strcpy

Post by hyatt » Thu Nov 28, 2013 1:40 am

Then I suppose I will simply consider myself privileged to have interacted with the "perfect programmer." One that has NEVER stepped over an array bound, because that produces "undefined behavior." One that has NEVER done computation with a variable that was not first initialized, because that produces "undefined behavior." How very fortunate and skilled you are. And you clearly have done none of those since you were not fired, and if you did something with undefined behavior you would be, according to your statement.

Breaking working code for no good reason whatsoever is stupid and arrogant. Always will be. See Linus Torvalds's comments to the glibc folks regarding memcpy().

BTW, a much BETTER solution would be to detect the overlap condition, which they obviously already spent the execution cycles to do, and then simply call memmove(). How's that for a user-friendly change? Actually FIX the problem? I suppose that doesn't fit your style, however. It would have been mine if I were working on the library and someone demanded the overlap check in the first place. That way it would break NOTHING. No complaints. No mysterious failures.
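A sketch of the suggested alternative: detect overlap and fall back to memmove(), the standard function defined for overlapping copies, rather than aborting. This is an illustration under stated assumptions, not Apple's or glibc's code, and strcpy_overlap_ok is a hypothetical name. Comparing addresses of unrelated objects with < is undefined in C, so the check converts to uintptr_t, which assumes a flat address space as on common desktop platforms.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper: copy like strcpy(), but if the source and
 * destination ranges overlap, use memmove() instead of aborting.
 * Address comparison goes through uintptr_t (assumes a flat address
 * space) because relational comparison of unrelated pointers is
 * undefined in C. */
char *strcpy_overlap_ok(char *dst, const char *src)
{
    size_t n = strlen(src) + 1;   /* bytes to copy, '\0' included */
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    if (d < s + n && s < d + n)   /* [d, d+n) and [s, s+n) intersect */
        memmove(dst, src, n);     /* defined for overlapping regions */
    else
        memcpy(dst, src, n);      /* no overlap: plain copy */
    return dst;
}
```

Because the length is measured before any byte is written, this handles both strcpy(a, a + n) and strcpy(a + n, a).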

But not Apple...
