c++ - ##: "concatenation vs. juxtaposition" full dissertation...
- dan (55/84) Dec 02 2003
(straight from boost email forum; just pasting it below...)

...............................

> Even if 'concatenation' per-se is not called for, and against the Standard, could it be that the "." (dot) relieves the preprocessor from responsibility for adding a space at the end of the preceding string (since the dot already acts as a kind of 'separator'..)?

No. The preprocessor does not "insert spaces" *ever*. At this point in translation, the preprocessor is operating on preprocessing tokens, not characters. There is a big difference between a lack of whitespace and concatenation. The first simply has adjacent preprocessing tokens, while the second forms a new preprocessing token. E.g.

    #define ID(x) x
    #define MACRO(a, b) ID(a)b

    MACRO(+,+)

results in two immediately adjacent '+' preprocessing tokens. There is no intervening whitespace. Whether or not whitespace exists is irrelevant for all purposes *except* stringizing and the creation of an <h-char-sequence>. A preprocessor that does text stream -> text stream must insert whitespace in order to avoid the errant retokenization that would occur when the result gets reprocessed by some other tool (such as a C or C++ compiler). However, that is just a hack to make it work similarly in the presence of retokenization, which does not exist in the phases of translation.

> I just find it hilarious how the boost libraries work with so [...] work with DM. I wouldn't be surprised at all that they'd be all wrong; --won't be the first time that everybody is wrong, but this bug may be just about ready for acceptance by ANSI/ISO/whatever... ;-)

I wish that arbitrary token-pasting was well-defined. However, the example given doesn't even make sense (per se). The reason is that token-pasting occurs prior to rescanning, so with a construction like this:

    #define [...]
    #define B(x) x

the period (.) gets concatenated to the right parenthesis before the expansion of B(x).

Even if arbitrary token-pasting was well-defined, the argument 'x' could contain any amount of whitespace, and cause the construction to not work properly:

    #define EMPTY()

    A(file EMPTY()) // file .h

In other words, there are only certain points at which whitespace is removed or condensed to a single whitespace. This is not one of them. As I said before, however, this kind of problem only occurs during stringizing and during the creation of a header-name preprocessing token of the form <h-char-sequence>.

Further, there is only one sure-fire way to guarantee that no whitespace exists, and that is to concatenate to a placemarker preprocessing token a la C99:

    #define NO_LEADING(x) NO_LEADING_I(, x)
    #define [...]

    #define NO_TRAILING(x) NO_TRAILING_I(, x)
    #define [...]

    #define NO_LEADING_AND_TRAILING(x) \
        NO_LEADING(NO_TRAILING(x)) \
        /**/

..but that is not currently well-defined in C++ as it is in C99.

--------------------------------------------------------------------------------------

> The separator inserted by dmc is to make the preprocessor work right, it
> isn't easily removed. I don't really understand why boost seems to want to
> rely on the [...] was
> added to Standard C specifically to move away from that practice.

Juxtaposition is not concatenation, and a preprocessor that is operating at the character level rather than the preprocessing token level at this point in translation has to jump through hoops to mimic the behavior of the actual phases of translation. This is not a kludge on Boost's side; this is a preprocessor implementation kludge revolving around textual representation at a phase of translation where it doesn't exist.

--------------------------------------------------------------------------------------

> Maybe if someone could paste the section of the Standard dealing with this, I'd much appreciate it. Yours. dan

There is no section of the standard that *ever* says whitespace should be inserted. There are only places where it says whitespace should be removed or adjacent whitespace should be condensed.

Regards,
Paul Mensonides
Dec 02 2003
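A minimal sketch of the juxtaposition/concatenation distinction, assuming a conforming compiler that preprocesses and compiles in one pass (no intermediate text file). ID and MACRO are the macros from the post above; CAT, juxtaposed and pasted are hypothetical names added only for illustration.

```cpp
// Sketch only: CAT, juxtaposed and pasted are illustrative names,
// not taken from the thread.
#define ID(x) x
#define MACRO(a, b) ID(a)b   // juxtaposition: a and b stay separate tokens
#define CAT(a, b) a ## b     // concatenation: a and b form one new token

// Expands to the tokens `x` `+` `+` `y`, parsed as x + (+y).
// Had the two '+' tokens been merged into '++', `x ++ y` would not compile.
int juxtaposed(int x, int y) { return x MACRO(+, +) y; }

// Expands to the tokens `++` `n`: one pre-increment token formed by ##.
int pasted(int n) { CAT(+, +) n; return n; }
```

A preprocessor that writes its result back out as text has to print the first case as `+ +`, which is the space the post calls a hack for surviving retokenization.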
I appreciate your doing this. I still think, however, that tokens are

"dan" <dan_member pathlink.com> wrote in message news:bqjemg$m3b$1 digitaldaemon.com...
> (straight from boost email forum; just pasting it below...)
Dec 02 2003
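Stringizing is one of the two places where, per the quoted explanation, whitespace is observable at all. The sketch below illustrates the rule that whitespace in a stringized argument is only removed or condensed, never added. STR, spaced and unspaced are hypothetical names, and `a` and `b` are never declared because they only appear inside the stringized argument.

```cpp
#include <cstring>

// Hypothetical helper: stringize the argument exactly as written.
#define STR(x) #x

// Runs of whitespace between tokens are condensed to a single space,
// and leading/trailing whitespace is dropped.
inline bool spaced()   { return std::strcmp(STR(  a   +   b  ), "a + b") == 0; }

// Where the argument has no whitespace between tokens, none is inserted.
inline bool unspaced() { return std::strcmp(STR(a+b), "a+b") == 0; }
```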
> I appreciate your doing this. I still think, however, that tokens are

I'm having a hard time understanding his explanation. I think that what he means is that concatenation is not what is intended; --though that is not to mean that an extra space is. I always thought that the preprocessor did pure text substitution; but he seems to violate the initial tokenization.

But having tokens with a dot in between, like in 'something.else', the tokens are well separated already; adding a white space does nothing of value to it. Whereas with 'something else' it needs to preserve the white space, of course. And so in the case [...] you need to violate initial tokenization to concatenate.

But in the case of

    #define a(x) x

    a(something).else

turning that into

    something.else

is not concatenation, nor juxtaposition, for that matter, because no tokens are in fact merging. So, at the text level you might call it concatenation, but at the token level it isn't.

But then I'm not sure what happens if the preprocessor encounters,

    a(something)else

Then we're in real trouble... ;-)

Dunno what the answer is, Walter. I posted the whole thing in comp.lang.c++ but no replies yet...

Cheers!
dan
Dec 02 2003
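A rough sketch of the two cases raised above, assuming a compiler that passes preprocessing tokens straight on to the parser. The macro A plays the role of a(x) from the post. `else` is a keyword, so the hypothetical names Something, other and value stand in for the identifiers. Nothing here uses ##, so no tokens are ever merged.

```cpp
// Sketch only: A mirrors a(x) from the post; Something, other, member,
// declared and value are illustrative names.
#define A(x) x

struct Something { int other; };

// A(something).other expands to the three tokens `something` `.` `other`.
int member(Something something) { return A(something).other; }

// A(unsigned)value expands to the two tokens `unsigned` `value`. They are
// adjacent but are not merged into `unsignedvalue`, so this line is an
// ordinary declaration of a variable named value.
unsigned declared() { A(unsigned)value = 5; return value; }
```

Under the token-based reading given in the pasted reply, the second case is simply two adjacent identifiers. A text-emitting preprocessor would need to print a space between them to keep them apart.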
To my question:

.............................
But then I'm not sure what happens if the preprocessor encounters,

    a(something)else
.............................

AG replied:

-----------------------------------------------------
16.3.3 [cpp.concat] para 3 (my emphasis):

"For both object-like and function-like macro invocations, before the replacement list is reexamined for more macro names to replace, each instance of a ## preprocessing token in the replacement list (not from an argument) is deleted and the preceding preprocessing token is concatenated with the following preprocessing token. *If the result is not a valid preprocessing token, the behavior is undefined*. [...]"

In the case in question, ")." is definitely not a valid preprocessing token (it's two).
-----------------------------------------------------

[...] there, since it would not result in an invalid token being created. And that if an invalid token were being created, the result is undefined, according to the standard, anyways.

Just my take.

Cheers!
dan
Dec 03 2003
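To make the quoted [cpp.concat] rule concrete, here is a small sketch of pastes whose result is a valid preprocessing token. CAT, file_h, ident and number are hypothetical names chosen for illustration. The ill-formed `)` and `.` case is described only in a comment, since its behavior is undefined.

```cpp
// Sketch only: CAT, file_h, ident and number are illustrative names.
#define CAT(a, b) a ## b

int file_h = 42;

// `file ## _h` forms the single identifier `file_h`: a valid token.
int ident() { return CAT(file, _h); }

// `1 ## 2` forms the single pp-number `12`: also a valid token.
int number() { return CAT(1, 2); }

// By contrast, pasting `)` and `.` would have to form `).`, which is not
// a single preprocessing token, so that paste has undefined behavior.
```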