STfindToken
Routine
-
char *STfindToken (const char String[], const char Delims[],
const char Quotes[], char Token[], int WSFlag,
int Maxchar);
Purpose
-
Find the first token string in a string
Description
This routine finds the first token string in a given string. A token string
is delimited by a character from a set of delimiter characters. If no
delimiter is found, the token string is the entire input string. A flag
determines how white-space characters are treated. Optionally, quote
characters can be specified to allow delimiters to be treated as ordinary
characters between paired quote characters.
The processing of white-space (as defined by isspace) is controlled by the
WSFlag flag.
-
WSFlag = 0:
-
This mode treats white-space characters as ordinary characters.
-
WSFlag = 1:
-
This mode causes leading and trailing white-space in the token string to
be stripped off before the token string is returned.
-
WSFlag = 2:
-
This mode causes white-space to serve as an additional delimiter.
There are two subcases here. If the Delims string is empty, then
only white-space serves as a delimiter. However, leading and trailing
white-space in the input string do not serve as a delimiter, i.e.
only the characters after the initial white-space are returned as token
characters. The second subcase occurs if the Delims string is not empty.
Now the delimiter can be either white-space alone, or a character from
the Delims string with optional white-space surrounding the delimiter
character. The white-space serving as a delimiter or surrounding the
delimiter character is stripped off before returning the token string.
-
Example:
-
String = "XXX : yyy";
Delims = ":";
WSFlag = 1;
STfindToken (String, Delims, "", Token, WSFlag, 100);
On return, Token contains the string "XXX".
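The stripping step for WSFlag = 1 can be sketched as follows. This is only an illustration, not the libtsp source; the helper name trim_copy and its arguments are hypothetical. It copies a candidate token while dropping leading and trailing white-space as defined by isspace.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libtsp): copy src[0..len-1] into dst with
   leading and trailing white-space removed. dst must have room for len + 1
   characters. Returns the length of the trimmed string. */
size_t trim_copy (const char *src, size_t len, char *dst)
{
  size_t start = 0;
  size_t end = len;

  /* Skip leading white-space; the cast keeps isspace well defined for
     negative char values */
  while (start < end && isspace ((unsigned char) src[start]))
    ++start;

  /* Back up over trailing white-space */
  while (end > start && isspace ((unsigned char) src[end - 1]))
    --end;

  memmove (dst, src + start, end - start);
  dst[end - start] = '\0';
  return end - start;
}
```

Applied to the four characters "XXX " that precede the ":" in the example above, this sketch leaves "XXX" in dst.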
Quote characters are specified in pairs, a left quote character and a right
quote character for each pair. When multiple pairs of quote characters are
specified, the appearance of the first left quote character disables the
quoting interpretation of characters from other pairs of quote characters.
While within the scope of this quote character, only the left quote and the
right quote characters from the active pair will be interpreted as special
characters.
-
Quoting strings:
-
The left and right quote characters are the same. After an initial left
quote character, a second quote character is interpreted as a right quote
character. This means that nested quotes cannot occur.
-
Examples:
-
1: |"def ghi"|. Quote characters |""| and white-space as a delimiter.
The initial quote character '"' disables the interpretation of the blank
in the string as a token delimiter. The string returned in the token
includes the quote characters.
-
2: |"ab cd""ef gh"|. Quote characters |""| and white-space as a delimiter.
This string would be returned in its entirety, including the quote
characters.
-
Grouping expressions:
-
In this case, the left and right quote characters are different. Nesting
of groups can occur.
-
Example:
-
|f1 (x, f2 (y,z)), f3(x)|. Quote characters |()|, |,| as a delimiter.
The commas occurring inside the parentheses do not serve as delimiters.
Also, nesting is observed, so that only the second |)| character matches
the first |(|. The string returned is |f1 (x, f2 (y,z))|.
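The quoting rules above can be sketched as a scan for the first delimiter that lies outside any quoted region. This is an illustration under assumptions, not the libtsp implementation: the name scan_to_delim is hypothetical, white-space delimiters are passed explicitly in delims rather than through WSFlag, and a character that is both a delimiter and a left quote is treated as a delimiter.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: return the index of the first character of s that is
   in delims and is not inside a quoted region, or strlen(s) if there is no
   such character. quotes holds left/right pairs, for example "\"\"()". */
size_t scan_to_delim (const char *s, const char *delims, const char *quotes)
{
  size_t i, k;
  char left = '\0';     /* left quote of the active pair */
  char right = '\0';    /* right quote of the active pair */
  int depth = 0;        /* nesting depth within the active pair */

  for (i = 0; s[i] != '\0'; ++i) {
    char c = s[i];

    if (depth == 0) {
      if (strchr (delims, c) != NULL)
        return i;
      /* A left quote from any pair opens a quoted region */
      for (k = 0; quotes[k] != '\0' && quotes[k+1] != '\0'; k += 2) {
        if (c == quotes[k]) {
          left = quotes[k];
          right = quotes[k+1];
          depth = 1;
          break;
        }
      }
    }
    else if (c == right)
      --depth;          /* tested first, so identical quotes never nest */
    else if (c == left)
      ++depth;          /* nested group of the active pair */
    /* all other characters, including other quote pairs, are ordinary */
  }
  return i;
}
```

With quotes "()" and delims ",", the scan of |f1 (x, f2 (y,z)), f3(x)| stops at index 16, the comma that follows the second |)|.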
-
Notes:
Unmatched quote characters are not reported. The reporting of such errors is
left to whatever routine interprets the token strings.
This routine always returns a Token string, even if it is of zero length. If
the input is zero length, or all white-space (for WSFlag != 0), such an input
string may be interpreted as having no token string. To enforce such an
interpretation, the calling routine should check for this case.
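One way for the calling routine to make this check is to test the input string before tokenizing it. The helper below is a hypothetical sketch (is_blank is not a libtsp routine).

```c
#include <ctype.h>

/* Hypothetical helper: return 1 if s is empty or consists only of
   white-space (as defined by isspace), 0 otherwise */
int is_blank (const char *s)
{
  while (*s != '\0') {
    if (!isspace ((unsigned char) *s))
      return 0;
    ++s;
  }
  return 1;
}
```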
There is no escape mechanism to allow quote characters to appear in a string
without acting as quote characters. Note however the behaviour described
above, viz. once in quotes, quotes from other than the active quote pair are
treated as ordinary characters.
A typical program snippet for using this routine is
p = string;
while (p != NULL) {
p = STfindToken (p, Delims, Quotes, Token, WSFlag, Maxchar);
... process Token ...
}
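The control flow of this loop can be tried without libtsp using a simplified stand-in. In the sketch below, next_token and count_tokens are hypothetical names; next_token only mimics the return convention described here (a pointer past the delimiter, or NULL when the input is exhausted) and ignores quotes and white-space handling.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for STfindToken: delimiters only, no quote or
   white-space processing. Copies at most maxchar characters to token. */
const char *next_token (const char *s, const char *delims, char *token,
                        size_t maxchar)
{
  size_t n, ncopy;

  if (s == NULL) {
    token[0] = '\0';
    return NULL;
  }
  n = strcspn (s, delims);              /* length up to the first delimiter */
  ncopy = (n < maxchar) ? n : maxchar;
  memcpy (token, s, ncopy);
  token[ncopy] = '\0';

  /* Pointer past the delimiter, or NULL if the end of input was reached */
  return (s[n] != '\0') ? s + n + 1 : NULL;
}

/* Same loop shape as the snippet above: count the tokens in string */
int count_tokens (const char *string, const char *delims)
{
  char token[64];
  const char *p = string;
  int n = 0;

  while (p != NULL) {
    p = next_token (p, delims, token, sizeof token - 1);
    ++n;                                /* ... process token ... */
  }
  return n;
}
```

With this stand-in, an empty input string still yields one (zero length) token, in line with the note above that a Token string is always returned.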
Parameters
-
<- char *STfindToken
-
Pointer to the character after the delimiter. This pointer is set to
NULL if all tokens have been parsed from the input string. If not NULL,
this value is suitable for feeding back into this routine to extract the
next token string.
-
-> const char String[]
-
Input character string. If String is the NULL pointer, this routine
returns NULL and sets Token to be the empty string.
-
-> const char Delims[]
-
Character string specifying delimiter characters. White-space may also
act as a delimiter depending on the setting of the WSFlag flag. The
Delims string can be zero length, indicating either that no delimiters
are to be recognized, or that only white-space is to be recognized as a
delimiter (if WSFlag is set appropriately).
-
-> const char Quotes[]
-
Character string specifying pairs of quote characters (the left and
right quote characters). In the part of the input string between a
matched pair of quote characters, any other characters, including quote
characters other than from the active pair, are treated as ordinary
characters. Up to 5 pairs of quote characters can be specified. A zero
length string indicates that quote characters should not be
recognized.
-
<- char Token[]
-
Output token string. This string has at most Maxchar characters, not
including the terminating null character. This string is always null
terminated. The token string may have leading and trailing white-space
stripped off if WSFlag is set appropriately. If the actual token string
is more than Maxchar characters long, only Maxchar characters are written
to Token, and a warning message is printed. Token can occupy the same
string space as String. In that case Token overwrites the beginning
part of String.
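The truncation and overlap behaviour described for Token can be sketched as a bounded, overlap-safe copy. This is an assumption about one way to obtain that behaviour, not the libtsp source: copy_token is a hypothetical name, it uses size_t where the routine uses int, and it returns the number of dropped characters instead of printing a warning.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most maxchar of the len characters at src
   into token and null-terminate the result. memmove is used because token
   may overlap src. Returns the number of characters that did not fit. */
size_t copy_token (char *token, const char *src, size_t len, size_t maxchar)
{
  size_t n = (len > maxchar) ? maxchar : len;

  memmove (token, src, n);
  token[n] = '\0';
  return len - n;
}
```

When token points at the start of the buffer that src lies in, the token overwrites the beginning of that buffer, which mirrors the case of Token sharing string space with String.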
-
-> int WSFlag
-
Flag controlling the interpretation of white-space characters.
0 - White-space characters are treated as ordinary characters
1 - Leading and trailing white-space in the token string are
stripped off.
2 - White-space serves as an additional delimiter
-
-> int Maxchar
-
Maximum number of characters (not including the trailing null character)
to be placed in Token.
Author / revision
P. Kabal / Revision 1.36 2003/05/09
See Also
STkeyMatch,
STkeyXpar