API Docs for SAX and DOM
 


XMLString.hpp

/*
 * The Apache Software License, Version 1.1
 *
 * Copyright (c) 1999-2001 The Apache Software Foundation.  All rights
 * reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *
 * 3. The end-user documentation included with the redistribution,
 *    if any, must include the following acknowledgment:
 *       "This product includes software developed by the
 *        Apache Software Foundation (http://www.apache.org/)."
 *    Alternately, this acknowledgment may appear in the software itself,
 *    if and wherever such third-party acknowledgments normally appear.
 *
 * 4. The names "Xerces" and "Apache Software Foundation" must
 *    not be used to endorse or promote products derived from this
 *    software without prior written permission. For written
 *    permission, please contact apache\@apache.org.
 *
 * 5. Products derived from this software may not be called "Apache",
 *    nor may "Apache" appear in their name, without prior written
 *    permission of the Apache Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
 * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 * ====================================================================
 *
 * This software consists of voluntary contributions made by many
 * individuals on behalf of the Apache Software Foundation, and was
 * originally based on software copyright (c) 1999, International
 * Business Machines, Inc., http://www.ibm.com .  For more information
 * on the Apache Software Foundation, please see
 * <http://www.apache.org/>.
 */

/*
 * $Log: XMLString.hpp,v $
 * Revision 1.2  2002/02/20 18:17:02  tng
 * [Bug 5977] Warnings on generating apiDocs.
 *
 * Revision 1.1.1.1  2002/02/01 22:22:16  peiyongz
 * sane_include
 *
 * Revision 1.26  2001/08/10 16:23:06  peiyongz
 * isHex(), isAlphaNum(), isAllWhiteSpace() and patternMatch() Added
 *
 * Revision 1.25  2001/07/06 20:27:57  peiyongz
 * isValidQName()
 *
 * Revision 1.24  2001/07/04 14:38:20  peiyongz
 * IDDatatypeValidator: created
 * DatatypeValidatorFactory: IDDTV enabled
 * XMLString:isValidName(): to validate Name (XML [4][5])
 *
 * Revision 1.23  2001/06/13 14:07:55  peiyongz
 * isValidEncName() to validate an encoding name (EncName)
 *
 * Revision 1.22  2001/05/23 15:44:51  tng
 * Schema: NormalizedString fix.  By Pei Yong Zhang.
 *
 * Revision 1.21  2001/05/11 13:26:31  tng
 * Copyright update.
 *
 * Revision 1.20  2001/05/09 18:43:30  tng
 * Add StringDatatypeValidator and BooleanDatatypeValidator.  By Pei Yong Zhang.
 *
 * Revision 1.19  2001/05/03 20:34:35  tng
 * Schema: SchemaValidator update
 *
 * Revision 1.18  2001/05/03 19:17:35  knoaman
 * TraverseSchema Part II.
 *
 * Revision 1.17  2001/03/21 21:56:13  tng
 * Schema: Add Schema Grammar, Schema Validator, and split the DTDValidator into DTDValidator, DTDScanner, and DTDGrammar.
 *
 * Revision 1.16  2001/03/02 20:52:46  knoaman
 * Schema: Regular expression - misc. updates for error messages,
 * and additions of new functions to XMLString class.
 *
 * Revision 1.15  2001/01/15 21:26:34  tng
 * Performance Patches by David Bertoni.
 *
 * Details: (see xerces-c-dev mailing Jan 14)
 * XMLRecognizer.cpp: the internal encoding string XMLUni::fgXMLChEncodingString
 * was going through this function numerous times.  As a result, the top hot-spot
 * for the parse was _wcsicmp().  The real problem is that Microsoft's wide string
 * functions are unbelievably slow.  For things like encodings, it might be
 * better to use a special comparison function that only considers a-z and
 * A-Z as characters with case.  This works since the character set for
 * encodings is limited to printable ASCII characters.
 *
 * XMLScanner2.cpp: This also has some case-sensitive vs. insensitive compares.
 * They are also much faster.  The other tweak is to only make a copy of an attribute
 * string if it needs to be split.  And then, the strategy is to try to use a
 * stack-based buffer, rather than a dynamically-allocated one.
 *
 * SAX2XMLReaderImpl.cpp: Again, more case-sensitive vs. insensitive comparisons.
 *
 * KVStringPair.cpp & hpp: By storing the size of the allocation, the storage can
 * likely be re-used many times, cutting down on dynamic memory allocations.
 *
 * XMLString.hpp: a more efficient implementation of stringLen().
 *
 * DTDValidator.cpp: another case of using a stack-based buffer when possible
 *
 * These patches made a big difference in parse time in some of our test
 * files, especially the ones that are very attribute-heavy.
 *
 * Revision 1.14  2000/10/13 22:47:57  andyh
 * Fix bug (failure to null-terminate result) in XMLString::trim().
 * Patch contributed by Nadav Aharoni
 *
 * Revision 1.13  2000/04/12 18:42:15  roddey
 * Improved docs in terms of what 'max chars' means in the method
 * parameters.
 *
 * Revision 1.12  2000/04/06 19:42:51  rahulj
 * Clarified how big the target buffer should be in the API
 * documentation.
 *
 * Revision 1.11  2000/03/23 01:02:38  roddey
 * Updates to the XMLURL class to correct a lot of parsing problems
 * and to add support for the port number. Updated the URL tests
 * to test some of this new stuff.
 *
 * Revision 1.10  2000/03/20 23:00:46  rahulj
 * Moved the inline definition of stringLen before the first
 * use. This satisfied the HP CC compiler.
 *
 * Revision 1.9  2000/03/02 19:54:49  roddey
 * This checkin includes many changes done while waiting for the
 * 1.1.0 code to be finished. I can't list them all here, but a list is
 * available elsewhere.
 *
 * Revision 1.8  2000/02/24 20:05:26  abagchi
 * Swat for removing Log from API docs
 *
 * Revision 1.7  2000/02/16 18:51:52  roddey
 * Fixed some facts in the docs and reformatted the docs to stay within
 * a reasonable line width.
 *
 * Revision 1.6  2000/02/16 17:07:07  abagchi
 * Added API docs
 *
 * Revision 1.5  2000/02/06 07:48:06  rahulj
 * Year 2K copyright swat.
 *
 * Revision 1.4  2000/01/12 00:16:23  roddey
 * Changes to deal with multiply nested, relative pathed, entities and to deal
 * with the new URL class changes.
 *
 * Revision 1.3  1999/12/18 00:18:10  roddey
 * More changes to support the new, completely orthogonal support for
 * intrinsic encodings.
 *
 * Revision 1.2  1999/12/15 19:41:28  roddey
 * Support for the new transcoder system, where even intrinsic encodings are
 * done via the same transcoder abstraction as external ones.
 *
 * Revision 1.1.1.1  1999/11/09 01:05:52  twl
 * Initial checkin
 *
 * Revision 1.2  1999/11/08 20:45:21  rahul
 * Swat for adding in Product name and CVS comment log variable.
 *
 */

#if !defined(XMLSTRING_HPP)
#define XMLSTRING_HPP

#include <xercesc/util/XercesDefs.hpp>
#include <xercesc/util/RefVectorOf.hpp>

class XMLLCPTranscoder;

class XMLString
{
public:
    /* Static methods for native character mode string manipulation */

    static void binToText
    (
        const   unsigned int    toFormat
        ,       char* const     toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   unsigned int    toFormat
        ,       XMLCh* const    toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   unsigned long   toFormat
        ,       char* const     toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   unsigned long   toFormat
        ,       XMLCh* const    toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   long            toFormat
        ,       char* const     toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   long            toFormat
        ,       XMLCh* const    toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   int             toFormat
        ,       char* const     toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   int             toFormat
        ,       XMLCh* const    toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static bool textToBin
    (
        const   XMLCh* const    toConvert
        ,       unsigned int&   toFill
    );

    static int parseInt
    (
        const   XMLCh* const    toConvert
    );

    static void catString
    (
                char* const     target
        , const char* const     src
    );

    static void catString
    (
                XMLCh* const    target
        , const XMLCh* const    src
    );

    static int compareIString
    (
        const   char* const     str1
        , const char* const     str2
    );

    static int compareIString
    (
        const   XMLCh* const    str1
        , const XMLCh* const    str2
    );

    static int compareNString
    (
        const   char* const     str1
        , const char* const     str2
        , const unsigned int    count
    );

    static int compareNString
    (
        const   XMLCh* const    str1
        , const XMLCh* const    str2
        , const unsigned int    count
    );

    static int compareNIString
    (
        const   char* const     str1
        , const char* const     str2
        , const unsigned int    count
    );

    static int compareNIString
    (
        const   XMLCh* const    str1
        , const XMLCh* const    str2
        , const unsigned int    count
    );

    static int compareString
    (
        const   char* const     str1
        , const char* const     str2
    );

    static int compareString
    (
        const   XMLCh* const    str1
        , const XMLCh* const    str2
    );

    static bool regionMatches
    (
        const   XMLCh* const    str1
        , const int             offset1
        , const XMLCh* const    str2
        , const int             offset2
        , const unsigned int    charCount
    );

    static bool regionIMatches
    (
        const   XMLCh* const    str1
        , const int             offset1
        , const XMLCh* const    str2
        , const int             offset2
        , const unsigned int    charCount
    );

    static void copyString
    (
                char* const     target
        , const char* const     src
    );

    static void copyString
    (
                XMLCh* const    target
        , const XMLCh* const    src
    );

    static bool copyNString
    (
                XMLCh* const    target
        , const XMLCh* const    src
        , const unsigned int    maxChars
    );

    static unsigned int hash
    (
        const   char* const     toHash
        , const unsigned int    hashModulus
    );

    static unsigned int hash
    (
        const   XMLCh* const    toHash
        , const unsigned int    hashModulus
    );

    static unsigned int hashN
    (
        const   XMLCh* const    toHash
        , const unsigned int    numChars
        , const unsigned int    hashModulus
    );

    static int indexOf(const char* const toSearch, const char ch);

    static int indexOf(const XMLCh* const toSearch, const XMLCh ch);

    static int indexOf
    (
        const   char* const     toSearch
        , const char            chToFind
        , const unsigned int    fromIndex
    );

    static int indexOf
    (
        const   XMLCh* const    toSearch
        , const XMLCh           chToFind
        , const unsigned int    fromIndex
    );

    static int lastIndexOf(const char* const toSearch, const char ch);

    static int lastIndexOf(const XMLCh* const toSearch, const XMLCh ch);

    static int lastIndexOf
    (
        const   char* const     toSearch
        , const char            chToFind
        , const unsigned int    fromIndex
    );

    static int lastIndexOf
    (
        const   XMLCh* const    toSearch
        , const XMLCh           ch
        , const unsigned int    fromIndex
    );

    static void moveChars
    (
                XMLCh* const    targetStr
        , const XMLCh* const    srcStr
        , const unsigned int    count
    );

    static void subString
    (
                char* const    targetStr
        , const char* const    srcStr
        , const int            startIndex
        , const int            endIndex
    );

    static void subString
    (
                XMLCh* const    targetStr
        , const XMLCh* const    srcStr
        , const int             startIndex
        , const int             endIndex
    );

    static char* replicate(const char* const toRep);

    static XMLCh* replicate(const XMLCh* const toRep);

    static bool startsWith
    (
        const   char* const     toTest
        , const char* const     prefix
    );

    static bool startsWith
    (
        const   XMLCh* const    toTest
        , const XMLCh* const    prefix
    );

    static bool startsWithI
    (
        const   char* const     toTest
        , const char* const     prefix
    );

    static bool startsWithI
    (
        const   XMLCh* const    toTest
        , const XMLCh* const    prefix
    );

    static bool endsWith
    (
        const   XMLCh* const    toTest
        , const XMLCh* const    suffix
    );

    static const XMLCh* findAny
    (
        const   XMLCh* const    toSearch
        , const XMLCh* const    searchList
    );

    static XMLCh* findAny
    (
                XMLCh* const    toSearch
        , const XMLCh* const    searchList
    );

    static int patternMatch
    (
                XMLCh* const    toSearch
        , const XMLCh* const    pattern
    );

    static unsigned int stringLen(const char* const src);

    static unsigned int stringLen(const XMLCh* const src);

    static bool isValidNCName(const XMLCh* const name);

    static bool isValidName(const XMLCh* const name);

    static bool isValidEncName(const XMLCh* const name);

    static bool isValidQName(const XMLCh* const name);

    static bool isAlpha(XMLCh const theChar);

    static bool isDigit(XMLCh const theChar);

    static bool isAlphaNum(XMLCh const theChar);

    static bool isHex(XMLCh const theChar);

    static bool isAllWhiteSpace(const XMLCh* const toCheck);

    static void cut
    (
                XMLCh* const    toCutFrom
        , const unsigned int    count
    );

    static char* transcode
    (
        const   XMLCh* const    toTranscode
    );

    static bool transcode
    (
        const   XMLCh* const    toTranscode
        ,       char* const     toFill
        , const unsigned int    maxChars
    );

    static XMLCh* transcode
    (
        const   char* const     toTranscode
    );

    static bool transcode
    (
        const   char* const     toTranscode
        ,       XMLCh* const    toFill
        , const unsigned int    maxChars
    );

    static void trim(char* const toTrim);

    static void trim(XMLCh* const toTrim);

    static RefVectorOf<XMLCh>* tokenizeString(const XMLCh* const tokenizeSrc);

    static bool isInList(const XMLCh* const toFind, const XMLCh* const enumList);

    static XMLCh* makeUName
    (
        const   XMLCh* const    pszURI
        , const XMLCh* const    pszName
    );

    static unsigned int replaceTokens
    (
                XMLCh* const    errText
        , const unsigned int    maxChars
        , const XMLCh* const    text1
        , const XMLCh* const    text2
        , const XMLCh* const    text3
        , const XMLCh* const    text4
    );

    static void upperCase(XMLCh* const toUpperCase);

    static void lowerCase(XMLCh* const toLowerCase);

    static bool isWSReplaced(const XMLCh* const toCheck);

    static bool isWSCollapsed(const XMLCh* const toCheck);

    static void replaceWS(XMLCh* const toConvert);

    static void collapseWS(XMLCh* const toConvert);

private:

    XMLString();
    ~XMLString();

    static void initString(XMLLCPTranscoder* const defToUse);
    static void termString();

    static bool validateRegion(const XMLCh* const str1, const int offset1,
                        const XMLCh* const str2, const int offset2,
                        const unsigned int charCount);

    friend class XMLPlatformUtils;
};

// ---------------------------------------------------------------------------
//  Inline some methods that are either just passthroughs to other string
//  methods, or which are key for performance.
// ---------------------------------------------------------------------------
inline void XMLString::moveChars(       XMLCh* const    targetStr
                                , const XMLCh* const    srcStr
                                , const unsigned int    count)
{
    XMLCh* outPtr = targetStr;
    const XMLCh* inPtr = srcStr;
    for (unsigned int index = 0; index < count; index++)
        *outPtr++ = *inPtr++;
}

inline unsigned int XMLString::stringLen(const XMLCh* const src)
{
    if (src == 0 || *src == 0)
    {
        return 0;
    }
    else
    {
        const XMLCh* pszTmp = src + 1;

        while (*pszTmp)
            ++pszTmp;

        return (unsigned int)(pszTmp - src);
    }
}

inline bool XMLString::startsWith(  const   XMLCh* const    toTest
                                    , const XMLCh* const    prefix)
{
    return (compareNString(toTest, prefix, stringLen(prefix)) == 0);
}

inline bool XMLString::startsWithI( const   XMLCh* const    toTest
                                    , const XMLCh* const    prefix)
{
    return (compareNIString(toTest, prefix, stringLen(prefix)) == 0);
}

inline bool XMLString::endsWith(const XMLCh* const toTest,
                                const XMLCh* const suffix)
{

    unsigned int suffixLen = XMLString::stringLen(suffix);

    // Note: the subtraction below is unsigned. If the suffix is longer than
    // toTest it wraps around, and the value typically turns into a negative
    // offset only when converted to the int parameter of regionMatches().
    return regionMatches(toTest, XMLString::stringLen(toTest) - suffixLen,
                         suffix, 0, suffixLen);
}

inline XMLCh* XMLString::replicate(const XMLCh* const toRep)
{
    // If a null string, return a null string!
    XMLCh* ret = 0;
    if (toRep)
    {
        const unsigned int len = stringLen(toRep);
        ret = new XMLCh[len + 1];
        XMLCh* outPtr = ret;
        const XMLCh* inPtr = toRep;
        // Copy len + 1 code units so the terminating null is included
        for (unsigned int index = 0; index <= len; index++)
            *outPtr++ = *inPtr++;
    }
    return ret;
}

inline bool XMLString::validateRegion(const XMLCh* const str1,
                                      const int offset1,
                                      const XMLCh* const str2,
                                      const int offset2,
                                      const unsigned int charCount)
{

    if (offset1 < 0 || offset2 < 0 ||
        (offset1 + charCount) > XMLString::stringLen(str1) ||
        (offset2 + charCount) > XMLString::stringLen(str2) )
        return false;

    return true;
}

#endif
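
The Revision 1.15 log entry in the listing suggests comparing encoding names with a function that treats only a-z and A-Z as cased characters, instead of a locale-aware wide-string compare. The following is a minimal, self-contained sketch of that idea and is not the library's code. The names `XmlChar`, `foldAscii` and `compareAsciiI` are hypothetical, `char16_t` merely stands in for `XMLCh`, and treating a null pointer as an empty string is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for the library's XMLCh type.
typedef char16_t XmlChar;

// Fold only A-Z to a-z; every other code unit is returned unchanged.
inline XmlChar foldAscii(XmlChar ch)
{
    return (ch >= u'A' && ch <= u'Z')
        ? static_cast<XmlChar>(ch + (u'a' - u'A'))
        : ch;
}

// Returns <0, 0 or >0 in the manner of strcmp, ignoring case for ASCII
// letters only. A null pointer is treated as an empty string (assumption).
inline int compareAsciiI(const XmlChar* s1, const XmlChar* s2)
{
    static const XmlChar empty[] = { 0 };
    if (!s1) s1 = empty;
    if (!s2) s2 = empty;

    for (;; ++s1, ++s2)
    {
        const XmlChar c1 = foldAscii(*s1);
        const XmlChar c2 = foldAscii(*s2);
        if (c1 != c2)
            return static_cast<int>(c1) - static_cast<int>(c2);
        if (c1 == 0)
            return 0;   // both strings ended at the same position
    }
}
```

Because only 26 letters are folded, the loop needs no locale tables, which is the property the log entry relies on for printable-ASCII encoding names.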

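
The inline `endsWith` in the listing subtracts the suffix length from the string length in unsigned arithmetic, so a suffix longer than the string wraps around instead of going negative, and the call presumably depends on the offset checks in `validateRegion` to reject it. As a hedged illustration only, the sketch below makes the length check explicit before any subtraction. `XmlChar`, `lengthOf` and `endsWithChecked` are hypothetical names, and `char16_t` stands in for `XMLCh`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the library's XMLCh type.
typedef char16_t XmlChar;

// Length in code units; a null pointer counts as length 0.
inline std::size_t lengthOf(const XmlChar* s)
{
    std::size_t n = 0;
    if (s)
        while (s[n])
            ++n;
    return n;
}

// True when toTest ends with suffix. The length comparison happens before
// the subtraction, so the offset can never wrap around.
inline bool endsWithChecked(const XmlChar* toTest, const XmlChar* suffix)
{
    const std::size_t testLen = lengthOf(toTest);
    const std::size_t suffixLen = lengthOf(suffix);

    if (suffixLen == 0)
        return true;            // an empty suffix always matches
    if (suffixLen > testLen)
        return false;           // suffix cannot fit

    const XmlChar* tail = toTest + (testLen - suffixLen);
    for (std::size_t i = 0; i < suffixLen; ++i)
        if (tail[i] != suffix[i])
            return false;
    return true;
}
```

Returning `true` for an empty suffix is a design choice of this sketch; it mirrors what a zero-length region comparison would report.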

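
The inline `replicate` in the listing allocates its result with `new XMLCh[len + 1]`, so the caller owns the copy and has to release it with `delete []`. One way to make that ownership harder to forget is to wrap the pointer in a smart pointer. The sketch below assumes a C++11 compiler, which postdates this header, and uses the hypothetical names `XmlChar`, `replicateSketch`, `OwnedString` and `replicateOwned`. `replicateSketch` imitates the listed function rather than calling the library.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for the library's XMLCh type.
typedef char16_t XmlChar;

// Imitation of the listed replicate(): null in, null out; otherwise a
// heap-allocated copy that includes the terminating null.
inline XmlChar* replicateSketch(const XmlChar* toRep)
{
    if (!toRep)
        return nullptr;

    std::size_t len = 0;
    while (toRep[len])
        ++len;

    XmlChar* ret = new XmlChar[len + 1];
    for (std::size_t i = 0; i <= len; ++i)
        ret[i] = toRep[i];
    return ret;
}

// The array form of unique_ptr calls delete[] when it goes out of scope.
typedef std::unique_ptr<XmlChar[]> OwnedString;

inline OwnedString replicateOwned(const XmlChar* toRep)
{
    return OwnedString(replicateSketch(toRep));
}
```

The array specialization matters here: a plain `std::unique_ptr<XmlChar>` would release the buffer with `delete` rather than `delete []`.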
Copyright © 2000 The Apache Software Foundation. All Rights Reserved.