Package com.ms.util

Class HTMLTokenizer

public class HTMLTokenizer
{
  // Fields
  public Hashtable attrs;
  public String tag;
  public String text;
  public static final int TT_BEGIN_TAG;
  public static final int TT_COMMENT;
  public static final int TT_END_TAG;
  public static final int TT_TEXT;
  public int type;

  // Constructors
  public HTMLTokenizer (InputStream isin);

  // Methods
  public boolean hasMoreTokens ();
  public void mark (int readLimit) throws IOException;
  public int nextToken () throws ParseException, IOException;
  public void reset () throws IOException;
  public String toString ();
}

This class parses an HTML version 3.2 document. The parser does not interpret any HTML tags, except for comments and the <PRE> tag.

Constructors

HTMLTokenizer

public HTMLTokenizer (InputStream isin);

Creates an HTMLTokenizer object that reads from the specified input stream.

Parameter    Description
isin         The input stream to tokenize.

Methods

hasMoreTokens

public boolean hasMoreTokens ();

Indicates whether the HTMLTokenizer object contains more tokens.

Return Value:

Returns true if there are more tokens; otherwise, returns false.

mark

public void mark (int readLimit) throws IOException;

Marks the parser's current position in the input stream.

Return Value:

No return value.

Parameter    Description
readLimit    The number of bytes that can be read before this mark is invalidated.

Exceptions:

IOException if the tokenized input stream cannot set the requested mark.

See Also: java.io.InputStream.mark

nextToken

public int nextToken () throws ParseException, IOException;

Parses the next token from the input stream. The white space that follows the token and the first character of the next token are consumed.

Return Value:

Returns one of the following token types: TT_BEGIN_TAG, TT_COMMENT, TT_END_TAG, or TT_TEXT.

Exceptions:

NoSuchElementException if a null token is received.

ParseException if no tag is found after a less-than (<) symbol, or if a tag does not have a matching greater-than (>) symbol.
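
The usual way to consume tokens is to loop while hasMoreTokens returns true, switch on the value returned by nextToken, and read only the fields that are valid for that token type. The following is a minimal sketch of that pattern. Because com.ms.util is not part of the standard JDK, the sketch uses a hypothetical stand-in class, TokenLoopSketch, that mirrors only the documented member names. Its constant values, its String-based constructor, and its simplified parsing (no attributes, comments, or <PRE> handling) are assumptions made for illustration and do not describe the behavior of HTMLTokenizer itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT com.ms.util.HTMLTokenizer. It mirrors only the
// documented member names so that the consuming loop can run on any JDK.
class TokenLoopSketch {
    // Assumed values; the real constants are not given in this topic.
    static final int TT_BEGIN_TAG = 1, TT_END_TAG = 2, TT_TEXT = 3;

    private final String src;
    private int pos = 0;
    int type;     // last token type read
    String tag;   // tag name, without the leading slash for ending tags
    String text;  // text of a TT_TEXT token

    TokenLoopSketch(String src) { this.src = src; }

    boolean hasMoreTokens() { return pos < src.length(); }

    int nextToken() {
        tag = null;
        text = null;
        if (src.charAt(pos) == '<') {
            int close = src.indexOf('>', pos);
            if (close < 0) throw new IllegalStateException("tag without matching >");
            String body = src.substring(pos + 1, close).trim();
            pos = close + 1;
            if (body.startsWith("/")) { type = TT_END_TAG; tag = body.substring(1); }
            else { type = TT_BEGIN_TAG; tag = body; }
        } else {
            int open = src.indexOf('<', pos);
            if (open < 0) open = src.length();
            text = src.substring(pos, open);
            pos = open;
            type = TT_TEXT;
        }
        return type;
    }

    // The consuming loop: switch on the returned type and read only the
    // fields that are valid for that type.
    static List<String> describe(String html) {
        List<String> out = new ArrayList<>();
        TokenLoopSketch t = new TokenLoopSketch(html);
        while (t.hasMoreTokens()) {
            switch (t.nextToken()) {
                case TT_BEGIN_TAG: out.add("begin:" + t.tag); break;
                case TT_END_TAG:   out.add("end:" + t.tag);   break;
                case TT_TEXT:      out.add("text:" + t.text); break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [begin:H1, text:Title, end:H1]
        System.out.println(describe("<H1>Title</H1>"));
    }
}
```

With the real class, the same loop would be written against an HTMLTokenizer constructed from an InputStream, and would also need to handle ParseException and IOException.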

reset

public void reset () throws IOException;

Resets the input to the last marked position.

Return Value:

No return value.

Exceptions:

IOException if the tokenized input stream cannot be reset to the last marked position.

See Also: java.io.InputStream.reset
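
The mark and reset methods are documented with reference to the corresponding java.io.InputStream methods. The following sketch shows that underlying stream behavior using only standard JDK classes: a position is marked, some bytes are read, and reset returns the stream to the marked position so the same bytes can be read again. The class name MarkResetSketch and its helper methods are hypothetical. HTMLTokenizer is not used here, and how it forwards these calls to its input stream is an assumption.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; illustrates java.io.InputStream mark/reset only.
class MarkResetSketch {
    // Marks the stream, reads `count` bytes, resets, and reads them again.
    static String[] readTwice(String html, int count) {
        byte[] bytes = html.getBytes(StandardCharsets.US_ASCII);
        // BufferedInputStream supports mark/reset; not every InputStream does.
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(bytes));
        try {
            in.mark(count);                 // readLimit: bytes readable before the mark may be invalidated
            String first = readChars(in, count);
            in.reset();                     // return to the marked position
            String second = readChars(in, count);
            return new String[] { first, second };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads up to `count` bytes and returns them as ASCII characters.
    private static String readChars(InputStream in, int count) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            int b = in.read();
            if (b < 0) break;               // end of stream
            sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] r = readTwice("<H1>Title</H1>", 4);
        // prints <H1> <H1>
        System.out.println(r[0] + " " + r[1]);
    }
}
```

A stream that does not support marking reports this through InputStream.markSupported, and calling reset on such a stream typically throws an IOException.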

toString

public String toString ();

Retrieves a string representation of the HTMLTokenizer object.

Return Value:

Returns a string containing the token type, tag, attributes, and text of the current token.

Fields

attrs
The attributes of a tag. They are valid for these token types: TT_BEGIN_TAG and TT_END_TAG.
tag
The tag.
Comments:
If this is an ending tag, the value does not include the leading slash (/) character. This field is valid for these token types: TT_BEGIN_TAG and TT_END_TAG.
text
Plain text. This field is valid for these token types: TT_TEXT and TT_COMMENT.
TT_BEGIN_TAG
A token type representing a beginning tag (for example, <H1>).
TT_COMMENT
A token type representing a comment.
TT_END_TAG
A token type representing an ending tag (for example, </H1>).
TT_TEXT
A token type representing the token text.
type
The last token type read. It can be one of the following: TT_BEGIN_TAG, TT_COMMENT, TT_END_TAG, or TT_TEXT.

© 1998 Microsoft Corporation. All rights reserved.