Project Properties dialog: URL filters sections

URL filters allow you to easily control Project downloads by setting which files should be loaded and which should be skipped.

URL Filters are divided into four parts: Protocol, Server, Directory and Filename.

Please note that if a URL doesn't contain a filename (such as http://www.srv.com/path), it should end with a slash "/". Otherwise "path" will be recognized as a filename.
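The effect of the trailing slash can be illustrated with a short Python sketch. It is not the program's own parser; the split_url helper is a hypothetical name, and the sketch simply assumes that everything after the last slash of the path is read as the filename.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into (server, directory, filename) segments.

    Hypothetical helper: assumes the text after the last slash
    of the path is treated as the filename.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    # Everything up to and including the last slash is the directory;
    # whatever follows it is the filename (possibly empty).
    directory, _, filename = path.rpartition("/")
    return parts.netloc, directory + "/", filename


# Without the trailing slash, "path" lands in the filename segment.
print(split_url("http://www.srv.com/path"))
# With the trailing slash, "path" is part of the directory segment.
print(split_url("http://www.srv.com/path/"))
```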

The URL Filter sections work as a group. If one filter specifies that a file is not to be downloaded, it will not be downloaded even if subsequent filters allow it.

Each filter lets you specify whether to load only within the starting server/directory/filename, and provides include/exclude keywords to fine-tune the downloads.

When Portable Offline Browser decides whether a link will be downloaded or not, it checks the "Load only within the starting..." box state first. If the box is checked and the link does not correspond to the starting URL part (e.g. the link points to an external server), the link will not be downloaded at all. If the link corresponds to the starting URL part, it is then checked against the included/excluded keyword lists.

Load only within the starting...

The file is allowed for download only if the corresponding URL segment is the same as in the starting URL (Project | Address (URL)).

If the "Load only within the starting..." box is not checked, the link will be compared with the included/excluded keywords lists only.

There are two types of keywords: those that enable file loading when the keyword matches and those that disable it. Each type has its own list.

Portable Offline Browser compares the corresponding URL segment of every file to be downloaded with the keywords of each type.

Include list: if any keyword matches, the file is allowed for download. If none of the keywords match, the file will not be loaded.

Exclude list: if any keyword matches, the file is not allowed for download. If none of the keywords match, the file will be loaded.

A keyword may include any symbol. A keyword matches when the corresponding URL segment (server name, path or filename with extension) contains it.

Examples: server name www.zdnet.com contains keyword zdnet. Path /~user015/images/ does not contain keyword user705.
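The decision order described above (box state first, then the keyword lists) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the program's actual code: the function names are hypothetical, keywords are treated as plain substrings, the Exclude list is consulted before the Include list, and an empty Include list is assumed to place no restriction.

```python
def contains_any(segment, keywords):
    """True if the segment contains at least one keyword (case-insensitive)."""
    segment = segment.lower()
    return any(keyword.lower() in segment for keyword in keywords)


def segment_allowed(segment, start_segment, only_within_start,
                    include=(), exclude=()):
    """Decide whether one URL segment passes one filter (hypothetical sketch)."""
    # Step 1: the "Load only within the starting..." box is checked first.
    if only_within_start and segment.lower() != start_segment.lower():
        return False
    # Step 2: a match in the Exclude list rejects the file.
    if contains_any(segment, exclude):
        return False
    # Step 3: with a non-empty Include list, at least one keyword must match.
    # Assumption: an empty Include list places no restriction.
    if include and not contains_any(segment, include):
        return False
    return True


# Server name www.zdnet.com contains keyword zdnet.
print(segment_allowed("www.zdnet.com", "www.srv.com", False, include=["zdnet"]))  # True
# Path /~user015/images/ does not contain keyword user705.
print(segment_allowed("/~user015/images/", "/", False, include=["user705"]))  # False
# With the box checked, an external server is rejected before keywords are read.
print(segment_allowed("www.zdnet.com", "www.srv.com", True, include=["zdnet"]))  # False
```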

The two special symbols "^" and "$" indicate the beginning and end of the URL segment.

For example, keyword ^www.cnn means that the URL segment should begin with www.cnn. Keyword .htm$ means that the URL segment should end with .htm.

To define a set of possible symbols at a position, use [] characters. Thus, keyword g[eo]t matches both get and got. The symbol "-" means a range from one symbol to another. A "^" symbol immediately after [ matches a single character that is not contained within the brackets.

For example, [a-z] means all symbols from a to z (inclusive). [0-9] means all numeric symbols.

Use an asterisk "*" for any number of any symbols, such as ^a*.gif$, which means all files that begin with "a" and end with .gif.

Use plus "+" to match the preceding symbol one or more times. For example, ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac". Another example is ab[0-9]+.htm, which matches filenames like ab564562.htm or ab1.htm.
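To experiment with the special symbols described above, the keyword syntax can be approximated with Python regular expressions. The sketch below is an assumption about how the rules combine, not the program's real matcher: keyword_to_regex and matches are hypothetical names, "*" is read as any run of symbols, "^", "$", "+" and bracket sets keep their meaning, and every other symbol (including ".") is treated literally.

```python
import re


def keyword_to_regex(keyword):
    """Translate a filter keyword into a case-insensitive regex (sketch)."""
    out = []
    i = 0
    while i < len(keyword):
        ch = keyword[i]
        if ch == "*":
            out.append(".*")              # any number of any symbols
        elif ch in "^$+":
            out.append(ch)                # same meaning in Python regexes
        elif ch == "[":
            end = keyword.find("]", i)
            if end == -1:
                out.append(re.escape(ch)) # unclosed bracket: literal "["
            else:
                out.append(keyword[i:end + 1])  # copy the set unchanged
                i = end
        else:
            out.append(re.escape(ch))     # everything else is literal
        i += 1
    return re.compile("".join(out), re.IGNORECASE)


def matches(keyword, segment):
    """True if the keyword is found anywhere in the URL segment."""
    return keyword_to_regex(keyword).search(segment) is not None


print(matches("^www.cnn", "www.cnn.com"))        # True
print(matches(".htm$", "index.html"))            # False
print(matches("g[eo]t", "forgot"))               # True
print(matches("^a*.gif$", "arrow.gif"))          # True
print(matches("ab+c", "ac"))                     # False
print(matches("ab[0-9]+.htm", "ab564562.htm"))   # True
```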

If you want to exclude a directory, such as /ads/, don’t forget to place "/" slashes around it. Otherwise other directories, such as /leads/, will also be excluded from downloading.
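The reason for the slashes follows from the "contains" rule. A minimal Python illustration, using plain substring checks and a hypothetical list of directory segments:

```python
# Hypothetical directory segments taken from three different URLs.
directories = ["/ads/", "/leads/", "/downloads/"]


def excluded_by(keyword):
    """Return the directories that contain the keyword (case-insensitive)."""
    return [d for d in directories if keyword.lower() in d.lower()]


print(excluded_by("ads"))    # ['/ads/', '/leads/', '/downloads/']
print(excluded_by("/ads/"))  # ['/ads/']
```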

You may specify the server name (starting with http://, ftp://, https://, rtsp://, pnm://, mms:// or mmst://) in the Directory keywords.

You may also specify full URLs in Filename keywords.

For example, the following keyword in the URL Filters | Filename | Custom configuration is allowed: http://www.srv.com/directory/img[0-9]/*.gif

URL Filters | Directory | Custom configuration can contain the following keyword: http://www.zdnet.*/newfiles/

You can also specify a keyword to look in link names (text between tags): link:someword

This keyword will work only in the URL Filters | Filename section.

The "tree" icon next to the keyword field shows the map of the downloaded site to choose a filename, directory or server as a keyword.

Note: the keyword check is case insensitive.

Tip: To filter URLs that end with a slash (no filename specified), use default.htm in URL Filters | Filename section.
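One way to picture this tip: when a URL has no filename, the Filename filter behaves as if the filename were default.htm. The following Python sketch rests on that assumption only; filename_for_filtering is a hypothetical helper, not part of the program.

```python
from urllib.parse import urlsplit

DEFAULT_NAME = "default.htm"


def filename_for_filtering(url):
    """Return the filename segment, or default.htm when the URL has none.

    Assumption: a URL that ends with a slash is filtered as if it
    pointed to default.htm.
    """
    filename = urlsplit(url).path.rsplit("/", 1)[-1]
    return filename or DEFAULT_NAME


for url in ("http://www.server.com/dir/",
            "http://www.other.com/default.htm",
            "http://www.server.com/file.html"):
    name = filename_for_filtering(url)
    # A keyword matches when the filename segment contains it.
    print(url, "default.htm" in name.lower())
```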

With the exception of the Protocol URL filter, the Custom Configuration sections of each URL Filter contain an Include and Exclude list, and you may enter any number of keywords into each of these lists. If a keyword matches the corresponding URL segment in an Include list, then the file will be downloaded. If a keyword matches the corresponding URL segment in an Exclude list, then the file will not be downloaded.

Server keyword examples:

Table 1. 

Keyword | Matches URLs | Doesn’t match URLs
zdnet.com | http://www.zdnet.com/, http://hotfiles.zdnet.com, http://zdnet.com.de | http://www.zdnet.de
www.zdnet | http://www.zdnet.com/, http://my-www.zdnet.de/ | http://hotfiles.zdnet.com, http://zdnet.com.de
w.zd | http://www.zdnet.com/, http://www.zdnet.de/ | http://hotfiles.zdnet.com, http://zdnet.com.de
www.*.com | http://www.zdnet.com/ | http://www.zdnet.de/

Directory keyword examples:

Table 2. 

Keyword | Matches URLs | Doesn’t match URLs
somedir | http://www.zdnet.com/somedir/file.htm, http://files.com.de/somedirectory/, http://zdnet.com.uk/mysomedirs/ | http://www.zdnet.de/someotherdir/
/path/dir | http://www.zdnet.com/path/dir/file.htm, http://www.zdnet.de/inside/path/directory/ | http://hotfiles.zdnet.com/mypath/dir/file.gif, http://zdnet.com.de/path/dir.txt
^/mydir/$ | http://www.zdnet.com/mydir/file.ext | http://hotfiles.zdnet.com/mydir/path/file.txt, http://zdnet.com.de/some/mydir/
http://www.s3.*/path | http://www.s3.com/path/file.htm, http://www.s3.jp/path/dir/image.jpg | file.htm, http://www.zdnet.com/path/

Filename keyword examples:

Table 3. 

Keyword | Matches URLs | Doesn't match URLs
somefile | http://www.zdnet.com/dir/somefile.htm, http://files.com.de/file.asp?somefile=val | http://www.zdnet.de/someotherfile
default.htm | http://www.server.com/dir/, http://www.other.com/default.htm | http://www.server.com/file.html
http://*.zdnet.*/path*/*.cgi | http://www.zdnet.com/path/file.cgi, http://www.zdnet.de/path/dir/other.cgi | http://hotfiles.zdnet.com/mypath/dir/file.gif, http://zdnet.com.de/path/dir.txt
/*folder*/*.zip | http://www.srv.com/dir/folder/other/file.zip, http://www.other.uk/folder/archive.zip | http://hotfiles.zdnet.com/folder/dir/file.gif, http://zdnet.com.de/dir/file.zip
link:Football | Any link that has the word Football in its text |

The Protocol URL Filter Custom Configuration section allows selection or deselection of each protocol directly.

Some useful keys to manage URL Filters keywords lists:

Ctrl-L loads keywords from a text file to the currently selected list.

Ctrl-M moves all keywords in the current list to the other list (for example, from Included to Excluded).

Ctrl-S adds the starting server, directory or filename to the selected keywords list. This is useful, for example, when you want to allow loading only from the starting directory in the URL Filters | Directories | Custom configuration.

Ctrl-C copies all keywords to Windows clipboard.

Ctrl-A checks all keywords in the list.

Ctrl-N unchecks all keywords.

Press F2 key on a selected keyword to edit it in-place.

Note: URL Filters settings do not apply to files whose File Filters category does not have "Load using URL Filters setting" selected in its Location box.

You can also use URL Macros in the URL Filters keywords. For example:

filename{:0day}.htm

Note: Portable Offline Browser skips a URL if any of the URL Filters or File Filters settings do not allow it to be loaded (even if all other Project settings allow the URL to be loaded).