What is Data Mining?
Data mining is the processing of data to extract useful information. Examples include:
Process a web site to extract product catalog and cost information, which can then be used to compare prices between different suppliers.
Process a web site to extract email addresses or web URLs.
Harvest the data on a web site for your own purposes.
The extracted data is formatted so that it can be easily loaded into a database for further analysis.
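As an illustration of the email and URL example above, here is a minimal Python sketch. It is not part of Portable Offline Browser or TextPipe; the function names, the folder argument, and the simplified patterns are all assumptions made for this example, and real addresses and URLs can be more varied than these patterns allow.

```python
import re
from pathlib import Path

# Deliberately simple patterns (assumptions for this sketch, not a full standard).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_contacts(text):
    """Return (emails, urls) found in text, each sorted and de-duplicated."""
    emails = sorted(set(EMAIL_RE.findall(text)))
    urls = sorted(set(URL_RE.findall(text)))
    return emails, urls


def extract_from_folder(folder):
    """Scan files matching *.htm* under a hypothetical download folder."""
    emails, urls = set(), set()
    for path in Path(folder).rglob("*.htm*"):
        # Ignore undecodable bytes so one odd page does not stop the scan.
        text = path.read_text(encoding="utf-8", errors="ignore")
        found_emails, found_urls = extract_contacts(text)
        emails.update(found_emails)
        urls.update(found_urls)
    return sorted(emails), sorted(urls)
```

Calling extract_from_folder on the folder where a Project was downloaded would return two sorted lists, which could then be written to a file and loaded into a database.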
How does it work in Portable Offline Browser?
To data mine a Web site, create a Project and download the desired site to your hard disk. When the download is complete, select the Project and choose Data Mining on the Tools tab of the Ribbon. Portable Offline Browser will then use an external utility, TextPipe, to process the downloaded Web site.
How can TextPipe help?
TextPipe can generate an extract from any text data source, including Web sites. It can also perform data cleansing or other additional processing, for example:
add a header record (e.g. provide column titles for .CSV files)
remove unwanted data
replace specific text
convert line endings to DOS, Unix or Mac format
expand tabs
fix capitalization
convert from EBCDIC to ASCII
remove multiple whitespace
remove columns, lines or fields
remove duplicate records
sort
extract email addresses from specific fields
discard records matching a pattern
and much more
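To make a few of the operations above concrete, here is a minimal Python sketch of what they do to the data. It does not use TextPipe and does not reflect how TextPipe is implemented; the function name clean_records and its parameters are hypothetical. It normalizes line endings, collapses repeated whitespace, removes duplicate records, discards records matching a pattern, and adds a header record.

```python
import re


def clean_records(raw_text, header, discard_pattern=None):
    """Return cleaned text: one record per line, preceded by a header line."""
    # Normalize DOS (\r\n) and old Mac (\r) line endings to Unix (\n).
    text = raw_text.replace("\r\n", "\n").replace("\r", "\n")
    discard = re.compile(discard_pattern) if discard_pattern else None

    seen = set()
    kept = []
    for line in text.split("\n"):
        # Collapse runs of spaces and tabs into one space and trim the ends.
        line = re.sub(r"[ \t]+", " ", line).strip()
        if not line or line in seen:
            continue  # skip blank lines and duplicate records
        if discard and discard.search(line):
            continue  # skip records matching the unwanted pattern
        seen.add(line)
        kept.append(line)
    return "\n".join([header] + kept) + "\n"
```

For example, clean_records("widget,  9.99\r\ngadget,4.50\rwidget, 9.99\n# comment\n", "name,price", r"^#") returns "name,price\nwidget, 9.99\ngadget,4.50\n".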
You may find more information about TextPipe Pro at the Web site: http://www.datamystic.com/offlineexplorer.html
You may download TextPipe Pro from: http://www.datamystic.com/textpipepro.exe
You can also run TextPipe automatically when a Project download is complete. Simply add the following line to the URLs field of the Project:
TextPipe=c:\path\filter_filename.fll
To make TextPipe quit after processing downloaded files, add ;/Q at the end:
TextPipe=c:\path\filter_filename.fll;/Q