
Thanks. I have managed to get searching working as follows, although it still needs refinement. Access to our website is blocked by a firewall :(

As far as I can tell, no one has a perfect solution for documents in various formats. My steps:

1. Every night, index all searchable files and generate database files to search against. (People may post documents and bulletins during the day, many of them very large.)

2. When the engine is launched, search through the files generated above.

Unfortunately, there seem to be few tools for converting binary files to plain text, and searchable files must be plain text, such as *.html or *.txt files.

3. A search engine like Rolia's seems to be able to search only .html files.
Hmm :). This is a limitation. The catdoc and xls2csv tools may work, and I am trying others as well.
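The nightly-index-then-search scheme in steps 1 and 2 can be sketched roughly as below. This is a minimal Python illustration under stated assumptions, not the setup described in this thread (which was planned around Perl 5.004): it assumes catdoc, xls2csv and pdftotext are on the PATH and print plain text to stdout, strips HTML tags with a crude regex, and stores the word index as a JSON file. The names `extract_text`, `build_index`, `save_index` and `search` are hypothetical.

```python
import json
import os
import re
import subprocess
from collections import defaultdict

# Assumed converter command lines; each one prints plain text to stdout.
CONVERTERS = {
    ".doc": lambda p: ["catdoc", p],
    ".xls": lambda p: ["xls2csv", p],
    ".pdf": lambda p: ["pdftotext", p, "-"],  # "-" sends the text to stdout
}
TEXT_EXTENSIONS = {".txt", ".html", ".htm"}
TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"[a-z0-9]+")


def extract_text(path):
    """Return the plain text of one file, or None if it cannot be converted."""
    ext = os.path.splitext(path)[1].lower()
    if ext in TEXT_EXTENSIONS:
        with open(path, encoding="utf-8", errors="replace") as f:
            text = f.read()
        if ext == ".txt":
            return text
        return TAG_RE.sub(" ", text)  # crude tag stripping for .html/.htm
    if ext in CONVERTERS:
        try:
            result = subprocess.run(
                CONVERTERS[ext](path),
                capture_output=True, text=True, errors="replace", check=True,
            )
        except (OSError, subprocess.CalledProcessError):
            return None  # converter missing or failed: skip this file
        return result.stdout
    return None  # unsupported format


def build_index(root):
    """Step 1 (nightly): map each lowercase word to the files containing it."""
    index = defaultdict(set)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            text = extract_text(path)
            if text is None:
                continue
            for word in WORD_RE.findall(text.lower()):
                index[word].add(path)
    return {word: sorted(paths) for word, paths in index.items()}


def save_index(index, index_file):
    """Write the index to disk so searches never touch the documents."""
    with open(index_file, "w", encoding="utf-8") as f:
        json.dump(index, f)


def search(index_file, query):
    """Step 2 (query time): return the files containing every query word."""
    with open(index_file, encoding="utf-8") as f:
        index = json.load(f)
    words = WORD_RE.findall(query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)
```

Running `build_index` and `save_index` from cron (for example on a `0 2 * * *` schedule, i.e. daily at 02:00) rebuilds the index overnight, so daytime searches only read the precomputed file. Files whose converter is missing or fails are skipped rather than aborting the whole run.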

Any ideas? Any demo sites?

I'd appreciate it.

Thanks

Replies, comments and Discussions:

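  • Search engine: does anyone have development experience? Particularly in indexing *.doc, *.xls, *.pdf
    None of the free engines fully meets our requirements; they would have to be modified or rewritten.
    What do you think of pdftotext, catdoc, xls2csv ...? Are there better recommendations?
    I'm planning to use perlfect (ksearch, htdig) + pdftotext + catdoc + xls2csv.
    I'm not at all confident about this, and time is tight.
    Machine: DEC Unix + Perl 5.004; medium-size site
    Thanks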
    • Load it into MySQL and enable full-text search.
    • I built a search engine before. Let me know your website's name and I can show you a demo.
      • Thanks. I have managed to get searching working as follows, although it still needs refinement. Access to our website is blocked by a firewall :(
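The first reply's suggestion, loading the extracted text into a database and letting its full-text index do the searching, can be sketched as follows. Python's standard library has no MySQL driver, so this sketch swaps in SQLite's FTS5 full-text module (via `sqlite3`) as a stand-in for MySQL; it assumes the local SQLite build has FTS5 enabled, as many builds do. The table name `docs` and the functions `open_store`, `add_document` and `fts_search` are hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a full-text table with one row per converted document."""
    conn = sqlite3.connect(path)
    # FTS5 virtual table: 'path' is stored but not indexed; 'body' is full-text indexed.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts5(path UNINDEXED, body)"
    )
    return conn


def add_document(conn, path, text):
    """Nightly job: replace any previously stored text for this file."""
    conn.execute("DELETE FROM docs WHERE path = ?", (path,))
    conn.execute("INSERT INTO docs (path, body) VALUES (?, ?)", (path, text))
    conn.commit()


def fts_search(conn, query):
    """Return paths whose text matches all query terms, best matches first."""
    rows = conn.execute(
        "SELECT path FROM docs WHERE docs MATCH ? ORDER BY rank", (query,)
    ).fetchall()
    return [row[0] for row in rows]
```

The `text` argument would be the output of catdoc, xls2csv or pdftotext for each file. In MySQL itself, the corresponding pieces are a `FULLTEXT` index on the text column and a `MATCH(body) AGAINST(...)` query. FTS5 gives some characters special meaning in queries, so raw user input may need quoting before it is passed to `fts_search`.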