SkillAgentSearch skills...

Mycelium

An open source information retrieval system written in C++11 and Python. Aspires to be an alternative to Nutch / Lucene. It uses MongoDB as an storage engine.

Install / Use

/learn @larroy/Mycelium
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

The Mycelium Information retrieval system

Check the latest up-to-date user documentation at http://pedro.larroy.com/mycelium/sphinx/

For the impatient

$ ./bootstrap.sh $ ./build.py $ scons $ build/release/crawler $ echo 'http://google.com' | nc localhost 1024

Status of the modules / Features

  • Crawler:

    Feature complete.

  • Tokenizer/Stemmer:

    Work in progress.

  • Inverted index:

    TODO

  • Search frontend:

    TODO

How to use it

  • Initialize the git submodules:

$ git submodule init $ git submodule update

  • Build the 3rd-party libraries:

$ ./build.py

  • Compile the sources with SCons:

$ scons

  • Alternatively you might build with system curl:

$ scons --system_curl

  • But as said previously, synchronous DNS resolving will harm the performance and block. So it's not recommended unless curl has been compiled with c-ares, as it will be done by build.py

Running

  • The environment variables that affect some configuration parameters are:

    • Specific for the crawler: MYCELIUM_CRAWLER_PORT: port to listen for urls MYCELIUM_CRAWLER_PARALLEL: number of parallel crawlers to run

    • General for all the tools that interact with the DB:

    MYCELIUM_DB_HOST: mongodb host for storing the documents, default is "localhost" MYCELIUM_DB_NS: database.collection, defaults to "mycelium.crawl"

Dependencies

The software is build on debian / ubuntu systems, although it should be fairly easy to port to other platforms.

Some (might be incomplete) list of libraries that are required:

  • z
  • boost_filesystem
  • boost_system
  • boost_regex
  • log4cxx
  • pthread
  • curl
  • event
  • ssl
  • libidn11-dev

Other software:

  • scons
  • flex
  • autoconf (for building curl and c-ares)

Troubleshoting

AttributeError: 'SConsEnvironment' object has no attribute 'CXXFile': File "/home/piotr/devel/mycelium/SConstruct", line 210: variant_dir='build/{0}'.format(build), duplicate=0) File "/usr/lib/scons/SCons/Script/SConscript.py", line 614: return method(*args, **kw) File "/usr/lib/scons/SCons/Script/SConscript.py", line 551: return _SConscript(self.fs, *files, **subst_kw) File "/usr/lib/scons/SCons/Script/SConscript.py", line 260: exec file in call_stack[-1].globals File "/home/piotr/devel/mycelium/src/SConscript", line 11: env.CXXFile(target='Robots_flex.cc',source='robots.ll')

To solve this problem make sure flex is installed.

View on GitHub
GitHub Stars87
CategoryDevelopment
Updated2y ago
Forks12

Languages

C++

Security Score

65/100

Audited on Mar 23, 2024

No findings