
Last pushed: 2 years ago
Scientific Name Finder in texts

NetiNeti -- Scientific Names Discovery Tool

Setting up VirtualEnv (Linux)

  • install virtualenv (easy_install virtualenv or pip install virtualenv)
  • use the as the environment bootstrap (python ~/virtualenvs/neti)
  • this creates a local environment for the netineti project with all the dependencies installed
  • dependencies => pyyaml, nltk, nose, scikit-learn
  • source ~/virtualenvs/neti/bin/activate
  • use netineti


  • nltk >= 2.09b3. Run to get the necessary corpus.
    If you don't know which data and packages to choose, just download all of them.
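
Once the environment is activated, the dependency list above can be checked with a short script. This is a minimal sketch, not part of NetiNeti: the helper name missing_modules is hypothetical, and the import names assume that pyyaml imports as "yaml" and scikit-learn as "sklearn".

```python
import importlib

# Import names for the dependencies listed above. pyyaml is imported as
# "yaml" and scikit-learn as "sklearn"; nltk and nose use their own names.
REQUIRED_MODULES = ("yaml", "nltk", "nose", "sklearn")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that fail to import (hypothetical helper)."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception:
            # Not installed, or installed but broken: either way it is unusable.
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

An empty result only shows that the modules import; it does not check the nltk version or whether the corpus data has been downloaded.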


Neti-Neti scientific name finder.
Input: Any text preferably in English
Output: A list of Scientific Names in the text

To run it (first add a config file named neti_http_config.cfg to the config folder):

$ python
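
Since the run step depends on the config file being in place, a quick pre-flight check can save a failed start. The sketch below only tests that the file exists. It assumes the config folder sits directly under the project root, and the helper names config_path and config_exists are hypothetical. The contents of the file are not described here, so the sketch does not try to parse it.

```python
import os

# File and folder names taken from the instructions above.
CONFIG_DIR = "config"
CONFIG_NAME = "neti_http_config.cfg"


def config_path(project_root="."):
    """Build the expected config path (assumes config/ is under the project root)."""
    return os.path.join(project_root, CONFIG_DIR, CONFIG_NAME)


def config_exists(project_root="."):
    """Return True if the expected config file is present."""
    return os.path.isfile(config_path(project_root))


if __name__ == "__main__":
    if config_exists():
        print("Found " + config_path())
    else:
        print("Missing " + config_path() + "; create it before running.")
```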

To use webservice:

$ ruby webservices/ruby/taxon_finder_web_service.rb

(use your server name instead of localhost:4567)





Files                            Descriptions
                                 this file
src/data/black_list.txt          "black list" for pre-filtering: common words, used to decrease the number of false positives
src/data/white_list.txt          big training list, used by default
src/data/no_names.txt            training text without scientific names, used for negative examples
src/data/names_in_context.txt    training list of names, each shown in the context of a sentence
src/data/test.txt                American Seashells book (with scientific names) for testing purposes
src/                             Machine Learning based approach to find scientific names
src/                             miscellaneous helper functions
src/                             Scientific Name classifier: given a name-like string, it accepts or rejects it as a scientific name
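
To illustrate the role of the black list, here is a minimal sketch of pre-filtering candidate strings against a list of common words before they reach a classifier. It is not NetiNeti's actual implementation. The function names are hypothetical, and it assumes that the list file holds one word per line, that matching is case-insensitive, and that a candidate is dropped when any of its words is black-listed.

```python
def load_word_list(path):
    """Read one entry per line, lower-cased, skipping blank lines (assumed format)."""
    with open(path) as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def prefilter(candidates, black_list):
    """Keep only candidates that contain no black-listed word (hypothetical helper)."""
    kept = []
    for candidate in candidates:
        words = candidate.lower().split()
        if not any(word in black_list for word in words):
            kept.append(candidate)
    return kept


if __name__ == "__main__":
    # Tiny in-memory list for illustration; a real run would load the file instead.
    black_list = {"the", "and", "new"}
    candidates = ["Homo sapiens", "New York", "Batrachochytrium dendrobatidis"]
    print(prefilter(candidates, black_list))
```

In this example "New York" is dropped because "new" is on the list, while the two binomial names pass through. Filtering before classification keeps obvious non-names away from the classifier, which is how a list of common words can reduce false positives.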



docker run -d -p --name netineti gnames/netineti

Using from (python) server:

  from netineti import NetiNeti, NetiNetiTrain

  # Training on the default (long) training set takes about 20 minutes on a slow machine.
  nnt = NetiNetiTrain()
  # To use a different training text, pass it as an argument:
  # nnt = NetiNetiTrain("species_train.txt")

  # Create the name finder from the trained model.
  nn = NetiNeti(nnt)

Example URLs to try:

New Species


Note: offsets do not work in this version.
