Named Entity Recognition with Stanford NER and Ruby

Is Named Entity Recognition a “solved problem”?

You know that feeling you get when a computer acts even the slightest bit human? I felt it the first time I realized my computer could recognize people’s names, locations, and company names in text. This is called Named Entity Recognition (NER). The accuracy and reliability of NER vary depending on the trained language models and the domain of the text. Some call NER a “solved problem” and others say it is far from being solved. I think it all depends on user expectations, the purpose for using it, and the quality of the models used for NER tasks.

Quickly testing NER across multiple domain contexts

I put together this short tutorial as a demonstration of Stanford’s NER Server and Ruby. To quickly test NER across a variety of domains, we’ll use web URLs as the data sources for processing.

Getting started

Clone ‘ruby-ner’ from GitHub
$ git clone https://github.com/mblongii/ruby-ner.git
$ cd ruby-ner

Download the Stanford Named Entity Recognizer (NER) software
$ curl -O http://nlp.stanford.edu/software/stanford-ner-2012-04-07.tgz
$ tar xvfz stanford-ner-2012-04-07.tgz

Run NER as a server on port 8080
$ java -mx1000m -cp stanford-ner-2012-04-07/stanford-ner.jar edu.stanford.nlp.ie.NERServer -loadClassifier stanford-ner-2012-04-07/classifiers/english.muc.7class.distsim.crf.ser.gz -port 8080 -outputFormat inlineXML &

Install the required Ruby gems
$ bundle install

Run the Ruby script
$ ruby get_named_entities.rb

LOCATIONS:
San Francisco
ORGANIZATIONS:
Google
PEOPLE:
Mike Long
DATES:
May 29th

Try passing a URL to the script
$ ruby get_named_entities.rb http://cnn.com

LOCATIONS:
Illinois
ORGANIZATIONS:
FBI
PERCENTS:
-0.59 %
MONEY:
$90M
PEOPLE:
Estrella Carrera
DATES:
Saturday
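
Before any text reaches the NER server, a page has to be fetched and reduced to plain text. The script's own approach isn't shown in this post, so the following is only a rough sketch using Ruby's standard library. The helper names `html_to_text` and `fetch_text` are made up for illustration, and the regex-based tag stripping is a crude stand-in for a real HTML parser.

```ruby
require "net/http"
require "uri"
require "cgi"

# Hypothetical helper: crude HTML-to-text reduction with regular expressions.
def html_to_text(html)
  text = html.gsub(%r{<(script|style)\b.*?</\1>}mi, " ") # drop script and style blocks
  text = text.gsub(/<[^>]+>/, " ")                        # drop the remaining tags
  CGI.unescapeHTML(text).gsub(/\s+/, " ").strip           # decode basic entities, collapse whitespace
end

# Hypothetical helper: fetch a page and return its text (no redirect handling).
def fetch_text(url)
  html_to_text(Net::HTTP.get(URI(url)))
end
```

Calling `fetch_text("http://cnn.com")` would then give a single line of text that can be handed to the NER server.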

How it works

The NER server loads the model english.muc.7class.distsim.crf.ser.gz, which was trained across a variety of corpora and is fairly robust across domains. The entity classes in this seven-class model are: location, time, organization, percent, money, person, and date.
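
To make this a bit more concrete, here is a minimal sketch of how a Ruby client might talk to the server and group the results. It is not the code from the repo. It assumes the server started above is listening on localhost:8080, that it answers one line of text per connection and then closes it, and that the reply uses the inlineXML format requested on the command line (tags such as `<PERSON>…</PERSON>`). The names `tag_text` and `extract_entities` are hypothetical.

```ruby
require "socket"

# Hypothetical helper: send one line of text to the NER server and return the reply.
# Assumes the server replies once and then closes the connection.
def tag_text(text, host: "localhost", port: 8080)
  TCPSocket.open(host, port) do |socket|
    socket.puts(text.gsub(/\s+/, " "))     # flatten newlines so the text is a single line
    socket.read.force_encoding("UTF-8")    # read until the server closes the connection
  end
end

# Hypothetical helper: group inlineXML-tagged entities by class,
# e.g. { "PERSON" => ["Mike Long"] }, skipping duplicates.
def extract_entities(tagged)
  entities = Hash.new { |hash, key| hash[key] = [] }
  tagged.scan(%r{<([A-Z]+)>(.*?)</\1>}m) do |label, value|
    name = value.strip
    entities[label] << name unless entities[label].include?(name)
  end
  entities
end
```

With the server running, `extract_entities(tag_text("Mike Long moved to San Francisco."))` would return a hash keyed by entity class, which is easy to print in the LOCATIONS / PEOPLE style shown earlier.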

Take a look at get_named_entities.rb and feel free to give feedback or ask questions :)

What are some interesting uses of Named Entity Recognition?
