Jul 8, 2013

Text Mining: Named Entity Detection

An interesting task in text mining is detecting entities in text. Such an entity could be a person, a company, a product, or a location. Since an entity is associated with a specific name, it is also called a named entity. For example, the following text contains 3 named entities:
       Apple has hired Paul Deneve as vice president, reporting to CEO Tim Cook.
The first term “Apple” indicates a company, and the second and third are persons.

Named entity detection, also known as named entity recognition (NER), is an important component of social media analysis, where it helps us understand user sentiment about specific products. NER is also important for product search at e-commerce companies, where it helps us understand which products a user's search query refers to.

To map each name to an entity, one solution is to use a dictionary of known names. Unfortunately, this approach has two serious problems. The first problem is that the dictionary is never complete. New companies are created and new products are sold every day, and it is hard to keep track of all the new names. The second problem is ambiguity in associating a name with an entity. The following example illustrates this:
As Washington politicians argue about the budget reform, it is a good time to look back at George Washington’s time.
In this text, the first mention of “Washington” refers to a city, while the second mention refers to a person. The distinction between these two entities comes from their context.
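The dictionary approach and its ambiguity problem can be sketched in a few lines of Python. This is only an illustration: the dictionary contents, the entity type labels, and the function name `lookup_entities` are assumptions made up for this example, not part of any real library.

```python
# Hypothetical toy dictionary; a real one would hold far more names.
NAME_DICTIONARY = {
    "Apple": "COMPANY",
    "Paul Deneve": "PERSON",
    "Tim Cook": "PERSON",
    "Washington": "LOCATION",  # one fixed type per name, whatever the context
}


def lookup_entities(text, dictionary=NAME_DICTIONARY):
    """Return (name, entity_type, position) for each dictionary name found in text."""
    found = []
    for name, entity_type in dictionary.items():
        start = text.find(name)
        while start != -1:
            found.append((name, entity_type, start))
            start = text.find(name, start + len(name))
    # Report the matches in the order they appear in the text.
    return sorted(found, key=lambda item: item[2])


apple_entities = lookup_entities(
    "Apple has hired Paul Deneve as vice president, reporting to CEO Tim Cook."
)
washington_entities = lookup_entities(
    "As Washington politicians argue about the budget reform, "
    "it is a good time to look back at George Washington's time."
)
```

The first sentence works as expected, but in the second sentence both mentions of “Washington” receive the same type, because a plain lookup never looks at the surrounding words.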

To resolve ambiguity in entity mapping, we can create certain rules to utilize the context. For example, we can create the following rules:
  1. When ‘Washington’ is followed by ‘politician’, then it refers to a city.
  2. When ‘Washington’ is preceded by ‘in’, then it refers to a city.
  3. When ‘Washington’ is preceded by ‘George’, then it refers to a person.
But the number of such rules quickly gets out of hand. For example, each of the following phrases would require a different rule: “Washington mentality”, “Washington atmosphere”, “Washington debate” as well as “Washington biography” and “Washington example”. The richness of natural language makes the number of rules explode, and the rules are still susceptible to exceptions.
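The three rules above could be written down roughly as follows. This is a minimal sketch: the function name `classify_washington`, the label strings, and the simple regular-expression tokenizer are assumptions chosen for illustration.

```python
import re


def classify_washington(text):
    """Label each mention of 'Washington' using the three hand-written rules."""
    # Crude tokenizer: runs of ASCII letters only, so "Washington's" -> "Washington", "s".
    tokens = re.findall(r"[A-Za-z]+", text)
    labels = []
    for i, token in enumerate(tokens):
        if token != "Washington":
            continue
        previous_word = tokens[i - 1].lower() if i > 0 else ""
        next_word = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
        if next_word.startswith("politician"):   # rule 1
            labels.append("CITY")
        elif previous_word == "in":               # rule 2
            labels.append("CITY")
        elif previous_word == "george":           # rule 3
            labels.append("PERSON")
        else:
            labels.append("UNKNOWN")              # no rule covers this context
    return labels


SENTENCE = (
    "As Washington politicians argue about the budget reform, "
    "it is a good time to look back at George Washington's time."
)
sentence_labels = classify_washington(SENTENCE)
uncovered_labels = classify_washington("The Washington mentality is hard to change.")
```

The example sentence is handled correctly, but a phrase such as “Washington mentality” falls through to `UNKNOWN` until someone writes yet another rule for it.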

Instead of manually creating rules, we can apply machine learning. The advantage of machine learning is that it derives patterns automatically from examples, so no rule needs to be written by hand. The machine learning algorithm takes a set of training examples and churns out its own model, which plays a role comparable to the rules. If we get new training data, we can re-train the machine learning algorithm and generate a new model quickly.
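To make the idea of learning from examples concrete, here is a toy sketch in Python. It is not the approach this blog goes on to describe, and it is far simpler than a real NER learner: it only counts how often each context word appears with each label and lets those counts vote. The training examples, feature names, and function names are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical hand-labeled examples for "Washington": (word before, word after, label).
TRAINING_EXAMPLES = [
    ("in", "yesterday", "CITY"),
    ("as", "politicians", "CITY"),
    ("the", "debate", "CITY"),
    ("george", "was", "PERSON"),
    ("president", "signed", "PERSON"),
    ("george", "biography", "PERSON"),
]


def extract_features(previous_word, next_word):
    """Turn the context of a mention into simple string features."""
    return [f"prev={previous_word.lower()}", f"next={next_word.lower()}"]


def train(examples):
    """Count how often each context feature co-occurs with each label."""
    model = defaultdict(Counter)
    for previous_word, next_word, label in examples:
        for feature in extract_features(previous_word, next_word):
            model[feature][label] += 1
    return model


def predict(model, previous_word, next_word):
    """Pick the label with the most votes from the features seen in training."""
    votes = Counter()
    for feature in extract_features(previous_word, next_word):
        votes.update(model.get(feature, {}))
    if not votes:
        return "UNKNOWN"  # this context never appeared in the training data
    return votes.most_common(1)[0][0]


model = train(TRAINING_EXAMPLES)

# New training data only requires re-training, not writing a new rule.
retrained_model = train(TRAINING_EXAMPLES + [("the", "mentality", "CITY")])
```

Here `predict(model, "George", "time")` returns a label learned from the examples, while `predict(model, "of", "mentality")` is unknown to the first model and becomes answerable after re-training with one more example.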

How does the machine learning approach work? I will discuss it in the next post. 


  1. Thank you for this post. Very interesting.

  2. I think it is amazing that the search engine will try to interpret and answer my question. It sounds like the program that was playing Jeopardy. What was it called? Watson?
    As the technology grows it will reduce the really odd search results that pop up.

  3. Interesting post. I will wait for your next post on machine learning.
