Jul 8, 2013

Text Mining: Named Entity Detection

An interesting task in text mining is detecting entities in text. Such an entity could be a person, a company, a product, or a location. Since each entity is associated with a specific name, it is also called a named entity. For example, the following text contains three named entities:
       Apple has hired Paul Deneve as vice president, reporting to CEO Tim Cook.
The first name, “Apple”, refers to a company; the second and third, “Paul Deneve” and “Tim Cook”, refer to persons.

Named entity detection, also known as named entity recognition (NER), is an important component of social media analysis, where it helps us understand user sentiment about specific products. NER is also important for product search at e-commerce companies, where it helps us interpret search queries that mention particular products.

To map each name to an entity, one solution is to use a dictionary of known names. Unfortunately, this approach has two serious problems. The first is that the dictionary is never complete: new companies are created and new products are sold every day, and it is hard to keep track of all the new names. The second is ambiguity: the same name can refer to more than one entity. The following example illustrates this:
As Washington politicians argue about the budget reform, it is a good time to look back at George Washington’s time.
In this text, the first mention of “Washington” refers to a city, while the second mention refers to a person. The only thing that distinguishes the two entities is their context.
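Both problems show up in even a minimal sketch of the dictionary approach. The `GAZETTEER` table and the type labels below are made up for illustration; they are not taken from any real name list.

```python
# Hypothetical dictionary: maps a name to every entity type it may denote.
GAZETTEER = {
    "Apple": ["COMPANY"],
    "Tim Cook": ["PERSON"],
    "Washington": ["CITY", "PERSON"],
}

def lookup(name):
    """Return the candidate entity types for a name (empty list if unknown)."""
    return GAZETTEER.get(name, [])

print(lookup("Apple"))        # a single candidate
print(lookup("Paul Deneve"))  # unknown name: the dictionary is incomplete
print(lookup("Washington"))   # two candidates: the dictionary alone cannot choose
```

The lookup never sees the surrounding words, so it has no way to decide between the two candidates for “Washington”.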

To resolve ambiguity in entity mapping, we can write rules that use the context. For example, we can create the following rules:
  1. When ‘Washington’ is followed by ‘politician’, then it refers to a city.
  2. When ‘Washington’ is preceded by ‘in’, then it refers to a city.
  3. When ‘Washington’ is preceded by ‘George’, then it refers to a person.
But the number of such rules quickly gets out of hand. For example, each of the following phrases would need a different rule: “Washington mentality”, “Washington atmosphere”, “Washington debate”, as well as “Washington biography” and “Washington example”. The richness of natural language makes the number of rules explode, and the rules still remain susceptible to exceptions.
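The three rules above can be written down directly, which also makes their limits visible. This is only a sketch: the function names and the crude whitespace tokenizer are assumptions for illustration, and rule 1 is matched loosely so that the plural “politicians” also fires.

```python
def classify_washington(tokens, i):
    """Apply the three hand-written rules to the token at index i.

    Returns "CITY", "PERSON", or None when no rule fires.
    """
    prev_word = tokens[i - 1].lower() if i > 0 else ""
    next_word = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
    if next_word.startswith("politician"):  # rule 1
        return "CITY"
    if prev_word == "in":                   # rule 2
        return "CITY"
    if prev_word == "george":               # rule 3
        return "PERSON"
    return None

def label_mentions(text):
    """Label every token that starts with 'Washington' in the text."""
    # Crude tokenizer: drop commas and periods, then split on whitespace.
    tokens = text.replace(",", " ").replace(".", " ").split()
    return [classify_washington(tokens, i)
            for i, tok in enumerate(tokens) if tok.startswith("Washington")]

sentence = ("As Washington politicians argue about the budget reform, "
            "it is a good time to look back at George Washington's time.")
print(label_mentions(sentence))
print(label_mentions("The Washington debate continues."))
```

The first call labels the two mentions as a city and a person. The second call returns `None` for its mention because none of the three rules covers “Washington debate”, so yet another rule would have to be added by hand.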

Instead of manually creating rules, we can apply machine learning. The advantage of machine learning is that it derives patterns automatically from examples, so no rule needs to be written by hand. The machine learning algorithm takes a set of training examples and churns out its own model, which plays a role comparable to the rules. If we get new training data, we can re-train the algorithm and generate a new model quickly.
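To make the idea concrete before the next post, here is a toy sketch of learning from examples. It is not the algorithm this series will describe: it is a deliberately simple count-based classifier, and the training examples, feature names, and functions `train` and `predict` are all invented for illustration. Each example records the word before and the word after a “Washington” mention together with the correct label.

```python
from collections import Counter, defaultdict

# Tiny hand-labeled training set (invented for illustration):
# (word before, word after, label of the "Washington" mention)
TRAINING = [
    ("in", "today", "CITY"),
    ("in", "politicians", "CITY"),
    ("the", "politicians", "CITY"),
    ("george", "was", "PERSON"),
    ("george", "biography", "PERSON"),
    ("president", "was", "PERSON"),
]

def train(examples):
    """Count how often each context feature occurs with each label."""
    model = defaultdict(Counter)
    for prev_word, next_word, label in examples:
        model[("prev", prev_word)][label] += 1
        model[("next", next_word)][label] += 1
    return model

def predict(model, prev_word, next_word):
    """Sum the label counts of the observed features; None if all are unseen."""
    votes = Counter()
    for feature in (("prev", prev_word), ("next", next_word)):
        votes.update(model.get(feature, {}))
    return votes.most_common(1)[0][0] if votes else None

model = train(TRAINING)
print(predict(model, "as", "politicians"))  # context seen with CITY examples
print(predict(model, "george", "time"))     # context seen with PERSON examples
print(predict(model, "a", "example"))       # nothing seen: no prediction

# New training data only requires re-training, not new hand-written rules.
model = train(TRAINING + [("a", "example", "PERSON")])
print(predict(model, "a", "example"))
```

The counts stored in `model` play the role of the rules, but they are produced from the examples rather than written by hand. A real system would use far more training data and richer features than the two neighboring words.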

How does the machine learning approach work? I will discuss it in the next post. 
