Apple has hired Paul Deneve as vice president, reporting to CEO Tim Cook.
The first name, “Apple”, refers to a company, while the second and third, “Paul Deneve” and “Tim Cook”, refer to people.
Named entity recognition (NER) is an important component of social media analysis. It helps us understand user sentiment about specific products. NER is also important for product search at e-commerce companies, where it helps us understand which products a user's search query refers to.
To map each name to an entity, one solution is to use a dictionary of special names. Unfortunately, this approach has two serious problems. The first problem is that the dictionary is never complete. New companies are created and new products are sold every day, and it is hard to keep track of all the new names. The second problem is that the same name can refer to different entities. The following example illustrates this:
As Washington politicians argue about the budget reform, it is a good time to look back at George Washington’s time.
In this text, the first mention of “Washington” refers to a city, while the second mention refers to a person. Only the context tells the two entities apart.
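Both dictionary problems can be reproduced in a few lines. The following is a minimal sketch, not a real NER system: `ENTITY_DICT` and `lookup_entities` are hypothetical names made up for this illustration, and the dictionary holds only two hand-picked entries.

```python
import re

# Hypothetical, hand-made dictionary. A real one would be far larger,
# and still incomplete.
ENTITY_DICT = {
    "Apple": "COMPANY",
    "Washington": "CITY",
}

def lookup_entities(text):
    """Return (token, label) for every token found in the dictionary."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return [(t, ENTITY_DICT[t]) for t in tokens if t in ENTITY_DICT]

sentence = ("As Washington politicians argue about the budget reform, "
            "it is a good time to look back at George Washington’s time.")

# Ambiguity: both mentions get the same label, although the second is a person.
print(lookup_entities(sentence))
# [('Washington', 'CITY'), ('Washington', 'CITY')]

# Incompleteness: names missing from the dictionary are silently skipped.
print(lookup_entities("Apple has hired Paul Deneve as vice president."))
# [('Apple', 'COMPANY')]
```

A lookup of this kind looks at one token at a time and never at its neighbors, which is why it cannot separate the two mentions.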
To resolve ambiguity in entity mapping, we can write rules that make use of the context. For example, we can create the following rules:
- When ‘Washington’ is followed by ‘politician’, then it refers to a city.
- When ‘Washington’ is preceded by ‘in’, then it refers to a city.
- When ‘Washington’ is preceded by ‘George’, then it refers to a person.
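The three rules above can be written down directly as code. This is a rough sketch under a few assumptions: the function name `classify_washington` is invented for this example, "followed by ‘politician’" is read as "the next word starts with politician", and a mention that matches no rule is labeled `UNKNOWN`.

```python
import re

def classify_washington(text):
    """Apply the three context rules to every mention of 'Washington'."""
    tokens = re.findall(r"[A-Za-z]+", text)
    labels = []
    for i, token in enumerate(tokens):
        if token != "Washington":
            continue
        prev_word = tokens[i - 1] if i > 0 else ""
        next_word = tokens[i + 1] if i + 1 < len(tokens) else ""
        if prev_word == "George":
            labels.append("PERSON")    # preceded by 'George'
        elif prev_word.lower() == "in":
            labels.append("CITY")      # preceded by 'in'
        elif next_word.lower().startswith("politician"):
            labels.append("CITY")      # followed by 'politician(s)'
        else:
            labels.append("UNKNOWN")   # no rule applies
    return labels

sentence = ("As Washington politicians argue about the budget reform, "
            "it is a good time to look back at George Washington’s time.")
print(classify_washington(sentence))
# ['CITY', 'PERSON']
```

Every new context needs another hand-written branch, and a sentence such as "Washington won the battle" falls through to `UNKNOWN` until someone adds a rule for it.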
Instead of manually creating rules, we can apply machine learning. The advantage of machine learning is that it finds patterns automatically from examples, so no rule needs to be written by hand. The machine learning algorithm takes a set of training examples and churns out its own model, which plays a role comparable to the rules. If we get new training data, we can re-train the algorithm and generate a new model quickly.
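To make the idea of "a model learned from examples" concrete, here is a toy sketch. It is only an illustration, not a real NER learner: the training examples are invented, the names `TRAINING`, `train` and `predict` are hypothetical, and the "model" is nothing more than a table counting how often each neighboring word appears with each label.

```python
from collections import Counter, defaultdict

# Invented training examples for mentions of "Washington":
# (previous word, next word, label)
TRAINING = [
    ("George", "was", "PERSON"),
    ("George", "led", "PERSON"),
    ("in", "today", "CITY"),
    ("in", "politicians", "CITY"),
    ("As", "politicians", "CITY"),
]

def train(examples):
    """Count how often each context feature co-occurs with each label."""
    model = defaultdict(Counter)
    for prev_word, next_word, label in examples:
        model[("prev", prev_word.lower())][label] += 1
        model[("next", next_word.lower())][label] += 1
    return model

def predict(model, prev_word, next_word):
    """Let the features seen in training vote for a label."""
    votes = Counter()
    for feature in (("prev", prev_word.lower()), ("next", next_word.lower())):
        votes.update(model.get(feature, {}))
    return votes.most_common(1)[0][0] if votes else "UNKNOWN"

model = train(TRAINING)
print(predict(model, "As", "politicians"))   # CITY
print(predict(model, "George", "s"))         # PERSON
print(predict(model, "by", "crossed"))       # UNKNOWN

# New training data: re-train and get an updated model.
model = train(TRAINING + [("by", "crossed", "PERSON")])
print(predict(model, "by", "crossed"))       # PERSON
```

The counts in `model` stand in for the hand-written rules: nobody wrote "preceded by George means person", yet the table encodes it, and adding one example is enough to cover a new context.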
How does the machine learning approach work? I will discuss it in the next post.