Dec 10, 2012

Introduction to N-grams

In analyzing text documents, we can count how often words appear together in a fixed order. For example, we can count 2-word phrases like “baseball game”, “baseball card”, and “baseball player”. Similarly, we can count 3-word phrases like “baseball game online”, “baseball game today”, and “baseball game reports”. Counting every run of n adjacent words in a document is called the n-gram approach. The 1-grams are the individual words in the document, the 2-grams are the adjacent 2-word phrases, and so on.

For example, in the following sentence:
A Major League Baseball game was held in Salt Lake City 40 years ago.
1-grams are: {A, Major, League, Baseball, game, was, held, in, Salt, Lake, City, 40, years, ago}
2-grams are: {A Major, Major League, League Baseball, Baseball game, game was, was held, held in, ... }
3-grams are: {A Major League, Major League Baseball, League Baseball game, Baseball game was,...}
Similarly, we can count 4-grams, 5-grams, and so on.
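The sliding-window idea behind these lists can be sketched in a few lines of Python. This is a minimal illustration, assuming naive tokenization (strip the final period, split on whitespace); the function name `ngrams` is hypothetical and not from any particular library.

```python
def ngrams(tokens, n):
    """Return every run of n adjacent tokens, in order, as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A window of size n can start at positions 0 .. len(tokens) - n.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "A Major League Baseball game was held in Salt Lake City 40 years ago."
# Naive tokenization (an assumption): drop the trailing period, split on spaces.
tokens = sentence.rstrip(".").split()

for n in (1, 2, 3):
    # Show the first four n-grams of each size.
    print(n, [" ".join(g) for g in ngrams(tokens, n)][:4])
```

A sentence of 14 words yields 14 1-grams, 13 2-grams, and 12 3-grams; in general there are `len(tokens) - n + 1` n-grams, or none if the text is shorter than n.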

Google's Ngram Viewer charts how often n-grams occur in the books digitized by Google Books over a chosen range of years. The chart at the beginning of this post shows the counts for the 2-grams “baseball game”, “baseball card”, and “baseball player” in Google Books between 1950 and 2008.
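Counting 2-grams per year, as such a chart requires, can be sketched as follows. This is a toy illustration under stated assumptions: the `books` list is a made-up stand-in for a dated corpus (not real Google Books data), tokenization is lowercase whitespace splitting, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

def bigram_counts_by_year(books):
    """books: iterable of (year, text) pairs. Returns {year: Counter of 2-grams}."""
    by_year = defaultdict(Counter)
    for year, text in books:
        tokens = text.lower().split()
        # zip(tokens, tokens[1:]) pairs each word with the word after it.
        by_year[year].update(zip(tokens, tokens[1:]))
    return by_year

# Hypothetical toy corpus, for illustration only.
books = [
    (1950, "the baseball game was long and the baseball game was close"),
    (1950, "a baseball player signed a baseball card"),
    (2008, "he sold a baseball card online"),
]

counts = bigram_counts_by_year(books)
for phrase in [("baseball", "game"), ("baseball", "card"), ("baseball", "player")]:
    print(" ".join(phrase), {year: counts[year][phrase] for year in sorted(counts)})
```

Raw counts grow with the amount of text published each year, so dividing each count by that year's total number of 2-grams gives a fairer comparison across years.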

N-gram counts can be used to predict a user's next search keyword. Suppose we have a collection of all search queries from the last 2 years. In this collection, the 2-gram “baseball express” has the highest count, followed by “baseball cards”. A 2-gram with a higher count is more likely to be typed again than one with a lower count. Thus, after a user types “baseball”, the next word is more likely to be “express” than “cards” in online search. Here is a snapshot of Yahoo! Search suggestions.
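The ranking step described above can be sketched by counting which words follow a given word in past queries. This is a minimal sketch, not how any search engine actually implements suggestions; the query log and its counts are invented for illustration, and `suggest_next` is a hypothetical name.

```python
from collections import Counter

def suggest_next(queries, word, k=3):
    """Rank the words that follow `word` in past queries by 2-gram count."""
    following = Counter()
    for q in queries:
        tokens = q.lower().split()
        for first, second in zip(tokens, tokens[1:]):
            if first == word:
                following[second] += 1
    # most_common returns (word, count) pairs, highest count first.
    return [w for w, _ in following.most_common(k)]

# Hypothetical query log; the counts are made up.
queries = ["baseball express"] * 5 + ["baseball cards"] * 3 + ["baseball scores"]
print(suggest_next(queries, "baseball"))
```

Here “express” follows “baseball” 5 times and “cards” 3 times, so “express” is ranked first. Dividing each count by the total (9) turns the ranking into an estimate of the probability of each next word given “baseball”.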
N-grams are also used in spell checking, where suggested corrections are chosen based on the surrounding words, and in speech recognition, where each word is identified partly from the words uttered before it.
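The spell-checking case can be sketched by using the previous word as context to choose among candidate corrections. This is a simplified illustration under stated assumptions: the candidate list is supplied by hand (a real checker would generate it, for example by edit distance), the corpus is a made-up toy, and both function names are hypothetical.

```python
from collections import Counter

def bigram_counts(corpus):
    """Count adjacent word pairs across a list of sentences."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts

def pick_correction(prev_word, candidates, counts):
    """Choose the candidate that most often follows prev_word.

    Candidate generation is assumed to happen elsewhere; on a tie,
    max() keeps the first candidate in the list.
    """
    return max(candidates, key=lambda c: counts[(prev_word, c)])

# Hypothetical tiny corpus, for illustration only.
corpus = ["watch the baseball game", "a baseball game tonight", "a board game night"]
counts = bigram_counts(corpus)

# Suppose the user typed "baseball gane"; "gone" and "game" are each one edit away.
print(pick_correction("baseball", ["gone", "game"], counts))
```

Both candidates are equally close to the typo, but “baseball game” occurs twice in the corpus and “baseball gone” never does, so the context selects “game”.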

