In my experience building data science teams, I have come to appreciate the following qualities:
- A fundamental understanding of machine learning. Ultimately, data mining cannot exist without machine learning, which provides its core techniques. Thus a researcher in machine learning or a related field (such as natural language processing, computer vision, artificial intelligence, or bioinformatics) is an ideal candidate. Such researchers have studied different machine learning methods and know the newest and best techniques to apply to a problem.
- A sophisticated understanding of statistics and advanced mathematics. Such understanding requires years of training. Thus a Ph.D. degree is typically required for data scientists.
- Training in computer science. Ultimately, mining data is a form of computing. It requires designing algorithms that are efficient in both memory (space) and time. People who are trained in computer science understand the tradeoff between space and time in computing, and they understand the basic concepts of computational complexity. Someone who has majored in computer science would have this training ingrained in their DNA.
- Good coding skills. We live in a big data era. In order to work with data, we write code to process it, clean it, and transform it. Then we need to create programs on a big data platform, and test and improve those programs constantly. All of this requires good coding skills. Data mining is about implementation and testing, so programming skill is a core requirement.
- Experience with big data. This enables someone to work in certain environments, such as Hadoop, and to use the tools quickly. But such knowledge can be easily learned.
- Knowledge of a specific programming language. A good programmer can learn a new language quickly. In addition, there are many options for running big data programs, from Python to Java to Scala. A person who masters any one of these languages can be very productive.
Even today, in early 2014, companies are struggling to bring in data scientists. Those who are on the job market are immediately snatched up by large and well-known companies. Today, every company is trying to implement a “data strategy” (or a “big data strategy,” in fancier terms). This is a golden age for data scientists but a challenging time for employers.