Data Scientist Salary Survey
O’Reilly Media conducted an anonymous salary and tools survey in 2013 at several recent conferences. The conferences included Making Data Work in Santa Clara, California, and Strata + Hadoop World in New York.
The goal of the survey was to better understand which tools data analysts and data scientists use and how those tools correlate with salary. Not all respondents described their primary role as data scientist or data analyst, but almost all were exposed to data analytics.
Just over half of the respondents described themselves as technical leads, and almost all reported that some part of their role included technical duties (i.e., 10–20% of their responsibilities included data analysis or software development).
Tools of choice were examined to determine which tools correlate with others (if respondents use one, are they more likely to use another?). Tools could then be compared with salary either individually or collectively.
Some select survey highlights:
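One way to put a number on the co-use question above is the phi coefficient, which is the Pearson correlation for two yes/no variables. The following is a minimal sketch in Python, not the survey's actual method: the `responses` list is invented for illustration and the `phi` helper is a hypothetical name.

```python
from math import sqrt

# Hypothetical data: each set lists the tools one respondent selected.
# These are NOT survey responses; they only illustrate the calculation.
responses = [
    {"SQL", "R", "Python"},
    {"SQL", "Excel"},
    {"R", "Python", "Hadoop"},
    {"Excel", "Tableau"},
    {"SQL", "R", "Python", "Hadoop"},
    {"SQL", "Excel", "Tableau"},
]


def phi(tool_a, tool_b, responses):
    """Phi coefficient between use of tool_a and use of tool_b.

    +1 means the two tools are always used together, -1 means never
    together, and 0 means using one says nothing about using the other.
    """
    # Counts for the 2x2 contingency table of (uses a?, uses b?).
    n11 = sum(1 for r in responses if tool_a in r and tool_b in r)
    n10 = sum(1 for r in responses if tool_a in r and tool_b not in r)
    n01 = sum(1 for r in responses if tool_a not in r and tool_b in r)
    n00 = sum(1 for r in responses if tool_a not in r and tool_b not in r)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # If a tool is used by everyone or by no one, correlation is undefined;
    # this sketch returns 0.0 in that case.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


print(phi("R", "Python", responses))
print(phi("R", "Excel", responses))
```

Computing this value for every pair of tools gives a correlation matrix, and groups of tools with high pairwise values are candidates for clusters of correlated use.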
• By a significant margin, more respondents used SQL than any other tool (71% of respondents, compared to 43% for the next highest ranked tool, R).
• The open source tools R and Python, used by 43% and 40% of respondents, respectively, proved more widely used than Excel (used by 36% of respondents).
• Salaries positively correlated with the number of tools used by respondents. The average respondent selected 10 tools and had a median income of $100k; those using 15 or more tools had a median salary of $130k.
• There were primarily two clusters of correlating tool use: one consisting of open source tools (R, Python, Hadoop frameworks, and several scalable machine learning tools), the other consisting of commercial tools such as Excel, MSSQL, Tableau, Oracle RDB, and BusinessObjects.
• Respondents who use more tools from the commercial cluster tend to use them in isolation, without many other tools.
• Respondents selecting tools from the open source cluster had higher salaries than respondents selecting commercial tools. For example, respondents who selected 6 of the 19 open source tools had a median salary of $130k, while those using 5 of the 13 commercial cluster tools earned a median salary of $90k.
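The comparisons of median salary by number of tools in the highlights above can be sketched as a simple grouping step. This is an illustrative example only: the `respondents` pairs are invented, the `median_salary_split` helper is a hypothetical name, and the threshold of 15 tools is borrowed from the highlight about respondents using 15 or more tools.

```python
from statistics import median

# Hypothetical (tools_used, salary_in_thousands_usd) pairs.
# These are NOT survey data; they only illustrate the grouping.
respondents = [
    (4, 85), (8, 95), (10, 100), (12, 110),
    (15, 125), (16, 130), (18, 140),
]


def median_salary_split(respondents, threshold=15):
    """Return (median below threshold, median at or above threshold).

    Assumes both groups are non-empty; statistics.median raises
    StatisticsError on an empty list.
    """
    below = [salary for tools, salary in respondents if tools < threshold]
    at_or_above = [salary for tools, salary in respondents if tools >= threshold]
    return median(below), median(at_or_above)


low, high = median_salary_split(respondents)
print(f"fewer than 15 tools: ${low}k, 15 or more tools: ${high}k")
```

The median is used rather than the mean because a handful of very high salaries would otherwise pull the group figure upward. The same split works for a count of tools within one cluster instead of the overall count.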
Efficient access to large operational data stores can transform how we understand and solve major problems for business and government. The growth of Big Data from plain old Data Warehousing in the 1990s has brought new, complex tools that relatively few people understand or have even heard of. But is it worth learning them? The short answer seems to be yes.
Here is why:
• Several open source tools used in analytics, such as R and Python, are just as important as, or even more important than, traditional data tools such as SAS or Excel.
• Some traditional tools such as Excel, SAS, and SQL are used in relative isolation.
• Using a wider variety of tools (programming languages, visualization tools, relational database/Hadoop platforms) correlates with higher salary.
• Using more tools tailored to working with big data, such as MapR, Cassandra, Hive, MongoDB, Apache Hadoop, and Cloudera, also correlates with higher salary.
Most Popular Tool Usage
2. R and Python
R and Python are likely popular because they are easily accessible and effective open source tools for analysis. More traditional statistical programs such as SAS and SPSS were far less common than R and Python.
Here is the full report.