The Rise of the Data Scientist

Jonathan Cornelissen of DataCamp

Jonathan Cornelissen tells us about DataCamp, the need for data scientists, and how to become one yourself. We also learn about some popular languages and libraries for analyzing data.

Here's what to listen for:
  • 00:43 What is the story behind DataCamp?
  • 02:06 What is data science?
  • 02:52 What kind of data is out there that can be analyzed?
  • 04:46 Do I need a scientific or statistical background to work with data science?
  • 05:26 Does DataCamp help establish a theoretical background?
  • 06:21 Do only big companies need data science?
  • 07:16 What is big data?
  • 07:58 Can the term big data be used interchangeably with data science?
  • 09:08 Do you need a “billion-dollar budget” to build a data science team? What kind of people do I need to build that kind of team?
  • 12:08 What is behind the shortage of data scientists?
  • 12:48 What can a startup do to incorporate data science into their team?
  • 13:45 What is meant by data savvy?
  • 14:10 What do you do with the data once it’s collected?
  • 14:50 What is cohort analysis?
  • 15:42 Once users are segmented, what could you do at that point?
  • 16:21 Are correlations the primary sort of analysis?
  • 17:14 Are people trying to make causative claims out of correlative data?
  • 18:23 What are some other examples of techniques in addition to correlation?
  • 18:55 Are there any other interesting algorithms out there that people are using?
  • 20:07 Are these analyses run offline or in real time?
  • 20:37 What is the Spark framework?
  • 21:10 What is the R language?
  • 24:09 Where does R fit in at a company?
  • 24:47 Is R run by a human, or does it also run on a server to serve up recommendations?
  • 25:30 Is R still evolving as a language?
  • 25:58 Is there anything people should try to learn before trying to tackle R as a language?
  • 26:52 Why learn a language like R?
  • 28:27 Does R let you communicate the insights from your analysis and build a narrative for the non-technical people on your team?
  • 29:19 Is visualizing the data that we get back important to our understanding of that data? Why?
  • 29:57 Does DataCamp help people visualize data?
  • 30:51 Aside from R, what other tools are out there that a data scientist would use?
  • 31:23 What is Hadoop?
  • 33:09 What is the concept of MapReduce?
  • 33:42 What is the mark of a good data scientist?
  • 35:30 Why do you need domain expertise?
  • 38:30 How are people becoming aware of data science? Where do these people start?

"If you're in a healthcare company you might want to try to predict the effectiveness of new drugs."

"If you're a dating site you may want to analyze data to predict what people will be a good match."

"You're being experimented with all the time as an internet user."

"Everybody with a STEM background willing to put in some effort can pretty quickly learn the basics."

"Eventually most companies will have some sort of data science department."

"Big data literally means a lot of data. The bigger trend is data science and big data is a subset."

"There's a huge demand for people who combine programming with knowledge of statistics and business."

"The median wage for somebody with a title of data scientist is between $105k and $144k."

"There will be a shortage of about 200,000 data scientists by 2018. And that's in the US alone."

"More and more people are trying to analyze data online, which is very challenging technically."

"R is the most popular statistical computing language; now gaining traction in the business world."

"What makes R super powerful is its community which has developed enormous amounts of functionality."

"You have a base R version and you just load in packages just like you would load in a gem in Ruby."

"Many people continue with Python to analyze their data just because it's something they know."

"You need to understand what you're looking for in data and you need to interpret what comes out."

"Often people see the value of analyzing data and Excel just doesn't do the work anymore."