
Raticus79

Community Resettler
Member
Oct 25, 2017
1,033
What is Data Science?
It's an evolving field and this could be a topic in itself, so I'll defer to UC Berkeley here:
https://datascience.berkeley.edu/about/what-is-data-science/

"The term "data scientist" was coined as recently as 2008 when companies realized the need for data professionals who are skilled in organizing and analyzing massive amounts of data."

"Effective data scientists are able to identify relevant questions, collect data from a multitude of different data sources, organize the information, translate results into solutions, and communicate their findings in a way that positively affects business decisions. These skills are required in almost all industries, causing skilled data scientists to be increasingly valuable to companies."

[Image: the five stages of the data science life cycle]

"The image represents the five stages of the data science life cycle: Capture, (data acquisition, data entry, signal reception, data extraction); Maintain (data warehousing, data cleansing, data staging, data processing, data architecture); Process (data mining, clustering/classification, data modeling, data summarization); Analyze (exploratory/confirmatory, predictive analysis, regression, text mining, qualitative analysis); Communicate (data reporting, data visualization, business intelligence, decision making)."

There's a lot more at the link.

I'm starting from the data warehousing side and have started to grow into data engineering, working with Hadoop and a lot of Python. Rather than trying to cover everything, I'll post what I've found useful personally to get things started and will add any other links people would like to include (e.g. for the R side of things, BI tools, etc).

Resources

Andrew Ng's free machine learning course at Coursera
https://www.coursera.org/learn/machine-learning

SIGGRAPH Deep Learning Crash Course
https://youtu.be/r0Ogt-q956I

MIT's intro to deep learning 2019 session: http://introtodeeplearning.com/
2018 archive: http://introtodeeplearning.com/2018/index.html

Calculus basics by 3Blue1Brown (re: prerequisites for MIT course above)
https://www.youtube.com/playlist?list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr

Probability and statistics
https://seeing-theory.brown.edu/

Learning SQL
Datacamp: https://www.datacamp.com/courses/intro-to-sql-for-data-science
Udemy: https://www.udemy.com/70-461-session-2-querying-microsoft-sql-server-2012/

Kaggle
https://www.kaggle.com/
Companies post paid competitions here and people compete for the best solutions. There are lots of resources here, including exercises and examples to learn from, and a large community. XGBoost has been part of many winning solutions here. Surprisingly it's an ensemble approach rather than deep learning, but it happens to be well suited to the nature of most challenges.

Python-specific machine learning resources

The Complete Machine Learning Course with Python
https://www.udemy.com/machine-learning-course-with-python/
Good focus on machine learning, but assumes some familiarity with Python. Covers some of the same content as Andrew Ng's free course at Coursera. I like this one better because it's newer, has better quality video and focuses more on implementation with examples.

DataQuest
http://dataquest.io
Free hands-on introduction using Python. Has an optional subscription with more content and access to support.

Codecademy data science path:
https://www.codecademy.com/learn/paths/data-science
Subscription required for this content. The site also has lots of free resources for general development.

Data Analysis with Pandas and Python
https://www.udemy.com/data-analysis-with-pandas/

General Python:

The Modern Python 3 Bootcamp
https://www.udemy.com/the-modern-python3-bootcamp/

Python Cookbook
https://www.amazon.com/Python-Cookbook-Third-David-Beazley/dp/1449340377
Advanced material (modern patterns, aimed at programmers)

R

R for Data Science
https://r4ds.had.co.nz/

The Tidyverse packages:
https://www.tidyverse.org/packages/

Spark
(Pending)

YouTube

Google has been putting out a ton of content for TensorFlow lately:
https://www.youtube.com/channel/UC0rqucBdTuFTjJiefW5t-IQ/videos

Siraj Raval: https://www.youtube.com/channel/UCWN3xxRkmTPmbKwht9FuE5A/videos
Has produced many compact videos on interesting topics around this space, for example some creative applications of generative adversarial networks (GANs).

3Blue1Brown's deep learning series, focusing on the math behind neural networks
https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi

Enthought: https://www.youtube.com/user/EnthoughtMedia/videos
Presentations from the SciPy conferences.

Tools

Anaconda
https://www.anaconda.com/
A popular Python distribution which preinstalls many common libraries and tools like Jupyter Notebook.

Power BI
https://powerbi.microsoft.com/
A free desktop program which lets you import and visualize data from various sources.

Visual Studio Code
https://code.visualstudio.com/
A great free IDE with plugins for many common languages.

Enterprise BI Platforms
Tableau, Business Objects, Power BI (enterprise), Spotfire, SAP Analytics Cloud, QlikView, Alteryx...

RapidMiner
https://rapidminer.com
Platform focused on data science. Has a free educational license.

Cool Threads

"I trained an AI on tens of thousands of ResetEra post titles and discovered how the world ends"
https://www.resetera.com/threads/82679/
 

WedgeX

Member
Oct 27, 2017
13,162
Data science is so very interesting, thanks for making this!

I guess I've done work in all five areas. The most fun has been visualization, while the hardest/most rewarding has been Monte Carlo modeling. A little sad that it's not what I do for work - really, my work is far less data intense than my internships/consulting work - but I try to do some data work on the side to keep those skills going.
 

Irnbru

Avenger
Oct 25, 2017
2,127
Seattle
Currently doing my masters in business analytics, which is this from a user/project management perspective. We'll be using R and Python next quarter! Wooo wooo. These are some great resources!
 

IPSF

Banned
Oct 27, 2017
345
Got my Computational Intelligence & Data Science Masters last year aged 35 for a career change. Been in a job a year and a bit now, but it's a startup and I'm doing lots of analysis and analytics and no ML at all, which is what I went into it all for.

Problem is I'm paid pretty well and don't want a pay cut to move to a more appropriate job (also I like the people if not the work), but worried about deskilling in such a fast moving field. What to do?

Anyone got any tips for people starting their careers (especially oldies like me), or recommended courses? Ideas for how to build a career in the UK outside of London?
 

RailWays

One Winged Slayer
Avenger
Oct 25, 2017
15,665
Thanks for doing this thread. I was looking at pursuing a career in data in the future so I'll be sure to look into these resources.
 

Tebunker

Member
Oct 25, 2017
3,844
Subbed.

As a server admin for two BI tools and a pretty damned good report writer, dashboard builder and data analyst, I like this. I approve 100%.
 

Gazele

Member
Oct 25, 2017
972
Data scientist here, been working about a year, (recovering academic), curious to hear what others have to say.

Currently working on a word2vec model for labels, curious if anyone else has tried to do that.
 

gig

Prophet of Regret
Member
Oct 25, 2017
3,268
Operations Analyst slowly turning into a Data Scientist, reporting for duty. Great thread idea.

I second Kaggle.
 
Oct 27, 2017
3,664
The OP states it's not meant to be a completely comprehensive resource which is certainly fair given the massive quantity of resources and tools used, but I would strongly encourage and recommend anybody interested in Data Science to become incredibly familiar and competent in SQL (i.e. dynamic SQL). It's so commonly used that you're very likely to encounter it at some point, and even though you can use Python to interact with SQL databases you're still going to need familiarity and expertise in SQL.
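
To make the "Python on top of SQL" point concrete, here is a minimal sketch using the standard library's sqlite3 module. The `sales` table, its columns, and the `total_by_region` helper are made up for illustration; a real warehouse would use a different driver, but the SQL is still the part doing the work.

```python
import sqlite3

# Hypothetical in-memory table standing in for a real warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 75.0)],
)


def total_by_region(conn, min_amount):
    """Sum sales per region, keeping only rows at or above min_amount."""
    query = """
        SELECT region, SUM(amount) AS total
        FROM sales
        WHERE amount >= ?
        GROUP BY region
        ORDER BY region
    """
    # The ? placeholder lets the driver bind the value instead of
    # pasting it into the SQL string by hand.
    return conn.execute(query, (min_amount,)).fetchall()


totals = total_by_region(conn, 60.0)
print(totals)
```

Python only passes the query along and collects the rows; the filtering, grouping and ordering all still have to be expressed in SQL.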
 

Tebunker

Member
Oct 25, 2017
3,844
The OP states it's not meant to be a completely comprehensive resource which is certainly fair given the massive quantity of resources and tools used, but I would strongly encourage and recommend anybody interested in Data Science to become incredibly familiar and competent in SQL (i.e. dynamic SQL). It's so commonly used that you're very likely to encounter it at some point, and even though you can use Python to interact with SQL databases you're still going to need familiarity and expertise in SQL.
Hell, I will go as far as to say that even if you and your company use a web-based or desktop tool like Business Objects/Power BI/Tableau, learn some SQL. Ask your IT team to see if there is a class. I have found 99 times out of 100 that if I can look at the SQL and trace the paths of everything, I am more competent and confident.

Also I just want to bitch about this one union query I still couldn't get to run in Tableau before I left for break and it was driving me insane. However, understanding how it worked, I was able to get to the source of the SQL that doesn't work in Tableau so it can be rewritten.
 

BrassDragon

Member
Oct 26, 2017
3,154
The Netherlands
Subscribed for education and curiosity. My background is human intelligence collection/analysis but nearly everything feeds into big data models nowadays. Innovation comes at you fast.
 

chuey

Member
Oct 28, 2017
1,036
BI geek checking in. Subbed! This is awesome. Been in this field for a couple of years now and been Tableau Certified for about a year.
 
Oct 27, 2017
3,664
Given that everybody seems to be charting their position, I figured I'd also throw in mine: I'm a Data Scientist in a technology and financial consulting company, with my role primarily revolving around the automation of financial processes (e.g. for investors, hedge-fund managers, tax and financial advisors/consultants, auditors, etc.) for client companies and for large-scale internal automation projects. Because we work with a range of companies, we use quite a broad suite of tools and languages (mainly SQL, R, Python, C++, VBA, QlikView, Spotfire, Tableau, Power BI, Alteryx, Hadoop and Spark). Haven't been in the position for too long though (and don't intend to be). My background is maths, and I got in as an alternative to continuing on to a PhD in academia (which I plan on going back to).

Hell, I will go as far as to say that even if you and your company use a web-based or desktop tool like Business Objects/Power BI/Tableau, learn some SQL. Ask your IT team to see if there is a class. I have found 99 times out of 100 that if I can look at the SQL and trace the paths of everything, I am more competent and confident.

Also I just want to bitch about this one union query I still couldn't get to run in Tableau before I left for break and it was driving me insane. However, understanding how it worked, I was able to get to the source of the SQL that doesn't work in Tableau so it can be rewritten.
Absolutely. Becoming competent in SQL is massively undersold and understated in threads relating to advice regarding Data Science based on what I've seen, but regardless of the tools you're using a strong competency in SQL is one of the biggest advantages you can have. Particularly if you do anything regarding data extraction or data integration, SQL is going to appear in some form and it's better to be prepared for when you need it. Even in visualisation or analytics related tasks, as you say, it can be incredibly useful for troubleshooting.
 
OP
Raticus79

Community Resettler
Member
Oct 25, 2017
1,033
The OP states it's not meant to be a completely comprehensive resource which is certainly fair given the massive quantity of resources and tools used, but I would strongly encourage and recommend anybody interested in Data Science to become incredibly familiar and competent in SQL (i.e. dynamic SQL). It's so commonly used that you're very likely to encounter it at some point, and even though you can use Python to interact with SQL databases you're still going to need familiarity and expertise in SQL.

Oh yeah, approaching things from the data warehouse / integration world I didn't think to call it out, but it's fundamental. Does anyone have a good resource they'd like to include for it? I'm sure there must be some good free ones out there.
 

daveo42

Member
Oct 25, 2017
17,250
Ohio
Subbed.

I've actually worked in the field for a few years, but mostly in aspects of mining, visualization, and reporting. Most is SQL and Hadoop, with some automation through PowerShell. I'm actually moving to a new company in a few weeks in a more focused role, and I'm looking to learn more and possibly do far more analytical work as opposed to mining and creation of data sets.
Absolutely. Becoming competent in SQL is massively undersold and understated in threads relating to advice regarding Data Science based on what I've seen, but regardless of the tools you're using a strong competency in SQL is one of the biggest advantages you can have. Particularly if you do anything regarding data extraction or data integration, SQL is going to appear in some form and it's better to be prepared for when you need it. Even in visualisation or analytics related tasks, as you say, it can be incredibly useful for troubleshooting.
Could not agree more. While there are plenty of varied analytic and data visualization tools out there, most database systems work on a flavor of SQL, so the skill is easily transferable to different architectures. There might be some hiccups in terms of what you can and can't do, but learning those differences is usually fairly simple and highly rewarding.
 

maxxpower

Attempted to circumvent ban with alt account
Banned
Oct 25, 2017
8,950
California
I have an engineering bachelor's degree. I've gotten pretty far into learning data science on my own. Is it too late to get a career in data science now? I feel like it's too saturated given how many resources there are out there. I want data science to be my career but I don't want to put in any more time or money if I'll never get a job in the field.
 

Tebunker

Member
Oct 25, 2017
3,844
I have an engineering bachelor's degree. I've gotten pretty far into learning data science on my own. Is it too late to get a career in data science now? I feel like it's too saturated given how many resources there are out there. I want data science to be my career but I don't want to put in any more time or money if I'll never get a job in the field.
You will get a job. Learn a tool or two as well. Add something like visualization to the skillset.

This is what I need to convey to potential employers better. I may be too well rounded and not have enough specific experience.

I can say that a good Data Analyst with BI tool experience will be pushing 6 figs plus, depending on the area.

Know a lot of people in DC area getting 6figs.
 
Oct 25, 2017
1,465
Great thread!

I actually started out in the environmental/public health field from my biochemistry BS. Worked for a nonprofit where I did a bunch of data stuff, and transitioned last year to a more data analysis/data science position for state government. I use a lot of Tableau (and SQL) and am planning on taking courses for Python this year. I feel like I'm only scratching the surface.
 

Tebunker

Member
Oct 25, 2017
3,844
Anyone have experience with Codecademy's Python Pro courses, or anything with their Pro courses?

I had done the free Python ones but felt like I was missing something more. I'd also like to refresh my SQL skills, but I am not sure if they are a good resource for that as well.
 

Jpop

Banned
Oct 27, 2017
2,655
I've been working in the field for over a year -

Knowing SQL at an advanced level really elevates what you can do. Data visualization is also a key skill: people love data visualized, and it makes it easy to gain key insights into performance.

Thanks for making the thread.
 

Felt

The Fallen
Oct 27, 2017
3,210
Nice topic! I've been pretty interested in mastering plotting in Python. I mostly rely on seaborn but occasionally I need more control. There's a nice Coursera course on the subject too in a data science path.
 
OP
Raticus79

Community Resettler
Member
Oct 25, 2017
1,033
Nice topic! I've been pretty interested in mastering plotting in Python. I mostly rely on seaborn but occasionally I need more control. There's a nice Coursera course on the subject too in a data science path.

The Enthought channel has a nice presentation from PyViz that's worth checking out. Apparently they have some good financial backing to improve things in this area.
https://youtu.be/aZ1G_Q7ovmc
 

RustyNails

Attempted to circumvent ban with alt account
Banned
Oct 26, 2017
24,586
Does someone know any data analysis books or tools to use in development and construction? By that I mean having a project that has start dates for various milestones in construction: such as building permit acquired date, construction start date, construction end date, etc. Imagine 300 or so milestones. Also imagine hundreds of such construction projects. I need to figure out where bottlenecks are, which preceding milestone impacts the next, etc. Basically perform some data analysis and present some findings. It will be an ongoing part of my job function.

Thanks!
 
OP
Raticus79

Community Resettler
Member
Oct 25, 2017
1,033
Does someone know any data analysis books or tools to use in development and construction? By that I mean having a project that has start dates for various milestones in construction: such as building permit acquired date, construction start date, construction end date, etc. Imagine 300 or so milestones. Also imagine hundreds of such construction projects. I need to figure out where bottlenecks are, which preceding milestone impacts the next, etc. Basically perform some data analysis and present some findings. It will be an ongoing part of my job function.

Thanks!

For general project management, I'd start with something like MS Project for modelling your tasks and dependencies and get them into a Gantt chart to see your critical path. There are some free Excel add-ins to do something similar, but Project is probably worth the extra cost in this case.
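
For anyone curious what "critical path" means underneath the Gantt chart, here is a rough sketch: it is the longest chain of dependent tasks, and it sets the earliest possible finish date. The task names and durations below are invented for illustration, and this is not how MS Project computes it internally, just the basic idea using Python's standard graphlib module (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: name -> (duration in days, list of prerequisite tasks).
tasks = {
    "permit": (30, []),
    "site_prep": (10, ["permit"]),
    "foundation": (20, ["site_prep"]),
    "utilities": (15, ["permit"]),
    "framing": (40, ["foundation", "utilities"]),
}


def critical_path(tasks):
    """Return (total_days, ordered task names) for the longest dependency chain."""
    graph = {name: deps for name, (_, deps) in tasks.items()}
    finish = {}  # earliest finish time of each task
    prev = {}    # prerequisite that determines each task's start
    # static_order() yields every task after all of its prerequisites.
    for name in TopologicalSorter(graph).static_order():
        duration, deps = tasks[name]
        start = max((finish[d] for d in deps), default=0)
        prev[name] = max(deps, key=lambda d: finish[d]) if deps else None
        finish[name] = start + duration
    # Walk back from the task that finishes last.
    node = max(finish, key=finish.get)
    total = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return total, path[::-1]


total_days, path = critical_path(tasks)
print(total_days, " -> ".join(path))
```

Tasks off that chain (here `utilities`) have slack, so shortening them would not move the end date.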
 

ebs

Banned
Oct 27, 2017
443
I have an engineering bachelor's degree. I've gotten pretty far into learning data science on my own. Is it too late to get a career in data science now? I feel like it's too saturated given how many resources there are out there. I want data science to be my career but I don't want to put in any more time or money if I'll never get a job in the field.

It's never too late! I finished my masters in electrical engineering, got a standard EE job for a year, didn't really like it so started learning DS on the side, and got a job a few months later.

I would highly recommend An Introduction to Statistical Learning to bolster your math knowledge, and then there's any number of great online courses to develop Python/R skills.
 
OP
Raticus79

Community Resettler
Member
Oct 25, 2017
1,033
Since there's now a data science thread and there's a discussion about what technologies to learn, it's a good time to plug my blog post/rant Things About Real-World Data Science Not Discussed In MOOCs and Thought Pieces.

It's good to see a few comments about the importance of BI tooling, but keep in mind that data sci is more than just modeling and results.

Very nice. I was debating whether to include DevOps since that's actually what I've been spending the most time with lately. The OP would get huge pretty quickly.
 

Munti

Member
Oct 26, 2017
884
Great thread.
I have a degree in information science (~library) with the major in information engineering and have the perfect requirements to get into data science.
After a very long period of thinking I decided not to go in that direction though because I heard that even though the salary is stupidly high, the job itself is very dry and has a lot to do with statistics (instead I will go into human-computer interaction).

Did I make a bad decision? :/
 

ChrisR

Member
Oct 26, 2017
6,794
SQL is not hard to pick up. And it will make you seem like a wizard in the office. Strongly recommended to pick it up!

Great thread btw, I'll have to check in from time to time even though I don't really do much high level Data Science stuff.
 

Deleted member 18095

User requested account closure
Banned
Oct 27, 2017
205
Awesome thanks op. Currently working as a data scientist. Will keep this thread on my mind. Btw, what about docker for isolated analysis? Great for distributing/ performing pipeline analyses.
 
Oct 27, 2017
3,664
Oh yeah, approaching things from the data warehouse / integration world I didn't think to call it out, but it's fundamental. Does anyone have a good resource they'd like to include for it? I'm sure there must be some good free ones out there.
It's more of a general SQL resource rather than a Data science specific one, but the 70-461 SQL course on Udemy was invaluable to me (https://www.udemy.com/70-461-session-2-querying-microsoft-sql-server-2012/). It's quite slow and not massively into dynamic SQL, but was a very good resource for me in the basics.
 

RustyNails

Attempted to circumvent ban with alt account
Banned
Oct 26, 2017
24,586
For general project management, I'd start with something like MS Project for modelling your tasks and dependencies and get them into a Gantt chart to see your critical path. There are some free Excel add-ins to do something similar, but Project is probably worth the extra cost in this case.
Yeah we already use MS project. But I want to deep dive into the data and perform analysis through numbers. Like it takes 30 days from A to B, 60 days from B to C etc. Crunch these numbers and look for bottlenecks.
 
OP
Raticus79

Community Resettler
Member
Oct 25, 2017
1,033
Awesome thanks op. Currently working as a data scientist. Will keep this thread on my mind. Btw, what about docker for isolated analysis? Great for distributing/ performing pipeline analyses.

Sure, I'll add Docker in there.

It's more of a general SQL resource rather than a Data science specific one, but the 70-461 SQL course on Udemy was invaluable to me (https://www.udemy.com/70-461-session-2-querying-microsoft-sql-server-2012/). It's quite slow and not massively into dynamic SQL, but was a very good resource for me in the basics.

All right, I'll add that one, thanks.

Yeah we already use MS project. But I want to deep dive into the data and perform analysis through numbers. Like it takes 30 days from A to B, 60 days from B to C etc. Crunch these numbers and look for bottlenecks.

Oh ok, it sounded like you hadn't gotten to that point yet. In that case one thing to try next is putting it into a tabular format in Excel - rows for each project, columns for each standard task, and the durations as the values. From there the sheet can be imported into other tools pretty quickly for further analysis - for example, Pandas df.describe would give you your average duration by task, along with a bunch of other stats. That tabular format also sets you up for other analysis like looking for relationships between durations for different tasks, which could be handy for planning. Then maybe you add columns for things like locations, the people responsible for each task, number of people assigned, etc, and can keep digging from there.
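
A minimal sketch of that tabular layout in pandas. The project names, task columns and durations are all made up; in practice the frame would come from the Excel sheet (e.g. via `pd.read_excel`) rather than being typed in.

```python
import pandas as pd

# Hypothetical data: one row per project, one column per standard task,
# values are durations in days.
df = pd.DataFrame(
    {
        "permit_to_start": [30, 45, 25, 60],
        "start_to_framing": [60, 80, 55, 90],
        "framing_to_finish": [120, 110, 130, 115],
    },
    index=["proj_a", "proj_b", "proj_c", "proj_d"],
)

summary = df.describe()            # count, mean, std, min, quartiles, max per task
slowest_task = df.mean().idxmax()  # task with the longest average duration
correlations = df.corr()           # pairwise relationships between task durations

print(summary.loc[["mean", "max"]])
print("Longest on average:", slowest_task)
print(correlations.round(2))
```

The mean and max rows are a quick first pass at spotting bottlenecks, and the correlation table is one way to start on the "which milestone impacts the next" question, with the usual caveat that correlation alone doesn't show cause.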
 

Tebunker

Member
Oct 25, 2017
3,844
Codecademy should be added to the free resources too:

https://www.codecademy.com/catalog/subject/all

And like I was asking earlier, not sure if anyone has done the paid ones. I have been eyeing the Data Science track, but I want to know if anyone else has used their Pro, and I kind of wanted my company to pay for it.

I have never felt like Udemy courses do enough, and stack skills always seem behind. Not sure of other cheap resources.
 
Oct 25, 2017
1,086
Thanks for creating this thread! I currently work as a data analyst and I have a MS in Bioinformatics. Looking forward to the discussion!

I do have a question - has anyone done the MCSE certifications? I'm particularly interested in the Data Management and Analytics one.
 
OP
Raticus79

Community Resettler
Member
Oct 25, 2017
1,033
Codecademy should be added to the free resources too:

https://www.codecademy.com/catalog/subject/all

And like I was asking earlier, not sure if anyone has done the paid ones. I have been eyeing the Data Science track, but I want to know if anyone else has used their Pro, and I kind of wanted my company to pay for it.

I have never felt like Udemy courses do enough, and stack skills always seem behind. Not sure of other cheap resources.

Sure, I added a link. Looks like that data science track covers an intro to SQL too. If it's just the $20/month for Pro without an additional fee for the course it could be a good deal.
 

IPSF

Banned
Oct 27, 2017
345
Data scientist here, been working about a year, (recovering academic), curious to hear what others have to say.

Currently working on a word2vec model for labels, curious if anyone else has tried to do that.

I'm just about to start playing with word2vec for a text classification task. Will share anything I find if I get anywhere.
 
May 9, 2018
3,600
Data scientist here, been working about a year, (recovering academic), curious to hear what others have to say.

Currently working on a word2vec model for labels, curious if anyone else has tried to do that.
I'm just about to start playing with word2vec for a text classification task. Will share anything I find if I get anywhere.
word2vec is a bit obsolete thanks to fastText, which is available as an easy-to-use model class in gensim.
 

Pau

Self-Appointed Godmother of Bruce Wayne's Children
Member
Oct 25, 2017
5,833
Yay a data science thread! Hope to join you all in the field soon! :)

I'm currently applying to Masters programs in data science. I work as a data analyst for a university, but the work I do is mostly data processing and simple descriptives and regression models. Want to do something that takes more advantage of my statistical training.
 

Nothing Loud

Literally Cinderella
Member
Oct 25, 2017
9,961
I'm a BS in ChemE trying to beef up my data science skills. I'll be turning to this thread and some Udemy courses
 

maxxpower

Attempted to circumvent ban with alt account
Banned
Oct 25, 2017
8,950
California
Yay a data science thread! Hope to join you all in the field soon! :)

I'm currently applying to Masters programs in data science. I work as a data analyst for a university, but the work I do is mostly data processing and simple descriptives and regression models. Want to do something that takes more advantage of my statistical training.
When you say data processing do you mean like cleaning up datasets and feature engineering? Sorry, I'm not a practicing data scientist.
 

the_wart

Member
Oct 25, 2017
2,261
Frankly, I think the OP is likely to get people started down the wrong track. Before anyone jumps into whatever people are calling "machine learning" these days, you really really really need to invest in some basic understanding of probability and statistics. The single most important thing anyone needs to understand is the concept of generalizing from a sample to a population; if you can't understand what you are doing in those terms then you simply have no idea what you are doing at all.

Also, thinking hard about what kinds of data are available to you and what they can in principle tell you about your population of interest, and accordingly asking appropriate questions, is far more important than using whatever fancy new algorithm is the new hotness.
Jumping into the deep end with new and poorly-understood methods is just setting yourself up for expensive and incomprehensible failure.
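
As a small illustration of the sample-to-population idea, here is a sketch using only the standard library. The "population" is simulated (normally distributed values with made-up parameters), and the interval uses a plain normal approximation, so treat it as a toy rather than a recommended procedure.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical "population": 100,000 values we normally could not observe in full.
population = [random.gauss(50, 10) for _ in range(100_000)]
true_mean = statistics.fmean(population)


def mean_with_interval(sample, z=1.96):
    """Sample mean with an approximate 95% confidence interval (normal approximation)."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return m, (m - z * se, m + z * se)


# Repeat the sampling many times and count how often the interval
# actually contains the population mean.
trials = 1_000
hits = 0
for _ in range(trials):
    sample = random.sample(population, 100)
    _, (low, high) = mean_with_interval(sample)
    hits += low <= true_mean <= high

coverage = hits / trials
print(f"population mean: {true_mean:.2f}, interval coverage: {coverage:.1%}")
```

Each sample of 100 gives a slightly different answer; the interval is a statement about how far that answer is likely to be from the population value, and that only holds if the sample was drawn in a way that represents the population in the first place.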