
iamgone9828

Member
Oct 29, 2017
176
Subbing. Recent Master's grad in BI in the first few months of my data scientist career (currently working in customer analytics in banking) and I absolutely love the field. It's one of those rare fields where the more you learn, the less you know in the sense that there is so much to learn and so many areas to specialize in, so many datasets to explore...it's a beautifully challenging environment.

Cheers to all of you, hope this thread turns into a fun community.
 

spyder_ur

Member
Oct 25, 2017
11,411
While I wouldn't necessarily consider myself a data scientist, I certainly do many of these things in my role in higher ed development (fundraising) at a well-known Boston college, which is probably not a field people naturally associate with this stuff. I manage my team, so I fall more on the Analyze/Communicate end of things. We use Tableau, and we have a programming staff that uses SQL, among other listed tools. I'm also in charge of developing policies governing our data collection and maintaining data integrity throughout the system for processing and reporting.

Definitely keeping track of this thread.
 
Oct 25, 2017
1,465
Does anyone know of some good tutorials for using Tableau?

I learned how to use Tableau a year ago, and from working with it every day, I've become very proficient with it. Tableau actually offers some free tutorial videos and a sample dataset on its website (you'll need to create an account), and that was good for me to learn the basics. Actually becoming proficient with it took a lot of practice at my job with really large and complicated datasets.

There are also tons of data viz bloggers that provide practical and not-so-practical tips and tricks on random data.
 

Pau

Self-Appointed Godmother of Bruce Wayne's Children
Member
Oct 25, 2017
5,837
When you say data processing do you mean like cleaning up datasets and feature engineering? Sorry, I'm not a practicing data scientist.
Yeah, it's mostly cleaning up raw data in CSVs and converting them to SAS datasets and then doing some feature engineering (although we definitely don't call it that :P).
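For anyone wondering what that cleaning and feature engineering step can look like outside SAS, here is a minimal Python sketch using only the standard library. The column names, the sample rows and the derived features (`tenure_days`, `balance_missing`) are hypothetical, chosen just to show the typical moves: trimming whitespace, dropping duplicate rows, parsing types, treating blanks as missing, and deriving new columns.

```python
import csv
import io
from datetime import date

# Hypothetical raw export: stray whitespace, a missing balance, a duplicate row.
RAW = """customer_id,signup_date,balance
 101 ,2017-01-15,2500.00
102,2016-11-02,
101,2017-01-15,2500.00
103,2015-06-30, 120.50
"""

def clean_rows(text):
    """Strip whitespace, drop exact duplicates, and parse types."""
    seen = set()
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rec = {k.strip(): v.strip() for k, v in rec.items()}
        key = tuple(rec.values())
        if key in seen:
            continue  # skip rows that are duplicates after trimming
        seen.add(key)
        rows.append({
            "customer_id": int(rec["customer_id"]),
            "signup_date": date.fromisoformat(rec["signup_date"]),
            # Treat an empty balance as missing rather than as zero.
            "balance": float(rec["balance"]) if rec["balance"] else None,
        })
    return rows

def add_features(rows, as_of):
    """Derive simple features: tenure in days and a missing-balance flag."""
    for r in rows:
        r["tenure_days"] = (as_of - r["signup_date"]).days
        r["balance_missing"] = r["balance"] is None
    return rows

rows = add_features(clean_rows(RAW), as_of=date(2017, 10, 25))
for r in rows:
    print(r["customer_id"], r["tenure_days"], r["balance"], r["balance_missing"])
```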
 

Tebunker

Member
Oct 25, 2017
3,844
Does anyone know of some good tutorials for using Tableau?
I learned how to use Tableau a year ago, and from working with it every day, I've become very proficient with it. Tableau actually offers some free tutorial videos and a sample dataset on its website (you'll need to create an account), and that was good for me to learn the basics. Actually becoming proficient with it took a lot of practice at my job with really large and complicated datasets.

There are also tons of data viz bloggers that provide practical and not-so-practical tips and tricks on random data.

Tableau's web-based videos are amazingly good for getting started. On top of that, the community is amazing. Look for a local Tableau User Group, and on LinkedIn, definitely seek out gurus and follow them.

I have learned so much the last couple of years.

Lastly, you can also watch and download all of the hands-on sessions and breakouts from their conference. Just look up Tableau Conference 18, then look up the sessions and go by skill level.
 

Haselbacher

Member
Oct 27, 2017
341
Got my Computational Intelligence & Data Science Master's last year, aged 35, for a career change. Been in a job a year and a bit now, but it's a startup and I'm doing lots of analysis and analytics and no ML at all, which is what I went into it all for.

Did you do your Master's full time, or alongside a job?

I really like Data Science. However, I am a Mechanical Engineer and my Python/programming skills are mostly self-taught, so not really sophisticated. At the moment I don't really see a chance for me in the Data Science field.

I guess I'll stick with doing Udemy courses and minor projects on the side.
 
OP
Raticus79

Community Resettler
Member
Oct 25, 2017
1,034
Frankly, I think the OP is likely to get people started down the wrong track. Before anyone jumps into whatever people are calling "machine learning" these days, you really really really need to invest in some basic understanding of probability and statistics. The single most important thing anyone needs to understand is the concept of generalizing from a sample to a population; if you can't understand what you are doing in those terms then you simply have no idea what you are doing at all.

Also, thinking hard about what kinds of data are available to you and what they can in principle tell you about your population of interest, and accordingly asking appropriate questions, is far more important than using whatever fancy new algorithm is the new hotness.
Jumping into the deep end with new and poorly-understood methods is just setting yourself up for expensive and incomprehensible failure.

You'd probably like the Udemy course I linked on machine learning then... the first 20 or so videos are going over basics before he starts getting into fancier stuff. Anything you'd like to include in a stats section?
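To make the "generalizing from a sample to a population" point concrete, here is a minimal Python sketch (standard library only). It simulates a population we could never observe in full, repeatedly draws random samples of 100, and checks how often a 95% confidence interval for the mean (normal approximation, z = 1.96) actually contains the true population mean. The population, sample size and seed are arbitrary choices for illustration.

```python
import random
import statistics

def mean_ci(sample, z=1.96):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5  # standard error of the mean
    return m - z * se, m + z * se

rng = random.Random(42)
# Synthetic "population" that, in practice, we would never see in full.
population = [rng.gauss(50, 10) for _ in range(100_000)]
true_mean = statistics.mean(population)

# Repeat the sampling many times and count how often the interval covers the truth.
trials, covered = 1000, 0
for _ in range(trials):
    lo, hi = mean_ci(rng.sample(population, 100))
    covered += lo <= true_mean <= hi
print(f"true mean {true_mean:.2f}, coverage {covered / trials:.1%}")
```

Under this setup the coverage should land close to 95%; that repeated-sampling property, not any single interval, is what justifies the jump from sample to population.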
 

Jpop

Banned
Oct 27, 2017
2,655
You'd probably like the Udemy course I linked on machine learning then... the first 20 or so videos are going over basics before he starts getting into fancier stuff. Anything you'd like to include in a stats section?

Psych/Research statistics is a good reference and has tons of overlap with Data Science.
 

Commedieu

Banned
Nov 11, 2017
15,025
Ah thanks for this.

Coursera has good courses on deep learning and machine learning taught by Andrew Ng.

Should be added to the links.

And the "Deep Learning Crash Course" from SIGGRAPH really helps explain things.
 
OP
Raticus79

Community Resettler
Member
Oct 25, 2017
1,034
All right, I added links for the Coursera course and the SIGGRAPH talk, put RapidMiner at the end of the tools section and rearranged things a bit.
 

Tebunker

Member
Oct 25, 2017
3,844
Okay I have been testing the free Datacamp SQL for Data Science

It is pretty basic, but it is free and may be good for a lot of you to start with if you are interested:

https://www.datacamp.com/courses/intro-to-sql-for-data-science

I'd already learned a lot of it in my PL/SQL class, but I really like the setup of Datacamp.

Also, there are just too many self-teaching websites now. Between Datacamp, Treehouse, Coursera, Udemy, Stackskills and Codecademy, I have found something like 20 Data Science and BI tracks. It's very hard to discern what is truly good and useful, but I figured I'd share some more places in case you guys are looking.
 
Oct 28, 2017
53
Okay I have been testing the free Datacamp SQL for Data Science

It is pretty basic, but it is free and may be good for a lot of you to start with if you are interested:

https://www.datacamp.com/courses/intro-to-sql-for-data-science

I'd already learned a lot of it in my PL/SQL class, but I really like the setup of Datacamp.

Also, there are just too many self-teaching websites now. Between Datacamp, Treehouse, Coursera, Udemy, Stackskills and Codecademy, I have found something like 20 Data Science and BI tracks. It's very hard to discern what is truly good and useful, but I figured I'd share some more places in case you guys are looking.
Datacamp is great and worth the sub. Kaggle comps are great testbeds, but I'd find something you find interesting and "Data Science" it.

For example, I do loads of running, so I did a data science project on my runs from Strava using the API.
 
OP
Raticus79

Community Resettler
Member
Oct 25, 2017
1,034
Okay I have been testing the free Datacamp SQL for Data Science

It is pretty basic, but it is free and may be good for a lot of you to start with if you are interested:

https://www.datacamp.com/courses/intro-to-sql-for-data-science

I'd already learned a lot of it in my PL/SQL class, but I really like the setup of Datacamp.

Also, there are just too many self-teaching websites now. Between Datacamp, Treehouse, Coursera, Udemy, Stackskills and Codecademy, I have found something like 20 Data Science and BI tracks. It's very hard to discern what is truly good and useful, but I figured I'd share some more places in case you guys are looking.

All right, I added that, thanks. Yeah, there really are a ton out there, so that's why I was sticking to ones people have tried personally.
 

RailWays

One Winged Slayer
Avenger
Oct 25, 2017
15,665
Okay I have been testing the free Datacamp SQL for Data Science

It is pretty basic, but it is free and may be good for a lot of you to start with if you are interested:

https://www.datacamp.com/courses/intro-to-sql-for-data-science

I'd already learned a lot of it in my PL/SQL class, but I really like the setup of Datacamp.

Also, there are just too many self-teaching websites now. Between Datacamp, Treehouse, Coursera, Udemy, Stackskills and Codecademy, I have found something like 20 Data Science and BI tracks. It's very hard to discern what is truly good and useful, but I figured I'd share some more places in case you guys are looking.
To add onto this, DataQuest is excellent as well.
 

Joni

Member
Oct 27, 2017
19,508
I always feel like the old dude in town when people talk about data science. I'm focusing on data lakes and making sure everyone gets the data as correctly as possible.
 

Menaged

Member
Oct 29, 2017
568
Wow, thank you for this thread!
I recently finished my Masters in Social Psychology and am deeply interested in this field.

I have some time off soon, so I'm thinking of putting some effort into coding and the like (I already know some basic R, which is a nice starting point, I guess).
 
OP
Raticus79

Community Resettler
Member
Oct 25, 2017
1,034

Added, thanks! Also added a few more math focused links for the prerequisites mentioned.

I always feel like the old dude in town when people talk about data science. I'm focusing on data lakes and making sure everyone gets the data as correctly as possible.

Similar story here - my exposure to it started with helping the data scientists with some queries. This whole thing is a bit of a "learn through explaining" research effort for me... looking forward to making my abstract understanding of it all a bit more concrete.
 

RailWays

One Winged Slayer
Avenger
Oct 25, 2017
15,665
Anything good in the free content or is the recommendation mostly for the subscription?

Adding this one for probability and statistics: https://seeing-theory.brown.edu/
The free content is definitely helpful for Python basics with a data-driven context, and some basic overview of numpy/pandas/data visualization. It's nice for those who like a more hands-on curriculum, since I learn more by doing tasks. Haven't explored much of the subscription content though.
 
OP
Raticus79

Community Resettler
Member
Oct 25, 2017
1,034
The free content is definitely helpful for Python basics with a data-driven context, and some basic overview of numpy/pandas/data visualization. It's nice for those who like a more hands-on curriculum, since I learn more by doing tasks. Haven't explored much of the subscription content though.

Oh ok, their pricing page made it sound like only the missions were free (as in a list of things to try), didn't realize those missions included the training itself. Added in the python section, thanks.
 

ElMexiMerican

Member
Oct 25, 2017
1,506
Lately I've been thinking about going back to school for a master's in data science/business analytics. I graduated last year with a bachelor's in marketing, but one of my favorite courses was my analytics class, where we learned about Tableau. It seems like a lot of posters here had success with a master's program, so for those of you who have done one, would you mind sharing where specifically you got your degree? I just started looking at programs, but I feel a little overwhelmed about where to go and whether to take an online program or an in-class one. Hearing about experiences from people who have been through successful programs would be appreciated.
 
May 31, 2018
153
Lately I've been thinking about going back to school for a master's in data science/business analytics. I graduated last year with a bachelor's in marketing, but one of my favorite courses was my analytics class, where we learned about Tableau. It seems like a lot of posters here had success with a master's program, so for those of you who have done one, would you mind sharing where specifically you got your degree? I just started looking at programs, but I feel a little overwhelmed about where to go and whether to take an online program or an in-class one. Hearing about experiences from people who have been through successful programs would be appreciated.
I also graduated with a Bachelor's in Marketing and am currently looking for a Master's in Health Administration/Health Informatics (focus in Analytics). I currently work for a healthcare organization and have been doing self-learning courses recently.

Outside of healthcare, I've looked into Syracuse's Applied Data Science and Berkeley's Data Science master's programs. Both are online programs.
 

f0rk

Member
Oct 25, 2017
1,694
I always feel like the old dude in town when people talk about data science. I'm focusing on data lakes and making sure everyone gets the data as correctly as possible.
I mean, this is way more important: 95% of organisations aren't close to being in a position to apply any sort of real data science because their data is such a mess. That includes areas that were doing similar stuff for years before anyone called it data science, like credit risk. Most banks are still quite far from doing anything new with all the data they have because they are really struggling to understand and unlock it.

I work for a Big 4 data team, and while we employ data scientists and talk about all the smart, sexy stuff we can help clients with, it's still a relatively small (but growing) part of our work, just because of how much help clients need with the basic data work that has to come well before the cutting edge.

This is why what someone else mentioned about understanding the original problem is so important: a lot of the time you can add a lot of value just by using SQL instead of Excel, without having to apply complicated models that stakeholders aren't really going to understand. And I would say most corporate businesses are still at the stage of trying to get the basics right rather than applying some machine learning algorithm to everything.
 

Nothing Loud

Literally Cinderella
Member
Oct 25, 2017
9,971
I just got accepted into a PhD engineering program! So now I'm gonna spend my free time until September learning all the data visualization and data science I can to help me in my grad studies and make me proficient in programming for data science :)
 

Pankratous

Member
Oct 26, 2017
9,238
I had an entire class on BI at university and I can't remember any of it at all. Maybe I should read this thread.
 
OP
Raticus79

Community Resettler
Member
Oct 25, 2017
1,034
Siraj Raval posted this overview of current deep learning frameworks a few days ago. It has a good practical focus and ends with some recommendations for deployments on different platforms.
https://youtu.be/SJldOOs4vB8
 
Oct 30, 2017
55
Hey guys,

I've been looking into business intelligence for the past few days. Specifically, I need to know accurately how profitable our products are; they are sourced from different suppliers at different costs and sold to multiple channels at different prices.

Our current ERP/business software isn't capable of calculating actual profits. As it is now, we manually set a fixed cost price that is used for all sales made from a set date. For example, if we set the cost at 10 euros and sell at 12, the system says 2 euros profit. If we actually source for 9 euros, the system still says 2 euros profit because the cost was set at 10 euros. Hence we never accurately know how profitable a product is.

Ultimately, I would like the following:

- Per product: product revenue vs. actual sourced product cost across all suppliers = actual profit
- Per supplier: products sourced vs. the sales they generated under FIFO = profit per supplier
- Per sales channel: all sales revenue minus FIFO-sourced product cost = profit per channel

All of this should, of course, be able to be broken down by any selected period.

Our ERP can't handle the calculations right now: we currently have 10K+ different sales (unique order lines) per day, across a selection of over 100K different products, 100+ customers (sales channels) and products sourced from 100+ suppliers.

Excel wouldn't cut it, and our ERP can't handle it. I am able to export all the accurate cost and sales data per supplier and channel.

What would be best for me? The current situation is getting unmanageable.
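Whatever tool ends up doing this, the core calculation is FIFO cost matching: each sale consumes the oldest remaining purchase lots of that product first, and profit is the sale price minus the actual cost of the lots consumed. Below is a minimal Python sketch of that logic, assuming purchases and sales can be exported as dated records; the record layout, SKU, supplier and channel names are all hypothetical. One pass attributes each matched unit's margin to the product, the supplier of the lot and the sales channel at once.

```python
from collections import defaultdict, deque

# Hypothetical exports. Purchases: (day, product, supplier, qty, unit_cost)
purchases = [
    (1, "SKU-1", "SupplierA", 5, 10.0),
    (2, "SKU-1", "SupplierB", 5, 9.0),
]
# Sales: (day, product, channel, qty, unit_price)
sales = [
    (3, "SKU-1", "Webshop", 4, 12.0),
    (4, "SKU-1", "Retail", 4, 12.0),
]

def fifo_profit(purchases, sales):
    """Match each sale against the oldest remaining purchase lots of that product."""
    lots = defaultdict(deque)  # product -> deque of [remaining_qty, unit_cost, supplier]
    profit = {"product": defaultdict(float),
              "supplier": defaultdict(float),
              "channel": defaultdict(float)}
    # Order events by day; on the same day, purchases (kind 0) come before sales (kind 1).
    events = [(p[0], 0, p) for p in purchases] + [(s[0], 1, s) for s in sales]
    for _, kind, rec in sorted(events, key=lambda e: (e[0], e[1])):
        if kind == 0:
            _, product, supplier, qty, cost = rec
            lots[product].append([qty, cost, supplier])
            continue
        _, product, channel, qty, price = rec
        while qty > 0:
            if not lots[product]:
                raise ValueError(f"sold more {product} than was purchased")
            lot = lots[product][0]
            used = min(qty, lot[0])
            margin = used * (price - lot[1])
            profit["product"][product] += margin
            profit["supplier"][lot[2]] += margin
            profit["channel"][channel] += margin
            lot[0] -= used
            qty -= used
            if lot[0] == 0:
                lots[product].popleft()  # lot fully consumed
    return profit

result = fifo_profit(purchases, sales)
for dimension, totals in result.items():
    print(dimension, dict(totals))
```

With these toy numbers, a fixed cost of 10 euros would report 8 × 2 = 16 euros of profit, while FIFO matching gives 19 because three of the units sold actually came from the 9-euro supplier. Breaking results down by period would additionally mean keying the totals by sale date; that is left out to keep the sketch short.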
 

Geido

Member
Oct 30, 2017
1,097
Interesting topic! Subbed!

I currently work as an application consultant for a specific process automation system, but I really like the reporting and dashboarding part of my job. I've kind of taught myself SQL, and I'm a bit surprised by the responses here saying SQL is that important. I thought having no Python knowledge would be the biggest issue in pursuing a DS career.

I have no CS background (Master of arts in HR actually) and rolled into this role more from the consultant end. So I usually think I'm not techy enough for DS. Am I wrong?
 

erd

Self-Requested Temporary Ban
Banned
Oct 25, 2017
1,181
Siraj Raval posted this overview of current deep learning frameworks a few days ago. It has a good practical focus and ends with some recommendations for deployments on different platforms.
https://youtu.be/SJldOOs4vB8

I pretty much only use Keras/TensorFlow, so I was hoping this video would give me a nice overview of different frameworks that I'm not familiar with. However, there are a few points where I'm just really confused:

- For Tensorflow:
  • The video starts off with a meme that seems to diss the documentation and then follows that by immensely praising the documentation. I feel like I'm missing something here.
  • Says that one of the main drawbacks is that you need to specify the number of layers. Is there a framework that can construct a network without having to specify which layers are in it? Because that seems a bit strange to me but it sounds like something that would be fun to check out.
  • Says the other main drawback is that it lacks eager execution, but then says that it does have that as an option, but "it isn't native and will get even better in 2.0". That makes it seem like that option is currently bad, but the video doesn't really elaborate on why, so I'm just kind of confused.
- For pytorch, he says that by looking at NIPS 2018 "it's clear that researchers tend to prefer pytorch to tensorflow" and I don't really understand how he arrived at that conclusion considering:
  • Looking at the accepted papers/workshops from 2018, this preference doesn't seem clear at all. Out of all the accepted papers, 11 explicitly mention pytorch and 9 explicitly mention tensorflow. The others (over 1000 of them) do not explicitly mention either so without reading every one of them it's hard to say what they used. I'm not sure how you could extrapolate a clear preference from that. If some other methodology was used it would be nice to know what it was.
  • It goes against more comprehensive comparisons (like this or this), which show that tensorflow has a huge lead in popularity among the research community, with Keras also being very popular.
  • He even says Google uses tensorflow, which would make it highly relevant to researchers. I don't know why he finds it curious DeepMind is using Google's own product over something made by a competitor. It seems pretty obvious to me.
- For Keras:
  • The video says "Building a massively complicated deep learning model can be done in just a few lines of code" while the background slowly scrolls over pretty much the simplest LSTM model possible. I don't know how to feel about that.
- For mxnet, things just get really weird. The video says that one of its main advantages is that it scales really well with multiple gpus, as was demonstrated by benchmarks produced by Amazon's CTO. While this is explained, the video shows this graph:
[Graph: FP32 utilization by mini-batch size]

The graph shows the FP32 utilization by mini-batch size and was produced by scientists at the EcoSystem Research Group at University of Toronto and Project Fiddle at Microsoft Research, Redmond (the source is here, since the video doesn't really bother including it). So it looks basically completely unrelated to what the video is talking about. Am I missing something here?

I feel like watching the video has made me more confused than I was beforehand, though it does give a nice overview of what frameworks are available, so it's still useful in that respect.
 

Tebunker

Member
Oct 25, 2017
3,844
So I have now had like 3 interviews outside my company where they say they don't want an overly technical person, but then say they want 2+ years of SQL Querying.

I post this because we have talked about several differing tools and skill sets, and I am still seeing a lot of companies want hardcore SQL. I get why, but if you're interested in this career field, make sure to have a wide range of these skills. Honestly, though, I was a little put off by the last one; I can do great stuff with Tableau and Power BI, among other tools, with basic SQL knowledge, so they are missing out.

SQL is useful, but I haven't felt the need to use it much, since right now every tool is pushing away from that model. I will still brush up and refine my knowledge, but man, it has been frustrating.
 
Oct 25, 2017
1,465
So I have now had like 3 interviews outside my company where they say they don't want an overly technical person, but then say they want 2+ years of SQL Querying.

I post this because we have talked about several differing tools and skill sets, and I am still seeing a lot of companies want hardcore SQL. I get why, but if you're interested in this career field, make sure to have a wide range of these skills. Honestly, though, I was a little put off by the last one; I can do great stuff with Tableau and Power BI, among other tools, with basic SQL knowledge, so they are missing out.

SQL is useful, but I haven't felt the need to use it much, since right now every tool is pushing away from that model. I will still brush up and refine my knowledge, but man, it has been frustrating.

I hear you, but at the same time, some companies and organizations have really complicated databases. Most of my time is spent in Tableau, but I've had to write SQL queries in Tableau to make my visualizations/analyses work.
 

Tebunker

Member
Oct 25, 2017
3,844
I hear you, but at the same time, some companies and organizations have really complicated databases. Most of my time is spent in Tableau, but I've had to write SQL queries in Tableau to make my visualizations/analyses work.
I am probably spoiled, too, in that I can grab an ETL developer and have a view or table created quickly. I just don't think I should have to demonstrate I can write three-table joins in SQL when a lot of that work should be done in a relational DB.

But I do see where you are coming from, not every company does data warehousing or builds their DBs in a common sense schema.
 

Totakeke

Member
Oct 25, 2017
1,673
You don't need SQL to do data processing or munging, but it's still vital for the initial data pull, as it is still the most common language across all data platforms. Also, if you can avoid transferring data across different environments, it always speeds up data processing. So being good at SQL (which has a relatively low ceiling) is always worthwhile, unless you work in situations where your datasets are already mostly isolated out for you.
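A small illustration of the "avoid moving data between environments" point: this Python sketch uses an in-memory SQLite database (via the standard `sqlite3` module) as a stand-in for a real data platform. The `transactions` table and its rows are hypothetical. Both approaches produce the same totals, but pushing the `GROUP BY` into the database means only one row per region has to leave it, instead of every raw row.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 2.5), ("east", 7.0), ("south", 1.0)],
)

# Option 1: pull every raw row into the client, then aggregate locally.
rows = conn.execute("SELECT region, amount FROM transactions").fetchall()
local = defaultdict(float)
for region, amount in rows:
    local[region] += amount

# Option 2: let the database aggregate; only one row per region comes back.
pushed = dict(conn.execute(
    "SELECT region, SUM(amount) FROM transactions GROUP BY region"
).fetchall())

print(len(rows), "rows transferred vs", len(pushed))
```

On five rows the difference is trivial, but the same pattern is what keeps a pull from a large warehouse table manageable.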
 

Tebunker

Member
Oct 25, 2017
3,844
You don't need SQL to do data processing or munging, but it's still vital for the initial data pull, as it is still the most common language across all data platforms. Also, if you can avoid transferring data across different environments, it always speeds up data processing. So being good at SQL (which has a relatively low ceiling) is always worthwhile, unless you work in situations where your datasets are already mostly isolated out for you.

I mean, like I posted above, I agree it is useful; I was more bugged by the notion that a non-technical position needed strong SQL skills. I have no problem with getting better at it, but I work in a data warehouse environment that uses a good RDBMS, and I don't really have to worry about doing the initial data pulls because our DW team has done all the hard work. I just connect to my Oracle schema and pull in what I need with minimal SQL filtering, or filtering in general. I guess I just took for granted that a lot of larger companies worked in a similar manner.

Data warehousing has evolved a lot, and I guess I just wasn't prepared for the expectation that a non-technical position would be doing joins in SQL, when we just build that into our DW first. Even better if you can get a materialized view and have a lot of the heavy lifting done before you hit the table.

That's why our company uses Business Objects, and it is really convenient for our end users. We are transitioning to Tableau and plan to give users those same kinds of tables and views as data sources.
 

Tebunker

Member
Oct 25, 2017
3,844
What's the definition of a non technical position in this case?
It was a report designer role requiring experience in Tableau, OBIEE, MicroStrategy and Power BI: a front-end position for their data team, because no one on that team wanted to do reports or be the front-facing team member.

I knew they were expecting some SQL knowledge. Like I said, I am just above basic: I can troubleshoot, understand how it all flows and works, and write more basic queries. When I got the SQL exercise, I was asked to do three-table joins, left joins, full outer joins, etc., and pull in partitioned data. I thought I showed that I understood the concepts and what they wanted, but sheesh, I was really blindsided by it. The recruiter was like, yeah, we've never hired for a non-technical role. I mean, to be clear, I am taking it as a learning opp. I am going to get my hands a lot dirtier and try writing a lot more SQL going forward.

When I hear non-technical, it tells me I am not going to have to hand-write a lot of code, whether Python or SQL or R or anything related. My role is similar, but I also do server admin functions for BO and Tableau, so I thought I was a good fit. I write complex reports and analysis all day, but I do it in-tool 99% of the time.
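For anyone prepping for a SQL exercise like that, here is a minimal sketch of the two join patterns mentioned, run against an in-memory SQLite database from Python's standard `sqlite3` module. The `customers`, `products` and `orders` tables are hypothetical. It sticks to INNER and LEFT joins, since SQLite only added RIGHT and FULL OUTER JOIN in version 3.39.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, title TEXT, price REAL);
CREATE TABLE orders    (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                        product_id INTEGER, qty INTEGER);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO products  VALUES (10, 'Widget', 2.5), (11, 'Gadget', 4.0);
INSERT INTO orders    VALUES (100, 1, 10, 4), (101, 1, 11, 1), (102, 2, 10, 2);
""")

# Three-table INNER JOIN: only customers who actually ordered something appear.
inner = conn.execute("""
    SELECT c.name, p.title, o.qty * p.price AS line_total
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    JOIN products  p ON p.product_id  = o.product_id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOINs from customers: everyone is kept; no orders means a NULL sum, shown as 0.
per_customer = conn.execute("""
    SELECT c.name, COALESCE(SUM(o.qty * p.price), 0) AS spend
    FROM customers c
    LEFT JOIN orders   o ON o.customer_id = c.customer_id
    LEFT JOIN products p ON p.product_id  = o.product_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()

print(inner)
print(per_customer)
```

The INNER JOIN drops Cy, who has no orders, while the LEFT JOIN keeps every customer and uses `COALESCE` to turn the missing total into 0.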
 

Totakeke

Member
Oct 25, 2017
1,673
Okay, if it doesn't involve any analysis (and what you described might not), it's fine.
Otherwise, for anyone doing analysis, I would definitely raise an eyebrow if they had to rely on others for ETL or SQL pulls. Any assumption made along the way to generating the data is worth investigating, and there are often a lot of implicit ones.