Data Science Era |OT| Desktop BI, Deep Learning and Everything in Between

Tebunker

Member
Oct 25, 2017
3,599
Yeah it was a report designer, we do some analysis in our group now, but the nature of our structure and using Data Warehousing is different, and you can still do raw SQL pulls against it, we just happen to have several BI tools that make it so you don't have to. We can't teach our 5000 users SQL and most of them don't have the time. So we are reliant on doing good ETL and making sure we work hand in hand with the business to ensure we encompass all of their data needs. It is still a tried and true DW model.

A lot of the time our data marts are just massive relational star-schema DBs with tons of data and a lot of ways to dive in to it. Other times it is already very sliced up due to business rules.

I get what you are saying for sure though and I agree, you want to avoid a lot of assumptions. Especially going forward in analytics you don't want a lot of those presumptions. And when I say I have an ETL dev create a view, it is literally just, get me these four tables from DB A, these four from DB B and these 6 from DB C and give me these joins. It is what the guy does all day and he can turn around a MV real quick and then I can go in and pull in that data to my tool and do the actual analysis with no preconceived assumptions.
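
For anyone who hasn't seen that kind of request in practice, here is a rough sketch of "give me these tables with these joins" as a view, using Python's built-in sqlite3. The table and column names (members, loans, member_loans) are made up for illustration, and SQLite only has plain views, not materialized ones, so treat this as the shape of the idea rather than what a real warehouse would run.

```python
import sqlite3

# In-memory database standing in for the warehouse (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (member_id INTEGER PRIMARY KEY, branch TEXT);
CREATE TABLE loans (loan_id INTEGER PRIMARY KEY, member_id INTEGER, amount REAL);
INSERT INTO members VALUES (1, 'North'), (2, 'South');
INSERT INTO loans VALUES (10, 1, 5000.0), (11, 1, 1200.0), (12, 2, 300.0);

-- The view only joins the tables; it applies no filters or business rules,
-- so the analysis downstream starts from the raw rows.
CREATE VIEW member_loans AS
SELECT m.member_id, m.branch, l.loan_id, l.amount
FROM members AS m
JOIN loans AS l ON l.member_id = m.member_id;
""")

# The analyst then queries the view from whatever tool they like.
rows = conn.execute(
    "SELECT branch, SUM(amount) FROM member_loans GROUP BY branch ORDER BY branch"
).fetchall()
print(rows)
```

The point is that the view carries only the joins, so any slicing or aggregation is still up to whoever does the analysis.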

Also, just in general, I feel like the more and more I get exposed to new concepts and the way other companies operate in Data Science and Analytics, the more I think my current company isn't completely doing it right or completely understanding new concepts. I want to keep building and growing in this, but man it feels like they are missing the boat.
 

maxxpower

Member
Oct 25, 2017
2,716
California
I'm questioning whether I should continue my solo learning path on data science. Over the past two years I've learned and practiced enough data analysis and machine learning to get a job but I have no intention of getting a Master's or PhD nor do I have the time or money. It sucks because I love this field but I want to make a career out of it. I love how easy it is to learn on your own.
 

Pau

Self-Appointed Godmother of Bruce Wayne's Children
Moderator
Oct 25, 2017
1,810
This past month has been a whirlwind.

I got rejected at one program, then accepted into three, and now I'm waiting on two more.

I honestly thought I would get accepted into one at most, so this is incredibly surprising. This is going to be such a hard decision to make, but the good kind of hard. I have until mid-April to decide!

Anyone have experience with or considering Masters in Data Science at Harvard, New York University, or University of Washington?

I'm questioning whether I should continue my solo learning path on data science. Over the past two years I've learned and practiced enough data analysis and machine learning to get a job but I have no intention of getting a Master's or PhD nor do I have the time or money. It sucks because I love this field but I want to make a career out of it. I love how easy it is to learn on your own.
What's stopping you from applying to (and eventually taking on) data science jobs?
 

Totakeke

Member
Oct 25, 2017
881
Definitely pick NYU if you want to do cutting edge machine learning like deep learning stuff. They have both the faculty (Yann LeCun) and the connections. Not sure about Harvard but program looked quite technical last time I looked at it. No idea about University of Washington.
 

Totakeke

Member
Oct 25, 2017
881
I have the practical knowledge and experience but I feel like my lack of a formal education would essentially disqualify me from any data science position.
The number of people with "formal" data science education is pretty small, since most of the data science programs are pretty new. Most data scientists came from and are still coming from other fields. So if your experience or educational background had a good amount of exposure to statistics and programming, then you're in decent shape. Of course experience in this field is highly valued, but as long as you're not picky and you really want to enter the field, it's definitely doable. You just have to think about how to showcase your interest in your resume to stand apart from all the other people also applying for the same job. That could involve talking about some Kaggle competition you attempted or other kinds of data science exploration you did by yourself.

On the flip side, data science in a lot of small and medium-sized companies still requires a lot of analytics skills (which aren't really taught by any course or program), and experimentation skills are often far more useful than machine learning skills. That fact may be something the companies don't even realize themselves as they attempt to build their data science teams. Can't argue about the wealth of opportunities that exist though, definitely worth it if you're interested unless your current field is already pretty nice and comfy.
 

Blu10

Member
Oct 27, 2017
48
I have the practical knowledge and experience but I feel like my lack of a formal education would essentially disqualify me from any data science position.
I suspect most teams are like my team, in that they have a wide variety of roles and levels. I hired a guy with no experience in enterprise analytic tools (he knew sql) into a junior role a couple years ago, and he has absolutely blossomed. He’ll get his 3rd promotion this year, and will probably be my boss in under 5 years.

This is a path my team continues to follow today as junior positions open up. While it might not always be as successful as it was with that one guy, it is also a path I took when I joined the team. Don’t count yourself out, just look for the right analytic position, on the right team, and you’ll get your foot in the door.
 

Tebunker

Member
Oct 25, 2017
3,599
Does anyone use Power Pivot, Solver or Power Query in Excel? I am going through a screening call and these are some of the skills they want, and I am just sitting here wondering why not just use something like Power BI, and yes I get that money and costs etc, but this is a somewhat large Credit Union, they could afford a modern BI tool.
 
Oct 25, 2017
794
Does anyone use Power Pivot, Solver or Power Query in Excel? I am going through a screening call and these are some of the skills they want, and I am just sitting here wondering why not just use something like Power BI, and yes I get that money and costs etc, but this is a somewhat large Credit Union, they could afford a modern BI tool.
It's not so much the cost of the BI tool, but rather the cost of having to train your workforce on how to use that tool. It's annoying where I work too because people would rather use more limited software than adopt something that has a lot more capabilities.
 

Tebunker

Member
Oct 25, 2017
3,599
What are some recommended resources for learning Power BI outside of the ones in the OT
Free resources are a little tougher to come by, but there are several Lynda/LinkedIn courses that are worth pursuing. I’ve just kind of felt like Power BI’s community hasn’t grown quite as much as those of some other tools, and that makes community support harder to get.

That should be changing with more adoption.

I believe if you have some Power Pivot/Power Query experience, it can be applied to Power BI too.
 
May 31, 2018
61
Free resources are a little tougher to come by, but there are several Lynda/LinkedIn courses that are worth pursuing. I’ve just kind of felt like Power BI’s community hasn’t grown quite as much as those of some other tools, and that makes community support harder to get.

That should be changing with more adoption.

I believe if you have some Power Pivot/Power Query experience, it can be applied to Power BI too.
Thanks for your input. I'll do a little more research
 

impingu1984

Member
Oct 31, 2017
1,378
UK
Subbing to this thread... Didn't know it existed...

Working currently as a data scientist, been working in analytics for the past 7 years or so, knowledge of SQL and R and currently use Alteryx (the best piece of software I have ever used frankly) and PowerBI, have used tableau in the past as well.

May start learning python at some point.

BTW I have no degree at all, I simply have an aptitude for it and have managed to demonstrate this to get the job I have now. So despite many places asking for one, personally I say you can get by without one... Although it is the harder route, no doubt.
 

Somnid

Member
Oct 25, 2017
2,004
Aside from some computational efficiencies, is there any reason to not just use neural networks for everything? Like, are there real cases of more traditional methods yielding better accuracy anymore? I feel compute power is pretty much at the point I shouldn't worry about anything else.
 

Irnbru

Avenger
Oct 25, 2017
1,220
Seattle
Aside from some computational efficiencies, is there any reason to not just use neural networks for everything? Like, are there real cases of more traditional methods yielding better accuracy anymore? I feel compute power is pretty much at the point I shouldn't worry about anything else.
It might not be the right model for everything. Depending on the problem, an ensemble method might yield better results while letting you understand the math better. I find neural networks to be very black box. Very powerful though.

Also, cost of compute power for large companies is still a very large thing
 

impingu1984

Member
Oct 31, 2017
1,378
UK
Aside from some computational efficiencies, is there any reason to not just use neural networks for everything? Like, are there real cases of more traditional methods yielding better accuracy anymore? I feel compute power is pretty much at the point I shouldn't worry about anything else.
Firstly, computational efficiency is an extremely important factor in deciding what kind of solution you're going to implement for a problem. I don't feel you can just set that aside.

I recently set up a naive Bayes classifier that takes a couple of minutes to train on 2 million records and predict a binary outcome for 250k other records with 85% accuracy, vs. a no-information baseline of 55%, and it took a day to set up.

Could we get better results? possibly, but it offers good enough results in a short space of time and can be fully trained so quickly it is practically easy to run on a whim.

That being said a lack of data is a good reason to use a more traditional machine learning algorithm or technique, again my simple naive Bayes classifier works well with limited subsets of training data.

Also, my naive Bayes classifier offers great insight into which attributes affect the outcome, even ones that aren't observed often. It's hard to observe this in a neural net.

But it's extremely limiting to just use neural networks for everything... And sometimes the simple solutions work well enough. Don't limit yourself just because of the new sexy hotness...
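
To make the naive Bayes points a bit more concrete, here is a minimal pure-Python sketch, not the actual classifier described above. It assumes categorical attributes, a binary label, and Laplace smoothing, and the toy rows are invented. It shows the two things mentioned: comparing against a no-information (majority class) baseline, and reading off how strongly a single attribute value pushes towards one outcome.

```python
from collections import Counter, defaultdict
from math import log

def train_nb(rows, labels, alpha=1.0):
    """Count class and (attribute, value, class) frequencies. alpha is Laplace smoothing."""
    class_counts = Counter(labels)
    feat_counts = Counter()
    values = defaultdict(set)  # distinct values seen per attribute index
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            feat_counts[(i, v, y)] += 1
            values[i].add(v)
    return {"classes": class_counts, "feats": feat_counts,
            "values": values, "alpha": alpha, "n": len(labels)}

def log_likelihood(model, i, v, y):
    """Smoothed log P(attribute i == v | class y)."""
    a = model["alpha"]
    num = model["feats"][(i, v, y)] + a
    den = model["classes"][y] + a * len(model["values"][i])
    return log(num / den)

def predict(model, row):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    def score(y):
        prior = log(model["classes"][y] / model["n"])
        return prior + sum(log_likelihood(model, i, v, y) for i, v in enumerate(row))
    return max(model["classes"], key=score)

def no_info_accuracy(labels):
    """Accuracy of always guessing the most common class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)

# Invented toy data: (has_contract, usage_band) -> outcome
rows = [("yes", "high"), ("yes", "low"), ("no", "high"),
        ("no", "low"), ("yes", "high"), ("no", "low")]
labels = [1, 1, 0, 0, 1, 0]

model = train_nb(rows, labels)
accuracy = sum(predict(model, r) == y for r, y in zip(rows, labels)) / len(labels)
baseline = no_info_accuracy(labels)

# Interpretability: log-odds contribution of one attribute value towards class 1.
weight_yes = log_likelihood(model, 0, "yes", 1) - log_likelihood(model, 0, "yes", 0)
print(accuracy, baseline, weight_yes)
```

Training is just counting, which is why this kind of model can be retrained in minutes even on millions of rows, and the per-attribute weights can be inspected directly.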
 

Eridani

Member
Oct 25, 2017
619
Aside from some computational efficiencies, is there any reason to not just use neural networks for everything? Like, are there real cases of more traditional methods yielding better accuracy anymore? I feel compute power is pretty much at the point I shouldn't worry about anything else.
I'm hardly an expert on the topic, but from my knowledge there's a bunch of stuff that might make neural networks sub-optimal. For example, neural networks don't tend to perform as well without a large amount of data, which can be a very real problem in a lot of cases. People are constantly coming up with ways to go around that, of course, but if you have some niche use case that hasn't been explored yet and don't have a lot of data, you might be better off just using another, off-the-shelf classifier. It will definitely be a lot less of a hassle.

Interpretability is another big reason. This doesn't matter if you're only interested in the raw accuracy, but there are a lot of cases where you also want to know exactly how your classifier arrived at a conclusion. Something like an AI to replace a judge, for instance, should be able to clearly explain why it arrived at the sentence it did. While there are ways to add interpretability to neural networks, they might not be enough in a lot of cases. Taking a hit to accuracy is justified in cases like that.

Another reason is that neural networks don't work equally well for all types of data. Convolutional neural networks work incredibly well on images since they are able to exploit information about pixel positions, and similarly, RNNs and LSTMs work well on text and sequences since they take into account the positions and distances between characters/points. On the other hand, something like tabular data isn't as easy to work with for them. From what I've seen written online, more traditional approaches like XGBoost and various ensembles still achieve great results on Kaggle competitions (and win quite often, apparently), which mostly have data like that.

You might also have to fall back on more traditional approaches when dealing with unlabelled data. If someone just throws some data at you, with no labels, and tells you to figure out what to do with it (which isn't an unrealistic scenario), just throwing that into a neural network will not tell you anything about the data.

They can also be a lot of work in some cases. For example, for game AI it's much easier to just use non-neural-network based methods. AlphaZero is technically the best chess AI, but it took an incredible amount of effort to create and an eternity to train (most companies simply don't have the resources to do something like that in a reasonable time-frame). Meanwhile, a simple search-based approach is still enough to beat every human. So you likely won't be seeing stuff like that in every game any time soon, even if some companies are working on it.

From a research point of view, limiting ourselves to only neural networks also isn't the best idea. Despite their impressive results, they are still fundamentally flawed and likely won't lead to something like artificial general intelligence. Focusing on things that aren't quite as good now but might be better in the future should still be done. That's exactly what happened with neural networks as well: they went from a discarded piece of technology no-one wanted to use to the biggest hotness in AI because some people remained working on them.

Neural networks are still super cool though. The above examples probably aren't even 100% true. I'm sure you could find examples of NNs working well on limited data, or being perfectly interpretable on a given domain, or working super well on completely unstructured data, or stuff like that.
 

Totakeke

Member
Oct 25, 2017
881
Eridani gave a pretty good set of answers. The realm of problems where deep learning should be the best solution is also simply a subset of all the problems you could potentially solve with machine learning. Within the requirement of large amounts of data for deep learning, it is also implicit that the problem space needs to be something that has right answers that don't change much with time. A photo with a cat in it decades ago is still a photo with a cat today, and there aren't many externalities outside your dataset that might affect that conclusion. Also, the number of companies that have the data collection and technical resources to solve problems within that subset is pretty small.

Not so much when you're predicting stock market trends, providing recommendations, or fighting fraud. There might be a lot of things that your dataset doesn't capture or it's just too much work to capture all the possible factors that might affect the results. Things always change and using less of your data may in fact provide better results. Overfitting is always a problem, there's seldom a right model for it and it's relatively easy to just try all sorts of different models before you rule any of them out.

Also data science projects tend to be an iterative process, your first attempt will often be far from the ideal solution. Having longer iteration times by using needlessly expensive models will just slow down your iterations. Also when you don't have interpretability and there are some obvious flaws with your model results, it becomes harder to diagnose why the model is failing the way that it is, or maybe you don't even know that your model has data leakage issues because people tend to not spend a lot of time looking through the model results.
 

Clay

Member
Oct 29, 2017
1,184
I have a Master's degree in econ and I'm trying to get into data science. I've had Data Analyst positions in the past but I did very basic stuff, basically plotting time series of employment, educational attainment, and other demographic data. I also know some basic programming, mostly in Stata.

I took some high-level stats courses but I'm pretty rusty since I don't use it on a daily basis. I recently bought a few math review books (stats, linear algebra, calc) and I've been teaching myself Python, which is going well.

I currently work a couple part-time jobs that aren't data-related at all. I loved stats and working with data in school but after graduating the jobs I found basically amounted to creating simple graphics in Excel, which was extremely boring. I'd love to get back into working with data but in a more stimulating role. My worry is that my resume will look horrible to potential employers since I never took college classes about programming, machine learning, data scraping, and other concepts that seem to be key to Data Scientist positions.

I've been looking through the resources in the OP but I wonder whether there are any certificates or licenses that would be useful to have. I've seen there are different certificates you can earn to prove you know how to use Excel or whatever but I'm always skeptical about how impressive they are. Am I wrong about this? Would it be useful to pursue certificates that show I know Python, SQL, or any other relevant concepts/ skills?
 

Totakeke

Member
Oct 25, 2017
881
I have a Master's degree in econ and I'm trying to get into data science. I've had Data Analyst positions in the past but I did very basic stuff, basically plotting time series of employment, educational attainment, and other demographic data. I also know some basic programming, mostly in Stata.

I took some high-level stats courses but I'm pretty rusty since I don't use it on a daily basis. I recently bought a few math review books (stats, linear algebra, calc) and I've been teaching myself Python, which is going well.

I currently work a couple part-time jobs that aren't data-related at all. I loved stats and working with data in school but after graduating the jobs I found basically amounted to creating simple graphics in Excel, which was extremely boring. I'd love to get back into working with data but in a more stimulating role. My worry is that my resume will look horrible to potential employers since I never took college classes about programming, machine learning, data scraping, and other concepts that seem to be key to Data Scientist positions.

I've been looking through the resources in the OP but I wonder whether there are any certificates or licenses that would be useful to have. I've seen there are different certificates you can earn to prove you know how to use Excel or whatever but I'm always skeptical about how impressive they are. Am I wrong about this? Would it be useful to pursue certificates that show I know Python, SQL, or any other relevant concepts/ skills?
Excel isn’t something I would go for. It is either something really easy to pick up, or it’s only used because other people at the company are too entrenched in it to consider something else.

My usual advice for these kinds of questions is to work backwards from the ideal job that you want to obtain. With your background in econ, it is possible that the work that you want to do involves more Stata and Excel. Go look at job postings and see what skills they want you to have, and then go from there.
 

Clay

Member
Oct 29, 2017
1,184
Excel isn’t something I would go for. It is either something really easy to pick up, or it’s only used because other people at the company are too entrenched in it to consider something else.

My usual advice for these kinds of questions is to work backwards from the ideal job that you want to obtain. With your background in econ, it is possible that the work that you want to do involves more Stata and Excel. Go look at job postings and see what skills they want you to have, and then go from there.
Thanks!

Good advice, I'll look into some postings. Are there any certificates or licenses that are just generally good to have though?
 

Totakeke

Member
Oct 25, 2017
881
Personally for someone with a light resume I would prefer hobby projects over a certificate. There's no equivalent to standardized IT certificates so the value of a certificate is pretty much tied to whether the people who are hiring you have been through the same programs. So if you really need to, just pick the popular ones, otherwise I don't think it's that valuable generally.

Edit: I wouldn't get certificates in Python/R/SQL necessarily, but it's definitely valuable to get a formal education in statistics, a/b testing, machine learning, and deep learning since that's harder to learn by yourself and know that you're doing it right. Again, which one is more important will depend on the job that you want to do.
 

Spliced-Up

Member
Oct 29, 2017
18
I'm currently working as a data analyst / implementation specialist and I'm very interested in making data science the next step in my career. Was wondering what would be the best areas to focus on to make that transition. I have a background and degree (but from a for profit, basically worthless) in software development and I currently mainly work on writing queries, stored procedures, and some SSIS packages to migrate data from one system/format to another. As of now I have solid skill and experience with SQL and a variety of object oriented coding languages.

Should I start with learning Python and R? Or would I want to focus on the statistics and machine learning side of things first? I see it being recommended that I focus on doing my own projects instead of working towards certifications, so I would ideally like to start working towards that as quickly as possible.

Thanks for any input.
 

LakeShore

Member
Oct 28, 2017
69
Been following this thread for a while, but figured I'd make a comment in here.

So for last weekend's football results, I tried to predict the scores with the Poisson distribution. I spent Friday morning pulling the variables from the Premier League table: total games played, and total goals scored home and away.

Then I calculated the attack strength and defence strength for each team, and the average home and away goals scored for opponents.

I had:

Liverpool 4 - Huddersfield 0 : 14.37% likelihood
Spurs 1 - West Ham 0: 16.53%
Crystal Palace 0 - Everton 1: 17.75%
Fulham 1 - Cardiff 0: 13.5%
Southampton 1 - Bournemouth 1: 9.82%
Watford 1 - Wolves 1: 13.40%
Brighton 0 - Newcastle 0: 19.18%

Well, that went terribly:

Liverpool 5-0
Spurs Lost 0-1
Palace and Everton 0-0
Fulham 1-0 (Woooooo)
Southampton 3 - Bournemouth 3 (so was still a draw, but not the expected goals frequency)
Watford lost 1-2 to Wolves
Brighton 1 - Newcastle 1 (so was another draw, but again, not expected goals frequency)

So if betting on results, I would have got 4 correct (Liverpool W, Fulham W, Southampton D, Brighton D), but on actual correct scores I was still some way off. But it was fun though.
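
For anyone who would rather script this than do it in a spreadsheet, here is a rough Python sketch of the same kind of calculation. It assumes the usual setup for this type of model, where a team's expected goals are attack strength times the opponent's defence strength times the league average, and that home and away goals are independent. The strengths and averages below are invented, not the real Premier League figures.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number of goals is lam."""
    return lam ** k * exp(-lam) / factorial(k)

def expected_goals(attack, opp_defence, league_avg):
    """Assumed model: attack strength x opponent defence strength x league average."""
    return attack * opp_defence * league_avg

def score_matrix(home_lam, away_lam, max_goals=6):
    """Probability of every scoreline up to max_goals each, treating the two sides as independent."""
    return {(h, a): poisson_pmf(h, home_lam) * poisson_pmf(a, away_lam)
            for h in range(max_goals + 1)
            for a in range(max_goals + 1)}

def outcome_probs(matrix):
    """Collapse the scoreline grid into home win / draw / away win probabilities."""
    home = sum(p for (h, a), p in matrix.items() if h > a)
    draw = sum(p for (h, a), p in matrix.items() if h == a)
    away = sum(p for (h, a), p in matrix.items() if h < a)
    return home, draw, away

# Invented inputs for one fixture.
home_lam = expected_goals(attack=1.4, opp_defence=0.8, league_avg=1.5)
away_lam = expected_goals(attack=0.9, opp_defence=1.1, league_avg=1.2)

matrix = score_matrix(home_lam, away_lam)
best_score = max(matrix, key=matrix.get)  # single most likely scoreline
print(best_score, round(matrix[best_score], 4), outcome_probs(matrix))
```

It also shows why exact scores are so hard to hit: even the most likely scoreline usually carries only a 10-20% probability, so getting the result (win/draw/loss) right is a much easier target than the correct score.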
 

Haselbacher

Member
Oct 27, 2017
150
Been following this thread for a while, but figured I'd make a comment in here.

So for last weekend's football results, I tried to predict the scores with the Poisson distribution. I spent Friday morning pulling the variables from the Premier League table: total games played, and total goals scored home and away.

Then I calculated the attack strength and defence strength for each team, and the average home and away goals scored for opponents.

I had:

Liverpool 4 - Huddersfield 0 : 14.37% likelihood
Spurs 1 - West Ham 0: 16.53%
Crystal Palace 0 - Everton 1: 17.75%
Fulham 1 - Cardiff 0: 13.5%
Southampton 1 - Bournemouth 1: 9.82%
Watford 1 - Wolves 1: 13.40%
Brighton 0 - Newcastle 0: 19.18%

Well, that went terribly:

Liverpool 5-0
Spurs Lost 0-1
Palace and Everton 0-0
Fulham 1-0 (Woooooo)
Southampton 3 - Bournemouth 3 (so was still a draw, but not the expected goals frequency)
Watford lost 1-2 to Wolves
Brighton 1 - Newcastle 1 (so was another draw, but again, not expected goals frequency)

So if betting on results, I would have got 4 correct (Liverpool W, Fulham W, Southampton D, Brighton D), but on actual correct scores I was still some way off. But it was fun though.
I love this!
Can you explain some more, what you did and how?

I wanted to do something similar. But I think sports with more games, bigger sample size may be better. Like NBA or MLB.
Just from the data point of view.

But I think your predictions are not that bad!
 

HarryHengst

Member
Oct 27, 2017
465
I love this!
Can you explain some more, what you did and how?

I wanted to do something similar. But I think sports with more games, bigger sample size may be better. Like NBA or MLB.
Just from the data point of view.

But I think your predictions are not that bad!
This explains the process pretty well: https://help.smarkets.com/hc/en-gb/articles/115001457989-How-to-calculate-Poisson-distribution-for-football-betting


Also, for everyone interested in data science, you will have to pick up statistics and probability. To do that you need, among other things, calculus. If you've never done it, or got nightmares from your college classes, the solution is Professor Leonard. He films the classes he teaches at a community college and he is the absolute best at explaining this stuff in a way that makes you go "huh, so calculus isn't hard after all?!?!". He has full playlists for Calculus I-III, Statistics, and Algebra (in case you need to work on your pre-calculus stuff), and is currently working on a series on differential equations.
 

Tebunker

Member
Oct 25, 2017
3,599
Anyone use Qlik in their jobs? I am moving to a company using QlikView and Qlik Sense, and I will be helping build user adoption and training for self-service analytics while also developing reporting and analytics for the leadership.

I would have preferred a job using Tableau or Power BI, but this one has me excited because they want to help me learn Python and more SQL so I can fill in on the agile team.
 

LakeShore

Member
Oct 28, 2017
69
I love this!
Can you explain some more, what you did and how?

I wanted to do something similar. But I think sports with more games, bigger sample size may be better. Like NBA or MLB.
Just from the data point of view.

But I think your predictions are not that bad!
Hey, sorry for the late reply. As HarryHengst has commented above, I used this link as a guide to work out the figures. I did it all in Excel - https://help.smarkets.com/hc/en-gb/...ate-Poisson-distribution-for-football-betting

NBA could be done perhaps? There'd just be much higher averages and attack / defence scores to apply. And as for the variable, I wouldn't know what the ceiling is in NBA, whereas in football, one team scoring 6 goals in a game would usually be the most. There must be some way of adjusting this process to other sports though. I'd figure it'd work for NHL as it's similar to football with regards to scores per game, right? Maybe on the odd occasion a team might score 6+ goals?
 

fanboi

The Fallen
Oct 25, 2017
2,488
Sweden
We are currently implementing Metabase in one of the projects I am running, for generating reports and data for the company (and for the clients we work for as well).

The tool is open source, incredibly easy to use, and powerful.
 
May 31, 2018
61
What are some recommended resources for learning Power BI outside of the ones in the OT
Thanks for all the previous help with my Power BI questions.

While searching through Reddit I found this Microsoft certification - Analyzing and Visualizing Data with Power BI
https://www.reddit.com/r/PowerBI/comments/boode8/power_bi_certification_70778_microsoft_or_edx/

I will probably start studying for this exam tonight using the edX lesson plan, with the goal of taking the exam in 2-3 weeks. I'm passing this information along for anyone who would like to join me in studying or is interested in learning Power BI.