Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- Spark has gotten a large update and now supports R!
Apache Spark is a powerful open-source processing engine built around speed, ease of use, and sophisticated analytics. It can run on top of Hadoop, but doesn’t need to.
- Jupyter is gaining traction
We’ve mentioned Jupyter before in our newsletter as the successor to IPython. A lot has happened in a short span of time, and the project now supports a wealth of languages, including R, Lua, Julia, and of course Python.
- SparkR preview in RStudio
RStudio users need not worry: the RStudio blog already describes how to get up and running with Spark directly within your favorite development environment.
- Rodeo has gotten an update as well
Another tool we mentioned when it was first released has received many updates. Rodeo is a data-centric IDE for Python, and now supports Spark out of the box as well.
- This O’Reilly article shows how to perform dimensionality reduction at the command line
For those data scientists who love working at the command line.
- Escher lets you build interactive web UIs in Julia
Similar to R’s Shiny, this project aims to bring web-native visualisations to the Julia ecosystem.
- Baidu fires researcher tied to contest disqualification
Following the discovery of Baidu scientists breaking the rules of a high-profile international computer vision contest, the Chinese web service said that it had fired the team leader.
- Extending “Let It Go” with LSTM
A fun one: recurrent neural networks get all the hype these days, and this page shows how a recurrent neural network (with Long Short-Term Memory) generates the next thousand bytes of the popular song “Let It Go”.
- MarI/O – Machine Learning for Video Games
Another fun entry to close the list: this video shows how a neural network can be evolved to learn how to play Mario, using NEAT (NeuroEvolution of Augmenting Topologies, which combines neural networks with a training method based on genetic algorithms). We wonder how well it generalizes to unseen instances (i.e. levels), but the video is interesting nonetheless.