Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- Deep learning for assisting the process of music composition
In the same vein as the “Composing Music With Recurrent Neural Networks” article we featured in our previous newsletter, this four-part post (part 2, part 3, part 4) shows how the author uses deep learning techniques to compose folk music. Some of the tunes the algorithm produces sound really good.
- Simple/limited/incomplete benchmark for scalability/speed and accuracy of machine learning libraries for classification
This article made the rounds in the ML community in the past week. The benchmark is still a work in progress, but it compares a range of popular modern techniques as implemented in R, scikit-learn, Vowpal Wabbit, H2O, xgboost, and Spark’s MLlib. xgboost and H2O appear to be in the lead.
- Let me guess where you’re from
On the website letmeguesswhereyourefrom.com you can enter a name, and an algorithm will print five countries the name seems to come from. In this blog post, the author details how he used scikit-learn to construct the model.
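To give a feel for what such a model could look like, here is a minimal sketch in scikit-learn. It is not the author's actual pipeline: the character n-gram features, the logistic regression classifier, the helper names `build_model` and `top_countries`, and the tiny hand-made training set are all assumptions made for illustration only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy data for illustration; a real model would need far more names.
NAMES = [
    "Giovanni", "Francesca", "Alessandro",
    "Pieter", "Annelies", "Joost",
    "Hiroshi", "Yuki", "Takeshi",
    "Krzysztof", "Agnieszka", "Wojciech",
    "Alejandro", "Carmen", "Francisco",
    "Mikko", "Aino", "Jukka",
]
COUNTRIES = (
    ["Italy"] * 3 + ["Netherlands"] * 3 + ["Japan"] * 3
    + ["Poland"] * 3 + ["Spain"] * 3 + ["Finland"] * 3
)


def build_model():
    """Character n-gram counts fed into a multiclass logistic regression."""
    return make_pipeline(
        # "char_wb" builds n-grams inside word boundaries, so prefixes and
        # suffixes of a name become features of their own.
        CountVectorizer(analyzer="char_wb", ngram_range=(1, 3), lowercase=True),
        LogisticRegression(max_iter=1000),
    )


def top_countries(model, name, k=5):
    """Return the k most probable (country, probability) pairs for a name."""
    probabilities = model.predict_proba([name])[0]
    classes = model.steps[-1][1].classes_  # labels known to the classifier
    ranked = sorted(zip(classes, probabilities), key=lambda pair: pair[1], reverse=True)
    return [(str(country), float(p)) for country, p in ranked[:k]]


if __name__ == "__main__":
    fitted = build_model().fit(NAMES, COUNTRIES)
    for country, p in top_countries(fitted, "Giovanna"):
        print(f"{country}: {p:.2f}")
```

With only three names per country the ranking is not meaningful; the sketch only shows the shape of the approach, which is to turn a name into character-level features and report the five highest-scoring classes.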
- auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator
This project aims to free machine learning users from algorithm selection and hyperparameter tuning. It leverages recent advances in Bayesian optimization, meta-learning and ensemble construction. The authors published their work at the AutoML workshop at ICML 2015.
- UW CSE’s Pedro Domingos and Abe Friesen capture top prize at IJCAI with “magical” new algorithm
UW CSE professor Pedro Domingos and Ph.D. student Abe Friesen brought home the Distinguished Paper Award from the 2015 International Joint Conference on Artificial Intelligence (IJCAI) last month in Buenos Aires. They developed a new algorithm, Recursive Decomposition into locally Independent Subspaces (RDIS) [pdf], capable of solving a broad class of nonconvex optimization problems. The duo demonstrated that RDIS significantly outperforms standard optimization techniques when applied to complex problems such as protein folding and mapping three-dimensional space from two-dimensional images.
- Jupyter Ascending
The Jupyter team announces the release of IPython 4.0.
- Baidu explains how it’s mastering Mandarin with deep learning
At the International Neural Network Society conference on big data in San Francisco, Baidu senior research engineer Awni Hannun presented a new model that the Chinese search giant has developed for handling voice queries in Mandarin. The model, which is accurate 94 percent of the time in tests, is based on a powerful deep learning system called Deep Speech that Baidu first unveiled in December 2014.
- Building the next New York Times recommendation engine
Alexander Spangher from the New York Times details how the newspaper implemented its “recommended articles” functionality on the site using a collaborative topic modelling approach, after first trying both a pure content-based filtering approach and a collaborative filtering approach.