Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- Towards Neural Network-based Reasoning
In this paper, the authors propose a framework for neural network-based reasoning over natural language sentences. They show that their technique can handle the Positional Reasoning and Path Finding tasks.
- Large Scale Decision Forests: Lessons Learned
The folks at Sift Science detail their investigation into random forest-based models. Their findings: transform sparse features into a single set of rate indicators (similar to this implementation of Leave One Out Encoding); handle missing features by treating them as a third branch of a split instead of imputing them; don't split small leaves; keep trees shallow but use many of them; and use Gini impurity as the splitting measure. In addition, they take the prediction of a leaf node's parent into account alongside the leaf's own prediction to smooth predictions.
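To make two of these ingredients concrete, here is a minimal pure-Python sketch of leave-one-out target encoding (each row's category is replaced by the mean target of the *other* rows in that category, so a row never sees its own label) and of the Gini impurity used as a splitting measure. This is not Sift Science's code: the function names are hypothetical, and falling back to the global mean for categories that occur only once is our own assumption.

```python
from collections import defaultdict


def leave_one_out_encode(categories, targets):
    """Encode each row's category as the mean target of the other rows in it.

    Assumption: a category seen only once falls back to the global mean,
    since there are no "other" rows to average over.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    global_mean = sum(targets) / len(targets)
    encoded = []
    for c, y in zip(categories, targets):
        n_others = counts[c] - 1  # exclude the current row itself
        encoded.append((sums[c] - y) / n_others if n_others > 0 else global_mean)
    return encoded


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = defaultdict(int)
    for label in labels:
        counts[label] += 1
    return 1.0 - sum((k / n) ** 2 for k in counts.values())


print(leave_one_out_encode(["a", "a", "a", "b", "b"], [1, 0, 1, 1, 1]))
# [0.5, 1.0, 0.5, 1.0, 1.0]
print(gini_impurity([0, 0, 1, 1]))  # 0.5
```

A split would then be chosen to minimize the size-weighted Gini impurity of the resulting child nodes.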
- Mocha.jl: Deep Learning for Julia
After Python and Lua, Julia now also has its own deep learning framework with GPU support: Mocha.
- A Word is Worth a Thousand Vectors
A great post showcasing a solid use case for the word2vec algorithm.
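word2vec maps each word to a vector such that relationships between words tend to show up as vector offsets. A classic way to see this is the analogy test: for "a is to b as c is to ?", take b - a + c and look for the nearest word by cosine similarity. The sketch below uses tiny, hand-made 2-d vectors (dimension 0 loosely "royalty", dimension 1 loosely "gender"); they are hypothetical and not trained with word2vec, and real embeddings have hundreds of dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, a, b, c):
    """Return the word d maximizing cos(d, b - a + c), i.e. 'a : b :: c : d'."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]  # skip the query words
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Hypothetical toy "embeddings", chosen by hand for illustration only.
toy = {
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "apple": [-1.0, 0.2],
}

print(analogy(toy, "man", "king", "woman"))  # queen
```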
- k-means++ Silhouettes
This post illustrates the k-means++ clustering method using an interactive example. Devised in 2007 by Arthur and Vassilvitskii, k-means++ addresses how to choose the initial centroids for the standard (naive) k-means algorithm. Finding the optimal k-means clustering is NP-hard, but k-means++ seeding provably yields a clustering whose expected cost is within a factor of O(log k) of the optimum.
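The seeding rule itself is short: pick the first centre uniformly at random, then pick each further centre with probability proportional to D(x)², the squared distance from a point x to its nearest already-chosen centre. Ordinary Lloyd iterations then refine those centres. Below is a minimal pure-Python sketch under those assumptions (points as tuples, a fixed number of Lloyd iterations); the function names are our own, not taken from the linked post.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, rng=random):
    """k-means++ seeding: each new centre is drawn with probability
    proportional to its squared distance to the nearest chosen centre."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(squared_distance(p, c) for c in centres) for p in points]
        if sum(d2) == 0:  # every point coincides with a centre already
            centres.append(rng.choice(points))
        else:
            centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres


def lloyd(points, centres, iterations=10):
    """Standard k-means (Lloyd) refinement starting from the given centres."""
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda j: squared_distance(p, centres[j]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its cluster; keep it if the cluster is empty.
        centres = [
            tuple(sum(coords) / len(cl) for coords in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centres)
        ]
    return centres


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = kmeans_pp_init(pts, 2, random.Random(0))
print(sorted(lloyd(pts, seeds)))
```

Because far-away points get a much larger D(x)², the second seed is very likely to land in the other group of points, which is exactly what makes the seeding better than picking centres uniformly at random.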
- Beyond Trending Topics: identifying important conversations in communities
Betaworks describes how they identify, follow and reach communities on Twitter: “It is still hard to evaluate which items appear on a regular basis, and which are more unique. Wouldn’t it be great to know when activity around a certain hashtag is unique? Or more specifically, deviates from the expected behavior?”
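Betaworks' exact method is not described here. One simple, commonly used way to flag activity that "deviates from the expected behavior" is to compare each new count (say, hashtag mentions per day) with the mean and standard deviation of a trailing window and flag large z-scores. The sketch below does that; the window size, the threshold of 3 standard deviations, and the rule for a perfectly flat history are arbitrary assumptions.

```python
import statistics


def flag_unusual(counts, window=7, threshold=3.0):
    """Return the indices whose count deviates from the trailing window's mean
    by more than `threshold` (population) standard deviations."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mean = statistics.mean(history)
        sd = statistics.pstdev(history)
        if sd == 0:
            # Assumption: with a perfectly flat history, any change is unusual.
            if counts[i] != mean:
                flagged.append(i)
        elif abs(counts[i] - mean) / sd > threshold:
            flagged.append(i)
    return flagged


daily_mentions = [10, 12, 11, 9, 10, 11, 10, 12, 11, 60, 10]
print(flag_unusual(daily_mentions))  # [9]
```

Note that a spike also inflates the baseline for the following window; more robust variants use the median, exclude flagged points, or model weekly seasonality.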
- Doing data science at Twitter
Robert Chang talks about his two-year data science journey at Twitter.
- You don’t need a data scientist (yet)
Yanir Seroussi outlines why you might not need a data scientist yet.
- Maxima Core beats DeepMind: plays Atari games on a Raspberry without domain knowledge [youtube]
Maxima Core has released a video showcasing their AI apparently beating Google's DeepMind. They also released a follow-up video with more details.
- Python, Machine Learning, and Language Wars. A Highly Subjective Point of View
Sebastian Raschka talks about his favorite Python packages, tools, and techniques.
- Domino Datalabs: To Jupyter and beyond
Domino announces support for Jupyter with R, Python, and Julia kernels. A standout feature is the ability to schedule notebooks for execution as batch jobs, with the output saved as an HTML report.
- A Neural Algorithm of Artistic Style
Without a doubt the most eye-catching news of the past weeks: a paper was published introducing a deep learning method that renders an image in the style of a “reference image”, producing some of the most compelling imagery generated by convolutional neural networks since “deepdream” (pictures can be seen here as well). Since its release, people have been working on a number of independent implementations:
  - Comparing Artificial Artists: a blog post showing off the results obtained by this implementation, created with Torch
  - Another implementation, using Torch
  - Another implementation, using DeepPy
  - A notebook showing off another implementation, using Lasagne
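In the paper, the "style" of an image at a given network layer is captured by the Gram matrix of that layer's feature maps: with N filters each flattened to M positions (feature matrix F), G_ij = Σ_k F_ik F_jk. The per-layer style loss is E = 1 / (4 N² M²) · Σ_ij (G_ij − A_ij)², where A is the Gram matrix for the reference style image, and the full method sums weighted per-layer losses and combines them with a content loss. The sketch below shows only the per-layer style term on plain nested lists; in practice F would come from a pretrained convolutional network (the paper uses VGG), and the tiny matrices here are made up for illustration.

```python
def gram_matrix(features):
    """features: N rows (filters), each a flattened feature map of length M.
    Returns the N x N matrix G[i][j] = sum_k F[i][k] * F[j][k]."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features] for fi in features]


def style_layer_loss(generated, style):
    """Per-layer style loss: 1 / (4 N^2 M^2) * sum_ij (G_ij - A_ij)^2."""
    n, m = len(generated), len(generated[0])
    g = gram_matrix(generated)  # Gram matrix of the image being generated
    a = gram_matrix(style)      # Gram matrix of the reference style image
    squared_error = sum((g[i][j] - a[i][j]) ** 2 for i in range(n) for j in range(n))
    return squared_error / (4 * n ** 2 * m ** 2)


# Made-up 2-filter, 2-position feature maps, for illustration only.
print(style_layer_loss([[1, 1], [0, 0]], [[0, 0], [0, 0]]))  # 0.0625
```

Because the Gram matrix sums over all spatial positions, it records which features co-occur but discards where they occur, which is why matching it transfers texture and style rather than layout.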