Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- Project Jupyter: the new name for IPython notebooks
The language-agnostic parts of IPython are getting a new home in Project Jupyter, meaning that it will become easier to use the popular web-based “data science notebook” with a variety of language kernels, including R and Julia.
- A bleeding edge RStudio look-alike for Python: Rodeo
Speaking of data science IDEs, Rodeo is a brand-new, data-centric IDE for Python that is still under development and is similar in style to RStudio.
- Optimize hyperparameters with Spearmint
Spearmint is a Python library that optimizes hyperparameter configurations using Bayesian optimization. This is a more advanced alternative to a simple grid search: it keeps the number of experiments that need to be run under control and reaches an optimal configuration more efficiently.
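To give a feel for how Bayesian optimization differs from a grid search, here is a minimal, self-contained sketch in plain Python. It is not Spearmint's code or API. It rests on several assumptions:

- a single hyperparameter scaled to [0, 1];
- a hypothetical `toy_validation_loss` function that stands in for an expensive training run;
- a fixed RBF kernel length scale of 0.2;
- expected improvement as the acquisition function.

A grid search evaluates every grid point. This sketch instead fits a Gaussian process to the runs so far and then evaluates only the candidate that looks most promising.

```python
import math

def rbf(a, b, lengthscale=0.2):
    # Squared-exponential (RBF) covariance between two scalar inputs (assumed fixed length scale)
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def cholesky(K):
    # Plain Cholesky factorization K = L L^T for a small symmetric positive-definite matrix
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    # Forward substitution: solve L x = b
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_t(L, b):
    # Back substitution: solve L^T x = b
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(xs, ys, noise=1e-6):
    # Factorize the kernel matrix once per iteration and precompute K^{-1} y
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    return L, solve_upper_t(L, solve_lower(L, ys))

def gp_predict(xs, L, alpha, x):
    # Posterior mean and standard deviation of a zero-mean, unit-variance GP at x
    k_star = [rbf(x, a) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    return mean, math.sqrt(max(1.0 - sum(vi * vi for vi in v), 0.0))

def expected_improvement(mean, std, best):
    # EI for minimization: E[max(best - f(x), 0)] under the GP posterior
    if std < 1e-12:
        return max(best - mean, 0.0)
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

def bayes_opt(objective, n_iter=10, init=(0.0, 0.5, 1.0), grid_size=201):
    xs = list(init)
    ys = [objective(x) for x in xs]
    candidates = [i / (grid_size - 1) for i in range(grid_size)]
    for _ in range(n_iter):
        # Standardize observations so the unit-variance GP prior matches their scale
        mu_y = sum(ys) / len(ys)
        sd_y = math.sqrt(sum((y - mu_y) ** 2 for y in ys) / len(ys)) or 1.0
        zs = [(y - mu_y) / sd_y for y in ys]
        L, alpha = gp_fit(xs, zs)
        best = min(zs)
        # Evaluate the untried candidate with the highest expected improvement
        x_next = max((c for c in candidates if c not in xs),
                     key=lambda c: expected_improvement(*gp_predict(xs, L, alpha, c), best))
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best], xs

def toy_validation_loss(x):
    # Hypothetical stand-in for "train a model with hyperparameter x, return validation loss"
    return (x - 0.3) ** 2

best_x, best_y, tried = bayes_opt(toy_validation_loss)
print(f"best x = {best_x:.3f}, loss = {best_y:.5f} after {len(tried)} evaluations")
```

With the same budget of 13 evaluations, a grid search would space its points evenly in advance. Here, each new point is chosen using everything learned from the earlier runs. Practical tools such as Spearmint typically also learn the kernel's own parameters and handle several hyperparameters at once, which this sketch leaves out.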
- Google announces Google Cloud Bigtable
Google has recently announced the availability of its “Google Cloud Bigtable” product, a high-performance, highly scalable NoSQL database service accessible through the open-source Apache HBase API. According to Google, Bigtable already drives nearly all of the company’s largest applications, performs better than other NoSQL alternatives, and can be natively integrated with a Hadoop stack.
- “The Data Science Handbook” released
Not a technical handbook, but rather a “compilation of in-depth interviews with 25 remarkable data scientists, where they share their insights, stories, and advice.”
- Cigarettes, damn cigarettes and statistics
In his recent article, Tim Harford makes a case for correlation: “Large data sets can throw up intriguing correlations that may be good enough for some purposes. (Who cares why price cuts are most effective on a Tuesday? If it’s Tuesday, cut the price.) Andy Haldane, chief economist of the Bank of England, recently argued that economists might want to take mere correlations more seriously. He is not the first big-data enthusiast to say so.”
- Competing in a data science contest without reading the data
Here is a funny read from Moritz Hardt. His “cautionary tale of wacky boosting” will make many data scientists smile.
- Microsoft will add R to SQL Server 2016
SQL Server 2016 is to become the first Microsoft product to integrate Revolution R, following Microsoft’s acquisition of Revolution Analytics, the company behind it.