Wednesday, November 2, 2016

What's trending in the world of GitHub and Open Source?

GitHub has a trove of information about its organizations, committers, repos, code and issues.

GitHub Archive maintains per-hour stats on 28 event types hooking into repo activities across the platform.  It includes things like a committer's login, URL, organization, followers, gists, starred repos, and a history of all their public coding activity.

The October 2016 archive table is 29M rows and 70GB of data.
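To get a feel for what those hourly files hold, here is a minimal Python sketch that tallies events by type. It assumes each archive file is gzip-compressed JSON with one event per line and a top-level "type" field; the function names and the sample records are made up for illustration.

```python
import gzip
import json
from collections import Counter


def count_event_types(lines):
    """Tally events by their "type" field from an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        # Events without a "type" field are grouped under "Unknown".
        counts[event.get("type", "Unknown")] += 1
    return counts


def count_event_types_in_file(path):
    """Same tally for one downloaded, gzip-compressed hourly file (assumed layout)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_event_types(handle)


# Made-up sample records, shaped like the assumed event format.
sample = [
    '{"type": "PushEvent", "repo": {"name": "example/alpha"}}',
    '{"type": "ForkEvent", "repo": {"name": "example/beta"}}',
    '{"type": "PushEvent", "repo": {"name": "example/alpha"}}',
]

print(count_event_types(sample).most_common())
```

Point count_event_types_in_file at a downloaded hour file to run the same tally on real data.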

In September 2016, Microsoft became the largest "open-source" contributor organization on GitHub, largely due to its custom API integration using Azure Services and rather elegant management system for its employees and repos.   If you can onboard all developers in a company the size of Microsoft, and automate repository setup and discovery, you will quickly become the largest contributor.

Microsoft beat out Docker, Angular, Google, Atom, FortAwesome, Elastic, and even Apache.



FortAwesome is an interesting name on the list. They serve up a vector graphics library of fonts and symbols for a hundred million or so web sites and apps.  There's a Kickstarter going on right now for Font Awesome 5, which has raised $450,000 against a $30,000 goal.

Atom is another.  Really nice text editor.  They have a git-time-machine package for reviewing commits over time in a timeplot, Git-Plus for commits from the editor, and 5,189 other packages with at least 400 of those related to Git.  Plus font-awesome-snippet-double-quotes.

So what's trending in the world of Hadoop?

Gitlogs provides a search engine for displaying top resources on GitHub.
http://www.gitlogs.com/most_popular?from=last-month&topic=hadoop

This shows repositories created in the last month, ranked by commit activity, from what I can tell. It doesn't really show anything too exciting or too popular.

The CERT-W Hadoop Attack Library is the top repo of last month.  Ethical (?) hacking repo.  I don't think I will link to that one...
Vishal Dodiya provides Hadoop Installation, "Easy installing hadoop without much a do."
Hadoop on Docker
hadoopwork - Class homework from China
hadoopoffice - Analyze Office documents using Hadoop Ecosystem.


One of interest is Microsoft's OozieBot.   OozieBot automates coordinators and workflows for MR, Pig, Hive, Spark, Sqoop, and Shell actions, allowing you to skip writing cryptic XML and automate launching jobs in HDInsight.

Another is Working with Deeply Nested Documents in Solr, a session from the Lucene / Solr Revolution 2016 conference with Anshum Gupta and Alisa Zhila of the IBM Watson team.


For 2016, one crowd-sourced approach to investigating trends would be in the Hadoop Ecosystem Table commits.  I fixed a couple missing links while I was doing some work with Apache Falcon.


Apache Fluo was linked last month.  Fluo is faster than Spark when aggregating entire datasets, according to Keith Turner.  It is an implementation of Google Percolator for Apache Accumulo.

Alluxio (formerly Tachyon, one piece of tech behind Spark) was linked in March.  Pin files in memory? Alluxio Enterprise Edition + Manager?  It is a really cool in-memory storage fabric.   




Querying GitHub Archive with Google BigQuery shows that the top forked repos of October 2016 include Google Interview University, How to share data with a statistician, Google TensorFlow, and Spoon-Knife, an example repo for forking repositories.
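The same "top forked" ranking can be sketched locally in Python, without BigQuery. This is not the query used for the numbers in this post, just an illustration. It assumes the archive's JSON event layout, where a fork shows up as an event with type "ForkEvent" and the repo under repo.name; the function name and the sample data are invented.

```python
import json
from collections import Counter


def top_forked(lines, limit=5):
    """Return the most-forked repo names as (name, fork_count) pairs."""
    forks = Counter()
    for line in lines:
        event = json.loads(line)
        # Only fork events count toward the ranking (assumed event layout).
        if event.get("type") == "ForkEvent":
            forks[event["repo"]["name"]] += 1
    return forks.most_common(limit)


# Invented sample events, not real archive data.
events = [
    '{"type": "ForkEvent", "repo": {"name": "example/spoon"}}',
    '{"type": "WatchEvent", "repo": {"name": "example/spoon"}}',
    '{"type": "ForkEvent", "repo": {"name": "example/fork-me"}}',
    '{"type": "ForkEvent", "repo": {"name": "example/spoon"}}',
]

for name, count in top_forked(events):
    print(name, count)
```

On a full month of data the loop stays the same; only the input changes from a small list to the lines of the downloaded archive files.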

Another "ethical hacker" leak, or not-so-ethical, is a repo containing the Mirai botnet code, the DDoS attack code used to take down the internet from IoT devices.  Not linking to that one, I don't need my thermostat to get infected.

Jekyll is Blogger for GitHub.  1277 people forked a blog.
Deep Learning Papers Reading Roadmap.  750 people are trying to make a self-driving car.
FreeCodeCamp, an open-source codebase and curriculum to help nonprofits.  752 charitable programmers on GitHub last month.
NightScout cgm remote monitor.   539 developers got type-1 diabetes or decided to monitor someone with it.
DirtyC0w.    Linus Torvalds tried to fix this in 2005; 454 sys admins and hackers are trying to figure out what to do with it on GitHub.  Ubuntu and Android users beware.

My favourite content on GitHub is this kind of list, which leads you to discover more content.

List of Free Learning Resources
Ultimate List of App Development

Awesome Hadoop
Awesome Public Datasets

Curated List of Awesome Lists
