
Sunday, January 3, 2016

Configs and GitHub Viz

With open-source projects, you may need to dig further into what a particular configuration setting does. If the documentation doesn't give you enough detail on the implementation, you can trace the setting through the source code. Either search for the file online, or check out the project from GitHub or SVN.

Here is where the configuration code lives for some common Hadoop projects:

Pig configuration
Hive configuration
Sqoop configuration
Flume configuration
Kafka configuration
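Once you have a local checkout of one of these projects, you can trace a setting by searching the source for its property name. Take a Hive property such as hive.exec.parallel as an example. The following is a minimal sketch under a few assumptions: the function name find_config_key and the default list of file extensions are hypothetical choices, and the search is a plain substring match rather than anything project-specific.

```python
import os
import sys


def find_config_key(root, key, extensions=(".java", ".xml", ".scala", ".py")):
    """Walk a local source checkout and return (path, line_number, line)
    for every line that mentions the configuration key.

    Hypothetical helper: a plain substring search, not a project API."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control metadata so .git/.svn internals are not scanned.
        dirnames[:] = [d for d in dirnames if d not in (".git", ".svn")]
        for name in filenames:
            # Only look at source/config file types we care about.
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if key in line:
                        hits.append((path, lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python find_config_key.py <checkout-dir> <config-key>
    for path, lineno, line in find_config_key(sys.argv[1], sys.argv[2]):
        print(f"{path}:{lineno}: {line.strip()}")
```

Running it against a Hive checkout with hive.exec.parallel should point you at the class or XML file where the property is defined. From there, you can follow how the value is read.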

GitHub lets you scope your searches, which is useful for narrowing them down to specific files.

Code search is documented in GitHub's help pages. In the search box, you can search by filename:<myfile> or <myfile> in:path to track down particular files. You can also restrict a search by language. For example, language:scala searches only Scala files.
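If you run these searches often, you can build the query URLs in a script. Below is a small sketch that assembles the qualifiers described above into a github.com search URL. The function name and its parameters are hypothetical. It assumes that the web UI accepts the query in the q parameter with type=code. GitHub's search syntax has changed over time, so the qualifiers may need adjusting.

```python
from urllib.parse import urlencode


def github_code_search_url(terms="", filename=None, in_path=False, language=None):
    """Build a github.com code search URL (hypothetical helper).

    terms     -- free-text search terms, e.g. a file or class name
    filename  -- adds a filename:<name> qualifier
    in_path   -- adds in:path so the terms are matched against file paths
    language  -- adds a language:<lang> qualifier
    """
    parts = [terms] if terms else []
    if filename:
        parts.append(f"filename:{filename}")
    if in_path:
        parts.append("in:path")
    if language:
        parts.append(f"language:{language}")
    # urlencode percent-escapes ':' and turns spaces into '+'.
    return "https://github.com/search?" + urlencode({"q": " ".join(parts), "type": "code"})


print(github_code_search_url(filename="hive-site.xml"))
print(github_code_search_url("HiveConf", language="scala"))
```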

At the time of this writing, there are some interesting statistics available just from looking at the languages of repositories on GitHub:

  • There are 1.5m Java repos with ElasticSearch, Android Universal Image Loader and Reactive Extensions for the JVM showing up as the top 3 best matches.
  • There are nearly 900k Python repos, with httpie, Flask, the Django framework and the Awesome Python library coming in as the top 4 best matches.
  • There are 400k C# repos with the .NET framework, SignalR and Mono in the top 3.
  • There are 421k C repos with Linux being the best match.
  • There are 60k Scala repos with PredictionIO, the Play Framework and Scala itself showing up as top 3 best matches.
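Counts like these can be re-checked through GitHub's REST API. The search/repositories endpoint returns a total_count field along with an items list in best-match order. Here is a sketch that assumes unauthenticated access, which is heavily rate-limited for search. The helper names are hypothetical, and today's numbers will differ from the ones above.

```python
import json
import sys
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://api.github.com/search/repositories"


def repo_search_url(language, per_page=3):
    # Without a sort parameter, results come back in "best match" order.
    return API + "?" + urlencode({"q": f"language:{language}", "per_page": per_page})


def summarize(payload):
    """Return (total_count, [full_name, ...]) from a search/repositories JSON body."""
    data = json.loads(payload)
    return data["total_count"], [item["full_name"] for item in data["items"]]


def fetch(language, per_page=3):
    # Network call; subject to GitHub's search rate limits.
    req = Request(repo_search_url(language, per_page),
                  headers={"Accept": "application/vnd.github+json"})
    with urlopen(req) as resp:
        return summarize(resp.read().decode("utf-8"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python repo_counts.py java python scala
    for lang in sys.argv[1:]:
        total, top = fetch(lang)
        print(f"{lang}: {total} repos; top matches: {', '.join(top)}")
```

Parsing is kept separate from the network call so the summarizing logic can be checked against a saved response.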

Much cooler than plain searching is the GitHub Visualizer, created by Artem Zukov using D3.js.

The visualization of Apache's repos shows the assortment of languages they use.

The Hive repo's contributors and file extensions
