
Closed source code repository tools

Kibble

Apache Kibble is a suite of tools for collecting, aggregating and visualizing activity in software projects.

Closed source code repository tools archive

GHTorrent monitors the Github public event timeline. For each event, it retrieves its contents and their dependencies, exhaustively. It then stores the JSON responses to a MongoDB database, while also extracting their structure in a MySQL database. As you can see, its goal is similar to GH Archive. GH Archive aims at providing a more exhaustive collection of events, while GHTorrent makes a stronger effort in giving you the events data in a slightly more structured way, to make it easier for you to get all the information surrounding the event.
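GHTorrent's two-store design (the raw JSON kept verbatim in a document store, a relational projection extracted alongside it) can be sketched roughly as below. The flattening and the column choices are illustrative only, not GHTorrent's actual schema:

```python
import json

def flatten_event(event):
    """Project a raw GitHub event onto a flat row suitable for a relational
    table; the full JSON payload would be stored untouched in MongoDB."""
    return {
        "event_id": event["id"],
        "type": event["type"],
        "actor": event["actor"]["login"],
        "repo": event["repo"]["name"],
        "created_at": event["created_at"],
    }

# Example with a minimal event as it appears on the public timeline:
raw = json.loads(
    '{"id": "1", "type": "PushEvent",'
    ' "actor": {"login": "alice"}, "repo": {"name": "a/b"},'
    ' "created_at": "2015-01-01T15:00:00Z"}'
)
row = flatten_event(raw)
```

The point of the split is that the structured rows make aggregate queries cheap, while the untouched JSON lets you re-extract new fields later without re-crawling.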

Closed source code repository tools Offline

The GHCrawler is a robust GitHub API crawler that walks a queue of GitHub entities, transitively retrieving and storing their contents. GHCrawler is especially useful if you want to keep track of a set of orgs and repositories. Note that the previous rate limits still apply, but GHCrawler employs token pooling and rotation to optimize the use of your API tokens (if you’re able to collect several ones from “friends and family”).

GitHub Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis. GH Archive stores all GitHub events in a set of JSON files that you can later download and process offline as you wish. Alternatively, GH Archive is also available as a public dataset on Google BigQuery: the dataset is automatically updated every hour and enables you to run arbitrary SQL-like queries over the entire dataset in seconds.
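The offline workflow looks roughly like this: download one hourly dump, then stream through the JSON lines and tally whatever you care about. The URL pattern and the per-event `type` field come from GH Archive's documented format; the helper names below are mine:

```python
import gzip
import json
import urllib.request
from collections import Counter

def count_event_types(lines):
    """Tally GitHub event types from an iterable of JSON strings,
    one event per line (the GH Archive file layout)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[json.loads(line)["type"]] += 1
    return counts

def fetch_hour(year, month, day, hour):
    """Download one hourly GH Archive dump and return its decompressed lines."""
    url = f"https://data.gharchive.org/{year}-{month:02d}-{day:02d}-{hour}.json.gz"
    with urllib.request.urlopen(url) as resp:
        data = gzip.decompress(resp.read())
    return data.decode("utf-8").splitlines()
```

Usage (requires network, and each hourly file is a few MB): `count_event_types(fetch_hour(2015, 1, 1, 15)).most_common(5)`. For anything beyond a handful of hours, the BigQuery dataset is the saner route.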


While I haven’t found the perfect tool (for me), at least we do have a number of good tools that will help you prepare this kind of ETL process for software data. Depending on your scenario, one of them may be enough. Let’s see the Git and/or GitHub analysis tools I know (and let me know the ones I may be missing). As usual, this post does not pretend to be an exhaustive and perfect analysis of the tools, but just a way to sort out a little bit the myriad of notes and thoughts I had written down in several places.

GitHub itself offers a public API to query any project. Unfortunately, there is a limit to the hourly number of requests, so using the API is not a good solution if you’re looking to analyze large projects (or do some global analysis on a number of them). But if you want to build some kind of dashboard focused on a single project or contributor, this is more than enough. One nice aspect is that you can also subscribe to get notified after certain events occur in a project. This is the strategy we use in our stargazer bot. Keep in mind that, via this API, you can access basically all the info you see when browsing the GH repo of the project, but you have a limited perspective on the internals of the “Git side of the project” (e.g. if you want to know what lines of code were modified during the last day).
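A minimal sketch of the single-project dashboard case: the `/repos/{owner}/{repo}` endpoint and the `X-RateLimit-Remaining` header are part of GitHub's REST API; the function names are mine, and error handling is omitted:

```python
import json
import urllib.request

API = "https://api.github.com"

def parse_rate_limit(headers):
    """Extract the remaining-request budget from GitHub's response headers."""
    return int(headers.get("X-RateLimit-Remaining", "0"))

def get_repo(owner, repo, token=None):
    """Fetch basic project info; an auth token raises the hourly quota
    considerably compared to anonymous requests."""
    req = urllib.request.Request(f"{API}/repos/{owner}/{repo}")
    req.add_header("Accept", "application/vnd.github+json")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp), parse_rate_limit(resp.headers)
```

Usage (requires network): `info, left = get_repo("apache", "kibble")` then read e.g. `info["stargazers_count"]`, and check `left` before firing the next batch of requests. Watching that budget is exactly why this approach stops scaling past a few projects.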

  • Supports extracting data from a number of projects “on-demand”.
  • Even better, if the tool would help me find the projects I want to learn more from (using some kind of search tool based on the project language, size, popularity,…).
  • Covers a good number of data sources around the project (for sure Git and the issue/bug tracker, but also other importers, e.g. from communication channels the community may be using).
  • Without spending weeks preparing the scripts to run the process (and more weeks waiting for them to finish).
  • Lets me decide how to analyze the data (instead of limiting it to a number of predefined visualizations). Even better if it has a temporal dimension.
Any important decision should be grounded on data. This is also true for any decision that affects your software projects. You shouldn’t reach any conclusion regarding the health of your project, or the actions to take to improve it, without a good look at the data describing the project evolution (in terms of code changes but also regarding the community changes, especially if we are looking at an open-source project). Most times, this implies getting some meaningful data out of the Git repository and the code hosting platform (e.g. GitHub) where your project resides.

And this is where problems start. It’s actually quite difficult to easily get data out of Git/GitHub. And if you want to analyze a number of projects (e.g. to know how yours is doing compared to other “similar” ones), it’s almost impossible without investing too many hours. And it’s not because of a lack of interest in mining this data. Plenty of researchers (ourselves included) focus on developing new theories on software engineering stemming from mined data. There is even a conference devoted to this field of research. But, so far, I haven’t been able to find a tool that checks all the boxes listed above.
