# Algostat

Tools to find the most frequently used C++ algorithms on GitHub.

## Results

You can look at the results of 3869 analyzed C++ repos in my Google Spreadsheet or use the results.csv directly.

| Algorithm | Total occurrences (sum) | Avg. per repo (avg) |
|-----------|------------------------:|--------------------:|
| swap | 108363 | 28 |
| find | 81006 | 21 |
| count | 60306 | 16 |
| move | 57595 | 15 |
| copy | 48050 | 12 |
| sort | 33317 | 9 |
| max | 28848 | 7 |
| equal | 27467 | 7 |
| min | 21720 | 6 |
| unique | 18484 | 5 |
| lower_bound | 15017 | 4 |
| remove | 13972 | 4 |
| replace | 13262 | 3 |
| upper_bound | 11835 | 3 |
| for_each | 11518 | 3 |

## Usage

For best results you should disable Python's input and output buffering so that the pipeline stages stream their output line by line.

export PYTHONUNBUFFERED=true
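If you would rather not touch the environment, the same effect can be had inside a script by flushing stdout on every line; a minimal sketch (not part of the original scripts):

```python
import sys

# Equivalent of PYTHONUNBUFFERED=true for a single pipeline stage:
# flush after every line so the next stage sees each repo name
# immediately instead of waiting for a full block buffer.
for line in sys.stdin:
    print(line.strip(), flush=True)
```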

### Analyze top C++ repos on GitHub

Analyze the top C++ repos on GitHub and create a CSV file.

./top-github-repos.py | ./algostat.py | ./create-csv.py > results.csv
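For orientation, top-github-repos.py presumably asks the GitHub search API for the most starred C++ repositories and prints one owner/name per line; a rough sketch under that assumption (the query parameters and pagination are guesses, not the script's actual code):

```python
#!/usr/bin/env python3
"""Sketch: list the most starred C++ repos via the GitHub search API."""
import requests

SEARCH_URL = "https://api.github.com/search/repositories"

def top_cpp_repos(pages=10, per_page=100):
    """Yield repo names like "facebook/folly", most starred first."""
    for page in range(1, pages + 1):
        resp = requests.get(SEARCH_URL, params={
            "q": "language:cpp",
            "sort": "stars",
            "order": "desc",
            "per_page": per_page,
            "page": page,
        })
        resp.raise_for_status()
        for repo in resp.json()["items"]:
            yield repo["full_name"]

if __name__ == "__main__":
    for name in top_cpp_repos():
        print(name, flush=True)   # stream repo names to the next stage
```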

### Analyze all C++ repos on GitHub

Analyze all C++ repos listed in GHTorrent.

cat cpp_repos.txt | ./algostat.py | ./create-csv.py > results.csv
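The counting step inside algostat.py presumably boils down to shallow-cloning each repository and counting regex matches for a whitelist of &lt;algorithm&gt; functions in its C++ sources. The sketch below illustrates that idea; the clone strategy, file suffixes and regex are assumptions, not the original implementation:

```python
#!/usr/bin/env python3
"""Sketch of the per-repo counting step (an assumption, not algostat.py itself)."""
import os
import re
import subprocess
import sys
import tempfile

ALGORITHMS = ["swap", "find", "count", "move", "copy", "sort", "max",
              "equal", "min", "unique", "lower_bound", "remove",
              "replace", "upper_bound", "for_each"]
CPP_SUFFIXES = (".cpp", ".cc", ".cxx", ".h", ".hpp")
# Match calls such as "std::sort(" or bare "sort(".
PATTERNS = {a: re.compile(r"\b(?:std::)?%s\s*\(" % a) for a in ALGORITHMS}

def count_algorithms(repo):
    """Shallow-clone owner/name and return {algorithm: occurrences}."""
    counts = {a: 0 for a in ALGORITHMS}
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(["git", "clone", "--depth", "1", "--quiet",
                        "https://github.com/%s.git" % repo, tmp],
                       check=True)
        for root, _, files in os.walk(tmp):
            for name in files:
                if not name.endswith(CPP_SUFFIXES):
                    continue
                with open(os.path.join(root, name), errors="ignore") as f:
                    source = f.read()
                for algo, pattern in PATTERNS.items():
                    counts[algo] += len(pattern.findall(source))
    return counts

if __name__ == "__main__":
    for line in sys.stdin:
        repo = line.strip()
        if repo:
            for algo, n in count_algorithms(repo).items():
                print(repo, algo, n, flush=True)
```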

### Distributed analysis with a Redis queue and workers

Use a Redis queue to distribute jobs among workers and then fetch the results.
You need to provide the ALGOSTAT_RQ_HOST and ALGOSTAT_RQ_PORT environment
variables to the processes with the address of the Redis server.

export ALGOSTAT_RQ_HOST="localhost"
export ALGOSTAT_RQ_PORT="6379"
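Inside the scripts this presumably maps onto a plain redis-py connection built from those variables; a small sketch (the defaults are assumptions for local testing):

```python
import os
import redis

# Connect to the Redis queue configured via the environment.
rq = redis.StrictRedis(
    host=os.environ.get("ALGOSTAT_RQ_HOST", "localhost"),
    port=int(os.environ.get("ALGOSTAT_RQ_PORT", "6379")),
    decode_responses=True,   # return str instead of bytes
)
```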

Now you need to fill the job queue with the output of top-github-repos.py and the repos listed in GHTorrent, sorting out duplicates.

./top-github-repos.py >> jobs.txt
cat cpp_repos.txt >> jobs.txt
sort -u jobs.txt | ./enqueue-jobs.py
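enqueue-jobs.py presumably just pushes every repository name it reads from stdin onto a Redis list; a minimal sketch, where the jobs key name is an assumption:

```python
#!/usr/bin/env python3
"""Sketch of enqueue-jobs.py: push repo names from stdin onto a Redis list."""
import os
import sys

import redis

rq = redis.StrictRedis(host=os.environ["ALGOSTAT_RQ_HOST"],
                       port=int(os.environ["ALGOSTAT_RQ_PORT"]))

for line in sys.stdin:
    repo = line.strip()
    if repo:
        rq.rpush("jobs", repo)   # workers pop from this list
```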

On your workers you need to tell algostat.py to fetch the jobs from
the Redis queue and then store the results in a results queue.

./algostat.py --rq | ./enqueue-results.py
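A worker in --rq mode could look roughly like the loop below. It reuses the count_algorithms helper from the counting sketch above; the queue names and the JSON result format are assumptions:

```python
#!/usr/bin/env python3
"""Sketch of a worker loop: pop repos from "jobs", push counts to "results"."""
import json
import os

import redis

rq = redis.StrictRedis(host=os.environ["ALGOSTAT_RQ_HOST"],
                       port=int(os.environ["ALGOSTAT_RQ_PORT"]),
                       decode_responses=True)

while True:
    job = rq.blpop("jobs", timeout=30)        # (key, repo) or None on timeout
    if job is None:
        break                                 # queue drained, stop this worker
    repo = job[1]
    counts = count_algorithms(repo)           # helper from the counting sketch
    rq.rpush("results", json.dumps({"repo": repo, "counts": counts}))
```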

After that you aggregate the results into a single CSV file.

./fetch-results.py | ./create-csv.py > results.csv
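fetch-results.py and create-csv.py presumably drain the results list and total the counts, which is also where the sum and avg columns in the table above come from (total occurrences and average per analyzed repo). A sketch of that aggregation, matching the queue name and JSON format assumed in the worker sketch:

```python
#!/usr/bin/env python3
"""Sketch of the aggregation step: drain "results" and write the CSV."""
import csv
import json
import os
import sys
from collections import Counter

import redis

rq = redis.StrictRedis(host=os.environ["ALGOSTAT_RQ_HOST"],
                       port=int(os.environ["ALGOSTAT_RQ_PORT"]),
                       decode_responses=True)

totals, repos = Counter(), 0
while True:
    item = rq.lpop("results")          # destructive: this empties the list
    if item is None:
        break
    result = json.loads(item)
    totals.update(result["counts"])
    repos += 1

writer = csv.writer(sys.stdout)
writer.writerow(["algorithm", "sum", "avg"])
for algo, total in totals.most_common():
    writer.writerow([algo, total, round(total / repos)])
```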

## Installation

  1. Make sure you have Python 3 installed
  2. Clone the repository
  3. Install requirements with `pip install -r requirements.txt`

## Using Docker for Deployment

You can use Docker to run the application in a distributed setup.

### Redis

Run the Redis server.

docker run --name redis -p 6379:6379 -d sameersbn/redis:latest

Get the IP address of your Redis server and assign it to the ALGOSTAT_RQ_HOST environment variable for all following docker run commands. In this example we will work with 104.131.5.11.

### Get the image

I have already set up an automated build, lukasmartinelli/algostat, which you can use.

docker pull lukasmartinelli/algostat

Or you can clone the repo and build the Docker image yourself.

docker build -t lukasmartinelli/algostat .

### Fill job queue

docker run -it --rm --name queue-filler \
-e ALGOSTAT_RQ_HOST=104.131.5.11 \
-e ALGOSTAT_RQ_PORT=6379 \
lukasmartinelli/algostat bash -c "cat cpp_repos.txt | ./enqueue-jobs.py"

### Run the workers

Assign as many workers as you like.

docker run -it --rm --name worker1 \
-e ALGOSTAT_RQ_HOST=104.131.5.11 \
-e ALGOSTAT_RQ_PORT=6379 \
lukasmartinelli/algostat bash -c "./algostat.py --rq | ./enqueue-results.py"

### Aggregate results

Note that this step is not repeatable. Once you've aggregated the results, the Redis list will be empty.

docker run -it --rm --name result-aggregator \
-e ALGOSTAT_RQ_HOST=104.131.5.11 \
-e ALGOSTAT_RQ_PORT=6379 \
lukasmartinelli/algostat bash -c "./fetch-results.py | ./create-csv.py"