S2Graph is a graph database designed for distributed and scalable management of data at web scale.

Docker image of S2Graph (http://s2graph.apache.org/)

Start an S2Graph server instance

Starting an S2Graph instance is simple:

$ docker run -p 9000:9000 -d mskim/s2graph

Attach to the container

(Ctrl-c to detach)

$ docker attach --sig-proxy=false <container_id>

Execute a shell

$ docker exec -it <container_id> bash

Terminate the instance

$ docker kill <container_id>

Tutorial

In the commands below, default stands for the hostname of the instance; substitute your own host if it differs.

Prepare dataset

  1. Download ml-100k.zip (from http://grouplens.org/datasets/movielens/)
  2. Unzip it
  3. Convert u.data into tab-separated edge insert lines (ETL)

  $ wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
  $ unzip ml-100k.zip
  $ cd ml-100k
  $ awk '{print $4 "\t" "insert" "\t" "e" "\t" $1 "\t" $2 "\t" "grouplens_movielens_100k_data" "\t" "{\"rating\": " $3 "}"}' u.data > grouplens_movielens_100k_data.txt
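
To see what the ETL step produces, the same awk program can be run on a single line. This is a minimal sketch: the sample line is only illustrative and assumes the u.data layout of user id, item id, rating, and timestamp separated by tabs; the variable names are hypothetical.

```shell
# One sample line in the assumed u.data layout:
# user id, item id, rating, timestamp (tab-separated).
sample_line=$(printf '196\t242\t3\t881250949\n')

# Same awk program as in the ETL step, applied to the single sample line.
edge_line=$(printf '%s\n' "$sample_line" | awk '{print $4 "\t" "insert" "\t" "e" "\t" $1 "\t" $2 "\t" "grouplens_movielens_100k_data" "\t" "{\"rating\": " $3 "}"}')

printf '%s\n' "$edge_line"
```

The printed line holds seven tab-separated fields: the timestamp, the operation (insert), the element type (e), the user id, the item id, the label name, and the rating as a JSON object.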

Create a service

curl -XPOST default:9000/graphs/createService -H 'Content-Type: application/json' -d '
{
  "serviceName": "grouplens"
}
'

This returns:

{"id":1,"name":"grouplens","accessToken":"152911d3-2ae9-4d73-b2a9-820d23ced92f","cluster":"localhost","hTableName":"grouplens-dev","preSplitSize":1,"hTableTTL":null}

Create a label

curl -XPOST default:9000/graphs/createLabel -H 'Content-Type: application/json' -d '
{
  "label": "grouplens_movielens_100k_data",
  "srcServiceName": "grouplens",
  "srcColumnName": "user_id",
  "srcColumnType": "long",
  "tgtServiceName": "grouplens",
  "tgtColumnName": "item_id",
  "tgtColumnType": "long",
  "indices": [
    {
      "name": "idx_rating",
      "propNames": [
        "rating"
      ]
    }
  ],
  "props": [
    {
      "name": "rating",
      "defaultValue": 0,
      "dataType": "long"
    }
  ],
  "serviceName": "grouplens",
  "consistencyLevel": "weak"
}
'

This returns:

{"labelName":"grouplens_movielens_100k_data","from":{"serviceName":"grouplens","columnName":"user_id","columnType":"long"},"to":{"serviceName":"grouplens","columnName":"item_id","columnType":"long"},"isDirected":true,"serviceName":"grouplens","consistencyLevel":"weak","schemaVersion":"v3","isAsync":false,"compressionAlgorithm":"gz","defaultIndex":{"name":"idx_rating","propNames":["rating"]},"extraIndex":[],"metaProps":[{"name":"rating","defaultValue":"0","dataType":"long"}]}

Insert dataset

$ mkdir -p tmp_split
$ split -a 15 -l 1000 grouplens_movielens_100k_data.txt tmp_split/
$ for s in tmp_split/*
do
    curl -XPOST -H "Content-Type: text/plain" --data-binary @"$s" default:9000/graphs/edges/bulkWithWait
done

$ rm -rf tmp_split
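
The batching done by split can be checked offline before posting anything. The sketch below uses a synthetic 2500-line file as a stand-in for grouplens_movielens_100k_data.txt (it is not real data) and a temporary directory, both of which are assumptions made for illustration.

```shell
# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# Synthetic stand-in for the edge file: 2500 numbered lines.
seq 1 2500 > "$workdir/sample.txt"

# Same split options as above: 15-letter suffixes, 1000 lines per chunk.
mkdir -p "$workdir/tmp_split"
split -a 15 -l 1000 "$workdir/sample.txt" "$workdir/tmp_split/"

# Count the chunks and the lines in the last (partial) chunk.
chunk_count=$(ls "$workdir/tmp_split" | wc -l | tr -d ' ')
last_chunk=$(ls "$workdir/tmp_split" | tail -n 1)
last_lines=$(wc -l < "$workdir/tmp_split/$last_chunk" | tr -d ' ')

echo "chunks: $chunk_count, lines in last chunk: $last_lines"
rm -rf "$workdir"
```

With 2500 input lines this yields three chunks, the last holding 500 lines, so each POST in the loop carries at most 1000 edges.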

Query

Simple 1-step query

curl -XPOST -H "Content-Type: application/json" -d '{
    "srcVertices": [
        {
            "columnName": "user_id",
            "id": 109,
            "serviceName": "grouplens"
        }
    ],
    "steps": [
        {
            "step": [
                {
                    "direction": "out",
                    "label": "grouplens_movielens_100k_data",
                    "offset": 0,
                    "limit": 100
                }
            ]
        }
    ]
}' default:9000/graphs/getEdges
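
To run the same query for other users, the request body can be generated by a small shell function. This is a sketch only: one_step_query is a hypothetical helper name, and it assumes the service, column, and label names created earlier in this tutorial.

```shell
# Hypothetical helper: print a 1-step getEdges query body for one user.
# $1 = user id, $2 = optional limit (defaults to 100, as in the query above).
one_step_query() {
  user_id=$1
  limit=${2:-100}
  cat <<EOF
{
    "srcVertices": [
        {"columnName": "user_id", "id": ${user_id}, "serviceName": "grouplens"}
    ],
    "steps": [
        {"step": [
            {"direction": "out", "label": "grouplens_movielens_100k_data", "offset": 0, "limit": ${limit}}
        ]}
    ]
}
EOF
}

body=$(one_step_query 109 5)
echo "$body"

# To send it (requires a running instance):
# echo "$body" | curl -XPOST -H "Content-Type: application/json" -d @- default:9000/graphs/getEdges
```

Building the body separately makes it easy to inspect the JSON before it is posted.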

Recommender system

The query below recommends movies as follows:

  • the first step finds the movies rated by user 109
  • the second step finds the users who rated the movies from the first step
  • the third step finds the movies rated by the users from the second step (final step)
  • filterOut removes movies already rated by user 109 from the final step

curl -XPOST -H "Content-Type: application/json" -d '{
    "filterOut": {
        "srcVertices": [
            {
                "columnName": "user_id",
                "id": 109,
                "serviceName": "grouplens"
            }
        ],
        "steps": [
            {
                "step": [
                    {
                        "direction": "out",
                        "label": "grouplens_movielens_100k_data",
                        "offset": 0,
                        "limit": 100
                    }
                ]
            }
        ]
    },
    "srcVertices": [
        {
            "columnName": "user_id",
            "id": 109,
            "serviceName": "grouplens"
        }
    ],
    "steps": [
        {
            "step": [
                {
                    "direction": "out",
                    "label": "grouplens_movielens_100k_data",
                    "offset": 0,
                    "limit": 10
                }
            ]
        },
        {
            "step": [
                {
                    "direction": "in",
                    "label": "grouplens_movielens_100k_data",
                    "offset": 0,
                    "limit": 10
                }
            ]
        },
        {
            "step": [
                {
                    "direction": "out",
                    "label": "grouplens_movielens_100k_data",
                    "offset": 0,
                    "limit": 10
                }
            ]
        }
    ]
}' default:9000/graphs/getEdges
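
The traversal logic of this query can be illustrated offline on a toy edge list. The sketch below is not how S2Graph executes the query; it ignores limits, offsets, and scoring, and the data and the recommend function are made up for illustration. It only mirrors the out, in, out steps followed by filterOut.

```shell
# Toy "user item" edges (made-up data, not MovieLens).
edges='1 a
1 b
2 a
2 c
3 b
3 d
4 e'

# Hypothetical helper: items reachable in three steps from user $1,
# minus the items that user has already rated.
recommend() {
  printf '%s\n' "$edges" | awk -v u="$1" '
    { user[NR] = $1; item[NR] = $2 }
    END {
      # Step 1 (out): items rated by the starting user.
      for (i = 1; i <= NR; i++) if (user[i] == u) seen[item[i]] = 1
      # Step 2 (in): users who rated any of those items.
      for (i = 1; i <= NR; i++) if (item[i] in seen) peer[user[i]] = 1
      # Step 3 (out) with filterOut: items rated by those users,
      # excluding items the starting user already rated.
      for (i = 1; i <= NR; i++)
        if ((user[i] in peer) && !(item[i] in seen)) rec[item[i]] = 1
      for (m in rec) print m
    }' | sort
}

recs=$(recommend 1 | paste -s -d ' ' -)
echo "recommended for user 1: $recs"
```

For user 1 the result is c and d: both come from users who share a rated item with user 1, while e is never reached because user 4 has no item in common.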
Owner
mskim