Public Repository

Last pushed: 2 years ago
Short Description
Get Up and Running in 3'ish Easy Steps
Full Description

The Dockerfile is as follows:

FROM ubuntu:14.04

RUN apt-get update

RUN apt-get install -y default-jdk

ENV JAVA_HOME=/usr/lib/jvm/default-java

RUN apt-get install -y git

RUN apt-get install -y maven

RUN git clone git://git.apache.org/samza-hello-samza.git hello-samza

RUN apt-get install -y curl

RUN cd hello-samza && bin/grid bootstrap

The steps below are adapted from http://samza.apache.org/startup/hello-samza/0.9/

1.) Start the container with:

docker run --rm --name hello-samza --net host -it -p 8088:8088 anaerobic/hello-samza bash

2.) Build the environment and deploy the hello-samza tarball with:

cd hello-samza

bin/grid bootstrap

mvn clean package

mkdir -p deploy/samza

tar -xvf ./target/hello-samza-0.9.0-dist.tar.gz -C deploy/samza
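The tarball name above embeds the project version, so the tar line fails once hello-samza moves past 0.9.0. A minimal sketch of a version-independent extraction step follows. The helper name extract_dist is hypothetical and not part of hello-samza, and the sketch assumes a POSIX sh or bash shell in which an unmatched glob is left as a literal string.

```shell
# extract_dist: hypothetical helper, not part of hello-samza.
# Usage: extract_dist [target_dir] [dest_dir]
extract_dist() {
  target_dir=${1:-target}
  dest_dir=${2:-deploy/samza}
  # Expand the glob into the function's positional parameters.
  set -- "$target_dir"/hello-samza-*-dist.tar.gz
  # An unmatched glob stays literal, so confirm the file really exists.
  if [ ! -f "$1" ]; then
    echo "no dist tarball found in $target_dir" >&2
    return 1
  fi
  # Refuse to guess when several versions are present.
  if [ "$#" -gt 1 ]; then
    echo "more than one dist tarball in $target_dir" >&2
    return 1
  fi
  mkdir -p "$dest_dir"
  tar -xf "$1" -C "$dest_dir"
}
```

Run from the hello-samza directory after mvn clean package, calling extract_dist with no arguments uses target and deploy/samza as in step 2.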

3.) Run the jobs (and check their outputs) with:

deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-feed.properties

deploy/kafka/bin/kafka-console-consumer.sh  --zookeeper localhost:2181 --topic wikipedia-raw

deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-parser.properties

deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-stats.properties

deploy/kafka/bin/kafka-console-consumer.sh  --zookeeper localhost:2181 --topic wikipedia-edits

deploy/kafka/bin/kafka-console-consumer.sh  --zookeeper localhost:2181 --topic wikipedia-stats
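
The three run-job.sh invocations in step 3 differ only in the job name, so they can be wrapped in a loop. The following is a sketch only: job_config_uri and run_all_jobs are hypothetical helper names, and it assumes the current directory is hello-samza, as in step 2.

```shell
# job_config_uri and run_all_jobs are hypothetical helpers, not part of Samza.

# Print the file:// URI that run-job.sh receives via --config-path.
job_config_uri() {
  printf 'file://%s/deploy/samza/config/%s.properties\n' "$PWD" "$1"
}

# Submit the three example jobs in the order used in step 3.
run_all_jobs() {
  for job in wikipedia-feed wikipedia-parser wikipedia-stats; do
    deploy/samza/bin/run-job.sh \
      --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory \
      --config-path="$(job_config_uri "$job")" || return 1
  done
}
```

After run_all_jobs returns, the kafka-console-consumer.sh commands above can be used to inspect each topic.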
Owner
anaerobic

Comments (3)
thoma5b
2 years ago

ok, found the problem: my little AWS instance does not have enough memory.

deploy/kafka/bin/kafka-server-start.sh config/server.properties
results in
```
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000bad30000, 986513408, 0) failed; error='Cannot allocate memory' (errno=12)
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 986513408 bytes for committing reserved memory.
# /hello-samza/hs_err_pid573.log
```
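
A quick way to catch this before starting Kafka is to compare the memory the kernel reports as available against the size of the allocation that failed in the log (986513408 bytes). This is a Linux-only sketch. The helper name has_enough_memory is hypothetical, and it assumes /proc/meminfo exposes a MemAvailable field, which older kernels do not.

```shell
# has_enough_memory: hypothetical helper, Linux only.
# Usage: has_enough_memory <needed_bytes> [meminfo_file]
# Returns 0 if enough memory is available, 1 if not, 2 if the field is missing.
has_enough_memory() {
  needed_bytes=$1
  meminfo=${2:-/proc/meminfo}
  # MemAvailable is reported in kB.
  avail_kb=$(awk '/^MemAvailable:/ {print $2}' "$meminfo")
  [ -n "$avail_kb" ] || return 2
  [ $((avail_kb * 1024)) -ge "$needed_bytes" ]
}
```

For example, has_enough_memory 986513408 || echo "not enough memory for the Kafka broker" prints the warning on an instance like the one described here.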
thoma5b
2 years ago

The last line of build step 2.)

tar -xvf ./target/hello-samza-0.9.0-dist.tar.gz -C deploy/samza

must be changed to match the current version;

tar -xvf ./target/hello-samza-*-dist.tar.gz -C deploy/samza

should work. With that change the processes show up in ps aux | grep *, but curl localhost:8088 still returns nothing.

Now I get

KafkaSystemAdmin [WARN] Failed to create topic __samza_coordinator_wikipedia-parser_1: org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/ids. Retrying.

thoma5b
2 years ago

Hi there. Thanks for uploading the image. However, the commands seem to have no effect in my setup.

What am I doing wrong?

curl http://localhost:8088 returns an empty string

deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic [any]

returns

WARN [console-consumer-*], no brokers found when trying to rebalance. (kafka.consumer.ZookeeperConsumerConnector)

Checking

ps aux | grep kafka
ps aux | grep zookeeper
ps aux | grep samza

all return nothing. I'd appreciate any hints. Thanks,
Thomas
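
The symptoms in these comments (empty curl output, no brokers found, no matching processes) are what you would see if the grid services were not running when the commands were issued. One way to make that visible is to retry a probe command until it succeeds or a retry budget runs out. This is a minimal sketch: wait_for is a hypothetical helper name, and the one-second interval is an arbitrary choice.

```shell
# wait_for: hypothetical helper that retries a probe command once per second.
# Usage: wait_for <attempts> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_for() {
  attempts=$1
  shift
  while [ "$attempts" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}
```

For example, wait_for 30 curl -sf http://localhost:8088 || echo "nothing answering on port 8088" reports when nothing answers on port 8088 within the retry budget.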