What is SCARFF
SCAlable Real-time Fraud Finder (SCARFF) is an open source platform which processes and analyses credit card streaming data in order to return reliable alerts in a nearly real-time setting. This original framework for near real-time Streaming Fraud Detection integrates Big Data tools (Kafka, Spark and Cassandra) with a machine learning approach which deals with data imbalance, non-stationarity and feedback latency.
This image contains all the tools needed to run a streaming fraud detection demo:
- Kafka, Spark and Cassandra;
- a compiled version of SCARFF;
- an artificial dataset and the program to stream it.
At the core of SCARFF is a Spark application. The original code can be found inside the container at /home/guest/SCARFFFiles/SCARFF/src/main/scala or at https://github.com/fabriziocarcillo/scarff
Start a SCARFF instance
You can follow the commands below or the video tutorial.
To start a container:
docker run -p 4040:4040 -p 23:22 -ti --privileged fabriziocarcillo/scarff
Once you are in the terminal, you need to start up the required services:
The container includes two artificial datasets:
You can start streaming the small dataset with the command:
nohup ./SCARFFFiles/data_send.sh 5 > /dev/null 2>&1 &
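The command above detaches the streaming script from the terminal and discards its output, which also means its process ID is easy to lose. The following is a minimal sketch, not part of SCARFF, of a wrapper that starts the same command and prints the PID so the stream can be stopped later. The helper name start_stream is hypothetical, and stopping by PID assumes data_send.sh does not leave child processes running.

```shell
# start_stream (hypothetical helper): run a command detached from the
# terminal, discard its output, and print its PID.
start_stream() {
  nohup "$@" > /dev/null 2>&1 &
  echo "$!"
}

# Usage inside the container (same command and argument as above):
#   STREAM_PID=$(start_stream ./SCARFFFiles/data_send.sh 5)
#   kill "$STREAM_PID"    # stop streaming before the dataset is exhausted
```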
You can now start the SCARFF application in Spark:
spark-submit --num-executors 1 --executor-cores 1 --executor-memory 80m --driver-memory 500m --master local[*] --conf spark.yarn.maxAppAttempts=1 --packages com.google.guava:guava:18.0,org.pentaho.pentaho-commons:pentaho-package-manager:1.0.11,nz.ac.waikato.cms.weka:weka-dev:3.7.13,com.datastax.spark:spark-cassandra-connector_2.11:1.6.0-M2,org.apache.kafka:kafka_2.11:0.9.0.0,org.apache.spark:spark-streaming-kafka_2.11:1.6.1 --conf spark.driver.extraClassPath=/home/guest/guava-18.0.jar --conf spark.executor.extraClassPath=/home/guest/guava-18.0.jar --conf spark.yarn.executor.memoryOverhead=3000 --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-spark.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-spark.properties" /home/guest/SCARFFFiles/SCARFF/target/scala-2.11/scarff_2.11-1.0.jar
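The spark-submit line above is long enough that a single typo is hard to spot. As a readability aid, here is a sketch that assembles the same options piece by piece. The variable names (PACKAGES, GUAVA_JAR, APP_JAR, LOG4J_OPT, SPARK_SUBMIT) and the function scarff_submit are illustrative and do not exist in the image; the option values are copied from the command above.

```shell
# Maven coordinates passed to --packages, one per line for readability.
PACKAGES="com.google.guava:guava:18.0"
PACKAGES="$PACKAGES,org.pentaho.pentaho-commons:pentaho-package-manager:1.0.11"
PACKAGES="$PACKAGES,nz.ac.waikato.cms.weka:weka-dev:3.7.13"
PACKAGES="$PACKAGES,com.datastax.spark:spark-cassandra-connector_2.11:1.6.0-M2"
PACKAGES="$PACKAGES,org.apache.kafka:kafka_2.11:0.9.0.0"
PACKAGES="$PACKAGES,org.apache.spark:spark-streaming-kafka_2.11:1.6.1"

# Paths inside the container, as used in the original command.
GUAVA_JAR=/home/guest/guava-18.0.jar
APP_JAR=/home/guest/SCARFFFiles/SCARFF/target/scala-2.11/scarff_2.11-1.0.jar
LOG4J_OPT=-Dlog4j.configuration=log4j-spark.properties

# scarff_submit (illustrative): runs spark-submit with the options above.
# SPARK_SUBMIT can be overridden to point at a different launcher.
scarff_submit() {
  "${SPARK_SUBMIT:-spark-submit}" \
    --num-executors 1 --executor-cores 1 \
    --executor-memory 80m --driver-memory 500m \
    --master 'local[*]' \
    --conf spark.yarn.maxAppAttempts=1 \
    --packages "$PACKAGES" \
    --conf "spark.driver.extraClassPath=$GUAVA_JAR" \
    --conf "spark.executor.extraClassPath=$GUAVA_JAR" \
    --conf spark.yarn.executor.memoryOverhead=3000 \
    --conf "spark.driver.extraJavaOptions=$LOG4J_OPT" \
    --conf "spark.executor.extraJavaOptions=$LOG4J_OPT" \
    "$APP_JAR"
}
```

Calling scarff_submit in the container terminal launches the application. The only deliberate difference from the one-line form is that local[*] is quoted, so the shell cannot expand it as a filename pattern.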
Streaming performance can be monitored at 192.168.99.100:4040 (replace the address with your Docker host's IP if it differs).
After approximately one hour the stream will finish and you can stop the application with Ctrl-C (you can also stop it earlier if you prefer).
Type cqlsh in the terminal and then:
SELECT * FROM ksfraud.ranktrx;
to see all the alerts raised by SCARFF.
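If you prefer not to open an interactive session, the same table can be queried in one shot. This is a sketch under the assumption that the cqlsh shipped in the container supports the -e (execute) option, as recent cqlsh versions do. The helper names scarff_alerts and scarff_alert_count and the CQLSH variable are hypothetical; only the table name ksfraud.ranktrx comes from the query above.

```shell
# scarff_alerts (hypothetical helper): print the first N alerts, 10 by default.
scarff_alerts() {
  limit="${1:-10}"
  "${CQLSH:-cqlsh}" -e "SELECT * FROM ksfraud.ranktrx LIMIT ${limit};"
}

# scarff_alert_count (hypothetical helper): print how many alerts were raised.
scarff_alert_count() {
  "${CQLSH:-cqlsh}" -e "SELECT COUNT(*) FROM ksfraud.ranktrx;"
}

# Usage inside the container:
#   scarff_alerts 20
#   scarff_alert_count
```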
SCARFF is available thanks to the funding of the Brufence project (Scalable machine learning for automating defense system) supported by INNOVIRIS (Brussels Institute for the encouragement of scientific research and innovation).