Short Description
SCARFF (SCAlable Real-time Fraud Finder) is a framework which enables credit card fraud detection.
Full Description

What is SCARFF

SCAlable Real-time Fraud Finder (SCARFF) is an open-source platform which processes and analyses streaming credit card data in order to return reliable alerts in near real time. This original framework for near real-time Streaming Fraud Detection integrates Big Data tools (Kafka, Spark and Cassandra) with a machine learning approach which deals with data imbalance, non-stationarity and feedback latency.
This image contains all the tools needed to run a streaming fraud detection demo:

  • Kafka, Spark and Cassandra;
  • a compiled version of SCARFF;
  • an artificial dataset and the program to stream it.

At the core of SCARFF there is a Spark application. The original code can be found inside the container at /home/guest/SCARFFFiles/SCARFF/src/main/scala or at https://github.com/fabriziocarcillo/scarff


Start a SCARFF instance

You can follow the commands below or the video tutorial.

To start a container:

docker run -p 4040:4040 -p 23:22 -ti --privileged fabriziocarcillo/scarff

Once you are in the container's terminal, start the required services:

./SCARFFFiles/startup_script.sh

The container includes two artificial datasets:

  • /home/guest/SCARFFFiles/big-artificial-dataset.csv
  • /home/guest/SCARFFFiles/small-artificial-dataset.csv

You can start streaming the small dataset with the command:

nohup ./SCARFFFiles/data_send.sh 5 > /dev/null 2>&1 &
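
The streaming command above runs in the background with its output discarded, so nothing on screen shows whether it is still alive. The sketch below shows one way to keep track of such a background job and stop it early. It uses sleep 30 as a stand-in for ./SCARFFFiles/data_send.sh 5 so that it can be tried anywhere; the variable names STREAM_PID and STREAM_STATUS are illustrative and not part of SCARFF.

```shell
# Stand-in for ./SCARFFFiles/data_send.sh 5 (assumption: any long-running
# command started this way can be tracked in the same manner).
nohup sleep 30 > /dev/null 2>&1 &
STREAM_PID=$!   # PID of the background job that was just started

# kill -0 sends no signal; it only checks that the process still exists.
if kill -0 "$STREAM_PID" 2>/dev/null; then
  STREAM_STATUS="running"
else
  STREAM_STATUS="stopped"
fi
echo "streamer status: $STREAM_STATUS"

# Stop the streamer before it finishes, then wait so the shell reaps it.
kill "$STREAM_PID"
wait "$STREAM_PID" 2>/dev/null || true

if kill -0 "$STREAM_PID" 2>/dev/null; then
  STREAM_STATUS="running"
else
  STREAM_STATUS="stopped"
fi
echo "streamer status: $STREAM_STATUS"
```

Saving $! immediately after launching the job avoids having to search the process list for it later.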

You can now start the SCARFF application in Spark:

spark-submit \
  --num-executors 1 \
  --executor-cores 1 \
  --executor-memory 80m \
  --driver-memory 500m \
  --master local[*] \
  --conf spark.yarn.maxAppAttempts=1 \
  --packages com.google.guava:guava:18.0,org.pentaho.pentaho-commons:pentaho-package-manager:1.0.11,nz.ac.waikato.cms.weka:weka-dev:3.7.13,com.datastax.spark:spark-cassandra-connector_2.11:1.6.0-M2,org.apache.kafka:kafka_2.11:0.9.0.0,org.apache.spark:spark-streaming-kafka_2.11:1.6.1 \
  --conf spark.driver.extraClassPath=/home/guest/guava-18.0.jar \
  --conf spark.executor.extraClassPath=/home/guest/guava-18.0.jar \
  --conf spark.yarn.executor.memoryOverhead=3000 \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-spark.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-spark.properties" \
  /home/guest/SCARFFFiles/SCARFF/target/scala-2.11/scarff_2.11-1.0.jar

Streaming performance can be monitored in the Spark web UI at 192.168.99.100:4040. This is the usual default address of a Docker Machine VM; if Docker runs natively on your host, localhost:4040 is the likely equivalent.
After approximately one hour the stream will finish and you can kill the application using Ctrl-C (you can also stop it earlier if you prefer).
Then type cqlsh in the terminal and run:

SELECT * FROM ksfraud.ranktrx;

to see all the alerts raised by SCARFF.
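
The full SELECT can print a large number of rows after a long run. As a minimal sketch, the same table can be queried non-interactively from the shell with a row limit, using the -e option of cqlsh. It assumes cqlsh is on the PATH and that Cassandra is reachable at its default address inside the container; the QUERY variable and the limit of 10 are illustrative choices.

```shell
# Query a bounded sample of the alerts table without opening the cqlsh prompt.
# Assumption: run inside the SCARFF container after the services were started.
QUERY="SELECT * FROM ksfraud.ranktrx LIMIT 10;"

if command -v cqlsh >/dev/null 2>&1; then
  # -e executes the given statement and exits.
  cqlsh -e "$QUERY"
else
  echo "cqlsh not found; run this inside the SCARFF container" >&2
fi
```

The guard around cqlsh only makes the snippet fail gracefully when it is pasted outside the container.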


Acknowledgement

SCARFF is available thanks to the funding of the Brufence project (Scalable machine learning for automating defense system) supported by INNOVIRIS (Brussels Institute for the encouragement of scientific research and innovation).

Docker Pull Command

docker pull fabriziocarcillo/scarff

Owner
fabriziocarcillo