eddelbuettel/rvw
DEPRECATED: please use rvowpalwabbit/ci instead
Development of the rvw package started as the R Vowpal Wabbit project (Google Summer of Code 2018).
Vowpal Wabbit is an online machine learning system that is known for its speed and scalability and is widely used in research and industry.
This package aims to bring its functionality to R.
First you have to install Vowpal Wabbit from here.
Then install the rvw package using devtools:
install.packages("devtools")
devtools::install_github("rvw-org/rvw")
In this example we will try to predict age groups (based on the number of abalone shell rings) from physical measurements. We will use the Abalone Data Set from the UCI Machine Learning Repository.
First we prepare our data:
library(mltools)
library(rvw)
set.seed(1)
aburl = 'http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data'
abnames = c('sex','length','diameter','height','weight.w','weight.s','weight.v','weight.sh','rings')
abalone = read.table(aburl, header = FALSE, sep = ',', col.names = abnames)
data_full <- abalone
# Split the number of rings into three groups with (as nearly as possible) equal numbers of observations
data_full$group <- bin_data(data_full$rings, bins=3, binType = "quantile")
group_lvls <- levels(data_full$group)
levels(data_full$group) <- c(1, 2, 3)
# Prepare indices to split data
ind_train <- sample(1:nrow(data_full), 0.8*nrow(data_full))
# Split data into train and test subsets
df_train <- data_full[ind_train,]
df_test <- data_full[-ind_train,]
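To make the binning step less opaque, here is a minimal base R sketch of quantile binning followed by an 80/20 split. It is an approximation under stated assumptions, not the mltools implementation: the ring counts are synthetic stand-ins for the real data, and cut() with quantile() is used in place of bin_data().

```r
# Synthetic stand-in for the abalone ring counts (assumption: not the real data)
set.seed(1)
rings <- sample(1:29, 200, replace = TRUE)

# Cut points at the 0, 1/3, 2/3 and 1 quantiles; unique() guards against tied breaks
breaks <- unique(quantile(rings, probs = seq(0, 1, length.out = 4)))
group <- cut(rings, breaks = breaks, include.lowest = TRUE)

# Keep the interval labels for reference, then relabel the groups as 1, 2, 3
group_lvls <- levels(group)
levels(group) <- seq_along(group_lvls)

# Group sizes should be roughly equal
print(table(group))

# 80/20 split by sampling row indices without replacement
n <- length(rings)
ind_train <- sample(seq_len(n), size = floor(0.8 * n))
train <- rings[ind_train]
test <- rings[-ind_train]
```

Because ring counts are integers with many ties, the three groups will generally not have exactly the same size.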
Then we set up a Vowpal Wabbit model:
vwmodel <- vwsetup(option = "ect", num_classes = 3)
Now we start training:
vwtrain(vwmodel, data = df_train,
        namespaces = list(NS1 = list("sex", "rings"),
                          NS2 = list("weight.w", "weight.s", "weight.v", "weight.sh", "diameter", "length", "height")),
        targets = "group"
)
And we get: average loss = 0.278060
Here the features are grouped into two namespaces, NS1 and NS2.
And finally compute predictions using the trained model:
predict.vw(vwmodel, data = df_test)
Here we get: average loss = 0.221292
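The namespaces argument in the training call groups features under the names NS1 and NS2. In Vowpal Wabbit's text input format, an example is written as a label followed by one "|Namespace feature:value ..." section per namespace. The sketch below shows how one data frame row could be rendered in that format. The helper to_vw_line is hypothetical and written only for illustration; it is not part of rvw and makes no claim about how rvw converts data internally. The row values are made up.

```r
# Hypothetical helper (not part of rvw): format one data frame row
# as a line in Vowpal Wabbit's text input format.
to_vw_line <- function(row, namespaces, target) {
  ns_parts <- vapply(names(namespaces), function(ns) {
    feats <- vapply(namespaces[[ns]], function(f) {
      value <- row[[f]]
      if (is.numeric(value)) {
        paste0(f, ":", value)   # numeric feature -> name:value
      } else {
        paste0(f, "=", value)   # categorical feature -> one token, name=level
      }
    }, character(1))
    paste0("|", ns, " ", paste(feats, collapse = " "))
  }, character(1))
  # Label first, then every namespace section
  paste(as.character(row[[target]]), paste(ns_parts, collapse = " "))
}

# A made-up row with a subset of the abalone columns
row <- data.frame(sex = "M", rings = 15L, length = 0.455, diameter = 0.365,
                  group = 3, stringsAsFactors = FALSE)
namespaces <- list(NS1 = list("sex", "rings"),
                   NS2 = list("length", "diameter"))

line <- to_vw_line(row, namespaces, target = "group")
cat(line, "\n")
```

Grouping features into namespaces matters mainly when feature interactions are used, since options such as quadratic and cubic are specified per namespace.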
We can add more learning algorithms to our model. For example, to use a boosting algorithm with 100 "weak" learners, we just add this option to our model and train again:
vwmodel <- add_option(vwmodel, option = "boosting", num_learners=100)
vwtrain(vwmodel, data = df_train,
        namespaces = list(NS1 = list("sex", "rings"),
                          NS2 = list("weight.w", "weight.s", "weight.v", "weight.sh", "diameter", "length", "height")),
        targets = "group"
)
We get: average loss = 0.229273
And compute predictions:
predict.vw(vwmodel, data = df_test)
Finally we get: average loss = 0.081340
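For a multiclass problem like this one, the reported average loss appears to be the fraction of misclassified examples: the test value above, 0.081340, matches 68 errors out of 836 examples. The sketch below shows how such an error rate can be computed by hand. The label vectors are made up for illustration, and we are assuming that the predictions are available as a vector of class labels.

```r
# Made-up true and predicted class labels (assumption: not real model output)
actual    <- c(1, 2, 3, 3, 2, 1, 1, 2, 3, 2)
predicted <- c(1, 2, 3, 2, 2, 1, 3, 2, 3, 2)

# Fraction of examples where the predicted class differs from the true class
error_rate <- mean(predicted != actual)
print(error_rate)

# Confusion matrix: rows are true classes, columns are predicted classes
confusion <- table(actual = actual, predicted = predicted)
print(confusion)
```

The confusion matrix is often more informative than the single error rate, because it shows which age groups get mistaken for each other.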
To inspect the parameters of our model, we can simply print it:
vwmodel
Vowpal Wabbit model
Learning algorithm: sgd
Working directory: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpjO3DD1
Model file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpjO3DD1/vw_1534253637_mdl.vw
General parameters:
random_seed : 0
ring_size : Not defined
holdout_off : FALSE
holdout_period : 10
holdout_after : 0
early_terminate : 3
loss_function : squared
link : identity
quantile_tau : 0.5
Feature parameters:
bit_precision : 18
quadratic : Not defined
cubic : Not defined
interactions : Not defined
permutations : FALSE
leave_duplicate_interactions : FALSE
noconstant : FALSE
feature_limit : Not defined
ngram : Not defined
skips : Not defined
hash : Not defined
affix : Not defined
spelling : Not defined
Learning algorithms / Reductions:
ect :
num_classes : 3
boosting :
num_learners : 100
gamma : 0.1
alg : BBM
Optimization parameters:
adaptive : TRUE
normalized : TRUE
invariant : TRUE
adax : FALSE
sparse_l2 : 0
l1_state : 0
l2_state : 1
learning_rate : 0.5
initial_pass_length : Not defined
l1 : 0
l2 : 0
no_bias_regularization : Not defined
feature_mask : Not defined
decay_learning_rate : 1
initial_t : 0
power_t : 0.5
initial_weight : 0
random_weights : Not defined
normal_weights : Not defined
truncated_normal_weights : Not defined
sparse_weights : FALSE
input_feature_regularizer : Not defined
Model evaluation. Training:
num_examples : 3341
weighted_example_sum : 3341
weighted_label_sum : 0
avg_loss : 0.2292727
total_feature : 33408
Model evaluation. Testing:
num_examples : 836
weighted_example_sum : 836
weighted_label_sum : 0
avg_loss : 0.08133971
total_feature : 8360