MATH/STAT 4450/8456 Machine Learning Competition #3
Predicting the return of products purchased online
The final contest for this course is very similar to the first contest. You will again construct a model to predict whether a given purchase results in a return. The data are available on Canvas in the file “contest3_train_test.Rdata”.
Description of variables
Seventeen months of historical data are given, and you are asked to predict returns for the following 5 months. Here is a quick look at the data.
str(train)
## 'data.frame':    1797781 obs. of  15 variables:
##  $ ID              : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ orderID         : chr  "R1000001" "R1000001" "R1000002" "R1000002" ...
##  $ orderDate       : Date, format: "2014-01-01" "2014-01-01" ...
##  $ itemID          : chr  "A1000382" "A1000550" "A1001991" "A1001999" ...
##  $ colorCode       : int  1972 3854 2974 1992 1968 1972 1001 3976 1001 1968 ...
##  $ sizeCode        : chr  "44" "44" "38" "38" ...
##  $ typeCode        : num  3 3 8 8 8 8 8 8 14 3 ...
##  $ price           : num  10 20 35 50 10 ...
##  $ recommendedPrice: num  30 40 50 50 36 ...
##  $ voucherID       : chr  "NONE" "NONE" "NONE" "NONE" ...
##  $ voucherAmount   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ customerID      : chr  "C1010575" "C1010575" "C1045905" "C1045905" ...
##  $ deviceCode      : Factor w/ 4 levels "A","B","C","D": 1 1 4 4 1 1 1 1 2 1 ...
##  $ paymentCode     : Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 2 2 2 2 1 1 ...
##  $ return          : int  0 0 0 1 0 0 0 0 1 1 ...
table(train$return)
##
##      0      1
## 854280 943501
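The class counts above imply a majority-class baseline, which is a useful floor for judging any model. The following is a minimal sketch in which the counts are typed in by hand from the table output rather than computed from the data; the variable names are illustrative only.

```r
# Class counts copied by hand from the table(train$return) output above
counts <- c(`0` = 854280, `1` = 943501)

# Share of each class in the training data
props <- counts / sum(counts)

# Accuracy obtained by always predicting the most frequent class
baseline_class <- names(counts)[which.max(counts)]
baseline_acc <- max(props)

print(round(props, 4))
cat("Majority class:", baseline_class, "\n")
cat("Baseline accuracy:", round(baseline_acc, 4), "\n")
```

This baseline describes the training period only. The return rate in the test period may differ, so treat it as a rough reference point rather than a guarantee.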
str(test)
## 'data.frame':    495736 obs. of  14 variables:
##  $ ID              : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ orderID         : chr  "R1587679" "R1587679" "R1587680" "R1587680" ...
##  $ orderDate       : Date, format: "2015-06-01" "2015-06-01" ...
##  $ itemID          : chr  "A1001429" "A1001429" "A1000498" "A1000520" ...
##  $ colorCode       : int  1001 1493 2089 1090 1081 1000 1065 1000 1065 1624 ...
##  $ sizeCode        : chr  "36" "34" "40" "40" ...
##  $ typeCode        : num  5 5 3 3 3 8 8 8 8 17 ...
##  $ price           : num  40 40 23 20 26 ...
##  $ recommendedPrice: num  40 40 23 20 26 ...
##  $ voucherID       : chr  "NONE" "NONE" "V1000415" "V1000415" ...
##  $ voucherAmount   : num  0 0 10 10 10 0 0 0 0 0 ...
##  $ customerID      : chr  "C1055901" "C1055901" "C1219822" "C1219822" ...
##  $ deviceCode      : Factor w/ 4 levels "A","B","C","D": 1 1 3 3 3 4 4 1 1 1 ...
##  $ paymentCode     : Factor w/ 6 levels "A","B","C","D",..: 1 1 2 2 2 1 1 1 1 1 ...
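Because the test set covers the 5 months after the training period, one reasonable way to assess a model is to hold out the last 5 months of the training data. The sketch below illustrates that idea on a small made-up data frame that reuses a few column names from the data above. The two derived features (`discount`, `itemsInOrder`), the object names, and the cutoff date are assumptions chosen for illustration, not part of the assignment.

```r
# Toy stand-in for the real `train` data frame (all values are made up)
toy <- data.frame(
  orderID   = c("R1", "R1", "R2", "R3", "R3", "R4"),
  orderDate = as.Date(c("2014-01-05", "2014-01-05", "2014-11-20",
                        "2015-01-10", "2015-01-10", "2015-04-02")),
  price            = c(10, 20, 35, 50, 10, 25),
  recommendedPrice = c(30, 40, 50, 50, 36, 25),
  return           = c(0, 0, 1, 1, 0, 1)
)

# Hypothetical feature 1: discount relative to the recommended price
toy$discount <- toy$recommendedPrice - toy$price

# Hypothetical feature 2: number of items in the same order
toy$itemsInOrder <- as.integer(
  ave(seq_along(toy$orderID), toy$orderID, FUN = length)
)

# Time-based holdout: assuming the history runs from 2014-01 through 2015-05,
# the last 5 months start on 2015-01-01
cutoff    <- as.Date("2015-01-01")
fit_set   <- toy[toy$orderDate <  cutoff, ]
valid_set <- toy[toy$orderDate >= cutoff, ]

print(toy[, c("orderID", "discount", "itemsInOrder")])
cat("Rows for fitting:", nrow(fit_set), " Rows for validation:", nrow(valid_set), "\n")
```

Splitting by date rather than at random mimics the actual prediction setting, where every test order is placed after every training order.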
Task
Your submission file should be in CSV format with two columns: id and return. The file name must be “contest3_[your firstname]_[your lastname].csv”. The file name must not contain spaces; replace every space with “_”. Example of the submission:
id,return
1,0
2,1
...
495736,0
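The submission file described above can be produced with base R. This is a minimal sketch: `submission_name` is a hypothetical helper, the name "Mary Ann Smith" and the three predictions are placeholders, and the file is written to a temporary directory so the example runs anywhere.

```r
# Hypothetical helper: build the required file name from a first and last name
submission_name <- function(first, last) {
  # Replace any run of whitespace inside a name with "_"
  clean <- function(x) gsub("\\s+", "_", trimws(x))
  paste0("contest3_", clean(first), "_", clean(last), ".csv")
}

# Placeholder predictions; in practice these come from your fitted model
pred <- c(0L, 1L, 0L)
submission <- data.frame(id = seq_along(pred), return = pred)

# Written to a temporary directory here; use your own folder for the real file
out_file <- file.path(tempdir(), submission_name("Mary Ann", "Smith"))
write.csv(submission, out_file, row.names = FALSE, quote = FALSE)

# Read the file back to confirm the layout
print(readLines(out_file))
```

For the real submission, the id column would presumably be taken from `test$ID`, giving 495736 rows plus the header line.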
Deadlines:
Grading:
∗ Score = 10 * (accuracy rate)^2
∗ Model matrix: 2
∗ Model selection: 2
∗ Model assessment: 2
∗ Results: 2
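The accuracy-based part of the grade can be estimated on a holdout set before submitting. The sketch below reads the score as 10 times the squared accuracy rate, which is an interpretation of the formula in the grading list. `contest_score` is a hypothetical helper name, and the labels and predictions are made up.

```r
# Hypothetical helper: accuracy-based score, assuming Score = 10 * accuracy^2
contest_score <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  accuracy <- mean(actual == predicted)  # share of correct predictions
  10 * accuracy^2
}

# Made-up labels and predictions for illustration (8 of 10 are correct)
actual    <- c(0, 0, 1, 1, 0, 1, 1, 0, 1, 1)
predicted <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 1)

print(contest_score(actual, predicted))
```

Under this reading, the score grows faster than accuracy does: moving from 0.60 to 0.70 accuracy would raise it from 3.6 to 4.9.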