# MATH/STAT 4450/8456: Return of Online Purchased Products Assessment Answer

MATH/STAT 4450/8456 Machine Learning Competition #3

Predicting the return of products purchased online

The final contest for this course is very similar to the first contest. You will again construct a model to predict whether a given purchase results in a return. The data is available on Canvas as the file “contest3_train_test.Rdata”.

Description of variables

• id: row ID.
• orderID: Order ID. Different rows may have the same order ID.
• orderDate: Order date.
• itemID: Item ID. One order may have multiple items.
• colorCode: Color code of the item.
• sizeCode: Size code of the item.
• typeCode: Type code of the item.
• price: Price of the item.
• recommendedPrice: Recommended retail price. Contains missing values.
• voucherID: Voucher ID.
• voucherAmount: Voucher value PER ORDER.
• customerID: Customer ID.
• deviceCode: Device type.
• paymentCode: Payment type.
• return: Whether the item was returned (1 = yes, 0 = no).
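Because several rows can share one orderID and voucherAmount is recorded per order, some order-level features may be worth deriving from these variables. The following is a minimal sketch in base R on made-up toy rows; the helper `add_order_features` and the new column names (`itemsInOrder`, `sameItemCount`, `voucherPerItem`, `discount`, `recPriceMissing`) are hypothetical and not part of the contest data.

```r
# Toy rows mimicking a few contest columns (values are made up for illustration).
toy <- data.frame(
  orderID          = c("R1", "R1", "R2", "R2", "R2"),
  itemID           = c("A1", "A2", "A3", "A3", "A4"),
  price            = c(10, 20, 35, 35, 50),
  recommendedPrice = c(30, 40, 50, 50, NA),
  voucherAmount    = c(0, 0, 10, 10, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: adds order-level features to a data frame.
add_order_features <- function(df) {
  ones <- rep(1, nrow(df))
  # Number of rows (items) that belong to the same order.
  df$itemsInOrder <- ave(ones, df$orderID, FUN = sum)
  # How often the same itemID appears within one order.
  df$sameItemCount <- ave(ones, df$orderID, df$itemID, FUN = sum)
  # voucherAmount is per order, so spread it evenly over the order's rows.
  df$voucherPerItem <- df$voucherAmount / df$itemsInOrder
  # Relative discount; NA when recommendedPrice is missing or not positive.
  bad_rec <- is.na(df$recommendedPrice) | df$recommendedPrice <= 0
  df$discount <- ifelse(bad_rec, NA, 1 - df$price / df$recommendedPrice)
  # Indicator so a model can still use rows with a missing recommendedPrice.
  df$recPriceMissing <- as.integer(is.na(df$recommendedPrice))
  df
}

toy_feat <- add_order_features(toy)
print(toy_feat)
```

Whether any of these features actually helps is an empirical question to check on held-out data.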

Historical data covering 17 months are given. You are asked to predict returns for the following 5 months. Here is a quick look at the data.

str(train)

## 'data.frame': 1797781 obs. of 15 variables:

## \$ ID: int 1 2 3 4 5 6 7 8 9 10 ...

## \$ orderID: chr "R1000001" "R1000001" "R1000002" "R1000002" ...

## \$ orderDate: Date, format: "2014-01-01" "2014-01-01" ...

## \$ itemID: chr "A1000382" "A1000550" "A1001991" "A1001999" ...

## \$ colorCode: int 1972 3854 2974 1992 1968 1972 1001 3976 1001 1968 ...

## \$ sizeCode: chr "44" "44" "38" "38" ...

## \$ typeCode: num 3 3 8 8 8 8 8 8 14 3 ...

## \$ price: num 10 20 35 50 10 ...

## \$ recommendedPrice: num 30 40 50 50 36 ...

## \$ voucherID: chr "NONE" "NONE" "NONE" "NONE" ...

## \$ voucherAmount: num 0 0 0 0 0 0 0 0 0 0 ...

## \$ customerID: chr "C1010575" "C1010575" "C1045905" "C1045905" ...

## \$ deviceCode: Factor w/ 4 levels "A","B","C","D": 1 1 4 4 1 1 1 1 2 1 ...

## \$ paymentCode: Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 2 2 2 2 1 1 ...

## \$ return: int 0 0 0 1 0 0 0 0 1 1 ...

table(train\$return)

##

##      0      1

## 854280 943501
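The class counts above give a simple reference point: always predicting the majority class. The sketch below only redoes that arithmetic from the printed counts; the variable names are illustrative.

```r
# Class counts copied from the table(train$return) output above.
counts <- c("0" = 854280, "1" = 943501)

# Share of training rows that were returned.
return_rate <- counts[["1"]] / sum(counts)

# A classifier that always predicts the majority class achieves this
# accuracy on the training data (roughly 0.525).
majority_class <- names(counts)[which.max(counts)]
baseline_accuracy <- max(counts) / sum(counts)

cat("Majority class:", majority_class, "\n")
cat("Baseline accuracy:", round(baseline_accuracy, 4), "\n")
```

Any fitted model should beat this baseline on held-out data to be worth submitting. The return rate in the test period is not known and may differ.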

str(test)

## 'data.frame': 495736 obs. of 14 variables:

## \$ ID: int 1 2 3 4 5 6 7 8 9 10 ...

## \$ orderID: chr "R1587679" "R1587679" "R1587680" "R1587680" ...

## \$ orderDate: Date, format: "2015-06-01" "2015-06-01" ...

## \$ itemID: chr "A1001429" "A1001429" "A1000498" "A1000520" ...

## \$ colorCode: int 1001 1493 2089 1090 1081 1000 1065 1000 1065 1624 ...

## \$ sizeCode: chr "36" "34" "40" "40" ...

## \$ typeCode: num 5 5 3 3 3 8 8 8 8 17 ...

## \$ price: num 40 40 23 20 26 ...

## \$ recommendedPrice: num 40 40 23 20 26 ...

## \$ voucherID: chr "NONE" "NONE" "V1000415" "V1000415" ...

## \$ voucherAmount: num 0 0 10 10 10 0 0 0 0 0 ...

## \$ customerID: chr "C1055901" "C1055901" "C1219822" "C1219822" ...

## \$ deviceCode: Factor w/ 4 levels "A","B","C","D": 1 1 3 3 3 4 4 1 1 1 ...

## \$ paymentCode: Factor w/ 6 levels "A","B","C","D",..: 1 1 2 2 2 1 1 1 1 1 ...
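Since the test set covers a later period than the training set, one option is to validate on the most recent training months instead of on a random subset. The sketch below uses simulated toy rows. The training end date of 2015-05-31 is inferred from the 17-month span and the first test date, and the cutoff of 2015-01-01 (holding out the last 5 months) is an assumption chosen to mirror the 5-month prediction horizon.

```r
set.seed(1)

# Toy data: one row per day over the (inferred) training period.
toy <- data.frame(
  orderDate = seq(as.Date("2014-01-01"), as.Date("2015-05-31"), by = "day")
)
toy$return <- rbinom(nrow(toy), size = 1, prob = 0.5)  # simulated labels

# Assumed cutoff: fit on the first 12 months, validate on the last 5.
cutoff <- as.Date("2015-01-01")
fit_part   <- toy[toy$orderDate <  cutoff, ]
valid_part <- toy[toy$orderDate >= cutoff, ]

cat("Rows used for fitting:   ", nrow(fit_part), "\n")
cat("Rows used for validation:", nrow(valid_part), "\n")
```

With the real data, the same two subsetting lines would be applied to `train` in place of `toy`.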

1. Create the most accurate classifier that you can for the data, as measured by the test data.
2. Prepare 8-10 slides summarizing:
   1. how you formulated the model (design) matrix,
   2. how you built the classifier,
   3. the results from all the models,
   4. your findings from the data.
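As an illustration of the first two slide topics, here is a minimal sketch that builds a design matrix with `model.matrix` and fits a logistic regression with `glm`. The data are simulated, the choice of predictors is arbitrary, and logistic regression is only one possible classifier; none of this is prescribed by the assignment.

```r
set.seed(42)
n <- 500

# Simulated stand-in for a few contest columns.
toy <- data.frame(
  price       = round(runif(n, 5, 100), 2),
  deviceCode  = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE)),
  paymentCode = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
# Simulated label: pricier items are returned more often (illustration only).
toy$return <- rbinom(n, size = 1, prob = plogis(-1 + 0.03 * toy$price))

# Design matrix: intercept, price, and dummy columns for the factors.
X <- model.matrix(return ~ price + deviceCode + paymentCode, data = toy)
print(colnames(X))

# Fit on the first 400 rows, evaluate on the remaining 100.
train_idx <- 1:400
fit <- glm(return ~ price + deviceCode + paymentCode,
           data = toy[train_idx, ], family = binomial)

prob <- predict(fit, newdata = toy[-train_idx, ], type = "response")
pred <- as.integer(prob > 0.5)
accuracy <- mean(pred == toy$return[-train_idx])
cat("Hold-out accuracy on toy data:", accuracy, "\n")
```

High-cardinality columns such as itemID or customerID would produce very wide dummy encodings, so they usually need a different treatment than the small factors shown here.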

# Format of submission

Your submission file should be in csv format with two columns: id and return. The csv file name must be “contest3_[your firstname]_[your lastname].csv”. There should be no spaces in the file name; replace any spaces with “_”. Example of the submission:

id,return

1,0

2,1

...

495736,0
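A file in this format could be written along the following lines. This is a sketch: the names "Jane" and "Doe" and the five predictions are placeholders, and the file is written to a temporary directory only so the example runs anywhere.

```r
# Placeholder name parts; replace with your own.
first_name <- "Jane"
last_name  <- "Doe"

# Placeholder predictions; in practice there is one row per test id.
submission <- data.frame(
  id     = 1:5,
  return = c(0L, 1L, 0L, 0L, 1L)
)

# Build the required file name, replacing any spaces with underscores.
file_name <- paste0(
  "contest3_",
  gsub(" ", "_", first_name), "_",
  gsub(" ", "_", last_name), ".csv"
)
out_path <- file.path(tempdir(), file_name)

# row.names = FALSE keeps the file to exactly two columns: id and return.
write.csv(submission, out_path, row.names = FALSE, quote = FALSE)

print(readLines(out_path))
```

Checking that the number of rows equals the number of test rows (495736) before submitting helps catch accidental filtering.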

• April 29 (11:59 pm): Final prediction submission.
• April 30 (3:30 pm): Slides and code submission.
• April 30 (4 pm): Presentations. Only the top 10 participants will present the result. Each presentation should be around 8 minutes.

• Total points: 20
• Accuracy of classifier: 10

∗ Score = 10 × (accuracy rate)²

• Slides: 8

∗ Model matrix: 2

∗ Model selection: 2

∗ Model assessment: 2

∗ Results: 2
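To see how accuracy translates into points, the sketch below reads the accuracy formula as 10 times the squared accuracy rate. That reading is an assumption about the intended exponent, and the function name `accuracy_points` is hypothetical.

```r
# Assumed reading of the grading formula: points = 10 * accuracy^2.
accuracy_points <- function(accuracy) {
  stopifnot(all(accuracy >= 0), all(accuracy <= 1))
  10 * accuracy^2
}

# Points for a few example accuracy rates.
example_rates <- c(0.5, 0.6, 0.7, 0.8)
points <- accuracy_points(example_rates)
print(data.frame(accuracy = example_rates, points = points))
```

Under this reading the score grows faster than accuracy itself, so each additional percentage point is worth slightly more at higher accuracy levels.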