# MATH/STAT 4450/8456: Return of Online Purchased Products

Predicting returns of products purchased online

The final contest for this course is very similar to the first contest: you will again construct a model to predict whether a given purchase is returned. The data is available on Canvas as the file “contest3_train_test.Rdata”.
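A minimal sketch of reading the file into R, assuming it sits in the working directory and holds two data frames named `train` and `test` (the names used in the R output further down). `load_contest_data` is a hypothetical helper, not part of the course materials; loading into a fresh environment keeps the restored objects from silently overwriting anything already in the workspace.

```r
# Hypothetical helper (not provided by the course): load the contest file into
# a private environment and return the two data frames it is assumed to hold.
load_contest_data <- function(path = "contest3_train_test.Rdata") {
  env <- new.env()
  objs <- load(path, envir = env)  # load() returns the names of the restored objects
  # Assumption: the file contains objects called `train` and `test`.
  absent <- setdiff(c("train", "test"), objs)
  if (length(absent) > 0) {
    stop("objects not found in ", path, ": ", paste(absent, collapse = ", "))
  }
  list(train = env$train, test = env$test)
}
```

Typical use would be `dat <- load_contest_data()` followed by `str(dat$train)`.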

# Description of variables

• id: row ID.
• orderID: Order ID. Different rows may have the same order ID.
• orderDate: Order date.
• itemID: Item ID. One order may have multiple items.
• colorCode: Color code of the item.
• sizeCode: Size code of the item.
• typeCode: Type code of the item.
• price: Price of the item.
• recommendedPrice: Recommended retail price. Contains missing values.
• voucherID: Voucher ID.
• voucherAmount: Voucher value PER ORDER.
• customerID: Customer ID.
• deviceCode: Device type.
• paymentCode: Payment type.
• return: Whether the item was returned (1 = yes, 0 = no).
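Two of the notes above matter when building the design matrix: `voucherAmount` is recorded per order rather than per item, and `recommendedPrice` is partly missing. The sketch below derives a few candidate features from those columns; the helper name, the feature names, and the equal split of the voucher across the items of an order are illustrative assumptions, not part of the assignment.

```r
# Hypothetical helper: add candidate features to a data frame that has the
# columns orderID, price, recommendedPrice and voucherAmount described above.
add_basic_features <- function(df) {
  # Number of rows (items) that share the same order ID.
  df$itemsInOrder <- ave(rep(1, nrow(df)), df$orderID, FUN = length)
  # voucherAmount is per order, so (by assumption) spread it evenly over the items.
  df$voucherPerItem <- df$voucherAmount / df$itemsInOrder
  # Flag missing recommended prices so the model can treat them separately.
  df$recPriceMissing <- as.integer(is.na(df$recommendedPrice))
  # Discount relative to the recommended price; 0 when it is missing or non-positive.
  df$discountRatio <- ifelse(
    is.na(df$recommendedPrice) | df$recommendedPrice <= 0,
    0,
    1 - df$price / df$recommendedPrice
  )
  df
}
```

Splitting the voucher in proportion to each item's price is an equally plausible choice; which one helps more is a question for validation.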

Historical data for 17 months are given, and you are asked to make predictions for the following 5 months. Here is a quick look at the data.

    str(train)
    ## 'data.frame': 1797781 obs. of  15 variables:
    ##  $ ID              : int  1 2 3 4 5 6 7 8 9 10 ...
    ##  $ orderID         : chr  "R1000001" "R1000001" "R1000002" "R1000002" ...
    ##  $ orderDate       : Date, format: "2014-01-01" "2014-01-01" ...
    ##  $ itemID          : chr  "A1000382" "A1000550" "A1001991" "A1001999" ...
    ##  $ colorCode       : int  1972 3854 2974 1992 1968 1972 1001 3976 1001 1968 ...
    ##  $ sizeCode        : chr  "44" "44" "38" "38" ...
    ##  $ typeCode        : num  3 3 8 8 8 8 8 8 14 3 ...
    ##  $ price           : num  10 20 35 50 10 ...
    ##  $ recommendedPrice: num  30 40 50 50 36 ...
    ##  $ voucherID       : chr  "NONE" "NONE" "NONE" "NONE" ...
    ##  $ voucherAmount   : num  0 0 0 0 0 0 0 0 0 0 ...
    ##  $ customerID      : chr  "C1010575" "C1010575" "C1045905" "C1045905" ...
    ##  $ deviceCode      : Factor w/ 4 levels "A","B","C","D": 1 1 4 4 1 1 1 1 2 1 ...
    ##  $ paymentCode     : Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 2 2 2 2 1 1 ...
    ##  $ return          : int  0 0 0 1 0 0 0 0 1 1 ...

    table(train$return)
    ##
    ##      0      1
    ## 854280 943501
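The class counts give a useful reference point: a classifier that always predicts the majority class (1, returned) is right on 943501 of 1797781 training rows, about 52.5%. The check below copies the counts from the output above; carrying this baseline over to the test months assumes a similar class balance there, which is not guaranteed.

```r
# Class counts copied from the table(train$return) output above.
counts <- c(`0` = 854280, `1` = 943501)
# Accuracy of always predicting the more frequent class.
majority_class <- names(which.max(counts))
baseline_acc <- max(counts) / sum(counts)
majority_class
# [1] "1"
round(baseline_acc, 4)
# [1] 0.5248
```

Any model worth presenting should beat this number on held-out data.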

    str(test)
    ## 'data.frame': 495736 obs. of  14 variables:
    ##  $ ID              : int  1 2 3 4 5 6 7 8 9 10 ...
    ##  $ orderID         : chr  "R1587679" "R1587679" "R1587680" "R1587680" ...
    ##  $ orderDate       : Date, format: "2015-06-01" "2015-06-01" ...
    ##  $ itemID          : chr  "A1001429" "A1001429" "A1000498" "A1000520" ...
    ##  $ colorCode       : int  1001 1493 2089 1090 1081 1000 1065 1000 1065 1624 ...
    ##  $ sizeCode        : chr  "36" "34" "40" "40" ...
    ##  $ typeCode        : num  5 5 3 3 3 8 8 8 8 17 ...
    ##  $ price           : num  40 40 23 20 26 ...
    ##  $ recommendedPrice: num  40 40 23 20 26 ...
    ##  $ voucherID       : chr  "NONE" "NONE" "V1000415" "V1000415" ...
    ##  $ voucherAmount   : num  0 0 10 10 10 0 0 0 0 0 ...
    ##  $ customerID      : chr  "C1055901" "C1055901" "C1219822" "C1219822" ...
    ##  $ deviceCode      : Factor w/ 4 levels "A","B","C","D": 1 1 3 3 3 4 4 1 1 1 ...
    ##  $ paymentCode     : Factor w/ 6 levels "A","B","C","D",..: 1 1 2 2 2 1 1 1 1 1 ...
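The test set covers the months after the training period (the first training orders are dated 2014-01-01, the first test orders 2015-06-01), so a random holdout may give an optimistic picture of test accuracy. One common alternative, sketched here, is to hold out the last few months of the training data. It assumes `orderDate` is a `Date` column as shown above; `time_split` and its default `k = 3` are hypothetical choices.

```r
# Hypothetical helper: use the last `k` calendar months of the training data
# as a validation set, so validation mimics "past -> future" prediction.
time_split <- function(df, k = 3) {
  month <- format(df$orderDate, "%Y-%m")   # e.g. "2015-03"; sorts chronologically
  months <- sort(unique(month))
  if (k < 1 || k >= length(months)) {
    stop("k must be between 1 and (number of months - 1)")
  }
  cutoff <- months[length(months) - k + 1] # first held-out month
  list(fit   = df[month <  cutoff, , drop = FALSE],
       valid = df[month >= cutoff, , drop = FALSE])
}
```

With all 17 training months present, `time_split(train, k = 3)` would fit on the first 14 months and validate on the last 3.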

1. Create the most accurate classifier you can, as measured by accuracy on the test data.
2. Prepare 8-10 slides summarizing your approach to:
   1. formulating the model (design) matrix,
   2. building the classifier,
   3. the results from all the models, and
   4. your findings from the data.
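A simple starting point for the classifier is logistic regression thresholded at 0.5 and scored by accuracy, the metric used for grading. The sketch assumes a held-out validation set already exists (for example, the last months of the training data) and that the response column is `return`, coded 0/1; `fit_and_score` and the formula you pass in are hypothetical, and other models can be compared through the same accuracy number.

```r
# Hypothetical baseline: fit a logistic regression on `fit_df`, predict 0/1
# on `valid_df` with a 0.5 threshold, and report validation accuracy.
fit_and_score <- function(fit_df, valid_df, formula) {
  model <- glm(formula, data = fit_df, family = binomial())
  p <- predict(model, newdata = valid_df, type = "response")  # P(return = 1)
  pred <- as.integer(p > 0.5)
  list(model = model, pred = pred, accuracy = mean(pred == valid_df$return))
}
```

High-cardinality IDs such as `customerID` or `itemID` are awkward as raw factors: levels that appear only in the validation data make `predict()` fail, so they are usually converted into numeric features first.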

# Format of submission

Your submission must be a CSV file with two columns, id and return, named “contest3_[your firstname]_[your lastname].csv”. The file name must not contain spaces; replace any spaces with “_”. Example submission:

    id,return
    1,0
    2,1
    ...
    495736,0
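A minimal sketch of writing predictions in this format. `write_submission` is a hypothetical helper; it assumes the row identifiers come from the `ID` column of `test` (as shown by `str(test)`) and that the predictions are already 0/1. Setting `quote = FALSE` keeps the header as `id,return` rather than `"id","return"`.

```r
# Hypothetical helper: write 0/1 predictions as a two-column CSV (id, return)
# named contest3_<first>_<last>.csv, with spaces replaced by underscores.
write_submission <- function(ids, pred, first, last, dir = ".") {
  stopifnot(length(ids) == length(pred), all(pred %in% c(0, 1)))
  fname <- gsub(" ", "_", sprintf("contest3_%s_%s.csv", first, last))
  path <- file.path(dir, fname)
  write.csv(data.frame(id = ids, return = as.integer(pred)), path,
            row.names = FALSE, quote = FALSE)
  invisible(path)
}
```

For example, `write_submission(test$ID, pred, "Jane", "Doe")` would write `contest3_Jane_Doe.csv` to the working directory.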

• April 29 (11:59 pm): Final prediction submission.
• April 30 (3:30 pm): Slides and code submission.
• April 30 (4 pm): Presentations. Only the top 10 participants will present their results. Each presentation should last about 8 minutes.

• Total points: 20
• Accuracy of classifier: 10
  ∗ Score = $10 \times (\text{accuracy rate})^2$
• Slides: 8
  ∗ Model matrix: 2
  ∗ Model selection: 2
  ∗ Model assessment: 2
  ∗ Results: 2
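Because the accuracy score is quadratic, each extra percentage point counts for more as accuracy rises: the derivative of $10a^2$ is $20a$, so at 70% accuracy one more point adds about 0.14 to the score. A small R helper (the name `accuracy_score` is only for illustration) makes the trade-off concrete:

```r
# Accuracy component of the grade: Score = 10 * (accuracy rate)^2.
accuracy_score <- function(acc) {
  stopifnot(all(acc >= 0 & acc <= 1))
  10 * acc^2
}
accuracy_score(c(0.5, 0.7, 0.75))
# [1] 2.500 4.900 5.625
```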