# Experimenting with Facebook Prophet


If you have ever worked with time series predictions, I am quite sure you are well aware of the strains and pains that come with them. Time series predictions are difficult and usually require a specialized data scientist to implement them.

Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well. You can read the details in the Prophet paper.
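For readers who prefer a formula, the additive model described above is usually written as below, following the notation of the Prophet paper. Here $g(t)$ is the trend, $s(t)$ the periodic seasonality, $h(t)$ the holiday effects, and $\varepsilon_t$ the error term that the model does not explain:

$$
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
$$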

So I decided to give it a try on a small eCommerce store in Vietnam. I have daily data from March to November. The data has to be fed in as a CSV file like the one below, and the column headers must be `ds` and `y` (case-sensitive).

    ds,y
    1/3/18,1700000
    3/3/18,2745000
    5/3/18,1665000
    6/3/18,1530000
    7/3/18,2070000
    8/3/18,1665000
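Here is a minimal sketch of loading such a file, assuming the dates are day-first (`1/3/18` meaning 1 March 2018, which matches the March start of the data). The inline CSV text stands in for the real file, and parsing `ds` into real dates up front avoids any ambiguity between day-first and month-first formats.

```python
import io

import pandas as pd

# Inline sample standing in for the real CSV file (values from the excerpt above).
csv_text = """ds,y
1/3/18,1700000
3/3/18,2745000
5/3/18,1665000
"""

df = pd.read_csv(io.StringIO(csv_text))

# Assumption: dates are day/month/two-digit-year, so parse them explicitly.
df["ds"] = pd.to_datetime(df["ds"], format="%d/%m/%y")

print(df.dtypes)
print(df)
```

With the column parsed this way, later date comparisons work on real timestamps instead of raw strings.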

I decided to use Prophet for 3 predictions:

- Predicting Average Order Value
- Predicting number of sold SKUs
- Predicting number of Sales Orders

# Predicting Average Order Value

Here is the source code:

    import pandas as pd
    from fbprophet import Prophet  # the package is named `prophet` from version 1.0 onward
    from fbprophet.diagnostics import cross_validation
    from fbprophet.diagnostics import performance_metrics

    dataFile = pd.read_csv('files/Mar-Nov-18-eCom-aov.csv')
    # Assumption: dates are day-first (1/3/18 = 1 March 2018); parse them so the date filter below matches
    dataFile['ds'] = pd.to_datetime(dataFile['ds'], dayfirst=True)
    dataFile.head()

    # Mask the outlier days: Prophet treats rows with a missing y as gaps
    outliers = pd.to_datetime(['2018-10-20', '2018-11-26', '2018-11-27'])
    dataFile.loc[dataFile['ds'].isin(outliers), 'y'] = None

    prophet = Prophet(
        growth='linear',
        seasonality_mode='additive')
    prophet.fit(dataFile)

    # Forecast roughly six months (180 days) past the end of the history
    future = prophet.make_future_dataframe(freq='D', periods=30*6)
    future.tail()

    forecast = prophet.predict(future)
    forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

    fig1 = prophet.plot(forecast)
    fig1.savefig('forecastAOV.png')

    fig2 = prophet.plot_components(forecast)
    fig2.savefig('forecastComponentsAOV.png')

    # Measure forecast error on simulated historical cutoffs
    cross_validation_results = cross_validation(prophet, initial='210 days', period='15 days', horizon='70 days')
    print(cross_validation_results)

    performance_metrics_results = performance_metrics(cross_validation_results)
    print(performance_metrics_results)

Prophet includes functionality for time series cross validation to measure forecast error using historical data. This is done by selecting cutoff points in the history, and for each of them fitting the model using data only up to that cutoff point. We can then compare the forecasted values to the actual values. This cross validation procedure can be done automatically for a range of historical cutoffs using the `cross_validation` function. We specify the forecast horizon (`horizon`), and then optionally the size of the initial training period (`initial`) and the spacing between cutoff dates (`period`). By default, the initial training period is set to three times the horizon, and cutoffs are made every half a horizon.
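To make the cutoff selection concrete, here is a simplified sketch in plain pandas. It is not Prophet's own code: `sketch_cutoffs` is a hypothetical helper, the one-year date range is made up, and it assumes cutoffs are generated by walking back from the end of the history in steps of `period` while leaving at least `initial` of training data. The real implementation has extra checks, such as skipping cutoffs with no data in the forecast window.

```python
import pandas as pd


def sketch_cutoffs(first_date, last_date, initial, period, horizon):
    """Approximate the simulated cutoff dates (hypothetical helper)."""
    initial, period, horizon = (pd.Timedelta(x) for x in (initial, period, horizon))
    # The last cutoff must leave a full horizon of actual data after it.
    cutoff = pd.Timestamp(last_date) - horizon
    # The first cutoff must leave at least `initial` of training data before it.
    earliest = pd.Timestamp(first_date) + initial
    cutoffs = []
    while cutoff >= earliest:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)


# Made-up history covering all of 2018, with the settings used in this post.
cutoffs = sketch_cutoffs(
    "2018-01-01", "2018-12-31",
    initial="210 days", period="15 days", horizon="70 days",
)
for c in cutoffs:
    print(c.date())
```

Note how quickly the number of cutoffs shrinks when `initial` plus `horizon` approaches the length of the history: a shorter history leaves room for only a few simulated forecasts, or none at all.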

The output of `cross_validation` is a dataframe with the true values `y` and the out-of-sample forecast values `yhat`, at each simulated forecast date and for each cutoff date. In particular, a forecast is made for every observed point between `cutoff` and `cutoff + horizon`. This dataframe can then be used to compute error measures of `yhat` vs. `y`.

The `performance_metrics` utility can be used to compute some useful statistics of the prediction performance (`yhat`, `yhat_lower`, and `yhat_upper` compared to `y`), as a function of the distance from the cutoff (how far into the future the prediction was). The statistics computed are mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean absolute percent error (MAPE), and coverage of the `yhat_lower` and `yhat_upper` estimates. These are computed on a rolling window of the predictions in `df_cv` (the dataframe returned by `cross_validation`, called `cross_validation_results` in the code above) after sorting by horizon (`ds` minus `cutoff`). By default 10% of the predictions will be included in each window, but this can be changed with the `rolling_window` argument.
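To show what these statistics mean, here is a rough sketch that computes them by hand on a toy dataframe shaped like the output of `cross_validation`. All numbers are made up, and for simplicity it groups by exact horizon instead of using Prophet's rolling window, so its output will not match `performance_metrics` row for row.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a cross_validation result (made-up numbers).
df_cv = pd.DataFrame({
    "ds": pd.to_datetime(["2018-10-02", "2018-10-03", "2018-10-17", "2018-10-18"]),
    "cutoff": pd.to_datetime(["2018-10-01", "2018-10-01", "2018-10-16", "2018-10-16"]),
    "y": [100.0, 200.0, 100.0, 200.0],
    "yhat": [110.0, 180.0, 90.0, 250.0],
    "yhat_lower": [90.0, 150.0, 95.0, 210.0],
    "yhat_upper": [120.0, 210.0, 130.0, 260.0],
})

# Horizon = how far past the cutoff each prediction was made.
df_cv["horizon"] = df_cv["ds"] - df_cv["cutoff"]

err = df_cv["yhat"] - df_cv["y"]
df_cv["sq_err"] = err ** 2
df_cv["abs_err"] = err.abs()
df_cv["abs_pct_err"] = (err / df_cv["y"]).abs()
# Coverage: did the actual value fall inside the uncertainty interval?
df_cv["covered"] = df_cv["y"].between(df_cv["yhat_lower"], df_cv["yhat_upper"])

# Simplification: average per exact horizon instead of a rolling window.
metrics = df_cv.groupby("horizon").agg(
    mse=("sq_err", "mean"),
    mae=("abs_err", "mean"),
    mape=("abs_pct_err", "mean"),
    coverage=("covered", "mean"),
)
metrics["rmse"] = np.sqrt(metrics["mse"])

print(metrics)
```

Reading the table row by row shows how the error grows (or does not) as the forecast reaches further past the cutoff.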

This is the forecast of AOV in VND:

# Predicting number of sold SKUs

Here is the source code:

    import pandas as pd
    from fbprophet import Prophet
    from fbprophet.diagnostics import cross_validation
    from fbprophet.diagnostics import performance_metrics

    dataFile = pd.read_csv('files/Mar-Nov-18-eCom-skuQty.csv')
    # Assumption: dates are day-first (1/3/18 = 1 March 2018); parse them so the date filter below matches
    dataFile['ds'] = pd.to_datetime(dataFile['ds'], dayfirst=True)
    dataFile.head()

    # Mask the outlier days: Prophet treats rows with a missing y as gaps
    outliers = pd.to_datetime(['2018-10-20', '2018-11-26', '2018-11-27'])
    dataFile.loc[dataFile['ds'].isin(outliers), 'y'] = None

    prophet = Prophet(
        growth='linear',
        seasonality_mode='additive')
    prophet.fit(dataFile)

    # Forecast roughly six months (180 days) past the end of the history
    future = prophet.make_future_dataframe(freq='D', periods=30*6)
    future.tail()

    forecast = prophet.predict(future)
    forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

    fig1 = prophet.plot(forecast)
    fig1.savefig('forecastskuQty.png')

    fig2 = prophet.plot_components(forecast)
    fig2.savefig('forecastComponentsskuQty.png')

    # Measure forecast error on simulated historical cutoffs
    cross_validation_results = cross_validation(prophet, initial='210 days', period='15 days', horizon='70 days')
    print(cross_validation_results)

    performance_metrics_results = performance_metrics(cross_validation_results)
    print(performance_metrics_results)

The rest is the same as described above, and this is the forecast:

# Predicting number of Sales Orders

Here is the source code:

    import pandas as pd
    from fbprophet import Prophet
    from fbprophet.diagnostics import cross_validation
    from fbprophet.diagnostics import performance_metrics

    dataFile = pd.read_csv('files/Mar-Nov-18-eCom-SOQty.csv')
    # Assumption: dates are day-first (1/3/18 = 1 March 2018); parse them so the date filter below matches
    dataFile['ds'] = pd.to_datetime(dataFile['ds'], dayfirst=True)
    dataFile.head()

    # Mask the outlier days: Prophet treats rows with a missing y as gaps
    outliers = pd.to_datetime(['2018-10-20', '2018-11-26', '2018-11-27'])
    dataFile.loc[dataFile['ds'].isin(outliers), 'y'] = None

    prophet = Prophet(
        growth='linear',
        seasonality_mode='additive')
    prophet.fit(dataFile)

    # Forecast roughly six months (180 days) past the end of the history
    future = prophet.make_future_dataframe(freq='D', periods=30*6)
    future.tail()

    forecast = prophet.predict(future)
    forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

    fig1 = prophet.plot(forecast)
    fig1.savefig('forecastSOQty.png')

    fig2 = prophet.plot_components(forecast)
    fig2.savefig('forecastComponentsSOQty.png')

    # Measure forecast error on simulated historical cutoffs
    cross_validation_results = cross_validation(prophet, initial='210 days', period='15 days', horizon='70 days')
    print(cross_validation_results)

    performance_metrics_results = performance_metrics(cross_validation_results)
    print(performance_metrics_results)

The rest is the same as described above, and this is the forecast:

# Prediction vs Actual

A few days later, this is the result of comparing `yhat` with the actual numbers. I am not a data scientist or an expert in time series analysis, but I found the result pretty good.
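A comparison like this can be done with a simple merge on `ds`. The sketch below uses made-up numbers for both the forecast and the actuals (they are not my real results), and the column names `abs_pct_err` and `within_interval` are just illustrative.

```python
import pandas as pd

# Made-up numbers standing in for Prophet's forecast and the actuals observed later.
forecast = pd.DataFrame({
    "ds": pd.to_datetime(["2018-12-01", "2018-12-02", "2018-12-03"]),
    "yhat": [1_800_000.0, 2_000_000.0, 1_500_000.0],
    "yhat_lower": [1_200_000.0, 1_400_000.0, 900_000.0],
    "yhat_upper": [2_400_000.0, 2_600_000.0, 2_100_000.0],
})
actuals = pd.DataFrame({
    "ds": pd.to_datetime(["2018-12-01", "2018-12-02", "2018-12-03"]),
    "y": [2_000_000.0, 1_600_000.0, 2_500_000.0],
})

# Line up each actual value with the forecast for the same day.
comparison = actuals.merge(forecast, on="ds", how="inner")
comparison["abs_pct_err"] = (
    (comparison["yhat"] - comparison["y"]) / comparison["y"]
).abs()
comparison["within_interval"] = comparison["y"].between(
    comparison["yhat_lower"], comparison["yhat_upper"]
)

print(comparison[["ds", "y", "yhat", "abs_pct_err", "within_interval"]])
print("MAPE:", comparison["abs_pct_err"].mean())
```

Checking whether the actuals fall inside the `yhat_lower` to `yhat_upper` band is often more telling than the point error alone, since daily sales are noisy.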

I’ll wait a few more days to compare the predictions against the actuals and see whether this works. I am thinking of using it for various predictions such as social network, budget, inventory demand, sales forecast, headcount planning, etc. We could also integrate Prophet into our eCommerce platform (Magento) to smartly select which products to feature on the homepage depending on the time and day, as well as for up-sell and cross-sell recommendations.