Web Traffic Time Series Forecasting: October 2019

Thursday, October 24, 2019

Literature Review!!

Link: Literature Review

Have a look at our literature review!

Looking at the data visualization above, it is evident that English-language pages receive more hits than pages in any other language. We can use this information when predicting peak values in the time series, which is our goal, because we want to optimize the use of servers. If the web page we are predicting for is not in English, we can expect it to need fewer servers than a comparable English page.
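The per-language comparison can be sketched in a few lines of Python. This is only an illustration: the source does not say how the language of a page is recorded, so the sketch assumes (hypothetically) that each page name embeds a two-letter language code in a form like `Title_en.wikipedia.org_...`. The names `page_language` and `total_hits_by_language` are ours, not part of the original project.

```python
import re
from collections import defaultdict

# ASSUMPTION: page names look like "Title_en.wikipedia.org_..." so the
# language is the two-letter code before ".wikipedia.org". Adjust the
# pattern if the real naming scheme differs.
LANG_PATTERN = re.compile(r"_([a-z]{2})\.wikipedia\.org_")


def page_language(page_name):
    """Return the two-letter language code, or 'unknown' if none is found."""
    match = LANG_PATTERN.search(page_name)
    return match.group(1) if match else "unknown"


def total_hits_by_language(pages):
    """Sum the hits of every page, grouped by the language of the page.

    `pages` maps a page name to a list of daily hit counts.
    """
    totals = defaultdict(int)
    for name, hits in pages.items():
        totals[page_language(name)] += sum(hits)
    return dict(totals)


# Tiny made-up example (not real data).
example_pages = {
    "PageA_en.wikipedia.org_all-access_spider": [120, 150, 170],
    "PageB_fr.wikipedia.org_all-access_spider": [20, 25, 30],
}
print(total_hits_by_language(example_pages))
```

Grouping totals like this gives a rough picture of how traffic is distributed across languages before any forecasting model is built.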

Other classifications:

PACF and ACF cutoff for tunable parameters in the ARIMA model

For optimal results, we must account for the cutoff points of the PACF (for the autoregressive order p) and of the ACF (for the moving-average order q).
We can't inspect every time series individually, so we picked a random page and visualized what is actually happening for it.
As we move forward with preparing our model, we plan to tune the p, d, q parameters of the ARIMA model by minimizing the error over the whole dataset.

Here's a link you can refer to for more information:
https://people.duke.edu/~rnau/411arim3.htm
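To make the idea of a "cutoff" concrete, here is a minimal NumPy sketch that computes the sample ACF and PACF and reports the lag after which the values fall inside an approximate 95% band of ±1.96/√n. It is not the code used for our plots: the function names (`acf`, `pacf`, `cutoff_lag`) are ours, the PACF uses the Durbin-Levinson recursion, and the AR(1) series at the bottom is simulated purely for demonstration.

```python
import numpy as np


def acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(nlags + 1)]
    )


def pacf(x, nlags):
    """Partial autocorrelation for lags 0..nlags (Durbin-Levinson recursion)."""
    rho = acf(x, nlags)
    out = np.zeros(nlags + 1)
    out[0] = 1.0
    phi_prev = np.array([])  # AR coefficients of the order k-1 fit
    for k in range(1, nlags + 1):
        num = rho[k] - np.dot(phi_prev, rho[k - 1:0:-1])
        den = 1.0 - np.dot(phi_prev, rho[1:k])
        phi_kk = num / den
        # Update the coefficients to those of the order-k fit.
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        out[k] = phi_kk
    return out


def cutoff_lag(values, n_obs):
    """Number of consecutive lags (from lag 1) outside the ~95% band."""
    band = 1.96 / np.sqrt(n_obs)
    k = 0
    for v in values[1:]:
        if abs(v) <= band:
            break
        k += 1
    return k


# Demonstration on a simulated AR(1) series (coefficient 0.7 is arbitrary).
rng = np.random.default_rng(0)
n = 2000
noise = rng.normal(size=n)
series = np.zeros(n)
for t in range(1, n):
    series[t] = 0.7 * series[t - 1] + noise[t]

print("suggested p (PACF cutoff):", cutoff_lag(pacf(series, 10), n))
print("ACF values:", np.round(acf(series, 5), 2))
```

A PACF cutoff is only a starting guess for p, and an ACF cutoff only a starting guess for q. In practice, a library such as statsmodels offers `plot_acf` and `plot_pacf` for the same inspection, and the final orders would still be tuned by the error on the data, as described above.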

Data Visualization Part-3: Seasonality and Trend Analysis

One major task to perform before we start building our forecasting model is smoothing. We need to detrend the data and handle the seasonal component, which also helps us deal with the noise. As we saw in the original data, there is a peak; that peak can distort our results, so we need to take care of it.

Decomposition of Time-series Data

For a better understanding, refer to: https://machinelearningmastery.com/decompose-time-series-data-trend-seasonality/
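As a rough illustration of what a decomposition does, the sketch below splits a series into trend, seasonal and residual parts with a classical additive scheme: a centered moving average for the trend, then the average detrended value at each position in the cycle for the seasonality. The weekly period of 7, the function names and the synthetic series are all assumptions made for the example; this is not necessarily the exact procedure behind our plots.

```python
import numpy as np


def moving_average_trend(x, period):
    """Centered moving average; edges without a full window are NaN."""
    if period % 2 == 0:
        # Even period: half weights at both ends keep the window centered.
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(len(x), np.nan)
    trend[half:len(x) - half] = np.convolve(x, weights, mode="valid")
    return trend


def decompose_additive(x, period):
    """Return (trend, seasonal, residual) with x = trend + seasonal + residual."""
    x = np.asarray(x, dtype=float)
    trend = moving_average_trend(x, period)
    detrended = x - trend
    # Average the detrended values at each position within the cycle.
    seasonal_means = np.array(
        [np.nanmean(detrended[i::period]) for i in range(period)]
    )
    seasonal_means -= seasonal_means.mean()  # seasonal part sums to zero
    seasonal = np.tile(seasonal_means, len(x) // period + 1)[: len(x)]
    residual = x - trend - seasonal
    return trend, seasonal, residual


# Synthetic daily series: linear trend + weekly pattern + noise (made up).
rng = np.random.default_rng(0)
days = np.arange(140)
weekly = np.array([5.0, 3.0, 0.0, -2.0, -4.0, -3.0, 1.0])
hits = 50 + 0.3 * days + weekly[days % 7] + rng.normal(scale=1.0, size=days.size)

trend, seasonal, residual = decompose_additive(hits, period=7)
print("estimated weekly pattern:", np.round(seasonal[:7], 1))
```

The residual is what remains after removing trend and seasonality; an isolated peak like the one in our data would show up there as an unusually large value.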

Data Visualization Part-2

A basic assumption made by many models used for time series analysis is that the series is stationary. If the original series is found to be non-stationary, we apply transformations to obtain a stationary series. We took a random time series and applied first- and second-order differencing to show how such transformations affect stationarity.

To explore more about the importance of stationarity in time series prediction, follow the link below:
https://towardsdatascience.com/stationarity-in-time-series-analysis-90c94f27322
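Differencing itself is a one-liner. The sketch below, using NumPy's `np.diff`, shows first- and second-order differencing on a synthetic series with a linear trend; the helper name `difference` and the example data are ours and only meant as an illustration.

```python
import numpy as np


def difference(series, order=1):
    """Apply differencing `order` times: y[t] = x[t] - x[t-1], repeated."""
    return np.diff(np.asarray(series, dtype=float), n=order)


# Synthetic example: a linear trend plus noise is not stationary in the mean.
rng = np.random.default_rng(0)
t = np.arange(200)
series = 20 + 0.5 * t + rng.normal(scale=2.0, size=t.size)

first = difference(series, order=1)
second = difference(series, order=2)

# A crude check: compare the mean of the first and second half of each series.
for name, values in [("original", series), ("1st diff", first), ("2nd diff", second)]:
    half = len(values) // 2
    print(name, "half means:", round(values[:half].mean(), 2), round(values[half:].mean(), 2))
```

Each order of differencing shortens the series by one point, and the number of times we difference corresponds to the d parameter in ARIMA. Comparing half means is only a rough check; a formal test such as the augmented Dickey-Fuller test (`adfuller` in statsmodels) is one option for a more careful assessment.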

Data Visualization Part-1

The dataset is huge, and we can't make visualizations for every time series, so we decided to visualize a single time series and generalize from it to the others. The analysis itself will still be done on each series separately.

Here is the original data for one time series plotted against date. In this case the variation is large: in terms of the number of hits, values fluctuating between 0 and 200 are considerable, and the peak that we see can be considered an outlier.

Friday, October 18, 2019

Handling Missing values

Handling missing values:
We noticed that there were a lot of missing values, but the missing (NA) values were located at the beginning of the individual time series. This suggests that the web page was only added to the domain after those dates, so it was best to replace the missing values with 0.
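A minimal sketch of this step, assuming each time series is available as a NumPy array with missing values stored as NaN (the function names are ours):

```python
import numpy as np


def count_leading_missing(series):
    """Number of NaN values before the first observed value."""
    values = np.asarray(series, dtype=float)
    observed = np.flatnonzero(~np.isnan(values))
    return len(values) if observed.size == 0 else int(observed[0])


def fill_missing_with_zero(series):
    """Replace every NaN with 0, leaving observed values unchanged."""
    values = np.asarray(series, dtype=float)
    return np.where(np.isnan(values), 0.0, values)


# Made-up example: the page has no recorded hits for its first three days.
hits = np.array([np.nan, np.nan, np.nan, 12.0, 15.0, 9.0])
print("leading missing days:", count_leading_missing(hits))
print("filled series:", fill_missing_with_zero(hits))
```

Counting the leading NaNs first is a quick way to confirm that the missing values really sit at the start of a series before filling them with 0.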
