System for Analysis and Prediction of Trends in Cryptocurrency Market

This article focuses on forecasting the daily closing price series of the Bitcoin, Ripple, Dash, Litecoin and Ethereum cryptocurrencies, using data on the prices (open, low, high), market capital and volumes of prior days. The price behavior of cryptocurrencies remains largely unexplored, which offers researchers and economists new opportunities to highlight its similarities to and differences from standard financial prices; hence this paper focuses on this area. Predictions are made using statistical techniques and machine learning algorithms, and the results are compared with various benchmarks. The models include a simple linear regression (SLR) model for univariate forecasting, which uses only the sequence of closing prices, and a multiple linear regression (MLR) model, which uses a multivariate sequence of prices and volumes at the same time, alongside ARIMA and LSTM models. Mean Absolute Percentage Error (MAPE) and relative Root Mean Square Error (relative RMSE) are used as performance measures. The ARIMA model achieves the highest accuracy on our dataset, followed by Multivariable Linear Regression and LSTM.
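The two performance measures named above are not defined elsewhere in the text. The minimal sketch below assumes the usual MAPE definition and, for relative RMSE, assumes normalization by the mean of the actual values; the paper does not state which normalization it uses, and other conventions (for example dividing by the range of the data) exist. The function names and the toy numbers are hypothetical.

```python
import math


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent; assumes no actual value is zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


def relative_rmse(actual, predicted):
    """RMSE divided by the mean of the actual values (assumed normalization)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    return rmse / (sum(actual) / n)


# Toy closing prices and forecasts (illustrative numbers, not the paper's data).
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
print(mape(actual, predicted))           # about 7.5 (percent)
print(relative_rmse(actual, predicted))  # about 0.0667 (RMSE 10 / mean 150)
```

Both measures are scale-free, which is what allows errors to be compared across coins whose prices differ by orders of magnitude.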


Introduction
Bitcoin is a decentralized digital currency, without a central bank or single administrator, that can be sent from user to user on the peer-to-peer Bitcoin network without the need for intermediaries. Such a digital currency enables peer-to-peer online transactions that are processed by 'miners' within the network. As a result, no centralized authority or third-party financial institution controls the Bitcoin network. All transactions on the Bitcoin network are recorded in blocks of a public ledger, known as the blockchain, and are verified by the miners using cryptographic proof-of-work. Bitcoin has thus become a new entrant in the currency markets, although it is officially considered a commodity rather than a currency, and its price behavior remains largely unexplored, which offers researchers and economists new opportunities to highlight its similarities to and differences from standard financial prices. Although the number of Bitcoin price forecasting studies is increasing, it remains limited. In this work, forecasting of the daily closing price series of the Bitcoin cryptocurrency using data on the prices and volumes of prior days is examined. To predict the currency rate, different approaches, including statistical techniques and machine learning algorithms, are applied to the datasets. The dataset consists of the closing price, volume, market capital, open price, high price and low price.

Literature Survey
Over the years, researchers have proposed many techniques for forecasting time series in stock markets. Most of the concepts are adopted from [1]. A genetic and neural approach [2] is proposed to predict daily prices using technical analysis factors and daily prices. [3] combines clustering with a Fuzzy Inference System to produce an index-level forecast. A combination of the firefly algorithm and support vector regression (SVR) is proposed in [4] to predict stock market prices. Very few papers have focused on Bitcoin price prediction. Recently, several authors [5,6,7] have proposed machine learning algorithms to predict the Bitcoin price. The authors of [8] used the Fast Wavelet Transform to examine changes in Bitcoin prices. Univariate and multivariate models are proposed in [9] to predict Bitcoin, Litecoin, Ripple and Ethereum prices. Our study spans a period of over 4 years, characterized by different price dynamics that include both bull- and bear-market conditions, shown in the figures below (Fig. 1: bull-market condition; Fig. 2: bear-market condition). Accordingly, we were able to train and test our models on data that include both bull- and bear-market conditions in each phase. Our study therefore advances the state of the art, as it uses the most up-to-date data and the largest and most complete dataset.

Methods and Implementations
This section first discusses notions of time series analysis that guide the operational decisions about the algorithms. It then explores the dataset and its pre-processing analysis. Finally, it discusses the proposed algorithms and the statistical tools adopted.
Time series analysis [10,11,12]: a time series is analyzed in terms of three systematic components, (i) base level, (ii) trend and (iii) seasonality, and one non-systematic component called noise.
Base level: the average value of the series.
Trend: an increasing or decreasing slope in the time series.
Seasonality: a pattern that repeats at regular intervals because of seasonal factors.
Noise: random variation in the series.
A time series is a combination of these four components: base level and noise are always present, whereas trend and seasonality are optional. Depending on the nature of the trend and seasonality, a time series can be described by an additive or a multiplicative model.

Additive model: $y(t) = \text{BaseLevel} + \text{Trend} + \text{Seasonality} + \text{Noise}$ (1)
Multiplicative model [13]: this model, used in classical decomposition, combines the components by multiplication, as in equation (2):
$y(t) = \text{BaseLevel} \times \text{Trend} \times \text{Seasonality} \times \text{Noise}$ (2)
Statistical measures: the statistical measures used are the mean (μ), the standard deviation (σ) and the trimmed mean (ū), which is obtained by discarding a portion of the data from both tails of the distribution. Because it ignores the most extreme values, the trimmed mean gives a robust estimate of central tendency and is useful for time series with high volatility.
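A minimal sketch of the three statistical measures, assuming the population standard deviation and a 10% cut from each tail for the trimmed mean (the paper does not state the proportion). The helper names and the toy price list are hypothetical; the example shows how one extreme value pulls the mean upward while barely moving the trimmed mean. In a library setting, `scipy.stats.trim_mean` performs the same trimming computation.

```python
import math


def mean(xs):
    """Arithmetic mean (mu)."""
    return sum(xs) / len(xs)


def std(xs):
    """Population standard deviation (sigma); divide by n - 1 instead for the sample version."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def trimmed_mean(xs, proportion=0.1):
    """Mean after discarding int(proportion * n) values from EACH tail of the sorted data."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    s = sorted(xs)
    k = int(proportion * len(s))
    return mean(s[k:len(s) - k])


# Hypothetical daily closes with one extreme spike.
closes = [10, 11, 9, 10, 12, 10, 11, 9, 10, 100]
print(mean(closes))               # 19.2, pulled up by the spike
print(trimmed_mean(closes, 0.1))  # 10.375, the spike and the lowest value are dropped
```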

Dataset
The dataset covers 1320 cryptocurrencies, with 650,000 observations of 10 variables.

Univariate and multivariate forecasting [14]
Univariate forecasting predicts a time series from observations of a single feature; here, the closing price of the series is used. Multivariate forecasting uses several features for prediction: here 'Open', 'High', 'Low', 'Volume' and 'Market Cap' form the independent variables X, and the closing price is the dependent variable Y.
Statistical analysis: as a first step, the Dickey-Fuller test and autocorrelation analysis are used to check the time series for non-stationarity. The Dickey-Fuller test is based on the regression in Eq (3):
$\Delta y_t = \alpha + \gamma y_{t-1} + \varepsilon_t$ (3)
where $\Delta y_t = y_t - y_{t-1}$; the null hypothesis $\gamma = 0$ means the series has a unit root and is non-stationary. p-value: the lower the p-value, the more significant the result, i.e. the stronger the evidence against the null hypothesis of non-stationarity.
Linear regression [15]: a linear approach for modeling the relationship between a dependent variable y and one independent variable x, represented by the equation:
$y = \beta_0 + \beta_1 x + \varepsilon$ (4)
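The univariate SLR setup can be sketched as follows, assuming that the single independent variable x in equation (4) is the previous day's closing price and the dependent variable y is the current day's close; the paper does not state the exact lag structure, and the helper names are hypothetical. The coefficients use the closed-form ordinary least squares estimates.

```python
def fit_slr(x, y):
    """Ordinary least squares estimates (b0, b1) for y = b0 + b1 * x, equation (4)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept
    return b0, b1


def forecast_next_close(closes):
    """Hypothetical univariate setup: regress close[t] on close[t-1], then predict the next day."""
    x, y = closes[:-1], closes[1:]
    b0, b1 = fit_slr(x, y)
    return b0 + b1 * closes[-1]


print(forecast_next_close([1.0, 3.0, 5.0, 7.0, 9.0]))  # 11.0
```

The toy series rises by exactly 2 per day, so the fit recovers b0 = 2 and b1 = 1 and extrapolates the next close as 11.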

Multiple Linear Regression (MLR) model [16]
Multiple linear regression is a statistical technique that predicts the value of a dependent variable from the values of two or more independent variables. It is represented by the following equation:

$y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \varepsilon$ (5)
where $x_i$ is the i-th independent variable, $\beta_i$ its coefficient, $\beta_0$ the intercept, $\varepsilon$ the error term, and n the number of independent variables.
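Equation (5) can be fitted by ordinary least squares. The sketch below, with hypothetical helper names, solves the normal equations with Gauss-Jordan elimination in pure Python to stay self-contained; the toy data are generated from known coefficients (y = 1 + 2·x1 + 3·x2) so the fit can be checked.

```python
def fit_mlr(X, y):
    """Least-squares coefficients [b0, b1, ..., bn] for equation (5) via the normal equations."""
    A = [[1.0] + [float(v) for v in row] for row in X]  # leading 1 for the intercept b0
    m = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(m)]
         + [sum(a[i] * yi for a, yi in zip(A, y))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(m):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, m + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][m] / M[i][i] for i in range(m)]


def predict_mlr(b, row):
    """Evaluate b0 + sum_i b_i * x_i for one row of independent variables."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], row))


# Toy rows of two hypothetical features generated from y = 1 + 2*x1 + 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
b = fit_mlr(X, y)
print([round(v, 6) for v in b])  # [1.0, 2.0, 3.0]
```

With the real features ('Open', 'High', 'Low', 'Volume', 'Market Cap'), whose scales differ by orders of magnitude and where the three price columns are typically strongly correlated, the normal equations can be badly conditioned; standardizing the features and using a library solver such as `numpy.linalg.lstsq` is the safer choice.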

ARIMA (AutoRegressive Integrated Moving Average) [17]
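An ARIMA model belongs to a class of statistical models for analyzing and forecasting time series data. A pure autoregressive (AR-only) model is one where $Y_t$ depends only on its own lags; that is, $Y_t$ is a function of the lags of $Y_t$:
$Y_t = \alpha + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + \varepsilon_t$ (6)
where $Y_{t-1}$ is the lag 1 of the series, $\beta_1$ is the coefficient of lag 1 that the model estimates, and $\alpha$ is the intercept term, also estimated by the model. Likewise, a pure moving average (MA-only) model is one where $Y_t$ depends only on the lagged forecast errors:
$Y_t = \alpha + \varepsilon_t + \phi_1 \varepsilon_{t-1} + \phi_2 \varepsilon_{t-2} + \dots + \phi_q \varepsilon_{t-q}$ (7)
where the error terms are the errors of the autoregressive models at the respective lags. The errors $\varepsilon_t$ and $\varepsilon_{t-1}$ are the errors from the following equations:
$Y_t = \alpha + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + \varepsilon_t$
$Y_{t-1} = \alpha + \beta_1 Y_{t-2} + \beta_2 Y_{t-3} + \dots + \beta_p Y_{t-p-1} + \varepsilon_{t-1}$ (8)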
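Equations (6) and (7) translate directly into one-step forecasts once the coefficients are known. In this sketch the coefficients are hypothetical inputs rather than fitted estimates, the function names are illustrative, and the MA forecast replaces the unobserved current error ε_t by its expected value of zero.

```python
def ar_forecast(history, alpha, betas):
    """One-step AR(p) forecast from equation (6): alpha + beta_1*Y_{t-1} + ... + beta_p*Y_{t-p}."""
    p = len(betas)
    if len(history) < p:
        raise ValueError("need at least p past values")
    lags = history[::-1][:p]  # most recent first: Y_{t-1}, Y_{t-2}, ..., Y_{t-p}
    return alpha + sum(b * y for b, y in zip(betas, lags))


def ma_forecast(past_errors, alpha, phis):
    """One-step MA(q) forecast from equation (7); the unknown current error is replaced by 0."""
    q = len(phis)
    if len(past_errors) < q:
        raise ValueError("need at least q past errors")
    lags = past_errors[::-1][:q]  # most recent first: eps_{t-1}, ..., eps_{t-q}
    return alpha + sum(ph * e for ph, e in zip(phis, lags))


# Hypothetical coefficients, chosen only for illustration.
print(ar_forecast([1.0, 2.0, 3.0], alpha=0.5, betas=[0.5, 0.25]))  # 2.5
print(ma_forecast([0.2, -0.4], alpha=10.0, phis=[0.5]))            # about 9.8
```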
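An ARIMA model is one in which the time series has been differenced at least once to make it stationary and the AR and MA terms are combined. The equation therefore becomes:
$Y_t = \alpha + \beta_1 Y_{t-1} + \dots + \beta_p Y_{t-p} + \varepsilon_t + \phi_1 \varepsilon_{t-1} + \dots + \phi_q \varepsilon_{t-q}$ (9)
where $Y_t$ now denotes the differenced series. In words: Predicted $Y_t$ = constant + linear combination of lags of Y (up to p lags) + linear combination of lagged forecast errors (up to q lags).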
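A minimal ARIMA(1,1,0) sketch, i.e. p = 1, d = 1 and q = 0: difference the series once, fit the AR term on the differenced series, forecast one difference ahead and add it back to the last observed level. It assumes least-squares estimation of the AR coefficient (library ARIMA implementations typically estimate by maximum likelihood) and omits the MA terms of equation (9) for brevity. The helper names are hypothetical, and the series is a toy example whose differences follow d_t = 1 + 0.5·d_{t-1} exactly.

```python
def difference(series):
    """First difference d_t = Y_t - Y_{t-1} (the 'I' step with d = 1)."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(d):
    """Least-squares fit of d_t = alpha + beta * d_{t-1} on the differenced series."""
    x, y = d[:-1], d[1:]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - beta * x_bar, beta


def arima_110_forecast(series):
    """One-step ARIMA(1,1,0)-style forecast: difference, fit AR(1), forecast, then integrate."""
    d = difference(series)
    alpha, beta = fit_ar1(d)
    next_diff = alpha + beta * d[-1]
    return series[-1] + next_diff  # undo differencing: Y_{t+1} = Y_t + predicted d_{t+1}


series = [10.0, 10.0, 11.0, 12.5, 14.25, 16.125]  # differences follow d_t = 1 + 0.5 * d_{t-1}
print(arima_110_forecast(series))  # 18.0625
```

Differencing is what removes the trend here: the raw series keeps rising, but its differences settle toward a constant, so the AR term is fitted on a much more stationary quantity.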

Long Short-Term Memory (LSTM) [18]
Long short-term memory (LSTM) is a popular deep learning method for time series forecasting, first introduced by Hochreiter and Schmidhuber in 1997 [HS97]. The LSTM has a recurrent neural network (RNN) architecture, which uses a loop to pass information from one step of the network to the next. It is a variation of the RNN with a similar recurrent, chain-like structure, but each repeating module contains more layers. The key component of the LSTM is the cell state $C_t$, which stores memory, together with three gates (input gate, output gate and forget gate) that control the information added to or removed from the cell. Two activation functions play an essential role in the LSTM network. The first is the sigmoid function, also known as the logistic function, which regulates how much information passes through the gates and is defined as:
$\sigma(x) = \frac{1}{1 + e^{-x}}$ (10)
Its outputs lie in the range 0 to 1. If the result is close to 0, little information is let through; if it is close to 1, more information is passed on to the next step. The second is the hyperbolic tangent (tanh) function, which outputs values in the range -1 to 1 and is defined as:
$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ (11)
An LSTM unit involves four parts, illustrated in the figure below. The first part, the forget gate layer, decides which previous information is discarded from the cell. It passes the hidden vector from the previous step $h_{t-1}$ and the new input $x_t$ into the sigmoid function and produces a forget gate vector $f_t$:
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$ (12)
where $W_f$ and $b_f$ are the weights and bias of the forget gate.

EAI Endorsed Transactions on Internet of Things | 02 2022 - 04 2022 | Volume 7 | Issue 27 | e3

As the table shows, the ARIMA model gives the lowest error on our dataset, followed by Multivariable Linear Regression. LSTM is the worst-performing of the three models.
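The sigmoid (10), the tanh (11) and the forget-gate formula of the LSTM section can be sketched in pure Python. This is a minimal illustration only: the weight shapes (hidden size 2, input size 1) and values are hypothetical, the remaining input and output gates are not shown because the text does not describe them, and the sketch assumes the common convention that f_t multiplies the previous cell state C_{t-1} element-wise. A practical model would use a deep learning library's LSTM layer with trained weights.

```python
import math


def sigmoid(z):
    """Logistic function, equation (10): maps any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z (avoids overflow)
    return e / (1.0 + e)


def tanh(z):
    """Hyperbolic tangent, equation (11): maps any real z into (-1, 1)."""
    return math.tanh(z)


def forget_gate(W_f, b_f, h_prev, x_t):
    """f_t = sigmoid(W_f . [h_{t-1}, x_t] + b_f), one output per row of W_f."""
    concat = list(h_prev) + list(x_t)  # the concatenated vector [h_{t-1}, x_t]
    return [sigmoid(sum(w * v for w, v in zip(row, concat)) + b) for row, b in zip(W_f, b_f)]


def apply_forget(c_prev, f_t):
    """Element-wise f_t * C_{t-1}: gate values near 0 erase memory, near 1 keep it."""
    return [f * c for f, c in zip(f_t, c_prev)]


# Hypothetical sizes and weights: hidden size 2, input size 1.
W_f = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b_f = [10.0, -10.0]  # strongly "keep" the first memory cell, strongly "forget" the second
f_t = forget_gate(W_f, b_f, h_prev=[0.3, -0.2], x_t=[1.5])
print(apply_forget([2.0, 4.0], f_t))  # close to [2.0, 0.0]
```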

Improvement as per reviewer comments
This system can be extended into an interactive web-based system where users can view the details on the go, fetch price data in real time and choose the cryptocurrencies they want to invest in. The portal could also be designed so that, once a user chooses a cryptocurrency, they receive a hyperlink to a site where they can invest in it directly. In the future, other time series forecasting models that are currently under research can also be integrated.