Prediction of dogecoin price using deep learning and social media trends

INTRODUCTION: Cryptocurrency is a digital, decentralized form of money based on blockchain technology, which makes it the most secure method of making a transaction. There has been a huge increase in the number of cryptocurrencies in the past few years. Cryptocurrencies such as Bitcoin and Ethereum have become an interesting subject of study in fields such as finance. In 2021, over 4,000 cryptocurrencies are already listed. Many past studies focus on predicting the price of cryptocurrencies using machine learning, but the majority of them focused only on Bitcoin. Moreover, the majority of the models implemented for price prediction used only the historical market prices and did not utilize social signals related to the cryptocurrency. OBJECTIVES: In this paper, we propose a deep learning model for predicting the prices of the Dogecoin cryptocurrency. The proposed model is based on historical market price data as well as social trends of Dogecoin. METHODS: The market data of Dogecoin is collected from Kaggle at a granularity of one day, and for the same duration the verified tweets with the hashtags “Dogecoin” and “Doge” have also been collected. Experimental results show that the proposed model yields a promising prediction of the future price of Dogecoin, a cryptocurrency that has recently become the talk of the town of the crypto market. RESULTS: The minimum RMSE achieved in the predicted price of Dogecoin was 0.02, where the feature vector consisted of OCVP (Open, Close, Volume, Polarity) values from the combined dataset. CONCLUSION: Experimental results show that the proposed approach performs efficiently. [...] is a major factor in the future price of Dogecoin.


Introduction
Cryptocurrencies were introduced in 2009 with the announcement of Bitcoin by Satoshi Nakamoto as the first peer-to-peer electronic medium of exchange. Cryptocurrencies are a major application of the blockchain, which provides a more secure method of creating applications, and they are now widely accepted by virtue of their secure transactions and decentralized form of operation [35]. While Bitcoin was not so popular in 2009, its price started skyrocketing from 2013, and in April 2021 it reached an all-time high of almost $65,000. Inspired by Bitcoin, other currencies such as Ethereum and Ripple have also gained popularity in the crypto market. At present, Dogecoin has gained immense popularity due to its association with the Shiba Inu dog meme, which is the face of the currency.
Since more and more people have started investing in cryptocurrencies, predicting their future prices with high accuracy has become an interesting area of research. For over a decade, machine learning algorithms have gained popularity in predicting the stock market [18,20]. With cryptocurrencies, however, the problem becomes harder to solve due to multiple factors such as economic and political conditions, as well as human behavior. Also, according to academic research, cryptocurrency prices behave neither randomly nor linearly, but dynamically. Unlike the stock market, the prices of cryptocurrencies are much more volatile and difficult to predict [17-20].
Bitcoin has become the most used cryptocurrency, as asserted by the authors of [4,38]. At present, there are over 4000 listed cryptocurrencies. Bitcoin is not the only cryptocurrency that can offer a profitable trade. The second quarter of 2021 has proved to be a major growth period for Dogecoin [34]. Investors have gained huge sums of money and this along with events such as famous celebrities showing their enormous support for Dogecoin, have resulted in a huge increase in the current price of the crypto currency and the volume that is being traded each day.
In the case of Dogecoin, though, very few or no significant studies are available in the literature yet [31,32]. This, along with the fact that Dogecoin has witnessed huge growth in terms of volume and value, has motivated the authors to focus their work on Dogecoin. In this paper, a working algorithm that can predict the price of Dogecoin is developed. The development of such a model can help in making a profitable trade in Dogecoin. The following are the significant motivating factors for this study:
1. Knowledge of price movements a fraction of a second ahead can lead to high profits for investors, which is a major motivation for this study.
2. It helps determine the intrinsic value of a company including tangible and intangible factors.
3. The results can be used for comparison with the currency's market value, to find out whether the currency is undervalued on the crypto market or not.
Neural networks, Support Vector Machines and Random Forests are a few of the most widely used techniques for stock market prediction [21,26]. On the other hand, deep learning approaches derived from neural networks have been used to predict the prices of different cryptocurrencies like Bitcoin and Ethereum. A study conducted in [27,39,40] shows that Random Forests gave the least error among the compared models when the model was trained with ten technical parameters.
Apart from the publicly available market data, social media has been observed to have a strong influence on cryptocurrency prices. Social media platforms such as Twitter have played a vital role in this domain [3,22-25]. In this regard, Dogecoin prices were also affected by recent events that took place on Twitter [36,41]. Thereby, it is a novel approach to utilize social signals and sentiment analysis for the prediction of prices of cryptocurrencies [1,2,29,33].
The number of machine learning and deep learning models implemented to predict prices of the stock market still outnumber those of the crypto market [28]. With the increasing growth in popularity of cryptocurrencies, it becomes absolutely essential to have a model that can predict the crypto market with an accuracy that is at par with that of the stock market.
The majority of the existing algorithms have been developed for predicting the prices of Bitcoin, Ethereum and Ripple, and these outnumber the studies and research directed towards Dogecoin. Moreover, while the algorithms made for the aforementioned cryptocurrencies have attained good accuracy, the algorithms that exist for Dogecoin have considerable error; hence, there arises a need for the improvement of these models or the development of a new one.
While these models use different approaches and neural networks such as RNN [4], Bayesian neural networks (BNN) [30], LSTM and Gated Recurrent Units (GRU) [9], models such as [4] and [6] predict the future price of Bitcoin using only attributes from historical market data. In [4], Open, High, Low, Close (OHLC) values were used through the implementation of a Bayesian optimized RNN and LSTM network. A highest classification accuracy of 52% and an RMSE of 8% were achieved. The majority of these papers use one or more of the attributes available in the historical market data of a cryptocurrency. A few studies [37] use social media trends from Twitter, but the area of study is Bitcoin. Hence, it is worth mentioning that at the time of writing this paper, no model exists that combines attributes from market data and social media trends for Dogecoin and uses them for price prediction.
EAI Endorsed Transactions on Industrial Networks and Intelligent Systems 09 2021 -11 2021 | Volume 8 | Issue 29 | e2
This paper aims at predicting real time prices of Dogecoin using market and twitter sentiment data and analyzing it for further comparative studies. Further, this paper aims to design a model that can consistently predict the price of Dogecoin.
The proposed model is based on feeding the LSTM network with a feature vector that not only comprises a set of values from the historical market data, but also contains useful information from sentiment analysis of social media data, most of which originates from twitter. This useful information from sentiment analysis can be the polarity or subjectivity of the tweets.
In the proposed model, the historical market data is first collected from Kaggle and tweets regarding Dogecoin are collected using Twint. After the market data is cleaned, sentiment analysis is performed on the tweets using TextBlob. The two datasets are then combined by date, and null values in the combined dataset are removed. A feature vector is prepared from the previous market history and the sentiment analysis of tweets related to Dogecoin, and finally this feature vector is fed to the LSTM network for future price prediction.
Predicting the exact price can be very difficult; therefore, the key focus of this paper is achieving the least error and the greatest accuracy. The objective is to develop a model that improves Dogecoin price prediction accuracy by incorporating public sentiments from social media. This paper is divided into six sections. Following the introduction section, the second section presents a literature review of the various related works that have already been carried out in this domain. The third section introduces the proposed methodology along with the problem statement. The fourth section provides a detailed explanation of the materials, methods and the experimental settings used, followed by the results obtained in the fifth section. Finally, the conclusion and future scope are presented, followed by the research papers and articles used as references for this study.
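The step of combining the two datasets by date and removing null values can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function name `combine_on_date`, the dictionary layout, and the sample values are all assumptions made for the example, and the polarity scores are taken as already computed.

```python
from datetime import date


def combine_on_date(market_rows, daily_polarity):
    """Join daily market rows with daily polarity scores on the date key.

    market_rows: dict mapping date -> dict of market values (hypothetical layout)
    daily_polarity: dict mapping date -> polarity score
    Rows with any missing value are dropped, mirroring the null-removal step.
    """
    combined = []
    for day in sorted(market_rows):
        row = dict(market_rows[day])
        # A date without a polarity score yields None and is dropped below.
        row["polarity"] = daily_polarity.get(day)
        if any(value is None for value in row.values()):
            continue
        combined.append({"date": day, **row})
    return combined


# Made-up sample values, for illustration only.
market = {
    date(2021, 5, 1): {"open": 0.34, "close": 0.39, "volume": 1.0e9},
    date(2021, 5, 2): {"open": 0.39, "close": 0.37, "volume": None},
    date(2021, 5, 3): {"open": 0.37, "close": 0.44, "volume": 1.2e9},
}
polarity = {date(2021, 5, 1): 0.12, date(2021, 5, 2): 0.05}

rows = combine_on_date(market, polarity)
```

Here only the first day survives: the second has a missing volume and the third has no polarity score.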

Related Work
As discussed in the previous section, several studies [7-16] have been conducted on the price prediction of cryptocurrencies like Bitcoin and Ethereum. In [5], a feature vector was constructed by taking ten values, five each from market data and Twitter sentiment data. This feature vector was then fed into three different models, namely Multi-Layer Perceptron (MLP), Support Vector Machines (SVM) and Random Forests. The results obtained were compared, and it was concluded that SVM was the best performing model. In another study [6], data was retrieved from Kaggle in USD. After preprocessing and splitting the data into a training set and a testing set, an RNN model was used for price prediction. Another study [30] employed Bayesian neural networks (BNN), using blockchain data and macroeconomic variables to predict Bitcoin prices with an RMSE of 0.23.
Most of the research done earlier did not include all the parameters affecting the price of Dogecoin. Some researchers [6] considered only market data while predicting the prices, which resulted in wrong predictions when a change in sentiment caused a sudden change in the price of Dogecoin. On the other hand, some studies focused solely on the sentiments of people [5] and made them the basis of their price predictions. A few studies [9] considered only the opening and closing prices of each day to predict the prices. Such research missed the day-to-day changes occurring in the markets, which the sentiments were unable to depict. These studies also did not combine all of these factors in a single model. To overcome these shortcomings, this study combines the market price and sentiment components in a single feature vector, instead of treating the two aspects separately, to predict the price of Dogecoin. A significant decrease in the RMSE value was observed for this model as compared to the earlier models. Table 1 summarizes the various deep learning models employed in predicting the prices of various cryptocurrencies.
Since Dogecoin is an emerging cryptocurrency, it is important to emphasize that the existing crypto-trading and cryptocurrency price prediction algorithms only work for cryptocurrencies like Bitcoin and Ethereum. Due to the increasing trading volume of Dogecoin in 2021, there arises a need for an algorithm for efficient and accurate prediction of its prices. Several magazines such as The Express UK [42] have mentioned and discussed the importance and scope of Dogecoin [43-45]. The methodologies in Table 1 summarize several different models for Bitcoin. As such, no scientific algorithm exists in the literature for predicting the price of Dogecoin. Several events across the globe play an important role in deciding the rise or fall of a cryptocurrency. Examples of such events include the banning of a cryptocurrency by a nation or, on the contrary, the adoption of a cryptocurrency by a nation as its primary currency or medium of exchange. This suggests that such events should somehow be included in the model. In this paper, we use a hybrid model of existing methodologies to solve the problem of price prediction of Dogecoin.

Proposed Approach
The proposed model is developed for predicting the dogecoin cryptocurrency prices based on the historical market data and public opinion towards dogecoin over the social media. Figure 1 depicts the proposed approach. The proposed model works as follows. Initially, we collect the historical and social trend data of the dogecoin cryptocurrency for the same time duration in a granularity of a day. Further, important key attributes such as retweet count, polarity and subjectivity from Twitter using sentiment analysis are extracted, and in parallel, the historical market price values such as market close, open, high and low prices are considered. Further, the information from both kinds of data is extracted and a feature vector is formed which has the important information for predicting the dogecoin market price. The feature vector is further used to develop different deep neural networks-based models such as long short-term memory (LSTM), and Gated Recurrent Units (GRU). We explain each step of the proposed approach in detail in the subsequent sections.

Preprocessing
First, the collected data is preprocessed to clean it for further processing. The tweets dataset has inconsistencies: some dates have multiple entries (tweets), while other dates are missing (no tweet on that day). These inconsistencies are removed as follows. For dates with multiple entries, the mean of the polarity scores is taken for each day. For dates with no tweets, the previous day's sentiment value is used.
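The two cleaning rules (daily mean for multiple tweets, previous day's value for missing days) can be sketched as below. This is an illustrative sketch under stated assumptions rather than the authors' code: the function name `aggregate_daily_polarity` and the input layout of `(date, polarity)` pairs are hypothetical, and days before the first tweet are left as `None` because the paper does not say how they are handled.

```python
from datetime import date, timedelta


def aggregate_daily_polarity(tweets, start, end):
    """Return one polarity value per calendar day between start and end.

    tweets: iterable of (date, polarity) pairs, one pair per tweet.
    """
    by_day = {}
    for day, score in tweets:
        by_day.setdefault(day, []).append(score)

    series = {}
    previous = None  # stays None until the first day that has tweets
    day = start
    while day <= end:
        if day in by_day:
            # Multiple tweets on one day: use the mean polarity.
            previous = sum(by_day[day]) / len(by_day[day])
        # No tweets on this day: carry the previous day's value forward.
        series[day] = previous
        day += timedelta(days=1)
    return series


# Made-up polarity scores, for illustration only.
tweets = [
    (date(2021, 1, 1), 0.2),
    (date(2021, 1, 1), 0.4),
    (date(2021, 1, 3), -0.1),
]
series = aggregate_daily_polarity(tweets, date(2021, 1, 1), date(2021, 1, 3))
```

In this example, January 1 receives the mean of its two tweets, January 2 inherits that value, and January 3 uses its single tweet.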

Feature extraction
The price of Dogecoin depends on a number of factors, which differ in the magnitude of their effect. The parameters taken from market data cover the change in price due to statistical changes in the market, while tweet polarity covers the change in price due to changes in sentiment. The information related to social media trends is extracted using sentiment analysis carried out on the Twitter dataset using TextBlob, whereas the historical market features for the corresponding days are extracted from the data collected from Kaggle. The feature vector is built from attributes of both sources. Before the data is divided into training and testing sets, these values are rescaled to the range 0 to 1.
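Rescaling to the range 0 to 1 is commonly done with min-max scaling, x' = (x - min) / (max - min), applied per feature column. The sketch below assumes that this is the scaling meant here; the paper does not name the method, and the function name and column layout are hypothetical.

```python
def min_max_scale(columns):
    """Rescale each feature column to the range [0, 1].

    columns: dict mapping a feature name to a list of numeric values.
    """
    scaled = {}
    for name, values in columns.items():
        low, high = min(values), max(values)
        span = high - low
        # A constant column has zero span; map it to 0.0 to avoid dividing by zero.
        scaled[name] = [(v - low) / span if span else 0.0 for v in values]
    return scaled


# Made-up values, for illustration only.
scaled = min_max_scale({"close": [2.0, 4.0, 6.0], "polarity": [0.1, 0.1, 0.1]})
```

A common alternative is to compute the minimum and maximum on the training portion only and reuse them for the test portion, so that no information from the test period enters the scaling.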

Deep learning model
After the feature vector is formed, it is fed into a deep neural network for learning the pattern. The LSTM is used to train the model. The Long Short-Term Memory (LSTM) network is a variant of recurrent neural networks, which makes it easier to remember past data in memory [6]. LSTM is well-suited to classify, process and predict time series given time lags of unknown duration. LSTM units include a 'memory cell' that can maintain information in memory for long periods of time. A set of gates is used to control when information enters the memory, when it's output, and when it's forgotten. Long Short-Term Memory (LSTM) networks are capable of learning order dependence in sequence prediction problems hence can learn the order dependence between items in a sequence. LSTMs have the promise of being able to learn the context required to make predictions in time series forecasting problems, rather than having this context pre-specified and fixed. In this paper, we focus on the Dogecoin closing price for the development of the predictive model. The increase or decrease in the Dogecoin price with the higher volatility makes it harder to predict, but the LSTM model is well suited to predict Dogecoin prices. The proposed models are evaluated on the basis of root mean square error.
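For reference, the gates and memory cell described above are usually written as follows. This is the standard LSTM formulation, given under the assumption that the model uses it unchanged; the symbols are not taken from the paper. Here x_t is the feature vector at time step t, h_t the hidden state, c_t the memory cell, \sigma the logistic sigmoid, \odot element-wise multiplication, and W, U, b the learned weights and biases.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory cell update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

The forget gate f_t controls when information is dropped from memory, the input gate i_t controls when new information enters, and the output gate o_t controls when the memory content is exposed as output.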

Dataset collection
The market data is collected from Kaggle, where Dogecoin market data is available at a granularity of one day. In parallel to the market data, verified tweets containing the hashtags "dogecoin" and "doge" are collected. Both datasets cover the period from January 2016 to May 2021.
Historical market data was obtained from the top performing cryptocurrency exchanges. The data was fetched from the Kaggle dataset from source [2]. The data obtained had a daily granularity and contained the closing price, opening price, lowest price, highest price and the transactional volume traded for the day. Time duration of historical market data used: January 1, 2016 to May 15, 2021. Further, all tweets featuring the hashtags "Dogecoin", "Doge" and "Cryptocurrency" were collected using the Twint library. The selected tweets had the following attributes:
1. --verified: The verified attribute was used to eliminate dump and meaningless tweets.
2. --en: The tweets selected were in the English language.
The total number of samples in the historical market data was 1958, and the Twitter dataset contained 22147 tweets for the same duration. Multiple tweets from the same day are combined as described in the Preprocessing section. Further, the data is divided into 70 percent for training and 30 percent for testing, so the training dataset consists of 1372 samples and the test dataset contains 588 samples. A few sample tweets are shown in Table 3 to give an idea of the opinions about Dogecoin expressed on Twitter.
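A 70/30 division of time-ordered data can be sketched as below. The sketch assumes a chronological split without shuffling, which is a common choice for time series but is not stated in the paper; the function name and the rounding rule are likewise assumptions.

```python
def chronological_split(rows, train_fraction=0.7):
    """Split time-ordered rows into an earlier training part and a later test part."""
    cut = int(len(rows) * train_fraction)  # rounds down; the paper's rounding is not stated
    return rows[:cut], rows[cut:]


# Placeholder rows standing in for the 1958 daily samples.
train, test = chronological_split(list(range(1958)))
```

With 1958 rows and rounding down, this yields 1370 training rows and 588 test rows, so the exact training count depends on the rounding rule used.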

Data Visualization
In order to better understand the dataset, we visualize different attributes such as open, close and volume of dogecoin prices with respect to the time.
• In Fig 2, the graph depicts the changes in closing price of dogecoin versus time over a duration of January 1, 2016 to May 15, 2021. The graph makes it clearly evident that till the later part of 2020, the close price did not show a significant rise. On the other hand, another observation can be made that the close price has increased by over 800% in 2021 alone.

Fig. 2. Data Visualization of closing price of Dogecoin
• In Fig 3, the graph depicts the changes in the high price of dogecoin versus time. The graph makes it clearly evident that the high price of dogecoin shows a similar plot to the close price. One reason for this is that it is the first time that dogecoin has reached an all-time high.

Evaluation measure
Time series forecasting usually focuses on predicting real values, which makes it a regression problem. Hence, the performance measurements in this paper focus on real-value prediction evaluation methods. The most commonly used method to measure error in time series models is the root-mean-square error (RMSE) [11]. RMSE is a frequently used measure of the differences between the sample values and the values predicted by a model.
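In its standard form, RMSE is the square root of the mean of the squared differences between actual and predicted values, RMSE = sqrt((1/n) * sum((y_i - yhat_i)^2)). A minimal sketch of this computation follows; the function name is hypothetical, and the definition is the standard one rather than one quoted from the paper.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values: both predictions are off by 3, so the RMSE is 3.
example_error = rmse([0.0, 0.0], [3.0, 3.0])
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones, which matters for a volatile series.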

Hyperparameter tuning
We have performed hyperparameter tuning to optimize the parameters of the proposed models. First, eight different optimizers, namely Adadelta, Adagrad, Adam, Adamax, Ftrl, Nadam, RMSprop and SGD, were compared against one another while keeping the values of the other hyperparameters constant. Fig 5 shows the comparison of RMSE values obtained using the different optimizers. The best performing optimizers were Adam and Nadam, while the worst performing were SGD and Adadelta. Adam was chosen as the optimizer due to its lower reported error when compared to Nadam, Adadelta, Ftrl and Adagrad. A similar method was adopted to obtain the other hyperparameters, such as the number of epochs and the loss function.
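The comparison procedure, varying one hyperparameter while holding the others fixed, can be sketched as a simple loop. This is only an outline under stated assumptions: `train_and_score` is a hypothetical callable that trains the model with the named optimizer and returns its RMSE, and no training code or results from the paper are reproduced here.

```python
OPTIMIZERS = ["adadelta", "adagrad", "adam", "adamax", "ftrl", "nadam", "rmsprop", "sgd"]


def compare_optimizers(train_and_score, optimizers=OPTIMIZERS):
    """Score each optimizer with all other hyperparameters held constant.

    train_and_score: hypothetical callable taking an optimizer name and
    returning the RMSE of the model trained with that optimizer.
    Returns the name with the lowest RMSE and the full score table.
    """
    scores = {name: train_and_score(name) for name in optimizers}
    best = min(scores, key=scores.get)
    return best, scores
```

The same loop can be reused for the number of epochs or the loss function by passing a different list of candidate values and a matching scoring callable.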

Results and Discussion
The results obtained with different combinations of the proposed models varying different deep neural network models are presented and described in the following cases.

Case 1. Analysis using historical Market Data HIST_LSTM model
In this case, only the features from historical market data are considered, and the features from sentiment analysis are discarded. Figure 8 depicts the price prediction graph when the traditional market data was fed to the LSTM without adding any Twitter sentiment data. An RMSE value of 0.03 was achieved and the approximate percentage error calculated was 31.3%.
In Case 2, the feature vector was composed of all values from market data (open, close, high and low prices and volume traded) along with polarity from processed Twitter data. Fig. 9 depicts the resulting price prediction graph. An RMSE value of 0.027 was achieved and the approximate percentage error calculated was 20.8%. There is a slight improvement over the conventional "only market data" feature vector.
In Case 3, the feature vector consists of open and close prices and volume traded from market data, and polarity from processed Twitter data. Fig. 10 depicts the resulting price prediction graph. This feature vector yielded an RMSE value of 0.020, and the calculated percentage error was approximately 15%. Upon detailed analysis, the reason concluded for the improvement of this model over that in Case 2 was that the high and low prices of a day have recently shown heavy deviation from the closing price in the case of Dogecoin.
Table 5 shows the results of various combinations of the proposed models with varying features, in order to understand the main factors that contribute to the prediction of Dogecoin prices.
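The three feature vectors compared in these cases can be written down as column selections over the combined dataset. The sketch below is illustrative only: the dictionary keys and column names are hypothetical, and the Case 1 list assumes that "historical data related features" means all five market values.

```python
# Hypothetical column names for the combined market and sentiment dataset.
FEATURE_SETS = {
    "case_1_market_only": ["open", "close", "high", "low", "volume"],
    "case_2_market_plus_polarity": ["open", "close", "high", "low", "volume", "polarity"],
    "case_3_ocvp": ["open", "close", "volume", "polarity"],
}


def build_feature_matrix(rows, feature_names):
    """Select the named features, in order, from each row of the combined dataset."""
    return [[row[name] for name in feature_names] for row in rows]


# Made-up row, for illustration only.
sample_rows = [
    {"open": 0.34, "close": 0.39, "high": 0.41, "low": 0.33, "volume": 1.0e9, "polarity": 0.12},
]
ocvp = build_feature_matrix(sample_rows, FEATURE_SETS["case_3_ocvp"])
```

Keeping the feature lists in one place makes it straightforward to train the same network on each case and compare the resulting errors.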

Discussion
Papers such as [4], [6] and [9] address price prediction for other cryptocurrencies. Although this paper is on price prediction of Dogecoin, the results discussed previously in Section II are compared with the results yielded by the proposed model in this paper. The proposed model was tested over three cases:
1. Feature vector with only values from market data.
2. Feature vector with all values as discussed in the Feature extraction section.
3. Feature vector with open price, close price, volume traded and polarity from sentiment analysis data.
The proposed model gave the least error in case 3. The minimum RMSE achieved over all test runs was 0.020. Given the very low price range in which Dogecoin is traded as of May 2021, a reduction in the RMSE of 0.01 from case 1 to case 3 is remarkable. This indicates that using social signals from Twitter has increased the prediction accuracy of the proposed model. The result obtained in case 3 has been compared with several similar models in Table 5. Fig 11 also compares the error of the proposed model with other similar models in graphical format.
Fig. 11. Comparison between best performing LSTM and GRU models.

Conclusion and Future Work
The aim of this paper is to develop a deep neural network-based model for predicting Dogecoin prices by combining the market data of Dogecoin and sentiments extracted from tweets. The developed model using LSTM and sentiment analysis is more accurate than the existing traditional models. LSTM has been used because of its capability to recognize long-term dependencies. Two deep learning models, namely LSTM and GRU, were used in different combinations. Experimental results show that the error is minimum when historical market data (without high and low prices) and polarity from Twitter sentiment are used in the feature vector employed in the GRU model. Therefore, it can be concluded that combining the historical market data of Dogecoin prices with sentiment polarity is more efficient in predicting Dogecoin prices. In future, the authors wish to extend the scope of this work by expanding the feature vector with news sentiment analysis, which may prove promising in terms of higher accuracy and lower error.