Bitcoin has been in the news quite a bit lately with its price soaring. It was named the top-performing currency in four of the last five years. And its price has the potential to hit over $100,000 in 10 years, which would mark a 3,483 percent rise from its recent record high. In this post, we model Bitcoin’s market capitalization.
Bitcoin is a decentralized network that allows users to transact directly, peer to peer, without a middleman to manage the exchange of funds. Please click this link for more information. The data used here is Bitcoin data compiled by Quandl, which is a magnificent platform for finding financial and economic data.
Understanding the data for Modeling Bitcoin’s Market Capitalization
We have the data for bitcoin spanning across 14 attributes with very technical lingo. I researched the web to understand these technical terms and frame the analysis to understand which terms are important. These attributes are as follows:
- Date: The Bitcoin data is a daily time series, recorded from August 28, 2013 to June 30, 2017.
- Total BTC: The total number of bitcoins involved in transactions to date. This is a cumulative value: each day’s Total BTC is added on top of the value of the preceding day.
- Market Cap — The total USD value of bitcoin supply in circulation, as calculated by the daily average market price across major exchanges.
- Transactions last 24h: The aggregate number of confirmed Bitcoin transactions in the past 24 hours. It varies from day to day and is not a cumulative value.
- Transactions avg. per hour — The number of confirmed Bitcoin transactions per hour
- Bitcoins sent last 24h — tells you how many bitcoins were sent in the last 24 hours.
- Bitcoins sent avg. per hour — tells you how many bitcoins were sent per hour
- Count: Current block count
- Blocks last 24h: In terms of Bitcoin, a block is a storage section where your transaction data gets permanently recorded. Blocks are basically files which can be thought of as being organized into a linear sequence over time, known as the block chain. The average block size in MB. https://blockchain.info/charts
- Blocks avg. per hour — The average block size in MB per hour
- Difficulty — A relative measure of how difficult it is to find a new block. The difficulty is adjusted periodically as a function of how much hashing power has been deployed by the network of miners. The Bitcoin network has a global block difficulty. Valid blocks must have a hash below this target. Mining pools also have a pool-specific share difficulty setting a lower limit for shares. https://en.bitcoin.it/wiki/Difficulty
- Next Difficulty — https://bitcoinwisdom.com/bitcoin/difficulty
- Network Hashrate Terahashs: The estimated number of terahashes per second (trillions of hashes per second) the Bitcoin network is performing.
- Network Hash-rate PetaFLOPS — http://bitcoin.sipa.be/
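To make the link between Difficulty and the network hash rate a little more concrete, here is a minimal sketch. It relies on the usual approximation that a block at difficulty D takes roughly D * 2^32 hashes to find, and on Bitcoin's 10-minute target block interval. The function name and the example difficulty value are hypothetical and are not taken from the Quandl data.

```python
# Rough number of hashes needed to find one block at difficulty 1,
# and Bitcoin's target time between blocks (10 minutes).
HASHES_PER_DIFFICULTY_UNIT = 2 ** 32
TARGET_BLOCK_SECONDS = 600


def estimated_hashrate_ths(difficulty: float) -> float:
    """Approximate network hash rate in terahashes per second."""
    hashes_per_second = difficulty * HASHES_PER_DIFFICULTY_UNIT / TARGET_BLOCK_SECONDS
    return hashes_per_second / 1e12  # 1 terahash = 1e12 hashes


# Hypothetical difficulty value, for illustration only.
example_difficulty = 7.0e11
print(f"{estimated_hashrate_ths(example_difficulty):,.0f} TH/s")
```

This is only an estimate: the real hash rate fluctuates between difficulty adjustments, which is one reason both Difficulty and the hash-rate columns appear in the data.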
Data Types:
Date                            object
Total BTC                       float64
Market Cap                      float64
Transactions last 24h           float64
Transactions avg. per hour      float64
Bitcoins sent last 24h          float64
Bitcoins sent avg. per hour     float64
Count                           float64
Blocks last 24h                 float64
Blocks avg. per hour            float64
Difficulty                      float64
Next Difficulty                 float64
Network Hashrate Terahashs      float64
Network Hashrate PetaFLOPS      float64
dtype: object
There are 101 values missing from Transactions last 24h, Transactions avg. per hour, Bitcoins sent last 24h and Bitcoins sent avg. per hour.
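A check like this can be sketched with pandas. The tiny data frame below is a made-up stand-in for the Quandl export (the values are invented, and only a few of the 14 columns are shown), so the counts it prints are not the ones reported in this post.

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for the Quandl data; all values are made up.
df = pd.DataFrame({
    "Date": ["2017-06-28", "2017-06-29", "2017-06-30"],
    "Total BTC": [16410000.0, 16412000.0, 16414000.0],
    "Market Cap": [4.1e10, 4.2e10, 4.0e10],
    "Transactions last 24h": [250000.0, np.nan, 240000.0],
    "Bitcoins sent last 24h": [2.1e6, np.nan, 1.9e6],
})

# One dtype per column.
print(df.dtypes)

# Number of missing entries per column, showing only columns with gaps.
missing = df.isna().sum()
print(missing[missing > 0])
```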
Further, the statistics point to the following dates:
- 28th November 2016 to 11th November 2016
- 29th October 2016 to 15th September 2016
- 12th May 2016 to 26th May 2016
- 18th July 2015, 16th July 2015 and 14th July 2015
- 16th April 2014 and 15th April 2014
- 23rd Feb 2014 and 22nd Feb 2014
- 2nd Feb 2014 and 2nd Jan 2014
On these dates, Total BTC, Count, Blocks last 24h, Blocks avg. per hour, Difficulty, Next Difficulty, Network Hashrate Terahashs and Network Hashrate PetaFLOPS stay the same. The block values are zero while Market Cap keeps changing.
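One way to spot such rows is sketched below with pandas. The data frame is synthetic, and the rule used (supply unchanged from the previous row, zero blocks, Market Cap still moving) is an assumption about how these dates could be found, not necessarily how they were found for this post.

```python
import pandas as pd

# Synthetic rows; the two middle days imitate the "frozen" pattern.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2014-02-21", "2014-02-22", "2014-02-23", "2014-02-24"]),
    "Total BTC": [100.0, 100.0, 100.0, 125.0],
    "Blocks last 24h": [144.0, 0.0, 0.0, 150.0],
    "Market Cap": [5.0e9, 5.2e9, 5.1e9, 5.3e9],
}).sort_values("Date")

# Supply did not change compared with the previous day...
unchanged_supply = df["Total BTC"].diff() == 0
# ...while Market Cap did.
moving_cap = df["Market Cap"].diff() != 0

stale = df[unchanged_supply & moving_cap & (df["Blocks last 24h"] == 0)]
print(stale["Date"].dt.date.tolist())
```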
We are going to add three more features to see if they help in modeling Bitcoin’s market capitalization. These features are
After an analysis, we see that ‘Total BTC’, ‘Transactions last 24h’, ‘Transactions avg. per hour’, ‘Count’, ‘Difficulty’, ‘Next Difficulty’, ‘Network Hashrate Terahashs’, ‘Network Hashrate PetaFLOPS’, ‘Total transactions per day’ have an effect on ‘Market Cap’.
We will keep these columns for our model. We have different columns and we will be predicting Market Cap as it gives a clear idea about how Bitcoin is doing on transactions each day.
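The post does not spell out which analysis was used to pick these columns. One common option is to look at the Pearson correlation of every column with the target, sketched here on synthetic data. The "Unrelated noise" column and the 0.5 cut-off are hypothetical choices made for illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic data: Market Cap is built to depend on Total BTC only.
total_btc = np.linspace(1.2e7, 1.6e7, n)
df = pd.DataFrame({
    "Total BTC": total_btc,
    "Unrelated noise": rng.normal(size=n),  # hypothetical column
    "Market Cap": 3000.0 * total_btc + rng.normal(scale=1e8, size=n),
})

# Correlation of each feature with the target column.
corr_with_target = df.corr()["Market Cap"].drop("Market Cap")
print(corr_with_target.round(3))

# Keep features whose absolute correlation exceeds an arbitrary cut-off.
strong = corr_with_target[corr_with_target.abs() > 0.5].index.tolist()
print(strong)
```

Correlation only captures linear, one-at-a-time relationships, so treat it as a first filter rather than a final verdict on which columns matter.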
We fill any missing or NaN (Not a Number) value with -99999, since the idea is that ML algorithms will then treat it as an outlier value and effectively disregard it. Remember that we also need to scale the features, because their values differ greatly in magnitude.
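A minimal sketch of these two steps follows. The post does not say which scaling method it uses, so standardization (zero mean, unit variance per column) is an assumption here, and the feature values are made up.

```python
import numpy as np
import pandas as pd

# Made-up feature values with one missing entry.
features = pd.DataFrame({
    "Total BTC": [1.60e7, 1.61e7, 1.62e7, 1.63e7],
    "Transactions last 24h": [250000.0, np.nan, 240000.0, 260000.0],
})

# Replace every NaN with the sentinel value used in this post.
filled = features.fillna(-99999)

# Standardize each column: subtract its mean, divide by its standard deviation.
scaled = (filled - filled.mean()) / filled.std(ddof=0)
print(scaled.round(3))
```

Note that the sentinel takes part in the mean and standard deviation, so a column with many filled values ends up with a scale dominated by -99999.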
Compile The Model, Fit The Data
We are going to use linear regression for modeling Bitcoin’s Market Capitalization. We are going to use all the parameters (‘Total BTC’, ‘Transactions last 24h’, ‘Transactions avg. per hour’, ‘Count’, ‘Difficulty’, ‘Next Difficulty’, ‘Network Hashrate Terahashs’, ‘Network Hashrate PetaFLOPS’, ‘Total transactions per day’) to fit a linear regression model.
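To show what fitting such a model involves, here is a sketch that solves ordinary least squares with NumPy on synthetic data. It is not the code behind the numbers in this post: it uses only two of the features, the data and the "true" coefficients are invented, and NumPy's least-squares solver stands in for whichever library was actually used.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 300
feature_names = ["Total BTC", "Difficulty"]  # two features, for illustration

# Synthetic feature matrix with very different column magnitudes.
X = np.column_stack([
    rng.uniform(1.2e7, 1.6e7, n),
    rng.uniform(1e10, 7e11, n),
])
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# Invented ground truth so we can check what the fit recovers.
true_coefs = np.array([4.0e9, 9.0e9])
y = 1.5e10 + X_scaled @ true_coefs + rng.normal(scale=1e8, size=n)

# Add a column of ones so the first fitted weight is the intercept.
A = np.column_stack([np.ones(n), X_scaled])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, coefs = beta[0], beta[1:]

print(f"Estimated intercept coefficient: {intercept:.2f}")
for name, coef in zip(feature_names, coefs):
    print(f"{name}: {coef:.2f}")
```

The results reported next come from the fit on the actual Quandl data, not from this sketch.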
Estimated intercept coefficient: 14493862.65
Data frame with features and estimated coefficients
As you can see from the data frame, there is a high correlation between Total BTC (4.5), Transactions avg. per hour, Network Hashrate PetaFLOPS and Market Cap. Let’s plot a scatter plot of true Market Cap against true Total BTC.
Below, we plot a scatter plot comparing the first 200 true Market Cap values with the predicted market cap.
You can notice that there is some error in the prediction as the market capitalization decreases.
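Besides reading it off a plot, this kind of error can be put into numbers. The sketch below uses 200 made-up true and predicted values, with the same absolute noise at every level, so the larger relative error at small market caps is built in by construction. It illustrates the measurement only and does not reproduce the results of this post.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up "true" market caps and noisy "predictions" for 200 rows.
y_true = rng.uniform(1e9, 4e10, 200)
y_pred = y_true + rng.normal(scale=5e8, size=200)

# Root mean squared error over all 200 rows.
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))

# Relative error, compared between the lower and upper half of market caps.
rel_err = np.abs(y_pred - y_true) / y_true
low = y_true < np.median(y_true)

print(f"RMSE: {rmse:.3e}")
print(f"Mean relative error, low market cap:  {rel_err[low].mean():.4f}")
print(f"Mean relative error, high market cap: {rel_err[~low].mean():.4f}")
```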