Bitcoin Market Cap

Cryptocurrencies: A Digital Goldrush?

Part I: Market Capitalization

Market capitalization is computed with a simple formula: market cap = price per coin × circulating supply.

Digital currencies are all the rage, and may currently be imploding. We’ll have to wait and find out what happens in due time.

While we wait, we can at least gawk at the obscene amount of value they have generated in their infancy.

We’re going to survey the field of cryptocurrencies currently on the market, and calculate the wealth contained within via a market capitalization analysis. To accomplish this, we’re going to use Python and pandas to load and process our data, and visualize it with Matplotlib in a Jupyter Notebook.

Getting Started - Loading Data

So, let’s grab some data from the available API and see what we have. First, notice that we import pandas as pd and matplotlib.pyplot as plt, and set our plot style to fivethirtyeight. I’d also suggest trying ggplot for another snazzy (and professional) theme.


# Importing pandas
import pandas as pd

# Importing matplotlib and setting aesthetics for plotting later.
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'svg' 
plt.style.use('fivethirtyeight')

# Reading in current data from coinmarketcap.com
current = pd.read_json("https://api.coinmarketcap.com/v1/ticker/")

Exploratory Data Analysis - EDA

As always, let’s start by taking a peek at what we have.


# Print Column Names
print(current.columns)

# Check Head
print(current.head())

# Check Tail
print(current.tail())

Our output looks like this…

Index(['24h_volume_usd', 'available_supply', 'id', 'last_updated',
       'market_cap_usd', 'max_supply', 'name', 'percent_change_1h',
       'percent_change_24h', 'percent_change_7d', 'price_btc', 'price_usd',
       'rank', 'symbol', 'total_supply'],
      dtype='object')

   24h_volume_usd  available_supply            id  last_updated  \
0      9898900000          16819812       bitcoin    1516660460   
1      3594420000          97146771      ethereum    1516660452   
2      2119670000       38739142811        ripple    1516660442   
3       642242000          16926100  bitcoin-cash    1516660457   
4       475086000       25927070538       cardano    1516660460   

   market_cap_usd    max_supply          name  percent_change_1h  \
0    177126076210  2.100000e+07       Bitcoin               2.06   
1     94059737996           NaN      Ethereum               3.04   
2     46788749296  1.000000e+11        Ripple               2.53   
3     26496116940  2.100000e+07  Bitcoin Cash               1.88   
4     14143424395  4.500000e+10       Cardano               3.25   

   percent_change_24h  percent_change_7d  price_btc     price_usd  rank  \
0               -7.93             -24.03   1.000000  10530.800000     1   
1               -6.59             -25.29   0.092869    968.223000     2   
2              -12.06             -29.91   0.000116      1.207790     3   
3              -10.83             -34.82   0.150147   1565.400000     4   
4              -10.06             -31.01   0.000052      0.545508     5   

  symbol  total_supply  
0    BTC      16819812  
1    ETH      97146771  
2    XRP   99993093880  
3    BCH      16926100  
4    ADA   31112483745  


    24h_volume_usd  available_supply          id  last_updated  \
95        15913400         617314171  quantstamp    1516660463   
96         3304050          10891318     bitcore    1516660454   
97        11584700         104661310        tenx    1516660455   
98          142157        1000000000       xplay    1516660458   
99        16741300         342699966       civic    1516660457   

    market_cap_usd  max_supply        name  percent_change_1h  \
95       252963618         NaN  Quantstamp               5.32   
96       251393411  21000000.0     Bitcore               5.56   
97       230343844         NaN        TenX               3.48   
98       228942000         NaN       XPlay               3.01   
99       220564097         NaN       Civic               2.04   

    percent_change_24h  percent_change_7d  price_btc  price_usd  rank symbol  \
95               -7.65              -4.53   0.000039   0.409781    96    QSP   
96              -13.68             -21.03   0.002214  23.082000    97    BTX   
97               -7.61             -35.45   0.000211   2.200850    98    PAY   
98               -8.60             -20.43   0.000022   0.228942    99    XPA   
99               -9.57             -34.37   0.000062   0.643607   100    CVC   

    total_supply  
95     976442388  
96      16763281  
97     205218256  
98   10000000000  
99    1000000000

Let’s make note of our columns. Immediately we should realize we don’t even need our simple market capitalization formula, as it’s already been calculated for us as ‘market_cap_usd’.
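As a quick sanity check, here is a minimal sketch that rebuilds the market cap from price and supply for three rows copied from the head() output above. It assumes that available_supply is the circulating supply used in the formula; the column names computed_cap and rel_diff are made up for this illustration.

```python
import pandas as pd

# Three rows copied from the printed head() output above.
sample = pd.DataFrame({
    "id": ["bitcoin", "ethereum", "ripple"],
    "price_usd": [10530.8, 968.223, 1.20779],
    "available_supply": [16819812, 97146771, 38739142811],
    "market_cap_usd": [177126076210, 94059737996, 46788749296],
})

# Market cap = price per coin * coins in circulation
# (assumption: available_supply is the circulating supply).
sample["computed_cap"] = sample["price_usd"] * sample["available_supply"]

# Relative difference between our figure and the one reported by the API.
sample["rel_diff"] = (
    (sample["computed_cap"] - sample["market_cap_usd"]).abs()
    / sample["market_cap_usd"]
)

print(sample[["id", "computed_cap", "market_cap_usd", "rel_diff"]])
```

The small leftover differences come from the rounded prices in the printed table, so the comparison uses a relative difference instead of exact equality.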

Let’s make a quick list of useful columns:

id
market_cap_usd
percent_change_24h
percent_change_7d

Now, we’ve only peeked at a limited set of the JSON data (the top 100 coins by rank), so let’s look at a more comprehensive list from a stored CSV that’s been translated from the JSON source.


# Read the CSV
csv_data = pd.read_csv('datasets/coinsJan2018.csv')
raw_df = pd.DataFrame(csv_data)

# Extract 'id' and 'market_cap_usd' 
market_cap_raw = raw_df[['id','market_cap_usd']]

# Using .info()
print(market_cap_raw.info())

# Summary Statistics
print(market_cap_raw.describe())

# Counting the number of values
market_cap_raw.count()


<class 'pandas.core.frame.DataFrame'>

RangeIndex: 1474 entries, 0 to 1473
Data columns (total 2 columns):
id                1474 non-null object
market_cap_usd    1123 non-null float64
dtypes: float64(1), object(1)
memory usage: 17.3+ KB
None



       market_cap_usd
count    1.123000e+03
mean     4.737687e+08
std      6.510767e+09
min      1.200000e+01
25%      7.937405e+05
50%      6.616421e+06
75%      4.118481e+07
max      1.840834e+11



id                1474
market_cap_usd    1123
dtype: int64

Notice that the counts for id and market_cap_usd are different? We have missing data, and we need to handle that.

There are multiple ways to handle missing data. Often we would simply use .dropna(), but in this case let’s use .query().


# Filtering out rows without a market capitalization
market_cap_filtered = market_cap_raw.query('market_cap_usd > 0')

# Counting the number of values again
print(market_cap_filtered.info())
print(market_cap_filtered.count())


<class 'pandas.core.frame.DataFrame'>
Int64Index: 1123 entries, 0 to 1122
Data columns (total 2 columns):
id                1123 non-null object
market_cap_usd    1123 non-null float64
dtypes: float64(1), object(1)
memory usage: 21.9+ KB
None


id                1123
market_cap_usd    1123
dtype: int64

Great! We’ve removed the missing/non-capitalized coins from our dataset.
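The two approaches are not quite interchangeable. Here is a small sketch on made-up data (the coin names and values are hypothetical) showing that .dropna() removes only the missing rows, while the 'market_cap_usd > 0' query also removes any coin with a market cap of exactly zero.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value and one zero market cap.
toy = pd.DataFrame({
    "id": ["coin-a", "coin-b", "coin-c", "coin-d"],
    "market_cap_usd": [1.5e9, np.nan, 0.0, 2.0e6],
})

# dropna removes only the row where market_cap_usd is NaN.
dropped = toy.dropna(subset=["market_cap_usd"])

# The query keeps rows where the comparison is True; NaN > 0 is False,
# so missing rows are removed, and so is the zero-cap row.
queried = toy.query("market_cap_usd > 0")

print(len(dropped), len(queried))
```

In our dataset both counts came out to 1123, which suggests there are no zero-cap rows here, but the difference is worth keeping in mind for other data.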

Now we can proceed.

Let’s get a sense of the distribution of value within the crypto field. We’re going to plot the top 10 currencies ranked by market capitalization as a bar plot. Bar plots work well here because they pack a lot of information into a form that audiences read intuitively, and they fit our situation because we are grouping categorical data by ‘id’.

First, let’s make some decisions about what we’re doing to help frame our process. Let’s set our title and y-axis label to ‘Top 10 Currencies by Market Cap’ and ‘Percent Total (%)’:

TOP_CAP_TITLE = 'Top 10 Currencies by Market Cap'
TOP_CAP_YLABEL = 'Percent Total (%)'

We can skip naming the x-axis because the name of each coin is displayed there, so our audience doesn’t need a label to understand the plot and properly interpret the data. Otherwise, you should give real consideration to your axis labels. Your goal is not only to communicate data, but also to give your audience the variables they need to understand what they’re looking at, to craft their own analysis, and to ask intelligent questions.

Let’s use slicing to return the top ten results already sorted for us, so we don’t need to utilize a sorting method now (spoiler: We will later) to yield our list.

So, we will slice rows [0:10] from our filtered data and set our index to id:

top_10 = pd.DataFrame(market_cap_filtered[0:10]).set_index('id')
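Slicing only works because the source data already arrives ranked by market cap. If that could not be relied on, one option is DataFrame.nlargest, sketched here on a deliberately unsorted, made-up frame (the coin names and values are hypothetical):

```python
import pandas as pd

# Hypothetical, deliberately unsorted data for illustration only.
unsorted_caps = pd.DataFrame({
    "id": ["coin-c", "coin-a", "coin-d", "coin-b"],
    "market_cap_usd": [3.0e8, 9.0e9, 1.0e6, 4.5e9],
})

# nlargest returns the n rows with the largest values in the given column,
# ordered from largest to smallest, regardless of the original row order.
top_2 = unsorted_caps.nlargest(2, "market_cap_usd").set_index("id")

print(top_2)
```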

Now let’s do some math with our dataframe and use .assign to create a new column named market_cap_perc, holding each coin’s share of the total market cap as a percentage. The math is simply each coin’s market cap divided by the total market’s, multiplied by 100, which we will model as:

# market_cap_perc
top_10 = top_10.assign(market_cap_perc=(top_10.market_cap_usd / market_cap_filtered.market_cap_usd.sum()) * 100)
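To make the arithmetic concrete, here is a minimal sketch on made-up numbers (the coins and values are hypothetical). Note that the denominator is the total of the whole market, not just the top slice, so the percentages of the top coins do not need to add up to 100.

```python
import pandas as pd

# Hypothetical market of four coins whose caps total 100.
caps = pd.DataFrame({
    "id": ["coin-a", "coin-b", "coin-c", "coin-d"],
    "market_cap_usd": [60.0, 25.0, 10.0, 5.0],
}).set_index("id")

# Total over the WHOLE market, not just the top slice.
total = caps["market_cap_usd"].sum()

# Take the top two coins and compute each one's share of the total.
top_2 = caps.iloc[:2]
top_2 = top_2.assign(market_cap_perc=top_2["market_cap_usd"] / total * 100)

print(top_2)
print(top_2["market_cap_perc"].sum())  # share held by the top two coins
```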

Now we can create our plot of just the percentages representing the top-10 currencies as follows:


# Plot- Figure 1
ax = top_10.plot.bar(y='market_cap_perc', title=TOP_CAP_TITLE)
ax.set_ylabel(TOP_CAP_YLABEL)
plt.show()

So, it looks like Bitcoin is around 35% of the total market. This is actually lower than I expected, so I’m glad I plotted this one out. I guess the 1000+ other coins have accrued some value with the rise of Bitcoin.

It’s a boring graph though.

Let’s throw in a second column per bin, representing the value in USD, and set a log scale on the y-axis to deal with the large numbers ($180 billion in Bitcoin’s case).


#Plot- Figure 2

# Colors for the bar plot
COLORS = ['orange', 'green']

# adding the colors and scaling the y-axis
ax = top_10.plot.bar(title = TOP_CAP_TITLE, color = COLORS)
ax.set_yscale('log')

# Annotating the y axis with 'USD'
ax.set_ylabel('USD')

# Removing useless x-label
ax.set_xlabel('')
plt.show()

Part II: Measuring volatility

Now, things have been turbulent recently. Let’s get a sense of the movers and shakers. We’ll assay the movement via percent change of market value, over a 24-hour interval and over a week’s time.

Nothing too fancy here. Let’s again extract our target columns, percent_change_24h and percent_change_7d.

The spoiler from earlier is appropriate now: we’re going to sort our dataframe by percent_change_24h with ascending = True, so that .head() conveniently gives us the biggest losers and .tail() the biggest winners.


# Selecting the id, percent_change_24h and percent_change_7d columns
volatility = raw_df[['id', 'percent_change_24h', 'percent_change_7d']]
# Dropping all NaN rows and setting the index to 'id'
volatility = volatility.dropna().set_index('id')
# Sorting the DataFrame by percent_change_24h in ascending order
volatility = volatility.sort_values(by = 'percent_change_24h', ascending = True)
# Checking the first and last few rows
print(volatility.head())
print(volatility.tail())

Let’s check the output…

                        percent_change_24h  percent_change_7d
id                                                           
yellow-token                        -80.77             -52.88
aerium                              -76.69             -51.57
ethereum-movie-venture              -73.67             -10.73
debitcoin                           -65.10             -60.04
kilocoin                            -61.52             -81.44
               percent_change_24h  percent_change_7d
id                                                  
newbium                    149.02             234.05
b2bx                       176.00              97.82
bigup                      241.04             -10.49
xios                       311.48             275.85
pirate-blocks              473.79             214.81
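The sort-then-slice idea can be checked in isolation. This sketch takes a handful of the 24-hour values printed above, shuffles their order, and recovers the losers and winners with .head() and .tail(); the variable names are only for illustration.

```python
import pandas as pd

# A subset of the 24-hour changes printed above, in shuffled order.
changes = pd.Series(
    {
        "bigup": 241.04,
        "yellow-token": -80.77,
        "pirate-blocks": 473.79,
        "aerium": -76.69,
        "xios": 311.48,
        "kilocoin": -61.52,
    },
    name="percent_change_24h",
)

# Ascending sort puts the largest drops first and the largest gains last.
ranked = changes.sort_values(ascending=True)

losers = ranked.head(2)   # the two most negative changes
winners = ranked.tail(2)  # the two most positive changes

print(losers)
print(winners)
```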

It looks like volatility is alive and well during this correction/implosion/bursting bubble/discount opportunity. But we’re visual creatures, so let’s plot our data again…


# Defining a function with 2 parameters, the series to plot and the title
def top10_subplot(volatility_series, title):
    # Making the subplot and the figure for two side by side plots
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 6))

    # Plotting with pandas the barchart for the top 10 losers
    ax1 = volatility_series[:10].plot.bar(color="red", ax=axes[0])

    # Setting the figure's main title to the text passed as parameter
    fig.suptitle(title)

    # Setting the ylabel to '% change'
    ax1.set_ylabel('% change')
    ax1.set_xlabel('')

    # Same as above, but for the top 10 winners
    ax2 = volatility_series[-10:].plot.bar(color="green", ax=axes[1])
    ax2.set_xlabel('')
    plt.tight_layout()
    plt.show()
    # Returning this for good practice, might use later
    return fig, axes


DTITLE = "24 hours top losers and winners"

# Calling the function above with the 24 hours period series and title DTITLE
fig, ax = top10_subplot(volatility['percent_change_24h'], DTITLE)

# Sorting in ascending order
volatility7d = volatility['percent_change_7d'].sort_values(ascending = True)

WTITLE = "Weekly top losers and winners"

# Calling the top10_subplot function
fig, ax = top10_subplot(volatility7d, WTITLE)

24 Hour Winners/Losers

7-Day Winners/Losers

These are massive swings in either direction. However, it’s important to note that those affected are largely very low value cryptocurrencies.

We should classify the coins based on their relative values. We’ll do this by filtering our dataframe into subsets we can categorize. We’ll define large, medium, and small cap value groups.


# Function to return cap counts; pass in a query string such as 'market_cap_usd > value'
def capcount(query_string):
    # Run the query for the given size condition and return the number of matching rows
    return market_cap_filtered.query(query_string).count().id

# Labels for the plot
LABELS = ["large", "mid", "small"]

# Using capcount to count the 'large' coins
large = capcount('market_cap_usd > 300000000')

# Get the 'mid' counts
mid = capcount('market_cap_usd < 300000000 and market_cap_usd > 50000000')

# And the 'small' counts
small = capcount('market_cap_usd < 50000000')

# Populate a list with the 3 counts
values = [large, mid, small]

# Plot them out with matplotlib 
plt.bar(range(len(values)), values, tick_label = LABELS)
plt.title("Crypto Size Chart")
plt.ylabel("Observations")
plt.xlabel("Market Cap Size")
plt.tight_layout()
plt.show()

Classification counts of cryptocurrencies
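The same three-way classification could also be sketched with pd.cut, which assigns each value to a labeled bin in one pass. This is an alternative to the query-based counts, not the method used above, and the market caps below are made up for illustration. One difference to be aware of: pd.cut uses right-closed bins by default, so a coin sitting exactly on a threshold lands in the lower group, whereas the strict inequalities in the queries would skip it entirely.

```python
import numpy as np
import pandas as pd

# Hypothetical market caps in USD, for illustration only.
caps = pd.Series(
    [1.8e11, 9.4e10, 2.5e8, 6.0e7, 4.0e7, 7.9e5],
    name="market_cap_usd",
)

# Same thresholds as the queries: 50 million and 300 million USD.
bins = [0, 50_000_000, 300_000_000, np.inf]
labels = ["small", "mid", "large"]

# Each value is mapped to the label of the bin it falls into.
size_class = pd.cut(caps, bins=bins, labels=labels)

# Count per class, ordered to match the plot labels.
counts = size_class.value_counts().reindex(["large", "mid", "small"])

print(counts)
```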

Written on January 1, 2018