Business

Analyzing 1 billion rows of BitClout data

Ashish Singal
May 5, 2022

NOTE: Fidap is in early alpha. There will be bugs and issues — please send them our way and we’ll do our best to fix ASAP. Send me a LinkedIn message or join our Slack group.

My friend introduced me to BitClout a couple weeks ago. I’m no crypto or blockchain expert (though the Bitcoin rally over the past few years has been very kind to me), but it seems like a really powerful idea.

A Youtube video of this article is below —

What is BitClout?

From Wikipedia — “BitClout is an open source social media platform where users can buy and sell cryptocurrency “creator coins” based on people’s reputations and speculation of future value.” Basically, you can bet on the reputation of different people, and the value of their coins go up and down based on demand (via an automated market making model).

BitClout is backed by a bunch of really high profile investors, including Sequoia, a16z, and Social Capital. It has gotten a bunch of traction, including almost 25k Twitter followers as of this writing.

BitClout has detailed documentation here. It was founded by a person that goes by the name of diamondhands, and the central thesis is to build an open social network via blockchain. BitClout.com is actually just an example of what you can build on the BitClout platform, and their vision is that multiple front ends are built on the same blockchain.

What about BitClout data?

The interesting thing about BitClout is because it is based on blockchain, all the data is publicly accessible by scraping this chain. However, just because it is possible does not mean it is easy. Every person wanting to analyze BitClout data or build an app on top of BitClout needs to build and maintain large scale data pipelines.

There have actually been several requests by some leaders in the BitClout community (just look at his Creator Coin price :) to have this data made more easily accessible to all -

So we have grabbed the data and put it into BigQuery and made it accessible on the Fidap data platform.

Exploring the data

Let’s dive in and start exploring the data — here’s a link to the BitClout dataset on Fidap. It’s big, but not huge, especially compared to some other datasets on Fidap. But still, at 1 billion rows and 500+ GB, it’s a non trivial amount of data.

Note that we are currently updating the data about once a week — though we do have the ability to turn on live, streaming updates.

The table names are a bit difficult to understand at first glance -

Let’s dive into one pretty straightforward table, PrefixProfileUsernameToProfilePubKey_25. This contains a couple key columns — namely the PKID and the Username.

Using this data, we can write a simple query that just gets back a given user’s PKID given their Username — for example -

select * from fidap-301014.bitclout.PrefixProfileUsernameToProfilePubKey_25 where Username = ‘elonmusk’

This gives us the result -

BC1YLiFNARSWF6qtXM5acrS7q8VWPeeS2gycVBtqLALkE4c1V3kx4US:tBCKXNcBbiTfCkpGFeu4u2MdXoXmyACxmigQuZm5AiZH1Wxv9tngoj

You can run this query on platform here.

Top users by likes

Let’s now do something a bit more complex. We’ll explore how to find users with the most “clout” on BitClout — as measured by how many likes they have.

Here’s the notebook on Fidap. You can run it easily in Google Colab here.

In order to do this, we need to join four different tables together -

  1. PrefixProfileUsernameToProfilePubKey_25
  2. PrefixPKIDToPublicKey_37
  3. PrefixLikerPubKeyToLikedPostHash_30
  4. PrefixPostHashToPostEntry_17

Here’s what the SQL query looks like -

SELECT Username, NumUsernames, SUM(Likes) AS total_likes, ROUND(SUM(Likes)*100/SUM(SUM(Likes)) OVER(), 2) AS percent_of_total
FROM
 (SELECT t.Body, u.Username, u.NumUsernames, COUNT(l.PublicKey) AS Likes
  FROM fidap-301014.bitclout.PrefixPostHashToPostEntry_17 AS t
  LEFT JOIN fidap-301014.bitclout.PrefixLikerPubKeyToLikedPostHash_30 AS l
  ON t.PostHash = l.LikedPostHash
  JOIN fidap-301014.bitclout.PrefixPKIDToPublicKey_37 AS p
  ON t.PosterPublicKey = p.PublicKey
  JOIN (SELECT PKID, Username, NumUsernames FROM (
          SELECT PKID, Username, RANK() OVER(PARTITION BY PKID ORDER BY Time DESC) Rank,
                 COUNT(DISTINCT Username) OVER (PARTITION BY PKID) NumUsernames
          FROM fidap-301014.bitclout.PrefixProfileUsernameToProfilePubKey_25
       ) WHERE Rank = 1 GROUP BY 1,2,3
  ) AS u ON p.PKID = u.PKID
  GROUP BY Body, Username, NumUsernames
  ORDER BY Likes DESC
 ) AS table1
GROUP BY Username, NumUsernames
ORDER BY total_likes DESC LIMIT 100

Our results are here -

As of this writing, krassenstein, huntsauce and rajlahoti have the most number of likes on their posts.

What else should we do on BitClout data? I want to see which coins are undervalued based on followers and likes on BitClout.

We are fidap on BitClout and you can also post your ideas in our Discussion section on Fidap.

Ashish Singal
Ash is the founder / CEO of Fidap. Previously, he was at Google and Bloomberg. He loves chocolate, puppies, and clean data.

Our latest news

Find our company news, product announcements, and in depth data analysis on our blog.

Ready to get started?

Start for Free