Home Blockchain DIY Building A Simple Blockchain Data Structure With Python

Building A Simple Blockchain Data Structure With Python

Here, I am going to build a simple blockchain data structure which is the foundation of Bitcoin. This data structure only is not enough to build even a simple cryptocurrency. But we have to start somewhere.

Before building a blockchain data structure, I have to explain about hashing. Bitcoin uses SHA-256. Here how you can do it in Python:

>>> import hashlib
>>> hashlib.sha256(b”hello world”).hexdigest()

That is how you do hashing (SHA-256) in Python. But what is actually hash? What does 256 in SHA-256 means actually?

Hashing is a process which you turn anything (as long as you can represent it as a string) into a fixed 256 bit string. In the previous example, the string “hello world” has a length 11. Actually the length of “hello world” is depended on how you count it. But for simplicity, we just count how many characters. That “hello world” string is turned into a fixed-size string, which is ‘b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9’. Say I hash another string with different length.

>>> import hashlib
>>> hashlib.sha256(b”I am the best president. Ever.”).hexdigest()

The string “I am the best president. Ever.” has a different length than “hello world”. But the output has the same length, which is about 64 characters. So any input will be turned into 64 random characters string. Even a string which has a 23 kilometers length will be turned into a 64 random characters string.

This is a hexadecimal string. That’s why it has 64 characters. If you turn it into a bit string, it will have a 256 characters length.

>>> bin(0xb94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9)

That is why it is called SHA-256.

Now, there are some properties from hashing which is very important for Bitcoin. The first is called collision free. I mean if you are going to turn anything into a fixed 256 bit string, surely there will be more than 1 input that has the same output. The size of possible inputs is bigger than the size of possible output. Yes, that’s correct. But finding x and y where x is different than y and hash(x) is equal hash(y) is extremely hard (in SHA-256 case; some smart people have found collision in SHA-1).

So there is no apparent relation between the input and the output. Even you change the tiny bit of the input, the output will be totally different. This is the second property.

>>> hashlib.sha256(b”1").hexdigest()
>>> hashlib.sha256(b”2").hexdigest()
>>> hashlib.sha256(b”3").hexdigest()
>>> hashlib.sha256(b”11").hexdigest()

So the only way to to find different inputs which have the same output, you need to test all combination of characters with different length. “abc”, “maybe a loooooong string”, “17”, etc. It’s totally impractical.

But is there any possibility that when you hash something, the output is already same as hashing of “hello world”? Or just any two different strings but have the same hash? Of course, there is. The question is how minuscule the probability is. There are around 2²⁵⁶ possibilities of the output of SHA-256. How big is 2²⁵⁶? 115792089237316195423570985008687907853269984665640564039457584007913129639936. Or 1.15 e+77. Or roughly 10⁷⁷. If you think that number is big but not very big, I have a bad news for you. The total atom in observable universe (that is the universe that you can see up to 46 billions light years in any direction from your chair) is 10⁷⁸ to 10⁸². https://www.universetoday.com/36302/atoms-in-the-universe/

My computer has Nvidia Geforce 1080 Ti. It has 11.3 teraflops (tera = 10¹²). Flop is floating operation. Hashing is integer operation. So it’s apple to orange. But for simplicity, say hashing is an floating operation as well and requires 3000 operations per hash. So my graphic card can compute 3766666666 hash per second. To find a collision, described in birthday attack, we need only to compute 2¹²⁸ hashes. Say every human on this planet has my graphic card and together we compute the collision attack. It takes:

>>> 2**128 / (7000000000 * 3766666666.6666665)

That number is longer than the age of universe (around 10¹⁷ seconds). https://www.space.com/24054-how-old-is-the-universe.html

To describe the hashing algorithm, it is quite a work. But someday I’ll explain the code behind SHA-256. Now that you can comprehend the vastness of hashing, let’s move on. Blockchain is like a linked list, a data structure known by many computer science students. Let’s create a block. The first block in Bitcoin is called genesis block.

import hashlib, jsonblock_genesis = {
‘prev_hash’: None,
‘transactions’: [1, 3, 4, 2]

The transactions represents the… well, transactions. In Bitcoin, it will be like “Jason pays 2 btc to Mary Sue”, “Kylo Rein pays 10 btc to Yoda”. For simplicity, we just put normal integers.

We serialized the block so it can be hashed.

block_genesis_serialized = json.dumps(block_genesis, sort_keys=True).encode(‘utf-8’)
block_genesis_hash = hashlib.sha256(block_genesis_serialized).hexdigest()

Now we have another block.

block_2 = {
‘prev_hash’: block_genesis_hash,
‘transactions’: [3, 3, 3, 8, 7, 12]

We hash the block 2.

block_2_serialized = json.dumps(block_2, sort_keys=True).encode(‘utf-8’)
block_2_hash = hashlib.sha256(block_2_serialized).hexdigest()

We build another block.

block_3 = {
‘prev_hash’: block_2_hash,
‘transactions’: [3, 4, 4, 8, 34]

We hash the block 3. This will be the last block, I promise.

block_3_serialized = json.dumps(block_3, sort_keys=True).encode(‘utf-8’)
block_3_hash = hashlib.sha256(block_3_serialized).hexdigest()

To make sure that data has not been tampered, I only need to check the last block’s hash, instead of checking all the data from genesis block to the last block. If it is different, than someone tried to tamper the data.

import hashlib, jsonblock_genesis = {
‘prev_hash’: None,
‘transactions’: [1, 3, 4, 2]
}block_2 = {
‘prev_hash’: None,
‘transactions’: [3, 3, 3, 8, 7, 12]
}block_3 = {
‘prev_hash’: None,
‘transactions’: [3, 4, 4, 8, 34]
}def hash_blocks(blocks):
prev_hash = None
for block in blocks:
block[‘prev_hash’] = prev_hash
block_serialized = json.dumps(block, sort_keys=True).encode(‘utf-8’)
block_hash = hashlib.sha256(block_serialized).hexdigest()
prev_hash = block_hash return prev_hashprint(“Original hash”)
print(hash_blocks([block_genesis, block_2, block_3]))print(“Tampering the data”)
block_genesis[‘transactions’][0] = 3print(“After being tampered”)
print(hash_blocks([block_genesis, block_2, block_3]))

The result:

Original hash
Tampering the data
After being tampered

This is the basic of the blockchain.

Source link

Must Read

Artificial Brains Need Sleep Too

 States that resemble sleep-like cycles in simulated neural networks quell the instability that comes with uninterrupted self-learning in artificial analogs of brains.No one can...

Differenciating Bitcoin and Electronic Money

Bitcoin has the largest market share among virtual currencies, and is already being used on a daily basis overseas. Since it is a virtual...

Answering the Woes of Staking Centralization

What if better behavior on blockchains could be encouraged with fun rather than value?Josh Lee and Tony Yun of Chainapsis built a staking demo at the Cross-Chain...

The future of Machine Learning

Machine learning (ML) is the process which enables a computer to perform something that it has not been explicitly told to do. Hence, ML...

Is Automation the solution for rapid scaling in response to the Pandemic

Thanks to the pandemic, the nature of work for federal agencies changed almost overnight. Agencies are now attempting to meet the challenges of a...

Siemens and SparkCognition unveil AI-driven cybersecurity solutions

Today, Siemens and industrial AI-firm, SparkCognition, announced a new cybersecurity solution for industrial control system (ICS) endpoints.DeepArmor Industrial, fortified by Siemens, leverages artificial intelligence (AI) to...

Amazon and Microsoft follow IBM, no longer in Face Recognition business

At least its bandwagon-detection AI still worksMicrosoft said on Thursday it will not sell facial-recognition software to the police in the US until the...

Developing smart contracts with buffered data model

How specifying world state data model with protocol buffers can help in developing smart contracts

Reasons why your AI Project might fail

Here is a common story of how companies trying to adopt AI fail. They work closely with a promising technology vendor. They invest the...

Pointers for Investing in AI Startups

The COVID-19 pandemic has drastically deranged the economic activity globally, and the startup ecosystem hasn’t been spared as well. Majority of them have been...
banner image