The New Era of Web3 Data Retrieval: Blockchain Indexer Analysis and Applications

2025-07-18 16:55:21

The Evolution of Web3 Data Access: Analyzing Indexers and Related Projects

Data is the core of blockchain technology and the foundation for developing decentralized applications (dApps). Currently, most discussions focus on data availability (DA), which ensures that all network participants can access the latest transaction data for validation. However, another equally important but often overlooked aspect is data accessibility.

In the modular blockchain era, DA solutions have become indispensable. They ensure that all participants can access transaction data, enabling real-time verification and maintaining network integrity. However, a DA layer functions more like a billboard than a database: data is not stored indefinitely and is deleted over time, just as posters on a billboard are eventually replaced by new ones.

In contrast, data accessibility focuses on the ability to retrieve historical data, which is crucial for developing dApps and conducting blockchain analysis. This matters for any task that needs past data to represent state accurately or to execute correctly. Although data accessibility is discussed less often, it is as important as data availability. The two play different but complementary roles in the blockchain ecosystem, and a comprehensive data management approach must address both to support robust and efficient blockchain applications.

The Evolution of Blockchain Data Retrieval

Since its inception, blockchain has completely transformed infrastructure and driven the creation of dApps in various fields such as gaming, finance, and social networking. However, building these dApps requires access to a large amount of blockchain data, which is both difficult and expensive.

For dApp developers, one option is to host and run their own archive RPC nodes. These nodes store all historical blockchain data from genesis, allowing full access to the data. However, archive nodes are costly to maintain and their querying capabilities are limited, so developers often cannot query data in the format they need. Running cheaper nodes is an option, but these nodes have limited data retrieval capabilities, which may impair the operation of the dApp.

Another approach is to use commercial RPC node providers. These providers are responsible for the costs and management of the nodes, providing data through RPC endpoints. Public RPC endpoints, while free, have rate limits that may negatively impact the user experience of dApps. Private RPC endpoints offer better performance by reducing congestion, but even simple data retrieval requires a significant amount of back-and-forth communication. This makes them resource-intensive and inefficient for complex data queries. Additionally, private RPC endpoints are often difficult to scale and lack compatibility across different networks.
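To illustrate the round-trip cost described above, here is a minimal sketch that counts how many calls a client makes to total the value received by one address. FakeRpcNode is a hypothetical in-memory stand-in, not a real provider API, and the block and transfer data are made up. It exposes one call per block, loosely modeling a client that fetches blocks one at a time.

```python
from dataclasses import dataclass


@dataclass
class FakeRpcNode:
    """Hypothetical in-memory stand-in for an RPC endpoint (not a real API)."""
    blocks: list      # blocks[i] is the list of transfers in block i
    calls: int = 0    # number of simulated round-trips made so far

    def get_block(self, number: int) -> list:
        # Each call models one network round-trip to the node.
        self.calls += 1
        return self.blocks[number]


def total_received(node: FakeRpcNode, address: str, first: int, last: int) -> int:
    """Sum the value received by `address` by scanning blocks first..last."""
    total = 0
    for number in range(first, last + 1):
        for transfer in node.get_block(number):
            if transfer["to"] == address:
                total += transfer["value"]
    return total


# Three tiny blocks of made-up transfers.
chain = [
    [{"to": "alice", "value": 5}],
    [{"to": "bob", "value": 7}, {"to": "alice", "value": 1}],
    [],
]
node = FakeRpcNode(blocks=chain)
received = total_received(node, "alice", 0, 2)
print(received, node.calls)
```

The number of calls grows with the number of blocks scanned, even when most blocks contain nothing relevant to the question being asked.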

A Better Option: Blockchain Indexers

Blockchain indexers organize on-chain data and load it into databases for easier querying, which is why they are often referred to as the "search engine of the blockchain". They index blockchain data and expose it through structured query interfaces, commonly GraphQL APIs and in some cases SQL. By providing a unified query interface, indexers let developers retrieve the information they need quickly and accurately using a standardized query language.
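The idea of loading on-chain data into a database for easier querying can be sketched with Python's built-in sqlite3 module. This is an illustration only: the event records, table layout, and pool names are made up, and production indexers use their own schemas and storage engines.

```python
import sqlite3

# Made-up decoded events, as an indexer might see them while walking blocks.
raw_events = [
    {"block": 100, "contract": "pool_a", "event": "Swap", "amount": 250},
    {"block": 100, "contract": "pool_b", "event": "Swap", "amount": 40},
    {"block": 101, "contract": "pool_a", "event": "Swap", "amount": 100},
    {"block": 102, "contract": "pool_a", "event": "Mint", "amount": 900},
]


def build_index(events) -> sqlite3.Connection:
    """Load events into an in-memory table indexed on (contract, event)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE events (block INTEGER, contract TEXT, event TEXT, amount INTEGER)"
    )
    db.executemany(
        "INSERT INTO events VALUES (:block, :contract, :event, :amount)", events
    )
    db.execute("CREATE INDEX idx_contract_event ON events (contract, event)")
    db.commit()
    return db


db = build_index(raw_events)

# One query answers what would otherwise need a scan over every block.
swap_count, swap_volume = db.execute(
    "SELECT COUNT(*), SUM(amount) FROM events WHERE contract = ? AND event = ?",
    ("pool_a", "Swap"),
).fetchone()
print(swap_count, swap_volume)
```

The indexing work is paid once at write time, so each later query is a lookup rather than a walk over the chain.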

Different types of indexers optimize data retrieval in various ways:

Full Node Indexer: Extracts data directly from full blockchain nodes, ensuring data completeness and accuracy, but requires significant storage and processing power.
Lightweight Indexer: Relies on full nodes to obtain specific data on demand, reducing storage requirements but potentially increasing query time.
Dedicated Indexer: Optimized for specific types of data or specific blockchains, such as NFT data or DeFi transactions.
Aggregated Indexer: Extracts data from multiple blockchains and sources, including off-chain information, providing a unified query interface, particularly useful for multi-chain dApps.

Ethereum alone requires roughly 3 TB of storage, and that figure keeps growing with the chain. An indexing protocol deploys multiple indexers that can index large amounts of data efficiently and answer queries quickly, something raw RPC access cannot do.

Indexers also allow for complex queries, easy data filtering, and extraction of data for later analysis. Some indexers can also aggregate data from multiple sources, avoiding the need to integrate multiple APIs in multi-chain dApps. Because they are distributed across multiple nodes, indexers can offer better security and performance, whereas RPC providers may experience disruptions and downtime due to their centralized nature.

Overall, compared to RPC node providers, indexers improve the efficiency and reliability of data retrieval while reducing the cost of deploying a single node. This makes blockchain indexing protocols the preferred choice for dApp developers.

Indexer Application Scenarios

Building dApps requires retrieving and reading blockchain data to operate services. This includes various types of dApps, such as DeFi, NFT platforms, games, and even social networks, as these platforms need to read data first to execute other transactions.

DeFi

DeFi protocols need a range of data to quote prices, rates, and fees to users. Automated Market Makers (AMMs) need price and liquidity information from liquidity pools to calculate swap rates, while lending protocols need utilization rates to determine lending rates, and debt ratios to decide when a position can be liquidated. This data must reach the dApp before it can calculate the rate at which a user's transaction will execute.
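As a rough sketch of how the indexed values above feed into a quote, the functions below use a constant-product formula for the swap and a linear utilization model for the borrow rate. Both models, the 0.3% fee, the rate parameters, and the 0.8 liquidation threshold are assumptions chosen for illustration; the text does not specify which formulas a given protocol uses.

```python
def swap_output(reserve_in: float, reserve_out: float, amount_in: float,
                fee: float = 0.003) -> float:
    """Constant-product (x * y = k) quote; the default fee is an assumption."""
    amount_in_after_fee = amount_in * (1 - fee)
    return reserve_out * amount_in_after_fee / (reserve_in + amount_in_after_fee)


def utilization(total_borrowed: float, total_supplied: float) -> float:
    """Share of supplied funds that is currently lent out."""
    return 0.0 if total_supplied == 0 else total_borrowed / total_supplied


def borrow_rate(util: float, base: float = 0.02, slope: float = 0.20) -> float:
    """Hypothetical linear rate model: the rate rises with utilization."""
    return base + slope * util


def is_liquidatable(debt_value: float, collateral_value: float,
                    max_debt_ratio: float = 0.8) -> bool:
    """True when debt / collateral exceeds the assumed liquidation threshold."""
    if collateral_value == 0:
        return debt_value > 0
    return debt_value / collateral_value > max_debt_ratio


# Inputs like these would come from the indexer (pool reserves, market totals).
quote = swap_output(reserve_in=1_000.0, reserve_out=2_000.0, amount_in=100.0)
rate = borrow_rate(utilization(total_borrowed=800.0, total_supplied=1_000.0))
print(round(quote, 3), round(rate, 4), is_liquidatable(90.0, 100.0))
```

Every input to these functions is a value read from chain state, which is why stale or slow data retrieval translates directly into wrong quotes.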

Gaming

GameFi requires fast indexing and data access to give users a smooth gaming experience. Only with quick data retrieval and execution can Web3 games compete with Web2 games on performance and thereby attract more users. These games need data such as land ownership, in-game token balances, and in-game actions. Using indexers helps them maintain a stable data flow and high uptime, which supports a smooth gaming experience.

NFT

NFT marketplaces and lending platforms need indexed data to access information such as NFT metadata, ownership and transfer history, and royalty details. Indexing this data avoids having to look up each NFT individually to find its owner or attributes.
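A minimal sketch of the ownership case: replaying transfer events once to build a lookup from token to owner and from owner to tokens. The event shape and the names used here are hypothetical, not the format of any particular chain or indexer.

```python
from collections import defaultdict

# Made-up transfer events in chain order; "from" is None for a mint.
transfers = [
    {"token_id": 1, "from": None, "to": "alice"},
    {"token_id": 2, "from": None, "to": "bob"},
    {"token_id": 1, "from": "alice", "to": "carol"},
]


def build_ownership_index(events):
    """Replay transfers in order and return (owner_of, tokens_by_owner)."""
    owner_of = {}
    tokens_by_owner = defaultdict(set)
    for event in events:
        token_id = event["token_id"]
        previous_owner = owner_of.get(token_id)
        if previous_owner is not None:
            # The token leaves the previous owner's set.
            tokens_by_owner[previous_owner].discard(token_id)
        owner_of[token_id] = event["to"]
        tokens_by_owner[event["to"]].add(token_id)
    return owner_of, tokens_by_owner


owner_of, tokens_by_owner = build_ownership_index(transfers)
print(owner_of[1], sorted(tokens_by_owner["bob"]))
```

Once the index exists, both "who owns token 1?" and "which tokens does bob hold?" are single dictionary lookups.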

Whether it's a DeFi AMM that requires price and liquidity information or a SocialFi application that needs to update new user posts, quickly retrieving data is crucial for the normal operation of dApps. With the help of indexers, they can efficiently and accurately retrieve data, providing a smooth user experience.

Analysis

The indexer provides a way to extract specific data from the raw blockchain data, including smart contract events in each block. This creates opportunities for more specific data analysis, thereby providing comprehensive insights.

For example, perpetual trading protocols can identify which tokens have high trading volumes and generate significant fees, thereby deciding whether to list them as perpetual contracts on the platform. DEX developers can create dashboards for their products to gain insights into which liquidity pools offer the highest returns or the strongest liquidity. They can also create public dashboards that allow developers to freely and flexibly query any type of data and display it on charts.
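The token-ranking example above amounts to a group-and-sort over indexed trade events. The sketch below uses made-up trade records and hypothetical field names to show the shape of that aggregation.

```python
from collections import defaultdict

# Made-up trade events extracted by an indexer.
trades = [
    {"token": "AAA", "volume": 1000, "fee": 3},
    {"token": "BBB", "volume": 400, "fee": 1},
    {"token": "AAA", "volume": 500, "fee": 2},
    {"token": "CCC", "volume": 50, "fee": 1},
]


def rank_tokens_by_volume(events):
    """Return (token, total_volume, total_fees) tuples, highest volume first."""
    volume = defaultdict(int)
    fees = defaultdict(int)
    for event in events:
        volume[event["token"]] += event["volume"]
        fees[event["token"]] += event["fee"]
    return sorted(
        ((token, volume[token], fees[token]) for token in volume),
        key=lambda row: row[1],
        reverse=True,
    )


ranking = rank_tokens_by_volume(trades)
for token, total_volume, total_fees in ranking:
    print(token, total_volume, total_fees)
```

A dashboard would run the same kind of aggregation over a time window and chart the result.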

With multiple blockchain indexers available, identifying the differences between indexing protocols is crucial to ensure that developers choose the indexer that best fits their needs.

Overview of Blockchain Indexers

The Graph

The Graph is one of the earliest indexing protocols launched on Ethereum, allowing easy access to previously hard-to-reach transaction data. It uses subgraph definitions and filters to collect subsets of data from the blockchain, such as all transactions related to a particular DEX USDC/ETH pool.

Under its proof-of-indexing mechanism, indexers stake the native token GRT to provide indexing and query services, and delegators can delegate their tokens to those indexers. Curators signal on high-quality subgraphs, helping indexers decide which subgraphs to index in order to earn the most query fees. As part of its transition to greater decentralization, The Graph will eventually shut down its hosted service and require subgraphs to upgrade to its network, with an upgrade indexer provided to ease the move.

Its infrastructure brings the average cost down to about $40 per million queries, which is much lower than the cost of self-hosted nodes. Through file data sources, it also supports parallel indexing of on-chain and off-chain data for efficient data retrieval.
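For a sense of scale, the quoted figure can be turned into a monthly estimate. The $40 per million queries comes from the text above; the daily query volume and the 30-day month are illustrative assumptions, and actual pricing may differ.

```python
COST_PER_MILLION_QUERIES_USD = 40.0  # average figure quoted in the text above


def monthly_query_cost(queries_per_day: int, days: int = 30) -> float:
    """Estimated cost in USD for `days` days at a steady daily query volume."""
    total_queries = queries_per_day * days
    return total_queries / 1_000_000 * COST_PER_MILLION_QUERIES_USD


# Hypothetical dApp serving 500,000 queries per day.
estimate = monthly_query_cost(500_000)
print(estimate)
```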

The Graph's indexer rewards have increased steadily over the past few quarters. This is partly due to higher query volume and partly to a rise in the GRT price, which coincided with The Graph's plans to integrate AI-assisted queries in the future.

Subsquid

Subsquid is a peer-to-peer, horizontally scalable decentralized data lake that efficiently aggregates a large amount of on-chain and off-chain data and protects it through zero-knowledge proofs. As a decentralized worker network, each node is responsible for storing data from a specific subset of blocks, accelerating the data retrieval process by quickly identifying the nodes that hold the required data.
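The lookup idea described above, where each node holds a subset of blocks and a query is routed to the nodes that hold the requested range, can be sketched with a sorted list of range boundaries. This is not Subsquid's actual routing logic; the worker names and range boundaries are hypothetical.

```python
from bisect import bisect_right

# Hypothetical assignment: each worker stores blocks from its start (inclusive)
# up to the next worker's start (exclusive); the last worker is open-ended.
RANGE_STARTS = [0, 1_000, 2_000, 3_000]
WORKERS = ["worker-a", "worker-b", "worker-c", "worker-d"]


def worker_for_block(block: int) -> str:
    """Return the worker whose range contains `block`."""
    if block < RANGE_STARTS[0]:
        raise ValueError("block is before the first indexed range")
    return WORKERS[bisect_right(RANGE_STARTS, block) - 1]


def workers_for_range(first: int, last: int) -> list:
    """Return every worker needed to serve blocks first..last (inclusive)."""
    if first > last:
        raise ValueError("first must not exceed last")
    if first < RANGE_STARTS[0]:
        raise ValueError("range starts before the first indexed range")
    low = bisect_right(RANGE_STARTS, first) - 1
    high = bisect_right(RANGE_STARTS, last) - 1
    return WORKERS[low:high + 1]


print(worker_for_block(1_500), workers_for_range(900, 2_100))
```

Because the boundaries are sorted, finding the responsible workers is a binary search rather than a broadcast to every node.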

Subsquid also supports real-time indexing, allowing indexing to occur before a block is finalized. It lets developers choose where and how their data is stored, for example in BigQuery or as Parquet or CSV files, which makes the data easier to analyze. Additionally, subgraphs can be deployed on the Subsquid network without migrating to the Squid SDK, enabling no-code deployment.

During its testnet phase, Subsquid achieved impressive statistics: over 80,000 testnet users, over 60,000 deployed Squid indexers, and more than 20,000 verified developers on the network. Recently, Subsquid launched the mainnet for its data lake.

In addition to indexing, the Subsquid Network data lake can also replace RPC in use cases such as analytics, ZK/TEE co-processors, AI agents, and Oracles.

SubQuery

SubQuery is a decentralized middleware infrastructure network that provides RPC and indexed data services. It originally supported the Polkadot and Substrate networks and has since expanded to over 200 chains. Its mechanism is similar to The Graph's proof of indexing: indexers index data and serve query requests, while delegators stake their tokens to indexers. However, it adds consumers, who submit purchase orders that secure the indexers' income, rather than relying on curators.

SubQuery plans to introduce data nodes that support sharding, so that nodes do not each have to continuously synchronize all new data, thereby optimizing query efficiency and moving towards greater decentralization. Users can choose to pay a compute fee of about 1 SQT token per 1,000 requests, or set custom fees for indexers through the protocol.

SubQuery launched its token only earlier this year, yet the issuance rewards for nodes and delegators have increased in USD value month-on-month, reflecting a growing number of query services offered on its platform. Since the TGE, the total amount of staked SQT has increased from 6 million to 125 million, highlighting the growth in network participation.

Covalent

Covalent is a decentralized indexing network in which nodes known as Block Specimen Producers (BSPs) create copies of blockchain data through batch exports and publish proofs on the Covalent L1 blockchain. This data is then refined by Block Results Producer (BRP) nodes, which apply established rules to filter out the data that meets the requirements.

Through a unified API, developers can easily extract relevant blockchain data in a consistent request and response format without the need to write custom complex queries to access the data. CQT tokens, settled on a certain platform, can be used as a means of payment to extract these pre-configured datasets from network operators.

The rewards from Covalent seem to show an overall upward trend from the first quarter of 2023 to the first quarter of 2024, partly due to the increase in the price of the Covalent token CQT.

Considerations for Choosing an Indexer

Customizability of Data

Some indexers (such as Covalent) are general-purpose indexers that provide standard pre-configured datasets via API. While they may be fast, they do not offer the flexibility that developers need for custom datasets. An indexer framework, by contrast, allows more customized data processing to meet application-specific requirements.

Security

The indexed data must be secure; otherwise, the dApps built on these indexers are also vulnerable to attack. For example, if transactions and wallet balances can be manipulated, the dApp may lose liquidity, which in turn affects its users. All of the indexers discussed here adopt some form of security by requiring indexers to stake tokens.
