Outline of Project Milestones

This project is in a very early stage at the time of writing. Below is a brief outline of the necessary steps, not necessarily in a fixed order:

  • Develop a test framework with basic features necessary to facilitate the desired API calls
    • Focus on Twitter API, but with intention of eventually expanding into other domains
    • Object-oriented approach (see the class sketch below)
      • "API Client" class instance should have a single function per API endpoint
        • Differences in usage (e.g. which input parameters are provided) should be handled through optional arguments
      • "User" class as an abstraction of web accounts
      • "Data Manager" class for redundancy checking, etc.
        • Owns the "User" instances
        • This is effectively the main server class instance
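
A minimal sketch of how these three classes might relate (all names, fields, and method signatures here are illustrative assumptions, not a finalized design):

```python
class APIClient:
    """One method per Twitter API endpoint; differences in usage
    are handled through optional arguments."""

    def __init__(self, bearer_token):
        self.bearer_token = bearer_token

    def get_followers(self, user_id, cursor=None, count=5000):
        # Placeholder: would call GET followers/ids with these options.
        raise NotImplementedError


class User:
    """Abstraction of a single web account (initially a Twitter account)."""

    def __init__(self, user_id, handle=None):
        self.user_id = user_id
        self.handle = handle
        self.followers = set()


class DataManager:
    """Owns all User instances and performs redundancy checking;
    effectively the main server-side class."""

    def __init__(self):
        self.users = {}  # user_id -> User

    def add_user(self, user_id, handle=None):
        # Redundancy check: only create Users we have not seen before.
        if user_id not in self.users:
            self.users[user_id] = User(user_id, handle)
        return self.users[user_id]
```
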
  • Perform preliminary review of data collection scope and limitations
    • Study API rate limits in conjunction with the specific requests planned (see the feasibility sketch below)
    • Estimate size of network to explore
      • Currently, Binance has the most followers (6.7M) of any known crypto account
      • One possible approach is to begin with the followers of a single account
      • Alternatively, the search could begin with the combined followers of many large accounts (Binance, Coinbase, CoinMarketCap, Coingecko, VitalikButerin, CZ, Saylor, APompliano, Bitcoin, Ethereum)
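
As a rough feasibility check: under Twitter's standard v1.1 rate limits for GET followers/ids (up to 5,000 IDs per request, 15 requests per 15-minute window), a single API user needs roughly a day to enumerate the followers of an account of Binance's size:

```python
FOLLOWERS = 6_700_000
IDS_PER_REQUEST = 5_000    # max IDs per GET followers/ids call
REQUESTS_PER_WINDOW = 15   # standard rate limit per 15-minute window
WINDOW_MINUTES = 15

requests_needed = -(-FOLLOWERS // IDS_PER_REQUEST)           # ceiling division
windows_needed = -(-requests_needed // REQUESTS_PER_WINDOW)
hours = windows_needed * WINDOW_MINUTES / 60
print(f"{requests_needed} requests over ~{hours:.1f} hours")
# -> 1340 requests over ~22.5 hours
```
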
  • Decide upon factors that influence the nature of the data collection itself
    • These quantities can be modified and expanded over time
    • The initial focus should be limited to those decisions necessary for a minimum viable prototype
    • Examples of (potentially) important decisions include (a sample configuration object is sketched after this list):
      • Search order and heuristics
      • Metrics for decisions about whether to include a given entity within the "Crypto User" network
      • Priority direction (e.g. followers vs. following)
      • Minimum amount of data to collect in every case
      • Scenarios in which to collect additional data beyond minimum (if ever)
      • Scenarios in which to collect repeat data over time, instead of ignoring repeated queries (if ever)
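
One way to keep these decisions explicit and easy to revise over time is a single configuration object. The fields below are hypothetical examples mirroring the list above, not settled choices:

```python
from dataclasses import dataclass

@dataclass
class CollectionPolicy:
    """Tunable knobs for the crawl; every value below is a placeholder."""
    search_order: str = "breadth-first"    # search order / heuristic
    min_known_overlap: float = 0.05        # inclusion metric: fraction of followed
                                           # accounts already in the "Crypto User" network
    priority_direction: str = "followers"  # "followers" vs. "following"
    min_fields: tuple = ("id", "handle", "follower_count")  # minimum data per account
    extra_data_when: str = "never"         # scenarios for collecting beyond the minimum
    recheck_interval_days: int = 0         # 0 = never repeat a completed query
```
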
  • Finalize details of client-server approach
    • Division of tasks and data
    • Representation of data object instances as files, database tables, etc.
    • Pseudocode for each individual system and the communications between them (an example task message follows below)
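
As a starting point for that pseudocode, the messages exchanged between server and client might look like the following (all field names and values are assumptions for illustration):

```python
import json

# Hypothetical task message, server -> client.
task = {
    "task_id": "t-000001",          # unique ID, also useful for parity checks later
    "endpoint": "followers/ids",    # which APIClient method to invoke
    "params": {"user_id": 123456, "cursor": -1},
}

# Hypothetical result message, client -> server.
result = {
    "task_id": task["task_id"],
    "user_id": 123456,              # account whose followers were fetched
    "data": [111, 222, 333],        # follower IDs returned by the API call
    "next_cursor": 0,               # 0 = pagination finished
}
print(json.dumps(task), json.dumps(result), sep="\n")
```
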
  • Develop client application for (standard rate limited) single-API-user requests, managed through a queue
    • The queue will be self-managed to begin with; eventually all queue additions will be assigned by the server (see the client-loop sketch below)
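
A minimal sketch of the self-managed client loop, assuming a fixed requests-per-window rate limit and a hypothetical handle_task dispatcher:

```python
import time
from collections import deque

def run_client(api_client, tasks, handle_task,
               window_seconds=15 * 60, requests_per_window=15):
    """Drain a self-managed task queue while respecting a fixed rate-limit window.

    handle_task is a hypothetical dispatcher mapping a task onto the matching
    APIClient method; eventually `tasks` would be assigned by the server.
    """
    queue = deque(tasks)
    while queue:
        for _ in range(min(requests_per_window, len(queue))):
            handle_task(api_client, queue.popleft())
        if queue:
            time.sleep(window_seconds)  # wait out the rate-limit window
```
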
  • Develop a separate local service (eventually a server application) for:
    • Processing data provided by a single client (results of API requests) into a single persistent "database" spanning disparate client sessions (see the persistence sketch below)
    • Decisions about future searches and expansion of the network, based on client-provided data and the previously defined heuristics
    • Assignment of additional requests to the (single) client queue
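
A sketch of the persistence step, assuming results carry the fields from the example message above and using SQLite's INSERT OR IGNORE as a simple redundancy check:

```python
import sqlite3

def merge_result(db_path, result):
    """Insert follower edges from one client result, skipping rows already held."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS edges ("
        "follower_id INTEGER, followed_id INTEGER, "
        "PRIMARY KEY (follower_id, followed_id))"
    )
    con.executemany(
        "INSERT OR IGNORE INTO edges VALUES (?, ?)",
        [(fid, result["user_id"]) for fid in result["data"]],
    )
    con.commit()
    con.close()
```
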
  • Expand local service to function on an external server, separate from the client
  • Expand server application to manage multiple disparate clients running simultaneously
  • Develop additional software related to "post-collection" distribution of data
    • It's possible that an additional "data server" may be developed, instead of using the "control server" for this purpose
      • Clients would receive their instructions from the "control server" but would send their resulting data to the "data server" (see the sketch below)
    • The fundamental question is how to grant each "client user" access to the pool of data
    • Data distribution is likely to contribute heavily to the operating costs of this project
      • Some sort of peer-to-peer approach, possibly even a private torrent network, might be the most realistic solution to this
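
From a client's perspective, the control/data split would look roughly like this (URLs and routes are placeholders, not a defined protocol):

```python
import json
import urllib.request

CONTROL_SERVER = "https://control.example.org"  # assigns work
DATA_SERVER = "https://data.example.org"        # receives results

def fetch_task():
    """Ask the control server for the next assigned task."""
    with urllib.request.urlopen(f"{CONTROL_SERVER}/next-task") as resp:
        return json.load(resp)

def submit_result(result):
    """Send a completed task's data to the data server."""
    req = urllib.request.Request(
        f"{DATA_SERVER}/results",
        data=json.dumps(result).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```
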
  • Revisit the metrics and heuristics defined earlier, going beyond "minimum viable prototype" into more advanced coordination of multiple clients
  • Decide upon factors relating not to the data itself, but to the operation of this network as an organization
    • Error prevention
      • Through parity checking of purposefully redundant queries (see the parity sketch below)
        • This may require significant thought to balance time-efficiency with security
        • We can probably get away with assumptions of altruism, but this becomes harder as the organization scales
      • Through other means?
    • Incentivization of user contributions
      • Client runtime
      • Software development (not a requirement for organization membership)
    • Ensuring access to data by contributor users (those who reach some minimum client runtime threshold)
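
Parity checking could start as simply as hashing each client's copy of a result and comparing, sketched below under the assumption that results are JSON-serializable. Note that exact-match hashing also assumes the underlying data did not change between the redundant queries, which is part of the time-efficiency trade-off noted above:

```python
import hashlib
import json

def result_fingerprint(result):
    """Canonical hash of a task result, for comparing redundant copies."""
    canonical = json.dumps(result, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def parity_check(copies):
    """True if all independently collected copies of a task's result agree."""
    return len({result_fingerprint(c) for c in copies}) == 1
```
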
  • Additional routes of potential optimization not mentioned above:
    • Integration of automatic LZMA(2?) compression wherever beneficial, especially with respect to the "data distribution" step (see the sketch below)
    • Expansion of data collection timeline: not only single snapshots, but change-data over time
      • As the number of simultaneously-live clients grows larger, certain snapshot-based data collection tasks become trivial
      • Should consider the priority of change-data in various contexts, in order to balance available resources
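
Python's standard lzma module already provides this (its default .xz container uses the LZMA2 filter), so a first integration could be as small as:

```python
import json
import lzma

def compress_payload(obj):
    """Serialize and LZMA-compress a payload before distribution."""
    return lzma.compress(json.dumps(obj).encode())

def decompress_payload(blob):
    return json.loads(lzma.decompress(blob))
```
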
  • Integrate blockchain data (Ethereum and possibly others) directly into search decisions made by the server
    • Example: ENS domain text records (com.twitter, com.github, etc.; see the sketch below)
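
A sketch using web3.py, which exposes an ENS helper with a get_text method in recent versions (the RPC endpoint is a placeholder, and the name is chosen only for illustration):

```python
from web3 import Web3

# Placeholder RPC endpoint; any Ethereum mainnet provider works.
w3 = Web3(Web3.HTTPProvider("https://rpc.example.org"))

# Read the com.twitter text record of an ENS name.
print(w3.ens.get_text("vitalik.eth", "com.twitter"))
```
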
  • Expand client and server software to include additional APIs
    • Examples: Google Search, GitHub, Reddit, StackExchange

The above outline is an extreme simplification, but it highlights a variety of important milestones and a general direction for development.
