Outline of Project Milestones

mathMakesArt edited this page Dec 23, 2021 · 1 revision
This project is in a very early stage at the time of writing. Below is a brief outline of the necessary steps for this project, not necessarily in a fixed order:
- Develop a test framework with the basic features necessary to facilitate the desired API calls
  - Focus on the Twitter API, with the intention of eventually expanding into other domains
  - Object-oriented approach:
    - "API Client" class: a single function per API endpoint; differences in use (e.g. input parameters provided) should be handled through optional arguments
    - "User" class: an abstraction of web accounts
    - "Data Manager" class: redundancy checking, etc.
      - Owns the "User" instances
      - This is effectively the main server class instance
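The class structure above might be sketched as follows; all names, fields, and method signatures are illustrative assumptions rather than a settled design:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class User:
    """Abstraction of a web account (here, a Twitter account)."""
    user_id: str
    username: str
    followers_collected: bool = False

class APIClient:
    """One method per API endpoint; optional arguments cover
    variations in how each endpoint is called."""
    def get_followers(self, user_id: str, max_results: int = 1000,
                      pagination_token: Optional[str] = None) -> dict:
        # Placeholder: a real implementation would call the Twitter API here.
        return {"user_id": user_id, "max_results": max_results,
                "pagination_token": pagination_token}

class DataManager:
    """Owns the User instances and performs redundancy checking;
    effectively the main server-side object."""
    def __init__(self) -> None:
        self.users: dict[str, User] = {}

    def add_user(self, user: User) -> bool:
        # Redundancy check: skip users that were already recorded.
        if user.user_id in self.users:
            return False
        self.users[user.user_id] = user
        return True
```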
- Perform a preliminary review of data collection scope and limitations
  - Study API rate limits in conjunction with the specific requests planned
  - Estimate the size of the network to explore
    - At the time of writing, Binance has the most followers of any known account (6.7M)
  - One possible approach is to begin with the followers of a single account
  - Alternatively, begin with the combined set of followers of many large accounts (Binance, Coinbase, CoinMarketCap, Coingecko, VitalikButerin, CZ, Saylor, APompliano, Bitcoin, Ethereum)
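As a back-of-envelope illustration of studying rate limits against the planned requests: assuming (to be verified against current Twitter API documentation) a followers endpoint that allows 15 requests per 15-minute window with up to 1,000 followers per request, paging through Binance's ~6.7M followers from a single client would take several days:

```python
# Assumed figures -- verify against the current Twitter API docs:
REQUESTS_PER_WINDOW = 15      # requests allowed per rate-limit window
WINDOW_MINUTES = 15           # length of the rate-limit window
FOLLOWERS_PER_REQUEST = 1_000 # max followers returned per request

def minutes_to_collect(follower_count: int) -> int:
    """Minutes for one client to page through `follower_count` followers."""
    requests_needed = -(-follower_count // FOLLOWERS_PER_REQUEST)  # ceiling division
    windows_needed = -(-requests_needed // REQUESTS_PER_WINDOW)
    return windows_needed * WINDOW_MINUTES

# Binance's ~6.7M followers: 6,700 requests -> 447 windows -> ~4.7 days
print(minutes_to_collect(6_700_000) / 60 / 24)
```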
- Decide upon factors that influence the nature of the data collection itself
  - These quantities can be modified and expanded over time
  - The initial focus should be on only those decisions necessary for a minimum viable prototype
  - Examples of (potentially) important decisions include:
    - Search order and heuristics
    - Metrics for deciding whether to include a given entity within the "Crypto User" network
    - Priority direction (e.g. followers vs. following)
    - Minimum amount of data to collect in every case
    - Scenarios in which to collect additional data beyond the minimum (if ever)
    - Scenarios in which to collect repeat data over time, instead of ignoring repeated queries (if ever)
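One way to keep these tunable decisions in a single place is a small policy object; the specific fields and default values below are hypothetical examples, not decided parameters:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CollectionPolicy:
    """Tunable collection decisions, kept in one place so they can be
    modified and expanded over time (all values are examples)."""
    priority_direction: str = "followers"        # "followers" or "following"
    min_followers_to_include: int = 100          # inclusion metric for the network
    collect_extra_data: bool = False             # beyond the per-account minimum
    recollect_after_days: Optional[int] = None   # None = never repeat a query

policy = CollectionPolicy()
```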
- Finalize details of the client-server approach
  - Division of tasks and data
  - Representation of data object instances as files, database tables, etc.
  - Pseudocode for each individual system and the communications between them
- Develop a client application for (standard rate-limited) single-API-user requests, managed through a queue
  - The queue will be self-managed to begin with, but eventually all queue additions will be assigned by the server
- Develop a separate local service (eventually a server application) for:
  - Processing single-client-provided data (results of API requests) into a single persistent "database" across disparate client sessions
  - Making decisions about future searches and expansion of the network, based on client-provided data and previously-defined heuristics
  - Assigning additional requests to the (single) client queue
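A minimal sketch of the self-managed client queue, assuming hypothetical task dictionaries (an endpoint name plus its parameters); the server will eventually be the sole producer of these tasks:

```python
import queue
import time

# Illustrative task queue: each entry names an assumed endpoint and its
# parameters. For now the client fills its own queue.
task_queue: "queue.Queue[dict]" = queue.Queue()
task_queue.put({"endpoint": "get_followers", "user_id": "12345"})

def run_client(q: "queue.Queue[dict]", delay_seconds: float = 1.0) -> list:
    """Drain the queue, spacing requests to respect rate limits."""
    results = []
    while not q.empty():
        task = q.get()
        # A real client would perform the API call here and report the
        # result back to the local service / server.
        results.append({"task": task, "status": "done"})
        time.sleep(delay_seconds)
    return results
```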
- Expand local service to function on an external server, separate from the client
- Expand server application to manage multiple disparate clients running simultaneously
- Develop additional software related to "post-collection" distribution of data
  - It is possible that an additional "data server" may be developed, instead of using the "control server" for this purpose
    - Clients would receive their instructions from the "control server", but would send their resulting data to the "data server"
  - The fundamental question is how to grant each "client user" access to the pool of data
    - Data distribution is likely to contribute heavily to the operating costs of this project
    - Some sort of peer-to-peer approach, possibly even a private torrent network, might be the most realistic solution
- Revisit metrics and heuristics from step 1, going beyond the "minimum viable prototype" into more advanced coordination of multiple clients
- Decide upon factors relating not to the data itself, but to the operation of this network as an organization
  - Error prevention
    - Through parity checking of purposefully-redundant queries
      - This may require significant thought to balance time-efficiency with security
      - We can probably get away with assumptions of altruism, but this becomes harder as the organization scales
    - Through other means?
  - Incentivization of user contributions
    - Client runtime
    - Software development (not a requirement for organization membership)
  - Ensuring access to data by contributor users (those who reach some minimum client runtime threshold)
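Parity checking of purposefully-redundant queries could be as simple as majority voting over the responses that several clients return for the same query; this is only a sketch of the idea, not a settled mechanism:

```python
from collections import Counter
from typing import Optional

def parity_check(responses: list, min_agreement: int = 2) -> Optional[str]:
    """Return the majority response among redundant query results,
    or None if no response reaches the agreement threshold."""
    if not responses:
        return None
    value, count = Counter(responses).most_common(1)[0]
    return value if count >= min_agreement else None

# Two honest clients outvote one faulty (or dishonest) one:
parity_check(["follower_set_A", "follower_set_A", "follower_set_B"])  # -> "follower_set_A"
```

Assigning each query to more clients raises confidence but spends rate-limited requests, which is the time-efficiency vs. security balance noted above.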
- Additional routes of potential optimization not mentioned above:
  - Integration of automatic LZMA(2?) compression, wherever important but especially w/r/t the "data distribution" step
  - Expansion of the data collection timeline: not only single snapshots, but change-data over time
    - As the number of simultaneously-live clients grows larger, certain snapshot-based data collection tasks become trivial
    - Should consider the priority of change-data in various contexts, in order to balance available resources
  - Integrate blockchain data (Ethereum and possibly others) directly into search decisions made by the server
    - Example: ENS Domains text records (com.twitter, com.github, etc.)
  - Expand client and server software to include additional APIs
    - Examples: Google Search, GitHub, Reddit, StackExchange
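The automatic LZMA compression mentioned above is available in the Python standard library; a minimal round-trip sketch with made-up records, showing why repetitive collected JSON is a good fit for it:

```python
import json
import lzma

# Made-up sample records standing in for collected account data.
records = [{"user_id": str(i), "username": f"user_{i}"} for i in range(1000)]

raw = json.dumps(records).encode("utf-8")
compressed = lzma.compress(raw, preset=6)

# Repetitive JSON keys compress well, which matters for distribution costs.
print(len(raw), len(compressed))
assert json.loads(lzma.decompress(compressed)) == records
```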
The above outline is an extreme simplification, but it does highlight a variety of important milestones and a general direction for development.