Stars
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
The Java gRPC implementation. HTTP/2 based RPC
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance …
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
ByConity is an open source cloud data warehouse
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Official electron build of draw.io
A composable and fully extensible C++ execution engine library for data management systems.
光 HikariCP・A solid, high-performance, JDBC connection pool at last.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Spark: The Definitive Guide's Code Repository
Notes on the design and implementation of Apache Spark
120+ interactive Python coding interview challenges (algorithms and data structures). Includes Anki flashcards.
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
A complete computer science study plan to become a software engineer.
wxl24life / fast-data-dev
Forked from lensesio/fast-data-dev
Kafka Docker for development. Kafka, Zookeeper, Schema Registry, Kafka-Connect, Landoop Tools, 20+ connectors
Apache Spark - A unified analytics engine for large-scale data processing
Scalable datastore for metrics, events, and real-time analytics
A curated list of awesome big data frameworks, resources and other awesomeness.
Benchmark comparing serialization libraries on the JVM
Web tool for Avro Schema Registry
Snippets and small examples demonstrating Kafka features and configs
Iglu is a machine-readable, open-source schema repository for JSON Schema from the team at Snowplow