Update quarterly roadmap for Q2 (#2133)

matthewmturner · web-flow · commit f99c2719a11d · 2022-04-04T14:25:50.000-04:00
* Update roadmap

* IO options comment

* Add streams

* Update with feedback
diff --git a/docs/source/specification/quarterly_roadmap.md b/docs/source/specification/quarterly_roadmap.md
@@ -21,52 +21,70 @@
 
 A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the project's contributors. This roadmap is not binding.
 
-## 2022 Q1
+## 2022 Q2
 
 ### DataFusion Core
 
-- Publish official Arrow2 branch
-- Implementation of memory manager (i.e. to enable spilling to disk as needed)
+- IO Improvements
+  - Reading, registering, and writing more file formats from both the DataFrame API and SQL
+  - Additional options for IO including partitioning and metadata support
+- Work Scheduling
+  - Improve predictability, observability and performance of IO and CPU-bound work
+  - Develop a more explicit story for managing parallelism during plan execution
+- Memory Management
+  - Add more operators for memory-limited execution
+- Performance
+  - Incorporate row-format into operators such as aggregate
+  - Add row-format benchmarks
+  - Explore JIT-compiling complex expressions
+  - Explore LLVM for JIT, with inline Rust functions as the primary goal
+  - Improve performance of Sort and Merge using Row Format / JIT expressions
+- Documentation
+  - General improvements to DataFusion website
+  - Publish design documents
+- Streaming
+  - Create `StreamProvider` trait
 
-### Benchmarking
+### Ballista
 
-- Inclusion in Db-Benchmark with all quries covered
-- All TPCH queries covered
+- Make production ready
+  - Shuffle file cleanup
+  - Fill functional gaps between DataFusion and Ballista
+  - Improve task scheduling and data exchange efficiency
+  - Better error handling
+    - Task failure
+    - Executor lost
+    - Scheduler restart
+  - Improve monitoring and logging
+  - Auto scaling support
+- Support for multi-scheduler deployments. Initially for resiliency and fault tolerance but ultimately to support sharding for scalability and more efficient caching.
+- Executor deployment grouping based on resource allocation
 
-### Performance Improvements
+### Extensions ([datafusion-contrib](https://github.com/datafusion-contrib))
 
-- Predicate evaluation
-- Improve multi-column comparisons (that can't be vectorized at the moment)
-- Null constant support
+#### [DataFusion-Python](https://github.com/datafusion-contrib/datafusion-python)
 
-### New Features
+- Add missing functionality to DataFrame and SessionContext
+- Improve documentation
 
-- Read JSON as table
-- Simplify DDL with DataFusion-Cli
-- Add Decimal128 data type and the attendant features such as Arrow Kernel and UDF support
-- Add new experimental e-graph based optimizer
+#### [DataFusion-S3](https://github.com/datafusion-contrib/datafusion-objectstore-s3)
 
-### Ballista
+- Create Python bindings to use with datafusion-python
 
-- Begin work on design documents and plan / priorities for development
-
-### Extensions ([datafusion-contrib](https://github.com/datafusion-contrib]))
+#### [DataFusion-Tui](https://github.com/datafusion-contrib/datafusion-tui)
 
-- Stable S3 support
-- Begin design discussions and prototyping of a stream provider
+- Create multiple SQL editors
+- Expose more Context and query metadata
+- Support new data sources
+  - BigTable, HDFS, HTTP APIs
 
-## Beyond 2022 Q1
+#### [DataFusion-BigTable](https://github.com/datafusion-contrib/datafusion-bigtable)
 
-There is no clear timeline for the below, but community members have expressed interest in working on these topics.
+- Python binding to use with datafusion-python
+- Timestamp range predicate pushdown
+- Multi-threaded, partition-aware execution
+- Production-ready Rust SDK
 
-### DataFusion Core
-
-- Custom SQL support
-- Split DataFusion into multiple crates
-- Push based query execution and code generation
-
-### Ballista
+#### [DataFusion-Streams](https://github.com/datafusion-contrib/datafusion-streams)
 
-- Evolve architecture so that it can be deployed in a multi-tenant cloud native environment
-- Ensure Ballista is scalable, elastic, and stable for production usage
-- Develop distributed ML capabilities
+- Create experimental implementation of `StreamProvider` trait