Commit f99c271

Update quarterly roadmap for Q2 (#2133)
* Update roadmap
* IO options comment
* Add streams
* Update with feedback
1 parent 5ae3434 commit f99c271

1 file changed: +51 -33 lines changed

docs/source/specification/quarterly_roadmap.md

@@ -21,52 +21,70 @@

 A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the projects contributors. This roadmap is not binding.

-## 2022 Q1
+## 2022 Q2

 ### DataFusion Core

-- Publish official Arrow2 branch
-- Implementation of memory manager (i.e. to enable spilling to disk as needed)
+- IO Improvements
+- Reading, registering, and writing more file formats from both DataFrame API and SQL
+- Additional options for IO including partitioning and metadata support
+- Work Scheduling
+- Improve predictability, observability and performance of IO and CPU-bound work
+- Develop a more explicit story for managing parallelism during plan execution
+- Memory Management
+- Add more operators for memory limited execution
+- Performance
+- Incorporate row-format into operators such as aggregate
+- Add row-format benchmarks
+- Explore JIT-compiling complex expressions
+- Explore LLVM for JIT, with inline Rust functions as the primary goal
+- Improve performance of Sort and Merge using Row Format / JIT expressions
+- Documentation
+- General improvements to DataFusion website
+- Publish design documents
+- Streaming
+- Create `StreamProvider` trait

-### Benchmarking
+### Ballista

-- Inclusion in Db-Benchmark with all quries covered
-- All TPCH queries covered
+- Make production ready
+- Shuffle file cleanup
+- Fill functional gaps between DataFusion and Ballista
+- Improve task scheduling and data exchange efficiency
+- Better error handling
+- Task failure
+- Executor lost
+- Schedule restart
+- Improve monitoring and logging
+- Auto scaling support
+- Support for multi-scheduler deployments. Initially for resiliency and fault tolerance but ultimately to support sharding for scalability and more efficient caching.
+- Executor deployment grouping based on resource allocation

-### Performance Improvements
+### Extensions ([datafusion-contrib](https://github.com/datafusion-contrib]))

-- Predicate evaluation
-- Improve multi-column comparisons (that can't be vectorized at the moment)
-- Null constant support
+#### [DataFusion-Python](https://github.com/datafusion-contrib/datafusion-python)

-### New Features
+- Add missing functionality to DataFrame and SessionContext
+- Improve documentation

-- Read JSON as table
-- Simplify DDL with DataFusion-Cli
-- Add Decimal128 data type and the attendant features such as Arrow Kernel and UDF support
-- Add new experimental e-graph based optimizer
+#### [DataFusion-S3](https://github.com/datafusion-contrib/datafusion-objectstore-s3)

-### Ballista
+- Create Python bindings to use with datafusion-python

-- Begin work on design documents and plan / priorities for development
-
-### Extensions ([datafusion-contrib](https://github.com/datafusion-contrib]))
+#### [DataFusion-Tui](https://github.com/datafusion-contrib/datafusion-tui)

-- Stable S3 support
-- Begin design discussions and prototyping of a stream provider
+- Create multiple SQL editors
+- Expose more Context and query metadata
+- Support new data sources
+- BigTable, HDFS, HTTP APIs

-## Beyond 2022 Q1
+#### [DataFusion-BigTable](https://github.com/datafusion-contrib/datafusion-bigtable)

-There is no clear timeline for the below, but community members have expressed interest in working on these topics.
+- Python binding to use with datafusion-python
+- Timestamp range predicate pushdown
+- Multi-threaded partition aware execution
+- Production ready Rust SDK

-### DataFusion Core
-
-- Custom SQL support
-- Split DataFusion into multiple crates
-- Push based query execution and code generation
-
-### Ballista
+#### [DataFusion-Streams](https://github.com/datafusion-contrib/datafusion-streams)

-- Evolve architecture so that it can be deployed in a multi-tenant cloud native environment
-- Ensure Ballista is scalable, elastic, and stable for production usage
-- Develop distributed ML capabilities
+- Create experimental implementation of `StreamProvider` trait
