|
21 | 21 |
|
22 | 22 | A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the project's contributors. This roadmap is not binding.
|
23 | 23 |
|
24 |
| -## 2022 Q1 |
| 24 | +## 2022 Q2 |
25 | 25 |
|
26 | 26 | ### DataFusion Core
|
27 | 27 |
|
28 |
| -- Publish official Arrow2 branch |
29 |
| -- Implementation of memory manager (i.e. to enable spilling to disk as needed) |
| 28 | +- IO Improvements |
| 29 | + - Reading, registering, and writing more file formats from both DataFrame API and SQL |
| 30 | + - Additional options for IO including partitioning and metadata support |
| 31 | +- Work Scheduling |
| 32 | + - Improve predictability, observability and performance of IO and CPU-bound work |
| 33 | + - Develop a more explicit story for managing parallelism during plan execution |
| 34 | +- Memory Management |
| 35 | + - Add more operators for memory limited execution |
| 36 | +- Performance |
| 37 | + - Incorporate row-format into operators such as aggregate |
| 38 | + - Add row-format benchmarks |
| 39 | + - Explore JIT-compiling complex expressions |
| 40 | + - Explore LLVM for JIT, with inline Rust functions as the primary goal |
| 41 | + - Improve performance of Sort and Merge using Row Format / JIT expressions |
| 42 | +- Documentation |
| 43 | + - General improvements to DataFusion website |
| 44 | + - Publish design documents |
| 45 | +- Streaming |
| 46 | + - Create `StreamProvider` trait |
30 | 47 |
|
31 |
| -### Benchmarking |
| 48 | +### Ballista |
32 | 49 |
|
33 |
| -- Inclusion in Db-Benchmark with all queries covered |
34 |
| -- All TPCH queries covered |
| 50 | +- Make production ready |
| 51 | + - Shuffle file cleanup |
| 52 | + - Fill functional gaps between DataFusion and Ballista |
| 53 | + - Improve task scheduling and data exchange efficiency |
| 54 | + - Better error handling |
| 55 | + - Task failure |
| 56 | + - Executor lost |
| 57 | + - Schedule restart |
| 58 | + - Improve monitoring and logging |
| 59 | + - Auto scaling support |
| 60 | +- Support for multi-scheduler deployments. Initially for resiliency and fault tolerance but ultimately to support sharding for scalability and more efficient caching. |
| 61 | +- Executor deployment grouping based on resource allocation |
35 | 62 |
|
36 |
| -### Performance Improvements |
| 63 | +### Extensions ([datafusion-contrib](https://github.com/datafusion-contrib)) |
37 | 64 |
|
38 |
| -- Predicate evaluation |
39 |
| -- Improve multi-column comparisons (that can't be vectorized at the moment) |
40 |
| -- Null constant support |
| 65 | +#### [DataFusion-Python](https://github.com/datafusion-contrib/datafusion-python) |
41 | 66 |
|
42 |
| -### New Features |
| 67 | +- Add missing functionality to DataFrame and SessionContext |
| 68 | +- Improve documentation |
43 | 69 |
|
44 |
| -- Read JSON as table |
45 |
| -- Simplify DDL with DataFusion-Cli |
46 |
| -- Add Decimal128 data type and the attendant features such as Arrow Kernel and UDF support |
47 |
| -- Add new experimental e-graph based optimizer |
| 70 | +#### [DataFusion-S3](https://github.com/datafusion-contrib/datafusion-objectstore-s3) |
48 | 71 |
|
49 |
| -### Ballista |
| 72 | +- Create Python bindings to use with datafusion-python |
50 | 73 |
|
51 |
| -- Begin work on design documents and plan / priorities for development |
52 |
| - |
53 |
| -### Extensions ([datafusion-contrib](https://github.com/datafusion-contrib)) |
| 74 | +#### [DataFusion-Tui](https://github.com/datafusion-contrib/datafusion-tui) |
54 | 75 |
|
55 |
| -- Stable S3 support |
56 |
| -- Begin design discussions and prototyping of a stream provider |
| 76 | +- Create multiple SQL editors |
| 77 | +- Expose more Context and query metadata |
| 78 | +- Support new data sources |
| 79 | + - BigTable, HDFS, HTTP APIs |
57 | 80 |
|
58 |
| -## Beyond 2022 Q1 |
| 81 | +#### [DataFusion-BigTable](https://github.com/datafusion-contrib/datafusion-bigtable) |
59 | 82 |
|
60 |
| -There is no clear timeline for the below, but community members have expressed interest in working on these topics. |
| 83 | +- Python binding to use with datafusion-python |
| 84 | +- Timestamp range predicate pushdown |
| 85 | +- Multi-threaded partition aware execution |
| 86 | +- Production ready Rust SDK |
61 | 87 |
|
62 |
| -### DataFusion Core |
63 |
| - |
64 |
| -- Custom SQL support |
65 |
| -- Split DataFusion into multiple crates |
66 |
| -- Push based query execution and code generation |
67 |
| - |
68 |
| -### Ballista |
| 88 | +#### [DataFusion-Streams](https://github.com/datafusion-contrib/datafusion-streams) |
69 | 89 |
|
70 |
| -- Evolve architecture so that it can be deployed in a multi-tenant cloud native environment |
71 |
| -- Ensure Ballista is scalable, elastic, and stable for production usage |
72 |
| -- Develop distributed ML capabilities |
| 90 | +- Create experimental implementation of `StreamProvider` trait |