From bb73d798e9e7409f6d623873d2e70cfa1ef0ef0f Mon Sep 17 00:00:00 2001
From: Orestis Ousoultzoglou
Date: Tue, 21 Jan 2025 12:42:08 +0200
Subject: [PATCH] Remove Parquet GSoC project

---
 jsoc/gsoc/tables.md | 29 -----------------------------
 jsoc/projects.md    |  1 -
 2 files changed, 30 deletions(-)
 delete mode 100644 jsoc/gsoc/tables.md

diff --git a/jsoc/gsoc/tables.md b/jsoc/gsoc/tables.md
deleted file mode 100644
index f810b2ae0f..0000000000
--- a/jsoc/gsoc/tables.md
+++ /dev/null
@@ -1,29 +0,0 @@
-
-# Tabular Data – Summer of Code
-
-## Parquet.jl enhancements
-
-**Difficulty**: Medium
-
-**Duration**: 175 hours
-
-[Apache Parquet](https://parquet.apache.org/) is a binary data format for tabular data. It has features for compression and memory-mapping of datasets on disk. A decent implementation of Parquet in Julia is likely to be highly performant. It will be useful as a standard format for distributing tabular data in a binary format. There exists a Parquet.jl package that has a Parquet reader and a writer. It currently conforms to the Julia Tabular file IO interface at a very basic level. It needs more work to add support for critical elements that would make Parquet.jl usable for fast large scale parallel data processing. Each of these goals can be targeted as a single, short duration (175 hrs) project.
-@@tight-list
-* Lazy loading and support for out-of-core processing, with Arrow.jl and Tables.jl integration. Improved usability and performance of Parquet reader and writer for large files.
-* Reading from and writing data on to cloud data stores, including support for partitioned data.
-* Support for missing data types and encodings making the Julia implementation fully featured.
-@@
-
-**Resources:**
-@@tight-list
-* The [Parquet](https://parquet.apache.org/documentation/latest/) file format (also are many articles and talks on the Parquet storage format on the internet)
-* [A tour of the data ecosystem in Julia](https://quinnj.home.blog/2019/07/21/a-tour-of-the-data-ecosystem-in-julia/)
-* [Tables.jl](https://github.com/JuliaData/Tables.jl)
-* [Arrow.jl](https://github.com/JuliaData/Arrow.jl)
-@@
-
-**Recommended skills:** Good knowledge of Julia language, Julia data stack and writing performant Julia code.
-
-**Expected Results:** Depends on the specific projects we would agree on.
-
-**Mentors:** [Tanmay Mohapatra](https://github.com/tanmaykm)
diff --git a/jsoc/projects.md b/jsoc/projects.md
index 4ff296f798..b1e79a2cd4 100644
--- a/jsoc/projects.md
+++ b/jsoc/projects.md
@@ -30,7 +30,6 @@ We have our project ideas organized below roughly by domain but you can also see
 * [QuantumOptics](/jsoc/gsoc/quantumoptics) - Quantum dynamics and master equations
 * [Signal processing](/jsoc/gsoc/kalmanbucy/) - Continuous time Signal Processing
 * [Symbolic computation](/jsoc/gsoc/symbolics/) - User friendly symbolic programming
-* [Tabular Data](/jsoc/gsoc/tables/) - Working with data
 * [Taija](/jsoc/gsoc/taija/) - Trustworthy Artificial Intelligence in Julia
 * [Turing](/jsoc/gsoc/turing/) - for probabilistic modelling and probabilistic programming
 * [Topology optimisation](/jsoc/gsoc/topopt/) - improving topology optimisation tools in Julia.