diff --git a/python/taichi/CHANGELOG.md b/python/taichi/CHANGELOG.md
deleted file mode 100644
index 310839579f761..0000000000000
--- a/python/taichi/CHANGELOG.md
+++ /dev/null
@@ -1,1711 +0,0 @@
-Highlights:
-   - **AMDGPU backend**
-      - Enable shared array on amdgpu backend (#7403) (by **Zeyu Li**)
-      - Add print kernel amdgcn (#7357) (by **Zeyu Li**)
-      - Add amdgpu backend profiler (#7330) (by **Zeyu Li**)
-   - **Aot module**
-      - Let AOT kernel inherit CallableBase and use LaunchContextBuilder (by **lin-hitonami**)
-      - Deprecate element shape and field dim for AOT symbolic args (#7100) (by **Haidong Lan**)
-   - **Bug fixes**
-      - Fix Erroneous handling of ndarray in real function in CFG (#8245) (by **Lin Jiang**)
-      - Fix issue with passing python-scope Matrix as ti.func argument (#8197) (by **Zhanlue Yang**)
-      - Fix incorrect CFG Graph structure due to missing Block with OffloadedStmts on LLVM backend (#8113) (by **Zhanlue Yang**)
-      - Fix type inference error with LowerMatrixPtr pass (#8105) (by **Zhanlue Yang**)
-      - Set initial value for Cuda device allocation (#8063) (by **Zhanlue Yang**)
-      - Fix the insertion position of the access chain (#7957) (by **Lin Jiang**)
-      - Fix wrong datatype size when writing to ndarray from Python scope (by **Ailing Zhang**)
-      - Fix copy_from() of StructField (#7294) (by **Yi Xu**)
-      - Fix caching same loop invariant global vars inside nested fors (#7285) (by **Lin Jiang**)
-      - Fix num_splits in parallel_struct_for (#7121) (by **Yi Xu**)
-      - Fix ret_type and cast_type of UnaryOpStmt in Scalarize (#7082) (by **Yi Xu**)
-      - Fix getting 64-bit data from ndarray in Python scope (#6836) (by **Yi Xu**)
-      - Avoid overwriting global tmp with dynamic_index=True (#6820) (by **Yi Xu**)
-      - Add argument 'module' to 'warn_explicit' to show the deprecated warning (#6467) (by **Lin Jiang**)
-      - Fix cache_loop_invariant_global_vars pass (#6462) (by **Lin Jiang**)
-      - Fix memory leak in SPIRV module (#6449) (by **yekuang**)
-      - Make dimension check for GlobalPtrStmt aware of whether it is a cell access (#6275) (by **Yi Xu**)
-      - Allow numpy int as snode dimension (#6211) (by **Yi Xu**)
-      - Fix augmented assign for sar (#6153) (by **Yi Xu**)
-   - **Build system**
-      - Deprecate export_core (#7028) (by **Zhanlue Yang**)
-   - **Command line interface**
-      - Add "ti cache clean" command to clean the offline cache files manually (#6937) (by **PGZXB**)
-   - **CUDA backend**
-      - Warn driver version if it doesn't support memory pool. (#7912) (by **Haidong Lan**)
-      - Better handling shared array shape check (#7818) (by **Haidong Lan**)
-      - Support large shared memory for CUDA backend (#7452) (by **Haidong Lan**)
-      - Add maximum stack limit to CompileConfig (#6455) (by **Lin Jiang**)
-   - **Documentation**
-      - Update documentation (#8089) (by **Zhao Liang**)
-      - Update docstring for inverse func (#8170) (by **Zhao Liang**)
-      - Update type.md, add descriptions of the vector (#8048) (by **Chenzhan Shang**)
-      - Fix a bug in faq.md (#7992) (by **Zhao Liang**)
-      - Fix problems in type_system.md (#7949) (by **秋云未云**)
-      - Add doc about struct arguments (#7959) (by **Lin Jiang**)
-      - Fix docstring of mix function (#7922) (by **Zhao Liang**)
-      - Update faq and ggui, and add them to CI (#7861) (by **Zhao Liang**)
-      - Add kernel sync doc (#7831) (by **Zhao Liang**)
-      - Update doc for dynamic snode (#7804) (by **Zhao Liang**)
-      - Update field.md (#7819) (by **zhoooou**)
-      - Update readme (#7808) (by **yanqingzhang**)
-      - Update write_test.md (#7745) (by **Qian Bao**)
-      - Update performance.md (#7720) (by **Zhao Liang**)
-      - Update readme (#7673) (by **Zhao Liang**)
-      - Update tutorial.md (#7512) (by **Chenzhan Shang**)
-      - Update gui_system.md (#7628) (by **Qian Bao**)
-      - Remove deprecated api docstrings (#7596) (by **pengyu**)
-      - Fix the cexp docstring (#7588) (by **Zhao Liang**)
-      - Add doc about returning struct (#7556) (by **Lin Jiang**)
-      - Update GGUI docs with correct API (#7525) (by **pengyu**)
-      - Fix typos and improve example code in data_oriented_class.md (#7520) (by **pengyu**)
-      - Update gui_system.md, remove unnecessary example (#7487) (by **NextoneX**)
-      - Fix typo in API doc (#7511) (by **pengyu**)
-      - Update math_module (#7405) (by **Zhao Liang**)
-      - Update hello_world.md (#7400) (by **Zhao Liang**)
-      - Update debugging.md (#7401) (by **Zhao Liang**)
-      - Update hello_world.md (#7380) (by **Zhao Liang**)
-      - Update type.md (#7376) (by **Zhao Liang**)
-      - Update kernel_function.md (#7375) (by **Zhao Liang**)
-      - Update hello_world.md (#7369) (by **Zhao Liang**)
-      - Update hello_world.md (#7368) (by **Zhao Liang**)
-      - Update data_oriented_class.md (#6790) (by **Zhao Liang**)
-      - Update hello_world.md (#7367) (by **Zhao Liang**)
-      - Update kernel_function.md (#7364) (by **Zhao Liang**)
-      - Update hello_world.md (#7354) (by **Zhao Liang**)
-      - Update llvm_sparse_runtime.md (#7323) (by **Gabriel Vainer**)
-      - Update profiler.md (#7358) (by **Zhao Liang**)
-      - Update kernel_function.md (#7356) (by **Zhao Liang**)
-      - Update tut.md (#7352) (by **Gabriel Vainer**)
-      - Update type.md (#7350) (by **Zhao Liang**)
-      - Update hello_world.md (#7337) (by **Zhao Liang**)
-      - Update append docstring (#7265) (by **Zhao Liang**)
-      - Update ndarray.md (#7236) (by **Gabriel Vainer**)
-      - Update llvm_sparse_runtime.md (#7215) (by **Zhao Liang**)
-      - Remove doc tutorial (#7198) (by **Olinaaaloompa**)
-      - Rename tutorial doc (#7186) (by **Zhao Liang**)
-      - Update tutorial.md (#7176) (by **Zhao Liang**)
-      - Update math_module.md (#7175) (by **Zhao Liang**)
-      - Update debugging.md (#7173) (by **Zhao Liang**)
-      - Fix C++ tutorial does not display on doc site (#7174) (by **Zhao Liang**)
-      - Update doc regarding dynamic index (#7148) (by **Yi Xu**)
-      - Move glossary to top level (#7118) (by **Zhao Liang**)
-      - Update type.md (#7038) (by **Zhao Liang**)
-      - Fix docstring (#7065) (by **Zhao Liang**)
-      - Remove packed mode in doc (#7030) (by **Zhao Liang**)
-      - Minor doc update (#6952) (by **Zhao Liang**)
-      - Glossary (#6101) (by **Olinaaaloompa**)
-      - Update dac (#6875) (by **Gabriel Vainer**)
-      - Update faq.md (#6921) (by **Zhao Liang**)
-      - Update dataclass.md (#6876) (by **Gabriel Vainer**)
-      - Update the documentation about Dynamic SNode (#6752) (by **Lin Jiang**)
-      - Stop mentioning packed mode (#6755) (by **Yi Xu**)
-      - Update global_settings.md (#6668) (by **Zhao Liang**)
-      - Update external.md (#6424) (by **Zhao Liang**)
-      - Update differences_between_taichi_and_python_programs.md (#6454) (by **Zhao Liang**)
-      - Update global_settings.md (#6370) (by **Zhao Liang**)
-      - Update global data access rule checker in doc (#6347) (by **Mingrui Zhang**)
-      - Add instructions about running clang-tidy checks locally (by **Ailing Zhang**)
-      - Renamed syntax.md to kernel_function.md, plus miscellaneous edits (#6277) (by **Vissidarte-Herman**)
-      - Rename index.md to hello_world.md (#6244) (by **Vissidarte-Herman**)
-      - Update syntax.md (#6236) (by **Zhao Liang**)
-      - Update math_module.md (#6235) (by **Zhao Liang**)
-      - Update debugging.md (#6238) (by **Zhao Liang**)
-      - Update global settings (#6201) (by **Olinaaaloompa**)
-      - Update hello world (#6191) (by **Olinaaaloompa**)
-      - Update math module (#6203) (by **Olinaaaloompa**)
-      - Update profiler (#6214) (by **Olinaaaloompa**)
-      - Update debugging.md (#6212) (by **Zhao Liang**)
-      - Update debugging.md (#6200) (by **Zhao Liang**)
-      - Fixed broken links (#6193) (by **Olinaaaloompa**)
-      - Update field.md (#6182) (by **Zhao Liang**)
-      - Update data_oriented_class.md (#6181) (by **Zhao Liang**)
-      - Update kernels and functions (#6176) (by **Zhao Liang**)
-      - Update type.md (#6180) (by **Zhao Liang**)
-      - Update getting started (#6175) (by **Zhao Liang**)
-      - Refactor debugging (#6102) (by **Olinaaaloompa**)
-      - Refactor global settings (#6071) (by **Zhao Liang**)
-      - Refactor external arrays (#6065) (by **Zhao Liang**)
-      - Refactor simt (#6151) (by **Zhao Liang**)
-      - Refactor Profiler (#6142) (by **Olinaaaloompa**)
-      - Add doc for math module (#6145) (by **Zhao Liang**)
-      - Update gui_system (#6152) (by **Zhao Liang**)
-      - Move developer utilities to contribution (#6109) (by **Olinaaaloompa**)
-      - Added Accelerate PyTorch (#6106) (by **Vissidarte-Herman**)
-      - Refactor ODOP (#6013) (by **Zhao Liang**)
-   - **Error messages**
-      - Warn before calling the external function (#8177) (by **Lin Jiang**)
-      - Add option to print full traceback in Python (#8160) (by **Lin Jiang**)
-      - Let to_primitive_type throw an error if the type is a pointer (by **lin-hitonami**)
-      - Update deprecation warning of the graph arguments (#7965) (by **Lin Jiang**)
-      - Allow IfExp on matrices when the condition is scalar (#7241) (by **Lin Jiang**)
-      - Remove deprecations in ti.ui in 1.6.0 (#7229) (by **Lin Jiang**)
-      - Remove deprecated ti.linalg.sparse_matrix_builder in 1.6.0 (#7228) (by **Lin Jiang**)
-      - Remove deprecations in ASTTransformer in 1.6.0 (#7226) (by **Lin Jiang**)
-      - Remove deprecated a.atomic_op(b) in Taichi v1.6.0 (#7225) (by **Lin Jiang**)
-      - Remove deprecations in taichi/__init__.py in v1.6.0 (#7222) (by **Lin Jiang**)
-      - Raise error when using deprecated ifexp on matrices (#7224) (by **Lin Jiang**)
-      - Better error message when creating sparse snodes on backends that do not support sparse (#7191) (by **Lin Jiang**)
-      - Raise errors when using metal sparse (#7113) (by **Lin Jiang**)
-      - Do not show warning when the offline cache path does not exist (#7005) (by **PGZXB**)
-      - Shorten traceback for _BoundedDifferentiableMethod (#6475) (by **Lin Jiang**)
-      - Return NotImplemented for operations between field and Expr/Matrix/Struct (#6474) (by **Lin Jiang**)
-      - Return TaichiTypeError in ASTTransformer when a binary op is not supported (#6477) (by **Lin Jiang**)
-      - Deprecate ndrange with number of the loop variables != the dimension of the ndrange (#6422) (by **Lin Jiang**)
-      - Add error message when the number of the loop variables does not match the dimension of the ndrange (#6360) (by **Lin Jiang**)
-   - **Examples**
-      - Add karman vortex street example (#6249) (by **Zhao Liang**)
-   - **GUI**
-      - GGUI use shader "factory" (GGUI rework n/N) (#7271) (by **Bob Cao**)
-      - Support colored texts (#7036) (by **Dunfan Lu**)
-   - **Intermediate representation**
-      - Unified type system for internal operations (#6337) (by **daylily**)
-      - Allow a maximum of 12 SNode indices (#6901) (by **Dunfan Lu**)
-   - **Language and syntax**
-      - Add TensorType support for Constant Folding (#8250) (by **Zhanlue Yang**)
-      - Support TensorType for irpass::alg_simp() (#8225) (by **Zhanlue Yang**)
-      - Support vector/matrix ndarray arguments in real function (by **Lin Jiang**)
-      - Fix error on ndarray type check (by **Lin Jiang**)
-      - Support real function in data-oriented classes (by **lin-hitonami**)
-      - Let kernel support return type annotated with 'typing.Tuple' (by **lin-hitonami**)
-      - Support tuple return value for kernel and real function (by **lin-hitonami**)
-      - Let static assert be in static scope (#8217) (by **Lin Jiang**)
-      - Avoid scalarization for AOS GlobalPtrStmt (#8187) (by **Zhanlue Yang**)
-      - Support matrix return value for real function (by **lin-hitonami**)
-      - Support ndarray argument for real function (by **lin-hitonami**)
-      - Cast the scalar arguments and return values of ti.func if the type hints exist (#8193) (by **Lin Jiang**)
-      - Handle MatrixPtrStmt for uniquely_accessed_pointers() (#8165) (by **Zhanlue Yang**)
-      - Support struct arguments for real function (by **lin-hitonami**)
-      - Merge irpass::half2_vectorize() with irpass::scalarize() (#8102) (by **Zhanlue Yang**)
-      - Migrate irpass::scalarize() after optimize_bit_struct_stores & determine_ad_stack_size (#8097) (by **Zhanlue Yang**)
-      - Migrate irpass::scalarize() after irpass::demote_operations() (#8096) (by **Zhanlue Yang**)
-      - Migrate irpass::scalarize() after irpass::lower_access() (#8091) (by **Zhanlue Yang**)
-      - Migrate irpass::scalarize() after irpass::make_block_local() (#8090) (by **Zhanlue Yang**)
-      - Support TensorType for Dead-Store-Elimination (#8065) (by **Zhanlue Yang**)
-      - Optimize alias checking conditions for store-to-load forwarding (#8079) (by **Zhanlue Yang**)
-      - Support TensorType for Load-Store-Forwarding (#8058) (by **Zhanlue Yang**)
-      - Fix TensorTyped error with irpass::make_thread_local() (#8051) (by **Zhanlue Yang**)
-      - Fix numerical issue with auto_diff() (#8025) (by **Zhanlue Yang**)
-      - Migrate irpass::scalarize() after irpass::make_mesh_block_local() (#8030) (by **Zhanlue Yang**)
-      - Migrate irpass::scalarize() after irpass::make_thread_local() (#8028) (by **Zhanlue Yang**)
-      - Support allocate with cuda memory pool and reduce preallocation size accordingly (#7929) (by **Zhanlue Yang**)
-      - Migrate irpass::scalarize() after irpass::demote_no_access_mesh_fors() (#7956) (by **Zhanlue Yang**)
-      - Fix error with irpass::check_out_of_bound() for TensorTyped ExternalPtrStmt (#7997) (by **Zhanlue Yang**)
-      - Migrate irpass::scalarize() after irpass::demote_atomics() (#7943) (by **Zhanlue Yang**)
-      - Separate out preallocation logics for runtime objects (#7938) (by **Zhanlue Yang**)
-      - Remove deprecated funcs in __init__.py (#7941) (by **Lin Jiang**)
-      - Remove deprecated sparse_matrix_builder function (#7942) (by **Lin Jiang**)
-      - Remove deprecated compile option ndarray_use_cached_allocator (#7937) (by **Zhanlue Yang**)
-      - Migrate irpass::scalarize() after irpass::detect_read_only() (#7939) (by **Zhanlue Yang**)
-      - Remove deprecated funcs in ti.ui (#7940) (by **Lin Jiang**)
-      - Remove the support for 'is' (#7930) (by **Lin Jiang**)
-      - Migrate irpass::scalarize() after irpass::offload() (#7919) (by **Zhanlue Yang**)
-      - Raise error when the dimension of the ndrange does not equal to the number of the loop variable (#7933) (by **Lin Jiang**)
-      - Remove a.atomic(b) (#7925) (by **Lin Jiang**)
-      - Cancel deprecating native min/max (#7928) (by **Lin Jiang**)
-      - Fix the api doc search problem (#7918) (by **Zhao Liang**)
-      - Move irpass::scalarize() after irpass::auto_diff() (#7902) (by **Zhanlue Yang**)
-      - Fix Ndarray fill with Matrix/Vector typed values (#7901) (by **Zhanlue Yang**)
-      - Add cast to field.fill() interface (#7899) (by **Zhanlue Yang**)
-      - Let nested data classes have methods (#7909) (by **Lin Jiang**)
-      - Let kernel argument support matrix nested in a struct (by **lin-hitonami**)
-      - Support the functions of dataclass as kernel argument and return value (#7865) (by **Lin Jiang**)
-      - Fix a bug on PosixPath (#7860) (by **Zhao Liang**)
-      - Postpone MatrixType scalarization to irpass::differentiation_validation_check() (#7839) (by **Zhanlue Yang**)
-      - Postpone MatrixType scalarization to irpass::gather_meshfor_relation_types() (#7838) (by **Zhanlue Yang**)
-      - Separate out the scalarization for MatrixOfMatrixPtrStmt and MatrixOfGlobalPtrStmt (#7803) (by **Zhanlue Yang**)
-      - Fix pylance warning (#7805) (by **Zhao Liang**)
-      - Support taking structs as kernel arguments (by **lin-hitonami**)
-      - Fix math module circular import bugs (#7762) (by **Zhao Liang**)
-      - Support formatted printing in str.format() and f-strings (#7686) (by **魔法少女赵志辉**)
-      - Replace internal representation of Python-scope ti.Matrix with numpy arrays (#7559) (by **Yi Xu**)
-      - Stop letting ti.Struct inherit from TaichiOperations (#7474) (by **Yi Xu**)
-      - Support writing sparse matrix as matrix market file (#7529) (by **pengyu**)
-      - Keep ti.pyfunc (#7530) (by **Lin Jiang**)
-      - Type check assignments between tensors (#7480) (by **Yi Xu**)
-      - Fix pylance warnings raised by ti.static (#7437) (by **Zhao Liang**)
-      - Deprecate arithmetic operations and fill() on ti.Struct (#7456) (by **Yi Xu**)
-      - Fix pylance warnings by ti.random (#7439) (by **Zhao Liang**)
-      - Fix pylance types warning (#7417) (by **Zhao Liang**)
-      - Add better error message for dynamic snode (#7238) (by **Zhao Liang**)
-      - Simplify the swizzle generator (#7216) (by **Zhao Liang**)
-      - Remove the deprecated dynamic_index switch (#7195) (by **Yi Xu**)
-      - Remove deprecated packed switch (#7104) (by **Yi Xu**)
-      - Raise errors when using the packed switch (#7125) (by **Yi Xu**)
-      - Fix cannot use taichi in REPL (#7114) (by **Zhao Liang**)
-      - Remove deprecated ti.Matrix.rotation2d() (#7098) (by **Yi Xu**)
-      - Remove filename kwarg in aot Module save() (#7085) (by **Ailing**)
-      - Remove sourceinspect deprecation warning message (#7081) (by **Zhao Liang**)
-      - Make slicing a single row/column of a matrix return a vector (#7068) (by **Yi Xu**)
-      - Deprecate the dynamic_index switch (#7071) (by **Yi Xu**)
-      - Add irpass::eliminate_immutable_local_vars() test cases for TensorType (#7043) (by **Zhanlue Yang**)
-      - Fix gui docstring (#7003) (by **Zhao Liang**)
-      - Support dynamic indexing in spirv (#6990) (by **Yi Xu**)
-      - Support dynamic indexing in metal (#6985) (by **Yi Xu**)
-      - Support LU sparse solver on CUDA backend (#6967) (by **pengyu**)
-      - Fix struct type problem (#6949) (by **Zhao Liang**)
-      - Add warning message when converting dynamic snode to numpy (#6853) (by **Zhao Liang**)
-      - Deprecate sourceinspect dependency (#6894) (by **Zhao Liang**)
-      - Warn users if ndarray size is out of int32 boundary (#6846) (by **Yi Xu**)
-      - Remove the real_matrix switch (#6885) (by **Yi Xu**)
-      - Enable real_matrix and real_matrix_scalarize by default (#6801) (by **Zhanlue Yang**)
-      - Raise an error for the semantic change of transpose() (#6813) (by **Yi Xu**)
-      - Add bool type in python as an alias to i32 (#6742) (by **daylily**)
-      - Add deprecation warning for the removal of the packed switch (#6753) (by **Yi Xu**)
-      - Enable packed mode by default (#6721) (by **Yi Xu**)
-      - Fix warning messages (#6716) (by **Zhao Liang**)
-      - Limit non-first division of an axis on a SNodeTree path to a power of two (#6690) (by **Yi Xu**)
-      - Deprecate field_dim in ndarray annotation (#6687) (by **Haidong Lan**)
-      - Deprecate element_dim and element_shape in ndarray annotations. (#6665) (by **Haidong Lan**)
-      - Clean element shape in tests (#6643) (by **Haidong Lan**)
-      - Remove element shape and element dim in unit tests (#6620) (by **Haidong Lan**)
-      - MatrixType refactor: Support inverse() (#6542) (by **Yi Xu**)
-      - MatrixType refactor: Support matrix factories (#6560) (by **Yi Xu**)
-      - MatrixType refactor: Support dot/cross/outer_product (#6545) (by **Yi Xu**)
-      - Add deactivate attribute to dynamic snodes (#6512) (by **Zhao Liang**)
-      - Matrix lib: Stop changing dimension in transpose() (#6528) (by **Yi Xu**)
-      - Add element access for sparse matrix on CUDA (#6250) (by **Jiafeng Liu**)
-      - MatrixType refactor part 2: add more ops (#6425) (by **Mike He**)
-      - MatrixType refactor: Support matrix slice (#6430) (by **Yi Xu**)
-      - Allow augmented assign on matrix slice (#6382) (by **Yi Xu**)
-      - Allow filling a field with Expr (#6391) (by **Yi Xu**)
-      - Replace matrix warning param by current logging level (#6377) (by **Zhao Liang**)
-      - Matrix/Vector refactor: Matrix operations part 1 (#6319) (by **Mike He**)
-      - MatrixNdarray refactor part13: Add scalarization for TernaryOpStmt (#6314) (by **Zhanlue Yang**)
-      - MatrixNdarray refactor part12: Add scalarization for AtomicOpStmt (#6312) (by **Zhanlue Yang**)
-      - MatrixNdarray refactor part11: Fuse ExternalPtrStmt and PtrOffsetStmt (#6189) (by **Zhanlue Yang**)
-      - MatrixNdarray refactor part10: Remove redundant MatrixInitStmt generated from scalarization (#6171) (by **Zhanlue Yang**)
-      - MatrixNdarray refactor part9: Add scalarization for AllocaStmt (#6168) (by **Zhanlue Yang**)
-      - Support GPU solve with analyzePattern and factorize (#6158) (by **pengyu**)
-      - MatrixField refactor 9/n: Allow dynamic index of matrix field when real_matrix=True (#6194) (by **Yi Xu**)
-      - MatrixNdarray refactor part8: Add scalarization for BinaryOpStmt with TensorType-operands (#6086) (by **Zhanlue Yang**)
-      - Matrix/Vector refactor: support basic matrix ops (#6077) (by **Mike He**)
-      - Support basic sparse matrix operations on GPU. (#6082) (by **Jiafeng Liu**)
-      - MatrixField refactor 6/n: Add tests for MatrixField scalarization (#6137) (by **Yi Xu**)
-      - MatrixField refactor 5/n: Lower access of matrix field element into CHI IR (#6119) (by **Yi Xu**)
-      - Fix invalid assertion for matrix values (#6125) (by **Zhanlue Yang**)
-      - MatrixNdarray refactor part7: Add scalarization for UnaryOpStmt with TensorType-operand (#6080) (by **Zhanlue Yang**)
-   - **LLVM backend (CPU and CUDA)**
-      - Add runtime overflow detection on LLVM-based backends (#6178) (by **Lin Jiang**)
-      - Add runtime overflow detection on LLVM-based backends (#6166) (by **Lin Jiang**)
-      - Fix codegen for div (unsigned) (#6128) (by **Yi Xu**)
-   - **Metal backend**
-      - Raise deprecate warning and error when using sparse snodes on metal (#6739) (by **Lin Jiang**)
-   - **Miscellaneous**
-      - Make clang-tidy happy on 'explicit' (#7999) (by **秋云未云**)
-      - Strictly check ndim with external array (#7126) (by **Haidong Lan**)
-      - Refactored flattend_values() to avoid potential conflicts in flattened statements (#6749) (by **Zhanlue Yang**)
-      - Improve gif output compression rate (#6289) (by **Zhao Liang**)
-      - Add prefix sum executor to avoid multiple field allocations (#6132) (by **YuZhang**)
-   - **OpenGL backend**
-      - Fix: runtime caught error cannot be displayed in opengl (#7998) (by **秋云未云**)
-   - **IR optimization passes**
-      - Make merging casts int(int(x)) less aggressive (#7944) (by **Ailing**)
-      - Fix redundant clone of stmts across offloaded tasks (#7927) (by **Ailing**)
-   - **Refactor**
-      - Refactor the argument passing logic of rwtexture and remove extra_args (#7914) (by **Lin Jiang**)
-   - **Tests**
-      - Add scipy to test GPU sparse solver (#6162) (by **pengyu**)
-   - **Vulkan backend**
-      - Fix repeated generation of array ranges in spirv codegen. (#7625) (by **Haidong Lan**)
-      - Change the format string of 64bit unsigned integer type from %llu to %lu (#6308) (by **Lin Jiang**)
-      - Add overflow detection on vulkan when debug=True (#6279) (by **Lin Jiang**)
-
-Full changelog:
-   - Add test for clz (by **Jett Chen**)
-   - Add test for clz (by **Jett Chen**)
-   - fix spirv issue (by **Jett Chen**)
-   - Update taichi/codegen/cuda/codegen_cuda.cpp (by **Bob Cao**)
-   - [pre-commit.ci] auto fixes from pre-commit.com hooks (by **pre-commit-ci[bot]**)
-   - add clz instruction (by **Jett Chen**)
-   - [refactor] Add base class GfxProgramImpl (by **listerily**)
-   - [Lang] Add TensorType support for Constant Folding (#8250) (by **Zhanlue Yang**)
-   - [autodiff] Fix loop index not stored (#8200) (by **Mingrui Zhang**)
-   - [Bug] Fix Erroneous handling of ndarray in real function in CFG (#8245) (by **Lin Jiang**)
-   - [bug] Add is_grad to printer of ExternalTensorBasePtrExpression/Stmt (#8244) (by **Lin Jiang**)
-   - [gui] Doc updates for SceneV2 (#8234) (by **Antonio Ferreras**)
-   - [Lang] Support TensorType for irpass::alg_simp() (#8225) (by **Zhanlue Yang**)
-   - [example] Add real func ver of stable_fluid_graph (by **Lin Jiang**)
-   - [Lang] Support vector/matrix ndarray arguments in real function (by **Lin Jiang**)
-   - [Lang] [bug] Fix error on ndarray type check (by **Lin Jiang**)
-   - [gui] Remove cached arrays (#8223) (by **Antonio Ferreras**)
-   - [gui] Tests for scenev2 (#8222) (by **Antonio Ferreras**)
-   - [aot] Fix bug in compute graph jit (#8238) (by **Chenzhan Shang**)
-   - [aot] Support matrix/vector for compute graph in C-API (#8228) (by **Chenzhan Shang**)
-   - [aot] Support matrix/vector for compute graph in Python (#8198) (by **Chenzhan Shang**)
-   - [gui] SceneV2 for a more efficient/cleaner implementation of Scene (#8205) (by **Antonio Ferreras**)
-   - [example] Add real func ver of taichi_ngp (by **lin-hitonami**)
-   - [Lang] Support real function in data-oriented classes (by **lin-hitonami**)
-   - [Lang] Let kernel support return type annotated with 'typing.Tuple' (by **lin-hitonami**)
-   - [example] Add real function ver of poisson_disk_sampling (by **lin-hitonami**)
-   - [Lang] Support tuple return value for kernel and real function (by **lin-hitonami**)
-   - [lang] Remove useless insertion to global_vars (by **lin-hitonami**)
-   - [ir] Add lower_matrix_ptr pass to compile_function and move scalarize to the end (by **lin-hitonami**)
-   - [Lang] [bug] Let static assert be in static scope (#8217) (by **Lin Jiang**)
-   - [build] Fix build of RHI examples (temporarily) (#8213) (by **Bob Cao**)
-   - [ci] Add taichiCourse01 & marching_squares release tests (#8157) (by **Xinhao Yuan**)
-   - [llvm] Allocate the buffers of the real function in the entry block (#8206) (by **Lin Jiang**)
-   - [Lang] Avoid scalarization for AOS GlobalPtrStmt (#8187) (by **Zhanlue Yang**)
-   - [lang] Add matrixfree BICGSTAB solver (#8196) (by **Qian Bao**)
-   - [bug] Fix incorrect error message in checking matrix shape (#8201) (by **秋云未云**)
-   - [Bug] Fix issue with passing python-scope Matrix as ti.func argument (#8197) (by **Zhanlue Yang**)
-   - [example] Add real func version of marching_square (by **lin-hitonami**)
-   - [Lang] Support matrix return value for real function (by **lin-hitonami**)
-   - [Lang] Support ndarray argument for real function (by **lin-hitonami**)
-   - [ir] Let the type of ExternalTensorExpression be an ndarray struct (by **lin-hitonami**)
-   - [lang] Record the element_type of the AnyArray (by **lin-hitonami**)
-   - [ir] Let expand_exprs support expanding nested structs (by **lin-hitonami**)
-   - [opengl] Add import of external OpenGL context. (#8185) (by **Sam**)
-   - [Lang] Cast the scalar arguments and return values of ti.func if the type hints exist (#8193) (by **Lin Jiang**)
-   - [Lang] Handle MatrixPtrStmt for uniquely_accessed_pointers() (#8165) (by **Zhanlue Yang**)
-   - [gui] Refactor scene renderables (#8168) (by **Antonio Ferreras**)
-   - [bug] Fix failed IR verification after ndarray clamp (#8181) (by **Ailing**)
-   - [lang] Include argument name in error message (#8180) (by **Ailing**)
-   - [Doc] Update documentation (#8089) (by **Zhao Liang**)
-   - [Error] [bug] Warn before calling the external function (#8177) (by **Lin Jiang**)
-   - [ci] Ghstack bot treats neutral checks as success and ignore copilot for PRs (#8174) (by **Xinhao Yuan**)
-   - [Lang] Support struct arguments for real function (by **lin-hitonami**)
-   - [opt] Run demote_atomics pass on real function (by **lin-hitonami**)
-   - [refactor] [ir] Change Block::parent_kernel to parent_callable (by **lin-hitonami**)
-   - [Doc] Update docstring for inverse func (#8170) (by **Zhao Liang**)
-   - [doc] Added introduction to argument pack (#8164) (by **秋云未云**)
-   - [opt] Treat FuncCallStmt better in store-to-load forwarding in CFG (#8155) (by **Lin Jiang**)
-   - [test] Added test for argpack (by **listerily**)
-   - [lang] [refactor] Support parameter passing for argpack (by **listerily**)
-   - [bug] Fix sparse matrix memory release error (by **listerily**)
-   - [lang] Added argpack as a new type (by **listerily**)
-   - [lang] Added frontend type check for structs (by **listerily**)
-   - [lang] Added type check for return value (by **listerily**)
-   - [bug] Fix: entries cannot be used as a struct member name (by **listerily**)
-   - [Error] Add option to print full traceback in Python (#8160) (by **Lin Jiang**)
-   - [ci] Fix perf monitoring upload on releases (#8159) (by **Proton**)
-   - [lang] Raise TaichiSyntaxError when there are default values in dataclasses (#8135) (by **秋云未云**)
-   - [ci] Modify the release tests branch (#8153) (by **Xinhao Yuan**)
-   - [aot] Fix unused args in cppgen (#8154) (by **Ailing**)
-   - [lang] Support clamp type hint for ndarrays (#8136) (by **Ailing**)
-   - [ci] Add /rerun-failed (#8152) (by **Xinhao Yuan**)
-   - [misc] Add FrontendAllocaStmt::is_shared into offline cache key (#8145) (by **PGZXB**)
-   - [ci] Add games201 in release tests (#8148) (by **Xinhao Yuan**)
-   - [misc] Remove struct LlvmLaunchArgInfo (#8146) (by **PGZXB**)
-   - [ci] Restart xserver before testing (#8142) (by **Xinhao Yuan**)
-   - [opt] Properly handle FuncCallStmt in CFG and simplify passes (#8139) (by **Lin Jiang**)
-   - [lang] Enforce dtype check for numpy arrays (#8141) (by **Ailing**)
-   - [bug] Fix passing ndarray to a taichi function (#8138) (by **Lin Jiang**)
-   - [metal] Support i64 with MSL2.3.0 (#8140) (by **Ailing**)
-   - [gui] Fix for ImGui widget size on HiDPI (#8129) (by **Antonio Ferreras**)
-   - [bug] Fix struct field error on bool on cuda (#8134) (by **秋云未云**)
-   - [ir] [refactor] Let the type of Alloca be pointer (by **lin-hitonami**)
-   - [ir] Update ASTBuilder for the refactor of Alloca (by **lin-hitonami**)
-   - [ir] [refactor] Update passes for the refactor of Alloca (by **lin-hitonami**)
-   - [ir] Update the codegen for the refactor of Alloca (by **lin-hitonami**)
-   - [Error] Let to_primitive_type throw an error if the type is a pointer (by **lin-hitonami**)
-   - [autodiff] Update autodiff for the refactor of Alloca (by **lin-hitonami**)
-   - [refactor] [ir] Use get_rvalue_type in frontend IR instead of ret_type (by **lin-hitonami**)
-   - [lang] Use get_rvalue_type instead of get_ret_type in python (by **lin-hitonami**)
-   - [ir] Update get_rvalue_dtype and move it to class Expr (by **lin-hitonami**)
-   - [aot] Misc fixes for cppgen (#8106) (by **Ailing**)
-   - [misc] Print the IR after every sub-pass of full-simplify (#8127) (by **Lin Jiang**)
-   - [gui] Per circle/particle radius (#8121) (by **Antonio Ferreras**)
-   - [bug] Add parameter list and rets of Kernel into offline cache key (#8054) (by **PGZXB**)
-   - [Bug] Fix incorrect CFG Graph structure due to missing Block with OffloadedStmts on LLVM backend (#8113) (by **Zhanlue Yang**)
-   - [misc] Relax python_requires to play nice with poetry (#8116) (by **Proton**)
-   - [test] Fix previous discovered flaky test_sparse_linear_solver.py. (#8110) (by **Qian Bao**)
-   - [ir] Update ExpressionPrinter (by **Lin Jiang**)
-   - [Bug] Fix type inference error with LowerMatrixPtr pass (#8105) (by **Zhanlue Yang**)
-   - [Lang] Merge irpass::half2_vectorize() with irpass::scalarize() (#8102) (by **Zhanlue Yang**)
-   - [bug] Fix: clang-tidy complaining explicit (#8109) (by **秋云未云**)
-   - [Lang] Migrate irpass::scalarize() after optimize_bit_struct_stores & determine_ad_stack_size (#8097) (by **Zhanlue Yang**)
-   - [bug] Fix MatrixFreeCG so it can handle multiple input sizes. (#8070) (by **Qian Bao**)
-   - [bug] Fix SparseMatrix's dtype; check for dtype in SparseSolver. (#8071) (by **Qian Bao**)
-   - [Lang] Migrate irpass::scalarize() after irpass::demote_operations() (#8096) (by **Zhanlue Yang**)
-   - [bug] Fix compilation warning in constant fold on bool (#8100) (by **秋云未云**)
-   - [Lang] Migrate irpass::scalarize() after irpass::lower_access() (#8091) (by **Zhanlue Yang**)
-   - [Lang] Migrate irpass::scalarize() after irpass::make_block_local() (#8090) (by **Zhanlue Yang**)
-   - [bug] Fix extraction of field with None offset to external array (#8093) (by **魔法少女赵志辉**)
-   - [aot] Make generated tcm deterministic hash-wise (#8088) (by **Ailing**)
-   - [Lang] Support TensorType for Dead-Store-Elimination (#8065) (by **Zhanlue Yang**)
-   - [refactor] Get rid of unnecessary element_dim in ExternalPtrStmt (by **Ailing Zhang**)
-   - [refactor] Get rid of unnecessary element_dim in ExternalTensorExpression (by **Ailing Zhang**)
-   - [refactor] Rename dim to ndim for consistency in ExternalTensorExpression (by **Ailing Zhang**)
-   - [Lang] Optimize alias checking conditions for store-to-load forwarding (#8079) (by **Zhanlue Yang**)
-   - [ci] Fix Linux build image timezone (#8080) (by **Proton**)
-   - [bug] Fix misbehaviour and assertion error on ti.math.sign (#8082) (by **秋云未云**)
-   - [ci] Build.py polishing, wave 5 (#8073) (by **Proton**)
-   - [refactor] Simplify ndarray arg declaration (by **Ailing Zhang**)
-   - [refactor] Simplify ndarray impl by separating ndim and element_shape in (by **Ailing Zhang**)
-   - [bug] Fix vector/matrix ndarray zero fill (#8068) (by **Ailing**)
-   - [bug] Exclude quant type when doing store-to-load forwarding and skip bit struct store fusion when unfeasible (#8023) (by **魔法少女赵志辉**)
-   - [build] Include Windows import lib in built wheel (#8067) (by **Proton**)
-   - [bug] Fix: replaced i32 with bool in cook_dtype (by **listerily**)
-   - [Lang] Support TensorType for Load-Store-Forwarding (#8058) (by **Zhanlue Yang**)
-   - [doc] Added boolean to primitive types in docs (#8062) (by **秋云未云**)
-   - [spirv] [ir] [lang] Support struct object as return value in spir-v (#8061) (by **秋云未云**)
-   - [refactor] Get rid of element shape in runtime struct type of ndarray (by **Ailing Zhang**)
-   - [refactor] Simplify ndarray type formation in llvm codegen (by **Ailing Zhang**)
-   - [refactor] Simplify ndarray type formation in scalarize pass (by **Ailing Zhang**)
-   - [Doc] Update type.md, add descriptions of the vector (#8048) (by **Chenzhan Shang**)
-   - [Bug] Set initial value for Cuda device allocation (#8063) (by **Zhanlue Yang**)
-   - [refactor] Return values on gfx passes through LaunchContextBuilder like llvm now (by **listerily**)
-   - [Lang] Fix TensorTyped error with irpass::make_thread_local() (#8051) (by **Zhanlue Yang**)
-   - [Lang] Fix numerical issue with auto_diff() (#8025) (by **Zhanlue Yang**)
-   - Revert "[build] Fix build on arm64 (#7978)" (#8050) (by **Proton**)
-   - [lang] Enable Kernel* access from Block in CHI IR (#8044) (by **Zhanlue Yang**)
-   - [build] Fix build on arm64 (#7978) (by **Oliver Batchelor**)
-   - [aot] [bug] Fix cached kernel name lookup (#8035) (by **Ailing**)
-   - [Lang] Migrate irpass::scalarize() after irpass::make_mesh_block_local() (#8030) (by **Zhanlue Yang**)
-   - [test] Temporarily skip tests/python/test_sparse_linear_solver.py. (#8038) (by **Qian Bao**)
-   - [lang] [test] Fixed logical operation on numeric values and added support on real type (by **listerily**)
-   - [test] Added test for bool operations (by **listerily**)
-   - [lang] Added u1 as boolean type to taichi lang, replacing i32 (by **listerily**)
-   - [ir] Fix: Updated type check for logical not (by **listerily**)
-   - [ir] Constant fold support for u1 (by **listerily**)
-   - [spirv] [ir] Support type u1 as arg, in buffer and as return value (by **listerily**)
-   - [lang] [ir] Add logical and, logical or in ir (by **listerily**)
-   - [lang] Support ti.types.vector and matrix as type annotation (by **Ailing Zhang**)
-   - [Lang] Migrate irpass::scalarize() after irpass::make_thread_local() (#8028) (by **Zhanlue Yang**)
-   - [cuda] Add fast intrinsics support to sin / cos / log (#7991) (by **Bob Cao**)
-   - [ci] Fix nightly builds (#8022) (by **Proton**)
-   - [lang] Update ticache to cache (#8020) (by **Nanase**)
-   - [aot] Remove redundant copy of libtaichi_c_api.so to build (#8014) (by **Ailing**)
-   - [cuda] Fix LLVM preallocate memory size calculation (#8000) (by **Bob Cao**)
-   - [refactor] Simplify ndarray dtype check_match logic (by **Ailing Zhang**)
-   - [refactor] Remove redundant TensorType class in python scope (by **Ailing Zhang**)
-   - [Lang] Support allocate with cuda memory pool and reduce preallocation size accordingly (#7929) (by **Zhanlue Yang**)
-   - [llvm] Simplified and add support for type u1 in logical not operation (by **listerily**)
-   - [ir] Update codegen for `if` `while` `assert` to support type u1. (by **listerily**)
-   - [lang] Added ti.u1 definition (by **listerily**)
-   - [aot] Export aot kernels with decorator properly (#8016) (by **PENGUINLIONG**)
-   - [Lang] Migrate irpass::scalarize() after irpass::demote_no_access_mesh_fors() (#7956) (by **Zhanlue Yang**)
-   - [refactor] Let the type of reference arguments be a pointer (by **lin-hitonami**)
-   - [Misc] Make clang-tidy happy on 'explicit' (#7999) (by **秋云未云**)
-   - [Lang] Fix error with irpass::check_out_of_bound() for TensorTyped ExternalPtrStmt (#7997) (by **Zhanlue Yang**)
-   - [Opengl] Fix: runtime caught error cannot be displayed in opengl (#7998) (by **秋云未云**)
-   - [ci] Add dedicated build pipeline (by **Proton**)
-   - [build] Guard Windows LTO with flags (by **Proton**)
-   - [build] Use Ninja and MSVC to build on Windows (by **Proton**)
-   - [build] Not generating PDB files by default (for compilation caching) (by **Proton**)
-   - [ci] Tag wheel with TI_WITH_xxx tags (by **Proton**)
-   - [ci] build.py: Add nice when compiling (by **Proton**)
-   - [ci] Do not try to terminate sccache server after compilation (by **Proton**)
-   - [misc] Do not print CHANGELOG when specified --save (make_changelog.py) (by **Proton**)
-   - [Lang] Migrate irpass::scalarize() after irpass::demote_atomics() (#7943) (by **Zhanlue Yang**)
-   - [Doc] Fix a bug in faq.md (#7992) (by **Zhao Liang**)
-   - [Doc] Fix problems in type_system.md (#7949) (by **秋云未云**)
-   - [refactor] Simplify arg_features logic for ndarray (#7974) (by **Ailing**)
-   - [bug] Fix extraction of field with offset to external array (#7945) (by **魔法少女赵志辉**)
-   - [autodiff] Support passing vector/matrix args in autodiff kernel (#7973) (by **Ailing**)
-   - [lang] Fix failure with taichi_benchmark (#7975) (by **Zhanlue Yang**)
-   - [doc] Update contents in linear_solver.md (#7967) (by **Qian Bao**)
-   - [build] Guard CLANG_EXECUTABLE verifications with TI_WITH_LLVM (#7969) (by **Zhanlue Yang**)
-   - [refactor] [llvm] Allocate the runtime context for real functions on the stack (#7971) (by **Lin Jiang**)
-   - [autodiff] Support passing requires_grad=True tensor to an arg with needs_grad=False (#7970) (by **Ailing**)
-   - [CUDA] Warn driver version if it doesn't support memory pool. (#7912) (by **Haidong Lan**)
-   - [autodiff] Support grad tensor in primal kernel (#7962) (by **Ailing**)
-   - [lang] Improve misc error report related to passing ndarray to a kernel (#7966) (by **Ailing**)
-   - [Doc] Add doc about struct arguments (#7959) (by **Lin Jiang**)
-   - [Error] Update deprecation warning of the graph arguments (#7965) (by **Lin Jiang**)
-   - [autodiff] Support ExternalTensorShapeAlongAxis in autodiff (#7963) (by **Ailing**)
-   - [windows] Workaround C++ mangling special chars (#7964) (by **Ailing**)
-   - [doc] Add sparse grid example. (#7858) (by **chunleili**)
-   - [Lang] Separate out preallocation logics for runtime objects (#7938) (by **Zhanlue Yang**)
-   - [Lang] Remove deprecated funcs in __init__.py (#7941) (by **Lin Jiang**)
-   - [build] Remove redundant C-API shared object in wheel (#7950) (by **Proton**)
-   - [Bug] [spirv] Fix the insertion position of the access chain (#7957) (by **Lin Jiang**)
-   - [doc] Split linear solver article from sparse_matrix.md. (#7921) (by **Qian Bao**)
-   - [Lang] Remove deprecated sparse_matrix_builder function (#7942) (by **Lin Jiang**)
-   - [lang] Refactor cg solvers (#7911) (by **Qian Bao**)
-   - [Lang] Remove deprecated compile option ndarray_use_cached_allocator (#7937) (by **Zhanlue Yang**)
-   - [Lang] Migrate irpass::scalarize() after irpass::detect_read_only() (#7939) (by **Zhanlue Yang**)
-   - [bug] Fix vector/matrix dtype created in the python scope (#7948) (by **Ailing**)
-   - [Lang] Remove deprecated funcs in ti.ui (#7940) (by **Lin Jiang**)
-   - [Lang] Remove the support for 'is' (#7930) (by **Lin Jiang**)
-   - [Opt] Make merging casts int(int(x)) less aggressive (#7944) (by **Ailing**)
-   - [Lang] Migrate irpass::scalarize() after irpass::offload() (#7919) (by **Zhanlue Yang**)
-   - [Lang] Raise error when the dimension of the ndrange does not equal the number of loop variables (#7933) (by **Lin Jiang**)
-   - [Lang] Remove a.atomic(b) (#7925) (by **Lin Jiang**)
-   - [Opt] Fix redundant clone of stmts across offloaded tasks (#7927) (by **Ailing**)
-   - [Lang] Cancel deprecating native min/max (#7928) (by **Lin Jiang**)
-   - [Lang] Fix the api doc search problem (#7918) (by **Zhao Liang**)
-   - [Refactor] Refactor the argument passing logic of rwtexture and remove extra_args (#7914) (by **Lin Jiang**)
-   - [refactor] Let the matrix argument be compiled as a struct (by **lin-hitonami**)
-   - [ci] Build.py: Source generated env in new spawned shell (by **Proton**)
-   - [misc] Fix changelog commit extract code (by **Proton**)
-   - [Doc] Fix docstring of mix function (#7922) (by **Zhao Liang**)
-   - [ci] More robust build.py bootstrapping (#7920) (by **Proton**)
-   - [example] PR apply for an example of differential algorithm (#7881) (by **Nanase**)
-   - [lang] Postpone scalarization to after flag access (#7890) (by **魔法少女赵志辉**)
-   - [refactor] Remove PyTaichi.compiled_functions (#7867) (by **PGZXB**)
-   - [example] Fix ti example bugs (#7903) (by **Zhao Liang**)
-   - [lang] Postpone scalarization to after out-of-bound checking (#7872) (by **魔法少女赵志辉**)
-   - [Lang] Move irpass::scalarize() after irpass::auto_diff() (#7902) (by **Zhanlue Yang**)
-   - [Lang] Fix Ndarray fill with Matrix/Vector typed values (#7901) (by **Zhanlue Yang**)
-   - [autodiff] Make loss seed only set once in the tape (#7910) (by **Mingrui Zhang**)
-   - [Lang] Add cast to field.fill() interface (#7899) (by **Zhanlue Yang**)
-   - [Lang] [bug] Let nested data classes have methods (#7909) (by **Lin Jiang**)
-   - [cuda] Only set CU_LIMIT_STACK_SIZE when necessary (#7906) (by **Ailing**)
-   - [Lang] Let kernel argument support matrix nested in a struct (by **lin-hitonami**)
-   - [autodiff] Fix missing grad with using torch tensor with tape (#7898) (by **Ailing**)
-   - [lang] Support TensorType for AutoDiff (#7846) (by **Zhanlue Yang**)
-   - [example] Raise warning message for ngp renderer (#7434) (by **Zhao Liang**)
-   - [ci] Build.py: Do not try to bootstrap pip (too many issues) (#7897) (by **Proton**)
-   - [ci] Update AMDGPU LLVM & base image (#7895) (by **Proton**)
-   - [ci] Build.py quirks fix (#7894) (by **Proton**)
-   - [test] Skip torch involved ad_ndarray tests if torch is not installed (#7893) (by **Proton**)
-   - [refactor] Improve AMDGPU kernel launch logic for external arrays and ndarrays (#7883) (by **Zeyu Li**)
-   - [autodiff] Enforce strict check for ndarrays when used in ti.ad.Tape (by **Ailing Zhang**)
-   - [refactor] Get rid of has_grad in LaunchContextBuilder (by **Ailing Zhang**)
-   - [refactor] Rename get_num_elements in StructType to get_flattened_num_elements (by **Ailing Zhang**)
-   - [autodiff] Support autodiff for torch Tensor and taichi ndarray on CPU and CUDA (by **Ailing Zhang**)
-   - [Bug] Fix wrong datatype size when writing to ndarray from Python scope (by **Ailing Zhang**)
-   - [lang] Support 0 dim ndarray read & write in python scope (by **Ailing Zhang**)
-   - [Doc] Update faq and ggui, and add them to CI (#7861) (by **Zhao Liang**)
-   - [refactor] Replace TaichiLLVMContext::jit with LlvmRuntimeExecutor::jit_session_ (#7868) (by **PGZXB**)
-   - [build] Remove unused apt pkg 'libmirclient-dev' to make 'build.py' run properly on ubuntu 22.04 (#7871) (by **Yu Zhang**)
-   - [Lang] Support the functions of dataclass as kernel argument and return value (#7865) (by **Lin Jiang**)
-   - [Lang] Fix a bug on PosixPath (#7860) (by **Zhao Liang**)
-   - [refactor] Improve CUDA kernel launch logic for external arrays and ndarrays (by **Ailing Zhang**)
-   - [spirv] Fix generating array type in SPIR-V (#7863) (by **Lin Jiang**)
-   - [rhi] Remove `fetch_result_uint64` from device api (#7851) (by **Bob Cao**)
-   - [lang] Add grad_ptr into ndarray struct (by **Ailing Zhang**)
-   - [refactor] Get rid of unused is_grad in ArgLoadStmt (by **Ailing Zhang**)
-   - [refactor] Split Program::compile() (#7847) (by **PGZXB**)
-   - [ci] Polishing build.py, wave 4 (#7857) (by **Proton**)
-   - [refactor] Add the shape of the Ndarray to the argument struct (by **lin-hitonami**)
-   - [lang] [ir] Added atomic multiplication support for all backends (#7854) (by **秋云未云**)
-   - [build] Use LLVM without zstd dependency on M1 Macs (#7856) (by **Proton**)
-   - [sparse] Add sparse_grid (#7832) (by **chunleili**)
-   - [bug] Fix rerun tests with offline cache (#7852) (by **PGZXB**)
-   - [refactor] Compile the Ndarray argument to a struct (by **lin-hitonami**)
-   - [spirv] Support struct as kernel argument (by **Lin Jiang**)
-   - [doc] Update dev_install.md to reflect build.py usage (#7848) (by **Proton**)
-   - [Lang] Postpone MatrixType scalarization to irpass::differentiation_validation_check() (#7839) (by **Zhanlue Yang**)
-   - [ci] Polishing build.py, wave 3 (#7845) (by **Proton**)
-   - [bug] Allow to re-write the offline cache files (by **PGZXB**)
-   - [misc] Deserialize the offline cache structs on strict mode (by **PGZXB**)
-   - [Lang] Postpone MatrixType scalarization to irpass::gather_meshfor_relation_types() (#7838) (by **Zhanlue Yang**)
-   - [lang] Refactor allocation logic for SNodeTreeBufferManager (#7795) (by **Zhanlue Yang**)
-   - [bug] Adjust build structure for RHI to resolve undefined symbol problem with window_system (#7827) (by **Zhanlue Yang**)
-   - [lang] Postpone scalarize and lower_matrix_ptr to after full_simplify I (#7798) (by **Zhanlue Yang**)
-   - [Doc] Add kernel sync doc (#7831) (by **Zhao Liang**)
-   - [cc] Remove cc backend (by **lin-hitonami**)
-   - [spirv] Fix the ret type of frexp (by **lin-hitonami**)
-   - [misc] Bump version to v1.7.0 (#7841) (by **Proton**)
-   - [lang] Rewrite scalarization for PrintStmt (#7835) (by **魔法少女赵志辉**)
-   - [lang] Add popcnt to llvm intrinsic support (#7772) (by **Garry Ling**)
-   - [Doc] Update doc for dynamic snode (#7804) (by **Zhao Liang**)
-   - [ci] Fix release build failure (#7834) (by **Proton**)
-   - [ci] More robust build.py bootstrapping (#7833) (by **Proton**)
-   - [Doc] Update field.md (#7819) (by **zhoooou**)
-   - [autodiff] Remove redundant autodiff mode in kernel name (#7829) (by **Ailing**)
-   - [lang] Migrate Caching Allocation logics from CudaDevice/AmdgpuDevice to DeviceMemoryPool (#7793) (by **Zhanlue Yang**)
-   - [misc] Resolve code formatter frictions (#7828) (by **Proton**)
-   - [Lang] Separate out the scalarization for MatrixOfMatrixPtrStmt and MatrixOfGlobalPtrStmt (#7803) (by **Zhanlue Yang**)
-   - [bug] Fix imgui_context in destroying multiple GGUI windows (#7812) (by **Ailing**)
-   - [misc] Update git-blame-ignore-revs (#7825) (by **Proton**)
-   - [ci] Complete doc test list, remove redundant default prelude (#7823) (by **Proton**)
-   - [misc] Relax Black formatter line length limit to 120 (#7824) (by **Proton**)
-   - [Doc] Update readme (#7808) (by **yanqingzhang**)
-   - [misc] Switch code formatter from `yapf` to `black` (#7785) (by **Proton**)
-   - [CUDA] Better handling shared array shape check (#7818) (by **Haidong Lan**)
-   - [misc] Improve ::liong::json::deserialize() (by **PGZXB**)
-   - [bug] Fix gen_offline_cache_key (#7810) (by **PGZXB**)
-   - [ci] Fix build.py ensurepip (#7811) (by **Proton**)
-   - [Lang] Fix pylance warning (#7805) (by **Zhao Liang**)
-   - [lang] Support frexp on spirv-based backends (#7770) (by **Ailing**)
-   - [lang] Split MemoryPool into DeviceMemoryPool and HostMemoryPool (#7786) (by **Zhanlue Yang**)
-   - [misc] Optimize import overhead: pytorch and get_clangpp (#7797) (by **Haidong Lan**)
-   - [ci] [doc] Tighten up document testing (#7801) (by **Proton**)
-   - [ci] Polishing build.py, wave 2 (#7800) (by **Proton**)
-   - [aot] Remove unused AotDataConverter (#7799) (by **Lin Jiang**)
-   - [perf] Fix Taichi CPU backend compile parameter to pair performance with Numba. (#7731) (by **zhengxianli**)
-   - [ci] Polishing build.py (#7794) (by **Proton**)
-   - [bug] Returning nan for ti.sym_eig on identity matrix (#7443) (by **Yimin Tang**)
-   - [Lang] Support taking structs as kernel arguments (by **lin-hitonami**)
-   - [ir] Add 'create_load' to ArgLoadStmt (by **lin-hitonami**)
-   - [ir] Let the src of GetElementStmt be a pointer (by **lin-hitonami**)
-   - [lang] Clean up runtime allocation functions (#7773) (by **Zhanlue Yang**)
-   - [lang] Migrate CUDA preallocation logic to CudaMemoryPool (#7746) (by **Zhanlue Yang**)
-   - [gfx] Fix runtime buffer/image copy barrier semantics (#7781) (by **Bob Cao**)
-   - [misc] Remove unnecessary TaskCodeGenLLVM::task_counter (#7777) (by **PGZXB**)
-   - [ci] Temporarily force Windows release builds to run on sm70 nodes (#7767) (by **Proton**)
-   - [refactor] Remove Kernel::lowered_ (#7765) (by **PGZXB**)
-   - [gui] Fluid visualization utilities (#7682) (by **Qian Bao**)
-   - [Lang] Fix math module circular import bugs (#7762) (by **Zhao Liang**)
-   - [misc] Make pre-commit happy (#7768) (by **Proton**)
-   - [ci] Build iOS AOT static library (by **Proton**)
-   - [misc] Wrap path with std::filesystem::path (#7754) (by **Bob Cao**)
-   - [lang] Support vector and matrix dtypes in ti.field (#7761) (by **Ailing**)
-   - [ir] Remove unnecessary field_dims_ in ArgLoadStmt (#7755) (by **Ailing**)
-   - [refactor] Remove Kernel::task_counter_ (#7751) (by **PGZXB**)
-   - [ci] Build.py: Introduce TAICHI_CMAKE_ARGS manager for better log readability (by **Proton**)
-   - [ci] Reorganize build.py code (by **Proton**)
-   - [refactor] Let KernelCompilationManager manage kernel compilation in gfx::AotModuleBuilderImpl (#7715) (by **PGZXB**)
-   - [misc] Remove unused FullSimplifyPass::Args::program (#7750) (by **PGZXB**)
-   - [refactor] Re-impl LlvmAotModule using LLVM::KernelLauncher (#7744) (by **PGZXB**)
-   - [lang] Implement experimental CG(Conjugate Gradient) solver in Taichi-lang (#7690) (by **Qian Bao**)
-   - [lang] Transform bit_shr to bit_sar for uint (#7757) (by **Ailing**)
-   - [ir] Postpone scalarize and lower_matrix_ptr to after bit loop vectorization (#7726) (by **魔法少女赵志辉**)
-   - [ci] Isolate post sm70 tests (#7740) (by **Proton**)
-   - [cuda] Support using SparseMatrix on more CUDA versions (#7724) (by **Yu Zhang**)
-   - [cuda] Update the data layout of CUDA (#7748) (by **Lin Jiang**)
-   - [ci] Ignore dup benchmark data points (#7749) (by **Proton**)
-   - [bug] Fix reduction of atomic max (#7747) (by **Lin Jiang**)
-   - [Doc] Update write_test.md (#7745) (by **Qian Bao**)
-   - [refactor] Remove 'args' from 'RuntimeContext' (by **lin-hitonami**)
-   - [gfx] Let gfx backends use LaunchContextBuilder to build arguments in struct type (by **lin-hitonami**)
-   - [gfx] [refactor] Convert f16 in LaunchContextBuilder (by **lin-hitonami**)
-   - [gfx] Record the struct type of arguments and results in KernelContextAttributes (by **lin-hitonami**)
-   - [gfx] Compile struct type of result and arguments in gfx backends (by **lin-hitonami**)
-   - [refactor] Implement CompiledKernelData::check() (#7743) (by **PGZXB**)
-   - [doc] [test] Update docs for printing with f-strings and formatted strings (#7733) (by **魔法少女赵志辉**)
-   - [lang] Improve error message for mismatched index for ndarrays in python scope (#7737) (by **Ailing**)
-   - [bug] Avoid redundant cache loading (#7741) (by **PGZXB**)
-   - [refactor] Let KernelCompilationManager manage kernel compilation in LlvmAotModuleBuilder (#7714) (by **PGZXB**)
-   - [ci] Skip large shared memory test for Turing GPUs. (#7739) (by **Haidong Lan**)
-   - [cuda] Remove deprecated cusparse functions (#7725) (by **Yu Zhang**)
-   - [misc] Update pull_request_template.md (#7738) (by **Ailing**)
-   - [misc] Remove TI_WARN for cuda in memory_pool.cpp (#7734) (by **Ailing**)
-   - [CUDA] Support large shared memory for CUDA backend (#7452) (by **Haidong Lan**)
-   - [vulkan] Update SPIR-V codegen to emit FP16 consts (#7676) (by **Bob Cao**)
-   - [lang] Support frexp on cuda backend (#7721) (by **Ailing**)
-   - [refactor] Unify implementation of ProgramImpl::compile() (by **PGZXB**)
-   - [refactor] Introduce LLVM::KernelLauncher (by **PGZXB**)
-   - [refactor] Introduce gfx::KernelLauncher (by **PGZXB**)
-   - [test] Enable test offline cache on amdgpu and dx11 (#7703) (by **PGZXB**)
-   - [lang] Refactor ownership and inheritance of allocators (#7685) (by **Zhanlue Yang**)
-   - [ci] Fix git cache quirks (#7722) (by **Proton**)
-   - [lang] Improve error msg in create ndarray (#7709) (by **Garry Ling**)
-   - [Doc] Update performance.md (#7720) (by **Zhao Liang**)
-   - [bug] Switch the gallery image used by README. (#7716) (by **Chengchen(Rex) Wang**)
-   - [lang] Merge AMDGPUCachingAllocator to the generic CachingAllocator (#7717) (by **Zhanlue Yang**)
-   - [bug] Invalidate Field cache, RWAccessors cache, and Kernel cache upon SNodeTree destruction (#7704) (by **Zhanlue Yang**)
-   - [ci] [test] Enable cc test on CI (by **lin-hitonami**)
-   - [test] [cc] Skip tests that cc backend doesn't support (by **lin-hitonami**)
-   - [test] Exclude the cc backend from tests that involve dynamic indexing (#7705) (by **魔法少女赵志辉**)
-   - [bug] Fix camera controls (#7681) (by **liblaf**)
-   - [bug] [cc] Fix comparison op in cc backend (by **Lin Jiang**)
-   - [bug] [cc] Set external ptr for cc backend (by **lin-hitonami**)
-   - [lang] Merged VirtualMemoryAllocator into MemoryPool for LLVM-CPU backend (#7671) (by **Zhanlue Yang**)
-   - [misc] Remove useless JITEvaluatorId (#7700) (by **PGZXB**)
-   - [bug] Fix build failure with clang on Windows (#7699) (by **PGZXB**)
-   - [Lang] Support formatted printing in str.format() and f-strings (#7686) (by **魔法少女赵志辉**)
-   - [ci] Git caching proxy in CI (#7692) (by **Proton**)
-   - [build] Let msvc generate pdb for cpp & c_api tests (by **lin-hitonami**)
-   - [refactor] Stop storing pointers to array devallocs in kernel args (by **lin-hitonami**)
-   - [aot] Implement bin2c in AOT cppgen (#7687) (by **PENGUINLIONG**)
-   - [cpu] Remove atomics demotion for single-thread CPU targets. (#7631) (by **Haidong Lan**)
-   - [aot] Export templated kernels (#7683) (by **PENGUINLIONG**)
-   - [ci] Revive /benchmark (#7680) (by **Proton**)
-   - [Doc] Update readme (#7673) (by **Zhao Liang**)
-   - [misc] Device API public headers and CMake rework part 1 (#7624) (by **Bob Cao**)
-   - [misc] Move optimize cpu module to KernelCodeGen (#7667) (by **PGZXB**)
-   - [lang] [ir] Extract and save the format specifiers in str.format() (#7660) (by **魔法少女赵志辉**)
-   - [example] Add 2D euler fluid simulation example (#7568) (by **Lee-abcde**)
-   - [wasm] Remove WASM backend (by **lin-hitonami**)
-   - [build] Fix ssize_t type undefined errors when building with TI_WITH_LLVM=OFF on windows (#7665) (by **Yu Zhang**)
-   - [misc] Remove unused Kernel::is_evaluator (#7669) (by **PGZXB**)
-   - [misc] Remove unused Program::jit_evaluator_cache and Program::jit_evaluator_cache_mut (#7668) (by **PGZXB**)
-   - [misc] Simplify test_offline_cache.py (#7663) (by **PGZXB**)
-   - [lang] Improve error reporting for FieldsBuilder finalization (#7640) (by **Zhanlue Yang**)
-   - [misc] Rename taichi::lang::llvm to taichi::lang::LLVM (#7659) (by **PGZXB**)
-   - [refactor] Remove MemoryPool daemon in LLVM runtime (#7648) (by **Zhanlue Yang**)
-   - [opt] Clean up unnecessary options in constant fold pass (#7661) (by **Ailing**)
-   - [ci] Use build.py to prepare testing environment on Windows (#7658) (by **Proton**)
-   - [opt] Move binary jit evaluator to host (by **Ailing Zhang**)
-   - [test] Update C++ constant fold tests to test operator one by one (by **Ailing Zhang**)
-   - [aot] Avoid shared library file being packaged into wheel data (#7652) (by **Chenzhan Shang**)
-   - [ci] Fix scipy install (#7649) (by **Proton**)
-   - [misc] Remove an unnecessary parameter of KernelCompilationManager::make_filename (by **PGZXB**)
-   - [refactor] Remove some unnecessary functions of KernelCodeGen (by **PGZXB**)
-   - [refactor] Re-impl JIT and Offline Cache on LLVM backends (by **PGZXB**)
-   - [refactor] Implement llvm::KernelCompiler (by **PGZXB**)
-   - [refactor] Gen code for KernelCodeGen::ir instead of KernelCodeGen::kernel->ir (by **PGZXB**)
-   - [Doc] Update tutorial.md (#7512) (by **Chenzhan Shang**)
-   - [ci] Test manylinux2014 build on PR (#7647) (by **Proton**)
-   - [bug] Fix logical comparison returns -1 (#7641) (by **Ailing**)
-   - [doc] Fix gui_system.md tests (#7646) (by **Proton**)
-   - [Doc] Update gui_system.md (#7628) (by **Qian Bao**)
-   - [aot] Hand-written CMake target script (#7644) (by **PENGUINLIONG**)
-   - [ci] Do not use Android toolchain for perf testing (#7642) (by **Proton**)
-   - [ci] Support Python 3.11 (#7627) (by **Proton**)
-   - [build] Setup Android SDK environment for performance bot (#7635) (by **Zhanlue Yang**)
-   - [ci] Update perf mon image (#7639) (by **Proton**)
-   - [ci] Fix perf mon break (#7638) (by **Proton**)
-   - [doc] Add documentation on using ghstack (#7632) (by **Proton**)
-   - [build] Static linking libstdc++ on Linux (by **Proton**)
-   - [ci] Rewrite Dockerfiles (by **Proton**)
-   - [ci] Resolve "Needed single revision" workaround failure when the repo directory is empty (#7633) (by **Proton**)
-   - [Vulkan] Fix repeated generation of array ranges in spirv codegen. (#7625) (by **Haidong Lan**)
-   - [build] Switch to use docker with Android-SDK for performance bot (#7630) (by **Zhanlue Yang**)
-   - [opengl] glfw finalize crash fix (by **Proton**)
-   - [ci] build.py: Android support, entering shell, export env (by **Proton**)
-   - [ci] Do not run tests with mixed backends (by **Proton**)
-   - [refactor] Use f16 function from external lib (by **lin-hitonami**)
-   - [refactor] Migrate members from RuntimeContext to LaunchContextBuilder (by **lin-hitonami**)
-   - [bug] Fix setting arguments exceeding the max arg num (by **lin-hitonami**)
-   - [cpu] Explicitly make cpu multithreading loop for range-fors. (#7593) (by **Haidong Lan**)
-   - [aot] Fixed generator for compute graph (#7626) (by **PENGUINLIONG**)
-   - [ir] Postpone scalarize and lower_matrix_ptr to after typecheck (#7589) (by **魔法少女赵志辉**)
-   - [aot] Header generator completed (#7609) (by **PENGUINLIONG**)
-   - [amdgpu] Initialize AMDGPUContext with defaults (by **Proton**)
-   - [build] Remove libSPIRV-Tools-shared.(so|dll) in wheel (by **Proton**)
-   - [lang] Removed cpu_device(), cuda_device(), and amdgpu_device() from LlvmRuntimeExecutor (#7544) (by **Zhanlue Yang**)
-   - [refactor] Remove the get/set functions in RuntimeContext (by **lin-hitonami**)
-   - [aot] Pass LaunchContextBuilder to CompiledGraph::init_runtime_context (by **lin-hitonami**)
-   - [gfx] Let GfxRuntime use LaunchContextBuilder (by **lin-hitonami**)
-   - Let LaunchContextBuilder be the argument of the kernel launch function (by **lin-hitonami**)
-   - [llvm] [refactor] Set the llvm runtime when executing (by **lin-hitonami**)
-   - [refactor] Migrate {set, get}_{arg, ret} functions from RuntimeContext (by **lin-hitonami**)
-   - [bug] Fix compilation error (#7606) (by **PGZXB**)
-   - [aot] Hide map memory failure (#7604) (by **PENGUINLIONG**)
-   - [refactor] Fix KernelCodeGen::kernel from Kernel * to const Kernel * (by **PGZXB**)
-   - [refactor] Remove legacy implementation of llvm offline cache (by **PGZXB**)
-   - [refactor] Impl llvm::CompiledKernelData (by **PGZXB**)
-   - [bug] Type check for logical not op with real type inputs (#7600) (by **Ailing**)
-   - [bug] Improve ndarray creation to fix segmentation fault (#7577) (by **pengyu**)
-   - [lang] Add assembly printer for CPU backend (#7590) (by **Zhanlue Yang**)
-   - [misc] Update docker file (#7598) (by **Zeyu Li**)
-   - [aot] Fix absolute path in generated TaichiTargets.cmake (#7597) (by **Chenzhan Shang**)
-   - [Doc] Remove deprecated api docstrings (#7596) (by **pengyu**)
-   - [llvm] Compile the kernel arguments to a StructType (by **Lin Jiang**)
-   - [lang] Fix issue with llvm opaque pointer (#7557) (by **Zhanlue Yang**)
-   - [opt] Constant folding for unary ops on host (#7573) (by **Ailing**)
-   - [bug] Type check for bit_not op with real type inputs (#7592) (by **Ailing**)
-   - [Doc] Fix the cexp docstring (#7588) (by **Zhao Liang**)
-   - [Lang] Replace internal representation of Python-scope ti.Matrix with numpy arrays (#7559) (by **Yi Xu**)
-   - [bug] Avoid cuda compilation via clang and ship pre-compiled .bc file instead (#7570) (by **Zhanlue Yang**)
-   - [aot] Taichi kernel AOT command (#7565) (by **PENGUINLIONG**)
-   - [bug] Fix struct members registered to StructField class (#7574) (by **Ailing**)
-   - [aot] Mobile platform AOT build scripts (#7567) (by **PENGUINLIONG**)
-   - [misc] Revert "Security upgrade ipython from 7.34.0 to 8.10.0 (#7341)" (#7571) (by **Proton**)
-   - [test] Add cpp tests for constant folding pass (#7566) (by **Ailing**)
-   - [misc] Security upgrade ipython from 7.34.0 to 8.10.0 (#7341) (by **Chengchen(Rex) Wang**)
-   - [lang] Refactor CudaCachingAllocator into a more generic caching allocator (#7531) (by **Zhanlue Yang**)
-   - [aot] Load GfxRuntime140 module from TCM (#7539) (by **PENGUINLIONG**)
-   - [lang] Fixed useless serial shader to blit ExternalTensorShapeAlongAxisStmt on Metal (#7562) (by **PENGUINLIONG**)
-   - [aot] Enable Vulkan 8bit storage (#7564) (by **PENGUINLIONG**)
-   - [bug] Fix crashing on printing FrontendFuncCallStmt with no return value (by **lin-hitonami**)
-   - [refactor] Remove LaunchContextBuilder::set_arg_raw (by **lin-hitonami**)
-   - [llvm] Generalize TaskCodeGenLLVM::create_return to set_struct_to_buffer (by **lin-hitonami**)
-   - [bug] Fix Cuda memory leak during TiRuntime destruction (#7345) (by **Zhanlue Yang**)
-   - [ir] Let void struct type represent void type (by **lin-hitonami**)
-   - [aot] Let C-API use LaunchContextBuilder to manage RuntimeContext (by **lin-hitonami**)
-   - [ir] Let the reference type declare a pointer argument (by **lin-hitonami**)
-   - [Doc] Add doc about returning struct (#7556) (by **Lin Jiang**)
-   - [bug] Fix returning struct containing vec3 (#7552) (by **Lin Jiang**)
-   - [lang] [ir] Extract and save the format specifiers in the f-string (#7514) (by **魔法少女赵志辉**)
-   - [Lang] Stop letting ti.Struct inherit from TaichiOperations (#7474) (by **Yi Xu**)
-   - [aot] Recover AOT CI branch names (#7543) (by **PENGUINLIONG**)
-   - [aot] Put TiRT in Python wheel and CMake script to find it in wheel (#7537) (by **PENGUINLIONG**)
-   - [refactor] Remove the difficult-to-implement CompiledKernelData::size() (#7540) (by **PGZXB**)
-   - [bug] Implement the missing clone function for FrontendFuncCallStmt (#7538) (by **PGZXB**)
-   - [misc] Bump version to v1.6.0 (#7536) (by **Haidong Lan**)
-   - [doc] Handle 2 digit minor versions correctly (#7535) (by **Ritoban Roy-Chowdhury**)
-   - [aot] GfxRuntime140 convention docs (#7527) (by **PENGUINLIONG**)
-   - [rhi] Refactor allocate_memory API to use RhiResult (#7463) (by **Bob Cao**)
-   - [metal] Choose the proper msl version according to the device capability (#7506) (by **Yu Zhang**)
-   - [Lang] Support writing sparse matrix as matrix market file (#7529) (by **pengyu**)
-   - [Lang] Keep ti.pyfunc (#7530) (by **Lin Jiang**)
-   - [bug] Fix symbol conflicts with taichi_cpp_tests (#7528) (by **Zhanlue Yang**)
-   - [bug] Fix numerical issue with TensorType'd arithmetics (#7526) (by **Zhanlue Yang**)
-   - [aot] Enable Metal AOT test (#7461) (by **PENGUINLIONG**)
-   - [Doc] Update GGUI docs with correct API (#7525) (by **pengyu**)
-   - [misc] Implement KernelCompilationManager::clean_offline_cache (#7515) (by **PGZXB**)
-   - [ir] Exempt shared array from demote atomics pass (#7513) (by **Haidong Lan**)
-   - [bug] Fix error with windows-clang compilation for cuda_runtime.cu (#7519) (by **Zhanlue Yang**)
-   - [misc] Deprecate field dim and update deprecation warnings (#7491) (by **Haidong Lan**)
-   - [build] Fix build failure without nvcc (#7521) (by **Ailing**)
-   - [Doc] Fix typos and improve example code in data_oriented_class.md (#7520) (by **pengyu**)
-   - [aot] Kernel argument count limit (#7518) (by **PENGUINLIONG**)
-   - [Doc] Update gui_system.md, remove unnecessary example (#7487) (by **NextoneX**)
-   - [AOT] [llvm] Let AOT kernel inherit CallableBase and use LaunchContextBuilder (by **lin-hitonami**)
-   - [llvm] Let the offline cache record the type info of arguments and return values (by **lin-hitonami**)
-   - [ir] Separate LaunchContextBuilder from Kernel (by **lin-hitonami**)
-   - [Doc] Fix typo in API doc (#7511) (by **pengyu**)
-   - [aot] Build Runtime C-API by default (#7508) (by **PENGUINLIONG**)
-   - [bug] Fix run_tests.py --with-offline-cache (#7507) (by **PGZXB**)
-   - [vulkan] Support printing constant strings containing % (#7499) (by **魔法少女赵志辉**)
-   - [ci] Fix nightly version number, 2nd try (#7501) (by **Proton**)
-   - [aot] Fixed memory leak in metal backend (#7500) (by **PENGUINLIONG**)
-   - [ci] Fix nightly version number issue (#7498) (by **Proton**)
-   - [example] Remove cv2, cairo dependency (#7496) (by **Zhao Liang**)
-   - [type] Let Type * be serializable (by **lin-hitonami**)
-   - [ci] Second attempt at permission check for ghstack landing (#7490) (by **Proton**)
-   - [docs] Reword words of warning about building from source (#7488) (by **Anselm Schüler**)
-   - [lang] Fixed double release of Metal command buffer (#7484) (by **PENGUINLIONG**)
-   - [ci] Switch Android bots lock redis to bot-master (#7482) (by **Proton**)
-   - [ci] Status check of ghstack CI bot (#7479) (by **Proton**)
-   - [Lang] Type check assignments between tensors (#7480) (by **Yi Xu**)
-   - [doc] Fix typo in ndarray.md (#7476) (by **Chenzhan Shang**)
-   - [opt] Enable half2 optimization for atomic_add operations on CUDA backend (#7465) (by **Zhanlue Yang**)
-   - [Lang] Fix pylance warnings raised by ti.static (#7437) (by **Zhao Liang**)
-   - Let the LaunchContextBuilder manage the result buffer (by **lin-hitonami**)
-   - [ci] Fix nightly build failure, and minor improvements (#7475) (by **Proton**)
-   - [ci] Fix duplicated names in aot tests (#7471) (by **Ailing**)
-   - [lang] Improve float16 support from Taichi type system (#7402) (by **Zhanlue Yang**)
-   - [Lang] Deprecate arithmetic operations and fill() on ti.Struct (#7456) (by **Yi Xu**)
-   - [misc] Add out of bound check for ndarray (#7458) (by **Ailing**)
-   - [aot] Remove graph kernel interfaces (#7466) (by **PENGUINLIONG**)
-   - [llvm] Let the RuntimeContext use the host result buffer (by **lin-hitonami**)
-   - [gui] Fix 3d line drawing & add test (#7454) (by **Bob Cao**)
-   - [lang] Fixed texture assertions (#7450) (by **PENGUINLIONG**)
-   - [aot] Fixed header generator (#7455) (by **PENGUINLIONG**)
-   - [aot] AOT module convention GfxRuntime140 (#7440) (by **PENGUINLIONG**)
-   - [misc] Add an explicit error in cc backend codegen for dynamic indexing (#7449) (by **Ailing**)
-   - [ci] Lower C++ tests concurrency (#7451) (by **Proton**)
-   - [aot] Properly handle texture attributes (#7433) (by **PENGUINLIONG**)
-   - [Lang] Fix pylance warnings raised by ti.random (#7439) (by **Zhao Liang**)
-   - [ir] Get the StructType of the kernel parameters (by **lin-hitonami**)
-   - [ci] Report failure (not throwing exception) when C++ tests fail (#7435) (by **Proton**)
-   - [llvm] Allocate the result buffer from preallocated memory (by **lin-hitonami**)
-   - [vulkan] Fix GGUI and vulkan swapchain on AMD drivers (#7382) (by **Bob Cao**)
-   - [autodiff] Handle return statement (#7389) (by **Mingrui Zhang**)
-   - [misc] Remove unnecessary functions of gfx::AotModuleBuilderImpl (#7425) (by **PGZXB**)
-   - [bug] Fix offline_cache::clean_offline_cache_files (ti cache clean) (#7426) (by **PGZXB**)
-   - [test] Refactor C++ tests runner (#7421) (by **Proton**)
-   - [ci] Adjust perfmon GPU freq (#7429) (by **Proton**)
-   - [misc] Remove AotModuleParams::enable_lazy_loading (#7424) (by **PGZXB**)
-   - [aot] Use graphs.json instead of TCB (#7392) (by **PENGUINLIONG**)
-   - [refactor] Introduce KernelCompilationManager (#7409) (by **PGZXB**)
-   - [IR] Unified type system for internal operations (#6337) (by **daylily**)
-   - [lang] Add is_lvalue() to Expr to check writeback_binary operand (#7414) (by **魔法少女赵志辉**)
-   - [bug] Fix get_error_string ret type typo (#7418) (by **Zeyu Li**)
-   - [aot] Reorganize graph argument creation process (#7412) (by **PENGUINLIONG**)
-   - [Amdgpu] Enable shared array on amdgpu backend (#7403) (by **Zeyu Li**)
-   - [Lang] Fix pylance types warning (#7417) (by **Zhao Liang**)
-   - [aot] Simplify device capability assignment (#7407) (by **PENGUINLIONG**)
-   - [Doc] Update math_module (#7405) (by **Zhao Liang**)
-   - [ci] Lock GPU frequency in perf benchmarking (#7413) (by **Proton**)
-   - [ci] Add 'Needed single revision' workaround to all tasks (#7408) (by **Proton**)
-   - [Doc] Update hello_world.md (#7400) (by **Zhao Liang**)
-   - [refactor] Introduce KernelCompiler and implement spirv::KernelCompiler (#7371) (by **PGZXB**)
-   - [Amdgpu] Add print kernel amdgcn (#7357) (by **Zeyu Li**)
-   - [Doc] Update debugging.md (#7401) (by **Zhao Liang**)
-   - [refactor] Disable ASTSerializer::allow_undefined_visitor (#7391) (by **PGZXB**)
-   - [amdgpu] Enable llvm FpOpFusion option on AMDGPU backend (#7398) (by **Zeyu Li**)
-   - [aot] Add test for shared array (#7387) (by **Ailing**)
-   - [vulkan] Change command list submit error message & misc device API cleanups (#7395) (by **Bob Cao**)
-   - [bug] Fix arch_uses_spirv (#7399) (by **PGZXB**)
-   - [gui] Fix ggui & vulkan swapchain sizes on HiDPI displays (#7394) (by **Bob Cao**)
-   - [Doc] Update hello_world.md (#7380) (by **Zhao Liang**)
-   - [aot] Remove support for depth24stencil8 format on Metal (#7377) (by **PENGUINLIONG**)
-   - [bug] Add DeviceCapabilityConfig to offline cache key (#7384) (by **PGZXB**)
-   - [Doc] Update type.md (#7376) (by **Zhao Liang**)
-   - [refactor] Remove dependencies on Callable::program in cpp tests (#7373) (by **PGZXB**)
-   - [lang] Experimental support of conjugate gradient solver (#7035) (by **pengyu**)
-   - [aot] Metal interop APIs (#7366) (by **PENGUINLIONG**)
-   - [Doc] Update kernel_function.md (#7375) (by **Zhao Liang**)
-   - [gui] Add `fps_limit` for GGUI (#7374) (by **Bob Cao**)
-   - [Doc] Update hello_world.md (#7369) (by **Zhao Liang**)
-   - [aot] Fix blockers in static library build with XCode (#7365) (by **PENGUINLIONG**)
-   - [vulkan] Remove GLFW from Vulkan rhi dependency (#7351) (by **Bob Cao**)
-   - [misc] Remove useless semicolon in llvm_program.h (#7372) (by **PGZXB**)
-   - [Doc] Update hello_world.md (#7368) (by **Zhao Liang**)
-   - [Amdgpu] Add amdgpu backend profiler (#7330) (by **Zeyu Li**)
-   - [lang] Stop broadcasting scalar cond in select statements (#7344) (by **魔法少女赵志辉**)
-   - [bug] Fix validation errors due to inactive VK_KHR_16bit_storage (#7360) (by **Zhanlue Yang**)
-   - [aot] Support texture in Metal (#7363) (by **PENGUINLIONG**)
-   - [Doc] Update data_oriented_class.md (#6790) (by **Zhao Liang**)
-   - [Doc] Update hello_world.md (#7367) (by **Zhao Liang**)
-   - [refactor] Introduce lang::CompiledKernelData (#7340) (by **PGZXB**)
-   - [bug] Fix matrix initialization error with numpy.floating data (#7362) (by **Zhanlue Yang**)
-   - [Doc] Update kernel_function.md (#7364) (by **Zhao Liang**)
-   - [test] [amdgpu] Fix bug with allocs bb in function body (#7308) (by **Zeyu Li**)
-   - [Doc] Update hello_world.md (#7354) (by **Zhao Liang**)
-   - [aot] Fixed C-API docs (#7361) (by **PENGUINLIONG**)
-   - [refactor] Remove dependencies on Callable::program in lang::CompiledGraph::run (#7288) (by **PGZXB**)
-   - [DOC] Update llvm_sparse_runtime.md (#7323) (by **Gabriel Vainer**)
-   - [Doc] Update profiler.md (#7358) (by **Zhao Liang**)
-   - [Doc] Update kernel_function.md (#7356) (by **Zhao Liang**)
-   - [aot] Improve Taichi C++ wrapper implementation (#7347) (by **PENGUINLIONG**)
-   - [Doc] Update tut.md (#7352) (by **Gabriel Vainer**)
-   - [ci] Add doc snippet CI requirements (#7355) (by **Proton**)
-   - [amdgpu] Update device memory free (#7346) (by **Zeyu Li**)
-   - [Doc] Update type.md (#7350) (by **Zhao Liang**)
-   - [aot] Enable 16-bit dtype support for Taichi AOT (#7315) (by **Zhanlue Yang**)
-   - [example] Re-implement the Cornell Box demo with shorter lines of code (#7252) (by **HK-SHAO**)
-   - [aot] AOT CI refactorization (#7339) (by **PENGUINLIONG**)
-   - [llvm] Let the kernel return struct (by **lin-hitonami**)
-   - [Doc] Update hello_world.md (#7337) (by **Zhao Liang**)
-   - [ci] Reduce doc test concurrency (#7336) (by **Proton**)
-   - [ir] Refactor result fetching (by **lin-hitonami**)
-   - [ir] Get the offsets of elements in StructType (by **lin-hitonami**)
-   - [misc] Delete test.py (#7332) (by **Bob Cao**)
-   - [vulkan] More subgroup operations (#7328) (by **Bob Cao**)
-   - [vulkan] Add vulkan profiler (#7295) (by **Haidong Lan**)
-   - [refactor] Move TaichiLLVMContext::runtime_jit_module and TaichiLLVMContext::create_jit_module() to LlvmRuntimeExecutor (#7320) (by **PGZXB**)
-   - [refactor] Remove dependencies on LlvmProgramImpl::get_llvm_context() in TaskCodeGenLLVM (#7321) (by **PGZXB**)
-   - [ci] Checkout with privileged token when landing ghstack PRs (#7331) (by **Proton**)
-   - [ir] Add fields to StructType (by **lin-hitonami**)
-   - [gui] Remove renderable reuse & make renderable immediate (#7327) (by **Bob Cao**)
-   - [Gui] GGUI use shader "factory" (GGUI rework n/N) (#7271) (by **Bob Cao**)
-   - [bug] Fix u64 field cannot be assigned value >= 2 ** 63 (#7319) (by **Lin Jiang**)
-   - [type] Let the compute type of quant uint be unsigned int (by **lin-hitonami**)
-   - [doc] Replace slack with discord (#7318) (by **yanqingzhang**)
-   - [refactor] Change print statement to warnings.warn in taichi.lang.util.warning (#7301) (by **Jett Chen**)
-   - [ci] ChatOps: ghstack land (#7314) (by **Proton**)
-   - [refactor] Remove TaichiLLVMContext::lookup_function_pointer() (#7312) (by **PGZXB**)
-   - [misc] Update MSVC flags (#7254) (by **Bob Cao**)
-   - [doc] [ci] Cover code snippets in docs (#7309) (by **Proton**)
-   - [refactor] Remove dependencies on LlvmProgramImpl::get_llvm_context() in KernelCodeGen (#7289) (by **PGZXB**)
-   - [rhi] Device upload readback functions (#7278) (by **Bob Cao**)
-   - [aot] Fixed external project inclusion (#7297) (by **PENGUINLIONG**)
-   - [Doc] Update append docstring (#7265) (by **Zhao Liang**)
-   - [refactor] Remove dependencies on Callable::program in lang::get_hashed_offline_cache_key (#7287) (by **PGZXB**)
-   - [ci] [amdgpu] Enable amdgpu backend python unit tests (#7293) (by **Zeyu Li**)
-   - [Bug] Fix copy_from() of StructField (#7294) (by **Yi Xu**)
-   - [ci] Adapt new Android phone behavior (#7306) (by **Proton**)
-   - [Bug] Fix caching same loop invariant global vars inside nested fors (#7285) (by **Lin Jiang**)
-   - [amdgpu] Part5 enable the api of amdgpu (#7202) (by **Zeyu Li**)
-   - [amdgpu] Enable struct for on amdgpu backend (#7247) (by **Zeyu Li**)
-   - [misc] Update external/asset which was accidentally downgraded in #7248 (#7284) (by **Lin Jiang**)
-   - [amdgpu] Update runtime module (#7248) (by **Zeyu Li**)
-   - [llvm] Remove unused argument 'arch' in LlvmProgramImpl::get_llvm_context (#7282) (by **Lin Jiang**)
-   - [misc] Remove deprecated kwarg in rw_texture type annotations (#7267) (by **Ailing**)
-   - [ci] Tolerate duplicates when registering version (#7281) (by **Proton**)
-   - [gui] Fix GGUI destruction order (#7279) (by **Bob Cao**)
-   - [doc] Rename /doc/ndarray_android to /doc/tutorial (#7273) (by **Lin Jiang**)
-   - [llvm] Unify the llvm context of host and device (#7249) (by **Lin Jiang**)
-   - [misc] Fix manylinux2014 warning not printing (#7270) (by **Proton**)
-   - [ci] Building: add complete PATH set for conda (#7268) (by **Proton**)
-   - [autodiff] Support rsqrt operator (#7259) (by **Mingrui Zhang**)
-   - [ci] Update pre-commit repos version (#7257) (by **Proton**)
-   - [refactor] Fix "const CompileConfig *" to "const CompileConfig &" (Part2) (#7253) (by **PGZXB**)
-   - [refactor] Fix "const CompileConfig *" to "const CompileConfig &" (#7243) (by **PGZXB**)
-   - [aot] Added third-party render thread task injection for Unity (#7151) (by **PENGUINLIONG**)
-   - [aot] Support statically linked C-API library on MacOS (#7207) (by **Zhanlue Yang**)
-   - [gui] Force GGUI to go through host memory (nuking interops) (#7218) (by **Bob Cao**)
-   - [Error] Allow IfExp on matrices when the condition is scalar (#7241) (by **Lin Jiang**)
-   - [bug] Fix the parity of the RNG (#7239) (by **Lin Jiang**)
-   - [Lang] Add better error message for dynamic snode (#7238) (by **Zhao Liang**)
-   - [DOC] Update ndarray.md (#7236) (by **Gabriel Vainer**)
-   - [Error] Remove deprecations in ti.ui in 1.6.0 (#7229) (by **Lin Jiang**)
-   - [Doc] Update llvm_sparse_runtime.md (#7215) (by **Zhao Liang**)
-   - [lang] Add validation checks for subscripts to reject negative indices (#7212) (by **Zhanlue Yang**)
-   - [refactor] Remove legacy num_bits and acc_offsets from AxisExtractor (#7227) (by **Yi Xu**)
-   - [Error] Remove deprecated ti.linalg.sparse_matrix_builder in 1.6.0 (#7228) (by **Lin Jiang**)
-   - [Error] Remove deprecations in ASTTransformer in 1.6.0 (#7226) (by **Lin Jiang**)
-   - [misc] Export DeviceAllocation into Python & support devalloc in field_info (#7233) (by **Bob Cao**)
-   - [gui] Use templated bulk copy to simplify VBO preparation (#7234) (by **Bob Cao**)
-   - [rhi] Add create_image_unique stub & misc RHI bug fixes (#7232) (by **Bob Cao**)
-   - [opengl] Fix GLFW global context issue (#7230) (by **Bob Cao**)
-   - [examples] Remove dependency on `ti.u8` compute type for ngp (#7220) (by **Bob Cao**)
-   - [refactor] Remove Kernel::offload_to_executable (#7210) (by **PGZXB**)
-   - [opengl] RW image binding & FP16 support (#7219) (by **Bob Cao**)
-   - [Error] Remove deprecated a.atomic_op(b) in Taichi v1.6.0 (#7225) (by **Lin Jiang**)
-   - [Error] Remove deprecations in taichi/__init__.py in v1.6.0 (#7222) (by **Lin Jiang**)
-   - [Error] Raise error when using deprecated ifexp on matrices (#7224) (by **Lin Jiang**)
-   - [refactor] Remove legacy BitExtractStmt (#7221) (by **Yi Xu**)
-   - [amdgpu] Part4 link bitcode file (#7180) (by **Zeyu Li**)
-   - [example] Reorganize example oit_renderer (#7208) (by **Lin Jiang**)
-   - [aot] Fix ndarray aot with information from type hints (#7214) (by **Ailing**)
-   - [gui] Fix wide line support on macOS (#7205) (by **Bob Cao**)
-   - [Lang] Simplify the swizzle generator (#7216) (by **Zhao Liang**)
-   - [refactor] Split constructing and compilation of lang::Function (#7209) (by **PGZXB**)
-   - [doc] Fix netlify build command (#7217) (by **Ailing**)
-   - [ci] M1 buildbot release tag (#7213) (by **Proton**)
-   - [misc] Remove unused task_funcs (#7211) (by **PGZXB**)
-   - [refactor] Program::this_thread_config() -> Program::compile_config() (#7199) (by **PGZXB**)
-   - [doc] Fix format issues of windows debugging (#7197) (by **Olinaaaloompa**)
-   - [aot] More OpenGL interop in C-API (#7204) (by **PENGUINLIONG**)
-   - [metal] Disable a kernel test in offline cache to unblock CI (#7154) (by **Ailing**)
-   - [ci] Switch Windows build script to build.py (#6993) (by **Proton**)
-   - [misc] Update submodule taichi_assets (#7203) (by **Lin Jiang**)
-   - [mac] Use ObjectLinkingLayer instead of RTDyldObjectLinkingLayer for aarch64 mac (#7201) (by **Ailing**)
-   - [misc] Remove unused Program::jit_evaluator_id (#7200) (by **PGZXB**)
-   - [misc] Remove legacy latex generation (#7196) (by **Yi Xu**)
-   - [Lang] Remove the deprecated dynamic_index switch (#7195) (by **Yi Xu**)
-   - [bug] Fix check_matched() failure with Ndarray holding TensorType'd element (#7178) (by **Zhanlue Yang**)
-   - [Doc] Remove doc tutorial (#7198) (by **Olinaaaloompa**)
-   - [bug] Fix example circle-packing (#7194) (by **Lin Jiang**)
-   - [aot] C-API opengl runtime interop (#7120) (by **damnkk**)
-   - [Error] Better error message when creating sparse snodes on backends that do not support sparse (#7191) (by **Lin Jiang**)
-   - [example] Fix ti gallery close warning (#7187) (by **Zhao Liang**)
-   - [lang] Interface refactors for MatrixType and VectorType (#7143) (by **Zhanlue Yang**)
-   - [aot] Find Taichi in python wheel (#7181) (by **PENGUINLIONG**)
-   - [gui] Update circles rendering to use quads (#7163) (by **Bob Cao**)
-   - [Doc] Rename tutorial doc (#7186) (by **Zhao Liang**)
-   - [ir] Fix gcc cannot compile inline template specialization (#7179) (by **Lin Jiang**)
-   - [Doc] Update tutorial.md (#7176) (by **Zhao Liang**)
-   - [aot] Replace std::exchange with local implementation for C++11 (#7170) (by **PENGUINLIONG**)
-   - [ci] Fix near cache urls (missing comma) (#7158) (by **Proton**)
-   - [docs] Create windows_debug.md (#7164) (by **Bob Cao**)
-   - [Doc] Update math_module.md (#7175) (by **Zhao Liang**)
-   - [aot] FindTaichi CMake module to help outside project integration (#7168) (by **PENGUINLIONG**)
-   - [aot] Removed unused archs in C-API (#7167) (by **PENGUINLIONG**)
-   - [Doc] Update debugging.md (#7173) (by **Zhao Liang**)
-   - [refactor] Remove dependencies on Program::this_thread_config() in irpass::constant_fold (#7159) (by **PGZXB**)
-   - [Doc] Fix C++ tutorial does not display on doc site (#7174) (by **Zhao Liang**)
-   - [aot] C++ wrapper for memory slice and memory allocation with host access (#7171) (by **PENGUINLIONG**)
-   - [aot] Fixed ti_get_last_error signature (#7165) (by **PENGUINLIONG**)
-   - [misc] Log to stderr instead of stdout (#7166) (by **PENGUINLIONG**)
-   - [aot] C-API get version wrapper (#7169) (by **PENGUINLIONG**)
-   - [doc] Fix spelling of "paticle_field" (#7024) (by **Xiang (Kevin) Li**)
-   - [misc] Remove useless Program::sync (#7160) (by **PGZXB**)
-   - [doc] Update accelerate_python.md to use ti.max (#7161) (by **Tao Jin**)
-   - [doc] Add doc ndarray (#7157) (by **Olinaaaloompa**)
-   - [mac] Add .dylib and .cmake to built wheel (#7156) (by **Ailing**)
-   - [refactor] Remove dependencies on Program::this_thread_config() in some tests (#7155) (by **PGZXB**)
-   - [refactor] Remove dependencies on Program::this_thread_config() in llvm backends codegen (#7153) (by **PGZXB**)
-   - [Lang] Remove deprecated packed switch (#7104) (by **Yi Xu**)
-   - [example] Update quaternion arithmetics in fractal_3d_ggui (#7139) (by **Zhao Liang**)
-   - [doc] Update field.md (Fields advanced) (#6867) (by **Gabriel Vainer**)
-   - [ci] Use make_changelog.py to generate the full changelog (#7152) (by **Lin Jiang**)
-   - [refactor] Rename Callable::*arg* to Callable::*param* (#7133) (by **PGZXB**)
-   - [aot] Introduce new AOT deployment tutorial (#7144) (by **PENGUINLIONG**)
-   - [bug] Unify error message matching with/without validation layers for CapiTest.FailMapDeviceOnlyMemory (#7110) (by **Zhanlue Yang**)
-   - [lang] Remove redundant TensorType expansion for function returns (#7124) (by **Zhanlue Yang**)
-   - [lang] Sign python library for Apple M1 (#7138) (by **PENGUINLIONG**)
-   - [gui] Fix particle size limits (#7149) (by **Bob Cao**)
-   - [lang] Migrate TensorType expansion in MatrixType/VectorType from Python code to Frontend IR (#7127) (by **Zhanlue Yang**)
-   - [aot] Support texture arguments for AOT kernels (#7142) (by **Zhanlue Yang**)
-   - [metal] Retain Metal commandBuffers & build command buffers directly (#7137) (by **Bob Cao**)
-   - [rhi] Update `create_pipeline` API and add support of VkPipelineCache (#7091) (by **Bob Cao**)
-   - [autodiff] Support grad in ndarray (#6906) (by **PhrygianGates**)
-   - [Doc] Update doc regarding dynamic index (#7148) (by **Yi Xu**)
-   - [refactor] Remove dependencies on Program::this_thread_config() in spirv::lower (#7134) (by **PGZXB**)
-   - [Misc] Strictly check ndim with external array (#7126) (by **Haidong Lan**)
-   - [ci] Run test when pushing to rc branches (#7146) (by **Lin Jiang**)
-   - [refactor] Remove dependencies on Program::this_thread_config() in KernelCodeGen (#7086) (by **PGZXB**)
-   - [ci] Disable backward_cpp on macOS (#7145) (by **Proton**)
-   - [gui] Fix scene line renderable (#7131) (by **Bob Cao**)
-   - [refactor] Remove useless Kernel::from_cache_ (#7132) (by **PGZXB**)
-   - [cpu] Reuse VirtualMemoryAllocator for CPU ndarray memory allocation (#7128) (by **Ailing**)
-   - [Lang] Raise errors when using the packed switch (#7125) (by **Yi Xu**)
-   - [ci] Temporarily disable ad_external_array on Metal (#7136) (by **Bob Cao**)
-   - [Error] Raise errors when using metal sparse (#7113) (by **Lin Jiang**)
-   - [aot] AOT compat test in workflow (#7033) (by **damnkk**)
-   - [Lang] Fix cannot use taichi in REPL (#7114) (by **Zhao Liang**)
-   - [lang] Free ndarray memory when it's GC-ed in Python (#7072) (by **Ailing**)
-   - [lang] Migrate TensorType expansion for FuncCallExpression from Python code to Frontend IR (#6980) (by **Zhanlue Yang**)
-   - [amdgpu] Part2 add runtime (#6482) (by **Zeyu Li**)
-   - [refactor] Remove dependencies on Program::this_thread_config() in codegen_cc.cpp (#7088) (by **PGZXB**)
-   - [refactor] Remove dependencies on Program::this_thread_config() in gfx::run_codegen (#7089) (by **PGZXB**)
-   - [Bug] Fix num_splits in parallel_struct_for (#7121) (by **Yi Xu**)
-   - [Doc] Move glossary to top level (#7118) (by **Zhao Liang**)
-   - [metal] Update Metal RHI impl & add support for shared arrays (#7107) (by **Bob Cao**)
-   - [ci] Update amdgpu ci (#7117) (by **Zeyu Li**)
-   - [refactor] Move Kernel::lower() outside the taichi::lang::Kernel (#7048) (by **PGZXB**)
-   - [amdgpu] Part1 add codegen (#6469) (by **Zeyu Li**)
-   - [Aot] Deprecate element shape and field dim for AOT symbolic args (#7100) (by **Haidong Lan**)
-   - [refactor] Remove Program::current_ast_builder() (#7075) (by **PGZXB**)
-   - [aot] Switch Metal to SPIR-V codegen (#7093) (by **PENGUINLIONG**)
-   - [Lang] Remove deprecated ti.Matrix.rotation2d() (#7098) (by **Yi Xu**)
-   - [doc] Modified some errors in the function examples (#7094) (by **welann**)
-   - [ci] More Windows git hacks (#7102) (by **Proton**)
-   - [Lang] Remove filename kwarg in aot Module save() (#7085) (by **Ailing**)
-   - [aot] Rename device capability atomic_i64 to atomic_int64 for consistency (#7095) (by **PENGUINLIONG**)
-   - [Lang] Remove sourceinspect deprecation warning message (#7081) (by **Zhao Liang**)
-   - [example] Remove gui warning message (#7090) (by **Zhao Liang**)
-   - [refactor] Remove unnecessary Kernel::arch (#7074) (by **PGZXB**)
-   - [refactor] Remove unnecessary parameter of irpass::scalarize (#7087) (by **PGZXB**)
-   - [Bug] Fix ret_type and cast_type of UnaryOpStmt in Scalarize (#7082) (by **Yi Xu**)
-   - [lang] Migrate TensorType expansion for TextureOpExpression from Python code to Frontend IR (#6968) (by **Zhanlue Yang**)
-   - [lang] Migrate TensorType expansion for ReturnStmt from Python code to Frontend IR (#6946) (by **Zhanlue Yang**)
-   - [doc] Update ndarray deprecation warning to 1.5.0 (#7083) (by **Haidong Lan**)
-   - [amdgpu] Update amdgpu module call (#7022) (by **Zeyu Li**)
-   - [amdgpu] Add convert addressspace pass related unit test (#7023) (by **Zeyu Li**)
-   - [ir] Let real function return nested StructType (by **lin-hitonami**)
-   - [ir] Replace FuncCallExpression with FrontendFuncCallStmt (by **lin-hitonami**)
-   - [example] Update gallery images (#7053) (by **Zhao Liang**)
-   - [Doc] Update type.md (#7038) (by **Zhao Liang**)
-   - [misc] Bump version to v1.5.0 (#7077) (by **Lin Jiang**)
-   - [rhi] Update Stream `new_command_list` API (#7073) (by **Bob Cao**)
-   - [Doc] Fix docstring (#7065) (by **Zhao Liang**)
-   - [ci] Workaround windows checkout 'Needed a single revision' issue (#7078) (by **Proton**)
-   - [Lang] Make slicing a single row/column of a matrix return a vector (#7068) (by **Yi Xu**)
-   - [lang] Make sure ndarrays created in python frontend are initialized as zero (#7060) (by **Ailing**)
-   - [Lang] Deprecate the dynamic_index switch (#7071) (by **Yi Xu**)
-   - [misc] Update python package metadata (#7063) (by **Proton**)
-   - [bug] Fixed compilation error caused by #7047 (#7069) (by **PGZXB**)
-   - [opt] Automatically identify allocas to scalarize (#7055) (by **Yi Xu**)
-   - [refactor] Remove ir parameter of KernelCodeGen::KernelCodeGen(Kernel *kernel, IRNode *ir) (#7046) (by **PGZXB**)
-   - [refactor] Remove unnecessary IRNode::kernel (#7047) (by **PGZXB**)
-   - [refactor] Remove dependencies on Program::current_ast_builder() in C++ side (#7044) (by **PGZXB**)
-   - [ci] Version sanity check before publishing (#7062) (by **Proton**)
-   - [ci] Make changelog generation work again (#7058) (by **Proton**)
-   - [rhi] Update CommandList dispatch API (#7052) (by **Bob Cao**)
-   - [aot] C-API versioning (#7050) (by **PENGUINLIONG**)
-   - [refactor] Remove offloaded parameter of Program::compile() (#7045) (by **PGZXB**)
-   - [lang] Migrate TensorType expansion for subscription indices from Python to Frontend IR (#6942) (by **Zhanlue Yang**)
-   - [opt] Add ExtractPointers pass for dynamic index (#7051) (by **Yi Xu**)
-   - [Lang] Add irpass::eliminate_immutable_local_vars() test cases for TensorType (#7043) (by **Zhanlue Yang**)
-   - [Lang] Fix gui docstring (#7003) (by **Zhao Liang**)
-   - [rhi] Update compute CommandList APIs (except dispatch) (#7037) (by **Bob Cao**)
-   - [ir] Let GetElementExpression&Statement support index list (#7049) (by **Lin Jiang**)
-   - [aot] C-API opengl runtime interop (#7042) (by **PENGUINLIONG**)
-   - [ci] Pin pre-commit python version to 3.10 (#7041) (by **Proton**)
-   - [opengl] Enable more gles tests in CI (#7031) (by **Ailing**)
-   - [ci] Tuning headless demo VRAM usage (#7039) (by **Proton**)
-   - [Build] Deprecate export_core (#7028) (by **Zhanlue Yang**)
-   - [GUI] Support colored texts (#7036) (by **Dunfan Lu**)
-   - [aot] Revert "C-API opengl runtime interop (#7014)" (#7032) (by **Proton**)
-   - [ci] Update pre-commit app versions (#7025) (by **Proton**)
-   - [Doc] Remove packed mode in doc (#7030) (by **Zhao Liang**)
-   - Revert "[opengl] Enable more gles tests in CI" (#7029) (by **Ailing**)
-   - [build] Remove libexport_core.so dependency for Android App CI (#6997) (by **Zhanlue Yang**)
-   - [opengl] Enable more gles tests in CI (#7010) (by **Ailing**)
-   - [aot] C-API opengl runtime interop (#7014) (by **damnkk**)
-   - [misc] Add macro to control amdgpu-related header file (#7021) (by **Zeyu Li**)
-   - [bug] Fix device memory allocation for numpy array on CUDA backend (#7008) (by **Zhanlue Yang**)
-   - [ci] Try enabling MSVC and check build times (#6905) (by **Bob Cao**)
-   - [gfx] Update Device API: Splitting ResourceBinder into separate Shade… (#7020) (by **Proton**)
-   - [gfx] Revert "Update Device API: Splitting ResourceBinder into sepera… (#7019) (by **Proton**)
-   - [amdgpu] Update amdgpu device to new API (#7018) (by **Bob Cao**)
-   - [perf] Fix fill ndarray size problem. (#6992) (by **Haidong Lan**)
-   - [cuda] Fix LLVM15 rsqrt perf regression (#7012) (by **Haidong Lan**)
-   - [gfx] Update Device API: Splitting ResourceBinder into separate ShaderResourceSet & RasterResources (#6954) (by **Bob Cao**)
-   - [opt] Add ImmediateIRModifier to provide amortized constant-time replace_usages_with() (#7001) (by **Yi Xu**)
-   - [amdgpu] Part0 add render hardware interface (#6464) (by **Zeyu Li**)
-   - [Error] Do not show warning when the offline cache path does not exist (#7005) (by **PGZXB**)
-   - [Lang] [spirv] Support dynamic indexing in spirv (#6990) (by **Yi Xu**)
-   - [misc] Remove unnecessary CompileConfig::lazy_compilation (#7009) (by **PGZXB**)
-   - [ci] Add C++ tests on AMDGPU RHI (#6597) (by **Zeyu Li**)
-   - [ci] Update taichi-release-tests branch (disable QuanTaichi GOL) (#7011) (by **Proton**)
-   - [amdgpu] Part3 update runtime module (#6486) (by **Zeyu Li**)
-   - [opengl] Fix tests running both on opengl and vulkan (#7006) (by **Ailing**)
-   - [ir] Record the return types to a StructType (#6995) (by **Lin Jiang**)
-   - [lang] Get the CHI-IR struct type in python (#6994) (by **Lin Jiang**)
-   - [ir] Change type maps to unordered maps and add mutexes (#7000) (by **Lin Jiang**)
-   - [ir] Add struct type to CHI-IR (#6982) (by **Lin Jiang**)
-   - [misc] Add repography activity stats (#6991) (by **Proton**)
-   - [aot] Enable validation layers for C-API tests (#6893) (by **Zhanlue Yang**)
-   - [opengl] Add ti.gles arch and enable tests (#6988) (by **Ailing**)
-   - [Lang] [metal] Support dynamic indexing in metal (#6985) (by **Yi Xu**)
-   - [opengl] Reset opengl context when taichi program resets (#6987) (by **Ailing**)
-   - [Lang] Support LU sparse solver on CUDA backend (#6967) (by **pengyu**)
-   - [misc] Keeping up with new python-wheel implementation (#6986) (by **Proton**)
-   - [aot] Recover AOT CI script (#6970) (by **PENGUINLIONG**)
-   - [lang] Migrate TensorType expansion for svd from Python code to Frontend IR (#6972) (by **Zhanlue Yang**)
-   - [misc] Adding XCode project support (#6976) (by **Bob Cao**)
-   - [bug] Fix taichi_ngp starting from ti example (#6973) (by **Ailing**)
-   - [ci] Revert "Fix missing c_api.so in linux nightly" (#6974) (by **Ailing**)
-   - [ci] Build: auto install vulkan on Linux (#6969) (by **Proton**)
-   - [ci] Auto setup miniforge3 env when build (#6966) (by **Proton**)
-   - [Lang] Fix struct type problem (#6949) (by **Zhao Liang**)
-   - [aot] C-API breaking changes! (#6955) (by **PENGUINLIONG**)
-   - [lang] Fix scalarization for PrintStmt (#6945) (by **Zhanlue Yang**)
-   - [bug] Allow StructType as type hint to ti.func (#6964) (by **Yi Xu**)
-   - [refactor] Remove legacy code for dynamic index (#6961) (by **Yi Xu**)
-   - [aot] Fix rwtexture with template_args (#6960) (by **Ailing**)
-   - [ci] Fix missing c_api.so in linux nightly (#6962) (by **Ailing**)
-   - [lang] Migrate TensorType expansion for SNode indices from Python to Frontend IR (#6934) (by **Zhanlue Yang**)
-   - [doc] New FAQ added (#6963) (by **Olinaaaloompa**)
-   - [ci] Sync CI cache script & workflow (#6959) (by **Proton**)
-   - [ci] Update release test branch, reduce running time (#6944) (by **Proton**)
-   - [ci] Remove redundant tests (#6947) (by **Proton**)
-   - [bug] Fix recompilation of filling a matrix field with the same matrix (#6951) (by **Yi Xu**)
-   - [aot] Fixed C-API behavior tests (#6939) (by **PENGUINLIONG**)
-   - [refactor] Remove _PyScopeMatrixImpl (#6943) (by **Yi Xu**)
-   - [aot] Fix validation warning: OpImageFetch should operate on OpImage instead of OpSampledImage (#6925) (by **Zhanlue Yang**)
-   - [CLI] Add "ti cache clean" command to clean the offline cache files manually (#6937) (by **PGZXB**)
-   - [Doc] Minor doc update (#6952) (by **Zhao Liang**)
-   - [ci] Fix forgotten build script paths (#6941) (by **Proton**)
-   - [opt] Add pass eliminate_immutable_local_vars (#6926) (by **Yi Xu**)
-   - [ci] Fix pre-commit errors (#6940) (by **Proton**)
-   - [doc] Editorial updates (#6935) (by **Olinaaaloompa**)
-   - [ci] Workflow Rewrite: Building on Linux (#6848) (by **Proton**)
-   - [refactor] Remove _IntermediateMatrix and _MatrixFieldElement (#6932) (by **Yi Xu**)
-   - [aot] C_API behavior test (#6904) (by **damnkk**)
-   - [lang] Fix matrix type inference and remove _MatrixEntriesInitializer (#6928) (by **Yi Xu**)
-   - [lang] Reorder sparse matrix before solving (#6886) (by **pengyu**)
-   - [Doc] Glossary (#6101) (by **Olinaaaloompa**)
-   - [aot] Refactor C-API error tests (#6890) (by **Zhanlue Yang**)
-   - [doc] Update layout.md (Fields) (#6868) (by **Gabriel Vainer**)
-   - [Doc] Update dac (#6875) (by **Gabriel Vainer**)
-   - [lang] Support 'len' with Matrix-typed operands (#6923) (by **Zhanlue Yang**)
-   - [doc] Update sparse.md (#6908) (by **Gabriel Vainer**)
-   - [doc] Update performance.md (#6911) (by **Gabriel Vainer**)
-   - [doc] Update debugging.md (#6909) (by **Gabriel Vainer**)
-   - [doc] Update profiler.md (#6910) (by **Gabriel Vainer**)
-   - [bug] Add GetElementExpression to offline cache key (#6918) (by **PGZXB**)
-   - [ci] Reenable AMDGPU CI, disable OpenGL tests in AMDGPU task (#6887) (by **Proton**)
-   - [lang] Fix accidental changes during matrix refactor (#6914) (by **Yi Xu**)
-   - [example] Add circle-packing example (#6870) (by **Zhao Liang**)
-   - [Doc] Update faq.md (#6921) (by **Zhao Liang**)
-   - [misc] Show suggestion when locking metadata.lock fails (#6919) (by **PGZXB**)
-   - [doc] New FAQs (#6055) (by **Olinaaaloompa**)
-   - [example] Add Poisson disk sampling example (#6852) (by **Zhao Liang**)
-   - [vulkan] Improve Vulkan RHI impl with lower overhead internal implementations (#6912) (by **Bob Cao**)
-   - [doc] Link to LLVM 15 built for Visual Studio 2022 (#6916) (by **PENGUINLIONG**)
-   - [lang] Fix issue of IfExpr with TensorTyped operands (#6897) (by **Zhanlue Yang**)
-   - [doc] Update hello_world.md (#6889) (by **Gabriel Vainer**)
-   - [IR] Allow a maximum of 12 SNode indices (#6901) (by **Dunfan Lu**)
-   - [doc] Update odop.md (#6874) (by **Gabriel Vainer**)
-   - [doc] Update external.md (#6869) (by **Gabriel Vainer**)
-   - [Doc] Update dataclass.md (#6876) (by **Gabriel Vainer**)
-   - [doc] Update cloth_simulation.md (#6898) (by **Vissidarte-Herman**)
-   - [example] Update marching squares example (#6851) (by **Zhao Liang**)
-   - [Lang] Add warning message when converting dynamic snode to numpy (#6853) (by **Zhao Liang**)
-   - [Lang] Deprecate sourceinspect dependency (#6894) (by **Zhao Liang**)
-   - [aot] Added C-API behavior tests (#6871) (by **damnkk**)
-   - [aot] Gather satellite repo URLs (#6860) (by **PENGUINLIONG**)
-   - [refactor] Remove _TiScopeMatrixImpl (#6892) (by **Yi Xu**)
-   - [ci] Python test minor fixes (#6891) (by **Proton**)
-   - [ir] Add ir_traits namespace to use less dynamic casts & Run CFG only ever once (#6812) (by **Bob Cao**)
-   - [Lang] Warn users if ndarray size is out of int32 boundary (#6846) (by **Yi Xu**)
-   - [build] Enable strip for libtaichi_c_api.so with Release Build (#6845) (by **Zhanlue Yang**)
-   - [Lang] Remove the real_matrix switch (#6885) (by **Yi Xu**)
-   - [build] Turn on function level linking for taichi_c_api (#6840) (by **Zhanlue Yang**)
-   - [test] Remove tests with real_matrix=True and real_matrix_scalarize=True (#6873) (by **Yi Xu**)
-   - [misc] Revert back to master after #6843 merged (#6883) (by **Bob Cao**)
-   - [vulkan] Cleanup spdlog related logging from Vulkan RHI (#6843) (by **Bob Cao**)
-   - [ci] Temporarily disable AMDGPU CI (#6872) (by **Proton**)
-   - [Lang] Enable real_matrix and real_matrix_scalarize by default (#6801) (by **Zhanlue Yang**)
-   - [bug] MatrixType bug fix: Fix error with static-grouped-ndrange (#6839) (by **Zhanlue Yang**)
-   - [example] Fix jacobian example (#6849) (by **Mingrui Zhang**)
-   - [bug] Fix flaky mass_spring_game_ggui.py on Mac M1 by setting up default values for VulkanCapabilities (#6850) (by **Zhanlue Yang**)
-   - [example] Solve implicit fem using sparse solver (#6827) (by **pengyu**)
-   - [build] Migrate cmake targets from OBJECT to STATIC for libtaichi_c_api.so (#6831) (by **Zhanlue Yang**)
-   - [Bug] Fix getting 64-bit data from ndarray in Python scope (#6836) (by **Yi Xu**)
-   - [test] Avoid constant folding in overflow tests (#6835) (by **Ailing**)
-   - [aot] Added C-API behavior test (#6837) (by **damnkk**)
-   - [bug] Matrix refactor bug fix: Fix cross scope matrix operations (#6822) (by **Zhanlue Yang**)
-   - [build] Refactored and removed RuntimeCUDA and RuntimeCUDAInjector (#6830) (by **Zhanlue Yang**)
-   - [bug] Matrix refactor bug fix: Fix logical binary operations with TensorTyped operands (#6817) (by **Zhanlue Yang**)
-   - [example] Add order-independent transparency example (#6829) (by **Lin Jiang**)
-   - [opt] Re-enable constant folding when debug=True (#6824) (by **Ailing**)
-   - [Bug] Avoid overwriting global tmp with dynamic_index=True (#6820) (by **Yi Xu**)
-   - [bug] Matrix refactor bug fix: Fix restrictions on BinaryOp/TernaryOp operands' broadcasting (#6805) (by **Zhanlue Yang**)
-   - [aot] C-API Device capability improvements (#6773) (by **PENGUINLIONG**)
-   - [misc] Headers dependency cleanup from RHI (#6699) (by **Bob Cao**)
-   - [ci] Revert "Temporarily disable desktop headless tests (#6811)" (#6816) (by **Proton**)
-   - [misc] Bump version to v1.4.0 (#6804) (by **PENGUINLIONG**)
-   - [ci] Add AMDGPU related ci (#6743) (by **Zeyu Li**)
-   - [test] Remove unnecessary duplicated python runtime test runs (#6808) (by **Ailing**)
-   - [Lang] Raise an error for the semantic change of transpose() (#6813) (by **Yi Xu**)
-   - [refactor] Remove unnecessary checks in program (#6802) (by **Ailing**)
-   - [vulkan] Support texture type args in aot add_kernel (#6796) (by **Ailing**)
-   - [ci] Temporarily disable desktop headless tests (#6811) (by **Proton**)
-   - [bug] Fix name collision in ti.dataclass (#6737) (by **Yi Xu**)
-   - [bug] MatrixType bug fix: Add additional restrictions for unpacking a Matrix (#6795) (by **Zhanlue Yang**)
-   - [doc] Update docstring for grad replaced (#6800) (by **Mingrui Zhang**)
-   - [build] Add MSBuild option to setup.py (#6724) (by **Bob Cao**)
-   - [Lang] [type] Add bool type in python as an alias to i32 (#6742) (by **daylily**)
-   - [lang] Use less gpu memory when building sparse matrix (#6781) (by **pengyu**)
-   - [example] Add cuda options for sparse matrix examples (#6785) (by **pengyu**)
-   - [misc] Remove usage of deprecated num_channels/channel_format type hint in rw_texture in codebase (#6791) (by **Ailing**)
-   - [bug] MatrixType bug fix: Fix error with BLS (#6664) (by **Zhanlue Yang**)
-   - [vulkan] Support rw_texture in aot add_kernel (#6789) (by **Ailing**)
-   - [bug] MatrixType bug fix: Fix error with quant (#6776) (by **Yi Xu**)
-   - [bug] MatrixType bug fix: Fix test_ad_gdar_diffmpm (#6786) (by **Yi Xu**)
-   - [vulkan] Deprecate num_channels and channel_format args in rw_texture type annotation (#6782) (by **Ailing**)
-   - [misc] Remove the default potential_bug label on bug report issues (#6784) (by **Ailing**)
-   - [bug] MatrixType bug fix: Fix error with texture (#6775) (by **Yi Xu**)
-   - [vulkan] Make sure kernel recompiles when texture dtype changes (#6774) (by **Ailing**)
-   - [aot] Clean up exported symbols for libtaichi_c_api.so (#6140) (by **Zhanlue Yang**)
-   - [Misc] Refactored flattend_values() to avoid potential conflicts in flattened statements (#6749) (by **Zhanlue Yang**)
-   - [aot] Warn the user about out-of-range access in C++ wrapper (#6492) (by **PENGUINLIONG**)
-   - [build] Initial distributed compiling support (#6762) (by **Proton**)
-   - [aot] Revert C-API Device capability improvements (#6772) (by **PENGUINLIONG**)
-   - [aot] C-API Device capability improvements (#6702) (by **PENGUINLIONG**)
-   - [aot] C-API to get available archs (#6766) (by **PENGUINLIONG**)
-   - [doc] Update sparse matrix document (#6719) (by **pengyu**)
-   - [autodiff] Separate non-linear operators to an individual class (#6700) (by **Mingrui Zhang**)
-   - [bug] Fix dereferencing nullptr (#6763) (by **Yi Xu**)
-   - [Doc] Update the documentation about Dynamic SNode (#6752) (by **Lin Jiang**)
-   - [doc] Update dev install about clang version (#6759) (by **Ailing**)
-   - [build] Improve TI_WITH_CUDA guards for CUDA related test cases (#6698) (by **Zhanlue Yang**)
-   - [Lang] Add deprecation warning for the removal of the packed switch (#6753) (by **Yi Xu**)
-   - [lang] Improve sparse matrix building on GPU (#6748) (by **pengyu**)
-   - [aot] JSON serde (#6754) (by **PENGUINLIONG**)
-   - [bug] MatrixType bug fix: Fix error with to_numpy() and from_numpy() (#6726) (by **Zhanlue Yang**)
-   - [Doc] Stop mentioning packed mode (#6755) (by **Yi Xu**)
-   - [lang] Get the length of dynamic SNode by x.length() (#6750) (by **Lin Jiang**)
-   - [llvm] Support nested struct with matrix return value on real function (#6734) (by **Lin Jiang**)
-   - [Metal] [error] Raise deprecation warning and error when using sparse snodes on metal (#6739) (by **Lin Jiang**)
-   - [build] Integrate backward_cpp to test targets for enabling C++ stack trace (#6697) (by **Zhanlue Yang**)
-   - [aot] Load AOT module from memory (#6692) (#6714) (by **PENGUINLIONG**)
-   - [ci] Add dockerfile.ubuntu-18.04.amdgpu (#6736) (by **Zeyu Li**)
-   - [doc] Update LLVM10 -> LLVM15 in installation guide (#6747) (by **Zhanlue Yang**)
-   - [misc] Fix warnings of taichi examples (#6740) (by **PGZXB**)
-   - [example] Ti-example: instant ngp renderer (#6673) (by **Youtian Lin**)
-   - [build] Use a separate prebuilt llvm15 binary for manylinux environment (#6732) (by **Ailing**)
-   - [Lang] Enable packed mode by default (#6721) (by **Yi Xu**)
-   - [test] Add jupyter notebook to tests (#6717) (by **Zhao Liang**)
-   - [bug] Add lock to protect the CompileConfig map (#6723) (by **Lin Jiang**)
-   - [lang] Support f64 cpu sparse linear solver (#6657) (by **pengyu**)
-   - [lang] Fix implicit mass spring example bugs (#6703) (by **pengyu**)
-   - [ci] Add dockerfile.ubuntu-20.04.amdgpu (#6711) (by **Zeyu Li**)
-   - [bug] MatrixType bug fix: Fix error with nested StructType and MatrixType (#6689) (by **Zhanlue Yang**)
-   - [Lang] Fix warning messages (#6716) (by **Zhao Liang**)
-   - [llvm] Let real function support returning struct (#6614) (by **Lin Jiang**)
-   - [opt] Turn mod/div into bit_and/bit_shr if possible in packed mode (#6718) (by **Yi Xu**)
-   - [opt] Eliminate redundant mod in demote_dense_struct_fors under packed mode (#6709) (by **Yi Xu**)
-   - [Lang] Limit non-first division of an axis on a SNodeTree path to a power of two (#6690) (by **Yi Xu**)
-   - [ci] Run release tests in PRs (#6706) (by **Proton**)
-   - [bug] MatrixType bug fix: Fix error with use of python scope ti.Matrix (#6635) (by **Zhanlue Yang**)
-   - [bug] MatrixType bug fix: Support broadcasting for ternary operations (#6640) (by **Zhanlue Yang**)
-   - [Doc] Update global_settings.md (#6668) (by **Zhao Liang**)
-   - [aot] Revert "Load AOT module from memory (#6692)" (#6704) (by **Proton**)
-   - [lang] Revert "Add spmv and sparse linear solver using ndarray as vector (#6651)" (#6701) (by **Proton**)
-   - [aot] Load AOT module from memory (#6692) (by **PENGUINLIONG**)
-   - [lang] Add spmv and sparse linear solver using ndarray as vector (#6651) (by **pengyu**)
-   - [Lang] Deprecate field_dim in ndarray annotation (#6687) (by **Haidong Lan**)
-   - [doc] Fix a format issue (#6686) (by **Olinaaaloompa**)
-   - [doc] Update deprecated function call "ti.ui.make_camera()" to "ti.ui.Camera()" (#6694) (by **Zhi Qi**)
-   - [build] Delete all llvm10 code and TI_LLVM_15 macro (#6685) (by **Ailing**)
-   - [test] Skip f16 tests if not supported (for Pascal) (#6661) (by **Proton**)
-   - [build] Support any 13<clang<=15 as CLANG_EXECUTABLE on macOS (#6682) (by **Ailing**)
-   - [misc] Remove legacy ti benchmark (#6670) (by **Lin Jiang**)
-   - [autodiff] Enforce an empty block as a smallest independent block (#6681) (by **Mingrui Zhang**)
-   - [autodiff] Fix missing global load stmt in independent blocks (#6662) (by **Mingrui Zhang**)
-   - [lang] Merge triplets in the same position when building GPU sparse matrix (#6605) (by **pengyu**)
-   - [aot] Load AOT module from Zip archive (#6677) (by **PENGUINLIONG**)
-   - [Lang] Deprecate element_dim and element_shape in ndarray annotations. (#6665) (by **Haidong Lan**)
-   - [lang] MatrixType refactor: Support validating swizzle patterns (#6631) (by **Yi Xu**)
-   - [ci] Add unzip & sudo to manylinux2014 build image (#6683) (by **Proton**)
-   - [build] Support any clang<=15 as CLANG_EXECUTABLE (#6678) (by **Ailing**)
-   - [autodiff] Refactor independent block identification (#6659) (by **Mingrui Zhang**)
-   - Revert "[ci] Install unzip in manylinux build" (#6680) (by **Ailing**)
-   - [bug] MatrixType bug fix: Fix numerical error with grouped ndrange (#6632) (by **Zhanlue Yang**)
-   - [misc] Add VS Chromium Code Search config (#6675) (by **Bob Cao**)
-   - [misc] Fix warning with TI_DLL_EXPORT (#6676) (by **Bob Cao**)
-   - [ci] Install unzip in manylinux build (#6658) (by **Ailing**)
-   - [misc] Remove class Statistics (#6671) (by **Lin Jiang**)
-   - [ci] Adaptively laying off test workers on CUDA OOM, 2nd try (#6641) (by **Proton**)
-   - [ci] Resolve GitHub Actions warnings (#6642) (by **Proton**)
-   - [aot] Modify Unity CI to track master branch of Taichi-Unity (#6655) (by **Zhanlue Yang**)
-   - [Lang] [ci] Clean element shape in tests (#6643) (by **Haidong Lan**)
-   - [build] Turn TI_LLVM_15 on by default (#6649) (by **Ailing**)
-   - [doc] Update prebuilt llvm15 links in dev install doc (#6648) (by **Ailing**)
-   - [lang] MatrixType refactor: Support svd(), polar_decompose() (#6636) (by **Yi Xu**)
-   - [bug] Add a mutex to protect counters_map in Statistics (#6645) (by **Lin Jiang**)
-   - [build] Restore installed c_api headers in the python wheel (#6647) (by **Ailing**)
-   - [ci] Move macos10.15 job to build with LLVM 15 (#6634) (by **Ailing**)
-   - [ci] Delete windows llvm10 ci job (#6639) (by **Ailing**)
-   - [gui] Tolerate swapchain image count (#6629) (by **PENGUINLIONG**)
-   - [ci] Revert "[Lang] [ci] Remove element shape and element dim in unit tests" (#6638) (by **Haidong Lan**)
-   - [ci] Revert "Adaptively laying off test workers on CUDA OOM (#6628)" (#6637) (by **Proton**)
-   - [refactor] Simplify callback logic for torch & paddle args (#6626) (by **Ailing**)
-   - [Lang] [ci] Remove element shape and element dim in unit tests (#6620) (by **Haidong Lan**)
-   - [build] Add AOT headless tests to Linux CI (#6509) (by **Zhanlue Yang**)
-   - [autodiff] Clear adjoint after global store (#6579) (by **Mingrui Zhang**)
-   - [lang] MatrixType refactor: Support eig(), sym_eig(), solve() (#6627) (by **Yi Xu**)
-   - [lang] MatrixType bug fix: Fix result type of matmul (#6613) (by **Yi Xu**)
-   - [ci] Adaptively laying off test workers on CUDA OOM (#6628) (by **Proton**)
-   - [ci] Move m1 job to use build with LLVM 15 (#6483) (by **Ailing**)
-   - [Doc] Update external.md (#6424) (by **Zhao Liang**)
-   - [Doc] Update differences_between_taichi_and_python_programs.md (#6454) (by **Zhao Liang**)
-   - [test] Tuning CUDA tests resource consumption (#6625) (by **Proton**)
-   - [aot] Yet another device capability test (#6623) (by **PENGUINLIONG**)
-   - [doc] Add faq about how to use code completion in vscode (#6621) (by **Ailing**)
-   - [aot] Test for AOT device capability (#6618) (by **PENGUINLIONG**)
-   - [misc] Cleanup unused files in root dir (#6619) (by **Ailing**)
-   - [ci] Remove automatic version bump PR (#6617) (by **Ailing**)
-   - [bug] MatrixType bug fix: Fix indexing support for custom vector types (#6609) (by **Zhanlue Yang**)
-   - [aot] C-API device capability query (#6549) (by **PENGUINLIONG**)
-   - [bug] Loosen memory leak constraint for test_memory (#6602) (by **Zhanlue Yang**)
-   - Revert "[bug] MatrixType bug fix: Fix indexing support for custom vector types" (#6608) (by **Zhanlue Yang**)
-   - [lang] MatrixType bug fix: Add support for rows and cols when real_matrix is ON (#6428) (by **Zhanlue Yang**)
-   - [ci] CUDA tests speed up (#6516) (by **Proton**)
-   - [bug] MatrixType bug fix: Fix indexing support for custom vector types (#6568) (by **Zhanlue Yang**)
-   - [ci] Relax timeout for gpu jobs and memory limit (#6601) (by **Ailing**)
-   - [lang] Fix sparse matrix builder segmentation fault (#6592) (by **pengyu**)
-   - [lang] MatrixType bug fix: Add attributes n & m (#6585) (by **Yi Xu**)
-   - [lang] Unify Cuda and CPU linear solve API (#6578) (by **pengyu**)
-   - [build] Remove macOS 10.14 (#6596) (by **Zhanlue Yang**)
-   - [aot] Make MoltenVK builds more flexible (#6582) (by **PENGUINLIONG**)
-   - [ci] Terminate workers even earlier when test fails (#6587) (by **Proton**)
-   - [cuda] Parallelize random state initialization in ti.init() (#6586) (by **Proton**)
-   - [build] Remove macOS 10.14 from release tests (#6584) (by **Zhanlue Yang**)
-   - Revert "[lang] Build sparse matrix using ndarray" (#6583) (by **Yi Xu**)
-   - [lang] Build sparse matrix using ndarray (#6563) (by **pengyu**)
-   - [lang] Unify Cuda and CPU Spmv API (#6575) (by **pengyu**)
-   - [lang] Add fused foreach check (#6525) (by **Mike He**)
-   - [doc] Update sparse.md (#6580) (by **Zhao Liang**)
-   - [Lang] MatrixType refactor: Support inverse() (#6542) (by **Yi Xu**)
-   - [bug] Sync after executing kernels which have print inside (#6577) (by **Lin Jiang**)
-   - [aot] Make arch an optional argument in ti.aot.Module (#6574) (by **Ailing**)
-   - [aot] [ci] Change taichi-aot-demo branch back to master (#6567) (by **Ailing**)
-   - [bug] Fix real_matrix support for Dynamic SNode (#6553) (by **Zhanlue Yang**)
-   - [build] Downgrade molten-vk version to v1.1.10 (#6564) (by **Zhanlue Yang**)
-   - [ci] Terminate (and restart) test worker when test fails (#6562) (by **Proton**)
-   - [aot] Switch aot tests to use a script in taichi-aot-demo (#6559) (by **Ailing**)
-   - [ci] Skip match_any and match_all intrinsics for Pascal (#6555) (by **Haidong Lan**)
-   - [lang] MatrixType bug fix: Make test_mpm88 work properly (#6557) (by **Yi Xu**)
-   - [Lang] MatrixType refactor: Support matrix factories (#6560) (by **Yi Xu**)
-   - [lang] MatrixType bug fix: Fix type error with struct-for loop index (#6531) (by **Zhanlue Yang**)
-   - [ci] Marker for running resource intensive tests in serial (#6565) (by **Proton**)
-   - [ci] GitHub Actions: Merge CPU & GPU tasks (#6476) (by **Proton**)
-   - [bug] Rehash compile config map in advance (#6550) (by **Lin Jiang**)
-   - [aot] Deprecate filename arg in ti.aot.Module.save() (#6554) (by **Ailing**)
-   - [lang] Add matrix support for Dynamic SNode (#6535) (by **Lin Jiang**)
-   - [aot] Enable aot tutorial demo in CI (#6543) (by **Ailing**)
-   - [refactor] Remove redundant checks when demoting dense struct fors (#6484) (by **Yi Xu**)
-   - [lang] MatrixType bug fix: Fix matrix indexing error with SharedArray (#6534) (by **Zhanlue Yang**)
-   - [bug] [aot] Fix texture struct for with cgraph (#6536) (by **Ailing**)
-   - [Lang] MatrixType refactor: Support dot/cross/outer_product (#6545) (by **Yi Xu**)
-   - [Lang] Add deactivate attribute to dynamic snodes (#6512) (by **Zhao Liang**)
-   - [lang] MatrixType bug fix: Allow builtin any() and all() (#6533) (by **Yi Xu**)
-   - [lang] MatrixType bug fix: Fix error message mismatch (#6532) (by **Zhanlue Yang**)
-   - [opt] Eliminate redundant BitExtractStmt for SNode access under non-packed mode (#6530) (by **Yi Xu**)
-   - [build] Fix error with AOT file generation & reset Android test folder per run (#6496) (by **Zhanlue Yang**)
-   - [Lang] Matrix lib: Stop changing dimension in transpose() (#6528) (by **Yi Xu**)
-   - [misc] Add dynamic_index to offline cache key (#6524) (by **PGZXB**)
-   - [ci] GitHub Action slack bot message, again :( (#6529) (by **Proton**)
-   - [Lang] Add element access for sparse matrix on CUDA (#6250) (by **Jiafeng Liu**)
-   - [ci] Raise CPU tasks timeout (temporary workaround) (#6527) (by **Proton**)
-   - [ci] Update slack message to include workflow_url instead of head commit (#6526) (by **Ailing**)
-   - [test] Test real function on enabling the offline cache (#6523) (by **PGZXB**)
-   - [lang] MatrixType refactor: Simplify reduction ops (#6521) (by **Yi Xu**)
-   - [lang] Add support for real matrix args on real function (#6522) (by **Lin Jiang**)
-   - [bug] Fix that the cgraph doesn't respect the caps set by ti.aot.Module() (#6520) (by **PGZXB**)
-   - [aot] Fixed vulkan texture usage export (#6519) (by **PENGUINLIONG**)
-   - Revert "[Misc] Improve gif output compression rate" (#6517) (by **Ailing**)
-   - [lang] MatrixType refactor: Properly raise error when dynamic_index=False (#6499) (by **Yi Xu**)
-   - [Lang] MatrixType refactor part 2: add more ops (#6425) (by **Mike He**)
-   - [Misc] Improve gif output compression rate (#6289) (by **Zhao Liang**)
-   - [lang] Add support for struct on Dynamic SNode (#6502) (by **Lin Jiang**)
-   - [ci] Fix master notification bot json format (#6511) (by **Proton**)
-   - [cuda] Enable test on dense dynamic field (#6510) (by **Lin Jiang**)
-   - [lang] MatrixType refactor: Support vector swizzle (#6506) (by **Yi Xu**)
-   - [aot] Select any of vkGetPhysicalDeviceMemoryProperties2 (#6498) (by **PENGUINLIONG**)
-   - [ci] Fix master bot text format (#6507) (by **Proton**)
-   - [refactor] Simplify logic in get_function_body. (#6503) (by **Ailing**)
-   - [lang] Support matrix args on real function (#6493) (by **Lin Jiang**)
-   - [ci] Add taichi bot to send message to slack on master failure (#6501) (by **Ailing**)
-   - [aot] Move arch in Runtime wrapper object (#6504) (by **PENGUINLIONG**)
-   - [example] Fix out-of-bound access in examples (#6500) (by **Yi Xu**)
-   - [refactor] Remove unused has_external_arrays flag (#6497) (by **Ailing**)
-   - [aot] Disable physical storage buffer in runtime (#6494) (by **PENGUINLIONG**)
-   - [ci] Package C-API in nightly builds (#6479) (by **Proton**)
-   - [llvm] Add support for data types of different sizes on dynamic SNode (#6490) (by **Lin Jiang**)
-   - [llvm] Set the name of the LLVM function of the real function to its name (#6495) (by **Lin Jiang**)
-   - [opt] Revert "Eliminate redundant BitExtractStmt for SNode access under non-packed mode" (#6491) (by **Proton**)
-   - [aot] Add JSON serde (#6470) (by **PENGUINLIONG**)
-   - [opt] Eliminate redundant BitExtractStmt for SNode access under non-packed mode (#6485) (by **Yi Xu**)
-   - [lang] MatrixType bug fix: Demote ret_type for ArgLoadStmt after scalarization (#6433) (by **Zhanlue Yang**)
-   - [test] MatrixType refactor: Add tests for writing to matrix slice (#6480) (by **Yi Xu**)
-   - [Error] Shorten traceback for _BoundedDifferentiableMethod (#6475) (by **Lin Jiang**)
-   - [Error] Return NotImplemented for operations between field and Expr/Matrix/Struct (#6474) (by **Lin Jiang**)
-   - [Error] Return TaichiTypeError in ASTTransformer when a binary op is not supported (#6477) (by **Lin Jiang**)
-   - [ci] Switch linux gpu job to use LLVM 15 (#6465) (by **Ailing**)
-   - [aot] Modify aot-demo.sh to track master branch of taichi-aot-demo (#6473) (by **Zhanlue Yang**)
-   - [ci] Move implicit fem android demo to master branch (#6463) (by **Ailing**)
-   - [aot] Add VkDeviceMemory to TiVulkanMemoryInteropInfo (#6442) (by **Zhanlue Yang**)
-   - [aot] Disabled physical storage buffer temporarily (#6468) (by **PENGUINLIONG**)
-   - [ci] Add PR tag for amdgpu. (#6466) (by **Zeyu Li**)
-   - [Bug] [error] Add argument 'module' to 'warn_explicit' to show the deprecated warning (#6467) (by **Lin Jiang**)
-   - [bug] Fix false overflow alarm in struct fors under packed mode (#6457) (by **Yi Xu**)
-   - [Bug] Fix cache_loop_invariant_global_vars pass (#6462) (by **Lin Jiang**)
-   - [llvm] Fix broken llvm module on CUDA backend when TI_LLVM_15 is on (#6458) (by **Ailing**)
-   - Revert "[ci] Add unix end-to-end CI tests for meshtaichi" (#6460) (by **Ailing**)
-   - [lang] MatrixType bug fix: support star & loop operations over matrix-typed Expr (#6427) (by **Zhanlue Yang**)
-   - [ci] Add unix end-to-end CI tests for meshtaichi (#6453) (by **yixu**)
-   - [CUDA] Add maximum stack limit to CompileConfig (#6455) (by **Lin Jiang**)
-   - [ci] Switch CPU jobs to use prebuilt LLVM15 (#6435) (by **Ailing**)
-   - [Bug] Fix memory leak in SPIRV module (#6449) (by **yekuang**)
-   - [llvm] Allocate RuntimeContext on heap (#6420) (by **Lin Jiang**)
-   - [opt] Eliminate redundant mod for SNode access under packed mode (#6444) (by **Yi Xu**)
-   - [vulkan] Disable NoSignedWrap decoration on vulkan backend (#6440) (by **Ailing**)
-   - [example] Ti-example: phase field simulation of snow growth (#6439) (by **莫翰轩**)
-   - [misc] Bump version to v1.2.1 (#6436) (by **Zhanlue Yang**)
-   - [Lang] MatrixType refactor: Support matrix slice (#6430) (by **Yi Xu**)
-   - [aot] Warns the user when mapping device-only memory (#6417) (by **PENGUINLIONG**)
-   - [Error] Deprecate ndrange with number of the loop variables != the dimension of the ndrange (#6422) (by **Lin Jiang**)
-   - [aot] Support push_arg for Kernel in C++ wrapper (#6419) (by **Ailing**)
-   - [aot] Use count instead of size-in-bytes for ND-array access in C++ wrapper (#6409) (by **PENGUINLIONG**)
-   - [aot] Rename incomplete to truncated (#6418) (by **PENGUINLIONG**)
-   - [error] Warn Linux users about manylinux2014 build on startup (#6416) (by **Proton**)
-   - [vulkan] [bug] Stop using the buffer device address feature on macOS (#6415) (by **Yi Xu**)
-   - [aot] Fixed default list (#6403) (by **PENGUINLIONG**)
-   - [aot] Device capability refactorization (#6184) (by **PENGUINLIONG**)
-   - [bug] Fix flaky test with taichi_sparse_test.cpp (#6400) (by **Zhanlue Yang**)
-   - [Lang] Allow augmented assign on matrix slice (#6382) (by **Yi Xu**)
-   - [Doc] Update global_settings.md (#6370) (by **Zhao Liang**)
-   - [example] Change unnecessary ti.ndrange call to range in Karman vortex example (#6389) (by **Zhao Liang**)
-   - [Lang] [bug] Allow filling a field with Expr (#6391) (by **Yi Xu**)
-   - Revert "[bug] [lang] Fix copyback for fortran contiguous numpy arrays" (#6390) (by **Ailing Zhang**)
-   - [lang] Fix from_numpy() for vector field with a non-contiguous array (by **Ailing Zhang**)
-   - [Doc] Update global data access rule checker in doc (#6347) (by **Mingrui Zhang**)
-   - [lang] Texture format for better expressiveness (#6338) (by **PENGUINLIONG**)
-   - [spirv] Fix duplicated interface id for global_tmp buffers (#6392) (by **Ailing**)
-   - Revert "[bug] [lang] Fix copyback for fortran contiguous numpy arrays" (#6390) (by **Ailing**)
-   - [bug] [lang] Fix copyback for fortran contiguous numpy arrays (#6376) (by **Ailing**)
-   - [Lang] Replace matrix warning param by current logging level (#6377) (by **Zhao Liang**)
-   - [misc] Tweak CMake clang exec messages (#6379) (by **yekuang**)
-   - [lang] Support structured for loop for RW texture (#6336) (by **PENGUINLIONG**)
-   - [autodiff] Fix mode capture for validation kernels (#6361) (by **Mingrui Zhang**)
-   - [lang] MatrixType bug fix: Support returning an Expr with MatrixType (#6375) (by **Yi Xu**)
-   - [lang] MatrixType bug fix: Fix error with print of MatrixType (#6340) (by **Zhanlue Yang**)
-   - [autodiff] Add complicated test case for gdar checker (#6366) (by **Mingrui Zhang**)
-   - [lang] [refactor] Refine implementation and tests for matrix slice (#6373) (by **Yi Xu**)
-   - [lang] MatrixType bug fix: Add support for unpacking Expr with MatrixType (#6341) (by **Zhanlue Yang**)
-   - [Lang] Matrix/Vector refactor: Matrix operations part 1 (#6319) (by **Mike He**)
-   - [bug] Fix potential bug in #6362 (#6363) (by **Yi Xu**)
-   - [mesh] Fix MeshTaichi warnings in CUDA backend (#6369) (by **Chang Yu**)
-   - [bug] Fix returning uint64 as signed integer (#6364) (by **Lin Jiang**)
-   - [doc] Update doc about how to use vulkan print (#6359) (by **Ailing**)
-   - [Error] Add error message when the number of the loop variables does not match the dimension of the ndrange (#6360) (by **Lin Jiang**)
-   - [example] Add example "laplace equation" (#6302) (by **猫猫子Official**)
-   - [ci] Android Demo: leave Docker containers intact for debugging (#6357) (by **Proton**)
-   - [autodiff] Skip gradient kernel compilation for validation kernel (#6356) (by **Mingrui Zhang**)
-   - [autodiff] Move autodiff gdar checker to release (#6355) (by **Mingrui Zhang**)
-   - [aot] Removed constraint on same-allocation copy (#6354) (by **PENGUINLIONG**)
-   - [ci] Add new performance monitoring (#6349) (by **Proton**)
-   - [dx12] Only use llvm to compile dx12. (#6339) (by **Xiang Li**)
-   - [opengl] Fix with_opengl when TI_WITH_OPENGL is off (#6353) (by **Ailing**)
-   - [Doc] Add instructions about running clang-tidy checks locally (by **Ailing Zhang**)
-   - [build] Enable readability-redundant-member-init in clang-tidy check (by **Ailing Zhang**)
-   - [build] Enable TI_WITH_VULKAN and TI_WITH_OPENGL for clang-tidy checks (by **Ailing Zhang**)
-   - [build] Enable a few modernize checks in clang-tidy (by **Ailing Zhang**)
-   - [autodiff] Recover kernel autodiff mode after validation (#6265) (by **Mingrui Zhang**)
-   - [test] Adjust rtol for sparse_linear_solver tests (#6352) (by **Ailing**)
-   - [lang] MatrixType bug fix: Fix array indexing with MatrixType-index (#6323) (by **Zhanlue Yang**)
-   - [Lang] MatrixNdarray refactor part13: Add scalarization for TernaryOpStmt (#6314) (by **Zhanlue Yang**)
-   - [Lang] MatrixNdarray refactor part12: Add scalarization for AtomicOpStmt (#6312) (by **Zhanlue Yang**)
-   - [build] Enable a few modernize checks in clang-tidy (by **Ailing Zhang**)
-   - [build] Enable google-explicit-constructor check in clang-tidy (by **Ailing Zhang**)
-   - [build] Enable google-build-explicit-make-pair check in clang-tidy (by **Ailing Zhang**)
-   - [build] Enable a few bugprone related rules in clang-tidy (by **Ailing Zhang**)
-   - [build] Enable modernize-use-override in clang-tidy (by **Ailing Zhang**)
-   - [ci] Use .clang-tidy for check_static_analyzer job (by **Ailing Zhang**)
-   - [mesh] Support arm64 backend for MeshTaichi (#6329) (by **Chang Yu**)
-   - [lang] Throw proper error message if calling ti.append with vector/matrix (#6322) (by **Ailing**)
-   - [aot] Fixed buffer device address import (#6326) (by **PENGUINLIONG**)
-   - [aot] Fixed export of get_instance_proc_addr (#6324) (by **PENGUINLIONG**)
-   - [build] Allow building test when LLVM is off (#6327) (by **Ailing**)
-   - [bug] Fix failure when generating LLVM AOT module for the second time (#6311) (by **PGZXB**)
-   - [aot] Per-parameter documentation in C-API header (#6317) (by **PENGUINLIONG**)
-   - [ci] Revert "Add end-to-end CI tests for meshtaichi (#6321)" (#6325) (by **Proton**)
-   - [ci] Add end-to-end CI tests for meshtaichi (#6321) (by **yixu**)
-   - [doc] Update the document about offline cache (#6313) (by **PGZXB**)
-   - [aot] Include taichi_cpu.h in taichi.h (#6315) (by **Zhanlue Yang**)
-   - [Vulkan] [bug] Change the format string of 64bit unsigned integer type from %llu to %lu (#6308) (by **Lin Jiang**)
-   - [mesh] Refactor MeshTaichi API (#6306) (by **Chang Yu**)
-   - [lang] MatrixType bug fix: Allow dynamic_index=True when real_matrix_scalarize=True (#6304) (by **Yi Xu**)
-   - [lang] MatrixType bug fix: Enable irpass::cfg_optimization if real_matrix_scalarize is on (#6300) (by **Zhanlue Yang**)
-   - [metal] Enable offline cache by default on Metal (#6307) (by **PGZXB**)
-   - [Vulkan] Add overflow detection on vulkan when debug=True (#6279) (by **Lin Jiang**)
-   - [aot] Inline documentations (#6301) (by **PENGUINLIONG**)
-   - [aot] Support exporting interop info for TiMemory on Cpu/Cuda backends (#6242) (by **Zhanlue Yang**)
-   - [lang] MatrixType bug fix: Avoid checks for legacy Matrix-class when real_matrix is on (#6292) (by **Zhanlue Yang**)
-   - [aot] Support setting vector/matrix argument in C++ wrapper of C-API (#6298) (by **Ailing**)
-   - [lang] MatrixType bug fix: Fix MatrixType validations in build_call_if_is_type() (#6294) (by **Zhanlue Yang**)
-   - [bug] Fix assertion failure when registering kernels with the same name on Metal (#6271) (by **PGZXB**)
-   - [ci] Add more release tests (#5839) (by **Proton**)
-   - [lang] MatrixType bug fix: Allow indexing a matrix r-value (#6291) (by **Yi Xu**)
-   - [bug] Fix duplicate runs with 'run_tests.py --cpp -k' when selecting AOT tests (#6296) (by **Zhanlue Yang**)
-   - [bug] Fix segmentation fault with TextureOpStmt ir_printer (#6297) (by **Zhanlue Yang**)
-   - [ci] Add taichi-aot-demo headless demos (#6280) (by **Proton**)
-   - [bug] Serialize missing fields of metal::TaichiKernelAttributes and metal::KernelAttributes (#6270) (by **PGZXB**)
-   - [metal] Implement offline cache cleaning on metal (#6272) (by **PGZXB**)
-   - [aot] Reorganized C-API headers (#6199) (by **PENGUINLIONG**)
-   - [lang] [bug] Fix setting integer arguments within u64 range but greater than i64 range (#6267) (by **Lin Jiang**)
-   - [autodiff] Skip gdar checking for user defined grad kernel (#6273) (by **Mingrui Zhang**)
-   - [bug] Fix AotModuleBuilder::add_compiled_kernel (#6287) (by **PGZXB**)
-   - [Bug] [lang] Make dimension check for GlobalPtrStmt aware of whether it is a cell access (#6275) (by **Yi Xu**)
-   - [refactor] Move setting visible device to vulkan instance initialization (by **Ailing Zhang**)
-   - [bug] Add unit test to detect memory leak from data_oriented classes (#6278) (by **Zhanlue Yang**)
-   - [aot] Ship runtime *.bc files with C-API for LLVM AOT (#6285) (by **Zhanlue Yang**)
-   - [bug] Convert non-i32 type indices to i32 for GlobalPtrStmt (#6276) (by **Zhanlue Yang**)
-   - [Doc] Renamed syntax.md to kernel_function.md, plus miscellaneous edits (#6277) (by **Vissidarte-Herman**)
-   - [lang] Fixed validation scope (#6262) (by **PENGUINLIONG**)
-   - [bug] Prevent ti.kernel from directly caching the passed-in arguments to avoid memory leak (#6256) (by **Zhanlue Yang**)
-   - [autodiff] Add demote atomics before gdar checker (#6266) (by **Mingrui Zhang**)
-   - [autodiff] Add grad check feature and related test (#6245) (by **PhrygianGates**)
-   - [lang] Fixed contraction cast (#6255) (by **PENGUINLIONG**)
-   - [Example] Add karman vortex street example (#6249) (by **Zhao Liang**)
-   - [ci] Lift GitHub CI timeout (#6260) (by **Proton**)
-   - [metal] Support offline cache on metal (#6227) (by **PGZXB**)
-   - [dx12] Add DirectX-Headers as a submodule (#6259) (by **Xiang Li**)
-   - [bug] Fix link error with TI_WITH_OPENGL:BOOL=ON but TI_WITH_VULKAN:BOOL=OFF (#6257) (by **PGZXB**)
-   - [dx12] Disable DX12 for cpu only test. (#6253) (by **Xiang Li**)
-   - [Lang] MatrixNdarray refactor part11: Fuse ExternalPtrStmt and PtrOffsetStmt (#6189) (by **Zhanlue Yang**)
-   - [Doc] Rename index.md to hello_world.md (#6244) (by **Vissidarte-Herman**)
-   - [Doc] Update syntax.md (#6236) (by **Zhao Liang**)
-   - [spirv] Generate OpBitFieldUExtract for BitExtractStmt (#6208) (by **Yi Xu**)
-   - [Bug] [lang] Allow numpy int as snode dimension (#6211) (by **Yi Xu**)
-   - [doc] Update document about building and running Taichi C++ tests (#6228) (by **PGZXB**)
-   - [misc] Disable the offline cache if printing ir is enabled (#6234) (by **PGZXB**)
-   - [vulkan] [opengl] Enable offline cache by default on Vulkan and OpenGL (#6233) (by **PGZXB**)
-   - [Doc] Update math_module.md (#6235) (by **Zhao Liang**)
-   - [Doc] Update debugging.md (#6238) (by **Zhao Liang**)
-   - [dx12] Add ti.dx12. (#6174) (by **Xiang Li**)
-   - [lang] Set ret_type for AtomicOpStmt (#6213) (by **Ailing**)
-   - [Doc] Update global settings (#6201) (by **Olinaaaloompa**)
-   - [doc] Editorial updates (#6216) (by **Vissidarte-Herman**)
-   - [Doc] Update hello world (#6191) (by **Olinaaaloompa**)
-   - [Doc] Update math module (#6203) (by **Olinaaaloompa**)
-   - [Doc] Update profiler (#6214) (by **Olinaaaloompa**)
-   - [autodiff] Store if condition in adstack (#6207) (by **Mingrui Zhang**)
-   - [Doc] Update debugging.md (#6212) (by **Zhao Liang**)
-   - [Doc] Update debugging.md (#6200) (by **Zhao Liang**)
-   - [bug] Fixed type inference error with ExternalPtrStmt (#6210) (by **Zhanlue Yang**)
-   - [example] Request to add my code into examples (#6185) (by **JiaoLuhuai**)
-   - [Lang] MatrixNdarray refactor part10: Remove redundant MatrixInitStmt generated from scalarization (#6171) (by **Zhanlue Yang**)
-   - [aot] Apply ti_get_last_error_message() for all C-API test cases (#6195) (by **Zhanlue Yang**)
-   - [llvm] [refactor] Merge create_call and call (#6192) (by **Lin Jiang**)
-   - [build] Support executing manually-specified cpp tests for run_tests.py (#6206) (by **Zhanlue Yang**)
-   - [doc] Editorial updates to field.md (#6202) (by **Vissidarte-Herman**)
-   - [Lang] MatrixNdarray refactor part9: Add scalarization for AllocaStmt (#6168) (by **Zhanlue Yang**)
-   - [Lang] Support GPU solve with analyzePattern and factorize (#6158) (by **pengyu**)
-   - [Lang] MatrixField refactor 9/n: Allow dynamic index of matrix field when real_matrix=True (#6194) (by **Yi Xu**)
-   - [Doc] Fixed broken links (#6193) (by **Olinaaaloompa**)
-   - [ir] MatrixField refactor 8/n: Rename PtrOffsetStmt to MatrixPtrStmt (#6187) (by **Yi Xu**)
-   - [Doc] Update field.md (#6182) (by **Zhao Liang**)
-   - [bug] Relax dependent Pillow version (#6170) (by **Ailing**)
-   - [Doc] Update data_oriented_class.md (#6181) (by **Zhao Liang**)
-   - [Doc] Update kernels and functions (#6176) (by **Zhao Liang**)
-   - [Doc] Update type.md (#6180) (by **Zhao Liang**)
-   - [Doc] Update getting started (#6175) (by **Zhao Liang**)
-   - [llvm] MatrixField refactor 7/n: Simplify codegen for TensorType allocation and access (#6169) (by **Yi Xu**)
-   - [LLVM] Add runtime overflow detection on LLVM-based backends (#6178) (by **Lin Jiang**)
-   - Revert "[LLVM] Add runtime overflow detection on LLVM-based backends" (#6177) (by **Ailing**)
-   - [dx12] Add aot for dx12. (#6099) (by **Xiang Li**)
-   - [LLVM] Add runtime overflow detection on LLVM-based backends (#6166) (by **Lin Jiang**)
-   - [doc] C-API documentation & generator (#5736) (by **PENGUINLIONG**)
-   - [gui] Support for setting the initial position of GGUI window (#6156) (by **Mocki**)
-   - [metal] Maintain a print string table per kernel (#6160) (by **PGZXB**)
-   - [Lang] MatrixNdarray refactor part8: Add scalarization for BinaryOpStmt with TensorType-operands (#6086) (by **Zhanlue Yang**)
-   - [Doc] Refactor debugging (#6102) (by **Olinaaaloompa**)
-   - [doc] Updated the position of Sparse Matrix (#6167) (by **Vissidarte-Herman**)
-   - [Doc] Refactor global settings (#6071) (by **Zhao Liang**)
-   - [Doc] Refactor external arrays (#6065) (by **Zhao Liang**)
-   - [Doc] Refactor simt (#6151) (by **Zhao Liang**)
-   - [Doc] Refactor Profiler (#6142) (by **Olinaaaloompa**)
-   - [Doc] Add doc for math module (#6145) (by **Zhao Liang**)
-   - [aot] Fixed texture interop (#6164) (by **PENGUINLIONG**)
-   - [misc] Remove TI_UI namespace macros (#6163) (by **Lin Jiang**)
-   - [llvm] Add comment about the structure of the CodeGen (#6150) (by **Lin Jiang**)
-   - [Bug] [lang] Fix augmented assign for sar (#6153) (by **Yi Xu**)
-   - [Test] Add scipy to test GPU sparse solver (#6162) (by **pengyu**)
-   - [bug] Fix crashing when loading old offline cache files (for gfx backends) (#6157) (by **PGZXB**)
-   - [lang] Remove print at the end of parallel sort (#6161) (by **Haidong Lan**)
-   - [misc] Move some offline cache utils from analysis/ to util/ (#6155) (by **PGZXB**)
-   - [Lang] Matrix/Vector refactor: support basic matrix ops (#6077) (by **Mike He**)
-   - [misc] Remove namespace macros (#6154) (by **Lin Jiang**)
-   - [Doc] Update gui_system (#6152) (by **Zhao Liang**)
-   - [aot] Track layouts for imported image & tests (#6138) (by **PENGUINLIONG**)
-   - [ci] Fix build cache problems (#6149) (by **Proton**)
-   - [Misc] Add prefix sum executor to avoid multiple field allocations (#6132) (by **YuZhang**)
-   - [opt] Cache loop-invariant global vars to local vars (#6072) (by **Lin Jiang**)
-   - [aot] Improve C++ wrapper implementation (#6146) (by **PENGUINLIONG**)
-   - [doc] Refactored ODOP (#6143) (by **Vissidarte-Herman**)
-   - [Lang] Support basic sparse matrix operations on GPU. (#6082) (by **Jiafeng Liu**)
-   - [Lang] MatrixField refactor 6/n: Add tests for MatrixField scalarization (#6137) (by **Yi Xu**)
-   - [vulkan] Fix SPV physical ptr load alignment (#6139) (by **Bob Cao**)
-   - [bug] Let every thread have its own CompileConfig (#6124) (by **Lin Jiang**)
-   - [refactor] Remove redundant codegen of floordiv (#6135) (by **Yi Xu**)
-   - [doc] Miscellaneous editorial updates (#6131) (by **Vissidarte-Herman**)
-   - Revert "[spirv] Fixed OpLoad with physical address" (#6136) (by **Lin Jiang**)
-   - [bug] [llvm] Fix is_same_type when the suffix of a type is the prefix of the suffix of the other type (#6126) (by **Lin Jiang**)
-   - [bug] [vulkan] Only enable non_semantic_info cap when validation layer is on (#6129) (by **Ailing**)
-   - [Llvm] Fix codegen for div (unsigned) (#6128) (by **Yi Xu**)
-   - [Lang] MatrixField refactor 5/n: Lower access of matrix field element into CHI IR (#6119) (by **Yi Xu**)
-   - [Lang] Fix invalid assertion for matrix values (#6125) (by **Zhanlue Yang**)
-   - [opengl] Fix GLES support (#6121) (by **Ailing**)
-   - [Lang] MatrixNdarray refactor part7: Add scalarization for UnaryOpStmt with TensorType-operand (#6080) (by **Zhanlue Yang**)
-   - [doc] Editorial updates (#6116) (by **Vissidarte-Herman**)
-   - [misc] Allow more commits in changelog generation (#6115) (by **Yi Xu**)
-   - [aot] Import MoltenVK (#6090) (by **PENGUINLIONG**)
-   - [vulkan] Instruct users to install vulkan sdk if they want to use validation layer (#6098) (by **Ailing**)
-   - [ci] Use local caches on self-hosted runners, and code refactoring. (#5846) (by **Proton**)
-   - [misc] Bump version to v1.1.4 (#6112) (by **Taichi Gardener**)
-   - [doc] Fixed a broken link (#6111) (by **Vissidarte-Herman**)
-   - [doc] Update explanation on data-layout (#6110) (by **Qian Bao**)
-   - [Doc] Move developer utilities to contribution (#6109) (by **Olinaaaloompa**)
-   - [Doc] Added Accelerate PyTorch (#6106) (by **Vissidarte-Herman**)
-   - [Doc] Refactor ODOP (#6013) (by **Zhao Liang**)
-   - [opengl] Support offline cache on opengl (#6104) (by **PGZXB**)
-   - [build] Fix build failure with TI_WITH_OPENGL:BOOL=OFF and TI_WITH_DX11:BOOL=ON (#6108) (by **PGZXB**)