-% \textbf{Purpose} & \textcolor{blue}{Increases or decreases} the number of partitions. & \textcolor{blue}{Decreases} the number of partitions. \\
-% \hline
-% \textbf{Mechanism} & \textcolor{blue}{Shuffles all} the data across the network to create a new set of partitions. & \textcolor{blue}{Merges existing} partitions \textcolor{blue}{without} a full data shuffle. \\
-% \hline
-% \textbf{Use Case} & Ideal for increasing the number of partitions or significantly \textcolor{blue}{changing the distribution} of data. & Efficient for \textcolor{blue}{reducing the number} of partitions when the target number is less than the current number. \\
-% \hline
-% \textbf{Cost} & Expensive due to the \textcolor{blue}{full data shuffle}. & Less expensive than \texttt{repartition} as it \textcolor{blue}{minimizes data movement}. \\
-% \hline
-% \end{tabular}
-% }
-% \caption{Comparison of Repartition and Coalesce in Apache Spark}\label{tab:rerepartition-coalesce}
-% \end{table}
-%\end{frame}
-%
+\subsection{Repartition vs. Coalesce}\label{subsec:repartition-vs-coalesce}
+\begin{frame}
+\frametitle{Repartition vs. Coalesce}
+\begin{itemize}
+\item In Apache Spark, \texttt{repartition} and \texttt{coalesce} are two methods used to change the number of partitions in an RDD (Resilient
+\textbf{Purpose} & \textcolor{blue}{Increases or decreases} the number of partitions. & \textcolor{blue}{Decreases} the number of partitions. \\
+\hline
+\textbf{Mechanism} & \textcolor{blue}{Shuffles all} the data across the network to create a new set of partitions. & \textcolor{blue}{Merges existing} partitions \textcolor{blue}{without} a full data shuffle. \\
+\hline
+\textbf{Use Case} & Ideal for increasing the number of partitions or significantly \textcolor{blue}{changing the distribution} of data. & Efficient when \textcolor{blue}{reducing the number} of partitions, for example after a filter leaves many of them nearly empty. \\
+\hline
+\textbf{Cost} & Expensive due to the \textcolor{blue}{full data shuffle}. & Less expensive than \texttt{repartition} as it \textcolor{blue}{minimizes data movement}. \\
+\hline
+\end{tabular}
+}
+\caption{Comparison of Repartition and Coalesce in Apache Spark}\label{tab:rerepartition-coalesce}