- New data is generated every second; it is collected and stored, and charts, graphs and tables add meaning to it.
- Data begins to tell a story when it is illustrated using visualization.
Matplotlib: The most widely used Python data visualization library. It is a third-party library.
pip install matplotlib
Seaborn: Generates informative statistical graphics. It depends on matplotlib.
pip install seaborn
Bokeh: Generates interactive plots, which can be accessed as JSON, HTML objects or interactive web applications.
pip install bokeh
Figure: The whole area chosen for plotting.
Axes: The area where the data is plotted.
# An Axes is added to a Figure with the add_subplot method.
add_subplot(nrows, ncols, index)
# add_subplot(1, 1, 1) and add_subplot(111) are the same.
Axis: The horizontal and vertical number lines, which define the graph limits.
import matplotlib.pyplot as plt

# Figure
fig = plt.figure(figsize=(8, 6))  # Adjust the figure size; the default is (width=6.4, height=4.8) inches

# Axes
ax = fig.add_subplot(111)

# Set parameters in a single call ...
ax.set(title='My Plot Title', xlabel='X Axis', ylabel='Y Axis', xlim=(0, 5), ylim=(0, 10))
# ... or, equivalently, one setter at a time
ax.set_title("My Plot Title")
ax.set_xlabel("X Axis"); ax.set_ylabel('Y Axis')
ax.set_xlim([0, 5]); ax.set_ylim([0, 10])

# Plot
x = [1, 2, 3, 4]; y = [2, 4, 6, 8]
plt.plot(x, y, label='linear-growth')  # label is the name shown in the legend

# Plot attributes (pyplot equivalents of the ax.set_* calls above)
plt.title('My First Plot')
plt.xlabel('X-Axis'); plt.ylabel('Y-Axis')
plt.xlim(0, 5); plt.ylim(0, 10)

# Legends
plt.legend()

# Show plot
plt.show()
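To make the Figure, Axes and Axis relationship concrete, here is a minimal sketch that inspects the objects after they are created. The figure size and limits are illustrative values, not requirements.

```python
import matplotlib.pyplot as plt

# A Figure is the whole drawing area; it starts out with no Axes.
fig = plt.figure(figsize=(8, 6))
print(len(fig.axes))            # number of Axes before add_subplot is called

# add_subplot(nrows, ncols, index) creates one Axes and registers it on the Figure.
ax = fig.add_subplot(1, 1, 1)
print(fig.axes[0] is ax)        # the Figure now holds this Axes

# Each Axes owns two Axis objects (the number lines) that carry the graph limits.
ax.set_xlim(0, 5)
ax.set_ylim(0, 10)
x_limits = ax.get_xlim()
y_limits = ax.get_ylim()
print(type(ax.xaxis).__name__, x_limits)
print(type(ax.yaxis).__name__, y_limits)
```

Nothing is drawn here; the sketch only shows which object owns what.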
- Make comparisons (magnitudes).
- Ask lots of questions of the data, and think about chart selection, design and labels.
- Keep the cause on the X axis and its effect on the Y axis.
- Always keep magnitude related to height or frequency in the vertical position.
- Always put time on the X axis.
- Data comparison | Comparing categories (Categorical features: Nominal | Ordinal)
- Can be used for categorical as well as numerical data.
- Change over a period of time | Compare magnitude, ranking, length (height, width, distance)
- Represented vertically, horizontally or grouped (if we want to measure more than one variable)
Similar charts: Lollipop chart (a bubble at the top | can be used if the number of bars in a bar chart is large)
# Vertical Bar
bar(x,height)
# Horizontal Bar
barh(y, width)
Common parameters of bar
- color: Sets the color of the bars.
- edgecolor: Sets the color of the border line of the bars.
- width: Sets the width of the bars.
- align: Aligns the bars with respect to the x coordinates.
- label: Sets the label of the bars, which appears in the legend.
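The bar parameters listed above can be combined as in the following sketch. The category names and values are made-up illustration data, and `plt.subplots` is used only to place a vertical and a horizontal chart side by side.

```python
import matplotlib.pyplot as plt

# Illustrative data (assumed for this sketch)
categories = ['A', 'B', 'C']
values = [5, 9, 3]

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(10, 4))

# Vertical bars: x positions and heights
ax_v.bar(categories, values, color='skyblue', edgecolor='black',
         width=0.6, align='center', label='count')
ax_v.set(title='Vertical bars', xlabel='Category', ylabel='Count')
ax_v.legend()

# Horizontal bars: y positions and widths
ax_h.barh(categories, values, color='salmon', edgecolor='black')
ax_h.set(title='Horizontal bars', xlabel='Count', ylabel='Category')

fig.tight_layout()
```

Call `plt.show()` afterwards to display the figure.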
- Distribution or spread of data | Frequency | Occurrence of continuous data
- Also used for comparing two entities
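Common parameters of hist
- color: Sets the color of the bars.
- bins: Sets the number of bins to be used.
- density: Set to True to normalize the histogram so that the bars show a probability density (total area of 1) rather than counts. Older Matplotlib versions called this parameter normed, which has since been removed.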
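A minimal histogram sketch using the parameters above. The normally distributed sample is generated only for illustration; the last line checks that with `density=True` the bar areas sum to 1.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative continuous data (assumed for this sketch)
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=50, scale=10, size=1000)

fig, ax = plt.subplots(figsize=(8, 6))
heights, bin_edges, _ = ax.hist(data, bins=20, density=True,
                                color='steelblue', edgecolor='white')
ax.set(title='Distribution of sample data', xlabel='Value', ylabel='Density')

# With density=True, bar height times bin width sums to 1 over all bins.
area = float(np.sum(heights * np.diff(bin_edges)))
print(round(area, 6))
```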
- Determine the relationship between dependent and independent variables.
- Values of one variable determine the position on the horizontal axis (X axis).
- Values of the second variable determine the position on the vertical axis (Y axis).
- If the data points spread along a straight line, the two variables are highly correlated.
scatter(x, y)
Common parameters of scatter
- c: Sets the color of the markers.
- s: Sets the size of the markers.
- marker: Selects a marker, e.g. circle, triangle, etc.
- edgecolor: Sets the color of the lines on the edges of the markers.
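As a sketch of the scatter parameters above, the example below plots synthetic data in which y depends roughly linearly on x. The data and the noise level are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data: y depends linearly on x, plus some noise (assumed for this sketch)
rng = np.random.default_rng(seed=1)
x = rng.uniform(0, 10, size=50)
y = 2 * x + rng.normal(0, 1.5, size=50)

fig, ax = plt.subplots(figsize=(8, 6))
points = ax.scatter(x, y, c='tab:blue', s=40, marker='o', edgecolor='black')
ax.set(title='Dependent vs. independent variable',
       xlabel='x (independent)', ylabel='y (dependent)')

# A correlation coefficient close to 1 matches the near-linear spread in the plot.
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson correlation: {r:.2f}")
```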
- Tracking changes over time
- Useful for representing time series relationships
- A circular graph divided into segments or slices.
- Represents the percentage or proportion of categorical data, where each slice of the pie represents a category.
Common parameters of pie
- colors: Sets the colors of the portions.
- labels: Sets the labels of the portions.
- startangle: Sets the angle at which drawing of the portions starts.
- autopct: Sets the format in which the percentage of each portion is displayed.
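A short pie chart sketch using the parameters above. The categories and their shares are invented for illustration.

```python
import matplotlib.pyplot as plt

# Illustrative categorical data (assumed for this sketch)
labels = ['Python', 'Java', 'C++', 'Other']
shares = [45, 25, 20, 10]
colors = ['tab:blue', 'tab:orange', 'tab:green', 'tab:gray']

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(shares, labels=labels, colors=colors,
       startangle=90,          # start drawing from the top of the circle
       autopct='%1.1f%%')      # print each share with one decimal place
ax.set_title('Share per category')
ax.axis('equal')               # keep the pie circular
```

Call `plt.show()` afterwards to display the figure.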
- Represents the spread or distribution of data.
- Also used to compare distributions.
- Helps to detect outliers in the data.
Common parameters of boxplot
- labels: Sets the labels for the box plots (named tick_labels from Matplotlib 3.9 onward).
- notch: Set to True if notches need to be created around the median.
- bootstrap: Number of bootstrap resamples used to compute the notches around the median.
- vert: Set to False to plot the box plots horizontally.
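The following sketch compares three distributions with notched box plots. The groups are random samples generated for illustration, with one extreme value added so that an outlier appears. Tick labels are set through `set_xticklabels` here only to stay independent of the `labels` / `tick_labels` naming across Matplotlib versions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Three illustrative groups (assumed for this sketch)
rng = np.random.default_rng(seed=2)
groups = [rng.normal(loc=mu, scale=5, size=200) for mu in (40, 50, 60)]
groups[0] = np.append(groups[0], [90.0])   # extreme value, drawn as an outlier

fig, ax = plt.subplots(figsize=(8, 6))
result = ax.boxplot(groups, notch=True, bootstrap=1000)
ax.set_xticklabels(['Group A', 'Group B', 'Group C'])
ax.set(title='Comparing distributions', ylabel='Value')

# The outlier points ("fliers") of the first box include the extreme value.
first_group_outliers = result['fliers'][0].get_ydata()
print(first_group_outliers)
```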
- Change over time | Trends | Profits | Loss | Increase | Decrease | Flow
- Univariate and multivariate (compare two variables)
plot(x, y) # x, y data values representing two variables.
Common parameters of the plot function
- color: Sets the color of the line.
- linestyle: Sets the line style, e.g. solid, dashed, etc.
- linewidth: Sets the thickness of the line.
- marker: Chooses a marker for the data points, e.g. circle, triangle, etc.
- markersize: Sets the size of the chosen marker.
- label: Names the line; the name appears in the legend.
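A minimal line plot sketch that uses each of the parameters above to compare two variables over time. The monthly profit and loss figures are invented for illustration.

```python
import matplotlib.pyplot as plt

# Illustrative time series (assumed for this sketch)
months = [1, 2, 3, 4, 5, 6]
profit = [10, 12, 9, 15, 18, 21]
loss = [4, 5, 7, 3, 2, 2]

fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(months, profit, color='green', linestyle='-', linewidth=2,
        marker='o', markersize=6, label='profit')
ax.plot(months, loss, color='red', linestyle='--', linewidth=2,
        marker='^', markersize=6, label='loss')
ax.set(title='Profit and loss over time', xlabel='Month', ylabel='Amount')
ax.legend()   # uses the label given to each line
```

Time is kept on the X axis and magnitude on the Y axis.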
- matplotlib.pyplot comes with a lot of styles.
- Based on the chosen style, the display of the figure changes.
- You can view the styles available in pyplot by running the following commands.
import matplotlib.pyplot as plt
print(plt.style.available)
# Use a style
plt.style.use('ggplot')
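`plt.style.use` changes the style for everything drawn afterwards. If a style should apply to a single figure only, one option is the `plt.style.context` context manager, sketched below with illustrative data.

```python
import matplotlib.pyplot as plt

default_facecolor = plt.rcParams['axes.facecolor']

# The ggplot style is active only inside the with block.
with plt.style.context('ggplot'):
    styled_facecolor = plt.rcParams['axes.facecolor']
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot([1, 2, 3, 4], [2, 4, 6, 8])
    ax.set_title('Drawn with the ggplot style')

# Outside the block the previous settings are restored.
restored_facecolor = plt.rcParams['axes.facecolor']
print(default_facecolor, styled_facecolor, restored_facecolor)
```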
- Create multiple plots in a single figure.
- subplot creates the Axes object at the given index position and returns it.
fig = plt.figure(figsize=(10,8))
axes1 = plt.subplot(2, 2, 1, title='Plot1')
axes2 = plt.subplot(2, 2, 2, title='Plot2')
axes3 = plt.subplot(2, 2, 3, title='Plot3')
axes4 = plt.subplot(2, 2, 4, title='Plot4')
plt.show()
fig = plt.figure(figsize=(10,8))
axes1 = plt.subplot(2, 2, (1,2), title='Plot1') # Index (1,2) makes Plot1 span positions 1 and 2, i.e. the whole top row.
axes1.set_xticks([]); axes1.set_yticks([])
axes2 = plt.subplot(2, 2, 3, title='Plot2')
axes2.set_xticks([]); axes2.set_yticks([])
axes3 = plt.subplot(2, 2, 4, title='Plot3')
axes3.set_xticks([]); axes3.set_yticks([]) # Removes all ticks of x and y axes.
plt.show()
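As an alternative to calling `plt.subplot` once per position, `plt.subplots` creates the figure and the whole grid of Axes in one call. The sketch below rebuilds the 2 x 2 layout this way; the titles follow the naming used above.

```python
import matplotlib.pyplot as plt

# One call returns the Figure and a 2 x 2 array of Axes.
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 8))

# axes.flat walks the grid row by row, matching subplot indices 1 to 4.
for number, ax in enumerate(axes.flat, start=1):
    ax.set_title(f'Plot{number}')
    ax.set_xticks([]); ax.set_yticks([])   # remove all ticks of the x and y axes

fig.tight_layout()
```

Individual Axes can then be addressed by row and column, for example `axes[0, 1]` for the top-right plot.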