Table of contents
- Introduction
- Dataset Overview
- Let's dive into exploring the analysis code
- Import the necessary libraries
- Loading the dataset with pandas
- Exploratory Data Analysis (EDA)
- Part 1: Data Cleaning and Exploration
- Part 2: Time Series Analysis / Rolling Window / Moving Averages
- Create a line chart to visualize the 'Close' prices over time.
- Calculate and plot the daily percentage change in closing prices.
- Investigate the presence of any trends or seasonality in the stock prices.
- Apply moving averages to smooth the time series data in 15/30 day intervals against the original graph.
- Calculate the average closing price for each stock.
- Identify the top 5 and bottom 5 stocks based on average closing price.
- Part 3: Volatility Analysis
- Calculate and plot the rolling standard deviation of the 'Close' prices.
- Create a new column for daily price change (Close - Open)
- Analyze the distribution of daily price changes.
- Identify days with the largest price increases and decreases.
- Identify stocks with unusually high trading volume on certain days.
- Part 4: Correlation and Heatmaps
- Conclusion
Introduction
This blog analyzes the Dhaka stock market in 2022, focusing on its dynamics and trends. Using Python in a Jupyter notebook, it explores market trends and data patterns specific to Dhaka through code-driven analysis and visualizations. It is intended as a guide for traders, investors, and enthusiasts who want a deeper understanding of Bangladesh's financial landscape and of what drives stock movements in Dhaka.
Dataset Overview
This analysis uses a CSV file with 49,159 rows and 7 columns to explore the Dhaka stock market from January 2022 to June 2022. The dataset covers 412 companies and has the columns Date, Name, Open, High, Low, Close, and Volume. This data supports an in-depth analysis of Dhaka stock market dynamics.
The stock market dataset is provided here for your analysis:
Please feel free to explore this extensive dataset to acquire useful insights into the dynamics of the Dhaka stock market.
Let's dive into exploring the analysis code
Import the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
These libraries are essential components for our analysis.
numpy (np): Used for numerical computing, offering support for multi-dimensional arrays and mathematical functions.
pandas (pd): Crucial for data manipulation and analysis, boasting powerful data structures and tools tailored for handling structured data effectively.
matplotlib.pyplot (plt): Enables the creation of visualizations, including plots, charts, and graphs.
seaborn (sns): A powerful library for statistical data visualization, enhancing the aesthetics and overall appeal of visualizations.
Loading the dataset with pandas
In this section, we import the stock market data using the read_csv() function from the pandas library inside a Jupyter notebook. We then use the head() function to display the first 5 rows of the dataset, offering an initial glimpse into the structure and contents of the Dhaka stock market data.
# Read the CSV file
stock_data = pd.read_csv('Stock_Market_Data.csv')
# 1st 5 rows of dataset
stock_data.head()
Exploratory Data Analysis (EDA)
print(stock_data.shape)
# Check the data types
stock_data.dtypes
We've noticed that the Date column in our data is currently in the wrong format—it's labeled as an object when it should be in DateTime format. To fix this, we'll convert the Date column to the correct format. Let's initiate that conversion process.
# Convert 'Date' column to datetime format
stock_data['Date'] = pd.to_datetime(stock_data['Date'], dayfirst = True)
We perform the conversion with pd.to_datetime(). We also set dayfirst = True because the dates in the file are written day-first, so the first number is parsed as the day rather than the month. Let's recheck the data types to confirm the conversion succeeded.
stock_data.dtypes
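To see why dayfirst matters, here is a minimal sketch on two made-up date strings. The sample values and the dash-separated layout are assumptions for illustration only; the real file's exact format may differ.

```python
import pandas as pd

# Hypothetical sample strings in a day-first layout (not taken from the dataset)
raw_dates = pd.Series(["03-01-2022", "15-02-2022"])

# dayfirst=True reads "03-01-2022" as 3 January, not 1 March
day_first = pd.to_datetime(raw_dates, dayfirst=True)

# Spelling out the format is stricter and gives the same result here
explicit = pd.to_datetime(raw_dates, format="%d-%m-%Y")

print(day_first.dt.strftime("%Y-%m-%d").tolist())
```

If you know the exact layout of the Date column, passing format explicitly is usually the safer choice, because it fails loudly on rows that do not match.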
Part 1: Data Cleaning and Exploration
Calculate basic summary statistics for each column (mean, median, standard deviation, etc.)
The describe() function generates summary statistics for each numerical column of the stock_data DataFrame: count, mean, standard deviation, minimum, the 25th, 50th, and 75th percentiles (the 50th percentile is the median), and maximum.
stock_data.describe()
Get the top 5 companies with the highest total volume:
Selecting specific companies from the dataset allows for a more thorough examination of their influence on the market. The provided code snippet identifies the top 5 companies based on their total trading volume.
# Calculate total volume for each company
volume_per_company = stock_data.groupby('Name')['Volume'].sum()
# Get the top 5 companies with the highest total volume
top_5_companies = volume_per_company.nlargest(5).index
top_5_companies.to_list()
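As a quick sanity check of the groupby() and nlargest() pattern, here is a small sketch on a toy DataFrame. The company names and volumes below are invented for illustration and are not part of the dataset.

```python
import pandas as pd

# Toy data: three hypothetical companies with a few daily volumes each
toy = pd.DataFrame({
    "Name": ["AAA", "AAA", "BBB", "BBB", "CCC"],
    "Volume": [100, 200, 50, 25, 500],
})

# Sum the volume per company, then keep the two largest totals
total_volume = toy.groupby("Name")["Volume"].sum()
top_2 = total_volume.nlargest(2)

print(top_2)
```

nlargest() returns the totals already sorted from highest to lowest, so the resulting index can be used directly as the list of companies to loop over.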
Explore the distribution of the 'Close' prices over time.
Analyzing the distribution of 'Close' prices over time for the top 5 companies, this code generates histograms for each company. These visualizations offer insights into the fluctuations in closing prices, aiding in a comparative evaluation of their stock performance.
for name in top_5_companies:
    # Filter data for the current company
    company_data = stock_data[stock_data['Name'] == name]
    plt.figure(figsize = (15, 5))
    # Histogram for 'Close' prices over time for the current company
    sns.histplot(data = company_data, x = 'Close', bins = 30, label = name)
    # Set labels and title
    plt.xlabel('Closing Price Distribution')
    plt.ylabel('Frequency')
    plt.title('Distribution of Close Prices Over Time of {}'.format(name))
    # Add legend, rotate x-axis labels, and show the plot
    plt.legend()
    plt.xticks(rotation = 45)
    plt.show()
Here are 2 selected plots provided from the 5 output plots for reference.
Identify and analyze any outliers (if any) in the dataset.
The code iterates through the top 5 companies in a stock dataset. For each company, it filters the data and creates a box plot to visualize the distribution of 'Close' prices. Then, it calculates the Interquartile Range (IQR) to identify outliers in the 'Close' prices using a threshold of 1.5 times the IQR. Outliers are determined by comparing 'Close' prices against the threshold. Finally, it prints information about the outliers for each company, including the number of outliers found and their details. This analysis helps identify significant deviations in 'Close' prices for the top 5 companies, offering insights into potential irregularities in the stock prices.
for name in top_5_companies:
    # Filter data for the current company
    company_data = stock_data[stock_data['Name'] == name]
    # Box plot to visualize outliers in 'Close' prices for the current company
    plt.figure(figsize = (8, 6))
    sns.boxplot(x = 'Name', y = 'Close', data = company_data)
    plt.title(f'Box Plot of Close Prices for {name}')
    plt.show()
    # Calculate the Interquartile Range (IQR) for the current company
    Q1 = company_data['Close'].quantile(0.25)
    Q3 = company_data['Close'].quantile(0.75)
    IQR = Q3 - Q1
    # Define a threshold for identifying outliers for the current company
    threshold = 1.5 * IQR
    # Identify and analyze outliers for the current company
    outliers = company_data[(company_data['Close'] < Q1 - threshold) | (company_data['Close'] > Q3 + threshold)]
    # Print information about outliers for the current company
    print(f"\nCompany: {name}")
    print("Number of outliers:", len(outliers))
    print("Outliers:")
    print(outliers[['Date', 'Close']])
Here is 1 selected plot from the 5 output plots for reference.
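The IQR rule used above can be wrapped in a small helper so it is easy to try on any series. This is a minimal sketch: the function name iqr_outliers and the toy prices are assumptions made for illustration.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the entries lying more than k * IQR outside the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return values[mask]


# Toy closing prices with one obvious spike (invented values)
toy_close = pd.Series([10.0, 10.5, 11.0, 10.8, 10.2, 25.0])
outliers = iqr_outliers(toy_close)

print(outliers)
```

With these toy values only the 25.0 entry falls outside the fences. Raising k (for example to 3.0) makes the rule stricter and flags only more extreme points.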
Part 2: Time Series Analysis / Rolling Window / Moving Averages
Create a line chart to visualize the 'Close' prices over time.
for name in top_5_companies:
    # Filter data for each specific company
    company_data = stock_data[stock_data['Name'] == name]
    # Create a separate line chart for each company's 'Close' prices over time
    plt.figure(figsize=(10, 4))
    plt.plot(company_data['Date'], company_data['Close'])
    # Set labels and title for the plot
    plt.xlabel('Date')
    plt.ylabel('Closing Price')
    plt.title(f'Close Prices Over Time for {name}')
    plt.xticks(rotation = 45)
    # Show the plot for each company
    plt.show()
During the specified period, IFIC and UNIONBANK witnessed a notable decline in their 'Close' prices, while FUWANGFOOD exhibited a significant increase. Understanding the drivers behind these trends, whether influenced by company-specific factors or broader market conditions, is crucial for informed decision-making and understanding market dynamics.
Calculate and plot the daily percentage change in closing prices.
# Calculate daily percentage change for each company and plot individually
for name in top_5_companies:
    plt.figure(figsize=(15, 4))
    # Copy the filtered rows so the new column is added to an independent DataFrame
    # (assigning to a filtered slice triggers pandas' SettingWithCopyWarning)
    company_data = stock_data[stock_data['Name'] == name].copy()
    company_data['Daily_PCT_Change'] = company_data['Close'].pct_change()
    # Plot the daily percentage change for each company
    plt.plot(company_data['Date'], company_data['Daily_PCT_Change'], label = name)
    # Set labels and title for the plot
    plt.xlabel('Date')
    plt.ylabel('Daily Percentage Change')
    plt.title(f'Daily Percentage Change in Closing Prices of {name}')
    plt.legend()
    plt.xticks(rotation = 45)
    plt.show()
Here are 2 selected plots provided from the 5 output plots for reference.
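To make clear what pct_change() computes, the sketch below compares it with the manual formula (today's close minus yesterday's close, divided by yesterday's close) on three made-up prices.

```python
import pandas as pd

# Toy closing prices (invented values)
toy_close = pd.Series([100.0, 110.0, 99.0])

# Built-in: fractional change relative to the previous row
pct = toy_close.pct_change()

# Same calculation written out by hand
previous = toy_close.shift(1)
manual = (toy_close - previous) / previous

print(pct.tolist())
```

The first value is NaN because there is no previous day to compare against. Note that pct_change() returns fractions (0.10 for a 10% rise), so multiply by 100 if you want the axis to show percent.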
Investigate the presence of any trends or seasonality in the stock prices.
The code snippet generates line charts of the closing prices over time for the top 5 companies. Each chart also includes a 30-day rolling average as a trend line, which smooths out day-to-day noise so that the underlying direction is easier to see. The rolling window size and other aspects of the visualization can be adjusted as needed.
for name in top_5_companies:
    company_data = stock_data[stock_data['Name'] == name]
    plt.plot(company_data['Date'],company_data['Close'], label = name)
    # Plotting a rolling average (e.g., 30 days) for trend visualizations
    rolling_avg = company_data['Close'].rolling(window = 30).mean()
    plt.plot(company_data['Date'], rolling_avg, label = f'{name} - Trend Line', linestyle='--')
    # Set labels and title for the plot
    plt.title('Stock Prices Trend Line Over Time')
    plt.xlabel('Date')
    plt.ylabel('Closing Price')
    plt.legend()
    plt.show()
Here are 2 selected plots provided from the 5 output plots for reference.
Apply moving averages to smooth the time series data in 15/30 day intervals against the original graph.
The code plots the original closing prices alongside their 15-day and 30-day moving averages. The shorter average reacts faster to recent price moves, while the longer one highlights the broader trend. Comparing the two against the raw series helps traders and analysts spot potential trend shifts, judge momentum, and see how far prices swing around the trend.
for name in top_5_companies:
    plt.figure(figsize=(12, 6))
    # Copy the filtered rows so the moving-average columns are added to an independent DataFrame
    company_data = stock_data[stock_data['Name'] == name].copy()
    # Plotting original closing prices
    plt.plot(company_data['Date'], company_data['Close'], label = name, color = 'blue')
    # Calculate and plot moving averages (15-day and 30-day)
    company_data['15_Day_MA'] = company_data['Close'].rolling(window = 15).mean()
    company_data['30_Day_MA'] = company_data['Close'].rolling(window = 30).mean()
    plt.plot(company_data['Date'], company_data['15_Day_MA'], label = f'{name} - 15-day MA', linestyle='--', color = 'red')
    plt.plot(company_data['Date'], company_data['30_Day_MA'], label = f'{name} - 30-day MA', linestyle='-.', color = 'green')
    # Set labels, title, and legend
    plt.title('Stock Prices with Moving Averages Over Time')
    plt.xlabel('Date')
    plt.ylabel('Closing Price')
    plt.legend()
    plt.xticks(rotation=45)
    # Show the plot
    plt.show()
Here are 3 selected plots provided from the 5 output plots for reference.
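The behaviour of rolling(...).mean() is easiest to see on a tiny series. The sketch below uses a 3-row window on invented prices; the min_periods variant is shown as an optional alternative, not something the analysis above relies on.

```python
import pandas as pd

# Toy closing prices (invented values)
toy_close = pd.Series([10.0, 12.0, 14.0, 16.0, 18.0])

# Default: the first window - 1 rows are NaN because the window is not full yet
ma_3 = toy_close.rolling(window=3).mean()

# Optional: allow partial windows so the line starts at the first row
ma_3_partial = toy_close.rolling(window=3, min_periods=1).mean()

print(ma_3.tolist())
print(ma_3_partial.tolist())
```

This is why the moving-average lines in the plots start later than the price line. Also keep in mind that an integer window counts rows, so window = 30 means 30 trading days in the data, not 30 calendar days.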
Calculate the average closing price for each stock.
The code uses the groupby() function to group the DataFrame by the Name column. It then calculates the mean of the Close prices for each group, which gives the average closing price for each stock. The output presents the average closing price for all stocks available in the dataset.
# Calculate average closing price for each stock
average_closing_price = stock_data.groupby('Name')['Close'].mean()
# Display the average closing prices
average_closing_price
Identify the top 5 and bottom 5 stocks based on average closing price.
# Sort stocks from highest to lowest average closing price
sorted_stocks = average_closing_price.sort_values(ascending = False)
# With a descending sort, head() holds the highest averages and tail() the lowest
top_5_stocks = sorted_stocks.head(5)
bottom_5_stocks = sorted_stocks.tail(5)
# Display top and bottom stocks
print("Top 5 Stocks based on Average Closing Price:")
print(top_5_stocks)
print("\nBottom 5 Stocks based on Average Closing Price:")
print(bottom_5_stocks)
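An alternative that does not depend on remembering the sort direction is to ask pandas for the extremes directly with nlargest() and nsmallest(). The sketch below uses made-up averages for four hypothetical tickers.

```python
import pandas as pd

# Toy average closing prices for hypothetical tickers (invented values)
toy_avg = pd.Series({"AAA": 12.5, "BBB": 410.0, "CCC": 7.1, "DDD": 95.3})

# Highest and lowest averages, each returned in ranked order
highest_2 = toy_avg.nlargest(2)
lowest_2 = toy_avg.nsmallest(2)

print(highest_2)
print(lowest_2)
```

On the real data, average_closing_price.nlargest(5) and average_closing_price.nsmallest(5) would give the two groups in one call each.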
Part 3: Volatility Analysis
Calculate and plot the rolling standard deviation of the 'Close' prices.
# Calculate and plot rolling standard deviation for each of the top 5 companies
plt.figure(figsize=(12, 6))
for name in top_5_companies:
    # Copy the filtered rows so the new column is added to an independent DataFrame
    company_data = stock_data[stock_data['Name'] == name].copy()
    company_data['Rolling_Std'] = company_data['Close'].rolling(window = 30).std()
    plt.plot(company_data['Date'], company_data['Rolling_Std'], label = name)
# All five companies share one figure, so the labels are set once after the loop
plt.title('Rolling Standard Deviation (30-day) of Close Prices')
plt.xlabel('Date')
plt.ylabel('Rolling Standard Deviation')
plt.legend()
plt.grid()
plt.xticks(rotation = 45)
plt.show()
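For intuition about what the rolling standard deviation measures, here is a small sketch with a 3-row window on invented prices. It also checks the rolling value against the ordinary standard deviation of the same three rows.

```python
import pandas as pd

# Toy closing prices (invented values)
toy_close = pd.Series([10.0, 12.0, 14.0, 16.0])

# Rolling sample standard deviation over 3 rows
rolling_std = toy_close.rolling(window=3).std()

# The first full window covers rows 0..2, so this should match rolling_std at row 2
first_window_std = toy_close.iloc[:3].std()

print(rolling_std.tolist())
```

pandas uses the sample standard deviation (ddof=1) by default in both calls. Because this measure is in price units, a high-priced stock will tend to show a larger value than a low-priced one, which is worth remembering when comparing the five lines on one chart.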
Create a new column for daily price change (Close - Open)
stock_data['Daily_Price_Change'] = stock_data['Close'] - stock_data['Open']
# Display the updated DataFrame with the new column
stock_data.head()
Analyze the distribution of daily price changes.
The code iterates through the top 5 companies, retrieves the stock data for each one, and draws a histogram of its daily price changes using Matplotlib's hist() function. Each histogram shows how often price changes of a given size occurred, which helps in judging how large and how frequent a company's typical daily moves are.
# Analyze distribution of daily price changes for top 5 companies
for name in top_5_companies:
    company_data = stock_data[stock_data['Name'] == name]
    plt.figure(figsize=(8, 6))
    plt.hist(company_data['Daily_Price_Change'], bins = 30, edgecolor='black')
    plt.title(f'Distribution of Daily Price Changes for {name}')
    plt.xlabel('Daily Price Change')
    plt.ylabel('Frequency')
    plt.grid(axis = 'y', alpha = 0.5)
    plt.show()
Here are 2 selected plots provided from the 5 output plots for reference.
Identify days with the largest price increases and decreases.
# idxmax()/idxmin() each return a single row label, so each result is one row
largest_increase_day = stock_data.loc[stock_data['Daily_Price_Change'].idxmax()]
largest_decrease_day = stock_data.loc[stock_data['Daily_Price_Change'].idxmin()]
print("Day with the Largest Price Increase:")
print(largest_increase_day)
print("\nDay with the Largest Price Decrease:")
print(largest_decrease_day)
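If you want several days rather than a single row, DataFrame.nlargest() and DataFrame.nsmallest() return the top and bottom n rows by a column. The sketch below runs on a toy DataFrame with invented tickers and prices.

```python
import pandas as pd

# Toy data for two hypothetical companies (invented values)
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-05", "2022-01-06"]),
    "Name": ["AAA", "BBB", "AAA", "BBB"],
    "Open": [10.0, 50.0, 11.0, 48.0],
    "Close": [11.5, 47.0, 10.5, 49.0],
})
toy["Daily_Price_Change"] = toy["Close"] - toy["Open"]

# Two largest increases and two largest decreases, with full row details
top_increases = toy.nlargest(2, "Daily_Price_Change")
top_decreases = toy.nsmallest(2, "Daily_Price_Change")

print(top_increases[["Date", "Name", "Daily_Price_Change"]])
print(top_decreases[["Date", "Name", "Daily_Price_Change"]])
```

Because the change is measured in absolute price units, high-priced stocks will tend to dominate both lists; dividing by Open would give a relative measure if that is what you need.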
Identify stocks with unusually high trading volume on certain days.
This analysis helps identify increases in trading activity, which could indicate important market events or heightened investor interest. It reveals patterns, events, or irregularities that might affect specific stocks or the overall market, providing valuable insights for investors and traders.
for name in top_5_companies:
    company_data = stock_data[stock_data['Name'] == name]
    plt.plot(company_data['Date'],company_data['Volume'],label = name)
    threshold = company_data['Volume'].quantile(0.95)
    unusual_high_volume_data = company_data[company_data['Volume'] > threshold]
    plt.scatter(unusual_high_volume_data['Date'], unusual_high_volume_data['Volume'], color="red", marker='o', label="{} - Unusual High Volume Days".format(name))
    plt.title('Trading Volume Over Time with Emphasis on Unusually High Volume Days')
    plt.xlabel('Date')
    plt.ylabel('Trading Volume')
    plt.legend()
    plt.show()
Here are 2 selected plots provided from the 5 output plots for reference.
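The same 95th-percentile rule can be applied to every company at once, without a loop, by computing a per-company threshold with groupby() and transform(). This is a minimal sketch on toy data; the column name Unusual_Volume and all values are assumptions for illustration.

```python
import pandas as pd

# Toy volumes for two hypothetical companies, each with one spike (invented values)
toy = pd.DataFrame({
    "Name": ["AAA"] * 5 + ["BBB"] * 5,
    "Volume": [100, 110, 90, 105, 900, 10, 12, 11, 9, 80],
})

# Per-company 95th percentile, broadcast back to every row of that company
threshold = toy.groupby("Name")["Volume"].transform(lambda v: v.quantile(0.95))

# Flag rows whose volume exceeds their own company's threshold
toy["Unusual_Volume"] = toy["Volume"] > threshold

print(toy[toy["Unusual_Volume"]])
```

Using a per-company threshold matters because trading volumes differ widely between stocks; a single global cutoff would mostly flag the most heavily traded names.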
Part 4: Correlation and Heatmaps
Explore the relationship between trading volume and volatility.
The analysis examines the relationship between trading volume and volatility using a scatter plot, regression line, and correlation coefficient. The scatter plot displays individual data points representing trading volume and volatility pairs. The regression line indicates the direction and strength of the relationship, while the correlation coefficient quantifies it. A positive slope and correlation coefficient imply a tendency for trading volume and volatility to increase together, while a negative slope and coefficient suggest an inverse relationship. A correlation coefficient near 0 indicates a weak or no linear relationship. It's important to note that correlation doesn't imply causation, and the choice of the rolling window size for standard deviation can affect results.
for name in top_5_companies:
    # Copy the filtered rows so the new column is added to an independent DataFrame
    company_data = stock_data[stock_data['Name'] == name].copy()
    # Plotting the relationship between trading volume and volatility with regression line
    plt.figure(figsize = (8, 5))
    company_data['Rolling_Std'] = company_data['Close'].rolling(window = 30).std()
    # Remove rows with missing values
    company_data_cleaned = company_data.dropna(subset = ['Volume', 'Rolling_Std'])
    # Scatter plot with regression line
    sns.regplot(x = company_data_cleaned['Volume'], y = company_data_cleaned['Rolling_Std'])
    plt.title(f'Relationship between Trading Volume and Volatility for {name}')
    plt.xlabel('Trading Volume')
    plt.ylabel('Volatility')
    # Calculate and print the correlation coefficient
    correlation_coefficient = np.corrcoef(company_data_cleaned['Volume'], company_data_cleaned['Rolling_Std'])[0, 1]
    print(f'Correlation Coefficient of {name}: {correlation_coefficient:.2f}')
    plt.show()
Here are 3 selected plots provided from the 5 output plots for reference.
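To show what the correlation coefficient printed above represents, the sketch below computes the Pearson coefficient three ways on invented numbers: with np.corrcoef(), with pandas' Series.corr(), and by hand from the definition.

```python
import numpy as np
import pandas as pd

# Toy volume and volatility pairs (invented values)
volume = pd.Series([100.0, 200.0, 300.0, 400.0])
volatility = pd.Series([1.0, 1.5, 2.5, 3.0])

# Library versions
r_numpy = np.corrcoef(volume, volatility)[0, 1]
r_pandas = volume.corr(volatility)

# By hand: sum of products of deviations, scaled by the deviations' magnitudes
dx = volume - volume.mean()
dy = volatility - volatility.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(round(r_numpy, 4), round(r_pandas, 4), round(r_manual, 4))
```

All three agree, and the value is close to 1 because the toy volatility rises almost in step with volume. The coefficient only captures a linear relationship, so a value near 0 does not rule out other kinds of dependence.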
Calculate the correlation matrix between the 'Open', 'High', 'Low', and 'Close' prices.
# Iterate over each top company
for name in top_5_companies:
    # Filter data for the current company
    company_data = stock_data[stock_data['Name'] == name]
    # Calculate correlation matrix
    correlation_matrix = company_data[['Open', 'High', 'Low', 'Close']].corr()
    print(f'Correlation Matrix of {name}:\n{correlation_matrix}\n')
Create a heatmap to visualize the correlations using the seaborn package.
# Iterate over each top company
for name in top_5_companies:
    # Filter data for the current company
    company_data = stock_data[stock_data['Name'] == name]
    # Calculate correlation matrix
    correlation_matrix = company_data[['Open', 'High', 'Low', 'Close']].corr()
    # Create heatmap
    plt.figure(figsize=(6, 4))
    sns.heatmap(correlation_matrix, annot=True, cmap = 'coolwarm', fmt='.3f', linewidths = .5)
    plt.title(f'Correlation Matrix Heatmap for {name}')
    plt.show()
Here are 3 selected plots provided from the 5 output plots for reference.
Conclusion
In summary, this analysis has illuminated the relationship between trading volume and volatility within the Dhaka stock market of 2022. Through the examination of scatter plots, regression lines, and correlation coefficients, we have discerned patterns suggesting a correlation between trading volume and volatility specific to this market. The positive or negative slopes of regression lines, alongside correlation coefficients, offer insights into the direction and strength of this relationship. However, it's crucial to acknowledge that correlation does not imply causation, and other factors may influence market dynamics.
Additionally, I extend sincere appreciation to Bohubrihi for their invaluable contributions to this analysis, which have enriched the project significantly. Looking ahead, continued research and analysis in this domain can deepen our understanding of financial markets and aid in making informed decisions amidst market complexities.