This project aims to generate an anime dataset by utilizing the 🚀 Jikan API (4.0.0). It retrieves information about anime such as title, score, genres, synopsis, producers, studios, and more. Additionally, it includes functionality to generate a list of usernames from MyAnimeList and fetch user details and anime scores for those usernames. By using these scripts together, you can generate a comprehensive anime dataset that includes information about both individual anime and user-specific scores and details. This dataset can be used for various purposes, such as analysis, recommendation systems, or building anime-related applications.
- 💻 Python 3.10 or higher
- 📦 requests library
- 📦 csv, re, and json modules (part of the Python standard library, no installation needed)
- 📦 BeautifulSoup library
- Clone the repository:
  `git clone https://github.com/your-username/anime-dataset-generator.git`
- Install the required dependencies:
  `pip install requests beautifulsoup4`
- Run the Python scripts:
There are four Python scripts in total. Their functions are as follows:
The Python script `anime_dataset.py` generates an anime dataset using the Jikan API. It fetches anime information such as title, score, genres, synopsis, producers, studios, and more. The data is then stored in a CSV file, `anime_dataset.csv`, for further analysis and use.
Explore my dataset on Kaggle: MyAnimeList Dataset
- Specify the range of anime IDs you want to retrieve by setting the `start_id` and `end_id` variables.
- The script makes API calls to retrieve data for each anime ID.
- It processes the API responses, extracting relevant information such as anime titles, scores, genres, and more.
- The fetched data is cleaned and stored in a list of dictionaries, where each dictionary represents an anime entry.
- After processing a certain number of anime entries (controlled by the `count` variable), the data is written to the output CSV file.
- Finally, the script prints the total number of anime entries fetched.
The generated anime dataset (anime_dataset.csv) will contain the following columns:
- `animeID`: Unique identifier for each anime
- `Name`: Name of the anime
- `English name`: English name of the anime
- `Japanese name`: Japanese name of the anime
- `Score`: Score of the anime
- `Genres`: Genres of the anime
- `Synopsis`: Synopsis of the anime
- `Type`: Type of the anime
- `Episodes`: Number of episodes
- `Aired`: Airing dates of the anime
- `Premiered`: Premiered season and year
- `Status`: Current status of the anime
- `Producers`: Producers of the anime
- `Licensors`: Licensors of the anime
- `Studios`: Studios of the anime
- `Source`: Source material of the anime
- `Duration`: Duration of each episode
- `Rating`: Age rating of the anime
- `Rank`: Rank of the anime
- `Popularity`: Popularity ranking of the anime
- `Favorites`: Number of users who favorited the anime
- `Scored By`: Number of users who scored the anime
- `Members`: Number of members in the anime's community
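As an illustration of how one API response could be flattened into these columns, here is a hedged sketch. The Jikan-side field names (`mal_id`, `title_english`, `aired.string`, `scored_by`, and so on) reflect my understanding of the Jikan v4 response shape and should be checked against the API documentation. The helpers `join_names` and `to_row` are hypothetical and not taken from the project.

```python
def join_names(items):
    """Join the 'name' of each entry in a Jikan list field into one string."""
    return ", ".join(item["name"] for item in (items or []) if item.get("name"))


def to_row(data):
    """Map the 'data' object of an anime response to one CSV row (a dict)."""
    aired = data.get("aired") or {}
    season, year = data.get("season"), data.get("year")
    # Combine season and year only when both are present.
    premiered = f"{season} {year}" if season and year else None
    return {
        "animeID": data.get("mal_id"),
        "Name": data.get("title"),
        "English name": data.get("title_english"),
        "Japanese name": data.get("title_japanese"),
        "Score": data.get("score"),
        "Genres": join_names(data.get("genres")),
        "Synopsis": data.get("synopsis"),
        "Type": data.get("type"),
        "Episodes": data.get("episodes"),
        "Aired": aired.get("string"),
        "Premiered": premiered,
        "Status": data.get("status"),
        "Producers": join_names(data.get("producers")),
        "Licensors": join_names(data.get("licensors")),
        "Studios": join_names(data.get("studios")),
        "Source": data.get("source"),
        "Duration": data.get("duration"),
        "Rating": data.get("rating"),
        "Rank": data.get("rank"),
        "Popularity": data.get("popularity"),
        "Favorites": data.get("favorites"),
        "Scored By": data.get("scored_by"),
        "Members": data.get("members"),
    }
```

Using `.get()` throughout means a missing field becomes an empty cell instead of raising an error, which suits a dataset where many entries are incomplete.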
Run the script to generate the anime list:
python anime_dataset.py
The Python script `username_dataset.py` generates a list of users using the Jikan API. It retrieves user information such as username and user URL for a specified range of user IDs. The user details are then stored in a CSV file, `userlist.csv`. These usernames are used by the subsequent scripts to fetch user-specific anime scores and details.
- Specify the range of user IDs you want to retrieve by setting the `start_id` and `end_id` variables.
- The script makes API calls to retrieve user data for each user ID.
- It checks the response status code and retries the API call up to 5 times if it fails.
- If the response is successful (status code 200), the script parses the JSON response and extracts the user details such as username and user URL.
- The user details are stored as dictionaries in a list.
- After processing all the user IDs in the specified range, the user details are written to the output CSV file, which is stored in the userlist folder.
- Finally, the script prints the total number of users fetched.
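The retry behaviour described above might look something like the sketch below. It assumes the Jikan v4 endpoint `/users/userbyid/{id}` and a response of the form `{"data": {"username": ..., "url": ...}}`. The names `http_get` and `fetch_user`, the one-second wait between attempts, and the reading of "up to 5 times" as five attempts in total are my own assumptions.

```python
import json
import time
import urllib.error
import urllib.request

# Assumed Jikan v4 endpoint for looking up a user by numeric ID.
USER_BY_ID_URL = "https://api.jikan.moe/v4/users/userbyid/{user_id}"


def http_get(url):
    """Return (status_code, body_text). HTTP error statuses are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, ""
    except OSError:
        return None, ""  # network failure, no status available


def fetch_user(user_id, get=http_get, max_attempts=5, wait_seconds=1.0, sleep=time.sleep):
    """Try up to max_attempts times; return a user dict, or None if every attempt fails."""
    for attempt in range(max_attempts):
        status, body = get(USER_BY_ID_URL.format(user_id=user_id))
        if status == 200:
            data = json.loads(body).get("data") or {}
            return {
                "user_id": user_id,
                "username": data.get("username"),
                "user_url": data.get("url"),
            }
        if attempt < max_attempts - 1:
            sleep(wait_seconds)  # brief pause before the next attempt
    return None
```

Returning `None` after the final attempt lets the calling loop skip IDs that do not correspond to a user without stopping the whole run.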
Run the script to generate the User list:
python username_dataset.py
The Python script `user_scores_dataset.py` combines the user list generated by the `username_dataset.py` script with anime scoring data fetched from MyAnimeList. It retrieves the anime scores for each user in the user list and stores the data in a CSV file, `user_score.csv`, for further analysis and use.
- The script reads the user list from the `userlist.csv` file in the userlist folder, which should be generated using the `username_dataset.py` script.
- It iterates through the user list and calls the `scrape_user_profile()` function to fetch the anime scores for each user.
- The `scrape_user_profile()` function performs web scraping on the respective user's anime list page to extract the anime scores.
- It uses the BeautifulSoup library to parse the HTML response and extract relevant data.
- The function handles the two different HTML structures found on different users' anime list pages to retrieve the anime scores.
- For each user, the fetched anime scores are stored in a list of lists, where each inner list contains the user ID, username, anime ID, anime title, and score.
- The script writes the fetched anime scores to the `user_score.csv` file, including column names.
- To prevent overwhelming the MyAnimeList servers, the script implements a delay between batches of user requests, randomly chosen within a specified range of seconds.
- After processing all the users in the user list, the script prints a success message if anime scores were fetched successfully, or a failure message if no scores were found.
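The batching and random delay described above could be structured along these lines. This is only a sketch: the `scrape` argument is a stand-in for `scrape_user_profile()` and is assumed to return `(anime_id, title, score)` tuples, and the batch size and delay range are illustrative values, not the ones the script uses.

```python
import random
import time


def batched(items, batch_size):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def scrape_all(users, scrape, batch_size=10, delay_range=(30, 60), sleep=time.sleep):
    """Collect [user_id, username, anime_id, title, score] rows for every user.

    users is a list of (user_id, username) pairs. A random pause drawn from
    delay_range (in seconds) is taken between batches, but not after the last one.
    """
    rows = []
    batches = list(batched(users, batch_size))
    for index, batch in enumerate(batches):
        for user_id, username in batch:
            # Users with hidden or unrated lists simply contribute no rows.
            for anime_id, title, score in scrape(username):
                rows.append([user_id, username, anime_id, title, score])
        if index < len(batches) - 1:
            sleep(random.uniform(*delay_range))
    return rows
```

An empty result from `scrape_all` corresponds to the failure message mentioned above, and a non-empty one to the success message.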
NOTE: Some users have chosen to hide their lists, so their scores cannot be fetched. Users who have anime in their list but have not rated any of them are also excluded. I also tried fetching the score lists with `jikan`. Jikan v4 can fetch each user's anime scores; however, it only returned scores for users who had rated a small number of anime (usually 4-5) and skipped users who had rated more than that. This may be resolved in the future, in which case Jikan could be used for the same purpose.
Run the script to generate the user-score list:
python user_score_dataset.py
The Python script `user_details_dataset.py` fetches detailed information for a list of usernames from MyAnimeList using the Jikan API. It retrieves user-specific data such as gender, birthday, location, join date, anime statistics, and more. The fetched user details are then stored in a CSV file, `user_details.csv`, for further analysis and use.
- The script reads a list of usernames from the `userlist.csv` file in the userlist folder, which should be generated using the `username_dataset.py` script.
- It prepares the headers for the user_details.csv file, specifying the columns for each user detail.
- The script initializes variables to keep track of the progress, fetch count, and total usernames.
- It sets the batch size and delay between batches to manage the API requests and prevent overwhelming the servers.
- The script iterates over the list of usernames in batches.
- For each batch, it sends API requests to fetch the detailed user information using the Jikan API.
- If the API response is successful (status code 200), the script extracts the relevant user details from the JSON response.
- The fetched user details are stored in a list of lists, where each inner list contains the user details in the specified order.
- The script tracks the fetch count and total usernames processed, displaying the progress periodically.
- It also incorporates a batch delay to introduce a pause between batches to comply with API usage guidelines.
- The script calculates the elapsed time and usernames fetched per second to provide performance metrics.
- Finally, it saves the user details to the user_details.csv file, including the headers and the fetched user data.
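A rough sketch of the batch loop with progress reporting and a throughput figure is shown below. The column names in `HEADERS`, the Jikan-side field names (`gender`, `joined`, `statistics.anime.mean_score`, and so on), and the helpers `extract_details` and `fetch_details` are assumptions for illustration. The `fetch` argument stands in for whatever function performs the Jikan request and returns the response's `data` object, or `None` on failure.

```python
import time

# Hypothetical column order for user_details.csv.
HEADERS = ["Mal ID", "Username", "Gender", "Birthday", "Location", "Joined",
           "Days Watched", "Mean Score", "Completed", "Total Entries"]


def extract_details(data):
    """Return one row, in HEADERS order, from a user 'data' object."""
    stats = (data.get("statistics") or {}).get("anime") or {}
    return [
        data.get("mal_id"), data.get("username"), data.get("gender"),
        data.get("birthday"), data.get("location"), data.get("joined"),
        stats.get("days_watched"), stats.get("mean_score"),
        stats.get("completed"), stats.get("total_entries"),
    ]


def fetch_details(usernames, fetch, batch_size=3, batch_delay=1.0, sleep=time.sleep):
    """Fetch details in batches; return (rows, usernames fetched per second)."""
    started = time.perf_counter()
    rows = []
    for start in range(0, len(usernames), batch_size):
        for name in usernames[start:start + batch_size]:
            data = fetch(name)
            if data is not None:
                rows.append(extract_details(data))
        done = min(start + batch_size, len(usernames))
        print(f"Processed {done}/{len(usernames)} usernames")
        if done < len(usernames):
            sleep(batch_delay)  # pause between batches, not after the last one
    elapsed = time.perf_counter() - started
    rate = len(rows) / elapsed if elapsed > 0 else 0.0
    return rows, rate
```

The returned rows can then be written with `csv.writer`, starting with `HEADERS` as the first row.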
Run the script to generate the user details:
python user_details_dataset.py
🙌 Contributions are welcome! If you have any ideas, suggestions, or improvements, feel free to open an issue or submit a pull request. Your contributions can help enhance the functionality and usability of this project. Together, we can make it even better! 👍🎉
📝 This project is licensed under the MIT License. You are free to modify and use the code in accordance with the terms and conditions of the license. Feel free to adapt the project to suit your needs and contribute to open-source development. 📜🔒