Selecting some features and inputting them as a character vector to the "select" parameter of ggpicrust2 doesn't work. #134

Closed
Geonhui-Kang opened this issue Jan 25, 2025 · 1 comment

Geonhui-Kang commented Jan 25, 2025

Hello. I got warnings from my ggpicrust2 analysis.

Below is my code for the ggpicrust2 analysis.
results_file_input_4 <- ggpicrust2(data = abundance_data_filtered[, c(1,40:47,48:57)], metadata = metadata[c(39:46,47:56), ], group = "Group", pathway = "KO", daa_method = "edgeR", ko_to_kegg = TRUE, order = "pathway_class", p_values_bar = TRUE, x_lab = "pathway_name", select = result_4_select)

The number of features with statistical significance exceeds 30, leading to suboptimal visualization. Please use 'select' to reduce the number of features.
Currently, you have these features: "ko05412", "ko03450", "ko04142", "ko00604", "ko04260", "ko05142", "ko04973", "ko04974", "ko04976", "ko00565", "ko00624", "ko00941", "ko01053", "ko00100", "ko05219", "ko00531", "ko00364", "ko05130", "ko03050", "ko00361", "ko05143", "ko04020", "ko05414", "ko05012", "ko05150", "ko05131", "ko00196", "ko02060", "ko04622", "ko00511", "ko04972", "ko00540", "ko00140", "ko05100", "ko05410", "ko00906", "ko04210", "ko00944", "ko04144", "ko00930".
You can find the statistically significant features with the following command:
daa_results_df %>% filter(p_adjust < 0.05) %>% select(c("feature","p_adjust"))

So I selected a subset of no more than 30 features and passed it to ggpicrust2 as the "select" parameter.

result_4_select = c("ko05412", "ko03450", "ko04142", "ko00604", "ko04260", "ko05142", "ko04973", "ko04974", "ko04976", "ko00565", "ko00624", "ko00941", "ko01053", "ko00100", "ko05219", "ko00531", "ko00364", "ko05130", "ko03050", "ko00361", "ko05143", "ko04020", "ko05414", "ko05012", "ko05150", "ko05131", "ko00196", "ko02060", "ko04622")

But I got an error saying "Some selected samples are not present in the abundance data."
I can't understand this error, because the pathways I selected are the ones listed in the analysis output.

Can I get some information about this problem?

Thank you.

@cafferychen777
Owner

Hi! I've identified the issue with your code. The error message "Some selected samples are not present in the abundance data" occurs due to incorrect parameter usage. Let me explain in detail:

  1. In your code, you're using:
results_file_input_4 <- ggpicrust2(
    data = abundance_data_filtered[, c(1,40:47,48:57)], 
    metadata = metadata[c(39:46,47:56), ], 
    group = "Group", 
    pathway = "KO", 
    daa_method = "edgeR", 
    ko_to_kegg = TRUE, 
    order = "pathway_class", 
    p_values_bar = TRUE, 
    x_lab = "pathway_name", 
    select = result_4_select
)
  2. The issue is with how the select parameter is being used. In the ggpicrust2 function, the select parameter should be used during visualization, not during the initial analysis. Here's how to modify your code:
# Step 1: Perform differential analysis without using select parameter
results_file_input_4 <- ggpicrust2(
    data = abundance_data_filtered[, c(1,40:47,48:57)], 
    metadata = metadata[c(39:46,47:56), ], 
    group = "Group", 
    pathway = "KO", 
    daa_method = "edgeR", 
    ko_to_kegg = TRUE, 
    order = "pathway_class", 
    p_values_bar = TRUE, 
    x_lab = "pathway_name"
)

# Step 2: Get the differential analysis results dataframe
daa_results_df <- results_file_input_4[[1]]$results

# Step 3: Use pathway_errorbar for visualization with select parameter
p <- pathway_errorbar(
    abundance = abundance_data_filtered[, c(1,40:47,48:57)],
    daa_results_df = daa_results_df,
    Group = metadata[c(39:46,47:56), ]$Group,
    ko_to_kegg = TRUE,
    p_values_threshold = 0.05,
    order = "pathway_class",
    select = result_4_select,  # Use select here
    p_value_bar = TRUE,
    x_lab = "pathway_name"
)

This modification should resolve the issue. The select parameter is meant for filtering pathways during the visualization stage, not during the differential analysis stage.
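
If it helps, a select vector of at most 30 IDs can also be derived from the results table instead of being typed by hand. The following is only a minimal sketch in base R: the data frame is a toy stand-in for daa_results_df with made-up values, the feature and p_adjust column names are taken from the warning message quoted above, and select_top and max_features are hypothetical names used for illustration.

```r
# Toy stand-in for daa_results_df (illustrative values, not real results)
daa_results_df <- data.frame(
  feature  = sprintf("ko%05d", 1:40),
  p_adjust = seq(0.001, 0.040, length.out = 40),
  stringsAsFactors = FALSE
)

# Keep only the significant features and sort them by adjusted p-value
sig <- daa_results_df[daa_results_df$p_adjust < 0.05, ]
sig <- sig[order(sig$p_adjust), ]

# Take at most 30 feature IDs as a plain character vector
max_features <- 30                      # hypothetical cap matching the warning
select_top <- head(sig$feature, max_features)

length(select_top)
```

A vector built this way can only contain IDs that are present in the feature column, which rules out typos in the pathway IDs.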

If you still encounter issues, please check:

  1. Ensure that the pathway IDs in result_4_select exactly match those in the feature column of daa_results_df
  2. Verify that the sample order is consistent between your data and metadata
  3. Confirm that the grouping information in the Group column is correct
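
These three checks could be sketched roughly as follows. The objects here are small toy stand-ins rather than your real data, and the sample_name column in metadata is an assumption about how your sample IDs are stored; adjust the names to match your own objects.

```r
# Toy stand-ins (assumed shapes, for illustration only)
daa_results_df <- data.frame(
  feature = c("ko05412", "ko03450", "ko04142"),
  stringsAsFactors = FALSE
)
result_4_select <- c("ko05412", "ko03450", "ko99999")
abundance <- data.frame(
  pathway = c("K00001", "K00002"),
  S1 = c(1, 2), S2 = c(3, 4), S3 = c(5, 6), S4 = c(7, 8)
)
metadata <- data.frame(
  sample_name = c("S1", "S2", "S3", "S4"),  # assumed column name
  Group       = c("A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# 1. IDs in the select vector that do not appear in the results
missing_ids <- setdiff(result_4_select, daa_results_df$feature)

# 2. Sample order: abundance columns (minus the ID column) vs. metadata rows
same_order <- identical(colnames(abundance)[-1], metadata$sample_name)

# 3. Number of samples per group
group_counts <- table(metadata$Group)

print(missing_ids)
print(same_order)
print(group_counts)
```

An empty missing_ids, a TRUE for same_order, and the expected number of samples in each group would suggest the inputs are consistent.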

I hope this helps! Feel free to ask if you have any questions.
