This project explores how plants regulate microbiome assembly by identifying key transcription factors (TFs) and their gene targets in Arabidopsis thaliana and Lotus japonicus. Below is the full documentation of the bioinformatics pipeline, supporting scripts, and figures.
Gene expression clusters responsive to synthetic microbial communities (SCs) were extracted from Wippel et al. (2021). Clusters were then filtered for TFs using PlantTFDB annotations.
Script used: background_cluster3.py
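The filtering step can be sketched as a set intersection between each cluster's gene IDs and the gene IDs in a TF annotation list. This is a minimal illustration, not the code in background_cluster3.py: the tab-separated layout (`TF_ID`, `Gene_ID`, `Family`), the function names, and the gene IDs in the usage note are assumptions made for the example.

```python
import csv
import io


def load_tf_ids(tf_list_text):
    """Return the set of gene IDs found in a tab-separated TF list.

    Assumed columns (hypothetical layout): TF_ID, Gene_ID, Family.
    """
    reader = csv.DictReader(io.StringIO(tf_list_text), delimiter="\t")
    return {row["Gene_ID"].upper() for row in reader}


def filter_clusters_for_tfs(clusters, tf_ids):
    """Keep only the genes in each cluster that are annotated as TFs.

    clusters: dict mapping cluster name -> iterable of gene IDs.
    tf_ids:   set of upper-case gene IDs annotated as TFs.
    """
    return {
        name: sorted(gene for gene in genes if gene.upper() in tf_ids)
        for name, genes in clusters.items()
    }
```

For example, with a TF list containing only the made-up ID `AT1G00010`, calling `filter_clusters_for_tfs({"cluster3": ["AT1G00010", "AT1G00020"]}, tf_ids)` keeps just `AT1G00010` in `cluster3`. Matching is case-insensitive so that IDs written in different cases still pair up.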
Arabidopsis promoter sequences (1000 bp upstream of the transcription start site, TSS) were extracted from the TAIR10 reference. For Lotus, homologous Arabidopsis gene IDs were used because Lotus genome annotation is limited.
Scripts used: extract_upstream_promoter_sequences.py, background_cluster3.py, Gene_annotation.py, Shuffled_control_At_100_times.py, Motif_distribution_visualization.py
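Extracting a fixed-length upstream window has to be strand-aware: on the minus strand the promoter lies to the right of the gene on the reference and must be reverse-complemented. The sketch below shows that coordinate logic under two assumptions that are not taken from the repository scripts: gene coordinates are 1-based and inclusive (the GFF3 convention), and the gene's 5' end stands in for the TSS. Function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N is preserved)."""
    table = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")
    return seq.translate(table)[::-1]


def upstream_promoter(chrom_seq, gene_start, gene_end, strand, length=1000):
    """Return up to `length` bp upstream of the gene's 5' end, read 5'->3'
    relative to the gene.

    gene_start and gene_end are 1-based inclusive reference coordinates.
    The window is truncated at chromosome ends rather than padded.
    """
    if strand == "+":
        # 5' end is gene_start; window is [gene_start - length, gene_start - 1].
        lo = max(0, gene_start - 1 - length)  # 0-based inclusive start
        hi = gene_start - 1                   # 0-based exclusive end
        return chrom_seq[lo:hi]
    if strand == "-":
        # 5' end is gene_end; window is [gene_end + 1, gene_end + length].
        lo = gene_end                         # 0-based index of base after gene_end
        hi = min(len(chrom_seq), gene_end + length)
        return reverse_complement(chrom_seq[lo:hi])
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")
```

Truncating at chromosome ends means genes close to a chromosome boundary yield shorter promoters; a real pipeline might instead skip or flag those genes before motif scanning.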
GRNBoost2 (via Arboreto) was applied to transcriptome data from Arabidopsis and Lotus to infer regulatory relationships between TFs and target genes.
Scripts used: GRNBoost2_AtSC.py, GRNBoost2_global_GRN_At.py
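GRNBoost2 returns a ranked edge list in which each row holds a TF, a target gene, and an importance score. One possible way to contrast the global and SC-specific networks, sketched here with the standard library only, is to keep the top-scoring edges of each network and compare how many targets each TF regulates. The cutoff `n`, the function names, and the degree-based comparison itself are assumptions made for illustration, not the method used in the repository scripts.

```python
from collections import Counter


def top_edges(edges, n):
    """Keep the n highest-importance edges.

    edges: iterable of (tf, target, importance) tuples.
    """
    return sorted(edges, key=lambda edge: edge[2], reverse=True)[:n]


def tf_out_degree(edges):
    """Count how many retained edges originate from each TF."""
    return Counter(tf for tf, _target, _importance in edges)


def degree_shift(global_edges, sc_edges, n=1000):
    """Return {tf: out-degree in SC network minus out-degree in global network},
    computed on the top-n edges of each network."""
    global_deg = tf_out_degree(top_edges(global_edges, n))
    sc_deg = tf_out_degree(top_edges(sc_edges, n))
    return {
        tf: sc_deg.get(tf, 0) - global_deg.get(tf, 0)
        for tf in set(global_deg) | set(sc_deg)
    }
```

A positive shift marks a TF with more high-confidence targets in the SC-specific network than in the global one. Because importance scores are not directly comparable between runs, the sketch compares ranks (top-n membership) rather than raw scores.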
Networks were visualized in Cytoscape. Perturbation analysis simulated TF removals to assess network stability.
Scripts used: perturbation_analysis_part_1.py, perturbation_part_2_visualization_plot.py
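A TF-removal perturbation can be illustrated by deleting one TF node at a time and measuring how the network breaks apart. The sketch below treats the GRN as an undirected graph and reports the number of connected components and the size of the largest one after each removal. These two metrics, the undirected treatment, and the function names are assumptions for the example; the repository scripts may define fragmentation differently.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (tf, target) pairs."""
    adj = defaultdict(set)
    for tf, target in edges:
        adj[tf].add(target)
        adj[target].add(tf)
    return dict(adj)


def component_sizes(adj, removed=frozenset()):
    """Return connected-component sizes (largest first), ignoring removed nodes."""
    seen = set(removed)
    sizes = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def perturb(edges, tfs):
    """Remove each TF in turn and summarise the resulting fragmentation."""
    adj = build_adjacency(edges)
    results = {}
    for tf in tfs:
        sizes = component_sizes(adj, removed={tf})
        results[tf] = {
            "components": len(sizes),
            "largest": sizes[0] if sizes else 0,
        }
    return component_sizes(adj), results
```

Comparing each TF's result against the unperturbed baseline (the first return value) gives a simple ranking of which removals fragment the network most.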
Script Name | Description |
---|---|
Gene_annotation.py | Maps FIMO motifs to gene features using GFF3 annotations |
GRNBoost2_global_GRN_At.py | Infers a global GRN using expression data across all samples |
GRNBoost2_AtSC.py | Infers a GRN using only At-SC samples to highlight context-specific regulation |
Motif_distribution_visualization.py | Visualizes positional bias of TF motifs in promoters and computes enrichment |
Shuffled_control_At_100_times.py | Performs FIMO scans on 100 randomized promoter sets to calculate empirical p-values |
Extract_upstream_promoter_sequences.py | Extracts 1 kb upstream sequences from Arabidopsis and Lotus genomes |
Background_Arabidopsis_vs_Lotus.py | Creates background gene sets for motif enrichment analysis |
perturbation_analysis_part_1.py | Removes individual TFs from GRN and computes network fragmentation |
perturbation_part_2_visualization_plot.py | Plots network robustness metrics from perturbation results |
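The shuffled-control idea behind the empirical p-values can be sketched as follows: count motif hits in the real promoters, repeat the count on shuffled copies, and report the fraction of shuffles that reach at least the observed count. Several parts of this sketch are assumptions rather than the repository's implementation: exact string matching stands in for a FIMO scan, shuffling is done per sequence at the single-nucleotide level, and a +1 pseudocount is used so the p-value is never exactly zero.

```python
import random


def shuffle_sequence(seq, rng):
    """Return a shuffled copy of seq that preserves its base composition."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def count_hits(seqs, motif):
    """Stand-in for a motif scan: count exact, possibly overlapping matches."""
    total = 0
    for seq in seqs:
        start = seq.find(motif)
        while start != -1:
            total += 1
            start = seq.find(motif, start + 1)
    return total


def empirical_p_value(observed, null_counts):
    """Fraction of null counts >= observed, with a +1 pseudocount."""
    exceed = sum(1 for count in null_counts if count >= observed)
    return (exceed + 1) / (len(null_counts) + 1)


def shuffled_control(promoters, motif, n_shuffles=100, seed=0):
    """Return (observed hit count, empirical p-value) for one motif."""
    rng = random.Random(seed)  # fixed seed keeps the control reproducible
    observed = count_hits(promoters, motif)
    null_counts = [
        count_hits([shuffle_sequence(p, rng) for p in promoters], motif)
        for _ in range(n_shuffles)
    ]
    return observed, empirical_p_value(observed, null_counts)
```

With 100 shuffles and the pseudocount, the smallest attainable p-value is 1/101, so more shuffles are needed if finer resolution matters.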
Each figure visualizes a key result or method step in the pipeline.
Figure | Description |
---|---|
Figure S1 | Full bioinformatics pipeline overview |
Figure 1 | PCA and expression heatmaps from Wippel et al. |
Figure 2 | Top TF motifs detected by FIMO |
Figure 3 | Statistical enrichment of specific motifs |
Figure 4 | Motif significance in Lotus via shuffled control |
Figure 5 | Motif significance in Arabidopsis via shuffled control |
Figure 6 | Differential TF roles in global vs. SC GRNs |
Figure 7 | Cytoscape-rendered GRN of Arabidopsis |
Figure 8 | Zoom-in on top TFs and their targets |
Figure 9 | Effect of TF removals on network fragmentation |
Figure 10 | Quantitative summary of perturbation outcomes |
If reusing this pipeline, please cite the original data sources and this GitHub repository.