High-priority TODO list
=======================

- igbrowser() improvements:
  - Display pairwise alignment between BCR query sequence and germline
    V/D/J/C sequences.
  - Take a look at the visualization tool from IMGT/V-QUEST for inspiration.

- Implement summarizeMismatches().
  The function will take the AIRR data.frame returned by igblastn() as input
  and, for each BCR query sequence, identify the positions of the mismatches,
  insertions, and deletions in the pairwise alignment. Then it should count
  how many of them fall in the following regions:
    - V, D, J, C
    - FWR1-4, CDR1-3
  The counts should be returned in a matrix (or data.frame or tibble?)
  with 1 row per query sequence and 11 columns (one per region).
  summarizeMismatches() should also have an argument that lets the user
  choose whether or not the matrix of counts gets added to the input AIRR
  data.frame as additional columns, e.g. 'bind.results'. It should be FALSE
  by default.

- Maybe implement the following advice given by IgBLAST when using one
  of the num_alignments_V/D/J arguments:

  Warning messages:
  1: In .parse_and_issue_warnings(stderr_file) :
    Warning: To obtain better run time performance, please run blastdb_aliastool
    -seqid_file_in <INPUT_FILE_NAME> -seqid_file_out <OUT_FILE_NAME> and use
    <OUT_FILE_NAME> as the argument to -seqidlist

- Add igblastp(), a wrapper to the igblastp standalone executable included
  in IgBLAST. Requested by Dr Iman Haddad in an email from Aug 12, 2025.
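For the summarizeMismatches() item above, here is a minimal Python sketch of the per-sequence counting step. The real function would be written in R. It assumes the following:
- The query and germline are already available as two aligned strings of equal length, like the AIRR sequence_alignment and germline_alignment columns.
- '-' marks a gap.
- Region boundaries are given as 1-based, inclusive (start, end) positions in alignment coordinates. Mapping the AIRR region coordinates onto alignment coordinates is not shown.

The names classify_column() and count_differences_by_region() are hypothetical.

```python
from typing import Dict, Tuple

GAP = "-"  # assumption: '-' marks a gap in both aligned strings


def classify_column(q: str, g: str) -> str:
    """Classify one alignment column (query char q, germline char g).

    Insertions and deletions are relative to the germline: a gap in the
    germline means the query carries an extra base (insertion), a gap in
    the query means a germline base is missing (deletion).
    """
    if q == g:
        return "match"
    if g == GAP:
        return "insertion"
    if q == GAP:
        return "deletion"
    return "mismatch"


def count_differences_by_region(
    query_aln: str,
    germline_aln: str,
    regions: Dict[str, Tuple[int, int]],
) -> Dict[str, int]:
    """Count mismatches + insertions + deletions falling in each region.

    `regions` maps a region name (e.g. "V", "FWR1", "CDR3") to a 1-based,
    inclusive (start, end) interval in alignment coordinates (hypothetical
    input format). Regions may overlap, so one difference can be counted
    in more than one region.
    """
    if len(query_aln) != len(germline_aln):
        raise ValueError("aligned strings must have the same length")
    counts = {name: 0 for name in regions}
    for pos, (q, g) in enumerate(zip(query_aln, germline_aln), start=1):
        if classify_column(q, g) == "match":
            continue
        for name, (start, end) in regions.items():
            if start <= pos <= end:
                counts[name] += 1
    return counts


# Toy example: mismatch at 4, deletion at 8, insertion at 10.
example = count_differences_by_region(
    "ACGTTAC-GT",
    "ACGATACAG-",
    {"FWR1": (1, 5), "CDR1": (6, 10)},
)
print(example)  # {'FWR1': 1, 'CDR1': 2}
```

In summarizeMismatches(), one such dict per query sequence would become one row of the 11-column count matrix. The V/D/J/C regions overlap the FWR/CDR regions, so a single difference is counted once in each set, and row sums do not equal the total number of differences.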


Things to do at BioC 3.22 release time
======================================

- Update README.md:
  - Add "devel" row in BioC_version/Status table.
  - Update "Install and load igblastr" section.

- Advertise igblastr:
  - Announce on various bioc-community Slack channels.
  - Announce on the FH-Data Slack (fhdata.slack.com) on channels
    #r-user-comm and #general.
  - Announce on LinkedIn.
  - Try to get an entry advertising igblastr in the next issue of The R Journal.
  - Also consider Bioinformatics, which accepts short articles introducing
    new software.


Low-priority TODO list
======================

- Add 'clonotype_out' arg to igblastn(). Add examples in man page and
  vignette that use this functionality.

- It was mentioned that some people use mixeR to analyse TCR sequences.
  How does this compare to using igblastn(..., ig_seqtype="TCR")?

- Add functionality to install/use the updated internal and/or auxiliary
  files that are sometimes made available at:
    https://ftp.ncbi.nih.gov/blast/executables/igblast/release/patch/
  See https://ncbi.github.io/igblast/cook/How-to-set-up.html for the details.

- Add bibliography to vignette. See AuthoringRmdVignettes.Rmd vignette in
  BiocStyle for how to do this.

- Add Seqinfo to Imports (but wait until BioC 3.23 for that). Note
  that we'll still need GenomeInfoDb just for list_ftp_dir().

- Clarify provenance of 1279067_1_Paired_sequences.fasta.gz and its licence.
  Give appropriate credit. See https://opig.stats.ox.ac.uk/webapps/oas/

- More investigation to assess the consequences of using the static auxiliary
  data included in IgBLAST.

- Try the online IgBLAST at https://www.ncbi.nlm.nih.gov/igblast/ and compare
  the results with igblastr.

- Figure out a way to automatically stamp AIRR germline dbs with a
  version number that makes it possible to go back in time when needed.

- One should be able to pass the name of an IMGT germline db to
  install_IMGT_germline_db(), or a vector of names.

- Improve outfmt7-utils.Rd man page (e.g. document customized format 7
  and list_outfmt7_specifiers()) as well as associated unit tests (in
  tests/testthat/test-outfmt7-utils.R).

- Make 'num_threads' an explicit argument with a default of 4. The doc should
  show how to specify a higher but still reasonable custom value based on
  detectCores().

- Parse the $footer part of output format 7.

- Implement parsing of output formats 3 and 4?

- Set the IGDATA environment variable so that IgBLAST can find the
  internal_data directory. Note that IGDATA must be set to the **parent**
  directory of internal_data, not to internal_data itself.

- Add 'loci' argument to install_IMGT_germline_db(). It should default to
  "IGH+IGK+IGL" but accept any subset of the 3 loci, e.g. "IGH" or
  "IGK+IGL".

- Great resource for how to use AIRR Community Reference germline sets with
  IgBLAST: https://williamdlees.github.io/receptor_utils/_build/html/airrc_sets_with_igblast.html
  In particular, the author seems to be using an OGRDB REST API version 2:
    https://ogrdb.airr-community.org/api_v2
  but where is this API documented?
  All the download utilities implemented in igblastr/R/AIRR-utils.R use
  the OGRDB REST API at
    https://ogrdb.airr-community.org/api
  which is poorly documented and somewhat confusing (see below).

- Investigate the following mysteries about the germline sets provided
  by AIRR/OGRDB:

  1. The OGRDB API at https://ogrdb.airr-community.org/api/ allows downloading
     the germline sequences in 2 formats: ungapped or ungapped_ex.
     Which format is appropriate to use with IgBLAST?
     Note that downloading germline sets directly by clicking on
     the "FASTA Ungapped" links here
       https://ogrdb.airr-community.org/germline_sets/Homo%20sapiens
     or
       https://ogrdb.airr-community.org/germline_sets/Mus%20musculus
     seems to retrieve the "ungapped_ex" sequences for Human and the "ungapped"
     sequences for Mouse. Confusing!

  2. For some Mouse strains, OGRDB seems to provide germline sequences
     only for a limited number of loci/groups. For example for strain A/J,
     only sequences from the light chain (i.e. groups IGKV, IGKJ, IGLV,
     and IGLJ) seem to be available.
     See https://ogrdb.airr-community.org/germline_sets/Mus%20musculus

- Implement install_AIRR_germline_db(). It will download the germline
  sequences from https://ogrdb.airr-community.org/ (link provided by Kellie).
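For the IGDATA item in the low-priority list, here is a minimal Python sketch of the parent-directory rule. Given the path to an internal_data directory, it sets IGDATA to that directory's parent. igblastr itself would do this in R, e.g. with Sys.setenv(). The helper name set_igdata() is hypothetical.

```python
import os


def set_igdata(internal_data_dir: str) -> str:
    """Point IGDATA at the *parent* of internal_data_dir and return that value.

    IgBLAST expects to find internal_data under IGDATA, so IGDATA must not
    be set to the internal_data directory itself.
    """
    # abspath() also normalizes the path (e.g. drops a trailing separator).
    internal_data_dir = os.path.abspath(internal_data_dir)
    if os.path.basename(internal_data_dir) != "internal_data":
        raise ValueError("expected a path ending in 'internal_data'")
    igdata = os.path.dirname(internal_data_dir)
    os.environ["IGDATA"] = igdata
    return igdata
```

Rejecting paths that do not end in internal_data catches the most likely mistake, which is passing the parent directory (or internal_data's own parent's parent) by accident.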
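For the 'loci' argument of install_IMGT_germline_db() in the low-priority list, here is a minimal Python sketch of how a "+"-separated loci string could be validated. VALID_LOCI and parse_loci() are hypothetical names. The real argument would be handled in R, where strsplit() would play the role of str.split().

```python
from typing import List

VALID_LOCI = ("IGH", "IGK", "IGL")


def parse_loci(loci: str = "IGH+IGK+IGL") -> List[str]:
    """Split a 'loci' string like "IGK+IGL" into a validated list of loci.

    The result follows the order of VALID_LOCI regardless of the order
    given by the user, and duplicates are dropped.
    """
    requested = [x.strip() for x in loci.split("+")]
    unknown = [x for x in requested if x not in VALID_LOCI]
    if unknown:
        allowed = "+".join(VALID_LOCI)
        raise ValueError(
            f"invalid loci {loci!r}: expected a '+'-separated subset of {allowed}"
        )
    return [x for x in VALID_LOCI if x in requested]
```

Normalizing the order makes "IGK+IGL" and "IGL+IGK" equivalent, which keeps downstream db naming predictable.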

