------------------------------------------------------------------------------- 12/16/2019 PDNET: A dataset and methods for deep learning protein inter-residue distances Badri Adhikari, University of Missouri-St. Louis This file lists the commands/scripts to run to prepare the input features and output distance maps for the DeepCov and PSICOV sets. Everything related to deep learning is under the deep-learning/ folder. Additional evaluations on the CASP13 datasets are inside the evaluations folder. ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- Test feature generation (after installing all tools) ------------------------------------------------------------------------------- cd /tmp/ echo ">1guu" > 1guu.fasta echo "KTRWTREEDEKLKKLVEQNGTDDWKVIANYLPNRTDVQCQHRWQKVLNPE" >> 1guu.fasta python3 /home/badri/PDNET/v2_64x64/scripts/make-features.py -f 1guu.fasta -j . -p 1guu.pkl ------------------------------------------------------------------------------- Prepare I/O for PSICOV set ------------------------------------------------------------------------------- # Copy the original PSICOV chains to ./data/psicov/chains/ # Comment out the code in chains2io.py that creates pdb (so that PDBs are not downloaded and curated) rm ./data/psicov/fasta/* ./data/psicov/distance/* python3 ./scripts/chains2io.py ./data/psicov.lst ./data/psicov/chains/ ./data/psicov/distance/ ./data/psicov/fasta/ /tmp/downloads/ 512 24 rm -r ./data/psicov/features-raw/feature-generation-jobs/* cd ./data/psicov/ python3 ./data/psicov/features-psicov.py ./data/psicov.lst ./data/psicov/fasta/ ./data/psicov/features-raw/precomputed-features/ ./data/psicov/features-raw/precomputed-pssm/ ./data/psicov/features-raw/feature-generation-jobs/ ./data/psicov/features/ ------------------------------------------------------------------------------- Prepare I/O for DeepCov set ------------------------------------------------------------------------------- # Prepare chains and dmaps: rm ./data/deepcov/fasta/* ./data/deepcov/distance/* ./data/deepcov/chains/* python3 ./scripts/chains2io.py ./data/train_ecod.lst ./data/deepcov/chains/ ./data/deepcov/distance/ ./data/deepcov/fasta/ /tmp/downloads/ 512 50 &> ./logs-data-prep/chains2io-deepcov.log & # Prepare input features in Lewis server cluster: cd /home/badri/PDNET/v2_64x64/data/deepcov/ sbatch sbatch-full-feats.sh deepcov.lst # Copy the files to the local server: cd /home/badri/PDNET/v2_64x64/data/deepcov/raw-features rsync -av adhikarib@lewis.rnet.missouri.edu:/storage/htc/prayog/badri/deepcov-feature-gen-largefolder/raw-features/* /home/badri/PDNET/v2_64x64/data/deepcov/raw-features/ # Verify and obtain list with length: python3 ./scripts/final-list.py ./data/deepcov/chains/ ./data/deepcov/distance/ ./data/deepcov/fasta/ ./data/deepcov/features/ > ./data/deepcov.lst ------------------------------------------------------------------------------- Prepare final zip file ------------------------------------------------------------------------------- rm -f *.tar.gz tar zcvf train-data.tar.gz data/*.lst data/deepcov/features data/deepcov/distance data/psicov/features data/psicov/distance data/cameo/features data/cameo/distance tar zcvf chains.tar.gz data/cameo/chains data/deepcov/chains data/psicov/chains tar zcvf fasta.tar.gz data/deepcov/fasta data/cameo/fasta data/psicov/fasta tar zcvf msa.tar.gz data/cameo/aln data/deepcov/aln data/psicov/aln tar zcvf casp13-evals.tar.gz deep-learning/evaluations/casp13 tar zcvf scripts.tar.gz ./scripts tar zcvf train-src.tar.gz deep-learning/src tar zcvf models.tar.gz ./deep-learning/experiments ------------------------------------------------------------------------------- Some random notes on installing tools ------------------------------------------------------------------------------- CCMpred: git clone --recursive https://github.com/soedinglab/CCMpred.git cd CCMpred vim CMakeLists.txt -> THREADS_PREFER_PTHREAD_FLAG ON (replace with OFF) cmake -DWITH_OMP=OFF . make Making blast database: ../ncbi-blast-2.9.0+/bin/makeblastdb -in uniref90.fasta -dbtype prot -input_type fasta -out uniref90.blast.fasta Add the following line after psipred execution in runpsipredplus: mv $tmproot.mtx $rootname.mtx -------------------------------------------------------------------------------