Category: os x

Here are ways to get SIMD/SSE flags from machines running either Linux or OS X:

On Linux (CentOS 7):

$ cat /proc/cpuinfo | grep flags | uniq
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local

On Mac OS X 10.12:

$ sysctl -a | grep machdep.cpu.features
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
$ sysctl -a | grep machdep.cpu.leaf7_features
machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 AVX2 BMI2 INVPCID FPU_CSDS

See: https://stackoverflow.com/a/38345423/19410 for a discussion about how to detect instruction sets.
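The two platforms spell the same features differently (Linux reports SSE3 as `pni`, and OS X reports AVX as `AVX1.0`), so comparing the two listings by eye is error-prone. Below is a minimal sketch of one way to normalize a line of either output. The alias table, the `SIMD_NAMES` set, and the function name `simd_features` are illustrative assumptions, not an exhaustive or official mapping.

```python
# Map platform-specific spellings to one common name.
# This table is an illustrative assumption and is not exhaustive.
ALIASES = {
    "pni": "sse3",        # Linux reports SSE3 as "pni"
    "sse4_1": "sse4.1",   # Linux spelling
    "sse4_2": "sse4.2",   # Linux spelling
    "avx1.0": "avx",      # OS X reports AVX as "AVX1.0"
}

# SIMD-related feature names to look for (again, an illustrative subset).
SIMD_NAMES = {
    "mmx", "sse", "sse2", "sse3", "ssse3",
    "sse4.1", "sse4.2", "avx", "avx2", "fma", "f16c",
}


def simd_features(line):
    """Return the SIMD-related features named in one flags/features line."""
    # Drop a leading label such as "flags :" or "machdep.cpu.features:".
    _, _, values = line.rpartition(":")
    tokens = (tok.lower() for tok in values.split())
    normalized = {ALIASES.get(tok, tok) for tok in tokens}
    return normalized & SIMD_NAMES


# Shortened sample lines in the style of the two outputs above.
linux_line = "flags : fpu mmx sse sse2 pni ssse3 sse4_1 sse4_2 avx avx2"
osx_line = "machdep.cpu.features: FPU MMX SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX1.0"
print(sorted(simd_features(linux_line)))
print(sorted(simd_features(osx_line)))
```

On OS X, AVX2 is listed under `machdep.cpu.leaf7_features` rather than `machdep.cpu.features`, so both lines would need to be passed through the function to get the full picture.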


Finishing touches are in place for my convert2bed tool (GitHub site).

This utility converts common genomics data formats (BAM, GFF, GTF, PSL, SAM, VCF, WIG) to lexicographically-sorted UCSC BED format. It offers two benefits over alternatives:

  • It runs about 3-10x as fast as bedtools *ToBed equivalents
  • It converts all input fields in as non-lossy a way as possible, to allow recovery of data to the original format

As an example, here we use convert2bed to convert a 14M-read, indexed BAM file to a sorted BED file (output is redirected to /dev/null) on a 4 GB, dual-Core 2 (2.4 GHz) workstation running RHEL 6:

$ samtools view -c ../DS27127A_GTTTCG_L001.uniques.sorted.bam
14090028

Conversion is performed with default options (sorted BED as output, using BEDOPS sort-bed):

$ time ./convert2bed -i bam < ../DS27127A_GTTTCG_L001.uniques.sorted.bam > /dev/null
[bam_header_read] EOF marker is absent. The input is probably truncated.

real 3m5.508s
user 0m25.702s
sys 0m8.602s

Here is the same conversion, performed with bedtools v2.22 bamToBed and sortBed:

$ time ../bedtools2/bin/bamToBed -i ../DS27127A_GTTTCG_L001.uniques.sorted.bam | ../bedtools2/bin/sortBed -i stdin > /dev/null

real    28m22.057s
user    2m58.579s
sys     0m41.605s

For this file, convert2bed offers about a 9.2x improvement in wall-clock ("real") time: 3m5.5s versus 28m22.1s. Other large BAM files show similar conversion speedups.
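The speedup figure is the ratio of the two `real` (wall-clock) times reported by `time`. A short sketch of that arithmetic follows; the helper name `real_seconds` is made up for illustration, and the regular expression assumes the `NmS.SSSs` layout that bash's `time` printed in the runs above.

```python
import re


def real_seconds(time_output):
    """Extract the 'real' (wall-clock) time, in seconds, from bash `time` output."""
    match = re.search(r"real\s+(\d+)m([\d.]+)s", time_output)
    if match is None:
        raise ValueError("no 'real' line found")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Timings copied from the two runs above.
convert2bed_time = real_seconds("real 3m5.508s\nuser 0m25.702s\nsys 0m8.602s")
bedtools_time = real_seconds("real    28m22.057s\nuser    2m58.579s\nsys     0m41.605s")

speedup = bedtools_time / convert2bed_time
print(f"{speedup:.2f}x")
```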

Conversion time can be reduced further with the bam2bedcluster and bam2starchcluster scripts (TBA), which use GNU Parallel or a Sun Grid Engine job scheduler to split conversion tasks by chromosome.
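To illustrate the general idea of splitting work by chromosome, here is a rough sketch that only builds one shell pipeline per chromosome and runs nothing itself. It is not the bam2bedcluster script. The function name, the `reads.bam` file name, the chromosome list, and the output naming are all hypothetical, and the `samtools view` region query assumes the BAM file is indexed.

```python
import shlex


def per_chromosome_commands(bam_path, chromosomes, out_dir="."):
    """Build one shell pipeline per chromosome; nothing is executed here."""
    commands = []
    for chrom in chromosomes:
        out_path = f"{out_dir}/{chrom}.bed"  # hypothetical output naming
        commands.append(
            # Extract one chromosome's reads as BAM, then convert them to BED.
            f"samtools view -b {shlex.quote(bam_path)} {shlex.quote(chrom)}"
            f" | convert2bed -i bam > {shlex.quote(out_path)}"
        )
    return commands


# Hypothetical input file and chromosome names.
for command in per_chromosome_commands("reads.bam", ["chr1", "chr2", "chrX"]):
    print(command)
```

The printed lines could be saved to a file and handed to GNU Parallel (for example, `parallel < commands.txt`), or wrapped in one scheduler job per line. The per-chromosome BED files are each sorted, so they would still need to be combined in chromosome order afterwards.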

When testing is complete, code will be wrapped into the upcoming BEDOPS v2.4.3 release. Source is now available via GitHub.


MacPorts is useful for installing a variety of command-line utilities and programs on Mac OS X; Homebrew is one alternative. After using MacPorts to update a GNU gcc installation, select the new version so that it becomes the default gcc. Tips were posted to this Stack Overflow thread. It boils down to two steps: list the available versions, then set the one you want (mp-gcc47 in this example):

  1. sudo port select --list gcc
  2. sudo port select --set gcc mp-gcc47
