aibiology

Artificial intelligence in biology


SAM file FLAGs explained

## SAM flag

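SAM is a tab-delimited text alignment format (BAM is its compressed binary counterpart). Its second column, FLAG, is an integer in which each bit records one property of the read's alignment, such as whether the read is paired, unmapped, or mapped to the reverse strand. Because every property occupies its own bit, decoding FLAG tells you all of the read's attributes at once, which is why samtools so often filters SAM/BAM records by FLAG (for example with `samtools view -f` and `-F`).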
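The bit tests behind that kind of filtering fit in a few lines of Python. This is a minimal sketch, assuming the FLAG is already available as an integer; the helper names `has_all`, `has_none` and `keep` are hypothetical, and the logic only mirrors the documented meaning of `samtools view -f` (keep a record only if all listed bits are set) and `-F` (drop it if any listed bit is set), not samtools' own source.

```python
# Hypothetical helpers illustrating the bit tests behind samtools view -f / -F.
# Bit values follow the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAP = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400


def has_all(flag: int, required: int) -> bool:
    """True if every bit in `required` is set (the idea behind -f)."""
    return (flag & required) == required


def has_none(flag: int, excluded: int) -> bool:
    """True if no bit in `excluded` is set (the idea behind -F)."""
    return (flag & excluded) == 0


def keep(flag: int, required: int = 0, excluded: int = 0) -> bool:
    """Keep a record only if it passes both tests."""
    return has_all(flag, required) and has_none(flag, excluded)


if __name__ == '__main__':
    # 99 = 64 + 32 + 2 + 1: READ1, MREVERSE, PROPER_PAIR, PAIRED
    print(keep(99, required=PAIRED | PROPER_PAIR))            # True
    print(keep(99, excluded=UNMAP | SECONDARY | DUPLICATE))   # True
    print(keep(1294, excluded=UNMAP))                         # False
```

Combining several properties into one mask with `|` is what lets a single integer argument express "all of these" or "none of these".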

Python code

```python
import argparse


def parse_args():
    parser = argparse.ArgumentParser(description='Decode a SAM FLAG value')
    parser.add_argument('-i', '--input', type=int, required=True,
                        help='FLAG value (column 2 of a SAM record)')
    return parser.parse_args()


def flags_cal(flag: int):
    # The 12 FLAG bits defined by the SAM specification, lowest bit first
    flags = [0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80, 0x100, 0x200, 0x400, 0x800]
    flags_dict = {
        0x1: ['PAIRED', '..paired-end (or multiple-segment) sequencing technology'],
        0x2: ['PROPER_PAIR', '..each segment properly aligned according to the aligner'],
        0x4: ['UNMAP', '..segment unmapped'],
        0x8: ['MUNMAP', '..next segment in the template unmapped'],
        0x10: ['REVERSE', '..SEQ is reverse complemented'],
        0x20: ['MREVERSE', '..SEQ of the next segment in the template is reverse complemented'],
        0x40: ['READ1', '..the first segment in the template'],
        0x80: ['READ2', '..the last segment in the template'],
        0x100: ['SECONDARY', '..secondary alignment'],
        0x200: ['QCFAIL', '..not passing quality controls'],
        0x400: ['DUPLICATE', '..PCR or optical duplicate'],
        0x800: ['SUPPLEMENTARY', '..supplementary alignment'],
    }
    # Values outside 0..4095 (including negatives) would break the bit lookup
    if not 0 <= flag <= 0xFFF:
        raise ValueError(f'FLAG must be between 0 and 4095, got {flag}')
    # bin(1294) == '0b10100001110'; drop the '0b' prefix and reverse the
    # string so that element i is bit i (value 2**i)
    binary = bin(flag)
    index_flags = list(map(int, binary[2:][::-1]))
    print()
    print(f'Flags: {flag}')
    print()
    print("Hex\tDec\tProperty\tInformation")
    props = []
    for index, bit in enumerate(index_flags):
        if bit == 1:
            value = flags[index]
            info = flags_dict[value]
            props.append(info[0])
            print("{:<#x}\t{:<5d}\t{:<11s}\t{:<50s}".format(value, value, info[0], info[1]))
    print()
    print("{}\t{:<5d}\t{}".format(hex(flag), flag, ",".join(props)))


if __name__ == '__main__':
    args = parse_args()
    flags_cal(args.input)
```
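`flags_cal` finds the set bits by reversing the `bin()` string. An equivalent, arguably more direct approach is to test each bit with `&`. The sketch below assumes Python 3.7+ (so the dict keeps its insertion order); `SAM_FLAGS`, `decode_flag` and `encode_flag` are hypothetical names chosen for illustration.

```python
from typing import Iterable, List

# Hypothetical lookup table: SAM FLAG bit -> short property name
SAM_FLAGS = {
    0x1: 'PAIRED',
    0x2: 'PROPER_PAIR',
    0x4: 'UNMAP',
    0x8: 'MUNMAP',
    0x10: 'REVERSE',
    0x20: 'MREVERSE',
    0x40: 'READ1',
    0x80: 'READ2',
    0x100: 'SECONDARY',
    0x200: 'QCFAIL',
    0x400: 'DUPLICATE',
    0x800: 'SUPPLEMENTARY',
}


def decode_flag(flag: int) -> List[str]:
    """Return the names of all bits set in a SAM FLAG, lowest bit first."""
    if not 0 <= flag <= 0xFFF:
        raise ValueError(f'FLAG must be between 0 and 4095, got {flag}')
    # A bit is set exactly when AND-ing it with the flag is non-zero
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def encode_flag(names: Iterable[str]) -> int:
    """Inverse of decode_flag: OR together the bits for the given names."""
    lookup = {name: bit for bit, name in SAM_FLAGS.items()}
    value = 0
    for name in names:
        value |= lookup[name]
    return value


if __name__ == '__main__':
    print(','.join(decode_flag(1294)))
    print(encode_flag(['PAIRED', 'READ1']))
```

Testing `flag & bit` avoids the string manipulation, and `encode_flag` shows that building a FLAG is just the reverse operation.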

Run the script with a FLAG value to decode it:

```
$ python flags_explain.py -i 1294

Flags: 1294

Hex     Dec     Property        Information
0x2     2       PROPER_PAIR     ..each segment properly aligned according to the aligner
0x4     4       UNMAP           ..segment unmapped
0x8     8       MUNMAP          ..next segment in the template unmapped
0x100   256     SECONDARY       ..secondary alignment
0x400   1024    DUPLICATE       ..PCR or optical duplicate

0x50e   1294    PROPER_PAIR,UNMAP,MUNMAP,SECONDARY,DUPLICATE
```
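The decoding is simply the binary expansion of the FLAG value: a set bit at position $i$ contributes $2^i$. For the example above:

$$
\begin{aligned}
1294 &= 1024 + 256 + 8 + 4 + 2 \\
     &= 2^{10} + 2^{8} + 2^{3} + 2^{2} + 2^{1} \\
     &= \mathtt{0x400} + \mathtt{0x100} + \mathtt{0x8} + \mathtt{0x4} + \mathtt{0x2} = \mathtt{0x50e} \\
     &= 10100001110_2
\end{aligned}
$$

Note that 1294 serves only as a decoding exercise: it sets PROPER_PAIR together with UNMAP, a combination the htslib comments call conflictive, and it sets PROPER_PAIR without PAIRED.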

## htslib

The same bits are defined as macros in htslib's `htslib/sam.h`:

```c
/*! @abstract the read is paired in sequencing, no matter whether it is mapped in a pair */
#define BAM_FPAIRED 1
/*! @abstract the read is mapped in a proper pair */
#define BAM_FPROPER_PAIR 2
/*! @abstract the read itself is unmapped; conflictive with BAM_FPROPER_PAIR */
#define BAM_FUNMAP 4
/*! @abstract the mate is unmapped */
#define BAM_FMUNMAP 8
/*! @abstract the read is mapped to the reverse strand */
#define BAM_FREVERSE 16
/*! @abstract the mate is mapped to the reverse strand */
#define BAM_FMREVERSE 32
/*! @abstract this is read1 */
#define BAM_FREAD1 64
/*! @abstract this is read2 */
#define BAM_FREAD2 128
/*! @abstract not primary alignment */
#define BAM_FSECONDARY 256
/*! @abstract QC failure */
#define BAM_FQCFAIL 512
/*! @abstract optical or PCR duplicate */
#define BAM_FDUP 1024
/*! @abstract supplementary alignment */
#define BAM_FSUPPLEMENTARY 2048
```
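In C these macros are combined with `|` and tested with `&`. A rough Python analogue is an `enum.IntFlag` whose members reuse the htslib macro names without the `BAM_F` prefix. `BamFlag` and `set_bits` are hypothetical names for this sketch; they are not part of htslib or pysam.

```python
from enum import IntFlag
from typing import List


class BamFlag(IntFlag):
    """Hypothetical Python mirror of the BAM_F* macros in htslib/sam.h."""
    PAIRED = 1
    PROPER_PAIR = 2
    UNMAP = 4
    MUNMAP = 8
    REVERSE = 16
    MREVERSE = 32
    READ1 = 64
    READ2 = 128
    SECONDARY = 256
    QCFAIL = 512
    DUP = 1024
    SUPPLEMENTARY = 2048


def set_bits(flag: int) -> List[str]:
    """Names of the BamFlag members set in `flag`, lowest bit first."""
    if not 0 <= flag <= 0xFFF:
        raise ValueError(f'FLAG must be between 0 and 4095, got {flag}')
    value = BamFlag(flag)
    # Members are iterated in definition order, i.e. ascending bit value
    return [member.name for member in BamFlag if member in value]


if __name__ == '__main__':
    print(set_bits(1294))
    print(int(BamFlag.PAIRED | BamFlag.READ1))
```

Because `IntFlag` members are also plain integers, they can be mixed freely with FLAG values read from a file, just as the C macros are.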