3-letter codes of the 20 standard amino acids:
ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL
Implement a program invoked like:
program_name configuration_file output_file
The command line always contains the configuration file name and may also contain an output file name. If no output file name is given, standard output is used.
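The invocation rule can be sketched as follows. This is only an illustration: the function names `parse_command_line` and `open_output` are made up here, and exiting with a usage message when the configuration file is missing is an assumption, since the assignment says that argument is always present.

```python
import sys


def parse_command_line(argv):
    """Return (config_path, output_path); output_path is None for stdout."""
    if len(argv) < 2:
        # Assumption: the assignment does not say what to do in this case.
        raise SystemExit("usage: program_name configuration_file [output_file]")
    config_path = argv[1]
    output_path = argv[2] if len(argv) > 2 else None
    return config_path, output_path


def open_output(output_path):
    """Return the stream that results (or the string "error") are written to."""
    return sys.stdout if output_path is None else open(output_path, "w")
```

A caller would typically do `config_path, output_path = parse_command_line(sys.argv)` and close the returned stream only when it is not `sys.stdout`.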
Both the configuration file and the data files are row-oriented.
Data file structure
One file describes one protein and contains information about all its amino acids and their spatial coordinates (x, y, z – discrete values). Each row begins with the 3-letter amino acid code and continues with the spatial coordinates for that amino acid.
Keep in mind that coordinates can be negative, and that the fields on a line are separated by one or more whitespace characters.
(Note: This is a simplification of a real PDB file describing protein structures).
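A minimal sketch of parsing one data row under these rules. The names `AMINO_ACIDS`, `parse_coordinate` and `parse_data_line` are hypothetical. The sketch assumes that codes are upper-case only, that a leading `+` is not allowed, and that leading zeros are accepted (the attached data file contains values such as `0298` and `-0149`).

```python
import re

AMINO_ACIDS = frozenset(
    "ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL".split()
)


def parse_coordinate(text):
    """Convert an optionally negative run of ASCII digits to int."""
    if not re.fullmatch(r"-?[0-9]+", text):
        raise ValueError("coordinate is not an integer: %r" % text)
    return int(text)


def parse_data_line(line):
    """Parse 'CODE x y z' into (code, (x, y, z)); raise ValueError if malformed."""
    fields = line.split()  # splits on any run of whitespace
    if len(fields) != 4:
        raise ValueError("expected a code and three coordinates")
    code = fields[0]
    if code not in AMINO_ACIDS:
        raise ValueError("unknown amino acid code: %r" % code)
    return code, tuple(parse_coordinate(field) for field in fields[1:])
```

Raising `ValueError` for every kind of malformed row keeps the caller simple: it can catch one exception type and report the syntax error.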
Configuration file structure
R-neighborhood .. is an integer
Pattern ......... is a sequence of one or more amino acid codes separated by one or more whitespace characters
protein_1 ....... is the name of the first data file
...
protein_N ....... is the name of the N-th data file
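Reading this layout could look roughly like the sketch below. `parse_config` and its `valid_codes` parameter (the set of the 20 codes listed at the top) are hypothetical names. Requiring R to be non-negative and ignoring blank lines among the file names are assumptions the assignment does not state.

```python
import re


def parse_config(lines, valid_codes):
    """Return (radius, pattern, data_files) from the configuration rows.

    Raises ValueError when the first two rows are missing or malformed.
    """
    rows = [line.strip() for line in lines]
    if len(rows) < 2:
        raise ValueError("configuration needs R and a pattern")
    if not re.fullmatch(r"[0-9]+", rows[0]):
        # Assumption: a negative R is treated as a syntax error.
        raise ValueError("R must be a non-negative integer")
    radius = int(rows[0])
    pattern = rows[1].split()
    if not pattern or any(code not in valid_codes for code in pattern):
        raise ValueError("pattern must list valid amino acid codes")
    # Assumption: blank rows among the data file names are ignored.
    data_files = [row for row in rows[2:] if row]
    return radius, pattern, data_files
```

For the configuration file in the Example section this would return `(6000, ["ARG", "LYS"], ["simple.pdb"])`.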
The R-neighborhood of an amino acid is the set of points that lie at a distance of at most R from it in every coordinate. For an amino acid at [x,y,z], it consists of all points whose coordinates fall in the ranges [x-R..x+R, y-R..y+R, z-R..z+R], bounds included.
A histogram is built over the points of the discrete 3D space at which an amino acid from any of the specified proteins is located. For each such point, we count, separately for each amino acid type in the pattern, how many amino acids of that type lie in its R-neighborhood. Let these counts be, in pattern order, [c1..cn]. The histogram record for the vector [c1..cn] is then incremented by one. The final histogram is obtained by doing this for the points of all amino acids of all input proteins.
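To make the definition concrete, here is a deliberately naive reference sketch. It compares every amino acid with every other one, which the efficiency section rules out for the real solution, so it is only useful for checking results. The names `in_neighborhood` and `build_histogram` are hypothetical. It treats every amino acid occurrence as one point and counts the amino acid at the center as part of its own neighborhood; the data in the Example section match the listed output only under that reading.

```python
from collections import Counter


def in_neighborhood(center, other, radius):
    """True when `other` differs from `center` by at most radius on every axis."""
    return all(abs(c - o) <= radius for c, o in zip(center, other))


def build_histogram(points, pattern, radius):
    """points: list of (code, (x, y, z)). Returns Counter {(c1, ..., cn): count}."""
    histogram = Counter()
    for _, center in points:
        counts = tuple(
            sum(
                1
                for code, position in points
                if code == wanted and in_neighborhood(center, position, radius)
            )
            for wanted in pattern
        )
        histogram[counts] += 1  # one increment per amino acid occurrence
    return histogram
```

The returned `Counter` also holds the all-zero vector when some amino acid has no pattern member nearby; whether that record is printed is a separate decision for the output step.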
Output
The output format is row-oriented; one line is in the form [c1 c2 ... cn]: count, where count is the value of the histogram record for [c1..cn].
The output is sorted lexicographically, i.e.
[0 0 0 1]: xxx
[0 0 0 2]: xxx
...
[0 0 1 0]: xxx
[0 0 1 2]: xxx
...
[0 0 2 1]: xxx
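A possible way to produce these lines from a histogram stored as a mapping from count tuples to totals is sketched below. `format_histogram` is a hypothetical name. Two assumptions are built in: the counts are compared as numbers, not as strings, and the all-zero record is not printed. The second one is inferred from the Example section, whose output has no [0 0] line even though the last amino acid (ALA) has no ARG or LYS within R.

```python
def format_histogram(histogram):
    """Return output lines '[c1 c2 ... cn]: count' in lexicographic order."""
    lines = []
    for counts in sorted(histogram):  # tuples of ints compare lexicographically
        if not any(counts):
            continue  # assumption: the all-zero record is omitted
        label = " ".join(str(value) for value in counts)
        lines.append("[{}]: {}".format(label, histogram[counts]))
    return lines
```

Keeping the keys as integer tuples until the last moment means the ordering comes for free from `sorted`.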
Example
Configuration file:
6000
ARG LYS
simple.pdb
Data file simple.pdb:
ARG 14872 -18107 30327
LYS 16112 -17325 26790
HIS 17615 -20594 25563
ILE 18797 -24042 26472
ARG 21860 -24523 24296
ARG 24156 -21734 23132
GLY 27393 -22378 21345
HIS 29225 -19697 19391
ALA 32741 -18808 18304
Output:
[1 0]: 1
[1 1]: 2
[2 0]: 3
[2 1]: 1
[3 0]: 1
The attached link will probably die soon, so the linked file is copied at the bottom.
Assumptions and efficiency requirements
The discrete 3D space where all the amino acids are located is large, think on the order of 100000^3. It is therefore not possible to store in memory a map with data for every point in this space.
The space is very sparsely populated with amino acids. Assume tens to small hundreds of amino acids (occupied points in space). Therefore, choose a suitable data representation so that the necessary operations are as efficient as possible.
It is certainly not efficient to search every point of the entire space for each amino acid, nor to scan all other amino acids that were read.
You may find it useful to observe that, for each amino acid, there are few enough other amino acids within the R-range in a single dimension (i.e., in the subspace [x-R..x+R, *, *]) that they can simply be searched sequentially.
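One way to use this observation, sketched under the assumption that all amino acids fit in a list in memory: sort them by x, find the slab [x-R..x+R] with binary search, and check y and z sequentially only inside that slab. `build_histogram_sorted` is a hypothetical name and this is not necessarily the intended reference solution. As in the example data, the amino acid at the center counts toward its own neighborhood.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def build_histogram_sorted(points, pattern, radius):
    """points: list of (code, (x, y, z)). Returns Counter {(c1, ..., cn): count}."""
    ordered = sorted(points, key=lambda point: point[1][0])  # sort by x
    xs = [position[0] for _, position in ordered]
    histogram = Counter()
    for _, (x, y, z) in ordered:
        low = bisect_left(xs, x - radius)    # first index with x' >= x - R
        high = bisect_right(xs, x + radius)  # one past last index with x' <= x + R
        seen = Counter()
        for code, (_, other_y, other_z) in ordered[low:high]:
            if abs(other_y - y) <= radius and abs(other_z - z) <= radius:
                seen[code] += 1
        # A Counter yields 0 for codes that never appeared in the slab.
        histogram[tuple(seen[code] for code in pattern)] += 1
    return histogram
```

Each amino acid costs two binary searches plus a scan of its slab, so the work depends on how many amino acids share the x-range, not on the size of the space.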
Configuration and data file syntax checking requirements
The primary evaluation criterion is functional correctness and efficiency on correctly entered data. The program must be stable (i.e. it must not perform undefined operations, raise unhandled exceptions, terminate uncontrollably, etc.) on any data, however corrupted.
To receive full points, the program must check the syntax of the configuration and data files. If the syntax is violated, the program writes the string "error" (to the output file or to standard output, depending on the command-line parameters) and exits with return code 0. Syntax violations include anything other than a valid 3-letter amino acid code at the start of a row, a wrong number of coordinates, non-numeric characters in coordinate positions, etc.
If a data file listed in the configuration file cannot be opened (e.g. because it does not exist), this is not an error; simply skip the file. Failing to open the configuration file is an error.
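These rules might be wired together as in the sketch below. `load_points` and `run_guarded` are hypothetical helpers, and `parse_line` stands for any row parser that raises `ValueError` on bad syntax. The sketch assumes that a blank row in a data file is reported by `parse_line` like any other malformed row and that "error" is followed by a newline; the assignment specifies neither.

```python
def load_points(paths, parse_line):
    """Parse every row of every openable data file; unopenable files are skipped."""
    points = []
    for path in paths:
        try:
            handle = open(path)
        except OSError:
            continue  # a data file that cannot be opened is not an error
        with handle:
            for line in handle:
                points.append(parse_line(line))  # ValueError propagates
    return points


def run_guarded(compute_lines, out):
    """Write the result lines, or only 'error' on failure; always return 0."""
    try:
        # compute_lines() reads the configuration and data files; OSError here
        # would come from the configuration file, ValueError from bad syntax.
        lines = compute_lines()
    except (ValueError, OSError):
        lines = ["error"]
    for line in lines:
        out.write(line + "\n")
    return 0
```

Computing all lines before writing anything means a syntax error found late cannot leave partial results in front of "error". Undecodable bytes are covered as well, because Python's `UnicodeDecodeError` is a subclass of `ValueError`.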
----
File downloadable from the link:
GLY -5902 73707 44647
PRO -6264 73743 40764
TYR -3705 71988 38542
LEU -3494 70898 34880
VAL -2843 67241 34000
ILE -2073 65571 30665
VAL -5000 63138 30106
GLU -3371 61798 26891
GLN 0298 62526 26068
PRO 1548 63200 22516
LYS 2971 60182 20620
GLN 6732 60114 21213
ARG 7588 58664 17782
GLY 6245 58612 14266
PHE 5232 62279 13870
ARG 6704 64364 11105
PHE 7776 67992 11705
ARG 7017 70083 8594
TYR 9026 72923 7063
GLY 7212 76167 6207
CYS 7470 75226 2529
GLU 5463 71982 2791
GLY 2067 73278 3865
PRO -0149 73817 6887
SER -2581 70903 6312
HIS -0429 67805 6799
GLY -1985 66380 9962
GLY -2112 67101 13681
LEU 0271 65758 16340
PRO -1704 62677 17591
GLY -2080 61172 21099
ALA -0097 58364 22710
SER -2608 55734 21533
SER -3856 56280 17981
GLU -4190 52984 16070
LYS -6365 52493 12940
GLY -9823 54116 12842
ARG -9728 55799 16290
LYS -7067 58485 16685
THR -6530 60460 19863
TYR -4810 63883 20210
PRO -2865 65278 23197
THR -5445 66082 25941
VAL -5149 67919 29242
LYS -7630 68753 31976
ILE -7906 71378 34717
CYS -8719 69194 37749
ASN -10404 70905 40716
TYR -11209 73822 38373
GLU -11458 77453 39415
GLY -13284 79824 37052
PRO -13557 79329 33267
ALA -9954 79073 32030
LYS -7299 79768 29439
ILE -4861 77005 28304
GLU -1846 77730 26095
VAL 0660 75301 24561
ASP 4077 75922 23036
LEU 7228 74017 22190
VAL 9966 73747 24855
THR 13386 72129 24601
HIS 14112 68506 25486
SER 16709 69869 27923
ASP 15438 70035 31507
PRO 14855 73696 32442
PRO 12178 73987 29700
ARG 13172 76853 27501
ALA 11308 77859 24303
HIS 12261 75907 21143
ALA 13204 77563 17826
HIS 10407 75400 16314
SER 6815 76712 16480
LEU 3416 75054 16816
VAL 1340 75471 13671
GLY -2446 75104 13090
LYS -5815 75395 14808
GLN -5961 78357 17279
CYS -2120 78878 17123
SER -0589 82386 16999
GLU 2146 83357 14620
LEU 4788 82755 17230
GLY 3830 79151 17958
ILE 1670 79732 21043
CYS -1672 77951 21035
ALA -4276 79584 23267
VAL -7716 78245 24030
SER -10487 78742 26569
VAL -11923 75715 28467
GLY -15648 76296 29200
PRO -17220 76803 32657
LYS -18205 73099 32658
ASP -16010 70721 30632
MET -12476 71383 31856
THR -10707 69209 29289
ALA -8884 70494 26198
GLN -8229 68097 23312
PHE -5664 69825 21044
ASN -7002 68392 17826
ASN -4686 70039 15291
LEU -1173 70852 16446
GLY 1745 70832 14001
VAL 5462 71549 14380
LEU 6882 74031 11870
HIS 10501 73041 11348
VAL 13044 75904 11296
THR 15901 75530 8833
LYS 19433 75655 10285
LYS 19805 78943 8383
ASN 16548 80073 10015
MET 17317 79109 13580
MET 19235 82218 14690
GLY 16743 84782 13422
THR 13784 82759 14770
MET 15698 82476 18034
ILE 16634 86225 18473
GLN 12996 87040 17579
LYS 11470 84834 20294
LEU 14178 86005 22727
GLN 13369 89704 22123
ARG 9699 88988 22587
GLN 10682 87286 25841
ARG 12753 90306 26946
LEU 10048 92859 26526
ARG 7623 90445 28155
SER 7480 90991 31936
ARG 10089 93811 31835
PRO 10058 97252 29870
GLN 9652 98057 26142
GLY 12949 98079 24275
LEU 16442 96818 23593
THR 19556 98978 23660
GLU 22375 97823 21343
ALA 24145 96236 24282
GLU 20912 94388 25119
GLN 20607 93331 21481
ARG 24262 92206 21301
GLU 23787 89912 24374
LEU 20410 88680 23083
GLU 22215 87556 19909
GLN 24750 85902 22229
GLU 22082 83603 23681
ALA 20996 82677 20179
LYS 24443 81409 19128
GLU 25101 79517 22353
LEU 21496 78247 22381
LYS 21766 76912 18827
LYS 24900 74822 19619
VAL 23010 72824 22323
MET 19482 72631 20887
ASP 18136 69379 19421
LEU 16045 70184 16340
SER 14729 66589 16289
ILE 13135 66618 19726
VAL 10525 68948 21257
ARG 8191 68845 24255
LEU 4723 70383 24547
ARG 4344 72756 27479
PHE 0787 73383 28638
SER 0296 76558 30645
ALA -3150 76794 32224
PHE -4588 80131 33465
LEU -7519 80693 35772
ARG -9333 83809 34513
SER -4554 86056 33994
LEU -3785 83785 37002
PRO -1175 81039 36386
LEU -1311 77353 37450
LYS 1867 75230 37378
PRO 3065 74417 33818
VAL 2985 70745 32807
ILE 5618 69450 30341
SER 4877 66389 28188
GLN 7291 63642 27201
PRO 9670 64213 24228
ILE 8346 64193 20665
HIS 10762 62906 18060
ASP 10791 64095 14467
SER 10061 61036 12283
LYS 12033 62736 9520
SER 15216 62997 11542
PRO 17245 59950 10317
GLY 17438 58727 13939
ALA 13952 59279 15418
SER 11119 57328 13782
ASN 9027 54290 14618
LEU 11099 51143 14313
LYS 9026 49054 11907
ILE 9506 45573 10446
SER 7956 45698 6957
ARG 8694 42101 6048
MET 11300 39415 6458
ASP 12873 36586 4550
LYS 12076 33624 6745
THR 10020 33044 9879
ALA 11498 29718 10916
GLY 14891 28175 11319
SER 16964 25704 13281
VAL 17143 25680 17052
ARG 20963 25794 16510
GLY 20824 29465 15507
GLY 22848 30896 12662
ASP 20178 31069 9967
GLU 20493 34243 7847
VAL 17354 36452 7654
TYR 16848 39532 5394
LEU 14658 42018 7295
LEU 13119 45024 5485
CYS 12551 47987 7815
ASP 11835 51741 7949
LYS 14999 53877 8126
VAL 17326 52835 10966
GLN 20796 53806 12137
LYS 23533 51175 11547
ASP 25093 51639 14954
ASP 22001 52243 17053
ILE 19859 49238 16143
GLU 19426 45655 17326
VAL 17308 42563 16717
ARG 15951 41370 20044
PHE 14592 37820 20221
TYR 12631 36842 23322
GLU 9749 34819 24787
ASP 8051 36240 27899
ASP 6886 33394 30254
GLU 7554 33506 34056
ASN 11262 33452 33047
GLY 12252 35701 30103
TRP 14890 34456 27603
GLN 16442 37019 25278
ALA 19102 36908 22576
PHE 20204 38978 19590
GLY 20627 38977 15814
ASP 24156 38799 14470
PHE 25101 41451 11922
SER 27787 44113 11384
PRO 26997 47664 10160
THR 27811 46422 6610
ASP 24814 44143 7078
VAL 22504 47113 7550
HIS 21818 48009 3911
LYS 21168 51714 3043
GLN 19246 52071 6376
TYR 16281 50029 5037
ALA 17352 46377 5349
ILE 19238 44272 7870
VAL 20851 40933 6960
PHE 21363 39049 10191
ARG 22067 35534 11465
THR 19508 34280 14044
PRO 20741 33513 17538
PRO 20765 29990 19008
TYR 17734 28967 21094
HIS 18353 28429 24837
LYS 17559 24665 24766
MET 19471 22662 22095
LYS 17386 19628 23051
ILE 13946 20695 21789
GLU 11678 17961 20507
ARG 8822 20158 19311
PRO 9074 23447 17403
VAL 8841 26555 19640
THR 7655 30009 18645
VAL 9550 32969 20076
PHE 9433 36593 18957
LEU 11879 39031 17452
GLN 11742 42784 17347
LEU 13867 45695 16363
LYS 15128 47655 19342
ARG 17014 50927 19625
LYS 20262 50606 21626
ARG 20462 54056 23188
GLY 16657 54294 23601
GLY 14906 50997 24262
ASP 11870 51519 22010
VAL 10922 48525 19942
SER 9069 47812 16756
ASP 5930 45668 16633
SER 7200 42113 17169
LYS 7071 39374 14520
GLN 7145 35655 15455
PHE 9903 33186 14637
THR 9762 29446 15181
TYR 12667 27357 16209
TYR 12735 23834 14772
PRO 14682 20629 15825
GLY 37829 72937 -44895
PRO 39423 72794 -41437
TYR 37085 70892 -39150
LEU 37150 69964 -35506
VAL 36522 66345 -34541
ILE 35992 64789 -31118
VAL 38760 62171 -30602
GLU 37605 60944 -27176
GLN 33947 61358 -26113
PRO 32931 62292 -22574
LYS 31774 59246 -20589
GLN 27986 58978 -20925
ARG 27254 57588 -17370
GLY 29050 57545 -14042
PHE 29709 61297 -13617
ARG 27942 63593 -11120
PHE 26922 67248 -11435
ARG 27902 69285 -8385
TYR 26066 72303 -6889
GLY 27852 75521 -5996
CYS 27794 74536 -2296
GLU 29922 71490 -2950
GLY 33186 73050 -4097
PRO 35508 73107 -7144
SER 37760 70053 -6401
HIS 35373 67158 -7101
GLY 36772 65325 -10144
GLY 36952 66091 -13863
LEU 34560 64640 -16456
PRO 36434 61522 -17763
GLY 36657 60146 -21346
ALA 34448 57382 -22743
SER 37231 54895 -21834
SER 38620 55456 -18366
GLU 39077 52164 -16546
LYS 41392 51549 -13499
GLY 44617 53604 -13698
ARG 44243 54637 -17369
LYS 42031 57687 -17194
THR 41083 59663 -20266
TYR 39365 63079 -20564
PRO 37316 64423 -23507
THR 39574 65366 -26395
VAL 39122 67040 -29740
LYS 41373 68063 -32609
ILE 41324 70539 -35486
CYS 41946 68584 -38686
ASN 43403 69977 -41943
TYR 44839 72761 -39735
GLU 44771 76441 -40494
GLY 46301 79189 -38229
PRO 47095 78906 -34391
ALA 43648 78184 -32947
LYS 41316 78587 -30054
ILE 38732 76075 -28770
GLU 35780 77030 -26562
VAL 33632 74468 -24686
ASP 30188 75142 -23224
LEU 27199 73091 -22019
VAL 24297 72664 -24407
THR 21014 70880 -24011
HIS 20088 67351 -24992
SER 17220 68465 -27306
ASP 18198 69046 -30976
PRO 18611 72685 -31991
PRO 21409 72986 -29380
ARG 20731 75777 -26945
ALA 22892 76815 -23963
HIS 22058 74953 -20771
ALA 21474 76500 -17328
HIS 24519 74535 -16100
SER 28005 75924 -16368
LEU 31391 74300 -16789
VAL 33790 74805 -13893
GLY 37519 74517 -13335
LYS 40668 74478 -15402
GLN 40425 77592 -17526
CYS 36675 78042 -17053
SER 35058 81522 -17152
GLU 32580 82666 -14533
LEU 29870 82035 -17146
GLY 30647 78441 -17876
ILE 32746 78936 -21037
CYS 35977 76948 -21311
ALA 38437 78503 -23737
VAL 41905 77243 -24548
SER 44516 77856 -27256
VAL 45636 74954 -29506
GLY 49323 75501 -30280
PRO 50759 76164 -33752
LYS 51624 72523 -34351
ASP 49767 70304 -31866
MET 46101 70914 -32799
THR 44598 68639 -30157
ALA 43079 69783 -26877
GLN 42566 67450 -23924
PHE 39931 68678 -21442
ASN 41585 67449 -18259
ASN 39319 69052 -15668
LEU 35681 69631 -16569
GLY 33250 70024 -13721
VAL 29536 70474 -14319
LEU 27859 73088 -12154
HIS 24307 72231 -11125
VAL 22013 75266 -11054
THR 19235 75040 -8477
LYS 15623 75391 -9703
LYS 15398 78571 -7664
ASN 18379 79795 -9696
MET 17636 78545 -13167
MET 15411 81499 -14058
GLY 18078 83957 -12924
THR 21083 82429 -14703
MET 19053 81406 -17729
ILE 18243 85108 -18162
GLN 21916 86068 -17595
LYS 22788 83669 -20436
LEU 20118 84884 -22882
GLN 21005 88549 -22110
ARG 24672 87693 -22616
GLN 23671 85807 -25723
ARG 21295 88476 -27022
LEU 23322 91574 -26330
ARG 26019 89432 -27930
SER 25805 89806 -31728
ARG 23461 92803 -31639
PRO 23802 95989 -29396
GLN 24416 96566 -25613
GLY 21159 96897 -23672
LEU 17588 95752 -23213
THR 14599 98104 -23056
GLU 11835 97171 -20554
ALA 9898 95329 -23247
GLU 13115 93418 -24002
GLN 13797 92428 -20363
ARG 10196 91204 -20045
GLU 10419 88862 -23062
LEU 13685 87547 -21613
GLU 12001 86316 -18456
GLN 9461 84426 -20610
GLU 11916 82414 -22732
ALA 13423 81462 -19352
LYS 10147 80698 -17671
GLU 9112 78601 -20684
LEU 12647 77405 -21077
LYS 12712 76076 -17490
LYS 9600 73963 -18084
VAL 11260 72077 -20982
MET 14891 71883 -19786
ASP 16243 68527 -18513
LEU 18654 69339 -15747
SER 20064 65778 -15666
ILE 21517 65642 -19203
VAL 23951 68034 -20892
ARG 26031 67834 -24067
LEU 29538 69313 -24601
ARG 29970 71840 -27425
PHE 33360 72571 -28962
SER 33548 75789 -30986
ALA 36966 76344 -32621
PHE 38292 79719 -33893
LEU 41297 80162 -36201
ARG 43042 83517 -35791
SER 38086 85130 -34988
LEU 37045 82737 -37790
PRO 34565 80041 -36628
LEU 34793 76331 -37577
LYS 31477 74455 -37338
PRO 30630 73532 -33676
VAL 30917 69785 -32897
ILE 28544 68398 -30256
SER 29373 65369 -28130
GLN 27045 62567 -27060
PRO 24822 63154 -23987
ILE 26287 63144 -20432
HIS 23973 61973 -17642
ASP 23935 63028 -13955
SER 24990 60041 -11793
LYS 23222 61538 -8765
SER 19909 62085 -10494
PRO 17509 59227 -9469
GLY 16966 58164 -13095
ALA 20657 57955 -14074
SER 22205 56453 -10944
ASN 24644 53523 -11154
LEU 22663 50243 -10992
LYS 24099 47657 -8536
ILE 22975 44300 -7088
SER 24336 44221 -3560
ARG 23105 40924 -2045
MET 20579 38271 -2955
ASP 18332 36099 -0843
LYS 18801 32885 -2669
THR 20739 32013 -5772
ALA 18894 28851 -6708
GLY 15322 27664 -7134
SER 13047 25395 -9187
VAL 12766 25518 -12968
ARG 9037 26422 -12579
GLY 10151 29791 -11316
GLY 7869 31477 -8797
ASP 10454 31473 -6040
GLU 10393 34614 -3941
VAL 13711 36507 -3830
TYR 14814 39437 -1636
LEU 17392 41375 -3661
LEU 19020 44375 -2038
CYS 20329 46965 -4451
ASP 21480 50568 -4929
LYS 18665 53141 -5412
VAL 16190 52357 -8208
GLN 12876 53968 -9091
LYS 9999 51542 -8573
ASP 8331 52370 -11854
ASP 11563 52273 -13922
ILE 13324 49080 -12825
GLU 13381 45447 -13927
VAL 15164 42232 -12928
ARG 16189 40548 -16147
PHE 17234 36834 -16128
TYR 19061 35699 -19259
GLU 21626 33484 -20983
ASP 23772 34090 -24092
ASP 24587 31202 -26397
GLU 24101 31390 -30186
ASN 20364 31765 -29327
GLY 19543 33559 -26099
TRP 16677 33700 -23613
GLN 15344 36329 -21221
ALA 12637 36581 -18586
PHE 11740 39050 -15875
GLY 11248 38679 -12138
ASP 7632 39425 -11182
PHE 6800 42104 -8657
SER 4558 45119 -8051
PRO 5610 48719 -7102
THR 4730 47736 -3511
ASP 7510 45118 -3736
VAL 10258 47658 -4256
HIS 11094 48572 -0658
LYS 12111 52249 -0233
GLN 14117 52266 -3519
TYR 16778 49927 -2025
ALA 15286 46475 -2137
ILE 13155 44447 -4511
VAL 11069 41418 -3389
PHE 10212 39404 -6508
ARG 9213 35982 -7791
THR 11689 34430 -10225
PRO 10526 33672 -13764
PRO 10450 30155 -15221
TYR 13435 28807 -17212
HIS 12592 27953 -20867
LYS 13119 24168 -20635
MET 11163 22511 -17793
LYS 12513 19051 -18392
ILE 16171 19712 -17420
GLU 17906 16675 -16002
ARG 20739 18749 -14442
PRO 20795 22205 -12656
VAL 21420 25231 -14950
THR 23035 28549 -13979
VAL 21787 31726 -15708
PHE 22609 35461 -15284
LEU 20442 38158 -13807
GLN 20811 41936 -13908
LEU 18869 45057 -12875
LYS 17767 47105 -15890
ARG 15886 50384 -16147
LYS 12986 49924 -18601
ARG 13828 53217 -20348
GLY 17599 53680 -20335
GLY 18712 50174 -21345
ASP 21222 50776 -18587
VAL 22137 47629 -16586
SER 23898 46535 -13407
ASP 26794 44083 -13346
SER 25489 40525 -13460
LYS 24824 38022 -10674
GLN 23829 34354 -11076
PHE 20884 32126 -10268
THR 20616 28329 -10690
TYR 17440 26526 -11742
TYR 16906 23050 -10274
PRO 15210 20064 -12063