GenomicRangeQuery/codility/preFix sums

22233 ワード

まずテーマを書きます.
A DNA sequence can be represented as a string consisting of the letters A, C, G and T, which correspond to the types of successive nucleotides in the sequence. Each nucleotide has an impact factor, which is an integer. Nucleotides of types A, C, G and T have impact factors of 1, 2, 3 and 4, respectively. You are going to answer several queries of the form: What is the minimal impact factor of nucleotides contained in a particular part of the given DNA sequence?
The DNA sequence is given as a non-empty string S =S[0]S[1]...S[N-1] consisting of N characters. There are M queries, which are given in non-empty arrays P and Q, each consisting of M integers. The K-th query (0 ≤ K < M) requires you to find the minimal impact factor of nucleotides contained in the DNA sequence between positions P[K] and Q[K] (inclusive).
For example, consider string S = CAGCCTA and arrays P, Q such that:
 P[0] = 2 Q[0] = 4 P[1] = 5 Q[1] = 5 P[2] = 0 Q[2] = 6

The answers to these M = 3 queries are as follows:
  • The part of the DNA between positions 2 and 4 contains nucleotides G and C (twice), whose impact factors are 3 and 2 respectively, so the answer is 2.
  • The part between positions 5 and 5 contains a single nucleotide T, whose impact factor is 4, so the answer is 4.
  • The part between positions 0 and 6 (the whole string) contains all nucleotides, in particular nucleotide Awhose impact factor is 1, so the answer is 1.

  • Assume that the following declarations are given:
    struct Results {  int * A;  int M;};
    Write a function:
    struct Results solution(char *S, int P[], int Q[], int M);
    that, given a non-empty zero-indexed string S consisting of N characters and two non-empty zero-indexed arrays P and Q consisting of M integers, returns an array consisting of M integers specifying the consecutive answers to all queries.
    The sequence should be returned as:
  • a Results structure (in C), or
  • a vector of integers (in C++), or
  • a Results record (in Pascal), or
  • an array of integers (in any other programming language).

  • For example, given the string S = CAGCCTA and arrays P, Q such that:
     P[0] = 2 Q[0] = 4 P[1] = 5 Q[1] = 5 P[2] = 0 Q[2] = 6

    the function should return the values [2, 4, 1], as explained above.
    Assume that:
  • N is an integer within the range [1..100,000];
  • M is an integer within the range [1..50,000];
  • each element of arrays P, Q is an integer within the range [0..N − 1];
  • P[K] ≤ Q[K], where 0 ≤ K < M;
  • string S consists only of upper-case English letters A, C, G, T.

  • Complexity:
  • expected worst-case time complexity is O(N+M);
  • expected worst-case space complexity is O(N), beyond input storage (not counting the storage required for input arguments).

  • Elements of input arrays can be modified.
    Copyright 2009–2015 by Codility Limited. All Rights Reserved. Unauthorized copying, publication or disclosure prohibited.
     
    2.テーマ分析
    このテーマは簡単に見えますが、1つのシーケンスのいずれかのsliceで、現れた最小値を見つけます.
    しかし,要求時間の複雑さはO(N)であり,いくつかの苦労を費やした.
       1. まずstrをint配列に変換し,毎回遍歴しないように後の処理を加速する.
    strlen関数を用いてstrの長さを求め,tempポインタを介してこのstrを遍歴し,Aを1,Cを2,Gを3,Tを4に変えた.
    この配列を手に入れた方が操作しやすい~
       2. 任意のセグメントの最小値を得るのは簡単そうに見えますが、遍歴すればいいのですが、時間の複雑さはO(N*M)になります.
    時間の複雑さを減らすにはどうすればいいですか?答えは空間で時間を変えることです.
    prefixの考え方により,各位置の前にこの位置,1出現回数,2出現回数,3出現回数,4出現回数を含めて集計した.
    配列A,B,C,Dをそれぞれ格納する.
    クエリーのたびに、A[end of slice]-A[Begin of Slice]がゼロより大きいかどうかを判断すれば、1が発生したかどうかを判断できます.このステップの時間複雑度はO(N)からO(1)に低下する.
    str[begin of slice]が1であるかどうかを確認する必要がある.この位置が1である場合、表現されないからである.
    残りのクエリは、スペースの複雑さを増すことで、時間の複雑さを低減します.
    最小の出現値を見つければよい.
     
    3.コードは次のとおりです.
      1 // you can write to stdout for debugging purposes, e.g.
    
      2 // printf("this is a debug message
    ");
    3 4 struct Results solution(char *S, int P[], int Q[], int M) { 5 struct Results result; 6 // write your code in C99 7 int len = strlen(S); 8 // printf("len is %d",len); 9 10 11 int arr[len]; 12 int i; 13 char* temp = S; 14 for(i=0;i<len;i++) 15 { 16 if(*temp=='C') 17 { 18 arr[i]=2; 19 } 20 else if(*temp=='A') 21 { 22 arr[i]=1; 23 } 24 else if(*temp=='G') 25 { 26 arr[i]=3; 27 } 28 else if(*temp=='T') 29 { 30 arr[i]=4; 31 } 32 temp++; 33 } 34 35 int A[len]; 36 int B[len]; 37 int C[len]; 38 int D[len]; 39 40 if(arr[0]==1) 41 { 42 A[0]=1; 43 B[0]=0; 44 C[0]=0; 45 D[0]=0; 46 } 47 48 if(arr[0]==2) 49 { 50 A[0]=0; 51 B[0]=1; 52 C[0]=0; 53 D[0]=0; 54 } 55 56 57 if(arr[0]==3) 58 { 59 A[0]=0; 60 B[0]=0; 61 C[0]=1; 62 D[0]=0; 63 } 64 65 if(arr[0]==4) 66 { 67 A[0]=0; 68 B[0]=0; 69 C[0]=0; 70 D[0]=1; 71 } 72 73 // printf("%d %d %d %d
    ",A[0],B[0],C[0],D[0]);
    74 for(i=1;i<len;i++) 75 { 76 // printf("%d
    ",arr[i]);
    77 if(arr[i]==1) 78 { 79 A[i]=A[i-1]+1; 80 B[i]=B[i-1]; 81 C[i]=C[i-1]; 82 D[i]=D[i-1]; 83 } 84 85 if(arr[i]==2) 86 { 87 A[i]=A[i-1]; 88 B[i]=B[i-1]+1; 89 C[i]=C[i-1]; 90 D[i]=D[i-1]; 91 } 92 93 94 if(arr[i]==3) 95 { 96 A[i]=A[i-1]; 97 B[i]=B[i-1]; 98 C[i]=C[i-1]+1; 99 D[i]=D[i-1]; 100 } 101 102 if(arr[i]==4) 103 { 104 A[i]=A[i-1]; 105 B[i]=B[i-1]; 106 C[i]=C[i-1]; 107 D[i]=D[i-1]+1; 108 } 109 110 // printf("%d %d %d %d
    ",A[i],B[i],C[i],D[i]);
    111 } 112 result.A = malloc(sizeof(int)*M); 113 result.M = M; 114 115 for(i=0;i<M;i++) 116 { 117 int tempP = P[i]; 118 int tempQ = Q[i]; 119 if(arr[tempP]==1 || (A[tempQ]-A[tempP]) > 0) 120 { 121 result.A[i]=1; 122 } 123 else if(arr[tempP]==2 || (B[tempQ]-B[tempP]) > 0) 124 { 125 result.A[i]=2; 126 } 127 else if(arr[tempP]==3 || (C[tempQ]-C[tempP]) > 0) 128 { 129 result.A[i]=3; 130 } 131 else 132 { 133 result.A[i]=4; 134 } 135 } 136 137 138 return result; 139 }