GenomicRangeQuery/codility/preFix sums
22233 ワード
まずテーマを書きます.
A DNA sequence can be represented as a string consisting of the letters A, C, G and T, which correspond to the types of successive nucleotides in the sequence. Each nucleotide has an impact factor, which is an integer. Nucleotides of types A, C, G and T have impact factors of 1, 2, 3 and 4, respectively. You are going to answer several queries of the form: What is the minimal impact factor of nucleotides contained in a particular part of the given DNA sequence?
The DNA sequence is given as a non-empty string S =S[0]S[1]...S[N-1] consisting of N characters. There are M queries, which are given in non-empty arrays P and Q, each consisting of M integers. The K-th query (0 ≤ K < M) requires you to find the minimal impact factor of nucleotides contained in the DNA sequence between positions P[K] and Q[K] (inclusive).
For example, consider string S = CAGCCTA and arrays P, Q such that:
The answers to these M = 3 queries are as follows: The part of the DNA between positions 2 and 4 contains nucleotides G and C (twice), whose impact factors are 3 and 2 respectively, so the answer is 2. The part between positions 5 and 5 contains a single nucleotide T, whose impact factor is 4, so the answer is 4. The part between positions 0 and 6 (the whole string) contains all nucleotides, in particular nucleotide Awhose impact factor is 1, so the answer is 1.
Assume that the following declarations are given:
struct Results { int * A; int M;};
Write a function:
struct Results solution(char *S, int P[], int Q[], int M);
that, given a non-empty zero-indexed string S consisting of N characters and two non-empty zero-indexed arrays P and Q consisting of M integers, returns an array consisting of M integers specifying the consecutive answers to all queries.
The sequence should be returned as: a Results structure (in C), or a vector of integers (in C++), or a Results record (in Pascal), or an array of integers (in any other programming language).
For example, given the string S = CAGCCTA and arrays P, Q such that:
the function should return the values [2, 4, 1], as explained above.
Assume that: N is an integer within the range [1..100,000]; M is an integer within the range [1..50,000]; each element of arrays P, Q is an integer within the range [0..N − 1]; P[K] ≤ Q[K], where 0 ≤ K < M; string S consists only of upper-case English letters A, C, G, T.
Complexity: expected worst-case time complexity is O(N+M); expected worst-case space complexity is O(N), beyond input storage (not counting the storage required for input arguments).
Elements of input arrays can be modified.
Copyright 2009–2015 by Codility Limited. All Rights Reserved. Unauthorized copying, publication or disclosure prohibited.
2.テーマ分析
このテーマは簡単に見えますが、1つのシーケンスのいずれかのsliceで、現れた最小値を見つけます.
しかし,要求時間の複雑さはO(N)であり,いくつかの苦労を費やした.
1. まずstrをint配列に変換し,毎回遍歴しないように後の処理を加速する.
strlen関数を用いてstrの長さを求め,tempポインタを介してこのstrを遍歴し,Aを1,Cを2,Gを3,Tを4に変えた.
この配列を手に入れた方が操作しやすい~
2. 任意のセグメントの最小値を得るのは簡単そうに見えますが、遍歴すればいいのですが、時間の複雑さはO(N*M)になります.
時間の複雑さを減らすにはどうすればいいですか?答えは空間で時間を変えることです.
prefixの考え方により,各位置の前にこの位置,1出現回数,2出現回数,3出現回数,4出現回数を含めて集計した.
配列A,B,C,Dをそれぞれ格納する.
クエリーのたびに、A[end of slice]-A[Begin of Slice]がゼロより大きいかどうかを判断すれば、1が発生したかどうかを判断できます.このステップの時間複雑度はO(N)からO(1)に低下する.
str[begin of slice]が1であるかどうかを確認する必要がある.この位置が1である場合、表現されないからである.
残りのクエリは、スペースの複雑さを増すことで、時間の複雑さを低減します.
最小の出現値を見つければよい.
3.コードは次のとおりです.
A DNA sequence can be represented as a string consisting of the letters A, C, G and T, which correspond to the types of successive nucleotides in the sequence. Each nucleotide has an impact factor, which is an integer. Nucleotides of types A, C, G and T have impact factors of 1, 2, 3 and 4, respectively. You are going to answer several queries of the form: What is the minimal impact factor of nucleotides contained in a particular part of the given DNA sequence?
The DNA sequence is given as a non-empty string S =S[0]S[1]...S[N-1] consisting of N characters. There are M queries, which are given in non-empty arrays P and Q, each consisting of M integers. The K-th query (0 ≤ K < M) requires you to find the minimal impact factor of nucleotides contained in the DNA sequence between positions P[K] and Q[K] (inclusive).
For example, consider string S = CAGCCTA and arrays P, Q such that:
P[0] = 2 Q[0] = 4 P[1] = 5 Q[1] = 5 P[2] = 0 Q[2] = 6
The answers to these M = 3 queries are as follows:
Assume that the following declarations are given:
struct Results { int * A; int M;};
Write a function:
struct Results solution(char *S, int P[], int Q[], int M);
that, given a non-empty zero-indexed string S consisting of N characters and two non-empty zero-indexed arrays P and Q consisting of M integers, returns an array consisting of M integers specifying the consecutive answers to all queries.
The sequence should be returned as:
For example, given the string S = CAGCCTA and arrays P, Q such that:
P[0] = 2 Q[0] = 4 P[1] = 5 Q[1] = 5 P[2] = 0 Q[2] = 6
the function should return the values [2, 4, 1], as explained above.
Assume that:
Complexity:
Elements of input arrays can be modified.
Copyright 2009–2015 by Codility Limited. All Rights Reserved. Unauthorized copying, publication or disclosure prohibited.
2.テーマ分析
このテーマは簡単に見えますが、1つのシーケンスのいずれかのsliceで、現れた最小値を見つけます.
しかし,要求時間の複雑さはO(N)であり,いくつかの苦労を費やした.
1. まずstrをint配列に変換し,毎回遍歴しないように後の処理を加速する.
strlen関数を用いてstrの長さを求め,tempポインタを介してこのstrを遍歴し,Aを1,Cを2,Gを3,Tを4に変えた.
この配列を手に入れた方が操作しやすい~
2. 任意のセグメントの最小値を得るのは簡単そうに見えますが、遍歴すればいいのですが、時間の複雑さはO(N*M)になります.
時間の複雑さを減らすにはどうすればいいですか?答えは空間で時間を変えることです.
prefixの考え方により,各位置の前にこの位置,1出現回数,2出現回数,3出現回数,4出現回数を含めて集計した.
配列A,B,C,Dをそれぞれ格納する.
クエリーのたびに、A[end of slice]-A[Begin of Slice]がゼロより大きいかどうかを判断すれば、1が発生したかどうかを判断できます.このステップの時間複雑度はO(N)からO(1)に低下する.
str[begin of slice]が1であるかどうかを確認する必要がある.この位置が1である場合、表現されないからである.
残りのクエリは、スペースの複雑さを増すことで、時間の複雑さを低減します.
最小の出現値を見つければよい.
3.コードは次のとおりです.
1 // you can write to stdout for debugging purposes, e.g.
2 // printf("this is a debug message
");
3
4 struct Results solution(char *S, int P[], int Q[], int M) {
5 struct Results result;
6 // write your code in C99
7 int len = strlen(S);
8 // printf("len is %d",len);
9
10
11 int arr[len];
12 int i;
13 char* temp = S;
14 for(i=0;i<len;i++)
15 {
16 if(*temp=='C')
17 {
18 arr[i]=2;
19 }
20 else if(*temp=='A')
21 {
22 arr[i]=1;
23 }
24 else if(*temp=='G')
25 {
26 arr[i]=3;
27 }
28 else if(*temp=='T')
29 {
30 arr[i]=4;
31 }
32 temp++;
33 }
34
35 int A[len];
36 int B[len];
37 int C[len];
38 int D[len];
39
40 if(arr[0]==1)
41 {
42 A[0]=1;
43 B[0]=0;
44 C[0]=0;
45 D[0]=0;
46 }
47
48 if(arr[0]==2)
49 {
50 A[0]=0;
51 B[0]=1;
52 C[0]=0;
53 D[0]=0;
54 }
55
56
57 if(arr[0]==3)
58 {
59 A[0]=0;
60 B[0]=0;
61 C[0]=1;
62 D[0]=0;
63 }
64
65 if(arr[0]==4)
66 {
67 A[0]=0;
68 B[0]=0;
69 C[0]=0;
70 D[0]=1;
71 }
72
73 // printf("%d %d %d %d
",A[0],B[0],C[0],D[0]);
74 for(i=1;i<len;i++)
75 {
76 // printf("%d
",arr[i]);
77 if(arr[i]==1)
78 {
79 A[i]=A[i-1]+1;
80 B[i]=B[i-1];
81 C[i]=C[i-1];
82 D[i]=D[i-1];
83 }
84
85 if(arr[i]==2)
86 {
87 A[i]=A[i-1];
88 B[i]=B[i-1]+1;
89 C[i]=C[i-1];
90 D[i]=D[i-1];
91 }
92
93
94 if(arr[i]==3)
95 {
96 A[i]=A[i-1];
97 B[i]=B[i-1];
98 C[i]=C[i-1]+1;
99 D[i]=D[i-1];
100 }
101
102 if(arr[i]==4)
103 {
104 A[i]=A[i-1];
105 B[i]=B[i-1];
106 C[i]=C[i-1];
107 D[i]=D[i-1]+1;
108 }
109
110 // printf("%d %d %d %d
",A[i],B[i],C[i],D[i]);
111 }
112 result.A = malloc(sizeof(int)*M);
113 result.M = M;
114
115 for(i=0;i<M;i++)
116 {
117 int tempP = P[i];
118 int tempQ = Q[i];
119 if(arr[tempP]==1 || (A[tempQ]-A[tempP]) > 0)
120 {
121 result.A[i]=1;
122 }
123 else if(arr[tempP]==2 || (B[tempQ]-B[tempP]) > 0)
124 {
125 result.A[i]=2;
126 }
127 else if(arr[tempP]==3 || (C[tempQ]-C[tempP]) > 0)
128 {
129 result.A[i]=3;
130 }
131 else
132 {
133 result.A[i]=4;
134 }
135 }
136
137
138 return result;
139 }