TCHS-1-500
Problem Statement
In written languages, some symbols may appear more often than others. Expected frequency tables have been defined for many languages. For each symbol in a language, a frequency table will contain its expected percentage in a typical passage written in that language. For example, if the symbol "a" has an expected percentage of 5, then 5% of the letters in a typical passage will be "a". If a passage contains 350 letters, then 'a' has an expected count of 17.5 for that passage (17.5 = 350 * 5%). Please note that the expected count can be a non-integer value.
The deviation of a text with respect to a language frequency table can be computed in the following manner. For each letter ('a'-'z') determine the difference between the expected count and the actual count in the text. The deviation is the sum of the squares of these differences. Blank spaces (' ') and line breaks (each element of text is a line) are ignored when calculating percentages.
Each frequency table will be described as a concatenation of up to 16 strings of the form "ANN", where A is a lowercase letter ('a'-'z') and NN its expected frequency as a two-digit percentage between "00" (meaning 0%) and "99" (meaning 99%), inclusive. Any letter not appearing in a table is not expected to appear in a typical passage (0%).
You are given a String[] frequencies of frequency tables of different languages. Return the lowest deviation the given text has with respect to the frequency tables.
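The deviation described above can also be written as a single formula. The symbols are notation introduced here for illustration and do not appear in the original statement: N is the number of letters in the text (spaces excluded), p_c is the expected percentage of letter c in the table, and n_c is the actual count of letter c in the text.

```latex
D = \sum_{c=\text{a}}^{\text{z}} \left( \frac{N \, p_c}{100} - n_c \right)^{2}
```

With N = 350 and p_a = 5, the expected count N * p_a / 100 is 17.5, which matches the example in the statement.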
Definition
Class: SymbolFrequency
Method: language
Parameters: String[], String[]
Returns: double
Method signature: double language(String[] frequencies, String[] text)
(be sure your method is public)
Notes
- The returned value must be accurate to within a relative or absolute error of 1E-9.
Constraints
- frequencies will contain between 1 and 10 elements, inclusive.
- Each element of frequencies will be formatted as described in the statement.
- Each element of frequencies will contain between 6 and 48 characters, inclusive.
- No letter will appear twice in the same element of frequencies.
- The sum of the percentages in each element of frequencies will be equal to 100.
- text will contain between 1 and 10 elements, inclusive.
- Each element of text will contain between 1 and 50 characters, inclusive.
- Each element of text will contain only lowercase letters ('a'-'z') and spaces (' ').
- text will have at least one non-space character.
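The "ANN" table format and the constraints above (three characters per entry, percentages summing to 100) suggest a simple fixed-stride parse. The following is a minimal sketch, not part of the original problem materials; the class name FrequencyTableSketch and the helpers parseTable and total are hypothetical names chosen for illustration.

```java
public class FrequencyTableSketch {
    // Hypothetical helper: parses one table such as "a30b30c40" into
    // 26 expected percentages, indexed by letter ('a' -> 0, ..., 'z' -> 25).
    static int[] parseTable(String table) {
        int[] percent = new int[26]; // letters not listed keep 0%
        for (int i = 0; i + 3 <= table.length(); i += 3) {
            char letter = table.charAt(i);
            int value = Integer.parseInt(table.substring(i + 1, i + 3));
            percent[letter - 'a'] = value;
        }
        return percent;
    }

    // Hypothetical helper: sum of all percentages in a parsed table.
    // The constraints guarantee this is 100 for valid input.
    static int total(int[] percent) {
        int sum = 0;
        for (int p : percent) {
            sum += p;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] table = parseTable("a30b30c40");
        // prints "30 30 40 100"
        System.out.println(table[0] + " " + table[1] + " " + table[2] + " " + total(table));
    }
}
```

Because every entry is exactly three characters long, no delimiter handling is needed, and a two-digit value such as "05" parses to 5.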
Examples
0)
{"a30b30c40","a20b40c40"}
{"aa bbbb cccc"}
Returns: 0.0
The first table indicates that 30% of the letters are expected to be 'a', 30% to be 'b', and 40% to be 'c'. The second table indicates that 20% are expected to be 'a', 40% to be 'b', and 40% to be 'c'. We consider the text to have length 10, as blank spaces are ignored. With respect to the first table, there are 2 'a' where 3 were expected (a difference of 1), one more 'b' than expected (again a difference of 1) and as many 'c' as expected. The sum of the squares of those numbers gives a deviation of 2.0. As for the second table, the text matches expected counts exactly, so its deviation with respect to that language is 0.0.
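The arithmetic in this explanation can be checked with a short program. This is a sketch under stated assumptions rather than the reference solution: the class name DeviationSketch and the helper deviation are hypothetical, and the two tables are entered directly as percentage arrays instead of being parsed from strings.

```java
public class DeviationSketch {
    // Hypothetical helper: sum of squared differences between expected and
    // actual letter counts. percent[i] is the expected percentage of the
    // letter ('a' + i); spaces in the text are ignored.
    static double deviation(int[] percent, String[] text) {
        int[] actual = new int[26];
        int letters = 0;
        for (String line : text) {
            for (char c : line.toCharArray()) {
                if (c == ' ') {
                    continue;
                }
                actual[c - 'a']++;
                letters++;
            }
        }
        double sum = 0.0;
        for (int i = 0; i < 26; i++) {
            double diff = letters * percent[i] / 100.0 - actual[i];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] text = {"aa bbbb cccc"};
        int[] first = new int[26];   // table "a30b30c40"
        first[0] = 30; first[1] = 30; first[2] = 40;
        int[] second = new int[26];  // table "a20b40c40"
        second[0] = 20; second[1] = 40; second[2] = 40;
        System.out.println(deviation(first, text));  // prints 2.0
        System.out.println(deviation(second, text)); // prints 0.0
    }
}
```

The smaller of the two values, 0.0, is the answer for this example.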
1)
Returns: 2.0
2)
Returns: 10.8
3)
Returns: 130.6578
4)
Returns: 114.9472
5)
Returns: 495050.0
import java.util.*;

public class SymbolFrequency {
    static final int N = 26; // letters 'a'..'z'

    public double language(String[] freqs, String[] text) {
        // Parse each table: every entry is 3 characters, a letter followed by a two-digit percentage.
        List<int[]> maps = new ArrayList<int[]>();
        for (int i = 0; i < freqs.length; i++) {
            int[] map = new int[N]; // letters absent from the table stay at 0%
            for (int j = 0; j < freqs[i].length(); j += 3)
                map[freqs[i].charAt(j) - 'a'] = Integer.parseInt(freqs[i].substring(j + 1, j + 3));
            maps.add(map);
        }

        // Count the letters in the text; spaces are ignored.
        int count = 0;
        int[] freq = new int[N];
        for (int i = 0; i < text.length; i++)
            for (int j = 0; j < text[i].length(); j++) {
                char c = text[i].charAt(j);
                if (Character.isLowerCase(c)) {
                    count++;
                    freq[c - 'a']++;
                }
            }

        // Keep the smallest sum of squared differences over all tables.
        double min = Double.MAX_VALUE;
        for (int i = 0; i < maps.size(); i++) {
            int[] map = maps.get(i);
            double now = 0.0;
            for (int j = 0; j < N; j++) {
                double diff = freq[j] - (count * map[j] / 100.0);
                now += diff * diff;
            }
            if (now < min)
                min = now;
        }
        return min;
    }
}