Unit testing for Data Science in Python
pytest docs
API ref: https://docs.pytest.org/en/6.2.x/reference.html
Assert
ref: https://wikidocs.net/21050
Understanding the test result report
F : failure (an exception is raised)
. : passed
.F. : pass, fail, pass
test types
Mastering assert statements
actual = [ sth ]
expected = None
message = ( sth )
assert actual is expected, message
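A minimal sketch of the pattern above. The function `row_to_list` and the input row are hypothetical stand-ins chosen for illustration; they are not defined in these notes.

```python
def row_to_list(row):
    # Hypothetical function under test: split a tab-separated row into two
    # fields and return None when the row does not have exactly two fields.
    fields = row.rstrip("\n").split("\t")
    if len(fields) != 2 or "" in fields:
        return None
    return fields


def test_on_missing_tab():
    actual = row_to_list("1,801201,411\n")
    expected = None
    # The message is only displayed when the assertion fails.
    message = "row_to_list('1,801201,411\\n') returned {0} instead of {1}".format(
        actual, expected
    )
    # None is a singleton, so compare with `is` rather than `==`.
    assert actual is expected, message
```

Putting the actual and expected values in the message makes a failing report readable without rerunning the test.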
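Don't compare float values with == (0.1 + 0.1 + 0.1 == 0.3 is False because of rounding).
Wrap the expected value in pytest.approx() instead.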
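A short sketch of why exact float comparison is fragile and how `pytest.approx()` helps. The test names are made up for illustration.

```python
import pytest


def test_float_sum():
    total = 0.1 + 0.1 + 0.1
    # Exact comparison fails because of binary floating-point rounding.
    assert total != 0.3
    # pytest.approx() compares within a small tolerance instead.
    assert total == pytest.approx(0.3)


def test_float_list():
    # approx() also accepts sequences of numbers.
    assert [0.1 + 0.2, 0.2 + 0.4] == pytest.approx([0.3, 0.6])
```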
Testing for exceptions instead of return values
with pytest.raises(ValueError):
    sth
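A sketch of the `pytest.raises` context manager in a complete test. The function `split_row` is a hypothetical example that raises instead of returning None.

```python
import pytest


def split_row(row):
    # Hypothetical function under test: raises ValueError on malformed rows.
    fields = row.split("\t")
    if len(fields) != 2:
        raise ValueError(
            "expected 2 tab-separated fields, got {0}".format(len(fields))
        )
    return fields


def test_raises_on_missing_tab():
    # The test passes only if the code inside the block raises ValueError.
    with pytest.raises(ValueError) as exc_info:
        split_row("1,801201,411")
    # exc_info.value holds the raised exception, so its message can be checked.
    assert "expected 2" in str(exc_info.value)
```

If the block raises nothing, or raises a different exception type, the test fails.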
Test Driven Development (TDD)
def convert_to_int(integer_string_with_commas):
    comma_separated_parts = integer_string_with_commas.split(",")
    for i in range(len(comma_separated_parts)):
        # A part longer than 3 characters means a comma is missing
        if len(comma_separated_parts[i]) > 3:
            return None
        # Every part after the first must have exactly 3 characters,
        # otherwise a comma is incorrectly placed
        if i != 0 and len(comma_separated_parts[i]) != 3:
            return None
    integer_string_without_commas = "".join(comma_separated_parts)
    try:
        return int(integer_string_without_commas)
    # int() raises ValueError for float valued argument strings
    except ValueError:
        return None
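In TDD the tests are written before the implementation. The sketch below shows what such a test set might look like for `convert_to_int()`; the test names and input strings are illustrative assumptions, and the function body is repeated in compact form only so the block runs on its own.

```python
def convert_to_int(integer_string_with_commas):
    # Same logic as the function above, repeated so this sketch is runnable.
    parts = integer_string_with_commas.split(",")
    for i, part in enumerate(parts):
        if len(part) > 3:
            return None
        if i != 0 and len(part) != 3:
            return None
    try:
        return int("".join(parts))
    except ValueError:
        return None


# Normal arguments
def test_with_no_comma():
    assert convert_to_int("756") == 756


def test_with_one_comma():
    assert convert_to_int("2,081") == 2081


def test_with_two_commas():
    assert convert_to_int("1,034,891") == 1034891


# Bad arguments: the function should return None
def test_on_string_with_missing_comma():
    assert convert_to_int("178100,301") is None


def test_on_string_with_incorrectly_placed_comma():
    assert convert_to_int("12,72,891") is None


def test_on_float_valued_string():
    assert convert_to_int("6.9") is None
```

Writing the bad-argument cases first forces a decision about what the function should return for them before any code exists.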
How to organize a growing set of tests?
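Run all the tests in a test class using its node ID:
!pytest models/test_train.py::[name]
Run only the previously failing test using its node ID (if the test is a method of a test class, the full node ID is [file]::[class name]::[f name]):
!pytest models/[name].py::[f name]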
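A sketch of how a test class groups the tests for one function, and how node IDs map onto it. The file path, class name, and the function `row_to_list` are hypothetical.

```python
# Hypothetical file: data/test_preprocessing_helpers.py


def row_to_list(row):
    # Hypothetical function under test, kept trivial on purpose.
    return row.rstrip("\n").split("\t")


class TestRowToList(object):
    # Node ID of the whole class:
    #   data/test_preprocessing_helpers.py::TestRowToList

    def test_on_clean_row(self):
        # Node ID of this single test:
        #   data/test_preprocessing_helpers.py::TestRowToList::test_on_clean_row
        assert row_to_list("2,081\t314,942\n") == ["2,081", "314,942"]

    def test_on_row_without_tab(self):
        assert row_to_list("2,081\n") == ["2,081"]
```

One class per function under test keeps related tests together as the test suite grows.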
Expected failures and conditional skipping
class [name]:
    @pytest.mark.skipif([condition], reason="sth")
    def name(self, args):
        assert ~
The command that shows only the reasons for expected failures (xfail) in the test result report:
!pytest -rx
(-rs shows the reasons for skipped tests instead.)
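A sketch combining conditional skipping and an expected failure in one test class. The class name, the version condition, and `train_model` are hypothetical. With this file, `pytest -rs` reports skip reasons, `pytest -rx` reports xfail reasons, and `pytest -rsx` reports both.

```python
import sys

import pytest


def train_model(data):
    # Hypothetical function that has not been implemented yet.
    raise NotImplementedError("train_model() is not implemented yet")


class TestTrainModel(object):
    # Skipped only when the condition is True; otherwise the test runs.
    @pytest.mark.skipif(sys.version_info < (3, 0), reason="requires Python 3")
    def test_true_division(self):
        assert 7 / 2 == 3.5

    # Expected to fail until train_model() is implemented.
    @pytest.mark.xfail(reason="train_model() is not implemented yet")
    def test_train_model(self):
        assert train_model([1.0, 2.0]) is not None
```

Both markers can also decorate the class itself, in which case they apply to every test in it.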
Mocking
pytest-mock
unittest.mock
Mocker
from unittest.mock import call  # needed for the call(...) comparisons below

# Method of a test class; raw_and_clean_data_file is a fixture,
# and the mocker argument enables the mocking fixture from pytest-mock
def test_on_raw_data(self, raw_and_clean_data_file, mocker):
    raw_path, clean_path = raw_and_clean_data_file
    # Replace the dependency with the bug-free mock
    convert_to_int_mock = mocker.patch("data.preprocessing_helpers.convert_to_int",
                                       side_effect=convert_to_int_bug_free)
    preprocess(raw_path, clean_path)
    # Check if preprocess() called the dependency correctly
    assert convert_to_int_mock.call_args_list == [
        call("1,801"), call("201,411"), call("2,002"), call("333,209"),
        call("1990"), call("782,911"), call("1,285"), call("389129"),
    ]
    with open(clean_path, "r") as f:
        lines = f.readlines()
    first_line = lines[0]
    assert first_line == "1801\t201411\n"
    second_line = lines[1]
    assert second_line == "2002\t333209\n"
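The same idea can be sketched without any project files, using only `unittest.mock` from the standard library in place of the `mocker` fixture. Everything here is a hypothetical stand-in: `helpers` plays the role of the `data.preprocessing_helpers` module, and `row_to_ints` plays the role of `preprocess()`.

```python
from types import SimpleNamespace
from unittest.mock import call, patch


def convert_to_int_buggy(text):
    # Hypothetical dependency with a bug: it always fails.
    raise RuntimeError("bug in dependency")


def convert_to_int_bug_free(text):
    # Hypothetical replacement that returns known-good answers
    # for exactly the inputs used in the test.
    return {"1,801": 1801, "201,411": 201411}[text]


# Stand-in for a helpers module (assumption made for this sketch).
helpers = SimpleNamespace(convert_to_int=convert_to_int_buggy)


def row_to_ints(row):
    # Function under test: it looks the dependency up at call time,
    # so patching helpers.convert_to_int takes effect.
    return [helpers.convert_to_int(field) for field in row.split("\t")]


def test_row_to_ints_with_mock():
    with patch.object(
        helpers, "convert_to_int", side_effect=convert_to_int_bug_free
    ) as convert_to_int_mock:
        result = row_to_ints("1,801\t201,411")
    # The function under test is checked independently of the buggy dependency.
    assert result == [1801, 201411]
    # The mock records how it was called.
    assert convert_to_int_mock.call_args_list == [call("1,801"), call("201,411")]
```

The patch target must be the name as the function under test looks it up, which is why the notes patch `data.preprocessing_helpers.convert_to_int` and not the place where the function was originally defined.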
Reference
More information on this topic (Unit testing for Data Science in Python) can be found at https://velog.io/@jee-9/Unit-testing-for-Data-Science-in-Python
The text may be freely shared or copied, provided the URL of this document is kept as a reference.
Collection and Share based on the CC Protocol