Unit testing for Data Science in Python


pytest docs


API ref: https://docs.pytest.org/en/6.2.x/reference.html

Assert

  • Assert: if the condition following assert is not true, an AssertionError is raised
    ref: https://wikidocs.net/21050
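
    A minimal illustration:

    assert 1 + 2 == 3   # condition is True: nothing happens, execution continues
    assert 1 + 2 == 4   # condition is False: raises AssertionError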

    Understanding the test result report

  • general information
  • test results, shown as one character per test:
  • F : failure (an exception, e.g. AssertionError, was raised)
  • . : passed
  • .F. : pass, fail, pass
  • information about failed tests
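
    For example, a hypothetical test module with one failing test in the middle
    would show up as .F. in the report:

    def test_first():         # reported as .
        assert 1 + 1 == 2

    def test_always_fails():  # reported as F (AssertionError is raised)
        assert 1 + 1 == 3

    def test_third():         # reported as .
        assert 2 * 2 == 4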

  • test types

  • data module
  • feature module
  • models module
  • unit test
  • unit : a small, independent piece of code, e.g. a single function or class
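
    One possible layout (an assumption, based only on the module and file names
    used elsewhere in these notes): each application module gets a mirror test module:

    data/preprocessing_helpers.py   ->  data/test_preprocessing_helpers.py
    features/[name].py              ->  features/test_[name].py
    models/train.py                 ->  models/test_train.py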
  • Mastering assert statements

  • the message after the comma is printed when the AssertionError is raised
  • actual = [ sth ]
    expected = None
    message = ( sth )
    
    assert actual is expected, message
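
    A concrete version of the pattern (function and names are hypothetical);
    running it makes pytest print the message, because the assertion fails:

    def get_queen():   # hypothetical function under test
        return ["queen"]

    def test_get_queen_returns_none():
        actual = get_queen()
        expected = None
        message = f"get_queen() returned {actual} instead of {expected}"
        assert actual is expected, message   # message is shown only on failure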

    Don't do this: compare floats with plain ==
    (0.1 + 0.1 + 0.1 == 0.3 is False because of floating point representation)

  • use pytest.approx() instead (sketch below)
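
    Sketch of pytest.approx(), which also works element-wise on NumPy arrays:

    import numpy as np
    import pytest

    def test_float_sum():
        assert 0.1 + 0.1 + 0.1 == pytest.approx(0.3)

    def test_array_sum():
        assert np.array([0.1 + 0.1, 0.1 + 0.1 + 0.1]) == pytest.approx(np.array([0.2, 0.3]))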
  • testing for exceptions instead of return values (fuller sketch below)

    with pytest.raises(ValueError):
        sth   # code that is expected to raise ValueError
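
    A fuller sketch with a hypothetical helper that should raise ValueError;
    the raised exception can also be inspected:

    import pytest

    def to_positive_int(s):   # hypothetical helper for illustration
        value = int(s)
        if value <= 0:
            raise ValueError(f"Expected a positive integer, got {value}")
        return value

    def test_raises_value_error_on_nonpositive_input():
        with pytest.raises(ValueError) as exc_info:
            to_positive_int("-5")
        # optionally check the error message too
        assert "positive integer" in str(exc_info.value)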

    Test Driven Development (TDD)


    def convert_to_int(integer_string_with_commas):
        comma_separated_parts = integer_string_with_commas.split(",")
        for i in range(len(comma_separated_parts)):
            # Write an if statement for checking missing commas
            if len(comma_separated_parts[i]) > 3:
                return None
            # Write the if statement for incorrectly placed commas
            if i != 0 and len(comma_separated_parts[i]) != 3:
                return None
        integer_string_without_commas = "".join(comma_separated_parts)
        try:
            return int(integer_string_without_commas)
        # Fill in with the correct exception for float valued argument strings
        except ValueError:
            return None
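
    With TDD these tests would be written before the function body above; a sketch
    (assuming convert_to_int() lives in data/preprocessing_helpers.py, as in the
    mocking example later):

    from data.preprocessing_helpers import convert_to_int

    def test_with_comma():
        assert convert_to_int("2,081") == 2081

    def test_with_missing_comma():
        assert convert_to_int("178100") is None

    def test_with_incorrectly_placed_comma():
        assert convert_to_int("12,72,891") is None

    def test_with_float_valued_string():
        assert convert_to_int("6.9") is None   # int("6.9") raises ValueError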

    How to organize a growing set of tests?


  • run all the tests in a test class using its node ID: !pytest models/test_train.py::[class name]

  • run only the previously failing test: !pytest models/[name].py::[f name]
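
    A sketch of what models/test_train.py might contain (class and test names are
    placeholders) and the corresponding node-ID commands:

    class TestTrainModel:
        def test_on_linear_data(self):
            ...

        def test_on_almost_linear_data(self):
            ...

    # whole class:            !pytest models/test_train.py::TestTrainModel
    # single failing test:    !pytest models/test_train.py::TestTrainModel::test_on_almost_linear_data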
  • Expected failures and conditional skipping

  • conditional skipping: decorate the test function (or a whole test class) with the skipif marker

        @pytest.mark.skipif([condition], reason="sth")
        def test_name(args):
            assert ~
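
    A concrete sketch, assuming we skip based on the Python version and also mark a
    known-failing test with xfail (the marker for expected failures):

    import sys
    import pytest

    @pytest.mark.skipif(sys.version_info > (2, 7), reason="requires Python 2.7")
    def test_only_runs_on_python_2():
        assert 1 / 2 == 0   # true only under Python 2 integer division

    @pytest.mark.xfail(reason="written before the feature exists (TDD)")
    def test_feature_not_implemented_yet():
        assert False   # reported as an expected failure, not a failure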
        

  • show the reason for skipped tests in the test result report: !pytest -rs
  • show the reason for expected failures (xfail): !pytest -rx
  • show both skipped and xfailed reasons: !pytest -rsx
  • Mocking

  • mocking: test a function independently of its dependencies by replacing those dependencies with fakes
  • pytest-mock : plugin that provides the mocker fixture
  • unittest.mock : standard-library mock objects (MagicMock, call)
  • mocker.patch("<module>.<dependency>", side_effect=...) swaps in the mock, as in the example below

    # Assumes: from unittest.mock import call, from data.preprocessing_helpers import preprocess,
    # and a bug-free convert_to_int_bug_free() defined in the test module (sketch after this block).
    # Add the correct argument to use the mocking fixture in this test
    def test_on_raw_data(self, raw_and_clean_data_file, mocker):
        raw_path, clean_path = raw_and_clean_data_file
        # Replace the dependency with the bug-free mock
        convert_to_int_mock = mocker.patch("data.preprocessing_helpers.convert_to_int",
                                           side_effect=convert_to_int_bug_free)
        preprocess(raw_path, clean_path)
        # Check if preprocess() called the dependency correctly
        assert convert_to_int_mock.call_args_list == [call("1,801"), call("201,411"), call("2,002"), call("333,209"),
                                                      call("1990"),  call("782,911"), call("1,285"), call("389129")
                                                      ]
        # Check the clean file that preprocess() wrote
        with open(clean_path, "r") as f:
            lines = f.readlines()
        first_line = lines[0]
        assert first_line == "1801\t201411\n"
        second_line = lines[1]
        assert second_line == "2002\t333209\n"
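
    The test above relies on a raw_and_clean_data_file fixture and a
    convert_to_int_bug_free() helper that these notes don't show; a minimal sketch of
    what they could look like, with the raw file contents inferred from the call list
    checked above (the tmp_path usage and dictionary values are assumptions):

    import pytest

    def convert_to_int_bug_free(comma_separated_integer_string):
        # Known-good replacement: same behavior as the corrected convert_to_int() above
        return_values = {"1,801": 1801, "201,411": 201411, "2,002": 2002, "333,209": 333209,
                         "1990": None, "782,911": 782911, "1,285": 1285, "389129": None}
        return return_values[comma_separated_integer_string]

    @pytest.fixture
    def raw_and_clean_data_file(tmp_path):
        raw_path = tmp_path / "raw.txt"
        clean_path = tmp_path / "clean.txt"
        raw_path.write_text("1,801\t201,411\n"
                            "2,002\t333,209\n"
                            "1990\t782,911\n"
                            "1,285\t389129\n")
        return str(raw_path), str(clean_path)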