So, it has been a while since I posted a major article. I have been working on my Job Application Tracker App, which I originally wrote to replace the Google spreadsheet I had been using to track the jobs I have applied to as part of my job search. A friend said I should turn it into a demo site, make it multi-user, and perhaps monetize it. So I put it up as a demo site and did a major amount of work to make it fully multi-user. But while working on it, I ran into some issues. First, I found a bug in allauth, a Django package that provides a full framework for authentication and account management, including things like verifying an email address before letting a user log in, or handling password resets. I found that the code was stripping spaces from passwords, so I opened an issue, fixed the problem, updated the tests, and submitted the fix back to the project.

But I also found an issue in another package, which led to me doing more than just supplying a simple fix. I have been using pylint to keep my code clean, but I noticed that while a GitHub action I added ages ago was reporting that it had checked X number of files, some of those files were missing when I ran pylint locally. It turned out that the action was supplying an explicit list of files, while locally I was telling pylint to discover the files on its own, which is far simpler. With some digging, I found that although I have two directories named applications and applications_api, it was only "discovering" the first. The code, in legitimately trying to ignore duplicates, was also ignoring directories whose names merely started with the name of one it had already seen. So I opened an issue with the Python Code Quality Authority, which maintains that package, did some digging, fixed it, and submitted that fix.

But rather than being a casual contributor, I also opened an issue about the way the method was traversing the project tree. Many developers use what is called a virtual environment, which is stored under a directory typically named .venv or venv, where a whole bunch of dependencies are installed. And in a project such as mine, there are other directories such as node_modules, which is like .venv but stores JavaScript packages instead of Python packages. While a project may have a few dozen or a few hundred directories of its own, those two can turn it into multiple thousands. As before, I submitted a fix, which is in the process of being reviewed. And I decided to contribute some more to that project, but I digress.
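To illustrate the kind of mistake I mean, here is a minimal sketch. It is not pylint's actual code; the function names and the "seen" list are hypothetical, made up for this example. It only shows how a plain string prefix check, meant to skip sub-directories of something already handled, can also swallow a sibling directory with a longer name.

```python
import os


def is_already_covered_buggy(path, seen):
    # Hypothetical duplicate check: a plain string prefix test.
    # "applications_api" starts with "applications", so it is wrongly skipped.
    return any(path.startswith(s) for s in seen)


def is_already_covered_fixed(path, seen):
    # Compare whole path components instead: either the same directory,
    # or something that really lives underneath it.
    return any(path == s or path.startswith(s + os.sep) for s in seen)


seen = ["applications"]
print(is_already_covered_buggy("applications_api", seen))
print(is_already_covered_fixed("applications_api", seen))
print(is_already_covered_fixed(os.path.join("applications", "views"), seen))
```

The buggy version treats applications_api as a duplicate of applications, while the fixed version only skips paths that are the same directory or truly nested inside it.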

The reason I wanted to post this is to share a bit about that second fix. Before my fix, as I said, it traversed the entire directory tree, and it was even traversing directories underneath those that were in the "ignore" list. It does this using Python's os.walk() function. But I knew how to keep that function from descending into sub-directories: modify the list of sub-directories it returns before the loop moves on to the next directory.

And the results are amazing. When I profiled the change on the pylint project while developing the fix, I recorded two numbers both before and after the fix. The first number, tottime, is the time spent in the function itself, while the second, cumtime, is the time spent in the function plus all the functions it calls. On the pylint project, these went from 0.029 and 1.060 seconds to 0.016 and 0.638 seconds. That represents reductions of 44.8% and 39.8%, respectively. But more astounding is what I saw on my own project. There, they dropped from 0.085 and 4.895 seconds to 0.002 and 0.074 seconds. That represents reductions of 97.6% and 98.5%. While the savings are minor compared to the overall run time (these runs can take from half a minute to several minutes, because of all the things pylint checks), they are still significant and add up.
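For anyone who has not seen the os.walk() trick, here is a minimal sketch of the idea. This is not the code from my pylint fix; the function name and the ignore list are just assumptions for illustration. The key is that, in the default top-down mode, os.walk() looks at the same list of sub-directory names it hands you, so removing entries from that list in place keeps it from ever descending into them.

```python
import os

# Hypothetical ignore list, for illustration only.
IGNORED_DIRS = {".venv", "venv", "node_modules"}


def walk_pruned(root, ignored=IGNORED_DIRS):
    """Return the directories under root, never descending into ignored ones."""
    visited = []
    # topdown=True is the default, and it is what makes pruning possible.
    for dirpath, dirnames, _filenames in os.walk(root):
        # Slice assignment changes the list in place. Rebinding the name
        # (dirnames = [...]) would NOT affect what os.walk() visits next.
        dirnames[:] = [d for d in dirnames if d not in ignored]
        visited.append(dirpath)
    return visited
```

Calling walk_pruned(".") at the top of a project returns the project's own directories, while everything under .venv, venv, or node_modules is never even listed, which is where the time savings come from.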

So, back to coding... I have really started upping my game on this project, using Python utilities like black and mypy, along with general utilities like tbump and towncrier, just like pylint does.
