Tag: python
-
Working with the local filesystem and with S3 in the same code
As data people, we need to work with files: we use files to save and load data, models, configurations, images, and other things. When possible, I prefer working with local files because it’s fast and straightforward. However, sometimes, the production code needs to work with data stored on S3. What do we do? Until recently, you would have to rewrite multiple parts of the code. But not anymore. I created a sshalosh package that solves so many problems and spares a lot of code rewriting. Here’s how you work with it:
-
Sharing the results of your Python code
If you work, but nobody knows about your results or cares about them, have you done any work at all?
-
How to become a Python professional in 42 hours?
Here’s an appealing ad that I saw
-
TicToc — a flexible and straightforward stopwatch library for Python.
Many years ago, I needed a way to measure execution times. I didn’t like the existing solutions, so I wrote my own class. As time passed, I added small changes and improvements, and recently I decided to publish the code on GitHub, first as a gist, and now as a full-featured GitHub repository and a pip package.
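To make the idea of a stopwatch helper concrete, here is a minimal sketch built only on the standard library. It is not the TicToc API, which this excerpt does not show; the `stopwatch` name and the `timing` dictionary are hypothetical and exist only for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label="elapsed"):
    """Hypothetical helper (not part of TicToc): time the enclosed block."""
    timing = {}
    start = time.perf_counter()  # monotonic, high-resolution wall clock
    try:
        yield timing
    finally:
        # Store the result so the caller can read it after the block ends.
        timing["seconds"] = time.perf_counter() - start
        print(f"{label}: {timing['seconds']:.4f} s")


with stopwatch("short sleep") as timing:
    time.sleep(0.05)
```

A context manager keeps the start and stop calls paired even if the timed block raises an exception.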
-
The difference between python decorators and inheritance that cost me three hours of hair-pulling
I don’t have much hair on my head, but recently I ran into a funny peculiarity in Python that had me pulling my hair out for a couple of hours. In retrospect, this feature makes a lot of sense. In retrospect.
-
Conference Recap: EuroSciPy 2018 — Data for Breakfast
See my recap of the recent EuroSciPy, published on https://data.blog
-
One of the reasons I don't like R
I never liked R. I didn’t like it the first time I tried to learn it, and I didn’t like it when I had to switch to R as my primary work tool at my previous job. And I didn’t like it one and a half years later, when I was comfortable enough to add R to my CV, right before leaving my previous job.
-
What is the best way to handle command line arguments in Python?
The best way to handle command line arguments with Python is [defopt](http://evanunderscore/defopt: Effortless argument parser). It works like magic. You write a function, add a proper docstring using any standard format (I use [numpy doc]), and see the magic
-
Measuring the wall time in python programs
-
Gender salary gap in the Israeli high-tech — now the code
Several people have asked me about the technology I used to create the graphs in my recent post about the gender salary gap in the Israeli high-tech. As with 99% of the graphs I create, I used matplotlib. I have uploaded the notebook that I used for that post to GitHub. Here’s the link. The published version uses seaborn style settings. The original one uses a slightly customized style.
-
The Y-axis doesn't have to be on the left
Line charts are great for conveying the evolution of a variable over time. This is a typical chart. It has three key components: the X-axis that represents the time, the Y-axis that represents the tracked value, and the line itself.
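The excerpt does not show how the post moves the axis, so the following is only a sketch of one way to do it in matplotlib, with made-up data. It draws the three components named above and then places the Y-axis ticks and label on the right-hand side.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up data, for illustration only.
days = list(range(10))
values = [3, 4, 4, 5, 7, 6, 8, 9, 9, 11]

fig, ax = plt.subplots()
ax.plot(days, values)                  # the line itself
ax.set_xlabel("Time")                  # X-axis: time
ax.set_ylabel("Tracked value")         # Y-axis: the tracked value

# Move the Y-axis ticks and its label from the left side to the right side.
ax.yaxis.tick_right()
ax.yaxis.set_label_position("right")
```

With the axis on the right, the tick labels sit next to the most recent values of the line.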
-
The fastest way to get first N items in each group of a Pandas DataFrame
In my work, the speed of writing and reading code is usually more important than the speed of its execution. Right now, I’m facing the challenge of optimizing the running time of a fairly complex data science project. After a lot of profiling, I identified the major time consumers. One of these time-consuming steps involved grouping a Pandas DataFrame by a key, sorting each group by a score column, and taking the first N elements in each group. The tables in this step are pretty small, no more than one hundred elements. But since I have to perform this step many times, the running time adds up to a substantial fraction of the total.
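For reference, the step described above can be sketched as follows. This is one common way to write it in pandas, not necessarily the fastest and not necessarily the approach the post settles on; the column names, the sample data, and the `top_n_per_group` helper are assumptions made for illustration.

```python
import pandas as pd


def top_n_per_group(df, key, score, n):
    """Hypothetical helper: keep the n highest-scoring rows of each group."""
    # Sort once globally; a stable sort keeps ties in their original order.
    ordered = df.sort_values(score, ascending=False, kind="stable")
    # head(n) then takes the first n rows of every group in that order.
    return ordered.groupby(key, sort=False).head(n)


# Made-up example table.
df = pd.DataFrame(
    {
        "key": ["a", "a", "a", "b", "b"],
        "score": [0.2, 0.9, 0.5, 0.7, 0.1],
    }
)

top2 = top_n_per_group(df, key="key", score="score", n=2)
print(top2)
```

Sorting the whole table once and then calling `head` avoids sorting inside every group separately, which is usually a reasonable baseline to profile against.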