“Unleashing the beast in Python”: We are Pythonistas, let’s write lighter code.
The field of Data Science is expanding every day. A growing population generates ever more data for analysis, for information extraction and for changing lives. Frequent and extensive data transformation, data formatting, feature filtering and many other jobs must be completed successfully before we arrive at a good data set for modeling.
Feature engineering consumes the most effort and time in this field, followed by data preparation. Because so many tasks come before modeling, writing repetitive code in the traditional way costs a lot of time, makes our code bulkier and often increases its complexity. In this article, I am going to explain and demonstrate the lambda function, the map function, the filter function and the list comprehension technique.
Pythonic In-line functions:
Python has a lot of built-in functions that were designed and written efficiently just to make our lives easier. Many of them can be used in-line, inside a single expression, when we write our scripts. As we know, everything we see in an object-oriented language like Python is an object belonging to some class, and functions are no exception.
The lambda function, created with the lambda keyword and well known as the anonymous function, is designed specifically to fit a small function into a single line. Used well, it improves code readability, reduces the bulkiness of the code and makes short, single-purpose functions quicker to write.
Let’s say we have a data frame and we have to create a new feature holding the concatenated values of two or more other object-type (string) features. If we have only recently moved to Python, we would probably approach this in the traditional way, typically with a named helper function and an explicit loop.
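A minimal sketch of that traditional approach follows. It is an illustration under stated assumptions, not the original example: a plain list of dictionaries stands in for the data frame, and the column names and the helper `concatenate_names` are hypothetical.

```python
# A small table stored as a list of rows; an assumed stand-in for a data frame.
rows = [
    {"first_name": "Ada", "last_name": "Lovelace"},
    {"first_name": "Alan", "last_name": "Turing"},
]


def concatenate_names(first_name, last_name):
    """Join two string values with a single space (hypothetical helper)."""
    full_name = first_name + " " + last_name
    return full_name


# Traditional approach: walk over every row and call the named helper.
for row in rows:
    first = row["first_name"]
    last = row["last_name"]
    row["full_name"] = concatenate_names(first, last)

for row in rows:
    print(row["full_name"])
```

The helper, the temporary variables and the loop body are each simple, but together they take many lines for a one-step transformation.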
We can solve the same problem with a lambda function and reduce the code to just two lines. Comparing the two versions side by side makes the difference clear. We must also remember that a lambda expression evaluates to an object of type function; it produces a value only when that function is called.
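As a hedged sketch of the lambda version, the snippet below again assumes a list of dictionaries in place of the data frame; the name `join_names` and the column names are illustrative only.

```python
# A small table stored as a list of rows; an assumed stand-in for a data frame.
rows = [
    {"first_name": "Ada", "last_name": "Lovelace"},
    {"first_name": "Alan", "last_name": "Turing"},
]

# Line 1: define the anonymous function. Line 2: apply it to every row.
join_names = lambda row: row["first_name"] + " " + row["last_name"]
for row in rows: row["full_name"] = join_names(row)

# The lambda expression itself is a function object, not a string.
print(type(join_names))     # <class 'function'>
print(join_names(rows[0]))  # Ada Lovelace
```

With an actual pandas data frame, a lambda like this could be passed to `DataFrame.apply` with `axis=1` to build the new column in one statement.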
We are all familiar with how an algebraic function takes each element of an input set, known as the domain, and maps it to an element of an output set, known as the co-domain. The map function, or map(), is a Python built-in that works in a similar way: it takes each item from an iterable (a list, tuple, set and so on), applies a function to it and produces the results as a sequence. In Python 3 that sequence is a lazy iterator, so it is usually converted to a list or another structure.
In Python scripts, map() pairs naturally with a lambda function: map() applies the lambda to every element of the iterable object, so the function’s logic runs over all of the data without an explicit loop.
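A small sketch of map() combined with a lambda is shown here; the list `numbers` and the squaring operation are assumptions chosen only for illustration.

```python
numbers = [1, 2, 3, 4]

# map() applies the lambda to each element and returns a lazy iterator.
squares_iterator = map(lambda x: x ** 2, numbers)

# Convert the iterator to a list to materialize the results.
squares = list(squares_iterator)
print(squares)  # [1, 4, 9, 16]

# map() also accepts several iterables; the lambda then takes one
# argument per iterable and is applied pairwise.
sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))
print(sums)  # [11, 22, 33]
```

Because the iterator is consumed as it is read, calling `list()` on `squares_iterator` a second time would give an empty list.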
The filter function, or filter(), is another of Python’s most important built-in functions. Like a filter in any other field, it works on a given condition: it returns an iterator that yields only those values from the input for which the condition holds.
While coding in Python, we often make our code bulkier by not using this function where it fits. Let’s say that, in a given data set, we have to filter out the names of animals that do not start with the letter “C”, keeping only those that do. The traditional approach, a named function with an explicit loop and condition, makes the code bulkier.
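The traditional approach could look roughly like the sketch below. The list `animals` and the helper `names_starting_with_c` are hypothetical, since the original data set is not shown.

```python
animals = ["Cat", "Dog", "Cow", "Horse", "Camel", "Zebra"]


def names_starting_with_c(names):
    """Return only the names that begin with the letter 'C'."""
    selected = []
    for name in names:
        if name.startswith("C"):
            selected.append(name)
    return selected


c_animals = names_starting_with_c(animals)
print(c_animals)  # ['Cat', 'Cow', 'Camel']
```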
The filter function returns an iterator, which then has to be cast to the desired structure, such as a list. With that in mind, the same problem can be solved with filter() in a line or two.
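A minimal sketch of the filter() version, assuming the same kind of hypothetical animal list:

```python
animals = ["Cat", "Dog", "Cow", "Horse", "Camel", "Zebra"]

# filter() keeps the items for which the lambda returns True;
# list() casts the resulting iterator to a list.
c_animals = list(filter(lambda name: name.startswith("C"), animals))
print(c_animals)  # ['Cat', 'Cow', 'Camel']
```

The condition lives in the lambda, so changing the rule (for example, to a different starting letter) means editing a single expression.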
Pythonic approach: List comprehension
Being lazy is good, but writing small and smart code in Python is better. Python gives special powers to some of its data structures, such as the list, and the best known of these is the list comprehension. A shorter syntax is one of the most important features of a good programming language, especially in the world of Data Science, where you don’t have much time to write code.
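To see what a comprehension saves, it helps to start from the longer form. The sketch below doubles the values in a list with a traditional loop; the list `values` is an assumed example.

```python
values = [1, 2, 3, 4, 5]

# Traditional approach: create an empty list, loop, compute, append.
doubled = []
for value in values:
    doubled_value = value * 2
    doubled.append(doubled_value)

print(doubled)  # [2, 4, 6, 8, 10]
```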
Written as a traditional loop, doubling the values in a list and storing them in a new list takes a lot of steps. With the list comprehension technique, we can reduce those steps to no more than three lines, and often to a single expression.
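A short sketch of the list comprehension version, using an assumed example list named `values`:

```python
values = [1, 2, 3, 4, 5]

# One expression builds the new list: [expression for item in iterable].
doubled = [value * 2 for value in values]
print(doubled)  # [2, 4, 6, 8, 10]

# An optional condition filters items before the expression is applied.
doubled_evens = [value * 2 for value in values if value % 2 == 0]
print(doubled_evens)  # [4, 8]
```

The second form combines what map() and filter() do separately, which is one reason comprehensions are often preferred for simple transformations.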