Abstract Syntax Tree for Patching Code and Assessing Code Quality

Abdul Qadir

Why should you care?

How do we easily and scalably patch 100,000s of lines of source code? Read about how we used a simple yet powerful data structure – Abstract Syntax Tree (AST) to create a system that from one single central point, maps source code dependencies and in-turn patches all dependencies.

Abstract

A software system is usually built with assumptions around how dependencies such as the underlying language system, frameworks, libraries etc. are written. Changes in these dependencies may have a ripple effect into the software system itself. For example, recently, the famous Python package pandas released its 1.0.0 version, which has deprecated and changed several functionalities that existed in its previous 0.25.x version. An organization may have many systems using 0.25.x version of pandas. Hence, upgrading it to 1.0.0 will require developers of every system to go through the pandas change documentation and patch their code accordingly.


Since we developers love to automate tedious tasks, it is natural for us to think of writing a patch script that will update the source code of all the systems according to the changes in new pandas version. A patch script could be parsing the source code and doing some kind of find+replace. But such a patch script will likely be unreliable and not comprehensive. For example, say the patch script needs to change the name of a function get to create wherever it is called in the code base. A simple find+replace will end up replacing the word “get” even if it was not a function call. Another example would be that find+replace will not be able to handle cases where code statements spill over to multiple lines. We need the patch script to parse the source code, while understanding the language constructs. In this article, we propose the use of Abstract Syntax Trees (AST) to write such patch scripts. And then later, we present how ASTs can be used to assess code quality.

Abstract Syntax Tree (AST)

Abstract Syntax Tree (or AST) is a tree representation of source code Wikipedia page.

Almost every language has a way to generate AST from its code. We use Python to build several critical parts of our systems. Hence, this article uses Python to give examples and highlights, but the learnings from here can be applied to any other language.

Python has a package called ast to generate ASTs. Here is a small tutorial on it.

Code:
import ast

# Simple code that sets a variable var to 1 and then prints it.
code = """
var = 1
print(var)
"""

# Converts code to AST. Object 'head' points to the head of the AST.
head = ast.parse(code)

print(head)

Output:

<_ast.Module object at ...>
So, the head of the AST is a Module object, which makes sense. Let’s dig deeper in it. The ast package provides an ast.dump(node) function that returns a formatted view of the entire tree rooted at node. Let’s call it on head object and see what we get.

Code:
print(ast.dump(head))

Output (prettified):

Module(
    body=[
        Assign(
            targets=[Name(id='var', ctx=Store())], value=Num(n=1)
        ),
        Expr(
            value=Call(func=Name(id='print', ctx=Load()), args=[Name(id='var', ctx=Load())], keywords=[])
        )
    ]
)

Looking at the ast.dump output, we can see that the head object which is of type Module has an attribute body whose value is a list of 2 nodes – one representing var = 1 and the other representing print(var). The first node representing var = 1 has a target attribute representing the LHS var and a value attribute representing the RHS 1. Let’s see if we can print the RHS.


Code:
print(head.body[0].value.n)

Output:

1
So, it works as expected. Now let’s try to modify the RHS from value 1 to 2.

Code:

head.body[0].value.n = 2

print(ast.dump(head))

Output (prettified):

Module(
    body=[
        Assign(
            targets=[Name(id='var', ctx=Store())], value=Num(n=2)
        ),
        Expr(
            value=Call(func=Name(id='print', ctx=Load()), args=[Name(id='var', ctx=Load())], keywords=[])
        )
    ]
)
We can see the value of the corresponding attribute has changed to 2. Now, we will want to convert the AST back to code to get the modified code. To do that, we will use a Python package called astunparse, for ast doesn’t provide this functionality.

Code:
import astunparse

print(astunparse.unparse(head))

Output:

var = 2
print(var)
So, the modified code has statement var = 2 instead of var = 1 as expected.

IntelliPatch

Now that we understand ASTs and how to generate them, inspect them, modify them and re-create code from them, let’s go back to the problem of writing patch scripts to modify the code of a system to use pandas 1.0.0 instead of pandas 0.25.x. We call these AST based patch scripts as “IntelliPatch”.


All the backward incompatibilities in pandas 1.0.0 are listed on this page. Let’s take the first backward incompatibility on the list and write IntelliPatch for that.

Avoid using names from MultiIndex.levels

In pandas 1.0.0, the name of a MultiIndex level can not be updated using = operator, instead it requires the use of Index.set_names().

Code using pandas 0.25.x:

import pandas as pd

mi = pd.MultiIndex.from_product([[1, 2], ['a', 'b']], names=['x', 'y'])
print(mi.levels[0].name)

mi.levels[0].name = "new name"
print(mi.levels[0].name)

Output:

x 

new name 
The above code will raise a RunTimeError with pandas 1.0.0. For it to use pandas 1.0.0, it should be modified to the code below.

Equivalent code using pandas 1.0.0:
import pandas as pd 

mi = pd.MultiIndex.from_product([[1, 2], ['a', 'b']], names=['x', 'y']) 
print(mi.levels[0].name) 

mi = mi.set_names("new name", level=0) 
print(mi.levels[0].name) 

The IntelliPatch needs to do the following:

  1. Create AST of the given code and traverse it.
  2. Identify if any node represents the code of form <var>.levels[<idx>].name = <val> .
  3. Replace the identified node with the one that represents the code of form <var> = <var>.set_names(<val>, level=<idx>).

Below is the IntelliPatch script that does that.

intelli_patch.py

import ast

def is_multi_index_rename_node(node):
    """
    Checks if the given node represents the code: <var>.levels[<idx>].name = <val>
    and returns the corresponding var, idx and val if it does.
    """
    try:
        if (
            isinstance(node, ast.Assign)
            and node.targets[0].attr == "name"
            and node.targets[0].value.value.attr == "levels"
        ):
            var = node.targets[0].value.value.value.id
            idx = node.targets[0].value.slice.value.n
            val = node.value

            return True, var, idx, val
    except:
        pass
    return False, None, None, None

def get_new_multi_index_rename_node(var, idx, val):
    """
    Returns AST node that represents the code: <var> = <var>.set_names(<val>, level=<idx>)
    for the given var, idx and val.
    """
    return ast.Assign(
        targets=[ast.Name(id=var)],
        value=ast.Call(
            func=ast.Attribute(value=ast.Name(id=var), attr="set_names"),
            args=[val],
            keywords=[ast.keyword(arg="level", value=ast.Num(n=idx))],
        ),
    )

def patch(node):
    """
    Takes an AST rooted at the give node and patches it.
    """
    # If it is a leaf node, then no patching needed.
    if not hasattr(node, "_fields"):
        return node

    # For every child of the node, modify it if needed and recursively call patch on it.
    for (name, field) in ast.iter_fields(node):
        if isinstance(field, list):
            for i in range(len(field)):
                check, var, idx, val = is_multi_index_rename_node(field[i])
                if check:
                    field[i] = get_new_multi_index_rename_node(var, idx, val)
                else:
                    patch(field[i])
        else:
            check, var, idx, val = is_multi_index_rename_node(field)
            if check:
                setattr(node, name, get_new_multi_index_rename_node(var, idx, val))
            else:
                patch(field)

Usage Example 1:

from intelli_patch import patch
import ast
import astunparse

code = """
import pandas as pd
mi = pd.MultiIndex.from_product([[1, 2], ['a', 'b']], names=['x', 'y'])
mi.levels[0].name = "new name"
"""

head = ast.parse(code)
patch(head)
print(astunparse.unparse(head))

Output:

import pandas as pd
mi = pd.MultiIndex.from_product([[1, 2], ['a', 'b']], names=['x', 'y'])
mi = mi.set_names('new name', level=0)

Usage Example 2:

from intelli_patch import patch
import ast
import astunparse

code = """
import pandas as pd
class C():
    def f():
        def g():
            mi.levels[
                0
            ].name = "new name"
mi = pd.MultiIndex.from_product([[1, 2], ['a', 'b']], names=['x', 'y'])
"""
head = ast.parse(code)
patch(head)
print(astunparse.unparse(head))

Output:

import pandas as pd

class C():

    def f():

        def g():
            mi = mi.set_names('new name', level=0)
mi = pd.MultiIndex.from_product([[1, 2], ['a', 'b']], names=['x', 'y'])
In usage example 2, note that the code statement that is to be replaced expands to more than 1 line and is present within a function g that is present within a function f that is present within a class C. IntelliPatch handles this case as well.


One can extend the patch script to take care of all backward incompatibilities in pandas 1.0.0. And then write an outer function that goes through every Python file of a system, reads its code, patches it and writes it back to disk.

It is important to note that a developer should review the changes done by the IntelliPatch before committing it. For example, if code is hosted on git, then a git diff should be performed and reviewed by the developer.

Impact

At Soroco, we have written 5 IntelliPatch scripts so far that were ran on 10 systems. Each script successfully parsed and patched about 150,000 lines of code across 10 systems. In terms of productivity, this effort took one of our engineers three full days to complete. This engineer learnt about ASTs before implementing these solutions.

Of the five scripts, one particular script was unique – a code scrubber and not a traditional patch. This need stemmed from an external party seeking to review the outline of the code, without sharing the actual logic and specifics of the code. Hence, we wrote a scrubber, that scrubs logic and other key elements in the code while retaining only the imports, class and function definitions, docstrings, type annotations and some very specific information required for the review. Therefore, the AST proved to be a valuable tool for buiding a code scrubber as well.

Limitations

One of the problems of patching code using ast package of Python is that it loses all the formatting and comments of the original source code. This can be solved by making the patch script a little smarter. Instead of having it unparse the entire patched AST and write that to disk, we can make it unparse only the nodes it modified and insert the modified code at the corresponding line number in the file. The ast nodes have lineno attribute that can be used to retrieve the line number of the file to be injected the patched code with.
If you enjoy reading this article and want to work on similar problems, apply here and come work with us!

Code Quality Assessment

Now that we understand how ASTs can be very useful to write intelligent patch scripts, in this section we will explain how it can be used to assess code quality.

Many of the IDEs, linters and code inspectors, like PyCharm and SonarQube, use ASTs to perform code quality checks. We can use ASTs to create our own code quality checks specific to our needs. Below are a few examples:

Example 1: Non self-explanatory variable names

You want the developers in your organization to use good self-explanatory variable names in the code. The most frequent problem that you see in the code is the use of single character variable names like i, j, etc. Below is a script that can check that.

variable_name_check.py

import ast

def check(node):
    """
    Takes an AST rooted at the given node and checks if there are single character
    variable names.
    """
    # If it is a leaf node, then return.
    if not hasattr(node, "_fields"):
        return

    # For every child of the node, check if it is a variable having single character
    # name and recursively call check on it.
    for child_node in ast.iter_child_nodes(node):
        if isinstance(child_node, ast.Name) and len(child_node.id) == 1:
            print(
                f"Single character name '{child_node.id}' used at line number {child_node.lineno}"
            )
        check(child_node)

Usage:

from variable_name_check import check
import ast

code = """
a = 1
b = a
print(b)
"""

head = ast.parse(code)
check(head)

Output:

Single character name 'a' used at line number 2
Single character name 'b' used at line number 3
Single character name 'a' used at line number 3
Single character name 'b' used at line number 4

Example 2: Un-logged except block of code

You want the developers in your organization to make sure to put logging when an exception is caught. You expect that either an error or exception function of logging module is called from every except block of code. Below is a script that can check that using AST.

unlogged_except_check.py
import ast

def check(node):
    """
    Takes an AST rooted at the given node and checks if there are un-logged except
    code blocks.
    """
    # If it is a leaf node, then return.
    if not hasattr(node, "_fields"):
        return

    # For every child of the node, check if it is un-logged except code block
    # and recursively call check on it.
    for child_node in ast.iter_child_nodes(node):
        if isinstance(child_node, ast.ExceptHandler) and not is_logging_present(
            child_node
        ):
            print(
                f"Neither 'error' nor 'exception' logging is present within the except block starting at line number {child_node.lineno}"
            )
        check(child_node)

def is_logging_present(node):
    """
    Takes an AST rooted at the given node and checks whether there is an 'error'
    or 'exception' logging present in it.
    """
    # If it is a leaf node, then return False.
    if not hasattr(node, "_fields"):
        return False

    # If it represents an `error` or `exception`function call then return True.
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr in ["error", "exception"]
    ):
        return True

    # Recursively checking if logging is present in the children nodes.
    for child_node in ast.iter_child_nodes(node):
        if is_logging_present(child_node):
            return True

    return False

Usage:

from unlogged_except_check import check
import ast

code="""
try:
    pass

except ValueError:
    logging.error("Error occurred")

    try:
        pass

    except KeyError:
        logging.exception("Exception handled")

    except NameError:
        pass

try:
    pass
except:
    logging.info("Info level logging")
"""

head = ast.parse(code)
check(head)

Output:

Neither 'error' nor 'exception' logging is present within the except block starting at line number 14
Neither 'error' nor 'exception' logging is present within the except block starting at line number 19
This can be taken one step further where if an except code block is found without any logging, then the code quality checker can put the logging in the code by adding a corresponding node in the AST.

Conclusion

The usefulness of ASTs extends far beyond the discussion in this article. For example, the ASTs of the files in a given system can be used to create a call graph. A call graph created during run-time may not cover all the code paths. But a call graph created using ASTs statically will cover all the code paths and thus will be comprehensive. The call graph then can be used to generate a human readable documentation of the system. We have built such a functionality in Soroco that we call “LiveDoc”, but that is a topic for another day in an another article 🙂

If you enjoy reading this article and want to work on similar problems, apply here and come work with us!

Like this article? Spread the word 

Share on facebook
Share on twitter
Share on linkedin
Share on reddit
Share on mix
Share on email

Content Explorer

4 Responses

  1. Slots are slots. If you’re used to playing them in a brick-and-mortar casino, you’re not customary to partake of any vex adapting to their online cousin. The process is the notwithstanding: interpolate your small change, finest your paylines and hit the spin button to bet.

    What you longing make out, although, is that the online position games are more convenient. It takes nothing but seconds to swap machines, and you don’t even include to worry give someone hogging a choosy ring, acting nasty (drunk) or blowing smoke in your face. You can serene swap casinos if you need to. Online casinos are also cheaper to deposit to, and you can be occupied in for allowed if you’re not amenable to pit oneself against with money.

    lotsa slots

    The davy jones’s locker line? Online slots are like brick-and-mortar slots in approximately every technique, with additional benefits. If you’re a buff of these money-sucking machines, then we recommend giving their online counterpart a shot. But first, start with this page. Learn on every side all the numerous games you can play.

    Our plat also offers sections for online players. We offer 10,000+ free space games. The free games page includes some of the outdo made slots for online players and all of the games weight instantly in your browser. You longing also find sections relating to where to play 3D slots, steep limit and rude limit games (such as penny slots) as approvingly as physical money sites. If you oblige any questions, please caress let go to communication us.

  2. Пожалуй, стопроцентно верного рецепта «точный выбрать лучшую букмекерскую контору в 2021 году» не существует. Сколь людей, столько и мнений. Бытность том, который отдельный оценивает компанию согласие своим внутренним предпочтениям. Кто-то, заранее один, учитывает ширину линии и глубину росписи, кому-то интересны высокие коэффициенты, употреблять и те, что смотрит, в первую очередь, для простоту регистрации и удобство гнездиться игре на официальном сайте.

    Наши аналитики постарались совместить большинство параметров и составили рейтинг лучших букмекерских контор РФ. Список, кто вы найдете для этой странице, поможет вам сориентироваться для рынке, будто вы кроме новичок в ставках и ищете надежного и долгосрочного «партнера».

    мелбет зеркало личный кабинет

    Кроме единовластно сноровка выяснить, какая букмекерская контора лучше в России – отзывы реальных пользователей. Для Prosports мы публикуем мнения чистый действующих игроков. Сноровка автоматом отсеивает ботов и вымышленные аккаунты. Пятнышко зрения клиентов БК о компании – капитальный критерий формирования рейтинга лучших контор, однако не единственный. О других факторах поговорим ниже.

  3. Ставки для спорт с каждым годом становятся популярнее. Ради одних они остаются способом подогрева интереса к матчу, для других способом пополнить принадлежащий банковский счет. Вне зависимости путем того, какой для вас беттинг, правила ради всех одни и те же. Теснить события, ставки и результат. Букмекерские конторы: ликование с приятным бонусом. Несмотря чтобы кажущуюся простоту, эта царство живет соответственно своим правилам и для успешного плавания, нуждаться бомонд правила. Ориентироваться во всем многообразии тотализаторов, какую выбрать букмекерскую контору и подводных камнях мира беттинга позволяет сайт Букмекер Эксперт.

    бвин футбол

    Огулом букмекерские компании работают пропорционально одному принципу: выбирается событие и предлагается коэффициент чтобы его исход. Величина коэффициента может меняться в зависимости путем события и количества сделанных ставок. Это связано с тем, воеже букмекерская компания не работала себе в убыток. Но стоит памятовать, какой ставки ради спорт – не лукавство и не обман, а легальный способ получить польза, заключив пари с выбранной букмекерской конторой. Получение выигрыша всегда станет приятным, даже буде ваша любимая команда проиграла. Известно значительно болельщиков, которые ставят чтобы поражение любимой команды, чтобы как-то портить гадкий осадок. Для других бонусом довольно отголосок подсказки собственной интуиции, а третьи найдут в этом подтверждение правильности выбранной стратегии. Клиент, поставивший ставку, получает исключительный польза в часть случае, если был положительный конец пропорционально выбранному событию. Основания точно в большинстве компаний предлагают широкую линию ставок, то выбрать должен тот вид спорта, в котором вы разбираетесь. Подробнее с видами ставок, точный их совершать онлайн и надежностью букмекерских контор, вы познакомитесь выключая

  4. Как принцип, нельзя. Тем, что ищет что-то вроде «букмекерские конторы без идентификации и без паспорта», стоит сияние: совершенно букмекеры проверяют документы игроков. Избыток чистый в том, какой официальные российские БК делают это прежде того, как покупатель начнет делать ставки, а запрещенные в России букмекеры – вроде тогда, ежели игрок пытается следовательно выигрыш.

    Точно выучить идентификацию разговоры у российского букмекера
    Каждая легальная букмекерская контора в интернете устанавливает приманка правила и требования для клиентов. Одни правила вводятся чтобы соблюдения законов РФ, другие прописываются букмекером для защиты после мошенников тож судебных исков.

    Особое рвение стоит обратить чтобы содержание нарушений правил букмекерской конторы:

    говорят

    Несовершеннолетним запрещено забавлять в БК. Согласно законодательству Российской Федерации, забавлять в букмекерской конторе могут лица старше 18 лет.
    Игрок может иметь только единственный счёт в БК. Суть и использование нескольких игровых счетов может привести к их блокировке и конфискации всех денежных средств.
    Выплаты производятся подобно для честную игру. Букмекер вправе не раскошеливаться барыш, когда заподозрит игрока в мошенничестве. Также возможна блокировка счёта. В качестве мошенничества могут содержаться расценены и злоупотребления бонусами.
    Играть «на ошибках» не получится. Букмекерские конторы РФ могут не засчитать выигрышную ставку, коли решат, какой потребитель выиграл из-за ошибки в линии.
    Кроме того, букмекерские конторы не принимают ставки у лиц, причастных к спортивному состязанию. Сотрудникам БК тоже запрещается содержать пари с компанией, в которой они работают. Впрочем, стоит отметить, кто этот часть индивидуален для каждой компании.

Leave a Reply

Your email address will not be published. Required fields are marked *