Lattix White Paper

A Multifaceted Dependency Map of MySQL

April 2016

Copyright 2016 Lattix, Inc. All rights reserved.
Analyzing MySQL

Introduction

MySQL is one of the most popular open source databases and an integral part of the infrastructure of the Internet. MySQL is a modern database. It came into prominence at a time when relational databases were well established, SQL was standardized, and C++ was already in wide use.

Our interest is in exploring MySQL's code to discover its various parts and to understand how they relate to each other. Using Lattix Architect, we will identify all code elements, such as methods, structs, classes, and global variables, and discover the dependencies among them. This creates the underlying dependency map, which can be explored both in terms of how the source code is organized into its file directory structure and in terms of how relationships manifest themselves in the binary artifacts. The result is a view that provides high-level visibility along with the ability to drill down to a specific line of code.

Starting out: MySQL is not one program

As developers, we started out by building the MySQL source distribution. One of the first things to notice is that MySQL is not one program. Instead, it is a collection of binaries, one of which is the database server (mysqld). It also includes multiple libraries, among them the ones that SQL client applications link against, along with many other utilities and test programs. When we built MySQL, we saw close to 100 binaries.

As a first step, we set out to understand the relationships between these binaries. Our goal is to create a dependency map and to answer a number of interesting questions:

1. What are the various executables and libraries, and how are they related? For instance, can we say exactly which libraries the program mysqld uses and identify those dependencies down to the line of code?
2. How coupled is each of the individual executables and libraries? Which ones have greater coupling?
3. Are some header files being included needlessly? Are there header files included indirectly even though there are direct dependencies on them? Are external declarations being used to circumvent including header files? These are thorny questions that have an important bearing on compilation speed and on managing the complexity of the C/C++ code.
4. Can we identify and prioritize the dependencies that lead to unwanted coupling?
5. When we change one or more files, how does that change ripple through the code, and which components are affected?
6. What are the differences between various versions of MySQL? Can we examine trends in stability, coupling, and other important architectural metrics?
7. Ultimately, how closely does our implementation match the intended architecture, and what can we do to make it more modular so that it becomes easier to understand, maintain, and test?

Creating a project

Analyzing MySQL is a two-step process:

1. Configure the MySQL build. Run make through the lxbuild utility (lxbuild make) to build MySQL while lxbuild captures the build options. The output of this process is a build specification file.
2. Run lattixarchitect with the build specification file as input to generate a Lattix Architect project.
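To make the idea of capturing build options more concrete, here is a minimal Python sketch of the kind of information a build-capture step has to record for each compiler invocation: include paths, preprocessor defines, source files, and the output file. It is only an illustration. The function name parse_compile_command, the record layout, and the sample command line are assumptions made for this example, and they do not describe how lxbuild works or what its build specification file contains.

```python
import shlex

# File suffixes treated as C/C++ sources in this sketch (an assumption).
SOURCE_SUFFIXES = (".c", ".cc", ".cpp", ".cxx")


def parse_compile_command(command):
    """Extract include paths, defines, sources, and output from one
    GCC-style compile command. Hypothetical helper, not part of lxbuild."""
    tokens = shlex.split(command)
    record = {
        "compiler": tokens[0],
        "includes": [],
        "defines": [],
        "sources": [],
        "output": None,
    }
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-I", "-D", "-o") and i + 1 < len(tokens):
            # Separated form: the flag's value is the next token.
            flag, value = tok, tokens[i + 1]
            i += 1
        elif tok.startswith(("-I", "-D")) and len(tok) > 2:
            # Joined form such as -Isql or -DHAVE_CONFIG_H.
            flag, value = tok[:2], tok[2:]
        elif not tok.startswith("-") and tok.endswith(SOURCE_SUFFIXES):
            flag, value = "source", tok
        else:
            # Any other option (for example -c or -O2) is ignored here.
            flag, value = None, None

        if flag == "-I":
            record["includes"].append(value)
        elif flag == "-D":
            record["defines"].append(value)
        elif flag == "-o":
            record["output"] = value
        elif flag == "source":
            record["sources"].append(value)
        i += 1
    return record


# Illustrative command line; not copied from an actual MySQL build log.
example = parse_compile_command(
    "g++ -DHAVE_CONFIG_H -I include -Isql -c sql/sql_parse.cc -o sql_parse.o"
)
print(example)
```

Recording the include paths and defines per source file matters because the same C/C++ file can resolve different headers, and therefore different dependencies, depending on the options it was compiled with.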
Once this process is complete, the resulting Lattix Architect project is the dependency map. We can query it, report on it, and view it in different ways. It is easy to identify the bad dependencies, trace them to the line of code, and understand the impact of change. We ended up analyzing about 2.5 million lines of code organized into 1,369 files.

[Figure: View organized by source structure. Callouts: hierarchical decomposition; reordering algorithms help reveal the architecture; users can identify the cyclic dependencies and then drill down to the actual code for those dependencies.]

Dependencies by Source Code Organization

The hierarchy of the map aggregates the dependencies so you can see how the various source code directories relate to each other. The hierarchy also allows you to drill down to the lowest-level element (method, data, struct, etc.) and jump directly to the source code.

A partitioning algorithm was applied to order the subsystems in a way that reveals the underlying design intent. Notice how the dbug and test directories move to the top: they depend on the libraries, but the libraries do not depend on the test directories (as would be expected). Furthermore, the elements in each layer were ordered to minimize the dependency strength above the diagonal. For instance, the automatic ordering reveals that mysys sits at a lower level than client, as would be expected, because the client programs depend on the common code in mysys.

[Figure: View organized by datasource (programs and libraries).]

Dependencies by Datasource

This view shows the dependencies organized by programs and libraries (called datasources in Lattix Architect). To give an even bigger-picture view, the datasources are organized by their file and directory structure. We can see that the major coupling is between storage and sql, which together contain 60% of the source files.

Further drill-down shows programs such as mysqld, mysql, mysqladmin, and mysqldump; libraries such as libmyisam.a, ha_innodb.a, and libmysqlclient.a; and many others. Drilling further down into each datasource reveals its source organization. Not only can you see the dependencies across datasources, you can also identify, down to the line of source code, exactly how programs and libraries depend on each other.
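The ordering described under Dependencies by Source Code Organization can be approximated with a standard graph technique. The Python sketch below is an illustration under stated assumptions, not the partitioning algorithm Lattix Architect actually uses: it groups mutually dependent subsystems with Tarjan's strongly connected components algorithm and lists the groups so that users appear before the subsystems they use. The function name partition is hypothetical, and the sample edges are invented for the example rather than measured from MySQL.

```python
def partition(deps):
    """Order subsystems so that each group depends only on groups listed
    after it. Subsystems caught in a dependency cycle share one group.

    deps maps a subsystem name to the set of subsystems it depends on.
    """
    nodes = sorted(set(deps) | {t for targets in deps.values() for t in targets})
    index, low = {}, {}
    stack, on_stack = [], set()
    groups = []
    counter = 0

    def visit(node):
        # Tarjan's algorithm: depth-first search with low-link values.
        nonlocal counter
        index[node] = low[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for target in sorted(deps.get(node, ())):
            if target not in index:
                visit(target)
                low[node] = min(low[node], low[target])
            elif target in on_stack:
                low[node] = min(low[node], index[target])
        if low[node] == index[node]:
            # node is the root of a strongly connected component.
            group = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                group.append(member)
                if member == node:
                    break
            groups.append(sorted(group))

    for node in nodes:
        if node not in index:
            visit(node)

    # Tarjan emits the most-used groups first; reverse so users come first.
    groups.reverse()
    return groups


# Invented sample edges for illustration only.
sample = {
    "unittest": {"mysys", "sql"},
    "client": {"mysys"},
    "sql": {"mysys", "storage"},
    "storage": {"mysys", "sql"},
    "mysys": set(),
}

for level, group in enumerate(partition(sample)):
    marker = " (cycle)" if len(group) > 1 else ""
    print(level, ", ".join(group) + marker)
```

In this sample, sql and storage end up in the same group because each depends on the other, which is how a cyclic dependency shows up in such an ordering. A real tool would go further and order the elements inside each group to minimize the dependency strength above the diagonal.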
Conclusion

The dependency map of MySQL allows us to see the big-picture view and then drill down to the lowest-level element and ultimately to the source code. This means that unwanted coupling can be identified quickly and efficiently. The dependency map also allows us to perform detailed include file analysis to reduce complexity and to streamline the inclusion of header files. By analyzing successive versions of the code, it is easy to see what has changed from one version to another and whether those changes are contributing to further erosion of the intended design.

Try Lattix on your Project

Is your software complex, buggy, and hard to maintain? Do you want to see what your code looks like and what you can do to modularize it? If so, contact Lattix: sales@lattix.com.