How to analyze Android ANR problems
ANR stands for Application Not Responding. Specifically, an ANR is raised when certain messages (key dispatch, broadcast, service) are not processed by the application's UI thread (main thread) within the specified time.
The ANR mechanism is built on message handling. Android implements a fairly elaborate mechanism at the system layer to detect ANRs, and its core principle is message scheduling plus timeout handling. Most of the mechanism lives in the system layer: every message that can lead to an ANR is scheduled by the system process system_server, specifically by ActivityManagerService, and is then sent to the application process, which does the actual processing. Meanwhile, the system process sets a different timeout limit for each kind of message and tracks its processing. If the application fails to process a message in time, the timeout fires: the system collects some state, such as CPU / IO usage and the call stack of the process (some platforms, such as MTK, also print the corresponding messages for debugging and analysis), and finally reports to the user that the process is not responding (the ANR dialog box).
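The "schedule a message, then time it out" idea can be sketched in plain Java. This is a minimal model and not the ActivityManagerService implementation: the class AnrWatchdog and its methods dispatched, finished and demo are hypothetical names, and the millisecond values are scaled down so that the sketch runs quickly on a desktop JVM.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical model: arm a timer when a message is dispatched,
// disarm it when the application reports that the message is done.
class AnrWatchdog {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final Map<String, ScheduledFuture<?>> pending = new HashMap<>();
    private final Consumer<String> onTimeout;

    AnrWatchdog(Consumer<String> onTimeout) {
        this.onTimeout = onTimeout;
    }

    // The message is handed to the application: start the countdown.
    synchronized void dispatched(String messageId, long timeoutMillis) {
        ScheduledFuture<?> future = timer.schedule(
                () -> expire(messageId), timeoutMillis, TimeUnit.MILLISECONDS);
        pending.put(messageId, future);
    }

    // The application finished in time: cancel the countdown.
    synchronized void finished(String messageId) {
        ScheduledFuture<?> future = pending.remove(messageId);
        if (future != null) {
            future.cancel(false);
        }
    }

    // The timer fired: report only if the message is still pending.
    private void expire(String messageId) {
        boolean stillPending;
        synchronized (this) {
            stillPending = pending.remove(messageId) != null;
        }
        if (stillPending) {
            onTimeout.accept(messageId);
        }
    }

    void shutdown() {
        timer.shutdownNow();
    }

    // Runs one message that finishes in time and one that never finishes,
    // and returns the ids that timed out.
    static List<String> demo() {
        List<String> timedOut = new CopyOnWriteArrayList<>();
        AnrWatchdog watchdog = new AnrWatchdog(timedOut::add);
        try {
            watchdog.dispatched("fast", 300);
            watchdog.finished("fast");       // handled in time, timer cancelled
            watchdog.dispatched("slow", 50); // never finished, timer fires
            Thread.sleep(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            watchdog.shutdown();
        }
        return timedOut;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this model only the "slow" message should be reported, because the timer for "fast" is cancelled before it fires. The real system does more at this point, such as collecting CPU usage and dumping stacks.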
There are generally three types of ANR:
The first is the input dispatch timeout: a key or touch event does not get a response within a specific time. The default timeout on the Android platform is generally 5s, after which an ANR is reported. Some platforms change this value, however; MTK is one example, and some platforms use a timeout of 8s.
This timeout is defined in ActivityManagerService.java.
When this timeout occurs, an ANR prompt box pops up, and the user can choose to force stop the application or continue to wait.
The second is the broadcast timeout: a BroadcastReceiver cannot finish processing within the specified time. The timeout is 10s for a foreground broadcast and 60s for a background broadcast. No prompt box is shown for this kind of timeout.
==> AMS.java
The third is the service timeout: a Service fails to complete its operation within the specified time. As with broadcasts, no prompt box is shown for this kind of ANR. The timeout is 20s for a foreground service and 200s for a background service.
==> ActivityServices.java
Only the first of the three types displays the system prompt dialog box, because the user is interacting with the interface. If there is no response for a long time, the user will suspect that the device has crashed. At that point most people start pressing keys at random, or even pull out the battery and restart, which makes for a very bad user experience.
When any of the three ANRs occurs, error information is written to the log, and the call stacks of the application processes and system processes are written to the file /data/anr/traces.txt. This file is the key to analyzing the cause of an ANR. The log also shows the CPU usage at the time, which is important information as well. How to use both to analyze ANR problems is described in detail in later sections.
These three types of ANR are not isolated and may affect each other. For example, an application process may have both an Activity being displayed and a BroadcastReceiver processing a message, both of which run on the main thread of the process. If the BroadcastReceiver's onReceive function does not return and the user taps the screen, the main thread cannot process the input event. If onReceive still has not returned 5 seconds later, the first type of ANR occurs. If onReceive does not return for more than 10 seconds, the second type of ANR occurs.
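The interaction described above can be written down as a small calculation. The following is a simplified sketch using the default timeouts quoted in this article. The names AnrType, AnrScenario and triggered are hypothetical, the broadcast is assumed to be a foreground broadcast, and the model ignores whatever the system does after the first ANR has been reported.

```java
import java.util.ArrayList;
import java.util.List;

// Default timeouts as quoted in the article; vendors may change them.
enum AnrType {
    INPUT_DISPATCH(5_000),
    FOREGROUND_BROADCAST(10_000),
    BACKGROUND_BROADCAST(60_000),
    FOREGROUND_SERVICE(20_000),
    BACKGROUND_SERVICE(200_000);

    final long timeoutMillis;

    AnrType(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }
}

class AnrScenario {
    // onReceive of a foreground broadcast blocks the main thread for
    // blockedMillis. The user taps the screen touchAtMillis after onReceive
    // started (pass a negative value if the user never taps).
    // Returns the ANR types whose timeout is exceeded in this simplified model.
    static List<AnrType> triggered(long blockedMillis, long touchAtMillis) {
        List<AnrType> result = new ArrayList<>();
        boolean touchedWhileBlocked =
                touchAtMillis >= 0 && touchAtMillis < blockedMillis;
        // The input event waits from the tap until onReceive returns.
        if (touchedWhileBlocked
                && blockedMillis - touchAtMillis > AnrType.INPUT_DISPATCH.timeoutMillis) {
            result.add(AnrType.INPUT_DISPATCH);
        }
        // The broadcast itself is timed from the start of onReceive.
        if (blockedMillis > AnrType.FOREGROUND_BROADCAST.timeoutMillis) {
            result.add(AnrType.FOREGROUND_BROADCAST);
        }
        return result;
    }

    public static void main(String[] args) {
        // Blocked for 12s with a tap right at the start.
        System.out.println(triggered(12_000, 0));
        // prints [INPUT_DISPATCH, FOREGROUND_BROADCAST]
    }
}
```

With a 6-second block and an immediate tap, only INPUT_DISPATCH is returned. With a 12-second block and no tap, only FOREGROUND_BROADCAST is returned.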
An ANR is essentially a performance problem. When an ANR occurs and the problem may lie in the APK itself, the APK is the main direction for troubleshooting: check whether time-consuming operations are performed on the UI thread.
Personally, I think some ANR problems are difficult to analyze. For example, while the app is running, message processing may fail because of something happening in the lower layers of the system. Such problems are often irregular and difficult to reproduce. Analyzing them generally takes a lot of time spent understanding other behaviors of the system, which goes beyond the scope of the ANR itself. Such an ANR works as a warning signal that something is wrong elsewhere in the system.
Search the log for the keyword ANR:
The log above means that the ANR occurred on April 17 at 17:15:23.817. The log usually also carries the reason for the ANR and CPU usage information.
Here we write a demo. We sleep for a period of time in the onReceive method of a broadcast receiver (by default, onReceive is executed on the main thread). If the main thread has nothing else to do during that time, the broadcast receiver completes normally and no ANR occurs. If I press the back key several times after sending the broadcast, the main thread is asleep and cannot respond, so an ANR occurs.
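The effect of the demo can be imitated on a desktop JVM without any Android classes. The following is only a simulation under stated assumptions: a single-thread executor stands in for the main thread, a sleeping task stands in for onReceive, and an empty task stands in for the back-key press. The class name BlockedMainThreadDemo and the method inputTimesOut are hypothetical, and the times are scaled down from seconds to milliseconds.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BlockedMainThreadDemo {
    // Returns true if the simulated input event is not handled within
    // inputTimeoutMillis because the "main thread" is busy sleeping.
    static boolean inputTimesOut(long receiverSleepMillis, long inputTimeoutMillis) {
        // One thread processes all tasks in order, like a main-thread queue.
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for onReceive() sleeping on the main thread.
            mainThread.submit(() -> {
                try {
                    Thread.sleep(receiverSleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // Stand-in for a key press queued behind onReceive().
            Future<?> input = mainThread.submit(() -> { });
            try {
                input.get(inputTimeoutMillis, TimeUnit.MILLISECONDS);
                return false; // handled in time
            } catch (TimeoutException e) {
                return true;  // still queued when the timeout expired
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            } catch (ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            mainThread.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("long sleep:  " + inputTimesOut(300, 50));
        System.out.println("short sleep: " + inputTimesOut(10, 2000));
    }
}
```

The first call is expected to report a timeout and the second is not, which mirrors the two outcomes described for the demo: the sleep alone does no harm, and the trouble starts once an input event has to wait behind it.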
The corresponding complete log information is as follows:
First look at the first few lines:
These lines indicate the activity in which the ANR occurred, the process ID, and the reason for the ANR (input event dispatch timeout).
ANRManager prints the CPU usage before and after the ANR, which reflects the performance state of the system at that time:
Next, we look at what the main thread was doing when the ANR occurred.
The log file only tells you when the ANR occurred, without specific details. For those, you have to check the trace file. When an ANR occurs in an app process, the system dumps the top active processes, and all threads of each process are dumped into this trace file, so the trace file contains the runtime state of every thread.
The stack information for the current example is as follows:
The information above contains this line:
This is the root cause of the ANR. In the onReceive method of the MyReceiver class, the call to Thread.sleep blocks the main thread and causes the ANR.
Then we look at some field information.
These fields have the following meanings:
The thread's own information
The states in Thread.java correspond to the states in Thread.cpp. The former are more general and easier to understand, and are aimed at Java users; the latter are more detailed and reflect the internals of the virtual machine. The thread states displayed in traces.txt are the ones defined in Thread.cpp. In addition, all threads are native threads that follow the POSIX standard.
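The Java-level side of this correspondence can be observed directly with java.lang.Thread.State. The following sketch runs on a desktop JVM and shows two states that matter for ANR analysis: a sleeping thread and a thread waiting for a monitor. The class and method names are hypothetical, and the runtime-level state names that appear in traces.txt are not reproduced here.

```java
class ThreadStateDemo {
    // Polls until the thread reaches the wanted state or about 2 seconds pass,
    // then returns whatever state the thread is in.
    private static Thread.State waitFor(Thread thread, Thread.State wanted) {
        long deadline = System.nanoTime() + 2_000_000_000L;
        while (thread.getState() != wanted && System.nanoTime() < deadline) {
            Thread.yield();
        }
        return thread.getState();
    }

    // A thread inside Thread.sleep is reported as TIMED_WAITING.
    static Thread.State stateWhileSleeping() {
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException ignored) {
                // Interrupted by the demo once the state has been read.
            }
        });
        sleeper.setDaemon(true);
        sleeper.start();
        Thread.State state = waitFor(sleeper, Thread.State.TIMED_WAITING);
        sleeper.interrupt();
        return state;
    }

    // A thread waiting to enter a synchronized block is reported as BLOCKED.
    static Thread.State stateWhileWaitingForLock() {
        Object lock = new Object();
        Thread contender = new Thread(() -> {
            synchronized (lock) {
                // Nothing to do; the point is the wait before entering.
            }
        });
        contender.setDaemon(true);
        Thread.State state;
        synchronized (lock) {
            contender.start();
            state = waitFor(contender, Thread.State.BLOCKED);
        }
        return state;
    }

    public static void main(String[] args) {
        System.out.println("sleeping thread:     " + stateWhileSleeping());
        System.out.println("waiting for monitor: " + stateWhileWaitingForLock());
    }
}
```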
Thread status example:
Let's take a deadlock example:
The thread with tid=24 is waiting for the lock <0x41a874a0>, which is held by the thread with tid=12. Let's take a look at tid=12:
The thread with tid=12 is waiting for the lock <0x41a7e2e8>, which is held by the thread with tid=85. Let's take a look at tid=85:
The thread with tid=85 is waiting for the lock <0x41a7e420>, which is held by tid=24, so a deadlock occurs. In this case, we need to find the source code where the deadlock occurs, then analyze and fix it. It is also worth noting that a trace usually contains a timestamp. Try to analyze the trace whose time is close to that of the ANR, to avoid interference from unrelated dumps.
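Following "waiting for a lock held by" links by hand gets tedious when many threads are involved. Once the waiting thread and the holding thread have been read from each stack, the chain can be checked mechanically. The following sketch does that for the three threads above. LockCycleFinder and findCycle are hypothetical names, and the map is filled in by hand rather than parsed from a trace file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LockCycleFinder {
    // heldBy maps a waiting thread id to the id of the thread that holds
    // the lock it is waiting for. Starting from startTid, follow the chain
    // and return the thread ids that form a cycle, or an empty list if the
    // chain ends at a thread that is not waiting for anything.
    static List<Integer> findCycle(Map<Integer, Integer> heldBy, int startTid) {
        List<Integer> path = new ArrayList<>();
        Integer current = startTid;
        while (current != null) {
            int seenAt = path.indexOf(current);
            if (seenAt >= 0) {
                // We came back to a thread already on the path: deadlock.
                return new ArrayList<>(path.subList(seenAt, path.size()));
            }
            path.add(current);
            current = heldBy.get(current);
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        Map<Integer, Integer> heldBy = new HashMap<>();
        heldBy.put(24, 12); // tid=24 waits for a lock held by tid=12
        heldBy.put(12, 85); // tid=12 waits for a lock held by tid=85
        heldBy.put(85, 24); // tid=85 waits for a lock held by tid=24
        System.out.println(findCycle(heldBy, 24)); // prints [24, 12, 85]
    }
}
```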
Sometimes, even after analyzing the logs and traces, it is still difficult to find the cause of an ANR, so we may need to approach the problem from other angles: