Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression from goblint/cil#134 causes infinite loops in d_plainexp and d_plaintyp #1290

Closed
michael-schwarz opened this issue Dec 12, 2023 · 15 comments · Fixed by #1292
Closed
Labels
bug performance Analysis time, memory usage upstream Issue in dependency, needs fix upstream
Milestone

Comments

@michael-schwarz
Copy link
Member

michael-schwarz commented Dec 12, 2023

Looking at some of the tasks where Mopsa is very good, I noticed the following in our logs for these tasks: DeviceDriversLinux64Large-ReachSafety

Of interest is the following: After approx. 30s no additional evals of any right hand sides happen (or complete at least) and no new unknowns are discovered but the memory consumption skyrockets to the point where the system kills the run.

runtime: 00:00:33.870
vars: 30768, evals: 27447
max=2857.21MB
runtime: 00:01:27.718
vars: 30768, evals: 27447
max=15286.80MB

(after this it is killed)

This behavior is very odd, and seems to point at some implementation bug somewhere.

Full log

Link: https://sv-comp.sosy-lab.org/2024/results/results-verified/goblint.2023-11-30_12-55-24.logfiles/SV-COMP24_unreach-call.linux-3.14_complex_emg_linux-alloc-spinlock_drivers-net-ethernet-sfc-sfc.cil.yml.log

./goblint --conf conf/svcomp24.json --sets ana.specification ../../sv-benchmarks/c/properties/unreach-call.prp --sets exp.architecture 64bit ../../sv-benchmarks/c/ldv-challenges/linux-3.14_complex_emg_linux-alloc-spinlock_drivers-net-ethernet-sfc-sfc.cil.i


--------------------------------------------------------------------------------


--sets is deprecated, use --set instead.
--sets is deprecated, use --set instead.
unrolling loop at lib/libc/stub/src/stdlib.c:10:5-12:5 with factor 12
unrolling loop at lib/libc/stub/src/stdlib.c:21:9-27:9 with factor 4
unrolling loop at lib/libc/stub/src/stdlib.c:40:3-45:3 with factor 6
Collected factors:
functions: 3022
functionCalls: 9132
loops: 6
loopBreaks: 0
controlFlowStatements: 16108
expressions: 92021
instructions: 21068
integralVars: (421,4532)
arrayVars: (41,130)
pointerVars: (2320,6387)

Complexity estimates:
File: 4159107
function efx_ef10_ptp_set_ts_sync_events is recursive, disable interval and interval_set contexts
malloc wrapper: ldv_xmalloc
malloc wrapper: ldv_malloc
no thread creation -> disabling thread analyses "race, deadlock, maylocks, symb_locks, thread, threadid, threadJoins, threadreturn, mhp, region"
Total: 4159107
SV-COMP specification: CHECK( init(main()), LTL(G ! call(reach_error())) )

runtime: 00:00:13.062
vars: 9816, evals: 7738

|rho|=9816
|stable|=9816
|infl|=9517
|wpoint|=0
|sides|=1386
|side_dep|=0
|side_infl|=0
|var_messages|=0
|rho_write|=0
|dep|=7048
|called|=299
Found 1384 contexts for 9 functions. Top 5 functions: [...]

Memory statistics: total=15797.48MB, max=464.37MB, minor=15772.02MB, major=594.94MB, promoted=569.48MB
    minor collections=7529  major collections=16 compactions=0


runtime: 00:00:23.176
vars: 25667, evals: 22428

|rho|=25667
|stable|=25643
|infl|=25665
|wpoint|=1
|sides|=2501
|side_dep|=0
|side_infl|=0
|var_messages|=0
|rho_write|=0
|dep|=20945
|called|=8
Found 2219 contexts for 59 functions. Top 5 functions: [...]


Memory statistics: total=27942.72MB, max=706.25MB, minor=27915.81MB, major=997.24MB, promoted=970.33MB
    minor collections=13321  major collections=18 compactions=0


runtime: 00:00:33.870
vars: 30768, evals: 27447

|rho|=30768
|stable|=30765
|infl|=30730
|wpoint|=1
|sides|=2563
|side_dep|=0
|side_infl|=0
|var_messages|=0
|rho_write|=0
|dep|=25931
|called|=39
Found 2273 contexts for 70 functions. Top 5 functions: [...]

Memory statistics: total=37800.84MB, max=2857.21MB, minor=37773.91MB, major=3242.22MB, promoted=3215.29MB
    minor collections=18023  major collections=21 compactions=0

[...]

runtime: 00:01:27.718
vars: 30768, evals: 27447

|rho|=30768
|stable|=30765
|infl|=30730
|wpoint|=1
|sides|=2563
|side_dep|=0
|side_infl|=0
|var_messages|=0
|rho_write|=0
|dep|=25931
|called|=39
Found 2273 contexts for 70 functions. Top 5 functions:


Memory statistics: total=83772.05MB, max=15286.80MB, minor=83745.12MB, major=15449.96MB, promoted=15423.04MB
    minor collections=39946  major collections=25 compactions=0
@michael-schwarz michael-schwarz added bug performance Analysis time, memory usage labels Dec 12, 2023
@michael-schwarz
Copy link
Member Author

screen

I am not an expert in debugging with memtrace, but if I look at the allocations in the last 5 seconds before I killed it, some CIL printer has allocated 99.8% of memory, so the issues seems to be there.

@michael-schwarz
Copy link
Member Author

| t,o ->
let s = GobPretty.sprintf "Addr.type_offset: could not follow offset in type. type: %a, offset: %a" d_plaintype t pretty o in
raise (Type_of_error (t, s))

Line 152 does 99.8% of the relevant allocations.

@michael-schwarz
Copy link
Member Author

michael-schwarz commented Dec 13, 2023

Replacing the printing with "" allows to proceed a bit further with solving, but it then gets stuck in a loop later. Same after removing the other invocation of d_plaintype.

@michael-schwarz
Copy link
Member Author

| x -> fallback (GobPretty.sprintf "CastE: e did evaluate to Int, but the type did not match %a" CilType.Typ.pretty t) st
else
fallback (GobPretty.sprintf "CastE: %a evaluates to %a which is bigger than the type it is cast to which is %a" d_plainexp e ID.pretty i CilType.Typ.pretty t) st

seems to fix the issue. The invocation of d_plainexp seems to be the cause for the infinite loop there.

@michael-schwarz michael-schwarz changed the title Investigate potential infinite allocating loop / some other oddity Functions d_plainexp and d_plaintyp from CIL cause infinite loops Dec 13, 2023
@michael-schwarz
Copy link
Member Author

It seems potentially related to goblint/cil#134.

@sim642
Copy link
Member

sim642 commented Dec 13, 2023

I wonder if goblint/cil#134 somehow accidentally caused this. Maybe reverting it has some effect?

The higher-level issue is the use of strings for these messages. Generally we don't even show them, so constructing the strings is a complete waste. Passing something else around could avoid actually calling d_plaintyp unless it's actually being printed to somewhere visible.

@michael-schwarz
Copy link
Member Author

Reverting goblint/cil#134 fixes the issue.

@michael-schwarz
Copy link
Member Author

How do you want to proceed here? Should I revert goblint/cil#134 on the CIL develop? Or do you want to try to fix that PR?

@michael-schwarz michael-schwarz changed the title Functions d_plainexp and d_plaintyp from CIL cause infinite loops Regression from goblint/cil#134 causes infinite loops in d_plainexp and d_plaintyp Dec 13, 2023
@michael-schwarz michael-schwarz added the upstream Issue in dependency, needs fix upstream label Dec 13, 2023
@michael-schwarz
Copy link
Member Author

The type for which d_plaintype does no longer terminate after goblint/cil#134 is

struct net_device {
   char name[16U] ;
   struct hlist_node name_hlist ;
   char *ifalias ;
   unsigned long mem_end ;
   unsigned long mem_start ;
   unsigned long base_addr ;
   int irq ;
   unsigned long state ;
   struct list_head dev_list ;
   struct list_head napi_list ;
   struct list_head unreg_list ;
   struct list_head close_list ;
   struct __anonstruct_adj_list_235 adj_list ;
   struct __anonstruct_all_adj_list_236 all_adj_list ;
   netdev_features_t features ;
   netdev_features_t hw_features ;
   netdev_features_t wanted_features ;
   netdev_features_t vlan_features ;
   netdev_features_t hw_enc_features ;
   netdev_features_t mpls_features ;
   int ifindex ;
   int iflink ;
   struct net_device_stats stats ;
   atomic_long_t rx_dropped ;
   struct iw_handler_def const *wireless_handlers ;
   struct iw_public_data *wireless_data ;
   struct net_device_ops const *netdev_ops ;
   struct ethtool_ops const *ethtool_ops ;
   struct forwarding_accel_ops const *fwd_ops ;
   struct header_ops const *header_ops ;
   unsigned int flags ;
   unsigned int priv_flags ;
   unsigned short gflags ;
   unsigned short padded ;
   unsigned char operstate ;
   unsigned char link_mode ;
   unsigned char if_port ;
   unsigned char dma ;
   unsigned int mtu ;
   unsigned short type ;
   unsigned short hard_header_len ;
   unsigned short needed_headroom ;
   unsigned short needed_tailroom ;
   unsigned char perm_addr[32U] ;
   unsigned char addr_assign_type ;
   unsigned char addr_len ;
   unsigned short neigh_priv_len ;
   unsigned short dev_id ;
   spinlock_t addr_list_lock ;
   struct netdev_hw_addr_list uc ;
   struct netdev_hw_addr_list mc ;
   struct netdev_hw_addr_list dev_addrs ;
   struct kset *queues_kset ;
   bool uc_promisc ;
   unsigned int promiscuity ;
   unsigned int allmulti ;
   struct vlan_info *vlan_info ;
   struct dsa_switch_tree *dsa_ptr ;
   struct tipc_bearer *tipc_ptr ;
   void *atalk_ptr ;
   struct in_device *ip_ptr ;
   struct dn_dev *dn_ptr ;
   struct inet6_dev *ip6_ptr ;
   void *ax25_ptr ;
   struct wireless_dev *ieee80211_ptr ;
   unsigned long last_rx ;
   unsigned char *dev_addr ;
   struct netdev_rx_queue *_rx ;
   unsigned int num_rx_queues ;
   unsigned int real_num_rx_queues ;
   rx_handler_func_t *rx_handler ;
   void *rx_handler_data ;
   struct netdev_queue *ingress_queue ;
   unsigned char broadcast[32U] ;
   struct netdev_queue *_tx ;
   unsigned int num_tx_queues ;
   unsigned int real_num_tx_queues ;
   struct Qdisc *qdisc ;
   unsigned long tx_queue_len ;
   spinlock_t tx_global_lock ;
   struct xps_dev_maps *xps_maps ;
   struct cpu_rmap *rx_cpu_rmap ;
   unsigned long trans_start ;
   int watchdog_timeo ;
   struct timer_list watchdog_timer ;
   int *pcpu_refcnt ;
   struct list_head todo_list ;
   struct hlist_node index_hlist ;
   struct list_head link_watch_list ;
   enum ldv_28509 reg_state : 8 ;
   bool dismantle ;
   enum ldv_28510 rtnl_link_state : 16 ;
   void (*destructor)(struct net_device * ) ;
   struct netpoll_info *npinfo ;
   struct net *nd_net ;
   union __anonunion____missing_field_name_237 __annonCompField74 ;
   struct garp_port *garp_port ;
   struct mrp_port *mrp_port ;
   struct device dev ;
   struct attribute_group const *sysfs_groups[4U] ;
   struct attribute_group const *sysfs_rx_queue_group ;
   struct rtnl_link_ops const *rtnl_link_ops ;
   unsigned int gso_max_size ;
   u16 gso_max_segs ;
   struct dcbnl_rtnl_ops const *dcbnl_ops ;
   u8 num_tc ;
   struct netdev_tc_txq tc_to_txq[16U] ;
   u8 prio_tc_map[16U] ;
   unsigned int fcoe_ddp_xid ;
   struct netprio_map *priomap ;
   struct phy_device *phydev ;
   struct lock_class_key *qdisc_tx_busylock ;
   int group ;
   struct pm_qos_request pm_qos_req ;
};

@sim642
Copy link
Member

sim642 commented Dec 14, 2023

I started printing out the tree of types that the printing traverses and it's not actually going into infinite recursion. It's just that this struct and all the structs transitively referred from it are so huge. Even without infinite recursion, it could be exponentially branching. I think it does terminate, but it just might take a very long time to do so.

This isn't just a problem for d_plaintyp though, but for other parts of Goblint as well, e.g. finding all deeply reachable fields (maybe related to #1194?). It might just show up with this printing because this requires allocating, copying and concatenating a lot of strings.

@michael-schwarz
Copy link
Member Author

It's just that this struct and all the structs transitively referred from it are so huge. Even without infinite recursion, it could be exponentially branching. I think it does terminate, but it just might take a very long time to do so.

Hmm, but even if they are really large, it still seems odd that we would allocate > 13GB of memory to print one of these types.

@sim642
Copy link
Member

sim642 commented Dec 14, 2023

I don't think it's that add. It's just unexpected how large this tree of structs is if you iterate through it. I used this CIL patch:

diff --git a/src/cil.ml b/src/cil.ml
index 8a28dc19..a2ebd22c 100755
--- a/src/cil.ml
+++ b/src/cil.ml
@@ -4580,6 +4580,7 @@ class plainCilPrinterClass =
   (* We keep track of the composite types that we have done to avoid
      recursion *)
   let donecomps : (int, unit) H.t = H.create 13 in
+  let indent = ref 0 in
   object (self)
 
   inherit defaultCilPrinterClass
@@ -4618,12 +4619,14 @@ class plainCilPrinterClass =
              (argsToList args))
          (if isva then "..." else "") self#pAttrs a
    | TComp (comp, a) ->
+       Printf.eprintf "%s %s %d\n" (String.init !indent (fun _ -> ' ')) comp.cname comp.ckey;
        if H.mem donecomps comp.ckey then
          dprintf "TCompLoop(%s %s, _, %a)"
            (if comp.cstruct then "struct" else "union") comp.cname
            self#pAttrs comp.cattr
        else begin
          H.add donecomps comp.ckey (); (* Add it before we do the fields *)
+         incr indent;
          let doc = dprintf "TComp(@[%s %s,@?%a,@?%a,@?%a@])"
            (if comp.cstruct then "struct" else "union") comp.cname
            (docList ~sep:(chr ',' ++ break)
@@ -4633,6 +4636,7 @@ class plainCilPrinterClass =
            self#pAttrs a
          in
          H.remove donecomps comp.ckey; (* Remove it after we do the fields, so printer doesn't have global state *)
+         decr indent; 
          doc
        end
    | TBuiltin_va_list a ->

After ~1M visited structs it's still 30 levels deep in the type. At some points so far it's even 50 levels deep. That's the magnitude of the exponent!

@michael-schwarz
Copy link
Member Author

I guess you're right then.

What surprises me is that we manage to initialize such structs if they are globals and manage to do anything meaningful at all.

The question is what we want to do here: Maybe we should introduce some kind of cutoff after which we stop descending into types for printing (A printout that is thousands of characters long is not really useful anyway).

For other cases such as the reachability computation, it seems a lot less clear what to do.

@sim642
Copy link
Member

sim642 commented Dec 14, 2023

What surprises me is that we manage to initialize such structs if they are globals and manage to do anything meaningful at all.

Many of these fields are pointers to structs, which global initialization does not recurse into, but just initializes to NULL. That probably prunes most of the tree away. But reachability follows pointers, so it doesn't benefit from that.

I want to first open a PR to just make the two problematic GobPretty.sprintf places be lazy such that they don't unnecessarily construct strings that will not appear in any user output.

Limiting the depth is not a bad option. I might even go as far as not recursing into the fields of a TComp in d_plaintyp at all. If you see a TComp with just its name in some tracing output, it's easy to just look up the struct definition from the program if it's really necessary. And rarely does one need to know all the fields (e.g. typeOf-style functions only care about one).

@michael-schwarz
Copy link
Member Author

michael-schwarz commented Dec 14, 2023

Ok!

I want to first open a PR to just make the two problematic GobPretty.sprintf places

It's actually three places though! Two of d_plaintype and one d_plainexpr.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug performance Analysis time, memory usage upstream Issue in dependency, needs fix upstream
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants